CN113469279A - Method, system and device for amplifying character sample set - Google Patents


Publication number
CN113469279A
Authority
CN
China
Prior art keywords
character
amplification
samples
sample
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110829494.4A
Other languages
Chinese (zh)
Inventor
王博帝
姚毅
杨艺
全煜鸣
金刚
彭斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Lingyun Shixun Technology Co ltd
Luster LightTech Co Ltd
Original Assignee
Shenzhen Lingyun Shixun Technology Co ltd
Luster LightTech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Lingyun Shixun Technology Co ltd and Luster LightTech Co Ltd
Priority to CN202110829494.4A
Publication of CN113469279A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The application discloses a method, a system and a device for amplifying a character sample set. The method includes: obtaining a character sample set that contains a certain number of character samples; judging in sequence whether each character sample is a dot matrix character sample; if a character sample is a dot matrix character sample, performing morphological processing on it to obtain a continuous character sample; performing an amplification operation on the continuous character sample to obtain an amplified continuous character sample; and adding the amplified continuous character sample to the character sample set. Because the character sample set is amplified in multiple ways on the basis of the existing samples, the number of character samples in the set increases, which in turn improves the recognition accuracy of image processing.

Description

Method, system and device for amplifying character sample set
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, a system, and an apparatus for amplifying a character sample set.
Background
Text is conveyed on various carriers, and people read it to acquire information. With machine vision applied in place of the human eye, efficient recognition of text in images has become an important component of automated production.
Text recognition based on machine learning mainly relies on single-character classification, which requires a character sample set of a certain size as its basis. In practice, however, the size of the character sample set available for a given scene may fall far short of the expected value, which results in poor recognition accuracy. The character sample set therefore needs to be amplified on the basis of the existing samples, so that the number of character samples increases and the recognition accuracy improves.
Disclosure of Invention
The application provides a method, a system and a device for amplifying a character sample set, which address the poor recognition accuracy that arises in existing text recognition when the character sample set contains too few character samples.
In a first aspect, the present application provides a method for amplifying a character sample set, the method comprising:
acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;
sequentially judging whether the character samples are dot matrix character samples or not;
if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;
carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
and adding the amplified continuous character samples into the character sample set.
Further comprising the steps of:
if the character samples are continuous character samples, carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
and adding the amplified continuous character samples into the character sample set.
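The morphological processing that turns a dot matrix character into a continuous one is not specified further in the text. A minimal sketch, assuming it is a binary closing (dilation followed by erosion) with a square structuring element on a 0/1 image; the function names `dilate`, `erode` and `connect_dots` and the kernel size are illustrative assumptions, not part of the application:

```python
import numpy as np


def dilate(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary dilation with a k x k square structuring element (k odd)."""
    r = k // 2
    h, w = binary.shape
    padded = np.pad(binary, r, mode="constant", constant_values=0)
    out = np.zeros_like(binary)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary erosion with a k x k square structuring element (k odd)."""
    r = k // 2
    h, w = binary.shape
    # Pad with foreground so that the image border alone does not erode pixels.
    padded = np.pad(binary, r, mode="constant", constant_values=1)
    out = np.ones_like(binary)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def connect_dots(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Assumed morphological step: closing bridges gaps narrower than k."""
    return erode(dilate(binary, k), k)


# Three isolated dots in one row, separated by one-pixel gaps.
dots = np.zeros((5, 9), dtype=np.uint8)
dots[2, [2, 4, 6]] = 1
stroke = connect_dots(dots)
```

In this toy case the three dots merge into a single horizontal run, which is the kind of continuous stroke the later amplification steps expect. A larger kernel would be needed for wider dot spacing.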
Performing the amplification operation on the continuous character sample to obtain the amplified continuous character sample specifically comprises the following steps:
performing geometric transformation on the continuous character sample, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification;
and performing gray scale transformation on the continuous character samples, wherein the gray scale transformation comprises noise amplification, fuzzy amplification and stroke width amplification.
The rotation amplification comprises the following steps:
rotating the continuous character sample by a fixed angle around a central point;
rotating the binary image corresponding to the continuous character sample by the fixed angle around the central point;
acquiring a character boundary;
and performing boundary cutting on the rotation result of the continuous character sample according to the character boundary.
The miscut (shear) amplification comprises the following steps:
horizontally shifting the gray values of all pixels of the continuous character sample;
performing a nearest neighbor filling operation on the gray values of the shifted pixels;
acquiring a character boundary;
and performing boundary cutting on the miscut result of the continuous character sample according to the character boundary.
The local deformation amplification comprises the following steps:
taking the character edge of the continuous character sample as an outer contour;
and randomly translating the gray value of each pixel located inside the outer contour, truncating at the extreme values.
The radial deformation amplification comprises the following steps:
tilting the image boundary of the continuous character sample by an angle to obtain a new viewing plane;
and projecting the continuous character sample onto the new viewing plane.
The elastic deformation amplification comprises the following steps:
sequentially carrying out random space translation on all pixels of the continuous character sample within a unit distance to obtain a continuous character sample after random space translation;
and filtering the continuous character samples after the random space translation to obtain character samples subjected to elastic deformation amplification.
The noise amplification sets the gray values of the four vertex pixels of the continuous character sample to the image foreground.
The fuzzy (blur) amplification scales and translates the gray values of randomly selected pixels of the continuous character sample according to the local gray value.
The stroke width amplification comprises the following step: performing an erosion or dilation operation on the strokes of the continuous character sample.
The rotation amplification, the miscut amplification, the local deformation amplification, the radial deformation amplification, the elastic deformation amplification, the noise amplification, the fuzzy amplification and the stroke width amplification may be processed in series or in parallel.
In a second aspect, the present application provides a system for augmenting a sample set of characters, the system comprising:
a sample acquisition module: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;
a sample judgment module: sequentially judging whether the character samples are dot matrix character samples or not;
a sample processing module: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;
a sample amplification module: carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
a sample generation module: and adding the amplified continuous character samples into the character sample set.
In a third aspect, the present application provides an apparatus for amplifying a character sample set, the apparatus comprising: at least one processor, a memory, and an input-output unit; wherein the memory is used for storing a computer program, and the processor is used for calling the computer program stored in the memory to execute the method.
According to the above technical solution, the method comprises: obtaining a character sample set that contains a certain number of character samples; judging in sequence whether each character sample is a dot matrix character sample; if so, performing morphological processing on the character sample to obtain a continuous character sample; performing an amplification operation on the continuous character sample to obtain an amplified continuous character sample; and adding the amplified continuous character sample to the character sample set. Because the character sample set is amplified in multiple ways on the basis of the existing samples, the number of character samples in the set increases, which in turn improves the recognition accuracy of image processing. The amplification method is suitable not only for character samples but also for general target detection, image classification and semantic segmentation, so it has a wide range of application.
Drawings
In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.
FIG. 1 is a diagram illustrating an application scenario of a method for amplifying a character sample set according to the present application;
FIG. 2 is a flow chart of a method for amplifying a character sample set provided by the present application;
FIG. 3 is a flow chart of a method of a first embodiment provided herein;
FIG. 4 is a flow chart of a method of a second embodiment provided herein;
FIG. 5 is a flow chart of a method of a third embodiment provided herein;
FIG. 6 is a schematic structural diagram of an amplification system for a character sample set provided in the present application.
Detailed Description
In order to solve the problems in the prior art, the application provides a method, a system and a device for amplifying a character sample set, which address the poor recognition accuracy that arises in existing text recognition when the character sample set contains too few character samples.
FIG. 1 shows an application scenario of the method for amplifying a character sample set according to the present application. The method obtains a character sample set that contains a certain number of character samples, performs multiple amplification operations on the continuous character samples to obtain amplified character samples, and adds the amplified character samples to the character sample set, thereby increasing the number of character samples in the set.
In a first aspect, referring to FIG. 2, the present application provides a method for amplifying a character sample set, the method comprising:
S100: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;
S110: sequentially judging whether the character samples are dot matrix character samples or not;
S120: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;
Because the dot matrix character samples undergo morphological processing, the amplification method is suitable not only for continuous character samples but also for dot matrix character samples, which improves the applicability of the amplification method of this application.
S130: carrying out an amplification operation on the continuous character samples to obtain amplified continuous character samples;
The continuous character samples undergo geometric transformation and gray-scale transformation, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification, and the gray-scale transformation comprises noise amplification, fuzzy amplification and stroke width amplification. With multiple amplification modes, the number of samples in the character sample set increases and the recognition accuracy is effectively improved.
S140: adding the amplified continuous character samples to the character sample set.
Further comprising the steps of:
if the character samples are continuous character samples, carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
and adding the amplified continuous character samples into the character sample set.
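The flow of steps S100 to S140, including the branch for samples that are already continuous, can be sketched as a small driver function. This is only an illustration under stated assumptions: the text does not say how a dot matrix sample is detected or which operations are chosen, so `is_dot_matrix`, `to_continuous` and `augmenters` are hypothetical callables supplied by the caller.

```python
from typing import Callable, Iterable, List, TypeVar

Sample = TypeVar("Sample")


def amplify_sample_set(
    samples: Iterable[Sample],
    is_dot_matrix: Callable[[Sample], bool],
    to_continuous: Callable[[Sample], Sample],
    augmenters: Iterable[Callable[[Sample], Sample]],
) -> List[Sample]:
    """Return the original samples followed by their amplified versions."""
    originals = list(samples)            # S100: acquire the sample set
    ops = list(augmenters)
    amplified = list(originals)
    for sample in originals:             # S110: judge each sample in turn
        if is_dot_matrix(sample):        # S120: morphological processing
            continuous = to_continuous(sample)
        else:                            # already a continuous sample
            continuous = sample
        for op in ops:                   # S130: amplification operations
            amplified.append(op(continuous))   # S140: add to the set
    return amplified
```

With two original samples and two operations the returned set holds six samples: the two originals plus four amplified ones.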
In some embodiments, referring to FIG. 3, performing an amplification operation on the continuous character samples to obtain amplified continuous character samples specifically includes the following steps:
S131: performing geometric transformation on the continuous character sample, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification;
S132: performing gray-scale transformation on the continuous character samples, wherein the gray-scale transformation comprises noise amplification, fuzzy amplification and stroke width amplification.
In some embodiments, referring to FIG. 4, the rotation amplification comprises the following steps:
S200: rotating the continuous character samples by a fixed angle around a central point;
S210: rotating the binary images corresponding to the continuous character samples by the fixed angle around the central point;
S220: acquiring a character boundary;
S230: performing boundary cutting on the rotation result of the continuous character samples according to the character boundary.
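Steps S200 to S230 can be sketched with plain NumPy as follows. The sketch assumes nearest-neighbour inverse mapping, a rotation about the image centre, and a character boundary defined as the bounding box of the rotated binary image; none of these details are fixed by the text, and the function names are illustrative.

```python
import numpy as np


def rotate_nearest(img: np.ndarray, angle_deg: float, fill: int = 0) -> np.ndarray:
    """Rotate a 2-D image about its centre using nearest-neighbour lookup."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, find the source pixel.
    src_x = np.rint(cx + np.cos(t) * dx + np.sin(t) * dy).astype(int)
    src_y = np.rint(cy - np.sin(t) * dx + np.cos(t) * dy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def rotation_amplify(gray: np.ndarray, binary: np.ndarray, angle_deg: float) -> np.ndarray:
    rotated_gray = rotate_nearest(gray, angle_deg)      # S200
    rotated_bin = rotate_nearest(binary, angle_deg)     # S210
    ys, xs = np.nonzero(rotated_bin)                    # S220: character boundary
    if ys.size == 0:
        return rotated_gray
    # S230: cut the rotated sample to the character boundary.
    return rotated_gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

Rotating the binary image alongside the gray image is what makes the boundary easy to find: the bounding box of its nonzero pixels gives the crop window directly.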
In some embodiments, the miscut (shear) amplification comprises the following steps:
S300: horizontally shifting the gray values of all pixels of the continuous character samples;
S310: performing a nearest neighbor filling operation on the gray values of the shifted pixels;
S320: acquiring a character boundary;
S330: performing boundary cutting on the miscut result of the continuous character samples according to the character boundary.
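A minimal sketch of steps S300 to S330, assuming a horizontal shear in which each row is shifted in proportion to its distance from the middle row, and assuming that "nearest neighbor filling" means replicating the nearest edge pixel into vacated positions. The `shear` factor and the function name are hypothetical.

```python
import numpy as np


def shear_amplify(gray: np.ndarray, binary: np.ndarray, shear: float) -> np.ndarray:
    """Horizontally shear a character image and crop it to the character."""
    h, w = gray.shape
    cols = np.arange(w)
    out_gray = np.empty_like(gray)
    out_bin = np.empty_like(binary)
    for y in range(h):
        # S300: row-dependent horizontal shift.
        shift = int(round(shear * (y - (h - 1) / 2.0)))
        # S310: clipping the source index replicates the nearest edge pixel.
        src = np.clip(cols - shift, 0, w - 1)
        out_gray[y] = gray[y, src]
        out_bin[y] = binary[y, src]
    ys, xs = np.nonzero(out_bin)                       # S320: character boundary
    if ys.size == 0:
        return out_gray
    # S330: cut the sheared sample to the character boundary.
    return out_gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

Applied to a vertical bar with `shear=1.0`, the bar becomes a diagonal stroke, and the crop keeps only the tight window around it.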
In some embodiments, referring to FIG. 5, the local deformation amplification comprises the following steps:
taking the character edges of the continuous character samples as outer contours;
and randomly translating the gray value of each pixel located inside the outer contour, truncating at the extreme values.
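One way to read the two steps above is: add a random offset to the gray value of every pixel inside the character contour, then clip to the valid range. The sketch below assumes 8-bit images, a boolean mask marking the pixels inside the outer contour, and a uniform offset range `max_shift`; all three are assumptions for illustration.

```python
import numpy as np


def local_deform(gray, inside_mask, max_shift=20, rng=None):
    """Randomly offset gray values inside the contour, clipped to [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.integers(-max_shift, max_shift + 1, size=gray.shape)
    # Only pixels inside the outer contour are translated.
    shifted = gray.astype(np.int32) + np.where(inside_mask, offsets, 0)
    # Truncation at the extreme values of the 8-bit range.
    return np.clip(shifted, 0, 255).astype(np.uint8)
```

Working in `int32` before clipping avoids the wrap-around that would occur if the offsets were added directly to `uint8` data.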
In some embodiments, the radial deformation amplification comprises the following steps:
tilting the image boundaries of the continuous character samples by an angle to obtain a new viewing plane;
and projecting the continuous character samples onto the new viewing plane.
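Tilting the image boundary and projecting onto a new viewing plane reads like a perspective (homography) warp. The sketch below makes that assumption explicit: the caller gives the positions of the four tilted corners, a 3x3 homography is solved from the four point pairs, and the image is resampled with nearest-neighbour lookup. The corner ordering and function names are illustrative.

```python
import numpy as np


def homography(src_pts, dst_pts):
    """Solve the 3x3 matrix H with dst ~ H @ src from four point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)


def project(gray, dst_corners, fill=0):
    """Warp gray so that its corners land on dst_corners (x, y order:
    top-left, top-right, bottom-right, bottom-left)."""
    h, w = gray.shape
    src_corners = [(0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1)]
    inv = homography(dst_corners, src_corners)   # output pixel -> source pixel
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy, sw = inv @ pts
    sx = np.rint(sx / sw).astype(int)
    sy = np.rint(sy / sw).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=gray.dtype)
    out[inside] = gray.ravel()[sy[inside] * w + sx[inside]]
    return out.reshape(h, w)
```

Passing the unmodified corners leaves the image unchanged, while moving the two top corners inward produces a trapezoid, as if the character were viewed from below.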
In some embodiments, the elastic deformation amplification comprises the following steps:
sequentially carrying out random space translation on all pixels of the continuous character sample within a unit distance to obtain a continuous character sample after random space translation;
and filtering the continuous character samples after the random space translation to obtain character samples subjected to elastic deformation amplification.
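The two elastic deformation steps can be sketched as a per-pixel jitter followed by smoothing. The text does not name the filter, so the 3x3 box filter here is an assumption, as are the function names; the displacement of each pixel is drawn from {-1, 0, 1} in both directions to stay within a unit distance.

```python
import numpy as np


def box_filter3(img):
    """3 x 3 mean filter with edge replication (assumed smoothing step)."""
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def elastic_amplify(gray, rng=None):
    """Jitter every pixel by at most one unit, then smooth the result."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy = rng.integers(-1, 2, size=(h, w))
    dx = rng.integers(-1, 2, size=(h, w))
    # Random spatial translation within a unit distance, clamped at the border.
    jittered = gray[np.clip(ys + dy, 0, h - 1), np.clip(xs + dx, 0, w - 1)]
    # Filtering removes the salt-and-pepper look of the raw jitter.
    return np.rint(box_filter3(jittered)).astype(gray.dtype)
```

A Gaussian filter would be an equally plausible choice for the smoothing step; the box filter is used only to keep the example dependency-free.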
In some embodiments, the noise amplification sets the gray values of the four vertex pixels of the continuous character samples to the image foreground.
In some embodiments, the fuzzy (blur) amplification scales and translates the gray values of randomly selected pixels of the continuous character samples according to the local gray value.
In some embodiments, the stroke width amplification comprises the following step: performing an erosion or dilation operation on the strokes of the continuous character samples.
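Two of the three gray-scale operations are concrete enough to sketch. The example assumes bright strokes on a dark background (foreground value 255) and implements stroke thickening and thinning as gray-scale dilation and erosion, that is, a local maximum or minimum over a k x k window. The fuzzy amplification is not sketched because the text does not give enough detail to reproduce it faithfully. Function names and defaults are illustrative.

```python
import numpy as np


def corner_noise(gray, foreground=255):
    """Noise amplification: set the four vertex pixels to the foreground value."""
    out = gray.copy()
    out[[0, 0, -1, -1], [0, -1, 0, -1]] = foreground
    return out


def change_stroke_width(gray, thicken=True, k=3):
    """Dilate (thicken) or erode (thin) bright strokes with a k x k window."""
    r = k // 2
    h, w = gray.shape
    padded = np.pad(gray, r, mode="edge")
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    )
    # Local maximum widens bright strokes; local minimum narrows them.
    return windows.max(axis=0) if thicken else windows.min(axis=0)
```

For dark strokes on a bright background the roles of the two branches swap, so a real pipeline would need to know the polarity of its samples.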
In some embodiments, rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification, elastic deformation amplification, noise amplification, blur amplification, and stroke width amplification may be processed serially or in parallel.
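The difference between serial and parallel processing can be illustrated with two small helpers. This sketch assumes that "serial" means chaining the operations so that one sample passes through all of them, and that "parallel" means applying each operation independently to the same input so that several new samples result; the text does not define the terms further, and the helper names are hypothetical.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def apply_serial(sample: T, ops: Sequence[Callable[[T], T]]) -> T:
    """Chain the operations: the output of one is the input of the next."""
    return reduce(lambda current, op: op(current), ops, sample)


def apply_parallel(sample: T, ops: Sequence[Callable[[T], T]]) -> List[T]:
    """Apply every operation to the same input, yielding one result per op."""
    return [op(sample) for op in ops]
```

Serial chaining yields one heavily transformed sample per input, whereas the parallel form yields as many new samples as there are operations; the two can also be mixed.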
In a second aspect, referring to FIG. 6, the present application provides a system for amplifying a character sample set, the system comprising:
a sample acquisition module: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;
a sample judgment module: sequentially judging whether the character samples are dot matrix character samples or not;
a sample processing module: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;
a sample amplification module: carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
a sample generation module: and adding the amplified continuous character samples into the character sample set.
In a third aspect, the present application provides an apparatus for amplifying a character sample set, the apparatus comprising: at least one processor, a memory, and an input-output unit; wherein the memory is used for storing a computer program and the processor is used for calling the computer program stored in the memory to execute the method.
According to the above technical solution, the method comprises: obtaining a character sample set that contains a certain number of character samples; judging in sequence whether each character sample is a dot matrix character sample; if so, performing morphological processing on the character sample to obtain a continuous character sample; performing an amplification operation on the continuous character sample to obtain an amplified continuous character sample; and adding the amplified continuous character sample to the character sample set. Because the character sample set is amplified in multiple ways on the basis of the existing samples, the number of character samples in the set increases, which in turn improves the recognition accuracy of image processing.
The technical solution has the following beneficial effects. First, one or more of rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification, elastic deformation amplification, noise amplification, fuzzy amplification and stroke width amplification can be applied to a single image at the same time; the synthesized data conforms to the characteristics of real data and effectively covers the diversity of samples, which improves the generalization of subsequent classification. Second, morphological connection is applied before amplification to suppress the difference between dot matrix characters and continuous characters, so that the amplification method is suitable for both kinds of characters. Finally, the amplification method is suitable not only for character samples but also for general target detection, image classification and semantic segmentation, so it has a wide range of application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for amplifying a sample set of characters, the method comprising:
acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;
sequentially judging whether the character samples are dot matrix character samples or not;
if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;
carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
and adding the amplified continuous character samples into the character sample set.
2. The method for amplifying a character sample set according to claim 1, further comprising the steps of:
if the character samples are continuous character samples, carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
and adding the amplified continuous character samples into the character sample set.
3. The method for amplifying a character sample set according to claim 2, wherein performing the amplification operation on the continuous character samples to obtain the amplified continuous character samples comprises the following steps:
performing geometric transformation on the continuous character sample, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification;
and performing gray scale transformation on the continuous character samples, wherein the gray scale transformation comprises noise amplification, fuzzy amplification and stroke width amplification.
4. The method for amplifying a character sample set according to claim 3, wherein
the rotation amplification comprises the following steps:
rotating the continuous character samples by a fixed angle around a central point;
rotating the binary image corresponding to the continuous character sample by the fixed angle around the central point;
acquiring a character boundary;
performing boundary cutting on the rotation result of the continuous character samples according to the character boundary;
and the miscut amplification comprises the following steps:
horizontally shifting the gray values of all pixels of the continuous character samples;
performing a nearest neighbor filling operation on the gray values of the shifted pixels;
acquiring a character boundary;
and performing boundary cutting on the miscut result of the continuous character samples according to the character boundary.
5. The method for amplifying a character sample set according to claim 4, wherein
the local deformation amplification comprises the following steps:
taking the character edge of the continuous character sample as an outer contour;
randomly translating the gray value of each pixel located inside the outer contour, truncating at the extreme values;
and the radial deformation amplification comprises the following steps:
tilting the image boundary of the continuous character samples by an angle to obtain a new viewing plane;
and projecting the continuous character samples onto the new viewing plane.
6. The method for amplifying a character sample set according to claim 5, wherein the elastic deformation amplification comprises the following steps:
sequentially carrying out random space translation on all pixels of the continuous character sample within a unit distance to obtain a continuous character sample after random space translation;
and filtering the continuous character samples after the random space translation to obtain character samples subjected to elastic deformation amplification.
7. The method for amplifying a character sample set according to claim 6, wherein
the noise amplification sets the gray values of the four vertex pixels of the continuous character sample to the image foreground;
the fuzzy amplification scales and translates the gray values of randomly selected pixels of the continuous character sample according to the local gray value;
and the stroke width amplification comprises the following step: performing an erosion or dilation operation on the strokes of the continuous character sample.
8. The method of claim 6, wherein the rotation amplification, the miscut amplification, the local deformation amplification, the radial deformation amplification, the elastic deformation amplification, the noise amplification, the blur amplification, and the stroke width amplification can be processed serially or in parallel.
9. An augmentation system for a sample set of characters, the system comprising:
a sample acquisition module: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;
a sample judgment module: sequentially judging whether the character samples are dot matrix character samples or not;
a sample processing module: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;
a sample amplification module: carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;
a sample generation module: and adding the amplified continuous character samples into the character sample set.
10. An apparatus for augmenting a sample set of characters, the apparatus comprising: at least one processor, a memory, and an input-output unit; wherein the memory is for storing a computer program and the processor is for calling the computer program stored in the memory to perform the method of any one of claims 1-8.
CN202110829494.4A 2021-07-22 2021-07-22 Method, system and device for amplifying character sample set Pending CN113469279A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110829494.4A CN113469279A (en) 2021-07-22 2021-07-22 Method, system and device for amplifying character sample set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110829494.4A CN113469279A (en) 2021-07-22 2021-07-22 Method, system and device for amplifying character sample set

Publications (1)

Publication Number Publication Date
CN113469279A true CN113469279A (en) 2021-10-01

Family

ID=77881838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110829494.4A Pending CN113469279A (en) 2021-07-22 2021-07-22 Method, system and device for amplifying character sample set

Country Status (1)

Country Link
CN (1) CN113469279A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN109920538A (en) * 2019-03-07 2019-06-21 中南大学 A kind of zero sample learning method based on data enhancing
CN111666994A (en) * 2020-05-28 2020-09-15 平安科技(深圳)有限公司 Sample image data enhancement method and device, electronic equipment and storage medium
CN111967457A (en) * 2020-08-06 2020-11-20 赖明钟 OCR detection method based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN109920538A (en) * 2019-03-07 2019-06-21 中南大学 A kind of zero sample learning method based on data enhancing
CN111666994A (en) * 2020-05-28 2020-09-15 平安科技(深圳)有限公司 Sample image data enhancement method and device, electronic equipment and storage medium
WO2021114832A1 (en) * 2020-05-28 2021-06-17 平安科技(深圳)有限公司 Sample image data enhancement method, apparatus, electronic device, and storage medium
CN111967457A (en) * 2020-08-06 2020-11-20 赖明钟 OCR detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
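毕佳晶; 李敏; 郑蕊蕊; 许爽; 贺建军; 黄荻: "Research on training data augmentation methods for Manchu character recognition", Journal of Dalian Minzu University (大连民族大学学报), no. 01 *
雷飞; 孙康; 王雪丽: "Research on milk production date recognition based on improved LeNet-5", Computer Technology and Development (计算机技术与发展), no. 07 *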

Similar Documents

Publication Publication Date Title
CN114529459B (en) Method, system and medium for enhancing image edge
CN111489337B (en) Automatic optical detection pseudo defect removal method and system
CN112183038A (en) Form identification and typing method, computer equipment and computer readable storage medium
JP7198922B2 (en) Tire/Sidewall Imaging Method
CN111223065B (en) Image correction method, irregular text recognition device, storage medium and apparatus
CN112614062A (en) Bacterial colony counting method and device and computer storage medium
CN112464829B (en) Pupil positioning method, pupil positioning equipment, storage medium and sight tracking system
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN111311497B (en) Bar code image angle correction method and device
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
CN111178153A (en) Traffic sign detection method and system
CN113065553A (en) Data processing method and device, three-dimensional scanning system and electronic device
CN112712058A (en) Character recognition and extraction method
Sachdeva et al. Automatic segmentation and area calculation of optic disc in ophthalmic images
CN111798481B (en) Image sequence segmentation method and device
CN117132503A (en) Method, system, equipment and storage medium for repairing local highlight region of image
CN113469279A (en) Method, system and device for amplifying character sample set
CN112749696A (en) Text detection method and device
CN113420767A (en) Method, system and device for extracting features for font classification
CN111382741B (en) Method, system and equipment for detecting text in natural scene picture
CN110264488B (en) Binary image edge extraction device
CN113362347A (en) Image defect region segmentation method and system based on multi-scale superpixel feature enhancement
CN116109891B (en) Image data amplification method, device, computing equipment and storage medium
CN111797843B (en) Method, system, storage medium and equipment for extracting laser marking Chinese character outline
CN113255665B (en) Target text extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination