CN113469279A

CN113469279A - Method, system and device for amplifying character sample set

Info

Publication number: CN113469279A
Application number: CN202110829494.4A
Authority: CN
Inventors: 王博帝; 姚毅; 杨艺; 全煜鸣; 金刚; 彭斌
Original assignee: Shenzhen Lingyun Shixun Technology Co ltd; Luster LightTech Co Ltd
Current assignee: Shenzhen Lingyun Shixun Technology Co ltd; Luster LightTech Co Ltd
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2021-10-01

Abstract

The application discloses a method, a system and a device for amplifying a character sample set. The method includes: obtaining a character sample set that includes a certain number of character samples; judging in sequence whether each character sample is a dot matrix character sample; if a character sample is a dot matrix character sample, performing morphological processing on it to obtain a continuous character sample; performing an amplification operation on the continuous character sample to obtain an amplified continuous character sample; and adding the amplified continuous character sample to the character sample set. The character sample set is amplified through multiple amplification modes on the basis of the existing character sample set, so that the number of character samples in the set is increased and the recognition accuracy of image processing is further improved.

Description

Method, system and device for amplifying character sample set

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method, a system, and an apparatus for amplifying a character sample set.

Background

Text is conveyed through various carriers, and people read it to acquire information. With machine vision applied in place of human eyes, efficient recognition of text in images has become an important component of automated production.

Text recognition based on machine learning mainly relies on single-character classification, which requires a character sample set of a certain magnitude as its basis. In existing text recognition processes, however, the magnitude of the character sample set available for a given scene may be far below the expected value, resulting in poor recognition accuracy. The character sample set therefore needs to be amplified on the basis of the existing character samples so as to improve recognition accuracy.

Disclosure of Invention

The application provides a method, a system and a device for amplifying a character sample set, which are used for solving the problem of poor recognition accuracy caused by small magnitude of character samples in the character sample set in the existing text recognition process.

In a first aspect, the present application provides a method for amplifying a character sample set, the method comprising:

acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;

sequentially judging whether the character samples are dot matrix character samples or not;

if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;

carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;

and adding the amplified continuous character samples into the character sample set.

The method further comprises the following step:

if the character sample is a continuous character sample, performing the amplification operation on the continuous character sample to obtain an amplified continuous character sample.

Performing the amplification operation on the continuous character sample to obtain the amplified continuous character sample specifically comprises the following steps:

performing geometric transformation on the continuous character sample, wherein the geometric transformation comprises rotation amplification, miscut (shear) amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification;

and performing gray scale transformation on the continuous character sample, wherein the gray scale transformation comprises noise amplification, fuzzy (blur) amplification and stroke width amplification.

The rotation amplification comprises the following steps:

rotating the continuous character sample by a fixed angle around a central point;

rotating the binary image corresponding to the continuous character sample by the fixed angle around the central point;

acquiring a character boundary;

and performing boundary cutting on the rotation result of the continuous character sample according to the character boundary.

The miscut amplification comprises the following steps:

horizontally shifting the gray values of all pixels of the continuous character sample;

performing a nearest neighbor filling operation on the gray values of the moved pixel points;

acquiring a character boundary;

and performing boundary cutting on the rotation result of the continuous character samples according to the character boundary.

The local deformation amplification comprises the following steps:

taking the character edge of the continuous character sample as an outer contour;

and randomly translating the gray value of each pixel point located within the outer contour, and performing a truncation operation at the extreme values.

The radial deformation amplification comprises the following steps:

tilting the image boundary of the continuous character sample by an angle to obtain a new view plane;

and projecting the continuous character sample onto the new view plane.

The elastic deformation amplification comprises the following steps:

sequentially carrying out random space translation on all pixels of the continuous character sample within a unit distance to obtain a continuous character sample after random space translation;

and filtering the continuous character samples after the random space translation to obtain character samples subjected to elastic deformation amplification.
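As an illustration only, the two elastic deformation steps above may be sketched as follows. This is a dependency-free sketch and not the implementation of the application: the image is assumed to be a nested list of gray values from 0 to 255, "unit distance" is read as a displacement of at most one pixel per axis, and the 3x3 mean filter is an assumed choice, since the application does not name the filter. The function name `elastic_amplify` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def elastic_amplify(img: Image, seed: Optional[int] = None) -> Image:
    """Randomly displace pixels by at most one pixel, then mean-filter."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])

    # Step 1: random spatial translation within a unit distance.
    # Each output pixel is read from a neighbor at most one pixel away.
    moved = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy = min(h - 1, max(0, y + rng.choice((-1, 0, 1))))
            sx = min(w - 1, max(0, x + rng.choice((-1, 0, 1))))
            moved[y][x] = img[sy][sx]

    # Step 2: filtering, here a 3x3 mean filter (an assumed choice).
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                moved[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(window) // len(window))
        out.append(row)
    return out
```

Passing a fixed `seed` makes the deformation repeatable, which can be convenient when the amplified set has to be regenerated.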

The noise amplification sets the gray values of the four vertex pixels of the continuous character sample to the image foreground;

the fuzzy amplification scales and shifts the gray values of randomly selected pixel points of the continuous character sample according to the local gray value;

the stroke width amplification performs an erosion or dilation operation on the strokes of the continuous character sample.
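The noise and fuzzy amplification described above may be sketched as follows, purely as one possible reading. Assumptions not stated in the application: the image is a nested list of gray values from 0 to 255, the foreground value is 255, and the fuzzy amplification is interpreted as blending randomly chosen pixels with their 3x3 local mean (a scaling by `alpha` plus a shift derived from the local gray value). The names `noise_amplify`, `blur_amplify`, `fraction` and `alpha` are hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def noise_amplify(img: Image, foreground: int = 255) -> Image:
    """Set the four corner (vertex) pixels to the foreground value."""
    out = [row[:] for row in img]  # copy so the input sample is not modified
    for y in (0, len(out) - 1):
        for x in (0, len(out[0]) - 1):
            out[y][x] = foreground
    return out


def blur_amplify(img: Image, fraction: float = 0.5, alpha: float = 0.5,
                 seed: Optional[int] = None) -> Image:
    """Blend randomly chosen pixels with their 3x3 local mean (assumed reading)."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if rng.random() < fraction:  # pick a random subset of pixels
                window = [
                    img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                ]
                local_mean = sum(window) / len(window)
                # Scale the gray value and shift it toward the local mean.
                out[y][x] = int(round(alpha * img[y][x] + (1 - alpha) * local_mean))
    return out
```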

The rotation amplification, the miscut amplification, the local deformation amplification, the radial deformation amplification, the elastic deformation amplification, the noise amplification, the fuzzy amplification and the stroke width amplification may be processed in series or in parallel.

In a second aspect, the present application provides a system for augmenting a sample set of characters, the system comprising:

a sample acquisition module: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;

a sample judgment module: sequentially judging whether the character samples are dot matrix character samples or not;

a sample processing module: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;

a sample amplification module: carrying out amplification operation on the continuous character samples to obtain amplified continuous character samples;

a sample generation module: and adding the amplified continuous character samples into the character sample set.

In a third aspect, the present application provides an apparatus for amplifying a character sample set, the apparatus comprising: at least one processor, a memory, and an input-output unit; wherein the memory is used for storing a computer program, and the processor is used for calling the computer program stored in the memory to execute the method.

According to the above technical scheme, the method comprises: acquiring a character sample set that comprises a certain number of character samples; sequentially judging whether each character sample is a dot matrix character sample; if so, performing morphological processing on it to obtain a continuous character sample; performing an amplification operation on the continuous character sample to obtain an amplified continuous character sample; and adding the amplified continuous character sample to the character sample set. The character sample set is amplified through multiple amplification modes on the basis of the existing character sample set, so that the number of character samples in the set is increased and the recognition accuracy of image processing is further improved. The amplification method is suitable not only for character samples but also for general target detection, image classification and semantic segmentation, and therefore has a wide range of application.

Drawings

In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.

Fig. 1 is a diagram of an application scenario of the method for amplifying a character sample set provided by the present application;

Fig. 2 is a flow chart of the method for amplifying a character sample set provided by the present application;

Fig. 3 is a flow chart of a method according to a first embodiment provided by the present application;

Fig. 4 is a flow chart of a method according to a second embodiment provided by the present application;

Fig. 5 is a flow chart of a method according to a third embodiment provided by the present application;

Fig. 6 is a schematic structural diagram of the system for amplifying a character sample set provided by the present application.

Detailed Description

To solve the problems in the prior art, the application provides a method, a system and a device for amplifying a character sample set, so as to solve the problem that, in the existing text recognition process, recognition accuracy is poor because the number of character samples in the character sample set is small.

Fig. 1 shows an application scenario of the method for amplifying a character sample set according to the present application. The method obtains a character sample set that includes a certain number of character samples, performs multiple amplification operations on the continuous character samples to obtain amplified character samples, and adds the amplified character samples to the character sample set, thereby increasing the order of magnitude of the character samples in the set.

In a first aspect, referring to fig. 2, the present application provides a method for amplifying a character sample set, the method comprising:

S100: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;

S110: sequentially judging whether the character samples are dot matrix character samples or not;

S120: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;

By performing morphological processing on the dot matrix character samples, the amplification method becomes suitable not only for continuous character samples but also for dot matrix character samples, which improves the applicability of the amplification method of the present application.

S130: performing an amplification operation on the continuous character samples to obtain amplified continuous character samples;

Geometric transformation and gray scale transformation are performed on the continuous character samples, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification, and the gray scale transformation comprises noise amplification, fuzzy amplification and stroke width amplification. Through these multiple amplification modes, the order of magnitude of the character sample set is increased and the recognition accuracy is effectively improved.

S140: adding the amplified continuous character samples into the character sample set.
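For illustration, the flow of steps S100 to S140 may be sketched as below. This is a minimal sketch under stated assumptions, not the implementation of the application: images are nested lists of gray values, and the dot matrix test, the morphological processing and the amplification operations are passed in as callables because the application does not fix how they are implemented. All names (`amplify_sample_set`, `is_dot_matrix`, `to_continuous`, `amplifiers`) are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def amplify_sample_set(
    samples: List[Image],                        # S100: the acquired sample set
    is_dot_matrix: Callable[[Image], bool],
    to_continuous: Callable[[Image], Image],
    amplifiers: List[Callable[[Image], Image]],
) -> List[Image]:
    """Return the original samples followed by the amplified samples."""
    amplified: List[Image] = []
    for sample in samples:                       # S110: judge each sample in turn
        if is_dot_matrix(sample):                # S120: dot matrix -> continuous
            sample = to_continuous(sample)
        for amplify in amplifiers:               # S130: amplification operations
            amplified.append(amplify(sample))
    return samples + amplified                   # S140: add to the sample set


# Usage with a single toy amplifier that mirrors the image horizontally.
mirror = lambda img: [row[::-1] for row in img]
result = amplify_sample_set([[[0, 255]]], lambda img: False, lambda img: img, [mirror])
```

In this sketch a set of N samples and K amplifiers yields N + N*K samples, which shows how the amplification raises the size of the set.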

The method further comprises the following step: if the character sample is a continuous character sample, performing the amplification operation on the continuous character sample to obtain an amplified continuous character sample.

In some embodiments, referring to Fig. 3, performing the amplification operation on the continuous character samples to obtain the amplified continuous character samples specifically includes the following steps:

S131: performing geometric transformation on the continuous character sample, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification;

S132: performing gray scale transformation on the continuous character samples, wherein the gray scale transformation comprises noise amplification, fuzzy amplification and stroke width amplification.

In some embodiments, referring to Fig. 4, the rotation amplification comprises the following steps:

S200: rotating the continuous character samples by a fixed angle around a central point;

S210: rotating the binary images corresponding to the continuous character samples by the fixed angle around the central point;

S220: acquiring a character boundary;

S230: performing boundary cutting on the rotation result of the continuous character samples according to the character boundary.
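The rotation amplification steps may be sketched as follows, as an illustration only. Assumptions not taken from the application: the image is a nested list of gray values with bright characters on a dark background, the binary image is obtained with a fixed threshold, rotation uses inverse nearest-neighbor mapping on a canvas of unchanged size, and the character boundary is the bounding box of the rotated binary image. The function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def rotate(img: Image, degrees: float, fill: int = 0) -> Image:
    """Rotate about the image center using inverse nearest-neighbor mapping."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source location.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def bounding_box(mask: Image) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the nonzero pixels, inclusive."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return ys[0], ys[-1], xs[0], xs[-1]


def rotation_amplify(img: Image, degrees: float = 10.0, threshold: int = 128) -> Image:
    mask = [[255 if v >= threshold else 0 for v in row] for row in img]
    rotated = rotate(img, degrees)            # S200: rotate the gray sample
    rotated_mask = rotate(mask, degrees)      # S210: rotate the binary image
    if not any(any(row) for row in rotated_mask):
        return rotated                        # no character pixels left to crop to
    top, bottom, left, right = bounding_box(rotated_mask)   # S220
    return [row[left:right + 1] for row in rotated[top:bottom + 1]]  # S230


bar = [[0] * 5, [0] * 5, [0, 255, 255, 255, 0], [0] * 5, [0] * 5]
upright = rotation_amplify(bar, degrees=90.0)  # horizontal bar becomes vertical
```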

In some embodiments, the miscut amplification comprises the following steps:

S300: horizontally shifting the gray values of all pixels of the continuous character samples;

S310: performing nearest neighbor filling operation on the gray value of the moved pixel point;

S320: acquiring a character boundary;

S330: performing boundary cutting on the rotation result of the continuous character samples according to the character boundary.
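A minimal sketch of the miscut (shear) steps is given below for illustration. The following are assumptions rather than details from the application: the horizontal shift grows linearly with the row index (controlled by a hypothetical `slope` parameter), the nearest neighbor filling is realized by replicating the nearest edge pixel of the same row, the canvas width is kept unchanged, and the character boundary is the bounding box of pixels at or above a threshold.

```python
from typing import List

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def miscut_amplify(img: Image, slope: float = 0.5, threshold: int = 128) -> Image:
    """Shear the image horizontally, fill by nearest neighbor, crop to the character."""
    h, w = len(img), len(img[0])
    sheared = []
    for y, row in enumerate(img):
        shift = int(round(slope * y))  # S300: horizontal shift for this row
        # S310: the source column is clamped to the row, so vacated pixels
        # take the value of the nearest original pixel.
        sheared.append([row[min(max(x - shift, 0), w - 1)] for x in range(w)])

    # S320: character boundary as the bounding box of foreground pixels.
    ys = [y for y, row in enumerate(sheared) if any(v >= threshold for v in row)]
    xs = [x for x in range(w) if any(row[x] >= threshold for row in sheared)]
    if not ys:
        return sheared  # no foreground found, nothing to crop to

    # S330: boundary cutting.
    return [row[xs[0]:xs[-1] + 1] for row in sheared[ys[0]:ys[-1] + 1]]


stroke = [[0, 255, 0, 0], [0, 255, 0, 0], [0, 255, 0, 0]]
slanted = miscut_amplify(stroke, slope=1.0)  # vertical stroke becomes a diagonal
```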

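In some embodiments, referring to Fig. 5, the local deformation amplification comprises the following steps:

taking the character edge of the continuous character sample as an outer contour;

and randomly translating the gray value of each pixel point located within the outer contour, and performing a truncation operation at the extreme values.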
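As described for the first aspect, the local deformation amplification randomly translates the gray value of each pixel inside the character contour and truncates the result at the extreme values. A minimal sketch of that operation follows. It assumes, beyond what the application states, that a pixel is "inside the outer contour" when its gray value reaches a threshold and that the random offset is drawn uniformly from a symmetric range. The names `local_deform_amplify` and `max_shift` are hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def local_deform_amplify(img: Image, max_shift: int = 30, threshold: int = 128,
                         seed: Optional[int] = None) -> Image:
    """Randomly offset gray values inside the character, clipped to 0-255."""
    rng = random.Random(seed)
    out = []
    for row in img:
        new_row = []
        for v in row:
            if v >= threshold:  # assumed test for "inside the outer contour"
                offset = rng.randint(-max_shift, max_shift)
                v = min(255, max(0, v + offset))  # truncation at the extremes
            new_row.append(v)
        out.append(new_row)
    return out
```

Background pixels are left untouched, so only the appearance of the character itself varies between amplified samples.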

In some embodiments, the radial deformation amplification comprises the following steps:

tilting the image boundaries of the continuous character samples by an angle to obtain a new view plane;

and projecting the continuous character samples onto the new view plane.
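One way to read the two radial deformation steps is as a perspective projection of a tilted image plane. The sketch below follows that reading and is an assumption, not the method of the application: the plane is tilted about its vertical center axis by `tilt_degrees`, a pinhole camera sits at a hypothetical `distance` in front of it, and each output pixel is filled by inverse nearest-neighbor lookup.

```python
import math
from typing import List, Optional

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def radial_deform_amplify(img: Image, tilt_degrees: float = 20.0,
                          distance: Optional[float] = None, fill: int = 0) -> Image:
    """Project the image plane, tilted about its vertical axis, onto the view plane."""
    h, w = len(img), len(img[0])
    d = float(distance) if distance is not None else 2.0 * max(h, w)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t = math.cos(math.radians(tilt_degrees))
    sin_t = math.sin(math.radians(tilt_degrees))
    out = [[fill] * w for _ in range(h)]
    for row in range(h):
        for col in range(w):
            u, v = col - cx, row - cy          # centered view-plane coordinates
            denom = d * cos_t - u * sin_t
            if denom <= 0:
                continue                       # point lies beyond the horizon
            # Invert u = d*x*cos/(d + x*sin), v = d*y/(d + x*sin).
            x = u * d / denom
            y = v * (d + x * sin_t) / d
            sx, sy = int(round(x + cx)), int(round(y + cy))
            if 0 <= sx < w and 0 <= sy < h:
                out[row][col] = img[sy][sx]
    return out
```

With a tilt of zero the mapping reduces to the identity, which gives a quick sanity check of the geometry.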

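In some embodiments, the elastic deformation amplification comprises the following steps:

sequentially performing random spatial translation on all pixels of the continuous character sample within a unit distance to obtain a continuous character sample after random spatial translation;

and filtering the continuous character sample after the random spatial translation to obtain a character sample subjected to elastic deformation amplification.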

In some embodiments, the noise amplification sets the gray values of the four vertex pixels of the continuous character samples to the image foreground;

in some embodiments, the fuzzy amplification scales and shifts the gray values of randomly selected pixel points of the continuous character samples according to the local gray value;

in some embodiments, the stroke width amplification performs an erosion or dilation operation on the strokes of the continuous character sample.
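The erosion or dilation used for stroke width amplification may be sketched as follows. This is an illustrative sketch under assumptions: strokes are bright on a dark background, the structuring element is a square window of the given `radius`, and windows are clipped at the image border. The names `stroke_width_amplify`, `mode` and `radius` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def stroke_width_amplify(img: Image, mode: str = "dilate", radius: int = 1) -> Image:
    """Thicken ("dilate") or thin ("erode") bright strokes with a square window."""
    if mode not in ("dilate", "erode"):
        raise ValueError("mode must be 'dilate' or 'erode'")
    pick = max if mode == "dilate" else min  # max widens, min narrows bright strokes
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


thin = [[0, 0, 255, 0, 0] for _ in range(3)]
thick = stroke_width_amplify(thin, mode="dilate")    # one-pixel stroke becomes three wide
back = stroke_width_amplify(thick, mode="erode")     # three-pixel stroke becomes one wide
```

For dark strokes on a bright background the roles of the two modes are swapped.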

In some embodiments, rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification, elastic deformation amplification, noise amplification, blur amplification, and stroke width amplification may be processed serially or in parallel.
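The difference between serial and parallel processing may be illustrated with the sketch below. It assumes one particular reading: in serial mode the operations are chained on one sample and yield a single result, while in parallel mode each operation is applied independently to the same input and yields one result per operation. The helper names are hypothetical, and the thread pool is only one possible way to dispatch the independent branches.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)
Op = Callable[[Image], Image]


def amplify_serial(img: Image, ops: List[Op]) -> Image:
    """Chain the operations: each one receives the previous result."""
    for op in ops:
        img = op(img)
    return img


def amplify_parallel(img: Image, ops: List[Op]) -> List[Image]:
    """Apply every operation to the same input, one output per operation."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda op: op(img), ops))  # order follows `ops`


# Usage with two toy operations standing in for the amplification modes.
mirror = lambda img: [row[::-1] for row in img]
invert = lambda img: [[255 - v for v in row] for row in img]
sample = [[0, 255, 255]]
chained = amplify_serial(sample, [mirror, invert])
branches = amplify_parallel(sample, [mirror, invert])
```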

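In a second aspect, referring to Fig. 6, the present application provides a system for amplifying a character sample set, the system comprising:

a sample acquisition module: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;

a sample judgment module: sequentially judging whether the character samples are dot matrix character samples or not;

a sample processing module: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;

a sample amplification module: performing an amplification operation on the continuous character samples to obtain amplified continuous character samples;

a sample generation module: adding the amplified continuous character samples into the character sample set.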

In a third aspect, the present application provides an apparatus for amplifying a character sample set, the apparatus comprising: at least one processor, a memory, and an input-output unit; wherein the memory is used for storing a computer program and the processor is used for calling the computer program stored in the memory to execute the method.

According to the above technical scheme, the method comprises: acquiring a character sample set that comprises a certain number of character samples; sequentially judging whether each character sample is a dot matrix character sample; if so, performing morphological processing on it to obtain a continuous character sample; performing an amplification operation on the continuous character sample to obtain an amplified continuous character sample; and adding the amplified continuous character sample to the character sample set. The character sample set is amplified through multiple amplification modes on the basis of the existing character sample set, so that the number of character samples in the set is increased and the recognition accuracy of image processing is further improved.

The technical scheme has the following beneficial effects. First, one image can be subjected to one or more of rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification, elastic deformation amplification, noise amplification, fuzzy amplification and stroke width amplification at the same time; the synthesized data conforms to the characteristics of real data, the diversity of samples is effectively covered, and the generalization of subsequent classification is improved. Second, morphological connection is applied before amplification to suppress the difference between dot matrix characters and continuous characters, so that the amplification method is suitable for both kinds of characters. Finally, the amplification method is suitable not only for character samples but also for general target detection, image classification and semantic segmentation, and therefore has a wide range of application.
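The morphological connection that turns a dot matrix character into a continuous one may be illustrated with a morphological closing (a dilation followed by an erosion). Closing is an assumed choice here, since the application only speaks of morphological processing. The sketch also assumes bright dots on a dark background and a square window; the names `close_dots` and `radius` are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale rows with values 0-255 (assumed layout)


def _morph(img: Image, radius: int, pick: Callable) -> Image:
    """Apply `pick` (max for dilation, min for erosion) over a square window."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def close_dots(img: Image, radius: int = 1) -> Image:
    """Morphological closing: dilation bridges the gaps, erosion restores width."""
    return _morph(_morph(img, radius, max), radius, min)


# Two separate dots on one row, one background pixel apart.
dotted = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 255, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
joined = close_dots(dotted)  # the gap between the dots is filled
```

The `radius` has to be at least half the gap between neighboring dots for the stroke to become continuous, so it would need to be matched to the dot pitch of the printed characters.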

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

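1. A method for amplifying a sample set of characters, the method comprising:

acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;

sequentially judging whether the character samples are dot matrix character samples or not;

if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;

performing an amplification operation on the continuous character samples to obtain amplified continuous character samples;

and adding the amplified continuous character samples into the character sample set.

2. The method for amplifying a character sample set according to claim 1, further comprising the steps of:

if the character samples are continuous character samples, performing the amplification operation on the continuous character samples to obtain amplified continuous character samples.

3. The method for amplifying the character sample set according to claim 2, wherein performing the amplification operation on the continuous character samples to obtain the amplified continuous character samples comprises the following steps:

performing geometric transformation on the continuous character sample, wherein the geometric transformation comprises rotation amplification, miscut amplification, local deformation amplification, radial deformation amplification and elastic deformation amplification;

and performing gray scale transformation on the continuous character samples, wherein the gray scale transformation comprises noise amplification, blur amplification and stroke width amplification.

4. The method for amplifying a sample set of characters according to claim 3,

the rotation amplification comprises the following steps:

rotating the continuous character samples by a fixed angle around a central point;

rotating the binary image corresponding to the continuous character sample by the fixed angle around the central point;

acquiring a character boundary;

performing boundary cutting on the rotation result of the continuous character samples according to the character boundary;

the miscut amplification comprises the following steps:

horizontally shifting the gray values of all pixels of the continuous character samples;

performing nearest neighbor filling operation on the gray value of the moved pixel point;

acquiring a character boundary;

and performing boundary cutting on the rotation result of the continuous character samples according to the character boundary.

5. The method for amplifying a sample set of characters according to claim 4,

the local deformation amplification comprises the following steps:

taking the character edge of the continuous character sample as an outer contour;

randomly translating the gray value of each pixel point positioned in the outer contour, and performing truncation operation at an extreme value;

the radial deformation amplification comprises the following steps:

tilting the image boundary of the continuous character samples by an angle to obtain a new view plane;

and projecting the continuous character samples onto the new view plane.

6. The method for amplifying a character sample set according to claim 5, wherein the elastic deformation amplification includes the steps of:

sequentially performing random spatial translation on all pixels of the continuous character sample within a unit distance to obtain a continuous character sample after random spatial translation;

and filtering the continuous character sample after the random spatial translation to obtain a character sample subjected to elastic deformation amplification.

7. The method for amplifying a sample set of characters according to claim 6,

the noise amplification sets the gray values of the four vertex pixels of the continuous character sample to the image foreground;

the blur amplification scales and shifts the gray values of randomly selected pixel points of the continuous character sample according to the local gray value;

the stroke width amplification comprises performing an erosion or dilation operation on the strokes of the continuous character sample.

8. The method of claim 6, wherein the rotation amplification, the miscut amplification, the local deformation amplification, the radial deformation amplification, the elastic deformation amplification, the noise amplification, the blur amplification, and the stroke width amplification can be processed serially or in parallel.

9. An augmentation system for a sample set of characters, the system comprising:

a sample acquisition module: acquiring a character sample set, wherein the character sample set comprises a certain number of character samples;

a sample judgment module: sequentially judging whether the character samples are dot matrix character samples or not;

a sample processing module: if the character sample is a dot matrix character sample, performing morphological processing on the character sample to obtain a continuous character sample;

a sample amplification module: performing an amplification operation on the continuous character samples to obtain amplified continuous character samples;

a sample generation module: adding the amplified continuous character samples into the character sample set.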

10. An apparatus for augmenting a sample set of characters, the apparatus comprising: at least one processor, a memory, and an input-output unit; wherein the memory is for storing a computer program and the processor is for calling the computer program stored in the memory to perform the method of any one of claims 1-8.