Efficient data labeling method
Technical Field
The invention relates to the technical field of text data labeling, in particular to a high-efficiency data labeling method.
Background
In target detection tasks, the position frame of each target to be detected usually has to be labeled on existing image data. Because of the angle between the camera and the photographed object, a target whose original shape is regular appears deformed after imaging, which makes the labeling work considerably harder. The effect is particularly pronounced on OCR samples.
One existing method is rectangular labeling, in which every target position is marked with a rectangular labeling frame. Its advantage is high labeling efficiency: two points suffice to label one target (mouse click -> drag -> release). Its disadvantage is low precision: after deformation the object cannot fill the whole rectangular frame, so a large gap is left.
The other method is to label the object with an arbitrary quadrangle, which requires selecting 4 points for each target. Its advantage is higher labeling precision: higher-quality labeling data can be obtained, provided the labeling person works carefully. Its disadvantages are high work intensity (four points are needed per target) and proneness to error: in actual operation, a slight deviation in the position of a single point greatly deforms the whole quadrangle, which leads to frequent corrections.
Disclosure of Invention
The invention aims to provide an efficient data labeling method that solves the problems of the existing methods: low precision, with a large gap left because a deformed object cannot fill the whole rectangular frame, or else high work intensity and proneness to error.
In order to achieve the purpose, the invention provides the following technical scheme: a high-efficiency data labeling method comprises the following specific labeling steps:
S1: inputting the image to be labeled: the image to be labeled is transmitted to a data labeling platform so that the labeling system can process it and the labeling person can label it;
S2: performing projection transformation on the image so that the shape of the labeling target is close to a rectangle: a planar rectangular coordinate system is established by taking the left side and the upper side of the display area as the Y axis and the X axis, with their intersection point as the origin; projection transformation is performed on the image input in step S1 so that the target to be labeled is close to a rectangular shape, and the target is placed in the middle of the field of view;
S3: labeling by the rectangular labeling method: after the image to be labeled has been projection-transformed so that the target is horizontal, it can be labeled conveniently by the rectangular labeling method, and a rectangular labeling frame is obtained by selecting only two points, namely the upper left and lower right corners of the rectangle;
S4: coordinate inverse transformation: the coordinates obtained by labeling in step S3 are coordinates after projection transformation; they are inversely transformed by using the projection matrix obtained in step S2, which yields the coordinates of the corresponding position on the original image;
S5: obtaining the labeling information of the original image: the coordinates on the original image obtained by the inverse transformation in step S4 are output as the labeling information of the original image.
Preferably, the projective transformation processing modes include rotation, flipping, translation, and scaling.
Preferably, the rotation projective transformation is divided into three parts: the first part translates the center of the image to the origin, the second part rotates the image by an angle θ, and the third part translates the center of the image back.
Preferably, the flipping projective transformation is specifically: the image is flipped about an arbitrary straight line in the display area.
Preferably, the translation projective transformation is specifically: the image center is translated to the origin, and then the image center is moved so that the image moves horizontally and vertically, the horizontal and vertical displacements being half of the horizontal length and half of the vertical length of the display area, respectively.
Preferably, the scaling projective transformation is specifically: the center point of the display area is selected as the zoom point, and the image is scaled by a factor of N.
Compared with the prior art, the invention has the beneficial effects that:
1) labeling time is saved: when a single image sample contains a large number of targets to be labeled in the same direction (form), for example a bill sample containing many text boxes at the same angle, one projection transformation greatly reduces the labeling difficulty of all targets and greatly increases the labeling speed;
2) labeling precision is high: the resulting labeling frame fits the target closely, with small gaps and high precision.
Drawings
FIG. 1 is a flow chart of the labeling method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
Referring to fig. 1, the present invention provides a technical solution: a high-efficiency data labeling method comprises the following specific labeling steps:
S1: inputting the image to be labeled: the image to be labeled is transmitted to a data labeling platform so that the labeling system can process it and the labeling person can label it; the image requiring data labeling is usually placed manually in an identification area and acquired through a camera;
S2: performing projection transformation on the image so that the shape of the labeling target is close to a rectangle: a planar rectangular coordinate system is established by taking the left side and the upper side of the display area as the Y axis and the X axis, with their intersection point as the origin; projection transformation is performed on the image input in step S1 so that the target to be labeled is close to a rectangular shape, and the target is placed in the middle of the field of view; the established coordinate system covers the whole display area;
S3: labeling by the rectangular labeling method: after the image to be labeled has been projection-transformed so that the target is horizontal, it can be labeled conveniently by the rectangular labeling method; a rectangular labeling frame is obtained by selecting only two points, namely the upper left and lower right corners of the rectangle, and the area of the rectangular labeling frame covers the whole labeling target;
S4: coordinate inverse transformation: the coordinates obtained by labeling in step S3 are coordinates after projection transformation; they are inversely transformed by using the projection matrix obtained in step S2, which yields the coordinates of the corresponding position on the original image;
S5: obtaining the labeling information of the original image: the coordinates on the original image obtained by the inverse transformation in step S4 are output as the labeling information of the original image.
The projective transformation processing modes include rotation, flipping, translation, scaling, and the like. Depending on the use case, a single one of these modes is selected, or two or more of them are combined.
The rotation projective transformation is divided into three parts. In the first part, the center of the image is translated to the origin, with the image center serving as the reference point so that the whole image moves with it. In the second part, the image is rotated by an angle θ, whose value is chosen for the specific image so that the target to be labeled becomes approximately horizontal. In the third part, the center of the image is translated back, so that the target to be labeled is displayed horizontally in the middle of the display area.
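The three parts described above can be sketched as a product of 3x3 matrices in homogeneous coordinates. This is only an illustrative sketch: the helper names (`matmul`, `translation`, `rotation`, `rotate_about_center`, `apply`) are hypothetical, and the sign convention of θ is an assumption (with the Y axis pointing down, as in the display coordinate system, a positive θ here appears clockwise on screen).

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(tx, ty):
    """Homogeneous matrix that shifts points by (tx, ty)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotation(theta):
    """Homogeneous matrix that rotates points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate_about_center(width, height, theta):
    """Compose the three parts; matrices are applied from right to left."""
    cx, cy = width / 2.0, height / 2.0
    to_origin = translation(-cx, -cy)   # part 1: image center -> origin
    rotate = rotation(theta)            # part 2: rotate by theta
    back = translation(cx, cy)          # part 3: origin -> image center
    return matmul(back, matmul(rotate, to_origin))

def apply(m, x, y):
    """Apply an affine 3x3 matrix (last row 0, 0, 1) to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Because the rotation is sandwiched between the two translations, the image center is a fixed point of the composite matrix, which is what keeps the target in the middle of the display area.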
The flipping projective transformation is specifically as follows: the image is flipped about an arbitrary straight line in the display area, that is, turned 180 degrees around that line, so that the mirrored image information is displayed.
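Within the image plane, flipping about a straight line is a reflection, which can also be written as a 3x3 matrix. The following is a minimal sketch under the assumption that the line is given by a point (px, py) on it and its direction angle phi relative to the X axis; this parameterization and the function names are illustrative and not taken from the method itself.

```python
import math

def flip_about_line(px, py, phi):
    """3x3 matrix mirroring points about the line through (px, py)
    whose direction makes the angle phi (radians) with the X axis."""
    a, b = math.cos(2 * phi), math.sin(2 * phi)
    # Reflection about a line through the origin is [[a, b], [b, -a]];
    # the last column moves the line so that it passes through (px, py).
    return [[a, b, px - a * px - b * py],
            [b, -a, py - b * px + a * py],
            [0.0, 0.0, 1.0]]

def apply(m, x, y):
    """Apply an affine 3x3 matrix (last row 0, 0, 1) to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For example, `flip_about_line(50, 0, math.pi / 2)` mirrors the image about the vertical line x = 50, and applying the same flip twice returns every point to its starting position.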
The translation projective transformation is specifically as follows: the image center is translated to the origin, and then the image center is moved so that the image moves horizontally and vertically, the horizontal and vertical displacements being half of the horizontal length and half of the vertical length of the display area, respectively. When the target to be labeled lies at the edge of the display area and is hard to see clearly or to identify, the image is thereby moved to the middle of the display area to facilitate identification.
The scaling projective transformation is specifically as follows: the center point of the display area is selected as the zoom point, and the image is scaled by a factor of N. When the target to be labeled is small relative to the display area, it is difficult to identify accurately, so the image is enlarged until the proportion between the target and the display area is moderate.
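Scaling about the center point of the display area, rather than about the origin, can be sketched as a single matrix. The function names and the example display size are assumptions chosen for illustration.

```python
def scale_about_center(display_w, display_h, n):
    """3x3 matrix that scales by the factor n while keeping the
    center point of the display area fixed."""
    cx, cy = display_w / 2.0, display_h / 2.0
    # Equivalent to: move center to origin, scale by n, move back.
    return [[n, 0.0, cx * (1 - n)],
            [0.0, n, cy * (1 - n)],
            [0.0, 0.0, 1.0]]

def apply(m, x, y):
    """Apply an affine 3x3 matrix (last row 0, 0, 1) to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

With a hypothetical 800 x 600 display area and N = 2, a point 10 pixels to the right of the center ends up 20 pixels to the right of it, while the center itself does not move.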
Example:
A rotation-translation transformation is taken as an example. The bill image sample contains rotated text that is inconvenient to label directly, so the image is rotated about its center until the text is horizontal (done manually by the labeling person). The three parts of the rotation projective transformation correspond to the three transformation matrices (from right to left) in the following formula, where width and height are the width and height of the image, and x' and y' are the transformed coordinates.
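The formula itself is not reproduced in this text. Under the standard homogeneous-coordinate convention, a rotation about the image center would read as follows; this is a reconstruction from the description of the three parts, and both the sign convention of θ and the use of (x, y) for the original coordinates are assumptions:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & width/2 \\ 0 & 1 & height/2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -width/2 \\ 0 & 1 & -height/2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$

Read from right to left, the matrices translate the image center to the origin, rotate by θ, and translate the center back.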
After the transformation relation is determined, each pixel value of the transformed image is obtained by mapping the pixel's coordinates back to the corresponding position in the original image and then interpolating.
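The inverse mapping with interpolation can be sketched as below. Bilinear interpolation is used here as one common choice; the text does not say which interpolation is meant. The image is assumed to be a grayscale list of rows, and `inv` is assumed to be the already inverted 3x3 transformation matrix with a non-zero homogeneous divisor at every output pixel; the function names are hypothetical.

```python
import math

def bilinear(img, x, y, fill=0.0):
    """Sample a grayscale image (list of rows) at the real-valued point (x, y)."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return fill  # the mapped position lies outside the original image
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def warp(img, inv, out_w, out_h):
    """Build the transformed image pixel by pixel via inverse mapping.

    inv maps transformed coordinates (x', y') back to original
    coordinates (x, y); it is the inverse of the forward matrix.
    """
    out = []
    for yp in range(out_h):
        row = []
        for xp in range(out_w):
            d = inv[2][0] * xp + inv[2][1] * yp + inv[2][2]
            x = (inv[0][0] * xp + inv[0][1] * yp + inv[0][2]) / d
            y = (inv[1][0] * xp + inv[1][1] * yp + inv[1][2]) / d
            row.append(bilinear(img, x, y))
        out.append(row)
    return out
```

Iterating over the output pixels, instead of pushing input pixels forward, guarantees that every pixel of the transformed image receives a value and avoids holes.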
Rotation and translation alone are sometimes not sufficient to bring the target into a horizontal rectangular shape, and other kinds of projective transformation, or combinations thereof, are then needed. All projective transformations and their combinations can nevertheless be expressed by the following formula, in which the projective transformation matrix in the middle is often the product of several transformation matrices.
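The formula is likewise not reproduced in this text. In the usual notation, a general projective transformation in homogeneous coordinates takes the form below; the symbols a_ij, u, v, w, and H_k are assumptions introduced for this sketch:

$$
\begin{bmatrix} u \\ v \\ w \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
x' = \frac{u}{w}, \quad y' = \frac{v}{w}
$$

When several transformations H_1, H_2, ..., H_n are applied in that order, the matrix in the middle is the product H = H_n \cdots H_2 H_1. For rotation, flipping, translation, and scaling, the last row is (0, 0, 1), so w = 1 and no division is needed.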
In an actual implementation of the labeling system, the projection matrix can be decomposed into several different transformation matrices whose parameters are set separately, so that the labeling person can conveniently perform the corresponding operations in the labeling system.
After the image to be labeled has been rotated so that the target is horizontal, it can be labeled conveniently by the rectangular labeling method: a rectangular labeling frame is obtained by selecting only two points, namely the upper left and lower right corners of the rectangle (generally, the coordinate information of all four vertices of the rectangle is stored).
The coordinates obtained in the labeling step are coordinates after projection transformation. They are inversely transformed by using the projection matrix obtained earlier, which yields the coordinates of the corresponding position on the original image (the coordinate information of the four vertices of a quadrangle).
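The inverse transformation of a labeled rectangle can be sketched as follows, assuming the forward projection matrix is available as a 3x3 nested list. The function names and the vertex order (clockwise from the upper left corner) are illustrative choices, not prescribed by the method.

```python
def invert3x3(m):
    """Invert a 3x3 matrix via its adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("projection matrix is not invertible")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def rectangle_to_original(matrix, top_left, bottom_right):
    """Map a rectangle labeled on the transformed image back to the
    four vertices of a quadrangle on the original image."""
    inv = invert3x3(matrix)
    (x0, y0), (x1, y1) = top_left, bottom_right
    # Expand the two labeled points into the four rectangle vertices.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    quad = []
    for x, y in corners:
        w = inv[2][0] * x + inv[2][1] * y + inv[2][2]
        quad.append(((inv[0][0] * x + inv[0][1] * y + inv[0][2]) / w,
                     (inv[1][0] * x + inv[1][1] * y + inv[1][2]) / w))
    return quad
```

The labeling person still selects only two points; the other two vertices are derived before the inverse transformation, so the stored result is a general quadrangle on the original image.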
Therefore, more accurate target frame position information on the original image can be obtained.
While there have been shown and described the fundamental principles and essential features of the invention and advantages thereof, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof; the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.