CN115359493B - Method and device for detecting rotated text - Google Patents

Method and device for detecting rotated text

Info

Publication number
CN115359493B
CN115359493B (application CN202211219674A)
Authority
CN
China
Prior art keywords
text
angle
frame
value
graphic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211219674.1A
Other languages
Chinese (zh)
Other versions
CN115359493A (en)
Inventor
张存义
艾国
杨作兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd filed Critical Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202211219674.1A
Publication of CN115359493A
Application granted
Publication of CN115359493B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/146: Aligning or centring of the image pick-up or image-field
    • G06V30/1463: Orientation detection or correction, e.g. rotation of multiples of 90 degrees
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/1444: Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1448: Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields, based on markings or identifiers characterising the document or the area
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/18: Extraction of features or characteristics of the image
    • G06V30/186: Extraction of features or characteristics of the image by deriving mathematical or geometrical properties from the whole image
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)

Abstract

The present disclosure relates to a method and apparatus for detecting rotated text. The method comprises: obtaining graphic sample data, and obtaining a text region frame label value and a normalized angle label value according to the labeling of a text graphic sample area; inputting the graphic sample data into a target detection model to obtain a text region frame predicted value and a normalized angle predicted value, and from these a text region frame regression loss and an angle normalization regression loss; obtaining an overall regression loss from the text region frame regression loss and the angle normalization regression loss; adjusting the target detection model according to the overall regression loss to obtain a trained target detection model; and detecting graphic data to be detected with the trained target detection model to obtain a text region detection frame. In this method, the text region frame regression loss and the angle normalization regression loss are independent, so the regression accuracy of the text region frame is not affected, and end-to-end accurate detection of text at various rotation angles in graphic data is achieved.

Description

Method and device for detecting rotated text
Technical Field
The disclosure relates to the technical field of image recognition, and in particular to a method and a device for detecting rotated text.
Background
At present, character recognition is widely applied in various scenes. For example, text appearing in a scene can be recognized through the camera of a mobile phone, helping people quickly extract text in an unfamiliar language and, with the aid of translation, obtain the information they need.
General-purpose target detection usually adopts an upright rectangular frame for detection. In some OCR (Optical Character Recognition) text detection scenes, the shooting device must therefore be rotated to follow the direction of the text so that the text is imaged upright in the image, allowing the text region to be successfully extracted for subsequent character recognition.
However, in some scenes, because of the angle between the camera and the text to be recognized, it is difficult to ensure that the text is imaged upright in the image, so existing target detection methods cannot handle the case in which the text appears in the image at various angles.
Therefore, how to detect text presented at various angles in an image, so as to ensure character recognition under various conditions, is a problem to be solved.
Disclosure of Invention
In view of this, the present disclosure provides a method and apparatus for detecting rotated text, which can accurately detect, end to end, the text regions of graphic data containing text at various rotation angles.
The technical scheme of the present disclosure is realized as follows:
a method of detecting rotated text, comprising:
obtaining graphic sample data containing a text graphic sample area;
obtaining label information of the text graphic sample area according to the labeling of the text graphic sample area, wherein the label information comprises a text region frame label value and a normalized angle label value associated with the text graphic sample area, and the normalized angle label value represents the inclination angle of the text graphic sample area relative to the coordinate horizontal axis of the graphic sample data;
inputting the graphic sample data into a target detection model to be trained, and obtaining a text region frame predicted value and a normalized angle predicted value which are related to the text graphic sample region through the target detection model to be trained;
obtaining text region frame regression loss according to the text region frame predicted value and the text region frame label value, and obtaining angle normalization regression loss according to the normalization angle predicted value and the normalization angle label value;
obtaining an overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalization regression loss;
adjusting the target detection model to be trained according to the overall regression loss to obtain a trained target detection model;
and detecting the graphic data to be detected based on the trained target detection model to obtain a text region detection frame associated with the text graphic region in the graphic data to be detected.
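The training procedure in the steps above can be sketched end to end with a toy stand-in model. This is a minimal sketch under stated assumptions: the single-value "model" and numeric gradient descent are illustrative only, and Smooth L1 substitutes for both loss terms purely for brevity (the patent uses GIoU for the box term).

```python
# Toy sketch of the training loop: the box regression loss and the angle
# regression loss are computed independently, summed into an overall loss,
# and the model is adjusted according to that overall loss.
def smooth_l1(pred, target):
    # Smooth L1 loss on a single value
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def train(samples, steps=200, lr=0.1):
    b, a = 0.0, 0.0  # toy model state: predicted box value and normalized angle
    eps = 1e-4
    for _ in range(steps):
        for box_label, angle_label in samples:
            # overall regression loss = box regression loss + angle regression loss
            def loss(bb, aa):
                return smooth_l1(bb, box_label) + smooth_l1(aa, angle_label)
            # adjust the model by numeric gradient descent on the overall loss
            gb = (loss(b + eps, a) - loss(b - eps, a)) / (2 * eps)
            ga = (loss(b, a + eps) - loss(b, a - eps)) / (2 * eps)
            b -= lr * gb
            a -= lr * ga
    return b, a
```

Because the two loss terms touch disjoint parameters here, minimizing their sum drives each output to its own label, mirroring the independence the patent emphasizes.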
Further, the text region frame label value comprises a normalized coordinate value of the center point of the text graphic sample area in the graphic sample data, a normalized width value of the text graphic sample area and a normalized height value of the text graphic sample area;
the normalized angle label value is obtained by normalizing the inclination angle of the text graphic sample area relative to the coordinate horizontal axis of the graphic sample data, where the normalization maps inclination angles in the range from 0° to 360° one-to-one onto label values in the range from −1 to 1.
further, the text region frame predicted value comprises a normalized coordinate predicted value of a center point of the text graphic sample area in the graphic sample data, a normalized width predicted value of the text graphic sample area and a normalized height predicted value of the text graphic sample area.
Further, the text region frame regression loss is a Generalized Intersection over Union (GIoU) loss function;
the angle normalized regression loss is a Smooth L1 (smooth mean absolute error) loss function.
Further, the obtaining the overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalization regression loss includes:
and adding the text region frame regression loss and the angle normalization regression loss to obtain the overall regression loss.
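The loss combination described above can be sketched as follows. The GIoU implementation here operates on axis-aligned boxes, which is an assumption for illustration: the patent does not spell out how GIoU is evaluated on the frame parameters.

```python
def giou_loss(a, b):
    # GIoU loss = 1 - GIoU for two axis-aligned boxes (x1, y1, x2, y2)
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # smallest enclosing box, used by the GIoU penalty term
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou

def smooth_l1(pred, target):
    # Smooth L1 loss on a single value (here, the normalized angle)
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def overall_loss(box_pred, box_label, angle_pred, angle_label):
    # overall regression loss = box regression loss + angle regression loss
    return giou_loss(box_pred, box_label) + smooth_l1(angle_pred, angle_label)
```

A perfect prediction gives zero overall loss; disjoint boxes push the GIoU term above 1, since GIoU becomes negative.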
Further, the detecting the graphic data to be detected based on the trained target detection model to obtain a text region detection frame associated with the text graphic region in the graphic data to be detected, including:
inputting the graphic data to be detected into the trained target detection model, and obtaining a text region frame detection value and an angle normalization detection value of the text graphic region through the trained target detection model;
obtaining an angle value according to the angle normalized detection value;
and obtaining the text region detection frame according to the text region frame detection value and the angle value.
Further, the text region frame detection value comprises a normalized coordinate value of a center point of the text region detection frame, a normalized width value of the text region detection frame and a normalized height value of the text region detection frame.
Further, the obtaining an angle value according to the angle normalized detection value includes applying, to the angle normalized detection value, the inverse of the normalization used for the labels, which maps each normalized detection value in the range from −1 to 1 back to a unique angle value between 0° and 360°.
Further, the obtaining the text region detection frame according to the text region frame detection value and the angle value includes:
obtaining, according to the text region frame detection value, the position of the center point of the text region detection frame in the graphic data to be detected and the width and height of the text region detection frame;
obtaining, according to the angle value, the inclination angle of the text region detection frame relative to the coordinate horizontal axis of the graphic data to be detected;
and obtaining the text region detection frame according to the position of its center point in the graphic data to be detected, its width and height, and its inclination angle relative to the coordinate horizontal axis of the graphic data to be detected.
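As a sketch of the decoding step above: given the detected normalized center, width and height plus the inclination angle, the four corner points of the detection frame can be recovered. The function name and the y-up mathematical coordinate convention are assumptions; in image coordinates, where y grows downward, a counter-clockwise visual rotation corresponds to a negated angle here.

```python
import math

def detection_box_corners(cx, cy, w, h, theta_deg, img_w, img_h):
    """Convert normalized center/width/height plus an inclination angle
    (degrees, counter-clockwise from the horizontal axis, y-up frame)
    into the four corner points of the detection frame, in pixels."""
    cx, w = cx * img_w, w * img_w   # de-normalize against image width
    cy, h = cy * img_h, h * img_h   # de-normalize against image height
    t = math.radians(theta_deg)
    # half-extent vectors along the box's width and height directions
    ux, uy = math.cos(t) * w / 2, math.sin(t) * w / 2
    vx, vy = -math.sin(t) * h / 2, math.cos(t) * h / 2
    return [(cx - ux - vx, cy - uy - vy),   # first corner
            (cx + ux - vx, cy + uy - vy),   # second corner
            (cx + ux + vx, cy + uy + vy),   # third corner
            (cx - ux + vx, cy - uy + vy)]   # fourth corner
```

For a zero inclination angle this reduces to the usual upright rectangle around the center point.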
Further, after the text region detection frame is obtained, the method for detecting rotated text further includes:
presenting the text region detection frame on the graphic data to be detected.
A device for detecting rotated text, comprising:
a graphic sample data acquisition module configured to perform acquisition of graphic sample data containing a text graphic sample area;
a tag information obtaining module configured to obtain tag information of the text graphic sample area according to the labeling of the text graphic sample area, wherein the tag information comprises a text area frame tag value and a normalized angle tag value associated with the text graphic sample area, and the normalized angle tag value represents an inclination angle of the text graphic sample area relative to a coordinate transverse axis of the graphic sample data;
the predicted value obtaining module is configured to input the graphic sample data into a target detection model to be trained, and obtain a text region frame predicted value and a normalized angle predicted value which are associated with the text graphic sample region through the target detection model to be trained;
the regional frame and angle regression loss obtaining module is configured to execute the steps of obtaining a text regional frame regression loss according to the text regional frame predicted value and the text regional frame label value, and obtaining an angle normalized regression loss according to the normalized angle predicted value and the normalized angle label value;
The overall regression loss obtaining module is configured to obtain the overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalization regression loss;
the model training module is configured to execute the adjustment of the target detection model to be trained according to the overall regression loss to obtain a trained target detection model;
and the graphic detection module is configured to detect the graphic data to be detected based on the trained target detection model, obtaining a text region detection frame associated with a text graphic region in the graphic data to be detected.
An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to implement a method of detecting rotational text as claimed in any one of the preceding claims.
A computer readable storage medium, at least one instruction of which, when executed by a processor of an electronic device, enables the electronic device to implement a method of detecting rotational text as claimed in any one of the preceding claims.
According to the method and device for detecting rotated text disclosed in the above scheme, labeling the graphic sample data yields a normalized angle label value in addition to the text region frame label value, and the output of the target detection model gains a dimension for the normalized angle, while the overall regression loss of the target detection model comprises independent text region frame regression loss and angle normalization regression loss. In the present disclosure, the regression of the text region frame and the regression of the angle are independent of each other, so the detected rotated text region detection frame is more accurate.
The content in the text region can include text in a document, on a road sign, on a traffic sign, on a vehicle license plate, on a building surface, on a container surface, or on the surface of various other objects, and the graphic data can be pictures. By adopting the disclosed method and device for detecting rotated text, text presented in graphic data captured at various shooting angles can be detected, so the requirement of end-to-end recognition of text content in various shooting scenes can be met, solving the problem that text content is difficult to recognize rapidly and accurately in complex environments.
Drawings
FIG. 1 is a flow chart of a method of detecting rotational text according to an exemplary embodiment;
FIG. 2 is a graphical sample data diagram shown in accordance with an illustrative embodiment;
FIG. 3A is a schematic diagram illustrating a graphical sample data calibration in accordance with an illustrative embodiment;
FIG. 3B is a schematic diagram illustrating another graphical sample data calibration in accordance with an illustrative embodiment;
FIG. 4 is a diagram illustrating a relationship between text region box label values and text graphic sample areas according to an exemplary embodiment;
FIG. 5 is a schematic diagram showing normalized angle label values versus tilt angle, according to an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a process for obtaining text region detection boxes based on a trained target detection model, according to one illustrative embodiment;
FIG. 7 is a schematic diagram illustrating a process of detecting a box according to a text region box detection value and an angle value, according to an illustrative embodiment;
FIG. 8 is a schematic diagram illustrating character recognition by one prior art OCR character detection scheme in accordance with an illustrative embodiment;
FIG. 9 is a flowchart of an application scenario of a method of detecting rotational text, according to an exemplary embodiment;
FIG. 10 is a schematic diagram of a rotary text detection device according to an exemplary embodiment;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail below with reference to the accompanying drawings and examples.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Currently, common rotated-rectangle detection methods include SCRDet and RSDet. SCRDet regresses a rectangular frame with 5 parameters: the center point coordinates (two parameters), the width and height (two parameters) and the rotation angle (one parameter). RSDet, a Rotation Sensitive Detector, employs an eight-parameter regression method to detect a rotated rectangle. These methods all rely on rotated rectangular anchors when predicting the rotation angle of the rectangular frame, which can cost regression accuracy of the rectangular frame.
As for text detection methods, pixel-level approaches such as DBNet (a method for OCR text detection) cannot distinguish text direction, so a network for recognizing text direction generally has to be cascaded after text detection, which makes the whole OCR pipeline more complex and propagates more error.
In view of this, the embodiments of the present disclosure provide a method and an apparatus for detecting rotated text, which achieve end-to-end accurate detection of text regions in graphic data containing text rotated at various angles, and thereby rapid recognition of text at various rotation angles in the graphic data.
Fig. 1 is a flowchart of a method for detecting a rotating text according to an exemplary embodiment, and as shown in fig. 1, the method for detecting a rotating text mainly includes the following steps 101 to 107.
Step 101, obtaining graphic sample data containing a text graphic sample area.
Fig. 2 is a schematic diagram of graphic sample data according to an exemplary embodiment. As shown in fig. 2, the graphic sample data 200 includes a text graphic sample area 201; in this embodiment the text content in the text graphic sample area 201 is "ABCDEFG", and the text graphic sample area 201 forms an included angle greater than 0° with the coordinate horizontal axis 202 of the graphic sample data 200. In fig. 2 the coordinate horizontal axis 202 is shown as a dotted line; in general, the horizontal axis of the graphic sample data 200 is defined as the x-axis, that is, the text graphic sample area 201 forms an angle greater than 0° with the x-axis of the graphic sample data 200. The dotted line in fig. 2 merely represents the extension direction of the coordinate horizontal axis 202 and is not necessarily part of the presentation of the graphic sample data 200.
Step 102, according to the labeling of the text graphic sample area, obtaining label information of the text graphic sample area, wherein the label information comprises a text area frame label value and a normalized angle label value which are related to the text graphic sample area, and the normalized angle label value represents the inclination angle of the text graphic sample area relative to the coordinate transverse axis of the graphic sample data.
In some embodiments, the labeling means for the text graphic sample area 201 is as follows:
The four corners of the text graphic sample area 201 are marked in order, following the horizontal placement direction of the text content when read normally: the upper left corner as the first marking point, the upper right corner as the second marking point, the lower right corner as the third marking point, and the lower left corner as the fourth marking point. For example, for "ABCDEFG" shown in fig. 2, the upper left corner of the letter "A" is the first marking point, the upper right corner of the letter "G" is the second marking point, the lower right corner of the letter "G" is the third marking point, and the lower left corner of the letter "A" is the fourth marking point, marked in that order.
In this way, after the labeling of the text graphic sample area 201 is completed and the data of the first, second, third and fourth marking points are obtained, the position and size of the text graphic sample area 201 (the center point coordinates and the length and width values of the text region frame) can be derived, and the included angle between the text region frame and the coordinate horizontal axis 202 of the graphic sample data 200, i.e., the rotation angle or inclination angle of the text region frame, can be obtained from the four marking points. For example, the included angle can be obtained from the sequentially calibrated first and second marking points, or from the sequentially calibrated third and fourth marking points.
Fig. 3A is a schematic diagram of the graphic sample data after calibration according to an exemplary embodiment. As shown in fig. 3A, the angle θ between the coordinate horizontal axis 202 and the extension direction of the straight line through the sequentially calibrated third marking point (the lower right corner of the letter "G") and fourth marking point (the lower left corner of the letter "A") is taken as the angle θ between the text region frame 301 and the coordinate horizontal axis 202 of the graphic sample data 200. In the embodiment shown in fig. 3A, 0° < θ < 180°.
Fig. 3B is a schematic diagram of another graphic sample data after calibration according to an exemplary embodiment. The embodiment shown in fig. 3B likewise takes the angle θ between the coordinate horizontal axis 202 and the extension direction of the straight line through the sequentially calibrated third marking point (the lower right corner of the letter "G") and fourth marking point (the lower left corner of the letter "A") as the angle θ between the text region frame 301 and the coordinate horizontal axis 202 of the graphic sample data 200. In the embodiment shown in fig. 3B, 180° < θ < 360°.
As shown in fig. 3A and 3B, the angle θ is an angle between the coordinate horizontal axis 202 and the text region frame 301 in the counterclockwise direction.
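A minimal sketch of deriving θ from two sequentially calibrated marking points, assuming a y-up mathematical coordinate frame; for image coordinates, where y grows downward, negate dy first to keep the counter-clockwise convention. The function name is illustrative.

```python
import math

def inclination_angle(p4, p3):
    """Counter-clockwise angle, in [0, 360), between the horizontal axis and
    the line from the fourth marking point (lower-left) to the third marking
    point (lower-right)."""
    dx = p3[0] - p4[0]
    dy = p3[1] - p4[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0
```

The same computation applies to the first and second marking points, since both pairs lie on edges parallel to the reading direction.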
In some embodiments, the text region box label value includes a normalized coordinate value of a center point of the text graphic sample area in the graphic sample data, a normalized width value of the text graphic sample area, a normalized height value of the text graphic sample area.
FIG. 4 is a diagram illustrating the relationship between the text region frame label values and the text graphic sample area, according to an exemplary embodiment. As shown in fig. 4, O is the center point of the text graphic sample area 201, that is, the center point of the text region frame 301, and the normalized coordinate values of O in the graphic sample data 200 are x and y, where x is the normalized horizontal coordinate and y is the normalized vertical coordinate. With the width of the graphic sample data 200 (the horizontal direction) taken as 1, x is the ratio of the horizontal position of the center point O to that width; for example, x = 0.5 if the center point O of the text graphic sample area 201 lies exactly at the middle of the graphic sample data 200 in the horizontal direction. Likewise, with the height of the graphic sample data 200 (the vertical direction) taken as 1, y is the ratio of the vertical position of the center point O to that height; for example, y = 0.5 if the center point O lies exactly at the middle of the graphic sample data 200 in the vertical direction. As shown in fig. 4, w is the normalized width value of the text graphic sample area and h is its normalized height value, both normalized with the width and height of the graphic sample data 200 taken as 1.
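A sketch of computing the text region frame label values from the four marking points. The choice to measure width along the edge from the first to the second marking point and height along the edge from the first to the fourth is an assumption for illustration.

```python
import math

def box_label(corners, img_w, img_h):
    """Normalized center (x, y), width w and height h of a labeled text
    region frame, with the width and height of the graphic sample data
    both taken as 1."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = corners  # marking points 1..4
    cx = (x1 + x2 + x3 + x4) / 4.0 / img_w
    cy = (y1 + y2 + y3 + y4) / 4.0 / img_h
    w = math.hypot(x2 - x1, y2 - y1) / img_w  # first -> second point (top edge)
    h = math.hypot(x4 - x1, y4 - y1) / img_h  # first -> fourth point (left edge)
    return cx, cy, w, h
```

For an axis-aligned sample area this reduces to the familiar center/width/height normalization.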
In some embodiments, to avoid errors in the resulting normalized height value h and normalized width value w caused by differences between the actual width and height of the graphic sample data 200, the width and height of the graphic sample data 200 are made equal. Methods of processing raw graphic data to obtain graphic sample data 200 of equal width and height may include: compressing or stretching the width of the raw graphic data until it equals the height; compressing or stretching the height of the raw graphic data until it equals the width; or, without changing the aspect ratio of the raw graphic data, expanding the smaller of the two dimensions to equal the larger and filling the expanded region with black. For example, if the width of the raw graphic data is greater than the height, the height is expanded to equal the width and the added region in the height direction is filled with black.
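The third option above (expand the smaller dimension and fill with black) can be sketched as follows for a picture stored as a list of pixel rows. Padding on the right/bottom is an assumption, since the text does not specify where the filled region is placed.

```python
def pad_to_square(img):
    """Pad a picture (list of rows of pixel values) with black (0) so that
    width and height are equal, keeping the original aspect ratio."""
    h = len(img)
    w = len(img[0]) if h else 0
    side = max(w, h)
    rows = [list(row) + [0] * (side - w) for row in img]   # pad width
    rows += [[0] * side for _ in range(side - h)]          # pad height
    return rows
```

After padding, normalizing coordinates against a single side length no longer distorts w and h differently.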
In some embodiments, the normalized angle label value is obtained by normalizing the inclination angle of the text graphic sample area relative to the coordinate horizontal axis of the graphic sample data, where the normalization maps inclination angles in the range from 0° to 360° one-to-one onto label values in the range from −1 to 1.
Fig. 5 is a schematic diagram of the relationship between the normalized angle label value and the inclination angle, i.e., the curve of the normalization described above. As can be seen from fig. 5, the normalized angle label value obtained by normalizing the inclination angle corresponds one-to-one with the inclination angle: within the range from −1 to 1, each normalized angle label value uniquely corresponds to an inclination angle between 0° and 360°, so the normalized angle label value can accurately represent the inclination angle.
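The normalization formula itself appears as an image in the original document and is not reproduced in this text. The sketch below therefore uses cos(θ/2) as a hypothetical stand-in: it is one mapping with the stated properties (monotone and one-to-one from [0°, 360°] onto [−1, 1]), not necessarily the patent's exact formula, together with its inverse for the detection stage.

```python
import math

def normalize_angle(theta_deg):
    # Assumed normalization: cos(theta/2) maps [0, 360] degrees
    # monotonically onto [-1, 1] (a stand-in for the patent's formula).
    return math.cos(math.radians(theta_deg) / 2.0)

def denormalize_angle(t):
    # Inverse mapping: recover the inclination angle from the normalized value.
    return math.degrees(2.0 * math.acos(t))
```

Any strictly monotone map between the two ranges would preserve the one-to-one property that fig. 5 illustrates.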
Step 103, inputting the graphic sample data into a target detection model to be trained, and obtaining a text region frame predicted value and a normalized angle predicted value associated with the text graphic sample region through the target detection model to be trained.
In some embodiments, the text region box predictor includes a normalized coordinate predictor of a center point of the text graphic sample area in the graphic sample data, a normalized width predictor of the text graphic sample area, a normalized height predictor of the text graphic sample area.
The normalized coordinate predicted value, the normalized width predicted value and the normalized height predicted value are predicted values obtained by the target detection model to be trained by taking the width and the height of the graphic sample data as 1.
Referring to the normalization described above, the normalized angle predicted value uniquely corresponds to an inclination angle predicted value between 0° and 360°; that is, the normalized angle predicted value characterizes the predicted inclination angle.
In some embodiments, the target detection model may be a one-stage target detection algorithm such as YOLOv3, YOLOv4 or YOLOv5, which is a lightweight network model with fast detection speed, is convenient to deploy, and supports end-to-end detection.
On the basis of the existing YOLOv3, YOLOv4 and YOLOv5 models, the output layer can be modified by setting the relevant parameters so as to add an output dimension for the normalized angle predicted value. For example, YOLOv3, YOLOv4 and YOLOv5 detect 80 classes on the COCO dataset, so the output dimensions comprise 80 class dimensions, 4 prediction-box dimensions (normalized x, y, w, h of the positive rectangular box) and 1 confidence dimension, for a total of 85 output dimensions. In some embodiments of the rotating text detection method of the present disclosure, one output dimension for the normalized angle predicted value is added on top of these 85 output dimensions, so that the output becomes: 80 class dimensions, 4 prediction-box dimensions (normalized x, y, w, h), 1 prediction-box normalized-angle dimension and 1 confidence dimension, for a total of 86 output dimensions. The increase in the output dimension of the YOLOv3, YOLOv4 or YOLOv5 model can be achieved by setting parameters in the relevant build file of the model. In these models, the parameter nc denotes the number of classes (nc = 80) and the output dimension is nc + 5, where 5 covers the 4 prediction-box dimensions (normalized x, y, w, h) and 1 confidence dimension. To accommodate the added normalized angle predicted value, the output dimension is modified to nc + 6, so the originally set parameter no = na * (nc + 5) is adjusted to no = na * (nc + 6), where no is the number of outputs and na is the number of anchor boxes (na = 3 in YOLOv3, YOLOv4 and YOLOv5); hence no is adjusted from the original 255 to 258.
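The dimension bookkeeping above can be checked with a few lines, using nc, na and no as named in the models' build files:

```python
nc = 80  # number of classes on the COCO dataset
na = 3   # number of anchor boxes per detection scale

no_original = na * (nc + 5)    # per anchor: x, y, w, h + confidence + classes
no_with_angle = na * (nc + 6)  # one extra normalized-angle channel per anchor

print(no_original, no_with_angle)
```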
Accordingly, corresponding settings are required at both the data-processing level and the network core structure level.
For example, at the data-processing level of YOLOv3 (in the dataset.py file), the original relevant code is as follows:
label[best_detect][yind, xind, best_anchor, 0:4] = bbox_xywh
label[best_detect][yind, xind, best_anchor, 4:5] = 1.0
label[best_detect][yind, xind, best_anchor, 5:] = smooth_onehot
In some embodiments, corresponding to the added normalized angle label value, the above code is modified accordingly:
label[best_detect][yind, xind, best_anchor, 0:4] = bbox_xywh
label[best_detect][yind, xind, best_anchor, 4:5] = degrees
label[best_detect][yind, xind, best_anchor, 5:6] = 1.0
label[best_detect][yind, xind, best_anchor, 6:] = smooth_onehot
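Corresponding to the modified code above, the per-anchor label vector places the normalized angle at index 4 and shifts the confidence and the class one-hot by one. A hedged NumPy sketch (array shape and placeholder values are assumptions):

```python
import numpy as np

nc = 80                             # class count, as in the COCO example
label = np.zeros(6 + nc, dtype=np.float32)

label[0:4] = [0.5, 0.5, 0.2, 0.1]   # bbox_xywh: normalized x, y, w, h
label[4:5] = -0.3                   # degrees: normalized angle label value (new)
label[5:6] = 1.0                    # objectness confidence (shifted from 4:5)
label[6:] = 1.0 / nc                # smooth_onehot placeholder (shifted from 5:)
```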
For example, at the network core structure level of YOLOv3 (in the yolov3.py file), the original relevant code is as follows:
pred_xywh= pred[:, :, :, :, 0:4]
pred_conf= pred[:, :, :, :, 4:5]
label_xywh= label[:, :, :, :, 0:4]
respond_bbox= label[:, :, :, :, 4:5]
label_prob= label[:, :, :, :, 5:]
In some embodiments, corresponding to the added normalized angle label value, the above code is modified accordingly:
pred_xywh= pred[:, :, :, :, 0:4]
pred_angle= pred[:, :, :, :, 4:5]
pred_conf= pred[:, :, :, :, 5:6]
label_xywh= label[:, :, :, :, 0:4]
label_angle= label[:, :, :, :, 4:5]
respond_bbox= label[:, :, :, :, 5:6]
label_prob= label[:, :, :, :, 6:]
in addition, at the network core structure level of YOLOv3 (YOLOv 3.Py document), corresponding to increasing the normalized angle label value, the angle normalized regression loss needs to be increased:
angle_loss = respond_bbox * smooth_l1_loss(label_angle, pred_angle)
for further description and related settings of the YOLOv3, YOLOv4, YOLOv5 models, reference is made to related technical documents, which are not repeated here.
Step 104, obtaining a text region frame regression loss according to the text region frame predicted value and the text region frame label value, and obtaining an angle normalization regression loss according to the normalized angle predicted value and the normalized angle label value.
In the embodiments of the present disclosure, the prediction frame and the prediction angle are regressed separately: a text region frame regression loss and an angle normalization regression loss are obtained from the prediction frame and the prediction angle respectively, and the two losses are independent of each other.
In some embodiments, the text region frame regression loss is a GIoU (Generalized Intersection over Union) loss function, which is the prediction-box regression loss function adopted by the YOLOv3, YOLOv4 and YOLOv5 models. Further description of the GIoU loss function can be found in the related technical documents and is not repeated here.
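For reference, a minimal GIoU sketch for axis-aligned boxes (the function and argument names are assumptions; the models apply it to their predicted boxes):

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    GIoU = IoU - (enclosing area - union) / enclosing area, so it stays
    informative even when the two boxes do not overlap at all.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both a and b.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose
```

The GIoU loss used for training is then 1 - GIoU.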
In some embodiments, the angle normalization regression loss is a Smooth L1 loss function. Smooth L1 Loss is a target-detection regression loss function. L1 loss, also called mean absolute error (MAE), is the mean of the absolute difference between the model predicted value and the true value; the MAE function is continuous but not differentiable at 0, and its derivative is constant, so at small loss values the gradient remains relatively large, which may cause model oscillation and hinder convergence. Smooth L1 Loss is a piecewise function: on [-1, 1] it uses L2 loss, which solves the non-differentiability of L1 loss at 0, and outside the interval [-1, 1] it uses L1 loss, which mitigates the gradient-explosion problem for outliers. L2 loss, also called mean square error (MSE), is the mean of the squared difference between the model predicted value and the true value. For further description of the Smooth L1 loss function, see the related technical documents, which are not repeated here.
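A minimal sketch of the piecewise definition described above, with the transition at β = 1 matching the [-1, 1] interval (the function name and β parameterization are assumptions):

```python
def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1 loss: quadratic (L2-like) for |diff| < beta, linear (L1-like)
    beyond it, so it is differentiable at 0 yet robust to outliers."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta
```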
Step 105, obtaining the overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalization regression loss.
In some embodiments, step 105 may further comprise:
The text region frame regression loss and the angle normalization regression loss are added to obtain the overall regression loss.
Therefore, in the embodiments of the present disclosure, the overall regression loss comprises two parts, the text region frame regression loss and the angle normalization regression loss, and the regression of the angle is independent of the regression of the text region frame, so the regression accuracy of the text region frame is not affected.
Step 106, adjusting the target detection model to be trained according to the overall regression loss to obtain a trained target detection model.
Adjusting the target detection model to be trained may include adjusting parameters such as the weights of the target detection model to be trained.
To achieve a high detection rate, the number of graphic sample data used in each step may be plural. Multiple graphic sample data are acquired in step 101; in step 102, label information for the text graphic sample area in each graphic sample data is obtained through labeling; steps 103 to 106 may then be repeated over different graphic sample data until the loss difference converges to a preset range or the iteration reaches a set number of times, completing the training of the target detection model.
Step 107, detecting the graphic data to be detected based on the trained target detection model to obtain a text region detection frame associated with the text graphic region in the graphic data to be detected.
Fig. 6 is a schematic diagram illustrating a process of obtaining a text region detection frame based on a trained target detection model according to an exemplary embodiment, and as shown in fig. 6, step 107 may specifically include the following steps 601 to 603.
Step 601, inputting the graphic data to be detected into the trained target detection model, and obtaining a text region frame detection value and an angle normalization detection value of the text graphic region through the trained target detection model.
In some embodiments, the text region box detection value includes a normalized coordinate value of a center point of the text region detection box in the graphics data to be detected, a normalized width value of the text region detection box, a normalized height value of the text region detection box, i.e., the text region box detection value includes a normalized x, y, w, h of the text region detection box.
Step 602, obtaining an angle value according to the angle normalization detection value.
In some embodiments, step 602 comprises obtaining the angle value from the angle normalization detection value by inverting the normalization formula given above, mapping a value in [-1, 1) back to an angle in [0°, 360°).
The curve relating the angle normalization detection value to the angle value follows the same relationship as that between the normalized angle label value and the inclination angle shown in Fig. 5. The angle normalization detection value obtained in step 602 therefore uniquely determines an angle value, namely the included angle between the text region detection frame and the horizontal (x) axis of the graphic data to be detected.
Step 603, obtaining the text region detection frame according to the text region frame detection value and the angle value.
Fig. 7 is a schematic diagram illustrating a process of detecting a frame according to a text region frame detection value and an angle value according to an exemplary embodiment, and as shown in fig. 7, step 603 may specifically include the following steps 701 to 703.
Step 701, obtaining the position of the center point of the text region detection frame in the graphic data to be detected and the width and height of the text region detection frame according to the text region frame detection value.
Step 702, obtaining the inclination angle of the text region detection frame relative to the horizontal coordinate axis of the graphic data to be detected according to the angle value.
Step 703, obtaining the text region detection frame according to the position of the center point of the text region detection frame in the graphic data to be detected, the width and the height of the text region detection frame, and the inclination angle of the text region detection frame relative to the coordinate transverse axis of the graphic data to be detected.
In step 703, a positive rectangular frame determined by the center-point position, width and height is rotated about its center point in the graphic data to be detected to obtain the text region detection frame. The rotation angle is the inclination angle of the text region detection frame relative to the horizontal coordinate axis of the graphic data to be detected, and the rotation direction matches the inclination angle: if the inclination angle is the included angle, measured counterclockwise, between the horizontal coordinate axis and the text region detection frame, the rotation is counterclockwise. For example, if the inclination angle obtained in step 702 is 45°, the frame is rotated counterclockwise by 45° in step 703.
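The rotation in step 703 can be sketched as follows; this is a hedged helper using standard math-convention coordinates (counterclockwise positive), and the names are assumptions:

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Rotate an upright w-by-h rectangle counterclockwise about its
    center (cx, cy) by angle_deg and return its four corner points."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]
```

In image coordinates, where the y axis points downward, the visual sense of the rotation is flipped.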
Note that in step 703 it is the text region detection frame that is rotated, not the text: rotating the frame to the inclination angle is sufficient to enclose the text content in the graphic data to be detected. In a subsequent step, the enclosed text content may be rotated back to the forward direction according to the inclination angle, in the direction opposite to the inclination angle; that is, if the inclination angle is the included angle, measured counterclockwise, between the horizontal coordinate axis of the graphic data to be detected and the text region detection frame, the text content is rotated clockwise. For example, if the inclination angle obtained in step 702 is 45°, the enclosed text content is rotated clockwise by 45°.
In some embodiments, after obtaining the text region detection frame, the method for detecting a rotating text according to the embodiments of the present disclosure may further include:
and displaying the text region detection frame on the graphic data to be detected.
With the above steps, the text graphic area in the graphic data to be detected can be displayed and tracked using the text region detection frame, which helps text-inspection personnel quickly locate and observe text appearing in the graphic data to be detected.
In some embodiments, after the text region detection frame is obtained, the content in the text region detection frame may be extracted (e.g., by automatic interception), and according to the angle value obtained in step 602, the content in the text region detection frame is rotated into a rectangular frame, so that the text content in the text region detection frame is in a horizontal arrangement direction during normal reading, and then the extracted text content is identified by using an OCR text recognition method. In some embodiments, the rotation of the content in the text region detection box into a positive rectangular box may be performed as follows:
rotating the content in the text region detection frame into a positive rectangular frame according to the inclination angle, with the rotation direction opposite to the inclination angle; that is, if the inclination angle is the included angle, measured counterclockwise, between the horizontal coordinate axis of the graphic data to be detected and the text region detection frame, the content in the text region detection frame is rotated clockwise (for example, if the inclination angle of the text region detection frame obtained in step 702 is 45°, the content is rotated clockwise by 45°);
and identifying and extracting the text content from the rotated positive rectangular frame.
According to the rotating text detection method of the embodiments of the present disclosure, a normalized angle label value is obtained by labeling the graphic sample data, in addition to the text region frame label value, and an output dimension for the normalized angle is added to the target detection model, whose overall regression loss comprises independent text region frame regression loss and angle normalization regression loss. Because the regression of the text region frame and the regression of the angle are independent, the detected rotating text region detection frame is more accurate.
The content in the text region can include text in documents, road signs, traffic signs, vehicle license plates, building surfaces, container surfaces and various other object surfaces, and the graphic data can be pictures. The rotating text detection method of the embodiments of the present disclosure can detect text content presented in graphic data captured at various shooting angles, meeting the need for end-to-end recognition of text content in various shooting scenarios and addressing the difficulty of recognizing text content quickly and accurately in complex environments. For example, in some application scenarios the image capturing device cannot sufficiently adjust its shooting angle to guarantee that the captured content is presented upright; with the rotating text detection method of the embodiments of the present disclosure, quick end-to-end text detection can be performed on a captured guideboard picture containing unfamiliar-language text that is not presented upright, without adjusting the shooting angle of the image capturing device in an outdoor scene, which can greatly improve user experience.
In contrast to the technical solution of the present disclosure, some other existing detection methods use a rotated rectangular frame as the anchor frame, so a balance must be struck between angle regression and rectangular-frame regression, which affects the regression of the rectangular frame and reduces regression accuracy. In the present disclosure, the text region frame regression loss and the angle normalization regression loss are obtained separately for the prediction frame and the prediction angle and are independent of each other, so the angle regression and the rectangular-frame regression do not interfere; moreover, the anchor frame in the present disclosure is a positive rectangular frame rather than a rotated one. By comparison, in existing rotated-rectangle detection methods whose anchor frames include inclined rectangles, the angle regression and the rectangular-frame regression influence each other, a balance between angle prediction and frame prediction is required, and the regression of the rectangular frame is compromised in the process, reducing regression accuracy.
In addition, in the embodiments of the present disclosure, YOLOv3, YOLOv4 or YOLOv5 is adopted as the target detection model, and end-to-end detection can be realized thanks to their lightweight network characteristics. Text detection based on the YOLO series and on the trigonometric relationship between the angle and the normalized angle has important application value in fields such as OCR text detection and recognition. Beyond this, the technical solution of the present disclosure may be applied, within its spirit and principles, to various general rectangular-frame detection methods, of which the YOLO series is only one example. Other general rectangular-frame detection methods, such as a CenterNet model that regresses the four corner coordinates of a box from a heat map, can likewise add a dimension for the inclination angle within the scope of the present disclosure. The rotating text detection method of the embodiments of the present disclosure can be used with any rectangular-frame detection method whose prediction output has a fixed dimensionality; for example, the YOLO series has a fixed output (x, y, w, h, score, class), and only one additional dimension is needed for the regressed angle.
According to the rotating text detection method of the embodiments of the present disclosure, detection of a rotated rectangular frame can be realized based on YOLOv3, YOLOv4 or YOLOv5 without adding excessive parameters; compared with other rotated-rectangle methods, the target detection model is lighter and achieves end-to-end detection of text at any angle. The angle regression is mapped to the interval [-1, 1) and is continuous over [0°, 360°), which makes angle prediction simpler and avoids the oscillation around 90° and 270° seen in many current rotated-rectangle detection methods. In many existing rotated-rectangle detection methods, the anchor frame itself is inclined and the angle of the text rectangle is computed with only the horizontal axis as the baseline, so upside-down text sometimes causes angle confusion; moreover, when the angle is computed from the tangent function, the function is discontinuous at 90° and 270°. In the scheme of the present disclosure, the angle is set according to the text direction, so even flipped text can be distinguished: for example, the horizontal direction is 0° and the flipped direction is 180°, which map to different values in [-1, 1) under the trigonometric normalization.
In addition, many conventional OCR text detection schemes, for example PaddleOCR, use only pixel-level segmentation, which cannot determine the text direction, so an additional text-direction classification step is needed. Fig. 8 is a schematic diagram of a conventional OCR text detection scheme according to an exemplary embodiment. As shown in Fig. 8, in a conventional scheme using pixel-level segmentation, DBNet pixel-level text detection is adopted in the text detection stage and a text-direction classification model is adopted in the detection-frame adjustment stage. As an example, for the vertical "ODM OEM" text on the bottle shown in Fig. 8, after the vertical detection frame for the "ODM OEM" text is obtained, the frame undergoes two successive rotation operations before the "ODM OEM" information is obtained through text recognition and output.
Fig. 9 is a flowchart of an application scenario of a method for detecting rotation characters according to an exemplary embodiment, as shown in fig. 9, the flowchart includes the following steps 901 to 913.
Step 901, obtaining a plurality of graphic sample data containing text graphic sample areas, and then proceeding to step 902.
Step 902, labeling each text graphic sample area to obtain a text region frame label value and a normalized angle label value for each text graphic sample area, and then entering step 903.
The text graphic sample area can be labeled using related labeling software. In the labeling process, the corners are labeled in a fixed order: the upper-left corner as the first labeling point, the upper-right corner as the second, the lower-right corner as the third and the lower-left corner as the fourth.
The label value of the text region frame comprises x and y coordinates of a normalized center point of the text region frame and normalized width w and height h of the text region frame.
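As a hedged sketch, the four labeled corner points can be converted into these label values; the helper and its conventions (pixel inputs, math-style counterclockwise angle of the top edge) are assumptions, not the patent's exact procedure:

```python
import math

def corners_to_label(p1, p2, p3, p4, img_w, img_h):
    """p1..p4: top-left, top-right, bottom-right, bottom-left corners (pixels).
    Returns normalized center x, y, width, height and the inclination angle
    in degrees of the top edge relative to the horizontal axis."""
    cx = (p1[0] + p2[0] + p3[0] + p4[0]) / 4.0 / img_w
    cy = (p1[1] + p2[1] + p3[1] + p4[1]) / 4.0 / img_h
    w = math.dist(p1, p2) / img_w
    h = math.dist(p2, p3) / img_h
    theta = math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0])) % 360.0
    return cx, cy, w, h, theta
```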
The normalized angle label value is obtained from the inclination angle of the text graphic sample area relative to the horizontal coordinate axis of the graphic sample data, using the normalization formula described above.
step 903, inputting one graphic sample data in all graphic sample data into a target detection model, obtaining a text region frame predicted value and a normalized angle predicted value of a text graphic sample region in the graphic sample data through the target detection model, and then entering step 904.
Step 904, obtaining a text region frame regression loss of the graphic sample data according to the text region frame predicted value and the text region frame label value of the text graphic sample region in the graphic sample data, obtaining an angle normalized regression loss according to the normalized angle predicted value and the normalized angle label value of the text graphic sample region in the graphic sample data, adding the text region frame regression loss and the angle normalized regression loss to obtain an overall regression loss of the target detection model, and then entering step 905.
Step 905, judging whether the training completion condition of the target detection model is satisfied, if so, proceeding to step 907, otherwise proceeding to step 906.
The training completion condition includes that the difference converges to a preset range or the iteration reaches a set number of times.
Step 906, adjusting the target detection model according to the overall regression loss, and returning to step 903.
Adjusting the target detection model may include adjusting parameters such as the weights of the target detection model.
Step 907, training of the target detection model is completed, and then step 908 is performed.
Step 908, inputting the graphic data to be detected into a target detection model, obtaining a text region frame detection value and an angle normalization detection value of a text graphic region in the graphic data to be detected through the target detection model, and then entering step 909.
Step 909, obtaining an angle value according to the angle normalization detection value, and then proceeding to step 910 and step 911.
The angle value is obtained from the angle normalization detection value by inverting the normalization formula described above.
Step 910, according to the text region frame detection value, the position of the center point of the text region detection frame in the graphic data to be detected, and the width and the height of the text region detection frame are obtained, and then step 912 is performed.
Step 911, obtaining the inclination angle of the text region detection frame relative to the coordinate horizontal axis of the graphic data to be detected according to the angle value, and then proceeding to step 912.
Step 912, obtaining a text region detection frame according to the position of the center point of the text region detection frame in the graphic data to be detected, the width and the height of the text region detection frame, and the inclination angle of the text region detection frame relative to the coordinate horizontal axis of the graphic data to be detected, and then entering step 913.
Step 913, displaying the text region detection frame on the graphic data to be detected.
Fig. 10 is a schematic structural diagram of a rotary text detection device according to an exemplary embodiment, and the rotary text detection device shown in fig. 10 includes a graphic sample data obtaining module 1001, a tag information obtaining module 1002, a predicted value obtaining module 1003, an area frame and angle regression loss obtaining module 1004, an overall regression loss obtaining module 1005, a model training module 1006, and a graphic detection module 1007.
The graphic sample data obtaining module 1001 is configured to obtain graphic sample data containing a text graphic sample area.
The tag information obtaining module 1002 is configured to obtain tag information of the text graphic sample area according to the labeling of the text graphic sample area, where the tag information includes a text area frame tag value and a normalized angle tag value associated with the text graphic sample area, and the normalized angle tag value characterizes an inclination angle of the text graphic sample area with respect to a coordinate horizontal axis of the graphic sample data.
The predicted value obtaining module 1003 is configured to input the graphic sample data into the target detection model to be trained and obtain, through the model, a text region frame predicted value and a normalized angle predicted value associated with the text graphic sample region.
The region frame and angle regression loss obtaining module 1004 is configured to obtain a text region frame regression loss according to the text region frame prediction value and the text region frame label value, and obtain an angle normalized regression loss according to the normalized angle prediction value and the normalized angle label value.
The overall regression loss obtaining module 1005 is configured to obtain the overall regression loss associated with the text graphic sample region according to the text region frame regression loss and the angle normalization regression loss.
The model training module 1006 is configured to perform adjustment of the target detection model to be trained according to the overall regression loss, resulting in a trained target detection model.
The graphic detection module 1007 is configured to perform detection on the graphic data to be detected based on the trained target detection model, so as to obtain a text region detection frame associated with the text graphic region in the graphic data to be detected.
In some embodiments, the text region box label value includes a normalized coordinate value of a center point of the text graphic sample area in the graphic sample data, a normalized width value of the text graphic sample area, a normalized height value of the text graphic sample area.
In some embodiments, the normalized angle label value is obtained from the inclination angle of the text graphic sample area relative to the horizontal coordinate axis of the graphic sample data, using the normalization formula described above.
in some embodiments, the text region box predictor includes a normalized coordinate predictor of a center point of the text graphic sample area in the graphic sample data, a normalized width predictor of the text graphic sample area, a normalized height predictor of the text graphic sample area.
In some embodiments, the text region frame regression loss is a Generalized Intersection over Union (GIoU) loss function, and the angle normalization regression loss is a Smooth L1 (smooth mean absolute error) loss function.
In some embodiments, the overall regression loss obtaining module 1005 is further configured to add the text region frame regression loss and the angle normalization regression loss to obtain the overall regression loss.
In some embodiments, the graphic detection module 1007 further includes a detection value acquisition sub-module, an angle value acquisition sub-module, and a text region detection box acquisition sub-module.
The detection value obtaining sub-module is configured to input the graphic data to be detected into the trained target detection model, and obtain the detection value of the text region frame and the angle normalization detection value of the text graphic region through the trained target detection model.
The angle value obtaining sub-module is configured to obtain an angle value according to the angle normalization detection value.
The text region detection frame obtaining sub-module is configured to obtain a text region detection frame according to the text region frame detection value and the angle value.
In some embodiments, the text region frame detection value includes a normalized coordinate value of the center point of the text region detection frame in the graphic data to be detected, a normalized width value of the text region detection frame, and a normalized height value of the text region detection frame.
In some embodiments, the angle value obtaining sub-module is further configured to obtain the angle value using the following formula:
normal_al = sin[(θ − π)/2]

wherein normal_al is the angle normalized detection value, and θ is the angle value.
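Inverting this formula recovers the angle value from the network output. The sketch below assumes the inverse θ = 2·arcsin(normal_al) + π, which follows from the formula above because arcsin maps [−1, 1] to [−π/2, π/2]; the function name and the conversion back to degrees are illustrative assumptions.

```python
import math

def angle_from_normalized(normal_al: float) -> float:
    """Invert normal_al = sin((theta - pi) / 2): since arcsin maps
    [-1, 1] to [-pi/2, pi/2], theta = 2 * arcsin(normal_al) + pi
    lies in [0, 2*pi], i.e. [0 deg, 360 deg]."""
    theta = 2.0 * math.asin(normal_al) + math.pi
    return math.degrees(theta)
```

Since the forward mapping is monotonic over [0°, 360°), each detection value corresponds to exactly one inclination angle, so no extra disambiguation step is needed.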
In some embodiments, the text region detection frame obtaining sub-module further includes:
the detection frame basic information obtaining sub-module is configured to obtain the position of the center point of the text region detection frame in the graphic data to be detected and the width and the height of the text region detection frame according to the text region frame detection value;
the detection frame inclination angle obtaining sub-module is configured to obtain an inclination angle of the text region detection frame relative to a coordinate transverse axis of the graphic data to be detected according to the angle value;
The detection frame obtaining sub-module is configured to obtain the text region detection frame according to the position of the center point of the text region detection frame in the graphic data to be detected, the width and the height of the text region detection frame and the inclination angle of the text region detection frame relative to the coordinate transverse axis of the graphic data to be detected.
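The assembly of the detection frame from its center point, width, height, and inclination angle can be sketched as a corner computation. This is a minimal illustration assuming the angle is measured counterclockwise in degrees about the center point; the disclosure does not fix these conventions, and the function name is hypothetical.

```python
import math

def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Corners of a text region detection frame: the axis-aligned box
    centered at (cx, cy) with width w and height h, rotated about its
    center by theta_deg."""
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    corners = []
    # Offsets of the four corners before rotation, in reading order.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Standard 2D rotation of each offset, then shift back to the center.
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return corners
```

A zero angle reproduces the axis-aligned frame, and any other angle rotates all four corners rigidly about the center point, which is exactly the geometric operation the three sub-modules above combine to perform.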
In some embodiments, the rotary text detection device further comprises:
The detection frame presenting module is configured to present the text region detection frame on the graphic data to be detected.
According to the above rotary text detection device, a normalized angle label value is obtained, in addition to the text region frame label value, by labeling the graphic sample data, and an output dimension for the normalized angle is added to the target detection model, where the overall regression loss of the target detection model includes an independent text region frame regression loss and an independent angle normalized regression loss. In the embodiments of the present disclosure, the regression of the text region frame and the regression of the angle are independent of each other, so that the detected rotating text region detection frame is more accurate.
With respect to the rotary text detection device in the above embodiments, the specific manner in which the respective units perform operations has been described in detail in the embodiments of the rotary text detection method and will not be repeated here.
It should be noted that the above embodiments are described only with the division of functional modules given above as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. In some embodiments, the electronic device is a server. The electronic device 1100 may include one or more processors (Central Processing Units, CPU) 1101 and one or more memories 1102, where the memories 1102 store at least one program code that is loaded and executed by the processors 1101 to implement the rotary text detection method provided by the above embodiments. Of course, the electronic device 1100 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described in detail here.
In an exemplary embodiment, a computer readable storage medium is also provided, such as a memory, including at least one instruction executable by a processor in a computer device to perform the rotary text detection method in the above embodiments.
Alternatively, the above-described computer-readable storage medium may be a non-transitory computer-readable storage medium, which may include, for example, ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, and the like.
The foregoing description of the preferred embodiments of the present disclosure is not intended to limit the disclosure, but rather to cover all modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present disclosure.

Claims (13)

1. A method of detecting a rotating text, comprising:
obtaining graphic sample data containing a text graphic sample area;
obtaining label information of the text graphic sample area according to the labeling of the text graphic sample area, wherein the label information comprises a text region frame label value and a normalized angle label value associated with the text graphic sample area, the normalized angle label value characterizes the inclination angle of the text graphic sample area relative to the coordinate transverse axis of the graphic sample data, and the inclination angle is set according to the text direction;
Inputting the graphic sample data into a target detection model to be trained, and obtaining a text region frame predicted value and a normalized angle predicted value which are related to the text graphic sample region through the target detection model to be trained;
obtaining a text region frame regression loss according to the text region frame predicted value and the text region frame label value, and obtaining an angle normalized regression loss according to the normalized angle predicted value and the normalized angle label value, wherein the text region frame regression loss and the angle normalized regression loss are obtained according to the text region frame predicted value and the normalized angle predicted value respectively, and the text region frame regression loss and the angle normalized regression loss are mutually independent;
obtaining an overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalized regression loss;
adjusting the target detection model to be trained according to the overall regression loss to obtain a trained target detection model;
detecting graphic data to be detected based on the trained target detection model to obtain a text region detection frame associated with a text graphic region in the graphic data to be detected, which comprises the following steps:
according to the position of the center point of the text region detection frame in the graphic data to be detected, the width and the height of the text region detection frame, and the inclination angle of the text region detection frame relative to the coordinate transverse axis of the graphic data to be detected, rotating, about the position of the center point as the axis, a frame defined by the width and the height by the magnitude of the inclination angle in the same direction as the inclination angle, so as to obtain the text region detection frame, the text region detection frame framing the text content in the graphic data to be detected; and
rotating the text content framed by the text region detection frame by the magnitude of the inclination angle in the direction opposite to the inclination angle, so that the text direction of the text content is adjusted to be upright.
2. The method for detecting rotary text according to claim 1, wherein:
the text region frame tag value comprises a normalized coordinate value of a center point of the text graphic sample area, a normalized width value of the text graphic sample area and a normalized height value of the text graphic sample area;
The normalized angle label value is obtained by:
label_al = sin[(θ_s − π)/2]
wherein label_al is the normalized angle label value, and θ_s is the inclination angle of the text graphic sample area relative to the coordinate transverse axis of the graphic sample data, with 0° ≤ θ_s < 360°.
3. The method for detecting rotary text according to claim 1, wherein:
the text region frame predicted value comprises a normalized coordinate predicted value of the center point of the text graphic sample area in the graphic sample data, a normalized width predicted value of the text graphic sample area, and a normalized height predicted value of the text graphic sample area.
4. The method for detecting rotary text according to claim 1, wherein:
the text region frame regression loss is a generalized intersection over union (GIoU) loss function;
the angle normalized regression loss is a smooth mean absolute error (Smooth L1) loss function.
5. The method of claim 1, wherein the obtaining the overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalized regression loss comprises:
adding the text region frame regression loss and the angle normalized regression loss to obtain the overall regression loss.
6. The method for detecting rotary text according to claim 1, wherein the detecting graphic data to be detected based on the trained target detection model to obtain a text region detection frame associated with a text graphic region in the graphic data to be detected comprises:
inputting the graphic data to be detected into the trained target detection model, and obtaining a text region frame detection value and an angle normalization detection value of the text graphic region through the trained target detection model;
obtaining an angle value according to the angle normalized detection value;
and obtaining the text region detection frame according to the text region frame detection value and the angle value.
7. The method for detecting rotary text according to claim 6, wherein:
the text region frame detection value comprises a normalized coordinate value of the center point of the text region detection frame, a normalized width value of the text region detection frame, and a normalized height value of the text region detection frame.
8. The method for detecting rotary text according to claim 6, wherein the obtaining an angle value according to the angle normalized detection value includes obtaining the angle value using the following formula:
normal_al = sin[(θ − π)/2]
wherein normal_al is the angle normalized detection value, and θ is the angle value.
9. The method for detecting rotary text according to claim 6, wherein the obtaining the text region detection frame according to the text region frame detection value and the angle value includes:
according to the text region frame detection value, the position of the center point of the text region detection frame in the graphic data to be detected and the width and the height of the text region detection frame are obtained;
obtaining the inclination angle of the text region detection frame relative to the coordinate transverse axis of the graphic data to be detected according to the angle value;
and obtaining the text region detection frame according to the position of the center point of the text region detection frame in the graphic data to be detected, the width and the height of the text region detection frame, and the inclination angle of the text region detection frame relative to the coordinate transverse axis of the graphic data to be detected.
10. The method according to claim 1, wherein after obtaining the text region detection frame, the method further comprises:
and presenting the text region detection frame on the graphic data to be detected.
11. A rotary text detection device, comprising:
a graphic sample data acquisition module configured to perform acquisition of graphic sample data containing a text graphic sample area;
a tag information obtaining module configured to obtain tag information of the text graphic sample area according to the labeling of the text graphic sample area, the tag information including a text area frame tag value and a normalized angle tag value associated with the text graphic sample area, wherein the normalized angle tag value characterizes an inclination angle of the text graphic sample area with respect to a coordinate horizontal axis of the graphic sample data, the inclination angle being set according to a text direction;
the predicted value obtaining module is configured to input the graphic sample data into a target detection model to be trained, and obtain a text region frame predicted value and a normalized angle predicted value which are associated with the text graphic sample region through the target detection model to be trained;
the region frame and angle regression loss obtaining module is configured to obtain a text region frame regression loss according to the text region frame predicted value and the text region frame label value, and to obtain an angle normalized regression loss according to the normalized angle predicted value and the normalized angle label value, wherein the text region frame regression loss and the angle normalized regression loss are obtained according to the text region frame predicted value and the normalized angle predicted value respectively, and are mutually independent;
the overall regression loss obtaining module is configured to obtain an overall regression loss associated with the text graphic sample area according to the text region frame regression loss and the angle normalized regression loss;
the model training module is configured to execute the adjustment of the target detection model to be trained according to the overall regression loss to obtain a trained target detection model;
a graphics detection module configured to:
detecting the graphic data to be detected based on the trained target detection model by the following operations, so as to obtain a text region detection frame associated with a text graphic region in the graphic data to be detected:
according to the position of the center point of the text region detection frame in the graphic data to be detected, the width and the height of the text region detection frame, and the inclination angle of the text region detection frame relative to the coordinate transverse axis of the graphic data to be detected, rotating, about the position of the center point as the axis, a frame defined by the width and the height by the magnitude of the inclination angle in the same direction as the inclination angle, so as to obtain the text region detection frame, the text region detection frame framing the text content in the graphic data to be detected; and
rotating the text content framed by the text region detection frame by the magnitude of the inclination angle in the direction opposite to the inclination angle, so that the text direction of the text content is adjusted to be upright.
12. An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to implement the method of detecting rotational text as claimed in any one of claims 1 to 10.
13. A computer readable storage medium, characterized in that at least one instruction in the computer readable storage medium, when executed by a processor of an electronic device, enables the electronic device to implement the rotational text detection method of any one of claims 1 to 10.
CN202211219674.1A 2022-10-08 2022-10-08 Method and device for detecting rotary text Active CN115359493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211219674.1A CN115359493B (en) 2022-10-08 2022-10-08 Method and device for detecting rotary text


Publications (2)

Publication Number Publication Date
CN115359493A CN115359493A (en) 2022-11-18
CN115359493B true CN115359493B (en) 2023-09-08

Family

ID=84008557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211219674.1A Active CN115359493B (en) 2022-10-08 2022-10-08 Method and device for detecting rotary text

Country Status (1)

Country Link
CN (1) CN115359493B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000113106A (en) * 1998-10-09 2000-04-21 Fuji Xerox Co Ltd Document image processor
US9224061B1 (en) * 2014-07-24 2015-12-29 Amazon Technologies, Inc. Text orientation estimation in camera captured OCR
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN111259846A (en) * 2020-01-21 2020-06-09 第四范式(北京)技术有限公司 Text positioning method and system and text positioning model training method and system
CN111353489A (en) * 2020-02-27 2020-06-30 平安国际智慧城市科技股份有限公司 Text image processing method and device, computer equipment and storage medium
CN111444918A (en) * 2020-04-01 2020-07-24 中移雄安信息通信科技有限公司 Image inclined text line detection model training and image inclined text line detection method
WO2020223859A1 (en) * 2019-05-05 2020-11-12 华为技术有限公司 Slanted text detection method, apparatus and device
CN112287927A (en) * 2020-10-14 2021-01-29 中国人民解放军战略支援部队信息工程大学 Method and device for detecting inclination angle of text image
CN113569194A (en) * 2021-06-10 2021-10-29 中国人民解放军海军工程大学 Rotating rectangular box representation and regression method for target detection
CN114037822A (en) * 2021-10-28 2022-02-11 多伦科技股份有限公司 Method and system for detecting driving license
CN114266884A (en) * 2021-12-13 2022-04-01 浙江工业大学 Method for detecting sorting target of multi-form bottle-shaped articles positioned by rotating frame
CN114359906A (en) * 2021-12-10 2022-04-15 武汉科技大学 Network image text recognition method and system based on multi-scale feature fusion



Similar Documents

Publication Publication Date Title
CN108520229B (en) Image detection method, image detection device, electronic equipment and computer readable medium
US20240078646A1 (en) Image processing method, image processing apparatus, and non-transitory storage medium
EP3680609A1 (en) Antenna downward inclination angle measurement method based on multi-scale deep semantic segmentation network
JP7390730B2 (en) Analysis of captured images to determine test conclusions
EP3309751B1 (en) Image processing device, method, and program
US9787960B2 (en) Image processing apparatus, image processing system, image processing method, and computer program
US7965904B2 (en) Position and orientation measuring apparatus and position and orientation measuring method, mixed-reality system, and computer program
CN111428717B (en) Text recognition method, text recognition device, electronic equipment and computer readable storage medium
TWI716012B (en) Sample labeling method, device, storage medium and computing equipment, damage category identification method and device
CN104167109A (en) Detection method and detection apparatus for vehicle position
CN110260857A (en) Calibration method, device and the storage medium of vision map
CN111429482A (en) Target tracking method and device, computer equipment and storage medium
WO2021239156A1 (en) Traffic target recognition model training method, and traffic target positioning method and apparatus
KR20160128930A (en) Apparatus and method for detecting bar-type traffic sign in traffic sign recognition system
CN117152484B (en) Small target cloth flaw detection method based on improved YOLOv5s
CN110377670B (en) Method, device, medium and equipment for determining road element information
CN114549390A (en) Circuit board detection method, electronic device and storage medium
CN108052869B (en) Lane line recognition method, lane line recognition device and computer-readable storage medium
CN117876994A (en) Parking space recognition model training method, parking space recognition method and related device
CN105631849B (en) The change detecting method and device of target polygon
CN115359493B (en) Method and device for detecting rotary text
CN117115823A (en) Tamper identification method and device, computer equipment and storage medium
WO2022179016A1 (en) Lane detection method and apparatus, device, and storage medium
CN116203976A (en) Indoor inspection method and device for transformer substation, unmanned aerial vehicle and storage medium
CN112232132A (en) Target identification and positioning method fusing navigation information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant