WO2022095318A1

WO2022095318A1 - Character detection method and apparatus, electronic device, storage medium, and program

Info

Publication number: WO2022095318A1
Application number: PCT/CN2021/080318
Authority: WO
Inventors: 毕研广; 胡志强
Original assignee: 上海商汤智能科技有限公司
Priority date: 2020-11-06
Filing date: 2021-03-11
Publication date: 2022-05-12
Also published as: TW202219822A; CN112348025B; CN112348025A

Abstract

The present disclosure relates to a character detection method and apparatus, an electronic device, a storage medium, and a program, the method comprising: respectively performing prediction on multiple boundary lines of a first character sequence in an image to be processed to obtain prediction parameters of the multiple boundary lines of the first character sequence, the boundary lines of the first character sequence representing the boundary lines between an area in which the first character sequence is located and an area in which the first character sequence is not located; on the basis of the prediction parameters of the multiple boundary lines of the first character sequence, determining position information of the vertices of a boundary box of the first character sequence; and, on the basis of the position information of the vertices of the boundary box of the first character sequence, determining position information of the boundary box of the first character sequence. The embodiments of the present disclosure can increase the accuracy of character detection.

Description

Character detection method, device, electronic device, storage medium and program

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is based on a Chinese patent application with application number 202011229418.1 and an application date of November 06, 2020, and claims the priority of the Chinese patent application, the entire contents of which are incorporated herein by reference.

technical field

The present disclosure relates to the technical field of computer vision, and in particular, to a character detection method, device, electronic device, storage medium and program.

Background technique

Character detection in natural scenes is an important research field in computer vision, and has been applied to a variety of application scenarios, such as real-time text translation, document recognition, license plate recognition, etc. In related technologies, characters are in a rigid plane in practical application scenarios. , however, during the imaging process, due to the distortion and distortion of the camera's angle of view, the characters on the rigid plane appear as irregular arbitrary quadrilateral shapes; for these characters, it is necessary to accurately return and locate their four boundaries before subsequent character recognition can be performed. The correct character shape is corrected in the link, so as to correctly identify the character content.

SUMMARY OF THE INVENTION

The present disclosure provides a technical solution for character detection.

An embodiment of the present disclosure provides a character detection method, including:

Predicting multiple boundary lines of the first character sequence in the image to be processed respectively, to obtain prediction parameters of multiple boundary lines of the first character sequence, wherein the boundary line of the first character sequence represents the first character The dividing line between the region where the sequence is located and the region not where the first character sequence is located;

According to the prediction parameters of a plurality of boundary lines of the first character sequence, determine the position information of the vertices of the bounding box of the first character sequence;

According to the position information of the vertices of the bounding box of the first character sequence, the position information of the bounding box of the first character sequence is determined. In this way, by predicting the prediction parameters of multiple boundary lines of the first character sequence in the image to be processed, the position information of the vertices of the bounding box of the first character sequence is determined, and according to the position of the vertices of the bounding box of the first character sequence information, determine the position information of the bounding box of the first character sequence, thereby disassembling the polygon (such as quadrilateral) bounding box of the character sequence into multiple (such as four) independent boundary lines, and carry out a separate process for each independent boundary line. Therefore, the detection of each boundary line will not be disturbed by two different vertices, thereby improving the accuracy of character detection.

In some embodiments of the present disclosure, the multiple boundary lines of the first character sequence in the image to be processed are predicted respectively, and the prediction parameters of the multiple boundary lines of the first character sequence are obtained, including:

Based on the to-be-processed image, with respect to the first feature point related to the first character sequence, respectively predict the parameters corresponding to the first feature point of the plurality of boundary lines of the first character sequence;

The prediction parameters of the plurality of boundary lines of the first character sequence are determined according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature points. In this way, based on the image to be processed, for the first feature points related to the first character sequence, the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature points are respectively predicted, and according to the multiple boundaries of the first character sequence The line corresponds to the parameters of the first feature point, and the prediction parameters of the plurality of boundary lines of the first character sequence are determined, thereby predicting the parameters of the boundary line of the first character sequence based on the feature points related to the first character sequence, thereby It helps to improve the efficiency of obtaining the prediction parameters of the boundary line, and helps to improve the accuracy of the obtained prediction parameters.

In some embodiments of the present disclosure, the method further includes:

Predict the probability that the position of the pixel in the to-be-processed image belongs to the character;

The first feature point is determined according to the probability that the position of the pixel in the image to be processed belongs to a character. In this way, by predicting the probability that the position of the pixel in the image to be processed belongs to the character, and determining the first feature point according to the probability that the position of the pixel in the image to be processed belongs to the character, it is possible to accurately determine the first feature point related to the first character sequence the first feature point of . Predicting the parameters of the boundary line of the first character sequence based on the first feature point thus determined helps to further improve the efficiency of obtaining the prediction parameters of the boundary line, and helps to further improve the accuracy of the obtained prediction parameters .

In some embodiments of the present disclosure, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point include:

Distance parameters and angle parameters of multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, wherein the polar coordinate system corresponding to the first feature point is represented by the first feature point. A polar coordinate system in which the feature points are poles. In this way, by mapping the straight line equation of the boundary line in the Cartesian coordinate system to the polar coordinate system, the distance parameter and angle parameter with clear physical meaning and independent of each other in the image are obtained, which reduces the parameter amount and correlation, and Conducive to online learning.

In some embodiments of the present disclosure, the prediction parameters of the plurality of boundary lines of the first character sequence are determined according to parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature points, include:

Mapping the distance parameters and angle parameters of the plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to the Cartesian coordinate system to obtain the plurality of boundary lines of the first character sequence A parameter corresponding to the first feature point in the Cartesian coordinate system;

According to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system, the prediction parameters of the plurality of boundary lines of the first character sequence are determined. In this way, by mapping the distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to the Cartesian coordinate system, the multiple boundary lines of the first character sequence are obtained in the Cartesian coordinate system. The parameters corresponding to the first feature point in the coordinate system, and according to the parameters corresponding to the first feature point of the multiple boundary lines of the first character sequence in the Cartesian coordinate system, the prediction of multiple boundary lines of the first character sequence is determined parameters, so that the predicted parameters of the boundary line can be obtained by regression based on the parameters in different polar coordinate systems.

In some embodiments of the present disclosure, the plurality of boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line and a left boundary line of the first character sequence. In this way, since in most cases, the shape of the character sequence is a quadrilateral, according to this implementation, it is helpful to obtain more accurate position information of the bounding box of the character sequence in most cases.

In some embodiments of the present disclosure, based on the to-be-processed image, for the first feature points related to the first character sequence, it is respectively predicted that a plurality of boundary lines of the first character sequence corresponds to the first character sequence. Parameters of feature points, including:

Input the image to be processed into a pre-trained neural network, and through the neural network, for the first feature points related to the first character sequence, respectively predict that multiple boundary lines of the first character sequence correspond to the first character sequence. Parameters of feature points. In this way, for the first feature points related to the first character sequence, the pre-trained neural network respectively predicts the parameters corresponding to the first feature points of the multiple boundary lines of the first character sequence, thereby improving the speed of parameter prediction, And can improve the accuracy of the predicted parameters.

In some embodiments of the present disclosure, the method further includes:

The probability that the position of the pixel in the image to be processed belongs to a character is predicted through the neural network. In this way, the pre-trained neural network predicts the probability that the pixel location in the image to be processed belongs to the character, thereby improving the speed of predicting the probability that the pixel location belongs to the character, and improving the accuracy of the predicted probability.

In some embodiments of the present disclosure, before inputting the to-be-processed image into a pre-trained neural network, the method further includes:

Input the training image into the neural network, and through the neural network, for the second feature points related to the second character sequence in the training image, respectively predict that a plurality of boundary lines of the second character sequence correspond to the The predicted value of the parameter of the second feature point;

According to the multiple boundary lines of the second character sequence corresponding to the predicted value of the parameter of the second feature point, and the multiple boundary lines of the second character sequence corresponding to the true value of the parameter of the second feature point value to train the neural network. In this way, by decomposing the polygon (such as quadrilateral) bounding box of the character sequence into multiple (such as four) independent boundary lines, each independent boundary line is detected separately, so that the neural network will not be affected by regression vertices. The training disturbance is brought about, thereby improving the learning efficiency and detection effect of the neural network, and the neural network trained according to this implementation can learn the ability to accurately predict the parameters of the boundary line of the character sequence.

In some embodiments of the present disclosure, the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point include: the plurality of boundary lines of the second character sequence are at the second feature point The distance parameter and the angle parameter under the corresponding polar coordinate system, wherein, the polar coordinate system corresponding to the second feature point represents a polar coordinate system with the second feature point as a pole;

The multiple boundary lines according to the second character sequence correspond to the predicted value of the parameter of the second feature point, and the multiple boundary lines of the second character sequence correspond to the parameter of the second feature point The true value of , train the neural network, including:

According to the predicted value of the distance parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence, and the distance parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence The true value of , train the neural network;

and / or,

According to the predicted value of the angle parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence, and the angle parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence The true value of , to train the neural network. In this way, by mapping the straight line equation in the Cartesian coordinate system to the polar coordinate system, the amount of parameters and correlation are reduced, and the actual physical meaning of the parameters is given, which is beneficial to network learning, and learning to detect the character sequence by training the neural network. Each boundary line corresponds to the distance and angle of the feature points, so that the detection of the boundary lines does not interfere with each other, so that the learning efficiency and detection effect of the neural network can be improved.

In some embodiments of the present disclosure, the plurality of boundary lines according to the second character sequence corresponds to the predicted value of the distance parameter of the second feature point, and the plurality of boundary lines of the second character sequence Corresponding to the true value of the distance parameter of the second feature point, training the neural network, including:

For any boundary line among the plurality of boundary lines of the second character sequence, according to the boundary line corresponding to the smaller value and the larger value of the distance parameter of the second feature point and the predicted value The ratio of , to train the neural network. In this way, for any one of the multiple boundary lines of the second character sequence, according to the ratio of the smaller value to the larger value among the true value and predicted value of the distance parameter corresponding to the boundary line to the second feature point, By training the neural network, the distance parameters of different sizes in different application scenarios can be normalized, which can help to perform multi-scale character detection, that is, it helps to achieve higher accuracy in character detection of different scales. .

In some embodiments of the present disclosure, the plurality of boundary lines according to the second character sequence corresponds to the predicted value of the angle parameter of the second feature point, and the plurality of boundary lines of the second character sequence Corresponding to the true value of the angle parameter of the second feature point, training the neural network, including:

For any one of the multiple boundary lines of the second character sequence, determine the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point;

The neural network is trained according to the sine of the half angle of the absolute value. In this way, for any one of the multiple boundary lines of the second character sequence, the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point is determined, and according to the absolute value of the difference The sine value of half the angle is used to train the neural network, so that the learning of the neural network will not be disturbed due to the confusion of 0 and 2π, thereby helping to improve the learning efficiency and detection effect of the neural network.

In some embodiments of the present disclosure, the second feature points include feature points in an effective area corresponding to the second character area. In this way, when calculating the loss function of the neural network, only the feature points in the valid region corresponding to the second character region are supervised, and the feature points outside the valid region corresponding to the second character region are not supervised, which helps to reduce the network burden.

In some embodiments of the present disclosure, the method further includes:

Predicting the probability that the position of the pixel in the training image belongs to a character via the neural network;

The neural network is trained according to the probability that the positions of the pixels in the training image belong to characters, and the labeled data that the positions of the pixels in the training images belong to the characters. In this way, the neural network can learn the ability to predict the probability that the position of the pixel belongs to the character.

In some embodiments of the present disclosure, the training of the neural network according to the probability that the position of the pixel in the training image belongs to a character, and the labeled data that the position of the pixel in the training image belongs to the character, includes:

The neural network is trained according to the probability that the position of the pixel in the valid region corresponding to the second character sequence belongs to the character, and the labeled data that the position of the pixel in the valid region belongs to the character. In this way, by training the neural network according to the probability that the position of the pixel in the effective area corresponding to the second character sequence belongs to the character, and the labeled data that the position of the pixel in the effective area belongs to the character, the neural network can learn the ability of character segmentation , and can improve the efficiency of neural network learning character segmentation.

In some embodiments of the present disclosure, the method further includes:

obtaining the position information of the real bounding box of the second character sequence;

According to the position information of the real bounding box and the preset ratio, the real bounding box is reduced to obtain an effective area corresponding to the second character sequence. In this way, the effective area corresponding to the second character sequence is obtained, and the neural network is trained based on the feature points in the effective area corresponding to the second character sequence, which helps to reduce the network burden.

In some embodiments of the present disclosure, according to the position information of the real bounding box and a preset ratio, reducing the real bounding box to obtain an effective area corresponding to the second character sequence, including:

Determine the anchor point of the real bounding box according to the position information of the real bounding box, wherein the anchor point of the real bounding box is the intersection of the diagonal lines of the real bounding box;

According to the position information of the real bounding box, the position information of the anchor point of the real bounding box, and the preset ratio, the real bounding box is reduced to obtain the effective area corresponding to the second character sequence, wherein the first The ratio of the distance to the second distance is equal to the preset ratio, the first distance represents the distance between the first vertex of the effective area and the anchor point, and the second distance represents the The distance between the vertex corresponding to the first vertex and the anchor point, where the first vertex represents any vertex of the effective area. In this way, the effective area corresponding to the second character sequence is obtained, and the neural network is trained based on the feature points in the effective area corresponding to the second character sequence, which helps to improve the learning efficiency and prediction accuracy of the neural network.

For descriptions of the effects of the following apparatuses, electronic devices, etc., reference may be made to the descriptions of the above-mentioned methods, which will not be repeated here.

The embodiment of the present disclosure also provides a character detection device, including:

The first prediction module is configured to respectively predict multiple boundary lines of the first character sequence in the image to be processed, and obtain prediction parameters of the multiple boundary lines of the first character sequence, wherein the boundary of the first character sequence The line represents the dividing line between the area where the first character sequence is located and the area not where the first character sequence is located;

a first determining module, configured to determine the position information of the vertices of the bounding box of the first character sequence according to the prediction parameters of a plurality of boundary lines of the first character sequence;

The second determining module is configured to determine the position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence.

In some embodiments of the present disclosure, the first prediction module is configured to, based on the to-be-processed image, respectively predict a plurality of boundary lines of the first character sequence for first feature points related to the first character sequence a parameter corresponding to the first feature point;

The prediction parameters of the plurality of boundary lines of the first character sequence are determined according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature points.

In some embodiments of the present disclosure, the apparatus further includes:

a second prediction module, configured to predict the probability that the position of the pixel in the to-be-processed image belongs to a character;

The third determining module is configured to determine the first feature point according to the probability that the position of the pixel in the image to be processed belongs to a character.

Distance parameters and angle parameters of multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, wherein the polar coordinate system corresponding to the first feature point is represented by the first feature point. A polar coordinate system in which the feature points are poles.

In some embodiments of the present disclosure, the first prediction module is configured to map distance parameters and angle parameters of a plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to A Cartesian coordinate system, to obtain the parameters corresponding to the first feature points of the plurality of boundary lines of the first character sequence under the Cartesian coordinate system;

According to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system, the prediction parameters of the plurality of boundary lines of the first character sequence are determined.

In some embodiments of the present disclosure, the plurality of boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line and a left boundary line of the first character sequence.

In some embodiments of the present disclosure, the first prediction module is configured to input the to-be-processed image into a pre-trained neural network, and through the neural network, respectively predict the first feature points related to the first character sequence through the neural network. A plurality of boundary lines of the first character sequence correspond to parameters of the first feature point.

In some embodiments of the present disclosure, the apparatus further includes:

The third prediction module is configured to predict the probability that the position of the pixel in the image to be processed belongs to the character through the neural network.

In some embodiments of the present disclosure, the apparatus further includes:

The fourth prediction module is configured to input the training image into the neural network, and through the neural network, for the second feature points related to the second character sequence in the training image, respectively predict the number of the second character sequence. A boundary line corresponds to the predicted value of the parameter of the second feature point;

The first training module is configured to correspond to the predicted value of the parameter of the second feature point according to the plurality of boundary lines of the second character sequence, and the plurality of boundary lines of the second character sequence correspond to the first The true values of the parameters of the two feature points, train the neural network.

The first training module is configured to correspond to the predicted value of the distance parameter of the second feature point according to the plurality of boundary lines of the second character sequence, and the plurality of boundary lines of the second character sequence correspond to the predicted value of the distance parameter of the second feature point. The true value of the distance parameter of the second feature point is used to train the neural network;

and / or,

According to the predicted value of the angle parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence, and the angle parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence The true value of , to train the neural network.

In some embodiments of the present disclosure, the first training module is configured to, for any one of a plurality of boundary lines of the second character sequence, according to the boundary line corresponding to the second feature point The ratio of the smaller to larger of the true and predicted values of the distance parameter, trains the neural network.

In some embodiments of the present disclosure, the first training module is configured to, for any one of a plurality of boundary lines of the second character sequence, determine that the boundary line corresponds to the second feature point. The absolute value of the difference between the true value of the angle parameter and the predicted value;

The neural network is trained according to the sine of the half angle of the absolute value.

In some embodiments of the present disclosure, the second feature points include feature points in an effective area corresponding to the second character area.

In some embodiments of the present disclosure, the apparatus further includes:

a fifth prediction module, configured to predict the probability that the position of the pixel in the training image belongs to the character via the neural network;

The second training module is configured to train the neural network according to the probability that the position of the pixel in the training image belongs to a character, and the labeled data that the position of the pixel in the training image belongs to the character.

In some embodiments of the present disclosure, the second training module is configured to, according to the probability that the position of the pixel in the effective area corresponding to the second character sequence belongs to the character, and the position of the pixel in the effective area belongs to the character the labeled data to train the neural network.

In some embodiments of the present disclosure, the apparatus further includes:

an acquisition module, configured to acquire the position information of the real bounding box of the second character sequence;

The shrinking module is configured to shrink the real bounding box according to the position information of the real bounding box and a preset ratio to obtain an effective area corresponding to the second character sequence.

In some embodiments of the present disclosure, the reduction module is configured to determine the anchor point of the real bounding box according to the position information of the real bounding box, wherein the anchor point of the real bounding box is the real boundary the intersection of the diagonals of the boxes;

According to the position information of the real bounding box, the position information of the anchor point of the real bounding box, and the preset ratio, the real bounding box is reduced to obtain the effective area corresponding to the second character sequence, wherein the first The ratio of the distance to the second distance is equal to the preset ratio, the first distance represents the distance between the first vertex of the effective area and the anchor point, and the second distance represents the The distance between the vertex corresponding to the first vertex and the anchor point, where the first vertex represents any vertex of the effective area.

Embodiments of the present disclosure also provide an electronic device, comprising: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to call executable instructions stored in the memory The instruction is executed to execute the character detection method described in any of the above embodiments.

Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the character detection method described in any of the foregoing embodiments.

An embodiment of the present disclosure further provides a computer program, where the computer program includes computer-readable codes, and when the computer-readable codes are executed in an electronic device, a processor of the electronic device executes any of the foregoing embodiments The described character detection method.

In the embodiment of the present disclosure, the prediction parameters of the multiple boundary lines of the first character sequence are obtained by respectively predicting multiple boundary lines of the first character sequence in the image to be processed. The prediction parameter determines the position information of the vertices of the bounding box of the first character sequence, and determines the position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence. The polygon (such as quadrilateral) bounding box is disassembled into multiple (such as four) independent boundary lines, and each independent boundary line is detected separately, so that the detection of each boundary line will not be disturbed by two different vertices , which can improve the accuracy of character detection.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Description of drawings

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the description, serve to explain the technical solutions of the present disclosure.

FIG. 1 shows a flowchart of a character detection method provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic diagram of a system architecture to which the character detection method according to an embodiment of the present disclosure is applied;

Fig. 3 shows the schematic diagram of the distance parameter and the angle parameter of 4 boundary lines of the first character sequence under the polar coordinate system corresponding to a certain first feature point;

FIG. 4 shows a schematic diagram of the real bounding box 31 and the effective area 32 of the second character area;

FIG. 5 shows a schematic diagram of an application scenario of an embodiment of the present disclosure;

6 shows a block diagram of a character detection apparatus provided by an embodiment of the present disclosure;

FIG. 7 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure;

FIG. 8 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure.

Detailed ways

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures denote elements that have the same or similar functions. While various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" in this article is only an association relationship to describe the associated objects, indicating that there can be three kinds of relationships, for example, A and/or B, it can mean that A exists alone, A and B exist at the same time, and A and B exist independently B these three cases. In addition, the term "at least one" herein refers to any combination of any one of the plurality or at least two of the plurality, for example, including at least one of A, B, and C, and may mean including from A, B, and C. Any one or more elements selected from the set of B and C.

In addition, in order to better illustrate the present disclosure, numerous specific details are set forth in the following detailed description. It will be understood by those skilled in the art that the present disclosure may be practiced without certain specific details. In some instances, methods, means, components and circuits well known to those skilled in the art have not been described in detail so as not to obscure the subject matter of the present disclosure.

In the related art, a rectangular frame or a rotated rectangular frame is mostly used to detect characters, but these rectangular frames or rotated rectangular frames cannot accurately locate the character boundary, which affects subsequent character recognition. In addition, the related art also proposes a character detection method in which a bounding box of a character is formed by regressing four vertices of a quadrilateral. However, a vertex is actually formed by the intersection of two adjacent edges, and the regression of each vertex affects both edges, so each edge is disturbed by two different vertices, which affects the accuracy of character detection results.

Based on the above problems, embodiments of the present disclosure provide a character detection method, apparatus, electronic device, storage medium, and program, by decomposing a polygonal (eg, quadrilateral) bounding box of a character into multiple (eg, four) independent boundary lines , each independent boundary line is independently detected, so that the detection of each boundary line will not be disturbed by two different vertices, so that the accuracy of character detection can be improved.

The character detection method provided by the embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

FIG. 1 shows a flowchart of a character detection method provided by an embodiment of the present disclosure. Wherein, the execution body of the character detection method may be a character detection device. In some embodiments of the present disclosure, the character detection method may be performed by a terminal device or a server or other processing device. The terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable devices, etc. In some embodiments of the present disclosure, the character detection method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1 , the character detection method includes steps S11 to S13.

In step S11, the multiple boundary lines of the first character sequence in the image to be processed are respectively predicted, and the prediction parameters of the multiple boundary lines of the first character sequence are obtained.

Wherein, the boundary line of the first character sequence represents the dividing line between the area where the first character sequence is located and the area not where the first character sequence is located.

In an embodiment of the present disclosure, character detection may refer to detecting a position of a character and/or character sequence in an image, for example, may refer to detecting a position of a bounding box of a character and/or character sequence in an image. In the embodiment of the present disclosure, the to-be-processed image may represent an image that needs character detection. The first character sequence represents any character sequence in the image to be processed. The image to be processed may include one or more character sequences. The first sequence of characters may include one or more characters, and the characters may include at least one of words, letters, numbers, punctuation marks, operation symbols, and the like. In some embodiments of the present disclosure, in the image to be processed, if the distance between any two characters is less than or equal to a preset first distance threshold, it is determined that the two characters belong to the same character sequence. In other embodiments of the present disclosure, when the writing direction in the image to be processed is the horizontal direction, if any two characters belong to the same line of text, and the distance between the two characters is less than or equal to a preset With the second distance threshold, it is determined that the two characters belong to the same character sequence; when the writing direction in the image to be processed is the vertical direction, if any two characters belong to the same column of characters, and the difference between the two characters is If the distance is less than or equal to the preset third distance threshold, it is determined that the two characters belong to the same character sequence. The writing direction may represent the positional relationship between two adjacent characters. For example, if the positional relationship between two adjacent characters is the left-right relationship, the writing direction is the horizontal direction; if the positional relationship between the two adjacent characters is the up-down relationship, the writing direction is the vertical direction.

In the embodiment of the present disclosure, the boundary line of the first character sequence represents the boundary line between the area where the first character sequence is located and the area where the non-first character sequence is located, wherein the area where the non-first character sequence is located may be a background area (ie non-character region) and/or other character sequences. The boundary line of the first character sequence may be a straight line or a curved line, which is not limited herein. The prediction parameter of any one boundary line of the first character sequence may represent the parameter of the predicted boundary line. When the boundary line of the first character sequence is a straight line, the prediction parameter of any boundary line of the first character sequence may represent the prediction parameter of the line equation corresponding to the boundary line. Based on the prediction parameters of the line equation corresponding to the boundary line, the position of the boundary line can be determined.

In the embodiment of the present disclosure, when the boundary line of the first character sequence is a straight line, the number of boundary lines of the first character sequence is at least three, and multiple boundary lines of the first character sequence may enclose the first character The bounding box of the sequence. The bounding box of the first character sequence may be a polygon, and accordingly, the number of boundary lines of the first character sequence may correspond to the number of sides of the bounding box of the first character sequence. For example, if the bounding box of the first character sequence is a quadrilateral, the number of boundary lines of the first character sequence is 4. Of course, the bounding box of the first character sequence may also be a pentagon, a triangle, etc., which is not limited herein.

In some embodiments of the present disclosure, the plurality of boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line, and a left boundary line of the first character sequence. In this embodiment, the bounding box of the first character sequence is a quadrilateral, and the number of boundary lines of the first character sequence is four. Wherein, the upper boundary line of the first character sequence may indicate that the direction of the characters in the first character sequence is used as a reference for dividing the area where the first character sequence is located and the area where the non-first character sequence above the first character sequence is located. Demarcation line; the right boundary line of the first character sequence, which can be used to demarcate the area where the first character sequence is located and the area where the non-first character sequence to the right of the first character sequence is located with reference to the direction of the characters in the first character sequence. Demarcation line; the lower boundary line of the first character sequence, which can be used to demarcate the area where the first character sequence is located and the area where the non-first character sequence below the first character sequence is located with reference to the direction of the characters in the first character sequence. Demarcation line; the left boundary line of the first character sequence, which can be used to divide the area where the first character sequence is located and the area where the non-first character sequence to the left of the first character sequence is located with reference to the direction of the characters in the first character sequence. dividing line. Since the shape of the character sequence is a quadrilateral in most cases, according to this embodiment, it is helpful to obtain more accurate position information of the bounding box of the character sequence in most cases.

In this embodiment, the multiple boundary lines of the first character sequence in the image to be processed are predicted respectively, and the prediction parameters of the multiple boundary lines of the first character sequence are obtained, which may include: the upper part of the first character sequence in the image to be processed Predict the boundary line to obtain the prediction parameters of the straight line equation corresponding to the upper boundary line of the first character sequence; predict the right boundary line of the first character sequence in the image to be processed to obtain the prediction of the straight line equation corresponding to the right boundary line of the first character sequence parameter; predict the lower boundary line of the first character sequence in the image to be processed, and obtain the prediction parameters of the straight line equation corresponding to the lower boundary line of the first character sequence; predict the left boundary line of the first character sequence in the image to be processed, and obtain the first character The predicted parameters of the line equation corresponding to the left boundary line of the series.

In step S12, the position information of the vertices of the bounding box of the first character sequence is determined according to the prediction parameters of the plurality of boundary lines of the first character sequence.

In the embodiment of the present disclosure, according to the prediction parameters of the plurality of boundary lines of the first character sequence, the intersection of the plurality of boundary lines of the first character sequence can be obtained, and the intersection of the plurality of boundary lines of the first character sequence can be calculated. Position information, which is the position information of the vertices of the bounding box of the first character sequence. For example, the multiple boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line and a left boundary line of the first character sequence; according to the prediction parameters of the line equation corresponding to the upper boundary line of the first character sequence and the first character The prediction parameters of the straight line equation corresponding to the right boundary line of the sequence can obtain the intersection of the upper boundary line of the first character sequence and the right boundary line of the first character sequence, and the upper boundary line of the first character sequence and the first character sequence can be obtained. The position information of the intersection of the right boundary line is used as the position information of the upper right corner vertex of the bounding box of the first character sequence; according to the prediction parameter of the line equation corresponding to the right boundary line of the first character sequence and the line equation corresponding to the lower boundary line of the first character sequence The prediction parameters of The position information of the vertex at the lower right corner of the bounding box of the character sequence; according to the prediction parameter of the straight line equation corresponding to the lower boundary line of the first character sequence and the prediction parameter of the straight line equation corresponding to the left border line of the first character sequence, the first character sequence can be obtained The intersection of the lower boundary line of the first character sequence and the left boundary line of the first character sequence, and the position information of the intersection of the lower boundary line of the first character sequence and the left boundary line of the first character sequence can be used as the lower left corner vertex of the bounding box of the first character sequence. Position information; according to the prediction parameter of the straight line equation corresponding to the left boundary line of the first character sequence and the prediction parameter of the straight line equation corresponding to the upper boundary line of the first character sequence, the left boundary line of the first character sequence and the first character sequence can be obtained. The intersection point of the upper boundary line, and the position information of the intersection point of the left boundary line of the first character sequence and the upper boundary line of the first character sequence can be used as the position information of the upper left corner vertex of the bounding box of the first character sequence. In this embodiment of the present disclosure, the position information of the vertices of the bounding box of the first character sequence may be represented by the coordinates of the vertices of the bounding box of the first character sequence. For example, the location information of the vertices of the bounding box of the first character sequence may include the coordinates of the upper left vertex, the upper right vertex, the lower right vertex and the lower left vertex of the bounding box of the first character sequence.

In step S13, the position information of the bounding box of the first character sequence is determined according to the position information of the vertices of the bounding box of the first character sequence.

In this embodiment of the present disclosure, the position information of the vertices of the bounding box of the first character sequence may be used as the position information of the bounding box of the first character sequence. For example, the location information of the bounding box of the first character sequence may include coordinates of respective vertices of the bounding box of the first character sequence. Of course, in the case where the bounding box of the first character sequence is a rectangle, the coordinates of any vertex of the bounding box of the first character sequence and the side lengths of two sides connected to the vertex can also be used to represent the first character sequence The position information of the bounding box is not limited here.

FIG. 2 is a schematic diagram of a system architecture to which a character detection method according to an embodiment of the present disclosure can be applied; as shown in FIG. 2 , the system architecture includes: an image acquisition terminal 201 , a network 202 and a location determination terminal 203 . In order to support an exemplary application, the image acquisition terminal 201 and the position determination terminal 203 establish a communication connection through the network 202, the image acquisition terminal 201 reports the image to be processed to the position determination terminal 203 through the network 202, and the position determination terminal 203 responds to the received image. The image to be processed, and the multiple boundary lines of the first character sequence in the image to be processed are respectively predicted to obtain the prediction parameters of the multiple boundary lines of the first character sequence, and the prediction parameters based on the multiple boundary lines of the first character sequence are obtained. , determine the position information of the vertices of the bounding box of the first character sequence; determine the position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence. Finally, the location determination terminal 203 uploads the determined location information to the network 202 , and sends the determined location information to the image acquisition terminal 201 through the network 202 .

As an example, the image acquisition terminal 201 may include an image acquisition device, and the location determination terminal 203 may include a visual processing device or a remote server with visual information processing capability. The network 202 can be wired or wireless. Wherein, when it is determined that the location terminal 203 is a visual processing device, the image acquisition terminal 201 can be connected to the visual processing device through a wired connection, such as data communication through a bus; when it is determined that the location terminal 203 is a remote server, the image acquisition terminal 201 can perform data interaction with a remote server through a wireless network.

Alternatively, in some scenarios, the image acquisition terminal 201 may be a vision processing device with an image acquisition module, which is specifically implemented as a host with a camera. At this time, the character detection method of the embodiment of the present disclosure may be executed by the image acquisition terminal 201 , and the above-mentioned system architecture may not include the network 202 and the location determination terminal 203 .

In some embodiments of the present disclosure, the multiple boundary lines of the first character sequence in the image to be processed are respectively predicted to obtain the prediction parameters of the multiple boundary lines of the first character sequence, including: based on the image to be processed, for A first feature point related to a character sequence, respectively predict the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point; according to the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point, determine Prediction parameters for multiple boundary lines of the first character sequence. In this embodiment, the first feature points represent feature points associated with the first character sequence. The feature points may represent points where the gray value of the image changes drastically and/or points with large curvature on the edge of the image (ie, the intersection of two edges). The number of the first feature points may be multiple, of course, may also be one, which is not limited here. For example, when the number of first feature points is multiple and the number of boundary lines of the first character sequence is 4, for any first feature point, it is predicted that each boundary line of the first character sequence corresponds to parameters of the first feature point, and for any boundary line, the prediction parameter of the boundary line is determined according to the parameters of the boundary line corresponding to each of the first feature points. For example, the parameters of the boundary line corresponding to each of the first feature points can be regressed to obtain the predicted parameters of the boundary line. In this embodiment, based on the image to be processed, for the first feature points related to the first character sequence, parameters corresponding to the first feature points of multiple boundary lines of the first character sequence are respectively predicted, and according to the first character sequence The plurality of boundary lines correspond to the parameters of the first feature point, and the prediction parameters of the plurality of boundary lines of the first character sequence are determined, whereby the parameters of the boundary line of the first character sequence are based on the feature points related to the first character sequence. The prediction is performed, thereby helping to improve the efficiency of obtaining the prediction parameters of the boundary line, and helping to improve the accuracy of the obtained prediction parameters. Of course, in other embodiments of the present disclosure, the multiple boundary lines of the first character sequence may also be determined based on all pixel points related to the first character sequence (not limited to the first feature points related to the first character sequence). The prediction parameters are not limited here.

As an example of this embodiment, the method further includes: predicting the probability that the position of the pixel in the image to be processed belongs to the character; and determining the first feature point according to the probability that the position of the pixel in the image to be processed belongs to the character. In this example, the probability that each pixel in the image to be processed belongs to a character can be predicted. According to the probability that the position of each pixel in the image to be processed belongs to a character, the area occupied by each character sequence in the image to be processed can be preliminarily determined. For any first character sequence, the first feature point may be determined according to the preliminarily determined feature points in the area occupied by the first character sequence. For example, all or part of the feature points in the area occupied by the initially determined first character sequence may be determined as the first feature points. In this example, by predicting the probability that the position of the pixel in the image to be processed belongs to the character, and determining the first feature point according to the probability that the position of the pixel in the image to be processed belongs to the character, the first feature point can be accurately determined. The first feature point associated with the character sequence. Predicting the parameters of the boundary line of the first character sequence based on a feature point thus determined helps to further improve the efficiency and accuracy of obtaining the prediction parameters of the boundary line.

In the examples provided by other embodiments of the present disclosure, the feature points in the image to be processed may also be used as the first feature points respectively, without the need to predict the character probability. For example, when there is only one first character sequence in the image to be processed, and the first character sequence occupies or almost occupies the image to be processed, the feature points in the image to be processed can be respectively used as the first feature points.

As an example of this embodiment, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point include: distance parameters of the plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point and angle parameters, wherein the polar coordinate system corresponding to the first feature point represents a polar coordinate system with the first feature point as a pole. In this example, the polar coordinate system corresponding to the first feature point may use the axis whose pole points to the positive direction of the x-axis as the polar axis. Of course, those skilled in the art can flexibly set the polar axis according to actual application scenario requirements, which is not limited here. In this example, the distance parameter of any boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point can represent the distance between the first feature point and the boundary line in the polar coordinate system corresponding to the first feature point The minimum distance between , that is, it can represent the length of the vertical line segment from the first feature point to the boundary line in the polar coordinate system corresponding to the first feature point; any boundary line of the first character sequence corresponds to the first feature point. The angle parameter in the polar coordinate system of the The included angle between , wherein the vertical point on the boundary line represents the intersection point of the vertical line segment from the first feature point to the boundary line and the boundary line.

In an example, the equation of a straight line in a Cartesian coordinate system (a Cartesian coordinate system or an oblique coordinate system) can be expressed by formula (1):

Ax+By+C=0 Formula (1);

where A, B and C represent the parameters of the equation of the line.

However, when C≠0, there is redundancy in the parameters and the correlations between the parameters in the equation of the straight line shown in Equation 1. In addition, the parameters of the straight line equation in the Cartesian coordinate system have no clear physical meaning in the image, which is not conducive to network learning.

In this example, the equation of the line in the Cartesian coordinate system can be converted to the polar coordinate system, resulting in formula (2):

ρ=x cosθ+y sinθ Formula (2);

Among them, ρ can represent the distance parameter of any boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point, and θ can represent the polar coordinate corresponding to the first feature point of any boundary line of the first character sequence The angle parameter under the system.

Correspondingly, the parameters of the straight line equation can be expressed by formula (3):

A=cosθ, B=sinθ, C=-ρ Formula (3);

FIG. 3 is a schematic diagram showing the distance parameters and angle parameters of the four boundary lines of the first character sequence in a polar coordinate system corresponding to a certain first feature point. As shown in FIG. 3 , the distance parameter of the upper boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point is ρ ₁ , and the angle parameter is θ ₁ ; the right boundary line of the first character sequence is in the first character sequence. The distance parameter in the polar coordinate system corresponding to the feature point is ρ ₂ , and the angle parameter is θ ₂ ; the distance parameter of the lower boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point is ρ ₃ , and the angle parameter is θ ₃ ; the distance parameter of the left boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point is ρ ₄ , and the angle parameter is θ ₄ .

In this example, by mapping the straight line equation of the boundary line in the Cartesian coordinate system to the polar coordinate system, the distance parameters and angle parameters that have clear physical meaning and are independent of each other in the image are obtained, reducing the amount of parameters and related sex, and is conducive to online learning.

Wherein, determining the prediction parameters of the plurality of boundary lines of the first character sequence according to the parameters corresponding to the first feature points of the plurality of boundary lines of the first character sequence includes: placing the plurality of boundary lines of the first character sequence on the first The distance parameters and angle parameters in the polar coordinate system corresponding to the feature points are mapped to the Cartesian coordinate system, and the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system are obtained; according to the first character The multiple boundary lines of the sequence correspond to the parameters of the first feature point in the Cartesian coordinate system, and the prediction parameters of the multiple boundary lines of the first character sequence are determined. In this example, when the number of the first feature points is multiple, the multiple first feature points correspond to different polar coordinate systems, wherein the polar coordinate system corresponding to any first feature point is based on the first feature point. Feature points are poles. Therefore, for any boundary line of the first character sequence, when regressing the distance parameter and angle parameter of the boundary line in the polar coordinate system corresponding to the plurality of first feature points to obtain the prediction parameter of the boundary line, the prediction parameter of the boundary line can be obtained by first The distance parameters and angle parameters of the boundary line in the polar coordinate system corresponding to the plurality of first feature points are mapped to the same Cartesian coordinate system to obtain the boundary line corresponding to the plurality of feature points in the Cartesian coordinate system. parameters, and then perform regression according to the parameters of the boundary line corresponding to multiple feature points in the Cartesian coordinate system to obtain the predicted parameters of the boundary line. Wherein, by mapping the distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to the Cartesian coordinate system, the multiple boundary lines of the first character sequence are obtained in Cartesian coordinates. The parameters corresponding to the first feature point in the coordinate system, and according to the parameters corresponding to the first feature point of the multiple boundary lines of the first character sequence in the Cartesian coordinate system, the prediction of multiple boundary lines of the first character sequence is determined parameters, so that the predicted parameters of the boundary line can be obtained by regression based on the parameters in different polar coordinate systems.

As shown in FIG. 3 , the prediction parameters of the upper boundary line of the first character sequence are A ₁ , B ₁ and C ₁ , that is, the predicted straight line equation of the upper boundary line of the first character sequence can be expressed as A ₁ x+B ₁ y+C ₁ =0; the prediction parameters of the right boundary line of the first character sequence are A ₂ , B ₂ and C ₂ , that is, the predicted straight line equation of the right boundary line of the first character sequence can be expressed as A ₂ x+B ₂ y +C ₂ =0; the prediction parameters of the lower boundary line of the first character sequence are A ₃ , B ₃ and C ₃ , that is, the linear equation of the predicted lower boundary line of the first character sequence can be expressed as A ₃ x+B ₃ y+ C ₃ =0; the prediction parameters of the left boundary line of the first character sequence are A ₄ , B ₄ and C ₄ , that is, the predicted straight line equation of the left boundary line of the first character sequence can be expressed as A ₄ x+B ₄ y +C ₄ =0. That is, according to formulas (4) to (6), the coordinates of each vertex of the bounding box of the first character sequence can be obtained:

D _kl =A _k B _l -A _l B _k formula (4);

Among them, 1≤k≤4, 1≤l≤4, k and l are integers. For example, (x ₁₂ , y ₁₂ ) may represent the coordinates of the upper right vertex of the bounding box of the first character sequence, (x ₂₃ , y ₂₃ ) may represent the coordinates of the lower right vertex of the bounding box of the first character sequence, (x ₃₄ , y ₃₄ ) may represent the coordinates of the lower left corner vertex of the bounding box of the first character sequence, and (x ₄₁ , y ₄₁ ) may represent the coordinates of the upper left corner vertex of the bounding box of the first character sequence.

In other examples, the parameters of any boundary line of the first character sequence corresponding to the first feature point may include parameters of the boundary line predicted based on the first feature point in a Cartesian coordinate system, which is not limited herein.

In an example, based on the image to be processed, for the first feature point related to the first character sequence, respectively predicting the parameters corresponding to the first feature point of multiple boundary lines of the first character sequence, including: inputting the image to be processed into The pre-trained neural network respectively predicts parameters corresponding to the first feature points of the plurality of boundary lines of the first character sequence for the first feature points related to the first character sequence via the neural network. Wherein, for the first feature points related to the first character sequence, the pre-trained neural network respectively predicts the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature points, thereby improving the speed of predicting parameters and accuracy. At the same time, the parameters corresponding to the first feature points of the multiple boundary lines of the first character sequence can also be predicted by using a pre-established model, function, etc., which is not limited here.

In the embodiments provided by the present disclosure, the probability that the position of the pixel in the image to be processed belongs to the character can also be predicted through the neural network. The pre-trained neural network is used to predict the probability that the position of the pixel in the image to be processed belongs to the character, thereby improving the speed of predicting the probability that the position of the pixel belongs to the character, and improving the accuracy of the predicted probability. Of course, in other examples, a pre-established model, function, etc. can also be used to predict the probability that the location of the pixel in the image to be processed belongs to a character, which is not limited here.

In some embodiments of the present disclosure, before inputting the image to be processed into the pre-trained neural network, the training image may also be input into the neural network, and through the neural network, for the second feature point related to the second character sequence in the training image, Predicting the predicted values of the parameters of the second feature point corresponding to the multiple boundary lines of the second character sequence respectively; according to the predicted values of the parameters of the second feature point corresponding to the multiple boundary lines of the second character sequence, and the second character sequence The multiple boundary lines correspond to the true values of the parameters of the second feature point, training the neural network.

In the related art, a bounding box of a character is constructed by regressing four vertices of a quadrilateral. Since the vertex is actually formed by the intersection of two adjacent edges, the regression of each vertex will affect the two edges, therefore, each edge will be disturbed by two different vertices, thus affecting the learning efficiency and detection effect of the network. In the embodiments provided by the present disclosure, by decomposing the polygon (such as quadrilateral) bounding box of the character sequence into multiple (such as four) independent boundary lines, each independent boundary line is independently detected, so that no The training disturbance is brought to the neural network due to the regression vertex, thereby improving the learning efficiency and detection effect of the neural network. The neural network trained according to this embodiment can learn the ability to accurately predict the parameters of the boundary line of the character sequence.

As an example of this embodiment, the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature points include: distance parameters of the plurality of boundary lines of the second character sequence in the polar coordinate system corresponding to the second feature points and angle parameters, wherein, the polar coordinate system corresponding to the second feature point represents a polar coordinate system with the second feature point as a pole; according to the predicted values of the parameters of the second feature point corresponding to multiple boundary lines of the second character sequence, And multiple boundary lines of the second character sequence correspond to the true value of the parameter of the second feature point, training the neural network, comprising: according to the multiple boundary lines of the second character sequence corresponding to the predicted value of the distance parameter of the second feature point , and the multiple boundary lines of the second character sequence correspond to the true value of the distance parameter of the second feature point, train the neural network; and/or, according to the multiple boundary lines of the second character sequence correspond to the angle of the second feature point The predicted value of the parameter, and the true value of the angle parameter of the plurality of boundary lines of the second character sequence corresponding to the second feature point, train the neural network. In this example, by mapping the straight line equation in the Cartesian coordinate system to the polar coordinate system, the correlation between the learning parameters and the parameters is reduced, and the parameters are given the actual physical meaning in the image, which is beneficial to network learning. In addition, in this example, by training the neural network to learn to detect the distance and angle of each boundary line of the character sequence corresponding to the feature point, the detection of the boundary lines can not interfere with each other, so that the learning efficiency and detection effect of the neural network can be improved. .

In one example, the plurality of boundary lines according to the second character sequence correspond to the predicted value of the distance parameter of the second feature point, and the plurality of boundary lines of the second character sequence correspond to the true value of the distance parameter of the second feature point , training the neural network, including: for any boundary line in the plurality of boundary lines of the second character sequence, according to the boundary line corresponding to the distance parameter of the second feature point between the true value and the predicted value of the smaller value and the larger value The ratio of values to train the neural network.

For example, for any one of the multiple boundary lines of the second character sequence, the loss function L _ρ corresponding to the distance parameter can be obtained by using formula (7):

Among them, N represents the number of second feature points,

represents the true value of the distance parameter of the boundary line corresponding to the second feature point i, ρ _i represents the predicted value of the distance parameter of the boundary line corresponding to the second feature point i,

indicates that the boundary line corresponds to the smaller of the true value and the predicted value of the distance parameter of the second feature point i,

Indicates that the boundary line corresponds to the larger of the true value and the predicted value of the distance parameter of the second feature point i. For example, if

but

like

but

because

The poles corresponding to ρ _i are the same (both are the second feature point i), that is,

is at the same point as one end of ρ _i , therefore, the loss function L _ρ corresponding to the distance parameter can be called the Intersection Over Union (IOU) loss function.

In this example, for any one of the plurality of boundary lines of the second character sequence, according to the boundary line corresponding to the actual value and the predicted value of the distance parameter of the second feature point, the smaller value and the larger value The ratio of , trains the neural network, which can normalize the distance parameters of different sizes in different application scenarios, which can help to perform multi-scale character detection, that is, it is helpful for character detection at different scales. achieve higher accuracy.

Of course, in other examples, for any boundary line among the multiple boundary lines of the second character sequence, the neural network can also be trained according to the search of the true value and the predicted value of the distance parameter corresponding to the boundary line to the second feature point, It is not limited here.

In one example, the plurality of boundary lines of the second character sequence correspond to the predicted value of the angle parameter of the second feature point, and the plurality of boundary lines of the second character sequence correspond to the true value of the angle parameter of the second feature point , training the neural network, including: for any one of the multiple boundary lines of the second character sequence, determining the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point; according to The absolute value of the sine of half the angle, training the neural network.

Among them, the half angle of the absolute value is equal to 0.5 times the absolute value. For example, for any one of the multiple boundary lines of the second character sequence, the difference between the predicted value and the true value of the angle parameter of the boundary line corresponding to any second feature point is 90° or -90°, then The absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point is 90°, and the half angle of the absolute value is 45°.

For example, for any one of the multiple boundary lines of the second character sequence, the loss function L _θ corresponding to the angle parameter can be obtained by using formula (8):

Among them, N represents the number of second feature points,

represents the true value of the angle parameter of the boundary line corresponding to the second feature point i, θ _i represents the predicted value of the angle parameter of the boundary line corresponding to the second feature point i,

represents the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point i,

Indicates that the boundary line corresponds to the half angle of the absolute value of the difference between the true value and the predicted value of the angle parameter of the second feature point i.

Wherein, for any one of the multiple boundary lines of the second character sequence, the value range of the true value and the predicted value of the angle parameter of the boundary line corresponding to any second feature point may be [0, 2π], which is

_{0≤θi≤2π} . In polar coordinates, however, 0 coincides with 2π. For any one of the multiple boundary lines of the second character sequence, determine the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point, and determine the absolute value of the difference according to the half angle of the absolute value. The sine value of , trains the neural network, so that the learning of the neural network will not be disturbed due to the confusion of 0 and 2π, thereby improving the learning efficiency and detection effect of the neural network.

Of course, those skilled in the art can also use a cosine loss function or the like after transforming Equation 8, which is not limited here.

As an example of this embodiment, the second feature points include feature points in an effective area corresponding to the second character area. Wherein, the second feature points may only include feature points in the valid region corresponding to the second character region, and do not include feature points outside the valid region corresponding to the second character region. When calculating the loss function of the neural network, by only supervising the feature points in the valid region corresponding to the second character region, and not supervising the feature points outside the valid region corresponding to the second character region, it helps to reduce the network burden. For any feature point in the real bounding box of the second character region or in the edge region close to the real bounding box, the distance between the feature point and the boundary line of the real bounding box is small, so it is difficult to detect accurately, and it is easy to cause relatively big error. For example, for a feature point in the effective area, the predicted value of the distance parameter between the feature point and a certain boundary line of the real bounding box is 9, and the real value is 10, then the error is 10%; Feature point, the predicted value of the distance parameter between the feature point and a certain boundary line of the real bounding box is 1, and the real value is 2, then the error is 50%. Therefore, by ignoring feature points outside the effective region, it helps to reduce the network burden. Of course, in other examples, all the feature points in the real bounding box of the second character sequence are not limited here.

In an example, the character detection method provided by the embodiment of the present disclosure further includes: acquiring position information of the real bounding box of the second character sequence; reducing the real bounding box according to the position information of the real bounding box and a preset ratio to obtain the first The valid region corresponding to the two-character sequence. In this example, the range of the valid region corresponding to the second character sequence is within the real bounding box of the second character sequence, and the size of the valid region corresponding to the second character sequence is smaller than the size of the real bounding box of the second character sequence. FIG. 4 shows a schematic diagram of the real bounding box 31 and the effective area 32 of the second character area. Based on this example, the effective area corresponding to the second character sequence is obtained, and the neural network is trained based on the feature points in the effective area corresponding to the second character sequence, which helps to reduce the network burden.

For example, reducing the real bounding box according to the position information of the real bounding box and the preset ratio to obtain an effective area corresponding to the second character sequence, including: determining the anchor point of the real bounding box according to the position information of the real bounding box, wherein, The anchor point of the real bounding box is the intersection of the diagonal lines of the real bounding box; according to the position information of the real bounding box, the position information of the anchor point of the real bounding box, and the preset scale, reduce the real bounding box to obtain the second character sequence The corresponding valid area, where the ratio of the first distance to the second distance is equal to the preset ratio, the first distance represents the distance between the first vertex of the valid area and the anchor point, and the second distance represents the first vertex in the real bounding box The distance between the corresponding vertex and the anchor point, the first vertex represents any vertex of the valid area. For example, the preset ratio may be 0.35, 0.4, 0.3, etc., which is not limited herein. For example, if the first vertex is the upper left vertex of the valid area, the vertex corresponding to the first vertex in the real bounding box is the upper left vertex of the real bounding box, and so on. According to this example, the effective area corresponding to the second character sequence is obtained, and the neural network is trained based on the feature points in the effective area corresponding to the second character sequence, which helps to improve the learning efficiency and prediction accuracy of the neural network.

In one example, the coordinates of the 4 vertices of the true bounding box of the second character sequence may be represented as ( _xi , _yi ), i=1, 2, 3, 4. Among them, the four vertices of the real bounding box of the second character sequence can be sorted clockwise, (x ₁ , y ₁ ) can represent the upper left corner vertex of the real bounding box of the second character sequence, (x ₂ , y ₂ ) can represent the upper right corner vertex of the real bounding box of the second character sequence, (x ₃ , y ₃ ) can represent the lower right corner vertex of the real bounding box of the second character sequence, (x ₄ , y ₄ ) can represent the second character sequence The bottom-left vertex of the ground-truth bounding box. For any second feature point (x ₀ , y ₀ ), any boundary line of the true bounding box of the second character sequence corresponds to the true value of the distance parameter of the second feature point

and the truth value of the angle parameter

Equations (9) to (16) can be used to determine:

j=mod(i,4)+1 Formula (9);

A = y _j -y _i formula (10);

B=x _i -x _j formula (11);

C=x _j y _i -x _i y _j formula (12);

e=(1,0) Formula (15);

Among them, q represents the true value of the vertical vector from the second feature point to the boundary line, q is parallel to the vertical line from the second feature point to the boundary line, and points from the second feature point to the vertical point, BC>0 means q below the polar axis.

As an example of this embodiment, the method further includes: predicting the probability that the position of the pixel in the training image belongs to the character via the neural network; according to the probability that the position of the pixel in the training image belongs to the character, and the position of the pixel in the training image Labeled data belonging to characters to train a neural network. In this example, the neural network can be a multi-task learning model, learning character segmentation (ie, learning to detect the probability that a pixel in an image belongs to a character) and parameter prediction of boundary lines. According to this example, the neural network can be made to learn the ability to predict the probability that the position of the pixel belongs to the character.

In one example, training a neural network according to the probability that the position of the pixel in the training image belongs to the character, and the labeled data of the position of the pixel in the training image belonging to the character, includes: according to the pixel in the effective area corresponding to the second character sequence The probability that the position belongs to the character, and the labeled data of the position of the pixel in the effective area belonging to the character, train the neural network.

For example, the loss function corresponding to character segmentation can be obtained by using formula (17):

in,

Indicates the valid area corresponding to the second character sequence,

Indicates the number of pixels in the valid area corresponding to the second character sequence;

Indicates that the position of the pixel j in the valid area corresponding to the second character sequence belongs to the label data of the character. For example, if the position of the pixel j belongs to the character, then

If the position of pixel j does not belong to a character, then

y _j represents the probability that the position of the pixel _j in the effective area corresponding to the second character sequence belongs to the character, 0≤yj≤1.

In this example, the neural network can be trained by training the neural network according to the probability that the position of the pixel in the effective area corresponding to the second character sequence belongs to the character, and the labeled data that the position of the pixel in the effective area belongs to the character. It can improve the ability of character segmentation and improve the efficiency of neural network learning character segmentation.

In one example, a neural network can be trained with a loss function L as shown in Equation (18):

L=λ ₁ L _cls +λ ₂ L _ρ +λ ₃ L _θ Formula (18);

Among them, L _cls represents the loss function corresponding to character segmentation, L _ρ represents the loss function corresponding to the distance parameter, L _θ represents the loss function corresponding to the angle parameter, λ ₁ represents the weight corresponding to L _cls , λ ₂ represents the weight corresponding to L _ρ , λ ₃ represents the weight corresponding to L _θ , and λ ₁ , λ ₂ and λ ₃ can be flexibly set according to experience or training strategies, for example, λ ₁ =λ ₂ =λ ₃ =1, which is not limited here.

As an example of this embodiment, the neural network may include at least one channel reduction module, so as to reduce the calculation amount of the neural network and improve the speed of the boundary line detection by the neural network.

As an example of this embodiment, the neural network may include at least one feature aggregation module, so as to make full use of multi-scale features and improve the accuracy of boundary line detection performed by the neural network.

An application scenario of the embodiment of the present disclosure will be described below. FIG. 5 shows a schematic diagram of an application scenario of an embodiment of the present disclosure. As shown in Figure 5, the neural network can be an encoder-decoder structure. In Figure 5, 506 denotes a channel reduction module. For example, the channel reduction module 506 may be implemented using 1x1 convolutions. Of course, the channel reduction module 506 can also be implemented by using 3×3 convolution, etc., which is not limited here. 507 represents a feature aggregation module. The feature aggregation module 507 may be used to perform at least one operation of multiplying, adding, concat (merging) and the like on the input feature maps. For example, as shown in FIG. 5 , the feature aggregation module 507 can double the size (width and height) of the input feature map, and then perform concat, 1×1 concat, 1×1 based on the enlarged feature map and the output of the channel reduction module 506 Non-linear convolution and 3×3 non-linear convolution. As shown in Figure 5, the neural network can use the skeleton network to extract basic features, and continuously integrate the features of different scales through the feature aggregation module, and finally obtain the feature map of 9 channels, one of which is the text confidence level 504 (that is, the input image in the The probability of each pixel inputting a character), the other 8 channels are the distance parameters and angle parameters of the straight line equation of the four boundary lines, that is, the parameters 503 of the four boundary lines. According to the distance parameters and angle parameters of each boundary line of the three character sequences in the input image 501 in the polar coordinate system, the straight line equation of each boundary line of the three character sequences in the Cartesian coordinate system can be obtained. The straight line equation of the four boundary lines is visualized in the dashed box 505 on the right side of FIG. 5 , wherein the upper boundary line, the right boundary line, the lower boundary line and the left boundary line of the 3 character sequences are sequentially shown from top to bottom. According to the straight line equation of each boundary line of the 3 character sequences in the input image, the bounding box 502 of the 3 character sequences can be obtained, as shown in the lower left of FIG. 5 .

The character detection method provided by the embodiments of the present disclosure can be applied to character detection in general natural scenarios, as well as real-time text translation, document recognition, certificate recognition (such as ID cards, bank cards), license plate recognition, and other application scenarios, which are not limited here. . In some natural scenes, characters in the image will appear as irregular quadrilaterals due to camera perspective distortion. By adopting the embodiments of the present disclosure, the boundary of the character can be accurately detected, so that the shape of the character can be further corrected, which is beneficial to the subsequent character recognition. In addition, in addition to characters, some character carriers also exhibit the above phenomenon, such as rigid ID cards, bank cards, and license plates. By using the embodiments of the present disclosure to detect the boundaries of these quadrilateral carriers containing characters, it is also beneficial to the subsequent character recognition process.

It can be understood that the above-mentioned method embodiments mentioned in the present disclosure can be combined with each other to form a combined embodiment without violating the principle and logic. Those skilled in the art can understand that, in the above method of the specific embodiment, the specific execution order of each step should be determined by its function and possible internal logic.

In addition, the present disclosure also provides a character detection device, an electronic device, a storage medium, and a program, all of which can be used to implement any character detection method provided by the present disclosure. For the corresponding technical solutions and technical effects, please refer to the corresponding records in the method section. Repeat.

FIG. 6 shows a block diagram of a character detection apparatus provided by an embodiment of the present disclosure. As shown in Figure 6, the character detection device 6 includes:

The first prediction module 61 is configured to respectively predict multiple boundary lines of the first character sequence in the image to be processed, and obtain prediction parameters of the multiple boundary lines of the first character sequence, wherein the The boundary line represents the dividing line between the area where the first character sequence is located and the area not where the first character sequence is located;

a first determination module 62, configured to determine the position information of the vertices of the bounding box of the first character sequence according to the prediction parameters of a plurality of boundary lines of the first character sequence;

The second determining module 63 is configured to determine the position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence.

In some embodiments of the present disclosure, the first prediction module 61 is further configured to, based on the to-be-processed image, respectively predict a plurality of pieces of the first character sequence for the first feature points related to the first character sequence The boundary line corresponds to the parameter of the first feature point;

In some embodiments of the present disclosure, the character detection device 6 further includes:

In some embodiments of the present disclosure, the first prediction module 61 is further configured to calculate distance parameters and angle parameters of multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point Map to a Cartesian coordinate system, and obtain the parameters corresponding to the first feature point of a plurality of boundary lines of the first character sequence under the Cartesian coordinate system;

In some embodiments of the present disclosure, the first prediction module 61 is further configured to input the image to be processed into a pre-trained neural network, and for the first feature point related to the first character sequence via the neural network, A plurality of boundary lines of the first character sequence are respectively predicted to correspond to parameters of the first feature point.

and / or,

In some embodiments of the present disclosure, the apparatus further includes:

In some embodiments of the present disclosure, the second training module is further configured to, according to the probability that the position of the pixel in the effective area corresponding to the second character sequence belongs to the character, and the position of the pixel in the effective area belongs to Annotated data of characters to train the neural network.

In some embodiments of the present disclosure, the reduction module is further configured to determine the anchor point of the real bounding box according to the position information of the real bounding box, wherein the anchor point of the real bounding box is the real bounding box the intersection of the diagonals of the bounding box;

In some embodiments, the functions or modules included in the character detection apparatus 6 provided in the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and the specific implementation and technical effects thereof may refer to the above method embodiments. description, which will not be repeated here.

Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented. Wherein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.

An embodiment of the present disclosure further provides a computer program, including computer-readable code, when the computer-readable code is executed in an electronic device, a processor in the electronic device executes the character detection method provided by any of the foregoing embodiments.

Embodiments of the present disclosure further provide another computer program product for storing computer-readable instructions, which, when executed, cause the computer to perform the operations of the character detection method provided by any of the foregoing embodiments.

Embodiments of the present disclosure further provide an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to call the executable instructions stored in the memory to execute The character detection method provided by any of the above embodiments.

The electronic device may be provided as a terminal, server or other form of device.

FIG. 7 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, etc. terminal.

7, an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an Input/Output (I/O) interface 812, Sensor assembly 814 , and communication assembly 816 .

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations. The processing component 802 can include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above. Additionally, processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.

Memory 804 is configured to store various types of data to support operation at electronic device 800 . Examples of such data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (Electrically Erasable) Erasable Programmable read only memory, EEPROM), Erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (Read-Only Memory) , ROM), magnetic memory, flash memory, magnetic disk or optical disk.

Power supply assembly 806 provides power to various components of electronic device 800 . Power supply components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800 .

Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.

Audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (Microphone, MIC) configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a calling mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in memory 804 or transmitted via communication component 816 . In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.

Sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of electronic device 800 . For example, the sensor assembly 814 can detect the on/off state of the electronic device 800, the relative positioning of the components, such as the display and the keypad of the electronic device 800, the sensor assembly 814 can also detect the electronic device 800 or one of the electronic device 800 Changes in the position of components, presence or absence of user contact with the electronic device 800 , orientation or acceleration/deceleration of the electronic device 800 and changes in the temperature of the electronic device 800 . Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. Sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge-coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second-generation mobile communication technology (2-Generation, 2G), a third-generation mobile communication technology (3rd-Generation, 3G), The fourth generation mobile communication technology (4-Generation, 4G)/, the long term evolution (Long Term Evolution, LTE) of the universal mobile communication technology, the fifth generation mobile communication technology (5-Generation, 5G) or their combination. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BitTorrent, BT) technology and other technology to achieve.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuit (ASIC), Digital Signal Process (DSP), Digital Signal Processing Device (Digital Signal Process Device) , DSPD), Programmable Logic Device (PLD), Field Programmable Gate Array (FPGA), controller, microcontroller, microprocessor, or other electronic component implementation, used to perform the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium, such as a memory 804 comprising computer program instructions executable by the processor 820 of the electronic device 800 to perform the above method is also provided.

FIG. 8 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. 8, electronic device 1900 includes processing component 1922, which further includes one or more processors, and a memory resource represented by memory 1932 for storing instructions executable by processing component 1922, such as applications. An application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Additionally, the processing component 1922 is configured to execute instructions to perform the above-described methods.

The electronic device 1900 may also include a power supply assembly 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input-output interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as a Microsoft server operating system (Windows ServerTM), a graphical user interface based operating system (Mac OS XTM) introduced by Apple, a multi-user multi-process computer operating system (UnixTM). ), Free and Open Source Unix-like Operating System (LinuxTM), Open Source Unix-like Operating System (FreeBSDTM) or similar.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as memory 1932 comprising computer program instructions executable by processing component 1922 of electronic device 1900 to perform the above-described method.

The present disclosure may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the present disclosure.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (non-exhaustive list) of computer readable storage media include: portable computer disks, hard disks, random access memory (RAM), ROM, EPROM or flash memory, SRAM, portable compact disk read only Memory (Compact Disc Read-Only Memory, CD-ROM), Digital Video Disc (DVD), memory sticks, floppy disks, mechanical coding devices, such as punched cards or recessed protrusions on which instructions are stored structure, and any suitable combination of the above. Computer-readable storage media, as used herein, are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (eg, light pulses through fiber optic cables), or through electrical wires transmitted electrical signals.

The computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .

The computer program instructions for carrying out the operations of the present disclosure may be assembly instructions, Industry Standard Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or in one or more source or object code written in any combination of programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network—including a Local Area Network (LAN) or a Wide Area Network (WAN)—or, may be connected to an external computer (eg, use an internet service provider to connect via the internet). In some embodiments, electronic circuits, such as programmable logic circuits, FPGAs, or Programmable Logic Arrays (PLAs), that can execute computer-readable Program instructions are read to implement various aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer or other programmable data processing apparatus to produce a machine that causes the instructions when executed by the processor of the computer or other programmable data processing apparatus , resulting in means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagrams. These computer readable program instructions can also be stored in a computer readable storage medium, these instructions cause a computer, programmable data processing apparatus and/or other equipment to operate in a specific manner, so that the computer readable medium on which the instructions are stored includes An article of manufacture comprising instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowchart and/or block diagrams.

Computer readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other equipment to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process , thereby causing instructions executing on a computer, other programmable data processing apparatus, or other device to implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more functions for implementing the specified logical function(s) executable instructions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or actions , or can be implemented in a combination of dedicated hardware and computer instructions.

The computer program product can be specifically implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc. Wait.

Various embodiments of the present disclosure have been described above, and the foregoing descriptions are exemplary, not exhaustive, and not limiting of the disclosed embodiments. Numerous modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the various embodiments, the practical application or improvement over the technology in the marketplace, or to enable others of ordinary skill in the art to understand the various embodiments disclosed herein.

Industrial Applicability

The present disclosure provides a character detection method, device, electronic device, storage medium and program; wherein, multiple boundary lines of a first character sequence in a to-be-processed image are respectively predicted to obtain multiple boundaries of the first character sequence line prediction parameters, wherein the boundary line of the first character sequence represents the dividing line between the area where the first character sequence is located and the area not where the first character sequence is located; according to the multiplicity of the first character sequence The prediction parameters of the boundary lines determine the position information of the vertices of the bounding box of the first character sequence; according to the position information of the vertices of the bounding box of the first character sequence, determine the location information.

Claims

A character detection method comprising:

Predicting multiple boundary lines of the first character sequence in the image to be processed respectively, to obtain prediction parameters of multiple boundary lines of the first character sequence, wherein the boundary line of the first character sequence represents the first character The dividing line between the region where the sequence is located and the region not where the first character sequence is located;

According to the prediction parameters of a plurality of boundary lines of the first character sequence, determine the position information of the vertices of the bounding box of the first character sequence;

According to the position information of the vertices of the bounding box of the first character sequence, the position information of the bounding box of the first character sequence is determined.
The method according to claim 1, wherein the multiple boundary lines of the first character sequence in the image to be processed are respectively predicted, and the prediction parameters of the multiple boundary lines of the first character sequence are obtained, comprising:

Based on the to-be-processed image, with respect to the first feature point related to the first character sequence, respectively predict the parameters corresponding to the first feature point of the plurality of boundary lines of the first character sequence;

The prediction parameters of the plurality of boundary lines of the first character sequence are determined according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature points.
The method of claim 2, further comprising:

Predict the probability that the position of the pixel in the to-be-processed image belongs to the character;

The first feature point is determined according to the probability that the position of the pixel in the image to be processed belongs to a character.
According to the method according to claim 2 or 3, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature points include:

Distance parameters and angle parameters of multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, wherein the polar coordinate system corresponding to the first feature point is represented by the first feature point. A polar coordinate system in which the feature points are poles.
The method according to claim 4, wherein the prediction parameters of the plurality of boundary lines of the first character sequence are determined according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature points, include:

Mapping the distance parameters and angle parameters of the plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to the Cartesian coordinate system to obtain the plurality of boundary lines of the first character sequence A parameter corresponding to the first feature point in the Cartesian coordinate system;

According to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system, the prediction parameters of the plurality of boundary lines of the first character sequence are determined.
According to the method according to any one of claims 1 to 5, the plurality of boundary lines of the first character sequence includes an upper boundary line, a right boundary line, a lower boundary line and a left boundary line of the first character sequence.
The method according to claim 2, wherein, based on the to-be-processed image, for the first feature points related to the first character sequence, it is respectively predicted that a plurality of boundary lines of the first character sequence corresponds to the first character sequence. Parameters of feature points, including:

Input the image to be processed into a pre-trained neural network, and through the neural network, for the first feature points related to the first character sequence, respectively predict that multiple boundary lines of the first character sequence correspond to the first character sequence. Parameters of feature points.
The method of claim 7, further comprising:

The probability that the position of the pixel in the image to be processed belongs to a character is predicted through the neural network.
The method according to claim 7 or 8, before inputting the image to be processed into a pre-trained neural network, the method further comprises:

Input the training image into the neural network, and through the neural network, for the second feature points related to the second character sequence in the training image, respectively predict that a plurality of boundary lines of the second character sequence correspond to the The predicted value of the parameter of the second feature point;

According to the multiple boundary lines of the second character sequence corresponding to the predicted value of the parameter of the second feature point, and the multiple boundary lines of the second character sequence corresponding to the true value of the parameter of the second feature point value to train the neural network.
The method according to claim 9, wherein the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point comprise: the plurality of boundary lines of the second character sequence are at the second feature point The distance parameter and the angle parameter under the corresponding polar coordinate system, wherein, the polar coordinate system corresponding to the second feature point represents a polar coordinate system with the second feature point as a pole;

The multiple boundary lines according to the second character sequence correspond to the predicted value of the parameter of the second feature point, and the multiple boundary lines of the second character sequence correspond to the parameter of the second feature point The true value of , train the neural network, including:

According to the predicted value of the distance parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence, and the distance parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence The true value of , train the neural network;

and / or,

According to the predicted value of the angle parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence, and the angle parameter of the second feature point corresponding to the plurality of boundary lines of the second character sequence The true value of , to train the neural network.
The method according to claim 10, wherein the plurality of boundary lines according to the second character sequence corresponds to the predicted value of the distance parameter of the second feature point, and the plurality of boundary lines of the second character sequence Corresponding to the true value of the distance parameter of the second feature point, training the neural network, including:

For any boundary line among the plurality of boundary lines of the second character sequence, according to the boundary line corresponding to the smaller value and the larger value of the distance parameter of the second feature point and the predicted value The ratio of , to train the neural network.
The method according to claim 10, wherein the plurality of boundary lines according to the second character sequence corresponds to the predicted value of the angle parameter of the second feature point, and the plurality of boundary lines of the second character sequence Corresponding to the true value of the angle parameter of the second feature point, training the neural network, including:

For any one of the multiple boundary lines of the second character sequence, determine the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point;

The neural network is trained according to the sine of the half angle of the absolute value.
The method according to any one of claims 9 to 12, wherein the second feature points include feature points in an effective area corresponding to the second character area.
The method of any one of claims 9 to 13, further comprising:

Predicting the probability that the position of the pixel in the training image belongs to a character via the neural network;

The neural network is trained according to the probability that the positions of the pixels in the training image belong to characters, and the labeled data that the positions of the pixels in the training image belong to the characters.
The method according to claim 14, wherein the training of the neural network according to the probability that the position of the pixel in the training image belongs to the character, and the labeled data that the position of the pixel in the training image belongs to the character, comprises:

The neural network is trained according to the probability that the position of the pixel in the valid region corresponding to the second character sequence belongs to the character, and the labeled data that the position of the pixel in the valid region belongs to the character.
The method of claim 13 or 15, further comprising:

obtaining the position information of the real bounding box of the second character sequence;

According to the position information of the real bounding box and the preset ratio, the real bounding box is reduced to obtain an effective area corresponding to the second character sequence.
The method according to claim 16, wherein reducing the real bounding box according to the position information of the real bounding box and a preset ratio to obtain an effective area corresponding to the second character sequence, comprising:

Determine the anchor point of the real bounding box according to the position information of the real bounding box, wherein the anchor point of the real bounding box is the intersection of the diagonal lines of the real bounding box;

According to the position information of the real bounding box, the position information of the anchor points of the real bounding box, and the preset ratio, the real bounding box is reduced to obtain the effective area corresponding to the second character sequence, wherein the first The ratio of the distance to the second distance is equal to the preset ratio, the first distance represents the distance between the first vertex of the effective area and the anchor point, and the second distance represents the The distance between the vertex corresponding to the first vertex and the anchor point, where the first vertex represents any vertex of the effective area.
A character detection device, comprising:

The first prediction module is configured to respectively predict multiple boundary lines of the first character sequence in the image to be processed, and obtain prediction parameters of the multiple boundary lines of the first character sequence, wherein the boundary of the first character sequence The line represents the dividing line between the area where the first character sequence is located and the area not where the first character sequence is located;

a first determining module, configured to determine the position information of the vertices of the bounding box of the first character sequence according to the prediction parameters of a plurality of boundary lines of the first character sequence;

The second determining module is configured to determine the position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence.
An electronic device comprising:

one or more processors;

memory for storing executable instructions;

Wherein, the one or more processors are configured to invoke executable instructions stored in the memory to perform the character detection method of any one of claims 1 to 17.
A computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the character detection method according to any one of claims 1 to 17.
A computer program comprising computer-readable code, in the case of the computer-readable code being executed in an electronic device, executed by a processor of the electronic device for implementing any one of claims 1 to 17 A method of character detection as described.