TW202219822A - Character detection method, electronic equipment and computer-readable storage medium - Google Patents

Publication number
TW202219822A
Authority
TW
Taiwan
Prior art keywords
character sequence
character
feature point
boundary lines
parameters
Prior art date
Application number
TW110112439A
Other languages
Chinese (zh)
Inventor
畢研廣
胡志强
Original Assignee
大陸商上海商湯智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 大陸商上海商湯智能科技有限公司
Publication of TW202219822A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The present disclosure relates to a character detection method, an electronic device and a computer-readable storage medium. The method includes: separately predicting a plurality of boundary lines of a first character sequence in an image to be processed, to obtain prediction parameters of the plurality of boundary lines of the first character sequence, wherein a boundary line of the first character sequence represents the dividing line between the area where the first character sequence is located and the area where it is not located; determining position information of the vertices of a bounding box of the first character sequence according to the prediction parameters of the plurality of boundary lines of the first character sequence; and determining position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box. The embodiments of the present disclosure can improve the accuracy of character detection.

Description

Character detection method, electronic device and computer-readable storage medium

The present invention relates to the technical field of computer vision, and in particular to a character detection method, an electronic device and a computer-readable storage medium.

Character detection in natural scenes is an important research area in computer vision and has been applied in many scenarios, such as real-time text translation, document recognition and license plate recognition. In the related art, characters lie on a rigid plane in practical application scenarios. During imaging, however, perspective warping and distortion introduced by the camera cause characters on a rigid plane to appear as irregular, arbitrary quadrilaterals. For such characters, the four boundaries must be regressed and localized accurately so that the correct character shape can be rectified in the subsequent character recognition stage and the character content can be recognized correctly.

The present invention provides a technical solution for character detection.

An embodiment of the present invention provides a character detection method, including: separately predicting a plurality of boundary lines of a first character sequence in an image to be processed, to obtain prediction parameters of the plurality of boundary lines of the first character sequence, wherein a boundary line of the first character sequence represents the dividing line between the area where the first character sequence is located and the area where it is not located; determining position information of the vertices of a bounding box of the first character sequence according to the prediction parameters of the plurality of boundary lines of the first character sequence; and determining position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box. In this way, the polygonal (for example, quadrilateral) bounding box of the character sequence is decomposed into a plurality of (for example, four) independent boundary lines, and each boundary line is detected on its own, so that the detection of any one boundary line is not disturbed by two different vertices, which improves the accuracy of character detection.
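
The step from boundary lines to bounding-box vertices can be sketched as pairwise line intersection. This is a minimal illustration, assuming each predicted boundary line is available in the general form a*x + b*y + c = 0 and that the four lines are ordered top, right, bottom, left; the names `intersect` and `box_vertices` are hypothetical and do not come from the patent text.

```python
from typing import List, Tuple

Line = Tuple[float, float, float]  # (a, b, c) for the line a*x + b*y + c = 0 (assumed form)
Point = Tuple[float, float]


def intersect(l1: Line, l2: Line) -> Point:
    """Intersection of two non-parallel lines, solved with Cramer's rule."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("boundary lines are parallel; no unique vertex")
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return (x, y)


def box_vertices(top: Line, right: Line, bottom: Line, left: Line) -> List[Point]:
    """Vertices of the quadrilateral box: each vertex is where two adjacent lines meet."""
    return [
        intersect(left, top),      # top-left
        intersect(top, right),     # top-right
        intersect(right, bottom),  # bottom-right
        intersect(bottom, left),   # bottom-left
    ]
```

Because each vertex is derived from two independently predicted lines, an error in one line moves only the two vertices on that line.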

In some embodiments of the present invention, separately predicting the plurality of boundary lines of the first character sequence in the image to be processed, to obtain the prediction parameters of the plurality of boundary lines of the first character sequence, includes: based on the image to be processed, for a first feature point related to the first character sequence, separately predicting the parameters of the plurality of boundary lines of the first character sequence that correspond to the first feature point; and determining the prediction parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines corresponding to the first feature point. In this way, the parameters of the boundary lines of the first character sequence are predicted from feature points related to the first character sequence, which helps to obtain the prediction parameters of the boundary lines more efficiently and more accurately.

In some embodiments of the present invention, the method further includes: predicting the probability that the position of each pixel in the image to be processed belongs to a character; and determining the first feature point according to the probability that the position of each pixel in the image to be processed belongs to a character. In this way, the first feature point related to the first character sequence can be determined accurately. Predicting the parameters of the boundary lines of the first character sequence from a first feature point determined in this way helps to further improve both the efficiency of obtaining the prediction parameters of the boundary lines and the accuracy of the resulting prediction parameters.
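
One way to read "determining the first feature point according to the probability" is a simple threshold on the per-pixel character probability. The sketch below assumes that rule; the threshold value of 0.5 and the function name `select_feature_points` are illustrative assumptions, not details stated in the text.

```python
from typing import List, Sequence, Tuple


def select_feature_points(
    prob_map: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; the source does not give a value
) -> List[Tuple[int, int]]:
    """Return (x, y) positions whose character probability reaches the threshold."""
    return [
        (x, y)
        for y, row in enumerate(prob_map)
        for x, p in enumerate(row)
        if p >= threshold
    ]
```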

In some embodiments of the present invention, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point include: a distance parameter and an angle parameter of each of the plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, wherein the polar coordinate system corresponding to the first feature point is the polar coordinate system whose pole is the first feature point. In this way, mapping the straight-line equation of a boundary line from the Cartesian coordinate system into a polar coordinate system yields a distance parameter and an angle parameter that have a clear physical meaning in the image and are independent of each other, which reduces the number of parameters and the correlation between them and facilitates learning by the network.
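
To make the polar parameters concrete, the sketch below computes a distance and an angle for one boundary line relative to one feature point. It assumes the distance is the perpendicular distance from the feature point (the pole) to the line, and the angle is the direction of that perpendicular measured in [0, 2π); the text only states that a distance parameter and an angle parameter are used, so this convention and the name `line_to_polar` are assumptions.

```python
import math
from typing import Tuple


def line_to_polar(
    line: Tuple[float, float, float],  # (a, b, c) for a*x + b*y + c = 0 (assumed form)
    pole: Tuple[float, float],         # the feature point used as the pole
) -> Tuple[float, float]:
    """Return (distance, angle) of the line in the polar system centred on the pole."""
    a, b, c = line
    x0, y0 = pole
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate line: a and b are both zero")
    signed = (a * x0 + b * y0 + c) / norm  # signed distance of the pole from the line
    sign = -1.0 if signed > 0 else 1.0     # orient the normal from the pole toward the line
    nx, ny = sign * a / norm, sign * b / norm
    theta = math.atan2(ny, nx) % (2.0 * math.pi)
    return abs(signed), theta
```

For the vertical line x = 4 and the pole (1, 1), this gives a distance of 3 and an angle of 0, that is, the line lies three pixels away in the positive x direction.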

In some embodiments of the present invention, determining the prediction parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines corresponding to the first feature point includes: mapping the distance parameters and the angle parameters of the plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to a Cartesian coordinate system, to obtain the parameters of the plurality of boundary lines in the Cartesian coordinate system corresponding to the first feature point; and determining the prediction parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines in the Cartesian coordinate system corresponding to the first feature point. In this way, the prediction parameters of the boundary lines can be obtained by regression from parameters expressed in different polar coordinate systems.
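
The mapping back to a shared Cartesian frame can be sketched as follows. The conversion assumes the same convention as a perpendicular distance `d` and normal direction `theta` measured from the feature point; how the per-feature-point results are then combined is not specified in this passage, so the plain averaging in `average_lines` is purely an illustrative assumption.

```python
import math
from typing import List, Tuple

Line = Tuple[float, float, float]  # (a, b, c) for a*x + b*y + c = 0 (assumed form)


def polar_to_line(pole: Tuple[float, float], d: float, theta: float) -> Line:
    """Line at perpendicular distance d from the pole, in direction theta."""
    x0, y0 = pole
    a, b = math.cos(theta), math.sin(theta)
    # Points p on the line satisfy (p - pole) . (a, b) = d.
    c = -(d + a * x0 + b * y0)
    return (a, b, c)


def average_lines(lines: List[Line]) -> Line:
    """Assumed aggregation: average the unit-normal line coefficients over feature points."""
    n = len(lines)
    if n == 0:
        raise ValueError("no lines to aggregate")
    return (
        sum(l[0] for l in lines) / n,
        sum(l[1] for l in lines) / n,
        sum(l[2] for l in lines) / n,
    )
```

Two feature points at (1, 1) and (2, 5) that report distances 3 and 2 at angle 0 both map to the same Cartesian line x = 4, which is why predictions made in different polar systems can be combined once they are in one frame.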

In some embodiments of the present invention, the plurality of boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line and a left boundary line of the first character sequence. Because the shape of a character sequence is a quadrilateral in most cases, this implementation helps to obtain more accurate position information of the bounding box of the character sequence in most cases.

In some embodiments of the present invention, separately predicting, based on the image to be processed and for the first feature point related to the first character sequence, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point includes: inputting the image to be processed into a pre-trained neural network, and separately predicting, via the neural network and for the first feature point related to the first character sequence, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point. Using a pre-trained neural network in this way can increase the speed of parameter prediction and improve the accuracy of the predicted parameters.

In some embodiments of the present invention, the method further includes: predicting, via the neural network, the probability that the position of each pixel in the image to be processed belongs to a character. Using a pre-trained neural network in this way can increase the speed of predicting the probability that a pixel position belongs to a character and improve the accuracy of the predicted probability.

In some embodiments of the present invention, before the image to be processed is input into the pre-trained neural network, the method further includes: inputting a training image into the neural network, and separately predicting, via the neural network and for a second feature point related to a second character sequence in the training image, predicted values of the parameters of a plurality of boundary lines of the second character sequence corresponding to the second feature point; and training the neural network according to the predicted values and the true values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point. In this way, the polygonal (for example, quadrilateral) bounding box of the character sequence is decomposed into a plurality of (for example, four) independent boundary lines and each boundary line is detected on its own, so that vertex regression does not perturb the training of the neural network. This improves the learning efficiency and detection effect of the neural network, and a neural network trained according to this implementation can learn to predict the parameters of the boundary lines of a character sequence accurately.

In some embodiments of the present invention, the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point include: a distance parameter and an angle parameter of each of the plurality of boundary lines of the second character sequence in the polar coordinate system corresponding to the second feature point, wherein the polar coordinate system corresponding to the second feature point is the polar coordinate system whose pole is the second feature point. Training the neural network according to the predicted values and the true values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point includes: training the neural network according to the predicted values and the true values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point; and/or training the neural network according to the predicted values and the true values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point. In this way, mapping the straight-line equation from the Cartesian coordinate system into a polar coordinate system reduces the number of parameters and the correlation between them and gives the parameters an actual physical meaning, which facilitates learning by the network. In addition, training the neural network to detect the distance and the angle of each boundary line of the character sequence relative to a feature point keeps the detection of the boundary lines from interfering with one another, which improves the learning efficiency and detection effect of the neural network.

In some embodiments of the present invention, training the neural network according to the predicted values and the true values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point includes: for any one of the plurality of boundary lines of the second character sequence, training the neural network according to the ratio of the smaller to the larger of the true value and the predicted value of the distance parameter of that boundary line corresponding to the second feature point. Using this ratio normalizes distance parameters of different magnitudes across application scenarios, which supports multi-scale character detection, that is, it helps to achieve higher accuracy when detecting characters at different scales.
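
The scale-normalizing effect of the smaller-to-larger ratio can be seen in a short sketch. The ratio itself follows the text; turning it into a loss with a negative logarithm is an assumption made here for illustration (the passage does not say how the ratio enters the loss), as are the `eps` stabilizer and the function names. Distances are assumed to be non-negative.

```python
import math


def distance_ratio(d_true: float, d_pred: float, eps: float = 1e-6) -> float:
    """Ratio of the smaller to the larger distance; equals 1 for a perfect prediction."""
    lo, hi = min(d_true, d_pred), max(d_true, d_pred)
    return (lo + eps) / (hi + eps)  # eps (assumed) avoids division by zero


def distance_loss(d_true: float, d_pred: float) -> float:
    """Assumed loss form: negative log of the ratio, zero when the prediction is exact."""
    return -math.log(distance_ratio(d_true, d_pred))
```

Because only the ratio matters, predicting 2 for a true distance of 4 is penalized almost exactly as much as predicting 20 for a true distance of 40, so large and small text contribute comparably.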

In some embodiments of the present invention, training the neural network according to the predicted values and the true values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point includes: for any one of the plurality of boundary lines of the second character sequence, determining the absolute value of the difference between the true value and the predicted value of the angle parameter of that boundary line corresponding to the second feature point; and training the neural network according to the sine of half of that absolute value. In this way, confusion between an angle of 0 and the angle shown in Figure 02_image001 does not interfere with the learning of the neural network, which helps to improve the learning efficiency and detection effect of the neural network.
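
The half-angle sine term can be sketched directly from the description. The sketch assumes angles are expressed in radians within [0, 2π], and that the angle in the omitted formula image is the full turn 2π, which the surrounding text suggests but does not state; the name `angle_loss` is hypothetical.

```python
import math


def angle_loss(theta_true: float, theta_pred: float) -> float:
    """Sine of half the absolute angle difference, for angles assumed to lie in [0, 2*pi]."""
    delta = abs(theta_true - theta_pred)
    return math.sin(delta / 2.0)
```

Under this form, a prediction that differs from the true angle by a full turn incurs essentially no penalty, while a prediction pointing in the opposite direction (a difference of π) incurs the maximum penalty of 1.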

In some embodiments of the present invention, the second feature point includes a feature point in the effective area corresponding to the second character region. In this way, when the loss function of the neural network is computed, only the feature points inside the effective area corresponding to the second character region are supervised and the feature points outside it are not, which helps to reduce the burden on the network.

In some embodiments of the present invention, the method further includes: predicting, via the neural network, the probability that the position of each pixel in the training image belongs to a character; and training the neural network according to the probability that the position of each pixel in the training image belongs to a character and the annotation data indicating whether the position of each pixel in the training image belongs to a character. In this way, the neural network learns to predict the probability that a pixel position belongs to a character.

In some embodiments of the present invention, training the neural network according to the probability that the position of each pixel in the training image belongs to a character and the annotation data indicating whether the position of each pixel in the training image belongs to a character includes: training the neural network according to the probability that the position of each pixel in the effective area corresponding to the second character sequence belongs to a character and the annotation data indicating whether the position of each pixel in the effective area belongs to a character. In this way, the neural network learns character segmentation, and learns it more efficiently.

In some embodiments of the present invention, the method further includes: obtaining position information of the real bounding box of the second character sequence; and shrinking the real bounding box according to the position information of the real bounding box and a preset ratio, to obtain the effective area corresponding to the second character sequence. Training the neural network on the feature points inside the effective area obtained in this way helps to reduce the burden on the network.

In some embodiments of the present invention, shrinking the real bounding box according to the position information of the real bounding box and the preset ratio, to obtain the effective area corresponding to the second character sequence, includes: determining an anchor point of the real bounding box according to the position information of the real bounding box, wherein the anchor point of the real bounding box is the intersection of the diagonals of the real bounding box; and shrinking the real bounding box according to the position information of the real bounding box, the position information of the anchor point and the preset ratio, to obtain the effective area corresponding to the second character sequence, wherein the ratio of a first distance to a second distance equals the preset ratio, the first distance is the distance between a first vertex of the effective area and the anchor point, the second distance is the distance between the anchor point and the vertex of the real bounding box that corresponds to the first vertex, and the first vertex is any vertex of the effective area. Training the neural network on the feature points inside the effective area obtained in this way helps to improve the learning efficiency and prediction accuracy of the neural network.
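
The shrinking step can be sketched as scaling each vertex toward the diagonal intersection. The sketch assumes the real bounding box is a convex quadrilateral with vertices listed in order around the box, so that vertices 0 and 2 form one diagonal and vertices 1 and 3 the other; the function names and the ratio value used in any call are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def diagonal_intersection(quad: Sequence[Point]) -> Point:
    """Anchor point: intersection of diagonals v0-v2 and v1-v3 of an ordered quadrilateral."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = quad
    dxa, dya = x3 - x1, y3 - y1  # direction of diagonal v0 -> v2
    dxb, dyb = x4 - x2, y4 - y2  # direction of diagonal v1 -> v3
    det = dxa * dyb - dya * dxb
    if abs(det) < 1e-12:
        raise ValueError("degenerate quadrilateral: diagonals are parallel")
    t = ((x2 - x1) * dyb - (y2 - y1) * dxb) / det
    return (x1 + t * dxa, y1 + t * dya)


def shrink_box(quad: Sequence[Point], ratio: float) -> List[Point]:
    """Move every vertex toward the anchor so that new/old anchor distance equals ratio."""
    ax, ay = diagonal_intersection(quad)
    return [(ax + ratio * (x - ax), ay + ratio * (y - ay)) for x, y in quad]
```

Each shrunken vertex stays on the segment between the anchor and the original vertex, so the distance ratio described above holds for every vertex by construction.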

For the effects of the apparatus, electronic device and so on described below, refer to the description of the method above; they are not repeated here.

An embodiment of the present invention further provides a character detection apparatus, including: a first prediction module, configured to separately predict a plurality of boundary lines of a first character sequence in an image to be processed, to obtain prediction parameters of the plurality of boundary lines of the first character sequence, wherein a boundary line of the first character sequence represents the dividing line between the area where the first character sequence is located and the area where it is not located; a first determination module, configured to determine position information of the vertices of a bounding box of the first character sequence according to the prediction parameters of the plurality of boundary lines of the first character sequence; and a second determination module, configured to determine position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box.

In some embodiments of the present invention, the first prediction module is configured to: based on the image to be processed, for a first feature point related to the first character sequence, separately predict the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point; and determine the prediction parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines corresponding to the first feature point.

In some embodiments of the present invention, the apparatus further includes: a second prediction module, configured to predict the probability that the position of a pixel in the image to be processed belongs to a character; and a third determining module, configured to determine the first feature point according to the probability that the position of a pixel in the image to be processed belongs to a character.
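As a non-limiting illustration, one way to determine feature points from the per-pixel character probability is to keep the positions whose probability exceeds a threshold. The following is a minimal sketch; the thresholding rule, the function name `select_feature_points`, and the default threshold of 0.5 are assumptions made for illustration and are not specified in this description.

```python
def select_feature_points(prob_map, threshold=0.5):
    """Return (x, y) positions whose character probability exceeds threshold.

    prob_map is a list of rows; prob_map[y][x] is the predicted probability
    that position (x, y) belongs to a character. The thresholding rule is an
    assumption; the description only states that feature points are
    determined according to this probability.
    """
    return [
        (x, y)
        for y, row in enumerate(prob_map)
        for x, p in enumerate(row)
        if p > threshold
    ]
```

For example, with a 2x2 probability map `[[0.1, 0.9], [0.6, 0.2]]` and the default threshold, the positions `(1, 0)` and `(0, 1)` would be kept.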

In some embodiments of the present invention, the parameters of the multiple boundary lines of the first character sequence that correspond to the first feature point include: a distance parameter and an angle parameter of each of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, where the polar coordinate system corresponding to the first feature point is the polar coordinate system whose pole is the first feature point.

In some embodiments of the present invention, the first prediction module is configured to: map the distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to a Cartesian coordinate system, to obtain the parameters of the multiple boundary lines of the first character sequence that correspond to the first feature point in the Cartesian coordinate system; and determine the prediction parameters of the multiple boundary lines of the first character sequence according to the parameters of the multiple boundary lines that correspond to the first feature point in the Cartesian coordinate system.
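As a non-limiting illustration, the mapping from the polar parameters to a Cartesian line equation may be sketched as follows. The sketch assumes that the distance parameter `rho` is the perpendicular distance from the pole (the feature point) to the boundary line and that the angle parameter `theta` is the direction of that perpendicular; this interpretation, the function name, and the `(a, b, c)` output form are assumptions made for illustration.

```python
import math


def polar_line_to_cartesian(x0, y0, rho, theta):
    """Map a line given by (rho, theta) around the pole (x0, y0) to a*x + b*y = c.

    Assumption: rho is the perpendicular distance from the pole to the line,
    and theta is the angle of that perpendicular measured from the x axis.
    The foot of the perpendicular is (x0 + rho*cos(theta), y0 + rho*sin(theta)),
    and the line's unit normal is (cos(theta), sin(theta)).
    """
    a = math.cos(theta)
    b = math.sin(theta)
    c = rho + x0 * a + y0 * b
    return a, b, c
```

Under these assumptions, a feature point at (2, 3) with `rho = 5` and `theta = 0` yields the vertical line x = 7 in the image coordinate system shared by all feature points.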

In some embodiments of the present invention, the multiple boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line, and a left boundary line of the first character sequence.

In some embodiments of the present invention, the first prediction module is configured to input the image to be processed into a pre-trained neural network and, via the neural network and for the first feature point related to the first character sequence, separately predict the parameters of the multiple boundary lines of the first character sequence that correspond to the first feature point.

In some embodiments of the present invention, the apparatus further includes: a third prediction module, configured to predict, via the neural network, the probability that the position of a pixel in the image to be processed belongs to a character.

In some embodiments of the present invention, the apparatus further includes: a fourth prediction module, configured to input a training image into the neural network and, via the neural network and for a second feature point related to a second character sequence in the training image, separately predict predicted values of the parameters of the multiple boundary lines of the second character sequence that correspond to the second feature point; and a first training module, configured to train the neural network according to the predicted values and the ground-truth values of the parameters of the multiple boundary lines of the second character sequence that correspond to the second feature point.

In some embodiments of the present invention, the parameters of the multiple boundary lines of the second character sequence that correspond to the second feature point include: a distance parameter and an angle parameter of each of the multiple boundary lines of the second character sequence in the polar coordinate system corresponding to the second feature point, where the polar coordinate system corresponding to the second feature point is the polar coordinate system whose pole is the second feature point. The first training module is configured to: train the neural network according to the predicted values and the ground-truth values of the distance parameters of the multiple boundary lines of the second character sequence that correspond to the second feature point; and/or train the neural network according to the predicted values and the ground-truth values of the angle parameters of the multiple boundary lines of the second character sequence that correspond to the second feature point.

In some embodiments of the present invention, the first training module is configured to, for any one of the multiple boundary lines of the second character sequence, train the neural network according to the ratio of the smaller to the larger of the ground-truth value and the predicted value of the distance parameter of that boundary line corresponding to the second feature point.
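As a non-limiting illustration, the ratio of the smaller to the larger of the ground-truth and predicted distance parameters may be computed as sketched below. The ratio follows the description above; turning it into a loss through a negative logarithm, the function names, and the small constant `eps` are assumptions made for illustration, and the distances are assumed to be non-negative.

```python
import math


def distance_ratio(rho_true, rho_pred, eps=1e-9):
    """Ratio of the smaller to the larger of the two distances, in (0, 1]."""
    lo, hi = sorted((rho_true, rho_pred))
    # Clamp both values to eps so the ratio stays defined near zero.
    return max(lo, eps) / max(hi, eps)


def distance_loss(rho_true, rho_pred):
    """Assumed loss form: -log(ratio), which is 0 when the two values agree."""
    return -math.log(distance_ratio(rho_true, rho_pred))
```

A ratio of this kind depends only on the relative error, so, under this sketch, a prediction of 4 against a ground truth of 8 is penalized the same as 40 against 80.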

In some embodiments of the present invention, the first training module is configured to: for any one of the multiple boundary lines of the second character sequence, determine the absolute value of the difference between the ground-truth value and the predicted value of the angle parameter of that boundary line corresponding to the second feature point; and train the neural network according to the sine of half of that absolute value.
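As a non-limiting illustration, the angle term described above, namely the sine of half of the absolute difference between the ground-truth and predicted angle parameters, may be sketched as follows. Angles are assumed to be in radians, and the function name is hypothetical; how this term is weighted or combined with other terms is not shown.

```python
import math


def angle_loss(theta_true, theta_pred):
    """Sine of half the absolute angle difference (angles in radians)."""
    return math.sin(abs(theta_true - theta_pred) / 2.0)
```

For differences between 0 and 2π this term is non-negative, reaches 1 at a difference of π, and returns to approximately 0 at a difference of 2π, so angles that differ by a full turn are treated as nearly identical.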

In some embodiments of the present invention, the second feature point includes a feature point in the effective region corresponding to the second character region.

In some embodiments of the present invention, the apparatus further includes: a fifth prediction module, configured to predict, via the neural network, the probability that the position of a pixel in the training image belongs to a character; and a second training module, configured to train the neural network according to the probability that the position of a pixel in the training image belongs to a character and the annotation data indicating whether the position of that pixel belongs to a character.

In some embodiments of the present invention, the second training module is configured to train the neural network according to the probability that the position of a pixel in the effective region corresponding to the second character sequence belongs to a character and the annotation data indicating whether the position of that pixel in the effective region belongs to a character.

In some embodiments of the present invention, the apparatus further includes: an acquisition module, configured to acquire position information of the ground-truth bounding box of the second character sequence; and a shrinking module, configured to shrink the ground-truth bounding box according to the position information of the ground-truth bounding box and a preset ratio, to obtain the effective region corresponding to the second character sequence.

In some embodiments of the present invention, the shrinking module is configured to: determine the anchor point of the ground-truth bounding box according to the position information of the ground-truth bounding box, where the anchor point of the ground-truth bounding box is the intersection of the diagonals of the ground-truth bounding box; and shrink the ground-truth bounding box according to the position information of the ground-truth bounding box, the position information of the anchor point, and the preset ratio, to obtain the effective region corresponding to the second character sequence, where the ratio of a first distance to a second distance equals the preset ratio, the first distance is the distance between a first vertex of the effective region and the anchor point, the second distance is the distance between the anchor point and the vertex of the ground-truth bounding box that corresponds to the first vertex, and the first vertex is any vertex of the effective region.
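As a non-limiting illustration, shrinking a quadrilateral ground-truth bounding box toward the intersection of its diagonals may be sketched as follows. The vertex ordering (consecutive vertices around the quadrilateral), the function names, and the example ratio are assumptions made for illustration.

```python
def diagonal_intersection(quad):
    """Intersection of the diagonals of a quadrilateral [p0, p1, p2, p3].

    Vertices are assumed to be listed consecutively around the quadrilateral,
    so the diagonals are p0-p2 and p1-p3.
    """
    (x1, y1), (x2, y2) = quad[0], quad[2]
    (x3, y3), (x4, y4) = quad[1], quad[3]
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        raise ValueError("degenerate quadrilateral: diagonals are parallel")
    d1 = x1 * y2 - y1 * x2
    d2 = x3 * y4 - y3 * x4
    px = (d1 * (x3 - x4) - (x1 - x2) * d2) / denom
    py = (d1 * (y3 - y4) - (y1 - y2) * d2) / denom
    return px, py


def shrink_box(quad, ratio):
    """Move every vertex toward the anchor so that
    |new vertex - anchor| / |old vertex - anchor| equals ratio."""
    ax, ay = diagonal_intersection(quad)
    return [(ax + ratio * (x - ax), ay + ratio * (y - ay)) for x, y in quad]
```

For example, shrinking the rectangle with vertices (0, 0), (4, 0), (4, 2), (0, 2) by a ratio of 0.5 gives an anchor at (2, 1) and an effective region with vertices (1, 0.5), (3, 0.5), (3, 1.5), (1, 1.5).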

An embodiment of the present invention further provides an electronic device, including: one or more processors; and a memory for storing executable instructions; where the one or more processors are configured to call the executable instructions stored in the memory to execute the character detection method described in any of the above embodiments.

An embodiment of the present invention further provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the character detection method described in any of the above embodiments.

An embodiment of the present invention further provides a computer program including computer-readable code, where, when the computer-readable code runs in an electronic device, a processor of the electronic device executes the character detection method described in any of the above embodiments.

In the embodiments of the present invention, multiple boundary lines of a first character sequence in an image to be processed are predicted separately to obtain prediction parameters of the multiple boundary lines; position information of the vertices of the bounding box of the first character sequence is determined according to those prediction parameters; and position information of the bounding box of the first character sequence is determined according to the position information of the vertices. In this way, the polygonal (for example, quadrilateral) bounding box of a character sequence is decomposed into multiple (for example, four) independent boundary lines, and each independent boundary line is detected on its own, so that the detection of any one boundary line is not disturbed by two different vertices, which can improve the accuracy of character detection.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.

Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote components with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.

In addition, numerous specific details are given in the following detailed description in order to better illustrate the present invention. Those skilled in the art should understand that the present invention can also be practiced without certain of these details. In some instances, methods, means, components, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present invention.

In the related art, characters are mostly detected with rectangular boxes or rotated rectangular boxes, but neither can locate character boundaries accurately, which affects subsequent character recognition. The related art also proposes a character detection method in which the bounding box of a character is formed by regressing the four vertices of a quadrilateral. However, a vertex is in fact formed by the intersection of two adjacent edges, and the regression of each vertex affects two edges; each edge is therefore disturbed by two different vertices, which reduces the accuracy of the character detection result.

In view of the above problems, embodiments of the present invention provide a character detection method, apparatus, electronic device, storage medium, and program. The polygonal (for example, quadrilateral) bounding box of a character is decomposed into multiple (for example, four) independent boundary lines, and each independent boundary line is detected on its own, so that the detection of any one boundary line is not disturbed by two different vertices, which can improve the accuracy of character detection.

The character detection method provided by the embodiments of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 shows a flowchart of the character detection method provided by an embodiment of the present invention. The character detection method may be executed by a character detection apparatus. In some embodiments of the present invention, the character detection method may be executed by a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some embodiments of the present invention, the character detection method may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in FIG. 1, the character detection method includes steps S11 to S13.

In step S11, multiple boundary lines of a first character sequence in an image to be processed are predicted separately, to obtain prediction parameters of the multiple boundary lines of the first character sequence.

Here, a boundary line of the first character sequence is the dividing line between the region where the first character sequence is located and a region where the first character sequence is not located.

In the embodiments of the present invention, character detection may mean detecting the positions of characters and/or character sequences in an image, for example, detecting the positions of the bounding boxes of characters and/or character sequences in the image. The image to be processed is an image on which character detection needs to be performed. The first character sequence is any character sequence in the image to be processed. The image to be processed may include one or more character sequences. The first character sequence may include one or more characters, and a character may include at least one of a written character, a letter, a digit, a punctuation mark, an operator symbol, and the like. In some embodiments of the present invention, if the distance between any two characters in the image to be processed is less than or equal to a preset first distance threshold, the two characters are determined to belong to the same character sequence. In other embodiments of the present invention, when the writing direction in the image to be processed is horizontal, if any two characters belong to the same line of text and the distance between them is less than or equal to a preset second distance threshold, the two characters are determined to belong to the same character sequence; when the writing direction in the image to be processed is vertical, if any two characters belong to the same column of text and the distance between them is less than or equal to a preset third distance threshold, the two characters are determined to belong to the same character sequence. The writing direction may be indicated by the positional relationship between two adjacent characters. For example, if two adjacent characters are positioned left and right of each other, the writing direction is horizontal; if two adjacent characters are positioned above and below each other, the writing direction is vertical.
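As a non-limiting illustration, grouping characters into character sequences with a single distance threshold may be sketched as follows. Measuring the distance between character center points, merging groups transitively with a union-find structure, and the function name are assumptions made for illustration; the description above only states the pairwise threshold rule.

```python
import math


def group_characters(centers, threshold):
    """Group character indices whose pairwise center distance is <= threshold.

    centers is a list of (x, y) character center points (an assumed way of
    measuring the distance between two characters). Characters linked through
    a chain of close neighbors end up in the same group.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With centers at (0, 0), (1, 0), (2, 0), and (10, 0) and a threshold of 1.5, the first three characters would form one sequence and the fourth would form its own.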

In the embodiments of the present invention, a boundary line of the first character sequence is the dividing line between the region where the first character sequence is located and a region where the first character sequence is not located, where the latter may be a background region (that is, a region without characters) and/or a region where another character sequence is located. A boundary line of the first character sequence may be a straight line or a curve, which is not limited here. The prediction parameters of any boundary line of the first character sequence are the parameters of the predicted boundary line. When the boundary lines of the first character sequence are straight lines, the prediction parameters of any boundary line of the first character sequence may be the predicted parameters of the line equation corresponding to that boundary line. The position of the boundary line can be determined from the predicted parameters of its line equation.

In the embodiments of the present invention, when the boundary lines of the first character sequence are straight lines, the first character sequence has at least three boundary lines, and the multiple boundary lines of the first character sequence can enclose the bounding box of the first character sequence. The bounding box of the first character sequence may be a polygon, and accordingly the number of boundary lines of the first character sequence may correspond to the number of sides of the bounding box. For example, if the bounding box of the first character sequence is a quadrilateral, the first character sequence has four boundary lines. Of course, the bounding box of the first character sequence may also be a pentagon, a triangle, or the like, which is not limited here.

In some embodiments of the present invention, the multiple boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line, and a left boundary line of the first character sequence. In this embodiment, the bounding box of the first character sequence is a quadrilateral, and the first character sequence has four boundary lines. Taking the orientation of the characters in the first character sequence as the reference: the upper boundary line is the dividing line between the region where the first character sequence is located and the region above it where the first character sequence is not located; the right boundary line is the dividing line between the region where the first character sequence is located and the region to its right where the first character sequence is not located; the lower boundary line is the dividing line between the region where the first character sequence is located and the region below it where the first character sequence is not located; and the left boundary line is the dividing line between the region where the first character sequence is located and the region to its left where the first character sequence is not located. Because the shape of a character sequence is a quadrilateral in most cases, this embodiment helps to obtain more accurate position information of the bounding box of the character sequence in most cases.

In this embodiment, separately predicting the multiple boundary lines of the first character sequence in the image to be processed to obtain the prediction parameters of the multiple boundary lines may include: predicting the upper boundary line of the first character sequence in the image to be processed, to obtain the predicted parameters of the line equation corresponding to the upper boundary line; predicting the right boundary line of the first character sequence in the image to be processed, to obtain the predicted parameters of the line equation corresponding to the right boundary line; predicting the lower boundary line of the first character sequence in the image to be processed, to obtain the predicted parameters of the line equation corresponding to the lower boundary line; and predicting the left boundary line of the first character sequence in the image to be processed, to obtain the predicted parameters of the line equation corresponding to the left boundary line.

In step S12, position information of the vertices of the bounding box of the first character sequence is determined according to the prediction parameters of the multiple boundary lines of the first character sequence.

In the embodiments of the present invention, the intersections of the multiple boundary lines of the first character sequence can be obtained from the prediction parameters of the multiple boundary lines, and the position information of these intersections can be used as the position information of the vertices of the bounding box of the first character sequence. For example, the multiple boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line, and a left boundary line. From the predicted parameters of the line equations of the upper and right boundary lines, the intersection of the upper boundary line and the right boundary line can be obtained, and its position information can be used as the position information of the upper-right vertex of the bounding box. From the predicted parameters of the line equations of the right and lower boundary lines, the intersection of the right boundary line and the lower boundary line can be obtained, and its position information can be used as the position information of the lower-right vertex of the bounding box. From the predicted parameters of the line equations of the lower and left boundary lines, the intersection of the lower boundary line and the left boundary line can be obtained, and its position information can be used as the position information of the lower-left vertex of the bounding box. From the predicted parameters of the line equations of the left and upper boundary lines, the intersection of the left boundary line and the upper boundary line can be obtained, and its position information can be used as the position information of the upper-left vertex of the bounding box. In the embodiments of the present invention, the position information of the vertices of the bounding box of the first character sequence may be expressed as the coordinates of those vertices. For example, it may include the coordinates of the upper-left, upper-right, lower-right, and lower-left vertices of the bounding box of the first character sequence.
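As a non-limiting illustration, obtaining the four vertices from four straight boundary lines may be sketched as follows. The sketch assumes each boundary line is given as a triple `(a, b, c)` representing the line a*x + b*y = c; this representation and the function names are assumptions made for illustration.

```python
def intersect(line1, line2):
    """Intersection of two lines, each given as (a, b, c) with a*x + b*y = c."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("boundary lines are parallel; no unique intersection")
    # Cramer's rule for the 2x2 linear system.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


def box_vertices(top, right, bottom, left):
    """Vertices in the order upper-left, upper-right, lower-right, lower-left."""
    return [
        intersect(left, top),
        intersect(top, right),
        intersect(right, bottom),
        intersect(bottom, left),
    ]
```

For example, in image coordinates with the y axis pointing down, the lines y = 0 (upper), x = 4 (right), y = 2 (lower), and x = 0 (left) give the vertices (0, 0), (4, 0), (4, 2), and (0, 2). Because each vertex is derived from two independently predicted lines, an error in one line moves only the two vertices that lie on it.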

In step S13, the position information of the bounding box of the first character sequence is determined according to the position information of the vertices of the bounding box of the first character sequence.

In this embodiment of the present invention, the position information of the vertices of the bounding box of the first character sequence may be used as the position information of the bounding box of the first character sequence. For example, the position information of the bounding box of the first character sequence may include the coordinates of each vertex of the bounding box. Of course, when the bounding box of the first character sequence is a rectangle, the position information of the bounding box may also be represented by the coordinates of any one vertex of the bounding box and the lengths of the two sides connected to that vertex, which is not limited here.

FIG. 2 is a schematic diagram of a system architecture to which the character detection method of an embodiment of the present invention can be applied. As shown in FIG. 2, the system architecture includes an image acquisition terminal 201, a network 202 and a location determination terminal 203. To support an exemplary application, the image acquisition terminal 201 and the location determination terminal 203 establish a communication connection through the network 202, and the image acquisition terminal 201 reports the image to be processed to the location determination terminal 203 through the network 202. In response to the received image to be processed, the location determination terminal 203 predicts the multiple boundary lines of the first character sequence in the image to be processed respectively to obtain the prediction parameters of the multiple boundary lines of the first character sequence; determines, based on those prediction parameters, the position information of the vertices of the bounding box of the first character sequence; and determines the position information of the bounding box of the first character sequence according to the position information of its vertices. Finally, the location determination terminal 203 uploads the determined position information to the network 202, and the position information is sent to the image acquisition terminal 201 through the network 202.

As an example, the image acquisition terminal 201 may include an image capture device, and the location determination terminal 203 may include a vision processing device with visual information processing capability or a remote server. The network 202 may use a wired or wireless connection. When the location determination terminal 203 is a vision processing device, the image acquisition terminal 201 may be communicatively connected to the vision processing device through a wired connection, for example by data communication over a bus; when the location determination terminal 203 is a remote server, the image acquisition terminal 201 may exchange data with the remote server through a wireless network.

Alternatively, in some scenarios, the image acquisition terminal 201 may be a vision processing device with an image capture module, specifically implemented as a host with a camera. In this case, the character detection method of the embodiment of the present invention may be executed by the image acquisition terminal 201, and the above system architecture may omit the network 202 and the location determination terminal 203.

In this embodiment of the present invention, the multiple boundary lines of the first character sequence in the image to be processed are predicted respectively to obtain the prediction parameters of the multiple boundary lines of the first character sequence; the position information of the vertices of the bounding box of the first character sequence is determined according to those prediction parameters; and the position information of the bounding box of the first character sequence is determined according to the position information of its vertices. The polygonal (for example, quadrilateral) bounding box of the character sequence is thereby decomposed into multiple (for example, four) independent boundary lines, and each independent boundary line is detected separately, so that the detection of any one boundary line is not interfered with by two different vertices, which improves the accuracy of character detection.

In some embodiments of the present invention, predicting the multiple boundary lines of the first character sequence in the image to be processed respectively to obtain the prediction parameters of the multiple boundary lines of the first character sequence includes: based on the image to be processed, for a first feature point related to the first character sequence, respectively predicting the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point; and determining the prediction parameters of the multiple boundary lines of the first character sequence according to the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point. In this embodiment, a first feature point is a feature point related to the first character sequence. A feature point may be a point where the gray value of the image changes sharply and/or a point of large curvature on an image edge (that is, the intersection of two edges). There may be multiple first feature points or, of course, only one, which is not limited here. For example, when there are multiple first feature points and the first character sequence has four boundary lines, for each first feature point, the parameters of each boundary line of the first character sequence corresponding to that first feature point are predicted; and for each boundary line, the prediction parameters of the boundary line are determined according to the parameters of that boundary line corresponding to the respective first feature points. For example, regression may be performed on the parameters of the boundary line corresponding to the respective first feature points to obtain the prediction parameters of the boundary line. Because the parameters of the boundary lines of the first character sequence are predicted from feature points related to the first character sequence, this embodiment helps improve both the efficiency of obtaining the prediction parameters of the boundary lines and the accuracy of the prediction parameters obtained. Of course, in other embodiments of the present invention, the prediction parameters of the multiple boundary lines of the first character sequence may also be determined based on all pixels related to the first character sequence (not limited to the first feature points related to the first character sequence), which is not limited here.

As an example of this embodiment, the method further includes: predicting the probability that the position of a pixel in the image to be processed belongs to a character; and determining the first feature point according to the probability that the position of the pixel in the image to be processed belongs to a character. In this example, the probability that the position of each pixel in the image to be processed belongs to a character can be predicted. From these probabilities, the region occupied by each character sequence in the image to be processed can be preliminarily determined. For any first character sequence, the first feature points can be determined from the feature points in the preliminarily determined region occupied by that first character sequence. For example, all or some of the feature points in that region may be determined as first feature points. Determining the first feature points from the per-pixel character probabilities allows the first feature points related to the first character sequence to be determined accurately. Predicting the parameters of the boundary lines of the first character sequence based on the first feature points thus determined helps further improve the efficiency and accuracy of obtaining the prediction parameters of the boundary lines.
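As a rough illustration of selecting candidate feature points from per-pixel character probabilities, the sketch below thresholds a probability map. The threshold value 0.5, the function name `select_first_feature_points`, and the simplification that every position above the threshold is kept (without grouping positions into separate character sequences) are all assumptions made for the example, not details given in the text.

```python
from typing import List, Sequence, Tuple


def select_first_feature_points(
    prob_map: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (x, y) positions whose character probability reaches the threshold.

    prob_map[y][x] is the predicted probability that position (x, y)
    belongs to a character. The threshold is an assumed hyperparameter.
    """
    points = []
    for y, row in enumerate(prob_map):
        for x, p in enumerate(row):
            if p >= threshold:
                points.append((x, y))
    return points


# Tiny 2x2 probability map for illustration.
points = select_first_feature_points([[0.1, 0.9], [0.6, 0.2]])
```

A fuller version would presumably first group the retained positions into regions, one per character sequence, before treating them as first feature points of a given sequence.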

In examples provided by other embodiments of the present invention, the feature points in the image to be processed may also be used directly as the first feature points, without predicting character probabilities. For example, when the image to be processed contains only one first character sequence and that first character sequence fills or almost fills the image to be processed, the feature points in the image to be processed may each be used as a first feature point.

As an example of this embodiment, the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point include: the distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, where the polar coordinate system corresponding to the first feature point is the polar coordinate system whose pole is the first feature point. In this example, the polar axis of the polar coordinate system corresponding to the first feature point may be the ray from the pole in the positive direction of the x-axis. Of course, those skilled in the art can set the polar axis flexibly according to the requirements of the actual application scenario, which is not limited here. In this example, the distance parameter of any boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point may represent the minimum distance between the first feature point and the boundary line, that is, the length of the perpendicular segment from the first feature point to the boundary line; the angle parameter of any boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point may represent the angle between the polar axis of that polar coordinate system and the vector pointing from the first feature point to the foot of the perpendicular on the boundary line, where the foot of the perpendicular is the intersection of the boundary line with the perpendicular segment from the first feature point to the boundary line.
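The distance and angle parameters defined above can be computed from a known line and feature point as in the following sketch. It assumes the line is given as (A, B, C) with A*x + B*y + C = 0, that the polar axis points in the positive x direction, and that the feature point does not lie on the line (otherwise the angle is not well defined by this construction). The function name `polar_params` is hypothetical.

```python
import math
from typing import Tuple


def polar_params(line: Tuple[float, float, float], point: Tuple[float, float]) -> Tuple[float, float]:
    """Return (distance, angle) of a line relative to a feature point.

    distance: length of the perpendicular segment from the point to the line.
    angle: angle between the polar axis (+x direction, assumed) and the vector
           from the point to the foot of the perpendicular, in radians.
    """
    a, b, c = line
    px, py = point
    norm_sq = a * a + b * b
    if norm_sq == 0:
        raise ValueError("A and B must not both be zero")
    # Signed offset of the point from the line along the normal (a, b).
    t = (a * px + b * py + c) / norm_sq
    foot_x, foot_y = px - a * t, py - b * t
    rho = math.hypot(foot_x - px, foot_y - py)
    theta = math.atan2(foot_y - py, foot_x - px)
    return rho, theta


# Feature point (1, 1); right boundary x = 4 and upper boundary y = 0.
rho_right, theta_right = polar_params((1, 0, -4), (1, 1))
rho_top, theta_top = polar_params((0, 1, 0), (1, 1))
```

With image coordinates whose y-axis points downward, the upper boundary in this example gets an angle of -pi/2 under `math.atan2`; a different sign or range convention would work equally well as long as it is applied consistently.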

In one example, the equation of a straight line in a Cartesian coordinate system (a rectangular or oblique coordinate system) can be expressed by formula (1):

[Figure 02_image003]  Formula (1);

where [Figure 02_image005], [Figure 02_image007] and [Figure 02_image009] denote the parameters of the straight-line equation.

However, when [Figure 02_image011], there is redundancy in the parameters of the straight-line equation shown in formula (1) and in the correlations between those parameters. In addition, the parameters of a straight-line equation in the Cartesian coordinate system have no clear physical meaning in the image, which hinders network learning.

In this example, the straight-line equation in the Cartesian coordinate system can be converted to the polar coordinate system to obtain formula (2):

[Figure 02_image013]  Formula (2);

where [Figure 02_image015] may represent the distance parameter of any boundary line of the first character sequence in the polar coordinate system corresponding to the first feature point, and [Figure 02_image017] may represent the angle parameter of that boundary line in the polar coordinate system corresponding to the first feature point.

Correspondingly, the parameters of the straight-line equation can be expressed by formula (3):

[Figure 02_image019]  Formula (3).
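For orientation, the relationship between a Cartesian straight-line equation and distance and angle parameters is often written in the normal form sketched below. The symbols A, B, C, ρ and θ are assumptions chosen for this illustration and may not match the notation of formulas (1) to (3); the coordinates (x, y) are taken relative to the feature point that serves as the pole.

```latex
% Assumed notation: (A, B, C) are the Cartesian line parameters,
% \rho is the distance parameter and \theta is the angle parameter.
\begin{align}
  Ax + By + C &= 0, \\
  x\cos\theta + y\sin\theta &= \rho, \\
  A = \cos\theta, \quad B &= \sin\theta, \quad C = -\rho .
\end{align}
```

Under this convention the three Cartesian parameters are determined by two independent quantities, which is one way to read the statement that the polar form removes redundancy among the parameters.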

FIG. 3 is a schematic diagram of the distance parameters and angle parameters of the four boundary lines of the first character sequence in the polar coordinate system corresponding to a certain first feature point. As shown in FIG. 3, in the polar coordinate system corresponding to that first feature point, the upper boundary line of the first character sequence has distance parameter [Figure 02_image021] and angle parameter [Figure 02_image023]; the right boundary line of the first character sequence has distance parameter [Figure 02_image025] and angle parameter [Figure 02_image027]; the lower boundary line of the first character sequence has distance parameter [Figure 02_image029] and angle parameter [Figure 02_image031]; and the left boundary line of the first character sequence has distance parameter [Figure 02_image033] and angle parameter [Figure 02_image035].

In this example, mapping the straight-line equation of a boundary line from the Cartesian coordinate system to the polar coordinate system yields distance and angle parameters that have a clear physical meaning in the image and are independent of each other, which reduces the number of parameters and the correlation between them and facilitates network learning.

Determining the prediction parameters of the multiple boundary lines of the first character sequence according to the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point includes: mapping the distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to a Cartesian coordinate system, to obtain the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system; and determining the prediction parameters of the multiple boundary lines of the first character sequence according to those parameters. In this example, when there are multiple first feature points, the first feature points correspond to different polar coordinate systems, because the polar coordinate system corresponding to any first feature point has that first feature point as its pole. Therefore, for any boundary line of the first character sequence, when the prediction parameters of the boundary line are obtained by regression from its distance parameters and angle parameters in the polar coordinate systems corresponding to the multiple first feature points, those distance parameters and angle parameters may first be mapped to one common Cartesian coordinate system, to obtain the parameters of the boundary line corresponding to the multiple first feature points in that Cartesian coordinate system; regression is then performed on these parameters to obtain the prediction parameters of the boundary line. In this way, the prediction parameters of a boundary line can be obtained by regression from parameters expressed in different polar coordinate systems.
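The mapping and regression steps above might look like the following sketch. It assumes the normal-form convention x'·cos(θ) + y'·sin(θ) = ρ in coordinates (x', y') centered on the feature point, and it uses a plain average of the mapped parameters as a stand-in for the regression step, which the text does not specify. The names `polar_to_cartesian` and `fuse_line_estimates` are hypothetical.

```python
import math
from typing import List, Tuple

Estimate = Tuple[float, float, Tuple[float, float]]  # (rho, theta, (px, py))


def polar_to_cartesian(rho: float, theta: float, pole: Tuple[float, float]) -> Tuple[float, float, float]:
    """Map (rho, theta) around a pole to (A, B, C) with A*x + B*y + C = 0
    in the shared image coordinate system (assumed normal-form convention)."""
    px, py = pole
    a, b = math.cos(theta), math.sin(theta)
    # Substituting x' = x - px, y' = y - py into x'*cos + y'*sin = rho.
    c = -(rho + a * px + b * py)
    return a, b, c


def fuse_line_estimates(estimates: List[Estimate]) -> Tuple[float, float, float]:
    """Average the per-feature-point line parameters (assumed regression step)."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    lines = [polar_to_cartesian(rho, theta, pole) for rho, theta, pole in estimates]
    n = len(lines)
    a = sum(line[0] for line in lines) / n
    b = sum(line[1] for line in lines) / n
    c = sum(line[2] for line in lines) / n
    return a, b, c


# Two feature points that both see the right boundary x = 4.
fused = fuse_line_estimates([(3.0, 0.0, (1.0, 1.0)), (2.0, 0.0, (2.0, 5.0))])
```

Both estimates map to the same Cartesian line even though their distance parameters differ, which is why the parameters are brought into one common coordinate system before they are combined.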

As shown in FIG. 3, the prediction parameters of the upper boundary line of the first character sequence are [Figure 02_image037], [Figure 02_image039] and [Figure 02_image041], that is, the predicted straight-line equation of the upper boundary line of the first character sequence can be expressed as [Figure 02_image043]; the prediction parameters of the right boundary line of the first character sequence are [Figure 02_image045], [Figure 02_image047] and [Figure 02_image049], that is, the predicted straight-line equation of the right boundary line of the first character sequence can be expressed as [Figure 02_image051]; the prediction parameters of the lower boundary line of the first character sequence are [Figure 02_image053], [Figure 02_image055] and [Figure 02_image057], that is, the predicted straight-line equation of the lower boundary line of the first character sequence can be expressed as [Figure 02_image059]; the prediction parameters of the left boundary line of the first character sequence are [Figure 02_image061], [Figure 02_image063] and [Figure 02_image065], that is, the predicted straight-line equation of the left boundary line of the first character sequence can be expressed as [Figure 02_image067]. The coordinates of each vertex of the bounding box of the first character sequence can then be obtained according to formulas (4) to (6):

[Figure 02_image069]  Formula (4);

[Figure 02_image071]  Formula (5);

[Figure 02_image073]  Formula (6);

where [Figure 02_image075], [Figure 02_image077], [Figure 02_image079] and [Figure 02_image081] are all integers. For example, [Figure 02_image083] may represent the coordinates of the upper right corner vertex of the bounding box of the first character sequence, [Figure 02_image085] may represent the coordinates of the lower right corner vertex, [Figure 02_image087] may represent the coordinates of the lower left corner vertex, and [Figure 02_image089] may represent the coordinates of the upper left corner vertex.
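As a worked illustration of obtaining a vertex from two adjacent boundary lines, the intersection of two straight lines can be written with Cramer's rule as below. The notation is an assumption for this sketch (line i written as A_i x + B_i y + C_i = 0) and is not claimed to reproduce formulas (4) to (6) term by term.

```latex
% Assumed notation: line i is A_i x + B_i y + C_i = 0, and lines i and j
% are adjacent boundary lines that are not parallel.
\begin{align}
  D_{ij} &= A_i B_j - A_j B_i, \\
  x_{ij} &= \frac{B_i C_j - B_j C_i}{D_{ij}}, \\
  y_{ij} &= \frac{A_j C_i - A_i C_j}{D_{ij}}, \qquad D_{ij} \neq 0 .
\end{align}
```

For instance, with the upper boundary y = 0 (A = 0, B = 1, C = 0) and the right boundary x = 4 (A = 1, B = 0, C = -4), these expressions give D = -1 and the upper right vertex (4, 0).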

In other examples, the parameters of any boundary line of the first character sequence corresponding to the first feature point may include parameters of the boundary line in a Cartesian coordinate system predicted based on the first feature point, which is not limited here.

In one example, predicting, based on the image to be processed and for the first feature point related to the first character sequence, the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point includes: inputting the image to be processed into a pre-trained neural network, and predicting, via the neural network and for the first feature point related to the first character sequence, the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point. Using a pre-trained neural network for this prediction can improve both the speed and the accuracy of predicting the parameters. The parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point may also be predicted by a pre-established model, function or the like, which is not limited here.

In the embodiments provided by the present invention, the probability that the position of a pixel in the image to be processed belongs to a character may also be predicted via the neural network. Using a pre-trained neural network for this prediction can improve both the speed of predicting the probability that the position of a pixel belongs to a character and the accuracy of the predicted probability. Of course, in other examples, the probability that the position of a pixel in the image to be processed belongs to a character may also be predicted by a pre-established model, function or the like, which is not limited here.

In some embodiments of the present invention, before the image to be processed is input into the pre-trained neural network, a training image may be input into the neural network, and predicted values of the parameters of the multiple boundary lines of a second character sequence in the training image corresponding to a second feature point related to the second character sequence are respectively predicted via the neural network; the neural network is then trained according to the predicted values of the parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point and the true values of those parameters.

In the related art, the bounding box of characters is formed by regressing the four vertices of a quadrilateral. Because a vertex is actually formed by the intersection of two adjacent sides, the regression of each vertex affects two sides, so each side is interfered with by two different vertices, which impairs the learning efficiency and detection performance of the network. In the embodiments provided by the present invention, the polygonal (for example, quadrilateral) bounding box of the character sequence is decomposed into multiple (for example, four) independent boundary lines and each independent boundary line is detected separately, so that vertex regression does not introduce training disturbance into the neural network, which improves the learning efficiency and detection performance of the neural network. A neural network trained according to this embodiment can learn to accurately predict the parameters of the boundary lines of a character sequence.

As an example of this embodiment, the parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point include: the distance parameters and angle parameters of the multiple boundary lines of the second character sequence in the polar coordinate system corresponding to the second feature point, where the polar coordinate system corresponding to the second feature point is the polar coordinate system whose pole is the second feature point. Training the neural network according to the predicted values and the true values of the parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point includes: training the neural network according to the predicted values and the true values of the distance parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point; and/or training the neural network according to the predicted values and the true values of the angle parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point. In this example, mapping the straight-line equation from the Cartesian coordinate system to the polar coordinate system reduces the number of parameters to be learned and the correlation between them, and gives the parameters an actual physical meaning in the image, which facilitates network learning. In addition, in this example, training the neural network to detect the distance and angle of each boundary line of a character sequence relative to a feature point keeps the detection of the boundary lines from interfering with one another, which improves the learning efficiency and detection performance of the neural network.

In one example, training the neural network according to the predicted values and the true values of the distance parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point includes: for any one of the multiple boundary lines of the second character sequence, training the neural network according to the ratio of the smaller to the larger of the true value and the predicted value of the distance parameter of that boundary line corresponding to the second feature point.

For example, for any one of the multiple boundary lines of the second character sequence, the loss function [Figure 02_image091] corresponding to the distance parameter can be obtained using formula (7):

[Figure 02_image093] Formula (7);

where [Figure 02_image095] denotes the number of second feature points; [Figure 02_image097] denotes the true value of the distance parameter of the boundary line corresponding to the second feature point [Figure 02_image099]; [Figure 02_image101] denotes the predicted value of the distance parameter of the boundary line corresponding to the second feature point [Figure 02_image099]; [Figure 02_image103] denotes the smaller of the true value and the predicted value of that distance parameter; and [Figure 02_image105] denotes the larger of the true value and the predicted value of that distance parameter. For example, if [Figure 02_image107], then [Figure 02_image109] and [Figure 02_image111]; if [Figure 02_image113], then [Figure 02_image115] and [Figure 02_image117]. Because [Figure 02_image097] and [Figure 02_image101] share the same pole (both are measured from the second feature point [Figure 02_image099]), that is, one end of [Figure 02_image097] and one end of [Figure 02_image101] lie at the same point, the loss function [Figure 02_image091] corresponding to the distance parameter can be called a ray Intersection over Union (IOU) loss function.
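The ratio-based idea behind the distance loss can be sketched in a few lines of Python. This is only an illustration under stated assumptions: formula (7) is an image that is not reproduced in this text, so the exact functional form is unknown. The sketch assumes the loss is the mean of `1 - min/max` over the feature points; the function name `ray_iou_loss` and the `eps` term are hypothetical.

```python
def ray_iou_loss(true_dists, pred_dists, eps=1e-9):
    """Mean of (1 - min/max) over feature points.

    Assumed form only: the source states that the loss is built from the
    ratio of the smaller to the larger of the true and predicted distance,
    but the exact expression of formula (7) is not available here.
    Distances are assumed to be non-negative.
    """
    if len(true_dists) != len(pred_dists) or not true_dists:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for rho, rho_hat in zip(true_dists, pred_dists):
        smaller, larger = min(rho, rho_hat), max(rho, rho_hat)
        # eps avoids division by zero when both distances are 0.
        total += 1.0 - smaller / (larger + eps)
    return total / len(true_dists)
```

Because only the ratio enters the loss, a prediction of 9 against a true value of 10 is penalized the same as 90 against 100, which is the scale normalization the text describes.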

In this example, for any one of the multiple boundary lines of the second character sequence, the neural network is trained according to the ratio of the smaller to the larger of the true value and the predicted value of the distance parameter of the boundary line corresponding to the second feature point. This normalizes distance parameters of different magnitudes across different application scenarios, which helps multi-scale character detection, that is, helps achieve higher accuracy when detecting characters at different scales.

Of course, in other examples, for any one of the multiple boundary lines of the second character sequence, the neural network may also be trained according to the difference between the true value and the predicted value of the distance parameter of the boundary line corresponding to the second feature point, which is not limited here.

In one example, training the neural network according to the predicted values and the true values of the angle parameters of the multiple boundary lines of the second character sequence corresponding to the second feature point includes: for any one of the multiple boundary lines of the second character sequence, determining the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point; and training the neural network according to the sine of the half angle of that absolute value.

Here, the half angle of the absolute value equals 0.5 times the absolute value. For example, if, for any one of the multiple boundary lines of the second character sequence, the difference between the predicted value and the true value of the angle parameter of the boundary line corresponding to a given second feature point is 90° or -90°, then the absolute value of the difference is 90° and the half angle of the absolute value is 45°.

For example, for any one of the multiple boundary lines of the second character sequence, the loss function [Figure 02_image119] corresponding to the angle parameter can be obtained using formula (8):

[Figure 02_image121] Formula (8);

where [Figure 02_image095] denotes the number of second feature points; [Figure 02_image123] denotes the true value of the angle parameter of the boundary line corresponding to the second feature point [Figure 02_image099]; [Figure 02_image125] denotes the predicted value of the angle parameter of the boundary line corresponding to the second feature point [Figure 02_image099]; [Figure 02_image127] denotes the absolute value of the difference between the true value and the predicted value of that angle parameter; and [Figure 02_image129] denotes the half angle of the absolute value of the difference between the true value and the predicted value of that angle parameter.

For any one of the multiple boundary lines of the second character sequence, the true value and the predicted value of the angle parameter of the boundary line corresponding to any second feature point may take values in the range [Figure 02_image131], that is, [Figure 02_image133] and [Figure 02_image135]. In a polar coordinate system, however, 0 coincides with [Figure 02_image001]. By determining, for any one of the multiple boundary lines of the second character sequence, the absolute value of the difference between the true value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point, and training the neural network according to the sine of the half angle of that absolute value, the learning of the neural network is not disturbed by confusion between 0 and [Figure 02_image001], which improves the learning efficiency and detection performance of the neural network.
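A minimal Python sketch of a half-angle sine loss is given below to illustrate why the wrap-around between 0 and a full turn does no harm. It rests on assumptions: formula (8) is an image that is not reproduced in this text, so the sketch assumes the loss is the mean of `sin(|difference| / 2)` over the feature points, with angles in radians in `[0, 2π)`. The function name `half_angle_sine_loss` is hypothetical.

```python
import math


def half_angle_sine_loss(true_angles, pred_angles):
    """Mean of sin(|theta - theta_hat| / 2) over feature points.

    Assumed form only: the source says the loss uses the sine of half the
    absolute angle difference; averaging over the points is an assumption.
    Angles are in radians and assumed to lie in [0, 2*pi).
    """
    if len(true_angles) != len(pred_angles) or not true_angles:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for theta, theta_hat in zip(true_angles, pred_angles):
        # |difference| is in [0, 2*pi), so the half angle is in [0, pi)
        # and its sine is non-negative.
        total += math.sin(abs(theta - theta_hat) / 2.0)
    return total / len(true_angles)
```

With this form, a true angle of 0.01 and a predicted angle of 2π - 0.01 describe nearly the same direction and give a loss of about 0.01, whereas a plain absolute difference would report an error of almost 2π.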

Of course, those skilled in the art may also transform formula (8) and use, for example, a cosine loss function, which is not limited here.

As an example of this embodiment, the second feature points include feature points in the effective area corresponding to the second character region. The second feature points may include only the feature points in the effective area corresponding to the second character region and exclude the feature points outside it. When computing the loss function of the neural network, supervising only the feature points inside the effective area, and not those outside it, helps reduce the burden on the network. For any feature point that lies inside the real bounding box of the second character region but in the edge area close to the real bounding box, the distance between the feature point and a boundary line of the real bounding box is small, which makes it hard to predict accurately and tends to produce a large relative error. For example, for a feature point in the effective area, if the predicted value of the distance parameter between the feature point and a boundary line of the real bounding box is 9 and the true value is 10, the error is 10%; for a feature point outside the effective area, if the predicted value of the distance parameter is 1 and the true value is 2, the error is 50%. Ignoring the feature points outside the effective area therefore helps reduce the burden on the network. Of course, in other examples, the second feature points may include all the feature points in the real bounding box of the second character sequence, which is not limited here.

In one example, the character detection method provided by the embodiments of the present invention further includes: obtaining position information of the real bounding box of the second character sequence; and shrinking the real bounding box according to the position information of the real bounding box and a preset ratio, to obtain the effective area corresponding to the second character sequence. In this example, the effective area corresponding to the second character sequence lies within the real bounding box of the second character sequence, and its size is smaller than that of the real bounding box. FIG. 4 shows a schematic diagram of the real bounding box 31 and the effective area 32 of the second character region. Obtaining the effective area in this way and training the neural network on the feature points in the effective area helps reduce the burden on the network.

For example, shrinking the real bounding box according to the position information of the real bounding box and the preset ratio, to obtain the effective area corresponding to the second character sequence, includes: determining an anchor point of the real bounding box according to the position information of the real bounding box, where the anchor point is the intersection of the diagonals of the real bounding box; and shrinking the real bounding box according to the position information of the real bounding box, the position information of the anchor point, and the preset ratio, to obtain the effective area corresponding to the second character sequence. Here, the ratio of a first distance to a second distance equals the preset ratio, where the first distance is the distance between a first vertex of the effective area and the anchor point, the second distance is the distance between the anchor point and the vertex of the real bounding box that corresponds to the first vertex, and the first vertex is any vertex of the effective area. For example, the preset ratio may be 0.35, 0.4, 0.3 and so on, which is not limited here. For example, if the first vertex is the upper-left vertex of the effective area, the corresponding vertex of the real bounding box is its upper-left vertex, and so on. Obtaining the effective area in this way and training the neural network on the feature points in the effective area helps improve the learning efficiency and prediction accuracy of the neural network.
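The shrinking step can be sketched as follows. This is a minimal illustration, assuming the real bounding box is a convex quadrilateral given as four `(x, y)` vertices in clockwise order; the function names `diagonal_intersection` and `shrink_box` are hypothetical, and the default ratio 0.4 is just one of the example values mentioned in the text.

```python
def diagonal_intersection(quad):
    """Anchor point: intersection of the diagonals p1-p3 and p2-p4.

    quad is a list of four (x, y) vertices in clockwise order.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = quad
    dx1, dy1 = x3 - x1, y3 - y1  # direction of diagonal p1 -> p3
    dx2, dy2 = x4 - x2, y4 - y2  # direction of diagonal p2 -> p4
    denom = dx1 * dy2 - dy1 * dx2
    if abs(denom) < 1e-12:
        raise ValueError("diagonals are parallel; not a valid quadrilateral")
    # Solve p1 + t * d1 = p2 + s * d2 for t.
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
    return (x1 + t * dx1, y1 + t * dy1)


def shrink_box(quad, ratio=0.4):
    """Scale every vertex toward the anchor so that
    |shrunk vertex - anchor| / |original vertex - anchor| == ratio."""
    ax, ay = diagonal_intersection(quad)
    return [(ax + ratio * (x - ax), ay + ratio * (y - ay)) for x, y in quad]
```

For the square with vertices (0, 0), (10, 0), (10, 10), (0, 10) and a ratio of 0.4, the anchor is (5, 5) and the effective area has vertices (3, 3), (7, 3), (7, 7), (3, 7).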

In one example, the coordinates of the four vertices of the real bounding box of the second character sequence can be expressed as [Figure 02_image137] and [Figure 02_image139]. The four vertices of the real bounding box of the second character sequence may be ordered clockwise: [Figure 02_image141] may denote the upper-left vertex of the real bounding box, [Figure 02_image143] the upper-right vertex, [Figure 02_image145] the lower-right vertex, and [Figure 02_image147] the lower-left vertex. For any second feature point [Figure 02_image149], the true value [Figure 02_image151] of the distance parameter and the true value [Figure 02_image152] of the angle parameter of any boundary line of the real bounding box corresponding to that second feature point can be determined using formulas (9) to (16):

[Figure 02_image153] Formula (9);
[Figure 02_image155] Formula (10);
[Figure 02_image157] Formula (11);
[Figure 02_image159] Formula (12);
[Figure 02_image161] Formula (13);
[Figure 02_image163] Formula (14);
[Figure 02_image165] Formula (15);
[Figure 02_image167] Formula (16);

where [Figure 02_image169] denotes the true value of the perpendicular vector from the second feature point to the boundary line; [Figure 02_image171] is parallel to the perpendicular from the second feature point to the boundary line and points from the second feature point to the foot of the perpendicular; and [Figure 02_image172] indicates that [Figure 02_image174] lies below the polar axis.
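The geometry described here (a perpendicular vector from the feature point to the boundary line, whose length is the distance and whose direction is the angle) can be sketched in Python as below. Formulas (9) to (16) are images that are not reproduced in this text, so this is not the patent's exact derivation: the sketch uses a plain perpendicular-foot computation, measures the angle with `atan2` in the same x/y axes as the input coordinates, and wraps it into `[0, 2π)`. The sign convention and the function name `line_polar_params` are assumptions.

```python
import math


def line_polar_params(point, a, b):
    """Distance and angle of the line through a and b, seen from point.

    Returns (rho, theta): rho is the length of the perpendicular vector
    from point to the line, theta is the direction of that vector in
    radians, wrapped into [0, 2*pi). The angle convention is assumed.
    """
    px, py = point
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        raise ValueError("a and b must be distinct points")
    # Parameter of the perpendicular foot along the line a + t * (b - a).
    t = ((px - ax) * dx + (py - ay) * dy) / seg2
    fx, fy = ax + t * dx, ay + t * dy
    # Perpendicular vector, pointing from the feature point to the foot.
    vx, vy = fx - px, fy - py
    rho = math.hypot(vx, vy)
    theta = math.atan2(vy, vx) % (2.0 * math.pi)
    return rho, theta
```

For a feature point at (5, 5) inside the box (0, 0), (10, 0), (10, 10), (0, 10), the right edge gives a distance of 5 and an angle of 0 under this convention.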

As an example of this embodiment, the method further includes: predicting, via the neural network, the probability that each pixel position in the training image belongs to a character; and training the neural network according to the probability that each pixel position in the training image belongs to a character and the annotation data indicating whether each pixel position in the training image belongs to a character. In this example, the neural network may be a multi-task learning model that learns two tasks: character segmentation (that is, learning to predict the probability that each pixel position in an image belongs to a character) and parameter prediction for the boundary lines. According to this example, the neural network can learn to predict the probability that a pixel position belongs to a character.

In one example, training the neural network according to the probability that each pixel position in the training image belongs to a character and the annotation data indicating whether each pixel position in the training image belongs to a character includes: training the neural network according to the probability that each pixel position in the effective area corresponding to the second character sequence belongs to a character, and the annotation data indicating whether each pixel position in the effective area belongs to a character.

For example, the loss function corresponding to character segmentation can be obtained using formula (17):

[Figure 02_image175] Formula (17);

where [Figure 02_image177] denotes the effective area corresponding to the second character sequence; [Figure 02_image179] denotes the number of pixels in the effective area corresponding to the second character sequence; [Figure 02_image181] denotes the annotation data indicating whether the position of pixel [Figure 02_image183] in the effective area belongs to a character, for example, [Figure 02_image185] if the position of pixel [Figure 02_image183] belongs to a character and [Figure 02_image187] if it does not; and [Figure 02_image189] denotes the probability that the position of pixel [Figure 02_image183] in the effective area corresponding to the second character sequence belongs to a character, with [Figure 02_image191].
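As an illustration of a segmentation loss restricted to the effective area, the sketch below averages a per-pixel binary cross-entropy over the pixels selected by a mask. This is an assumption: formula (17) is an image that is not reproduced in this text, and binary cross-entropy is simply a common choice for a per-pixel probability with 0/1 labels, not a form confirmed by the source. The function name `masked_bce` and the flat-list inputs are hypothetical.

```python
import math


def masked_bce(labels, probs, valid_mask, eps=1e-7):
    """Mean binary cross-entropy over pixels inside the effective area.

    labels:     0/1 annotation per pixel (1 = belongs to a character)
    probs:      predicted probability per pixel, in [0, 1]
    valid_mask: True for pixels inside the effective area
    Binary cross-entropy is an assumed choice, not taken from the source.
    """
    total, count = 0.0, 0
    for y, p, inside in zip(labels, probs, valid_mask):
        if not inside:
            continue  # pixels outside the effective area are not supervised
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0
```

Pixels outside the mask contribute nothing, so a poor prediction near the edge of the real bounding box does not affect the loss.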

In this example, the neural network is trained according to the probability that each pixel position in the effective area corresponding to the second character sequence belongs to a character and the annotation data indicating whether each pixel position in the effective area belongs to a character. This enables the neural network to learn character segmentation and improves the efficiency with which it learns character segmentation.

In one example, the neural network can be trained with the loss function [Figure 02_image193] shown in formula (18):

[Figure 02_image195] Formula (18);

where [Figure 02_image197] denotes the loss function corresponding to character segmentation, [Figure 02_image199] denotes the loss function corresponding to the distance parameter, and [Figure 02_image200] denotes the loss function corresponding to the angle parameter; [Figure 02_image201] denotes the weight of [Figure 02_image197], [Figure 02_image203] denotes the weight of [Figure 02_image199], and [Figure 02_image205] denotes the weight of [Figure 02_image200]. [Figure 02_image201], [Figure 02_image203] and [Figure 02_image205] can be set flexibly according to experience or the training strategy, for example, [Figure 02_image207], which is not limited here.
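The combination of the three losses can be sketched as a weighted sum. Formula (18) and the example weight values are images that are not reproduced in this text, so the weighted-sum form is inferred from the description of one weight per loss term, and the default weights of 1.0 below are placeholders rather than the patent's values. The function name `total_loss` is hypothetical.

```python
def total_loss(seg_loss, dist_loss, angle_loss,
               w_seg=1.0, w_dist=1.0, w_angle=1.0):
    """Weighted sum of the segmentation, distance and angle losses.

    The default weights are placeholders; the source only says they can
    be set from experience or the training strategy.
    """
    return w_seg * seg_loss + w_dist * dist_loss + w_angle * angle_loss
```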

As an example of this embodiment, the neural network may include at least one channel reduction module, which reduces the amount of computation of the neural network and speeds up boundary line detection by the neural network.

As an example of this embodiment, the neural network may include at least one feature aggregation module, which makes full use of multi-scale features and improves the accuracy of boundary line detection by the neural network.

An application scenario of the embodiments of the present invention is described below. FIG. 5 shows a schematic diagram of an application scenario of an embodiment of the present invention. As shown in FIG. 5, the neural network may have an encoder-decoder structure. In FIG. 5, 506 denotes a channel reduction module. For example, the channel reduction module 506 may be implemented with a 1×1 convolution; it may also be implemented with a 3×3 convolution or the like, which is not limited here. 507 denotes a feature aggregation module. The feature aggregation module 507 may perform at least one of multiplication, addition, concatenation (concat) and similar operations on the input feature maps. For example, as shown in FIG. 5, the feature aggregation module 507 may enlarge the input feature map to twice its width and height, and then apply concat, a 1×1 nonlinear convolution and a 3×3 nonlinear convolution to the enlarged feature map and the output of the channel reduction module 506. As shown in FIG. 5, the neural network may use a backbone network to extract basic features and repeatedly fuse features of different scales through the feature aggregation modules, finally producing a 9-channel feature map. One channel is the text confidence 504 (that is, the probability that each pixel in the input image belongs to a character); the other 8 channels are the distance parameters and angle parameters of the line equations of the 4 boundary lines, that is, the parameters 503 of the four boundary lines. From the distance parameters and angle parameters, in the polar coordinate system, of each boundary line of the 3 character sequences in the input image 501, the line equation of each boundary line of the 3 character sequences in the Cartesian coordinate system can be obtained. The dashed box 505 on the right side of FIG. 5 visualizes the line equations of the 4 boundary lines, showing from top to bottom the upper, right, lower and left boundary lines of the 3 character sequences. From the line equations of the boundary lines of the 3 character sequences in the input image, the bounding boxes 502 of the 3 character sequences can be obtained, as shown in the lower left of FIG. 5.

The character detection method provided by the embodiments of the present invention can be applied to character detection in general natural scenes, as well as to application scenarios such as real-time text translation, receipt recognition, card and certificate recognition (for example, identity cards and bank cards), and license plate recognition, which is not limited here. In some natural scenes, perspective distortion from the camera viewpoint causes the characters in the image to appear as irregular quadrilaterals. With the embodiments of the present invention, the boundaries of the characters can be detected accurately so that the shape of the characters can then be rectified, which benefits subsequent character recognition. Besides the characters themselves, some carriers of characters show the same phenomenon, such as rigid identity cards, bank cards and license plates. Detecting the boundaries of these quadrilateral carriers containing characters with the embodiments of the present invention likewise benefits subsequent character recognition.

It can be understood that the method embodiments mentioned in the present invention can be combined with one another to form combined embodiments, provided that the underlying principles and logic are not violated; for brevity, these combinations are not described in detail. Those skilled in the art will understand that, in the methods of the specific embodiments above, the specific order in which the steps are executed should be determined by their functions and possible internal logic.

In addition, the present invention further provides a character detection apparatus, an electronic device, a storage medium and a program, all of which can be used to implement any of the character detection methods provided by the present invention. For the corresponding technical solutions and technical effects, refer to the corresponding descriptions in the method section, which are not repeated here.

FIG. 6 shows a block diagram of the character detection apparatus provided by an embodiment of the present invention. As shown in FIG. 6, the character detection apparatus 6 includes: a first prediction module 61, configured to predict each of the multiple boundary lines of a first character sequence in an image to be processed, to obtain prediction parameters of the multiple boundary lines of the first character sequence, where a boundary line of the first character sequence is a dividing line between the region where the first character sequence is located and the region where it is not; a first determination module 62, configured to determine position information of the vertices of the bounding box of the first character sequence according to the prediction parameters of the multiple boundary lines of the first character sequence; and a second determination module 63, configured to determine position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence.
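One plausible way to go from boundary line parameters to bounding box vertices is to intersect adjacent boundary lines. The Python sketch below illustrates this under assumptions: each line is represented as a tuple `(a, b, c)` meaning `a*x + b*y = c`, which is a representation chosen for the example rather than one stated in the source, and the function names `intersect` and `box_vertices` are hypothetical.

```python
def intersect(line1, line2):
    """Intersection of two lines given as (a, b, c) with a*x + b*y = c."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel or coincident")
    # Cramer's rule for the 2x2 linear system.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


def box_vertices(top, right, bottom, left):
    """Vertices in clockwise order starting from the upper-left corner."""
    return [
        intersect(left, top),      # upper-left
        intersect(top, right),     # upper-right
        intersect(right, bottom),  # lower-right
        intersect(bottom, left),   # lower-left
    ]
```

For example, the lines y = 0, x = 10, y = 10 and x = 0, passed as top, right, bottom and left, yield the vertices (0, 0), (10, 0), (10, 10) and (0, 10).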

In some embodiments of the present invention, the first prediction module 61 is further configured to: based on the image to be processed, predict, for a first feature point related to the first character sequence, the parameters of each of the multiple boundary lines of the first character sequence corresponding to the first feature point; and determine the prediction parameters of the multiple boundary lines of the first character sequence according to the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point.

In some embodiments of the present invention, the character detection apparatus 6 further includes: a second prediction module, configured to predict the probability that each pixel position in the image to be processed belongs to a character; and a third determination module, configured to determine the first feature point according to the probability that each pixel position in the image to be processed belongs to a character.

In some embodiments of the present invention, the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point include: distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point, where the polar coordinate system corresponding to the first feature point is the polar coordinate system whose pole is the first feature point.

In some embodiments of the present invention, the first prediction module 61 is further configured to: map the distance parameters and angle parameters of the multiple boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to a Cartesian coordinate system, to obtain the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system; and determine the prediction parameters of the multiple boundary lines of the first character sequence according to the parameters of the multiple boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system.
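The mapping from polar parameters to a Cartesian line equation can be sketched as below. This is a minimal illustration, not the patent's stated formula: it assumes the angle is the direction of the perpendicular from the feature point to the line, measured in radians in the same x/y axes as the image coordinates, and it returns the line as `(a, b, c)` with `a*x + b*y = c`. The function name `polar_to_line` and this line representation are hypothetical.

```python
import math


def polar_to_line(feature_point, rho, theta):
    """Cartesian line (a, b, c), meaning a*x + b*y = c, for a line at
    distance rho from feature_point in direction theta (radians).

    The unit normal is n = (cos(theta), sin(theta)). Every point X on the
    line satisfies n . (X - P) = rho, i.e. n . X = rho + n . P, where P is
    the feature point (the pole).
    """
    x0, y0 = feature_point
    a, b = math.cos(theta), math.sin(theta)
    c = rho + a * x0 + b * y0
    return a, b, c
```

For a feature point at (5, 5), a distance of 5 and an angle of 0, the result is the line x = 10, returned as (1.0, 0.0, 10.0).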

In some embodiments of the present invention, the plurality of boundary lines of the first character sequence include an upper boundary line, a right boundary line, a lower boundary line, and a left boundary line of the first character sequence.

In some embodiments of the present invention, the first prediction module 61 is further configured to input the image to be processed into a pre-trained neural network and, via the neural network, predict, for a first feature point related to the first character sequence, the parameters of each of the plurality of boundary lines of the first character sequence corresponding to the first feature point.

In some embodiments of the present invention, the character detection device 6 further includes: a third prediction module configured to predict, via the neural network, the probability that each pixel position in the image to be processed belongs to a character.

In some embodiments of the present invention, the character detection device 6 further includes: a fourth prediction module configured to input a training image into the neural network and, via the neural network, predict, for a second feature point related to a second character sequence in the training image, predicted values of the parameters of each of the plurality of boundary lines of the second character sequence corresponding to the second feature point; and a first training module configured to train the neural network according to the predicted values and the true values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point.

In some embodiments of the present invention, the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point include: distance parameters and angle parameters of the plurality of boundary lines of the second character sequence in the polar coordinate system corresponding to the second feature point, where the polar coordinate system corresponding to the second feature point is the polar coordinate system whose pole is the second feature point;

the first training module is configured to train the neural network according to the predicted values and the true values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point; and/or to train the neural network according to the predicted values and the true values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point.

In some embodiments of the present invention, the first training module is configured to, for any one of the plurality of boundary lines of the second character sequence, train the neural network according to the ratio of the smaller to the larger of the true value and the predicted value of the distance parameter of that boundary line corresponding to the second feature point.
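One way to turn this ratio into a training signal is sketched below. The negative logarithm is an assumption chosen for illustration (it is zero when prediction and truth agree and grows as they diverge); the passage above only states that training uses the ratio of the smaller value to the larger value. The function name `distance_ratio_loss` is hypothetical.

```python
import math


def distance_ratio_loss(pred, true):
    """Loss built from min(pred, true) / max(pred, true) for one boundary line.

    Both distances are assumed to be strictly positive. The -log(...) form is
    an illustrative choice, not taken from the specification.
    """
    if pred <= 0 or true <= 0:
        raise ValueError("distances must be positive")
    ratio = min(pred, true) / max(pred, true)  # always in (0, 1]
    return -math.log(ratio)


print(distance_ratio_loss(2.0, 4.0))
```

Because the ratio is scale-invariant, an error of a given proportion is penalized equally for short and long distances, and over- and under-estimation by the same factor produce the same loss.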

In some embodiments of the present invention, the first training module is configured to, for any one of the plurality of boundary lines of the second character sequence, determine the absolute value of the difference between the true value and the predicted value of the angle parameter of that boundary line corresponding to the second feature point, and train the neural network according to the sine of half of that absolute value.
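The angle term described above can be written compactly as $\sin(|\theta_{true} - \theta_{pred}| / 2)$. The sketch below computes it for a single boundary line; angles are assumed to be in radians, and the function name `angle_loss` is hypothetical.

```python
import math


def angle_loss(pred, true):
    """Sine of half the absolute angular difference, with angles in radians."""
    return math.sin(abs(true - pred) / 2.0)


print(angle_loss(0.0, math.pi / 3))
```

For differences between 0 and 2π this quantity is non-negative, peaks at a difference of π, and returns to zero at 2π, so two angles that differ by a full turn are not penalized.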

In some embodiments of the present invention, the second feature point includes a feature point in the effective region corresponding to the second character region.

In some embodiments of the present invention, the device further includes: a fifth prediction module configured to predict, via the neural network, the probability that each pixel position in the training image belongs to a character; and a second training module configured to train the neural network according to the probability that each pixel position in the training image belongs to a character and the annotation data indicating whether each pixel position in the training image belongs to a character.

In some embodiments of the present invention, the second training module is further configured to train the neural network according to the probability that each pixel position in the effective region corresponding to the second character sequence belongs to a character and the annotation data indicating whether each pixel position in the effective region belongs to a character.

In some embodiments of the present invention, the character detection device 6 further includes: an acquisition module configured to acquire position information of the ground-truth bounding box of the second character sequence; and a shrinking module configured to shrink the ground-truth bounding box according to its position information and a preset ratio, to obtain the effective region corresponding to the second character sequence.

In some embodiments of the present invention, the shrinking module is further configured to: determine an anchor point of the ground-truth bounding box according to the position information of the ground-truth bounding box, where the anchor point is the intersection of the diagonals of the ground-truth bounding box; and shrink the ground-truth bounding box according to its position information, the position information of the anchor point, and the preset ratio, to obtain the effective region corresponding to the second character sequence. Here the ratio of a first distance to a second distance equals the preset ratio, where the first distance is the distance between a first vertex of the effective region and the anchor point, the second distance is the distance between the anchor point and the vertex of the ground-truth bounding box that corresponds to the first vertex, and the first vertex is any vertex of the effective region.
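The shrinking step can be sketched as follows: find the intersection of the diagonals, then move each vertex toward it so that its distance to the anchor is scaled by the preset ratio. The function names and the ratio of 0.5 used in the example are assumptions for illustration; the specification only refers to "a preset ratio".

```python
def diagonal_intersection(quad):
    """Intersection of the diagonals A-C and B-D of quad = [A, B, C, D]."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = quad
    # Solve A + t*(C - A) = B + s*(D - B) for t using 2D cross products.
    denom = (cx - ax) * (dy - by) - (cy - ay) * (dx - bx)
    if denom == 0:
        raise ValueError("diagonals are parallel; not a valid quadrilateral")
    t = ((bx - ax) * (dy - by) - (by - ay) * (dx - bx)) / denom
    return (ax + t * (cx - ax), ay + t * (cy - ay))


def shrink_box(quad, ratio):
    """Scale each vertex toward the anchor so |v' - anchor| = ratio * |v - anchor|."""
    px, py = diagonal_intersection(quad)
    return [(px + ratio * (x - px), py + ratio * (y - py)) for x, y in quad]


box = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]  # vertices in order
print(shrink_box(box, 0.5))  # 0.5 is a hypothetical preset ratio
```

Scaling about the diagonal intersection keeps each shrunken vertex on the segment joining the anchor to the original vertex, which matches the distance-ratio condition stated above.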

In the embodiments of the present invention, the plurality of boundary lines of the first character sequence in the image to be processed are predicted separately to obtain prediction parameters of those boundary lines; the position information of the vertices of the bounding box of the first character sequence is determined from the prediction parameters; and the position information of the bounding box is determined from the position information of its vertices. The polygonal (for example, quadrilateral) bounding box of a character sequence is thereby decomposed into multiple (for example, four) independent boundary lines, each of which is detected on its own, so that the detection of any one boundary line is not disturbed by two different vertices, which can improve the accuracy of character detection.
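To make the step from boundary lines to vertices concrete, the sketch below intersects adjacent boundary lines to recover the four corners of a quadrilateral bounding box. It assumes each line is already expressed in Cartesian form as coefficients `(a, b, c)` of `a*x + b*y + c = 0`; the function names and the vertex ordering are illustrative choices, not taken from the specification.

```python
def intersect(line1, line2):
    """Intersection point of two lines given as (a, b, c) with a*x + b*y + c = 0."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel")
    # Cramer's rule applied to a1*x + b1*y = -c1 and a2*x + b2*y = -c2.
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return (x, y)


def box_vertices(top, right, bottom, left):
    """Vertices in the order top-left, top-right, bottom-right, bottom-left."""
    return [
        intersect(top, left),
        intersect(top, right),
        intersect(bottom, right),
        intersect(bottom, left),
    ]


# Axis-aligned example: y = 1 (top), x = 5 (right), y = 3 (bottom), x = 2 (left).
print(box_vertices((0, 1, -1), (1, 0, -5), (0, 1, -3), (1, 0, -2)))
```

Each vertex depends on exactly two lines, so an error in one predicted line shifts only the two corners that lie on it.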

In some embodiments, the functions or modules of the character detection device 6 provided by the embodiments of the present invention may be configured to execute the methods described in the method embodiments above. For their specific implementation and technical effects, refer to the description of the method embodiments above, which is not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.

An embodiment of the present invention further provides a computer program including computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the character detection method provided by any of the above embodiments.

An embodiment of the present invention further provides another computer program product for storing computer-readable instructions that, when executed, cause a computer to perform the operations of the character detection method provided by any of the above embodiments.

An embodiment of the present invention further provides an electronic device including: one or more processors; and a memory for storing executable instructions, where the one or more processors are configured to call the executable instructions stored in the memory to execute the character detection method provided by any of the above embodiments.

The electronic device may be provided as a terminal, a server, or a device in another form.

FIG. 7 shows a block diagram of an electronic device 800 provided by an embodiment of the present invention. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 7, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as Wi-Fi, second-generation mobile communication technology (2G), third-generation mobile communication technology (3G), fourth-generation mobile communication technology (4G)/Long Term Evolution (LTE), fifth-generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.

FIG. 8 shows a block diagram of an electronic device 1900 provided by an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 8, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. An application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.

The electronic device 1900 may further include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.

The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random-access memory (RAM), ROM, EPROM or flash memory, SRAM, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored on it, and any suitable combination of the above. A computer-readable storage medium as used here is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, an FPGA, or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present invention.

Aspects of the present invention are described here with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing device, and/or other equipment to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing device, or other equipment, causing a series of operational steps to be performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing device, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the drawings show the architecture, functionality, and operation of possible implementations of methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. Note also that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.

The computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Industrial Applicability
The present invention provides a character detection method, an electronic device, and a computer-readable storage medium. A plurality of boundary lines of a first character sequence in an image to be processed are predicted respectively, to obtain predicted parameters of the plurality of boundary lines of the first character sequence, wherein a boundary line of the first character sequence represents the dividing line between the region where the first character sequence is located and the region where the first character sequence is not located; position information of the vertices of a bounding box of the first character sequence is determined according to the predicted parameters of the plurality of boundary lines of the first character sequence; and position information of the bounding box of the first character sequence is determined according to the position information of the vertices of the bounding box of the first character sequence.

201: image acquisition terminal
202: network
203: position determination terminal
31: ground-truth bounding box
32: effective region
501: input image
502: bounding box
503: parameters of the four boundary lines
504: text confidence
505: dashed box
506: channel reduction module
507: feature aggregation module
6: character detection apparatus
61: first prediction module
62: first determination module
63: second determination module
800: electronic device
802: processing component
804: memory
806: power component
808: multimedia component
810: audio component
812: input/output interface
814: sensor component
816: communication component
820: processor
1900: electronic device
1922: processing component
1926: power component
1932: memory
1950: network interface
1958: input/output interface
S11~S13: steps

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of the present invention.
FIG. 1 shows a flowchart of a character detection method provided by an embodiment of the present invention;
FIG. 2 shows a schematic diagram of a system architecture to which the character detection method of an embodiment of the present invention is applied;
FIG. 3 shows a schematic diagram of the distance parameters and angle parameters of the four boundary lines of the first character sequence in the polar coordinate system corresponding to a given first feature point;
FIG. 4 shows a schematic diagram of the ground-truth bounding box 31 and the effective region 32 of the second character region;
FIG. 5 shows a schematic diagram of an application scenario of an embodiment of the present invention;
FIG. 6 shows a block diagram of a character detection apparatus provided by an embodiment of the present invention;
FIG. 7 shows a block diagram of an electronic device 800 provided by an embodiment of the present invention;
FIG. 8 shows a block diagram of an electronic device 1900 provided by an embodiment of the present invention.


Claims (19)

1. A character detection method, comprising:
predicting, respectively, a plurality of boundary lines of a first character sequence in an image to be processed, to obtain predicted parameters of the plurality of boundary lines of the first character sequence, wherein a boundary line of the first character sequence represents the dividing line between the region where the first character sequence is located and the region where the first character sequence is not located;
determining position information of the vertices of a bounding box of the first character sequence according to the predicted parameters of the plurality of boundary lines of the first character sequence; and
determining position information of the bounding box of the first character sequence according to the position information of the vertices of the bounding box of the first character sequence.
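The step of deriving the bounding-box vertices from the boundary-line parameters can be illustrated with a minimal sketch. It assumes, purely for illustration, that each predicted boundary line is already expressed in the general form a*x + b*y = c and that each vertex is the intersection of two adjacent boundary lines; the names `intersect` and `bounding_box_vertices` are hypothetical and do not come from the patent text.

```python
from typing import List, Tuple

# Assumed representation: (a, b, c) describes the line a*x + b*y = c.
Line = Tuple[float, float, float]
Point = Tuple[float, float]


def intersect(l1: Line, l2: Line) -> Point:
    """Return the intersection point of two non-parallel lines (Cramer's rule)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("boundary lines are parallel; no unique vertex")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


def bounding_box_vertices(top: Line, right: Line, bottom: Line, left: Line) -> List[Point]:
    """Vertices in the order top-left, top-right, bottom-right, bottom-left."""
    return [
        intersect(left, top),
        intersect(top, right),
        intersect(right, bottom),
        intersect(bottom, left),
    ]
```

For an axis-aligned box bounded by y = 0, x = 4, y = 2, and x = 0, the four lines would be passed as `(0, 1, 0)`, `(1, 0, 4)`, `(0, 1, 2)`, and `(1, 0, 0)`. Because the lines need not be axis-aligned, the same routine also yields the corners of a rotated or skewed quadrilateral.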
2. The method according to claim 1, wherein predicting, respectively, the plurality of boundary lines of the first character sequence in the image to be processed, to obtain the predicted parameters of the plurality of boundary lines of the first character sequence, comprises:
predicting, based on the image to be processed and for a first feature point related to the first character sequence, parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point, respectively; and
determining the predicted parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point.
3. The method according to claim 2, further comprising:
predicting a probability that the position of a pixel in the image to be processed belongs to a character; and
determining the first feature point according to the probability that the position of the pixel in the image to be processed belongs to a character.
4. The method according to claim 2 or 3, wherein the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point comprise:
distance parameters and angle parameters of the plurality of boundary lines of the first character sequence in a polar coordinate system corresponding to the first feature point, wherein the polar coordinate system corresponding to the first feature point is a polar coordinate system with the first feature point as its pole.
5. The method according to claim 4, wherein determining the predicted parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point comprises:
mapping the distance parameters and angle parameters of the plurality of boundary lines of the first character sequence in the polar coordinate system corresponding to the first feature point to a Cartesian coordinate system, to obtain parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system; and
determining the predicted parameters of the plurality of boundary lines of the first character sequence according to the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point in the Cartesian coordinate system.
6. The method according to any one of claims 1 to 3, wherein the plurality of boundary lines of the first character sequence comprise an upper boundary line, a right boundary line, a lower boundary line, and a left boundary line of the first character sequence.
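The mapping from per-feature-point polar parameters to a shared Cartesian coordinate system can be sketched as follows. The sketch assumes that the distance parameter rho is the perpendicular distance from the feature point (the pole) to the boundary line, and that the angle parameter theta is the direction of that perpendicular; under this assumption a line seen from pole (x0, y0) satisfies (x - x0)*cos(theta) + (y - y0)*sin(theta) = rho. Combining the per-point results by a plain component-wise mean is also only one plausible choice, since the claims do not specify how the parameters are aggregated. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Line = Tuple[float, float, float]  # (a, b, c) for a*x + b*y = c


def polar_to_cartesian(rho: float, theta: float, pole: Tuple[float, float]) -> Line:
    """Map (rho, theta) measured from `pole` to image-wide line coefficients.

    Assumption: rho is the perpendicular distance from the pole to the line
    and theta is the angle of that perpendicular.
    """
    x0, y0 = pole
    a = math.cos(theta)
    b = math.sin(theta)
    # Expanding (x - x0)*a + (y - y0)*b = rho gives a*x + b*y = rho + a*x0 + b*y0.
    c = rho + a * x0 + b * y0
    return a, b, c


def mean_line(lines: Iterable[Line]) -> Line:
    """Aggregate per-feature-point estimates by a simple mean (an assumption)."""
    lines = list(lines)
    n = len(lines)
    return (
        sum(l[0] for l in lines) / n,
        sum(l[1] for l in lines) / n,
        sum(l[2] for l in lines) / n,
    )
```

Once every feature point's prediction is expressed in the same Cartesian frame, estimates from different feature points describe the same line and can be compared or combined directly, which is not possible while each is tied to its own pole.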
7. The method according to claim 2, wherein predicting, based on the image to be processed and for the first feature point related to the first character sequence, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point, respectively, comprises:
inputting the image to be processed into a pre-trained neural network, and predicting, via the neural network and for the first feature point related to the first character sequence, the parameters of the plurality of boundary lines of the first character sequence corresponding to the first feature point, respectively.
8. The method according to claim 7, further comprising:
predicting, via the neural network, the probability that the position of a pixel in the image to be processed belongs to a character.
9. The method according to claim 7 or 8, wherein before inputting the image to be processed into the pre-trained neural network, the method further comprises:
inputting a training image into the neural network, and predicting, via the neural network and for a second feature point related to a second character sequence in the training image, predicted values of parameters of a plurality of boundary lines of the second character sequence corresponding to the second feature point, respectively; and
training the neural network according to the predicted values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point and ground-truth values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point.
10. The method according to claim 9, wherein the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point comprise: distance parameters and angle parameters of the plurality of boundary lines of the second character sequence in a polar coordinate system corresponding to the second feature point, wherein the polar coordinate system corresponding to the second feature point is a polar coordinate system with the second feature point as its pole; and
training the neural network according to the predicted values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point and the ground-truth values of the parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point comprises:
training the neural network according to predicted values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point and ground-truth values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point; and/or
training the neural network according to predicted values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point and ground-truth values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point.
11. The method according to claim 10, wherein training the neural network according to the predicted values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point and the ground-truth values of the distance parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point comprises:
for any one of the plurality of boundary lines of the second character sequence, training the neural network according to the ratio of the smaller to the larger of the ground-truth value and the predicted value of the distance parameter of the boundary line corresponding to the second feature point.
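The distance criterion above, based on the ratio of the smaller to the larger of the ground-truth and predicted distances, can be turned into a loss in several ways. The sketch below uses the negative logarithm of that ratio, which is an assumption made for illustration: the claim states only that the ratio is used, not the exact form of the loss. The function name `distance_loss` is hypothetical, and both distances are assumed to be strictly positive.

```python
import math


def distance_loss(pred: float, true: float) -> float:
    """Negative log of min/max of the predicted and ground-truth distances.

    The ratio lies in (0, 1] and equals 1 only when the prediction is exact,
    so the loss is 0 for a perfect prediction and grows as the two diverge.
    Using -log is an illustrative assumption, not taken from the claims.
    """
    if pred <= 0 or true <= 0:
        raise ValueError("distances are assumed to be strictly positive")
    ratio = min(pred, true) / max(pred, true)
    return -math.log(ratio)
```

A ratio-based criterion is scale-invariant: predicting 10 instead of 20 is penalized the same as predicting 100 instead of 200, which keeps long and short character sequences on an equal footing.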
12. The method according to claim 10, wherein training the neural network according to the predicted values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point and the ground-truth values of the angle parameters of the plurality of boundary lines of the second character sequence corresponding to the second feature point comprises:
for any one of the plurality of boundary lines of the second character sequence, determining the absolute value of the difference between the ground-truth value and the predicted value of the angle parameter of the boundary line corresponding to the second feature point; and
training the neural network according to the sine of half of the absolute value.
13. The method according to claim 9, wherein the second feature point comprises a feature point in an effective region corresponding to the second character region.
14. The method according to claim 9, further comprising:
predicting, via the neural network, a probability that the position of a pixel in the training image belongs to a character; and
training the neural network according to the probability that the position of the pixel in the training image belongs to a character and annotation data as to whether the position of the pixel in the training image belongs to a character.
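The angle criterion stated in the claims above, the sine of half the absolute difference between the ground-truth and predicted angles, is simple enough to write down directly. The sketch below assumes angles in radians within [0, 2*pi), so that the absolute difference stays below 2*pi and the sine of its half is non-negative; the function name `angle_loss` is hypothetical.

```python
import math


def angle_loss(pred: float, true: float) -> float:
    """sin(|true - pred| / 2), with angles assumed in radians in [0, 2*pi)."""
    return math.sin(abs(true - pred) / 2.0)
```

The value is 0 when the angles agree, reaches its maximum of 1 when they differ by pi, and falls back toward 0 as the difference approaches 2*pi. The last property is useful because angles just below 2*pi and just above 0 describe nearly the same direction and should not incur a large penalty.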
15. The method according to claim 14, wherein training the neural network according to the probability that the position of the pixel in the training image belongs to a character and the annotation data as to whether the position of the pixel in the training image belongs to a character comprises:
training the neural network according to the probability that the position of a pixel in the effective region corresponding to the second character sequence belongs to a character and annotation data as to whether the position of the pixel in the effective region belongs to a character.
16. The method according to claim 13, further comprising:
obtaining position information of a ground-truth bounding box of the second character sequence; and
shrinking the ground-truth bounding box according to the position information of the ground-truth bounding box and a preset ratio, to obtain the effective region corresponding to the second character sequence.
17. The method according to claim 16, wherein shrinking the ground-truth bounding box according to the position information of the ground-truth bounding box and the preset ratio, to obtain the effective region corresponding to the second character sequence, comprises:
determining an anchor point of the ground-truth bounding box according to the position information of the ground-truth bounding box, wherein the anchor point of the ground-truth bounding box is the intersection of the diagonals of the ground-truth bounding box; and
shrinking the ground-truth bounding box according to the position information of the ground-truth bounding box, position information of the anchor point of the ground-truth bounding box, and the preset ratio, to obtain the effective region corresponding to the second character sequence, wherein the ratio of a first distance to a second distance is equal to the preset ratio, the first distance is the distance between a first vertex of the effective region and the anchor point, the second distance is the distance between the anchor point and the vertex of the ground-truth bounding box that corresponds to the first vertex, and the first vertex is any vertex of the effective region.
18. An electronic device, comprising:
one or more processors; and
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the character detection method according to any one of claims 1 to 17.
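The shrinking of the ground-truth bounding box toward the intersection of its diagonals, as described in the claims above, can be sketched as follows. The sketch assumes the box is a convex quadrilateral given as four vertices in order around its boundary, and it places each effective-region vertex on the segment from the anchor point to the corresponding box vertex, which satisfies the stated distance ratio. The function names and the example ratio of 0.5 are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def diagonal_intersection(box: Sequence[Point]) -> Point:
    """Intersection of diagonals v0-v2 and v1-v3 of a convex quadrilateral."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = box
    dx1, dy1 = x3 - x1, y3 - y1  # direction of diagonal v0 -> v2
    dx2, dy2 = x4 - x2, y4 - y2  # direction of diagonal v1 -> v3
    det = dx1 * dy2 - dy1 * dx2
    if abs(det) < 1e-12:
        raise ValueError("degenerate box: diagonals are parallel")
    # Solve v0 + t*d1 = v1 + s*d2 for t using 2D cross products.
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / det
    return x1 + t * dx1, y1 + t * dy1


def shrink_box(box: Sequence[Point], ratio: float) -> List[Point]:
    """Scale each vertex toward the anchor so that
    |effective vertex - anchor| / |box vertex - anchor| == ratio."""
    ax, ay = diagonal_intersection(box)
    return [(ax + ratio * (x - ax), ay + ratio * (y - ay)) for x, y in box]
```

For example, `shrink_box([(0, 0), (4, 0), (4, 4), (0, 4)], 0.5)` keeps the centre of the square at (2, 2) and halves the distance of every corner from it. Restricting training feature points to such a shrunken region keeps them away from the box edges, where the character/background label is least reliable.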
19. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the character detection method according to any one of claims 1 to 17.
TW110112439A 2020-11-06 2021-04-06 Character detection method, electronic equipment and computer-readable storage medium TW202219822A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011229418.1A CN112348025B (en) 2020-11-06 2020-11-06 Character detection method and device, electronic equipment and storage medium
CN202011229418.1 2020-11-06

Publications (1)

Publication Number Publication Date
TW202219822A true TW202219822A (en) 2022-05-16

Family

ID=74428376

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110112439A TW202219822A (en) 2020-11-06 2021-04-06 Character detection method, electronic equipment and computer-readable storage medium

Country Status (3)

Country Link
CN (1) CN112348025B (en)
TW (1) TW202219822A (en)
WO (1) WO2022095318A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348025B (en) * 2020-11-06 2023-04-07 上海商汤智能科技有限公司 Character detection method and device, electronic equipment and storage medium
CN113139625B (en) * 2021-05-18 2023-12-15 北京世纪好未来教育科技有限公司 Model training method, electronic equipment and storage medium thereof

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6873732B2 (en) * 2001-07-09 2005-03-29 Xerox Corporation Method and apparatus for resolving perspective distortion in a document image and for calculating line sums in images
US10579897B2 (en) * 2017-10-02 2020-03-03 Xnor.ai Inc. Image based object detection
CN108960245B (en) * 2018-07-13 2022-04-19 广东工业大学 Tire mold character detection and recognition method, device, equipment and storage medium
KR20190096872A (en) * 2019-07-31 2019-08-20 엘지전자 주식회사 Method and apparatus for recognizing handwritten characters using federated learning
CN110472597A (en) * 2019-07-31 2019-11-19 中铁二院工程集团有限责任公司 Rock image rate of decay detection method and system based on deep learning
CN110751151A (en) * 2019-10-12 2020-02-04 上海眼控科技股份有限公司 Text character detection method and equipment for vehicle body image
CN111191611B (en) * 2019-12-31 2023-10-13 同济大学 Traffic sign label identification method based on deep learning
CN112101346A (en) * 2020-08-27 2020-12-18 南方医科大学南方医院 Verification code identification method and device based on target detection
CN112348025B (en) * 2020-11-06 2023-04-07 上海商汤智能科技有限公司 Character detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112348025A (en) 2021-02-09
WO2022095318A1 (en) 2022-05-12
CN112348025B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US20210209392A1 (en) Image Processing Method and Device, and Storage Medium
US11436863B2 (en) Method and apparatus for outputting data
TWI771645B (en) Text recognition method and apparatus, electronic device, storage medium
US11288531B2 (en) Image processing method and apparatus, electronic device, and storage medium
CN110059623B (en) Method and apparatus for generating information
JP7181375B2 (en) Target object motion recognition method, device and electronic device
TW202209254A (en) Image segmentation method, electronic equipment and computer-readable storage medium thereof
US11443438B2 (en) Network module and distribution method and apparatus, electronic device, and storage medium
CN110781823B (en) Screen recording detection method and device, readable medium and electronic equipment
CN110796664B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN110543849B (en) Detector configuration method and device, electronic equipment and storage medium
CN109754464B (en) Method and apparatus for generating information
CN110059624B (en) Method and apparatus for detecting living body
TW202219822A (en) Character detection method, electronic equipment and computer-readable storage medium
WO2019080702A1 (en) Image processing method and apparatus
US20200151855A1 (en) Noise processing method and apparatus
CN112306235A (en) Gesture operation method, device, equipment and storage medium
CN112085733B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113283343A (en) Crowd positioning method and device, electronic equipment and storage medium
CN112990197A (en) License plate recognition method and device, electronic equipment and storage medium
CN109816791B (en) Method and apparatus for generating information
WO2023155350A1 (en) Crowd positioning method and apparatus, electronic device, and storage medium
CN111310595A (en) Method and apparatus for generating information
CN115100492A (en) Yolov3 network training and PCB surface defect detection method and device
CN114419298A (en) Virtual object generation method, device, equipment and storage medium