CN111126389A - Text detection method and device, electronic equipment and storage medium

Info

Publication number: CN111126389A
Application number: CN201911330293.9A
Authority: CN (China)
Prior art keywords: detection, text, image, detected, classified
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 刘皓 (Liu Hao)
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911330293.9A
Publication of CN111126389A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The embodiment of the invention discloses a text detection method and apparatus, an electronic device and a storage medium. The text detection method comprises the following steps: acquiring an image to be detected; constructing a detection frame corresponding to each text element in the image to be detected; respectively extracting the texture features and geometric features of the region corresponding to each detection frame, and acquiring the association relations among the detection frames; classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames; and performing text detection on the image to be detected based on the classified detection frames.

Description

Text detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a text detection method and device, electronic equipment and a storage medium.
Background
Natural scene text detection has extremely important and wide applications in real life, such as text retrieval, guideboard recognition and intelligent test paper correction. However, detecting text in natural scenes remains a difficult task because of various uncontrollable interference factors in such scenes, such as shadow occlusion, shooting angle and occlusion by foreign objects, as well as the influence of some inherent attributes of the text, such as artistic, deformed or incomplete characters. Nevertheless, with the development of Artificial Intelligence (AI) technology in recent years, natural scene text detection technologies based on deep learning algorithms have made significant progress in performance.
At present, the most common text detection techniques are mainly regression-based methods. In the research and practice of the prior art, however, the inventor of the present invention found that current regression-based methods can only handle rectangular text lines: when the text has a curved shape, the predicted detection box cannot accurately cover all text regions. In addition, for a long text line, once the aspect ratio of the text line is greater than a preset prediction threshold, the box is lost or the prediction is incomplete, so the detection effect of existing text detection schemes is not good.
Disclosure of Invention
The embodiment of the invention provides a text detection method, a text detection device, electronic equipment and a storage medium, which can improve the accuracy of text detection.
The embodiment of the invention provides a text detection method, which comprises the following steps:
acquiring an image to be detected, wherein the image to be detected comprises a text to be detected, and the text to be detected comprises a plurality of text elements;
constructing a detection frame corresponding to each text element in the image to be detected;
respectively extracting texture features and geometric features of the corresponding region of each detection frame, and acquiring association relations among the detection frames;
classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames;
and performing text detection on the image to be detected based on the classified detection box.
Correspondingly, an embodiment of the present invention further provides a text detection apparatus, including:
the first acquisition module is used for acquiring an image to be detected, wherein the image to be detected comprises a text to be detected, and the text to be detected comprises a plurality of text elements;
the construction module is used for constructing a detection frame corresponding to each text element in the image to be detected;
the extraction module is used for respectively extracting the texture features and the geometric features of the corresponding region of each detection frame;
the second acquisition module is used for acquiring the association relation among the detection frames;
the classification module is used for classifying the detection frames according to the association relations, the texture features and the geometric features to obtain the classified detection frames;
and the detection module is used for carrying out text detection on the image to be detected based on the classified detection box.
Optionally, in some embodiments of the present invention, the classification module includes:
the calculating unit is used for calculating the similarity function corresponding to each detection frame according to the association relation;
and the classification unit is used for classifying the detection frames based on the texture features, the geometric features and the similarity function to obtain the classified detection frames.
Optionally, in some embodiments of the present invention, the classifying unit includes:
the construction subunit is used for respectively constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture feature, the geometric feature and the similarity function;
and the classification subunit is used for classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames.
Optionally, in some embodiments of the present invention, the building subunit is specifically configured to:
calculating texture feature points corresponding to the image to be detected through texture features and a similarity function;
constructing a texture feature map corresponding to the image to be detected based on the texture feature points;
calculating a geometric feature point corresponding to the image to be detected through a geometric feature and a similarity function;
and constructing a geometric feature map corresponding to the image to be detected based on the geometric feature points.
Optionally, in some embodiments of the present invention, the classification subunit is specifically configured to:
fusing the texture feature map and the geometric feature map to obtain a fused feature map;
predicting the category of the detection frame through the fused feature map;
and classifying the detection frames based on the prediction result to obtain the classified detection frames.
Optionally, in some embodiments of the present invention, the detection module includes:
the determining unit is used for determining the classified detection frames belonging to the same category as a homologous group;
the construction unit is used for constructing a text box for text detection according to the classified detection boxes in the homologous group;
and the detection unit is used for carrying out text detection on the image to be detected based on the text box.
Optionally, in some embodiments of the present invention, the building unit is specifically configured to:
determining a central point corresponding to each classified detection frame in the homologous group;
obtaining the corresponding size of each classified detection frame in the homologous group;
a text box for text detection is constructed based on the center point and the size.
Optionally, in some embodiments of the present invention, the apparatus further includes an adjusting module, where the adjusting module is configured to adjust the edges of the text box to obtain an adjusted text box;
the detection module is specifically configured to: and performing text detection on the image to be detected based on the adjusted text box.
Optionally, in some embodiments of the present invention, the building module is specifically configured to:
performing semantic segmentation on the image to be detected to obtain target pixel points corresponding to each text element and pixel association information corresponding to each target pixel point;
and constructing a detection frame corresponding to each text element based on the pixel correlation information and the plurality of target pixel points.
After an image to be detected is obtained, where the image to be detected comprises a text to be detected and the text to be detected comprises a plurality of text elements, a detection frame corresponding to each text element is constructed in the image to be detected; then, the texture features and geometric features of the region corresponding to each detection frame are respectively extracted, and the association relations among the detection frames are acquired; next, the detection frames are classified according to the association relations, the texture features and the geometric features to obtain classified detection frames; finally, text detection is performed on the image to be detected based on the classified detection frames. Therefore, the scheme can effectively improve the accuracy of text detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1a is a scene schematic diagram of a text detection method according to an embodiment of the present invention;
FIG. 1b is a schematic flow chart of a text detection method according to an embodiment of the present invention;
fig. 1c is a schematic diagram of an 8-neighborhood of a target pixel point in the text detection method according to the embodiment of the present invention;
fig. 1d is a schematic diagram of a reference line constructed in the text detection method provided in the embodiment of the present invention;
fig. 1e is a schematic diagram of a text box constructed in the text detection method provided in the embodiment of the present invention;
fig. 1f is a schematic diagram illustrating a text box being adjusted in the text detection method according to the embodiment of the present invention;
FIG. 2a is another schematic flow chart of a text detection method according to an embodiment of the present invention;
fig. 2b is a schematic view of another scene of the text detection method according to the embodiment of the present invention;
fig. 2c to fig. 2e are further schematic diagrams of constructing a text box in the text detection method according to the embodiment of the present invention;
fig. 3a is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention;
FIG. 3b is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a text detection method and device, electronic equipment and a storage medium.
The text detection apparatus may be specifically integrated in a terminal, and the terminal may include a mobile phone, a tablet computer or a personal computer (PC).
For example, referring to fig. 1a, suppose the text detection apparatus is integrated in a mobile phone that includes a camera and a display screen. When a user photographs a sign with the camera, the mobile phone acquires an image to be detected corresponding to the sign, where the image to be detected includes the text content of the sign (i.e., the text to be detected) and the text to be detected includes a plurality of text elements. The mobile phone then performs semantic segmentation on the image to be detected to obtain the target pixel points corresponding to each text element and the pixel association information corresponding to each target pixel point, and constructs a detection box corresponding to each text element based on the pixel association information and the plurality of target pixel points. Next, the mobile phone extracts the texture features and geometric features of the region corresponding to each detection box, acquires the association relations among the detection boxes, and classifies the detection boxes according to the association relations, the texture features and the geometric features to obtain classified detection boxes. Finally, the mobile phone performs text detection on the image to be detected based on the classified detection boxes and can recognize the text to be detected in the image according to the detection result; for example, the mobile phone can recognize a road name on a sign.
Compared with existing text detection schemes, this scheme can extract the texture features and geometric features of the region corresponding to each detection frame and acquire the association relations among the detection frames. When the text has a curved shape, the detection frames can be classified according to the association relations, the texture features and the geometric features, and text detection is then performed on the image to be detected based on the classified detection frames, which avoids the situation where the detection frames cannot accurately cover all text regions. In addition, for a long text line, because the scheme constructs a detection frame corresponding to each text element in the image to be detected, the problems of lost frames or incomplete prediction during detection can be avoided. Therefore, the scheme can improve the accuracy of text detection.
The following are detailed below. It should be noted that the description sequence of the following embodiments is not intended to limit the priority sequence of the embodiments.
A text detection method comprises: acquiring an image to be detected; constructing a detection frame corresponding to each text element in the image to be detected; respectively extracting the texture features and geometric features of the region corresponding to each detection frame, and acquiring the association relations among the detection frames; classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames; and performing text detection on the image to be detected based on the classified detection frames.
Referring to fig. 1b, fig. 1b is a schematic flow chart of a text detection method according to an embodiment of the invention. The specific flow of the text detection method can be as follows:
101. Acquire an image to be detected.
The image to be detected can be pre-stored locally, pulled by accessing a network interface, or captured in real time by a camera, depending on the actual situation.
102. Construct a detection frame corresponding to each text element in the image to be detected.
For example, semantic segmentation may be performed on the image to be detected, and a detection frame corresponding to each text element may then be constructed in the image to be detected based on the semantic segmentation result. Semantic segmentation of an image performs pixel-level recognition and segmentation to obtain the category information and accurate position information of the objects in the image. It can be understood that, in the embodiment of the present invention, performing semantic segmentation on the image to be detected yields the pixel points corresponding to each text element to be detected (i.e., the target pixel points) and the pixel association information corresponding to the target pixel points. That is, optionally, in some embodiments, the step "constructing a detection frame corresponding to each text element in the image to be detected" may specifically include:
(11) performing semantic segmentation on an image to be detected to obtain target pixel points corresponding to each text element and pixel association information corresponding to each target pixel point;
(12) and constructing a detection frame corresponding to each text element based on the pixel correlation information and the plurality of target pixel points.
The pixel association information can be understood as pixel neighborhood information. In image processing, a neighborhood refers to the set of pixels adjacent to a given pixel and reflects the spatial relationship between pixels. The pixel association information can be pixel 4-neighborhood information, pixel diagonal-neighborhood information or pixel 8-neighborhood information, where the pixel 8-neighborhood information can be regarded as the fusion of the pixel 4-neighborhood information and the pixel diagonal-neighborhood information. When a pixel point lies on the image boundary, some of its neighborhood points can be considered to fall outside the image.
For example, the pixel points of the image to be detected may first be classified by a first classification sub-model in a preset text detection model to determine the pixel points belonging to text elements, that is, to obtain a plurality of target pixel points. The plurality of target pixel points are then classified by a second classification sub-model in the preset text detection model to obtain the classification confidence corresponding to each target pixel point. Finally, according to the classification confidence, it is determined whether each target pixel point has a connection relationship with its 8 neighborhoods (the upper, lower, left, right and diagonal neighborhoods, as shown in fig. 1c), and the pixel association information corresponding to the target pixel points is constructed according to these connection relationships.
Here, the pixel prediction value refers to the probability that each pixel point in the text feature image belongs to the region of the text to be detected, and the classification confidence refers to the probability that a target pixel point belongs to the text to be detected. First, a convolutional neural network such as an FPN (Feature Pyramid Network) can be used, as shown in fig. 1d, to perform feature extraction on the image to be detected: the image to be detected first passes through a convolutional neural network composed of nine convolutional layers, which outputs a feature map of size 32 with 512 channels; the feature map is then input into the FPN and, after 3 stages of up-sampling, a text feature map of size 256 with 32 channels is finally output. Next, the pixel points in the text feature image are classified by the first classification sub-model in the preset text detection model to obtain the pixel prediction value corresponding to each pixel point, and a plurality of target pixel points corresponding to the text to be detected are determined based on the pixel prediction values. The plurality of target pixel points are then classified by the second classification sub-model in the preset text detection model to obtain the classification confidence corresponding to each target pixel point.
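To make steps (11) and (12) concrete, below is a minimal sketch of how detection boxes could be assembled from the per-pixel text scores and 8-neighborhood link confidences described above; the thresholds, array layout and function names are illustrative assumptions rather than the patent's specification. Union-find is used so that pixels joined by any chain of confident links end up in the same detection box.

```python
import numpy as np

# 8-neighborhood offsets: up, down, left, right and the four diagonals
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def build_detection_boxes(text_prob, link_prob, text_thr=0.7, link_thr=0.7):
    """text_prob: (H, W) text/non-text scores from the first classification sub-model.
    link_prob: (H, W, 8) connection confidences towards the 8 neighbors, from the
    second classification sub-model. Returns axis-aligned boxes
    (x_min, y_min, x_max, y_max), one per connected group of text pixels."""
    H, W = text_prob.shape
    is_text = text_prob > text_thr
    parent = {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    def union(p, q):
        parent[find(p)] = find(q)

    for y, x in zip(*np.nonzero(is_text)):
        parent.setdefault((y, x), (y, x))
        for k, (dy, dx) in enumerate(OFFSETS):
            ny, nx = y + dy, x + dx
            # merge only if the neighbor lies inside the image, is text,
            # and the predicted link confidence is high enough
            if 0 <= ny < H and 0 <= nx < W and is_text[ny, nx] and link_prob[y, x, k] > link_thr:
                parent.setdefault((ny, nx), (ny, nx))
                union((y, x), (ny, nx))

    groups = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    boxes = []
    for pixels in groups.values():
        ys, xs = zip(*pixels)
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```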
103. Respectively extract the texture features and geometric features of the region corresponding to each detection frame, and acquire the association relations among the detection frames.
For example, the texture features of each detection box and the geometric features of each detection box may be respectively extracted through a convolutional neural network such as an FPN (Feature Pyramid Network), and the association relations among the detection boxes may be acquired, where an association relation may be the relative position relationship between detection boxes.
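As an illustration of step 103, the sketch below derives simple geometric features (center, width, height) for each detection box and a relative-position association for every pair of boxes; the exact feature definitions and the height-based normalization are assumptions, since the patent only states that the association relation may be a relative position relationship.

```python
import numpy as np

def geometric_features(boxes):
    """boxes: array of (x_min, y_min, x_max, y_max). Returns one feature
    vector h_i = (cx, cy, w, h) per box."""
    boxes = np.asarray(boxes, dtype=np.float32)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return np.stack([cx, cy, w, h], axis=1)

def relative_positions(geo):
    """Pairwise association relation: offsets between box centers, normalized
    by the mean box height so the relation is roughly scale-invariant."""
    centers = geo[:, :2]
    scale = geo[:, 3].mean() + 1e-6
    return (centers[None, :, :] - centers[:, None, :]) / scale  # (N, N, 2)
```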
104. Classify the detection frames according to the association relations, the texture features and the geometric features to obtain the classified detection frames.
After the texture features and geometric features of the detection frames are obtained, the category to which each detection frame belongs may be predicted using the association relations among the detection frames. Specifically, a similarity function corresponding to each detection frame may be calculated according to the association relations, and the detection frames may then be classified according to the texture features, the geometric features and the similarity functions to obtain the classified detection frames. That is, optionally, in some embodiments, the step "classifying the detection frames according to the association relations, the texture features and the geometric features to obtain the classified detection frames" may specifically include:
(21) calculating a similarity function corresponding to each detection frame according to the association relation;
(22) classifying the detection frames based on the textural features, the geometric features and the similarity function to obtain the classified detection frames.
For example, the similarity function corresponding to each detection frame may be calculated according to the association relations, and the detection frames may then be classified by a preset graph convolutional neural network based on the texture features, the geometric features and the similarity functions to obtain the classified detection frames. It should be noted that, in the embodiments of the present invention, the similarity function may include a cosine similarity function, a Gaussian similarity function and/or a string similarity function. Preferably, in some embodiments, the similarity function includes all three; that is, classifying the detection frames by a preset graph convolutional neural network based on the texture features, the geometric features and the similarity functions may specifically include: classifying the detection frames by a preset graph convolutional neural network based on the texture features, the geometric features, the cosine similarity function, the Gaussian similarity function and the string similarity function to obtain the classified detection frames. In other words, the scheme for classifying the detection frames considers not only the texture features and geometric features corresponding to the detection frames, but also the similarity of the detection frames in each dimension, such as cosine similarity, similarity under a Gaussian distribution and similarity between character strings, which improves the accuracy of detection frame classification and facilitates subsequent text detection on the image to be detected based on the detection frames.
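The combined similarity described here (formalized as K = β1·K1 + β2·K2 + β3·K3 further below) could be computed as in the following sketch; the weight values, the Gaussian bandwidth and the source of the character strings are assumptions not fixed by the patent.

```python
import numpy as np
from difflib import SequenceMatcher

def combined_similarity(f_i, f_j, s_i, s_j, betas=(0.4, 0.4, 0.2), sigma=1.0):
    """K = b1*K1 + b2*K2 + b3*K3 with b1 + b2 + b3 = 1.
    f_i, f_j: feature vectors of the two detection regions.
    s_i, s_j: character strings assumed to be available for the two regions
    (e.g. from a recognizer); the patent does not specify their source."""
    b1, b2, b3 = betas
    assert abs(b1 + b2 + b3 - 1.0) < 1e-6, "weights must sum to 1"
    k1 = float(np.dot(f_i, f_j) /
               (np.linalg.norm(f_i) * np.linalg.norm(f_j) + 1e-9))       # cosine
    k2 = float(np.exp(-np.sum((f_i - f_j) ** 2) / (2 * sigma ** 2)))     # Gaussian
    k3 = SequenceMatcher(None, s_i, s_j).ratio()                         # string
    return b1 * k1 + b2 * k2 + b3 * k3
```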
Further, a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected can be constructed by the preset graph convolutional neural network according to the texture features, the geometric features and the similarity function; the detection frames are then classified based on the texture feature map and the geometric feature map to obtain the classified detection frames. That is, in some embodiments, the step "classifying the detection frames based on the texture features, the geometric features and the similarity function to obtain the classified detection frames" may specifically include:
(41) respectively constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture feature, the geometric feature and the similarity function;
(42) and classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames.
For example, after calculating the similarity function corresponding to each detection frame, texture feature points corresponding to the image to be detected may be calculated by using the texture features and the similarity function, and then, a texture feature map corresponding to the image to be detected is constructed based on the texture feature points, which may specifically be calculated by the following formula:
G1 = K(gi, gj), i, j ∈ {1, 2, ..., N}

where the texture feature map G1 characterizes the similarity between the texture feature gi of the i-th detection region and the texture feature gj of the j-th detection region under the similarity function K. It should be noted that the similarity function is K = β1·K1 + β2·K2 + β3·K3, where K1, K2 and K3 respectively represent the cosine similarity function, the Gaussian similarity function and the string similarity function, and β1, β2 and β3 are weight coefficients satisfying β1 + β2 + β3 = 1, which can be set according to the actual situation. Similarly, after the similarity function corresponding to each detection frame is calculated, the geometric feature points corresponding to the image to be detected can be calculated through the geometric features and the similarity function, and a geometric feature map corresponding to the image to be detected is then constructed based on the geometric feature points, specifically by the following formula:

G2 = K(hi, hj), i, j ∈ {1, 2, ..., N}

where the geometric feature map G2 characterizes the similarity between the geometric features hi and hj of the i-th and j-th detection regions. That is, in some embodiments, the step "respectively constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture features, the geometric features and the similarity function" may specifically include:
(51) calculating texture feature points corresponding to the image to be detected through texture features and a similarity function;
(52) constructing a texture feature map corresponding to the image to be detected based on the texture feature points;
(53) calculating a geometric feature point corresponding to the image to be detected through a geometric feature and a similarity function;
(54) and constructing a geometric feature map corresponding to the image to be detected based on the geometric feature points.
After the texture feature map and the geometric feature map are obtained, they may be respectively processed by the preset graph convolutional neural network, where one layer of the preset graph convolutional neural network may be defined as Z = ReLU(LayerNorm(GXW)) + X. Here G is the texture feature map or geometric feature map mentioned above, X is the set of texture features (or geometric features) of the regions corresponding to the detection frames, W is a weight matrix, ReLU is a nonlinear layer, and LayerNorm denotes layer normalization. Optionally, in some embodiments, the texture feature map and the geometric feature map may each be processed by a graph convolutional neural network with 3 such layers. Finally, the outputs for the texture feature map and the geometric feature map are concatenated along the channel dimension, the concatenated feature map is processed by one convolutional layer to obtain the probability that detection region I and each other detection region belong to the same category, and the detection regions are classified based on this result to obtain the classified detection frames.
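A sketch of the layer Z = ReLU(LayerNorm(GXW)) + X and the two-branch, three-layer arrangement described above, assuming PyTorch; the pairwise head that turns the concatenated features into same-category probabilities is an assumption, since the patent only states that the concatenated feature map is processed by one further layer.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One layer Z = ReLU(LayerNorm(G @ X @ W)) + X, where G is an (N, N)
    feature map over N detection boxes and X is the (N, d) per-box
    feature set (texture or geometric)."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.norm = nn.LayerNorm(dim)

    def forward(self, G, X):
        return torch.relu(self.norm(G @ self.W(X))) + X  # residual connection

class PairClassifier(nn.Module):
    """Three graph conv layers per branch, channel concatenation, then an
    assumed pairwise head producing same-category probabilities."""
    def __init__(self, dim):
        super().__init__()
        self.tex = nn.ModuleList(GraphConvLayer(dim) for _ in range(3))
        self.geo = nn.ModuleList(GraphConvLayer(dim) for _ in range(3))
        self.head = nn.Linear(2 * dim, 1)

    def forward(self, G1, X_tex, G2, X_geo):
        for layer in self.tex:
            X_tex = layer(G1, X_tex)
        for layer in self.geo:
            X_geo = layer(G2, X_geo)
        fused = torch.cat([X_tex, X_geo], dim=-1)          # (N, 2d)
        pair = fused.unsqueeze(0) + fused.unsqueeze(1)     # (N, N, 2d)
        return torch.sigmoid(self.head(pair)).squeeze(-1)  # P(i, j same category)
```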
Optionally, in some embodiments, when the probability is greater than a preset threshold, the two detection regions may be determined to belong to the same category. For example, if the preset threshold is 80%, the probability that detection region I and detection region A belong to the same category is 50%, the probability that detection region I and detection region B belong to the same category is 90%, and the probability that detection region I and detection region C belong to the same category is 35%, it can be determined that detection regions I and B belong to the same category. It should be noted that a category may be a text line, a phrase or a sentence, set according to the actual situation. That is, the step "classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames" may specifically include the following (a grouping sketch follows the list below):
(61) fusing the texture feature map and the geometric feature map to obtain a fused feature map;
(62) predicting the category of the detection frame through the fused feature map;
(63) and classifying the detection frames based on the prediction result to obtain the classified detection frames.
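Under the thresholding rule of the worked example above (threshold 80%), grouping the classified detection frames amounts to taking connected components over the pairwise same-category matrix, as in this sketch:

```python
import numpy as np

def group_boxes(same_prob, threshold=0.8):
    """same_prob: (N, N) probabilities that boxes i and j share a category.
    Returns index groups: connected components of the thresholded matrix."""
    N = same_prob.shape[0]
    adj = same_prob > threshold
    seen, groups = set(), []
    for start in range(N):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:  # depth-first search over linked boxes
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            comp.append(i)
            stack.extend(j for j in range(N) if adj[i, j] and j not in seen)
        groups.append(comp)
    return groups
```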
It should be noted that the graph convolutional neural network may be pre-established; that is, in some embodiments, establishing the graph convolutional neural network may specifically include:
(71) acquiring an image sample comprising a plurality of sample detection boxes, wherein each sample detection box comprises a plurality of text element samples marked with true values of the category to which the sample detection box belongs;
(72) inputting the image sample into a basic graph convolution network to obtain a category prediction value of each category of the sample detection frame;
(73) and converging the basic graph convolution network based on the category true value and the category predicted value to obtain the graph convolution neural network.
Convolutional layers: these are mainly used for feature extraction from an input image (such as a training sample or an image to be recognized), where the size of the convolution kernel can be determined according to the practical application; for example, the kernel sizes of the first to fourth convolutional layers may be (7, 7), (5, 5), (3, 3) and (3, 3). Optionally, in order to reduce the complexity of the calculation and improve the calculation efficiency, in this embodiment the kernel sizes of all four convolutional layers may be set to (3, 3), the activation functions all use ReLU (Rectified Linear Unit), and the padding modes are all set to "same", where the "same" padding mode can be simply understood as padding the edge with zeros, the number of zeros padded on the left side (upper side) being the same as or less than the number padded on the right side (lower side). Optionally, the convolutional layers may be directly connected to each other to accelerate the network convergence speed, and in order to further reduce the amount of computation, down-sampling may be performed after all, or any one or two, of the second to fourth convolutional layers; the down-sampling operation is substantially the same as convolution, except that the down-sampling kernel only takes the maximum value (max pooling) or the average value (average pooling) of the corresponding positions.
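A sketch of a backbone matching this description, assuming PyTorch: four convolutional layers with (3, 3) kernels, ReLU activations, "same" padding and max pooling after the second to fourth layers; the channel counts are assumptions.

```python
import torch.nn as nn

def make_backbone(in_ch=3, ch=64):
    """Four conv layers with (3, 3) kernels, 'same' padding and ReLU,
    with max pooling (down-sampling) after layers 2-4, as described above."""
    layers = []
    for i in range(4):
        layers += [nn.Conv2d(in_ch if i == 0 else ch, ch,
                             kernel_size=3, padding="same"),
                   nn.ReLU(inplace=True)]
        if i >= 1:  # down-sample after the second to fourth conv layers
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```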
It should be noted that, for convenience of description, in the embodiment of the present invention, both the layer where the activation function is located and the down-sampling layer (also referred to as a pooling layer) are included in the convolution layer, and it should be understood that the structure may also be considered to include the convolution layer, the layer where the activation function is located, the down-sampling layer (i.e., a pooling layer), and a full-connection layer, and of course, the structure may also include an input layer for inputting data and an output layer for outputting data, which are not described herein again.
Fully connected layer: this layer can map the learned features to the sample label space and mainly plays the role of a "classifier" in the whole convolutional neural network. Each node of the fully connected layer is connected to all nodes output by the previous layer (e.g., the down-sampling layer within the convolutional block); one node of the fully connected layer is called a neuron, and the number of neurons can be determined according to the requirements of the practical application. For example, in the text detection model, the number of neurons in each fully connected layer may be set to 512, or to 128, and so on. Similar to the convolutional layers, optionally, a nonlinear factor may be added to the fully connected layer through an activation function, for example the sigmoid function.
For example, an image sample may be acquired through multiple channels, where the image sample includes a plurality of sample detection boxes and each sample detection box contains a plurality of text element samples labeled with the true value of the category to which they belong. The image sample is then input into a base graph convolutional network, and the category of each sample detection box is predicted according to the texture features and geometric features of the region corresponding to each sample detection box and the association relations among the sample detection boxes, yielding a category prediction value for each sample detection box. Finally, the base graph convolutional network is converged based on the category true values and the category prediction values to obtain the graph convolutional neural network.
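A sketch of one convergence step under a common supervised setup, reusing the PairClassifier sketched earlier; the binary cross-entropy loss and the optimizer interface are assumptions, since the patent only states that the base network is converged on the category true values and prediction values.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, batch):
    """One convergence step: predict pairwise categories for the sample
    detection boxes and move the base network towards the true values.
    batch carries (G1, X_tex, G2, X_geo, labels), where labels is the
    (N, N) 0/1 float same-category ground truth."""
    G1, X_tex, G2, X_geo, labels = batch
    pred = model(G1, X_tex, G2, X_geo)  # (N, N) same-category probabilities
    loss = nn.functional.binary_cross_entropy(pred, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```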
105. Perform text detection on the image to be detected based on the classified detection frames.
For example, a text box for text detection may be constructed based on the classified detection boxes, and text detection may then be performed on the image to be detected according to the constructed text box; for instance, it may be detected that the image to be detected includes two text lines, where one text line includes five characters and the other includes three. That is, in some embodiments, the step "performing text detection on the image to be detected based on the classified detection boxes" may specifically include:
(81) determining the classified detection frames belonging to the same category as a homologous group;
(82) constructing a text box for text detection according to the classified detection boxes in the homologous group;
(83) and performing text detection on the image to be detected based on the text box.
Specifically, after the homologous group is determined, the center point corresponding to each classified detection box in the homologous group may be determined; the size corresponding to each classified detection box, such as its length and width, may then be acquired; finally, a text box for text detection may be constructed based on the center points and the sizes. That is, in some embodiments, the step "constructing a text box for text detection according to the classified detection boxes in the homologous group" may specifically include:
(91) determining a central point corresponding to each classified detection frame in the homologous group;
(92) obtaining the corresponding size of each classified detection frame in the homologous group;
(93) a text box for text detection is constructed based on the center point and the size.
For example, first, the center point corresponding to each classified detection box in the homologous group is determined. Then, following the arrangement of the classified detection boxes in the image to be detected, the center points are connected in sequence to obtain a reference line, as shown in fig. 1d. At the same time, the height values of the classified detection boxes within a preset range of each center point, for example the five classified detection boxes around the center point, are acquired, and the acquired height values are summed and averaged to obtain a reference height. A text box for text detection is then constructed based on the center points and the sizes: for example, if the constructed reference line is 5 cm long and the reference height is 3 cm, the reference line is translated upwards by 3 cm and downwards by 3 cm along the direction perpendicular to the reference line to obtain the text box. Because the reference line is constructed from the center points of the classified detection boxes, the boxes at the head and tail end points of the reference line would not be covered by the text box; therefore, the head and tail end points of the reference line also need to be translated outwards on both sides by a distance equal to the reference height, and the distance between the translated end points is taken as the length of the text box. At this time, the length of the text box is 16 cm and its height is 6 cm, as shown in fig. 1e.
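The reference-line construction just described could look like the following sketch, which returns the text box as a closed polygon; the neighborhood size for the reference height and the way the end points are extended are assumptions based on the example above.

```python
import numpy as np

def build_text_polygon(centers, heights, k=5):
    """centers: (N, 2) center points of the classified boxes in one
    homologous group, already in reading order (assumes N >= 2).
    heights: (N,) box heights. Sweeps the reference line up and down by
    the reference height and extends the two end points outwards,
    returning a closed polygon."""
    centers = np.asarray(centers, dtype=np.float32)
    ref_h = float(np.mean(heights[:k]))  # average over a preset range of boxes

    # unit direction of the reference line at the two ends, used to extend
    # the head and tail so the end boxes stay covered
    head_dir = centers[0] - centers[1]
    tail_dir = centers[-1] - centers[-2]
    head = centers[0] + ref_h * head_dir / (np.linalg.norm(head_dir) + 1e-6)
    tail = centers[-1] + ref_h * tail_dir / (np.linalg.norm(tail_dir) + 1e-6)
    line = np.vstack([head, centers, tail])

    # translate the reference line perpendicular to itself by the reference height
    seg = np.gradient(line, axis=0)
    normal = np.stack([-seg[:, 1], seg[:, 0]], axis=1)
    normal /= np.linalg.norm(normal, axis=1, keepdims=True) + 1e-6
    top = line + ref_h * normal
    bottom = line - ref_h * normal
    return np.vstack([top, bottom[::-1]])  # closed polygon
```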
It should be noted that, since the reference line is not necessarily a straight line segment, the constructed text box is not necessarily a rectangular box, which is inconvenient for subsequent text detection and recognition. Therefore, optionally, in some embodiments, after the step "constructing a text box for text detection based on the center points and the sizes", the method may further include: adjusting the edges of the text box to obtain an adjusted text box.
The step of performing text detection on the image to be detected based on the text box specifically includes: and performing text detection on the image to be detected based on the adjusted text box.
Specifically, the edges of the obtained text box may be smoothed by thin plate spline (TPS) interpolation and the shape of the text box adjusted to a rectangle, so as to facilitate subsequent text detection and text recognition, as shown in fig. 1f.
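A sketch of the smoothing step, assuming SciPy's RBFInterpolator with a thin-plate-spline kernel: the boundary of the curved text box is mapped onto an upright rectangle, after which the enclosed region can be resampled; the rectangle size and the point correspondence scheme are assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_rectify_mapping(polygon, width, height):
    """Fit a thin plate spline that maps the (possibly curved) text box
    boundary onto an upright width x height rectangle. polygon: (2M, 2)
    closed boundary, first M points along the top edge, last M along the
    bottom (reversed), as produced by build_text_polygon above."""
    M = len(polygon) // 2
    top, bottom = polygon[:M], polygon[M:][::-1]
    xs = np.linspace(0, width, M)
    targets = np.vstack([np.stack([xs, np.zeros(M)], 1),           # top edge -> y = 0
                         np.stack([xs, np.full(M, height)], 1)])   # bottom -> y = height
    sources = np.vstack([top, bottom])
    # one TPS mapping for both output coordinates; evaluating it on pixel
    # coordinates yields the rectified sampling positions
    return RBFInterpolator(sources, targets, kernel="thin_plate_spline")
```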
In the embodiment of the invention, after the image to be detected is obtained, where the image to be detected comprises a text to be detected and the text to be detected comprises a plurality of text elements, a detection frame corresponding to each text element is constructed in the image to be detected; the texture features and geometric features of the region corresponding to each detection frame are then respectively extracted and the association relations among the detection frames acquired; next, the detection frames are classified according to the association relations, the texture features and the geometric features to obtain the classified detection frames; finally, text detection is performed on the image to be detected based on the classified detection frames. Compared with existing text detection schemes, this scheme extracts the texture features and geometric features of the region corresponding to each detection frame and acquires the association relations among the detection frames, so that when the text has a curved shape the detection frames can still be classified according to the association relations, the texture features and the geometric features, and text detection can then be performed on the image to be detected based on the classified detection frames, avoiding the situation where the detection frames cannot accurately cover all text regions. In addition, for a long text line, because the scheme constructs a detection frame corresponding to each text element in the image to be detected, the problems of lost frames or incomplete prediction during detection can be avoided. Therefore, the scheme can improve the accuracy of text detection.
The method according to the examples is further described in detail below by way of example.
In this embodiment, the text detection apparatus will be described by taking an example in which it is specifically integrated in a terminal.
Referring to fig. 2a, a text detection method may specifically include the following steps:
201. The terminal acquires an image to be detected.
The image to be detected can be pre-stored locally, pulled by accessing a network interface, or captured in real time by a camera, depending on the actual situation.
202. The terminal constructs a detection frame corresponding to each text element in the image to be detected.
For example, specifically, after the terminal performs semantic segmentation on the image to be detected, the terminal constructs a detection box corresponding to each text element in the image to be detected based on a semantic segmentation result.
203. The terminal respectively extracts the texture features and geometric features of the region corresponding to each detection frame and acquires the association relations among the detection frames.
For example, the terminal may extract the texture features of each detection box and the geometric features of each detection box through a convolutional neural network such as an FPN (Feature Pyramid Network), and may acquire the association relations among the detection boxes, where an association relation may be the relative position relationship between detection boxes.
204. The terminal classifies the detection frames according to the association relations, the texture features and the geometric features to obtain the classified detection frames.
For example, specifically, the terminal may calculate a similarity function corresponding to each detection frame according to the association relationship, and then classify the detection frames by using a preset graph convolutional neural network based on the texture features, the geometric features, and the similarity function to obtain the classified detection frames.
Preferably, in some embodiments, the similarity function includes a cosine similarity function, a Gaussian similarity function and a string similarity function; that is, the terminal classifying the detection frames by a preset graph convolutional neural network based on the texture features, the geometric features and the similarity function may specifically include: classifying the detection frames by a preset graph convolutional neural network based on the texture features, the geometric features, the cosine similarity function, the Gaussian similarity function and the string similarity function to obtain the classified detection frames. In other words, the scheme for classifying the detection frames considers not only the texture features and geometric features corresponding to the detection frames, but also the similarity of the detection frames in each dimension, such as cosine similarity, similarity under a Gaussian distribution and similarity between character strings, which improves the accuracy of detection frame classification and facilitates subsequent text detection on the image to be detected based on the detection frames.
205. The terminal performs text detection on the image to be detected based on the classified detection boxes.
For example, the terminal may construct a text box for text detection based on the classified detection boxes and then perform text detection on the image to be detected according to the constructed text box. Optionally, in some embodiments, the terminal may determine the classified detection boxes belonging to the same category as a homologous group, determine the center point corresponding to each classified detection box in the homologous group, acquire the size corresponding to each classified detection box, such as its length and width, and finally construct the text box for text detection based on the center points and the sizes.
To facilitate understanding of the text detection method provided by the embodiment of the present invention, the scene of a road sign is taken as an example for further description; please refer to fig. 2b. The text detection apparatus is integrated in a terminal. The terminal captures the road sign through a camera to obtain an image of the road sign (i.e., a scene image), and then performs feature extraction on the scene image to obtain a feature image corresponding to the scene image (i.e., the image to be detected). The terminal then constructs a detection box corresponding to each text element in the image to be detected, as shown in fig. 2c. Next, the terminal extracts the texture features and geometric features of the region corresponding to each detection box and classifies the detection boxes based on a preset graph convolutional neural network and the similarity functions corresponding to the detection boxes to obtain the classified detection boxes, as shown in fig. 2d. The terminal then adjusts the edges of the classified detection boxes and finally performs text detection on the image to be detected based on the adjusted detection boxes, as shown in fig. 2e. In this way, after the terminal acquires the image to be detected, where the image to be detected comprises a text to be detected and the text to be detected comprises a plurality of text elements, the terminal constructs a detection frame corresponding to each text element in the image to be detected, respectively extracts the texture features and geometric features of the region corresponding to each detection frame, acquires the association relations among the detection frames, classifies the detection frames according to the association relations, the texture features and the geometric features to obtain the classified detection frames, and finally performs text detection on the image to be detected based on the classified detection frames. Compared with existing text detection schemes, the terminal can extract the texture features and geometric features of the region corresponding to each detection frame and acquire the association relations among the detection frames; when the text has a curved shape, the detection frames can be classified according to the association relations, the texture features and the geometric features, and text detection is then performed on the image to be detected based on the classified detection frames, avoiding the situation where the detection frames cannot accurately cover all text regions. In addition, for a long text line, because the scheme constructs a detection frame corresponding to each text element in the image to be detected, the problems of lost frames or incomplete prediction during detection can be avoided. Therefore, the scheme can improve the accuracy of text detection.
In order to better implement the text detection method according to the embodiment of the present invention, an embodiment of the present invention further provides a text detection apparatus (detection apparatus for short) based on the foregoing text detection method. The meanings of the nouns are the same as those in the text detection method, and specific implementation details can refer to the description in the method embodiment.
Referring to fig. 3a, fig. 3a is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention. The detection apparatus may include a first obtaining module 301, a constructing module 302, an extracting module 303, a second obtaining module 304, a classifying module 305 and a detecting module 306, specifically as follows:
the first obtaining module 301 is configured to obtain an image to be detected.
The image to be detected includes a text to be detected, and the text to be detected includes a plurality of text elements, where a text element refers to any element in the text to be detected, such as a character or a symbol. The image to be detected may be obtained by the first obtaining module 301, for example by pulling it through a network interface.
A constructing module 302, configured to construct a detection box corresponding to each text element in the image to be detected.
For example, specifically, the construction module 302 may perform semantic segmentation on the image to be detected, and then construct a detection box corresponding to each text element in the image to be detected based on a result of the semantic segmentation.
Optionally, in some embodiments, the building module 302 is specifically configured to: and performing semantic segmentation on the image to be detected to obtain target pixel points corresponding to each text element and pixel association information corresponding to each target pixel point, and constructing a detection frame corresponding to each text element based on the pixel association information and the plurality of target pixel points.
And the extracting module 303 is configured to extract the texture feature and the geometric feature of the region corresponding to each detection frame.
For example, the extracting module 303 may respectively extract the texture features of each detection box and the geometric features of each detection box through a convolutional neural network such as an FPN (Feature Pyramid Network).
A second obtaining module 304, configured to obtain an association relationship between the detection frames.
The classification module 305 is configured to classify the detection frames according to the association relationship, the texture features, and the geometric features, so as to obtain the classified detection frames.
For example, specifically, the classification module 305 may predict the category to which each detection frame belongs by using the association relationship between the detection frames, where a similarity function corresponding to each detection frame may be calculated according to the association relationship, and then the detection frames are classified according to the texture features, the geometric features, and the similarity function, so as to obtain the classified detection frames.
Optionally, in some embodiments, the classification module 305 may specifically include:
the calculating unit is used for calculating the similarity function corresponding to each detection frame according to the association relation;
and the classification unit is used for classifying the detection frames based on the texture features, the geometric features and the similarity function to obtain the classified detection frames.
Optionally, in some embodiments, the classification unit may specifically include:
the construction subunit is used for respectively constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture feature, the geometric feature and the similarity function;
and the classification subunit is used for classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames.
Optionally, in some embodiments, the building subunit may specifically be configured to: calculating texture feature points corresponding to the image to be detected through texture features and a similarity function, constructing a texture feature map corresponding to the image to be detected based on the texture feature points, calculating geometric feature points corresponding to the image to be detected through geometric features and the similarity function, and constructing a geometric feature map corresponding to the image to be detected based on the geometric feature points.
Optionally, in some embodiments, the classification subunit is specifically configured to: and fusing the texture feature map and the geometric feature map to obtain a fused feature map, predicting the category of the detection frame according to the fused feature map, and classifying the detection frame based on the prediction result to obtain a classified detection frame.
A detection module 306, configured to perform text detection on the image to be detected based on the classified detection boxes.
For example, specifically, the detection module 306 may construct a text box for text detection based on the classified detection box, and then perform text detection on the image to be detected according to the constructed text box.
Optionally, in some embodiments, the detection module may specifically include:
the determining unit is used for determining the classified detection frames belonging to the same category as a homologous group;
the construction unit is used for constructing a text box for text detection according to the classified detection boxes in the homologous group;
and the detection unit is used for performing text detection on the image to be detected based on the text box.
Optionally, in some embodiments, the construction unit is specifically configured to: determining a central point corresponding to each classified detection box in the homologous group, acquiring the size corresponding to each classified detection box in the homologous group, and constructing a text box for text detection based on the central point and the size.
Optionally, in some embodiments, referring to fig. 3b, the apparatus further includes an adjusting module 307, where the adjusting module 307 is configured to adjust the edges of the text box to obtain an adjusted text box. In this case, the detection module 306 is specifically configured to perform text detection on the image to be detected based on the adjusted text box.
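The description leaves open how the adjusting module 307 adjusts the edges. One illustrative choice, padding each edge by a small margin and clipping to the image bounds, is sketched below; both the margin and the clipping rule are assumptions.

import numpy as np

def adjust_text_box(box, margin=2.0, image_size=None):
    # Pad every edge of the text box by `margin` pixels, then clip to
    # the image bounds if a (width, height) pair is given.
    x1, y1 = box[0] - margin, box[1] - margin
    x2, y2 = box[2] + margin, box[3] + margin
    if image_size is not None:
        w, h = image_size
        x1, y1 = max(0.0, x1), max(0.0, y1)
        x2, y2 = min(float(w), x2), min(float(h), y2)
    return np.array([x1, y1, x2, y2])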
In the embodiment of the present invention, the first obtaining module 301 obtains an image to be detected, where the image to be detected includes a text to be detected and the text to be detected includes a plurality of text elements. The constructing module 302 constructs a detection frame corresponding to each text element in the image to be detected. The extracting module 303 then extracts the texture features and geometric features of the region corresponding to each detection frame, and the second obtaining module 304 obtains the association relations between the detection frames. Next, the classifying module 305 classifies the detection frames according to the association relations, the texture features and the geometric features to obtain the classified detection frames, and finally the detecting module 306 performs text detection on the image to be detected based on the classified detection frames. Compared with existing text detection schemes, the extraction module 303 of this apparatus extracts both the texture features and the geometric features of each detection frame's region, and the second obtaining module 304 obtains the association relations between the detection frames, so that when the text is curved, the classification module 305 can still classify the detection frames correctly and the detection module 306 can perform text detection based on the classified detection frames, avoiding the situation in which a detection box cannot accurately cover all text regions. In addition, for long text lines, since the scheme constructs a detection frame corresponding to each text element in the image to be detected, missing boxes and incomplete predictions during detection can be avoided; the scheme can therefore improve the accuracy of text detection.
In addition, an embodiment of the present invention further provides an electronic device. Fig. 4 shows a schematic structural diagram of the electronic device according to the embodiment of the present invention. Specifically:
the electronic device may include components such as a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the configuration shown in fig. 4 does not constitute a limitation of the electronic device: the electronic device may include more or fewer components than those shown, may combine certain components, or may use a different arrangement of components. Wherein:
the processor 401 is the control center of the electronic device and connects the various parts of the whole electronic device through various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, the processor 401 performs the various functions of the electronic device and processes data, thereby monitoring the electronic device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles the operating system, user interfaces, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components. Preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that charging, discharging, and power consumption management functions are implemented through the power management system. The power supply 403 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring an image to be detected; constructing a detection frame corresponding to each text element in the image to be detected; respectively extracting the texture features and geometric features of the region corresponding to each detection frame; obtaining the association relations between the detection frames; classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames; and performing text detection on the image to be detected based on the classified detection frames.
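Stringing the helpers sketched earlier together, the functions the processor implements can be pictured as the following skeleton. Every callable passed in (detector, tex_extractor, geo_extractor, associate, classifier) is a hypothetical stand-in for a trained component; the skeleton only fixes the order of operations listed above.

import numpy as np

def detect_text(image, detector, tex_extractor, geo_extractor,
                associate, classifier):
    boxes = detector(image)             # one detection frame per text element
    tex = tex_extractor(image, boxes)   # texture features per frame region
    geo = geo_extractor(image, boxes)   # geometric features per frame region
    links = associate(boxes)            # association relations between frames
    sim = similarity_matrix(boxes, links)
    labels = classifier(tex, geo, sim)  # classified detection frames
    # One text box per homologous group, with adjusted edges.
    return [adjust_text_box(text_box_from_group(boxes[labels == c]),
                            image_size=(image.shape[1], image.shape[0]))
            for c in np.unique(labels)]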
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
After the image to be detected is obtained, where the image to be detected includes a text to be detected and the text to be detected includes a plurality of text elements, a detection frame corresponding to each text element is constructed in the image to be detected. The texture features and geometric features of the region corresponding to each detection frame are then extracted, and the association relations between the detection frames are obtained. Next, the detection frames are classified according to the association relations, the texture features and the geometric features to obtain the classified detection frames, and finally text detection is performed on the image to be detected based on the classified detection frames. Compared with existing text detection schemes, this scheme extracts both the texture features and the geometric features of each detection frame's region and obtains the association relations between the detection frames, so that when the text is curved, the detection frames can still be classified according to the association relations, the texture features and the geometric features, and text detection can then be performed on the image to be detected based on the classified detection frames, avoiding the situation in which a detection box cannot accurately cover all text regions. In addition, for long text lines, since the scheme constructs a detection frame corresponding to each text element in the image to be detected, missing boxes and incomplete predictions during detection can be avoided; the scheme can therefore improve the accuracy of text detection.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the text detection methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
acquiring an image to be detected; constructing a detection frame corresponding to each text element in the image to be detected; respectively extracting the texture features and geometric features of the region corresponding to each detection frame; obtaining the association relations between the detection frames; classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames; and performing text detection on the image to be detected based on the classified detection frames.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any text detection method provided in the embodiments of the present invention, the beneficial effects that can be achieved by any text detection method provided in the embodiments of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The text detection method and apparatus, electronic device, and storage medium provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (15)

1. A text detection method, comprising:
acquiring an image to be detected, wherein the image to be detected comprises a text to be detected, and the text to be detected comprises a plurality of text elements;
constructing a detection frame corresponding to each text element in the image to be detected;
respectively extracting texture features and geometric features of the corresponding region of each detection frame, and acquiring association relations among the detection frames;
classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames;
and performing text detection on the image to be detected based on the classified detection frames.
2. The method of claim 1, wherein the classifying the detection frames according to the association relations, the texture features and the geometric features to obtain classified detection frames comprises:
calculating a similarity function corresponding to each detection frame according to the association relations;
classifying the detection frames based on the texture features, the geometric features and the similarity function to obtain the classified detection frames.
3. The method of claim 2, wherein the classifying the detection frames based on the texture features, the geometric features and the similarity function to obtain the classified detection frames comprises:
respectively constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture feature, the geometric feature and the similarity function;
and classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames.
4. The method according to claim 3, wherein the constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture feature, the geometric feature and the similarity function respectively comprises:
calculating texture feature points corresponding to the image to be detected through the texture features and the similarity function;
constructing a texture feature map corresponding to the image to be detected based on the texture feature points;
calculating geometric feature points corresponding to the image to be detected through the geometric features and the similarity function;
and constructing a geometric feature map corresponding to the image to be detected based on the geometric feature points.
5. The method of claim 3, wherein the classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames comprises:
fusing the texture feature map and the geometric feature map to obtain a fused feature map;
predicting the category of the detection frame through the fused feature map;
and classifying the detection frames based on the prediction result to obtain the classified detection frames.
6. The method according to any one of claims 1 to 5, wherein the performing text detection on the image to be detected based on the classified detection frames comprises:
determining the classified detection frames belonging to the same category as a homologous group;
constructing a text box for text detection according to the classified detection boxes in the homologous group;
and performing text detection on the image to be detected based on the text box.
7. The method of claim 6, wherein constructing the text box for text detection according to the classified detection boxes in the homology group comprises:
determining a central point corresponding to each classified detection frame in the homologous group;
obtaining the corresponding size of each classified detection frame in the homologous group;
and constructing a text box for text detection based on the central point and the size.
8. The method of claim 7, wherein after the constructing a text box for text detection based on the central point and the size, the method further comprises:
adjusting the edge of the text box to obtain an adjusted text box;
the performing text detection on the image to be detected based on the text box comprises: performing text detection on the image to be detected based on the adjusted text box.
9. The method according to any one of claims 1 to 5, wherein constructing a detection box corresponding to each text element in the image to be detected comprises:
performing semantic segmentation on the image to be detected to obtain target pixel points corresponding to each text element and pixel association information corresponding to each target pixel point;
and constructing a detection frame corresponding to each text element based on the pixel correlation information and the plurality of target pixel points.
10. A text detection apparatus, comprising:
a first acquisition module, configured to acquire an image to be detected, wherein the image to be detected comprises a text to be detected, and the text to be detected comprises a plurality of text elements;
the construction module is used for constructing a detection frame corresponding to each text element in the image to be detected;
the extraction module is used for respectively extracting the texture features and the geometric features of the corresponding region of each detection frame;
the second acquisition module is used for acquiring the association relation among the detection frames;
the classification module is used for classifying the detection frames according to the association relation, the texture features and the geometric features to obtain the classified detection frames;
and the detection module is used for performing text detection on the image to be detected based on the classified detection frames.
11. The apparatus of claim 10, wherein the classification module comprises:
the calculating unit is used for calculating the similarity function corresponding to each detection frame according to the association relation;
and the classification unit is used for classifying the detection frames based on the texture features, the geometric features and the similarity function to obtain the classified detection frames.
12. The apparatus of claim 11, wherein the classification unit comprises:
the construction subunit is used for respectively constructing a texture feature map corresponding to the image to be detected and a geometric feature map corresponding to the image to be detected according to the texture feature, the geometric feature and the similarity function;
and the classification subunit is used for classifying the detection frames based on the texture feature map and the geometric feature map to obtain the classified detection frames.
13. The apparatus according to claim 12, wherein the construction subunit is specifically configured to:
calculate texture feature points corresponding to the image to be detected through the texture features and the similarity function;
construct a texture feature map corresponding to the image to be detected based on the texture feature points;
calculate geometric feature points corresponding to the image to be detected through the geometric features and the similarity function;
and construct a geometric feature map corresponding to the image to be detected based on the geometric feature points.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the text detection method according to any of claims 1-9 are implemented when the program is executed by the processor.
15. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the text detection method according to any one of claims 1 to 9.
CN201911330293.9A 2019-12-20 2019-12-20 Text detection method and device, electronic equipment and storage medium Pending CN111126389A (en)

Priority Applications (1)

CN201911330293.9A — priority and filing date 2019-12-20 — Text detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN111126389A (en) — Publication Date: 2020-05-08

Family

ID=70501539

Country Status (1)

Country: CN — CN111126389A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022017299A1 (en) * 2020-07-24 2022-01-27 北京字节跳动网络技术有限公司 Text inspection method and apparatus, electronic device, and storage medium
CN111738233A (en) * 2020-08-07 2020-10-02 北京易真学思教育科技有限公司 Text detection method, electronic device and computer readable medium
CN113077484A (en) * 2021-03-30 2021-07-06 中国人民解放军战略支援部队信息工程大学 Image instance segmentation method
CN113033558A (en) * 2021-04-19 2021-06-25 深圳市华汉伟业科技有限公司 Text detection method and device for natural scene and storage medium
CN113033558B (en) * 2021-04-19 2024-03-19 深圳市华汉伟业科技有限公司 Text detection method and device for natural scene and storage medium
CN113361238A (en) * 2021-05-21 2021-09-07 北京语言大学 Method and device for automatically proposing question by recombining question types with language blocks
CN113361238B (en) * 2021-05-21 2022-02-11 北京语言大学 Method and device for automatically proposing question by recombining question types with language blocks
CN114511864A (en) * 2022-04-19 2022-05-17 腾讯科技(深圳)有限公司 Text information extraction method, target model acquisition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination