CN114067329A - Text image detection method, device, medium and equipment - Google Patents

Text image detection method, device, medium and equipment

Info

Publication number
CN114067329A
CN114067329A
Authority
CN
China
Prior art keywords
character
detection
branch
segmentation
text image
Prior art date
Legal status
Pending
Application number
CN202111395006.XA
Other languages
Chinese (zh)
Inventor
朱浩
李丽
孟彦伟
Current Assignee
China Post Information Technology Beijing Co ltd
Original Assignee
China Post Information Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by China Post Information Technology Beijing Co., Ltd.
Priority to CN202111395006.XA
Publication of CN114067329A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiment of the application discloses a text image detection method, device, medium and equipment. The method comprises: acquiring a text image to be detected; and carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result. The character detection model comprises a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are connected behind the backbone network. According to the technical scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.

Description

Text image detection method, device, medium and equipment
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a text image detection method, a text image detection device, a text image detection medium and text image detection equipment.
Background
With the continuous development of deep learning technology, character detection plays an increasingly important role as a key component of optical character recognition technology. Character detection has wide application scenarios, such as detecting the characters on identity cards, bank cards, invoices and express waybills.
In the prior art, character detection in a text image mainly locates character regions in the text image with a target segmentation algorithm and then performs character segmentation on each character region, thereby achieving the purpose of character detection.
Methods based on character segmentation can effectively handle the diverse orientations of characters, but existing character detection algorithms are prone to false detections, missed detections and detection-box offsets under conditions such as complex backgrounds and blurred characters.
Disclosure of Invention
The embodiment of the application provides a text image detection method, a text image detection device, a text image detection medium and text image detection equipment.
In a first aspect, an embodiment of the present application provides a method for detecting a text image, where the method includes:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
In a second aspect, an embodiment of the present application provides an apparatus for detecting a text image, where the apparatus includes:
the text image acquisition module is used for acquiring a text image to be detected;
the character detection result determining module is used for carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
In a third aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the text image detection method according to the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable by the processor, where the processor executes the computer program to implement the text image detection method according to the embodiment of the present application.
According to the technical scheme provided by the embodiment of the application, the text image to be detected is acquired, and character detection is carried out on it by utilizing the pre-trained character detection model, so that the character detection result is obtained. The character detection model comprises a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are connected behind the backbone network. According to the scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate character-region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.
Drawings
Fig. 1A is a flowchart of a text image detection method according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a text detection model according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for detecting a text image according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a text image detection apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1A is a flowchart of a text image detection method provided in an embodiment of the present application, where this embodiment is applicable to any text image detection scenario, and the method may be executed by a text image detection apparatus provided in an embodiment of the present application, where the apparatus may be implemented by software and/or hardware, and may be integrated in an electronic device.
As shown in fig. 1A, the method for detecting a text image includes:
and S110, acquiring a text image to be detected.
This scheme may be executed by an electronic device such as a computer, a server or a workstation, and the electronic device may further be provided with an image acquisition device. The electronic device can autonomously acquire the text image to be detected. The text image may be picture data such as an identity card, a bank card, a text screenshot, an invoice or an express waybill, or video data such as surveillance footage, a movie or a documentary. The text image to be detected may be acquired by reading, in real time, the picture data collected by the image acquisition device, as in an identity-card recognition scenario. The electronic device may also selectively acquire the text image to be detected under the control of a detection instruction, as in scenarios such as video understanding and video search.
After the text image to be detected is acquired, if it is blurred, of an unsuitable target size, or has similar defects, the electronic device can preprocess it through means such as image enhancement, image cropping and image scaling to improve the image quality.
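As an illustration of such preprocessing, the following sketch (a hypothetical helper assuming NumPy; the fixed 512×512 target size and the zero-padding policy are assumptions, not taken from the patent) crops or pads an image to a fixed size and normalizes its pixel values:

```python
import numpy as np

def preprocess(image: np.ndarray, target_h: int = 512, target_w: int = 512) -> np.ndarray:
    """Pad or crop an H x W x 3 uint8 image to a fixed size, then scale values to [0, 1]."""
    h, w = image.shape[:2]
    # Crop if the image is larger than the target in either dimension.
    image = image[:min(h, target_h), :min(w, target_w)]
    # Pad with zeros (black) if it is smaller than the target.
    pad_h = target_h - image.shape[0]
    pad_w = target_w - image.shape[1]
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    # Normalize pixel values for the detection model.
    return image.astype(np.float32) / 255.0
```

Real pipelines would typically also resize with interpolation and normalize per-channel; this only illustrates the crop/pad/scale idea.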
S120, carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
By utilizing the pre-trained character detection model, the electronic device can perform character detection on the text image to be detected and obtain a detection result. Fig. 1B is a schematic structural diagram of a character detection model provided in an embodiment of the present application. As shown in fig. 1B, the character detection model includes a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are both connected behind the backbone network. The region detection branch can be used to locate the region where the characters in the text image to be detected are situated, and to label the character region with a detection box of regular shape. The character segmentation branch can be used to further segment the characters in the character region and determine its inner and outer boundaries. The character detection model may be a deep learning model, for example a convolutional neural network. The backbone network can be any network structure with a feature extraction function, for example a classical convolutional neural network such as AlexNet, VGG-Net or ResNet, or a custom-built network structure that meets the scenario requirements. The region detection branch may have a network structure with a feature extraction function, or a structure with only a region detection function. Similarly, the character segmentation branch may have a structure with a feature extraction function, or only a structure with a character segmentation function. If the region detection branch and the character segmentation branch both have network structures with feature extraction functions, the two structures may be the same or different.
In this scheme, optionally, the character segmentation branch includes a segmentation sharing network, a first character segmentation branch and a second character segmentation branch; the first character segmentation branch and the second character segmentation branch are connected behind the segmentation sharing network.
In order to achieve targeted character segmentation, the character segmentation branch can be further divided according to the points of attention of character segmentation. As shown in fig. 1B, the character segmentation branch may include a segmentation sharing network, a first character segmentation branch and a second character segmentation branch, with the first character segmentation branch and the second character segmentation branch connected behind the segmentation sharing network. The first character segmentation branch may be used to focus on the irregular area formed by the inner and outer boundaries of the character region. The second character segmentation branch may be used to focus on the difference between the characters and the background in the character region.
According to this scheme, branching the character segmentation branch again allows different characteristics to be attended to during character segmentation, which facilitates character detection with higher discrimination.
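The structure described above can be sketched as a minimal PyTorch module. The layer widths, kernel sizes and channel counts below are illustrative assumptions — the patent only fixes the branch topology (backbone → region detection branch; backbone → segmentation sharing network → two character segmentation branches), not the layers:

```python
import torch
import torch.nn as nn

class TextDetector(nn.Module):
    """Minimal sketch of the multi-branch detector (layer sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        # Backbone: any feature extractor (e.g. a ResNet); a tiny conv stack stands in here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Region detection branch: a text-presence map plus box width/height channels.
        self.region_branch = nn.Conv2d(64, 3, 1)   # 1 heat-map channel + 2 size channels
        # Segmentation sharing network feeding the two segmentation sub-branches.
        self.seg_shared = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.seg_branch1 = nn.Conv2d(64, 1, 1)     # inner/outer boundary area
        self.seg_branch2 = nn.Conv2d(64, 1, 1)     # text vs. background

    def forward(self, x):
        feat = self.backbone(x)                    # intermediate features
        region = self.region_branch(feat)
        shared = self.seg_shared(feat)
        return region, self.seg_branch1(shared), self.seg_branch2(shared)
```

The single `forward` pass returns all three branch outputs, mirroring how the intermediate features are transmitted to the region detection branch and, through the segmentation sharing network, to both character segmentation branches.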
In order to enable each branch to achieve the extraction of the target feature, the electronic device may set a different loss function for each branch. In one possible solution, optionally, the region detection branch is provided with a first loss function; the first character segmentation branch is provided with a second loss function; and the second character segmentation branch is provided with a third loss function.
On the basis of the above scheme, optionally, the first loss function expression is:
L1 = λ1·l1
l1 = lA + lB
(the full expressions for lA and lB are rendered only as images in the original publication)
wherein L1 represents the first loss function, λ1 is a preset coefficient, lA represents the regression of the length and width of the text region, lB represents the regression of the character region, x and y represent the coordinate position of a pixel point of the text image, c represents the number of extracted feature channels, Yxyc represents the ground-truth probability that text is present at the coordinates (x, y), Ŷxyc represents the probability predicted by the character detection model that text is present at the coordinates (x, y), α and β are hyper-parameters, and N represents the number of key points of the text image;
and, the second loss function expression is:
L2 = λ2·l2
(the full expression for l2 is rendered only as an image in the original publication)
wherein L2 represents the second loss function, λ2 is a preset coefficient, (i, j) represents a coordinate position on the feature map, Sl represents the feature map, Y(i,j) represents the ground-truth text label at position (i, j) of the feature map, and Ŷ(i,j) represents the text prediction result at position (i, j) of the feature map;
and, the third loss function expression is:
L3 = λ3·l3
(the full expression for l3 is rendered only as an image in the original publication)
wherein L3 represents the third loss function, λ3 is a preset coefficient, P represents the probability map of the character region, T represents the segmentation map of the character region, and γ is a hyper-parameter;
wherein λ1 + λ2 + λ3 = 1.
According to this scheme, the multi-branch structure enables the character detection model to learn different distribution characteristics of the text, while different loss functions are designed for the different branch tasks in order to improve the robustness and generalization capability of the character detection model for character detection. The scheme can effectively alleviate the false detections, missed detections and detection-box offsets caused by image blurring, complex backgrounds and similar conditions.
Further, in the above scheme, by presetting the coefficients λ1, λ2 and λ3, the degree of influence of each branch on the backbone network and the segmentation sharing network of the character detection model can be set respectively, which facilitates flexible and effective network regulation, accelerates the convergence of the character detection model and improves the detection effect.
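Since the full loss expressions are published only as images, the following NumPy sketch is hedged: the symbol definitions for lB (Yxyc, Ŷxyc, hyper-parameters α and β, N key points) match the keypoint focal loss used in CenterNet-style detectors, so lB is implemented under that assumption, and the weighted combination uses example λ values that sum to 1. All function names and the λ values are illustrative, not from the patent:

```python
import numpy as np

def keypoint_focal_loss(y_true, y_pred, alpha=2.0, beta=4.0, eps=1e-6):
    """An assumed, CenterNet-style form of l_B over a text-presence heat map."""
    pos = y_true == 1.0
    n = max(pos.sum(), 1)                      # N: number of key points
    pos_term = ((1 - y_pred) ** alpha) * np.log(y_pred + eps)
    neg_term = ((1 - y_true) ** beta) * (y_pred ** alpha) * np.log(1 - y_pred + eps)
    return -(pos_term[pos].sum() + neg_term[~pos].sum()) / n

def total_loss(l1, l2, l3, lambdas=(0.4, 0.3, 0.3)):
    """Weighted sum of the three branch losses; the lambdas must sum to 1."""
    assert abs(sum(lambdas) - 1.0) < 1e-9
    return lambdas[0] * l1 + lambdas[1] * l2 + lambdas[2] * l3
```

Tuning the λ weights shifts how strongly each branch's gradient shapes the shared parts of the network, which is the regulation effect the paragraph above describes.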
Optionally, the training process of the text detection model includes:
acquiring text image training data;
inputting the text image training data into the backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features are subjected to the region detection branch to obtain region detection features; the intermediate features are subjected to the character segmentation branch to obtain character segmentation features;
and optimizing the backbone network according to the region detection characteristics and the character segmentation characteristics until preset conditions are met to obtain a training result of the character detection model.
When training the character detection model, text image training data, which may be a large number of original text images, need to be acquired first. When the text image training data are scarce, the electronic device can augment a small number of original text images through rotation, mirroring, scaling and other means to obtain diversified text image training data, which is favorable for training a robust character detection model. When text images are difficult to acquire, the electronic device can also construct text image training data from text image templates of other types together with text data.
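A minimal sketch of such augmentation (NumPy only; the particular transforms and the naive stride-2 downscale are illustrative choices, not prescribed by the patent):

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Generate augmented copies of one text image by rotation, mirroring and scaling."""
    variants = [image]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]   # 90/180/270-degree rotations
    variants.append(np.fliplr(image))                      # horizontal mirror
    variants.append(image[::2, ::2])                       # naive 0.5x nearest-neighbor scale
    return variants
```

Each original image thus yields six training samples; ground-truth boxes and masks would of course have to be transformed consistently in a real pipeline.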
After a sufficient amount of text image training data is acquired, the electronic equipment can input the text image training data into the character detection model in batches for subsequent training. On the premise that the hardware environment condition allows, the electronic equipment can also input all text image training data into the character detection model for training. After passing through the backbone network, the character detection model can learn the intermediate features in the text image training data, and transmit the intermediate features to the region detection branch and the character segmentation branch.
And the intermediate features are subjected to the region detection branch to obtain region detection features, and the region detection features are fed back to the backbone network, so that the backbone network can be continuously optimized, and features which meet the region detection requirements better are extracted. And the intermediate features are subjected to the character segmentation branch to obtain character segmentation features, and the character segmentation features are fed back to the backbone network, so that the backbone network can be continuously optimized, and features which are more in line with the character segmentation requirements are extracted. The backbone network is continuously optimized through the two branches, and the backbone network is continuously learned. After the training is stable, the backbone network can extract the features meeting the regional detection requirements and can also extract the features meeting the text segmentation requirements.
It should be noted that the preset condition may be that the loss rates of the region detection branch and the character segmentation branch both satisfy their respective preset thresholds. The preset condition may also be that the loss curves of the region detection branch and the character segmentation branch, drawn against the number of training iterations, both conform to a preset form. In addition, the electronic device can calculate an overall loss rate of the character detection model, or draw an overall loss curve, from the loss rate and proportionality coefficient of each branch, and determine from that overall loss rate or loss curve whether the training requirement is met. The criterion for the preset condition may be a loss rate, a loss curve, an accuracy rate, a recall rate, a precision and the like, and the preset condition may be determined from one or more such criteria.
According to the scheme, the main network can be optimized by utilizing the region detection characteristics and the character segmentation characteristics. The scheme can enhance the robustness of the backbone network, so that the backbone network can carry out targeted learning, the training speed is further accelerated, and the stability of the character detection model training is ensured.
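The joint optimization described above can be sketched as a single training loop. This is a PyTorch sketch under stated assumptions: the optimizer, the MSE placeholder losses, the λ weights and the per-branch loss thresholds are all illustrative, chosen only to show how one backward pass through the weighted loss updates the backbone via every branch (and any shared segmentation layers via both segmentation branches):

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, thresholds=(0.1, 0.1, 0.1), lambdas=(0.4, 0.3, 0.3)):
    """Joint training sketch: the model returns (region, seg1, seg2) predictions,
    and the loader yields (images, region_gt, seg1_gt, seg2_gt) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mse = nn.MSELoss()                          # placeholder for the three branch losses
    for _ in range(epochs):
        branch_losses = [0.0, 0.0, 0.0]
        for images, region_gt, seg1_gt, seg2_gt in loader:
            region, seg1, seg2 = model(images)
            losses = [mse(region, region_gt), mse(seg1, seg1_gt), mse(seg2, seg2_gt)]
            total = sum(l * w for l, w in zip(losses, lambdas))
            opt.zero_grad()
            total.backward()    # gradients flow back into shared layers automatically
            opt.step()
            branch_losses = [b + l.item() for b, l in zip(branch_losses, losses)]
        # Preset condition: stop once every branch's mean loss falls below its threshold.
        if all(b / len(loader) < t for b, t in zip(branch_losses, thresholds)):
            break
    return model
```

Backpropagating the single weighted sum is what lets the feedback from all branches continuously optimize the backbone, as the paragraph above describes.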
According to the technical scheme provided by the embodiment of the application, the text image to be detected is acquired, and character detection is carried out on it by utilizing the pre-trained character detection model, so that the character detection result is obtained. The character detection model comprises a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are connected behind the backbone network. According to the scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate character-region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.
Example two
Fig. 2 is a flowchart of a text image detection method according to a second embodiment of the present invention, which is optimized based on the above-described embodiment.
As shown in fig. 2, the method of this embodiment specifically includes the following steps:
and S210, acquiring text image training data.
S220, inputting the text image training data into the backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features are subjected to the region detection branch to obtain region detection features; and the intermediate features are subjected to the segmentation sharing network to obtain segmentation sharing features.
It can be understood that, according to the structure of the character detection model shown in fig. 1B, the intermediate features are subjected to the segmentation sharing network to obtain the segmentation sharing features, and the electronic device may then transmit the segmentation sharing features onward.
S230, transmitting the segmentation sharing feature to a first character segmentation branch and a second character segmentation branch; the segmentation shared features pass through the first character segmentation branch to obtain first character segmentation features; and the segmentation sharing feature passes through the second character segmentation branch to obtain a second character segmentation feature.
The electronic device may transmit the segmentation sharing feature to the first text segmentation branch and the second text segmentation branch, and obtain a corresponding segmentation feature.
S240, optimizing the segmentation shared network according to the first character segmentation characteristic and the second character segmentation characteristic; optimizing the backbone network according to the region detection feature, the first character segmentation feature and the second character segmentation feature; and obtaining the training result of the character detection model until the preset conditions are met.
According to the first character segmentation feature and the second character segmentation feature, the electronic device can optimize the segmentation sharing network by continuously feeding the different segmentation features from the first character segmentation branch and the second character segmentation branch back to the segmentation sharing network. Meanwhile, the electronic device can also optimize the backbone network by continuously feeding the respective features of the first character segmentation branch, the second character segmentation branch and the region detection branch back to the backbone network. When the training result of the character detection model reaches the preset condition, the character detection model is considered successfully trained.
S250, acquiring a text image to be detected.
S260, carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model; the character detection model is a deep learning model.
S270, determining a character detection result according to the character segmentation branches.
When performing character detection in an actual scene, the electronic device does not need to combine the output results of multiple branches into the final detection result; the character detection result is determined solely from the output of the character segmentation branch. The reason is that the character detection model is ultimately intended to realize character detection, while the region detection branch only marks the regular region where characters are located and cannot detect character boundaries. Furthermore, the electronic device may determine the character detection result from the output of only the first character segmentation branch or only the second character segmentation branch. It should be noted that, owing to the robustness of the character detection model, the character detection result determined from a single branch remains accurate, and the character detection model has general adaptability.
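A sketch of this single-branch inference path (NumPy only; the 0.5 threshold and the single-bounding-box simplification are assumptions made for illustration):

```python
import numpy as np

def detect_text(seg_prob: np.ndarray, thresh: float = 0.5):
    """Threshold the segmentation-branch probability map into a binary text mask,
    then return the mask together with the tight bounding box of the text pixels
    (or None when no text pixel exceeds the threshold)."""
    mask = (seg_prob > thresh).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None
    # (x_min, y_min, x_max, y_max) over all text pixels -- a single-region simplification;
    # a full system would label connected components and box each region separately.
    return mask, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```

Only the segmentation branch's probability map is consumed here, matching the single-branch inference described above.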
According to this scheme, the character detection result can be determined from the character segmentation branch alone, which accelerates character detection and saves time.
According to the technical scheme provided by the embodiment of the application, the segmentation sharing network is optimized by utilizing the first character segmentation feature and the second character segmentation feature; the backbone network is optimized through the region detection feature, the first character segmentation feature and the second character segmentation feature; and the training result of the character detection model satisfying the preset condition is thereby obtained. The character detection result is then determined according to the first character segmentation branch of the trained deep learning model. According to the scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate character-region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.
Example three
Fig. 3 is a schematic structural diagram of a text image detection apparatus according to a third embodiment of the present invention, which is capable of executing a text image detection method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 3, the apparatus may include:
the to-be-detected text image acquisition module 310 is configured to acquire a to-be-detected text image;
the character detection result determining module 320 is configured to perform character detection on the text image to be detected by using a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
In this scheme, optionally, the text segmentation branches include a segmentation sharing network, a first text segmentation branch, and a second text segmentation branch; the first character segmentation branch and the second character segmentation branch are connected behind the segmentation sharing network.
In one possible solution, optionally, the region detection branch is provided with a first loss function; the first character segmentation branch is provided with a second loss function; and the second character segmentation branch is provided with a third loss function.
On the basis of the above scheme, optionally, the first loss function expression is:
L1 = λ1·l1
l1 = lA + lB
(the full expressions for lA and lB are rendered only as images in the original publication)
wherein L1 represents the first loss function, λ1 is a preset coefficient, lA represents the regression of the length and width of the text region, lB represents the regression of the character region, x and y represent the coordinate position of a pixel point of the text image, c represents the number of extracted feature channels, Yxyc represents the ground-truth probability that text is present at the coordinates (x, y), Ŷxyc represents the probability predicted by the character detection model that text is present at the coordinates (x, y), α and β are hyper-parameters, and N represents the number of key points of the text image;
and, the second loss function expression is:
L2 = λ2·l2
(the full expression for l2 is rendered only as an image in the original publication)
wherein L2 represents the second loss function, λ2 is a preset coefficient, (i, j) represents a coordinate position on the feature map, Sl represents the feature map, Y(i,j) represents the ground-truth text label at position (i, j) of the feature map, and Ŷ(i,j) represents the text prediction result at position (i, j) of the feature map;
and, the third loss function expression is:
L3 = λ3·l3
(the full expression for l3 is rendered only as an image in the original publication)
wherein L3 represents the third loss function, λ3 is a preset coefficient, P represents the probability map of the character region, T represents the segmentation map of the character region, and γ is a hyper-parameter;
wherein λ1 + λ2 + λ3 = 1.
In this scheme, optionally, the apparatus further includes a character detection model training module, configured to:
acquiring text image training data;
inputting the text image training data into the backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features are subjected to the region detection branch to obtain region detection features; the intermediate features are subjected to the character segmentation branch to obtain character segmentation features;
and optimizing the backbone network according to the region detection characteristics and the character segmentation characteristics until preset conditions are met to obtain a training result of the character detection model.
In an optional preferred embodiment, the text detection model training module is specifically configured to:
acquiring text image training data;
inputting the text image training data into a backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features pass through the region detection branch to obtain region detection features, and pass through the segmentation sharing network to obtain segmentation sharing features;
transmitting the segmentation sharing features to the first character segmentation branch and the second character segmentation branch; the segmentation sharing features pass through the first character segmentation branch to obtain first character segmentation features, and pass through the second character segmentation branch to obtain second character segmentation features;
optimizing the segmentation sharing network according to the first and second character segmentation features; optimizing the backbone network according to the region detection features and the first and second character segmentation features; and obtaining the trained character detection model once a preset condition is met.
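The shared-segmentation variant differs only in the tail of the network: the intermediate features pass through one segmentation sharing network whose output feeds two separate segmentation heads. A minimal sketch, with all names, shapes, and layer choices assumed (per-pixel linear maps instead of real convolutions):

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1x1(c_in, c_out):
    """Stand-in per-pixel linear map with ReLU; (H, W, c_in) -> (H, W, c_out)."""
    w = rng.standard_normal((c_in, c_out)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

backbone = conv1x1(3, 16)       # image -> intermediate features
region_branch = conv1x1(16, 5)  # region detection branch
seg_shared = conv1x1(16, 8)     # segmentation sharing network
seg_head_1 = conv1x1(8, 1)      # first character segmentation branch
seg_head_2 = conv1x1(8, 1)      # second character segmentation branch

image = rng.random((32, 32, 3))
features = backbone(image)
region_out = region_branch(features)   # region detection features
shared = seg_shared(features)          # segmentation sharing features
seg_out_1 = seg_head_1(shared)         # first character segmentation features
seg_out_2 = seg_head_2(shared)         # second character segmentation features
# During training, gradients from both segmentation heads would flow back
# through seg_shared, and all three branch losses through the backbone,
# until the preset stopping condition is met.
```

Letting both segmentation objectives update the shared network is what allows them to jointly regularize the segmentation features before they diverge into the two heads.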
In another preferred embodiment, optionally, the text detection result determining module 320 is specifically configured to:
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model; the character detection model is a deep learning model;
and determining a character detection result according to the character segmentation branches.
The product can execute the text image detection method provided by the embodiments of the present application, and has functional modules and beneficial effects corresponding to the executed method.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for detecting a text image according to the embodiments of the present invention:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Example five
The fifth embodiment of the application provides electronic equipment. Fig. 4 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application. As shown in fig. 4, the present embodiment provides an electronic device 400, which includes: one or more processors 420; the storage device 410 is configured to store one or more programs, and when the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the method for detecting a text image provided in the embodiment of the present application, the method includes:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
Of course, those skilled in the art can understand that the processor 420 also implements the technical solution of the text image detection method provided in any embodiment of the present application.
The electronic device 400 shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the electronic device 400 includes a processor 420, a storage device 410, an input device 430, and an output device 440; the number of the processors 420 in the electronic device may be one or more, and one processor 420 is taken as an example in fig. 4; the processor 420, the storage device 410, the input device 430, and the output device 440 in the electronic apparatus may be connected by a bus or other means, and are exemplified by a bus 450 in fig. 4.
The storage device 410 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and module units, such as program instructions corresponding to the text image detection method in the embodiment of the present application.
The storage device 410 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage 410 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, storage 410 may further include memory located remotely from processor 420, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 430 may be used to receive input numbers, character information, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display screen, speakers, or other electronic equipment.
The electronic equipment provided by the embodiment of the application can enhance the generalization capability of the character detection model by building the character detection model with multiple branches, and realizes accurate character region detection box regression and character segmentation.
The detection device, the medium and the electronic device for the text image provided in the embodiments above can execute the detection method for the text image provided in any embodiment of the present application, and have corresponding functional modules and beneficial effects for executing the method. For details of the text image detection method, reference may be made to any of the embodiments of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for detecting a text image, the method comprising:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
2. The method of claim 1, wherein the text segmentation branch comprises a segmentation sharing network, a first text segmentation branch, and a second text segmentation branch; the first character segmentation branch and the second character segmentation branch are connected behind the segmentation sharing network.
3. The method according to claim 2, characterized in that the region detection branch is provided with a first loss function; the first character segmentation branch is provided with a second loss function; and the second character segmentation branch is provided with a third loss function.
4. The method of claim 3, wherein the first loss function expression is:
L1 = λ1 · l1
l1 = lA + lB
lA = (1/N) · Σ_k |ŝ_k − s_k|, where s_k and ŝ_k denote the ground-truth and predicted length-width sizes of the k-th text region
lB = −(1/N) · Σ_{x,y,c} { (1 − Ŷ_xyc)^α · log(Ŷ_xyc), if Y_xyc = 1; (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc), otherwise }
wherein L1 denotes the first loss function, λ1 is a preset coefficient, lA denotes the regression of the length and width of the text region, lB denotes the regression of the character region, (x, y) denotes the coordinate position of a pixel point of the text image, c denotes the index of the extracted feature channel, Y_xyc denotes the ground-truth probability that text exists at coordinates (x, y), Ŷ_xyc denotes the probability predicted by the character detection model that text exists at coordinates (x, y), α and β are hyper-parameters, and N denotes the number of key points of the text image;
and the second loss function expression is:
L2 = λ2 · l2
l2 = − Σ_{(i,j)∈S_l} [ Y_(i,j) · log(Ŷ_(i,j)) + (1 − Y_(i,j)) · log(1 − Ŷ_(i,j)) ]
wherein L2 denotes the second loss function, λ2 is a preset coefficient, (i, j) denotes a coordinate position on the feature map, S_l denotes the feature map, Y_(i,j) denotes the ground-truth text result at position (i, j) of the feature map, and Ŷ_(i,j) denotes the text prediction result at position (i, j) of the feature map;
and the third loss function expression is:
L3 = λ3 · l3
l3 = 1 − (2 · Σ_{(i,j)} P_(i,j) · T_(i,j) + γ) / (Σ_{(i,j)} P_(i,j) + Σ_{(i,j)} T_(i,j) + γ)
wherein L3 denotes the third loss function, λ3 is a preset coefficient, P denotes the probability map of the character region, T denotes the segmentation map of the character region, and γ is a smoothing hyper-parameter;
wherein λ1 + λ2 + λ3 = 1.
5. The method of claim 1, wherein the training process of the text detection model comprises:
acquiring text image training data;
inputting the text image training data into a backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features pass through the region detection branch to obtain region detection features, and pass through the character segmentation branch to obtain character segmentation features;
and optimizing the backbone network according to the region detection features and the character segmentation features until a preset condition is met, so as to obtain the trained character detection model.
6. The method of claim 2, wherein the training process of the text detection model comprises:
acquiring text image training data;
inputting the text image training data into a backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features pass through the region detection branch to obtain region detection features, and pass through the segmentation sharing network to obtain segmentation sharing features;
transmitting the segmentation sharing features to the first character segmentation branch and the second character segmentation branch; the segmentation sharing features pass through the first character segmentation branch to obtain first character segmentation features, and pass through the second character segmentation branch to obtain second character segmentation features;
optimizing the segmentation sharing network according to the first and second character segmentation features; optimizing the backbone network according to the region detection features and the first and second character segmentation features; and obtaining the trained character detection model once a preset condition is met.
7. The method according to claim 1, wherein the performing text detection on the text image to be detected by using a pre-trained text detection model to obtain a text detection result comprises:
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model; the character detection model is a deep learning model;
and determining a character detection result according to the character segmentation branches.
8. An apparatus for detecting a text image, the apparatus comprising:
the text image acquisition module is used for acquiring a text image to be detected;
the character detection result determining module is used for carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of detecting a text image according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of detecting a text image according to any one of claims 1 to 7 when executing the computer program.
CN202111395006.XA 2021-11-23 2021-11-23 Text image detection method, device, medium and equipment Pending CN114067329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111395006.XA CN114067329A (en) 2021-11-23 2021-11-23 Text image detection method, device, medium and equipment

Publications (1)

Publication Number Publication Date
CN114067329A true CN114067329A (en) 2022-02-18

Family

ID=80279571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111395006.XA Pending CN114067329A (en) 2021-11-23 2021-11-23 Text image detection method, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN114067329A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129456A (en) * 2023-02-09 2023-05-16 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information
CN116129456B (en) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination