CN114067329A - Text image detection method, device, medium and equipment - Google Patents

Text image detection method, device, medium and equipment

Info

Publication number
CN114067329A
CN114067329A
Authority
CN
China
Prior art keywords
character
detection
branch
segmentation
text image
Prior art date
Legal status
Pending
Application number
CN202111395006.XA
Other languages
Chinese (zh)
Inventor
朱浩
李丽
孟彦伟
Current Assignee
China Post Information Technology Beijing Co ltd
Original Assignee
China Post Information Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by China Post Information Technology Beijing Co., Ltd.
Priority to CN202111395006.XA
Publication of CN114067329A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiment of the application discloses a text image detection method, device, medium and equipment. The method comprises: acquiring a text image to be detected; and carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result. The character detection model comprises a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are connected behind the backbone network. According to the technical scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.

Description

Text image detection method, device, medium and equipment
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a text image detection method, a text image detection device, a text image detection medium and text image detection equipment.
Background
With the continuous development of deep learning technology, character detection plays an increasingly important role as a key component of optical character recognition technology. Character detection has wide application scenarios, such as detecting the characters on identity cards, bank cards, invoices and express waybills.
In the prior art, character detection in a text image mainly locates character regions in the text image with a target segmentation algorithm and then performs character segmentation on each character region, thereby achieving the purpose of character detection.
Methods based on character segmentation can effectively handle the diverse orientations of characters, but existing character detection algorithms are prone to false detections, missed detections and detection-box offsets under conditions such as complex backgrounds and blurred characters.
Disclosure of Invention
The embodiment of the application provides a text image detection method, a text image detection device, a text image detection medium and text image detection equipment.
In a first aspect, an embodiment of the present application provides a method for detecting a text image, where the method includes:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
In a second aspect, an embodiment of the present application provides an apparatus for detecting a text image, where the apparatus includes:
the text image acquisition module is used for acquiring a text image to be detected;
the character detection result determining module is used for carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
In a third aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the text image detection method according to the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable by the processor, where the processor executes the computer program to implement the text image detection method according to the embodiment of the present application.
According to the technical scheme provided by the embodiment of the application, the text image to be detected is acquired, and character detection is carried out on it by utilizing the pre-trained character detection model, so that the character detection result is obtained. The character detection model comprises a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are connected behind the backbone network. According to the scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate character-region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.
Drawings
Fig. 1A is a flowchart of a text image detection method according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a text detection model according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for detecting a text image according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a text image detection apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1A is a flowchart of a text image detection method provided in an embodiment of the present application, where this embodiment is applicable to any text image detection scenario, and the method may be executed by a text image detection apparatus provided in an embodiment of the present application, where the apparatus may be implemented by software and/or hardware, and may be integrated in an electronic device.
As shown in fig. 1A, the method for detecting a text image includes:
and S110, acquiring a text image to be detected.
This scheme may be executed by an electronic device such as a computer, a server or a workstation, and the electronic device may further be provided with an image acquisition device. The electronic device can autonomously acquire the text image to be detected. The text image may be picture data such as an identity card, a bank card, a text screenshot, an invoice or an express waybill, or video data such as surveillance footage, a movie or a documentary. The text image to be detected may be acquired by reading, in real time, the picture data collected by the image acquisition device, as in an identity-card recognition scenario. The electronic device may also selectively acquire the text image to be detected under the control of a detection instruction, as in scenarios such as video understanding and video search.
After the text image to be detected is acquired, if it is blurred, of an unsuitable target size, or has similar defects, the electronic device can preprocess it through means such as image enhancement, image cropping and image scaling to improve the image quality.
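As an illustration of such preprocessing, the following sketch (a hypothetical helper assuming NumPy; the fixed 512×512 target size and the zero-padding policy are assumptions, not taken from the patent) crops or pads an image to a fixed size and normalizes its pixel values:

```python
import numpy as np

def preprocess(image: np.ndarray, target_h: int = 512, target_w: int = 512) -> np.ndarray:
    """Pad or crop an H x W x 3 uint8 image to a fixed size, then scale values to [0, 1]."""
    h, w = image.shape[:2]
    # Crop if the image is larger than the target in either dimension.
    image = image[:min(h, target_h), :min(w, target_w)]
    # Pad with zeros (black) if it is smaller than the target.
    pad_h = target_h - image.shape[0]
    pad_w = target_w - image.shape[1]
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    # Normalize pixel values for the detection model.
    return image.astype(np.float32) / 255.0
```

Real pipelines would typically also resize with interpolation and normalize per-channel; this only illustrates the crop/pad/scale idea.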
S120, carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
By utilizing the pre-trained character detection model, the electronic device can perform character detection on the text image to be detected and obtain a detection result. Fig. 1B is a schematic structural diagram of a character detection model provided in an embodiment of the present application. As shown in fig. 1B, the character detection model includes a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are both connected behind the backbone network. The region detection branch can be used to locate the region where the characters in the text image to be detected are situated, and to label the character region with a detection box of regular shape. The character segmentation branch can be used to further segment the characters in the character region and determine its inner and outer boundaries. The character detection model may be a deep learning model, for example a convolutional neural network. The backbone network can be any network structure with a feature extraction function, for example a classical convolutional neural network such as AlexNet, VGG-Net or ResNet, or a custom-built network structure that meets the scenario requirements. The region detection branch may have a network structure with a feature extraction function, or a structure with only a region detection function. Similarly, the character segmentation branch may have a structure with a feature extraction function, or only a structure with a character segmentation function. If the region detection branch and the character segmentation branch both have network structures with feature extraction functions, the two structures may be the same or different.
In this scheme, optionally, the character segmentation branch includes a segmentation sharing network, a first character segmentation branch and a second character segmentation branch; the first character segmentation branch and the second character segmentation branch are connected behind the segmentation sharing network.
In order to achieve targeted character segmentation, the character segmentation branch can be further divided according to the points of attention of character segmentation. As shown in fig. 1B, the character segmentation branch may include a segmentation sharing network, a first character segmentation branch and a second character segmentation branch, with the first character segmentation branch and the second character segmentation branch connected behind the segmentation sharing network. The first character segmentation branch may be used to focus on the irregular area formed by the inner and outer boundaries of the character region. The second character segmentation branch may be used to focus on the difference between the characters and the background in the character region.
According to this scheme, branching the character segmentation branch again allows different characteristics to be attended to during character segmentation, which facilitates character detection with higher discrimination.
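The structure described above can be sketched as a minimal PyTorch module. The layer widths, kernel sizes and channel counts below are illustrative assumptions — the patent only fixes the branch topology (backbone → region detection branch; backbone → segmentation sharing network → two character segmentation branches), not the layers:

```python
import torch
import torch.nn as nn

class TextDetector(nn.Module):
    """Minimal sketch of the multi-branch detector (layer sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        # Backbone: any feature extractor (e.g. a ResNet); a tiny conv stack stands in here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Region detection branch: a text-presence map plus box width/height channels.
        self.region_branch = nn.Conv2d(64, 3, 1)   # 1 heat-map channel + 2 size channels
        # Segmentation sharing network feeding the two segmentation sub-branches.
        self.seg_shared = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.seg_branch1 = nn.Conv2d(64, 1, 1)     # inner/outer boundary area
        self.seg_branch2 = nn.Conv2d(64, 1, 1)     # text vs. background

    def forward(self, x):
        feat = self.backbone(x)                    # intermediate features
        region = self.region_branch(feat)
        shared = self.seg_shared(feat)
        return region, self.seg_branch1(shared), self.seg_branch2(shared)
```

The single `forward` pass returns all three branch outputs, mirroring how the intermediate features are transmitted to the region detection branch and, through the segmentation sharing network, to both character segmentation branches.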
In order to enable each branch to achieve the extraction of the target feature, the electronic device may set a different loss function for each branch. In one possible solution, optionally, the region detection branch is provided with a first loss function; the first character segmentation branch is provided with a second loss function; and the second character segmentation branch is provided with a third loss function.
On the basis of the above scheme, optionally, the first loss function expression is:
L1 = λ1·l1
l1 = lA + lB
(the full expressions for lA and lB are rendered only as images in the original publication)
wherein L1 represents the first loss function, λ1 is a preset coefficient, lA represents the regression of the length and width of the text region, lB represents the regression of the character region, x and y represent the coordinate position of a pixel point of the text image, c represents the number of extracted feature channels, Yxyc represents the ground-truth probability that text is present at the coordinates (x, y), Ŷxyc represents the probability predicted by the character detection model that text is present at the coordinates (x, y), α and β are hyper-parameters, and N represents the number of key points of the text image;
and, the second loss function expression is:
L2 = λ2·l2
(the full expression for l2 is rendered only as an image in the original publication)
wherein L2 represents the second loss function, λ2 is a preset coefficient, (i, j) represents a coordinate position on the feature map, Sl represents the feature map, Y(i,j) represents the ground-truth text label at position (i, j) of the feature map, and Ŷ(i,j) represents the text prediction result at position (i, j) of the feature map;
and, the third loss function expression is:
L3 = λ3·l3
(the full expression for l3 is rendered only as an image in the original publication)
wherein L3 represents the third loss function, λ3 is a preset coefficient, P represents the probability map of the character region, T represents the segmentation map of the character region, and γ is a hyper-parameter;
wherein λ1 + λ2 + λ3 = 1.
According to this scheme, the multi-branch structure enables the character detection model to learn different distribution characteristics of the text, while different loss functions are designed for the different branch tasks in order to improve the robustness and generalization capability of the character detection model for character detection. The scheme can effectively alleviate the false detections, missed detections and detection-box offsets caused by image blurring, complex backgrounds and similar conditions.
Further, in the above scheme, by presetting the coefficients λ1, λ2 and λ3, the degree of influence of each branch on the backbone network and the segmentation sharing network of the character detection model can be set respectively, which facilitates flexible and effective network regulation, accelerates the convergence of the character detection model and improves the detection effect.
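Since the full loss expressions are published only as images, the following NumPy sketch is hedged: the symbol definitions for lB (Yxyc, Ŷxyc, hyper-parameters α and β, N key points) match the keypoint focal loss used in CenterNet-style detectors, so lB is implemented under that assumption, and the weighted combination uses example λ values that sum to 1. All function names and the λ values are illustrative, not from the patent:

```python
import numpy as np

def keypoint_focal_loss(y_true, y_pred, alpha=2.0, beta=4.0, eps=1e-6):
    """An assumed, CenterNet-style form of l_B over a text-presence heat map."""
    pos = y_true == 1.0
    n = max(pos.sum(), 1)                      # N: number of key points
    pos_term = ((1 - y_pred) ** alpha) * np.log(y_pred + eps)
    neg_term = ((1 - y_true) ** beta) * (y_pred ** alpha) * np.log(1 - y_pred + eps)
    return -(pos_term[pos].sum() + neg_term[~pos].sum()) / n

def total_loss(l1, l2, l3, lambdas=(0.4, 0.3, 0.3)):
    """Weighted sum of the three branch losses; the lambdas must sum to 1."""
    assert abs(sum(lambdas) - 1.0) < 1e-9
    return lambdas[0] * l1 + lambdas[1] * l2 + lambdas[2] * l3
```

Tuning the λ weights shifts how strongly each branch's gradient shapes the shared parts of the network, which is the regulation effect the paragraph above describes.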
Optionally, the training process of the text detection model includes:
acquiring text image training data;
inputting the text image training data into the backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features are subjected to the region detection branch to obtain region detection features; the intermediate features are subjected to the character segmentation branch to obtain character segmentation features;
and optimizing the backbone network according to the region detection characteristics and the character segmentation characteristics until preset conditions are met to obtain a training result of the character detection model.
When training the character detection model, text image training data, which may be a large number of original text images, need to be acquired first. When the text image training data are scarce, the electronic device can augment a small number of original text images through rotation, mirroring, scaling and other means to obtain diversified text image training data, which is favorable for training a robust character detection model. When text images are difficult to acquire, the electronic device can also construct text image training data from text image templates of other types together with text data.
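A minimal sketch of such augmentation (NumPy only; the particular transforms and the naive stride-2 downscale are illustrative choices, not prescribed by the patent):

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Generate augmented copies of one text image by rotation, mirroring and scaling."""
    variants = [image]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]   # 90/180/270-degree rotations
    variants.append(np.fliplr(image))                      # horizontal mirror
    variants.append(image[::2, ::2])                       # naive 0.5x nearest-neighbor scale
    return variants
```

Each original image thus yields six training samples; ground-truth boxes and masks would of course have to be transformed consistently in a real pipeline.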
After a sufficient amount of text image training data is acquired, the electronic equipment can input the text image training data into the character detection model in batches for subsequent training. On the premise that the hardware environment condition allows, the electronic equipment can also input all text image training data into the character detection model for training. After passing through the backbone network, the character detection model can learn the intermediate features in the text image training data, and transmit the intermediate features to the region detection branch and the character segmentation branch.
And the intermediate features are subjected to the region detection branch to obtain region detection features, and the region detection features are fed back to the backbone network, so that the backbone network can be continuously optimized, and features which meet the region detection requirements better are extracted. And the intermediate features are subjected to the character segmentation branch to obtain character segmentation features, and the character segmentation features are fed back to the backbone network, so that the backbone network can be continuously optimized, and features which are more in line with the character segmentation requirements are extracted. The backbone network is continuously optimized through the two branches, and the backbone network is continuously learned. After the training is stable, the backbone network can extract the features meeting the regional detection requirements and can also extract the features meeting the text segmentation requirements.
It should be noted that the preset condition may be that the loss rates of the region detection branch and the character segmentation branch both satisfy their respective preset thresholds. The preset condition may also be that the loss curves of the region detection branch and the character segmentation branch, drawn against the number of training iterations, both conform to a preset form. In addition, the electronic device can calculate an overall loss rate of the character detection model, or draw an overall loss curve, from the loss rate and proportionality coefficient of each branch, and determine from that overall loss rate or loss curve whether the training requirement is met. The criterion for the preset condition may be a loss rate, a loss curve, an accuracy rate, a recall rate, a precision and the like, and the preset condition may be determined from one or more such criteria.
According to the scheme, the main network can be optimized by utilizing the region detection characteristics and the character segmentation characteristics. The scheme can enhance the robustness of the backbone network, so that the backbone network can carry out targeted learning, the training speed is further accelerated, and the stability of the character detection model training is ensured.
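The joint optimization described above can be sketched as a single training loop. This is a PyTorch sketch under stated assumptions: the optimizer, the MSE placeholder losses, the λ weights and the per-branch loss thresholds are all illustrative, chosen only to show how one backward pass through the weighted loss updates the backbone via every branch (and any shared segmentation layers via both segmentation branches):

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, thresholds=(0.1, 0.1, 0.1), lambdas=(0.4, 0.3, 0.3)):
    """Joint training sketch: the model returns (region, seg1, seg2) predictions,
    and the loader yields (images, region_gt, seg1_gt, seg2_gt) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mse = nn.MSELoss()                          # placeholder for the three branch losses
    for _ in range(epochs):
        branch_losses = [0.0, 0.0, 0.0]
        for images, region_gt, seg1_gt, seg2_gt in loader:
            region, seg1, seg2 = model(images)
            losses = [mse(region, region_gt), mse(seg1, seg1_gt), mse(seg2, seg2_gt)]
            total = sum(l * w for l, w in zip(losses, lambdas))
            opt.zero_grad()
            total.backward()    # gradients flow back into shared layers automatically
            opt.step()
            branch_losses = [b + l.item() for b, l in zip(branch_losses, losses)]
        # Preset condition: stop once every branch's mean loss falls below its threshold.
        if all(b / len(loader) < t for b, t in zip(branch_losses, thresholds)):
            break
    return model
```

Backpropagating the single weighted sum is what lets the feedback from all branches continuously optimize the backbone, as the paragraph above describes.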
According to the technical scheme provided by the embodiment of the application, the text image to be detected is acquired, and character detection is carried out on it by utilizing the pre-trained character detection model, so that the character detection result is obtained. The character detection model comprises a backbone network, a region detection branch and a character segmentation branch, and the region detection branch and the character segmentation branch are connected behind the backbone network. According to the scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate character-region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.
Example two
Fig. 2 is a flowchart of a text image detection method according to a second embodiment of the present invention, which is optimized based on the above-described embodiment.
As shown in fig. 2, the method of this embodiment specifically includes the following steps:
and S210, acquiring text image training data.
S220, inputting the text image training data into the backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features are subjected to the region detection branch to obtain region detection features; and the intermediate features are subjected to the segmentation sharing network to obtain segmentation sharing features.
It can be understood that, according to the structure of the character detection model shown in fig. 1B, the intermediate features are subjected to the segmentation sharing network to obtain the segmentation sharing features, and the electronic device may then transmit the segmentation sharing features onward.
S230, transmitting the segmentation sharing feature to a first character segmentation branch and a second character segmentation branch; the segmentation shared features pass through the first character segmentation branch to obtain first character segmentation features; and the segmentation sharing feature passes through the second character segmentation branch to obtain a second character segmentation feature.
The electronic device may transmit the segmentation sharing feature to the first text segmentation branch and the second text segmentation branch, and obtain a corresponding segmentation feature.
S240, optimizing the segmentation shared network according to the first character segmentation characteristic and the second character segmentation characteristic; optimizing the backbone network according to the region detection feature, the first character segmentation feature and the second character segmentation feature; and obtaining the training result of the character detection model until the preset conditions are met.
According to the first character segmentation feature and the second character segmentation feature, the electronic device can optimize the segmentation sharing network by continuously feeding the different segmentation features from the first character segmentation branch and the second character segmentation branch back to the segmentation sharing network. Meanwhile, the electronic device can also optimize the backbone network by continuously feeding the respective features of the first character segmentation branch, the second character segmentation branch and the region detection branch back to the backbone network. When the training result of the character detection model reaches the preset condition, the character detection model is considered successfully trained.
S250, acquiring a text image to be detected.
S260, carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model; the character detection model is a deep learning model.
S270, determining a character detection result according to the character segmentation branches.
When performing character detection in an actual scene, the electronic device does not need to combine the output results of multiple branches into the final detection result; the character detection result is determined solely from the output of the character segmentation branch. The reason is that the character detection model is ultimately intended to realize character detection, while the region detection branch only marks the regular region where characters are located and cannot detect character boundaries. Furthermore, the electronic device may determine the character detection result from the output of only the first character segmentation branch or only the second character segmentation branch. It should be noted that, owing to the robustness of the character detection model, the character detection result determined from a single branch remains accurate, and the character detection model has general adaptability.
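A sketch of this single-branch inference path (NumPy only; the 0.5 threshold and the single-bounding-box simplification are assumptions made for illustration):

```python
import numpy as np

def detect_text(seg_prob: np.ndarray, thresh: float = 0.5):
    """Threshold the segmentation-branch probability map into a binary text mask,
    then return the mask together with the tight bounding box of the text pixels
    (or None when no text pixel exceeds the threshold)."""
    mask = (seg_prob > thresh).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None
    # (x_min, y_min, x_max, y_max) over all text pixels -- a single-region simplification;
    # a full system would label connected components and box each region separately.
    return mask, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```

Only the segmentation branch's probability map is consumed here, matching the single-branch inference described above.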
According to this scheme, the character detection result can be determined from the character segmentation branch alone, which accelerates character detection and saves time.
According to the technical scheme provided by the embodiment of the application, the segmentation sharing network is optimized by utilizing the first character segmentation feature and the second character segmentation feature; the backbone network is optimized through the region detection feature, the first character segmentation feature and the second character segmentation feature; and the training result of the character detection model satisfying the preset condition is thereby obtained. The character detection result is then determined according to the first character segmentation branch of the trained deep learning model. According to the scheme, building a multi-branch character detection model enhances the generalization capability of the character detection model, realizes accurate character-region detection box regression and character segmentation, effectively improves the robustness and accuracy of text image detection, and reduces the detection time.
Example three
Fig. 3 is a schematic structural diagram of a text image detection apparatus according to a third embodiment of the present invention, which is capable of executing a text image detection method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 3, the apparatus may include:
the to-be-detected text image acquisition module 310 is configured to acquire a to-be-detected text image;
the character detection result determining module 320 is configured to perform character detection on the text image to be detected by using a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, a region detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
In this scheme, optionally, the text segmentation branches include a segmentation sharing network, a first text segmentation branch, and a second text segmentation branch; the first character segmentation branch and the second character segmentation branch are connected behind the segmentation sharing network.
In one possible solution, optionally, the region detection branch is provided with a first loss function; the first character segmentation branch is provided with a second loss function; and the second character segmentation branch is provided with a third loss function.
On the basis of the above scheme, optionally, the first loss function expression is:
L1 = λ1·l1
l1 = lA + lB
(the full expressions for lA and lB are rendered only as images in the original publication)
wherein L1 represents the first loss function, λ1 is a preset coefficient, lA represents the regression of the length and width of the text region, lB represents the regression of the character region, x and y represent the coordinate position of a pixel point of the text image, c represents the number of extracted feature channels, Yxyc represents the ground-truth probability that text is present at the coordinates (x, y), Ŷxyc represents the probability predicted by the character detection model that text is present at the coordinates (x, y), α and β are hyper-parameters, and N represents the number of key points of the text image;
and, the second loss function expression is:
L2 = λ2·l2
(the full expression for l2 is rendered only as an image in the original publication)
wherein L2 represents the second loss function, λ2 is a preset coefficient, (i, j) represents a coordinate position on the feature map, Sl represents the feature map, Y(i,j) represents the ground-truth text label at position (i, j) of the feature map, and Ŷ(i,j) represents the text prediction result at position (i, j) of the feature map;
and, the third loss function expression is:
L3 = λ3·l3
(the full expression for l3 is rendered only as an image in the original publication)
wherein L3 represents the third loss function, λ3 is a preset coefficient, P represents the probability map of the character region, T represents the segmentation map of the character region, and γ is a hyper-parameter;
wherein λ1 + λ2 + λ3 = 1.
In this scheme, optionally, the apparatus further includes a character detection model training module, configured to:
acquiring text image training data;
inputting the text image training data into the backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features are subjected to the region detection branch to obtain region detection features; the intermediate features are subjected to the character segmentation branch to obtain character segmentation features;
and optimizing the backbone network according to the region detection characteristics and the character segmentation characteristics until preset conditions are met to obtain a training result of the character detection model.
In an optional preferred embodiment, the text detection model training module is specifically configured to:
acquiring text image training data;
inputting the text image training data into a backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features pass through the region detection branch to obtain region detection features, and pass through the segmentation sharing network to obtain segmentation sharing features;
transmitting the segmentation sharing features to the first character segmentation branch and the second character segmentation branch; the segmentation sharing features pass through the first character segmentation branch to obtain first character segmentation features, and pass through the second character segmentation branch to obtain second character segmentation features;
optimizing the segmentation sharing network according to the first and second character segmentation features; optimizing the backbone network according to the region detection features and the first and second character segmentation features; and obtaining the trained character detection model once a preset condition is met.
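The shared-segmentation variant differs only in the tail of the network: the intermediate features pass through one segmentation sharing network whose output feeds two separate segmentation heads. A minimal sketch, with all names, shapes, and layer choices assumed (per-pixel linear maps instead of real convolutions):

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1x1(c_in, c_out):
    """Stand-in per-pixel linear map with ReLU; (H, W, c_in) -> (H, W, c_out)."""
    w = rng.standard_normal((c_in, c_out)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

backbone = conv1x1(3, 16)       # image -> intermediate features
region_branch = conv1x1(16, 5)  # region detection branch
seg_shared = conv1x1(16, 8)     # segmentation sharing network
seg_head_1 = conv1x1(8, 1)      # first character segmentation branch
seg_head_2 = conv1x1(8, 1)      # second character segmentation branch

image = rng.random((32, 32, 3))
features = backbone(image)
region_out = region_branch(features)   # region detection features
shared = seg_shared(features)          # segmentation sharing features
seg_out_1 = seg_head_1(shared)         # first character segmentation features
seg_out_2 = seg_head_2(shared)         # second character segmentation features
# During training, gradients from both segmentation heads would flow back
# through seg_shared, and all three branch losses through the backbone,
# until the preset stopping condition is met.
```

Letting both segmentation objectives update the shared network is what allows them to jointly regularize the segmentation features before they diverge into the two heads.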
In another preferred embodiment, optionally, the text detection result determining module 320 is specifically configured to:
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model; the character detection model is a deep learning model;
and determining a character detection result according to the character segmentation branches.
The product can execute the text image detection method provided by the embodiments of the present application, and has functional modules and beneficial effects corresponding to the executed method.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for detecting a text image according to the embodiments of the present invention:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Example five
The fifth embodiment of the application provides electronic equipment. Fig. 4 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application. As shown in fig. 4, the present embodiment provides an electronic device 400, which includes: one or more processors 420; the storage device 410 is configured to store one or more programs, and when the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the method for detecting a text image provided in the embodiment of the present application, the method includes:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
Of course, those skilled in the art can understand that the processor 420 also implements the technical solution of the text image detection method provided in any embodiment of the present application.
The electronic device 400 shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the electronic device 400 includes a processor 420, a storage device 410, an input device 430, and an output device 440; the number of the processors 420 in the electronic device may be one or more, and one processor 420 is taken as an example in fig. 4; the processor 420, the storage device 410, the input device 430, and the output device 440 in the electronic apparatus may be connected by a bus or other means, and are exemplified by a bus 450 in fig. 4.
The storage device 410 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and module units, such as program instructions corresponding to the text image detection method in the embodiment of the present application.
The storage device 410 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage 410 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, storage 410 may further include memory located remotely from processor 420, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 430 may be used to receive input numbers, character information, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display screen, speakers, or other electronic equipment.
The electronic equipment provided by the embodiment of the application can enhance the generalization capability of the character detection model by building the character detection model with multiple branches, and realizes accurate character region detection box regression and character segmentation.
The detection device, the medium and the electronic device for the text image provided in the embodiments above can execute the detection method for the text image provided in any embodiment of the present application, and have corresponding functional modules and beneficial effects for executing the method. For details of the text image detection method, reference may be made to any of the embodiments of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for detecting a text image, the method comprising:
acquiring a text image to be detected;
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
2. The method of claim 1, wherein the text segmentation branch comprises a segmentation sharing network, a first text segmentation branch, and a second text segmentation branch; the first character segmentation branch and the second character segmentation branch are connected behind the segmentation sharing network.
3. The method according to claim 2, characterized in that the region detection branch is provided with a first loss function; the first character segmentation branch is provided with a second loss function; and the second character segmentation branch is provided with a third loss function.
4. The method of claim 3, wherein the first loss function expression is:
L1 = λ1 · l1
l1 = lA + lB
lA = (1/N) · Σ_k |ŝ_k − s_k|, where s_k and ŝ_k denote the ground-truth and predicted length-width sizes of the k-th text region
lB = −(1/N) · Σ_{x,y,c} { (1 − Ŷ_xyc)^α · log(Ŷ_xyc), if Y_xyc = 1; (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc), otherwise }
wherein L1 denotes the first loss function, λ1 is a preset coefficient, lA denotes the regression of the length and width of the text region, lB denotes the regression of the character region, (x, y) denotes the coordinate position of a pixel point of the text image, c denotes the index of the extracted feature channel, Y_xyc denotes the ground-truth probability that text exists at coordinates (x, y), Ŷ_xyc denotes the probability predicted by the character detection model that text exists at coordinates (x, y), α and β are hyper-parameters, and N denotes the number of key points of the text image;
and the second loss function expression is:
L2 = λ2 · l2
l2 = − Σ_{(i,j)∈S_l} [ Y_(i,j) · log(Ŷ_(i,j)) + (1 − Y_(i,j)) · log(1 − Ŷ_(i,j)) ]
wherein L2 denotes the second loss function, λ2 is a preset coefficient, (i, j) denotes a coordinate position on the feature map, S_l denotes the feature map, Y_(i,j) denotes the ground-truth text result at position (i, j) of the feature map, and Ŷ_(i,j) denotes the text prediction result at position (i, j) of the feature map;
and the third loss function expression is:
L3 = λ3 · l3
l3 = 1 − (2 · Σ_{(i,j)} P_(i,j) · T_(i,j) + γ) / (Σ_{(i,j)} P_(i,j) + Σ_{(i,j)} T_(i,j) + γ)
wherein L3 denotes the third loss function, λ3 is a preset coefficient, P denotes the probability map of the character region, T denotes the segmentation map of the character region, and γ is a smoothing hyper-parameter;
wherein λ1 + λ2 + λ3 = 1.
5. The method of claim 1, wherein the training process of the text detection model comprises:
acquiring text image training data;
inputting the text image training data into a backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features pass through the region detection branch to obtain region detection features, and pass through the character segmentation branch to obtain character segmentation features;
and optimizing the backbone network according to the region detection features and the character segmentation features until a preset condition is met, so as to obtain the trained character detection model.
6. The method of claim 2, wherein the training process of the text detection model comprises:
acquiring text image training data;
inputting the text image training data into a backbone network of the character detection model to obtain intermediate features, and transmitting the intermediate features to the region detection branch and the character segmentation branch; the intermediate features pass through the region detection branch to obtain region detection features, and pass through the segmentation sharing network to obtain segmentation sharing features;
transmitting the segmentation sharing features to the first character segmentation branch and the second character segmentation branch; the segmentation sharing features pass through the first character segmentation branch to obtain first character segmentation features, and pass through the second character segmentation branch to obtain second character segmentation features;
optimizing the segmentation sharing network according to the first and second character segmentation features; optimizing the backbone network according to the region detection features and the first and second character segmentation features; and obtaining the trained character detection model once a preset condition is met.
7. The method according to claim 1, wherein the performing text detection on the text image to be detected by using a pre-trained text detection model to obtain a text detection result comprises:
carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model; the character detection model is a deep learning model;
and determining a character detection result according to the character segmentation branches.
8. An apparatus for detecting a text image, the apparatus comprising:
the text image acquisition module is used for acquiring a text image to be detected;
the character detection result determining module is used for carrying out character detection on the text image to be detected by utilizing a pre-trained character detection model to obtain a character detection result; the character detection model comprises a backbone network, an area detection branch and a character segmentation branch; the region detection branch and the character segmentation branch are connected behind the backbone network.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of detecting a text image according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of detecting a text image according to any one of claims 1 to 7 when executing the computer program.
CN202111395006.XA 2021-11-23 2021-11-23 Text image detection method, device, medium and equipment Pending CN114067329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111395006.XA CN114067329A (en) 2021-11-23 2021-11-23 Text image detection method, device, medium and equipment

Publications (1)

Publication Number Publication Date
CN114067329A true CN114067329A (en) 2022-02-18

Family

ID=80279571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111395006.XA Pending CN114067329A (en) 2021-11-23 2021-11-23 Text image detection method, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN114067329A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129456A (en) * 2023-02-09 2023-05-16 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information
CN116129456B (en) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 Method and system for identifying and inputting property rights and interests information


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination