CN110880000A - Picture character positioning method and device, computer equipment and storage medium - Google Patents

Picture character positioning method and device, computer equipment and storage medium

Info

Publication number
CN110880000A
CN110880000A
Authority
CN
China
Prior art keywords
texture
layer
recognized
character
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911183212.7A
Other languages
Chinese (zh)
Other versions
CN110880000B (en)
Inventor
王晓珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiaoi Robot Technology Co Ltd filed Critical Shanghai Xiaoi Robot Technology Co Ltd
Priority to CN201911183212.7A
Publication of CN110880000A
Application granted
Publication of CN110880000B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes

Abstract

A picture character positioning method and device, a computer device, and a storage medium are provided. The picture character positioning method includes the following steps: acquiring a target picture, where the target picture contains characters to be recognized; inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture, where the texture extraction model is obtained by analyzing texture features in historical images and is used to extract texture feature layers from an input picture; screening basic texture feature layers from the plurality of texture feature layers; performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the characters to be recognized; and obtaining the position of the characters to be recognized in the target picture according to the texture feature layer of the characters to be recognized. The method is fast and highly accurate.

Description

Picture character positioning method and device, computer equipment and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a picture character positioning method, a picture character positioning device, a computer device, and a storage medium.
Background
With the growing volume of pictures on the internet and the increasing number of scanned and printed office documents, scanned and printed pictures frequently need to be converted into text, so efficient text detection and recognition that quickly converts the text embedded in such pictures has become an urgent problem. A typical character recognition and detection scheme consists of two parts, character positioning and character recognition, and the accuracy and efficiency of text line positioning are directly proportional to the accuracy of character recognition.
Existing text line positioning methods fall into two categories: anchor-based (Anchors) line positioning methods and segmentation-based line positioning methods. The drawback of anchor-based methods is that the receptive field limits the length of content that can be detected; if the content to be detected does not match the receptive field, positioning accuracy suffers. Segmentation-based methods require large amounts of data for model training or data analysis, which is time-consuming. A picture character positioning method that is both fast and accurate would therefore effectively improve the efficiency and accuracy of converting text in scanned and printed pictures.
Disclosure of Invention
The invention solves the technical problem of providing a picture character positioning method that is both fast and accurate.
In order to solve the above technical problem, an embodiment of the present invention provides a picture character positioning method, which includes: acquiring a target picture, where the target picture contains characters to be recognized; inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture, where the texture extraction model is obtained by analyzing texture features in historical images and is used to extract texture feature layers from an input picture; screening basic texture feature layers from the plurality of texture feature layers; performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the characters to be recognized; and obtaining the position of the characters to be recognized in the target picture according to the texture feature layer of the characters to be recognized.
Optionally, each basic texture feature layer includes a feature matrix corresponding to the characters to be recognized, and performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the characters to be recognized includes: sorting the basic texture feature layers in descending order of feature matrix dimension; and sequentially superposing the sorted basic texture feature layers through upsampling to obtain the texture feature layer of the characters to be recognized.
Optionally, obtaining the position of the characters to be recognized in the target picture according to the texture feature layer of the characters to be recognized includes: performing feature deepening on the texture feature layer of the characters to be recognized; and segmenting the deepened texture feature layer of the characters to be recognized to obtain the position of the characters to be recognized in the target picture.
Optionally, performing feature deepening on the texture feature layer of the characters to be recognized includes: convolving the texture feature layer of the characters to be recognized with a convolution layer with a 3 × 3 kernel and 128 channels to obtain a first convolution layer; and convolving the first convolution layer with a convolution layer with a 1 × 1 kernel and 6 channels.
Optionally, segmenting the deepened texture feature layer of the characters to be recognized includes: segmenting the deepened texture feature layer of the characters to be recognized by using a PSE network.
Optionally, before the feature superposition is performed on the basic texture feature layers, the method further includes performing feature deepening on the basic texture feature layers, namely convolving each basic texture feature layer with a convolution layer with a 1 × 1 kernel and 128 channels.
Optionally, the texture extraction model is a MobileNetV2 network.
An embodiment of the invention also provides a picture character positioning device, which includes: a target picture acquisition module, used for acquiring a target picture that contains characters to be recognized; a feature extraction module, used for inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture, where the texture extraction model is obtained by analyzing texture features in historical images and is used to extract texture feature layers from an input picture; a basic texture feature layer screening module, used for screening basic texture feature layers from the texture feature layers; a feature superposition module, used for performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the characters to be recognized; and a positioning module, used for obtaining the position of the characters to be recognized in the target picture according to the texture feature layer of the characters to be recognized.
An embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor performs the steps of any of the above methods when executing the computer instructions.
An embodiment of the present invention further provides a storage medium on which computer instructions are stored, where the steps of any of the above methods are performed when the computer instructions are executed.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
An embodiment of the invention provides a picture character positioning method, which includes: acquiring a target picture that contains characters to be recognized; inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture; screening basic texture feature layers from the texture feature layers; performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the characters to be recognized; and obtaining the position of the characters to be recognized in the target picture according to that texture feature layer. Compared with the prior art, the method recognizes the region corresponding to the characters to be recognized in the target picture with a texture extraction model such as a convolutional neural network, obtains a plurality of texture feature layers of the target picture, selects the basic texture feature layers with the best positioning effect for feature superposition, and obtains the position of the characters to be recognized in the target picture from the feature distribution within the superposed texture feature layer. Because the texture extraction model outputs a plurality of texture feature layers of the target picture at different feature dimensions, the number of training samples needed for model training is reduced and the data processing load is lowered; screening and feature superposition of the texture feature layers then yield a texture feature layer from which the characters to be recognized can be located, realizing accurate positioning of the characters to be recognized in the target picture.
Further, feature superposition of the multiple basic texture feature layers blends the features corresponding to the characters to be recognized across those layers, which amounts to feature deepening, so that the resulting texture feature layer of the characters to be recognized performs better at positioning the characters in the picture.
Furthermore, after the texture feature layer of the characters to be recognized is obtained, its features can be deepened by convolution, improving the quality of the layer and thus the accuracy of the final character positioning. After the features are deepened, the position of the characters to be recognized in the target picture is obtained by recognizing the character regions in the layer and marking character regions and non-character regions distinctly.
Furthermore, obtaining the position of the characters to be recognized in the target picture with the text detection function of PSENet effectively exploits the advantages of PSENet, improving both the accuracy and the real-time performance of character positioning.
Furthermore, before the features of the basic texture feature layers are superposed, the basic texture feature layers may be further processed with a convolution layer with a 1 × 1 kernel and 128 channels, improving their quality and thus the accuracy of the final character positioning.
Drawings
Fig. 1 is a schematic flow chart of a picture character positioning method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating an application of a picture character positioning method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a picture character positioning device according to an embodiment of the present invention.
Detailed Description
As described in the background, existing text line positioning methods fall into two categories: anchor-based (Anchors) line positioning methods and segmentation-based line positioning methods. The drawback of anchor-based methods is that the receptive field limits the length of content that can be detected; if the content to be detected does not match the receptive field, positioning accuracy suffers. Segmentation-based methods require large amounts of data for model training or data analysis, which is time-consuming.
Common segmentation-based character detection and positioning methods include PixelLink, a natural scene text detection algorithm based on instance segmentation, and the Progressive Scale Expansion Network (PSENet).
The large-scale deep neural networks adopted in traditional picture character positioning techniques require many parameters and heavy computation, large numbers of manually labelled training samples to achieve generalization, and either a purpose-built network structure or an extra preprocessing step to handle tilted text. PixelLink applies a lightweight-model idea and a text segmentation idea to character positioning: it achieves good results without tens of millions of training samples and handles severely tilted text satisfactorily, but its network post-processing traverses pixels with a tree-based method, which is time-consuming and makes an engineering-grade result hard to reach.
In order to solve the above problem, embodiments of the present invention provide a method and an apparatus for positioning picture and text, a computer device, and a storage medium. The picture character positioning method comprises the following steps: acquiring a target picture, wherein the target picture comprises characters to be recognized; inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture, wherein the texture extraction model is obtained by analyzing texture features in a historical image and is used for extracting the texture feature layers in the input picture; screening a basic texture characteristic layer from the texture characteristic layers; performing feature superposition on the basic texture feature layer to obtain a texture feature layer of the character to be recognized; and acquiring the position of the character to be recognized in the target picture according to the texture feature layer of the character to be recognized.
The picture character positioning method provided by the embodiments of the invention can be used for picture-based character recognition (for example, recognizing text in advertisement pictures, or verifying contract content from printed or scanned copies). The method combines the advantages of the Breadth First Search (BFS) based progressive scale expansion (PSE) with those of segmentation-based natural scene text detection, so that the characters in a picture can be positioned efficiently and accurately.
Referring to fig. 1, which is a schematic flow chart of a picture character positioning method according to an embodiment of the present invention, the method may include the following steps S101 to S105.
Step S101, a target picture is obtained, and the target picture comprises characters to be recognized.
The target picture is a picture containing the characters to be recognized, such as a scan of a document or a picture of printed text, and may be a true color (RGB) image. The characters to be recognized are the text portion of the target picture.
When the characters in a target picture need to be recognized, the characters to be recognized must first be positioned, and their content is then recognized based on that positioning. The target picture may be sent to a recognition terminal, and after the recognition terminal obtains the target picture, it starts positioning the characters to be recognized, that is, it executes the following steps S102 to S105.
Step S102, inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture at different feature dimensions, where the texture extraction model is obtained by analyzing texture features in historical images and is used to extract texture feature layers from an input picture.
The texture features correspond to the distribution of the characters in the picture, and the feature dimension is the scale at which the region where the characters are located is identified.
The texture extraction model is trained with historical images as training samples, according to the character and non-character parts of each sample, to produce texture feature layers of an input picture at different feature dimensions. When the feature dimension corresponds to the pixel values of the picture, the texture extraction model may use an existing convolutional neural network (such as MobileNetV2, SqueezeNet, or ShuffleNet) and convolve the pixels of the target picture with several different convolution kernels to obtain a plurality of texture feature layers of the target picture. After the recognition terminal obtains the target picture, it passes the picture through the texture extraction model to obtain the texture feature layers. For example, if the texture extraction model is MobileNetV2, 19 texture feature layers may be obtained.
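As a minimal sketch of this step, the snippet below collects one feature layer per block of a MobileNetV2 backbone. It assumes the torchvision implementation, whose `features` module happens to consist of 19 blocks, matching the 19 texture feature layers mentioned above; the patent itself does not prescribe this implementation.

```python
# Sketch of step S102, assuming a torchvision MobileNetV2 backbone.
import torch
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features  # 19 blocks in torchvision

def extract_texture_layers(image: torch.Tensor) -> list[torch.Tensor]:
    """Run the backbone block by block, keeping every intermediate output."""
    layers = []
    x = image
    for block in backbone:
        x = block(x)
        layers.append(x)  # one texture feature layer per block
    return layers

target = torch.randn(1, 3, 512, 512)  # stand-in for an RGB target picture
texture_layers = extract_texture_layers(target)
print(len(texture_layers))  # -> 19
```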
Step S103, screening basic texture feature layers from the texture feature layers.
The basic texture feature layers are the layers, among all texture feature layers, with the best character positioning effect. After the texture feature layers are obtained, not all of them are carried into the next operation; they are screened according to the recognition requirement, and only the basic texture feature layers with the best character positioning effect are retained.
If the texture extraction model is MobileNetV2, layers 3, 7, 14, and 19 of the 19 texture feature layers may be used as the basic texture feature layers; the dimensions of their feature matrices are 1/2, 1/4, 1/8, and 1/16 of the original image, respectively.
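Continuing the sketch above, the screening step then reduces to indexing into the list of layers. The 1-based indices follow the text; the stated scales are the patent's claim and are not verified here.

```python
# Screen the basic texture feature layers (step S103); indices are 1-based
# as in the text, so subtract 1 when indexing the Python list.
BASE_LAYER_INDICES = (3, 7, 14, 19)

base_layers = [texture_layers[i - 1] for i in BASE_LAYER_INDICES]
for layer in base_layers:
    print(tuple(layer.shape))  # feature matrix dimensions shrink layer by layer
```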
Step S104, performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the characters to be recognized.
After the basic texture feature layers are obtained, if there is more than one of them, their features need to be superposed to obtain a texture feature layer that represents the position of the characters to be recognized in the target picture. When the basic texture feature layers are convolution layers obtained by passing the target picture through several different convolution kernels, pixel interpolation can be applied to them to obtain high-dimensional pixel images of the several basic texture feature layers.
Step S105, obtaining the position of the characters to be recognized in the target picture according to the texture feature layer of the characters to be recognized.
In the texture feature layer of the characters to be recognized, the position of the characters in the target picture can be obtained from the feature distribution within the layer, such as the distribution of feature pixels corresponding to the characters to be recognized.
In this picture character positioning method, the region corresponding to the characters to be recognized in the target picture is recognized with a texture extraction model such as a convolutional neural network, a plurality of texture feature layers of the target picture are obtained, the basic texture feature layers with the best positioning effect are selected for feature superposition, and the position of the characters to be recognized in the target picture is obtained from the feature distribution in the texture feature layer obtained after superposition. Because the texture extraction model outputs a plurality of texture feature layers of the target picture at different feature dimensions, the number of training samples needed for model training is reduced and the data processing load is lowered; screening and feature superposition of the texture feature layers then yield a texture feature layer from which the characters to be recognized can be located, realizing accurate positioning of the characters to be recognized in the target picture.
In an embodiment, each basic texture feature layer includes a feature matrix corresponding to the characters to be recognized, and step S104 in fig. 1, superposing the features of the basic texture feature layers to obtain the texture feature layer of the characters to be recognized, may include: sorting the basic texture feature layers in descending order of feature matrix dimension; and sequentially superposing the sorted basic texture feature layers through upsampling to obtain the texture feature layer of the characters to be recognized.
The basic texture feature layers obtained in step S103 are merged starting from the layer with the smallest matrix dimension: each layer is upsampled to the matrix dimension of the next larger layer and added to it to obtain a new feature layer, the newly obtained layer is upsampled again and added to the following layer, and so on, until a single layer of texture information remains, namely the texture feature layer of the characters to be recognized.
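A minimal sketch of this upsample-and-add superposition follows, assuming every basic texture feature layer has already been projected to a common channel count (for example by the 1 × 1 convolutions described further below), since element-wise addition requires matching channels.

```python
# Upsample-and-add superposition (step S104): merge from the smallest
# feature matrix up to the largest, as described in the text.
import torch.nn.functional as F

def superpose(layers: list[torch.Tensor]) -> torch.Tensor:
    """Merge feature layers from coarsest to finest by upsampling and adding."""
    ordered = sorted(layers, key=lambda t: t.shape[-1])  # smallest first
    merged = ordered[0]
    for nxt in ordered[1:]:
        # Upsample the running result to the next layer's size, then add.
        merged = F.interpolate(merged, size=nxt.shape[-2:],
                               mode="bilinear", align_corners=False) + nxt
    return merged  # the texture feature layer of the characters to be recognized
```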
In this embodiment, the features of the multiple basic texture feature layers are superposed so that the features corresponding to the characters to be recognized in those layers blend together, which amounts to a single feature deepening, so the resulting texture feature layer of the characters to be recognized performs better at positioning the characters in the picture.
In one embodiment, obtaining the position of the characters to be recognized in the target picture according to the texture feature layer of the characters to be recognized includes: performing feature deepening on the texture feature layer of the characters to be recognized; and segmenting the deepened texture feature layer of the characters to be recognized to obtain the position of the characters to be recognized in the target picture.
After the texture feature layer of the characters to be recognized is obtained, its features can be deepened by convolution, improving the quality of the layer and thus the accuracy of the final character positioning. After the features are deepened, the position of the characters to be recognized in the target picture is obtained by recognizing the character regions in the layer and marking character regions and non-character regions distinctly.
In one embodiment, performing feature deepening on the texture feature layer of the characters to be recognized includes: convolving the texture feature layer of the characters to be recognized with a convolution layer with a 3 × 3 kernel and 128 channels to obtain a first convolution layer; and convolving the first convolution layer with a convolution layer with a 1 × 1 kernel and 6 channels.
In other words, the deepening may pass the texture feature layer of the characters to be recognized through a convolution layer with a 3 × 3 kernel and 128 channels, followed by a convolution layer with a 1 × 1 kernel and 6 channels.
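A sketch of this deepening head is below; the interpretation of the six output channels as one segmentation map per PSE kernel scale is an assumption consistent with the PSE step described next, not something the text states.

```python
# Feature deepening head: Conv 3x3 (128 channels) then Conv 1x1 (6 channels).
import torch.nn as nn

head = nn.Sequential(
    nn.Conv2d(128, 128, kernel_size=3, padding=1),  # kernel 3 x 3, channel 128
    nn.Conv2d(128, 6, kernel_size=1),               # kernel 1 x 1, channel 6
)
# score_maps = head(merged)  # shape (N, 6, H, W)
```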
In one embodiment, segmenting the deepened texture feature layer of the characters to be recognized includes: segmenting the deepened texture feature layer of the characters to be recognized by using a PSE network.
Specifically, the text detection function of PSENet can be used to segment the obtained texture feature layer of the characters to be recognized, and the position of the characters to be recognized in the target picture is then represented by the corresponding regions in that layer.
In this embodiment, the position of the characters to be recognized in the target picture is obtained with the text detection function of PSENet, which effectively exploits the advantages of PSENet and improves both the accuracy and the real-time performance of character positioning.
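The following is a simplified sketch of PSENet-style progressive scale expansion, assuming the six output channels have been binarized into kernel masks ordered from smallest to largest. It follows the published PSENet idea (seed text instances on the smallest kernel, then grow them by breadth-first search through the larger kernels) rather than the patent's exact post-processing, and it ignores label conflicts for brevity.

```python
# Simplified progressive scale expansion (PSE) over binarized kernel maps.
from collections import deque

import numpy as np
from scipy import ndimage

def progressive_scale_expansion(kernels: np.ndarray) -> np.ndarray:
    """kernels: (6, H, W) binary masks, index 0 = smallest kernel."""
    labels, _ = ndimage.label(kernels[0])  # seed instances on the smallest kernel
    h, w = labels.shape
    for k in range(1, len(kernels)):
        # Re-seed the BFS frontier with every labelled pixel, then grow
        # into the larger kernel mask at scale k.
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernels[k, ny, nx] and labels[ny, nx] == 0):
                    labels[ny, nx] = labels[y, x]
                    queue.append((ny, nx))
    return labels  # one integer label per text region, 0 = background
```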
In one embodiment, before the feature superposition of the basic texture feature layers, the method further includes performing feature deepening on the basic texture feature layers, namely convolving each basic texture feature layer with a convolution layer with a 1 × 1 kernel and 128 channels.
Before the features of the several basic texture feature layers are superposed, the basic texture feature layers can be deepened with a convolution layer with a 1 × 1 kernel and 128 channels, improving their quality and thus the accuracy of the final character positioning.
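Continuing the sketch, one 1 × 1 convolution per basic texture feature layer performs this deepening; as a side effect it also unifies the channel counts so that the upsample-and-add step above is well defined. The input channel counts are read from the layers at hand rather than hard-coded.

```python
# Pre-superposition deepening: a 1x1 convolution with 128 output channels
# for each basic texture feature layer.
import torch.nn as nn

laterals = nn.ModuleList(
    nn.Conv2d(layer.shape[1], 128, kernel_size=1) for layer in base_layers
)
deepened = [conv(layer) for conv, layer in zip(laterals, base_layers)]
```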
In one embodiment, the texture extraction model is a MobileNetV2 network. Referring to fig. 2, fig. 2 is a schematic diagram of an application of the picture character positioning method according to an embodiment of the present invention, in which the texture extraction model is a MobileNetV2 network.
A target picture whose characters need to be positioned is acquired; the target picture is an RGB image. The target picture is input into the MobileNetV2 network to obtain 19 picture texture feature layers (features). Layers 3, 7, 14, and 19 of the 19 layers are taken as the basic texture feature layers. Each of the four basic texture feature layers is passed through a convolution layer with a 1 × 1 kernel and 128 channels (Conv 1×1, 128), yielding four deepened texture feature layers. Starting from the deepened layer with the smallest dimension matrix, each layer is upsampled (upsample) to the dimension matrix of the next layer and added to it to obtain a new feature layer; the newly obtained layer is upsampled and added to the following layer, and so on, until a single texture feature layer is obtained. That layer is passed through a convolution layer with a 3 × 3 kernel and 128 channels (Conv 3×3, 128) and then a convolution layer with a 1 × 1 kernel and 6 channels (Conv 1×1, 6), and the positioning result is output. Finally, character region detection is performed on the output with the PSE network to obtain the regions corresponding to the positioning information of the picture characters.
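The pieces of the earlier sketches can be assembled into a single forward pass that mirrors this pipeline; all layer indices and shapes remain assumptions carried over from the text.

```python
# End-to-end sketch of the fig. 2 pipeline, reusing the earlier snippets
# (extract_texture_layers, laterals, superpose, head).
def locate_text(image: torch.Tensor) -> torch.Tensor:
    layers = extract_texture_layers(image)               # 19 texture layers
    base = [layers[i - 1] for i in (3, 7, 14, 19)]       # basic layers
    base = [conv(x) for conv, x in zip(laterals, base)]  # Conv 1x1, 128
    merged = superpose(base)                             # upsample and add
    return head(merged)                                  # Conv 3x3/128, 1x1/6

# Binarizing the six output maps and passing them to
# progressive_scale_expansion() yields the character regions.
```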
The picture character positioning method provided by the embodiment of the invention can effectively improve the accuracy and the real-time performance of character positioning, thereby improving the effectiveness of identification.
Referring to fig. 3, fig. 3 shows a picture character positioning device according to an embodiment of the present invention, which includes a target picture acquisition module 301, a feature extraction module 302, a basic texture feature layer screening module 303, a feature superposition module 304, and a positioning module 305, where:
the target picture acquiring module 301 is configured to acquire a target picture, where the target picture includes characters to be recognized.
The feature extraction module 302 is configured to input the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture, where the texture extraction model is obtained by analyzing texture features in historical images and is used to extract texture feature layers from an input picture.
And a basic texture feature layer screening module 303, configured to screen a basic texture feature layer from the texture feature layers.
And a feature superposition module 304, configured to perform feature superposition on the basic texture feature layer to obtain a texture feature layer of the text to be recognized.
A positioning module 305, configured to obtain a position of the text to be recognized in the target picture according to the texture feature layer of the text to be recognized.
In an embodiment, each basic texture feature layer includes a feature matrix corresponding to the characters to be recognized; with continued reference to fig. 3, the feature superposition module 304 may include:
and the sorting unit is used for sorting the basic texture feature layers according to the order of the dimension of the feature matrix from large to small.
And the superposition unit is used for sequentially superposing the sorted basic texture feature layers through upsampling to obtain the texture feature layer of the characters to be recognized.
In one embodiment, with continued reference to fig. 3, the positioning module 305 includes:
and the characteristic deepening unit is used for deepening the characteristics of the texture characteristic layer of the character to be recognized.
And the positioning unit is used for segmenting the deepened texture feature layer of the character to be recognized to obtain the position of the character to be recognized in the target picture.
In one embodiment, the above feature deepening unit may include:
and the first convolution subunit is used for convolving the texture feature layer of the character to be recognized through a convolution layer with a kernel of 3 x 3 and a channel of 128 to obtain a first convolution layer.
And the second convolution subunit is used for convolving the first convolution layer by a convolution layer with a kernel of 1 × 1 and a channel of 6.
In an embodiment, the positioning unit may be further configured to utilize a PSE network to segment the deepened texture feature layer of the text to be recognized.
In one embodiment, the picture character positioning device is further configured to deepen the features of the basic texture feature layers, that is, to convolve each basic texture feature layer with a convolution layer with a 1 × 1 kernel and 128 channels.
For more details of the working principle and working mode of the above picture character positioning device, reference may be made to the description of the picture character positioning method in fig. 1, which is not repeated here.
Further, the embodiment of the present invention further discloses a computer device, which includes a memory and a processor, where the memory stores a computer instruction capable of running on the processor, and the processor executes the technical solution of the picture and text positioning method in the embodiment shown in fig. 1 to 2 when running the computer instruction.
Further, the embodiment of the present invention also discloses a storage medium on which computer instructions are stored; when the computer instructions are executed, the technical solution of the picture character positioning method in the embodiments shown in fig. 1 to fig. 2 is performed. Preferably, the storage medium may be a computer-readable storage medium such as a non-volatile or non-transitory memory, and may include ROM, RAM, magnetic disks, optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A picture character positioning method is characterized by comprising the following steps:
acquiring a target picture, wherein the target picture comprises characters to be recognized;
inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture at different feature dimensions, wherein the texture extraction model is obtained by analyzing texture features in a historical image and is used for extracting the texture feature layers in the input picture;
screening a basic texture feature layer from the texture feature layers;
performing feature superposition on the basic texture feature layer to obtain a texture feature layer of the character to be recognized;
and acquiring the position of the character to be recognized in the target picture according to the texture feature layer of the character to be recognized.
2. The method according to claim 1, wherein each of the basic texture feature layers includes a feature matrix corresponding to the character to be recognized, and performing feature superposition on the basic texture feature layers to obtain the texture feature layer of the character to be recognized includes:
sorting the basic texture feature layers in descending order of the dimension of the feature matrix;
and sequentially superposing the sorted basic texture feature layers through upsampling to obtain the texture feature layer of the character to be recognized.
3. The method according to claim 1, wherein obtaining the position of the character to be recognized in the target picture according to the texture feature layer of the character to be recognized comprises:
performing feature deepening on the texture feature layer of the character to be recognized; and
segmenting the deepened texture feature layer of the character to be recognized to obtain the position of the character to be recognized in the target picture.
4. The method according to claim 3, wherein the step of performing feature deepening on the texture feature layer of the character to be recognized comprises:
convolving the texture feature layer of the character to be recognized with a convolution layer with a 3 × 3 kernel and 128 channels to obtain a first convolution layer; and
convolving the first convolution layer with a convolution layer with a 1 × 1 kernel and 6 channels.
5. The method according to claim 3, wherein segmenting the deepened texture feature layer of the character to be recognized comprises:
segmenting the deepened texture feature layer of the character to be recognized by using a PSE network.
6. The method according to claim 1, further comprising, before the feature superposition is performed on the basic texture feature layers:
performing feature deepening on the basic texture feature layers;
wherein the step of performing feature deepening on the basic texture feature layers comprises:
convolving each basic texture feature layer with a convolution layer with a 1 × 1 kernel and 128 channels.
7. The method of claim 1, wherein the texture extraction model is a MobileNetV2 network.
8. A picture character positioning device, the device comprising:
the target picture acquisition module is used for acquiring a target picture, and the target picture comprises characters to be recognized;
the feature extraction module is used for inputting the target picture into a texture extraction model to obtain a plurality of texture feature layers of the target picture, and the texture extraction model is a model obtained by analyzing texture features in a historical image and used for extracting the texture feature layers in the input picture;
the basic texture feature layer screening module is used for screening a basic texture feature layer from the texture feature layers;
the feature superposition module is used for performing feature superposition on the basic texture feature layer to obtain a texture feature layer of the character to be recognized;
and the positioning module is used for acquiring the position of the character to be recognized in the target picture according to the texture feature layer of the character to be recognized.
9. A computer device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 7.
10. A storage medium having stored thereon computer instructions, wherein said computer instructions when executed perform the steps of the method of any of claims 1 to 7.
CN201911183212.7A 2019-11-27 2019-11-27 Picture character positioning method and device, computer equipment and storage medium Active CN110880000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911183212.7A CN110880000B (en) 2019-11-27 2019-11-27 Picture character positioning method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911183212.7A CN110880000B (en) 2019-11-27 2019-11-27 Picture character positioning method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110880000A (en) 2020-03-13
CN110880000B (en) 2022-09-02

Family

ID=69729881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911183212.7A Active CN110880000B (en) 2019-11-27 2019-11-27 Picture character positioning method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110880000B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137350A1 (en) * 2016-11-14 2018-05-17 Kodak Alaris Inc. System and method of character recognition using fully convolutional neural networks with attention
CN108830280A (en) * 2018-05-14 2018-11-16 华南理工大学 A kind of small target detecting method based on region nomination
CN109635805A (en) * 2018-12-11 2019-04-16 上海智臻智能网络科技股份有限公司 Image text location method and device, image text recognition methods and device
CN110222680A (en) * 2019-05-19 2019-09-10 天津大学 A kind of domestic waste article outer packing Method for text detection
CN110490232A (en) * 2019-07-18 2019-11-22 北京捷通华声科技股份有限公司 Method, apparatus, the equipment, medium of training literal line direction prediction model

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405485A (en) * 2020-03-17 2020-07-10 中国建设银行股份有限公司 User positioning method and system
CN111405485B (en) * 2020-03-17 2021-08-06 中国建设银行股份有限公司 User positioning method and system
CN111340037A (en) * 2020-03-25 2020-06-26 上海智臻智能网络科技股份有限公司 Text layout analysis method and device, computer equipment and storage medium
CN111401319A (en) * 2020-04-15 2020-07-10 深源恒际科技有限公司 Method for solving text adhesion in PSENet network progressive expansion post-processing
CN111401319B (en) * 2020-04-15 2024-04-12 北京深智恒际科技有限公司 Method for solving text blocking during PSENT network progressive expansion post-processing
CN111522951A (en) * 2020-04-26 2020-08-11 成都思维世纪科技有限责任公司 Sensitive data identification and classification technical method based on image identification
CN111783572A (en) * 2020-06-17 2020-10-16 泰康保险集团股份有限公司 Text detection method and device
CN111783572B (en) * 2020-06-17 2023-11-14 泰康保险集团股份有限公司 Text detection method and device
CN112199545A (en) * 2020-11-23 2021-01-08 湖南蚁坊软件股份有限公司 Keyword display method and device based on picture character positioning and storage medium
CN112199545B (en) * 2020-11-23 2021-09-07 湖南蚁坊软件股份有限公司 Keyword display method and device based on picture character positioning and storage medium
CN112712078A (en) * 2020-12-31 2021-04-27 上海智臻智能网络科技股份有限公司 Text detection method and device

Also Published As

Publication number Publication date
CN110880000B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN110880000B (en) Picture character positioning method and device, computer equipment and storage medium
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108416377B (en) Information extraction method and device in histogram
US10817741B2 (en) Word segmentation system, method and device
CN109886974B (en) Seal removing method
CN109635805B (en) Image text positioning method and device and image text identification method and device
CN111340037B (en) Text layout analysis method and device, computer equipment and storage medium
CN111160352A (en) Workpiece metal surface character recognition method and system based on image segmentation
CN110135530B (en) Method and system for converting Chinese character font in image, computer device and medium
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
US11386589B2 (en) Method and device for image generation and colorization
CN112017192B (en) Glandular cell image segmentation method and glandular cell image segmentation system based on improved U-Net network
CN112381057A (en) Handwritten character recognition method and device, storage medium and terminal
CN110414523A (en) A kind of identity card recognition method, device, equipment and storage medium
Kölsch et al. Recognizing challenging handwritten annotations with fully convolutional networks
CN110287911A (en) A kind of content identification method of invoice, device, equipment and storage medium
CN110569839A (en) Bank card number identification method based on CTPN and CRNN
CN112686104A (en) Deep learning-based multi-vocal music score identification method
CN114581646A (en) Text recognition method and device, electronic equipment and storage medium
WO2021034841A1 (en) Apparatus and methods for converting lineless tables into lined tables using generative adversarial networks
CN110889374A (en) Seal image processing method and device, computer and storage medium
CN113158977B (en) Image character editing method for improving FANnet generation network
CN108877030B (en) Image processing method, device, terminal and computer readable storage medium
Darma et al. Segmentation of balinese script on lontar manuscripts using projection profile
CN113793264A (en) Archive image processing method and system based on convolution model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant