CN112395450B - Picture character detection method and device, computer equipment and storage medium - Google Patents


Info

Publication number: CN112395450B
Application number: CN202011286320.XA
Authority: CN (China)
Prior art keywords: picture, text, detection, target detection, preset
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN112395450A
Inventor: 左彬靖
Original and current assignee: Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd, with priority to CN202011286320.XA
Related PCT application: PCT/CN2021/090512 (published as WO2022105120A1)
Publication of application CN112395450A; application granted and published as CN112395450B

Classifications

    • G06F16/5846 — Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content, using extracted text
    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V30/40 — Character recognition; document-oriented image-based pattern recognition


Abstract

The embodiments of the application belong to the field of artificial intelligence and relate to a picture text detection method. The method comprises: when a target detection picture is received, calculating the complexity of the target detection picture with a preset detection model; when the complexity is low, calculating target text coordinates of the first text boxes in the target detection picture with a first annotation model among the preset annotation models; calculating center coordinates of the first text boxes from the target text coordinates, fusing first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes; and extracting the text information in the new and fixed text boxes and taking it as the detection text. The application also provides a picture text detection apparatus, a computer device and a storage medium. In addition, the application relates to blockchain technology: the detection text may be stored on a blockchain. The method and apparatus enable efficient detection of text in pictures.

Description

Picture character detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a picture text detection method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of target detection technology, text detection is used in more and more fields, such as payment and identity card recognition. By recognizing the text in a picture, the information it contains can be extracted.
Currently, algorithms based on FPN (Feature Pyramid Network) detect small, dense text poorly, while pixel-level algorithms are more accurate but take so long per picture that they can hardly meet industrial requirements. In addition, picture text detection is mainly used to extract useful information from a picture, such as name, address and account fields, so that these fields can later be stored in a database and supplied to a downstream risk-control system. However, a picture may contain a great deal of information, and a relatively complex picture may contain more than one hundred fields; when the prior art performs text detection on such pictures, detection efficiency is low.
Disclosure of Invention
An aim of the embodiments of the present application is to provide a picture text detection method and apparatus, a computer device and a storage medium, so as to solve the technical problem of low picture text detection efficiency.
To solve the above technical problem, the embodiments of the present application provide a picture text detection method that adopts the following technical scheme:
when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;
when the complexity is low, acquiring a feature vector of the target detection picture according to a first labeling model in a preset labeling model, and calculating according to the feature vector to obtain a target text coordinate of a first text box in the target detection picture;
calculating the center coordinates of the first text boxes according to the target text coordinates, fusing first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
and extracting text information in the new text box and the fixed text box, and determining the text information as a detection text of the target detection picture.
Further, the preset error value includes a first error value and a second error value, and the step of fusing first text boxes whose center coordinates differ by no more than the preset error value into a new text box specifically includes:
acquiring a first pixel difference value between the y-axis coordinates of two adjacent center coordinates and a second pixel difference value between the x-axis coordinates of the same two center coordinates;
and fusing first text boxes whose first pixel difference value is smaller than or equal to the first error value and whose second pixel difference value is smaller than or equal to the second error value into a new text box.
Further, after the step of calculating the complexity of the target detection picture according to the preset detection model, the method further includes:
when the complexity is high, acquiring a minimum picture corresponding to the target detection picture and a minimum text coordinate of a second text box in the minimum picture according to a second labeling model in a preset labeling model;
and mapping the minimum text coordinate to the maximum picture corresponding to the target detection picture in parallel to obtain the detection text coordinate of the target detection picture, and calculating according to the detection text coordinate to obtain the detection text corresponding to the target detection picture.
Further, the step of mapping the minimum text coordinate to the maximum picture corresponding to the target detection picture in parallel to obtain the detection text coordinate of the target detection picture specifically includes:
and obtaining a preset mapping proportion, and amplifying the minimum text coordinate in parallel according to the preset mapping proportion to obtain the detection text coordinate of the target detection picture.
Further, the step of calculating the complexity of the target detection picture according to the preset detection model specifically includes:
inputting the target detection picture into a convolution layer of the preset detection model, and outputting a detection result value through a pooling layer and a fully connected layer;
and predicting on the detection result value with a preset binary classification loss function to obtain the complexity of the target detection picture.
Further, before the step of obtaining the feature vector of the target detection picture according to the first labeling model in the preset labeling models, the method further includes:
acquiring an initial text picture, dividing the initial text picture into a training picture and a test picture, and inputting the training picture into a preset basic annotation model to obtain the annotation text coordinates of the training picture;
calculating a loss function of the basic annotation model according to the annotation text coordinates, and, when the loss function converges, determining the basic annotation model as a trained basic annotation model;
and verifying the trained basic annotation model according to the test picture, and determining that the trained basic annotation model is a preset annotation model when the verification passing rate of the trained basic annotation model on the test picture is greater than or equal to the preset passing rate.
Further, the step of calculating the loss function of the basic annotation model according to the annotation text coordinates specifically includes:
labeling the training pictures based on a preset labeling tool to obtain initial text coordinates of the training pictures;
and calculating the square difference between the initial text coordinates and the labeling text coordinates, and calculating the loss function of the basic labeling model according to the square difference.
To solve the above technical problem, the embodiments of the present application further provide a picture text detection apparatus that adopts the following technical scheme:
the detection module is used for calculating the complexity of the target detection picture according to a preset detection model when the target detection picture is received;
the labeling module is used for acquiring the feature vector of the target detection picture according to a first labeling model among the preset labeling models when the complexity is low, and calculating the target text coordinates of the first text boxes in the target detection picture according to the feature vector;
the confirmation module is used for calculating the center coordinates of the first text boxes according to the target text coordinates, fusing first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
and the extraction module is used for extracting the text information in the new text box and the fixed text box and determining the text information as the detection text of the target detection picture.
To solve the above technical problem, the embodiments of the present application further provide a computer device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the steps of the above picture text detection method are implemented when the processor executes the computer readable instructions.
To solve the above technical problem, the embodiments of the present application further provide a computer readable storage medium storing computer readable instructions, where the steps of the above picture text detection method are implemented when the computer readable instructions are executed by a processor.
According to the picture text detection method described above, when a target detection picture is received, its complexity is calculated with a preset detection model, so that a model can be selected according to the complexity and the picture can be given targeted text detection, improving detection efficiency. When the complexity is low, the feature vector of the target detection picture is acquired with a first annotation model among the preset annotation models, the target text coordinates of the first text boxes are calculated from the feature vector, and the text information of the picture can be accurately located through these coordinates. Then, the center coordinates of the first text boxes are calculated from the target text coordinates; first text boxes whose center coordinates differ by no more than a preset error value are fused into new text boxes, and first text boxes whose center coordinates differ by more than the preset error value are determined as fixed text boxes, which avoids wrongly splitting the text of a low-complexity picture and improves detection accuracy. Finally, the text information in the new and fixed text boxes is extracted and determined as the detection text of the target detection picture. This realizes text detection for pictures of different complexity, reduces manual labeling cost, saves model response time, and further improves the efficiency and accuracy of picture text detection.
Drawings
For a clearer description of the solutions in the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a picture text detection method according to the present application;
FIG. 3 is a schematic structural diagram of one embodiment of a picture text detection apparatus according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Reference numerals: picture text detection apparatus 300, detection module 301, labeling module 302, confirmation module 303, extraction module 304.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the method for detecting the picture text provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the picture text detection apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow chart of one embodiment of a picture text detection method according to the present application is shown. The picture text detection method comprises the following steps:
Step S201, when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;
in this embodiment, the target detection picture is a detection picture containing target text, and its complexity is calculated according to a preset detection model. The preset detection model is a preset picture-complexity detection model, for example a lightweight convolutional neural network discrimination model based on VGG16. Specifically, the target detection picture is input into the preset detection model; the convolution layer, pooling layer and fully connected layer of the model operate on the picture (its length, width and number of channels) and output a detection result value; the detection result value is then evaluated with a binary classification loss function to obtain the complexity of the current target detection picture.
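The patent gives no concrete layer sizes, so the following is only a minimal sketch, in PyTorch, of what such a lightweight VGG-style complexity classifier could look like; the layer widths, the 224x224 input and the 0.5 decision threshold are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class ComplexityNet(nn.Module):
    """Convolution + pooling + fully connected layers emit one logit; a
    sigmoid turns it into a complexity score p in (0, 1)."""
    def __init__(self):
        super().__init__()
        # Truncated VGG-like feature extractor (assumed widths).
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1),            # detection result value (logit)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.pool(self.features(x)))

model = ComplexityNet()
picture = torch.randn(1, 3, 224, 224)    # (batch, channels, height, width)
logit = model(picture)                   # detection result value
p = torch.sigmoid(logit).item()          # complexity score in (0, 1)
is_high_complexity = p > 0.5             # assumed threshold between low and high
```

Training such a classifier with `nn.BCEWithLogitsLoss` would match the binary classification loss function the text mentions.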
Step S202, when the complexity is low, obtaining a feature vector of the target detection picture according to a first labeling model in a preset labeling model, and calculating according to the feature vector to obtain a target text coordinate of a first text box in the target detection picture;
in this embodiment, complexity may be divided into low and high according to a preset value: a complexity smaller than or equal to the preset value is low, and a complexity larger than the preset value is high. When the complexity of the target detection picture is low, the target text coordinates of the picture are acquired according to a first annotation model among the preset annotation models. A preset annotation model is a preset text-coordinate detection model; there are two, a first annotation model and a second annotation model. The first annotation model detects low-complexity target detection pictures and yields their target text coordinates; the second annotation model detects high-complexity target detection pictures and yields their detection text coordinates; from these two kinds of coordinates, the detection texts of low-complexity and high-complexity target detection pictures can be obtained respectively. Both the target text coordinates and the detection text coordinates consist of the lower-left, lower-right, upper-left and upper-right corner coordinates of each text box in the target detection picture. Specifically, when the complexity of the target detection picture is low, a feature map of the picture and preset detection feature frames are acquired, and the feature vector of the picture is calculated from them by the first annotation model; the feature vector is then passed through the bidirectional long short-term memory (BiLSTM) network, fully connected layer and regression layer of the first annotation model to output the target text coordinates of the current target detection picture.
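As a rough illustration of that pipeline (backbone feature columns into a BiLSTM, then a fully connected layer and a regression layer), here is a hedged PyTorch sketch; the feature dimension, hidden size and the four-value box encoding are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class TextBoxHead(nn.Module):
    """BiLSTM + fully connected + regression head over backbone features."""
    def __init__(self, feat_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 256)
        self.regress = nn.Linear(256, 4)   # one (x1, y1, x2, y2) per position

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, number_of_feature_columns, feat_dim)
        out, _ = self.rnn(feats)
        return self.regress(torch.relu(self.fc(out)))

head = TextBoxHead()
columns = torch.randn(1, 40, 512)   # 40 feature columns from an assumed backbone
boxes = head(columns)               # (1, 40, 4) candidate text-box coordinates
```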
Step S203, calculating the center coordinates of the first text boxes according to the target text coordinates, fusing first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
in this embodiment, a first text box is a text box obtained by detecting the target detection picture with the first labeling model, and a center coordinate is the mean coordinate of a first text box. The x-mean and y-mean of the target text coordinates of each first text box are calculated and taken as the center coordinate of that box. When the center coordinate of each first text box has been obtained, first text boxes whose center coordinates differ by no more than the preset error value are fused into a new text box. The lower-left corner of the new text box takes the minimum x value and minimum y value of the target text coordinates of the fused first text boxes; the upper-right corner takes the maximum x value and maximum y value; the lower-right corner takes the maximum x value and minimum y value; and the upper-left corner takes the minimum x value and maximum y value. First text boxes whose center coordinates differ by more than the preset error value are determined as fixed text boxes.
Step S204, extracting the text information in the new text box and the fixed text box, and determining the text information as the detection text of the target detection picture.
In this embodiment, when the new text boxes and fixed text boxes are obtained, the text information in them is extracted and arranged according to the order of the text boxes, giving the detection text of the target detection picture.
It should be emphasized that to further ensure the privacy and security of the detected text, the detected text may also be stored in a blockchain node.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment realizes the text detection of pictures with different complexity, reduces the manual labeling cost, saves the response time of model processing, and further improves the efficiency and accuracy of the text detection of pictures.
In some embodiments of the present application, the preset error value includes a first error value and a second error value, and fusing first text boxes whose center coordinates differ by no more than the preset error value into a new text box includes:
acquiring a first pixel difference value between the y-axis coordinates of two adjacent center coordinates and a second pixel difference value between the x-axis coordinates of the same two center coordinates;
and fusing first text boxes whose first pixel difference value is smaller than or equal to the first error value and whose second pixel difference value is smaller than or equal to the second error value into a new text box.
In this embodiment, the preset error value includes a first error value and a second error value. When the center coordinate of each first text box has been obtained, the first pixel difference value between the y-axis coordinates of two adjacent center coordinates and the second pixel difference value between the x-axis coordinates of the same two center coordinates are acquired in turn. The first pixel difference value is the pixel difference between the y-axis coordinates of the two center coordinates, and the second pixel difference value is the pixel difference between their x-axis coordinates. First text boxes whose first pixel difference value is smaller than or equal to the first error value and whose second pixel difference value is smaller than or equal to the second error value are fused to obtain a new text box.
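A minimal sketch of this fusion rule in Python, assuming axis-aligned boxes stored as (x1, y1, x2, y2) with (x1, y1) the lower-left and (x2, y2) the upper-right corner; the two error-value thresholds are made-up examples, not values from the patent:

```python
def center(box):
    # Center coordinate = mean of the box's corner coordinates.
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def fuse(a, b):
    # New box per the description: min x / min y for the lower-left corner,
    # max x / max y for the upper-right corner of the fused boxes.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_text_boxes(boxes, y_err=5, x_err=40):   # assumed pixel thresholds
    """Fuse adjacent first text boxes whose center coordinates are close;
    boxes that fuse with nothing become fixed text boxes."""
    if not boxes:
        return [], []
    boxes = sorted(boxes, key=center)              # scan in reading order
    new_boxes, fixed_boxes = [], []
    cur, was_fused = boxes[0], False
    for nxt in boxes[1:]:
        (cx, cy), (nx, ny) = center(cur), center(nxt)
        if abs(ny - cy) <= y_err and abs(nx - cx) <= x_err:
            cur, was_fused = fuse(cur, nxt), True  # same line, close enough
        else:
            (new_boxes if was_fused else fixed_boxes).append(cur)
            cur, was_fused = nxt, False
    (new_boxes if was_fused else fixed_boxes).append(cur)
    return new_boxes, fixed_boxes

# Two fragments of one field merge; the distant box stays fixed:
# -> ([(0, 0, 64, 11)], [(300, 0, 340, 10)])
print(merge_text_boxes([(0, 0, 30, 10), (34, 1, 64, 11), (300, 0, 340, 10)]))
```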
This embodiment fuses text boxes so that text with small positional differences is combined, avoiding erroneous splitting of characters when detecting text in low-complexity pictures, and further improving the accuracy of picture text detection.
In some embodiments of the present application, after calculating the complexity of the target detection picture according to the preset detection model, the method further includes:
when the complexity is high, acquiring a minimum picture corresponding to the target detection picture and a minimum text coordinate of a second text box in the minimum picture according to a second labeling model in a preset labeling model;
and mapping the minimum text coordinate to the maximum picture corresponding to the target detection picture in parallel to obtain the detection text coordinate of the target detection picture, and calculating according to the detection text coordinate to obtain the detection text corresponding to the target detection picture.
In this embodiment, when the complexity of the target detection picture is high, the minimum text coordinates of the second text boxes in the minimum picture corresponding to the target detection picture are acquired according to a second annotation model among the preset annotation models. A second text box is obtained by detecting the target detection picture with the second labeling model, which is a pre-trained high-complexity labeling model. The minimum picture is the target detection picture after scaling down: the second labeling model scales the pixels of the target detection picture to obtain it. Once the minimum picture is obtained, the second text boxes in it are detected with the second labeling model, giving the minimum text coordinate of each second text box. The minimum text coordinates are then mapped to the maximum picture corresponding to the target detection picture; that is, all the obtained minimum text coordinates are enlarged according to the preset mapping ratio between the minimum picture and the maximum picture, giving the detection text coordinates. The text content at the detection text coordinates is then acquired, giving the detection text of the target detection picture.
This embodiment performs text detection on high-complexity pictures with the second labeling model, realizing targeted detection of their text and further improving detection efficiency and accuracy for high-complexity pictures.
In some embodiments of the present application, the mapping the minimum text coordinate to the maximum picture corresponding to the target detection picture in parallel to obtain the detection text coordinate of the target detection picture includes:
and obtaining a preset mapping proportion, and amplifying the minimum text coordinate in parallel according to the preset mapping proportion to obtain the detection text coordinate of the target detection picture.
In this embodiment, for a high-complexity target detection picture, the detection text coordinates are obtained by acquiring a preset mapping ratio and mapping the minimum text coordinates in parallel to the maximum picture according to that ratio. Specifically, the preset mapping ratio is the preset ratio by which the second labeling model scales the target detection picture; it lies between 0 and 1, for example 0.4. Once the preset mapping ratio is obtained, all the minimum text coordinates are enlarged simultaneously according to it, giving the detection text coordinates of the target detection picture.
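A small sketch of this mapping, using the example ratio of 0.4 from the paragraph above ("in parallel" here just means all boxes are scaled in one pass); the coordinate values are invented for illustration:

```python
MAPPING_RATIO = 0.4   # example preset mapping ratio (0 < ratio < 1)

def map_to_max_picture(min_coords, ratio=MAPPING_RATIO):
    # min_coords: (x1, y1, x2, y2) boxes detected on the minimum picture.
    # Enlarging every coordinate by 1 / ratio lands it on the maximum picture.
    scale = 1.0 / ratio
    return [tuple(round(v * scale) for v in box) for box in min_coords]

print(map_to_max_picture([(10, 12, 58, 24)]))   # -> [(25, 30, 145, 60)]
```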
In this embodiment, enlarging the minimum text coordinates according to the preset mapping ratio yields accurate detection text coordinates for the target detection picture, so the text information of the picture can be accurately located through the detection text coordinates, and the text misplacement that can occur when detecting text in a high-complexity picture is avoided.
In some embodiments of the present application, calculating the complexity of the target detection picture according to the preset detection model includes:
inputting the target detection picture into a convolution layer of the preset detection model, and outputting a detection result value through a pooling layer and a fully connected layer;
and predicting on the detection result value with a preset binary classification loss function to obtain the complexity of the target detection picture.
In this embodiment, the preset detection model includes a convolution layer, a pooling layer and a fully connected layer. When the target detection picture is obtained, its length, width and number of channels are obtained and input into the convolution layer of the preset detection model; the pooling layer and fully connected layer then output the detection result value of the picture. When the detection result value is obtained, it is evaluated with a preset binary classification loss function to obtain the complexity of the current target detection picture. The complexity can be represented by p, with p ranging between 0 and 1: the larger p is, the smaller the characters in the target detection picture and the smaller the spacing between them, and the higher the complexity; the smaller p is, the larger the characters and the spacing between them, and the lower the complexity.
This embodiment calculates the complexity of the target detection picture as soon as it is obtained, so that the picture can be classified and detected according to its complexity, further improving detection efficiency.
In some embodiments of the present application, before the step of obtaining the feature vector of the target detection picture according to the first labeling model in the preset labeling models, the method further includes:
acquiring an initial text picture, dividing the initial text picture into a training picture and a test picture, and inputting the training picture into a preset basic annotation model to obtain the annotation text coordinates of the training picture;
calculating a loss function of the basic annotation model according to the annotation text coordinates, and determining the basic annotation model as a trained basic annotation model when the loss function converges;
and verifying the trained basic annotation model according to the test picture, and determining that the trained basic annotation model is a preset annotation model when the verification passing rate of the trained basic annotation model on the test picture is greater than or equal to the preset passing rate.
In this embodiment, before annotating the target detection picture with a preset annotation model, a basic annotation model needs to be established in advance and trained to obtain the preset annotation model. The preset annotation models comprise a first annotation model, which processes low-complexity target detection pictures, and a second annotation model, which processes high-complexity ones. The two models have different network structures but can be trained in the same way. Specifically, initial text pictures (a number of pre-collected text pictures) are acquired and divided into training pictures and test pictures. The basic annotation model, which may have the network structure of either the first or the second annotation model, detects the initial text coordinates of the training pictures. The training pictures are also annotated with a preset annotation tool to obtain their annotated text coordinates. The basic annotation model is then trained on the initial text coordinates and the annotated text coordinates; that is, its loss function is calculated from the two, and when the loss function converges the trained basic annotation model is obtained. The trained basic annotation model is then tested on the test pictures: if the similarity between the text coordinates it detects on a test picture and that picture's annotated coordinates is greater than or equal to a preset similarity threshold, the test picture is considered to pass verification. When the verification pass rate of the trained basic annotation model on the test pictures is greater than or equal to the preset pass rate, the trained basic annotation model is determined to be a preset annotation model.
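In outline, and with every helper name below hypothetical (the patent names no APIs), the train-then-validate procedure could look like this; the 80/20 split, convergence tolerance and thresholds are assumptions:

```python
import random

def train_labeling_model(pictures, model, tool_label,
                         pass_rate_needed=0.95, sim_threshold=0.9, eps=1e-4):
    # pictures: list of initial text pictures; tool_label(p) stands in for the
    # preset labeling tool and returns annotated text coordinates for p.
    random.shuffle(pictures)
    split = int(0.8 * len(pictures))               # assumed 80/20 split
    train_pics, test_pics = pictures[:split], pictures[split:]

    prev_loss = float("inf")
    while True:                                    # train until the loss converges
        loss = model.fit_epoch(train_pics,         # hypothetical training step
                               targets=[tool_label(p) for p in train_pics])
        if abs(prev_loss - loss) < eps:
            break
        prev_loss = loss

    # Validation: a test picture passes if the predicted coordinates are at
    # least sim_threshold similar to the tool-labeled coordinates.
    passed = sum(model.similarity(p, tool_label(p)) >= sim_threshold
                 for p in test_pics)
    if passed / max(len(test_pics), 1) >= pass_rate_needed:
        return model                               # accepted as preset model
    raise RuntimeError("model failed validation; collect more data and retrain")
```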
This embodiment trains the basic annotation model in advance, so that the resulting preset annotation model can accurately detect picture text, saving annotation time during picture text detection and improving detection efficiency.
In some embodiments of the present application, the calculating the loss function of the basic annotation model according to the annotation text coordinates includes:
labeling the training pictures based on a preset labeling tool to obtain initial text coordinates of the training pictures;
and calculating the square difference between the initial text coordinates and the labeling text coordinates, and calculating the loss function of the basic labeling model according to the square difference.
In this embodiment, when the initial text coordinate of the training picture is obtained, the training picture is labeled according to a preset labeling tool, so as to obtain the labeled text coordinate of the training picture. And calculating the square difference of the initial text coordinate and the labeling text coordinate, and calculating the loss function of the basic labeling model according to the square difference. The calculation formula of the loss function of the basic annotation model is as follows:
wherein omicron (r) k For the initial text coordinates,to annotate text coordinates.
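As a numeric sanity check of that formula (the coordinate values below are made up for illustration):

```python
import numpy as np

# Initial text coordinates o_k predicted by the basic annotation model and
# annotated text coordinates from the labeling tool, one (x1, y1, x2, y2)
# row per text box. Values are invented for illustration.
initial = np.array([[12, 30, 96, 54], [12, 60, 96, 84]], dtype=float)
labeled = np.array([[10, 31, 95, 55], [13, 58, 97, 85]], dtype=float)

loss = np.sum((initial - labeled) ** 2)   # squared differences, summed over k
print(loss)                               # 14.0
```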
This way of calculating the loss function of the basic annotation model saves the training time of the basic annotation model and improves its training efficiency.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by computer readable instructions stored in a computer readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a picture text detection apparatus. This apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the picture text detection apparatus 300 according to this embodiment includes: a detection module 301, a labeling module 302, a confirmation module 303, and an extraction module 304. Wherein:
the detection module 301 is configured to calculate, when a target detection picture is received, complexity of the target detection picture according to a preset detection model;
wherein, the detection module 301 includes:
the first calculation unit is used for inputting the target detection picture into the convolution layer of the preset detection model and outputting a detection result value through the pooling layer and the fully connected layer;
and the second calculation unit is used for predicting on the detection result value with a preset binary classification loss function to obtain the complexity of the target detection picture.
In this embodiment, the target detection picture is a detection picture containing target text, and its complexity is calculated according to a preset detection model. The preset detection model is a preset picture-complexity detection model, for example a lightweight convolutional neural network discrimination model based on VGG16. Specifically, the target detection picture is input into the preset detection model; the convolution layer, pooling layer and fully connected layer of the model operate on the picture (its length, width and number of channels) and output a detection result value; the detection result value is then evaluated with a binary classification loss function to obtain the complexity of the current target detection picture.
The labeling module 302 is configured to obtain, when the complexity is low, a feature vector of the target detection picture according to a first labeling model in a preset labeling model, and calculate, according to the feature vector, a target text coordinate of a first text box in the target detection picture;
in this embodiment, complexity may be divided into low and high according to a preset value: a complexity smaller than or equal to the preset value is low, and a complexity larger than the preset value is high. When the complexity of the target detection picture is low, the target text coordinates of the picture are acquired according to a first annotation model among the preset annotation models. A preset annotation model is a preset text-coordinate detection model; there are two, a first annotation model and a second annotation model. The first annotation model detects low-complexity target detection pictures and yields their target text coordinates; the second annotation model detects high-complexity target detection pictures and yields their detection text coordinates; from these two kinds of coordinates, the detection texts of low-complexity and high-complexity target detection pictures can be obtained respectively. Both the target text coordinates and the detection text coordinates consist of the lower-left, lower-right, upper-left and upper-right corner coordinates of each text box in the target detection picture. Specifically, when the complexity of the target detection picture is low, a feature map of the picture and preset detection feature frames are acquired, and the feature vector of the picture is calculated from them by the first annotation model; the feature vector is then passed through the bidirectional long short-term memory (BiLSTM) network, fully connected layer and regression layer of the first annotation model to output the target text coordinates of the current target detection picture.
The confirmation module 303 is configured to calculate the center coordinates of the first text boxes according to the target text coordinates, fuse first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determine first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
wherein the preset error value includes a first error value and a second error value, and the confirmation module 303 includes:
an acquisition unit, configured to acquire a first pixel difference value between the y-axis coordinates of two adjacent center coordinates and a second pixel difference value between the x-axis coordinates of the same two center coordinates;
and a confirmation unit, configured to fuse first text boxes whose first pixel difference value is smaller than or equal to the first error value and whose second pixel difference value is smaller than or equal to the second error value into a new text box.
In this embodiment, a first text box is a text box obtained by detecting the target detection picture with the first labeling model, and a center coordinate is the mean coordinate of a first text box. The x-mean and y-mean of the target text coordinates of each first text box are calculated and taken as the center coordinate of that box. When the center coordinate of each first text box has been obtained, first text boxes whose center coordinates differ by no more than the preset error value are fused into a new text box. The lower-left corner of the new text box takes the minimum x value and minimum y value of the target text coordinates of the fused first text boxes; the upper-right corner takes the maximum x value and maximum y value; the lower-right corner takes the maximum x value and minimum y value; and the upper-left corner takes the minimum x value and maximum y value. First text boxes whose center coordinates differ by more than the preset error value are determined as fixed text boxes.
And the extraction module 304 is configured to extract the text information in the new text box and the fixed text box, and determine the text information as the detection text of the target detection picture.
In this embodiment, when the new text boxes and fixed text boxes are obtained, the text information in them is extracted and arranged according to the order of the text boxes, giving the detection text of the target detection picture.
It should be emphasized that to further ensure the privacy and security of the detected text, the detected text may also be stored in a blockchain node.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The picture text detection apparatus provided in this embodiment further includes:
the obtaining module is used for obtaining a minimum picture corresponding to the target detection picture and a minimum text coordinate of a second text box in the minimum picture according to a second labeling model in a preset labeling model when the complexity is high;
and the mapping module is used for mapping the minimum text coordinate to the maximum picture corresponding to the target detection picture in parallel to obtain the detection text coordinate of the target detection picture, and calculating according to the detection text coordinate to obtain the detection text corresponding to the target detection picture.
Wherein the mapping module comprises:
and the mapping unit is used for acquiring a preset mapping proportion, and amplifying the minimum text coordinate in parallel according to the preset mapping proportion to obtain the detection text coordinate of the target detection picture.
In this embodiment, when the complexity of the target detection picture is high, the minimum text coordinates of the second text boxes in the minimum picture corresponding to the target detection picture are acquired according to a second annotation model among the preset annotation models. A second text box is obtained by detecting the target detection picture with the second labeling model, which is a pre-trained high-complexity labeling model. The minimum picture is the target detection picture after scaling down: the second labeling model scales the pixels of the target detection picture to obtain it. Once the minimum picture is obtained, the second text boxes in it are detected with the second labeling model, giving the minimum text coordinate of each second text box. The minimum text coordinates are then mapped to the maximum picture corresponding to the target detection picture; that is, all the obtained minimum text coordinates are enlarged according to the preset mapping ratio between the minimum picture and the maximum picture, giving the detection text coordinates. The text content at the detection text coordinates is then acquired, giving the detection text of the target detection picture.
The dividing module is used for acquiring an initial text picture, dividing the initial text picture into a training picture and a test picture, and inputting the training picture into a preset basic labeling model to obtain labeling text coordinates of the training picture;
the training module is used for calculating a loss function of the basic annotation model according to the annotation text coordinates, and determining the basic annotation model as a trained basic annotation model when the loss function converges;
the verification module is used for verifying the trained basic annotation model according to the test picture, and determining that the trained basic annotation model is a preset annotation model when the verification passing rate of the trained basic annotation model on the test picture is greater than or equal to the preset passing rate.
Wherein, training module includes:
the marking unit is used for marking the training pictures based on a preset marking tool to obtain initial text coordinates of the training pictures;
and the third calculation unit is used for calculating the square difference between the initial text coordinate and the labeling text coordinate, and calculating the loss function of the basic labeling model according to the square difference.
In this embodiment, before annotating the target detection picture with a preset annotation model, a basic annotation model needs to be established in advance and trained to obtain the preset annotation model. The preset annotation models comprise a first annotation model, which processes low-complexity target detection pictures, and a second annotation model, which processes high-complexity ones. The two models have different network structures but can be trained in the same way. Specifically, initial text pictures (a number of pre-collected text pictures) are acquired and divided into training pictures and test pictures. The basic annotation model, which may have the network structure of either the first or the second annotation model, detects the initial text coordinates of the training pictures. The training pictures are also annotated with a preset annotation tool to obtain their annotated text coordinates. The basic annotation model is then trained on the initial text coordinates and the annotated text coordinates; that is, its loss function is calculated from the two, and when the loss function converges the trained basic annotation model is obtained. The trained basic annotation model is then tested on the test pictures: if the similarity between the text coordinates it detects on a test picture and that picture's annotated coordinates is greater than or equal to a preset similarity threshold, the test picture is considered to pass verification. When the verification pass rate of the trained basic annotation model on the test pictures is greater than or equal to the preset pass rate, the trained basic annotation model is determined to be a preset annotation model.
The picture text detection apparatus provided by this embodiment realizes text detection for pictures of different complexity, reduces the manual labeling cost, saves model response time, and further improves the efficiency and accuracy of picture text detection.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63, which are communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown in the figure, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead. As will be appreciated by those skilled in the art, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 61 is generally used to store the operating system and the various application software installed on the computer device 6, such as the computer readable instructions of the picture character detection method. Further, the memory 61 may be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may, in some embodiments, be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the computer readable instructions stored in the memory 61 or to process data, for example to execute the computer readable instructions of the picture character detection method.
The network interface 63 may comprise a wireless network interface or a wired network interface, and is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The computer equipment provided by the embodiment realizes the character detection of pictures with different complexity, reduces the manual labeling cost, saves the response time of model processing, and further improves the efficiency and accuracy of the picture character detection.
The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor, so that the at least one processor performs the steps of the picture character detection method as described above.
The computer readable storage medium provided by the embodiment realizes the character detection of pictures with different complexity, reduces the manual labeling cost, saves the response time of model processing, and further improves the efficiency and accuracy of the character detection of the pictures.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general hardware platform or, of course, by hardware alone, though in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method described in the embodiments of the present application.
It is apparent that the embodiments described above are only some embodiments of the present application, not all of them; the preferred embodiments of the present application are given in the drawings, but they do not limit the patent scope of the present application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their features. All equivalent structures made according to the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (7)

1. A picture character detection method, characterized by comprising the following steps:
when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model, wherein the step of calculating the complexity of the target detection picture according to the preset detection model specifically comprises: inputting the target detection picture into a convolution layer of the preset detection model, processing the length, width and channel number of the target detection picture through a pooling layer and a fully connected layer, and outputting a detection result value; and performing prediction on the detection result value according to a preset binary classification loss function to obtain the complexity of the target detection picture, wherein the complexity is represented by p, the value range of p being between 0 and 1 inclusive; the larger p is, the smaller the characters in the target detection picture are, the smaller the interval between the characters is, and the higher the complexity of the target detection picture is; the smaller p is, the larger the characters in the target detection picture are, the larger the interval between the characters is, and the lower the complexity of the target detection picture is; and the preset detection model is a lightweight VGG16-based convolutional neural network picture complexity detection model;
when the complexity is low, acquiring a feature vector of the target detection picture according to a first labeling model among preset labeling models, and calculating target text coordinates of a first text box in the target detection picture according to the feature vector;
calculating center coordinates of the first text boxes in the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
extracting text information in the new text box and the fixed text box, and determining the text information as a detection text of the target detection picture;
wherein the preset error value comprises a first error value and a second error value, and the step of fusing first text boxes whose center coordinates differ by no more than the preset error value into a new text box specifically comprises:
acquiring a first pixel difference value between the y-axis coordinates of two adjacent center coordinates and a second pixel difference value between their x-axis coordinates;
fusing first text boxes whose first pixel difference value is smaller than or equal to the first error value and whose second pixel difference value is smaller than or equal to the second error value into a new text box;
and wherein, after the step of calculating the complexity of the target detection picture according to a preset detection model, the method further comprises:
when the complexity is high, acquiring a minimum picture corresponding to the target detection picture and minimum text coordinates of a second text box in the minimum picture according to a second labeling model among the preset labeling models;
and mapping the minimum text coordinates proportionally to the maximum picture corresponding to the target detection picture to obtain the detection text coordinates of the target detection picture, and obtaining the detection text corresponding to the target detection picture by calculation according to the detection text coordinates.
2. The picture character detection method according to claim 1, wherein the step of mapping the minimum text coordinates proportionally to the maximum picture corresponding to the target detection picture to obtain the detection text coordinates of the target detection picture specifically comprises:
acquiring a preset mapping proportion, and scaling the minimum text coordinates up proportionally according to the preset mapping proportion to obtain the detection text coordinates of the target detection picture.
3. The picture character detection method according to claim 1, further comprising, before the step of acquiring the feature vector of the target detection picture according to a first labeling model among preset labeling models:
acquiring initial text pictures, dividing the initial text pictures into training pictures and test pictures, and inputting the training pictures into a preset basic annotation model to obtain the annotated text coordinates of the training pictures;
calculating a loss function of the basic annotation model according to the annotated text coordinates, and determining the basic annotation model as a trained basic annotation model when the loss function converges;
and verifying the trained basic annotation model according to the test picture, and determining that the trained basic annotation model is a preset annotation model when the verification passing rate of the trained basic annotation model on the test picture is greater than or equal to the preset passing rate.
4. The picture character detection method according to claim 3, wherein the step of calculating the loss function of the basic annotation model according to the annotated text coordinates specifically comprises:
labeling the training pictures based on a preset labeling tool to obtain the initial text coordinates of the training pictures;
and calculating the square difference between the initial text coordinates and the annotated text coordinates, and calculating the loss function of the basic annotation model according to the square difference.
5. A picture character detection apparatus, characterized by comprising:
a detection module for calculating the complexity of the target detection picture according to a preset detection model when the target detection picture is received,
wherein the detection module comprises a first calculation unit and a second calculation unit,
the first calculation unit is used for inputting the target detection picture into a convolution layer of the preset detection model, processing the length, width and channel number of the target detection picture through a pooling layer and a fully connected layer, and outputting a detection result value, wherein the preset detection model is a lightweight VGG16-based convolutional neural network picture complexity detection model,
and the second calculation unit is used for performing prediction on the detection result value according to a preset binary classification loss function to obtain the complexity of the target detection picture, the complexity being represented by p, whose value range is between 0 and 1 inclusive; the larger p is, the smaller the characters in the target detection picture are, the smaller the interval between the characters is, and the higher the complexity of the target detection picture is; the smaller p is, the larger the characters in the target detection picture are, the larger the interval between the characters is, and the lower the complexity of the target detection picture is;
the labeling module is used for acquiring the feature vector of the target detection picture according to a first labeling model among preset labeling models when the complexity is low, and calculating target text coordinates of a first text box in the target detection picture according to the feature vector;
the confirmation module is used for calculating center coordinates of the first text boxes in the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
the extraction module is used for extracting text information in the new text box and the fixed text box and determining the text information as a detection text of the target detection picture;
the preset error value comprises a first error value and a second error value, and the confirmation module specifically comprises:
an acquisition unit, configured to acquire a first pixel difference value between the y-axis coordinates of two adjacent center coordinates and a second pixel difference value between their x-axis coordinates;
a confirmation unit, configured to fuse first text boxes whose first pixel difference value is smaller than or equal to the first error value and whose second pixel difference value is smaller than or equal to the second error value into a new text box;
The picture character detection device further comprises:
the obtaining module is used for acquiring a minimum picture corresponding to the target detection picture and minimum text coordinates of a second text box in the minimum picture according to a second labeling model among the preset labeling models when the complexity is high;
and the mapping module is used for mapping the minimum text coordinates proportionally to the maximum picture corresponding to the target detection picture to obtain the detection text coordinates of the target detection picture, and obtaining the detection text corresponding to the target detection picture by calculation according to the detection text coordinates.
6. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the picture character detection method of any of claims 1 to 4.
7. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the picture character detection method of any of claims 1 to 4.
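By way of illustration, a minimal sketch of the complexity-scoring step recited in claim 1, assuming PyTorch; the trimmed VGG-style layer sizes, the 0.5 routing threshold, and all names below are assumptions for illustration, not the patent's disclosed VGG16-based network:

```python
import torch
import torch.nn as nn

class ComplexityScorer(nn.Module):
    """Lightweight VGG-flavoured scorer: convolution and pooling layers
    followed by a fully connected head, squashed to a complexity p in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),               # fix the spatial size to 4x4
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 1),                     # the detection result value
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid maps the detection result value to p in (0, 1); training
        # would pair this with a binary classification loss such as BCELoss.
        return torch.sigmoid(self.head(self.features(x)))

scorer = ComplexityScorer()
p = scorer(torch.rand(1, 3, 224, 224)).item()      # any RGB picture tensor
route = "second (high-complexity)" if p > 0.5 else "first (low-complexity)"
print(f"p = {p:.3f}, route to the {route} labeling model")
```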
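Similarly, a sketch of the text-box fusion of claim 1, under the reading that two first text boxes merge when the y-axis and x-axis differences of their center coordinates fall within the first and second error values respectively; the error values and helper names are assumptions:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def fuse_boxes(boxes: List[Box], y_err: float = 8.0, x_err: float = 20.0) -> List[Box]:
    """Fuse adjacent first text boxes whose center coordinates differ by at
    most the preset error values; boxes that fuse with nothing stay fixed."""
    boxes = sorted(boxes, key=lambda b: (center(b)[1], center(b)[0]))  # reading order
    fused: List[Box] = []
    for box in boxes:
        if fused:
            cx1, cy1 = center(fused[-1])
            cx2, cy2 = center(box)
            # First pixel difference on y, second on x, per the claim.
            if abs(cy2 - cy1) <= y_err and abs(cx2 - cx1) <= x_err:
                last = fused.pop()
                box = (min(last[0], box[0]), min(last[1], box[1]),
                       max(last[2], box[2]), max(last[3], box[3]))
        fused.append(box)
    return fused
```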
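Finally, a sketch of the proportional coordinate mapping of claim 2: text coordinates detected on the minimum picture are scaled back up to the maximum picture according to the preset mapping proportion; the 4x ratio here is an assumed value:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]

def map_to_maximum(min_coords: List[Box], proportion: float = 4.0) -> List[Box]:
    """Scale text coordinates detected on the minimum picture up to the
    maximum picture according to a preset mapping proportion."""
    return [tuple(v * proportion for v in box) for box in min_coords]

# A box found at (10, 5, 60, 20) on the minimum picture maps to
# (40.0, 20.0, 240.0, 80.0) on the maximum picture.
print(map_to_maximum([(10, 5, 60, 20)]))
```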
CN202011286320.XA 2020-11-17 2020-11-17 Picture character detection method and device, computer equipment and storage medium Active CN112395450B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011286320.XA CN112395450B (en) 2020-11-17 2020-11-17 Picture character detection method and device, computer equipment and storage medium
PCT/CN2021/090512 WO2022105120A1 (en) 2020-11-17 2021-04-28 Text detection method and apparatus from image, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011286320.XA CN112395450B (en) 2020-11-17 2020-11-17 Picture character detection method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112395450A CN112395450A (en) 2021-02-23
CN112395450B (en) 2024-03-19

Family

ID=74600891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011286320.XA Active CN112395450B (en) 2020-11-17 2020-11-17 Picture character detection method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112395450B (en)
WO (1) WO2022105120A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395450B (en) * 2020-11-17 2024-03-19 平安科技(深圳)有限公司 Picture character detection method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images
WO2014014640A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for devanagiri ocr
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
CN110046616A (en) * 2019-03-04 2019-07-23 北京奇艺世纪科技有限公司 Image processing model generation, image processing method, device, terminal device and storage medium
WO2020223859A1 (en) * 2019-05-05 2020-11-12 华为技术有限公司 Slanted text detection method, apparatus and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612003A (en) * 2019-02-22 2020-09-01 北京京东尚科信息技术有限公司 Method and device for extracting text in picture
CN111340139B (en) * 2020-03-27 2024-03-05 中国科学院微电子研究所 Method and device for judging complexity of image content
CN112395450B (en) * 2020-11-17 2024-03-19 平安科技(深圳)有限公司 Picture character detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022105120A1 (en) 2022-05-27
CN112395450A (en) 2021-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant