WO2022105120A1 - Text detection method and apparatus from image, computer device and storage medium - Google Patents

Text detection method and apparatus from image, computer device and storage medium Download PDF

Info

Publication number
WO2022105120A1
WO2022105120A1 (PCT/CN2021/090512)
Authority
WO
WIPO (PCT)
Prior art keywords
text
picture
coordinates
target detection
preset
Prior art date
Application number
PCT/CN2021/090512
Other languages
French (fr)
Chinese (zh)
Inventor
左彬靖
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022105120A1 publication Critical patent/WO2022105120A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a method, device, computer equipment and storage medium for detecting text in pictures.
  • text detection methods based on target detection technology are used in more and more fields, such as Alipay code scanning and ID card recognition.
  • the information in the picture can be extracted.
  • the algorithm based on the FPN (feature pyramid network)
  • the pixel-level algorithm has relatively high accuracy, but the processing time of the model is long, which makes it difficult to meet the requirements of industrialization.
  • image text detection mainly extracts useful information from the image, such as name, address, account information and other fields, so as to facilitate subsequent storage of these parameters and provide data for the subsequent risk control system.
  • a picture may contain a lot of information, and a more complex picture may contain more than one hundred fields.
  • the purpose of the embodiments of the present application is to provide a method, device, computer equipment and storage medium for detecting text in pictures, so as to solve the technical problem of low efficiency in detecting text in pictures.
  • the embodiment of the present application provides a method for detecting text in pictures, which adopts the following technical solutions:
  • the feature vector of the target detection picture is obtained according to the first labeling model in the preset labeling model, and the target text coordinates of the first text box in the target detection picture are calculated according to the feature vector;
  • the center coordinates of the target detection picture are calculated according to the target text coordinates; first text boxes whose center-coordinate difference is less than or equal to a preset error value are merged into a new text box, and a first text box whose center-coordinate difference is greater than the preset error value is determined to be a fixed text box;
  • the embodiments of the present application also provide a picture and text detection device, which adopts the following technical solutions:
  • a detection module configured to calculate the complexity of the target detection picture according to a preset detection model when the target detection picture is received
  • the labeling module is configured to, when the complexity is low complexity, obtain the feature vector of the target detection picture according to the first labeling model in the preset labeling model, and calculate, according to the feature vector, the target text coordinates of the first text box in the target detection picture;
  • a confirmation module, configured to calculate the center coordinates of the target detection picture according to the target text coordinates, fuse first text boxes whose center-coordinate difference is less than or equal to a preset error value into a new text box, and determine a first text box whose center-coordinate difference is greater than the preset error value as a fixed text box;
  • the extraction module is used to extract the text information in the new text box and the fixed text box, and determine that the text information is the detection text of the target detection picture.
  • an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the following steps:
  • the feature vector of the target detection picture is obtained according to the first labeling model in the preset labeling model, and the target text coordinates of the first text box in the target detection picture are calculated according to the feature vector;
  • the center coordinates of the target detection picture are calculated according to the target text coordinates; first text boxes whose center-coordinate difference is less than or equal to a preset error value are merged into a new text box, and a first text box whose center-coordinate difference is greater than the preset error value is determined to be a fixed text box;
  • an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor performs the following steps:
  • the feature vector of the target detection picture is obtained according to the first labeling model in the preset labeling model, and the target text coordinates of the first text box in the target detection picture are calculated according to the feature vector;
  • the center coordinates of the target detection picture are calculated according to the target text coordinates; first text boxes whose center-coordinate difference is less than or equal to a preset error value are merged into a new text box, and a first text box whose center-coordinate difference is greater than the preset error value is determined to be a fixed text box;
  • the complexity of the target detection picture is calculated according to the preset detection model, and the model for the target detection picture can be selected according to the complexity, enabling targeted detection of the target detection picture.
  • the feature vector of the target detection picture is obtained according to the first labeling model in the preset labeling model, and the target text coordinates of the first text box in the target detection picture are calculated according to the feature vector, so that the text information of the target detection picture can be accurately located; then the center coordinates of the target detection picture are calculated according to the target text coordinates, first text boxes whose center-coordinate difference is less than or equal to the preset error value are merged into a new text box, and first text boxes whose center-coordinate difference is greater than the preset error value are determined as fixed text boxes, thereby avoiding wrong splitting of picture text at low complexity and improving the accuracy of picture text detection; finally, the text information in the new text box and the fixed text box is extracted and determined as the detection text of the target detection picture, which realizes text detection for pictures of different complexity, reduces the cost of manual annotation, saves the response time of model processing, and further improves the efficiency and accuracy of picture text detection.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a picture text detection method according to the present application.
  • FIG. 3 is a schematic structural diagram of an embodiment of a picture and text detection device according to the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • Reference numerals: picture and text detection device 300, detection module 301, labeling module 302, confirmation module 303 and extraction module 304.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers, etc.
  • the server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101 , 102 , and 103 .
  • the image and text detection methods provided in the embodiments of the present application are generally executed by a server/terminal device, and correspondingly, the image and text detection apparatus is generally set in the server/terminal device.
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • the image and text detection method includes the following steps:
  • Step S201: when receiving the target detection picture, calculate the complexity of the target detection picture according to a preset detection model.
  • the target detection picture is a detection picture including target text
  • the complexity of the target detection picture is calculated according to a preset detection model
  • the preset detection model is a preset picture complexity detection model, such as a lightweight convolutional neural network discriminant model based on VGG16.
  • input the target detection picture into the preset detection model, compute over the length, width and channel number of the target detection picture through the convolution layer, pooling layer and fully connected layer of the preset detection model, and output the detection result value of the target detection picture; the detection result value is then evaluated with the binary classification (two-class) loss function, that is, the complexity of the current target detection picture is obtained.
  • Step S202: when the complexity is low complexity, obtain the feature vector of the target detection picture according to the first labeling model in the preset labeling model, and calculate, according to the feature vector, the target text coordinates of the first text box in the target detection picture.
  • the complexity can be divided into low complexity and high complexity according to a preset value: complexity less than or equal to the preset value is low complexity, and complexity greater than the preset value is high complexity.
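  • as an illustrative sketch only (the patent gives no code; the sigmoid output and the 0.5 preset value below are assumptions, not part of the disclosure), the complexity split described above might look like:

```python
import math

def complexity_score(logit: float) -> float:
    # Assumption: the detection result value is a raw score that the
    # binary classification (two-class) loss maps into (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

def classify_complexity(p: float, preset_value: float = 0.5) -> str:
    # Complexity less than or equal to the preset value is low
    # complexity; complexity greater than the preset value is high.
    return "low" if p <= preset_value else "high"
```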
  • the preset labeling model is a preset text coordinate detection model, including a first labeling model and a second labeling model. Detecting a low-complexity target detection picture according to the first labeling model yields the target text coordinates of the target detection picture; detecting a high-complexity target detection picture according to the second labeling model yields the detected text coordinates of the target detection picture.
  • the detected texts of the low-complexity and high-complexity target detection images can be obtained respectively.
  • the coordinates of the target text and the coordinates of the detected text are composed of the coordinates of the lower left corner, the lower right corner, the upper left corner and the upper right corner of each text box in the target detection picture.
  • a feature map of the target detection picture and a preset detection feature frame are acquired; the feature map and the detection feature frame are computed with the first labeling model to obtain the feature vector of the target detection picture. The feature vector is then passed through the bidirectional long short-term memory network, the fully connected layer and the regression layer in the first labeling model, and the target text coordinates of the current target detection picture are output.
  • Step S203: calculate the center coordinates of the target detection picture according to the target text coordinates, fuse first text boxes whose center-coordinate difference is less than or equal to a preset error value into a new text box, and determine a first text box whose center-coordinate difference is greater than the preset error value as a fixed text box.
  • the first text box is a text box obtained by detecting the target detection picture according to the first labeling model.
  • the center coordinates are the mean coordinates of the first text boxes in each target detection picture. Calculate the x mean value and the y mean value of the target text coordinates of each first text box in the target detection image, and use the x mean value and the y mean value as the center coordinates of the corresponding first text box.
  • after the center coordinates corresponding to each first text box are obtained, the first text boxes whose center-coordinate difference is less than or equal to the preset error value are merged into a new text box.
  • the coordinates of the lower left corner of the new text box take the minimum x value and the minimum y value of the target text coordinates in the fused first text boxes; the coordinates of the upper right corner take the maximum x value and the maximum y value; the coordinates of the lower right corner take the maximum x value and the minimum y value; and the coordinates of the upper left corner take the minimum x value and the maximum y value.
  • a first text box whose center-coordinate difference is greater than the preset error value is determined as a fixed text box.
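  • a minimal sketch of the center computation and corner selection described above (the four-corner box representation and the function names are assumptions made for illustration):

```python
def box_center(box):
    # box: list of four (x, y) corners in the order lower-left,
    # lower-right, upper-left, upper-right. The center is the mean
    # of the four x values and the mean of the four y values.
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return (sum(xs) / 4.0, sum(ys) / 4.0)

def fuse_boxes(box_a, box_b):
    # New text box per the description: lower-left = (min x, min y),
    # upper-right = (max x, max y), lower-right = (max x, min y),
    # upper-left = (min x, max y) over both boxes' corners.
    xs = [p[0] for p in box_a + box_b]
    ys = [p[1] for p in box_a + box_b]
    lo_x, hi_x = min(xs), max(xs)
    lo_y, hi_y = min(ys), max(ys)
    return [(lo_x, lo_y), (hi_x, lo_y), (lo_x, hi_y), (hi_x, hi_y)]
```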
  • Step S204: extract the text information in the new text box and the fixed text box, and determine that the text information is the detection text of the target detection picture.
  • the text information in the new text box and the fixed text box is extracted, and the text information is arranged in the order of the text boxes; that is, the detection text of the target detection picture is obtained.
  • the above detection text can also be stored in a node of a blockchain.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a series of data blocks associated using cryptographic methods; each data block contains a batch of network transaction information, used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • This embodiment realizes text detection for pictures of different complexity, reduces the cost of manual labeling, saves the response time of model processing, and further improves the efficiency and accuracy of text detection in pictures.
  • the preset error value includes a first error value and a second error value
  • the fusion of the first text box whose center coordinates are less than or equal to the preset error value into a new text box includes:
  • a first text box whose first pixel difference value is less than or equal to the first error value and whose second pixel difference value is less than or equal to the second error value is merged into a new text box.
  • the preset error value includes a first error value and a second error value.
  • the first pixel difference value between the y-axis coordinates of two adjacent center coordinates, and the second pixel difference value between the x-axis coordinates of the two center coordinates, are obtained in sequence.
  • the first pixel difference value is the pixel difference value between the y-axis coordinates of the two center point coordinates
  • the second pixel difference value is the pixel difference value between the x-axis coordinates of the two center point coordinates.
  • a new text box is obtained by fusing the first text boxes with the first pixel difference between the center coordinates less than or equal to the first error value and the second pixel difference less than or equal to the second error value.
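  • the two-threshold fusion test above can be sketched as follows (the function and parameter names are illustrative, not from the patent):

```python
def should_fuse(center_a, center_b, first_error, second_error):
    # First pixel difference: absolute difference between the two
    # centers' y-axis coordinates; second pixel difference: absolute
    # difference between their x-axis coordinates. The boxes are fused
    # only when both differences are within their error values.
    first_diff = abs(center_a[1] - center_b[1])
    second_diff = abs(center_a[0] - center_b[0])
    return first_diff <= first_error and second_diff <= second_error
```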
  • this embodiment combines texts with small errors by fusing the text boxes, avoids wrong splitting of text in the process of text detection for low-complexity pictures, and further improves the accuracy of text detection in pictures.
  • after the complexity of the target detection picture is calculated according to the preset detection model, the method further includes:
  • the minimum text coordinates are mapped in parallel to the maximum picture corresponding to the target detection picture to obtain the detected text coordinates of the target detection picture, and the detected text corresponding to the target detection picture is calculated according to the detected text coordinates.
  • the minimum text coordinates of the second text box in the minimum picture corresponding to the target detection picture are obtained according to the second labeling model in the preset labeling model.
  • the second text box is a text box obtained by detecting the target detection picture according to the second labeling model
  • the second labeling model is a pre-trained high-complexity labeling model.
  • the minimum picture corresponding to the target detection picture is obtained according to the second labeling model, and the minimum picture is the minimum picture after scaling the target detection picture.
  • the second labeling model can perform pixel scaling on the target detection picture to obtain the minimum picture.
  • the second text box in the minimum picture is detected based on the second labeling model, thereby obtaining the minimum text coordinates corresponding to the second text box in the minimum picture.
  • after the minimum text coordinates are obtained, they are mapped to the maximum picture corresponding to the target detection picture; that is, all the obtained minimum text coordinates are simultaneously enlarged according to the preset mapping ratio between the minimum picture and the maximum picture to obtain the detected text coordinates. Obtaining the text content at the detected text coordinates yields the detection text of the target detection picture.
  • This embodiment uses the second labeling model to perform text detection on pictures with high complexity, thereby realizing targeted detection of text in pictures with high complexity, and further improving the detection efficiency and accuracy of pictures with high complexity.
  • the above-mentioned parallel mapping of the minimum text coordinates to the maximum picture corresponding to the target detection picture, and obtaining the detected text coordinates of the target detection picture includes:
  • a preset mapping ratio is acquired, and the minimum text coordinates are enlarged in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection image.
  • when obtaining the text coordinates corresponding to a high-complexity target detection picture, the detected text coordinates can be obtained by acquiring a preset mapping ratio and mapping the minimum text coordinates to the maximum picture in parallel according to that ratio. Specifically, the preset mapping ratio is the ratio preset when the second labeling model scales the target detection picture, and ranges from 0 to 1; for example, the preset mapping ratio is 0.4.
  • the preset mapping ratio is obtained, all the obtained minimum text coordinates are simultaneously enlarged according to the preset mapping ratio, that is, the detected text coordinates of the target detection image are obtained.
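  • one reading of the mapping step, under the assumption that a preset mapping ratio of 0.4 means the minimum picture is 0.4 times the size of the original, so mapping back divides every coordinate by the ratio:

```python
def map_to_maximum(min_coords, preset_ratio):
    # min_coords: (x, y) points detected on the scaled-down minimum
    # picture; preset_ratio in (0, 1) is the scale the second labeling
    # model applied. Enlarging all points in parallel restores
    # coordinates on the maximum (original-size) picture.
    return [(x / preset_ratio, y / preset_ratio) for (x, y) in min_coords]
```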
  • accurate acquisition of the detected text coordinates of the target detection picture is realized, so that the text information of the target detection picture can be accurately located from the detected text coordinates, avoiding the text confusion that may occur when text detection is performed on pictures with high complexity.
  • the above calculation of the complexity of the target detection picture according to the preset detection model includes:
  • the detection result value is predicted according to a preset two-class loss function to obtain the complexity of the target detection picture.
  • the preset detection model includes a convolution layer, a pooling layer, and a fully connected layer.
  • the length, width and channel number of the target detection picture are obtained.
  • after the detection result value is obtained, it is evaluated with the preset binary classification loss function; that is, the complexity of the current target detection picture is obtained.
  • the complexity can be represented by p, and the range of p is between 0 and 1.
  • the larger p is, the smaller the text in the target detection picture, the smaller the interval between words, and the higher the complexity of the target detection picture; the smaller p is, the larger the text in the target detection picture, the larger the interval between words, and the lower the complexity of the target detection picture.
  • the complexity of the target detection picture is calculated, so as to realize the classification and detection of the target detection picture according to the complexity, and further improve the detection efficiency of the target detection picture.
  • before the above step of acquiring the feature vector of the target detection picture according to the first labeling model in the preset labeling model, the method further includes:
  • the trained basic labeling model is verified according to the test pictures, and when the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to a preset pass rate, the trained basic labeling model is determined to be the preset labeling model.
  • a basic annotation model needs to be established in advance, and the basic annotation model is trained to obtain the preset annotation model.
  • the preset labeling model includes a first labeling model and a second labeling model.
  • the first labeling model is used for processing low-complexity target detection pictures
  • the second labeling model is used for processing high-complexity target detection pictures.
  • the first labeling model and the second labeling model have different network structures, but both the first labeling model and the second labeling model can be trained by the same training method.
  • initial text pictures are obtained, where the initial text pictures are a plurality of pre-collected text pictures, and they are divided into training pictures and test pictures.
  • the initial text coordinates of the training picture are detected based on a basic labeling model, where the basic labeling model may be the network structure of the first labeling model or the network structure of the second labeling model.
  • the initial text coordinates of the training picture are obtained, and at the same time, the training picture is labelled according to the preset labeling tool, and the labelled text coordinates of the training picture are obtained.
  • the basic labeling model is trained according to the initial text coordinates and the labeling text coordinates, that is, the loss function of the basic labeling model is calculated according to the labeling text coordinates and the initial text coordinates. When the loss function converges, the trained basic labeling model is obtained.
  • the trained basic labeling model is tested according to the test pictures: if the similarity between the initial text coordinates detected by the trained basic labeling model and the labeled text coordinates corresponding to a test picture is greater than or equal to a preset similarity threshold, the trained basic labeling model is determined to have passed verification on that test picture. When the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to the preset pass rate, the trained basic labeling model is determined to be the preset labeling model.
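  • a sketch of the acceptance check described above (the similarity measure and both thresholds are placeholders; the patent does not define how similarity is computed):

```python
def model_accepted(similarities, similarity_threshold, preset_pass_rate):
    # A test picture passes when its predicted-vs-labeled coordinate
    # similarity meets the threshold; the trained model becomes the
    # preset labeling model when the pass rate over all test pictures
    # reaches the preset pass rate.
    passed = sum(1 for s in similarities if s >= similarity_threshold)
    return passed / len(similarities) >= preset_pass_rate
```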
  • the basic labeling model is trained in advance, so that the preset labeling model obtained by training can accurately detect the text of the picture, save the labeling time of the picture and text detection, and improve the efficiency of the picture and text detection.
  • the above-mentioned calculation of the loss function of the basic labeling model according to the labeled text coordinates includes:
  • the training picture is labeled according to a preset labeling tool, and the labeled text coordinates of the training picture are obtained. Calculate the squared difference between the initial text coordinates and the labeled text coordinates, and then calculate the loss function of the basic labeling model according to the squared difference.
  • the loss function of the basic labeling model is calculated as L = Σ_k (x_k − x̂_k)², where x_k is the initial text coordinate and x̂_k is the labeled text coordinate.
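  • assuming the loss is the sum of squared coordinate differences, as the description suggests, a minimal sketch:

```python
def labeling_loss(initial_coords, labeled_coords):
    # Squared difference between each initial text coordinate predicted
    # by the basic labeling model and its manually labeled counterpart,
    # summed over all coordinates; training stops when this converges.
    return sum((a - b) ** 2 for a, b in zip(initial_coords, labeled_coords))
```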
  • the training time of the basic labeling model is saved, and the training efficiency of the basic labeling model is improved.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • the present application provides an embodiment of a picture and text detection device, and the device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the image and text detection apparatus 300 in this embodiment includes: a detection module 301, a labeling module 302, a confirmation module 303, and an extraction module 304, wherein:
  • a detection module 301 configured to calculate the complexity of the target detection picture according to a preset detection model when the target detection picture is received;
  • the detection module 301 includes:
  • a first computing unit used for inputting the target detection picture to the convolution layer of the preset detection model, and outputting the detection result value through the pooling layer and the fully connected layer;
  • the second calculation unit is configured to predict the detection result value according to the preset binary classification loss function, and obtain the complexity of the target detection picture.
  • the target detection picture is a detection picture including target text
  • the complexity of the target detection picture is calculated according to a preset detection model
  • the preset detection model is a preset picture complexity detection model, such as a lightweight convolutional neural network discriminant model based on VGG16.
  • input the target detection picture into the preset detection model, compute over the length, width and channel number of the target detection picture through the convolution layer, pooling layer and fully connected layer of the preset detection model, and output the detection result value of the target detection picture; the detection result value is then evaluated with the binary classification (two-class) loss function, that is, the complexity of the current target detection picture is obtained.
  • the labeling module 302 is configured to obtain the feature vector of the target detection picture according to the first labeling model in the preset labeling model when the complexity is low complexity, and calculate the target detection picture according to the feature vector The target text coordinates of the first text box in ;
  • the complexity can be divided into low complexity and high complexity according to a preset value: complexity less than or equal to the preset value is low complexity, and complexity greater than the preset value is high complexity.
  • the preset labeling model is a preset text coordinate detection model, including a first labeling model and a second labeling model. Detecting a low-complexity target detection picture according to the first labeling model yields the target text coordinates of the target detection picture; detecting a high-complexity target detection picture according to the second labeling model yields the detected text coordinates of the target detection picture.
  • the detected texts of the low-complexity and high-complexity target detection images can be obtained respectively.
  • the coordinates of the target text and the coordinates of the detected text are composed of the coordinates of the lower left corner, the lower right corner, the upper left corner and the upper right corner of each text box in the target detection picture.
  • a feature map of the target detection picture and a preset detection feature frame are acquired; the feature map and the detection feature frame are calculated based on the first labeling model to obtain the feature vector of the target detection picture; the feature vector is then passed through the bidirectional long short-term memory network, the fully connected layer and the regression layer in the first labeling model to output the target text coordinates of the current target detection picture.
  • the confirmation module 303 is configured to calculate the center coordinates of the target detection picture according to the target text coordinates, fuse the first text boxes whose center coordinates differ by no more than a preset error value into a new text box, and determine the first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
  • the preset error value includes a first error value and a second error value
  • the confirmation module 303 includes:
  • an acquisition unit configured to acquire the first pixel difference between the y-axis coordinates of two adjacent center coordinates, and the second pixel difference between the x-axis coordinates of the two adjacent center coordinates;
  • a confirmation unit configured to merge a first text box whose first pixel difference value is less than or equal to the first error value and the second pixel difference value is less than or equal to the second error value into a new text box.
  • the first text box is a text box obtained by detecting the target picture according to the first annotation model
  • the center coordinates are the mean coordinates of the first text boxes in each target detection picture: the x mean value and the y mean value of the target text coordinates of each first text box in the target detection picture are calculated, and the x mean value and the y mean value are used as the center coordinates of the corresponding first text box.
  • after the center coordinates corresponding to each first text box are obtained, the first text boxes whose center coordinates differ by no more than the preset error value are merged into a new text box.
  • the coordinates of the lower left corner of the new text box take the minimum x value and the minimum y value of the target text coordinates in the fused first text boxes; the coordinates of the upper right corner take the maximum x value and the maximum y value; the coordinates of the lower right corner take the maximum x value and the minimum y value; and the coordinates of the upper left corner take the minimum x value and the maximum y value of the target text coordinates in the fused first text boxes.
  • the first text box whose center coordinates are greater than the preset error value is determined as a fixed text box.
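The fusion rule described above — mean-of-corners centers, per-axis pixel-difference thresholds between adjacent centers, and min/max corner construction for the new text box — might be sketched as below. The representation of a box as four (x, y) corners, the ordering of boxes by center, and the concrete error values are illustrative assumptions, not details fixed by the embodiment.

```python
def box_center(box):
    """Center of a text box: mean x and mean y of its corner coordinates."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def fuse_boxes(boxes, err_x=10.0, err_y=5.0):
    """Merge adjacent first text boxes whose center coordinates differ by at
    most the preset error values (second pixel difference err_x on the x axis,
    first pixel difference err_y on the y axis); all other boxes stay as
    single-member groups, i.e. fixed text boxes."""
    groups = []
    for box in sorted(boxes, key=box_center):
        cx, cy = box_center(box)
        if groups:
            px, py = box_center(groups[-1][-1])
            if abs(cx - px) <= err_x and abs(cy - py) <= err_y:
                groups[-1].append(box)
                continue
        groups.append([box])
    fused = []
    for group in groups:
        xs = [p[0] for b in group for p in b]
        ys = [p[1] for b in group for p in b]
        # corner order: lower-left, lower-right, upper-right, upper-left
        fused.append([(min(xs), min(ys)), (max(xs), min(ys)),
                      (max(xs), max(ys)), (min(xs), max(ys))])
    return fused
```

Two side-by-side boxes on the same text line are merged into one wide box, while a distant box survives unchanged as a fixed text box.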
  • the extraction module 304 is configured to extract the text information in the new text box and the fixed text box, and determine that the text information is the detection text of the target detection picture.
  • the text information in the new text box and the fixed text box is extracted, and the text information is arranged in the order of the text boxes, that is, the detection text of the target detection picture is obtained.
  • the above detection text can also be stored in a node of a blockchain.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a series of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the obtaining module is configured to, when the complexity is high complexity, obtain the minimum picture corresponding to the target detection picture according to the second labeling model in the preset labeling model, and the minimum text coordinates of the second text box in the minimum picture;
  • the mapping module is used to map the minimum text coordinates in parallel to the maximum picture corresponding to the target detection picture to obtain the detected text coordinates of the target detection picture, and to obtain the corresponding detection text of the target detection picture according to the detected text coordinates.
  • the mapping module includes:
  • the mapping unit is configured to obtain a preset mapping ratio, and enlarge the minimum text coordinates in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection image.
  • the minimum text coordinates of the second text box in the minimum picture corresponding to the target detection picture are obtained according to the second labeling model in the preset labeling model.
  • the second text box is a text box obtained by detecting the target detection picture according to the second labeling model
  • the second labeling model is a pre-trained high-complexity labeling model.
  • the minimum picture corresponding to the target detection picture is obtained according to the second labeling model, and the minimum picture is the minimum picture after scaling the target detection picture.
  • the second labeling model can perform pixel scaling on the target detection picture to obtain the minimum picture.
  • the second text box in the minimum picture is detected based on the second labeling model, thereby obtaining the minimum text coordinates corresponding to the second text box in the minimum picture.
  • after the minimum text coordinates are obtained, the minimum text coordinates are mapped to the maximum picture corresponding to the target detection picture, that is, all the obtained minimum text coordinates are simultaneously enlarged according to the preset mapping ratio between the minimum picture and the maximum picture to obtain the detected text coordinates; obtaining the text content within the detected text coordinates yields the detection text of the target detection picture.
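The parallel coordinate mapping above reduces to elementwise scaling of every corner by the preset mapping ratio. The sketch below assumes boxes are lists of (x, y) tuples and that a single scalar ratio relates the minimum picture to the maximum picture; the embodiment does not fix these representational details.

```python
def map_to_max_picture(min_text_coords, mapping_ratio):
    """Enlarge every minimum text coordinate by the preset mapping ratio
    between the minimum picture and the maximum picture, yielding the
    detected text coordinates in the maximum picture."""
    return [[(x * mapping_ratio, y * mapping_ratio) for (x, y) in box]
            for box in min_text_coords]
```

For example, with a mapping ratio of 4, a box detected at (1, 2)–(3, 4) in the minimum picture maps to (4, 8)–(12, 16) in the maximum picture.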
  • a division module configured to obtain an initial text picture, divide the initial text picture into a training picture and a test picture, input the training picture into a preset basic labeling model, and obtain the labeling text coordinates of the training picture;
  • a training module configured to calculate the loss function of the basic labeling model according to the coordinates of the labeling text, and when the loss function converges, determine that the basic labeling model is the trained basic labeling model;
  • a verification module configured to verify the trained basic labeling model according to the test picture, and determine that the trained basic labeling model is the preset labeling model.
  • the training module includes:
  • a labeling unit configured to label the training picture based on a preset labeling tool to obtain initial text coordinates of the training picture
  • the third calculation unit is configured to calculate the squared difference between the initial text coordinates and the labeled text coordinates, and calculate the loss function of the basic labeling model according to the squared difference.
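The squared-difference computation of the third calculation unit might be sketched as below. Averaging the squared differences over all coordinate values is an assumption for illustration; the embodiment only states that the loss function is calculated from the squared differences between the two sets of text coordinates.

```python
def squared_difference_loss(initial_coords, labeled_coords):
    """Loss of the basic labeling model: mean of the squared differences
    between the initial (tool-labeled) and the model's labeled text
    coordinates, taken over every x and y value of every box."""
    diffs = [(p - q) ** 2
             for pred_box, label_box in zip(initial_coords, labeled_coords)
             for (px, py), (qx, qy) in zip(pred_box, label_box)
             for p, q in ((px, qx), (py, qy))]
    return sum(diffs) / len(diffs)
```

Training then iterates until this loss converges, at which point the trained basic labeling model is obtained.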
  • a basic annotation model needs to be established in advance, and the basic annotation model is trained to obtain the preset annotation model.
  • the preset labeling model includes a first labeling model and a second labeling model.
  • the first labeling model is used for processing low-complexity target detection pictures
  • the second labeling model is used for processing high-complexity target detection pictures.
  • the first labeling model and the second labeling model have different network structures, but both can be trained by the same training method. Specifically, an initial text picture is obtained, where the initial text picture comprises a plurality of pre-collected text pictures, and the initial text picture is divided into a training picture and a test picture.
  • the initial text coordinates of the training image are detected based on a basic labeling model, where the basic labeling model may be the network structure of the first labeling model or the network structure of the second labeling model.
  • the initial text coordinates of the training picture are obtained, and at the same time, the training picture is labelled according to the preset labeling tool, and the labelled text coordinates of the training picture are obtained.
  • the basic labeling model is trained according to the initial text coordinates and the labeling text coordinates, that is, the loss function of the basic labeling model is calculated according to the labeling text coordinates and the initial text coordinates. When the loss function converges, the trained basic labeling model is obtained.
  • the trained basic labeling model is tested according to the test picture. If the similarity between the text coordinates detected by the trained basic labeling model and the labeled text coordinates corresponding to the test picture is greater than or equal to the preset similarity threshold, it is determined that the trained basic labeling model passes verification for that test picture. When the verification pass rate of the trained basic labeling model over the test pictures is greater than or equal to the preset pass rate, the trained basic labeling model is determined to be the preset labeling model.
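The two-level verification described above — a per-picture similarity threshold, then an overall pass rate compared against a preset pass rate — might be sketched as follows. The similarity measure and the concrete threshold values are illustrative assumptions; the embodiment leaves them as preset parameters.

```python
def verify_model(similarities, sim_threshold=0.9, pass_rate_threshold=0.95):
    """A test picture passes when the similarity between the detected and
    labeled text coordinates reaches the similarity threshold; the trained
    model becomes the preset labeling model when the fraction of passing
    test pictures reaches the preset pass rate."""
    passed = sum(1 for s in similarities if s >= sim_threshold)
    pass_rate = passed / len(similarities)
    return pass_rate >= pass_rate_threshold
```

A model that passes on every test picture is accepted; one that fails on most of them is sent back for further training.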
  • the image text detection device proposed in this embodiment realizes text detection on images of different complexity, reduces manual labeling costs, saves the response time of model processing, and further improves the efficiency and accuracy of image text detection.
  • FIG. 4 is a block diagram of a basic structure of a computer device according to this embodiment.
  • the computer device 6 includes a memory 61 , a processor 62 , and a network interface 63 that communicate with each other through a system bus. It should be pointed out that only the computer device 6 with components 61-63 is shown in the figure, but it should be understood that it is not required to implement all of the shown components, and more or less components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 61 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the memory 61 may be an internal storage unit of the computer device 6 , such as a hard disk or a memory of the computer device 6 .
  • the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device.
  • the memory 61 is generally used to store the operating system and various application software installed on the computer device 6 , such as computer-readable instructions of a picture and text detection method.
  • the memory 61 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 62 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. This processor 62 is typically used to control the overall operation of the computer device 6 . In this embodiment, the processor 62 is configured to execute computer-readable instructions or process data stored in the memory 61, for example, computer-readable instructions for executing the image and text detection method.
  • the network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
  • the computer device proposed in this embodiment realizes text detection for pictures of different complexity, reduces manual labeling costs, saves response time for model processing, and further improves the efficiency and accuracy of picture text detection.
  • the present application also provides another embodiment, that is, to provide a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to execute the steps of the above-mentioned picture text detection method.
  • the computer-readable storage medium proposed in this embodiment realizes text detection for pictures of different complexity, reduces manual labeling costs, saves the response time of model processing, and further improves the efficiency and accuracy of picture text detection.
  • the method of the above embodiment can be implemented by means of software plus a necessary general hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application can be embodied in the form of a software product in essence or in a part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of this application.


Abstract

A text detection method and apparatus from an image, a computer device and a storage medium, relating to the field of artificial intelligence. The method comprises: when receiving a target detection image, calculating the complexity of the target detection image according to a preconfigured detection model; when the complexity is a low complexity, calculating target text coordinates of first text boxes in the target detection image according to a first marking model in preconfigured marking models; calculating center coordinates of the target detection image according to the target text coordinates, fusing the first text boxes of which the center coordinates are less than or equal to a preset error value into a new text box, and determining the first text boxes of which the center coordinates are greater than the preset error value as fixed text boxes; and extracting text information from the new text box and the fixed text boxes, and determining the text information as detected texts. The method realizes efficient text detection from images.

Description

Image text detection method, apparatus, computer device and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 17, 2020, with application number 202011286320.X and entitled "Image text detection method, apparatus, computer device and storage medium", the entire content of which is incorporated herein by reference.

Technical Field

The present application relates to the technical field of artificial intelligence, and in particular to an image text detection method, apparatus, computer device and storage medium.

Background

With the rapid development of target detection technology, text detection methods from target detection technology are used in more and more fields, such as Alipay's Fu-character scanning, ID card recognition and so on. By recognizing the text in a picture, the information in the picture can be extracted.

At present, algorithms based on FPN (feature pyramid networks) have a poor detection effect on small and dense text, while pixel-level algorithms have relatively high accuracy but long model processing times, which makes it difficult to meet industrial needs. In addition, the inventor realized that image text detection is mainly intended to extract useful information from a picture, such as name, address and account information fields, so as to facilitate the subsequent storage of these parameters in a database and provide data for a subsequent risk control system. However, a picture may contain a great deal of information, and a more complex picture may contain more than one hundred fields; when text detection is performed on such pictures by the existing technology, the problem of low text detection efficiency often arises.
SUMMARY OF THE INVENTION

The purpose of the embodiments of the present application is to provide an image text detection method, apparatus, computer device and storage medium, so as to solve the technical problem of low efficiency of image text detection.

In order to solve the above technical problem, an embodiment of the present application provides an image text detection method, which adopts the following technical solution:

when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;

when the complexity is low complexity, obtaining the feature vector of the target detection picture according to a first labeling model in preset labeling models, and calculating the target text coordinates of first text boxes in the target detection picture according to the feature vector;

calculating the center coordinates of the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates are less than or equal to a preset error value into a new text box, and determining first text boxes whose center coordinates are greater than the preset error value as fixed text boxes;

extracting the text information in the new text box and the fixed text boxes, and determining the text information as the detection text of the target detection picture.
In order to solve the above technical problem, an embodiment of the present application further provides an image text detection apparatus, which adopts the following technical solution:

a detection module, configured to calculate the complexity of a target detection picture according to a preset detection model when the target detection picture is received;

a labeling module, configured to obtain the feature vector of the target detection picture according to a first labeling model in preset labeling models when the complexity is low complexity, and calculate the target text coordinates of first text boxes in the target detection picture according to the feature vector;

a confirmation module, configured to calculate the center coordinates of the target detection picture according to the target text coordinates, fuse first text boxes whose center coordinates are less than or equal to a preset error value into a new text box, and determine first text boxes whose center coordinates are greater than the preset error value as fixed text boxes;

an extraction module, configured to extract the text information in the new text box and the fixed text boxes, and determine the text information as the detection text of the target detection picture.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;

when the complexity is low complexity, obtaining the feature vector of the target detection picture according to a first labeling model in preset labeling models, and calculating the target text coordinates of first text boxes in the target detection picture according to the feature vector;

calculating the center coordinates of the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates are less than or equal to a preset error value into a new text box, and determining first text boxes whose center coordinates are greater than the preset error value as fixed text boxes;

extracting the text information in the new text box and the fixed text boxes, and determining the text information as the detection text of the target detection picture.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, cause the processor to further perform the following steps:

when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;

when the complexity is low complexity, obtaining the feature vector of the target detection picture according to a first labeling model in preset labeling models, and calculating the target text coordinates of first text boxes in the target detection picture according to the feature vector;

calculating the center coordinates of the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates are less than or equal to a preset error value into a new text box, and determining first text boxes whose center coordinates are greater than the preset error value as fixed text boxes;

extracting the text information in the new text box and the fixed text boxes, and determining the text information as the detection text of the target detection picture.
In the above image text detection method, when a target detection picture is received, the complexity of the target detection picture is calculated according to a preset detection model; based on the complexity, a model can be selected for the target detection picture, so that targeted text detection is performed on the target detection picture, which improves the efficiency of image text detection. Then, when the complexity is low complexity, the feature vector of the target detection picture is obtained according to the first labeling model in the preset labeling models, and the target text coordinates of the first text boxes in the target detection picture are calculated according to the feature vector; through the target text coordinates, the text information of the target detection picture can be precisely located. Afterwards, the center coordinates of the target detection picture are calculated according to the target text coordinates, the first text boxes whose center coordinates are less than or equal to a preset error value are fused into a new text box, and the first text boxes whose center coordinates are greater than the preset error value are determined as fixed text boxes, thereby avoiding erroneous splitting of text in low-complexity pictures and improving the accuracy of image text detection. Finally, the text information in the new text box and the fixed text boxes is extracted and determined as the detection text of the target detection picture, which realizes text detection for pictures of different complexity, reduces manual labeling costs, saves the response time of model processing, and further improves the efficiency and accuracy of image text detection.
Brief Description of the Drawings

In order to illustrate the solutions in the present application more clearly, the accompanying drawings used in the description of the embodiments of the present application are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of an embodiment of the image text detection method according to the present application;

FIG. 3 is a schematic structural diagram of an embodiment of the image text detection apparatus according to the present application;

FIG. 4 is a schematic structural diagram of an embodiment of the computer device according to the present application.

Reference numerals: image text detection apparatus 300, detection module 301, labeling module 302, confirmation module 303 and extraction module 304.
具体实施方式Detailed ways
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同;本文中在申请的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本申请;本申请的说明书和权利要求书及上述附图说明中的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。本申请的说明书和权利要求书或上述附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field of this application; the terms used herein in the specification of the application are for the purpose of describing specific embodiments only It is not intended to limit the application; the terms "comprising" and "having" and any variations thereof in the description and claims of this application and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or the above drawings are used to distinguish different objects, rather than to describe a specific order.
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor a separate or alternative embodiment that is mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.
为了使本技术领域的人员更好地理解本申请方案,下面将结合附图,对本申请实施例中的技术方案进行清楚、完整地描述。In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.
如图1所示,系统架构100可以包括终端设备101、102、103,网络104和服务器105。网络104用以在终端设备101、102、103和服务器105之间提供通信链路的介质。网络104可以包括各种连接类型,例如有线、无线通信链路或者光纤电缆等等。As shown in FIG. 1 , the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 . The network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 . The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
用户可以使用终端设备101、102、103通过网络104与服务器105交互,以接收或发送消息等。终端设备101、102、103上可以安装有各种通讯客户端应用,例如网页浏览器应用、购物类应用、搜索类应用、即时通信工具、邮箱客户端、社交平台软件等。The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
终端设备101、102、103可以是具有显示屏并且支持网页浏览的各种电子设备,包括但不限于智能手机、平板电脑、电子书阅读器、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、膝上型便携计算机和台式计算机等等。The terminal devices 101, 102 and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
服务器105可以是提供各种服务的服务器,例如对终端设备101、102、103上显示的页面提供支持的后台服务器。The server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101 , 102 , and 103 .
需要说明的是,本申请实施例所提供的图片文字检测方法一般由服务器/终端设备执行,相应地,图片文字检测装置一般设置于服务器/终端设备中。It should be noted that the image and text detection methods provided in the embodiments of the present application are generally executed by a server/terminal device, and correspondingly, the image and text detection apparatus is generally set in the server/terminal device.
应该理解,图1中的终端设备、网络和服务器的数目仅仅是示意性的。根据实现需要,可以具有任意数目的终端设备、网络和服务器。It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
继续参考图2,示出了根据本申请的图片文字检测的方法的一个实施例的流程图。所述的图片文字检测方法,包括以下步骤:Continuing to refer to FIG. 2 , a flowchart of an embodiment of a method for detecting text in pictures according to the present application is shown. The image and text detection method includes the following steps:
步骤S201,在接收到目标检测图片时,根据预设检测模型计算所述目标检测图片的复杂度;Step S201, when receiving the target detection picture, calculate the complexity of the target detection picture according to a preset detection model;
在本实施例中,目标检测图片为包括有目标文本的检测图片,根据预设检测模型计算该目标检测图片的复杂度;其中,预设检测模型为预先设定的图片复杂度检测模型,如基于VGG16的轻量级卷积神经网络判别模型。具体地,将目标检测图片输入至该预设检测模型中,基于该预设检测模型的卷积层、池化层和全连接层对该目标检测图片的长、宽、通道数进行计算,输出得到该目标检测图片的检测结果值;之后根据二分类损失函数对该检测结果值进行计算,即得到当前目标检测图片的复杂度。In this embodiment, the target detection picture is a detection picture that includes target text, and the complexity of the target detection picture is calculated according to a preset detection model, where the preset detection model is a preset picture-complexity detection model, such as a lightweight convolutional neural network discrimination model based on VGG16. Specifically, the target detection picture is input into the preset detection model; the length, width and number of channels of the target detection picture are processed by the convolution layers, pooling layers and fully connected layers of the preset detection model, and the detection result value of the target detection picture is output. The detection result value is then evaluated with the two-class classification loss function to obtain the complexity of the current target detection picture.
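The final step of step S201 can be sketched as follows. This is a minimal illustration, not the patent's actual model: it assumes the network's detection result value is a single scalar score, that the two-class loss function corresponds to a sigmoid probability, and that a hypothetical threshold of 0.5 separates low from high complexity (the patent only states that a preset value is used).

```python
import math

def complexity_from_score(score: float) -> float:
    # Map the raw detection result value output by the network to a
    # complexity p in [0, 1], as a two-class (sigmoid) classifier would.
    return 1.0 / (1.0 + math.exp(-score))

def is_low_complexity(p: float, preset_value: float = 0.5) -> bool:
    # Complexity less than or equal to the preset value counts as low
    # complexity; the value 0.5 is an assumed placeholder.
    return p <= preset_value
```

A picture whose score maps to p above the preset value would then be routed to the second labeling model instead of the first.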
步骤S202,在所述复杂度为低复杂度时,根据预设标注模型中的第一标注模型获取所述目标检测图片的特征向量,根据所述特征向量计算得到所述目标检测图片中第一文本框的目标文本坐标;Step S202, when the complexity is low complexity, obtain the feature vector of the target detection picture according to the first labeling model in the preset labeling model, and calculate the target text coordinates of the first text box in the target detection picture according to the feature vector;
在本实施例中,将复杂度按照预设值可以划分为低复杂度和高复杂度,小于等于该预设值的复杂度即为低复杂度,大于该预设值的复杂度即为高复杂度。在目标检测图片的复杂度为低复杂度时,根据预设标注模型中的第一标注模型获取目标检测图片的目标文本坐标。其中,预设标注模型为预先设定的文本坐标检测模型,包括第一标注模型和第二标注模型。根据第一标注模型对低复杂度的目标检测图片进行检测,可以得到目标检测图片的目标文本坐标;根据第二标注模型可以对高复杂度的目标检测图片进行检测,可以得到目标检测图片的检测文本坐标,根据该目标文本坐标和检测文本坐标则可以分别得到低复杂度和高复杂度目标检测图片的检测文本。具体地,目标文本坐标和检测文本坐标均由目标检测图片中各个文本框的左下角、右下角、左上角和右上角坐标组成。在目标检测图片的复杂度为低复杂度时,获取该目标检测图片的特征图,以及预设的检测特征框。基于第一标注模型对该特征图与检测特征框进行计算,得到目标检测图片的特征向量;将该特征向量经过该第一标注模型中的双向长短期记忆网络、全连接层和回归层,输出得到当前目标检测图片的目标文本坐标。In this embodiment, the complexity can be divided into low complexity and high complexity according to a preset value: a complexity less than or equal to the preset value is low complexity, and a complexity greater than the preset value is high complexity. When the complexity of the target detection picture is low complexity, the target text coordinates of the target detection picture are acquired according to the first labeling model in the preset labeling model. The preset labeling model is a preset text-coordinate detection model, and includes a first labeling model and a second labeling model. By detecting a low-complexity target detection picture with the first labeling model, the target text coordinates of the target detection picture can be obtained; by detecting a high-complexity target detection picture with the second labeling model, the detected text coordinates of the target detection picture can be obtained. From the target text coordinates and the detected text coordinates, the detection texts of low-complexity and high-complexity target detection pictures can be obtained respectively. Specifically, both the target text coordinates and the detected text coordinates consist of the lower-left, lower-right, upper-left and upper-right corner coordinates of each text box in the target detection picture. When the complexity of the target detection picture is low complexity, a feature map of the target detection picture and preset detection feature boxes are acquired. The feature map and the detection feature boxes are processed by the first labeling model to obtain the feature vector of the target detection picture; the feature vector is then passed through the bidirectional long short-term memory network, the fully connected layer and the regression layer in the first labeling model, and the target text coordinates of the current target detection picture are output.
步骤S203,根据所述目标文本坐标,计算所述目标检测图片的中心坐标,将所述中心坐标小于等于预设误差值的第一文本框融合为新文本框,将所述中心坐标大于所述预设误差值的第一文本框确定为固定文本框;Step S203: Calculate the center coordinates of the target detection picture according to the target text coordinates, fuse the first text box whose center coordinates are less than or equal to a preset error value into a new text box, and set the center coordinates greater than the The first text box of the preset error value is determined as a fixed text box;
在本实施例中,第一文本框为根据第一标注模型对目标图片进行检测得到的文本框,中心坐标为每个目标检测图片中第一文本框的均值坐标。计算目标检测图片中每个第一文本框的目标文本坐标的x均值和y均值,将x均值和y均值作为对应的第一文本框的中心坐标。在得到每个第一文本框对应的中心坐标时,将中心坐标小于等于预设误差值的第一文本框融合为一个新文本框。其中,新文本框的左下角坐标取所融合的第一文本框中目标文本坐标的最小x值和最小y值,新文本框的右上角坐标取所融合的第一文本框中目标文本坐标的最大x值和最大y值,新文本框的右下角坐标取所融合的第一文本框中目标文本坐标的最大x值和最小y值,新文本框的左上角坐标取所融合的第一文本框中目标文本坐标的最小x值和最大y值。将中心坐标大于预设误差值的第一文本框则确定为固定文本框。In this embodiment, the first text boxes are the text boxes obtained by detecting the target picture according to the first labeling model, and the center coordinates are the mean coordinates of each first text box in the target detection picture. The x mean and y mean of the target text coordinates of each first text box in the target detection picture are calculated, and the x mean and y mean are taken as the center coordinates of the corresponding first text box. When the center coordinates corresponding to each first text box are obtained, the first text boxes whose center coordinates differ by no more than the preset error value are fused into one new text box. The lower-left corner coordinates of the new text box take the minimum x value and minimum y value of the target text coordinates in the fused first text boxes; the upper-right corner takes the maximum x value and maximum y value; the lower-right corner takes the maximum x value and minimum y value; and the upper-left corner takes the minimum x value and maximum y value. A first text box whose center coordinates exceed the preset error value is determined as a fixed text box.
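The center calculation and corner rule described above can be sketched as follows. This is a simplified illustration assuming each text box is given as a list of its four (x, y) corner coordinates; which boxes belong in one fused group is decided elsewhere by the preset error value.

```python
def box_center(corners):
    # Center of a text box given its four corner coordinates [(x, y), ...]:
    # the mean x value and mean y value, as described above.
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def fuse_boxes(boxes):
    # Fuse a group of first text boxes into one new text box.  The new
    # corners follow the rule above: bottom-left = (min x, min y),
    # top-right = (max x, max y), bottom-right = (max x, min y),
    # top-left = (min x, max y).
    xs = [x for box in boxes for x, _ in box]
    ys = [y for box in boxes for _, y in box]
    return {
        "bottom_left": (min(xs), min(ys)),
        "bottom_right": (max(xs), min(ys)),
        "top_left": (min(xs), max(ys)),
        "top_right": (max(xs), max(ys)),
    }
```

A box that ends up in a group by itself is simply kept as a fixed text box.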
步骤S204,提取所述新文本框和所述固定文本框中的文本信息,确定所述文本信息为所述目标检测图片的检测文本。Step S204: Extract the text information in the new text box and the fixed text box, and determine that the text information is the detection text of the target detection picture.
在本实施例中,在得到新文本框和固定文本框时,提取该新文本框和固定文本框中的文本信息,将该文本信息按照文本框的排列顺序进行排列,即得到目标检测图片的检测文本。In this embodiment, when the new text boxes and the fixed text boxes are obtained, the text information in the new text boxes and the fixed text boxes is extracted, and the text information is arranged in the order of the text boxes to obtain the detection text of the target detection picture.
需要强调的是,为进一步保证上述检测文本的私密和安全性,上述检测文本还可以存储于一区块链的节点中。It should be emphasized that, in order to further ensure the privacy and security of the above detection text, the above detection text can also be stored in a node of a blockchain.
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
本实施例实现了对不同复杂度图片的文字检测,减少了人工标注成本,节省了模型处理的响应时长,进一步提高了图片文字检测的效率和准确率。This embodiment realizes text detection for pictures of different complexity, reduces the cost of manual labeling, saves the response time of model processing, and further improves the efficiency and accuracy of text detection in pictures.
在本申请的一些实施例中,上述预设误差值包括第一误差值和第二误差值,所述将所述中心坐标小于等于预设误差值的第一文本框融合为新文本框包括:In some embodiments of the present application, the preset error value includes a first error value and a second error value, and the fusion of the first text box whose center coordinates are less than or equal to the preset error value into a new text box includes:
获取相邻的两个所述中心坐标的y轴坐标的第一像素差值,以及所述中心坐标的x轴坐标的第二像素差值;Obtain the first pixel difference value of the y-axis coordinates of the two adjacent center coordinates, and the second pixel difference value of the x-axis coordinates of the center coordinates;
将所述第一像素差值小于等于所述第一误差值,且所述第二像素差值小于等于所述第二误差值的第一文本框融合为新文本框。A first text box whose first pixel difference value is less than or equal to the first error value and whose second pixel difference value is less than or equal to the second error value is merged into a new text box.
在本实施例中,预设误差值包括第一误差值和第二误差值,在得到每个第一文本框对应的中心坐标时,依次获取相邻两个中心坐标的y轴坐标之间的第一像素差值,以及该两个中心坐标的x轴坐标之间的第二像素差值。该第一像素差值即为两个中心点坐标y轴坐标之间的像素差值,第二像素差值即为两个中心点坐标x轴坐标之间的像素差值。将中心坐标之间的第一像素差值小于等于第一误差值,且第二像素差值小于等于第二误差值的第一文本框进行融合即得到新文本框。In this embodiment, the preset error value includes a first error value and a second error value. When the center coordinates corresponding to each first text box are obtained, the first pixel difference between the y-axis coordinates of two adjacent center coordinates, and the second pixel difference between the x-axis coordinates of those two center coordinates, are obtained in turn. The first pixel difference is the pixel difference between the y-axis coordinates of the two center points, and the second pixel difference is the pixel difference between their x-axis coordinates. First text boxes whose first pixel difference is less than or equal to the first error value and whose second pixel difference is less than or equal to the second error value are fused to obtain a new text box.
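The two-threshold criterion above can be expressed as a small predicate. This is a sketch assuming centers are (x, y) tuples and both error values are given in pixels.

```python
def should_fuse(center_a, center_b, err_y, err_x):
    # Adjacent first text boxes are fused when the pixel difference of
    # their center y coordinates is within the first error value AND the
    # pixel difference of their center x coordinates is within the
    # second error value.
    dy = abs(center_a[1] - center_b[1])  # first pixel difference
    dx = abs(center_a[0] - center_b[0])  # second pixel difference
    return dy <= err_y and dx <= err_x
```

Boxes on the same text line typically differ little in y and moderately in x, which is why the y and x tolerances are separate values.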
本实施例通过对文本框进行融合,实现了对误差较小的文本的组合,避免了在对低复杂度的图片进行文字检测过程中文字的错误拆分,进一步提高了图片文字检测的精确度。By fusing text boxes, this embodiment realizes the combination of texts with small errors, avoids wrongly splitting text when performing text detection on low-complexity pictures, and further improves the accuracy of picture text detection.
在本申请的一些实施例中,在上述根据预设检测模型计算所述目标检测图片的复杂度之后包括:In some embodiments of the present application, after calculating the complexity of the target detection picture according to the preset detection model, the method includes:
在所述复杂度为高复杂度时,根据预设标注模型中的第二标注模型获取所述目标检测图片对应的最小图片,以及所述最小图片中第二文本框的最小文本坐标;When the complexity is high complexity, obtain the minimum picture corresponding to the target detection picture and the minimum text coordinates of the second text box in the minimum picture according to the second labeling model in the preset labeling model;
将所述最小文本坐标并行映射至所述目标检测图片对应的最大图片,得到所述目标检测图片的检测文本坐标,根据所述检测文本坐标计算得到所述目标检测图片对应的检测文本。The minimum text coordinates are mapped to the largest picture corresponding to the target detection picture in parallel to obtain the detected text coordinates of the target detection picture, and the detected text corresponding to the target detection picture is calculated according to the detected text coordinates.
在本实施例中,在目标检测图片的复杂度为高复杂度时,根据预设标注模型中的第二标注模型获取目标检测图片对应的最小图片中第二文本框的最小文本坐标。其中,第二文本框为根据第二标注模型对目标检测图片进行检测得到的文本框,第二标注模型则为预先训练完成的高复杂度的标注模型。根据该第二标注模型获取目标检测图片对应的最小图片,该最小图片为该目标检测图片缩放后的最小图片,通过第二标注模型可以对目标检测图片进行像素缩放,从而得到最小图片。在得到最小图片时,基于该第二标注模型对该最小图片中的第二文本框进行检测,由此即可得到该最小图片中第二文本框对应的最小文本坐标。在得到该最小文本坐标时,将该最小文本坐标映射至目标检测图片对应的最大图片,即同时将得到的所有最小文本坐标按照最小图片与最大图片之间的预设映射比例进行放大,则得到检测文本坐标。获取该检测文本坐标中的文本内容,即得到该目标检测图片的检测文本。In this embodiment, when the complexity of the target detection picture is high complexity, the minimum text coordinates of the second text boxes in the minimum picture corresponding to the target detection picture are obtained according to the second labeling model in the preset labeling model. The second text boxes are the text boxes obtained by detecting the target detection picture according to the second labeling model, and the second labeling model is a pre-trained high-complexity labeling model. The minimum picture corresponding to the target detection picture is obtained according to the second labeling model; the minimum picture is the scaled-down version of the target detection picture, and the second labeling model can perform pixel scaling on the target detection picture to obtain it. When the minimum picture is obtained, the second text boxes in the minimum picture are detected based on the second labeling model, thereby obtaining the minimum text coordinates corresponding to the second text boxes in the minimum picture. When the minimum text coordinates are obtained, they are mapped to the maximum picture corresponding to the target detection picture; that is, all the obtained minimum text coordinates are simultaneously enlarged according to the preset mapping ratio between the minimum picture and the maximum picture, so as to obtain the detected text coordinates. The text content at the detected text coordinates is then acquired to obtain the detection text of the target detection picture.
本实施例通过第二标注模型对复杂度高的图片进行文字检测,实现了对复杂度高图片文字的针对性检测,进一步提高了对复杂度高图片的检测效率及准确率。This embodiment uses the second labeling model to perform text detection on pictures with high complexity, thereby realizing targeted detection of text in pictures with high complexity, and further improving the detection efficiency and accuracy of pictures with high complexity.
在本申请的一些实施例中,上述将所述最小文本坐标并行映射至所述目标检测图片对应的最大图片,得到所述目标检测图片的检测文本坐标包括:In some embodiments of the present application, the above-mentioned parallel mapping of the minimum text coordinates to the maximum picture corresponding to the target detection picture, and obtaining the detected text coordinates of the target detection picture includes:
获取预设映射比例,按照所述预设映射比例将所述最小文本坐标并行放大,得到所述目标检测图片的检测文本坐标。A preset mapping ratio is acquired, and the minimum text coordinates are enlarged in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection image.
在本实施例中,在获取复杂度高的目标检测图片对应的检测文本坐标时,可通过获取预设映射比例,根据该预设映射比例将最小文本坐标并行映射至最大图片,得到目标检测图片的检测文本坐标。具体地,预设映射比例为第二标注模型对目标检测图片进行缩放时的预设比例,比值范围为0至1,如将该预设映射比例取0.4。在得到该预设映射比例时,按照该预设映射比例,将得到的所有最小文本坐标同时进行放大,即得到目标检测图片的检测文本坐标。In this embodiment, when obtaining the text coordinates corresponding to a high-complexity target detection picture, a preset mapping ratio can be acquired, and the minimum text coordinates can be mapped to the maximum picture in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection picture. Specifically, the preset mapping ratio is the preset ratio at which the second labeling model scales the target detection picture, with a value between 0 and 1; for example, the preset mapping ratio may be 0.4. When the preset mapping ratio is obtained, all the obtained minimum text coordinates are enlarged simultaneously according to the preset mapping ratio, so as to obtain the detected text coordinates of the target detection picture.
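The parallel mapping back to the full-size picture amounts to dividing every detected coordinate by the mapping ratio. This sketch assumes the minimum picture was produced by uniformly scaling the original by the preset mapping ratio (0.4 in the example above).

```python
def map_to_original(min_coords, mapping_ratio=0.4):
    # min_coords: text coordinates detected on the shrunken (minimum)
    # picture, as a list of boxes, each box a list of (x, y) points.
    # Since the minimum picture is the original scaled by mapping_ratio,
    # dividing each coordinate by the ratio maps every box back to the
    # full-size (maximum) picture in one pass ("in parallel").
    return [[(x / mapping_ratio, y / mapping_ratio) for x, y in box]
            for box in min_coords]
```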
本实施例通过按照预设映射比例对最小文本坐标进行放大,实现了对目标检测图片的检测文本坐标的精确获取,使得通过检测文本坐标能够对目标检测图片的文本信息进行精确定位,避免了在对复杂度高的图片进行文字检测时可能出现的文字错乱。In this embodiment, by enlarging the minimum text coordinates according to the preset mapping ratio, the detected text coordinates of the target detection picture are accurately acquired, so that the text information of the target detection picture can be precisely located through the detected text coordinates, avoiding the text confusion that may occur when text detection is performed on high-complexity pictures.
在本申请的一些实施例中,上述根据预设检测模型计算所述目标检测图片的复杂度包括:In some embodiments of the present application, the above calculation of the complexity of the target detection picture according to the preset detection model includes:
输入所述目标检测图片至所述预设检测模型的卷积层,经过池化层和全连接层,输出得到检测结果值;Inputting the target detection picture to the convolution layer of the preset detection model, and outputting the detection result value through the pooling layer and the fully connected layer;
根据预设的二分类损失函数对该检测结果值进行预测,得到所述目标检测图片的复杂度。The detection result value is predicted according to a preset two-class loss function to obtain the complexity of the target detection picture.
在本实施例中,预设检测模型包括卷积层、池化层和全连接层。在得到目标检测图片时,则获取该目标检测图片的长、宽和通道数。将该目标检测图片的长、宽和通道数输入至预设检测模型中的卷积层,之后经过池化层和全连接层,输出得到目标检测图片的检测结果值。在得到检测结果值时,将该检测结果值通过预设的二分类损失函数计算,即得到当前目标检测图片的复杂度。其中,复杂度可用p表示,p的范围为0至1之间,p越大,则表示目标检测图片中的文字越小,字与字之间的间隔越小,目标检测图片的复杂度越高;p越小,则表示目标检测图片中的文字越大,字与字之间的间隔越大,目标检测图片的复杂度越低。In this embodiment, the preset detection model includes convolution layers, pooling layers and fully connected layers. When the target detection picture is obtained, its length, width and number of channels are acquired and input into the convolution layers of the preset detection model; after the pooling layers and fully connected layers, the detection result value of the target detection picture is output. When the detection result value is obtained, it is evaluated with the preset two-class classification loss function to obtain the complexity of the current target detection picture. The complexity can be denoted by p, where p ranges from 0 to 1. The larger p is, the smaller the text in the target detection picture, the smaller the spacing between characters, and the higher the complexity of the target detection picture; the smaller p is, the larger the text, the larger the spacing between characters, and the lower the complexity of the target detection picture.
本实施例通过在得到目标检测图片时,计算该目标检测图片的复杂度,实现了根据复杂度对目标检测图片的分类检测,进一步提高了目标检测图片的检测效率。In this embodiment, when the target detection picture is obtained, the complexity of the target detection picture is calculated, so as to realize the classification and detection of the target detection picture according to the complexity, and further improve the detection efficiency of the target detection picture.
在本申请的一些实施例中,在上述根据预设标注模型中的第一标注模型获取所述目标检测图片的特征向量的步骤之前还包括:In some embodiments of the present application, before the above step of acquiring the feature vector of the target detection picture according to the first annotation model in the preset annotation model, the method further includes:
获取初始文本图片,划分所述初始文本图片为训练图片和测试图片,输入所述训练图片至预设的基础标注模型中,得到所述训练图片的标注文本坐标;Obtaining an initial text picture, dividing the initial text picture into a training picture and a test picture, inputting the training picture into a preset basic labeling model, and obtaining the labeling text coordinates of the training picture;
根据所述标注文本坐标计算所述基础标注模型的损失函数,在所述损失函数收敛时,确定所述基础标注模型为训练后的基础标注模型;Calculate the loss function of the basic labeling model according to the coordinates of the labeling text, and when the loss function converges, determine that the basic labeling model is the trained basic labeling model;
根据所述测试图片对所述训练后的基础标注模型进行验证,在所述训练后的基础标注模型对所述测试图片的验证通过率大于等于预设通过率时,确定所述训练后的基础标注模型为预设标注模型。The trained basic labeling model is verified according to the test pictures, and when the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to a preset pass rate, the trained basic labeling model is determined to be the preset labeling model.
在本实施例中,在根据预设标注模型对目标检测图片进行标注之前,需要预先建立基础标注模型,对该基础标注模型进行训练,得到预设标注模型。预设标注模型包括第一标注模型和第二标注模,第一标注模型用于处理低复杂度的目标检测图片,第二标注模型用于处理高复杂度的目标检测图片。第一标注模型和第二标注模型具有不同的网络结构,但第一标注模型和第二标注模型均可通过同样的训练方式训练得到。具体地,获取初始文本图片,初始文本图片为预先采集的多张文本图片,将该初始文本图片划分为训练图片和测试图片。基于基础标注模型对训练图片的初始文本坐标进行检测,该基础标注模型可以为第一标注模型的网络结构也可以为第二标注模型的网络结构。根据该基础标注模型检测得到训练图片的初始文本坐标,同时根据预设标注工具对训练图片进行标注,得到训练图片的标注文本坐标。根据该初始文本坐标和标注文本坐标对基础标注模型进行训练,即根据该标注文本坐标与初始文本坐标计算基础标注模型的损失函数,在该损失函数收敛时,则得到训练后的基础标注模型。在得到训练后的基础标注模型时,根据测试图片对该训练后的基础标注模型进行测试。若测试图片通过该训练后的基础标注模型检测得到的初始文本坐标,与该测试图片对应的标注文本坐标的相似度大于等于预设相似阈值,则确定该训练后的基础标注模型对该测试图片验证通过。在该训练后的基础标注模型对测试图片的验证通过率大于等于预设通过率时,确定训练后的基础标注模型为预设标注模型。In this embodiment, before annotating the target detection picture according to the preset annotation model, a basic annotation model needs to be established in advance, and the basic annotation model is trained to obtain the preset annotation model. The preset labeling model includes a first labeling model and a second labeling model. The first labeling model is used for processing low-complexity target detection pictures, and the second labeling model is used for processing high-complexity target detection pictures. The first labeling model and the second labeling model have different network structures, but both the first labeling model and the second labeling model can be trained by the same training method. Specifically, an initial text picture is obtained, where the initial text picture is a plurality of pre-collected text pictures, and the initial text picture is divided into a training picture and a test picture. The initial text coordinates of the training picture are detected based on a basic labeling model, where the basic labeling model may be the network structure of the first labeling model or the network structure of the second labeling model. 
According to the basic labeling model, the initial text coordinates of the training pictures are detected; at the same time, the training pictures are labeled with a preset labeling tool to obtain the labeled text coordinates of the training pictures. The basic labeling model is trained according to the initial text coordinates and the labeled text coordinates, that is, the loss function of the basic labeling model is calculated from the labeled text coordinates and the initial text coordinates; when the loss function converges, the trained basic labeling model is obtained. When the trained basic labeling model is obtained, it is tested with the test pictures. If the similarity between the initial text coordinates detected by the trained basic labeling model for a test picture and the labeled text coordinates corresponding to that test picture is greater than or equal to a preset similarity threshold, it is determined that the trained basic labeling model passes verification on that test picture. When the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to the preset pass rate, the trained basic labeling model is determined to be the preset labeling model.
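The acceptance check at the end of training can be sketched as follows. The per-picture similarity scores are assumed to be precomputed (the patent does not specify the similarity measure), and both threshold values are hypothetical placeholders for the preset similarity threshold and preset pass rate.

```python
def model_passes_validation(similarities, similarity_threshold=0.9,
                            preset_pass_rate=0.95):
    # similarities: one similarity score per test picture, comparing the
    # text coordinates detected by the trained model with the labeled
    # text coordinates for that picture.
    passed = sum(1 for s in similarities if s >= similarity_threshold)
    pass_rate = passed / len(similarities)
    # The trained model becomes the preset labeling model only when the
    # verification pass rate reaches the preset pass rate.
    return pass_rate >= preset_pass_rate
```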
本实施例通过预先对基础标注模型进行训练,使得训练得到的预设标注模型能够对图片进行精确地文字检测,节省了图片文字检测的标注时长,提高了图片文字检测的效率。In this embodiment, the basic labeling model is trained in advance, so that the preset labeling model obtained by training can accurately detect the text of the picture, save the labeling time of the picture and text detection, and improve the efficiency of the picture and text detection.
在本申请的一些实施例中,上述根据所述标注文本坐标计算所述基础标注模型的损失函数包括:In some embodiments of the present application, the above-mentioned calculation of the loss function of the basic annotation model according to the coordinates of the annotation text includes:
基于预设标注工具对所述训练图片进行标注,得到所述训练图片的初始文本坐标;Annotate the training picture based on a preset labeling tool to obtain initial text coordinates of the training picture;
计算所述初始文本坐标和所述标注文本坐标的平方差,根据所述平方差计算得到所述基础标注模型的损失函数。Calculate the squared difference between the initial text coordinates and the labeled text coordinates, and calculate the loss function of the basic labeling model according to the squared difference.
在本实施例中,在得到训练图片的初始文本坐标时,根据预设标注工具对训练图片进行标注,得到该训练图片的标注文本坐标。计算该初始文本坐标和标注文本坐标的平方差,根据平方差即可计算得到该基础标注模型的损失函数。该基础标注模型的损失函数的计算公式如下所示:In this embodiment, when the initial text coordinates of the training picture are obtained, the training picture is labeled according to a preset labeling tool, and the labeled text coordinates of the training picture are obtained. Calculate the squared difference between the initial text coordinates and the labeled text coordinates, and then calculate the loss function of the basic labeling model according to the squared difference. The calculation formula of the loss function of the basic annotation model is as follows:
L = ∑_k (ο_k − ô_k)²
其中,ο_k为初始文本坐标,ô_k为标注文本坐标。where ο_k is the initial text coordinate and ô_k is the labeled text coordinate.
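The squared-difference loss can be computed directly from the two coordinate lists. This is a sketch assuming both sets of text coordinates are flattened into equal-length sequences of numbers.

```python
def squared_difference_loss(initial_coords, labeled_coords):
    # Sum of squared differences between the initial text coordinates
    # detected by the basic labeling model and the labeled text
    # coordinates produced by the labeling tool.
    assert len(initial_coords) == len(labeled_coords)
    return sum((o - o_hat) ** 2
               for o, o_hat in zip(initial_coords, labeled_coords))
```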
本实施例通过对基础标注模型的损失函数进行计算,节省了基础标注模型的训练时长,提高了基础标注模型的训练效率。In this embodiment, by calculating the loss function of the basic labeling model, the training time of the basic labeling model is saved, and the training efficiency of the basic labeling model is improved.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,该计算机可读指令可存储于一计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, and the computer-readable instructions can be stored in a computer-readable storage medium. , when the computer-readable instructions are executed, the processes of the above-mentioned method embodiments may be included. Wherein, the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
应该理解的是,虽然附图的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,其可以以其他的顺序执行。而且,附图的流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,其执行顺序也不必然是依次进行,而是可以与其他步骤或者其他步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the various steps in the flowchart of the accompanying drawings are sequentially shown in the order indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order and may be performed in other orders. Moreover, at least a part of the steps in the flowchart of the accompanying drawings may include multiple sub-steps or multiple stages, and these sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and the execution sequence is also It does not have to be performed sequentially, but may be performed alternately or alternately with other steps or at least a portion of sub-steps or stages of other steps.
进一步参考图3,作为对上述图2所示方法的实现,本申请提供了一种图片文字检测装置的一个实施例,该装置实施例与图2所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。Referring further to FIG. 3, as an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of a picture text detection apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be specifically applied to various electronic devices.
如图3所示,本实施例所述的图片文字检测装置300包括:检测模块301、标注模块302、确认模块303以及提取模块304。其中:As shown in FIG. 3, the picture text detection apparatus 300 in this embodiment includes: a detection module 301, a labeling module 302, a confirmation module 303, and an extraction module 304. Wherein:
检测模块301,用于在接收到目标检测图片时,根据预设检测模型计算所述目标检测图片的复杂度;A detection module 301, configured to calculate the complexity of the target detection picture according to a preset detection model when the target detection picture is received;
其中,所述检测模块301包括:Wherein, the detection module 301 includes:
第一计算单元,用于输入所述目标检测图片至所述预设检测模型的卷积层,经过池化层和全连接层,输出得到检测结果值;a first computing unit, used for inputting the target detection picture to the convolution layer of the preset detection model, and outputting the detection result value through the pooling layer and the fully connected layer;
第二计算单元,用于根据预设的二分类损失函数对该检测结果值进行预测,得到所述目标检测图片的复杂度。The second calculation unit is configured to predict the detection result value according to the preset binary classification loss function, and obtain the complexity of the target detection picture.
在本实施例中,目标检测图片为包括有目标文本的检测图片,根据预设检测模型计算该目标检测图片的复杂度;其中,预设检测模型为预先设定的图片复杂度检测模型,如基于VGG16的轻量级卷积神经网络判别模型。具体地,将目标检测图片输入至该预设检测模型中,基于该预设检测模型的卷积层、池化层和全连接层对该目标检测图片的长、宽、通道数进行计算,输出得到该目标检测图片的检测结果值;之后根据二分类损失函数对该检测结果值进行计算,即得到当前目标检测图片的复杂度。In this embodiment, the target detection picture is a detection picture that includes target text, and the complexity of the target detection picture is calculated according to a preset detection model, where the preset detection model is a preset picture-complexity detection model, such as a lightweight convolutional neural network discrimination model based on VGG16. Specifically, the target detection picture is input into the preset detection model; the length, width and number of channels of the target detection picture are processed by the convolution layers, pooling layers and fully connected layers of the preset detection model, and the detection result value of the target detection picture is output. The detection result value is then evaluated with the two-class classification loss function to obtain the complexity of the current target detection picture.
The labeling module 302 is configured to, when the complexity is low, obtain a feature vector of the target detection picture according to a first labeling model among the preset labeling models, and to compute from this feature vector the target text coordinates of the first text boxes in the target detection picture.
In this embodiment, complexity is divided into low and high by a preset value: a complexity less than or equal to the preset value is low, and a complexity greater than the preset value is high. When the complexity of the target detection picture is low, the target text coordinates of the picture are obtained according to the first labeling model among the preset labeling models. A preset labeling model is a pre-configured text coordinate detection model; the preset labeling models include a first labeling model and a second labeling model. Detecting a low-complexity target detection picture with the first labeling model yields the target text coordinates of the picture; detecting a high-complexity target detection picture with the second labeling model yields the detected text coordinates of the picture. From the target text coordinates and the detected text coordinates, the detected text of low-complexity and high-complexity target detection pictures, respectively, can be obtained. Specifically, both the target text coordinates and the detected text coordinates consist of the lower-left, lower-right, upper-left, and upper-right corner coordinates of each text box in the target detection picture.

When the complexity of the target detection picture is low, a feature map of the target detection picture and preset detection anchor boxes are acquired. The feature map and the anchor boxes are processed by the first labeling model to obtain the feature vector of the target detection picture; this feature vector is then passed through the bidirectional long short-term memory network, the fully connected layer, and the regression layer of the first labeling model, which output the target text coordinates of the current target detection picture.
The confirmation module 303 is configured to calculate, according to the target text coordinates, the center coordinates of the first text boxes in the target detection picture; to fuse first text boxes whose center coordinates are within a preset error value of one another into a new text box; and to determine first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes.
The preset error value includes a first error value and a second error value, and the confirmation module 303 includes:

an acquisition unit, configured to acquire a first pixel difference between the y-axis coordinates of two adjacent center coordinates, and a second pixel difference between the x-axis coordinates of those center coordinates;

a confirmation unit, configured to fuse first text boxes for which the first pixel difference is less than or equal to the first error value and the second pixel difference is less than or equal to the second error value into a new text box.
In this embodiment, a first text box is a text box obtained by detecting the target picture with the first labeling model, and a center coordinate is the mean coordinate of a first text box in the target detection picture. The x mean and y mean of the target text coordinates of each first text box are calculated, and this (x mean, y mean) pair is taken as the center coordinate of the corresponding first text box. Once the center coordinate of each first text box is obtained, first text boxes whose center coordinates are within the preset error value of one another are fused into one new text box. The lower-left corner of the new text box takes the minimum x value and minimum y value of the target text coordinates of the fused first text boxes; the upper-right corner takes the maximum x value and maximum y value; the lower-right corner takes the maximum x value and minimum y value; and the upper-left corner takes the minimum x value and maximum y value. First text boxes whose center coordinates differ by more than the preset error value are determined as fixed text boxes.
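A minimal sketch of this fusion step, under stated assumptions: it uses a greedy pass over boxes ordered by center coordinate as a stand-in for the embodiment's adjacent-center comparison, and the grouping strategy and function names are illustrative. Each box is a list of four (x, y) corners; a fused (or fixed) box is rebuilt from the minimum and maximum x and y values exactly as described.

```python
def box_center(box):
    # Center coordinate of a text box: mean x and mean y of its four
    # corner coordinates [(x, y), ...], as in the embodiment.
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return (sum(xs) / 4.0, sum(ys) / 4.0)

def fuse_text_boxes(boxes, err_x, err_y):
    # Greedy fusion: walk the boxes in order of center coordinate and
    # group consecutive boxes whose centers differ by at most err_x
    # pixels in x and err_y pixels in y.  Groups of one are the "fixed"
    # text boxes; larger groups become a single new text box.
    if not boxes:
        return []
    centers = [box_center(b) for b in boxes]
    order = sorted(range(len(boxes)), key=lambda i: centers[i])
    groups, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        dx = abs(centers[cur][0] - centers[prev][0])
        dy = abs(centers[cur][1] - centers[prev][1])
        if dx <= err_x and dy <= err_y:
            current.append(cur)
        else:
            groups.append(current)
            current = [cur]
    groups.append(current)
    fused = []
    for group in groups:
        pts = [p for i in group for p in boxes[i]]
        xs, ys = [p[0] for p in pts], [p[1] for p in pts]
        # Corners built from the extreme values as described:
        # lower-left, lower-right, upper-right, upper-left
        # (y is taken to grow upward in this sketch).
        fused.append([(min(xs), min(ys)), (max(xs), min(ys)),
                      (max(xs), max(ys)), (min(xs), max(ys))])
    return fused
```

For an axis-aligned fixed box, the min/max reconstruction reproduces the original corners unchanged, so fixed and fused boxes come out in one uniform representation.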
The extraction module 304 is configured to extract the text information in the new text boxes and the fixed text boxes, and to determine this text information as the detected text of the target detection picture.

In this embodiment, once the new text boxes and fixed text boxes are obtained, the text information in them is extracted and arranged in the order in which the text boxes appear, yielding the detected text of the target detection picture.
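The embodiment only states that the extracted text follows the order of the text boxes; a common reading-order convention (top-to-bottom lines, left-to-right within a line, with y growing downward as in image coordinates) is assumed in this sketch, and the line tolerance is illustrative.

```python
def arrange_detected_text(boxes_with_text, line_tol=10):
    # boxes_with_text: list of (box, text) pairs, where box is a list of
    # (x, y) corners.  Boxes whose top edges fall in the same horizontal
    # band of height line_tol are treated as one line and read left to
    # right; lines are read top to bottom (y grows downward here).
    def reading_order(item):
        box, _ = item
        top = min(p[1] for p in box)
        left = min(p[0] for p in box)
        return (top // line_tol, left)
    ordered = sorted(boxes_with_text, key=reading_order)
    return " ".join(text for _, text in ordered)
```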
It should be emphasized that, to further ensure the privacy and security of the detected text, the detected text may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in cryptographic association with one another, each data block containing a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
The image text detection apparatus proposed in this embodiment further includes:
an acquisition module, configured to, when the complexity is high, obtain a minimum picture corresponding to the target detection picture according to a second labeling model among the preset labeling models, together with the minimum text coordinates of the second text boxes in the minimum picture;

a mapping module, configured to map the minimum text coordinates in parallel onto the maximum picture corresponding to the target detection picture to obtain the detected text coordinates of the target detection picture, and to compute the detected text corresponding to the target detection picture from the detected text coordinates.

The mapping module includes:

a mapping unit, configured to acquire a preset mapping ratio and enlarge the minimum text coordinates in parallel according to the preset mapping ratio, obtaining the detected text coordinates of the target detection picture.
In this embodiment, when the complexity of the target detection picture is high, the minimum text coordinates of the second text boxes in the minimum picture corresponding to the target detection picture are obtained according to the second labeling model among the preset labeling models. A second text box is a text box obtained by detecting the target detection picture with the second labeling model, and the second labeling model is a pre-trained labeling model for high-complexity pictures. The minimum picture corresponding to the target detection picture is obtained according to the second labeling model: it is the smallest scaled-down version of the target detection picture, produced by pixel scaling through the second labeling model. Once the minimum picture is obtained, the second text boxes in it are detected with the second labeling model, yielding the minimum text coordinates corresponding to each second text box. The minimum text coordinates are then mapped onto the maximum picture corresponding to the target detection picture; that is, all the obtained minimum text coordinates are simultaneously enlarged according to the preset mapping ratio between the minimum picture and the maximum picture, yielding the detected text coordinates. Obtaining the text content at the detected text coordinates yields the detected text of the target detection picture.
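The parallel mapping back to the maximum picture is a uniform enlargement of every coordinate by the preset mapping ratio, which can be sketched directly. The ratio is assumed here to be a single scalar; in the embodiment it is fixed by the scaling between the minimum and maximum pictures.

```python
def map_to_max_picture(min_coords, ratio):
    # Enlarge every minimum text coordinate in parallel by the preset
    # mapping ratio between the minimum and the maximum picture, giving
    # the detected text coordinates on the full-size picture.
    # min_coords: list of boxes, each box a list of (x, y) corners.
    return [[(x * ratio, y * ratio) for (x, y) in box] for box in min_coords]
```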
a division module, configured to obtain initial text pictures, divide the initial text pictures into training pictures and test pictures, and input the training pictures into a preset basic labeling model to obtain the labeled text coordinates of the training pictures;

a training module, configured to calculate a loss function of the basic labeling model according to the labeled text coordinates and, when the loss function converges, determine the basic labeling model to be the trained basic labeling model;

a verification module, configured to verify the trained basic labeling model with the test pictures and, when the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to a preset pass rate, determine the trained basic labeling model to be the preset labeling model.

The training module includes:

a labeling unit, configured to label the training pictures with a preset labeling tool to obtain the initial text coordinates of the training pictures;

a third computing unit, configured to calculate the squared difference between the initial text coordinates and the labeled text coordinates, and to compute the loss function of the basic labeling model from the squared difference.
In this embodiment, before the target detection picture is labeled according to a preset labeling model, a basic labeling model is first established and trained to obtain the preset labeling models. The preset labeling models include a first labeling model and a second labeling model: the first labeling model handles low-complexity target detection pictures, and the second labeling model handles high-complexity target detection pictures. The two models have different network structures, but both can be obtained with the same training procedure. Specifically, initial text pictures, which are a set of pre-collected text pictures, are acquired and divided into training pictures and test pictures. The basic labeling model, whose network structure may be either that of the first labeling model or that of the second labeling model, is run on the training pictures to obtain their labeled text coordinates; at the same time, the training pictures are annotated with a preset labeling tool to obtain their initial text coordinates. The basic labeling model is trained with the initial text coordinates and the labeled text coordinates; that is, its loss function is calculated from these two sets of coordinates, and when the loss function converges, the trained basic labeling model is obtained. The trained basic labeling model is then verified with the test pictures. If the similarity between the labeled text coordinates the trained basic labeling model detects for a test picture and the initial text coordinates of that test picture obtained with the labeling tool is greater than or equal to a preset similarity threshold, the trained basic labeling model is determined to have passed verification on that test picture. When the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to the preset pass rate, the trained basic labeling model is determined to be the preset labeling model.
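The loss computation described above can be sketched as follows. The embodiment specifies only the squared difference between the two sets of coordinates; the mean reduction and the flattening of boxes into a coordinate list are assumptions made for illustration.

```python
def coordinate_loss(predicted, labeled):
    # Squared-difference loss between two sets of text-box coordinates:
    # each argument is a list of boxes, each box a list of (x, y)
    # corners.  The mean reduction is an assumption; the embodiment
    # specifies only the squared difference itself.
    flat_pred = [v for box in predicted for point in box for v in point]
    flat_ref = [v for box in labeled for point in box for v in point]
    if len(flat_pred) != len(flat_ref):
        raise ValueError("coordinate sets must align box-for-box")
    diffs = [(p - r) ** 2 for p, r in zip(flat_pred, flat_ref)]
    return sum(diffs) / len(diffs)
```

Training then amounts to minimizing this value until it stops decreasing, at which point the loss is considered converged and the trained basic labeling model is fixed.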
The image text detection apparatus proposed in this embodiment achieves text detection on pictures of different complexity, reduces manual labeling costs, shortens the response time of model processing, and further improves the efficiency and accuracy of image text detection.
To solve the technical problems above, an embodiment of the present application further provides a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that communicate with one another via a system bus. It should be noted that the figure shows only a computer device 6 having components 61-63; it should be understood that implementing all of the illustrated components is not required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.

The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or other computing device. The computer device may interact with a user through a keyboard, mouse, remote control, touch pad, voice-controlled device, or the like.

The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or main memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 6. Of course, the memory 61 may also include both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 61 is generally used to store the operating system and various application software installed on the computer device 6, such as computer-readable instructions of the image text detection method. In addition, the memory 61 may also be used to temporarily store various data that have been output or are to be output.

In some embodiments, the processor 62 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions or process the data stored in the memory 61, for example, to run the computer-readable instructions of the image text detection method.

The network interface 63 may include a wireless network interface or a wired network interface and is generally used to establish communication connections between the computer device 6 and other electronic devices.
The computer device proposed in this embodiment achieves text detection on pictures of different complexity, reduces manual labeling costs, shortens the response time of model processing, and further improves the efficiency and accuracy of image text detection.

The present application further provides another embodiment: a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the image text detection method described above.

The computer-readable storage medium proposed in this embodiment achieves text detection on pictures of different complexity, reduces manual labeling costs, shortens the response time of model processing, and further improves the efficiency and accuracy of image text detection.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit the scope of its patent. The present application may be embodied in many different forms; rather, these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent substitutions for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the present application and applied, directly or indirectly, in other related technical fields falls likewise within the scope of patent protection of the present application.

Claims (20)

  1. An image text detection method, comprising the following steps:
    upon receiving a target detection picture, calculating the complexity of the target detection picture according to a preset detection model;
    when the complexity is low, obtaining a feature vector of the target detection picture according to a first labeling model among preset labeling models, and computing from the feature vector the target text coordinates of first text boxes in the target detection picture;
    calculating, according to the target text coordinates, the center coordinates of the first text boxes in the target detection picture, fusing first text boxes whose center coordinates are within a preset error value of one another into a new text box, and determining first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
    extracting the text information in the new text box and the fixed text boxes, and determining the text information to be the detected text of the target detection picture.
  2. The image text detection method according to claim 1, wherein the preset error value comprises a first error value and a second error value, and the step of fusing first text boxes whose center coordinates are within the preset error value of one another into a new text box specifically comprises:
    acquiring a first pixel difference between the y-axis coordinates of two adjacent center coordinates, and a second pixel difference between the x-axis coordinates of those center coordinates;
    fusing first text boxes for which the first pixel difference is less than or equal to the first error value and the second pixel difference is less than or equal to the second error value into a new text box.
  3. The image text detection method according to claim 1, wherein after the step of calculating the complexity of the target detection picture according to the preset detection model, the method comprises:
    when the complexity is high, obtaining, according to a second labeling model among the preset labeling models, a minimum picture corresponding to the target detection picture and the minimum text coordinates of second text boxes in the minimum picture;
    mapping the minimum text coordinates in parallel onto a maximum picture corresponding to the target detection picture to obtain detected text coordinates of the target detection picture, and computing the detected text corresponding to the target detection picture from the detected text coordinates.
  4. The image text detection method according to claim 3, wherein the step of mapping the minimum text coordinates in parallel onto the maximum picture corresponding to the target detection picture to obtain the detected text coordinates of the target detection picture specifically comprises:
    acquiring a preset mapping ratio, and enlarging the minimum text coordinates in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection picture.
  5. The image text detection method according to claim 1, wherein the step of calculating the complexity of the target detection picture according to the preset detection model specifically comprises:
    inputting the target detection picture into the convolutional layer of the preset detection model and, after the pooling layer and the fully connected layer, outputting a detection result value;
    predicting the complexity of the target detection picture from the detection result value according to a preset binary classification loss function.
  6. The image text detection method according to claim 1, wherein before the step of obtaining the feature vector of the target detection picture according to the first labeling model among the preset labeling models, the method further comprises:
    obtaining initial text pictures, dividing the initial text pictures into training pictures and test pictures, and inputting the training pictures into a preset basic labeling model to obtain labeled text coordinates of the training pictures;
    calculating a loss function of the basic labeling model according to the labeled text coordinates and, when the loss function converges, determining the basic labeling model to be a trained basic labeling model;
    verifying the trained basic labeling model with the test pictures and, when the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to a preset pass rate, determining the trained basic labeling model to be the preset labeling model.
  7. The image text detection method according to claim 6, wherein the step of calculating the loss function of the basic labeling model according to the labeled text coordinates specifically comprises:
    labeling the training pictures with a preset labeling tool to obtain initial text coordinates of the training pictures;
    calculating the squared difference between the initial text coordinates and the labeled text coordinates, and computing the loss function of the basic labeling model from the squared difference.
  8. An image text detection apparatus, comprising:
    a detection module, configured to calculate, upon receiving a target detection picture, the complexity of the target detection picture according to a preset detection model;
    a labeling module, configured to, when the complexity is low, obtain a feature vector of the target detection picture according to a first labeling model among preset labeling models, and compute from the feature vector the target text coordinates of first text boxes in the target detection picture;
    a confirmation module, configured to calculate, according to the target text coordinates, the center coordinates of the first text boxes in the target detection picture, fuse first text boxes whose center coordinates are within a preset error value of one another into a new text box, and determine first text boxes whose center coordinates differ by more than the preset error value as fixed text boxes;
    an extraction module, configured to extract the text information in the new text box and the fixed text boxes, and determine the text information to be the detected text of the target detection picture.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;
    when the complexity is low, obtaining a feature vector of the target detection picture according to a first labeling model among preset labeling models, and calculating target text coordinates of first text boxes in the target detection picture from the feature vector;
    calculating center coordinates of the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates are less than or equal to a preset error value into a new text box, and determining first text boxes whose center coordinates are greater than the preset error value as fixed text boxes;
    extracting the text information in the new text box and the fixed text boxes, and determining the text information as the detected text of the target detection picture.
  10. The computer device according to claim 9, wherein the preset error value comprises a first error value and a second error value, and the step of fusing first text boxes whose center coordinates are less than or equal to the preset error value into a new text box specifically comprises:
    obtaining a first pixel difference between the y-axis coordinates of two adjacent center coordinates, and a second pixel difference between the x-axis coordinates of the two adjacent center coordinates;
    fusing the first text boxes for which the first pixel difference is less than or equal to the first error value and the second pixel difference is less than or equal to the second error value into a new text box.
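The merging rule of claim 10 can be sketched as follows. This is a minimal illustration only: the function name, the `(x1, y1, x2, y2)` box representation, and the threshold values are assumptions, not the patent's implementation.

```python
def merge_text_boxes(boxes, dx_max, dy_max):
    """Fuse adjacent text boxes whose center-coordinate pixel differences
    fall within the preset error values (a sketch of claim 10).
    boxes: list of (x1, y1, x2, y2) tuples, assumed in reading order."""
    def center(b):
        return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

    result = []
    current = boxes[0]
    for box in boxes[1:]:
        cx0, cy0 = center(current)
        cx1, cy1 = center(box)
        # first pixel difference (y-axis) and second pixel difference (x-axis)
        if abs(cy1 - cy0) <= dy_max and abs(cx1 - cx0) <= dx_max:
            # fuse into a new text box spanning both boxes
            current = (min(current[0], box[0]), min(current[1], box[1]),
                       max(current[2], box[2]), max(current[3], box[3]))
        else:
            # too far apart: keep the accumulated box as a fixed text box
            result.append(current)
            current = box
    result.append(current)
    return result
```

For example, with `dx_max=15, dy_max=5`, two boxes on the same line close together are fused, while a distant box is kept as a fixed text box.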
  11. The computer device according to claim 9, wherein, after the step of calculating the complexity of the target detection picture according to the preset detection model, the method comprises:
    when the complexity is high, obtaining, according to a second labeling model among the preset labeling models, a minimum picture corresponding to the target detection picture and minimum text coordinates of second text boxes in the minimum picture;
    mapping the minimum text coordinates in parallel onto a maximum picture corresponding to the target detection picture to obtain detected text coordinates of the target detection picture, and obtaining the detected text corresponding to the target detection picture from the detected text coordinates.
  12. The computer device according to claim 11, wherein the step of mapping the minimum text coordinates in parallel onto the maximum picture corresponding to the target detection picture to obtain the detected text coordinates of the target detection picture specifically comprises:
    obtaining a preset mapping ratio, and enlarging the minimum text coordinates in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection picture.
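The parallel enlargement of claim 12 amounts to multiplying every coordinate detected on the reduced picture by the preset mapping ratio. A minimal sketch; the function name and the rounding choice are assumptions:

```python
def map_coordinates(min_coords, scale):
    """Enlarge text-box coordinates detected on the reduced (minimum)
    picture back onto the full-size (maximum) picture by a preset
    mapping ratio (a sketch of claim 12)."""
    return [tuple(round(v * scale) for v in box) for box in min_coords]
```

For instance, with a mapping ratio of 4.0, a box at `(1, 2, 3, 4)` on the minimum picture maps to `(4, 8, 12, 16)` on the maximum picture.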
  13. The computer device according to claim 9, wherein the step of calculating the complexity of the target detection picture according to the preset detection model specifically comprises:
    inputting the target detection picture into a convolutional layer of the preset detection model and, after a pooling layer and a fully connected layer, outputting a detection result value;
    predicting on the detection result value according to a preset binary classification loss function to obtain the complexity of the target detection picture.
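The convolution → pooling → fully-connected pipeline of claim 13 can be sketched in miniature as follows. All weights, the sigmoid decision, and the 0.5 threshold are illustrative stand-ins, not the patent's trained model:

```python
import math

def classify_complexity(img, kernel, weights, bias, threshold=0.5):
    """Toy version of the claim-13 pipeline: one valid convolution,
    one 2x2 max pool, one fully connected layer producing a single
    detection result value, then a binary (sigmoid) decision."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    # valid convolution
    conv = [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]
    # 2x2 max pooling with stride 2
    pooled = [max(conv[i][j], conv[i][j + 1],
                  conv[i + 1][j], conv[i + 1][j + 1])
              for i in range(0, len(conv) - 1, 2)
              for j in range(0, len(conv[0]) - 1, 2)]
    # fully connected layer -> single detection result value
    logit = sum(p * wgt for p, wgt in zip(pooled, weights)) + bias
    score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid over the result value
    return "high" if score >= threshold else "low"
```

A production detection model would of course stack many such layers and learn its parameters with the binary classification loss; this sketch only shows the data flow named in the claim.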
  14. The computer device according to claim 9, wherein, before the step of obtaining the feature vector of the target detection picture according to the first labeling model among the preset labeling models, the method further comprises:
    obtaining initial text pictures, dividing the initial text pictures into training pictures and test pictures, and inputting the training pictures into a preset basic labeling model to obtain labeled text coordinates of the training pictures;
    calculating a loss function of the basic labeling model according to the labeled text coordinates, and determining, when the loss function converges, the basic labeling model as a trained basic labeling model;
    verifying the trained basic labeling model with the test pictures, and determining, when the verification pass rate of the trained basic labeling model on the test pictures is greater than or equal to a preset pass rate, the trained basic labeling model as the preset labeling model.
  15. The computer device according to claim 14, wherein the step of calculating the loss function of the basic labeling model according to the labeled text coordinates specifically comprises:
    annotating the training picture with a preset labeling tool to obtain initial text coordinates of the training picture;
    calculating the squared difference between the initial text coordinates and the labeled text coordinates, and obtaining the loss function of the basic labeling model from the squared difference.
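The squared-difference loss of claim 15 can be sketched as below. The mean reduction is an assumption; the claim fixes only the squared difference between the tool-annotated and model-predicted coordinates:

```python
def labeling_loss(initial_coords, labeled_coords):
    """Mean squared difference between the tool-annotated (initial) text
    coordinates and the model's labeled text coordinates (a sketch of
    claim 15; the reduction over coordinates is illustrative)."""
    if len(initial_coords) != len(labeled_coords):
        raise ValueError("coordinate lists must have equal length")
    return sum((a - b) ** 2
               for a, b in zip(initial_coords, labeled_coords)) / len(initial_coords)
```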
  16. A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, cause the processor to further perform the following steps:
    when a target detection picture is received, calculating the complexity of the target detection picture according to a preset detection model;
    when the complexity is low, obtaining a feature vector of the target detection picture according to a first labeling model among preset labeling models, and calculating target text coordinates of first text boxes in the target detection picture from the feature vector;
    calculating center coordinates of the target detection picture according to the target text coordinates, fusing first text boxes whose center coordinates are less than or equal to a preset error value into a new text box, and determining first text boxes whose center coordinates are greater than the preset error value as fixed text boxes;
    extracting the text information in the new text box and the fixed text boxes, and determining the text information as the detected text of the target detection picture.
  17. The computer-readable storage medium according to claim 16, wherein the preset error value comprises a first error value and a second error value, and the step of fusing first text boxes whose center coordinates are less than or equal to the preset error value into a new text box specifically comprises:
    obtaining a first pixel difference between the y-axis coordinates of two adjacent center coordinates, and a second pixel difference between the x-axis coordinates of the two adjacent center coordinates;
    fusing the first text boxes for which the first pixel difference is less than or equal to the first error value and the second pixel difference is less than or equal to the second error value into a new text box.
  18. The computer-readable storage medium according to claim 16, wherein, after the step of calculating the complexity of the target detection picture according to the preset detection model, the method comprises:
    when the complexity is high, obtaining, according to a second labeling model among the preset labeling models, a minimum picture corresponding to the target detection picture and minimum text coordinates of second text boxes in the minimum picture;
    mapping the minimum text coordinates in parallel onto a maximum picture corresponding to the target detection picture to obtain detected text coordinates of the target detection picture, and obtaining the detected text corresponding to the target detection picture from the detected text coordinates.
  19. The computer-readable storage medium according to claim 18, wherein the step of mapping the minimum text coordinates in parallel onto the maximum picture corresponding to the target detection picture to obtain the detected text coordinates of the target detection picture specifically comprises:
    obtaining a preset mapping ratio, and enlarging the minimum text coordinates in parallel according to the preset mapping ratio to obtain the detected text coordinates of the target detection picture.
  20. The computer-readable storage medium according to claim 16, wherein the step of calculating the complexity of the target detection picture according to the preset detection model specifically comprises:
    inputting the target detection picture into a convolutional layer of the preset detection model and, after a pooling layer and a fully connected layer, outputting a detection result value;
    predicting on the detection result value according to a preset binary classification loss function to obtain the complexity of the target detection picture.
PCT/CN2021/090512 2020-11-17 2021-04-28 Text detection method and apparatus from image, computer device and storage medium WO2022105120A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011286320.XA CN112395450B (en) 2020-11-17 2020-11-17 Picture character detection method and device, computer equipment and storage medium
CN202011286320.X 2020-11-17

Publications (1)

Publication Number Publication Date
WO2022105120A1 (en) 2022-05-27

Family

ID=74600891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090512 WO2022105120A1 (en) 2020-11-17 2021-04-28 Text detection method and apparatus from image, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112395450B (en)
WO (1) WO2022105120A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395450B (en) * 2020-11-17 2024-03-19 平安科技(深圳)有限公司 Picture character detection method and device, computer equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images
CN111340139A (en) * 2020-03-27 2020-06-26 中国科学院微电子研究所 Method and device for judging complexity of image content
CN111612003A (en) * 2019-02-22 2020-09-01 北京京东尚科信息技术有限公司 Method and device for extracting text in picture
CN112395450A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Picture character detection method and device, computer equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9262699B2 (en) * 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
CN109685055B (en) * 2018-12-26 2021-11-12 北京金山数字娱乐科技有限公司 Method and device for detecting text area in image
CN110046616B (en) * 2019-03-04 2021-05-25 北京奇艺世纪科技有限公司 Image processing model generation method, image processing device, terminal device and storage medium
WO2020223859A1 (en) * 2019-05-05 2020-11-12 华为技术有限公司 Slanted text detection method, apparatus and device

Also Published As

Publication number Publication date
CN112395450A (en) 2021-02-23
CN112395450B (en) 2024-03-19

Legal Events

Code Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21893276; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21893276; Country of ref document: EP; Kind code of ref document: A1)