WO2018054326A1 - Character detection method and device, and character detection training method and device - Google Patents


Info

Publication number
WO2018054326A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
text
area
neural network
convolutional neural
Prior art date
Application number
PCT/CN2017/102679
Other languages
French (fr)
Chinese (zh)
Inventor
向东来
郭强
夏炎
梁鼎
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Publication of WO2018054326A1 publication Critical patent/WO2018054326A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Definitions

  • the present application relates to text detection, and in particular to a text detection method and apparatus, and a text detection training method and apparatus.
  • RPN: Region Proposal Network
  • the present application provides a technical solution for text detection.
  • the present application provides a text detection method, including: extracting a feature map from an image including a text region using a convolutional neural network; laterally intercepting the feature map with a plurality of anchor rectangles to obtain a plurality of suggested regions; classifying and regressing each of the suggested regions through the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region in the image; and horizontally splicing the suggested regions determined by the classification to correspond to regions including text, according to the positions in the image determined by the regression, to obtain the text region detection result.
  • the present application provides a text detection training method, including: extracting a feature map from a training image including a text region using a convolutional neural network; laterally intercepting the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggested regions; classifying and regressing the suggested regions intercepted by each anchor rectangle through the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region in the training image; and iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  • the present application provides a text detecting apparatus, including: an image feature extraction module, which extracts a feature map from an image including a text region using a convolutional neural network; a suggestion region intercepting module, which laterally intercepts the feature map with a plurality of anchor rectangles to obtain a plurality of suggested regions; a classification module, which classifies each suggested region through the convolutional neural network to determine whether each suggested region corresponds to a region including text; a regression module, which regresses each suggested region through the convolutional neural network to determine the position of each suggested region in the image; and a detection result splicing module, which horizontally splices the suggested regions determined by the classification module to correspond to regions including text, according to the positions in the image determined by the regression module, to obtain a text region detection result.
  • the present application provides a text detection training apparatus, including: an image feature extraction module, which extracts a feature map from a training image including a text region using a convolutional neural network; a suggestion region intercepting module, which laterally intercepts the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggested regions; a classification module, which classifies each suggested region through the convolutional neural network to determine whether each suggested region corresponds to a region including text; a regression module, which regresses each suggested region through the convolutional neural network to determine the position of each suggested region in the training image; and a training module, which iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  • the present application provides a text detecting apparatus, including: a memory storing executable instructions; and one or more processors in communication with the memory to execute the executable instructions to perform the following operations: extracting a feature map from an image including a text region using a convolutional neural network; laterally intercepting the feature map with a plurality of anchor rectangles to obtain a plurality of suggested regions; classifying and regressing each suggested region through the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region in the image; and horizontally splicing the suggested regions determined by the classification to correspond to regions including text, according to the positions in the image determined by the regression, to obtain a text region detection result.
  • the present application provides a text detection training apparatus, including: a memory storing executable instructions; and one or more processors in communication with the memory to execute the executable instructions to perform the following operations: extracting a feature map from a training image including a text region using a convolutional neural network; laterally intercepting the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggested regions; classifying and regressing the suggested regions intercepted by each anchor rectangle through the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region; and iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  • the present application also provides a computer readable medium having computer executable instructions stored thereon; when a processor executes the computer executable instructions stored in the computer readable medium, the processor performs any of the above text detection methods and/or text detection training methods.
  • Feature extraction and the subsequent classification and regression are performed using a plurality of laterally spliced anchor rectangles, each of which intercepts only the suggested region corresponding to a lateral portion of the region to be detected in the image. Therefore, when detecting a text region having a large width, the convolutional neural network used for text detection only needs to see the area near a single anchor rectangle corresponding to a lateral portion of the region to be detected, and does not require a very large receptive field, thereby reducing the difficulty of network design.
  • FIG. 1 is a flow chart showing a text detecting method according to an embodiment of the present application.
  • FIG. 2 shows an architectural diagram of a text detecting apparatus according to an exemplary embodiment.
  • FIG. 3 shows a schematic diagram of an exemplary application example according to the present application.
  • FIG. 4 shows a flow chart of a training method for a convolutional neural network in accordance with an exemplary embodiment.
  • FIG. 5 shows an architectural diagram of a text detection training device according to an exemplary embodiment.
  • FIG. 6 is a block diagram showing the structure of a computer system suitable for implementing an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 shows a flow chart 1000 of a text detection method in accordance with an embodiment of the present application.
  • a feature map is extracted from the image including the text region using a convolutional neural network.
  • the feature map obtained by the convolutional neural network contains the feature information of the image.
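The idea of a feature map can be illustrated with a minimal single-channel convolution. The sketch below is not the patent's actual network; it only shows, under that simplification, how a convolution kernel turns an image into a feature map whose cells summarize local image content:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 2D 'valid' cross-correlation: each output cell is
    the sum of the element-wise product of the kernel and the image
    patch under it (the standard CNN convention)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A toy "feature map": respond to vertical edges in a 6x8 image.
image = np.zeros((6, 8))
image[:, 4:] = 1.0                # right half bright
kernel = np.array([[-1.0, 1.0]])  # horizontal gradient filter
feature_map = conv2d_valid(image, kernel)
```

The feature map peaks exactly at the brightness edge, which is the sense in which it "contains the feature information of the image".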
  • the feature map is laterally intercepted with a plurality of anchor rectangles to obtain a plurality of suggested regions (for example, at least two suggested regions are obtained). Since the feature map is laterally intercepted with a plurality of anchor rectangles, each suggested region corresponds to a lateral portion of the region to be detected, and the suggested regions together correspond to the entire lateral length of the text region to be detected.
  • each suggested region is classified and regressed by the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region in the image to be detected.
  • each suggested region determined by the classification to correspond to a region including text is horizontally spliced according to its position in the image determined by the regression, to obtain the text region detection result.
  • the suggested regions that are adjacent and/or intersect may be connected according to the positions in the image to be detected determined by the regression for the respective suggested regions, thereby obtaining the text region detection result.
  • alternatively, the anchor rectangles corresponding to the suggested regions that are adjacent and/or intersect may be connected according to the positions in the image to be detected determined by the regression for the respective suggested regions, thereby obtaining the text region detection result.
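The horizontal splicing step can be sketched as follows; the `(x1, y1, x2, y2)` box format and the merge rule for adjacent or intersecting regions are illustrative assumptions, not the patent's exact procedure:

```python
def splice_regions(boxes):
    """Horizontally splice suggested regions classified as text.
    Each box is (x1, y1, x2, y2); boxes that touch or overlap in the
    width direction are merged into one detected text region."""
    if not boxes:
        return []
    boxes = sorted(boxes)               # sort by left edge
    merged = [list(boxes[0])]
    for x1, y1, x2, y2 in boxes[1:]:
        last = merged[-1]
        if x1 <= last[2]:               # adjacent or intersecting
            last[2] = max(last[2], x2)  # extend to the right
            last[1] = min(last[1], y1)  # widen vertical extent
            last[3] = max(last[3], y2)
        else:
            merged.append([x1, y1, x2, y2])
    return [tuple(b) for b in merged]

# Three overlapping/touching regions plus one separate region.
regions = [(0, 10, 16, 30), (16, 9, 32, 30), (31, 10, 48, 31), (80, 10, 96, 30)]
result = splice_regions(regions)
```

Here the first three suggested regions are spliced into one wide text region, while the isolated fourth region stays separate.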
  • since the processing object of the classification and regression is a suggested region intercepted by an anchor rectangle and corresponding to only a lateral portion of the image to be detected, the convolutional neural network used for text detection does not require a very large receptive field even when detecting a text region having a large width.
  • the plurality of anchor rectangles may be anchor rectangles continuously spliced in the lateral direction (i.e., the width direction), whereby the suggested regions intercepted by the anchor rectangles together correspond to the entire width of the image to be detected.
  • the plurality of anchor rectangles may overlap slightly in the width direction.
  • for example, two adjacent anchor rectangles may overlap by one pixel in the width direction; thus, the suggested regions intercepted by the anchor rectangles correspond to the entire width of the image to be detected with a small amount of overlap, which avoids gaps between adjacent anchor rectangles or adjacent suggested regions due to errors in actual use that would otherwise miss some intermediate width of the image to be detected.
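The lateral tiling of anchor rectangles with a one-pixel overlap can be sketched as below; `tile_anchors` and its parameters are hypothetical names for illustration:

```python
def tile_anchors(total_width, anchor_width, overlap=1):
    """Lay out anchor rectangles side by side along the width direction,
    overlapping each neighbor by `overlap` pixels so rounding errors
    cannot leave gaps. Returns a list of (x_start, x_end) spans."""
    anchors = []
    step = anchor_width - overlap
    x = 0
    while x < total_width:
        end = min(x + anchor_width, total_width)
        anchors.append((x, end))
        if end == total_width:
            break
        x += step
    return anchors

# Cover a 40-pixel-wide region with 16-pixel anchors overlapping by 1 px.
spans = tile_anchors(total_width=40, anchor_width=16, overlap=1)
```

Every adjacent pair of spans shares at least one pixel, and together the spans cover the full width, matching the behavior described above.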
  • the text detection method provided by the embodiment of the present application may be performed by any suitable device having data processing capability, including but not limited to: a terminal device, a server, and the like.
  • the text detection method provided by the embodiment of the present application may be executed by a processor, for example, the processor executes the text detection method mentioned in the embodiment of the present application by calling a corresponding instruction stored in the memory. This will not be repeated below.
  • the foregoing program may be stored in a computer readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 2 shows an architectural diagram of a text detecting device 2000 according to an exemplary embodiment.
  • the text detection device 2000 is implemented in the form of an RPN.
  • the text detecting device 2000 includes: an image feature extraction module 2010, a suggestion region intercepting module 2030, a classification module 2040, a regression module 2050, and a detection result splicing module 2070. The image feature extraction module 2010 extracts the feature map from the image including the text region using a convolutional neural network; the suggestion region intercepting module 2030 laterally intercepts the feature map with a plurality of anchor rectangles to obtain a plurality of suggested regions; the classification module 2040 classifies each suggested region through the convolutional neural network to determine whether each suggested region corresponds to a region including text; the regression module 2050 regresses each suggested region through the convolutional neural network to determine the position of each suggested region in the image; and the detection result splicing module 2070 horizontally splices the suggested regions determined by the classification module 2040 to correspond to regions including text, according to the positions in the image determined by the regression module 2050, to obtain the text region detection result.
  • the image including the text is first input into the image feature extraction module 2010, and the image feature extraction module 2010 extracts the feature map from the image including the text region using a convolutional neural network.
  • the feature map obtained by convolution contains the feature information of the image.
  • the feature map extracted by the image feature extraction module 2010 is input into the suggestion region intercepting module 2030.
  • in the suggestion region intercepting module 2030, the feature map is laterally intercepted with a plurality of anchor rectangles to obtain a plurality of suggested regions.
  • the obtained suggested regions are respectively input into the classification module 2040 and the regression module 2050 for classification and regression; the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region in the image.
  • the detection result splicing module 2070 horizontally splices the suggested regions determined by the classification module 2040 to correspond to regions including text, according to the positions in the image determined by the regression module 2050, to obtain the text region detection result.
  • in an optional example, the detection result splicing module 2070 connects the suggested regions that are adjacent and/or intersect according to the positions in the image determined by the regression for the respective suggested regions, thereby obtaining the text region detection result; in another optional example, the detection result splicing module 2070 connects the anchor rectangles corresponding to the suggested regions that are adjacent and/or intersect, thereby obtaining the text region detection result.
  • FIG. 3 shows a schematic diagram of an exemplary application example in accordance with the present application.
  • the image 10 containing the text area is the object to be detected.
  • the anchor rectangle employed is, for example, a single anchor rectangle 110 corresponding to the entire lateral width of the text area to be detected.
  • the detection of the text area can only be achieved if the lateral width of the anchor rectangle employed corresponds to the entire lateral width of the text area to be detected.
  • the RPN often requires a large receptive field for such processing, which brings great difficulty to the design of the network. Therefore, existing region proposal networks are often not suitable for direct application to text detection.
  • a plurality of laterally spliced anchor rectangles 120 are used instead of a single anchor rectangle 110, and the sum of the widths of the plurality of laterally spliced anchor rectangles 120 corresponds to the entire lateral width of the text region to be detected. For example, the sum of the widths of the plurality of laterally spliced anchor rectangles 120 may be equal to, or slightly larger than, the entire lateral width of the text region to be detected.
  • in the former case, the plurality of anchor rectangles 120 abut each other so as to correspond to the entire lateral width of the text region to be detected.
  • in the latter case, where the sum of the widths of the plurality of laterally spliced anchor rectangles 120 is greater than the entire lateral width of the text region to be detected, at least some of the adjacent anchor rectangles 120 partially overlap, and the width of the area formed by connecting the plurality of anchor rectangles 120 corresponds to the entire lateral width of the text region to be detected.
  • FIG. 3 exemplarily shows a portion 20 of the resulting feature map.
  • in the suggestion region intercepting module 2030, the feature map is intercepted with a plurality of laterally spliced anchor rectangles to obtain a plurality of suggested regions, so that the suggested region intercepted by each anchor rectangle is processed separately.
  • the suggested area intercepted by each anchor rectangle takes, for example, the form of a sliding window as shown in FIG. 3.
  • the suggested area intercepted by the anchor rectangle may be further processed by one or more convolution layers 40.
  • the suggested regions processed by the convolution layer 40 (or the suggested regions not processed by the convolution layer) are input to the classifier 50 and the regression unit 60. The classifier 50 identifies whether each suggested region is a text region, and the regression unit 60 determines the position of each suggested region in the image to be detected 10. Finally, the detection result splicing module 2070 splices the suggested regions determined by the classifier 50 to correspond to the text region, according to the positions determined by the regression unit 60, to form the final text detection result.
  • for example, the detection result splicing module 2070 connects the suggested regions that are adjacent and/or intersect, thereby obtaining the text region detection result; or, for example, the detection result splicing module 2070 connects the anchor rectangles corresponding to the suggested regions that are adjacent and/or intersect, thereby obtaining the text region detection result.
  • a step of training the convolutional neural network in advance is further included.
  • a trained text detecting device, such as the above-described text detecting device 2000, is obtained by the training described below.
  • FIG. 4 illustrates a training method 4000 for a convolutional neural network in accordance with an exemplary embodiment.
  • the training method 4000 for the convolutional neural network may include: in step S4010, extracting the feature map from the training image including the text region using the convolutional neural network; in a subsequent step, laterally intercepting the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggested regions; in step S4050, classifying and regressing the suggested regions intercepted by each anchor rectangle through the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position of each suggested region; and in step S4070, iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  • the predetermined convergence condition may be, for example, that the error value of the most recent iteration falls within an allowable range, or that the error value is less than a predetermined value, or that the error value is minimized, or that the number of iterations reaches a predetermined number.
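An iterative-training skeleton with such stopping conditions might look like the following; the `step_fn` stand-in and the threshold value are illustrative assumptions, not the patent's training procedure:

```python
def train_until_converged(step_fn, max_iters=100, error_threshold=1e-3):
    """Generic iterative-training skeleton: step_fn() runs one training
    iteration and returns the current error (the difference between the
    real and predicted text regions). Training stops when the error
    drops below the threshold or the iteration budget is exhausted --
    two of the convergence conditions named above."""
    history = []
    for _ in range(max_iters):
        error = step_fn()
        history.append(error)
        if error < error_threshold:
            break
    return history

# Toy stand-in for one training step: the error halves each iteration.
state = {"error": 1.0}
def fake_step():
    state["error"] *= 0.5
    return state["error"]

errors = train_until_converged(fake_step, max_iters=50, error_threshold=1e-3)
```

With this toy step, training stops after 10 iterations, when the error first falls below the threshold rather than when the iteration budget runs out.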
  • in each iterative training of the convolutional neural network, whether a suggested region is a positive or negative sample is determined according to the intersection ratio of the predicted text region and the corresponding real text region in the vertical direction, and the difference between the real text region and the predicted text region is determined according to a smooth L1 loss function.
  • One form of difference can be an error.
  • when the intersection ratio exceeds a preset threshold, the suggested region corresponding to the predicted text region is determined to be a positive sample; otherwise, the suggested region corresponding to the predicted text region is determined to be a negative sample.
  • the classifier may use the softmax loss function as a training objective function to predict whether the suggested region is a text region.
  • the classifier determines whether each suggested region is a positive or negative sample according to the intersection ratio, in the vertical direction, of the suggested region and the horizontal portion of the corresponding real text region.
  • the regressor can use the smooth L1 loss function in the RPN as a training objective function to minimize the difference between the real text region and the predicted text region.
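The vertical-direction intersection ratio used for labeling, and the smooth L1 loss used for regression, can be sketched as follows; the 0.7 threshold is an illustrative assumption, as the text does not fix a value:

```python
def vertical_iou(span_a, span_b):
    """Intersection-over-union of two vertical intervals (y1, y2).
    The classifier labels a suggested region using only the vertical
    overlap with the horizontal slice of the real text region."""
    inter = max(0.0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_region(pred_span, true_span, threshold=0.7):
    """1 (positive sample) if the vertical IoU exceeds the threshold,
    else 0 (negative sample). The threshold value is illustrative."""
    return 1 if vertical_iou(pred_span, true_span) > threshold else 0

def smooth_l1(x):
    """Smooth L1 loss on a scalar difference, as used for regression:
    quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5
```

Because the intersection ratio is taken only in the vertical direction, an anchor covering just a narrow horizontal slice of a tall text line can still be labeled positive, which is the behavior described later in this section.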
  • the parameters of the convolutional neural network are thereby adapted to identify the text regions in the image using a plurality of horizontally spliced anchor rectangles.
  • the difference between the real text area and the predicted text area is determined by the following formula:
  • L({c_i}, {r_i}) = (1/N_cls) · Σ_i L_cls(c_i, c_i*) + λ · (1/N_reg) · Σ_i c_i* · L_reg(r_i, r_i*), where L_reg(r_i, r_i*) = Σ_j smooth_L1(r_i^j − r_i^{*j});
  • L is the target error function;
  • i is the number of the suggested region intercepted by an anchor rectangle;
  • c_i is the category marker of the i-th suggested region;
  • r_i is the position vector of the i-th suggested region;
  • a variable marked with * denotes the corresponding real (ground-truth) value;
  • L_cls is the classification loss function;
  • L_reg is the loss function of the regression position;
  • N_cls and N_reg represent the numbers of selected classification and regression training samples, respectively;
  • λ is a preset empirical value; and
  • j is any of x, y, w, and h, where x and y are the abscissa and the ordinate of the center point of the corresponding suggested region, respectively, and w and h are the width and height of the corresponding suggested region, respectively.
  • when the intersection ratio, in the vertical direction, of the i-th suggested region and the horizontal portion of the corresponding real text region is greater than a preset threshold, c_i is equal to 1, indicating that the i-th suggested region is a positive sample; when that intersection ratio is less than or equal to the preset threshold, c_i is equal to 0, indicating that the i-th suggested region is a negative sample.
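Under the definitions above, the target error function can be sketched as follows, assuming a log loss for L_cls and summing the smooth L1 loss over the position components for L_reg; all function and parameter names here are illustrative:

```python
import math

def smooth_l1(x):
    """Smooth L1 loss on a scalar difference."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def target_error(probs, labels, preds, truths, lam=1.0):
    """L = (1/N_cls) sum_i L_cls(c_i, c_i*) + lam*(1/N_reg) sum_i c_i* L_reg(r_i, r_i*)
    probs        : predicted text probability c_i per suggested region
    labels       : ground-truth marker c_i* (1 positive, 0 negative)
    preds/truths : position vectors r_i, r_i* = (x, y, w, h)
    Regression is only counted for positive samples (c_i* = 1)."""
    n_cls = len(probs)
    # Classification term: log loss against the 0/1 marker.
    l_cls = sum(-math.log(p) if c == 1 else -math.log(1.0 - p)
                for p, c in zip(probs, labels)) / n_cls
    # Regression term: smooth L1 over (x, y, w, h), positives only.
    pos = [k for k, c in enumerate(labels) if c == 1]
    n_reg = max(len(pos), 1)
    l_reg = sum(smooth_l1(a - b)
                for k in pos
                for a, b in zip(preds[k], truths[k])) / n_reg
    return l_cls + lam * l_reg

# One positive sample (slightly off in x) and one negative sample.
loss = target_error(
    probs=[0.9, 0.2],
    labels=[1, 0],
    preds=[(10.0, 5.0, 16.0, 12.0), (0.0, 0.0, 0.0, 0.0)],
    truths=[(10.5, 5.0, 16.0, 12.0), (0.0, 0.0, 0.0, 0.0)],
)
```

The negative sample contributes only to the classification term; its regression error is gated out by c_i* = 0, matching the role of the marker in the formula.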
  • the classifier 50 determines, according to the intersection ratio of the suggested region intercepted by the anchor rectangle and the real region, whether each suggested region corresponds to a region including text (positive sample) or a region not including text (negative sample). Therefore, when an anchor rectangle coincides with the real region in the vertical direction but covers only a small part of the real region in the horizontal direction, the anchor rectangle will still be considered to correspond to the text region and thus be selected as a positive sample.
  • by contrast, if the intersection ratio were computed over the full region, such an anchor rectangle would not be selected as a positive sample even though it indeed falls within a text area.
  • the trained convolutional neural network, that is, the above-described text detecting device 2000, is obtained by adjusting the network parameters in an iterative training process so as to reduce the difference, represented by the training objective function, between the real text region and the predicted text region.
  • each anchor rectangle (or the suggested region intercepted by the anchor rectangle) corresponds to a lateral portion of the area to be detected.
  • the classifier in the convolutional neural network also considers the characteristics of the suggested region in the vertical direction to predict whether each suggested region corresponds to the text region.
  • the text region detection result is obtained after the suggested regions determined by the classification to correspond to regions including text are horizontally spliced according to the positions in the image determined by the regression. Based on such a technical solution, the problem that the actual real region corresponding to the text region cannot be correctly recognized when the anchor rectangle width is smaller than the real region width is avoided.
  • the training method for the convolutional neural network provided by the embodiment of the present application may be performed by any suitable device having data processing capability, including but not limited to: terminal devices, servers, and the like. Alternatively, the training method for the convolutional neural network provided by the embodiment of the present application may be performed by a processor, for example, the processor performs the training method for the convolutional neural network mentioned in the embodiment of the present application by calling corresponding instructions stored in the memory. This will not be repeated below.
  • the foregoing program may be stored in a computer readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 5 shows an architectural diagram of a text detection training device 5000 in accordance with an exemplary embodiment.
  • Each module of the text detection training device 5000 executes a corresponding step of the above-described text detection training method 4000.
  • the text detection training device 5000 is implemented in the form of an RPN.
  • the text detection training apparatus 5000 includes an image feature extraction module 5010, a suggestion region intercepting module 5030, a classification module 5040, a regression module 5050, and a training module 5060. The image feature extraction module 5010 extracts the feature map from the training image including the text region using a convolutional neural network; the suggestion region intercepting module 5030 laterally intercepts the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggested regions; the classification module 5040 classifies each suggested region through the convolutional neural network to determine whether each suggested region corresponds to a region including text; the regression module 5050 regresses each suggested region through the convolutional neural network to determine the position of each suggested region in the training image; and the training module 5060 iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies the predetermined convergence condition.
  • the training image including the text is first input to the image feature extraction module 5010, which uses the convolutional neural network to extract a feature map from the training image including the text region.
  • the feature map obtained by convolution contains the feature information of the training image.
  • the feature map extracted by the image feature extraction module 5010 is input into the suggestion region intercepting module 5030.
  • in the suggestion region intercepting module 5030, the feature map is laterally intercepted by a plurality of anchor rectangles to obtain a plurality of suggestion regions.
  • the obtained suggestion regions are respectively input to the classification module 5040 and the regression module 5050 for classification and regression: the classification determines whether each suggestion region corresponds to a region including text, and the regression determines the position in the training image to which each suggestion region corresponds.
  • the training module 5060 iteratively trains the convolutional neural network according to the known difference between the real text region corresponding to the training image and the predicted text region obtained by the classification and regression until the training result satisfies a predetermined convergence condition.
  • the predetermined convergence condition may be, for example, that the error value of the most recent iteration falls within an allowable range, that the error value is less than a predetermined value, that the error value is minimized, or that the number of iterations reaches a predetermined number.
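The iterative training guarded by these convergence conditions can be sketched as a simple loop. This is an illustrative sketch only: `train_step`, the allowable range, and the iteration budget are hypothetical placeholders, not values stated by the present application.

```python
def converged(errors, allowed_range=(0.0, 0.05), max_iters=10000):
    """Predetermined convergence check as described above: the most recent
    error value falls within the allowable range, or the number of
    iterations reaches a predetermined number. Thresholds are illustrative."""
    if len(errors) >= max_iters:
        return True  # iteration count reached the predetermined number
    if errors and allowed_range[0] <= errors[-1] <= allowed_range[1]:
        return True  # latest error value falls within the allowable range
    return False


def train(train_step):
    """Iteratively train until a convergence condition is met.
    `train_step` is a hypothetical callable that runs one training
    iteration and returns its error value."""
    errors = []
    while not converged(errors):
        errors.append(train_step())
    return errors
```

For example, with a step whose error halves each iteration starting from 1.0, the loop stops as soon as the error drops to 0.03125, the first value inside the allowable range.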
  • the training module 5060 determines the difference between the real text region and the predicted text region according to the overlap ratio of the predicted text region and the corresponding real text region in the vertical direction.
  • regression module 5050 determines a difference between the real text region and the predicted text region based on a smooth L1 loss function.
  • One form of difference can be an error.
  • the suggestion region corresponding to the predicted text region is determined as a positive sample by the training module 5060; otherwise, the suggestion region corresponding to the predicted text region is determined as a negative sample by the training module 5060.
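The two quantities described above can be sketched minimally: the smooth L1 loss used by the regression module, and labeling a suggestion region as a positive or negative sample from the vertical overlap ratio between the predicted and real text regions. The function names and the 0.7 threshold are our illustrative assumptions, not values given by the present application.

```python
def smooth_l1(x):
    """Smooth L1 loss: 0.5*x^2 when |x| < 1, |x| - 0.5 otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def vertical_overlap_ratio(pred, real):
    """Overlap ratio in the vertical direction between a predicted text
    region and the corresponding real text region, each given as a
    (top, bottom) pair with top < bottom. Ratio = intersection / union."""
    inter = max(0.0, min(pred[1], real[1]) - max(pred[0], real[0]))
    union = max(pred[1], real[1]) - min(pred[0], real[0])
    return inter / union if union > 0 else 0.0


def label_proposal(pred, real, threshold=0.7):
    """Mark the suggestion region as a positive or negative sample based
    on the vertical overlap ratio (threshold value is hypothetical)."""
    return "positive" if vertical_overlap_ratio(pred, real) >= threshold else "negative"
```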
  • the various features of the text detection training method 4000 described above in connection with FIG. 4 are applicable to the text detection training apparatus 5000 shown in FIG. 5.
  • any number of the various features of the text detection training method 4000 described above in connection with FIG. 4 can be incorporated, in any combination, into the text detection training apparatus 5000 shown in FIG. 5.
  • the width of the anchor rectangle employed may be fixed, thereby reducing the size and number of anchor rectangles required for matching, thereby reducing the amount of calculation.
  • the width of the anchor rectangle used may be equal to the step size of the convolutional neural network, so that the detection results, when laterally spliced, form a detection result covering the entire detection area.
  • the width of the anchor rectangle used may be slightly larger than the step size of the convolutional neural network; for example, the width of the anchor rectangle may be the step size of the convolutional neural network plus 1. The detection result formed by laterally splicing then corresponds to the entire width of the detection area with a small amount of overlap, which avoids gaps between adjacent anchor rectangles caused by factors such as errors in actual use, so that no intermediate width of the detection area is missed.
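As a sketch of the anchor layout described above, the following lays out fixed-width anchor rectangles whose width equals the network step size plus one, so adjacent anchors overlap by one pixel while covering the whole width of the detection area. The function names are ours, and the stride value in the example is illustrative only.

```python
def anchor_spans(area_width, stride):
    """Lay out fixed-width anchor rectangles across the detection area.
    Each anchor starts at a multiple of the stride and is (stride + 1)
    pixels wide, so neighbours overlap by one pixel and no intermediate
    width of the area is missed."""
    spans = []
    for x in range(0, area_width, stride):
        spans.append((x, min(x + stride + 1, area_width)))
    return spans


def splice(spans):
    """Laterally splice adjacent or overlapping spans into whole regions."""
    merged = [list(spans[0])]
    for start, end in spans[1:]:
        if start <= merged[-1][1]:        # adjacent or overlapping
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]
```

With an area width of 64 and a stride of 16, the anchors are (0, 17), (16, 33), (32, 49), (48, 64), which splice back into the single full-width span (0, 64).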
  • the character detecting method and apparatus and the character detecting training method and apparatus described with reference to Figs. 1 to 5 can be implemented by a computer system.
  • the computer system can include a memory that stores executable instructions and a processor.
  • the processor is in communication with the memory to execute executable instructions to implement the text detection method and apparatus and text detection training method and apparatus described with reference to Figures 1 through 5.
  • the text detection method and apparatus and text detection training method and apparatus described with reference to Figures 1 through 5 may be implemented by a non-transitory computer storage medium.
  • the medium stores computer readable instructions that, when executed, cause the processor to perform the text detection method and apparatus and text detection training method and apparatus described with reference to Figures 1 through 5.
  • FIG. 6 there is shown a block diagram of a computer system 6000 suitable for implementing embodiments of the present application.
  • computer system 6000 can include a processing unit (such as a central processing unit (CPU) 6001, a graphics processing unit (GPU), etc.) that can perform various appropriate actions and processes according to a program stored in read only memory (ROM) 6002 or a program loaded from a storage portion 6008 into random access memory (RAM) 6003. Various programs and data required for the operation of the system 6000 can also be stored in the RAM 6003.
  • the CPU 6001, the ROM 6002, and the RAM 6003 are connected to each other through a bus 6004.
  • the input/output I/O interface 6005 is also connected to the bus 6004.
  • the communication section 6009 can perform communication processing through a network such as the Internet.
  • the driver 6010 can also be connected to the I/O interface 6005 as needed.
  • a removable medium 6011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like can be mounted on the drive 6010 so that a computer program read therefrom can be installed into the storage portion 6008 as needed.
  • the text detection method and apparatus and text detection training method and apparatus described above with reference to FIGS. 1 through 5 may be implemented as a computer software program in accordance with an embodiment of the present disclosure.
  • embodiments of the present disclosure can include a computer program product comprising a computer program tangibly embodied in a machine readable medium.
  • the computer program includes program code for performing the text detection method and the text detection training method described with reference to FIGS. 1 through 5.
  • the computer program can be downloaded and installed from the network via the communication portion 6009, and/or can be installed from the removable medium 6011.
  • the text detection technology of the present application can be used in a company badge identification product. For example, when an employee wearing a company badge passes the camera of the company access control system, the data processing device of the access control system (such as a computer or server connected to the camera through a network) can obtain an image of the employee wearing the badge from the camera. The data processing device can then locate the text area on the badge in the image using the text detection technology of the present application, and by performing text recognition on that text area, obtain information such as the employee's name and department marked on the badge.
  • the text detection technology of the present application can also be used in various applications involving text box positioning, for example, text box positioning for formatted texts such as medical bills, express orders, and invoices, so as to facilitate text recognition of the positioned text box.
  • the result of the text box positioning or the result of the text recognition may be stored or displayed locally, or may be transmitted to a server or a peer in a peer-to-peer network. This application does not limit the specific application scenario of the text box after positioning.
  • each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that includes one or more executable instructions for implementing the specified logical function. It should also be noted that the functions noted in the blocks may occur in an order different from that illustrated in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units or modules involved in the embodiments of the present application may be implemented by software or hardware.
  • the described unit or module can also be provided in the processor.
  • the names of these units or modules should not be construed as limiting these units or modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

Disclosed are a character detection method and device, and a character detection training method and device. An exemplary character detection method comprises: extracting a feature map from an image comprising a character region by using a convolutional neural network; transversely clipping the feature map by using a plurality of anchors separately, to obtain a plurality of suggestion regions; classifying and regressing each suggestion region by means of the convolutional neural network, wherein whether each suggestion region corresponds to a region comprising characters is determined by means of the classification, and the position in the image corresponding to each suggestion region is determined by means of the regression; and transversely splicing, according to the positions in the image that respectively correspond to the suggestion regions and are determined by means of the regression, the suggestion regions that correspond to the regions comprising characters and are determined by means of the classification, to obtain a character region detection result.

Description

Text detection method and device, and text detection training method and device
The present disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on September 22, 2016, with application number 201610842572.3 and entitled "Text detection method and apparatus, and text detection training method and apparatus", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to text detection and, in particular, to a text detection method and apparatus, and a text detection training method and apparatus.
Background
In recent years, general object detection methods based on convolutional neural networks have been tried in the field of text detection and have achieved good results. The Region Proposal Network (RPN) is one of the best performing algorithms among convolutional neural networks; how to apply the region proposal network to text detection has attracted widespread attention and research enthusiasm in the industry.
Summary of the invention
The present application provides technical solutions for text detection.
In one aspect, the present application provides a text detection method, including: extracting a feature map from an image including a text region using a convolutional neural network; laterally intercepting the feature map with a plurality of anchor rectangles to obtain a plurality of suggestion regions; classifying and regressing each suggestion region through the convolutional neural network, wherein the classification determines whether each suggestion region corresponds to a region including text, and the regression determines the position in the image to which each suggestion region corresponds; and laterally splicing the suggestion regions determined by the classification to correspond to regions including text, according to the positions in the image determined by the regression for the respective suggestion regions, to obtain a text region detection result.
In another aspect, the present application provides a text detection training method, including: extracting a feature map from a training image including a text region using a convolutional neural network; laterally intercepting the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggestion regions; classifying and regressing the suggestion region intercepted by each anchor rectangle through the convolutional neural network, wherein the classification determines whether each suggestion region corresponds to a region including text, and the regression determines the position in the training image to which each suggestion region corresponds; and iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
In yet another aspect, the present application provides a text detection apparatus, including: an image feature extraction module that extracts a feature map from an image including a text region using a convolutional neural network; a suggestion region interception module that laterally intercepts the feature map with a plurality of anchor rectangles to obtain a plurality of suggestion regions; a classification module that classifies each suggestion region through the convolutional neural network to determine whether each suggestion region corresponds to a region including text; a regression module that regresses each suggestion region through the convolutional neural network to determine the position in the image to which each suggestion region corresponds; and a detection result splicing module that laterally splices the suggestion regions determined by the classification module to correspond to regions including text, according to the positions in the image determined by the regression module for the respective suggestion regions, to obtain a text region detection result.
In still another aspect, the present application provides a text detection training apparatus, including: an image feature extraction module that extracts a feature map from a training image including a text region using a convolutional neural network; a suggestion region interception module that laterally intercepts the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggestion regions; a classification module that classifies each suggestion region through the convolutional neural network to determine whether each suggestion region corresponds to a region including text; a regression module that regresses each suggestion region through the convolutional neural network to determine the position in the training image to which each suggestion region corresponds; and a training module that iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
In still another aspect, the present application provides a text detection apparatus, including: a memory storing executable instructions; and one or more processors in communication with the memory to execute the executable instructions so as to perform the following operations: extracting a feature map from an image including a text region using a convolutional neural network; laterally intercepting the feature map with a plurality of anchor rectangles to obtain a plurality of suggestion regions; classifying and regressing each suggestion region through the convolutional neural network, wherein the classification determines whether each suggestion region corresponds to a region including text, and the regression determines the position in the image to which each suggestion region corresponds; and laterally splicing the suggestion regions determined by the classification to correspond to regions including text, according to the positions in the image determined by the regression for the respective suggestion regions, to obtain a text region detection result.
In still another aspect, the present application provides a text detection training apparatus, including: a memory storing executable instructions; and one or more processors in communication with the memory to execute the executable instructions so as to perform the following operations: extracting a feature map from a training image including a text region using a convolutional neural network; laterally intercepting the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of suggestion regions; classifying and regressing the suggestion region intercepted by each anchor rectangle through the convolutional neural network, wherein the classification determines whether each suggestion region corresponds to a region including text, and the regression determines the position of each suggestion region; and iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
The present application also provides a computer readable medium storing computer executable instructions; when a processor executes the computer executable instructions stored in the computer readable medium, the processor performs any of the text detection methods and/or text detection training methods provided by the embodiments of the present application.
By performing feature extraction and the subsequent classification and regression with a plurality of laterally spliced anchor rectangles, each anchor rectangle intercepts only the suggestion region corresponding to a lateral portion of the region to be detected in the image. Therefore, when detecting a text region with a large width, the convolutional neural network used for text detection only needs to see the area near a single anchor rectangle corresponding to a lateral portion of the region to be detected, rather than requiring a very large receptive field, which reduces the difficulty of network design.
Brief description of the drawings
The accompanying drawings, which constitute a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart showing a text detection method according to an embodiment of the present application;
FIG. 2 shows an architectural diagram of a text detection apparatus according to an exemplary embodiment;
FIG. 3 shows a schematic diagram of an exemplary application example according to the present application;
FIG. 4 shows a flowchart of a method of training a convolutional neural network according to an exemplary embodiment;
FIG. 5 shows an architectural diagram of a text detection training apparatus according to an exemplary embodiment; and
FIG. 6 is a block diagram showing the structure of a computer system suitable for implementing an embodiment of the present application.
Detailed description of embodiments
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
It should also be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended as a limitation of the present application or of its applications or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and apparatus should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further discussed in subsequent figures.
Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, or servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types. The computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 shows a flowchart 1000 of a text detection method according to an embodiment of the present application. In step S1010, a feature map is extracted from an image including a text region using a convolutional neural network. The feature map obtained through the convolutional neural network contains the feature information of the image. In step S1030, the feature map is laterally intercepted with a plurality of anchor rectangles to obtain a plurality of suggestion regions (for example, at least two suggestion regions). Since the feature map is laterally intercepted by a plurality of anchor rectangles, each obtained suggestion region corresponds to a lateral portion of the image to be detected rather than to the entire lateral length of the text region to be detected. In step S1050, each suggestion region is classified and regressed through the convolutional neural network, wherein the classification determines whether each suggestion region corresponds to a region including text, and the regression determines the position in the image to be detected to which each suggestion region corresponds.
In step S1070, the suggestion regions determined by the classification to correspond to regions including text are laterally spliced according to the positions in the image determined by the regression for the respective suggestion regions, to obtain a text region detection result. In one optional example, suggestion regions whose positions are adjacent and/or intersecting may be connected according to the positions in the image to be detected determined by the regression, thereby obtaining the text region detection result; in another optional example, the anchor rectangles corresponding to suggestion regions whose positions are adjacent and/or intersecting may be connected according to those positions, thereby obtaining the text region detection result.
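The splicing of step S1070 can be sketched as a one-dimensional merge over the horizontal extents of the regions classified as text. This is a simplified illustration of the idea under our own representation (left/right x-coordinates plus a classification flag); the apparatus itself operates on full rectangles in the image.

```python
def splice_text_regions(proposals):
    """proposals: list of (x_left, x_right, is_text) tuples, where is_text
    is the classification result for the suggestion region and the
    x-extent comes from the regression. Adjacent and/or intersecting
    regions classified as text are laterally spliced into whole text
    regions, which form the detection result."""
    text = sorted((left, right) for left, right, is_text in proposals if is_text)
    regions = []
    for left, right in text:
        if regions and left <= regions[-1][1]:   # adjacent or intersecting
            regions[-1][1] = max(regions[-1][1], right)
        else:
            regions.append([left, right])
    return [tuple(r) for r in regions]
```

For instance, proposals at (0, 17), (16, 33), (56, 73), and (70, 90) classified as text, with (40, 57) classified as non-text, splice into the two text regions (0, 33) and (56, 90).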
Since the objects processed by the classification and regression are suggestion regions, intercepted by anchor rectangles, that each correspond to a lateral portion of the image to be detected, the convolutional neural network used for text detection only needs to see the area near a single anchor rectangle corresponding to a lateral portion of the text region when detecting a text region with a large width, rather than requiring a very large receptive field, which reduces the difficulty of network design.
In the above text detection method, the plurality of anchor rectangles may be anchor rectangles continuously spliced in the lateral direction (that is, the width direction), so that the suggestion regions intercepted by the anchor rectangles can correspond to the entire width of the image to be detected. Optionally, the anchor rectangles may overlap slightly in the width direction; for example, two adjacent anchor rectangles may overlap by one pixel in the width direction. The suggestion regions intercepted by the anchor rectangles then correspond to the entire width of the image to be detected with a small amount of overlap, which avoids gaps between adjacent anchor rectangles or adjacent suggestion regions caused by errors in actual use, so that no intermediate width of the image to be detected is missed.
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。A person skilled in the art can understand that all or part of the steps of implementing the above method embodiments may be completed by using hardware related to the program instructions. The foregoing program may be stored in a computer readable storage medium, and the program is executed when executed. The foregoing steps include the steps of the foregoing method embodiments; and the foregoing storage medium includes: a medium that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
FIG. 2 shows an architectural diagram of a text detection device 2000 according to an exemplary embodiment. In an optional example, the text detection device 2000 is implemented in the form of an RPN (region proposal network). As shown in FIG. 2, the text detection device 2000 includes an image feature extraction module 2010, a proposal region interception module 2030, a classification module 2040, a regression module 2050, and a detection result splicing module 2070. The image feature extraction module 2010 uses a convolutional neural network to extract a feature map from an image that includes a text region; the proposal region interception module 2030 laterally intercepts the feature map with a plurality of anchor rectangles to obtain a plurality of proposal regions; the classification module 2040 classifies each proposal region through the convolutional neural network to determine whether it corresponds to a region that includes text; the regression module 2050 regresses each proposal region through the convolutional neural network to determine its position in the image; and the detection result splicing module 2070 laterally splices the proposal regions that the classification module 2040 determined to include text, according to the positions in the image determined by the regression module 2050, to obtain a text region detection result.
In an optional example, in combination with the above, when detecting text in an image, the image including the text is first input into the image feature extraction module 2010, which uses a convolutional neural network to extract a feature map from the image; the feature map obtained by convolution contains the feature information of the image. The feature map extracted by the image feature extraction module 2010 is then input into the proposal region interception module 2030, which laterally intercepts the feature map with a plurality of anchor rectangles to obtain a plurality of proposal regions. The obtained proposal regions are input into the classification module 2040 and the regression module 2050 for classification and regression: classification determines whether each proposal region corresponds to a region that includes text, and regression determines the position of each proposal region in the image. The detection result splicing module 2070 laterally splices the proposal regions that the classification module 2040 determined to include text, according to the positions in the image determined by the regression module 2050, to obtain the text region detection result. In one optional example, the detection result splicing module 2070 connects proposal regions that are adjacent in position and/or intersect, according to the positions in the image determined by regression, to obtain the text region detection result; in another optional example, the detection result splicing module 2070 connects the anchor rectangles corresponding to such adjacent and/or intersecting proposal regions, to obtain the text region detection result.
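The flow through the modules described above can be illustrated as a simple composition. The callables below are hypothetical stand-ins (assumptions for illustration), not the patent's actual implementations:

```python
# Illustrative composition of the detection pipeline: feature extraction,
# proposal interception, classification, regression, and splicing.
def detect_text(image, extract_features, crop_proposals, classify, regress, splice):
    feature_map = extract_features(image)            # module 2010
    proposals = crop_proposals(feature_map)          # module 2030
    text_flags = [classify(p) for p in proposals]    # module 2040
    positions = [regress(p) for p in proposals]      # module 2050
    kept = [pos for pos, is_text in zip(positions, text_flags) if is_text]
    return splice(kept)                              # module 2070

# Dummy callables, for illustration only.
result = detect_text(
    image="img",
    extract_features=lambda img: img,
    crop_proposals=lambda fm: [0, 1, 2, 3],
    classify=lambda p: p % 2 == 0,
    regress=lambda p: p * 10,
    splice=sorted,
)
```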
An exemplary application example is described below in conjunction with the above text detection method and text detection device. FIG. 3 shows a schematic diagram of an exemplary application example according to the present application.
As shown in FIG. 3, the image 10 containing a text region is the object to be detected. In an existing RPN, the anchor rectangle employed is, for example, the illustrated single anchor rectangle 110 corresponding to the entire lateral width of the text region to be detected. Detection of the text region can only succeed when the lateral width of the anchor rectangle employed corresponds to the entire lateral width of the text region to be detected. Consequently, when the text is wide, the RPN often requires a very large receptive field to process it, which makes the network very difficult to design. Existing region proposal networks are therefore often unsuitable for direct application to text detection.
As shown in FIG. 3, according to an exemplary embodiment of the present application, a plurality of laterally spliced anchor rectangles 120 is used instead of the single anchor rectangle 110, and the sum of the widths of the laterally spliced anchor rectangles 120 corresponds to the entire lateral width of the text region to be detected. For example, the sum of the widths may be equal to, or slightly greater than, the entire lateral width of the text region to be detected. When the sum of the widths of the laterally spliced anchor rectangles 120 equals the entire lateral width of the text region to be detected, the anchor rectangles 120 abut one another so as to correspond to the entire lateral width of the text region. When the sum of the widths is greater than the entire lateral width, at least some adjacent anchor rectangles 120 partially overlap, and the width of the region formed by connecting the anchor rectangles 120 corresponds to the entire lateral width of the text region to be detected. In the above text detection method, the image feature extraction module 2010 in the convolutional neural network first extracts a feature map from the image to be detected 10; FIG. 3 shows a portion 20 of the resulting feature map by way of example. In the proposal region interception module 2030, the feature map is intercepted with a plurality of laterally spliced anchor rectangles to obtain a plurality of proposal regions, so that the proposal region intercepted by each anchor rectangle can be processed separately; each such proposal region takes, for example, the form of the sliding window shown in FIG. 3. Optionally, a proposal region intercepted by an anchor rectangle may be further processed by one or more convolutional layers 40. The proposal regions processed by the convolutional layer 40 (or proposal regions not processed by a convolutional layer) are input to a classifier 50 and a regressor 60: the classifier 50 identifies whether each proposal region is a text region, and the regressor 60 determines the position of each proposal region in the image to be detected 10. Finally, the detection result splicing module 2070 splices the proposal regions that the classifier 50 determined to correspond to text regions, according to the positions determined by the regressor 60, to form the text detection result. As mentioned above, splicing may optionally be performed by the detection result splicing module 2070 connecting proposal regions that are adjacent in position and/or intersect, or by connecting the anchor rectangles corresponding to such adjacent and/or intersecting proposal regions, to obtain the text region detection result.
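The lateral splicing step can be sketched minimally as merging proposal extents that are adjacent or intersecting along the width into single text regions. The helper name and the one-dimensional (x0, x1) extent representation are assumptions for illustration:

```python
# Sketch (assumed helper): horizontally splicing proposal boxes classified
# as text. Extents that touch or intersect along the width are merged into
# one text-region detection.
def splice_text_boxes(boxes):
    """boxes: list of (x0, x1) horizontal extents of text proposals."""
    merged = []
    for x0, x1 in sorted(boxes):
        if merged and x0 <= merged[-1][1]:  # adjacent or overlapping
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    return [tuple(m) for m in merged]

# Three overlapping proposals and one separate proposal yield two regions.
regions = splice_text_boxes([(0, 17), (16, 33), (32, 49), (80, 97)])
```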
According to an exemplary embodiment, the above text detection method 1000 further includes a step of training the convolutional neural network in advance. Through the training described below, a trained text detection device, such as the text detection device 2000 described above, is obtained.
FIG. 4 shows a training method 4000 for a convolutional neural network according to an exemplary embodiment. In an optional example, as shown in FIG. 4, the training method 4000 for the convolutional neural network may include: in step S4010, using the convolutional neural network to extract a feature map from a training image that includes a text region; in step S4030, laterally intercepting the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of proposal regions; in step S4050, classifying and regressing the proposal region intercepted by each anchor rectangle through the convolutional neural network, where the classification determines whether each proposal region corresponds to a region that includes text and the regression determines the position of each proposal region; and in step S4070, iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition. The predetermined convergence condition may be, for example, that the error value of the most recent iteration falls within an allowable range, that the error value is less than a predetermined value, that the error value is minimal, or that the number of iterations reaches a predetermined number.
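The iterative-training outer loop with its predetermined convergence conditions can be sketched generically. `train_step` below is a hypothetical stand-in for one forward/backward pass of the convolutional neural network, not the patent's actual implementation:

```python
# Sketch of the iterative-training loop described above, stopping when the
# error falls within tolerance or the iteration budget is exhausted (two of
# the convergence conditions named above).
def train_until_converged(train_step, tol=1e-3, max_iters=1000):
    error = float("inf")
    for it in range(1, max_iters + 1):
        error = train_step(it)
        if error < tol:
            break
    return it, error

# Dummy step whose error halves each iteration, for illustration only.
iters, err = train_until_converged(lambda it: 1.0 / 2 ** it)
```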
According to an embodiment of the present application, in each iteration of training the convolutional neural network, the difference between the real text region and the predicted text region is determined according to the intersection-over-union of the predicted text region and the corresponding real text region in the vertical direction. For example, in each iteration of training the convolutional neural network, the difference between the real text region and the predicted text region is determined according to a smooth L1 loss function. One form the difference may take is an error value.
According to an embodiment of the present application, when the intersection-over-union of a predicted text region and the corresponding real text region in the vertical direction is greater than a preset threshold, the proposal region corresponding to that predicted text region is determined to be a positive sample; otherwise, the proposal region corresponding to that predicted text region is determined to be a negative sample.
In an optional example, the classifier may use a softmax loss function as the training objective function to predict whether a proposal region is a text region. According to an exemplary embodiment, during training, when computing the error value of the convolutional neural network, the classifier determines whether each proposal region is a positive or a negative sample according to the intersection-over-union, in the vertical direction, of the proposal region and the lateral portion of the corresponding real text region. The regressor may use the smooth L1 loss function from the RPN as the training objective function to minimize the difference between the real text region and the predicted text region. After the convolutional neural network is iteratively trained until the training result satisfies the predetermined convergence condition, the parameters of the convolutional neural network are adapted to recognizing text regions in an image using a plurality of laterally spliced anchor rectangles.
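The vertical-direction intersection-over-union used above to label proposal regions as positive or negative samples can be sketched as an interval IoU. The one-dimensional (y0, y1) representation and the 0.5 threshold value are assumptions for illustration:

```python
# Sketch (assumed form): intersection-over-union computed only along the
# vertical (y) axis, as used above to label proposals positive or negative.
def vertical_iou(box_a, box_b):
    """Each box is (y0, y1). Returns interval IoU along the height."""
    inter = max(0.0, min(box_a[1], box_b[1]) - max(box_a[0], box_b[0]))
    union = (box_a[1] - box_a[0]) + (box_b[1] - box_b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_proposal(pred, gt, threshold=0.5):
    """1 = positive sample, 0 = negative sample (threshold value assumed)."""
    return 1 if vertical_iou(pred, gt) > threshold else 0
```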
In an optional example, when the smooth L1 loss function from the RPN is used as the training objective function, the difference between the real text region and the predicted text region is determined by the following formula:
$$L(\{c_i\},\{r_i\}) \;=\; \frac{1}{N_{cls}} \sum_i L_{cls}\big(c_i, c_i^*\big) \;+\; \lambda\,\frac{1}{N_{reg}} \sum_i c_i^* \, L_{reg}\big(r_i, r_i^*\big),$$

$$L_{reg}\big(r_i, r_i^*\big) \;=\; \sum_{j \in \{x,y,w,h\}} \mathrm{smooth}_{L1}\big(r_i^j - r_i^{*j}\big), \qquad \mathrm{smooth}_{L1}(x) \;=\; \begin{cases} 0.5\,x^2 & |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
where L is the target error function; i is the index of a proposal region intercepted by an anchor rectangle; c<sub>i</sub> is the category label of the i-th proposal region; r<sub>i</sub> is the position vector of the i-th proposal region; a variable with superscript * denotes the target real value of the corresponding variable; L<sub>cls</sub> is the classification loss function; L<sub>reg</sub> is the loss function for the regression position; N<sub>cls</sub> and N<sub>reg</sub> respectively denote the numbers of selected classification and regression training samples; λ is a preset empirical value; and j is any of x, y, w, and h, where x and y are respectively the abscissa and the ordinate of the center point of the corresponding proposal region, and w and h are respectively its width and height.
When the intersection-over-union, in the vertical direction, of the i-th proposal region and the lateral portion of the corresponding real text region is greater than the preset threshold, c<sub>i</sub> equals 1, indicating that the i-th proposal region is a positive sample; and when that intersection-over-union is less than or equal to the preset threshold, c<sub>i</sub> equals 0, indicating that the i-th proposal region is a negative sample.
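The smooth L1 term of the objective above follows its standard RPN definition, which the text says is reused here; the function names below are illustrative:

```python
# Sketch of the smooth L1 regression term in the objective above.
def smooth_l1(x):
    """Piecewise-quadratic/linear smooth L1, per the standard definition."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def reg_loss(r, r_star):
    """Sum of smooth L1 over j in {x, y, w, h} for one proposal region."""
    return sum(smooth_l1(a - b) for a, b in zip(r, r_star))
```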
Because, in the above training process, the classifier 50 determines from the intersection-over-union of the proposal region intercepted by the anchor rectangle and the real region whether each proposal region corresponds to a region that includes text (a positive sample) or a region that does not (a negative sample), an anchor rectangle that coincides with the real region in the vertical direction but covers only a small lateral portion of it will still be considered to correspond to the text region and will therefore be selected as a positive sample. In an existing RPN, by contrast, such an anchor rectangle would not be selected as a positive sample even though it does lie within a text region.
The trained convolutional neural network, i.e., the above text detection device 2000, is obtained by adjusting the system parameters during the iterative training process so as to reduce the difference, expressed by the training objective function, between the real text region and the predicted text region.
After this training, in the subsequent detection process, a plurality of laterally spliced anchor rectangles may be used for feature extraction followed by classification and regression, with each anchor rectangle (or the proposal region it intercepts) corresponding to a lateral portion of the region to be detected. Because features in the vertical direction were taken into account while training the convolutional neural network, during detection the classifier in the convolutional neural network likewise considers the vertical features of a proposal region when predicting whether it corresponds to a text region. After the proposal regions determined by classification to include text are laterally spliced according to the positions in the image determined by regression, the text region detection result is obtained. This technical solution avoids the problem that, when the anchor rectangle is narrower than the real region, parts of the real region that actually correspond to the text region cannot be correctly recognized.
The training method for the convolutional neural network provided by the embodiments of the present application may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers. Alternatively, the training method for the convolutional neural network provided by the embodiments of the present application may be executed by a processor, for example by the processor calling corresponding instructions stored in a memory. This will not be repeated below.
FIG. 5 shows an architectural diagram of a text detection training device 5000 according to an exemplary embodiment. The modules of the text detection training device 5000 perform the respective steps of the text detection training method 4000 described above. In an optional example, the text detection training device 5000 is implemented in the form of an RPN. As shown in FIG. 5, the text detection training device 5000 includes an image feature extraction module 5010, a proposal region interception module 5030, a classification module 5040, a regression module 5050, and a training module 5060. The image feature extraction module 5010 uses a convolutional neural network to extract a feature map from a training image that includes a text region; the proposal region interception module 5030 laterally intercepts the feature map of the training image with a plurality of anchor rectangles to obtain a plurality of proposal regions; the classification module 5040 classifies each proposal region through the convolutional neural network to determine whether it corresponds to a region that includes text; the regression module 5050 regresses each proposal region through the convolutional neural network to determine its position in the training image; and the training module 5060 iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
In an optional example, in combination with the above, when detecting text in a training image, the training image including the text is first input into the image feature extraction module 5010, which uses a convolutional neural network to extract a feature map from the training image; the feature map obtained by convolution contains the feature information of the training image. The feature map extracted by the image feature extraction module 5010 is then input into the proposal region interception module 5030, which laterally intercepts the feature map with a plurality of anchor rectangles to obtain a plurality of proposal regions. The obtained proposal regions are input into the classification module 5040 and the regression module 5050 for classification and regression: classification determines whether each proposal region corresponds to a region that includes text, and regression determines the position of each proposal region in the training image. The training module 5060 iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition. The predetermined convergence condition may be, for example, that the error value of the most recent iteration falls within an allowable range, that the error value is less than a predetermined value, that the error value is minimal, or that the number of iterations reaches a predetermined number.
According to an embodiment of the present application, in each iteration of training the convolutional neural network, the training module 5060 determines the difference between the real text region and the predicted text region according to the intersection-over-union of the predicted text region and the corresponding real text region in the vertical direction. In each iteration of training the convolutional neural network, the regression module 5050 determines the difference between the real text region and the predicted text region according to a smooth L1 loss function; one form the difference may take is an error value. When the intersection-over-union of a predicted text region and the corresponding real text region in the vertical direction is greater than a preset threshold, the training module 5060 determines the proposal region corresponding to that predicted text region to be a positive sample; otherwise, the training module 5060 determines it to be a negative sample.
In addition, each feature of the text detection training method 4000 described above with reference to FIG. 4 is applicable to the text detection training device 5000 shown in FIG. 5. In different embodiments, any number and combination of the features of the text detection training method 4000 described with reference to FIG. 4 may be incorporated into the text detection training device 5000 shown in FIG. 5.
According to an exemplary embodiment, in the training and text detection described above, the width of the anchor rectangles may be fixed, which reduces the sizes and number of anchor rectangles required for matching and thus reduces the amount of computation.
According to an exemplary embodiment, in the training and text detection described above, the width of the anchor rectangles may be equal to the stride of the convolutional neural network, so that the laterally spliced detection results correspond exactly to the entire width of the detection region. Optionally, the width of the anchor rectangles may be slightly greater than the stride of the convolutional neural network; for example, the width of an anchor rectangle may be the stride of the convolutional neural network plus 1. The laterally spliced detection results then correspond to the entire width of the detection region with a small amount of overlap, which prevents gaps from appearing between adjacent anchor rectangles due to factors such as errors in actual use, and thus prevents some intermediate widths of the detection region from being missed.
The text detection method and device and the text detection training method and device described with reference to FIGS. 1 to 5 may be implemented by a computer system. The computer system may include a memory storing executable instructions and a processor. The processor communicates with the memory to execute the executable instructions so as to implement the text detection method and device and the text detection training method and device described with reference to FIGS. 1 to 5. Alternatively or additionally, the text detection method and device and the text detection training method and device described with reference to FIGS. 1 to 5 may be implemented by a non-transitory computer storage medium. The medium stores computer-readable instructions which, when executed, cause a processor to carry out the text detection method and text detection training method described with reference to FIGS. 1 to 5.
Referring now to FIG. 6, FIG. 6 shows a schematic structural diagram of a computer system 6000 suitable for implementing the embodiments of the present application.
As shown in FIG. 6, the computer system 6000 may include a processing unit (such as a central processing unit (CPU) 6001 or a graphics processing unit (GPU)) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 6002 or a program loaded from a storage portion 6008 into a random access memory (RAM) 6003. The RAM 6003 may also store various programs and data required for the operation of the system 6000. The CPU 6001, the ROM 6002, and the RAM 6003 are connected to one another through a bus 6004. An input/output (I/O) interface 6005 is also connected to the bus 6004.
The following components can be connected to the I/O interface 6005: an input portion 6006 including a keyboard, a mouse, and the like; an output portion 6007 including a cathode-ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 6008 including a hard disk and the like; and a communication portion 6009 including a network interface card such as a LAN card or a modem. The communication portion 6009 can perform communication processing through a network such as the Internet. A drive 6010 can also be connected to the I/O interface 6005 as needed. A removable medium 6011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, can be mounted on the drive 6010 so that a computer program read from it can be installed into the storage portion 6008 as needed.
在一个可选示例中，根据本公开的实施例，以上参照图1至图5描述的文字检测方法和装置及文字检测训练方法和装置可实施为计算机软件程序。例如，本公开的实施例可包括计算机程序产品，该产品包括有形地体现在机器可读介质中的计算机程序。该计算机程序包括用于执行参照图1至图5描述的文字检测方法和装置及文字检测训练方法和装置。在这种实施例中，计算机程序可通过通信部分6009从网络上下载并进行安装，和/或可从可拆卸介质6011安装。In an optional example, according to an embodiment of the present disclosure, the text detection method and apparatus and the text detection training method and apparatus described above with reference to FIGS. 1 through 5 may be implemented as computer software programs. For example, an embodiment of the present disclosure may include a computer program product comprising a computer program tangibly embodied in a machine-readable medium, the computer program containing program code for performing the text detection method or the text detection training method described with reference to FIGS. 1 through 5. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 6009, and/or installed from the removable medium 6011.
本申请的文字检测技术可以用于公司卡证识别产品中，例如，在员工佩带公司卡证经过公司门禁系统的摄像头时，公司门禁系统的数据处理设备(如与摄像头通过网络连接的计算机或者服务器等)可以通过摄像头获得佩带公司卡证的员工的图像，公司门禁系统的数据处理设备通过利用本申请的文字检测技术，可以获得图像中的公司卡证上的文字区域，通过对该文字区域进行文字识别，可以获得公司卡证上标注的员工姓名以及部门等信息。The text detection technology of the present application may be used in company badge recognition products. For example, when an employee wearing a company badge passes a camera of the company's access control system, a data processing device of the access control system (such as a computer or server connected to the camera through a network) may obtain an image of the employee wearing the badge through the camera. By applying the text detection technology of the present application, the data processing device may locate the text region on the badge in the image, and by performing text recognition on that region, obtain information such as the employee's name and department printed on the badge.
本申请的文字检测技术还可以用于涉及文本框定位的多种应用中,例如,针对医疗票据、快递单以及发票等格式文本进行文本框定位,以便于对定位的文本框进行文字识别。文本框定位的结果或者文字识别的结果可以本地存储或者显示,也可以传输给服务器或者对等网络中的对等者等。本申请不限制定位后的文本框的具体应用场景。The text detection technology of the present application can also be used in various applications involving text box positioning, for example, text box positioning for formatted texts such as medical bills, express orders, and invoices, so as to facilitate text recognition of the positioned text box. The result of the text box positioning or the result of the text recognition may be stored or displayed locally, or may be transmitted to a server or a peer in a peer-to-peer network. This application does not limit the specific application scenario of the text box after positioning.
附图中的流程图和框图，图示了按照本申请各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分，所述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products in accordance with various embodiments of the present application. In this regard, each block of the flowcharts or block diagrams may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
本申请的实施例所涉及的单元或模块可通过软件或硬件实施。所描述的单元或模块也可设置在处理器中。这些单元或模块的名称不应被视为限制这些单元或模块。 The units or modules involved in the embodiments of the present application may be implemented by software or hardware. The described unit or module can also be provided in the processor. The names of these units or modules should not be construed as limiting these units or modules.
以上描述仅为本申请的示例性实施例以及对所运用技术原理的说明。本领域技术人员应当理解，本申请中所涉及的范围，并不限于上述技术特征的特定组合而成的技术方案，同时也应涵盖在不背离所述申请构思的情况下，由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本申请中公开的具有类似功能的技术特征进行互相替换而形成的技术方案。The above description is merely illustrative of exemplary embodiments of the present application and of the technical principles applied. Those skilled in the art should understand that the scope of the present application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the application — for example, technical solutions formed by replacing the above features with technical features of similar function disclosed in the present application.

Claims (36)

  1. 一种文字检测方法,包括:A text detection method comprising:
    使用卷积神经网络从包括文字区域的图像提取特征图;Extracting a feature map from an image including a text region using a convolutional neural network;
    采用多个锚矩形对所述特征图分别进行横向截取,得到多个建议区域;The feature maps are separately intercepted by using a plurality of anchor rectangles to obtain a plurality of suggested regions;
    将每个建议区域通过所述卷积神经网络进行分类和回归，其中，通过所述分类来确定每个建议区域是否对应于包括文字的区域，通过所述回归来确定每个建议区域对应所述图像中的位置；以及classifying and regressing each suggested region by the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position in the image to which each suggested region corresponds; and
    将通过分类确定的对应于包括文字的区域的各建议区域根据通过回归确定的所述各建议区域分别对应所述图像中的位置进行区域横向拼接，以得到文字区域检测结果。horizontally stitching the suggested regions determined by the classification to correspond to regions including text, according to the positions in the image determined by the regression for the respective suggested regions, to obtain a text region detection result.
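For illustration, the classify-and-regress step of the claimed method can be sketched in Python; `classify` and `regress` below are plain callables standing in for the trained convolutional-network heads (assumptions for illustration, not the patent's actual network):

```python
def classify_and_regress(proposals, classify, regress):
    """Per-proposal step of the claimed method: keep each anchor slice
    the classifier marks as text, replaced by its regressed position
    in the image. `classify` and `regress` stand in for CNN heads."""
    text_boxes = []
    for box in proposals:
        if classify(box):                        # region includes text?
            text_boxes.append(tuple(regress(box)))  # refined position
    return text_boxes
```

The surviving boxes are then horizontally stitched into full text regions, which is the subject of claim 2.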
  2. 根据权利要求1所述的文字检测方法,所述区域横向拼接包括:The character detecting method according to claim 1, wherein the horizontal splicing of the area comprises:
    根据通过回归确定的所述各建议区域分别对应所述图像中的位置，将位置相邻的和/或位置有交集的建议区域进行连接，由此得到所述文字区域检测结果；或者connecting, according to the positions in the image determined by the regression for the respective suggested regions, suggested regions whose positions are adjacent and/or intersect, thereby obtaining the text region detection result; or
    根据通过回归确定的所述各建议区域分别对应所述图像中的位置，将位置相邻的和/或位置有交集的建议区域对应的锚矩形进行连接，由此得到所述文字区域检测结果。connecting, according to the positions in the image determined by the regression for the respective suggested regions, the anchor rectangles corresponding to suggested regions whose positions are adjacent and/or intersect, thereby obtaining the text region detection result.
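For illustration, the stitching of claim 2 (connecting suggested regions whose positions are adjacent or intersect) can be sketched as follows, assuming each regressed region is an axis-aligned box (x1, y1, x2, y2); the `gap` tolerance is an illustrative parameter, not one fixed by the claim:

```python
def stitch_regions(boxes, gap=0):
    """Horizontally stitch proposal boxes (x1, y1, x2, y2): boxes whose
    horizontal extents are adjacent (within `gap`) or intersecting are
    merged into a single text-region box."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[0])   # scan left to right
    merged = [list(boxes[0])]
    for x1, y1, x2, y2 in boxes[1:]:
        last = merged[-1]
        if x1 <= last[2] + gap:                 # adjacent or overlapping
            last[1] = min(last[1], y1)          # grow vertical extent
            last[2] = max(last[2], x2)          # extend to the right
            last[3] = max(last[3], y2)
        else:
            merged.append([x1, y1, x2, y2])
    return [tuple(b) for b in merged]
```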
  3. 根据权利要求1或2所述的文字检测方法,还包括预先对所述卷积神经网络进行训练,其中,对所述卷积神经网络的训练包括:The character detecting method according to claim 1 or 2, further comprising training the convolutional neural network in advance, wherein training on the convolutional neural network comprises:
    使用所述卷积神经网络从包括文字区域的训练图像提取特征图;Extracting a feature map from the training image including the text region using the convolutional neural network;
    采用多个锚矩形对所述训练图像的特征图进行横向截取,得到多个建议区域;The feature map of the training image is laterally intercepted by using multiple anchor rectangles to obtain a plurality of suggested regions;
    将每个锚矩形截取的建议区域通过所述卷积神经网络进行分类和回归，其中所述分类确定每个建议区域是否对应于包括文字的区域，所述回归确定每个建议区域对应所述训练图像中的位置；以及classifying and regressing the suggested region intercepted by each anchor rectangle by the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position in the training image to which each suggested region corresponds; and
    根据已知的与所述训练图像对应的真实文字区域以及所述分类和回归得到的预测文字区域的差异，迭代训练所述卷积神经网络直至训练结果满足预定收敛条件。iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  4. 根据权利要求3所述的文字检测方法，其中，在所述卷积神经网络的每次迭代训练中，根据所述预测文字区域与所述对应的真实文字区域在竖直方向上的交并比，确定所述真实文字区域和所述预测文字区域之间的差异。The text detection method according to claim 3, wherein, in each iterative training of the convolutional neural network, the difference between the real text region and the predicted text region is determined according to the intersection-over-union of the predicted text region and the corresponding real text region in the vertical direction.
  5. 根据权利要求3或4所述的文字检测方法,其中,在所述卷积神经网络的每次迭代训练中,根据smooth L1损失函数确定所述真实文字区域和所述预测文字区域之间的差异。The character detecting method according to claim 3 or 4, wherein in each iterative training of the convolutional neural network, the difference between the real text region and the predicted text region is determined according to a smooth L1 loss function .
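For illustration, the smooth L1 loss named in the claim above is quadratic for small residuals and linear for large ones; one common parameterization (the claims do not fix the constants) is:

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over box coordinates:
    0.5 * d**2 / beta when |d| < beta, |d| - 0.5 * beta otherwise
    (beta = 1.0 is one common choice, assumed here)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total
```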
  6. 根据权利要求3-5任一所述的文字检测方法，其中，当预测文字区域与对应的真实文字区域在竖直方向上的交并比大于预先设定的阈值时，该预测文字区域对应的建议区域被确定为正样本；否则，该预测文字区域对应的建议区域被确定为负样本。The text detection method according to any one of claims 3-5, wherein, when the intersection-over-union of a predicted text region and the corresponding real text region in the vertical direction is greater than a preset threshold, the suggested region corresponding to the predicted text region is determined to be a positive sample; otherwise, the suggested region corresponding to the predicted text region is determined to be a negative sample.
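For illustration, the vertical intersection-over-union test of claims 4 and 6 can be sketched with spans given as (top, bottom) in image coordinates; the 0.7 threshold below is an illustrative value, not one fixed by the claims:

```python
def vertical_iou(span_a, span_b):
    """Intersection-over-union along the vertical axis only, for
    spans given as (top, bottom)."""
    inter = max(0.0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_proposal(pred_span, true_span, threshold=0.7):
    """Positive sample (1) when the vertical IoU exceeds the preset
    threshold, negative sample (0) otherwise."""
    return 1 if vertical_iou(pred_span, true_span) > threshold else 0
```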
  7. 根据权利要求1-6任一所述的文字检测方法,其中,所述锚矩形的宽度是固定的。The character detecting method according to any one of claims 1 to 6, wherein the width of the anchor rectangle is fixed.
  8. 根据权利要求1-7任一所述的文字检测方法,其中,所述锚矩形的宽度根据所述卷积神经网络的步长确定。The character detecting method according to any one of claims 1 to 7, wherein the width of the anchor rectangle is determined according to a step size of the convolutional neural network.
  9. 根据权利要求8所述的文字检测方法,其中,所述锚矩形的宽度等于或大于所述卷积神经网络的步长。The character detecting method according to claim 8, wherein a width of the anchor rectangle is equal to or larger than a step size of the convolutional neural network.
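Claims 7-9 above can be illustrated by the anchor generator below: the anchor width is fixed and tied to the network stride (here width equals the stride), while the heights vary per anchor. The particular stride and height set are assumptions borrowed from similar fixed-width detectors, not values specified by the claims:

```python
def make_anchors(feature_map_width, stride=16,
                 heights=(11, 16, 23, 33, 48), y_center=50.0):
    """One column of fixed-width anchors per feature-map position:
    width equals the network stride; only the heights vary."""
    anchors = []
    for col in range(feature_map_width):
        x1 = col * stride                 # each column maps back by the stride
        for h in heights:
            anchors.append((x1, y_center - h / 2, x1 + stride, y_center + h / 2))
    return anchors
```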
  10. 一种文字检测训练方法,包括:A text detection training method includes:
    使用卷积神经网络从包括文字区域的训练图像提取特征图;Extracting a feature map from a training image including a text region using a convolutional neural network;
    采用多个锚矩形对所述训练图像的特征图进行横向截取,得到多个建议区域;The feature map of the training image is laterally intercepted by using multiple anchor rectangles to obtain a plurality of suggested regions;
    将每个锚矩形截取的建议区域通过所述卷积神经网络进行分类和回归，其中所述分类确定每个建议区域是否对应于包括文字的区域，所述回归确定每个建议区域对应所述训练图像中的位置；以及classifying and regressing the suggested region intercepted by each anchor rectangle by the convolutional neural network, wherein the classification determines whether each suggested region corresponds to a region including text, and the regression determines the position in the training image to which each suggested region corresponds; and
    根据已知的与所述训练图像对应的真实文字区域以及所述分类和回归得到的预测文字区域的差异，迭代训练所述卷积神经网络直至训练结果满足预定收敛条件。iteratively training the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  11. 根据权利要求10所述的文字检测训练方法，其中，在所述卷积神经网络的每次迭代训练中，根据所述预测文字区域与所述对应的真实文字区域在竖直方向上的交并比，确定所述真实文字区域和所述预测文字区域之间的差异。The text detection training method according to claim 10, wherein, in each iterative training of the convolutional neural network, the difference between the real text region and the predicted text region is determined according to the intersection-over-union of the predicted text region and the corresponding real text region in the vertical direction.
  12. 根据权利要求10或11所述的文字检测训练方法，其中，在所述卷积神经网络的每次迭代训练中，根据smooth L1损失函数确定所述真实文字区域和所述预测文字区域之间的差异。The text detection training method according to claim 10 or 11, wherein, in each iterative training of the convolutional neural network, the difference between the real text region and the predicted text region is determined according to a smooth L1 loss function.
  13. 根据权利要求10-12任一所述的文字检测训练方法，其中，当预测文字区域与对应的真实文字区域在竖直方向上的交并比大于预先设定的阈值时，该预测文字区域对应的建议区域被确定为正样本；否则，该预测文字区域对应的建议区域被确定为负样本。The text detection training method according to any one of claims 10-12, wherein, when the intersection-over-union of a predicted text region and the corresponding real text region in the vertical direction is greater than a preset threshold, the suggested region corresponding to the predicted text region is determined to be a positive sample; otherwise, the suggested region corresponding to the predicted text region is determined to be a negative sample.
  14. 根据权利要求10-13任一所述的文字检测训练方法,其中,所述锚矩形的宽度是固定的。A character detection training method according to any one of claims 10-13, wherein the width of the anchor rectangle is fixed.
  15. 根据权利要求10-14任一所述的文字检测训练方法,其中,所述锚矩形的宽度根据所述卷积神经网络的步长确定。The character detection training method according to any one of claims 10-14, wherein the width of the anchor rectangle is determined according to a step size of the convolutional neural network.
  16. 根据权利要求15所述的文字检测训练方法,其中,所述锚矩形的宽度等于或大于所述卷积神经网络的步长。The character detection training method according to claim 15, wherein a width of the anchor rectangle is equal to or larger than a step size of the convolutional neural network.
  17. 一种文字检测装置,包括:A text detecting device comprising:
    图像特征提取模块,使用卷积神经网络从包括文字区域的图像提取特征图; An image feature extraction module for extracting a feature map from an image including a text region using a convolutional neural network;
    建议区域截取模块，采用多个锚矩形对所述特征图分别进行横向截取，得到多个建议区域；a suggested region interception module that horizontally intercepts the feature map using a plurality of anchor rectangles to obtain a plurality of suggested regions;
    分类模块,将每个建议区域通过所述卷积神经网络进行分类,以确定每个建议区域是否对应于包括文字的区域;a classification module that classifies each suggested region by the convolutional neural network to determine whether each suggested region corresponds to an area including text;
    回归模块,将每个建议区域通过所述卷积神经网络进行回归,以确定每个建议区域对应所述图像中的位置;以及a regression module that regresses each suggested region through the convolutional neural network to determine that each suggested region corresponds to a location in the image;
    检测结果拼接模块，将所述分类模块确定的对应于包括文字的区域的各建议区域根据所述回归模块确定的所述各建议区域分别对应所述图像中的位置进行区域横向拼接，以得到文字区域检测结果。a detection result stitching module that horizontally stitches the suggested regions determined by the classification module to correspond to regions including text, according to the positions in the image determined by the regression module for the respective suggested regions, to obtain a text region detection result.
  18. 根据权利要求17所述的文字检测装置,所述区域横向拼接包括:The character detecting device according to claim 17, wherein the horizontal splicing of the region comprises:
    根据通过回归确定的所述各建议区域分别对应所述图像中的位置，将位置相邻的和/或位置有交集的建议区域进行连接，由此得到所述文字区域检测结果；或者connecting, according to the positions in the image determined by the regression for the respective suggested regions, suggested regions whose positions are adjacent and/or intersect, thereby obtaining the text region detection result; or
    根据通过回归确定的所述各建议区域分别对应所述图像中的位置，将位置相邻的和/或位置有交集的建议区域对应的锚矩形进行连接，由此得到所述文字区域检测结果。connecting, according to the positions in the image determined by the regression for the respective suggested regions, the anchor rectangles corresponding to suggested regions whose positions are adjacent and/or intersect, thereby obtaining the text region detection result.
  19. 根据权利要求17或18所述的文字检测装置,还包括预先对所述卷积神经网络进行训练的训练模块,其中,在对所述卷积神经网络的预先训练过程中:A character detecting apparatus according to claim 17 or 18, further comprising a training module for training said convolutional neural network in advance, wherein in a pre-training process for said convolutional neural network:
    所述图像特征提取模块从包括文字区域的训练图像提取特征图;The image feature extraction module extracts a feature map from a training image including a text region;
    所述建议区域截取模块采用多个锚矩形对所述训练图像的特征图进行横向截取,得到多个建议区域;The suggested area intercepting module uses a plurality of anchor rectangles to perform horizontal interception on the feature image of the training image to obtain a plurality of suggested areas;
    所述分类模块将每个建议区域通过所述卷积神经网络进行分类，以确定每个建议区域是否对应于包括文字的区域，所述回归模块将每个建议区域通过所述卷积神经网络进行回归，以确定每个建议区域对应所述训练图像中的位置；以及the classification module classifies each suggested region by the convolutional neural network to determine whether each suggested region corresponds to a region including text, and the regression module regresses each suggested region by the convolutional neural network to determine the position in the training image to which each suggested region corresponds; and
    所述训练模块根据已知的与所述训练图像对应的真实文字区域以及所述分类和回归得到的预测文字区域的差异，迭代训练所述卷积神经网络直至训练结果满足预定收敛条件。the training module iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  20. 根据权利要求19所述的文字检测装置，其中，在所述卷积神经网络的每次迭代训练中，训练模块根据所述预测文字区域与所述对应的真实文字区域在竖直方向上的交并比，确定所述真实文字区域和所述预测文字区域之间的差异。The text detection apparatus according to claim 19, wherein, in each iterative training of the convolutional neural network, the training module determines the difference between the real text region and the predicted text region according to the intersection-over-union of the predicted text region and the corresponding real text region in the vertical direction.
  21. 根据权利要求19或20所述的文字检测装置，其中，在所述卷积神经网络的每次迭代训练中，回归模块根据smooth L1损失函数确定所述真实文字区域和所述预测文字区域之间的差异。The text detection apparatus according to claim 19 or 20, wherein, in each iterative training of the convolutional neural network, the regression module determines the difference between the real text region and the predicted text region according to a smooth L1 loss function.
  22. 根据权利要求19-21任一所述的文字检测装置，其中，当预测文字区域与对应的真实文字区域在竖直方向上的交并比大于预先设定的阈值时，该预测文字区域对应的建议区域被训练模块确定为正样本；否则，该预测文字区域对应的建议区域被训练模块确定为负样本。The text detection apparatus according to any one of claims 19-21, wherein, when the intersection-over-union of a predicted text region and the corresponding real text region in the vertical direction is greater than a preset threshold, the suggested region corresponding to the predicted text region is determined by the training module to be a positive sample; otherwise, the suggested region corresponding to the predicted text region is determined by the training module to be a negative sample.
  23. 根据权利要求17-22任一所述的文字检测装置,其中,所述锚矩形的宽度是固定的。A character detecting device according to any one of claims 17 to 22, wherein the width of the anchor rectangle is fixed.
  24. 根据权利要求17-23任一所述的文字检测装置,其中,所述锚矩形的宽度根据所述卷积神经网络的步长确定。A character detecting apparatus according to any one of claims 17 to 23, wherein a width of said anchor rectangle is determined according to a step size of said convolutional neural network.
  25. 根据权利要求24所述的文字检测装置,其中,所述锚矩形的宽度等于或大于所述卷积神经网络的步长。The character detecting device according to claim 24, wherein a width of said anchor rectangle is equal to or larger than a step size of said convolutional neural network.
  26. 一种文字检测训练装置,包括:A text detection training device includes:
    图像特征提取模块,使用卷积神经网络从包括文字区域的训练图像提取特征图;An image feature extraction module for extracting a feature map from a training image including a text region using a convolutional neural network;
    建议区域截取模块，采用多个锚矩形对所述训练图像的特征图进行横向截取，得到多个建议区域；a suggested region interception module that horizontally intercepts the feature map of the training image using a plurality of anchor rectangles to obtain a plurality of suggested regions;
    分类模块,将每个建议区域通过所述卷积神经网络进行分类,以确定每个建议区域是否对应于包括文字的区域;a classification module that classifies each suggested region by the convolutional neural network to determine whether each suggested region corresponds to an area including text;
    回归模块,将每个建议区域通过所述卷积神经网络进行回归,以确定每个建议区域对应所述训练图像中的位置;以及a regression module that regresses each suggested region through the convolutional neural network to determine that each suggested region corresponds to a location in the training image;
    训练模块，根据已知的与所述训练图像对应的真实文字区域以及所述分类和回归得到的预测文字区域的差异，迭代训练所述卷积神经网络直至训练结果满足预定收敛条件。a training module that iteratively trains the convolutional neural network according to the difference between the known real text region corresponding to the training image and the predicted text region obtained by the classification and regression, until the training result satisfies a predetermined convergence condition.
  27. 根据权利要求26所述的文字检测训练装置，其中，在所述卷积神经网络的每次迭代训练中，训练模块根据所述预测文字区域与所述对应的真实文字区域在竖直方向上的交并比，确定所述真实文字区域和所述预测文字区域之间的差异。The text detection training apparatus according to claim 26, wherein, in each iterative training of the convolutional neural network, the training module determines the difference between the real text region and the predicted text region according to the intersection-over-union of the predicted text region and the corresponding real text region in the vertical direction.
  28. 根据权利要求26或27所述的文字检测训练装置，其中，在所述卷积神经网络的每次迭代训练中，回归模块根据smooth L1损失函数确定所述真实文字区域和所述预测文字区域之间的差异。The text detection training apparatus according to claim 26 or 27, wherein, in each iterative training of the convolutional neural network, the regression module determines the difference between the real text region and the predicted text region according to a smooth L1 loss function.
  29. 根据权利要求26-28任一所述的文字检测训练装置，其中，当预测文字区域与对应的真实文字区域在竖直方向上的交并比大于预先设定的阈值时，该预测文字区域对应的建议区域被训练模块确定为正样本；否则，该预测文字区域对应的建议区域被训练模块确定为负样本。The text detection training apparatus according to any one of claims 26-28, wherein, when the intersection-over-union of a predicted text region and the corresponding real text region in the vertical direction is greater than a preset threshold, the suggested region corresponding to the predicted text region is determined by the training module to be a positive sample; otherwise, the suggested region corresponding to the predicted text region is determined by the training module to be a negative sample.
  30. 根据权利要求26-29任一所述的文字检测训练装置,其中,所述锚矩形的宽度是固定的。A character detecting training device according to any one of claims 26-29, wherein the width of the anchor rectangle is fixed.
  31. 根据权利要求26-30任一所述的文字检测训练装置，其中，所述锚矩形的宽度根据所述卷积神经网络的步长确定。The text detection training apparatus according to any one of claims 26-30, wherein the width of the anchor rectangle is determined according to the step size of the convolutional neural network.
  32. 根据权利要求31所述的文字检测训练装置,其中,所述锚矩形的宽度等于或大于所述卷积神经网络的步长。The character detecting training apparatus according to claim 31, wherein a width of said anchor rectangle is equal to or larger than a step size of said convolutional neural network.
  33. 一种文字检测装置,包括:A text detecting device comprising:
    存储器,存储有可执行指令;以及a memory that stores executable instructions;
    一个或多个处理器,与所述存储器通信,以执行所述可执行指令,从而执行如权利要求1-9中的任一权利要求所述的文字检测方法中的操作。One or more processors in communication with the memory to execute the executable instructions to perform the operations in the text detection method of any of claims 1-9.
  34. 一种文字检测训练装置,包括:A text detection training device includes:
    存储器,存储有可执行指令;以及a memory that stores executable instructions;
    一个或多个处理器,与所述存储器通信,以执行所述可执行指令,从而执行如权利要求10-16中的任一权利要求所述的文字检测训练方法中的操作。One or more processors in communication with the memory to execute the executable instructions to perform the operations in the text detection training method of any of claims 10-16.
  35. 一种计算机程序，包括计算机可读代码，当所述计算机可读代码在设备中运行时，所述设备中的处理器执行用于实现权利要求1-9中的任一权利要求所述的文字检测方法或者用于实现权利要求10-16中的任一权利要求所述的文字检测训练方法中的步骤的可执行指令。A computer program comprising computer-readable code which, when run in a device, causes a processor in the device to execute executable instructions for implementing the steps of the text detection method according to any one of claims 1-9 or of the text detection training method according to any one of claims 10-16.
  36. 一种计算机可读介质，存储有计算机可读代码，当所述计算机可读代码在设备中运行时，所述设备中的处理器执行用于实现权利要求1-9中的任一权利要求所述的文字检测方法或者用于实现权利要求10-16中的任一权利要求所述的文字检测训练方法中的步骤的可执行指令。A computer-readable medium storing computer-readable code which, when run in a device, causes a processor in the device to execute executable instructions for implementing the steps of the text detection method according to any one of claims 1-9 or of the text detection training method according to any one of claims 10-16.
PCT/CN2017/102679 2016-09-22 2017-09-21 Character detection method and device, and character detection training method and device WO2018054326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610842572.3 2016-09-22
CN201610842572.3A CN106446899A (en) 2016-09-22 2016-09-22 Text detection method and device and text detection training method and device

Publications (1)

Publication Number Publication Date
WO2018054326A1 true WO2018054326A1 (en) 2018-03-29

Family

ID=58166338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/102679 WO2018054326A1 (en) 2016-09-22 2017-09-21 Character detection method and device, and character detection training method and device

Country Status (2)

Country Link
CN (1) CN106446899A (en)
WO (1) WO2018054326A1 (en)



Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7499588B2 (en) * 2004-05-20 2009-03-03 Microsoft Corporation Low resolution OCR for camera acquired documents
CN104463209B (en) * 2014-12-08 2017-05-24 福建坤华仪自动化仪器仪表有限公司 Method for recognizing digital code on PCB based on BP neural network
CN105447529B (en) * 2015-12-30 2020-11-03 商汤集团有限公司 Method and system for detecting clothes and identifying attribute value thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868758A (en) * 2015-01-21 2016-08-17 阿里巴巴集团控股有限公司 Method and device for detecting text area in image and electronic device
CN105608454A (en) * 2015-12-21 2016-05-25 上海交通大学 Text structure part detection neural network based text detection method and system
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN106446899A (en) * 2016-09-22 2017-02-22 北京市商汤科技开发有限公司 Text detection method and device and text detection training method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHAOQING REN ET AL.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv:1506.01497v3, vol. 39, no. 6, 6 January 2016 (2016-01-06), pages 1137-1149, XP055583592, Retrieved from the Internet <URL:https://arxiv.org/pdf/1506.01497v3> *
ZHUOYAO ZHONG ET AL.: "DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images", arXiv:1605.07314v1 [cs.CV], 24 May 2016 (2016-05-24), pages 1-12, XP080703156, Retrieved from the Internet <URL:https://arxiv.org/pdf/1605.07314v1> *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619325A (en) * 2018-06-20 2019-12-27 北京搜狗科技发展有限公司 Text recognition method and device
CN110619325B (en) * 2018-06-20 2024-03-08 北京搜狗科技发展有限公司 Text recognition method and device
CN111615702B (en) * 2018-12-07 2023-10-17 华为云计算技术有限公司 Method, device and equipment for extracting structured data from image
CN111615702A (en) * 2018-12-07 2020-09-01 华为技术有限公司 Method, device and equipment for extracting structured data from image
CN111325194A (en) * 2018-12-13 2020-06-23 杭州海康威视数字技术股份有限公司 Character recognition method, device and equipment and storage medium
CN111325194B (en) * 2018-12-13 2023-12-29 杭州海康威视数字技术股份有限公司 Character recognition method, device and equipment and storage medium
CN109840524B (en) * 2019-01-04 2023-07-11 平安科技(深圳)有限公司 Text type recognition method, device, equipment and storage medium
CN109840524A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Text type identification method, device, equipment and storage medium
US20220189190A1 (en) * 2019-03-28 2022-06-16 Nielsen Consumer Llc Methods and apparatus to detect a text region of interest in a digital image using machine-based analysis
CN110210478A (en) * 2019-06-04 2019-09-06 天津大学 Commodity outer packaging character recognition method
CN112541489A (en) * 2019-09-23 2021-03-23 顺丰科技有限公司 Image detection method and device, mobile terminal and storage medium
CN110991440A (en) * 2019-12-11 2020-04-10 易诚高科(大连)科技有限公司 Pixel-driven mobile phone operation interface text detection method
CN110991440B (en) * 2019-12-11 2023-10-13 易诚高科(大连)科技有限公司 Pixel-driven mobile phone operation interface text detection method
CN111046866B (en) * 2019-12-13 2023-04-18 哈尔滨工程大学 Method for detecting RMB crown word number region by combining CTPN and SVM
CN111046866A (en) * 2019-12-13 2020-04-21 哈尔滨工程大学 Method for detecting RMB crown word number region by combining CTPN and SVM
CN111191695B (en) * 2019-12-19 2023-05-23 杭州安恒信息技术股份有限公司 Website picture tampering detection method based on deep learning
CN111191695A (en) * 2019-12-19 2020-05-22 杭州安恒信息技术股份有限公司 Website picture tampering detection method based on deep learning
CN113012029B (en) * 2019-12-20 2023-12-08 北京搜狗科技发展有限公司 Curved surface image correction method and device and electronic equipment
CN113012029A (en) * 2019-12-20 2021-06-22 北京搜狗科技发展有限公司 Curved surface image correction method and device and electronic equipment
CN111340023A (en) * 2020-02-24 2020-06-26 创新奇智(上海)科技有限公司 Text recognition method and device, electronic equipment and storage medium
CN111340023B (en) * 2020-02-24 2022-09-09 创新奇智(上海)科技有限公司 Text recognition method and device, electronic equipment and storage medium
CN111339995B (en) * 2020-03-16 2024-02-20 合肥闪捷信息科技有限公司 Sensitive image recognition method based on neural network
CN111339995A (en) * 2020-03-16 2020-06-26 合肥闪捷信息科技有限公司 Sensitive image identification method based on neural network
US20220245954A1 (en) * 2020-03-25 2022-08-04 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, terminal, and storage medium
US12014556B2 (en) * 2020-03-25 2024-06-18 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, terminal, and storage medium
CN111461304B (en) * 2020-03-31 2023-09-15 北京小米松果电子有限公司 Training method of classification neural network, text classification method, device and equipment
CN111461304A (en) * 2020-03-31 2020-07-28 北京小米松果电子有限公司 Training method for classifying neural network, text classification method, text classification device and equipment
CN111639566A (en) * 2020-05-19 2020-09-08 浙江大华技术股份有限公司 Method and device for extracting form information
CN111738326A (en) * 2020-06-16 2020-10-02 中国工商银行股份有限公司 Sentence granularity marking training sample generation method and device
CN111767867B (en) * 2020-06-30 2022-12-09 创新奇智(北京)科技有限公司 Text detection method, model training method and corresponding devices
CN111767867A (en) * 2020-06-30 2020-10-13 创新奇智(北京)科技有限公司 Text detection method, model training method and corresponding devices
CN111967391A (en) * 2020-08-18 2020-11-20 清华大学 Text recognition method and computer-readable storage medium for medical laboratory test reports
CN112418216A (en) * 2020-11-18 2021-02-26 湖南师范大学 Method for detecting characters in complex natural scene image
CN112418216B (en) * 2020-11-18 2024-01-05 湖南师范大学 Text detection method in complex natural scene image
CN112861045A (en) * 2021-02-20 2021-05-28 北京金山云网络技术有限公司 Method and device for displaying file, storage medium and electronic device
CN112966690B (en) * 2021-03-03 2023-01-13 中国科学院自动化研究所 Scene character detection method based on anchor-free frame and suggestion frame
CN112966690A (en) * 2021-03-03 2021-06-15 中国科学院自动化研究所 Scene character detection method based on anchor-free frame and suggestion frame
CN113158862B (en) * 2021-04-13 2023-08-22 哈尔滨工业大学(深圳) Multitasking-based lightweight real-time face detection method
CN113158862A (en) * 2021-04-13 2021-07-23 哈尔滨工业大学(深圳) Lightweight real-time face detection method based on multiple tasks
CN113313066A (en) * 2021-06-23 2021-08-27 Oppo广东移动通信有限公司 Image recognition method, image recognition device, storage medium and terminal
CN113762109B (en) * 2021-08-23 2023-11-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method
CN113762109A (en) * 2021-08-23 2021-12-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method
CN113887282A (en) * 2021-08-30 2022-01-04 中国科学院信息工程研究所 Detection system and method for any-shape adjacent text in scene image

Also Published As

Publication number Publication date
CN106446899A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
WO2018054326A1 (en) Character detection method and device, and character detection training method and device
US20240037926A1 (en) Segmenting objects by refining shape priors
US11775574B2 (en) Method and apparatus for visual question answering, computer device and medium
CN108304835B (en) character detection method and device
US11768876B2 (en) Method and device for visual question answering, computer apparatus and medium
US11816710B2 (en) Identifying key-value pairs in documents
US20210103763A1 (en) Method and apparatus for processing laser radar based sparse depth map, device and medium
CN108038880B (en) Method and apparatus for processing image
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
US10657359B2 (en) Generating object embeddings from images
CN110874618B (en) OCR template learning method and device based on small sample, electronic equipment and medium
CN109564575A (en) Classifying images using machine learning models
US10748217B1 (en) Systems and methods for automated body mass index calculation
US11335093B2 (en) Visual tracking by colorization
CN113822428A (en) Neural network training method and device and image segmentation method
CN109345460B (en) Method and apparatus for rectifying image
US11881044B2 (en) Method and apparatus for processing image, device and storage medium
CN113313114B (en) Certificate information acquisition method, device, equipment and storage medium
CN117523586A (en) Check seal verification method and device, electronic equipment and medium
CN113515920B (en) Method, electronic device and computer readable medium for extracting formulas from tables
CN113392215A (en) Training method of production problem classification model, and production problem classification method and device
CN114240780A (en) Seal identification method, device, equipment, medium and product
CN115565152B (en) Traffic sign extraction method integrating vehicle-mounted laser point cloud and panoramic image
CN114328891A (en) Training method of information recommendation model, information recommendation method and device
CN114627481A (en) Form processing method, device, equipment, medium and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17852394

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17852394

Country of ref document: EP

Kind code of ref document: A1