WO2018010657A1 - Structured text detection method and system, and computing device - Google Patents

Structured text detection method and system, and computing device

Info

Publication number
WO2018010657A1
WO2018010657A1 · PCT/CN2017/092586 · CN2017092586W
Authority
WO
WIPO (PCT)
Prior art keywords
text
area
detected
picture
sample
Prior art date
Application number
PCT/CN2017/092586
Other languages
English (en)
French (fr)
Inventor
向东来
夏炎
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2018010657A1
Priority to US16/052,584 (published as US10937166B2)

Classifications

    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F18/2413 Classification techniques based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F40/186 Text processing; editing; templates
    • G06F40/279 Recognition of textual entities
    • G06N3/045 Neural network architectures; combinations of networks
    • G06N3/08 Neural network learning methods
    • G06N5/046 Forward inferencing; production systems
    • G06T3/40 Scaling of whole images or parts thereof
    • G06T3/60 Rotation of whole images or parts thereof
    • G06T7/11 Region-based segmentation
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/454 Integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/751 Comparing pixel values or feature values having positional relevance, e.g. template matching
    • G06V30/19173 Classification techniques for character recognition
    • G06V30/413 Classification of document content, e.g. text, photographs or tables
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
    • G06V30/10 Character recognition

Definitions

  • the present application relates to the field of image processing, and in particular, to a structured text detection method and system, and a computing device.
  • Structured text refers to text with a basically fixed layout structure, such as ID cards, passports, motor vehicle driver's licenses, and tickets.
  • In the digital age, entering this information into a computer by typing is time-consuming. To save time, people have begun photographing such documents and then using computer vision technology to automatically extract the text from the resulting pictures.
  • the embodiment of the present application provides a structured text detection scheme.
  • a structured text detection method including:
  • the convolutional neural network receives the picture and the text area template; the picture includes structured text; the text area template includes the position of at least one text area, each of which is obtained in advance based on the position of the corresponding text area in at least one sample picture of the same type as the picture;
  • a structured text detection system including:
  • a receiving module configured to receive a picture and a text area template; the picture includes structured text; the text area template includes the position of at least one text area, each of which is obtained based on the position of the corresponding text area in at least one sample picture of the same type as the picture;
  • an obtaining module configured to acquire, according to the text area template, an actual location of a group of to-be-detected areas of the picture.
  • a computing device comprising: the structured text detection system of any of the embodiments of the present application.
  • another computing device, including: a processor and the structured text detection system of any of the embodiments of the present application; when the processor runs the structured text detection system, the modules in the structured text detection system of any of the embodiments of the present application are executed.
  • a computing device including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus;
  • the memory is for storing at least one executable instruction that causes the processor to perform operations of steps in the structured text detection method of any of the embodiments of the present application.
  • a computer system including:
  • a memory storing executable instructions, and one or more processors in communication with the memory to execute the executable instructions so as to perform the operations of the steps in the structured text detection method of any of the embodiments of the present application.
  • a computer program including computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps in the structured text detection method of any of the embodiments of the present application.
  • a computer readable medium for storing computer readable instructions which, when executed, implement the operations of the steps in the structured text detection method of any of the embodiments of the present application.
  • Based on the structured text detection schemes provided above, the text area template is obtained in advance from the positions of the corresponding text areas in at least one sample picture of the same type. After the convolutional neural network receives the to-be-detected picture and the text area template, it acquires the actual positions of a group of to-be-detected areas of the picture according to the template. Because only this small group of areas needs to be detected, the amount of computation required to detect the structured text is reduced, the time taken is reduced, detection is significantly faster, and the required computing resources are also significantly reduced.
  • FIG. 1 is a block diagram of an exemplary device suitable for implementing the present application.
  • FIG. 2 is a schematic block diagram of an exemplary apparatus suitable for implementing the present application.
  • FIG. 3 is a flow chart of an embodiment of a structured text detection method in accordance with the present application.
  • FIG. 4 is a flow chart of another embodiment of a structured text detection method in accordance with the present application.
  • FIG. 5 is a flow diagram of an application embodiment of a structured text detection method in accordance with the present application.
  • FIG. 6 is a schematic diagram of a picture used in the application embodiment shown in FIG. 5.
  • FIG. 7 is a block diagram showing an embodiment of a structured text detection system in accordance with the present application.
  • FIG. 8 is a block diagram showing another embodiment of a structured text detection system in accordance with the present application.
  • FIG. 9 is a schematic diagram of yet another embodiment of a structured text detection system in accordance with the present application.
  • FIG. 10 is a schematic diagram of an embodiment of a computing device implementing the structured text detection method of the present application.
  • The structured text detection technical solutions of the embodiments of the present application can be applied to a computer system/server, which can operate together with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above, and so on.
  • the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • the structured text detection related technical solutions of the embodiments of the present application may be implemented in many ways, including methods, systems, and devices.
  • the methods, systems, and devices of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.
  • FIG. 1 illustrates a block diagram of an exemplary device 10 (eg, a computer system/server) suitable for implementing the present application.
  • the device 10 shown in FIG. 1 is merely an example and should not impose any limitation on the function and scope of use of the present application.
  • Device 10 can be embodied in the form of a general purpose computing device. Components of device 10 may include, but are not limited to: one or more processors or processing units 101, a system memory 102, and a bus 103 that connects the different system components (including the system memory 102 and the processing unit 101).
  • Device 10 can include a variety of computer system readable media. These media can be any available media that can be accessed by device 10, including volatile and non-volatile media, removable and non-removable media, and the like.
  • System memory 102 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 1021 and/or cache memory 1022.
  • Device 10 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • ROM 1023 can be used to read and write a non-removable, non-volatile magnetic medium (not shown in FIG. 1, commonly referred to as a "hard disk drive"). Although not shown in FIG. 1, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM or DVD-ROM) or other optical media may also be provided.
  • each drive can be coupled to bus 103 via one or more data medium interfaces.
  • At least one program product can be included in system memory 102, the program product having a set (e.g., at least one) of program modules configured to perform the functions of the present application.
  • Program module 1024 typically performs the functions and/or methods described herein.
  • Device 10 can also communicate with one or more external devices 104 (e.g., a keyboard, a pointing device, a display, etc.). Such communication may occur through an input/output (I/O) interface 105. Device 10 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 106. As shown in FIG. 1, network adapter 106 communicates with the other modules of device 10 (e.g., processing unit 101) via bus 103. It should be understood that although not shown in FIG. 1, other hardware and/or software modules may be utilized in conjunction with device 10.
  • Processing unit 101 performs various functional applications and data processing by running a computer program stored in system memory 102, for example, to perform the structured text detection described in any embodiment of the present application.
  • FIG. 2 shows a block diagram of an exemplary device 20 suitable for implementing the present application.
  • the device 20 may be a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like.
  • Computer system 20 includes one or more processors, a communication unit, and the like. The processors may be, for example, one or more central processing units (CPUs) 201 and/or one or more graphics processing units (GPUs) 213, and may execute various appropriate actions and processes according to executable instructions stored in read only memory (ROM) 202 or loaded from storage portion 208 into random access memory (RAM) 203.
  • the communication unit 212 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
  • The processor can communicate with read only memory 202 and/or random access memory 203 to execute executable instructions, connect to communication portion 212 via bus 204, and communicate with other target devices via communication portion 212, so as to complete the corresponding steps of the method described in any embodiment of the present application.
  • For example, the processor performs the following steps: a convolutional neural network receives a picture and a text area template, where the picture includes structured text and the text area template includes the position of at least one text area, each position being obtained based on the position of the corresponding text area in at least one sample picture of the same type as the picture; the convolutional neural network then acquires the actual positions of a group of to-be-detected areas of the picture according to the text area template.
  • RAM 203 can store various programs and data required for the operation of the device.
  • the CPU 201, the ROM 202, and the RAM 203 are connected to each other through a bus 204.
  • ROM 202 is an optional module.
  • the RAM 203 stores executable instructions, or writes executable instructions to the ROM 202 at runtime, the executable instructions causing the central processing unit 201 to perform the steps involved in the method of any of the embodiments of the present application.
  • An input/output (I/O) interface 205 is also coupled to bus 204.
  • the communication unit 212 may be integrated, or may be configured to have a plurality of sub-modules (for example, a plurality of IB network cards) and be respectively connected to the bus.
  • The following components are connected to I/O interface 205: an input portion 206 including a keyboard, a mouse, etc.; an output portion 207 including, for example, a cathode ray tube (CRT) or a liquid crystal display (LCD); a storage portion 208 including a hard disk or the like; and a communication portion 209 including a network interface card such as a LAN card or a modem. The communication portion 209 performs communication processing via a network such as the Internet.
  • Driver 210 is also connected to I/O interface 205 as needed.
  • a removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 210 as needed so that a computer program read therefrom is installed in the storage portion 208 as needed.
  • FIG. 2 is only an optional implementation manner.
  • The number and type of the components in FIG. 2 may be selected, deleted, added, or replaced according to actual needs, and different functional components may be set up separately or in an integrated manner. For example, the GPU and the CPU may be separate, or the GPU may be integrated on the CPU; the communication unit may likewise be separate from, or integrated on, the CPU or GPU; and so on.
  • In particular, embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the steps shown in the flowchart. The program code may include instructions corresponding to each step of the method of any embodiment of the present application, for example, an instruction for a convolutional neural network to receive a picture and a text area template;
  • the picture includes structured text;
  • the text area template includes the position of at least one text area, each position being obtained based on the position of the corresponding text area in at least one sample picture of the same type;
  • an instruction for the convolutional neural network to acquire the actual positions of a group of to-be-detected areas of the picture according to the text area template.
  • the computer program can be downloaded and installed from the network via the communication portion 209, and/or installed from the removable medium 211.
  • When the computer program is executed by the central processing unit (CPU) 201, the above-described instructions of the present application are executed.
  • the structured text detection method of this embodiment includes:
  • the convolutional neural network receives the picture and text area template.
  • The above-mentioned picture includes structured text and is a picture that needs to undergo structured text detection; in this application, it may be referred to as the picture to be detected.
  • The text area template includes the position of at least one text area, each position being obtained based on the position of the corresponding text area in at least one sample picture of the same type as the picture to be detected.
  • the position of each text area in the text area template may be determined, for example, by the center coordinate, the width, and the length of the corresponding text area, or may be determined by the coordinates of the upper, lower, left, and right borders of the corresponding text area.
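As a concrete illustration of these two encodings (the function and variable names below are assumptions, not from the patent), the center/size form and the border-coordinate form can be converted into each other:

```python
def center_to_borders(cx, cy, w, h):
    """Convert (center x, center y, width, height) to
    (left, top, right, bottom) border coordinates."""
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

def borders_to_center(left, top, right, bottom):
    """Convert border coordinates back to center/size form."""
    return ((left + right) / 2.0, (top + bottom) / 2.0,
            right - left, bottom - top)
```

Either form carries the same four degrees of freedom, so a template can store whichever is convenient.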
  • the operations 301 described above may be performed by a processor invoking a memory stored instruction or may be performed by a receiving module 601 executed by the processor.
  • the convolutional neural network acquires the actual location of a group of to-be-detected regions of the to-be-detected picture according to the text area template.
  • the above operation 302 may be performed by a processor calling an instruction stored in a memory, or may be performed by an acquisition module 602 executed by a processor.
  • In this embodiment, the text area template is obtained in advance from the positions of the corresponding text areas in at least one sample picture of the same type, and after the convolutional neural network receives the to-be-detected picture and the text area template, it acquires the actual positions of a group of to-be-detected areas according to the text area template.
  • Compared with the embodiment shown in FIG. 3, step 302 can be implemented by the following scheme:
  • a convolution process is performed on the picture to be detected, and a convolution feature map of the picture to be detected is obtained.
  • the convolutional feature map is a feature map formed by all the features extracted from the image to be detected.
  • the foregoing operation 401 may be performed by a processor calling an instruction stored in a memory, or may be performed by a feature extraction unit 701 executed by a processor.
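For illustration only, the convolution step can be reduced to a single hand-rolled 2-D filter pass; a real network stacks many learned kernels, so this sketch is not the patent's network:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image
    with a single kernel, producing one feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output cell is the sum of an element-wise product
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Stacking many such filter responses channel-wise yields the convolutional feature map of the picture.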
  • The positions of all the text areas in the text area template are used as a group of to-be-detected areas of the to-be-detected picture, and a Region of Interest Pooling (RoI Pooling) operation is performed on the convolutional feature map to extract the local features of each to-be-detected area in the group.
  • the foregoing operation 402 may be performed by a processor calling an instruction stored in a memory, or may be performed by an area of interest pooling operation unit 702 executed by the processor.
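A minimal numpy sketch of such a pooling operation, assuming integer box coordinates and a fixed bin grid (production RoI pooling layers, e.g. in Fast R-CNN, also handle fractional boundaries and multiple channels):

```python
import numpy as np

def roi_max_pool(feature_map, box, out_h=2, out_w=2):
    """Max-pool the region `box` = (x1, y1, x2, y2) of a 2-D
    feature map into a fixed (out_h, out_w) grid of bins, so every
    to-be-detected area yields a feature of the same size."""
    x1, y1, x2, y2 = box
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    ys = np.linspace(0, h, out_h + 1).astype(int)  # bin row edges
    xs = np.linspace(0, w, out_w + 1).astype(int)  # bin column edges
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out
```

The fixed output size is what lets differently sized template areas feed the same downstream classifier.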
  • the classification score and the position adjustment value of each to-be-detected area are obtained according to local features of each to-be-detected area.
  • the above operation 403 may be performed by a processor calling an instruction stored in a memory, or may be performed by a classification score and position adjustment value acquisition unit 703 executed by the processor.
  • Whether each to-be-detected area has text is determined according to the classification score of each to-be-detected area in the group of to-be-detected areas.
  • For example, the classification score of each to-be-detected area may be determined through a classification function (softmax) layer in the convolutional neural network: if the classification score of a to-be-detected area is greater than a preset threshold, it is determined that the area has text; otherwise, if the classification score is not greater than the preset threshold, it is determined that the area has no text.
  • Operation 405 is performed for each of the areas to be detected having text. The subsequent flow of this embodiment is not performed for the area to be detected without text.
  • the above operation 404 may be performed by a processor calling an instruction stored in a memory, or may be performed by a text area determining unit 704 operated by the processor.
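The softmax-and-threshold decision described above might be sketched as follows (the two-logit layout and the threshold value are assumptions, not fixed by the patent):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a sequence of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def has_text(logits, threshold=0.5):
    """logits = (no-text score, text score) for one to-be-detected
    area; True if the text-class probability exceeds the preset
    threshold."""
    return softmax(logits)[1] > threshold
```

Areas failing the threshold are simply dropped from the subsequent position-adjustment step.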
  • The coordinate values of each to-be-detected area with text are adjusted according to that area's position adjustment value, obtaining the actual positions of the to-be-detected areas with text.
  • With (x, y) the coordinates, w the width, and h the height of a to-be-detected area, and (f1, f2, f3, f4) its position adjustment value, the actual position of the area may be expressed as: [x+w*f1, y+h*f2, exp(f3)*w, exp(f4)*h].
  • the above operation 405 may be performed by a processor calling an instruction stored in a memory, or may be performed by an actual location determining unit 705 operated by the processor.
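A direct transcription of that expression, with the tuple layouts assumed rather than specified by the patent:

```python
import math

def adjust_box(box, deltas):
    """Apply the position adjustment value (f1, f2, f3, f4) to a
    to-be-detected area given as (x, y, w, h), following
    [x + w*f1, y + h*f2, exp(f3)*w, exp(f4)*h]."""
    x, y, w, h = box
    f1, f2, f3, f4 = deltas
    return (x + w * f1, y + h * f2,
            math.exp(f3) * w, math.exp(f4) * h)
```

Scaling the offsets by w and h and exponentiating the size terms keeps the adjustment relative to the area's own size, so one regression target works for areas of any scale.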
  • Before detection, the picture to be detected may be preprocessed, including cropping, rectification, and scaling.
  • the picture received by the convolutional neural network is a pre-processed picture.
  • Cropping can remove the background area in the picture, and rectification can correct a skewed picture.
  • the pre-processing operation may be performed by a processor invoking a memory stored instruction or may be performed by a picture pre-processing module 603 executed by the processor.
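A toy sketch of the crop-and-scale part of this preprocessing (nearest-neighbor resampling on a single-channel image; rectification is omitted and all names are assumptions):

```python
import numpy as np

def crop_and_resize(image, box, out_h, out_w):
    """Crop `box` = (left, top, right, bottom) from a 2-D image and
    rescale the crop to (out_h, out_w) with nearest-neighbor
    sampling, as a stand-in for the patent's crop + scale step."""
    l, t, r, b = box
    region = image[t:b, l:r]
    rows = np.arange(out_h) * region.shape[0] // out_h
    cols = np.arange(out_w) * region.shape[1] // out_w
    return region[rows][:, cols]
```

Scaling every input to a preset size is what makes a single position template comparable across pictures.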
  • A subsequent text recognition operation may be performed by a processor invoking instructions stored in a memory, or may be performed by a text recognition module 604 executed by the processor.
  • the text area template corresponding to the to-be-detected picture may be acquired before operation 301.
  • the text area template can be obtained as follows:
  • the text area template includes an average of the correct positions of all of the text areas in the at least one sample picture.
  • The foregoing operation of acquiring the text area template corresponding to the to-be-detected picture may be performed by a processor invoking instructions stored in a memory, or may be performed by the text area template module 605 or the calculation module 607 executed by the processor.
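Averaging the ground-truth positions across same-type sample pictures, as described, might look like this sketch (border-coordinate boxes in a fixed field order; names assumed):

```python
def build_text_area_template(sample_boxes):
    """sample_boxes: list of samples, each a list of
    (left, top, right, bottom) ground-truth boxes in the same
    field order. Returns the per-field average position, i.e.
    the text area template."""
    n = len(sample_boxes)
    template = []
    for field in zip(*sample_boxes):  # group the same field across samples
        template.append(tuple(sum(c) / n for c in zip(*field)))
    return template
```

Each averaged box then serves as the fixed proposal for its field in every new picture of that document type.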
  • In summary, the present application uses the text area template as the regions to be detected (proposals) of the picture, which improves the detection speed of structured text.
  • after preprocessing such as cropping, righting, and scaling to a preset size, the positions of the areas to be detected may still differ between pictures, owing to errors introduced by cropping and righting and to variation of the structured text length itself across images; their distribution, however, clusters around a center, dense in the middle and sparse at the periphery.
  • an average position is therefore computed in advance for every text area over a large number of structured-text pictures of the same kind, and these averages form a set of text area templates.
  • these text area templates are input into the convolutional neural network as the areas to be detected; the interest area pooling operation extracts the local features of each area to be detected, and the classification scores and position adjustment amounts of the corresponding areas are then computed from these local features to determine whether each area contains text and where the text is. The number of areas to be detected thus equals the number of all possible text areas, which reduces the amount of computation when recognizing structured text, thereby improving the recognition speed.
  • FIG. 5 is a schematic diagram of an application embodiment of a structured text detection method in accordance with the present application.
  • FIG. 6 is a schematic diagram of a picture used in the application embodiment shown in FIG. 5.
  • an ID card photo is taken as an example of the picture to be detected to describe an embodiment of the present application.
  • the technical solution provided by the present application can also be applied to structured text detection of other documents such as passports, motor vehicle driving licenses, tickets, and the like, and details are not described herein.
  • the ID card photo includes 10 areas that may contain text information (i.e., text areas), where the address occupies at most three lines, each line forming one text area.
  • the correct position of each text area is called the ground-truth box, which is determined by the x coordinates of its left and right boundaries and the y coordinates of its upper and lower boundaries.
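Since the ground-truth box is stored as boundary coordinates while the template and adjustment formulas use a center/size form, a small conversion is implied; a minimal sketch (the function name is illustrative):

```python
def corners_to_center(x_left, x_right, y_top, y_bottom):
    """Convert a ground-truth box given by the x coordinates of its left and
    right boundaries and the y coordinates of its upper and lower boundaries
    into the (cx, cy, w, h) center form used by the text area template."""
    return ((x_left + x_right) / 2.0,
            (y_top + y_bottom) / 2.0,
            x_right - x_left,
            y_bottom - y_top)
```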
  • the application embodiment includes:
  • Operation 501: pre-process a large number of ID card sample photos, including cropping and righting processing: the background area in the ID card photo is removed by cropping, a skewed ID card photo is straightened by righting, and the ID card photo is then scaled to a preset size to obtain an ID card picture.
  • Operation 502: the positions of all 10 text areas of each pre-processed ID card picture are obtained; for each of the 10 text areas, the average of the positions of the corresponding text area over all ID card pictures is computed, and the 10 averaged positions serve as a set of text area templates, used as the detection basis for the text areas on the ID card picture to be detected, as shown by the "template" in FIG. 6.
  • the foregoing operations 501-502 are performed before the structured text detection method is applied to the picture to be detected. After the text area template is obtained through operations 501-502, when structured text detection is performed directly on a picture by the structured text detection method of the present application, the following operations 503-508 are performed directly, without performing operations 501-502 again.
  • Operation 503: pre-process the ID card photo to be detected, including cropping and righting processing and scaling to the preset size, to obtain the to-be-detected ID card picture, and input the to-be-detected ID card picture together with the text area template obtained through the above operations 501-502 into the convolutional neural network.
  • Operation 505: the positions of the 10 text areas in the text area template obtained through the above operations 501-502 are used as a group of to-be-detected areas of the to-be-detected ID card picture, 10 to-be-detected areas in total; the interest area pooling operation is performed and the local features of the 10 text areas are extracted.
  • Operation 506: for example, via one or more fully connected (FC) layers in the convolutional neural network, the classification scores and position adjustment values of the to-be-detected areas in the above group of 10 to-be-detected areas are respectively acquired.
  • Operation 507: for example, via a classification function (softmax) layer in the convolutional neural network, it is identified whether each of the to-be-detected areas contains text.
  • the classification score of each to-be-detected area is compared with a preset threshold, for example 0.5; if the classification score of a to-be-detected area is greater than the preset threshold, it is determined that this to-be-detected area contains text; otherwise, if the classification score is not greater than the preset threshold, it is determined that this to-be-detected area contains no text.
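The thresholding rule above amounts to a one-line filter; a minimal sketch, assuming the example threshold of 0.5 (the function name is illustrative):

```python
def regions_with_text(scores, threshold=0.5):
    """Return the indices of to-be-detected areas whose classification score
    exceeds the preset threshold; all other areas are treated as containing
    no text and are discarded from the subsequent flow."""
    return [i for i, score in enumerate(scores) if score > threshold]
```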
  • the preset threshold can be set according to actual conditions and can be adjusted according to actual conditions.
  • Operation 508 is performed for each of the to-be-detected areas that contain text. A to-be-detected area without text is discarded, and the subsequent flow of this embodiment is not performed for it.
  • Operation 508: adjust the coordinate values of the corresponding to-be-detected areas in the text area template according to the position adjustment values of the to-be-detected areas with text, to obtain the actual positions of the group of to-be-detected areas with text.
  • the actual position of each to-be-detected area may be expressed as [x+w*f1, y+h*f2, exp(f3)*w, exp(f4)*h], where (x+w*f1, y+h*f2) are the X and Y coordinates of the center of the corresponding to-be-detected area, exp(f3)*w is the length of the corresponding to-be-detected area, exp(f4)*h is its width, and exp() is an exponential function.
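The position adjustment above can be sketched directly from the formula (the function name and tuple ordering are illustrative; (x, y, w, h) is center X/Y, length, width):

```python
import math

def adjust_box(template_box, f):
    """Apply the predicted position adjustment values (f1..f4) to a template
    box (x, y, w, h), giving the actual position
    [x + w*f1, y + h*f2, exp(f3)*w, exp(f4)*h]."""
    x, y, w, h = template_box
    f1, f2, f3, f4 = f
    return [x + w * f1, y + h * f2, math.exp(f3) * w, math.exp(f4) * h]
```

A zero adjustment leaves the template box unchanged, so the template itself acts as the default detection result.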
  • the method further includes: training the convolutional neural network by using at least one sample picture of the same kind as the picture to be detected, where each sample picture includes at least one text area and is annotated with the correct position of each text area.
  • after training, the convolutional neural network can be used to perform structured text detection on pictures through the above embodiments of the present application.
  • the convolutional neural network is trained using at least one sample picture of the same type as the picture, including:
  • the convolutional neural network receives at least one sample picture and the text area template, and performs the following for any one of the at least one sample picture: performing convolution processing on the sample picture to obtain a convolutional feature map of the sample picture; taking the positions of all the text areas in the text area template as a group of to-be-detected areas of the sample picture, performing an interest area pooling operation on the convolutional feature map, and extracting the local features of each to-be-detected area in the group of to-be-detected areas; acquiring a predicted classification score and position adjustment values of each to-be-detected area; determining, according to the predicted classification score of each to-be-detected area, whether that area contains text; and, for each to-be-detected area that contains text, adjusting its coordinate values according to its position adjustment values to obtain the predicted position of the to-be-detected area.
  • the predicted position of the region to be detected may be expressed as [x+w*f1, y+h*f2, exp(f3)*w, exp(f4)*h], where (x+w*f1, y+h*f2) is the X coordinate and the Y coordinate of the center of the area to be detected, exp(f3)*w is the length of the area to be detected, exp(f4)*h is the width of the area to be detected, exp( ) is an exponential function;
  • the convolutional neural network is trained, i.e., the parameter values of the network parameters in the convolutional neural network are adjusted, according to the correct position of each text area annotated in the at least one sample picture, the determination results of whether each to-be-detected area contains text, and the predicted positions.
  • the operations performed by the convolutional neural network on a sample picture are the same as the operations performed on the picture to be detected in the embodiments of the structured text detection method.
  • in other words, the training procedure performs the structured text detection method with a sample picture as the picture to be detected; therefore, for optional implementations in the embodiments of training the convolutional neural network, reference may be made to the corresponding parts of the embodiments of the structured text detection method, and details are not repeated here.
  • when the convolutional neural network is trained according to the correct position of each text area annotated in the at least one sample picture, the determination results of whether each to-be-detected area contains text, and the predicted positions, an iterative update method or a gradient update method may be used to train the convolutional neural network.
  • the above process of training the convolutional neural network by using at least one sample picture of the same kind as the picture may be performed iteratively; in each execution, for example, a softmax loss function may be used.
  • a first loss function value is calculated, for example by a softmax loss function, according to the correct position of each text area annotated in the at least one sample picture and the determination results of whether each to-be-detected area contains text; and a second loss function value is calculated, for example by a smooth L1 loss regression function, according to the correct positions and the predicted positions of the text areas annotated in the at least one sample picture.
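The smooth L1 regression loss referenced above is quadratic for small residuals and linear for large ones; a sketch of the standard per-residual form (the application does not spell out its exact variant):

```python
def smooth_l1(residual):
    """Standard smooth L1 loss on one position residual:
    0.5 * r^2 when |r| < 1, and |r| - 0.5 otherwise, so the loss is
    differentiable at zero yet grows only linearly for outliers."""
    r = abs(residual)
    return 0.5 * r * r if r < 1.0 else r - 0.5
```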
  • the parameter values of the network parameters in the convolutional neural network can then be adjusted so that the first loss function value and the second loss function value are each minimized.
  • any of the sample pictures may be cropped and righted, and scaled to a preset size.
  • the text area template may also be obtained by:
  • the text area template specifically includes the average of the positions of all the text areas in the two or more sample pictures.
  • the above methods of the embodiments of the present application may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or implemented as computer code downloaded over a network, originally stored in a remote recording medium or a non-transitory machine-readable medium and to be stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA.
  • a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; the processing methods described herein are implemented when the software or computer code is accessed and executed by the computer, processor, or hardware. Moreover, when a general-purpose computer accesses code for implementing the processing shown herein, the execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown herein.
  • FIG. 7 is a schematic diagram of an embodiment of a structured text detection system according to the present application.
  • the structured text detection system of the embodiment can be used to implement the embodiments of the structured text detection method described above.
  • the system of this embodiment includes a receiving module 601 and an obtaining module 602.
  • the receiving module 601 is configured to receive a picture and a text area template.
  • the image includes a structured text;
  • the text area template includes positions of at least one text area, and the position of each text area among the positions of the at least one text area is respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture.
  • the position of each text area in the text area template may be determined by the center coordinates, width, and length of the corresponding text area.
  • the obtaining module 602 is configured to acquire an actual location of a group of to-be-detected regions of the picture according to the text area template.
  • the actual position of each to-be-detected area may be expressed as: [x+w*f1, y+h*f2, exp(f3)*w, exp(f4)*h].
  • the receiving module 601 and the obtaining module 602 may be implemented by using a convolutional neural network.
  • the text area template is obtained in advance based on the positions of the corresponding text areas in at least one sample picture of the same kind; the convolutional neural network receives the image to be detected and the text area template, and obtains the actual positions of a group of to-be-detected areas of the image according to the text area template.
  • the obtaining module 602 includes: a feature extraction unit 701, an interest area pooling operation unit 702, a classification score and position adjustment value acquisition unit 703, a text area determining unit 704, and an actual position determining unit 705.
  • the feature extraction unit 701 is configured to perform convolution processing on the image to obtain a convolution feature map of the image.
  • the interest area pooling operation unit 702 is configured to take the positions of all the text areas in the text area template as a group of to-be-detected areas of the picture, perform an interest area pooling operation on the convolutional feature map, and extract the local features of each to-be-detected area in the group of to-be-detected areas.
  • the classification score and position adjustment value acquisition unit 703 is configured to acquire the classification score and position adjustment values of each to-be-detected area according to the local features of that to-be-detected area.
  • the text area determining unit 704 is configured to determine whether each detection area has a text according to the classification score of each to-be-detected area.
  • the text region determining unit 704 can be implemented by a classification function softmax layer.
  • the classification function layer is configured to respectively determine the classification score of each to-be-detected area; if the classification score of a to-be-detected area is greater than a preset threshold, it is determined that the to-be-detected area whose classification score is greater than the preset threshold contains text.
  • the actual position determining unit 705 is configured to adjust the coordinate values of each to-be-detected area with text according to the position adjustment values of that area, to obtain the actual positions of the group of to-be-detected areas with text.
  • an image pre-processing module 603 may further be included, configured to perform cropping and righting processing on the image and scale it to a preset size before sending it to the receiving module 601.
  • the system further includes: a text recognition module 604, configured to perform text recognition on the areas corresponding to the actual positions of the group of to-be-detected areas, to obtain the structured text information in the above picture.
  • a text area template module 605 may further be included, configured to obtain, for each corresponding text area in at least one sample picture of the same kind as the above picture, the average of the correct positions of that text area according to the correct positions of the corresponding text areas, and to obtain the text area template according to the averages of the correct positions of all the text areas in the at least one sample picture.
  • the structured text detection system of the present application may further include a network training module 606, configured to train the convolutional neural network by using at least one sample picture of the same kind as the above picture, where each sample picture includes at least one text area and is annotated with the correct position of each text area.
  • the network training module 606 can be removed after the training of the convolutional neural network 60 is completed.
  • the convolutional neural network 60 is specifically configured to: receive at least one sample picture and the text area template, and perform the following for any one of the at least one sample picture: performing convolution processing on the sample picture to obtain a convolutional feature map of the sample picture; taking the positions of all the text areas in the text area template as a group of to-be-detected areas of the sample picture; performing an interest area pooling operation on the convolutional feature map; extracting the local features of each to-be-detected area; acquiring the predicted classification score and position adjustment values of each to-be-detected area; determining whether each to-be-detected area contains text; and obtaining the predicted position of each to-be-detected area with text.
  • the network training module 606 is specifically configured to train the convolutional neural network 60 according to the correct position of each text area annotated in the at least one sample picture, the determination results of whether each to-be-detected area contains text, and the predicted position of each to-be-detected area.
  • the picture pre-processing module 603 can also be used to intercept and correct any sample picture and zoom to a preset size.
  • the structured text detection system of this embodiment may further include a calculation module 607, configured to calculate, for each text area in two or more sample pictures scaled to the preset size, the average of the positions of the corresponding text areas of the two or more sample pictures, obtaining the averages of the positions of the text areas in the two or more sample pictures; the text area template then specifically includes the averages of the positions of all the text areas in the two or more sample pictures.
  • the text area template may be obtained by the text area template module 605 or the calculation module 607 from the averages of the positions of the text areas in the two or more sample pictures.
  • the embodiment of the present application further provides a computing device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, etc., which is provided with the structured text detection system of any embodiment of the present application.
  • the embodiment of the present application further provides another computing device, including:
  • a processor and a structured text detection system of any of the above embodiments of the present application
  • the processor runs the structured text detection system
  • the elements in the structured text detection system of any of the above embodiments of the present application are executed.
  • the embodiment of the present application further provides another computing device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface complete communication with each other through the communication bus;
  • the memory is for storing at least one executable instruction that causes the processor to perform the operations of the steps in the structured text detection method of any of the above-described embodiments of the present application.
  • Figure 10 illustrates a computing device in which the structured text detection method of the present application can be implemented.
  • the computing device includes a processor 801, a Communications Interface 802, a memory 803, and a communication bus 804.
  • the processor 801, the communication interface 802, and the memory 803 complete communication with each other via the communication bus 804.
  • the communication interface 802 is configured to communicate with network elements of other devices, such as clients or data collection devices.
  • the processor 801 is configured to execute a program, and specifically, perform related steps in the foregoing method embodiments.
  • the processor 801 can be a central processing unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the memory 803 is configured to store a program, where the program includes at least one executable instruction, and the executable instruction may be specifically configured to cause the processor 801 to: receive a picture and a text area template by a convolutional neural network, the picture including structured text, the text area template including positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture; and obtain, by the convolutional neural network according to the text area template, the actual positions of a group of to-be-detected areas of the picture.
  • the memory 803 may include a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
  • the embodiment of the present application further provides a computer program, including computer readable code; when the computer readable code is run on a device, the processor in the device executes instructions for implementing the steps of the structured text detection method of any of the embodiments of the present application.
  • the embodiment of the present application further provides a computer system, including:
  • a memory storing executable instructions; and one or more processors in communication with the memory to execute the executable instructions so as to perform the operations of the steps in the structured text detection method of any of the embodiments of the present application.
  • the embodiment of the present application further provides a computer readable medium for storing computer readable instructions that, when executed, implement the operations of the steps in the structured text detection method of any of the embodiments of the present application.


Abstract

A structured text detection method and system, and a computing device, wherein the method includes: a convolutional neural network receives a picture and a text area template (301); the picture includes structured text; the text area template includes positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture; the convolutional neural network obtains, according to the text area template, the actual positions of a group of to-be-detected areas of the picture (302). The above method reduces the amount of computation while ensuring the accuracy of structured text detection, and improves detection efficiency.

Description

Structured text detection method and system, and computing device
This application claims priority to the Chinese patent application No. 201610561355.7, entitled "Structured text detection method and system", filed with the Chinese State Intellectual Property Office on July 15, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of picture processing, and in particular, to a structured text detection method and system, and a computing device.
Background
Structured text refers to text with a substantially fixed layout, such as identity cards, passports, motor vehicle driving licenses, tickets, and the like. In the digital era, entering such information into a computer often requires manual typing, which takes a great deal of time. To save time, people have begun to photograph documents and use computer vision techniques to automatically extract the text from the pictures.
Summary
Embodiments of the present application provide a structured text detection solution.
According to one aspect of the embodiments of the present application, a structured text detection method is provided, including:
a convolutional neural network receives a picture and a text area template; the picture includes structured text; the text area template includes positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture;
the convolutional neural network obtains, according to the text area template, the actual positions of a group of to-be-detected areas of the picture. According to another aspect of the embodiments of the present application, a structured text detection system is provided, including:
a receiving module, configured to receive a picture and a text area template; the picture includes structured text; the text area template includes positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture;
an obtaining module, configured to obtain, according to the text area template, the actual positions of a group of to-be-detected areas of the picture.
According to yet another aspect of the embodiments of the present application, a computing device is provided, including: the structured text detection system of any embodiment of the present application.
According to yet another aspect of the embodiments of the present application, another computing device is provided, including:
a processor and the structured text detection system of any embodiment of the present application;
when the processor runs the structured text detection system, the units in the structured text detection system of any embodiment of the present application are run.
According to yet another aspect of the embodiments of the present application, still another computing device is provided, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface complete communication with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the steps in the structured text detection method of any embodiment of the present application.
According to yet another aspect of the embodiments of the present application, a computer system is provided, including:
a memory storing executable instructions;
one or more processors in communication with the memory to execute the executable instructions so as to complete the operations of the steps in the structured text detection method of any embodiment of the present application.
According to a further aspect of the embodiments of the present application, a computer program is provided, including computer readable code; when the computer readable code is run on a device, the processor in the device executes instructions for implementing the steps of the structured text detection method of any embodiment of the present application.
According to a still further aspect of the embodiments of the present application, a computer readable medium is provided, configured to store computer readable instructions, which, when executed, implement the operations of the steps in the structured text detection method of any embodiment of the present application.
In the technical solutions provided by the embodiments of the present application, a text area template is obtained in advance based on the positions of corresponding text areas in at least one sample picture of the same kind; after receiving the to-be-detected picture and the text area template, the convolutional neural network obtains the actual positions of a group of to-be-detected areas of the to-be-detected picture according to the text area template. Since there are few to-be-detected areas, the amount of computation required for detecting the structured text is reduced, less time is spent, the detection rate is significantly faster, and the required computing resources are also significantly reduced.
The technical solutions of the present application are described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
The present application will be described more fully below with reference to the accompanying drawings in conjunction with the embodiments.
FIG. 1 is a block diagram of an exemplary device suitable for implementing the present application.
FIG. 2 is a schematic structural diagram of an exemplary device suitable for implementing the present application.
FIG. 3 is a flowchart of an embodiment of a structured text detection method according to the present application.
FIG. 4 is a flowchart of another embodiment of a structured text detection method according to the present application.
FIG. 5 is a flowchart of an application embodiment of a structured text detection method according to the present application.
FIG. 6 is a schematic diagram of a picture used in the application embodiment shown in FIG. 5.
FIG. 7 is a schematic structural diagram of an embodiment of a structured text detection system according to the present application.
FIG. 8 is a schematic structural diagram of another embodiment of a structured text detection system according to the present application.
FIG. 9 is a schematic diagram of yet another embodiment of a structured text detection system according to the present application.
FIG. 10 is a schematic diagram of an embodiment of a computing device implementing the structured text detection method of the present application.
For clarity, these drawings are schematic and simplified; they show only the details necessary for understanding the present application and omit other details.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application.
Meanwhile, it should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The structured-text-detection technical solutions of the embodiments of the present application may be applied to a computer system/server, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments including any of the above systems, and the like.
The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
The structured-text-detection technical solutions of the embodiments of the present application may be implemented in many ways, including as methods, systems, and devices. For example, the methods, systems, and devices of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only; the steps of the method of the present application are not limited to the order specifically described above, unless otherwise specifically stated. Furthermore, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers recording media storing programs for executing the methods according to the present application.
The structured text detection technical solutions of the embodiments of the present application are described in detail below through specific embodiments with reference to the accompanying drawings.
FIG. 1 shows a block diagram of an exemplary device 10 (e.g., a computer system/server) suitable for implementing the present application. The device 10 shown in FIG. 1 is merely an example and should not impose any limitation on the functions and scope of use of the present application. As shown in FIG. 1, the device 10 may take the form of a general-purpose computing device. Components of the device 10 may include, but are not limited to: one or more processors or processing units 101, a system memory 102, and a bus 103 connecting different system components (including the system memory 102 and the processing unit 101). The device 10 may include a variety of computer-system-readable media. These media may be any available media accessible by the device 10, including volatile and non-volatile media, removable and non-removable media, and the like.
The system memory 102 may include computer-system-readable media in the form of volatile memory, for example, a random access memory (RAM) 1021 and/or a cache memory 1022. The device 10 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a ROM 1023 may be used for reading from and writing to non-removable, non-volatile magnetic media (not shown in FIG. 1, commonly referred to as a "hard disk drive"). Although not shown in FIG. 1, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 103 through one or more data media interfaces. The system memory 102 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the present application.
A program/utility 1025 having a set of (at least one) program modules 1024 may be stored, for example, in the system memory 102. Such program modules 1024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 1024 generally perform the functions and/or methods described in the present application.
The device 10 may also communicate with one or more external devices 104 (such as a keyboard, a pointing device, a display, etc.). Such communication may occur through input/output (I/O) interfaces 105, and the device 10 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 106. As shown in FIG. 1, the network adapter 106 communicates with other modules of the device 10 (such as the processing unit 101) through the bus 103. It should be understood that although not shown in FIG. 1, other hardware and/or software modules may be used in conjunction with the device 10.
The processing unit 101 (i.e., the processor) executes various functional applications and data processing by running computer programs stored in the system memory 102, for example, executing instructions for implementing the steps of the structured text detection method shown in any embodiment of the present application. Specifically, the processing unit 101 may execute a computer program stored in the system memory 102, and when the computer program is executed, the following steps are implemented: a convolutional neural network receives a picture and a text area template; the picture includes structured text; the text area template includes positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture; the convolutional neural network obtains, according to the text area template, the actual positions of a group of to-be-detected areas of the picture.
FIG. 2 shows a schematic structural diagram of an exemplary device 20 suitable for implementing the present application. The device 20 may be a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. In FIG. 2, the computer system 20 includes one or more processors, a communication unit, and the like. The one or more processors may be: one or more central processing units (CPUs) 201, and/or one or more graphics processors (GPUs) 213, etc. The processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 202 or executable instructions loaded from a storage section 208 into a random access memory (RAM) 203. The communication unit 212 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the ROM 202 and/or the RAM 203 to execute executable instructions, is connected to the communication unit 212 through a bus 204, and communicates with other target devices via the communication unit 212, thereby completing the corresponding steps in the method of any embodiment of the present application. In one example of the present application, the steps performed by the processor include: a convolutional neural network receives a picture and a text area template; the picture includes structured text; the text area template includes positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture; the convolutional neural network obtains, according to the text area template, the actual positions of a group of to-be-detected areas of the picture.
In addition, the RAM 203 may also store various programs and data required for the operation of the apparatus. The CPU 201, the ROM 202, and the RAM 203 are connected to one another through the bus 204. Where the RAM 203 is present, the ROM 202 is an optional module. The RAM 203 stores executable instructions, or writes executable instructions into the ROM 202 at runtime, and the executable instructions cause the central processing unit 201 to perform the steps included in the method of any embodiment of the present application. An input/output (I/O) interface 205 is also connected to the bus 204. The communication unit 212 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) respectively connected to the bus.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read therefrom is installed into the storage section 208 as needed.
It should be noted that the architecture shown in FIG. 2 is only an optional implementation. In specific practice, the number and types of the components in FIG. 2 may be selected, reduced, increased, or replaced according to actual needs. Different functional components may also be implemented in separate or integrated configurations; for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU; the communication unit may be provided separately, or may be integrated on the CPU or the GPU. These alternative implementations all fall within the protection scope of the present application.
In particular, according to embodiments of the present application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for executing the steps shown in the flowcharts, and the program code may include instructions corresponding to the steps of the method of any embodiment of the present application, for example: an instruction for a convolutional neural network to receive a picture and a text area template, the picture including structured text, the text area template including positions of at least one text area, the position of each text area among the positions of the at least one text area being respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the picture; and an instruction for the convolutional neural network to obtain, according to the text area template, the actual positions of a group of to-be-detected areas of the picture.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209, and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the above instructions described in the present application are executed.
FIG. 3 is a flowchart of an embodiment of the structured text detection method according to the present application. As shown in FIG. 3, the structured text detection method of this embodiment includes:
Operation 301: a convolutional neural network receives a picture and a text area template.
The picture includes structured text and is a picture on which structured text detection needs to be performed; for ease of distinction, it may be referred to as the to-be-detected picture in the embodiments of the present application. The text area template includes positions of at least one text area, and the position of each text area among the positions of the at least one text area is respectively obtained based on the position of the corresponding text area in at least one sample picture of the same kind as the to-be-detected picture.
In an optional implementation, the position of each text area in the text area template may, for example, be determined by the center coordinates, width, and length of the corresponding text area, or may be determined by the coordinates of the upper, lower, left, and right boundaries of the corresponding text area.
In an optional implementation, the above operation 301 may be performed by a processor invoking instructions stored in a memory, or may be performed by a receiving module 601 run by the processor.
Operation 302: the convolutional neural network obtains, according to the above text area template, the actual positions of a group of to-be-detected areas of the to-be-detected picture.
In an optional implementation, the above operation 302 may be performed by a processor invoking instructions stored in a memory, or may be performed by an obtaining module 602 run by the processor.
In the structured text detection method of the embodiments of the present application, a text area template is obtained in advance based on the positions of corresponding text areas in at least one sample picture of the same kind; after receiving the to-be-detected picture and the text area template, the convolutional neural network obtains the actual positions of a group of to-be-detected areas of the to-be-detected picture according to the text area template. Since there are few to-be-detected areas, the amount of computation required for detecting the structured text is reduced, less time is spent, the detection rate is significantly faster, and the required computing resources are also significantly reduced.
FIG. 4 is a flowchart of another embodiment of the structured text detection method according to the present application. As shown in FIG. 4, compared with the embodiment shown in FIG. 3, in this embodiment, step 302 may, for example, be implemented by the following scheme:
Operation 401: perform convolution processing on the to-be-detected picture to obtain a convolutional feature map of the to-be-detected picture.
The convolutional feature map is the feature map formed by all the features extracted from the to-be-detected picture.
In an optional implementation, the above operation 401 may be performed by a processor invoking instructions stored in a memory, or may be performed by a feature extraction unit 701 run by the processor.
Operation 402: taking the positions of all the text areas in the above text area template as a group of to-be-detected areas of the to-be-detected picture, perform a Region of Interest Pooling (RoI Pooling) operation on the convolutional feature map, and extract the local features of each to-be-detected area in the group of to-be-detected areas.
In an optional implementation, the above operation 402 may be performed by a processor invoking instructions stored in a memory, or may be performed by an interest area pooling operation unit 702 run by the processor.
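A minimal sketch of what the RoI pooling step does on a 2-D feature map (pure-Python max pooling into fixed bins; the function name and bin handling are illustrative, not the exact implementation):

```python
def roi_max_pool(fmap, box, out_size=2):
    """Max-pool the rectangular region `box` of a 2-D feature map into an
    out_size x out_size grid, giving a fixed-size local feature for a box
    of any size.

    fmap: list of rows of numbers; box: (x0, y0, x1, y1) half-open cell
    ranges, with x1 - x0 >= out_size and y1 - y0 >= out_size.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    pooled = []
    for i in range(out_size):
        # Vertical extent of this row of bins.
        ys = y0 + i * h // out_size
        ye = y0 + (i + 1) * h // out_size
        row = []
        for j in range(out_size):
            # Horizontal extent of this bin; take the max inside it.
            xs = x0 + j * w // out_size
            xe = x0 + (j + 1) * w // out_size
            row.append(max(fmap[y][x] for y in range(ys, ye)
                                      for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Because every box is pooled to the same grid size, the subsequent layers can score all to-be-detected areas with one shared set of weights.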
Operation 403: obtain the classification score and position adjustment values of each to-be-detected area according to the local features of that to-be-detected area.
In an optional implementation, the above operation 403 may be performed by a processor invoking instructions stored in a memory, or may be performed by a classification score and position adjustment value acquisition unit 703 run by the processor.
Operation 404: determine, according to the classification score of each to-be-detected area in the above group of to-be-detected areas, whether that to-be-detected area contains text.
In an optional example, in operation 404, a classification function (softmax) layer in the convolutional neural network may be used to respectively determine the classification score of each to-be-detected area; if the classification score of a to-be-detected area is greater than a preset threshold, it is determined that the to-be-detected area whose classification score is greater than the preset threshold contains text; otherwise, if the classification score of a to-be-detected area is not greater than the preset threshold, it is determined that the to-be-detected area whose classification score is not greater than the preset threshold contains no text.
Operation 405 is performed for each to-be-detected area that contains text. For a to-be-detected area without text, the subsequent flow of this embodiment is not performed.
In an optional implementation, the above operation 404 may be performed by a processor invoking instructions stored in a memory, or may be performed by a text area determining unit 704 run by the processor.
Operation 405: adjust the coordinate values of each to-be-detected area that contains text according to the position adjustment values of that area, to obtain the actual positions of the group of to-be-detected areas that contain text.
In an optional implementation, the actual position of a to-be-detected area may be expressed as: [x+w*f1, y+h*f2, exp(f3)*w, exp(f4)*h].
Here, (x+w*f1, y+h*f2) denotes the center coordinates (X, Y) of the to-be-detected area that contains text, exp(f3)*w denotes the length of the to-be-detected area, and exp(f4)*h denotes the width of the to-be-detected area; x, y, h, and w respectively denote the center X coordinate, center Y coordinate, width, and length of the text area corresponding to this to-be-detected area; [f1, f2, f3, f4] denote the regression targets of each text area in the text area template during the training of the convolutional neural network, [f1, f2, f3, f4] = [(x'-x)/w, (y'-y)/h, log(w'/w), log(h'/h)], where x', y', h', and w' respectively denote the center X coordinate, center Y coordinate, width, and length of the corresponding text area of each sample picture in the at least one sample picture.
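The regression targets above can be sketched directly from the formula (the function name and the (center x, center y, length w, width h) tuple ordering are illustrative):

```python
import math

def regression_targets(template_box, sample_box):
    """Compute [f1, f2, f3, f4] = [(x'-x)/w, (y'-y)/h, log(w'/w), log(h'/h)]
    for one text area: template_box is (x, y, w, h) from the text area
    template, sample_box is (x', y', w', h') from the sample picture."""
    x, y, w, h = template_box
    xp, yp, wp, hp = sample_box
    return [(xp - x) / w, (yp - y) / h,
            math.log(wp / w), math.log(hp / h)]
```

A sample box identical to the template box yields all-zero targets, so the network only has to learn the deviation of each picture from the template.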
一种可选的实现方式中,上述操作405可以由处理器调用存储器存储的指令执行,或者,可以由被处理器运行的实际位置确定单元705执行。
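上述实际位置公式可以直接按文中符号实现。以下为一个示意性实现（假设输入为浮点数；框表示为 (x, y, w, h)，即中心X坐标、中心Y坐标、长度w、宽度h，与文中含义一致）：

```python
import math

def decode_box(template_box, adjust):
    """template_box = (x, y, w, h)：文字区域模板中相应框的中心坐标、长度和宽度；
    adjust = (f1, f2, f3, f4)：该待检测区域的位置调整值。
    按文中公式返回 [中心X, 中心Y, 长度, 宽度] 形式的实际位置。"""
    x, y, w, h = template_box
    f1, f2, f3, f4 = adjust
    return [x + w * f1, y + h * f2, math.exp(f3) * w, math.exp(f4) * h]
```

当调整值全为0时，实际位置即模板位置本身；f3、f4经指数函数作用，保证调整后的长度和宽度始终为正。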
另外,再参见图4,在基于本申请上述各结构化文本检测方法的另一可选实施例中,在操作301之前,还可以对待检测图片进行预处理,包括:截取和转正处理、并缩放到预设尺寸,即,本申请各实施例中,卷积神经网络接收到的图片为经预处理后的图片。其中,通过截取可以去除图片中的背景区域,通过转正可以使歪斜的图片变正。一种可选的实现方式中,该预处理操作可以由处理器调用存储器存储的指令执行,或者,可以由被处理器运行的图片预处理模块603执行。
另外，再参见图4，在基于本申请上述各结构化文本检测方法的又一可选实施例中，在操作302或者405之后，对上述一组待检测区域的实际位置对应区域进行文字识别，获得待检测图片中的结构化文本信息。一种可选的实现方式中，该文字识别操作可以由处理器调用存储器存储的指令执行，或者，可以由被处理器运行的文字识别模块604执行。
另外,在基于本申请上述各结构化文本检测方法的再一可选实施例中,可以在操作301之前,获取待检测图片对应的文字区域模板。例如,在其中一种可选的实现方案中,可以通过如下方式获取该文字区域模板:
分别获取与上述待检测图片同类的至少一个样本图片中所有文字区域的正确位置;
分别针对该至少一个样本图片中的各相应文字区域,获取各对应文字区域的正确位置的平均值,根据该至少一个样本图片中所有文字区域的正确位置的平均值获得文字区域模板,即:该文字区域模板包括该至少一个样本图片中所有文字区域的正确位置的平均值。
一种可选的实现方式中,上述获取待检测图片对应的文字区域模板的操作可以由处理器调用存储器存储的指令执行,或者,可以由被处理器运行的文字区域模板模块605或者计算模块607执行。
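上述"按相应文字区域求正确位置平均值"的过程可以用NumPy简洁地示意（假设各样本图片标注了数量相同、顺序对应的文字区域，每个位置为 (x, y, w, h) 四元组；仅为说明计算方式的示例）：

```python
import numpy as np

def build_template(sample_boxes):
    """sample_boxes 形状为 (样本数, 文字区域数, 4)，
    存放各样本图片中各文字区域的正确位置。
    在样本维度上取平均，得到文字区域模板。"""
    return np.asarray(sample_boxes, dtype=float).mean(axis=0)
```

例如两张样本图片中，同一文字区域的位置分别为 (0, 0, 2, 2) 和 (2, 2, 2, 2)，则模板中该区域的位置为 (1, 1, 2, 2)。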
本申请使用文字区域模板作为图片的待检测区域(Proposal)，提高了结构化文本的检测速度。对于结构化文本信息的获取，在对图片进行截取、转正、缩放到预设尺寸等预处理之后，由于截取和转正时的误差，以及结构化文本长度本身在不同图片中有所变化的原因，待检测区域在不同图片中的位置可能不同，但是其分布围绕一个中心、呈中间多四周少的形式。我们预先对大量同类结构化文本图片中的各文字区域分别计算一个位置的平均值，作为一组文字区域模板。然后将这些文字区域模板作为待检测区域输入卷积神经网络，利用兴趣区域池化操作，提取待检测区域的局部特征，然后根据该局部特征计算相应待检测区域的分类分数和位置调整量，以确定这个区域内是否有文本以及文本的位置，从而使得待检测区域的个数等于所有可能存在的文本区域的个数，减小了在识别结构化文本时的计算量，进而提高了识别速度。
图5为根据本申请结构化文本检测方法一个应用实施例的示意图。图6为图5所示应用实施例中使用的图片的一示意图。如图5所示，该应用实施例中以身份证照片作为待检测图片为例，对本申请实施例进行说明。可以理解，除了身份证的结构化文本检测之外，本申请提供的技术方案还可应用于护照、机动车驾驶证、票据等其他结构化文本检测中，不再赘述。
如图6所示,身份证照片包括10个可能有文字信息的区域(即:文字区域),其中住址最多分为三行,每行形成一个文字区域。每个文字区域的正确位置称为ground-truth框,通过左右边界的x坐标和上下边界的y坐标确定。如图5所示,该应用实施例包括:
操作501,对大量身份证样本照片进行预处理,包括:截取和转正处理,通过截取去除身份证照片中的背景区域,通过转正使歪斜的身份证照片变正,然后将身份证照片缩放到一个预设尺寸,得到身份证图片。
操作502,获取大量经预处理后的身份证图片中每一身份证图片的所有10个文字区域的位置,分别针对该10个文字区域中的任一文字区域,计算所有身份证图片的相应文字区域的位置的平均值,10个文字区域的位置的平均值作为一组文字区域模板,作为待检测身份证图片上文字区域的检测基础(本申请实施例中的待检测区域),如图6中的“模板”所示。
其中，上述操作501-502为在对待检测图片执行结构化文本检测方法之前预先执行的操作，通过上述操作501-502获得文字区域模板后，通过本申请的结构化文本检测方法对待检测图片直接进行结构化文本检测时，直接进行以下操作503-508，而无需执行上述操作501-502。
操作503,对待检测的身份证照片进行预处理,包括:截取和转正处理、并缩放到预设尺寸,得到待检测身份证图片,将该待检测身份证图片和通过上述操作501-502获得的文字区域模板输入卷积神经网络。
操作504,对待检测身份证图片进行卷积、非线性变换等处理,获得该待检测身份证图片的卷积特征图。
操作505，以通过上述操作501-502获得的文字区域模板中的10个文字区域的位置作为该待检测身份证图片的一组待检测区域，共10个待检测区域，进行兴趣区域池化操作，提取该10个待检测区域的局部特征。
操作506，例如通过卷积神经网络中的一个或多个全连接(Fully Connected Layer, FC)层，分别获取上述10个待检测区域中各待检测区域的分类分数和位置调整值。
操作507,例如通过卷积神经网络中的一个分类函数(softmax)层,识别各待检测区域是否包含文字。
分别确定各待检测区域的分类分数，并与预设阈值（例如0.5）进行比较；若待检测区域的分类分数大于该预设阈值，确定该分类分数大于预设阈值的待检测区域有文字；否则，若待检测区域的分类分数不大于该预设阈值，确定该分类分数不大于预设阈值的待检测区域没有文字。
上述预设阈值可根据实际情况设定,并可以根据实际情况调整。
分别针对各有文字的待检测区域，执行操作508。针对没有文字的待检测区域，舍弃该待检测区域，不执行本实施例的后续流程。
操作508，根据有文字的待检测区域的位置调整值调整文字区域模板中相应待检测区域的坐标值，得到有文字的一组待检测区域的实际位置。
具体地,各待检测区域的实际位置可以表示为[x+w*f1,y+h*f2,exp(f3)*w,exp(f4)*h],其中,(x+w*f1,y+h*f2)为相应待检测区域的中心的X和Y坐标,exp(f3)*w为相应待检测区域的长度,exp(f4)*h为相应待检测区域的宽度,exp()为指数函数。
在确定待检测区域的实际位置后,即可采取各种文字识别技术对相应区域的文字进行自动识别。
进一步地,在本申请上述各结构化文本检测方法实施例之前,还可以包括:利用与上述待检测图片同类的至少一个样本图片对卷积神经网络进行训练,其中,该样本图片包括至少一个文字区域,样本图片标注有各文字区域的正确位置。训练完成后,即可通过本申请的上述各实施例,利用该卷积神经网络对图片进行结构化文本检测。
在其中一种可选的实现方式中,利用与图片同类的至少一个样本图片对卷积神经网络进行训练,包括:
卷积神经网络接收至少一个样本图片及文字区域模板,并分别针对该至少一个样本图片中的任一样本图片:对任一样本图片进行卷积处理,获得任一样本图片的卷积特征图;以文字区域模板中所有文字区域的位置作为任一样本图片的一组待检测区域,对卷积特征图进行兴趣区域池化操作,提取一组待检测区域中各待检测区域的局部特征;分别获取一组待检测区域中各待检测区域的预测分类分数和位置调整值;分别根据各待检测区域的预测分类分数确定各检测区域是否有文字;分别针对各有文字的待检测区域,根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值,得到该有文字的待检测区域的预测位置。示例性地,该待检测区域的预测位置可以表示为[x+w*f1,y+h*f2,exp(f3)*w,exp(f4)*h],其中(x+w*f1,y+h*f2)为该待检测区域的中心的X坐标和Y坐标,exp(f3)*w为该待检测区域的长度,exp(f4)*h为该待检测区域的宽度,exp()为指数函数;
根据上述至少一个样本图片标注的各文字区域的正确位置、各检测区域是否有文字的确定结果和预测位置,对卷积神经网络进行训练,调整卷积神经网络中网络参数的参数值。
其中，本申请对卷积神经网络进行训练的实施例中，卷积神经网络对样本图片的操作，与上述卷积神经网络对待检测图片的结构化文本检测方法实施例中的操作相同，可以看作待检测图片为样本图片时的结构化文本检测方法，因此，对卷积神经网络进行训练的实施例中的可选实现方式，可以参考上述结构化文本检测方法实施例中的相应方式，此处不再赘述。
在其中一种可选的实现方式中,根据上述至少一个样本图片标注的各文字区域的正确位置、各检测区域是否有文字的确定结果和预测位置,对卷积神经网络进行训练时,可以采用迭代更新法或者梯度更新法对卷积神经网络进行训练。
采用迭代更新法对卷积神经网络进行训练时，可以迭代执行上述利用与图片同类的至少一个样本图片对卷积神经网络进行训练的过程。在每次执行过程中，例如可以通过softmax损失函数，根据上述至少一个样本图片标注的各文字区域的正确位置和各检测区域是否有文字的确定结果计算第一损失函数值；例如可以通过smooth L1 loss回归函数，根据上述至少一个样本图片标注的各文字区域的正确位置和预测位置，计算第二损失函数值。对于每一待检测区域，回归函数的回归目标例如可以是[f1,f2,f3,f4]=[(x'-x)/w,(y'-y)/h,log(w'/w),log(h'/h)]，其中x'、y'、h'、w'为每一样本图片的相应ground-truth框的X坐标和Y坐标、宽度和长度；x、y、h、w为相应待检测区域的X坐标和Y坐标、宽度和长度。此外，还可统计对卷积神经网络的训练次数。根据第一损失函数值和/或第二损失函数值调整卷积神经网络中网络参数的参数值，以减小第一损失函数值和/或第二损失函数值，之后再执行下一次训练过程，直至满足预设条件，例如，对卷积神经网络的训练次数达到预设次数阈值、或者第一损失函数值和/或第二损失函数值分别小于对应的预设损失函数值，结束训练。
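文中的回归目标与smooth L1损失可按如下方式示意（纯Python最小示例，符号与文中一致；损失采用常见的分段定义，分界点取1仅为惯例取值）：

```python
import math

def regression_target(template_box, gt_box):
    """按 [f1,f2,f3,f4]=[(x'-x)/w,(y'-y)/h,log(w'/w),log(h'/h)] 计算回归目标。
    template_box=(x, y, w, h) 为模板框，gt_box=(x', y', w', h') 为ground-truth框。"""
    x, y, w, h = template_box
    xg, yg, wg, hg = gt_box
    return [(xg - x) / w, (yg - y) / h, math.log(wg / w), math.log(hg / h)]

def smooth_l1(pred, target):
    """smooth L1 回归损失：每一维误差小于1时取二次项，否则取线性项，对各维求和。"""
    loss = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        loss += 0.5 * d * d if d < 1.0 else d - 0.5
    return loss
```

训练时，第二损失函数值即预测的位置调整值与上述回归目标之间的smooth L1损失。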
采用梯度更新法对卷积神经网络进行训练时,可以直接调整卷积神经网络中网络参数的参数值,以使第一损失函数值、第二损失函数值分别达到最小值。
在另一种可选的实现方式中,还可以在卷积神经网络接收至少一个样本图片之前,对上述任一样本图片进行截取和转正处理、并缩放到预设尺寸。
在又一种可选的实现方式中,对任一样本图片进行截取和转正处理、并缩放到预设尺寸之后,还可以通过如下方式获取文字区域模板:
分别针对缩放到预设尺寸后的两个或以上样本图片中的各文字区域，计算两个或以上样本图片的相应文字区域的位置的平均值，得到两个或以上样本图片中的各文字区域的位置的平均值，文字区域模板具体包括两个或以上样本图片中的所有文字区域的位置的平均值。
上述本申请实施例的方法可在硬件、固件中实现,或者被实现为可存储在记录介质(诸如CD ROM、RAM、软盘、硬盘或磁光盘)中的软件或计算机代码,或者被实现为通过网络下载的、原始存储在远程记录介质或非暂时机器可读介质中并将被存储在本地记录介质中的计算机代码,从而在此描述的方法可被存储在使用通用计算机、专用处理器或者可编程或专用硬件(诸如ASIC或FPGA)的记录介质上的这样的软件处理。可以理解,计算机、处理器、微处理器控制器或可编程硬件包括可存储或接收软件或计算机代码的存储组件(例如,RAM、ROM、闪存等),当所述软件或计算机代码被计算机、处理器或硬件访问且执行时,实现在此描述的处理方法。此外,当通用计算机访问用于实现在此示出的处理的代码时,代码的执行将通用计算机转换为用于执行在此示出的处理的专用计算机。
图7为根据本申请结构化文本检测系统的一实施例的示意图,该实施例的结构化文本检测系统可用于实现本申请上述各结构化文本检测方法实施例。如图7所示,该实施例的系统包括:接收模块601和获取模块602。
接收模块601,用于接收图片及文字区域模板。其中,该图片包括结构化文本;文字区域模板包括至少一个文字区域的位置,该至少一个文字区域的位置中各文字区域的位置分别基于与该图片同类的至少一个样本图片中相应文字区域的位置获得。
在一种可选的实现方案中,文字区域模板中各文字区域的位置,可以由相应文字区域的中心坐标、宽度及长度确定。
获取模块602,用于根据上述文字区域模板获取图片的一组待检测区域的实际位置。
在一种可选的实现方案中,各待检测区域的实际位置可以表示为:[x+w*f1,y+h*f2,exp(f3)*w,exp(f4)*h]。
其中,(x+w*f1,y+h*f2)表示待检测区域的中心坐标(X,Y),exp(f3)*w表示待检测区域的长度,exp(f4)*h表示待检测区域的宽度;x、y、h、w分别表示与待检测区域对应的文字区域的中心的X坐标、Y坐标、宽度和长度;[f1,f2,f3,f4]分别表示卷积神经网络在训练过程中,文字区域模板中各文字区域的回归目标,[f1,f2,f3,f4]=[(x'-x)/w,(y'-y)/h,log(w'/w),log(h'/h)],x'、y'、h'、w'分别表示至少一个样本图片中各样本图片的相应文字区域的中心的X坐标、Y坐标、宽度和长度。
在一种可选的实现方案中,本申请各结构化文本检测系统实施例中,上述接收模块601和获取模块602具体可以通过一个卷积神经网络实现。
本申请实施例的结构化文本检测系统中,预先基于同类的至少一个样本图片中相应文字区域的位置获得文字区域模板,卷积神经网络接收待检测图片及文字区域模板后,根据该文字区域模板获取待检测图片的一组待检测区域的实际位置,待检测区域较少,检测结构化文本所需要的计算量减小,所花费的时间减少,检测速率明显加快,所需要的计算资源也明显减少。
图8为根据本申请结构化文本检测系统的另一实施例的示意图，如图8所示，与图7所示的实施例相比，该实施例中，获取模块602包括：特征提取单元701，兴趣区域池化操作单元702，分类分数和位置调整值获取单元703，文字区域确定单元704和实际位置确定单元705。其中：
特征提取单元701,用于对上述图片进行卷积处理,获得该图片的卷积特征图。
兴趣区域池化操作单元702,用于以上述文字区域模板中所有文字区域的位置作为上述图片的一组待检测区域,对上述卷积特征图进行兴趣区域池化操作,提取上述一组待检测区域中各待检测区域的局部特征。
分类分数和位置调整值获取单元703，用于分别根据各待检测区域的局部特征获取各待检测区域的分类分数和位置调整值。
文字区域确定单元704,用于分别根据各待检测区域的分类分数确定各检测区域是否有文字。
在一种可选的实现方式中,文字区域确定单元704可以通过一个分类函数softmax层实现。分类函数层,用于分别确定各待检测区域的分类分数;若待检测区域的分类分数大于预设阈值,确定该分类分数大于预设阈值的待检测区域有文字。
实际位置确定单元705,用于分别针对各有文字的待检测区域,根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值,得到该有文字的待检测区域的实际位置。
另外,再参见图8,在本申请各结构化文本检测系统的另一实施例中,还可以选择性地包括:图片预处理模块603,用于对上述图片进行截取和转正处理、并缩放到一个预设尺寸,然后发送给接收模块601。
另外,再参见图8,在本申请各结构化文本检测系统的又一实施例中,还可以选择性地包括:文字识别模块604,用于对一组待检测区域的实际位置对应区域进行文字识别,获得上述图片中的结构化文本信息。
另外,再参见图8,在本申请各结构化文本检测系统的再一实施例中,还可以选择性地包括:文字区域模板模块605,用于分别针对与上述图片同类的至少一个样本图片中的各相应文字区域,分别根据该对应文字区域的正确位置获取该对应文字区域的正确位置的平均值,根据上述至少一个样本图片中所有文字区域的正确位置的平均值获得文字区域模板。
图9为根据本申请结构化文本检测系统的又一实施例的示意图。如图9所示,接收模块601与获取模块602通过卷积神经网络60实现时,本申请结构化文本检测系统还可以包括网络训练模块606,用于利用与上述图片同类的至少一个样本图片对卷积神经网络进行训练,其中的样本图片包括至少一个文字区域,样本图片标注有各文字区域的正确位置。
该网络训练模块606,可以在对卷积神经网络60训练完成后移除。
在其中一种可选的实现方式中,卷积神经网络60具体用于:接收至少一个样本图片及文字区域模板,并分别针对该至少一个样本图片中的任一样本图片:对任一样本图片进行卷积处理,获得任一样本图片的卷积特征图;以上述文字区域模板中所有文字区域的位置作为任一样本图片的一组待检测区域,对卷积特征图进行兴趣区域池化操作,提取一组待检测区域中各待检测区域的局部特征;分别获取上述一组待检测区域中各待检测区域的预测分类分数和位置调整值;分别根据各待检测区域的预测分类分数确定各检测区域是否有文字;分别针对各有文字的待检测区域,根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值,得到该有文字的待检测区域的预测位置。
相应地,网络训练模块606,具体用于根据至少一个样本图片标注的各文字区域的正确位置、各检测区域是否有文字的确定结果和预测位置,对卷积神经网络60进行训练。
另外,在本申请各结构化文本检测系统的进一步实施例中,图片预处理模块603,还可用于对任一样本图片进行截取和转正处理、并缩放到预设尺寸。再参见图9,该实施例的结构化文本检测系统还可以包括计算模块607,用于分别针对缩放到预设尺寸后的两个或以上样本图片中的各文字区域,计算该两个或以上样本图片的相应文字区域的位置的平均值,得到该两个或以上样本图片中的各文字区域的位置的平均值,其中的文字区域模板具体包括该两个或以上样本图片中的所有文字区域的位置的平均值。
在本申请各结构化文本检测系统的实施例中，基于样本图片获取文字区域模板时，可以选择通过文字区域模板模块605或者计算模块607计算两个或以上样本图片中的各文字区域的位置的平均值，来得到文字区域模板。
另外,本申请实施例还提供了一种计算设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等,该计算设备设置有本申请任一实施例的结构化文本检测系统。
本申请实施例还提供了另一种计算设备,包括:
处理器和本申请上述任一实施例的结构化文本检测系统;
在处理器运行结构化文本检测系统时,本申请上述任一实施例的结构化文本检测系统中的单元被运行。
本申请实施例还提供了又一种计算设备,包括:处理器、存储器、通信接口和通信总线,处理器、存储器和通信接口通过通信总线完成相互间的通信;
存储器用于存放至少一可执行指令,可执行指令使处理器执行本申请上述任一实施例的结构化文本检测方法中各步骤的操作。
例如,图10示出了可以实现本申请的结构化文本检测方法的一个计算设备。该计算设备包括:处理器(processor)801、通信接口(Communications Interface)802、存储器(memory)803、以及通信总线804。
处理器801、通信接口802、以及存储器803通过通信总线804完成相互间的通信。
通信接口802，用于与其它设备比如客户端或数据采集设备等的网元通信。
处理器801,用于执行程序,具体可以执行上述方法实施例中的相关步骤。
处理器801可以是一个中央处理器(CPU),或者是特定集成电路(Application Specific Integrated Circuit,ASIC),或者是被配置成实施本申请实施例的一个或多个集成电路。
存储器803，用于存放程序，该程序包括至少一可执行指令，该可执行指令具体可以用于使得处理器801执行以下操作：卷积神经网络接收图片及文字区域模板；所述图片包括结构化文本；所述文字区域模板包括至少一个文字区域的位置，所述至少一个文字区域的位置中各文字区域的位置分别基于与所述图片同类的至少一个样本图片中相应文字区域的位置获得；所述卷积神经网络根据所述文字区域模板获取所述图片的一组待检测区域的实际位置。
存储器803可能包含高速RAM存储器，也可能还包括非易失性存储器(non-volatile memory)，例如至少一个磁盘存储器。
本申请实施例还提供一种计算机程序,包括计算机可读代码,当计算机可读代码在设备上运行时,设备中的处理器执行用于实现本申请任一实施例的结构化文本检测方法中各步骤的指令。
本申请各实施例中计算机程序中各步骤的具体实现可以参见上述实施例中的相应操作、模块、单元中对应的描述,在此不赘述。所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上面描述的设备和模块的具体工作过程,可以参考前述方法实施例中的对应过程描述,在此不再赘述。
本申请实施例还提供了一种计算机系统,包括:
存储器,存储可执行指令;
一个或多个处理器,与存储器通信以执行可执行指令从而完成本申请任一实施例的结构化文本检测方法中各步骤的操作。
本申请实施例还提供了一种计算机可读介质,用于存储计算机可读取的指令,该指令被执行时实现本申请任一的结构化文本检测方法中各步骤的操作。
除非明确指出,在此所用的单数形式“一”、“该”均包括复数含义(即具有“至少一”的意思)。应当进一步理解,说明书中使用的术语“具有”、“包括”和/或“包含”表明存在所述的特征、步骤、操作、元件和/或部件,但不排除存在或增加一个或多个其他特征、步骤、操作、元件、部件和/或其组合。如在此所用的术语“和/或”包括一个或多个列举的相关项目的任何及所有组合。除非明确指出,在此公开的任何方法的步骤不必精确按照所公开的顺序执行。
需要指出,根据实施的需要,可将本申请实施例中描述的各个部件/步骤拆分为更多部件/步骤,也可将两个或多个部件/步骤或者部件/步骤的部分操作组合成新的部件/步骤,以实现本申请实施例的目的。
在此提供的方法和显示不与任何特定计算机、虚拟系统或者其他设备固有相关。各种通用系统也可以与基于在此的示教一起使用。根据上面的描述,构造这类系统所要求的结构是显而易见的。此外,本申请也不针对任何特定编程语言。应当明白,可以利用各种编程语言实现在此描述的本申请的内容,并且上面对特定语言所做的描述是为了披露本申请的最佳实施方式。
在此处所提供的说明书中,说明了大量具体细节。然而,能够理解,本申请可以在没有这些具体细节的情况下实践。在一些实例中,并未详细示出公知的方法、结构和技术,以便不模糊对本说明书的理解。
类似地,应当理解,为了精简本申请并帮助理解各个发明方面中的一个或多个,在上面对本申请的示例性实施例的描述中,本申请的各个特征有时被一起分组到单个实施例、图、或者对其的描述中。然而,并不应将该公开的方法解释成反映如下意图:即所要求保护的本申请要求比在每个权利要求中所明确记载的特征更多的特征。更确切地说,如下面的权利要求书所反映的那样,发明方面在于少于前面公开的单个实施例的所有特征。因此,遵循具体实施方式的权利要求书由此明确地并入该具体实施方式,其中每个权利要求本身都作为本申请的单独实施例。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统、设备、存储介质、程序等实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
一些实施例已经在前面进行了说明，但是应当强调的是，本申请不局限于这些实施例，而是可以以本申请主题范围内的其它方式实现。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及方法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。
以上实施方式仅用于说明本申请实施例,而并非对本申请实施例的限制,有关技术领域的普通技术人员,在不脱离本申请实施例的精神和范围的情况下,还可以做出各种变化和变型,因此所有等同的技术方案也属于本申请实施例的范畴,本申请实施例的专利保护范围应由权利要求限定。

Claims (27)

  1. 一种结构化文本检测方法,其特征在于,所述方法包括:
    卷积神经网络接收图片及文字区域模板;所述图片包括结构化文本;所述文字区域模板包括至少一个文字区域的位置,所述至少一个文字区域的位置中各文字区域的位置分别基于与所述图片同类的至少一个样本图片中相应文字区域的位置获得;
    所述卷积神经网络根据所述文字区域模板获取所述图片的一组待检测区域的实际位置。
  2. 根据权利要求1所述的方法,其特征在于,根据所述文字区域模板获取所述图片的一组待检测区域的实际位置,包括:
    对所述图片进行卷积处理,获得所述图片的卷积特征图;
    以所述文字区域模板中所有文字区域的位置作为所述图片的一组待检测区域,对所述卷积特征图进行兴趣区域池化操作,提取所述一组待检测区域中各待检测区域的局部特征;
    分别根据各待检测区域的局部特征获取所述一组待检测区域中各待检测区域的分类分数和位置调整值;
    分别根据各待检测区域的分类分数确定各检测区域是否有文字;
    分别针对各有文字的待检测区域,根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值,得到该有文字的待检测区域的实际位置。
  3. 根据权利要求2所述的方法,其特征在于,所述分别根据各待检测区域的分类分数确定各检测区域是否有文字,包括:
    通过所述卷积神经网络中的分类函数层,分别确定各待检测区域的分类分数;
    若待检测区域的分类分数大于预设阈值,确定该分类分数大于预设阈值的待检测区域有文字。
  4. 根据权利要求1-3任一所述的方法,其特征在于,所述文字区域模板中各文字区域的位置,由相应文字区域的中心坐标、宽度及长度确定。
  5. 根据权利要求1-4任一所述的方法,其特征在于,卷积神经网络接收图片及文字区域模板之前,还包括:
    对所述图片进行截取和转正处理、并缩放到预设尺寸。
  6. 根据权利要求1-5任一所述的方法,其特征在于,获取所述图片的一组待检测区域的实际位置之后,还包括:
    对所述一组待检测区域的实际位置对应区域进行文字识别,获得所述图片中的结构化文本信息。
  7. 根据权利要求1-6任一所述的方法,其特征在于,所述卷积神经网络接收图片及文字区域模板之前,还包括:
    分别获取与所述图片同类的至少一个样本图片中所有文字区域的正确位置;
    分别针对所述至少一个样本图片中的各相应文字区域,获取各对应文字区域的正确位置的平均值,根据所述至少一个样本图片中所有文字区域的正确位置的平均值获得所述文字区域模板。
  8. 根据权利要求1-6任一所述的方法,其特征在于,所述卷积神经网络接收图片及文字区域模板之前,还包括:
    利用与所述图片同类的至少一个样本图片对卷积神经网络进行训练,所述样本图片包括至少一个文字区域,所述样本图片标注有各文字区域的正确位置。
  9. 根据权利要求8所述的方法,其特征在于,所述利用与所述图片同类的至少一个样本图片对卷积神经网络进行训练,包括:
    所述卷积神经网络接收所述至少一个样本图片及文字区域模板,并分别针对所述至少一个样本图片中的任一样本图片:对所述任一样本图片进行卷积处理,获得所述任一样本图片的卷积特征图;以所述文字区域模板中所有文字区域的位置作为所述任一样本图片的一组待检测区域,对所述卷积特征图进行兴趣区域池化操作,提取所述一组待检测区域中各待检测区域的局部特征;分别获取所述一组待检测区域中各待检测区域的预测分类分数和位置调整值;分别根据各待检测区域的预测分类分数确定各检测区域是否有文字;分别针对各有文字的待检测区域,根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值,得到该有文字的待检测区域的预测位置;
    根据所述至少一个样本图片标注的各文字区域的正确位置、各检测区域是否有文字的确定结果和所述预测位置,对所述卷积神经网络进行训练。
  10. 根据权利要求9所述的方法,其特征在于,所述卷积神经网络接收所述至少一个样本图片之前,还包括:
    对所述任一样本图片进行截取和转正处理、并缩放到预设尺寸。
  11. 根据权利要求10所述的方法,其特征在于,所述对所述任一样本图片进行截取和转正处理、并缩放到预设尺寸之后,还包括:
    分别针对缩放到预设尺寸后的两个或以上样本图片中的各文字区域,计算所述两个或以上样本图片的相应文字区域的位置的平均值,得到所述两个或以上样本图片中的各文字区域的位置的平均值,所述文字区域模板具体包括所述两个或以上样本图片中的所有文字区域的位置的平均值。
  12. 一种结构化文本检测系统,其特征在于,所述系统包括:
    接收模块,用于接收图片及文字区域模板;所述图片包括结构化文本;所述文字区域模板包括至少一个文字区域的位置,所述至少一个文字区域的位置中各文字区域的位置分别基于与所述图片同类的至少一个样本图片中相应文字区域的位置获得;
    获取模块,用于根据所述文字区域模板获取所述图片的一组待检测区域的实际位置。
  13. 根据权利要求12所述的系统,其特征在于,所述获取模块包括:
    特征提取单元,用于对所述图片进行卷积处理,获得所述图片的卷积特征图;
    兴趣区域池化操作单元,用于以所述文字区域模板中所有文字区域的位置作为所述图片的一组待检测区域,对所述卷积特征图进行兴趣区域池化操作,提取所述一组待检测区域中各待检测区域的局部特征;
    分类分数和位置调整值获取单元，用于分别根据各待检测区域的局部特征获取各待检测区域的分类分数和位置调整值；
    文字区域确定单元,用于分别根据各待检测区域的分类分数确定各检测区域是否有文字;
    实际位置确定单元,用于分别针对各有文字的待检测区域,根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值,得到该有文字的待检测区域的实际位置。
  14. 根据权利要求13所述的系统,其特征在于,所述文字区域确定单元包括分类函数层;
    所述分类函数层,用于分别确定各待检测区域的分类分数;若待检测区域的分类分数大于预设阈值,确定该分类分数大于预设阈值的待检测区域有文字。
  15. 根据权利要求12-14任一所述的系统,其特征在于,所述文字区域模板中各文字区域的位置,由相应文字区域的中心坐标、宽度及长度确定。
  16. 根据权利要求12-15任一所述的系统,其特征在于,还包括:
    图片预处理模块,用于对所述图片进行截取和转正处理、并缩放到一个预设尺寸。
  17. 根据权利要求12-16任一所述的系统,其特征在于,还包括:
    文字识别模块,用于对所述一组待检测区域的实际位置对应区域进行文字识别,获得所述图片中的结构化文本信息。
  18. 根据权利要求12-17任一所述的系统,其特征在于,还包括:
    文字区域模板模块,用于分别针对所述至少一个样本图片中的各相应文字区域,分别根据对应文字区域的正确位置获取对应文字区域的正确位置的平均值,根据所述至少一个样本图片中所有文字区域的正确位置的平均值获得所述文字区域模板。
  19. 根据权利要求12-17任一所述的系统,其特征在于,所述接收模块与所述获取模块通过卷积神经网络实现;
    所述系统还包括:
    网络训练模块,用于利用与所述图片同类的至少一个样本图片对所述卷积神经网络进行训练,所述样本图片包括至少一个文字区域,所述样本图片标注有各文字区域的正确位置。
  20. 根据权利要求19所述的系统，其特征在于，所述卷积神经网络具体用于：接收所述至少一个样本图片及文字区域模板，并分别针对所述至少一个样本图片中的任一样本图片：对所述任一样本图片进行卷积处理，获得所述任一样本图片的卷积特征图；以所述文字区域模板中所有文字区域的位置作为所述任一样本图片的一组待检测区域，对所述卷积特征图进行兴趣区域池化操作，提取所述一组待检测区域中各待检测区域的局部特征；分别获取所述一组待检测区域中各待检测区域的预测分类分数和位置调整值；分别根据各待检测区域的预测分类分数确定各检测区域是否有文字；分别针对各有文字的待检测区域，根据有文字的待检测区域的位置调整值调整该有文字的待检测区域的坐标值，得到该有文字的待检测区域的预测位置；
    所述网络训练模块,具体用于根据所述至少一个样本图片标注的各文字区域的正确位置、各检测区域是否有文字的确定结果和所述预测位置,对所述卷积神经网络进行训练。
  21. 根据权利要求20所述的系统，其特征在于，所述图片预处理模块，还用于对所述任一样本图片进行截取和转正处理、并缩放到预设尺寸；
    所述系统还包括:
    计算模块,用于分别针对缩放到预设尺寸后的两个或以上样本图片中的各文字区域,计算所述两个或以上样本图片的相应文字区域的位置的平均值,得到所述两个或以上样本图片中的各文字区域的位置的平均值,所述文字区域模板具体包括所述两个或以上样本图片中的所有文字区域的位置的平均值。
  22. 一种计算设备,其特征在于,包括:权利要求12至21任一所述的结构化文本检测系统。
  23. 一种计算设备,其特征在于,包括:
    处理器和权利要求12至21任一所述的结构化文本检测系统;
    在处理器运行所述结构化文本检测系统时,权利要求12至21任一所述的结构化文本检测系统中的单元被运行。
  24. 一种计算设备,其特征在于,包括:处理器、存储器、通信接口和通信总线,所述处理器、所述存储器和所述通信接口通过所述通信总线完成相互间的通信;
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行权利要求1-11任一所述的结构化文本检测方法中各步骤的操作。
  25. 一种计算机系统,其特征在于,包括:
    存储器,存储可执行指令;
    一个或多个处理器,与存储器通信以执行可执行指令从而完成权利要求1-11任一所述的结构化文本检测方法中各步骤的操作。
  26. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1-11任一所述的结构化文本检测方法中各步骤的指令。
  27. 一种计算机可读介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时实现权利要求1-11任一所述的结构化文本检测方法中各步骤的操作。