WO2022042365A1 - A method and system for identifying certificates based on a graph neural network (一种基于图神经网络识别证件的方法及系统)


Info

Publication number
WO2022042365A1
Authority
WO
WIPO (PCT)
Prior art keywords
detection
text
sample
image
layout
Application number
PCT/CN2021/112926
Other languages
English (en)
French (fr)
Inventor
汪昊
张天明
王智恒
王树栋
薛韬略
周士奇
程博
毕潇
Original Assignee
北京嘀嘀无限科技发展有限公司 (Beijing Didi Infinity Technology and Development Co., Ltd.)
Application filed by 北京嘀嘀无限科技发展有限公司 (Beijing Didi Infinity Technology and Development Co., Ltd.)
Publication of WO2022042365A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area

Definitions

  • the embodiments of this specification relate to the technical field of image processing, and in particular, to a method and system for identifying a certificate based on a graph neural network.
  • a certificate is an important document for recording basic information of an individual or organization.
  • certificates are widely used in many fields of social activities.
  • more and more application platforms, such as online car-hailing platforms and lending platforms, need to collect and register the text information in the corresponding certificates to complete business processes such as real-name authentication.
  • an embodiment of the present specification proposes a method for identifying a certificate based on a graph neural network, to determine the type of text in a certificate image.
  • An aspect of the embodiments of the present specification provides a method for identifying a certificate based on a graph neural network. The method includes: acquiring an image to be recognized; detecting the content text contained in the image to be recognized and determining a plurality of detection frames; constructing a layout graph based on the plurality of detection frames, wherein the layout graph includes a plurality of nodes and a plurality of edges, the nodes correspond to the detection frames, and the edges correspond to the spatial positional relationships between each detection frame and other detection frames; and processing the layout graph with the trained graph neural network model to determine the field categories of the detection frames in the layout graph and identify the certificate based on the field categories.
  • An aspect of the embodiments of this specification provides a system for identifying certificates based on a graph neural network. The system includes: an acquisition module for acquiring an image to be recognized; a detection module for detecting the content text contained in the image to be recognized and determining a plurality of detection frames; a construction module for constructing a layout graph based on the plurality of detection frames, wherein the layout graph includes a plurality of nodes and a plurality of edges, the nodes correspond to the detection frames, and the edges correspond to the spatial positional relationships between each detection frame and other detection frames; and a classification module for processing the layout graph with the trained graph neural network model, determining the field categories of the detection frames in the layout graph, and identifying the certificate based on the field categories.
  • An aspect of the embodiments of the present specification provides an apparatus for identifying documents based on a graph neural network
  • the apparatus includes a processor and a memory
  • the memory is used for storing instructions
  • the processor is used for executing the instructions, so as to implement the operations corresponding to the above method for identifying certificates based on a graph neural network.
  • One aspect of the embodiments of the present specification provides a computer-readable storage medium, where the storage medium stores computer instructions; after a computer reads the computer instructions in the storage medium, the operations corresponding to the above method for identifying certificates based on a graph neural network are implemented.
  • FIG. 1 is a schematic diagram of an application scenario of a system for identifying documents based on a graph neural network according to some embodiments of the present specification
  • FIG. 2 is a block diagram of a system for identifying documents based on a graph neural network according to some embodiments of the present specification
  • FIG. 3 is a flowchart of a method for identifying a credential based on a graph neural network according to some embodiments of the present specification
  • FIG. 4 is an exemplary schematic diagram of building a layout from a plurality of detection frames according to some embodiments of the present specification
  • FIG. 5 is another exemplary schematic diagram of constructing a layout from a plurality of detection frames according to some embodiments of the present specification
  • FIG. 6 is another exemplary schematic diagram of constructing a layout from a plurality of detection frames according to some embodiments of the present specification
  • FIG. 7 is a flowchart of a method for determining multiple detection frames according to some embodiments of the present specification.
  • FIG. 8 is another flowchart of a method for determining a plurality of detection frames according to some embodiments of the present specification
  • FIG. 9 is a flowchart of a method for training a graph neural network model according to some embodiments of the present specification.
  • FIG. 10 is an exemplary schematic diagram of two text boxes located on the same coordinate axis according to some embodiments of the present specification.
  • as used in this specification and the claims, the terms "system", "device", "unit" and/or "module" are used to distinguish different components, elements, parts, sections or assemblies at different levels.
  • FIG. 1 is a schematic diagram of an application scenario of a system for identifying documents based on a graph neural network according to some embodiments of the present specification.
  • the system for identifying certificates based on a graph neural network disclosed in the embodiments of this specification can be applied to scenarios of image-based text recognition, for example, automatically entering the text information of a certificate based on its image.
  • when reviewing information on an application platform, for example for a driver's registration on an online car-hailing platform, the sources of information to be reviewed include documents such as the driver's ID card, driving license, and driver's license.
  • the type of the text information in the certificate can be determined in advance, the text information to be obtained can be filtered out based on the type, and only that text information is recognized.
  • a corresponding matching rule may be set through the fixed format of the certificate, and the category of the text information in the corresponding position in the certificate is determined based on the matching rule. For example, the text positions of a large number of documents are counted to generate a fixed template, and the matching relationship between different positions in the template and the corresponding text categories is established, so as to determine the text category corresponding to each position in the template.
  • this method has the following drawbacks: (1) for a certificate whose layout has changed, category matching errors occur. For example, if the field at a given position in the template is the permitted vehicle type, and the matching field in the recognized certificate has changed from one line of text to two lines, a category matching error results; (2) an image that does not contain the complete certificate cannot exactly match the template, which also results in a category matching error.
  • the embodiment of this specification proposes a method for identifying a certificate based on a graph neural network.
  • the graph neural network model is used to classify the text information in the certificate image; this does not depend on the relative position of the text information in the certificate and does not require formulating complex matching rules, and the correct category can still be obtained for certificates with large layout changes or incomplete certificates, improving classification accuracy.
  • an application scenario 100 of a system for identifying documents based on a graph neural network may include a processing device 110 , a network 120 and a user terminal 130 .
  • Processing device 110 may be operable to process information and/or data associated with a graph neural network-based identification document to perform one or more functions disclosed in this specification.
  • the processing device 110 may acquire the image to be recognized.
  • the processing device 110 may detect the content text contained in the image to be recognized, and determine a plurality of detection frames.
  • processing device 110 may construct a layout based on a plurality of detection boxes.
  • the processing device 110 may use the trained graph neural network model to process the layout, determine the field category of the detection frame in the layout, and identify the document based on the field category.
  • processing device 110 may include one or more processing engines (eg, single-core processing engines or multi-core processors).
  • the processing device 110 may include one or a combination of a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, etc.
  • Network 120 may facilitate the exchange of information and/or data.
  • one or more components of the application scenario 100 (e.g., the processing device 110 and the user terminal 130) may exchange information and/or data with other components via the network 120.
  • the processing device 110 may acquire the image to be recognized from the user terminal 130 through the network 120 .
  • the user terminal 130 may obtain the identification result of the certificate by the processing device 110 through the network 120 .
  • network 120 may be any form of wired or wireless network, or any combination thereof.
  • the network 120 may be one or a combination of a wired network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, etc.
  • User terminal 130 may be a device with data acquisition, storage and/or transmission functions.
  • user terminal 130 includes a camera device.
  • the user terminal 130 may acquire the image to be recognized through a photographing device.
  • the user terminal 130 may receive the identification result of the credential by the processing device 110 .
  • the user of the user terminal 130 may be a user of an online service provided through the application platform, for example, a user of the operating services of an online car-hailing platform.
  • user terminal 130 may include, but is not limited to, mobile device 130-1, tablet computer 130-2, laptop computer 130-3, desktop computer 130-4, etc., or any combination thereof.
  • Exemplary mobile devices 130-1 may include, but are not limited to, smart phones, personal digital assistants (PDAs), etc., or any combination thereof.
  • the user terminal 130 may send the acquired data to one or more devices in the scene 100 for identifying documents based on a graph neural network.
  • FIG. 2 is a block diagram of a system for identifying documents based on a graph neural network according to some embodiments of the present specification.
  • the system 200 may include an acquisition module 210 , a detection module 220 , a construction module 230 , and a classification module 240 .
  • the acquiring module 210 can be used to acquire the image to be recognized.
  • the detection module 220 may be configured to detect the content text contained in the to-be-recognized image and determine a plurality of detection frames. In some embodiments, the detection module 220 may be further configured to: obtain the type of the certificate; process the to-be-recognized image based on a text detection algorithm to determine multiple text boxes; and, when the type belongs to a preset type, process the multiple text boxes based on the preset rules corresponding to the preset type to determine the multiple detection frames.
  • the detection module may be further configured to: determine the text boxes to be merged that are located on the same line of the certificate; determine at least one to-be-merged line of the certificate, where the to-be-merged line corresponds to a merge reference line; and merge the to-be-merged text boxes of each to-be-merged line to determine the detection frames.
  • the detection module 220 may be further configured to: determine the degree of coincidence between the coordinate ranges of a text box and other text boxes in the vertical direction; and, in response to the degree of coincidence being greater than a first preset threshold, determine the text box and the other text boxes to be text boxes to be merged located in the same row.
  • the detection module 220 may be further configured to: process the to-be-recognized image based on a text detection algorithm to determine multiple text boxes; determine whether the distance between a text box and other text boxes is smaller than a second preset threshold and whether the font size of the content in the text box is the same as that of the content in the other text boxes; and, in response to the distance being smaller than the second preset threshold and the font sizes being the same, merge the text box and the other text boxes to determine a detection frame.
  • the construction module 230 may be configured to construct a layout graph based on the plurality of detection frames, wherein the layout graph includes a plurality of nodes and a plurality of edges, the nodes correspond to the detection frames, and the edges correspond to the spatial positional relationships between each detection frame and other detection frames.
  • the features of the nodes reflect one or more of the following: the position, size, and shape of the detection frame, and related information of the image region determined based on the detection frame.
  • the feature of the edge reflects one or more of the following information: distance information and relative position information between the detection frame and the other detection frames.
  • the construction module may be further configured to: determine, from the plurality of detection frames, at least one other detection frame that is horizontally and/or vertically adjacent to each detection frame; and connect each of the plurality of detection frames with its corresponding at least one other detection frame to form the layout graph.
  • the construction module may be further configured to: determine, from the plurality of detection frames, at least one other detection frame whose distance from each detection frame meets a preset requirement; and connect each detection frame with its corresponding at least one other detection frame to form the layout graph.
  • the classification module 240 may be configured to process the layout using a trained graph neural network model, determine a field category of the detection frame in the layout, and identify the credential based on the field category.
  • the graph neural network model is obtained by training as follows:
  • a sample training set is obtained; the sample training set includes a plurality of sample layout graphs established based on a plurality of sample images of the certificate, and a label corresponding to at least one sample node of each sample layout graph, where a sample image is a complete image of the certificate, an incomplete image of the certificate, or an image of the certificate with a different layout; a sample node of the sample layout graph corresponds to a sample detection frame of the sample image, a sample edge in the sample layout graph corresponds to the spatial positional relationship between the sample detection frame and other sample detection frames, and the label corresponding to a sample node represents the category of the field in the sample detection frame corresponding to that sample node;
  • the trained graph neural network model is obtained by training based on the sample training set, where the training loss function is established based on the difference between the labels corresponding to the sample nodes and the predicted values output by the sample nodes.
  • the system 200 further includes an identification module 250 configured to: determine content frames related to a preset service based on the field categories of the detection frames; and recognize the text in the content frames based on a recognition algorithm to determine the text content of the content frames.
  • system and its modules shown in FIG. 2 may be implemented in various ways.
  • the system and its modules may be implemented in hardware, software, or a combination of software and hardware.
  • the hardware part can be realized by using dedicated logic;
  • the software part can be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware.
  • the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, for example carried on a carrier medium such as a disk, CD or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier.
  • the system and its modules of this specification can be implemented not only by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, and also by a combination of the above hardware circuits and software (e.g., firmware).
  • the above description of the system 200 for identifying certificates based on a graph neural network and its modules is only for convenience of description and does not limit this specification to the scope of the illustrated embodiments. It can be understood that, for those skilled in the art, after understanding the principle of the system, the modules may be combined arbitrarily or a subsystem may be formed to connect with other modules without departing from this principle.
  • the acquisition module 210, the detection module 220, the construction module 230, the classification module 240, and the identification module 250 disclosed in FIG. 2 may be different modules in one system, or one module may implement the functions of two or more of the above modules.
  • the modules in the system 200 for identifying certificates based on a graph neural network may share one storage module, or each module may have its own storage module. Such variations are all within the protection scope of this specification.
  • FIG. 3 is a flowchart of a method for identifying a credential based on a graph neural network according to some embodiments of the present specification.
  • the process 300 may be implemented by a system for identifying documents based on a graph neural network, or the processing device 110 shown in FIG. 1 . As shown in Figure 3, the process 300 may include the following steps:
  • Step 310 acquiring an image to be recognized. In some embodiments, this step 310 may be performed by the acquisition module 210 .
  • the image to be recognized may refer to any image that needs to recognize the text information in the image.
  • the image to be recognized is an image obtained after imaging a recognized object, wherein the recognized object contains text information that needs to be recognized.
  • the identification object may be a certificate or a certificate-related object, wherein the certificate may be any certificate, such as an ID card, a driver's license, or a driving license, and the like.
  • the image to be recognized may be an image of a document or an object related to the document. For example, an image obtained after imaging a document or a document-related object (eg, a copy of the document, etc.).
  • the image to be identified may be a preprocessed image.
  • the to-be-recognized image may be an image obtained by preprocessing an original image obtained by imaging a recognized object (for example, a document).
  • preprocessing may include, but is not limited to, cropping, rectification, grayscale conversion, and/or denoising.
  • cropping may mean cutting out and saving the region of the recognized object in the original image and discarding the remaining non-object regions.
  • the original image may be processed by an object detection algorithm to obtain the region of the recognized object. Specifically, taking a certificate as the recognized object, processing the original image with the object detection algorithm highlights the certificate region while suppressing the non-certificate regions, so that the location of the certificate in the original image can be determined accurately and efficiently.
  • object detection algorithms may include, but are not limited to, edge detection methods, mathematical morphology methods, texture analysis based localization methods, line detection and edge statistics methods, genetic algorithms, contour line methods, wavelet transform based methods and neural networks, etc.
  • rectification may mean transforming the recognized-object region of the original image into a target position, for example, making the certificate region horizontal.
  • the rectification method includes, but is not limited to, using OpenCV's perspective transform functions (e.g., getPerspectiveTransform() and warpPerspective()).
  • Grayscale can be the conversion of a colored image to a grayscale image.
  • a grayscale image is a monochrome image with 256 grayscale gamuts or levels from black to white.
  • the grayscale conversion may be performed, for example, by reading the image with OpenCV's imread function using the grayscale flag, or by converting a color image with cvtColor.
  • Denoising can refer to the process of reducing noise in digital images.
  • the denoising method may use a noise model, the non-local means (NL-Means) algorithm, or the like; a minimal preprocessing sketch follows below.
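  • The following is a minimal preprocessing sketch, assuming OpenCV (cv2) is available and that the four corner points of the certificate region have already been obtained by an object detection step; the output size is an illustrative placeholder, not a value taken from this specification.

```python
import cv2
import numpy as np

def preprocess(original, corners, size=(640, 400)):
    """original: BGR image; corners: four (x, y) corner points of the
    certificate region, ordered clockwise from top-left; size: assumed output size."""
    tw, th = size
    dst = np.float32([[0, 0], [tw, 0], [tw, th], [0, th]])
    # Rectification: warp the detected quadrilateral to a horizontal rectangle.
    M = cv2.getPerspectiveTransform(np.float32(corners), dst)
    rectified = cv2.warpPerspective(original, M, (tw, th))
    # Grayscale conversion followed by non-local means denoising.
    gray = cv2.cvtColor(rectified, cv2.COLOR_BGR2GRAY)
    return cv2.fastNlMeansDenoising(gray, None, 10)
```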
  • the acquiring module 210 may acquire the image to be recognized from the user terminal 130, and may also acquire the image to be recognized from a storage device.
  • Step 320 Detect the content text contained in the to-be-recognized image, and determine a plurality of detection frames. In some embodiments, this step 320 may be performed by the detection module 220 .
  • the content text may refer to text information contained in the image to be recognized.
  • the content text may be all text contained in the image to be recognized.
  • the detection box may be a bounding box generated after framing all the text contained in the image to be recognized.
  • the multiple detection boxes may be multiple text boxes determined by processing the image to be recognized by using a text detection algorithm.
  • the detection frame may also be a plurality of detection frames determined by processing the plurality of text boxes.
  • text detection algorithms include, but are not limited to, PSENet (Progressive Scale Expansion Network), PANet (Pixel Aggregation Network), and DBNet (Differentiable Binarization Network).
  • Step 330 constructing a layout map based on the plurality of detection frames.
  • this step 330 may be performed by the building module 230 .
  • the layout map may be a map constructed based on a plurality of detection frames and relationships among the plurality of detection frames.
  • the layout graph may include multiple nodes and multiple edges, the nodes correspond to the detection frames, and the edges correspond to the relationships between the detection frames and other detection frames.
  • the edge corresponds to the spatial positional relationship between the detection frame and other detection frames, and the spatial positional relationship may be a relative positional relationship, a distance relationship, or the like. It can be understood that the detection frame and other detection frames are all from multiple detection frames, and are different detection frames in the multiple detection frames.
  • nodes and edges each contain their own characteristics.
  • the characteristics of the nodes may reflect one or more of the following information: the location, size, shape and related image information of the detection frame.
  • the position of the detection frame may refer to the position of the detection frame in the image to be recognized.
  • the position of the detection frame may be represented by the position of any point (eg, a geometric center point) within the detection frame.
  • the size of the detection frame may include the width and height of the detection frame.
  • the processing device 110 may obtain position, size, and shape information of the detection frame using a text detection algorithm.
  • the related image information may be related information of the region image determined based on the detection frame.
  • the area image may refer to an image corresponding to the area of the to-be-recognized image framed by the detection frame.
  • the relevant image information may include one or more of RGB values, grayscale values, and histogram of oriented gradient (Histogram of Oriented Gradient, HOG) features of the area image.
  • the characteristics of the nodes may be represented by vectors.
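  • As an illustration, the sketch below assembles such a node feature vector from a detection frame, assuming OpenCV is available; the HOG window size, the chosen geometric features, and the use of the mean gray value are illustrative assumptions rather than values from this specification.

```python
import cv2
import numpy as np

def node_features(image, frame):
    """image: BGR certificate image; frame: (x, y, w, h) detection box."""
    x, y, w, h = frame
    region = image[y:y + h, x:x + w]              # region image of the frame
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, (64, 32))          # fixed window size for HOG
    hog = cv2.HOGDescriptor((64, 32), (16, 16), (8, 8), (8, 8), 9)
    hog_vec = hog.compute(resized).ravel()
    # Position (center), size, and mean gray value of the region image.
    geom = np.array([x + w / 2, y + h / 2, w, h, gray.mean()], dtype=np.float32)
    return np.concatenate([geom, hog_vec])
```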
  • the features of the edges reflect the relationship between the features of the nodes corresponding to the detection frame and the features of the nodes corresponding to other detection frames.
  • the features of the edge can reflect one or more of the following information: distance information and relative position information between the detection frame and other detection frames.
  • the relative position information may describe the positional relationship between a detection frame and other detection frames, for example, azimuth information such as another detection frame being located directly above, directly below, directly to the left of, or directly to the right of the detection frame, or at an angle of 30° or 250° from it.
  • the distance information may include distance relationships between the detection frame and other detection frames.
  • the distance between a specific point (e.g., the geometric center point) of a detection frame and the corresponding specific point (e.g., the geometric center point) of another detection frame can be used as the distance between the two detection frames.
  • in some embodiments, the minimum distance between points in the detection frame and points in other detection frames may be used as the distance between the detection frames.
  • the distance may be a horizontal (e.g., x-axis) distance or a vertical (e.g., y-axis) distance.
  • the relationship between the detection frame and other detection frames can be obtained by the feature of the node corresponding to the detection frame and the feature of the node corresponding to the other detection frame.
  • the distance relationship between the detection frame and other detection frames can be obtained by calculating the vector of the feature corresponding to the node based on the distance calculation formula. It can be understood that the distance relationship can be the feature distance.
  • the distance calculation formula may be a Euclidean distance calculation formula, a Manhattan distance calculation formula, or the like.
  • the other detection frames may be detection frames whose position or distance relative to a given detection frame has a specific relationship, or whose position and distance satisfy a preset condition. For example, the other detection frames may be adjacent to the detection frame (i.e., they may be its adjacent detection frames), where adjacent may mean horizontally adjacent, vertically adjacent, or both (see below for how adjacent detection frames are determined).
  • the distance between the other detection frames and the detection frame may also meet a preset requirement (for example, being smaller than a third preset threshold or greater than a fourth preset threshold), and the preset requirement can be customized.
  • the other detection frames may also be determined in other ways, which are not limited in this embodiment. It can be understood that when the other detection frames connected to a detection frame are determined by distance, the determined frames may be adjacent or non-adjacent, depending on the size of the third preset threshold.
  • the positions of the multiple detection frames may be sorted in the vertical direction and the horizontal direction respectively, so as to determine the adjacent detection frames of the detection frames.
  • the sorting in the horizontal direction is based on the same row (for how to determine the same row, see later).
  • the vertical direction may be top-to-bottom or bottom-to-top
  • the horizontal direction may be left-to-right or right-to-left.
  • the sorting result of each detection frame can be marked, for example, as x-y, where x represents the vertical rank and y represents the horizontal rank.
  • adjacent detection frames that are horizontally or vertically adjacent to a detection frame can then be determined from the sorting results: frames whose sorting results are adjacent are adjacent detection frames, where adjacent x means vertically adjacent and, within the same x, adjacent y means horizontally adjacent.
  • for example, detection frames 3-1 and 3-2 are horizontally adjacent, detection frame 2-1 is vertically adjacent to detection frame 3-1, and detection frame 2-1 is vertically adjacent to detection frame 3-2.
  • adjacent detection frames may also be determined in other manners, which are not limited in this embodiment. For example, by whether the distance size is smaller than a certain threshold, etc.
  • the other detection frames may also have other positional relationships with the detection frame, which are not limited in this embodiment.
  • in some embodiments, any two of the plurality of detection frames may be connected to form the layout graph.
  • the process of constructing the layout map will be described below with reference to FIG. 4 , FIG. 5 and FIG. 6 .
  • the edges in the layout diagram connect the detection frame and other detection frames, wherein the other detection frames are detection frames adjacent to the detection frame.
  • when the layout graph 430 is constructed, it includes 6 nodes, each corresponding to a detection frame (i.e., one of the detection frames 1-1' to 6-1').
  • the edge-connected detection frames include: vertically adjacent detection frames 1-1' and 2-1', detection frames 2-1' and 3-1', detection frames 3-1' and 4-1', detection frames 4-1' and 5-1', and detection frames 5-1' and 6-1'.
  • the layout graph 530 includes 12 nodes, each corresponding to a detection frame (i.e., one of the detection frames 1-1 to 11-1). The edges in the layout graph 530 connect vertically adjacent detection frames 1-1 and 2-1, 2-1 and 3-1, 2-1 and 3-2, 3-1 and 4-1, 3-2 and 4-1, 4-1 and 5-1, etc., and horizontally adjacent detection frames 3-1 and 3-2.
  • the edges in the layout diagram connect the detection frame and other detection frames, wherein the other detection frames may be detection frames whose distance from the detection frame is smaller than the third preset threshold.
  • the edges in the layout graph 630 connect the following detection frames: 1-1' and 2-1', 1-1' and 3-1', 1-1' and 4-1', 2-1' and 3-1', 2-1' and 4-1', 3-1' and 4-1', 4-1' and 5-1', and 5-1' and 6-1'.
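  • As a concrete illustration, the sketch below builds such a layout graph from detection frames, using a center-distance threshold (the third preset threshold mentioned above) to decide which frames to connect; the frame format and the feature set are simplifying assumptions.

```python
import math

def build_layout_graph(frames, dist_threshold):
    """frames: list of (x, y, w, h) detection boxes; returns nodes and edges."""
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in frames]
    # Node features: position and size (shape/image features omitted here).
    nodes = [{"center": c, "size": (f[2], f[3])} for c, f in zip(centers, frames)]
    edges = []
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            d = math.dist(centers[i], centers[j])
            if d < dist_threshold:
                # Edge features: distance and relative position (azimuth angle).
                angle = math.degrees(math.atan2(centers[j][1] - centers[i][1],
                                                centers[j][0] - centers[i][0]))
                edges.append((i, j, {"distance": d, "angle": angle}))
    return nodes, edges
```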
  • Step 340 Process the layout by using the trained graph neural network model, determine the field type of the detection frame in the layout, and identify the certificate based on the field type. In some embodiments, this step 340 may be performed by the classification module 240 .
  • the graph neural network model may be a pre-trained machine learning model.
  • the trained graph neural network model can process the layout map to determine the field category of the detection frame in the layout map.
  • different types of credentials correspond to different trained graph neural network models, that is, there is a corresponding graph neural network model for the credential, and the corresponding graph neural network model is obtained by training on a training set constructed based on the credential. For the training of the graph neural network model, see Figure 8 and its related description.
  • the field category may refer to the category to which the text within the detection box belongs.
  • for example, the field categories can be permitted vehicle type, file number, place of issue, name, date, and other.
  • the trained graph neural network model can process the layout graph to determine the probability that each detection frame in the layout graph belongs to each predetermined field category, where the predetermined field categories are determined by the sample labels used to train the graph neural network model.
  • the probability that a detection frame belongs to each predetermined field category can be represented by a probability distribution.
  • a probability distribution can be a 1*n real-valued vector, where n is the dimension of the vector, and n can be 1, 2, 3, etc.
  • the probability distribution of the detection frame may be a 1*6 real vector.
  • the probability distribution can be of the form (a, b, c, d, e, f), where a represents the probability that the field category of the detection frame is the permitted vehicle type, b the file number, c the place of issue, d a name, e a date, and f other.
  • the classification module 240 may determine the field category of the detection box based on the probability distribution. For example, the field category corresponding to the maximum probability value in the probability distribution is determined as the field category of the detection frame.
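  • A tiny sketch of this argmax selection is shown below; the six category names follow the example above and are illustrative only.

```python
CATEGORIES = ["permitted vehicle type", "file number", "place of issue",
              "name", "date", "other"]

def field_category(probs):
    """probs: length-6 probability distribution output for one detection frame."""
    return CATEGORIES[max(range(len(probs)), key=lambda i: probs[i])]

# e.g. field_category([0.02, 0.05, 0.01, 0.85, 0.04, 0.03]) -> "name"
```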
  • the credential may be identified based on the field class. Specifically, based on the field category of the detection frame, a content frame related to a preset service is determined; the text in the content frame is identified based on an identification algorithm, and the text content in the content frame is determined.
  • the recognition algorithm may include any text recognition algorithm, eg, OCR recognition.
  • the preset business can be customized, for example, the online car-hailing business, or, for example, driver authentication in the online car-hailing business.
  • the content box may refer to a detection box corresponding to a field type related to a preset service.
  • the content frame may be a detection frame corresponding to a field type required to implement a preset service.
  • for example, if the preset service is online car-hailing driver authentication and the certificate corresponding to the image to be recognized is an ID card, the field categories related to the preset service include ID number, name, age, gender, household registration address, etc., and the detection frames corresponding to these field categories are the content frames.
  • the preset service can be implemented based on the text content in the content box.
  • for example, if the preset service is driver authentication, the text of the detection frames (i.e., the content frames) whose field categories are relevant can be recognized to obtain the key fields; by determining these detection frames and recognizing the key fields, the key information in the certificate, that is, the text content of the key fields, can be extracted.
  • the graph neural network model may comprise a multi-layer graph neural network.
  • each node of each layer receives information from the nodes connected to it (for example, adjacent), and performs information fusion between nodes.
  • by stacking layers, nodes can fuse information with progressively more distant nodes (e.g., nodes not directly connected or adjacent to them), improving classification accuracy.
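  • The sketch below shows one such message-passing layer in plain NumPy, assuming mean aggregation over neighbors; real implementations (e.g., GCN or GraphSAGE layers) would add learned per-layer weights, nonlinearities, and edge features.

```python
import numpy as np

def message_passing_layer(node_feats, edges, W):
    """node_feats: (N, F) float array; edges: list of (i, j) pairs; W: (2F, F')."""
    N = node_feats.shape[0]
    agg = np.zeros_like(node_feats)
    deg = np.zeros(N)
    for i, j in edges:                  # fuse information along each edge
        agg[i] += node_feats[j]; deg[i] += 1
        agg[j] += node_feats[i]; deg[j] += 1
    agg /= np.maximum(deg, 1)[:, None]  # mean over neighbors
    # Stacking several such layers lets a node receive information from
    # progressively more distant nodes, as described above.
    return np.tanh(np.concatenate([node_feats, agg], axis=1) @ W)
```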
  • on one hand, the embodiments of this specification use the graph neural network model to process the layout of the certificate and solve the layout-analysis problem with deep learning, without formulating complex matching rules, so that the content frames related to a service can be determined; on the other hand, by analyzing the layout with the graph neural network model, the information of a detection frame and its surrounding detection frames can be fully used, so that even if the layout of the certificate has changed, for example a field of text information has grown from one line to two lines, the correct field category can still be obtained because the information of the surrounding detection frames has not changed, giving high classification accuracy; furthermore, the graph neural network model can still mine the information in incomplete layouts.
  • therefore, the graph neural network model in the embodiments of this specification can resist interference from incomplete certificates or certificates not rectified to a horizontal state, and still obtain the correct classification.
  • FIG. 7 is a flowchart of a method for determining a plurality of detection frames according to some embodiments of the present specification. As shown in FIG. 7, the process 700 may include the following steps:
  • Step 710 Obtain the type of the certificate. In some embodiments, this step 710 may be performed by the detection module 220 .
  • the type of credential may be the type of credential to which the image to be identified corresponds.
  • the type of the certificate may reflect the purpose and/or language of the certificate.
  • the type of the certificate can reflect usage information such as ID card, driver's license or driving license, and can also reflect language information such as Chinese or English.
  • the detection module 220 may obtain the type of the certificate from the user terminal 130; for example, when a user uploads a picture of a certificate on the user terminal 130, the type may be filled in or selected by the user, or recognized and determined automatically by the user terminal.
  • Step 720 Process the to-be-recognized image based on a text detection algorithm to determine multiple text boxes. In some embodiments, this step 720 may be performed by the detection module 220 .
  • the text detection algorithm may be an algorithm for detecting text in documents.
  • any text detection algorithm can be used, including but not limited to PSENet (Progressive Scale Expansion Network), PANet (Pixel Aggregation Network), DBNet (Differentiable Binarization Network), etc.
  • the text box may be a bounding box automatically generated by processing the image to be recognized based on a text detection algorithm.
  • a text box is a bounding box with specific content as a unit, where the specific content can be a word, a line of text, or a single word.
  • the text detection algorithm may generate different text boxes depending on the type of text in the image to be recognized. For example, when the certificate contains English text, the text detection algorithm can frame the English text line by line in units of words to generate multiple text boxes; it can be understood that each text box determined in this case contains a single English word. For another example, when the image to be recognized contains Chinese, the text detection algorithm can frame the Chinese text in units of lines, generating multiple text boxes, each containing one line of Chinese text.
  • alternatively, the text detection algorithm can frame the Chinese text of the certificate in units of single characters to generate multiple text boxes; it can be understood that each text box determined in this case contains one character.
  • Step 730 When the type belongs to a preset type, the multiple text boxes are processed based on the preset rule corresponding to the preset type, and the multiple detection frames are determined. In some embodiments, this step 730 may be performed by the detection module 220 .
  • the preset type can be specifically set according to the actual situation.
  • the preset type can be Chinese driver's license or Chinese ID card, etc.
  • Preset rules refer to rules for processing text boxes.
  • the preset rules represent the situations in which text box merging can be performed, and the manner of merging.
  • the preset rules may also be other processing rules, which are not limited in this embodiment. For example, rules for text box segmentation, etc. Different certificates have different preset rules. Therefore, different preset types have corresponding preset rules.
  • the detection frame may be a text frame. In some embodiments, the detection frame may also be a frame obtained after processing the text frame.
  • the number of text boxes directly determined by the text detection algorithm is generally large, and there may be multiple text boxes in a row.
  • the text box determined by the text detection algorithm is in units of words, and the more English words in the document, the more text boxes.
  • the detection module 220 may process multiple text boxes based on preset rules corresponding to preset types to determine multiple detection boxes.
  • the multiple detection frames are determined specifically as follows: determine the text boxes to be merged that are located on the same row of the certificate (see below for how these are determined); determine at least one to-be-merged row of the certificate; and merge the to-be-merged text boxes of each to-be-merged row.
  • determining the rows to be merged can be done by sorting the text boxes in the certificate (the sorting of text boxes is similar to the sorting of detection frames described above) and determining from the sorting result whether a row corresponds to a merge reference row; for example, if the merge reference row is the third row, the row to be merged is also the third row.
  • for example, if the second row in the certificate corresponding to the preset type contains a single field category, and the text detection algorithm determines that the second row of the certificate has 2 text boxes (as shown in 410), the 2 text boxes are merged to obtain one detection frame (as shown in 420).
  • some lines of the document may contain more than one field type, such as age and gender on the same line, and for such documents, multiple text boxes on a specific line are not merged.
  • the third row is not merged (as shown in 520); that is, each detection frame of the third row is a text box.
  • determining the text boxes to be merged in the same row in the document includes: judging the degree of coincidence of coordinate values corresponding to the text box and other text boxes in the vertical direction; in response to the degree of coincidence being greater than a first preset threshold, Identify the text box and other text boxes as the text boxes to be merged on the same line.
  • in some embodiments, the y-axis coordinate range of the text box and the y-axis coordinate ranges of the other text boxes are determined, and the degree of coincidence of the boxes' y-axis coordinates is then computed from these ranges. Specifically, the degree of coincidence is the ratio of the overlapping range of the two boxes' y-axis coordinates to the entire y-axis range occupied by the two boxes. As shown in FIG. 10, the y-axis range of text box 1010 is (y4, y3), the y-axis range of text box 1020 is (y2, y1), and the overlapping range of the two boxes' coordinates is (y4, y1), so the degree of coincidence is (y1 - y4) / (y3 - y2).
  • the first preset threshold may be specifically set according to actual requirements. For example, 80%, 95%, etc.
  • the detection module 220 may determine the text box and other text boxes as the to-be-merged text boxes located in the same row as the text box in response to the coincidence degree being greater than the first preset threshold.
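  • A small sketch of this same-row test is given below: the coincidence degree is taken as the overlap of the two boxes' y-ranges over their union, and the 0.8 threshold is an assumed value for the first preset threshold.

```python
def same_row(box_a, box_b, threshold=0.8):
    """Boxes are (x, y_top, w, h); the y axis grows downward."""
    top_a, bot_a = box_a[1], box_a[1] + box_a[3]
    top_b, bot_b = box_b[1], box_b[1] + box_b[3]
    overlap = min(bot_a, bot_b) - max(top_a, top_b)   # overlapping y-range
    union = max(bot_a, bot_b) - min(top_a, top_b)     # entire y-range occupied
    return overlap > 0 and overlap / union > threshold
```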
  • multiple text boxes obtained by a text detection algorithm can be combined in advance to determine multiple detection frames, and then a layout map can be constructed based on the multiple detection frames.
  • in this way, the number of nodes in the layout graph is reduced, the structure of the constructed layout graph is simplified, and the efficiency with which the graph neural network model processes the layout graph is improved.
  • FIG. 8 is another flowchart of a method for determining a plurality of detection frames according to some embodiments of the present specification. As shown in FIG. 8, the process 800 may include the following steps:
  • Step 810 Process the to-be-recognized image based on a text detection algorithm to determine multiple text boxes. In some embodiments, this step 810 may be performed by the detection module 220 .
  • for the specific details of step 810, reference may be made to the foregoing step 720, and they are not repeated here.
  • Step 820 Determine whether the distance between the text box and other text boxes is less than a second preset threshold, and whether the font size of the content in the text box and the content in the other text boxes is the same. In some embodiments, this step 820 may be performed by the detection module 220.
  • the distance between the text box and other text boxes can be referred to FIG. 3 and related descriptions.
  • the detection module 220 may determine the font size based on the size of the text box (eg, the height of the text box). For example, based on a preset rule, the font size corresponding to the size of the text box is determined.
  • the second preset threshold may be specifically set according to actual requirements. For example, 3 or 5 etc.
  • Step 830: in response to the distance between the text box and the other text boxes being less than the second preset threshold and the font sizes of the content in the text box and in the other text boxes being the same, merge the text box and the other text boxes to determine the detection frame. In some embodiments, this step 830 may be performed by the detection module 220.
  • in this way, the type of the certificate can be ignored: as long as the distance between text boxes in the certificate image meets the requirement and their font sizes are the same, the merge can be performed. A sketch of this rule follows below.
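  • The sketch below illustrates the distance-plus-font-size rule, assuming font size is estimated from box height and that the second preset threshold is given in pixels; both assumptions go beyond what this specification states.

```python
def should_merge(box_a, box_b, dist_threshold):
    """Boxes are (x, y, w, h); the horizontal gap is used as the distance."""
    gap = max(box_b[0] - (box_a[0] + box_a[2]),
              box_a[0] - (box_b[0] + box_b[2]), 0)
    same_font = abs(box_a[3] - box_b[3]) < 2   # heights within 2 px => same size
    return gap < dist_threshold and same_font

def merge(box_a, box_b):
    """Bounding box covering both text boxes, used as the detection frame."""
    x = min(box_a[0], box_b[0]); y = min(box_a[1], box_b[1])
    x2 = max(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = max(box_a[1] + box_a[3], box_b[1] + box_b[3])
    return (x, y, x2 - x, y2 - y)
```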
  • FIG. 9 is a flowchart of a method for training a graph neural network model according to some embodiments of the present specification.
  • a supervised learning method is used for training.
  • different types of credentials can train corresponding graph neural network models, and the trained graph neural network models can be used to determine the types of fields of the corresponding credentials.
  • a corresponding training set is constructed based on an image of an ID card, and a graph neural network model for identifying field types in the ID card is trained based on the corresponding training set.
  • the training process 900 may include the following steps:
  • Step 910 Obtain a sample training set, where the sample training set includes: a plurality of sample layouts established based on a plurality of sample images of the certificate, and a label corresponding to at least one sample node of the sample layout.
  • this step 910 may be performed by the processing device 110 .
  • the sample training set may be data input to the initial graph neural network model for training the graph neural network model.
  • the sample image refers to the image obtained based on the certificate and used to establish the sample layout for training.
  • the sample image may be a complete image of the certificate, an incomplete image of the certificate, or an image of the certificate with a different layout.
  • the full image of the document is the image that contains all the content of the document.
  • An incomplete image of a document is an image that contains part of the document.
  • the incomplete image of the document may be obtained by cropping the complete image of the document.
  • the incomplete image of the certificate may be obtained by covering part of the content in the certificate, etc., and there is no limitation on the acquisition method of the incomplete image of the certificate.
  • for example, the complete image of the certificate can be the certificate image containing all the information shown in FIG. 5, and an incomplete image of the certificate can be obtained by cropping that driver's license image, for example cropping it so that it only contains the information in detection frames 1-1 to 7-1.
  • the images with different layouts of the certificate refer to the images obtained based on the different layouts of the certificate.
  • Different layouts refer to all possible types of layouts of the certificate.
  • the address information in the ID card may be two or three lines in the layout. In many cases, there may be differences in the layout of ID cards of different people (that is, the number of lines occupied by the address is different).
  • the sample layout is the layout used for training, which is obtained based on the sample images.
  • the sample node of the sample layout corresponds to the sample detection frame of the sample image
  • the sample edge in the sample layout corresponds to the spatial positional relationship between the sample detection frame and other sample detection frames
  • the label corresponding to the sample node represents the sample node The category of the field in the corresponding sample detection box.
  • the manner of determining detection frames during training may be kept consistent with the manner used when applying the trained model.
  • each field type corresponds to the relationship between the detection boxes, so the trained graph neural network model can be used to determine the type of fields in the document in different situations.
  • other sample detection frames in the sample layout graph may be frames that have a specific positional relationship with the sample detection frame, and the specific positional relationship is not limited to a single kind of relationship.
  • for example, other sample detection frames may be adjacent to the sample detection frame, or their distances from the sample detection frame may meet a preset requirement. Therefore, for a sample detection frame, as its positional relationship with other sample detection frames changes, the other sample detection frames connected to it also change, the arrangement of the sample detection frame and the other sample detection frames differs, and different layout graphs are constructed accordingly.
  • the merging rule may be that text boxes located on the same line and of the same text type are merged, where whether the text types are the same can be determined manually.
  • Step 920 based on the sample training set, train to obtain the trained graph neural network model.
  • this step 920 may be performed by the processing device 110 .
  • the trained graph neural network model can be obtained by training based on the sample training set.
  • the parameters of the initial graph neural network can be iteratively updated based on the sample training set to reduce the loss function value corresponding to the sample node of each sample layout to obtain a trained graph neural network model.
  • the parameters of the initial graph neural network model can be iteratively updated to reduce the loss function value corresponding to the sample node of each sample layout, so that the loss function value satisfies the preset condition. For example, the loss function value converges, or the loss function value is smaller than a preset value.
  • the model training is completed, and the trained graph neural network model is obtained.
  • the training loss function may be established based on the difference between the labels corresponding to the sample nodes and the predicted values output by the sample nodes.
  • the predicted value output by a sample node is the prediction obtained for that node after the initial graph neural network model processes the sample layout graph. For example, the overall loss can be the sum of the loss functions corresponding to all sample nodes, where the loss function of each sample node is established based on the difference between the predicted value output by that node and its label. It can be understood that each node in the layout graph is trained by supervised learning, that is, each node has a corresponding loss function, and the parameters of the graph neural network model are updated through the loss functions of all nodes to complete the training.
  • the loss function can be built using cross entropy, squared error, or the like.
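As a concrete illustration of this supervised, per-node training scheme, here is a minimal sketch. The two-layer GCN (via PyTorch Geometric's GCNConv), the feature dimensions, the fixed iteration count, and the toy layout are all assumptions for illustration; the specification does not prescribe a particular GNN layer, framework, or stopping rule beyond the preset condition on the loss.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # assumed layer choice, not fixed by the text

class NodeFieldClassifier(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, n_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)              # per-node logits

# Toy sample layout: 5 nodes with 8-dim features, chain-connected in both directions.
x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3, 1, 2, 3, 4],
                           [1, 2, 3, 4, 0, 1, 2, 3]])
y = torch.tensor([0, 1, 2, 3, 5])                     # field-category label per node

model = NodeFieldClassifier(in_dim=8, hidden=64, n_classes=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):          # iterate until the loss meets the preset condition
    logits = model(x, edge_index)
    loss = F.cross_entropy(logits, y, reduction="sum")  # sum of per-node CE terms
    opt.zero_grad()
    loss.backward()
    opt.step()
```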
  • Embodiments of this specification further provide an apparatus for identifying certificates based on a graph neural network.
  • the apparatus includes a processor and a memory, where the memory is used for storing instructions and the processor is used for executing the instructions, so as to implement the operations corresponding to the method for identifying certificates based on a graph neural network described in any of the preceding items.
  • the embodiments of the present specification further provide a computer-readable storage medium.
  • the storage medium stores computer instructions, and when the computer instructions are executed by the processor, the operations corresponding to the method for identifying a certificate based on a graph neural network as described in any preceding item are implemented.
  • the possible beneficial effects of the embodiments of this specification include, but are not limited to: (1) the embodiments of this specification use the graph neural network model to process the layout of the certificate and use a deep learning method to solve the layout analysis problem of the certificate, so that the content frames related to a preset service can be determined without specifying complex matching rules; (2) the graph neural network model in the embodiments of this specification can tolerate large layout changes in the certificate, and can resist interference such as an incomplete certificate or a certificate not corrected to a horizontal state, obtaining correct analysis results with high accuracy; (3) the embodiments of this specification can improve the efficiency of subsequent recognition of the certificate by determining the types of the fields in the certificate; specifically, the fields related to the preset service can be selected based on the determined field types, and only the specific content of those fields is further recognized, avoiding recognizing the specific content of all fields. It should be noted that different embodiments may have different beneficial effects, and in different embodiments the possible beneficial effects may be any one or a combination of the above, or any other beneficial effects that may be obtained.
  • aspects of this specification may be illustrated and described in several patentable categories or situations, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, various aspects of this specification may be performed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software.
  • the above hardware or software may be referred to as a "data block", "module", "engine", "unit", "component" or "system".
  • aspects of this specification may be embodied as a computer product comprising computer readable program code embodied in one or more computer readable media.
  • a computer storage medium may contain a propagated data signal with the computer program code embodied therein, for example, in baseband or as part of a carrier wave.
  • the propagating signal may take a variety of manifestations, including electromagnetic, optical, etc., or a suitable combination.
  • A computer storage medium can be any computer-readable medium other than a computer-readable storage medium, which can communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code on a computer storage medium may be transmitted over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or a combination of any of the foregoing.
  • the computer program code required for the operation of the various parts of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP and ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages.
  • the program code may run entirely on the user's computer, or as a stand-alone software package on the user's computer, or partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device.
  • the remote computer can be connected to the user's computer through any network, such as a local area network (LAN) or wide area network (WAN), or to an external computer (e.g., through the Internet), or in a cloud computing environment, or used as a service, such as software as a service (SaaS).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

一种基于图神经网络识别证件的方法及系统,所述方法包括:获取待识别图像(310);检测所述待识别图像中包含的内容文本,确定多个检测框(320);基于所述多个检测框构建版面图(330);其中,所述版面图包括多个节点和多个边,所述节点对应所述检测框,所述边对应所述检测框与其它检测框之间的空间位置关系;利用训练好的图神经网络模型对所述版面图进行处理,确定所述版面图中所述检测框的字段类别(340),并对关键字段进行文字识别,从而提取证件中的关键信息。

Description

一种基于图神经网络识别证件的方法及系统
交叉引用
本申请要求2020年08月26日提交的中国申请号202010870570.1的优先权,全部内容通过引用并入本文。
技术领域
本说明书实施例涉及图像处理技术领域,特别涉及一种基于图神经网络识别证件的方法及系统。
背景技术
证件是记录个人或组织基本信息的重要凭证。为了保障经济活动的正常进行,保护社会安全,证件在社会活动中的诸多领域得到广泛应用。随之而来的,越来越多的应用平台,如网约车平台、借贷平台等,需要采集和登记相应证件中的文本信息,以完成业务,例如,进行实名制认证等。然而,在利用证件中的文本之前(例如,确定填写的内容是否与证件中的文本内容一致等),首先需要确定文本对应的类型,即属于证件中的什么信息。
为此,本说明书实施例提出一种基于图神经网络识别证件的方法,确定证件图像中文本的类别。
发明内容
本说明书实施例的一个方面提供一种基于图神经网络识别证件的方法,所述方法包括:获取待识别图像;检测所述待识别图像中包含的内容文本,确定多个检测框;基于所述多个检测框构建版面图;其中,所述版面图包括多个节点和多个边,所述节点对应所述检测框,所述边对应所述检测框与其它检测框之间的空间位置关系;利用训练好的图神经网络模型对所述版面图进行处理,确定所述版面图中所述检测框的字段类别,基于所述字段类别识别证件。
本说明书实施例的一个方面提供一种基于图神经网络识别证件的系统，所述系统包括：获取模块，用于获取待识别图像；检测模块，用于检测所述待识别图像中包含的内容文本，确定多个检测框；构建模块，用于基于所述多个检测框构建版面图；其中，所述版面图包括多个节点和多个边，所述节点对应所述检测框，所述边对应所述检测框与其它检测框之间的空间位置关系；分类模块，用于利用训练好的图神经网络模型对所述版面图进行处理，确定所述版面图中所述检测框的字段类别，基于所述字段类别识别证件。
本说明书实施例的一个方面提供一种基于图神经网络识别证件的装置,所述装置包括处理器以及存储器,所述存储器用于存储指令,所述处理器用于执行所述指令,以实现如前任一项所述的基于图神经网络识别证件的方法对应的操作。
本说明书实施例的一个方面提供一种计算机可读存储介质,所述存储介质存储计算机指令,当计算机读取存储介质中的计算机指令后,实现如前一项所述的基于图神经网络识别证件的方法对应的操作。
附图说明
本说明书将以示例性实施例的方式进一步描述,这些示例性实施例将通过附图进行详细描述。这些实施例并非限制性的,在这些实施例中,相同的编号表示相同的结构,其中:
图1是根据本说明书的一些实施例所示的基于图神经网络识别证件的系统的应用场景示意图;
图2是根据本说明书的一些实施例所示的基于图神经网络识别证件的系统的模块图;
图3是根据本说明书的一些实施例所示的基于图神经网络识别证件的方法的流程图;
图4是根据本说明书的一些实施例所示的由多个检测框构建版面图的示例性示意图;
图5是根据本说明书的一些实施例所示的由多个检测框构建版面图的另一示例性示意图;
图6是根据本说明书的一些实施例所示的由多个检测框构建版面图的另一示例性示意图;
图7是根据本说明书的一些实施例所示的确定多个检测框的方法的流程图;
图8是根据本说明书的一些实施例所示的确定多个检测框的方法的另一流程图;
图9是根据本说明书的一些实施例所示的训练图神经网络模型的方法的流程图;
图10是根据本说明书的一些实施例所示的位于同一个坐标轴的两个文本框的示例性示意图。
具体实施方式
为了更清楚地说明本说明书实施例的技术方案,下面将对实施例描述中所需要使用的附图作简单的介绍。显而易见地,下面描述中的附图仅仅是本说明书的一些示例或实施例,对于本领域的普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图将本说明书应用于其它类似情景。除非从语言环境中显而易见或另做说明,图中相同标号代表相同结构或操作。
应当理解,本说明书中所使用的“系统”、“装置”、“单元”和/或“模组”是用于区分不同级别的不同组件、元件、部件、部分或装配的一种方法。然而,如果其他词语可实现相同的目的,则可通过其他表达来替换所述词语。
如本说明书和权利要求书中所示,除非上下文明确提示例外情形,“一”、“一个”、“一种”和/或“该”等词并非特指单数,也可包括复数。一般说来,术语“包括”与“包含”仅提示包括已明确标识的步骤和元素,而这些步骤和元素不构成一个排它性的罗列,方法或者设备也可能包含其它的步骤或元素。
本说明书中使用了流程图用来说明根据本说明书的实施例的系统所执行的操作。应当理解的是,前面或后面操作不一定按照顺序来精确地执行。相反,可以按照倒序或同时处理各个步骤。同时,也可以将其他操作添加到这些过程中,或从这些过程移除某一步或数步操作。
图1是根据本说明书的一些实施例所示的基于图神经网络识别证件的系统的应用场景示意图。本说明书实施例所披露的基于图神经网络识别证件的系统可以应用于基于图像进行文本识别的场景。例如,基于证件图像自动录入证件中的文本信息。仅作为示例,当用户注册应用平台时,例如司机注册网约车平台,平台需要对司机的身份信息和车辆信息等进行审核,审核的信息来源包括司机的身份证、行驶证以及驾驶证等证件。由于证件中通常包含大量的文本信息,而应用平台需要获取的文本信息通常较少,因此,在对证件进行识别之前,可以预先确定证件中的文本信息的类别,基于类别筛选出需要获取的文本信息,仅对该文本信息进行识别。
在一些实施例中，可以通过证件的固定格式设置相应的匹配规则，基于该匹配规则确定证件中对应位置的文本信息的类别。例如，对大量的证件的文本位置进行统计，生成一个固定的模板，建立该模板中不同位置与对应文本类别的匹配关系，从而确定模板中每个位置对应的文本类别。然而，该方式存在以下特点：(1)对于版面存在变化的证件，会导致类别匹配错误。例如，模板中对应位置的字段为准驾车型，而识别的证件中与模板对应位置匹配的位置处的字段由一行文本变为了两行，该情况下会导致类别匹配错误；(2)对于不完整证件的图像，由于其无法与模板准确匹配，同样会导致类别匹配错误。
因此,本说明书实施例提出一种基于图神经网络识别证件的方法。采用图神经网络模型对证件图像中的文本信息进行分类,不依赖于证件中文本信息的相对位置,无需制定复杂的匹配规则,且对于存在较大版面变化的证件以及不完整的证件,仍然能够得到正确的类别,提高了分类准确率。
如图1所示,基于图神经网络识别证件的系统的应用场景100可以包括处理设备110、网络120以及用户终端130。
处理设备110可用于处理与基于图神经网络识别证件相关联的信息和/或数据来执行在本说明书中揭示的一个或者多个功能。在一些实施例中，处理设备110可以获取待识别图像。在一些实施例中，处理设备110可以检测待识别图像中包含的内容文本，确定多个检测框。在一些实施例中，处理设备110可以基于多个检测框构建版面图。在一些实施例中，处理设备110可以利用训练好的图神经网络模型对版面图进行处理，确定版面图中检测框的字段类别，基于字段类别识别证件。在一些实施例中，处理设备110可以包括一个或多个处理引擎（例如，单核心处理引擎或多核心处理器）。仅作为范例，处理设备110可以包括中央处理器（CPU）、特定应用集成电路（ASIC）、专用指令集处理器（ASIP）、图像处理器（GPU）、物理运算处理单元（PPU）、数字信号处理器（DSP）、现场可程序门阵列（FPGA）、可程序逻辑装置（PLD）、控制器、微控制器单元、精简指令集计算机（RISC）、微处理器等中的一种或多种组合。在一些实施例中，处理设备中可以包含一个或多个存储设备，用于存储处理设备需要处理的数据或者处理的结果数据等。例如，存储设备中可以存储待识别图像等。
网络120可以促进信息和/或数据的交换。在一些实施例中，应用场景100的一个或者多个组件（例如处理设备110、用户终端130）可以通过网络120传送信息至应用场景100的其他组件。例如，处理设备110可以通过网络120从用户终端130获取待识别图像。又例如，用户终端130可以通过网络120获取处理设备110对证件的识别结果。在一些实施例中，网络120可以是任意形式的有线或者无线网络，或其任意组合。仅作为范例，网络120可以是有线网络、光纤网络、远程通信网络、内部网络、互联网、局域网（LAN）、广域网（WAN）、无线局域网（WLAN）、城域网（MAN）、公共交换电话网络（PSTN）、蓝牙网络等中的一种或多种组合。
用户终端130可以是带有数据获取、存储和/或发送功能的设备。在一些实施例中，用户终端130包含有拍摄设备。在一些实施例中，用户终端130可以通过拍摄设备获取待识别图像。在一些实施例中，用户终端130可以接收处理设备110对证件的识别结果。在一些实施例中，用户终端130的使用者可以是使用应用平台的在线服务的用户。例如使用网约车平台的经营服务的用户。在一些实施例中，用户终端130可以包括但不限于移动设备130-1、平板电脑130-2、笔记本电脑130-3、台式电脑130-4等或其任意组合。示例性的移动设备130-1可以包括但不限于智能手机、个人数码助理（Personal Digital Assistant，PDA）等或其任意组合。在一些实施例中，用户终端130可以将获取到的数据发送至基于图神经网络识别证件的场景100中的一个或多个设备。
应当注意的是,以上应用场景100中的各个部件的描述仅仅是为了示例和说明,而不限定本说明书的适用范围。对于本领域技术人员来说,在本说明书的指导下可以对应用场景100中的部件进行添加或减少。然而,这些改变仍在本说明书的范围之内。
图2是根据本说明书的一些实施例所示的基于图神经网络识别证件的系统的模块图。如图2所示,该系统200可以包括获取模块210、检测模块220、构建模块230、以及分类模块240。
获取模块210可以用于获取待识别图像。
检测模块220可以用于检测所述待识别图像中包含的内容文本,确定多个检测框。在一些实施例中,所述检测模块220可以还用于:获取所述证件的类型;基于文本检测算法对所述待识别图像进行处理,确定多个文本框;当所述类型属于预设类型,基于所述预设类型对应的预设规则对所述多个文本框进行处理,确定所述多个检测框。
在一些实施例中,所述预设类型对应的证件存在至少一个合并参考行,所述合并参考行中的字段类型相同,所述检测模块可以还用于:确定所述证件中位于同一行的待合并文本框;确定所述证件的至少一个待合并行,所述待合并行与所述合并参考行对应;将所述待合并行的待合并文本框进行合并,确定所述检测框。
在一些实施例中,所述检测模块220可以还用于:判断所述文本框与其他文本框在竖直方向上对应的坐标值的重合度;响应于所述重合度大于第一预设阈值,将所述文本框和所述其他文本框确定为所述位于同一行的待合并文本框。
在一些实施例中,所述检测模块220可以还用于:基于文本检测算法对所述待识别图像进行处理,确定多个文本框;判断所述文本框和其他文本框之间距离是否小于第二预设阈值,以及所述文本框中内容和所述其他文本框中内容的字号是否相同;响应于所述文本框和所述其他文本框之间所述距离小于所述第二预设阈值,以及所述文本框 中内容和所述其他文本框中内容的字号相同,合并所述文本框和所述其他文本框,确定所述检测框。
在一些实施例中,构建模块230可以用于基于所述多个检测框构建版面图;其中,所述版面图包括多个节点和多个边,所述节点对应所述检测框,所述边对应所述检测框与其它检测框之间的空间位置关系。在一些实施例中,所述节点的特征反映以下信息中的一种或多种:所述检测框的位置、大小、形状和相关的图像信息,所述相关的图像信息是基于所述检测框确定的区域图像的相关信息。在一些实施例中,所述边的特征反映以下信息中的一种或多种:所述检测框与所述其它检测框之间的距离信息和相对位置信息。
在一些实施例中,所述构建模块可以还用于:从所述多个检测框中,确定与所述检测框水平相邻或/和竖直相邻的至少一个其他检测框;将所述多个检测框中每一个及其对应的至少一个其他检测框进行连接,构成所述版面图。
在一些实施例中,所述构建模块可以还用于:从所述多个检测框中,确定与所述检测框之间的距离满足预设要求的至少一个其他检测框;将所述多个检测框中每一个及其对应的至少一个其他检测框进行连接,构成所述版面图。
在一些实施例中,分类模块240可以用于利用训练好的图神经网络模型对所述版面图进行处理,确定所述版面图中所述检测框的字段类别,基于所述字段类别识别证件。所述图神经网络模型通过如下方法训练得到:
获取样本训练集，所述样本训练集包括：基于所述证件的多个样本图像建立的多个样本版面图，和所述样本版面图的至少一个样本节点对应的标签，其中，所述样本图像为所述证件的完整图像、所述证件的非完整图像或所述证件的不同排版的图像；所述样本版面图的样本节点对应所述样本图像的样本检测框，所述样本版面图中样本边对应所述样本检测框与其它样本检测框之间的空间位置关系，所述样本节点对应的标签表征所述样本节点对应的样本检测框中字段的类别；基于所述样本训练集，训练得到所述训练好的图神经网络模型；其中，训练的损失函数基于所述样本节点对应的标签和所述样本节点输出的预测值之间的差异建立。
所述系统200还包括识别模块250,用于:基于所述检测框的字段类别,确定与预设业务相关的内容框;基于识别算法对所述内容框中的文本进行识别,确定所述内容框中的文本内容。
应当理解，图2所示的系统及其模块可以利用各种方式来实现。例如，在一些实施例中，系统及其模块可以通过硬件、软件或者软件和硬件的结合来实现。其中，硬件部分可以利用专用逻辑来实现；软件部分则可以存储在存储器中，由适当的指令执行系统，例如微处理器或者专用设计硬件来执行。本领域技术人员可以理解上述的方法和系统可以使用计算机可执行指令和/或包含在处理器控制代码中来实现，例如在诸如磁盘、CD或DVD-ROM的载体介质、诸如只读存储器（固件）的可编程的存储器或者诸如光学或电子信号载体的数据载体上提供了这样的代码。本说明书的系统及其模块不仅可以有诸如超大规模集成电路或门阵列、诸如逻辑芯片、晶体管等的半导体、或者诸如现场可编程门阵列、可编程逻辑设备等的可编程硬件设备的硬件电路实现，也可以用例如由各种类型的处理器所执行的软件实现，还可以由上述硬件电路和软件的结合（例如，固件）来实现。
需要注意的是,以上对于基于图神经网络识别证件的系统200及其模块的描述,仅为描述方便,并不能把本说明书限制在所举实施例范围之内。可以理解,对于本领域的技术人员来说,在了解该系统的原理后,可能在不背离这一原理的情况下,对各个模块进行任意组合,或者构成子系统与其他模块连接。例如,图2中披露的获取模块210、检测模块220、构建模块230、分类模块240以及识别模块250可以是一个系统中的不同模块,也可以是一个模块实现上述的两个模块的功能。又例如,基于图神经网络识别证件的系统200中各个模块可以共用一个存储模块,各个模块也可以分别具有各自的存储模块。诸如此类的变形,均在本说明书的保护范围之内。
图3是根据本说明书的一些实施例所示的基于图神经网络识别证件的方法的流程图。在一些实施例中,流程300可以由基于图神经网络识别证件的系统,或图1所示的处理设备110实现。如图3所示,该流程300可以包括以下步骤:
步骤310,获取待识别图像。在一些实施例中,该步骤310可以由获取模块210执行。
待识别图像可以是指任何需要对图像中的文本信息进行识别的图像。在一些实施例中,待识别图像是对识别对象进行成像后,获取的图像,其中,该识别对象中存在需要识别的文本信息。在一些实施例中,识别对象可以是证件或与证件相关的物体,其中,证件可以是任意证件,如,身份证、驾驶证或者行驶证等。对应的,在一些实施例中,待识别图像可以是证件或与证件相关的物体的图像。例如,对证件或者证件相关的物体(例如,证件的复印件等)进行成像后,获得的图像。
在一些实施例中，待识别图像可以是经过预处理的图像。在一些实施例中，待识别图像可以是对识别对象（例如，证件）进行成像后得到的原始图像进行预处理后得到的图像。在一些实施例中，预处理可以包括但不限于：切割、矫正、灰度化、和/或去噪。
其中,切割可以是将原始图像中识别对象的区域切出来保存,剩余的非识别对象区域舍弃。在一些实施例中,可以通过对象检测算法对原始图像进行处理,得到原始图像中的识别对象的区域。具体的,以识别对象为证件为例,通过对象检测算法对原始图像进行处理,可以清楚地显示出原始图像中的证件区域,同时使原始图像中的非证件区域减弱,从而能准确有效地定位出证件在原始图像中的位置。在一些实施例中,对象检测算法可以包括但不限于:边缘检测法、数学形态学法、基于纹理分析的定位方法、行检测和边缘统计法、遗传算法、轮廓线法、基于小波变换的方法和神经网络等。
矫正可以是使得原始图像中的识别对象区域位于目标位置，例如，使得原始图像中的证件区域位于水平方向。在一些实施例中，矫正的处理方法包括但不限于采用OpenCV中的perspectiveTransform()函数进行矫正处理。
灰度化可以是将彩色的图像转化为灰度图像。灰度图像是一种从黑色到白色间256级灰度色域或等级的单色图像。在一些实施例中,灰度化的方法可以是采用imread函数获取灰度图像。
去噪可以是指减少数字图像中噪声的过程。在一些实施例中,去噪的方法可以是采用噪声模型或NL-Means算法等。
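As an illustration of the preprocessing chain described above (cropping, correction, grayscale conversion, NL-Means denoising), here is a minimal OpenCV sketch. The quadrilateral `quad` is assumed to come from the object-detection step, the 640x400 target size is an arbitrary choice, and warpPerspective is used as the image-level counterpart of the perspectiveTransform() correction mentioned in the text.

```python
import cv2
import numpy as np

def preprocess(img, quad):
    """Crop and correct the certificate region given its four corner
    points (clockwise from top-left), then grayscale and denoise.
    The 640x400 target size is an illustrative assumption."""
    dst = np.float32([[0, 0], [640, 0], [640, 400], [0, 400]])
    M = cv2.getPerspectiveTransform(np.float32(quad), dst)
    corrected = cv2.warpPerspective(img, M, (640, 400))  # crop + correction in one warp
    gray = cv2.cvtColor(corrected, cv2.COLOR_BGR2GRAY)   # grayscale conversion
    return cv2.fastNlMeansDenoising(gray)                # NL-Means denoising
```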
在一些实施例中,获取模块210可以从用户终端130获取待识别图像,还可以从存储设备中获取待识别图像。
步骤320,检测所述待识别图像中包含的内容文本,确定多个检测框。在一些实施例中,该步骤320可以由检测模块220执行。
在一些实施例中,内容文本可以是指待识别图像中包含的文本信息。例如,内容文本可以是待识别图像中包含的所有文本。
在一些实施例中,检测框可以是对待识别图像中包含的所有文本分别进行框定后生成的边界框。在一些实施例中,多个检测框可以是利用文本检测算法对待识别图像进行处理,确定的多个文本框。在一些实施例中,检测框还可以是对该多个文本框进行处理,确定的多个检测框。
在一些实施例中,文本检测算法包括但不限于:PSENet(Progressive Scale Expansion Network)渐进尺度扩展网络、PANNet(Pixel Aggregation Network)像素聚合网 络、以及DBNet(Differentiable Binarization Network)可微分的二值化网络。
关于确定多个检测框的具体细节参见图7和图8及其相关描述,此处不再赘述。
步骤330,基于所述多个检测框构建版面图。在一些实施例中,该步骤330可以由构建模块230执行。
在一些实施例中,版面图可以是基于多个检测框及多个检测框之间的关系构建的图。在一些实施例中,版面图可以包括多个节点和多个边,节点对应检测框,边对应检测框与其他检测框之间的关系。在一些实施例中,边对应检测框与其他检测框之间的空间位置关系,空间位置关系可以是相对位置关系、距离关系等。可以理解的,检测框和其他检测框均来自于多个检测框,且为多个检测框中不同的检测框。
在一些实施例中,节点和边分别包含有各自的特征。在一些实施例中,节点的特征可以反映以下信息中的一种或多种:检测框的位置、大小、形状和相关的图像信息。
检测框的位置可以是指检测框在待识别图像中的位置。在一些实施例中,可以用检测框中任意点(例如,几何中心点)的位置来代表检测框的位置。检测框的大小可以包括检测框的宽和高。在一些实施例中,处理设备110可以利用文本检测算法获得检测框的位置、大小以及形状信息。
在一些实施例中,相关的图像信息可以是基于检测框确定的区域图像的相关信息。在一些实施例中,区域图像可以是指检测框框定的待识别图像的区域对应的图像。在一些实施例中,相关的图像信息可以包括区域图像的RGB值、灰度值以及方向梯度直方图(Histogram of Oriented Gradient,HOG)特征等中的一种或多种。
在一些实施例中,节点的特征可以通过向量表示。
在一些实施例中,边的特征反映检测框对应节点的特征和其他检测框对应节点的特征之间的关系。例如,边的特征可以反映以下信息中的一种或多种:检测框与其他检测框之间的距离信息和相对位置信息。在一些实施例中,相对位置信息可以是检测框和其他检测框之间的相对位置关系,例如,其他检测框位于检测框正上方、正下方、正左方、正右方、30°或250°等方位信息。距离信息可以包括检测框与其他检测框之间的距离关系。在一些实施例中,可以用检测框的特定点(例如,几何中心点)与其他检测框的对应的特定点(例如,几何中心点)之间的距离作为检测框与其他检测框之间的距离。在一些实施例中,可以将检测框中的点和其他检测框中的点之间的最小距离作为检测框与其他检测框之间的距离。其中,距离可以是水平(例如,x轴)距离,也可以是竖直(例如,y轴)距离。
在一些实施例中,检测框与其他检测框之间的关系可以通过检测框对应的节点的特征和其他检测框对应的节点的特征得到。例如,检测框与其他检测框之间的距离关系可以基于距离计算公式对节点对应的特征的向量计算得到,可以理解的,该距离关系可以是特征距离。距离计算公式可以是欧式距离计算公式或曼哈顿距离计算公式等。
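A small sketch of computing edge features from two frames' centers; pairing Euclidean distance with a direction angle is one illustrative choice among the distance and relative-position features the text allows (Manhattan distance would work equally well).

```python
import math

def edge_features(a, b):
    """Distance and relative-direction features for the edge between
    two frames' centers; a, b are (center_x, center_y) pairs. The
    exact feature set is an illustrative assumption."""
    dist = math.hypot(b[0] - a[0], b[1] - a[1])                     # Euclidean distance
    angle = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 360.0
    return dist, angle
```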
其他检测框可以是与检测框之间位置或距离存在特定关系的检测框,或者与检测框之间的位置距离满足预设条件的检测框,例如,其他检测框可以与检测框相邻(即,其他检测框可以是检测框的相邻检测框),其中,相邻可以是水平位置相邻和竖直位置相邻中一个或多个(关于如何确定相邻检测框见后文)。其他检测框也可以与检测框之间的距离满足预设要求(例如,小于第三预设阈值或大于第四预设阈值等),预设要求可以自定义。其他检测框还可以是其他情况,本实施例不做限制。可以理解的,通过距离确定与检测框相连的其他检测框时,确定的可以是相邻的检测框,也可以是非相邻的检测框,具体可以根据第三预设阈值大小决定。
在一些实施例中,确定待识别图像中多个检测框之后,可以分别对多个检测框在竖直方向和水平方向进行位置排序,从而确定检测框的相邻检测框。其中,水平方向的排序是基于同一行的(关于如何确定同一行,见后文)。例如,竖直方向可以是从上自下或从下自上,水平方向可以是从左至右或从右至左。在一些实施例中,可以标记每个检测框的排序结果,例如,x-y,x代表竖直方向的排序,y代表水平方向的排序。可以理解的,通过上述排序,可以表达不同行的检测框之间竖直位置关系,也可以表达同一行检测框之间的水平位置关系。如图5所示,第三行中有两个检测框,分别的排序结果为3-1和3-2,其他行都只有一个检测框,因此都是x-1。
进一步的,可以根据排序结果确定与检测框水平相邻或者竖直相邻的相邻检测框,即,排序结果临近的检测框就是相邻检测框,x临近代表水平相邻,y临近代表竖直相邻。例如,图5中检测框3-1和检测框3-2水平相邻,检测框2-1与检测框3-1竖直相邻,检测框2-1与检测框3-2竖直相邻。
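The row-then-column ordering described above might be sketched as follows; grouping rows by a y-center tolerance of half a frame height is an illustrative simplification of the same-row test detailed later in this specification.

```python
def assign_xy_labels(frames):
    """Group frames into rows top-to-bottom, then order each row
    left-to-right, yielding the "x-y" labels above (e.g. (3, 2) for
    "3-2"). frames are (center_x, center_y, width, height) tuples."""
    rows = []
    for f in sorted(frames, key=lambda f: f[1]):         # top-to-bottom by center y
        if rows and abs(rows[-1][-1][1] - f[1]) < f[3] / 2:
            rows[-1].append(f)                           # same row as previous frame
        else:
            rows.append([f])                             # start a new row
    labels = {}
    for x, row in enumerate(rows, start=1):
        for y, f in enumerate(sorted(row, key=lambda f: f[0]), start=1):
            labels[f] = (x, y)
    return labels
```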
在一些实施例中,除了上述排序方式,还可以通过其他方式确定相邻的检测框,本实施例不做限制。例如,通过距离大小是否小于某个阈值等。
在一些实施例中,除了上述相邻的位置关系以外,其他检测框还是可以是其他位置关系,本实施例不做限制。在一些实施例中,基于多个检测框构建版面图时,还可以将多个检测框中任意两个进行连接构成。
为了更清楚的示意构建版面图的过程，以下将结合图4、图5和图6对构建版面图的过程进行说明。仅作为示例，版面图中的边连接的是检测框和其他检测框，其中，其他检测框为与检测框相邻的检测框。如图4所示，在构建版面图430时，版面图包含6个节点，每个节点对应一个检测框（即检测框1-1’至6-1’中的一个），版面图430中的边连接的检测框包括：竖直相邻的检测框1-1’和2-1’、检测框2-1’和3-1’、检测框3-1’和4-1’、检测框4-1’和5-1’、和检测框5-1’和6-1’。如图5所示，在构建版面图530时，版面图包含12个节点，每个节点对应一个检测框（即，检测框1-1至11-1中的一个），版面图530中的边连接的检测框包括：竖直相邻的检测框1-1和2-1、检测框2-1和3-1、检测框2-1和3-2、检测框3-1和4-1、检测框3-2和4-1、检测框4-1和5-1等，水平相邻的检测框3-1和3-2。
仅作为示例,版面图中的边连接的是检测框和其他检测框,其中,其他检测框可以为与检测框之间的距离小于第三预设阈值的检测框。如图6所示,若第三预设阈值大于检测框4-1’与5-1’之间的距离,小于检测框4-1’与6-1’之间的距离,则版面图630中的边连接的是检测框包括:检测框1-1’与2-1’、1-1’与3-1’、1-1’与4-1’、2-1’与3-1’、2-1’与4-1’、3-1’与4-1’、4-1’与5-1’、5-1’与6-1’。
步骤340,利用训练好的图神经网络模型对所述版面图进行处理,确定所述版面图中所述检测框的字段类别,基于所述字段类别识别证件。在一些实施例中,该步骤340可以由分类模块240执行。
在一些实施例中，图神经网络模型可以是预先训练好的机器学习模型。训练好的图神经网络模型可以对版面图进行处理，确定版面图中检测框的字段类别。在一些实施例中，不同类型的证件对应不同的训练好的图神经网络模型，即，证件存在对应的图神经网络模型，该对应的图神经网络模型是基于该证件构建的训练集训练得到。关于图神经网络模型的训练参见图9及其相关描述。
在一些实施例中,字段类别可以是指检测框内的文本所属的类别。例如,待识别图像为驾驶证图像,则字段类别可以是准驾车型、档案号、签发地、姓名、日期以及其他等。
在一些实施例中，训练好的图神经网络模型可以对版面图进行处理，确定版面图中检测框属于各个预定的字段类型的概率，其中，预定的字段类别由训练图神经网络模型的样本的标签确定。检测框属于各个预定的字段类型的概率可以通过概率分布表示。例如，概率分布可以是1*n的实数向量，其中，n是向量的维数，n可以是1、2、3等。示例地，仍以上述待识别图像为驾驶证为例，则检测框的概率分布可以是1*6的实数向量。例如，概率分布的形式可以为(a,b,c,d,e,f)，其中，a表示检测框的字段类别为准驾车型的概率，b表示检测框的字段类别为档案号的概率，c表示检测框的字段类别为签发地的概率，d表示检测框的字段类别为姓名的概率，e表示检测框的字段类别为日期的概率，f表示检测框的字段类别为其他的概率。
在一些实施例中,分类模块240可以基于概率分布,确定检测框的字段类别。例如,将概率分布中最大概率值对应的字段类别,确定为检测框的字段类别。
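For the driver's-license example above, picking the field category with the largest probability could look like this minimal sketch; the English class names are illustrative stand-ins for 准驾车型, 档案号, 签发地, 姓名, 日期 and 其他.

```python
FIELD_CLASSES = ["license_class", "file_no", "place_of_issue", "name", "date", "other"]

def field_of(prob_dist):
    """Pick the field category with the highest probability."""
    best = max(range(len(prob_dist)), key=lambda i: prob_dist[i])
    return FIELD_CLASSES[best]

print(field_of((0.02, 0.05, 0.03, 0.85, 0.03, 0.02)))   # -> name
```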
在一些实施例中,可以基于字段类别识别证件。具体的,基于检测框的字段类别,确定与预设业务相关的内容框;基于识别算法对内容框中的文本进行识别,确定内容框中的文本内容。在一些实施例中,识别算法可以包括任何文本识别算法,例如,OCR识别。
预设业务可以自定义,例如,网约车业务,又例如,网约车业务中的司机认证等。在一些实施例中,内容框可以是指与预设业务相关的字段类型对应的检测框。在一些实施例中,内容框可以是为了实现预设业务所需要的字段类型对应的检测框。例如,预设业务是网约车司机认证,待识别图像对应的证件为身份证,则与预设业务相关的字段类型包括身份证号码、姓名、年龄、性别、户籍地址等,相应的,这些字段类型对应的检测框为内容框。可以理解的,可以基于内容框中的文本内容,实现预设业务。例如,预设业务为司机认证,通过对比身份证中内容框的文本是否与司机填写的内容一致,可以确定司机是否通过认证。又例如,预设业务为支付账户注册,可以将银行卡中卡号对应的检测框(即,内容框)的文本添加到支付账户的银行卡信息中。其中,与预设业务相关的内容框中的文本内容可以称为关键字段,从而,可以通过确定检测框获取关键字段,并对关键字段进行识别,提取证件中的关键信息,关键信息即为关键字段的文本内容信息。可以理解的,通过上述实施例,基于检测框的字段类别,可以筛除与预设业务无关的检测框,为证件识别提高效率。
在一些实施例中,图神经网络模型可以包括多层图神经网络。多层图神经网络训练或实际应用过程中,每一层每个节点从与之连接(例如,相邻)的节点接收信息,并进行节点之间的信息融合,经过多层图神经网络之后,每一层中的节点可以与更远的节点(例如,与之不连接或相邻的节点)进行信息融合,提高了分类准确性。
根据以上描述可知，一方面，本说明书实施例利用图神经网络模型对证件的版面图进行处理，利用深度学习方法来解决证件的版面分析问题，无需指定复杂的匹配规则即可确定与预设业务相关的内容框；另一方面，本说明书实施例利用图神经网络模型对版面进行分析，可以充分利用版面图中检测框与其周围检测框的信息，即使证件的版面发生了变化，例如某个文本信息的字段由一行变成了两行，由于其周围检测框的信息未发生变化，也能得到该文本信息的正确字段类别，分类准确率高；再一方面，图神经网络模型仍然能够挖掘残缺证件或未矫正为水平状态的证件中检测框与周围检测框的信息，因此，本说明书实施例中的图神经网络模型能够对抗证件残缺或证件未校正为水平状态等的干扰，得到正确的分类结果；最后一方面，本说明书实施例仅对内容框中的文本进行识别，识别效率高。
图7是根据本说明书的一些实施例所示的确定多个检测框的方法的流程图。如图7所示,该流程700可以包括以下步骤:
步骤710,获取所述证件的类型。在一些实施例中,该步骤710可以由检测模块220执行。
在一些实施例中，证件的类型可以是待识别图像对应的证件的类型。在一些实施例中，证件的类型可以反映证件的用途或/和语言信息。例如，证件的类型可以反映身份证、驾驶证或者行驶证等用途信息，还可以反映中文或英文等语言信息。在一些实施例中，检测模块220可以从用户终端130中获取证件的类型。例如，用户在用户终端130上传某个证件的图片，通过用户自己填写、选择，或者用户终端自动识别确定证件的类型。
步骤720,基于文本检测算法对所述待识别图像进行处理,确定多个文本框。在一些实施例中,该步骤720可以由检测模块220执行。
在一些实施例中,文本检测算法可以是用于检测证件中文本的算法。在一些实施例中,检测算法可以采用任意的文本检测算法,包括但不限于:PSENet(Progressive Scale Expansion Network)渐进尺度扩展网络、PANNet(Pixel Aggregation Network)像素聚合网络、以及DBNet(Differentiable Binarization Network)可微分的二值化网络等。
在一些实施例中，文本框可以是基于文本检测算法对待识别图像进行处理，自动生成的边界框。文本框是以特定内容为单位的边界框，其中，特定内容可以是单词、一行文字或单个字等。在一些实施例中，文本检测算法可以基于待识别图像中文本的类型，生成不同的文本框。例如，当证件中包含英文文本，则文本检测算法可以对证件的英文文本以单词为单位逐行分别进行框定，生成多个文本框，可以理解的，该实施例确定的文本框中的文本为单个英文单词。又例如，当待识别图像中包含中文，则文本检测算法可以对证件的中文文本以行为单位进行框定，生成多个文本框，可以理解的，该实施例确定的文本框中的文本为一行中文文本。又例如，当待识别图像中包含中文，则文本检测算法可以对证件的中文文本以单个字为单位进行框定，生成多个文本框，可以理解的，该实施例确定的文本框中的文本为一个字。
步骤730,当所述类型属于预设类型,基于所述预设类型对应的预设规则对所述多个文本框进行处理,确定所述多个检测框。在一些实施例中,该步骤730可以由检测模块220执行。
在一些实施例中,预设类型可以根据实际情况进行具体设置。例如,预设类型可以是中国驾驶证或中国身份证等。
预设规则是指对文本框处理的规则。在一些实施例中,预设规则代表可以进行文本框合并的情况,以及合并的方式。在一些实施例中,预设规则还可以是其他处理规则,本实施例不做限制。例如,文本框分割的规则等。不同的证件,预设规则也不完全相同,因此,不同的预设类型,存在对应的预设规则。
如图3及其相关描述可知,检测框可以是文本框。在一些实施例中,检测框还可以是对文本框进行处理后得到的框。
通过文本检测算法直接确定的文本框数量一般较大,一行中可能存在多个文本框。例如,当待识别图像为英文证件图像时,文本检测算法确定的文本框以单词为单位,证件中英文单词越多,文本框也越多。
某些证件的某一行的文本通常为同一种字段类型。在一些实施例中，检测模块220可以基于预设类型对应的预设规则对多个文本框进行处理，确定多个检测框。在一些实施例中，预设类型对应的证件存在至少一个合并参考行，合并参考行中的字段类型相同，则预设规则可以包括：将证件中待合并行的待合并文本框进行合并，其中，待合并行是指证件中与合并参考行对应的行，待合并文本框是指位于同一行的文本框。相应的，当证件类型属于预设类型时，多个检测框的确定具体为：确定证件中位于同一行的待合并文本框（关于如何确定同一行的待合并文本框见后文）；确定证件的至少一个待合并行；将待合并行的待合并文本框进行合并。确定待合并行可以是：对证件中的文本框进行排序（文本框的排序与检测框的排序类似，具体见前文），基于排序结果确定是否与合并参考行对应，如，若合并参考行在第三行，则待合并行也为第三行。以图4为例，若预设类型对应的证件中第二行为同一字段类型，基于文本检测算法，确定了证件的第二行有2个文本框（如410所示），则将该2个文本框进行合并，得到一个检测框（如420所示）。
对于一些证件而言,证件的某些行中可能含有不止一种字段类型,例如年龄和性别在同一行,对于此类证件而言,特定行的多个文本框并不会发生合并。如图5所示,此证件第3行存在两个字段类型,在对510中的文本框进行合并处理时,第三行并未发生合并(如520所示),即,第三行的检测框为文本框。
某些特定证件的某些紧挨着的特定的多行，通常为同一字段类型。可以理解的，预设规则可以基于此类特点进行指定，对文本框进行合并。
在一些实施例中,确定证件中位于同一行的待合并文本框包括:判断文本框与其他文本框在竖直方向上对应的坐标值的重合度;响应于重合度大于第一预设阈值,将文本框和其他文本框确定为位于同一行的待合并文本框。
在一些实施例中,在同一个坐标轴下,确定文本框对应的y轴坐标值范围,以及其他文本框对应的y轴坐标值范围,进一步的,基于两个文本框的坐标值范围,确定文本框和其他文本框在y轴的坐标值的重合度。具体的,重合度为两个文本框在y轴的坐标值的重合范围占两个文本框在y轴所占的整个坐标值的范围。如图10所示,文本框1010的y轴坐标值范围为(y4,y3),其他文本框1020的y轴坐标值范围为(y2,y1),两个文本框的坐标值的重合范围为(y4,y1),两个文本框的整个坐标值的范围为(y2,y3),因此,两个文本框的重合度=(y4-y1)/(y2-y3)。
在一些实施例中,第一预设阈值可以根据实际需求进行具体设置。例如,80%、95%等。在一些实施例中,检测模块220可以响应于重合度大于第一预设阈值,将文本框和其他文本框确定为与文本框位于同一行的待合并文本框。
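A compact sketch of the y-overlap test, following the 重合度 = (y4 - y1)/(y2 - y3) example above but written in an orientation-independent form (both differences flip sign together, so the generic min/max version agrees); the 0.8 threshold is an illustrative value for the first preset threshold.

```python
def y_overlap_ratio(a, b):
    """Ratio of the overlap of two boxes' y-ranges to their combined
    y-range; a and b are (y_low, y_high) pairs."""
    inter = min(a[1], b[1]) - max(a[0], b[0])   # overlapping span
    union = max(a[1], b[1]) - min(a[0], b[0])   # combined span
    return max(inter, 0.0) / union

# Two boxes overlapping heavily in y are treated as the same row.
same_row = y_overlap_ratio((10.0, 30.0), (12.0, 31.0)) > 0.8   # first preset threshold
```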
根据以上描述可知,本说明书实施例可以预先将文本检测算法获取的多个文本框进行合并处理,确定多个检测框,再基于该多个检测框构建版面图。减少了版面图中的节点的数量,简化了构建的版面图的结果,提高了图神经网络模型对版面图的处理效率。
图8是根据本说明书的一些实施例所示的确定多个检测框的方法的另一流程图。如图8所示,该流程800可以包括以下步骤:
步骤810,基于文本检测算法对所述待识别图像进行处理,确定多个文本框。在一些实施例中,该步骤810可以由检测模块220执行。
关于步骤810的具体细节可以参见上述步骤720,在此不再赘述。
步骤820，判断所述文本框和其他文本框之间距离是否小于第二预设阈值，以及所述文本框中内容和所述其他文本框中内容的字号是否相同。在一些实施例中，该步骤820可以由检测模块220执行。
在一些实施例中,文本框和其他文本框之间的距离可以参见图3及其相关描述。
在一些实施例中,检测模块220可以基于文本框的大小(例如,文本框的高)确定字号大小。例如,基于预设规则,确定文本框大小对应的字号。
在一些实施例中,第二预设阈值可以根据实际需求具体设置。例如,3或5等。
步骤830，响应于所述文本框和所述其他文本框之间所述距离小于所述第二预设阈值，以及所述文本框中内容和所述其他文本框中内容的字号相同，合并所述文本框和所述其他文本框，确定所述检测框。在一些实施例中，该步骤830可以由检测模块220执行。
关于第二预设阈值和字号的具体细节可以参见上述步骤820,在此不再赘述。
通过该实施例可以不考虑证件类型,只要基于证件图像的文本框之间的距离满足要求,字号相同,即可进行合并。
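A minimal sketch of this type-agnostic merging test; approximating font size by box height follows the height-based font-size estimate mentioned earlier, and the tolerance parameter is an assumption added for robustness with floating-point geometry.

```python
def should_merge(a, b, dist_thresh, height_tol=1.0):
    """Merge two text boxes when their horizontal gap is below the
    second preset threshold and their font sizes (approximated by box
    height, within height_tol) match. a, b are
    (center_x, center_y, width, height) tuples."""
    gap = abs(a[0] - b[0]) - (a[2] + b[2]) / 2   # edge-to-edge horizontal gap
    return gap < dist_thresh and abs(a[3] - b[3]) <= height_tol
```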
图9是根据本说明书的一些实施例所示的训练图神经网络模型的方法的流程图。其中,对于图神经网络模型的样本训练集中的样本版面图的至少一个样本节点中的每个,均采用监督学习的方法进行训练。在一些实施例中,不同类型的证件可以训练对应的图神经网络模型,训练好的图神经网络模型可以用于确定对应的证件的字段的类型。例如,基于身份证的图像构建对应的训练集,并基于该对应的训练集训练用于识别身份证中字段类型的图神经网络模型。
具体的,如图9所示,该训练流程900可以包括以下步骤:
步骤910,获取样本训练集,所述样本训练集包括:基于所述证件的多个样本图像建立的多个样本版面图,和所述样本版面图的至少一个样本节点对应的标签。在一些实施例中,该步骤910可以由处理设备110执行。
在一些实施例中,样本训练集可以是输入至初始图神经网络模型中用于训练图神经网络模型的数据。
样本图像是指基于证件得到的、用于建立训练用的样本版面图的图像。其中,样本图像可以是证件的完整图像、证件的非完整图像和证件的不同排版的图像。
证件的完整图像是指包含证件所有内容的图像。
证件的非完整图像是指包含证件中部分内容的图像。例如,证件的非完整图像可以是对证件的完整图像进行裁剪得到。又例如,证件的非完整图像可以是遮挡证件中部分内容成像得到等,关于证件的非完整图像的获取方式不做限制。
示例地,仍以图5示例的样本证件为驾驶证为例,则证件的完整图像可以为图5所示的包含所有信息的证件图像,证件的非完整图像可以是对该驾驶证图像进行裁剪获得的证件图像,例如,裁剪获得的仅包含检测框1-1至7-1内信息的证件图像。
证件不同排版的图像是指基于该证件的不同排版得到的图像,不同排版是指该证件可能的所有类型的排版,例如,身份证中地址信息,其在排版上可能是两行、三行等多种情况,则可能不同人的身份证排版上存在差异(即,地址所占行数不同)。
样本版面图是用于训练的版面图，其是基于样本图像得到的。在一些实施例中，样本版面图的样本节点对应样本图像的样本检测框，样本版面图中样本边对应样本检测框与其它样本检测框之间的空间位置关系，样本节点对应的标签表征样本节点对应的样本检测框中字段的类别。在一些实施例中，为了保证训练好的模型的预测准确率，训练好的模型的应用和训练过程中确定检测框的方式可以一致。
关于基于样本图像确定样本检测框,与图3中基于待识别图像确定检测框类似;关于确定样本检测框和其他样本检测框的空间位置关系,与图3中确定检测框和其他检测框的空间位置关系类似,在此不再赘述。
可以理解的，基于证件的完整图像、非完整图像和不同排版的图像建立样本版面图，并训练初始图神经网络，可以使初始图神经网络模型学习证件在不同情况下（例如，只有证件的一部分内容等）各个字段类型对应检测框之间的关系，从而，训练好的图神经网络模型可以用于确定不同情况下证件中字段的类型。
如前步骤330及其相关描述可知,样本版面图中的其他样本检测框可以是与样本检测框存在特定位置关系的框,其中的特定位置关系并不止包括一种关系。例如,其他样本检测框可以与样本检测框相邻,其他样本检测框也可以与样本检测框之间的距离满足预设要求。因此,对于样本检测框而言,随着其与其他样本检测框存在的位置关系发生变化,与之连接的其他样本检测框也会发生变化,样本检测框与其他样本检测框的排列布局不相同,进而构建出不同的版面图。
考虑到某些证件同一行的字段中会出现多种文本类型，不同的文本类型代表的字段类型可能不同。在一些实施例中，基于文本检测算法确定样本图像的样本文本框之后，基于样本文本框确定样本检测框时，合并的规则可以是位于同一行、且文字类型相同。其中，文字类型是否相同可以人工判定。
步骤920,基于所述样本训练集,训练得到所述训练好的图神经网络模型。在一些实施例中,该步骤920可以由处理设备110执行。
在一些实施例中,可以基于样本训练集,训练得到训练好的图神经网络模型。在一些实施例中,可以基于样本训练集迭代更新初始图神经网络的参数以减小各样本版面图的样本节点对应的损失函数值,得到训练好的图神经网络模型。具体的,可以迭代更新初始图神经网络模型的参数,以减小各样本版面图的样本节点对应的损失函数值,使得损失函数值满足预设条件。例如,损失函数值收敛,或损失函数值小于预设值。当损失函数满足预设条件时,模型训练完成,得到训练好的图神经网络模型。
在一些实施例中,训练的损失函数可以基于样本节点对应的标签和样本节点输出的预测值之间的差异建立。其中,样本节点输出的预测值可以是初始图神经网络模型对样本版面图进行处理后,得到的样本节点的预测值。例如,可以是所有样本节点对应的损失函数之和,每一个样本节点的损失函数是基于该节点输出的预测值和标签的差异建立。可以理解的,通过监督学习的方式对版面图中每个节点进行训练,即每个节点存在对应的损失函数,通过所有节点的损失函数对图神经网络模型的参数进行更新,完成训练。建立损失函数的方式可以是交叉熵或平方差等。
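In symbols, with V the sample nodes of one sample layout, y_v the one-hot label of node v and \hat{y}_v its predicted class distribution over C field categories, the summed per-node cross-entropy loss reads (a sketch; the text equally allows a squared-error form):

```latex
L(\theta) = \sum_{v \in V} \mathrm{CE}\!\left(y_v, \hat{y}_v\right)
          = -\sum_{v \in V} \sum_{c=1}^{C} y_{v,c} \log \hat{y}_{v,c}
```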
本说明书实施例还提供一种基于图神经网络识别证件的装置,所述装置包括处理器以及存储器,所述存储器用于存储指令,所述处理器用于执行所述指令,以实现如前任一项所述的基于图神经网络识别证件的方法对应的操作。
本说明书实施例还提供一种计算机可读存储介质。所述存储介质存储计算机指令,所述计算机指令被处理器执行时,实现如前任一项所述的基于图神经网络识别证件的方法对应的操作。
本说明书实施例可能带来的有益效果包括但不限于:(1)本说明书实施例利用图神经网络模型对证件的版面图进行处理,利用深度学习方法来解决证件的版面分析问题,无需指定复杂的匹配规则即可确定与预设业务相关的内容框;(2)本说明书实施例中的图神经网络模型能够容忍证件存在较大的版面变化、以及能够对抗证件残缺或证件未校正为水平状态等的干扰,得到正确的分析结果,分析准确率高;(3)本说明书实施例通过确定证件中的字段的类型,可以提高证件后续识别的效率,具体的,可以基于确定的字段类型,选择与预设业务相关的字段,进一步仅识别与预设业务相关的字段的具体内容,避免对所有字段的具体内容都进行识别。需要说明的是,不同实施例可能产生的有益效果不同,在不同的实施例里,可能产生的有益效果可以是以上任意一种或几种的组合,也可以是其他任何可能获得的有益效果。
上文已对基本概念做了描述，显然，对于本领域技术人员来说，上述详细披露仅仅作为示例，而并不构成对本说明书的限定。虽然此处并没有明确说明，本领域技术人员可能会对本说明书进行各种修改、改进和修正。该类修改、改进和修正在本说明书中被建议，所以该类修改、改进、修正仍属于本说明书示范实施例的精神和范围。
同时,本说明书使用了特定词语来描述本说明书的实施例。如“一个实施例”、“一实施例”、和/或“一些实施例”意指与本说明书至少一个实施例相关的某一特征、结构或特点。因此,应强调并注意的是,本说明书中在不同位置两次或多次提及的“一实施例”或“一个实施例”或“一个替代性实施例”并不一定是指同一实施例。此外,本说明书的一个或多个实施例中的某些特征、结构或特点可以进行适当的组合。
此外,本领域技术人员可以理解,本说明书的各方面可以通过若干具有可专利性的种类或情况进行说明和描述,包括任何新的和有用的工序、机器、产品或物质的组合,或对他们的任何新的和有用的改进。相应地,本说明书的各个方面可以完全由硬件执行、可以完全由软件(包括固件、常驻软件、微码等)执行、也可以由硬件和软件组合执行。以上硬件或软件均可被称为“数据块”、“模块”、“引擎”、“单元”、“组件”或“系统”。此外,本说明书的各方面可能表现为位于一个或多个计算机可读介质中的计算机产品,该产品包括计算机可读程序编码。
计算机存储介质可能包含一个内含有计算机程序编码的传播数据信号,例如在基带上或作为载波的一部分。该传播信号可能有多种表现形式,包括电磁形式、光形式等,或合适的组合形式。计算机存储介质可以是除计算机可读存储介质之外的任何计算机可读介质,该介质可以通过连接至一个指令执行系统、装置或设备以实现通讯、传播或传输供使用的程序。位于计算机存储介质上的程序编码可以通过任何合适的介质进行传播,包括无线电、电缆、光纤电缆、RF、或类似介质,或任何上述介质的组合。
本说明书各部分操作所需的计算机程序编码可以用任意一种或多种程序语言编写,包括面向对象编程语言如Java、Scala、Smalltalk、Eiffel、JADE、Emerald、C++、C#、VB.NET、Python等,常规程序化编程语言如C语言、Visual Basic、Fortran2003、Perl、COBOL2002、PHP、ABAP,动态编程语言如Python、Ruby和Groovy,或其他编程语言等。该程序编码可以完全在用户计算机上运行、或作为独立的软件包在用户计算机上运行、或部分在用户计算机上运行部分在远程计算机运行、或完全在远程计算机或处理设备上运行。在后种情况下,远程计算机可以通过任何网络形式与用户计算机连接,比如局域网(LAN)或广域网(WAN),或连接至外部计算机(例如通过因特网),或在云计算环境中,或作为服务使用如软件即服务(SaaS)。
此外,除非权利要求中明确说明,本说明书所述处理元素和序列的顺序、数字字母的使用、或其他名称的使用,并非用于限定本说明书流程和方法的顺序。尽管上述披露中通过各种示例讨论了一些目前认为有用的发明实施例,但应当理解的是,该类细节仅起到说明的目的,附加的权利要求并不仅限于披露的实施例,相反,权利要求旨在覆盖所有符合本说明书实施例实质和范围的修正和等价组合。例如,虽然以上所描述的系统组件可以通过硬件设备实现,但是也可以只通过软件的解决方案得以实现,如在现有的处理设备或移动设备上安装所描述的系统。
同理,应当注意的是,为了简化本说明书披露的表述,从而帮助对一个或多个发明实施例的理解,前文对本说明书实施例的描述中,有时会将多种特征归并至一个实施例、附图或对其的描述中。但是,这种披露方法并不意味着本说明书对象所需要的特征比权利要求中提及的特征多。实际上,实施例的特征要少于上述披露的单个实施例的全部特征。
一些实施例中使用了描述成分、属性数量的数字,应当理解的是,此类用于实施例描述的数字,在一些示例中使用了修饰词“大约”、“近似”或“大体上”来修饰。除非另外说明,“大约”、“近似”或“大体上”表明所述数字允许有±20%的变化。相应地,在一些实施例中,说明书和权利要求中使用的数值参数均为近似值,该近似值根据个别实施例所需特点可以发生改变。在一些实施例中,数值参数应考虑规定的有效数位并采用一般位数保留的方法。尽管本说明书一些实施例中用于确认其范围广度的数值域和参数为近似值,在具体实施例中,此类数值的设定在可行范围内尽可能精确。
针对本说明书引用的每个专利、专利申请、专利申请公开物和其他材料,如文章、书籍、说明书、出版物、文档等,特此将其全部内容并入本说明书作为参考。与本说明书内容不一致或产生冲突的申请历史文件除外,对本说明书权利要求最广范围有限制的文件(当前或之后附加于本说明书中的)也除外。需要说明的是,如果本说明书附属材料中的描述、定义、和/或术语的使用与本说明书所述内容有不一致或冲突的地方,以本说明书的描述、定义和/或术语的使用为准。
最后,应当理解的是,本说明书中所述实施例仅用以说明本说明书实施例的原则。其他的变形也可能属于本说明书的范围。因此,作为示例而非限制,本说明书实施例的替代配置可视为与本说明书的教导一致。相应地,本说明书的实施例不仅限于本说明书明确介绍和描述的实施例。

Claims (24)

  1. 一种基于图神经网络识别证件的方法,包括:
    获取待识别图像;
    检测所述待识别图像中包含的内容文本,确定多个检测框;
    基于所述多个检测框构建版面图;其中,所述版面图包括多个节点和多个边,所述节点对应所述检测框,所述边对应所述检测框与其它检测框之间的空间位置关系;以及
    利用训练好的图神经网络模型对所述版面图进行处理,确定所述版面图中所述检测框的字段类别,基于所述字段类别识别证件。
  2. 如权利要求1所述的方法,所述检测所述待识别图像中包含的内容文本,确定多个检测框,包括:
    获取所述证件的类型;
    基于文本检测算法对所述待识别图像进行处理,确定多个文本框;以及
    当所述类型属于预设类型,基于所述预设类型对应的预设规则对所述多个文本框进行处理,确定所述多个检测框。
  3. 如权利要求2所述的方法,所述预设类型对应的证件存在至少一个合并参考行,所述合并参考行中的字段类型相同,
    所述当所述类型属于预设类型,基于所述预设类型对应的预设规则对所述多个文本框进行处理,确定所述多个检测框,包括:
    确定所述证件中位于同一行的待合并文本框;
    确定所述证件的至少一个待合并行,所述待合并行与所述合并参考行对应;以及
    将所述待合并行的待合并文本框进行合并,确定所述检测框。
  4. 如权利要求3所述的方法,所述确定所述证件中位于同一行的待合并文本框,包括:
    判断所述文本框与其他文本框在竖直方向上对应的坐标值的重合度;以及
    响应于所述重合度大于第一预设阈值,将所述文本框和所述其他文本框确定为所述位于同一行的待合并文本框。
  5. 如权利要求1所述的方法,所述检测所述待识别图像中包含的内容文本,确定多个检测框,包括:
    基于文本检测算法对所述待识别图像进行处理,确定多个文本框;
    判断所述文本框和其他文本框之间距离是否小于第二预设阈值,以及所述文本框中内容和所述其他文本框中内容的字号是否相同;以及
    响应于所述文本框和所述其他文本框之间所述距离小于所述第二预设阈值,以及所述文本框中内容和所述其他文本框中内容的字号相同,合并所述文本框和所述其他文本框,确定所述检测框。
  6. 如权利要求1所述的方法,所述节点的特征反映以下信息中的一种或多种:
    所述检测框的位置、大小、形状和相关的图像信息,所述相关的图像信息是基于所述检测框确定的区域图像的相关信息。
  7. 如权利要求1所述的方法,所述边的特征反映以下信息中的一种或多种:
    所述检测框与所述其它检测框之间的距离信息和相对位置信息。
  8. 如权利要求1所述的方法,所述基于所述多个检测框构建版面图,包括:
    从所述多个检测框中,确定与所述检测框水平相邻或/和竖直相邻的至少一个其他检测框;以及
    将所述多个检测框中每一个及其对应的至少一个其他检测框进行连接,构成所述版面图。
  9. 如权利要求1所述的方法,所述基于所述多个检测框构建版面图,包括:
    从所述多个检测框中,确定与所述检测框之间的距离满足预设要求的至少一个其他检测框;以及
    将所述多个检测框中每一个及其对应的至少一个其他检测框进行连接,构成所述版面图。
  10. 如权利要求1所述的方法,所述基于所述字段类别识别证件,包括:
    基于所述检测框的字段类别,确定与预设业务相关的内容框;以及
    基于识别算法对所述内容框中的文本进行识别,确定所述内容框中的文本内容。
  11. 如权利要求1所述的方法,训练所述训练好的图神经网络模型包括:
    获取样本训练集,所述样本训练集包括:基于所述证件的多个样本图像建立的多个样本版面图,和所述样本版面图的至少一个样本节点对应的标签;其中,
    所述样本图像为所述证件的完整图像、所述证件的非完整图像和所述证件的不同排版的图像;
    所述样本版面图的样本节点对应所述样本图像的样本检测框，所述样本版面图中样本边对应所述样本检测框与其它样本检测框之间的空间位置关系，所述样本节点对应的标签表征所述样本节点对应的样本检测框中字段的类别；以及
    基于所述样本训练集,训练得到所述训练好的图神经网络模型;其中,
    训练的损失函数基于所述样本节点对应的标签和所述样本节点输出的预测值之间的差异建立。
  12. 一种基于图神经网络识别证件的系统,包括:
    获取模块,用于获取待识别图像;
    检测模块,用于检测所述待识别图像中包含的内容文本,确定多个检测框;
    构建模块,用于基于所述多个检测框构建版面图;其中,所述版面图包括多个节点和多个边,所述节点对应所述检测框,所述边对应所述检测框与其它检测框之间的空间位置关系;以及
    分类模块,用于利用训练好的图神经网络模型对所述版面图进行处理,确定所述版面图中所述检测框的字段类别,基于所述字段类别识别证件。
  13. 如权利要求12所述的系统,所述检测模块用于:
    获取所述证件的类型;
    基于文本检测算法对所述待识别图像进行处理,确定多个文本框;以及
    当所述类型属于预设类型,基于所述预设类型对应的预设规则对所述多个文本框进行处理,确定所述多个检测框。
  14. 如权利要求13所述的系统，所述预设类型对应的证件存在至少一个合并参考行，所述合并参考行中的字段类型相同，所述检测模块用于：
    确定所述证件中位于同一行的待合并文本框;
    确定所述证件的至少一个待合并行,所述待合并行与所述合并参考行对应;以及
    将所述待合并行的待合并文本框进行合并,确定所述检测框。
  15. 如权利要求14所述的系统,所述检测模块用于:
    判断所述文本框与其他文本框在竖直方向上对应的坐标值的重合度;以及
    响应于所述重合度大于第一预设阈值，将所述文本框和所述其他文本框确定为所述位于同一行的待合并文本框。
  16. 如权利要求12所述的系统,所述检测模块用于:
    基于文本检测算法对所述待识别图像进行处理,确定多个文本框;
    判断所述文本框和其他文本框之间距离是否小于第二预设阈值,以及所述文本框中内容和所述其他文本框中内容的字号是否相同;以及
    响应于所述文本框和所述其他文本框之间所述距离小于所述第二预设阈值,以及所述文本框中内容和所述其他文本框中内容的字号相同,合并所述文本框和所述其他文本框,确定所述检测框。
  17. 如权利要求12所述的系统,所述节点的特征反映以下信息中的一种或多种:
    所述检测框的位置、大小、形状和相关的图像信息,所述相关的图像信息是基于所述检测框确定的区域图像的相关信息。
  18. 如权利要求12所述的系统,所述边的特征反映以下信息中的一种或多种:
    所述检测框与所述其它检测框之间的距离信息和相对位置信息。
  19. 如权利要求12所述的系统,所述构建模块用于:
    从所述多个检测框中,确定与所述检测框水平相邻或/和竖直相邻的至少一个其他检测框;以及
    将所述多个检测框中每一个及其对应的至少一个其他检测框进行连接,构成所述版面图。
  20. 如权利要求12所述的系统,所述构建模块用于:
    从所述多个检测框中,确定与所述检测框之间的距离满足预设要求的至少一个其他检测框;以及
    将所述多个检测框中每一个及其对应的至少一个其他检测框进行连接,构成所述版面图。
  21. 如权利要求12所述的系统,所述系统还包括识别模块,用于:
    基于所述检测框的字段类别,确定与预设业务相关的内容框;以及
    基于识别算法对所述内容框中的文本进行识别,确定所述内容框中的文本内容。
  22. 如权利要求12所述的系统,所述图神经网络模型通过如下方法训练得到:
    获取样本训练集,所述样本训练集包括:基于所述证件的多个样本图像建立的多个样本版面图,和所述样本版面图的至少一个样本节点对应的标签,其中,
    所述样本图像为所述证件的完整图像、所述证件的非完整图像或所述证件的不同排版的图像;
    所述样本版面图的样本节点对应所述样本图像的样本检测框，所述样本版面图中样本边对应所述样本检测框与其它样本检测框之间的空间位置关系，所述样本节点对应的标签表征所述样本节点对应的样本检测框中字段的类别；以及
    基于所述样本训练集,训练得到所述训练好的图神经网络模型;其中,
    训练的损失函数基于所述样本节点对应的标签和所述样本节点输出的预测值之间的差异建立。
  23. 一种基于图神经网络识别证件的装置,所述装置包括处理器以及存储器,所述存储器用于存储指令,其特征在于,所述处理器用于执行所述指令,以实现如权利要求1至11中任一项所述的基于图神经网络识别证件的方法对应的操作。
  24. 一种计算机可读存储介质,所述存储介质存储计算机指令,所述计算机指令被处理器执行时,实现如权利要求1至11中任一项所述的基于图神经网络识别证件的方法对应的操作。
PCT/CN2021/112926 2020-08-26 2021-08-17 一种基于图神经网络识别证件的方法及系统 WO2022042365A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010870570.1 2020-08-26
CN202010870570.1A CN112016438B (zh) 2020-08-26 2020-08-26 一种基于图神经网络识别证件的方法及系统

Publications (1)

Publication Number Publication Date
WO2022042365A1 true WO2022042365A1 (zh) 2022-03-03

Family

ID=73503363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112926 WO2022042365A1 (zh) 2020-08-26 2021-08-17 一种基于图神经网络识别证件的方法及系统

Country Status (2)

Country Link
CN (1) CN112016438B (zh)
WO (1) WO2022042365A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283431A (zh) * 2022-03-04 2022-04-05 南京安元科技有限公司 一种基于可微分二值化的文本检测方法
CN114821622A (zh) * 2022-03-10 2022-07-29 北京百度网讯科技有限公司 文本抽取方法、文本抽取模型训练方法、装置及设备
CN115937868A (zh) * 2022-12-12 2023-04-07 江苏中烟工业有限责任公司 烟包标签信息匹配方法、装置、电子设备及存储介质
CN116129456A (zh) * 2023-02-09 2023-05-16 广西壮族自治区自然资源遥感院 一种产权权属信息识别录入方法及系统
CN116229493A (zh) * 2022-12-14 2023-06-06 国家能源集团物资有限公司 跨模态的图片文本命名实体识别方法、系统及电子设备
CN116363667A (zh) * 2023-04-26 2023-06-30 公安部信息通信中心 一种聚合文件主题识别与归类系统

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016438B (zh) * 2020-08-26 2021-08-10 北京嘀嘀无限科技发展有限公司 一种基于图神经网络识别证件的方法及系统
CN112749694B (zh) * 2021-01-20 2024-05-21 中科云谷科技有限公司 用于识别图像方向、识别铭牌文字的方法及装置
CN113342997B (zh) * 2021-05-18 2022-11-11 成都快眼科技有限公司 一种基于文本行匹配的跨图文本阅读方法
CN113505716B (zh) * 2021-07-16 2022-07-01 重庆工商大学 静脉识别模型的训练方法、静脉图像的识别方法及装置
CN113610098B (zh) * 2021-08-19 2022-08-09 创优数字科技(广东)有限公司 纳税号识别方法、装置、存储介质及计算机设备
CN114283403B (zh) * 2021-12-24 2024-01-16 北京有竹居网络技术有限公司 一种图像检测方法、装置、存储介质及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101137011A (zh) * 2006-08-29 2008-03-05 索尼株式会社 图像处理装置、图像处理方法和计算机程序
CN110188827A (zh) * 2019-05-29 2019-08-30 创意信息技术股份有限公司 一种基于卷积神经网络和递归自动编码器模型的场景识别方法
US20200143201A1 (en) * 2018-02-26 2020-05-07 Capital One Services, Llc Dual stage neural network pipeline systems and methods
CN111191715A (zh) * 2019-12-27 2020-05-22 深圳市商汤科技有限公司 图像处理方法及装置、电子设备和存储介质
CN112016438A (zh) * 2020-08-26 2020-12-01 北京嘀嘀无限科技发展有限公司 一种基于图神经网络识别证件的方法及系统

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030104526A1 (en) * 1999-03-24 2003-06-05 Qiang Liu Position dependent recognition of GNN nucleotide triplets by zinc fingers
CA2983235A1 (en) * 2016-10-20 2018-04-20 Arya Ghadimi System and method for capturing and processing image and text information
CN108229299B (zh) * 2017-10-31 2021-02-26 北京市商汤科技开发有限公司 证件的识别方法和装置、电子设备、计算机存储介质
CN111902825A (zh) * 2018-03-23 2020-11-06 多伦多大学管理委员会 多边形对象标注系统和方法以及训练对象标注系统的方法
US11669914B2 (en) * 2018-05-06 2023-06-06 Strong Force TX Portfolio 2018, LLC Adaptive intelligence and shared infrastructure lending transaction enablement platform responsive to crowd sourced information
CN108694393A (zh) * 2018-05-30 2018-10-23 深圳市思迪信息技术股份有限公司 一种基于深度卷积的证件图像文本区域提取方法
CN110263227B (zh) * 2019-05-15 2023-07-18 创新先进技术有限公司 基于图神经网络的团伙发现方法和系统
CN110738203B (zh) * 2019-09-06 2024-04-05 中国平安财产保险股份有限公司 字段结构化输出方法、装置及计算机可读存储介质
CN110378328B (zh) * 2019-09-16 2019-12-13 图谱未来(南京)人工智能研究院有限公司 一种证件图像处理方法及装置
CN110647832A (zh) * 2019-09-16 2020-01-03 贝壳技术有限公司 获取证件中信息的方法和装置、电子设备和存储介质
CN110705260B (zh) * 2019-09-24 2023-04-18 北京工商大学 一种基于无监督图神经网络结构的文本向量生成方法
CN110674301A (zh) * 2019-09-30 2020-01-10 出门问问信息科技有限公司 一种情感倾向预测方法、装置、系统及存储介质
CN111353458B (zh) * 2020-03-10 2023-08-18 腾讯科技(深圳)有限公司 文本框标注方法、装置和存储介质
CN111340037B (zh) * 2020-03-25 2022-08-19 上海智臻智能网络科技股份有限公司 文本版面分析方法、装置、计算机设备和存储介质
CN111553363B (zh) * 2020-04-20 2023-08-04 北京易道博识科技有限公司 一种端到端的图章识别方法及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101137011A (zh) * 2006-08-29 2008-03-05 索尼株式会社 图像处理装置、图像处理方法和计算机程序
US20200143201A1 (en) * 2018-02-26 2020-05-07 Capital One Services, Llc Dual stage neural network pipeline systems and methods
CN110188827A (zh) * 2019-05-29 2019-08-30 创意信息技术股份有限公司 一种基于卷积神经网络和递归自动编码器模型的场景识别方法
CN111191715A (zh) * 2019-12-27 2020-05-22 深圳市商汤科技有限公司 图像处理方法及装置、电子设备和存储介质
CN112016438A (zh) * 2020-08-26 2020-12-01 北京嘀嘀无限科技发展有限公司 一种基于图神经网络识别证件的方法及系统

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283431A (zh) * 2022-03-04 2022-04-05 南京安元科技有限公司 一种基于可微分二值化的文本检测方法
CN114821622A (zh) * 2022-03-10 2022-07-29 北京百度网讯科技有限公司 文本抽取方法、文本抽取模型训练方法、装置及设备
CN115937868A (zh) * 2022-12-12 2023-04-07 江苏中烟工业有限责任公司 烟包标签信息匹配方法、装置、电子设备及存储介质
CN116229493A (zh) * 2022-12-14 2023-06-06 国家能源集团物资有限公司 跨模态的图片文本命名实体识别方法、系统及电子设备
CN116229493B (zh) * 2022-12-14 2024-02-09 国家能源集团物资有限公司 跨模态的图片文本命名实体识别方法、系统及电子设备
CN116129456A (zh) * 2023-02-09 2023-05-16 广西壮族自治区自然资源遥感院 一种产权权属信息识别录入方法及系统
CN116129456B (zh) * 2023-02-09 2023-07-25 广西壮族自治区自然资源遥感院 一种产权权属信息识别录入方法及系统
CN116363667A (zh) * 2023-04-26 2023-06-30 公安部信息通信中心 一种聚合文件主题识别与归类系统
CN116363667B (zh) * 2023-04-26 2023-10-13 公安部信息通信中心 一种聚合文件主题识别与归类系统

Also Published As

Publication number Publication date
CN112016438B (zh) 2021-08-10
CN112016438A (zh) 2020-12-01

Similar Documents

Publication Publication Date Title
WO2022042365A1 (zh) 一种基于图神经网络识别证件的方法及系统
US10726244B2 (en) Method and apparatus detecting a target
US20190278994A1 (en) Photograph driven vehicle identification engine
WO2018103608A1 (zh) 一种文字检测方法、装置及存储介质
US9275307B2 (en) Method and system for automatic selection of one or more image processing algorithm
WO2020082731A1 (zh) 电子装置、证件识别方法及存储介质
CN110781885A (zh) 基于图像处理的文本检测方法、装置、介质及电子设备
KR102435365B1 (ko) 증명서 인식 방법 및 장치, 전자 기기, 컴퓨터 판독 가능한 저장 매체
WO2018233055A1 (zh) 保单信息录入的方法、装置、计算机设备及存储介质
US9384398B2 (en) Method and apparatus for roof type classification and reconstruction based on two dimensional aerial images
US11367310B2 (en) Method and apparatus for identity verification, electronic device, computer program, and storage medium
TW202111498A (zh) 指紋識別方法、晶片及電子裝置
CN111209827B (zh) 一种基于特征检测的ocr识别票据问题的方法及系统
US9418446B2 (en) Method and apparatus for determining a building location based on a building image
WO2017161636A1 (zh) 一种基于指纹的终端支付方法及装置
CN112396050B (zh) 图像的处理方法、设备以及存储介质
CN110795714A (zh) 一种身份验证方法、装置、计算机设备及存储介质
US9679218B2 (en) Method and apparatus for image matching
CN113627428A (zh) 文档图像矫正方法、装置、存储介质及智能终端设备
CN111160395A (zh) 图像识别方法、装置、电子设备和存储介质
CN114155365A (zh) 模型训练方法、图像处理方法及相关装置
CN113673413A (zh) 建筑图纸的审图方法、装置、计算机可读介质及电子设备
CN110287361B (zh) 一种人物图片筛选方法及装置
CN114663871A (zh) 图像识别方法、训练方法、装置、系统及存储介质
US20230186668A1 (en) Polar relative distance transformer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860203

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.08.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21860203

Country of ref document: EP

Kind code of ref document: A1