WO2022057471A1 - Bill recognition method, system, computer device and computer-readable storage medium - Google Patents

Bill recognition method, system, computer device and computer-readable storage medium

Info

Publication number
WO2022057471A1
Authority
WO
WIPO (PCT)
Prior art keywords
position information
frame
area
text
reference field
Prior art date
Application number
PCT/CN2021/109726
Other languages
English (en)
French (fr)
Inventor
王文浩
徐国强
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2022057471A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a method, system, computer device and computer-readable storage medium for bill identification.
  • OCR (optical character recognition) technology is widely used to extract text from bill images.
  • For example, the iOCR custom-template text recognition system developed by Baidu allows a user to build a recognition model by uploading a template image, establishing key-value correspondences between the text in the image, and thereby achieving structured recognition of images with the same layout.
  • When such a product is applied to bills, two problems arise: 1. the reference fixed field (key) and the area to be recognized (value) can be misaligned; 2. the recognition area may contain multi-line text, which leads to poor recognition results.
  • The purpose of the embodiments of the present application is to provide a bill recognition method, system, computer device and computer-readable storage medium that improve the accuracy of multi-line text recognition.
  • An embodiment of the present application provides a bill recognition method, including:
  • acquiring a template bill picture, where the template bill picture includes a frame-selected first reference field area and a second reference field area;
  • inputting the template bill picture into a text recognition model to recognize the text in the first reference field area, and outputting, through a detection model, first position information of the first reference field area and second position information of the second reference field area;
  • receiving a to-be-processed bill picture, inputting the to-be-processed bill picture into the text recognition model to identify the target text matching the recognized text, and frame-selecting, through the detection model, a first recognition area of the target text and first target position information;
  • establishing a transformation matrix according to the first position information and the first target position information;
  • calculating the second position information through the transformation matrix to obtain second target position information of a second recognition area in the to-be-processed bill picture;
  • adjusting the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and recognizing the text in the area to be recognized through the text recognition model to obtain a recognition result.
  • the embodiment of the present application also provides a bill identification system, including:
  • an acquisition module configured to acquire a template ticket picture, where the template ticket picture includes a frame-selected first reference field area and a second reference field area;
  • a recognition module, configured to input the template bill picture into the text recognition model to recognize the text in the first reference field area, and to output, through the detection model, the first position information of the first reference field area and the second position information of the second reference field area;
  • a detection module, configured to receive the to-be-processed bill picture, input the to-be-processed bill picture into the text recognition model, identify the target text matching the recognized text, and frame-select, through the detection model, the first recognition area of the target text and the first target position information;
  • a construction module, configured to establish a transformation matrix according to the first position information and the first target position information;
  • a calculation module, configured to calculate the second position information through the transformation matrix to obtain the second target position information of the second recognition area in the to-be-processed bill picture; and
  • an adjustment and recognition module, configured to adjust the second target position information according to the overlap value of the first position information and the second position information to obtain an area to be recognized, and to recognize the text in the area to be recognized through the text recognition model to obtain a recognition result.
  • An embodiment of the present application also provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program runnable on the processor, and when the computer program is executed by the processor, the steps of the above bill recognition method are implemented.
  • Embodiments of the present application further provide a computer-readable storage medium. A computer program is stored in the computer-readable storage medium and can be executed by at least one processor, so that the at least one processor executes the steps of the bill recognition method described above.
  • In the bill recognition method, system, computer device and computer-readable storage medium described above, the text in the to-be-processed bill picture is matched against the text in the first reference field area of the template bill picture, and the detection model frame-selects the first target position area corresponding to the target text. A transformation matrix is then established between the first position information of the first reference field area and the first target position information of the target text, and the second target position information is obtained through this transformation matrix.
  • Finally, the second target position information of the to-be-processed bill is adjusted using the IOU value of the first position information and the second position information. This handles the case where the information area to be recognized contains multi-line text, filling a gap in the prior art, which could not recognize multi-line text.
  • FIG. 1 is a flowchart of Embodiment 1 of the method for identifying a bill of the present application.
  • FIG. 2 is a flowchart of step S100 in Embodiment 1 of the present application.
  • FIG. 3 is a flowchart of step S102 in Embodiment 1 of the present application.
  • FIG. 4 is a flowchart of step S104 in Embodiment 1 of the present application.
  • FIG. 5 is a flowchart of step S106 in Embodiment 1 of the present application.
  • FIG. 6 is a flowchart of step S110 in Embodiment 1 of the present application.
  • FIG. 7 is a schematic diagram of a program module of Embodiment 2 of the ticket identification system of the present application.
  • FIG. 8 is a schematic diagram of a hardware structure of Embodiment 3 of a computer device of the present application.
  • Referring to FIG. 1, a flowchart of the steps of the bill recognition method according to Embodiment 1 of the present application is shown. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed. The following description takes the computer device 2 as the execution subject. The details are as follows.
  • Step S100 Obtain a template ticket picture, where the template ticket picture includes the first reference field area and the second reference field area selected by the frame.
  • A bill whose picture is clear, printed in a standard manner, has little background interference and is well positioned is selected as the template bill.
  • the field content of the text bar of the template ticket can be selected by manual frame as a reference field (key), that is, the first reference field area.
  • the content of the financial report data after the text bar is selected as the second reference field area.
  • the text bar includes information such as customer number, deposit date, account opening bank, account name, etc. It should be noted that the content of the reference field is unchanged in the same type of bill layout.
  • the step S100 includes:
  • Step S100A selecting a bill picture in a standard picture format as a template bill picture.
  • Multiple bill pictures are received, and the bill picture among them that is clear, printed in a standard manner, has little background interference and is well positioned is used as the template bill picture.
  • Step S100B the field content of the text bar is selected as the first reference field area and the data content corresponding to the text bar is selected as the second reference field area.
  • The text bar area is frame-selected as the first reference field area, and the filled-in data content corresponding to the frame-selected text bar, that is, the area of the financial report data, is frame-selected as the second reference field area.
  • Multiple text bars should be frame-selected: the number of fields should be more than 5 (8-10 is recommended), and they should be distributed across the entire bill picture as far as possible. The field content of a single text bar must not span multiple lines, and about 4 characters per field is recommended.
  • The field content of each text bar must be unique, and the frame-selected field content should avoid symbols and patterns as far as possible.
  • Step S102: Input the template bill picture into a text recognition model to recognize the text in the first reference field area, and output, through the detection model, the first position information of the first reference field area and the second position information of the second reference field area.
  • The text recognition model recognizes the text in the first reference field area, and the detection model identifies the first position information of the first reference field area and the second position information of the second reference field area from their frame-selection marks. The detection model may use a bounding-box (bbox) algorithm.
  • the first position information may include frame coordinates in the first reference field area, that is, the position of the text bar in the picture, including a plurality of area vertex coordinates.
  • The second position information locates the second reference field area, that is, the field information (value) corresponding to the text bar.
  • The frame selection can be performed with a bounding-box regression algorithm (bbox), and the area can be marked by highlighting, color, underlining, etc.
  • the second position information is the coordinate position information of the second reference field area in the template ticket.
  • the text recognition model is trained according to the sample bill pictures, and can identify the field content of the text bar of each sample bill picture, and output the sample text that matches the text of the sample bill picture.
  • the step S102 includes:
  • Step S102A: Perform frame identification on the first reference field area and the second reference field area through the detection model, to obtain the first frame vertices of the first reference field area and the second frame vertices corresponding to the second reference field area.
  • Each first reference field area corresponds to a plurality of first frame vertices, and each second reference field area corresponds to a plurality of second frame vertices, generally four.
  • Step S102B: Establish a coordinate system using any vertex of the template bill picture as the coordinate origin, and obtain the first frame coordinates of the first frame vertices and the second frame coordinates of the second frame vertices, where the first position information includes a plurality of first frame coordinates and the second position information includes a plurality of second frame coordinates.
  • any vertex of the template bill image is selected as the coordinate origin to establish a coordinate system, and the first frame vertex and the second frame vertex are mapped in the coordinate system to obtain the first frame coordinates and the second frame coordinates.
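The coordinate-system step above can be sketched as a simple origin shift; the helper name `frame_coordinates` and the example vertices are illustrative, not from the patent:

```python
import numpy as np

def frame_coordinates(vertices, origin=(0, 0)):
    """Express frame vertices in a coordinate system whose origin is a chosen
    vertex of the picture (here, by default, its top-left corner)."""
    return np.asarray(vertices, dtype=float) - np.asarray(origin, dtype=float)

# Four frame vertices of a text bar, mapped with the picture's
# top-left corner as the coordinate origin:
box = frame_coordinates([(120, 40), (260, 40), (260, 72), (120, 72)])
```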
  • Step S104 Receive the picture of the bill to be processed, input the picture of the bill to be processed into the text recognition model, identify the target text that matches the text, and frame and mark the target text through the detection model. The first identification area and the first target location information.
  • The to-be-processed bill is a financial bill with the same layout as the template bill, and its picture can be acquired by photographing or scanning and then uploading. The to-be-processed bill picture is input into the text recognition model, which outputs the target text matching the text in the first reference field area; the detection model then frame-selects the first recognition area and marks the position information of the target text. Since the to-be-processed bill and the template bill share the same layout, recognition is easier and faster.
  • the first target location information is the location area corresponding to the text bar in the to-be-processed receipt image, including the coordinate position of the text bar in the receipt image. In this embodiment, it refers to the coordinate location of the area corresponding to the target information, which is composed of multiple vertex coordinates.
  • the step S104 includes:
  • Step S104A performing frame selection on the first recognition area by using the detection model to obtain a plurality of third frame vertices.
  • The detection model frame-selects the first recognition area and marks the position information of the target text, obtaining a plurality of third frame vertices corresponding to the first recognition area.
  • Step S104B establishing a coordinate system with any vertex of the to-be-processed bill image as the coordinate origin, and obtaining third frame coordinates of the third frame vertex, wherein the first target position information includes a plurality of third frame coordinates.
  • a coordinate system is established by taking any vertex of the image to be processed as the coordinate origin, and the third frame coordinates of the vertex of the third frame are obtained.
  • the coordinate system can be consistent with the coordinate system established by the template ticket image, so as to better establish the transformation matrix .
  • the number of coordinates of the third frame is the same as the number of coordinates of the first frame, generally four.
  • Step S106 establishing a transformation matrix according to the first position information and the first target position information.
  • A transformation matrix between the two, which may also be called an affine matrix, is established to link the template bill picture and the to-be-processed bill picture.
  • An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates. It preserves the "straightness" of two-dimensional figures (a straight line remains a straight line after the transformation) and their "parallelism" (the relative positional relationship between figures is unchanged: parallel lines remain parallel, and the order of points on a line is preserved).
  • Any affine transformation can be represented as multiplication by a matrix (the linear part) followed by addition of a vector (the translation). If the to-be-processed bill picture was tilted when photographed but the writing is clear, the picture can be rectified through the transformation matrix: the field information of the template is compared with the position information to be recognized, and the transformation matrix then corrects the picture so that the to-be-processed bill is displayed upright, which facilitates recognition.
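Assuming the first frame coordinates and third frame coordinates are matched point-to-point, the transformation matrix of step S106 can be estimated by least squares; `fit_affine` is a hypothetical helper, not the patent's own implementation:

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix M such that dst ~= M @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])  # homogeneous source coords
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)   # solve A @ M ~= dst
    return M.T                                    # shape (2, 3)

# First frame coordinates (template) vs. third frame coordinates
# (to-be-processed picture) of the same text bar; here a pure translation:
src = [(0, 0), (100, 0), (100, 30), (0, 30)]
dst = [(10, 5), (110, 5), (110, 35), (10, 35)]
M = fit_affine(src, dst)
```

OpenCV's `cv2.getAffineTransform` (exactly three point pairs) or `cv2.estimateAffine2D` would serve the same purpose in practice.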
  • the step S106 includes:
  • Step S106A Obtain the first frame coordinates of the first position information and the third frame coordinates of the first target position information.
  • a plurality of first frame coordinates in the first position information and a plurality of third frame coordinates in the first target position information are acquired.
  • Step S106B transform the coordinates of the first frame to obtain the coordinates of the third frame.
  • Each first frame coordinate corresponds one-to-one with a third frame coordinate. A first coordinate matrix and a second coordinate matrix are established, and the first coordinate matrix is transformed through operations such as rotation, translation, scaling and transposition to obtain the second coordinate matrix.
  • Step S106C establishing the transformation matrix according to the transformation relationship between the coordinates of the first frame and the coordinates of the third frame.
  • the affine matrix is established, which can be understood as: the first coordinate matrix obtains the second coordinate matrix through linear or nonlinear transformation, and the affine matrix is established according to the linear or nonlinear transformation.
  • Step S108 Calculate the second position information by using the transformation matrix to obtain the second target position information of the second identification area in the to-be-processed bill picture.
  • The second reference field area and the second recognition area correspond to the same field area but may differ in size.
  • the second target position information can be inferred from the second position information. That is, affine transformation is performed on the second position information to obtain the second target position information.
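Applying the matrix to the second position information (step S108) then amounts to one matrix product; a minimal sketch, assuming a 2x3 affine matrix as above:

```python
import numpy as np

def apply_affine(M, pts):
    """Map (x, y) points through a 2x3 affine matrix M."""
    pts = np.asarray(pts, dtype=float)
    A = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    return A @ np.asarray(M, dtype=float).T

# Map the template's second-reference-field corners into the
# to-be-processed picture with a translation by (10, 5):
M = [[1, 0, 10], [0, 1, 5]]
corners = apply_affine(M, [(40, 60), (200, 60), (200, 90), (40, 90)])
```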
  • Step S110: Adjust the second target position information according to the overlap value of the first position information and the second position information to obtain the area to be recognized, and recognize the text in the area to be recognized through the text recognition model to obtain the recognition result.
  • The overlap value is the IOU (Intersection over Union) of the first reference field area and the second reference field area in the template bill picture.
  • The numerator is the area of overlap between the prediction box and the ground truth; the denominator is the area of their union, that is, the total area covered by the prediction box and the ground truth together.
  • the ratio of the overlap area to the union area is the IOU.
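For axis-aligned boxes, the IOU described above reduces to a few lines; this is the standard formula, not code from the patent:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy                               # overlap area (numerator)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter               # union area (denominator)
    return inter / union if union else 0.0

# Two 10x10 boxes sharing a 5x10 strip: IOU = 50 / 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```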
  • The specific data can be adjusted and output using regular expressions, for example to standardize the output of the four field types "pure number", "lowercase amount", "uppercase amount" and "date" in the bill. For instance, if the date recognized in the bill is April 22, 2020, the standardized output is 2020-04-22.
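The date example can be sketched with a regular expression; the pattern assumes a Chinese-style date such as 2020年4月22日 and is illustrative only:

```python
import re

def normalize_date(text):
    """Rewrite a date like '2020年4月22日' to the standardized '2020-04-22'."""
    m = re.search(r'(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日', text)
    if not m:
        return text  # leave fields without a recognizable date untouched
    year, month, day = m.groups()
    return f'{year}-{int(month):02d}-{int(day):02d}'

print(normalize_date('2020年4月22日'))  # 2020-04-22
```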
  • the step S110 includes:
  • Step S110A obtaining an overlap value of the first position information and the second position information.
  • The first frame coordinates of each first reference field area and the second frame coordinates of each second reference field area are obtained, and the IOU value is calculated from the first frame coordinates and the second frame coordinates.
  • Step S110B Calculate a target overlap value of the first target position information and the second target position information.
  • Step S110C Adjust the second target position information so that the target overlap degree value is equal to the overlap degree value to obtain a to-be-identified area.
  • The second target position information is adjusted according to the IOU value from the template bill picture, so that the area to be recognized is frame-selected more precisely. For example, once the difference between the two IOU values falls below a preset threshold, the adjustment is complete and the area to be recognized is cropped out.
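One possible reading of this adjustment is to grow the mapped value box until its IOU with the key box is within a preset threshold of the template's IOU; the growth strategy, thresholds and helper names here are all assumptions, not the patent's stated algorithm:

```python
def iou(a, b):
    """IOU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def adjust_value_box(value_box, key_box, template_iou,
                     tol=0.02, step=2, max_iter=100):
    """Extend the value box downward until its IOU with the key box is within
    `tol` of the template IOU (hypothetical multi-line adjustment)."""
    x1, y1, x2, y2 = value_box
    for _ in range(max_iter):
        if abs(iou((x1, y1, x2, y2), key_box) - template_iou) <= tol:
            break       # difference below threshold: adjustment complete
        y2 += step      # pick up additional text lines below the first
    return (x1, y1, x2, y2)
```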
  • the method further includes:
  • Uploading the recognition results to a blockchain can ensure their security, fairness and transparency to users.
  • the user equipment can download the summary information from the blockchain to verify whether the financial report data has been tampered with.
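Tamper checking of this kind typically compares digests; a minimal sketch with SHA-256, where the serialized record format is invented for illustration:

```python
import hashlib

def digest(record: str) -> str:
    """SHA-256 digest of a serialized recognition result."""
    return hashlib.sha256(record.encode('utf-8')).hexdigest()

# Digest stored on-chain when the recognition result is uploaded:
record = 'date:2020-04-22|customer:0001|amount:1000.00'
stored = digest(record)

# Later, the client re-hashes the downloaded data and compares:
untampered = digest(record) == stored
```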
  • The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms.
  • A blockchain, essentially a decentralized database, is a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • FIG. 7 shows a schematic diagram of program modules in Embodiment 2 of the ticket identification system of the present application.
  • The bill recognition system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present application.
  • A program module in the embodiments of the present application refers to a series of computer-readable instructions capable of performing specific functions; it describes the execution process of the bill recognition system 20 in the storage medium better than the program itself does. The following description introduces the functions of each program module in this embodiment:
  • the obtaining module 200 is configured to obtain a template bill picture, where the template bill picture includes a frame-selected first reference field area and a second reference field area.
  • A bill whose picture is clear, printed in a standard manner, has little background interference and is well positioned is selected as the template bill.
  • the field content of the text bar of the template ticket can be selected by manual frame as a reference field (key), that is, the first reference field area.
  • the content of the financial report data after the text bar is selected as the second reference field area.
  • the text bar includes information such as customer number, deposit date, account opening bank, account name, etc. It should be noted that the content of the reference field is unchanged in the same type of bill layout.
  • the obtaining module 200 is further configured to:
  • Multiple bill pictures are received, and the bill picture among them that is clear, printed in a standard manner, has little background interference and is well positioned is used as the template bill picture.
  • the field content of the text bar is selected as the first reference field area and the data content corresponding to the text bar is selected as the second reference field area.
  • The text bar area is frame-selected as the first reference field area, and the filled-in data content corresponding to the frame-selected text bar, that is, the area of the financial report data, is frame-selected as the second reference field area.
  • Multiple text bars should be frame-selected: the number of fields should be more than 5 (8-10 is recommended), and they should be distributed across the entire bill picture as far as possible. The field content of a single text bar must not span multiple lines, and about 4 characters per field is recommended.
  • The field content of each text bar must be unique, and the frame-selected field content should avoid symbols and patterns as far as possible.
  • The recognition module 202 is configured to input the template bill picture into the text recognition model to recognize the text in the first reference field area, and to output, through the detection model, the first position information of the first reference field area and the second position information of the second reference field area.
  • The text recognition model recognizes the text in the first reference field area, and the detection model identifies the first position information of the first reference field area and the second position information of the second reference field area from their frame-selection marks. The detection model may use a bounding-box (bbox) algorithm.
  • the first position information may include frame coordinates in the first reference field area, that is, the position of the text bar in the picture, including a plurality of area vertex coordinates.
  • The second position information locates the second reference field area, that is, the field information (value) corresponding to the text bar.
  • The frame selection can be performed with a bounding-box regression algorithm (bbox), and the area can be marked by highlighting, color, underlining, etc.
  • the second position information is the coordinate position information of the second reference field area in the template ticket.
  • the text recognition model is trained according to the sample bill pictures, and can identify the field content of the text bar of each sample bill picture, and output the sample text that matches the text of the sample bill picture.
  • the identification module 202 is also used for:
  • The first reference field area and the second reference field area are frame-identified by the detection model, obtaining the first frame vertices of the first reference field area and the second frame vertices corresponding to the second reference field area.
  • Each first reference field area corresponds to a plurality of first frame vertices, and each second reference field area corresponds to a plurality of second frame vertices, generally four.
  • a coordinate system is established by using any vertex of the template ticket image as the coordinate origin, and the first frame coordinates of the first frame vertex and the second frame coordinates of the second frame vertex are obtained, wherein the first position information includes: a plurality of first frame coordinates, and the second position information includes a plurality of second frame coordinates.
  • any vertex of the template bill image is selected as the coordinate origin to establish a coordinate system, and the first frame vertex and the second frame vertex are mapped in the coordinate system to obtain the first frame coordinates and the second frame coordinates.
  • the detection module 204 is configured to receive the picture of the bill to be processed, input the picture of the bill to be processed into the text recognition model, identify the target text that matches the text, and mark the text through the detection model. The first recognition area of the target text and the first target position information.
  • The to-be-processed bill is a financial bill with the same layout as the template bill, and its picture can be acquired by photographing or scanning and then uploading. The to-be-processed bill picture is input into the text recognition model, which outputs the target text matching the text in the first reference field area; the detection model then frame-selects the first recognition area and marks the position information of the target text. Since the to-be-processed bill and the template bill share the same layout, recognition is easier and faster.
  • the first target location information is the location area corresponding to the text bar in the to-be-processed receipt image, including the coordinate position of the text bar in the receipt image. In this embodiment, it refers to the coordinate location of the area corresponding to the target information, which is composed of multiple vertex coordinates.
  • the detection module 204 is also used for:
  • the first recognition area is frame-selected by the detection model to obtain a plurality of third frame vertices.
  • The detection model frame-selects the first recognition area and marks the position information of the target text, obtaining a plurality of third frame vertices corresponding to the first recognition area.
  • a coordinate system is established with any vertex of the to-be-processed bill image as a coordinate origin, and third frame coordinates of the third frame vertex are obtained, wherein the first target position information includes a plurality of third frame coordinates.
  • a coordinate system is established by taking any vertex of the image to be processed as the coordinate origin, and the third frame coordinates of the vertex of the third frame are obtained.
  • the coordinate system can be consistent with the coordinate system established by the template ticket image, so as to better establish the transformation matrix .
  • the number of coordinates of the third frame is the same as the number of coordinates of the first frame, generally four.
  • the construction module 206 is configured to establish a transformation matrix according to the first position information and the first target position information.
  • A transformation matrix between the two, which may also be called an affine matrix, is established to link the template bill picture and the to-be-processed bill picture.
  • An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates. It preserves the "straightness" of two-dimensional figures (a straight line remains a straight line after the transformation) and their "parallelism" (the relative positional relationship between figures is unchanged: parallel lines remain parallel, and the order of points on a line is preserved).
  • Any affine transformation can be represented by multiplying by a matrix (linear transformation) followed by a vector (translation). If the picture of the bill to be processed is in a tilted state when the picture is taken, but the handwriting is clear, the picture can be straightened and corrected through the transformation matrix. That is, according to the field information corresponding to the template, it is compared with the position information to be identified, and then further corrected by the transformation matrix, so that the picture of the bill to be processed is displayed in an upright position, which is convenient for bill identification.
  • the building block 206 is also used to:
  • the first frame coordinates of the first position information and the third frame coordinates of the first target position information are acquired.
  • a plurality of first frame coordinates in the first position information and a plurality of third frame coordinates in the first target position information are acquired.
  • each first frame coordinate and each third frame coordinate are put in one-to-one correspondence, a first coordinate matrix and a second coordinate matrix are established, and the first coordinate matrix is subjected to transformation operations such as rotation, translation, scaling, and transposition to obtain the second coordinate matrix.
  • the transformation matrix is established according to the transformation relationship between the coordinates of the first frame and the coordinates of the third frame.
  • the affine matrix is established, which can be understood as: the first coordinate matrix obtains the second coordinate matrix through linear or nonlinear transformation, and the affine matrix is established according to the linear or nonlinear transformation.
  • the calculation module 208 is configured to calculate the second position information through the transformation matrix to obtain the second target position information of the second identification area in the to-be-processed bill picture.
  • the second reference field area and the second identification area refer to the same region but may differ in size.
  • the second target position information can be inferred from the second position information. That is, affine transformation is performed on the second position information to obtain the second target position information.
  • the adjustment and recognition module 210 is configured to adjust the second target position information according to the overlap value of the first position information and the second position information to obtain an area to be recognized, and to recognize the text in the area to be recognized through the text recognition model to obtain the recognition result.
  • the IOU (Intersection over Union, overlap) value of the first reference field area and the second reference field area in the template ticket picture is computed as IOU = overlap area / (prediction area + ground-truth area - overlap area).
  • the numerator is the area of overlap between the prediction box and the ground truth; the denominator is the union area, or more simply, the total area covered by the prediction box and the ground truth.
  • the ratio of the overlap area to the union area is the IOU.
  • specific data can be adjusted by regular expressions before output. For example, the four specific fields "pure number", "lowercase amount", "uppercase amount" and "date" in the bill are output in standardized form: if the date identified in the ticket is April 22, 2020, the standardized output is 2020-04-22.
  • the adjustment identification module 210 is further configured to:
  • an overlap value of the first position information and the second position information is obtained.
  • the first frame coordinates of each first reference field area and the second frame coordinates of each second reference field area are obtained, and the IOU value is calculated from the first frame coordinates and the second frame coordinates.
  • a target overlap value of the first target position information and the second target position information is calculated.
  • the second target position information is adjusted so that the target overlap degree value is equal to the overlap degree value to obtain the area to be identified.
  • the second target position information is adjusted according to the IOU value of the template bill picture, so as to frame the area to be recognized more precisely. For example, once the difference between the IOU values falls within a preset threshold, the adjustment is complete and the area to be identified is cropped out.
  • FIG. 8 is a schematic diagram of the hardware architecture of a computer device according to Embodiment 3 of the present application.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server or a cabinet server (including an independent server, or a server cluster composed of multiple servers), and the like.
  • the server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
  • the computer device 2 at least includes, but is not limited to, a memory 21 , a processor 22 , a network interface 23 , and a ticket identification system 20 that can communicate with each other through a system bus. in:
  • the memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 21 may be an internal storage unit of the computer device 2 , such as a hard disk or a memory of the computer device 2 .
  • the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk equipped on the computer device 2, a smart media card (SMC), a secure digital (SD) card, a flash card, etc.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 2 , such as program codes of the ticket identification system 20 of the second embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • in some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip.
  • the processor 22 is typically used to control the overall operation of the computer device 2 .
  • the processor 22 is used for running the program code or processing data stored in the memory 21, for example, running the ticket identification system 20, so as to realize the ticket identification method of the first embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the server 2 and other electronic devices.
  • the network interface 23 is used to connect the server 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the server 2 and the external terminal.
  • the network can be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
  • FIG. 8 only shows the computer device 2 having components 20-23, but it should be understood that it is not required to implement all of the shown components, and that more or less components may be implemented instead.
  • the bill recognition system 20 stored in the memory 21 can also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present application.
  • FIG. 7 shows a schematic diagram of program modules for implementing the second embodiment of the ticket identification system 20.
  • the ticket identification system 20 can be divided into an acquisition module 200, an identification module 202, a detection module 204, a construction module 206, a calculation module 208, and an adjustment and identification module 210.
  • the program module referred to in this application refers to a series of computer-readable instructions capable of completing specific functions, and is more suitable for describing the execution process of the ticket identification system 20 in the computer device 2 than a program.
  • the specific functions of the program modules 200-210 have been described in detail in the second embodiment, and are not repeated here.
  • This embodiment also provides a computer-readable storage medium, which may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type storage (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory , magnetic disks, optical discs, servers, App application malls, etc., on which computer-readable instructions are stored, and when executed by the processor, implement corresponding functions.
  • the computer-readable storage medium of this embodiment is used to store computer-readable instructions for ticket identification which, when executed by a processor, implement the ticket identification method of the first embodiment.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
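The affine (transformation) matrix described in the bullets above can be estimated from the matched key-frame vertex pairs. The following is a minimal sketch, assuming NumPy and a least-squares fit over the (generally four) corresponding frame coordinates; the point values are invented for illustration, and this is not the patent's implementation:

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares fit of a 2x3 affine matrix M so that dst ~= M @ [x, y, 1]^T.

    src_pts, dst_pts: (N, 2) arrays of corresponding frame-vertex coordinates
    (template picture -> picture to be processed), N >= 3.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])  # homogeneous rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)   # solve A @ M = dst
    return M.T  # shape (2, 3)

def apply_affine(M, pts):
    """Map points through the affine matrix: multiply by a matrix, then translate."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Four key-frame vertex pairs related by a pure translation of (10, 5).
src = [(0, 0), (100, 0), (100, 40), (0, 40)]
dst = [(10, 5), (110, 5), (110, 45), (10, 45)]
M = fit_affine(src, dst)
print(np.round(apply_affine(M, [(50, 20)]), 6))  # -> [[60. 25.]]
```

With at least three non-collinear point pairs the 2x3 matrix is fully determined; using all four frame vertices makes the fit robust to small detection noise, and a tilted capture is "straightened" by mapping every box through M.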

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

This application also relates to the field of artificial intelligence. Disclosed are a bill recognition method, system, computer device and computer-readable storage medium. The method includes: matching text in a bill picture to be processed against the text in a first reference field area of a template bill picture to obtain target text; framing, by a recognition model, the first target position area corresponding to the target text; establishing a transformation matrix between the first position information corresponding to the first reference field area and the first target position information of the target text; and obtaining second target position information by means of the transformation matrix. The second target position information of the bill to be processed is adjusted according to the overlap value of the first position information and the second position information. The beneficial effect of the embodiments of this application is that the accuracy of multi-line recognition is improved.

Description

Bill recognition method, system, computer device and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 17, 2020, with application number 202010977474.7 and entitled "Bill recognition method, system, computer device and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a bill recognition method, system, computer device and computer-readable storage medium.
Background
With the expansion of enterprise management and business, the finance department's workload of reviewing and organizing bills grows by the day: every month it faces a large volume of paper bills whose content must be converted into storable, structured information.
At present, most small and medium-sized enterprises rely mainly on manual entry of bill information. Manual entry is not only inefficient and slow; under the pressure of highly repetitive work, lapses of attention inevitably lead to errors. In today's technologically advanced environment, this mode clearly no longer satisfies the pursuit of efficiency and intelligence. To address these problems, optical character recognition (OCR) technology, with the aid of optical devices and bill recognition techniques, can be used to automatically enter bill information in structured form.
Among existing products, the iOCR custom-template text recognition system developed by Baidu lets a user upload a single template picture and build a recognition model in self-service fashion, establishing key-value correspondences for the text in the picture and achieving structured recognition of pictures with the same layout. However, the inventors found that this product performs poorly on bills where: 1. the fixed reference field (key) and the area to be recognized (value) are printed out of alignment; or 2. the recognition area contains multiple lines of text.
Summary
In view of this, the purpose of the embodiments of this application is to provide a bill recognition method, system, computer device and computer-readable storage medium that improve the accuracy of multi-line recognition.
To achieve the above purpose, an embodiment of this application provides a bill recognition method, including:
acquiring a template bill picture, the template bill picture including a framed first reference field area and a framed second reference field area;
inputting the template bill picture into a text recognition model to recognize text in the first reference field area, and outputting, by a detection model, first position information of the first reference field area and second position information of the second reference field area;
receiving a bill picture to be processed, inputting the bill picture to be processed into the text recognition model to recognize target text matching the text, and framing and marking, by the detection model, a first recognition area and first target position information of the target text;
establishing a transformation matrix according to the first position information and the first target position information;
calculating the second position information by means of the transformation matrix to obtain second target position information of a second recognition area in the bill picture to be processed; and
adjusting the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and recognizing the text in the area to be recognized by means of the text recognition model to obtain a recognition result.
To achieve the above purpose, an embodiment of this application further provides a bill recognition system, including:
an acquisition module, configured to acquire a template bill picture, the template bill picture including a framed first reference field area and a framed second reference field area;
a recognition module, configured to input the template bill picture into a text recognition model to recognize text in the first reference field area, and to output, by a detection model, first position information of the first reference field area and second position information of the second reference field area;
a detection module, configured to receive a bill picture to be processed, input it into the text recognition model to recognize target text matching the text, and frame and mark, by the detection model, a first recognition area and first target position information of the target text;
a construction module, configured to establish a transformation matrix according to the first position information and the first target position information;
a calculation module, configured to calculate the second position information by means of the transformation matrix to obtain second target position information of a second recognition area in the bill picture to be processed; and
an adjustment and recognition module, configured to adjust the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and to recognize the text in the area to be recognized by means of the text recognition model to obtain a recognition result.
To achieve the above purpose, an embodiment of this application further provides a computer device, including a memory and a processor, the memory storing a computer program runnable on the processor, the computer program implementing the steps of the bill recognition method described above when executed by the processor.
To achieve the above purpose, an embodiment of this application further provides a computer-readable storage medium storing a computer program executable by at least one processor to cause the at least one processor to perform the steps of the bill recognition method described above.
In the bill recognition method, system, computer device and computer-readable storage medium provided by the embodiments of this application, the text in the bill picture to be processed is matched against the text in the first reference field area of the template bill picture; the recognition model frames the first target position area corresponding to the target text; a transformation matrix is established between the first position information corresponding to the first reference field area and the first target position information of the target text; and second target position information is then obtained by means of the transformation matrix. The second target position information of the bill to be processed is adjusted according to the IOU value of the first position information and the second position information, which handles the case where the information area to be recognized contains multiple lines of text, filling the gap left by prior art that cannot recognize multi-line text.
Brief Description of the Drawings
FIG. 1 is a flowchart of Embodiment 1 of the bill recognition method of this application.
FIG. 2 is a flowchart of step S100 in Embodiment 1 of this application.
FIG. 3 is a flowchart of step S102 in Embodiment 1 of this application.
FIG. 4 is a flowchart of step S104 in Embodiment 1 of this application.
FIG. 5 is a flowchart of step S106 in Embodiment 1 of this application.
FIG. 6 is a flowchart of step S110 in Embodiment 1 of this application.
FIG. 7 is a schematic diagram of the program modules of Embodiment 2 of the bill recognition system of this application.
FIG. 8 is a schematic diagram of the hardware architecture of Embodiment 3 of the computer device of this application.
Detailed Description
To make the purpose, technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
Embodiment 1
Referring to FIG. 1, a flowchart of the steps of the bill recognition method of Embodiment 1 of this application is shown. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are performed. The following exemplary description takes the computer device 2 as the execution subject. The details are as follows.
Step S100: acquiring a template bill picture, the template bill picture including a framed first reference field area and a framed second reference field area.
Specifically, from a plurality of standard bill pictures, a bill whose picture is clear, printed to standard, with little background interference and squarely positioned is selected as the template bill. The field content of the template bill's text strips may be framed manually as reference fields (keys), i.e., the first reference field areas. Correspondingly, the financial-report data content following each text strip is framed as the second reference field area. Taking a certificate of corporate fixed-term deposit account opening as an example, the text strips include information such as customer number, deposit date, account-opening bank and account name. It should be noted that the content of the reference fields is invariant within bills of the same layout type.
Exemplarily, referring to FIG. 2, step S100 includes:
Step S100A: selecting a bill picture in a standard picture format as the template bill picture.
Specifically, a plurality of bill pictures are received, and a bill picture that is clear, printed to standard, with little background interference and squarely positioned is taken as the template bill picture.
Step S100B: framing the field content of the text strips as the first reference field areas and the data content corresponding to the text strips as the second reference field areas.
Specifically, each text strip area is framed as a first reference field area, and the filled-in data content corresponding to the framed text strip, i.e., the financial-report data area, is framed as the second reference field area. Multiple text strips are framed; the number of data-content fields should be at least 5 (8-10 recommended) and distributed as widely as possible across the whole bill picture; the field content of a single text strip must not span lines, with about 4 characters recommended; the field content of each text strip must be unique; and the framed field content should avoid symbols and patterns as far as possible.
Step S102: inputting the template bill picture into a text recognition model to recognize the text in the first reference field area, and outputting, by a detection model, first position information of the first reference field area and second position information of the second reference field area.
Specifically, the text recognition model recognizes the text of the first reference field area, and the detection model derives the first position information of the first reference field area from its frame marks and the second position information of the second reference field area from the frame marks of the second reference field area; the detection model may be a bounding-box (bbox) algorithm. The first position information may include the frame coordinates of the first reference field area, i.e., the position of the text strip in the picture, comprising coordinates of a plurality of area vertices. The second position information concerns the second reference field area of the field information (value) corresponding to the text strip; it may be recognized and framed by a bounding-box regression (bbox) algorithm, and the area may be marked by highlighting, color, underlining, etc. The second position information is the coordinate position information of the second reference field area in the template bill. The text recognition model is trained on sample bill pictures; it can recognize the field content of the text strips of each sample bill picture and output sample text matching the text of the sample bill picture.
Exemplarily, referring to FIG. 3, step S102 includes:
Step S102A: performing frame recognition on the first reference field area and the second reference field area by means of the detection model to obtain first frame vertices of the first reference field area and second frame vertices corresponding to the second reference field area.
Specifically, the detection model recognizes the frame marks of the first reference field areas and the second reference field areas, obtaining a plurality of first frame vertices for each first reference field area and a plurality of second frame vertices for each second reference field area, generally 4.
Step S102B: establishing a coordinate system by taking any vertex of the template bill picture as the coordinate origin to obtain first frame coordinates of the first frame vertices and second frame coordinates of the second frame vertices, wherein the first position information includes a plurality of first frame coordinates and the second position information includes a plurality of second frame coordinates.
Specifically, any vertex of the template bill picture is selected as the coordinate origin to establish a coordinate system, and the first frame vertices and second frame vertices are mapped into this coordinate system to obtain the first frame coordinates and the second frame coordinates.
Step S104: receiving a bill picture to be processed, inputting the bill picture to be processed into the text recognition model to recognize target text matching the text, and framing and marking, by the detection model, a first recognition area and first target position information of the target text.
Specifically, the bill to be processed is a financial bill with the same layout as the template bill; it may be acquired by uploading a photograph or a scan. The bill picture to be processed is input into the text recognition model, which outputs target text matching the text of the first reference field area; the position information of the target text is then framed and marked, and the detection model frames the first recognition area. Since the bill to be processed and the template bill share the same layout, recognition is simpler and faster. The first target position information is the position area corresponding to the text strip in the bill picture to be processed, including the coordinate position of the text strip in the bill picture; in this embodiment it refers to the coordinate position of the area corresponding to the target information and consists of a plurality of vertex coordinates.
Exemplarily, referring to FIG. 4, step S104 includes:
Step S104A: framing the first recognition area by means of the detection model to obtain a plurality of third frame vertices.
Specifically, the position information of the target text is framed and marked, and the detection model frames the first recognition area to obtain a plurality of third frame vertices corresponding to the first recognition area.
Step S104B: establishing a coordinate system by taking any vertex of the bill picture to be processed as the coordinate origin to obtain third frame coordinates of the third frame vertices, wherein the first target position information includes a plurality of third frame coordinates.
Specifically, a coordinate system is established by taking any vertex of the bill picture to be processed as the coordinate origin, and the third frame coordinates of the third frame vertices are obtained. This coordinate system may coincide with the coordinate system established for the template bill picture, the better to establish the transformation matrix. The number of third frame coordinates equals the number of first frame coordinates, generally 4.
Step S106: establishing a transformation matrix according to the first position information and the first target position information.
Specifically, when the field content in a text strip on the bill to be processed is recognized as matching the field content in a text strip on the template bill, a transformation matrix between the two, which may also be called an affine matrix, is established to link the template bill picture with the bill picture to be processed. An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates; it preserves the "straightness" of two-dimensional figures (a straight line remains a straight line after the transformation) and their "parallelism" (the relative positional relationships between figures are preserved, parallel lines remain parallel, and the order of points on a line is unchanged). Any affine transformation can be expressed as multiplication by a matrix (a linear transformation) followed by the addition of a vector (a translation). If the picture of the bill to be processed was tilted when photographed but the writing is clear, the picture can be straightened by means of the transformation matrix. That is, the field information corresponding to the template is compared with the position information to be recognized and then further corrected by the transformation matrix, so that the bill picture to be processed is displayed upright, facilitating bill recognition.
Exemplarily, referring to FIG. 5, step S106 includes:
Step S106A: acquiring the first frame coordinates of the first position information and the third frame coordinates of the first target position information.
Specifically, with the coordinate systems established above, a plurality of first frame coordinates in the first position information and a plurality of third frame coordinates in the first target position information are acquired.
Step S106B: transforming the first frame coordinates to obtain the third frame coordinates.
Specifically, each first frame coordinate is put in one-to-one correspondence with a third frame coordinate, a first coordinate matrix and a second coordinate matrix are built, and the first coordinate matrix is subjected to transformation operations such as rotation, translation, scaling and transposition to obtain the second coordinate matrix.
Step S106C: establishing the transformation matrix according to the transformation relationship between the first frame coordinates and the third frame coordinates.
Specifically, the affine matrix is established from the transformation steps. This can be understood as the first coordinate matrix yielding the second coordinate matrix through a linear or nonlinear transformation, with the affine matrix established according to that transformation.
Step S108: calculating the second position information by means of the transformation matrix to obtain the second target position information of the second recognition area in the bill picture to be processed.
Specifically, the second reference field area and the second recognition area refer to the same region but may differ in size. From the affine transformation between the text strips, the second target position information can be inferred from the second position information, i.e., an affine transformation is applied to the second position information to obtain the second target position information.
Step S110: adjusting the second target position information according to the overlap value of the first position information and the second position information to obtain the area to be recognized, and recognizing the text in the area to be recognized by means of the text recognition model to obtain the recognition result.
Specifically, the IOU (Intersection over Union, overlap) value between the first reference field area and the second reference field area in the template bill picture is used to adjust the IOU value between the first target position area and the second target position area in the bill picture to be processed, thereby adjusting the second target position information and cropping out the area to be recognized. The IOU is computed as: IOU = overlap area / (prediction area + ground-truth area - overlap area); the IOU is thus a ratio, the intersection over the union. The numerator is the overlap area between the prediction box and the ground truth; the denominator is the union area, or more simply, the total area covered by the prediction box and the ground truth. The ratio of the overlap area to the union area is the IOU. In other words, the IOU value of the first position information and the second position information of the template bill picture is computed first; the target IOU value of the first target position information and the second target position information is computed next; finally, according to the two IOU values, the second target position information is adjusted so that the target IOU value approaches the template IOU value. The text recognition module then recognizes the content of the area to be recognized to obtain the recognition result.
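The IOU computation described in this step can be sketched as follows. This is a minimal illustration; the axis-aligned (x1, y1, x2, y2) box representation is an assumption, not the patent's code:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle (zero if the boxes do not meet).
    ow = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    oh = max(0.0, min(ay2, by2) - max(ay1, by1))
    overlap = ow * oh
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - overlap  # total area covered by both boxes
    return overlap / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # -> 0.14285714285714285 (i.e. 1/7)
```

The denominator subtracts the overlap once so it is not counted twice; this matches the "intersection over union" definition in the prose above.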
Exemplarily, when the content of the area to be recognized is recognized by the text recognition model, specific data may be adjusted by regular expressions before being output. For example, the four specific fields "pure number", "lowercase amount", "uppercase amount" and "date" in the bill are output in standardized form. Adjusting via a regular expression: if the date recognized in the bill is April 22, 2020 (2020年4月22日), the standardized output is 2020-04-22.
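The regular-expression standardization of the date field can be sketched as follows; the pattern and function name are illustrative assumptions, not the patent's implementation:

```python
import re

# Matches dates written as <year>年<month>月<day>日, e.g. 2020年4月22日.
DATE_RE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")

def normalize_date(text):
    """Rewrite dates like '2020年4月22日' as zero-padded '2020-04-22'."""
    return DATE_RE.sub(
        lambda m: f"{m.group(1)}-{int(m.group(2)):02d}-{int(m.group(3)):02d}",
        text,
    )

print(normalize_date("存款日期：2020年4月22日"))  # -> 存款日期：2020-04-22
```

The same substitution-with-callback pattern extends to the other standardized fields (pure numbers, lowercase and uppercase amounts) with field-specific patterns.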
Exemplarily, referring to FIG. 6, step S110 includes:
Step S110A: acquiring the overlap value of the first position information and the second position information.
Specifically, the first frame coordinates of each first reference field area and the second frame coordinates of each second reference field area are acquired, and the IOU value is computed from the first frame coordinates and the second frame coordinates.
Step S110B: calculating the target overlap value of the first target position information and the second target position information.
Specifically, the third frame coordinates of the first target position information and the fourth frame coordinates of the second target position information are acquired, and the target overlap value of the third frame coordinates and the fourth frame coordinates is computed according to the overlap-value formula.
Step S110C: adjusting the second target position information so that the target overlap value equals the overlap value, obtaining the area to be recognized.
Specifically, the second target position information is adjusted according to the IOU value of the template bill picture so that the area to be recognized is framed more precisely; for example, once the difference between the IOU values falls within a preset threshold, the adjustment ends and the area to be recognized is cropped out.
Exemplarily, the method further includes:
uploading the recognition result to a blockchain for storage.
Specifically, uploading the recognition result to a blockchain ensures its security and its fairness and transparency to users. A user device can download the digest information from the blockchain to verify whether the financial-report data has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
Embodiment 2
Referring further to FIG. 7, a schematic diagram of the program modules of Embodiment 2 of the bill recognition system of this application is shown. In this embodiment, the bill recognition system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete this application and implement the bill recognition method described above. The program modules referred to in the embodiments of this application are a series of computer-readable instructions capable of completing specific functions, better suited than the program itself to describing the execution of the bill recognition system 20 in the storage medium. The following description details the functions of the program modules of this embodiment:
The acquisition module 200 is configured to acquire a template bill picture, the template bill picture including a framed first reference field area and a framed second reference field area.
Specifically, from a plurality of standard bill pictures, a bill whose picture is clear, printed to standard, with little background interference and squarely positioned is selected as the template bill. The field content of the template bill's text strips may be framed manually as reference fields (keys), i.e., the first reference field areas. Correspondingly, the financial-report data content following each text strip is framed as the second reference field area. Taking a certificate of corporate fixed-term deposit account opening as an example, the text strips include information such as customer number, deposit date, account-opening bank and account name. It should be noted that the content of the reference fields is invariant within bills of the same layout type.
Exemplarily, the acquisition module 200 is further configured to:
select a bill picture in a standard picture format as the template bill picture.
Specifically, a plurality of bill pictures are received, and a bill picture that is clear, printed to standard, with little background interference and squarely positioned is taken as the template bill picture.
frame the field content of the text strips as the first reference field areas and the data content corresponding to the text strips as the second reference field areas.
Specifically, each text strip area is framed as a first reference field area, and the filled-in data content corresponding to the framed text strip, i.e., the financial-report data area, is framed as the second reference field area. Multiple text strips are framed; the number of data-content fields should be at least 5 (8-10 recommended) and distributed as widely as possible across the whole bill picture; the field content of a single text strip must not span lines, with about 4 characters recommended; the field content of each text strip must be unique; and the framed field content should avoid symbols and patterns as far as possible.
The recognition module 202 is configured to input the template bill picture into a text recognition model to recognize the text in the first reference field area, and to output, by a detection model, first position information of the first reference field area and second position information of the second reference field area.
Specifically, the text recognition model recognizes the text of the first reference field area, and the detection model derives the first position information of the first reference field area from its frame marks and the second position information of the second reference field area from the frame marks of the second reference field area; the detection model may be a bounding-box (bbox) algorithm. The first position information may include the frame coordinates of the first reference field area, i.e., the position of the text strip in the picture, comprising coordinates of a plurality of area vertices. The second position information concerns the second reference field area of the field information (value) corresponding to the text strip; it may be recognized and framed by a bounding-box regression (bbox) algorithm, and the area may be marked by highlighting, color, underlining, etc. The second position information is the coordinate position information of the second reference field area in the template bill. The text recognition model is trained on sample bill pictures; it can recognize the field content of the text strips of each sample bill picture and output sample text matching the text of the sample bill picture.
Exemplarily, the recognition module 202 is further configured to:
perform frame recognition on the first reference field area and the second reference field area by means of the detection model to obtain first frame vertices of the first reference field area and second frame vertices corresponding to the second reference field area.
Specifically, the detection model recognizes the frame marks of the first reference field areas and the second reference field areas, obtaining a plurality of first frame vertices for each first reference field area and a plurality of second frame vertices for each second reference field area, generally 4.
establish a coordinate system by taking any vertex of the template bill picture as the coordinate origin to obtain first frame coordinates of the first frame vertices and second frame coordinates of the second frame vertices, wherein the first position information includes a plurality of first frame coordinates and the second position information includes a plurality of second frame coordinates.
Specifically, any vertex of the template bill picture is selected as the coordinate origin to establish a coordinate system, and the first frame vertices and second frame vertices are mapped into this coordinate system to obtain the first frame coordinates and the second frame coordinates.
The detection module 204 is configured to receive a bill picture to be processed, input the bill picture to be processed into the text recognition model to recognize target text matching the text, and frame and mark, by the detection model, a first recognition area and first target position information of the target text.
Specifically, the bill to be processed is a financial bill with the same layout as the template bill; it may be acquired by uploading a photograph or a scan. The bill picture to be processed is input into the text recognition model, which outputs target text matching the text of the first reference field area; the position information of the target text is then framed and marked, and the detection model frames the first recognition area. Since the bill to be processed and the template bill share the same layout, recognition is simpler and faster. The first target position information is the position area corresponding to the text strip in the bill picture to be processed, including the coordinate position of the text strip in the bill picture; in this embodiment it refers to the coordinate position of the area corresponding to the target information and consists of a plurality of vertex coordinates.
Exemplarily, the detection module 204 is further configured to:
frame the first recognition area by means of the detection model to obtain a plurality of third frame vertices.
Specifically, the position information of the target text is framed and marked, and the detection model frames the first recognition area to obtain a plurality of third frame vertices corresponding to the first recognition area.
establish a coordinate system by taking any vertex of the bill picture to be processed as the coordinate origin to obtain third frame coordinates of the third frame vertices, wherein the first target position information includes a plurality of third frame coordinates.
Specifically, a coordinate system is established by taking any vertex of the bill picture to be processed as the coordinate origin, and the third frame coordinates of the third frame vertices are obtained. This coordinate system may coincide with the coordinate system established for the template bill picture, the better to establish the transformation matrix. The number of third frame coordinates equals the number of first frame coordinates, generally 4.
The construction module 206 is configured to establish a transformation matrix according to the first position information and the first target position information.
Specifically, when the field content in a text strip on the bill to be processed is recognized as matching the field content in a text strip on the template bill, a transformation matrix between the two, which may also be called an affine matrix, is established to link the template bill picture with the bill picture to be processed. An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates; it preserves the "straightness" of two-dimensional figures (a straight line remains a straight line after the transformation) and their "parallelism" (the relative positional relationships between figures are preserved, parallel lines remain parallel, and the order of points on a line is unchanged). Any affine transformation can be expressed as multiplication by a matrix (a linear transformation) followed by the addition of a vector (a translation). If the picture of the bill to be processed was tilted when photographed but the writing is clear, the picture can be straightened by means of the transformation matrix. That is, the field information corresponding to the template is compared with the position information to be recognized and then further corrected by the transformation matrix, so that the bill picture to be processed is displayed upright, facilitating bill recognition.
Exemplarily, the construction module 206 is further configured to:
acquire the first frame coordinates of the first position information and the third frame coordinates of the first target position information.
Specifically, with the coordinate systems established above, a plurality of first frame coordinates in the first position information and a plurality of third frame coordinates in the first target position information are acquired.
transform the first frame coordinates to obtain the third frame coordinates.
Specifically, each first frame coordinate is put in one-to-one correspondence with a third frame coordinate, a first coordinate matrix and a second coordinate matrix are built, and the first coordinate matrix is subjected to transformation operations such as rotation, translation, scaling and transposition to obtain the second coordinate matrix.
establish the transformation matrix according to the transformation relationship between the first frame coordinates and the third frame coordinates.
Specifically, the affine matrix is established from the transformation steps. This can be understood as the first coordinate matrix yielding the second coordinate matrix through a linear or nonlinear transformation, with the affine matrix established according to that transformation.
The calculation module 208 is configured to calculate the second position information by means of the transformation matrix to obtain the second target position information of the second recognition area in the bill picture to be processed.
Specifically, the second reference field area and the second recognition area refer to the same region but may differ in size. From the affine transformation between the text strips, the second target position information can be inferred from the second position information, i.e., an affine transformation is applied to the second position information to obtain the second target position information.
The adjustment and recognition module 210 is configured to adjust the second target position information according to the overlap value of the first position information and the second position information to obtain the area to be recognized, and to recognize the text in the area to be recognized by means of the text recognition model to obtain the recognition result.
Specifically, the IOU (Intersection over Union, overlap) value between the first reference field area and the second reference field area in the template bill picture is used to adjust the IOU value between the first target position area and the second target position area in the bill picture to be processed, thereby adjusting the second target position information and cropping out the area to be recognized. The IOU is computed as: IOU = overlap area / (prediction area + ground-truth area - overlap area); the IOU is thus a ratio, the intersection over the union. The numerator is the overlap area between the prediction box and the ground truth; the denominator is the union area, or more simply, the total area covered by the prediction box and the ground truth. The ratio of the overlap area to the union area is the IOU. In other words, the IOU value of the first position information and the second position information of the template bill picture is computed first; the target IOU value of the first target position information and the second target position information is computed next; finally, according to the two IOU values, the second target position information is adjusted so that the target IOU value approaches the template IOU value. The text recognition module then recognizes the content of the area to be recognized to obtain the recognition result.
Exemplarily, when the content of the area to be recognized is recognized by the text recognition model, specific data may be adjusted by regular expressions before being output. For example, the four specific fields "pure number", "lowercase amount", "uppercase amount" and "date" in the bill are output in standardized form. Adjusting via a regular expression: if the date recognized in the bill is April 22, 2020, the standardized output is 2020-04-22.
Exemplarily, the adjustment and recognition module 210 is further configured to:
acquire the overlap value of the first position information and the second position information.
Specifically, the first frame coordinates of each first reference field area and the second frame coordinates of each second reference field area are acquired, and the IOU value is computed from the first frame coordinates and the second frame coordinates.
calculate the target overlap value of the first target position information and the second target position information.
Specifically, the third frame coordinates of the first target position information and the fourth frame coordinates of the second target position information are acquired, and the target overlap value of the third frame coordinates and the fourth frame coordinates is computed according to the overlap-value formula.
adjust the second target position information so that the target overlap value equals the overlap value, obtaining the area to be recognized.
Specifically, the second target position information is adjusted according to the IOU value of the template bill picture so that the area to be recognized is framed more precisely; for example, once the difference between the IOU values falls within a preset threshold, the adjustment ends and the area to be recognized is cropped out.
Embodiment 3
Referring to FIG. 8, a schematic diagram of the hardware architecture of the computer device of Embodiment 3 of this application is shown. In this embodiment, the computer device 2 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server or a cabinet server (including an independent server, or a server cluster composed of multiple servers), etc. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. As shown in FIG. 8, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23 and the bill recognition system 20, which can be communicatively connected to one another through a system bus. Among them:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the bill recognition system 20 of Embodiment 2. In addition, the memory 21 may also be used to temporarily store various kinds of data that have been or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the bill recognition system 20, so as to implement the bill recognition method of Embodiment 1.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the server 2 and other electronic devices. For example, the network interface 23 is used to connect the server 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between the server 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be pointed out that FIG. 8 only shows the computer device 2 with components 20-23, but it should be understood that implementing all of the components shown is not required; more or fewer components may be implemented instead.
In this embodiment, the bill recognition system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete this application.
For example, FIG. 7 shows a schematic diagram of the program modules implementing Embodiment 2 of the bill recognition system 20. In that embodiment, the bill recognition system 20 may be divided into the acquisition module 200, the recognition module 202, the detection module 204, the construction module 206, the calculation module 208 and the adjustment and recognition module 210. The program modules referred to in this application are a series of computer-readable instructions capable of completing specific functions, better suited than a program to describing the execution of the bill recognition system 20 in the computer device 2. The specific functions of the program modules 200-210 have been described in detail in Embodiment 2 and are not repeated here.
Embodiment 4
This embodiment further provides a computer-readable storage medium, which may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, server, app store, etc., on which computer-readable instructions are stored that implement the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used to store computer-readable instructions for bill recognition which, when executed by a processor, implement the bill recognition method of Embodiment 1.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
The serial numbers of the above embodiments of this application are for description only and do not represent the merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A bill recognition method, comprising:
    acquiring a template bill picture, the template bill picture comprising a framed first reference field area and a framed second reference field area;
    inputting the template bill picture into a text recognition model to recognize text in the first reference field area, and outputting, by a detection model, first position information of the first reference field area and second position information of the second reference field area;
    receiving a bill picture to be processed, inputting the bill picture to be processed into the text recognition model to recognize target text matching the text, and framing and marking, by the detection model, a first recognition area and first target position information of the target text;
    establishing a transformation matrix according to the first position information and the first target position information;
    calculating the second position information by means of the transformation matrix to obtain second target position information of a second recognition area in the bill picture to be processed; and
    adjusting the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and recognizing text in the area to be recognized by means of the text recognition model to obtain a recognition result.
  2. The bill recognition method according to claim 1, wherein acquiring the template bill picture, the template bill picture comprising the framed first reference field area and the framed second reference field area, comprises:
    selecting a bill picture in a standard picture format as the template bill picture; and
    framing the field content of a text strip as the first reference field area and the data content corresponding to the text strip as the second reference field area.
  3. The bill recognition method according to claim 1, wherein outputting, by the detection model, the first position information of the first reference field area and the second position information of the second reference field area comprises:
    performing frame recognition on the first reference field area and the second reference field area by means of the detection model to obtain first frame vertices of the first reference field area and second frame vertices corresponding to the second reference field area; and
    establishing a coordinate system by taking any vertex of the template bill picture as the coordinate origin to obtain first frame coordinates of the first frame vertices and second frame coordinates of the second frame vertices, wherein the first position information comprises a plurality of first frame coordinates and the second position information comprises a plurality of second frame coordinates.
  4. The bill recognition method according to claim 1, wherein framing and marking, by the detection model, the first recognition area and the first target position information of the target text comprises:
    framing the first recognition area by means of the detection model to obtain a plurality of third frame vertices; and
    establishing a coordinate system by taking any vertex of the bill picture to be processed as the coordinate origin to obtain third frame coordinates of the third frame vertices, wherein the first target position information comprises a plurality of third frame coordinates.
  5. The bill recognition method according to claim 4, wherein establishing the transformation matrix according to the first position information and the first target position information comprises:
    acquiring the first frame coordinates of the first position information and the third frame coordinates of the first target position information;
    transforming the first frame coordinates to obtain the third frame coordinates; and
    establishing the transformation matrix according to the transformation relationship between the first frame coordinates and the third frame coordinates.
  6. The bill recognition method according to claim 1, wherein adjusting the second target position information according to the overlap value of the first position information and the second position information to obtain the area to be recognized comprises:
    acquiring the overlap value of the first position information and the second position information;
    calculating a target overlap value of the first target position information and the second target position information; and
    adjusting the second target position information so that the target overlap value equals the overlap value, obtaining the area to be recognized.
  7. The bill recognition method according to claim 1, further comprising:
    uploading the recognition result to a blockchain for storage.
  8. A bill recognition system, comprising:
    an acquisition module, configured to acquire a template bill picture, the template bill picture comprising a framed first reference field area and a framed second reference field area;
    a recognition module, configured to input the template bill picture into a text recognition model to recognize text in the first reference field area, and to output, by a detection model, first position information of the first reference field area and second position information of the second reference field area;
    a detection module, configured to receive a bill picture to be processed, input the bill picture to be processed into the text recognition model to recognize target text matching the text, and frame and mark, by the detection model, a first recognition area and first target position information of the target text;
    a construction module, configured to establish a transformation matrix according to the first position information and the first target position information;
    a calculation module, configured to calculate the second position information by means of the transformation matrix to obtain second target position information of a second recognition area in the bill picture to be processed; and
    an adjustment and recognition module, configured to adjust the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and to recognize text in the area to be recognized by means of the text recognition model to obtain a recognition result.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions executable on the processor, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring a template bill picture, the template bill picture comprising a framed first reference field area and a framed second reference field area;
    inputting the template bill picture into a text recognition model to recognize text in the first reference field area, and outputting, by a detection model, first position information of the first reference field area and second position information of the second reference field area;
    receiving a bill picture to be processed, inputting the bill picture to be processed into the text recognition model to recognize target text matching the text, and framing and marking, by the detection model, a first recognition area and first target position information of the target text;
    establishing a transformation matrix according to the first position information and the first target position information;
    calculating the second position information by means of the transformation matrix to obtain second target position information of a second recognition area in the bill picture to be processed; and
    adjusting the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and recognizing text in the area to be recognized by means of the text recognition model to obtain a recognition result.
  10. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    selecting a bill picture in a standard picture format as the template bill picture; and
    framing the field content of a text strip as the first reference field area and the data content corresponding to the text strip as the second reference field area.
  11. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    performing frame recognition on the first reference field area and the second reference field area by means of the detection model to obtain first frame vertices of the first reference field area and second frame vertices corresponding to the second reference field area; and
    establishing a coordinate system by taking any vertex of the template bill picture as the coordinate origin to obtain first frame coordinates of the first frame vertices and second frame coordinates of the second frame vertices, wherein the first position information comprises a plurality of first frame coordinates and the second position information comprises a plurality of second frame coordinates.
  12. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    framing the first recognition area by means of the detection model to obtain a plurality of third frame vertices; and
    establishing a coordinate system by taking any vertex of the bill picture to be processed as the coordinate origin to obtain third frame coordinates of the third frame vertices, wherein the first target position information comprises a plurality of third frame coordinates.
  13. The computer device according to claim 12, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring the first frame coordinates of the first position information and the third frame coordinates of the first target position information;
    transforming the first frame coordinates to obtain the third frame coordinates; and
    establishing the transformation matrix according to the transformation relationship between the first frame coordinates and the third frame coordinates.
  14. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring the overlap value of the first position information and the second position information;
    calculating a target overlap value of the first target position information and the second target position information; and
    adjusting the second target position information so that the target overlap value equals the overlap value, obtaining the area to be recognized.
  15. A computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the following steps:
    acquiring a template bill picture, the template bill picture comprising a framed first reference field area and a framed second reference field area;
    inputting the template bill picture into a text recognition model to recognize text in the first reference field area, and outputting, by a detection model, first position information of the first reference field area and second position information of the second reference field area;
    receiving a bill picture to be processed, inputting the bill picture to be processed into the text recognition model to recognize target text matching the text, and framing and marking, by the detection model, a first recognition area and first target position information of the target text;
    establishing a transformation matrix according to the first position information and the first target position information;
    calculating the second position information by means of the transformation matrix to obtain second target position information of a second recognition area in the bill picture to be processed; and
    adjusting the second target position information according to an overlap value of the first position information and the second position information to obtain an area to be recognized, and recognizing text in the area to be recognized by means of the text recognition model to obtain a recognition result.
  16. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are executable by the at least one processor to cause the at least one processor to further perform the following steps:
    selecting a bill picture in a standard picture format as the template bill picture; and
    framing the field content of a text strip as the first reference field area and the data content corresponding to the text strip as the second reference field area.
  17. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are executable by the at least one processor to cause the at least one processor to further perform the following steps:
    performing frame recognition on the first reference field area and the second reference field area by means of the detection model to obtain first frame vertices of the first reference field area and second frame vertices corresponding to the second reference field area; and
    establishing a coordinate system by taking any vertex of the template bill picture as the coordinate origin to obtain first frame coordinates of the first frame vertices and second frame coordinates of the second frame vertices, wherein the first position information comprises a plurality of first frame coordinates and the second position information comprises a plurality of second frame coordinates.
  18. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are executable by the at least one processor to cause the at least one processor to further perform the following steps:
    framing the first recognition area by means of the detection model to obtain a plurality of third frame vertices; and
    establishing a coordinate system by taking any vertex of the bill picture to be processed as the coordinate origin to obtain third frame coordinates of the third frame vertices, wherein the first target position information comprises a plurality of third frame coordinates.
  19. The computer-readable storage medium according to claim 18, wherein the computer-readable instructions are executable by the at least one processor to cause the at least one processor to further perform the following steps:
    acquiring the first frame coordinates of the first position information and the third frame coordinates of the first target position information;
    transforming the first frame coordinates to obtain the third frame coordinates; and
    establishing the transformation matrix according to the transformation relationship between the first frame coordinates and the third frame coordinates.
  20. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are executable by the at least one processor to cause the at least one processor to further perform the following steps:
    acquiring the overlap value of the first position information and the second position information;
    calculating a target overlap value of the first target position information and the second target position information; and
    adjusting the second target position information so that the target overlap value equals the overlap value, obtaining the area to be recognized.
PCT/CN2021/109726 2020-09-17 2021-07-30 Bill recognition method, system, computer device and computer-readable storage medium WO2022057471A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010977474.7A CN111931784B (zh) 2020-09-17 2020-09-17 Bill recognition method, system, computer device and computer-readable storage medium
CN202010977474.7 2020-09-17

Publications (1)

Publication Number Publication Date
WO2022057471A1 true WO2022057471A1 (zh) 2022-03-24

Family

ID=73333846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109726 WO2022057471A1 (zh) 2020-09-17 2021-07-30 票据识别方法、系统、计算机设备与计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN111931784B (zh)
WO (1) WO2022057471A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497114A (zh) * 2022-11-18 2022-12-20 中国烟草总公司四川省公司 Structured information extraction method for cigarette logistics receipt bills
CN116246294A (zh) * 2022-12-05 2023-06-09 连连(杭州)信息技术有限公司 Image information recognition method and apparatus, storage medium and electronic device

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931784B (zh) * 2020-09-17 2021-01-01 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium
CN112381153A (zh) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Bill classification method, apparatus and computer device
CN112597987B (zh) * 2020-11-17 2023-08-04 北京百度网讯科技有限公司 Paper data digitization method and apparatus, electronic device, and storage medium
CN112541443B (zh) * 2020-12-16 2024-05-10 平安科技(深圳)有限公司 Invoice information extraction method and apparatus, computer device and storage medium
CN112669515B (zh) * 2020-12-28 2022-09-27 上海斑马来拉物流科技有限公司 Bill image recognition method and apparatus, electronic device and storage medium
CN112633279A (zh) * 2020-12-31 2021-04-09 北京市商汤科技开发有限公司 Text recognition method, apparatus and system
CN112836632B (zh) * 2021-02-02 2023-04-07 浪潮云信息技术股份公司 Implementation method and system for custom-template text recognition
CN113485618A (zh) * 2021-07-05 2021-10-08 上海商汤临港智能科技有限公司 Method for generating a custom recognition template, certificate recognition method, and apparatus
CN113723069A (zh) * 2021-09-03 2021-11-30 北京房江湖科技有限公司 File detection method and system, machine-readable storage medium and electronic device
CN113723347B (zh) * 2021-09-09 2023-11-07 京东科技控股股份有限公司 Information extraction method and apparatus, electronic device and storage medium
CN113920513B (zh) * 2021-12-15 2022-04-19 中电云数智科技有限公司 Text recognition method and device based on a custom universal template

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170109574A1 (en) * 2013-03-15 2017-04-20 Mitek Systems, Inc. Systems and methods for capturing critical fields from a mobile image of a credit card bill
CN109948135A (zh) * 2019-03-26 2019-06-28 厦门商集网络科技有限责任公司 Method and device for normalizing images based on table features
CN110263616A (zh) * 2019-04-29 2019-09-20 五八有限公司 Text recognition method and apparatus, electronic device and storage medium
CN111126125A (zh) * 2019-10-15 2020-05-08 平安科技(深圳)有限公司 Method, apparatus and device for extracting target text from a certificate, and readable storage medium
CN111178365A (zh) * 2019-12-31 2020-05-19 五八有限公司 Method and apparatus for recognizing text in pictures, electronic device and storage medium
CN111582021A (zh) * 2020-03-26 2020-08-25 平安科技(深圳)有限公司 Text detection method and apparatus for scene images, and computer device
CN111931784A (zh) * 2020-09-17 2020-11-13 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109426814B (zh) * 2017-08-22 2023-02-24 顺丰科技有限公司 Method, system and device for locating and recognizing specific sections of an invoice picture
CN109117814B (zh) * 2018-08-27 2020-11-03 京东数字科技控股有限公司 Image processing method and apparatus, electronic device and medium
CN111209856B (zh) * 2020-01-06 2023-10-17 泰康保险集团股份有限公司 Invoice information recognition method and apparatus, electronic device and storage medium
CN111444795A (zh) * 2020-03-13 2020-07-24 安诚迈科(北京)信息技术有限公司 Bill data recognition method, electronic device, storage medium and apparatus
CN111444792B (zh) * 2020-03-13 2023-05-09 安诚迈科(北京)信息技术有限公司 Bill recognition method, electronic device, storage medium and apparatus
CN111476109A (зh) * 2020-03-18 2020-07-31 深圳中兴网信科技有限公司 Bill processing method, bill processing apparatus and computer-readable storage medium
CN111462388A (zh) * 2020-03-19 2020-07-28 广州市玄武无线科技股份有限公司 Bill verification method and apparatus, terminal device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497114A (zh) * 2022-11-18 2022-12-20 中国烟草总公司四川省公司 Structured information extraction method for cigarette logistics receipt bills
CN115497114B (zh) * 2022-11-18 2024-03-12 中国烟草总公司四川省公司 Structured information extraction method for cigarette logistics receipt bills
CN116246294A (zh) * 2022-12-05 2023-06-09 连连(杭州)信息技术有限公司 Image information recognition method and apparatus, storage medium, and electronic device
CN116246294B (zh) * 2022-12-05 2024-04-09 连连(杭州)信息技术有限公司 Image information recognition method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN111931784B (zh) 2021-01-01
CN111931784A (zh) 2020-11-13

Similar Documents

Publication Publication Date Title
WO2022057471A1 (zh) Bill recognition method, system, computer device and computer-readable storage medium
US7886219B2 (en) Automatic form generation
CN110675546B (zh) Invoice image recognition and verification method, system, device, and readable storage medium
WO2022048211A1 (zh) Document table-of-contents generation method and apparatus, electronic device, and readable storage medium
CN108768929B (zh) Electronic device, method for parsing credit-reference feedback messages, and storage medium
CN110874618B (zh) Few-sample-based OCR template learning method and apparatus, electronic device, and medium
US20080205742A1 (en) Generation of randomly structured forms
CN111695439A (zh) Image structured data extraction method, electronic apparatus, and storage medium
CN112699775A (zh) Deep-learning-based certificate recognition method, apparatus, device, and storage medium
CN111639648B (zh) Certificate recognition method and apparatus, computing device, and storage medium
CN112712014B (zh) Table image structure parsing method, system, device, and readable storage medium
CN113837151B (zh) Table image processing method and apparatus, computer device, and readable storage medium
CN111325104A (zh) Text recognition method and apparatus, and storage medium
CN112541443B (zh) Invoice information extraction method and apparatus, computer device, and storage medium
CN111931771B (zh) Bill content recognition method and apparatus, medium, and electronic device
CN115758451A (zh) Artificial-intelligence-based data annotation method, apparatus, device, and storage medium
CN112418206B (zh) Image classification method based on a position detection model, and related devices
CN113837113A (zh) Artificial-intelligence-based document verification method, apparatus, device, and medium
CN112581344A (zh) Image processing method and apparatus, computer device, and storage medium
CN117133006A (zh) Document verification method and apparatus, computer device, and storage medium
WO2021151274A1 (zh) Image document processing method and apparatus, electronic device, and computer-readable storage medium
CN114169306A (zh) Method, apparatus and device for generating electronic receipts, and readable storage medium
CN112035774A (zh) Web page generation method and apparatus, computer device, and readable storage medium
CN113435331B (zh) Image text recognition method, system, electronic device, and storage medium
CN117115839B (зh) Invoice field recognition method and apparatus based on a self-recurrent neural network

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21868300

Country of ref document: EP

Kind code of ref document: A1