CN113869131B - Method for structuring text business license picture - Google Patents

Info

Publication number
CN113869131B
Authority
CN
China
Prior art keywords
business license
text
field
picture
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111023703.2A
Other languages
Chinese (zh)
Other versions
CN113869131A (en)
Inventor
穆宁
郭涛远
李磊
朱和军
王康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fenghuo Tiandi Communication Technology Co ltd
Original Assignee
Nanjing Fenghuo Tiandi Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Fenghuo Tiandi Communication Technology Co ltd
Priority to CN202111023703.2A
Publication of CN113869131A
Application granted
Publication of CN113869131B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method for structuring a business license picture into text, relating to the technical field of structured text processing of pictures. The method uses four stages: business license target detection, business license text detection, business license text recognition, and structured output of the text content.

Description

Method for structuring text business license picture
Technical Field
The invention relates to the technical field of structured text processing of pictures, and in particular to a method for structuring text business license pictures.
Background
Business license recognition hardware and methods are offered by many vendors on the market. Most are deployed as server interfaces called over the internet, and they have a large application market in banks, tax authorities and police departments.
Most of the prior art requires a fixed shooting posture and a fixed proportion of the picture occupied by the business license. This seriously degrades the user experience and is unfriendly to hand-held capture by police officers outdoors and to calls made by users in natural scenes.
Most of the prior art treats text recognition as the core point, ignores the importance of structured output fields, and so misses the real requirement of business license scenarios.
Another scheme currently on the market cuts out the field pictures in step 3 and then uses a deep classification network to classify them into the different field categories. That scheme is slightly less effective than the present scheme, and it is also slower because it runs one additional deep network classification.
Yet another scheme on the market uses an NLP algorithm to directly classify the recognized character strings output in step 5. However, the cost of formulating rules in NLP is high and the scheme lacks generality: it can only be used for business licenses, and the rules must be formulated again when it is migrated to other certificates.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for structuring a text business license picture that can automatically detect multiple business licenses across multiple scenes and intelligently extract their text content in structured form, helping to quickly look up and collect the business license information of companies being entered into a database and improving working efficiency.
The invention adopts the following technical scheme to solve the technical problem:
A method for structuring a text business license picture specifically comprises the following steps:
step 1, selecting N labeled business license pictures as training samples, and obtaining a business license detection model, a field detection model and a field identification model through training, wherein $N > 1000$;
step 2, passing the sample to be identified through the business license detection model to output the four-point coordinates $(x, y)$ and the rotation angle $\theta_i$ of each business license, where $i$ ranges over the $k$ business licenses and $k$ represents the total number of business licenses contained in the sample; each business license is subjected to the subsequent operations separately, and the four-point coordinates are ordered clockwise starting from the lower-left corner;
step 3, cropping the business license image with the four-point coordinates output by the business license detection model in step 2 to obtain a quadrilateral matrix, and obtaining $k$ foreground pictures that each contain only one business license;
step 4, detecting the foreground picture with the field detection model and, if the detection succeeds, obtaining the four-point coordinate positions $(A, B)$ of $n$ texts and their field categories $\delta$, where $n$ represents the total number of texts in the current foreground picture,
$(A, B) = [(a_1, b_1), (a_2, b_2), (a_3, b_3), (a_4, b_4)]$
field class $\delta = \{C_0 : F_0, \ldots, C_i : F_i, \ldots, C_{t\_k} : F_{t\_k}\}$, where $t\_k$ is the threshold top_k, meaning that the top_k highest-ranked text categories $C_i$ are taken, and $F_i$ represents the score output by the network,
the text quadrilateral coordinates are used to cut the foreground picture into $n$ text rectangular pictures by the same perspective transformation operation as in step 3, where each text rectangular picture $M_i$ corresponds one-to-one with a field class $\delta_i$;
step 5, passing the text rectangular pictures obtained in step 4 through the OCR text box recognition model to obtain $n$ text field character strings;
step 6, jointly discriminating on the text field content $S_i$ and the field class $\delta_i$ to obtain the final Class, where the joint discrimination is specifically as follows:
for each field class $\delta_i$, if $F_i > 0.9$, the confidence of the output category is high enough, and $\text{Class} = C_i$;
otherwise, calculating the boundary distances $D = \{D_0, \ldots, D_i, \ldots, D_{t\_k}\}$ from the text field content $S_i$ to the top_k text categories $\{F_0, \ldots, F_i, \ldots, F_{t\_k}\}$, taking the position of the minimum boundary distance, $\operatorname{arg\_min}(D) = d_m$, and obtaining the text category $C_{d_m}$ from that position, so that $\text{Class} = C_{d_m}$;
step 7, converting the picture sample containing the business license into a structured character string for output.
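The seven steps above can be sketched as a single pipeline function. This is a minimal sketch under stated assumptions, not the patent's implementation: the three trained models and the cropping routine are passed in as callables, and every name here (`structure_picture`, `detect_licenses`, `crop`, `detect_fields`, `recognize`, `decide`) is hypothetical.

```python
from typing import Callable, Dict, List


def structure_picture(
    picture,
    detect_licenses: Callable,  # step 2: picture -> [(quad, theta), ...]
    crop: Callable,             # steps 3 and 4: (picture, quad, theta) -> cropped picture
    detect_fields: Callable,    # step 4: foreground -> [(quad, {category: score}), ...]
    recognize: Callable,        # step 5: text picture -> recognized string
    decide: Callable,           # step 6: (string, {category: score}) -> final category
) -> List[Dict[str, str]]:
    """Return one {category: text} record per business license found in the picture."""
    results = []
    for quad, theta in detect_licenses(picture):
        # Step 3: one foreground picture per detected license.
        foreground = crop(picture, quad, theta)
        record: Dict[str, str] = {}
        for text_quad, scores in detect_fields(foreground):
            # Step 4: cut out the text rectangle (no extra rotation assumed here).
            text_picture = crop(foreground, text_quad, 0.0)
            content = recognize(text_picture)
            # Step 6: joint decision from the recognized string and the category scores.
            record[decide(content, scores)] = content
        results.append(record)  # step 7 would serialize each record
    return results
```

Passing the models in as parameters keeps the control flow testable with stubs and mirrors the claim that the flow can be migrated to other certificate types by swapping the trained models.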
As a further preferred scheme of the method for structuring a text business license picture of the present invention, in step 1 the labels of the business license pictures include: four-point and angle labels of the business license detection box, four-point and category labels of the field detection boxes, and labels of the text content within the field detection boxes;
the output of the business license detection model is the four-point coordinate position $(x, y)$ and the rotation angle $\theta$ of the business license, where $\theta \in [0, 360]$;
the output of the field detection model is the four-point coordinate positions $(A, B)$ of the $n$ texts of the business license and the field class $\delta$, where $C$ represents the collection of all layouts and $C_i$ represents the total number of field categories contained in the $i$-th layout;
the output of the field identification model is the text content within the field boxes obtained by the field detection model.
As a further preferred scheme of the method for structuring a text business license picture of the present invention, in step 3 the calculation formula of the length and width $(h, w)$ of the quadrilateral matrix is specifically:
an affine transformation rotates the quadrilateral matrix by the angle $-\theta_i$, with the affine transformation center point at coordinates $(w/2, h/2)$, and the affine transformation rotation matrix is:
the rotated quadrilateral matrix is converted into a rectangular matrix by projective transformation, where the rectangular coordinates are represented as $[0, w, h]$;
the perspective transformation matrix is:
where $a_{11}$ to $a_{33}$ are the projective transformation parameters, which are obtained from a set of quadrilateral coordinates and rectangular coordinates before and after the transformation.
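For reference, a rotation about a center point and a 3×3 projective transformation have the standard textbook forms sketched below. This is an assumption about notation, not a formula taken from the source: $\varphi = -\theta_i$ and $(c_x, c_y) = (w/2, h/2)$ follow the text, while $(u, v)$ for a point before the transformation, $(x, y)$ for the point after it, the homogeneous scale $s$, and the counterclockwise-positive sign convention are introduced here for illustration.

```latex
\[
R =
\begin{pmatrix}
\cos\varphi & -\sin\varphi & c_x(1-\cos\varphi) + c_y\sin\varphi \\
\sin\varphi & \cos\varphi & c_y(1-\cos\varphi) - c_x\sin\varphi
\end{pmatrix},
\qquad \varphi = -\theta_i,\quad (c_x, c_y) = (w/2,\, h/2)
\]
\[
s \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
\]
```

Because the second matrix is defined only up to scale, it has eight free parameters, which is why four point correspondences (eight equations) between the quadrilateral and the rectangle are enough to determine it.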
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
1. The invention uses four stages: business license target detection, business license text detection, business license text recognition, and structured output of the text content. This flow effectively avoids the pain points of most business license OCR recognition algorithms currently on the market, namely poor robustness to complex backgrounds and to multiple certificates in one picture, and it also makes up for the lack of structured text output in mainstream schemes;
2. The invention provides a business license OCR recognition scheme that has been tested for effectiveness; by combining text detection classification with the text recognition distance, it doubly safeguards the accuracy of the structured parsing of the recognized content;
3. The algorithm flow of the method for structuring a text business license picture is highly general and can be migrated to structured text output for pictures of other certificates, such as tax registration certificates and driver's licenses;
4. The final test results of the invention are higher than those of mainstream schemes on the market, its robustness to complex scenes (hand-held, rotated, distorted, multi-target and the like) is clearly better than that of the prior art, and its structured text output is better suited to users than that of the prior art.
Drawings
FIG. 1 is a flow chart of a method of structuring a textual business license picture in accordance with the present invention;
FIG. 2 is a schematic diagram of a business license of the present invention;
FIG. 3 is a schematic diagram of the test effect of the present invention.
Detailed Description
The technical scheme of the invention is described in further detail below with reference to the accompanying drawings:
In view of the shortcomings of the prior art, the technical problem to be solved by the invention is to provide a method for structuring a text business license picture that can automatically detect multiple business licenses across multiple scenes and intelligently extract their text content in structured form, thereby helping to quickly look up and collect the business license information of companies being entered into a database and improving working efficiency.
A method of structuring a text business license picture comprises the following steps:
Step 1: select N ($N > 1000$) labeled business license pictures as training samples, and obtain a business license detection model, a field detection model and a field identification model through training.
The labels mainly comprise: four-point and angle labels of the business license detection box; four-point and category labels of the field detection boxes; and labels of the text content within the field detection boxes.
The output of the business license detection model is the four-point coordinate position $(x, y)$ and the rotation angle $\theta$ of the business license, where $\theta \in [0, 360]$. The output of the field detection model is the four-point coordinate positions $(A, B)$ of the $n$ texts of the business license and the field class $\delta$, where $C$ represents the collection of all layouts and $C_i$ represents the total number of field categories contained in the $i$-th layout. The output of the field identification model is the text content within the field boxes obtained by the field detection model.
Step 2: the sample to be identified is passed through the business license detection model, which outputs the four-point coordinates $(x, y)$ and the rotation angle $\theta_i$ of each business license, where $i$ ranges over the $k$ business licenses and $k$ represents the total number of business licenses contained in the sample. Each business license is subjected to the subsequent operations separately, and the four-point coordinates are ordered clockwise starting from the lower-left corner.
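The corner ordering used in step 2 can be illustrated with a short sketch. It assumes image coordinates (x to the right, y downward), takes "clockwise" as seen on screen, and picks the lower-left corner as the point with the largest `y - x`; that tie-breaking rule and the function name `order_clockwise_from_lower_left` are assumptions for illustration, not part of the source.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def order_clockwise_from_lower_left(points: List[Point]) -> List[Point]:
    """Order four corner points clockwise on screen, starting at the lower-left corner."""
    cx = sum(p[0] for p in points) / 4
    cy = sum(p[1] for p in points) / 4
    # With y pointing down, increasing atan2 angle walks the corners clockwise on screen.
    ring = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Heuristic: the lower-left corner has small x and large y, so y - x is largest.
    start = max(range(4), key=lambda i: ring[i][1] - ring[i][0])
    return ring[start:] + ring[:start]
```

For an axis-aligned rectangle the result is lower-left, upper-left, upper-right, lower-right. For strongly rotated licenses the heuristic start point may differ from the intended one, which is presumably why the detection model also outputs the rotation angle.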
Step 3: the image is cropped with the four-point coordinates to obtain a quadrilateral matrix, where the calculation formula of the length and width $(h, w)$ of the quadrilateral matrix is as follows:
An affine transformation rotates the quadrilateral matrix by the angle $-\theta_i$, with the affine transformation center point at coordinates $(w/2, h/2)$, and the affine transformation rotation matrix is:
The rotated quadrilateral matrix is converted into a rectangular matrix by projective transformation, where the rectangular coordinates are expressed as $[0, w, h]$.
The perspective transformation matrix is:
where $a_{11}$ to $a_{33}$ are the projective transformation parameters, which can be obtained from a set of quadrilateral coordinates and rectangular coordinates before and after the transformation.
Finally, $k$ foreground pictures that each contain only one business license are obtained, and the following operations are performed on each foreground picture.
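The statement that the projective transformation parameters follow from the quadrilateral and rectangle coordinates can be made concrete with a dependency-free sketch. It assumes the usual normalization $a_{33} = 1$, which leaves eight unknowns determined by four point pairs; the function names are hypothetical, and a production system would more likely call an image-processing library than solve the system by hand.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def perspective_matrix(src, dst):
    """3x3 matrix mapping four src points onto four dst points, with a33 fixed to 1."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # x = (a11 u + a12 v + a13) / (a31 u + a32 v + 1), and likewise for y.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    a = solve_linear(rows, rhs)
    return [a[0:3], a[3:6], [a[6], a[7], 1.0]]


def apply_perspective(matrix, point):
    """Map one point through the matrix and divide by the homogeneous coordinate."""
    u, v = point
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in matrix)
    return (x / w, y / w)
```

With the corners ordered as in step 2, `dst` for a rectangle of width `w` and height `h` in image coordinates would be `[(0, h), (0, 0), (w, 0), (w, h)]`.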
Step 4: the foreground picture is detected with the field detection model. If the detection succeeds, the four-point coordinate positions $(A, B)$ of $n$ texts and their field categories $\delta$ are obtained, where $n$ represents the total number of texts in the current foreground picture,
$(A, B) = [(a_1, b_1), (a_2, b_2), (a_3, b_3), (a_4, b_4)]$
field class $\delta = \{C_0 : F_0, \ldots, C_i : F_i, \ldots, C_{t\_k} : F_{t\_k}\}$, where $t\_k$ is the threshold top_k, meaning that the top_k highest-ranked text categories $C_i$ are taken, and $F_i$ represents the score output by the network.
The text quadrilateral coordinates are used to cut the foreground picture into $n$ text rectangular pictures by the same perspective transformation operation as in step 3, where each text rectangular picture $M_i$ corresponds one-to-one with a field class $\delta_i$. The following operations are performed on the text rectangular pictures.
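The field class $\delta$ of step 4 is a mapping from the top_k categories to their network scores. A minimal sketch of that selection follows; the function name and the example category names are assumptions for illustration.

```python
from typing import Dict


def top_k_categories(scores: Dict[str, float], top_k: int) -> Dict[str, float]:
    """Keep the top_k highest-scoring categories as {category: score}, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:top_k])


delta = top_k_categories({"name": 0.1, "type": 0.7, "residence": 0.2}, top_k=2)
```

Keeping more than one candidate per text box is what allows step 6 to fall back on the recognized string when the best score is not confident enough.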
Step 5: the text rectangular pictures are passed through the OCR text box recognition model to obtain $n$ text field character strings.
Step 6: the text field content $S_i$ and the field class $\delta_i$ are jointly discriminated to obtain the final Class. The joint discrimination is specifically as follows:
For each field class $\delta_i$, if $F_i > 0.9$, the confidence of the output category is high enough, and $\text{Class} = C_i$.
Otherwise, the boundary distances $D = \{D_0, \ldots, D_i, \ldots, D_{t\_k}\}$ from the text field content $S_i$ to the top_k text categories $\{F_0, \ldots, F_i, \ldots, F_{t\_k}\}$ are calculated, the position of the minimum boundary distance, $\operatorname{arg\_min}(D) = d_m$, is taken, and the text category $C_{d_m}$ is obtained from that position, so that $\text{Class} = C_{d_m}$.
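The joint discrimination of step 6 can be sketched as follows. The 0.9 threshold comes from the text; the distance itself is not defined in the source, so this sketch substitutes an edit distance between the start of the recognized string and a hypothetical label keyword per category (`keywords`). That substitution, and all function names, are assumptions.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def joint_class(content: str, scores: Dict[str, float],
                keywords: Dict[str, str], threshold: float = 0.9) -> str:
    """Pick the final category from the detector scores and the recognized string."""
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best  # detector is confident enough on its own
    # Fallback: distance from the recognized string to each top_k candidate
    # (stand-in for the patent's undefined "boundary distance").
    distances = {
        c: edit_distance(content[: len(keywords[c])], keywords[c]) for c in scores
    }
    return min(distances, key=distances.get)
```

For example, with scores `{"name": 0.5, "type": 0.45}` and the recognized string `"Type: limited liability company"`, the detector alone would choose `name`, while the fallback distance selects `type`.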
Step 7: through all of the above operations, the invention converts a picture sample containing a business license into a structured character string for output. For the business license shown in FIG. 2, the output is as follows:
Unified social credit code: 913**************** (1/1)
Name: Yangzhou corporation
Type: limited liability company
Residence: Baozi county ×
Legal representative: Zhang ×
Registered capital: 1,000,000 yuan
Date of establishment: October 22, 2015
Business term: October 22, 2015
Business scope: processing and sale of glass-fiber-reinforced-plastic yachts and glass-fiber-reinforced-plastic public municipal facilities; sale of yacht auxiliary facilities and electrical equipment; ship design, research and development; self-operated and agency import and export of various commodities, except commodities whose operation by enterprises is restricted or whose import and export is prohibited by the state. (Projects subject to approval by law may be operated only after approval by the relevant departments.)
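The final serialization in step 7 amounts to emitting the classified fields in a fixed order. A minimal sketch, assuming each license has already been reduced to a `{field name: text}` record; the function name and the field order list are illustrative assumptions.

```python
from typing import Dict, List


def to_structured_string(record: Dict[str, str], field_order: List[str]) -> str:
    """Render a record as 'Field: value' lines in the given order, skipping missing fields."""
    lines = [f"{name}: {record[name]}" for name in field_order if name in record]
    return "\n".join(lines)


order = ["Name", "Type", "Registered capital"]
print(to_structured_string({"Type": "limited liability company", "Name": "X"}, order))
```

Fields that the detector did not find are simply omitted rather than emitted empty, which is one possible design choice; a downstream database might instead prefer explicit empty values.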
To verify the feasibility of the algorithm flow provided by the invention, a comparative test against competing products was carried out. The test sample is 100 pictures containing business licenses, the test indices are field accuracy and character accuracy, and they are calculated as follows:
The test results are shown in FIG. 3.
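One common way to compute the two indices named above is sketched here. These definitions are assumptions, not the patent's formulas: field accuracy is taken as the share of reference fields reproduced exactly, and character accuracy as one minus the total edit distance divided by the total number of reference characters.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def field_accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Share of reference fields whose predicted text matches exactly."""
    correct = sum(predicted.get(k) == v for k, v in reference.items())
    return correct / len(reference)


def character_accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """One minus total character errors over total reference characters."""
    errors = sum(edit_distance(predicted.get(k, ""), v) for k, v in reference.items())
    total = sum(len(v) for v in reference.values())
    return 1 - errors / total
```

Under these definitions a field assigned to the wrong category counts as a field error even when its characters were recognized correctly, which matches the document's emphasis on structured output over raw recognition.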
The invention uses four stages: business license target detection, business license text detection, business license text recognition, and structured output of the text content. This flow effectively avoids the pain points of most business license OCR recognition algorithms currently on the market, namely poor robustness to complex backgrounds and to multiple certificates in one picture, and it also makes up for the lack of structured text output in mainstream schemes.
The invention provides a business license OCR recognition scheme that has been tested for effectiveness; by combining text detection classification with the text recognition distance, it doubly safeguards the accuracy of the structured parsing of the recognized content.
The algorithm flow of the method for structuring a text business license picture is highly general and can be migrated to structured text output for pictures of other certificates, such as tax registration certificates and driver's licenses.
The final test results of the invention are higher than those of mainstream schemes on the market, its robustness to complex scenes (hand-held, rotated, distorted, multi-target and the like) is clearly better than that of the prior art, and its structured text output is better suited to users than that of the prior art.

Claims (3)

1. A method for structuring a text business license picture, characterized in that the method specifically comprises the following steps:
step 1, selecting N labeled business license pictures as training samples, and obtaining a business license detection model, a field detection model and a field identification model through training, wherein $N > 1000$;
step 2, passing the sample to be identified through the business license detection model to output the four-point coordinates $(x, y)$ and the rotation angle $\theta_i$ of each business license, where $i$ ranges over the $k$ business licenses and $k$ represents the total number of business licenses contained in the sample; each business license is subjected to the subsequent operations separately, and the four-point coordinates are ordered clockwise starting from the lower-left corner;
step 3, cropping the business license image with the four-point coordinates output by the business license detection model in step 2 to obtain a quadrilateral matrix, and obtaining $k$ foreground pictures that each contain only one business license;
step 4, detecting the foreground picture with the field detection model and, if the detection succeeds, obtaining the four-point coordinate positions $(A, B)$ of $n$ texts and their field categories $\delta$, where $n$ represents the total number of texts in the current foreground picture,
$(A, B) = [(a_1, b_1), (a_2, b_2), (a_3, b_3), (a_4, b_4)]$
field class $\delta = \{C_0 : F_0, \ldots, C_i : F_i, \ldots, C_{t\_k} : F_{t\_k}\}$, where $t\_k$ is the threshold top_k, meaning that the top_k highest-ranked text categories $C_i$ are taken, and $F_i$ represents the score output by the network,
the text quadrilateral coordinates are used to cut the foreground picture into $n$ text rectangular pictures by the same perspective transformation operation as in step 3, where each text rectangular picture $M_i$ corresponds one-to-one with a field class $\delta_i$;
step 5, passing the text rectangular pictures obtained in step 4 through the OCR text box recognition model to obtain $n$ text field character strings;
step 6, jointly discriminating on the text field content $S_i$ and the field class $\delta_i$ to obtain the final Class, where the joint discrimination is specifically as follows:
for each field class $\delta_i$, if $F_i > 0.9$, the confidence of the output category is high enough, and $\text{Class} = C_i$;
otherwise, calculating the boundary distances $D = \{D_0, \ldots, D_i, \ldots, D_{t\_k}\}$ from the text field content $S_i$ to the top_k text categories $\{F_0, \ldots, F_i, \ldots, F_{t\_k}\}$, taking the position of the minimum boundary distance, $\operatorname{arg\_min}(D) = d_m$, and obtaining the text category $C_{d_m}$ from that position, so that $\text{Class} = C_{d_m}$;
step 7, converting the picture sample containing the business license into a structured character string for output.
2. The method for structuring a text business license picture of claim 1, wherein in step 1 the labels of the business license pictures include: four-point and angle labels of the business license detection box, four-point and category labels of the field detection boxes, and labels of the text content within the field detection boxes;
the output of the business license detection model is the four-point coordinate position $(x, y)$ and the rotation angle $\theta$ of the business license, where $\theta \in [0, 360]$;
the output of the field detection model is the four-point coordinate positions $(A, B)$ of the $n$ texts of the business license and the field class $\delta$, where $C$ represents the collection of all layouts and $C_i$ represents the total number of field categories contained in the $i$-th layout;
the output of the field identification model is the text content within the field boxes obtained by the field detection model.
3. The method for structuring a text business license picture of claim 1, wherein in step 3 the calculation formula of the length and width $(h, w)$ of the quadrilateral matrix is specifically:
an affine transformation rotates the quadrilateral matrix by the angle $-\theta_i$, with the affine transformation center point at coordinates $(w/2, h/2)$, and the affine transformation rotation matrix is:
the rotated quadrilateral matrix is converted into a rectangular matrix by projective transformation, where the rectangular coordinates are represented as $[0, w, h]$;
the perspective transformation matrix is:
where $a_{11}$ to $a_{33}$ are the projective transformation parameters, which are obtained from a set of quadrilateral coordinates and rectangular coordinates before and after the transformation.
CN202111023703.2A 2021-09-01 2021-09-01 Method for structuring text business license picture Active CN113869131B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111023703.2A CN113869131B (en) 2021-09-01 2021-09-01 Method for structuring text business license picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111023703.2A CN113869131B (en) 2021-09-01 2021-09-01 Method for structuring text business license picture

Publications (2)

Publication Number Publication Date
CN113869131A (en) 2021-12-31
CN113869131B (en) 2024-03-29

Family

ID=78989119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111023703.2A Active CN113869131B (en) 2021-09-01 2021-09-01 Method for structuring text business license picture

Country Status (1)

Country Link
CN (1) CN113869131B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN110889402A (en) * 2019-11-04 2020-03-17 广州丰石科技有限公司 Business license content identification method and system based on deep learning
CN112668335A (en) * 2020-12-21 2021-04-16 广州市申迪计算机系统有限公司 Method for identifying and extracting business license structured information by using named entity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于深度学习技术的图片文字提取技术的研究;蒋良卫;黄玉柱;邓芙蓉;;信息系统工程;20200331(第03期);第87-88页 *

Also Published As

Publication number Publication date
CN113869131A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
JP5522408B2 (en) Pattern recognition device
CN109800698B (en) Icon detection method based on deep learning, icon detection system and storage medium
CN111401410B (en) Traffic sign detection method based on improved cascade neural network
CN110569832A (en) text real-time positioning and identifying method based on deep learning attention mechanism
CN111931664A (en) Mixed note image processing method and device, computer equipment and storage medium
CN110442744A (en) Extract method, apparatus, electronic equipment and the readable medium of target information in image
CN110569878A (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN111476210B (en) Image-based text recognition method, system, device and storage medium
WO2020233611A1 (en) Method and device for recognizing image information bearing medium, computer device and medium
CN115294150A (en) Image processing method and terminal equipment
CN110647956A (en) Invoice information extraction method combined with two-dimensional code recognition
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN111753923A (en) Intelligent photo album clustering method, system, equipment and storage medium based on human face
CN112418206B (en) Picture classification method based on position detection model and related equipment thereof
CN113869131B (en) Method for structuring text business license picture
CN110674678A (en) Method and device for identifying sensitive mark in video
US20230132261A1 (en) Unified framework for analysis and recognition of identity documents
CN114359931A (en) Express bill identification method and device, computer equipment and storage medium
CN114359912B (en) Software page key information extraction method and system based on graph neural network
Ahmed et al. A generic method for automatic ground truth generation of camera-captured documents
CN115439850A (en) Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN114241485A (en) Information identification method, device, equipment and storage medium of property certificate
CN115035032A (en) Neural network training method, related method, device, terminal and storage medium
CN114547437A (en) Image retrieval method and device
CN113269045A (en) Chinese artistic word detection and recognition method under natural scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant