WO2021051553A1 - Certificate information classification and positioning method and apparatus - Google Patents

Certificate information classification and positioning method and apparatus

Info

Publication number
WO2021051553A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
detection
boxes
text line
classification
Prior art date
Application number
PCT/CN2019/117550
Other languages
French (fr)
Chinese (zh)
Inventor
黄泽浩
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051553A1 publication Critical patent/WO2021051553A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • This application relates to the field of computer technology, and in particular to a method and apparatus for classifying and positioning certificate information.
  • Classification and positioning of the card-face information of ID cards, bank cards, and similar certificates usually relies on fixed-position extraction of text lines or on general-purpose text detection methods.
  • The former has a limited scope of application and depends heavily on contour extraction and image rectification of the certificate.
  • The latter is slow, and the extracted text must additionally be classified by content, which further reduces accuracy.
  • The embodiments of the present application provide a method and apparatus for classifying and positioning certificate information that broaden the scope of application and increase detection speed.
  • An embodiment of the application provides a method for classifying and positioning certificate information, comprising the following steps: a server detects A pieces of feature information in a first target image using a classification and positioning model based on the YOLO network, extracts A detection boxes, and obtains first border information and first classification labels of the A detection boxes, where the first target image contains a first certificate and A is a positive integer greater than 0; the server then adjusts the border information and classification labels of the A detection boxes according to structured information features of the first certificate, generating second border information and second classification labels of the A detection boxes.
  • An embodiment of the present application also provides an apparatus for classifying and positioning certificate information that achieves the beneficial effects of the above method.
  • The functions of the apparatus can be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes at least one module corresponding to the above functions.
  • Optionally, the apparatus includes a first extraction unit and an adjustment unit.
  • The first extraction unit is configured to detect the A pieces of feature information in the first target image using the YOLO-based classification and positioning model, extract the A detection boxes, and obtain the first border information and first classification labels of the A detection boxes, where the first target image contains the first certificate and A is a positive integer greater than 0.
  • The adjustment unit is configured to adjust the border information and classification labels of the A detection boxes according to the structured information features of the first certificate, generating the second border information and second classification labels of the A detection boxes.
  • An embodiment of the present application also provides a server that achieves the beneficial effects of the above method for classifying and positioning certificate information.
  • The functions of the server can be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes at least one module corresponding to the above functions.
  • The server includes a memory, a processor, and a transceiver. The memory stores a computer program supporting the server in executing the above method, the computer program comprising program instructions; the processor controls and manages the actions of the server according to the program instructions; and the transceiver supports communication between the server and other communication devices.
  • An embodiment of the present application also provides a computer-readable storage medium storing instructions which, when run on a processor, cause the processor to execute the above method for classifying and positioning certificate information.
  • In the embodiments of this application, the server detects the A pieces of feature information in the first target image using the YOLO-based classification and positioning model, extracts the A detection boxes, and obtains their first border information and first classification labels, the first target image containing the first certificate and A being a positive integer greater than 0; the server then adjusts the border information and classification labels of the A detection boxes according to the structured information features of the first certificate, generating their second border information and second classification labels.
  • The proposed solution does not depend on contour extraction or image rectification of the certificate, which broadens its scope of application, and the YOLO-based classification and positioning model effectively increases the detection speed of certificate information classification and positioning.
  • FIG. 1 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for classifying and positioning certificate information provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an apparatus for classifying and positioning certificate information provided by an embodiment of the present application.
  • The server in the embodiments of the present application may be a conventional server capable of undertaking services and guaranteeing service capability, or a terminal device with a processor, hard disk, memory, and system bus structure that can likewise undertake services and guarantee service capability; the embodiments of this application place no specific limitation on this.
  • The YOLO network is a deep residual network. The advantage of a deep residual network over a plain deep network is that shortcut (highway) connections mitigate the vanishing-gradient problem in networks with many layers.
  • In a deep neural network with many layers, some deeper layers may effectively need to simulate an identity mapping, which is hard for a single layer to learn. A residual network therefore uses a shortcut connection to recast the target mapping F(x) = x as F(x) = g(x) + x, i.e. g(x) = F(x) - x; a layer only needs to drive the residual g(x) to 0 to realize the identity mapping, which is much easier to learn.
  • Using a deep residual network thus effectively solves the vanishing-gradient problem when the network is very deep, keeps the error from growing as layers are added, and improves training efficiency.
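As a concrete illustration of the shortcut connection just described, here is a minimal residual block sketch in PyTorch; the channel count, activations, and layer choices are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = g(x) + x, so the layers only have to learn the residual g(x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.g = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The shortcut carries x forward unchanged; if g(x) learns to be 0,
        # the block reduces to the identity mapping.
        return self.act(self.g(x) + x)
```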
  • Please refer to FIG. 1, a schematic diagram of the hardware structure of a server 100 according to an embodiment of the application. The server 100 includes a memory 101, a transceiver 102, and a processor 103 coupled to the memory 101 and the transceiver 102.
  • The memory 101 is configured to store a computer program comprising program instructions; the processor 103 is configured to execute the program instructions stored in the memory 101; and the transceiver 102 is configured to communicate with other devices under the control of the processor 103.
  • When executing the instructions, the processor 103 can carry out the method for classifying and positioning certificate information according to the program instructions.
  • The processor 103 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, transistor logic device, hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the present application.
  • The processor may also be a combination that implements computing functions, for example one or more microprocessors, or a DSP combined with a microprocessor.
  • The transceiver 102 may be a communication interface, a transceiver circuit, or the like; here, communication interface is a general term that may cover one or more interfaces, such as the interface between the server and a terminal.
  • Optionally, the server 100 may further include a bus 104, through which the memory 101, the transceiver 102, and the processor 103 may be connected to one another.
  • The bus 104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in FIG. 1, but this does not mean that there is only one bus or one type of bus.
  • Besides the memory 101, the transceiver 102, the processor 103, and the bus 104 shown in FIG. 1, the server 100 in this embodiment may include other hardware according to the server's actual functions, which will not be detailed here.
  • Under the above operating environment, an embodiment of the present application provides the method for classifying and positioning certificate information shown in FIG. 2. Referring to FIG. 2, the method includes the following steps.
  • S201: The server detects the A pieces of feature information in the first target image using the YOLO-based classification and positioning model, extracts the A detection boxes, and obtains the first border information and first classification labels of the A detection boxes; the first target image contains the first certificate, and A is a positive integer greater than 0.
  • Optionally, before the server detects the A pieces of feature information in the first target image, the method further includes: binarizing a second target image, the resulting binarized image being the first target image.
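A minimal sketch of this preprocessing step using OpenCV; Otsu thresholding and grayscale loading are assumptions, since the patent does not name a specific binarization method.

```python
import cv2

def binarize(second_target_path: str):
    """Binarize the second target image; the result serves as the first target image."""
    gray = cv2.imread(second_target_path, cv2.IMREAD_GRAYSCALE)
    # Otsu's method chooses the threshold automatically (an assumed choice here)
    _, first_target = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return first_target
```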
  • Optionally, the A detection boxes include N text line detection boxes and M non-text line detection boxes, and extracting the A detection boxes comprises: the server detects the feature information in the first target image using the YOLO-based classification and positioning model and extracts the N text line detection boxes; the server likewise detects the feature information in the first target image using the model and extracts the M non-text line detection boxes.
  • For example, the front of an ID card includes 8 pieces of feature information: name, gender, ethnicity, address, date of birth, residential address, ID number, and the ID photo.
  • These 8 pieces of feature information on the front of the ID card comprise one piece of non-text line information and 7 pieces of text line information.
  • The personal data page of a passport includes 12 pieces of feature information: type, country code, passport number, surname, given names, sex, place of birth, date of birth, place of issue, date of issue, issuing authority, and the passport photo.
  • These 12 pieces of feature information on the passport's inside page comprise one piece of non-text line information and 11 pieces of text line information.
  • In the embodiments of the present application, a text line may be a run of p consecutive symbols containing no sentence-ending punctuation, where sentence-ending punctuation includes commas, periods, and exclamation marks, and p is an integer greater than or equal to 0.
  • The distance between any two characters in a text line does not exceed a first distance threshold; this threshold is determined by the actual application and is not specifically limited in the embodiments of the present application.
  • The symbols in a text line may include Chinese characters, English letters, digits, and non-sentence-ending punctuation marks such as plus signs, minus signs, and semicolons.
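To make this definition concrete, here is a small sketch that groups detected characters into text lines using the gap threshold described above; the helper and its data layout are hypothetical, not part of the patent.

```python
SENTENCE_BREAKERS = {",", ".", "!", "，", "。", "！"}

def group_into_text_lines(chars, first_distance_threshold: float):
    """chars: list of (symbol, x_center) pairs sorted left to right on one baseline.
    A new text line starts at sentence-ending punctuation or when the gap between
    neighbouring characters exceeds the first distance threshold."""
    lines, current = [], []
    prev_x = None
    for symbol, x in chars:
        gap_too_large = prev_x is not None and (x - prev_x) > first_distance_threshold
        if symbol in SENTENCE_BREAKERS or gap_too_large:
            if current:
                lines.append("".join(current))
            current = [] if symbol in SENTENCE_BREAKERS else [symbol]
        else:
            current.append(symbol)
        prev_x = x
    if current:
        lines.append("".join(current))
    return lines
```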
  • Optionally, the server's use of the YOLO-based classification and positioning model to detect the feature information in the first target image and extract the N text line detection boxes includes the following steps.
  • S1: The server uses the classification and positioning model to extract n text head detection boxes and n text tail detection boxes from the first target image.
  • The first text head detection box among the n text head detection boxes covers the first B characters of the first text line in the first target image; the length of those B characters is L1, and the text head detection box additionally covers a non-text image region of length t*L1 preceding them.
  • The first text tail detection box among the n text tail detection boxes covers the last C characters of the first text line; the length of those C characters is L2, and the text tail detection box additionally covers a non-text image region of length t*L2 following them.
  • B and C are positive integers, and t is greater than zero and less than or equal to 1.
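A sketch of how such padded head and tail boxes might be laid out; the coordinate convention and helper names are assumptions for illustration only.

```python
def head_box(x_left: float, y_top: float, l1: float, height: float, t: float):
    """Box over the first B characters (width l1) plus a non-text region of
    length t*l1 before them; returned as (x1, y1, x2, y2)."""
    return (x_left - t * l1, y_top, x_left + l1, y_top + height)

def tail_box(x_right: float, y_top: float, l2: float, height: float, t: float):
    """Box over the last C characters (width l2) plus a non-text region of
    length t*l2 after them; returned as (x1, y1, x2, y2)."""
    return (x_right - l2, y_top, x_right + t * l2, y_top + height)
```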
  • The server matches the n text head detection boxes with the n text tail detection boxes based on text line slope consistency and the principle of proximity, obtaining initial detection boxes for the n text lines.
  • The server corrects the initial detection boxes of the n text lines, removing the non-text image regions from them, and obtains n prediction boxes.
  • The server uses the K-means clustering algorithm to obtain the confidence that the n prediction boxes contain text line feature information and the confidence of the category to which the text line feature information in the n prediction boxes belongs.
  • The server uses a non-maximum suppression algorithm to filter the n prediction boxes, obtaining the N text line detection boxes, the target detection scores of the N text line detection boxes, and the first classification labels of the N text line detection boxes.
  • Text line information in certificates such as ID cards, bank cards, and social security cards mostly satisfies text line slope consistency; that is, the slope of the line connecting any two characters within a text line is the same, and/or the slopes of any two text lines are the same.
  • Optionally, matching the n text head detection boxes with the n text tail detection boxes based on slope consistency and proximity to obtain the initial detection boxes of the n text lines proceeds as follows. With the horizontal as reference, the server computes the slopes of the n text head detection boxes, the slopes of the n text tail detection boxes, and the connection slope between the i-th text head detection box and the j-th text tail detection box. Then, where the slope consistency condition is satisfied, the n text head detection boxes are matched one-to-one with the n text tail detection boxes based on proximity and sequential consistency. Sequential consistency means that, for every text line in the first target image, the text head detection box lies to the left (or right) of that line's text tail detection box.
  • The connection slope between the i-th text head detection box and the j-th text tail detection box is the slope of the line joining the center coordinates of the i-th text head detection box and the center coordinates of the j-th text tail detection box.
  • Let the connection slope between the i-th text head detection box and the g-th text tail detection box be the second slope. The i-th text head detection box and the g-th text tail detection box satisfy the slope consistency condition when the difference between the slope of the g-th text tail detection box and a first slope is smaller than a first preset threshold, and the difference between the second slope and the first slope is smaller than a second preset threshold.
  • The first slope may be the slope of the i-th text head detection box, or the average of the slopes of the n text head detection boxes and the n text tail detection boxes.
  • The settings of the first and second preset thresholds are related to this average slope and are determined by the actual situation; the embodiments of the present application do not specifically limit them.
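A compact sketch of this matching rule; greedy nearest-tail pairing is an assumed strategy, since the patent states the conditions but not a concrete matching order.

```python
import math

def connection_slope(head_center, tail_center) -> float:
    """Slope, relative to the horizontal, of the line joining two box centers."""
    dx = tail_center[0] - head_center[0]
    dy = tail_center[1] - head_center[1]
    return dy / dx if dx != 0 else math.inf

def match_heads_to_tails(heads, tails, first_threshold: float, second_threshold: float):
    """heads, tails: lists of dicts with 'center' (x, y) and 'slope'.
    Pairs each head box with the nearest tail box to its right that satisfies
    the slope consistency condition; returns (head_index, tail_index) pairs."""
    boxes = heads + tails
    first_slope = sum(b["slope"] for b in boxes) / len(boxes)  # average slope
    pairs, used = [], set()
    for i, head in enumerate(heads):
        best, best_dist = None, math.inf
        for g, tail in enumerate(tails):
            # sequential consistency: the head must lie left of the tail
            if g in used or tail["center"][0] <= head["center"][0]:
                continue
            second_slope = connection_slope(head["center"], tail["center"])
            if (abs(tail["slope"] - first_slope) < first_threshold
                    and abs(second_slope - first_slope) < second_threshold):
                dist = tail["center"][0] - head["center"][0]  # principle of proximity
                if dist < best_dist:
                    best, best_dist = g, dist
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs
```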
  • Because the initial detection box of each text line contains a non-text image region of length t*L1 at the text head and a non-text image region of length t*L2 at the text tail, the server corrects the initial detection box of each text line to remove these non-text image regions and obtain the N text line detection boxes.
  • Optionally, the server's use of the classification and positioning model to detect the feature information in the first target image and extract the M non-text line detection boxes includes the following steps.
  • The server performs feature extraction on the first target image using the classification and positioning model to obtain m feature maps of size a*a, a feature map being an image containing non-text line feature information.
  • The server divides each of the m feature maps into a*a grid cells and predicts the center coordinates of the non-text line feature information in the m feature maps; based on these center coordinates, the K-means clustering algorithm yields the lengths and widths of m prediction boxes, the confidence that the m prediction boxes contain non-text line feature information, and the confidence of the category of the non-text line feature information in the m prediction boxes.
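In YOLO-style detectors, K-means clustering over box dimensions is the usual way prior box sizes are obtained; a minimal sketch under that reading follows, with the function name and data layout as illustrative assumptions.

```python
import numpy as np

def kmeans_box_dims(widths_heights: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """widths_heights: (num_boxes, 2) array of box widths and heights.
    Plain K-means on box dimensions, returning k (width, height) priors."""
    rng = np.random.default_rng(0)
    centers = widths_heights[rng.choice(len(widths_heights), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every box to its nearest center (Euclidean distance in (w, h) space)
        d = np.linalg.norm(widths_heights[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = widths_heights[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers
```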
  • The server uses a non-maximum suppression algorithm to filter the m prediction boxes, obtaining the M non-text line detection boxes, the target detection scores of the M non-text line detection boxes, and the first classification labels of the M non-text line detection boxes.
  • Optionally, the sigmoid function is used to predict the center coordinates of the non-text line feature information.
  • Optionally, filtering the m prediction boxes with the non-maximum suppression algorithm to obtain the M non-text line detection boxes proceeds as follows: generate target detection scores for the m prediction boxes, sort the scores, and select the highest score and its corresponding prediction box. Traverse the remaining prediction boxes, and delete any prediction box whose overlap area with the current highest-scoring prediction box exceeds a third threshold. Then select the highest-scoring box among the unprocessed prediction boxes and repeat the process until M prediction boxes have been selected as the M non-text line detection boxes.
  • In other words, the non-maximum suppression algorithm keeps the prediction box with the highest target detection score, suppresses other prediction boxes that overlap it significantly, and recursively applies this process to the remaining prediction boxes.
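A self-contained sketch of this suppression procedure; using intersection-over-union as the overlap measure is an assumption, since the patent speaks only of "overlap area".

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, third_threshold: float):
    """boxes: (m, 4) array of (x1, y1, x2, y2); returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # highest target detection score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # intersection of the best box with every remaining box
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou <= third_threshold]  # suppress boxes that overlap too much
    return keep
```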
  • Optionally, the border information of the A detection boxes includes the center coordinates of each detection box, the length of the detection box, and the width of the detection box.
  • S202: The server adjusts the border information of the A detection boxes and the classification labels of the A detection boxes according to the structured information features of the first certificate, generating the second border information of the A detection boxes and the second classification labels of the A detection boxes.
  • The structured information features of the first certificate are the relative positional relationship and the relative proportion between any two of the A pieces of feature information of the first certificate.
  • Optionally, this adjustment comprises steps S7 to S14; it is not limited to those steps, and the embodiments of the present application may also include other steps.
  • Optionally, adjusting the first border information of a detection box to the second border information according to the border information of the j-th reference prediction box includes the following.
  • If the center coordinates of the detection box are (x1, y1) and the difference between the center coordinates of the detection box and those of the j-th reference prediction box is (x2, y2), the center coordinates of the detection box are adjusted to (x1 + a*x2, y1 + a*y2).
  • If the length of the detection box is L1 and the difference between the length of the detection box and that of the j-th reference prediction box is L2, the length of the detection box is adjusted to L1 + b*L2.
  • If the width of the detection box is K1 and the difference between the width of the detection box and that of the j-th reference prediction box is K2, the width of the detection box is adjusted to K1 + c*K2.
  • a, b, and c are each greater than or equal to zero and less than or equal to 1; for example, a, b, and c may all be 0.5.
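These update rules translate directly into code; a minimal sketch, with the box representation and the sign convention of the differences (reference minus detection) taken as assumptions.

```python
def adjust_box(box, reference, a: float = 0.5, b: float = 0.5, c: float = 0.5):
    """box, reference: dicts with 'center' (x, y), 'length', and 'width'.
    Moves each border attribute a fraction of the way toward the reference box."""
    x1, y1 = box["center"]
    rx, ry = reference["center"]
    x2, y2 = rx - x1, ry - y1  # assumed sign: reference minus detection
    return {
        "center": (x1 + a * x2, y1 + a * y2),
        "length": box["length"] + b * (reference["length"] - box["length"]),
        "width": box["width"] + c * (reference["width"] - box["width"]),
    }
```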
  • Optionally, before the server uses the YOLO-based classification and positioning model to detect the feature information in the first target image, the method further includes pre-training the YOLO network.
  • Pre-training the YOLO network includes: establishing a sample database containing the image samples used to train the YOLO network; initializing the training parameters of the YOLO network; randomly selecting image samples from the sample database as training samples; feeding the training samples into the YOLO network as input vectors; obtaining the output vector of the YOLO network, i.e. the feature map of each training sample; and optimizing the training parameters according to the output vector, establishing the residual mapping between an image sample and its feature map.
  • Optionally, a transfer learning strategy is adopted, with network parameters trained on the ImageNet dataset used as the initial training parameters of the YOLO network.
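A hedged sketch of such a transfer learning initialization in PyTorch; using ResNet-50 as the ImageNet-pretrained backbone is an assumption, since the patent does not name a specific backbone.

```python
import torch.nn as nn
from torchvision import models

def build_pretrained_backbone() -> nn.Module:
    """Start from ImageNet-trained parameters and drop the classification head,
    keeping the convolutional feature extractor for detection training."""
    resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool and fc
    return backbone
```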
  • In the embodiments of the present application, the server thus detects the A pieces of feature information in the first target image using the YOLO-based classification and positioning model, extracts the A detection boxes, and obtains their first border information and first classification labels, where the first target image contains the first certificate and A is a positive integer greater than 0; the server then adjusts the border information and classification labels of the A detection boxes according to the structured information features of the first certificate, generating their second border information and second classification labels.
  • The proposed solution does not depend on contour extraction or image rectification of the certificate, which broadens its scope of application; by adopting the YOLO-based classification and positioning model and exploiting text line slope consistency, it effectively increases the detection speed of certificate information classification and positioning.
  • An embodiment of the present application also provides an apparatus for classifying and positioning certificate information that achieves the beneficial effects of the above method; its functions can be implemented by hardware, or by hardware executing corresponding software, the hardware or software including at least one module corresponding to the above functions.
  • Please refer to FIG. 3, a structural block diagram of an apparatus 300 for classifying and positioning certificate information according to an embodiment of the present application. The apparatus includes a first extraction unit 301 and an adjustment unit 302.
  • The first extraction unit 301 is configured to detect the A pieces of feature information in the first target image using the YOLO-based classification and positioning model, extract the A detection boxes, and obtain the first border information and first classification labels of the A detection boxes, where the first target image contains the first certificate and A is a positive integer greater than 0.
  • The adjustment unit 302 is configured to adjust the border information and classification labels of the A detection boxes according to the structured information features of the first certificate, generating the second border information and second classification labels of the A detection boxes.
  • Optionally, the A detection boxes include N text line detection boxes and M non-text line detection boxes, and the first extraction unit 301 includes: a text extraction unit, configured to detect the feature information in the first target image using the YOLO-based classification and positioning model and extract the N text line detection boxes; and a non-text extraction unit, configured to detect the feature information in the first target image using the YOLO-based classification and positioning model and extract the M non-text line detection boxes.
  • Optionally, the text extraction unit includes a detection box extraction unit, a matching unit, a correction unit, and a first filtering unit.
  • The detection box extraction unit is configured to extract the n text head detection boxes and the n text tail detection boxes from the first target image using the classification and positioning model. The first text head detection box among the n text head detection boxes covers the first B characters of the first text line in the first target image, whose length is L1, plus a non-text image region of length t*L1 before those characters; the first text tail detection box among the n text tail detection boxes covers the last C characters of the first text line, whose length is L2, plus a non-text image region of length t*L2 after them. B and C are positive integers, and t is greater than zero and less than or equal to 1.
  • The matching unit is configured to match the n text head detection boxes with the n text tail detection boxes based on text line slope consistency and the principle of proximity, obtaining the n text line detection boxes.
  • The correction unit is configured to correct the n text line detection boxes, removing the non-text image regions to obtain the n prediction boxes.
  • The first filtering unit is configured to filter the n prediction boxes with the non-maximum suppression algorithm, obtaining the N text line detection boxes, their target detection scores, and their first classification labels.
  • Optionally, the non-text extraction unit includes: a first acquisition unit, configured to perform feature extraction on the first target image using the classification and positioning model to obtain the m feature maps of size a*a containing non-text line feature information; a second acquisition unit, configured to predict the center coordinates of the non-text line information in the m feature maps and, based on those center coordinates, use the K-means clustering algorithm to obtain the lengths and widths of the m prediction boxes, the confidence that the m prediction boxes contain non-text line feature information, and the confidence of the category of the non-text line feature information in the m prediction boxes; and a second filtering unit, configured to filter the m prediction boxes with the non-maximum suppression algorithm, obtaining the M non-text line detection boxes, their target detection scores, and their first classification labels.
  • Optionally, before the first extraction unit uses the YOLO-based classification and positioning model to detect the feature information in the first target image, the apparatus further includes a pre-training unit configured to pre-train the YOLO network.
  • Optionally, the pre-training unit includes: an establishing unit, configured to establish the sample database containing image samples for training the YOLO network; an initialization unit, configured to initialize the training parameters of the YOLO network; a selection unit, configured to randomly select image samples from the sample database as training samples; an input unit, configured to feed the training samples into the YOLO network as input vectors; a third acquisition unit, configured to obtain the output vector of the YOLO network, i.e. the feature map of each training sample; and a processing unit, configured to optimize the training parameters according to the output vector and establish the residual mapping between an image sample and its feature map.
  • The steps of the methods or algorithms described in connection with the disclosure of the embodiments of the present application may be implemented in hardware, or by a processor executing software instructions.
  • Software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium; the storage medium may also be an integral part of the processor.
  • The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a network device; alternatively, the processor and the storage medium may exist as discrete components in the network device.
  • The functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium.
  • Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another; the storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
  • The computer-readable medium described in this application may be a non-volatile computer-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A certificate information classification and positioning method and apparatus. The method comprises: a server detecting A pieces of feature information in a first target image by means of a classification and positioning model based on a YOLO network, extracting A detection boxes, and acquiring first border information of the A detection boxes and first classification labels of the A detection boxes, wherein the first target image includes a first certificate, and A is a positive integer greater than 0 (S201); and the server adjusting, according to structured information features of the first certificate, the border information of the A detection boxes and the classification labels of the A detection boxes, and generating second border information of the A detection boxes and second classification labels of the A detection boxes (S202). The application range of the method can be enlarged, and the detection speed is improved.

Description

Method and apparatus for classifying and positioning certificate information

This application claims priority to Chinese patent application No. 201910880737X, filed with the Chinese Patent Office on September 18, 2019 and entitled "Method and apparatus for classifying and positioning certificate information", the entire contents of which are incorporated herein by reference.
技术领域Technical field
本申请涉及计算机技术领域,尤其涉及一种证件信息的分类定位方法及装置。This application relates to the field of computer technology, and in particular to a method and device for classifying and positioning credential information.
背景技术Background technique
身份证、银行卡等证件的卡面信息的分类定位,通常使用文本行的固定位置提取或者通用文本检测方法。前者适用范围受限,过度依赖证件的轮廓提取以及图像矫正,后者检测速度慢,同时对提取文本还需按照内容进行分类,进一步降低了准确性。The classification and positioning of the card surface information of ID cards, bank cards, etc. usually use fixed position extraction of text lines or general text detection methods. The former has a limited scope of application and overly relies on the contour extraction and image correction of the certificate. The latter has a slow detection speed. At the same time, the extracted text needs to be classified according to the content, which further reduces the accuracy.
综上所述,现有的证件信息的分类定位方法在实际应用场景下,适用范围受限,检测速度慢。In summary, the existing classification and positioning methods for document information have limited application scope and slow detection speed in actual application scenarios.
发明内容Summary of the invention
本申请实施例提供了一种证件信息的分类定位方法及装置,能够扩大适用范围,提升检测速度。The embodiments of the present application provide a classification and positioning method and device for credential information, which can expand the scope of application and increase the detection speed.
本申请实施例提供了一种证件信息的分类定位方法,该方法包括以下步骤:服务器利用基于YOLO网络的分类定位模型对第一目标图像中的A个特征信息进行检测,提取A个检测框,并获取上述A个检测框的第一边框信息和上述A个检测框的第一次分类标签,第一目标图像包含第一证件,A为大于0的正整数;服务器根据第一证件的结构化信息特征调整上述A个检测框的边框信息和上述A个检测框的分类标签,生成上述A个检测框的第二边框信息和上述A个检测框的第二次分类标签。The embodiment of the application provides a classification and positioning method for credential information, which includes the following steps: a server uses a classification and positioning model based on the YOLO network to detect A feature information in a first target image, and extract A detection frames, And obtain the first frame information of the above A detection frames and the first classification label of the above A detection frames. The first target image contains the first document, and A is a positive integer greater than 0; the server is structured according to the first document The information feature adjusts the frame information of the A detection frames and the classification labels of the A detection frames to generate the second frame information of the A detection frames and the second classification labels of the A detection frames.
本申请实施例还提供了一种证件信息的分类定位的装置,该装置能实现上述证件信息的分类定位方法所具备的有益效果。其中,该装置的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括至少一个与上述功能相对应的模块。The embodiment of the present application also provides a device for classification and positioning of credential information, which can realize the beneficial effects of the above-mentioned classification and positioning method for credential information. Among them, the function of the device can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module corresponding to the above-mentioned functions.
可选的,该装置包括第一提取单元和调整单元。Optionally, the device includes a first extraction unit and an adjustment unit.
第一提取单元,用于利用基于YOLO网络的分类定位模型对第一目标图像中的A个特征信息进行检测,提取A个检测框,并获取上述A个检测框的第一边框信息和上述A个检测框的第一次分类标签,第一目标图像包含第一证件,A为大于0的正整数;The first extraction unit is used to detect A feature information in the first target image using a classification and positioning model based on the YOLO network, extract A detection frames, and obtain the first frame information of the A detection frames and the A The first classification label of a detection frame, the first target image contains the first document, and A is a positive integer greater than 0;
调整单元,用于根据第一证件的结构化信息特征调整上述A个检测框的边框信息和上述A个检测框的分类标签,生成上述A个检测框的第二边框信息和上述A个检测框的第二次分类标签。The adjustment unit is configured to adjust the frame information of the A detection frames and the classification labels of the A detection frames according to the structured information characteristics of the first document, and generate the second frame information of the A detection frames and the A detection frames The second classification label.
本申请实施例还提供了一种服务器,该服务器能实现上述证件信息的分类定位方法所具备的有益效果。其中,该服务器的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括至少一个与上述功能相对应的模块。该服务器包括存储器、处理器和收发器,存储器用于存储支持服务器执行上述方法的计算机程序,所述计算机程序包括程序指令,处理器用于根据程序指令对服务器的动作进行控制管理,收发器用于支持服务器与其它通信设备的通信。The embodiment of the present application also provides a server, which can realize the beneficial effects of the above-mentioned classification and positioning method for credential information. Among them, the function of the server can be realized by hardware, and can also be realized by hardware executing corresponding software. The hardware or software includes at least one module corresponding to the above-mentioned functions. The server includes a memory, a processor, and a transceiver. The memory is used to store a computer program that supports the server to execute the above method. The computer program includes program instructions. The processor is used to control and manage the actions of the server according to the program instructions. The transceiver is used to support The server communicates with other communication devices.
本申请实施例还提供了一种计算机可读存储介质,可读存储介质上存储有指令,当其在处理器上运行时,使得处理器执行上述证件信息的分类定位方法。The embodiment of the present application also provides a computer-readable storage medium with instructions stored on the readable storage medium, which when run on a processor, cause the processor to execute the above-mentioned classification and positioning method of credential information.
本申请实施例中,服务器利用基于YOLO网络的分类定位模型对第一目标图像中的A个特征信息进行检测,提取A个检测框,并获取上述A个检测框的第一边框信息和上述A 个检测框的第一次分类标签,第一目标图像包含第一证件,A为大于0的正整数;服务器根据第一证件的结构化信息特征调整上述A个检测框的边框信息和上述A个检测框的分类标签,生成上述A个检测框的第二边框信息和上述A个检测框的第二次分类标签。本申请实施例所提方案,不依赖证件的轮廓提取以及图像矫正,能够扩大适用范围,本申请实施例采用基于YOLO网络的分类定位模型,有效提升了证件信息的分类定位的检测速度。In the embodiment of this application, the server uses the classification and positioning model based on the YOLO network to detect A feature information in the first target image, extracts A detection frames, and obtains the first frame information of the A detection frames and the A The first classification label of each detection frame, the first target image contains the first document, and A is a positive integer greater than 0; the server adjusts the border information of the above A detection frames and the above A according to the structured information characteristics of the first document The classification label of the detection frame, the second frame information of the above A detection frames and the second classification label of the above A detection frames are generated. The solution proposed in the embodiment of this application does not rely on the contour extraction and image correction of the certificate, and can expand the scope of application. The embodiment of this application adopts the classification and positioning model based on the YOLO network, which effectively improves the detection speed of the classification and positioning of the certificate information.
本申请附加的方面和优点将在下面的描述中部分给出,这些将从下面的描述中变得明显,或通过本申请的实践了解到。The additional aspects and advantages of the present application will be partly given in the following description, which will become obvious from the following description, or be understood through the practice of the present application.
附图说明Description of the drawings
本申请上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:The above and/or additional aspects and advantages of the present application will become obvious and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
图1是本申请实施例提供的一种服务器的结构示意图;FIG. 1 is a schematic structural diagram of a server provided by an embodiment of the present application;
图2是本申请实施例提供的一种证件信息的分类定位方法的流程示意图;2 is a schematic flowchart of a method for classifying and positioning credential information provided by an embodiment of the present application;
图3是本申请实施例提供的一种证件信息的分类定位装置的结构示意图。Fig. 3 is a schematic structural diagram of an apparatus for classifying and positioning credential information provided by an embodiment of the present application.
具体实施方式detailed description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。应当理解,当在本说明书和所附权利要求书中使用时,术语“包括”和“包含”指示所描述特征、整体、步骤、操作、元素和/或组件的存在,但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。此外,术语“第一”、“第二”和“第三”等是用于区别不同的对象,而并非用于描述特定的顺序。The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. It should be understood that when used in this specification and appended claims, the terms "including" and "including" indicate the existence of the described features, wholes, steps, operations, elements and/or components, but do not exclude one or The existence or addition of multiple other features, wholes, steps, operations, elements, components, and/or collections thereof. In addition, the terms "first", "second", "third", etc. are used to distinguish different objects, but not to describe a specific sequence.
需要说明的是,在本申请实施例中使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本申请。在本申请实施例和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。It should be noted that the terms used in the embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to limit the present application. The singular forms of "a", "said" and "the" used in the embodiments of the present application and the appended claims are also intended to include plural forms, unless the context clearly indicates other meanings. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more associated listed items.
需要说明的是,本申请实施例中的服务器可以是能够承担服务并保障服务能力的常规服务器,也可以是具有处理器、硬盘、内存和系统总线结构的能够承担服务并保障服务能力的终端设备。本申请实施例不作具体限定。It should be noted that the server in the embodiment of the present application can be a conventional server that can undertake services and guarantee service capabilities, or it can be a terminal device that has a processor, hard disk, memory, and system bus structure that can undertake services and guarantee service capabilities. . The embodiments of this application do not make specific limitations.
YOLO网络是深度残差网络,深度残差网络相对于一般深度网络的优势在于使用高速网络解决层数较高的深度网络中的梯度消失问题。在深度神经网络中,如果层数较高,其较深的某些层很可能需要模拟一个恒等映射,而这个恒等映射对于某一层是较难学习的。因此,深度残差网络利用捷径连接把原本的恒等映射F(x)=x设计为F(x)=g(x)+x,也即g(x)=F(x)-x,只要学习使残差g(x)=0,就能学习到一个恒等映射,降低了学习恒等映射的难度。利用深度残差网络,可以有效解决在深度网络层数较多时产生的梯度消失问题,使得在层数较大时深度网络的误差也不会增大,提高训练效率。The YOLO network is a deep residual network. The advantage of the deep residual network over the general deep network is to use a high-speed network to solve the problem of gradient disappearance in a deep network with a higher number of layers. In a deep neural network, if the number of layers is high, some of its deeper layers may need to simulate an identity map, and this identity map is more difficult to learn for a certain layer. Therefore, the deep residual network uses shortcut connections to design the original identity mapping F(x)=x as F(x)=g(x)+x, that is, g(x)=F(x)-x, as long as By learning to make the residual g(x)=0, an identity mapping can be learned, which reduces the difficulty of learning identity mapping. Using the deep residual network can effectively solve the problem of gradient disappearance when the number of layers of the deep network is large, so that the error of the deep network will not increase when the number of layers is large, and the training efficiency can be improved.
请参见图1,图1为本申请实施例提供的一种服务器100的硬件结构示意图,服务器100包括:存储器101、收发器102及与所述存储器101和收发器102耦合的处理器103。存储器101用于存储计算机程序,所述计算机程序包括程序指令,处理器103用于执行存储器101存储的程序指令,收发器102用于在处理器103的控制下与其他设备进行通信。当处理器103在执行指令时可根据程序指令执行证件信息的分类定位方法。Please refer to FIG. 1, which is a schematic diagram of the hardware structure of a server 100 according to an embodiment of the application. The server 100 includes a memory 101, a transceiver 102, and a processor 103 coupled to the memory 101 and the transceiver 102. The memory 101 is configured to store a computer program, the computer program includes program instructions, the processor 103 is configured to execute the program instructions stored in the memory 101, and the transceiver 102 is configured to communicate with other devices under the control of the processor 103. When the processor 103 is executing instructions, it can execute the classification and positioning method of the credential information according to the program instructions.
其中,处理器103可以是中央处理器(英文:central processing unit,简称:CPU),通用处理器,数字信号处理器(英文:digital signal processor,简称:DSP),专用集成电路(英文:application-specific integrated circuit,简称:ASIC),现场可编程门阵列(英文:field programmable gate array,简称:FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件 部件或者其任意组合。其可以实现或执行结合本申请实施例公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。收发器102可以是通信接口、收发电路等,其中,通信接口是统称,可以包括一个或多个接口,例如服务器与终端之间的接口。Among them, the processor 103 may be a central processing unit (English: central processing unit, abbreviated as: CPU), a general-purpose processor, a digital signal processor (English: digital signal processor, abbreviated as: DSP), an application specific integrated circuit (English: application- Specific integrated circuit, abbreviation: ASIC), field programmable gate array (English: field programmable gate array, abbreviation: FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the embodiments of the present application. The processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on. The transceiver 102 may be a communication interface, a transceiver circuit, etc., where the communication interface is a general term and may include one or more interfaces, such as an interface between a server and a terminal.
可选地,服务器100还可以包括总线104。其中,存储器101、收发器102以及处理器103可以通过总线104相互连接;总线104可以是外设部件互连标准(英文:peripheral component interconnect,简称:PCI)总线或扩展工业标准结构(英文:extended industry standard architecture,简称:EISA)总线等。总线104可以分为地址总线、数据总线、控制总线等。为便于表示,图1中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。Optionally, the server 100 may further include a bus 104. Among them, the memory 101, the transceiver 102, and the processor 103 may be connected to each other through a bus 104; the bus 104 may be a peripheral component interconnection standard (English: peripheral component interconnect, abbreviated as: PCI) bus or an extended industry standard structure (English: extended industry standard architecture, referred to as EISA) bus, etc. The bus 104 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 1, but it does not mean that there is only one bus or one type of bus.
除了图1所示的存储器101、收发器102、处理器103以及上述总线104之外,实施例中服务器100通常根据该服务器的实际功能,还可以包括其他硬件,对此不再赘述。In addition to the memory 101, the transceiver 102, the processor 103, and the aforementioned bus 104 shown in FIG. 1, the server 100 in the embodiment may also include other hardware generally according to the actual function of the server, which will not be repeated here.
在上述运行环境下,本申请实施例提供了如图2所示的证件信息的分类定位方法。请参阅图2,所述证件信息的分类定位方法包括:In the foregoing operating environment, the embodiment of the present application provides a classification and positioning method for credential information as shown in FIG. 2. Please refer to Figure 2. The classification and positioning method of the certificate information includes:
S201、服务器利用基于YOLO网络的分类定位模型对第一目标图像中的A个特征信息进行检测,提取A个检测框,获取上述A个检测框的第一边框信息和上述A个检测框的第一次分类标签,第一目标图像包含第一证件,A为大于0的正整数。S201. The server uses the classification and positioning model based on the YOLO network to detect A feature information in the first target image, extracts A detection frames, and obtains the first frame information of the A detection frame and the first frame information of the A detection frame. Once the label is classified, the first target image contains the first certificate, and A is a positive integer greater than 0.
可选的,上述服务器利用基于YOLO网络的分类定位模型对第一目标图像中的A个特征信息进行检测之前,上述方法还包括:对第二目标图像进行二值化处理,获取第二目标图像的二值化图像,即第一目标图像。Optionally, before the server uses the classification and positioning model based on the YOLO network to detect the A feature information in the first target image, the method further includes: binarizing the second target image to obtain the second target image The binarized image is the first target image.
Optionally, the A detection boxes include N text line detection boxes and M non-text line detection boxes, and detecting the feature information in the first target image by using the YOLO-based classification and positioning model and extracting the A detection boxes includes: the server detects the feature information in the first target image by using the model and extracts the N text line detection boxes; and the server detects the feature information in the first target image by using the model and extracts the M non-text line detection boxes.
For example, the front of an ID card includes eight pieces of feature information: name, gender, ethnicity, date of birth, address, ID number, and the ID photo, of which one is non-text line information and seven are text line information (the address may occupy more than one text line). The personal data page of a passport includes twelve pieces of feature information: Type, Country Code, Passport No., Surname, Given names, Sex, Place of birth, Date of birth, Place of issue, Date of issue, Authority, and the passport photo. These twelve pieces of feature information comprise one piece of non-text line information and eleven pieces of text line information.
In the embodiments of this application, a text line may be a run of p consecutive symbols that contains no sentence-breaking punctuation, where sentence-breaking punctuation includes commas, periods, exclamation marks, and the like, and p is a positive integer. The distance between any two characters in a text line does not exceed a first distance threshold, which is determined by the actual application and is not specifically limited in the embodiments of this application. The symbols in a text line may include Chinese characters, English letters, digits, and non-sentence-breaking punctuation such as plus signs, minus signs, and semicolons.
Optionally, detecting the feature information in the first target image by using the YOLO-based classification and positioning model and extracting the N text line detection boxes includes:
S1. The server extracts n text head detection boxes and n text tail detection boxes from the first target image by using the classification and positioning model. A first text head detection box among the n text head detection boxes includes the first B characters of a first text line in the first target image, the length of these B characters being L1, and further includes a non-text image area of length t*L1 preceding the B characters. A first text tail detection box among the n text tail detection boxes includes the last C characters of the first text line, the length of these C characters being L2, and further includes a non-text image area of length t*L2 following the C characters. B and C are positive integers, and t is greater than 0 and less than or equal to 1.
S2. The server matches the n text head detection boxes with the n text tail detection boxes based on text-line slope consistency and the proximity principle, obtaining initial detection boxes of the n text lines.
S3. The server corrects the initial detection boxes of the n text lines, removing the non-text image areas from them to obtain n prediction boxes.
S4. The server uses the K-means clustering algorithm to obtain the confidence that each of the n prediction boxes contains text line feature information and the confidence of the category to which the text line feature information in each of the n prediction boxes belongs.
S5. The server filters the n prediction boxes with a non-maximum suppression algorithm, obtaining the N text line detection boxes, their target detection scores, and their first classification labels.
It should be noted that text line information in certificates such as ID cards, bank cards, and social security cards mostly satisfies text-line slope consistency, that is, the connecting slope between any two characters within a text line is the same, and/or the slopes of any two text lines are the same.
Optionally, matching the n text head detection boxes with the n text tail detection boxes based on text-line slope consistency and the proximity principle to obtain the initial detection boxes of the n text lines includes: relative to a suitable reference horizontal line, the server computes the slope of each of the n text head detection boxes, the slope of each of the n text tail detection boxes, and the connection slope between the i-th text head detection box and the j-th text tail detection box. Then, where the slope consistency condition is satisfied, the n text head detection boxes are matched one-to-one with the n text tail detection boxes based on the proximity principle and order consistency. Order consistency means that, for every text line in the first target image, the text head detection box lies to the left (or, respectively, to the right) of that text line's text tail detection box.
Optionally, the connection slope between the i-th text head detection box and the j-th text tail detection box is the slope of the line joining the center coordinates of the i-th text head detection box and the center coordinates of the j-th text tail detection box.
Optionally, the connection slope between the i-th text head detection box and the g-th text tail detection box is a second slope, and the i-th text head detection box and the g-th text tail detection box satisfy the slope consistency condition when: the difference between the slope of the g-th text tail detection box and a first slope is less than a first preset threshold, and the difference between the second slope and the first slope is less than a second preset threshold. The first slope may be the slope of the i-th text head detection box, or the average of the slopes of the n text head detection boxes and the n text tail detection boxes.
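A compact sketch of this matching step (plain Python; the box representation, the thresholds th1 and th2, and the greedy strategy are illustrative assumptions, with the first slope taken here as the average slope):

```python
import math

def match_head_tail(heads, tails, th1=0.05, th2=0.05):
    """Greedily match text head detection boxes to text tail detection boxes.

    heads/tails: lists of dicts with 'cx', 'cy' (center coordinates) and
    'slope'. Implements slope consistency, order consistency (head left of
    tail), and the proximity principle described above.
    """
    slopes = [b['slope'] for b in heads] + [b['slope'] for b in tails]
    first_slope = sum(slopes) / len(slopes)   # average slope as the first slope
    pairs, used = [], set()
    for i, h in enumerate(heads):
        best, best_dist = None, float('inf')
        for g, t in enumerate(tails):
            if g in used or t['cx'] <= h['cx']:        # order consistency
                continue
            if abs(t['slope'] - first_slope) >= th1:   # tail slope vs first slope
                continue
            link = (t['cy'] - h['cy']) / (t['cx'] - h['cx'])  # connection slope
            if abs(link - first_slope) >= th2:
                continue
            d = math.hypot(t['cx'] - h['cx'], t['cy'] - h['cy'])
            if d < best_dist:                          # proximity principle
                best, best_dist = g, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs
```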
It should be noted that the settings of the first preset threshold and the second preset threshold are related to the above slope average and depend on the actual situation; the embodiments of this application do not specifically limit them.
It can be understood that the head of each text line's initial detection box contains a non-text image area of length t*L1 and its tail contains a non-text image area of length t*L2, so the server needs to correct the initial detection boxes to remove these non-text image areas and obtain the N text line detection boxes.
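A one-function sketch of this correction, assuming an axis-aligned initial box spanning [x_left, x_right] horizontally (the names are illustrative):

```python
def trim_initial_box(x_left: float, x_right: float,
                     L1: float, L2: float, t: float):
    """Remove the non-text margins from an initial text line detection box:
    t*L1 before the first B characters and t*L2 after the last C characters."""
    return x_left + t * L1, x_right - t * L2
```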
Optionally, detecting the feature information in the first target image by using the classification and positioning model and extracting the M non-text line detection boxes includes:
S4. The server performs feature extraction on the first target image by using the classification and positioning model, obtaining m feature maps of size a*a, where a feature map is an image containing non-text line feature information.
S5. The server divides each of the m feature maps into a*a grid cells and predicts center coordinates for the non-text line feature information in the m feature maps; based on the center coordinates, it uses the K-means clustering algorithm to obtain the length and width of m prediction boxes, the confidence that each of the m prediction boxes contains non-text line feature information, and the confidence of the category to which the non-text line feature information in each of the m prediction boxes belongs.
S6. The server filters the m prediction boxes with a non-maximum suppression algorithm, obtaining the M non-text line detection boxes, their target detection scores, and their first classification labels.
Optionally, the sigmoid function is used to predict the center coordinates of the non-text line feature information.
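The two numerical pieces of this step can be sketched as follows (NumPy; a plain K-means over box dimensions stands in for the clustering described above, and the grid-cell sigmoid decoding follows the usual YOLO formulation — both are illustrative):

```python
import numpy as np

def kmeans_box_dims(box_dims: np.ndarray, k: int, iters: int = 20):
    """Cluster (width, height) pairs into k prior sizes with plain K-means.
    box_dims: float array of shape (N, 2)."""
    rng = np.random.default_rng(0)
    centers = box_dims[rng.choice(len(box_dims), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(box_dims[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = box_dims[labels == j].mean(axis=0)
    return centers

def decode_center(tx: float, ty: float, cx: int, cy: int):
    """Sigmoid center prediction: the predicted offset stays inside the grid
    cell whose top-left corner is (cx, cy), measured in cells."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    return cx + sigmoid(tx), cy + sigmoid(ty)
```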
Optionally, filtering the m prediction boxes with the non-maximum suppression algorithm to obtain the M non-text line detection boxes includes: generating target detection scores for the m prediction boxes, sorting the m prediction boxes by score, and selecting the highest-scoring prediction box. The remaining prediction boxes are traversed, and any prediction box whose overlap area with the current highest-scoring prediction box is greater than a third threshold is deleted. The highest-scoring box among the unprocessed prediction boxes is then selected, and the above process is repeated until M prediction boxes have been selected as the M non-text line detection boxes.
It can be understood that the non-maximum suppression algorithm produces detection boxes based on the target detection scores: the highest-scoring prediction box is selected, and other prediction boxes that overlap it significantly are suppressed. This process is applied recursively to the remaining prediction boxes.
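A plain-Python sketch of this filtering pass (boxes as (x1, y1, x2, y2) corners; IoU is used here as the overlap measure, whereas the description above speaks of overlap area — an illustrative simplification):

```python
def iou(a, b):
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, overlap_thresh):
    """Keep the highest-scoring box, suppress boxes overlapping it beyond the
    threshold, and repeat on the remainder, as described above."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) <= overlap_thresh]
    return keep
```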
In the embodiments of this application, the border information of each of the A detection boxes includes the center coordinates, the length, and the width of the detection box.
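Collected into one record, the border information plus the score and label used in step S202 below might look like this (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class DetectionBox:
    cx: float      # center x coordinate
    cy: float      # center y coordinate
    length: float  # box length
    width: float   # box width
    score: float   # target detection score
    label: str     # first classification label
```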
S202. The server adjusts the border information and the classification labels of the A detection boxes according to structured information features of the first certificate, generating second border information and second classification labels of the A detection boxes.
In the embodiments of this application, the structured information features of the first certificate refer to the relative positional relationship and the relative ratio between any two of the A pieces of feature information of the first certificate.
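One illustrative way to encode these structured information features between two fields, reusing the DetectionBox sketch above (the application does not fix a concrete formula):

```python
def relative_features(a: DetectionBox, b: DetectionBox) -> dict:
    """Relative positional relationship and relative ratio of field b with
    respect to field a."""
    return {
        "dx": b.cx - a.cx,                    # relative position
        "dy": b.cy - a.cy,
        "length_ratio": b.length / a.length,  # relative ratio
        "width_ratio": b.width / a.width,
    }
```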
Optionally, adjusting the border information and the classification labels of the A detection boxes according to the structured information features of the first certificate to generate the second border information and the second classification labels of the A detection boxes includes steps S7 to S15. The embodiments of this application are not limited to these steps and may include other steps.
S7. With i = 0, select from the A-i detection boxes the first detection box, i.e., the one with the highest target detection score; the first classification label of the first detection box is first feature information.
S8. Taking the first detection box as a reference, obtain, from the relative positional relationships and relative ratios between the first feature information and the remaining A-1 pieces of feature information, the reference prediction boxes corresponding to the remaining A-1 pieces of feature information and the border information of those reference prediction boxes.
S9. With j = 1, select from the remaining A-1 detection boxes the one with the largest overlap area with the j-th of the A-1 reference prediction boxes. If that overlap area is greater than a third preset threshold and the selected detection box's first classification label matches the feature information corresponding to the j-th reference prediction box, increase the detection box's target detection score by Δt; if the first classification label does not match the feature information corresponding to the j-th reference prediction box, decrease the detection box's target detection score by Δt. (A condensed code sketch of this scoring pass is given after step S15 below.)
S10. Set j = j + 1, with j less than or equal to A-1.
Repeat steps S9 and S10.
S11. Set i = i + 1, with i less than or equal to A-1.
Repeat steps S7 to S11 until all A detection boxes have been traversed.
S12. From the traversed A detection boxes, select the third detection box, i.e., the one with the highest target detection score; the first classification label of the third detection box is third feature information.
S13. Taking the third detection box as a reference, obtain, from the relative positional relationships and relative ratios between the third feature information and the remaining A-1 pieces of feature information, the reference prediction boxes corresponding to the remaining A-1 pieces of feature information and the border information of those reference prediction boxes.
S14. With j = 1, select from the remaining A-1 detection boxes the one with the largest overlap area with the j-th of the A-1 reference prediction boxes. If that overlap area is greater than a fourth preset threshold, set the detection box's second classification label to the feature information corresponding to the j-th reference prediction box, and adjust the detection box's first border information to second border information according to the border information of the j-th reference prediction box.
S15. Set j = j + 1, with j less than or equal to A-1.
Repeat steps S14 and S15 until the second border information and the second classification labels of the A detection boxes have been generated.
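A heavily condensed sketch of the score-refinement pass (steps S7 to S11); `make_reference_boxes` is an assumed helper that projects reference prediction boxes for the remaining fields from the anchor box via the certificate layout, and `DetectionBox` is the sketch above:

```python
def overlap_area(a: DetectionBox, b: DetectionBox) -> float:
    """Overlap area of two boxes; length is taken as the horizontal extent
    and width as the vertical extent (an assumption)."""
    w = min(a.cx + a.length / 2, b.cx + b.length / 2) - \
        max(a.cx - a.length / 2, b.cx - b.length / 2)
    h = min(a.cy + a.width / 2, b.cy + b.width / 2) - \
        max(a.cy - a.width / 2, b.cy - b.width / 2)
    return max(0.0, w) * max(0.0, h)

def refine_scores(dets, make_reference_boxes, third_threshold, delta_t):
    """One anchor iteration: reward boxes that sit where the layout expects
    their label, penalise boxes whose label disagrees."""
    anchor = max(dets, key=lambda d: d.score)
    for field, ref in make_reference_boxes(anchor).items():
        cand = max((d for d in dets if d is not anchor),
                   key=lambda d: overlap_area(d, ref))
        if overlap_area(cand, ref) > third_threshold and cand.label == field:
            cand.score += delta_t
        elif cand.label != field:
            cand.score -= delta_t
```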
Optionally, adjusting the detection box's first border information to the second border information according to the border information of the j-th reference prediction box includes:
Let the center coordinates of the detection box be (x1, y1) and the difference between the center coordinates of the j-th reference prediction box and those of the detection box be (x2, y2); the center coordinates of the detection box are then adjusted to (x1 + a*x2, y1 + a*y2). Let the length of the detection box be L1 and the length difference between the j-th reference prediction box and the detection box be L2; the length of the detection box is adjusted to L1 + b*L2. Let the width of the detection box be K1 and the width difference between the j-th reference prediction box and the detection box be K2; the width of the detection box is adjusted to K1 + c*K2. Here a, b, and c are each greater than or equal to 0 and less than or equal to 1; for example, a, b, and c may all take the value 0.5.
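In code, with the differences taken as (reference minus detection) so that the box moves toward the reference prediction box (DetectionBox as sketched earlier; this sign convention is an interpretation of the formulas above):

```python
def adjust_border(det: DetectionBox, ref: DetectionBox,
                  a: float = 0.5, b: float = 0.5, c: float = 0.5) -> DetectionBox:
    """Adjust the first border information toward the j-th reference
    prediction box; a, b, c lie in [0, 1]."""
    x2, y2 = ref.cx - det.cx, ref.cy - det.cy   # center-coordinate difference
    L2 = ref.length - det.length                # length difference
    K2 = ref.width - det.width                  # width difference
    det.cx, det.cy = det.cx + a * x2, det.cy + a * y2
    det.length += b * L2
    det.width += c * K2
    return det
```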
Optionally, before the server detects the feature information in the first target image by using the YOLO-based classification and positioning model and extracts the A detection boxes, the method further includes pre-training the YOLO network, which comprises: establishing a sample database containing image samples for training the YOLO network; initializing the training parameters of the YOLO network; randomly selecting image samples from the sample database as training samples; feeding a training sample into the YOLO network as an input vector; obtaining the YOLO network's output vector, i.e., the feature map of the training sample; and optimizing the training parameters according to the output vector, establishing a residual network between the image samples and their feature maps.
Optionally, a transfer learning strategy is adopted, using network parameters trained on the ImageNet dataset as the training parameters of the YOLO network.
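A sketch of this transfer-learning initialization (PyTorch assumed; the checkpoint path is a placeholder, and name-and-shape matching is an illustrative policy, not the application's prescription):

```python
import torch

def init_from_imagenet(backbone: torch.nn.Module, checkpoint_path: str):
    """Copy ImageNet-pretrained parameters into the YOLO backbone wherever
    parameter names and tensor shapes match; everything else keeps its
    freshly initialized values."""
    pretrained = torch.load(checkpoint_path, map_location="cpu")
    own = backbone.state_dict()
    matched = {name: tensor for name, tensor in pretrained.items()
               if name in own and tensor.shape == own[name].shape}
    own.update(matched)
    backbone.load_state_dict(own)
    return backbone
```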
In the embodiments of this application, the A pieces of feature information in the first target image are detected by using the classification and positioning model based on the YOLO network; A detection boxes are extracted; and the first border information and the first classification labels of the A detection boxes are obtained, where the first target image contains the first certificate and A is a positive integer greater than 0. The server then adjusts the border information and the classification labels of the A detection boxes according to the structured information features of the first certificate, generating the second border information and the second classification labels of the A detection boxes. The solution proposed in the embodiments of this application does not depend on contour extraction or image rectification of the certificate and therefore has a wider range of application; by adopting a YOLO-based classification and positioning model and exploiting the slope consistency of text lines, it effectively improves the detection speed of certificate information classification and positioning.
An embodiment of this application further provides an apparatus for classifying and positioning certificate information, which can achieve the beneficial effects of the above classification and positioning method. The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes at least one module corresponding to the above functions.
Referring to FIG. 3, FIG. 3 is a structural block diagram of an apparatus 300 for classifying and positioning certificate information provided by an embodiment of this application. The apparatus includes a first extraction unit 301 and an adjustment unit 302.
The first extraction unit 301 is configured to detect A pieces of feature information in a first target image by using a classification and positioning model based on the YOLO network, extract A detection boxes, and obtain first border information and first classification labels of the A detection boxes, where the first target image contains a first certificate and A is a positive integer greater than 0.
The adjustment unit 302 is configured to adjust the border information and the classification labels of the A detection boxes according to structured information features of the first certificate, generating second border information and second classification labels of the A detection boxes.
Optionally, the A detection boxes include N text line detection boxes and M non-text line detection boxes, and the first extraction unit 301 includes: a text extraction unit configured to detect the feature information in the first target image by using the YOLO-based classification and positioning model and extract the N text line detection boxes; and a non-text extraction unit configured to detect the feature information in the first target image by using the YOLO-based classification and positioning model and extract the M non-text line detection boxes.
Optionally, the text extraction unit includes a detection box extraction unit, a matching unit, a correction unit, and a first filtering unit.
The detection box extraction unit is configured to extract n text head detection boxes and n text tail detection boxes from the first target image by using the classification and positioning model, where a first text head detection box among the n text head detection boxes includes the first B characters of a first text line in the first target image, the length of these B characters being L1, and further includes a non-text image area of length t*L1 preceding the B characters; a first text tail detection box among the n text tail detection boxes includes the last C characters of the first text line, the length of these C characters being L2, and further includes a non-text image area of length t*L2 following the C characters; B and C are positive integers, and t is greater than 0 and less than or equal to 1.
The matching unit is configured to match the n text head detection boxes with the n text tail detection boxes based on text-line slope consistency and the proximity principle, obtaining n text line detection boxes.
The correction unit is configured to correct the n text line detection boxes, removing the non-text image areas from them to obtain n prediction boxes.
The first filtering unit is configured to filter the n prediction boxes with a non-maximum suppression algorithm, obtaining the N text line detection boxes, their target detection scores, and their first classification labels.
Optionally, the non-text extraction unit includes: a first obtaining unit configured to perform feature extraction on the first target image by using the classification and positioning model, obtaining m feature maps of size a*a, where a feature map is an image containing non-text line information; a second obtaining unit configured to predict center coordinates for the non-text line information in the m feature maps and, based on the center coordinates, use the K-means clustering algorithm to obtain the length and width of m prediction boxes, the confidence that each of the m prediction boxes contains non-text line feature information, and the confidence of the category to which the non-text line feature information in each of the m prediction boxes belongs; and a second filtering unit configured to filter the m prediction boxes with a non-maximum suppression algorithm, obtaining the M non-text line detection boxes, their target detection scores, and their first classification labels.
Before the first extraction unit detects the feature information in the first target image by using the YOLO-based classification and positioning model and extracts the A detection boxes, the apparatus further includes a pre-training unit configured to pre-train the YOLO network.
The pre-training unit includes: an establishing unit configured to establish a sample database containing image samples for training the YOLO network; an initialization unit configured to initialize the training parameters of the YOLO network; a selection unit configured to randomly select image samples from the sample database as training samples; an input unit configured to feed a training sample into the YOLO network as an input vector; a third obtaining unit configured to obtain the YOLO network's output vector, i.e., the feature map of the training sample; and a processing unit configured to optimize the training parameters according to the output vector, establishing a residual network between the image samples and their feature maps.
The steps of the methods or algorithms described in connection with the disclosure of the embodiments of this application may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium; of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, which in turn may reside in a network device; of course, the processor and the storage medium may also exist in a network device as discrete components.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the embodiments of this application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in, or transmitted as one or more instructions or code on, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a general-purpose or special-purpose computer.
The computer-readable medium described in this application may be a non-volatile computer-readable medium.
The specific implementations described above further explain in detail the purpose, technical solutions, and beneficial effects of the embodiments of this application. It should be understood that the above descriptions are merely specific implementations of the embodiments of this application and are not intended to limit their scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the embodiments of this application shall fall within that scope of protection.

Claims (20)

  1. A training method for classifying and positioning certificate information, characterized in that the method comprises:
    a server detecting A pieces of feature information in a first target image by using a classification and positioning model based on the YOLO network, extracting A detection boxes, and obtaining first border information of the A detection boxes and first classification labels of the A detection boxes, wherein the first target image contains a first certificate and A is a positive integer greater than 0;
    the server adjusting the border information of the A detection boxes and the classification labels of the A detection boxes according to structured information features of the first certificate, and generating second border information of the A detection boxes and second classification labels of the A detection boxes.
  2. The method according to claim 1, characterized in that the A detection boxes comprise N text line detection boxes and M non-text line detection boxes, and that the server detecting the feature information in the first target image by using the classification and positioning model based on the YOLO network and extracting the A detection boxes comprises:
    the server detecting the feature information in the first target image by using the classification and positioning model based on the YOLO network, and extracting the N text line detection boxes;
    the server detecting the feature information in the first target image by using the classification and positioning model based on the YOLO network, and extracting the M non-text line detection boxes.
  3. The method according to claim 2, characterized in that the server detecting the feature information in the first target image by using the classification and positioning model based on the YOLO network and extracting the N text line detection boxes comprises:
    the server extracting n text head detection boxes and n text tail detection boxes from the first target image by using the classification and positioning model, wherein a first text head detection box among the n text head detection boxes comprises the first B characters of a first text line in the first target image, the length of the first B characters of the first text line is L1, and the text head detection box further comprises a non-text image area of length t*L1 preceding the B characters; a first text tail detection box among the n text tail detection boxes comprises the last C characters of the first text line, the length of the last C characters of the first text line is L2, and the text tail detection box further comprises a non-text image area of length t*L2 following the C characters; B and C are positive integers, and t is greater than 0 and less than or equal to 1;
    the server matching the n text head detection boxes with the n text tail detection boxes based on text-line slope consistency and the proximity principle, and obtaining n text line detection boxes;
    the server correcting the n text line detection boxes, removing non-text image areas from the text line detection boxes, and obtaining n prediction boxes;
    the server filtering the n prediction boxes by using a non-maximum suppression algorithm, and obtaining the N text line detection boxes, target detection scores of the N text line detection boxes, and first classification labels of the N text line detection boxes.
  4. The method according to claim 2, characterized in that the server detecting the feature information in the first target image by using the classification and positioning model and extracting the M non-text line detection boxes comprises:
    the server performing feature extraction on the first target image by using the classification and positioning model, and obtaining m feature maps of size a*a, wherein a feature map is an image containing non-text line information;
    the server predicting center coordinates for the non-text line information in the m feature maps and, based on the center coordinates, using the K-means clustering algorithm to obtain the length and width of m prediction boxes, the confidence that the m prediction boxes contain non-text line feature information, and the confidence of the category to which the non-text line feature information in the m prediction boxes belongs;
    the server filtering the m prediction boxes by using a non-maximum suppression algorithm, and obtaining the M non-text line detection boxes, target detection scores of the M non-text line detection boxes, and first classification labels of the M non-text line detection boxes.
  5. The method according to claim 3, characterized in that the connection slope between the i-th text head detection box among the n text head detection boxes and the g-th text tail detection box among the n text tail detection boxes is a second slope, and the condition for the i-th text head detection box and the g-th text tail detection box to satisfy slope consistency is that: the difference between the slope of the g-th text tail detection box and a first slope is less than a first preset threshold, and the difference between the second slope and the first slope is less than a second preset threshold; the first slope is the slope of the i-th text head detection box, or the first slope is the average of the slopes of the n text head detection boxes and the n text tail detection boxes.
  6. The method according to any one of claims 1 to 5, characterized in that, before the server detects the A pieces of feature information in the first target image by using the classification and positioning model based on the YOLO network and extracts the A detection boxes, the method further comprises:
    binarizing a second target image to obtain a binarized image of the second target image, wherein the binarized image of the second target image is the first target image, and the second target image contains the first certificate.
  7. The method according to any one of claims 1 to 6, characterized in that, before the server detects the feature information in the first target image by using the classification and positioning model based on the YOLO network and extracts the A detection boxes, the method further comprises: pre-training the YOLO network;
    the pre-training of the YOLO network comprising:
    establishing a sample database, the sample database containing image samples for training the YOLO network;
    initializing training parameters of the YOLO network;
    randomly selecting image samples from the sample database as training samples;
    inputting the training samples into the YOLO network as input vectors;
    obtaining an output vector of the YOLO network, namely a feature map of the training sample;
    optimizing the training parameters according to the output vector, and establishing a residual network between the image samples and the feature maps of the image samples.
  8. An apparatus for training classification and positioning of certificate information, characterized in that the apparatus comprises:
    a first extraction unit, configured to detect A pieces of feature information in a first target image by using a classification and positioning model based on the YOLO network, extract A detection boxes, and obtain first border information of the A detection boxes and first classification labels of the A detection boxes, wherein the first target image contains a first certificate and A is a positive integer greater than 0;
    an adjustment unit, configured to adjust the border information of the A detection boxes and the classification labels of the A detection boxes according to structured information features of the first certificate, and generate second border information of the A detection boxes and second classification labels of the A detection boxes.
  9. The apparatus according to claim 8, characterized in that the A detection boxes comprise N text line detection boxes and M non-text line detection boxes, and the first extraction unit comprises:
    a text extraction unit, configured to detect the feature information in the first target image by using the classification and positioning model based on the YOLO network and extract the N text line detection boxes;
    a non-text extraction unit, configured to detect the feature information in the first target image by using the classification and positioning model based on the YOLO network and extract the M non-text line detection boxes.
  10. The apparatus according to claim 9, characterized in that the text extraction unit comprises:
    a detection box extraction unit, configured to extract n text head detection boxes and n text tail detection boxes from the first target image by using the classification and positioning model, wherein a first text head detection box among the n text head detection boxes comprises the first B characters of a first text line in the first target image, the length of the first B characters of the first text line is L1, and the text head detection box further comprises a non-text image area of length t*L1 preceding the B characters; a first text tail detection box among the n text tail detection boxes comprises the last C characters of the first text line, the length of the last C characters of the first text line is L2, and the text tail detection box further comprises a non-text image area of length t*L2 following the C characters; B and C are positive integers, and t is greater than 0 and less than or equal to 1;
    a matching unit, configured to match the n text head detection boxes with the n text tail detection boxes based on text-line slope consistency and the proximity principle, and obtain n text line detection boxes;
    a correction unit, configured to correct the n text line detection boxes, remove non-text image areas from the text line detection boxes, and obtain n prediction boxes;
    a first filtering unit, configured to filter the n prediction boxes by using a non-maximum suppression algorithm, and obtain the N text line detection boxes, target detection scores of the N text line detection boxes, and first classification labels of the N text line detection boxes.
  11. The apparatus according to claim 9, characterized in that the non-text extraction unit comprises:
    a first obtaining unit, configured to perform feature extraction on the first target image by using the classification and positioning model, and obtain m feature maps of size a*a, wherein a feature map is an image containing non-text line information;
    a second obtaining unit, configured to predict center coordinates for the non-text line information in the m feature maps and, based on the center coordinates, use the K-means clustering algorithm to obtain the length and width of m prediction boxes, the confidence that the m prediction boxes contain non-text line feature information, and the confidence of the category to which the non-text line feature information in the m prediction boxes belongs;
    a second filtering unit, configured to filter the m prediction boxes by using a non-maximum suppression algorithm, and obtain the M non-text line detection boxes, target detection scores of the M non-text line detection boxes, and first classification labels of the M non-text line detection boxes.
  12. The apparatus according to claim 10, characterized in that the connection slope between the i-th text head detection box among the n text head detection boxes and the g-th text tail detection box among the n text tail detection boxes is a second slope, and the condition for the i-th text head detection box and the g-th text tail detection box to satisfy slope consistency is that: the difference between the slope of the g-th text tail detection box and a first slope is less than a first preset threshold, and the difference between the second slope and the first slope is less than a second preset threshold; the first slope is the slope of the i-th text head detection box, or the first slope is the average of the slopes of the n text head detection boxes and the n text tail detection boxes.
  13. The apparatus according to any one of claims 8 to 12, characterized in that, before the A pieces of feature information in the first target image are detected by using the classification and positioning model based on the YOLO network and the A detection boxes are extracted, the apparatus further comprises:
    a binarization unit, configured to binarize a second target image and obtain a binarized image of the second target image, wherein the binarized image of the second target image is the first target image, and the second target image contains the first certificate.
  14. The apparatus according to any one of claims 8 to 13, characterized in that, before the first extraction unit detects the feature information in the first target image by using the classification and positioning model based on the YOLO network and extracts the A detection boxes, the apparatus further comprises: a pre-training unit, configured to pre-train the YOLO network;
    the pre-training unit comprising:
    an establishing unit, configured to establish a sample database, the sample database containing image samples for training the YOLO network;
    an initialization unit, configured to initialize training parameters of the YOLO network;
    a selection unit, configured to randomly select image samples from the sample database as training samples;
    an input unit, configured to input the training samples into the YOLO network as input vectors;
    a third obtaining unit, configured to obtain an output vector of the YOLO network, namely a feature map of the training sample;
    a processing unit, configured to optimize the training parameters according to the output vector and establish a residual network between the image samples and the feature maps of the image samples.
  15. A server, characterized by comprising:
    one or more processors;
    a memory;
    one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the following steps:
    detecting A pieces of feature information in a first target image by using a classification and positioning model based on the YOLO network, extracting A detection boxes, and obtaining first border information of the A detection boxes and first classification labels of the A detection boxes, wherein the first target image contains a first certificate and A is a positive integer greater than 0;
    adjusting the border information of the A detection boxes and the classification labels of the A detection boxes according to structured information features of the first certificate, and generating second border information of the A detection boxes and second classification labels of the A detection boxes.
  16. The server according to claim 15, characterized in that the A detection boxes comprise N text line detection boxes and M non-text line detection boxes, and that, when the feature information in the first target image is detected by using the YOLO-based classification and positioning model and the A detection boxes are extracted, the one or more application programs are configured to perform the following steps:
    detecting the feature information in the first target image by using the classification and positioning model based on the YOLO network, and extracting the N text line detection boxes;
    detecting the feature information in the first target image by using the classification and positioning model based on the YOLO network, and extracting the M non-text line detection boxes.
  17. The server according to claim 16, characterized in that, when the feature information in the first target image is detected by using the classification and positioning model based on the YOLO network and the N text line detection boxes are extracted, the one or more application programs are configured to perform the following steps:
    extracting n text head detection boxes and n text tail detection boxes from the first target image by using the classification and positioning model, wherein a first text head detection box among the n text head detection boxes comprises the first B characters of a first text line in the first target image, the length of the first B characters of the first text line is L1, and the text head detection box further comprises a non-text image area of length t*L1 preceding the B characters; a first text tail detection box among the n text tail detection boxes comprises the last C characters of the first text line, the length of the last C characters of the first text line is L2, and the text tail detection box further comprises a non-text image area of length t*L2 following the C characters; B and C are positive integers, and t is greater than 0 and less than or equal to 1;
    matching the n text head detection boxes with the n text tail detection boxes based on text-line slope consistency and the proximity principle, and obtaining n text line detection boxes;
    correcting the n text line detection boxes, removing non-text image areas from the text line detection boxes, and obtaining n prediction boxes;
    filtering the n prediction boxes by using a non-maximum suppression algorithm, and obtaining the N text line detection boxes, target detection scores of the N text line detection boxes, and first classification labels of the N text line detection boxes.
  18. The server according to claim 15, characterized in that, when the feature information in the first target image is detected by using the classification and positioning model and the M non-text line detection boxes are extracted, the one or more application programs are further configured to perform the following steps:
    performing feature extraction on the first target image by using the classification and positioning model, and obtaining m feature maps of size a*a, wherein a feature map is an image containing non-text line information;
    对所述m张特征图中的非文本行信息进行中心坐标预测,基于所述中心坐标采用K-means聚类算法获取m个预测框的长和宽、所述m个预测框包含非文本行特征信息的置信度和所述m个预测框内非文本行特征信息所属类别的置信度;Perform center coordinate prediction on the non-text line information in the m feature maps, and use the K-means clustering algorithm to obtain the length and width of m prediction boxes based on the center coordinates, and the m prediction boxes contain non-text lines The confidence of the feature information and the confidence of the category to which the feature information of the non-text lines in the m prediction boxes belongs;
    利用非极大值抑制算法对所述m个预测框进行过滤,获得所述M个非文本行检测框、所述M个非文本行检测框的目标检测分数和所述M个非文本行检测框的第一次分类标签。Use a non-maximum suppression algorithm to filter the m prediction boxes to obtain the M non-text line detection boxes, the target detection scores of the M non-text line detection boxes, and the M non-text line detections The first classification label of the box.
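As a sketch of the K-means element of claim 18, the following shows the widely used YOLO-style anchor clustering, which groups candidate (width, height) pairs under a 1 - IoU distance to obtain representative prediction-box dimensions; the choice of k, the distance metric, and the input layout are assumptions rather than values taken from the patent.

    import numpy as np

    def iou_wh(wh, centroids):
        """IoU between one (w, h) shape and each centroid, all anchored at the origin."""
        inter = np.minimum(wh[0], centroids[:, 0]) * np.minimum(wh[1], centroids[:, 1])
        union = wh[0] * wh[1] + centroids[:, 0] * centroids[:, 1] - inter
        return inter / union

    def kmeans_box_dims(whs, k=5, iters=100, seed=0):
        """K-means over (width, height) pairs with distance = 1 - IoU."""
        whs = np.asarray(whs, dtype=float)
        rng = np.random.default_rng(seed)
        centroids = whs[rng.choice(len(whs), size=k, replace=False)]
        for _ in range(iters):
            # assign each shape to the centroid with the highest IoU (lowest 1 - IoU)
            assign = np.array([np.argmax(iou_wh(wh, centroids)) for wh in whs])
            new = np.array([whs[assign == j].mean(axis=0) if np.any(assign == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        return centroids  # k representative (length, width) pairs

For example, kmeans_box_dims([[120, 32], [64, 30], [200, 36], [40, 40]], k=2) returns two representative shapes that could seed the non-text-line prediction boxes. The NMS filtering in the final step of claim 18 can reuse the same score-sorted suppression sketched after claim 17.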
  19. The server according to any one of claims 15 to 18, wherein the slope of the line connecting the i-th text head detection box among the n text head detection boxes and the g-th text tail detection box among the n text tail detection boxes is a second slope, and the i-th text head detection box and the g-th text tail detection box satisfy the slope consistency condition when the difference between the slope of the g-th text tail detection box and a first slope is less than a first preset threshold and the difference between the second slope and the first slope is less than a second preset threshold, where the first slope is the slope of the i-th text head detection box, or the first slope is the average of the slopes of the n text head detection boxes and the n text tail detection boxes.
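Claim 19's test transcribes directly into code. In this sketch the threshold values are placeholders, and first_slope is supplied by the caller as either the i-th head box's own slope or the average slope over all the head and tail boxes, as the claim allows:

    def slopes_consistent(tail_slope, connect_slope, first_slope,
                          thresh1=0.05, thresh2=0.05):
        """True when the g-th tail box is slope-consistent with the i-th head
        box: |tail_slope - first_slope| < thresh1 (first preset threshold) and
        |connect_slope - first_slope| < thresh2 (second preset threshold)."""
        return (abs(tail_slope - first_slope) < thresh1
                and abs(connect_slope - first_slope) < thresh2)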
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
PCT/CN2019/117550 2019-09-18 2019-11-12 Certificate information classification and positioning method and apparatus WO2021051553A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910880737.X 2019-09-18
CN201910880737.XA CN110738238B (en) 2019-09-18 2019-09-18 Classification positioning method and device for certificate information

Publications (1)

Publication Number Publication Date
WO2021051553A1 true WO2021051553A1 (en) 2021-03-25

Family

ID=69268040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117550 WO2021051553A1 (en) 2019-09-18 2019-11-12 Certificate information classification and positioning method and apparatus

Country Status (2)

Country Link
CN (1) CN110738238B (en)
WO (1) WO2021051553A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476113A (en) * 2020-03-20 2020-07-31 中保车服科技服务股份有限公司 Card identification method, device and equipment based on transfer learning and readable medium
CN111898520A (en) * 2020-07-28 2020-11-06 腾讯科技(深圳)有限公司 Certificate authenticity identification method and device, computer readable medium and electronic equipment


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8965127B2 (en) * 2013-03-14 2015-02-24 Konica Minolta Laboratory U.S.A., Inc. Method for segmenting text words in document images
CN107742093B (en) * 2017-09-01 2020-05-05 国网山东省电力公司电力科学研究院 Real-time detection method, server and system for infrared image power equipment components
CN109271970A * 2018-10-30 2019-01-25 北京旷视科技有限公司 Face detection model training method and device
CN109977949B (en) * 2019-03-20 2024-01-26 深圳华付技术股份有限公司 Frame fine adjustment text positioning method and device, computer equipment and storage medium
CN109961040B (en) * 2019-03-20 2023-03-21 深圳市华付信息技术有限公司 Identity card area positioning method and device, computer equipment and storage medium
CN110008882B (en) * 2019-03-28 2021-06-08 华南理工大学 Vehicle detection method based on similarity loss of mask and frame
CN110084173B (en) * 2019-04-23 2021-06-15 精伦电子股份有限公司 Human head detection method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015089115A1 (en) * 2013-12-09 2015-06-18 Nant Holdings Ip, Llc Feature density object classification, systems and methods
CN106295629A (en) * 2016-07-15 2017-01-04 北京市商汤科技开发有限公司 Structured text detection method and system
CN108537146A * 2018-03-22 2018-09-14 五邑大学 A text line extraction system for mixed printed and handwritten text
CN109697440A * 2018-12-10 2019-04-30 浙江工业大学 An ID card information extraction method
CN109670495A * 2018-12-13 2019-04-23 深源恒际科技有限公司 A method and system for long text detection based on a deep neural network
CN110046616A (en) * 2019-03-04 2019-07-23 北京奇艺世纪科技有限公司 Image processing model generation, image processing method, device, terminal device and storage medium
CN110188755A * 2019-05-30 2019-08-30 北京百度网讯科技有限公司 An image recognition method, apparatus, and computer-readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627439A (en) * 2021-08-11 2021-11-09 北京百度网讯科技有限公司 Text structuring method, processing device, electronic device and storage medium
CN113486881A (en) * 2021-09-03 2021-10-08 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and medium
CN113486881B (en) * 2021-09-03 2021-12-07 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and medium
CN114037985A (en) * 2021-11-04 2022-02-11 北京有竹居网络技术有限公司 Information extraction method, device, equipment, medium and product

Also Published As

Publication number Publication date
CN110738238A (en) 2020-01-31
CN110738238B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
WO2021051553A1 (en) Certificate information classification and positioning method and apparatus
CN110569832B Real-time text positioning and recognition method based on a deep learning attention mechanism
US10817741B2 (en) Word segmentation system, method and device
Hamad et al. A detailed analysis of optical character recognition technology
AU2017302248B2 (en) Label and field identification without optical character recognition (OCR)
US8494273B2 (en) Adaptive optical character recognition on a document with distorted characters
Antonacopoulos et al. ICDAR2015 competition on recognition of documents with complex layouts-RDCL2015
JP2022532177A Forged face recognition methods, devices, and non-transitory computer-readable storage media
CN107729865A An offline recognition method and system for handwritten mathematical formulas
WO2023284502A1 (en) Image processing method and apparatus, device, and storage medium
CN111488732B (en) Method, system and related equipment for detecting deformed keywords
Makhmudov et al. Improvement of the end-to-end scene text recognition method for “text-to-speech” conversion
CN114724133B (en) Text detection and model training method, device, equipment and storage medium
Igorevna et al. Document image analysis and recognition: a survey
CN114581928A (en) Form identification method and system
Karthik et al. Segmentation and recognition of handwritten kannada text using relevance feedback and histogram of oriented gradients–a novel approach
Gharde et al. Identification of handwritten simple mathematical equation based on SVM and projection histogram
Chaturvedi et al. Automatic license plate recognition system using surf features and rbf neural network
WO2023011606A1 Training method of live body detection network, method and apparatus of live body detection
Chaki et al. Fragmented handwritten digit recognition using grading scheme and fuzzy rules
Cai et al. Bank card and ID card number recognition in Android financial APP
CN111488870A (en) Character recognition method and character recognition device
Rajithkumar et al. Template matching method for recognition of stone inscripted Kannada characters of different time frames based on correlation analysis
Jameel et al. A REVIEW ON RECOGNITION OF HANDWRITTEN URDU CHARACTERS USING NEURAL NETWORKS.
Singh et al. Line parameter based word-level Indic script identification system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946043

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946043

Country of ref document: EP

Kind code of ref document: A1