WO2020220575A1 - Certificate recognition method and apparatus, electronic device, and computer-readable storage medium

Certificate recognition method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2020220575A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
line
predicted
height
area
Prior art date
Application number
PCT/CN2019/108209
Other languages
English (en)
French (fr)
Inventor
郑迪昕
刘学博
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2020543760A (JP7033208B2)
Priority to SG11202007758TA
Priority to KR1020207025083A (KR102435365B1)
Priority to US16/991,533 (US20200372248A1)
Publication of WO2020220575A1

Classifications

    • CPC: G06V Image or video recognition or understanding (Section G Physics; Class G06 Computing; calculating or counting)
    • G06V20/63 Scene text, e.g. street names
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
    • G06V30/12 Detection or correction of errors, e.g. by rescanning the pattern
    • G06V30/1448 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields, based on markings or identifiers characterising the document or the area
    • G06V30/1452 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields, based on positionally close symbols, e.g. amount sign or URL-specific characters
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; identifying elements of the document, e.g. authors
    • G06V30/10 Character recognition

Definitions

  • the present disclosure relates to computer vision technology, in particular to a certificate recognition method and device, electronic equipment, and computer-readable storage media.
  • Current Optical Character Recognition (OCR) technology achieves high recognition accuracy for commonly used characters, but its recognition accuracy for special character types, such as ethnic minority characters, still needs to be improved.
  • An embodiment of the present disclosure provides a certificate recognition technology.
  • The first aspect of the embodiments of the present disclosure provides a certificate recognition method, including: performing key point detection on a certificate image to obtain information on multiple key points of the certificate included in the certificate image, wherein the multiple key points include at least two boundary defining points of a first text area in the certificate, and the first text area includes a plurality of text lines corresponding to a first character type; and determining a text recognition result of the certificate based on the information of the multiple key points.
  • A second aspect of the embodiments of the present disclosure provides a certificate recognition device, including:
  • a key point detection unit, configured to perform key point detection on the credential image to obtain information on multiple key points of the credential included in the credential image, wherein the multiple key points include at least two boundary defining points of the first text area in the credential, and the first text area includes a plurality of text lines corresponding to the first character type;
  • the text recognition unit is used to determine the text recognition result of the certificate based on the information of the multiple key points.
  • the certificate further includes a second text area, wherein the second text area includes at least one text line corresponding to a second character type different from the first character type, and the second text area has the same text content as the first text area.
  • an electronic device including a processor, and the processor includes the credential recognition device described in any one of the above embodiments.
  • an electronic device including: a memory for storing executable instructions;
  • a processor configured to communicate with the memory to execute the executable instruction to complete the operation of the credential identification method described in any one of the above embodiments.
  • a computer-readable storage medium for storing computer-readable instructions that, when executed, perform the operations of the certificate recognition method described in any one of the above embodiments.
  • a computer program including computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the certificate recognition method described in any one of the foregoing embodiments.
  • another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the certificate recognition method in any one of the above possible implementations.
  • the computer program product is specifically a computer storage medium.
  • the computer program product is specifically a software product, such as an SDK (Software Development Kit).
  • In another certificate recognition method and device provided by the embodiments of the present disclosure, key points of the certificate image are detected to obtain information on multiple key points of the certificate image, wherein the multiple key points include at least two boundary defining points of the first text area in the certificate, and the first text area includes multiple text lines corresponding to the first character type;
  • the information of multiple key points is used to determine the text recognition result of the certificate.
  • In the embodiments of the present disclosure, key point detection is performed on the certificate image to obtain information on multiple key points of the certificate included in the certificate image, wherein the plurality of key points include at least two boundary defining points of the first text area in the certificate, and the first text area includes a plurality of text lines corresponding to the first character type; the text recognition result of the certificate is determined based on the information of the plurality of key points. By adding at least two boundary defining points of the first text area, the accuracy of locating the multi-line text in the first text area is improved, the negative impact of other character types on the text recognition of the first character type is reduced, and the recognition accuracy of the content of the first character type in the certificate is improved.
  • FIG. 1 is an example diagram of an ID card to which the certificate recognition technology provided by an embodiment of the disclosure is applicable.
  • FIG. 2 is a schematic flowchart of a credential identification method provided by an embodiment of the disclosure.
  • FIG. 3 is a schematic diagram of another process of the credential identification method provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic diagram of another process of the credential identification method provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic diagram of another process of the credential identification method provided by an embodiment of the disclosure.
  • FIG. 6 is a schematic diagram of still another process of the credential identification method provided by an embodiment of the disclosure.
  • FIG. 7 is a diagram of an application example of the certificate recognition method provided by the embodiments of the disclosure.
  • FIG. 8 is a diagram of another application example of the certificate recognition method provided by the embodiments of the disclosure.
  • FIG. 9 is a schematic structural diagram of a credential identification device provided by an embodiment of the disclosure.
  • FIG. 10 is a schematic diagram of an exemplary structure of an electronic device according to an embodiment of the disclosure.
  • the embodiments of the present disclosure are mainly applied to identification of ID cards, but can also be applied to the identification of other certificates or bills with a fixed or partially fixed layout, which is not limited in the embodiments of the present disclosure.
  • the current OCR recognition algorithm has high recognition accuracy for most ID cards such as Han nationality ID cards.
  • However, the recognition of less common ID cards, such as ethnic minority ID cards, mainly faces the following key problems:
  • Ethnic minority ID cards have multiple layouts. Taking the address field as an example, there are currently two commonly used layouts. In the first layout, there is no obvious line separation between the ethnic minority characters and the Chinese characters, which appear line by line; in the second layout, shown in Figure 1, the ethnic minority characters and the Chinese characters appear in the same area, but there is an obvious line separation between them and they do not appear line by line. This diversity of layouts also affects the accuracy of ethnic minority ID card recognition.
  • Therefore, an embodiment of the present disclosure proposes an image recognition technology that adds, to the set of key points, at least two boundary defining points of a first text area containing multiple text lines in the Chinese character area (for example, the upper-left key point and the lower-right key point, which can determine the boundary of the first text area). This improves the positioning accuracy of the Chinese character area that contains at least the first text area, reduces the influence of ethnic minority characters on Chinese character recognition, and thereby helps improve the accuracy of document recognition.
  • Figure 1 exemplarily shows the 24 key points in the embodiment of the present disclosure, including: the four corner key points of the credential image; the upper-left and lower-right key points of each field name area (the "name", "gender", "birth", "address" and "citizen ID number" fields); and the upper-left and lower-right key points of the field information areas of some fields (the name, gender, ethnicity and ID number field information areas).
  • In addition, the key points also include the upper-left key point and the lower-right key point of the address field information area, so as to improve the accuracy of recognizing the Chinese characters in ethnic minority ID cards.
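  • As a non-limiting illustration of this 24-point layout, the sketch below enumerates one possible indexing of the key points described above (card corners, field name areas, field information areas, and the added address-area corners); the concrete field list and ordering are assumptions for illustration only.

```python
# Hypothetical enumeration of the 24 key points described above. The grouping
# follows the text: 4 card corners + 2 corners for each of the 5 field name
# areas + 2 corners for each of 4 field info areas + 2 corners for the address
# field info area = 24. The concrete ordering is an assumption.
CARD_CORNERS = ["card_tl", "card_tr", "card_br", "card_bl"]                           # 4
FIELD_NAME_AREAS = ["name", "gender", "birth", "address", "id_number"]                # 5 x 2 = 10
FIELD_INFO_AREAS = ["name_info", "gender_info", "ethnicity_info", "id_number_info"]   # 4 x 2 = 8
ADDRESS_INFO_AREA = ["address_info"]                                                  # 1 x 2 = 2

def keypoint_names():
    names = list(CARD_CORNERS)
    for area in FIELD_NAME_AREAS + FIELD_INFO_AREAS + ADDRESS_INFO_AREA:
        names += [f"{area}_tl", f"{area}_br"]   # upper-left and lower-right corners
    return names

assert len(keypoint_names()) == 24
```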
  • FIG. 2 is a schematic flowchart of a credential identification method provided by an embodiment of the disclosure.
  • Step 210 Perform key point detection on the credential image to obtain information on multiple key points of the credential included in the credential image.
  • the credential recognition method can be applied to various image processing devices.
  • the image processing devices include terminal devices such as mobile phones, tablet computers, wearable devices, and access control devices.
  • the certificate recognition method can be applied to a server on the network side.
  • the terminal collects a certificate image and uploads it to the server.
  • the server recognizes the certificate image and obtains the certificate information of the certificate corresponding to the certificate image.
  • the credential information includes at least the text recognition result.
  • In this way, the user does not need to manually enter identity information; the document image is simply collected, and the terminal or server obtains the text recognition result of the certificate by recognizing the document image.
  • the multiple key points include at least two boundary defining points of the first text area in the certificate, and the first text area includes multiple text lines corresponding to the first character type.
  • the information of the multiple key points includes position information of the multiple key points in the document image.
  • the document image is an image formed by collecting the document.
  • the certificates include, but are not limited to: ID cards, passports, residence permits, temporary residence permits, degree certificates, academic certificates and other certificate images containing multiple types of characters.
  • the certificate includes two types of characters, namely the first character type and the second character type.
  • the text of the first character type and the text of the second character type appear on different lines, and text lines of the first character type and text lines of the second character type can have the same or different content.
  • the first character type is a recognizable character type or a recognized target character type, such as Chinese characters
  • the second character type is an unrecognizable character type or a character type that is not to be recognized, for example, ethnic minority characters.
  • In order to maintain the universality of the ID card recognition technology, it is suitable both for the recognition of Han nationality ID cards and for the recognition of ethnic minority ID cards.
  • In the embodiments of the present disclosure, the Chinese characters in the ID cards are recognized, while the minority characters are not recognized.
  • the first character type may be Chinese characters
  • the second character type may be a language used in other countries or regions, for example, characters of minority languages in other countries.
  • the text area corresponding to the first character type may only contain text of the first character type, or may further contain other character types other than the first and second character types, such as numbers, etc.
  • the text area corresponding to the second character type may include text of the second character type and text of other character types, which is not limited in the embodiment of the present disclosure.
  • In some embodiments, the certificate further includes a second text area, wherein the second text area includes at least one text line corresponding to a second character type different from the first character type, and the second text area has the same text content as the first text area.
  • the address field information area in the ID card contains the Chinese character information area and the ethnic minority text information area, indicating the same address of the person.
  • In the example shown in FIG. 1, the first text area and the second text area are respectively the Chinese character information area and the minority text information area of the address field information area; the second text area is adjacent to the first text area or separated from it by at least one blank line, but the embodiments of the present disclosure are not limited thereto.
  • The embodiments of the present disclosure perform key point detection on the credential image to obtain information on multiple key points of the credential included in the credential image, where the key point information includes location information and may further include other information, which is not limited in the embodiments of the present disclosure.
  • the multiple key points of the certificate include at least two boundary defining points of the first text area, for example, the upper-left and lower-right key points, the lower-left and upper-right key points, or all four vertices, which is not limited in the embodiments of the present disclosure.
  • Based on these boundary defining points, the first text area can be positioned more accurately, which is beneficial for obtaining a more accurate predicted line height of the first text area, reduces the influence of text of the second character type on the recognition of the certificate, and improves the recognition accuracy.
  • Step 220 Determine the text recognition result of the certificate based on the information of multiple key points.
  • the more precise position of the text line included in the first text area can be determined.
  • The text of the first character type whose position has been determined is then recognized, and the text recognition result of the first text area is obtained.
  • In addition, the position of text lines of the first character type in the other text areas included in the certificate may be determined based on the positions of the text lines of the first character type included in the first text area, which is beneficial for improving the accuracy of text recognition of the certificate.
  • In the embodiments of the present disclosure, key point detection is performed on a credential image to obtain information on multiple key points of the credential included in the credential image, wherein the multiple key points include at least two boundary defining points of the first text area in the certificate, and the first text area includes a plurality of text lines corresponding to the first character type; the text recognition result of the certificate is determined based on the information of the plurality of key points. By adding at least two boundary defining points of the first text area, the accuracy of locating the multi-line text in the first text area is improved, the impact of other character types on the recognition of text of the first character type is reduced, and the recognition accuracy of the first-character-type content in the certificate is improved.
  • the first character type is Chinese characters
  • the second character type is ethnic minority scripts.
  • Current character recognition technology cannot yet recognize minority characters; therefore, the embodiments of the present disclosure need to eliminate the interference of minority characters with the Chinese character content. For example, when minority characters and Chinese characters do not appear line by line, that is, when there is a gap between the minority character field and the Chinese character field, the original ID card processing method often fails to detect the text area, or incorrectly detects and recognizes ethnic minority characters as Chinese characters, resulting in incorrect results.
  • both the first text area and the second text area may be connected quadrangular areas, for example, rectangular areas.
  • FIG. 3 is a schematic diagram of another process of the credential identification method provided by an embodiment of the disclosure.
  • Step 310 Perform key point detection on the credential image to obtain information on multiple key points of the credential included in the credential image.
  • the multiple key points include at least two boundary defining points of the first text area in the certificate, and the first text area includes multiple text lines corresponding to the first character type.
  • Step 320 Determine the target predicted position of each text line in the multiple text lines contained in the first text area based on the information of at least two boundary defining points of the first text area.
  • a rectangular area may be determined based on the information of at least two boundary defining points of the first text area.
  • the rectangular area includes at least the first text area and may also include part of the second text area;
  • To recognize the first character type in the first text area, it is necessary to determine the position of each text line, that is, the target predicted position of each text line determined in the embodiment of the present disclosure; text recognition is then performed at the target predicted position, so that the content of the first character type included in the first text area can be determined.
  • the recognition of the content in the first text area can be performed line by line. Line-by-line recognition improves the accuracy of character recognition and reduces recognition errors caused by the intersection between lines.
  • Step 330 Recognize at least one text area corresponding to the first character type contained in the certificate based on the target predicted position of each text line in the plurality of text lines contained in the first text area, to obtain a text recognition result of the certificate.
  • the certificate can include multiple text areas (including the first text area) with recognizable content.
  • The character types in these text areas are all the first character type, and because the certificate has a relatively fixed layout, the height of the Chinese characters in the ID card is uniform, that is, the line height of the Chinese characters in the ID card image is the same. Therefore, once the text lines included in the first text area are determined, the line height of the text lines included in the first text area can be determined.
  • This line height can be used to correct the line height of the text lines in other text areas; based on the corrected text line height, the position of each text line in the other text areas is determined, and the content in the other text areas is then recognized, which improves the recognition accuracy of the text in the other text areas.
  • FIG. 4 is a schematic diagram of a part of the process in another embodiment of the credential recognition method provided by the embodiment of the disclosure. Based on the foregoing embodiment, step 320 includes:
  • Step 402 Determine the initial predicted position of each text line in the multiple text lines contained in the first text area based on the information of at least two boundary defining points of the first text area.
  • The initial predicted position of a text line may include the upper boundary and the lower boundary of the text line, so that the position of the text line is determined by the coordinates of the upper and lower boundaries. In the embodiment of the present disclosure, the initial predicted positions may be determined based on the number of lines included in the first text area, the initial line height of each text line, and the upper and lower boundaries of the first text area determined from the information of the boundary defining points, where the number of lines and the initial line heights can be obtained using a neural network.
  • For example, a deep neural network is used to identify the number of lines included in the first text area of the document and the initial line height of each text line in the first text area.
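  • As a minimal sketch of how initial predicted positions could be assembled from these quantities (the area boundaries derived from the boundary defining points plus the predicted line count and per-line initial heights), assuming image y-coordinates grow downward and lines are stacked from the top; all names are illustrative:

```python
def initial_predicted_positions(top_y, bottom_y, line_heights):
    """Stack the predicted lines downward from the area's upper boundary.

    top_y / bottom_y : upper and lower boundary of the first text area,
                       derived from its boundary defining points.
    line_heights     : initial predicted height of each text line
                       (len(line_heights) == predicted number of lines).
    Returns a list of (upper, lower) boundaries, one per text line.
    """
    positions, y = [], top_y
    for h in line_heights:
        positions.append((y, min(y + h, bottom_y)))
        y += h
    return positions
```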
  • Step 404 In response to determining that the initial predicted positions of the multiple text lines are abnormal, perform correction processing on the initial predicted positions of the multiple text lines contained in the first text area to obtain target predicted positions of the multiple text lines.
  • After obtaining the initial predicted positions, the embodiments of the present disclosure need to determine whether the initial predicted positions are normal; when an initial predicted position is abnormal, recognizing with it would cause errors in the recognized content.
  • The embodiments of the present disclosure improve the accuracy of the text line positions through the correction process. Since one or more of the multiple text lines included in the first text area may have an abnormal initial predicted position, the correction process may correct the abnormal initial predicted position based on the line heights of the other text lines, or the initial predicted position may be corrected by other methods.
  • the embodiment of the present disclosure does not limit the specific correction method.
  • it is determined whether there is an abnormality in the initial position of the multiple text lines by determining whether there is a text line with abnormal line height among the multiple text lines. For example, in response to the presence of a corresponding text line whose initial predicted line height is greater than a first preset line height among the multiple text lines, it is determined that the initial predicted positions of the multiple text lines are abnormal. For another example, in response to the average predicted line height of the multiple text lines being higher than the second preset line height, it is determined that the initial predicted positions of the multiple text lines are abnormal, and so on.
  • the first preset line height may be obtained by counting the text line heights in a large number of documents, for example, the first preset line height is set to 15 pixels.
  • whether the row height is greater than the first preset row height is used as a criterion for determining whether the initial predicted row height is normal.
  • When the line height of each text line is less than or equal to the first preset line height, it means that the recognized number of lines and the initial predicted line heights are relatively accurate. In this case, the first average line height is obtained from the upper boundary and the lower boundary of the first text area and the number of lines, the first average line height is used as the target predicted line height of each text line, and the target predicted position of each text line is then determined.
  • When the initial predicted line height of one or more of the multiple text lines is greater than the first preset line height, it indicates that the initial predicted line heights of the multiple text lines are incorrectly recognized and need to be corrected to improve the accuracy of the text recognition result.
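  • A small sketch of this area-level check, assuming the example threshold of 15 pixels for the first preset line height and using the first average line height (area height divided by line count) as the common target height when all lines pass; names are illustrative:

```python
FIRST_PRESET_LINE_HEIGHT = 15  # pixels, example value from the disclosure

def area_line_heights_normal(initial_heights, top_y, bottom_y):
    """If every initial predicted line height is within the preset bound, the
    line count / line height prediction is treated as accurate and the first
    average line height is used as every line's target height."""
    if all(h <= FIRST_PRESET_LINE_HEIGHT for h in initial_heights):
        first_avg = (bottom_y - top_y) / len(initial_heights)
        return True, [first_avg] * len(initial_heights)
    return False, initial_heights  # needs per-line correction
```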
  • In some embodiments, step 404 includes: in response to determining that the initial predicted positions of the plurality of text lines are abnormal, determining the text line in the first text area whose initial predicted line height is abnormal; in response to determining that the initial predicted line height of the first text line in the first text area is abnormal, correcting the initial predicted line height of the first text line to obtain the target predicted line height of the first text line; and correcting the initial predicted position of the first text line based on the target predicted line height of the first text line to obtain the target predicted position of the first text line.
  • When the initial predicted positions of the multiple text lines are abnormal, it is first determined which of the multiple text lines have abnormal initial predicted positions, and position correction is then performed on those text lines. When an abnormality is detected in the initial predicted position of the first text line among the multiple text lines, for example, its initial predicted line height is abnormal, the predicted line height of the first text line is corrected to obtain an accurate target predicted position.
  • In some embodiments, the second predicted average line height of at least one second text line other than the first text line among the multiple text lines is determined, and the initial predicted line height of the first text line is corrected based on the second predicted average line height. Specifically, the first predicted average line height of the first text area may be obtained based on the position information of the boundary defining points of the first text area and the predicted number of lines; then, based on the first predicted average line height and the initial predicted line height of the first text line, the average predicted line height of the remaining at least one second text line in the first text area, that is, the second predicted average line height, is obtained. The second predicted average line height can then be used to correct the initial predicted line height of the first text line to obtain the target predicted line height of the first text line.
  • FIG. 5 is a schematic diagram of another process of the credential identification method provided by an embodiment of the disclosure. Wherein, illustratively, step 404 includes the following steps.
  • Step 502 Determine whether the initial predicted line height corresponding to the initial predicted position of the first text line is abnormal based on the information of the at least two boundary defining points of the first text area and the initial predicted position of at least one adjacent line of the first text line .
  • The adjacent line can be the previous text line and/or the next text line of the first text line. For example, when the first text line is the first line, the adjacent line is the next text line; when the first text line is a middle line, the adjacent lines are the previous text line and the next text line; and when the first text line is the last line, the adjacent line is the previous line.
  • The line height of each of the multiple text lines included in the first text area should be the same. Therefore, when the difference between the initial predicted line height of the first text line and that of an adjacent line reaches a certain level, it indicates that the initial predicted line height of the first text line is abnormal.
  • Step 504 in response to determining that the initial predicted line height of the first text line is abnormal, correct the initial predicted line height of the first text line to obtain the target predicted line height of the first text line.
  • The second text area is usually adjacent to the first text area. When the second text area is located above the first text area, the position of the last line in the first text area generally does not need to be corrected; in this case, the initial predicted position of the first text line is corrected using the next line of the first text line, and the correction of the text lines in the first text area proceeds from the first line to the second-to-last line. Conversely, when the second text area is located below the first text area, the position of the first line in the first text area does not usually need to be corrected; in this case, the initial predicted position of the first text line is corrected using the previous line of the first text line, and the correction of the text lines in the first text area proceeds from the last line to the second line.
  • Step 506 Correct the initial predicted position of the first text line based on the target predicted line height of the first text line to obtain the target predicted position of the first text line.
  • The lower boundary may be determined based on the determined upper boundary of the first text line, or the upper boundary may be determined based on the determined lower boundary of the first text line; based on the upper boundary and the lower boundary, the target predicted position can be determined.
  • the initial predicted upper boundary of the first text line is adjusted to obtain the target predicted upper boundary of the first text line.
  • the upper boundary of the first text line may be determined based on the upper boundary of the next line.
  • the lower boundary of the first text line may have an intersection with the upper boundary of the next text line.
  • the lower boundary of the first text line is corrected to prevent the text of the next text line from affecting the first text line.
  • For example: lower boundary of the first text line = upper boundary of the next text line - 1 pixel.
  • Target predicted upper boundary of the first text line = lower boundary of the first text line - target predicted line height of the first text line.
  • In the embodiments of the present disclosure, the initial predicted line height of the first text line is corrected using the initial predicted position of the adjacent line, and the target predicted position is then determined based on the corrected target predicted line height, so that the line heights and positional relationships of the multiple text lines included in the first text area are more accurate, which improves the accuracy of content recognition in the first text area.
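  • The two correction formulas above can be written compactly; the sketch below assumes y-coordinates grow downward, so the upper boundary is obtained by subtracting the target line height from the (fixed) lower boundary:

```python
def correct_boundaries(next_line_upper, target_line_height):
    """Correct the first text line's boundaries as described above:
    lower = upper boundary of the next line minus 1 pixel, and
    target upper = lower boundary minus the target predicted line height."""
    lower = next_line_upper - 1
    upper = lower - target_line_height
    return upper, lower
```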
  • FIG. 6 is a schematic diagram of still another process of the credential identification method provided by an embodiment of the disclosure. Wherein, for example, step 502 includes the following steps.
  • Step 602 Determine a first predicted average line height of multiple text lines in the first text area based on the information of the at least two boundary defining points of the first text area and the predicted line number of the first text area.
  • For example, the at least two boundary defining points include the upper-left key point and the lower-right key point: the upper boundary coordinate of the first text area may be determined from the upper-left key point, and the lower boundary coordinate of the first text area may be determined from the lower-right key point. The height of the first text area is then given by the difference between the lower and upper boundary coordinates, and the predicted number of lines included in the first text area is recognized by a neural network. Dividing the height of the first text area by the predicted number of lines yields the first predicted average line height.
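  • In short, as a sketch with illustrative names and (x, y) point tuples:

```python
def first_predicted_average_line_height(upper_left, lower_right, predicted_lines):
    """Height of the first text area (from its upper-left and lower-right
    boundary defining points) divided by the predicted number of lines."""
    top_y, bottom_y = upper_left[1], lower_right[1]
    return (bottom_y - top_y) / predicted_lines
```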
  • Step 604 Determine whether the initial predicted line height of the first text line is abnormal based on at least one of the first predicted average line height of the multiple text lines in the first text area and the initial predicted line height corresponding to the initial predicted position of at least one adjacent line of the first text line; for example, determine whether the initial predicted line height of the first text line is abnormal based on the first predicted average line height of the first text area and the initial predicted line height corresponding to the initial predicted position of at least one adjacent line of the first text line.
  • the first predicted average line height can be used to measure the line heights of all text lines in the first text area.
  • If the line number prediction is accurate, whether the initial predicted line height is abnormal can be determined from the relationship between the initial predicted line height of the first text line and the first predicted average line height, for example, whether the initial predicted line height of the first text line is greater than a set multiple of the first predicted average line height.
  • The embodiment of the present disclosure further adds the initial predicted positions of adjacent lines, on top of the first predicted average line height, as the basis for evaluating whether the initial predicted line height of the first text line is abnormal, which improves the accuracy of this judgment.
  • In some embodiments, step 604 includes: in response to the initial predicted line height of the first text line reaching a first preset multiple of the first predicted average line height, determining that the initial predicted line height of the first text line is abnormal; or, in response to the initial predicted line height of the first text line reaching a second preset multiple of the initial predicted line height of at least one adjacent line of the first text line, determining that the initial predicted line height of the first text line is abnormal; or, in response to the initial predicted line height of the first text line reaching a first preset multiple of the first predicted average line height and reaching a second preset multiple of the initial predicted line height of at least one adjacent line of the first text line, determining that the initial predicted line height of the first text line is abnormal.
  • The first preset multiple and the second preset multiple may be the same or different; for example, both the first preset multiple and the second preset multiple are set to 1.2. The embodiment of the present disclosure does not limit the specific values of the first preset multiple and the second preset multiple.
  • In some embodiments, step 604 includes: in response to the initial predicted line height of the first text line reaching a first preset multiple of the first predicted average line height, and the initial predicted line height of the first text line reaching a second preset multiple of the initial predicted line height of the next text line of the first text line, determining that the initial predicted line height of the first text line is abnormal.
  • This embodiment of the present disclosure is directed to the case where the second text area is located above the first text area. In this case, the lower a text line is, the farther it is from the interference of the second text area with the text content, that is, the initial predicted line height of a lower text line is relatively more reliable. Therefore, the embodiment of the present disclosure confirms whether the initial predicted line height of the first text line is abnormal based on the initial predicted line height of the next text line, which improves the accuracy of the abnormality confirmation.
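  • A sketch of this abnormality check, using the example multiple of 1.2 for both preset multiples and the next line as the adjacent line; the function name and the combination of conditions follow the example above and are illustrative:

```python
PRESET_MULTIPLE = 1.2  # example value for both preset multiples

def line_height_abnormal(current_height, first_avg_height, next_line_height):
    """Flag a line as abnormal when its initial predicted height reaches the
    preset multiple of both the first predicted average line height and the
    next line's initial predicted height."""
    return (current_height >= PRESET_MULTIPLE * first_avg_height and
            current_height >= PRESET_MULTIPLE * next_line_height)
```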
  • In some embodiments, step 504 includes: determining a second predicted average line height of the text lines other than the first text line among the plurality of text lines based on the first predicted average line height and the initial predicted line height of the first text line; and correcting the initial predicted line height of the first text line based on the second predicted average line height to obtain the target predicted line height of the first text line.
  • At this point, the initial predicted line height of the first text line has been determined to be abnormal based on the first predicted average line height and the initial predicted line height of the next text line, while the initial predicted line heights of the other text lines (including the next text line) are relatively accurate. Therefore, the second predicted average line height is obtained by averaging the initial predicted line heights of the other text lines, and the initial predicted line height of the first text line is corrected based on the second predicted average line height. The correction makes the target predicted line height of the first text line closer to the line heights of the other text lines in the first text area, which improves the accuracy of the target predicted line height of each text line in the first text area.
  • In some embodiments, in response to the second predicted average line height exceeding a first preset value, the line height of the first text line is corrected to a second preset value; in response to the second predicted average line height being less than or equal to the second preset value, the line height of the first text line is corrected to the second predicted average line height. The line height of the first text line is theoretically equal to the second predicted average line height, which is determined from the line heights of the other lines after removing the first text line. If the second predicted average line height is greater than the first preset value, it means that the detected first text line is not a real line of the first text area of the ID, but the result of two lines being merged into one by a false detection. For example, suppose the first text area of the real ID card has four lines but only three lines are actually detected, and the line height of the middle detected line happens to be close to the first predicted average line height; in this case, the middle line is corrected based on the initial predicted line heights of the other two detected lines (the second predicted average line height). In addition, in response to the corrected line height of the first text line being greater than or equal to the second preset value, the initial predicted line height corresponding to the initial predicted position of the next text line of the first text line may be taken as the target predicted line height of the first text line.
  • After the target predicted line height of the first text line is determined, the predicted upper boundary corresponding to the initial predicted position of the first text line is adjusted based on the target predicted line height while keeping the lower boundary of the first text line unchanged, to obtain the target predicted upper boundary of the first text line.
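  • A sketch of this correction, assuming the example value of 15 pixels for the preset value, computing the second predicted average as the mean of the other lines' initial heights once the abnormal line is removed, and recomputing the upper boundary while keeping the lower boundary fixed (y grows downward); names are illustrative:

```python
SECOND_PRESET_VALUE = 15  # pixels, example value

def correct_abnormal_line(initial_heights, abnormal_idx, lower_boundary):
    """Correct one abnormal line using the second predicted average line height,
    then move its upper boundary while keeping its lower boundary fixed."""
    others = [h for i, h in enumerate(initial_heights) if i != abnormal_idx]
    second_avg = sum(others) / len(others)
    # If even the remaining lines' average is too large, the detected line is
    # likely two real lines merged into one; fall back to the preset value.
    target_height = SECOND_PRESET_VALUE if second_avg > SECOND_PRESET_VALUE else second_avg
    target_upper = lower_boundary - target_height
    return target_height, target_upper
```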
  • In some embodiments, step 604 further includes the following. When the first text line is a middle line, the text lines adjacent to it include the previous text line and the next text line. If it cannot be determined from the first predicted average line height and the initial predicted line height of the next text line whether the initial predicted line height of the first text line is abnormal, it may happen that the initial predicted line height of the first text line is close to the first predicted average line height but greater than that of the next text line. In this case, the initial predicted line heights of the previous text line and the next text line are used to confirm whether a line number recognition error has occurred, that is, whether two text lines have been recognized as one first text line. When the initial predicted line height of the first text line reaches the second preset multiple of the initial predicted line heights of both the previous text line and the next text line (for example, close to 2 times), it can be confirmed that a line number recognition error has occurred. The line height of the first text line is then corrected using the initial predicted line heights of the previous text line and the next text line; the correction process includes taking the third predicted average line height, obtained from the initial predicted line heights of the previous text line and the next text line, as the target predicted line height of the first text line, thereby obtaining the corrected line height of the first text line.
  • In some embodiments, step 504 further includes: in response to the corrected line height of the first text line being greater than or equal to the second preset value, taking the initial predicted line height of the next text line of the first text line as the target predicted line height of the first text line; and in response to the corrected line height of the first text line being less than a third preset value, taking the corrected line height of the first text line as the target predicted line height of the first text line. That is, if the corrected line height is still significantly greater than the standard line height, for example greater than or equal to the second preset value (for example, 22 pixels), it means that there is still a problem with the line height of the first text line; in this case, if the first text line is not the first line, the initial predicted line height of the next text line is used as the target predicted line height of the first text line. When the corrected line height is close to the standard line height, for example less than the third preset value, the corrected line height is used as the target predicted line height of the first text line.
  • In some embodiments, step 330 includes: correcting the initial predicted position of a third text area in the at least one target text area based on the target predicted line heights corresponding to the target predicted positions of the multiple text lines contained in the first text area, to obtain the target predicted position of the third text area; and obtaining the text recognition result of the third text area based on the target predicted position of the third text area.
  • the line height of each text line in the first text area is the corrected target predicted line height.
  • When the initial predicted line height of the third text area (for example, the name field in the ID card image) is abnormal (for example, it is greater than a set line height, or its difference from the set line height is greater than a preset value), the initial predicted position of the third text area is corrected to obtain the final predicted position of the third text area.
  • The third predicted average line height of the first text area can be obtained by averaging the target predicted line heights of the text lines in the first text area, and the line height of the third text area is corrected based on this average line height. The correction may be to replace the line height of the text line in the third text area with the third predicted average line height.
  • The detection information of each text line in the first text area is read. If the line height of each line is normal and no abnormal height occurs, the average line height of the first text area is recorded and used for the correction.
  • The correction rules may include: if (line height of the text line in the third text area) - (third predicted average line height of the first text area) > 2 pixels, the line height of the text line in the third text area is corrected to the third predicted average line height of the first text area.
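  • A sketch of this correction rule for another text area such as the name field, using the 2-pixel margin from the text; variable names are illustrative:

```python
def correct_other_area_line_height(area_line_height, third_predicted_avg, margin=2):
    """If the area's line height exceeds the first text area's averaged target
    line height by more than `margin` pixels, replace it with that average."""
    if area_line_height - third_predicted_avg > margin:
        return third_predicted_avg
    return area_line_height
```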
  • the credential includes an ID card; and/or, the first text area includes an address area.
  • FIG. 7 is a diagram of an application example of the certificate recognition method provided by the embodiment of the disclosure.
  • Step 710 Perform key point detection on the certificate image of the ethnic minority ID card to obtain the information of the 24 key points of the ethnic minority ID card. The 24 key points include the upper-left key point and the lower-right key point of the address field information area, and the address field information area includes multiple text lines corresponding to Chinese characters.
  • Step 720 Determine the address field information area from its upper-left and lower-right key points, and recognize the number of text lines included in the address field information area and the line height of each text line by means such as a neural network.
  • Step 730 Determine whether the line height of each text line is normal (for example, whether its difference from the ID card line height obtained from big-data statistics is less than a set value); if the line height of each text line is normal, go to step 750; otherwise, go to step 740.
  • Step 740 If the number of text lines obtained for the address field information area is greater than or equal to 3 and one or more text lines (usually one text line) have an abnormal height, correct the height of the text lines with abnormal height, and obtain the average line height of the text lines in the address field information area after correction. In some embodiments, since the ethnic minority characters are located above the Chinese characters, the correction only applies to the first N-1 lines and does not correct the last line, where N represents the number of text lines included in the address field information area.
  • Step 750 Record the average line height avg_h_addr of the text line in the address field information area, and correct the line height h_name of the name field information area.
  • the correction rule is: if h_name - avg_h_addr > 2 pixels, the line height h_name of the name field information area is corrected to the average line height avg_h_addr of the address field.
  • Step 760 Recognize the Chinese character content of each text line in the address field information area based on the average line height of the text lines in the address field information area to obtain the address information in the ethnic minority ID card; recognize the Chinese character content in the name field information area based on the corrected line height of the name field information area to obtain the name information in the ethnic minority ID card, thereby realizing the recognition of the ethnic minority ID card.
  • FIG. 8 is a diagram of another application example of the certificate recognition method provided by the embodiments of the disclosure.
  • In the line height correction method provided in step 740, the multiple text lines in the address field information area of the ethnic minority ID card are corrected sequentially from top to bottom (for example, from the first line to the (N-1)-th line).
  • the correction process includes the following steps.
  • Step 802 Obtain the upper and lower boundaries of the rectangular box where the address field information area is located and the number of lines, and calculate the average line height of the text lines in the address field information area of the ethnic minority ID card; detect the line height of the current line and the line height of the next line.
  • Step 804 Determine whether the line height of the current line is greater than or equal to 1.2 times the line height of the next line (1.2 is a set value that can be adjusted for different situations) and greater than or equal to 1.2 times the average line height; if so, it is determined that the line height of the current line is abnormal and step 806 is executed; otherwise, step 808 is executed.
  • Step 806 The line height of the current line is theoretically equal to the average line height, denoted new_h_avg_line, of the other lines (all text lines in the address field except the current line) after removing the current line. If new_h_avg_line is greater than 15 pixels (an optional value that can be obtained through big-data statistics), it means that the current line detected this time is not really a single line of the address field of the ethnic minority ID card, but the result of two lines being merged into one by a false detection; in this case, set the line height of the current line to 15 pixels. If new_h_avg_line is less than or equal to 15 pixels, set the line height of the current line to new_h_avg_line. The corrected line height of the current line is thus obtained; go to step 810.
  • Step 808 When it is detected that the line height of the current line is close to the average line height (for example, the line height of the current line is approximately equal to the height of the address field information area divided by the number of lines), compare the line height of the current line with the line heights of the two adjacent lines. If the line height of the current line is greater than 1.8 times the line height of the next line (1.8 is a set value that can be adjusted for different situations) and greater than 1.8 times the line height of the previous line, it is judged that two lines have been merged into one, and the line height of the current line is corrected based on the line heights of the previous and next lines to obtain the corrected line height of the current line. The situation in this step may correspond to the case where the address field of the real ethnic minority ID card has four lines but only three lines are actually detected.
  • Step 810 Determine whether the corrected line height of the current line is greater than 22 pixels (an optional value that can be obtained through big-data statistics); if so, go to step 812; otherwise, use the corrected line height of the current line as the target line height of the current line and go to step 814.
  • Step 812 When the current line is not the first line, take the line height of the next line as the target line height of the current line, and execute step 814.
  • Step 814 Correct the upper boundary of the current line.
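  • Pulling steps 802 to 814 together, the sketch below walks the first N-1 lines of the address field through these checks using the example thresholds quoted above (1.2x, 1.8x, 15 pixels, 22 pixels, and the 1-pixel gap mentioned earlier). It is an illustrative reading of the flow with assumed data structures, not the patented implementation itself.

```python
def correct_address_lines(lines, area_top, area_bottom):
    """lines: list of dicts {"upper": y0, "lower": y1}, ordered top to bottom.
    Corrects the first N-1 lines in place; the last line is left untouched,
    since the minority-script block sits above the Chinese block."""
    n = len(lines)
    heights = [ln["lower"] - ln["upper"] for ln in lines]
    avg_h = (area_bottom - area_top) / n                      # step 802

    for i in range(n - 1):                                    # first N-1 lines only
        h, h_next = heights[i], heights[i + 1]
        h_prev = heights[i - 1] if i > 0 else None

        if h >= 1.2 * h_next and h >= 1.2 * avg_h:            # step 804: abnormal
            others = heights[:i] + heights[i + 1:]            # step 806
            new_h_avg_line = sum(others) / len(others)
            corrected = 15 if new_h_avg_line > 15 else new_h_avg_line
        elif (h_prev is not None and h > 1.8 * h_next
              and h > 1.8 * h_prev):                          # step 808: two lines merged
            corrected = (h_prev + h_next) / 2
        else:
            corrected = h

        if corrected > 22 and i != 0:                         # steps 810 / 812
            corrected = h_next

        lines[i]["lower"] = lines[i + 1]["upper"] - 1         # keep a 1-pixel gap
        lines[i]["upper"] = lines[i]["lower"] - corrected     # step 814
        heights[i] = corrected
    return lines
```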
  • a person of ordinary skill in the art can understand that all or part of the steps in the above method embodiments can be implemented by a program instructing relevant hardware.
  • The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes: ROM, RAM, a magnetic disk, an optical disk, or other media that can store program code.
  • FIG. 9 is a schematic structural diagram of a credential identification device provided by an embodiment of the disclosure.
  • the device can be used to implement the foregoing method embodiments of the present disclosure. As shown in Figure 9, the device includes:
  • the key point detection unit 91 is configured to perform key point detection on the credential image to obtain information on multiple key points of the credential included in the credential image.
  • the multiple key points include at least two boundary defining points of the first text area in the certificate, and the first text area includes multiple text lines corresponding to the first character type.
  • the text recognition unit 92 is used to determine the text recognition result of the certificate based on the information of multiple key points.
  • In the embodiments of the present disclosure, the text recognition result of the document is determined based on the information of the multiple key points. By adding at least two boundary defining points of the first text area, the accuracy of locating the multi-line text in the first text area is improved, the influence of other character types on the recognition of text of the first character type is reduced, and the recognition accuracy of the content of the first character type in the certificate is improved.
  • In some embodiments, the certificate further includes a second text area, wherein the second text area includes at least one text line corresponding to a second character type different from the first character type, and the second text area has the same text content as the first text area.
  • the first character type is Chinese characters
  • the second character type is ethnic minority scripts.
  • the text recognition unit 92 includes:
  • a position prediction module configured to determine the target predicted position of each text line in the plurality of text lines contained in the first text area based on the information of at least two boundary defining points of the first text area;
  • a text recognition module, configured to recognize at least one target text area corresponding to the first character type contained in the certificate based on the target predicted position of each text line in the plurality of text lines contained in the first text area, to obtain the text recognition result of the certificate.
  • The position prediction module is configured to: determine, based on the information of the at least two boundary defining points of the first text area, the initial predicted position of each text line in the plurality of text lines contained in the first text area; determine whether the initial predicted positions of the plurality of text lines are abnormal; and, in response to determining that the initial predicted positions of the plurality of text lines are abnormal, correct the initial predicted positions of the plurality of text lines contained in the first text area to obtain the target predicted positions of the plurality of text lines. A simplified sketch of the initial position prediction is given below.
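  • As a rough illustration only (not the disclosed implementation, which may obtain the number of lines and the initial line heights from a neural network), the initial per-line positions could be derived from the vertical extent spanned by the two boundary defining points by an even split; all names in this sketch are assumed:

      def initial_line_positions(area_top_y, area_bottom_y, num_lines):
          # Evenly split the first text area into num_lines (top, bottom) strips;
          # the strip height doubles as the first predicted average line height.
          avg_h = (area_bottom_y - area_top_y) / num_lines
          return [(area_top_y + i * avg_h, area_top_y + (i + 1) * avg_h)
                  for i in range(num_lines)]

      # e.g. a 4-line address area spanning y = 100..180 -> four strips 20 px tall
      print(initial_line_positions(100, 180, 4))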
  • In some embodiments, the position prediction module is configured to determine that the initial predicted positions of the plurality of text lines are abnormal in response to the presence, among the plurality of text lines, of a text line whose initial predicted line height is greater than a first preset line height.
  • In some embodiments, the position prediction module is configured to: in response to determining that the initial predicted positions of the plurality of text lines are abnormal, determine the text line in the first text area whose initial predicted line height is abnormal; in response to determining that the initial predicted line height of a first text line in the first text area is abnormal, correct the initial predicted line height of the first text line to obtain the target predicted line height of the first text line; and correct the initial predicted position of the first text line based on the target predicted line height of the first text line to obtain the target predicted position of the first text line.
  • The position prediction module is configured to determine, based on the first predicted average line height of the multiple text lines included in the first text area and the initial predicted line height of the first text line, the second predicted average line height of at least one second text line other than the first text line among the plurality of text lines, and to correct the initial predicted line height of the first text line based on the second predicted average line height.
  • The position prediction module is configured to correct the line height of the first text line to a second preset value in response to the second predicted average line height exceeding a first preset value, and/or to correct the line height of the first text line to the second predicted average line height in response to the second predicted average line height being less than or equal to the second preset value, as sketched below.
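  • A minimal sketch of this rule, taking the 15-pixel figure used in the later application example for both preset values (an assumption of the sketch, since the disclosure leaves the two presets independent); the names are illustrative:

      def height_from_remaining_lines(line_heights, abnormal_index,
                                      first_preset=15.0, second_preset=15.0):
          # Second predicted average line height: mean height of the other lines.
          others = [h for i, h in enumerate(line_heights) if i != abnormal_index]
          second_avg = sum(others) / len(others)
          # Clamp to the preset value if even the remaining lines average too tall,
          # otherwise adopt the remaining lines' average directly.
          return second_preset if second_avg > first_preset else second_avg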
  • The position prediction module is configured to correct the initial predicted line height of the first text line to obtain the corrected line height of the first text line; in response to the corrected line height of the first text line being greater than or equal to a second preset value, use the initial predicted line height corresponding to the initial predicted position of the next text line after the first text line as the target predicted line height of the first text line; and/or, in response to the corrected line height of the first text line being less than a third preset value, use the corrected line height of the first text line as the target predicted line height of the first text line.
  • The position prediction module is configured to adjust, based on the target predicted line height of the first text line, the predicted upper boundary corresponding to the initial predicted position of the first text line, to obtain the target predicted upper boundary of the first text line.
  • The position prediction module is configured to determine whether the initial predicted line height of the first text line is abnormal based on at least one of the first predicted average line height of the plurality of text lines in the first text area and the initial predicted line height corresponding to the initial predicted position of at least one adjacent line of the first text line.
  • The position prediction module is configured to determine that the initial predicted line height of the first text line is abnormal in response to the initial predicted line height of the first text line reaching a first preset multiple of the first predicted average line height, and/or in response to the initial predicted line height of the first text line reaching a second preset multiple of the initial predicted line height of at least one adjacent line of the first text line; an illustrative test is sketched below.
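  • For illustration, that test could look roughly like the following, using the 1.2 multiple from the later application example for both preset multiples and requiring both conditions together, as that example does (the names and this combination are assumptions of the sketch; the module also allows either criterion on its own):

      def initial_line_height_abnormal(cur_h, first_avg_h, neighbour_heights,
                                       multiple=1.2):
          # Abnormal when the line reaches 1.2x the area's first predicted average
          # line height and 1.2x the height of an adjacent line.
          vs_average = cur_h >= multiple * first_avg_h
          vs_neighbour = any(cur_h >= multiple * h for h in neighbour_heights)
          return vs_average and vs_neighbour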
  • The position prediction module is further configured to determine the first predicted average line height of the multiple text lines in the first text area based on the information of the at least two boundary defining points of the first text area and the predicted number of lines of the first text area.
  • The text recognition module is configured to correct, based on the target predicted line heights corresponding to the target predicted positions of the multiple text lines contained in the first text area, the initial predicted position of a third text area in the at least one target text area, to obtain the target predicted position of the third text area, and to obtain the text recognition result of the third text area based on the target predicted position of the third text area.
  • The text recognition module is configured to determine the target predicted average line height of the multiple text lines in the first text area based on the target predicted line heights of the multiple text lines contained in the first text area, and to correct the initial predicted position of a third text line included in the third text area based on the target predicted average line height and the initial predicted line height corresponding to the initial predicted position of the third text line, to obtain the final predicted position of the third text line, as sketched below.
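  • The later application example applies this to the name field of an ID card with a 2-pixel tolerance; a minimal sketch of that rule (names assumed, the 2-pixel tolerance taken from that example):

      def correct_third_area_height(h_third, target_avg_h_first_area, tolerance=2.0):
          # e.g. snap the name field's line height to the address area's target
          # predicted average line height when it drifts more than 2 px above it.
          if h_third - target_avg_h_first_area > tolerance:
              return target_avg_h_first_area
          return h_third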
  • the certificate includes an ID card; and/or
  • the first text area includes an address field information area.
  • an electronic device including a processor, and the processor includes the credential identification device of any of the foregoing embodiments of the present disclosure.
  • An electronic device is provided, including: a memory for storing executable instructions; and a processor configured to communicate with the memory to execute the executable instructions, thereby completing the certificate recognition method of any of the above embodiments provided in the present disclosure.
  • A computer storage medium is provided for storing computer-readable instructions; when the instructions are executed by a processor, the processor performs the certificate recognition method of any of the foregoing embodiments provided by the present disclosure.
  • A computer program is provided, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes the certificate recognition method provided by the present disclosure.
  • a computer program product for storing computer-readable instructions, which when executed, cause a computer to execute the credential identification method described in any of the foregoing possible implementations.
  • In one or more optional implementations, the embodiments of the present disclosure further provide a computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the certificate recognition method described in any of the above embodiments.
  • The computer program product can be implemented by hardware, software, or a combination thereof. In one optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • Another certificate recognition method and its corresponding device, electronic device, computer storage medium, computer program, and computer program product are also provided. The method includes: performing key point detection on a certificate image to obtain information on multiple key points of the certificate included in the certificate image, where the multiple key points include at least two boundary defining points of a first text area in the certificate and the first text area includes multiple text lines corresponding to a first character type; and determining the text recognition result of the certificate based on the information of the multiple key points.
  • In some embodiments, the target tracking instruction may specifically be a call instruction: the first apparatus may instruct the second apparatus to perform certificate recognition by way of a call; correspondingly, in response to receiving the call instruction, the second apparatus may perform the steps and/or processes of any embodiment of the certificate recognition method described above.
  • In the present disclosure, "plural" or "multiple" may refer to two or more, and "at least one" may refer to one, two, or more than two.
  • the embodiments of the present disclosure also provide an electronic device, which may be a mobile terminal, a personal computer (PC), a tablet computer, a server, etc., for example.
  • Referring to FIG. 10, which shows a schematic structural diagram of an electronic device 1000 suitable for implementing the terminal device or server of the embodiments of the present disclosure:
  • the electronic device 1000 includes one or more processors and a communication unit.
  • the one or more processors are, for example: one or more central processing units (CPU) 1001, and/or one or more graphics processing units (GPU) 1013, etc.
  • The processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1002 or executable instructions loaded from a storage part 1008 into a random access memory (RAM) 1003.
  • the communication unit 1012 includes but is not limited to a network card, and the network card includes but is not limited to an IB (Infiniband) network card.
  • The processor can communicate with the read-only memory 1002 and/or the random access memory 1003 to execute executable instructions, is connected to the communication unit 1012 through the bus 1004, and communicates with other target devices via the communication unit 1012, thereby completing the operations corresponding to any of the methods provided by the embodiments of the present disclosure, for example: performing key point detection on the certificate image to obtain information on multiple key points of the certificate included in the certificate image, where the multiple key points include at least two boundary defining points of the first text area in the certificate and the first text area includes multiple text lines corresponding to the first character type; and determining the text recognition result of the certificate based on the information of the multiple key points.
  • RAM 1003 can also store various programs and data required for device operation.
  • the CPU 1001, ROM 1002, and RAM 1003 are connected to each other through a bus 1004.
  • When the RAM 1003 is present, the ROM 1002 is an optional module.
  • the RAM 1003 stores executable instructions, or writes executable instructions into the ROM 1002 during runtime, and the executable instructions cause the processor 1001 to perform operations corresponding to the aforementioned communication method.
  • An input/output (I/O) interface 1005 is also connected to the bus 1004.
  • The communication unit 1012 can be integrated, or can be configured to have multiple sub-modules (for example, multiple IB network cards) linked on the bus.
  • the following components are connected to the I/O interface 1005: an input part 1006 including a keyboard, a mouse, etc.; an output part 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and speakers, etc.; a storage part 1008 including a hard disk, etc. ; And a communication section 1009 including a network interface card such as a LAN card, a modem, and the like. The communication section 1009 performs communication processing via a network such as the Internet.
  • A drive 1010 is also connected to the I/O interface 1005 as needed.
  • a removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1010 as required, so that the computer program read therefrom is installed into the storage part 1008 as required.
  • It should be noted that the architecture shown in FIG. 10 is only an optional implementation; in specific practice, the number and types of components in FIG. 10 can be selected, deleted, added, or replaced according to actual needs, and different functional components can be arranged separately or in an integrated manner.
  • For example, the GPU and the CPU can be arranged separately, or the GPU can be integrated on the CPU; the communication unit can be arranged separately, or can be integrated on the CPU or the GPU; and so on. These alternative implementations all fall within the protection scope of the present disclosure.
  • In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly contained on a machine-readable medium; the computer program includes program code for executing the method shown in the flowchart, and the program code includes instructions corresponding to the steps of the method provided in the embodiments of the present disclosure, for example: performing key point detection on the certificate image to obtain information on multiple key points of the certificate included in the certificate image, where the multiple key points include at least two boundary defining points of the first text area in the certificate and the first text area includes multiple text lines corresponding to the first character type; and determining the text recognition result of the certificate based on the information of the multiple key points.
  • the computer program may be downloaded and installed from the network through the communication part 1009, and/or installed from the removable medium 1011.
  • When the computer program is executed by the central processing unit (CPU) 1001, the above-mentioned functions defined in the method of the present disclosure are performed.
  • the method and apparatus of the present disclosure may be implemented in many ways.
  • the method and apparatus of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless specifically stated otherwise.
  • the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Input (AREA)

Abstract

A certificate recognition method and apparatus, an electronic device (1000), and a computer-readable storage medium. The method includes: performing key point detection on a certificate image to obtain information on multiple key points of a certificate included in the certificate image (210), where the multiple key points include at least two boundary defining points of a first text area in the certificate, and the first text area includes multiple text lines corresponding to a first character type; and determining a text recognition result of the certificate based on the information on the multiple key points (220).

Description

证件识别方法和装置、电子设备、计算机可读存储介质
相关申请的交叉引用
本申请基于申请号为201910362419.4、申请日为2019年04月30日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本公开涉及计算机视觉技术,尤其是一种证件识别方法和装置、电子设备、计算机可读存储介质。
背景技术
光学字符识别(Optical Character Recognition,OCR)技术被广泛应用于各种证件、卡片和票据的识别中。目前的OCR识别技术对于常用字符的识别具有较高的识别精度,然而对于少数民族文字等特殊类型的字符的识别精度还有待提高。
发明内容
本公开实施例提供的一种证件识别技术。
本公开实施例第一方面提供一种证件识别方法,包括:
对证件图像进行关键点检测,获得所述证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;
基于所述多个关键点的信息,确定所述证件的文本识别结果。本申请实施例第二方面提供一种证件识别装置,包括:
关键点检测单元,用于对证件图像进行关键点检测,获得所述证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;
文本识别单元,用于基于所述多个关键点的信息,确定所述证件的文本识别结果。
在一些实施例中,所述证件还包括第二文本区域,其中,所述第二文本区域包括至少一个对应于不同于所述第一字符类型的第二字符类型的文本行,且所述第二文本区域与所述第一文本区域的文本内容相同。
根据本公开实施例的另一方面,提供的一种证件识别装置,包括:
关键点检测单元,用于对证件图像进行关键点检测,获得所述证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;
文本识别单元,用于基于所述多个关键点的信息,确定所述证件的文本识别结果。
根据本公开实施例的又一方面,提供的一种电子设备,包括处理器,所述处理器包括上述任意一项实施例所述的证件识别装置。
根据本公开实施例的还一方面,提供的一种电子设备,包括:存储器,用于存储可执行指令;
以及处理器,用于与所述存储器通信以执行所述可执行指令从而完成上述任意一项实施例所述的证件识别方法的操作。
根据本公开实施例的再一方面,提供的一种计算机可读存储介质,用于存储计算机可读取的指令,所述指令被执行时执行上述任意一项实施例所述的证件识别方法的操作。
根据本公开实施例的另一个方面,提供的一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现上述任意一项实施例所述的证件识别方法的指令。
根据本公开实施例的又一个方面,提供的另一种计算机程序产品,用于存储计算机可读指令,所述指令被执行时使得计算机执行上述任一可能的实现方式中所述人脸识别方法或人脸识别网络的训练方法的操作。
在一个可选实施方式中,所述计算机程序产品具体为计算机存储介质,在另一个可选实施方式中,所述计算机程序产品具体为软件产品,例如SDK等。
根据本公开实施例还提供了另一种证件识别方法和装置、电子设备、计算机可读存储介质、计算机程序产品,其中,对证件图像进行关键点检测,获得所述证件图像的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;基于所述多个关键点的信息,确定所述证件的文本识别结果。
基于本公开上述实施例提供的一种证件识别方法和装置、电子设备、计算机可读存储介质,对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;基于所述多个关键点的信息,确定所述证件的文本识别结果,通过增加第一文本区域的至少两个边界限定点,有利于提高对第一文本区域中多行文本的文本位置的识别准确率,减小了其他字符类型对第一字符类型的文本识别带来的负面影响,提高了对证件中第一字符类型内容的识别准确率。
下面通过附图和实施例,对本公开的技术方案做进一步的详细描述。
附图说明
构成说明书的一部分的附图描述了本公开的实施例,并且连同描述一起用于解释本公开的原理。
参照附图,根据下面的详细描述,可以更加清楚地理解本公开,其中:
图1为本公开实施例提供的证件识别技术适用的身份证示例图。
图2为本公开实施例提供的证件识别方法的一个流程示意图。
图3为本公开实施例提供的证件识别方法的另一流程示意图。
图4为本公开实施例提供的证件识别方法的另一流程示意图。
图5为本公开实施例提供的证件识别方法的又一流程示意图。
图6为本公开实施例提供的证件识别方法的再一流程示意图。
图7为本公开实施例提供的证件识别方法的一个应用示例图。
图8为本公开实施例提供的证件识别方法的另一个应用示例图。
图9为本公开实施例提供的证件识别装置的结构示意图。
图10为本公开实施例的电子设备的示例性结构示意图。
具体实施方式
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本公开实施例主要应用于身份证识别,但也可以应用于其他具有固定或部分固定版式的证件或票据的识别,本公开实施例对此不做限定。
目前的OCR识别算法对大部分身份证例如汉族身份证具有较高的识别精度,然而,对于一小部分身份证例如少数民族身份证的识别主要面临以下几种关键问题:
常见的少数民族身份证,如蒙古族,维吾尔族等,这些证件中了除了汉字之外,还有相对应的少数民族文字,例如如图1所示。相关技术中使用的身份证识别模型并不能识别少数民族文字,所以在少数民族身份证的文本识别中会将少数民族文字识别为乱码,同时因为少数民族文字的影响,导致汉字识别出现大量错误。
此外,少数民族身份证具有多种版式,以地址字段为例,目前有两种常用版式,第一类版本中少数民族文字和汉字之间没有明显行差,逐行出现;在第二类版式中,如图1所示,少数民族和汉字虽然出现在同一区域,但是中间有明显行差,非逐行出现。版式的多样化也会影响少数民族身份证识别的准确性。
针对上述至少一种问题,本公开实施例提出了一种图像识别技术,通过在关键点增加以下限定点作为关键点:汉字区域中包括多个文本行的第一文本区域的至少两个边界限定点(例如,左上角关键点和右下角关键点等可确定第一文本区域边界的点),提高对至少包含第一文本区域的汉字区域的定位精度,降低少数民族文字对汉字识别的影响,从而有利于提高证件识别精度。
图1示例性地示出了本公开实施例中的24个关键点,包括:证件图像的四个顶角关键点、字段名称区域(包括:“姓名”、“性别”、“出生”、“住址”和“公民身份证号码”)的左上角关键点和右下角关键点、部分字段的字段信息区域(包括:姓名字段信息区域、性别字段信息区域、民族字段信息区域和身份证号码字段信息区域)的左上角关键点和右下角关键点,此外,还包括地址字段信息区域的左上角关键点和右下角关键点,本公开实施例通过地址字段信息区域的左上角关键点和右下角关键点,提高了对少数民族身 份证中的汉字识别的准确率。
应理解,图1所示的24个关键点仅用于示例,本公开实施例也可以采用其他数量和类型的关键点,本公开实施例对此不做限定。
应理解,本公开实施例提供的技术方案有利于少数民族身份证识别的精度,但也适用于汉族身份证的识别,或者任意类似地包含至少两种不同文字类型的证件识别,本公开实施例对此不做限定。
图2为本公开实施例提供的证件识别方法的一个流程示意图。
步骤210,对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息。
在一些实施中,该证件识别方法可应用于各种图像处理设备中,例如,该图像处理设备包括:手机、平板电脑、可穿戴式设备、门禁设备等终端设备。
在另一些实施例中,该证件识别方法可应用于网络侧的服务器中,利用,终端采集了一个证件图像,并上传到了服务器,服务器识别该证件图像获得证件图像所对应证件的证件信息,该证件信息至少包括文本识别结果。
例如,在需要用户提交身份信息进行身份验证的场景下,人采用本申请实施例的证件识别方法,用户就无需手动输入身份信息,而是可以简便的采集证件图像,终端或者服务器会通过对证件图像的识别,获得证件中文本识别结果。
其中,多个关键点包括证件中第一文本区域的至少两个边界限定点,第一文本区域中包括多个对应于第一字符类型的文本行。
所述多个关键点的信息包括:多个关键点在证件图像中的位置信息。
所述证件图像为对证件采集形成的图像。所述证件包括但不限于:身份证、护照、居住证、暂住证、学位证、学历证等各种包含多种类型字符的证件图像。
证件中包括两种类型的字符,即第一字符类型和第二字符类型,其中,第一字符类型和第二字符类型的文本出现在不同行,其中,第一字符类型的文本行和第二字符类型的文本行可以具有相同或不同的内容。
在一些实施例中,第一字符类型为可识别字符类型或识别的目标字符类型,例如汉字等,第二字符类型为无法识别的字符类型或不进行识别的字符类型,例如,少数民族文字等。例如在身份证识别技术中,为了保持识别技术的普适性,同时适用于汉族身份证的识别和少数民族身份证的识别,对身份证中的汉字进行识别,而不识别其中的少数民族文字。
在一些实施例中,所述第一字符类型可为汉字,而所述第二字符类型可为其他国家或地区所使用的语言,例如,其他国家的小语种的字符。
在本公开实施例中,对应于第一字符类型的文本区域可以仅包含第一字符类型的文本,或者也可以进一步包含除第一和第二字符类型之外的其他字符类型,例如数字等,类似地,对应于第二字符类型的文本区域可以包括第二字符类型的文本以及其他字符类型的文本,本公开实施例对此不做限定。
在一些可选的实施例中,证件还包括第二文本区域,其中,第二文本区域包括至少一个对应于不同于第一字符类型的第二字符类型的文本行,且第二文本区域与第一文本区域具有相同的文本内容。例如,如图1所示,身份证中的地址字段信息区域包含汉字信息区域和少数民族文字信息区域,表示人物的同一住址。假设第一文本区域和第二文本区域分别为图1所示的例子中的地址字段信息区域的汉字信息区域和少数民族文字信息区域,第二文本区域与第一文本区域相邻或间隔至少一个空白行,但本公开实施例不限于此。
本公开实施例对证件图像进行关键点检测,得到证件图像中包括的证件的多个关键点的信息,其中,关键点的信息包括位置信息,或者进一步包含其他信息,本公开实施例对此不做限定。
证件的多个关键点包括第一文本区域的至少两个边界限定点,例如,左上角关键点和右下角关键点,或者左下角关键点和右上角关键点,或者四个顶点,等等,本公开实施例对此不做限定。
通过在关键点中包含对应于第一字符类型的第一文本区域的至少两个边界限定点,可以较为精确地定位到第一文本区域,有利于得到第一文本区域的较为准确的预测行高,降低第二字符类型的文本对于证件识别的影响,提高识别精度。
步骤220,基于多个关键点的信息,确定证件的文本识别结果。
在一些实施例中,基于多个关键点的信息,可确定第一文本区域中包括的文本行的较为精确的位置,基于文本识别方法,并进一步对确定了位置的第一字符类型的文本进行识别,得到第一文本区域的文本识别结果。在一些实施例中,还可以基于第一文本区域中包括的第一字符类型的文本行的位置,确定证件中包括的其他文本区域中的第一字符类型的文本行的位置,有利于提高证件的文本识别精度。
基于本公开上述实施例提供的一种证件识别方法,对证件图像进行关键点检测,获得所述证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限 定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;基于所述多个关键点的信息,确定所述证件的文本识别结果,通过增加第一文本区域的至少两个边界限定点,有利于提高对第一文本区域中多行文本的文本位置的识别准确率,减小了其他字符类型对第一字符类型的文本识别带来的影响,提高了对证件中第一字符类型内容的识别准确率。
在针对少数民族的证件中,第一字符类型为汉字,第二字符类型为少数民族文字。
目前文字识别技术尚未能实现针对少数民族文字进行识别,因此,本公开实施例需要排除少数民族文字对汉字内容的干扰,例如,对于少数民族文字和汉字非逐行出现时,即少数民族文字字段与汉字字段中间有间隔,此时原身份证处理方法经常检测不到文本区域,错误的将少数民族文字作为汉字进行检测识别,导致结果错误。
在一些实施例中,所述第一文本区域和所述第二文本区域均可为联通的四边形区域,例如,矩形区域。
图3为本公开实施例提供的证件识别方法的另一流程示意图。
步骤310,对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息。
其中,多个关键点包括证件中第一文本区域的至少两个边界限定点,第一文本区域中包括多个对应于第一字符类型的文本行。
步骤320,基于第一文本区域的至少两个边界限定点的信息,确定第一文本区域包含的多个文本行中每个文本行的目标预测位置。
在一些实施例中,基于第一文本区域的至少两个边界限定点的信息可确定一个矩形区域,该矩形区域中至少包括第一文本区域,可能还包括部分的第二文本区域;为了对第一文本区域中的第一字符类型进行识别,需要对每个文本行的位置进行确定,即本公开实施例中确定的每个文本行的目标预测位置,之后在目标预测位置进行文字识别,即可确定第一区域中包括的第一字符类型的内容。对于第一文本区域中内容的识别可逐行进行识别,逐行识别提高了文字识别的准确率,减少了行与行之间的交叉造成的识别错误。
步骤330,基于第一文本区域包含的多个文本行中每个文本行的目标预测位置,对证件中包含的对应于第一字符类型的至少一个文本区域进行识别,获得证件的文本识别结果。
证件的类型包括多种,因此,证件中可以包括多个可识别内容的文本区域(包括第一文本区域),这些文本区域中的字符类型都为第一字符类型,并且,由于证件属于格式较为固定的特殊图像,会存在多个文本区域中文字的行高相同,例如,在身份证中汉字的高度相同,即身份证图像中汉字行高相同;因此,在确定了第一文本区域中包括的文本行的目标预测位置时,即可确定第一文本区域中包括的文本行的行高,可以该行高对其他文本区域中的文本行的行高进行校正,以校正后的文本行行高确定其他文本区域中每个文本行的位置,进而确定其他文本区域中的内容,提高了其他文本区域中文字的识别准确率。
图4为本公开实施例提供的证件识别方法的另一实施例中部分流程示意图。在上述实施例基础上,步骤320包括:
步骤402,基于第一文本区域的至少两个边界限定点的信息,确定第一文本区域包含的多个文本行中每个文本行的初始预测位置。
在一些实施例中,文本行的初始预测位置可以包括文本行的上边界和下边界,通过上下边界的坐标即可确定文本行的位置;本公开实施例中的初始预测位置可以基于第一文本区域中包括的行数、每个文本行的初始行高、以及基于边界限定点的信息确定的第一文本区域的上边界和下边界确定,其中,行数和初始行高可利用神经网络获得,例如,利用深度神经网络识别证件中第一文本区域包括的行数和第一文本区域中每个文本的初始行高。
步骤404,响应于确定多个文本行的初始预测位置存在异常,对第一文本区域包含的多个文本行的初始预测位置进行修正处理,获得多个文本行的目标预测位置。
为了提高内容识别的准确率,本公开实施例在获得初始预测位置之后,需要判断该初始预测位置是否正常,当初始预测位置存在异常时,以该初始预测位置进行识别将导致识别内容的错误,本公开实施例通过修正处理,提高了文本行位置的准确性;由于第一文本区域中包括的多个文本行,其中可能存在一个或多个文本行的初始预测位置存在异常,其修正过程可以基于其他文本行的行高对存在异常的初始预测位置进行修正,也可以基于其他方式对初始预测位置进行修正,本公开实施例不限制具体的修正方式。
在得到多个文本行的初始预测位置之后,可以确定多个文本行的初始预测位置存在异常。
具体地,可以综合判断多个文本行的初始位置是否存在异常。在一些实施例中,通过判断多个文本行中是否存在行高异常的文本行,确定多个文本行的初始位置是否存在异常。例如,响应于所述多个文本行中存在对应的初始预测行高大于第一预设行高的文本行,确定所述多个文本行的初始预测位置存在异常。再例如,响应于所述多个文本行的平均预测行高高于第二预设行高,确定所述多个文本行的初始预测位置存在异常,等等。
在一些实施例中,第一预设行高可以是通过统计大量证件中的文本行高获得的,例如,将第一预设行高设置为15像素。
本公开实施例将是否大于第一预设行高作为初始预测行高是否正常的判断标准。当每个文本行的行高都小于或等于第一预设行高时,说明行数和初始预测行高的识别结果是相对准确的,此时,在一些实施例中,基于识别获得的第一文本区域上边界和第一文本区域下边界以及行数(或对所有行的行高求平均),获得第一平均行高,以第一平均行高作为各个文本行的目标预测行高,进而确定每个文本行的目标预测位置。而在另一些实施例中,当多个文本行中有一个或一个以上的文本行的初始预测行高大于第一预设行高时,说明多个文本行的初始预测行高识别错误,需要对其进行修正,以提高文字识别结果的准确率。
在一些实施例中,步骤404包括:响应于确定所述多个文本行的初始预测位置存在异常,确定所述第一文本区域中初始预测行高存在异常的文本行;响应于确定所述第一文本区域中第一文本行的初始预测行高异常,对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的目标预测行高;基于所述第一文本行的目标预测行高对所述第一文本行的初始预测位置进行修正,得到所述第一文本行的目标预测位置。
具体地,在确定多个文本行的初始预测位置存在异常的情况下,首先判断多个文本行中哪些文本行的初始预测位置存在异常,然后对这些初始预测位置存在异常的文本行进行位置修正。示例性地,如果检测到多个文本行中第一文本行的初始预测位置存在异常,例如,初始预测行高存在异常,则对该第一文本行进行预测行高的修正,从而得到精确的目标预测位置。
在一些实施例中,基于所述第一文本区域包括的多个文本行的第一预测平均行高和所述第一文本行的初始预测行高,确定所述多个文本行中除所述第一文本行之外的至少一个第二文本行的第二预测平均行高,并基于所述第二预测平均行高,对所述第一文本行的初始预测行高进行修正。
在一些实施例中,可以基于第一文本区域的边界限定点的位置信息以及预测行数,得到第一文本区域的第一预测平均行高,然后基于第一预测平均行高和第一文本行的初始预测行高,得到第一文本区域中剩余的至少一个第二文本行的平均预测行高,即第二平均预测行高,最后,可以基于第二平均预测行高,对第一文本行的初始预测行高进行修正,得到第一文本行的目标预测行高。
图5为本公开实施例提供的证件识别方法的又一流程示意图。其中,示例性地,步骤404包括以下步骤。
步骤502,基于第一文本区域的至少两个边界限定点的信息以及第一文本行的至少一个相邻行的初始预测位置,确定第一文本行的初始预测位置对应的初始预测行高是否异常。
其中,相邻行可以为第一文本行的上一行文本行和/或下一行文本行,当第一文本行为第一行时,该相邻行为下一行文本行,当第一文本行为中间行时,该相邻行为上一行文本行和下一行文本行,当第一文本行为最后一行时,该相邻行为上一行,第一文本区域中包括的多个文本行中每个文本行的行高应当相同,因此,当第一文本行与相邻行的初始预测行高之间的差异达到一定程度时,说明第一文本行的初始预测行高异常。
步骤504,响应于确定第一文本行的初始预测行高异常,对第一文本行的初始预测行高进行修正,得到第一文本行的目标预测行高。
在一些实施例中,由于第二文本区域中的内容与第一文本区域中的内容相同,因此,第二文本区域通常与第一文本区域相邻。
为了减少第二文本区域对第一文本区域内的文字内容产生影响,当第二文本区域在第一文本区域上方时,本公开实施例第一文本区域中最后一行的位置通常不需要进行修正。此时以第一文本行的下一行对第一文本行的初始预测位置进行修正,对第一文本区域中文本行的修正从第一行到倒数第二行;而当第二文本区域在第一文本区域下方时,本公开实施例第一文本区域中第一行的位置通常不需要进行修正,此时以第一文本行的上一行对第一文本行的初始预测位置进行修正,对第一文本区域中文本行的修正从最后一行到第二行。
步骤506,基于第一文本行的目标预测行高对第一文本行的初始预测位置进行修正,得到第一文本行的目标预测位置。
在一些实施例中,确定了第一文本行的目标预测行高之后,可基于确定的第一文本行的上边界确定下边界,或基于确定的第一文本行的下边界确定上边界,基于上边界和下边界即可确定目标预测位置。
在一些实施例中,基于第一文本行的目标预测行高,对第一文本行的初始预测上边界进行调整,得到第一文本行的目标预测上边界。
当已确定第一文本行的目标预测行高后,当第二文本区域位于第一文本区域上方,可确定可能出现错误识别的通常为上边界。此时,可基于下一行的上边界确定第一文本行的上边界,在一些实施例中,第一 文本行的下边界与下一文本行的上边界可能有交集,本公开实施例可对第一文本行的下边界进行修正,以防止下一文本行的文字对第一文本行产生影响。例如,第一文本行的下边界=下一文本行的上边界–1像素(pixel)。可选地,通过第一文本行的目标预测上边界=第一文本行的下边界–目标预测行高。
本公开实施例通过相邻行的初始预测位置对第一文本行的初始预测行高进行修正,再基于修正后的目标预测行高确定目标预测位置,使获得的第一文本区域中包括的多个文本行在行高和位置关系上更加准确,提高了第一文本区域中内容识别的准确率。
图6为本公开实施例提供的证件识别方法的再一流程示意图。其中,示例性地,步骤502包括以下步骤。
步骤602,基于第一文本区域的至少两个边界限定点的信息以及第一文本区域的预测行数,确定第一文本区域中多个文本行的的第一预测平均行高。
例如,至少两个边界限定点包括左上角关键点和右下角关键点,可基于第一文本区域的左上角关键点确定第一文本区域的上边界坐标,基于右下角关键点确定第一文本区域的下边界坐标,通过上边界坐标和下边界坐标求差可确定第一文本区域的高度,基于神经网络识别第一文本区域中包括的预测行数,此时,以第一文本区域的高度处于预测行数,即可确定第一预测平均行高。
步骤604,基于第一文本区域中多个文本行的第一预测平均行高以及基于第一文本行的至少一个相邻行的初始预测位置对应的初始预测行高的至少一项,确定第一文本行的初始预测行高是否异常;例如,基于第一文本区域的第一预测平均行高以及基于第一文本行的至少一个相邻行的初始预测位置对应的初始预测行高,确定第一文本行的初始预测行高是否异常。
本公开实施例中,第一预测平均行高可用于衡量第一文本区域中所有文本行的行高,当行数预测准确时,基于第一文本行的初始预测行高与第一预测平均行高之间的关系即可确定初始预测行高是否异常,例如,第一文本行的初始预测行高大于第一预测平均行高的设定倍数。但还可能在识别过程中对行数预测错误,因此,本公开实施例在第一预测平均行高的基础上增加相邻行的初始预测位置作为第一文本行的初始预测行高是否异常的评价基础,提高了判断初始预测行高是否异常的准确性。
例如,在一些实施例中,步骤604包括:响应于第一文本行的初始预测行高达到第一预测平均行高的第一预设倍数,确定第一文本行的初始预测行高异常,或,响应于第一文本行的初始预测行高达到第一文本行的至少一个相邻行的初始预测行高的第二预设倍数,确定第一文本行的初始预测行高异常,或,响应于第一文本行的初始预测行高达到第一预测平均行高的第一预设倍数且第一文本行的初始预测行高达到第一文本行的至少一个相邻行的初始预测行高的第二预设倍数,确定第一文本行的初始预测行高异常。此时第一预测倍数和第二预设倍数可相同或不同,例如,将第一预测倍数和第二预设倍数设为1.2等,本公开实施例不限制第一预测倍数和第二预设倍数的具体取值。
又例如,在一些实施例中,步骤604包括:响应于第一文本行的初始预测行高达到第一预测平均行高的第一预设倍数,且第一文本行的初始预测行高达到第一文本行的下一文本行的初始预测行高的第二预设倍数,确定第一文本行的初始预测行高异常。
本公开实施例是针对第二文本区域位于第一文本区域上方的情况,此时,越位于下方的文本行越远离会对文本内容产生干扰的第二文本区域,即,位于下方的文本行的初始预测行高相对较为转,因此,本公开实施例基于下一文本行的初始预测行高对第一文本行的初始预测行高进行异常确认,提高了异常情况确认的准确率。
在一些实施例中,步骤504,包括:基于第一预测平均行高和第一文本行的初始预测行高,确定多个文本行中除第一文本行之外的其他文本行的第二预测平均行高;基于第二预测平均行高,对第一文本行的初始预测行高进行修正,得到第一文本行的目标预测行高。在本公开实施例中,已基于第一预测平均行高和下一文本行的初始预测行高确定第一文本行的初始预测行高异常,此时,可认为其他文本行(包括下一文本行)的初始预测行高相对准确,因此,以其他文本行的初始预测行高求平均获得第二预测平均行高,以该第二预测平均行高对第一文本行的初始预测行高进行修正,使第一文本行的目标预测行高与第一文本区域中的其他文本行的行高更接近,提供了第一文本区域中各文本行的目标预测行高的准确性。
在一些实施例中,响应于第二预测平均行高超过第一预设数值,将第一文本行的行高修正为第二预设数值;例如,响应于所述第一文本行的修正行高大于或等于第二预设数值,将所述第一文本行的下一文本行的初始预测位置对应的初始预测行高作为所述第一文本行的目标预测行高。
在另一些实施例中,响应于第二预测平均行高小于或等于第二预设数值,将第一文本行的行高修正为第二预测平均行高。
第一文本行的行高理论上等于去除第一文本行高之后基于其他行的行高确定的第二预测平均行高,如 果第二预测平均行高大于第一预设数值,则说明此时检测出的第一文本行并不是真实证件中第一文本区域的一行,而是产生误检之后,将两行合并为一行的结果,例如,假设真实身份证第一文本区域有四行,而实际检测出三行,中间一行的行高又恰巧接近第一平均行高,此时对于中间行基于第一行和第三行的第二行高初始预测行高对其进行修正;此时将第一文本行的行高置为第二预设数值即可,如果第二预测平均行高小于或等于第二预设数值,则将第一文本行的行高置为第二预测平均行高。
在一些实施例中,在确定第一文本行的目标预测行高之后,在保持第一文本行的下边界不动的条件下,基于所述第一文本行的目标预测行高,对所述第一文本行的初始预测位置对应的预测上边界进行调整,得到所述第一文本行的目标预测上边界。
在一些实施例中,步骤604包括:
响应于第一文本行的初始预测行高达到第一文本行的上一文本行和下一文本行的初始预测行高的第二预设倍数,确定第一文本行的初始预测行高异常;
基于第一文本行的上一文本行和下一文本行的初始预测行高,得到第一文本行的修正行高。
本公开实施例中,第一文本行为中间行,与其相邻的文本行包括上一文本行和下一文本行,当第一文本行的初始预测行高通过上述实施例中提出的与第一预测平均行高和下一文本行的初始预测行高无法确定是否异常时,可能出现的情况是该第一文本行的初始预测行高接近第一预测平均价行高,但大于下一行文本行的初始预测行高,此时,可通过第一文本行的初始预测行高与上一文本行和下一文本行的初始预测行高之间的关系确认是否是行数识别错误,错误的将两个文本行识别为一个第一文本行,当第一文本行的初始预测行高达到第一文本行的上一文本行和下一文本行的初始预测行高的第二预设倍数(例如,接近2倍等),可确认是行数识别错误,此时通过上一文本行和下一文本行的初始预测行高对第一文本行的行高进行修正;修正的过程包括:
对第一文本行的上一文本行和下一文本行的初始预测行高求平均,得到第三预测平均行高;
将第三预测平均行高作为第一文本行的目标预测行高。
获得目标预测行高的公式可以为:目标预测行高=(上一文本行高度+下一文本行高度)/2。在一些实施例中,还包括:基于第三平均行高和第一文本行的下边界确定第一文本行的上边界。即,第一文本行上边界=第一文本行下边界–目标预测行高。
在一些实施例中,在步骤504之后,还包括:
响应于第一文本行的修正行高大于或等于第二预设数值,将第一文本行的下一文本行的初始预测行高作为第一文本行的目标预测行高;和/或
响应于第一文本行的修正行高小于第三预设数值,将第一文本行的修正行高作为第一文本行的目标预测行高。
经过上述实施例中第一文本行的初始预测行高经过修正之后,还可能存在一种情况,即,修正后的行高仍然明显大于标准行高,例如,本公开实施例提供的经过修正的行高大于或等于第二预设数值(如,22像素),此时,说明第一文本行的行高仍有问题,在第一文本行不是第一行的情况下,以下一文本行的初始预测行高作为第一文本行的目标预测行高;当修正行高与标准行高较为接近,例如,本公开实施例中的修正行高小于第三预设数值,此时将修正行高作为第一文本行的目标预测行高。
在一些实施例中,步骤330包括:基于第一文本区域包含的多个文本行的目标预测位置对应的目标预测行高,对至少一个目标文本区域中第三文本区域的初始预测位置进行修正,得到第三文本区域的目标预测位置;基于所述第三文本区域的目标预测位置,得到所述第三文本区域的文本识别结果。
本公开实施例中第一文本区域中每个文本行的行高为经过修正后的目标预测行高,在一些实施例中,当第三文本区域(例如,身份证图像中的姓名字段)获得的初始预测行高不正常时(例如,大于设定行高或与设定行高的差值大于预设值等),在一些实施例中,基于第一文本区域包含的多个文本行的目标预测行高,确定第一文本区域的第三预测平均行高;基于第三预测平均行高和第三文本区域的初始预测位置对应的初始预测行高,对第三文本区域的初始预测位置进行修正,得到第三文本区域的最终预测位置。在本示例中,可基于第一文本区域的每一文本行的目标预测行高求平均获得第一文本区域的第三预测平均行高,以该平均行高对第三文本区域的行高进行修正,在一些实施例中,修正方法可以是将第三文本区域中文本行的行高替换为该第三预测平均行高。
在一些实施例中,读取第一文本区域文字检测的各行信息,如果各行行高正常,没有出现异常高度,则记录第一文本区域的平均行高,对第三文本区域中文本行的行高进行校正。校正规则可以包括:如果第三文本区域中文本行的行高–第一文本区域的第三预测平均行高>2像素(pixels),则将第三文本区域中文本行的行高修正为第一文本区域的第三预测平均行高。
在一些实施例中,证件包括身份证;和/或,第一文本区域包括地址区域。
在一个具体应用示例中,将本公开实施例提供的证件识别方法应用到对少数民族身份证的识别,图7为本公开实施例提供的证件识别方法的一个应用示例图。
步骤710,对少数民族身份证的证件图像进行关键点检测,获得少数民族身份证的24点的关键点的信息,该24点的关键点包括地址字段信息区域的左上角关键点和右下角关键点,该地址字段信息区域包括多个对应汉字的文本行。
步骤720,通过左上角关键点和右下角关键点确定地址字段信息区域,通过神经网络等手段识别获得地址字段信息区域中包括的文本行的行数和每个文本行的行高。
步骤730,判断各个文本行的行高是否正常(例如,与大数据统计的身份证行高的差值小于设定值),如果各个文本行的行高都正常,执行步骤750,否则,执行步骤740;
步骤740,如果识别获得的地址字段信息区域的文本行的数量大于等于3行且其中的一个或者多个文本行(通常为一个文本行)的高度异常,则对高度异常的文本行的高度进行修正,获得修正后的地址字段信息区域中文本行的平均行高。在一些实施例中,由于少数民族文字位于汉字上方,此时的修正方法只对前N-1行进行修正,不对最后一行进行修正,N表示地址字段信息区域包括的文本行的数量。
步骤750,记录地址字段信息区域中文本行的平均行高avg_h_addr,并对姓名字段信息区域的行高h_name进行校正。其中,校正规则为:如果h_name–avg_h_addr>2像素(pixels),则将姓名字段信息区域的行高h_name修正为地址字段的平均行高avg_h_addr。
步骤760,基于地址字段信息区域中文本行的平均行高对地址字段信息区域中每个文本行的汉字内容进行识别,获得少数民族身份证中的地址信息,基于校正后的姓名字段信息区域的行高对姓名字段信息区域中的汉字内容进行识别,获得少数民族身份证中的姓名信息,实现少数民族身份证的识别。
图8为本公开实施例提供的证件识别方法的另一个应用示例图。通过上述步骤740提供的行高修正方法对少数民族身份证中地址字段信息区域中多个文本行从上到下(例如,从第一行到第N-1行)依次进行修正操作,在一些实施例中,修正过程包括以下步骤。
步骤802,通过地址字段信息区域所在矩形框的上下边界以及行数得到,计算获得少数民族身份证中地址字段信息区域中文本行的平均行高;检测获得当前行的行高,以及下一行的行高。
步骤804,判断当前行的行高是否大于或等于下一行行高的1.2倍(为设定值,可根据不同情况进行设置),且大于或等于平均行高的1.2倍(为设定值,可根据不同情况进行设置),如果是,确定当前行的行高异常,执行步骤806,否则,执行步骤808。
步骤806,根据识别确定当前行的下边界,如果当前行的下边界与下一行的上边界有交集,则对当前行的下边界进行修正,以防止下一行的文字对当前行产生影响。此时,当前行的下边界=下一行的上边界–1pixel。再对当前行的行高进行修正,当前高度理论上等于去除当前行的行高之后其他行(地址字段中除了当前行的所有文本行)的行高的平均值new_h_avg_line,如果new_h_avg_line大于15pixels(为可选值,可通过大数据统计获得),则说明此时检测出的当前行并不是真是少数民族身份证地址字段的一行,而是产生误检之后,将两行合并为一行的结果,此时将当前行的行高置为15pixels即可,如果new_h_avg_line小于等于15pixels,则将当前行的行高置为new_h_avg_line,获得当前行的修正行高,执行步骤810。
步骤808,当检测到当前行的行高接近平均行高(例如,当前行的行高等于地址字段信息区域的高度除以行数),此时判断当前行的行高和与当前行的相邻两行的行高之间的高度差异,如果当前行的行高大于下一行的行高的1.8倍(为设定值,可根据不同情况进行设置)且大于上一行的行高的1.8倍,则对当前行的上下边界进行修正,修正公式为:当前行的修正行高=(上一行的行高+下一行的行高)/2,执行步骤810。
该步骤出现的情况可能对应的是如果真实少数民族身份证地址字段有四行,而实际检测出三行的情况。
步骤810,判断当前行的修正行高是否大与22pixels(为可选值,可通过大数据统计获得),如果是,执行步骤812,否则,将当前行的修正行高作为当前行的目标行高,执行步骤814。
步骤812,在当前行不是第一行的情况下,将下一行的行高作为当前行的目标行高,执行步骤814。
步骤814,对当前行的上边界进行修正。修正规则为:当前行上边界=当前行下边界–当前行的目标行高。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
图9为本公开实施例提供的证件识别装置的结构示意图。该装置可用于实现本公开上述各方法实施例。如图9所示,该装置包括:
关键点检测单元91,用于对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息。
其中,多个关键点包括证件中第一文本区域的至少两个边界限定点,第一文本区域中包括多个对应于第一字符类型的文本行。
文本识别单元92,用于基于多个关键点的信息,确定证件的文本识别结果。
基于本公开上述实施例提供的一种证件识别装置,基于所述多个关键点的信息,确定所述证件的文本识别结果,通过增加第一文本区域的至少两个边界限定点,有利于提高对第一文本区域中多行文本的文本位置的识别准确率,减小了其他字符类型对第一字符类型的文本识别带来的影响,提高了对证件中第一字符类型内容的识别准确率。
在一些实施例中,证件还包括第二文本区域,其中,第二文本区域包括至少一个对应于不同于第一字符类型的第二字符类型的文本行,且第二文本区域与第一文本区域的文本内容相同。
在一些实施例中,第一字符类型为汉字,第二字符类型为少数民族文字。
在一个或多个实施例中,文本识别单元92,包括:
位置预测模块,用于基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的目标预测位置;
文本识别模块,用于基于所述第一文本区域包含的多个文本行中每个文本行的目标预测位置,对所述证件中包含的对应于所述第一字符类型的至少一个目标文本区域进行识别,获得所述证件的文本识别结果。
在一些实施例中,所述位置预测模块,用于基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的初始预测位置;确定所述多个文本行的初始预测位置是否存在异常;响应于确定所述多个文本行的初始预测位置存在异常,对所述第一文本区域包含的多个文本行的初始预测位置进行修正处理,获得所述多个文本行的目标预测位置。
在一些实施例中,所述位置预测模块,包括:
位置预测模块,用于响应于所述多个文本行中存在对应的初始预测行高大于第一预设行高的文本行,确定所述多个文本行的初始预测位置存在异常。
在一些实施例中,所述位置预测模块,包括:
位置预测模块,用于响应于确定所述多个文本行的初始预测位置存在异常,确定所述第一文本区域中初始预测行高存在异常的文本行;响应于确定所述第一文本区域中第一文本行的初始预测行高异常,对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的目标预测行高;基于所述第一文本行的目标预测行高对所述第一文本行的初始预测位置进行修正,得到所述第一文本行的目标预测位置。
在一些实施例中,所述位置预测模块,用于基于所述第一文本区域包括的多个文本行的第一预测平均行高和所述第一文本行的初始预测行高,确定所述多个文本行中除所述第一文本行之外的至少一个第二文本行的第二预测平均行高;基于所述第二预测平均行高,对所述第一文本行的初始预测行高进行修正。
在一些实施例中,所述位置预测模块,用于响应于所述第二预测平均行高超过第一预设数值,将所述第一文本行的行高修正为第二预设数值;和/或响应于所述第二预测平均行高小于或等于所述第二预设数值,将所述第一文本行的行高修正为所述第二预测平均行高。
在一些实施例中,所述位置预测模块,用于对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的修正行高;响应于所述第一文本行的修正行高大于或等于第二预设数值,将所述第一文本行的下一文本行的初始预测位置对应的初始预测行高作为所述第一文本行的目标预测行高,和/或响应于所述第一文本行的修正行高小于第三预设数值,所述第一文本行的修正行高作为所述第一文本行的目标预测行高。
在一些实施例中,所述位置预测模块,用于基于所述第一文本行的目标预测行高,对所述第一文本行的初始预测位置对应的预测上边界进行调整,得到所述第一文本行的目标预测上边界。
在一些实施例中,所述位置预测模块,用于基于所述第一文本区域中多个文本行的第一预测平均行高以及所述第一文本行的至少一个相邻行的初始预测位置对应的初始预测行高中的至少一项,确定所述第一文本行的初始预测行高是否异常。
在一些实施例中,所述位置预测模块,用于响应于所述第一文本行的初始预测行高达到所述第一预测平均行高的第一预设倍数,
和/或,
响应于所述第一文本行的初始预测行高达到所述第一文本行的至少一个相邻行的初始预测行高的第二预设倍数,
确定所述第一文本行的初始预测行高异常。
在一些实施例中,所述位置预测模块,还用于基于所述第一文本区域的至少两个边界限定点的信息以 及所述第一文本区域的预测行数,确定所述第一文本区域中多个文本行的第一预测平均行高。
在一些实施例中,所述文本识别模块,用于基于所述第一文本区域包含的多个文本行的目标预测位置对应的目标预测行高,对所述至少一个目标文本区域中第三文本区域的初始预测位置进行修正,得到所述第三文本区域的目标预测位置;基于所述第三文本区域的目标预测位置,得到所述第三文本区域的文本识别结果。
在一些实施例中,所述文本识别模块,用于基于所述第一文本区域包含的多个文本行的目标预测行高,确定所述第一文本区域中多个文本行的目标预测平均行高;
基于所述目标预测平均行高和所述第三文本区域中包括的第三文本行的初始预测位置对应的初始预测行高,对所述第三文本行的初始预测位置进行修正,得到所述第三文本行的最终预测位置。
在一些实施例中,所述证件包括身份证;和/或
所述第一文本区域包括地址字段信息区域。
根据本公开实施例的另一个方面,提供的一种电子设备,包括处理器,处理器包括本公开上述任一实施例的证件识别装置。
根据本公开实施例的另一个方面,提供的一种电子设备,包括:存储器,用于存储可执行指令;
以及处理器,用于与存储器通信以执行可执行指令从而完成本公开提供的证件识别方法上述任一实施例。
根据本公开实施例的另一个方面,提供的一种计算机存储介质,用于存储计算机可读取的指令,指令被处理器执行时,该处理器执行本公开提供的证件识别方法上述任一实施例。
根据本公开实施例的另一个方面,提供的一种计算机程序,包括计算机可读代码,当计算机可读代码在设备上运行时,设备中的处理器执行本公开提供的证件识别方法。
根据本公开实施例的再一个方面,提供的一种计算机程序产品,用于存储计算机可读指令,所述指令被执行时使得计算机执行上述任一可能的实现方式中所述的证件识别方法。
在一个或多个可选实施方式中,本公开实施例还提供了一种计算机程序程序产品,用于存储计算机可读指令,所述指令被执行时使得计算机执行上述任一实施例中所述的证件识别方法。
该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选例子中,所述计算机程序产品具体体现为计算机存储介质,在另一个可选例子中,所述计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
根据本公开实施例还提供了另一种证件识别方法及其对应的装置和电子设备、计算机存储介质、计算机程序以及计算机程序产品,其中,该方法包括:对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息,其中,多个关键点包括证件中第一文本区域的至少两个边界限定点,第一文本区域中包括多个对应于第一字符类型的文本行;基于多个关键点的信息,确定证件的文本识别结果。
在一些实施例中,该目标跟踪指示可以具体为调用指令,第一装置可以通过调用的方式指示第二装置执行证件识别,相应地,响应于接收到调用指令,第二装置可以执行上述证件识别方法中的任意实施例中的步骤和/或流程。
应理解,本公开实施例中的“第一”、“第二”等术语仅仅是为了区分,而不应理解成对本公开实施例的限定。
还应理解,在本公开中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
还应理解,对于本公开中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
本公开实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图10,其示出了适于用来实现本公开实施例的终端设备或服务器的电子设备1000的结构示意图:如图10所示,电子设备1000包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)1001,和/或一个或多个图像处理器(GPU)1013等,处理器可以根据存储在只读存储器(ROM)1002中的可执行指令或者从存储部分1008加载到随机访问存储器(RAM)1003中的可执行指令而执行各种适当的动作和处理。通信部1012包括但不限于网卡,所述网卡包括但不限于IB(Infiniband)网卡。
处理器可与只读存储器1002和/或随机访问存储器1003中通信以执行可执行指令,通过总线1004与通信部1012相连、并经通信部1012与其他目标设备通信,从而完成本公开实施例提供的任一项方法对应的操作,例如,对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息,其中,多 个关键点包括证件中第一文本区域的至少两个边界限定点,第一文本区域中包括多个对应于第一字符类型的文本行;基于多个关键点的信息,确定证件的文本识别结果。
此外,在RAM 1003中,还可存储有装置操作所需的各种程序和数据。CPU1001、ROM1002以及RAM1003通过总线1004彼此相连。在有RAM1003的情况下,ROM1002为可选模块。RAM1003存储可执行指令,或在运行时向ROM1002中写入可执行指令,可执行指令使处理器1001执行上述通信方法对应的操作。输入/输出(I/O)接口1005也连接至总线1004。通信部1012可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口1005:包括键盘、鼠标等的输入部分1006;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分1007;包括硬盘等的存储部分1008;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分1009。通信部分1009经由诸如因特网的网络执行通信处理。驱动器1010也根据需要连接至I/O接口1005。可拆卸介质1011,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1010上,以便于从其上读出的计算机程序根据需要被安装入存储部分1008。
需要说明的,如图10所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图10的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本公开公开的保护范围。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码包括对应执行本公开实施例提供的方法步骤对应的指令,例如,对证件图像进行关键点检测,获得证件图像中包括的证件的多个关键点的信息,其中,多个关键点包括证件中第一文本区域的至少两个边界限定点,第一文本区域中包括多个对应于第一字符类型的文本行;基于多个关键点的信息,确定证件的文本识别结果。在这样的实施例中,该计算机程序可以通过通信部分1009从网络上被下载和安装,和/或从可拆卸介质1011被安装。在该计算机程序被中央处理单元(CPU)1001执行时,执行本公开的方法中限定的上述功能。
可能以许多方式来实现本公开的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本公开的原理和实际应用,并且使本领域的普通技术人员能够理解本公开从而设计适于特定用途的带有各种修改的各种实施例。

Claims (37)

  1. 一种证件识别方法,包括:
    对证件图像进行关键点检测,获得所述证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;
    基于所述多个关键点的信息,确定所述证件的文本识别结果。
  2. 根据权利要求1所述的方法,其中,所述证件还包括第二文本区域,其中,所述第二文本区域包括至少一个对应于不同于所述第一字符类型的第二字符类型的文本行,且所述第二文本区域与所述第一文本区域的文本内容相同。
  3. 根据权利要求2所述的方法,其中,所述第一字符类型为汉字,所述第二字符类型为少数民族文字。
  4. 根据权利要求1-3任一所述的方法,其中,所述基于所述多个关键点的信息,确定所述证件的文本识别结果,包括:
    基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的目标预测位置;
    基于所述第一文本区域包含的多个文本行中每个文本行的目标预测位置,对所述证件中包含的对应于所述第一字符类型的至少一个目标文本区域进行识别,获得所述证件的文本识别结果。
  5. 根据权利要求4所述的方法,其中,所述基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的目标预测位置,包括:
    基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的初始预测位置;
    确定所述多个文本行的初始预测位置是否存在异常;
    响应于确定所述多个文本行的初始预测位置存在异常,对所述第一文本区域包含的多个文本行的初始预测位置进行修正处理,获得所述多个文本行的目标预测位置。
  6. 根据权利要求5所述的方法,其中,所述确定所述多个文本行的初始预测位置是否存在异常,包括:
    响应于所述多个文本行中存在对应的初始预测行高大于第一预设行高的文本行,确定所述多个文本行的初始预测位置存在异常。
  7. 根据权利要求5或6所述的方法,其中,所述响应于确定所述多个文本行的初始预测位置存在异常,对所述第一文本区域包含的多个文本行的初始预测位置进行修正处理,获得所述多个文本行的目标预测位置,包括:
    响应于确定所述多个文本行的初始预测位置存在异常,确定所述第一文本区域中初始预测行高存在异常的文本行;
    响应于确定所述第一文本区域中第一文本行的初始预测行高异常,对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的目标预测行高;
    基于所述第一文本行的目标预测行高对所述第一文本行的初始预测位置进行修正,得到所述第一文本行的目标预测位置。
  8. 根据权利要求7所述的方法,其中,所述对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的目标预测行高,包括:
    基于所述第一文本区域包括的多个文本行的第一预测平均行高和所述第一文本行的初始预测行高,确定所述多个文本行中除所述第一文本行之外的至少一个第二文本行的第二预测平均行高;
    基于所述第二预测平均行高,对所述第一文本行的初始预测行高进行修正。
  9. 根据权利要求8所述的方法,其中,所述基于所述第二预测平均行高,对所述第一文本行的初始预测行高进行修正,包括:
    响应于所述第二预测平均行高超过第一预设数值,将所述第一文本行的行高修正为第二预设数值;和/或
    响应于所述第二预测平均行高小于或等于所述第二预设数值,将所述第一文本行的行高修正为所述第二预测平均行高。
  10. 根据权利要求7至9任一所述的方法,其中,所述对所述第一文本行的初始预测行高进行 修正,得到所述第一文本行的目标预测行高,包括:
    对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的修正行高;
    响应于所述第一文本行的修正行高大于或等于第二预设数值,将所述第一文本行的下一文本行的初始预测位置对应的初始预测行高作为所述第一文本行的目标预测行高,和/或
    响应于所述第一文本行的修正行高小于第三预设数值,将所述第一文本行的修正行高作为所述第一文本行的目标预测行高。
  11. 根据权利要求7至10任一所述的方法,其中,所述基于所述第一文本行的目标预测行高对所述第一文本行的初始预测位置进行修正,得到所述第一文本行的目标预测位置,包括:
    基于所述第一文本行的目标预测行高,对所述第一文本行的初始预测位置对应的预测上边界进行调整,得到所述第一文本行的目标预测上边界。
  12. 根据权利要求7至11任一项所述的方法,其中,所述确定所述第一文本区域中初始预测行高存在异常的文本行,包括:
    基于所述第一文本区域中多个文本行的第一预测平均行高以及所述第一文本行的至少一个相邻行的初始预测位置对应的初始预测行高中的至少一项,确定所述第一文本行的初始预测行高是否异常。
  13. 根据权利要求12所述的方法,其中,所述基于所述第一文本区域的第一预测平均行高以及所述第一文本行的至少一个相邻行的初始预测位置对应的初始预测行高中的至少一项,确定所述第一文本行的初始预测行高是否异常,包括:
    响应于所述第一文本行的初始预测行高达到所述第一预测平均行高的第一预设倍数,
    和/或,
    响应于所述第一文本行的初始预测行高达到所述第一文本行的至少一个相邻行的初始预测行高的第二预设倍数,
    确定所述第一文本行的初始预测行高异常。
  14. 根据权利要求12或13所述的方法,其中,还包括:
    基于所述第一文本区域的至少两个边界限定点的信息以及所述第一文本区域的预测行数,确定所述第一文本区域中多个文本行的第一预测平均行高。
  15. 根据权利要求4至14任一所述的方法,其中,所述基于所述第一文本区域包含的多个文本行中每个文本行的目标预测位置,对所述证件中包含的对应于所述第一字符类型的至少一个目标文本区域进行识别,包括:
    基于所述第一文本区域包含的多个文本行的目标预测位置对应的目标预测行高,对所述至少一个目标文本区域中第三文本区域的初始预测位置进行修正,得到所述第三文本区域的目标预测位置;
    基于所述第三文本区域的目标预测位置,得到所述第三文本区域的文本识别结果。
  16. 根据权利要求15所述的方法,其中,所述基于所述第一文本区域包含的多个文本行的目标预测位置对应的目标预测行高,对所述至少一个目标文本区域中第三文本区域的初始预测位置进行修正,得到所述第三文本区域的目标预测位置,包括:
    基于所述第一文本区域包含的多个文本行的目标预测行高,确定所述第一文本区域中多个文本行的目标预测平均行高;
    基于所述目标预测平均行高和所述第三文本区域中包括的第三文本行的初始预测位置对应的初始预测行高,对所述第三文本行的初始预测位置进行修正,得到所述第三文本区域行的最终预测位置。
  17. 根据权利要求1至16任一项所述的方法,其中,所述证件包括身份证;和/或
    所述第一文本区域包括地址字段信息区域。
  18. 一种证件识别装置,包括:
    关键点检测单元,用于对证件图像进行关键点检测,获得所述证件图像中包括的证件的多个关键点的信息,其中,所述多个关键点包括所述证件中第一文本区域的至少两个边界限定点,所述第一文本区域中包括多个对应于第一字符类型的文本行;
    文本识别单元,用于基于所述多个关键点的信息,确定所述证件的文本识别结果。
  19. 根据权利要求18所述的装置,其中,所述证件还包括第二文本区域,其中,所述第二文本区域包括至少一个对应于不同于所述第一字符类型的第二字符类型的文本行,且所述第二文本区域与所述第一文本区域的文本内容相同。
  20. 根据权利要求19所述的装置,其中,所述第一字符类型为汉字,所述第二字符类型为少数 民族文字。
  21. 根据权利要求19所述的装置,其中,所述文本识别单元,包括:
    位置预测模块,用于基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的目标预测位置;
    文本识别模块,用于基于所述第一文本区域包含的多个文本行中每个文本行的目标预测位置,对所述证件中包含的对应于所述第一字符类型的至少一个目标文本区域进行识别,获得所述证件的文本识别结果。
  22. 根据权利要求21所述的装置,其中,所述位置预测模块,用于基于所述第一文本区域的至少两个边界限定点的信息,确定所述第一文本区域包含的多个文本行中每个文本行的初始预测位置;确定所述多个文本行的初始预测位置是否存在异常;响应于确定所述多个文本行的初始预测位置存在异常,对所述第一文本区域包含的多个文本行的初始预测位置进行修正处理,获得所述多个文本行的目标预测位置。
  23. 根据权利要求22所述的装置,其中,所述位置预测模块用于响应于所述多个文本行中存在对应的初始预测行高大于第一预设行高的文本行,确定所述多个文本行的初始预测位置存在异常。
  24. 根据权利要求22或23所述的装置,其中,所述位置预测模用于:响应于确定所述多个文本行的初始预测位置存在异常,确定所述第一文本区域中初始预测行高存在异常的文本行;响应于确定所述第一文本区域中第一文本行的初始预测行高异常,对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的目标预测行高;基于所述第一文本行的目标预测行高对所述第一文本行的初始预测位置进行修正,得到所述第一文本行的目标预测位置。
  25. 根据权利要求24所述的装置,其中,所述位置预测模块,用于基于所述第一文本区域包括的多个文本行的第一预测平均行高和所述第一文本行的初始预测行高,确定所述多个文本行中除所述第一文本行之外的至少一个第二文本行的第二预测平均行高;基于所述第二预测平均行高,对所述第一文本行的初始预测行高进行修正。
  26. 根据权利要求25所述的装置,其中,所述位置预测模块,用于响应于所述第二预测平均行高超过第一预设数值,将所述第一文本行的行高修正为第二预设数值;和/或响应于所述第二预测平均行高小于或等于所述第二预设数值,将所述第一文本行的行高修正为所述第二预测平均行高。
  27. 根据权利要求24至26任一所述的装置,其中,所述位置预测模块,用于对所述第一文本行的初始预测行高进行修正,得到所述第一文本行的修正行高;响应于所述第一文本行的修正行高大于或等于第二预设数值,将所述第一文本行的下一文本行的初始预测位置对应的初始预测行高作为所述第一文本行的目标预测行高,和/或响应于所述第一文本行的修正行高小于第三预设数值,所述第一文本行的修正行高作为所述第一文本行的目标预测行高。
  28. 根据权利要求24至27任一所述的装置,其中,所述位置预测模块,用于基于所述第一文本行的目标预测行高,对所述第一文本行的初始预测位置对应的预测上边界进行调整,得到所述第一文本行的目标预测上边界。
  29. 根据权利要求24至28任一项所述的装置,其中,所述位置预测模块,用于基于所述第一文本区域中多个文本行的第一预测平均行高以及所述第一文本行的至少一个相邻行的初始预测位置对应的初始预测行高中的至少一项,确定所述第一文本行的初始预测行高是否异常。
  30. 根据权利要求29所述的装置,其中,所述位置预测模块,用于响应于所述第一文本行的初始预测行高达到所述第一预测平均行高的第一预设倍数,和/或,响应于所述第一文本行的初始预测行高达到所述第一文本行的至少一个相邻行的初始预测行高的第二预设倍数,确定所述第一文本行的初始预测行高异常。
  31. 根据权利要求29或30所述的装置,其中,所述位置预测模块,还用于基于所述第一文本区域的至少两个边界限定点的信息以及所述第一文本区域的预测行数,确定所述第一文本区域中多个文本行的第一预测平均行高。
  32. 根据权利要求20至31任一项所述的装置,其中,所述位置预测模块,用于基于所述第一文本区域包含的多个文本行的目标预测位置对应的目标预测行高,对所述至少一个目标文本区域中第三文本区域的初始预测位置进行修正,得到所述第三文本区域的目标预测位置;所述文本识别模块用于基于所述第三文本区域的目标预测位置,得到所述第三文本区域的文本识别结果。
  33. 根据权利要求32所述的装置,其中,所述位置预测模块,用于基于所述第一文本区域包含的多个文本行的目标预测行高,确定所述第一文本区域中多个文本行的目标预测平均行高;
    基于所述目标预测平均行高和所述第三文本区域中包括的第三文本行的初始预测位置对应的初 始预测行高,对所述第三文本行的初始预测位置进行修正,得到所述第三文本行的最终预测位置。
  34. 根据权利要求18-33任一所述的装置,其中,所述证件包括身份证;和/或
    所述第一文本区域包括地址字段信息区域。
  35. 一种电子设备,包括:存储器,用于存储可执行指令;
    以及处理器,用于与所述存储器通信以执行所述可执行指令从而完成权利要求1至17任一项所述证件识别方法的操作。
  36. 一种计算机可读存储介质,用于存储计算机可读取的指令,其中,所述指令被执行时执行权利要求1至17任一项所述证件识别方法的操作。
  37. 一种计算机程序产品,包括计算机可读代码,其中,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1至17任一项所述证件识别方法的指令。
PCT/CN2019/108209 2019-04-30 2019-09-26 证件识别方法和装置、电子设备、计算机可读存储介质 WO2020220575A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2020543760A JP7033208B2 (ja) 2019-04-30 2019-09-26 証明文書認識方法及び装置、電子機器並びにコンピュータ可読記憶媒体
SG11202007758TA SG11202007758TA (en) 2019-04-30 2019-09-26 Certificate recognition method and apparatus, electronic device, and computer-readable storage medium
KR1020207025083A KR102435365B1 (ko) 2019-04-30 2019-09-26 증명서 인식 방법 및 장치, 전자 기기, 컴퓨터 판독 가능한 저장 매체
US16/991,533 US20200372248A1 (en) 2019-04-30 2020-08-12 Certificate recognition method and apparatus, electronic device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910362419.4 2019-04-30
CN201910362419.4A CN110321895A (zh) 2019-04-30 2019-04-30 证件识别方法和装置、电子设备、计算机可读存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/991,533 Continuation US20200372248A1 (en) 2019-04-30 2020-08-12 Certificate recognition method and apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020220575A1 true WO2020220575A1 (zh) 2020-11-05

Family

ID=68113412

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/108209 WO2020220575A1 (zh) 2019-04-30 2019-09-26 证件识别方法和装置、电子设备、计算机可读存储介质

Country Status (7)

Country Link
US (1) US20200372248A1 (zh)
JP (1) JP7033208B2 (zh)
KR (1) KR102435365B1 (zh)
CN (1) CN110321895A (zh)
SG (1) SG11202007758TA (zh)
TW (1) TW202042105A (zh)
WO (1) WO2020220575A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569839A (zh) * 2021-08-31 2021-10-29 重庆紫光华山智安科技有限公司 证件识别方法、系统、设备及介质

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126125B (zh) * 2019-10-15 2023-08-01 平安科技(深圳)有限公司 证件中的目标文本提取方法、装置、设备及可读存储介质
CN111191652A (zh) * 2019-12-20 2020-05-22 中国建设银行股份有限公司 一种证件图像识别方法、装置、电子设备及存储介质
CN111242083B (zh) * 2020-01-21 2024-01-26 腾讯云计算(北京)有限责任公司 基于人工智能的文本处理方法、装置、设备、介质
CN117912017A (zh) * 2020-02-17 2024-04-19 支付宝(杭州)信息技术有限公司 文本识别方法、装置及电子设备
CN111639648B (zh) * 2020-05-26 2023-09-19 浙江大华技术股份有限公司 证件识别方法、装置、计算设备和存储介质
CN112232336A (zh) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 一种证件识别方法、装置、设备及存储介质
KR102560051B1 (ko) * 2021-01-28 2023-07-27 네이버 주식회사 고차원 다항식 회귀를 이용한 문자열 검출 방법 및 시스템
CN113313114B (zh) * 2021-06-11 2023-06-30 北京百度网讯科技有限公司 证件信息获取方法、装置、设备以及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040117192A1 (en) * 2001-06-01 2004-06-17 Siemens Ag System and method for reading addresses in more than one language
CN105809164A (zh) * 2016-03-11 2016-07-27 北京旷视科技有限公司 文字识别方法和装置
CN108229299A (zh) * 2017-10-31 2018-06-29 北京市商汤科技开发有限公司 证件的识别方法和装置、电子设备、计算机存储介质
CN109598272A (zh) * 2019-01-11 2019-04-09 北京字节跳动网络技术有限公司 字符行图像的识别方法、装置、设备及介质
CN109670480A (zh) * 2018-12-29 2019-04-23 深圳市丰巢科技有限公司 图像判别方法、装置、设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751567B (zh) * 2008-12-12 2012-10-17 汉王科技股份有限公司 快速文本识别方法
US9798948B2 (en) * 2015-07-31 2017-10-24 Datalogic IP Tech, S.r.l. Optical character recognition localization tool
CN105426818B (zh) * 2015-10-30 2019-07-02 小米科技有限责任公司 区域提取方法及装置
CN106886777B (zh) * 2017-04-11 2020-06-09 深圳怡化电脑股份有限公司 一种字符边界确定方法及装置
JP6458239B1 (ja) * 2017-08-29 2019-01-30 株式会社マーケットヴィジョン 画像認識システム
CN109492643B (zh) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 基于ocr的证件识别方法、装置、计算机设备及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040117192A1 (en) * 2001-06-01 2004-06-17 Siemens Ag System and method for reading addresses in more than one language
CN105809164A (zh) * 2016-03-11 2016-07-27 北京旷视科技有限公司 文字识别方法和装置
CN108229299A (zh) * 2017-10-31 2018-06-29 北京市商汤科技开发有限公司 证件的识别方法和装置、电子设备、计算机存储介质
CN109670480A (zh) * 2018-12-29 2019-04-23 深圳市丰巢科技有限公司 图像判别方法、装置、设备及存储介质
CN109598272A (zh) * 2019-01-11 2019-04-09 北京字节跳动网络技术有限公司 字符行图像的识别方法、装置、设备及介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569839A (zh) * 2021-08-31 2021-10-29 重庆紫光华山智安科技有限公司 证件识别方法、系统、设备及介质
CN113569839B (zh) * 2021-08-31 2024-02-09 重庆紫光华山智安科技有限公司 证件识别方法、系统、设备及介质

Also Published As

Publication number Publication date
TW202042105A (zh) 2020-11-16
JP2021524948A (ja) 2021-09-16
KR20200128015A (ko) 2020-11-11
SG11202007758TA (en) 2020-12-30
CN110321895A (zh) 2019-10-11
JP7033208B2 (ja) 2022-03-09
KR102435365B1 (ko) 2022-08-23
US20200372248A1 (en) 2020-11-26

Similar Documents

Publication Publication Date Title
WO2020220575A1 (zh) 证件识别方法和装置、电子设备、计算机可读存储介质
US10977523B2 (en) Methods and apparatuses for identifying object category, and electronic devices
CN107798299B (zh) 票据信息识别方法、电子装置及可读存储介质
CN112016438B (zh) 一种基于图神经网络识别证件的方法及系统
US10296803B2 (en) Image display apparatus, image display method, and computer program product
JP6406932B2 (ja) 帳票認識装置及び方法
EP4099217A1 (en) Image processing model training method and apparatus, device, and storage medium
US20240012846A1 (en) Systems and methods for parsing log files using classification and a plurality of neural networks
JP4733577B2 (ja) 帳票認識装置及び帳票認識プログラム
CN111144400A (zh) 身份证信息的识别方法、装置、终端设备及存储介质
JP2019204417A (ja) 帳票認識システム
CN108734161B (zh) 冠字号区域的识别方法、装置、设备及存储介质
US20220415008A1 (en) Image box filtering for optical character recognition
CN112396047B (zh) 训练样本生成方法、装置、计算机设备和存储介质
US10796143B2 (en) Information processing apparatus, information processing system, and non-transitory computer readable medium
JP2021043775A (ja) 情報処理装置及びプログラム
CN107358718B (zh) 一种冠字号识别方法、装置、设备及存储介质
US9483834B1 (en) Object boundary detection in an image
US11611678B2 (en) Image processing apparatus and non-transitory computer readable medium
CN115964492A (zh) 文本知识抽取方法、装置、电子设备和可读存储介质
CN114299509A (zh) 一种获取信息的方法、装置、设备及介质
WO2020155484A1 (zh) 基于支持向量机的文字识别方法、装置和计算机设备
CN114549596A (zh) 一种图像校准方法、装置、电子设备及存储介质
JP7360660B1 (ja) 情報処理システム
US11704921B2 (en) Image processing apparatus, image processing method, and storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020543760

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19927458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19927458

Country of ref document: EP

Kind code of ref document: A1