WO2019101358A1

WO2019101358A1 - Extraction of identification data

Info

Publication number: WO2019101358A1
Application number: PCT/EP2018/000528
Authority: WO
Inventors: Curd Wallhäusser; René Kugler; Stephan HACKENBERG
Original assignee: Giesecke+Devrient Mobile Security Gmbh
Priority date: 2017-11-24
Filing date: 2018-11-22
Publication date: 2019-05-31
Also published as: EP3714379A1; DE102017010920A1

Abstract

The invention relates to a method for the automatic and remote extraction of identification data, which method is used, for example, in what is referred to as a video identification process. According to the invention, information can be fully automatically read from an identification document, even if the identification document is provided by means of a video stream. In further method steps, the provided information can then be verified. The invention further relates to a correspondingly designed system assembly and to a computer program product having control commands that implement the method or operate the system assembly.

Description

Extraction of identification data

The present invention is directed to a method for the automatic and remote extraction of identification data, as used for example in a so-called video-ident method. According to the invention, information can be read from an identity document fully automatically, even if the identity document is provided by means of a video stream. In further method steps, the information provided can then be verified. The present invention is also directed to a correspondingly configured system arrangement and to a computer program product having control commands that implement the method or operate the system arrangement.

US 2008/0091713 A1 shows a reading of characters from a video menu using text recognition, also known as Optical Character Recognition, OCR.

US 7,689,613 B2 shows a readout of metadata from an image frame, in which an OCR algorithm is likewise used.

US 7,639,387 B2 shows the preparation of metadata in the context of multimedia data.

Telebanking is known from conventional methods, in the course of which a so-called video-ident method is used. The aim is to spare the bank customer a visit to a branch by allowing him to initiate a verification by means of a video transmission. For this purpose, at least the bank customer, and optionally the bank branch, uses a webcam, into which the bank customer shows an identity document. Thus, a video stream, i.e. a moving image, of authorization documents is transmitted to the bank, and the bank can then provide a certain service if the bank customer has been positively verified. It must be ensured that the identity document can be reliably checked even if the video quality varies.

Another problem with such remote authorization is that certain security features of an identity document are no longer recognizable in the image stream. For example, value documents or identity documents comprise security features such as a hologram or a watermark. Likewise, certain tilting effects, which produce different images depending on the angle from which the identity document is viewed, cannot be reproduced. Furthermore, diffractive elements are known which typically cannot be transmitted by means of a video stream either. Since such conventional security features cannot be used in remote authorization, there is a need for novel methods which can reliably read out identification data even if it is provided remotely by means of a video transmission.

In accordance with conventional methods, such verification is typically performed by a human user who monitors the image stream, i.e. the video, on the bank side and manually accepts the identification data of the bank customer. This is not only labor-intensive but also prone to error, since a delayed or unclear video transmission can lead to corrupted identification data.

Individual image frames are typically segmented, and these segments may not be transmitted synchronously. Certain video algorithms do not provide a segment that covers the entire image, but transmit only the image information that changes. As a result, segments may not be synchronized in time, so that, for example, a character string is displayed incorrectly. While this is not noticeable in video streaming in the entertainment sector, it can corrupt data in security-relevant applications.

Thus, the verification of identity documents in particular has special requirements, and existing video streaming methods cannot simply be transferred to it. It is disadvantageous that known methods, which for example apply an OCR algorithm to image data, are not reliable enough to be used as part of a security check.

It is thus an object of the present invention to provide a method for automatically and remotely extracting identification data which provides securely verifiable data and, moreover, excludes human intervention, so that the method can be performed fully automatically and human error is avoided. It is a further object of the present invention to propose a correspondingly configured system arrangement, as well as a computer program product with control commands which implement the method or operate the system arrangement.

The object is achieved with the features of the independent patent claims. Further advantageous embodiments are specified in the dependent claims.

Accordingly, a method for automatically and remotely extracting identification data from an identity document is proposed, comprising providing a moving image of the identity document of a user, wherein an identity document specification is provided together with at least one format specification of the identity document, and alphanumeric characters of the identity document are recognized as a function of the provided identity document specification such that, on the basis of the format specification, only those areas of the identity document are analyzed which, according to the provided identity document specification, contain alphanumeric characters.

The proposed method can be carried out fully automatically, the human user merely providing his identity document by means of a video stream. Thus, not just a single image of the identity document is provided; rather, a video stream is typically transmitted in the form of a moving image comprising a plurality of so-called frames. A moving image thus includes a plurality of individual images, which in their entirety make up the moving image or video stream. It is not necessary for the individual frames of the moving image to be transmitted in full; rather, they can also be transmitted in such a way that only individual segments of the overall image are transmitted.

If, for example, a first frame shows a black background with an identity document in the foreground, the black background need not be retransmitted when the identity document is moved; only the changing image information is retransmitted. Exactly how the moving image is transmitted depends on the selected algorithm or the selected image coding. In the present case, a moving image refers to a video which is preferably transmitted or streamed by means of a communication network. While a stream is preferred in the context of the present invention, a locally stored video can also be analyzed with a time offset. Remote extraction means that the moving images are transmitted by telecommunications technology, preferably by means of buffered data communication. The identification data are thus not extracted in such a way that a user presents his identity document and it is analyzed in the presence of the user or with the identity document physically present. Rather, it is an aspect of the present invention that the moving image of the identity document is provided by means of a data communication interface, preferably by means of video streaming. The method may thus be a computer-implemented method which uses a data interface to receive the moving image of the identity document.

For example, an identity card, a passport or an identity document issued by a company can be used as the identity document. All these identity documents typically have in common that they can be described by means of an identity document specification. For example, on an identity card, relevant information such as the name and date of birth is always in the same place. In general, therefore, different types of identity documents can be used, each type being uniform in itself. Thus, an identity document specification can describe either an identity card or a passport. In general, relevant information is located at a different place on an identity card than in a passport. In order to take this into account in advance, an identity document specification is provided which describes the identity document. The identity document specification thus indicates where certain information is to be found and, where applicable, how it is to be interpreted.

The identity document specification may be provided in that the service provider specifies which identity document is to be used and, for example, requires that only a passport be accepted. In that case, exactly one identity document specification is provided, namely the one that describes the passport. Optionally, the customer or user can also decide which identity document he would like to use. In this case, known methods such as pattern matching can be used to detect automatically whether the identity document is an identity card or a passport. The respective identity document specification can thus be selected automatically and the identity document can be analyzed flexibly. It is also possible for a company to issue its own identity document and therefore also to provide the identity document specification. The company can thus design the identity document according to its own requirements and specify, by means of the identity document specification, which information is to be provided at which position on the identity document.

The format specification thus describes properties of the identity document. It not only describes where which information is located, but can optionally also specify how the individual pieces of information are to be interpreted. Formats are already known which indicate how a certain character string is to be interpreted. For example, different pieces of semantic information are concatenated, and a reader is set up to extract and interpret these pieces of information from the character string. Such a format specification can provide that a name is always given in a first line and a date of birth always in a second line. The format specification can then be used so that, if only a name is to be read, only the portion of the identity document that contains the name is analyzed. If, for example, a date of birth is always stated in the middle of an identity document, the format specification can specify this as well, so that only the center of the identity document has to be analyzed and non-relevant segments of the identity document are skipped.
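By way of illustration only, the restriction of the analysis to specified areas can be sketched as follows in Python. The names FieldRegion, EXAMPLE_SPEC and crop_field, the percentage coordinates and the representation of a frame as nested lists are assumptions made for this sketch and are not taken from the described method.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldRegion:
    """Hypothetical entry of a format specification: where a field is located."""
    name: str
    # Region in percent of the document size: (left, top, right, bottom).
    box: Tuple[int, int, int, int]


# Assumed example: a name in a first line, a date of birth in a second line.
EXAMPLE_SPEC = {
    "name": FieldRegion("name", (5, 10, 95, 25)),
    "date_of_birth": FieldRegion("date_of_birth", (5, 30, 60, 45)),
}


def crop_field(frame, spec, field_name):
    """Return only the part of the frame that the specification assigns to the field."""
    left, top, right, bottom = spec[field_name].box
    height, width = len(frame), len(frame[0])
    rows = frame[top * height // 100: bottom * height // 100]
    return [row[left * width // 100: right * width // 100] for row in rows]


# A dummy 40 x 100 "frame" whose pixels record their own coordinates.
frame = [[(y, x) for x in range(100)] for y in range(40)]
name_area = crop_field(frame, EXAMPLE_SPEC, "name")
```

Only name_area would then be passed to character recognition; the remaining pixels of the frame are never analyzed.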

This takes into account the technical consideration that analyzing an identity document is typically computationally intensive and often inaccurate, since in the case of remote extraction it cannot be ensured that the image quality corresponds to a desired image quality. According to the invention, the disadvantage is thus overcome that an entire image must always be analyzed although information can only be expected at a certain location. Computing capacity is saved and the error rate is reduced. In particular, human intervention, which would also be error-prone, is excluded. Known methods cannot simply be transferred to the extraction of identification data, since for them it is typically not relevant that the transmitted information withstands a security check. It is therefore particularly advantageous that the proposed method is specially geared to identification data and takes into account that identity documents always conform to a predetermined format.

Since the identity document specification describes where which information is to be found, alphanumeric characters can now also be recognized. This can advantageously be done in such a way that only that area of the identity document is analyzed in which the desired alphanumeric characters are present according to the specification. Alphanumeric characters can be letters, digits, or a combination of both. A specific character set can be taken into account which specifies, for example, that the alphanumeric characters may also include umlauts. This makes it possible to extract the identification data as metadata from the received image data. The identification data are thus made available for a check and are then present as a character string. The recognition of alphanumeric characters can include algorithms that are also used in OCR methods. For example, so-called pattern matching can be performed, in which the character set exists as images of the individual characters and these images are compared with the analyzed moving image. If there is sufficient agreement, it can be recognized that the moving image, or a frame thereof, contains a given alphanumeric character.
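The pattern matching against stored character images can be sketched, purely as an example, in Python. The 3 x 3 reference bitmaps, the agreement threshold of 0.8 and the function names are assumptions chosen for brevity; a practical implementation would use larger templates and a more robust similarity measure.

```python
# Hypothetical 3x3 reference images of characters (1 = ink, 0 = background).
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "O": ((1, 1, 1), (1, 0, 1), (1, 1, 1)),
}


def similarity(glyph, template):
    """Fraction of pixels on which the two bitmaps agree."""
    pairs = [(g, t)
             for g_row, t_row in zip(glyph, template)
             for g, t in zip(g_row, t_row)]
    return sum(g == t for g, t in pairs) / len(pairs)


def match_character(glyph, allowed, threshold=0.8):
    """Return the best-matching character of the allowed character set,
    or None if the agreement is not sufficient."""
    best = max(allowed, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best if similarity(glyph, TEMPLATES[best]) >= threshold else None


# An "L" in which one pixel was lost during transmission.
noisy_l = ((1, 0, 0), (1, 0, 0), (1, 1, 0))
result = match_character(noisy_l, allowed="ILO")
```

Restricting the allowed parameter to the character set named in the specification reduces the number of comparisons and excludes characters that cannot occur in the analyzed area.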

According to one aspect of the present invention, validity information is provided, and a validity check of the identity document is performed by comparing the validity information with the recognized alphanumeric characters. This has the advantage that a user can be authorized as soon as the alphanumeric characters have been recognized. The validity information can specify, for example, the name of the user to be authorized, which can then be compared with the recognized characters. If the validity information is found on the identity document, the validity is positively verified. On the other hand, an expiration date of the identity document can also be read out and, if this is specified as a validity criterion, the identity document can be negatively verified because it has expired. This makes it possible to ensure that the identity document is actually valid even when only a moving image of it is provided remotely.

According to a further aspect of the present invention, it is automatically detected which identity document specification corresponds to the identity document by comparing stored identity document specifications with the moving image of the identity document. This has the advantage that the identity document does not have to be specified in advance; rather, the user can choose whether to present his identity card or his passport. Since images of identity documents can be stored beforehand, it is possible to determine automatically which type of identity document is involved and to load the corresponding identity document specification accordingly. It is thus recognized which identity document is being presented, and it can also be determined at which location or in which area a piece of information must be found on the identity document. An identity document specification can be selected dynamically at runtime depending on the identity document provided.
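A comparison of validity information with recognized characters could look, for example, like the following Python sketch. The dictionary layout, the function name check_validity and the sample name and dates are invented for illustration.

```python
from datetime import date


def check_validity(recognized, expected_name, today):
    """Compare recognized fields with hypothetical validity information."""
    if recognized["name"] != expected_name:
        # The user to be authorized is not named on the document.
        return False
    # An expired document is verified negatively.
    return recognized["expiry"] >= today


# Sample values standing in for the output of the character recognition.
recognized = {"name": "ERIKA MUSTERMANN", "expiry": date(2027, 10, 31)}
still_valid = check_validity(recognized, "ERIKA MUSTERMANN", date(2025, 1, 1))
expired = check_validity(recognized, "ERIKA MUSTERMANN", date(2028, 1, 1))
```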

In accordance with another aspect of the present invention, the identity document specification provides an indication of a machine-readable area. This has the advantage that it can be determined which information is to be provided, so that only the relevant area of the identity document has to be analyzed. Background data or data of the identity document that are not relevant can thus be masked out and are not analyzed. This ensures that computing capacity is not overly burdened and that the available computing capacity can be focused on relevant areas. The provided moving image can therefore be analyzed at a higher frame rate. The frame rate indicates how many frames are provided or analyzed per second. According to the invention, this can be a particularly large number of frames, since only a subset of the provided moving image is analyzed. Furthermore, the machine-readable area may be present as a so-called Machine Readable Zone (MRZ). For this purpose, standards can be used which specify exactly where on a data carrier, for example a travel document, certain information must be located. If deviations are detected on the provided identity document, the identity document can be negatively verified.
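As an example of such a standardized layout, the second line of the machine readable zone of a passport-size (TD3) travel document has fixed character positions for each field. The following Python sketch slices a few of these fields; the dictionary name, the function name and the selection of fields are assumptions of this sketch, and the sample line is a specimen-style value, not data of a real person.

```python
# Character positions (0-based, end-exclusive) of selected fields in the
# second line of a TD3 machine readable zone, following ICAO Doc 9303.
TD3_LINE2_FIELDS = {
    "document_number": (0, 9),
    "nationality": (10, 13),
    "date_of_birth": (13, 19),   # YYMMDD
    "sex": (20, 21),
    "date_of_expiry": (21, 27),  # YYMMDD
}


def parse_td3_line2(line):
    """Split the second MRZ line into named fields by position."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line has 44 characters")
    return {name: line[start:end]
            for name, (start, end) in TD3_LINE2_FIELDS.items()}


sample_line = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
fields = parse_td3_line2(sample_line)
```

A line of the wrong length is a deviation from the specification and is rejected before any field is interpreted.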

In accordance with another aspect of the present invention, the identity document specification is provided in accordance with Document 9303 of the International Civil Aviation Organization. This is a standard, so that predetermined, standardized identity document specifications can be provided. Among other things, a forgery of an identity document can thus be detected. Document 9303 may be the standard published by the International Civil Aviation Organization (ICAO) in its seventh edition, with ISBN 978-92-9249-790-3, but other standards may also be applied.

According to a further aspect of the present invention, the identity document specification provides an indication of a character set of the identification data of the identity document. This has the advantage that it can be specified in advance which alphanumeric characters are to be expected at all, so that the recognition of such characters can be optimized. For example, it can be specified whether or not umlauts are to be expected. This increases the overall hit rate of the character recognition.

According to a further aspect of the present invention, the validity information provides an indication of a valid checksum, a date, a character set, a value range, a data structure, a data coding and/or a validity criterion of the identity document. This has the advantage that the authenticity of the identity document can be checked, for which checksums are used, for example. Furthermore, the date can be used to determine whether the identity document is still valid or has already expired. If unknown characters occur, it can also be concluded that the document is a forgery. The data coding can specify at which location a piece of semantic information is to be found. It can thus be specified that one text field must contain a name and another text field an address. In this way, the identity document can also be analyzed semantically, and it can be specified in advance which information is expected for an identity document to be positively verified.
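One example of a checksum that can serve as validity information is the check digit used in machine readable zones according to ICAO Doc 9303: the characters are weighted cyclically with 7, 3 and 1, letters A to Z count as 10 to 35, the filler character "<" counts as 0, and the sum is taken modulo 10. The Python function below is a sketch of this calculation; the function name is chosen freely.

```python
def mrz_check_digit(field):
    """Check digit of an MRZ field (weights 7, 3, 1; result modulo 10)."""
    weights = (7, 3, 1)
    total = 0
    for position, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"character {ch!r} is not allowed in an MRZ")
        total += value * weights[position % 3]
    return total % 10


# Specimen-style values: document number, date of birth, date of expiry.
digits = [mrz_check_digit(f) for f in ("L898902C3", "740812", "120415")]
```

If a computed check digit differs from the digit read from the document, either a character was misrecognized or the document deviates from the specification.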

According to a further aspect of the present invention, the alphanumeric characters are recognized using an OCR algorithm. This has the advantage that known implementations can be reused. In particular, pattern matching based on already known methods can be used. This offers the overall advantage that proven algorithms can be applied in a specialized way to a different application scenario.

According to another aspect of the present invention, the identification data are interpreted on the basis of the identity document specification. This has the advantage that character strings are not merely compared; rather, the information is considered in such a way that its semantic content can be evaluated. This is possible because it is specified at which location which information is to be found, which can ensure that a character string in a name field is actually a name. Certain rows or columns can thus be taken into account, and it can be determined whether a date is a date of birth or an expiration date. This makes it possible to verify the authenticity of the identity document and not just to compare character strings.

According to another aspect of the present invention, the moving image has a plurality of image frames, and the recognition of alphanumeric characters is performed for a plurality of image frames. This has the advantage that the moving image can be divided into individual frames and a piece of information that has been identified can be checked on the basis of further image frames. Since the same identity document is presented throughout, it can be assumed that all image frames show the same name. If, for example, one image frame is misinterpreted and a letter is misrecognized, it can be recognized from the other image frames that this individual image frame was misinterpreted. If, for example, 100 image frames are analyzed and one character string is recognized in 99 image frames while a different character string is recognized in one image frame, it can be determined that the character string of the 99 image frames was interpreted correctly and that the one image frame was evaluated incorrectly.
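The comparison across image frames can be illustrated by a simple majority decision, sketched here in Python. The function name consensus, the minimum share of 0.9 and the sample strings are assumptions of this sketch.

```python
from collections import Counter


def consensus(readings, min_share=0.9):
    """Return the reading recognized in at least min_share of the frames,
    or None if no reading is sufficiently dominant."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count / len(readings) >= min_share else None


# 99 frames agree, one frame contains a misrecognized letter.
readings = ["MUSTERMANN"] * 99 + ["MUSTERMAMN"]
agreed = consensus(readings)
```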

According to a further aspect of the present invention, the number of image frames to be analyzed is selected as a function of a frame rate of the moving image. This has the advantage that, if more image frames are available than needed, redundant image frames can be discarded. On the other hand, if the frame rate is particularly low, all transmitted image frames may be needed for the analysis. If, for example, the moving image is transmitted at 30 frames per second, it may be sufficient to analyze every second image frame, whereas at a frame rate of 60 frames per second a correspondingly smaller share of the image frames is selected. The computational effort can thus also be scaled.
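A minimal Python sketch of such a selection is given below. It assumes a hypothetical target of 15 analyzed frames per second, which yields every second frame at 30 frames per second; neither this target value nor the function names are prescribed by the method.

```python
def frame_stride(frame_rate, target_rate=15.0):
    """Number of frames to advance between two analyzed frames."""
    return max(1, round(frame_rate / target_rate))


def select_frames(frames, frame_rate, target_rate=15.0):
    """Keep only every n-th frame, where n depends on the frame rate."""
    return frames[::frame_stride(frame_rate, target_rate)]


one_second_at_30 = select_frames(list(range(30)), frame_rate=30)
one_second_at_60 = select_frames(list(range(60)), frame_rate=60)
```

In both cases the same number of frames per second reaches the recognition step, so the computational load stays roughly constant regardless of the camera used.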

According to another aspect of the present invention, the moving image has a plurality of image frames, and a suitable image frame for recognizing the alphanumeric characters is selected with regard to an image section of the identity document, an orientation, an exposure, a focus and/or further image parameters. This has the advantage that the individual image frames of the moving image can be analyzed and it can be determined whether they are suitable for extracting identification data. It can thus be ensured that a correct image section of the identity document is present in the moving image and that the identity document is correctly aligned. Furthermore, the exposure or focus can be analyzed, and it can thus be estimated whether a character is likely to be recognized. An image frame is suitable whenever a character can be extracted from it with high accuracy. If, for example, the moving image is too dark or insufficiently illuminated, it can be decided that the image frame is not suitable, and another image frame is used to recognize the alphanumeric characters. The person skilled in the art knows corresponding threshold values which indicate whether or not an image frame is suitable for analysis.
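How exposure and focus might be tested against threshold values is sketched below in Python. The two measures (mean gray value and mean absolute difference of horizontally adjacent pixels) and all threshold values are assumptions chosen for this example; other measures are equally conceivable.

```python
def mean_brightness(gray):
    """Average gray value (0 = black, 255 = white) of a frame."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def sharpness(gray):
    """Mean absolute difference of horizontally adjacent pixels;
    blurred frames give small values."""
    diffs = [abs(row[i + 1] - row[i])
             for row in gray for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def is_suitable(gray, min_brightness=60, max_brightness=200, min_sharpness=10):
    """Decide with hypothetical thresholds whether a frame is worth analyzing."""
    return (min_brightness <= mean_brightness(gray) <= max_brightness
            and sharpness(gray) >= min_sharpness)


too_dark = [[5, 10, 5, 10]] * 4        # underexposed frame
blurred = [[128, 128, 128, 128]] * 4   # well exposed, but no edges
crisp = [[40, 220, 40, 220]] * 4       # well exposed, strong edges
suitable = [is_suitable(f) for f in (too_dark, blurred, crisp)]
```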

According to another aspect of the present invention, further optical security features are recognized in addition to alphanumeric characters. This has the advantage that image data can also be included in the analysis, which makes it possible to ensure that the holder of the identity document can in fact be recognized in the moving image.

The object is also achieved by a system arrangement for automatically and remotely extracting identification data from an identity document, comprising a telecommunications interface configured to provide a moving image of the identity document of a user, wherein a memory unit configured to provide an identity document specification together with at least one format specification of the identity document is provided, and a recognition unit is provided which is configured to recognize alphanumeric characters of the identity document as a function of the provided identity document specification such that, on the basis of the format specification, only those areas of the identity document are analyzed which, according to the provided identity document specification, contain alphanumeric characters.

The object is also achieved by a computer program product with control commands which execute the proposed method or operate the proposed system arrangement. According to the invention, it is particularly advantageous that the method provides method steps which can also be provided by means of structural features of the system arrangement. Furthermore, the system arrangement includes structural features whose functionality can also be mapped as method steps. The method is suitable for operating the proposed system arrangement, and the system arrangement is set up accordingly to carry out the method.

Further advantageous embodiments are explained in more detail with reference to the accompanying figures, which show:

FIG. 1 a schematic flow diagram of a conventional method for analyzing a video stream;

FIG. 2 a schematic flow diagram of a method for automatically and remotely extracting identification data in accordance with an aspect of the present invention; and

FIG. 3 a further schematic flow diagram of the method according to the invention for extracting identification data.

FIG. 1 shows a method according to the prior art in which a validity check is carried out with human intervention. In a first method step 10, a human evaluates whether or not a video stream contains an image that is suitable for OCR processing. If necessary, manual feedback follows. Overall, this requires a great deal of human attention and is error-prone. In a subsequent method step 11, a generic OCR method is used, i.e. not a specialized OCR method as proposed according to the invention, which makes no use of additional information about the text to be captured. This is very computationally intensive and therefore costly. Then, in method step 12, the validity of the identification data is checked.

A video stream (or camera viewfinder window) is observed by a human operator, who judges whether it contains an image suitable for capture (i.e. suitable image section, orientation, exposure, focus, etc.). If so, the operator triggers a capture of the current image. The captured image is then processed further, e.g. by means of generic OCR, and automatically checked against known constraints (e.g. checksums, validity of dates, defined structure of the name field, etc.). Other processing steps, such as checking optical security features of an identity document, are conceivable. If the check was unsuccessful, the operator must repeat the procedure.

FIG. 2, by contrast, shows a flow diagram according to the proposed invention, in which images are continuously captured from a video stream in method step 20. In method step 21, a specialized OCR recognition method is then carried out, which is fast and nevertheless requires relatively little computing power. A priori knowledge about the MRZ field contents is used here on the basis of the character position, i.e. a row or a column. In a subsequent method step 22, an automated validity check is performed. After method step 22 has been executed, it is possible to return iteratively to method step 20 and to analyze a further image frame. Alternatively, the method can terminate, and the validity check is then complete.
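One conceivable way of using a priori knowledge about the character position is to restrict each column of an MRZ line to the class of characters that may occur there and to resolve typical confusions accordingly. The Python sketch below does this for a few columns of the second line of a TD3 machine readable zone. The column sets, the confusion tables and the function name are simplifying assumptions of this sketch and do not describe the specialized OCR method itself.

```python
# Assumed column classes (0-based) for the second TD3 MRZ line:
# date of birth and date of expiry with their check digits are numeric,
# the nationality code is alphabetic.
NUMERIC_COLUMNS = set(range(13, 20)) | set(range(21, 28))
ALPHA_COLUMNS = set(range(10, 13))

# Hypothetical table of characters that are easily confused.
TO_DIGIT = {"O": "0", "I": "1", "B": "8", "S": "5"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def correct_by_position(line):
    """Replace characters that cannot occur at their column by the
    look-alike character of the expected class."""
    corrected = []
    for column, ch in enumerate(line):
        if column in NUMERIC_COLUMNS:
            ch = TO_DIGIT.get(ch, ch)
        elif column in ALPHA_COLUMNS:
            ch = TO_LETTER.get(ch, ch)
        corrected.append(ch)
    return "".join(corrected)


# Raw reading with three confusions: 0 for O, I for 1, O for 0.
raw = "L898902C36UT07408I22F12O4159ZE184226B<<<<<10"
fixed = correct_by_position(raw)
```

Because the expected class is known per column, the recognizer only has to distinguish between fewer candidate characters, which is one reason why such a specialized method can be fast.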

FIG. 3 likewise shows a schematic flow diagram of a further aspect of the present invention. In particular, FIG. 3 shows a method for automatically and remotely extracting identification data from an identity document, comprising providing 100 a moving image of the identity document of a user, providing 101 an identity document specification together with at least one format specification of the identity document, and recognizing 102 alphanumeric characters of the identity document as a function of the provided identity document specification such that, on the basis of the format specification, only those areas of the identity document are analyzed which, according to the provided identity document specification, contain alphanumeric characters. Optionally, validity information can be provided 101A, and a validity check of the identity document can be carried out by means of a comparison 103 of the validity information with the recognized 102 alphanumeric characters.

The person skilled in the art recognizes that the aforementioned method steps can be carried out iteratively and/or in a different order.

Images of an optically readable data field (e.g. the MRZ area of an identity card or passport) are continuously taken from a video stream and processed with a special OCR algorithm. Ideally, the OCR algorithm works so fast that many recognition cycles are completed per second. The constraints known for the data (e.g. checksums, validity of dates, defined structure of the name field, etc.) are used to terminate the continuous recognition process automatically and to adopt the data fields that have most probably been recognized correctly. The special OCR algorithm should work very fast, but it may have a relatively low recognition reliability, since incorrect recognition attempts are very likely to be detected by the automatic checking of the constraints. The special OCR algorithm uses as much a priori knowledge as possible about the content of the data fields (e.g. restrictions of the character set, validity of checksums, validity of dates). If the OCR capture was successful after checking the constraints, it can be assumed with high probability that the entire image is "good" (i.e. there is a suitable image section, orientation, exposure, focus, etc.), so that it is also suitable for other processing steps (e.g. checking optical security features).
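The continuous recognition process with automatic termination can be sketched as a simple loop in Python. In this sketch the recognizer is passed in as a function and is simulated by strings that stand for frames already converted to text; the function names, the choice of three check digits of a TD3 second line as constraints, and the sample readings are assumptions made for illustration.

```python
def check_digit(field):
    """Check digit with weights 7, 3, 1 (A-Z count as 10-35, '<' as 0)."""
    values = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    total = sum((0 if ch == "<" else values.index(ch)) * (7, 3, 1)[i % 3]
                for i, ch in enumerate(field))
    return total % 10


def constraints_hold(line):
    """Known constraints: line length and three field check digits."""
    if len(line) != 44:
        return False
    try:
        return all(check_digit(line[start:end]) == int(line[end])
                   for start, end in ((0, 9), (13, 19), (21, 27)))
    except ValueError:
        return False  # unexpected character or non-numeric check digit


def extract(frames, recognize):
    """Run the fast recognizer frame by frame and stop at the first
    reading that satisfies all constraints."""
    for index, frame in enumerate(frames):
        reading = recognize(frame)
        if constraints_hold(reading):
            return index, reading
    return None


# Simulated recognizer output for three consecutive frames.
simulated = [
    "L898902C36UT07408I22F12O4159ZE184226B<<<<<10",  # misread characters
    "L898902C36UTO74081",                            # document partly hidden
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10",  # consistent reading
]
outcome = extract(simulated, recognize=lambda frame: frame)
```

The loop ends without operator involvement as soon as one frame yields a consistent reading, and the index of that frame identifies an image that is probably also suitable for further processing steps.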

Claims


1. A method for automatically and remotely extracting identification data from an identity document, comprising:

- providing (100) a moving image of the identity document of a user, characterized in that

- an identity document specification is provided (101) together with at least one format specification of the identity document; and

- alphanumeric characters of the identity document are recognized (102) as a function of the provided identity document specification such that, on the basis of the format specification, only those areas of the identity document are analyzed which contain alphanumeric characters in accordance with the provided identity document specification.

2. The method according to claim 1, characterized in that validity information is provided (101A) and a validity check of the identity document is carried out by means of a comparison (103) of the validity information with the recognized (102) alphanumeric characters.

3. The method according to claim 1 or 2, characterized in that, by means of a comparison of stored identity document specifications with the moving image of the identity document, it is automatically detected which identity document specification corresponds to the identity document.

4. The method according to any one of the preceding claims, characterized in that the identity document specification provides an indication of a machine-readable area.

5. The method according to any one of the preceding claims, characterized in that the identity document specification is provided in accordance with Document 9303 of the International Civil Aviation Organization.

6. The method according to any one of the preceding claims, characterized in that the identity document specification provides an indication of a character set of the identification data of the identity document.

7. The method according to any one of claims 2 to 6, characterized in that the validity information provides an indication of a valid checksum, a date, a character set, a range of values, a data structure, a data encoding and / or a validity criterion of the identity document.

8. The method according to any one of the preceding claims, characterized in that the recognition (102) of alphanumeric characters is performed using an OCR algorithm.

9. The method according to any one of the preceding claims, characterized in that the identification data are interpreted on the basis of the identity document specification.

10. The method according to any one of the preceding claims, characterized in that the moving image has a plurality of image frames and the recognition (102) of alphanumeric characters is performed for a plurality of image frames.

11. The method according to claim 10, characterized in that the number of image frames to be analyzed is selected as a function of a frame rate of the moving image.

12. The method according to any one of the preceding claims, characterized in that the moving image has a plurality of image frames and a suitable image frame for recognizing (102) the alphanumeric characters is selected with regard to an image section of the identity document, an orientation, an exposure, a focus and/or further image parameters.

13. The method according to any one of the preceding claims, characterized in that in addition to alpha-numeric characters further optical security features are detected.

14. A system arrangement for automatically and remotely extracting identification data from an identity document, comprising:

a telecommunications interface configured to provide (100) a moving image of the identity document of a user, characterized in that

a memory unit is provided which is configured to provide (101) an identity document specification together with at least one format specification of the identity document; and

a recognition unit is provided which is configured to recognize (102) alphanumeric characters of the identity document as a function of the provided identity document specification such that, on the basis of the format specification, only those areas of the identity document are analyzed which contain alphanumeric characters in accordance with the provided identity document specification.

15. A computer program product having control instructions that perform the method according to any one of claims 1 to 13 when executed on a computer.