WO2023172190A1

WO2023172190A1 - Method and apparatus for accessing data in a plurality of machine readable medium

Info

Publication number: WO2023172190A1
Application number: PCT/SG2022/050120
Authority: WO
Inventors: Feng Yi Frances CHAN; Mariela Berrocal KRIEBEL; Pradyumna AGRAWAL; Nicholas FOO; Wei Han Tan
Original assignee: Affinidi Pte. Ltd.
Priority date: 2022-03-09
Filing date: 2022-03-09
Publication date: 2023-09-14

Abstract

A method and an apparatus for accessing data in a plurality of machine readable mediums, the method comprising: receiving a machine readable medium as input; identifying a feature of the machine readable medium to determine a process to extract data contained in the machine readable medium; extracting data contained in the machine readable medium according to the determined process, wherein the data comprises one or more entities associated with the machine readable medium; verifying authenticity of the machine readable medium by matching the one or more entities with entity relationship data stored in a database; and upon successful authentication, mapping the extracted data to a predefined framework that assigns data fields in the extracted data to a pre-determined data field structure, wherein the entity relationship data comprises data indicative of relationships between more than one entity and more than one issuer of the machine readable medium.

Description

Method And Apparatus For Accessing Data in a Plurality of Machine Readable Medium

Field

The present invention relates to a method and an apparatus for accessing data in a plurality of machine readable medium. The machine readable medium may be a digitized document or a machine readable code containing contents of a digitized document.

Background

Currently, digitized documents come in a variety of formats. These formats vary in terms of form and presentation. Often, human intervention is required to process these documents against a reference or requirement. The requirement might be applicable in a scenario where documents are required for access control or authorization for a service. The activity is further complicated by the existence of a variety of digital presentation formats and forms that the documents come in, as well as the multitude of requirements that the documents have to be checked against. The presence of multiple methods of presenting a digital document also prevents an issuer (i.e. a party producing or distributing the document) from generating digitized documents which can be consumed by a wide audience. This limits the adoption of digital documentation. It also prevents digital documents from being used in real-time or high-throughput environments where the resources and time available to process a document are limited.

Summary

According to an example of the present disclosure, there are provided a method and an apparatus as claimed in the independent claims. Some optional features are defined in the dependent claims.

Brief Description of the Drawings

Examples in the present disclosure will be better understood and readily apparent to one skilled in the art from the following written description, by way of example only and in conjunction with the drawings, in which:

FIG. 1 shows a system architecture of an example of the present disclosure.

FIG. 2 illustrates entity relationship data used by an example of the present disclosure.

FIG. 3 illustrates a system architecture of a specific use example of the present disclosure.

Detailed Description

At present, a method adopting a man-in-the-loop approach is typically used to process digitized documents. The method steps are as follows.

Step 1: A user first submits documents to a document store.

Step 2: Back-office verification personnel then search through the document store, open a validation case and perform manual validation of the document in accordance with predetermined requirements.

Step 3: The user is then notified through some means (e.g. email) regarding the result of the validation process.

Machine Learning (ML)-based software recognition platforms may be used to attempt to automate the above method. This ML solution may comprise the following features:

a) An ML recognition input stage is trained on pre-sighted document samples.

b) The ML algorithm is run against user-submitted documents.

c) Textual data is extracted by the ML algorithm and auto-populated in pre-defined fields.

d) The fields are run through the necessary validation stages, as with a manual validation process.

Another solution may involve integration and/or Single Sign On (SSO) with existing data sources. This approach requires a party interested in validating user documents to integrate its systems with trusted third-party data sources to which one or more users have previously submitted said user documents or information. Through the integration of systems, often through a single-sign-on process, a user is able to share information with interested parties for validation.

All of the above solutions are typically configured such that the digitized documents, including Quick Response codes, focus on a single use-case and are often meant to be read by a purpose built application.

In the present disclosure, the term “machine readable medium” refers to a digitized document. A machine readable medium may be a machine readable optical label containing information that is coded. For instance, the machine readable optical label may be a Quick Response (QR) code or other codes such as barcodes. The machine readable medium may also be a digitized document containing data in the form of text.

In a specific use case of an example of the present disclosure, the machine readable medium may contain data of, for example, a medical report of a subject (which can be an animal, including a human being), and the medical report contains information indicative of whether the subject has contracted a disease. In more detail, the example may provide recognition and processing of digitized personal documents by means of a machine readable medium in the form of a loosely coupled Quick Response (QR) code for access control or service authorization purposes. The example may also provide cryptographically verifiable authenticity guarantees and provide automation to ensure deployability in high-volume and real-time environments.

Specifically, the example exploits the qualities of an industry standard Quick Response (QR) code to provide automated sharing, recognition, formatting and subsequent automated evaluation of a digital document.

With reference to FIG. 1, the example may comprise an apparatus, which in this case is a server 104, to conduct processing and validation of a subject machine readable medium, which in this case is a QR code 110. A user device 102 comprising a code scanner is used to scan and read the QR code 110. The user device 102 is configured to communicate with the server 104 via a network 108 such as the internet and/or a telecommunication network. The server 104 has access to a database 106 containing data such as entity relationship data to be used for validation purposes. The components comprising the apparatus 104, user device 102, network 108 and the database 106 may be regarded as part of a system 100. In this example, the QR code 110 contains data for the server 104 to validate that the QR code 110 being scanned at the user device 102 is indeed issued from one or more entities indicated in the data of the QR code 110. A user using the user device 102 may be called a verifier, who has the purpose of verifying the authenticity of the QR code 110 and determining a result from the data contained in the QR code 110.

Firstly, the QR code 110 containing data associated with a digitized document is ingested by (or inputted to) the server 104. This is done via scanning the QR code 110 using the user device 102 and sending data obtained from the QR code 110 from the user device 102 to the server 104. The scanning obtains data from the QR code 110 for processing by the user device 102 and/or the server 104.

In the present example, the data held in the QR code 110 does not have to be in any prescribed format, i.e. the approach is schema or format agnostic. However, a feature of the QR code 110, for example, the data structure or type of data contained within the QR code 110, needs to be known or made available to the server 104 beforehand so as to enable the server 104 to conduct processing on the data obtained from the QR code 110.

In the present example, to conduct processing, the server 104 has to recognise the feature, which is the type of QR code 110, based on the data obtained from the QR code 110. This recognition can be performed at the user device 102 or at the server 104 through one or more of the five methods listed as follows:

i) Recognition of prefixes, headers or content within the data of the QR code 110 (i.e. fingerprint verification) to allow the type of the QR code 110 to be identified;

ii) Recognition of tags or logos within the data of the QR code 110 that can be queried;

iii) Recognition of a uniform resource locator (URL) encoded in the QR code 110;

iv) Recognition of special characters or markings contained in the QR code 110; and/or

v) Recognition of the overall data structure described by the data of the QR code 110.
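The recognition methods listed above can be pictured with a minimal sketch. It is not the disclosed implementation: the function name identify_code_type, the type labels and the fingerprint table are hypothetical, and the two prefixes are only the ones commonly documented for the EU DCC and Smart Health Card formats, used here for illustration. The sketch covers method i) (prefix fingerprint), method iii) (URL) and method v) (overall data structure).

```python
import json
from typing import Optional

# Hypothetical fingerprint table: payload prefix -> type label (illustrative only).
PREFIX_FINGERPRINTS = {
    "HC1:": "eu_dcc",
    "shc:/": "smart_health_card",
}


def identify_code_type(payload: str) -> Optional[str]:
    """Return a type label for a scanned payload, or None if unrecognised."""
    text = payload.strip()
    # Method i): recognise a known prefix or header.
    for prefix, code_type in PREFIX_FINGERPRINTS.items():
        if text.startswith(prefix):
            return code_type
    # Method iii): recognise a URL encoded in the code.
    if text.lower().startswith(("http://", "https://")):
        return "url"
    # Method v): recognise the overall data structure (here, a JSON object
    # carrying an "@context" key is assumed to be a W3C-style credential).
    try:
        data = json.loads(text)
    except ValueError:
        return None
    if isinstance(data, dict) and "@context" in data:
        return "w3c_credential"
    return None
```

A payload that matches none of the checks yields None, which corresponds to the case where the type cannot be identified.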

If the recognition is performed at the user device 102, the identified type of QR code 110 has to be communicated to the server 104 from the user device 102 after the recognition.

Based on the recognition result, the server 104 determines a predetermined process or method specific to the identified type of the QR code 110 to use to obtain data from the QR code 110. According to the predetermined process or method, the server 104 may decode, decompress and/or transcribe the content held within the QR code 110, and perform processing based on the type of QR code to extract relevant data from the QR code 110 that is necessary for the next step, which is to validate the authenticity of the extracted data.

If the type of QR code 110 cannot be identified to determine the process specific to the QR code 110 to be used to extract data from the QR code 110, a parallel brute-forcing approach may be adopted. Specifically, the server 104 may proceed with one or more attempts to extract data contained in the QR code 110 by using a plurality of predetermined processes (known or made available to the server 104 beforehand) so as to identify one of the plurality of predetermined processes that is able to extract the data contained in the QR code 110. The server 104 checks whether the extracted data is as desired or makes sense before determining the process to be used.
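As a rough sketch of this fallback, the snippet below tries each known extraction process in turn and keeps the first one whose output passes a sanity check. The extractor names, the EXTRACTORS table and the looks_sensible check (which assumes a sensible result is a dictionary with an "issuer" key) are hypothetical. For simplicity the attempts run sequentially here, whereas the text above describes a parallel approach.

```python
import base64
import json
from typing import Callable, Dict, Optional, Tuple


def extract_plain_json(payload: str) -> dict:
    """Hypothetical process 1: the payload is JSON text."""
    return json.loads(payload)


def extract_base64_json(payload: str) -> dict:
    """Hypothetical process 2: the payload is base64-encoded JSON."""
    return json.loads(base64.b64decode(payload, validate=True))


# Predetermined processes known to the server beforehand (illustrative).
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "plain_json": extract_plain_json,
    "base64_json": extract_base64_json,
}


def looks_sensible(data: object) -> bool:
    """Assumed sanity check: extracted data must be a dict naming an issuer."""
    return isinstance(data, dict) and "issuer" in data


def brute_force_extract(payload: str) -> Optional[Tuple[str, dict]]:
    """Try every process; return (process name, data) for the first that works."""
    for name, extractor in EXTRACTORS.items():
        try:
            data = extractor(payload)
        except ValueError:
            continue  # this process cannot read the payload
        if looks_sensible(data):
            return name, data
    return None
```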

Once the data is extracted using the identified process that is specific to extract data from the QR code 110, the server 104 performs validation of the authenticity of the extracted data. In this validation step, the server 104 may perform cryptographic checks to ensure that the QR code 110 is intact and authentic. These cryptographic checks may involve one or more of the following:

A) Authenticating an embedded signature (e.g. ECDSA) within the QR code 110 using a publicly published or pre-supplied public key made available to the server 104, thereby ascertaining that the QR code 110 has been signed by the given trusted entity. ECDSA (Elliptic Curve Digital Signature Algorithm) is a cryptographically secure digital signature scheme based on elliptic-curve cryptography (ECC);

B) Authenticating a signature or hash on a public blockchain, matching data of the signature or hash against data of the signature or hash associated with a public address of the blockchain accessible to the server 104, and subsequently matching domain data associated with the public address against data of a known trusted domain, thereby ascertaining that the QR code 110 was produced by a trusted entity that owns the domain name of the known trusted domain; and/or

C) Authenticating Transport Layer Security (TLS) certificates to ensure that the data described in the QR code originates from a trusted domain.

In the present disclosure, the hash mentioned above is a result of hashing, which refers to a concept of taking an arbitrary amount of input data, applying a predetermined algorithm to it, and generating a fixed-size output data called the hash (e.g. MD5 hashes). In the present disclosure, the input can be any number of bits that could represent a machine readable medium such as a document file, a QR code etc.

For example, an issuer of QR codes containing hashes generated by the issuer can be authenticated by matching these hashes with hashes obtained via other means. If the hashes obtained via other means do not match those contained in the QR codes, the authentication of the identity of the issuer will be unsuccessful. In the case that a blockchain is used in an issuer authentication process, a match between the hash obtained from the QR code and a hash associated with the issuer that is obtained from the blockchain can indicate successful authentication of the identity of the issuer.
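The hash comparison described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and SHA-256 is substituted for the MD5 example mentioned earlier because it is the more common choice today. The reference hash stands in for a hash obtained via other means, for instance from a blockchain record.

```python
import hashlib
import hmac


def sha256_hex(document: bytes) -> str:
    """Hash an arbitrary amount of input into a fixed-size hexadecimal digest."""
    return hashlib.sha256(document).hexdigest()


def hash_matches(hash_from_code: str, reference_hash: str) -> bool:
    """Compare the hash read from the code with the independently obtained one.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(hash_from_code.lower(), reference_hash.lower())
```

A match indicates that the content is the same as the content the issuer hashed; any change to the input produces a different digest, so the comparison fails.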

With regard to the “embedded signature (e.g. ECDSA)” or “signature” mentioned above, each of them refers to a digital signature, which, like a handwritten signature, provides a way to prove that a party is who it indicates it is, except that cryptography is used, which makes it more secure than a handwritten signature that can be easily forged. A digital signature is a way to prove that a machine readable medium originates from an authorised issuer and no one else.

In one example, an asymmetric encryption technique may be used and applied to the system 100 of FIG. 1. An issuer of a QR code may generate a key pair, which is a public key and a private key, using a predetermined algorithm. The keys are related such that data encrypted with the public key can only be decrypted using the corresponding private key. Similarly, data encrypted with the private key can only be decrypted using the corresponding public key. The public key can be made available to the server 104. The private key is meant to be kept secret and the issuer may sign a digital signature contained in the QR code using the private key. The server 104 may obtain the digital signature in the QR code using the public key.

The server 104 may check the digital signature in the QR code 110 by matching it against a digital signature of the issuer pre-supplied to the server 104, or a digital signature of the issuer obtained via other means (e.g. request from another server, obtained from a blockchain etc.). A match will successfully authenticate the issuer, i.e. indicate that the issuer is an authorised issuer of the QR code 110.

In one example, the issuer of the QR code 110 may digitally sign, using its private key, every transaction executed on a blockchain. Hence, with the public key, the server 104 can obtain from the blockchain the digital signature of the issuer. The server 104 may use a public blockchain address in the process of obtaining the digital signature of the issuer from the blockchain. Thereafter, the server 104 can match the digital signature in the QR code 110 with the digital signature of the issuer obtained from the blockchain. If there is a match, it is an indication that the identity of the issuer is authentic.

In addition to the above-mentioned signature and/or hash checks, further checks on trusted domains may be conducted. For example, domain data associated with a public blockchain address that is obtained after processing data extracted from a QR code may be matched against data of a known trusted domain pre-supplied to the server 104 to ascertain that the QR code 110 was produced by a trusted entity that owns the domain name of the known trusted domain. In another example, Transport Layer Security (TLS) certificates obtained after processing data extracted from a QR code may be authenticated to ensure that the data in the QR code originates from a trusted domain.

In addition, or as an alternative to the A), B) and/or C) checks described above, the server 104 of the present example will conduct a proof of provenance.

In one example of such proof of provenance, the server 104 checks whether a digital signature (which may be a public digital signature) held or described by the QR code 110 is associated with particular data fields extracted from the QR code 110. For example, the digital signature obtained from the QR code 110 indicates the identity of an issuer of the QR code 110, and a particular data field extracted from the QR code’s content indicates an entity that may or may not be associated with the issuer and requires checking to confirm. In other words, the QR code 110 provides data of an issuer of the QR code 110 and data of an entity that may be associated with the issuer. In this case, upon determining the identity of the issuer from the digital certificate, and the details of the entity that may be associated with the issuer, the server 104 checks these data against entity relationship data in the database 106. From the entity relationship check, the server 104 can determine whether the entity in the QR code’s content describes a particular entity that has contracted with (and thus is authorised by) the QR code issuer. The presence of such contractual relationship in the entity relationship data will be verified by the server 104 to be an existing and known contract. This would confirm that the QR code 110 is indeed authentic. In this manner, fraudulent data can be prevented from being supplied through unauthorised QR codes. The entity relationship data may be organised in a graph-based or hierarchical manner that allows complex entity contractual relationships to be described and verified in an automated manner by the server 104.

The checking of the entity relationship data may be performed as follows. FIG. 2 shows an example of what the entity relationship data may represent. For instance, an issuer 1 of a first type of QR code may authorise (or have a contract with) another entity 1 to distribute or create the QR code, and in turn the entity 1 may also authorise (or have a contract with) another entity 2 to distribute or create the QR code. Likewise, another issuer 2 of a second type of QR code may authorise (or have a contract with) the same entity 1 to distribute the QR code of the second type, and the entity 1 also authorises (or has a contract with) the entity 2 to distribute the QR code of the second type. In a specific example, the content of a QR code may contain the digital signature of an issuer and details of an entity. This QR code can be authenticated by a first step of matching the issuer obtained from the QR code with the issuers 1 and 2 listed in the entity relationship data. If there is a match, for example, the issuer obtained from the QR code matches issuer 2, the process moves on to check whether the entity obtained from the QR code matches entity 1 or entity 2, which are interlinked with issuer 2. The matching step is performed first for one hop from the matched issuer (i.e. the first hop from the matched issuer 2 is entity 1) before moving on to check two hops from the matched issuer (i.e. two hops from the matched issuer 2 is entity 2). If the checking of more hops from the matched issuer is necessary, the process will move on to check further hops. In the present example, a match of the entity from the QR code with entity 1 or 2 in the entity relationship data of FIG. 2 will indicate successful authentication.
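The hop-by-hop check described above amounts to a breadth-first search over the entity relationship data. The sketch below is one possible reading, not the disclosed implementation: the adjacency list is a hypothetical encoding of the relationships described for FIG. 2, and the names RELATIONSHIPS, ISSUERS, hops_to_entity and the max_hops limit are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical encoding of FIG. 2: each party maps to the parties it authorises.
RELATIONSHIPS: Dict[str, List[str]] = {
    "issuer_1": ["entity_1"],
    "issuer_2": ["entity_1"],
    "entity_1": ["entity_2"],
    "entity_2": [],
}
ISSUERS: Set[str] = {"issuer_1", "issuer_2"}


def hops_to_entity(issuer: str, entity: str, max_hops: int = 3) -> Optional[int]:
    """Return the number of hops from a known issuer to the entity, else None."""
    # First step: the issuer from the code must match a listed issuer.
    if issuer not in ISSUERS:
        return None
    seen = {issuer}
    queue = deque([(issuer, 0)])
    # Breadth-first search checks one hop first, then two hops, and so on.
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for authorised in RELATIONSHIPS.get(node, []):
            if authorised in seen:
                continue
            if authorised == entity:
                return hops + 1
            seen.add(authorised)
            queue.append((authorised, hops + 1))
    return None
```

With this data, entity_1 is found one hop from issuer_2 and entity_2 two hops away; an unknown issuer or an unlinked entity returns None, which corresponds to failed authentication.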

Referring back to FIG. 1, in another example, the digital signature in the QR code 110 may be directed to only a single entity to be verified; for instance, the single entity may be the issuer or an entity associated with the issuer. In this case, the server 104 simply checks whether the single entity is listed as an entity in the entity relationship data. If yes, the QR code 110 is authentic.

In yet another example, the server 104 may check whether an entity obtained from the QR code’s contents, and not obtained from a digital signature, matches an entity listed in the entity relationship data. If yes, then the QR code 110 is authentic.

After authentication of the QR code 110, the server 104 helps to map the extracted contents of the QR code 110 to a predefined framework that robustly assigns data fields in the contents of the QR code 110 to a pre-determined data field structure in accordance with predetermined requirements. For example, data of the extracted contents of the QR code 110 are processed by a framework in the form of a software application, which performs arrangement, categorisation, organisation and/or sorting of the data to assign data fields in the extracted contents of the QR code 110 to a predetermined data field structure according to user requirements. Such data field structure may be accessed and used, for instance, for the presentation of the arranged, categorised, organised and/or sorted data of the QR code 110 in a desired format, or to aid in further processing. The data field structure may, for example, be in a desirable format that is optimal for data presentation on a display (e.g. monitor or touchscreen), optimal for inputting to a rules engine to determine a result (or to calculate a result) from the inputted data field structure, optimal for input to a neural network system to provide accurate prediction, or optimal for other types of applications or purposes. For example, data in the QR code 110 may be originally arranged in a manner that is not organised for further steps of processing such as data matching, and may also contain irrelevant data that is not required for the further steps of processing. Hence, mapping the extracted data from the QR code 110 to the pre-determined data field structure will help to organise the data in the QR code 110 to aid the next processing steps.

With regard to the term “framework” above, it can refer to a template indicating what data fields in the QR code 110 should be extracted and where the data fields are to be assigned in the predetermined data field structure. The framework may also be a software component or application to facilitate the assignment of data fields in the construction of the predetermined data field structure.
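A template-style framework of this kind can be sketched as a table that names each target field and says where its value sits in the extracted data. Everything in the snippet is hypothetical, including the field names, the dotted-path notation and the FRAMEWORK table. It only illustrates how irrelevant data is dropped and the remaining fields are assigned to a fixed structure.

```python
from typing import Any, Dict

# Hypothetical framework: target field -> dotted path inside the extracted data.
FRAMEWORK: Dict[str, str] = {
    "subject_name": "patient.name",
    "test_code": "result.code",
    "issued_on": "meta.issued",
}


def get_path(data: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if it is absent."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_to_structure(extracted: Dict[str, Any],
                     framework: Dict[str, str] = FRAMEWORK) -> Dict[str, Any]:
    """Assign fields of the extracted data to the predetermined structure."""
    return {field: get_path(extracted, path) for field, path in framework.items()}
```

Fields of the extracted data that the framework does not name are simply left out of the result, and fields that are missing from the extracted data appear as None so that later checks can detect them.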

In one example, the QR code 110 may comprise data containing a user credential that may, for example, be in the form of a W3C credential, which is a verifiable credential comprising a set of tamper-evident claims and metadata that cryptographically proves the party issuing the credential. The W3C credential may be encoded or embedded in the QR code 110 and requires decoding to be obtained. The W3C credential can be said to be an example of the data field structure described above.

During the generation of the QR code 110, the W3C credential may be encoded by the following steps:

1. Break the schema of the W3C credential into two groups of information: i) a template containing static information such as context, types and non-varying information (this template is an example of the framework described above); and ii) varying data fields, e.g. signature, name, phone, etc.

2. Encapsulate only the varying data fields within the content of the QR code 110. That is, the QR code 110 only contains the varying fields. The template can be distributed publicly as it does not contain personal/private information.

When it is required, the complete W3C credential can be constructed by combining the template, which can be obtained publicly, with the data of the varying fields extracted from the QR code.
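The split-and-recombine idea can be sketched as below. The helper names and the choice of which top-level keys count as varying are assumptions made for illustration; the credential shown is a toy example rather than a complete, valid W3C credential.

```python
import copy
from typing import Any, Dict, Set, Tuple


def split_credential(credential: Dict[str, Any],
                     varying_keys: Set[str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a credential into a public template and the varying fields."""
    template = {k: v for k, v in credential.items() if k not in varying_keys}
    varying = {k: v for k, v in credential.items() if k in varying_keys}
    return copy.deepcopy(template), copy.deepcopy(varying)


def rebuild_credential(template: Dict[str, Any],
                       varying: Dict[str, Any]) -> Dict[str, Any]:
    """Combine the public template with the fields extracted from the code."""
    credential = copy.deepcopy(template)
    credential.update(copy.deepcopy(varying))
    return credential


# Toy credential; the field values are made up for illustration.
full = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "credentialSubject": {"name": "A. Tan", "phone": "000"},
    "proof": {"signature": "abc"},
}
template, varying = split_credential(full, {"credentialSubject", "proof"})
restored = rebuild_credential(template, varying)
```

Only the contents of varying would need to be carried in the code, which keeps the payload small; restored is equal to the original credential.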

The same concept as that described above can be applied to other kinds of standard credential formats similar to W3C credential. In other examples, the user credentials may be in standard credential formats such as EU DCC (European Union Digital Covid Certificate), DIVOC (Digital Infrastructure for Vaccination Open Credentialing), SHC (Smart Health Cards), ICAO VDS (International Civil Aviation Organization Visible Digital Seal), and the like. The relevant software/hardware may be configured to handle the user credentials in one or more of these formats (including W3C), which may include detection, reading, writing, encoding and/or decoding of the user credentials.

After mapping the data extracted from the QR code 110 to the pre-determined data field structure, the data field structure can be subject to automatic matching by an internal or external rules engine, which determines a result after checking against predetermined validation requirements. In one example, the contents of the QR code 110 may be a medical report indicating that a person is fit for travel or fit for attending a particular event, and the validation requirements are used to determine from the QR code 110 whether the medical report is genuine or valid. For example, the medical report may indicate that a person is free from a particular disease. In a specific example, the disease may be an infectious disease such as Covid-19 and the user device 102 is used by an immigration officer of a territory to check whether a person travelling out of or to the territory has produced a genuine or valid medical report.

Examples of validation requirements in the Covid-19 example include checking the authenticity and/or validity of the negative Covid-19 test results shown in the medical report and/or vaccination credentials indicating that the person has been vaccinated against Covid-19.

In the case of Covid-19 test results, the validation requirements may include a check on one or more LOINC codes in the medical report to see if each is listed in an approved list of LOINC codes. LOINC (Logical Observation Identifiers Names and Codes) is a clinical terminology that is important for laboratory test orders and results, and is one of a suite of designated standards for use by Government systems for the electronic exchange of clinical health information. The validation requirements may also include a check on one or more FHIR test result codes in the medical report to see if each is listed in an approved list of FHIR test result codes. FHIR (Fast Healthcare Interoperability Resources) is a standard describing data formats and elements and an application programming interface for exchanging electronic health records. If a LOINC or FHIR code in the medical report is not approved, the medical report will be deemed invalid.

Furthermore, the validation requirements may include a check on date of issuance of the medical report against a predetermined timeline acceptable to a territory. For example, a country may specify that a Covid-19 test result is only valid for a fixed time period and if the date of issuance of the medical report pre-dates the fixed time period, the medical report will be deemed as invalid.

In addition, the validation requirements may include a check on the territory in which the person conducted the Covid-19 test to determine whether the person has contracted the disease. If the territory in which the person conducted the Covid-19 test is not listed as an acceptable territory, the medical report will be deemed invalid.

The validation requirements may also include a check on whether the laboratory/test centre/clinic in a territory in which the person conducted the Covid-19 test is authorised or acceptable. If the laboratory/test centre/clinic is not listed as authorised or acceptable, the medical report will be deemed invalid.

In the case of Covid-19 vaccination indication in the medical report, the validation requirements may include similar checks conducted for the Covid-19 test checks described above, such as checking of any relevant codes/data against an approved list of codes/data, checking of date of vaccination, checking whether the territory in which the vaccination is administered is acceptable, and checking whether a laboratory/test centre/clinic in a territory in which the vaccination is administered is authorised or acceptable. Additionally, there may be other checks pertaining to the vaccination like number of doses taken for the vaccine. If any check fails, the vaccination may be deemed as invalid and/or require further verification.

All the above checks of the validation requirements may be configured as rules to be checked by the internal or external rules engine created to automate the checking process. Internal rules engine refers to a rules engine that checks against rules created for private consumption by a user-specified system or systems. External rules engine refers to a rules engine that checks against rules created for public consumption by any system.

The steps starting from inputting the QR code 110 to extracting data in the QR code 110 and determining a result from the extracted data may be performed by a plug-in through the use of a plug-in-based system implemented at the user device 102. Such a plug-in may refer to a software component that adds a specific feature to an existing computer program. This plug-in-based system can support multiple types of QR codes and multiple processes, wherein each process is specific to a type of QR code and extracts data from that type of QR code, and each process handling a type of QR code may be a plug-in. In this manner, a multitude of different types of QR codes can be consumed by the system. This differs from existing QR code systems, each of which is built to support only one type of QR code and not a plurality of QR codes.
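As an illustration of validation requirements configured as rules, the sketch below runs a small set of named rules over a mapped report and lists the rules that fail. The rule names, the approved code list, the accepted territory list, the three-day validity window and the report field names are all placeholder assumptions rather than values from the disclosure.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# Placeholder configuration; real lists would come from the relevant authority.
APPROVED_CODES = {"CODE-A", "CODE-B"}
ACCEPTED_TERRITORIES = {"SG"}
MAX_AGE = timedelta(days=3)

Rule = Callable[[dict, date], bool]


def code_approved(report: dict, today: date) -> bool:
    """The test code must appear in the approved list."""
    return report.get("test_code") in APPROVED_CODES


def issued_recently(report: dict, today: date) -> bool:
    """The report must have been issued within the accepted time period."""
    issued = report.get("issued_on")
    return isinstance(issued, date) and timedelta(0) <= today - issued <= MAX_AGE


def territory_accepted(report: dict, today: date) -> bool:
    """The test must have been conducted in an accepted territory."""
    return report.get("territory") in ACCEPTED_TERRITORIES


RULES: Dict[str, Rule] = {
    "code_approved": code_approved,
    "issued_recently": issued_recently,
    "territory_accepted": territory_accepted,
}


def evaluate(report: dict, today: date) -> List[str]:
    """Return the names of failed rules; an empty list means the report passes."""
    return [name for name, rule in RULES.items() if not rule(report, today)]
```

Keeping each requirement as a separate named rule lets new checks, such as a dose count for vaccination records, be added to the table without changing the evaluation loop.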

In summary, in a specific use case of an example of the present disclosure, there may be provided a system for checking validity of a digital certificate extracted from a machine readable medium that contains a medical report and details of a Laboratory that may be the issuer or distributor of the machine readable medium. The validation involves one or more of the cryptographic checks A) to C) described above.

Thereafter, the relationship of the Laboratory with issuers of medical reports is checked against a Laboratory Registry, which comprises entity relationship data of laboratories and issuers of medical reports.

Upon successful verification of the relationship between the Laboratory and an issuer in the Laboratory Registry, extracted data in the machine readable medium is checked against a Rules Engine to determine whether data in the machine readable medium passes predetermined rules used to determine whether the medical report is valid or not. If the result determined by the Rules Engine is a pass, the medical report is deemed to be valid.

FIG. 3 shows an example of a system architecture of the Covid-19 example described above. The components in FIG. 3 with similar features as the components in FIG. 1 are given the same reference numerals as given in FIG. 1. Specifically, there are present digital pre-departure tests (PDTs) 110 of travellers seeking to depart a territory. Such PDTs may be in the form of the QR code 110 as described earlier. The PDT can comprise, for example, Covid-19 test results, vaccination certificates and/or other health certificates. These PDTs 110 may be issued by issuers 304, which can be clinics and/or healthcare institutions 302. The issuers 304 may also be other organizations affiliated to or associated with the clinics and/or healthcare institutions 302, for example, a laboratory that is working with the clinics and/or healthcare institutions 302. Hence, the issuer has to be checked against the entity relationship data of laboratories and issuers of the PDTs to ensure that the PDTs are issued by official sources.

In this example, there is provided a safe travel (ST) unifier system 300 comprising a unifier web application 102a. This web application 102a can be installed on the user device 102 (e.g. a device of a traveller) of FIG. 1 described earlier. There is a universal verifier 104a present in the system 300. This universal verifier 104a can be software and is a part of the web application 102a or separate software from the web application 102a. The web application 102a can provide a user interface and interact with the universal verifier 104a to verify PDTs 110. The universal verifier 104a can also be the server 104 of FIG. 1. Specifically, the universal verifier 104a can comprise software for traveller credential verification 306 and the software is configured to apply preconfigured rules 308 for the traveller credential verification. A safe travel unifier application programming interface (ST Unifier API) 310 may be created to facilitate interaction between the web application 102a and the universal verifier 104a, and traveller verification initiated by a user using the web application 102a to verify one or more digital PDTs 110 inputted to the web application 102a. The one or more PDTs 110 may be inputted to the web application 102a via image capturing of, for instance, an optical machine readable medium like a barcode or QR code, and then decoding the captured image of the optical machine readable medium to access and/or extract information about the one or more PDTs 110. The accessed and/or extracted information is then verified using the universal verifier 104a based on the rules 308. During the verification process, a database may be used, such as a Global Health Care (HCI) Registry 106a containing entity relationship data of the issuers 304, and/or the clinics and/or healthcare institutions 302. This Global Health Care (HCI) Registry 106a is similar to the database 106 described earlier with reference to FIG. 1.

A similar verification process as that described with reference to FIG. 1 may be performed for travellers in the present example. In one example, the hardware of the system 300 may all be present on a single user device or separated like in the example of FIG. 1, wherein the server 104 is remotely accessible by the user device 102. The Registry 106a may be local data in the user device or an external database accessible remotely by the user device.

Details of examples of the user device 102 and/or the server 104 in FIG. 1 are as follows. Each of the user device 102 and the server 104 may have one or more of the following components in electronic communication via a bus:

1. optionally, a display for the server 104, whereas the user device 102 should have such a display;

2. non-volatile memory;

3. random access memory ("RAM");

4. N processing components (“one or more processors”);

5. a transceiver component that includes N transceivers;

6. user controls i.e. user input devices;

7. optionally, image capturing components;

8. optionally, audio signal capturing components;

9. optionally, audio speakers; and

10. optionally, Input/Output interfaces for connecting to the user input devices (such as mouse, joystick, keyboard, sensors for detecting user gestures, and the like), the audio speakers, display, image capturing components and/or audio signal capturing components.

The display generally operates to provide a presentation of graphical content (e.g. graphical contents of the mobile device software, the one or more links, announcements and/or advertisements herein described) to a user, and may be realized by any of a variety of displays (e.g., CRT, LCD, HDMI, micro-projector and OLED displays). And in general, the non-volatile memory functions to store (e.g., persistently store) data and executable code, instructions, program or software including code, instructions, program or software that is associated with the functional components of a browser component and applications. In some examples, the code, instructions, program or software may include bootloader code, modem software, operating system code, file system code, and code to facilitate the operation of the apparatus as well as other components well known to those of ordinary skill in the art that are not depicted for simplicity.

In many implementations, the non-volatile memory is realized by flash memory (e.g., NAND or NOR memory), but it is certainly contemplated that other memory types may be utilized as well. Although it may be possible to execute the code from the non-volatile memory, the executable code in the non-volatile memory is typically loaded into RAM and executed by one or more of the N processing components.

The N processing components (or “one or more processors”) in connection with RAM generally operate to execute the instructions stored in non-volatile memory to effectuate the functional components. As one skilled in the art (including ordinarily skilled) will appreciate, the N processing components may include a video processor, modem processor, DSP, graphics processing unit (GPU), and other processing components. In some implementations, the processing components are configured to determine a type of software activated on the apparatus.

The transceiver component may include N transceiver chains, which may be used for communicating with external devices via wireless networks. Each of the N transceiver chains may represent a transceiver associated with a particular communication scheme. For example, each transceiver may correspond to protocols that are specific to local area networks, cellular networks (e.g., a CDMA network, a GPRS network, a UMTS network, a 5G network, etc.), and other types of communication networks. In some implementations, the communication of the transceiver component with communication networks enables a location of the apparatus to be determined.

The image capturing components and the audio signal capturing components that are optionally available can also be utilised to input user controls, as defined control settings.

The processing of the executable code, instructions, program or software may be performed in parallel and/or sequentially. The executable code, instructions, program or software may be stored on any machine or computer readable medium that may be non-transitory in nature. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, QR code, barcode, or other storage devices suitable for interfacing with a computer or mobile device. The machine or computer readable medium may also include a hard-wired medium, or wireless medium. The executable code, instructions, program or software when loaded and executed effectively results in the apparatus that implements steps of methods in examples herein described.

In communications and computing a machine-readable medium, or computer-readable medium, is a medium capable of storing data in a format readable by a mechanical device (rather than human readable). Examples of machine-readable media include magnetic media such as magnetic disks, cards, tapes, and drums, punched cards and paper tapes, optical discs, QR codes, barcodes and magnetic ink characters. Common machine-readable technologies include magnetic recording, processing waveforms, and barcodes. Optical character recognition (OCR) can be used to enable machines to read information available to humans.

Examples of the present disclosure can provide an advantage to enable different types of digital documents (e.g. QR code, text document, or other types of documents) to be accessed securely in realtime or high-throughput environments where resources and time to process a document are limited.

Further advantages of the examples in the present disclosure may be described as follows.

A document validation system that can consume existing data presentation formats (e.g. a QR code can be one such data presentation format and a text document can be another format) to carry out an automated validation process for each data presentation format can be provided. Such a system may comprise the user device 102, the server 104 and the database 106 of FIG. 1. The document validation process of the system is fully automated, including the following aspects:

- Authenticity checks against claims of data provenance provided in an inputted document to a user device.

- Relational data validity checks at a server. For example, to check the existence of a relationship between data of an issuer and data of an entity associated with the issuer data, wherein the data of the issuer and the data of the entity are extracted from the inputted document after determining a process specific to the inputted document to extract the data.

- Validation of extracted data against validation requirements to, for example, produce a status tied to the contents of the document. For example, a health status of a person.

No action is required on the part of the document issuer (for example, a laboratory issuing the medical report) to enable the validation system to read and consume digital documents. No integration action is required between two systems, i.e. the system of a document issuer and the system used to validate the document. No direct data exchange is required between the document issuer’s system and the validation system used to validate the document.

User consent is captured by the presentation of the document at a point of validation, as no one else has the document. This means that a user presenting a document (such as a medical report in the form of a QR code or otherwise) for validation by the system indicates that user consent is given for the validation system to access private data that may be contained in the document. As such, no additional system feature is required to store user consent data for checking whether a user has authorised access to private data in the document.

Another advantage is that the validation system is extensible to support multiple use cases (e.g. can be used for validation for many kinds of applications) and digital presentation methods (e.g. support different types of QR codes and/or other document formats/types).

It should be noted that the method of data ingestion (or input) described in the examples of the present disclosure is not limited to ingesting just QR codes. Other types of machine readable mediums such as text files, barcodes, image files, or other types of documents can also apply.

In one example, the data that could be ingested can be an actual document with all necessary data presented in the document as human readable text. Specifically, a medical report (document) in human readable text instead of a QR code may be presented for scanning to capture an image of the report. Next, the system may identify specific headers, tags, logos, and/or other visual clues to determine a process specific to the medical report to extract text from the report. Thereafter, the image may be subjected to, for example, Optical Character Recognition to extract the required text from the document. Once the text is extracted, authenticity of an entity stated in the document will be checked against entity relationship data. Upon successful authentication, the system extracts data in the document and maps it into a framework with a predetermined data field structure. Such data field structure may then be inputted to a rules engine to determine, for instance, a medical status and the determined medical status can be presented on a display.

Examples of the present disclosure may have the following features. A method for accessing data in a plurality of machine readable mediums (e.g. 110 in Figure 1), the method comprising: receiving a machine readable medium as input (e.g. 110 in Figure 1); identifying a feature of the machine readable medium to determine a process specific to the machine readable medium to extract data contained in the machine readable medium; extracting data contained in the machine readable medium according to the determined process if the feature is identified, wherein the data comprises one or more entity associated with the machine readable medium; verifying authenticity of the machine readable medium by matching the one or more entity with entity relationship data stored in a database (e.g. 106 in Figure 1), wherein successful authentication occurs when the one or more entity in the extracted data matches with data recorded in the entity relationship data; and upon successful authentication, mapping the extracted data to a predefined framework that assigns data fields in the extracted data to a pre-determined data field structure, wherein the entity relationship data comprises data indicative of relationship between more than one entities and more than one issuers of the machine readable medium.

The one or more entity may comprise an issuer of the inputted machine readable medium and an entity associated with the issuer, and successful authentication occurs when data indicative of relationship between the issuer of the inputted machine readable medium and entity associated with the issuer is recorded in the entity relationship data.
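The relationship check described above can be illustrated with a minimal sketch. The `EntityRegistry` class, its in-memory dictionary, the field names `issuer` and `entity`, and the example names are all assumptions made for illustration; the disclosure does not prescribe a schema for the entity relationship data.

```python
from dataclasses import dataclass, field


def _norm(name: str) -> str:
    """Normalise names so spacing and case differences do not break matching."""
    return " ".join(name.split()).casefold()


@dataclass
class EntityRegistry:
    """In-memory stand-in for the entity relationship data (hypothetical schema)."""
    # Maps a normalised issuer name to the entities it is recorded as related to.
    relationships: dict[str, set[str]] = field(default_factory=dict)

    def add(self, issuer: str, entity: str) -> None:
        self.relationships.setdefault(_norm(issuer), set()).add(_norm(entity))

    def is_related(self, issuer: str, entity: str) -> bool:
        return _norm(entity) in self.relationships.get(_norm(issuer), set())


def authenticate(extracted: dict, registry: EntityRegistry) -> bool:
    """Succeed only if the issuer/entity pair is recorded in the registry."""
    issuer = extracted.get("issuer")
    entity = extracted.get("entity")
    if not issuer or not entity:
        return False  # Missing data can never authenticate.
    return registry.is_related(issuer, entity)


registry = EntityRegistry()
registry.add("Example Clinic", "Example Laboratory")

ok = authenticate({"issuer": "example clinic", "entity": "Example  Laboratory"}, registry)
bad = authenticate({"issuer": "Example Clinic", "entity": "Unknown Lab"}, registry)
```

In this sketch `ok` is true because the pair is recorded, while `bad` is false because no relationship between the issuer and the named laboratory exists in the registry.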

The pre-determined data field structure may be subjected to automatic matching by a rules engine, which determines a result after checking against predetermined validation requirements.
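As a rough illustration of such a rules engine, the sketch below evaluates a mapped record against a list of named rules. The field names (`test_result`, `test_type`, `sample_time`), the accepted test types and the 48-hour window are hypothetical validation requirements chosen for the example, not values taken from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]


def taken_within(hours: int, now: datetime) -> Callable[[dict], bool]:
    """Build a check that the sample was taken no more than `hours` before `now`."""
    def check(record: dict) -> bool:
        taken = record.get("sample_time")
        if not isinstance(taken, datetime):
            return False
        return timedelta(0) <= now - taken <= timedelta(hours=hours)
    return check


def evaluate(record: dict, rules: list[Rule]) -> dict:
    """Apply every rule and report the overall status plus the rules that failed."""
    failed = [rule.name for rule in rules if not rule.check(record)]
    return {"status": "fail" if failed else "pass", "failed_rules": failed}


now = datetime(2022, 3, 9, 12, 0)
rules = [
    Rule("result_is_negative", lambda r: r.get("test_result") == "negative"),
    Rule("accepted_test_type", lambda r: r.get("test_type") in {"PCR", "ART"}),
    Rule("taken_within_48h", taken_within(48, now)),
]

record = {
    "test_result": "negative",
    "test_type": "PCR",
    "sample_time": datetime(2022, 3, 8, 18, 0),
}
outcome = evaluate(record, rules)
```

Keeping each requirement as a separate named rule lets the result state which requirement was not met, which may be useful when the determined status is presented to a user.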

The feature of the machine readable medium to identify may indicate type and/or format of the machine readable medium, and the process to extract data from the machine readable medium may be determined according to the type and/or format of the machine readable medium.

The machine readable medium may be a machine readable optical label containing information that is coded.

The machine readable medium may be a document containing data in the form of text.

The machine readable optical label may be a Quick Response (QR) code.

The inputted machine readable medium may contain data of a medical report of a subject indicative of whether the subject has contracted a disease.

The feature may be identified through recognition of a predetermined prefix or header in the data of the machine readable medium.

The feature may be identified through recognition of predetermined tag or logo in the data of the machine readable medium.

The feature may be identified through recognition of predetermined uniform resource locator in the data of the machine readable medium.

The feature may be identified through recognition of predetermined character or marking in the data of the machine readable medium.

The feature may be identified through recognition of predetermined data structure in the data of the machine readable medium.
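The feature recognition options listed above (prefix, uniform resource locator, data structure) can be sketched as a simple dispatcher. The prefixes `HC1:` and `shc:/` are those used publicly by EU Digital COVID Certificate and SMART Health Card QR payloads respectively; the process names, the host `verify.example.org` and the required JSON keys are assumptions introduced for this example.

```python
import json
import re
from typing import Optional
from urllib.parse import urlparse

# Predetermined prefixes mapped to (hypothetical) extraction process names.
PREFIX_PROCESSES = {"HC1:": "eu_dcc", "shc:/": "smart_health_card"}
# Predetermined URL hosts mapped to process names (hypothetical host).
URL_HOST_PROCESSES = {"verify.example.org": "hosted_certificate"}
# Keys whose presence marks a recognised JSON data structure (assumed).
REQUIRED_JSON_KEYS = {"issuer", "credentialSubject"}


def identify_process(payload: str) -> Optional[str]:
    """Return the name of the extraction process, or None if no feature is found."""
    text = payload.strip()

    # 1. Predetermined prefix or header.
    for prefix, process in PREFIX_PROCESSES.items():
        if text.startswith(prefix):
            return process

    # 2. Predetermined uniform resource locator.
    if re.match(r"https?://", text, flags=re.IGNORECASE):
        host = urlparse(text).hostname or ""
        return URL_HOST_PROCESSES.get(host)

    # 3. Predetermined data structure.
    try:
        doc = json.loads(text)
    except ValueError:
        return None
    if isinstance(doc, dict) and REQUIRED_JSON_KEYS.issubset(doc):
        return "json_credential"
    return None
```

A return value of `None` corresponds to the case in which the feature cannot be identified and a fallback strategy is needed.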

If the feature cannot be identified to determine the process specific to the machine readable medium, the method may comprise extracting data contained in the machine readable medium by using more than one predetermined processes so as to identify one of the more than one predetermined processes that is able to extract the data.

The determined process specific to the machine readable medium to extract data contained in the machine readable medium may comprise: decoding, decompressing and/or transcription of the data of the machine readable medium.
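One way to picture the fallback behaviour is to try each predetermined process in turn until one of them yields usable data. In the sketch below the three extractors (plain JSON, Base64-encoded JSON, and Base64-encoded zlib-compressed JSON) are assumed example processes combining decoding and decompressing steps; real formats may use other encodings.

```python
import base64
import json
import zlib
from typing import Callable, Optional


def plain_json(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))


def base64_json(raw: bytes) -> dict:
    decoded = base64.b64decode(raw, validate=True)
    return json.loads(decoded.decode("utf-8"))


def base64_zlib_json(raw: bytes) -> dict:
    decoded = base64.b64decode(raw, validate=True)
    return json.loads(zlib.decompress(decoded).decode("utf-8"))


# Predetermined processes, tried in order (hypothetical set).
EXTRACTORS: list[tuple[str, Callable[[bytes], dict]]] = [
    ("plain_json", plain_json),
    ("base64_json", base64_json),
    ("base64_zlib_json", base64_zlib_json),
]


def extract_with_fallback(raw: bytes) -> Optional[tuple[str, dict]]:
    """Return (process name, data) for the first process that can extract the data."""
    for name, extractor in EXTRACTORS:
        try:
            data = extractor(raw)
        except (ValueError, zlib.error):
            # Decoding, decompression or parsing failed; try the next process.
            continue
        if isinstance(data, dict):
            return name, data
    return None


payload = {"issuer": "Example Clinic"}
encoded = base64.b64encode(zlib.compress(json.dumps(payload).encode("utf-8")))
result = extract_with_fallback(encoded)
```

Here `result` names the process that succeeded, so the same process could be selected directly the next time a medium with the same characteristics is received.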

The method may comprise: authenticating a signature in the data of the machine readable medium against a signature associated with a publicly published or pre-supplied public key thereby ascertaining that the machine readable medium has been signed by a trusted entity.

The method may comprise: authenticating a signature or hash in the data of the machine readable medium on a blockchain by: matching the signature or hash against signature or hash associated with an address of the blockchain; and matching domain data associated with the public address against data of a known trusted domain, thereby ascertaining that the machine readable medium originates from a trusted entity that owns a domain name of the known trusted domain.
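The two matching steps can be sketched as follows. No real blockchain client is used: the dictionaries `LEDGER` and `ADDRESS_DOMAINS` are in-memory stand-ins for the on-chain record and the domain data associated with an address, and the address, domain names and choice of SHA-256 over canonical JSON are assumptions for illustration only.

```python
import hashlib
import json


def document_hash(payload: dict) -> str:
    """Hash a canonical JSON form so that key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


issued_document = {"issuer": "Example Clinic", "result": "negative"}

# Hypothetical stand-in for on-chain state: address -> hashes recorded there.
LEDGER = {"0xEXAMPLEADDRESS": {document_hash(issued_document)}}
# Hypothetical stand-in for domain data associated with each address.
ADDRESS_DOMAINS = {"0xEXAMPLEADDRESS": "clinic.example.org"}
# Known trusted domains (assumed).
TRUSTED_DOMAINS = {"clinic.example.org"}


def verify_on_ledger(payload: dict, address: str) -> bool:
    """Check the hash against the address, then the address against a trusted domain."""
    if document_hash(payload) not in LEDGER.get(address, set()):
        return False  # Hash was never recorded at this address.
    return ADDRESS_DOMAINS.get(address) in TRUSTED_DOMAINS


genuine = verify_on_ledger(issued_document, "0xEXAMPLEADDRESS")
tampered = verify_on_ledger(dict(issued_document, result="positive"), "0xEXAMPLEADDRESS")
```

Both conditions must hold: a matching hash recorded at an address whose domain is not trusted does not authenticate the medium in this sketch.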

The method may comprise: authenticating one or more Transport Layer Security (TLS) certificate in the data of the machine readable medium to ensure that the data in the machine readable medium originates from a trusted domain.

The data field structure may comprise a user credential and the user credential is constructed by combining a template containing non-varying data of the user credential with varying data fields of the user credential obtained from the extracted data of the machine readable medium.
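A minimal sketch of this template approach is given below. The `@context`, `type`, `issuer`, `issuanceDate` and `credentialSubject` properties follow the W3C Verifiable Credentials Data Model; the credential type `HealthTestCredential`, the subject fields and the shape of the `varying` input are hypothetical.

```python
import copy

# Non-varying data of the user credential (the second type is hypothetical).
CREDENTIAL_TEMPLATE = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential", "HealthTestCredential"],
    "credentialSubject": {"testType": "PCR"},
}


def build_credential(template: dict, varying: dict) -> dict:
    """Combine the fixed template with the varying fields from the extracted data."""
    credential = copy.deepcopy(template)  # Never mutate the shared template.
    credential["issuer"] = varying["issuer"]
    credential["issuanceDate"] = varying["issuance_date"]
    credential["credentialSubject"].update(varying["subject"])
    return credential


credential = build_credential(
    CREDENTIAL_TEMPLATE,
    {
        "issuer": "https://clinic.example.org",
        "issuance_date": "2022-03-09T08:00:00Z",
        "subject": {"name": "Jane Doe", "testResult": "negative"},
    },
)
```

Copying the template before filling it in keeps the non-varying part reusable for every medium of the same type, so only the varying fields need to be taken from each extraction.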

The user credential may be W3C credential.

An apparatus (e.g. 102 in Figure 1) for accessing data in a plurality of machine readable mediums, the apparatus comprises: a processor configured to execute instructions to operate the apparatus to: receive a machine readable medium as input; identify a feature of the machine readable medium to determine a process specific to the machine readable medium to extract data contained in the machine readable medium; extract data contained in the machine readable medium according to the determined process if the feature is identified, wherein the data comprises one or more entity associated with the machine readable medium; verify authenticity of the machine readable medium by matching the one or more entity with entity relationship data stored in a database (e.g. 106 in Figure 1), wherein successful authentication occurs when the one or more entity in the extracted data matches with data recorded in the entity relationship data; and upon successful authentication, map the extracted data to a predefined framework that assigns data fields in the extracted data to a pre-determined data field structure, wherein the entity relationship data comprises data indicative of relationship between more than one entities and more than one issuers of the machine readable medium.

The feature of the machine readable medium to identify may indicate type and/or format of the machine readable medium, and the process to extract data from the machine readable medium may be determined according to the type and/or format of the machine readable medium.

The machine readable medium may be a machine readable optical label containing information that is coded.

The machine readable optical label may be a Quick Response (QR) code.

The apparatus may be operable to: if the feature cannot be identified to determine the process specific to the machine readable medium, extract data contained in the machine readable medium by using more than one predetermined processes so as to identify one process that is able to extract the information.

The apparatus may be operable to: decode, decompress and/or transcribe the data of the machine readable medium during performance of the determined process specific to the machine readable medium to extract data contained in the machine readable medium.

The apparatus may be operable to: authenticate a signature in the data of the machine readable medium against a signature associated with a publicly published or pre-supplied public key thereby ascertaining that the machine readable medium has been signed by a trusted entity.

The apparatus may be operable to: authenticate a signature or hash in the data of the machine readable medium on a blockchain by: matching the signature or hash against signature or hash associated with an address of the blockchain; and matching domain data associated with the public address against data of a known trusted domain, thereby ascertaining that the machine readable medium originates from a trusted entity that owns a domain name of the known trusted domain.

The apparatus may be operable to: authenticate one or more Transport Layer Security (TLS) certificate in the data of the machine readable medium to ensure that the data in the machine readable medium originates from a trusted domain.

The user credential may be W3C credential.

In the specification and claims, unless the context clearly indicates otherwise, the term “comprising” has the non-exclusive meaning of the word, in the sense of “including at least” rather than the exclusive meaning in the sense of “consisting only of”. The same applies with corresponding grammatical changes to other forms of the word such as “comprise”, “comprises” and so on.

While the invention has been described in the present disclosure in connection with a number of embodiments and implementations, the invention is not so limited but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims. Although features of the invention are expressed in certain combinations among the claims, it is contemplated that these features can be arranged in any combination and order.

Claims

1. A method for accessing data in a plurality of machine readable mediums, the method comprising: receiving a machine readable medium as input; identifying a feature of the machine readable medium to determine a process specific to the machine readable medium to extract data contained in the machine readable medium; extracting data contained in the machine readable medium according to the determined process if the feature is identified, wherein the data comprises one or more entity associated with the machine readable medium; verifying authenticity of the machine readable medium by matching the one or more entity with entity relationship data stored in a database, wherein successful authentication occurs when the one or more entity in the extracted data matches with data recorded in the entity relationship data; and upon successful authentication, mapping the extracted data to a predefined framework that assigns data fields in the extracted data to a pre-determined data field structure, wherein the entity relationship data comprises data indicative of relationship between more than one entities and more than one issuers of the machine readable medium.

2. The method as claimed in claim 1, wherein the one or more entity comprises an issuer of the inputted machine readable medium and an entity associated with the issuer, and successful authentication occurs when data indicative of relationship between the issuer of the inputted machine readable medium and entity associated with the issuer is recorded in the entity relationship data.

3. The method as claimed in claim 1 or 2, wherein the pre-determined data field structure is subject to automatic matching by a rules engine, which determines a result after checking against predetermined validation requirements.

4. The method as claimed in claim 1, 2 or 3, wherein the feature of the machine readable medium to identify indicates type and/or format of the machine readable medium, and the process to extract data from the machine readable medium is determined according to the type and/or format of the machine readable medium.

5. The method as claimed in any one of the preceding claims, wherein the machine readable medium is a machine readable optical label containing information that is coded.

6. The method as claimed in any one of claims 1 to 4, wherein the machine readable medium is a document containing data in the form of text.

7. The method as claimed in claim 4, wherein the machine readable optical label is a Quick Response (QR) code.

8. The method as claimed in any one of the preceding claims, wherein the inputted machine readable medium contains data of a medical report of a subject indicative of whether the subject has contracted a disease.

9. The method as claimed in any one of the preceding claims, wherein the feature is identified through recognition of a predetermined prefix or header in the data of the machine readable medium.

10. The method as claimed in any one of the preceding claims, wherein the feature is identified through recognition of predetermined tag or logo in the data of the machine readable medium.

11. The method as claimed in any one of the preceding claims, wherein the feature is identified through recognition of predetermined uniform resource locator in the data of the machine readable medium.

12. The method as claimed in any one of the preceding claims, wherein the feature is identified through recognition of predetermined character or marking in the data of the machine readable medium.

13. The method as claimed in any one of the preceding claims, wherein the feature is identified through recognition of predetermined data structure in the data of the machine readable medium.

14. The method as claimed in any one of the preceding claims, wherein if the feature cannot be identified to determine the process specific to the machine readable medium, extracting data contained in the machine readable medium by using more than one predetermined processes so as to identify one of the more than one predetermined processes that is able to extract the data.

15. The method as claimed in any one of the preceding claims, wherein the determined process specific to the machine readable medium to extract data contained in the machine readable medium comprises: decoding, decompressing and/or transcription of the data of the machine readable medium.

16. The method as claimed in any one of the preceding claims, wherein the method comprises: authenticating a signature in the data of the machine readable medium against a signature associated with a publicly published or pre-supplied public key thereby ascertaining that the machine readable medium has been signed by a trusted entity.

17. The method as claimed in any one of the preceding claims, wherein the method comprises: authenticating a signature or hash in the data of the machine readable medium on a blockchain by: matching the signature or hash against signature or hash associated with an address of the blockchain; and matching domain data associated with the public address against data of a known trusted domain, thereby ascertaining that the machine readable medium originates from a trusted entity that owns a domain name of the known trusted domain.

18. The method as claimed in any one of the preceding claims, wherein the method comprises: authenticating one or more Transport Layer Security (TLS) certificate in the data of the machine readable medium to ensure that the data in the machine readable medium originates from a trusted domain.

19. The method as claimed in any one of the preceding claims, wherein the data field structure comprises a user credential and the user credential is constructed by combining a template containing non-varying data of the user credential with varying data fields of the user credential obtained from the extracted data of the machine readable medium.

20. The method as claimed in claim 19, wherein the user credential is W3C credential.

21. An apparatus for accessing data in a plurality of machine readable mediums, the apparatus comprises: a processor configured to execute instructions to operate the apparatus to: receive a machine readable medium as input; identify a feature of the machine readable medium to determine a process specific to the machine readable medium to extract data contained in the machine readable medium; extract data contained in the machine readable medium according to the determined process if the feature is identified, wherein the data comprises one or more entity associated with the machine readable medium; verify authenticity of the machine readable medium by matching the one or more entity with entity relationship data stored in a database, wherein successful authentication occurs when the one or more entity in the extracted data matches with data recorded in the entity relationship data; and upon successful authentication, map the extracted data to a predefined framework that assigns data fields in the extracted data to a pre-determined data field structure, wherein the entity relationship data comprises data indicative of relationship between more than one entities and more than one issuers of the machine readable medium.

22. The apparatus as claimed in claim 21, wherein the one or more entity comprises an issuer of the inputted machine readable medium and an entity associated with the issuer, and successful authentication occurs when data indicative of relationship between the issuer of the inputted machine readable medium and entity associated with the issuer is recorded in the entity relationship data.

23. The apparatus as claimed in claim 21 or 22, wherein the pre-determined data field structure is configured to be subjected to automatic matching by a rules engine, which determines a result after checking against predetermined validation requirements.

24. The apparatus as claimed in claim 21, 22 or 23, wherein the feature of the machine readable medium to identify indicates type and/or format of the machine readable medium, and the process to extract data from the machine readable medium is determined according to the type and/or format of the machine readable medium.

25. The apparatus as claimed in any one of claims 21 to 24, wherein the machine readable medium is a machine readable optical label containing information that is coded.

26. The apparatus as claimed in any one of claims 21 to 25, wherein the machine readable medium is a document containing data in the form of text.

27. The apparatus as claimed in claim 25, wherein the machine readable optical label is a Quick Response (QR) code.

28. The apparatus as claimed in any one of claims 21 to 27, wherein the inputted machine readable medium contains data of a medical report of a subject indicative of whether the subject has contracted a disease.

29. The apparatus as claimed in any one of claims 21 to 28, wherein the feature is identified through recognition of a predetermined prefix or header in the data of the machine readable medium.

30. The apparatus as claimed in any one of claims 21 to 29, wherein the feature is identifiable through recognition of predetermined tag or logo in the data of the machine readable medium.

31. The apparatus as claimed in any one of claims 21 to 30, wherein the feature is identifiable through recognition of predetermined uniform resource locator in the data of the machine readable medium.

32. The apparatus as claimed in any one of claims 21 to 31, wherein the feature is identifiable through recognition of predetermined character or marking in the data of the machine readable medium.

33. The apparatus as claimed in any one of claims 21 to 32, wherein the feature is identifiable through recognition of predetermined data structure in the data of the machine readable medium.

34. The apparatus as claimed in any one of claims 21 to 33, wherein the apparatus is operable to: if the feature cannot be identified to determine the process specific to the machine readable medium, extract data contained in the machine readable medium by using more than one predetermined processes so as to identify one process that is able to extract the information.

35. The apparatus as claimed in any one of claims 21 to 34, wherein the apparatus is operable to: decode, decompress and/or transcribe the data of the machine readable medium during performance of the determined process specific to the machine readable medium to extract data contained in the machine readable medium.

36. The apparatus as claimed in any one of the claims 21 to 35, wherein the apparatus is operable to: authenticate a signature in the data of the machine readable medium against a signature associated with a publicly published or pre-supplied public key thereby ascertaining that the machine readable medium has been signed by a trusted entity.

37. The apparatus as claimed in any one of the claims 21 to 36, wherein the apparatus is operable to: authenticate a signature or hash in the data of the machine readable medium on a blockchain by: matching the signature or hash against signature or hash associated with an address of the blockchain; and matching domain data associated with the public address against data of a known trusted domain, thereby ascertaining that the machine readable medium originates from a trusted entity that owns a domain name of the known trusted domain.

38. The apparatus as claimed in any one of claims 21 to 37, wherein the apparatus is operable to: authenticate one or more Transport Layer Security (TLS) certificate in the data of the machine readable medium to ensure that the data in the machine readable medium originates from a trusted domain.

39. The apparatus as claimed in any one of claims 21 to 38, wherein the data field structure comprises a user credential and the user credential is constructed by combining a template containing non-varying data of the user credential with varying data fields of the user credential obtained from the extracted data of the machine readable medium.

40. The apparatus as claimed in claim 39, wherein the user credential is W3C credential.