WO2019241913A1 - Digital identity generation method, device, system, and storage medium - Google Patents

Digital identity generation method, device, system, and storage medium

Info

Publication number
WO2019241913A1
WO2019241913A1 PCT/CN2018/091880 CN2018091880W WO2019241913A1 WO 2019241913 A1 WO2019241913 A1 WO 2019241913A1 CN 2018091880 W CN2018091880 W CN 2018091880W WO 2019241913 A1 WO2019241913 A1 WO 2019241913A1
Authority
WO
WIPO (PCT)
Prior art keywords
str
digital
digital code
information
target
Prior art date
Application number
PCT/CN2018/091880
Other languages
English (en)
French (fr)
Inventor
潘光明
杨梦�
赵宏德
唐强
周童
李航
李华平
Original Assignee
深圳华大基因科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司
Priority to PCT/CN2018/091880 (WO2019241913A1, zh)
Priority to EP18923623.5A (EP3812952A4, en)
Publication of WO2019241913A1 (zh)
Priority to US17/122,361 (US11822629B2, en)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/40Encryption of genetic data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231Biological data, e.g. fingerprint, voice or retina
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Definitions

  • The present application relates to the field of computer technology, and in particular to a method, device, system, and storage medium for generating a digital identity.
  • Traditional digital identities fall into two categories. The first is a digital identity that must correspond to the real name of a physical identity, referred to as a physical digital identity.
  • Typical applications include government-issued resident ID cards, personal digital certificates issued by banks, and digital certificates of legal persons.
  • The second is a digital identity that does not need to correspond to a physical identity, referred to as a virtual digital identity.
  • Common applications include user names of various network services.
  • The correspondence between a physical digital identity and a physical identity is as follows:
  • Issuing and managing a physical digital identity requires the real physical identity for verification; that is, each digital identity corresponds to a real physical identity, and the physical identity is generally established through the biometric information of a natural person (such as facial features or fingerprint features).
  • In the related art, the offline personal biometric identification technology on which physical digital identities rely is weak, and generating digital identities from biometric identification has the following problems. In applications of digital identities based on facial feature recognition (such as ID numbers), the uniqueness of the biometric features cannot be guaranteed to be 100% non-repeating, the features are highly reproducible (easy to copy), the accuracy of face recognition by humans or machines cannot reach 100%, and the expression effect of the digital identity is poor.
  • This application is intended to solve at least one of the technical problems in the related technology.
  • To this end, an object of the present application is to propose a method for generating a digital identity that effectively improves the confidentiality and security of the generated digital identity and enhances its expression effect.
  • Another object of the present application is to provide a digital identity generation device.
  • Another object of the present application is to propose a digital identity generation system.
  • Another object of the present application is to propose a non-transitory computer-readable storage medium.
  • Another object of the present application is to propose a computer program product.
  • The method for generating a digital identity includes: extracting a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generating, according to the related information of each STR, a single STR digital code corresponding to that STR, to obtain a plurality of single STR digital codes; performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single STR digital codes; generating summary information of the target STR digital code, and using it as the summary information of the STRs to which the target STR digital code belongs; and using the summary information of the STRs as the generated digital identity.
  • The method for generating a digital identity extracts a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generates a single STR digital code for each STR according to its related information; performs a sequence transformation on each single STR digital code using a preset rule; generates a target STR digital code from the transformed codes; generates summary information of the target STR digital code; and uses that summary information, as the summary information of the STRs, as the generated digital identity. Because the digital identity is generated from the STRs in the whole genome data, it is unique and difficult to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • The digital identity generation device includes: a processor; a memory; and executable program code stored in the memory. The processor reads the executable program code stored in the memory and runs the corresponding program, so as to perform: extracting a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generating, according to the related information of each STR, a single STR digital code corresponding to that STR, to obtain a plurality of single STR digital codes; and performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single STR digital codes.
  • The digital identity generation device provided by the embodiment of the second aspect of the present application extracts a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generates a single STR digital code for each STR according to its related information; performs a sequence transformation on each single STR digital code using a preset rule; generates a target STR digital code from the transformed codes; generates summary information of the target STR digital code; and uses that summary information, as the summary information of the STRs, as the generated digital identity. Because the digital identity is generated from the STRs in the whole genome data, it is unique and difficult to copy.
  • the digital identity generation system provided by the embodiment of the third aspect of the present application includes the digital identity generation device provided by the embodiment of the second aspect of the present application.
  • The digital identity generation system extracts a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generates a single STR digital code for each STR according to its related information; performs a sequence transformation on each single STR digital code using a preset rule; generates a target STR digital code from the transformed codes; generates summary information of the target STR digital code; and uses that summary information, as the summary information of the STRs, as the generated digital identity. Because the digital identity is generated from the STRs in the whole genome data, it is unique and difficult to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • The non-transitory computer-readable storage medium provided by the embodiment of the fourth aspect of the present application stores instructions; when the instructions are executed by a processor of an electronic device, the processor executes the method for generating a digital identity according to the embodiment of the first aspect of the present application.
  • With the non-transitory computer-readable storage medium provided by the embodiment of the fourth aspect of the present application, a first preset number of short tandem repeats (STRs), together with related information of each STR, is extracted from whole genome data; a single STR digital code is generated for each STR according to its related information, to obtain a plurality of single STR digital codes; a sequence transformation is performed on each single STR digital code using a preset rule; and a target STR digital code is generated from the sequence-transformed single STR digital codes. Because the digital identity is generated from the short tandem repeats, it is unique and difficult to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • The computer program product performs a method for generating a digital identity when the instructions in the computer program product are executed by a processor. The method includes: extracting a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generating, according to the related information of each STR, a single STR digital code corresponding to that STR, to obtain a plurality of single STR digital codes; performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single STR digital codes; generating summary information of the target STR digital code, and using it as the summary information of the STRs to which the target STR digital code belongs; and using the summary information of the STRs as the generated digital identity.
  • The computer program product provided by the embodiment of the fifth aspect of the present application extracts a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generates a single STR digital code for each STR according to its related information, to obtain a plurality of single STR digital codes; performs a sequence transformation on each single STR digital code using a preset rule; generates a target STR digital code from the sequence-transformed single STR digital codes; generates summary information of the target STR digital code; and uses that summary information, as the summary information of the STRs to which the target STR digital code belongs, as the generated digital identity. This makes the generated digital identity unique and difficult to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • FIG. 1 is a schematic flowchart of a digital identity generation method according to an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a digital identity generation method according to another embodiment of the present application.
  • FIG. 4 is a schematic diagram of index information in an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a digital identity generation method according to another embodiment of the present application.
  • FIG. 6 is a schematic diagram of a digital identity ID database in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a digital identity generation device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a digital identity generation method according to an embodiment of the present application.
  • This embodiment is described using the example of the digital identity generation method being configured in a digital identity generation device.
  • The digital identity generation device may be set in a server or in an electronic device, which is not limited here.
  • This embodiment takes the method for generating a digital identity being configured in an electronic device as an example.
  • the digital identity is used to uniquely identify the identity information of a user, and the digital identity is generated based on the user's genomic data.
  • Electronic devices include, for example, smartphones, tablet computers, personal digital assistants, e-book readers, and other hardware devices with various operating systems.
  • In hardware, the execution subject of the embodiment of the present application may be, for example, the central processing unit (CPU) of the electronic device; in software, it may be, for example, a digital identity generation application in the electronic device. This is not limited here.
  • An embodiment of the present application provides a method for generating a digital identity that extracts a first preset number of short tandem repeats (STRs), together with related information of each STR, from whole genome data; generates a single STR digital code for each STR according to its related information, to obtain a plurality of single STR digital codes; performs a sequence transformation on each single STR digital code using a preset rule; and generates a target STR digital code from the sequence-transformed single STR digital codes. Because the digital identity is generated from the short tandem repeats, it is unique and difficult to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • the method includes:
  • S101 Extract a first preset number of short tandem repeat sequences STR from the whole genome data and related information of each STR.
  • The whole genome data is that of the user for whom a digital identity is to be generated.
  • The whole genome data contains N short tandem repeats (STRs).
  • The number of repeats of the base pair sequence contained in each STR generally lies in the range 1 to K.
  • Currently, N is greater than 7000 and K is about 100.
  • For a given user, the number of repetitions of the repeated sequence in the STR at a specific position on the chromosome is fixed, while the number of repetitions at the same position may differ between users. This is the polymorphism of STRs.
  • Based on this polymorphism, a unique digital identity is generated. The digital identity can uniquely express the identity of a user, so the generated digital identity is unique and cannot be easily copied.
  • Generation based on STR polymorphism provides a way to produce a large number of different digital identities.
  • Each digital identity generated from the STRs is unique to the user and does not repeat the digital identity of any other user.
  • The whole genome data in the examples of the present application can be obtained from the published human genome standard sequence Hg19.
  • FIG. 2 is a schematic diagram of related information of each STR in the embodiment of this application.
  • The related information of each STR is displayed as a document in which each STR occupies one line; reading each line from left to right, the fields have the following meanings:
  • the termination site of the repeat.
  • Extracting a first preset number of short tandem repeats (STRs) from the whole genome data includes:
  • S301 Extract polymorphic short tandem repeats (STRs) from the whole genome data and use them as the initial STRs.
  • S302 Randomly extract a first preset number of different STRs from the plurality of initial STRs. Between different extracted sets of the first preset number of STRs, at least a second preset number of STRs differ, and the first preset number is greater than the second preset number.
  • the first preset number and the second preset number may be preset by a user according to usage requirements, or may be preset by a factory program of the electronic device, which is not limited.
  • the first preset number can be represented by M, and the second preset number can be represented by J.
  • Polymorphic short tandem repeats (STRs) can be extracted from the whole genome data and used as the initial STRs, so that a digital identity generated from the user's whole genome data can be traced back to a unique biological individual, which effectively prevents identity fraud.
  • The first preset number of different STRs randomly extracted each time can be used to generate one digital identity for the user. Repeated random extraction can therefore generate a large number of different digital identities for the same user, so that an individual can use a different digital identity for each identification. On occasions with high privacy protection requirements, this ensures that digital identities are not misused.
  • The extracted sets of the first preset number of STRs may be configured so that different sets differ in at least the second preset number of STRs.
  • For example, the first preset number M may be set to approximately 50 and the second preset number J to approximately 30.
  • With M at this value, the cumulative coincidence rate of the digital identities of two different users is less than 10e-20, which outperforms other biometric technologies in the related art.
  • The related information of the M STRs can also be saved as index information, which facilitates storage of the related information of the STRs. See FIG. 4, a schematic diagram of the index information in the embodiment of the present application.
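The random extraction in S302 can be sketched as follows. This is only an illustration under stated assumptions: the function name `draw_str_set`, the retry loop, and the string stand-ins for STRs are hypothetical and do not come from the application, which does not say how the "at least J different" condition is enforced.

```python
import random


def draw_str_set(initial_strs, m, j, previous_sets=(), rng=None, max_tries=1000):
    """Draw m distinct STRs that differ from each earlier draw in >= j STRs.

    Hypothetical helper: the retry strategy and all names are assumptions.
    """
    if not 0 < j < m <= len(initial_strs):
        raise ValueError("require 0 < j < m <= len(initial_strs)")
    rng = rng or random.Random()
    for _ in range(max_tries):
        candidate = frozenset(rng.sample(initial_strs, m))
        # candidate - prev holds the STRs of the new draw that prev lacks.
        if all(len(candidate - prev) >= j for prev in previous_sets):
            return candidate
    raise RuntimeError("no suitable draw found within max_tries")


pool = [f"STR{i:04d}" for i in range(7000)]  # stand-ins for the initial STRs
rng = random.Random(0)                        # seeded only for repeatability
first = draw_str_set(pool, m=50, j=30, rng=rng)
second = draw_str_set(pool, m=50, j=30, previous_sets=[first], rng=rng)
```

Each returned set would feed one digital identity; keeping the earlier sets in `previous_sets` is one simple way to make successive identities of the same user differ.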
  • S102 Generate a single STR digital code corresponding to each STR according to the related information of each STR to obtain multiple single STR digital codes.
  • A single STR digital code may be used to indicate the position information of the corresponding STR in the whole genome data, the repeated base pair sequence information, and the number of repetitions of the repeated base pair sequence. Referring to FIG. 5, generating a single STR digital code corresponding to each STR, to obtain a plurality of single STR digital codes, may include:
  • S501 Use a first number of bits to mark the position information of each STR in the whole genome data, and use the marked first number of bits as the first digital code.
  • The position information includes the chromosome number, the starting site, and the STR fragment length.
  • The first number in the embodiment of the present application is configured as 53.
  • The 53 bits mark the chromosome number, the starting position, and the fragment length of the STR.
  • The chromosome number may be marked in binary using 5 bits; for example, the chromosome chr10 may be labeled 00010. The starting position may be marked in binary using 40 bits.
  • The STR fragment length may be marked in binary using 8 bits.
  • For example, if the length of the CCT repeat region is 23, the left side can be padded with 0s to complete the number of bits.
  • The minimum number of bits for the chromosome number is set to 5.
  • The start position lies between 1 and 10^9, and its minimum number of bits is set to 40.
  • The fragment length of the STR lies between 1 and 600, and its minimum number of bits is set to 8.
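The 53-bit position code described above (5 + 40 + 8 bits) can be sketched like this. The names `to_bits` and `encode_position` are hypothetical, and encoding the chromosome as the plain binary value of its number is an assumption; the text's own example labels chr10 as 00010, so the application may use a different mapping.

```python
CHR_BITS, START_BITS, LEN_BITS = 5, 40, 8  # 5 + 40 + 8 = 53 bits in total


def to_bits(value, width):
    """Return value as a zero-padded binary string of exactly width bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def encode_position(chromosome, start, length):
    """Concatenate chromosome number, start position, and fragment length.

    Assumption: each field is stored as an unsigned big-endian binary number.
    An 8-bit length field holds values up to 255 only.
    """
    return (to_bits(chromosome, CHR_BITS)
            + to_bits(start, START_BITS)
            + to_bits(length, LEN_BITS))


first_code = encode_position(chromosome=10, start=1000, length=23)
```

Fixed field widths keep every position code the same length, so the fields can be sliced back out by offset alone.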
  • S502 Use a second number of bits to mark the repeated base pair sequence information of each STR, and use the marked second number of bits as the second digital code.
  • The repeated base pair sequence information includes the content and the length of the repeated base pair sequence of the STR.
  • The second number in the embodiment of the present application is configured as 36. The 36 bits mark the content of the repeated base pair sequence of the STR: A is replaced by 100, G by 111, C by 110, and T by 101, and the left side is padded with 0s to complete the number of bits. The length of the repeated base pair sequence is between 2 and 12.
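A minimal sketch of the 36-bit repeat-unit code, using the base-to-bits table given in the text. The function name `encode_repeat_unit` is hypothetical.

```python
BASE_CODES = {"A": "100", "G": "111", "C": "110", "T": "101"}  # from the text
SEQ_BITS = 36  # room for 12 bases at 3 bits each


def encode_repeat_unit(unit):
    """Encode a repeat unit of 2 to 12 bases as a left-zero-padded 36-bit string."""
    unit = unit.upper()
    if not 2 <= len(unit) <= 12:
        raise ValueError("repeat unit length must be between 2 and 12")
    bits = "".join(BASE_CODES[base] for base in unit)  # KeyError on a non-ACGT base
    return bits.rjust(SEQ_BITS, "0")


second_code = encode_repeat_unit("CCT")
```

Because every 3-bit base code starts with 1, the zero padding on the left cannot be confused with a base, so both the content and the length of the repeat unit remain recoverable from the padded code.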
  • S503 Use a third number of bits to mark the number of repetitions of the repeated base pair sequence, and use the marked third number of bits as the third digital code.
  • The third number in the embodiment of the present application is configured as 8; the 8 bits mark the number of repetitions of the repeated base pair sequence of the STR.
  • The number of repetitions is less than or equal to K, and K is generally 50; that is, the number of repetitions is between 2 and 50. In the embodiment of the present application, redundancy beyond this upper limit may be reserved, and the minimum number of bits is set to 8.
  • A program can read the repeated sequence at the corresponding site and determine the number of repetitions.
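One way a program might determine the number of repetitions and produce the 8-bit third code is sketched below. The helpers `count_repeats` and `encode_repeat_count`, the 0-based start index, and the toy sequence are all assumptions for illustration; the application does not describe the counting procedure.

```python
def count_repeats(sequence, start, unit):
    """Count consecutive copies of unit in sequence, beginning at 0-based start."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    count, pos = 0, start
    while sequence.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count


def encode_repeat_count(count):
    """Return the repetition count as an 8-bit binary string."""
    if not 0 <= count < 256:
        raise ValueError("repetition count does not fit in 8 bits")
    return format(count, "08b")


toy_sequence = "GATTACA" + "CCT" * 7 + "GG"  # made-up data, not genome content
repeats = count_repeats(toy_sequence, start=7, unit="CCT")
third_code = encode_repeat_count(repeats)
```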
  • S504 Perform concatenation processing on the first digital code, the second digital code, and the third digital code of each STR, and use the concatenated digital code as a single STR digital code corresponding to each STR.
  • the first digital code, the second digital code, and the third digital code may be concatenated to obtain a single STR digital code corresponding to the STR.
  • STRECD StartPosition
  • The generated single STR digital code fully records the related information of the STR. By padding the left end with the sequence 1010, the single STR digital code STRECD is unified into an 8 * S bit code, which facilitates subsequent software processing.
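The concatenation in S504 and the left-end padding to a multiple of 8 bits can be sketched as follows. How the 1010 pattern is applied when the gap is not a multiple of 4 bits is not stated in the text; repeating the pattern and truncating it to the needed length is an assumption, as is the name `build_strecd`.

```python
PAD_PATTERN = "1010"  # padding sequence named in the text


def build_strecd(first_code, second_code, third_code):
    """Concatenate the three codes and left-pad to a multiple of 8 bits.

    Assumption: the pad is the 1010 pattern repeated and cut to the gap size.
    """
    body = first_code + second_code + third_code
    pad_len = -len(body) % 8               # bits missing to reach 8 * S
    pad = (PAD_PATTERN * 2)[:pad_len]      # pad_len is at most 7
    return pad + body


# Placeholder codes with the widths from the text: 53 + 36 + 8 = 97 bits.
strecd = build_strecd("0" * 53, "1" * 36, "0" * 8)
```

With the field widths given in the text, the 97-bit body needs 7 padding bits, giving a 104-bit code (S = 13) that maps cleanly onto whole bytes.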
  • S103 Perform a sequence transformation on each single STR digital code by using a preset rule, and generate a target STR digital code according to the single STR digital code after the sequence transformation.
  • each single STR digital code is sequence-transformed by using a preset rule, so that the target STR digital code generated based on the sequence-transformed single-STR digital code is unique.
  • A preset rule is used to perform a sequence transformation on each single STR digital code, and the sequence-transformed single STR digital codes are directly concatenated. The concatenated STR digital code is referred to as the target STR digital code.
  • The target STR digital code corresponds to the first preset number of short tandem repeats (STRs) extracted in step S101.
  • If the random extraction is performed twice, a target STR digital code can be generated for the first preset number of STRs extracted each time.
  • The M STRECDs obtained from the M STRs may be directly concatenated to obtain the target STR digital code MSTRECD of the M STRs, whose total number of bits is (8 * S) * M.
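The transformation and concatenation in S103 can be sketched as below. The application does not disclose the preset rule, so the cyclic left rotation used here is purely a hypothetical stand-in, and `rotate_left`, `build_mstrecd`, and the parameter `k` are invented names.

```python
def rotate_left(bits, k):
    """Cyclically rotate a bit string left by k positions (stand-in rule)."""
    if not bits:
        raise ValueError("bit string must not be empty")
    k %= len(bits)
    return bits[k:] + bits[:k]


def build_mstrecd(strecds, k=3):
    """Transform each STRECD with the stand-in rule, then concatenate them all."""
    if len({len(code) for code in strecds}) != 1:
        raise ValueError("expected a non-empty list of equal-length STRECDs")
    return "".join(rotate_left(code, k) for code in strecds)


# Two toy 8-bit codes (S = 1, M = 2) give a 16-bit target code.
mstrecd = build_mstrecd(["10010000", "00001111"], k=3)
```

Any invertible, fixed rule would fit the description equally well; the point is only that the result has (8 * S) * M bits.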
  • S104 Generate the digest information of the target STR digital code, and use the digest information as the digest information of the STR to which the target STR digital code belongs.
  • a hash algorithm may be used to generate the digest information of the target STR digital code, so that the target STR digital code corresponds to the unique digest information.
  • a hash algorithm may be used to perform a digest calculation on the target STR digital code MSTRECD to obtain a calculation result, and an F-bit hash digest corresponding to the calculation result is generated as the digest information of the target STR digital code.
  • the value of F may be greater than or equal to 256, and commonly used values are, for example, 256, 512, and 1024.
  • the hash algorithm includes, for example, SHA256 and SHA512.
  • the digest information of the target STR digital code is generated, and the digest information is used as the digest information of the STR to which the target STR digital code belongs.
  • the digest information of the STR can be used as a digital identity of the user.
  • the digital identity can be used to uniquely identify the individual user. Based on the same genomic data, a large number of mutually different digital identities can be generated.
  • the value of the first preset number M is chosen to ensure that the digital identities generated for any two different users differ.
  • the value of M can be configured to be greater than or equal to 50.
  • the value of J is configured to ensure that the digital identities generated for the same user differ each time.
  • N has a value of 7000
  • M has a value of 50
  • J has a value of 2
  • the number of digital identity identifiers that can be generated according to the method in the embodiment of the present application is approximately equal to the following formula:
  • the result is on the order of 5 x e 150 .
  • digital identity generated according to the method in the embodiment of the present application can be used for identity identification in digital technology applications.
  • a typical application scenario is described as follows:
  • FIG. 6 is a schematic diagram of a digital identity ID library in an embodiment of the present application. Unlike application scenarios in the related art, where a user's digital identity in a network application is fixed, a user can use a different digital identity for each operation in a digital system, which improves the security of the user's use of the system.
  • the used digital identity can be invalidated to prevent subsequent fraudulent use by other users.
  • Digital identity recognition: the digital identity generated according to the method in the embodiment of the present application has high security. Using the whole genome data and the index information corresponding to the digital identity, the user can accurately verify the true correspondence between that identifier and the user.
  • User identification in the blockchain system: user information in the blockchain system needs a separate ID number for identification, and using a digital identity generated according to the method in the embodiment of the present application as the user information ID each time can improve ID security.
  • Transaction identification in the blockchain system: each transaction in the blockchain system needs a separate ID number to identify it.
  • the digital identity generated according to the method in the embodiment of the present application can be used as a transaction ID to strengthen privacy protection. Specifically, transactions initiated by the same user use different digital identities of that user as transaction IDs. After a transaction is stored on the chain, only the user can verify, through the user's digital identity ID library, whether the transaction ID was initiated by that user.
  • a first preset number of short tandem repeat sequences STR and the related information of each STR are extracted from the whole genome data; a single STR digital code corresponding to each STR is generated according to the related information of each STR, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule, and a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity.
  • because the digital identity is generated from the short tandem repeat sequences STR in the whole genome data, the generated digital identity is unique and cannot be easily copied; and because the STR is digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • Polymorphic short tandem repeat STR is extracted from the whole genome data and used as the initial STR.
  • the first preset number is greater than the second preset number.
  • the first preset number of different STRs can be configured so that at least the second preset number of them differ.
  • the first preset number M is about 50
  • the second preset number J is about 30. With M set to 50, the cumulative coincidence rate of the digital identities of two different users is lower than 10e-20, which is far better than other biometric technologies in the related art.
  • FIG. 2 is a schematic diagram of related information of each STR in the embodiment of the present application.
  • the related information of each STR is presented as a text document, with one STR per line; the fields of each line, from left to right, have the following meanings:
  • the termination site in the repeat.
  • the position information includes: Chromosome number, start site and fragment length of the STR.
  • the first number in the embodiment of the present application is configured as 53.
  • the first number of bits marks the chromosome number, the start site, and the fragment length of the STR.
  • the chromosome number may be marked in binary using 5 bits.
  • the chromosome chr10 may be marked with 00010, and the start site may be marked in binary using 40 bits.
  • the fragment length of the STR may be marked in binary using 8 bits.
  • the length of the CCT repeat region is 23, and the left side can be padded with zeros to fill the bit width.
  • the minimum number of bits for the chromosome number is set to 5
  • the start site lies between 1 and 10^9
  • the minimum number of bits for the start site is set to 40
  • the fragment length of the STR lies between 1 and 600
  • the minimum number of bits for the fragment length is set to 8.
  • the repeat base pair sequence information includes the repeat base pair sequence content of the STR and the repeat base pair sequence length.
  • the second number in the embodiment of the present application is configured as 36: 36 bits mark the content of the repeated base pair sequence of the STR, with A replaced by 100, G by 111, C by 110, and T by 101, and the left side padded with zeros to fill the bit width; the length of the repeated base pair sequence is between 2 and 12.
  • the third number of bits is used to mark the number of repetitions of the repeated base pair sequence, and the marked third number of bits is used as the third digital code.
  • the third number in the embodiment of the present application is configured as 8, and 8 bits mark the number of repetitions of the repeated base pair sequence of the STR.
  • the number of repetitions is less than or equal to K, and the value of K is generally 50, that is, the number of repetitions is between 2 and 50; in the embodiment of the present application, headroom beyond the upper limit may be reserved, and the minimum number of bits is set to 8.
  • the first digital code, the second digital code, and the third digital code may be concatenated to obtain a single STR digital code corresponding to the STR.
  • STRECD = StartPosition | RepeatedSeq | RpeatedCnt
  • the M STRECDs obtained by the M STRs may also be directly concatenated, and finally the target STR digital encoding MSTRECD of the M STRs is obtained, and the total number of bits is (8 * S) * M.
  • FIG. 7 is a schematic structural diagram of a digital identity generation device according to an embodiment of the present application.
  • the device 700 includes: a processor 701; a memory 702; and executable program code stored in the memory 702. The processor 701 reads the executable program code stored in the memory 702 and runs the corresponding program in order to execute the following:
  • the summary information of the STR is used as the generated digital identity.
  • the processor 701 is further configured to:
  • a single STR digital code is used to indicate position information of the corresponding STR in the whole genome data, repeated base-pair sequence information, and the number of repeated base-pair sequences.
  • the position information includes: the chromosome number, the start site, and the fragment length of the STR;
  • the processor 701 is further configured to:
  • the target STR digital code generated based on the sequence-transformed single STR digital code is unique.
  • the processor 701 is further configured to:
  • a hash algorithm is used to generate the digest information of the target STR digital code, so that the target STR digital code corresponds to a unique digest information.
  • a first preset number of short tandem repeat sequences STR and the related information of each STR are extracted from the whole genome data; a single STR digital code corresponding to each STR is generated according to the related information of each STR, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule, and a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity.
  • because the digital identity is generated from the short tandem repeat sequences STR in the whole genome data, the generated digital identity is unique and cannot be easily copied; and because the STR is digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • the present application also proposes a non-transitory computer-readable storage medium.
  • the non-transitory computer-readable storage medium is used to store an application program, and the application program is used to execute the digital identity generation method in the embodiment of the present application at runtime.
  • the method includes:
  • the summary information of the STR is used as the generated digital identity.
  • with the non-transitory computer-readable storage medium of the embodiment of the present application, a first preset number of short tandem repeat sequences STR and the related information of each STR are extracted from the whole genome data, and a single STR digital code corresponding to each STR is generated according to the related information of each STR, yielding multiple single STR digital codes.
  • each single STR digital code is sequence-transformed using a preset rule, and the target STR digital code is generated from the transformed single STR digital codes.
  • digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs, and the digest information of the STR is used as the generated digital identity.
  • because the digital identity is generated from the short tandem repeat sequences STR in the whole genome data, the generated digital identity is unique and difficult to copy; and because the STR is digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
  • Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, fragment, or portion of code that includes one or more executable instructions for implementing a particular logical function or step of a process.
  • the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application pertain.
  • each part of the application may be implemented by hardware, software, firmware, or a combination thereof.
  • multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits, application-specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.
  • a person of ordinary skill in the art can understand that all or part of the steps carried by the methods in the foregoing embodiments may be implemented by a program instructing related hardware.
  • the program may be stored in a computer-readable storage medium.
  • when executed, the program includes one or a combination of the steps of the method embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist separately physically, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the aforementioned storage medium may be a read-only memory, a magnetic disk, or an optical disk.


Abstract

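The present application proposes a digital identity generation method, device, system, and storage medium. The method includes: extracting a first preset number of short tandem repeat sequences (STRs), together with the related information of each STR, from whole genome data; generating a single STR digital code corresponding to each STR according to the related information of each STR, to obtain multiple single STR digital codes; performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the transformed single STR digital codes; generating digest information of the target STR digital code, and using the digest information as the digest information of the STR to which the target STR digital code belongs; and using the digest information of the STR as the generated digital identity. The present application can effectively improve the confidentiality and security of the generated digital identity and enhance its expression effect.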

Description

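Digital identity generation method, device, system, and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a digital identity generation method, device, system, and storage medium.
Background
Traditional digital identities fall into two categories. The first is a digital identity that must correspond to a real-name physical identity, referred to as an entity digital identity; the most typical applications include resident identity cards issued by governments and personal or corporate digital certificates issued by banks. The second is a digital identity that need not correspond to a physical identity, referred to as a virtual digital identity; common applications include the usernames of various network services.
The correspondence between entity digital identities and physical identities is as follows: issuing and managing an entity digital identity requires association with and authentication against a real physical identity. That is, each digital identity corresponds to a physical identity that really exists, and the physical identity is generally recognized through a natural person's biometric information (facial or fingerprint features).
The offline biometric identification technology on which entity digital identities in the related art rely is relatively weak. Generating digital identities by relying on biometric identification has the following problems: in digital identity applications based on facial feature recognition (for example, ID card numbers), the biometric features cannot be guaranteed to be 100% free of duplicates, they are relatively easy to copy (cosmetic surgery or disguise), and neither human nor machine face recognition reaches 100% accuracy, so the expression effect of the digital identity is poor.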
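Summary of the Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present application is to propose a digital identity generation method that can effectively improve the confidentiality and security of the generated digital identity and enhance its expression effect.
Another object of the present application is to propose a digital identity generation device.
Another object of the present application is to propose a digital identity generation system.
Another object of the present application is to propose a non-transitory computer-readable storage medium.
Another object of the present application is to propose a computer program product.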
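To achieve the above objects, the digital identity generation method proposed in the embodiment of the first aspect of the present application includes: extracting a first preset number of short tandem repeat sequences (STRs), together with the related information of each STR, from whole genome data; generating a single STR digital code corresponding to each STR according to the related information of each STR, to obtain multiple single STR digital codes; performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the transformed single STR digital codes; generating digest information of the target STR digital code, and using the digest information as the digest information of the STR to which the target STR digital code belongs; and using the digest information of the STR as the generated digital identity.
In the digital identity generation method proposed in the embodiment of the first aspect of the present application, a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.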
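To achieve the above objects, the digital identity generation device proposed in the embodiment of the second aspect of the present application includes: a processor; a memory; and executable program code stored in the memory. The processor reads the executable program code stored in the memory and runs the corresponding program in order to execute: extracting a first preset number of STRs, together with the related information of each STR, from whole genome data; generating a single STR digital code corresponding to each STR according to the related information of each STR, to obtain multiple single STR digital codes; performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the transformed single STR digital codes; generating digest information of the target STR digital code, and using the digest information as the digest information of the STR to which the target STR digital code belongs; and using the digest information of the STR as the generated digital identity.
In the digital identity generation device proposed in the embodiment of the second aspect of the present application, a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.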
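To achieve the above objects, the digital identity generation system proposed in the embodiment of the third aspect of the present application includes the digital identity generation device proposed in the embodiment of the second aspect of the present application.
In the digital identity generation system proposed in the embodiment of the third aspect of the present application, a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.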
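To achieve the above objects, the non-transitory computer-readable storage medium proposed in the embodiment of the fourth aspect of the present application has instructions stored therein; when the instructions are executed by a processor of an electronic device, the processor executes the digital identity generation method proposed in the embodiment of the first aspect of the present application.
With the non-transitory computer-readable storage medium proposed in the embodiment of the fourth aspect of the present application, a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.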
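To achieve the above objects, the computer program product proposed in the embodiment of the fifth aspect of the present application, when its instructions are executed by a processor, executes a digital identity generation method that includes: extracting a first preset number of STRs, together with the related information of each STR, from whole genome data; generating a single STR digital code corresponding to each STR according to the related information of each STR, to obtain multiple single STR digital codes; performing a sequence transformation on each single STR digital code using a preset rule, and generating a target STR digital code from the transformed single STR digital codes; generating digest information of the target STR digital code, and using the digest information as the digest information of the STR to which the target STR digital code belongs; and using the digest information of the STR as the generated digital identity.
With the computer program product proposed in the embodiment of the fifth aspect of the present application, a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
Additional aspects and advantages of the present application will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present application.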
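Brief Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a digital identity generation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the related information of each STR in an embodiment of the present application;
FIG. 3 is a schematic flowchart of a digital identity generation method according to another embodiment of the present application;
FIG. 4 is a schematic diagram of index information in an embodiment of the present application;
FIG. 5 is a schematic flowchart of a digital identity generation method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a digital identity ID library in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a digital identity generation device according to an embodiment of the present application.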
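Detailed Description
Embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting it. On the contrary, the embodiments of the present application include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
FIG. 1 is a schematic flowchart of a digital identity generation method according to an embodiment of the present application.
This embodiment is described using the example in which the digital identity generation method is configured in a digital identity generation device.
The digital identity generation method in this embodiment may be configured in a digital identity generation device, which may be provided in a server or in an electronic device; this is not limited.
This embodiment takes the case in which the digital identity generation method is configured in an electronic device as an example.
The digital identity uniquely marks the identity information of one user and is generated from that user's genome data. The electronic device is, for example, a hardware device with any of various operating systems, such as a smartphone, tablet computer, personal digital assistant, or e-book reader.
It should be noted that the executing entity of the embodiments of the present application may be, in hardware, for example the central processing unit (CPU) of the electronic device and, in software, for example a digital identity generation application in the electronic device; this is not limited.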
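Traditional digital identities fall into two categories. The first is a digital identity that must correspond to a real-name physical identity, referred to as an entity digital identity; the most typical applications include resident identity cards issued by governments and personal or corporate digital certificates issued by banks. The second is a digital identity that need not correspond to a physical identity, referred to as a virtual digital identity; common applications include the usernames of various network services.
The correspondence between entity digital identities and physical identities is as follows: issuing and managing an entity digital identity requires association with and authentication against a real physical identity. That is, each digital identity corresponds to a physical identity that really exists, and the physical identity is generally recognized through a natural person's biometric information (facial or fingerprint features).
The offline biometric identification technology on which entity digital identities in the related art rely is relatively weak. Generating digital identities by relying on biometric identification has the following problems: in digital identity applications based on facial feature recognition (for example, ID card numbers), the biometric features cannot be guaranteed to be 100% free of duplicates, they are relatively easy to copy (cosmetic surgery or disguise), and neither human nor machine face recognition reaches 100% accuracy, so the expression effect of the digital identity is poor.
To solve the above technical problems, an embodiment of the present application provides a digital identity generation method in which a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.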
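Referring to FIG. 1, the method includes:
S101: Extract a first preset number of short tandem repeat sequences (STRs), together with the related information of each STR, from whole genome data.
The whole genome data is that of the user for whom a digital identity currently needs to be generated.
Owing to the characteristics of genome data, the whole genome data of a given user contains N short tandem repeats (STRs). An STR is a short tandem repeat structure whose core sequence is 2-6 bases long, and the base pair sequence of each STR is generally repeated between 1 and K times. Based on the characteristics of genome data, N is currently greater than 7000 and K is about 100. In the whole genome data of any particular user, the number of repeats of the repeated sequence in the STR at a particular chromosomal position is fixed, whereas the number of repeats at the same position may differ between users; this constitutes the polymorphism of that STR across users. The embodiments of the present application build on this polymorphism: a unique digital identity is generated from the polymorphism of the STRs in the user's whole genome data and can be used to uniquely express the user's identity, so that the generated digital identity is unique and hard to copy.
The embodiments of the present application provide a method that, based on STR polymorphism, can generate a massive number of different digital identities, each specific to the user and not duplicating the digital identities of other users.
The whole genome data in the embodiments of the present application may be obtained from the public human genome reference sequence Hg19.
As an example, the related information of each STR extracted in the embodiments of the present application is shown in FIG. 2, a schematic diagram of the related information of each STR. The information is presented as a text document with one STR per line; the fields of each line, from left to right, have for example the following meanings: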
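Score ID;
Perc div;
Perc del;
Perc ins;
chromosome number;
start site in the query sequence;
end site in the query sequence;
sequence direction, where + denotes the forward direction;
base sequence of the repeated part;
repeat type, where simple_repeat denotes an STR;
start site within the repeated part;
end site within the repeated part.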
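The field list above can be turned into a small parser. The following is only a sketch: it assumes the twelve fields are whitespace-separated and appear in exactly the listed order, and the record type, field names, and sample line are hypothetical rather than taken from FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrRecord:
    """One line of STR related information (field names are illustrative)."""
    score: int
    perc_div: float
    perc_del: float
    perc_ins: float
    chromosome: str
    query_start: int
    query_end: int
    strand: str
    repeat_unit: str
    repeat_type: str
    repeat_start: int
    repeat_end: int

    @property
    def fragment_length(self) -> int:
        # Assumes inclusive start and end coordinates.
        return self.query_end - self.query_start + 1


def parse_str_line(line: str) -> StrRecord:
    """Parse one whitespace-separated line in the assumed field order."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    return StrRecord(
        int(fields[0]), float(fields[1]), float(fields[2]), float(fields[3]),
        fields[4], int(fields[5]), int(fields[6]), fields[7],
        fields[8], fields[9], int(fields[10]), int(fields[11]),
    )


# Hypothetical sample line, not copied from the patent figures.
record = parse_str_line("21 0.0 0.0 0.0 chr10 90608 90630 + CCT simple_repeat 1 23")
```

Only records whose repeat type is `simple_repeat` would be kept as STR candidates under this reading.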
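Optionally, in some embodiments, referring to FIG. 3, extracting the first preset number of STRs from the whole genome data includes:
S301: Extract polymorphic STRs from the whole genome data and use them as initial STRs.
S302: Randomly extract the first preset number of different STRs from the multiple initial STRs, where at least a second preset number of the first preset number of different STRs are different, and the first preset number is greater than the second preset number.
The first preset number and the second preset number may be preset by the user according to usage needs, or by the factory program of the electronic device; this is not limited.
The first preset number may be denoted M and the second preset number may be denoted J.
In specific execution, to make the generated digital identity effectively unique, polymorphic STRs may be extracted from the whole genome data as the initial STRs, so that a digital identity produced from a user's whole genome data can be traced back to a single biological individual, effectively preventing identity fraud. The first preset number of different STRs extracted at random each time can be used to generate one digital identity for the user, and repeated random extraction yields multiple different digital identities for the user. A massive number of digital identities can thus be produced, so that an individual can use a different digital identity in each identity verification and, where stronger privacy protection is required, the digital identities are not abused.
In specific execution, to further safeguard the uniqueness of the digital identity, the first preset number of different STRs may be configured so that at least the second preset number of them differ. The first preset number M is about 50 and the second preset number J is about 30. With M set to 50, the cumulative coincidence rate of the digital identities of two different users is below 10e-20, far better than other biometric technologies in the related art.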
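The random extraction in S302 might be sketched as follows. This is an interpretation, not the patent's stated procedure: it assumes that "at least J of the M STRs are different" means each new draw must contain at least J STRs absent from every earlier draw, and the function name, retry limit, and integer stand-ins for STR identifiers are all hypothetical.

```python
import random


def draw_str_set(initial_ids, m, j, previous_sets, rng, max_tries=1000):
    """Draw m distinct STR ids that differ from every earlier draw in >= j ids."""
    if not m > j:
        raise ValueError("the first preset number must exceed the second")
    for _ in range(max_tries):
        # rng.sample returns m distinct elements, so the set has size m.
        candidate = frozenset(rng.sample(initial_ids, m))
        if all(len(candidate - earlier) >= j for earlier in previous_sets):
            return candidate
    raise RuntimeError("no admissible draw found within max_tries")


rng = random.Random(0)            # fixed seed only to make the sketch repeatable
initial_ids = list(range(7000))   # stand-ins for the polymorphic initial STRs
first = draw_str_set(initial_ids, 50, 30, [], rng)
second = draw_str_set(initial_ids, 50, 30, [first], rng)
```

A production system would presumably use a cryptographically secure source of randomness instead of a seeded generator; the seed here exists only for reproducibility.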
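In specific execution, the related information of the M STRs may also be saved as index information; see FIG. 4, a schematic diagram of the index information in an embodiment of the present application. This facilitates classified storage of the related information of the STRs.
S102: Generate a single STR digital code corresponding to each STR according to the related information of each STR, to obtain multiple single STR digital codes.
Optionally, in some embodiments, to technically implement generating a digital identity from the related information of the STRs in the whole genome data, a single STR digital code may indicate the position information of its STR in the whole genome data, the repeated base pair sequence information, and the number of repeats of the repeated base pair sequence. Referring to FIG. 5, generating a single STR digital code corresponding to each STR according to the related information of each STR, to obtain multiple single STR digital codes, may include:
S501: Mark the position information of each STR in the whole genome data using a first number of bits, and use the marked first number of bits as the first digital code. The position information includes the chromosome number, the start site, and the fragment length of the STR.
As an example, the first number in the embodiments of the present application is configured as 53, and the first number of bits marks the chromosome number, the start site, and the fragment length of the STR. The chromosome number may be marked in binary using 5 bits; specifically, for the 24 chromosomes (chr1-chr22, chrx, chry), chromosome chr10 may for example be marked as 00010. The start site may be marked in binary using 40 bits. The fragment length of the STR may be marked in binary using 8 bits; for example, the CCT repeat region has a length of 23, and the left side may be padded with zeros to fill the bit width.
In specific execution, considering that there are 24 kinds of human chromosomes (22 autosomes and 2 sex chromosomes), and to accommodate later development and updates, the minimum number of bits for the chromosome number is set to 5; the start site lies between 1 and 10^9, and the minimum number of bits for the start site is set to 40; the fragment length of the STR lies between 1 and 600, and the minimum number of bits for the fragment length is set to 8.
For example: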
Chr10→00010;
90608→0000000000000000000000010110000111110000;
23→00010111。
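The 5 + 40 + 8 = 53-bit layout of the first digital code can be sketched as below. The helper names are hypothetical, and the chromosome field is an assumption: the text does not say how chrx and chry map to numbers, and plain binary for 10 is 01010 rather than the 00010 printed in the example, so the sketch simply encodes whatever integer it is given in unsigned binary.

```python
def to_bits(value: int, width: int) -> str:
    """Unsigned binary string of exactly `width` bits, left-padded with zeros."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def encode_position(chromosome_number: int, start_site: int, fragment_length: int) -> str:
    """First digital code: 5-bit chromosome, 40-bit start site, 8-bit length."""
    return (
        to_bits(chromosome_number, 5)
        + to_bits(start_site, 40)
        + to_bits(fragment_length, 8)
    )


first_code = encode_position(10, 90608, 23)
```

Note that an 8-bit field holds at most 255, so a fragment length above that value is rejected by `to_bits` in this sketch.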
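S502: Mark the repeated base pair sequence information of each STR using a second number of bits, and use the marked second number of bits as the second digital code.
The repeated base pair sequence information includes the content of the STR's repeated base pair sequence and the length of the repeated base pair sequence.
The second number in the embodiments of the present application is configured as 36: 36 bits mark the content of the STR's repeated base pair sequence, with A replaced by 100, G by 111, C by 110, and T by 101, and the left side padded with zeros to fill the bit width. The length of the repeated base pair sequence is between 2 and 12, and the minimum number of bits is computed from the longest repeated base pair sequence length and the 3-bit string that replaces each base: 12*3=36.
For example: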
CCT→000000000000000000000000000110110101。
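A minimal sketch of the second digital code, using the base-to-bits table given in the text. The function names are hypothetical. Because every 3-bit base code starts with 1, stripping the leading zeros recovers the payload unambiguously, which the decoder below relies on.

```python
BASE_CODES = {"A": "100", "G": "111", "C": "110", "T": "101"}
CODE_BASES = {code: base for base, code in BASE_CODES.items()}


def encode_repeat_unit(unit: str, width: int = 36) -> str:
    """Second digital code: 3 bits per base, left-padded with zeros to `width`."""
    if not 2 <= len(unit) <= width // 3:
        raise ValueError("repeat unit length must be between 2 and 12 bases")
    return "".join(BASE_CODES[base] for base in unit.upper()).rjust(width, "0")


def decode_repeat_unit(bits: str) -> str:
    """Inverse of encode_repeat_unit; valid because each base code starts with 1."""
    payload = bits.lstrip("0")
    return "".join(CODE_BASES[payload[i:i + 3]] for i in range(0, len(payload), 3))


second_code = encode_repeat_unit("CCT")
```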
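S503: Mark the number of repeats of the repeated base pair sequence using a third number of bits, and use the marked third number of bits as the third digital code.
The third number in the embodiments of the present application is configured as 8: 8 bits mark the number of repeats of the STR's repeated base pair sequence. The number of repeats is less than or equal to K, and K is generally 50, that is, the number of repeats is between 2 and 50. In the embodiments of the present application, headroom beyond the upper limit may be reserved, and the minimum number of bits is set to 8.
For example: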
5→00000101。
In specific execution, a program may be used to obtain the repeated sequence at the corresponding site and determine the number of repeats.
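One way such a program could determine the repeat count is to walk along the sequence from the start site while the repeat unit keeps matching. This is a sketch under assumptions: the start index is 0-based, the repeat unit is already known, and the function name and sample sequence are hypothetical.

```python
def count_tandem_repeats(sequence: str, start: int, unit: str) -> int:
    """Count consecutive copies of `unit` beginning at 0-based index `start`."""
    count = 0
    position = start
    while sequence.startswith(unit, position):
        count += 1
        position += len(unit)
    return count


# Hypothetical toy sequence: a 7-base flank, five CCT repeats, then another flank.
sequence = "GATTACA" + "CCT" * 5 + "GGA"
repeats = count_tandem_repeats(sequence, 7, "CCT")
third_code = format(repeats, "08b")  # 8-bit third digital code
```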
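S504: Concatenate the first digital code, the second digital code, and the third digital code of each STR, and use the concatenated digital code as the single STR digital code corresponding to each STR.
In specific execution, the first digital code, the second digital code, and the third digital code may be concatenated to obtain the single STR digital code of the STR, STRECD=StartPosition|RepeatedSeq|RpeatedCnt, with a total length of 53+36+8=97 bits. The single STR digital code STRECD is then unified to an 8*S-bit code by padding the left end with a 1010 sequence.
In the embodiments of the present application, concatenating the first, second, and third digital codes of each STR and using the result as the single STR digital code corresponding to each STR allows the generated single STR digital code to mark the related information of the STR completely. In addition, unifying the single STR digital code STRECD to an 8*S-bit code by padding the left end with a 1010 sequence facilitates subsequent software processing.
For example, S is 16 in this embodiment, so 31 bits are padded on the left end:
1010101010101010101010101010101,
which yields a 128-bit binary sequence code with the following value: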
10110101010101010101010101010101010001000000000000000000000000101100001111100000001011100000000000000000000000000011011010100000。
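The concatenation and left-end padding can be sketched as follows. The function names are hypothetical, and the padding is assumed to be the alternating pattern starting with 1 and truncated to the number of missing bits, which matches the 31-bit pad shown above. The sketch checks only lengths and the pad itself; it does not attempt to reproduce the 128-bit value printed in the text.

```python
def pad_left_1010(bits: str, total_bits: int) -> str:
    """Left-pad `bits` to `total_bits` with the repeating pattern 1010..."""
    missing = total_bits - len(bits)
    if missing < 0:
        raise ValueError("code is longer than the target width")
    return ("10" * ((missing + 1) // 2))[:missing] + bits


def build_strecd(first_code: str, second_code: str, third_code: str, s: int = 16) -> str:
    """Single STR digital code: concatenate the three codes, pad to 8*s bits."""
    return pad_left_1010(first_code + second_code + third_code, 8 * s)


# All-zero placeholder codes of the documented widths (53, 36, and 8 bits).
strecd = build_strecd("0" * 53, "0" * 36, "0" * 8)
```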
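S103: Perform a sequence transformation on each single STR digital code using a preset rule, and generate a target STR digital code from the transformed single STR digital codes.
In the embodiments of the present application, each single STR digital code is sequence-transformed using a preset rule so that the target STR digital code generated from the transformed single STR digital codes is unique.
In specific execution, each single STR digital code is sequence-transformed using a preset rule, and the transformed single STR digital codes are directly concatenated to obtain a concatenated STR digital code, which may be called the target STR digital code. The target STR digital code corresponds to the first preset number of STRs extracted in step S101; after multiple rounds of random extraction, one target STR digital code can be generated for the first preset number of STRs extracted in each round.
Alternatively, as an example, the M STRECDs obtained from the M STRs may be directly concatenated to obtain the target STR digital code MSTRECD of the M STRs, with a total of (8*S)*M bits.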
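The text does not specify the preset rule, so the sketch below uses a cyclic right rotation purely as a hypothetical placeholder for the sequence transformation; any bit-string-to-bit-string function of the same length could be passed in instead, and the identity function gives the direct concatenation variant. All names are illustrative.

```python
def rotate_right(bits: str, k: int) -> str:
    """Placeholder transformation: cyclically rotate a bit string right by k."""
    k %= len(bits)
    return bits[-k:] + bits[:-k] if k else bits


def build_mstrecd(strecds, transform) -> str:
    """Target STR digital code: transform each STRECD, then concatenate."""
    return "".join(transform(code) for code in strecds)


codes = ["10" * 64 for _ in range(50)]  # 50 placeholder 128-bit STRECDs
mstrecd = build_mstrecd(codes, lambda code: rotate_right(code, 3))
direct = build_mstrecd(codes, lambda code: code)  # direct concatenation
```

With S = 16 and M = 50, both results have (8*16)*50 = 6400 bits.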
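S104: Generate digest information of the target STR digital code, and use the digest information as the digest information of the STR to which the target STR digital code belongs.
Optionally, a hash algorithm may be used to generate the digest information of the target STR digital code, so that the target STR digital code corresponds to unique digest information.
In specific execution, a hash algorithm may be used to perform a digest calculation on the target STR digital code MSTRECD to obtain a calculation result, and an F-bit hash digest corresponding to the calculation result is generated as the digest information of the target STR digital code.
In the embodiments of the present application, F may be greater than or equal to 256; common values are, for example, 256, 512, and 1024.
The hash algorithm includes, for example, SHA256 and SHA512.
In the embodiments of the present application, by generating the digest information of the target STR digital code and using it as the digest information of the STR to which the target STR digital code belongs, the conversion ensures that each different target STR digital code maps to a binary sequence with different content and a fixed length of F, and the target STR digital code cannot be recovered from that binary sequence, which safeguards the confidentiality and security of the digital identity.
An example of the digest information of the STR to which the target STR digital code belongs is as follows: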
570EA6E6A236C6E48B482FAA4F9BDD6BD22325841D3ACF69A88CE08843C0143A。
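A digest of this kind could be computed with SHA-256 from Python's standard `hashlib`. The sketch assumes the bit string is first packed into bytes (big-endian) before hashing; the text does not say how the input is serialized, so a different serialization would give a different digest, and the function name is hypothetical.

```python
import hashlib


def digest_of(mstrecd: str) -> str:
    """SHA-256 digest (F = 256) of a bit string, as uppercase hexadecimal."""
    if len(mstrecd) % 8 != 0 or set(mstrecd) - {"0", "1"}:
        raise ValueError("expected a bit string whose length is a multiple of 8")
    # Fixed-length packing keeps any leading zero bits of the code.
    raw = int(mstrecd, 2).to_bytes(len(mstrecd) // 8, "big")
    return hashlib.sha256(raw).hexdigest().upper()


identity = digest_of("10" * 3200)  # placeholder 6400-bit MSTRECD
```

The result has 64 hexadecimal characters, the same length as the example digest shown above.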
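S105: Use the digest information of the STR as the generated digital identity.
In the embodiments of the present application, the digest information of the STR can serve as one digital identity of the user, and that digital identity can be used to uniquely identify the individual user. Based on the same genome data, a large number of mutually different digital identities can be generated.
In the embodiments of the present application, the value of the first preset number M is chosen to ensure that the digital identities generated for any two different users differ; M may be configured to be greater than or equal to 50. The value of J is configured to ensure that the digital identities generated for the same user differ each time; the range of J may be configured as M>=J>=30.
For example:
When there are N STRs, with N equal to 7000, M equal to 50, and J equal to 2, the number of digital identities that can be generated according to the method in the embodiments of the present application is approximately given by the following formula:
Figure PCTCN2018091880-appb-000001
The result is on the order of 5×e150.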
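Further, the digital identities generated according to the method in the embodiments of the present application can be used for identity recognition in digital technology applications. Typical application scenarios are described below:
1. User digital identity ID library: with the method in the embodiments of the present application, each user can generate a large number of mutually different digital identities, which can be used as a digital identity ID library, as shown in FIG. 6. FIG. 6 is a schematic diagram of a digital identity ID library in an embodiment of the present application. Unlike application scenarios in the related art, where a user's digital identity in a network application is fixed, a user can use a different digital identity for each operation in a digital system, which improves the security of the user's use of the system. In addition, in combination with a third-party service system, used digital identities can be invalidated to prevent subsequent fraudulent use by other users.
2. Digital identity recognition: the digital identity generated according to the method in the embodiments of the present application has high security. Using the whole genome data and the index information corresponding to the digital identity, the user can accurately verify the true correspondence between that identifier and the user.
3. User identification in a blockchain system: user information in a blockchain system needs a separate ID number for identification, and using a digital identity generated according to the method in the embodiments of the present application as the user information ID each time can improve ID security.
4. Transaction identification in a blockchain system: each transaction in a blockchain system needs a separate ID number for identification, and using a digital identity generated according to the method in the embodiments of the present application as the transaction ID can strengthen privacy protection. Specifically, transactions initiated by the same user use different digital identities of that user as transaction IDs. After a transaction is stored on the chain, only the user can verify, through the user's digital identity ID library, whether the transaction ID was initiated by that user.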
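The ID library scenarios above (one identity per operation, invalidation after use, owner-side verification) can be illustrated with a toy in-memory class. Everything here is hypothetical: the class, its method names, and the placeholder identity strings are illustrative only, and a real deployment would involve the third-party service and index information mentioned in the text.

```python
class IdentityLibrary:
    """Toy digital identity ID library: each identity is handed out once."""

    def __init__(self, identities):
        self._unused = list(identities)
        self._retired = set()

    def take(self) -> str:
        """Hand out an unused identity and retire it so it is never reused."""
        identity = self._unused.pop()
        self._retired.add(identity)
        return identity

    def was_issued_here(self, identity: str) -> bool:
        """Owner-side check that a transaction ID came from this library."""
        return identity in self._retired


library = IdentityLibrary(["ID-A", "ID-B", "ID-C"])  # placeholder identities
tx1 = library.take()
tx2 = library.take()
```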
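In this embodiment, a first preset number of STRs and the related information of each STR are extracted from whole genome data; a single STR digital code corresponding to each STR is generated according to that related information, yielding multiple single STR digital codes; each single STR digital code is sequence-transformed using a preset rule; a target STR digital code is generated from the transformed single STR digital codes; digest information of the target STR digital code is generated and used as the digest information of the STR to which the target STR digital code belongs; and the digest information of the STR is used as the generated digital identity. Because the digital identity is generated from the STRs in whole genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved and its expression effect is enhanced.
As an example, a specific example of the digital identity generation method in an embodiment of the present invention is as follows:
1. Extract polymorphic STRs from the whole genome data and use them as initial STRs.
2. Randomly extract the first preset number of different STRs from the multiple initial STRs, where at least a second preset number of the first preset number of different STRs are different, and the first preset number is greater than the second preset number.
The first preset number of different STRs may be configured so that at least the second preset number of them differ. The first preset number M is about 50 and the second preset number J is about 30. With M set to 50, the cumulative coincidence rate of the digital identities of two different users is below 10e-20, far better than other biometric technologies in the related art.
3. Extract the related information of each STR.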
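The related information of each STR extracted in the embodiments of the present application is shown in FIG. 2, a schematic diagram of the related information of each STR. The information is presented as a text document with one STR per line; the fields of each line, from left to right, have for example the following meanings:
Score ID;
Perc div;
Perc del;
Perc ins;
chromosome number;
start site in the query sequence;
end site in the query sequence;
sequence direction, where + denotes the forward direction;
base sequence of the repeated part;
repeat type, where simple_repeat denotes an STR;
start site within the repeated part;
end site within the repeated part.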
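4. Mark the position information of each STR in the whole genome data using a first number of bits, and use the marked first number of bits as the first digital code. The position information includes the chromosome number, the start site, and the fragment length of the STR.
As an example, the first number in the embodiments of the present application is configured as 53, and the first number of bits marks the chromosome number, the start site, and the fragment length of the STR. The chromosome number may be marked in binary using 5 bits; specifically, for the 24 chromosomes (chr1-chr22, chrx, chry), chromosome chr10 may for example be marked as 00010. The start site may be marked in binary using 40 bits. The fragment length of the STR may be marked in binary using 8 bits; for example, the CCT repeat region has a length of 23, and the left side may be padded with zeros to fill the bit width.
In specific execution, considering that there are 24 kinds of human chromosomes (22 autosomes and 2 sex chromosomes), and to accommodate later development and updates, the minimum number of bits for the chromosome number is set to 5; the start site lies between 1 and 10^9, and the minimum number of bits for the start site is set to 40; the fragment length of the STR lies between 1 and 600, and the minimum number of bits for the fragment length is set to 8.
For example: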
Chr10→00010;
90608→0000000000000000000000010110000111110000;
23→00010111。
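5. Mark the repeated base pair sequence information of each STR using a second number of bits, and use the marked second number of bits as the second digital code.
The repeated base pair sequence information includes the content of the STR's repeated base pair sequence and the length of the repeated base pair sequence.
The second number in the embodiments of the present application is configured as 36: 36 bits mark the content of the STR's repeated base pair sequence, with A replaced by 100, G by 111, C by 110, and T by 101, and the left side padded with zeros to fill the bit width. The length of the repeated base pair sequence is between 2 and 12, and the minimum number of bits is computed from the longest repeated base pair sequence length and the 3-bit string that replaces each base: 12*3=36.
For example: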
CCT→000000000000000000000000000110110101。
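6. Mark the number of repeats of the repeated base pair sequence using a third number of bits, and use the marked third number of bits as the third digital code.
The third number in the embodiments of the present application is configured as 8: 8 bits mark the number of repeats of the STR's repeated base pair sequence. The number of repeats is less than or equal to K, and K is generally 50, that is, the number of repeats is between 2 and 50. In the embodiments of the present application, headroom beyond the upper limit may be reserved, and the minimum number of bits is set to 8.
For example: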
5→00000101。
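7. Concatenate the first digital code, the second digital code, and the third digital code of each STR, and use the concatenated digital code as the single STR digital code corresponding to each STR.
In specific execution, the first digital code, the second digital code, and the third digital code may be concatenated to obtain the single STR digital code of the STR, STRECD=StartPosition|RepeatedSeq|RpeatedCnt, with a total length of 53+36+8=97 bits. The single STR digital code STRECD is then unified to an 8*S-bit code by padding the left end with a 1010 sequence.
For example, S is 16 in this embodiment, so 31 bits are padded on the left end:
1010101010101010101010101010101,
which yields a 128-bit binary sequence code with the following value: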
10110101010101010101010101010101010001000000000000000000000000101100001111100000001011100000000000000000000000000011011010100000。
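8. Perform a sequence transformation on each single STR digital code using a preset rule, and generate a target STR digital code from the transformed single STR digital codes.
As an example, the M STRECDs obtained from the M STRs may also be directly concatenated to obtain the target STR digital code MSTRECD of the M STRs, with a total of (8*S)*M bits.
An example of the target STR digital code MSTRECD is as follows: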
101101010101010101010101010101010100010000000000000000000000001010000011111000000010111000000000000000000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010010000111110000000101110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000101100001111100000001010100000000000010011010110000101101010101010101010101010101010100010000000000000000000000001011100011111000000011111000000000000110110101000001011010101010101010101010101010101000010000000000000000000000010110000111110000000101110000000000000000000000000001101101010000010110101010101010101010101010101010001000000000000000000000000101000001111100000001011100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001001000011111000000010111000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010110000100110000000101010000000000001001101011100010110101010101010101010101010101010001000000000000000000000000110110001111100000001111100000000000011011010100000101101010101010101010101010101010100001000000000000000000000001011000011110000000010111000000000000010110101100001011010101010101010101010101010101000100000000000000000000000010100000111110000000101010000000000001101101010000010110101010101010101010101010101010001000000000000000000000000100100001111100000001011100000000000000000000000000011011010100000101101010101010101010101010101010100010000000000000000000000001011000011111000000010101000000000000100110101100001011010101010101010101010101010101000100000000000000000000000010111000111110000000111010000000000001101101010000010110101010101010101010101010101010000100000000000000000000000101100001111100000001101100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001010000010111000000000000000000010111000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010010000011100000000101110000000000001101101 
01000001011010101010101010101010101010101000100000000000000000000000010110000100110000000101010000000000001001101011100010110101010101010101010101010101010001000000000000000000000000110000001111100000001111100000000000011011010100000101101010101010101010101010101010100001000000000000000000000001011000011110000000010111000000000000000000000000000010110101100001011010101010101010101010101010101000100000000000000000000000010100000111110000000101110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000100100001111100000001011100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001011000011111000000010101000000000000100110101100001011010101010101010101010101010101000100000000000000000000000010111000111110000000111110000000000001101101010000010110101010101010101010101010101010000100000000000000000000000101100001111100000001011100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001010000011101000000010110000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010010000110110000001101110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000101100001001100000001010100000000000010011010111000101101010101010101010101010101010100010000000000000000000000001101100011110000000011111000000000000110110101000001011010101010101010101010101010101000010000000000000000000000010110000111100000000100010000000000001100101011000010110101010101010101010101010101010001000000000000000000000000101000001111100000001011100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001001000011111000000010111000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010110000111110000000101010000000000001001101011000010110101010101010101010101010101010001000000000000000000000000101110001111100000001111100000000000011011010100000101101010101010101010101010101010100001000000000000000000
0000010110000111110000000101110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000101000000011100000001011100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001001000011111000000010111000000000000110111101000001011010101010101010101010101010101000100000000000000000000000010110000100110000000101010 000000000001001101011100010110101010101010101010101010101010001000000000000000000000000111110001111100000001111100000000000011011010100000101101010101010101010101010101010100001000000000000000000000001011000011110000000010111000000000000010110101100001011010101010101010101010101010101000100000000000000000000000010100000111110000000101110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000100100001111100000001011100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001011000011111000000010101000000000000100110101100001011010101010101010101010101010101000100000000000000000000000010111000111110000000111110000000000001101101010000010110101010101010101010101010101010000100000000000000000000000101100001111100000001010100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001010000011111000000010101000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010010000111110000000111110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000101100001001100000001010100000000000010011010111000101101010101010101010101010101010100010000000000000000000000001101100011111000000010111000000000000110110101000001011010101010101010101010101010101000010000000000000000000000011110000111110000000101110000000000000101101011000010110101010101010101010101010101010001000000000000000000000000101100001111100000001011100000000000011011010110000101101010101010101010101010101010100010000000000000000000000001011000011111000000010111000000000000110110101000001011010101010101010101
0101010101010001000000000000000000000000101100001111100000001001100000000000011011010100000101101010101010101010101010101010100010000000000000000000000001011000011111000000010110000000000000110110101000001011010101010101010101010101010101000100000000000000000000000010110000111110000000101110000000000001101101010000010110101010101010101010101010101010001000000000000000000000000101100001111100000001011100000000000011011010100000。
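The direct concatenation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `concat_codes`, the toy codes and the value `S = 4` (bytes per single-STR code) are assumptions chosen only to show that the result has (8*S)*M bits.

```python
def concat_codes(codes, bits_per_code):
    """Concatenate M fixed-width single-STR bit strings into one target code."""
    for code in codes:
        # Every code must be a '0'/'1' string of exactly the expected width.
        if len(code) != bits_per_code or set(code) - {"0", "1"}:
            raise ValueError("each code must be a bit string of the expected width")
    return "".join(codes)


S = 4  # hypothetical: bytes per single-STR code, so each code has 8*S bits
codes = ["0" * 31 + "1", "1" * 32, "01" * 16]  # M = 3 toy single-STR codes
target = concat_codes(codes, 8 * S)
print(len(target))  # total bit count, (8*S)*M
```

Validating the width of each code before joining keeps the total length predictable, which matters because the later digest depends on every bit of the concatenated string.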
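9. A hash algorithm is used to generate the digest information of the target STR digital code, so that the target STR digital code corresponds to unique digest information.
An example of the digest information of the STRs to which the target STR digital code belongs is as follows:
570EA6E6A236C6E48B482FAA4F9BDD6BD22325841D3ACF69A88CE08843C0143A.
10. The digest information of the STRs is used as the generated digital identity.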
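Steps 9 and 10 can be sketched as follows. The text does not name the hash algorithm; SHA-256 is assumed here only because the example digest is 64 hexadecimal characters (256 bits) long. The function name `digest_of_bitstring` and the right-padding of the bit string to a whole number of bytes are likewise assumptions.

```python
import hashlib


def digest_of_bitstring(bits: str) -> str:
    """Pack a '0'/'1' string into bytes (zero-padded on the right) and hash it."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    # Pad to a multiple of 8 bits so the string maps onto whole bytes.
    padded = bits + "0" * (-len(bits) % 8)
    data = int(padded, 2).to_bytes(len(padded) // 8, "big")
    # SHA-256 is an assumption; any collision-resistant hash fits the description.
    return hashlib.sha256(data).hexdigest().upper()


identity = digest_of_bitstring("1011010101010101")
print(len(identity))  # number of hexadecimal characters in the digest
```

Because the digest is a deterministic function of the target code, the same target STR digital code always yields the same identity, while any change in the selected STRs or their order changes it.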
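In the embodiments of this application, the value of the first preset number M can ensure that the digital identities generated for any two different users are different; the embodiments may configure M to be greater than or equal to 50. The value of J is configured to ensure that the digital identity generated for the same user is different each time; the embodiments may configure J in the range M >= J >= 30.
For example:
When there are N short tandem repeats (STRs), with N = 7000, M = 50 and J = 2, the number of digital identities that can be produced by the method of the embodiments of this application is approximately equal to the value given by the following formula: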
Figure PCTCN2018091880-appb-000002
The result is on the order of approximately 5×e 150.
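Fig. 7 is a schematic structural diagram of a digital identity generation apparatus according to an embodiment of this application.
Referring to Fig. 7, the apparatus 700 includes: a processor 701; a memory 702; the memory 702 stores executable program code; the processor 701 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 702, so as to perform:
extracting, from whole-genome data, a first preset number of short tandem repeats (STRs) and the related information of each STR;
generating, according to the related information of each STR, a single-STR digital code corresponding to each STR, to obtain a plurality of single-STR digital codes;
performing a sequence transformation on each single-STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single-STR digital codes;
generating digest information of the target STR digital code, and taking the digest information as the digest information of the STRs to which the target STR digital code belongs;
using the digest information of the STRs as the generated digital identity.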
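The sequence-transformation and target-code steps listed above can be sketched as follows. The preset rule is not specified in this passage, so a cyclic left rotation by a fixed offset `k` stands in for it purely as a hypothetical example; `rotate_left` and `build_target_code` are illustrative names, not part of the application.

```python
def rotate_left(bits: str, k: int) -> str:
    """Cyclically rotate a bit string left by k positions (a stand-in preset rule)."""
    if not bits:
        raise ValueError("empty code")
    k %= len(bits)
    return bits[k:] + bits[:k]


def build_target_code(single_codes, k=3):
    """Transform every single-STR code with the same rule, then concatenate."""
    return "".join(rotate_left(code, k) for code in single_codes)


single_codes = ["10010000", "00000001"]  # toy single-STR digital codes
target_code = build_target_code(single_codes, k=3)
print(target_code)
```

A rotation is reversible and preserves the code length, so the target code keeps its fixed bit count; other rules (for example a keyed permutation) would fit the same interface.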
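Optionally, in some embodiments, the processor 701 is further configured to:
extract polymorphic short tandem repeats (STRs) from the whole-genome data as initial STRs;
randomly extract the first preset number of different STRs from the plurality of initial STRs, wherein at least a second preset number of the STRs among the first preset number of different STRs are different, and the first preset number is greater than the second preset number.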
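The random selection can be sketched as follows, under one reading of the constraint: a new selection of M STRs must contain at least J STRs that were not in a previous selection for the same user. That reading, the function name `select_strs`, the retry limit and the toy pool of STR labels are all assumptions made for illustration.

```python
import random


def select_strs(initial_strs, m, j, previous=None, rng=None, max_tries=1000):
    """Randomly pick m distinct STRs; if a previous selection is given,
    require that at least j of the picks are not in it."""
    if not (0 < j < m <= len(initial_strs)):
        raise ValueError("need 0 < j < m <= number of initial STRs")
    rng = rng or random.Random()
    prev = set(previous or ())
    for _ in range(max_tries):
        picked = rng.sample(initial_strs, m)  # sampling without replacement
        if not prev or len(set(picked) - prev) >= j:
            return picked
    raise RuntimeError("could not satisfy the difference constraint")


pool = [f"STR{i}" for i in range(7000)]  # hypothetical labels for N = 7000 initial STRs
first = select_strs(pool, m=50, j=30, rng=random.Random(0))
second = select_strs(pool, m=50, j=30, previous=first, rng=random.Random(1))
print(len(first), len(set(second) - set(first)))
```

Rejection sampling is simple and adequate when the pool is much larger than M, since two independent selections then rarely overlap.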
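Optionally, in some embodiments, the single-STR digital code indicates the position information of its corresponding STR in the whole-genome data, the repeated base-pair sequence information, and the number of repetitions of the repeated base-pair sequence, and the processor 701 is further configured to:
mark the position information of each STR in the whole-genome data with a first number of bits, and take the marked first number of bits as a first digital code, the position information including: the chromosome number, the start site and the fragment length of the STR;
mark the repeated base-pair sequence information of each STR with a second number of bits, and take the marked second number of bits as a second digital code;
mark the number of repetitions of the repeated base-pair sequence with a third number of bits, and take the marked third number of bits as a third digital code;
concatenate the first digital code, the second digital code and the third digital code of each STR, and take the concatenated digital code as the single-STR digital code corresponding to that STR.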
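The three-part encoding above can be sketched as follows. All field widths (8, 32 and 16 bits for position, 4 + 12 bits for the repeat unit, 16 bits for the repeat count), the two-bit base alphabet and the names `to_bits` and `encode_str` are assumptions for illustration; this passage does not fix them.

```python
BASE_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}  # assumed 2-bit base code


def to_bits(value: int, width: int) -> str:
    """Render a non-negative integer as a fixed-width bit string."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def encode_str(chrom, start, length, unit, repeats, max_unit=6):
    """Build one single-STR digital code from its three parts."""
    if not 1 <= len(unit) <= max_unit:
        raise ValueError("repeat unit length out of range")
    # First digital code: chromosome number, start site, fragment length.
    first = to_bits(chrom, 8) + to_bits(start, 32) + to_bits(length, 16)
    # Second digital code: unit length, then the bases, right-padded with zeros.
    bases = "".join(BASE_BITS[b] for b in unit).ljust(2 * max_unit, "0")
    second = to_bits(len(unit), 4) + bases
    # Third digital code: number of repetitions of the unit.
    third = to_bits(repeats, 16)
    return first + second + third


code = encode_str(chrom=1, start=100, length=20, unit="GATA", repeats=5)
print(len(code))  # 56 + 16 + 16 bits
```

Fixed-width fields make every single-STR code the same length, which is what allows the later concatenation to have a predictable total size.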
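Optionally, in some embodiments, the processor 701 is further configured such that:
the target STR digital code generated from the sequence-transformed single-STR digital codes is unique.
Optionally, in some embodiments, the processor 701 is further configured to:
generate the digest information of the target STR digital code using a hash algorithm, so that the target STR digital code corresponds to exactly one piece of digest information.
It should be noted that the explanations given for the digital identity generation method in the embodiments of Figs. 1 to 6 also apply to the digital identity generation apparatus 700 of this embodiment; the implementation principle is similar and is not repeated here.
In this embodiment, a first preset number of short tandem repeats (STRs) and the related information of each STR are extracted from whole-genome data; a single-STR digital code corresponding to each STR is generated from that related information, yielding a plurality of single-STR digital codes; each single-STR digital code is sequence-transformed using a preset rule, and a target STR digital code is generated from the sequence-transformed single-STR digital codes; digest information of the target STR digital code is generated and taken as the digest information of the STRs to which the target STR digital code belongs; and the digest information of the STRs is used as the generated digital identity. Because the digital identity is generated from the STRs in whole-genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved, as is its expressiveness.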
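To implement the above embodiments, this application further provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium is used to store an application program which, when run, executes the digital identity generation method of the embodiments of this application, the method including:
extracting, from whole-genome data, a first preset number of short tandem repeats (STRs) and the related information of each STR;
generating, according to the related information of each STR, a single-STR digital code corresponding to each STR, to obtain a plurality of single-STR digital codes;
performing a sequence transformation on each single-STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single-STR digital codes;
generating digest information of the target STR digital code, and taking the digest information as the digest information of the STRs to which the target STR digital code belongs;
using the digest information of the STRs as the generated digital identity.
It should be noted that the way the application program of this embodiment executes the digital identity generation method, together with its principle and implementation, is similar to the digital identity generation method of the above embodiments; to avoid redundancy, it is not repeated here.
With the non-transitory computer-readable storage medium of the embodiments of this application, a first preset number of short tandem repeats (STRs) and the related information of each STR are extracted from whole-genome data; a single-STR digital code corresponding to each STR is generated from that related information, yielding a plurality of single-STR digital codes; each single-STR digital code is sequence-transformed using a preset rule, and a target STR digital code is generated from the sequence-transformed single-STR digital codes; digest information of the target STR digital code is generated and taken as the digest information of the STRs to which the target STR digital code belongs; and the digest information of the STRs is used as the generated digital identity. Because the digital identity is generated from the STRs in whole-genome data, it is unique and hard to copy; and because the STRs are digitally encoded and sequence-transformed, the confidentiality and security of the generated digital identity are effectively improved, as is its expressiveness.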
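It should be noted that, in the description of this application, the terms "first", "second" and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of this application, unless otherwise stated, "a plurality of" means two or more.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application belong.
It should be understood that the parts of this application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of this application may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of this application have been shown and described above, it is to be understood that they are exemplary and are not to be construed as limiting this application; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of this application.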

Claims (13)

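  1. A digital identity generation method, characterized by comprising the following steps:
    extracting, from whole-genome data, a first preset number of short tandem repeats (STRs) and the related information of each STR;
    generating, according to the related information of each STR, a single-STR digital code corresponding to each STR, to obtain a plurality of single-STR digital codes;
    performing a sequence transformation on each single-STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single-STR digital codes;
    generating digest information of the target STR digital code, and taking the digest information as the digest information of the STRs to which the target STR digital code belongs;
    using the digest information of the STRs as the generated digital identity.
  2. The digital identity generation method according to claim 1, characterized in that extracting the first preset number of short tandem repeats (STRs) from the whole-genome data comprises:
    extracting polymorphic short tandem repeats (STRs) from the whole-genome data as initial STRs;
    randomly extracting the first preset number of different STRs from a plurality of the initial STRs, wherein at least a second preset number of the STRs among the first preset number of different STRs are different, and the first preset number is greater than the second preset number.
  3. The digital identity generation method according to claim 1, characterized in that the single-STR digital code indicates the position information of its corresponding STR in the whole-genome data, the repeated base-pair sequence information, and the number of repetitions of the repeated base-pair sequence, and generating, according to the related information of each STR, the single-STR digital code corresponding to each STR to obtain the plurality of single-STR digital codes comprises:
    marking the position information of each STR in the whole-genome data with a first number of bits, and taking the marked first number of bits as a first digital code, the position information including: the chromosome number, the start site and the fragment length of the STR;
    marking the repeated base-pair sequence information of each STR with a second number of bits, and taking the marked second number of bits as a second digital code;
    marking the number of repetitions of the repeated base-pair sequence with a third number of bits, and taking the marked third number of bits as a third digital code;
    concatenating the first digital code, the second digital code and the third digital code of each STR, and taking the concatenated digital code as the single-STR digital code corresponding to that STR.
  4. The digital identity generation method according to claim 1, characterized in that
    the target STR digital code generated from the sequence-transformed single-STR digital codes is unique.
  5. The digital identity generation method according to claim 1, characterized in that generating the digest information of the target STR digital code, and taking the digest information as the digest information of the STRs to which the target STR digital code belongs, comprises:
    generating the digest information of the target STR digital code using a hash algorithm, so that the target STR digital code corresponds to exactly one piece of digest information.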
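  6. A digital identity generation apparatus, characterized by comprising:
    a processor;
    a memory;
    the memory stores executable program code; the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform:
    extracting, from whole-genome data, a first preset number of short tandem repeats (STRs) and the related information of each STR;
    generating, according to the related information of each STR, a single-STR digital code corresponding to each STR, to obtain a plurality of single-STR digital codes;
    performing a sequence transformation on each single-STR digital code using a preset rule, and generating a target STR digital code from the sequence-transformed single-STR digital codes;
    generating digest information of the target STR digital code, and taking the digest information as the digest information of the STRs to which the target STR digital code belongs;
    using the digest information of the STRs as the generated digital identity.
  7. The digital identity generation apparatus according to claim 6, characterized in that the processor is further configured to:
    extract polymorphic short tandem repeats (STRs) from the whole-genome data as initial STRs;
    randomly extract the first preset number of different STRs from a plurality of the initial STRs, wherein at least a second preset number of the STRs among the first preset number of different STRs are different, and the first preset number is greater than the second preset number.
  8. The digital identity generation apparatus according to claim 6, characterized in that the single-STR digital code indicates the position information of its corresponding STR in the whole-genome data, the repeated base-pair sequence information, and the number of repetitions of the repeated base-pair sequence, and the processor is further configured to:
    mark the position information of each STR in the whole-genome data with a first number of bits, and take the marked first number of bits as a first digital code, the position information including: the chromosome number, the start site and the fragment length of the STR;
    mark the repeated base-pair sequence information of each STR with a second number of bits, and take the marked second number of bits as a second digital code;
    mark the number of repetitions of the repeated base-pair sequence with a third number of bits, and take the marked third number of bits as a third digital code;
    concatenate the first digital code, the second digital code and the third digital code of each STR, and take the concatenated digital code as the single-STR digital code corresponding to that STR.
  9. The digital identity generation apparatus according to claim 6, characterized in that the processor is further configured such that:
    the target STR digital code generated from the sequence-transformed single-STR digital codes is unique.
  10. The digital identity generation apparatus according to claim 6, characterized in that the processor is further configured to:
    generate the digest information of the target STR digital code using a hash algorithm, so that the target STR digital code corresponds to exactly one piece of digest information.
  11. A digital identity generation system, characterized by comprising:
    the digital identity generation apparatus according to any one of claims 6 to 10.
  12. A non-transitory computer-readable storage medium having instructions stored therein, wherein, when the instructions are executed by a processor of an electronic device, the processor performs the digital identity generation method according to any one of claims 1 to 5.
  13. A computer program product, wherein, when instructions in the computer program product are executed by a processor, the digital identity generation method according to any one of claims 1 to 5 is performed.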
PCT/CN2018/091880 2018-06-19 2018-06-19 Digital identity generation method, apparatus, system and storage medium WO2019241913A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2018/091880 WO2019241913A1 (zh) 2018-06-19 2018-06-19 Digital identity generation method, apparatus, system and storage medium
EP18923623.5A EP3812952A4 (en) 2018-06-19 2018-06-19 METHOD, DEVICE AND SYSTEM FOR DIGITAL IDENTIFICATION AND STORAGE MEDIUM
US17/122,361 US11822629B2 (en) 2018-06-19 2020-12-15 Method and apparatus for generating digital identity and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/091880 WO2019241913A1 (zh) 2018-06-19 2018-06-19 Digital identity generation method, apparatus, system and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/122,361 Continuation US11822629B2 (en) 2018-06-19 2020-12-15 Method and apparatus for generating digital identity and storage medium

Publications (1)

Publication Number Publication Date
WO2019241913A1 (zh) 2019-12-26

Family

ID=68983105

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/091880 WO2019241913A1 (zh) 2018-06-19 2018-06-19 Digital identity generation method, apparatus, system and storage medium

Country Status (3)

Country Link
US (1) US11822629B2 (zh)
EP (1) EP3812952A4 (zh)
WO (1) WO2019241913A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4143722A4 (en) * 2020-04-29 2023-10-25 Trellis Health Systems, Inc. ANONYMOUS DIGITAL IDENTITY FROM INDIVIDUAL GENOME INFORMATION

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11922017B2 (en) * 2021-04-27 2024-03-05 Apple Inc. Compact genome data storage with random access

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526975A (zh) * 2008-03-07 2009-09-09 樊世斌 Method for converting gene sequencing information into digital information
WO2013028699A2 (en) * 2011-08-21 2013-02-28 The Board Of Regents Of The University Of Texas System Cell line discernment using short tandem repeat
CN106520982A (zh) * 2016-12-05 2017-03-22 中国人民解放军军事医学科学院放射与辐射医学研究所 Composite typing system for identity identification
CN106906300A (zh) * 2017-04-21 2017-06-30 为朔医学数据科技(北京)有限公司 Gene identity card and preparation method thereof

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1237327A3 (en) * 2001-03-01 2003-07-02 NTT Data Technology Corporation Method and system for individual authentication and digital signature utilizing article having DNA based ID information mark
US20030086591A1 (en) * 2001-11-07 2003-05-08 Rudy Simon Identity card and tracking system
NL2003311C2 (en) * 2009-07-30 2011-02-02 Intresco B V Method for producing a biological pin code.
US9094211B2 (en) * 2011-08-26 2015-07-28 Life Technologies Corporation Systems and methods for identifying an individual
BR112015005429A2 (pt) * 2012-09-11 2017-07-04 Theranos Inc sistemas de gestão de informação e métodos usando uma assinatura biológica
US20150254912A1 (en) * 2014-03-04 2015-09-10 Adamov Ben-Zvi Technologies LTD. DNA based security
FR3027753B1 (fr) * 2014-10-28 2021-07-09 Morpho Procede d'authentification d'un utilisateur detenant un certificat biometrique
US11468194B2 (en) * 2017-05-11 2022-10-11 Ethan Huang Methods and systems for anonymizing genome segments and sequences and associated information
US11539516B2 (en) * 2017-10-27 2022-12-27 Eth Zurich Encoding and decoding information in synthetic DNA with cryptographic keys generated based on polymorphic features of nucleic acids

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3812952A4 *

Also Published As

Publication number Publication date
US20210150005A1 (en) 2021-05-20
US11822629B2 (en) 2023-11-21
EP3812952A4 (en) 2022-02-09
EP3812952A1 (en) 2021-04-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18923623

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018923623

Country of ref document: EP

Effective date: 20210119