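WO2022145135A1 - Information processing method, information processing device, and information processing program - Google Patents
Information processing method, information processing device, and information processing program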
- Publication number
- WO2022145135A1 (application PCT/JP2021/041415)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- information
- information processing
- region
- incentive
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- This disclosure relates to a technique for collecting genetic data.
- SNP genotype imputation estimates the genotype of a region that cannot be acquired by an SNP (single nucleotide polymorphism) microarray.
- SNP genotype imputation uses reference data containing a high density of information indicating SNP genotypes.
- To obtain high-density reference data, gene data in regions with low data density, that is, rare gene data, must be collected efficiently rather than at random.
- Patent Document 1 discloses a method for providing life information data that makes it difficult to expose life information data and to falsify or alter genomic data by using blockchain technology.
- Patent Document 2 discloses an information trading device that presents a compensation amount to each information provider and then provides information users only with the user information of providers whose consent has been obtained. The device adjusts the compensation amount according to the acquisition status of the user information.
- The present disclosure has been made to solve the above problems, and its object is to provide a technique capable of efficiently collecting rare genetic data.
- The information processing method is a method in an information processing apparatus that performs information processing using reference data.
- The reference data is data in which a base sequence indicating the genotype of a genome and a data density corresponding to each locus of the base sequence are associated in advance.
- The method acquires gene data that is detected by a gene detection device and includes a base sequence indicating the user's genotype, and specifies the region of the reference data in which the gene data is located.
- Based on the data density associated with the specified region, a rarity indicating how rare the gene data is is calculated, and an incentive to be given to the user is calculated according to the calculated rarity.
- The calculated incentive is output.
- According to the present disclosure, rare genetic data can be efficiently collected.
- FIG. 1 is a diagram showing an example of the overall configuration of an information processing system to which the information processing apparatus in Embodiment 1 of this disclosure is applied.
- FIG. 2 is a block diagram showing an example of the configuration of the information processing apparatus shown in FIG. 1.
- FIG. 3 is an explanatory diagram of terms related to gene analysis.
- FIG. 4 is a diagram showing an example of the data structure of the reference data.
- FIG. 5 is a diagram representing the reference data according to the data density.
- FIG. 6 is a flowchart showing an example of the processing of the information processing apparatus in Embodiment 1 of this disclosure.
- FIG. 7 is a block diagram showing an example of the configuration of the information processing apparatus in Embodiment 2 of this disclosure.
- FIG. 8 is a diagram showing an example of the data structure of the area reference data.
- FIG. 9 is a diagram representing the area reference data shown in FIG. 8 according to the data density.
- FIG. 10 is a flowchart showing an example of the processing of the information processing apparatus in Embodiment 2 of this disclosure.
- In SNP genotype imputation, the genotypes of tens of millions of SNPs are statistically inferred from the genetic data obtained by the SNP microarray.
- The genotype of an SNP in an unobserved region is estimated by interpolating the base sequence of the gene data obtained by the SNP microarray with the base sequence in the reference data.
- This requires reference data carrying SNP genotypes at high density. To build such data, gene data corresponding to regions with low data density, that is, rare gene data, must be collected efficiently rather than at random.
- Patent Document 1 discloses that the blockchain technique is used to provide the life information data of the second user encrypted with the public key of the second user to the first user who has succeeded in user authentication.
- Patent Document 1 addresses only the prevention of exposure of life information data and of falsification or alteration of genomic data. Therefore, Patent Document 1 cannot efficiently collect rare genetic data.
- In Patent Document 2, the user information provided by the information provider is personal information including position information, barometric pressure information, sound collection information, illuminance information, frequency information, age, occupation, and annual income, and is not genetic data. Therefore, Patent Document 2 cannot determine an appropriate incentive for the information provider according to the rarity of the gene data and, as a result, cannot efficiently collect rare gene data.
- The information processing method is a method in an information processing apparatus that performs information processing using reference data.
- The reference data is data in which a base sequence indicating the genotype of a genome and a data density corresponding to each locus of the base sequence are associated in advance.
- The method acquires gene data that is detected by a gene detection device and includes a base sequence indicating the user's genotype, and specifies the region of the reference data in which the gene data is located.
- Based on the data density associated with the specified region, a rarity indicating how rare the gene data is is calculated, and an incentive to be given to the user is calculated according to the calculated rarity.
- The calculated incentive is output.
- According to this configuration, the region where the genetic data provided by the user is located is specified, and the rarity of the genetic data is calculated based on the data density associated with the specified region. The incentive given to the user is then calculated according to the rarity, and the calculated incentive is output. A user who provided gene data with high rarity can therefore be given a higher incentive than a user who provided gene data with low rarity. As a result, rare genetic data can be efficiently collected.
- In the above information processing method, the gene data may be associated with attribute information including the user's attributes, the degree of contribution of the gene data to gene analysis may be calculated based on the attribute information, and, in the calculation of the incentive, the incentive may be calculated according to both the rarity and the degree of contribution.
- the degree of contribution to gene analysis is calculated based on the attribute information, and the incentive is calculated by further taking the calculated degree of contribution into consideration. Therefore, it is possible to motivate the user to provide the attribute information and efficiently collect the genetic data associated with the useful attribute information.
- The gene data may be associated with locus information indicating the locus of the base sequence indicating the genotype, and, in the calculation of the rarity, the region of the reference data in which the gene data is located may be specified based on the locus information.
- Because the locus information indicating the locus of the gene is associated with the gene data, the region in which the gene data is located can be easily specified in the reference data.
- The attribute information may include information indicating the place of residence of the user.
- The reference data may include a plurality of area reference data, each corresponding to a predetermined area.
- In specifying the region, the region in which the genetic data is located may be specified in the area reference data corresponding to the information regarding the place of residence.
- the estimation accuracy will be improved.
- The genetic data of a user who lives in an area whose area reference data has low data density is rarer than the genetic data of a user who lives in an area whose area reference data has high data density. According to this configuration, an incentive can be calculated according to the place of residence of the user who provided the genetic data. Therefore, genetic data that is rare from a regional viewpoint can be efficiently collected.
- In the calculation of the degree of contribution, it may be determined whether or not the attribute information includes information indicating the blood relationship of the user.
- When it is determined that the information indicating the blood relationship is included, the degree of contribution may be calculated higher than when it is determined that the information is not included.
- the attribute information includes information indicating the blood relationship of the user
- In the above information processing method, in the calculation of the degree of contribution, the degree of contribution may be calculated higher as the amount of information indicating the blood relationship included in the attribute information increases.
- In the calculation of the degree of contribution, it may be determined whether or not the attribute information includes information indicating the life pattern of the user.
- When it is determined that the information indicating the life pattern is included, the degree of contribution may be calculated higher than when it is determined that the information is not included.
- When the attribute information includes the life pattern of the user, a higher incentive can be given to the user. This motivates the user to provide life pattern data useful in epigenetics research, so that life pattern data can be efficiently collected.
- In the above information processing method, in the calculation of the degree of contribution, the degree of contribution may be calculated higher as the amount of information indicating the life pattern of the user included in the attribute information increases.
- The information processing apparatus is an apparatus that performs information processing using reference data, the reference data being data in which a base sequence indicating the genotype of a genome and a data density corresponding to each locus of the base sequence are associated in advance.
- The apparatus includes an acquisition unit that acquires gene data that is detected by a gene detection device and includes a base sequence indicating the user's genotype;
- a region specifying unit that specifies the region of the reference data in which the gene data is located;
- a rarity calculation unit that calculates a rarity indicating how rare the genetic data is, based on the data density associated with the region specified by the region specifying unit; an incentive calculation unit that calculates an incentive to be given to the user according to the rarity; and an output unit that outputs the incentive.
- The information processing program causes a computer to function as an information processing device that performs information processing using reference data, the reference data being data in which a base sequence indicating the genotype of a genome and a data density corresponding to each locus of the base sequence are associated in advance.
- The program causes the computer to function as an acquisition unit that acquires gene data that is detected by a gene detection device and includes a base sequence indicating the user's genotype, a region specifying unit that specifies the region of the reference data in which the genetic data is located, and a rarity calculation unit that calculates a rarity indicating how rare the genetic data is based on the data density associated with the region specified by the region specifying unit.
- This disclosure can also be realized as an information processing system operated by such an information processing program.
- The information processing program can be distributed via a computer-readable non-transitory recording medium such as a CD-ROM, or via a communication network such as the Internet.
- FIG. 1 is a diagram showing an example of an overall configuration of an information processing system to which the information processing apparatus 1 according to the first embodiment of the present disclosure is applied.
- the information processing system includes an information processing device 1, a providing terminal 2, and a user terminal 3.
- The information processing apparatus 1, the providing terminal 2, and the user terminal 3 are connected so as to be able to communicate with each other via the network NT.
- the information processing device 1 is composed of, for example, a cloud server including one or more computers.
- the information processing apparatus 1 receives the gene data provided by the user from the providing terminal 2, and calculates an incentive to be given to the user based on the received gene data.
- the providing terminal 2 is composed of, for example, a computer owned by a medical institution, and transmits genetic data to the information processing device 1.
- Gene data is data that is detected by a gene detection device and contains a base sequence indicating a user's genotype.
- As the gene detection device, for example, an SNP microarray can be adopted.
- In an SNP microarray, DNA fragments called probes, which detect differences in bases, are densely spread on the chip. SNP microarrays detect hundreds of thousands of SNP genotypes.
- the gene detection device is not limited to the SNP microarray, and other devices may be adopted.
- the gene data is associated with a user identifier that identifies the user who provides the gene data. Further, the gene data is associated with locus information indicating the locus of the base sequence indicating the genotype of SNP. This locus information is information indicating the locus on the genome of the base sequence indicating the genotype of SNP.
- the user terminal 3 is an information processing device possessed by a user who provides genetic data. Specifically, the user terminal 3 is composed of, for example, a mobile information terminal such as a smartphone and a tablet terminal, or a stationary computer such as a laptop computer. The user terminal 3 acquires the attribute information input by the user and transmits the acquired attribute information to the information processing apparatus 1.
- Network NT is composed of a wide area communication network including, for example, the Internet and a mobile phone communication network.
- the gene data is transmitted from the providing terminal 2 to the information processing device 1, but the present disclosure is not limited to this, and the gene data may be transmitted from the user terminal 3 to the information processing device 1.
- the user terminal 3 may acquire the gene data detected by the SNP microarray, associate it with the attribute information, and transmit it to the information processing apparatus 1.
- the attribute information may be transmitted from the providing terminal 2.
- the providing terminal 2 may acquire the gene data detected by the SNP microarray, associate it with the attribute information, and transmit it to the information processing apparatus 1.
- FIG. 2 is a block diagram showing an example of the configuration of the information processing apparatus 1 shown in FIG.
- the information processing device 1 includes a communication unit 11, a processor 12, and a memory 13.
- the communication unit 11 is composed of a communication circuit for connecting the information processing device 1 to the network NT.
- the communication unit 11 receives the gene data transmitted from the providing terminal 2.
- a user identifier and locus information are associated with the genetic data received here.
- the communication unit 11 receives the attribute information transmitted from the user terminal 3.
- the user identifier is associated with the received attribute information.
- the memory 13 is composed of a non-volatile storage device such as an SSD (Solid State Drive) or an HDD (Hard Disk Drive).
- the memory 13 stores the reference data 131 and the incentive information 132.
- Reference data 131 is reference data used in genotype imputation, and is data in which a base sequence indicating the genotype of the human genome and a data density according to the locus of the base sequence are associated with each other.
- FIG. 3 is an explanatory diagram of terms related to gene analysis.
- the two straight lines indicate homologous chromosomes 401 and 402.
- Locus 403 indicates where the genes on homologous chromosomes 401 and 402 are located.
- Allele 404 refers to genes paired on homologous chromosomes 401 and 402.
- Genotype 405 refers to the combination of alleles 404 at a locus across the homologous chromosomes 401 and 402.
- Haplotype 406 refers to a combination of alleles 404 located on the same chromosome.
- Diplotype 407 refers to a combination of haplotypes 406.
- FIG. 4 is a diagram showing an example of the data structure of the reference data 131.
- The reference data 131 has a data structure in which the two base sequences corresponding to the homologous chromosomes 401 and 402 wrap around in units of two rows.
- The base sequence of the homologous chromosome 401 is arranged in the first row.
- The base sequence of the homologous chromosome 402 is arranged in the second row.
- The base sequence that continues from the first row is arranged in the third row.
- Likewise, the base sequence that continues from the second row is arranged in the fourth row, and so on.
- the data density is associated with each locus 403 of the base sequence.
- The data density is a value determined according to the number of data items used to determine the base at a locus 403. For example, if 10000 data items were used, the data density is set to "1.0", and if 3000 were used, it is set to "0.3". The more data items used, the larger the data density.
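The two example values above are consistent with a simple linear scaling of the sample count. The following is a minimal sketch under that assumption; the function name `data_density` and the constant `REFERENCE_COUNT` are hypothetical and do not appear in the disclosure, which only gives the two example points.

```python
# Assumption: density scales linearly with the number of data items,
# calibrated so that 10000 items map to 1.0 (only two example points are given).
REFERENCE_COUNT = 10_000  # hypothetical normalization constant


def data_density(num_samples: int) -> float:
    """Return the data density for one locus from the number of data items used."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return num_samples / REFERENCE_COUNT


if __name__ == "__main__":
    for n in (10_000, 3_000):
        print(n, data_density(n))
```

Under this scaling the density is not capped at 1.0, so a locus determined from more than 10000 data items would have a density above 1.0.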
- The reference data 131 is composed of the base sequence of the homologous chromosome 401 and the base sequence of the homologous chromosome 402 as a set. Therefore, the reference data 131 carries information indicating genotypes such as alleles, haplotypes, and diplotypes.
- The reference data 131 may indicate the base sequence of tens of millions of human genome genes, the base sequence of the entire human genome, or the base sequence of tens of millions of SNPs.
- FIG. 5 is a diagram showing the reference data 131 according to the data density.
- The genotype contained in the high-density region indicated by reference numeral 601 is sequenced using more data than the genotype contained in the low-density region indicated by reference numeral 602. Thus, the data density of the reference data 131 varies depending on the locus.
- The genetic data detected by the SNP microarray is, for example, "...A...A...A..." and "...G...C...A...".
- The "..." portions indicate undetermined base sequences, A indicates adenine, G indicates guanine, and C indicates cytosine.
- SNP genotype imputation estimates the genotype of the SNP in this missing portion using the reference data 131.
- Specifically, the base-sequence pattern confirmed in the genetic data is compared with the base-sequence pattern of the reference data 131, and the region of the reference data 131 where the two patterns match best is searched for. The base sequence of the missing portion in the gene data is then estimated from the base sequence of the reference data 131 in that region, and the SNP genotype is inferred from the estimation result.
- The genotype estimation result is expressed as probabilities for a given SNP, for example 0.95 for the "AA" type, 0.04 for the "AG" type, and 0.01 for the "GG" type.
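The pattern-matching idea described above can be illustrated with a deliberately simplified sketch. It assumes equal-length sequences, a `.` marker for undetermined bases, and a small hypothetical reference panel; real imputation tools use statistical models and return genotype probabilities, which this toy example does not attempt.

```python
def impute(observed: str, reference_panel: list[str], missing: str = ".") -> str:
    """Fill undetermined bases from the reference sequence that matches best.

    Assumption: `observed` and every panel entry have the same length, and
    positions marked with `missing` are the undetermined bases.
    """
    def score(ref: str) -> int:
        # Count agreements at the positions that were actually observed.
        return sum(1 for o, r in zip(observed, ref) if o != missing and o == r)

    best = max(reference_panel, key=score)
    # Keep observed bases; copy the best reference only where data is missing.
    return "".join(r if o == missing else o for o, r in zip(observed, best))


if __name__ == "__main__":
    panel = ["ACGTAC", "ATCTCC", "GCGAAC"]  # hypothetical reference sequences
    print(impute("A.G..C", panel))
```

Here the first panel entry agrees with all three observed bases, so its bases fill the gaps.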
- the incentive information 132 is information in which a user identifier and an incentive given to a user are associated with each of one or more users.
- The incentive may be data having economic value, such as electronic money, mileage points, virtual currency, purchase points for goods, and coupons, or data having no economic value, such as a certificate.
- The processor 12 is composed of, for example, a CPU, and includes an acquisition unit 121, a region specifying unit 122, a rarity calculation unit 123, a contribution calculation unit 124, an incentive calculation unit 125, and an output unit 126. These blocks included in the processor 12 are realized by the CPU executing an information processing program.
- the acquisition unit 121 acquires the gene data transmitted from the providing terminal 2 by using the communication unit 11.
- the acquisition unit 121 receives the attribute information transmitted from the user terminal 3 by using the communication unit 11.
- the acquisition unit 121 associates the gene data with the attribute information using the user identifier as a key. This gives a dataset associated with user identifiers, genetic data, locus information, and attribute information.
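The association keyed on the user identifier can be sketched as a simple lookup join. The class and field names below (`GeneRecord`, `Dataset`, `locus_start`) are hypothetical illustrations, not structures defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneRecord:
    user_id: str      # user identifier attached by the providing terminal
    bases: str        # base sequence indicating the user's genotype
    locus_start: int  # hypothetical encoding of the locus information


@dataclass
class Dataset:
    user_id: str
    bases: str
    locus_start: int
    attributes: Optional[dict]  # None when no attribute information was received


def associate(gene_records: list[GeneRecord],
              attributes_by_user: dict[str, dict]) -> list[Dataset]:
    """Join gene data and attribute information using the user identifier as key."""
    return [
        Dataset(g.user_id, g.bases, g.locus_start, attributes_by_user.get(g.user_id))
        for g in gene_records
    ]


if __name__ == "__main__":
    genes = [GeneRecord("u1", "ACGT", 100), GeneRecord("u2", "GGCA", 250)]
    attrs = {"u1": {"residence": "Japan"}}
    for row in associate(genes, attrs):
        print(row)
```

Keeping `attributes` optional lets gene data without attribute information still flow through the rarity calculation.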
- the attribute information includes personal information of the user, residence information indicating the place of residence of the user, blood relationship information indicating the blood relationship of the user, and life pattern information indicating the life pattern of the user.
- the user's personal information includes the user's age, gender, occupation, etc.
- the user's personal information is, for example, information obtained by the user inputting to the user terminal 3.
- the place of residence information includes information indicating the name of the area where the user lives.
- the name of the area in which the person resides includes, for example, at least one of a country name, a prefecture name, and a state name.
- The information indicating the name of the area in which the person resides may include information of coarser granularity than the prefecture (for example, in the case of Japan, Honshu, Shikoku, Kyushu, and Hokkaido), and information of coarser granularity than the country.
- the residence information may be obtained by inputting to the user terminal 3 by the user, or may be determined based on the position data detected by the GPS sensor included in the user terminal 3.
- the life pattern information indicates, for example, the life pattern of the user in a predetermined period (for example, one day).
- Life pattern information includes, for example, the average number of cigarettes smoked per day, the average alcohol intake per day, the average calorie consumption per day, the average calorie intake per day, the number of meals per day, the meal times, the average wake-up time, the average bedtime, and the average sleep time per day.
- the life pattern information may be information input by the user or information monitored by a biosensor such as a smart watch.
- the region specifying unit 122 identifies the region in which the gene data acquired by the acquisition unit 121 is located in the reference data.
- the region specifying unit 122 may specify the region in which the genetic data is located based on the locus information associated with the genetic data.
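As an illustration of specifying the region from locus information, the sketch below assumes the locus information is a zero-based start position and that the reference data is indexed by position; both are assumptions, since the disclosure does not fix an encoding. The name `specify_region` is hypothetical.

```python
def specify_region(locus_start: int, num_bases: int, reference_length: int) -> range:
    """Return the positions of the reference data covered by the gene data.

    Assumptions: `locus_start` is a zero-based position taken from the locus
    information, and the gene data spans `num_bases` consecutive positions.
    """
    if num_bases <= 0:
        raise ValueError("num_bases must be positive")
    if locus_start < 0 or locus_start + num_bases > reference_length:
        raise ValueError("gene data falls outside the reference data")
    return range(locus_start, locus_start + num_bases)


if __name__ == "__main__":
    region = specify_region(locus_start=4, num_bases=3, reference_length=10)
    print(list(region))
```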
- The rarity calculation unit 123 calculates the rarity, which indicates how rare the genetic data is, based on the data density associated with the region specified by the region specifying unit 122. For example, the rarity calculation unit 123 may calculate the average of the data densities associated with all loci in the specified region and take the reciprocal of that average as the rarity. Alternatively, the rarity calculation unit 123 may average only the data densities associated with the loci of the confirmed bases in the specified region and take the reciprocal of that average as the rarity. Either way, the rarity increases as the average data density in the specified region decreases.
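Both variants of the rarity calculation (all loci, or only loci with confirmed bases) reduce to the reciprocal of a mean. The following sketch assumes the densities of the specified region are given as a list and that confirmed loci are flagged by a parallel list of booleans; the function name and argument layout are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def rarity(densities: Sequence[float],
           confirmed: Optional[Sequence[bool]] = None) -> float:
    """Return the reciprocal of the mean data density in the specified region.

    If `confirmed` is given, only loci whose base was confirmed are averaged.
    """
    if confirmed is None:
        values = list(densities)
    else:
        values = [d for d, c in zip(densities, confirmed) if c]
    if not values:
        raise ValueError("no loci to average")
    average = mean(values)
    if average <= 0:
        raise ValueError("mean data density must be positive")
    return 1.0 / average


if __name__ == "__main__":
    region = [1.0, 1.3, 1.6]  # mean is 1.3
    print(rarity(region))
    print(rarity(region, confirmed=[True, False, True]))
```

A region with a lower mean density yields a larger rarity, which matches the behaviour described in the text.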
- The contribution calculation unit 124 calculates the contribution of the gene data to gene analysis based on the attribute information associated with the gene data. For example, the contribution calculation unit 124 determines whether or not the attribute information includes blood relationship information, and calculates a higher contribution when it determines that the blood relationship information is included than when it determines that it is not.
- As the blood relationship information, for example, information that identifies a relative of the user who provides the genetic data can be adopted. Relatives include, for example, fathers, mothers, brothers, sisters, grandfathers, and other relatives. As the information for identifying a relative, for example, the identifier of the relative can be adopted.
- The contribution calculation unit 124 may calculate a higher contribution as the amount of blood relationship information increases. For example, the contribution calculation unit 124 may calculate a higher contribution as the number of relatives indicated by the blood relationship information included in the attribute information increases.
- The contribution calculation unit 124 may also determine whether or not the attribute information includes the user's life pattern information, and calculate a higher contribution when it determines that the life pattern information is included than when it determines that it is not. In this case, the contribution calculation unit 124 may calculate a higher contribution as the amount of life pattern information increases. For example, the contribution calculation unit 124 may judge that the amount of life pattern information increases as the number of data types included in it, such as the number of cigarettes smoked per day and the alcohol intake per day, increases.
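One way to realize these rules is an additive score that rewards both the presence and the amount of each kind of information. The weights and dictionary keys below (`PRESENCE_BONUS`, `PER_ITEM_BONUS`, `"relatives"`, `"life_pattern"`) are hypothetical; the disclosure states only the monotonic relationships, not a concrete scoring rule.

```python
PRESENCE_BONUS = 1.0  # hypothetical: added when a kind of information is present
PER_ITEM_BONUS = 0.1  # hypothetical: added per relative or per life-pattern data type


def contribution(attributes: dict) -> float:
    """Score attribute information; more blood-relation or life-pattern data scores higher."""
    score = 0.0
    relatives = attributes.get("relatives") or []       # identifiers of relatives
    if relatives:
        score += PRESENCE_BONUS + PER_ITEM_BONUS * len(relatives)
    life_pattern = attributes.get("life_pattern") or {}  # e.g. cigarettes per day
    if life_pattern:
        score += PRESENCE_BONUS + PER_ITEM_BONUS * len(life_pattern)
    return score


if __name__ == "__main__":
    print(contribution({}))
    print(contribution({"relatives": ["u2", "u3"]}))
    print(contribution({"relatives": ["u2"],
                        "life_pattern": {"cigarettes_per_day": 0, "sleep_hours": 7}}))
```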
- The incentive calculation unit 125 calculates the incentive to be given to the user so that its value increases as the rarity and the contribution increase. Letting the rarity be A and the contribution be B, the incentive calculation unit 125 may, for example, calculate the incentive using the following formula.
- Incentive = α × A + β × B (1)
- Here, α is a weighting coefficient for rarity
- and β is a weighting coefficient for contribution.
- When rarity is emphasized, the coefficient α is set to a value larger than the coefficient β
- When contribution is emphasized, the coefficient β is set to a value larger than the coefficient α.
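As a small illustration, equation (1) can be read as a weighted sum of the rarity A and the contribution B, with α weighting rarity and β weighting contribution. The function name and the default coefficient values below are assumptions for the sketch.

```python
def incentive(rarity: float, contribution: float,
              alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of equation (1): alpha * A + beta * B.

    alpha weights rarity (A); beta weights contribution (B).
    The default values of 1.0 are illustrative assumptions.
    """
    return alpha * rarity + beta * contribution


# Emphasizing rarity: alpha is set larger than beta.
rarity_focused = incentive(0.5, 2.0, alpha=2.0, beta=1.0)
# Emphasizing contribution: beta is set larger than alpha.
contribution_focused = incentive(0.5, 2.0, alpha=1.0, beta=2.0)
```

Because both terms enter with non-negative weights, the incentive increases whenever the rarity or the contribution increases, as the description requires.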
- The output unit 126 outputs the incentive calculated by the incentive calculation unit 125.
- For example, the output unit 126 may give the incentive by registering the calculated incentive in the incentive information 132 of the corresponding user. Further, the output unit 126 may use the communication unit 11 to transmit, to the user terminal 3, presentation information for presenting the calculated incentive to the user.
- FIG. 6 is a flowchart showing an example of processing of the information processing apparatus 1 according to the first embodiment of the present disclosure.
- In step S1, the acquisition unit 121 uses the communication unit 11 to acquire the gene data transmitted from the providing terminal 2.
- In step S2, the region specifying unit 122 specifies the region of the reference data 131 in which the gene data is located, based on the locus information associated with the gene data.
- For example, the region 131a enclosed by the rectangle is specified in the reference data 131.
- In step S3, the rarity calculation unit 123 calculates the average value of the data density in the region specified in step S2, and calculates the reciprocal of that average value as the rarity of the gene data.
- For example, if the average value of the data density of the region 131a is 1.3, the rarity is calculated as 1/1.3.
- In step S4, the contribution calculation unit 124 calculates the contribution based on the attribute information associated with the gene data.
- For example, the contribution calculation unit 124 may increase the contribution as the amount of blood-relationship information in the attribute information increases, and as the amount of life pattern information increases.
- In step S5, the incentive calculation unit 125 substitutes the rarity calculated in step S3 and the contribution calculated in step S4 into equation (1) to calculate an incentive according to the rarity and the contribution.
- In step S6, the output unit 126 gives the incentive to the user by registering the incentive calculated in step S5 in the incentive information 132 of the user who provided the gene data.
- With the information processing apparatus 1 of the present embodiment, a high incentive can be given to a user who provides gene data with high rarity and high contribution. As a result, gene data that is rare and contributes greatly to gene analysis can be collected efficiently.
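The flow of steps S1 to S6 can be sketched end to end as follows. This is only an illustration under several assumptions: the reference data is modeled as a list of data densities indexed by locus, the locus information is a half-open index range, the contribution of step S4 is passed in precomputed, and registering the incentive is modeled as adding it to a per-user dictionary. None of these representations are specified in the disclosure.

```python
from statistics import mean
from typing import Dict, List, Tuple


def run_flow(densities: List[float],
             locus_range: Tuple[int, int],
             contribution: float,
             incentive_info: Dict[str, float],
             user_id: str,
             alpha: float = 1.0,
             beta: float = 1.0) -> float:
    """Sketch of steps S1 to S6 for one piece of gene data (hypothetical data model)."""
    # S1: the gene data arrives together with its locus information.
    start, end = locus_range
    # S2: specify the region of the reference data covered by the gene data.
    region = densities[start:end]
    if not region or min(region) <= 0:
        raise ValueError("region must be non-empty with positive data densities")
    # S3: rarity is the reciprocal of the average data density in the region.
    rarity = 1.0 / mean(region)
    # S4 is assumed to have produced `contribution`; S5 applies equation (1).
    value = alpha * rarity + beta * contribution
    # S6: register the incentive for the user (accumulation is an assumption).
    incentive_info[user_id] = incentive_info.get(user_id, 0.0) + value
    return value


ledger: Dict[str, float] = {}
# The region [1:4] has densities 1.2, 1.3, 1.4, so its average is about 1.3.
granted = run_flow([2.0, 1.2, 1.3, 1.4, 0.5], (1, 4), 2.0, ledger, "user-1")
```

In this example the rarity is roughly 1/1.3, mirroring the numerical example given for the region 131a.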
- FIG. 7 is a block diagram showing an example of the configuration of the information processing apparatus 1A according to the second embodiment of the present disclosure.
- In the present embodiment, the same components as those in the first embodiment are given the same reference numerals, and their description is omitted.
- The region specifying unit 122A specifies the area reference data 1310 corresponding to the place of residence of the user who provided the gene data, based on the place-of-residence information included in the attribute information. The region specifying unit 122A then specifies the region in which the gene data is located in the specified area reference data 1310. The details of this region-specifying process are the same as in the first embodiment, so their description is omitted.
- The memory 13A stores three sets of area reference data 1310 corresponding to area A, area B, and area C.
- For example, the region specifying unit 122A may determine which of areas A to C the place of residence indicated by the place-of-residence information belongs to, and specify the area reference data 1310 corresponding to that area.
- Here, the memory 13A stores three sets of area reference data 1310, but this is only an example; two sets, or four or more sets, of area reference data 1310 may be stored.
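Selecting the reference data by place of residence could look like the following sketch. The mapping from a place of residence to an area, the place names, and the density values are all hypothetical; the disclosure only states that the area to which the place of residence belongs determines which area reference data is used.

```python
from typing import Dict, List


def select_area_reference(residence: str,
                          residence_to_area: Dict[str, str],
                          area_references: Dict[str, List[float]]) -> List[float]:
    """Return the area reference data for the area containing `residence`.

    Raises KeyError if the residence or its area is unknown.
    """
    area = residence_to_area[residence]   # which of the areas the residence belongs to
    return area_references[area]          # the reference data stored for that area


# Hypothetical example data: any number of areas can be registered.
residence_to_area = {"place-1": "A", "place-2": "B", "place-3": "C"}
area_references = {"A": [1.3, 1.3], "B": [0.3, 0.3], "C": [0.8, 0.8]}
selected = select_area_reference("place-2", residence_to_area, area_references)
```

Keeping the area references in a dictionary makes it straightforward to store two, three, or more sets, as the text allows.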
- FIG. 8 is a diagram showing an example of the data structure of the area reference data 1310.
- The area reference data 1310 corresponding to area A is generated based on the gene data of residents of area A
- The area reference data 1310 corresponding to area B is generated based on the gene data of residents of area B
- The area reference data 1310 corresponding to area C is generated based on the gene data of residents of area C.
- Each set of area reference data 1310 differs only in the population used to generate it; the detailed data structure is the same as that of the reference data 131. That is, the area reference data 1310 is data in which a base sequence indicating genotypes is associated with a data density corresponding to each locus of the base sequence.
- The granularity of areas A to C may be a country, or a regional unit within a country (for example, in the case of Japan, prefectures, or Honshu, Shikoku, Kyushu, and Hokkaido). It may also be a unit larger than a country (for example, the Asian, African, or North American continent).
- FIG. 9 is a diagram depicting the area reference data 1310 shown in FIG. 8 by data density. As FIG. 9 shows, the data density of the area reference data 1310 differs among areas A to C.
- FIG. 10 is a flowchart showing an example of processing of the information processing apparatus 1A according to the second embodiment of the present disclosure.
- In FIG. 10, the same processes as those of FIG. 6 are given the same reference numerals, and their description is omitted.
- In step S101 following step S1, the region specifying unit 122A identifies the place of residence of the user who provided the gene data from the place-of-residence information included in the attribute information associated with the gene data acquired in step S1.
- In step S102, the region specifying unit 122A specifies the area reference data 1310 corresponding to the place of residence identified in step S101. After that, the processing that calculates and outputs the incentive to be given to the user is executed using the specified area reference data 1310 and the gene data acquired in step S1.
- For example, when the area reference data 1310 corresponding to area A is specified, the region 1310a in which the gene data is located is specified in that area reference data 1310.
- In this case, the rarity is calculated as 1/1.3.
- Similarly, when the area reference data 1310 corresponding to area B is specified, the region 1310a in which the gene data is located is specified in the area reference data 1310 of area B.
- In this case, the rarity is calculated as 1/0.3.
- The average value of the data density of the region 1310a is highest for area A, followed by area C and then area B. The rarity is therefore highest for area B, followed by area C and then area A. As a result, the incentive given to a user belonging to area B is the largest, and the incentive given to a user belonging to area A is the smallest.
- The information processing apparatus 1A of the second embodiment can thus give a high incentive to a user whose place of residence is in an area whose area reference data 1310 has a low data density. This motivates such users to provide their gene data, so that gene data can be collected efficiently.
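The ordering of areas by rarity can be checked with a short sketch. The average densities 1.3 for area A and 0.3 for area B come from the example in the text; the value 0.8 for area C is an assumed figure that merely lies between them, since the text gives only the ordering for area C.

```python
from typing import Dict, List


def rank_areas_by_rarity(mean_density: Dict[str, float]) -> List[str]:
    """Return area names ordered from highest to lowest rarity.

    Rarity is the reciprocal of the average data density of the specified region.
    """
    rarity = {area: 1.0 / density for area, density in mean_density.items()}
    return sorted(rarity, key=rarity.get, reverse=True)


# A and B follow the text; C = 0.8 is an assumption for illustration.
ranking = rank_areas_by_rarity({"A": 1.3, "B": 0.3, "C": 0.8})
```

Under these values the ranking is area B, then area C, then area A, so for equal contribution the user in area B receives the largest incentive.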
- In the above description, the region specifying unit 122 specifies the region 131a using the locus information associated with the gene data, but the present disclosure is not limited to this. For example, the region specifying unit 122 may compare the base sequence pattern of the gene data with the base sequence pattern of the reference data 131, search for the region of the reference data 131 in which the two patterns best match, and specify the region found as the region 131a in which the gene data is located. The same applies to the region specifying unit 122A.
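One simple way to realize such a pattern search is a sliding-window comparison that counts matching bases at each offset. This is a sketch of that swapped-in technique, assuming both sequences are plain strings over A, C, G, T; the disclosure does not specify the matching algorithm, and the function name is hypothetical.

```python
from typing import Tuple


def best_matching_region(reference: str, query: str) -> Tuple[int, int]:
    """Return (start, end) of the reference window that agrees with `query` at the most positions.

    Ties are resolved in favor of the earliest window.
    """
    if not query or len(query) > len(reference):
        raise ValueError("query must be non-empty and no longer than the reference")
    best_start, best_score = 0, -1
    for start in range(len(reference) - len(query) + 1):
        window = reference[start:start + len(query)]
        # Count positions where the window and the query carry the same base.
        score = sum(a == b for a, b in zip(window, query))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + len(query)


exact = best_matching_region("ACGTTGCAAC", "TGCA")     # exact occurrence at offset 4
nearest = best_matching_region("ACGTTGCAAC", "TGGA")   # one mismatch, same window
```

The cost is proportional to the product of the two sequence lengths, so a real system working at genome scale would likely need an indexed alignment method instead.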
- In the above description, the incentive information 132 is stored in the information processing apparatus 1, but the present disclosure is not limited to this.
- For example, the incentive information 132 may be stored in an external server owned by an administrator who manages the incentive. If the incentive is electronic money, the administrator is, for example, a financial institution; if the incentive is mileage points, the administrator is, for example, an airline; and if the incentive is points for purchasing products, the administrator is, for example, the company that operates the points.
- The incentive calculation unit 125 may calculate the incentive based only on the rarity. In this case, the contribution calculation unit 124 becomes unnecessary.
- The reference data 131 is stored in the information processing apparatus 1 in the above description, but the present disclosure is not limited to this, and an external server may store the reference data 131.
Abstract
Description
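Research is under way on genome-wide association studies, in which genotyping that determines the genotypes of tens of millions of SNPs covering the entire human genome is performed on hundreds of thousands of people, and the association between a target trait and the SNP genotypes is evaluated. Genome-wide association analysis requires the genotypes of tens of millions of SNPs. Meanwhile, SNP microarrays that allow SNP genotyping at low cost and with ease have become widespread in recent years.
FIG. 1 is a diagram showing an example of the overall configuration of an information processing system to which the information processing apparatus 1 according to the first embodiment of the present disclosure is applied. The information processing system includes the information processing apparatus 1, a providing terminal 2, and a user terminal 3. The information processing apparatus 1 through the user terminal 3 are connected via a network NT so that they can communicate with one another.
Here, α is a weighting coefficient for rarity, and β is a weighting coefficient for contribution. When rarity is emphasized, the coefficient α is set to a value larger than the coefficient β; when contribution is emphasized, the coefficient β is set to a value larger than the coefficient α.
The second embodiment calculates the incentive taking the user's place of residence into account. FIG. 7 is a block diagram showing an example of the configuration of the information processing apparatus 1A according to the second embodiment of the present disclosure. In the present embodiment, the same components as those in the first embodiment are given the same reference numerals, and their description is omitted.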
Claims (10)
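- An information processing method in an information processing apparatus that performs information processing using reference data, wherein
the reference data is data in which a base sequence indicating genotypes of a genome is associated in advance with a data density corresponding to loci of the base sequence, the method comprising:
acquiring gene data that is detected by a gene detection device and includes a base sequence indicating a genotype of a user;
specifying a region in which the gene data is located in the reference data;
calculating a rarity indicating how rare the gene data is, based on the data density associated with the specified region;
calculating an incentive to be given to the user according to the calculated rarity; and
outputting the calculated incentive.
- The information processing method according to claim 1, wherein
attribute information including attributes of the user is associated with the gene data,
the method further comprises calculating a contribution of the gene data to gene analysis based on the attribute information, and
in calculating the incentive, an incentive according to the rarity and the contribution is calculated.
- The information processing method according to claim 1 or 2, wherein
locus information indicating loci of the base sequence indicating the genotype is associated with the gene data, and
in calculating the rarity, the region in which the gene data is located in the reference data is specified based on the locus information.
- The information processing method according to claim 2, wherein
the attribute information includes information indicating a place of residence of the user,
the reference data includes a plurality of sets of area reference data corresponding to predetermined areas, and
in specifying the region, the region in which the gene data is located is specified in the area reference data corresponding to the information on the place of residence.
- The information processing method according to claim 2, wherein
in calculating the contribution, it is determined whether or not the attribute information includes information indicating a blood relationship of the user, and when it is determined that the information indicating the blood relationship is included, the contribution is calculated to be higher than when it is determined that the information indicating the blood relationship is not included.
- The information processing method according to claim 5, wherein
in calculating the contribution, the contribution is calculated to be higher as the amount of the information indicating the blood relationship included in the attribute information increases.
- The information processing method according to claim 2, wherein
in calculating the contribution, it is determined whether or not the attribute information includes information indicating a life pattern of the user, and when it is determined that the information indicating the life pattern is included, the contribution is calculated to be higher than when it is determined that the information indicating the life pattern is not included.
- The information processing method according to claim 7, wherein
in calculating the contribution, the contribution is calculated to be higher as the amount of the information indicating the life pattern of the user included in the attribute information increases.
- An information processing apparatus that performs information processing using reference data, wherein
the reference data is data in which a base sequence indicating genotypes of a genome is associated in advance with a data density corresponding to loci of the base sequence, the apparatus comprising:
an acquisition unit that acquires gene data that is detected by a gene detection device and includes a base sequence indicating a genotype of a user;
a region specifying unit that specifies a region in which the gene data is located in the reference data;
a rarity calculation unit that calculates a rarity indicating how rare the gene data is, based on the data density associated with the region specified by the region specifying unit;
an incentive calculation unit that calculates an incentive to be given to the user according to the rarity calculated by the rarity calculation unit; and
an output unit that outputs the incentive calculated by the incentive calculation unit.
- An information processing program that causes a computer to function as an information processing apparatus that performs information processing using reference data, wherein
the reference data is data in which a base sequence indicating genotypes of a genome is associated in advance with a data density corresponding to loci of the base sequence, the program causing the computer to function as:
an acquisition unit that acquires gene data that is detected by a gene detection device and includes a base sequence indicating a genotype of a user;
a region specifying unit that specifies a region in which the gene data is located in the reference data;
a rarity calculation unit that calculates a rarity indicating how rare the gene data is, based on the data density associated with the region specified by the region specifying unit;
an incentive calculation unit that calculates an incentive to be given to the user according to the rarity calculated by the rarity calculation unit; and
an output unit that outputs the incentive calculated by the incentive calculation unit.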
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180083928.9A CN116583906A (zh) | 2020-12-28 | 2021-11-10 | Information processing method, information processing device, and information processing program |
JP2022572930A JPWO2022145135A1 (ja) | 2020-12-28 | 2021-11-10 | |
US18/212,802 US20230334520A1 (en) | 2020-12-28 | 2023-06-22 | Information processing method, information processing device, and non-transitory computer readable recording medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020218797 | 2020-12-28 | ||
JP2020-218797 | 2020-12-28 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/212,802 Continuation US20230334520A1 (en) | 2020-12-28 | 2023-06-22 | Information processing method, information processing device, and non-transitory computer readable recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022145135A1 (ja) | 2022-07-07 |
Family ID=82260408
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/041415 WO2022145135A1 (ja) | 2020-12-28 | 2021-11-10 | Information processing method, information processing device, and information processing program |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230334520A1 (ja) |
JP (1) | JPWO2022145135A1 (ja) |
CN (1) | CN116583906A (ja) |
WO (1) | WO2022145135A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020149188A (ja) * | 2019-03-12 | 2020-09-17 | キヤノンメディカルシステムズ株式会社 | Full genome information utilization system and method |
US20200327250A1 (en) * | 2019-04-12 | 2020-10-15 | Novo Vivo Inc. | System for decentralized ownership and secure sharing of personalized health data |
JP2020177566A (ja) * | 2019-04-22 | 2020-10-29 | ジェネシスヘルスケア株式会社 | Research support system, research support device, research support method, and research support program |
Filing Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2021-11-10 | CN | CN202180083928.9A | CN116583906A (zh) | Active, Pending |
2021-11-10 | WO | PCT/JP2021/041415 | WO2022145135A1 (ja) | Active, Application Filing |
2021-11-10 | JP | JP2022572930A | JPWO2022145135A1 (ja) | Active, Pending |
2023-06-22 | US | US18/212,802 | US20230334520A1 (en) | Active, Pending |
Also Published As
Publication number | Publication date |
---|---|
CN116583906A (zh) | 2023-08-11 |
JPWO2022145135A1 (ja) | 2022-07-07 |
US20230334520A1 (en) | 2023-10-19 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21914996; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2022572930; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 202180083928.9; Country of ref document: CN |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21914996; Country of ref document: EP; Kind code of ref document: A1 |