WO2024090667A1 - System and method for predicting race using variant frequency - Google Patents

System and method for predicting race using variant frequency

Info

Publication number
WO2024090667A1
Authority
WO
WIPO (PCT)
Prior art keywords
race
mutation
frequency
target
rate
Prior art date
Application number
PCT/KR2022/019581
Other languages
English (en)
Korean (ko)
Inventor
한헌종
권기상
Original Assignee
주식회사 쓰리빌리언 (Three Billion Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-10-26
Filing date
2022-12-05
Publication date
2024-05-02
Application filed by 주식회사 쓰리빌리언 (Three Billion Co., Ltd.)
Publication of WO2024090667A1

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N20/00 Machine learning
        • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
                • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
                    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
                    • G16B20/40 Population genetics; Linkage disequilibrium
                • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
                • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
                • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • Embodiments of the present invention relate to a system and method for predicting race, and more specifically, to a system and method for predicting race using the frequency of occurrence of mutations by race based on conditional probability.
  • This invention was made with the support of the Ministry of Science and ICT of the Republic of Korea under project number 1711160581 and task number 2022-0-00333.
  • The research management agency for the project is the Institute of Information & Communications Technology Planning & Evaluation (IITP), and the research program name is "SW Computing Industry Source Technology Development (R&D)".
  • The research project name is "Development of AI integrated SW solution for multi-faceted analysis of rare pediatric diseases".
  • The host organization is Three Billion Co., Ltd., and the research period is 2022.04.01 to 2024.12.31.
  • Conventional approaches require an N*M mutation profile (N mutations by M samples) to be constructed before a prediction model can be built.
  • That is, the mutation profiles of all samples used for prediction must be collected and analyzed, but such data is difficult to obtain, and the analysis requires high-specification equipment.
  • One embodiment of the present invention provides a race prediction system and method that uses the frequency of mutation appearance. The system and method predict the race of a target from mutation appearance frequencies summarized with a probabilistic methodology. As a result, they can predict the target's race quickly and accurately without high-specification analysis equipment.
  • According to one embodiment, a race prediction system using the frequency of mutation appearance includes the following units:
    – a mutation frequency calculation unit, which calculates the frequency of mutation appearance by race in conjunction with a population genome mutation database;
    – a race-specific score calculation unit, which calculates the target's score for each race using the frequency of mutation appearance by race; and
    – a race prediction unit, which predicts the race of the target based on the target's race-specific scores.
  • The mutation frequency calculation unit collects two values from the population genome mutation database. The first is the number of occurrences of each mutation by race, that is, the number of times each mutation appears in a specific race. The second is the total number of people by race. From these two values, the unit can calculate the frequency of occurrence of each mutation by race.
  • The mutation frequency calculation unit calculates two rates, both relative to the total number of people (allele number / 2):
    – the homozygote rate, which is the proportion of people carrying the mutation in homozygous form. It is calculated from the number of people with the homozygote mutation (number of homozygote).
    – the heterozygote rate, which is the proportion of people carrying the mutation in heterozygous form. It is calculated from the number of people with the heterozygote mutation (allele count - 2 * number of homozygote).
    The frequency of occurrence of mutations by race may include both the homozygote rate and the heterozygote rate.
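The two rate formulas above can be written as a small Python function. This is a minimal sketch, not the applicant's implementation. The function and parameter names are hypothetical. The inputs are assumed to be the per-race allele count (AC), allele number (AN) and number of homozygotes that the description says are collected from the population database.

```python
from typing import Tuple


def zygosity_rates(allele_count: int, allele_number: int,
                   n_homozygote: int) -> Tuple[float, float]:
    """Return (homozygote_rate, heterozygote_rate) for one variant in one race.

    allele_count: alternate alleles observed in the race (AC)
    allele_number: total alleles genotyped in the race (AN), two per person
    n_homozygote: people carrying two alternate alleles
    """
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    people = allele_number / 2                        # total number of people
    n_heterozygote = allele_count - 2 * n_homozygote  # each homozygote adds 2 alleles to AC
    return n_homozygote / people, n_heterozygote / people
```

For AC = 11, AN = 24 and 2 homozygotes, the function returns rates of 2/12 and 7/12.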
  • To select discriminating mutations, the mutation frequency calculation unit applies two conditions:
    – the total number of people (allele number / 2) must be at least 1,000; and
    – the allele count ratio across all races must be between 5% and 95% inclusive, because overly rare mutations may distort the calculation.
    The mutations that pass both conditions become the targets for calculating the homozygote rate and the heterozygote rate.
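The two selection rules above can be expressed as one predicate. This is a sketch under two assumptions. First, the counts passed in are pooled over all races. Second, "allele count ratio" means allele count divided by allele number. The function name and default thresholds simply mirror the stated 1,000-person, 5% and 95% limits.

```python
def is_discriminating(allele_count: int, allele_number: int,
                      min_people: int = 1000,
                      min_af: float = 0.05, max_af: float = 0.95) -> bool:
    """Return True if a variant passes both selection rules.

    Assumption: allele_count and allele_number are pooled over all races.
    """
    if allele_number <= 0:
        return False
    if allele_number / 2 < min_people:      # fewer than 1,000 people genotyped
        return False
    allele_ratio = allele_count / allele_number
    return min_af <= allele_ratio <= max_af  # drop too-rare and near-fixed variants
```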
  • The race prediction system using mutation frequency further includes a mutation frequency table construction unit. This unit stores the homozygote rate and heterozygote rate calculated for each mutation in a per-race table, producing a mutation frequency table by race. The race-specific score calculation unit then works in three steps:
    – it searches this table for mutations related to the target (target mutations);
    – it loads the per-race homozygote rate and heterozygote rate values for each target mutation found; and
    – it calculates a score by race for the target mutation set, which consists of the plurality of target mutations.
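A mutation frequency table by race can be modeled as a nested dictionary keyed first by variant and then by race. The sketch below assumes a hypothetical input row of (variant_id, race, allele_count, allele_number, n_homozygote). It computes both rates inline. The in-memory layout is an illustrative choice and is not taken from the patent.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical input row: (variant_id, race, allele_count, allele_number, n_homozygote)
Record = Tuple[str, str, int, int, int]
# variant_id -> race -> (homozygote_rate, heterozygote_rate)
FreqTable = Dict[str, Dict[str, Tuple[float, float]]]


def build_frequency_table(records: Iterable[Record]) -> FreqTable:
    """Store the per-race homozygote and heterozygote rates of each variant."""
    table: FreqTable = {}
    for variant_id, race, ac, an, n_hom in records:
        if an <= 0:
            continue  # no genotyped people in this race: nothing to store
        people = an / 2
        hom_rate = n_hom / people
        het_rate = (ac - 2 * n_hom) / people
        table.setdefault(variant_id, {})[race] = (hom_rate, het_rate)
    return table
```

In a full pipeline, the variant selection rules would presumably be applied before rows reach this function.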
  • The race-specific score calculation unit uses the per-race homozygote rate and heterozygote rate values for each target mutation. With these values, it can calculate the target's score for each race based on the conditional probability that the target's mutation set (target mutation set) appears in a specific race.
  • The race-specific score calculation unit may calculate the race-specific score using Equation 1 below.
  • V represents the set of target variants (v 1 , v 2 , ..., v n ), E is race, n is the number of target variants, and Pr(Vn
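Equation 1 itself is not reproduced in this text. The surrounding description gives three clues:
- the score is a conditional probability of the target variant set given a race;
- it is taken as a geometric mean (the 1/n power) of per-variant probabilities; and
- each per-variant probability is the race's homozygote or heterozygote rate, chosen by the target's zygosity.

From these clues, the equation presumably has roughly the following form. This is a reconstruction, not the published equation. It also assumes that the variants are treated as independent.

$$
\mathrm{Score}(E) = \Pr(V \mid E)^{1/n} = \left( \prod_{i=1}^{n} \Pr(v_i \mid E) \right)^{1/n},
\qquad
\Pr(v_i \mid E) =
\begin{cases}
\text{homozygote rate of } v_i \text{ in } E & \text{if the target's zygosity is } 1/1,\\
\text{heterozygote rate of } v_i \text{ in } E & \text{if the target's zygosity is } 1/0.
\end{cases}
$$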
  • The race prediction unit may predict the race with the highest of the target's race-specific scores as the race of the target.
  • According to another embodiment, a method for predicting race using the frequency of mutation appearance includes the following steps, each performed by the race prediction server:
    – calculating the frequency of mutation appearance by race in conjunction with a population genome mutation database;
    – calculating the target's score for each race using the frequency of mutation appearance by race; and
    – predicting the race of the target based on the target's race-specific scores.
  • The step of calculating the frequency of occurrence of mutations by race may include two sub-steps:
    – collecting two values from the population genome mutation database: the number of occurrences of each mutation by race (the number of times each mutation appears in a specific race) and the total number of people by race; and
    – calculating the frequency of appearance of each mutation by race from these collected values.
  • The step of calculating the frequency of occurrence of each mutation by race includes two sub-steps, both relative to the total number of people (allele number / 2):
    – calculating the homozygote rate, the proportion of all people who carry the homozygote mutation, from the number of people with the homozygote mutation (number of homozygote); and
    – calculating the heterozygote rate, the proportion of all people who carry the heterozygote mutation, from the number of people with the heterozygote mutation (allele count - 2 * number of homozygote).
    The frequency of occurrence of mutations by race may include both the homozygote rate and the heterozygote rate.
  • The method further includes a table-building step, in which the race prediction server stores the homozygote rate and heterozygote rate calculated for each mutation in a per-race table. The result is a mutation frequency table by race. The step of calculating the score for each race may then include:
    – searching the mutation frequency table by race for mutations related to the target (target mutations); and
    – loading the per-race homozygote rate and heterozygote rate values for each target mutation found, and calculating a score by race for the target mutation set, which consists of the plurality of target mutations.
  • This approach uses a probabilistic methodology rather than a machine learning technique such as conventional PCA or random forest. Existing methods need an N*M mutation profile of N mutations and M samples to build a model. By contrast, this approach needs only summarized counts of mutation occurrences. It therefore does not require high-specification analysis equipment, and it can predict the race of the target quickly and accurately.
  • Because the mean of the per-race probabilities is presented, the results can be interpreted in more detail. The predicted race information can also be useful in various research and clinical diagnostic settings. For example, suppose a mutation known to cause a specific disease is found frequently among people of race A who do not have the disease. In that case, the association between the mutation and the disease can be downgraded for race A only. In addition, for diseases whose prevalence varies by race, confirming the patient's race can provide additional clues for diagnosis.
  • Figure 1 is a diagram illustrating the configuration of a race prediction system using mutation frequency according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating the detailed configuration of the race prediction server of FIG. 1.
  • Figure 3 is a diagram illustrating an example of a mutation frequency table by race generated according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating a method of generating a mutation frequency table by race according to an embodiment of the present invention.
  • Figure 5 is a diagram illustrating a method of calculating scores by race according to an embodiment of the present invention.
  • Figures 6 to 8 are diagrams explaining the process of calculating the frequency of occurrence of mutations by race (homozygote rate and heterozygote rate) according to an embodiment of the present invention.
  • Figure 9 is a table showing the mutation information of a specific subject used in the step of calculating the conditional probability for each race according to an embodiment of the present invention.
  • Figure 10 is a flowchart illustrating a method for predicting race using mutation frequency according to an embodiment of the present invention.
  • In this specification, the "transmission" of a signal or information from one component to another includes direct transmission. It also includes transmission through other intervening components.
  • In particular, "transmitting" a signal or information to a component indicates the final destination of the signal or information, not necessarily its direct destination. The same applies to "receiving" signals or information.
  • FIG. 1 is a configuration diagram of a racial prediction system using mutation frequency according to an embodiment of the present invention
  • FIG. 2 is a block diagram illustrating the detailed configuration of the racial prediction server 110 of FIG. 1.
  • the race prediction system using the frequency of mutation appearance may be implemented as a race prediction server 110.
  • The race prediction server 110 may include the following components:
    – a mutation frequency calculation unit 210;
    – a mutation frequency table construction unit 220;
    – a race score calculation unit 230;
    – a race prediction unit 240; and
    – a control unit 250.
  • the mutation frequency calculation unit 210 can calculate the mutation frequency by race in conjunction with the population genome mutation database 120.
  • The population genome mutation database 120 may be implemented as the gnomAD (Genome Aggregation Database) database.
  • the mutation frequency calculation unit 210 may collect the number of mutations by race, which indicates the number of times each mutation appears in a specific race, and the total number of people by race from the population genome mutation database 120. In addition, the mutation frequency calculation unit 210 may calculate the frequency of occurrence of each mutation by race using the collected number of occurrences of mutations by race and the total number of people by race.
  • the frequency of occurrence of mutations by race can be understood as a concept that includes the proportion of people with homozygote mutations (homozygote rate) and the proportion of people with heterozygote mutations (heterozygote rate) among all people.
  • the process for calculating the homozygote rate and the heterozygote rate is as follows.
  • Before calculating these rates, the mutation frequency calculation unit 210 may apply restrictions to two values: the total number of people (allele number / 2) and the allele count ratio across all races.
  • Specifically, the mutation frequency calculation unit 210 selects mutations that meet two conditions:
    – the total number of people (allele number / 2) is at least 1,000; and
    – the allele count ratio across all races is between 5% and 95% inclusive, because overly rare mutations may distort the calculation.
    The selected mutations become the target mutations for calculating the homozygote rate and the heterozygote rate.
  • the mutation frequency table construction unit 220 may store the homozygote rate and heterozygote rate calculated for each mutation in a table by race to generate a mutation frequency table by race.
  • the variant frequency table by race can be generated as shown in FIG. 4.
  • The prediction uses the per-race summary in the mutation frequency table by race rather than individual mutation profile data. This allows it to be both accurate and fast.
  • the score calculation unit 230 for each race may calculate the score for each race of the target using the frequency of occurrence of mutations for each race.
  • the race-specific score calculation unit 230 may use the race-specific variant appearance frequency table.
  • The race-specific score calculation unit 230 proceeds in three steps:
    – it searches the race-specific mutation frequency table for mutations related to the target (target mutations);
    – it loads the per-race homozygote rate and heterozygote rate values for each target mutation found; and
    – it calculates a score by race for the target mutation set, which consists of the plurality of target mutations.
  • the race-specific score calculation unit 230 uses the values of the homozygote rate and heterozygote rate for each race for each target mutation to determine the conditional probability that the mutation set (target mutation set) of the target will appear in a specific race. Based on this, the score for each race of the subject can be calculated.
  • The race-specific score calculation unit 230 may calculate the race-specific score using Equation 1 below.
  • V represents the set of target variants (v 1 , v 2 , ..., v n ), E is race, n is the number of target variants, and Pr(Vn
  • the race-specific score calculation unit 230 determines the probability (Pr(Vn
  • The race-specific score calculation unit 230 computes the geometric mean of the individual probabilities, that is, the n-th root (1/n power) of their product, as in Equation 1 above. It uses this geometric mean as the score for each race.
  • For each sample, the race-specific score calculation unit 230 performs three steps:
    – it looks up each variant in the race-specific mutation frequency table;
    – it takes the per-race homozygote rate or heterozygote rate, depending on the sample's zygosity for that variant (1/1: homozygote, 1/0: heterozygote); and
    – it applies these values to Equation 1 above to calculate the score for each race.
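The scoring step described above can be sketched in Python as follows. The sketch makes several assumptions:
- The table is a nested dict shaped variant to race to (homozygote rate, heterozygote rate).
- The subject is a dict mapping variant to a zygosity code.
- The code "0/1" is accepted as a heterozygote alongside "1/0".

The sketch follows the worked example in two ways. It skips subject variants that are absent from the table, and it never visits table variants that the subject does not carry. The geometric mean is computed in log space, a common way to avoid floating-point underflow when many small probabilities are multiplied.

```python
import math
from typing import Dict, Tuple

# variant_id -> race -> (homozygote_rate, heterozygote_rate)
FreqTable = Dict[str, Dict[str, Tuple[float, float]]]

# Zygosity codes from the description; "0/1" is accepted as an assumption.
RATE_INDEX = {"1/1": 0, "1/0": 1, "0/1": 1}  # 0 -> homozygote rate, 1 -> heterozygote rate


def race_scores(table: FreqTable, subject: Dict[str, str]) -> Dict[str, float]:
    """Geometric mean of Pr(v_i | race) over the subject's variants found in the table."""
    # Subject variants missing from the table are skipped; table variants the
    # subject does not carry are never visited.
    targets = [v for v in subject if v in table]
    if not targets:
        return {}
    # Only score races that have rates for every target variant.
    races = set.intersection(*(set(table[v]) for v in targets))
    scores: Dict[str, float] = {}
    for race in sorted(races):
        log_sum = 0.0
        for v in targets:
            p = table[v][race][RATE_INDEX[subject[v]]]
            if p <= 0.0:
                log_sum = -math.inf  # one zero probability makes the product zero
                break
            log_sum += math.log(p)
        # exp(mean of logs) == product ** (1/n); exp(-inf) == 0.0
        scores[race] = math.exp(log_sum / len(targets))
    return scores
```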
  • the race prediction unit 240 may predict the race of the target based on the race score of the target. That is, the race prediction unit 240 may predict the race corresponding to the highest score among the scores for each race of the target as the race of the target.
  • For example, Sample A has its highest score, 0.615, for American, so the race prediction unit 240 can predict the race of Sample A as American.
  • Likewise, Sample B has its highest score, 0.342, for African, so the race prediction unit 240 can predict the race of Sample B as African.
  • When the scores of two or more races differ only slightly, the race prediction unit 240 may predict those races as the target's race. In other words, if the score difference between two or more specific races falls within a preset range, the race prediction unit 240 may predict those races as the target's race.
  • the race prediction unit 240 may apply a weight to the scores of the two or more specific races and predict the race with the highest final score as the target's race.
  • The race prediction unit 240 may use each race's distribution ratio as the weight when calculating the final scores of the two or more specific races. It may then predict the race with the highest final score as the race of the target.
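The prediction rules above can be sketched as follows: take the highest score, and re-rank close candidates by weight. Several details are illustrative assumptions rather than values from the patent:
- the `margin` parameter and its default of 0.01, which stand in for the unspecified "preset range";
- the optional `race_weights` dict, which stands in for the distribution ratio by race; and
- the choice to multiply each score by its weight.

```python
from typing import Dict, Optional


def predict_race(scores: Dict[str, float],
                 margin: float = 0.01,
                 race_weights: Optional[Dict[str, float]] = None) -> str:
    """Return the predicted race from per-race scores.

    Races whose score is within `margin` of the best score are treated as close
    candidates. If weights are supplied, each candidate's final score is its raw
    score multiplied by its weight (e.g. the race's share of the population).
    """
    if not scores:
        raise ValueError("no race scores to compare")
    best = max(scores.values())
    close = [race for race, s in scores.items() if best - s <= margin]
    if len(close) == 1 or race_weights is None:
        return max(close, key=scores.__getitem__)  # plain highest score
    return max(close, key=lambda race: scores[race] * race_weights.get(race, 1.0))
```

Without weights, the function reduces to choosing the race with the highest score.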
  • The control unit 250 may generally control the operations of the mutation frequency calculation unit 210, the mutation frequency table construction unit 220, the race score calculation unit 230, and the race prediction unit 240.
  • The control unit 250 may be implemented to functionally include some or all of these components. That is, the control unit 250 may perform some or all of their functions.
  • the control unit 250 controls the overall operation of the race prediction server 110 and may include a processor such as a CPU.
  • the control unit 250 may control other components included in the race prediction server 110 to perform operations corresponding to user input received through the input/output unit.
  • the processor can process instructions within the computing device, such as displaying graphic information to provide a GUI (Graphic User Interface) on an external input or output device, such as a display connected to a high-speed interface.
  • multiple processors and/or multiple buses may be utilized along with multiple memories and memory types as appropriate.
  • the processor may be implemented as a chipset comprised of chips including multiple independent analog and/or digital processors.
  • gnomAD data provides the allele count, allele number, and number of homozygote values for each mutation, separately for each race.
  • The sequence at a specific position on a specific chromosome is called an allele. Since each person has two copies of each chromosome, each person has two alleles at that position.
  • An allele can be the same sequence as the reference or an alternative (mutated sequence).
  • the allele count refers to the number of alleles corresponding to mutations found in a specific population. Since each person has two alleles, when the number of people is N, the allele count ranges from a minimum of 0 to a maximum of 2N. If it is known that the reference allele at a specific position is A and the alternative allele is T, the allele count refers to the number of T alleles found at that position.
  • the allele number is a number that represents the total number of alleles. Since it is the number of people * 2, it becomes 2N. In other words, dividing the allele number by 2 gives the total number of people.
  • The number of homozygote is the number of people who carry the mutation in homozygous form. Because this value counts people, the alleles contributed by homozygous people amount to 2 * number of homozygote.
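  • The heterozygote rate is the proportion of people carrying the mutation in heterozygous form. It is calculated in two steps:
    – subtract the homozygote allele count (2 * number of homozygote) from the total allele count to get the number of heterozygous people; and
    – divide this number by the total number of people (allele number / 2).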
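The definitions above can be checked by counting alleles directly from per-person genotypes. In this hypothetical sketch, each person is encoded as the number of alternate alleles they carry: 0 for wild type, 1 for heterozygote and 2 for homozygote. The function aggregates these codes into the three values that gnomAD reports.

```python
from typing import Iterable, Tuple


def summarize_genotypes(alt_allele_counts: Iterable[int]) -> Tuple[int, int, int]:
    """Aggregate per-person genotypes at one site into (allele_count, allele_number, n_homozygote).

    Each entry is the number of alternate alleles one person carries:
    0 = wild type, 1 = heterozygote, 2 = homozygote.
    """
    allele_count = allele_number = n_homozygote = 0
    for k in alt_allele_counts:
        if k not in (0, 1, 2):
            raise ValueError(f"invalid alternate-allele count: {k}")
        allele_count += k    # alternate alleles carried by this person
        allele_number += 2   # every person contributes two alleles
        if k == 2:
            n_homozygote += 1
    return allele_count, allele_number, n_homozygote
```

The 12-person cohort of Figure 6 (3 wild type, 7 heterozygotes, 2 homozygotes) can be encoded as `[0] * 3 + [1] * 7 + [2] * 2`. For this input, the function returns `(11, 24, 2)`, which matches the allele count, allele number and number of homozygotes stated for that example.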
  • the process of calculating the homozygote rate and the heterozygote rate will be explained using Figure 6 as an example.
  • In Figure 6, the reference allele is A and the alternative allele is T. Of the 12 people, 3 are wild type, 7 are heterozygous, and the remaining 2 are homozygous.
  • the allele count is 11, the allele number is 24, and the number of homozygotes is 2.
  • the frequency of occurrence of mutations by race can be calculated by calculating the homozygote rate and heterozygote rate as described above.
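Applying the homozygote and heterozygote rate definitions to the Figure 6 counts (allele count 11, allele number 24, 2 homozygotes) gives the following. This is simple arithmetic on the stated values:

$$
\text{homozygote rate} = \frac{2}{24/2} = \frac{2}{12} \approx 0.167,
\qquad
\text{heterozygote rate} = \frac{11 - 2 \times 2}{24/2} = \frac{7}{12} \approx 0.583
$$

The heterozygote count of 7 recovered this way agrees with the 7 heterozygous people in the example.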
  • the 1-976506-AGGCGGGGGC-A mutation was excluded because the total allele number was less than 2000.
  • The 1-1007245-C-G mutation was excluded because its allele count ratio was below 5%.
  • the score for each race is calculated as follows.
  • 1-138593-G-T in the mutation frequency table by race in Figure 7 is not used in the calculation because it is not a mutation found in the subject.
  • 1-100293-G-C and 1-592801-G-GA are not used in the calculation because they are not in the mutation frequency table by race in Figure 7.
  • devices and components described in embodiments may include, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), It may be implemented using one or more general-purpose or special-purpose computers, such as a programmable logic unit (PLU), microprocessor, or any other device capable of executing and responding to instructions.
  • a processing device may execute an operating system (OS) and one or more software applications that run on the operating system. Additionally, a processing device may access, store, manipulate, process, and generate data in response to the execution of software.
  • A single processing device may be described as being used. However, those skilled in the art will understand that a processing device may include multiple processing elements and/or multiple types of processing elements.
  • a processing device may include multiple processors or one processor and one controller. Additionally, other processing configurations, such as parallel processors, are possible.
  • Software may include a computer program, code, instructions, or a combination of one or more of these. It may configure a processing device to operate as desired, or it may instruct the processing device independently or collectively.
  • Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave. This allows them to be interpreted by a processing device or to provide instructions or data to a processing device.
  • Software may be distributed over networked computer systems and stored or executed in a distributed manner.
  • Software and data may be stored on one or more computer-readable recording media.
  • Figure 10 is a flowchart illustrating a method for predicting race using mutation frequency according to an embodiment of the present invention.
  • the race prediction method described here can be performed by the race prediction server (see 110 in FIG. 1).
  • the race prediction server can be understood as a concept that includes the components and functions of a race prediction system using mutation frequency according to an embodiment of the present invention.
  • the racial prediction method is only one embodiment of the present invention.
  • Steps may be added as needed, and the following steps may be performed in a different order. The present invention is therefore not limited to the steps and sequence described below.
  • the race prediction server 110 may calculate the frequency of occurrence of mutations by race in conjunction with the population genome mutation database 120.
  • The race prediction server 110 collects two values from the population genome mutation database 120: the number of occurrences of each mutation by race (the number of times each mutation appears in a specific race) and the total number of people by race. It then calculates the frequency of appearance of each mutation by race from these collected values.
  • The race prediction server 110 calculates two rates, both relative to the total number of people (allele number / 2):
    – the homozygote rate, the proportion of all people who carry the homozygote mutation, from the number of people with the homozygote mutation (number of homozygote); and
    – the heterozygote rate, the proportion of all people who carry the heterozygote mutation, from the number of people with the heterozygote mutation (allele count - 2 * number of homozygote).
    These two rates make up the frequency of occurrence of mutations by race.
  • the race prediction server 110 may calculate a score by race of the target using the frequency of occurrence of mutations by race.
  • The race prediction server 110 stores the homozygote rate and the heterozygote rate calculated for each mutation in a per-race table, producing a mutation frequency table by race. It then searches this table for mutations related to the target (target mutations). Next, it loads the per-race homozygote rate and heterozygote rate values for each target mutation found. Finally, it calculates a score by race for the target mutation set, which consists of the plurality of target mutations.
  • the race prediction server 110 may predict the race of the target based on the race score of the target. At this time, the race prediction server 110 may predict the race corresponding to the highest score among the scores for each race of the target as the race of the target.
  • the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium.
  • the computer-readable medium may include program instructions, data files, data structures, etc., singly or in combination.
  • Program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in the art of computer software.
  • Examples of computer-readable recording media include:
    – magnetic media such as hard disks, floppy disks, and magnetic tapes;
    – optical media such as CD-ROMs and DVDs;
    – magneto-optical media such as floptical disks; and
    – hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • program instructions include machine language code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.
  • the hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Physiology (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Public Health (AREA)
  • Ecology (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A race prediction system using variant frequency according to one embodiment of the present invention comprises: a variant frequency calculation unit for calculating variant frequencies in each race in conjunction with a population genomic variant database; a race-specific score calculation unit for calculating race-specific scores of a subject using the variant frequencies in each race; and a race prediction unit for predicting the race of the subject on the basis of the subject's race-specific scores.
PCT/KR2022/019581 2022-10-26 2022-12-05 System and method for predicting race using variant frequency WO2024090667A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2022-0138807 2022-10-26
KR1020220138807A KR102529401B1 (ko) 2022-10-26 2022-10-26 System and method for predicting race using frequency of variant appearance

Publications (1)

Publication Number Publication Date
WO2024090667A1 (fr) 2024-05-02

Family

ID=86381233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/019581 WO2024090667A1 (fr) 2022-10-26 2022-12-05 Race prediction system and method using variant frequency

Country Status (2)

Country Link
KR (1) KR102529401B1 (fr)
WO (1) WO2024090667A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489729A (zh) * 2020-12-04 2021-03-12 北京诺禾致源科技股份有限公司 Gene data query method and apparatus, and non-volatile storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8972406B2 (en) * 2012-06-29 2015-03-03 International Business Machines Corporation Generating epigenetic cohorts through clustering of epigenetic surprisal data based on parameters
KR102138165B1 (ko) * 2020-01-02 2020-07-27 주식회사 클리노믹스 Method for providing an identity analysis service using standard genome maps by nation, ethnicity and race

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KYUNG SUN PARK: "Analysis of worldwide carrier frequency and predicted genetic prevalence of congenital hypothyroidism based on a general population database", GENES, vol. 12, 20 August 2020 (2020-08-20), pages 1 - 9, XP093163273, DOI: 10.22541/au.159795396.63518982 *
SANNA GUDMUNDSSON; MORIEL SINGER-BERK; NICHOLAS A. WATTS; WILLIAM PHU; JULIA K. GOODRICH; MATTHEW SOLOMONSON; GENOME AGGREGATION D: "Variant interpretation using population databases: lessons from gnomAD", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 23 July 2021 (2021-07-23), XP091126968, DOI: 10.1002/humu.24309 *
TAO HUANG, YANG SHU, YU-DONG CAI: "Genetic differences among ethnic groups", BMC GENOMICS, vol. 16, no. 1, 21 December 2015 (2015-12-21), pages 1 - 10, XP055700079, DOI: 10.1186/s12864-015-2328-0 *

Also Published As

Publication number Publication date
KR102529401B1 (ko) 2023-05-08

Similar Documents

Publication Publication Date Title
WO2018106005A1 (fr) System for diagnosing a disease using a neural network, and associated method
WO2024090667A1 (fr) Race prediction system and method using variant frequency
WO2020096098A1 (fr) Method for managing annotation work, and apparatus and system supporting same
WO2017164478A1 (fr) Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics
WO2017116123A1 (fr) System for identifying the cause of a disease using genetic variation information on an individual's genome
WO2021149913A1 (fr) Method and device for selecting a disease-related gene in NGS analysis
WO2019235828A1 (fr) Two-sided disease diagnosis system and associated method
WO2020111378A1 (fr) Method and system for analyzing data to assist in diagnosing a disease
WO2022145564A1 (fr) Method and device for automatic model compression for deep learning model service optimization, and method for providing a cloud inference service using same
WO2022245062A1 (fr) Artificial-intelligence-based method and system for genome analysis and pharmaceutical substance development
WO2018030733A1 (fr) Method and system for measurement/yield correlation analysis
WO2021025218A1 (fr) Device and method for predicting disease risk associated with genetic risk for an associated phenotype
WO2017116139A1 (fr) System for analyzing bioactive variation using genetic variation information on an individual's genome
WO2020235730A1 (fr) Method for predicting learning performance based on a learner's scanning pattern in a video learning environment
WO2018088824A1 (fr) Method and apparatus for detecting an abnormal user using click log data
WO2023090825A1 (fr) Device and method for monitoring artificial intelligence (AI) model drift
WO2024005474A1 (fr) Augmented reality service device and method for providing a display at an appropriate distance
WO2015126058A1 (fr) Method for predicting cancer prognosis
WO2016085262A2 (fr) Virtual drug analysis method, method for creating an intensive analysis library, and associated system
WO2023113445A1 (fr) Method and apparatus for floating-point arithmetic
WO2023182661A1 (fr) Electronic device for big data analysis and operating method thereof
WO2023013959A1 (fr) Apparatus and method for predicting beta-amyloid accumulation
WO2022245063A1 (fr) Method and system for analyzing genomic and medical information and developing a pharmaceutical substance based on artificial intelligence
US20220399078A1 (en) Biological sequence distance explorer system providing user visualization of genomic distance between a set of genomes in a dynamic zoomable fashion
WO2019225798A1 (fr) Method and device for selecting questions from multiple psychological test sheets based on machine learning to rapidly diagnose anxiety and depression symptoms

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22963628

Country of ref document: EP

Kind code of ref document: A1