AU2024201174A1 - Shared memory based gene analysis method, apparatus and computer device - Google Patents


Info

Publication number
AU2024201174A1
Authority
AU
Australia
Prior art keywords
gene
analysis
shared memory
library file
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
AU2024201174A
Inventor
Zengquan HE
Chao SONG
Jin’an WANG
Jiaobo YANG
Chuang YU
Youjin Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Genomics Co Ltd
Bgi Health HK Co Ltd
Original Assignee
BGI Genomics Co Ltd
Bgi Health HK Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011139824.9A (external priority; see CN112270959A)
Application filed by BGI Genomics Co Ltd, Bgi Health HK Co Ltd
Priority to AU2024201174A
Publication of AU2024201174A1
Legal status: Pending


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096 Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Organic Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Library & Information Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Automatic Analysis And Handling Materials Therefor (AREA)

Abstract

A shared memory based gene analysis method, apparatus and computer device. The method comprises: reading sample data and preprocessing the sample data; performing a gene analysis on the preprocessed sample data, and determining whether a library file required in the gene analysis is in a gene shared memory; if yes, obtaining the required library file from the gene shared memory, mapping it to the process performing the gene analysis of the preprocessed sample data, and completing the corresponding analysis. In this method, a shared memory mechanism is used to build indexes for the gene analysis. The method determines whether library files that are frequently used in the gene analysis are in the gene shared memory; if so, a library file can be obtained from the gene shared memory and conveniently mapped into the analysis process performed on the sample data. This greatly reduces the time and I/O spent loading library files from a hard disk, and therefore improves analysis efficiency.

Description

SHARED MEMORY BASED GENE ANALYSIS METHOD, APPARATUS AND COMPUTER DEVICE
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This is a divisional of Australian Patent Application No.
2020457044, the originally-filed specification of which is
incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] The present disclosure relates to the technical field of data
processing, in particular to a shared memory based gene analysis
method, apparatus, computer device, and a computer-readable storage
medium.
BACKGROUND
[0003] With the successful completion of the Human Genome Project and the rapid development of sequencing technology, the cost of sequencing has fallen significantly and its speed has risen significantly. The cost of sequencing a human whole genome has dropped below $1000, and the amount of DNA sequence data has grown exponentially. How to use and interpret these data quickly, analyze and explain potential problems in gene sequences, and discover information beneficial to human beings from massive data has become an urgent problem. The growing range of applications for sequence data generated by human whole genome sequencing (WGS), together with the continuous demand for rapid analysis and processing of massive sequence data, has created a new technical bottleneck in data analysis, which restricts the clinical application of second-generation sequencing technology.
[0004] At present, there are many methods and tools internationally for analyzing second-generation sequencing data in the field of bioinformatics. The most commonly used pipeline mainly comprises data input, preprocessing, sequence alignment, annotation, variant calling and pathway analysis. However, running the whole pipeline on WGS data is very time-consuming. In addition, the input samples require customized processing, such as merging and splitting samples, which must be performed separately, so operating efficiency is low and I/O consumption increases. Furthermore, during data analysis, index files must be loaded separately for each analysis and processing step. If multiple tasks load the same index file, the tasks consume more memory and take more time.
-2- IEE210846PAU
SUMMARY
[0005] In view of this, the disclosure provides a shared memory based gene analysis method, apparatus, computer device, and computer-readable storage medium to solve the technical problems in the prior art of low operating efficiency, caused by some pipelines requiring steps such as merging the input samples, and of high memory and time consumption, caused by repeatedly loading index files during data analysis.
[0006] Some embodiments of this disclosure provide a shared memory
based gene analysis method, comprising: reading sample data and
preprocessing the sample data; performing a gene analysis on the
sample data preprocessed, and determining whether a required library
file in the gene analysis is in a gene shared memory; if yes, obtaining
the required library file from the gene shared memory, mapping the
required library file to a process of the gene analysis of the sample
data preprocessed, and completing a corresponding analysis.
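The lookup in [0006] can be sketched with Python's `multiprocessing.shared_memory` module. This is an illustrative assumption, not the disclosed implementation: each library file is assumed to live in its own named segment, and `lookup_library` is a hypothetical helper that attaches to the segment if it exists and returns `None` otherwise, so the caller can fall back to reading the file from disk.

```python
from multiprocessing import shared_memory
from typing import Optional


def lookup_library(name: str) -> Optional[shared_memory.SharedMemory]:
    """Return a handle to the library file's segment, or None if it is not resident.

    Hypothetical helper: one named shared-memory segment per library file.
    """
    try:
        # Attaching maps the existing segment into this process's address
        # space; no disk I/O and no extra copy of the library file.
        return shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        # Not in the gene shared memory: the caller must load it from disk.
        return None
```

A process that obtains a handle would work directly on `shm.buf` (a `memoryview` over the shared pages) and call `shm.close()` when it has finished, which detaches it without deleting the segment.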
[0007] Optionally, the method further comprises: in a case where the required library file in the gene analysis is not in the gene shared memory, determining whether the required library file meets a loading condition; and, in a case where the loading condition is met, loading the required library file into the gene shared memory.
[0008] Optionally, determining whether the required library file meets the loading condition in a case where the required library file is not in the gene shared memory, and loading the required library file into the gene shared memory in a case where the loading condition is met, comprises: acquiring information of the required library file and information of the gene shared memory, wherein the information of the required library file comprises the space required by the required library file and the number of historical load requests, and the information of the gene shared memory comprises the remaining space of the gene shared memory; and, if the number of historical load requests is greater than a first preset number and the space required by the required library file is less than the remaining space of the gene shared memory, loading the required library file into the gene shared memory.
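The loading condition of [0008] reduces to two comparisons. A minimal sketch follows; `LibraryInfo` and `meets_load_condition` are hypothetical names, and sizes are assumed to be in bytes.

```python
from dataclasses import dataclass


@dataclass
class LibraryInfo:
    name: str
    size: int              # space required by the library file, in bytes
    history_requests: int  # number of historical load requests


def meets_load_condition(lib: LibraryInfo, remaining_space: int,
                         first_preset_number: int) -> bool:
    """Loading condition of [0008]: requested often enough and it fits."""
    frequently_requested = lib.history_requests > first_preset_number
    fits = lib.size < remaining_space  # strictly less than, as in [0008]
    return frequently_requested and fits
```

For example, `meets_load_condition(LibraryInfo("hg38.fa", 3200, 12), 8000, 10)` is `True`: 12 requests exceed the preset number 10, and 3200 bytes fit in the 8000 bytes remaining.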
[0009] Optionally, the information of the required library file further comprises a load request frequency of the required library file, and the information of the gene shared memory further comprises the load request frequencies of all library files. Determining whether the required library file meets the loading condition, and loading the required library file into the gene shared memory in a case where the loading condition is met, further comprises: if the number of historical load requests is greater than the first preset number and the space required by the required library file is greater than the remaining space of the gene shared memory, ranking the required library file and all the library files by priority according to their load request frequencies, to obtain a load request frequency priority for each library file; and, if the load request frequency priority of the required library file is higher than that of a library file in the gene shared memory, and the remaining space of the gene shared memory after deleting that lower-priority library file would be greater than or equal to the space required by the required library file, deleting the lower-priority library file from the gene shared memory and loading the required library file into the gene shared memory.
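The replacement rule in [0009] behaves like a frequency-based eviction policy. The sketch below is one possible reading under stated assumptions: resident files are described by a hypothetical `resident` mapping of name to `(size, load_request_frequency)`, and, going slightly beyond the text (which speaks of deleting a lower-priority library file), it evicts lower-frequency files one at a time, lowest first, until the required file fits, and gives up if that is impossible.

```python
from __future__ import annotations


def plan_eviction(resident: dict[str, tuple[int, float]], required_size: int,
                  required_freq: float, remaining: int) -> list[str] | None:
    """Return names of resident library files to delete so the required file fits.

    An empty list means it already fits; None means it cannot be loaded
    without evicting a file of equal or higher priority.
    """
    if required_size < remaining:
        return []
    # Only files whose load request frequency (priority) is lower than the
    # required file's are eviction candidates, lowest frequency first.
    candidates = sorted(
        (freq, name, size)
        for name, (size, freq) in resident.items()
        if freq < required_freq
    )
    victims, space = [], remaining
    for _freq, name, size in candidates:
        victims.append(name)
        space += size
        if space >= required_size:  # ">=" as in [0009]
            return victims
    return None
```

The caller would delete the returned files and then load the required one; returning a plan rather than mutating the shared memory keeps the policy easy to test.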
[0010] Optionally, the method further comprises: setting up the gene shared memory for library files used in gene analysis, including setting the size of the gene shared memory, the number of library files it can accommodate, the name of each library file and the size offset of each library file; and loading library files commonly used in gene analysis into the gene shared memory according to the size of the gene shared memory, the number of library files it can accommodate, and the name and size offset of each library file.
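The parameters listed in [0010] (total size, number of files, name and offset of each file) suggest a directory at the start of the segment. The byte layout below is purely hypothetical, since the source does not specify one: a fixed header followed by one fixed-width entry per library file, with file data placed after the table.

```python
from __future__ import annotations

import struct

# Hypothetical layout: header = total size of the gene shared memory and the
# number of library files, then one entry per file = name, offset, size.
HEADER = struct.Struct("<QI")    # 8-byte total size, 4-byte file count
ENTRY = struct.Struct("<32sQQ")  # 32-byte name (longer names are truncated), offset, size


def build_directory(total_size: int, files: list[tuple[str, int]]) -> bytes:
    """Assign each library file a byte offset and pack the directory."""
    offset = HEADER.size + ENTRY.size * len(files)  # data starts after the table
    entries = []
    for name, size in files:
        if offset + size > total_size:
            raise ValueError(f"{name} does not fit in the gene shared memory")
        entries.append(ENTRY.pack(name.encode(), offset, size))
        offset += size
    return HEADER.pack(total_size, len(files)) + b"".join(entries)


def read_directory(blob: bytes) -> list[tuple[str, int, int]]:
    """Recover (name, offset, size) for every library file."""
    _total, count = HEADER.unpack_from(blob, 0)
    result = []
    for i in range(count):
        raw, offset, size = ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        result.append((raw.rstrip(b"\0").decode(), offset, size))
    return result
```

With this layout, a process that attaches to the segment only has to parse the directory to find any library file, without reading the others.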
[0011] Optionally, the gene analysis comprises an alignment analysis, a variation analysis and an annotation analysis, and the method further comprises: performing the alignment analysis, the variation analysis and the annotation analysis on the preprocessed sample data in sequence, wherein, in a case where the preprocessed sample data comprises multiple groups of sample data, the multiple groups may be in the same step or in different steps of the gene analysis at any given time.
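The scheduling described in [0011] (steps in a fixed order within a group, groups progressing independently) can be sketched as follows. Threads are used only for brevity; a real CPU-bound pipeline would more likely use processes or external tools, and the step bodies are placeholders.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

STEPS = ("alignment", "variation", "annotation")  # order given in [0011]


def run_group(group: str) -> list[str]:
    """Run the analysis steps for one sample group strictly in sequence."""
    log = []
    for step in STEPS:
        # Placeholder: a real step would call an aligner, variant caller or annotator.
        log.append(f"{group}:{step}")
    return log


def run_groups(groups: list[str], workers: int = 4) -> dict[str, list[str]]:
    """Run several sample groups concurrently.

    Each group advances through the steps on its own, so at a given moment
    two groups may be in the same step or in different steps.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(groups, pool.map(run_group, groups)))
```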
[0012] Optionally, the gene analysis further comprises a sorting analysis and a marking-duplicate analysis, wherein after performing the alignment analysis, the variation analysis and the annotation analysis on the preprocessed sample data in sequence, the method further comprises: labeling the sample data after the alignment analysis with a position tag; and performing the sorting analysis and the marking-duplicate analysis on the labeled sample data module by module.
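One way to read "labeling with a position tag" and sorting "module by module" is sketched below. Assumptions: an aligned record is a dict with hypothetical keys `name`, `chrom` and `pos`; the position tag is the pair (chromosome rank, position); and a "module" is taken to be one chromosome, so each chromosome's records can be sorted independently (in a real system, in parallel) before being concatenated.

```python
from __future__ import annotations

from collections import defaultdict


def position_tag(record: dict, chrom_rank: dict[str, int]) -> tuple[int, int]:
    """Position tag: (rank of the chromosome in the reference, position)."""
    return (chrom_rank[record["chrom"]], record["pos"])


def sort_by_module(records: list[dict], chrom_rank: dict[str, int]) -> list[str]:
    """Tag every aligned record, sort each chromosome bucket, then concatenate."""
    buckets = defaultdict(list)
    for rec in records:
        tag = position_tag(rec, chrom_rank)
        buckets[tag[0]].append((tag, rec["name"]))
    ordered = []
    for rank in sorted(buckets):  # chromosome order
        ordered.extend(name for _tag, name in sorted(buckets[rank]))
    return ordered
```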
[0013] Optionally, the method further comprises: connecting some or all steps of the gene analysis through memory.
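Connecting steps through memory can be illustrated by passing records between steps as in-memory streams instead of writing an intermediate file after each step. The three step functions below are hypothetical stand-ins for real analysis steps; only the chaining pattern is the point of the sketch.

```python
from typing import Iterable, Iterator


def parse_reads(lines: Iterable[str]) -> Iterator[str]:
    """Step 1: turn raw lines into read sequences."""
    for line in lines:
        seq = line.strip()
        if seq:
            yield seq


def drop_short(reads: Iterable[str], min_len: int) -> Iterator[str]:
    """Step 2: discard reads shorter than min_len."""
    for read in reads:
        if len(read) >= min_len:
            yield read


def count_reads(reads: Iterable[str]) -> int:
    """Step 3: consume the stream and summarize it."""
    return sum(1 for _ in reads)


def pipeline(lines: Iterable[str], min_len: int = 4) -> int:
    # Each step pulls records from the previous one as they are produced,
    # so no step writes its output to disk for the next step to re-read.
    return count_reads(drop_short(parse_reads(lines), min_len))
```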
[0014] Optionally, preprocessing the sample data comprises:
performing a quality control, a filtering operation and a statistical
process on the sample data.
[0015] Some embodiments of the disclosure also provide a gene shared memory based gene analysis apparatus, comprising: a data reading module configured to read sample data; a data preprocessing module configured to preprocess the sample data; and a gene analysis module configured to perform a gene analysis on the preprocessed sample data, and determine whether a required library file in the gene analysis is in a gene shared memory; if yes, obtain the required library file from the gene shared memory, map the required library file to a process of the gene analysis of the preprocessed sample data, and complete a corresponding analysis.
[0016] Some embodiments of the disclosure further provide a computer
device, comprising a memory, a processor and a computer program stored
on the memory and executable on the processor. The processor executes
the following steps: reading sample data and preprocessing the sample
data; performing a gene analysis on the sample data preprocessed,
and determining whether a required library file in the gene analysis
is in a gene shared memory; if yes, obtaining the required library
file from the gene shared memory, mapping the required library file
to a process of the gene analysis of the sample data preprocessed,
and completing a corresponding analysis.
[0017] Some embodiments of the disclosure further provide a
computer-readable storage medium on which a computer program is
stored, wherein the computer program when executed by a processor
implements the following steps: reading sample data and preprocessing
the sample data; performing a gene analysis on the sample data
preprocessed, and determining whether a required library file in the
gene analysis is in a gene shared memory; if yes, obtaining the
required library file from the gene shared memory, mapping the
required library file to a process of the gene analysis of the sample
data preprocessed, and completing a corresponding analysis.
[0018] The embodiments of the disclosure provide a gene shared memory based gene analysis method, apparatus, computer device and computer-readable medium. Sample data is read first, then preprocessed, and then a gene analysis is performed on the preprocessed sample data. During the gene analysis, it is determined whether a required library file is in the gene shared memory of library files for gene analysis; if yes, the required library file is obtained from the gene shared memory and mapped to the gene analysis corresponding to the sample data to complete the corresponding analysis. The gene shared memory based gene analysis method uses the gene shared memory mechanism to build indexes for the gene analysis (for example, alignment analysis, variant calling analysis, annotation analysis and so on), and stores the database files required in the gene analysis (i.e. library files) in the gene shared memory. A library file can then be conveniently mapped from the gene shared memory into the process of the gene analysis performed on the sample data. On one hand, the time and I/O spent loading library files from a hard disk are greatly reduced. On the other hand, communication among multiple processes during the gene analysis is facilitated, and repeated loading of library files is avoided.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] In order to explain more clearly the embodiments of the present disclosure or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present disclosure; a person skilled in the art may derive other drawings from them without inventive effort.
[0020] Fig. 1 is a schematic diagram of an application environment
of a shared memory based gene analysis method according to some
embodiments of the present disclosure;
[0021] Fig. 2 is a flow diagram of a shared memory based gene analysis
method according to some embodiments of the present disclosure;
[0022] Fig. 3 is a schematic diagram showing the principle of a shared memory according to some embodiments of the present disclosure;
[0023] Fig. 4 is a flow diagram of constructing a shared memory in
some embodiments of the present disclosure;
[0024] Fig. 5 is a structure diagram of a shared memory in some
embodiments of the present disclosure;
[0025] Fig. 6 is a flow diagram of a shared memory based gene analysis
method according to some embodiments of the present disclosure;
[0026] Fig. 7 is a diagram showing a CPU utilization and an I/O
utilization when a gene analysis is performed using a method A according to some embodiments of the present disclosure;
[0027] Fig. 8 is a diagram showing a CPU utilization and an I/O
utilization when a gene analysis is performed using a method B
according to some embodiments of the present disclosure;
[0028] Fig. 9 is a diagram showing a CPU utilization and an I/O
utilization when a gene analysis is performed using a method C
according to some embodiments of the present disclosure;
[0029] Fig. 10 is a structure diagram of a shared memory based gene
analysis apparatus according to some embodiments of the present
disclosure;
[0030] Fig. 11 is a structure diagram of a computer device according
to some embodiments of the present disclosure.
DETAILED DESCRIPTION
[0031] The technical solutions in the embodiments of the present
disclosure will be clearly and completely described below. Obviously,
the described embodiments are only a part of the embodiments of the
present disclosure, but not all of the embodiments. All other
embodiments obtained by those of ordinary skill in the art based on
the embodiments of the present disclosure without creative efforts
shall fall within the protection scope of the present disclosure.
[0032] Glossary:
[0033] Gene (Mendelian factor): a DNA or RNA sequence that carries genetic information (that is, a DNA or RNA fragment with genetic effects), also known as a genetic factor; it is the basic genetic unit that controls biological traits. A gene expresses the genetic information it carries by directing the synthesis of proteins, thereby controlling the traits of individual organisms. Gene sequencing is a new type of gene detection technology that analyzes and determines the complete sequence of genes from blood or saliva, so as to predict the possibility of suffering from a variety of diseases, individual behavior characteristics and reasonable behaviors.
[0034] Read: A short sequencing fragment, which is sequencing data
generated by a high-throughput sequencer. Tens of millions of reads
will be generated by sequencing an entire genome. Then, by splicing
these reads together, the full sequence of the genome can be obtained.
[0035] Alignment analysis: Reads sequenced by NGS are stored in FASTQ files. Although they originally came from an ordered genome, after DNA library construction and sequencing the order of the reads in the files no longer reflects their genomic positions. Two reads adjacent in a FASTQ file therefore have no positional relationship; each is a short sequence randomly derived from some position in the original genome. We therefore need to take these many short sequences, align them one by one against the reference genome of the species, find the position of each read on the reference genome, and then arrange them in order. This process is called the alignment of sequencing data.
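The idea of locating each read on the reference can be illustrated with a toy exact-match aligner: index every k-mer of the reference, seed each read with its first k-mer, and verify the full read. This is a deliberately simplified stand-in, not how BWA works (BWA uses an FM-index and tolerates mismatches and gaps); all names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int) -> int | None:
    """Seed with the read's first k-mer, then verify an exact full-length match."""
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            return pos  # 0-based position on the reference
    return None  # no exact placement found
```

Sorting reads by the returned positions restores their genomic order, which is the job of the sorting step described next.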
[0036] Sorting analysis: Why are the BAM files output after BWA alignment unsorted? Because the sequenced reads in the FASTQ files are randomly distributed across the genome. The first step of alignment locates the reads on the reference genome one by one, in their order in the FASTQ files, and outputs them directly. This step cannot automatically recognize the order of their alignment positions and rearrange the results. Therefore, in the result file obtained after alignment, the records are not in positional order. The records must be sorted for subsequent steps such as marking duplicates, which is why sorting is needed.
[0037] Marking-duplicate: After sorting is completed, deduplication is performed (i.e., PCR duplicate sequences are removed). What is a duplicate sequence, how is it produced, and why does it need to be removed? It is related to library construction and sequencing in the experimental process. Before NGS sequencing, a sequencing library must be constructed: the original DNA is fragmented physically (by ultrasonic shearing) or chemically (by enzymatic digestion), and fragments in a specific length range are then selected for PCR amplification and loaded onto the sequencer. The duplicate sequences here are therefore introduced during PCR.
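A simplified duplicate-marking rule can be sketched as follows, assuming each aligned read is a dict with hypothetical keys `name`, `chrom`, `pos`, `strand` and `qual_sum` (the sum of its base qualities). Reads sharing chromosome, position and strand are treated as PCR copies of one fragment and only the best-quality copy is kept; production tools such as Picard MarkDuplicates apply more detailed criteria.

```python
from __future__ import annotations


def mark_duplicates(records: list[dict]) -> set[str]:
    """Return the names of reads flagged as PCR duplicates.

    Reads sharing (chrom, pos, strand) are treated as copies of one original
    fragment; the copy with the highest qual_sum is kept, the rest are marked.
    """
    best = {}
    duplicates = set()
    for rec in records:
        key = (rec["chrom"], rec["pos"], rec["strand"])
        kept = best.get(key)
        if kept is None:
            best[key] = rec
        elif rec["qual_sum"] > kept["qual_sum"]:
            duplicates.add(kept["name"])  # previous representative loses
            best[key] = rec
        else:
            duplicates.add(rec["name"])
    return duplicates
```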
[0038] Base quality score correction (recalibration): its purpose is to correct, as far as possible, systematic errors in the sequencing process, because variant calling relies heavily on the base quality scores. The quality score is an important (even the only) indicator of how likely a sequenced base is to be correct. The true error rate cannot be measured directly, but a very close estimate can be obtained statistically. A known variant found in a population is likely to also be present in a given individual. Therefore, the alignment results can be analyzed directly: all known variant sites are excluded, and for each (reported) quality score, the number of bases that differ from the reference genome after alignment is counted. These differing bases are treated as errors, and their proportion reflects the true base error rate, which is converted into a Phred score. This information is written to a calibration table file and used to re-adjust the base quality scores in the original BAM file. A new BAM file is output with these new quality scores.
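The counting described in [0038] can be sketched with the Phred relation $Q = -10\log_{10} p$. Assumptions of this sketch: each observation is a hypothetical tuple `(site, reported_q, is_mismatch)`, bins are keyed by reported quality only, and zero-error bins are floored at one error so the logarithm is defined. Real tools (for example GATK BaseRecalibrator) also model covariates such as machine cycle and sequence context.

```python
from __future__ import annotations

import math
from collections import defaultdict


def recalibration_table(observations, known_sites: set[int]) -> dict[int, float]:
    """Empirical Phred quality for each reported quality score.

    observations: iterable of (site, reported_q, is_mismatch) for aligned bases.
    """
    total = defaultdict(int)
    errors = defaultdict(int)
    for site, reported_q, is_mismatch in observations:
        if site in known_sites:  # known variants are not sequencing errors
            continue
        total[reported_q] += 1
        errors[reported_q] += int(is_mismatch)
    table = {}
    for q, n in total.items():
        error_rate = max(errors[q], 1) / n  # assumption: floor at one error
        table[q] = -10 * math.log10(error_rate)  # Phred: Q = -10 log10(p)
    return table


def recalibrate(qualities: list[int], table: dict[int, float]) -> list[int]:
    """Replace each reported score with its empirical score (unchanged if unseen)."""
    return [round(table.get(q, q)) for q in qualities]
```

For example, if 10 of 1,000 non-variant bases reported at Q30 mismatch the reference, the empirical error rate is 0.01 and the recalibrated score is 20.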
[0039] Variant calling and analysis: the purpose of variant calling
and analysis is to accurately detect a variation set in the genome
of each sample (such as human), that is, those DNA sequences that
are different for different people.
[0040] In order to make the object, technical solution and advantages
of the present application more clear and explicit, the present
application will be further described in detail in combination with
the drawings and the embodiments. It should be understood that the
detailed embodiments that will be described herein are only used for
explaining the present application, but not used for limiting the
present application.
[0041] This method can be applied to the terminal 102 in Fig. 1. The
terminal can be a personal computer, laptop, etc. The terminal 102
is connected with a gene sequencing device 104, which can be a gene
sequencer, etc.
[0042] When the terminal 102 is connected with the gene sequencing
device 104 through a local interface, the gene sequencing device 104
can send sample data after sequencing to the terminal 102. In addition,
the terminal 102 can obtain the sample data after sequencing in the
gene sequencing device 104 through instructions.
[0043] In some embodiments, as shown in Fig. 2, a shared memory based gene analysis method is provided. Taking its application to the terminal in Fig. 1 as an example, the method comprises the following steps:
[0044] In step S202, sample data is read and the sample data is
preprocessed.
[0045] The sample data is data generated or formed after gene
sequencing of samples. The number of the samples can be one or more
groups.
[0046] In an optional embodiment, preprocessing the sample data
comprises: performing a quality control, a filtering operation and
a statistical process on the sample data.
[0047] The data obtained from gene sequencing is called raw data (i.e. raw reads). The raw data may contain low-quality sequences and adapter sequences, which will affect the analysis result. Therefore, a series of processing steps, such as quality control, filtering and statistics, must be carried out on the raw data to remove such contaminants and to determine whether the sequencing data is suitable for subsequent analysis.
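A minimal preprocessing sketch for (sequence, quality) pairs follows, assuming Phred+33 quality encoding. The adapter string and all thresholds are illustrative assumptions, not values from the source, and real tools often trim adapters rather than discarding the whole read.

```python
from __future__ import annotations


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def preprocess(reads, adapter: str, min_mean_q: float = 20.0,
               max_n_fraction: float = 0.1):
    """Filter (sequence, quality) pairs and collect simple statistics.

    Thresholds are illustrative defaults, not values from the source.
    """
    stats = {"total": 0, "adapter": 0, "too_many_n": 0, "low_quality": 0}
    kept = []
    for seq, qual in reads:
        stats["total"] += 1
        if adapter in seq:  # adapter contamination
            stats["adapter"] += 1
        elif seq.count("N") / len(seq) > max_n_fraction:  # too many unknown bases
            stats["too_many_n"] += 1
        elif mean_phred(qual) < min_mean_q:  # low-quality read
            stats["low_quality"] += 1
        else:
            kept.append((seq, qual))
    stats["kept"] = len(kept)
    return kept, stats
```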
[0048] In step S204, a gene analysis on the sample data preprocessed
is performed, and whether a required library file in the gene analysis
is in a gene shared memory is determined.
[0049] Generally, after the sample data is preprocessed, a relevant gene analysis must be carried out on it. A common analysis mainly comprises sequence alignment (i.e. alignment analysis), variant calling (i.e. variation analysis), annotation statistics (i.e. annotation analysis) and subsequent pathway analysis (such as GO analysis, KEGG analysis and protein pathway analysis). Whichever analysis is carried out, it requires an analysis database. For example, a reference genome database is required for the alignment analysis, a species genome database (such as a human genome database) for variant calling, an annotation database for annotation analysis, a pathway database for pathway analysis, and so on. Each database contains a large amount of data, and these databases must be loaded when the analysis is carried out on the sample data.
[0050] Shared memory is one of the three System V interprocess
communication (IPC) mechanisms, alongside message queues and
semaphores. As its name implies, shared memory allows two unrelated
processes to access the same logical memory, and it is a very
efficient way to share and transfer data between two running
processes. The memory shared between different processes is usually
the same piece of physical memory. Each process can attach this
physical memory to its own address space, after which all attached
processes can access addresses in the shared memory. If one process
writes data to the shared memory, the change is immediately visible
to every other process that can access the same shared memory.
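The principle can be demonstrated with Python's standard `multiprocessing.shared_memory` module (Python 3.8+). Note that on Unix this module is built on POSIX shared memory (`shm_open`) rather than the System V `shmget`/`shmat` calls, so this is an illustration of the same idea, not of the System V API itself. For a self-contained example, a second handle in the same process stands in for the second process; the function name `share_and_read` is hypothetical.

```python
from multiprocessing import shared_memory


def share_and_read(payload: bytes):
    """Write `payload` through one handle and read it through another."""
    # "ProcA": create a named segment and write into it.
    writer = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        writer.buf[:len(payload)] = payload
        # "ProcB": attach to the same segment by name. In practice this
        # would be a different process; a second handle stands in for it.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            before = bytes(reader.buf[:len(payload)])
            writer.buf[0] = ord("X")           # write via ProcA's mapping...
            after = bytes(reader.buf[:1])      # ...is seen via ProcB's mapping
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()                        # delete the segment itself
    return before, after
```

Calling `share_and_read(b"libx")` shows both properties described above: the reader sees the bytes the writer stored, and a later write through one mapping is immediately visible through the other, without any copy between the two.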
[0051] Fig. 3 is a schematic diagram showing the communication
principle of a shared memory. In Linux, each process has its own
process control block (PCB), address space (Addr Space) and page
table; the page table maps the virtual addresses of the process to
physical addresses and is managed through a memory management unit
(MMU). Through their page tables, two different virtual addresses may
be mapped to the same area of physical memory, and the area they both
point to is a shared memory. Referring to Fig. 3, there are two
processes, ProcA and ProcB. When their virtual addresses are mapped
to physical addresses through their respective page tables, they
share a common area of physical memory, that is, a shared memory,
which both processes can see at the same time. In this way, when one
process writes and the other reads, interprocess communication is
realized between the two processes. The shared memory is implemented
using reference counting: when a process successfully attaches to the
shared memory area, a counter increases by one, and when a process
detaches from it, the counter decreases by one. The shared memory
area can be deleted only when the counter reaches zero. When a process
terminates, any shared memory area attached to it is detached
automatically.
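The reference-counting rule can be modelled in a few lines. In System V the counter corresponds to the `shm_nattch` field reported by `shmctl(IPC_STAT)`, and a segment marked for removal with `IPC_RMID` is destroyed only once that count reaches zero. The class below is a toy in-memory model of those rules, not an interface to the kernel; `SharedSegment` and its methods are hypothetical names.

```python
from collections import Counter


class SharedSegment:
    """Toy model of the attach/detach counter of a shared memory area."""

    def __init__(self):
        self._attaches = Counter()       # pid -> number of live attaches
        self.marked_for_removal = False
        self.destroyed = False

    @property
    def nattch(self):
        return sum(self._attaches.values())

    def attach(self, pid):
        if self.destroyed:
            raise RuntimeError("segment has been destroyed")
        self._attaches[pid] += 1         # counter + 1

    def detach(self, pid):
        if self._attaches[pid] == 0:
            raise RuntimeError(f"process {pid} is not attached")
        self._attaches[pid] -= 1         # counter - 1
        self._maybe_destroy()

    def process_exit(self, pid):
        # A terminating process is detached from every attach it still holds.
        self._attaches[pid] = 0
        self._maybe_destroy()

    def remove(self):
        # Deletion is only a request; it completes when the counter is zero.
        self.marked_for_removal = True
        self._maybe_destroy()

    def _maybe_destroy(self):
        if self.marked_for_removal and self.nattch == 0:
            self.destroyed = True
```

For example, if processes 1 and 2 attach and a removal is requested, the segment survives the explicit detach of process 1 and is destroyed only when process 2 exits and is detached automatically.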
[0052] In the embodiments, a gene shared memory is constructed for
library files in gene analysis, in which the databases most commonly
used in gene analysis processing can be stored. When a database is
needed in an analysis of sample data, it can be obtained directly from
the gene shared memory, which greatly reduces the time spent loading
the database from disk. In addition, when multiple groups of sample
data are analyzed at the same time, the database can be shared among
them, which reduces repeated loading and I/O occupation.
[0053] In an optional embodiment, as shown in Fig. 4, there is also
provided a method of constructing a shared memory, comprising steps
S402 and S404.
[0054] In step S402, a gene shared memory is set up for the library
files used in gene analysis, and the size of the gene shared memory,
the number of library files it can accommodate, the name of each
library file and the size offset of each library file are set.
[0055] In step S404, library files commonly used in gene analysis are
loaded into the gene shared memory according to the size of the gene
shared memory, the number of library files it can accommodate, the
name of each library file and the size offset of each library file.
[0056] Referring to Fig. 5, a certain area of a terminal system (i.e.
a hardware device used for a gene analysis of sample data) is selected
as the gene shared memory for library files in the gene analysis. An
appropriate size of the gene shared memory is determined according to
the storage space, data processing capability and other performance
characteristics of the terminal system. The table header of the gene
shared memory in the node's physical memory is designed as follows:
1) first, the number (n) of determined shared libraries and the total
length (Len) of the shared area are stored; 2) then the name (e.g.
Lib1, Lib2) and length offset (offset1, offset2) of each specified
library file in the gene shared memory are stored; 3) finally, the
data of each specified library file is stored in the selected area in
turn.
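The header layout can be sketched with Python's `struct` module. The field widths are assumptions not given in the text: `n` is a 32-bit integer, `Len` a 64-bit integer, each name a fixed 16-byte field, and each offset a 64-bit byte offset measured from the start of the area. A library's length is taken as the distance to the next offset (or to `Len` for the last library). `build_area` and `find_library` are hypothetical helpers; in a real deployment the packed bytes would be written into the shared memory buffer and `find_library` would be called on that buffer.

```python
import struct

NAME_LEN = 16                               # assumed fixed-width name field
HEAD = struct.Struct("<IQ")                 # n libraries, total length Len
ENTRY = struct.Struct(f"<{NAME_LEN}sQ")     # library name, byte offset


def build_area(libs):
    """Pack {name: bytes} as header + name/offset table + data, in order."""
    n = len(libs)
    offset = HEAD.size + n * ENTRY.size     # data starts after the tables
    entries, blobs = [], []
    for name, blob in libs.items():
        raw = name.encode("ascii")
        if len(raw) > NAME_LEN:
            raise ValueError(f"library name too long: {name}")
        entries.append(ENTRY.pack(raw, offset))   # name is null-padded
        blobs.append(blob)
        offset += len(blob)
    return HEAD.pack(n, offset) + b"".join(entries) + b"".join(blobs)


def find_library(area, name):
    """Return a zero-copy view of one library's bytes, or None."""
    view = memoryview(area)
    n, total_len = HEAD.unpack_from(view, 0)
    for i in range(n):
        raw, start = ENTRY.unpack_from(view, HEAD.size + i * ENTRY.size)
        if raw.rstrip(b"\0").decode("ascii") == name:
            # The next library's offset (or Len for the last one) ends it.
            if i + 1 < n:
                _, end = ENTRY.unpack_from(view, HEAD.size + (i + 1) * ENTRY.size)
            else:
                end = total_len
            return view[start:end]
    return None
```

Returning a `memoryview` slice rather than a copy mirrors the goal of the design: a sample process reads the library in place instead of loading its own copy.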
[0057] The working principle is as follows: the sample data can
comprise multiple groups of data, and each group of data has a
corresponding sample process. Each of the sample processes P1 to PN
has its own process control block (PCB), address space (Addr space)
and page table, which maps the virtual addresses of the process to
physical addresses and is managed through a memory management unit
(MMU). Two different virtual addresses may be mapped to the same area
of physical memory through the page tables, and the area they point
to is a shared memory. In this way, each sample process can access the
shared memory area and obtain the library file it requires from it.
[0058] In step S206, if so, the required library file is obtained from
the gene shared memory and mapped into the process of the gene
analysis of the preprocessed sample data, and the corresponding
analysis is completed.
[0059] In the gene shared memory based gene analysis method provided
in the embodiments of the disclosure, sample data is read first, then
the sample data is preprocessed, and then a gene analysis is performed
on the preprocessed sample data. In the gene analysis, it is
determined whether a required library file is in a gene shared memory
of library files in gene analysis; if so, the required library file is
obtained from the gene shared memory and mapped to the gene analysis
corresponding to the sample data to complete the corresponding
analysis. The method uses the gene shared memory mechanism to
establish indexes for gene analysis (for example, alignment analysis,
variant calling analysis and annotation analysis), and stores the
database files (i.e. library files) required in the gene analysis in
the gene shared memory. A library file can then be conveniently mapped
from the gene shared memory to a process of the gene analysis
performed on the sample data. On the one hand, the time and the I/O
occupation for loading the library file from a hard disk are greatly
reduced. On the other hand, communication among multiple processes
during the gene analysis is facilitated and repeated loading of the
library file is avoided.
[0060] In some embodiments, the method further comprises: determining
whether the required library file meets a load condition, in a case
where the required library file in the gene analysis is not in the
gene shared memory; and loading the required library file into the
gene shared memory, in a case where the load condition is met.
[0061] Specifically, if the required library file in the gene analysis
is not in the gene shared memory, it is determined whether the
required library file meets the load condition. If the load condition
is met, the required library file can be loaded into the gene shared
memory. On the one hand, loading the required library file into the
gene shared memory and then obtaining it from there is faster and more
efficient; on the other hand, it also allows other sample data
processes to use the required library file, which avoids repeated
loading.
[0062] In some embodiments, determining whether the required library
file meets the load condition in a case where it is not in the gene
shared memory, and loading it into the gene shared memory in a case
where the load condition is met, comprises:
[0063] acquiring information of the required library file and
information of the gene shared memory, wherein the information of
the required library file comprises a space required by the required
library file and the number of historical load requests, and the
information of the gene shared memory comprises a remaining space
of the gene shared memory; and if the number of historical load
requests is greater than a first preset number, and the space required
by the required library file is less than the remaining space of the
gene shared memory, loading the required library file into the gene
shared memory.
[0064] The information of the required library file refers to
information related to the required library file, which can comprise
a type of the required library file, a size of the required library
file, a space required by the required library file, the number of
historical load requests and a load request frequency of the required
library file, etc. The information of the gene shared memory refers to
information related to the gene shared memory, mainly comprising a
size of the gene shared memory, a remaining space of the gene shared
memory, etc.
[0065] The first preset number is a preset value that reflects, to a
certain extent, the importance of a library file. That is, if the
number of historical load requests is greater than the first preset
number, the required library file is needed or used frequently, i.e.
it is important in the gene analysis, and it can be loaded into the
gene shared memory so that other sample data can also use it. After
the importance of the required library file is determined, it is
further necessary to determine whether the remaining space of the gene
shared memory is enough to store the required library file, that is,
whether the space required by the required library file is less than
the remaining space of the gene shared memory. If so, the required
library file can be loaded directly into the gene shared memory.
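The two checks of the load condition can be written as one predicate. This is a minimal sketch: `LibraryInfo` and `meets_load_condition` are hypothetical names, sizes are assumed to be in bytes, and the comparisons are strict because the text says "greater than" and "less than".

```python
from dataclasses import dataclass


@dataclass
class LibraryInfo:
    name: str
    required_space: int         # space needed in the gene shared memory
    historical_requests: int    # number of historical load requests


def meets_load_condition(lib: LibraryInfo, remaining_space: int,
                         first_preset_number: int) -> bool:
    """Importance check first, then a free-space check."""
    important = lib.historical_requests > first_preset_number
    fits = lib.required_space < remaining_space
    return important and fits
```

When the library is important but `fits` is false, the eviction path of the following embodiment applies instead of a direct load.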
[0066] In some embodiments, the information of the required library
file further comprises a load request frequency of the required
library file, and the information of the gene shared memory further
comprises load request frequencies of all library files. Determining
whether the required library file meets the load condition, and
loading the required library file into the gene shared memory in a
case where the load condition is met, further comprises: if the number
of historical load requests is greater than the first preset number,
and the space required by the required library file is greater than
the remaining space of the gene shared memory, ranking the required
library file and all the library files in an order of priority
according to the load request frequency of the required library file
and the load request frequencies of all the library files, to obtain a
load request frequency priority of each library file; if the load
request frequency priority of the required library file is higher than
that of a library file in the gene shared memory, and if the remaining
space of the gene shared memory after deleting the library file with
the lower load request frequency priority is greater than or equal to
the space required by the required library file, deleting the library
file with the lower load request frequency priority from the gene
shared memory; and loading the required library file into the gene
shared memory.
[0067] Specifically, if the space required for the required library
file is greater than the remaining space of the gene shared memory,
the remaining space is not enough to store the required library file.
In this case, the required library file is compared with the library
files already stored in the gene shared memory, a library file with a
low load request frequency is deleted according to the load request
frequency priorities of the library files, and the required library
file is then loaded into the gene shared memory.
[0068] In the embodiments, the required library file and the library
files stored in the gene shared memory are ranked in an order of
priority mainly according to the load request frequency of each
library file. If the load request frequency priority of the required
library file is higher than that of a library file in the gene shared
memory, that library file is deleted from the gene shared memory so
that the required library file can be loaded. The sizes of all the
library files are taken into account in this process: it is only
necessary to ensure that, after the deletion, the remaining space of
the gene shared memory is sufficient to store the required library
file.
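The priority-based deletion can be sketched as a greedy plan. Assumptions not stated in the text: resident libraries are described as `{name: (size, load_request_frequency)}`, only libraries with a strictly lower frequency than the required one are eligible, the lowest-frequency libraries are deleted first, and nothing is deleted unless the plan as a whole frees enough space. `plan_eviction` and `load_with_eviction` are hypothetical names.

```python
def plan_eviction(resident, req_size, req_freq, remaining):
    """Pick libraries to delete so a library of `req_size` fits.

    resident: {name: (size, load_request_frequency)} for libraries already in
    the gene shared memory; remaining: current free space. Returns a list of
    names to delete (possibly empty), or None if no valid choice exists.
    """
    # Only libraries ranked below the required one are eligible, and the
    # lowest-priority (least frequently requested) ones go first.
    candidates = sorted(
        (name for name, (_, freq) in resident.items() if freq < req_freq),
        key=lambda name: resident[name][1],
    )
    victims, free = [], remaining
    for name in candidates:
        if free >= req_size:
            break
        victims.append(name)
        free += resident[name][0]
    return victims if free >= req_size else None


def load_with_eviction(resident, req_name, req_size, req_freq, remaining):
    """Apply the plan; returns (new_resident, new_remaining, loaded)."""
    victims = plan_eviction(resident, req_size, req_freq, remaining)
    if victims is None:
        return resident, remaining, False
    kept = {n: v for n, v in resident.items() if n not in victims}
    freed = sum(resident[n][0] for n in victims)
    kept[req_name] = (req_size, req_freq)
    return kept, remaining + freed - req_size, True
```

With `A` (size 40, frequency 0.5), `B` (30, 0.1) and `C` (20, 0.05) resident and 10 units free, a required library of size 45 and frequency 0.3 causes `C` and then `B` to be deleted, leaving 15 units free after loading; `A` ranks higher and is never touched.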
[0069] In this way, when the required library file in the process
of the gene analysis is not in the gene shared memory, the library
file can be loaded into the gene shared memory first, so as to improve
the efficiency of a subsequent calculation.
[0070] In some embodiments, the gene analysis comprises an alignment
analysis, a variation analysis and an annotation analysis; the method
further comprises: performing the alignment analysis, the variation
analysis and the annotation analysis on the preprocessed sample data
in sequence, wherein, in a case where the preprocessed sample data
comprises multiple groups of sample data, the multiple groups of
sample data may be in the same step or in different steps of the gene
analysis at any given time.
[0071] In the embodiments, the gene analysis comprises the alignment
analysis, the variation analysis and the annotation analysis. There
is usually an ordering requirement in the gene analysis: the alignment
analysis is generally carried out first, followed by the variation
analysis and then the annotation analysis. However, when there are
multiple groups of sample data, each group can be in the same step as
the others or in a different step. For example, sample data 1 can be
in an alignment analysis, sample data 2 in a variation analysis, and
sample data 3 in an annotation analysis. It is also possible for
sample data 1, sample data 2 and sample data 3 to be in the same
analysis (alignment, variation or annotation) at the same time.
Because the method allows multiple groups of sample data to be
processed simultaneously, it further improves the data processing
speed.
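The ordering rule (fixed stage order within a sample, no coordination across samples) can be illustrated with a thread pool. This is a sketch under stated assumptions: threads stand in for the per-sample processes, a plain dict stands in for the libraries held in the gene shared memory, and `run_sample` and `run_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

STAGES = ("alignment", "variation", "annotation")  # fixed order per sample


def run_sample(sample_id, shared_libs, log):
    """Run one sample through the three stages, in order."""
    for stage in STAGES:
        library = shared_libs[stage]     # all samples read the same copy
        log.append((sample_id, stage, library))
    return sample_id


def run_all(sample_ids, shared_libs, workers=3):
    """Process several samples concurrently; each may be at any stage."""
    log = []                             # list.append is atomic in CPython
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(lambda s: run_sample(s, shared_libs, log),
                             sample_ids))
    return done, log
```

The interleaving of entries in `log` across samples is not fixed from run to run, but for any single sample the stages always appear in the order alignment, variation, annotation.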
[0072] In some embodiments, the gene analysis further comprises a
sorting analysis and a marking-duplicate analysis, wherein after
performing the alignment analysis, the variation analysis and the
annotation analysis on the preprocessed sample data in sequence, the
method further comprises: labeling the sample data after the alignment
analysis with a position tag; and performing the sorting analysis and
the marking-duplicate analysis by module on the labeled sample data.
[0073] Specifically, the gene analysis further comprises the sorting
analysis and the marking-duplicate analysis. Labeling the sample data
after the alignment analysis with a position tag means adding a
position-related tag to the file produced by the alignment, so that
the sorting analysis and the marking-duplicate analysis can be
performed by module, and more efficient multi-threaded sorting can be
applied to the sorting analysis and the marking-duplicate analysis.
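One way to read "position tag, then sorting and marking duplicates by module" is to bin aligned reads by (chromosome, position window) and process each bin independently. The sketch below makes several assumptions that are not in the text: a hypothetical bin width of 1,000,000 bases, reads as plain dicts, and a simplified duplicate criterion of identical (chromosome, position, strand); production duplicate marking also considers clipping and mate positions. Because reads with the same key always fall in the same bin, per-bin marking gives the same result as a global pass. In CPython the GIL limits the speedup of pure-Python work, so the thread pool here only illustrates that bins are independent.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

BIN_SIZE = 1_000_000  # hypothetical width of one position bin, in bases


def tag_by_position(alignments):
    """Attach a (chromosome, bin) position tag to each aligned read."""
    return [((a["chrom"], a["pos"] // BIN_SIZE), a) for a in alignments]


def _sort_and_mark(reads):
    """Sort one bin by position and flag repeats of (chrom, pos, strand)."""
    reads = sorted(reads, key=lambda a: (a["pos"], a["strand"]))
    seen = set()
    for a in reads:
        key = (a["chrom"], a["pos"], a["strand"])   # simplified criterion
        a["duplicate"] = key in seen
        seen.add(key)
    return reads


def sort_and_mark_duplicates(tagged, workers=4):
    """Group reads by tag, then process the bins in a thread pool."""
    bins = defaultdict(list)
    for tag, aln in tagged:
        bins[tag].append(aln)
    ordered_tags = sorted(bins)  # plain string order of chromosome names
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(_sort_and_mark, (bins[t] for t in ordered_tags)))
    return [a for chunk in chunks for a in chunk]
```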
[0074] In some embodiments, the method further comprises: connecting
some or all steps of the gene analysis through memory.
[0075] Specifically, several or all of the alignment, sorting,
marking-duplicate and variant calling steps of the gene analysis can
be connected through memory. Connecting the steps through memory
reduces the intermediate SAM/BAM files written out between steps,
which reduces the I/O occupation.
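Connecting steps through memory amounts to streaming records from one stage into the next instead of writing and re-reading an intermediate file. A minimal sketch with Python generators follows; the three stages are hypothetical toy stand-ins for an aligner, a sorter and a variant caller, and only the chaining in `connect_in_memory` is the point being illustrated.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def connect_in_memory(source: Iterable[Record], *stages: Stage) -> Iterator[Record]:
    """Chain stages so each one consumes the previous one's output directly,
    instead of reading an intermediate SAM/BAM file back from disk."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


def toy_align(records):
    # Hypothetical stand-in for an aligner: "position" = first 'G' in the read.
    for r in records:
        yield {**r, "pos": r["seq"].find("G")}


def toy_sort(records):
    # Sorting needs every record, but buffers them in memory, not on disk.
    yield from sorted(records, key=lambda r: r["pos"])


def toy_call(records):
    # Hypothetical stand-in for variant calling: keep placed reads only.
    for r in records:
        if r["pos"] >= 0:
            yield {"name": r["name"], "pos": r["pos"]}
```

Usage: `list(connect_in_memory(reads, toy_align, toy_sort, toy_call))` runs all three stages without any file being written between them.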
[0076] For ease of understanding, a detailed embodiment is given
below. Fig. 6 shows the whole process of a gene analysis and the
process in the gene shared memory area. The gene analysis proceeds as
follows: after samples are input, the data of each sample is
preprocessed, and it is determined whether a library file required
for the alignment analysis has been loaded into the gene shared memory
area; if so, the alignment analysis is started; if not, the library
file is loaded from a hard disk to perform the alignment analysis. The
alignment process is merged into a single flexible step through a
memory connection and algorithm optimization. The variant calling is
then performed, and it is determined whether a library file of
annotation information has been loaded into the gene shared memory; if
so, the annotation statistics are started; if not, the library file is
loaded from a hard disk for the annotation statistics. The analysis
process then ends.
[0077] The process in the gene shared memory area is as follows: when
there is a request for information of library lib-x (i.e. a required
library file), it is determined whether the required library file is
in the gene shared memory area. If so, the library data is returned
and the process ends. If not, it is determined through a load method
Q whether to load the required library file; if so, the required
library file is loaded into the gene shared memory area, the library
data is returned, and the process ends; if the required library file
is not to be loaded according to the load method Q, no information is
returned and the process ends.
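The request handling reduces to a hit/miss check with a pluggable load decision. In this sketch a plain dict stands in for the gene shared memory area, and the names `request_library`, `should_load` and `load_from_disk` are hypothetical; `should_load` plays the role of load method Q. A return value of `None` corresponds to "no information", in which case the caller falls back to loading the library from a hard disk, as in the flow of Fig. 6.

```python
def request_library(name, shared_area, should_load, load_from_disk):
    """Serve a request for library `name` from the gene shared memory.

    shared_area: dict name -> bytes standing in for the shared memory area.
    should_load: callable answering the load method Q question for `name`.
    load_from_disk: callable returning the library bytes from disk.
    """
    if name in shared_area:                  # hit: return the library data
        return shared_area[name]
    if should_load(name):                    # miss, but Q says load it
        shared_area[name] = load_from_disk(name)
        return shared_area[name]
    return None                              # miss, not loaded: no information
```

Once a library has been loaded on a miss, later requests for it, including those from other sample processes, are served from the shared area without touching the disk again.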
[0078] The specific steps of the load method Q are as follows:
1. the type and size of the required library file are determined;
2. a record file is obtained;
3. the total memory size of the node, the size of the shared memory
area, the number of historical load requests of this type of library
and the total number of historical load requests of all types of
libraries are read from the record file;
4. the memory size of the node is re-read from the hard disk, in case
it has changed;
5. the number of historical load requests of this type of library is
increased by 1 (ftype + 1);
6. the total number of historical load requests of all types of
libraries is increased by 1 (ftotal + 1);
7. it is determined whether the remaining space is enough to load the
library;
8. the request frequencies (ftype/ftotal) of all types of libraries in
the record file are ranked in descending order, and a ranked linked
list is returned;
9. it is determined whether the required library file has been loaded;
if so, a library index is returned; if not, and the number of
historical load requests of this type of library is more than 10, its
priority and rank position among all unloaded libraries are
determined;
10. if the priority of this type of library exceeds that of a loaded
library, the system predicts whether the sum of the sizes of the
loaded libraries ranked after this type of library in the typelist
satisfies a condition W on the memory size needed to load this type of
library; if so, these loaded libraries are unloaded in reverse order
until the condition W is met; if not, nothing is done;
11. if the load condition is met, the library is loaded and the
recorded size of the shared memory area is updated;
12. otherwise, the library is marked as not loaded because of
insufficient memory, and the record file is updated accordingly.
[0079] The format of the record file is given below:
[0080] M: 63492649171200
[0081] Len: 13492649171200
[0082] ftotal: 100
Type   Size             Loaded   Historical load requests (ftype)   typeflag
Libx   10000000000000   Yes      75                                 0
Liby   3492649171200    Yes      12                                 0
Libw   40000000000000   No       10                                 1
Libz   5000000          No       3                                  0
[0083] typeflag indicates the reason for not being loaded, wherein
"1" indicates that the load priority of this type of library was ranked
first and it was not loaded because of insufficient memory, and the
typeflag of a loaded library is 0.
[0084] In addition, the pseudo code of the load method Q is as follows.
[0085] RequestShareMem(type, size) // type: the type of
the library for sharing, size: the size of the library for sharing
[0086] File = RecordFile // the record file
[0087] ReadFromFile(M, Len, ftype, ftotal) // read from
the record file (M: total memory size of the node; Len: current size
of the shared memory area; ftype: the number of historical load
requests of this type of library; ftotal: the total number of
historical load requests of all types of libraries)
[0088] Update(M) // re-read the memory size of the node
from the hard disk, in case it has changed
[0089] ftype = ftype + 1 // update ftype
[0090] ftotal = ftotal + 1 // update ftotal
[0091] W = M*0.5 - Len - size > 0 // the condition W:
determine whether there is remaining space for loading, 0.5 is an
adjustable factor, currently 50% of the total memory is used
[0092] typelist = SortAllTypeInFile() // rank the request
frequencies (ftype/ftotal) of all types of libraries in the record
file in descending order, and return a ranked linked list
[0093] if AlreadyLoaded(type) then
[0094] id = GetShareMemId(type) // if the required
library file has been loaded, return a library index
[0095] else if ftype > 10 // the number of historical load
requests of this type of library is more than 10
[0096] if IsPrior (typelist, type) // determine whether it is
the first priority: ranked in front of all other unloaded libraries
whose typeflag is 0
[0097] if typeflag == 1
[0098] UnloadShareMem(typelist, type) // if the sum of
the sizes of the loaded libraries ranked after this type of library
in the typelist satisfies the condition W, unload these loaded
libraries in reverse order until the condition W is met; otherwise,
do nothing
[0099] if W
[00100] id= LoadShareMem ( type, size)
[00101] Len = Len + size //update the size of the shared
memory area
[00102] typeflag = 0
[00103] else
[00104] typeflag = 1 //mark that there
is no sufficient memory, update the record
[00105] id = 0
[00106] else
[00107] id = 0 //return no information
[00108] UpdateFile (M, Len, ftype, ftotal, typeflag) //
update the record file
[00109] return id // return an index of the shared
memory area; "0" represents no information
[00110] end
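A runnable Python sketch of load method Q, following the pseudocode above, is given below. Several details are assumptions, not part of the disclosure: the record file is represented by an in-memory `RecordFile` object, so `ReadFromFile`/`UpdateFile` are omitted; `LoadShareMem` is reduced to updating `Len`; the returned index is simply the library's position in the record (a hypothetical scheme); and W is re-evaluated after unloading, because the pseudocode computes it once before `UnloadShareMem` changes `Len`.

```python
from dataclasses import dataclass, field

FACTOR = 0.5        # adjustable share of node memory M usable for sharing
MIN_REQUESTS = 10   # ftype must exceed this before a library is loaded


@dataclass
class LibRecord:
    size: int
    loaded: bool = False
    ftype: int = 0      # historical load requests for this library type
    typeflag: int = 0   # 1 = ranked first but not loaded for lack of memory


@dataclass
class RecordFile:
    M: int              # total memory size of the node
    Len: int = 0        # current size of the shared memory area
    ftotal: int = 0     # historical load requests over all library types
    libs: dict = field(default_factory=dict)   # library type -> LibRecord


def request_share_mem(rec, lib_type, size, node_memory=None):
    """Return a non-zero library index if lib_type is (now) loaded, else 0."""
    if node_memory is not None:          # Update(M): node memory may change
        rec.M = node_memory
    lib = rec.libs.setdefault(lib_type, LibRecord(size))
    lib.ftype += 1
    rec.ftotal += 1
    lib_id = list(rec.libs).index(lib_type) + 1   # hypothetical index scheme

    def w():                             # condition W, re-evaluated on demand
        return rec.M * FACTOR - rec.Len - size > 0

    # typelist: types ranked by request frequency ftype/ftotal, descending
    typelist = sorted(rec.libs, key=lambda t: rec.libs[t].ftype / rec.ftotal,
                      reverse=True)
    if lib.loaded:                       # AlreadyLoaded(type)
        return lib_id
    if lib.ftype <= MIN_REQUESTS:
        return 0
    pos = typelist.index(lib_type)
    # IsPrior: ranked ahead of every other unloaded library with typeflag 0
    if any(not rec.libs[t].loaded and rec.libs[t].typeflag == 0
           for t in typelist[:pos]):
        return 0
    if lib.typeflag == 1:                # UnloadShareMem(typelist, type)
        lower = [t for t in typelist[pos + 1:] if rec.libs[t].loaded]
        freeable = sum(rec.libs[t].size for t in lower)
        if rec.M * FACTOR - rec.Len + freeable - size > 0:
            for t in reversed(lower):    # lowest-ranked library first
                if w():
                    break
                rec.libs[t].loaded = False
                rec.Len -= rec.libs[t].size
    if w():                              # LoadShareMem(type, size)
        lib.loaded, lib.typeflag = True, 0
        rec.Len += size
        return lib_id
    lib.typeflag = 1                     # no sufficient memory: mark it
    return 0
```

As in the pseudocode, a library that first fails condition W is only marked with typeflag 1; lower-ranked libraries are unloaded on its next request. For instance, with M = 100, library A (size 30) resident and library B (size 30) requested twelve times, the eleventh request marks B, and the twelfth unloads A and loads B.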
[00111] Some embodiments showing the effects:
[00112] In order to verify the effectiveness of the shared memory based
gene analysis method in the embodiments of the disclosure, three gene
analysis methods are compared in terms of CPU utilization and I/O
times: method A (software without optimization, i.e. the steps of the
gene analysis are not connected through memory and run independently
of each other, without the gene shared memory), method B (software
with optimization, i.e. all steps of the gene analysis are connected
through memory, without the gene shared memory) and method C (software
with optimization, i.e. all steps of the gene analysis are connected
through memory, with the gene shared memory). The results are shown
in Figs. 7 to 9, wherein Fig. 7 shows the analysis result of method A,
Fig. 8 shows the analysis result of method B, and Fig. 9 shows the
analysis result of method C.
[00113] It can be seen from Figs. 7 to 9 that, for method A, the
running time of the analysis portion before acceleration (i.e. the
step of reading sample data and the step of preprocessing before the
alignment analysis each run independently, and the alignment is
processed directly without the use of the gene shared memory) is 2.83
hours, and the CPU utilization fluctuates greatly. The running time of
the alignment portion and the annotation portion before acceleration
(i.e. the alignment and the annotation are processed directly without
the use of the gene shared memory) is 2.61 hours; the CPU utilization
is high, and the I/O sec (i.e. the number of transfers output to a
physical disk per second) is high, indicating high I/O utilization and
a high probability of blocking.
[00114] For method B, the running time of the analysis portion after
acceleration (i.e. the step of reading sample data and the step of
preprocessing before the alignment analysis are connected through
memory and the alignment is processed by using the gene shared memory)
is 1.75 hours, and the CPU utilization fluctuates less than that of
method A. The running time of the library alignment portion before the
use of the gene shared memory (i.e. the alignment is processed
directly without the use of the gene shared memory) is 2.38 hours; the
CPU utilization is high, and the I/O sec (i.e. the number of transfers
output to a physical disk per second) is high, indicating high I/O
utilization and a high probability of blocking.
[00115] For method C, the running time of the analysis portion after
acceleration (i.e. the step of reading sample data and the step of
preprocessing before the alignment analysis are connected through
memory and the alignment is processed by using the gene shared memory)
is 1.75 hours, and the CPU utilization fluctuates less than that of
method A (this portion is the same as in method B). The running time
of the library alignment portion after the use of the gene shared
memory (i.e. the alignment is processed with the use of the gene
shared memory) is 0.82 hours; the CPU utilization is high, and the
I/O sec (i.e. the number of transfers output to a physical disk per
second) is low, indicating low I/O utilization and a low probability
of blocking.
[00116] Therefore, using method C for the gene analysis, that is,
connecting the gene analysis steps through memory and adopting the
gene shared memory in alignment, annotation and other processes, can
greatly reduce the time used for the gene analysis and reduce the I/O
utilization rate, that is, reduce I/O blocking.
[00117] It should be understood that although the steps in the
flowcharts of FIGS. 2, 4 and 6 are shown in order as indicated by
the arrows, these steps are not necessarily performed in order as
indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and these steps can be performed in other orders. Moreover, at least some steps in Figs. 2, 4, and 6 may comprise multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but may be performed alternately with other steps or at least some sub-steps or stages of other steps.
[00118] In some embodiments, as shown in FIG. 10, there is provided
a shared memory based gene analysis apparatus, comprising:
[00119] The data reading module 102 is configured to read sample data.
[00120] The data preprocessing module 104 is configured to preprocess
the sample data.
[00121] The gene analysis module 106 is configured to perform a gene
analysis on the sample data preprocessed, and determine whether a
required library file in the gene analysis is in a gene shared memory;
if yes, obtain the required library file from the gene shared memory,
map the required library file to a process of the gene analysis of
the sample data preprocessed, and complete a corresponding analysis.
[00122] In some embodiments, the apparatus further comprises a library
file loading module configured to determine whether the required
library file meets a load condition, in a case where the required
library file in the gene analysis is not in the gene shared memory;
and load the required library file into the gene shared memory, in a
case where the load condition is met.
[00123] In some embodiments, the library file loading module comprises
a library information and memory information acquisition module.
[00124] The library information and memory information acquisition
module is configured to acquire information of the required library
file and information of the gene shared memory, wherein the
information of the required library file comprises a space required
by the required library file and the number of historical load
requests, and the information of the gene shared memory comprises
a remaining space of the gene shared memory.
[00125] The library file loading module is configured to, if the number
of historical load requests is greater than a first preset number,
and the space required by the required library file is less than the
remaining space of the gene shared memory, load the required library
file into the gene shared memory.
[00126] In some embodiments, the information of the required library
file further comprises a load request frequency of the required
library file, the information of the gene shared memory further
comprises load request frequencies of all library files; and the
library file loading module further comprises a priority ranking
module and a library file deleting module.
[00127] The priority ranking module is configured to, if the number of
historical load requests is greater than the first preset number, and
the space required by the required library file is greater than the
remaining space of the gene shared memory, rank the required library
file and all the library files in an order of priority according to
the load request frequency of the required library file and the load
request frequencies of all the library files, to obtain a load request
frequency priority of each library file.
[00128] The library file deleting module is configured to, if the load
request frequency priority of the required library file is higher
than that of a library file in the gene shared memory, and if the
remaining space of the gene shared memory after deleting the library
file with a lower load request frequency priority in the gene shared
memory is greater than or equal to the space required by the required
library file, delete the library file with the lower load request
frequency priority in the gene shared memory.
[00129] The library file loading module is further configured to load
the required library file into the gene shared memory.
[00130] In some embodiments, the apparatus further comprises: a gene
shared memory setting module configured to set the gene shared memory
for library files used in gene analysis, and to set the size of the
gene shared memory, the number of library files it can accommodate,
the name of each library file and the size offset of each library
file.
[00131] The library file loading module is further configured to load
library files commonly used in gene analysis into the gene shared
memory according to the size of the gene shared memory, the number of
library files it can accommodate, the name of each library file and
the size offset of each library file.
[00132] In some embodiments, the gene analysis comprises an alignment
analysis, a variation analysis and an annotation analysis.
[00133] The gene analysis module is configured to perform the
alignment analysis, the variation analysis and the annotation analysis
on the preprocessed sample data in sequence, wherein, in a case where
the preprocessed sample data comprises multiple groups of sample data,
the multiple groups of sample data may be in the same step or in
different steps of the gene analysis at any given time.
[00134] In some embodiments, the gene analysis further comprises a
sorting analysis and a marking-duplicate analysis, and the apparatus
further comprises: a sorting and marking-duplicate module configured
to label the sample data after the alignment analysis with a position
tag; and perform the sorting analysis and the marking-duplicate
analysis by module on the sample data labeled.
[00135] In some embodiments, the apparatus further comprises: a memory
connection module configured to connect some or all steps of the gene
analysis through memory.
[00136] In some embodiments, the data preprocessing module is further
configured to perform a quality control, a filtering operation and a
statistical process on the sample data.
[00137] For the specific definition of the shared memory based gene
analysis apparatus, refer to the definition of the shared memory based
gene analysis method described above, which will not be repeated here.
All or some of the modules in the shared memory based gene analysis
apparatus can be realized by software, hardware, or a combination
thereof. The above modules can be embedded in or independent of a
processor of a computer device in the form of hardware, or stored in
the memory of the computer device in the form of software, so that the
processor can call and execute the operations corresponding to the
above modules.
[00138] In some embodiments, a computer device is provided, which may be a server, and its internal structure may be as shown in FIG. 11. The computer device comprises a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and a memory device. The nonvolatile storage medium stores an operating system, a computer program, and a database. The memory device provides an environment for running the operating system and the computer program stored in the nonvolatile storage medium. The database of the computer device is used to store the data of a resistance equivalent model and equivalent sub-models, as well as the equivalent resistance, working resistance and contact resistance obtained during calculation. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program is executed by the processor to implement a shared memory based gene analysis method.
[00139] Those skilled in the art can understand that the structure
shown in FIG. 11 is only a block diagram of part of the structure
related to the scheme of this application, and does not constitute a
limitation on the computer device to which the scheme of this
application is applied. A specific computer device may comprise more
or fewer components than those shown in the figure, combine some
components, or have a different arrangement of components.
[00140] In some embodiments, a computer device is provided, comprising
a processor, a memory, and a computer program stored in the memory
and executable by the processor, wherein the processor, when executing
the computer program, implements the following steps: reading sample
data and preprocessing the sample data; performing a gene analysis on
the preprocessed sample data, and determining whether a required
library file in the gene analysis is in a gene shared memory; and if
yes, obtaining the required library file from the gene shared memory,
mapping the required library file to a process of the gene analysis
of the preprocessed sample data, and completing a corresponding
analysis.
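On a POSIX system, the check-then-map step can be sketched with Python's `multiprocessing.shared_memory` module (Python 3.8+). The module opens a named shared-memory object and maps it into the calling process. The description does not say how the gene shared memory is named or implemented, so using one named segment per library file is an assumption. The helper name `attach_library` is hypothetical.

```python
from __future__ import annotations

from multiprocessing import shared_memory
from typing import Optional


def attach_library(name: str) -> Optional[shared_memory.SharedMemory]:
    """Attach to a library file already held in shared memory, or return None.

    Attaching (create=False, the default) opens the existing segment and maps
    its pages into this process, so the library data is not copied.
    """
    try:
        return shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        # Not in the gene shared memory: the caller would read the file from
        # disk and may then decide whether to load it into shared memory.
        return None
```

If `attach_library` returns `None`, the library file is not in the gene shared memory and the analysis falls back to reading it from disk. Each attached handle should be closed with `close()` when the analysis step finishes. `unlink()` is left to whichever component owns the segment.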
[00141] In some embodiments, the processor when executing the computer
program further implements the following steps: determining whether
the required library file meets a load condition, in a case where
the required library file in the gene analysis is not in the gene
shared memory; and loading the required library file into the gene
shared memory, in a case where the load condition is met.
[00142] In some embodiments, the step, implemented by the processor
when executing the computer program, of determining whether the
required library file meets a load condition in a case where the
required library file in the gene analysis is not in the gene shared
memory, and loading the required library file into the gene shared
memory in a case where the load condition is met, comprises:
acquiring information of the required library file and information
of the gene shared memory, wherein the information of the required
library file comprises a space required by the required library file
and the number of historical load requests, and the information of
the gene shared memory comprises a remaining space of the gene shared
memory; and if the number of historical load requests is greater than
a first preset number, and the space required by the required library
file is less than the remaining space of the gene shared memory,
loading the required library file into the gene shared memory.
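The load condition of this embodiment reduces to two comparisons. The toy model below only tracks sizes and does not touch real shared memory. The class name `GeneSharedMemoryModel` and its parameters are hypothetical.

```python
class GeneSharedMemoryModel:
    """Toy bookkeeping model of the gene shared memory (no real OS shared memory)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.loaded: dict = {}   # library name -> space it occupies

    @property
    def remaining(self) -> int:
        return self.capacity - sum(self.loaded.values())

    def try_load(self, name: str, size: int, load_requests: int, first_preset: int) -> bool:
        # Load condition as described: more historical load requests than the
        # first preset number AND required space strictly less than remaining space.
        if load_requests > first_preset and size < self.remaining:
            self.loaded[name] = size   # stands in for copying the file into the segment
            return True
        return False
```

Requiring a minimum number of historical requests keeps rarely used library files from occupying shared memory.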
[00143] In some embodiments, the information of the required library file further comprises a load request frequency of the required library file, and the information of the gene shared memory further comprises load request frequencies of all library files in the gene shared memory; and the step, implemented by the processor when executing the computer program, of determining whether the required library file meets a load condition and loading the required library file into the gene shared memory in a case where the load condition is met further comprises: if the number of historical load requests is greater than the first preset number, and the space required by the required library file is greater than the remaining space of the gene shared memory, ranking the required library file and all the library files in order of priority according to the load request frequency of the required library file and the load request frequencies of all the library files, to obtain a load request frequency priority of each library file; if the load request frequency priority of the required library file is higher than that of a library file in the gene shared memory, and the remaining space of the gene shared memory after deleting the library file with the lower load request frequency priority is greater than or equal to the space required by the required library file, deleting the library file with the lower load request frequency priority from the gene shared memory; and if the number of historical load requests is greater than the first preset number, and the space required by the required library file is less than the remaining space of the gene shared memory, loading the required library file into the gene shared memory.
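The eviction branch can be sketched as follows. The description leaves two points open, so both are assumptions here. First, priority is taken to be the load request frequency itself (higher frequency, higher priority). Second, the lowest-priority libraries are deleted first until the required file fits. The function names are hypothetical, and the dictionary `loaded` stands in for the gene shared memory's directory.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Tuple

# library name -> (space it occupies, load request frequency)
Loaded = Dict[str, Tuple[int, float]]


def plan_eviction(size: int, freq: float, loaded: Loaded, remaining: int) -> Optional[List[str]]:
    """Pick lower-priority libraries to delete so that `size` fits, or return None."""
    # Assumption: priority == load request frequency; lowest priority is deleted first.
    candidates = sorted((n for n, (_, f) in loaded.items() if f < freq),
                        key=lambda n: loaded[n][1])
    victims, freed = [], remaining
    for name in candidates:
        if freed >= size:
            break
        victims.append(name)
        freed += loaded[name][0]
    return victims if freed >= size else None


def load_with_eviction(name: str, size: int, freq: float, load_requests: int,
                       first_preset: int, loaded: Loaded, capacity: int) -> bool:
    if load_requests <= first_preset:
        return False                         # not requested often enough
    remaining = capacity - sum(s for s, _ in loaded.values())
    if size < remaining:
        loaded[name] = (size, freq)          # fits directly
        return True
    # size == remaining is not specified in the source; here it loads with no deletions.
    victims = plan_eviction(size, freq, loaded, remaining)
    if victims is None:
        return False                         # not enough lower-priority space to free
    for victim in victims:
        del loaded[victim]                   # delete lower-priority library files
    loaded[name] = (size, freq)
    return True
```

For example, with a capacity of 100 and `loaded = {"a": (50, 1.0), "b": (30, 5.0)}`, loading a 40-unit file with frequency 3.0 deletes `"a"`, keeps `"b"`, and then loads the new file.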
[00144] In some embodiments, the information of the required library file further comprises a load request frequency of the required library file, and the information of the gene shared memory further comprises load request frequencies of all library files; and the step, implemented by the processor when executing the computer program, of determining whether the required library file meets a load condition and loading the required library file into the gene shared memory in a case where the load condition is met further comprises: if the number of historical load requests is greater than the first preset number, and the space required by the required library file is greater than the remaining space of the gene shared memory, ranking the required library file and all the library files in order of priority according to the load request frequency of the required library file and the load request frequencies of all the library files, to obtain a load request frequency priority of each library file; if the load request frequency priority of the required library file is higher than that of a library file in the gene shared memory, and the remaining space of the gene shared memory after deleting the library file with the lower load request frequency priority is greater than or equal to the space required by the required library file, deleting the library file with the lower load request frequency priority from the gene shared memory; and loading the required library file into the gene shared memory.
[00145] In some embodiments, the processor when executing the computer program further implements the following steps: setting the gene shared memory for library files used in gene analysis, and setting a size of the gene shared memory, the number of library files that can be accommodated, a name of each library file and a size offset of each library file; and loading library files commonly used in gene analysis into the gene shared memory according to the size of the gene shared memory, the number of library files that can be accommodated, the name of each library file and the size offset of each library file.
[00146] In some embodiments, the gene analysis comprises an alignment
analysis, a variation analysis and an annotation analysis, and the
processor when executing the computer program further implements the
following step: performing the alignment analysis, the variation
analysis, and the annotation analysis in sequence on the preprocessed
sample data, wherein, in a case where the preprocessed sample data
comprises multiple groups of sample data, the multiple groups of
sample data may be at the same step or at different steps of the gene
analysis at any one time.
[00147] In some embodiments, the gene analysis further comprises a
sorting analysis and a marking-duplicate analysis, and after
performing the alignment analysis, the variation analysis, and the
annotation analysis in sequence on the preprocessed sample data, the
processor when executing the computer program further implements the
following steps: labeling the sample data after the alignment
analysis with a position tag; and performing the sorting analysis and
the marking-duplicate analysis by module on the labeled sample data.
[00148] In some embodiments, the processor when executing the computer
program further implements the following step: connecting some or
all steps of the gene analysis by means of memory.
[00149] In some embodiments, preprocessing the sample data, as
implemented by the processor when executing the computer program,
comprises: performing a quality control, a filtering operation and a
statistical process on the sample data.
[00150] Some embodiments provide a computer-readable storage medium
on which a computer program is stored, wherein the computer program,
when executed by a processor, implements the following steps: reading
sample data and preprocessing the sample data; performing a gene
analysis on the preprocessed sample data, and determining whether a
required library file in the gene analysis is in a gene shared
memory; and if yes, obtaining the required library file from the gene
shared memory, mapping the required library file to a process of the
gene analysis of the preprocessed sample data, and completing a
corresponding analysis.
[00151] In some embodiments, the computer program when executed by a
processor further implements the following steps: determining whether
the required library file meets a load condition, in a case where
the required library file in the gene analysis is not in the gene
shared memory; and loading the required library file into the gene
shared memory, in a case where the load condition is met.
[00152] In some embodiments, the steps, implemented by the computer program when executed by a processor, of determining whether the required library file meets a load condition in a case where the required library file in the gene analysis is not in the gene shared memory, and loading the required library file into the gene shared memory in a case where the load condition is met, comprise: acquiring information of the required library file and information of the gene shared memory, wherein the information of the required library file comprises a space required by the required library file and the number of historical load requests, and the information of the gene shared memory comprises a remaining space of the gene shared memory; and if the number of historical load requests is greater than a first preset number, and the space required by the required library file is less than the remaining space of the gene shared memory, loading the required library file into the gene shared memory.
[00153] In some embodiments, the information of the required library file further comprises a load request frequency of the required library file, and the information of the gene shared memory further comprises load request frequencies of all library files in the gene shared memory; and the steps, implemented by the computer program when executed by a processor, of determining whether the required library file meets a load condition and loading the required library file into the gene shared memory in a case where the load condition is met further comprise: if the number of historical load requests of the required library file is greater than the first preset number, and the space required by the required library file is greater than the remaining space of the gene shared memory, ranking the required library file and all the library files in order of priority according to the load request frequency of the required library file and the load request frequencies of all the library files, to obtain a load request frequency priority of each library file; if the load request frequency priority of the required library file is higher than that of a library file in the gene shared memory, and the remaining space of the gene shared memory after deleting the library file with the lower load request frequency priority is greater than or equal to the space required by the required library file, deleting the library file with the lower load request frequency priority from the gene shared memory; and if the number of historical load requests of the required library file is greater than the first preset number, and the space required by the required library file is less than the remaining space of the gene shared memory, loading the required library file into the gene shared memory.
[00154] In some embodiments, the information of the required library file further comprises a load request frequency of the required library file, and the information of the gene shared memory further comprises load request frequencies of all library files; and the steps, implemented by the computer program when executed by a processor, of determining whether the required library file meets a load condition and loading the required library file into the gene shared memory in a case where the load condition is met further comprise: if the number of historical load requests is greater than the first preset number, and the space required by the required library file is greater than the remaining space of the gene shared memory, ranking the required library file and all the library files in order of priority according to the load request frequency of the required library file and the load request frequencies of all the library files, to obtain a load request frequency priority of each library file; if the load request frequency priority of the required library file is higher than that of a library file in the gene shared memory, and the remaining space of the gene shared memory after deleting the library file with the lower load request frequency priority is greater than or equal to the space required by the required library file, deleting the library file with the lower load request frequency priority from the gene shared memory; and loading the required library file into the gene shared memory.
[00155] In some embodiments, the computer program when executed by
a processor further implements the following steps: setting the gene
shared memory for library files used in gene analysis, and setting a
size of the gene shared memory, the number of library files that can
be accommodated, a name of each library file and a size offset of
each library file; and loading library files commonly used in gene
analysis into the gene shared memory according to the size of the
gene shared memory, the number of library files that can be
accommodated, the name of each library file and the size offset of
each library file.
[00156] In some embodiments, the gene analysis comprises an alignment analysis, a variation analysis and an annotation analysis, and the computer program when executed by a processor further implements the following step: performing the alignment analysis, the variation analysis, and the annotation analysis in sequence on the preprocessed sample data, wherein, in a case where the preprocessed sample data comprises multiple groups of sample data, the multiple groups of sample data may be at the same step or at different steps of the gene analysis at any one time.
[00157] In some embodiments, the gene analysis further comprises a
sorting analysis and a marking-duplicate analysis, and after
performing the alignment analysis, the variation analysis, and the
annotation analysis in sequence on the preprocessed sample data, the
computer program when executed by a processor further implements the
following steps: labeling the sample data after the alignment
analysis with a position tag; and performing the sorting analysis and
the marking-duplicate analysis by module on the labeled sample data.
[00158] In some embodiments, the computer program when executed by
a processor further implements the following step: connecting some
or all steps of the gene analysis by means of memory.
[00159] In some embodiments, preprocessing the sample data, as implemented by the computer program when executed by a processor, comprises: performing a quality control, a filtering operation and a statistical process on the sample data.
[00160] As understood by those skilled in the art, all or part of the
steps for carrying out the methods in the above embodiments can be
completed by hardware or by a program instructing the related
hardware, wherein the program can be stored in a computer-readable
nonvolatile storage medium, and the program, when executed, can carry
out the steps of the embodiments of the above methods. Any reference
to memory, storage, database or other media used in the embodiments
provided by the present application may comprise nonvolatile and/or
volatile memory. The nonvolatile memory may comprise read only memory
(ROM), programmable ROM (PROM), electrically programmable ROM
(EPROM), electrically erasable programmable ROM (EEPROM), or flash
memory. The volatile memory may comprise random access memory (RAM)
or external cache memory. As an illustration rather than a
limitation, RAM is available in various forms, such as static RAM
(SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate
SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM),
Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and
Rambus dynamic RAM (RDRAM).
[00161] The technical features of the above embodiments can be combined arbitrarily. For conciseness, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as within the scope of this description.
[00162] The aforesaid embodiments merely present several embodiments
of the present application. However, the relatively specific and
detailed descriptions thereof cannot therefore be construed as
limiting the scope of the present application. It shall be pointed
out that a person skilled in the art is capable of making various
modifications and improvements without departing from the concept
of the present application. Such modifications and improvements shall
be regarded as within the protection scope of the present application.
Therefore, the protection scope of the present application shall be
determined by the terms of the claims.

Claims (7)

What is claimed is:
1. A shared memory based gene analysis method, characterized by
comprising:
reading sample data and preprocessing the sample data;
performing a gene analysis on the preprocessed sample data, and
determining whether a required library file in the gene analysis is
in a gene shared memory; and
if yes, obtaining the required library file from the gene shared
memory, mapping the required library file to a process of the gene
analysis of the preprocessed sample data, and completing a
corresponding analysis.
2. The shared memory based gene analysis method according to claim
1, characterized by further comprising:
determining whether the required library file meets a load
condition, in a case where the required library file in the gene
analysis is not in the gene shared memory; and
loading the required library file into the gene shared memory,
in a case where the load condition is met.
3. The shared memory based gene analysis method according to claim
2, characterized in that determining whether the required library
file meets a load condition in a case where the required library
file in the gene analysis is not in the gene shared memory, and loading the required library file into the gene shared memory in a case where the load condition is met, comprises: acquiring information of the required library file and information of the gene shared memory, wherein the information of the required library file comprises a space required by the required library file and the number of historical load requests, and the information of the gene shared memory comprises a remaining space of the gene shared memory; and if the number of historical load requests is greater than a first preset number, and the space required by the required library file is less than the remaining space of the gene shared memory, loading the required library file into the gene shared memory.
4. The shared memory based gene analysis method according to claim
3, characterized in that the information of the required library file
further comprises a load request frequency of the required library
file, and the information of the gene shared memory further comprises
load request frequencies of all library files; and determining
whether the required library file meets a load condition and loading
the required library file into the gene shared memory in a case where
the load condition is met further comprises:
if the number of historical load requests is greater than the first preset number, and the space required by the required library file is greater than the remaining space of the gene shared memory, ranking the required library file and all the library files in order of priority according to the load request frequency of the required library file and the load request frequencies of all the library files, to obtain a load request frequency priority of each library file; if the load request frequency priority of the required library file is higher than that of a library file in the gene shared memory, and the remaining space of the gene shared memory after deleting the library file with the lower load request frequency priority is greater than or equal to the space required by the required library file, deleting the library file with the lower load request frequency priority from the gene shared memory; and loading the required library file into the gene shared memory.
5. The shared memory based gene analysis method according to any
one of claims 1 to 4, characterized by further comprising:
setting the gene shared memory for library files used in gene
analysis, and setting a size of the gene shared memory, the number
of library files that can be accommodated, a name of each library
file and a size offset of each library file; and
loading library files commonly used in gene analysis into the gene
shared memory according to the size of the gene shared memory, the
number of library files that can be accommodated, the name of each
library file and the size offset of each library file.
6. The shared memory based gene analysis method according to claim
1, characterized in that the gene analysis comprises an alignment
analysis, a variation analysis and an annotation analysis, and the
method further comprises:
performing the alignment analysis, the variation analysis, and
the annotation analysis in sequence on the preprocessed sample data,
wherein, in a case where the preprocessed sample data comprises
multiple groups of sample data, the multiple groups of sample data
may be at the same step or at different steps of the gene analysis
at any one time.
7. The shared memory based gene analysis method according to claim
6, characterized in that the gene analysis further comprises a sorting
analysis and a marking-duplicate analysis, wherein after performing
the alignment analysis, the variation analysis, and the annotation
analysis in sequence on the preprocessed sample data, the method
further comprises:
labeling the sample data after the alignment analysis with a
position tag; and performing the sorting analysis and the
marking-duplicate analysis by module on the labeled sample data.
8. The shared memory based gene analysis method according to claim
7, characterized by further comprising:
connecting some or all steps of the gene analysis by means of memory.
9. The shared memory based gene analysis method according to any
one of claims 6 to 8, characterized in that preprocessing the sample
data comprises:
performing a quality control, a filtering operation and a
statistical process on the sample data.
10. A shared memory based gene analysis apparatus, characterized
by comprising:
a data reading module configured to read sample data;
a data preprocessing module configured to preprocess the sample
data; and
a gene analysis module configured to perform a gene analysis on
the preprocessed sample data, and determine whether a required library
file in the gene analysis is in a gene shared memory; and if yes,
obtain the required library file from the gene shared memory, map the
required library file to a process of the gene analysis of the
preprocessed sample data, and complete a corresponding analysis.
11. A computer device comprising: a memory, a processor, and a
computer program stored on the memory and executable on the processor,
characterized in that the processor when executing the computer
program implements the steps of the method according to any one of
claims 1 to 9.
12. A computer-readable storage medium on which a computer program
is stored, characterized in that the computer program when executed
by a processor implements the steps of the method according to any
one of claims 1 to 9.
Fig. 1 (elements 102 and 104 connected through a network)
Fig. 2
S202 Read sample data and preprocess the sample data
S204 Perform a gene analysis on the preprocessed sample data, and determine whether a required library file in the gene analysis is in a gene shared memory
S206 If yes, obtain the required library file from the gene shared memory, map the required library file to a process of the gene analysis of the preprocessed sample data, and complete the gene analysis
Fig. 3 (process A and process B, each with its own address space and page table, mapped to the same physical addresses of a shared memory)
Fig. 4
S402 Set the gene shared memory for library files used in gene analysis, set a size of the gene shared memory, the number of library files that can be accommodated, a name of each library file and a size offset of each library file
S404 Load library files commonly used in gene analysis into the gene shared memory according to the size of the gene shared memory, the number of library files that can be accommodated, the name of each library file and the size offset of each library file
Fig. 5 (gene shared memory area M in the physical memory of a node, with address marks 0, 72, offset1, offset2, ..., offset n and Len: total information (number of shared libraries n, total length of shared memory area Len), entries Lib1 (Name1, Offset1), Lib2 (Name2, Offset2), ..., and the raw data (data of the databases); sample processes P1, P2, P3, ..., Pn, each with its own logic address space comprising a system kernel area, user stack, dynamic library link area, heap, data segment (.data, .bss), code segment (.text, .rodata) and reserved area)
Fig. 6 (flow chart of the gene analysis process: start; sample input; data preprocessing process (quality control, filtering and statistical processing); alignment analysis using comparison library Lib-x; variation analysis; annotation analysis using annotation library Lib-y; annotation statistics; output; end. For each library, its information is requested and the flow checks whether it is in the gene shared memory area: if yes, the library is obtained from the gene shared memory area and its data is mapped to this process; if no, it is loaded from the hard disk and, depending on load method Q, loaded into the gene shared memory area, returning either the library data information or no information)
Fig. 7 (before acceleration: analysis portion 2.83 h; comparison portion and annotation portion 2.61 h)
Fig. 8: Library comparison portion before the use of gene shared memory 2.38 h; analysis portion after acceleration 1.75 h.
Fig. 9: Analysis portion after acceleration 1.75 h; library comparison portion after the use of gene shared memory 0.82 h; "System Summary" panel (ubuntu, 2020/1/16).
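The time saved and the speedup can be computed directly from the run times in Figs. 7 to 9. This short sketch assumes that each "before" and "after" value was measured on the same workload, which the figures do not state. The names `Reduction` and `reduction` are illustrative only.

```python
from typing import NamedTuple


class Reduction(NamedTuple):
    saved_h: float   # hours saved
    fraction: float  # saved time as a fraction of the "before" time
    speedup: float   # before / after


def reduction(before_h: float, after_h: float) -> Reduction:
    saved = before_h - after_h
    return Reduction(saved, saved / before_h, before_h / after_h)


# Run times in hours, as read from Figs. 7 to 9.
library_comparison = reduction(2.38, 0.82)  # before / after gene shared memory
analysis = reduction(2.83, 1.75)            # before / after acceleration

for label, r in [("library comparison", library_comparison), ("analysis", analysis)]:
    print(f"{label}: saved {r.saved_h:.2f} h ({r.fraction:.1%}), speedup {r.speedup:.2f}x")
```

On these figures, library comparison saves about 1.56 h (roughly 65.5%, a 2.90x speedup). The analysis portion saves about 1.08 h (roughly 38.2%, 1.62x).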
Fig. 10: Gene analysis apparatus comprising a data reading module 102, a data preprocessing module 104 and a gene analysis module 106.
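The three modules of Fig. 10 can be pictured as stages that each consume the previous stage's output. This minimal sketch assumes that order, data reading (102), then preprocessing (104), then gene analysis (106), which matches the flow in Fig. 6. The class, the function names, and the toy logic are hypothetical stand-ins, not the patented modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GeneAnalysisApparatus:
    """Hypothetical wiring of the three modules shown in Fig. 10."""

    read_data: Callable[[str], List[str]]           # data reading module (102)
    preprocess: Callable[[List[str]], List[str]]    # data preprocessing module (104)
    analyse: Callable[[List[str]], Dict[str, int]]  # gene analysis module (106)

    def run(self, source: str) -> Dict[str, int]:
        # Each module consumes the previous module's output: 102 -> 104 -> 106.
        return self.analyse(self.preprocess(self.read_data(source)))


# Toy stand-ins for illustration only: reads are whitespace-separated strings,
# preprocessing drops reads containing the ambiguous base "N" (a crude stand-in
# for quality control and filtering), and analysis just counts reads and bases.
def read_lines(text: str) -> List[str]:
    return text.split()


def drop_ambiguous(reads: List[str]) -> List[str]:
    return [r for r in reads if "N" not in r]


def summarise(reads: List[str]) -> Dict[str, int]:
    return {"reads": len(reads), "bases": sum(len(r) for r in reads)}


apparatus = GeneAnalysisApparatus(read_lines, drop_ambiguous, summarise)
```

For example, `apparatus.run("ACGT ACNT GGCC")` keeps two reads and eight bases. Passing the modules as callables keeps each stage replaceable. A real gene analysis module would call a shared-memory library loader at the point where it needs Lib-x or Lib-y.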
Fig. 11: Computer device comprising a processor, a memory (including a nonvolatile storage medium that holds the OS, a computer program and a database) and a network interface, connected by a system bus.
AU2024201174A 2020-10-22 2024-02-22 Shared memory based gene analysis method, apparatus and computer device Pending AU2024201174A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2024201174A AU2024201174A1 (en) 2020-10-22 2024-02-22 Shared memory based gene analysis method, apparatus and computer device

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
CN202011139824.9A CN112270959A (en) 2020-10-22 2020-10-22 Shared memory-based gene analysis method and device and computer equipment
CN202011139824.9 2020-10-22
AU2020457044A AU2020457044A1 (en) 2020-10-22 2020-11-06 Shared memory based gene analysis method, apparatus and computer device
PCT/CN2020/127072 WO2022082878A1 (en) 2020-10-22 2020-11-06 Shared memory-based gene analysis method and apparatus, and computer device
AU2024201174A AU2024201174A1 (en) 2020-10-22 2024-02-22 Shared memory based gene analysis method, apparatus and computer device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU2020457044A Division AU2020457044A1 (en) 2020-10-22 2020-11-06 Shared memory based gene analysis method, apparatus and computer device

Publications (1)

Publication Number Publication Date
AU2024201174A1 (en) 2024-03-14

Family

ID=80469111

Family Applications (2)

Application Number Title Priority Date Filing Date
AU2020457044A Abandoned AU2020457044A1 (en) 2020-10-22 2020-11-06 Shared memory based gene analysis method, apparatus and computer device
AU2024201174A Pending AU2024201174A1 (en) 2020-10-22 2024-02-22 Shared memory based gene analysis method, apparatus and computer device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
AU2020457044A Abandoned AU2020457044A1 (en) 2020-10-22 2020-11-06 Shared memory based gene analysis method, apparatus and computer device

Country Status (4)

Country Link
EP (1) EP4235679A1 (en)
JP (1) JP7344996B2 (en)
AU (2) AU2020457044A1 (en)
IL (1) IL289071A (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20240025702A (en) 2016-06-07 2024-02-27 Illumina, Inc. (일루미나, 인코포레이티드) Bioinformatics systems, apparatus, and methods for performing secondary and/or tertiary processing
CN108197433A (en) 2017-12-29 2018-06-22 Xiamen Jiyuan Technology Co., Ltd. (厦门极元科技有限公司) Split storage method across data memory and hard disk for a rapid DNA sequencing data analysis platform
CN108776617A (en) 2018-06-08 2018-11-09 Shandong Chaoyue Numerical Control Electronics Co., Ltd. (山东超越数控电子股份有限公司) Prefetch target identification method based on access frequency and dynamic priority

Also Published As

Publication number Publication date
JP7344996B2 (en) 2023-09-14
JP2023512610A (en) 2023-03-28
IL289071A (en) 2022-02-01
AU2020457044A1 (en) 2022-05-12
EP4235679A8 (en) 2023-10-18
EP4235679A1 (en) 2023-08-30

Similar Documents

Publication Publication Date Title
Li Minimap2: pairwise alignment for nucleotide sequences
Volfovsky et al. A clustering method for repeat analysis in DNA sequences
Kankainen et al. POBO, transcription factor binding site verification with bootstrapping
CN112885412B (en) Genome annotation method, apparatus, visualization platform and storage medium
CN113867645A (en) Data migration and data read-write method and device, computer equipment and storage medium
Delehelle et al. ASGART: fast and parallel genome scale segmental duplications mapping
AU2024201174A1 (en) Shared memory based gene analysis method, apparatus and computer device
EP1850250A1 (en) Method and system for renewing an index
Srivastava et al. NetSeekR: a network analysis pipeline for RNA-Seq time series data
WO2022082878A1 (en) Shared memory-based gene analysis method and apparatus, and computer device
Abouelhoda et al. Multiple genome alignment: Chaining algorithms revisited
RU2792228C1 (en) Method, device and computer device for gene analysis based on shared memory
US20220157414A1 (en) Method and system for facilitating optimization of a cluster computing network for sequencing data analysis using adaptive data parallelization, and non-transitory storage medium
Cui et al. Homology search for genes
US11821031B2 (en) Systems and methods for graph based mapping of nucleic acid fragments
Li et al. T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
JP4540556B2 (en) Data access method and program thereof
Shih et al. GS-Aligner: a novel tool for aligning genomic sequences using bit-level operations
US20210366578A1 (en) Embryonic dna registry
CN109760044A (en) A kind of data processing method and device
CN116414733B (en) Data processing method, device, computer equipment and storage medium
US20220383980A1 (en) Processing sequencing data relating to amyotrophic lateral sclerosis
CN112550951B (en) Biological sample storage method and device, computer equipment and storage medium
Aizad et al. Graph Data Modelling for Genomic Variants
Gunady et al. Fast and interpretable alternative splicing and differential gene-level expression analysis using transcriptome segmentation with Yanagi