WO2024070589A1 - Off-target risk analysis method, off-target risk analysis system, program, and recording medium - Google Patents

Off-target risk analysis method, off-target risk analysis system, program, and recording medium

Info

Publication number
WO2024070589A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
target
virtual
score
target sequence
Prior art date
Application number
PCT/JP2023/032841
Other languages
English (en)
Japanese (ja)
Inventor
哲史 佐久間
和恭 中前
Original Assignee
国立大学法人広島大学
プラチナバイオ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 国立大学法人広島大学, プラチナバイオ株式会社 filed Critical 国立大学法人広島大学
Publication of WO2024070589A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present disclosure relates to an off-target risk analysis method and an off-target prediction system that analyze the risk of off-target effects occurring in genome editing and its derived technologies.
  • Genome editing is a technology that recognizes the DNA sequence of a target region on a genome sequence (hereafter referred to as the target sequence) and introduces mutations such as deletions, substitutions, and insertions into the sequence of any target gene using a DNA-binding tool that can cleave the target region.
  • DNA-binding tools include zinc-finger nucleases (ZFNs), TALE nucleases (Transcription Activator-Like Effector Nucleases), and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated protein).
  • off-target effects are a phenomenon in which unexpected mutations are introduced into non-target genome sequences.
  • the frequency of off-target effects depends on the DNA sequence recognized by the DNA-binding tool, so even if the same DNA-binding tool (e.g., CRISPR/Cas9) is used, the frequency of off-target effects varies if the target sequence is different.
  • the risk of introducing unexpected mutations into other genome sequences that are not the target genome sequence in genome editing is also called off-target risk.
  • technologies that apply DNA-binding tools (e.g., transcriptional regulation, epigenetic editing, and chromosome imaging) also use zinc fingers, TALEs, and CRISPR/Cas with inactivated nuclease activity, so there is a risk of off-target effects depending on the target sequence.
  • the off-target risk prediction methods shown in (1) and (2) above can only be used for biological species whose entire genome information has been decoded, or for specific individual organisms, tissues, cell clones, varieties, bacterial strains, and virus strains, etc. Therefore, it has been difficult to apply these methods to predicting off-target risks when using DNA-binding tools for biological species (including industrial organisms) whose genome information has not been decoded well or whose genomes are difficult to decode because they contain difficult-to-read sequence elements such as repeat structures.
  • genomic information is not definitive because spontaneous mutations may occur in each individual and cell. Even by referring to the sequences of reference genomes and unique genomes sequenced in advance, it is not possible to predict potential off-target effects of DNA-binding tools that may be manifested in the presence of spontaneous mutations.
  • in some applications of DNA-binding tools (for example, medical treatments that apply DNA-binding tools), the potential off-target risks can be a major concern, but until now there has been no method that can predict these risks in advance.
  • the off-target risk analysis method includes a virtual sequence generation step of generating a plurality of virtual sequences including a sequence identical to a target sequence and a sequence in which at least one mutation has been introduced into the target sequence, a score calculation step of calculating, for each of the plurality of virtual sequences, a score related to the probability that a DNA-binding tool that recognizes the target sequence will act, and a prediction result output step of outputting, based on the calculated score, a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence.
  • the off-target risk analysis system includes a virtual sequence generation unit that generates a plurality of virtual sequences including a sequence identical to a target sequence and a sequence in which at least one mutation has been introduced into the target sequence, a score calculation unit that calculates a score related to the probability that a DNA-binding tool that recognizes the target sequence will act for each of the plurality of virtual sequences, and a prediction result output unit that outputs a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence based on the calculated score.
  • the off-target risk analysis system may be realized by a computer.
  • the control program of the off-target risk analysis system that realizes the off-target risk analysis system on a computer by causing the computer to operate as each part (software element) of the off-target risk analysis system, and the computer-readable recording medium on which it is recorded, also fall within the scope of the present disclosure.
  • FIG. 1 is a block diagram showing an example of a schematic configuration of an off-target risk analysis system.
  • FIG. 2 is a diagram illustrating an example of virtual sequences.
  • FIG. 3 is a diagram illustrating an example of virtual sequences.
  • FIG. 4 is a diagram illustrating an example of virtual sequences.
  • FIG. 5 is a flowchart showing an example of a processing flow executed by the off-target risk analysis device.
  • FIG. 6 is a diagram illustrating an example of a schematic configuration of an off-target risk analysis system.
  • FIG. 7 is a block diagram showing an example of a schematic configuration of an off-target risk analysis device.
  • FIG. 8 is a diagram illustrating an example of a data structure of user information.
  • FIG. 9 illustrates an example of a data structure of an analysis result log.
  • 1 is a graph showing the correlation between the prediction results output using the off-target risk analysis method according to the present disclosure and the off-target incidence rate obtained as a result of analyzing the off-target effect in actual cells.
  • 1 is a graph showing the correlation between the prediction results output using the off-target risk analysis method according to the present disclosure and the off-target incidence rate obtained as a result of analyzing the off-target effect in actual cells.
  • 1 is a graph showing the correlation between the prediction results output using the off-target risk analysis method according to the present disclosure and the off-target incidence rate obtained as a result of analyzing the off-target effect in actual cells.
  • 13 is a scatter plot showing the correlation between the specificity prediction score calculated using the MOFF score and the “Off-on-ratio” score in the “TTISS” group.
  • the off-target risk analysis system 100 is a system that outputs a prediction result indicating the possibility that a DNA binding tool that recognizes a target sequence may act on a sequence different from the target sequence.
  • a DNA-binding tool is a tool that specifically binds to a specific genomic region and is capable of cutting, modifying, editing, etc., the genomic region.
  • the genomic region that is cut, modified, edited, etc. by the DNA-binding tool may be a specific region on the genome of a cell, and the DNA-binding tool may also be referred to as a genome editing tool.
  • when a DNA-binding tool that recognizes a target sequence acts on a sequence other than the target sequence, this is called an "off-target effect."
  • off-target effects on genes in the genome of a cell can result in the activation of cancer genes and the inactivation of tumor suppressor genes in that cell.
  • the effects of off-target effects have a permanent effect on cells. Therefore, it is desirable to accurately estimate the risk of off-target effects (hereinafter referred to as off-target risk) before actually using a DNA-binding tool that recognizes a target sequence.
  • the off-target risk analysis system 100 executes the following processes (a) to (c).
  • a "DNA binding tool” may be any of (i) to (v) below.
  • a TALE (Transcription Activator-Like Effector)
  • pentatricopeptide repeats (PPRs) or fusion polypeptides combining pentatricopeptide repeats with a functional domain.
  • a wild-type CRISPR/Cas (CRISPR-associated protein)
  • a fusion polypeptide-nucleic acid complex combining a wild-type CRISPR/Cas with a functional domain.
  • a modified CRISPR/Cas (CRISPR-associated protein)
  • a fusion polypeptide-nucleic acid complex combining a modified CRISPR/Cas with a functional domain.
  • the target sequence is a base sequence recognized by a domain having a zinc finger protein motif.
  • the target sequence is a base sequence recognized by a region to which a TAL module (Transcription Activator-Like Module) is bound.
  • the target sequence is a base sequence recognized by a region of consecutive PPR motifs.
  • the target sequence is a base sequence complementary to the guide RNA (gRNA) that forms a complex with Cas, and a protospacer adjacent motif (PAM) sequence recognized by Cas.
  • a DNA-binding tool that falls under any of the above (i) to (v) may misidentify unintended DNA that has a base sequence similar to the target sequence.
  • all of the analytical methods described in Non-Patent Documents 1 to 3 are capable of analyzing off-target risks when DNA-binding tools are used on biological species whose entire genome information has been decoded, or on specific individual organisms, tissues, cell clones, varieties, bacterial strains, and virus strains. However, it is difficult to apply the analytical methods described in Non-Patent Documents 1 to 3 to predict off-target risks when DNA-binding tools are used on biological species (including industrial organisms) whose genome information has not been decoded or whose genome is difficult to decode.
  • the off-target risk analysis system 100 can output a prediction result indicating the off-target risk of a DNA-binding tool that recognizes a target sequence, provided that the target sequence is given.
  • the off-target risk analysis system 100 can evaluate the risk of causing an off-target effect that a DNA-binding tool that recognizes a target sequence potentially has, without referring to the genomic sequence of the target to which the DNA-binding tool is applied.
  • Fig. 1 is a block diagram showing an example of a schematic configuration of the off-target risk analysis system 100.
  • the off-target risk analysis system 100 may include an off-target risk analysis device 1 and a display device 4.
  • FIG. 1 shows an off-target risk analysis system 100 that includes one off-target risk analysis device 1 and one display device 4.
  • the configuration of the off-target risk analysis system 100 is not limited to this.
  • the number of display devices 4 in the off-target risk analysis system 100 may be zero or more than one.
  • the off-target risk analysis device 1 and the display device 4 are connected to each other so that they can communicate with each other.
  • the off-target risk analysis device 1 and the display device 4 may be directly connected by wire or wirelessly, or may be connected via a communication network.
  • the form of the communication network is not limited, and may be a local area network (LAN) or the Internet.
  • the off-target risk analysis device 1 is a device that uses target sequence data to output a prediction result indicating the off-target risk of a DNA-binding tool.
  • the output prediction result may be transmitted from the off-target risk analysis device 1 to a display device 4.
  • the display device 4 may typically be a computer, smartphone, tablet terminal, etc. used by a user of the off-target risk analysis system 100.
  • FIG. 1 shows an off-target risk analysis system 100 in which the display device 4 is separate from the off-target risk analysis device 1.
  • the configuration of the off-target risk analysis system 100 is not limited to this.
  • the display device 4 may be a device integrated with the off-target risk analysis device 1, in which case the display device 4 may be a display unit (display, etc.) provided in the off-target risk analysis device 1.
  • the off-target risk analysis device 1 includes a control unit 10, a storage unit 20, and an input unit 30.
  • control unit 10 may be a CPU (Central Processing Unit).
  • the control unit 10 reads a control program, which is software stored in the storage unit 20, and loads it into memory such as RAM (Random Access Memory) to execute various functions. Note that the control program is omitted from the storage unit 20 shown in FIG. 1 to simplify the explanation.
  • the control unit 10 includes a target sequence receiving unit 11, a virtual sequence generating unit 12, a score calculating unit 13, and a prediction result output unit 14.
  • the target sequence receiving unit 11 receives target sequence data indicating a target sequence input using the input unit 30.
  • the target sequence receiving unit 11 may store the received target sequence data in the storage unit 20.
  • the virtual sequence generating unit 12 generates a plurality of virtual sequences from the target sequence data, including a sequence identical to the target sequence and a sequence in which at least one mutation has been introduced into the target sequence.
  • the virtual sequence generating unit 12 virtually generates a variety of virtual sequence data including sequences that may be misrecognized by a DNA binding tool that recognizes the target sequence.
  • the virtual sequence generating unit 12 may generate virtual sequences based on virtual sequence generation rules 21 stored in the storage unit 20.
  • the mutation that the virtual sequence generation unit 12 introduces into the target sequence may be any one of substitutions, deletions, and insertions.
  • Figs. 2 to 4 are diagrams showing examples of virtual sequences.
  • the virtual sequences shown in Figs. 2 to 4 are generated from a target sequence derived from the human ⁇ -globin gene, which consists of 23 bases.
  • the virtual sequence may be shorter or longer than 23 bases.
  • the target sequence is not limited to that derived from the human ⁇ -globin gene.
  • the virtual sequence does not have to be a continuous sequence, and may be, for example, a discontinuous sequence that contains a single or multiple arbitrary bases (N) at one or multiple locations.
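  • as an illustration of a discontinuous virtual sequence, the following minimal sketch treats "N" as a position that matches any base. The function name `matches_with_wildcards` is hypothetical and is not taken from the disclosure.

```python
def matches_with_wildcards(pattern: str, sequence: str) -> bool:
    """Return True if `sequence` fits `pattern`, where 'N' in the pattern
    stands for an arbitrary base (assumption: equal lengths are required)."""
    if len(pattern) != len(sequence):
        return False
    return all(p == "N" or p == s for p, s in zip(pattern, sequence))


# A pattern with one arbitrary base at the third position.
print(matches_with_wildcards("AGNT", "AGCT"))
print(matches_with_wildcards("AGNT", "TGCT"))
```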
  • FIG. 2 shows an example of a virtual sequence in which a substitution has been introduced into a target sequence.
  • the virtual sequence generation unit 12 may generate a plurality of virtual sequences including a sequence identical to the target sequence and a sequence in which a substitution has been introduced into the target sequence.
  • sequence M1 (SEQ ID NO: 1) is the same sequence as the target sequence.
  • Sequences M2 to M7 show examples of sequences in which a substitution has been introduced into one of the nucleotides contained in sequence M1.
  • sequence M2 (SEQ ID NO: 2) is a sequence in which the "A" at the 5' end of sequence M1 has been replaced with a "T”
  • sequence M3 (SEQ ID NO: 3) is a sequence in which it has been replaced with a "G”
  • sequence M4 (SEQ ID NO: 4) is a sequence in which it has been replaced with a "C”.
  • sequence M5 (SEQ ID NO: 5) is a sequence in which the second "G” from the 5' end of sequence M1 has been replaced with an "A”
  • sequence M6 (SEQ ID NO: 6) is a sequence in which it has been replaced with a "T”
  • sequence M7 (SEQ ID NO: 7) is a sequence in which it has been replaced with a "C”.
  • sequences M2 to M7 are shown as examples of virtual sequences in which a substitution has been introduced into one nucleotide of the target sequence, but the virtual sequences generated by the virtual sequence generation unit 12 are not limited to these.
  • the virtual sequence generation unit 12 may comprehensively generate sequences in which a single base substitution has been introduced into the target sequence.
  • the virtual sequence generation unit 12 may generate virtual sequences in which substitutions have been introduced into multiple nucleotides of the target sequence.
  • the virtual sequence generation unit 12 may comprehensively generate sequences in which two-base substitutions, three-base substitutions, and four-base substitutions have been introduced into the target sequence.
  • the multiple virtual sequences generated by the virtual sequence generating unit 12 may include the following sequences: A sequence in which at least one adenine (A) in the target sequence is replaced with at least one of thymine (T), cytosine (C), and guanine (G), and/or A sequence in which at least one thymine (T) in the target sequence is replaced with at least one of adenine (A), cytosine (C), and guanine (G), and/or A sequence in which at least one cytosine (C) in the target sequence is replaced with at least one of adenine (A), thymine (T), and guanine (G), and/or A sequence in which at least one guanine (G) in the target sequence is replaced with at least one of adenine (A), thymine (T), and cytosine (C).
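  • the comprehensive generation of single-base substitutions described above can be sketched as follows. This is a minimal illustration, not the implementation of the virtual sequence generation unit 12; the function name and the 23-base `target` string are hypothetical placeholders (the string is not SEQ ID NO: 1, it merely starts with "AGC" like sequence M1).

```python
BASES = "ATGC"


def single_substitutions(target: str) -> list[str]:
    """Return every sequence that differs from `target` by exactly one
    substituted base, scanning positions from the 5' end."""
    variants = []
    for i, original in enumerate(target):
        for base in BASES:
            if base != original:
                variants.append(target[:i] + base + target[i + 1:])
    return variants


# Hypothetical 23-base placeholder target.
target = "AGCTTGACCTGAACGTTCATGGG"
subs = single_substitutions(target)
print(len(subs))  # 3 alternatives at each of the 23 positions
```

  • with this ordering, the first three variants replace the 5'-end "A" with "T", "G", and "C", and the next three replace the second base "G" with "A", "T", and "C", mirroring the pattern of sequences M2 to M7.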
  • FIG. 3 shows an example of a virtual sequence in which a deletion has been introduced into a target sequence.
  • the virtual sequence generation unit 12 may generate a plurality of virtual sequences including a sequence identical to the target sequence and a sequence in which a deletion has been introduced into the target sequence.
  • sequence M1 (SEQ ID NO: 1) is the same sequence as the target sequence.
  • Sequences M8 to M10 show examples of sequences in which a deletion has been introduced into one of the nucleotides contained in sequence M1.
  • sequence M8 (SEQ ID NO: 8) is a sequence in which the "A" at the 5' end of sequence M1 has been deleted
  • sequence M9 (SEQ ID NO: 9) is a sequence in which the second "G” from the 5' end of sequence M1 has been deleted
  • sequence M10 (SEQ ID NO: 10) is a sequence in which the third "C" from the 5' end of sequence M1 has been deleted.
  • sequences M8 to M10 are shown as examples of virtual sequences in which a deletion has been introduced into one nucleotide of the target sequence, but the virtual sequences generated by the virtual sequence generation unit 12 are not limited to these.
  • the virtual sequence generation unit 12 may comprehensively generate sequences in which a single base deletion has been introduced into the target sequence.
  • the virtual sequence generation unit 12 may generate virtual sequences in which deletions have been introduced into multiple nucleotides of the target sequence.
  • the virtual sequence generation unit 12 may comprehensively generate sequences in which a two-base deletion, a three-base deletion, and a four-base deletion have been introduced into the target sequence.
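  • deletion variants can be enumerated in a similar way. The sketch below assumes that a k-base deletion means removing any k positions (not necessarily adjacent) and that duplicate results, which arise when neighboring bases are identical, are collapsed; both choices are assumptions, and the names are hypothetical.

```python
from itertools import combinations


def deletions(target: str, k: int = 1) -> list[str]:
    """Return the distinct sequences obtained by deleting exactly `k`
    bases from `target`, in order of first appearance."""
    seen = {}
    for positions in combinations(range(len(target)), k):
        drop = set(positions)
        variant = "".join(b for i, b in enumerate(target) if i not in drop)
        seen.setdefault(variant, None)  # dict keeps insertion order
    return list(seen)


# Hypothetical 23-base placeholder target starting with "AGC".
target = "AGCTTGACCTGAACGTTCATGGG"
single = deletions(target, 1)
double = deletions(target, 2)
```

  • the first three entries of `single` delete the first, second, and third base from the 5' end, following the pattern of sequences M8 to M10.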
  • FIG. 4 shows an example of a virtual sequence in which an insertion has been introduced into a target sequence.
  • the virtual sequence generating unit 12 may generate a plurality of virtual sequences including a sequence identical to the target sequence and a sequence in which an insertion has been introduced into the target sequence.
  • sequence M1 (SEQ ID NO: 1) is the same sequence as the target sequence.
  • Sequences M11 to M18 show examples of sequences in which one base has been inserted into sequence M1.
  • sequence M11 (SEQ ID NO: 11) is a sequence in which "A” has been inserted between the 5'-end and second nucleotide "AG” of sequence M1
  • sequence M12 (SEQ ID NO: 12) is a sequence in which "T” has been inserted
  • sequence M13 (SEQ ID NO: 13) is a sequence in which "G” has been inserted
  • sequence M14 (SEQ ID NO: 14) is a sequence in which "C” has been inserted.
  • sequence M15 (SEQ ID NO: 15) is a sequence in which "A" has been inserted between the second and third nucleotides "GC" from the 5'-end of sequence M1
  • sequence M16 (SEQ ID NO: 16) is a sequence in which "T" has been inserted
  • sequence M17 (SEQ ID NO: 17) is a sequence in which "G” has been inserted
  • sequence M18 (SEQ ID NO: 18) is a sequence in which "C" has been inserted.
  • sequences M11 to M18 are shown as examples of virtual sequences in which an insertion has been introduced at one position in the target sequence, but the virtual sequences generated by the virtual sequence generation unit 12 are not limited to these.
  • the virtual sequence generation unit 12 may generate a sequence in which a two-base insertion has been introduced at one position in the target sequence.
  • the virtual sequence generation unit 12 may also comprehensively generate sequences in which an insertion has been introduced at one position in the target sequence.
  • the virtual sequence generation unit 12 may generate virtual sequences in which insertions have been introduced at multiple positions in the target sequence.
  • the virtual sequence generation unit 12 may comprehensively generate sequences in which insertions have been introduced at two, three, and four positions in the target sequence.
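  • single-base insertions can be sketched in the same style. The example assumes insertions only between adjacent nucleotides of the target (not before the 5' end or after the 3' end) and removes duplicates; these are assumptions, and the function name is hypothetical.

```python
BASES = "ATGC"


def single_insertions(target: str) -> list[str]:
    """Return the distinct sequences obtained by inserting one base
    between two adjacent nucleotides of `target`."""
    seen = {}
    for gap in range(1, len(target)):
        for base in BASES:
            seen.setdefault(target[:gap] + base + target[gap:], None)
    return list(seen)


# Hypothetical 23-base placeholder target starting with "AGC".
target = "AGCTTGACCTGAACGTTCATGGG"
ins = single_insertions(target)
```

  • the first four entries insert "A", "T", "G", and "C" between the first and second nucleotides, following the pattern of sequences M11 to M14. Inserting "G" before or after the existing "G" yields the same string, which is why the list is de-duplicated.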
  • the virtual sequence generating unit 12 may generate a virtual sequence in which two or more types of mutations are introduced into the target sequence. For example, the virtual sequence generating unit 12 may generate a virtual sequence in which a substitution is introduced into the target sequence, a virtual sequence in which a deletion is introduced into the target sequence, and a virtual sequence in which an insertion is introduced into the target sequence.
  • the DNA-binding tool is less likely to act on sequences that have low homology to the target sequence recognized by the DNA-binding tool. Therefore, each of the multiple virtual sequences only needs to differ from the target sequence by four or fewer nucleotides, and there is little need to generate virtual sequences that introduce more than this number of mutations. This ensures the accuracy of the prediction results while reducing the burden on computational resources.
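  • to give a sense of the computational load, the following sketch counts and enumerates substitution-only variants with up to a given number of mismatches. Restricting the example to substitutions is a simplification, and the function names are hypothetical. For a target of length L, the count is the sum over k of C(L, k) x 3^k.

```python
from itertools import combinations, product
from math import comb

BASES = "ATGC"


def count_substitution_variants(length: int, max_mismatches: int) -> int:
    """Number of sequences within `max_mismatches` substitutions of a
    target of the given length (the target itself is counted for k = 0)."""
    return sum(comb(length, k) * 3 ** k for k in range(max_mismatches + 1))


def substitution_variants(target: str, max_mismatches: int):
    """Yield the target and every variant with 1..max_mismatches substitutions."""
    for k in range(max_mismatches + 1):
        for positions in combinations(range(len(target)), k):
            choices = [[b for b in BASES if b != target[p]] for p in positions]
            for replacement in product(*choices):
                seq = list(target)
                for p, b in zip(positions, replacement):
                    seq[p] = b
                yield "".join(seq)


print(count_substitution_variants(23, 4))  # 767419
```

  • under these assumptions a 23-base target yields fewer than one million substitution variants at four or fewer mismatches, whereas each additional allowed mismatch multiplies the count considerably.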
  • the off-target risk analysis device 1 may be configured not to generate a virtual sequence that satisfies a specific condition and not to calculate a score, as described below.
  • a virtual sequence that satisfies a specific condition may be, for example, a virtual sequence that is predicted from known knowledge to be unlikely to contribute to the occurrence of off-target effects.
  • the score calculation unit 13 calculates a score for each of the multiple virtual sequences that is related to the probability that the DNA binding tool that recognizes the target sequence will act.
  • the score calculation unit 13 may calculate the score using a value indicating the stability when the DNA binding tool is bound to each of the multiple virtual sequences.
  • the score calculation unit 13 may calculate the score based on score calculation rules 22 stored in the storage unit 20.
  • the score calculation rules 22 may be publicly known score calculation rules that apply an in silico analysis method developed according to the type of DNA binding tool. Furthermore, the score calculation rules 22 may be a combination of multiple rules, including publicly known calculation rules.
  • CRISPR-Net is a different scoring tool from the MOFF score (Jiecong Lin et al., "CRISPR-Net: A Recurrent Convolutional Network Quantifies CRISPR Off-Target Activities with Mismatches and Indels", Advanced Science, Vol 7, 1903562, 2020) (https://doi.org/10.1002/advs.201903562).
  • when the DNA-binding tool is any of zinc finger nucleases (ZFNs), TALE nucleases (TALENs), and pentatricopeptide repeat (PPR) nucleases, similar scoring can be performed by, for example, alignment that takes mismatches and gaps into account and Tm value calculation, both of which can be performed with Biopython and similar tools.
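  • as a rough illustration of alignment-based scoring that accounts for mismatches and gaps, the sketch below implements a plain global alignment (Needleman-Wunsch) score in pure Python. It is a stand-in for illustration only: it does not use Biopython, it does not compute Tm values, and the match, mismatch, and gap weights are arbitrary assumptions rather than values from the disclosure.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty,
    computed row by row so only two rows are kept in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical 23-base placeholder target and two virtual sequences.
target = "AGCTTGACCTGAACGTTCATGGG"
substituted = "T" + target[1:]   # one substitution at the 5' end
deleted = target[1:]             # one deletion at the 5' end

print(global_alignment_score(target, target))       # 23
print(global_alignment_score(target, substituted))  # 21
print(global_alignment_score(target, deleted))      # 20
```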
  • the prediction result output unit 14 outputs a prediction result indicating the possibility that the DNA binding tool acts on a sequence different from the target sequence based on the calculated score.
  • the prediction result output unit 14 may output an evaluation value calculated using all the scores calculated for each of the multiple virtual sequences as the prediction result.
  • This evaluation value may be a value obtained by adding up all the scores calculated for each of the multiple virtual sequences.
  • this value is not limited to the sum of all the scores, but refers to a calculation that can be expressed as an n-variable function f(s1, s2, ..., sn) for n virtual sequences.
  • "s" indicates the score of each virtual sequence.
  • This n-variable function f may include not only linear transformations but also nonlinear transformations based on a model generated by machine learning, and may also include a term indicating an error by the computer.
  • the n-variable function f does not have to be a single-valued function, and may be a multi-valued function.
  • the prediction result may be calculated so that the score calculated for a virtual sequence in which a mutation has been introduced so that the sequence is unlikely to be misrecognized by the DNA binding tool (i.e., a sequence that is unlikely to contribute to the occurrence of off-target effects) does not unduly affect the prediction result.
  • the prediction result output unit 14 may output as the prediction result a value that is the sum of only the scores calculated for virtual sequences in which a mutation has been introduced so that the sequence is likely to contribute to the occurrence of off-target effects, among multiple virtual sequences.
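  • two simple choices for the n-variable function f are sketched below: the sum of all scores, and the sum restricted to scores at or above a cutoff so that unlikely sequences do not dominate the result. Using a fixed threshold to decide which virtual sequences count is an assumption made for illustration; the disclosure does not specify how such sequences are selected.

```python
from typing import Iterable, Optional


def evaluate(scores: Iterable[float], threshold: Optional[float] = None) -> float:
    """Aggregate per-sequence scores s1..sn into one evaluation value.

    With no threshold, every score is summed. With a threshold, only
    scores greater than or equal to it contribute (hypothetical rule).
    """
    if threshold is None:
        return sum(scores)
    return sum(s for s in scores if s >= threshold)


scores = [0.9, 0.1, 0.5]
total = evaluate(scores)
filtered = evaluate(scores, threshold=0.5)
```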
  • Fig. 5 is a flowchart showing an example of a process flow executed by the off-target risk analysis device 1.
  • Fig. 5 also shows a process flow executed by an off-target risk analysis system 100 including the off-target risk analysis device 1.
  • the target sequence receiving unit 11 receives a target sequence input by a user (step S1).
  • the target sequence data indicating the target sequence may be, for example, text data.
  • the virtual sequence generation unit 12 generates multiple virtual sequences including a sequence identical to the target sequence and a sequence in which at least one mutation has been introduced into the target sequence (step S2: virtual sequence generation step).
  • the score calculation unit 13 calculates a score related to the probability that the DNA binding tool that recognizes the target sequence will act for each of the multiple virtual sequences generated in step S2 (step S3: score calculation step).
  • the prediction result output unit 14 outputs a prediction result indicating the possibility that the DNA binding tool acts on a sequence different from the target sequence based on the score calculated in step S3 (step S4: prediction result output step).
  • the off-target risk analysis device 1 generates multiple virtual sequences from a target sequence recognized by the DNA-binding tool, and calculates a score related to the probability that the DNA-binding tool will act for each of the multiple virtual sequences generated. The off-target risk analysis device 1 then uses the calculated score to output a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence. This prediction result is, so to speak, information indicating the potential off-target risk of the DNA-binding tool.
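  • steps S1 to S4 can be tied together as in the following minimal sketch. The scoring function is a toy placeholder that halves the score for each mismatched position; it is not a published score such as the MOFF score or CRISPR-Net, and all function names are hypothetical. Only single-base substitutions are generated to keep the example short.

```python
BASES = "ATGC"


def receive_target(raw: str) -> str:
    """Step S1: normalize and validate the target sequence text."""
    target = raw.strip().upper()
    if not target or any(base not in BASES for base in target):
        raise ValueError("target sequence must contain only A, T, G, C")
    return target


def generate_virtual_sequences(target: str) -> list[str]:
    """Step S2: the target itself plus every single-base substitution."""
    virtual = [target]
    for i, original in enumerate(target):
        for base in BASES:
            if base != original:
                virtual.append(target[:i] + base + target[i + 1:])
    return virtual


def toy_score(target: str, virtual: str) -> float:
    """Step S3 (placeholder): halve the score for each mismatched position."""
    mismatches = sum(a != b for a, b in zip(target, virtual))
    return 0.5 ** mismatches


def analyze(raw: str) -> dict:
    """Step S4: aggregate all scores into one evaluation value."""
    target = receive_target(raw)
    virtual = generate_virtual_sequences(target)
    scores = [toy_score(target, v) for v in virtual]
    return {"target": target, "n_virtual": len(virtual), "evaluation": sum(scores)}


# Hypothetical 23-base placeholder input, given as lowercase text.
result = analyze("agcttgacctgaacgttcatggg")
print(result)
```

  • note that the analysis takes only the target sequence text as input; no reference genome is consulted at any step.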
  • the off-target risk analysis device 1 can predict the potential off-target risks of a DNA-binding tool even if the genomic information of the target on which the DNA-binding tool is to be acted upon is unknown or uncertain.
  • the off-target risk analysis device 1 is provided with an input unit 30 that accepts input of a target sequence by a user, and is configured to output a prediction result to a display device 4, but is not limited to this.
  • an off-target risk analysis device 1a may be provided that is communicatively connected to communication devices 5a and 5b used by each user via a communication network 9.
  • the off-target risk analysis device 1a receives target sequence data indicating the target sequence from each of the communication devices 5a and 5b. The off-target risk analysis device 1a then transmits to the communication device 5a a prediction result corresponding to the target sequence received from the communication device 5a, and transmits to the communication device 5b a prediction result corresponding to the target sequence received from the communication device 5b.
  • although FIG. 6 shows the off-target risk analysis system 100a including the communication devices 5a and 5b and the off-target risk analysis device 1a, the system is not limited to this configuration. In the off-target risk analysis system 100a, the off-target risk analysis device 1a may be capable of communicating with two or more communication devices.
  • Fig. 7 is a functional block diagram showing a configuration example of an off-target risk analysis system 100a according to one aspect of the present disclosure.
  • the same reference numerals are attached to members having the same functions as those described in the above embodiment, and the description thereof will not be repeated.
  • the off-target risk analysis device 1a includes a communication unit 16 that functions as a communication interface with the communication devices 5a and 5b.
  • the target sequence receiving unit 11 receives target sequence data via the communication unit 16.
  • the prediction result output unit 14 transmits the prediction results to each of the communication devices 5a and 5b via the communication unit 16.
  • the off-target risk analysis device 1a may generate, for each received target sequence, a web page showing the corresponding prediction results, and provide information for accessing the web page to the user who sent the target sequence.
  • communication device 5a and communication device 5b may be communication devices used by users who are pre-registered as users of off-target risk analysis system 100a.
  • storage unit 20a may store user information 23 including information about users who are pre-registered as users of off-target risk analysis system 100a.
  • FIG. 8 is a diagram showing an example of the data structure of user information 23.
  • a user ID assigned to each user may be associated with the user's name, affiliation, and contact information.
  • a user assigned user ID "U001" has the name "AA AA", belongs to "XX University School of Medicine", and has contact information (e.g., email address) "AAAA@xxx.xx.xx".
  • a user assigned user ID "U002" has the name "BB BB", belongs to "YY Research Institute", and has contact information "BBBB@yyy.yy.yyy".
  • a prediction result regarding a target sequence received from a user with user ID "U001" is sent to "AAAA@xxx.xx.xx".
  • the off-target risk analysis device 1a may store the prediction results for each received target sequence in the analysis result log 24 of the storage unit 20a.
  • FIG. 9 is a diagram showing an example of the data structure of the analysis result log 24.
  • the user ID assigned to each user may be associated with the target sequence data received from each user, the reception date and time, and the prediction results.
  • the target sequence data received from the user with user ID "U001" at "PM 1:50" on "2022/9/1" and the prediction results for that target sequence are stored in association with each other.
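  • The two data structures described above (user information 23 and analysis result log 24) could be modeled roughly as follows. This is a minimal sketch only: the class names, field names, and the `contact_for` helper are assumptions made for illustration, and the target sequence and prediction value in the example are placeholders rather than data from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserInfo:
    """One record of user information 23 (field names are illustrative)."""
    user_id: str
    name: str
    affiliation: str
    contact: str


@dataclass(frozen=True)
class AnalysisLogEntry:
    """One record of analysis result log 24 (field names are illustrative)."""
    user_id: str
    target_sequence: str
    received_at: datetime
    prediction_result: float


def contact_for(entry: AnalysisLogEntry, users: dict[str, UserInfo]) -> str:
    """Look up the contact to which a log entry's prediction result is sent."""
    return users[entry.user_id].contact


users = {
    "U001": UserInfo("U001", "AA AA", "XX University School of Medicine", "AAAA@xxx.xx.xx"),
    "U002": UserInfo("U002", "BB BB", "YY Research Institute", "BBBB@yyy.yy.yyy"),
}
# Placeholder sequence and prediction value; only the user ID and timestamp mirror the text.
entry = AnalysisLogEntry("U001", "ACGTACGTACGTACGTACGT", datetime(2022, 9, 1, 13, 50), 0.0)
destination = contact_for(entry, users)
```

  • Keying both structures on the user ID is what lets a prediction result be routed back to the sender of the corresponding target sequence.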
  • the off-target risk analysis device 1a can provide prediction results obtained by analyzing target sequences received from each of multiple users to each user who is the sender of each target sequence data. For example, an administrator who manages the off-target risk analysis device 1a may charge each user (or each user's affiliated institution) a specified fee as compensation for the service of providing prediction results regarding the received target sequences.
  • the functions of the off-target risk analysis devices 1 and 1a (hereinafter, "the device") can be realized by a program for causing a computer to function as the device, specifically a program for causing the computer to function as each control block of the device (in particular, each unit included in the control units 10 and 10a).
  • the device includes a computer having at least one control device (e.g., a processor) and at least one storage device (e.g., a memory) as hardware for executing the program.
  • the program may be recorded on one or more non-transitory computer-readable recording media.
  • the recording media may or may not be included in the device. In the latter case, the program may be supplied to the device via any wired or wireless transmission medium.
  • each of the above control blocks can be realized by a logic circuit.
  • the scope of this disclosure also includes an integrated circuit in which a logic circuit that functions as each of the above control blocks is formed.
  • each process described in each of the above embodiments may be executed by AI (Artificial Intelligence).
  • the AI may run on the control device, or may run on another device (such as an edge computer or a cloud server).
  • the off-target risk analysis method includes a virtual sequence generation step of generating a plurality of virtual sequences including a sequence identical to a target sequence and a sequence in which at least one mutation has been introduced into the target sequence; a score calculation step of calculating, for each of the plurality of virtual sequences, a score related to the probability that a DNA-binding tool that recognizes the target sequence will act; and a prediction result output step of outputting, based on the calculated score, a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence.
  • multiple virtual sequences are generated, including a sequence identical to the target sequence and a sequence in which at least one mutation has been introduced into the target sequence, and a score related to the probability that a DNA-binding tool that recognizes the target sequence will act is calculated for each of the multiple virtual sequences. Then, based on the calculated score, a prediction result is output that indicates the possibility that the DNA-binding tool will act on a sequence different from the target sequence.
  • this off-target risk analysis method makes it possible to evaluate the risk of causing an off-target effect that a DNA-binding tool that recognizes a target sequence potentially has, without referring to the genomic sequence of the subject to which it is applied.
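  • The three steps described above can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function names are hypothetical, only single substitutions are generated, `toy_score` is a stand-in that merely counts matching positions (it is not MOFF, CRISPR-Net, or any real scoring tool), and summing the scores is just one possible way to form an evaluation value. Note that no genome sequence is consulted at any point.

```python
from typing import Callable

BASES = "ACGT"


def generate_virtual_sequences(target: str) -> list[str]:
    """Virtual sequence generation: the target plus every single substitution."""
    variants = [target]
    for i, original in enumerate(target):
        for base in BASES:
            if base != original:
                variants.append(target[:i] + base + target[i + 1:])
    return variants


def toy_score(target: str, candidate: str) -> float:
    """Placeholder score: fraction of matching positions (NOT a real tool)."""
    matches = sum(a == b for a, b in zip(target, candidate))
    return matches / len(target)


def analyze(target: str, score: Callable[[str, str], float]) -> float:
    """Run the three steps and return one evaluation value for the target."""
    virtual = generate_virtual_sequences(target)   # virtual sequence generation step
    scores = [score(target, v) for v in virtual]   # score calculation step
    return sum(scores)                             # prediction result output step


evaluation = analyze("ACGT", toy_score)
```

  • Passing the scoring function as a parameter keeps the pipeline independent of any particular scoring tool, so a real tool could be substituted for `toy_score` without changing the other steps.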
  • the at least one mutation may be introduced into the entire target sequence or a part of the target sequence.
  • the risk of off-target effects may be higher in one portion of the target sequence than in other portions.
  • in such a case, the risk of off-target effects may be assessed by focusing on the portion where that risk is high.
  • in the virtual sequence generation step, at least one mutation is introduced into the entire target sequence or into a part of the target sequence. This makes it possible to evaluate the risk of off-target effects efficiently.
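  • Restricting mutations to a part of the target sequence can be sketched as follows. The function name and the half-open `start`/`end` index convention are assumptions for illustration; which positions form the high-risk portion depends on the DNA-binding tool and is not decided by this sketch.

```python
BASES = "ACGT"


def substitute_in_region(target: str, start: int, end: int) -> list[str]:
    """Return the target plus all single substitutions at positions start <= i < end."""
    if not 0 <= start <= end <= len(target):
        raise ValueError("region must lie within the target sequence")
    variants = [target]
    for i in range(start, end):
        for base in BASES:
            if base != target[i]:
                variants.append(target[:i] + base + target[i + 1:])
    return variants


# Mutate only the first four positions of an eight-nucleotide toy sequence.
region_variants = substitute_in_region("ACGTACGT", 0, 4)
```

  • Calling the function with `start=0` and `end=len(target)` covers the entire target sequence, so the same code handles both cases named in the text.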
  • the mutation may be any one of a substitution, a deletion, and an insertion.
  • the multiple virtual sequences may include (1) a sequence in which at least one adenine (A) in the target sequence is replaced with at least one of thymine (T), cytosine (C), and guanine (G), and/or (2) a sequence in which at least one thymine (T) in the target sequence is replaced with at least one of adenine (A), cytosine (C), and guanine (G), and/or (3) a sequence in which at least one cytosine (C) in the target sequence is replaced with at least one of adenine (A), thymine (T), and guanine (G), and/or (4) a sequence in which at least one guanine (G) in the target sequence is replaced with at least one of adenine (A), thymine (T), and cytosine (C).
  • the above configuration makes it possible to comprehensively generate multiple virtual sequences, including sequences identical to the target sequence and sequences in which at least one substitution has been introduced into the target sequence. This allows for an unbiased evaluation of the potential risk of off-target effects that a DNA-binding tool that recognizes a target sequence has.
  • each of the multiple virtual sequences may differ from the target sequence by four or fewer nucleotides.
  • the DNA binding tool is less likely to act on sequences that have low homology to the target sequence recognized by the DNA binding tool. With the above configuration, it is possible to reduce the burden on computational resources while ensuring the accuracy of the prediction results.
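  • To make the computational trade-off concrete, the following sketch enumerates every virtual sequence within a given number of substitutions and checks the enumeration against a closed-form count, the sum over k of C(L, k) × 3^k for a target of length L. The function names are hypothetical, and the 20-nucleotide length mentioned in the note after the code is an assumption used only as an example.

```python
from itertools import combinations, product
from math import comb

BASES = "ACGT"


def virtual_sequences(target: str, max_subs: int):
    """Yield the target and every variant with 1..max_subs substitutions."""
    yield target
    for k in range(1, max_subs + 1):
        for positions in combinations(range(len(target)), k):
            # At each chosen position, any of the three other bases may appear.
            choices = [[b for b in BASES if b != target[p]] for p in positions]
            for replacement in product(*choices):
                seq = list(target)
                for p, b in zip(positions, replacement):
                    seq[p] = b
                yield "".join(seq)


def count_virtual_sequences(length: int, max_subs: int) -> int:
    """Closed-form count: sum over k of C(length, k) * 3**k."""
    return sum(comb(length, k) * 3 ** k for k in range(max_subs + 1))


small = list(virtual_sequences("ACGTAC", 2))
```

  • For an assumed 20-nucleotide target, `count_virtual_sequences(20, 4)` gives 424,996 sequences, while `count_virtual_sequences(20, 2)` gives 1,771, which illustrates how quickly the workload grows with the permitted number of substitutions.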
  • in any one of aspects 1 to 5 above, in the score calculation step, the score may be calculated using a value indicating the stability of the DNA-binding tool when bound to each of the multiple virtual sequences.
  • the above configuration allows the score to be calculated with high accuracy for each of the multiple virtual sequences.
  • an evaluation value calculated using all of the scores calculated for each of the multiple virtual sequences may be output as the prediction result.
  • the above configuration makes it possible to easily evaluate the risk of off-target effects that a DNA-binding tool that recognizes a target sequence potentially has, using the scores calculated for each of the multiple virtual sequences.
  • in the off-target risk analysis method of any one of aspects 1 to 7 above, the DNA-binding tool may be (1) a zinc finger or a fusion polypeptide combining a zinc finger with a functional domain, (2) a TALE (Transcription Activator-Like Effector) or a fusion polypeptide combining a TALE with a functional domain, (3) a pentatricopeptide repeat or a fusion polypeptide combining a pentatricopeptide repeat with a functional domain, (4) a wild-type CRISPR/Cas (CRISPR-associated protein) or a fusion polypeptide-nucleic acid complex combining a wild-type CRISPR/Cas with a functional domain, or (5) a modified CRISPR/Cas or a fusion polypeptide-nucleic acid complex combining a modified CRISPR/Cas with a functional domain.
  • the off-target risk analysis system includes a virtual sequence generation unit that generates a plurality of virtual sequences including a sequence identical to a target sequence and a sequence in which at least one mutation has been introduced into the target sequence, a score calculation unit that calculates a score related to the probability that a DNA-binding tool that recognizes the target sequence will act for each of the plurality of virtual sequences, and a prediction result output unit that outputs a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence based on the calculated score.
  • the above configuration provides the same effect as aspect 1.
  • the program according to aspect 10 of the present disclosure causes a computer to execute a virtual sequence generation step of generating a plurality of virtual sequences including a sequence identical to a target sequence and a sequence in which at least one mutation has been introduced into the target sequence, a score calculation step of calculating, for each of the plurality of virtual sequences, a score related to the probability that a DNA-binding tool that recognizes the target sequence will act, and a prediction result output step of outputting, based on the calculated score, a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence.
  • the recording medium according to aspect 11 of the present disclosure is a computer-readable recording medium having the program described in aspect 10 recorded thereon.
  • Example 1 the first embodiment of the present disclosure will be described with reference to FIGS.
  • the correlation between the prediction results output by the off-target risk analysis method according to one embodiment of the present disclosure and the off-target incidence rate calculated as described above was examined for 14 types of target sequences.
  • virtual sequences in which four nucleotide substitutions were introduced into the target sequences were comprehensively generated.
  • the MOFF score described in Non-Patent Document 3 was used to calculate the scores from which the prediction results were output.
  • FIGS. 10 to 12 are graphs showing the correlation between the prediction results output using the off-target risk analysis method according to the present disclosure and the off-target incidence rate obtained as a result of analyzing the off-target action in actual cells.
  • the prediction results shown in FIG. 10 were output using scores calculated for virtual sequences in which mutations were comprehensively introduced into the entire target sequence.
  • the correlation between the prediction results and the off-target incidence rate is high (R² = 0.5675), demonstrating that the prediction accuracy of the prediction results output by the off-target risk analysis method according to one embodiment of the present disclosure is high.
  • the prediction results shown in FIG. 11 were output using only the scores calculated for virtual sequences in which a mutation was introduced into the non-seed region (the 8 nucleotides immediately preceding the PAM) with low specificity in the target sequence. As shown in FIG. 11, restricting the scores in this way improved the prediction accuracy (R² = 0.57465).
  • the prediction results shown in FIG. 12 were obtained by comprehensively generating virtual sequences in which two nucleotide substitutions were introduced into the target sequence, calculating a score for each virtual sequence, and outputting the prediction results from the calculated scores. As shown in FIG. 12, when the number of substitutions introduced to generate the virtual sequences was reduced from 4 to 2, the correlation between the prediction results and the off-target incidence rate decreased (R² = 0.5351), but the prediction accuracy remained high.
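  • The text does not state how the coefficient of determination was computed. One common reading, assumed here, is the squared Pearson correlation between the prediction results and the off-target incidence rates, which equals the R² of a simple linear regression. The function name and the example numbers below are illustrative only and are not data from the disclosure.

```python
from math import fsum


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Made-up values: predictions for four targets and their observed incidence rates.
value = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

  • Because the quantity is squared, it does not show whether the relationship is positive or negative; the sign has to be read from the plot or from the unsquared correlation.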
  • Example 2 A second embodiment of the present disclosure will be described below with reference to FIGS.
  • the virtual sequence used in this example is a base sequence in which a mismatch of up to one base pair is introduced into the entire target sequence.
  • the base sequence used in the off-target analysis experiment in Non-Patent Document 3 and evaluated with the scoring tool CRISPR-Net was used as the target sequence.
  • gRNA sequences used in the off-target analysis experiment "CHANGE-seq", 59 types used in the off-target analysis experiment "TTISS", and 10 types used in the off-target analysis experiment "GUIDE-seq" in Non-Patent Document 3 were used.
  • guide RNA (gRNA) sequences that satisfy the following condition I were selected and used.
  • Genomic DNA sequences including PAM sequences can be extracted using the program "ExtendSeq.py” published at https://github.com/KazukiNakamae/Frame_Editor_sgRNA_selection.
  • the selected gRNA sequences were 102 used in the off-target analysis experiment “CHANGE-seq”, 54 used in the off-target analysis experiment “TTISS”, and 8 used in the off-target analysis experiment “GUIDE-seq”.
  • a virtual sequence was generated by introducing a mismatch of up to one base pair to the target sequence, which is a base sequence complementary to the selected gRNA sequence.
  • a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence was calculated by multiplying the sum of the CRISPR-Net scores by -1. That is, the prediction result is (-1) × Σ(CRISPR-Net score).
  • likewise, a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence was calculated by multiplying the sum of the MOFF scores by -1. That is, the prediction result is (-1) × Σ(MOFF score).
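  • The aggregation described above, in which the per-virtual-sequence scores are summed and the sum is multiplied by -1, can be written in a few lines. The function name and the three example scores are hypothetical; real scores would come from a scoring tool such as the ones named in the text.

```python
from math import fsum
from typing import Mapping


def prediction_result(scores: Mapping[str, float]) -> float:
    """Return (-1) x (sum of the scores calculated for the virtual sequences)."""
    return -1.0 * fsum(scores.values())


# Made-up scores for three virtual sequences of one target.
example_scores = {
    "ACGTACGTACGTACGTACGT": 0.90,
    "TCGTACGTACGTACGTACGT": 0.10,
    "AGGTACGTACGTACGTACGT": 0.05,
}
result = prediction_result(example_scores)
```

  • With this sign convention, a target whose virtual sequences collect larger scores receives a more negative prediction result.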
  • Figures 13 and 16 are scatter plots for the "CHANGE-seq" group, Figures 14 and 17 are scatter plots for the "TTISS" group, and Figures 15 and 18 are scatter plots for the "GUIDE-seq" group.
  • FIG. 13 is a scatter plot showing the correlation between the specificity prediction score calculated using the CRISPR-Net score and the "Off-on-ratio" score in the "CHANGE-seq" group.
  • the Spearman correlation coefficient between the specificity prediction score calculated using the CRISPR-Net score and the "Off-on-ratio" score in the "CHANGE-seq" group was -0.231839.
  • FIG. 14 is a scatter plot showing the correlation between the specificity prediction score calculated using the CRISPR-Net score and the "Off-on-ratio" score in the "TTISS" group.
  • the Spearman correlation coefficient between the specificity prediction score calculated using the CRISPR-Net score and the "Off-on-ratio" score in the "TTISS" group was -0.495973.
  • FIG. 15 is a scatter plot showing the correlation between the specificity prediction score calculated using the CRISPR-Net score and the "Off-on-ratio" score in the "GUIDE-seq" group. As shown in FIG. 15, the Spearman correlation coefficient between these two scores in the "GUIDE-seq" group was -0.380952.
  • [Correlation between specificity prediction score calculated using MOFF score and "Off-on-ratio" score]
  • FIG. 16 is a scatter plot showing the correlation between the specificity prediction score calculated using the MOFF score and the "Off-on-ratio" score in the "CHANGE-seq" group. As shown in FIG. 16, the Spearman correlation coefficient between these two scores in the "CHANGE-seq" group was -0.566788.
  • FIG. 17 is a scatter plot showing the correlation between the specificity prediction score calculated using the MOFF score and the "Off-on-ratio" score in the "TTISS" group. As shown in FIG. 17, the Spearman correlation coefficient between these two scores in the "TTISS" group was -0.655498.
  • FIG. 18 is a scatter plot showing the correlation between the specificity prediction score calculated using the MOFF score and the "Off-on-ratio" score in the "GUIDE-seq" group. As shown in FIG. 18, the Spearman correlation coefficient between these two scores in the "GUIDE-seq" group was -0.761904.
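  • A Spearman correlation coefficient of the kind reported above is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python illustration with hypothetical function names and made-up inputs; it assigns average ranks to ties, which is a common convention but is an assumption here because the text does not say how ties were handled.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up values, not data from the experiments described above.
rho = spearman([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
```

  • Because only ranks are used, the coefficient measures whether one score tends to rise or fall as the other rises, regardless of whether the relationship is linear.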
  • 11 Target sequence receiving unit
  • 12 Virtual sequence generating unit
  • 13 Score calculation unit
  • 14 Prediction result output unit
  • 100, 100a Off-target risk analysis system
  • S2 Virtual sequence generating step
  • S3 Score calculation step
  • S4 Prediction result output step

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention addresses the problem of predicting the potential off-target risk of a DNA-binding tool even when genomic information is unknown or undetermined. The solution according to the invention is an off-target risk analysis method comprising: a virtual sequence generation step (S2) of generating a plurality of virtual sequences including a sequence identical to a target sequence and a sequence in which at least one mutation has been introduced into the target sequence; a score calculation step (S3) of calculating, for each of the plurality of virtual sequences, a score related to the probability that a DNA-binding tool that recognizes the target sequence will act; and a prediction result output step (S4) of outputting, based on the calculated score, a prediction result indicating the possibility that the DNA-binding tool will act on a sequence different from the target sequence.
PCT/JP2023/032841 2022-09-30 2023-09-08 Off-target risk analysis method, off-target risk analysis system, program, and recording medium WO2024070589A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-158915 2022-09-30
JP2022158915 2022-09-30

Publications (1)

Publication Number Publication Date
WO2024070589A1 true WO2024070589A1 (fr) 2024-04-04

Family

ID=90477418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/032841 WO2024070589A1 (fr) 2022-09-30 2023-09-08 Off-target risk analysis method, off-target risk analysis system, program, and recording medium

Country Status (1)

Country Link
WO (1) WO2024070589A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021100731A1 (fr) * 2019-11-19 2021-05-27 国立大学法人 長崎大学 Method for inducing homologous recombination using Cas9 nuclease

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021100731A1 (fr) * 2019-11-19 2021-05-27 国立大学法人 長崎大学 Method for inducing homologous recombination using Cas9 nuclease

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAKASHI YAMAMOTO: ""CRISPRdirect" Utilizing Genome Editing Technology: Designing Guide RNA with Little Off-Target Action", 23 November 2021 (2021-11-23), XP093152028, Retrieved from the Internet <URL:https://web.archive.org/web/20211123033113/https://biosciencedbc.jp/about-us/files/leaflet_crisprdirect.pdf> *
TAMANO, SHIYU: "Proposal of Evaluation Off-target Effects Method in Gapmer ASO", IPSJ SIG TECHNICAL REPORT, vol. 2022-BIO-69, no. 7, 10 March 2022 (2022-03-10), pages 1 - 7, XP009554159, ISSN: 2188-8590 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23871826

Country of ref document: EP

Kind code of ref document: A1