CN117219157A - Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application - Google Patents


Info

Publication number
CN117219157A
Authority
CN
China
Prior art keywords
drug
oprd
beta
gene
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311184864.9A
Other languages
Chinese (zh)
Other versions
CN117219157B (en)
Inventor
高建鹏
饶冠华
韩朋
蒋智
贾雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinshao Medical Laboratory Co ltd
Tianjin Huazhinuo Technology Co ltd
Tianjin Jinke Medical Technology Co ltd
Original Assignee
Jinshi Zhizao Tianjin Medical Technology Co ltd
Tianjin Jinke Medical Technology Co ltd
Beijing Jinshao Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinshi Zhizao Tianjin Medical Technology Co ltd, Tianjin Jinke Medical Technology Co ltd, Beijing Jinshao Medical Laboratory Co ltd
Priority to CN202311184864.9A (CN117219157B)
Priority to CN202410204077.4A (CN118006813A)
Publication of CN117219157A
Application granted
Publication of CN117219157B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application belongs to the technical field of bioinformatics, and in particular relates to a method for screening features associated with the carbapenem drug-sensitivity phenotype of Pseudomonas aeruginosa, a combination of characteristic genes screened by the method for predicting that phenotype, and a kit and application thereof. The characteristic gene combination allows the drug-sensitivity phenotype to be predicted accurately.

Description

Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application
Technical Field
The application belongs to the technical field of bioinformatics, and particularly relates to a characteristic gene for predicting a drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, a kit and application thereof.
Background Art
Owing to the long-term widespread use and abuse of antibiotics, rising bacterial drug resistance has become one of the most pressing problems in global public health, and accurate, rapid identification of the resistance of pathogenic bacteria to antibiotics is urgently needed. Antimicrobial resistance (AMR) testing currently comprises mainly bacterial resistance phenotype testing and molecular testing for bacterial resistance genes. Resistance phenotype testing relies mainly on traditional antimicrobial susceptibility testing (AST), which is the main clinical means of determining whether a pathogen is drug resistant and the "gold standard" for diagnosing infection by drug-resistant pathogens. However, AST has shortcomings, such as long turnaround time and low antibiotic coverage, and cannot meet current medical needs. Resistance genotype testing can only identify the resistance genes present in a pathogen, whereas bacterial resistance usually results from the combined action of multiple genes and mechanisms, so the resistance or sensitivity of a bacterium to an antibiotic cannot be judged accurately from the presence of resistance genes alone.
Artificial intelligence (AI), and machine learning (ML) in particular, is increasingly being introduced into the life sciences. Machine learning uses mathematical algorithms to have a computer build a predictive model from known data and apply that model to make decisions about new data; common algorithms include random forest (RF), support vector machines (SVM), naive Bayes (NB) and artificial neural networks (ANN). As a core AI technology, machine learning is well suited to high-dimensional, large-scale data and therefore has important potential value in areas such as predicting AMR from bacterial genome sequencing data. A growing number of studies show that, without prior knowledge of the resistance mechanism, an ML model trained on a sufficiently large data set can provide accurate and rapid AST prediction.
Pseudomonas aeruginosa is a Gram-negative bacterium and a major cause of nosocomial infection. Its 6.3 Mbp genome is considerably larger than those of other common pathogenic bacteria such as Escherichia coli (4.6 Mbp), Mycobacterium tuberculosis (4.4 Mbp) and Bacillus subtilis (4.2 Mbp), and this large gene repertoire underlies its complex resistance mechanisms, including intrinsic, acquired and adaptive resistance (FIG. 1). The emergence of multidrug and even pan-drug resistance can significantly increase patient mortality and hospitalization costs. Carbapenems are atypical beta-lactam antibiotics with a broad antibacterial spectrum, strong antibacterial activity and high safety, and have become important drugs for the clinical treatment of P. aeruginosa infection. However, improper clinical use of carbapenems has aggravated resistance in P. aeruginosa and accelerated the emergence and spread of carbapenem-resistant P. aeruginosa, which seriously hampers clinical treatment. Rapid and accurate detection of carbapenem-resistant P. aeruginosa is therefore important for the rational use of carbapenems and for preventing and controlling the spread of resistant strains.
Clinical microbiology laboratories usually use susceptibility tests to determine whether P. aeruginosa produces carbapenemase and thereby judge its resistance to carbapenem antibacterial drugs. Common carbapenemase phenotype detection methods include minimum inhibitory concentration (MIC) determination, the disk diffusion method, the Hodge test and the Carba NP test. These methods, however, have drawbacks such as long turnaround time, cumbersome operation, results that are prone to misjudgment, restrictions on culture medium type and restrictions on the carbapenemase types that can be detected, all of which hinder accurate and rapid antibiotic treatment of P. aeruginosa infection. At present, only a few studies have applied machine learning to predicting P. aeruginosa drug resistance with the aim of building prediction models that are accurate, rapid and clinically useful. For example, machine learning models have been trained on features of several dimensions, such as gene variation, gene expression and gene presence, with some success; however, the number of features selected by such models is excessive, which creates a risk of overfitting, reduces the generalization performance of the resulting prediction model and limits its clinical value. It is therefore of great significance for guiding clinical antibiotic treatment of P. aeruginosa to develop, on the basis of machine learning and for the carbapenems mainly used against P. aeruginosa infection, a P. aeruginosa-carbapenem resistance prediction model that is supported by big data and has clinical application prospects.
In view of this, the present application has been proposed.
Disclosure of Invention
To solve the above technical problems, the application provides a method for screening features of the carbapenem resistance phenotype of Pseudomonas aeruginosa, and characteristic genes, screened by this method, for predicting the carbapenem drug-sensitivity phenotype of P. aeruginosa. The application establishes a data analysis workflow that identifies P. aeruginosa and the resistance genes it carries by alignment against single-genome contig sequences and predicts the P. aeruginosa-carbapenem drug-sensitivity phenotype. First, genome data of P. aeruginosa strains are acquired together with the corresponding susceptibility test results. Next, the contig sequences of each genome are aligned against the CARD resistance database and resistance genes are annotated. In parallel, mutation sites in the oprD gene, which plays an important role in P. aeruginosa resistance, are detected and annotated, the mutations are graded by their functional impact, and the mutations of the highest impact level (HIGH) are merged into a single key feature. Genotype data are then analyzed for association with resistance phenotype data for the carbapenems imipenem and meropenem, important characteristic genes related to resistance are screened, and their weight coefficients are calculated. Finally, ROC analysis is used to evaluate the performance of the P. aeruginosa-carbapenem susceptibility prediction model built by machine learning from the screened characteristic genes.
Second, the prior art discloses that some genes are related to P. aeruginosa drug resistance. For example, the presence of some common carbapenemases, such as KPC, NDM and VIM, is highly consistent with the clinical resistance phenotype and is clinically accepted, and the literature reports that genes such as PDC and oprD may be related to carbapenem resistance. However, for special sequencing data such as infection metagenomes, the detection limit is usually unsatisfactory because host content is high and sample composition is complex. In practice, even genes that may be related to carbapenem resistance often cannot be used for susceptibility prediction because of problems such as the sequencing abundance of the target gene. The application therefore uses the LASSO machine learning method, combining metagenome big data with susceptibility phenotype data, to screen characteristic genes and gene combinations that are usable for drug-sensitivity phenotype prediction in metagenomic sequencing samples.
Specifically, the application provides the following technical scheme:
The application first provides a method for screening features of the carbapenem resistance phenotype of Pseudomonas aeruginosa, comprising the following steps:
1) Acquiring genome data of Pseudomonas aeruginosa strains from a public database and collecting the corresponding susceptibility test result data;
2) Aligning the contig sequences of the Pseudomonas aeruginosa genomes against the CARD resistance database and annotating resistance genes;
3) Detecting and annotating mutation sites based on the contig sequences of the Pseudomonas aeruginosa genomes to obtain a mutation site profile;
4) For imipenem and/or meropenem, performing association analysis between the detected and annotated resistance genes and mutation site profile and the resistance phenotype data, screening important features related to resistance, and calculating the weight coefficients of the important characteristic genes;
Preferably, the method further comprises:
5) Evaluating, by ROC analysis, the performance of a classification model constructed from the screened characteristic genes.
Further, in step 1), the public database includes the NCBI database and/or the PATRIC database.
Further, in step 3), the mutation site detection and annotation is specifically: breaking the Pseudomonas aeruginosa genome contig sequences into fragments, aligning the fragments to a Pseudomonas aeruginosa reference genome, and then detecting mutation sites to obtain a mutation site profile.
Further, in step 2), the alignment and resistance gene annotation is specifically: aligning the contig sequences against the CARD resistance gene reference sequence library; filtering out hits with identity below 80% or reference gene coverage below 80%; selecting the best hit for each region on each contig as the final alignment result for that region; adding resistance gene annotation information; and counting the resistance genes detected in each strain and summarizing them in a matrix table, preferably a 0-1 matrix table in which 0 indicates that a resistance gene was not detected and 1 indicates that it was detected.
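The hit filtering and 0-1 matrix described for step 2) can be sketched as follows. This is a minimal illustration, not the applicants' pipeline: the dictionary keys (`strain`, `contig`, `region`, `gene`, `identity`, `coverage`) are hypothetical, overlapping hits are assumed to have already been grouped under a shared `region` label, and "best hit" is assumed to mean the highest identity with ties broken by coverage.

```python
from collections import defaultdict

MIN_IDENTITY = 80.0  # percent identity threshold stated in the text
MIN_COVERAGE = 80.0  # percent reference-gene coverage threshold stated in the text


def filter_hits(hits):
    """Drop weak hits, then keep one best hit per (strain, contig, region)."""
    best = {}
    for hit in hits:
        if hit["identity"] < MIN_IDENTITY or hit["coverage"] < MIN_COVERAGE:
            continue
        region = (hit["strain"], hit["contig"], hit["region"])
        # Assumption: "best" means highest identity, ties broken by coverage.
        rank = (hit["identity"], hit["coverage"])
        if region not in best or rank > best[region][0]:
            best[region] = (rank, hit)
    return [hit for _, hit in best.values()]


def presence_matrix(hits, strains, genes):
    """Build the 0-1 table: one row per strain, one column per gene."""
    detected = defaultdict(set)
    for hit in hits:
        detected[hit["strain"]].add(hit["gene"])
    return {s: [1 if g in detected[s] else 0 for g in genes] for s in strains}
```

Keeping the matrix as a plain mapping from strain to a 0/1 row makes it straightforward to pass to the later filtering and regression steps.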
Still further, step 3) further comprises: detecting and annotating mutation sites in genes such as oprD, grading the mutations as high, medium or low according to their degree of functional impact, and merging the oprD mutations of the highest level into a single key feature, oprD (HIGH). These mutations mainly comprise SNPs that cause premature termination of the CDS, InDels that cause a CDS frameshift, and SNPs that abolish the start codon. In addition, genes with PPV > 0.9 can be added as further templates for mutation detection.
Further, in step 4), the association analysis is: according to the antibiotic class information recorded in the CARD library for each resistance gene, and based on the gene detection matrix table obtained in step 2), the mutation site profile obtained in step 3) and the three oprD mutation levels obtained in step 3), selecting the sub-matrix of features related to imipenem or meropenem resistance; filtering out features with a low detection frequency (preferably fewer than 3 detections) or a low PPV (preferably below 0.9); and analyzing the association between the filtered table and the imipenem or meropenem susceptibility results using a LASSO regression model, so as to screen important features related to imipenem or meropenem resistance and calculate the weight coefficients of the important characteristic genes.
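The pre-filter applied before the regression in step 4) can be sketched as below. The sketch assumes that PPV here means the fraction of strains carrying a feature that are phenotypically resistant, and it uses a hypothetical layout in which `matrix` maps each feature name to a 0/1 column over strains and `resistant` is the matching 0/1 phenotype list. The regression fit itself is not shown.

```python
def feature_ppv(column, resistant):
    """PPV of one feature: resistant strains among strains where it was detected."""
    phenotypes = [r for x, r in zip(column, resistant) if x == 1]
    return sum(phenotypes) / len(phenotypes) if phenotypes else 0.0


def select_features(matrix, resistant, min_count=3, min_ppv=0.9):
    """Keep features detected at least min_count times with PPV >= min_ppv."""
    kept = []
    for name, column in matrix.items():
        if sum(column) < min_count:
            continue  # detected too rarely (text: preferably fewer than 3)
        if feature_ppv(column, resistant) < min_ppv:
            continue  # weak association with resistance (text: preferably below 0.9)
        kept.append(name)
    return kept
```

The default thresholds are the preferred values given in the text; both are exposed as parameters because the text presents them as preferences rather than fixed limits.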
Further, in step 5), the evaluation is specifically: defining a negative/positive interpretation index, Score [formula not reproduced in this text], wherein arg_W represents the weight coefficient of the corresponding detected gene; calculating the Score of each sample from the important-gene weight coefficient matrix obtained by model screening, combined with the resistance genes actually detected in the sample, and performing ROC curve analysis to obtain the AUC of the model on the training set; and further performing ROC analysis on the validation set to obtain the AUC of the model on the validation set. The higher the AUC values on the training and validation sets, the better the method performs.
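Because the Score formula itself does not appear in the text, the sketch below assumes one plausible form: the sum of the weight coefficients arg_W over the features detected in a sample. That form, the function names and the example weights are all assumptions for illustration. The AUC is computed from its rank definition, that is, the probability that a resistant sample scores higher than a sensitive one, with ties counted as one half.

```python
def sample_score(weights, detected):
    """Assumed Score: sum of arg_W over the features detected in the sample."""
    return sum(w for feature, w in weights.items() if feature in detected)


def roc_auc(scores, labels):
    """AUC as P(score of a resistant sample > score of a sensitive sample)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one resistant and one sensitive sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical weights, not values from the application.
weights = {"KPC": 2.0, "oprD (HIGH)": 1.5, "GES": 0.5}
print(sample_score(weights, {"KPC", "GES"}))
```

The same `roc_auc` function would be applied once to the training-set scores and once to the validation-set scores.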
Further, the important features related to imipenem or meropenem resistance obtained by the screening are as follows:
For imipenem, the important features are: KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and/or oprD (HIGH);
For meropenem, the important features are: KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), par (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), par (p.ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), par (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and/or OXA beta-lactamase Cluster-40.
Using the screened important characteristic genes related to the carbapenem resistance phenotype, the application constructs a data analysis method that identifies the target pathogen Pseudomonas aeruginosa and the resistance genes it carries directly by aligning mNGS sequencing reads, and predicts the carbapenem resistance phenotype. Specifically: for all P. aeruginosa genomes in the training set included in the earlier BGWAS, the target resistance genes are detected both by alignment of simulated NGS sequencing reads (reads-based) and by alignment of genome contig sequences (assembly-based), and the reads-based detection workflow is verified and optimized against the results of the assembly-based method so that reads-based genotyping is accurate. A Score is then calculated with the self-defined formula and used as the interpretation index for predicting carbapenem susceptibility; ROC analysis of the simulated-reads tests is used to determine the optimal cutoff threshold and, at the same time, to evaluate the accuracy and performance of the prediction model.
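The text does not say which criterion defines the optimal cutoff threshold. One common choice, used here purely as an assumption, is the cutoff that maximizes Youden's J (sensitivity + specificity - 1) over the observed Score values. A sample is called resistant when its Score is at or above the cutoff.

```python
def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one resistant and one sensitive sample")
    best = None
    for cutoff in sorted(set(scores)):
        # Resistant samples called resistant, sensitive samples called sensitive.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]
```

When several cutoffs tie on J, this sketch keeps the lowest one; a different tie-break may be preferable if false positives are more costly than false negatives.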
The present application also provides an electronic device including: a processor and a memory; the processor is connected to a memory, wherein the memory is for storing a computer program, and the processor is for invoking the computer program to perform the method as claimed in any of the preceding claims.
The present application also provides a computer storage medium storing a computer program comprising program instructions which, when executed by a processor, perform a method as claimed in any one of the preceding claims.
The application also provides the use of a reagent or module for detecting KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and/or oprD (HIGH) in the preparation of a kit for predicting the imipenem drug-sensitivity phenotype of Pseudomonas aeruginosa; preferably, in the preparation of a kit for predicting the imipenem drug-sensitivity phenotype of Pseudomonas aeruginosa by infection metagenome sequencing.
The application also provides the use of a reagent or module for detecting KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), par (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), par (p.ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), par (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and/or OXA beta-lactamase Cluster-40 in the preparation of a kit for predicting the meropenem drug-sensitivity phenotype of Pseudomonas aeruginosa;
Preferably, the use is in the preparation of a kit for predicting the meropenem drug-sensitivity phenotype of Pseudomonas aeruginosa by infection metagenome sequencing;
Further, the drug-sensitivity phenotype includes a drug-resistant phenotype and a drug-sensitive phenotype;
Preferably, the above genes are detected simultaneously, and if all detection results are negative, the strain can be presumed to be sensitive, that is, sensitivity detection is achieved; likewise, the above genes may be detected separately, and if any detection result is positive, the strain is presumed to be drug resistant, that is, resistance detection is achieved.
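The interpretation rule just stated (all negative means presumed sensitive, any positive means presumed resistant) can be written as a short function. The function name, the input layout and the returned labels are hypothetical.

```python
def interpret_panel(results):
    """results maps each panel feature to True (detected) or False (not detected)."""
    if not results:
        raise ValueError("no detection results supplied")
    # Any positive feature implies presumed resistance; all negative implies sensitivity.
    return "resistant" if any(results.values()) else "sensitive"


print(interpret_panel({"KPC beta-lactamase": False, "oprD (HIGH)": True}))
```

Note that a "sensitive" call is only meaningful when every feature in the panel was actually tested, which is why the text asks for the genes to be detected simultaneously in that case.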
Further, the detecting is performed at a nucleic acid level, the nucleic acid level being obtained by a sequencing technique, a nucleic acid amplification technique, a nucleic acid hybridization technique, an electrophoresis technique, a biological mass spectrometry technique, or a chromatography technique; preferably, the nucleic acid level acquisition method includes, but is not limited to, any of the following methods: gene sequencing, polymerase chain reaction, isothermal amplification, gene chip, probe hybridization, gel electrophoresis, northern blotting, and nucleic acid mass spectrometry.
Further, the sample to be tested is from one or more of tissue, cells, body fluid, serum, plasma, whole blood, urine, semen, saliva, hydrothorax, ascites, cerebrospinal fluid, stool, or synovial fluid.
The application also provides a kit for predicting the imipenem drug-sensitivity phenotype of Pseudomonas aeruginosa in an infection metagenome, which comprises reagents capable of detecting KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and/or oprD (HIGH).
The application also provides a kit for predicting the meropenem drug-sensitivity phenotype of Pseudomonas aeruginosa in an infection metagenome, comprising reagents capable of detecting KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), par (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), par (p.ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), par (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and/or OXA beta-lactamase Cluster-40.
Further, the drug-sensitivity phenotype includes a drug-resistant phenotype and a drug-sensitive phenotype;
Preferably, the above genes are detected simultaneously, and if all detection results are negative, the strain can be presumed to be sensitive, that is, sensitivity detection is achieved; likewise, the above genes may be detected separately, and if any detection result is positive, the strain is presumed to be drug resistant, that is, resistance detection is achieved.
Further, by way of example, the term "oprD (HIGH)" herein refers to mutations in the oprD gene of a type that strongly affects gene function, such as SNPs that cause premature termination of the CDS, InDels that cause a CDS frameshift, and SNPs that abolish the start codon; and oprD (p.Ser278Pro), for example, refers to a mutation in the oprD gene that changes the Ser at position 278 of the encoded protein to Pro.
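The merging of high-impact oprD mutations into the single oprD (HIGH) feature can be sketched as follows. The text does not name an annotation tool, so the sketch assumes that each variant has already been annotated with a Sequence Ontology effect term; the mapping of the three mutation types onto `stop_gained`, `frameshift_variant` and `start_lost`, and the `gene`/`effect` keys, are assumptions.

```python
# Assumption: Sequence Ontology effect terms stand in for the three HIGH types.
HIGH_EFFECTS = {
    "stop_gained",         # SNP that terminates the CDS prematurely
    "frameshift_variant",  # InDel that shifts the CDS reading frame
    "start_lost",          # SNP that abolishes the start codon
}


def oprd_high(variants):
    """Return 1 if any oprD variant is HIGH impact, else 0 (one merged feature)."""
    for variant in variants:
        if variant["gene"].lower() == "oprd" and variant["effect"] in HIGH_EFFECTS:
            return 1
    return 0
```

Collapsing many distinct loss-of-function variants into one binary column keeps the feature count low, which is consistent with the stated goal of limiting overfitting.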
The application also provides a method for predicting the drug-sensitive phenotype of pseudomonas aeruginosa imipenem and/or meropenem, which comprises the step of obtaining the important characteristic gene level in a subject sample;
preferably, the method comprises the steps of:
(i) Obtaining a level of the above-mentioned important feature in the sample of the subject;
(ii) Comparing the level of the important characteristic gene with a control sample; wherein a significant difference in the level of the above-mentioned important trait gene in the subject sample and the control sample is indicative of drug sensitivity of the subject;
Or, (ii) comparing with a set threshold absolute amount; wherein a sample level of the subject above a threshold absolute amount is indicative of drug sensitivity of the subject.
The application has the beneficial technical effects that:
1) The application extends the use of machine learning in research on Pseudomonas aeruginosa drug resistance. It performs association analysis mainly on the gain and loss features of non-core genes related to the resistance phenotype, identifies the important resistance genes that contribute most to the resistance phenotype, and calculates their weight coefficients, which facilitates translation into clinical resistance testing.
2) Using bioinformatics, and based on a large collection of Pseudomonas aeruginosa genomes and susceptibility test data for carbapenems (imipenem and meropenem), the application builds a machine learning model for antibiotic susceptibility prediction, mines and screens important genes or new features related to the resistance phenotype, and calculates their relative weight coefficients to quantify their influence on the resistance phenotype. This enables rapid and accurate detection of P. aeruginosa carbapenem resistance genes and prediction of the resistance phenotype without the constraints of traditional culture, and lays a foundation for the later development of a rapid detection kit (panel) for P. aeruginosa-carbapenem resistance. In addition, grading oprD mutation sites and merging the important mutations into a single HIGH level markedly improves the generalization ability of the machine learning model for predicting P. aeruginosa-carbapenem resistance.
3) Resistance genes are detected and annotated directly by aligning genomic contig sequences against the public CARD resistance database. This bypasses gene prediction and the subsequent detection of resistance genes by aligning the predicted CDS sequences, and so avoids bias that may be introduced during gene prediction.
4) The application determines important resistance genes or features with predictive significance, identifies Pseudomonas aeruginosa and the resistance genes it carries by performing metagenomic sequencing (mNGS) directly on clinical specimens, and predicts the susceptibility results for carbapenems (imipenem and meropenem) directly from the detection of those genes or features. The prediction is highly accurate: verification on sampled clinical specimens shows that the overall accuracy (PPV) of carbapenem susceptibility prediction can reach more than 89%, and the proportion of samples for which a clear susceptibility prediction can be given exceeds 60%, which is difficult to achieve in practice and exceeds conventional expectations.
Drawings
FIG. 1, drug resistance mechanisms of Pseudomonas aeruginosa;
FIG. 2, technical route for constructing the Pseudomonas aeruginosa drug resistance model;
FIG. 3, bubble chart of the degree of influence of different drug resistance genes;
FIG. 4, heat map of the detection distribution of oprD (HIGH) mutations for imipenem;
FIG. 5, heat map of the detection distribution of oprD (HIGH) mutations for meropenem;
FIG. 6, model CV error rate and AUC curves for different numbers of characteristic genes (P. aeruginosa-imipenem);
FIG. 7, characteristic genes of the P. aeruginosa-imipenem resistance phenotype;
FIG. 8, characteristic genes of the P. aeruginosa-meropenem resistance phenotype;
FIG. 9, ROC curves verifying the reliability of the P. aeruginosa-imipenem characteristic genes;
FIG. 10, ROC curves verifying the reliability of the P. aeruginosa-meropenem characteristic genes;
FIG. 11, performance (AUC values) of the imipenem and meropenem resistance prediction models at different simulated sequencing data volumes;
FIG. 12, performance (AUC values) of the imipenem and meropenem prediction models on the training and validation sets at a simulated 15X genome data volume.
Detailed Description
Embodiments of the present application will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples only illustrate the present application and should not be construed as limiting its scope. Where specific conditions are not noted in the examples, the conventional conditions or the conditions recommended by the manufacturer are followed. Reagents or apparatus for which no manufacturer is indicated are conventional, commercially available products.
Partial term definition
Unless defined otherwise hereinafter, all technical and scientific terms used in the detailed description of the application are intended to be identical to what is commonly understood by one of ordinary skill in the art. While the following terms are believed to be well understood by those skilled in the art, the following definitions are set forth to better explain the present application.
As used herein, the terms "comprising," "including," "having," "containing," or "involving" are inclusive or open-ended and do not exclude additional unrecited elements or method steps. The term "consisting of …" is considered to be a preferred embodiment of the term "comprising". If a certain group is defined below to contain at least a certain number of embodiments, this should also be understood to disclose a group that preferably consists of only these embodiments.
The indefinite or definite article "a" or "an" when used in reference to a singular noun includes a plural of that noun.
The term "about" in the present application means a range of accuracy that one skilled in the art can understand while still guaranteeing the technical effect of the features in question. The term generally means a deviation of + -10%, preferably + -5%, from the indicated value.
Furthermore, the terms first, second, third, (a), (b), (c), and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein.
The application is illustrated below in connection with specific embodiments.
EXAMPLE 1 Screening of drug resistance features for the carbapenem drug-sensitivity phenotype of Pseudomonas aeruginosa
FIG. 2 is a general technical scheme of the screening of the present application, and the detailed steps are described as follows:
Step 1, searching for and downloading Pseudomonas aeruginosa genomes and the corresponding antibiotic susceptibility test result data from public databases.
Download from the NCBI NDARO database: open the website https://www.ncbi.nlm.nih.gov/pathogens/isolates, enter "Pseudomonas aeruginosa" in the search bar to retrieve Pseudomonas aeruginosa isolate information, then click "Choose columns" in the Matched Isolates sub-window and select "AST phenotypes" to display that information. Download the table data of the whole window, collect the Pseudomonas aeruginosa strains that have drug susceptibility test result data, and batch-download their genome sequences from the NCBI genome database (ftp://ftp.ncbi.nlm.nih.gov/genome) according to the Assembly ID information.
Download from the PATRIC database: open the website https://patricbrc.org, click the BACTERIA button in the BROWSE column of the search window, and first select AMR Phenotypes. Filter by entering "Pseudomonas aeruginosa" in the KEYWORDS field, and remove entries whose Evidence column is Computational Method so that only Laboratory Method entries are retained. This yields the drug susceptibility information for Pseudomonas aeruginosa strains; download the data table. Then select Genome, add the Assembly Accession column, and download that data table. Using the Genome ID correspondence between the two downloaded tables, find the PATRIC Genome ID or Assembly ID of each strain with drug susceptibility test result data, and batch-download the genome sequences from the PATRIC or NCBI genome database (ftp://ftp.ncbi.nlm.nih.gov/genome).
The genomes downloaded from the NCBI Pathogen Detection and PATRIC databases were merged and filtered to remove redundant genomes, finally yielding a total of 2275 Pseudomonas aeruginosa genomes with their drug susceptibility test result data. The strains were then randomly divided into two subsets, used as the model training set and validation set respectively. The numbers of strains for imipenem and meropenem are as follows:
Step 2, performing alignment against the CARD drug resistance database and drug resistance gene (ARG) detection and annotation based on the downloaded Pseudomonas aeruginosa genome contig sequences. RGI (v5.2.1) is used to detect drug resistance genes. Hits with identity below 80% or reference gene coverage below 80% are filtered out, and the best hit (first hit) for each region on each contig is taken as the final alignment result for that contig region. Annotation information for the drug resistance genes is added, the drug resistance genes detected in each strain are tallied, and the results are finally assembled into a 0-1 matrix table, in which 0 indicates that the drug resistance gene was not detected and 1 indicates that it was detected.
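The filtering and tabulation in step 2 can be sketched as follows. This is a minimal illustration only, not the applicant's pipeline: the record fields (`strain`, `contig`, `start`, `end`, `gene`, `identity`, `coverage`, `score`) are hypothetical and do not reproduce RGI's actual output columns, and "best hit" is assumed here to mean the highest-scoring hit among overlapping hits on the same contig.

```python
def filter_hits(hits, min_identity=80.0, min_coverage=80.0):
    """Drop hits below the identity or reference-coverage cutoffs (both in percent)."""
    return [h for h in hits
            if h["identity"] >= min_identity and h["coverage"] >= min_coverage]


def best_hit_per_region(hits):
    """Keep one hit per overlapping contig region, preferring the highest score."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        clashes = any(
            k["strain"] == hit["strain"] and k["contig"] == hit["contig"]
            and hit["start"] <= k["end"] and k["start"] <= hit["end"]
            for k in kept
        )
        if not clashes:
            kept.append(hit)
    return kept


def presence_matrix(hits, strains):
    """Return (sorted gene names, {strain: [0/1 per gene]})."""
    genes = sorted({h["gene"] for h in hits})
    detected = {(h["strain"], h["gene"]) for h in hits}
    return genes, {s: [int((s, g) in detected) for g in genes] for s in strains}


# Made-up example records (assumption), not data from the application.
example_hits = [
    {"strain": "s1", "contig": "c1", "start": 100, "end": 900,
     "gene": "KPC-2", "identity": 99.0, "coverage": 100.0, "score": 500},
    {"strain": "s1", "contig": "c1", "start": 150, "end": 850,
     "gene": "KPC-3", "identity": 98.0, "coverage": 100.0, "score": 480},
    {"strain": "s1", "contig": "c2", "start": 10, "end": 500,
     "gene": "OXA-50", "identity": 75.0, "coverage": 95.0, "score": 300},
    {"strain": "s2", "contig": "c1", "start": 1, "end": 700,
     "gene": "VIM-2", "identity": 95.0, "coverage": 85.0, "score": 400},
]
genes, matrix = presence_matrix(
    best_hit_per_region(filter_hits(example_hits)), ["s1", "s2", "s3"]
)
```

In this toy input the OXA-50 hit fails the identity cutoff and the KPC-3 hit loses to the overlapping, higher-scoring KPC-2 hit, so only two gene columns remain; strain `s3` has no hits and gets an all-zero row.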
Step 3, the downloaded Pseudomonas aeruginosa genome contig sequences are broken into 100 bp sequences, and the sequence fragments are aligned to the Pseudomonas aeruginosa reference genome (GCF_000006765.1) using BWA (v0.7.17-r1198-dirty) to obtain initial alignment results in BAM format. Mutation sites are then detected with Pisces software (5.2.5.20), producing a VCF text file that describes the SNP, InDel and other mutation results. The VCF file is then input into SnpEff software (v5.0) to functionally annotate the detected mutation sites, and a list of genes related to carbapenem resistance is compiled from the reference literature, as shown in the following table:
Based on the variant impact levels (HIGH, MODERATE, LOW) assigned by SnpEff software, the overall PPV of each gene is calculated separately for each level. If the PPV of a level, especially the HIGH level, is high, the gene is judged to be highly relevant to drug resistance, and the subsequent modeling is carried out directly on the impact-level class as a whole. The PPV of each gene at each variant impact level is shown in FIG. 3.
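The per-level PPV calculation can be illustrated with a short sketch. It assumes that PPV for a feature means the fraction of strains carrying that feature whose tested phenotype is resistant ("R"); the data structures and strain names below are hypothetical.

```python
def ppv_by_feature(carriers, phenotype):
    """PPV per feature: resistant carriers divided by all carriers with a known phenotype.

    carriers:  {feature: set of strain IDs carrying it}
    phenotype: {strain ID: "R" or "S"}
    """
    result = {}
    for feature, strains in carriers.items():
        tested = [s for s in strains if s in phenotype]
        if tested:  # skip features with no phenotyped carrier
            resistant = sum(1 for s in tested if phenotype[s] == "R")
            result[feature] = resistant / len(tested)
    return result


# Made-up example (assumption): features are (gene, SnpEff impact level) pairs.
example_carriers = {
    ("oprD", "HIGH"): {"s1", "s2", "s3", "s4"},
    ("oprD", "MODERATE"): {"s3", "s5"},
}
example_phenotype = {"s1": "R", "s2": "R", "s3": "R", "s4": "R", "s5": "S", "s6": "S"}

ppv = ppv_by_feature(example_carriers, example_phenotype)
high_ppv_features = sorted(f for f, value in ppv.items() if value > 0.9)
```

With these toy inputs only the (oprD, HIGH) class passes a PPV > 0.9 cutoff, which mirrors the kind of selection described in the text without reproducing its actual numbers.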
To recall more samples carrying key mutation features and thereby further improve the sensitivity of the model, additional reference templates were added for mutation detection in genes with PPV > 0.9. Only the HIGH level of the oprD gene had a PPV greater than 0.9, so ATCC_27853, F23197, FRD1, LESB58, MTB-1, PA-VAP-4 and UCBPP-PA14 were added as templates for oprD mutation detection (see oprD PAO1+ in FIG. 3). In addition, mutation detection at the HIGH level further improves the generalization ability of the model: as shown in FIGS. 4-5, many mutation sites appear only in the training set and some appear only in the validation set. Samples carrying such sites can still be identified when detection is based on the HIGH level.
Step 4, using the training samples, association analysis between genotype and antibiotic resistance phenotype data is performed with a lasso regression model to screen for important characteristic genes related to the development of drug resistance. Imipenem is taken as an example; other drugs are handled similarly. Based on the gene detection matrix table obtained in step 2 and the three mutation impact levels of the oprD gene obtained in step 3, a sub-matrix table of imipenem-related drug resistance genes is selected according to the antibiotic class recorded for each drug resistance gene in the CARD database. Genes with low detection frequency (preferably, detection frequency less than 3) and low PPV (preferably, PPV less than 0.9) are filtered out, and the resulting table data are analyzed for association with the imipenem drug susceptibility results. The sub-matrix table data (denoted X) have the following format (the data are extensive; only part is shown):
The imipenem drug susceptibility result data (denoted Y) have the following format:
Sample AST
11 R
1 R
23 R
28 R
287.1477 R
287.2972 R
... ...
The two data sets (X and Y) are taken as input, and the glmnet package of the R language is used to perform the association analysis between genotype and drug resistance phenotype data, with 10-fold cross-validation, to screen for important characteristic genes related to imipenem resistance. Part of the program code is as follows:
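The R listing is not reproduced in this text. Purely as an illustrative stand-in, and not the applicant's code, the idea of L1-penalized (lasso) logistic regression on a 0-1 feature matrix can be sketched in plain Python with proximal gradient descent. Everything here is an assumption made for the sketch: the function names, the fixed penalty `lam` (glmnet would instead choose it by cross-validation), the learning rate and the toy data.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is what produces exact zeros."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of 0/1 feature rows, y: list of 0/1 labels (1 = resistant).
    The intercept is not penalized. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current parameters.
        residual = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        b -= lr * sum(residual) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(residual, X)) / n
            w[j] = soft_threshold(w[j] - lr * grad_j, lr * lam)
    return w, b


# Toy data (assumption): feature 0 tracks resistance, feature 1 is uninformative.
X = [[1, 1], [1, 0], [0, 1], [0, 0]] * 5
y = [1, 1, 0, 0] * 5
weights, intercept = fit_lasso_logistic(X, y, lam=0.1)
```

On this toy input the penalty drives the weight of the uninformative feature to exactly zero while the informative feature keeps a positive weight, which is the property that makes lasso usable for feature screening.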
After running the above program, the genes related to carbapenem resistance are obtained. Because the preliminary analysis often yields a large number of genes, the genes are ranked by importance and the top-ranked important genes are finally selected.
The important gene selection process is as follows:
Based on the ranked genes obtained from the preliminary analysis, gene combinations of different sizes are selected along a gradient. For each combination a lasso regression model is built according to the program code, the AUC value and cross-validation (CV) error rate of the model are obtained, the difference AUC - error is calculated, and the results are plotted. The final number of genes is the abscissa value at which the AUC - error value, after increasing gradually, first turns downward, or alternatively the abscissa value at which the AUC first reaches its maximum or the error first reaches its minimum (see FIG. 6).
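One possible reading of this selection rule is sketched below. It assumes that "first turns downward" means choosing the gene count just before the first drop in AUC - error; the function name and all numbers are made up for illustration and are not results from the application.

```python
def choose_gene_count(counts, aucs, errors):
    """Return the gene count just before AUC - error first decreases.

    If the difference never decreases, the largest tested count is returned.
    """
    diffs = [auc - err for auc, err in zip(aucs, errors)]
    for i in range(1, len(diffs)):
        if diffs[i] < diffs[i - 1]:
            return counts[i - 1]
    return counts[-1]


# Made-up gradient of gene counts with model AUC and CV error (assumption).
example_counts = [2, 4, 6, 8, 10, 12]
example_aucs = [0.80, 0.85, 0.88, 0.90, 0.899, 0.901]
example_errors = [0.20, 0.17, 0.15, 0.14, 0.145, 0.14]

chosen = choose_gene_count(example_counts, example_aucs, example_errors)
```

Here the difference rises up to 8 genes and dips at 10, so 8 is returned; the alternative criteria in the text (first AUC maximum or first error minimum) would need their own small variants of this function.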
Here, 14 important genetic variation features were finally determined for imipenem based on the machine learning model.
In summary, the finally screened important genes related to carbapenem resistance in Pseudomonas aeruginosa and their weight coefficients are shown in the following table:
It can be seen that, for imipenem, the screened important genes/gene features associated with the drug resistance phenotype include (see FIG. 7): KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134*), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and oprD (HIGH).
For meropenem, the screened important genes/gene features associated with the drug resistance phenotype include (see FIG. 8): KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser228Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and OXA beta-lactamase Cluster-40.
Step 5, ROC analysis is carried out to determine the performance of the classification model constructed based on the screened characteristic genes
The Score index is defined from the weight coefficients of the detected genes (where ARG_W denotes the weight coefficient value of the corresponding detected gene) and is used as the index for positive/negative interpretation. For imipenem, based on the important gene weight coefficient matrix obtained from the model screening above, the Score value of each sample is calculated from the drug resistance genes actually detected in that sample, and ROC curve analysis is then performed, giving an AUC value of 0.9056 for the training set (n=695) model. ROC analysis with the validation set (n=177) then gives an AUC value of 0.8869 (see FIG. 9). The high AUC values of both the training set and validation set models show that the screening method and the model of the present application are accurate and effective.
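The Score formula itself is not reproduced in this text. As a minimal sketch, assume that Score is simply the sum of the weight coefficients (ARG_W) of the features detected in a sample, and that AUC is computed in its rank-based form (the probability that a resistant sample scores higher than a susceptible one, with ties counted as one half). The weights, feature sets and labels below are invented for illustration.

```python
def score(detected_features, weights):
    """Assumed Score: sum of the weights of detected features that have a weight."""
    return sum(weights[f] for f in detected_features if f in weights)


def roc_auc(scores, labels):
    """Rank-based AUC: P(score of an 'R' sample > score of an 'S' sample), ties = 0.5."""
    positives = [s for s, label in zip(scores, labels) if label == "R"]
    negatives = [s for s, label in zip(scores, labels) if label == "S"]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical weights and samples (assumption), not values from the application.
example_weights = {"KPC beta-lactamase": 2.0, "oprD (HIGH)": 1.5, "ampD (p.Val10Gly)": 0.5}
example_samples = [
    (["KPC beta-lactamase", "oprD (HIGH)"], "R"),
    (["oprD (HIGH)"], "R"),
    (["ampD (p.Val10Gly)"], "S"),
    ([], "S"),
    (["ampD (p.Val10Gly)"], "R"),
]

sample_scores = [score(features, example_weights) for features, _ in example_samples]
sample_labels = [label for _, label in example_samples]
auc = roc_auc(sample_scores, sample_labels)
```

In this toy set one resistant sample ties with a susceptible one, so the AUC falls slightly below 1; a cutoff on Score would then be chosen from the ROC curve to report "resistant" or "sensitive".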
Similarly, AUC values for training set (n=1121) and validation set (n=282) were 0.924, 0.8878, respectively, for meropenem (fig. 10).
Therefore, with the model of the present application, once the screened important characteristic genes related to carbapenem resistance are detected, the drug susceptibility result for the corresponding antibiotic can be predicted by combining the detections with the gene weight coefficients obtained from the model.
For imipenem, the target drug resistance genes/gene features include: KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134*), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and oprD (HIGH). Taking into account gene weight, frequency of gene occurrence and all possible mechanisms of resistance development, drug resistance phenotype prediction can in practice be performed with this same set of features, which chiefly mediate the high-frequency occurrence of drug resistance.
For meropenem, the target genes/gene features include: KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser228Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and OXA beta-lactamase Cluster-40. Taking into account gene weight, frequency of gene occurrence and all possible mechanisms of resistance development, drug resistance phenotype prediction can in practice be performed with KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and OXA beta-lactamase Cluster-40.
Example 2 detection and identification of drug resistance genes and prediction of drug resistance phenotypes of Pseudomonas aeruginosa carbapenem drugs based on metagenomic sequencing technique
Step 1, based on the pseudomonas aeruginosa strain genome, screening a machine learning model to obtain important drug resistance genes related to carbapenem drugs, and calculating to obtain weight coefficients of the genes (see example 1).
Step 2, based on the 2275 Pseudomonas aeruginosa genomes of the training set, NGS sequencing reads are simulated, and carbapenem drug resistance genes are detected and corrected by a reads-based alignment method.
ART_Illumina software (version 2.5.8) was used to simulate 75 bp short reads (NGS sequencing platform) for testing and validating the reads-based drug resistance gene detection procedure. Different data-volume gradients were simulated (parameter settings: -ss NS50 -l 75 -f 5 -nf 0 -rs 1), such as 0.05X, 0.1X, 0.2X, 0.3X, 0.4X, 0.5X, 0.6X, 0.7X, 0.8X, 0.9X, 1X, 2X, 3X, 5X, 10X and 30X, and carbapenem drug resistance gene detection and screening were then performed; the detailed method is described in the applicant's earlier patent application CN202111680866.8. After the drug resistance gene detection result of each simulated specimen is obtained, the Score index value of the specimen is defined and calculated from the weight coefficients of the important genes and of the corresponding gene families, using the following formula:
where genevalatism_wi represents the weight coefficient of a drug resistance variation feature and genevaloly_wi represents the weight coefficient of the corresponding gene family; if only one type of feature is detected, for example a gene variation, the drug resistance model is driven mainly by that variation.
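As a rough aid to reading the data-volume gradient used in the simulation above, the number of single-end reads implied by a given fold coverage follows from coverage = reads × read length / genome length. The sketch below assumes a genome length of about 6.3 Mb for Pseudomonas aeruginosa, which is an approximate illustrative value and not a figure from the application; the function name is likewise hypothetical.

```python
import math


def reads_for_coverage(coverage, genome_length, read_length=75):
    """Number of single-end reads needed to reach the given fold coverage."""
    return math.ceil(coverage * genome_length / read_length)


# Assumed approximate genome length in bp; not a value stated in the application.
ASSUMED_GENOME_LENGTH = 6_300_000

# Fold-coverage gradient taken from the text (0.05X ... 30X).
coverage_gradient = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
                     1, 2, 3, 5, 10, 30]
reads_per_level = {
    c: reads_for_coverage(c, ASSUMED_GENOME_LENGTH) for c in coverage_gradient
}
```

Under these assumptions the lowest levels correspond to only a few thousand 75 bp reads per genome, which is why detection of individual resistance features can be unreliable at low data volume.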
For the simulation tests at different data volumes based on the training set strains, ROC curve analysis is performed using the drug resistance gene detection results, the actual drug susceptibility results of the training set strains and the sample Score index values. This gives the model performance (AUC values) for meropenem and imipenem at each data volume, which is plotted as the curve shown in FIG. 11. At 15X data volume, model performance has stabilized and Pseudomonas aeruginosa genome coverage is greater than 95%; the AUC values of the meropenem and imipenem models at this point are shown in FIG. 12. Finally, the reporting rules ("resistant" or "sensitive") and cutoff thresholds of Pseudomonas aeruginosa for the two antibiotics meropenem and imipenem are determined as follows; the detailed method is described in the applicant's earlier patent application CN202111680866.8.
Example 3 clinical samples meropenem and imipenem resistance Gene detection and validation
In this example, 105 clinical samples confirmed by clinical culture to contain Pseudomonas aeruginosa were collected. After nucleic acid extraction from all clinical samples, metagenomic second-generation sequencing libraries (insert length 200-400 bp) were constructed and sequenced on a second-generation platform (Illumina NextSeq CN, SE75), followed by bioinformatics analysis.
Using the established data analysis pipeline for predicting antibiotic resistance phenotypes from sequencing-read alignment, the pathogens and the drug resistance genes they carry were identified from the raw sequencing data, and the Score value was then calculated and the drug susceptibility result predicted.
Finally, the Pseudomonas aeruginosa and drug resistance gene identification results and the drug susceptibility prediction results of each specimen were obtained. The results for 43 clinical metagenomic samples are shown in the following table:
Note: ND, not detected; "-", not predicted.
The drug susceptibility prediction accuracy and the proportion of samples reported were obtained by statistical analysis and are shown in the following table:
the results show that the drug resistance characteristic gene combination determined by the application can effectively and accurately identify the pseudomonas aeruginosa and the drug resistance gene carried by the pseudomonas aeruginosa aiming at clinical samples, can effectively predict drug susceptibility results of imipenem and meropenem drugs, and can be used for assisting in clinically detecting and diagnosing the drug resistance pseudomonas aeruginosa.
Therefore, for the carbapenem drug imipenem, taking into account gene weight, gene family weight, frequency of gene occurrence and all possible mechanisms of resistance development, the genes KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134*), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and oprD (HIGH) can be detected simultaneously: if the detection result for any one or more of these drug resistance features is positive, the strain can be presumed to be resistant; if all detection results are negative, the strain can be presumed to be sensitive. Similarly, the genes may be detected separately, and if the detection result for any one or more of the drug resistance features is positive, the strain is presumed to be resistant.
For the carbapenem drug meropenem, taking into account gene weight, gene family weight, frequency of gene occurrence and all possible mechanisms of resistance development, the genes KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and OXA beta-lactamase Cluster-40 can be detected simultaneously: if the detection result for any one or more of these drug resistance features is positive, the strain can be presumed to be resistant; if all detection results are negative, the strain can be presumed to be sensitive.
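The qualitative rule stated in the two paragraphs above (any panel feature detected means presumed resistant, none detected means presumed sensitive) can be sketched as a simple set test. The panel below is abbreviated to a few imipenem entries for illustration, and the function and variable names are hypothetical.

```python
# Abbreviated illustrative panel (assumption): a few imipenem features from the text.
IMIPENEM_PANEL = frozenset({
    "KPC beta-lactamase",
    "VIM beta-lactamase",
    "oprD (p.Ser278Pro)",
    "IMP beta-lactamase",
    "oprD (HIGH)",
})


def presume_phenotype(detected_features, panel=IMIPENEM_PANEL):
    """Return 'R' if any panel feature is detected, otherwise 'S'."""
    return "R" if any(feature in panel for feature in detected_features) else "S"


call_resistant = presume_phenotype(["oprD (HIGH)", "some unrelated gene"])
call_sensitive = presume_phenotype(["some unrelated gene"])
call_empty = presume_phenotype([])
```

This any-positive rule is coarser than the weighted Score with a cutoff used in Examples 1 and 2; it trades specificity for simplicity, since a single low-weight feature is enough to trigger a resistant call.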
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. A method for screening drug resistance phenotype features of Pseudomonas aeruginosa for carbapenem drugs, characterized by comprising the following steps:
1) acquiring genome data of Pseudomonas aeruginosa strains from public databases and collecting the corresponding drug susceptibility test result data;
2) performing alignment against the CARD drug resistance database and drug resistance gene annotation based on the contig sequences of the Pseudomonas aeruginosa genomes;
3) performing mutation site detection and annotation based on the contig sequences of the Pseudomonas aeruginosa genomes to obtain a mutation site profile;
4) for carbapenem drugs, performing association analysis between the detected and annotated drug resistance genes and mutation site profile and the drug resistance phenotype data, screening to obtain important characteristic genes related to the development of drug resistance, and calculating their weight coefficients;
preferably, the method further comprises:
5) evaluating, by ROC analysis, the performance of the classification model constructed from the screened important characteristic genes.
2. The method according to claim 1, wherein,
in the step 1), the public database comprises an NCBI database and/or a PATRIC database;
in the step 2), the alignment and drug resistance gene annotation specifically comprise: aligning the contig sequences of the Pseudomonas aeruginosa genomes against the CARD drug resistance gene reference sequence library, filtering out hits with identity below 80% or reference gene coverage below 80%, selecting the best hit for each contig region as the final alignment result of that region, adding annotation information for the drug resistance genes, tallying the drug resistance genes detected in each strain, and summarizing the results into a matrix table.
3. The method according to claim 1, wherein,
in the step 3), the detection and annotation specifically includes: breaking the sequence of the pseudomonas aeruginosa genome contig into a fragmented sequence, comparing the fragmented sequence with a pseudomonas aeruginosa reference genome, and then detecting a mutation site to obtain a mutation site spectrum;
preferably, step 3) further includes: detecting and annotating mutation sites of the oprD gene, grading the degree of functional impact of the mutations as high, medium or low, and integrating the features with the highest impact level into a key feature; more preferably, the variants include SNP variants that cause premature termination of the CDS, InDels that cause a frameshift of the CDS, and SNP mutations that abolish the start codon.
4. The screening method according to claim 1, wherein in the step 4), the carbapenem drug is imipenem and/or meropenem; the association analysis specifically comprises: according to the antibiotic class recorded for each drug resistance gene in the CARD database, and based on the gene detection matrix table obtained in step 2), the mutation site profile obtained in step 3) and the oprD gene mutation levels obtained in step 3), selecting a sub-matrix table of imipenem- or meropenem-related drug resistance genes, filtering out genes with low detection frequency and low PPV (preferably, detection frequency less than 3 and PPV less than 0.9), and performing association analysis between the filtered table data and the imipenem and/or meropenem drug susceptibility results with a lasso regression model, so as to obtain the important features related to imipenem or meropenem resistance.
5. The screening method according to claim 1, wherein in step 5), the evaluation specifically comprises: defining a Score index as the index for positive/negative interpretation, wherein ARG_W represents the weight coefficient value of the corresponding detected gene; based on the important gene weight coefficient matrix obtained by model screening, calculating the Score value of each sample from the drug resistance genes actually detected in the sample, and performing ROC curve analysis to obtain the AUC value of the training set model; further performing ROC analysis with the validation set to obtain the AUC value of the validation set model; the higher the AUC values of the training set and validation set models, the better the performance of the method.
6. The method of any one of claims 1-5, wherein the screening yields the following important characteristic genes associated with the development of drug resistance:
for imipenem, the important characteristic genes are: KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser178Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134*), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and/or oprD (HIGH);
for meropenem, the important characteristic genes are: KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser228Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and/or OXA beta-lactamase Cluster-40.
7. Use of a reagent or module for detecting the important characteristic genes KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser179Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134*), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and/or oprD (HIGH) in the preparation of a kit for predicting the imipenem drug-sensitivity phenotype of Pseudomonas aeruginosa by metagenomic sequencing of infection; preferably, the drug-sensitivity phenotype includes a drug-resistant phenotype and a drug-sensitive phenotype.
8. Use of the important characteristic genes KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and/or OXA beta-lactamase Cluster-40 in the preparation of a kit for predicting the meropenem drug-sensitivity phenotype of Pseudomonas aeruginosa by sequencing; preferably, the drug-sensitivity phenotype includes a drug-resistant phenotype and a drug-sensitive phenotype.
9. A kit for predicting the imipenem or meropenem drug-sensitivity phenotype of Pseudomonas aeruginosa by metagenomic sequencing, characterized in that the kit comprises reagents capable of detecting KPC beta-lactamase, VIM beta-lactamase, oprD (p.Ser278Pro), ampR (p.Ser179Thr), GES beta-lactamase, mexT (p.Asp59Ala), parS (p.Leu137Pro), oprD (p.Val359Leu), PA2020 (p.Gln134*), OXA beta-lactamase Cluster-40, PER beta-lactamase, oprD (p.Leu229Phe), ampD (p.Val10Gly), IMP beta-lactamase, ftsI (p.Phe533Leu) and/or oprD (HIGH);
or the kit comprises reagents capable of detecting KPC beta-lactamase, parS (p.Ala13Thr), oprD (p.Ser278Pro), IMP beta-lactamase, oprD (p.Val359Leu), ftsI (p.Ala244Thr), nalD (p.Val151Leu), oprD (p.Gly316Asp), OXA beta-lactamase Cluster-121, PA3047 (p.Phe171Leu), ampD (p.Gly121Glu), ftsI (p.Arg504Cys), mpl (p.Val124Gly), PA2020 (p.Arg87Pro), parS (p.Ala13Val), PA2020 (p.Lys55Glu), mexR (p.Ala108fs), mexT (p.Gly191Arg), PA2020 (p.Gly137Asp), parS (p.Val152Ala), ftsI (p.Pro527Ser), mexR (p.Gly101Arg), parS (p.Ala1495r), ampD (p.Ala96Thr), ftsI (p.Phe533Leu), mexT (p.Ala143Thr), parS (p.Arg383Ser), oprD (HIGH), OXA beta-lactamase Cluster-52, VIM beta-lactamase and/or OXA beta-lactamase Cluster-40; preferably, the drug-sensitivity phenotype includes a drug-resistant phenotype and a drug-sensitive phenotype.
10. A method for predicting the drug-sensitive phenotype of pseudomonas aeruginosa imipenem and/or meropenem by metagenomic sequencing of infection, comprising the step of obtaining the level of the important characteristic gene in a sample of a subject;
preferably, the method specifically comprises the following steps:
(i) Obtaining the level of the above-mentioned important characteristic gene in the sample of the subject;
(ii) Comparing the level of the important characteristic gene with a control sample; wherein a significant difference in the level of the above-mentioned important trait gene in the subject sample and the control sample is indicative of drug sensitivity of the subject;
or, comparing with a set threshold absolute amount; wherein a sample level of the subject above a threshold absolute amount is indicative of drug sensitivity of the subject.
CN202311184864.9A 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application Active CN117219157B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202311184864.9A CN117219157B (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application
CN202410204077.4A CN118006813A (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drug

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311184864.9A CN117219157B (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202410204077.4A Division CN118006813A (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drug

Publications (2)

Publication Number Publication Date
CN117219157A true CN117219157A (en) 2023-12-12
CN117219157B CN117219157B (en) 2024-04-09

Family

ID=89038308

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202410204077.4A Pending CN118006813A (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drug
CN202311184864.9A Active CN117219157B (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202410204077.4A Pending CN118006813A (en) 2023-09-14 2023-09-14 Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drug

Country Status (1)

Country Link
CN (2) CN118006813A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180265913A1 (en) * 2015-07-22 2018-09-20 Ares Genetics Gmbh Genetic testing for predicting resistance of pseudomonas species against antimicrobial agents
CN114067912A (en) * 2021-11-23 2022-02-18 天津金匙医学科技有限公司 Method for screening important characteristic genes related to drug-resistant phenotype of bacteria based on machine learning
CN114606331A (en) * 2021-11-23 2022-06-10 天津金匙医学科技有限公司 Application of non-core type drug resistance gene in prediction of drug sensitivity of Klebsiella pneumoniae
CN114354936A (en) * 2022-01-12 2022-04-15 上海交通大学医学院附属第九人民医院 Method for screening cetuximab drug resistance biomarkers, biomarkers screened by method and application of biomarkers
CN115305292A (en) * 2022-09-19 2022-11-08 北京金匙医学检验实验室有限公司 Characteristic gene combination, kit and sequencing method for predicting antibiotic drug sensitive phenotype of staphylococcus aureus
CN116665771A (en) * 2023-06-01 2023-08-29 福建和瑞基因科技有限公司 Predictive model for simultaneously detecting multiple tumors and carrying out tissue tracing and training method and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
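徐荣; 谢; 张聪玲; 谭为: "Analysis of the correlation between β-lactamase production and drug resistance in Pseudomonas aeruginosa", 实验与检验医学 (Experimental and Laboratory Medicine), no. 04, 15 August 2016 (2016-08-15) *
阮尉月清; 刘家法; 张米; 李健健; 杨壁珲; 邓雪媚; 董兴齐: "Genotypic drug resistance analysis of HIV/AIDS cases with antiviral treatment failure among men who have sex with men in Yunnan Province", 预防医学 (Preventive Medicine), no. 10, 29 September 2020 (2020-09-29) *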

Also Published As

Publication number Publication date
CN118006813A (en) 2024-05-10
CN117219157B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
Břinda et al. Rapid inference of antibiotic resistance and susceptibility by genomic neighbour typing
Steele et al. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease
Sangiovanni et al. From trash to treasure: detecting unexpected contamination in unmapped NGS data
CN112530519B (en) Method and system for detecting microorganisms and drug resistance genes in sample
CN114067912B (en) Method for screening important characteristic genes related to drug-resistant phenotype of bacteria based on machine learning
Spinali et al. Microbial typing by matrix-assisted laser desorption ionization–time of flight mass spectrometry: do we need guidance for data interpretation?
CN113160882B (en) Pathogenic microorganism metagenome detection method based on third generation sequencing
CN108319813A (en) Circulating tumor DNA copies the detection method and device of number variation
CN114333987B (en) Data analysis method for predicting drug resistance phenotype based on metagenomic sequencing
CN110577998A (en) Construction of molecular model for predicting postoperative early recurrence risk of liver cancer and application evaluation thereof
Shi et al. Role of MIRU-VNTR and spoligotyping in assessing the genetic diversity of Mycobacterium tuberculosis in Henan Province, China
US20200294628A1 (en) Creation or use of anchor-based data structures for sample-derived characteristic determination
US20230141128A1 (en) Molecular technology for predicting a phenotypic trait of a bacterium from its genome
CN105986013A (en) Method and device for determining microbial species
CN115064215B (en) Method for tracing strains and identifying attributes through similarity
Hoffman et al. Species-level resolution of female bladder microbiota from 16S rRNA amplicon sequencing
Dettman et al. Phylogenomic analyses of Alternaria section Alternaria: A high-resolution, genome-wide study of lineage sorting and gene tree discordance
Mariner-Llicer et al. Accuracy of an amplicon-sequencing nanopore approach to identify variants in tuberculosis drug-resistance-associated genes
CN114038501B (en) Background bacterium judgment method based on machine learning
Zhang et al. MaLAdapt reveals novel targets of adaptive introgression from Neanderthals and Denisovans in worldwide human populations
CN114388062A (en) Method, equipment and application for predicting antibiotic resistance phenotype based on machine learning
Wang et al. Large-scale samples based rapid detection of ciprofloxacin resistance in Klebsiella pneumoniae using machine learning methods
CN117219157B (en) Characteristic gene for predicting drug sensitivity phenotype of pseudomonas aeruginosa carbapenem drugs, kit and application
Jia et al. Combining comparative genomic analysis with machine learning reveals some promising diagnostic markers to identify five common pathogenic non‐tuberculous mycobacteria
Owen Bacterial taxonomics: finding the wood through the phylogenetic trees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 110, 1st Floor, Building 3, No. 2 East Binhe Road, Youanmenwai, Fengtai District, Beijing, 100050
Patentee after: Beijing Jinshao Medical Laboratory Co.,Ltd.
Country or region after: China
Patentee after: Tianjin JinKe Medical Technology Co.,Ltd.
Patentee after: Tianjin Huazhinuo Technology Co.,Ltd.
Address before: Room 110, 1st Floor, Building 3, No. 2 East Binhe Road, Youanmenwai, Fengtai District, Beijing, 100050
Patentee before: Beijing Jinshao Medical Laboratory Co.,Ltd.
Country or region before: China
Patentee before: Tianjin JinKe Medical Technology Co.,Ltd.
Patentee before: Jinshi Zhizao (Tianjin) Medical Technology Co.,Ltd.