CN117070620A

CN117070620A - Marker and method for in vitro identification of central precocious puberty in girls

Info

Publication number: CN117070620A
Application number: CN202310948982.6A
Authority: CN
Inventors: 沈昊; 周莎莎; 沈益行; 李华; 沈忆芬; 刘超; 郑江南; 杨涛
Original assignee: Suzhou Ninth People's Hospital Suzhou Wujiang District First People's Hospital; Suzhou Jingmai Biotechnology Co ltd
Current assignee: Suzhou Ninth People's Hospital Suzhou Wujiang District First People's Hospital; Suzhou Jingmai Biotechnology Co ltd
Priority date: 2023-07-31
Filing date: 2023-07-31
Publication date: 2023-11-17

Abstract

The application provides a detection marker and a method for in vitro identification of central precocious puberty in girls. The expression levels of the miRNA markers hsa-miR-584-5p, hsa-miR-625-3p and hsa-miR-7-5p in the serum of girls with central precocious puberty were found to be markedly altered, so these miRNAs can serve as molecular biomarkers for detecting central precocious puberty in girls. In the method, circulating miRNA in a blood sample from a subject is detected by second-generation sequencing; the expression level of each miRNA in the test sample is obtained by data analysis; a feature subset is obtained by random forest modeling based on the expression levels of the miRNA markers in the test sample; and the accuracy of each miRNA combination is evaluated by calculating its AUC value, so as to determine whether the subject has central precocious puberty.

Description

Marker and method for in vitro identification of central precocious puberty in girls

Technical Field

The application relates to the technical field of biology, and in particular to a detection marker and a method for in vitro identification of central precocious puberty in girls.

Background

Central precocious puberty (CPP), also known as true or complete precocious puberty, results from premature activation and maturation of the hypothalamic-pituitary-gonadal (HPG) axis. In China, the incidence of precocious puberty in children is rising year by year while the age of onset is falling; the incidence in girls is 10 times that in boys, and precocious puberty is currently the second most common pediatric endocrine disease after obesity. In this application, central precocious puberty refers to girls clinically diagnosed by a GnRH stimulation test, with the diagnostic criteria following those compiled by the endocrinology, genetics and metabolism group of the Chinese Medical Association. The greatest harm to affected children is premature epiphyseal closure, which causes short stature, in some cases with a final adult height below 150 cm, and a markedly increased risk of breast cancer, endometrial cancer, obesity, type 2 diabetes mellitus, cardiovascular disease and other conditions in adulthood.

At present, central precocious puberty cannot be reliably diagnosed from growth rate, height, body mass index and the like, or from bone age assessment, hypothalamic-pituitary magnetic resonance imaging, uterine and ovarian ultrasound, endocrine hormone tests and the like. The clinically accepted diagnostic standard for central precocious puberty is the gonadotropin-releasing hormone (GnRH) stimulation test. In this test, gonadorelin is administered intravenously and venous blood is drawn before the injection and at 30, 60 and 90 minutes after it; central precocious puberty is diagnosed, for example, when the peak LH after gonadorelin injection is >5.0 IU/L and the peak LH/FSH ratio is >0.6. Some patients develop systemic or local allergic reactions to gonadorelin, so the GnRH stimulation test is unsuitable for outpatient use and requires hospitalization. The test also requires repeated venous blood draws, which patients often resist. These factors limit the use of the GnRH stimulation test in the differential diagnosis of central precocious puberty.

MicroRNAs (miRNAs) are a class of short non-coding RNAs about 19-25 nucleotides in length. They degrade target gene mRNA or inhibit its translation through complete or incomplete pairing with the 3' UTR of the target mRNA. Previous studies have shown that miRNAs are involved in a variety of regulatory pathways, including development, viral defense, hematopoiesis, organogenesis, and cell proliferation and death. In recent years, a large body of research has shown that changes in miRNA abundance are closely related to the occurrence and development of various diseases. Circulating miRNAs in blood can show markedly different expression profiles depending on an individual's physiological and pathological state, and can therefore be used to distinguish normal from disease states. Circulating miRNAs have already been explored as aids to differential diagnosis in cancer, neurodegenerative diseases, cardiovascular and cerebrovascular diseases, aging and other areas, but not yet in central precocious puberty.

Therefore, it is necessary to develop clinically useful in vitro detection markers for central precocious puberty, together with corresponding detection methods and reagents, so that central precocious puberty can be detected quickly and conveniently and early clinical intervention is made easier.

Disclosure of Invention

The application aims to provide a detection marker and a method for in vitro identification of central precocious puberty in girls that are patient-friendly, require simpler sampling than the GnRH stimulation test, and are more accurate than other clinical detection means.

In a first aspect of the present application, there is provided a detection marker for in vitro identification of central precocious puberty in girls, said marker being a free miRNA marker from human serum, wherein said miRNA marker comprises one or more of hsa-miR-584-5p, hsa-miR-625-3p and hsa-miR-7-5p.

Further, the miRNA marker is mature miRNA in serum.

Furthermore, the expression level of the miRNA marker in blood is a relative expression level. The miRNA marker differs with statistical significance between girls with central precocious puberty and healthy girls, and can distinguish girls with central precocious puberty both from healthy girls and from girls with isolated premature breast development.

Furthermore, free miRNA samples from the serum of girls with central precocious puberty and of normal healthy girls form a training set. The samples are subjected to second-generation sequencing and data analysis, and, starting from the miRNAs that are significantly differentially expressed between the two groups of serum samples, the statistically significant miRNAs are screened out as markers using recursive feature elimination with cross-validation (RFECV).

The application also provides a method for in vitro identification of the detection marker for central precocious puberty in girls, characterized by comprising the following steps:

(a) Forming a training set from free miRNA samples from the serum of girls with central precocious puberty and of normal healthy girls; after library preparation, second-generation sequencing and data analysis of the samples, determining the expression level (RPM) of each miRNA in the samples by comparing read positions with the positions of the miRNAs in a human reference genome;

(b) Taking the miRNA expression levels (RPM) as independent variables, performing feature screening with a random forest, and selecting the feature subset with the greatest predictive power for the model;

(c) Enumerating combinations of the selected features, plotting the ROC curve and calculating the AUC value for each combination, and taking the combination with the highest AUC as the marker.

Further, in step (a), obtaining the expression level (RPM) of each miRNA in the sample specifically comprises the following steps:

(a1) Obtaining the off-machine (raw sequencer output) data after library preparation and second-generation sequencing of the sample, and performing data quality control and preprocessing on the off-machine data with quality control tools to obtain effective data from which low-quality sequences and sequencing adapters have been removed;

(a2) Aligning the sequences of the effective data to a human reference genome sequence and locating them relative to the miRNA positions in that sequence, wherein the miRNA position information is taken from the miRBase database, and when the 5' end of a sequence coincides with the 5' end position of a miRNA, the sequence is counted as a sequencing read of that miRNA;

(a3) Determining the expression level (RPM) of each miRNA in the test sample, wherein RPM (reads per million) is the unit of expression level, and the RPM of a miRNA is the number of sequencing reads assigned to that miRNA, expressed in parts per million of all sequencing reads of the sample that can be aligned to the human reference genome.
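The RPM definition in step (a3) can be illustrated with a minimal Python sketch. The function name `reads_per_million`, the input layout (one label per genome-aligned read, `None` for reads that match no miRNA) and the toy counts are illustrative assumptions, not part of the application.

```python
from collections import Counter


def reads_per_million(mirna_of_read):
    """Return {miRNA name: RPM}.

    `mirna_of_read` holds one entry per read that aligned to the reference
    genome: the name of the miRNA the read was assigned to, or None when the
    read aligned but matched no miRNA. The denominator is therefore the total
    number of genome-aligned reads, as in step (a3).
    """
    labels = list(mirna_of_read)
    total_aligned = len(labels)
    if total_aligned == 0:
        return {}
    counts = Counter(name for name in labels if name is not None)
    return {name: n * 1_000_000 / total_aligned for name, n in counts.items()}


# Toy input (hypothetical counts): 10 aligned reads, 4 assigned to miRNAs.
toy_reads = ["hsa-miR-7-5p"] * 3 + ["hsa-miR-625-3p"] + [None] * 6
rpm = reads_per_million(toy_reads)
print(rpm)
```

With real data the labels would come from the 5'-end matching described in step (a2).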

Further, in step (b), selecting the feature subset with the greatest predictive power for the model specifically comprises the following steps:

(b1) Performing Min-Max normalization on the RPM value of each miRNA: first calculating the minimum value X_min and the maximum value X_max of each feature, and then, for each feature j and sample i, calculating the scaled value X_scaled using formula S1:

X_scaled = (X - X_min) / (X_max - X_min)    (S1)

where X is the RPM value of feature j in sample i, and X_min and X_max are the minimum and maximum of feature j across the samples;

(b2) Labeling the two classes of samples, i.e., {"central precocity": 0, "NC": 1}, and training on the central precocious puberty and normal healthy serum samples;

(b3) Converging step by step with the RFECV algorithm, retaining at each step the feature subset with the higher accuracy, so that cyclic iteration finally converges on a small number of miRNAs. The finally retained miRNAs must satisfy two conditions: (1) the miRNA combination achieves the highest accuracy; (2) the number of miRNAs is small enough that enumerating their combinations is computationally feasible.

In some embodiments, the convergence speed is controlled by modifying the min_features_to_select parameter so as to compare the results obtained with different numbers of features; the feature subset with the highest accuracy is selected, which determines the number of feature variables.
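The Min-Max scaling of step (b1) and the label encoding of step (b2) can be sketched as below. This is a minimal sketch assuming the standard Min-Max definition applied per feature (column); the helper name `min_max_scale`, the handling of constant features and the toy RPM matrix are illustrative assumptions.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column (one miRNA feature) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)          # per-feature minimum
    x_max = X.max(axis=0)          # per-feature maximum
    span = x_max - x_min
    span[span == 0] = 1.0          # assumption: a constant feature maps to 0
    return (X - x_min) / span


# Label encoding as given in step (b2).
LABELS = {"central precocity": 0, "NC": 1}

# Toy RPM matrix (hypothetical values): 3 samples x 2 miRNAs.
rpm = np.array([[10.0, 200.0],
                [30.0, 200.0],
                [20.0, 600.0]])
groups = ["central precocity", "NC", "NC"]

X_scaled = min_max_scale(rpm)
y = np.array([LABELS[g] for g in groups])
print(X_scaled)
print(y)
```

Scaling each miRNA separately keeps highly abundant miRNAs from dominating features with small RPM values.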

Further, in step (c), obtaining the combination with the highest AUC specifically comprises the following steps:

(c1) Enumerating combinations of the features obtained in step (b3), plotting the ROC curve for each combination, and calculating its AUC value;

(c2) The miRNA combination with the maximum AUC value is taken as a marker.
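To make the AUC criterion in steps (c1) and (c2) concrete, the sketch below computes the area under the ROC curve directly from classifier scores, using the equivalent rank formulation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). The function name, the choice of label 1 as the positive class and the toy scores are assumptions for illustration only.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Label 1 is treated as the positive class (an assumption of this sketch).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs
# are ordered correctly.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_from_scores(y_true, scores)
print(auc)
```

An AUC of 1.0 means every positive sample outranks every negative one; 0.5 corresponds to random ordering.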

Compared with the prior art, the application has the beneficial effects that:

(1) Peripheral blood samples are easy to obtain, practical to collect in the clinic and minimally invasive, which makes the test more acceptable to subjects and gives it broad application prospects. Serum miRNA is comparatively stable and abundant, so extraction, library construction and sequencing are relatively easy and require only conventional experimental techniques and readily available reagents;

(2) Compared with existing means of detecting central precocious puberty, the miRNA marker disclosed by the application identifies central precocious puberty with higher accuracy, and the experimental cost of second-generation sequencing is within an acceptable range;

(3) Using the miRNA marker and its expression level (RPM) in the test sample, the application can determine by a simple calculation whether the individual from whom the test sample was taken has central precocious puberty. The data analysis method is not complex and can be mastered quickly by ordinary technicians.

Drawings

The foregoing and other features of the present disclosure will be more fully described when considered in conjunction with the following drawings. It is appreciated that these drawings depict only several embodiments of the present disclosure and are therefore not to be considered limiting of its scope. The present disclosure will be described more specifically and in detail by using the accompanying drawings.

FIG. 1 shows the ROC curves of the top miRNA combinations ranked by AUC: the three-miRNA combination of hsa-miR-584-5p, hsa-miR-625-3p and hsa-miR-7-5p has the highest AUC value, 0.9885; the four-miRNA combination of hsa-miR-625-3p, hsa-miR-874-3p, hsa-miR-181c-3p and hsa-miR-1290 has the next highest, 0.9722; and the three-miRNA combination of hsa-miR-584-5p, hsa-miR-652-3p and hsa-miR-181c-3p has an AUC value of 0.9685.

Detailed Description

The following examples are described to aid in the understanding of the application; they are not intended to, and should not be construed to, limit the scope of the application in any way.

Experimental procedures for which no specific conditions are given in the examples below follow conventional experimental conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the manufacturer's recommendations. Percentages and parts are by weight unless otherwise indicated. Unless otherwise specified, the materials used in the examples are all commercially available products.

Example 1: obtaining training set samples

From July 2022 to February 2023, the applicant collected peripheral venous blood from 28 girls with central precocious puberty, 10 mL of peripheral blood per case; the average age was 8.6 years, with an age distribution of 8.1-9.1 years. Meanwhile, the applicant collected peripheral venous blood from 23 normal healthy girls (namely healthy controls without any disease, the same applies hereinafter), 10 mL of peripheral blood per sample; the average age was 8.2 years, with an age distribution of 7.8-8.6 years. The two groups of samples were used as training set samples. The ages of the two groups did not differ with statistical significance and all subjects were female, so the principle of sex and age matching was satisfied. For each peripheral blood sample, sequencing library preparation and second-generation sequencing were performed to obtain off-machine data.

Example 2: obtaining sample miRNA expression profiles by sequencing libraries

Each training set sample was subjected to library preparation and second generation sequencing using the following reagents and procedures:

(1) Collecting 10 mL of peripheral blood in a dry blood collection tube, letting it stand at 4 °C for more than half an hour, centrifuging at 400 g and 4 °C for 10 minutes, taking the supernatant as the serum sample, and storing it in a refrigerator at -80 °C;

(2) Extracting 50-200 ng of serum free RNA from the above serum sample using the Qiagen miRNeasy Serum/Plasma Kit (cat. no. 217184), diluting it to a total volume of 4 μL with ultrapure water (DNase- and RNase-free, the same applies hereinafter), and placing it in a 200 μL thin-walled PCR tube;

(3) Adding 1 μL of adapter RA3 (10 μM) to the solution obtained in step (2), mixing, reacting at 70 °C for 2 minutes, and immediately cooling on ice, wherein the sequence of RA3 is 5'-TGGAATTCTCGGGTGCCAAGG-3';

(4) Adding 2 μL of HML (Ligation Buffer) (Illumina, cat. no. 15013206), 1 μL of RNase Inhibitor (Illumina, cat. no. 15003548) and 1 μL of T4 RNA Ligase 2, Deletion Mutant (Epicentre, cat. no. LR2D11310K) to the solution obtained in step (3), mixing, and incubating at 28 °C for 1 hour;

(5) Adding 1 μL of STP (Stop Solution) (Illumina, cat. no. 15016304) to the solution obtained in step (4), mixing, and incubating at 28 °C for 15 minutes;

(6) Taking a new PCR tube, adding 1.1 μL of adapter RA5 (10 μM; base sequence 5'-GUUCAGAGUUCUACAGUCCGACGAUC-3'), incubating at 70 °C for 2 minutes, and immediately cooling on ice after the reaction;

(7) Adding 1.1 μL of 10 mM ATP (Illumina, cat. no. 15007432) and 1.1 μL of T4 RNA Ligase (Illumina, cat. no. 1000587) to the solution obtained in step (6), and mixing;

(8) Taking 3 μL of the solution obtained in step (7), adding it to the solution obtained in step (5), mixing, and reacting at 28 °C for 1 hour;

(9) Adding 1 μL of RNA RT Primer (10 μM) to the solution obtained in step (8), mixing, reacting at 70 °C for 2 minutes and then immediately cooling on ice, in preparation for the reverse transcription reaction that produces the first cDNA strand, wherein the sequence of the reverse transcription primer RT Primer is 5'-CCTTGGCACCCGAGAATTCCA-3';

(10) Adding 2 μL of 5X First Strand Buffer (Thermo, cat. no. 1889832), 0.5 μL of dNTP Mix (12.5 mM, Illumina, cat. no. 11318102), 1 μL of 100 mM DTT (Thermo, cat. no. 1850670), 1 μL of RNase Inhibitor and 1 μL of SuperScript II Reverse Transcriptase (Thermo, cat. no. 2008170) to the solution obtained in step (9), and incubating at 50 °C for 1 hour;

(11) Adding 25 μL of PML (PCR Mix) (Illumina, cat. no. 15022681), 2 μL of Primer1 (10 μM) and 2 μL of Primer2 (10 μM) to the solution obtained in step (10), mixing, and performing PCR: pre-denaturation at 98 °C for 30 s; 18 cycles of denaturation at 98 °C for 10 s, annealing at 60 °C for 30 s and extension at 72 °C for 15 s; final extension at 72 °C for 10 min; and holding at 4 °C; wherein the sequence of Primer1 is 5'-CAAGCAGAAGACGGCATACGAGATGTCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA-3', the sequence of Primer2 is 5'-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA-3', and the 8 bases "GTCGTGAT" in Primer1 are the index sequence;

(12) Subjecting the PCR product obtained in step (11) to 6% polyacrylamide gel electrophoresis at 120 V for 1 h, staining with GelRed for 5 minutes, observing and photographing under an ultraviolet lamp, excising and recovering the bands between 149 and 169 bp, checking the fragment length range with an Agilent 2100 Bioanalyzer and quantifying the concentration with an Invitrogen Qubit, and sequencing on an Illumina NovaSeq 6000 platform in single-end mode with a read length of 75 bp, thereby obtaining the off-machine data.
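The adapter and primer sequences listed in steps (3), (6), (9) and (11) are related to one another, and these relations can be checked with a few lines of Python. The sequences below are copied from the steps above; the helper `reverse_complement` and the names of the checks are illustrative, and the checks describe only what the sequences show, not the kit manufacturer's design notes.

```python
# Sequences as given in steps (3), (6), (9) and (11), written 5' to 3'.
RA3 = "TGGAATTCTCGGGTGCCAAGG"
RA5_RNA = "GUUCAGAGUUCUACAGUCCGACGAUC"
RT_PRIMER = "CCTTGGCACCCGAGAATTCCA"
PRIMER1 = "CAAGCAGAAGACGGCATACGAGATGTCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA"
PRIMER2 = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA"
INDEX = "GTCGTGAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


# RA5 is an RNA adapter; convert to the DNA alphabet for comparison.
ra5_dna = RA5_RNA.replace("U", "T")

checks = {
    "RT primer is the reverse complement of RA3":
        reverse_complement(RA3) == RT_PRIMER,
    "Primer1 contains the 8-base index":
        INDEX in PRIMER1,
    "Primer1 ends with the RT primer sequence":
        PRIMER1.endswith(RT_PRIMER),
    "Primer2 ends with the first 21 bases of RA5":
        PRIMER2.endswith(ra5_dna[:21]),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

A check of this kind is a cheap way to catch transcription errors in primer sequences before ordering oligonucleotides.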

The off-machine data of the training set samples were analyzed by the following steps to obtain the expression level (RPM) of each miRNA in each sample:

(1) Performing data quality control and preprocessing (with default parameters) on the off-machine data of each sample using FastQC, Cutadapt and Trimmomatic to obtain effective data from which low-quality sequences and sequencing adapters have been removed;

(2) Removing the RA5 base sequence from the 5' end of each sequence in the effective data, and then aligning the resulting sequences to the human reference genome sequence with the sequence alignment software Bowtie (allowing at most 1 base mismatch), thereby obtaining their positions in the human reference genome;

(3) Comparing the position of each aligned sequence with the positions of the miRNAs in the human reference genome, and determining the expression level (RPM) of each miRNA in the sample. RPM values were obtained for 466 miRNAs in total. The miRNA position information is taken from the miRBase database (http://www.mirbase.org/); when the 5' end of a sequence coincides with the 5' end position of a miRNA, the sequence is counted as a sequencing read of that miRNA. The expression level (RPM, reads per million) of each miRNA is the number of sequencing reads of that miRNA, expressed in parts per million of all sequencing reads of the sample that can be aligned to the reference genome.
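The 5'-end matching rule in step (3) can be sketched as follows. This is a minimal sketch under stated assumptions: alignments are given as 1-based leftmost start, read length and strand; the miRNA table maps (chromosome, strand, 5' position) to a name; and all coordinates and the names `miR-A` and `miR-B` are hypothetical placeholders rather than miRBase entries.

```python
def five_prime_position(start, length, strand):
    """5' end of an alignment given its 1-based leftmost start and length.

    On the minus strand the 5' end is the rightmost aligned base.
    """
    return start if strand == "+" else start + length - 1


def assign_reads(alignments, mirna_five_prime):
    """Return one miRNA name (or None) per alignment.

    alignments: list of (chrom, start, length, strand) tuples.
    mirna_five_prime: dict mapping (chrom, strand, 5' position) to a name.
    """
    assigned = []
    for chrom, start, length, strand in alignments:
        key = (chrom, strand, five_prime_position(start, length, strand))
        assigned.append(mirna_five_prime.get(key))
    return assigned


# Hypothetical annotation and alignments for illustration.
mirna_five_prime = {
    ("chr1", "+", 1000): "miR-A",
    ("chr2", "-", 5000): "miR-B",
}
alignments = [
    ("chr1", 1000, 22, "+"),   # 5' end at 1000: matches miR-A
    ("chr1", 1001, 22, "+"),   # 5' end shifted by one base: no match
    ("chr2", 4979, 22, "-"),   # 5' end at 4979 + 22 - 1 = 5000: miR-B
]
assigned = assign_reads(alignments, mirna_five_prime)
print(assigned)
```

The resulting per-read labels are the kind of input from which per-miRNA read counts and RPM values can then be derived.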

Example 3: obtaining miRNA markers for central precocity identification

The RPM value of each miRNA was normalized according to formula S1 using the MinMaxScaler tool in the scikit-learn library. The central precocious puberty group was labeled 0 and the NC group was labeled 1. The retained features were converged step by step using the RFECV algorithm, with the convergence controlled so as to end at a subset of 9 features; the code was as follows:

The accuracy during convergence was as follows:

399 miRNAs: 0.931034482758620793; 253 miRNAs: 0.7931034482758621; 174 miRNAs: 0.7931034482758621; 91 miRNAs: 0.8275862068965517; 57 miRNAs: 0.7931034482758621; 39 miRNAs: 0.8620689655172413; 36 miRNAs: 0.7931034482758621; 26 miRNAs: 0.9255172413793104; 18 miRNAs: 0.9310344827586207; 12 miRNAs: 0.896551724137931; 9 miRNAs: 0.9310344827586207.

It can be seen that retaining 9 features achieves the same accuracy as retaining 18 features, while enumerating the combinations of 9 features requires far less computation than for 18 and is computationally feasible.
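The normalization and RFECV convergence described in this example can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the applicants' code: the data are synthetic, and the random forest settings, `step=0.2`, five-fold cross-validation and accuracy scoring are all assumed. Only the use of `MinMaxScaler`, `RFECV`, the `min_features_to_select` parameter, the 0/1 labels and the floor of 9 features come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the RPM matrix: 51 samples (28 labeled 0, 23 labeled 1)
# and 40 hypothetical miRNA features.
rng = np.random.default_rng(0)
n_samples, n_features = 51, 40
X = rng.gamma(shape=2.0, scale=50.0, size=(n_samples, n_features))
y = np.array([0] * 28 + [1] * 23)
X[y == 0, :3] += 150.0  # make the first three features informative

# Min-Max scaling of every feature to [0, 1].
X_scaled = MinMaxScaler().fit_transform(X)

# Recursive feature elimination with cross-validation. A fractional `step`
# removes that share of the features per iteration; elimination stops at
# `min_features_to_select`.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=0.2,
    min_features_to_select=9,
    cv=5,
    scoring="accuracy",
)
selector.fit(X_scaled, y)

selected = np.flatnonzero(selector.support_)
print("number of features kept:", selector.n_features_)
print("indices of kept features:", selected)
```

Raising or lowering `min_features_to_select` changes where the elimination stops, which is how the results for different numbers of retained features can be compared.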

The 9 miRNAs are hsa-miR-584-5p, hsa-miR-625-3p, hsa-miR-652-3p, hsa-miR-7-5p, hsa-miR-874-3p, hsa-miR-181c-3p, hsa-miR-1290, hsa-miR-454-3p and hsa-let-7i-3p. These 9 miRNAs were enumerated into all 511 possible combinations, and for each combination the ROC curve was plotted and the AUC value was calculated. Ranked by AUC, the three-miRNA combination of hsa-miR-584-5p, hsa-miR-625-3p and hsa-miR-7-5p has the highest AUC value, 0.9885; the four-miRNA combination of hsa-miR-625-3p, hsa-miR-874-3p, hsa-miR-181c-3p and hsa-miR-1290 has the next highest, 0.9722; and the three-miRNA combination of hsa-miR-584-5p, hsa-miR-652-3p and hsa-miR-181c-3p has an AUC value of 0.9685, as shown in FIG. 1.

Therefore, the miRNA marker distinguishes patients with central precocious puberty from healthy individuals well and can serve as an auxiliary diagnostic basis for identifying central precocious puberty.
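The combination search can be sketched as follows. Nine features have 2^9 - 1 = 511 non-empty subsets, which matches the number of combinations reported. The scoring part is a minimal sketch under assumptions that the text does not specify: a random forest classifier, three-fold cross-validated probabilities, and synthetic data with only 4 features so the example runs quickly.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

MIRNAS = [
    "hsa-miR-584-5p", "hsa-miR-625-3p", "hsa-miR-652-3p",
    "hsa-miR-7-5p", "hsa-miR-874-3p", "hsa-miR-181c-3p",
    "hsa-miR-1290", "hsa-miR-454-3p", "hsa-let-7i-3p",
]


def nonempty_subsets(items):
    """All non-empty combinations of `items`, smallest first."""
    return [c for k in range(1, len(items) + 1) for c in combinations(items, k)]


def subset_auc(X, y, columns):
    """Cross-validated AUC of a random forest restricted to `columns`."""
    model = RandomForestClassifier(n_estimators=30, random_state=0)
    proba = cross_val_predict(
        model, X[:, columns], y, cv=3, method="predict_proba"
    )[:, 1]
    return roc_auc_score(y, proba)


n_combinations = len(nonempty_subsets(MIRNAS))
print("combinations of 9 miRNAs:", n_combinations)

# Synthetic demonstration with 4 hypothetical features (15 subsets).
rng = np.random.default_rng(0)
y = np.array([0] * 28 + [1] * 23)
X = rng.normal(size=(51, 4))
X[y == 0, 0] += 2.0  # feature 0 carries the signal

ranked = sorted(
    ((subset_auc(X, y, list(cols)), cols) for cols in nonempty_subsets(range(4))),
    reverse=True,
)
best_auc, best_cols = ranked[0]
print("best subset:", best_cols, "AUC:", round(best_auc, 4))
```

Because the number of subsets doubles with every added feature, keeping the retained set at 9 rather than 18 features reduces the search from 262,143 subsets to 511.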

While the application has been disclosed in terms of various aspects and embodiments, other aspects and embodiments will be apparent to those skilled in the art in view of this disclosure, and many changes and modifications can be made without departing from the spirit of the application. The various aspects and embodiments of the present application are disclosed for illustrative purposes only and are not intended to limit the application, the true scope of which is set forth in the following claims.

Claims

1. A detection marker for in vitro identification of central precocious puberty in girls, wherein the marker is a free miRNA marker from human serum, and the miRNA marker comprises:

one or more of hsa-miR-584-5p, hsa-miR-625-3p and hsa-miR-7-5p.

2. The detection marker for in vitro identification of central precocious puberty in girls according to claim 1, wherein said miRNA marker is a mature miRNA in serum.

3. The detection marker for in vitro identification of central precocious puberty in girls according to claim 1, wherein the expression level of the miRNA marker in blood is a relative expression level, and the miRNA marker differs with statistical significance between girls with central precocious puberty and healthy girls, and can distinguish girls with central precocious puberty both from healthy girls and from girls with isolated premature breast development.

4. The detection marker for in vitro identification of central precocious puberty in girls according to claim 1, wherein free miRNA samples from the serum of girls with central precocious puberty and of normal healthy girls form a training set, the samples are subjected to second-generation sequencing and data analysis, and, starting from the miRNAs that are significantly differentially expressed between the two groups of serum samples, the statistically significant miRNAs are screened out as the marker using recursive feature elimination with cross-validation.

5. A method for in vitro identification of the detection marker for central precocious puberty in girls according to any one of claims 1 to 4, comprising the steps of:

6. The method for in vitro identification of the detection marker for central precocious puberty in girls according to claim 5, wherein in step (a) obtaining the expression level (RPM) of each miRNA in the sample specifically comprises the steps of:

(a2) Aligning the sequences of the effective data to a human reference genome sequence and locating them relative to the miRNA positions in that sequence, wherein the miRNA position information is taken from the miRBase database, and when the 5' end of a sequence coincides with the 5' end position of a miRNA, the sequence is counted as a sequencing read of that miRNA;

(a3) Determining the expression level (RPM) of each miRNA in the test sample, wherein RPM (reads per million) is the unit of expression level, and the RPM of a miRNA is the number of sequencing reads assigned to that miRNA, expressed in parts per million of all sequencing reads of the sample that can be aligned to the human reference genome.

7. The method for in vitro identification of the detection marker for central precocious puberty in girls according to claim 5, wherein in step (b) selecting the feature subset with the greatest predictive power for the model specifically comprises the steps of:

8. The method for in vitro identification of the detection marker for central precocious puberty in girls according to claim 5, wherein the convergence speed is controlled by modifying the min_features_to_select parameter so as to compare the results obtained with different numbers of features, and the feature subset with the highest accuracy is selected, which determines the number of feature variables.

9. The method for in vitro identification of the detection marker for central precocious puberty in girls according to claim 5, wherein in step (c) obtaining the combination with the highest AUC specifically comprises the steps of:

(c2) The miRNA combination with the maximum AUC value is taken as a marker.