CN115786479A

CN115786479A - Police performance detection method for police dog

Info

Publication number: CN115786479A
Application number: CN202211491466.7A
Authority: CN
Inventors: 程占军; 强京宁; 蔡靖; 李飞; 李大伟; 高明军; 朱程程; 万宁
Original assignee: Nanjing Police Dog Research Institute of Ministry of Public Security
Current assignee: Nanjing Police Dog Research Institute of Ministry of Public Security
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2023-03-14

Abstract

The invention relates to the selection and breeding of German shepherd dogs, and in particular to a police performance detection method for police dogs, characterized in that a gene sequencing method is used to determine whether a German shepherd dog carries the CGNL1 and/or KIF5C genes. For the breeding field, the method is simple, feasible and reliable, and is suitable for large-scale adoption.

Description

Police performance detection method for police dog

Technical Field

The invention relates to the breeding of German shepherd dogs, and in particular to a police performance detection method for police dogs.

Background

Today, police dogs still perform unique functions that neither people nor instruments can replace, and they are widely used by police forces around the world. Police dog technology is an important component of public security science and technology and an important means by which public security organs fight and prevent crime. With their fierce appearance and strong deterrent effect, police dogs continue to play a unique role in handling group emergencies. Police dog technology is fully applied in handling such emergencies, with various tactics applied flexibly, and it has become an important part of the work of public security departments in fighting and preventing crime and in maintaining social security and stability.

German shepherd dogs (German Shepherd) are agile and well suited to active working environments. They are often deployed for police, guard, search-and-rescue and military tasks, and they also work as guide dogs for the blind. Not all German shepherd dogs qualify as police dogs; a qualified police dog must have one key desirable trait: courage. Traditional breeding creates genetic variation in various ways and then selects and evaluates the offspring. It is based on canine phenotype, requires breeders with rich practical experience, and involves a degree of blindness and unpredictability when screening for police dogs with good courage. It also has a long breeding cycle and is time-consuming and labor-intensive. A method for rapidly and reliably screening police dogs with good courage is therefore a technical problem that urgently needs to be solved.

Disclosure of Invention

The technical problem to be solved: the invention provides a police performance detection method for police dogs that is rapid and reliable, overcomes the blindness and unpredictability of traditional breeding methods, and greatly saves time and cost.

Technical scheme

A police performance detection method for police dogs, characterized in that a gene sequencing method is used to determine whether a German shepherd dog carries the CGNL1 and/or KIF5C genes.

The CGNL1 is Gene30799; the KIF5C is Gene22425.

The gene sequencing method is a first-generation, second-generation or later-generation sequencing method.

The present invention can detect the presence of CGNL1 and/or KIF5C by PCR.

Advantageous effects

SLAF-seq was used to perform a genome-wide association analysis of the courage trait in police dogs in order to find SNP sites associated with the trait, and 24 SNP sites related to courage were finally identified. Based on the genes within 100 kb upstream and downstream of the SNP positions, 5 SNP sites are associated with genes. These 5 SNPs are located within or near the following 14 genes:

LOC10659757; TRIM2; MND1; LOC106559787; LOC111093549; LOC111093557; LOC102151701; CGNL1; MYZAP; KIF5C; LYPD6B; RUBCN; MUC20; MUC4.

Statistical analysis of the GO functions of the 14 genes in the associated intervals found that the functions of 2 genes may be related to courage: Gene22425 (KIF5C) is involved in ATP binding activity (GO:0005524) and in the production of kinetin complexes (GO:0005871), and Gene30799 (CGNL1) is involved in motor activity (GO:0003774), which may have an effect on courage. These data only illustrate possible effects; direct evidence that there must be a correlation with courage is lacking, since ATP binding is associated with numerous reactions, such as heat generation. Kinetin is one of the cytokinins, plant hormones that promote cell division, and is so named because of its ability to induce cell division; the motor activity function only indicates activity related to movement.

Courage means not fearing difficulty or danger. It is a complex psychological trait influenced by many factors. Finding which genes are associated with courage, particularly in canines, has long been a problem in the art.

The inventors performed sequencing analysis of the CGNL1 gene or the KIF5C gene of German shepherd dogs, that is, detected whether the CGNL1 gene or the KIF5C gene is mutated, and assessed the dogs on the examination items for German shepherd dogs set out in the police dog breeding rules issued by the Ministry of Public Security. Dogs carrying the mutation achieved excellent results in the courage assessment, which further demonstrates that the gene mutation is related to canine courage. Therefore, for the breeding field, it is only necessary to determine whether these genes are mutated in a dog in order to infer whether its courage meets the requirements.

Drawings

FIG. 1: schematic flow chart of SLAF experiment;

FIG. 2: WGS population evolution information analysis flow chart;

FIG. 3: a sample phylogenetic tree;

FIG. 4: sample clustering results for each K value in Admixture;

FIG. 5: cross-validation error rates for each K value in Admixture;

FIG. 6: principal component analysis results;

FIG. 7: linkage disequilibrium analysis results;

FIG. 8: Manhattan plot for the trait 1 association analysis;

FIG. 9: Q-Q plot;

FIG. 10: results of police dog detection and screening using KIF5C (Example 2);

FIG. 11: results of police dog detection and screening using KIF5C (Example 3);

FIG. 12: results of police dog detection and screening using CGNL1 (Example 4).

Detailed Description

Example 1

1. WGS technical introduction

Whole genome sequencing (WGS) is a next-generation sequencing technology for rapid, low-cost determination of the complete genome sequence of an organism. SNP information is obtained through analysis and used for genotyping. It is a rapid, simple and low-cost genotyping method that is suitable for marker screening in large sample sets and can be used for molecular marker development, ultrahigh-density map construction, population genetic analysis, population GWAS analysis and similar applications.

The information analysis includes: 1) sequencing data quality evaluation; 2) sample data alignment; 3) population SNP detection and annotation; 4) population genetic structure analysis; 5) principal component analysis; 6) GWAS analysis.

2. Screening procedures

2.1 basic information on the Material

The subjects were 110 German shepherd dogs from the Nanjing Police Dog Research Institute of the Ministry of Public Security. The experimental dogs were 0.5-1 year old and weighed 20-30 kg, were in good physical condition, were fully immunized and dewormed, were fed complete pellet feed, and received routine training and husbandry. Each dog was scored in a courage police performance assessment. A 5 mL fresh blood sample was collected from each dog by venipuncture into an anticoagulant tube, labeled, and stored in a refrigerator at -20 °C.

2.2 methods

2.2.1 Experimental design

TABLE 1 WGS detection groups

2.2.2 Experimental procedures

The genomic DNA of each sample that passed quality inspection was digested according to the selected optimal enzyme digestion scheme. The resulting digested fragments (SLAF tags) were subjected to 3'-end A-tailing, ligated to Dual-index sequencing adapters, amplified by PCR, purified, pooled, and gel-cut to select the target fragments. After the library passed quality inspection, it was sequenced on an Illumina HiSeq. To evaluate the accuracy of the enzyme digestion experiment, rice Nipponbare was sequenced as a control. The experimental procedure is shown in FIG. 1 (schematic flow chart of the SLAF experiment).

2.2.3 information analysis flow

The specific steps are shown in the WGS population evolution information analysis flow chart of FIG. 2.

3. Information analysis results

The information analysis mainly comprises the following steps: (1) sequencing data quality control: the Raw data obtained from the sequencer are quality-controlled to obtain Clean data; (2) alignment analysis: the Clean data are aligned to a reference genome; (3) SNP detection: population SNPs are detected and annotated from the alignment results; (4) population genetic structure analysis: this includes phylogenetic tree construction (Tree) and principal component analysis (PCA); (5) association analysis between the SNPs and the traits.

3.1 sequencing data statistics

DNA libraries that passed quality inspection were sequenced on the HiSeq™ platform, yielding raw data (Raw data or Raw reads), which were stored in FASTQ file format (file name: *.fq). The raw sequencing data contain adapter information, low-quality bases and undetermined bases (denoted by N), which can seriously interfere with subsequent information analysis. To ensure the quality of the analysis, this interfering information must be removed beforehand; the data finally obtained are the valid data, called Clean data or Clean reads. The raw data were filtered as follows:

(1) Read pairs containing adapter sequences are filtered out;

(2) When the proportion of N in a single-end read exceeds 10% of the read length, the read pair is removed;

(3) When the number of low-quality bases (quality Q ≤ 5) in a single-end read exceeds 50% of the read length, the read pair is removed.

Strict filtering of the sequencing data yielded high-quality Clean data. Statistics were computed for the sequencing data of all samples, including data yield, sequencing error rate, Q20, Q30 and GC content. Part of the sequencing data statistics is shown in Table 2.
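The three filtering rules above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the pipeline actually used: the function names, the Phred+33 quality encoding and the example adapter string are all hypothetical.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(ch) - 33 for ch in quality_string]


def keep_read_pair(read1, read2, adapter,
                   max_n_frac=0.10, low_q=5, max_low_q_frac=0.50):
    """Return True if a read pair passes all three filters.

    Each read is a (sequence, phred_scores) tuple. The pair is dropped as
    soon as either read fails any rule.
    """
    for seq, quals in (read1, read2):
        length = len(seq)
        if length == 0:
            return False
        # Rule (1): the read contains the adapter sequence.
        if adapter and adapter in seq:
            return False
        # Rule (2): N content exceeds 10% of the read length.
        if seq.upper().count("N") / length > max_n_frac:
            return False
        # Rule (3): low-quality bases (Q <= 5) exceed 50% of the read length.
        low_quality = sum(1 for q in quals if q <= low_q)
        if low_quality / length > max_low_q_frac:
            return False
    return True


# Hypothetical reads; "AGATCGGAAG" is a placeholder adapter, not from the source.
good = ("ACGTACGTAC", [30] * 10)
n_rich = ("ACGTNNACGT", [30] * 10)
low_qual = ("ACGTACGTAC", [2] * 6 + [30] * 4)
print(keep_read_pair(good, good, "AGATCGGAAG"))
print(keep_read_pair(good, n_rich, "AGATCGGAAG"))
print(keep_read_pair(good, low_qual, "AGATCGGAAG"))
```

Rejecting the whole pair when one mate fails keeps the two FASTQ files synchronized, which paired-end aligners require.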

TABLE 2 sequencing data yield statistics

Note: Sample: sample name; Raw base: number of bases in the raw data; Clean base: number of high-quality bases remaining after the raw data are filtered; Error rate: sequencing error rate; Q20, Q30: percentage of bases with a Phred score greater than 20 or 30, respectively, among all bases; GC content (%): GC content.

In addition, the Clean data were compared with the NCBI nucleotide database to assess whether there was DNA contamination from other sources. The statistics show that the sequencing quality is high (Q20 ≥ 95%, Q30 ≥ 90%) and that the GC distribution of the samples is normal. In conclusion, the data volume meets the contract requirement, and library construction and sequencing were successful.
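As an illustration of how the Table 2 quantities are defined, the following sketch computes Q20, Q30 and GC content from a set of reads. The function name and input layout are assumptions made for the example; the thresholds follow the wording of the note (Phred score greater than 20 or 30).

```python
def yield_statistics(reads):
    """Compute base count, Q20 (%), Q30 (%) and GC content (%) for a sample.

    `reads` is an iterable of (sequence, phred_scores) tuples.
    """
    total = q20 = q30 = gc = 0
    for seq, quals in reads:
        total += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        # Thresholds follow the note's wording: strictly greater than 20 / 30.
        q20 += sum(1 for q in quals if q > 20)
        q30 += sum(1 for q in quals if q > 30)
    if total == 0:
        raise ValueError("no bases to summarize")
    return {
        "bases": total,
        "Q20": 100 * q20 / total,
        "Q30": 100 * q30 / total,
        "GC": 100 * gc / total,
    }


# Two hypothetical reads with made-up quality scores.
example = [("GGCC", [40, 40, 25, 10]), ("ATAT", [35, 35, 35, 35])]
print(yield_statistics(example))
```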

3.2 comparative analysis of samples

The sequencing data were aligned to the pseudo-reference genome. The sample alignment rate reflects the similarity between the sample and the reference genome, while the sequencing depth and coverage directly reflect the uniformity of the sequencing data and the homology with the reference sequence. The valid sequencing data were aligned to the reference sequence with BWA [1] (version: 0.7.17-r1198; parameters: mem -t 4 -k 32 -M, where -t is the number of threads, -k is the minimum seed length, and -M marks shorter split hits as secondary), and the alignment rate and coverage between each sample and the reference sequence were computed from the alignment results (see Table 3). The alignment results were then sorted with SAMTOOLS [2] (version: 0.1.19-44428cd; parameter: sort).

TABLE 3 Statistics of alignment rate and depth of coverage (partial)
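For reference, the BWA parameters quoted above could be assembled into a command line as in the following sketch. The helper name and the file names are hypothetical; only the `mem`, `-t`, `-k` and `-M` options come from the text, and the sketch builds the command without running it.

```python
import shlex


def bwa_mem_command(reference, fastq1, fastq2, threads=4, min_seed_length=32):
    """Build the argument list for a paired-end `bwa mem` run.

    -t: number of threads; -k: minimum seed length;
    -M: mark shorter split hits as secondary.
    """
    return [
        "bwa", "mem",
        "-t", str(threads),
        "-k", str(min_seed_length),
        "-M",
        reference, fastq1, fastq2,
    ]


# Hypothetical file names, for illustration only.
command = bwa_mem_command("reference.fa", "sample_1.fq", "sample_2.fq")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string makes it safe to pass to `subprocess.run` without shell quoting problems.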

Note: total reads: the number of all reads used for alignment;

mapping reads: the number of Clean reads aligned to the reference genome;

mapping rate: the proportion of mapped reads among all Clean reads;

average depth: average sequencing depth, that is, the total number of bases aligned to the reference genome divided by the genome size;

coverage1X: the percentage of the reference genome made up of sites covered by at least 1 base;

coverage4X: the percentage of the reference genome made up of sites covered by at least 4 bases.
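The Table 3 definitions can be expressed directly as arithmetic. The sketch below assumes a hypothetical input: a list of per-site depths with one entry for every position of the reference genome.

```python
def alignment_statistics(total_reads, mapped_reads, depths):
    """Compute mapping rate, average depth and coverage from per-site depths.

    `depths` holds the sequencing depth at every reference position, so its
    length is taken as the genome size.
    """
    genome_size = len(depths)
    if total_reads == 0 or genome_size == 0:
        raise ValueError("need at least one read and one reference position")
    return {
        "mapping_rate": 100 * mapped_reads / total_reads,
        # Total aligned bases divided by the genome size.
        "average_depth": sum(depths) / genome_size,
        "coverage1X": 100 * sum(1 for d in depths if d >= 1) / genome_size,
        "coverage4X": 100 * sum(1 for d in depths if d >= 4) / genome_size,
    }


# Toy 8 bp "genome" with made-up depths.
print(alignment_statistics(200, 190, [0, 1, 4, 5, 10, 0, 3, 8]))
```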

3.3 mutation detection assay

SNP (single nucleotide polymorphism) mainly refers to DNA sequence polymorphism caused by variation of a single nucleotide at the genome level, and includes transitions and transversions of a single base. After all samples were aligned with BWA, population SNPs were detected with software such as SAMTOOLS. Polymorphic sites in the population were detected using a Bayesian model, and the resulting SNPs were filtered with the following basic criteria to obtain high-quality SNPs:

(1) Q20 quality control (i.e., SNP with sequencing error rate greater than 1% is filtered out);

(2) The support number (coverage depth) of SNPs was in the range of sample number x 20;

(3) SNPs with a minor allele frequency (MAF) of less than 0.05 were filtered out;

(4) SNPs were retained only if their missing rate across the whole population was less than 0.5 (the filtered results were used for subsequent analysis).
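Criteria (3) and (4) can be sketched as follows. The genotype coding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and the function name are assumptions made for the example; criteria (1) and (2) depend on caller-specific quality and depth fields and are not sketched here.

```python
def snp_passes_filters(genotypes, min_maf=0.05, max_missing_rate=0.5):
    """Apply the MAF and missing-rate criteria to one SNP.

    `genotypes` lists one call per sample: 0, 1 or 2 alternate alleles,
    or None when the genotype is missing.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    # Criterion (4): keep only SNPs with a missing rate below 0.5.
    if missing_rate >= max_missing_rate:
        return False
    alt_frequency = sum(called) / (2 * len(called))
    maf = min(alt_frequency, 1 - alt_frequency)
    # Criterion (3): filter out SNPs with MAF below 0.05.
    return maf >= min_maf


print(snp_passes_filters([0, 1, 2, 1, None]))
print(snp_passes_filters([0] * 19 + [1]))
print(snp_passes_filters([None, None, None, 1]))
```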

3.4 population genetic diversity analysis

Genetic diversity analysis was performed on each population. Based on the genetic differences among the populations, an evolutionary tree was constructed with the corresponding mathematical methods, and principal component analysis was carried out to reveal the genetic diversity, genetic distance and degree of genetic difference among the populations.

3.4.1 Construction of the phylogenetic tree

A phylogenetic tree (also called an evolutionary tree) is a branching diagram that describes the order of evolution among populations and represents the evolutionary relationships among them. These relationships can be inferred from the similarities or differences in the physical or genetic characteristics of the populations. We constructed the evolutionary tree using the neighbor-joining method.

A phylogenetic tree was constructed from the SNP locus information of the different groups to display the genetic distance of each group visually and to provide a reference for screening groups with larger genetic distances. The results are shown in FIG. 3, which gives the lineage structure tree (NJ tree) for the entire population; the lineage evolution pattern of the population can be understood in combination with the geographic distribution of each sample.

Software: TreeBeST (1.9.2, http://treesoft.sourceforge.net/)

3.4.2. Evolutionary tree analysis

Genetic evolution analysis: after filtering for completeness > 0.5 and minor allele frequency > 0.05, a total of 137937 high-consistency population SNPs were obtained for the subsequent genetic evolution analyses. The differences between sample groups are large, and the completeness of the SNP sites is not high.

3.4.3 Phylogenetic analysis

A phylogenetic tree expresses the evolutionary relationships between species: organisms are arranged on a branched tree diagram according to how closely they are related, which shows the evolutionary process and relationships concisely. Based on the SNPs, the population evolutionary tree of the samples was constructed with the MEGA5 software using the neighbor-joining algorithm, as shown in FIG. 3. Note: each branch in the figure is a sample.

3.4.4 genetic Structure analysis

Population genetic structure analysis provides information on the ancestral origin of individuals and its composition, and is an important tool for analyzing genetic relationships. Based on the SNPs, the population structure of the samples was analyzed with the Admixture software, with clustering performed under the assumption that the number of clusters (K value) was 2 to 10 in turn. The clustering results were cross-validated, and the optimal number of clusters was determined from the minimum of the cross-validation error rate. The clustering for K values of 2-10 is shown in FIG. 4, and the cross-validation error rate for each K value is shown in FIG. 5.

3.4.5 principal component analysis

Principal component analysis (PCA) is a purely mathematical method that uses a linear transformation to select a small number of important variables from many correlated variables. PCA is applied in many disciplines; in genetics it is mainly used for cluster analysis, in which individuals are clustered into subgroups according to the degree of difference between their genome-wide SNPs, and the result is cross-checked against other methods.

Principal component analysis was performed on each population to obtain its clustering result and to further verify the differentiation relationships obtained in the evolutionary tree analysis. The results are shown in FIG. 6.

3.4.6 Linkage disequilibrium (LD)

The linkage disequilibrium decay (LD decay) pattern is an important element of population genetics research and is also the basis of association analysis. Papers on genetic evolution and GWAS generally include LD decay plots. This analysis was performed with PopLDdecay (v3.40), and the LD decay curve for each group is shown in FIG. 7.

The LD decay plot shows how the average LD coefficient between molecular markers on the genome decreases as the distance between the markers increases. In general, the LD coefficient is computed for every pair of markers on the genome, the coefficients are grouped by the distance between the markers, and the average LD coefficient between markers at a given distance is then calculated. The abscissa is the physical distance (kb) and the ordinate is the LD coefficient (r²).
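To make the idea concrete, the sketch below extracts the first principal component of a small genotype matrix by power iteration in plain Python. This is a swapped-in teaching technique, not the software used for FIG. 6; the function name, the 0/1/2 genotype coding and the toy data are assumptions.

```python
import math


def first_principal_component(genotypes, iterations=200):
    """Return (scores, loadings) for PC1 of a samples-by-SNPs matrix.

    Columns are mean-centered, then power iteration on X^T X finds the
    direction of greatest variance.
    """
    n_samples, n_snps = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_snps)]
    centered = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]

    loadings = [1.0 / math.sqrt(n_snps)] * n_snps  # arbitrary unit start vector
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
        updated = [sum(centered[i][j] * scores[i] for i in range(n_samples))
                   for j in range(n_snps)]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0:
            break  # no variance along the current direction
        loadings = [v / norm for v in updated]

    scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
    return scores, loadings


# Toy data: samples 0-1 and samples 2-3 form two genetically distinct groups.
toy = [[0, 0, 0, 1], [0, 0, 1, 0], [2, 2, 2, 1], [2, 2, 1, 2]]
pc1_scores, _ = first_principal_component(toy)
print(pc1_scores)
```

Samples from the same subgroup receive PC1 scores of the same sign, which is the separation a PCA scatter plot displays.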
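The calculation principle described above can be sketched as follows. As a simplifying assumption, r² is taken here as the squared Pearson correlation between genotype dosages (0/1/2), which is not necessarily how PopLDdecay computes it; the function names and bin sizes are likewise hypothetical, and distances are in bp rather than kb.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a monomorphic marker has no defined correlation
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_decay(positions, genotypes, bin_size=1000, max_distance=100000):
    """Average r^2 per distance bin for markers on one chromosome.

    `positions` must be sorted in ascending order (bp); `genotypes[i]`
    holds the dosages of marker i across all samples.
    """
    sums, counts = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            distance = positions[j] - positions[i]
            if distance > max_distance:
                break  # later markers are even farther away
            value = r_squared(genotypes[i], genotypes[j])
            if value is None:
                continue
            bin_start = (distance // bin_size) * bin_size
            sums[bin_start] = sums.get(bin_start, 0.0) + value
            counts[bin_start] = counts.get(bin_start, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy markers: the first two are identical, the third is uncorrelated.
toy_positions = [100, 600, 5100]
toy_genotypes = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 0, 0, 2, 2, 2]]
print(ld_decay(toy_positions, toy_genotypes))
```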

3.5 Genome wide association analysis (Genome wide association study)

Genome-wide association analysis (GWAS) detects genetic polymorphisms across the whole genome in many individuals, identifies their genotypes, and tests these for association with phenotypic data for the trait of interest in order to obtain candidate genes or genome segments related to the target trait.

3.5.1 genome-wide association analysis of Complex traits Using GEMMA

GEMMA (Genome-wide Efficient Mixed Model Association) is GWAS analysis software based on a mixed linear model. Compared with other software based on mixed linear models, it has the following advantages: (1) Fast: it is much faster than other algorithms (EMMA and FaST-LMM). (2) Accurate: both EMMAX and GAPIT fix the variance components estimated under the null model in order to improve computing speed, which is less accurate than GEMMA. (3) Convenient: PLINK binary format data can be used directly without complicated data format conversion. (4) Fully functional: single-marker GWAS, multi-marker GWAS and multi-trait GWAS analyses can be performed. The results are shown in Table 4, FIG. 8 and FIG. 9. Note: in the Manhattan plot, the height of a locus on the Y axis, -log10(p-value), corresponds to the degree of association between the SNP locus and the phenotypic trait or disease; the lower the p-value, the stronger the association and the higher the point.

TABLE 4 Genome-wide association analysis results for each trait (partial)
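The quantities plotted in a Manhattan plot and a Q-Q plot can be derived from the p-values as in the sketch below. The function names are hypothetical, and the expected quantile i/(n+1) under a uniform null distribution is one common convention, assumed here rather than taken from the source.

```python
import math


def manhattan_heights(p_values):
    """Return -log10(p) for each SNP, in input order (Manhattan plot Y axis)."""
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return [-math.log10(p) for p in p_values]


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a Q-Q plot.

    The i-th smallest of n p-values is compared with the uniform
    expectation i / (n + 1).
    """
    n = len(p_values)
    observed = sorted(manhattan_heights(p_values), reverse=True)
    expected = [-math.log10(rank / (n + 1)) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


# Made-up p-values for three SNPs.
toy_p = [0.1, 0.001, 0.01]
print(manhattan_heights(toy_p))
print(qq_points(toy_p))
```

Points lying well above the diagonal of the Q-Q plot indicate p-values smaller than expected by chance.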

The file contains 14 columns of results. The specific meanings are as follows:

chr: the number of the chromosome where the SNP is located; ps: physical position of the SNP; n_miss: number of individuals with a missing genotype at the SNP; allele1: the minor allele; allele0: the major allele; af: allele frequency; beta: SNP effect size; se: standard error of the beta estimate; l_remle: REML estimate of lambda corresponding to the SNP effect; p_wald: p-value from the Wald test; p_lrt: p-value from the likelihood ratio test; p_score: p-value from the score test.

Note: SNPs around strongly associated sites generally show similar signal intensities because of linkage disequilibrium (LD), and the peak positions are generally the sites of real research interest. The Q-Q plot is mainly used to assess the difference between the observed and expected values for the quantitative trait.

3.5.2 Site annotation using ANNOVAR software

The SNP sites were annotated with the annotation software ANNOVAR. The result file variant_function.xls annotates the gene and position of every variant and gives detailed annotation of variant function, type, amino acid change and so on for exon regions.
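A result file with these columns could be read and ranked as in the following sketch. It assumes a tab-delimited file with a header row containing the column names listed above; the function name and the example rows are made up for illustration.

```python
import csv
import io


def top_associations(handle, n=5, p_column="p_wald"):
    """Return the n SNPs with the smallest p-values as (chr, ps, p) tuples.

    `handle` is an open text file with a header row; it is assumed to be
    tab-delimited and to contain the columns chr, ps and `p_column`.
    """
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda row: float(row[p_column]))
    return [(row["chr"], int(row["ps"]), float(row[p_column])) for row in rows[:n]]


# Made-up rows standing in for a real association result file.
example_file = io.StringIO(
    "chr\tps\tp_wald\n"
    "1\t100\t0.5\n"
    "2\t200\t1e-6\n"
    "3\t300\t0.01\n"
)
print(top_associations(example_file, n=2))
```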

TABLE 5 statistical table of ANNOVAR annotation results

4. Results and discussion

The invention uses SLAF-seq to perform a genome-wide association analysis of the courage trait in police dogs in order to find SNP sites associated with the trait. Finally, 24 SNP sites related to courage were identified. Based on the genes within 100 kb upstream and downstream of the SNP positions, 5 SNP sites are associated with genes. These 5 SNPs are located within or near the following 14 genes:

LOC10659757; TRIM2; MND1; LOC106559787; LOC111093549; LOC111093557; LOC102151701; CGNL1; MYZAP; KIF5C; LYPD6B; RUBCN; MUC20; MUC4.

These 14 genes were predicted to be possibly related to the courage trait; see Table 6.
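The step of linking an SNP to genes within 100 kb upstream and downstream can be sketched as an interval overlap test. The function name, the gene record layout and all coordinates below are hypothetical placeholders, not data from the study.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, window=100_000):
    """Return names of genes overlapping the window around an SNP.

    `genes` is an iterable of (name, chrom, start, end) tuples. A gene is
    reported when it lies on the same chromosome and overlaps
    [snp_pos - window, snp_pos + window].
    """
    low, high = snp_pos - window, snp_pos + window
    return [
        name
        for name, chrom, start, end in genes
        if chrom == snp_chrom and start <= high and end >= low
    ]


# Placeholder genes with made-up coordinates.
toy_genes = [
    ("geneA", "chr1", 950_000, 980_000),    # 20 kb upstream of the SNP
    ("geneB", "chr1", 1_090_000, 1_200_000),  # starts 90 kb downstream
    ("geneC", "chr1", 1_150_000, 1_300_000),  # more than 100 kb away
    ("geneD", "chr2", 990_000, 1_010_000),  # different chromosome
]
print(genes_near_snp("chr1", 1_000_000, toy_genes))
```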

TABLE 6 SNP-associated Gene information

Statistical analysis of the GO functions of the 14 genes in the associated intervals found that the functions of 2 genes may be related to courage: Gene22425 (KIF5C) is involved in ATP binding activity (GO:0005524) and in the production of kinetin complexes (GO:0005871), and Gene30799 (CGNL1) is involved in motor activity (GO:0003774), which may have a direct influence on courage.

According to the KEGG enrichment analysis, only two functions are significantly enriched: the ribosome (Ribosome) function and the dopaminergic synapse (Dopaminergic synapse) function. A review of the literature shows that dopamine has multiple effects: 1. stimulating the heart, by directly exciting β1 receptors and promoting the release of NA from nerve endings; 2. promoting vasoconstriction, by stimulating α receptors and constricting the blood vessels of the skin, mucous membranes and skeletal muscle; 3. improving renal function, by increasing renal blood flow and the renal filtration rate and directly inhibiting tubular reabsorption of Na+, which promotes sodium excretion and diuresis. The enriched functions of the associated genes are therefore closely related to the trait.

As indicated by the KEGG enrichment result, the dopaminergic synapse function of KEGG pathway KO04728 was analyzed. The corresponding gene is Gene22425 (KIF5C), which further indicates that this gene may control the courage trait.

5. Conclusion

According to the statistical analysis of the GO functions of the 14 genes in the associated intervals, the SNP functions of the 2 variant genes CGNL1 and KIF5C are related to the courage of German shepherd dogs and can directly influence it.

Example 2 Screening qualified police dogs with good courage by KIF5C detection

Using a sequencing method, 10 German shepherd dogs were tested for KIF5C, and 8 police dogs with KIF5C gene variation were screened out. After 4 months of conventional training in identification, tracking, public security and precaution, the 8 dogs performed well in the industry assessment, proving that the method is reliable.

The results are shown in FIG. 10, which demonstrates the feasibility and reliability of the method of the present invention.

Example 3 Screening qualified police dogs with good courage by KIF5C detection

Using a sequencing method, 22 German shepherd dogs were tested for KIF5C, and 18 police dogs with KIF5C gene variation were screened out. After 4 months of conventional training in identification, tracking, public security and precaution, the 18 dogs performed well in the industry assessment, proving that the method is reliable.

The results are shown in FIG. 11, which demonstrates that the method of the present invention is feasible and reliable.

Example 4 Screening qualified police dogs by CGNL1 detection

In the same way as in Example 2, a sequencing method was used to test 68 German shepherd dogs for CGNL1, and 56 dogs with CGNL1 gene variation were screened out. The dogs were then assessed by conventional means against the examination items for German shepherd dogs in the police dog breeding rules issued by the Ministry of Public Security: on leash, the dog can weave through a group of more than 6 people and tolerates being led and touched by strangers; it shows no timidity when guns are fired or firecrackers are set off 50 meters away, and no timidity toward vehicles and pedestrians; and it adapts well to its environment. All 56 dogs achieved excellent assessment results, proving that the method is reliable.

The results of screening qualified police dogs with good courage by CGNL1 detection are shown in FIG. 12, which proves that the method is feasible and reliable.

Claims

1. A police performance detection method for police dogs is characterized in that whether German shepherd dogs carry variant CGNL1 and/or KIF5C genes is determined through gene sequencing.

2. The method of claim 1, wherein the CGNL1 is Gene30799; the KIF5C is Gene22425.