EP3830287A1

EP3830287A1 - Method for the quality control of seed lots

Info

Publication number: EP3830287A1
Application number: EP19749675.5A
Authority: EP
Inventors: Nathalie RIVIERE; Jordi Comadran; Sandra CONTAMINE; Jean-Pierre Martinant; Guillaume Collange; Aurélien AUDES
Original assignee: Limagrain Europe SA
Current assignee: Limagrain Europe SA
Priority date: 2018-07-30
Filing date: 2019-07-29
Publication date: 2021-06-09
Also published as: CA3107562A1; FR3084374B1; WO2020025554A1; AU2019312799A1; FR3084374A1; US20210317539A1; JP2021532834A

Abstract

The invention relates to a method for the quality control of the varietal purity of seed lots by analysing sub-lots of the seeds, said control being carried out by sequencing the genes of interest.

Description

QUALITY CONTROL PROCESS FOR SEED LOTS

The invention relates to a quality control process in the field of seeds and varietal purity.

The marketing of seeds is subject to control of their purity rate. This rate is specific to each species but must be 98% by weight or more (Directive 66/402/EEC on the marketing of cereal seed). This standard also applies to seeds marketed for the production of basic seed, pre-basic seed, certified seed or hybrids. Varietal purity is mainly controlled by field inspection. In the case of production of hybrid seeds with a male-sterile parent, the purity rate of this parent must be even higher (99.9% for corn).

The availability of an alternative to field control is of interest to seed companies, in particular because of the need for a rapid evaluation that does not require waiting for the plants to develop to the stage necessary for a phenotypic evaluation.

Furthermore, for these companies, the control of varietal purity is not limited to the stages mentioned above: each step upstream of the production of basic seeds is subject to this requirement of varietal purity. It is recalled that the varietal purity rate is defined as the percentage of plants originating from a batch which conform to the description of the variety. This percentage is expressed by weight of seeds.

In hybrid seed production, improving the quality of agricultural seed production requires verifying the genetic purity of the lots of basic seeds (parental lines used for the production of hybrids) used in the production of commercial seeds. This purity is assessed by detecting and identifying contaminating grains in a sample of the parental stock.

The contaminants are seeds of the same species which show genetic variations at certain loci in their genome, compared to the genotype expected for the seeds of the batch considered. In the process of producing seed lots, the presence of contaminants is reduced by vigilance in the upstream production stages, cultural practices, purification, isolation, and the controls carried out throughout the process. Thus, almost all of the seeds in the batch have the same genotype, and the contaminants are present at a generally low percentage; indeed, the level tolerated in a batch for it to be marketed must be less than 2%.

The identification of genetic traits of interest is also important in the marketing of seeds. Certain traits, ensuring for example tolerance to a herbicide or to a pathogen (for example mildew in sunflower), bring added value to a batch of seed, and when a variety is marketed as a carrier of such a trait, it is of interest to check for the presence of this trait in the seed lot. By trait is meant the allelic form of a locus linked to a phenotypic character.

A similar problem relates to the fortuitous presence of GMOs or of any other alteration in the genome. The marketing of non-GMO plants requires proof of the absence of GMOs, or of their presence at a rate below a percentage determined by regulations. In contrast, the regulations in certain countries, for certain GMO traits (resistance against insects in particular), provide that seeds containing GMOs are sold with a certain rate of seeds not having the GMO trait, so as to provide refuge areas for the insect.

The massive development of SNP (Single Nucleotide Polymorphism) markers and high throughput genotyping technologies has helped to promote the development of marker-assisted selection. Genotyping is conventionally carried out using different technologies, by PCR (KASP - LGC Genomics, TaqMan - Life Technologies) or by hybridization on DNA chips (Axiom - Life Technologies, Infinium - Illumina).

Although TaqMan quantitative PCR is today considered the benchmark for detecting the fortuitous presence of GMO plants in a mixture of non-GMO plants, it is based on the detection of a polymorphism of the presence/absence type for a given sequence, not on a polymorphism between different allelic forms of an SNP. Thus, in this particular case of GMO detection, the polymorphism relates to the presence of a trait which can be amplified (amplicon) and is therefore easily identifiable.

The estimation of the purity of seed lots, understood as the absence of a GMO trait, has been studied by Remund (Seed Science Research (2001) 11, 101-119). These authors identified two solutions to limit the resources necessary for these verifications, in particular pool analysis. They indicate that this method is effective when looking for the absence of a particular individual; on the other hand, when a purity rate is sought, it is better to work seed by seed. These authors developed the SeedCalc tool, which allows in particular a quantitative approach by varying the number of pools and the number of seeds per pool; this method is particularly suitable for real-time PCR (Laffont, Seed Science Research (2005) 15, 197-204).

An example of using seed pools to verify varietal purity exists, however. Application WO 2015/110472 proposes to analyze batches of seeds by manual or semi-automatic sampling of a determined sample volume from one or more seeds, this volume being determined to allow the analysis of at least one constituent of the seed or seeds. The tissue taken from several seeds is placed in an identified and traceable well, then the said constituent is analyzed on the content of the well(s). This bulking method makes it possible to assess varietal purity (example 6): purity is evaluated by the KASPar method (KBioscience) from bulks of 5 and 10 seeds, the presence of a contaminant in these bulks being characterized by the presence of a heterozygous cluster. However, the authors indicate that this cluster is close to the homozygous cluster and that it is easier to identify for a bulk of 5 seeds than for a bulk of 10 seeds.

The development of high-throughput sequencing technologies, or NGS (Next Generation Sequencing) has revolutionized the world of genomics, allowing the massive discovery of SNP markers between lines of a given species. These techniques allow a large number of possible sequence readings in a single experiment.

The depth of sequencing makes it possible to identify an allele that is poorly represented when identifying allelic forms for a group of individuals in a pool. It can also make it possible to identify more than two allelic forms for the same locus. Thus, the sequencing of amplicons makes it possible to study loci of interest in a targeted manner, to identify SNPs and to characterize the allelic composition of an individual or of a mixture of individuals. A research application is the detection of rare mutations in a mutagenized population (TILLING, Targeting Induced Local Lesions in Genomes). In these applications, the identification of rare alleles in pools can be combined with 2D or 3D pooling of individuals, allowing a reduction in the number of pools to be analyzed (Tsai et al, Plant Physiol. 2011 Jul; 156 (3): 1257-68; Taheri et al, Mol Breeding (2017) 37:40; Gupta et al, The Plant Journal (2017) 92, 495-508; WO2014134729; EP 2 200 424). This approach can also be applied to the identification of mutations obtained by gene editing methods (Kumar et al, Mol Breeding (2017) 37:14). These approaches remain qualitative, however: there is no quantitative consideration.

The possibility of using pool sequencing genotyping has been evaluated for the identification of allelic frequencies on populations by Gautier (Mol Ecol. 2013 Jul; 22 (14): 3766-79). However, this approach is particularly suitable for the analysis of large populations on a large number of SNPs, and does not seem suitable for the detection of rare alleles (generally less than 5%).

One of the difficulties linked to finding a rare allele is the reliability of the result, the frequency of the rare allele approaching the sequencing error rate.
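By way of illustration only, the order of magnitude of the rare-allele frequency in a pool can be sketched as follows. The function name and the simplifying assumptions (diploid seeds, equal DNA yield per seed, unbiased amplification) are hypothetical and are not taken from the present description.

```python
def expected_alt_allele_fraction(sublot_size: int,
                                 contaminant_seeds: int = 1,
                                 heterozygous: bool = False) -> float:
    """Expected fraction of allele copies carrying the alternative allele.

    Assumptions (illustrative only): diploid seeds, equal DNA yield per
    seed and no amplification bias.
    """
    if sublot_size <= 0 or not 0 <= contaminant_seeds <= sublot_size:
        raise ValueError("invalid sublot composition")
    # A homozygous contaminant carries 2 alternative copies, a heterozygous one 1.
    alt_copies = contaminant_seeds * (1 if heterozygous else 2)
    return alt_copies / (2 * sublot_size)


# One contaminant among 100 seeds: 1% if homozygous, 0.5% if heterozygous.
print(expected_alt_allele_fraction(100))
print(expected_alt_allele_fraction(100, heterozygous=True))
```

Under these assumptions, a single contaminant in a pool of 100 seeds contributes at most about 1% of the reads at the locus, which is why read depth and the sequencing error rate matter for the reliability of the call.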

In the case of quality control of seed lots, the objective is to detect the presence of a contaminant, to accurately estimate the rate within the seed lot from which the analyzed sample comes, and preferably to determine its genetic profile to better understand its origin. Detection can be carried out by analyzing the loci of interest, chosen by a person skilled in the art, based on their knowledge of the genetic material to be qualified and the genetic material likely to contaminate it.

Thus, Chen et al (2016, PLOS ONE 11 (6)) have developed, for corn, two sets of SNPs for quality control: a set of markers for rapid control, using a reduced number of SNPs (50-100) to identify potential labeling errors in seed packets or plots, and a wider set of markers used for further characterization and discrimination of genetic material. In this example, sampling 192 individuals analyzed individually would give a probability close to 100% of detecting a contamination of 5% in a batch, but this probability falls below 90% for a contamination of 1%.
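These orders of magnitude can be checked with a minimal sketch, assuming simple binomial sampling (a lot much larger than the sample, seeds drawn independently). The function name is hypothetical.

```python
def detection_probability(n_seeds: int, contamination_rate: float) -> float:
    """Probability that a random sample of n_seeds contains at least one
    contaminant, under a binomial sampling assumption."""
    return 1.0 - (1.0 - contamination_rate) ** n_seeds


for rate in (0.05, 0.01):
    p = detection_probability(192, rate)
    print(f"contamination {rate:.0%}: detection probability {p:.3f}")
```

With 192 seeds, the probability is above 99.9% for a 5% contamination and about 85% for a 1% contamination, consistent with the figures quoted above.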

In the case of quality control of batches of basic seeds, the expected genetic purity is high, as is the precision of estimation sought, which depends on both the number of seeds sampled (tested) and the number of seeds in the batch of basic seeds. For example, if 200 grains are analyzed and the observed impurity rate is 0%, the confidence interval for this value ranges from 0% to 1.49%. A sample of only 200 grains is therefore too small to guarantee a sufficient level of purity. In contrast, when analyzing 2000 grains, an observed impurity rate of 0% has a confidence interval ranging from 0% to 0.15%. However, even if genotyping costs have dropped considerably, such sampling, combined with plant-by-plant processing, is not economically viable for quality control.
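The confidence level behind these figures is not stated. As an assumption, a one-sided exact binomial bound at 95% reproduces them, as the following sketch shows (function name hypothetical).

```python
def upper_bound_zero_defects(n_seeds: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the impurity rate when none of
    the n_seeds analyzed is a contaminant (exact binomial bound).

    The 95% default is an assumption chosen because it matches the
    figures quoted in the text.
    """
    alpha = 1.0 - confidence
    # Solve (1 - p) ** n_seeds = alpha for p.
    return 1.0 - alpha ** (1.0 / n_seeds)


for n in (200, 2000):
    print(f"{n} seeds, 0 contaminants: upper bound {upper_bound_zero_defects(n):.2%}")
```

The bound shrinks roughly in proportion to 1/n, which is why an order of magnitude more seeds is needed to move from 1.49% to 0.15%.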

Genia (Montevideo, Uruguay) offers a method for determining genetic purity on batches of lines, and for identifying contaminants, by analyzing a single mixture of 10,000 seeds and sequencing amplicons targeting approximately 350 SNPs. This company claims to determine varietal purity with a sensitivity of 0.8% and a confidence level of 99%. This approach is similar to that developed by Gautier et al., in that it is based on a statistical model for estimating allelic frequencies on a large number (350) of SNPs, from which the frequency of the different genetic profiles present in the mixture is estimated. However, such an approach does not allow reliable detection of a rare allele for a given SNP, which is necessary when searching for contamination for a given trait.

It is therefore necessary to have an economical method allowing the analysis of a large number of individuals, in order to determine precisely the genetic purity of a given seed lot, in particular for seed lots having a high purity rate.

The method presented here is based on estimating the purity of a seed lot from a binary qualitative analysis (presence/absence of a contaminant) of several sub-lots of a sample. The analysis of each sub-lot consists of detecting the presence of an alternative allele at one or more loci of interest, by sequencing of amplicons. The number of sublots and the size of each sublot are defined according to the expected purity rate (estimated by the operator) and the precision sought, preferably so that, statistically, at most one contaminant is expected in a given sublot. This means that, from a given number of seeds used for the test, at least as many sublots are formed as the estimated number of contaminants, preferably exactly as many. Furthermore, because several sublots are analyzed, the method makes it possible to distinguish contamination by a hybrid (segregation) from contamination by a line (no segregation), by comparing the contaminant profiles of the different sublots.
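The comparison of contaminant profiles between sublots can be sketched as follows. This is a simplified illustration only: the data layout, the function name and the decision rule are assumptions, and differing profiles could in practice also result from several distinct contaminating lines.

```python
from typing import Dict, FrozenSet, Tuple

# A profile is the set of (locus, alternative allele) pairs seen in one sublot.
Profile = FrozenSet[Tuple[str, str]]


def classify_contamination(profiles: Dict[str, Profile]) -> str:
    """Compare the contaminant profiles of the contaminated sublots."""
    contaminated = [p for p in profiles.values() if p]
    if not contaminated:
        return "no contaminant detected"
    if len(contaminated) == 1:
        return "single contaminated sublot: origin undetermined"
    if len(set(contaminated)) == 1:
        return "identical profiles: consistent with a line (no segregation)"
    return "differing profiles: consistent with a hybrid (segregation)"


# Hypothetical example: two contaminated sublots showing different alleles.
example = {
    "sublot_01": frozenset(),
    "sublot_02": frozenset({("snp3", "T"), ("snp7", "G")}),
    "sublot_03": frozenset({("snp3", "T")}),
}
print(classify_contamination(example))
```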

However, this method is not limited to this binary approach. Because sequencing does not restrict the method to the identification of two allelic forms, the method also makes it possible to identify contaminants in batches of seeds that are heterozygous at the locus considered, the contaminant then carrying an allele different from the allelic forms expected in these seeds.

The invention thus relates to a method for determining the quantity of contaminants at at least one locus of interest, present in a batch of seeds of a variety of interest, characterized in that

a) the seeds of a seed lot are grouped into sublots of at least 10 seeds, the number of sublots thus obtained being greater than or equal to 10

b) a targeted sequencing of at least the genome region of the seeds containing the locus of interest is carried out for each sub-lot,

c) the presence of a contaminant is determined qualitatively for each sub-lot, a contaminant being considered present when an alternative allele to the expected allele(s) is detected (there may be several expected alleles at a single locus, especially if the seeds are seeds of a hybrid plant) for each genomic region sequenced (presence/absence of an alternative allele),

d) the quantity of contaminants in the overall batch is determined by the compilation of the qualitative results obtained for all of the sublots.

Optionally and preferably, in order to carry out the sequencing, the region corresponding to the locus of interest is amplified by PCR between step a) and step b). This amplification step is carried out directly on all the seeds in each sublot. Alternatively, the sequencing of step b) is carried out on the DNA extracted from the seeds present in a sublot, the region of the genome of the seeds containing the locus of interest being optionally amplified. In another embodiment, the RNA present in the batch of seeds is also extracted, a reverse transcription is carried out to obtain complementary DNA (cDNA), optionally followed by an amplification of loci of interest of this cDNA, and the sequencing of loci of interest (preferably amplified) is also carried out on the cDNA obtained.

The estimate of the impurity P of the batch is obtained according to the formula:

$$P = 1 - \left(1 - \frac{d}{n}\right)^{1/m}$$

where n is the number of pools; m is the number of grains in a pool; d is the number of pools in which a contaminant has been identified.

This formula is the one proposed by Remund (2001, op. cit.), which makes it possible in particular to take into account the fact that contaminants are searched for only in a sample of the seed lot, and therefore to take into account the biases potentially induced by this sampling.

This process therefore makes it possible to calculate the percentage of contaminants in the seed lot (and therefore the purity of the seed lot: 1- P).
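A minimal sketch of this calculation is given below, assuming that the formula referred to is the standard pooled-testing estimator P = 1 - (1 - d/n)^(1/m) attributed to Remund (2001). The function name is hypothetical.

```python
def estimate_impurity(n_pools: int, pool_size: int, contaminated_pools: int) -> float:
    """Estimate the impurity P of a lot from qualitative pool results.

    n_pools            -- n, number of pools analyzed
    pool_size          -- m, number of grains per pool
    contaminated_pools -- d, number of pools in which a contaminant was found
    """
    if n_pools <= 0 or pool_size <= 0:
        raise ValueError("n_pools and pool_size must be positive")
    if not 0 <= contaminated_pools <= n_pools:
        raise ValueError("contaminated_pools must lie between 0 and n_pools")
    return 1.0 - (1.0 - contaminated_pools / n_pools) ** (1.0 / pool_size)


# Hypothetical example: 20 pools of 100 grains, 2 pools found contaminated.
impurity = estimate_impurity(n_pools=20, pool_size=100, contaminated_pools=2)
print(f"estimated impurity: {impurity:.4%}, purity: {1 - impurity:.4%}")
```

In this example the estimate (about 0.105%) is slightly higher than the naive ratio 2/2000 = 0.1%, because a positive pool may contain more than one contaminant.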

A contaminant is a seed with an allele different from the allele expected at the given locus of interest in this seed lot. However, when the method is applied to several loci of interest, it can be decided that a lot is contaminated only when unexpected alleles are observed in this lot at more than one locus, for example at 2 or 3 loci.

Preferably, in step a), a maximum number of seeds is used, calculated so that, from a statistical point of view, at most one contaminant is present in each sample (sublot) of seeds. In industrial production methods, a purity level higher than 99% is generally observed. Thus, with a sublot size of around 100 seeds, for example between 80 and 120, one can expect mostly to detect a single contaminating seed. The methods described above are in fact used for homogeneous seed lots, that is to say lots for which at least 95%, preferably at least 96%, more preferably at least 97%, even more preferably at least 98%, and most preferably at least 99% of the seeds have the same genotype. Depending on the estimated purity of the seed lot, the sublots contain a maximum of 20, or a maximum of 50, or a maximum of 80, or a maximum of 100, or even a maximum of 200 or 2,000 seeds. When evaluating a character for which the expected purity is of the order of at least 90%, respectively at least 95% (such as the germination character of the seeds), the quantity of seeds in each sublot prepared in step a) is then of the order of 10, respectively 20, or between 15 and 25.
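One possible reading of this sizing rule is that the expected number of contaminants per sublot should not exceed one. The sketch below applies that reading under a binomial assumption; the function names and the rule itself are illustrative assumptions, not a prescribed procedure.

```python
import math


def max_sublot_size(expected_impurity: float) -> int:
    """Largest sublot size for which the expected number of contaminants
    per sublot (size * impurity) does not exceed one."""
    if not 0.0 < expected_impurity <= 1.0:
        raise ValueError("expected_impurity must be in (0, 1]")
    # Rounding guards against floating-point noise such as 99.99999999999999.
    return math.floor(round(1.0 / expected_impurity, 9))


def prob_more_than_one(sublot_size: int, impurity: float) -> float:
    """Binomial probability that a sublot contains two or more contaminants."""
    p0 = (1.0 - impurity) ** sublot_size
    p1 = sublot_size * impurity * (1.0 - impurity) ** (sublot_size - 1)
    return 1.0 - p0 - p1


for impurity in (0.01, 0.05, 0.10):
    m = max_sublot_size(impurity)
    print(f"impurity {impurity:.0%}: sublot size {m}, "
          f"P(>1 contaminant) = {prob_more_than_one(m, impurity):.2f}")
```

This gives sizes of 100, 20 and 10 seeds for expected purities of 99%, 95% and 90%. Some sublots will still contain more than one contaminant; an estimator that relies only on whether each sublot is positive or negative is not invalidated by this.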

Step b) of the process consists of the targeted sequencing of at least one genomic region containing the locus of interest for which the presence of a contaminant is sought.

It is clear that this sequencing step is carried out on nucleic acid. Thus, the DNA of the batches is prepared, for example by crushing the seeds and using the flour or isolating the DNA from this flour. These methods are known in the art. As seen above, one can also prepare cDNA.

This sequencing step is preferably carried out by high throughput sequencing (NGS). Different technologies are available (Illumina®, Roche 454, Ion Torrent: Proton / PGM (ThermoFisher), or SOLiD (Applied BioSystems)).

In summary, these NGS technologies have two steps in common: an amplification step, by PCR, and a sequencing step, which is carried out by different approaches depending on the technology used.

Illumina® technology uses clonal amplification and sequencing by synthesis (SBS). A double-stranded DNA library is generated from the sample to be analyzed by PCR amplification and addition of specific adapters at the ends. The DNA is then denatured into single strands, and the ends of the single strands are attached randomly to the surface of the flow cell, on which a solid-phase “bridge” PCR is carried out (creating dense clusters in which the fragments are amplified).

The sequencing is carried out by adding the 4 labeled reversible terminators, the primers and the DNA polymerase, then the fluorescence emitted by each cluster is read, making it possible to determine the first base. We then perform several cycles in order to read the entire sequence.

For the implementation of 454 technology, a single-stranded template DNA library is obtained, specific adapters being added at the 3′ and 5′ ends, and each DNA strand being immobilized on a bead (one DNA fragment = one bead). These beads are then combined with the amplification reagents in a water-in-oil emulsion, in order to create "microreactors" (each droplet of water in oil) containing a single bead. The PCR is carried out in this emulsion, the entire library being amplified in parallel, making it possible to obtain several million copies per bead.

The beads are then purified and loaded onto plates whose wells have a diameter that allows the entry of only one bead at a time. The sequencing enzymes are added and the individual labeled nucleotides are delivered one after the other. The sequence is detected by a CCD camera according to the luminescent signal.

For the SOLiD technology, the libraries are prepared, the adapters are added and a PCR is carried out in an emulsion, as in the 454 method. The amplified beads are then enriched, the 3′ end of the DNA is modified to allow covalent attachment to a slide, and the beads are deposited on the slide. The sequencing is carried out by ligation: primers hybridize to the adapters present on the template. A set of 4 fluorescently labeled two-base probes is associated with the primers. The specificity of the two-base probes is determined by the 1st and 2nd bases of each ligation reaction. Several ligation, detection and cleavage cycles are carried out. In this process each base is detected by two independent ligation reactions with two different primers. The two-base coding system allows very high fidelity in reading the results. This method makes it possible to differentiate between sequencing errors and real variants (SNPs, insertions and deletions).

For Ion Torrent technology, libraries are prepared and adapters are added. Emulsion PCR is carried out. Sequencing does not rely on the detection of fluorescence of nucleotides or of their polymerization residues by a CCD optical sensor, but uses a CMOS sensor which detects the H+ ions released during the polymerization of DNA. The CMOS sensor measures the pH in each of the wells, which indicates whether one or more bases have been incorporated into the DNA under analysis. The bases are added one after the other to detect which one is incorporated, the wells are then rinsed, and the cycle is repeated.

Other sequencing technologies exist, such as the MinION technique from Oxford Nanopore Technologies (https://nanoporetech.com/products#minion; Mikheyev and Tin (2014), Molecular Ecology Resources 14 (6): 1097-102) or PacBio from Pacific Biosciences (https://www.pacb.com/products-and-services/PacBio-systems/).

The method described here makes it possible to limit the risk of detecting a false positive (wrongly concluding that the alternative allele is present) or a false negative (wrongly concluding that the alternative allele is absent), a risk that these NGS sequencing methods can present because of the sequencing error rate inherent in each technology. In fact, step c) consists in determining the absence or the presence, for a sample, of an unexpected sequence in the sequencing products. In the presence of such an unexpected sequence (corresponding to the presence of a contaminant), there is no need to quantify the amount of unexpected sequence relative to the amount of expected sequence (corresponding to the sequence of the correct seeds from the seed lot). The detection is therefore only qualitative, that is to say binary: presence/absence of a sequence of an alternative allele to the expected allele(s). Using sublots of seeds also makes it possible to increase the number of seeds studied for each sequencing reaction, and thus to have a sufficient sample of seeds while controlling costs.

The presence of such a sequence of an alternative allele is indicative of the presence of a contaminant for this allele.

This analysis is carried out for each genomic region analyzed, that is to say for each locus of interest determined beforehand by a person skilled in the art, and making it possible to characterize the batch of seeds.

In fact, when the number of seeds in each sub-lot is chosen so that only one contaminant is present (statistically) within this sub-lot, the presence of an alternative allele is sufficient to conclude that a single contaminant is present.

The next step in the process is to calculate the effective percentage of contaminants in the seed lot. This is done by compiling the qualitative results obtained for all of the sublots.

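The purity rate of the seed lot is then estimated by considering the number of contaminated sublots, the total number of sublots analyzed, and the size of each of the sublots. The estimate of the impurity P of the lot is obtained according to the formula:

$$P = 1 - \left(1 - \frac{d}{n}\right)^{1/m}$$

where n is the number of sublots, m is the number of seeds in a sublot, and d is the number of sublots in which a contaminant has been identified.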

The confidence interval of this estimate can also be determined by any appropriate statistical method, in particular by an F distribution, as applied in the SeedCalc tool used within the framework of the ISTA (International Seed Testing Association) and as explained in Remund (2001).
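As an illustration, an upper confidence bound can be sketched with the exact binomial (Clopper-Pearson) bound on the proportion of contaminated sublots, which is mathematically equivalent to the F-distribution formulation, followed by the same transformation as for the point estimate. Whether SeedCalc proceeds in exactly this way is an assumption; the function names are hypothetical and only the Python standard library is used.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def upper_pool_proportion(d: int, n: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson upper bound on the proportion of
    contaminated sublots, found by bisection."""
    if d >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # The CDF decreases as p grows; the bound is where it equals alpha.
        if binom_cdf(d, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def upper_impurity(d: int, n: int, m: int, alpha: float = 0.05) -> float:
    """Upper confidence bound on the lot impurity for d contaminated
    sublots out of n sublots of m seeds each."""
    return 1.0 - (1.0 - upper_pool_proportion(d, n, alpha)) ** (1.0 / m)


# Hypothetical design: 20 sublots of 100 seeds.
for d in (0, 2):
    print(f"d = {d}: upper 95% bound on impurity = {upper_impurity(d, 20, 100):.3%}")
```

With no contaminated sublot among 20 sublots of 100 seeds, the bound is about 0.15%, the same as for 2000 seeds analyzed one by one, while requiring only 20 sequencing reactions.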

In a preferred embodiment, in step b), the targeted sequencing of several regions of the genome containing several loci of interest is carried out. This makes it possible to better guarantee the identity of the seeds present in each sample and to detect, more precisely, the presence of contaminants.

Thus, it is possible to sequence in a targeted manner at least 2, preferably at least 5, preferably at least 10, more preferably at least 100, 50, 40 or 15 loci of interest, or even at least 20 loci of interest. Even if there is no upper limit to the number of loci of interest that can be assessed, it is preferred to limit this number. Indeed, it is possible to characterize a variety with a limited number of markers (specific for loci) (between 15 and 20), and to use this set of markers to discriminate plants of this variety from other plants. A variety is understood as a set of plants with the same genetic background; the variety can be a commercial variety, but also a line not yet listed in the catalog, a basic line, a pre-basic line or a line undergoing propagation.

The optimal number of loci of interest is defined by a person skilled in the art, as a function of the plant material considered, but also by fixing the minimum number of loci discriminating any pair of given varieties. Thus, the minimum number of loci discriminating any pair of varieties can be fixed at three, limiting the risk of confusing a real contamination and an experimental false positive. Different algorithms are described by Rosenberg et al. (Journal of Computational Biology 12 (9), 2005, 1183-1201) to select a set of discriminating markers.

These algorithms can be improved or modified to take into account other criteria, such as the quality of the markers chosen (quality meaning their ability to be amplified and unequivocally identified). Groups or categories of markers can be identified, and a subgroup of markers can be defined that preferentially contains markers from a given group or from different groups. A set of markers to be used can thus be defined.

The algorithm can also take into account the statistical quality of these markers, defined as the minimum number of discriminating markers required to declare a pair of individuals as different. From this criterion, the quality of discrimination of a set of markers can be assessed by the number of pairs of individuals that this set is capable of discriminating, ideally all of the individuals managed by the producer.
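The assessment of a marker set against this criterion can be sketched as follows. This is not one of the algorithms of Rosenberg et al.; it is only a check of a given set, and the genotype table, marker names and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Genotype = Dict[str, str]  # marker name -> allele call


def discriminating_loci(a: Genotype, b: Genotype, markers: Sequence[str]) -> int:
    """Number of markers at which two individuals carry different alleles."""
    return sum(1 for m in markers if a[m] != b[m])


def undiscriminated_pairs(genotypes: Dict[str, Genotype],
                          markers: Sequence[str],
                          min_diff: int = 3) -> List[Tuple[str, str]]:
    """Pairs of individuals separated by fewer than min_diff markers."""
    return [(x, y) for x, y in combinations(sorted(genotypes), 2)
            if discriminating_loci(genotypes[x], genotypes[y], markers) < min_diff]


# Hypothetical genotype table for three lines and five SNP markers.
lines = {
    "line_A": {"snp1": "A", "snp2": "C", "snp3": "G", "snp4": "T", "snp5": "A"},
    "line_B": {"snp1": "G", "snp2": "T", "snp3": "A", "snp4": "T", "snp5": "A"},
    "line_C": {"snp1": "A", "snp2": "C", "snp3": "A", "snp4": "C", "snp5": "A"},
}
markers = ["snp1", "snp2", "snp3", "snp4", "snp5"]
print(undiscriminated_pairs(lines, markers, min_diff=3))
```

In this toy table, line_A and line_C differ at only two markers, so the set would need an additional discriminating marker to meet a minimum of three.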

In the context of the present invention, the method will preferably be implemented on loci of interest making it possible both to discriminate the variety of interest (ensuring the consistency and the concordance of the genetic background between plants) and to identify the presence or absence of other loci of interest (notably linked to traits of interest).

In this embodiment, that is to say when several regions of the genome are sequenced, it may be decided to consider that there is a contaminant in a batch only if unexpected sequences are observed for more than one locus of interest in this batch. In other words, it can be decided that, if a single alternative allele is observed in a given batch (an unexpected sequence for a single region of the genome, while the sequences obtained for the other regions are those expected), the presence of a contaminant is considered not proven.
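This decision rule can be sketched as follows, assuming that the qualitative results are available as, for each sublot, the set of loci at which an unexpected sequence was observed. The data layout and function names are hypothetical.

```python
from typing import Dict, Set


def is_contaminated(unexpected_loci: Set[str], min_loci: int = 2) -> bool:
    """A sublot is declared contaminated only if unexpected sequences are
    observed at min_loci loci or more (e.g. 2 or 3)."""
    return len(unexpected_loci) >= min_loci


def count_contaminated(results: Dict[str, Set[str]], min_loci: int = 2) -> int:
    """Number of sublots declared contaminated under the rule above."""
    return sum(is_contaminated(loci, min_loci) for loci in results.values())


# Hypothetical results for three sublots.
results = {
    "sublot_01": set(),
    "sublot_02": {"snp3"},                  # single locus: not proven
    "sublot_03": {"snp1", "snp4", "snp7"},  # several loci: contaminated
}
print(count_contaminated(results, min_loci=2))  # stricter rule
print(count_contaminated(results, min_loci=1))  # single-locus rule
```

The count obtained is the number of contaminated sublots used when the qualitative results are compiled; a stricter rule lowers the risk of counting an experimental false positive at the cost of possibly missing a contaminant that differs at a single locus.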

The method described here therefore makes it possible to determine the presence of contaminants in a batch of seeds, in particular to control varietal purity during an industrial production process.

This method can also be implemented in order to check the purity level of a trait which is sought in the homozygous state in the batch of seeds. In this embodiment, preferably only the region of the genome containing the particular trait that one wishes to follow is evaluated. Several traits can be followed simultaneously, using specific markers for each trait.

By trait is meant an allelic form specific to a given locus. In this context, this allelic form can be native, or linked to a mutation identified by TILLING or EcoTILLING, a mutation linked to the footprint of a transposable element, a mutation obtained by gene editing, or a mutation obtained by any other method. In this context the mutation, whether a point mutation, an insertion or a deletion, involves a limited number of bases. This method can also be applied to a trait desired in the heterozygous state; the contaminant will then correspond to an alternative form to the allelic forms expected in this individual.

In a preferred embodiment, a trait (which can be linked to a single allele or to several alleles) provides the plant with a phenotypic character of interest (such as drought resistance, resistance to biotic stress, resistance to nitrogen deficiency, increased yield, etc.).

When the trait is linked to a mutation involving a large insertion, such as a GMO trait, a mutant obtained by insertion of a transposable element or a mutant obtained by gene editing, the method can be implemented by searching for the presence of the allelic form not containing the insertion or the mutation considered. The presence of this allelic form indicates that the presence of the trait linked to the mutation in homozygous form in the seed lot is not fully guaranteed. This method could be used, for example, when the mutation corresponds to the introgression of a DNA fragment from another species; this particular case is encountered, for example, when checking the purity of fertility-restorer lines in rapeseed.

This method also makes it possible to search for the fortuitous presence of a trait. The trait sought could be a GMO, a mutation linked to gene editing, or the introgression of a fragment from a heterologous species; this search is carried out by amplification and then sequencing of a specific region of the T-DNA or of the insertion. By extension, this method can be applied to traits linked to small mutations, provided that primers can be defined that allow specific amplification of the region when the mutated allelic form is present. By adapting the number of sublots and the number of seeds per sublot, the protocol can be extended to identify the presence of traits at frequencies ranging, for example, up to 10%; in this context one can check, for example, the presence of 10% wild-type seeds in a batch of GMO seeds (legislation on refuge areas). These applications are not limited to GMOs: the trait followed by this method can be the introgression into a line of a fragment from another species, for example the presence of a fertility-restoring locus from radish in rapeseed. In the same way, the method can make it possible to verify that this introgression is indeed in the homozygous state.

In another embodiment, the method can be used to detect the fortuitous (unwanted) presence of GMOs, or of another mutation linked to the insertion of a fragment of substantial size, in a batch of seeds. This mutation can be linked to the presence of a transposable element or to an insertion obtained in particular by gene editing. In this embodiment, primers specific to a transgene or to the particular insertion are used (if a particular contamination is suspected), or different generic primers are used, making it possible to detect different transgenes without prior assumption.

In the case of varietal purity, one can also add markers linked to these traits to the list of markers used to characterize the variety.

Thus, in a preferred embodiment, steps b), c) and d) are carried out for several regions of the genome containing several loci of interest.

In this embodiment, it is preferred that a subset of several loci makes it possible to discriminate or identify a variety of interest. As seen above, this number of loci is variable, and these loci can be determined by the skilled person, in particular according to the teachings of Rosenberg (cited above). In a particular embodiment of the invention, the method may integrate information concerning the production plan, which involves specific controls and measures (isolation distances, border areas, castration) implying that the risk of contamination is limited and that the seed lot is a priori uncontaminated or only slightly contaminated. Furthermore, because of these measures, contamination will most likely come from a known contaminant, in particular from a parental line, including the parental lines involved in the production of basic and pre-basic seeds. In this particular context, the number of markers making it possible to establish the purity of a line can be very small; it can in particular be 20 or less.

As seen above, in one embodiment, a batch is declared as containing a contaminant if an alternative allele to the expected allele is observed for a single locus of interest. In another embodiment, a batch is declared as containing a contaminant if an alternative allele to the expected allele is observed for more than one locus of interest (in particular 2 or 3 loci).

In one embodiment, at least or exactly one locus of interest is linked to a character of interest (trait). In another embodiment, it is a combination of loci which is related to a character of interest (trait).

In one embodiment, at least one locus of interest is linked to a specific trait a priori not present in the seeds of the batch. In this embodiment, the fortuitous presence of this trait is sought, and markers are therefore added to verify the absence of the trait. In this embodiment, the method is essentially qualitative. The integration of these markers in the claimed protocol makes it possible to carry out, in a single experiment, the additional controls that would otherwise be necessary.

Generally, a lot is considered non-compliant if the frequency of the unwanted trait(s) is more than 10% in the seed lot.

In a preferred embodiment, the quantity of seeds in each sublot prepared in step a) is between 80 and 120.

The method described here can also be used to determine the intrinsic agronomic characteristics of the seeds present in the lot. Thus, one can determine the expression of genes that will lead to unwanted properties of seeds (for example dormancy marker genes which, if expressed, are a marker of seed non-germination). In order to determine the expression of these genes in the seeds of the batch, the RNA is extracted and reverse transcribed. Thus, the method described above can also include the following steps:

i) in addition, an RNA extraction is carried out from the seeds of the sublot, and a reverse transcription of this RNA into cDNA before step b)

ii) sequencing of this cDNA is carried out using primers specific for dormancy genes, at the same time as the sequencing of step b)

iii) the presence of non-germinative seeds is qualitatively determined for each sub-lot, in the event of detection of cDNAs relating to dormancy genes during the sequencing step ii) (presence / absence of the cDNA)

iv) the quantity of dormant seeds in the overall lot is determined by the compilation of the qualitative results obtained for all of the sublots in iii).

Steps iii) and iv) are carried out in the same way as described above. The seeds in the batch do not generally have the dormancy character and, by choosing the number of seeds in the sublots adequately, the qualitative information in iii) can be used to obtain quantitative information. Thus, if it is known that at most 5% of the seeds will exhibit the dormancy character (the case generally observed in commercial seed lots, for which at least 95% of the seeds germinate satisfactorily), sublots containing around 20 seeds (between 15 and 25 seeds) are used.
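The link between the expected rate and the sublot size can be illustrated with a short calculation. This is a minimal sketch, assuming that the sizes quoted in the text (about 20 seeds for a 5% rate, about 100 seeds for a 1% rate) follow the rule of thumb "about 1 / maximum rate"; that rule and the function names are illustrative assumptions, not taken from the source.

```python
def sublot_size_for(max_rate: float) -> int:
    """Assumed rule of thumb: about 1 / max_rate seeds per sublot."""
    return round(1.0 / max_rate)


def positive_sublot_fraction(rate: float, seeds_per_sublot: int) -> float:
    """Expected fraction of sublots holding at least one affected seed,
    assuming seeds are affected independently at the given rate."""
    return 1.0 - (1.0 - rate) ** seeds_per_sublot


for max_rate in (0.05, 0.01):
    size = sublot_size_for(max_rate)
    frac = positive_sublot_fraction(max_rate, size)
    print(f"max rate {max_rate:.0%}: {size} seeds per sublot, "
          f"about {frac:.0%} of sublots expected positive")
```

With sublots of this size, a lot at the maximum rate gives a mixture of positive and negative sublots rather than all-positive results, which is what allows the presence / absence counts to be turned into a quantity.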

This dormancy problem is particularly important for sunflower, wheat and rice seeds.

The dormancy marker genes whose expression is evaluated by sequencing of cDNA obtained from the seed RNA are preferably chosen from the genes known in the art and some of which are described below.

In another embodiment, a trait may correspond to an expression level of a marker gene. For example, the germination quality of a seed lot is an essential characteristic, and this quality can change during the conservation of seeds.

A state in which a seed does not germinate when it is placed in favorable germination conditions (temperature and humidity) is called a dormant state. Dormancy reflects an adaptation of plant species to environmental conditions (the ability to enter a latent state in the absence of conditions favorable to the development of the plant). Thus sunflower, rice and sorghum have a dormancy whose release is accompanied by an improvement in germination at low temperature, while in the case of wheat, barley or oats, it improves germination at higher temperatures (Baskin and Baskin, Seed Science Research (2004) 14, 1-16).

This property is particularly important in the case of cultivated species, the objective being to produce and market batches of seeds capable of germinating quickly and evenly after sowing. It is therefore important to be able to characterize the dormancy level of a batch of seeds, and such analyses are carried out routinely in factories through germination tests. These tests use in particular Ethrel, which has the ability to break dormancy. However, these analyses are long and require a large workforce, hence the advantage of being able to replace them with molecular analyses.

Studies carried out in different species have made it possible to identify genes whose level of expression is correlated with the dormant or non-dormant state of the seeds. Bassel et al. (PNAS June 7, 2011 108 (23) 9709-9714; Trends in Plant Science, June 2016, Vol. 21, No. 6, 498-505) have identified sets of genes co-expressed specifically according to the state of dormancy or non-dormancy in Arabidopsis thaliana. For example, the DOG1 (Delay Of Germination 1) gene is involved in maintaining dormancy at low temperatures in Arabidopsis, and the role of this gene appears to be conserved in other species such as lettuce (Huo et al., PNAS April 12, 2016 113 (15) E2199-E2206) or wheat (Ashikawa et al., Transgenic Res (2014) 23: 621). In sunflower, Layat et al. (New Phytologist (2014) 204: 864-872) analyzed the abundance of RNA associated with the polysomal fraction in dormant or non-dormant embryos, and identified genes associated with the dormant state, such as HSP genes (HSP70, HSP101), as well as stress response genes and genes involved in the signaling pathways of abscisic acid (ABA), a hormone associated with maintaining dormancy. Conversely, other genes, such as alpha-tubulin, are specifically expressed in non-dormant seeds (Layat et al., op. cit.).

Thus, the analysis of the expression of a gene specific to the dormant state makes it possible to characterize the germinative quality of a batch of seeds. The objective being to qualify batches for their germinative capacity, the analysis of the expression of a gene specific to the dormant state makes it possible to determine, by semi-quantitative analysis, the percentage of dormant seeds in a non-dormant batch. In the case of a high dormancy rate, in particular > 1%, the joint analysis of a gene specific to the dormant state and a gene specific to the non-dormant state would make it possible, by calculating the relative abundances of these two genes, to express a dormancy rate.

Similarly, other evaluations of the physiological status of the seeds may be carried out and thus replace laboratory tests. The appropriate marker gene can be chosen based on the timing of this sequencing test phase. These tests may be carried out, for example, shortly before the seeds are packaged for marketing. This evaluation will concern in particular the quality of the priming, the aptitude for germination, the vigor and the viability of the seeds. The aptitude for germination is described in particular in application WO 2018/015495.

The method described above can also be used to determine the specific purity of the seed lot, i.e. the presence or absence (and the quantification) of seeds from a species other than that of the seed lot. Such an analysis is currently carried out systematically by operators, who visually determine the presence or absence of seeds of unwanted species (ISTA (International Seed Testing Association) rules, chapter 4).

It is therefore possible to implement a method as described above, characterized in that

i) the DNA of the sublots is also sequenced using primers specific for one or more species different from those of the seeds present in the sublot, at the same time as the sequencing of step b)

ii) the presence of seeds of different species is determined qualitatively for each sub-lot, in the event of detection of genes belonging to said species (presence / absence of genes specific for other species)

iii) the quantity of exogenous seeds in the overall lot is determined by the compilation of the qualitative results obtained for all of the sublots in ii).

In this embodiment, we are looking in particular for the presence of weeds as a different species. In particular, we are looking for the presence of seeds of Aeginetia, Alectra, Orobanche and Striga. The presence of sclerotia will also be regularly checked.

Steps ii) and iii) are carried out in the same manner as described above. The lot generally does not contain many seeds from other species and, by choosing the number of seeds in the sublots adequately, the qualitative information in ii) can be used to obtain quantitative information. Thus, if it is known that at most 1% of the seeds present come from a species other than the species of interest (the case generally observed in commercial seed lots, for which at least 99% of the seeds are of the species of interest), sublots containing about 100 seeds (between 80 and 120 seeds) are used.

The method described above can also be used to detect the presence of pathogens in the seed lot (contamination) (see ISTA (International Seed Testing Association) rules, chapter 7). For example, the proportion of sunflower seeds contaminated with Botrytis that is tolerated for the marketing of a batch of sunflower seed is 5%.

It is thus also possible to implement a method described above, by further carrying out the following steps:

i) sequencing the DNA or cDNA included in the sublots using primers specific for pathogenic species, at the same time as the sequencing of step b)

ii) the presence or absence of DNA of the pathogenic species is determined for each sub-lot in the event of detection of sequences belonging to said pathogenic species

iii) it is concluded as to the contamination of the batch as a function of the presence of sequences belonging to said pathogenic species.

A gene from any pathogen, such as a bacterium, a fungus, a virus or an insect, can be sequenced. This method is particularly suitable for detecting the presence of Xanthomonas campestris pv. campestris in seeds of Brassica (ISTA rule 7-019a: Detection of Xanthomonas campestris pv. campestris in Brassica spp. seed; or Berg, Plant Pathology (2005) 54, 416-427). A PCR test for the identification of a pathogen on seed exists for downy mildew on sunflower (Ioos et al., Plant Pathology (2007) 56, 209-218). It has the advantage of detecting a pathogen on the seed even though the presence of this pathogen on the seed does not cause any symptoms, especially at the very low levels sought. This protocol specifies primers; carrying out sequencing rather than visualization on a gel will allow better accuracy. Identification of Clavibacter michiganensis on tomatoes can also be done (Hadas et al., Plant Pathology (2005) 54, 643-649).

In order to implement the methods described above, the following steps can be carried out, before step b)

i) DNA is extracted from each seed sublot

ii) The RNA is extracted from each seed sublot and a reverse transcription of this RNA into cDNA is carried out.

iii) The DNA extracted in i) and the cDNA obtained in ii) are mixed

iv) Optionally, an amplification is carried out on the DNA obtained in iii), either specific to certain loci or non-specific

v) The DNA obtained in iii) or the amplification products obtained in iv) are used as a template for carrying out the sequencing step.

In one embodiment, steps i) and ii) can be carried out simultaneously, the extraction of DNAs and RNAs being able to be carried out in particular by means of the total DNA, RNA and protein isolation kit NucleoSpin® TriPrep from Macherey-Nagel.

Thus, in a preferred embodiment, step iv) is carried out by amplifying specific sequences of the genes (in particular other organisms) of which it is desired to verify the absence or the presence. We are therefore trying to determine if these other organisms are present in quantities lower than the tolerated rates for marketing. It is thus possible to detect the presence in particular of viral sequences. We can also make a non-specific amplification of the entire DNA of the genome.

In another embodiment, step iv) can also be carried out by amplifying specific sequences making it possible to determine certain agronomic properties of the seeds of the sublot, at least one agronomic property of the seeds being notably chosen from the state of dormancy, in particular the quality of the priming, the aptitude for germination, the vigor and the viability of the seeds.

In one embodiment, the method contains the steps:

i) in addition to isolating the DNA, an RNA extraction is also carried out from the seeds of the sublot, and a reverse transcription of this RNA into cDNA before step b)

ii) sequencing of this cDNA is carried out using primers specific for genes linked to an agronomic property of the seeds, at the same time as the sequencing of step b) is carried out

iii) for each sublot, the presence of seeds having the agronomic property is determined qualitatively, in the event of detection of cDNAs relating to the genes specific to the agronomic property of the seeds during the sequencing step ii) (presence / absence of cDNA)

iv) the quantity of seeds exhibiting this agronomic character in the overall lot is determined by the compilation of the qualitative results obtained for all of the sublots in iii).

Generally, the agronomic property of the seeds is chosen from the dormant state, in particular the quality of the priming, the aptitude for germination, the vigor and the viability of the seeds. Several agronomic properties can also be sought by sequencing suitable genes.

The gene which marks the physiological state and the agronomic property of the seeds is chosen from the genes which are expressed, in the seeds, at the same time as the undesired agronomic character (dormancy, lack of vigor, etc.). Thus, we want an absence of expression of this gene and we generally wish that the expression of this gene is not present in more than 10% of the seeds in the seed lot.

In a preferred embodiment, in the implementation of the varietal purity analysis (do the seeds present contaminants, i.e. unwanted alleles, at the loci of interest?), one can identify the contaminant(s) present in the seed lot.

For each subsample, it is possible to define a molecular profile corresponding to the compilation of data from each locus of interest. The profile of each subsample can then be compared to the expected molecular profile, and a contaminating molecular profile can be deduced by subtraction. Thus, a locus of interest with no alternative allele will be considered identical between the expected variety and the contaminant, while a locus with an alternative allele will be defined as potentially homozygous for the alternative allele, or heterozygous (expected allele / alternative allele).

These contaminant molecular profiles can then be compared to a reference database in order to identify the nature of the contaminant, and possibly when it entered the production cycle.

Thus, a method of identifying the contaminant is envisaged, which implements the method as described above, and which further comprises the steps consisting in

i) define the molecular profile of the contaminant of each contaminated sublot by comparison of the profile observed in this sublot with the profile expected in the absence of contaminant, and

ii) compare the profile obtained in i) with those of a reference database.

Alternatively, a method of determining the degree of purity as defined above is envisaged, characterized in that the contaminant is further identified for each contaminated sublot by

i) deducing the molecular profile of the contaminant in a contaminated sublot by comparison of the profile observed in this sublot with the profile expected in the absence of contaminant, and

ii) comparing the profile obtained in i) with those of a reference database.

One or more contaminant profiles are therefore obtained for the starting seed lot, corresponding to the sum of the contaminants of each contaminated sublot.
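The deduction of a contaminant profile by subtraction, followed by the comparison with a reference database, can be sketched as follows. This is only an illustration under stated assumptions: a single contaminant per sublot, alleles represented as single letters, and a hypothetical in-memory database (the names `line_X` and `line_Y`, the loci and the alleles are invented for the example).

```python
def contaminant_profile(expected, observed):
    """Deduce, per locus, the candidate genotypes of the contaminant.

    expected: locus -> allele expected for the variety
    observed: locus -> set of alleles detected in the sublot
    """
    profile = {}
    for locus, exp_allele in expected.items():
        alternatives = observed[locus] - {exp_allele}
        if not alternatives:
            # No alternative allele: contaminant taken as identical here.
            profile[locus] = {(exp_allele, exp_allele)}
            continue
        candidates = set()
        for alt in alternatives:
            candidates.add((alt, alt))                        # homozygous alternative
            candidates.add(tuple(sorted((exp_allele, alt))))  # heterozygous
        profile[locus] = candidates
    return profile


def matching_accessions(profile, database):
    """Names of reference genotypes compatible with the deduced profile."""
    return [
        name for name, genotype in database.items()
        if all(tuple(sorted(genotype[locus])) in candidates
               for locus, candidates in profile.items())
    ]


expected = {"SNP1": "A", "SNP2": "C", "SNP3": "G"}
observed = {"SNP1": {"A"}, "SNP2": {"C", "T"}, "SNP3": {"G", "A"}}
database = {  # hypothetical reference genotypes
    "line_X": {"SNP1": ("A", "A"), "SNP2": ("T", "T"), "SNP3": ("A", "A")},
    "line_Y": {"SNP1": ("G", "G"), "SNP2": ("T", "T"), "SNP3": ("A", "A")},
}
profile = contaminant_profile(expected, observed)
candidates = matching_accessions(profile, database)
```

An empty result from `matching_accessions` would correspond to a contaminating genotype that cannot be identified in the reference database.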

The methods described above therefore make it possible to carry out a quality control of seed lots on several different traits (varietal purity, specific purity, agronomic characteristics, contamination by pathogens) in a single step, while quantifying the presence of some of the unwanted traits or contaminants. Furthermore, these methods allow the precise determination of the nature of the contaminants present, thanks to the use of sequencing, which gives precise and easily usable information, including the determination of the presence of SNPs (Single Nucleotide Polymorphisms) which could not be detected by other methods (probes, amplifications, DNA chips). These methods therefore provide high precision in the characterization of the batch of seeds tested. They are also quick and easy to implement and therefore save time and reduce the costs of analyzing seed lots. Thus, these methods simplify the analyses of specific purity, which are today carried out tediously by operators. They also make it possible to quickly test for and reveal the presence of a large number of pathogens (and also to characterize their genotype according to the genes sequenced), which is currently done by growing any pathogens present. The agronomic character of the lot (and in particular everything related to germination and vigor) can be determined by the presence of the expression of unfavorable genes, rather than by germination of seed samples, which saves time and resources.

Thus, the methods described make it possible to improve the precision of the control of seed batches, in particular when they are combined. These same methods can also be transposed to the study of the conformity of material marketed in the form of plants (species with vegetative multiplication). The material evaluated will then consist of samples of plant tissue, the amount of which will be equivalent from one plant to the other; this plant tissue could be, among other things, a leaf disc.

DESCRIPTION OF THE FIGURES

Figure 1: result of the Taqman analysis for a SNP, comprising two allelic forms detected respectively by the FAM and VIC fluorochromes, in samples of maize homozygous (A, B) or heterozygous for SNP (C). A: homozygous sample for the allelic form detected in FAM. B: homozygous sample for the allelic form detected in VIC. C: heterozygous sample for the allelic forms detected in FAM and VIC.

Figure 2: Relative frequency, in each sub-lot, of the alternative allele for SNP10. Sub-lots 3, 14 and 16 show a significant frequency of the alternative allele.

Figure 3: Qualitative profile (presence / absence of a contaminating allele): presence of an alternative allele for the 17 markers (rows) (16 discriminating markers and one marker associated with a trait) within the 16 sublots (columns). The presence of an alternative allele is detected for at least 3 SNPs in sub-lots 3, 14 and 16. These sub-lots are declared contaminated. The other 13 sublots are declared uncontaminated.

Figure 4: molecular profiles obtained for the 17 SNPs (16 discriminating markers and one marker associated with a trait) on the 16 sublots analyzed. The profile of the first line corresponds to the majority profile, the following profiles to the contaminated profiles observed for sublots 3, 14 and 16 respectively.

EXAMPLES

Example 1: Detection of contaminants by Taqman

This example evaluates the possibility of detecting a contaminating seed in a sub-batch of corn seeds, by genotyping using Taqman technology (Applied Biosystem).

FIG. 1 shows the result of the Taqman analysis for an SNP, comprising two allelic forms detected respectively by the fluorochromes FAM and VIC, in samples of corn homozygous or heterozygous at the SNP, and highlights the presence of signal with the FAM probe in a sample homozygous for the VIC allele (B), that is to say a non-specific signal, which does not allow a false positive signal to be distinguished from a signal linked to actual contamination in a sample.

These results show that the Taqman method does not make it possible to detect contaminants reliably.

Example 2: Detection of contaminants by genotyping on a chip

In this example, lots of 200 seeds from a line A containing 10%, 20%, 30%, 40%, and up to 90% of contaminants by a line B were produced and a sample of 15 seeds from this batch was analyzed by genotyping on an Infinium chip (Illumina), in order to assess the feasibility of identifying a contamination. We manage to detect contaminations greater than 10%, but mixtures containing 10% of contamination cannot be distinguished from uncontaminated controls. A fortiori, less significant contaminations will not be detectable.

Example 3: implementation of the method according to the invention on a set of markers

In this example, a set of 16 discriminating markers (SNPs) was used, which unambiguously identify the presence of a variety other than that expected. This set of 16 markers has been defined from reference genotyping data on several thousand markers for the varieties of interest, and makes it possible to differentiate each one from the others thanks to at least 3 discriminating markers. In this case, it is the overall molecular profile of the 16 markers that determines the identity of each variety. Each marker is specific to a locus of interest.

In an experiment under controlled contamination conditions, 24 seeds of a pure L1 line were introduced into a batch of 2376 seeds of a pure L2 line; the batch thus obtained has a purity rate of 99%. The seeds were randomly distributed into twenty-four sub-lots of 100 grains (i.e. 2400 grains analyzed). Each sub-lot thus obtained is ground independently and the DNA is extracted from the ground material. Thus, there is on average 1 contaminant per sub-lot: the number of sublots is indeed equal to the number of contaminants present in the complete batch of seeds. Because the sublots are formed by random sampling, however, some sublots will contain no contaminant and others will contain several.
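The effect of random sampling described above can be quantified with a hypergeometric calculation. The following is a minimal sketch using only the figures given in this example (2400 seeds, 24 contaminants, 24 sublots of 100 seeds); the variable names are illustrative.

```python
from math import comb

LOT_SIZE = 2400       # 2376 seeds of L2 + 24 seeds of L1
CONTAMINANTS = 24
SUBLOT_SIZE = 100
N_SUBLOTS = 24

# Probability that a given sublot of 100 seeds holds no contaminant
# (hypergeometric: all 100 seeds come from the 2376 conforming seeds).
p_clean = comb(LOT_SIZE - CONTAMINANTS, SUBLOT_SIZE) / comb(LOT_SIZE, SUBLOT_SIZE)

# Expected number of sublots containing at least one contaminant.
expected_contaminated = N_SUBLOTS * (1 - p_clean)

print(f"P(sublot without contaminant) = {p_clean:.3f}")
print(f"expected contaminated sublots = {expected_contaminated:.1f} of {N_SUBLOTS}")
```

Roughly a third of the sublots are therefore expected to be free of contaminant even though there are as many contaminants as sublots.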

For each of the 16 markers, an amplicon of 70 to 120bp was defined, and the 16 markers co-amplified by multiplex PCR. A unique index (TAG) is used for each DNA sample, allowing sequencing of all the amplicons and assigning the sequences obtained to their original batch.
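The principle of assigning sequences to their original sub-lot through the index can be sketched as below. This is a simplified illustration, assuming that the index is read as a fixed-length prefix of each read; in practice the index is usually read separately and handled by the sequencer software. The tag sequences, read sequences and sublot names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sublot, tag_length):
    """Group reads by sublot using the index found at the start of each read."""
    by_sublot = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sublot = tag_to_sublot.get(tag)
        if sublot is not None:  # reads carrying an unknown tag are dropped
            by_sublot[sublot].append(insert)
    return dict(by_sublot)


tags = {"ACGT": "sublot_1", "TGCA": "sublot_2"}  # hypothetical index sequences
reads = ["ACGTGGATCC", "TGCAGGATCT", "ACGTGGATCT", "NNNNGGATCC"]
assigned = demultiplex(reads, tags, tag_length=4)
```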

The amplicons were sequenced by Illumina technology on a MiniSeq sequencer. Paired sequences of 75 bases were generated and assigned to the original DNAs by a demultiplexing step. After removal of adapter sequences and poor-quality bases (threshold Q30), each pair of sequences is assembled into a single sequence, then aligned with the reference corn genome (RefGenV4). For each SNP, the relative allelic frequencies of the majority allele and the alternative allele were calculated; they correspond to the number of reads containing the allele of interest relative to the sum of the reads of each allele.

It is considered that there is contamination for an SNP marker if, in a sublot, the frequency of the sequence of an allelic form other than the allele expected for the variety tested is above the background noise.

A sample is declared contaminated when it contains at least 3 SNPs for which an alternative allele is detected. Thus, it is concluded that, among these 24 sublots, 13 are considered to be contaminated and 11 to be pure.
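The decision rule for a sublot can be sketched as follows. The rule of at least 3 SNPs comes from the text; the numerical background-noise threshold, the read counts and the function names are assumptions made for illustration only, since the source does not give a threshold value.

```python
NOISE_THRESHOLD = 0.002  # assumed background-noise level, not from the source
MIN_SNPS = 3             # decision rule stated in the text


def alternative_frequency(expected_reads, alternative_reads):
    """Reads of the alternative allele relative to the reads of both alleles."""
    total = expected_reads + alternative_reads
    return alternative_reads / total if total else 0.0


def is_contaminated(counts, threshold=NOISE_THRESHOLD, min_snps=MIN_SNPS):
    """counts: SNP name -> (reads of expected allele, reads of alternative allele).

    Returns (decision, list of SNPs whose alternative allele exceeds the threshold).
    """
    flagged = [snp for snp, (exp, alt) in counts.items()
               if alternative_frequency(exp, alt) > threshold]
    return len(flagged) >= min_snps, flagged


# Hypothetical read counts for one sublot.
sublot = {"SNP1": (9900, 100), "SNP2": (9950, 50),
          "SNP3": (9990, 10), "SNP4": (9920, 80)}
contaminated, flagged_snps = is_contaminated(sublot)
```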

The number of contaminated sublots makes it possible to estimate the varietal purity of the analyzed batch. This calculation is carried out using the SeedCalc software, which uses the formulas of Remund (2001). In this example, the estimated purity is 99.22% (98.64% - 99.6%), for an actual controlled purity of 99%.

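$$P = 1 - \left(1 - \frac{d}{n}\right)^{1/m}$$

where d is the number of contaminated sublots, n the total number of sublots and m the number of seeds per sublot.

In the above case: $1 - (1 - 13/24)^{0.01} = 1 - 0.9922 = 0.0078$, i.e. a purity of 99.22%. The confidence interval is also calculated according to the methods described in Remund 2001.

Example 4: Identification of the contaminant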

In this example, lots of basic corn seeds were analyzed using the same approach as that cited in Example 3. For one lot, 16 sublots of 100 seeds were made.

The seeds of each sublot were crushed and the DNA extracted. A set of 17 markers, including 16 discriminating SNPs (allowing unambiguous identification of the presence of a variety other than that expected) and a marker associated with a trait, has been identified. For each marker, an amplicon of 70-120bp was defined, and the 17 markers were co-amplified by multiplex PCR. A unique index (Tag) is used for each DNA sample, allowing sequencing of all the amplicons and assigning the sequences obtained to their original batch.

The amplicons were sequenced by Illumina technology on a MiniSeq sequencer. Paired sequences of 75 bases were generated and assigned to the original DNAs by a demultiplexing step. After removal of adapter sequences and poor-quality bases (threshold Q30), each pair of sequences is assembled into a single sequence, then aligned with the reference corn genome (RefGenV4). For each SNP, the relative allelic frequencies of the majority allele and the alternative allele were calculated; they correspond to the number of reads containing the allele of interest relative to the sum of the reads of each allele.

Figure 2 shows, for one SNP (SNP10), the frequency of the alternative allele in each of the sub-lots (i.e. the frequency of appearance of the sequence of the alternative allele). In this example, sublots 3, 14 and 16 show a significant presence of the alternative allele (above the background noise represented by the horizontal line). This analysis is performed for each SNP, and Figure 3 shows the qualitative profile (presence / absence of the alternative allele) obtained for each SNP in each sublot. It confirms the presence of an alternative allele for at least 3 SNPs in sub-lots 3, 14 and 16. These 3 sub-lots are declared contaminated. The other 13 sublots are declared uncontaminated. The varietal purity rate estimated with SeedCalc is 99.79% (95% confidence interval: 99.39% - 99.96%).
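The point estimate can be reproduced with a few lines of code. This is a minimal sketch, assuming the estimator has the form implied by the worked case of Example 3, namely impurity = 1 - (1 - d/n)^(1/m), with d contaminated sublots out of n sublots of m seeds; the function name is illustrative and this is not the SeedCalc implementation.

```python
def purity_estimate(contaminated_sublots, n_sublots, seeds_per_sublot):
    """Point estimate of purity from the number of contaminated sublots."""
    if contaminated_sublots >= n_sublots:
        raise ValueError("every sublot is contaminated: the rate cannot be estimated")
    # Impurity = 1 - (1 - d/n)^(1/m); purity is its complement.
    impurity = 1 - (1 - contaminated_sublots / n_sublots) ** (1 / seeds_per_sublot)
    return 1 - impurity


print(f"{purity_estimate(3, 16, 100):.2%}")   # Example 4: 3 of 16 sublots
print(f"{purity_estimate(13, 24, 100):.2%}")  # Example 3: 13 of 24 sublots
```

The estimator is undefined when every sublot is contaminated, which is one reason for matching the sublot size to the expected contamination rate.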

In parallel, the same batch was analyzed on 558 individual seeds. For each seed, a fragment is taken by punching the embryo with a punch tool, then the DNA is extracted and genotyping is carried out with KASP technology (LGC Genomics) on the 16 discriminating markers. This analysis gives an estimated purity of 99.46% (95% confidence interval: 98.42% - 99.89%).

The SNP17 marker was analyzed separately and used to estimate the purity of the associated trait.

Figure 3 shows that, for this marker, sublots 3 and 16 have a significant frequency of the alternative allele. These 2 sublots are declared contaminated, leading to an estimate of the trait purity of 99.87% (95% confidence interval: 99.52% - 99.98%).
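A confidence interval of this kind can be sketched as follows. The source only refers to Remund (2001) for the interval; the sketch below assumes an exact binomial (Clopper-Pearson) interval on the proportion of contaminated sublots, converted to the seed level with the same transformation as the point estimate. That choice, and all function names, are assumptions for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_decreasing(f, target, lo=0.0, hi=1.0, iterations=60):
    """Bisection for f(p) = target, where f decreases as p increases."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def purity_interval(d, n, m, level=0.95):
    """Interval for purity from d contaminated sublots out of n, m seeds each."""
    alpha = 1 - level
    # Exact binomial bounds on the probability that a sublot is contaminated.
    lower = 0.0 if d == 0 else solve_decreasing(
        lambda p: binom_cdf(d - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if d == n else solve_decreasing(
        lambda p: binom_cdf(d, n, p), alpha / 2)
    # Convert sublot-level bounds to seed-level purity bounds.
    return (1 - upper) ** (1 / m), (1 - lower) ** (1 / m)


low, high = purity_interval(2, 16, 100)
print(f"{low:.2%} - {high:.2%}")
```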

The molecular profile identified on the uncontaminated sublots is first used to check its compliance with the expected profile for the variety analyzed (the previous step allows you to check the varietal purity of the batch, this step allows you to check that the variety identified is the one expected). Then, on sub-lots 3, 14 and 16 showing contamination, a contaminating molecular profile is deduced from the observed molecular profile, by subtraction from the expected profile. For each SNP marker showing contamination, the 2 alleles observed are reported (Figure 4). The contaminant can thus be homozygous for the minority allele, or heterozygous.

Each contaminating molecular profile is then compared to a reference database in order to identify it. If this genotype corresponds to a known accession, this is proposed as a potential contaminant, otherwise the contaminating genotype is declared unidentifiable.

This reference database can be refined, in particular according to the production plan: it will then contain, as a priority, all of the varieties grown in the line production sector. In this context, a contaminant that does not appear in this reference database will be qualified as a contaminant linked to the post-harvest process.

Example 5: Implementation of the method for the simultaneous evaluation of the varietal purity and the germinative quality of a batch of seeds

In this example, 16 sub-lots of 100 seeds are formed, so as to evaluate the seed lot on a sample of 1600 seeds. From each sublot, the DNAs and the RNAs are co-extracted.

For this, each sublot is mechanically ground in a tube by the addition of stainless steel balls, the tubes and the grinding support being previously cooled in liquid nitrogen in order to preserve the integrity of the nucleic acids, in particular RNA. Co-extraction of DNA and RNA is carried out using the NucleoSpin® TriPrep total DNA, RNA and protein isolation kit from Macherey-Nagel. In a first step, a lysis buffer is added to the ground material, making it possible to destroy the cellular structures and simultaneously inactivate enzymes such as RNases. The lysates are then deposited on columns containing a silica membrane to which the DNA and RNA molecules bind. A first elution in a specific buffer makes it possible to elute the DNA while keeping the RNA bound to the silica membrane. After treatment with DNase to degrade the residual DNA, the RNA is washed and then eluted in RNase-free water.

For each sub-lot, a reverse transcription is carried out, initiated with oligo-dT oligonucleotides making it possible to synthesize the double-stranded DNA complementary to the messenger RNA present in each sample. A DNA mixture is then constituted for each sub-lot, composed of the genomic DNAs extracted and the cDNAs synthesized from the RNA fraction.

A multiplex PCR is carried out on each DNA sample in order to specifically amplify the targets of interest in the form of amplicons of 70 to 120 bp. These amplicons correspond, on the one hand, to the genomic regions of interest for determining the molecular profile of varietal identification (set of discriminating SNPs) and, on the other hand, to the DOG1 gene, a marker of the dormant state of the seeds. A unique index (TAG) is used for each DNA sample, thus making it possible to sequence all the amplicons together and to attribute the sequences obtained to their original sublot. The amplicons are sequenced by Illumina technology, generating paired sequences of 75 bases each. These sequences are then assigned to the original DNAs by a demultiplexing step, then processed to remove adapter sequences and poor-quality bases (threshold Q30). Each pair of sequences is finally assembled into a single sequence, then aligned with the sequence of the reference genome.

For each SNP, the relative allelic frequencies of the majority allele and the alternative allele are calculated; they correspond to the number of reads containing the allele of interest relative to the sum of the reads of each allele. It is considered that there is contamination for an SNP marker if, in a sublot, the frequency of the sequence of an allelic form other than the allele expected for the variety tested is above the background noise. A sample is declared contaminated when it contains at least 3 SNPs for which an alternative allele is detected. The number of contaminated sublots makes it possible to estimate the varietal purity of the batch analyzed. This calculation is carried out using the SeedCalc software, which uses the formulas of Remund (2001).

With regard to the DOG1 gene, a sublot is considered to contain a dormant seed if sequences specific to the transcript of this gene are detected in quantities significantly above the background noise, the expression of this gene being negligible in non-dormant seeds. This significance threshold is determined beforehand using a standard range. The dormancy rate is then estimated by counting the number of sublots for which expression of the DOG1 gene is detected, using the calculation method used previously.

Claims

1. Method for determining the quantity of contaminants at at least one locus of interest, present in a seed lot of a variety of interest, characterized in that

a) the seeds of a seed lot are grouped into sub-lots of at least 10 seeds, the number of sub-lots thus obtained being greater than or equal to 10,

b) a targeted sequencing is carried out, for each sub-lot, of at least the region of the seed genome containing the locus of interest,

c) the presence of a contaminant is qualitatively determined for each sub-lot, in the event of detection of an alternative allele to the expected allele(s) for each sequenced genomic region (presence / absence of the expected allele(s))

2. Method according to claim 1, characterized in that the sequencing of step b) is carried out on the DNA extracted from the seeds present in a sublot, the region of the genome of the seeds containing the locus of interest being optionally amplified.

3. Method according to claim 1 or 2, characterized in that steps b), c) and d) are carried out for several regions of the genome corresponding to several loci of interest.

4. Method according to claim 3, characterized in that a subset of these loci of interest is sufficient to identify the variety of interest.

5. Method according to claim 4, characterized in that a batch is declared as containing a contaminant if an alternative allele to the expected allele(s) is observed for a single locus of interest.

6. Method according to claim 4, characterized in that a batch is declared as containing a contaminant if an alternative allele to the expected allele(s) is observed for more than one locus of interest.

7. Method according to one of claims 1 to 6, characterized in that at least one locus of interest is linked to a character of interest (trait).

8. Method according to claim 3, characterized in that a combination of loci is linked to characters of interest (traits).

9. Method according to claim 3, characterized in that a combination of loci is linked to a character of interest (trait).

10. Method according to one of claims 1 to 9, characterized in that at least one locus of interest is linked to a specific trait a priori not present in the seeds of the batch, in order to detect the fortuitous presence of this trait.

11. Method according to claim 10, characterized in that the batch is considered to be non-compliant if the trait frequency is greater than 10% in the seed batch.

12. Method according to one of claims 1 to 11, characterized in that

i) an extraction of RNA is also carried out from the seeds of the sublot, and a reverse transcription of this RNA into cDNA, before step b)

ii) a sequencing of this cDNA is carried out using primers specific for genes linked to an agronomic property of the seeds, at the same time as the sequencing of step b) is carried out

iii) the presence of seeds having the agronomic property is determined qualitatively, for each sub-lot, in the event of detection of cDNAs relating to genes specific for the agronomic property of the seeds during the sequencing step ii) (presence / absence of the cDNA)

iv) the quantity of seeds with this agronomic character in the overall lot is determined by the compilation of the qualitative results obtained for all of the sublots in iii).

13. Method according to claim 12, characterized in that the agronomic property of the seeds is chosen from the state of dormancy, the quality of priming, the aptitude for germination, the vigor and the viability of the seeds.

14. Method according to one of claims 1 to 13, characterized in that i) a DNA sequencing of the sublots is carried out using primers specific for one or more species different from those of the seeds present in the sublot, at the same time as the sequencing of step b) is carried out

15. Method according to claim 14, characterized in that at least one different species is a weed.

16. Method according to one of claims 1 to 15, characterized in that

17. Method according to claim 16, characterized in that the pathogenic species is a bacterium, a fungus, a virus or an insect.

18. Method according to one of claims 1 to 17, characterized in that, before step b)

i) DNA is extracted from each seed sublot

iii) The DNA extracted in i) and the cDNA obtained in ii) are mixed

iv) Optionally, an amplification is carried out on the DNA obtained in iii), specific for certain loci or non-specific

19. The method of claim 18, characterized in that step iv) is carried out by amplifying specific sequences of other organisms whose absence or presence is to be verified.

20. The method of claim 18 or 19, characterized in that step iv) is carried out by amplifying specific sequences making it possible to determine certain agronomic properties of the seeds of the sublot.

21. Method according to claim 20, characterized in that at least one agronomic property of the seeds is chosen from the dormant state, the quality of the priming, the aptitude for germination, the vigor and the viability of the seeds.

22. Method according to one of claims 1 to 21, characterized in that the quantity of seeds in each sublot prepared in step a) is between 80 and 120.

23. Method according to one of claims 1 to 22, characterized in that the quantity of seeds in each sublot prepared in step a) is between 15 and 25.

24. Method according to one of claims 1 to 23, characterized in that the contaminant is further identified for each sublot contaminated with

ii) Comparing the profile obtained in i) with those of a reference database.