KR102270719B1 - Method for preparing a reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria using the same

Publication number
KR102270719B1
Authority
KR
South Korea
Prior art keywords
gcf
species
lactic acid
acid bacteria
strain
Prior art date
Application number
KR1020180108016A
Other languages
Korean (ko)
Other versions
KR20200029689A (en)
Inventor
조서애
곽우리
설동혁
장지성
김혜강
김희발
곽효선
김순한
이우정
Original Assignee
주식회사 이지놈
대한민국 (식품의약품안전처장)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 이지놈, 대한민국 (식품의약품안전처장)
Priority to KR1020180108016A (patent KR102270719B1)
Priority to PCT/KR2019/011665 (WO2020055076A1)
Publication of KR20200029689A
Application granted
Publication of KR102270719B1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for preparing a reference sequence for identifying lactic acid bacteria and to a method for identifying lactic acid bacteria using that reference sequence. With this method, two or more species of lactic acid bacteria present in a sample can be detected simply and accurately, so the method can be used effectively to identify lactic acid bacteria.

Description

Method for preparing a reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria using the same

The present invention relates to a method for preparing a reference sequence for identifying lactic acid bacteria and to a method for identifying lactic acid bacteria using the same. More particularly, it relates to a method for preparing a reference sequence by selecting a representative strain for each species of lactic acid bacteria and generating the sequence information of those strains as a multi-FASTA file, and to an identification method that uses this reference sequence to simply and accurately detect two or more species of lactic acid bacteria present in a sample.

As interest in health has grown in recent years, a wide range of health functional foods has come onto the market. Among them, the probiotics market is growing rapidly; as of 2016 it overtook vitamins and minerals in market share for the first time. Safe management of the manufacture and distribution of health functional foods requires accurate confirmation, through continuous collection and inspection, of the bacteria that a product's label declares as raw materials. However, recent probiotic products combine various lactic acid bacteria rather than a single bacterium as raw materials, which makes it very difficult to characterize their composition accurately.

Before sequencing technology advanced and whole-genome information on microorganisms accumulated as it has today, experimental techniques were used to classify and identify microorganisms. Among these, DNA-DNA hybridization (DDH) is still regarded as the standard. DDH exploits the property that a single strand of DNA forms complementary base pairs with another specific base sequence under defined conditions, and a DDH value of 70% was used as the criterion for deciding whether two organisms belong to the same species. However, now that genomic information is being produced in large volumes, DDH, which is relatively slow and experimentally laborious, is no longer suitable for microbial classification and identification.

Later, a method emerged in which the 16S rRNA gene is amplified by PCR, which is relatively easier than a DDH experiment, and sequenced to calculate similarity, and this similarity is used as an index of relatedness between two strains to distinguish species. The species criterion corresponding to 70% DDH is 97% 16S rRNA similarity. Although widely used, this method is still unsuitable for identifying microorganisms in samples that contain a mixture of bacteria, such as probiotic products. If the 16S rRNA gene sequence, typically about 1600 bp long, is sequenced in one pass as a single read, the error rate with current technology exceeds 10%, which is inadequate for verifying the 97% species criterion. In addition, because several bacteria are mixed together, assembling short reads is not possible either.

As next-generation sequencing (NGS) became widespread, whole genomes of microorganisms became easy to obtain, and in silico methods for microbial identification emerged. Average nucleotide identity (ANI) is obtained by cutting the genome sequence of the organism being compared into 1020 bp fragments and calculating the identity of the fragments that show high similarity to each other; a value of 95% is used as the criterion for deciding whether two organisms belong to the same species. However, because this approach also pairs the organism being compared, by alignment, with a bacterium whose information is already known and calculates the reference sequence coverage, it cannot be used to identify samples in which several lactic acid bacteria are mixed.
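As a rough illustration of the ANI idea described above, the following Python sketch cuts a genome into 1020 bp fragments and averages the identities of the fragments that found a match. The per-fragment identity values are assumed to come from an external aligner that is not shown here; the function names, the `min_identity` filter, and the use of "greater than or equal to" for the 95% criterion are illustrative assumptions, not details taken from the patent.

```python
from statistics import mean

FRAGMENT_LEN = 1020     # fragment size stated in the text
SPECIES_CUTOFF = 95.0   # ANI (%) criterion for "same species" stated in the text


def fragment_genome(sequence, size=FRAGMENT_LEN):
    """Cut a genome into consecutive fragments of `size` bp.

    A trailing piece shorter than `size` is dropped (an assumption).
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


def average_nucleotide_identity(identities, min_identity=0.0):
    """Average the percent identities of fragments that matched.

    `identities` holds one value per fragment, with None meaning that
    no match was found. `min_identity` is a hypothetical filter that
    keeps only the high-similarity matches.
    """
    kept = [x for x in identities if x is not None and x >= min_identity]
    if not kept:
        return 0.0
    return mean(kept)


def same_species(ani, cutoff=SPECIES_CUTOFF):
    """Apply the 95% criterion (">=" is an assumption)."""
    return ani >= cutoff
```

Because each call compares one query genome against one known genome, this kind of calculation presupposes an isolated organism, which is the limitation the paragraph above points out for mixed samples.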

Most existing methods for identifying microorganisms analyze a single species and are therefore unsuitable for metagenome samples in which several bacteria are mixed. It is experimentally difficult to isolate individual species from a metagenome sample one by one, and a mixture of unknown bacteria cannot be compared against a single species.

Accordingly, the present inventors selected a representative strain for each species and confirmed that detection performance is excellent when the lactic acid bacteria contained in a sample are identified using a reference sequence generated from those strains.

Accordingly, an object of the present invention is to provide a method for preparing a reference sequence for identification of lactic acid bacteria.

Another object of the present invention is to provide a method for identifying lactic acid bacteria.

The present invention relates to a method for preparing a reference sequence for identifying lactic acid bacteria and to a method for identifying lactic acid bacteria using the same. With the method according to the present invention, two or more species of lactic acid bacteria present in a sample can be identified simply and accurately.

Hereinafter, the present invention is described in more detail.

One aspect of the present invention is a method for preparing a reference sequence for identification of lactic acid bacteria, comprising the following steps:

a representative strain selection step of selecting a representative strain for each species using whole-genome sequence data derived from lactic acid bacteria; and

a reference sequence generation step of generating the sequence information of the representative strains of each species as a multi-FASTA file.
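The reference sequence generation step above can be sketched as follows. This is a minimal Python illustration, assuming that each representative strain is available as a (name, sequence) pair already held in memory; the function name, the record naming scheme, and the 60-character line width are assumptions made for the example and are not specified in the patent.

```python
def to_multi_fasta(records, line_width=60):
    """Build multi-FASTA text from (name, sequence) pairs.

    Each record becomes a ">name" header line followed by the sequence
    wrapped at `line_width` characters.
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + line_width]
                     for i in range(0, len(seq), line_width))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


# Hypothetical usage: one record per species, named after its
# representative strain (the names below are placeholders).
reference = to_multi_fasta([
    ("species_1|representative_strain", "ACGTACGTAC"),
    ("species_2|representative_strain", "GGCCTTAA"),
])
```

Keeping one record per species, rather than one per sequenced strain, is what keeps the resulting file short enough to align against quickly.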

The representative strain selection step may be performed as follows:

a coverage calculation step of deriving, within each species, the minimum pairwise coverage between strains; and

a strain selection step of selecting, for each species, the strain whose minimum coverage is the largest among the strains.
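The two steps above amount to a max-min selection, which can be sketched in Python as below. The sketch assumes the pairwise coverage values have already been computed and stored as `pairwise[reference_strain][query_strain]`; that orientation, the function name, and the handling of single-strain species are assumptions made for illustration.

```python
def select_representative(pairwise):
    """Pick the strain whose minimum pairwise coverage is largest.

    `pairwise[ref][query]` is the coverage obtained when reads from
    `query` are aligned to `ref` (orientation is an assumption).
    Returns (strain, minimum_coverage); the minimum is None when the
    species has only one strain and so has no pairs to compare.
    """
    if not pairwise:
        raise ValueError("no strains given")

    min_cov = {}
    for ref, row in pairwise.items():
        # Ignore the strain's coverage against itself.
        others = [cov for query, cov in row.items() if query != ref]
        if others:
            min_cov[ref] = min(others)

    if not min_cov:
        return next(iter(pairwise)), None

    best = max(min_cov, key=min_cov.get)
    return best, min_cov[best]


# Hypothetical coverage values for three strains of one species.
example = {
    "A": {"A": 1.00, "B": 0.90, "C": 0.80},
    "B": {"A": 0.92, "B": 1.00, "C": 0.88},
    "C": {"A": 0.70, "B": 0.85, "C": 1.00},
}
representative, worst_case = select_representative(example)
```

In this made-up example the per-strain minima are 0.80 for A, 0.88 for B, and 0.70 for C, so B is chosen because even its least similar partner is covered well.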

As used herein, the term coverage (breadth of coverage) refers to a region defined relative to the reference sequence. Specifically, when reads are generated by randomly fragmenting the target sequence and are aligned to the reference sequence, the proportion of the reference sequence on which reads pile up is called coverage.

For example, higher coverage means that reads pile up over a wider region of the reference sequence, and therefore that the target sequence is highly similar to the reference sequence.

As used herein, the term depth (depth of coverage) refers to a value at a specific position relative to the reference sequence. Specifically, when reads are generated by randomly fragmenting the target sequence and are aligned to the reference sequence, the number of reads piled up at a specific position is called depth.

As used herein, the term 1-1 pairwise coverage refers to a method of calculating the coverage rate by aligning reads to a reference sequence.
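The difference between depth and breadth of coverage defined above can be made concrete with a short Python sketch. It assumes that read alignments are already available as 0-based, half-open (start, end) intervals on the reference; the function names and the interval representation are illustrative assumptions, and a real pipeline would obtain these positions from an aligner.

```python
def depth_per_position(ref_length, alignments):
    """Count how many aligned reads overlap each reference position.

    `alignments` is a list of (start, end) intervals, 0-based and
    half-open, clipped to the reference length.
    """
    depth = [0] * ref_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return depth


def breadth_of_coverage(depth):
    """Fraction of reference positions covered by at least one read."""
    if not depth:
        return 0.0
    covered = sum(1 for d in depth if d > 0)
    return covered / len(depth)


# Two overlapping reads on a 10 bp reference.
depth = depth_per_position(10, [(0, 4), (2, 6)])
breadth = breadth_of_coverage(depth)
```

Here positions 2 and 3 have depth 2 while the last four positions have depth 0, so depth varies by position but breadth is a single ratio for the whole reference (6 of 10 positions).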

The lactic acid bacteria may be two or more selected from the group consisting of bacteria, fungi, and viruses, but are not limited thereto.

If the reference sequence included every strain rather than only the representative strains, it would become very long and alignment would take a very long time, which can be inefficient. In addition, reads from one species would not pile up in one place but would be split across several places, so coverage would become very low. Furthermore, the number of strains with a published whole genome differs for each lactic acid bacterium, which makes it difficult to set a reference value for coverage.

Therefore, it is preferable to select, among the strains within a species, the strain with the largest minimum coverage, so that the strain most similar to all the other strains (that is, with the highest 1-1 pairwise coverage) is set as the representative strain before proceeding.

Another aspect of the present invention is a method for identifying lactic acid bacteria, comprising the following steps:

a representative strain selection step of selecting a representative strain for each species using whole-genome sequence data derived from lactic acid bacteria;

a reference sequence generation step of generating the sequence information of the representative strains of each species as a multi-FASTA file;

a reference value setting step of calculating the minimum pairwise coverage between the reference sequence and the sequence information of the representative strains of each species and setting it as a reference value; and

a sequence comparison step of calculating the pairwise coverage between the whole-genome sequence information of the lactic acid bacteria contained in a sample and the reference sequence.

The identification method of the present invention may be performed on two or more selected from the group consisting of bacteria, fungi, and viruses, but is not limited thereto. However, performing the method over an excessively broad range of species may take a long time, and distinguishing species may be difficult when the species to be identified are highly similar to one another.

The representative strain selection step may be performed as follows:

a coverage calculation step of deriving, within each species, the minimum pairwise coverage between strains; and

a strain selection step of selecting, for each species, the strain whose minimum coverage is the largest among the strains.

The method may further include a detection confirmation step of determining that the corresponding strain has been detected when the value derived in the sequence comparison step exceeds the reference value.

The method may detect two or more species of lactic acid bacteria contained in a sample simultaneously.
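The detection confirmation step can be sketched as a simple threshold comparison. The Python below is a minimal illustration, assuming that coverage has already been computed for each species record in the reference sequence and that a reference value is held per species; the text does not state whether the reference value is per species or global, so that structure, the function name, and the sample numbers are all assumptions.

```python
def detect_species(sample_coverage, reference_values):
    """Return the species whose coverage exceeds its reference value.

    `sample_coverage[species]` is the pairwise coverage of the sample
    reads against that species' representative record.
    `reference_values[species]` is the reference value set beforehand.
    Species without a reference value are skipped.
    """
    detected = []
    for species, coverage in sample_coverage.items():
        threshold = reference_values.get(species)
        # "Exceeds" is taken as a strict comparison.
        if threshold is not None and coverage > threshold:
            detected.append(species)
    return sorted(detected)


# Hypothetical values for a mixed sample.
found = detect_species(
    {"species_1": 0.91, "species_2": 0.12, "species_3": 0.80},
    {"species_1": 0.75, "species_2": 0.70, "species_3": 0.80},
)
```

Because every species record is checked in the same pass, several species can be reported for one sample, which matches the simultaneous detection described above.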

The present invention relates to a method for preparing a reference sequence for identifying lactic acid bacteria and to a method for identifying lactic acid bacteria using the same. With this method, two or more species of lactic acid bacteria present in a sample can be detected simply and accurately, so the method can be used effectively to identify lactic acid bacteria.

FIG. 1 is a schematic diagram showing the process of selecting a representative strain.
FIG. 2a shows the 1-1 pairwise ANI results for Bifidobacterium longum (B. longum), one of the lactic acid bacteria used to prepare the reference sequence in an example of the present invention.
FIG. 2b shows the 1-1 pairwise ANI results for Lactococcus lactis (Lc. lactis), one of the lactic acid bacteria used to prepare the reference sequence in an example of the present invention.
FIG. 2c shows the 1-1 pairwise ANI results for Lactobacillus paracasei (L. paracasei) and Lactobacillus casei (L. casei), among the lactic acid bacteria used to prepare the reference sequence in an example of the present invention.
FIG. 3 is a graph of the relative proportions of lactic acid bacteria in samples, measured by the identification method of the present invention.
FIG. 4a is a graph comparing the time required for each data volume, obtained by sampling real data containing 19 species of lactic acid bacteria, in a simulation performed to confirm detection performance.
FIG. 4b is a graph comparing the time required for each alignment option, using real data containing 19 species of lactic acid bacteria, in a simulation performed to confirm detection performance.
FIG. 5 is a coverage graph showing whether each species was detected in sample 053 according to an example of the present invention.

Hereinafter, the present invention is described in more detail with reference to the following examples. These examples serve only to illustrate the present invention, and the scope of the present invention is not limited by them.

Example 1: Selection of representative strains for each species of lactic acid bacteria

As shown in Table 1 below, whole-genome data were collected for 597 strains of 126 species in 9 genera, including 257 strains of 19 species in 3 genera of lactic acid bacteria notified by the Ministry of Food and Drug Safety.

Genus              # of species    # of strains
Lactobacillus      42              183
Bacillus           35              195
Bifidobacterium    17              70
Streptococcus      10              36
Enterococcus       7               43
Leuconostoc        7               18
Pediococcus        4               18
Lactococcus        3               33
Oenococcus         1               1
Total              126             597

Specifically, the strains corresponding to the lactic acid bacteria are listed in Table 2 below.

List of lactic acid bacteria
GCF_000195515.1, GCF_000196735.1, GCF_000204275.1, GCF_000221645.1, GCF_000242855.2, GCF_000262385.1, GCF_000494835.1, GCF_000508265.1, GCF_000833005.1, GCF_000835145.1, GCF_000973485.1, GCF_001483885.1, GCF_001586105.1, GCF_001593765.1, GCF_001593785.1, GCF_001596755.1, GCF_001705195.1, GCF_001874385.1, GCF_001889285.1, GCF_001922005.1, GCF_002173635.1, GCF_002209305.1, GCF_000165925.1, GCF_000830075.1, GCF_002173495.1, GCF_002243495.1, GCF_001721685.1, GCF_000177235.2, GCF_000009825.1, GCF_000737305.2, GCF_002250115.1, GCF_000169195.2, GCF_000217835.1, GCF_000832905.1, GCF_000876545.1, GCF_001039495.1, GCF_001870065.1, GCF_002250055.1, GCF_000972245.3, GCF_002024265.1, GCF_001719185.1, GCF_900093775.1, GCF_000011145.1, GCF_002157855.1, GCF_002276165.1, GCF_002109385.1, GCF_000706725.1, GCF_000008425.1, GCF_000011645.1, GCF_001596055.1, GCF_001726125.1, GCF_002074075.1, GCF_002074095.1, GCF_002074115.1, GCF_002074135.1, GCF_002173615.1, GCF_002173675.1, GCF_002174255.1, GCF_002236895.1, GCF_000025805.1, GCF_000025825.1, GCF_000225265.1, GCF_000832985.1, GCF_001050455.1, GCF_002009195.1, GCF_000724485.1, GCF_001645685.2, GCF_000294775.2, GCF_000408885.1, GCF_000876525.1, GCF_002068155.1, GCF_000005825.2, GCF_000017885.4, GCF_000590455.1, GCF_000972685.1, GCF_001191605.1, GCF_001431145.1, GCF_001431785.1, GCF_001548215.1, GCF_001578165.1, GCF_001578205.1, GCF_001700735.1, GCF_001704975.1, GCF_001908475.1, GCF_900186955.1, GCF_001895885.1, GCF_001938665.1, GCF_001938685.1, GCF_001938705.1, GCF_002077215.1, GCF_000093085.1, GCF_001578185.1, GCF_002243645.1, GCF_001050115.1, GCF_002202015.1, GCF_000009045.1, GCF_000146565.1, GCF_000186745.1, GCF_000209795.2, GCF_000227465.1, GCF_000227485.1, GCF_000293765.1, GCF_000321395.1, GCF_000338735.1, GCF_000344745.1, GCF_000349795.1, GCF_000497485.1, GCF_000523045.1, GCF_000699465.1, GCF_000699525.1, GCF_000706705.1, GCF_000737405.1, GCF_000772125.1, GCF_000772165.1, GCF_000772205.1, GCF_000782835.1,
GCF_000789275.1, GCF_000789295.1, GCF_000827065.1, GCF_000953615.1, GCF_000971925.1, GCF_000973605.1, GCF_001015095.1, GCF_001037985.1, GCF_001465815.1, GCF_001534785.1, GCF_001541905.1, GCF_001565875.1, GCF_001597265.1, GCF_001604995.1, GCF_001660525.1, GCF_001697265.1, GCF_001703495.1, GCF_001704095.1, GCF_001720505.1, GCF_001746575.1, GCF_001747445.1, GCF_001808235.1, GCF_001889385.1, GCF_001889625.1, GCF_001890405.1, GCF_001902555.1, GCF_002055965.1, GCF_002072735.1, GCF_002096095.1, GCF_002142595.1, GCF_002163815.1, GCF_002173695.1, GCF_002173715.1, GCF_002201955.1, GCF_002201995.1, GCF_002202035.1, GCF_002202055.1, GCF_002216085.1, GCF_002269175.1, GCF_002269195.1, GCF_000496285.1, GCF_002113805.1, GCF_000015785.1, GCF_000283695.1, GCF_000284395.1, GCF_000319475.1, GCF_000341875.1, GCF_000455565.1, GCF_000455585.1, GCF_000493375.1, GCF_000583065.1, GCF_000685725.1, GCF_000769555.1, GCF_000973585.1, GCF_000987825.1, GCF_000988345.1, GCF_001023595.1, GCF_001536925.1, GCF_001593395.2, GCF_001685645.1, GCF_001687745.1, GCF_001723585.1, GCF_001752685.1, GCF_001854345.1, GCF_001857985.1, GCF_002005345.1, GCF_002057535.1, GCF_002072695.1, GCF_002105595.1, GCF_002117165.1, GCF_002157265.1, GCF_002192235.1, GCF_002205715.1, GCF_002216755.1, GCF_002237515.1, GCF_002238395.1, GCF_002243325.1, GCF_001889165.1, GCF_001857925.1, GCF_001263395.1, GCF_000010425.1, GCF_000737885.1, GCF_000817995.1, GCF_000966445.2, GCF_001025155.1, GCF_000021425.1, GCF_000022705.1, GCF_000022965.1, GCF_000025245.1, GCF_000092765.1, GCF_000220885.1, GCF_000224965.2, GCF_000260715.1, GCF_000277325.1, GCF_000277345.1, GCF_000414215.1, GCF_000471945.1, GCF_000695895.1, GCF_000816205.1, GCF_000817045.1, GCF_000818055.1, GCF_001688645.2, GCF_002220485.1, GCF_000304215.1, GCF_000164965.1, GCF_000165905.1, GCF_000265095.1, GCF_001025135.1, GCF_001281345.1, GCF_000213865.1, GCF_000220135.1, GCF_000568955.1, GCF_000568975.1, GCF_000569015.1, GCF_000569035.1, GCF_000569055.1, GCF_000569075.1, 
GCF_001025175.1, GCF_001281425.1, GCF_001990225.1, GCF_001025195.1, GCF_000737865.1, GCF_000024445.1, GCF_001042595.1, GCF_000706765.1, GCF_000800455.1, GCF_001042615.1, GCF_000007525.1, GCF_000008945.1, GCF_000020425.1, GCF_000092325.1, GCF_000166315.1, GCF_000196555.1, GCF_000196575.1, GCF_000219455.1, GCF_000269965.1, GCF_000730205.1, GCF_000772485.1, GCF_000829295.1, GCF_001281305.1, GCF_001293145.1, GCF_001446255.1, GCF_001446275.1, GCF_001719085.1, GCF_001725985.1, GCF_001025215.1, GCF_000800475.2, GCF_001042635.1, GCF_000347695.1, GCF_000157355.2, GCF_001267395.1, GCF_001267865.1, GCF_000007785.1, GCF_000172575.2, GCF_000281195.1, GCF_000317915.1, GCF_000550745.1, GCF_000742975.1, GCF_001598635.1, GCF_001689055.2, GCF_001878735.1, GCF_001886675.1, GCF_001989555.1, GCF_002163735.1, GCF_000174395.2, GCF_000250945.1, GCF_000336405.1, GCF_000444405.1, GCF_000737555.1, GCF_001298485.1, GCF_001412695.1, GCF_001518735.1, GCF_001587115.1, GCF_001635875.1, GCF_001720945.1, GCF_001721065.1, GCF_001721905.1, GCF_001750885.1, GCF_001886635.1, GCF_001895905.1, GCF_001953235.1, GCF_001953255.1, GCF_002007625.1, GCF_002024245.1, GCF_002025045.1, GCF_002025065.1, GCF_900066025.1, GCF_900092475.1, GCF_001558875.1, GCF_000271405.2, GCF_001641305.1, GCF_000504125.1, GCF_001042405.1, GCF_900116935.1, GCF_000011985.1, GCF_000389675.2, GCF_000934625.1, GCF_002224305.1, GCF_002240375.1, GCF_002075105.1, GCF_001936335.1, GCF_000191545.1, GCF_000194115.1, GCF_001663655.1, GCF_001663675.1, GCF_001663715.1, GCF_001663735.1, GCF_001663755.1, GCF_000014465.1, GCF_000359625.1, GCF_001676805.1, GCF_002117225.1, GCF_002117325.1, GCF_002117345.1, GCF_002117375.1, GCF_002138395.1, GCF_002173555.1, GCF_002174235.1, GCF_000211375.1, GCF_000298115.2, GCF_000019245.4, GCF_000026485.1, GCF_000194765.1, GCF_000194785.1, GCF_000309565.2, GCF_000318035.1, GCF_000418515.1, GCF_000829055.1, GCF_002192215.1, GCF_001951175.1, GCF_000785105.2, GCF_001663835.1, GCF_001698165.1, GCF_001723545.1, 
GCF_002224425.1, GCF_002224505.1, GCF_000014405.1, GCF_000056065.1, GCF_000182835.1, GCF_000191165.1, GCF_001469775.1, GCF_001888905.1, GCF_001888925.1, GCF_001888945.1, GCF_001888965.1, GCF_001888985.1, GCF_001908415.1, GCF_001953135.1, GCF_002000885.1, GCF_002142575.1, GCF_900196735.1, GCF_000010145.1, GCF_000210515.1, GCF_000397165.1, GCF_000466785.3, GCF_001742205.1, GCF_001941785.1, GCF_002119645.1, GCF_002192435.1, GCF_001314245.2, GCF_000014425.1, GCF_002158885.1, GCF_001050475.1, GCF_000831645.3, GCF_000015385.1, GCF_000165775.1, GCF_000189515.1, GCF_000422165.1, GCF_000525715.1, GCF_000961015.1, GCF_001006025.1, GCF_001308285.1, GCF_001702095.1, GCF_001746265.1, GCF_000829395.1, GCF_001936235.1, GCF_000008065.1, GCF_000091405.1, GCF_000204985.1, GCF_000498675.1, GCF_001714745.1, GCF_002176835.1, GCF_002176855.1, GCF_000214785.1, GCF_001050435.1, GCF_001314945.1, GCF_001702115.1, GCF_001702135.1, GCF_000248095.2, GCF_001922025.1, GCF_000014525.1, GCF_000155515.2, GCF_000582665.1, GCF_000829035.1, GCF_001191565.1, GCF_001244395.1, GCF_001514415.1, GCF_002079285.1, GCF_002257625.1, GCF_001702155.1, GCF_001702175.1, GCF_001702195.1, GCF_001443645.1, GCF_002211885.1, GCF_000023085.1, GCF_000148815.2, GCF_000203855.3, GCF_000338115.2, GCF_000392485.3, GCF_000412205.1, GCF_000604105.1, GCF_000931425.1, GCF_001278015.1, GCF_001296095.1, GCF_001302645.1, GCF_001484005.1, GCF_001581895.1, GCF_001596095.1, GCF_001617525.1, GCF_001659745.1, GCF_001660025.1, GCF_001672035.1, GCF_001704315.1, GCF_001704335.1, GCF_001715615.1, GCF_001874125.1, GCF_001880185.1, GCF_001908455.1, GCF_001990145.1, GCF_002024845.1, GCF_002109405.1, GCF_002109425.1, GCF_002116955.1, GCF_002117245.1, GCF_002117265.1, GCF_002117285.1, GCF_002117305.1, GCF_002173655.1, GCF_002174195.1, GCF_002205775.2, GCF_002220175.1, GCF_002220815.1, GCF_000010005.1, GCF_000016825.1, GCF_000159455.2, GCF_000236455.2, GCF_000410995.1, GCF_000439275.1, GCF_001046835.1, GCF_001618905.1, GCF_001688685.2, 
GCF_000011045.1, GCF_000026505.1, GCF_000026525.1, GCF_000233755.1, GCF_000418475.1, GCF_000418495.1, GCF_001721925.1, GCF_001988935.1, GCF_002076955.1, GCF_002158925.1, GCF_900070175.1, GCF_000224985.1, GCF_000026065.1, GCF_002224565.1, GCF_002250035.1, GCF_000008925.1, GCF_000143435.1, GCF_000758365.1, GCF_001011095.1, GCF_001723525.1, GCF_002162055.1, GCF_900094615.1, GCF_000225325.1, GCF_900183405.1, GCF_000269925.1, GCF_000269945.1, GCF_000006865.1, GCF_000009425.1, GCF_000014545.1, GCF_000025045.1, GCF_000143205.1, GCF_000192705.1, GCF_000236475.1, GCF_000312685.1, GCF_000344575.1, GCF_000468955.1, GCF_000478255.1, GCF_000479375.2, GCF_000761115.1, GCF_000807375.1, GCF_002078375.1, GCF_002078415.1, GCF_002078435.1, GCF_002078475.1, GCF_002078495.1, GCF_002078615.1, GCF_002078765.1, GCF_002078855.1, GCF_002078895.1, GCF_002078915.1, GCF_002078935.1, GCF_002078955.1, GCF_002078975.1, GCF_002078995.1, GCF_002148215.1, GCF_900088425.1, GCF_000981525.1, GCF_000300135.1, GCF_000026405.1, GCF_001998805.1, GCF_000196855.1, GCF_000298875.1, GCF_001536305.1, GCF_000092505.1, GCF_001698145.1, GCF_000014445.1, GCF_000234825.3, GCF_000512955.1, GCF_001047695.1, GCF_001583825.1, GCF_001886915.1, GCF_001891125.1, GCF_002009375.1, GCF_002117185.1, GCF_002148235.1, GCF_000014385.1, GCF_001767275.1, GCF_001922325.1, GCF_002173575.1, GCF_002173595.1, GCF_002174215.1, GCF_000237995.1, GCF_001702215.1, GCF_001702235.1, GCF_001611035.1, GCF_001611075.1, GCF_001611115.1, GCF_001611135.1, GCF_001611155.1, GCF_000014505.1, GCF_000496265.1, GCF_001411765.2, GCF_002173535.1, GCF_002202155.1, GCF_000385925.1, GCF_000017005.1, GCF_000970665.2, GCF_001281105.1, GCF_002073435.1, GCF_001598035.1, GCF_001708305.1, GCF_000283635.1, GCF_001623565.1, GCF_900187085.1, GCF_001642085.1, GCF_000253315.1, GCF_000253335.1, GCF_000448685.2, GCF_000785515.1, GCF_001543085.1, GCF_002073835.1, GCF_002094955.1, GCF_002094975.1, GCF_000011825.1, GCF_000011845.1, GCF_000014485.1, GCF_000182875.1, 
GCF_000253395.1, GCF_000262675.1, GCF_000698885.1, GCF_000971665.1, GCF_001008015.1, GCF_001280285.1, GCF_001514435.1, GCF_001663795.1, GCF_001685375.1, GCF_001705585.1, GCF_001855705.1, GCF_002012365.1, GCF_900094135.1

Complete genomes were downloaded for each species from NCBI (https://www.ncbi.nlm.nih.gov/). One-to-one pairwise ANI (https://github.com/chjp/ANI) was then computed between the strains within each species, and the strains were filtered at a 95% threshold (command to obtain the ANI between strain A and strain B: perl ANI.pl --fd formatdb --bl blastall --qr A.fa --sb B.fa --od result > A_B_ANI.txt).
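
The pairwise step above can be scripted. The following is a minimal sketch that only builds the command lines, reusing the exact flags quoted in the text. The file names and the choice of unordered pairs are assumptions made for illustration, and every pair shares the output directory "result" as in the quoted command.

```python
from itertools import combinations


def ani_commands(fasta_files):
    """Build one ANI.pl command line per unordered pair of strain FASTA files."""
    commands = []
    for query, subject in combinations(sorted(fasta_files), 2):
        q_name = query.rsplit(".", 1)[0]    # "A.fa" -> "A"
        s_name = subject.rsplit(".", 1)[0]  # "B.fa" -> "B"
        commands.append(
            "perl ANI.pl --fd formatdb --bl blastall "
            f"--qr {query} --sb {subject} --od result > {q_name}_{s_name}_ANI.txt"
        )
    return commands


# Hypothetical strain files for one species.
cmds = ani_commands(["A.fa", "B.fa", "C.fa"])
for cmd in cmds:
    print(cmd)
```

Three strains give three commands; a species with n strains gives n(n-1)/2.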

Next, ART (https://www.niehs.nih.gov/research/resources/software/biostatistics/art/) was run on each strain within the species (art_illumina -p -l 100 -f 100 -m 350 -s 10) to obtain simulated Illumina paired-end data (command to simulate Illumina paired-end reads from strain A: ART/art_illumina -i A.fa -p -l 100 -f 100 -m 350 -s 10 -o A_).

As above, each strain within the species was designated in turn as the reference genome in one-to-one pairwise fashion, and the coverage was obtained for each pair (alignment with bowtie2 default options; BAM file sorting with samtools (http://samtools.sourceforge.net/); coverage calculation with bedtools genomecov) (command to sort input.bam into output_sorted.bam: samtools sort input.bam -o output_sorted.bam).
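
As an illustration of the coverage calculation, the sketch below parses the default histogram output of bedtools genomecov and returns the fraction of reference bases covered by at least one read. It assumes that "coverage" in this document means that fraction (breadth of coverage), which is consistent with the values between 0 and 1 reported here but is not stated explicitly. The function name and the example input are hypothetical.

```python
def breadth_of_coverage(genomecov_text):
    """Return the fraction of reference bases covered by at least one read.

    Expects the default histogram output of `bedtools genomecov -ibam`:
    tab-separated columns of sequence name, depth, number of bases at
    that depth, sequence length, and fraction of bases at that depth.
    """
    covered = 0
    total = 0
    for line in genomecov_text.strip().splitlines():
        name, depth, n_bases = line.split("\t")[:3]
        if name == "genome":  # skip the genome-wide summary rows
            continue
        total += int(n_bases)
        if int(depth) > 0:
            covered += int(n_bases)
    return covered / total if total else 0.0


# Hypothetical histogram for a single 1,000 bp contig.
example = (
    "contig1\t0\t200\t1000\t0.2\n"
    "contig1\t1\t500\t1000\t0.5\n"
    "contig1\t2\t300\t1000\t0.3\n"
    "genome\t0\t200\t1000\t0.2\n"
    "genome\t1\t500\t1000\t0.5\n"
    "genome\t2\t300\t1000\t0.3\n"
)
breadth = breadth_of_coverage(example)
print(breadth)
```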

As shown in FIG. 1, the minimum coverage value was obtained for each strain used as a reference genome, and the strain with the largest minimum was set as the representative strain of the species.

Specifically, the complete genomes collected from NCBI were grouped by species, and strains that did not exceed the 95% ANI threshold within the species were filtered out. One-to-one pairwise coverage was then computed for the strains that passed the filter.

For example, when a species has four strains (Strain_1, Strain_2, Strain_3, Strain_4), the minimum of the coverage values obtained when each strain serves as the reference sequence is taken (Strain_1: 0.79, Strain_2: 0.86, Strain_3: 0.87, Strain_4: 0.83). These minima are compared across strains, and Strain_3, which has the largest value, is selected as the representative strain of the species.
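
The selection rule in this example (take each reference strain's minimum coverage, then pick the strain whose minimum is largest) can be sketched as follows. The four minima match the example above; the remaining matrix entries are hypothetical values added only so that the matrix is complete.

```python
def pick_representative(pairwise_coverage):
    """Pick the reference strain whose worst-case coverage is highest.

    pairwise_coverage[ref][query] is the coverage obtained when reads
    simulated from `query` are aligned to reference strain `ref`.
    """
    min_cov = {ref: min(cov.values()) for ref, cov in pairwise_coverage.items()}
    representative = max(min_cov, key=min_cov.get)
    return representative, min_cov


coverage = {
    "Strain_1": {"Strain_2": 0.79, "Strain_3": 0.90, "Strain_4": 0.85},
    "Strain_2": {"Strain_1": 0.88, "Strain_3": 0.86, "Strain_4": 0.91},
    "Strain_3": {"Strain_1": 0.87, "Strain_2": 0.93, "Strain_4": 0.89},
    "Strain_4": {"Strain_1": 0.83, "Strain_2": 0.92, "Strain_3": 0.90},
}
representative, minima = pick_representative(coverage)
print(representative, minima)
```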

A multi-FASTA file was generated from the 130 representative strains and prepared as the reference sequence for identifying the composition of metagenome samples. The coverage obtained when the reference was a single representative strain was compared with the coverage obtained when the reference was the whole set of representative strains, to confirm that the coverage threshold determined above could still be used.

When filtering by ANI revealed two or more groups within a species, the groups were separated and coverage was calculated for each group, so that two or more representative strains were set. When a species had complete genomes for only two strains, one of the two was chosen at random; when it had only one strain, that strain was set as the representative strain without any calculation. The coverage corresponding to 70% DDH, that is, 95% ANI, was set to the minimum of the coverage values of the species' representative strains.
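
One way to form such groups is single-linkage clustering on the pairwise ANI values: two strains share a group when a chain of pairs at or above the threshold connects them. The document does not say which grouping rule was used, so the sketch below is an assumption, and all names and ANI values in it are hypothetical. The second function mirrors the stated handling of one-strain and two-strain cases.

```python
import random


def ani_groups(strains, ani, threshold=95.0):
    """Group strains linked by any chain of pairs with ANI >= threshold."""
    parent = {s: s for s in strains}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


def small_group_representative(group, rng=random):
    """Handle the one-strain and two-strain cases described in the text."""
    if len(group) == 1:
        return group[0]
    if len(group) == 2:
        return rng.choice(group)
    return None  # larger groups need the coverage comparison


# Hypothetical ANI values (%) for four strains of one species.
ani = {
    ("a", "b"): 98.0, ("a", "c"): 78.0, ("a", "d"): 78.5,
    ("b", "c"): 79.0, ("b", "d"): 78.2, ("c", "d"): 97.0,
}
groups = ani_groups(["a", "b", "c", "d"], ani)
print(groups)
```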

Example 2: Detectability Assay

WGS (whole-genome shotgun) data for single species were downloaded from NCBI-SRA (https://www.ncbi.nlm.nih.gov/SRA) and aligned to the set of representative strains to check whether the corresponding species was detected. For the species in which two species rather than one were detected (L. casei, L. paracasei, L. helveticus), all complete genomes of the two detected species were used as reference sequences, the reads were aligned with the bowtie2 -a option, and the species with the highest coverage was designated as the detected species.

Next, to examine detectability in metagenome samples, the program was run in three stages: on simulated data, on NCBI-SRA data, and on real lactic acid bacteria data (from two platforms, Illumina and Ion Torrent).

Specifically, simulated data are reads generated from the actual genome sequence of each lactic acid bacteria species as if it had been sequenced. Given a complete genome sequence, simulated read data can be generated with computer software (the ART simulator). In other words, simulated rather than real data were generated for each species and then pooled.

The NCBI-SRA data are public data that other researchers actually sequenced from single species of lactic acid bacteria; 10 independent sequencing datasets were pooled into one.

The real lactic acid bacteria data are sequencing data extracted from probiotic products, that is, metagenome sequencing data containing several species.

First, for the simulated data, reads (simulated Illumina paired-end) were generated with ART from the complete genome of each of 10 lactic acid bacteria species (L. reuteri, L. delbrueckii, L. rhamnosus, B. longum, L. acidophilus, B. bifidum, L. salivarius, L. fermentum, B. breve, E. faecalis) (art_illumina -p -l 100 -f 100 -m 350 -s 10) and combined into one large metagenome dataset. The number of reads included for each species was proportional to the sequence length of that species.
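
Allocating reads in proportion to sequence length can be sketched as below. The total read budget, the species labels, and the genome lengths are hypothetical; the document states only that read counts were proportional to sequence length.

```python
def reads_per_species(genome_lengths, total_reads):
    """Split a read budget across species in proportion to genome length."""
    total_length = sum(genome_lengths.values())
    return {
        species: round(total_reads * length / total_length)
        for species, length in genome_lengths.items()
    }


# Hypothetical lengths (bp) and read budget.
allocation = reads_per_species(
    {"species_A": 2_000_000, "species_B": 3_000_000}, total_reads=1_000_000
)
print(allocation)
```

With proportional allocation every species has the same expected depth, so differences in measured coverage reflect sequence divergence rather than sampling effort.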

The metagenome sample was aligned to the set of representative strains in the same way as above, and species were detected from the measured coverage. The results were compared with the existing metagenome analysis software MetaPhlan (http://huttenhower.sph.harvard.edu/metaphlan) and MetaPhlan 2 (http://huttenhower.sph.harvard.edu/metaphlan2). Species whose coverage exceeded the set coverage threshold of 0.7137 were judged to be detected; for MetaPhlan and MetaPhlan 2, taxa identified at the species level were judged to be detected.
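
The detection rule reduces to a threshold test on per-species coverage. A minimal sketch, with hypothetical coverage values and function name:

```python
COVERAGE_THRESHOLD = 0.7137  # threshold stated in the text


def detected_species(coverage_by_species, threshold=COVERAGE_THRESHOLD):
    """Return the species whose coverage exceeds the threshold, sorted by name."""
    return sorted(
        species
        for species, coverage in coverage_by_species.items()
        if coverage > threshold
    )


# Hypothetical per-species coverage from one metagenome sample.
sample = {"L.reuteri": 0.93, "B.longum": 0.74, "L.gallinarum": 0.31}
hits = detected_species(sample)
print(hits)
```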

In addition, to obtain the distribution of proportions, all complete genomes of the 19 lactic acid bacteria species notified by the Ministry of Food and Drug Safety (MFDS) were used. As shown in Table 3 below, for each of the 23 representative strains within the 19 species, all strains of the corresponding group were concatenated into one FASTA file, and these files were combined and used as the reference sequence. The proportion was then taken as the relative ratio of the depth of each group, where the depth was obtained by dividing by the mean length of the strains in the group.

Species | Accession
Bifidobacterium_animalis | GCF_000260715.1
Bifidobacterium_bifidum | GCF_000164965.1
Bifidobacterium_breve | GCF_000568955.1
Bifidobacterium_longum | GCF_001719085.1, GCF_000092325.1, GCF_001281305.1
Enterococcus_faecalis | GCF_001886675.1
Enterococcus_faecium | GCF_900066025.1
Lactobacillus_acidophilus | GCF_000389675.2
Lactobacillus_casei | GCF_000829055.1, GCF_000019245.4
Lactobacillus_delbrueckii | GCF_001953135.1
Lactobacillus_fermentum | GCF_001742205.1
Lactobacillus_gasseri | GCF_002158885.1
Lactobacillus_helveticus | GCF_000525715.1
Lactobacillus_paracasei | GCF_001514415.1
Lactobacillus_plantarum | GCF_001581895.1
Lactobacillus_reuteri | GCF_001046835.1
Lactobacillus_rhamnosus | GCF_000418475.1
Lactobacillus_salivarius | GCF_900094615.1
Lactococcus_lactis | GCF_000006865.1, GCF_002078765.1
Streptococcus_thermophilus | GCF_900094135.1
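
The proportion calculation described before the table can be sketched as follows. The text does not say which quantity is divided by the mean strain length; the sketch assumes it is the number of bases aligned to each group. The group labels and numbers are hypothetical.

```python
def relative_abundance(mapped_bases, strain_lengths):
    """Convert per-group aligned bases into relative depth proportions.

    mapped_bases[group]   : total bases aligned to that group's sequences
    strain_lengths[group] : genome lengths (bp) of the strains in the group
    """
    depth = {}
    for group, bases in mapped_bases.items():
        lengths = strain_lengths[group]
        mean_length = sum(lengths) / len(lengths)
        depth[group] = bases / mean_length  # depth normalised by mean genome length
    total_depth = sum(depth.values())
    return {group: d / total_depth for group, d in depth.items()}


# Hypothetical two-group example.
proportions = relative_abundance(
    {"group_1": 6_000_000, "group_2": 2_000_000},
    {"group_1": [2_000_000, 2_000_000], "group_2": [1_000_000, 3_000_000]},
)
print(proportions)
```

Dividing by the mean length keeps a group with many concatenated strains from appearing more abundant merely because its reference is longer.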

Second, for the NCBI-SRA data, the data were downloaded, the nearest strain was found for each species, the number of reads was adjusted in proportion to the sequence length of that strain, and the reads were combined and compared in the same way as above. The data are shown in Table 4.

Species | Accession | Nearest strain | Strain's length (bp) | # of reads
L. delbrueckii | ERR231531 | GCF_001953135.1 | 1,868,180 | 1,774,959
L. gasseri | ERX980028 | GCF_000014425.1 | 1,894,360 | 1,799,833
L. salivarius | ERX529268 | GCF_001011095.1 | 1,978,364 | 1,879,645
L. acidophilus | SRX456377 | GCF_000934625.1 | 1,991,969 | 1,892,571
L. reuteri | SRX456270 | GCF_001046835.1 | 1,993,967 | 1,894,470
B. bifidum | SRX456396 | GCF_001025135.1 | 2,211,039 | 2,100,710
L. helveticus | SRX456228 | GCF_000422165.1 | 2,225,962 | 2,114,888
B. breve | SRX456387 | GCF_001025175.1 | 2,269,415 | 2,156,173
Lc. lactis | ERX231530 | GCF_000192705.1 | 2,518,737 | 2,393,054
B. longum | SRX456377 | GCF_000269965.1 | 2,828,958 | 2,687,795

Finally, to examine detectability on real data, simulated whole-genome data containing all 19 lactic acid bacteria species and Ion Torrent whole-genome data from lactic acid bacteria products containing 4 to 11 lactic acid bacteria were analyzed and likewise compared with MetaPhlan and MetaPhlan 2. The data were first quality-controlled with Trimmomatic (TRAILING:30, which trims low-quality bases from the ends of sequencing reads).

For the Illumina data, the analysis was run while the data volume was reduced through 30 Gb, 15 Gb, 7.5 Gb, 3 Gb, and 1.5 Gb, and the run time was measured; the run times with the bowtie2 --very-fast option and with the 130 reference sequences concatenated were also compared. For the Ion Torrent data, the TMAP aligner was used instead of bowtie2, with the stage1 map4 option.

The results of the three approaches above, compared with MetaPhlan and MetaPhlan 2, are shown in Table 5.

Species | Sequence length (bp) | FASTQ read count | Data size (Mb)
L. reuteri | 1,993,967 | 1,809,765 | 446.11
L. delbrueckii | 1,868,180 | 1,695,598 | 424.36
L. rhamnosus | 2,883,376 | 2,617,011 | 650.3
B. longum | 2,477,838 | 2,054,903 | 508.65
L. acidophilus | 1,991,579 | 1,807,598 | 452.41
B. bifidum | 2,186,882 | 1,984,859 | 493.1
L. salivarius | 2,033,361 | 1,845,520 | 460.23
L. fermentum | 1,949,874 | 1,769,745 | 439.61
B. breve | 2,244,624 | 2,037,266 | 502.29
E. faecalis | 2,668,255 | 2,421,763 | 597.34

Example 3: Setting of reference values

The complete genomes used comprised 597 strains from 126 species, with 1 to 61 strains per species. Before the representative strains were determined, one-to-one pairwise ANI was computed within each species and used for filtering.

As can be seen in FIGS. 2a to 2c, three cases among the 19 MFDS-notified species showed two or more groups at the 95% threshold: Bifidobacterium longum (B. longum); Lactococcus lactis (Lc. lactis); and Lactobacillus casei (L. casei) together with Lactobacillus paracasei (L. paracasei), which are different species but are classified as the same species by ANI. B. longum was divided into three groups at a 97% ANI threshold, and the remaining species were divided into two groups at the 95% ANI threshold.

Strains of the same species can be so dissimilar to one another (classified as different species by ANI) that a subspecies is not distinguished and therefore goes undetected. For this reason, additional representative strains were selected for these species (two for B. longum, one for L. casei, and one for Lc. lactis), giving a total of 130 strains from 126 species.

Specifically, Lc. lactis has two subspecies, Lc. lactis subsp. lactis and Lc. lactis subsp. cremoris. Although they belong to the same species, the ANI between the two subspecies is only about 78% (an ANI above 95% is the criterion for the same species). If a representative strain were drawn only from Lc. lactis subsp. lactis and used as the reference database, Lc. lactis subsp. cremoris would therefore not be detected. Accordingly, when ANI-based groups arose within a species, an additional representative strain was selected for each group.

The one-to-one pairwise coverage obtained with the representative strain of each species as the reference sequence differed greatly between species; the minimum was 0.7137, observed in B. longum at the 95% ANI threshold. This value appears to depend on how much variation there is across the whole-genome sequences within a species. In particular, B. longum, which gave the minimum of 0.7137, split into three groups at the 97% ANI threshold, and when the one-to-one pairwise coverage was computed against the representative strain within each group, the value rose to 0.8453. Because the latter figure is a criterion for separating subspecies within a species, 0.7137 was used as the criterion for separating species.

As can be seen in Table 6, the difference in coverage between aligning each strain to the representative strain of its species and aligning it to the set of all 130 representative strains combined was very small (<0.17%), so 0.7137 was used as the coverage threshold corresponding to 95% ANI.

Lactobacillus_brevis
Strain | ANI | Coverage (representative strain) | Coverage (representative group) | Difference
52693 | 99.073 | 0.9448 | 0.9451 | 0.0003
52707 | 97.383 | 0.8934 | 0.8939 | 0.0005
52713 | 96.866 | 0.8857 | 0.887 | 0.0013
52714 | 96.92 | 0.8854 | 0.8865 | 0.0011
52715 | 97.264 | 0.9026 | 0.9009 | 0.0017
52716 | 97.198 | 0.9026 | 0.9011 | 0.0015
52717 | 99.298 | 0.9455 | 0.9462 | 0.0007
52718 | 99.09 | 0.9572 | 0.9581 | 0.0009
52719 | 98.953 | 0.9496 | 0.9502 | 0.0006

Lactobacillus_helveticus
Strain | ANI | Coverage (representative strain) | Coverage (representative group) | Difference
52819 | 98.668 | 0.9269 | 0.9265 | 0.0004
52821 | 97.254 | 0.874 | 0.8734 | 0.0006
52822 | 97.941 | 0.8905 | 0.8898 | 0.0007
52823 | 98.765 | 0.9321 | 0.9318 | 0.0003
52832 | 98.955 | 0.9359 | 0.9355 | 0.0004
52833 | 99.153 | 0.9479 | 0.9476 | 0.0003
52834 | 97.838 | 0.8824 | 0.8816 | 0.0008
52839 | 98.712 | 0.9439 | 0.9431 | 0.0008
52840 | 98.717 | 0.9439 | 0.9431 | 0.0008
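
The size of the coverage differences in Table 6 can be checked with a few lines of arithmetic. The sketch below uses the Lactobacillus_brevis coverage pairs from the table (alignment to the representative strain, alignment to the representative group) and reports the largest absolute difference; the variable names are illustrative.

```python
# (coverage vs representative strain, coverage vs representative group)
# for the nine Lactobacillus_brevis strains listed in Table 6.
brevis_pairs = [
    (0.9448, 0.9451), (0.8934, 0.8939), (0.8857, 0.8870),
    (0.8854, 0.8865), (0.9026, 0.9009), (0.9026, 0.9011),
    (0.9455, 0.9462), (0.9572, 0.9581), (0.9496, 0.9502),
]

# Largest absolute change in coverage, rounded to the table's precision.
max_difference = round(max(abs(single - group) for single, group in brevis_pairs), 4)
print(f"largest difference: {max_difference} ({max_difference * 100:.2f} percentage points)")
```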

Single-species WGS data for the 19 MFDS-notified lactic acid bacteria species were obtained from NCBI-SRA and examined for single-species detectability. As shown in Table 7, two or more species were detected for L. casei, L. paracasei, and L. helveticus, whereas only one species was detected for each of the remaining species.

Species | Accession number | Detected (coverage) | Next (coverage)
Lc. lactis | ERX231530 | 0.92 | 0.07
S. thermophilus | SRX2610845 | 0.97 | 0.25
L. acidophilus | SRX2610831 | 1.00 | 0.05
L. plantarum | ERX1625346 | 0.94 | 0.14
E. faecium | ERX2085159 | 0.89 | 0.07
B. longum | ERX1960389 | 0.74 | 0.18
B. animalis | SRX2610848 | 0.89 | 0.05
B. breve | SRX2610844 | 0.94 | 0.15
L. delbrueckii | ERX231531 | 0.96 | 0.17
E. faecalis | ERX2102726 | 0.93 | 0.01
L. rhamnosus | SRX2610827 | 0.93 | 0.04
L. salivarius | SRX2268576 | 0.88 | 0.19
L. gasseri | ERX980028 | 0.77 | 0.19
L. reuteri | SRX2268579 | 0.83 | 0.10
L. fermentum | SRX2268582 | 0.88 | 0.11
B. bifidum | ERX1101269 | 0.94 | 0.02
L. casei | ERX450901 | 0.88; 0.86 (L. paracasei) | 0.07
L. paracasei | ERX178725 | 0.87; 0.88 (L. casei) | 0.17
L. helveticus | SRX2268585 | 0.85; 0.73 (L. gallinarum) | 0.11

"Next" in Table 7 is the highest coverage value among the species that were not detected. Because single-species sequencing data were used, the detection pipeline should report exactly one species. If, for example, Lc. lactis were detected at a coverage of 0.92 in Lc. lactis single-species data and the next-highest coverage were 0.6, that second species would not be reported as detected here, but it could be falsely detected in data containing a mixture of species. This column therefore serves as a check: the second-highest coverage was low, at 0.0 to 0.2, confirming that exactly one species was detected.
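The relationship between the "Detected" and "Next" columns can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline itself: the function name, the uniform reference value of 0.70, and the species assigned to the low coverages are hypothetical; only the 0.92 and 0.07 values echo the Lc. lactis row of Table 7.

```python
def detected_and_next(coverage_by_species, reference_values):
    """Return the species whose coverage exceeds its reference value,
    plus the highest coverage among the species that were not detected."""
    detected = {
        species: cov
        for species, cov in coverage_by_species.items()
        if cov > reference_values[species]
    }
    remaining = [
        cov for species, cov in coverage_by_species.items()
        if species not in detected
    ]
    next_coverage = max(remaining) if remaining else None
    return detected, next_coverage


# Hypothetical single-species run: one species well covered, the rest near zero.
coverage = {"Lc. lactis": 0.92, "S. thermophilus": 0.07, "L. plantarum": 0.03}
# Hypothetical reference values; the real ones are set per species.
reference = {species: 0.70 for species in coverage}

detected, next_coverage = detected_and_next(coverage, reference)
print(detected, next_coverage)
```

A large gap between the detected coverage and the "Next" value is what indicates that the call is unambiguous.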

When species that could not be distinguished by ANI were detected simultaneously, all strains of both species were used as the reference sequence. The SRA data for L. paracasei were aligned against the whole genomes (18 strains) of L. paracasei and L. casei as the reference, and the results are shown in Table 8.

| Detection | Default option | All (-a) | All+perfect (-a --score-min 'C,0,-1') |
|---|---|---|---|
| L. paracasei | 0.7420 | 0.9119 | 0.6933 |
| L. casei | 0.7337 | 0.9006 | 0.6896 |

With all three option sets, namely the default option, the All option (which reports every location in the multi-fasta reference where a read aligns), and the perfect option (which reports only alignments that match the reference exactly), the L. paracasei strain showed the highest coverage and was correctly detected. For L. casei and L. paracasei, the correct species was detected when the bowtie2 -a option was used. For L. helveticus, only one L. gallinarum strain was available, so it could only be confirmed that the ANI between the two species exceeded 95%.
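For orientation, the three option sets of Table 8 can be written as Bowtie2 argument lists. The sketch below only builds the command; it assumes paired-end input and a prebuilt index, and the index and file names are hypothetical placeholders. The flags `-a`, `--score-min`, `-x`, `-1`, `-2`, and `-S` are standard Bowtie2 options.

```python
def bowtie2_command(index, fastq1, fastq2, out_sam, mode="default"):
    """Build a bowtie2 argument list for one of the three alignment modes."""
    extra_args = {
        "default": [],                                   # best alignment only
        "all": ["-a"],                                   # report all alignments
        "all_perfect": ["-a", "--score-min", "C,0,-1"],  # all, exact matches only
    }[mode]
    return ["bowtie2", *extra_args,
            "-x", index, "-1", fastq1, "-2", fastq2, "-S", out_sam]


# Hypothetical index and file names, for illustration only.
cmd = bowtie2_command("casei_paracasei_18strains", "reads_1.fq", "reads_2.fq",
                      "aligned.sam", mode="all_perfect")
print(" ".join(cmd))
```

Because the arguments are kept as a list, they can be passed to `subprocess.run` without shell quoting of `C,0,-1`.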

Example 4: Confirmation of detection performance on mixed metagenome samples

First, with the simulated data, the method according to the present invention, MetaPhlan, and MetaPhlan 2 all detected exactly the 10 simulated species.

Depth was obtained from the genomecov output file by summing the number of reads stacked on each bp of the reference sequence and dividing the total by the sequence length. For read counts, samtools idxstats was used to obtain the number of reads actually aligned to each representative strain. The results are shown in Table 9.
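The depth calculation described above can be sketched as follows. The sketch assumes per-base output in the three-column form produced by `bedtools genomecov -d` (sequence name, 1-based position, depth); the function name and the toy input are hypothetical. It also reports the fraction of positions with non-zero depth, which is one plausible reading of "coverage" and is an assumption here.

```python
from collections import defaultdict


def depth_and_breadth(genomecov_lines):
    """Return {sequence: (mean depth, fraction of positions with depth > 0)}."""
    depth_sum = defaultdict(int)   # total per-base depth for each sequence
    length = defaultdict(int)      # number of positions seen for each sequence
    covered = defaultdict(int)     # positions with at least one read
    for line in genomecov_lines:
        if not line.strip():
            continue
        name, _position, depth = line.split("\t")
        depth = int(depth)
        depth_sum[name] += depth
        length[name] += 1
        if depth > 0:
            covered[name] += 1
    return {
        name: (depth_sum[name] / length[name], covered[name] / length[name])
        for name in length
    }


# Toy per-base records for one hypothetical representative strain.
toy = ["strainA\t1\t2", "strainA\t2\t0", "strainA\t3\t1", "strainA\t4\t1"]
result = depth_and_breadth(toy)
print(result)
```

On real data the same function can be fed the lines of the genomecov output file directly, since it processes one line at a time.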

| Species | MetaPhlan 1 | MetaPhlan 2 | Depth |
|---|---|---|---|
| L. reuteri | 8.17 | 7.92 | 9.98 |
| L. delbrueckii | 10.29 | 10.17 | 10.06 |
| L. rhamnosus | 10.84 | 9.62 | 10.13 |
| B. longum | 8.69 | 7.95 | 9.77 |
| L. acidophilus | 12.06 | 11.78 | 10.51 |
| B. bifidum | 11.27 | 11.35 | 10.39 |
| L. salivarius | 8.83 | 9.34 | 10.08 |
| L. fermentum | 10.74 | 11.12 | 9.51 |
| B. breve | 9.46 | 10.56 | 10.07 |
| E. faecalis | 9.66 | 10.18 | 9.49 |

As shown in Table 9 and FIG. 3, because all reads had the same length of 100 bp, the ratio of read count to sequence length for each representative strain matched the ratio of its depth.
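The reason the two ratios agree follows from the usual relation depth ≈ (number of reads × read length) / sequence length: with a fixed read length, depth is simply proportional to reads per base. A small sketch, with hypothetical read counts and genome lengths (only the 100 bp read length comes from the text):

```python
READ_LENGTH = 100  # bp; all simulated reads have this length


def expected_depth(n_reads, sequence_length, read_length=READ_LENGTH):
    """Mean depth implied by the read count, assuming every base of every read aligns."""
    return n_reads * read_length / sequence_length


# Hypothetical (read count, genome length in bp) pairs for two strains.
strains = {"strain_A": (200_000, 2_000_000), "strain_B": (330_000, 3_300_000)}

depths = {name: expected_depth(n, length) for name, (n, length) in strains.items()}
reads_per_bp = {name: n / length for name, (n, length) in strains.items()}
print(depths, reads_per_bp)
```

Both hypothetical strains have the same reads-per-base ratio, so they come out at the same depth even though their read counts differ.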

As shown in Table 10, the variance of the species proportions in the simulated sample was 0.11 for the depth-based estimate, lower than 1.56 for MetaPhlan 1 and 1.75 for MetaPhlan 2.

| Program name | Variance |
|---|---|
| Depth | 0.11 |
| MetaPhlan 1 | 1.56 |
| MetaPhlan 2 | 1.75 |
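The variances in Table 10 can be reproduced from the per-species values in Table 9. The sketch below assumes the sample variance (n − 1 denominator), which is what Python's `statistics.variance` computes; under that assumption the rounded results agree with the reported figures.

```python
from statistics import variance

# Per-species values from Table 9, in the table's row order.
table9 = {
    "MetaPhlan 1": [8.17, 10.29, 10.84, 8.69, 12.06, 11.27, 8.83, 10.74, 9.46, 9.66],
    "MetaPhlan 2": [7.92, 10.17, 9.62, 7.95, 11.78, 11.35, 9.34, 11.12, 10.56, 10.18],
    "Depth": [9.98, 10.06, 10.13, 9.77, 10.51, 10.39, 10.08, 9.51, 10.07, 9.49],
}

# Sample variance of each method's estimates, rounded to two decimals.
variances = {method: round(variance(values), 2) for method, values in table9.items()}
print(variances)
```

Since the 10 species were simulated at equal proportions, a smaller variance means the estimates stay closer to the true, uniform composition.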

Second, ten single-species datasets were downloaded from NCBI-SRA and merged into one metagenome. Running the program on this metagenome detected 10 species, whereas MetaPhlan detected 41 species and MetaPhlan 2 detected 37, confirming that the present method gives more accurate results than MetaPhlan 1 or MetaPhlan 2.

Third, real Illumina paired-end data containing the 19 lactic acid bacteria species were used to measure the time required for each data size and for each alignment option. The results are shown in Tables 11 and 12, respectively.

| Step | Program (version) | Default | 50% | 25% | 10% | 5% |
|---|---|---|---|---|---|---|
| Align | Bowtie2 (2.3.3.1) | 383 | 105 | 60 | 20 | 10 |
| BAM file sorting | Samtools (1.3.1) | 40 | 15 | 8 | 3 | 2 |
| genomecov | Bedtools (v2.20.1) | 28 | 10 | 5 | 2 | 1 |
| Sum | | 451 | 130 | 73 | 25 | 13 |

As shown in Table 11 and FIG. 4a, the Illumina paired-end data were about 60 Gb (30 Gb × 2) in size. When the data were reduced to 30 Gb, 15 Gb, 6 Gb, and 3 Gb (15 Gb × 2, 7.5 Gb × 2, 3 Gb × 2, and 1.5 Gb × 2, respectively), the total time required fell from 451 minutes to 130, 73, 25, and 13 minutes, in that order.

The data size can be reduced by generating fewer reads, and the number of reads generated can be adjusted during sequencing. Alternatively, the generated reads can be randomly sampled down to the desired size (command to reduce the number of reads to 10,000 by sampling: seqtk sample -s100 read1.fq 10000 > sub1.fq).
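For paired-end data, the same read pairs must be kept in both mate files; with seqtk this is usually done by running the command on each file with the same seed. The idea can be sketched in plain Python as below. This is an illustrative stand-in rather than a replacement for seqtk: the function names and the toy records are hypothetical, and it loads all records into memory.

```python
import random


def parse_fastq(lines):
    """Group FASTQ lines into 4-line records (header, sequence, '+', quality)."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def subsample_pairs(mates1, mates2, n_pairs, seed=100):
    """Randomly keep n_pairs read pairs, selecting the same indices in both mates."""
    if len(mates1) != len(mates2):
        raise ValueError("mate files must contain the same number of records")
    rng = random.Random(seed)
    n_pairs = min(n_pairs, len(mates1))
    keep = sorted(rng.sample(range(len(mates1)), n_pairs))
    return [mates1[i] for i in keep], [mates2[i] for i in keep]


# Toy input: 10 read pairs, written as flat lists of FASTQ lines.
lines1 = [x for i in range(10) for x in (f"@read{i}/1", "ACGT", "+", "IIII")]
lines2 = [x for i in range(10) for x in (f"@read{i}/2", "TGCA", "+", "IIII")]

sub1, sub2 = subsample_pairs(parse_fastq(lines1), parse_fastq(lines2), n_pairs=3)
print([record[0] for record in sub1], [record[0] for record in sub2])
```

Choosing indices once and applying them to both files is what keeps each forward read next to its reverse mate in the subsampled output.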

As shown in Table 12 and FIG. 4b, using Bowtie2's --very-fast option (a Bowtie2 preset that shortens run time by aligning less sensitively) or concatenating the reference sequences reduced the total time from 451 minutes to 242 and 295 minutes, respectively.

| Step | Program (version) | Default | very-fast | Reference sequences concatenated |
|---|---|---|---|---|
| Align | Bowtie2 (2.3.3.1) | 383 | 190 | 235 |
| BAM file sorting | Samtools (1.3.1) | 40 | 35 | 44 |
| genomecov | Bedtools (v2.20.1) | 28 | 17 | 16 |
| Sum | | 451 | 242 | 295 |

Table 13 shows the detection results of each program on real Illumina-platform data containing all 19 lactic acid bacteria species in the MFDS notification. With the present method, Bifidobacterium bifidum (B. bifidum) was not detected, at a coverage of 0.3722, and the remaining 18 species were detected. MetaPhlan detected 2 species in addition to the 19, and MetaPhlan 2 detected 5 additional species, including 2 viruses.

| Program | Detected | Not detected |
|---|---|---|
| Example of the present invention | 18 | 1 (B. bifidum) |
| MetaPhlan 1 | 21 | - |
| MetaPhlan 2 | 24 | - |

As shown in Table 14, the Ion Torrent data obtained from five lactic acid bacteria products were indicated as containing the following lactic acid bacteria.

| Lactic acid bacteria product | # | Detect species |
|---|---|---|
| 1_051 | 12 | L. rhamnosus, L. paracasei, L. casei, B. longum, B. breve, B. animalis, E. faecium, L. plantarum, L. acidophilus, S. thermophilus, Lc. lactis, B. subtilis |
| 4_052 | 4 | B. longum, B. breve, L. plantarum, S. thermophilus |
| 7_053 | 10 | L. reuteri, L. rhamnosus, L. casei, B. longum, L. delbrueckii, B. breve, B. animalis, L. plantarum, L. acidophilus |
| 10_054 | 7 | B. bifidum, L. rhamnosus, B. longum, B. animalis, L. plantarum, L. acidophilus, L. casei |
| 19_055 | 5 | B. bifidum, L. rhamnosus, B. longum, L. plantarum, L. acidophilus |

The number of species detected by each program in the five lactic acid bacteria product samples is shown in Table 15 below.

| Program | 1_051 | 4_052 | 7_053 | 10_054 | 19_055 |
|---|---|---|---|---|---|
| Example of the present invention | 12 | 4 | 9 (B. bifidum not detected) | 7 | 5 |
| MetaPhlan 1 | 1167 | 335 | 1178 | 1035 | 757 |
| MetaPhlan 2 | 12 | 4 | 10 | 8 (L. casei not detected) | 9 |

In both Tables 14 and 15, B. bifidum was not detected in lactic acid bacteria product 053 because its coverage of 0.6004 did not exceed the reference value of 0.7137.
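The decision rule applied here is a simple comparison of the observed coverage with the species' reference value. A minimal sketch using only the B. bifidum figures reported in this example; the function and variable names are hypothetical:

```python
B_BIFIDUM_REFERENCE = 0.7137  # reference value for B. bifidum stated in the text


def is_detected(coverage, reference_value):
    """A species is called detected only if its coverage exceeds the reference value."""
    return coverage > reference_value


# B. bifidum coverages reported for the two datasets in which it was missed.
observed = {"19-species Illumina data": 0.3722, "product 053": 0.6004}

calls = {
    dataset: is_detected(coverage, B_BIFIDUM_REFERENCE)
    for dataset, coverage in observed.items()
}
print(calls)
```

Both coverages fall below the reference value, which is why B. bifidum is reported as not detected in each case.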

MetaPhlan over-detected in all five products, reporting more than 1,100 species in some of them. MetaPhlan 2 failed to detect L. casei in product 054 and instead reported one virus and L. zeae; in product 055 it reported one virus, L. helveticus, Lc. lactis, and S. thermophilus in addition to the five labeled species.

If an identification method that over-detects is used, for example in the approval of lactic acid bacteria products, a product could be labeled as containing, and be approved as containing, lactic acid bacteria that are not actually present in it.

The method according to the present invention controlled over-detection more reliably than MetaPhlan and MetaPhlan 2, particularly when several species were mixed, and gave reliable results on the presence of species in a sample. Unlike MetaPhlan and MetaPhlan 2, the coverage-based identification method produced no over-detection.

As shown in Table 16 and FIG. 5, however, the method according to the present invention did not resolve the cases of non-detection. Specifically, B. bifidum was not detected in the 19-species Illumina data, nor in lactic acid bacteria product 053.

| Probiotics_7_053 | Coverage |
|---|---|
| Bifidobacterium_bifidum | 0.600371 |
| Lactobacillus_reuteri | 0.999836 |
| Lactobacillus_fermentum | 0.022901 |
| Lactobacillus_salivarius | 0.012528 |
| Lactobacillus_rhamnosus | 0.914346 |
| Lactobacillus_gasseri | 0.01631 |
| Enterococcus_faecalis | 0.013727 |
| Lactobacillus_paracasei | 0.878662 |
| Lactobacillus_casei | 0.867051 |
| Bifidobacterium_longum | 0.859162 |
| Lactobacillus_helveticus | 0.030859 |
| Lactobacillus_delbrueckii | 0.914613 |
| Bifidobacterium_breve | 0.891793 |
| Bifidobacterium_animalis | 0.907533 |
| Enterococcus_faecium | 0.022237 |
| Lactobacillus_plantarum | 0.940538 |
| Lactobacillus_acidophilus | 0.999796 |
| Streptococcus_thermophilus | 0.013184 |
| Lactococcus_lactis | 0.013601 |

The reason became clear from the depth and read count of each species. In the 19-species data, B. bifidum had a depth of 1.21 and only 17,678 reads, which is extremely low; in product 053, B. bifidum had a depth of 2.28 and 25,529 reads, which was also insufficient.

The coverage of B. bifidum was 0.3722 in the 19-species data and 0.6 in the product 053 sample, and it was presumed that the species could be detected if the sample amount were increased to give sufficient depth. The analysis of the progressively reduced 19-species data indicated that, when a species is expected to make up 1% of the sample, 3 Gb × 2 of Illumina paired-end data is sufficient to detect it.

Claims (6)

1. A method for preparing a reference sequence for identification of lactic acid bacteria, comprising the following steps:
a coverage calculation step of deriving, within each species, the minimum pairwise coverage between strains using whole-genome sequence information data derived from lactic acid bacteria;
a representative strain selection step of selecting, for each species, the strain having the largest minimum coverage value as the representative strain of that species; and
a reference sequence generation step of generating a multi-fasta file from the sequence information of the representative strains of the species.
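The representative strain selection recited above is a max-min choice: for each strain, take its lowest pairwise coverage against the other strains of the species, then pick the strain for which that lowest value is highest. A minimal sketch follows; the strain names, the coverage values, and the orientation of the matrix (each row holds one candidate's coverage values against the other strains) are hypothetical assumptions.

```python
def select_representative(pairwise_coverage):
    """Return (strain, min coverage) for the strain whose minimum pairwise
    coverage against the other strains of the species is the largest.
    Requires at least two strains per species."""
    best_strain, best_min = None, float("-inf")
    for strain, row in pairwise_coverage.items():
        # Lowest coverage of this candidate against every other strain.
        min_coverage = min(cov for other, cov in row.items() if other != strain)
        if min_coverage > best_min:
            best_strain, best_min = strain, min_coverage
    return best_strain, best_min


# Hypothetical pairwise coverage values for three strains of one species.
species_matrix = {
    "strain_1": {"strain_2": 0.90, "strain_3": 0.85},
    "strain_2": {"strain_1": 0.92, "strain_3": 0.91},
    "strain_3": {"strain_1": 0.80, "strain_2": 0.95},
}

representative, min_coverage = select_representative(species_matrix)
print(representative, min_coverage)
```

In this toy matrix strain_3 has the single highest value (0.95) but also the lowest (0.80), so the max-min rule prefers strain_2, whose worst case is the best among the three.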
2. (Deleted)
3. A method for identifying lactic acid bacteria, comprising the following steps:
a coverage calculation step of deriving, within each species, the minimum pairwise coverage between strains using whole-genome sequence information data derived from lactic acid bacteria;
a representative strain selection step of selecting, for each species, the strain having the largest minimum coverage value as the representative strain of that species;
a reference sequence generation step of generating a multi-fasta file from the sequence information of the representative strains of the species;
a reference value setting step of calculating the minimum pairwise coverage between the reference sequence and the sequence information of the representative strains of the species and setting it as a reference value;
a sequence comparison step of calculating the pairwise coverage value between the whole-genome sequence information of the lactic acid bacteria contained in a sample and the reference sequence; and
a detection confirmation step of determining that the corresponding strain is detected when the value derived in the sequence comparison step exceeds the reference value.
4. (Deleted)
5. (Deleted)
6. The method according to claim 3, wherein the method simultaneously detects two or more species of lactic acid bacteria contained in the sample.
KR1020180108016A 2018-09-10 2018-09-10 Method for preparing a reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria using the same KR102270719B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020180108016A KR102270719B1 (en) 2018-09-10 2018-09-10 Method for preparing a reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria using the same
PCT/KR2019/011665 WO2020055076A1 (en) 2018-09-10 2019-09-09 Method for preparing reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria by using same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020180108016A KR102270719B1 (en) 2018-09-10 2018-09-10 Method for preparing a reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria using the same

Publications (2)

Publication Number Publication Date
KR20200029689A KR20200029689A (en) 2020-03-19
KR102270719B1 true KR102270719B1 (en) 2021-07-01

Family

ID=69777192

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020180108016A KR102270719B1 (en) 2018-09-10 2018-09-10 Method for preparing a reference sequence for identification of lactic acid bacteria and method for identifying lactic acid bacteria using the same

Country Status (2)

Country Link
KR (1) KR102270719B1 (en)
WO (1) WO2020055076A1 (en)

Also Published As

Publication number Publication date
KR20200029689A (en) 2020-03-19
WO2020055076A1 (en) 2020-03-19

Legal Events

Date Code Title Description
E902 Notification of reason for refusal
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant