KR20240026931A

KR20240026931A - Specialist signal profilers for base calling

Info

Publication number: KR20240026931A
Application number: KR1020237043986A
Authority: KR
Inventors: 압데 알리 후나이드 카갈왈라; 에릭 존 오자드; 라미 메히오; 개빈 데렉 파르나비; 니틴 우드파; 존 에스. 비에셀리
Original assignee: 일루미나, 인코포레이티드
Priority date: 2021-07-19
Filing date: 2022-07-14
Publication date: 2024-02-29
Also published as: WO2023003758A1

Abstract

A system is disclosed. The system includes memory and runtime logic. The memory stores a plurality of specialist signal profilers. Each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize the signal-to-noise ratio of signals sequenced in a particular signal profile that is detected for analytes in a particular analyte class and characterized in a particular training data set. The runtime logic, having access to the memory, is configured to execute a base calling operation by applying respective specialist signal profilers in the plurality of specialist signal profilers to signals sequenced in respective signal profiles detected for analytes in respective analyte classes during the base calling operation.

Description

Specialist Signal Profilers for Base Calling

Priority Application

This application claims the benefit of priority to U.S. Nonprovisional Patent Application No. 17/839,353, filed June 13, 2022, entitled "Specialist Signal Profilers for Base Calling" (Attorney Docket No. ILLM 1041-2/IP-2063-US), which claims priority to U.S. Provisional Patent Application No. 63/223,408, filed July 19, 2021, entitled "Specialist Signal Profilers for Base Calling" (Attorney Docket No. ILLM 1041-1/IP-2063-PRV). The priority applications are hereby incorporated by reference for all purposes.

Technical Field

The disclosed technology relates to apparatus and corresponding methods for the automated analysis of images or the recognition of patterns. Included herein are systems that transform an image for the purpose of (a) enhancing its visual quality prior to recognition, (b) locating and registering the image relative to a sensor or stored prototype, or reducing the amount of image data by discarding irrelevant data, and (c) measuring significant characteristics of the image. In particular, the disclosed technology relates to removing spatial crosstalk from sensor pixels using equalization-based image processing techniques.

References

The following are incorporated by reference for all purposes as if fully set forth herein:

U.S. Nonprovisional Patent Application No. 17/308,035, entitled "Equalization-Based Image Processing and Spatial Crosstalk Attenuator," filed May 4, 2021 (Attorney Docket No. ILLM 1032-2/IP-1991-PRV);

U.S. Provisional Patent Application No. 63/106,256, entitled "Systems and Methods for Per-Cluster Intensity Correction and Base Calling," filed October 27, 2020;

U.S. Nonprovisional Patent Application No. 15/909,437, entitled "Optical Distortion Correction for Imaged Samples," filed March 1, 2018;

U.S. Nonprovisional Patent Application No. 14/530,299, entitled "Image Analysis Useful for Patterned Objects," filed October 31, 2014;

U.S. Nonprovisional Patent Application No. 15/153,953, entitled "Methods and Systems for Analyzing Image Data," filed December 3, 2014;

U.S. Nonprovisional Patent Application No. 15/863,241, entitled "Phasing Correction," filed January 5, 2018;

U.S. Nonprovisional Patent Application No. 14/020,570, entitled "Centroid Markers for Image Analysis of High Density Clusters in Complex Polynucleotide Sequencing," filed September 6, 2013;

U.S. Nonprovisional Patent Application No. 12/565,341, entitled "Method and System for Determining the Accuracy of DNA Base Identifications," filed September 23, 2009;

U.S. Nonprovisional Patent Application No. 12/295,337, entitled "Systems and Devices for Sequence by Synthesis Analysis," filed March 30, 2007;

U.S. Nonprovisional Patent Application No. 12/020,739, entitled "Image Data Efficient Genetic Sequencing Method and System," filed January 28, 2008;

U.S. Nonprovisional Patent Application No. 13/833,619, entitled "Biosensors for Biological or Chemical Analysis and Systems and Methods for Same," filed March 15, 2013 (Attorney Docket No. IP-0626-US);

U.S. Nonprovisional Patent Application No. 15/175,489, entitled "Biosensors for Biological or Chemical Analysis and Methods of Manufacturing the Same," filed June 7, 2016 (Attorney Docket No. IP-0689-US);

U.S. Nonprovisional Patent Application No. 13/882,088, entitled "Microdevices and Biosensor Cartridges for Biological Or Chemical Analysis and Systems and Methods for the Same," filed April 26, 2013 (Attorney Docket No. IP-0462-US);

U.S. Nonprovisional Patent Application No. 13/624,200, entitled "Methods and Compositions for Nucleic Acid Sequencing," filed September 21, 2012 (Attorney Docket No. IP-0538-US);

U.S. Nonprovisional Patent Application No. 13/006,206, entitled "Data Processing System and Methods," filed January 13, 2011;

U.S. Nonprovisional Patent Application No. 15/936,365, entitled "Detection Apparatus Having a Microfluorometer, A Fluidic System, and a Flow Cell Latch Clamp Module," filed March 26, 2018;

U.S. Nonprovisional Patent Application No. 16/567,224, entitled "Flow Cells and Methods Related to Same," filed September 11, 2019;

U.S. Nonprovisional Patent Application No. 16/439,635, entitled "Device for Luminescent Imaging," filed June 12, 2019;

U.S. Nonprovisional Patent Application No. 15/594,413, entitled "Integrated Optoelectronic Read Head and Fluidic Cartridge Useful for Nucleic Acid Sequencing," filed May 12, 2017;

U.S. Nonprovisional Patent Application No. 16/351,193, entitled "Illumination for Fluorescence Imaging Using Objective Lens," filed March 12, 2019;

U.S. Nonprovisional Patent Application No. 12/638,770, entitled "Dynamic Autofocus Method and System for Assay Imager," filed December 15, 2009;

U.S. Nonprovisional Patent Application No. 13/783,043, entitled "Kinetic Exclusion Amplification of Nucleic Acid Libraries," filed March 1, 2013; and

U.S. Nonprovisional Patent Application No. 16/826,168, entitled "Artificial Intelligence-Based Sequencing," filed March 21, 2020 (Attorney Docket No. ILLM 1008-20/IP-1752-PRV).

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.

Base calling accuracy is important for high-throughput sequencing and for downstream analyses such as read mapping and genome assembly. This disclosure relates to optimizing image data so that clusters are base called accurately during a sequencing run. One problem in image data optimization is the variation in the intensity profiles (or intensity distributions) of clusters in the cluster population being base called. This variation is particularly detrimental to multi-cycle imaging of substrates (e.g., flow cells) that carry large numbers of clusters (e.g., thousands, millions, billions, etc.), because the scale of the variation becomes difficult to manage, which reduces data throughput and increases error rates.

The intensity profiles of the millions of clusters on a flow cell can vary between individual clusters or between subpopulations of clusters. There are many potential reasons for this variation. It may result from differences in cluster brightness caused by the fragment length distribution in the cluster population, or from unwanted light emissions from adjacent clusters (spatial crosstalk). It may result from phasing errors, which arise when a molecule in a cluster fails to incorporate a nucleotide in some sequencing cycle and lags behind the other molecules, or when a molecule incorporates more than one nucleotide in a single sequencing cycle. It may result from fading, i.e., an exponential decay in the signal intensity of clusters as a function of sequencing cycle number, caused by excessive washing and laser exposure as the sequencing run progresses. It may result from underdeveloped cluster colonies, i.e., small cluster sizes that produce empty or partially filled wells on a patterned flow cell. It may result from overlapping cluster colonies caused by non-exclusive amplification. It may result from under-illumination or uneven illumination, for example because clusters are located at the edges of the flow cell. It may result from impurities on the flow cell (e.g., bubbles) that obfuscate the emitted signal. It may result from polyclonal clusters, i.e., when multiple clusters are deposited in the same well. It may result from different types of distortion induced in the image by the geometry of the optical lens. Such distortions may include, for example, magnification distortion, skew distortion, translation distortion, and nonlinear distortions such as barrel distortion and pincushion distortion.

This variation can be corrected in a coarse-grained manner by training a single intensity corrector for the entire cluster population. The alternative is to train respective intensity correctors for respective subpopulations of clusters, where the subpopulations are segmented in a manner that minimizes sequencing errors and maximizes base calling accuracy within the bounds of the available compute. The present disclosure relates to the latter, finer-grained approach. More details follow.

In the drawings, like reference characters generally refer to like parts throughout the different views. Also, the drawings are not necessarily to scale, with an emphasis instead generally being placed upon illustrating the principles of the disclosed technology. In the following description, various implementations of the disclosed technology are described with reference to the following drawings.
FIG. 1 illustrates an example sequencing environment with an imaging system.
FIG. 2 is a block diagram illustrating an example two-channel, line-scanning modular optical imaging system that may be implemented in particular implementations.
FIG. 3 shows one implementation of respective signal profilers 1 to N trained to maximize the signal-to-noise ratio of respective image data subsets 1 to N generated for respective classes 1 to N of clusters located on a flow cell in respective spatial configurations 1 to N.
FIG. 4 illustrates an example configuration of a flow cell that may be imaged in accordance with implementations disclosed herein.
FIG. 5A shows lanes of a top surface of a flow cell.
FIG. 5B shows swaths of tiles within a lane of the top surface of the flow cell.
FIG. 5C illustrates a tile within a swath of a lane of the top surface of the flow cell.
FIG. 5D shows subtiles within a tile within a swath of the top surface of the flow cell.
FIG. 6A shows one implementation of training respective surface-specific specialist signal profilers for respective cluster classes during a sequencing run 600.
FIG. 6B shows one implementation of applying the trained surface-specific specialist signal profilers to image data subsets corresponding to the respective cluster classes.
FIG. 7A shows one implementation of training respective lane group-specific specialist signal profilers for respective cluster classes.
FIG. 7B shows one implementation of training respective lane-specific specialist signal profilers for respective cluster classes.
FIG. 7C shows one implementation of training respective swath-specific specialist signal profilers for respective cluster classes.
FIG. 7D shows one implementation of training respective tile-specific specialist signal profilers for respective cluster classes.
FIG. 7E shows one implementation of training respective subtile-specific specialist signal profilers for respective cluster classes.
FIG. 8 shows one implementation of applying the trained lane group-specific specialist signal profilers to image data subsets corresponding to the respective cluster classes during a sequencing run.
FIG. 9 shows one implementation of applying the trained lane-specific specialist signal profilers to image data subsets corresponding to the respective cluster classes during a sequencing run.
FIG. 10 shows one implementation of applying the trained swath-specific specialist signal profilers to image data subsets corresponding to the respective cluster classes during a sequencing run.
FIG. 11 shows one implementation of applying the trained tile-specific specialist signal profilers to image data subsets corresponding to the respective cluster classes during a sequencing run.
FIG. 12 shows one implementation of applying the trained subtile-specific specialist signal profilers to image data subsets corresponding to the respective cluster classes during a sequencing run.
FIG. 13 shows one implementation of respective/separate/different/independent specialist signal profilers for respective subseries of sequencing cycles of a sequencing run with a total of N sequencing cycles.
FIG. 14 shows one implementation of respective/separate/different/independent specialist signal profilers for combinations of different spatial configurations (e.g., different subtiles) and different temporal configurations (e.g., different subseries of sequencing cycles).
FIG. 15 shows one implementation of respective/separate/different/independent specialist signal profilers for respective clusters/wells sequenced during a sequencing run.
FIG. 16 shows one implementation of offline training of specialist signal profilers on sequenced data from one or more completed/already executed sequencing runs, and application of the trained specialist signal profilers to sequenced data from an ongoing sequencing run.
FIG. 17 shows an example tile image divided into subtile images.
FIG. 18 shows one implementation of online training of specialist signal profilers on sequenced data from earlier sequencing cycles of an ongoing sequencing run, and application of the trained specialist signal profilers to sequenced data from later sequencing cycles of the ongoing sequencing run.
FIG. 19 shows one implementation of training respective/separate/different/independent specialist signal profilers for respective signal distributions observed in sequenced data.
FIG. 20 shows an example of a signal distribution/signal profile/cluster intensity profile.
FIG. 21 shows one implementation of a processing pipeline that implements the disclosed technology.
FIGS. 22A to 22E show equalizer coefficient sets of an example spatial equalizer.
FIGS. 23 to 31 show one implementation of training an equalizer.
FIG. 32A shows per-base signal distributions of a cluster population without the use of an equalizer, with a signal-to-noise ratio of 11.96 decibels (dB).
FIG. 32B shows per-base signal distributions of the same cluster population with the use of an equalizer, with the signal-to-noise ratio improved to 13.13 dB.
FIG. 33 shows how a cost function for a specialist signal profiler improves with each iteration of gradient descent.
FIG. 34 is a plot showing the initial and final values of the cost function of FIG. 33 as the specialist signal profiler is adapted/trained/configured/updated at each sequencing cycle.
FIGS. 35A and 35B show improvements in primary analysis metrics for a sequencing run when a specialist signal profiler is adapted/trained/configured/updated.
FIGS. 36A and 36B show two plots that evaluate the number of subtiles into which a sequencing tile can be divided for adaptive equalization by respective specialist profilers.
FIG. 37 shows an example computer system that can be used to implement the disclosed technology.

The following discussion is presented to enable any person skilled in the art to make and use the disclosed technology, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed implementations will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other implementations and applications without departing from the spirit and scope of the disclosed technology. Thus, the disclosed technology is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Signal profilers are described first, followed by the disclosed specialist signal profilers.

Signal Profiler

As used herein, a "signal profiler" maximizes the signal-to-noise ratio of a signal that is disturbed by noise. A signal profiler can be a value or a function that is applied to data to modify the data in a desired way. For example, the data can be modified to increase its accuracy, relevance, or applicability to a particular situation. A signal profiler can be applied to data by any of a variety of mathematical manipulations, including, but not limited to, addition, subtraction, division, multiplication, or a combination thereof. A signal profiler can be a mathematical formula, a logic function, a computer-implemented algorithm, or the like. The data can be image data, electrical data, or a combination thereof.

In one implementation, the signal profiler is an equalizer (e.g., a spatial equalizer). The equalizer can be trained (e.g., using least squares estimation or an adaptive equalization algorithm) to maximize the signal-to-noise ratio of cluster intensity data in sequencing images. In some implementations, the equalizer is a lookup table (LUT) bank with a plurality of LUTs at subpixel resolution, also referred to as "equalizer filters" or "convolution kernels." In one implementation, the number of LUTs in the equalizer depends on the number of subpixels into which the pixels of the sequencing images can be divided. For example, if the pixels can be divided into n × n subpixels (e.g., 5 × 5 subpixels), then the equalizer generates n² LUTs (e.g., 25 LUTs).

In one implementation of training the equalizer, data from the sequencing images is binned by well subpixel location. For example, for a 5 × 5 LUT bank, approximately 1/25 of the wells have a center that falls in bin (1,1) (e.g., the upper-left corner of a sensor pixel), approximately 1/25 of the wells fall in bin (1,2), and so on. In one implementation, the equalizer coefficients for each bin are determined using least squares estimation on the subset of data from the wells that correspond to that bin. In this way, the resulting estimated equalizer coefficients are different for each bin.
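The binning step described above can be sketched as follows. This is an illustrative sketch only, not the disclosed implementation: the function names `subpixel_bin` and `bin_wells`, the use of the fractional part of a well-center coordinate to select a bin, and the sample coordinates are all assumptions introduced for illustration. In a full training flow, each resulting group would be passed to a least squares solver to estimate the coefficients for that bin.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def subpixel_bin(x: float, y: float, n: int = 5) -> Tuple[int, int]:
    """Return the 1-based (row, col) subpixel bin of a well center.

    x and y are in pixel units; only their fractional parts matter.
    Bin (1, 1) is taken to be the upper-left corner of a sensor pixel
    (an assumed convention).
    """
    col = min(int((x - math.floor(x)) * n), n - 1) + 1
    row = min(int((y - math.floor(y)) * n), n - 1) + 1
    return (row, col)


def bin_wells(centers: Sequence[Tuple[float, float]],
              n: int = 5) -> Dict[Tuple[int, int], List[int]]:
    """Group well indices by the subpixel bin of their centers."""
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(centers):
        bins[subpixel_bin(x, y, n)].append(idx)
    return dict(bins)


# Hypothetical well centers in pixel coordinates.
centers = [(10.05, 20.10), (11.15, 7.02), (3.50, 4.50), (8.95, 2.99)]
groups = bin_wells(centers)
print(groups)
```

With uniformly distributed well centers, each of the n² bins would be expected to hold roughly 1/n² of the wells, which matches the 1/25 figure given for the 5 × 5 case.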

Each LUT/equalizer filter/convolution kernel has a plurality of coefficients that are learned from the training. In one implementation, the number of coefficients in a LUT corresponds to the number of pixels that are used to base call a cluster. For example, if the local grid of pixels (image or pixel patch) that is used to base call a cluster is of size p × p (e.g., a 9 × 9 pixel patch), then each LUT has p² coefficients (e.g., 81 coefficients).
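The sizes described above can be checked with a small sketch of a LUT bank as a data structure. The layout (a dictionary keyed by subpixel bin) and the pass-through initialization, in which only the center coefficient is nonzero, are assumptions made for illustration; the disclosure does not specify either.

```python
from typing import Dict, List, Tuple


def make_lut_bank(n: int = 5, p: int = 9) -> Dict[Tuple[int, int], List[List[float]]]:
    """Build n*n LUTs, each a p x p grid of coefficients.

    Each LUT starts as a pass-through kernel (1.0 at the center pixel,
    0.0 elsewhere). This initialization is an assumption, not taken
    from the source.
    """
    center = p // 2
    bank = {}
    for row in range(n):
        for col in range(n):
            lut = [[0.0] * p for _ in range(p)]
            lut[center][center] = 1.0
            bank[(row, col)] = lut
    return bank


bank = make_lut_bank()
num_luts = len(bank)                                 # n^2 LUTs
coeffs_per_lut = sum(len(r) for r in bank[(0, 0)])   # p^2 coefficients per LUT
total_coeffs = num_luts * coeffs_per_lut             # per color channel
print(num_luts, coeffs_per_lut, total_coeffs)
```

For n = 5 and p = 9 this gives 25 LUTs of 81 coefficients each, i.e., 2,025 coefficients per color channel.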

In one implementation, the training produces equalizer coefficients that are configured to mix/combine intensity values of pixels that depict intensity emissions from a target cluster being base called and intensity emissions from one or more adjacent clusters, in a manner that maximizes the signal-to-noise ratio. The signal maximized in the signal-to-noise ratio is the intensity emissions from the target cluster, and the noise minimized in the signal-to-noise ratio is the intensity emissions from the adjacent clusters, i.e., spatial crosstalk, plus some random noise (e.g., to account for background intensity emissions). The equalizer coefficients are used as weights, and the mixing/combining includes executing element-wise multiplication between the equalizer coefficients and the intensity values of the pixels to calculate a weighted sum of the intensity values of the pixels, i.e., a convolution operation. Further, in some cases, the image data spans multiple color channels, and a set of equalizer coefficients is generated for each color channel (e.g., one channel, three channels, four channels, etc.).
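The weighted-sum step described above can be illustrated with a short sketch. The 3 × 3 patch size, the coefficient values, the pixel values, and the channel names are hypothetical and chosen only to keep the arithmetic easy to follow; an actual equalizer would use learned coefficients over a larger patch (e.g., 9 × 9).

```python
from typing import Dict, List

Grid = List[List[float]]


def weighted_sum(coeffs: Grid, patch: Grid) -> float:
    """Element-wise multiply a coefficient grid with a pixel patch and sum."""
    if len(coeffs) != len(patch) or any(
        len(c) != len(q) for c, q in zip(coeffs, patch)
    ):
        raise ValueError("coefficient grid and pixel patch must have the same shape")
    return sum(c * v for crow, prow in zip(coeffs, patch) for c, v in zip(crow, prow))


# Hypothetical coefficients: boost the center pixel, subtract a fraction of
# its four direct neighbors to attenuate crosstalk from adjacent clusters.
kernel: Grid = [
    [0.0, -0.125, 0.0],
    [-0.125, 1.5, -0.125],
    [0.0, -0.125, 0.0],
]
coeffs_by_channel: Dict[str, Grid] = {"green": kernel, "blue": kernel}

# Hypothetical pixel patches centered on the target cluster, one per channel.
patch_by_channel: Dict[str, Grid] = {
    "green": [[10.0, 20.0, 10.0], [20.0, 100.0, 20.0], [10.0, 20.0, 10.0]],
    "blue": [[5.0, 5.0, 5.0], [5.0, 30.0, 5.0], [5.0, 5.0, 5.0]],
}

sums = {
    ch: weighted_sum(coeffs_by_channel[ch], patch_by_channel[ch])
    for ch in coeffs_by_channel
}
print(sums)
```

One weighted sum is produced per color channel, so a two-channel system yields a pair of values per cluster per cycle.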

FIGS. 22A to 22E show equalizer coefficient sets of an example spatial equalizer. As the heat maps show, different equalizer coefficient sets are configured to attenuate and amplify the signals of pixels differently depending on the locations of the pixels.

FIG. 23 shows one implementation of training an equalizer. For a first sequencing cycle (cycle 1), the equalizer of FIG. 23 has a first set of equalizer coefficients 2302 for a green color channel and a second set of equalizer coefficients 2304 for a blue color channel. Also for the first sequencing cycle (cycle 1), a first cluster (cluster 1) has input image pixels 2306 for the green color channel and input image pixels 2308 for the blue color channel.

In FIG. 24, the first set of equalizer coefficients 2302 is multiplied element-wise 2402 with the input image pixels 2306 to produce a weighted sum 2316 for the green color channel. In FIG. 25, the second set of equalizer coefficients 2304 is multiplied element-wise 2502 with the input image pixels 2308 to produce a weighted sum 2318 for the blue color channel.

Base calling logic 2322 then predicts a base call 2324 from the weighted sums 2316 and 2318 using the expectation maximization (EM) algorithm discussed above. In FIG. 26, based on the predicted base call 2324, the weighted sum 2316 for the green color channel is compared, using the base calling logic 2322, against the centroid value 2612 of the called base for the green color channel. The comparison yields a base calling error 2336 for the green channel. Also in FIG. 26, based on the predicted base call 2324, the weighted sum 2318 for the blue color channel is compared, using the base calling logic 2322, against the centroid value 2712 of the called base for the blue color channel. The comparison yields a base calling error 2338 for the blue channel.

In FIGS. 28 and 29, the base calling errors 2336 and 2338 are used by update logic 2342 to generate an updated first set of equalizer coefficients 2356 for the green color channel and an updated second set of equalizer coefficients 2358 for the blue color channel.
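The per-cluster training step walked through in FIGS. 23 to 29 can be sketched as follows. Two substitutions are assumptions of this sketch and are not taken from the disclosure: a nearest-centroid rule stands in for the EM-based base caller, and a least-mean-squares (LMS) style rule stands in for the update logic, whose exact form is not given here. The centroid values, step size, and all names are hypothetical.

```python
import numpy as np

CHANNELS = ("green", "blue")

# Hypothetical per-base centroids in (green, blue) intensity space.
CENTROIDS = {"A": (1.0, 1.0), "C": (0.0, 1.0), "T": (1.0, 0.0), "G": (0.0, 0.0)}

def call_base(green_sum, blue_sum):
    """Stand-in for the EM-based base caller: pick the nearest centroid."""
    def dist2(base):
        g, b = CENTROIDS[base]
        return (green_sum - g) ** 2 + (blue_sum - b) ** 2
    return min(CENTROIDS, key=dist2)

def train_step(coeffs, pixels, step=0.01):
    """One update for one cluster in one cycle.

    coeffs and pixels are dicts keyed by channel name, each holding a 2D array.
    """
    # Weighted sum per channel (element-wise multiply, then sum).
    sums = {ch: float(np.sum(coeffs[ch] * pixels[ch])) for ch in CHANNELS}
    base = call_base(sums["green"], sums["blue"])
    targets = dict(zip(CHANNELS, CENTROIDS[base]))
    # Base calling error per channel: weighted sum minus called-base centroid.
    errors = {ch: sums[ch] - targets[ch] for ch in CHANNELS}
    # Assumed LMS-style update: move the coefficients against the error.
    updated = {ch: coeffs[ch] - step * errors[ch] * pixels[ch] for ch in CHANNELS}
    return base, errors, updated

# Hypothetical starting point: pass-through coefficients (center pixel only).
coeffs = {ch: np.zeros((3, 3)) for ch in CHANNELS}
for ch in CHANNELS:
    coeffs[ch][1, 1] = 1.0

pixels = {ch: np.full((3, 3), 0.1) for ch in CHANNELS}
pixels["green"][1, 1] = 0.9
pixels["blue"][1, 1] = 0.2

base, errors, updated = train_step(coeffs, pixels)
new_green_sum = float(np.sum(updated["green"] * pixels["green"]))
```

In this toy case the cluster is called as the base whose centroid is nearest, and after the update the green weighted sum sits slightly closer to that centroid than before, which is the behavior the error-driven update is meant to produce.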

In some implementations, the steps above are executed for a plurality of clusters. For example, for the three clusters shown in FIGS. 30 and 31, three updated versions of the first set of equalizer coefficients for the green color channel and three updated versions of the second set of equalizer coefficients for the blue color channel are generated. The three updated versions are used to calculate, for a second sequencing cycle (cycle 2), a first set of equalizer coefficients 2362 for the green color channel and a second set of equalizer coefficients 2354 for the blue color channel.

FIG. 32A shows the base-wise signal distributions of a population of clusters without use of the equalizer, with a signal-to-noise ratio of 11.96 decibels (dB). FIG. 32B shows the base-wise signal distributions of the same population of clusters with use of the equalizer, with the signal-to-noise ratio improved to 13.13 dB. The improvement in signal-to-noise ratio is visually observable as the tighter, more distinct base-wise clouds/distributions in FIG. 32B compared with the base-wise clouds in FIG. 32A.
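To put the reported figures in linear terms, the short calculation below converts the decibel gain into a ratio. It assumes the decibel values follow the power-ratio convention (10·log10), which is not stated in the text; the function name is illustrative.

```python
def db_to_power_ratio(db):
    """Convert a decibel value to a linear power ratio (10*log10 convention)."""
    return 10 ** (db / 10)

snr_without_eq_db = 11.96  # FIG. 32A
snr_with_eq_db = 13.13     # FIG. 32B

gain_db = snr_with_eq_db - snr_without_eq_db  # about 1.17 dB
gain_ratio = db_to_power_ratio(gain_db)       # about 1.31x
```

Under that assumption, the 1.17 dB gain corresponds to roughly a 31% increase in the signal-to-noise power ratio.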

Additional details about the equalizer can be found in U.S. Nonprovisional Patent Application No. 17/308,035, titled "Equalization-Based Image Processing and Spatial Crosstalk Attenuator," filed May 4, 2021 (Attorney Docket No. ILLM 1032-2/IP-1991-PRV), which is incorporated by reference as if fully set forth herein.

The discussion now turns to the specialist signal profilers.

Specialist signal profilers

Global intensity correction applied at the pan-flow cell level or the pan-sequencing run level fails to account for various sources of noise in the image data. For example, non-linear distortion and noise can be induced by the shape of the optical lens that captures the image data. The imaged flow cell can also introduce distortion of the well pattern as a result of the manufacturing process (e.g., a 3D bathtub effect introduced by bonding or movement of the wells due to non-rigidity of the substrate). Finally, tilt of the flow cell within its holder is not accounted for by global intensity correction.

As used herein, a "specialist signal profiler" is a signal profiler that is configured/trained to maximize the signal-to-noise ratio of a particular category/type/configuration/characteristic/class/bin of data. Various specialist signal profilers are disclosed. For example, a "surface-specific specialist signal profiler" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located on a particular surface or a particular surface type/category/class (e.g., the top surface or the bottom surface of a flow cell, or surfaces 1 to N). Likewise, a "lane-specific specialist signal profiler" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located in a particular lane or a particular lane type/category/class (e.g., central lanes, peripheral lanes, or lanes 1 to N of a flow cell). A "tile-specific specialist signal profiler" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located in a particular tile or a particular tile type/category/class (e.g., central tiles, peripheral tiles, or tiles 1 to N of a flow cell). A "subtile-specific specialist signal profiler" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located in a particular subtile or a particular subtile type/category/class (e.g., central subtiles, peripheral subtiles, or subtiles 1 to N of a flow cell). More examples and details of the disclosed specialist signal profilers follow.

In some implementations, a single signal profiler can include a plurality of specialist coefficient sets, such that each specialist coefficient set is configured/trained to maximize the signal-to-noise ratio of a particular category/type/configuration/characteristic/class/bin of data. In some implementations, a single signal profiler can include a variety of specialist coefficient sets. For example, a "surface-specific specialist coefficient set" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located on a particular surface or a particular surface type/category/class (e.g., the top surface or the bottom surface of a flow cell, or surfaces 1 to N). Likewise, a "lane-specific specialist coefficient set" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located in a particular lane or a particular lane type/category/class (e.g., central lanes, peripheral lanes, or lanes 1 to N of a flow cell). A "tile-specific specialist coefficient set" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located in a particular tile or a particular tile type/category/class (e.g., central tiles, peripheral tiles, or tiles 1 to N of a flow cell). A "subtile-specific specialist coefficient set" is configured/trained to maximize the signal-to-noise ratio of sequencing data of clusters located in a particular subtile or a particular subtile type/category/class (e.g., central subtiles, peripheral subtiles, or subtiles 1 to N of a flow cell). More examples and details of the disclosed specialist coefficient sets follow.
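One way to picture a single signal profiler that holds several specialist coefficient sets is a lookup keyed by the bin a cluster falls into. The sketch below is only an illustration: the class name, the bin labels, the coefficient values, and in particular the rule that the most specific matching level wins are assumptions, not details from the disclosure.

```python
import numpy as np

# Most specific level first; this precedence is an assumption of the sketch.
LEVELS = ("subtile", "tile", "lane", "surface")

class SpecialistProfiler:
    """Holds one coefficient set per (level, label) bin plus a default set."""

    def __init__(self, default_coeffs):
        self.default = np.asarray(default_coeffs, dtype=float)
        self.sets = {}

    def register(self, level, label, coeffs):
        """Store a trained coefficient set for one bin, e.g. ("lane", "peripheral")."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.sets[(level, label)] = np.asarray(coeffs, dtype=float)

    def select(self, cluster_bins):
        """Return the coefficient set for a cluster.

        cluster_bins maps levels to labels, e.g. {"surface": "top", "lane": "peripheral"}.
        """
        for level in LEVELS:
            key = (level, cluster_bins.get(level))
            if key in self.sets:
                return self.sets[key]
        return self.default

# Hypothetical coefficient sets, filled with constants so they are easy to tell apart.
profiler = SpecialistProfiler(np.ones((3, 3)))
profiler.register("surface", "top", np.full((3, 3), 2.0))
profiler.register("lane", "peripheral", np.full((3, 3), 3.0))

chosen = profiler.select({"surface": "top", "lane": "peripheral"})
```

Here a cluster on the top surface in a peripheral lane receives the lane-specific set, while a cluster in a bin with no registered set falls back to the default coefficients.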

The disclosed specialist signal profilers are applicable to clusters located on both patterned and unpatterned surfaces of a flow cell. With an unpatterned surface, clusters are randomly distributed on the flow cell. The randomly distributed clusters and the data for them (e.g., images) can be binned spatially, temporally, by signal, or by any combination thereof. Accordingly, specialist signal profilers can be configured and trained for different configurations of differently binned, randomly distributed clusters. With a patterned surface, clusters are located in wells patterned at fixed positions. The patterned wells and their constituent clusters can be binned spatially, temporally, by signal, or by any combination thereof. Accordingly, specialist signal profilers can be configured and trained for different configurations of differently binned, patterned clusters.

The disclosed specialist signal profilers are configuration-specific signal profilers trained to maximize the signal-to-noise ratio of image data generated for different configurations of a sequencing run. These configurations can be spatial configurations relating to different regions on the flow cell, temporal configurations relating to different sequencing/imaging cycles of the sequencing run, signal distribution configurations relating to different distributions/patterns of signal profiles observed/encoded in the imaged data, or combinations thereof. Before describing various implementations of the systems and methods disclosed herein, it is useful to describe an example environment in which the disclosed technology can be implemented.

Sequencing environment

FIG. 1 shows an example sequencing environment with an imaging system 100. The example imaging system 100 can include a device for acquiring or generating an image of a sample. The example outlined in FIG. 1 shows an example imaging configuration of a backlight design implementation. Although systems and methods may at times be described herein in the context of the example imaging system 100, it should be noted that these are only examples with which implementations of the specialist signal profilers disclosed herein can be implemented.

As can be seen in the example of FIG. 1, subject samples are located on a sample vessel 110 (e.g., a flow cell as described herein), which is positioned on a sample stage 170 under an objective lens 142. A light source 160 and associated optics direct a beam of light, such as laser light, to a selected sample location on the sample vessel 110. The sample fluoresces, and the resulting light is collected by the objective lens 142 and directed to an image sensor of a camera system 140 to detect the fluorescence. The sample stage 170 is moved relative to the objective lens 142 to position the next sample location on the sample vessel 110 at the focal point of the objective lens 142. Movement of the sample stage 170 relative to the objective lens 142 can be achieved by moving the sample stage itself, the objective lens, some other component of the imaging system 100, or any combination of the foregoing. Further implementations can also include moving the entire imaging system 100 over a stationary sample.

A fluid delivery module or device 100 directs the flow of reagents (e.g., fluorescently labeled nucleotides, buffers, enzymes, cleavage reagents) to (and through) the sample vessel 110 and a waste valve 120. The sample vessel 110 can include one or more substrates upon which the samples are provided. For example, in a system for analyzing a large number of different nucleic acid sequences, the sample vessel 110 can include one or more substrates on which the nucleic acids to be sequenced are bound, attached, or associated. In various implementations, the substrate can include any inert substrate or matrix to which nucleic acids can be attached, such as glass surfaces, plastic surfaces, latex, dextran, polystyrene surfaces, polypropylene surfaces, polyacrylamide gels, gold surfaces, and silicon wafers. In some applications, the substrate is within a channel or other area at a plurality of locations formed in a matrix or array across the sample vessel 110.

In some implementations, the sample vessel 110 can include a biological sample that is imaged using one or more fluorescent dyes. For example, in a particular implementation, the sample vessel 110 can be implemented as a patterned flow cell including a translucent cover plate, a substrate, and a liquid sandwiched between them, and the biological sample can be located at an inside surface of the translucent cover plate or an inside surface of the substrate. The flow cell can include a large number (e.g., thousands, millions, or billions) of wells or regions that are patterned into a defined array (e.g., a hexagonal array, a rectangular array) in the substrate. Each region can form a cluster (e.g., a monoclonal cluster) of a biological sample such as DNA, RNA, or another genomic material that can be sequenced, for example, using sequencing by synthesis. The flow cell can be further divided into a number of spaced-apart lanes (e.g., eight lanes), each lane including a hexagonal array of clusters. Example flow cells that can be used in the implementations disclosed herein are described in U.S. Patent No. 8,778,848.

The system also includes a temperature station actuator 130 and a heater/cooler 135 that can optionally regulate the temperature of the fluids within the sample vessel 110. A camera system 140 can be included to monitor and track the sequencing of the sample vessel 110. The camera system 140 can be implemented, for example, as a charge-coupled device (CCD) camera (e.g., a time delay integration (TDI) CCD camera), which can interact with various filters within a filter switching assembly 145, the objective lens 142, and a focusing laser/focusing laser assembly 150. The camera system 140 is not limited to a CCD camera, and other cameras and image sensor technologies can be used. In particular implementations, the camera sensor can have a pixel size of about 5 to about 15 µm.

Output data from the sensors of the camera system 140 can be communicated to a real-time analysis module (not shown). The module can be implemented as a software application that analyzes the image data (e.g., image quality scoring), reports or displays the characteristics of the laser beam (e.g., focus, shape, intensity, power, brightness, position) to a graphical user interface (GUI), and, as further described below, dynamically corrects intensity noise in the image data.

A light source 160 (e.g., an excitation laser within an assembly optionally including multiple lasers) or another light source can be included to illuminate fluorescent sequencing reactions within the samples via illumination through a fiber optic interface (which can optionally include one or more re-imaging lenses, a fiber optic mounting, and so on). A low-wattage lamp 165, the focusing laser 150, and a reverse dichroic 185 are also presented in the example shown. In some implementations, the focusing laser 150 can be turned off during imaging. In other implementations, an alternative focus configuration can include a second focusing camera (not shown), which can be a quadrant detector, a position sensitive detector (PSD), or a similar detector for measuring the location of the scattered beam reflected from the surface concurrently with data collection.

Although illustrated as a backlit device, other examples can include light from a laser or other light source that is directed through the objective lens 142 onto the samples on the sample vessel 110. The sample vessel 110 can ultimately be mounted on the sample stage 170 to provide movement and alignment of the sample vessel 110 relative to the objective lens 142. The sample stage can have one or more actuators that allow it to move in any of three dimensions. For example, in terms of the Cartesian coordinate system, actuators can be provided that allow the stage to move in the X, Y, and Z directions relative to the objective lens. This can allow one or more sample locations on the sample vessel 110 to be positioned in optical alignment with the objective lens 142.

A focus (z-axis) component 175 is shown in this example as being included to control positioning of the optical components relative to the sample vessel 110 in the focus direction (typically referred to as the z-axis or z-direction). The focus component 175 can include one or more actuators physically coupled to the optical stage, the sample stage, or both, to move the sample vessel 110 on the sample stage 170 relative to the optical components (e.g., the objective lens 142) and provide proper focusing for the imaging operation. For example, the actuator can be physically coupled to the respective stage by mechanical, magnetic, fluidic, or other attachment to the stage, or by direct or indirect contact with the stage. The one or more actuators can be configured to move the stage in the z-direction while maintaining the sample stage in the same plane (e.g., maintaining a level or horizontal attitude, perpendicular to the optical axis). The one or more actuators can also be configured to tilt the stage. This can be done, for example, so that the sample vessel 110 can be leveled dynamically to account for any slope in its surfaces.

Focusing of the system generally refers to aligning the focal plane of the objective lens with the sample to be imaged at the selected sample location. However, focusing can also refer to adjustments to the system to obtain a desired characteristic for a representation of the sample, such as a desired level of sharpness or contrast for an image of a test sample. Because the usable depth of field of the focal plane of the objective lens can be small (sometimes on the order of 1 µm or less), the focus component 175 closely follows the surface being imaged. Because the sample vessel is not perfectly flat as fixtured in the instrument, the focus component 175 can be set up to follow this profile while moving along the scanning direction (referred to herein as the y-axis).

The light emanating from a test sample at the sample location being imaged can be directed to one or more detectors of the camera system 140. An aperture can be included and positioned to allow only light emanating from the focus area to pass to the detector. The aperture can be included to improve image quality by filtering out components of the light that emanate from areas outside the focus area. Emission filters can be included in the filter switching assembly 145, which can be selected to record a determined emission wavelength and to cut out any stray laser light.

Although not illustrated, a controller can be provided to control the operation of the scanning system. The controller can be implemented to control aspects of system operation such as focusing, stage movement, and imaging operations. In various implementations, the controller can be implemented using hardware, algorithms (e.g., machine-executable instructions), or a combination of the foregoing. For example, in some implementations, the controller can include one or more CPUs or processors with associated memory. As another example, the controller can include hardware or other circuitry to control the operation, such as a computer processor and a non-transitory computer-readable medium with machine-readable instructions stored thereon. For example, this circuitry can include one or more of the following: a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a programmable logic device (PLD), a complex programmable logic device (CPLD), a programmable logic array (PLA), programmable array logic (PAL), or another similar processing device or circuitry. As yet another example, the controller can include a combination of this circuitry with one or more processors.

FIG. 2 is a block diagram illustrating an example two-channel, line-scanning modular optical imaging system 200 that can be implemented in particular implementations. Although systems and methods may at times be described herein in the context of the example imaging system 200, it should be noted that these are only examples with which implementations of the technology disclosed herein can be implemented.

In some implementations, the system 200 can be used for sequencing nucleic acids. Applicable techniques include those in which nucleic acids are attached at fixed locations in an array (e.g., the wells of a flow cell) and the array is imaged repeatedly. In such implementations, the system 200 can obtain images in two different color channels, which can be used to distinguish a particular nucleotide base type from another. More particularly, the system 200 can implement a process referred to as "base calling," which generally refers to a process of determining a base call (e.g., adenine (A), cytosine (C), guanine (G), or thymine (T)) for a given spot location of an image at an imaging cycle. During two-channel base calling, image data extracted from the two images can be used to determine the presence of one of four base types by encoding base identity as a combination of the intensities of the two images. For a given spot or location in each of the two images, base identity can be determined based on whether the combination of signal identities is [on, on], [on, off], [off, on], or [off, off].
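The two-channel encoding can be sketched as a lookup on the on/off state of the two images at a given spot. The assignment of specific bases to specific on/off pairs and the fixed threshold below are hypothetical; the text states only that the four combinations distinguish the four base types.

```python
# Hypothetical mapping from (channel 1 on?, channel 2 on?) to a base.
# The actual assignment depends on the dye chemistry and is not given here.
ENCODING = {
    (True, True): "A",
    (True, False): "C",
    (False, True): "T",
    (False, False): "G",
}

def decode_base(intensity_1, intensity_2, threshold=0.5):
    """Call a base for one spot from its intensities in the two images.

    A channel counts as "on" when its intensity exceeds the threshold.
    The fixed threshold is an assumption made to keep the sketch short.
    """
    return ENCODING[(intensity_1 > threshold, intensity_2 > threshold)]

calls = [decode_base(0.9, 0.8), decode_base(0.9, 0.1),
         decode_base(0.1, 0.7), decode_base(0.1, 0.1)]
```

A fixed threshold is the simplest possible decision rule; it is used here only to make the [on, off] encoding concrete.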

Referring again to the imaging system 200, the system includes a line generation module (LGM) 210 with two light sources 211 and 212 disposed therein. The light sources 211 and 212 can be coherent light sources such as laser diodes that output laser beams. The light source 211 can emit light at a first wavelength (e.g., a red wavelength), and the light source 212 can emit light at a second wavelength (e.g., a green wavelength). The light beams output from the laser sources 211 and 212 can be directed through a beam shaping lens or lenses 213. In some implementations, a single light shaping lens can be used to shape the light beams output from both light sources. In other implementations, a separate beam shaping lens can be used for each light beam. In some examples, the beam shaping lens is a Powell lens, such that the light beams are shaped into line patterns. The beam shaping lenses of the LGM 210 or of other optical components of the imaging system 200 can be configured to shape the light emitted by the light sources 211 and 212 into line patterns (e.g., by using one or more Powell lenses, or other beam shaping lenses or diffractive or scattering components).

The LGM 210 can further include a mirror 214 and a semi-reflective mirror 215 configured to direct the light beams through a single interface port to an emission optics module (EOM) 230. The light beams can pass through a shutter element 216. The EOM 230 can include an objective lens 235 and a z-stage 236 that moves the objective lens 235 longitudinally closer to or farther away from a target 250. For example, the target 250 can include a liquid layer 252 and a translucent cover plate 251, and a biological sample can be located at an inside surface of the translucent cover plate as well as at an inside surface of the substrate layer located below the liquid layer. The z-stage 236 can then move the objective lens to focus the light beams onto either inside surface of the flow cell (e.g., focused on the biological sample). The biological sample can be DNA, RNA, proteins, or other biological materials amenable to optical sequencing, as known in the art.

The EOM 230 may include a semi-reflective mirror 233 that reflects a focus tracking light beam emitted from a focus tracking module (FTM) 240 onto the target 250, and then reflects the light returned from the target 250 back into the FTM 240. The FTM 240 may include a focus tracking optical sensor that detects characteristics of the returned focus tracking light beam and generates a feedback signal to optimize the focus of the objective lens 235 on the target 250.

The EOM 230 may also include a semi-reflective mirror 234 that directs light through the objective lens 235 while allowing light returned from the target 250 to pass through. In some implementations, the EOM 230 may include a tube lens 232. Light transmitted through the tube lens 232 may pass through a filter element 231 and into a camera module (CAM) 220. The CAM 220 may include one or more optical sensors 221 that detect light emitted from the biological sample in response to the incident light beams (e.g., fluorescence in response to the red and green light received from the light sources 211 and 212).

The output data from the sensors of the CAM 220 may be communicated to a real-time analysis module 225. In various implementations, the real-time analysis module executes computer readable instructions for analyzing the image data (e.g., image quality scoring, base calling, etc.) and for reporting or displaying the characteristics of the beam (e.g., focus, shape, intensity, power, brightness, position) on a graphical user interface (GUI). These operations may be performed in real time during imaging cycles to minimize downstream analysis time and to provide real-time feedback and troubleshooting during an imaging run. In implementations, the real-time analysis module may be a computing device (e.g., computing device 1000) that is communicatively coupled to and controls the imaging system 200. In implementations described further below, the real-time analysis module 225 may additionally execute computer readable instructions for maximizing the signal-to-noise ratio of the output image data received from the CAM 220.

Sequencing produces m sequencing images per sequencing cycle for corresponding m image channels. That is, each of the sequencing images has one or more image (or intensity) channels, analogous to the red, green, blue (RGB) channels of a color image. In one implementation, each image channel corresponds to one of a plurality of filter wavelength bands. In another implementation, each image channel corresponds to one of a plurality of imaging events in a sequencing cycle. In yet another implementation, each image channel corresponds to a combination of illumination with a specific laser and imaging through a specific optical filter. Image patches are tiled (or accessed) from each of the m image channels for a particular sequencing cycle. In different implementations, such as 4-, 2-, and 1-channel chemistries, m is 4 or 2. In other implementations, m is 1, 3, or greater than 4. In other implementations, the images may be in blue and violet color channels instead of, or in addition to, the red and green color channels.

Spatial configuration-specific specialist signal profilers

FIG. 3 illustrates one implementation of respective/separate/different/independent specialist signal profilers 1 to N, each trained to maximize the signal-to-noise ratio of a respective image data subset 1 to N generated for a respective class 1 to N of clusters located on the flow cell in a respective spatial configuration 1 to N. A sequencing run 300 sequences a population of clusters over a plurality of sequencing cycles/imaging cycles 1 to K and generates image data depicting intensity emissions of the population of clusters.

With regard to the image data generated during the sequencing run 300, an imaging cycle of the flow cell is performed by scanning regions of the flow cell with one or more coherent light sources (e.g., using a line scanner) so that image data is collected for the entire flow cell. For example, the imaging system 200 may use the LGM 210, in coordination with the optics of the system, to line scan the flow cell with light having wavelengths within the red color spectrum and to line scan the flow cell with light having wavelengths within the green color spectrum. In response to the line scanning, fluorescent dyes situated at the different clusters of the flow cell may fluoresce, and the resulting light may be collected by the objective lens 235 and directed to an image sensor of the CAM 220 to detect the fluorescence. For example, the fluorescence of each cluster may be detected by a few pixels of the CAM 220. The image data output from the CAM 220 may then be communicated to the real-time analysis module 225 for image noise correction and base calling.

The population of clusters is spatially distributed across the flow cell. The flow cell, and therefore the population of clusters and the image data, is partitioned into different spatial configurations defined by different regions of the flow cell. So, for example, if the flow cell is partitioned into three rectangular regions, this results in three spatial configurations of the flow cell, three subpopulations or classes of clusters, three subsets of the image data, and three specialist signal profilers. At the next level of granularity, each of the three rectangular regions of the flow cell can be further divided into three squares, producing a total of nine squares. This in turn results in nine spatial configurations of the flow cell, nine subpopulations or classes of clusters, nine subsets of the image data, and nine specialist signal profilers.
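The region-based partitioning described above can be illustrated with a minimal Python sketch. It assumes a rectangular flow cell divided into a regular grid; the function names, the cluster coordinates, and the 9 x 3 dimensions are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def region_index(x, y, width, height, n_cols, n_rows):
    """Map a cluster position to the index of the grid region that contains it."""
    col = min(int(x / width * n_cols), n_cols - 1)
    row = min(int(y / height * n_rows), n_rows - 1)
    return row * n_cols + col


def partition_clusters(clusters, width, height, n_cols, n_rows):
    """Group cluster ids into classes, one class per spatial configuration."""
    classes = defaultdict(list)
    for cluster_id, (x, y) in clusters.items():
        idx = region_index(x, y, width, height, n_cols, n_rows)
        classes[idx].append(cluster_id)
    return dict(classes)


# Hypothetical cluster positions on a 9 x 3 flow cell (arbitrary units).
clusters = {"c0": (0.5, 0.5), "c1": (4.0, 0.5), "c2": (8.9, 2.9), "c3": (4.5, 1.5)}

# Three rectangular regions: three classes, hence three specialist profilers.
coarse = partition_clusters(clusters, 9.0, 3.0, n_cols=3, n_rows=1)

# Each rectangle split into three squares: up to nine classes.
fine = partition_clusters(clusters, 9.0, 3.0, n_cols=3, n_rows=3)
```

In this sketch, clusters c1 and c3 share a class at the coarse level but fall into different classes once each rectangle is split into squares, which is the effect of moving to a finer granularity.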

With regard to the segmentation of the image data, the image data is divided into a plurality of image data subsets corresponding to respective regions of the flow cell. In various implementations, the size of an image data subset may be determined using the placement of fiducial markers, or fiducials, in the field of view of the imaging system 200, in the flow cell, or on the flow cell. The image data subsets may be divided such that the pixels of each image data subset contain a predetermined number of fiducials (e.g., at least three fiducials, four fiducials, six fiducials, eight fiducials, etc.). For example, the total number of pixels of an image data subset may be predetermined based on predetermined pixel distances between the boundaries of the image data subset and the fiducials.

In addition, the intensity data in each subset of the image data can span multiple color channels (e.g., red, blue, green, and/or violet image channels). Accordingly, each specialist signal profiler has respective sets of coefficients, each trained to maximize the signal-to-noise ratio of the intensity data in a corresponding color channel.

During inference, new/unseen/wild image data is partitioned on the same basis as the spatial configurations that were defined and used at training to create the respective/separate/different/independent specialist signal profilers. The respective specialist signal profilers are then applied to the segmented new image data subsets in such a way that the application of each specialist signal profiler is confined to its corresponding new image data subset.
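A minimal Python sketch of this routing follows. It assumes that each profiler is keyed by a (spatial configuration, color channel) pair and that a profiler can be modeled as a small set of coefficients applied as a one-dimensional sliding weighted sum. Both are illustrative assumptions: the disclosure does not specify the form of the coefficients here, and `make_profiler`, `apply_specialists`, and the coefficient values are hypothetical.

```python
def make_profiler(coefficients):
    """Return a stand-in profiler: a zero-padded 1-D sliding weighted sum."""
    half = len(coefficients) // 2

    def profiler(intensities):
        n = len(intensities)
        out = []
        for i in range(n):
            acc = 0.0
            for k, c in enumerate(coefficients):
                j = i + k - half
                if 0 <= j < n:  # samples outside the subset count as zero
                    acc += c * intensities[j]
            out.append(acc)
        return out

    return profiler


def apply_specialists(subsets, profilers):
    """Apply each profiler only to the subset that shares its key."""
    # A KeyError here means no specialist was trained for that configuration.
    return {key: profilers[key](data) for key, data in subsets.items()}


# Hypothetical trained coefficient sets, one per (configuration, channel) pair.
profilers = {
    ("top", "red"): make_profiler([0.0, 1.0, 0.0]),
    ("bottom", "red"): make_profiler([-0.5, 2.0, -0.5]),
}

# New image data, segmented on the same basis that was used at training.
subsets = {
    ("top", "red"): [1.0, 2.0, 3.0],
    ("bottom", "red"): [1.0, 2.0, 1.0],
}

result = apply_specialists(subsets, profilers)
```

Keying both dictionaries identically keeps the application of a profiler confined to its own subset; a subset with no matching profiler fails loudly instead of being processed by the wrong specialist.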

Applying the specialist signal profilers 1 to N to the image data subsets 1 to N produces signal-to-noise ratio maximized image data subsets 1 to N. Additional details can be found in U.S. Nonprovisional Patent Application No. 17/308,035, titled "Equalization-Based Image Processing and Spatial Crosstalk Attenuator," filed May 4, 2021 (Attorney Docket No. ILLM 1032-2/IP-1991-PRV). Other corrections for channel, spatial, or phasing noise can be applied to the raw image data beforehand, or subsequently to the signal-to-noise ratio maximized image data subsets 1 to N.

The output of the specialist signal profilers 1 to N, i.e., the signal-to-noise ratio maximized image data subsets 1 to N, is provided as input to a base caller 332 to generate base calls 1 to N for the population of clusters. Base calling can be performed by fitting a mathematical model to the intensity data. Suitable mathematical models include, for example, a k-means clustering algorithm, a k-means-like clustering algorithm, an expectation maximization clustering algorithm, a histogram-based method, and the like. Four Gaussian distributions can be fit to the set of two-channel intensity data such that one distribution is applied for each of the four nucleotides represented in the data set. In one particular implementation, an expectation maximization (EM) algorithm may be applied. As a result of the EM algorithm, for each X, Y value (referring to each of the two channel intensities, respectively), a value can be generated that represents the likelihood that the particular X, Y intensity value belongs to one of the four Gaussian distributions to which the data is fitted. Where the four bases give four separate distributions, each X, Y intensity value will also have four associated likelihood values, one for each of the four bases. The maximum of the four likelihood values indicates the base call. For example, if a cluster is "off" in both channels, the base call is G. If the cluster is "off" in one channel and "on" in the other channel, the base call is either C or T (depending on which channel is on), and if the cluster is "on" in both channels, the base call is A.
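The final step, choosing the base with the largest likelihood, can be sketched in Python as follows. This is not the EM fit itself: it assumes the four Gaussian distributions have already been fitted and, for simplicity, that they are isotropic with a shared standard deviation. The centroid positions, the value of `sigma`, and the assignment of C and T to particular channels are assumptions made for illustration only.

```python
import math

# Hypothetical fitted centroids in (channel X, channel Y) intensity space.
# G is "off" in both channels and A is "on" in both, as in the text above;
# which single channel maps to C and which to T is assumed here.
CENTROIDS = {"G": (0.0, 0.0), "C": (1.0, 0.0), "T": (0.0, 1.0), "A": (1.0, 1.0)}


def likelihoods(x, y, centroids=CENTROIDS, sigma=0.25):
    """Return a normalized likelihood per base for one (X, Y) intensity pair."""
    density = {}
    for base, (mx, my) in centroids.items():
        d2 = (x - mx) ** 2 + (y - my) ** 2
        density[base] = math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    total = sum(density.values())
    return {base: value / total for base, value in density.items()}


def call_base(x, y):
    """Call the base whose distribution gives the maximum likelihood."""
    probs = likelihoods(x, y)
    return max(probs, key=probs.get)


calls = [call_base(0.05, 0.10), call_base(0.90, 0.10),
         call_base(0.10, 0.80), call_base(0.90, 0.95)]
```

A full implementation would estimate the centroids and covariances from the intensity data (for example with EM) before this classification step.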

FIG. 4 illustrates an example configuration of a flow cell 400 that may be imaged in accordance with implementations disclosed herein. In some implementations, the flow cell 400 is patterned with a hexagonal array of ordered clusters (or spots or features) that may be simultaneously imaged during an imaging run. In other implementations, the flow cell 400 may be patterned using a rectilinear array, a circular array, an octagonal array, or some other array pattern. The flow cell 400 may have tens, hundreds, thousands, millions, or billions of clusters that are imaged. In a particular implementation, the flow cell 400 may be patterned with millions or billions of wells that are divided into lanes. In this particular implementation, each well of the flow cell 400 may contain at least one cluster.

Surface-specific specialist signal profilers

In some implementations, the flow cell 400 may be a multi-plane sample comprising multiple planes of clusters that are sampled during an imaging run. Examples of the multiple planes of clusters include a top surface 402 and a bottom surface 412. Accordingly, in one implementation, the flow cell 400 can have two spatial configurations corresponding to the top and bottom surfaces 402 and 412, which in turn results in two subpopulations or classes of clusters, two subsets of the image data, and two specialist signal profilers.

FIG. 6A shows one implementation of training respective surface-specific specialist signal profilers 604 for respective cluster classes 602. The respective cluster classes 602 include a group of clusters located on the top surface 402 and a group of clusters located on the bottom surface 412. The result is a first specialist signal profiler 604a configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the top surface 402, and a second specialist signal profiler 604b configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the bottom surface 412.

FIG. 6B shows one implementation of applying the trained surface-specific specialist signal profilers 604 to image data subsets 632 and 642 corresponding to the respective cluster classes 602 during a sequencing run 600. In one implementation, the flow cell 400 is imaged at the tile level. Accordingly, assuming that the top surface 402 has a first set of 1600 tiles and the bottom surface 412 has a second set of 1600 tiles, the first image data subset 632 includes a first set of 1600 tile images of the first set of 1600 tiles, and the second image data subset 642 includes a second set of 1600 tile images of the second set of 1600 tiles.

The first specialist signal profiler 604a is configured to maximize the signal-to-noise ratio of the intensity data in the first image data subset 632 to generate a signal-to-noise ratio maximized version 634 of the first image data subset 632. The second specialist signal profiler 604b is configured to maximize the signal-to-noise ratio of the intensity data in the second image data subset 642 to generate a signal-to-noise ratio maximized version 644 of the second image data subset 642. The base caller 332 processes the signal-to-noise ratio maximized versions 634 and 644 and generates base calls 638 and 648.

As used herein, the term "maximized version" refers to data generated as output by a specialist signal profiler. A maximized version of an input is the output generated by the specialist signal profiler in response to processing the corresponding input. The maximized version of an input has a higher signal-to-noise ratio (SNR, S/N) than the corresponding input processed by the specialist signal profiler. For example, the maximized version of an input (i.e., the output) is corrected by the specialist signal profiler to have less spatial crosstalk and background noise than the corresponding input. Similarly, the maximized version of an input is corrected by the specialist signal profiler to have less phasing and pre-phasing noise than the corresponding input.

Lane-specific specialist signal profilers

In one implementation, the top surface 402 is, or can be, divided into a plurality of lanes 508a, 508b, ..., 508l. In the example illustrated in FIG. 5A, the top surface 402 has eight lanes, but the number of lanes is implementation specific. Accordingly, in one implementation, the flow cell 400 can have sixteen spatial configurations corresponding to the eight lanes of the top surface 402 and the eight lanes of the bottom surface 412, which in turn results in sixteen subpopulations or classes of clusters, sixteen subsets of the image data, and sixteen specialist signal profilers.

FIG. 7B shows one implementation of training respective lane-specific specialist signal profilers 714 for respective cluster classes 712. The respective cluster classes 712 include groups of clusters respectively located on the lanes 508a, 508b, ..., 508l. The result is a first specialist signal profiler 714a configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the first lane 508a, a second specialist signal profiler 714b configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the second lane 508b, and so on, continuing through an l-th specialist signal profiler 714l configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the l-th lane 508l.

FIG. 9 shows one implementation of applying the trained lane-specific specialist signal profilers 714 to image data subsets 902, 912, ..., 922 corresponding to the respective cluster classes 712 during a sequencing run 900. In one implementation, the flow cell 400 is imaged at the tile level. Accordingly, assuming that each lane has 200 tiles, the first image data subset 902 includes a first set of 200 tile images of the 200 tiles on the first lane 508a, the second image data subset 912 includes a second set of 200 tile images of the 200 tiles on the second lane 508b, and so on, continuing through the l-th image data subset 922.

The first specialist signal profiler 714a is configured to maximize the signal-to-noise ratio of the intensity data in the first image data subset 902 to generate a signal-to-noise ratio maximized version 904 of the first image data subset 902. The second specialist signal profiler 714b is configured to maximize the signal-to-noise ratio of the intensity data in the second image data subset 912 to generate a signal-to-noise ratio maximized version 914 of the second image data subset 912, and so on, continuing through a signal-to-noise ratio maximized version 924 of the l-th image data subset 922. The base caller 332 processes the signal-to-noise ratio maximized versions 904, 914, ..., 924 and generates base calls 908, 918, ..., 928.

Lane group-specific specialist signal profilers

In some implementations, the lanes can be grouped into lane groups 502a, 502b, and 502c. Examples of lane groups include top peripheral lanes, central lanes, and bottom peripheral lanes. Accordingly, in one implementation, the flow cell 400 can have six spatial configurations corresponding to the three lane groups of the top surface 402 and the three lane groups of the bottom surface 412, which in turn results in six subpopulations or classes of clusters, six subsets of the image data, and six specialist signal profilers.

FIG. 7A shows one implementation of training respective lane group-specific specialist signal profilers 704 for respective cluster classes 702. The respective cluster classes 702 include groups of clusters respectively located on the lane groups 502a, 502b, and 502c. The result is a first specialist signal profiler 704a configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the first lane group 502a, a second specialist signal profiler 704b configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the second lane group 502b, and a third specialist signal profiler 704c configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the third lane group 502c.

FIG. 8 shows one implementation of applying the trained lane group-specific specialist signal profilers 704 to image data subsets 802, 812, and 822 corresponding to the respective cluster classes 702 during a sequencing run 800. In one implementation, the flow cell 400 is imaged at the tile level. Accordingly, assuming that the first lane group 502a has 600 tiles, the second lane group 502b has 600 tiles, and the third lane group 502c has 400 tiles, the first image data subset 802 includes a first set of 600 tile images of the 600 tiles on the first lane group 502a, the second image data subset 812 includes a second set of 600 tile images of the 600 tiles on the second lane group 502b, and the third image data subset 822 includes a third set of 400 tile images of the 400 tiles on the third lane group 502c.

The first specialist signal profiler 704a is configured to maximize the signal-to-noise ratio of the intensity data in the first image data subset 802 to generate a signal-to-noise ratio maximized version 804 of the first image data subset 802. The second specialist signal profiler 704b is configured to maximize the signal-to-noise ratio of the intensity data in the second image data subset 812 to generate a signal-to-noise ratio maximized version 814 of the second image data subset 812. The third specialist signal profiler 704c is configured to maximize the signal-to-noise ratio of the intensity data in the third image data subset 822 to generate a signal-to-noise ratio maximized version 824 of the third image data subset 822. The base caller 332 processes the signal-to-noise ratio maximized versions 804, 814, and 824 and generates base calls 808, 818, and 828.

Swath-specific specialist signal profilers

In some implementations, each lane includes one or more columns, or swaths, of tiles 506a and 506b, as shown in FIG. 5B. Accordingly, in one implementation, the flow cell 400 can have thirty-two spatial configurations corresponding to sixteen swaths of tiles on the top surface 402 (two swaths in each of eight lanes) and sixteen swaths of tiles on the bottom surface 412, which in turn results in thirty-two subpopulations or classes of clusters, thirty-two subsets of the image data, and thirty-two specialist signal profilers.

FIG. 7C shows one implementation of training respective swath-specific specialist signal profilers 724 for respective cluster classes 722. The respective cluster classes 722 include groups of clusters respectively located on the swaths 506a and 506b. The result is a first specialist signal profiler 724a configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the first swath 506a, and a second specialist signal profiler 724b configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the second swath 506b.

FIG. 10 shows one implementation of applying the trained swath-specific specialist signal profilers 724 to image data subsets 1002 and 1012 corresponding to the respective cluster classes 722 during a sequencing run 1000. In one implementation, the flow cell 400 is imaged at the tile level. Accordingly, assuming that the first swath 506a has 100 tiles and the second swath 506b has 100 tiles, the first image data subset 1002 includes a first set of 100 tile images of the 100 tiles on the first swath 506a, and the second image data subset 1012 includes a second set of 100 tile images of the 100 tiles on the second swath 506b.

The first specialist signal profiler 724a is configured to maximize the signal-to-noise ratio of the intensity data in the first image data subset 1002 to generate a signal-to-noise ratio maximized version 1004 of the first image data subset 1002. The second specialist signal profiler 724b is configured to maximize the signal-to-noise ratio of the intensity data in the second image data subset 1012 to generate a signal-to-noise ratio maximized version 1014 of the second image data subset 1012. The base caller 332 processes the signal-to-noise ratio maximized versions 1004 and 1014 and generates base calls 1008 and 1018.

Tile-Specific Specialist Signal Profilers

Each swath includes a plurality of tiles 512a, 512b, ..., 512t, as shown in FIG. 5B. The number of tiles in each swath is implementation-specific; in different examples there can be 50 tiles, 60 tiles, 80 tiles, and so on. For example, consider that each swath has 100 tiles. The flow cell 400 then has 200 tiles per lane, yielding 1600 tiles for the top surface 402 and another 1600 tiles for the bottom surface 412 (i.e., 3200 tiles in total). Accordingly, in one implementation, the flow cell 400 can have 3200 spatial configurations corresponding to the 3200 tiles, which in turn results in 3200 subpopulations or classes of clusters, 3200 subsets of image data, and 3200 specialist signal profilers.

FIG. 7D shows one implementation of training respective tile-specific specialist signal profilers 734 for respective cluster classes 732. The respective cluster classes 732 include groups of clusters located on tiles 512a, 512b, ..., 512t, respectively. The result is a first specialist signal profiler 734a configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the first tile 512a, a second specialist signal profiler 734b configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the second tile 512b, and so on, up to a t-th specialist signal profiler 734t configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the t-th tile 512t.

FIG. 11 shows one implementation of applying the trained tile-specific specialist signal profilers 734 to image data subsets 1102, 1112, ..., 1122 that correspond to the respective cluster classes 732 during a sequencing run 1100. In one implementation, the flow cell 400 is imaged at the tile level. Thus, for the tiles 512a, 512b, ..., 512t (shown in FIG. 5C), the first image data subset 1102 includes a first tile image of the first tile 512a, the second image data subset 1112 includes a second tile image of the second tile 512b, and so on, up to a t-th image data subset 1122 that includes a t-th tile image of the t-th tile 512t.

The first specialist signal profiler 734a is configured to maximize the signal-to-noise ratio of the intensity data in the first image data subset 1102 and produce a signal-to-noise ratio maximized version 1104 of the first image data subset 1102. The second specialist signal profiler 734b is configured to maximize the signal-to-noise ratio of the intensity data in the second image data subset 1112 and produce a signal-to-noise ratio maximized version 1114 of the second image data subset 1112, and so on, up to a signal-to-noise ratio maximized version 1124 of the t-th image data subset 1122. The base caller 332 processes the signal-to-noise ratio maximized versions 1104, 1114, ..., 1124 and generates base calls 1108, 1118, ..., 1128.

Subtile-Specific Specialist Signal Profilers

Each tile can be divided into a plurality of subtiles 518a, 518b, ..., 518s, as shown in FIG. 5D. The number of subtiles per tile is implementation-specific; in different examples there can be 10 subtiles, 30 subtiles, 50 subtiles, and so on. For example, consider that each tile is divided into 9 subtiles. The flow cell 400 then has a total of 28,800 subtiles across its 3200 tiles. Accordingly, in one implementation, the flow cell 400 can have 28,800 spatial configurations corresponding to the 28,800 subtiles, which in turn results in 28,800 subpopulations or classes of clusters, 28,800 subsets of image data, and 28,800 specialist signal profilers.
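The tile and subtile counts in this example can be checked with a short calculation. The sketch below is illustrative only: the figure of two swaths per lane and eight lanes per surface is not stated in this passage but is inferred from 100 tiles per swath, 200 tiles per lane, and 1600 tiles per surface, and all constant names are hypothetical.

```python
# Example values taken from the text.
TILES_PER_SWATH = 100
SUBTILES_PER_TILE = 9
SURFACES = 2  # top surface 402 and bottom surface 412

# Assumptions inferred from the stated totals, not given directly:
SWATHS_PER_LANE = 2      # 200 tiles per lane / 100 tiles per swath
LANES_PER_SURFACE = 8    # 1600 tiles per surface / 200 tiles per lane

tiles_per_lane = TILES_PER_SWATH * SWATHS_PER_LANE
tiles_per_surface = tiles_per_lane * LANES_PER_SURFACE
total_tiles = tiles_per_surface * SURFACES
total_subtiles = total_tiles * SUBTILES_PER_TILE

# One specialist signal profiler per spatial configuration.
print(total_tiles, total_subtiles)
```

With one profiler per tile this gives 3200 profilers, and with one per subtile it gives 28,800.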

FIG. 7E shows one implementation of training respective subtile-specific specialist signal profilers 744 for respective cluster classes 742. The respective cluster classes 742 include groups of clusters located on subtiles 518a, 518b, 518c, 518d, ..., 518s, respectively. The result is a first specialist signal profiler 744a configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the first subtile 518a, a second specialist signal profiler 744b configured to do the same for clusters located on the second subtile 518b, a third specialist signal profiler 744c configured to do the same for clusters located on the third subtile 518c, a fourth specialist signal profiler 744d configured to do the same for clusters located on the fourth subtile 518d, and so on, up to an s-th specialist signal profiler 744s configured to maximize the signal-to-noise ratio of the intensity data of clusters located on the s-th subtile 518s.

FIG. 12 shows one implementation of applying the trained subtile-specific specialist signal profilers 744 to image data subsets 1202, 1212, 1222, 1232, ..., 1242 that correspond to the respective cluster classes 742 during a sequencing run 1200. In one implementation, the flow cell 400 is imaged at the subtile level. Thus, for the subtiles 518a, 518b, 518c, 518d, ..., 518s, the first image data subset 1202 includes a first subtile image patch of the first subtile 518a, the second image data subset 1212 includes a second subtile image patch of the second subtile 518b, the third image data subset 1222 includes a third subtile image patch of the third subtile 518c, the fourth image data subset 1232 includes a fourth subtile image patch of the fourth subtile 518d, and so on, up to an s-th image data subset 1242 that includes an s-th subtile image patch of the s-th subtile 518s.

The first specialist signal profiler 744a is configured to maximize the signal-to-noise ratio of the intensity data in the first image data subset 1202 and produce a signal-to-noise ratio maximized version 1204 of the first image data subset 1202. The second specialist signal profiler 744b is configured to maximize the signal-to-noise ratio of the intensity data in the second image data subset 1212 and produce a signal-to-noise ratio maximized version 1214 of the second image data subset 1212. The third specialist signal profiler 744c is configured to maximize the signal-to-noise ratio of the intensity data in the third image data subset 1222 and produce a signal-to-noise ratio maximized version 1224 of the third image data subset 1222. The fourth specialist signal profiler 744d is configured to maximize the signal-to-noise ratio of the intensity data in the fourth image data subset 1232 and produce a signal-to-noise ratio maximized version 1234 of the fourth image data subset 1232, and so on, up to a signal-to-noise ratio maximized version 1244 of the s-th image data subset 1242. The base caller 332 processes the signal-to-noise ratio maximized versions 1204, 1214, 1224, 1234, ..., 1244 and generates base calls 1208, 1218, 1228, 1238, ..., 1248.

Temporal Configuration-Specific Specialist Signal Profilers

FIG. 13 shows one implementation of respective/separate/different/independent specialist signal profilers for respective sub-series of sequencing cycles of a sequencing run 1300 that has a total of N sequencing cycles. A first specialist signal profiler 1312 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile M and is generated during sequencing cycles 1 to N1. A second specialist signal profiler 1314 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile M and is generated during sequencing cycles N1 + 1 to N2. A third specialist signal profiler 1318 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile M and is generated during sequencing cycles N2 + 1 to N. Other examples of temporal configurations include the sequencing cycles of a first read of the sequencing run (Read 1) and the sequencing cycles of a second read of the sequencing run (Read 2).

FIG. 14 shows one implementation of respective/separate/different/independent specialist signal profilers for combinations of different spatial configurations (e.g., different subtiles) and different temporal configurations (e.g., different sub-series of sequencing cycles). In one implementation, cluster classes 1410 are defined by the different spatial configurations (e.g., different subtiles). In one implementation, cluster subclasses 1412, 1414, 14 are defined by the different temporal configurations (e.g., different sub-series of sequencing cycles).

A first specialist signal profiler 1422 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile 518a and is generated during sequencing cycles 1 to N1. A second specialist signal profiler 1424 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile 518a and is generated during sequencing cycles N1 + 1 to N2. A third specialist signal profiler 1428 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile 518a and is generated during sequencing cycles N2 + 1 to N.

A fourth specialist signal profiler 1432 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile 518b and is generated during sequencing cycles 1 to N1. A fifth specialist signal profiler 1434 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile 518b and is generated during sequencing cycles N1 + 1 to N2. A sixth specialist signal profiler 1438 is configured to maximize the signal-to-noise ratio of intensity data that belongs to clusters located on subtile 518b and is generated during sequencing cycles N2 + 1 to N.
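Selecting a profiler for a combined spatial and temporal configuration amounts to a lookup keyed by subtile and cycle sub-series. The following is a minimal sketch under stated assumptions: the boundary values N1 = 50, N2 = 100, and N = 150 are hypothetical, and the function and table names are illustrative rather than part of the disclosure. The dictionary values are the profiler reference numerals used in FIG. 14.

```python
from bisect import bisect_left

# Hypothetical sub-series boundaries [N1, N2, N]; not taken from the source.
BOUNDARIES = [50, 100, 150]

# (subtile, sub-series index) -> profiler reference numeral.
PROFILERS = {
    ("518a", 0): 1422, ("518a", 1): 1424, ("518a", 2): 1428,
    ("518b", 0): 1432, ("518b", 1): 1434, ("518b", 2): 1438,
}

def stage_index(cycle, boundaries=BOUNDARIES):
    """Map a 1-based cycle to its sub-series: 1..N1 -> 0, N1+1..N2 -> 1, N2+1..N -> 2."""
    if not 1 <= cycle <= boundaries[-1]:
        raise ValueError(f"cycle {cycle} is outside 1..{boundaries[-1]}")
    return bisect_left(boundaries, cycle)

def select_profiler(subtile, cycle):
    """Return the profiler assigned to this subtile and cycle."""
    return PROFILERS[(subtile, stage_index(cycle))]

print(select_profiler("518a", 51))
```

Keying on the pair keeps the spatial and temporal configurations independent, so adding a subtile or a sub-series only adds table entries.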

Cluster/Well-Specific Specialist Signal Profilers

FIG. 15 shows one implementation of respective/separate/different/independent specialist signal profilers for each cluster/well sequenced during a sequencing run. Clusters/wells on the flow cell can be identified in advance by their location coordinates. These location coordinates can be used during training to train per-cluster/well specialist signal profilers on cluster/well intensity data, and during inference to apply the trained per-cluster/well specialist signal profilers to cluster/well intensity data. In FIG. 15, the cluster/well population 1502 has N clusters/wells, and therefore the specialist signal profilers 1508 include N respective per-cluster/well specialist signal profilers. In other implementations, as discussed above, different per-cluster/well specialist signal profilers can also be trained for different temporal configurations.

Offline Training

FIG. 16 shows one implementation of offline training of specialist signal profilers on sequenced data from one or more completed/already executed sequencing runs, and application of the trained specialist signal profilers to sequenced data from an ongoing sequencing run. Training data 1612 is generated during a training stage 1602. The training data 1612 includes sequenced data from the one or more completed/already executed sequencing runs.

Segmentation logic 1622 segments the training data 1612 based on one or more configurations selected from different spatial configurations, temporal configurations, signal distribution configurations, or any combination thereof. The result is segmented training data 1632 with configuration-specific training data subsets 1 to N. For example, the training data 1612 can include K images of a tile from K imaging cycles of a completed/already executed sequencing run, with each tile image having multiple color channels. FIG. 17 shows an example of a tile image with C color channels. In this case, the segmentation logic 1622 logically partitions each tile image of the training data 1612 into subtile images by specifying pixel ranges. For example, a first subtile image of the tile can span pixels 1 to 500, and a second subtile image of the tile can span pixels 501 to 1000. The pixel ranges can be defined using fiducial markers, as discussed above.
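Logical partitioning by pixel range can be sketched in a few lines. This is an illustration under simplifying assumptions, not the segmentation logic 1622 itself: each color channel is modeled as a flat list of pixel values, the ranges are 1-based and inclusive as in the example above, and the function name `segment_tile` is hypothetical.

```python
def segment_tile(tile, pixel_ranges):
    """Split a multi-channel tile image into subtile images.

    tile: list of C channels, each a flat list of pixel values.
    pixel_ranges: list of (first, last) pairs, 1-based and inclusive.
    Returns one subtile per range, each holding the same C channels.
    """
    return [
        [channel[first - 1:last] for channel in tile]
        for first, last in pixel_ranges
    ]

# A toy two-channel tile with 1000 pixels per channel.
tile = [list(range(1000)), list(range(1000, 2000))]
subtiles = segment_tile(tile, [(1, 500), (501, 1000)])
print(len(subtiles), len(subtiles[0][0]), len(subtiles[1][0]))
```

Because the partition is defined purely by ranges, the same range list can be reused for every cycle and for both training and inference data.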

Offline training logic 1642 trains respective/separate/different/independent specialist signal profilers 1 to N on the respective configuration-specific training data subsets 1 to N. The result is trained specialist signal profilers 1 to N. Returning to the example of FIG. 17, a corresponding specialist signal profiler is trained to maximize the signal-to-noise ratio of each subtile image of the training data 1612.

Inference data 1618 is generated during an inference stage 1608. The inference data 1618 includes sequenced data from the ongoing sequencing run (e.g., the first i cycles of the ongoing sequencing run).

The segmentation logic 1622 segments the inference data 1618 based on the same one or more configurations that were used to segment the training data 1612 during the training stage 1602. The result is segmented inference data 1638 with configuration-specific inference data subsets 1 to N. For example, the inference data 1618 can include K images of a tile from K imaging cycles of the ongoing sequencing run, with each tile image having multiple color channels. Returning to the example of FIG. 17, the segmentation logic 1622 logically partitions each tile image of the inference data 1618 into subtile images by specifying the same pixel ranges that were used to partition the training data 1612.

Runtime logic 1648 applies the respective trained specialist signal profilers 1 to N 1658 to the respective configuration-specific inference data subsets 1 to N. Returning to the example of FIG. 17, a corresponding trained specialist signal profiler is applied to maximize the signal-to-noise ratio of each subtile image in the segmented inference data 1638.
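The offline flow (segment the training data, train one profiler per subset, segment the inference data identically, apply profiler i to subset i) can be sketched as follows. The `OffsetProfiler` class is a deliberately trivial placeholder that learns and subtracts a mean offset; it is an assumption made for illustration and does not represent the profiler model of the disclosure or claim to maximize signal-to-noise ratio. The pixel ranges and data values are likewise hypothetical.

```python
from statistics import mean

class OffsetProfiler:
    """Placeholder profiler: learns a mean offset and subtracts it."""

    def fit(self, values):
        self.offset = mean(values)
        return self

    def apply(self, values):
        return [v - self.offset for v in values]

def segment(data, ranges):
    """Split a flat list using 1-based inclusive (first, last) ranges."""
    return [data[first - 1:last] for first, last in ranges]

RANGES = [(1, 4), (5, 8)]  # the same configuration is used in both stages

# Training stage: data from a completed run.
training_data = [10, 10, 12, 12, 100, 100, 104, 104]
profilers = [OffsetProfiler().fit(s) for s in segment(training_data, RANGES)]

# Inference stage: data from an ongoing run, segmented the same way.
inference_data = [11, 13, 11, 13, 102, 102, 106, 106]
corrected = [
    p.apply(s) for p, s in zip(profilers, segment(inference_data, RANGES))
]
print(corrected)
```

The point of the sketch is the pairing: profiler i only ever sees subset i, in training and in inference, which is why the segmentation configuration has to be identical across the two stages.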

Online Training

FIG. 18 shows one implementation of online training of specialist signal profilers on sequenced data from earlier sequencing cycles of an ongoing sequencing run, and application of the trained specialist signal profilers to sequenced data from later sequencing cycles of the ongoing sequencing run. Inference data 1812 is generated during an inference stage 1802. The inference data 1812 includes sequenced data from earlier sequencing cycles (e.g., cycles 1 to N1) of the ongoing sequencing run.

The segmentation logic 1622 segments the inference data 1812 based on one or more configurations selected from different spatial configurations, temporal configurations, signal distribution configurations, or any combination thereof. The result is segmented inference data 1832 with configuration-specific training data subsets 1 to N. For example, the inference data 1812 can include N1 images of a tile from N1 initial sequencing cycles of the ongoing sequencing run, with each tile image having multiple color channels. FIG. 17 shows an example of a tile image with C color channels. In this case, the segmentation logic 1622 logically partitions each tile image of the inference data 1812 into subtile images by specifying pixel ranges. For example, a first subtile image of the tile can span pixels 1 to 500, and a second subtile image of the tile can span pixels 501 to 1000. The pixel ranges can be defined using fiducial markers, as discussed above.

Online training logic 1842 trains respective/separate/different/independent specialist signal profilers 1 to N on the respective configuration-specific training data subsets 1 to N. The result is trained specialist signal profilers 1 to N. Returning to the example of FIG. 17, a corresponding specialist signal profiler is trained to maximize the signal-to-noise ratio of each subtile image of the inference data 1812.

Inference data 1818 is also generated during the inference stage 1802. The inference data 1818 includes sequenced data from later sequencing cycles (e.g., cycles N1 + 1 to N2) of the ongoing sequencing run.

The segmentation logic 1622 segments the inference data 1818 based on the same one or more configurations that were used to segment the inference data 1812 from the earlier sequencing cycles (e.g., cycles 1 to N1) of the ongoing sequencing run. The result is segmented inference data 1838 with configuration-specific inference data subsets 1 to N. For example, the inference data 1818 can include N2 images of a tile from N2 later sequencing cycles of the ongoing sequencing run, with each tile image having multiple color channels. Returning to the example of FIG. 17, the segmentation logic 1622 logically partitions each tile image of the inference data 1818 into subtile images by specifying the same pixel ranges that were used to partition the inference data 1812.

The runtime logic 1648 applies the respective trained specialist signal profilers 1 to N 1858 to the respective configuration-specific inference data subsets 1 to N. Returning to the example of FIG. 17, a corresponding trained specialist signal profiler is applied to maximize the signal-to-noise ratio of each subtile image in the segmented inference data 1838.

In some implementations, the training process is repeated iteratively: the respective trained specialist signal profilers 1 to N 1858 are retrained/further trained on the segmented inference data 1838 from the later sequencing cycles of the ongoing sequencing run (e.g., cycles N1 + 1 to N2), and are then applied to segmented inference data from still later sequencing cycles of the ongoing sequencing run (e.g., cycles N2 + 1 to N3).

Control logic (not shown) can iterate the following at each successive sequencing cycle of the ongoing sequencing run, or at successive sub-series of sequencing cycles of the ongoing sequencing run (e.g., every 10 or 20 sequencing cycles): (i) segment a current batch of image data based on one or more configurations selected from different spatial configurations, temporal configurations, signal distribution configurations, or any combination thereof, (ii) retrain the respective trained specialist signal profilers 1 to N on the segmented current batch of image data, (iii) segment a next batch of image data on the same basis as the current batch of image data, and (iv) apply the respective retrained specialist signal profilers 1 to N to the segmented next batch of image data.
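Steps (i) to (iv) form a loop over consecutive batches. The sketch below illustrates only that control flow; `RunningOffsetProfiler` is a hypothetical stand-in that keeps a running mean and subtracts it, chosen because it can be updated incrementally, and is not the profiler model of the disclosure. Batch contents and ranges are made-up values.

```python
class RunningOffsetProfiler:
    """Placeholder profiler with an incrementally updated mean offset."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def retrain(self, values):
        self.count += len(values)
        self.total += sum(values)

    def apply(self, values):
        offset = self.total / self.count
        return [v - offset for v in values]

def segment(batch, ranges):
    """Split a flat list using 1-based inclusive (first, last) ranges."""
    return [batch[first - 1:last] for first, last in ranges]

def online_run(batches, ranges):
    """Retrain on each batch, then apply to the batch that follows it."""
    profilers = [RunningOffsetProfiler() for _ in ranges]
    outputs = []
    for current, following in zip(batches, batches[1:]):
        # (i) segment the current batch, (ii) retrain each profiler on it
        for profiler, subset in zip(profilers, segment(current, ranges)):
            profiler.retrain(subset)
        # (iii) segment the next batch on the same basis, (iv) apply
        next_subsets = segment(following, ranges)
        outputs.append(
            [p.apply(s) for p, s in zip(profilers, next_subsets)]
        )
    return outputs

batches = [[2, 2, 10, 10], [4, 4, 12, 12], [6, 6, 14, 14]]
results = online_run(batches, [(1, 2), (3, 4)])
print(results)
```

Each batch is used twice: once as the target of profilers trained on earlier data, and once as additional training data for the batch after it.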

Signal Distribution Configuration-Specific Specialist Signal Profilers

FIG. 19 shows one implementation of training respective/separate/different/independent specialist signal profilers for respective signal distributions observed in sequenced data. In some implementations, the respective signal distributions can be observed in sequenced data from an offline/already executed sequencing run. In other implementations, the respective signal distributions can also be observed in sequenced data from an online/ongoing sequencing run (e.g., in the first 10 sequencing cycles of the ongoing sequencing run).

FIG. 20 shows an example of a signal distribution/signal profile/cluster intensity profile. The cluster intensity profile depicted in FIG. 20 follows a decay pattern in which the cluster signal is strongest at the cluster center and attenuates with distance from the cluster center. Subpopulations/groups/sets of clusters 1904 in a cluster population can have similar signal distributions. Clusters that share the same or a similar signal distribution can be bucketed together (e.g., by grouping/addressing the clusters by their location coordinates), so that a specialist signal profiler can be trained for the group/set of clusters that exhibits that signal distribution. Unlike spatial grouping, in which spatially adjacent clusters are grouped, signal distribution-based grouping can group non-adjacent clusters. For example, edge clusters on opposite edges of a tile can have similar signal distributions and can be grouped (e.g., by grouping/addressing the clusters by their location coordinates) so that their intensity data is corrected by the same specialist signal profiler.

As used herein, the phrase "similar signal distributions" refers to signal distributions that share substantially overlapping signal patterns. For example, two signal patterns that have a similar shape (e.g., trapezoidal) but different sizes (e.g., one larger trapezoid and one smaller trapezoid) can be considered to have similar signal distributions. Similarly, two signal patterns whose centers or centroids coincide to within, for example, 1 to 5 units in each dimension can be considered to have similar signal distributions.
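One way to picture signal distribution-based bucketing is to compare peak-normalized intensity profiles, so that patterns with the same shape but different sizes fall into the same bucket regardless of where the clusters sit on the tile. The sketch below is an assumption-laden illustration: the similarity test (maximum pointwise difference after peak normalization), the tolerance of 0.1, the greedy bucketing, and all names are hypothetical and are not taken from the disclosure.

```python
def normalize(profile):
    """Scale a profile so its peak is 1, removing size differences."""
    peak = max(profile)
    return [v / peak for v in profile]

def similar(p, q, tol=0.1):
    """True if two peak-normalized profiles differ by at most tol everywhere."""
    return all(abs(a - b) <= tol for a, b in zip(normalize(p), normalize(q)))

def bucket_clusters(profiles, tol=0.1):
    """Greedily group cluster ids whose profiles match a bucket's first member.

    profiles: dict mapping cluster id -> 1-D intensity profile.
    Returns a list of buckets, each a list of cluster ids.
    """
    buckets = []  # (representative profile, member ids)
    for cluster_id, profile in profiles.items():
        for representative, members in buckets:
            if similar(representative, profile, tol):
                members.append(cluster_id)
                break
        else:
            buckets.append((profile, [cluster_id]))
    return [members for _, members in buckets]

profiles = {
    "left_edge": [1, 4, 1],    # sharp peak
    "center": [3, 4, 3],       # broad peak
    "right_edge": [2, 8, 2],   # same shape as left_edge, twice the size
}
groups = bucket_clusters(profiles)
print(groups)
```

Here the two edge clusters share a bucket even though they are not adjacent, and each bucket would then be assigned its own profiler.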

In FIG. 19, respective/separate/different/independent specialist signal profilers 1 to N 1908 are trained to maximize the signal-to-noise ratio of respective signal distributions 1 to N 1902 that correspond to respective cluster sets 1 to N 1904. Of course, different signal distributions can be observed at different sequencing cycles, and therefore different specialist signal profilers can be trained for, and configured for application to, different temporal stages of an ongoing sequencing run.

As used herein, the phrase "different temporal stages of an ongoing sequencing run" refers to different sequencing cycles, or different groups of sequencing cycles, of the ongoing sequencing run. For example, if a sequencing run has 150 sequencing cycles, each successive sequencing cycle can be considered a different temporal stage, or groups of sequencing cycles such as cycles 1 to 20, cycles 20 to 40, cycles 40 to 70, and so on can be considered different temporal stages.

Processing pipeline

Figure 21 illustrates one implementation of a processing pipeline that implements the technology disclosed. The processing pipeline can be implemented by the real-time analysis module 225. According to one implementation, the processing pipeline runs on a cycle-by-cycle basis 2100 and repeats 2102 for each new cycle. In one implementation, the inputs to the processing pipeline are tile images with a first (green) channel and a second (blue) channel.

At operation 2113, a template image is generated that identifies the locations of clusters on the tile using sequencing images from some number of initial sequencing cycles, called template cycles. The template image is used as a reference for the subsequent registration and intensity extraction steps. The template image is formed by detecting and merging bright spots in each sequencing image of the template cycles, which in turn involves sharpening the sequencing image 2114 (e.g., using a Laplacian convolution), determining an "on" threshold by a spatially segregated Otsu approach, and subsequent five-pixel local maximum detection with subpixel position interpolation. The phrase "on threshold" can refer to an intensity value that exceeds a preset value, for example 200 or 320, such that the intensity value is detected as being greater than a background intensity value or a noise intensity value.
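The thresholding and local-maximum steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single global Otsu threshold rather than the spatially segregated variant, it interprets "five-pixel" detection as a cross of a center pixel and its four neighbors, and it omits sharpening and subpixel interpolation. The function names are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the integer threshold t that maximizes between-class variance
    for the two classes (v <= t) and (v > t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # summed intensity of pixels at or below t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def local_maxima(img, threshold):
    """Return (row, col) of interior pixels above `threshold` that are strictly
    brighter than their four direct neighbors (assumed five-pixel cross)."""
    peaks = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            v = img[r][c]
            neighbors = (img[r - 1][c], img[r + 1][c], img[r][c - 1], img[r][c + 1])
            if v > threshold and all(v > n for n in neighbors):
                peaks.append((r, c))
    return peaks


# Toy 5x5 image: uniform background of 10 with one bright spot of 200.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 200
t_on = otsu_threshold([v for row in image for v in row])
spots = local_maxima(image, t_on)
```

In a real pipeline the detected spots from all template cycles would then be merged into the template image.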

The processing pipeline then registers the current sequencing image against the template image. This is achieved by using image correlation to align the current sequencing image to the template image on a sub-region, or by using a non-linear transformation (e.g., a full six-parameter linear affine transformation).
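For reference, a six-parameter affine transformation maps a template location (x, y) to image coordinates using a 2x2 matrix and a translation. The sketch below only shows how such a transform is applied; the parameter ordering and the name `apply_affine` are assumptions made for illustration, and estimating the parameters from image correlation is not shown.

```python
def apply_affine(params, x, y):
    """Apply a six-parameter affine map to the point (x, y).

    params = (a, b, tx, c, d, ty) encodes
        x' = a * x + b * y + tx
        y' = c * x + d * y + ty
    """
    a, b, tx, c, d, ty = params
    return (a * x + b * y + tx, c * x + d * y + ty)


# Hypothetical registration result: slight scaling plus a shift of (1.5, -0.5).
params = (1.01, 0.0, 1.5, 0.0, 1.01, -0.5)
registered = [apply_affine(params, x, y) for (x, y) in [(0.0, 0.0), (100.0, 200.0)]]
```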

At operation 2115, the processing pipeline applies a non-linear distortion to each spot to account for optical distortions caused, for example, by the geometry of the optical lens. The non-linear distortion can be applied as channel-dependent coefficients of third-order polynomials.
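One way to picture a channel-dependent third-order polynomial is as a per-channel cubic evaluated on a spot coordinate. The sketch below is an assumption-laden illustration: the coefficient values, the choice to apply the polynomial to a single coordinate, and the names `cubic`, `CHANNEL_COEFFS`, and `warp_x` are all hypothetical.

```python
def cubic(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + c3*x^3 using Horner's scheme."""
    c0, c1, c2, c3 = coeffs
    return ((c3 * x + c2) * x + c1) * x + c0


# Hypothetical per-channel coefficients (c0, c1, c2, c3); illustrative only.
CHANNEL_COEFFS = {
    "green": (0.0, 1.0, 0.0, 1e-9),
    "blue": (0.1, 1.0, 0.0, 2e-9),
}


def warp_x(channel, x):
    """Map an undistorted x coordinate to its distorted position for a channel."""
    return cubic(CHANNEL_COEFFS[channel], x)
```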

At operation 2116, the processing pipeline segments the tile images into subtile images based on one or more configurations selected from different spatial configurations, temporal configurations, signal distribution configurations, or any combination thereof.
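A purely spatial segmentation can be sketched as cutting the tile into a regular grid. This is a minimal illustration assuming a rows-by-columns grid (for example 3x3 or 4x4 subtiles); the function name `split_into_subtiles` is hypothetical, and temporal or signal-distribution-based segmentation is not shown.

```python
def split_into_subtiles(tile, rows, cols):
    """Split a 2D list `tile` into rows * cols subtiles, in row-major order.

    Boundaries use integer division, so subtiles differ by at most one pixel
    in each dimension when the tile size is not an exact multiple.
    """
    height, width = len(tile), len(tile[0])
    subtiles = []
    for i in range(rows):
        r0, r1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            c0, c1 = j * width // cols, (j + 1) * width // cols
            subtiles.append([row[c0:c1] for row in tile[r0:r1]])
    return subtiles


# Toy 4x4 tile whose pixel values encode their position.
tile = [[r * 4 + c for c in range(4)] for r in range(4)]
subtiles = split_into_subtiles(tile, 2, 2)
```

Each subtile would then be handed to its own specialist signal profiler for intensity extraction.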

At operation 2118, intensities are extracted from the segmented subtile images using the corresponding specialist signal profilers 1 to N.

At operation 2123, the subtile intensities are spatially normalized, for example by making the 90th percentile of the extracted intensities equal.

At operation 2124, the subtile intensities are compressed.

At operation 2125, the processing pipeline applies an empirical phasing correction to compensate for noise in the image data caused by phasing and prephasing errors.

At operation 2125, the processing pipeline spatially normalizes the extracted signal intensities to account for variations in illumination across the sampled image. For example, the intensity values can be normalized such that the 5th and 95th percentiles have values of 0 and 1, respectively. The normalized signal intensities for an image (e.g., the normalized intensities for each channel) can be used to calculate the mean chastity for a plurality of spots in the image.
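The percentile-based normalization described above amounts to an affine rescaling of the intensities. The following sketch assumes linear interpolation between order statistics when computing percentiles; the helper names `percentile` and `normalize` are hypothetical.

```python
def percentile(sorted_vals, p):
    """Percentile p (0 to 100) of an ascending list, with linear interpolation."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def normalize(intensities, low=5, high=95):
    """Rescale so the `low` percentile maps to 0 and the `high` percentile to 1."""
    ordered = sorted(intensities)
    p_low, p_high = percentile(ordered, low), percentile(ordered, high)
    if p_high == p_low:
        raise ValueError("percentiles coincide; cannot normalize")
    return [(v - p_low) / (p_high - p_low) for v in intensities]


# Toy intensities 0..100: the 5th percentile is 5 and the 95th is 95.
normalized = normalize(list(range(101)))
```

Values outside the 5th to 95th percentile range fall below 0 or above 1 and are not clipped in this sketch.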

At operation 2133, the processing pipeline scales the intensities on a per-cluster basis to account for variations in the brightness of the clusters.

At operation 2134, the processing pipeline uses the expectation maximization (EM) algorithm to generate base calls, as discussed above.

At operation 2135, the processing pipeline assigns quality scores to the called bases using quality tables (Q-tables) 2152.

At operation 2136, the processing pipeline aligns the called bases to a reference genome (e.g., that of the PhiX bacteriophage).

The processing pipeline generates certain outputs, such as base calls and quality scores 2128, InterOp files 2138 (binary reporting files for sequencing analysis viewing), and logs 2148 (e.g., error logs, general event logs, processing event logs, and warning event logs).

Other configurations

Other examples of configurations covered by this invention include partitioning the sequencing data, and training the corresponding specialist signal profilers, by library type, sample type, indexing type (first index read vs. second index read), read type (forward read vs. reverse read), physical properties of the sample, noise type (e.g., bubbles), and reagent type.

Performance results - technical effects and advantages as objective indicia of non-obviousness and inventive step

Figure 33 shows how the cost function for a specialist signal profiler improves with each iteration of gradient descent. The cost function (or loss function) measures the performance of the model on the given data. In Figure 33, the different colored lines correspond to the number of subtiles into which the tile is divided, such that, for each subtile, a specialist signal profiler is adapted/trained/configured/updated independently. As the number of subtiles increases from 1 to 16 (4x4 subtiles), the number of fitting parameters in the specialist signal profilers also increases. As a result, the cost function is lower. The cost function is the sum of squared Euclidean distances between the well intensities and the base call centroids over a sample of wells (clusters). Improving this cost function also improves base calling accuracy.
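The cost function and its minimization by gradient descent can be sketched with a deliberately tiny model. In this illustration the "profiler" is reduced to a single scalar gain applied to two-channel raw intensities, which is an assumption made only to keep the example short; an actual profiler would have many more fitting parameters. The names `cost`, `gradient`, and `fit_gain` are hypothetical.

```python
def cost(gain, raw, centroids):
    """Sum of squared Euclidean distances between gain-scaled well intensities
    and their assigned base call centroids."""
    return sum(
        (gain * x - cx) ** 2 + (gain * y - cy) ** 2
        for (x, y), (cx, cy) in zip(raw, centroids)
    )


def gradient(gain, raw, centroids):
    """Derivative of `cost` with respect to the scalar gain."""
    return sum(
        2 * (gain * x - cx) * x + 2 * (gain * y - cy) * y
        for (x, y), (cx, cy) in zip(raw, centroids)
    )


def fit_gain(raw, centroids, gain=1.0, lr=0.01, steps=50):
    """Run plain gradient descent and record the cost after every iteration."""
    history = [cost(gain, raw, centroids)]
    for _ in range(steps):
        gain -= lr * gradient(gain, raw, centroids)
        history.append(cost(gain, raw, centroids))
    return gain, history


# Toy data: each centroid sits at exactly twice the raw intensity,
# so the optimal gain is 2.
raw = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
centroids = [(2.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
fitted_gain, history = fit_gain(raw, centroids)
```

The recorded `history` plays the role of one curve in a cost-versus-iteration plot; fitting one such model per subtile would give one set of parameters per subtile.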

Figure 34 is a plot showing the initial and final values of the cost function of Figure 33 when the specialist signal profiler is adapted/trained/configured/updated at each sequencing cycle. At each successive sequencing cycle, we started with the same initial specialist signal profiler and adapted it using gradient descent. The plot shows that the specialist signal profiler can be adapted/trained/configured/updated at any sequencing cycle.

Figures 35A and 35B show the improvement in primary analysis metrics for a sequencing run when the specialist signal profiler is adapted/trained/configured/updated. In this case, the average PhiX error rate improves from 0.3520% to 0.3316%.

Figures 36A and 36B show two plots that evaluate the number of subtiles into which a sequencing tile can be divided for adaptive equalization by respective specialist signal profilers. For each subtile, a separate specialist signal profiler is adapted/trained/configured/updated using the wells from the corresponding subtile. Using more subtiles allows spatially varying phenomena within a tile to be modeled more accurately, which is why the error rate and Q30 improve considerably as the number of subtiles increases from 1 to 9. However, the number of wells available for adapting the model decreases as the subtiles become smaller. This is why there is a trade-off in picking the right number of subtiles. In this particular case, increasing the number of subtiles from 9 to 16 slightly degrades the error rate and Q30.

Computer system

Figure 37 shows an example computer system 3700 that can be used to implement the technology disclosed. Computer system 3700 includes at least one central processing unit (CPU) 3772 that communicates with a number of peripheral devices via bus subsystem 3755. These peripheral devices can include a storage subsystem 3710 including, for example, memory devices and a file storage subsystem 3736, user interface input devices 3738, user interface output devices 3776, and a network interface subsystem 3774. The input and output devices allow user interaction with computer system 3700. Network interface subsystem 3774 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems.

In one implementation, the specialist signal profiler 3718 is communicably linked to the storage subsystem 3710 and the user interface input devices 3738.

User interface input devices 3738 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into computer system 3700.

User interface output devices 3776 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display, such as audio output devices. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from computer system 3700 to the user or to another machine or computer system.

Storage subsystem 3710 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by processors 3778.

Processors 3778 can be graphics processing units (GPUs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and/or coarse-grained reconfigurable architectures (CGRAs). Processors 3778 can be hosted by a deep learning cloud platform such as Google Cloud Platform™, Xilinx™, and Cirrascale™. Examples of processors 3778 include Google's Tensor Processing Unit (TPU)™, rackmount solutions such as GX4 Rackmount Series™ and GX37 Rackmount Series™, NVIDIA DGX-1™, Microsoft's Stratix V FPGA™, Graphcore's Intelligent Processor Unit (IPU)™, Qualcomm's Zeroth Platform™ with Snapdragon processors™, NVIDIA's Volta™, NVIDIA's DRIVE PX™, NVIDIA's JETSON TX1/TX2 MODULE™, Intel's Nirvana™, Movidius VPU™, Fujitsu DPI™, ARM's DynamicIQ™, IBM TrueNorth™, Lambda GPU Server with Tesla V100s™, and others.

Memory subsystem 3722 used in the storage subsystem 3710 can include a number of memories, including a main random access memory (RAM) 3732 for storage of instructions and data during program execution and a read-only memory (ROM) 3734 in which fixed instructions are stored. File storage subsystem 3736 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by file storage subsystem 3736 in the storage subsystem 3710, or in other machines accessible by the processor.

Bus subsystem 3755 provides a mechanism for letting the various components and subsystems of computer system 3700 communicate with each other as intended. Although bus subsystem 3755 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple buses.

Computer system 3700 itself can be of varying types, including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of computer system 3700 depicted in Figure 37 is intended only as a specific example for purposes of illustrating the preferred implementations of the present invention. Many other configurations of computer system 3700 are possible, having more or fewer components than the computer system depicted in Figure 37.

Neural network-based base caller

The following discussion focuses on a neural network-based base caller, described herein, that can be used together with the specialist signal profilers. First, the input to the neural network-based base caller is described, according to one implementation. Then, examples of the structure and form of the neural network-based base caller are provided. Finally, the output of the neural network-based base caller is described, according to one implementation.

Data flow logic provides sequencing images to the neural network-based base caller for base calling. The neural network-based base caller accesses the sequencing images on a patch-by-patch (or tile-by-tile) basis. Each of the patches is a sub-grid (or sub-array) of pixelated units in the grid of pixelated units that forms the sequencing images. The patches have dimensions q x r of the sub-grid of pixelated units, where q (width) and r (height) are any numbers ranging from 1 to 10,000 (e.g., 3 x 3, 5 x 5, 7 x 7, 10 x 10, 15 x 15, 25 x 25, 64 x 64, 78 x 78, 115 x 115). In some implementations, q and r are the same. In other implementations, q and r are different. In some implementations, the patches extracted from a sequencing image are of the same size. In other implementations, the patches are of different sizes. In some implementations, the patches can have overlapping pixelated units (e.g., on the edges).

Sequencing produces m sequencing images per sequencing cycle for corresponding m image channels. That is, each of the sequencing images has one or more image (or intensity) channels (analogous to the red, green, blue (RGB) channels of a color image). In one implementation, each image channel corresponds to one of a plurality of filter wavelength bands. In another implementation, each image channel corresponds to one of a plurality of imaging events at a sequencing cycle. In yet another implementation, each image channel corresponds to a combination of illumination with a specific laser and imaging through a specific optical filter. The image patches are tiled (or accessed) from each of the m image channels for a particular sequencing cycle. In different implementations, such as 4-, 2-, and 1-channel chemistries, m is 4 or 2. In other implementations, m is 1, 3, or greater than 4. In other implementations, the images can be in blue and violet color channels instead of, or in addition to, the red and green color channels.

For example, consider that a sequencing run is implemented using two different image channels: a blue channel and a green channel. Then, at each sequencing cycle, the sequencing run produces a blue image and a green image. In this way, for a series of k sequencing cycles of the sequencing run, a sequence of k pairs of blue and green images is produced as output and stored as the sequencing images. Accordingly, a sequence of k pairs of blue and green image patches is generated for patch-level processing by the neural network-based base caller.

The input image data to the neural network-based base caller for a single iteration of base calling (or a single instance of a forward pass or forward traversal) comprises data for a sliding window of multiple sequencing cycles. The sliding window can include, for example, a current sequencing cycle, one or more preceding sequencing cycles, and one or more successive sequencing cycles.

In one implementation, the input image data comprises data for three sequencing cycles, such that data for a current (time t) sequencing cycle to be base called is accompanied by (i) data for a left flanking/context/previous/preceding/prior (time t-1) sequencing cycle and (ii) data for a right flanking/context/next/successive/subsequent (time t+1) sequencing cycle.

In another implementation, the input image data comprises data for five sequencing cycles, such that data for a current (time t) sequencing cycle to be base called is accompanied by (i) data for a first left flanking/context/previous/preceding/prior (time t-1) sequencing cycle, (ii) data for a second left flanking/context/previous/preceding/prior (time t-2) sequencing cycle, (iii) data for a first right flanking/context/next/successive/subsequent (time t+1) sequencing cycle, and (iv) data for a second right flanking/context/next/successive/subsequent (time t+2) sequencing cycle.

In yet another implementation, the input image data comprises data for seven sequencing cycles, such that data for a current (time t) sequencing cycle to be base called is accompanied by (i) data for a first left flanking/context/previous/preceding/prior (time t-1) sequencing cycle, (ii) data for a second left flanking/context/previous/preceding/prior (time t-2) sequencing cycle, (iii) data for a third left flanking/context/previous/preceding/prior (time t-3) sequencing cycle, (iv) data for a first right flanking/context/next/successive/subsequent (time t+1) sequencing cycle, (v) data for a second right flanking/context/next/successive/subsequent (time t+2) sequencing cycle, and (vi) data for a third right flanking/context/next/successive/subsequent (time t+3) sequencing cycle. In other implementations, the input image data comprises data for a single sequencing cycle. In yet other implementations, the input image data comprises data for 10, 15, 20, 30, 58, 75, 92, 130, 168, 175, 209, 225, 230, 275, 318, 325, 330, 525, or 625 sequencing cycles.
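The three-, five-, and seven-cycle inputs described above are all instances of a symmetric window around the current cycle. The sketch below illustrates selecting the cycles in such a window; clipping the window at the start and end of the run is an assumption of this example (the text does not state how edge cycles are handled), and the name `cycle_window` is hypothetical.

```python
def cycle_window(current, flank, num_cycles):
    """Return the cycles t-flank .. t+flank around `current`, clipped to the
    valid range 1..num_cycles (edge handling is an assumption)."""
    return [
        t
        for t in range(current - flank, current + flank + 1)
        if 1 <= t <= num_cycles
    ]


# flank=1 gives three cycles, flank=2 gives five, flank=3 gives seven.
three = cycle_window(10, 1, 150)
five = cycle_window(10, 2, 150)
seven = cycle_window(10, 3, 150)
```

The image patches for the returned cycles, across all m channels, would together form the input for one forward pass.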

The neural network-based base caller, according to one implementation, processes the image patches through its convolution layers and produces an alternative representation. The alternative representation is then used by an output layer (e.g., a softmax layer) to generate a base call either for just the current (time t) sequencing cycle or for each of the sequencing cycles, i.e., the current (time t) sequencing cycle, the first and second preceding (time t-1, time t-2) sequencing cycles, and the first and second succeeding (time t+1, time t+2) sequencing cycles. The resulting base calls form the sequencing reads.
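As a small illustration of a softmax output layer, the sketch below converts four per-base scores into probabilities and picks the most likely base. The base ordering "ACGT" and the names `softmax` and `call_base` are assumptions for this example; the scores stand in for whatever the final layer of the network produces.

```python
import math

BASES = "ACGT"  # assumed ordering of the four output units


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(logits):
    """Return the most probable base and its probability."""
    probs = softmax(logits)
    best = max(range(len(BASES)), key=probs.__getitem__)
    return BASES[best], probs[best]


base, confidence = call_base([0.1, 2.0, 0.3, -1.0])
```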

In one implementation, the neural network-based base caller outputs a base call for a single target cluster for a particular sequencing cycle. In another implementation, the neural network-based base caller outputs a base call for each target cluster in a plurality of target clusters for the particular sequencing cycle. In yet another implementation, the neural network-based base caller outputs a base call for each target cluster in a plurality of target clusters for each sequencing cycle in a plurality of sequencing cycles, thereby producing a base call sequence for each target cluster.

In one implementation, the neural network-based base caller is a multilayer perceptron (MLP). In another implementation, the neural network-based base caller is a feedforward neural network. In yet another implementation, the neural network-based base caller is a fully connected neural network. In a further implementation, the neural network-based base caller is a fully convolutional neural network. In yet a further implementation, the neural network-based base caller is a semantic segmentation neural network. In yet another further implementation, the neural network-based base caller is a generative adversarial network (GAN).

In one implementation, the neural network-based base caller is a convolutional neural network (CNN) with a plurality of convolution layers. In another implementation, the neural network-based base caller is a recurrent neural network (RNN) such as a long short-term memory network (LSTM), a bi-directional LSTM (Bi-LSTM), or a gated recurrent unit (GRU). In yet another implementation, the neural network-based base caller includes both a CNN and an RNN.

In yet other implementations, the neural network-based base caller can use 1D convolutions, 2D convolutions, 3D convolutions, 4D convolutions, 5D convolutions, dilated or atrous convolutions, transpose convolutions, depthwise separable convolutions, pointwise convolutions, 1 x 1 convolutions, group convolutions, flattened convolutions, spatial and cross-channel convolutions, shuffled grouped convolutions, spatially separable convolutions, and deconvolutions. The neural network-based base caller can use one or more loss functions, such as logistic regression/log loss, multi-class cross-entropy/softmax loss, binary cross-entropy loss, mean-squared error loss, L1 loss, L2 loss, smooth L1 loss, and Huber loss. The neural network-based base caller can use any parallelism, efficiency, and compression schemes, such as TFRecords, compressed encoding (e.g., PNG), sharding, parallel calls for map transformation, batching, prefetching, model parallelism, data parallelism, and synchronous/asynchronous stochastic gradient descent (SGD). The neural network-based base caller can include upsampling layers, downsampling layers, recurrent connections, gates and gated memory units (e.g., an LSTM or GRU), residual blocks, residual connections, highway connections, skip connections, peephole connections, activation functions (e.g., non-linear transformation functions such as rectifying linear unit (ReLU), leaky ReLU, exponential linear unit (ELU), sigmoid, and hyperbolic tangent (tanh)), batch normalization layers, regularization layers, dropout, pooling layers (e.g., max or average pooling), global average pooling layers, and attention mechanisms.

The neural network-based base caller is trained using backpropagation-based gradient update techniques. Example gradient descent techniques that can be used to train the neural network-based base caller include stochastic gradient descent, batch gradient descent, and mini-batch gradient descent. Some examples of gradient descent optimization algorithms that can be used to train the neural network-based base caller are Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam, AdaMax, Nadam, and AMSGrad.
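By way of illustration only, mini-batch gradient descent with Momentum, two of the techniques named above, can be sketched as follows. The sketch fits a two-parameter linear model rather than a neural network, and the function name `minibatch_sgd_momentum` and all hyperparameter values are assumptions chosen for the example, not values taken from this disclosure.

```python
import random


def minibatch_sgd_momentum(xs, ys, lr=0.05, beta=0.9,
                           batch_size=4, epochs=300, seed=0):
    """Fit y ~ w * x + b by mini-batch gradient descent with momentum.

    Hypothetical helper for illustration; minimizes mean squared error.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0      # model parameters
    vw, vb = 0.0, 0.0    # momentum (velocity) terms
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new mini-batch composition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]   # d(err^2)/dw
                grad_b += 2.0 * err           # d(err^2)/db
            grad_w /= len(batch)
            grad_b /= len(batch)
            # Momentum: decay the previous velocity, add the new gradient step.
            vw = beta * vw - lr * grad_w
            vb = beta * vb - lr * grad_b
            w += vw
            b += vb
    return w, b


if __name__ == "__main__":
    xs = [i / 7 for i in range(8)]
    ys = [2.0 * x + 1.0 for x in xs]
    print(minibatch_sgd_momentum(xs, ys))
```

Setting `batch_size` to 1 gives stochastic gradient descent and setting it to `len(xs)` gives batch gradient descent, so the same loop covers the three gradient descent techniques listed above.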

In one implementation, the neural network-based base caller uses a specialized architecture to segregate the processing of data for different sequencing cycles. The motivation for using the specialized architecture is described first. As discussed above, the neural network-based base caller processes image patches for a current sequencing cycle, one or more preceding sequencing cycles, and one or more succeeding sequencing cycles. Data for the additional sequencing cycles provides sequence-specific context. The neural network-based base caller learns the sequence-specific context during training and applies it when base calling. Furthermore, data for the preceding and succeeding sequencing cycles provides the second-order contribution of pre-phasing and phasing signals to the current sequencing cycle.

However, images captured at different sequencing cycles and in different image channels are misaligned and have residual registration error with respect to each other. To account for this misalignment, the specialized architecture comprises spatial convolution layers that do not mix information between sequencing cycles and mix information only within a sequencing cycle.

The spatial convolution layers (or spatial logic) use so-called "segregated convolutions" that operationalize the segregation by independently processing data for each of a plurality of sequencing cycles through a "dedicated, non-shared" sequence of convolutions. The segregated convolutions convolve over the data and resulting feature maps of only a given sequencing cycle, i.e., intra-cycle, without convolving over the data and resulting feature maps of any other sequencing cycle.

For example, consider that the input image data comprises (i) a current image patch for the current (time t) sequencing cycle to be base called, (ii) a previous image patch for the previous (time t-1) sequencing cycle, and (iii) a next image patch for the next (time t+1) sequencing cycle. The specialized architecture then initiates three separate convolution pipelines, namely, a current convolution pipeline, a previous convolution pipeline, and a next convolution pipeline. The current data processing pipeline receives as input the current image patch for the current (time t) sequencing cycle and independently processes it through a plurality of spatial convolution layers to produce a so-called "current spatially convolved representation" as the output of the final spatial convolution layer. The previous convolution pipeline receives as input the previous image patch for the previous (time t-1) sequencing cycle and independently processes it through the plurality of spatial convolution layers to produce a so-called "previous spatially convolved representation" as the output of the final spatial convolution layer. The next convolution pipeline receives as input the next image patch for the next (time t+1) sequencing cycle and independently processes it through the plurality of spatial convolution layers to produce a so-called "next spatially convolved representation" as the output of the final spatial convolution layer.
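By way of illustration only, the three segregated pipelines described above can be sketched as follows. The names `conv2d_relu`, `run_pipeline`, and `segregated_spatial_convolutions` are hypothetical, each patch is treated as a single-channel 2D grid, and giving each pipeline its own kernel list is an assumption made for the example (the same kernels could equally be passed to all three pipelines). The point of the sketch is only that no operation reads data from more than one sequencing cycle.

```python
def conv2d_relu(image, kernel):
    """Zero-padded 2D cross-correlation followed by ReLU (output keeps the input shape)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # skip the zero padding
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = max(acc, 0.0)
    return out


def run_pipeline(patch, kernels):
    """Pass one cycle's patch through a sequence of spatial convolution layers."""
    representation = patch
    for kernel in kernels:
        representation = conv2d_relu(representation, kernel)
    return representation


def segregated_spatial_convolutions(patches, pipelines):
    """Process each cycle's patch independently (intra-cycle only).

    patches:   dict mapping a cycle name to its 2D image patch
    pipelines: dict mapping the same cycle name to its list of 2D kernels
    """
    return {cycle: run_pipeline(patches[cycle], pipelines[cycle])
            for cycle in patches}


if __name__ == "__main__":
    blur = [[1.0 / 9.0] * 3 for _ in range(3)]
    patches = {
        "previous": [[1.0, 2.0], [3.0, 4.0]],   # time t-1
        "current": [[5.0, 6.0], [7.0, 8.0]],    # time t
        "next": [[0.0, 1.0], [1.0, 0.0]],       # time t+1
    }
    pipelines = {name: [blur, blur] for name in patches}
    for name, rep in segregated_spatial_convolutions(patches, pipelines).items():
        print(name, rep)
```

Because each pipeline touches only its own patch, changing the previous or next patch leaves the current spatially convolved representation unchanged, and the three pipelines can be run in parallel.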

In some implementations, the current, previous, and next convolution pipelines are executed in parallel. In some implementations, the spatial convolution layers are part of a spatial convolution network (or subnetwork) within the specialized architecture.

The neural network-based base caller further comprises temporal convolution layers (or temporal logic) that mix information between sequencing cycles, i.e., inter-cycle. The temporal convolution layers receive their inputs from the spatial convolution network and operate on the spatially convolved representations produced by the final spatial convolution layer for the respective data processing pipelines.

The inter-cycle operability freedom of the temporal convolution layers stems from the fact that the misalignment property, which exists in the image data fed as input to the spatial convolution network, is purged out of the spatially convolved representations by the stack, or cascade, of segregated convolutions performed by the sequence of spatial convolution layers.

The temporal convolution layers use so-called "combinatory convolutions" that groupwise convolve over input channels in successive inputs on a sliding window basis. In one implementation, the successive inputs are successive outputs produced by a previous spatial convolution layer or a previous temporal convolution layer.

In some implementations, the temporal convolution layers are part of a temporal convolution network (or subnetwork) within the specialized architecture. The temporal convolution network receives its inputs from the spatial convolution network. In one implementation, a first temporal convolution layer of the temporal convolution network groupwise combines the spatially convolved representations between the sequencing cycles. In another implementation, subsequent temporal convolution layers of the temporal convolution network combine successive outputs of previous temporal convolution layers. The output of the final temporal convolution layer is fed to an output layer that produces an output. The output is used to base call one or more clusters at one or more sequencing cycles.
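By way of illustration only, the sliding-window combination performed by the temporal convolution layers can be sketched as follows. The function name `temporal_layer` is hypothetical, and the sketch makes two simplifying assumptions: each representation is a single-channel 2D grid, and the temporal kernel has a 1 x 1 spatial extent, so that each window of successive representations is mixed element-wise with one weight per window position and then passed through ReLU.

```python
def temporal_layer(representations, weights):
    """Combine successive representations on a sliding-window basis.

    representations: list of 2D grids ordered by sequencing cycle
    weights:         one mixing weight per position in the window
    Returns one combined grid per window position.
    """
    window_size = len(weights)
    outputs = []
    for start in range(len(representations) - window_size + 1):
        window = representations[start:start + window_size]
        rows, cols = len(window[0]), len(window[0][0])
        mixed = [
            [
                max(sum(wt * rep[r][c] for wt, rep in zip(weights, window)), 0.0)
                for c in range(cols)
            ]
            for r in range(rows)
        ]
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    # Spatially convolved representations for cycles t-1, t, and t+1.
    spatial = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
    first = temporal_layer(spatial, [0.5, 0.5])   # windows (t-1, t) and (t, t+1)
    final = temporal_layer(first, [0.5, 0.5])     # combines the two outputs above
    print(first)   # [[[2.0, 3.0]], [[4.0, 5.0]]]
    print(final)   # [[[3.0, 4.0]]]
```

With three input cycles and a window of two, the first layer yields two combined representations and the second layer reduces them to one, which corresponds to the representation that would be fed to the output layer for the current sequencing cycle.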

Additional details about the neural network-based base caller can be found in U.S. Provisional Patent Application No. 62/821,766, titled "Artificial Intelligence-Based Sequencing," filed on March 21, 2019 (Attorney Docket No. ILLM 1008-9/IP-1752-PRV), which is incorporated herein by reference.

Items

The following items are part of this disclosure:

1. A system, comprising:

memory storing a plurality of specialist signal profilers, wherein each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize a signal-to-noise ratio of sequenced signals in a particular signal profile detected for analytes in a particular analyte class and characterized in a particular training data set; and

runtime logic, having access to the memory, configured to execute a base calling operation by applying respective specialist signal profilers in the plurality of specialist signal profilers to sequenced signals in respective signal profiles detected for analytes in respective analyte classes during the base calling operation.

2. The system of item 1, wherein the respective analyte classes represent different spatial configurations of the analytes that contribute to the generation of the respective signal profiles during the base calling operation.

3. The system of item 2, wherein the different spatial configurations include analytes located on different surfaces of a biosensor on which the base calling operation is executed.

4. The system of item 3, wherein the different surfaces include a top surface and a bottom surface.

5. The system of any one of items 2 to 4, wherein the different spatial configurations include analytes located on different lanes of the biosensor.

6. The system of any one of items 2 to 5, wherein the different spatial configurations include analytes located on different lane groups of the biosensor.

7. The system of item 6, wherein the different lane groups include top peripheral lanes, central lanes, and bottom peripheral lanes.

8. The system of item 6 or item 7, wherein the different lane groups include edge lanes and non-edge lanes.

9. The system of any one of items 2 to 8, wherein the different spatial configurations include analytes located on different swaths of the different lanes of the biosensor.

10. The system of item 9, wherein the different swaths include top peripheral swaths, central swaths, and bottom peripheral swaths.

11. The system of item 9 or item 10, wherein the different swaths include edge swaths and central swaths.

12. The system of any one of items 9 to 11, wherein the different spatial configurations include analytes located on different tiles of the different swaths of the different lanes of the biosensor.

13. The system of any one of items 2 to 12, wherein the different spatial configurations include analytes located on different tile groups of the biosensor.

14. The system of item 13, wherein the different tile groups include edge tiles, center tiles, and near-edge tiles.

15. The system of any one of items 12 to 14, wherein the different spatial configurations include analytes located on different subtiles of the different tiles of the different swaths of the different lanes of the biosensor.

16. The system of any one of items 2 to 15, wherein the different spatial configurations include analytes located on different sections of the biosensor.

17. The system of item 16, wherein the different sections include an upper right section, an upper center section, an upper left section, a middle right section, a center section, a middle left section, a lower left section, a lower center section, and a lower left section.

18. The system of any one of items 1 to 17, wherein each specialist signal profiler is further trained to maximize the signal-to-noise ratio of sequenced signals in a particular signal profile detected for analytes in a particular analyte subclass and characterized in a particular training data subset, and

wherein the runtime logic is further configured to execute the base calling operation by applying the respective specialist signal profilers to sequenced signals in respective signal profiles detected for analytes in respective analyte subclasses during the base calling operation.

19. The system of item 18, wherein the respective analyte subclasses represent different spatial configurations of the analytes that generated the sequenced signals at different time periods of the base calling operation, and wherein different combinations of the different spatial configurations and the different time periods contribute to the generation of the detected respective signal profiles during the base calling operation.

20. The system of item 19, wherein the different time periods correspond to different sensing cycles in a series of sensing cycles of the base calling operation.

21. The system of item 19 or item 20, wherein the different time periods correspond to different subseries of sensing cycles in the series of sensing cycles of the base calling operation.

22. The system of any one of items 1 to 21, wherein each specialist signal profiler comprises channel-specific equalizers, and each channel-specific equalizer has a plurality of convolution kernels.

23. The system of any one of items 1 to 22, wherein the runtime logic is further configured to iteratively train the respective specialist signal profilers during the base calling operation.

24. The system of item 23, wherein, for a current training iteration, the runtime logic is further configured to implement expectation maximization that iteratively maximizes, on a per-channel basis, the likelihood of observing signal distributions and per-base signal centroids that best fit the sequenced signals detected so far during the base calling operation, determine, on a per-channel basis, signal-to-noise-ratio-maximized sequenced signals in response to applying the respective specialist signal profilers to the sequenced signals, call bases based on the signal-to-noise-ratio-maximized sequenced signals, determine, on a per-channel basis, a base calling error by comparing the signal-to-noise-ratio-maximized sequenced signals with the signal centroids of the called bases, and

update, on a per-channel basis, convolution kernel coefficients of the respective specialist signal profilers based on the base calling error.

25. The system of any one of items 1 to 24, wherein the analytes correspond to wells when the biosensor is a patterned biosensor.

26. The system of any one of items 1 to 25, wherein the sequenced signals are intensity signals.

27. The system of any one of items 1 to 25, wherein the sequenced signals are voltage signals.

28. The system of any one of items 1 to 25, wherein the sequenced signals are intensity signals.

29. A system, comprising:

memory storing initially sequenced signals detected during initial sequencing cycles of a sequencing run;

fitting logic, having access to the memory, configured to fit a plurality of signal distributions on the initially sequenced signals and to store the plurality of signal distributions in the memory;

online training logic, having access to the memory, configured to train respective specialist signal profilers in a plurality of specialist signal profilers to maximize the signal-to-noise ratio of respective signal distributions in the plurality of signal distributions, and to store the trained respective specialist signal profilers in the memory; and

runtime logic, having access to the memory, configured to uniquely map subsequently sequenced signals, detected during subsequent sequencing cycles of the sequencing run, to the respective signal distributions, and to apply the trained respective specialist signal profilers to the subsequently sequenced signals based on the unique mapping to the respective signal distributions to generate base calls for the subsequent sequencing cycles.

30. The system of item 29, wherein at least some of the signal distributions in the plurality of signal distributions represent different underlying sequencing events that contribute to the generation of those signal distributions.

31. The system of item 30, wherein the underlying sequencing events include the formation of bubbles on a biosensor on which the sequencing run is executed.

32. The system of any one of items 29 to 31, wherein at least some of the signal distributions in the plurality of signal distributions represent different analyte locations on the biosensor that contribute to the generation of those signal distributions.

33. The system of any one of items 29 to 32, wherein at least some of the signal distributions in the plurality of signal distributions represent different sequencing cycles of the sequencing run that contribute to the generation of those signal distributions.

34. The system of any one of items 29 to 33, wherein the trained respective specialist signal profilers have respective sets of variation coefficients corresponding to the respective signal distributions.

35. The system of item 34, wherein each set of variation coefficients includes channel-specific amplification coefficients that correct scale variations in the corresponding signal distribution.

36. The system of item 34 or item 35, wherein each set of variation coefficients includes channel-specific offset coefficients that correct shift variations in the corresponding signal distribution.

37. The system of any one of items 29 to 36, further configured to comprise control logic configured to repeat execution of the fitting logic, the online training logic, and the runtime logic at each sequencing cycle of the sequencing run.

38. The system of any one of items 29 to 37, further configured to comprise control logic configured to repeat execution of the fitting logic, the online training logic, and the runtime logic after a certain number of sequencing cycles of the sequencing run.

39. The system of any one of items 29 to 38, further configured to comprise control logic configured to repeat execution of the fitting logic, the online training logic, and the runtime logic at certain predetermined sequencing cycles of the sequencing run.

40. A system, comprising:

memory storing a plurality of specialist signal profilers configured for use in a base calling operation, wherein respective specialist signal profilers in the plurality of specialist signal profilers are trained to maximize the signal-to-noise ratio of sensor data in respective signal distributions observed at respective sequencing events of the base calling operation and characterized in respective training data sets; and

runtime logic, having access to the memory, configured to select a specialist signal profiler from the plurality of specialist signal profilers based on a subject sequencing event that generated subject sensor data, and to apply the selected specialist signal profiler on the subject sensor data to generate base call classification data for the subject sequencing event.

41. The system of item 40, wherein the signal-to-noise ratio of the sensor data generated by the respective sequencing events degrades with temporal progression of the base calling operation, and wherein the respective specialist signal profilers, selected based on the respective sequencing events and applied throughout the temporal progression of the base calling operation, reverse the degradation of the signal-to-noise ratio of the sensor data.

42. The system of item 41, wherein the respective sequencing events are the temporal progression of the base calling operation through respective sensing cycles in a series of sensing cycles of the base calling operation.

43. The system of item 42, wherein the respective sequencing events are the temporal progression of the base calling operation through respective subseries of sensing cycles in the series of sensing cycles.

44. The system of item 41, wherein the respective sequencing events are a spatial progression of the base calling operation through respective analyte locations on a biosensor on which the base calling operation is executed.

45. The system of item 40, wherein the runtime logic is further configured to iteratively train the respective specialist signal profilers during the sequencing run at the respective sequencing events.

46. The system of item 45, wherein, for a current training iteration, the runtime logic is further configured to implement expectation maximization that iteratively maximizes, on a per-channel basis, the likelihood of observing signal distributions and per-base signal centroids that best fit the sensor data, determine, on a per-channel basis, signal-to-noise-ratio-maximized sensor data in response to applying the respective specialist signal profilers to the sensor data, call bases based on the signal-to-noise-ratio-maximized sensor data, determine, on a per-channel basis, a base calling error by comparing the signal-to-noise-ratio-maximized sensor data with the signal centroids of the called bases, and update, on a per-channel basis, convolution kernel coefficients of the respective specialist signal profilers based on the base calling error.

47. A system, comprising:

memory storing initially sequenced signals detected during initial sequencing cycles of a sequencing run for a population of analytes;

fitting logic, having access to the memory, configured to process the initially sequenced signals on an analyte-by-analyte basis, fit a respective signal profile for each analyte in the population of analytes, and store the respective signal profiles in the memory;

online training logic, having access to the memory, configured to train respective specialist signal profilers in a plurality of specialist signal profilers to maximize the signal-to-noise ratio of the respective signal profiles fitted for the respective analytes, and to store the trained respective specialist signal profilers in the memory; and

runtime logic, having access to the memory, configured to uniquely map, on an analyte-by-analyte basis, subsequently sequenced signals, detected during subsequent sequencing cycles of the sequencing run, to the respective signal profiles, and to apply the trained respective specialist signal profilers to the subsequently sequenced signals based on the unique mapping to the respective signal profiles to generate, on an analyte-by-analyte basis, base calls for the subsequent sequencing cycles.

48. The system of item 47, wherein the trained respective specialist signal profilers have respective sets of variation coefficients corresponding to the respective signal profiles.

49. The system of item 48, wherein each set of variation coefficients includes channel-specific amplification coefficients that correct scale variations in the corresponding signal profile.

50. The system of item 48, wherein each set of variation coefficients includes channel-specific offset coefficients that correct shift variations in the corresponding signal profile.

51. The system of item 47, further configured to comprise control logic configured to repeat execution of the fitting logic, the online training logic, and the runtime logic at each sequencing cycle of the sequencing run.

52. The system of item 51, further configured to comprise control logic configured to repeat execution of the fitting logic, the online training logic, and the runtime logic after a certain number of sequencing cycles of the sequencing run.

53. The system of item 51, further configured to comprise control logic configured to repeat execution of the fitting logic, the online training logic, and the runtime logic at certain predetermined sequencing cycles of the sequencing run.

54. A system, comprising:

a memory storing a plurality of specialist signal profilers, wherein each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize the signal-to-noise ratio of sequenced signals in a particular signal profile detected for a particular analyte and characterized in a particular training data set; and

runtime logic that accesses the memory and is configured to execute a base calling operation by applying respective specialist signal profilers in the plurality of specialist signal profilers to sequenced signals in respective signal profiles detected for respective analytes during the base calling operation.

55. A system, comprising:

spatial classification logic configured to segment an analyte population into spatial classes using analyte locations, wherein each spatial class includes a non-overlapping set of analytes culled from the analyte population;

signal profiling logic configured to estimate one or more signal profiles for each spatial class using sequencing signals detected for the analyte population, wherein each signal profile includes a non-overlapping subset of analytes culled from the non-overlapping set of analytes;

online training logic configured to train at least one specialist signal profiler to maximize the signal-to-noise ratio of each signal profile estimated for each spatial class; and

runtime logic configured to base call the analytes in the respective non-overlapping subsets of analytes using the respective trained specialist signal profilers.

56. The system of item 55, wherein the online training logic is further configured to match the training data used to train a particular specialist signal profiler to the sequencing signals detected for a particular non-overlapping subset of analytes.

57. The system of item 55 or item 56, further configured to merge adjacent non-overlapping subsets of analytes, when particular non-overlapping subsets of analytes lack adequate analytes, to make adequate training data available.
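The merging of adjacent subsets that lack adequate analytes can be sketched as a greedy pass over an ordered list of subsets. This is a minimal sketch under stated assumptions: adjacency is modeled as neighboring positions in the list, adequacy is modeled as a minimum analyte count, and the name `merge_small_subsets` and the threshold `min_analytes` are hypothetical.

```python
from typing import List


def merge_small_subsets(subsets: List[List[str]], min_analytes: int) -> List[List[str]]:
    """Merge neighboring subsets until each merged group has at least `min_analytes`."""
    merged: List[List[str]] = []
    for subset in subsets:
        if merged and len(merged[-1]) < min_analytes:
            # The previous group is still too small, so it absorbs its right neighbor.
            merged[-1] = merged[-1] + list(subset)
        else:
            merged.append(list(subset))
    # A trailing group may still be too small; fold it into its left neighbor.
    if len(merged) > 1 and len(merged[-1]) < min_analytes:
        tail = merged.pop()
        merged[-1] = merged[-1] + tail
    return merged


# Illustrative analyte identifiers only.
groups = merge_small_subsets(
    [["a1", "a2", "a3"], ["b1"], ["c1"], ["d1", "d2", "d3"], ["e1"]],
    min_analytes=2,
)
```

Because groups are only ever concatenated, the merged subsets remain non-overlapping and every analyte is retained.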

58. The system of any one of items 55 to 57, wherein the number of analytes in each non-overlapping subset of analytes is configurable to optimize training.

59. The system of any one of items 55 to 58, wherein the online training logic is further configured to estimate a set of variation coefficients for each trained specialist signal profiler.

60. The system of item 59, wherein the set of variation coefficients includes channel-specific amplification coefficients that correct scale variations in the corresponding signal profile.

61. The system of item 59 or item 60, wherein the set of variation coefficients includes channel-specific offset coefficients that correct shift variations in the corresponding signal profile.

62. The system of any one of items 55 to 61, further comprising control logic configured to repeat execution of the signal profiling logic, the online training logic, and the runtime logic at each sequencing cycle of a sequencing run.

63. The system of any one of items 55 to 62, further comprising control logic configured to repeat execution of the signal profiling logic, the online training logic, and the runtime logic after a predetermined number of sequencing cycles of the sequencing run.

64. The system of any one of items 55 to 63, further comprising control logic configured to repeat execution of the signal profiling logic, the online training logic, and the runtime logic at certain predetermined sequencing cycles of the sequencing run.

Claims

1. A system, comprising:
a memory storing a plurality of specialist signal profilers, wherein each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize the signal-to-noise ratio of sequenced signals in a particular signal profile detected for analytes in a particular analyte class and characterized in a particular training data set; and
runtime logic that accesses the memory and is configured to execute a base calling operation by applying respective specialist signal profilers in the plurality of specialist signal profilers to sequenced signals in respective signal profiles detected for analytes in respective analyte classes during the base calling operation.

2. The system of claim 1, wherein each analyte class represents a different spatial configuration of the analytes that contributes to the generation of the respective signal profile during the base calling operation.

3. The system of claim 2, wherein the different spatial configurations include analytes located on different surfaces of the biosensor on which the base calling operation is performed.

4. The system of claim 3, wherein the different surfaces include an upper surface and a lower surface.

5. The system of any one of claims 2-4, wherein the different spatial configurations include analytes located on different lanes of the biosensor.

6. The system of any one of claims 2-5, wherein the different spatial configurations comprise analytes located on different lane groups of the biosensor.

7. The system of claim 6, wherein the different lane groups include an upper perimeter lane, a central lane, and a lower perimeter lane.

8. The system of claim 6 or 7, wherein the different lane groups include edge lanes and non-edge lanes.

9. The system of any one of claims 2-8, wherein the different spatial configurations include analytes located on different swaths of the different lanes of the biosensor.

10. The system of claim 9, wherein the different swaths include a top perimeter swath, a center swath, and a bottom perimeter swath.

11. The system of claim 9 or 10, wherein the different swaths include an edge swath and a center swath.

12. The system of any one of claims 9-11, wherein the different spatial configurations include analytes located on different tiles of the different swaths of the different lanes of the biosensor.

13. The system of any one of claims 2-12, wherein the different spatial configurations include analytes located on different tile groups of the biosensor.

14. The system of claim 13, wherein the different tile groups include edge tiles, center tiles, and near-edge tiles.

15. The system of any one of claims 12-14, wherein the different spatial configurations include analytes located on different subtiles of the different tiles of the different swaths of the different lanes of the biosensor.

16. The system of any one of claims 2-15, wherein the different spatial configurations include analytes located on different sections of the biosensor.

17. The system of claim 16, wherein the different sections include an upper right section, an upper center section, an upper left section, a middle right section, a center section, a middle left section, a lower right section, a lower center section, and a lower left section.
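One way to assign an analyte to one of nine sections of a biosensor is to divide the sensor area into a 3 x 3 grid, as in the sketch below. The function name `section_of`, the rectangular bounds, the equal-sized grid cells, and the image-style coordinate convention (x grows rightward, y grows downward) are assumptions for illustration; the claim names the sections but does not define their boundaries.

```python
def section_of(x: float, y: float, width: float, height: float) -> str:
    """Return the name of the 3 x 3 grid section containing location (x, y)."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("analyte location lies outside the biosensor bounds")
    col = min(int(3 * x / width), 2)   # 0 = left, 1 = center, 2 = right
    row = min(int(3 * y / height), 2)  # 0 = upper, 1 = middle, 2 = lower
    if row == 1 and col == 1:
        return "center"
    rows = ("upper", "middle", "lower")
    cols = ("left", "center", "right")
    return f"{rows[row]} {cols[col]}"


# Illustrative 300 x 300 sensor area; the dimensions are made up.
labels = [section_of(x, y, 300, 300) for x, y in [(0, 0), (150, 150), (250, 150), (299, 299)]]
```

Each analyte location falls in exactly one section, so the resulting classes are non-overlapping.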

18. The system of any one of claims 1 to 17, wherein each specialist signal profiler is further trained to maximize the signal-to-noise ratio of sequenced signals in a particular signal profile detected for analytes in a particular analyte subclass, and
wherein the runtime logic is further configured to execute the base calling operation by applying the respective specialist signal profilers to the sequenced signals in the respective signal profiles detected for the analytes in respective analyte subclasses during the base calling operation.

19. The system of claim 18, wherein each analyte subclass represents a different spatial configuration of the analytes that produced the sequenced signals at a different time period of the base calling operation, and wherein different combinations of the different spatial configurations and the different time periods contribute to the generation of the respective detected signal profiles during the base calling operation.

20. The system of claim 19, wherein the different time periods correspond to different sensing cycles in the series of sensing cycles of the base calling operation.

21. The system of claim 19 or 20, wherein the different time periods correspond to different subseries of sensing cycles in the series of sensing cycles of the base calling operation.

22. The system of any one of claims 1 to 21, wherein each specialist signal profiler consists of a channel-specific equalizer, and each channel-specific equalizer has a plurality of convolution kernels.

23. The system of any preceding claim, wherein the runtime logic is further configured to iteratively train each specialist signal profiler during the base calling operation.

24. The system of claim 23, wherein, for a current training iteration, the runtime logic is further configured to:
implement expectation maximization that iteratively maximizes the likelihood of observing, on a channel-by-channel basis, the signal distributions and base-by-base signal centroids that best fit the sequenced signals detected so far during the base calling operation;
determine, on a channel-by-channel basis, signal-to-noise-ratio-maximized sequenced signals in response to applying the respective specialist signal profilers to the sequenced signals;
call bases based on the signal-to-noise-ratio-maximized sequenced signals;
determine a base calling error for each channel by comparing the signal-to-noise-ratio-maximized sequenced signals to the signal centroids of the called bases; and
update convolution kernel coefficients of the respective specialist signal profilers on a channel-by-channel basis based on the base calling errors.
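The error-driven update of convolution kernel coefficients can be sketched, for a single channel, as a decision-directed least-mean-squares (LMS) step. The LMS rule, the one-dimensional three-tap kernel, the learning rate, and the four illustrative centroid values are assumptions made for this example and are not stated in the claims. The expectation-maximization step is not implemented here; the centroids are simply taken as given, as if already estimated.

```python
from typing import Dict, List, Sequence, Tuple


def equalize(kernel: Sequence[float], window: Sequence[float]) -> float:
    """Apply the kernel to a window of signals (target analyte plus neighbors)."""
    return sum(k * s for k, s in zip(kernel, window))


def nearest_centroid(value: float, centroids: Dict[str, float]) -> Tuple[str, float]:
    """Call the base whose centroid is closest to the equalized signal."""
    base = min(centroids, key=lambda b: abs(centroids[b] - value))
    return base, centroids[base]


def lms_step(
    kernel: Sequence[float],
    window: Sequence[float],
    centroids: Dict[str, float],
    learning_rate: float = 0.1,
) -> Tuple[List[float], str, float]:
    """One update: equalize, call a base, measure the error, adjust the kernel."""
    output = equalize(kernel, window)
    base, center = nearest_centroid(output, centroids)
    error = output - center  # base calling error for this channel
    new_kernel = [k - learning_rate * error * s for k, s in zip(kernel, window)]
    return new_kernel, base, error


# Made-up single-channel centroids and a window with neighbor crosstalk.
centroids = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
window = [0.2, 0.9, 0.1]
new_kernel, base, error = lms_step([0.0, 1.0, 0.0], window, centroids)
```

After one step the equalized signal for the same window lies closer to the centroid of the called base, which is the intended effect of updating the coefficients from the base calling error.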

25. The system of any one of claims 1-24, wherein the analytes correspond to wells when the biosensor is a patterned biosensor.

26. The system of any preceding claim, wherein the sequenced signal is an intensity signal.

27. The system of any preceding claim, wherein the sequenced signal is a voltage signal.

28. The system of any preceding claim, wherein the sequenced signal is a current signal.

29. A system, comprising:
a memory storing initially sequenced signals detected during initial sequencing cycles of a sequencing run;
fitting logic that accesses the memory and is configured to fit a plurality of signal distributions to the initially sequenced signals and to store the plurality of signal distributions in the memory;
online training logic that accesses the memory and is configured to train respective specialist signal profilers in a plurality of specialist signal profilers to maximize the signal-to-noise ratio of respective signal distributions in the plurality of signal distributions, and to store the respective trained specialist signal profilers in the memory; and
runtime logic that accesses the memory and is configured to uniquely map subsequently sequenced signals detected during subsequent sequencing cycles of the sequencing run to the respective signal distributions and, based on the unique mapping to the respective signal distributions, to apply the respective trained specialist signal profilers to the subsequently sequenced signals to generate base calls for the subsequent sequencing cycles.

30. The system of claim 29, wherein at least some of the signal distributions in the plurality of signal distributions represent different underlying sequencing events that contribute to the generation of those signal distributions.

31. A system, comprising:
a memory configured to store a plurality of specialist signal profilers configured for use in a base calling operation, wherein each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize the signal-to-noise ratio of sensor data in a respective signal distribution observed for a respective sequencing event of the base calling operation and characterized in a training data set; and
runtime logic that accesses the memory and is configured to select a specialist signal profiler from the plurality of specialist signal profilers based on a subject sequencing event that generated subject sensor data, and to apply the selected specialist signal profiler to the subject sensor data to generate base call classification data for the subject sequencing event.

32. A system, comprising:
a memory storing initially sequenced signals detected, on a per-analyte basis, during initial sequencing cycles of a sequencing run for an analyte population;
fitting logic that accesses the memory and is configured to process the initially sequenced signals on a per-analyte basis, fit respective signal profiles for respective analytes in the analyte population, and store the respective signal profiles in the memory;
online training logic that accesses the memory and is configured to train respective specialist signal profilers in a plurality of specialist signal profilers to maximize the signal-to-noise ratio of the respective signal profiles fitted for the respective analytes, and to store the respective trained specialist signal profilers in the memory; and
runtime logic that accesses the memory and is configured to uniquely map, on a per-analyte basis, subsequently sequenced signals detected during subsequent sequencing cycles of the sequencing run to the respective signal profiles and, based on the unique mapping to the respective signal profiles, to apply the respective trained specialist signal profilers to the subsequently sequenced signals to generate, on a per-analyte basis, base calls for the subsequent sequencing cycles.

33. A system, comprising:
a memory storing a plurality of specialist signal profilers, wherein each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize the signal-to-noise ratio of sequenced signals in a particular signal profile detected for a particular analyte and characterized in a particular training data set; and
runtime logic that accesses the memory and is configured to execute a base calling operation by applying respective specialist signal profilers in the plurality of specialist signal profilers to sequenced signals in respective signal profiles detected for respective analytes during the base calling operation.

34. A system, comprising:
spatial classification logic configured to segment an analyte population into spatial classes using analyte locations, wherein each spatial class includes a non-overlapping set of analytes culled from the analyte population;
signal profiling logic configured to estimate one or more signal profiles for each spatial class using sequencing signals detected for the analyte population, wherein each signal profile includes a non-overlapping subset of analytes culled from the non-overlapping set of analytes;
online training logic configured to train at least one specialist signal profiler to maximize the signal-to-noise ratio of each signal profile estimated for each spatial class; and
runtime logic configured to base call the analytes in the respective non-overlapping subsets of analytes using the respective trained specialist signal profilers.