WO2022132286A1 - Bloc - Google Patents

Bloc

Info

Publication number
WO2022132286A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample data
block
score
data
threshold value
Prior art date
Application number
PCT/US2021/053143
Other languages
English (en)
Inventor
Haley GRANT
Albert Kuo
Kamel LAHOUEL
Cristian TOMASETTI
Original Assignee
The Johns Hopkins University
Priority date
Filing date
Publication date
Application filed by The Johns Hopkins University
Publication of WO2022132286A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Definitions

  • the disclosed embodiments can be applied to any classifier that requires a high specificity.
  • the disclosed embodiments can be applied to screening a population with incidence-based rates of cancer for early cancer detection.
  • the disclosed embodiments can provide for ranking scores associated with data within blocks to classify that data with higher specificity. Absolute scores can be more sensitive to batch effects than relative rankings of same or similar scores within a block. Therefore, the disclosed embodiments provide for classifying data based on separating a population of data into relative block sizes and classifying the data within such blocks.
  • Embodiment 1 is a method for classifying data in blocks, the method comprising: receiving a population of sample data, determining block sizes relative to the sample data, grouping the sample data into blocks based on the determined block sizes, selecting a block, determining a desired specificity for the block, assigning a score to each of the sample data in the block, ranking the sample data according to the score associated with each of the sample data, identifying a positive threshold value for the block based on the desired specificity, and classifying the scored sample data based on the positive threshold value.
  • Embodiment 2 is the method of embodiment 1, wherein the positive threshold value is 1- specificity.
  • Embodiment 3 is the method of any one of embodiments 1 through 2, wherein classifying the sample data comprises assigning a classifier value of 1 for each sample data in the block where the score for the sample data exceeds the positive threshold value.
  • Embodiment 4 is the method of any one of embodiments 1 through 3, wherein classifying the sample data comprises assigning a classifier value of 0 for each sample data in the block where the score for the sample data is less than the positive threshold value.
  • Embodiment 5 is the method of any one of embodiments 1 through 4, wherein the classifier value of 1 is indicative of a positive call.
  • Embodiment 6 is the method of any one of embodiments 1 through 5, wherein the classifier value of 0 is indicative of a neutral call.
  • Embodiment 7 is the method of any one of embodiments 1 through 6, wherein the positive call is cancer.
  • Embodiment 8 is the method of any one of embodiments 1 through 7, wherein the neutral call is non-cancer.
  • Embodiment 9 is the method of any one of embodiments 1 through 8, further comprising adjusting at least one of the determined block sizes, the desired specificity, the positive threshold value, and a sensitivity based on classifying the sample data.
  • Embodiment 10 is the method of any one of embodiments 1 through 9, further comprising determining an outlier cutoff score for the sample data.
  • Embodiment 11 is the method of any one of embodiments 1 through 10, further comprising classifying the scored sample data as outliers based on the score of each sample data exceeding the outlier cutoff score.
  • Embodiment 12 is the method of any one of embodiments 1 through 11, further comprising: aggregating the sample data classified as outliers, ranking the aggregated sample data from highest score to lowest score, determining a difference between each ranked sample data and a next highest neighbor, identifying a largest difference, and classifying, in response to determining that an upper point associated with the largest difference is greater than the desired specificity, all sample data in the population based on the largest difference.
  • Embodiment 13 is the method of any one of embodiments 1 through 12, further comprising outputting classifier values for each of the sample data.
  • Embodiment 14 is the method of any one of embodiments 1 through 13, further comprising aggregating classifier values for the block, and outputting an aggregated value for the block.
  • Embodiment 15 is the method of any one of embodiments 1 through 14, further comprising selecting a second block, determining a second desired specificity for the second block, assigning a score to each of the sample data in the second block, ranking the sample data according to the score associated with each of the sample data, identifying a second positive threshold value for the second block based on the second desired specificity, and classifying the scored sample data based on the second positive threshold value.
  • Embodiment 16 is a system for classifying data in blocks, the system comprising one or more processors and computer memory storing instructions that, when executed by the processors, cause the processors to perform the method of any one of the embodiments 1 through 15.
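The method of embodiments 1 through 15 can be sketched in code. The following is a hypothetical illustration of the described flow, not the patented implementation; the function names, the block-grouping strategy (consecutive fixed-size blocks), and the percentile convention (type-7 linear interpolation, the R quantile() default mentioned later in this document) are all assumptions.

```python
def percentile(scores, q):
    """Type-7 (linear interpolation, the R quantile() default) percentile; q in [0, 1]."""
    s = sorted(scores)
    h = (len(s) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def classify_block(scores, specificity=0.99):
    """Call at most (1 - specificity) of a block's samples positive.

    The positive threshold is the specificity-th percentile of the block's
    own scores, so each call depends on a sample's rank within its block
    rather than on one absolute cutoff shared across the population.
    """
    threshold = percentile(scores, specificity)
    return [1 if s > threshold else 0 for s in scores]

def classify_population(sample_scores, block_size, specificity=0.99):
    """Group scored samples into consecutive blocks and classify each block."""
    calls = []
    for start in range(0, len(sample_scores), block_size):
        calls.extend(classify_block(sample_scores[start:start + block_size], specificity))
    return calls
```

With a block of 100 samples and a desired specificity of 99%, at most one sample per block receives a classifier value of 1 (a positive call); all others receive 0 (a neutral call).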
  • the devices, system, and techniques described herein may provide one or more of the following advantages.
  • the disclosed embodiments can provide for higher specificity and accuracy in classifying data.
  • Classifying data within a block, where the data in that block has same or similar scores can provide for more accurate classification, especially where prevalence of cases is low.
  • individual data scores can be more sensitive to batch effects or conditions impacting an entire population of data. Batch effects therefore can cause classification of individual data relative to the entire population of data to be less accurate and/or less specific.
  • the disclosed embodiments can be especially beneficial for improving accuracy in early detection of cancer within a population.
  • Relative rankings within blocks can provide for more stable calls than overall rankings since overall rankings can fluctuate over time based on random conditions, such as collecting data for a population.
  • the disclosed embodiments can provide for an a-priori guaranteed lower bound for specificity while often still maintaining, if not improving, sensitivity when using incidence-based data. This is advantageous when applying the disclosed embodiments to an unknown population of data that requires classifying and/or screening, as in a prospective study.
  • FIG.1 is a conceptual diagram of a system for binary classification of incidence-based data.
  • FIG.2 depicts a basic fixed threshold.
  • FIG.3A depicts a formula to define an outlier cutoff.
  • FIG.3B is a graphical depiction of the outlier cutoff.
  • FIGS.4A-B depict a flowchart of a process for determining the basic fixed threshold of FIG.2.
  • FIGS.5A-D depict a flowchart of a process for determining the outlier cutoff of FIGS.3A-B.
  • FIG.6 is a diagram of system components of the system of FIG.1.
  • FIG.7 is a flowchart of a process for binary classification of incidence-based data.
  • FIG.8 is a graphical depiction of incidence-based data broken up into blocks as described herein.
  • FIG.9 is an example table comparing use of the binary classification of incidence-based data described herein in a fragmentation use case.
  • DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
  • [0035] This document generally relates to binary classification of incidence-based data (e.g., sequencing data from normal and cancer samples), where prevalence of cases can be low. The disclosed embodiments can be applied to any classifier that requires a high specificity.
  • the disclosed embodiments can be applied to screening a population with incidence-based rates of cancer for early cancer detection or other early detection methodologies needing very high specificity.
  • the disclosed embodiments can provide for ranking scores associated with data within blocks to classify that data. Absolute scores can be more sensitive to batch effects than relative rankings of same or similar scores within a block. Therefore, the disclosed embodiments provide for classifying data based on separating a population of data into blocks of fixed or variable sizes and classifying the data within such blocks. [0036]
  • the disclosed embodiments provide for using ranking of scores of data within blocks to make calls about the data (e.g., cancer or normal) within each block rather than using an overall threshold across all blocks.
  • a notation call_i = I(sample i is classified as a case), where I(·) is an indicator function, can be used.
  • FIG.1 is a conceptual diagram of a system 100 for binary classification of incidence-based data.
  • a user computing device 102 can be in communication (e.g., wired, wireless) with a computer system 104 via a network 106.
  • a lab technician or other expert can provide input to the user computing device 102, such as a population of data 108.
  • the population of data 108 can be transmitted to the computer system 104.
  • the computer system 104 can score the data present in the population based on a scoring algorithm of choice. Following the scoring, the data can be separated into blocks (A). Each block can, e.g., include data collected in a same time period and/or whose analysis was performed in a same lab, using the same sequencing machine, etc., in order to better control for batch effects. The data within a block can then be ranked based on their score (B). A relative threshold value can be determined for the block (C). The relative threshold can be specific to the case scores of the data in the block. This can be advantageous to reduce the possibility of batch effects skewing the classification of each data point in the population of data. [0039] The computer system 104 can then determine classifier values for each data in the block (D).
  • This determination can be based on the relative threshold value identified for the block (C). For example, any data having a case score above the relative threshold value for the block can be assigned a classifier value of 1 (e.g., indicative of cancer). Any data having a case score below the relative threshold value for the block can be assigned a classifier value of 0 (e.g., indicative of normal). C-D can be repeated for each block that is generated around and for the population of data. [0040] Once all the data in the population of data receives a classifier value, classifier values can be outputted (E) as output of data classifier values 110. The output 110 can be displayed at the user computing device 102, the computer system 104, and/or any other display or computing device.
  • the output 110 can be easily understandable by the lab technician or expert to more accurately respond to case detection in a population of data. Moreover, because the data is analyzed relative to other similarly ranked data in blocks, case detection can be more accurate and less impacted or skewed by batch effects.
  • the output 110 can include indicators of which samples or patients in the population have a classifier value associated with cancer or some other detection concern. Providing output of just those samples having classifier values associated with cancer or another detection concern can be advantageous to improve early detection methodology and treatment.
  • the computer system 104 can classify samples as those samples are processed over time.
  • FIG.2 depicts a basic fixed threshold 200.
  • a predetermined proportion of positive calls to make in every block can be fixed. For example, to guarantee a-priori a 99% specificity, positive calls can be 1% of all samples in the block.
  • Determining the fixed threshold 200 can be repeated for each block of data scores in a population of data.
  • X% can be a desired specificity 202.
  • Formula 204 can be used to determine a corresponding threshold.
  • For a block j composed of n scores, {s_1, s_2, ... s_n}, calculate an X-th percentile of the n scores (e.g., using an R quantile() function with default settings).
  • Call this threshold s_X, which is a value of s such that: [0044] the proportion of scores satisfying s_i <= s_X is X/100.
  • The scores in the block can then be classified with call_i 206: [0045] call_i = 1 if s_i > s_X, and call_i = 0 otherwise.
  • This approach can guarantee a specificity of at least X%: in the worst case, where every call made is a false positive, still at most (100 - X)% of the normal samples in the block are called positive.
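As a numeric check of the worst-case claim: even if every positive call in a block turns out to be a false positive, at most (100 - X)% of samples are flagged, so specificity among the normals still comes out to at least X%. A minimal sketch, where the function name is an assumption:

```python
def worst_case_specificity(n_samples, x_percent):
    """Specificity lower bound for a block of all-normal samples when the
    top (100 - x_percent)% are called positive and are all false positives."""
    false_positives = int(n_samples * (100 - x_percent) / 100)
    true_negatives = n_samples - false_positives
    return 100.0 * true_negatives / n_samples
```

For a block of 100 all-normal samples at X = 99, one false positive is made and specificity is exactly 99%.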
  • FIG.3A depicts a formula 300 to define an outlier cutoff. Defining an outlier cutoff allows for variation in the number of cases called in each block. Defining the cutoff can provide for analyzing blocks with relatively low variability, which can allow for an entire block to be called normal. Variations on this approach can also allow for more than (100 - X)% of a given block to be classified as a case. [0047] As an example, suppose a block of 100 scores is obtained in which all 100 scores are equal.
  • the formula 300 provides for a more conservative approach to defining the outlier cutoff.
  • the formula 300 allows fewer than 1% of samples within a block to be classified as cases, while still capping positive calls at 1%. This cutoff of 1% can be changed to handle different data sets and depending on a lab technician's or other expert's tolerance for false positives.
  • For a block j composed of n scores, {s_1, s_2, ... s_n}, the scores can be ordered from highest to lowest.
  • A (reverse) order statistic can be defined as {s_(1), s_(2), ... s_(n)}, where s_(1) indicates the highest score and s_(n) indicates the lowest score in the block.
  • An outlier cutoff can then be defined. [0050] For the bottom 90% of scores, {s_(m), s_(m+1), ... s_(n)}, a standard deviation 302 can be calculated: [0051] sigma = sqrt( (1/(n - m)) * sum_{i=m..n} (s_(i) - mean_90)^2 ), where the sample mean 304 of the bottom 90% of the scores is: [0052] mean_90 = (1/(n - m + 1)) * sum_{i=m..n} s_(i). An outlier cutoff 306 can be defined: [0053] c = mean_all + 3 * sigma, where the sample average 308 of all scores in the block is: [0054] mean_all = (1/n) * sum_{i=1..n} s_i. Any score s_i can be called an outlier if s_i > c. [0055] If there are no outliers, all samples can be called normal. If outliers are present, then a set of outliers can be defined as: O = { i : s_(i) > c }. [0056] i_o can be a largest index in O (corresponding to the outlier with the lowest value).
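The outlier cutoff described above can be sketched as follows. This is a hypothetical reading of the description (cutoff = mean of all scores plus three standard deviations of the bottom 90% of scores); the function names and the exact indexing of "bottom 90%" are assumptions, and the population form of the standard deviation is used for simplicity.

```python
import statistics

def outlier_cutoff(scores):
    """c = mean(all scores) + 3 * sd(bottom 90% of scores).

    Using only the bottom 90% for the spread keeps a small cluster of very
    high scores (potential cases) from inflating the cutoff itself.
    """
    s = sorted(scores, reverse=True)       # reverse order statistics
    m = len(s) - int(0.9 * len(s))         # start index of the bottom 90%
    sigma = statistics.pstdev(s[m:])       # sd of the bottom 90% of scores
    return statistics.fmean(scores) + 3 * sigma

def find_outliers(scores):
    """Indices of scores strictly above the cutoff; empty means all-normal block."""
    c = outlier_cutoff(scores)
    return [i for i, v in enumerate(scores) if v > c]
```

In an all-equal block the cutoff equals the common score, no outlier exists, and every sample is called normal; a single extreme score, by contrast, is flagged as an outlier.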
  • a set of first differences can be defined as: [0057] d_i = s_(i) - s_(i+1), for i = 1, 2, ... n - 1.
  • another set can be defined as: [0058] D = {d_1, d_2, ... d_(i_o - 1)}.
  • This set can be a set of differences between each outlier point and its next highest neighbor. For example, the last outlier can be excluded because a difference between the smallest outlier and the highest non-outlier may not be desired in some implementations.
  • a largest difference between any outlier point and its next highest neighbor can be defined as: d_(k) = max(D), where k is the index at which the maximum occurs. [0060] Then, all scores can be classified using the threshold s_(k): [0061] call_i = 1 if s_i >= s_(k), and call_i = 0 otherwise. In some implementations, a less conservative approach to defining the outlier cutoff can be used.
  • the threshold for calling a case can be defined as: [0062]
  • the computer system can call up to 5% of samples in any given block as cases.
  • the 99% cutoff approach described in FIG.3A can be applied. This is because a user may want to allow for greater than 1% of samples to be called cases when the data appears to have a large cluster of high scores (e.g., less than 5% of the data). If, however, the cluster is too large (e.g., more than 5% of the data), then with large blocks it may not be expected that all of these samples are cases, so the method can fall back to the more conservative 99% threshold cutoff.
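The largest-gap classification among outliers can be sketched as below. The threshold convention (call everything at or above the score on the upper side of the largest gap) is an assumed reading, since the exact formula is elided in the text; all names are hypothetical.

```python
def gap_classify(scores, cutoff):
    """Classify a block by the largest gap among its outlier scores.

    Scores are ranked highest to lowest; first differences d_i = s(i) - s(i+1)
    are taken up to, but excluding, the gap between the lowest outlier and the
    highest non-outlier. Scores at or above the upper side of the largest gap
    are called positive (1); everything else is called normal (0).
    """
    s = sorted(scores, reverse=True)
    outlier_idx = [i for i, v in enumerate(s) if v > cutoff]
    if not outlier_idx:
        return [0] * len(scores)        # no outliers: call the whole block normal
    i_o = max(outlier_idx)              # position of the lowest-valued outlier
    diffs = [s[i] - s[i + 1] for i in range(i_o)]
    if not diffs:                       # a single outlier: call only it positive
        threshold = s[0]
    else:
        k = max(range(len(diffs)), key=diffs.__getitem__)
        threshold = s[k]                # score just above the largest gap
    return [1 if v >= threshold else 0 for v in scores]
```

For a block whose top cluster of scores sits well apart from the rest, the whole cluster is called positive, which matches the clustering behavior described for FIG.3B.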
  • FIG.3B is a graphical depiction of the outlier cutoff.
  • False cancer points 300A-N and true cancer points 302A-N can be plotted on a graph depicting distribution of scores within any given block. As indicated, there can be a higher density of false cancer points 300A-N (e.g., non-cancer samples) and a much lower density of true cancer points 302A-N.
  • the true cancer points 302A-N can be samples in the block that are classified as positive (e.g., cancer).
  • a threshold with 99% specificity can be used, such that a conservative approach to identifying an outlier cutoff is used.
  • When blocks have scores that are all very similar (e.g., all high, all low, etc.), all samples can be called normal to protect against unnecessary false positives. Therefore, to define an outlier cutoff, a standard deviation of the bottom 90% of scores can be taken. Then any score can be called an outlier if score > mean(scores) + 3σ. If a block has no outliers, then all samples can be called normal. In the presence of outliers, a cancer cutoff can be defined; 99% can be used as the threshold.
  • a clustering method can then be used to call a top cluster of scores cancer, with a limit on a number of cancers that can be called.
  • FIGS.4A-B depict a flowchart of a process 400 for determining the basic fixed threshold of FIG.2.
  • the process 400 can be performed by the computer system or any other computer system described herein.
  • a population of sample data can be received in 402.
  • the sample data can then be broken up or separated into blocks in 404.
  • Block sizes can be relative and based on scores of the ranked sample data. This can be advantageous to prevent calls from being incorrectly made or influenced by batch effects. The larger a block, the better. Instead of analyzing samples one at a time relative to the entire population of data, samples can be analyzed relative to similar samples in a block.
  • a call can be made for the block as a whole to determine whether the block is indicative of cancer or some other condition that is being called.
  • Splitting up the data into relative blocks provides for relative rankings, which can be more stable than overall rankings. After all, overall rankings can fluctuate based on batch effects and random conditions in collecting data for the population.
  • a block can be selected in 406.
  • a desired specificity can be determined for the selected block in 408. The specificity can be different for each block. In some implementations, the specificity can also be the same for one or more blocks of the population of data.
  • a score can then be assigned to each sample data using a scoring algorithm of choice (409).
  • the sample data can be ranked in 410.
  • the rankings can be based on case scores assigned to each data.
  • the data can be ranked from highest to lowest score.
  • the ranked data scores can be plotted on a graph or similar depiction in which scores are displayed on a Y axis.
  • the population of sample data can be already ranked when it is received in 402.
  • a threshold value of data scores for the block can be calculated based on the desired specificity in 411. As described above, the threshold value can be different per block, based on the desired specificity, the scores for the data in the block, and/or a number of data in the block. Moreover, the threshold value can adjust or change over time based on one or more factors. For example, when positive calls are made (e.g., cancer status), the specificity for the block can increase and/or the threshold value can increase.
  • the threshold value and/or the desired specificity can adjust accordingly. As yet another example, if more data is added to the block, the threshold value and/or the desired specificity can adjust accordingly. As yet another example, if no calls are being made and/or everything is being called normal, the threshold value can be lowered. [0071] It can be determined whether the case score for a selected data in the block exceeds the threshold value in 414. [0072] If the score does not exceed the threshold value, then a classifier value of 0 can be assigned in 420. A classifier value of 0 can indicate that the data is not indicative of cancer or another condition that is being called. The classifier value of 0 can indicate that the data is normal. [0073] If the score does exceed the threshold value, then a classifier value of 1 can be assigned to that data point in 416.
  • a classifier value of 1 can indicate that the data is indicative of cancer or another condition that is being called.
  • the classifier value of 1 can indicate that the data is therefore not normal.
  • the desired specificity and/or sensitivity can be increased in 418 based on making a positive call in 414-416.
  • assigned classifier values can be any binary values. As another example, instead of 1 and 0, True and False can be assigned. [0075] Next, it can be determined whether there are more ranked sample data in the block in 422. If there are, 414-422 can be repeated for each of the remaining sample data in the block.
  • FIGS.5A-D depict a flowchart of a process 500 for determining the outlier cutoff of FIGS.3A-B.
  • the process 500 can be performed by the computer system or any other computer system described herein. Moreover, the process 500 can be repeated for each block comprising a population of data.
  • a block of sample data can be received in 502.
  • a population of data can already be broken up into blocks, as described herein.
  • Case scores for the sample data in the block can be identified in 504.
  • the identified case scores can then be ranked from highest to lowest in 506.
  • An outlier cutoff score can then be defined in 508, as described in reference to FIGS.3A-B.
  • the outlier cutoff score can be based on a lowest and/or highest score or aggregate of scores in the ranked sample data.
  • the outlier cutoff score can take into account data that is an outlier or does not have a case score that is close to other relative scores in the block. [0077]
  • a case score can then be selected from the ranked case scores in 510.
  • a largest difference of these differences can be identified in 530.
  • a value of 1 can be assigned to the data associated with that case score in 540. As mentioned throughout, this value can indicate a positive call, such as cancer or some other condition that is being called. If the case score is less than the largest difference in 538, then a classifier value of 0 can be assigned to the data associated with that case score in 542. As mentioned throughout, this value can indicate that the data point (e.g., sample) is normal or does not have the condition that is being called (e.g., cancer). [0081] Regardless of which score is assigned in 540 or 542, it can next be determined whether there is more ranked sample data in the block in 544.
  • 532-544 can be repeated for each remaining ranked sample data in the block. If the block does not have any more sample data to analyze, the classifier values for each of the ranked sample data can be outputted in 546.
  • a case score can be selected from all the ranked sample data in the block in 534. It can then be determined whether the selected case score is greater than the minimum specificity percentile in 535. If the case score is greater than the minimum specificity percentile, then a classifier value of 1 can be assigned in 540.
  • FIG.6 is a diagram of system components of the system 100 of FIG.1.
  • the system 100 includes the user computing device 102 and the computer system 104, which can communicate via the network 106.
  • the user computing device 102 can provide a user such as a lab technician with a display, input, and output devices. The user can input the population of sample data 108 to the user computing device 102, which can then transmit that data 108 to the computer system 104.
  • the computer system 104 can include a block threshold determiner 602, a classifier engine 604, a block outlier determiner 606, and a network interface 610. One or more of these components of the computer system 104 can be combined and/or removed from the system 104.
  • the block threshold determiner 602 can be configured to determine block sizes for the population of data that is received from the user computing device 102. It can be advantageous to determine block sizes having more data because the more data, the higher specificity and/or sensitivity. As described throughout this disclosure, the block sizes can be determined based on a quantity of data in the population and a ranking of each of the data in the population. One or more blocks can be different sizes.
  • the classifier engine 604 can be configured to determine a desired specificity per block.
  • the classifier engine 604 can also be configured to determine a threshold value per block based on the desired specificity.
  • the engine 604 can further be configured to classify each data in the block based on their case score and the determined threshold value. Therefore, the engine 604 can classify data having case scores that exceed the threshold value as cancer or an associated binary value for that positive call.
  • the engine 604 can then classify data having case scores below the threshold value as normal or an associated binary value for that call.
  • the block outlier determiner 606 can be configured to determine an outlier cutoff.
  • the determiner 606 can be configured to classify data in a block based on the determined outlier cutoff.
  • the classifier engine 604 can be configured to classify data once the determiner 606 determines an appropriate outlier cutoff for that block.
  • the block outlier determiner 606 can also include an outlier threshold determiner 608.
  • the determiner 608 can be configured to determine a threshold value for the block based on the outlier cutoff. That threshold value can be used to then classify the data in the block, where the threshold value is based on the outlier cutoff.
  • the network interface 610 can provide for communication between the computer system 104 and one or more other components of the system 100.
  • the computer system 104 can also be in communication with a classified sample data database 612.
  • the database 612 can be configured to store classifier values for each of the sample data in a population 614A-N. In some implementations, the database 612 can additionally or alternatively store aggregate classifier values per block of a population of sample data.
  • the database 612 can also store additional information such as prior defined block sizes, prior defined specificities and/or threshold values, and/or outlier cutoffs. Information stored by the database 612 can be used by the computer system 104 to improve one or more algorithms, models, and/or components of the system 104 to more accurately block populations of sample data, determine specificities, classify data per block, and/or determine outlier cutoffs.
  • FIG.7 is a flowchart of a process 700 for binary classification of incidence-based data.
  • the process 700 can be the same as, similar to, and/or an alternative to processes described herein, such as the process 400 of FIGS.4A-B.
  • scored sample data can be received in 702.
  • Block sizes can be determined for the received data in 704.
  • the data can be grouped into blocks based on the determined block sizes in 706.
  • a block can be selected in 708.
  • a desired specificity can be determined for that block in 710.
  • a positive threshold value can be identified as 1-X% of specificity in 712, where X% is the desired specificity. For example, as described throughout, if the desired specificity is 99%, the positive threshold value can be 1%.
  • any data in the block having a case score in the top 1% can be classified as a positive call (e.g., cancer).
  • FIG.8 is a graphical depiction of incidence-based data broken up into blocks as described herein. As depicted, data points 804A-N and 806A-N can be graphed, plotted, and/or ranked based on their case scores. Scores can be ranked along a Y axis 800. Moreover, a threshold value 802 can be determined based on the scores of the data points 804A-N and 806A-N in a population of data. The threshold 802 depicted in FIG.8 is relative to all the points in the population.
  • Blocks 808A-N can be designated around sets of data points 804A-N and 806A-N.
  • the blocks 808A-N can be a same size.
  • the blocks 808A-N can be different sizes.
  • the blocks 808A-N can include a different number of data 804A-N and 806A-N.
  • Taking the block 808A, cancer point 804A and normal point 806A are both above the threshold 802.
  • the cancer point 804A and the normal point 806A can both be classified as positive calls (e.g., cancer).
  • the top 1% of scores for the points 804A-N and 806A-N can be called positive (e.g., cancer) and everything else can be called normal.
  • Calling the normal point 806A cancer can be the 1% of error that is allowed with a 99% specificity.
  • the threshold 802 can be adjusted to change which calls are made. For example, the threshold 802 can be raised such that the normal point 806A falls just below the threshold and is no longer called positive.
  • FIG.9 is an example table comparing use of the binary classification of incidence-based data described herein in a fragmentation use case.
  • specificity and sensitivity can improve with use of techniques described herein (e.g., block method) in comparison to standard or traditional calling techniques.
  • the block techniques described herein can be applied to various different use cases, including but not limited to fragmentation, mutations, and proteins. Regardless of use case application, the disclosed techniques provide for improved specificity and sensitivity.

Abstract

A method for classifying data in blocks can include receiving a population of sample data, determining block sizes relative to the sample data, and grouping the sample data into blocks based on the determined block sizes. The method can further include selecting a block, determining a desired specificity for the block, assigning a score to each of the sample data, ranking the sample data according to the score associated with each of the sample data, identifying a positive threshold value for the block based on the desired specificity, and classifying the scored sample data based on the positive threshold value.
PCT/US2021/053143 2020-12-14 2021-10-01 Bloc WO2022132286A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063125161P 2020-12-14 2020-12-14
US63/125,161 2020-12-14

Publications (1)

Publication Number Publication Date
WO2022132286A1 true WO2022132286A1 (fr) 2022-06-23

Family

ID=82059775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/053143 WO2022132286A1 (fr) 2020-12-14 2021-10-01 Bloc

Country Status (1)

Country Link
WO (1) WO2022132286A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232425A1 (en) * 2013-11-06 2016-08-11 Lehigh University Diagnostic system and method for biological tissue analysis
US20180068083A1 (en) * 2014-12-08 2018-03-08 20/20 Gene Systems, Inc. Methods and machine learning systems for predicting the likelihood or risk of having cancer
KR102108050B1 (ko) * 2019-10-21 2020-05-07 Gachon University Industry-Academic Cooperation Foundation Method and apparatus for classifying breast cancer histology images via an augmented convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALOM MD ZAHANGIR; YAKOPCIC CHRIS; NASRIN MST. SHAMIMA; TAHA TAREK M.; ASARI VIJAYAN K.: "Breast Cancer Classification from Histopathological Images with Inception Recurrent Residual Convolutional Neural Network", JOURNAL OF DIGITAL IMAGING, SPRINGER INTERNATIONAL PUBLISHING, CHAM, vol. 32, no. 4, 12 February 2019 (2019-02-12), Cham, pages 605 - 617, XP036841032, ISSN: 0897-1889, DOI: 10.1007/s10278-019-00182-7 *

Similar Documents

Publication Publication Date Title
WO2020164282A1 (fr) Method and apparatus for YOLO-based image target recognition, electronic device, and storage medium
WO2022111327A1 (fr) Risk level data processing method and apparatus, storage medium, and electronic device
CN101484910B (zh) Clustering system and defect type determination device
CN110352389B (zh) Information processing device and information processing method
US10133962B2 Method of digital information classification
CN111539308B (zh) Deep-learning-based comprehensive embryo quality evaluation device
JP7333482B2 (ja) Method, device, and application for obtaining species-specific consensus sequences of microorganisms
CN108681751B (zh) Method and terminal device for determining event influencing factors
CN114706992B (zh) Knowledge-graph-based event information processing system
CN112287094A (zh) Similar case text retrieval system
CN111783107B (zh) Multi-source trusted data access method, apparatus, and device
CN115018315A (zh) Heating anomaly detection method and apparatus, electronic device, and storage medium
WO2022132286A1 (fr) Bloc
CN111539451A (zh) Sample data optimization method, apparatus, device, and storage medium
WO2007053962A1 (fr) Computer method and system for identifying organisms
US20210327060A1 Discerning device, cell mass discerning method, and computer program
CN112328775B (zh) Case text information retrieval system
CN111414930A (zh) Deep learning model training method and apparatus, electronic device, and storage medium
CN112529112B (zh) Mineral identification method and apparatus
CN111652733B (zh) Financial information management system based on cloud computing and blockchain
AU2021290433A1 Data sampling method and apparatus, and storage medium
CN113836826A (zh) Key parameter determination method and apparatus, electronic device, and storage medium
KR100879854B1 (ko) Automated distribution fitting system and method
CN115639786A (zh) Scheduling path determination and wafer scheduling method, apparatus, device, and storage medium
EP3540660A1 (fr) Apparatus and associated method for determining a set of control conditions of a production line

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21907393

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21907393

Country of ref document: EP

Kind code of ref document: A1