WO2023147363A1 - Data streaming pipeline for compute mapping systems and applications
- Publication number
- WO2023147363A1 (application PCT/US2023/061274)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- compute
- information
- platform
- computer
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- Accessing and using vast quantities of data may present several challenges due to the way the data is stored, filed, or transmitted.
- data that is stored at remote locations may be difficult to obtain for meaningful analysis and evaluation without expending significant resources for data transmission and local storage.
- different filing and storage procedures may cause difficulties with respect to identifying, organizing, and preparing data for processing.
- transmission of vast quantities of data is time consuming and costly.
- FIG. 1 illustrates an example environment for a data platform, in accordance with various embodiments.
- FIG. 2 illustrates an example environment of a queuing system in use with a compute service, in accordance with various embodiments.
- FIG. 3A illustrates an example environment of a data filer, in accordance with various embodiments.
- FIG. 3B illustrates an example environment of a compute service, in accordance with various embodiments.
- FIG. 4A illustrates an example service environment, in accordance with various embodiments.
- FIG. 4B illustrates an example digital health platform, in accordance with various embodiments.
- FIG. 5A illustrates an example process for executing a compute operation using remote data, in accordance with various embodiments.
- FIG. 5B illustrates an example process for executing a compute operation with a pretrained model, in accordance with various embodiments.
- FIG. 5C illustrates an example process for determining and provisioning different execution modes for a workflow, in accordance with various embodiments.
- FIG. 6 illustrates a computer system, according to at least one embodiment.
- a platform to use artificial intelligence (AI) and/or machine learning (ML) to process information using one or more pre-trained models that are updated and/or configured based on one or more configuration files for a desired task.
- Various embodiments may further incorporate a data platform to enable users to identify data sources, select from the pre-trained models, and establish compute pipelines and/or workflows for processing different types of information.
- This data platform may be integrated with and/or communicate with the data pipeline to permit efficient data streaming, pre-processing, and mapping along with one or more compute services.
- specific applications may be deployed with a particular emphasis as defined, at least in part, by one or more configuration files.
- the data platform may be used to build components of the specific applications and link the data pipeline for data streaming and compute services, thereby enabling sub-systems to be developed within a larger data environment.
- Various embodiments are directed toward an integrated platform that may be used to retrieve data from a source via a streaming connection, label the data for filing, provide resources for users to interact with the platform, and store and/or prepare a library of pre-trained models that may be used for one or more applications based on a set of input configuration files.
- the integrated platform may be used in a variety of industries, such as healthcare, where unique challenges are present with respect to control of information, labeling of data, and the like.
- Electronic health records (EHRs) may be present from a variety of different practices or sources, and as a result, data may be coded differently or otherwise difficult to group together.
- Systems and methods may be used to pre-process information acquired from data sources, such as EHR data sources. This information may then be distributed through one or more pipelines or workflows for processing, such as for filing, for evaluation via one or more compute services, and/or the like. Accordingly, systems and methods may be used to establish an integrated platform to permit one or more users to aggregate data from a variety of sources, prepare the data, transmit
- systems and methods of the present disclosure may incorporate sub-systems directed toward at least engineering/infrastructure, integrated clinical labeling, democratization, and a library of pretrained models.
- Systems and methods may be used to leverage infrastructure to build a library of pretrained base models to enable clinical and research partners to utilize the platform for various projects.
- the pre-trained models may be used with various different modalities, including at least multi-modal, graph (FHIR), language, image, video, and audio.
- Pre-training may increase performance, reduce label requirements, and reduce model development time.
- a variety of different languages may be implemented to develop the environment, with one or more APIs being deployed to enable communication between systems having different underlying language definitions and/or different storage or compute parameters.
- systems and methods may include various just-in-time data loaders for both on-prem and off-prem (e.g., cloud) storage in order to stream data directly into the compute infrastructure.
- Further embodiments may include a library of APIs that allow for full connectivity to/from a composite of clinical systems, such as EHR for applications in the health field.
- EHR embeddings and workflow-integrated labeling solutions may also be implemented.
- Democratization efforts may include standardizing tutorials for model development, deployment, and monitoring.
- systems and methods of the present disclosure enable rapid and seamless creation and destruction of ephemeral compute services to facilitate execution of one or more compute operations in accordance with instructions associated with one or more workflows.
- a workflow may be associated with various machine learning systems, such as those that may process, evaluate, and classify information within an input.
- a compute node may be provisioned to load one or more machine learning applications from a database of pre-trained models, execute commands in accordance with the workflow, and then destroy or otherwise close the node upon completion of the task.
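The provision, execute, and destroy lifecycle described above can be sketched with a context manager. This is a minimal illustration under stated assumptions, not the disclosed implementation: `PRETRAINED_LIBRARY`, `ComputeNode`, and `ephemeral_node` are hypothetical names, and the "model" is a trivial keyword rule standing in for a pre-trained classifier.

```python
from contextlib import contextmanager

# Hypothetical in-memory stand-in for a database of pre-trained models.
PRETRAINED_LIBRARY = {
    "lung-nodule-classifier": lambda text: "nodule" if "nodule" in text.lower() else "clear",
}


class ComputeNode:
    """Toy stand-in for a provisioned compute node (illustrative only)."""

    def __init__(self, model_name):
        # Load the requested model from the library at provisioning time.
        self.model = PRETRAINED_LIBRARY[model_name]
        self.alive = True

    def run(self, records):
        if not self.alive:
            raise RuntimeError("node has been destroyed")
        return [self.model(record) for record in records]

    def destroy(self):
        # Release the model and mark the node as closed.
        self.model = None
        self.alive = False


@contextmanager
def ephemeral_node(model_name):
    """Provision a node for one task and always destroy it afterwards."""
    node = ComputeNode(model_name)
    try:
        yield node
    finally:
        node.destroy()


with ephemeral_node("lung-nodule-classifier") as node:
    results = node.run(["Small nodule in left lobe", "No abnormality seen"])

print(results)
print(node.alive)
```

The `finally` clause guarantees the node is torn down even if the workflow step raises, which is the property that makes short-lived nodes safe to create on demand.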
- creation of compute nodes may be skipped, for example, in
- Systems and methods may enable clinical personnel, such as those that are not intimately familiar with different machine learning systems, to execute applications within the workflow, label information (in certain embodiments), and the like.
- labeling may be substantially in line with the clinical personnel’s skill set in executing their traditional job function, but now, the labeling may be used to facilitate further development of different machine learning systems.
- clinical personnel may be more intimately involved with the machine learning applications, even when their expertise or particular job function is not directly involved with these systems. As a result, additional expertise may be added to a pool of users due to the decreased barrier to entry.
- pretrained models which may also be bespoke models that are particularly trained for one or more tasks.
- a classifier may be particularly trained to identify nodes within lung image information.
- a natural language processing system may be particularly trained to identify clinical notes or evaluations within a corpus of text.
- Systems and methods provide the collection of pretrained libraries, which may then be initialized using one or more configuration files, to enable clinical personnel to run workflows on their own and/or to facilitate training of different systems for one or more tasks by providing labeled ground truth information.
- use of remote compute systems may enable operation of these models on hardware with reduced computing capacity (compared to data centers, for example) such as smart phones, personal computers, and the like.
- Systems and methods also provide for a just-in-time approach for connecting to, identifying, and receiving information from one or more different remote data sources. For example, a data pipeline may be established in which selected data is directly mapped to and/or loaded to particular layers of the machine learning system. As a result, the selected data may remain in its storage location without transmitting the data to the compute node and/or device executing the
- Such systems may also be integrated into various other applications, including enterprise migrations, consolidation of storage, and the like.
- FIG. 1 illustrates an environment 100 in which various embodiments may be deployed.
- a user device 102 may be used to communicate with an application environment 104 over a network 106.
- the user device 102 may be operated by a human user or may be a device that executes instructions, such as a device that is programmed to perform one or more tasks, such as on a continuing or recurring basis.
- the user device 102 may include one or more electronic devices, such as a computer, smart phone, wearable device, gaming console, server, compute cluster of various processing units, and the like.
- the user device 102 communicates over the network 106, such as the Internet, in order to interact with the application environment 104.
- the application environment 104 may be a distributed computing environment (e.g., a “cloud” environment) using a server-based or serverless architecture.
- the application environment 104 executes on hardware that may be remotely positioned from the user device 102 and may be shared or otherwise used with other systems.
- one or more components of the application environment 104 may execute on a virtual machine that uses underlying hardware that executes one or more additional virtual machines.
- various modules or sub-systems of the application environment 104 are shown separately for convenience and clarity, but may be integrated into a common module and/or set of programming instructions. For example, operations to label and file data may be integrated into a common platform or application.
- Various embodiments include a landing environment 108 that may serve as a landing page or introduction to the user device 102.
- the landing environment 108 may include one or more features to verify access credentials of the user device 102 and/or to provide access in accordance with various user preferences or stored settings.
- Various applications 110 may also execute within the application
- the application environment 104 may be used to establish one or more workflows to execute within the applications 110.
- a first application may be related to evaluating lung imaging data
- a second application may be related to evaluating brain imaging data
- a third application may be related to evaluating colon imaging data
- a fourth application may be related to evaluating contraindications between users with one or more known conditions, and so forth.
- the applications may be built and executed within an established workflow of the application environment 104, as described herein.
- a data platform 112 may execute within the application environment 104 in order to identify, pull, and/or process data from one or more data sources 114, which may be remote data sources.
- the data platform 112 enables a pipeline for streaming data transmission without moving the data to a new storage location.
- certain health systems may have large quantities of data that are stored at various locations and associated with different divisions of the health system. This data may include a significant quantity of information that can be processed to improve diagnostics or care for users of the health system. However, with large quantities of data, it may be difficult, time-consuming, and expensive to move the data.
- systems and methods of the present disclosure enable streaming data from the one or more data sources 114 through a pipeline while maintaining/leaving the data at its present storage location.
- the application environment 104 is not burdened with the additional requirement of providing storage locations for the data and also may receive, evaluate, and then direct data to particular locations without intermediate storage. In this manner, information can be queued and then mapped directly to a compute location.
- the data platform may include a data loader 116 to communicate with one or more data sources 114, for example via the network 106.
- the data loader 116 may identify specific classes or types of information or identify a storage location, and then establish
- a pipeline for transmission of information over the network 106.
- transmission may be in the form of streaming information and not a wholesale movement or copying of the data, which may lead to reduced bandwidth use.
- a pipeline may be established in which one or more configuration files may be used to identify files, identify a file location, read information from the file, extract one or more portions of the information, process the data for use with one or more systems of the application environment 104, and then transmit the information to one or more queues, which may be used to map information directly to one or more compute services 120.
- these compute services 120 may be provisioned and executed upon a request from the user device 102, as part of a predetermined workflow, or combinations thereof.
- information from the one or more data sources 114 may be evaluated, extracted, pre-processed, and then mapped and transmitted directly to one or more compute services, such as to a specific GPU and/or to a specific layer of a trained machine learning system, without moving or otherwise transmitting the data to an intermediate location.
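A configuration-driven pipeline of this kind can be sketched with a lazy generator feeding a queue. The sketch below is an assumption-laden illustration: the configuration keys (`delimiter`, `columns`, `filter_field`, and so on), the pipe-delimited record layout, and the function names are all hypothetical, and an in-memory `io.StringIO` stands in for a remote data source.

```python
import io
import queue


def stream_records(source, chunk_lines):
    """Yield small batches of lines lazily instead of copying the whole source."""
    batch = []
    for line in source:
        batch.append(line.rstrip("\n"))
        if len(batch) == chunk_lines:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(source, config, out_queue):
    """Read, filter, extract, pre-process, and enqueue records per the config."""
    for batch in stream_records(source, config["chunk_lines"]):
        for line in batch:
            fields = dict(zip(config["columns"], line.split(config["delimiter"])))
            # Keep only records relevant to the configured application.
            if fields.get(config["filter_field"]) != config["filter_value"]:
                continue
            # Extract the requested portions and normalize them.
            out_queue.put({key: fields[key].strip().lower() for key in config["extract"]})


# Hypothetical configuration; in practice this might be loaded from a file.
config = {
    "delimiter": "|",
    "columns": ["patient_id", "department", "note"],
    "filter_field": "department",
    "filter_value": "cardiology",
    "extract": ["patient_id", "note"],
    "chunk_lines": 2,
}

source = io.StringIO(
    "p1|cardiology|Chest Pain \n"
    "p2|radiology|Lung Scan\n"
    "p3|cardiology|Arrhythmia\n"
)
compute_queue = queue.Queue()
run_pipeline(source, config, compute_queue)
queued = [compute_queue.get() for _ in range(compute_queue.qsize())]
print(queued)
```

Because `stream_records` is a generator, only one small batch is held in memory at a time; a compute service would consume from `compute_queue` rather than from a copied dataset.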
- Various embodiments of the present disclosure may also incorporate a labeling system 122, a data filer 124, and/or a queueing system 126. As noted, these systems are shown as separate blocks in the example of FIG. 1, but it should be appreciated that the functionality of these systems may be incorporated into a single component and/or a single platform executing within the application environment 104.
- the labeling system 122 may be referred to as a unified labeling platform that may be used to consistently label and/or identify data that is used within the application environment 104.
- legacy data (e.g., data stored with previous labeling instructions or coding instructions) may be rich in information, but such information may be hard to identify and extract when labeled or otherwise buried within other information.
- Embodiments may be used to extract information from various data sources and to re-label or otherwise improve labeling of the data.
- an application may execute to scan information within the data and to identify keywords and/or locations of the information.
- a document may include information such as a patient location, coding instructions for care, notes regarding health history, and treatment notes.
- the labeling system 122 may be used to extract information from the treatment notes, for
- the labeling system 122 may also be integrated with, or receive information from, a human labeling system. For example, a user may assist the labeling system 122 by identifying data. As will be appreciated, in at least one embodiment, the data loader 116 and/or the labeling system 122 may obscure or otherwise remove identifying information within the data in accordance with various retention and privacy regulations.
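As a rough sketch of the keyword scanning and identifier removal described above, the following uses regular expressions from the Python standard library. The keyword list, the `MRN` identifier format, and the function names are assumptions made for illustration; a production labeling system would use far richer rules or models.

```python
import re

# Hypothetical keywords of interest and a hypothetical record-number format.
KEYWORDS = ("follow up", "hypertension")
MRN_PATTERN = re.compile(r"\bMRN[:\s]*\d+\b")


def redact(text):
    """Obscure identifying record numbers before labeling."""
    return MRN_PATTERN.sub("[REDACTED]", text)


def find_keywords(text, keywords=KEYWORDS):
    """Return (keyword, character offset) pairs, ordered by position in the text."""
    hits = []
    for keyword in keywords:
        for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
            hits.append((keyword, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


note = "MRN: 12345. History of hypertension. Follow up in 2 weeks."
clean = redact(note)
labels = find_keywords(clean)
print(clean)
print(labels)
```

Redacting before scanning means the recorded offsets refer to the de-identified text, so downstream consumers never need the original note.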
- the illustrated data filer 124 may direct or otherwise queue information from the obtained data into the queueing system 126.
- the data filer 124 may identify particular data associated with cardiovascular disease and then direct that information into the queuing system 126 for computation with other indicators for cardiovascular disease.
- Such a process may be used to develop a workflow that enables large quantities of data to be evaluated, labeled, and then filed into an appropriate location for further processing.
- the data filer 124 may also be used to provide notifications for users of the system. For example, if the labeling system 122 identifies one or more follow up actions, the data filer 124 may be used to provide the information to a notification system to provide an alert or a recommendation for action to a care provider, among other options.
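The filing and notification behavior can be illustrated with a small routing class. Everything here is hypothetical: the `DataFiler` class, the topic rules, the label names, and the notification text are invented for the sketch and are not taken from the disclosure.

```python
from collections import defaultdict


class DataFiler:
    """Toy filer: routes labeled records into per-topic queues and raises alerts."""

    def __init__(self, topic_rules):
        # topic_rules maps a queue topic to the set of labels that belong to it.
        self.topic_rules = topic_rules
        self.queues = defaultdict(list)
        self.notifications = []

    def file(self, record):
        labels = set(record["labels"])
        for topic, wanted in self.topic_rules.items():
            if labels & wanted:
                self.queues[topic].append(record["id"])
        # A follow-up label triggers a recommendation for a care provider.
        if "follow_up" in labels:
            self.notifications.append(f"Follow-up recommended for record {record['id']}")


filer = DataFiler({"cardiovascular": {"heart_failure", "hypertension"}})
records = [
    {"id": "r1", "labels": ["hypertension", "follow_up"]},
    {"id": "r2", "labels": ["lung_nodule"]},
    {"id": "r3", "labels": ["heart_failure"]},
]
for record in records:
    filer.file(record)

print(filer.queues["cardiovascular"])
print(filer.notifications)
```

Keeping the topic rules as data rather than code lets the same filer serve different applications by swapping the rule set.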
- the data filer 124 may be incorporated within one or more orchestration environments as a batch process.
- the system may be executed in the background or at certain designated times with limited human interaction.
- a document repository that includes EHR may be processed to identify information relevant for a particular application.
- Information may be processed in accordance with one or more configuration files to identify and extract relevant information for a particular application and then loaded into the queueing system 126 for further processing at a later time, for example, when a user returns and provides a configuration file to a pre-trained machine learning system to process the information. In this manner, data can be scanned, evaluated, tagged, and pre-
- the illustrated queueing system 126 may be used to map or otherwise coordinate operation with one or more artificial intelligence (AI) and/or machine learning (ML) services 128. That is, the queueing system 126 may be used to map and/or direct information directly into one or more data iterations of the AI/ML applications executing as part of the service 128.
- the AI/ML service 128 may use one or more pre-trained models from a pre-trained library 130.
- the pre-trained libraries may be trained to perform one or more tasks (e.g., classification, image segmentation, data extraction, natural language processing, etc.) in accordance with one or more parameters.
- the parameters may be provided as part of a configuration file that is particularized with use for a given application.
- a pre-trained classifier may be used in classifying both brain imaging data and lung imaging data
- embodiments may incorporate different configuration files to identify which features may be associated with a desired output.
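One way to picture a single pre-trained classifier being reused across applications is to let each configuration file name the features and decision threshold to apply. The sketch below assumes JSON configuration text and a deliberately trivial scoring function; the feature names, thresholds, and labels are hypothetical.

```python
import json


def pretrained_score(features):
    """Toy stand-in for a shared pre-trained classifier: mean of feature values."""
    return sum(features.values()) / len(features)


def classify(record, config_text):
    """Apply the shared model using the features and threshold named in the config."""
    config = json.loads(config_text)
    selected = {name: record[name] for name in config["features"]}
    if pretrained_score(selected) >= config["threshold"]:
        return config["positive_label"]
    return config["negative_label"]


# Hypothetical per-application configuration files, shown inline as JSON text.
LUNG_CONFIG = json.dumps({
    "features": ["opacity", "nodule_size"],
    "threshold": 0.5,
    "positive_label": "lung finding",
    "negative_label": "no lung finding",
})
BRAIN_CONFIG = json.dumps({
    "features": ["lesion_contrast"],
    "threshold": 0.7,
    "positive_label": "brain finding",
    "negative_label": "no brain finding",
})

record = {"opacity": 0.8, "nodule_size": 0.6, "lesion_contrast": 0.2}
lung_result = classify(record, LUNG_CONFIG)
brain_result = classify(record, BRAIN_CONFIG)
print(lung_result, "|", brain_result)
```

The model code never changes between the two calls; only the configuration differs, which mirrors the idea of particularizing a shared model for a given application.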
- the pre-trained models may be updated and/or modified over periods of time. For example, as new information is acquired through running the AI/ML service 128, such information may be fed back to various models of the pre-trained library 130.
- new models may be generated as new information is acquired, with those new models being added to the library 130. In this manner, bespoke models may be generated for particular functions (e.g., classification for particular organs, natural language processing for certain types of records, etc.).
- Various embodiments of the present disclosure may be used with one or more distributed computing platforms that enable the application environment 104 and/or one or more components of the application environment 104 to integrate with or run alongside the distributed computing platform.
- the distributed computing platform may provide the one or more compute services 120 based on instructions from the AI/ML service 128 that may load the pre-trained models and configuration information into the compute service 120 for execution.
- various embodiments may also leverage or otherwise use information or services from the distributed computing platform, such as encryption services, data collection services,
- various embodiments provide a unified platform in which one or more organizations can share information, even legacy information, to collect, identify, extract, and prepare information for further evaluation or filing, even if information is stored or otherwise coded using different systems.
- systems and methods may incorporate a singular location or platform for system-wide data loading, data labeling, and data queuing efforts. Additionally, particular features associated with the functionality of the platform may also be integrated into the platform. For example, characteristics of processing and using EHR may be built into one or more workflows to effectively manage data. The incorporation of a singular management and/or labeling solution may enable a more diverse group of input data. Furthermore, a digital platform for practitioners may enable more users to leverage and use the various applications within the system. Additionally, as noted above, various compute services may be used with pre-trained models. For example, containerized, ephemeral development environments may be connected to spot compute nodes.
- highly optimized and parallelized just-in-time data loaders may be incorporated to stream cloud and on premises data directly into GPUs.
- compute resources which may use a “bring your own model” system
- users may gain access to an a la carte menu of highly configurable spot compute resources that reduce costs for end users.
- Various embodiments may also incorporate tutorials and information for users to learn how to build their own applications and build out custom workflows for their own particular uses.
- These workflows may also incorporate a library of pre-trained models, as noted herein, that reduces the need for individual and/or additional labeling, increases model performance, reduces modeling time, and allows greater diversity in use cases.
- FIG. 2 illustrates an example architecture diagram 200 for a streaming solution that may be associated with one or more embodiments of the present disclosure.
- the diagram 200 shows information transmitted from the data sources 114 to the queuing system 126, which incorporates one or more processing systems 202 and an event engine 204.
- the queuing system 126 may be integrated into one or more additional applications and/or may communicate with other applications to provide different functionality.
- data from the data sources 114 may be transmitted over a network, such as the internet, and the data sources 114 may include both remote and on-prem data sources.
- the on-prem data sources may be hard-wired to one or more compute systems that permit data transmission over a physical connection, such as a universal serial bus.
- transmission may also be over a common network or the like.
- transmission may occur via one or more network connections.
- the data sources 114 may be associated with a variety of entities, such as different hospital systems within a health network, different departments within a government health system, research information for one or more universities or private institutions, and/or the like.
- various embodiments may implement one or more processing systems 202, which may be batch processing systems, to receive, evaluate, extract, and/or label different data from the data sources 114.
- multiple processes may be used. For example, for text-only information, a single processing step may be sufficient to extract relevant information.
- for multimodal information, such as data that includes both text and images or text, images, and video, multiple processing steps may be used.
- data may be converted from raw or substantially raw data to processed or pre-processed data. The conversion between data types may include eliminating extraneous information, anonymizing data, extracting salient information, and/or the like.
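The raw-to-processed conversion can be sketched as a single function that drops extraneous and identifying fields and keeps the salient content. The field names are hypothetical, and the truncated SHA-256 digest used here is only a pseudonymous reference chosen for illustration; it is an assumption of this sketch, not a technique stated in the source.

```python
import hashlib

# Hypothetical field classifications for this sketch.
IDENTIFYING_FIELDS = {"name", "address"}
EXTRANEOUS_FIELDS = {"printer_id"}


def preprocess(raw):
    """Convert a raw record into a pre-processed record."""
    processed = {}
    for key, value in raw.items():
        if key in IDENTIFYING_FIELDS or key in EXTRANEOUS_FIELDS:
            continue  # eliminate identifying and extraneous information
        if key == "patient_id":
            # Replace the direct identifier with a short pseudonymous reference.
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            processed["patient_ref"] = digest[:12]
            continue
        # Keep salient content, trimmed of surrounding whitespace.
        processed[key] = value.strip()
    return processed


raw = {
    "name": "Jane Doe",
    "patient_id": "p-001",
    "printer_id": "x9",
    "note": "  Reports chest pain.  ",
}
processed = preprocess(raw)
print(sorted(processed))
```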
- the event engine 204 may be implemented to monitor different steps of the queuing process and to prepare information for mapping to one or more compute services 120.
- the event engine 204 may control a number of queues that are open, control topics within queues, manage subscriptions, and the like.
- the event engine 204 may direct data for processing based on data modalities, expected output sources, and the like. For example, if a configuration file indicates that data will be used in natural language processing, one step of the processing may include removal of images or graphics from the
- the event engine 204 may be used to map or otherwise direct information (e.g., processed data, raw data, etc.) to one or more compute services 120.
- the data may be mapped directly to a GPU and streamed to the GPU without requiring movement from the original storage location. In this manner, particular information may be extracted and used with various compute services without copying data over.
- data from a variety of sources 114 may be collected, processed, and then integrated into different pipelines and workflows for analysis.
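The queue, topic, and subscription management attributed to the event engine 204 can be pictured with a minimal sketch. Everything named here (the `EventEngine` class, the `modality` record field, the `ref` strings) is a hypothetical illustration and not taken from the disclosure; records carry only references to stored data, in keeping with the statement that data is mapped without being copied.

```python
from collections import defaultdict, deque


class EventEngine:
    """Hypothetical stand-in for the event engine 204: topics plus subscriptions."""

    def __init__(self):
        self.queues = defaultdict(deque)      # topic -> pending records
        self.subscribers = defaultdict(list)  # topic -> compute-service handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, record):
        # Route by modality; the "modality" key is an assumed record field.
        self.queues[record["modality"]].append(record)

    def dispatch(self):
        """Drain every queue, handing each record to all subscribers of its topic."""
        delivered = 0
        for topic, queue in self.queues.items():
            while queue:
                record = queue.popleft()
                for handler in self.subscribers[topic]:
                    handler(record)
                    delivered += 1
        return delivered


engine = EventEngine()
nlp_inbox, imaging_inbox = [], []
engine.subscribe("text", nlp_inbox.append)
engine.subscribe("image", imaging_inbox.append)
engine.publish({"modality": "text", "ref": "notes/0001"})
engine.publish({"modality": "image", "ref": "scans/0001"})
engine.publish({"modality": "text", "ref": "notes/0002"})
delivered = engine.dispatch()
```

A production system would more likely rely on an existing message broker; the in-memory deques only make the routing logic visible.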
- FIG. 3A illustrates a data sorting environment 300 that may be used with embodiments of the present disclosure.
- the data filer 124 is illustrated as a system to receive information from one or more data sources 114 and sort the information based on properties provided by one or more configuration files 302.
- various additional components have been omitted for clarity and conciseness, which may include various pre or post processing systems, data fetchers, network architecture, and/or the like.
- the data filer 124 may be incorporated within, or in communication with, one or more additional systems or sub-systems.
- the data filer 124 may be a virtual representation that executes on one or more processors in accordance with different machine-readable instructions.
- data from the data sources 114, which as noted may be remote and/or on-prem, may be streamed to the data filer 124.
- the data filer 124 may receive the information without physically moving the storage location for information within the data sources 114.
- a configuration file 302 may provide instructions for how to file or sort the information from the data sources 114.
- the configuration file 302 may include information regarding how to file or otherwise direct information.
- the information may include what type of information to identify within data from the data sources 114, an end location for the data, and/or the like.
- Configuration files 302 may differ based on different end applications. For example, configuration files 302 associated with tumor detection
- configuration files 302 associated with comorbidity information may identify data that includes particular text phrases. It should be appreciated that any number of configuration files 302 may be processed in parallel, or at least partially in parallel, and different pipelines or workflows may be established in accordance with these different configuration files 302.
- the data filer 124 may distribute or otherwise establish end connections, by itself or in part with other portions of the system, to different locations.
- the data filer 124 may map identified and/or extracted data (either pre or post processing) to different locations, such as an AI/ML pipeline 304, an alert pipeline 306, a processing pipeline 308, and/or the like.
- different workflows may be defined that have different output pipelines and/or that include additional pipelines.
- information may be fed to multiple pipelines at once.
- the data filer 124 may provide information to the AI/ML pipeline 304 for use as input information for a compute job, to the alert pipeline 306 to provide clinical information to the user, and to the processing pipeline 308 for storage as training data to develop one or more additional pre-trained models.
- the data filer 124 may direct the information back to the data sources 114 for further storage.
- the AI/ML pipeline 304 may be associated with one or more workflows to map information to a particular compute instance(s) executing one or more pre-trained machine learning systems.
- the alert pipeline 306 may be used to provide information to practitioners for follow on actions. For example, if data includes information regarding treatment options for users, the information may be directed toward a practitioner that will carry out the treatment.
- Another example includes the processing pipeline 308, which may lead to further processing of the information, such as to extract additional data or the like.
- the data filer 124 may be configured to process large quantities of data for mapping or otherwise directing to different endpoints based on configuration files without moving data from its original storage location.
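As a rough sketch of configuration-driven filing, the example below routes record references to named pipelines when their text contains phrases listed in a configuration. The configuration keys (`phrases`, `pipelines`), the `RecordRef` type, and the `ehr://` URIs are assumptions made for illustration; the disclosure does not specify a configuration file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRef:
    """A pointer to stored data; the payload stays at its original location."""
    uri: str
    text: str  # assumed: extracted text used only for phrase matching


def file_records(records, config):
    """Map each matching record reference to every pipeline named in the config."""
    routed = {name: [] for name in config["pipelines"]}
    for record in records:
        lowered = record.text.lower()
        if any(phrase.lower() in lowered for phrase in config["phrases"]):
            for name in config["pipelines"]:
                routed[name].append(record.uri)
    return routed


# Hypothetical configuration for a comorbidity workflow.
comorbidity_config = {
    "phrases": ["diabetes", "hypertension"],
    "pipelines": ["ai_ml", "alert"],
}
records = [
    RecordRef("ehr://site-a/notes/1", "History of hypertension."),
    RecordRef("ehr://site-b/notes/2", "No significant history."),
]
routed = file_records(records, comorbidity_config)
```

Only URIs are placed in the pipelines, so the same record can feed several pipelines at once without duplicating the underlying data.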
- FIG. 3B illustrates an example compute environment 320 that may be used with embodiments of the present disclosure. As noted, various components have been eliminated for
- the compute service 120 loads one or more pre-trained ML systems from the library 130.
- the pre-trained ML systems may be trained on different types of data and then used with specific applications based on one or more configuration files 322, which may include additional training information, parameters associated with a trained ML system, and/or the like.
- the queuing system 126 may receive information from the data sources 114 (either processed data or pre-processed data) and then may map the data to particular portions of the compute service 120, such as to different layers of the trained ML systems, particular GPUs, and/or the like.
- data may be streamed to the compute service 120 as needed, rather than using wholesale data transfer techniques that may be time consuming and costly.
- one or more outputs 324 may be generated for use within a workflow, as noted herein.
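The contrast between streaming data as needed and wholesale transfer can be illustrated with a lazy chunk reader. This is a sketch under stated assumptions: `io.BytesIO` stands in for a remote source, and the helper names `stream_chunks` and `run_job` are hypothetical.

```python
import io
from itertools import islice


def stream_chunks(source, chunk_size):
    """Yield fixed-size chunks lazily instead of loading the whole source."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def run_job(source, chunk_size, limit):
    """Consume only as many chunks as the job needs; return the bytes transferred."""
    transferred = 0
    # islice stops pulling from the generator once `limit` chunks were taken.
    for chunk in islice(stream_chunks(source, chunk_size), limit):
        transferred += len(chunk)
    return transferred


remote = io.BytesIO(b"x" * 1000)  # stand-in for a 1000-byte remote object
transferred = run_job(remote, chunk_size=100, limit=3)
```

Because the generator reads on demand, the position of `remote` advances only as far as the job consumed, leaving the rest of the object untouched.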
- FIG. 4A illustrates an environment 400 that incorporates a digital health data platform 402 that may be used to retrieve, evaluate, and transmit data from one or more data sources 114.
- the digital health data platform 402 may be integrated into existing services and storage solutions provided by health practitioners to leverage a large collection of information that may be stored using different standards and protocols but can, if properly evaluated, potentially provide useful health insights for a large number of patients.
- the data sources 114 may include remote and on-prem sources, as noted herein.
- remote sources could be legacy databases or files stored in a cloud storage system.
- data sources 114 may include streaming data, such as information acquired from wearable devices that may be streamed intermittently or in real or near-real time (e.g., without substantial delay).
- the illustrated digital health platform 402 may be part of a larger application environment 104 (FIG. 1) and/or offered within a service environment 404 associated with a provider, such as a health provider as a non-limiting example.
- the platform includes a data input system 406, such as the data loader 116, the labeling system 122, the data filer 124, the queuing system 126, and/or the like.
- the data input system 406 may be
- the event engine 204 may be used to direct information received from the data sources 114, and also a storage location 408.
- the storage location 408 may be used to maintain standards or policies associated with the platform 402.
- the platform 402 is integrated within the service environment 404 to provide one or more additional services 410 and/or processing operations 412 to one or more users.
- additional services 410 may include orchestration centers to manage workflows and communications, API management centers to enable development for individual groups, machine learning systems for classification and automation, an analytic engine to monitor usage and various metrics, compute services, and/or cloud storage services.
- processing operations 412 may include features such as data pre-processing, data post-processing, and the like.
- various outputs 414 may be generated.
- the outputs 414 may include information that is provided to a health practitioner, alarms, data that is added to another work flow, and/or the like.
- various embodiments may implement a digital health data platform to address legacy data architectures that have not been maintained and/or have been insufficiently updated in accordance with modern business needs of data-driven organizations. In an example for healthcare practitioners, this could include using legacy coding information and/or identifying solutions for the sheer volume of data available within an organization.
- Various embodiments may build the platform on different stacks, which may include using features or services from one or more orchestration providers.
- health data models rooted in interoperability standards may enable a variety of digital solutions (e.g., analytics, ML, etc.) with formal and automated data observability and reliability processes.
- event-driven architecture may be implemented to achieve real time or near real time (e.g., without significant delay) analytics and workflow orchestrated with integrated access to EHR and other systems of record.
- providing API management may enable plug-and-play integration to adapt to changes in the marketplace and adopt new solutions for data storage and transmission.
- the data platform described herein may be useful as a standalone system or one that has been scaled to incorporate a variety of systems, such as numerous healthcare providers. Adding interoperability between providers may provide greater access to data and improved care. However, present systems provide barriers to entry, such as the cost of modernizing systems, data storage and transmission, and the like. Embodiments address and overcome these problems by leveraging a universal system to receive, evaluate, and label information for use within one or more end systems, such as a compute service.
- FIG. 4B illustrates an example implementation 420 of a data platform to process multimodal data to generate recommendations or health evaluations.
- the environment 104 receives information from the one or more data sources 114 for processing, evaluation, and the like from one or more components of the platform 402. For example, information may be acquired, evaluated, extracted, and then directed toward one or more machine learning systems based, at least in part, on instructions and/or a workflow associated with the application 110. Thereafter, different output 410 may be generated that may include research data, actionable findings, and the like.
- using an application 110 that may be integrated into the environment 104 and/or used with the platform 402, a multimodal, longitudinal cardiovascular disease detection and risk prediction service may be established.
- Cardiovascular disease is the number one cause of death in the US and disproportionately impacts patients from racial and ethnic minority groups and other vulnerable populations. Projects leveraging this application 110 may have the ability to directly impact clinical care for patients by enabling the timely identification of high-risk conditions and facilitating access to high quality, cardiovascular care and research.
- Embodiments may integrate or incorporate multiple applications from one or more environment producers, such as stacks for data storage, deep learning, data lake services, analytics, and the like.
- the application 110 may enable generation of one or more workflows to enable ML algorithms to process real-time and/or near real-time cardiovascular data, thereby allowing clinicians to efficiently and effectively deliver evidence-based, timely, and equitable
- the application 110 may include a platform to leverage data to enable ML research and development of a cardiovascular care system.
- Inputs may be provided as multimodal data information to machine learning models for patients with heart failure.
- information may include patient-reported outcomes, sensor data, wearable device data, curated lists of medications, patient lab work, electrocardiogram data, cardiac computerized tomography data, cardiac catheterization data, demographics, disease definitions, co-morbidities, interventions, testing, remedial actions and their results, pulmonary function tests, endomyocardial biopsies, cardiac magnetic resonance imaging data, and/or the like.
- Various embodiments may leverage the rich data collected by various health services to facilitate opportunities to enhance patient care, improve access to treatment and clinical trials, and further research efforts.
- the application 110 may be scaled to various high value data types to provide additional data points to identify potential treatment options.
- FIG. 5A illustrates an example process 500 for executing an operation using a compute instance. It should be understood that for this and other processes presented herein that there can be additional, fewer, or alternative operations performed in similar or alternative order, or at least partially in parallel, within the scope of various embodiments unless otherwise specifically stated.
- a remote data location is identified 502.
- the remote data location may refer to a storage location, which may be at a remote compute location and/or a cloud storage location, among various other options. Remote may also refer to one or more storage locations that are networked or otherwise not integral to one or more compute clusters or nodes that are executing different operations.
- a remote data location is a location from which one or more network connections are used to transmit or otherwise access data stored therein.
- Stored data may be retrieved from the remote data location 504.
- Retrieval of the data may include transmission of at least a portion of the data over one or more data connections.
- various embodiments described herein are related to data streaming such that a physical location of the data is not modified or otherwise changed during operations. For example, the data may remain stored at the remote storage location with portions of it being transmitted or otherwise used for one or more compute operations.
- stored data is read 506 and one or more components from the stored data are extracted 508.
- the extracted data may be used to generate a new file and/or to generate a temporary file that may be used in one or more operations.
- Extracting information from the data may be based, at least in part, on one or more configuration files that may identify particular data types for later processing using one or more compute instances. For example, text may be extracted from a document, the text may be evaluated using one or more natural language processing systems, and then one or more phrases or groups of words may be extracted for further evaluation. It should be appreciated that only certain files within the data location may be evaluated based, at least in part, on different coding or identifying features. For example, files with a certain extension may be evaluated.
- the one or more components may be positioned within a queue 510.
- the extracted components themselves, a newly created file, or the original file with tags identifying the extracted components may be positioned within the queue 510.
- the one or more components are mapped to a compute instance 512.
- mapping the one or more components may enable reduced bandwidth usage while still permitting meaningful evaluation of the data.
- the compute instance may then be used to execute one or more operations using the one or more components and a configuration file 514.
- the configuration file may include parameters for the one or more operations, which may be associated with one or more pre-trained machine learning models that use the parameters of the configuration file. In this manner, various embodiments enable streaming data compute execution.
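The steps of process 500 can be sketched end to end as follows. The in-memory `REMOTE_STORE`, the configuration keys (`extension`, `pattern`, `operation`), and the counting operation are all hypothetical stand-ins chosen for illustration; a regular expression takes the place of the natural language processing mentioned above.

```python
import re
from collections import deque

# Hypothetical remote store: file name -> contents.
REMOTE_STORE = {
    "visit1.txt": "Patient reports chest pain. ECG ordered.",
    "scan1.dcm": "<binary image data>",
    "visit2.txt": "Follow-up: no chest pain today.",
}

# Assumed configuration: which files to read, what to extract, what to run.
config = {"extension": ".txt", "pattern": r"chest pain", "operation": "count"}


def extract_components(store, config):
    """Steps 504-508: read files with the configured extension, extract matches."""
    for name, contents in store.items():
        if name.endswith(config["extension"]):
            for match in re.finditer(config["pattern"], contents):
                yield {"source": name, "span": match.span(), "text": match.group()}


def execute(queue, config):
    """Steps 512-514: a stand-in compute instance drains the queue."""
    if config["operation"] == "count":
        total = 0
        while queue:
            queue.popleft()
            total += 1
        return total
    raise ValueError(f"unsupported operation: {config['operation']}")


queue = deque(extract_components(REMOTE_STORE, config))  # step 510
sources = [item["source"] for item in queue]
result = execute(queue, config)
```

Each queued item records its source and character span rather than a copy of the file, which mirrors the idea of tagging extracted components while the original data stays in place.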
- FIG. 5B illustrates an example process 520 that may be used with embodiments of the present disclosure.
- a request to perform a compute operation is received 522.
- a user may submit a request to a data platform and/or an automated workflow may generate a request to execute one or more compute operations.
- a library of pre-trained models may be available within the data platform.
- a pre-trained model may be selected 524. The selection of the model may be based on the request. For example, a specific type of operation (e.g., identification) may lead to selection of a classifier.
- another type of operation may lead to identification of one or more natural language systems.
- the pre-trained model may then be provided to a compute instance 526 and configured to execute an operation based on one or more configuration files 528.
- the configuration files may include operational parameters for the pre-trained models based on the request.
- Remotely stored data may then be streamed to the compute instance 530 for execution of the compute operation 532.
- a user or workflow may obtain a model, prepare the model for execution, and then perform a compute operation through one or more data platforms.
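A minimal sketch of process 520 is shown below: a request selects a model from a library, a configuration supplies its parameters, and data is consumed from an iterator. The library contents, the toy "models", and the parameter names (`threshold`, `lowercase`) are assumptions for illustration only and do not represent actual pre-trained systems.

```python
def make_classifier(threshold):
    """Toy classifier factory; stands in for a pre-trained identification model."""
    return lambda value: "anomaly" if value >= threshold else "normal"


def make_token_counter(lowercase):
    """Toy language model factory; counts whitespace-separated tokens."""
    return lambda text: len((text.lower() if lowercase else text).split())


# Hypothetical model library keyed by the operation type named in a request.
MODEL_LIBRARY = {
    "identification": make_classifier,
    "language": make_token_counter,
}


def handle_request(request, config, stream):
    """Select a model (524), configure it (526-528), run it on streamed data (530-532)."""
    factory = MODEL_LIBRARY[request["operation"]]
    model = factory(**config)
    return [model(item) for item in stream]


readings = iter([0.2, 0.9, 0.5])  # stand-in for remotely stored data
labels = handle_request({"operation": "identification"}, {"threshold": 0.5}, readings)
token_counts = handle_request(
    {"operation": "language"}, {"lowercase": True}, iter(["Chest pain noted"])
)
```

Keeping the parameters in a separate configuration means the same library entry can serve different requests without changing the model code.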
- FIG. 5C illustrates an example process 540 for executing a workflow.
- a request to execute a workflow is received 542.
- a digital health platform may receive the request from one or more authorized users, from a script scheduled to run at particular times, and/or the like.
- the workflow and/or the request may include information associated with different operations of the workflow.
- one or more pre-trained models and/or one or more data sources may be selected for the workflow 544.
- the workflow may be directed toward a classification system that receives image information for processing via a trained machine learning system to classify anomalies within the images. Accordingly, one or more data sources may be selected along with one or more pre-trained classifiers.
- an execution mode for the workflow may be determined 546.
- the execution mode may be a determination as to whether the workflow facilitates compute services or labeling 548. If it is determined that a compute execution mode is selected, then one or more compute resources may be provisioned 550. For example, one or more nodes may be used to execute different compute operations, where the one or more nodes may be part of a distributed computing environment.
- a data pipeline between the one or more data sources and the one or more compute resources is established 552. The data pipeline may enable transmission of information, or portions of information, that is mapped directly to one or more GPUs without intermediate transmission and/or storage. The pre-trained model may then be executed against the one or more data sources 554.
- a labeling execution mode may be selected where information from the one or more data sources is provided within an environment 556.
- the environment may also be part of a distributed computing service or may be loaded onto a device that made the request, among other options.
- the user may review the information and provide labeling information, which is then received back at the system, such as the health platform 558.
- the labeling information may then be used to update training information for one or more pre-trained models 560, which may further be used to retrain one or more pre-trained models 562. In this manner, updated information may be acquired to continuously adjust and/or improve models.
- the information may also be used to generate new models, which may be particularized for a specific use case.
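The execution-mode branch of process 540 can be sketched as a single dispatch function. The `mode` request field, the toy classifier, and the reviewer callback are hypothetical; retraining itself (562) is omitted, and the labeling branch only shows labels being paired with records as candidate training data.

```python
def run_workflow(request, model, records, get_labels=None):
    """Branch on the execution mode (546-548): compute or labeling."""
    if request["mode"] == "compute":
        # 550-554: apply the pre-trained model to each streamed record.
        return {"outputs": [model(record) for record in records]}
    if request["mode"] == "labeling":
        # 556-560: present records to a reviewer and keep the returned labels.
        return {"training_data": [(record, get_labels(record)) for record in records]}
    raise ValueError(f"unknown execution mode: {request['mode']}")


def toy_classifier(image_id):
    """Stand-in for a pre-trained classifier; the rule is arbitrary."""
    return "anomaly" if image_id.endswith("7") else "normal"


def reviewer(record):
    """Stand-in for a user supplying a label through a labeling environment."""
    return "benign"


compute_result = run_workflow({"mode": "compute"}, toy_classifier, ["img-007", "img-012"])
labeling_result = run_workflow(
    {"mode": "labeling"}, toy_classifier, ["img-007"], get_labels=reviewer
)
```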
- FIG. 6 is a block diagram illustrating an exemplary computer system 600, which may be a system with interconnected devices and components, a system-on-a-chip (SOC), or some combination thereof formed with one or more processors.
- the computer system 600 may include one or more processors 602 to employ execution units including logic to perform algorithms for processing data.
- the computer system 600 may form part of a compute cluster, for example within a data center, and may execute the instructions within one or more provisioned instances, such as a virtual machine.
- Embodiments may be used in other devices such as handheld devices and embedded applications.
- handheld devices include cellular phones, Internet Protocol devices, digital cameras, personal digital assistants (“PDAs”), and handheld PCs.
- embedded applications may include a microcontroller, a digital signal processor (“DSP”), system on a chip, network computers (“NetPCs”), set-top boxes, gaming consoles, wearable devices, or any other system that may perform one or more instructions in accordance with at least one embodiment.
- the processor 602 may include various execution units to perform various applications, including machine learning or inferencing applications. Additionally, the processor 602 may be a single or multi-processor. The processor 602 may be coupled to a processor bus that may transmit data signals between processor 602 and other
- memory 604 may be implemented as a Dynamic Random Access Memory (“DRAM”) device, a Static Random Access Memory (“SRAM”) device, flash memory device, or other memory device.
- memory 604 may store instructions and/or data represented by data signals that may be executed by processor 602.
- the memory may be any type of data storage device or non-transitory computer-readable storage media, such as a first data storage for program instructions for execution by the processor 602, a separate storage for images or data, a removable memory for sharing information with other devices, etc.
- the device may further include a display element 606, such as a touch screen or liquid crystal display (LCD), among various other options.
- the computer system in many embodiments will include at least one input element 608 able to receive input from a user.
- This input can include, for example, a push button, touch pad, touch screen, wheel, joystick, keyboard, mouse, keypad, or any other such device or element whereby a user can input a command to the device.
- a device might not include any buttons at all, and might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the device.
- the input element 608 may be a component to receive signals received over a network.
- the computer system includes one or more network interfaces or communication elements or components 610 for communicating over various networks, such as Wi-Fi, Bluetooth, RF, wired, or wireless communication systems.
- the device in many embodiments can communicate with a network, such as the Internet, and may be able to communicate with other such devices.
- a computer-implemented method comprising: receiving a request to execute one or more compute operations against information within a remote data source; identifying a location of the remote data source; retrieving data stored within the remote data source; extracting, from the data, one or more components;
- a system comprising:
- one or more processing units to: receive a request to perform a compute operation; select, from a plurality of models, a pre-trained model based, at least in part, on the request; provide, to a compute instance, the pre-trained model; provide, to the compute instance, one or more configuration files having parameters for execution of the pre-trained model; and cause remotely stored data to be streamed to the compute instance as an input to the pre-trained model.
- the one or more processing units are further to: provide a landing page to a user associated with a health platform; collect health data based, at least in part, on one or more parameters of the health platform; and generate information indicative of diagnostic or treatment information for the health platform.
- a computer-implemented method comprising: receiving a request to perform a compute operation; selecting, from a plurality of models, a pre-trained model based, at least in part, on the request; providing, to a compute instance, the pre-trained model; providing, to the compute instance, one or more configuration files having parameters for execution of the pre-trained model; and causing remotely stored data to be streamed to the compute instance as an input to the pre-trained model.
Landscapes
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Pathology (AREA)
- General Engineering & Computer Science (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Primary Health Care (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The approaches presented herein provide systems and methods for a data platform to identify, evaluate, and map data for processing via one or more compute instances. The data platform may receive an instruction and extract data from a variety of remote data locations. The data may be processed, for example to label or classify the data, and then streamed to a compute instance for further evaluation. The data platform may be used to provide a centralized system for managing system data that combines both legacy and modern storage solutions into a single integrated platform.
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263302807P | 2022-01-25 | 2022-01-25 | |
US202263302811P | 2022-01-25 | 2022-01-25 | |
US202263302798P | 2022-01-25 | 2022-01-25 | |
US63/302,798 | 2022-01-25 | ||
US63/302,811 | 2022-01-25 | ||
US63/302,807 | 2022-01-25 | ||
US18/158,824 US20230236886A1 (en) | 2022-01-25 | 2023-01-24 | Data streaming pipeline for compute mapping systems and applications |
US18/158,824 | 2023-01-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023147363A1 (fr) | 2023-08-03 |
Family
ID=85283610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/061274 WO2023147363A1 (fr) | 2022-01-25 | 2023-01-25 | Pipeline de diffusion en continu de données pour systèmes et applications de mappage de calcul |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023147363A1 (fr) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210098133A1 (en) * | 2019-09-27 | 2021-04-01 | Parkland Center For Clinical Innovation | Secure Scalable Real-Time Machine Learning Platform for Healthcare |
Non-Patent Citations (1)
Title |
---|
AKSHAY ARORA ET AL: "ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 September 2019 (2019-09-29), XP081484903 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23706269 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |