WO2025049978A1 - Methods and systems for the design and use of intelligent layer shared neural networks and model architecture search algorithms

Methods and systems for the design and use of intelligent layer shared neural networks and model architecture search algorithms

Info

Publication number
WO2025049978A1
Authority
WO
WIPO (PCT)
Prior art keywords
tasks
model
layers
task
layer shared
Application number
PCT/US2024/044769
Other languages
French (fr)
Inventor
Prabuddha Chakraborty
Sumaiya Shomaji
Md Hafizur RAHMAN
Md Mashfiq RIZVEE
Original Assignee
University Of Maine System Board Of Trustees
University Of Kansas
Priority date
Application filed by University Of Maine System Board Of Trustees, University Of Kansas filed Critical University Of Maine System Board Of Trustees
Publication of WO2025049978A1 publication Critical patent/WO2025049978A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/0002Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network
    • A61B5/0015Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network characterised by features of the telemetry system
    • A61B5/0022Monitoring a patient using a global network, e.g. telephone networks, internet
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/24Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316Modalities, i.e. specific diagnostic methods
    • A61B5/318Heart-related electrical modalities, e.g. electrocardiography [ECG]
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • AI Artificial Intelligence
  • NN Neural Networks
  • These NN/AI models are designed to process various types of data, for example, images, videos, sound, sensor values, and natural language documents.
  • NN/AI models are becoming increasingly complex, requiring heavy computational energy for: (i) searching for optimal neural architecture models, (ii) training the models, and (iii) inferencing the models post deployment (e.g., using the trained models to produce output, e.g., a response to a query or command, given a set of input).
  • the NN with layer shared architecture comprises a base set of layers (e.g., base layer shared model) and one or more branches extending from the base set of layers.
  • Each branch may include one or more layers from the base set of layers and one or more additional layers different from the base set of layers, each branch designed and trained to perform a particular unique task on a common set of input data.
  • the NN will share some layers among multiple tasks.
  • the present disclosure recognizes that neural networks (NNs) that perform tasks on a shared input data may contain one or many layers that are the same or similar. As such, it is advantageous in this scenario to have a single NN with layer shared architecture instead of designing and/or using multiple NNs.
  • Implementing a NN with layer shared architecture may significantly reduce runtime (training and/or inferencing), and reduce power consumption.
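  • For illustration only, the following is a minimal sketch of a two-task layer shared network in TensorFlow/Keras (not the patented ILASH search itself); all layer names, sizes, and task choices are illustrative assumptions rather than details taken from the disclosure.

    # Minimal sketch: a base set of shared layers with two task branches.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    inputs = tf.keras.Input(shape=(64, 64, 3))

    # Base set of layers, shared by every task.
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    shared = layers.Conv2D(64, 3, activation="relu")(x)

    # Branch 1: task-specific layers (e.g., a classification task).
    b1 = layers.GlobalAveragePooling2D()(shared)
    out_task1 = layers.Dense(2, activation="softmax", name="task1")(b1)

    # Branch 2: forks from the same shared layers (e.g., a regression task)
    # and adds its own additional layers.
    b2 = layers.Conv2D(64, 3, activation="relu")(shared)
    b2 = layers.GlobalAveragePooling2D()(b2)
    out_task2 = layers.Dense(1, activation="linear", name="task2")(b2)

    model = Model(inputs, [out_task1, out_task2])
    model.compile(optimizer="adam",
                  loss={"task1": "sparse_categorical_crossentropy",
                        "task2": "mse"})

  • Because the base layers are computed once per input and reused by both branches, training and inference can cost substantially less than running two separate networks.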
  • the methods, systems, and NN architectures presented herein differ significantly from previous layer fusion strategies for improving NN efficiency. Furthermore, no previous strategy employs the techniques presented herein for using a predictive neural network search algorithm to create the layer shared architectures.
  • the NN architecture is built using a set of tasks, a common input dataset, and one or more specifications for control of the architecture (e.g., one or more search termination conditions, task priorities, layer mutation flags, sub-branch forking flags, and/or predictive forking flags).
  • the NN architecture may be built using a heuristic based search and/or predictive rapid search.
  • the resulting NN comprises a base set of layers and one or more branches, each of which is trained to perform a unique task from the set of tasks.
  • the NN architecture is built using a heuristic based search.
  • the heuristic based search corresponds to an iterative search over a large number of combinations in the NN architecture space. This search strategy is exhaustive and helps ensure that an optimal architecture is found.
  • the search starts with producing a base layer shared model for performing a first task from the set of tasks. Producing a base layer shared model may include selecting the initial model from a set of existing (e.g., traditional) NN architectures and/or creating one or many NN layers from scratch. For any additional tasks, a selected branch model is created.
  • the creation process includes performing an iterative search to identify a candidate branch model, then identifying a branching point and/or multi-connection point from the base layer for the selected task, and mutating and/or varying any number of layers of the candidate branch model.
  • for each candidate branch model, an efficacy value (e.g., an accuracy value produced by the NN using a training data set) is computed.
  • the candidate branch model with the highest efficacy value is retained as the output layer shared model.
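  • As a schematic sketch of the heuristic based search described above; the helper functions create_base_model, enumerate_branch_points, mutate_layers, evaluate, and attach are hypothetical placeholders, not part of any library:

    # Schematic of the iterative (exhaustive) heuristic search.
    def heuristic_search(tasks, dataset):
        base = create_base_model(tasks[0], dataset)          # first task
        for task in tasks[1:]:                               # each additional task
            best_branch, best_score = None, float("-inf")
            # Iteratively try every branching point and layer mutation.
            for point in enumerate_branch_points(base):
                for candidate in mutate_layers(base, point, task):
                    # Efficacy value, e.g., accuracy on a training data set.
                    score = evaluate(base, candidate, dataset, task)
                    if score > best_score:
                        best_branch, best_score = candidate, score
            base = attach(base, best_branch)                 # retain the best branch
        return base                                          # output layer shared model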
  • the NN architecture is built using a predictive rapid search. The predictive rapid search relies on predictive steps rather than an iterative search.
  • a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common input image (or common set of input images).
  • the multiple tasks may include, for example, jointly performing age prediction, person recognition, and/or mood detection for a common set of face image data (e.g., a single or multiple images).
  • the NN framework described herein may be used on a common set of scene data for jointly performing edge detection, segmentation, and/or depth detection. There are many other such possibilities.
  • a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of video data. Video data provides a rich source of visual information that can be analyzed and processed for a wide range of tasks across various domains.
  • a NN with layer shared architecture as described herein may perform multiple tasks on a common set of hyperspectral and/or multispectral data, for example, classification of land cover types (e.g., forest, crops, water bodies, urban areas, and the like), mineral and/or material identification (e.g., using detected spectral patterns of the Earth’s surface), and water quality assessment (e.g., measurement of parameters such as turbidity, chlorophyll concentration, and/or sediment levels).
  • a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of medical imaging data, such as a set of scans (e.g., volumetric) obtained by radiography, magnetic resonance imaging, nuclear imaging, ultrasound, elastography, photoacoustic imaging, tomography, echocardiography, near-infrared spectroscopy, and magnetic particle imaging.
  • a NN with layer shared architecture as described herein may perform multiple tasks on a common set of medical imaging data, for example, organ identification, organ physiology monitoring, pathology identification (e.g., anatomical, clinical, molecular, tumor, infection, inflammation, fibrotic conditions), pathology monitoring, disease identification (e.g., heart diseases, brain disorders, cancerous tumors), disease monitoring, and treatment monitoring.
  • the neural network comprises a base set of layers (e.g., base layer shared model) and one or more branches extending from the base set of layers, each said branch consisting of (i) one or more layers from the base set of layers, and (ii) one or more additional layers different from the base set of layers.
  • the base set of layers and the one or more branches are each trained to perform a unique task from the plurality of different tasks, each said task to be performed with the (same) set of input data.
  • the invention is directed to a method of automatically designing a neural network having a layer shared architecture (e.g., an intelligent layer shared (ILASH) neural network), the method comprising: (a) receiving, by a processor of a computing device, a set of tasks and associated dataset(s); (b) receiving, by the processor, one or more specifications for control of the architecture (e.g., one or more search termination conditions, task priorities, layer mutation flags, sub-branch forking flags, and/or predictive forking flags); and (c) automatically conducting, by the processor, a heuristic based search and/or a predictive rapid search using the set of tasks and associated dataset(s) and using the one or more specifications to produce an output layer shared model, wherein the output layer shared model comprises a base set of layers and one or more branches, each trained to perform a unique task from the set of tasks.
  • the method comprises automatically conducting, by the processor, the predictive rapid search by: creating a base layer shared model for performing a first task from the set of tasks; determining whether there is at least one additional task in the set of tasks to model; upon determining there is at least one additional task in the set of tasks to model, for said additional selected task, creating a selected branch model by: predicting a set of optimal candidate branching points and/or multi-connection points from the base layer shared model; for each of a plurality of candidate branching points and/or multi-connection points, mutating and/or varying one or more layers to create a candidate branch model; for each of the plurality of candidate branch models, computing an efficacy value for the (complete) model containing the candidate branch model; and comparing efficacy values computed for the plurality of candidate branch models, and retaining the candidate branch model of the plurality that results in the highest (best) computed efficacy value; and constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks.
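  • The predictive rapid search can be sketched the same way, reusing the hypothetical helpers from the previous sketch; the only change is that a predictive model proposes a short list of branching points and mutations instead of an exhaustive enumeration (predict_branch_points and predict_mutations are likewise hypothetical placeholders):

    # Schematic of the predictive rapid search.
    def predictive_rapid_search(tasks, dataset, top_k=5):
        base = create_base_model(tasks[0], dataset)
        for task in tasks[1:]:
            # Predictive step: shortlist candidate branching points.
            points = predict_branch_points(base, task, top_k=top_k)
            candidates = [predict_mutations(base, p, task) for p in points]
            # Compute an efficacy value for each shortlisted candidate and
            # retain the one with the highest (best) value.
            scored = [(evaluate(base, c, dataset, task), c) for c in candidates]
            best_score, best_branch = max(scored, key=lambda sc: sc[0])
            base = attach(base, best_branch)
        return base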
  • the method comprises automatically selecting, by the processor, a first task from the set of tasks for use in creating the base set of layers for performing the first task (e.g., performing this automatic selection of the first task from the set of tasks using a neural network dedicated for this determination).
  • the method comprises automatically constructing, by the processor, the base set of layers by: creating an initial [e.g., untrained (e.g., having layer weights with initialized values)] autoencoder model; training the initial autoencoder model using the associated dataset(s) (e.g., to determine / learn values of the layer weights) to generate a trained autoencoder model (e.g., initialize the autoencoder model to produce output data similar to its input data); and identifying at least a portion of the trained autoencoder for use as the base set of layers of the output layer shared model.
  • the method comprises, for each of the unique tasks from the set of tasks, adding a particular candidate branch model to the base set of layers at a particular branch location by automatically conducting, by the processor, the heuristic based search and/or the predictive rapid search using the set of tasks and associated dataset(s) (e.g., using the search to identify the particular candidate branch model from a plurality of candidate branch models and/or identify the particular branching point) [e.g., while keeping the base layer weights set or, alternatively, allowing the base layer weights to change].
  • the set of input data comprises image data and the plurality of tasks comprises at least one member selected from the group consisting of age prediction, person recognition, and mood detection.
  • the set of input data comprises image data (e.g., scene data) and the plurality of tasks comprises at least one member selected from the group consisting of edge detection, segmentation, and depth detection.
  • the set of input data comprises video data and the plurality of tasks comprises at least one member selected from the group consisting of object tracking, action recognition, video captioning, video summarizing, and emotion recognition.
  • the set of input data comprises audio data and the plurality of tasks comprises at least one member selected from the group consisting of speaker identification, instrument identification, language classification, and emotion recognition.
  • the set of input data comprises hyperspectral and/or multispectral data and the plurality of tasks comprises at least one member selected from the group consisting of land cover type classification (e.g., forest, crops, water bodies, urban areas), mineral and/or material identification (e.g., using detected spectral patterns of the Earth’s surface), agricultural monitoring (e.g., plant health, chlorophyll content, nutrient content, vitamin content), and water quality assessment (e.g., measuring parameters such as water turbidity, chlorophyll content, and/or sedimentation levels).
  • the set of input data comprises electrocardiogram (ECG) data and the plurality of tasks comprises at least one member selected from the group consisting of heart attack classification (and/or risk determination), abnormal heartbeat identification, extent of heart damage identification, location of heart damage identification, heart rhythm disturbance detection, detection of heart blockage and/or conduction problems, detection of electrolyte disturbance and/or intoxication, detection of ischemia and/or infarction, and detection of heart structural change.
  • the set of input data comprises medical imaging data
  • the plurality of tasks comprises at least one member selected from the group consisting of organ identification, organ physiology monitoring, pathology identification (e.g., anatomical, clinical, molecular, tumor, infection, inflammation, fibrotic conditions), pathology monitoring, disease identification (e.g., heart diseases, brain disorders, cancerous tumors), disease monitoring, and treatment monitoring.
  • the invention is directed to a system comprising a processor of a computing device and memory having instructions stored thereon, which, when executed by the processor, cause the processor to perform one or more of the methods described herein.
  • FIG.1 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing two tasks, according to an illustrative embodiment.
  • FIG.2 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing three tasks, according to an illustrative embodiment.
  • FIG.3 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing four tasks, according to an illustrative embodiment.
  • FIG.4 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing four tasks, according to an illustrative embodiment.
  • FIG.5A is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing four tasks, according to an illustrative embodiment.
  • FIG.6D is a block diagram of an exemplary method of designing a neural network based on a heuristics search, according to an illustrative embodiment.
  • FIG.6E is a block diagram of an exemplary method of designing a neural network based on a predictive rapid search, according to an illustrative embodiment.
  • FIG.7 is a block diagram of an exemplary architecture of a neural network for an image recognition task, according to an illustrative embodiment.
  • FIG.8A is a block diagram of an exemplary architecture of an autoencoder neural network, according to an illustrative embodiment.
  • FIG.8B is a block diagram of an exemplary architecture of a neural network with layer shared architecture based on an autoencoder architecture, according to an illustrative embodiment.
  • FIG.8C illustrates a method of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure.
  • FIG.9 is a diagram of exemplary electrocardiography features of a heart in normal sinus rhythm.
  • FIG.10 is a block diagram of an exemplary method for performing multiple tasks related ECG signal data using a neural network with layer shared architecture based on an encoder, according to an illustrative embodiment.
  • FIG.11 is a block diagram of an exemplary cloud computing environment, used in certain embodiments.
  • FIG.12 is a block diagram of an example computing device and an example mobile computing device used in certain embodiments.
  • building a layer shared neural architecture comprises the following steps: (1) building a base model to perform one of the tasks, (2) extending the base model to perform other tasks by branching out from the existing base model, (3) making multiple connection points between two or more branches to increase feature cross-pollination, and (4) making strategic hyperparameter mutations to different branch layers to make the overall network more efficient and accurate.
  • LSNN Layer Shared Neural Network
  • a base model serving one of the tasks is first chosen. This base model may also be used as a template for the branches associated with other tasks.
  • a repository of standard neural network models may be used to decide on the base model and/or one or more branches.
  • a task t_i is first selected.
  • From a repository of models M = {m_1, m_2, m_3, ..., m_k}, one model m_j (e.g., the model most optimal for task t_i) is selected.
  • the selected model m j may be further adjusted to fit the role (e.g., by adjusting its architecture, hyperparameters, weights, by training), and the base layer shared model (LS) is then created.
  • a first task may be randomly selected.
  • a heuristic-based algorithm and/or a predictive algorithm may be used.
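  • As a minimal sketch of this base-model selection, under the assumption that score_on_task is a hypothetical helper that briefly trains and evaluates a candidate model on the selected task:

    # Select the model m_j from the repository M = {m_1, ..., m_k} that is
    # most optimal for task t_i; it then becomes the base layer shared model.
    def select_base_model(M, task_i, dataset):
        return max(M, key=lambda m_j: score_on_task(m_j, task_i, dataset))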
  • the model is further expanded to perform other tasks. For example, for each additional task t_i ∈ T, a corresponding branch may be added to the model.
  • every task may be associated with a different branching point with respect to a base layer.
  • the model may be updated (e.g., by adjusting its hyperparameters, weights, by training) after adding at least one branch.
  • a branch may start (e.g., fork) from another branch and not necessarily from the base model.
  • branches may be connected to a base layer and to each other at multiple points.
  • a branch associated with Task-3 is connected to a branch associated with Task-1 (e.g., a base layer) at multiple points.
  • a layer in a branch may be designed based on another layer in the model (e.g., from another branch, from a base model).
  • a designed layer may have mutations and/or variations (e.g., in hyperparameters) with respect to an original layer. For example, as shown in FIG.5A, two similar pooling layers in branches associated with Task-1 and Task-3 have different hyperparameters. The mutations and/or variations may be determined at least partly based on various methods and various tasks (e.g., as related to performance of the model on a given task), as in the sketch below.
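  • A sketch of such a layer mutation in TensorFlow/Keras, assuming the mutated hyperparameters are the pool size and stride (mirroring the FIG.5A example of two similar pooling layers with different hyperparameters):

    import random
    from tensorflow.keras import layers

    def mutate_pooling(layer):
        # Derive a new pooling layer from an existing one, varying its
        # hyperparameters with respect to the original layer.
        cfg = layer.get_config()
        cfg["pool_size"] = random.choice([(2, 2), (3, 3)])
        cfg["strides"] = random.choice([(1, 1), (2, 2)])
        cfg["name"] = cfg["name"] + "_mutated"   # avoid a name clash in the model
        return layers.MaxPooling2D.from_config(cfg)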
  • FIG.5B illustrates a method 500 of producing digital output(s) in response to a query or command via a neural network having a layer shared architecture, according to aspects of the present disclosure.
  • method 500 includes receiving, by a processor of a computing device, a set of input data.
  • the input data may include image data, video data, audio data, sensor data, alphanumeric text data such as natural language electronic documents, hyperspectral data and/or multispectral data.
  • method 500 includes receiving, by the processor, the query or command.
  • the query or command includes a plurality of different tasks, each task to be performed with the (same) set of input data.
  • method 500 includes producing, by the processor, output for each of the plurality of tasks of the query or command using the neural network with the layer shared architecture.
  • FIG.6A illustrates a method 600 of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure.
  • the layer shared architecture is an intelligent layer shared (ILASH) neural network.
  • method 600 includes receiving, by a processor of a computing device, a set of tasks and associated datasets.
  • method 600 includes receiving, by the processor, one or more specifications for control of the architecture.
  • one or more specifications may include one or more search termination conditions, task priorities, layer mutation flags, sub-branch forking flags, and/or predictive forking flags.
  • method 600 includes automatically conducting, by the processor, a heuristic based search and/or a predictive rapid search using the set of tasks and associated dataset(s), and using the specifications to produce an output layer shared model.
  • the output layer shared model includes a base set of layers and one or more branches, each trained to perform a unique task from the set of tasks.
  • method 600 may include creating a base layer shared model for performing a first task from the set of tasks.
  • method 600 may include determining whether there is at least one additional task in the set of tasks to model.
  • method 600 may include: comparing efficacy values computed for the plurality of candidate branch models, and retaining the candidate branch model of the plurality that results in the highest (best) computed efficacy value.
  • method 600 may include constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks.
  • FIG.6B illustrates a method 630 of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure.
  • Method 630 illustrated in FIG.6B is similar to that of FIG.6A, but employs prediction of optimal branching points, with the following steps 632, 634, and 636 of FIG.6B replacing steps 614 and 616 of FIG.6A.
  • method 630 may include predicting a set of optimal candidate branching points and/or multi- connection points from the base layer shared model.
  • method 630 may include: for each of a plurality of candidate branching points and/or multi-connection points, mutating and/or varying one or more layers to create a candidate branch model.
  • method 630 may include: for each of the plurality of candidate branch models, computing an efficacy value for the (complete) model containing the candidate branch model.
  • method 630 may further include automatically selecting, by the processor, a first task from the set of tasks for use in creating the base set of layers for performing the first task.
  • performing the first task may include performing an automatic selection of the first task from the set of tasks using a neural network dedicated for this determination.
  • ILASH Intelligent Layer Shared
  • the ILASH framework designs LSNNs based on: (1) a set of tasks and associated datasets and (2) specifications to control the behavior of ILASH.
  • the specifications may comprise search termination conditions, task priorities, and various flags to, for example, enable or disable layer mutations, sub-branch forking, predictive forking.
  • ILASH may comprise various search methods that determine an optimal configuration of the neural network.
  • search methods comprise a heuristics based search process. An exemplary embodiment of the process is presented in FIG.6D. The process first determines an initial model for Task - 1 (main branch), for example, by selecting from a set of existing traditional neural network architectures. This model is then trained and prepared for augmenting support to other tasks.
  • search methods comprise a predictive rapid hybrid approach.
  • FIG.6E An exemplary embodiment of the predictive rapid hybrid approach is shown in FIG.6E.
  • the initial model is first created.
  • the model is augmented towards supporting the other tasks.
  • the approach uses a predictive branching model to suggest a set of potential branching and multi-connection points for the given task.
  • mutations are performed, again based on a predictive mutation model.
  • the best model (e.g., in terms of performance or efficacy) is then searched for among these shortlisted predicted combinations to ascertain the optimal one.
  • the changes are made permanent to the base model. Once all the branches are added into the LSNN, the final model is provided as an output.
  • a pilot study was conducted to evaluate the methodology presented herein.
  • Three face-image datasets were used: UTKFace (A. Das et al., 2018), MTFL (Z. Zhang et al., 2014), and CelebA (Z. Liu et al., 2018).
  • UTKFace has Age, Race, and Gender labels.
  • MTFL has Gender, Smiling or not, Wearing glasses or not, and different Head pose labels.
  • CelebA has 40 facial attributes, but in the study five different types of tasks are considered: Gender, Hair Color, Hair Style, Smiling or not, and Nose.
  • ILASH achieved 91.53% and 87.82% testing accuracy for Gender and Race, respectively, and an r2 score of 0.796 for Age when trained with the heuristic-based search.
  • Autokeras achieved 91.89% and 86.12% testing accuracy for Gender and Race, respectively, and an r2 score of 0.862 for Age.
  • Although the performance of ILASH is slightly lower than that of Autokeras for the heuristic-based search, it outperforms Autokeras in the predictive-based search.
  • Predictive ILASH takes 0.269 hours of training time, whereas Autokeras takes 25.41 hours, for the UTKFace dataset.
  • ILASH has lower PUE and less CO2 emission than Autokeras for both heuristic and predictive searches.
  • the results show that the predictive ILASH performed well and achieved a satisfactory accuracy/r2 score for all tasks. Additionally, the approach resulted in a notable decrease in power consumption, which not only saves costs but also has a positive environmental effect by reducing the carbon footprint associated with machine learning training.
  • FIG.7 shows the predictive ILASH model for the UTKFace dataset.
  • Table 1. Comparison of training results between ILASH and Autokeras.
  • Autoencoder-based Layer Shared Neural Networks are neural networks that have been trained to recreate their input data. This is typically accomplished by compressing the input into a lower-dimensional representation and thereafter recreating it back to its original form. When an autoencoder is trained to reconstruct data, it learns to extract meaningful features from the input that can be used for other tasks.
  • Shared layers are layers in a neural network that are reused across multiple inputs or tasks. By sharing layers, the network can learn to recognize common patterns and features in different types of data, leading to better performance and faster training times.
  • An important aspect of constructing a machine learning model involves the careful selection of an appropriate architecture.
  • One possible approach is to design the architecture from scratch, while another involves exploring pre-existing models that have undergone training and been employed in analogous scenarios.
  • the search for optimal parameters can commence either from a randomly initialized set of values or, alternatively, based on prior model training experiences.
  • With this methodology, it is possible to utilize a single architecture to address multiple tasks, for example, tasks that involve similar/same data. This approach can be advantageous as it reduces the computational resources needed to train the model by enabling shared learning of features across multiple tasks.
  • a common autoencoder architecture is built as shown in FIG.8A.
  • the network is trained using the entire dataset as the first step.
  • the goal of training is to make the autoencoder learn the weights that map the input data to its compressed representation and back to its original form.
  • an autoencoder is used with a bottleneck layer to extract features from the input data.
  • the learned weights are then used as the initial weights for “n” branches of the network as shown in FIG.8B.
  • the second step is to divide the autoencoder into two parts and use at least a part of the encoder portion. After that, each branch is trained for its specific classification task using, for example, a softmax/sigmoid activation layer for classification or linear activation for continuous values.
  • the weights of the encoder layers may be kept frozen, and only the weights of the task-specific layers are updated.
  • the accuracy of the model may be checked at this stage. If the accuracy must be improved, encoder layers are unfrozen until the accuracy reaches an acceptable level; that is, the unfrozen layers of the encoder are added to the trainable layer domain.
  • the output of each branch corresponds to the probability distribution over the labels for the corresponding task, which can be used to make predictions on new data. Overall, this methodology may allow for efficient multi-task learning using a shared representation of the input data.
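  • The workflow above can be sketched in TensorFlow/Keras as follows; the input width, layer sizes, and the two task heads are illustrative assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Step 1: build and train a common autoencoder on the entire dataset.
    inp = tf.keras.Input(shape=(784,))
    h = layers.Dense(128, activation="relu")(inp)
    bottleneck = layers.Dense(32, activation="relu", name="bottleneck")(h)
    dec = layers.Dense(128, activation="relu")(bottleneck)
    out = layers.Dense(784, activation="sigmoid")(dec)
    autoencoder = Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="mse")
    # autoencoder.fit(x_train, x_train, epochs=10)  # learns to recreate input

    # Step 2: keep (at least part of) the encoder as the shared base and
    # freeze its weights.
    encoder = Model(inp, bottleneck)
    encoder.trainable = False

    # Step 3: attach one branch per task; only branch weights are trained.
    feats = encoder(inp)
    cls = layers.Dense(2, activation="softmax", name="cls_task")(feats)
    reg = layers.Dense(1, activation="linear", name="reg_task")(feats)
    multitask = Model(inp, [cls, reg])
    multitask.compile(optimizer="adam",
                      loss={"cls_task": "sparse_categorical_crossentropy",
                            "reg_task": "mse"})
    # If accuracy is insufficient, unfreeze the encoder and recompile:
    # encoder.trainable = True; multitask.compile(...)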
  • FIG.8C illustrates a method 800 of automatically designing a neural network having a layer shared architecture via use of an autoencoder, according to aspects of the present disclosure.
  • Method 800 illustrated in FIG.8C is similar to that of FIG.6A, but employs an autoencoder, with the following steps 802, 804, 806, and 808 of FIG.8C replacing steps 608-620 of FIG.6A.
  • method 800 may include creating an autoencoder model.
  • the model may be an initial, untrained model, for example, having layer weights with initialized values.
  • method 800 may include training the initial autoencoder model using the associated dataset(s).
  • training the initial autoencoder model may include doing so to determine / learn values of the layer weights, for example to generate a trained autoencoder model (that is, initializing the autoencoder model to produce output data similar to its input data).
  • method 800 may include identifying at least a portion of the trained autoencoder for use as the base set of layers of the output layer shared model.
  • method 800 may include adding a particular candidate branch model to the base set of layers at a particular branch location by automatically conducting, by the processor, the heuristic based search and/or the predictive rapid search using the set of tasks and associated dataset(s).
  • step 808 may include using the search to identify the particular candidate branch model from a plurality of candidate branch models, and/or to identify the particular branching point (for example, while keeping the base layer weights set or, alternatively, allowing the base layer weights to change).
  • the above methodology has several advantages over traditional methods that use separate architectures for each classification task. It may allow for better sharing of information across tasks, leading to improved accuracy and generalization. It may also reduce the overall complexity and power consumption of the system that implements the methodology.
  • the UTKFace dataset was used to train the autoencoder model, and the learned weights were used to initialize three branches of the network.
  • the branches consisted of fully connected layers that perform specific tasks, such as Gender classification, Age prediction, and Race classification.
  • the autoencoder's encoder layers are shared among the branches to extract the features learned during training.
  • the weights of the autoencoder encoder layers were frozen, and only the weights of the fully connected layers were updated.
  • an evaluation of the model's accuracy was conducted.
  • a competitive accuracy and r2 score was achieved for all tasks using a single architecture with comparatively lower power consumption and CO2 emission.
  • Tables 3 and 4 demonstrate that the total power consumption (by both CPU and GPU) required to train, and run inference with, the autoencoder model to predict age, gender, or race is significantly lower than the power consumption when training and running inference with the traditional approach (VGG16).
  • This section provides an example of a use case of a neural network with layer shared architecture to perform multiple tasks on a common set of medical data, specifically ECG signal data.
  • a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of medical data, such as electrocardiogram (ECG) data.
  • Artificial NNs may be used for ECG signal processing.
  • Various types of classification tasks can be performed to extract meaningful information or features from ECG signals.
  • a NN with layer shared architecture as described herein may perform multiple tasks on a common set of ECG data, such multiple tasks including, for example, identification of heart attack probability, abnormal heartbeat classification, extent of heart damage, location of heart damage, various rhythm disturbances (e.g., atrial fibrillation, atrial flutter, premature atrial contraction, premature ventricular contraction, sinus arrhythmia, sinus bradycardia, sinus tachycardia, sinus pause, sinoatrial arrest, sinus node dysfunction, bradycardia-tachycardia syndrome, supraventricular tachycardia, polymorphic ventricular tachycardia, wide complex tachycardia, pre-excitation syndrome, J wave); heart block and conduction problems (e.g., aberration, sinoatrial block, AV node, right bundle, left bundle, QT syndrome, right and left atrial abnormality); electrolyte disturbances and intoxication (e.g., digitalis intoxication, calcium: hypocalcemia and hypercalcemia, potassium: hypokalemia
  • cardiac arrhythmia is a medical condition characterized by irregular heart rates, manifesting as either excessively slow or rapid beats. This irregularity results from the disruption of the proper electrical impulses responsible for coordinating heart contractions. Certain perilous arrhythmic patterns may lead to sudden cardiac death. Recognizing the necessity for prompt identification of cardiac arrhythmias, there is a clear demand for automated methods utilizing computer-assisted decision-making processes. Moreover, alongside the analysis of ECG data for regression purposes, the classification of electrocardiogram (ECG) signals assumes a significant role in the accurate diagnosis of cardiovascular disorders.
  • While ECG data as an image could be used as an input to our system directly, it is also possible to use features such as the P wave, QRS complex, and T wave.
  • FIG.9 demonstrates an ECG signal of a human heart in normal sinus rhythm with various features of the signal marked, as adapted from Algarni, Abeer D., Naglaa F. Soliman, Hanaa A. Abdallah, and Fathi E.
  • Processed ECG data (e.g., extracted features such as the P wave, PR interval, QRS complex, J-point, ST segment, T wave, corrected QT interval, and U wave) may be used as input.
  • an ECG signal in tabular form may be input into a pre-trained neural encoder network. The network leverages learned representations to predict the presence of arrhythmia and assess the underlying heart condition, offering a comprehensive diagnostic outcome.
  • the P wave corresponds to depolarization of the heart atria and typically occurs in the first 80 ms of the heart wave.
  • a typical P wave shape is upright and its inversion may indicate an ectopic atrial pacemaker. Unusually long duration of the P wave may represent atrial enlargement.
  • the PR interval is defined from the beginning of the P wave to the beginning of the QRS complex and typically lasts between 120 and 200 ms.
  • a PR interval shorter than 120 ms may indicate that the electrical impulse is bypassing the atrioventricular (AV) node, as occurs in Wolff-Parkinson-White syndrome.
  • a PR interval longer than 200 ms may be related to atrioventricular block.
  • the PR segment typically has a flat shape, with any deviations generally associated with pericarditis.
  • the QRS complex typically lasts from 80 to 100 ms and has a much larger amplitude as compared to the P wave because it corresponds to rapid depolarization of the left and right ventricles.
  • a QRS complex with a duration longer than 120 ms may be associated with disruption of the heart's conduction system, such as right bundle branch block, left bundle branch block, or ventricular rhythms, or with metabolic issues, such as severe hyperkalemia or tricyclic antidepressant overdose.
  • a QRS complex with unusually large amplitude may indicate left ventricular hypertrophy, whereas a low-amplitude QRS complex may be associated with a pericardial effusion or infiltrative myocardial disease.
  • the J-point marks the end of the QRS complex and the beginning of the ST segment.
  • a separate J wave appearance may be pathognomonic of hypothermia or hypercalcemia.
  • the ST segment lies between the QRS complex and the T wave.
  • the ST segment typically has no amplitude whereas any (e.g., non-zero) amplitude may be associated with myocardial infarction or ischemia.
  • Negative amplitudes of the ST with respect to ECG baseline may also be caused by digoxin or left ventricular hypertrophy.
  • Positive amplitudes of the ST with respect to ECG baseline may also be caused by pericarditis or Brugada syndrome.
  • the T wave corresponds to the repolarization of the ventricles and lasts around 160 ms.
  • An inverted T wave may be associated with myocardial ischemia, left ventricular hypertrophy, high intracranial pressure, or metabolic abnormalities.
  • the QT interval is defined from the beginning of the QRS complex to the end of the T wave.
  • the corrected QT (QTc) is obtained by dividing the QT by the square root of the RR interval (i.e., the time between two successive R waves) and typically lasts less than 440 ms. A prolonged QTc interval may be associated with a risk of ventricular tachyarrhythmia and sudden death.
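  • For clarity, the correction described above is Bazett's formula; the worked value below assumes an illustrative QT of 400 ms and an RR interval of 0.8 s (with RR expressed in seconds, by convention):

    \mathrm{QT_c} = \frac{\mathrm{QT}}{\sqrt{\mathrm{RR}}}, \qquad
    \mathrm{QT_c} = \frac{400\ \mathrm{ms}}{\sqrt{0.8\ \mathrm{s}}} \approx 447\ \mathrm{ms}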
  • the U wave is related to the repolarization of the interventricular septum. The U wave often has a low or even zero amplitude. A prominent U wave may be associated with hypokalemia, hypercalcemia, or hyperthyroidism.
  • FIG.10 demonstrates a flow diagram for a method that includes extracting features from a digital ECG signal and using these extracted features as input for a NN with layer shared architecture.
  • the NN comprises a learned encoder model with multiple branches to complete multiple tasks (e.g., heart attack prediction and arrhythmia detection). Examples of NNs with such layer shared architectures comprising multiple branches are discussed herein in more detail.
  • a digital ECG signal is preprocessed and features are extracted.
  • An autoencoder model is trained to reconstruct the extracted ECG features. At least a portion of the trained autoencoder may be used along with additional layers and activation functions to produce output that satisfies multiple tasks (e.g., makes multiple predictions) from the common set of input ECG signal data.
  • the multiple tasks include performing a binary classification of a predicted heart attack, and arrhythmia detection.
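  • A minimal sketch of this ECG use case in TensorFlow/Keras, assuming eight tabular input features, illustrative layer sizes, and an assumed number of arrhythmia classes:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Tabular ECG features, e.g., P wave, PR interval, QRS, ST, QTc, ...
    ecg = tf.keras.Input(shape=(8,))

    # Learned encoder (e.g., taken from a trained autoencoder).
    h = layers.Dense(16, activation="relu")(ecg)
    z = layers.Dense(8, activation="relu")(h)

    # Branch 1: binary classification of a predicted heart attack.
    heart_attack = layers.Dense(1, activation="sigmoid", name="heart_attack")(z)
    # Branch 2: multi-class arrhythmia detection.
    arrhythmia = layers.Dense(5, activation="softmax", name="arrhythmia")(z)

    model = Model(ecg, [heart_attack, arrhythmia])
    model.compile(optimizer="adam",
                  loss={"heart_attack": "binary_crossentropy",
                        "arrhythmia": "sparse_categorical_crossentropy"})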
  • the software instructions include a machine learning module, also referred to herein as artificial intelligence software.
  • a machine learning module refers to a computer implemented process (e.g., a software function) that implements one or more specific machine learning algorithms, such as an artificial neural network (ANN), random forest, decision trees, support vector machines, and the like, in order to determine, for a given input, one or more output values.
  • the input comprises alphanumeric data which can include numbers, words, phrases, or lengthier strings, for example.
  • the one or more output values comprise values representing numeric values, words, phrases, or other alphanumeric strings.
  • the one or more output values comprise an identification of one or more response strings (e.g., selected from a database).
  • machine learning modules implementing machine learning techniques are trained, for example using datasets that include categories of data described herein. Such training may be used to determine various parameters of machine learning algorithms implemented by a machine learning module, such as weights associated with layers in neural networks.
  • once a machine learning module is trained, e.g., to accomplish a specific task such as identifying certain response strings, values of determined parameters are fixed and the (e.g., unchanging, static) machine learning module is used to process new data (e.g., different from the training data; e.g., infer a result) and accomplish its trained task without further updates to its parameters (e.g., the machine learning module does not receive feedback and/or updates).
  • machine learning modules may receive feedback, e.g., based on automated review of accuracy or human user review of accuracy, and such feedback may be used as additional training data, to dynamically update the machine learning module.
  • two or more machine learning modules may be combined and implemented as a single module and/or a single software application.
  • two or more machine learning modules may also be implemented separately, e.g., as separate software applications.
  • a machine learning module may be software and/or hardware.
  • a machine learning module may be implemented entirely as software, or certain functions of an ANN module may be carried out via specialized hardware [e.g., via an application specific integrated circuit (ASIC) or field programmable gate arrays (FPGAs)].
  • machine learning modules implementing machine learning techniques may be composed of individual nodes (e.g., units, neurons).
  • a node may receive a set of inputs that may include at least a portion of a given input data for the machine learning module and/or at least one output of another node.
  • a node may have at least one parameter to apply and/or a set of instructions to perform (e.g., mathematical functions to execute) over the set of inputs.
  • node instructions may include a step to provide various relative importance to the set of inputs using various parameters, such as weights.
  • the weights may be applied by performing scalar multiplication (e.g., or other mathematical function) between a set of inputs values and the parameters, resulting in a set of weighted inputs.
  • a node may have a transfer function to combine the set of weighted inputs into one output value.
  • a transfer function may be implemented by a summation of all the weighted inputs and the addition of an offset (e.g., bias) value.
  • a node may have an activation function to introduce non-linearity into the output value.
  • Non-limiting examples of the activation function include Rectified Linear Activation (ReLu), logistic (e.g., sigmoid), hyperbolic tangent (tanh), and softmax.
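  • A sketch of a single node in NumPy, illustrating the weighted inputs, transfer function (summation plus offset), and ReLU activation described above; values are illustrative:

    import numpy as np

    def node(inputs, weights, bias):
        weighted = inputs * weights          # apply relative importance (weights)
        z = np.sum(weighted) + bias          # transfer function: sum plus offset
        return max(0.0, z)                   # ReLU activation (non-linearity)

    node(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, 0.3]), bias=0.05)  # -> 0.3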
  • a node may have a capability of remembering previous states (e.g., recurrent nodes). Previous states may be applied to the input and output values using a set of learning parameters.
  • a layer is a building block in a deep learning architecture composed of nodes.
  • a layer is a set of nodes that receives data input (e.g., weighted or non-weighted input), transforms it (e.g., by carrying out instructions, e.g., applying a set of functions e.g., linear and/or non-linear functions), and passes transformed values as output (e.g., to the next layer).
  • the set of nodes in a particular layer may share the same parameters and instructions without interacting with each other.
  • a machine learning module may be composed of at least one layer (e.g., ordered).
  • Examples of types of layers include: convolutional layers (e.g., layers with a kernel, a matrix of parameters that is slid across an input to be multiplied with multiple input values to reduce them to a single output value); fully connected (FC) layers; recurrent layers, e.g., long/short term memory (LSTM) layers and gated recurrent unit (GRU) layers (e.g., nodes with various abilities to memorize and apply their previous inputs and/or outputs); batch normalization (BN) layers (e.g., layers that normalize a set of outputs from another layer, allowing for more independent learning of individual layers); activation layers (e.g., layers with nodes that only contain an activation function); and (un)pooling layers [e.g., layers that reduce (increase) the dimensions of an input by summarizing (splitting) input values in defined patches].
  • the performance of a machine learning module may be characterized by its ability to produce an output data that reproduces an input data with specific accuracy.
  • a training process is performed to find optimal parameters, such as weights, for every node in every layer of the machine learning module.
  • the training process of a machine learning module may involve using output data to calculate an objective function (e.g., cost function, loss function, error function) that needs to be optimized (e.g., minimized, maximized).
  • a machine learning objective function may be a combination of a loss function and a regularization term. The loss function is related to how well the output is able to predict the input.
  • the loss function may take various forms, like mean squared error, mean absolute error, binary cross-entropy, categorical cross-entropy, for example.
  • the regularization term may be needed to prevent overfitting and improve generalization of the training process. Typical regularization techniques include L1 Regularization or Lasso Regression, L2 Regularization or Ridge Regression, and Dropout (e.g., dropping layer outputs at random during training process).
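  • One common concrete form of this combination, assuming a mean squared error loss over N samples and an L2 (ridge) penalty with regularization strength lambda:

    J(\theta) \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{y}_i(\theta)\big)^2}_{\text{loss}} \;+\; \underbrace{\lambda\,\lVert\theta\rVert_2^2}_{\text{regularization}}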
  • objective function optimization of a machine learning module may involve finding at least one (e.g., all) of the present global optima (e.g., as opposed to local optima).
  • a typical algorithm for objective function optimization follows principles of mathematical optimization for a multi-variable function and relies on achieving specific accuracy of the process.
  • available input data includes training data and validation data, e.g., where the validation data is separate and non-overlapping with the training data. Training data is used during the training process to optimize a model, whereas validation data is used to check the accuracy of the model while operating on previously unseen data.
  • training data is divided into batches (e.g., portions) that are sequentially used (e.g., in random order) as sets of inputs to train a model.
  • a model is trained multiple times (e.g., epochs) on the entire set of training data.
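  • A sketch of this batch/epoch training setup using the standard Keras fit API, assuming a compiled model and arrays x and y (illustrative names):

    history = model.fit(
        x, y,
        batch_size=32,          # training data divided into batches (portions)
        epochs=20,              # multiple passes over the entire training set
        validation_split=0.2,   # separate, non-overlapping validation data
        shuffle=True,           # batches drawn in random order each epoch
    )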
  • computing resources may include any hardware and/or software used to process data.
  • computing resources may include hardware and/or software capable of executing algorithms, computer programs, and/or computer applications.
  • exemplary computing resources may include application servers and/or databases with storage and retrieval capabilities.
  • Each resource provider 1102 may be connected to any other resource provider 1102 in the cloud computing environment 1100.
  • the resource providers 1102 may be connected over a computer network 1108.
  • Each resource provider 1102 may be connected to one or more computing device 1104a, 1104b, 1104c (collectively, 1104), over the computer network 1108.
  • the cloud computing environment 1100 may include a resource manager 1106.
  • the resource manager 1106 may be connected to the resource providers 1102 and the computing devices 1104 over the computer network 1108. In some implementations, the resource manager 1106 may facilitate the provision of computing resources by one or more resource providers 1102 to one or more computing devices 1104. The resource manager 1106 may receive a request for a computing resource from a particular computing device 1104. The resource manager 1106 may identify one or more resource providers 1102 capable of providing the computing resource requested by the computing device 1104. The resource manager 1106 may select a resource provider 1102 to provide the computing resource. The resource manager 1106 may facilitate a connection between the resource provider 1102 and a particular computing device 1104. In some implementations, the resource manager 1106 may establish a connection between a particular resource provider 1102 and a particular computing device 1104.
  • FIG.12 shows an example of a computing device 1200 and a mobile computing device 1250 that can be used to implement the techniques described in this disclosure.
  • the computing device 1200 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • the mobile computing device 1250 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices.
  • the components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to be limiting.
  • the computing device 1200 includes a processor 1202, a memory 1204, a storage device 1206, a high-speed interface 1208 connecting to the memory 1204 and multiple high-speed expansion ports 1210, and a low-speed interface 1212 connecting to a low-speed expansion port 1214 and the storage device 1206.
  • Each of the processor 1202, the memory 1204, the storage device 1206, the high-speed interface 1208, the high-speed expansion ports 1210, and the low-speed interface 1212 is interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 1202 can process instructions for execution within the computing device 1200, including instructions stored in the memory 1204 or on the storage device 1206 to display graphical information for a GUI on an external input/output device, such as a display 1216 coupled to the high-speed interface 1208.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • the memory 1204 stores information within the computing device 1200.
  • the memory 1204 is a volatile memory unit or units.
  • the memory 1204 is a non-volatile memory unit or units.
  • the memory 1204 may also be another form of computer-readable medium, such as a magnetic or optical disk.
  • the storage device 1206 is capable of providing mass storage for the computing device 1200.
  • the storage device 1206 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 1202), perform one or more methods, such as those described above.
  • the instructions can also be stored by one or more storage devices such as computer- or machine-readable mediums (for example, the memory 1204, the storage device 1206, or memory on the processor 1202).
  • the high-speed interface 1208 manages bandwidth-intensive operations for the computing device 1200, while the low-speed interface 1212 manages lower bandwidth-intensive operations. Such allocation of functions is an example only.
  • the high-speed interface 1208 is coupled to the memory 1204, the display 1216 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 1210, which may accept various expansion cards (not shown).
  • the low-speed interface 1212 is coupled to the storage device 1206 and the low-speed expansion port 1214.
  • the low-speed expansion port 1214, which may include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • the computing device 1200 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1220, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 1222. It may also be implemented as part of a rack server system 1224.
  • components from the computing device 1200 may be combined with other components in a mobile device (not shown), such as a mobile computing device 1250.
  • Each of such devices may contain one or more of the computing device 1200 and the mobile computing device 1250, and an entire system may be made up of multiple computing devices communicating with each other.
  • the mobile computing device 1250 includes a processor 1252, a memory 1264, an input/output device such as a display 1254, a communication interface 1266, and a transceiver 1268, among other components.
  • the mobile computing device 1250 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage.
  • the processor 1252 can execute instructions within the mobile computing device 1250, including instructions stored in the memory 1264.
  • the processor 1252 may be implemented as a chipset of chips that include separate and multiple analog and digital processors.
  • the processor 1252 may provide, for example, for coordination of the other components of the mobile computing device 1250, such as control of user interfaces, applications run by the mobile computing device 1250, and wireless communication by the mobile computing device 1250.
  • the processor 1252 may communicate with a user through a control interface 1258 and a display interface 1256 coupled to the display 1254.
  • the display 1254 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology.
  • the display interface 1256 may comprise appropriate circuitry for driving the display 1254 to present graphical and other information to a user.
  • the control interface 1258 may receive commands from a user and convert them for submission to the processor 1252.
  • an external interface 1262 may provide communication with the processor 1252, so as to enable near area communication of the mobile computing device 1250 with other devices.
  • the external interface 1262 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.
  • the memory 1264 stores information within the mobile computing device 1250.
  • the memory 1264 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units.
  • An expansion memory 1274 may also be provided and connected to the mobile computing device 1250 through an expansion interface 1272, which may include, for example, a SIMM (Single In Line Memory Module) card interface.
  • the expansion memory 1274 may provide extra storage space for the mobile computing device 1250, or may also store applications or other information for the mobile computing device 1250.
  • the expansion memory 1274 may include instructions to carry out or supplement the processes described above, and may include secure information also.
  • the expansion memory 1274 may be provided as a security module for the mobile computing device 1250, and may be programmed with instructions that permit secure use of the mobile computing device 1250.
  • secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
  • the memory may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory), as discussed below.
  • instructions are stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 1252), perform one or more methods, such as those described above.
  • the instructions can also be stored by one or more storage devices, such as one or more computer- or machine-readable mediums (for example, the memory 1264, the expansion memory 1274, or memory on the processor 1252).
  • the instructions can be received in a propagated signal, for example, over the transceiver 1268 or the external interface 1262.
  • the mobile computing device 1250 may communicate wirelessly through the communication interface 1266, which may include digital signal processing circuitry where necessary.
  • the communication interface 1266 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others.
  • a GPS (Global Positioning System) receiver module 1270 may provide additional navigation- and location-related wireless data to the mobile computing device 1250, which may be used as appropriate by applications running on the mobile computing device 1250.
  • the mobile computing device 1250 may also communicate audibly using an audio codec 1260, which may receive spoken information from a user and convert it to usable digital information.
  • the audio codec 1260 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 1250.
  • Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 1250.
  • the mobile computing device 1250 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 1280. It may also be implemented as part of a smart-phone 1282, personal digital assistant, or other similar mobile device. [0110] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • These computer programs include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language.
  • the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • the term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • certain modules described herein can be separated, combined or incorporated into single or combined modules. Any modules depicted in the figures are not intended to limit the systems described herein to the software architectures shown therein.
  • EQUIVALENTS [0116] Elements of different implementations described herein may be combined to form other implementations not specifically set forth above. Elements may be left out of the processes, computer programs, databases, etc. described herein without adversely affecting their operation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Presented herein are systems and methods for the use and/or automated design of neural networks (NNs) with layer shared architecture (e.g., an intelligent layer shared (ILASH) neural architecture). In certain embodiments, the NN with layer shared architecture comprises a base set of layers (e.g., base layer shared model) and one or more branches extending from the base set of layers. Each branch may include one or more layers from the base set of layers and one or more additional layers different from the base set of layers, each branch designed and trained to perform a particular unique task on a common set of input data. As a result, the NN will share some layers among multiple tasks. Moreover, presented herein are techniques for using a predictive neural network search algorithm to create the branched network of the layer shared architecture.

Description

METHODS AND SYSTEMS FOR THE DESIGN AND USE OF INTELLIGENT LAYER SHARED NEURAL NETWORKS AND MODEL ARCHITECTURE SEARCH ALGORITHMS CROSS-REFERENCE TO RELATED APPLICATIONS [0001] This application claims priority to U.S. Provisional Patent Application No. 63/535,627 filed August 31, 2023, the disclosure of which is incorporated by reference herein in its entirety. FIELD [0002] This invention relates generally to systems and methods for the use and design of neural networks. BACKGROUND [0003] Artificial Intelligence (AI) and Neural Networks (NN) are widely used in various domains such as automated industry/manufacturing, intelligent transport systems, smart cities, healthcare, retail, banking, surveillance, and stock exchanges. These NN/AI models are designed to process various types of data, for example, images, videos, sound, sensor values, and natural language documents. NN/AI models are becoming increasingly complex, requiring heavy computational energy for: (i) searching for optimal neural architecture models, (ii) training the models, and (iii) inferencing the models post deployment (e.g., using the trained models to produce output, e.g., a response to a query or command, given a set of input). [0004] There is a need for more efficient neural networks, as well as systems and methods for the automated design of such neural networks. SUMMARY [0005] Presented herein are systems and methods for the use and/or automated design of neural networks (NNs) with layer shared architecture (e.g., an intelligent layer shared (ILASH) neural architecture). In certain embodiments, the NN with layer shared architecture comprises a base set of layers (e.g., base layer shared model) and one or more branches extending from the base set of layers. Each branch may include one or more layers from the base set of layers and one or more additional layers different from the base set of layers, each branch designed and trained to perform a particular unique task on a common set of input data. As a result, the NN will share some layers among multiple tasks. Moreover, presented herein are techniques for using a predictive neural network search algorithm to create the branched network of the layer shared architecture. [0006] The present disclosure recognizes that neural networks (NNs) that perform tasks on a shared input data may contain one or many layers that are the same or similar. As such, it is advantageous in this scenario to have a single NN with layer shared architecture instead of designing and/or using multiple NNs. Implementing a NN with layer shared architecture may significantly reduce runtime (training and/or inferencing), and reduce power consumption. The methods, systems, and NN architectures presented herein differ significantly from previous layer fusion strategies for improving NN efficiency. Furthermore, no previous strategy employs the techniques presented herein for using a predictive neural network search algorithm to create the layer shared architectures. [0007] Also presented herein are methods and systems for automatically designing a neural network with a layer shared architecture [e.g., an intelligent layer shared (ILASH) neural network] to perform a set of tasks using a common dataset. The NN architecture is built using a set of tasks, a common input dataset, and one or more specifications for control of the architecture (e.g., one or more search termination conditions, task priorities, layer mutation flags, sub-branch forking flags, and/or predictive forking flags). 
The NN architecture may be built using a heuristic based search and/or predictive rapid search. The resulting NN comprises a base set of layers and one or more branches, each of which is trained to perform a unique task from the set of tasks. [0008] In certain embodiments, the NN architecture is built using a heuristic based search. In certain embodiments, the heuristic based search corresponds to an iterative search of a large number of combinations in the NN architecture space. This search strategy results in an exhaustive search and ensures finding optimal architecture. The search starts with producing a base layer shared model for performing a first task from the set of tasks. Producing a base layer shared model may include selecting the initial model from a set of existing (e.g., traditional) NN architectures and/or creating one or many NN layers from scratch. For any additional tasks, a selected branch model is created. The creation process includes performing an iterative search to identify a candidate branch model, then identifying a branching point and/or multi-connection point from the base layer for the selected task, and mutating and/or varying any number of layers of the candidate branch model. At any given step, an efficacy value (e.g., an accuracy value produced by NN using a training data set) for the complete model is computed. The candidate branch model with the highest efficacy value is retained as the output layer shared model. [0009] In certain embodiments, the NN architecture is built using a predictive rapid search. The predictive rapid search relies on predictive steps rather than an iterative search. For example, the approach may rely on predicting a limited set of certain elements, narrowing down the search and reducing search time, as compared to the exhaustive search. The algorithm starts by predicting what task from the set of tasks to be considered as a first task. Next, for performing the selected first task, a base layer shared model is produced. For any additional tasks, a selected branch is created. The algorithm predicts a set of optimal candidate branching points and/or multi-connection points from the base layer. Then, for this limited set of candidates, mutating and/or varying any number of layers of the candidate branch model is performed. For every candidate, an efficacy value (e.g., an accuracy value produced by NN using a training data set) for the complete model is computed. The candidate branch model with the highest efficacy value is retained as the output layer shared model. [0010] In certain embodiments, the base set of layers for the NN with a layer shared architecture may be created using an autoencoder. An autoencoder model may be selected from a set of existing (e.g., traditional) autoencoder architectures and/or one or many autoencoder layers may be created from scratch. The autoencoder model is trained using the set of input data (e.g., to recreate the input data from its output). By its design, the autoencoder model typically results in compressing the input into a lower-dimensional representation and recreating it back to its original form. Relying on this autoencoder property, at least a portion of the trained autoencoder is used as the base set of layers. For each unique task from the set of tasks, the NN is trained by adding a particular candidate branch model to the base set of layers at a particular branch location by automatically conducting the heuristic based search and/or the predictive rapid search. 
In certain embodiments, the weights of the base set of layers are set throughout the training of the NN model. In certain embodiments, the weights of the base set of layers are allowed to change to meet a required model accuracy. [0011] The neural networks with layer shared architectures described herein may be used in a wide variety of different multitasking scenarios in a variety of technology fields. [0012] For example, a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common input image (or common set of input images). The multiple tasks may include, for example, jointly performing age prediction, person recognition, and/or mood detection for a common set of face image data (e.g., a single or multiple images). Similarly, the NN framework described herein may be used on a common set of scene data for jointly performing edge detection, segmentation, and/or depth detection. There are many other such possibilities. [0013] In another example, a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of video data. Video data provides a rich source of visual information that can be analyzed and processed for a wide range of tasks across various domains. A NN with layer shared architecture as described herein may perform multiple tasks on a common set of video data, for example, object tracking, action recognition, video captioning, video summarizing, and/or emotion recognition. [0014] In another example, a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of audio data. Artificial NNs have been used for audio signal processing. Various types of classification tasks can be performed to extract meaningful information or features from audio signals. A NN with layer shared architecture as described herein may perform multiple tasks on a common set of audio data, for example, speaker identification, instrument identification, language classification, and emotion recognition. [0015] In another example, a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of hyperspectral and/or multispectral analysis data. Hyperspectral and multispectral data are used in earth observation, agriculture, environmental monitoring, and other fields to extract useful information about the Earth’s surface and features. A NN with layer shared architecture as described herein may perform multiple tasks on a common set of hyperspectral and/or multispectral data, for example, classification of land cover types (e.g., forest, crops, water bodies, urban areas, and the like), mineral and/or material identification (e.g., using detected spectral patterns of the Earth’s surface), and water quality assessment (e.g., measurement of parameters such as turbidity, chlorophyll concentration, and/or sediment levels). [0016] In another example, a neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of medical imaging data, such as a set of scans (e.g., volumetric) obtained by radiography, magnetic resonance imaging, nuclear imaging, ultrasound, elastography, photoacoustic imaging, tomography, echocardiography, near-infrared spectroscopy, and magnetic particle imaging. Artificial NNs have been used for medical data processing. 
Various types of classification tasks can be performed to extract meaningful information or features from medical scans. A NN with layer shared architecture as described herein may perform multiple tasks on a common set of medical imaging data, for example, organ identification, organ physiology monitoring, pathology identification (e.g., anatomical, clinical, molecular, tumor, infection, inflammation, fibrotic conditions), pathology monitoring, disease identification (e.g., heart diseases, brain disorders, cancerous tumors), disease monitoring, and treatment monitoring. [0017] In one aspect, the invention is directed to a method for producing digital output in response to a query or command via a neural network having a layer shared architecture (e.g., an intelligent layer shared (ILASH) neural architecture), the method comprising: receiving, by a processor of a computing device, a set of input data (e.g., image data, video data, audio data, sensor data, alphanumeric text data such as natural language electronic documents, hyperspectral data and/or multispectral data); receiving, by the processor, the query or command, wherein the query or command comprises a plurality of different tasks, each said task to be performed with the (same) set of input data; and producing, by the processor, output for each of the plurality of tasks of the query or command using the neural network with the layer shared architecture, wherein the neural network comprises a unique set of layers for each task, each set having one or more base layers in common with the other set(s). [0018] In certain embodiments, the neural network comprises a base set of layers (e.g., base layer shared model) and one or more branches extending from the base set of layers, each said branch consisting of (i) one or more layers from the base set of layers, and (ii) one or more additional layers different from the base set of layers. In certain embodiments, the base set of layers and the one or more branches are each trained to perform a unique task from the plurality of different tasks, each said task to be performed with the (same) set of input data. [0019] In another aspect, the invention is directed to a method of automatically designing a neural network having a layer shared architecture (e.g., an intelligent layer shared (ILASH) neural network), the method comprising: (a) receiving, by a processor of a computing device, a set of tasks and associated dataset(s); (b) receiving, by the processor, one or more specifications for control of the architecture (e.g., one or more search termination conditions, task priorities, layer mutation flags, sub-branch forking flags, and/or predictive forking flags); and (c) automatically conducting, by the processor, a heuristic based search and/or a predictive rapid search using the set of tasks and associated dataset(s) and using the one or more specifications to produce an output layer shared model, wherein the output layer shared model comprises a base set of layers and one or more branches, each trained to perform a unique task from the set of tasks. 
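To make the layer shared architecture of paragraphs [0017]-[0019] concrete, the following is a minimal sketch, assuming TensorFlow/Keras and 64x64 RGB face images, of a model with a common base set of layers feeding two task branches (e.g., gender classification and age prediction). The layer sizes and task heads are illustrative assumptions, not the output of the search methods described below.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(64, 64, 3))

# Base set of layers shared by all tasks.
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
base = layers.Conv2D(64, 3, activation="relu")(x)

# Branch 1: task-specific layers for, e.g., gender classification.
b1 = layers.GlobalAveragePooling2D()(base)
gender = layers.Dense(2, activation="softmax", name="gender")(b1)

# Branch 2: forks from the same base with its own additional layers (e.g., age).
b2 = layers.Conv2D(64, 3, activation="relu")(base)
b2 = layers.GlobalAveragePooling2D()(b2)
age = layers.Dense(1, activation="linear", name="age")(b2)

# One forward pass through the shared base serves both task branches.
model = Model(inputs=inputs, outputs=[gender, age])
model.compile(optimizer="adam",
              loss={"gender": "sparse_categorical_crossentropy", "age": "mse"})
model.summary()
```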
[0020] In certain embodiments, the method comprises automatically conducting, by the processor, the heuristics based search by: creating a base layer shared model for performing a first task from the set of tasks; determining whether there is at least one additional task in the set of tasks to model; upon determining there is at least one additional task in the set of tasks to model, for said additional selected task, creating a selected branch model by: for each of a plurality of candidate branch models: (i) identifying a candidate branching point and/or multi-connection point from the base layer shared model for the selected task, and (ii) mutating and/or varying one or more layers of the candidate branch model; for each of the plurality of candidate branch models, computing an efficacy value for the (complete) model containing the candidate branch model; and comparing efficacy values computed for the plurality of candidate branch models, and retaining the candidate branch model of the plurality that results in the highest (best) computed efficacy value; and constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks. [0021] In certain embodiments, the method comprises automatically conducting, by the processor, the predictive rapid search by: creating a base layer shared model for performing a first task from the set of tasks; determining whether there is at least one additional task in the set of tasks to model; upon determining there is at least one additional task in the set of tasks to model, for said additional selected task, creating a selected branch model by: predicting a set of optimal candidate branching points and/or multi-connection points from the base layer shared model; for each of a plurality of candidate branching points and/or multi-connection points, mutating and/or varying one or more layers to create a candidate branch model; for each of the plurality of candidate branch models, computing an efficacy value for the (complete) model containing the candidate branch model; and comparing efficacy values computed for the plurality of candidate branch models, and retaining the candidate branch model of the plurality that results in the highest (best) computed efficacy value; and constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks. [0022] In certain embodiments, the method comprises automatically selecting, by the processor, a first task from the set of tasks for use in creating the base set of layers for performing the first task (e.g., performing this automatic selection of the first task from the set of tasks using a neural network dedicated for this determination). 
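The control flow of the heuristic search in paragraph [0020] (and, with a predicted shortlist of candidates, the predictive rapid search in paragraph [0021]) can be sketched in plain Python as follows. The helper functions are hypothetical stand-ins for the model construction, layer mutation, and training/evaluation machinery; only the search skeleton itself is shown.

```python
import random

def build_base(task):                  # stand-in: base layer shared model for task 1
    return {"base_task": task, "branches": {}}

def candidate_points(model):           # stand-in: candidate branching/connection points
    return ["after_layer_1", "after_layer_2", "after_layer_3"]

def mutate(point):                     # stand-in: mutated/varied branch layers at a point
    return {"fork_at": point, "dropout": random.choice([0.25, 0.5])}

def efficacy(model, branch, task):     # stand-in: efficacy of the complete model
    return random.random()             # would be a validation accuracy in practice

def layer_shared_search(tasks):
    first, *rest = tasks
    model = build_base(first)
    for task in rest:                              # one branch per additional task
        best_branch, best_score = None, float("-inf")
        for point in candidate_points(model):      # heuristic: iterate all candidates
            branch = mutate(point)                 # (predictive search would instead
            score = efficacy(model, branch, task)  # shortlist predicted points first)
            if score > best_score:
                best_branch, best_score = branch, score
        model["branches"][task] = best_branch      # retain the best; make it permanent
    return model

print(layer_shared_search(["gender", "age", "race"]))
```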
[0023] In certain embodiments, the method comprises automatically constructing, by the processor, the base set of layers by: creating an initial [e.g., untrained (e.g., having layer weights with initialized values)] autoencoder model; training the initial autoencoder model using the associated dataset(s) (e.g., to determine / learn values of the layer weights) to generate a trained autoencoder model (e.g., initialize the autoencoder model to produce output data similar to its input data); and identifying at least a portion of the trained autoencoder for use as the base set of layers of the output layer shared model. In certain embodiments, the method comprises, for each of the unique tasks from the set of tasks, adding a particular candidate branch model to the base set of layers at a particular branch location by automatically conducting, by the processor, the heuristic based search and/or the predictive rapid search using the set of tasks and associated dataset(s) (e.g., using the search to identify the particular candidate branch model from a plurality of candidate branch models and/or identify the particular branching point) [e.g., while keeping the base layer weights set or, alternatively, allowing the base layer weights to change]. [0024] In certain embodiments, the set of input data comprises image data and the plurality of tasks comprises at least one member selected from the group consisting of age prediction, person recognition, and mood detection. [0025] In certain embodiments, the set of input data comprises image data (e.g., scene data) and the plurality of tasks comprises at least one member selected from the group consisting of edge detection, segmentation, and depth detection. [0026] In certain embodiments, the set of input data comprises video data and the plurality of tasks comprises at least one member selected from the group consisting of object tracking, action recognition, video captioning, video summarizing, and emotion recognition. [0027] In certain embodiments, the set of input data comprises audio data and the plurality of tasks comprises at least one member selected from the group consisting of speaker identification, instrument identification, language classification, and emotion recognition. [0028] In certain embodiments, the set of input data comprises hyperspectral and/or multispectral data and the plurality of tasks comprises at least one member selected from the group consisting of land cover type classification (e.g., forest, crops, water bodies, urban areas), mineral and/or material identification (e.g., using detected spectral patterns of the Earth’s surface), agricultural monitoring (e.g., plant health, chlorophyll content, nutrient content, vitamin content), and water quality assessment (e.g., measuring parameters such as water turbidity, chlorophyll content, and/or sedimentation levels). [0029] In certain embodiments, the set of input data comprises electrocardiogram (ECG) data and the plurality of tasks comprises at least one member selected from the group consisting of heart attack classification (and/or risk determination), abnormal heartbeat identification, extent of heart damage identification, location of heart damage identification, heart rhythm disturbance detection, detection of heart blockage and/or conduction problems, detection of electrolyte disturbance and/or intoxication, detection of ischemia and/or infarction, and detection of heart structural change. 
[0030] In certain embodiments, the set of input data comprises medical imaging data, and the plurality of tasks comprises at least one member selected from the group consisting of organ identification, organ physiology monitoring, pathology identification (e.g., anatomical, clinical, molecular, tumor, infection, inflammation, fibrotic conditions), pathology monitoring, disease identification (e.g., heart diseases, brain disorders, cancerous tumors), disease monitoring, and treatment monitoring. [0031] In another aspect, the invention is directed to a system comprising a processor of a computing device and memory having instructions stored thereon, which, when executed by the processor, cause the processor to perform one or more of the methods described herein. [0032] Any two or more of the features described in this specification, including in this summary section, may be combined to form implementations of the disclosure, whether specifically expressly described as a separate combination in this specification or not. BRIEF DESCRIPTIONS OF THE DRAWINGS [0033] The foregoing and other objects, aspects, features, and advantages of the present disclosure will become more apparent and better understood by referring to the following descriptions taken in conjunction with the accompanying drawings, in which: [0034] FIG.1 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing two tasks, according to an illustrative embodiment. [0035] FIG.2 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing three tasks, according to an illustrative embodiment. [0036] FIG.3 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing four tasks, according to an illustrative embodiment. [0037] FIG.4 is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing four tasks, according to an illustrative embodiment. [0038] FIG.5A is a block diagram of an exemplary architecture of a neural network with layer shared architecture for performing four tasks, according to an illustrative embodiment. [0039] FIG.5B illustrates a method of producing digital output(s) in response to a query or command via a neural network having a layer shared architecture, according to aspects of the present disclosure. [0040] FIG.6A illustrates a method of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure. [0041] FIG.6B illustrates a method of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure. [0042] FIG.6C is a block diagram of an exemplary method of designing a neural network, according to an illustrative embodiment. [0043] FIG.6D is a block diagram of an exemplary method of designing a neural network based on a heuristics search, according to an illustrative embodiment. [0044] FIG.6E is a block diagram of an exemplary method of designing a neural network based on a predictive rapid search, according to an illustrative embodiment. [0045] FIG.7 is a block diagram of an exemplary architecture of a neural network for an image recognition task, according to an illustrative embodiment. [0046] FIG.8A is a block diagram of an exemplary architecture of an autoencoder neural network, according to an illustrative embodiment. 
[0047] FIG.8B is a block diagram of an exemplary architecture of a neural network with layer shared architecture based on an autoencoder architecture, according to an illustrative embodiment. [0048] FIG.8C illustrates a method of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure. [0049] FIG.9 is a diagram of exemplary electrocardiography features of a heart in normal sinus rhythm. [0050] FIG.10 is a block diagram of an exemplary method for performing multiple tasks related ECG signal data using a neural network with layer shared architecture based on an encoder, according to an illustrative embodiment. [0051] FIG.11 is a block diagram of an exemplary cloud computing environment, used in certain embodiments. [0052] FIG.12 is a block diagram of an example computing device and an example mobile computing device used in certain embodiments. [0053] The features and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. DETAILED DESCRIPTION [0054] It is contemplated that systems, architectures, devices, methods, and processes of the claimed invention encompass variations and adaptations developed using information from the embodiments described herein. Adaptation and/or modification of the systems, architectures, devices, methods, and processes described herein may be performed, as contemplated by this description. [0055] Throughout the description, where articles, devices, systems, and architectures are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are articles, devices, systems, and architectures of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps. [0056] It should be understood that the order of steps or order for performing certain action is immaterial so long as the invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously. [0057] The mention herein of any publication, for example, in the Background section, is not an admission that the publication serves as prior art with respect to any of the claims presented herein. The Background section is presented for purposes of clarity and is not meant as a description of prior art with respect to any claim. [0058] Documents are incorporated herein by reference as noted. Where there is any discrepancy in the meaning of a particular term, the meaning provided in the Definition section above is controlling. [0059] Headers are provided for the convenience of the reader – the presence and/or placement of a header is not intended to limit the scope of the subject matter described herein. A. 
Layer Shared Neural Architecture [0060] In certain embodiments, building a layer shared neural architecture comprises the following steps: (1) building a base model to perform one of the tasks, (2) extending a base model to perform other tasks by branching out from the existing base model, (3) making multiple connection points between two or more branches to increase feature cross-pollination, and (4) making strategic hyperparameter mutations to different branch layers to make the overall network more efficient and accurate. [0061] To build a Layer Shared Neural Network (LSNN) Architecture, a base model serving one of the tasks is first chosen. This base model may also be used as a template for the branches associated with other tasks. A repository of standard neural network models may be used to decide on the base model and/or one or more branches. In other words, if an LSNN is built for tasks T = {t1, t2, t3, ..., tn}, then a task ti is first selected. Next, from a database of standard model architectures M = {m1, m2, m3, ..., mk}, one model mj (e.g., the most optimal) for task ti is selected. The selected model mj may be further adjusted to fit the role (e.g., by adjusting its architecture, hyperparameters, or weights, or by training), and the base layer shared model (LS) is then created. Various methods and/or algorithms may be used to select which task is tackled first and to determine a template model suitable for the first task. For example, a first task may be randomly selected. For example, a heuristic-based algorithm and/or a predictive algorithm may be used. [0062] After creating a base model, the model is further expanded to perform other tasks. For example, for each task to ∈ T such that to ≠ ti, a layer in the base model is determined from where the model branches out by adding one or more layers associated with performing the given task. The layers in the branch may be based off (e.g., resemble in quantity, architecture, and type) the base model layers, as shown in FIG.1 for Task-1 and Task-2. As shown in FIG.2, every task may be associated with a different branching point with respect to a base layer. The model may be updated (e.g., by adjusting its hyperparameters or weights, or by training) after adding at least one branch. For example, as shown in FIG.3 for branches associated with Task-3 and Task-4, a branch may start (e.g., fork) from another branch and not necessarily from the base model. While branching out to support additional tasks, branches may be connected to a base layer and to each other at multiple points. For example, as shown in FIG.4, a branch associated with Task-3 is connected to a branch associated with Task-1 (e.g., a base layer) at multiple points. A layer in a branch may be designed based on another layer in the model (e.g., from another branch, from a base model). A designed layer may have mutations and/or variations (e.g., in hyperparameters) with respect to an original layer. For example, as shown in FIG.5A, two similar pooling layers in branches associated with Task-1 and Task-3 have different hyperparameters. The mutations and/or variations may be determined at least partly based on various methods and/or various tasks (e.g., as related to performance of the model in a given task). [0063] FIG.5B illustrates a method 500 of producing digital output(s) in response to a query or command via a neural network having a layer shared architecture, according to aspects of the present disclosure. 
At step 502, method 500 includes receiving, by a processor of a computing device, a set of input data. In some embodiments, the input data may include image data, video data, audio data, sensor data, alphanumeric text data such as natural language electronic documents, hyperspectral data and/or multispectral data. At step 504, method 500 includes receiving, by the processor, the query or command. In some embodiments, the query or command includes a plurality of different tasks, each task to be performed with the (same) set of input data. At step 506, method 500 includes producing, by the processor, output for each of the plurality of tasks of the query or command using the neural network with the layer shared architecture. In some embodiments, the neural network includes a unique set of layers for each task, each set having one or more base layers in common with the other set(s). B. Design of Neural Networks Having Layer Shared Architecture [0064] FIG.6A illustrates a method 600 of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure. In some embodiments, the layer shared architecture is an intelligent layer shared (ILASH) neural network. At step 602, method 600 includes receiving, by a processor of a computing device, a set of tasks and associated datasets. At step 604, method 600 includes receiving, by the processor, one or more specifications for control of the architecture. In some embodiments, one or more specifications may include one or more search termination conditions, task priorities, layer mutation flags, sub-branch forking flags, and/or predictive forking flags. At step 606, method 600 includes automatically conducting, by the processor, a heuristic based search and/or a predictive rapid search using the set of tasks and associated dataset(s), and using the specifications to produce an output layer shared model. In some embodiments, the output layer shared model includes a base set of layers and one or more branches, each trained to perform a unique task from the set of tasks. At step 608, in connection with the heuristic based search, and upon determining there is at least one additional task in the set of tasks to model, method 600 may include creating a base layer shared model for performing a first task from the set of tasks. 
At step 620, in connection with the heuristic based search, method 600 may include constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks. [0066] FIG.6B illustrates a method 630 of automatically designing a neural network having a layer shared architecture, according to aspects of the present disclosure. Method 630 illustrated in FIG.6B is similar to that of FIG.6A, but employs prediction of optimal branching points, with the following steps 632, 634, and 636 of FIG.6B replacing steps 614 and 616 of FIG.6A. At step 632, in connection with creating a selected branch model, method 630 may include predicting a set of optimal candidate branching points and/or multi- connection points from the base layer shared model. At step 634, in connection with creating a selected branch model, method 630 may include: for each of a plurality of candidate branching points and/or multi-connection points, mutating and/or varying one or more layers to create a candidate branch model. At step 636, in connection with creating a selected branch model, method 630 may include: for each of the plurality of candidate branch models, computing an efficacy value for the (complete) model containing the candidate branch model. In some embodiments, method 630 may further include automatically selecting, by the processor, a first task from the set of tasks for use in creating the base set of layers for performing the first task. In some embodiments, performing the first task may include performing an automatic selection of the first task from the set of tasks using a neural network dedicated for this determination. [0067] An exemplary embodiment of a method for designing neural networks with layer shared architecture, called Intelligent Layer Shared (ILASH) Neural Architecture Search, is presented. As shown in FIG.6C, the ILASH designs LSNNs based on: (1) a set of tasks and associated datasets and (2) specifications to control the behavior of ILASH. The specifications may comprise search termination conditions, task priorities, and various flags to, for example, enable or disable layer mutations, sub-branch forking, predictive forking. ILASH may comprise various search methods that determine an optimal configuration of the neural network. [0068] In certain embodiments, search methods comprise a heuristics based search process. An exemplary embodiment of the process is presented in FIG.6D. The process first determines an initial model for Task - 1 (main branch), for example, by selecting from a set of existing traditional neural network architectures. This model is then trained and prepared for augmenting support to other tasks. For each other task, the process iterates over all possible candidate branching options in the existing network, explores different multi-connection possibilities, mutates layers as necessary, outputting a combination based on its performance or model efficacy. Once the best branching conditions for supporting a task are obtained, the branching conditions are permanently added to the base model before moving to the next task. The candidate branching points are determined based on a set of specifications provided as an input to ILASH. Once all the branches are added into the LSNN, the final model is provided as an output. 
Heuristics based search may lead to a thorough search, but it may also be time consuming depending on tasks, specifications, and other parameters. [0069] In certain embodiments, search methods comprise a predictive rapid hybrid approach. An exemplary embodiment of the predictive rapid hybrid approach is shown in FIG.6E. In this algorithm, the initial model is first created. Next, the model is augmented towards supporting the other tasks. Instead of an exhaustive search, the approach uses a predictive branching model to suggest a set of potential branching and multi-connection points for the given task. Next, for all the shortlisted candidates, mutations are performed, again based on a predictive mutation model. The model (e.g., the best in terms of performance or efficacy) is then selected from these shortlisted predicted combinations towards ascertaining the optimal one. Before moving to the next task, the changes are made permanent to the base model. Once all the branches are added into the LSNN, the final model is provided as an output. [0070] A pilot study was conducted to evaluate the methodology presented herein. Three datasets were considered: UTKFace (A. Das et al., 2018), MTFL (Z. Zhang et al., 2014), and CelebA (Z. Liu et al., 2018), used to train and evaluate the ILASH heuristic and predictive search models, as well as a commercially available method for automatic machine learning called Autokeras. These datasets consist of facial data with multiple labels: UTKFace has Age, Race, and Gender labels; MTFL has Gender, Smiling or not, Wearing glasses or not, and different Head pose labels; CelebA has 40 facial attributes, but in the study five different types of tasks are considered: Gender, Hair Color, Hair Style, Smiling or not, and Nose. Thus, all the labels share common features that are utilized. The datasets were split into training, validation, and testing using 80%, 10%, and 20%, respectively. The models were then trained for up to 50 epochs with a batch size of 32. The Adam optimizer was used with an initial learning rate of 0.001, and if there was no improvement in validation accuracy for five consecutive epochs, the learning rate was reduced by 75% of its value. If there was still no improvement in validation accuracy after 10 epochs, the training session was stopped automatically. The experiments achieved competitive performance for all tasks using both heuristic-based and predictive-based models. As shown in Table 1, for the UTKFace dataset, ILASH achieved 91.53% and 87.82% testing accuracy for Gender and Race, respectively, and an r2 score of 0.796 for Age when trained with the heuristic-based search. In contrast, Autokeras achieved 91.89% and 86.12% testing accuracy for Gender and Race, and an r2 score of 0.862 for Age. Although the performance of ILASH is slightly lower than Autokeras for heuristic-based search, it outperforms Autokeras in predictive-based search. As shown in Table 1, predictive ILASH takes 0.269 hours of training time, whereas Autokeras takes 25.41 hours for the UTKFace dataset. ILASH consumed 0.070 kWh-PUE (Power Utilization Effectiveness) and emitted 0.067 lb of CO2, whereas Autokeras consumed 5.530 kWh-PUE and emitted 5.275 lb of CO2. The PUE pt (kWh) and CO2 (lb) emissions were calculated with the equations from E. Strubell et al. (2019): pt = 1.58·t·(pc + pr + g·pg)/1000 and CO2e = 0.954·pt, where t is the total training time in hours, pc is the average power draw from all CPU sockets, pr is the average power draw from all DRAM sockets, and g·pg is the average power draw from all GPU sockets while training. [0071] For the inference results shown in Table 2, ILASH has lower PUE and less CO2 emission than Autokeras for both heuristic and predictive searches. The results show that the predictive ILASH performed well and achieved a satisfactory accuracy/r2 score for all tasks. Additionally, the approach resulted in a notable decrease in power consumption, which not only saves costs but also has a positive environmental effect by reducing the carbon footprint associated with machine learning training. FIG.7 shows the predictive ILASH model for the UTKFace dataset.
Table 1: Comparison of training results between ILASH and Autokeras.
Table 2: Comparison of inferencing results between ILASH and Autokeras.
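As a worked example of the PUE and CO2 equations of E. Strubell et al. (2019) reproduced above, the following sketch computes the PUE-adjusted energy and the associated emissions; the numeric inputs are illustrative only and are not measurements from the study.

```python
# Worked example of the equations above; all input values are illustrative.

def pue_energy_kwh(hours, p_cpu_w, p_dram_w, n_gpus, p_gpu_w, pue=1.58):
    """p_t = 1.58 * t * (p_c + p_r + g * p_g) / 1000, in kWh-PUE."""
    return pue * hours * (p_cpu_w + p_dram_w + n_gpus * p_gpu_w) / 1000.0

def co2_lb(p_t_kwh):
    """CO2e (lb) = 0.954 * p_t."""
    return 0.954 * p_t_kwh

# Hypothetical average power draws over a 0.269-hour training run:
p_t = pue_energy_kwh(hours=0.269, p_cpu_w=120.0, p_dram_w=10.0,
                     n_gpus=1, p_gpu_w=250.0)
print(f"p_t = {p_t:.3f} kWh-PUE, CO2 = {co2_lb(p_t):.3f} lb")
```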
C. Autoencoder-based Layer Shared Neural Networks [0072] Autoencoders are neural networks that have been trained to recreate their input data. This is typically accomplished by compressing the input into a lower-dimensional representation and thereafter reconstructing it back to its original form. When an autoencoder is trained to reconstruct data, it learns to extract meaningful features from the input that can be used for other tasks. Shared layers, on the other hand, are layers in a neural network that are reused across multiple inputs or tasks. By sharing layers, the network can learn to recognize common patterns and features in different types of data, leading to better performance and faster training times. [0073] An important aspect of constructing a machine learning model involves the careful selection of an appropriate architecture. One possible approach is to design the architecture from scratch, while another involves exploring pre-existing models that have undergone training and been employed in analogous scenarios. During the training phase, the search for optimal parameters can commence either from a randomly initialized set of values or, alternatively, from prior model training experiences. In this methodology, it is possible to utilize a single architecture to address multiple tasks, for example tasks that involve similar or the same data. This approach can be advantageous as it reduces the computational resources needed to train the model by enabling shared learning of features across multiple tasks. [0074] In certain embodiments, instead of using separate architectures to complete each classification task (Task 1, Task 2, Task 3, …, Task n), a common autoencoder architecture is built as shown in FIG.8A. The network is trained using the entire dataset as the first step. The goal of training is to make the autoencoder learn the weights that map the input data to its compressed representation and back to its original form. Herein, an autoencoder is used with a bottleneck layer to extract features from the input data. The learned weights are then used as the initial weights for "n" branches of the network as shown in FIG.8B. After initializing the network with the learned weights, the second step is to divide the autoencoder into two parts and use at least a part of the encoder portion. After that, each branch is trained for its specific classification task using, for example, a softmax/sigmoid activation layer for classification or linear activation for continuous values. During this phase, the weights of the encoder layers may be kept frozen, and only the weights of the task-specific layers are updated. The accuracy of the model may be checked at this stage. If the accuracy needs to be improved, the encoder layers are unfrozen until the accuracy reaches an acceptable level; the unfrozen encoder layers are thereby added to the trainable layer domain. The output of each branch corresponds to the probability distribution over the labels for the corresponding task, which can be used to make predictions on new data. Overall, this methodology may allow for efficient multi-task learning using a shared representation of the input data.
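By way of illustration, a minimal sketch of the shared-autoencoder approach of paragraphs [0072]-[0074] is given below, assuming TensorFlow/Keras and mirroring the UTKFace example of paragraph [0077] (Gender, Race, and Age heads); the input size, layer widths, and class counts are illustrative assumptions, not prescribed by the disclosure.

```python
# Minimal sketch of the shared-autoencoder methodology; layer sizes and task
# heads are illustrative assumptions only.
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(784,))
encoded = layers.Dense(256, activation="relu")(inputs)
bottleneck = layers.Dense(64, activation="relu", name="bottleneck")(encoded)
decoded = layers.Dense(256, activation="relu")(bottleneck)
outputs = layers.Dense(784, activation="sigmoid")(decoded)

# Step 1: train the autoencoder to reconstruct the entire dataset.
autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)

# Step 2: keep the encoder portion and freeze its learned weights.
encoder = Model(inputs, bottleneck)
encoder.trainable = False

# Step 3: one branch per task; softmax for classification, linear for
# continuous (regression) outputs.
gender = layers.Dense(2, activation="softmax", name="gender")(encoder.output)
race = layers.Dense(5, activation="softmax", name="race")(encoder.output)
age = layers.Dense(1, activation="linear", name="age")(encoder.output)

multitask = Model(encoder.input, [gender, race, age])
multitask.compile(optimizer="adam",
                  loss={"gender": "sparse_categorical_crossentropy",
                        "race": "sparse_categorical_crossentropy",
                        "age": "mse"})
# If accuracy plateaus, the encoder layers can be unfrozen
# (encoder.trainable = True) and training continued, re-adding them to the
# trainable layer domain as described above.
```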
[0075] FIG.8C illustrates a method 800 of automatically designing a neural network having a layer shared architecture via use of an autoencoder, according to aspects of the present disclosure. Method 800 illustrated in FIG.8C is similar to that of FIG.6A, but employs an autoencoder, with steps 802, 804, 806, and 808 of FIG.8C replacing steps 608-620 of FIG.6A. At step 802, method 800 may include creating an autoencoder model. In some embodiments, the model may be an initial, untrained model, for example, having layer weights with initialized values. At step 804, method 800 may include training the initial autoencoder model using the associated dataset(s). In some embodiments, training the initial autoencoder model may include doing so to determine/learn values of the layer weights, for example to generate a trained autoencoder model (that is, initializing the autoencoder model to produce output data similar to its input data). At step 806, method 800 may include identifying at least a portion of the trained autoencoder for use as the base set of layers of the output layer shared model. At step 808, method 800 may include adding a particular candidate branch model to the base set of layers at a particular branch location by automatically conducting, by the processor, the heuristic based search and/or the predictive rapid search using the set of tasks and associated dataset(s). In some embodiments, step 808 may include using the search to identify the particular candidate branch model from a plurality of candidate branch models, and/or to identify the particular branching point (for example, while keeping the base layer weights fixed or, alternatively, allowing the base layer weights to change). [0076] The above methodology has several advantages over traditional methods that use separate architectures for each classification task. It may allow for better sharing of information across tasks, leading to improved accuracy and generalization. It may also reduce the overall complexity and power consumption of the system that implements the methodology. [0077] As an example, the UTKFace dataset was used to train the autoencoder model, and the learned weights were used to initialize three branches of the network. The branches consisted of fully connected layers that perform specific tasks, such as Gender classification, Age prediction, and Race classification. The autoencoder's encoder layers are shared among the branches to extract features learned during training. During training, the weights of the autoencoder's encoder layers were frozen, and only the weights of the fully connected layers were updated. Subsequently, an evaluation of the model's accuracy was conducted. A competitive accuracy and r2 score were achieved for all tasks using a single architecture with comparatively lower power consumption and CO2 emissions. As shown in Tables 3 and 4, the experiments demonstrate that the total power consumption (by both CPU and GPU) required to train and run inference with the autoencoder model to predict Age, Gender, or Race is significantly lower than the power consumption when training and inferencing with the traditional approach (VGG16). Evidently, the proposed methodology uses nearly one-sixth of the computational power (CPU + GPU) of the traditional approach. Table 3: Comparison of training results between the Autoencoder and traditional model.
Table 4: Comparison of inferencing results between the Autoencoder and traditional model.
D. Example: Multiple Tasks Related to ECG Signal Data [0078] This section provides an example of a use case of a neural network with layer shared architecture to perform multiple tasks on a common set of medical data, specifically ECG signal data. [0079] A neural network with layer shared architecture as described herein may be used to perform multiple tasks on a common set of medical data, such as electrocardiogram (ECG) data. Artificial NNs may be used for ECG signal processing. Various types of classification tasks can be performed to extract meaningful information or features from ECG signals. A NN with layer shared architecture as described herein may perform multiple tasks on a common set of ECG data, such multiple tasks including, for example, identification of heart attack probability, abnormal heartbeat classification, extent of heart damage, location of heart damage, various rhythm disturbances (e.g., atrial fibrillation, atrial flutter, premature atrial contraction, premature ventricular contraction, sinus arrhythmia, sinus bradycardia, sinus tachycardia, sinus pause, sinoatrial arrest, sinus node dysfunction, bradycardia-tachycardia syndrome, supraventricular tachycardia, polymorphic ventricular tachycardia, wide complex tachycardia, pre-excitation syndrome, J wave); heart block and conduction problems (e.g., aberration, sinoatrial block, AV node, right bundle, left bundle, QT syndrome, right and left atrial abnormality); electrolyte disturbances and intoxication (e.g., digitalis intoxication, calcium: hypocalcemia and hypercalcemia, potassium: hypokalemia and hyperkalemia, serotonin toxicity); ischemia and infarction (e.g., Wellens' syndrome, de Winter T waves, ST elevation and ST depression, high frequency QRS changes, myocardial infarction); and/or structural changes (e.g., acute pericarditis, right and left ventricular hypertrophy, right ventricular strain). [0080] For example, cardiac arrhythmia is a medical condition characterized by irregular heart rates, manifesting as either excessively slow or rapid beats. This irregularity results from the disruption of the proper electrical impulses responsible for coordinating heart contractions. Certain perilous arrhythmic patterns may lead to sudden cardiac death. Recognizing the necessity for prompt identification of cardiac arrhythmias, there is a clear demand for automated methods utilizing computer-assisted decision-making processes. Moreover, alongside the analysis of ECG data for regression purposes, the classification of electrocardiogram (ECG) signals assumes a significant role in the accurate diagnosis of cardiovascular disorders. [0081] While ECG data as an image could be used as an input to our system directly, it is also possible to use features such as the P-wave, QRS complex, and T-wave as numeric inputs to our system and predict whether a patient is prone to heart attacks or abnormal heartbeats. The patterns on the ECG may also help determine which part of the heart has been damaged, as well as the extent of the damage. Regardless of whether the problem is a regression or a classification, analogously to the image inputs, an autoencoder is trained to reconstruct the ECG features. The learned encoder is then used along with additional layers and proper activation functions to predict the classification or regression outcome. [0082] FIG.9 demonstrates an ECG signal of a human heart in normal sinus rhythm with various features of the signal marked, as adapted from Algarni, Abeer D., Naglaa F.
Soliman, Hanaa A. Abdallah, and Fathi E. Abd El-Samie, "Encryption of ECG signals for telemedicine applications", Multimedia Tools and Applications 80 (2021): 10679-10703. In certain embodiments, processed ECG data (e.g., extracted features, such as the P wave, PR interval, QRS complex, J-point, ST segment, T-wave, corrected QT interval, U wave) may be used as an input for a NN with layer shared architecture to assess various heart conditions. For example, a tabular form ECG signal may be input into a pre-trained neural encoder network. The network leverages learned representations to predict the presence of arrhythmia and assess the underlying heart condition, offering a comprehensive diagnostic outcome. [0083] Turning again to FIG.9, as discussed in Algarni et al. referenced above, the P wave corresponds to depolarization of the heart atria and typically occurs in the first 80 ms of the heart wave. A typical P wave shape is upright, and its inversion may indicate an ectopic atrial pacemaker. An unusually long duration of the P wave may represent atrial enlargement. The PR interval is defined from the beginning of the P wave to the beginning of the QRS complex and typically lasts between 120 and 200 ms. A PR interval shorter than 120 ms may indicate that the electrical impulse is bypassing the atrioventricular (AV) node, as occurs in Wolff-Parkinson-White syndrome. A PR interval longer than 200 ms may be related to atrioventricular block. The PR segment typically has a flat shape, with any deviations typically associated with pericarditis. The QRS complex typically lasts from 80 to 100 ms and has a much larger amplitude as compared to the P wave because it corresponds to rapid depolarization of the left and right ventricles. A QRS complex with a duration longer than 120 ms may be associated with disruption of the heart's conduction system, such as right bundle branch block, left bundle branch block, or ventricular rhythms, or with metabolic issues, such as severe hyperkalemia or tricyclic antidepressant overdose. A QRS complex with unusually large amplitude may indicate left ventricular hypertrophy, whereas a low-amplitude QRS complex may be associated with a pericardial effusion or infiltrative myocardial disease. The J-point marks the end of the QRS complex and the beginning of the ST segment. A separate J wave appearance may be pathognomonic of hypothermia or hypercalcemia. The ST segment lies between the QRS complex and the T wave. The ST segment typically has no amplitude, whereas any (e.g., non-zero) amplitude may be associated with myocardial infarction or ischemia. Negative amplitudes of the ST segment with respect to the ECG baseline may also be caused by digoxin or left ventricular hypertrophy. Positive amplitudes of the ST segment with respect to the ECG baseline may also be caused by pericarditis or Brugada syndrome. The T wave corresponds to the repolarization of the ventricles and lasts around 160 ms. An inverted T wave may be associated with myocardial ischemia, left ventricular hypertrophy, high intracranial pressure, or metabolic abnormalities. The QT interval is defined from the beginning of the QRS complex to the end of the T wave. The corrected QT (QTc) is obtained by dividing the QT by the square root of the RR interval (i.e., the time between two successive R waves) and typically lasts less than 440 ms. A prolonged QTc interval may be a risk factor for ventricular tachyarrhythmia and sudden death. The U wave is related to the repolarization of the interventricular septum. The U wave often has a low or even zero amplitude. A prominent U wave may be associated with hypokalemia, hypercalcemia, or hyperthyroidism.
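As a concrete illustration of the QTc computation described above (the QT interval divided by the square root of the RR interval, i.e., the Bazett correction), the following sketch uses illustrative values only:

```python
# Worked example of the QTc computation described above; values are
# illustrative, not patient data.
import math

def qtc_bazett(qt_s, rr_s):
    """Corrected QT: QT divided by the square root of the RR interval (s)."""
    return qt_s / math.sqrt(rr_s)

# Illustrative values: QT = 400 ms at a heart rate of 75 bpm (RR = 0.8 s).
qtc = qtc_bazett(0.400, 0.800)
print(f"QTc = {qtc * 1000:.0f} ms")  # ~447 ms, just above the typical 440 ms bound
```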
[0084] Thus, ECG data (e.g., extracted features, such as the P wave, PR interval, QRS complex, J-point, ST segment, T-wave, corrected QT interval, U wave) may be used as input for a NN with layer shared architecture to assess various heart conditions. [0085] FIG.10 demonstrates a flow diagram for a method that includes extracting features from a digital ECG signal and using these extracted features as input for a NN with layer shared architecture. The NN comprises a learned encoder model with multiple branches to complete multiple tasks (e.g., heart attack prediction and arrhythmia detection). Examples of NNs with such layer shared architectures comprising multiple branches are discussed herein in more detail. [0086] In FIG.10, a digital ECG signal is preprocessed and features are extracted. An autoencoder model is trained to reconstruct the extracted ECG features. At least a portion of the trained autoencoder may be used along with additional layers and activation functions to produce output that satisfies multiple tasks (e.g., makes multiple predictions) from the common set of input ECG signal data. In this example, the multiple tasks include performing a binary classification of a predicted heart attack, and arrhythmia detection.
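A minimal sketch of the FIG.10 pipeline, under the same TensorFlow/Keras assumption as the earlier sketch, is given below; the feature count, layer sizes, and task heads are illustrative assumptions, not prescribed by the disclosure.

```python
# Illustrative sketch of the FIG.10 pipeline: extracted ECG features feed a
# learned encoder shared by two task branches.
import tensorflow as tf
from tensorflow.keras import layers, Model

n_features = 8  # e.g., P wave, PR interval, QRS, J-point, ST, T wave, QTc, U wave
feats = tf.keras.Input(shape=(n_features,))
h = layers.Dense(32, activation="relu")(feats)
code = layers.Dense(8, activation="relu", name="encoder_out")(h)
recon = layers.Dense(32, activation="relu")(code)
recon = layers.Dense(n_features, activation="linear")(recon)

# Step 1: train an autoencoder to reconstruct the extracted ECG features.
autoencoder = Model(feats, recon)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(ecg_features, ecg_features, epochs=20, batch_size=32)

# Step 2: reuse the learned encoder with one head per task.
heart_attack = layers.Dense(1, activation="sigmoid", name="heart_attack")(code)
arrhythmia = layers.Dense(1, activation="sigmoid", name="arrhythmia")(code)
multitask = Model(feats, [heart_attack, arrhythmia])
multitask.compile(optimizer="adam",
                  loss={"heart_attack": "binary_crossentropy",
                        "arrhythmia": "binary_crossentropy"})
```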
E. Software, Computer System, and Network Environment [0087] Certain embodiments described herein make use of computer algorithms in the form of software instructions executed by a computer processor. In certain embodiments, the software instructions include a machine learning module, also referred to herein as artificial intelligence software. As used herein, a machine learning module refers to a computer-implemented process (e.g., a software function) that implements one or more specific machine learning algorithms, such as an artificial neural network (ANN), random forest, decision trees, support vector machines, and the like, in order to determine, for a given input, one or more output values. In certain embodiments, the input comprises alphanumeric data which can include numbers, words, phrases, or lengthier strings, for example. In certain embodiments, the one or more output values comprise values representing numeric values, words, phrases, or other alphanumeric strings. In certain embodiments, the one or more output values comprise an identification of one or more response strings (e.g., selected from a database). [0088] In certain embodiments, machine learning modules implementing machine learning techniques are trained, for example using datasets that include categories of data described herein. Such training may be used to determine various parameters of machine learning algorithms implemented by a machine learning module, such as weights associated with layers in neural networks. In certain embodiments, once a machine learning module is trained, e.g., to accomplish a specific task such as identifying certain response strings, values of determined parameters are fixed and the (e.g., unchanging, static) machine learning module is used to process new data (e.g., different from the training data; e.g., infer a result) and accomplish its trained task without further updates to its parameters (e.g., the machine learning module does not receive feedback and/or updates). In certain embodiments, machine learning modules may receive feedback, e.g., based on automated review of accuracy or human user review of accuracy, and such feedback may be used as additional training data to dynamically update the machine learning module. In certain embodiments, two or more machine learning modules may be combined and implemented as a single module and/or a single software application. In certain embodiments, two or more machine learning modules may also be implemented separately, e.g., as separate software applications. A machine learning module may be software and/or hardware. For example, a machine learning module may be implemented entirely as software, or certain functions of an ANN module may be carried out via specialized hardware (e.g., via an application-specific integrated circuit (ASIC) or field-programmable gate arrays (FPGAs)). [0089] In certain embodiments, machine learning modules implementing machine learning techniques may be composed of individual nodes (e.g., units, neurons). A node may receive a set of inputs that may include at least a portion of a given input data for the machine learning module and/or at least one output of another node. A node may have at least one parameter to apply and/or a set of instructions to perform (e.g., mathematical functions to execute) over the set of inputs. In certain embodiments, node instructions may include a step that assigns relative importance to the set of inputs using various parameters, such as weights. The weights may be applied by performing scalar multiplication (or another mathematical function) between the set of input values and the parameters, resulting in a set of weighted inputs. In certain embodiments, a node may have a transfer function to combine the set of weighted inputs into one output value. A transfer function may be implemented by a summation of all the weighted inputs and the addition of an offset (e.g., bias) value. In certain embodiments, a node may have an activation function to introduce non-linearity into the output value. Non-limiting examples of the activation function include Rectified Linear Activation (ReLU), logistic (e.g., sigmoid), hyperbolic tangent (tanh), and softmax. In certain embodiments, a node may have a capability of remembering previous states (e.g., recurrent nodes). Previous states may be applied to the input and output values using a set of learning parameters.
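By way of illustration, the node computation of paragraph [0089] (weighted inputs, a summation-plus-bias transfer function, and an activation function) may be sketched in plain Python as follows; the numeric inputs are arbitrary examples.

```python
# Illustrative sketch of a single node as described in paragraph [0089].
import math

def node(inputs, weights, bias, activation=lambda z: max(0.0, z)):  # ReLU default
    # Apply relative importance to each input via scalar multiplication.
    weighted = [w * x for w, x in zip(weights, inputs)]
    # Transfer function: summation of weighted inputs plus an offset (bias).
    z = sum(weighted) + bias
    # Activation function introduces non-linearity into the output value.
    return activation(z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(node([0.5, -1.0, 2.0], [0.2, 0.4, 0.1], bias=0.05))       # ReLU output
print(node([0.5, -1.0, 2.0], [0.2, 0.4, 0.1], 0.05, sigmoid))   # sigmoid output
```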
[0090] A layer is a building block in a deep learning architecture composed of nodes. In particular, a layer is a set of nodes that receives data input (e.g., weighted or non-weighted input), transforms it (e.g., by carrying out instructions, such as applying a set of linear and/or non-linear functions), and passes transformed values as output (e.g., to the next layer). In certain embodiments, the set of nodes in a particular layer may share the same parameters and instructions without interacting with each other. A machine learning module may be composed of at least one layer (e.g., ordered). Examples of types of layers include convolutional layers (e.g., layers with a kernel, a matrix of parameters that is slid across an input to be multiplied with multiple input values to reduce them to a single output value); fully connected (FC) layers (e.g., all nodes are connected to all outputs of the previous layer); recurrent layers, long/short term memory (LSTM) layers, and gated recurrent unit (GRU) layers (e.g., nodes with the various abilities to memorize and apply their previous inputs and/or outputs); batch normalization (BN) layers (e.g., layers that normalize a set of outputs from another layer, allowing for more independent learning of individual layers); activation layers (e.g., layers with nodes that only contain an activation function); and (un)pooling layers (e.g., layers that reduce (increase) the dimensions of an input by summarizing (splitting) input values in defined patches). [0091] In certain embodiments, the performance of a machine learning module may be characterized by its ability to produce output data that reproduces the input data with a specific accuracy. To achieve a specific accuracy, a training process is performed to find optimal parameters, such as weights, for every node in every layer of the machine learning module. In certain embodiments, the training process of a machine learning module may involve using output data to calculate an objective function (e.g., cost function, loss function, error function) that needs to be optimized (e.g., minimized, maximized). For example, a machine learning objective function may be a combination of a loss function and a regularization term. The loss function is related to how well the output is able to predict the input. The loss function may take various forms, such as mean squared error, mean absolute error, binary cross-entropy, and categorical cross-entropy. The regularization term may be needed to prevent overfitting and improve generalization of the training process. Typical regularization techniques include L1 regularization (Lasso regression), L2 regularization (Ridge regression), and dropout (e.g., dropping layer outputs at random during the training process). [0092] In certain embodiments, objective function optimization of a machine learning module may involve finding at least one (e.g., all) of the present global optima (e.g., as opposed to local optima). A typical algorithm for objective function optimization follows principles of mathematical optimization for a multi-variable function and relies on achieving a specific accuracy. Examples of objective function optimization algorithms include gradient descent, nonlinear conjugate gradient, random search, the Levenberg-Marquardt algorithm, the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm, pattern search, the basin hopping method, the Krylov method, the Adam method, genetic algorithms, particle swarm optimization, surrogate optimization, and simulated annealing. [0093] In certain embodiments, available input data includes training data and validation data, e.g., where the validation data is separate and non-overlapping with the training data. Training data is used during the training process to optimize a model, whereas validation data is used to check the accuracy of the model while operating on previously unseen data. In certain embodiments, training data is divided into batches (e.g., portions) that are sequentially used (e.g., in random order) as sets of inputs to train a model. In certain embodiments, a model is trained multiple times (e.g., over multiple epochs) on the entire set of training data.
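A minimal sketch of the training process of paragraphs [0091]-[0093] is given below: a mean squared error loss with an L2 (ridge) regularization term, minimized by gradient descent over random-order batches for several epochs, with accuracy checked on separate, non-overlapping validation data. The linear model and all numeric values are illustrative assumptions.

```python
# Illustrative sketch of loss + regularization, batched gradient descent over
# multiple epochs, and a held-out validation check.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
X_train, X_val = X[:160], X[160:]          # training vs. validation data
y_train, y_val = y[:160], y[160:]          # separate and non-overlapping

w = np.zeros(3)                            # parameters (weights) to optimize
lr, lam, batch_size = 0.05, 1e-3, 32       # learning rate, L2 strength, batch
for epoch in range(20):                    # each epoch is a full training pass
    order = rng.permutation(len(X_train))  # batches used in random order
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        err = X_train[idx] @ w - y_train[idx]
        grad = X_train[idx].T @ err / len(idx) + lam * w  # loss + L2 gradient
        w -= lr * grad
    val_mse = np.mean((X_val @ w - y_val) ** 2)  # accuracy on unseen data

print("learned weights:", np.round(w, 2), "| validation MSE:", round(val_mse, 4))
```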
[0094] FIG.11 shows an implementation of a network environment 1100 for use in providing systems, methods, and architectures as described herein. In brief overview, referring now to FIG.11, a block diagram of an exemplary cloud computing environment 1100 is shown and described. The cloud computing environment 1100 may include one or more resource providers 1102a, 1102b, 1102c (collectively, 1102). Each resource provider 1102 may include computing resources. In some implementations, computing resources may include any hardware and/or software used to process data. For example, computing resources may include hardware and/or software capable of executing algorithms, computer programs, and/or computer applications. In some implementations, exemplary computing resources may include application servers and/or databases with storage and retrieval capabilities. Each resource provider 1102 may be connected to any other resource provider 1102 in the cloud computing environment 1100. In some implementations, the resource providers 1102 may be connected over a computer network 1108. Each resource provider 1102 may be connected to one or more computing devices 1104a, 1104b, 1104c (collectively, 1104), over the computer network 1108. [0095] The cloud computing environment 1100 may include a resource manager 1106. The resource manager 1106 may be connected to the resource providers 1102 and the computing devices 1104 over the computer network 1108. In some implementations, the resource manager 1106 may facilitate the provision of computing resources by one or more resource providers 1102 to one or more computing devices 1104. The resource manager 1106 may receive a request for a computing resource from a particular computing device 1104. The resource manager 1106 may identify one or more resource providers 1102 capable of providing the computing resource requested by the computing device 1104. The resource manager 1106 may select a resource provider 1102 to provide the computing resource. The resource manager 1106 may facilitate a connection between the resource provider 1102 and a particular computing device 1104. In some implementations, the resource manager 1106 may establish a connection between a particular resource provider 1102 and a particular computing device 1104. In some implementations, the resource manager 1106 may redirect a particular computing device 1104 to a particular resource provider 1102 with the requested computing resource. [0096] FIG.12 shows an example of a computing device 1200 and a mobile computing device 1250 that can be used to implement the techniques described in this disclosure. The computing device 1200 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The mobile computing device 1250 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to be limiting. [0097] The computing device 1200 includes a processor 1202, a memory 1204, a storage device 1206, a high-speed interface 1208 connecting to the memory 1204 and multiple high-speed expansion ports 1210, and a low-speed interface 1212 connecting to a low-speed expansion port 1214 and the storage device 1206.
Each of the processor 1202, the memory 1204, the storage device 1206, the high-speed interface 1208, the high-speed expansion ports 1210, and the low-speed interface 1212, are interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 1202 can process instructions for execution within the computing device 1200, including instructions stored in the memory 1204 or on the storage device 1206 to display graphical information for a GUI on an external input/output device, such as a display 1216 coupled to the high-speed interface 1208. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system). Thus, as the term is used herein, where a plurality of functions are described as being performed by "a processor", this encompasses embodiments wherein the plurality of functions are performed by any number of processors (one or more) of any number of computing devices (one or more). Furthermore, where a function is described as being performed by "a processor", this encompasses embodiments wherein the function is performed by any number of processors (one or more) of any number of computing devices (one or more) (e.g., in a distributed computing system). [0098] The memory 1204 stores information within the computing device 1200. In some implementations, the memory 1204 is a volatile memory unit or units. In some implementations, the memory 1204 is a non-volatile memory unit or units. The memory 1204 may also be another form of computer-readable medium, such as a magnetic or optical disk. [0099] The storage device 1206 is capable of providing mass storage for the computing device 1200. In some implementations, the storage device 1206 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 1202), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices such as computer- or machine-readable mediums (for example, the memory 1204, the storage device 1206, or memory on the processor 1202). [0100] The high-speed interface 1208 manages bandwidth-intensive operations for the computing device 1200, while the low-speed interface 1212 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In some implementations, the high-speed interface 1208 is coupled to the memory 1204, the display 1216 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 1210, which may accept various expansion cards (not shown). In this implementation, the low-speed interface 1212 is coupled to the storage device 1206 and the low-speed expansion port 1214.
The low-speed expansion port 1214, which may include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter. [0101] The computing device 1200 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1220, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 1222. It may also be implemented as part of a rack server system 1224. Alternatively, components from the computing device 1200 may be combined with other components in a mobile device (not shown), such as a mobile computing device 1250. Each of such devices may contain one or more of the computing device 1200 and the mobile computing device 1250, and an entire system may be made up of multiple computing devices communicating with each other. [0102] The mobile computing device 1250 includes a processor 1252, a memory 1264, an input/output device such as a display 1254, a communication interface 1266, and a transceiver 1268, among other components. The mobile computing device 1250 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage. Each of the processor 1252, the memory 1264, the display 1254, the communication interface 1266, and the transceiver 1268, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate. [0103] The processor 1252 can execute instructions within the mobile computing device 1250, including instructions stored in the memory 1264. The processor 1252 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor 1252 may provide, for example, for coordination of the other components of the mobile computing device 1250, such as control of user interfaces, applications run by the mobile computing device 1250, and wireless communication by the mobile computing device 1250. [0104] The processor 1252 may communicate with a user through a control interface 1258 and a display interface 1256 coupled to the display 1254. The display 1254 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 1256 may comprise appropriate circuitry for driving the display 1254 to present graphical and other information to a user. The control interface 1258 may receive commands from a user and convert them for submission to the processor 1252. In addition, an external interface 1262 may provide communication with the processor 1252, so as to enable near area communication of the mobile computing device 1250 with other devices. The external interface 1262 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used. [0105] The memory 1264 stores information within the mobile computing device 1250. The memory 1264 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units.
An expansion memory 1274 may also be provided and connected to the mobile computing device 1250 through an expansion interface 1272, which may include, for example, a SIMM (Single In Line Memory Module) card interface. The expansion memory 1274 may provide extra storage space for the mobile computing device 1250, or may also store applications or other information for the mobile computing device 1250. Specifically, the expansion memory 1274 may include instructions to carry out or supplement the processes described above, and may also include secure information. Thus, for example, the expansion memory 1274 may be provided as a security module for the mobile computing device 1250, and may be programmed with instructions that permit secure use of the mobile computing device 1250. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner. [0106] The memory may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory), as discussed below. In some implementations, instructions are stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 1252), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices, such as one or more computer- or machine-readable mediums (for example, the memory 1264, the expansion memory 1274, or memory on the processor 1252). In some implementations, the instructions can be received in a propagated signal, for example, over the transceiver 1268 or the external interface 1262. [0107] The mobile computing device 1250 may communicate wirelessly through the communication interface 1266, which may include digital signal processing circuitry where necessary. The communication interface 1266 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communication may occur, for example, through the transceiver 1268 using a radio frequency. In addition, short-range communication may occur, such as using a Bluetooth®, Wi-Fi™, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 1270 may provide additional navigation- and location-related wireless data to the mobile computing device 1250, which may be used as appropriate by applications running on the mobile computing device 1250. [0108] The mobile computing device 1250 may also communicate audibly using an audio codec 1260, which may receive spoken information from a user and convert it to usable digital information. The audio codec 1260 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 1250. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 1250.
[0109] The mobile computing device 1250 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 1280. It may also be implemented as part of a smart-phone 1282, personal digital assistant, or other similar mobile device. [0110] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. [0111] These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor. [0112] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input. [0113] The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet. [0114] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network.
The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. [0115] In some implementations, certain modules described herein can be separated, combined or incorporated into single or combined modules. Any modules depicted in the figures are not intended to limit the systems described herein to the software architectures shown therein. EQUIVALENTS [0116] Elements of different implementations described herein may be combined to form other implementations not specifically set forth above. Elements may be left out of the processes, computer programs, databases, etc. described herein without adversely affecting their operation. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Various separate elements may be combined into one or more individual elements to perform the functions described herein. [0117] Throughout the description, where apparatus and systems are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are apparatus and systems of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps. [0118] It should be understood that the order of steps or order for performing certain actions is immaterial so long as the invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously. [0119] While the invention has been particularly shown and described with reference to specific preferred embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

What is claimed is: 1. A method for producing digital output in response to a query or command via a neural network having a layer shared architecture, the method comprising: receiving, by a processor of a computing device, a set of input data; receiving, by the processor, the query or command, wherein the query or command comprises a plurality of different tasks, each said task to be performed with the set of input data; and producing, by the processor, output for each of the plurality of tasks of the query or command using the neural network with the layer shared architecture, wherein the neural network comprises a unique set of layers for each task, each set having one or more base layers in common with the other set(s).
2. The method of claim 1, wherein the neural network comprises a base set of layers and one or more branches extending from the base set of layers, each said branch consisting of (i) one or more layers from the base set of layers, and (ii) one or more additional layers different from the base set of layers.
3. The method of claim 2, wherein the base set of layers and the one or more branches are each trained to perform a unique task from the plurality of different tasks, each said task to be performed with the set of input data.
4. A method of automatically designing a neural network having a layer shared architecture, the method comprising: (a) receiving, by a processor of a computing device, a set of tasks and associated dataset(s); (b) receiving, by the processor, one or more specifications for control of the architecture; and (c) automatically conducting, by the processor, a heuristic based search and/or a predictive rapid search using the set of tasks and associated dataset(s) and using the one or more specifications to produce an output layer shared model, wherein the output layer shared model comprises a base set of layers and one or more branches, each trained to perform a unique task from the set of tasks.
5. The method of claim 4, comprising automatically conducting, by the processor, the heuristic based search by: creating a base layer shared model for performing a first task from the set of tasks; determining whether there is at least one additional task in the set of tasks to model; upon determining there is at least one additional task in the set of tasks to model, for said additional selected task, creating a selected branch model by: for each of a plurality of candidate branch models: (i) identifying a candidate branching point and/or multi-connection point from the base layer shared model for the selected task, and (ii) mutating and/or varying one or more layers of the candidate branch model; for each of the plurality of candidate branch models, computing an efficacy value for the (complete) model containing the candidate branch model; and comparing efficacy values computed for the plurality of candidate branch models, and retaining the candidate branch model of the plurality that results in the highest (best) computed efficacy value; and constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks.
6. The method of claim 4, comprising automatically conducting, by the processor, the predictive rapid search by: creating a base layer shared model for performing a first task from the set of tasks; determining whether there is at least one additional task in the set of tasks to model; upon determining there is at least one additional task in the set of tasks to model, for said additional selected task, creating a selected branch model by: predicting a set of optimal candidate branching points and/or multi-connection points from the base layer shared model; for each of a plurality of candidate branching points and/or multi-connection points, mutating and/or varying one or more layers to create a candidate branch model; for each of the plurality of candidate branch models, computing an efficacy value for the model containing the candidate branch model; and comparing efficacy values computed for the plurality of candidate branch models, and retaining the candidate branch model of the plurality that results in the highest computed efficacy value; and constructing the output layer shared model as the base layer model for performing the first task with one or more branches corresponding to the retained candidate branch models for each of the additional tasks in the set of tasks.
7. The method of claim 6, comprising automatically selecting, by the processor, a first task from the set of tasks for use in creating the base set of layers for performing the first task.
8. The method of claim 4, comprising automatically constructing, by the processor, the base set of layers by: creating an initial autoencoder model; training the initial autoencoder model using the associated dataset(s); and identifying at least a portion of the trained autoencoder for use as the base set of layers of the output layer shared model.
9. The method of claim 8, the method comprising, for each of the unique tasks from the set of tasks, adding a particular candidate branch model to the base set of layers at a particular branch location by automatically conducting, by the processor, the heuristic based search and/or the predictive rapid search using the set of tasks and associated dataset(s).
10. The method of claim 1, wherein the set of input data comprises image data and the plurality of tasks comprises at least one member selected from the group consisting of age prediction, person recognition, and mood detection.
11. The method of claim 1, wherein the set of input data comprises image data and the plurality of tasks comprises at least one member selected from the group consisting of edge detection, segmentation, and depth detection.
12. The method of claim 1, wherein the set of input data comprises video data and the plurality of tasks comprises at least one member selected from the group consisting of object tracking, action recognition, video captioning, video summarizing, and emotion recognition.
13. The method of claim 1, wherein the set of input data comprises audio data and the plurality of tasks comprises at least one member selected from the group consisting of speaker identification, instrument identification, language classification, and emotion recognition.
14. The method of claim 1, wherein the set of input data comprises hyperspectral and/or multispectral data and the plurality of tasks comprises at least one member selected from the group consisting of land cover type classification, mineral and/or material identification, agricultural monitoring, and water quality assessment.
15. The method of claim 1, wherein the set of input data comprises electrocardiogram (ECG) data and the plurality of tasks comprises at least one member selected from the group consisting of heart attack classification, abnormal heartbeat identification, extent of heart damage identification, location of heart damage identification, heart rhythm disturbance detection, detection of heart blockage and/or conduction problems, detection of electrolyte disturbance and/or intoxication, detection of ischemia and/or infarction, and detection of heart structural change.
16. The method of claim 1, wherein the set of input data comprises medical imaging data, and the plurality of tasks comprises at least one member selected from the group consisting of organ identification, organ physiology monitoring, pathology identification, pathology monitoring, disease identification, disease monitoring, and treatment monitoring.
17. A system comprising a processor of a computing device and memory having instructions stored thereon, which, when executed by the processor, cause the processor to perform the method of any one of claims 1 to 16.
PCT/US2024/044769 2023-08-31 2024-08-30 Methods and systems for the design and use of intelligent layer shared neural networks and model architecture search algorithms WO2025049978A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363535627P 2023-08-31 2023-08-31
US63/535,627 2023-08-31

Publications (1)

Publication Number Publication Date
WO2025049978A1 true WO2025049978A1 (en) 2025-03-06

Family

ID=94820462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/044769 WO2025049978A1 (en) 2023-08-31 2024-08-30 Methods and systems for the design and use of intelligent layer shared neural networks and model architecture search algorithms

Country Status (1)

Country Link
WO (1) WO2025049978A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019108923A1 (en) * 2017-11-30 2019-06-06 Google Llc Neural architecture search using a performance prediction neural network
US20210073615A1 (en) * 2018-04-12 2021-03-11 Nippon Telegraph And Telephone Corporation Neural network system, neural network method, and program
CN113420879A (en) * 2021-07-09 2021-09-21 支付宝(杭州)信息技术有限公司 Prediction method and device of multi-task learning model
US20210352380A1 (en) * 2018-10-18 2021-11-11 Warner Bros. Entertainment Inc. Characterizing content for audio-video dubbing and other transformations
US20220092416A1 (en) * 2018-12-27 2022-03-24 Google Llc Neural architecture search through a graph search space
CN110327033B (en) * 2019-04-04 2022-05-03 浙江工业大学 Myocardial infarction electrocardiogram screening method based on deep neural network
WO2022191539A1 (en) * 2021-03-08 2022-09-15 주식회사 딥바이오 Method for training artificial neural network having use for detecting prostate cancer from turp pathological images, and computing system performing same
WO2022223681A1 (en) * 2021-04-22 2022-10-27 Smartcloudfarming Gmbh System and method for estimating dynamic soil parameters based on multispectral or hyperspectral images


Similar Documents

Publication Publication Date Title
Gavrishchaka et al. Advantages of hybrid deep learning frameworks in applications with limited data
Kukkar et al. Optimizing deep learning model parameters using socially implemented IoMT systems for diabetic retinopathy classification problem
Ying et al. FedECG: A federated semi-supervised learning framework for electrocardiogram abnormalities prediction
Bajaj et al. Heart Disease Prediction using Ensemble ML
US20240188895A1 (en) Model training method, signal recognition method, apparatus, computing and processing device, computer program, and computer-readable medium
Gopika et al. Transferable approach for cardiac disease classification using deep learning
Qu et al. Quantum conditional generative adversarial network based on patch method for abnormal electrocardiogram generation
Ammour et al. LwF-ECG: Learning-without-forgetting approach for electrocardiogram heartbeat classification based on memory with task selector
Li et al. Research on massive ECG data in XGBoost
Zhang et al. MetaVA: curriculum meta-learning and pre-fine-tuning of deep neural networks for detecting ventricular arrhythmias based on ECGs
Assodiky et al. Deep learning algorithm for arrhythmia detection
Mall et al. Optimizing Heart Attack Prediction Through OHE2LM: A Hybrid Modelling Strategy.
CN112052874A (en) Physiological data classification method and system based on generation countermeasure network
Cheng et al. Multi-label classification of arrhythmia using dynamic graph convolutional network based on encoder-decoder framework
Nejedly et al. Prediction of sepsis using LSTM neural network with hyperparameter optimization with a genetic algorithm
Kiyasseh et al. Clops: Continual learning of physiological signals
WO2025049978A1 (en) Methods and systems for the design and use of intelligent layer shared neural networks and model architecture search algorithms
Chithambarathanu et al. Crop disease detection via ensembled-deep-learning paradigm and ABC Coyote pack optimization algorithm (ABC-CPOA)
US20240378437A1 (en) Analyzing and selecting predictive electrocardiogram features
Srinivas et al. Cardiacnet: Cardiac arrhythmia detection and classification using unsupervised learning based optimal feature selection with custom cnn model
AIT BOURKHA et al. Optimized wavelet scattering network and cnn for ecg heartbeat classification from mit–bih arrhythmia database
Kiruthiga et al. An Efficient IoMT-Based Heart Disease Prediction System Using Cuttlefish Algorithm with Cascaded LSTM
Tang et al. Optimizing machine learning for enhanced automated ECG analysis in cardiovascular healthcare
Rabbi et al. A Detailed Analysis of Machine Learning Algorithm Performance in Heart Disease Prediction
An et al. Research on a Lightweight Arrhythmia Classification Model Based on Knowledge Distillation for Wearable Single-Lead ECG Monitoring Systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24861181

Country of ref document: EP

Kind code of ref document: A1