EP3895059A1 - On the fly adaptive convolutional neural network for variable computational resources - Google Patents
- Publication number
- EP3895059A1 (application number EP18842859.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- convolutional
- subset
- available
- cnn
- convolutional filters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/96—Management of image or video recognition tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
Definitions
- CNNs: convolutional neural networks
- CNNs provide high quality results at the cost of large model size and large computing costs, which may make implementation in resource- limited environments difficult.
- the computational budget for CNN inference is dependent on the changing real-time computational resource availability of the device or system implementing the CNN. For example, in the case of a central processing unit (CPU) implementing a CNN, the available resources of the CPU for the CNN computational load can vary dramatically depending on other applications running on the CPU (e.g., an antivirus application runs suddenly, etc.).
- FIG. 1 illustrates an example system for performing object recognition using a CNN adaptive to available computational resources
- FIG. 2 is a flow diagram illustrating an example process for performing object recognition using a CNN adaptive to available computational resources
- FIG. 3 is a flow diagram illustrating an example process for training an adaptive CNN
- FIG. 4 illustrates exemplary training of an example adaptive CNN
- FIG. 5 illustrates example channel-wise dropout CNN training techniques
- FIG. 6 illustrates exemplary object recognition inference using an example adaptive CNN
- FIG. 7 is a flow diagram illustrating an example process for performing object recognition using a convolutional neural network that is adaptive based on available computational resources
- FIG. 8 is an illustrative diagram of an example system for performing object recognition using a convolutional neural network that is adaptive based on available computational resources;
- FIG. 9 is an illustrative diagram of an example system.
- FIG. 10 illustrates an example device, all arranged in accordance with at least some implementations of the present disclosure.
- SoC: system-on-a-chip
- implementation of the techniques and/or arrangements described herein are not restricted to particular architectures and/or computing systems and may be implemented by any architecture and/or computing system for similar purposes.
- various architectures employing, for example, multiple integrated circuit (IC) chips and/or packages, and/or various computing devices and/or consumer electronic (CE) devices such as set top boxes, smart phones, etc. may implement the techniques and/or arrangements described herein.
- IC: integrated circuit
- CE: consumer electronic
- claimed subject matter may be practiced without such specific details.
- some material such as, for example, control structures and full software instruction sequences, may not be shown in detail in order not to obscure the material disclosed herein.
- a machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device).
- a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.
- references in the specification to "one implementation”, “an implementation”, “an example implementation”, etc., indicate that the implementation described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same implementation. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other implementations whether or not explicitly described herein.
- the terms “substantially,” “close,” “approximately,” “near,” and “about,” generally refer to being within +/- 10% of a target value.
- the term “satisfies,” when used in reference to a threshold, indicates the value in question meets the condition established by the threshold.
- the term “compares favorably,” when used in reference to a threshold, indicates the value in question is greater than, or greater than or equal to, the threshold.
- the term “compares unfavorably,” when used in reference to a threshold, indicates the value in question is less than, or less than or equal to, the threshold.
- Methods, devices, apparatuses, computing platforms, and articles are described herein related to the implementation of CNNs in variable computational resource environments and, in particular, to adjusting the number of convolutional filters applied at convolutional layers of the CNN in response to varying computational resources available for the CNN.
- CNNs may be implemented to provide high quality object recognition results.
- an adaptive CNN architecture is provided such that on the fly response to computational resources is provided.
- the adaptive CNN may be trained such that the CNN may be employed in any number of configurations. In each configuration, a number of convolutional filters at each convolutional layer (and a number of fully connected channels in a fully connected layer) of the CNN varies. Furthermore, in each configuration, the employed convolutional filters share the same filter coefficients and, therefore, filter structure.
- the full CNN (e.g., a configuration using all available convolutional filters at each convolutional layer) is used.
- computational resource level indicates one or more of available computational cycles, available processor cores in multi-core systems, available memory resources, available power resources, etc.
- Such mappings of computational resource levels to CNN configurations may be predefined such that a CNN configuration may be accessed via a look up table using such computational resource level parameters as inputs, or using any suitable technique or techniques.
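The lookup-table translation from a monitored resource level to a CNN configuration can be sketched as follows. This is an illustrative assumption: the level names, number of configurations, and function name are hypothetical, not values from the disclosure.

```python
# Discrete resource ratings (e.g., derived from available cores, cycles, or
# memory) mapped to configuration indices 0..N-1; in this sketch a higher
# index means more convolutional filters are used per layer.
RESOURCE_TO_CONFIG = {
    "low": 0,
    "medium-low": 1,
    "medium": 2,
    "medium-high": 3,
    "high": 4,
}

def select_cnn_configuration(resource_level: str) -> int:
    """Translate a computational resource level into a CNN configuration index."""
    return RESOURCE_TO_CONFIG[resource_level]
```

A richer controller could key the table on tuples of parameters (core count, memory budget, power state) rather than a single scalar rating.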
- the described CNN is trained such that each configuration of the CNN shares filter parameters for all filters. For example, a first subset of available convolutional filters may be applied at a convolutional layer when the CNN is in a lowest computational resource configuration. At a higher computational resource configuration, the first subset and one or more additional convolutional filters are applied at the convolutional layer. At both the lowest computational resource configuration and all higher computational resource configurations, the same convolutional filters of the first subset are applied. Similarly, at the highest computational resource configuration, all convolutional filters of the highest computational resource configuration that are also in the lower computational resource configurations are applied with the same (e.g., shared or common) convolutional filter coefficients across all computational resource configurations of the CNN.
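The nested-subset sharing described above can be sketched as prefixes of a single shared filter bank: each configuration indexes the same underlying coefficients rather than copying them. The names and the prefix convention here are illustrative assumptions.

```python
# One shared bank of filter coefficients for a convolutional layer; each
# configuration uses a prefix of it, so filters applied in a low-resource
# configuration are exactly the same objects used in every higher one.
full_filter_bank = [f"filter_{i}_coeffs" for i in range(8)]

def filters_for_configuration(num_filters: int):
    """Return the filter subset for a configuration; no copies are made,
    so coefficients are common across all configurations."""
    return full_filter_bank[:num_filters]

low = filters_for_configuration(2)   # lowest-resource configuration
high = filters_for_configuration(8)  # full CNN
assert low == high[:2]               # the shared subset is identical
```

Because the subsets are prefixes of one bank, switching configurations never requires reloading coefficients into memory.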
- the discussed techniques provide for training and use (e.g., inference) of an adaptive CNN that is configured based on available computational resources or budget in an on the fly manner during inference. For example, available computational resources may be monitored and a suitable CNN configuration may be selected prior to processing each image or frame of a video sequence.
- the discussed techniques work robustly as computational resources or power vary due to, for example, fluctuating and unstable computational resource availability caused by unstable power, redistribution of CPU or graphics processing unit (GPU) load when side programs or services (e.g., antivirus software) occasionally run, and so on.
- the adaptive CNN configurations discussed herein vary usage of sets of convolutional filters having shared or common convolutional filter coefficients at one or more convolutional layers of the CNN.
- when more computational resources are available, more convolutional filters are used (e.g., the sets have more convolutional filters); when fewer computational resources are available, fewer convolutional filters are used. Since the same convolutional filter coefficients in the common convolutional filters are used regardless of configuration (e.g., although not all convolutional filters are applied in each configuration, the convolutional filter coefficients are the same), the feature maps generated by the convolutional filters are of the same resolution.
- the number of generated feature maps changes at the convolutional layer depending on the number of convolutional filters in the set applied in the particular configuration; however, the resolution of the feature maps is the same.
- the term resolution with respect to images or feature maps indicates the number of members or data points therein. In the context of images, resolution indicates the number of pixels. In the context of feature maps, resolution indicates the number of data points in the feature map.
- the resolution or size of the feature maps is not varied in response to computational resource changes.
- the convolutional filter coefficients are shared, as discussed. Therefore, neither the convolutional filter coefficients nor the CNN architecture itself needs to be reloaded to memory on the fly, offering enhanced efficiency and speed of operation of the CNN in real time.
- a single CNN is trained such that even a piece (e.g., a configuration using only subsets of available convolutional filters at each convolutional layer) of the CNN, during inference, produces reliable and useful object recognition or classification results.
- Smaller pieces (e.g., configurations with fewer convolutional filters in use) of the CNN are used in very limited computational resource environments to perform defined computer vision tasks and bigger pieces (e.g., configurations with more convolutional filters in use) or even the entirety of the CNN are used when more computational resources are available such that, as more computational resources are available, the CNN provides the highest available quality of object recognition or classification results.
- object recognition and classification are used interchangeably and are inclusive of object detection. Data structures or indicators of object recognition or classification may indicate probability scores of a particular class of object being detected, an indicator as to whether or not a particular class of object has been detected, or any other suitable data structure to indicate an object has been recognized, classified, or detected in an image.
- FIG. 1 illustrates an example system 100 for performing object recognition using a CNN adaptive to available computational resources, arranged in accordance with at least some implementations of the present disclosure.
- system 100 includes an imaging device 101, a normalization module 102, an adaptive convolutional neural network (CNN) module 103, and a controller 104.
- System 100 may be implemented in any suitable form factor device such as a motor vehicle platform, a robotics platform, a personal computer, a laptop computer, a tablet, a phablet, a smart phone, a digital camera, a gaming console, a wearable device, a display device, an all-in-one device, a two-in-one device, etc.
- system 100 may perform object recognition as discussed herein.
- imaging device 101 attains image data 111.
- Imaging device 101 may be any suitable imaging device such as an RGB camera or the like.
- system 100 attains image data 111 via imaging device 101.
- system 100 receives image data 111 or input image data 112 from another device via a communications channel (not shown).
- image data 111 is attained for processing from a memory (not shown) of system 100.
- Image data 111 may include any suitable picture, frame, or the like or any data structure representing a picture or frame at any suitable resolution.
- image data 111 is RGB image data having R (red), G (green), and B (blue), values for pixels thereof.
- image data 111 is RGB-D image data having R, G, B, D (depth) values for pixels thereof.
- image data 111 is single channel image data (e.g., luminance, IR, etc.) having a single value (e.g., an intensity value) at each pixel thereof.
- Image data 111 may be received by optional normalization module 102.
- Normalization module 102, using image data 111, may optionally perform object detection using any suitable technique or techniques, such as landmark detection, to generate a bounding box around the object (if any).
- normalization module 102 may normalize image data corresponding to the detected object(s), or may normalize the image data without using object detection, to a predetermined size and/or scale to generate input image data 112.
- Input image data 112 may include any suitable data structure.
- input image data 112 has a single channel (e.g., gray scale image data) such that input image data 112 has a single value for each pixel thereof.
- input image data 112 has three color channels (e.g., RGB image data) such that input image data 112 has three values (e.g., an R value, a G value, and a B value) for each pixel thereof.
- in place of RGB image data, any suitable image data format (e.g., YUV, YCbCr, etc.) may be used.
- input image data 112 has three color channels and a depth channel (e.g., RGB-D image data) such that input image data 112 has four values (e.g., an R value, a G value, a B value, and a D value) for each pixel thereof.
- input image data 112 may have any suitable size.
- input image data 112 may represent any suitable size of image such as a 32x32 pixel image, a 160x160 pixel image, a 672x384 pixel image, etc.
- normalization module 102 is optional.
- normalization module 102 generates input image data 112 suitable for inference using adaptive CNN module 103.
- image data 111 (as generated by system 100 or as received by system 100) is suitable for inference using adaptive CNN module 103.
- CNN output data 113 may include any suitable data structure such as an N-dimensional vector with each value indicating a likelihood or score that an object or feature is within input image data 112.
- the N-dimensional vector may include any number of likelihood scores such as 10s, 100s, or even 1,000 scores or more.
- CNN output data 113 may be provided to another module of system 100 for the generation of object recognition data, for object tracking, for output to a user, for use in artificial intelligence applications, etc.
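As a rough illustration of such a data structure, CNN output data 113 might be held as a vector of per-class likelihood scores from which a recognition indicator is derived by thresholding or argmax. The class labels and score values below are invented for the example.

```python
# Hypothetical 4-class score vector standing in for an N-dimensional
# CNN output; real outputs may have hundreds or thousands of entries.
cnn_output = [0.02, 0.91, 0.05, 0.02]
labels = ["car", "bicycle", "person", "dog"]  # illustrative class labels

# Derive a recognition indicator: the highest-scoring class.
best = max(range(len(cnn_output)), key=cnn_output.__getitem__)
print(labels[best])  # prints "bicycle", the class with the top score
```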
- object recognition data that includes an indication of a recognized object (e.g., a car, a bicycle, a person, etc.) is identified in input image data 112 based on CNN output data 113.
- controller 104 receives computational resource level 115.
- Computational resource level 115 may be any suitable data indicating computational resources that are available for the application of the adaptive CNN (e.g., via adaptive CNN module 103) to input image data 112.
- computational resource level 115 may indicate one or more of a number of processing cores available, an available operation frequency, an available number of processor cycles per time, etc. that are available for the computational resource that is to apply the adaptive CNN.
- the computational resource may be any suitable computational resource such as a CPU, a GPU, an image signal processor, etc.
- computational resource level 115 may indicate one or more of an available memory allocation, an available memory bandwidth, etc. that are available for the memory resource that is to apply the adaptive CNN.
- the memory resource may be any suitable memory resource such as a static random access memory (SRAM), an on-board cache, etc.
- computational resource level 115 is a predefined rating of available resources that may be a scalar value (e.g., 1 to 10), or one of a variety of descriptive values (e.g., low, medium-low, medium, medium-high, high) at any level of granularity.
- Controller 104 translates computational resource level 115 to a CNN configuration 114.
- the adaptive CNN may have one of N configurations such that configuration 1 implements a lowest number of convolutional filters at one or more convolutional layers of the CNN (e.g., a lowest level of the adaptive CNN), configuration 2 implements a higher number of convolutional filters at one or more convolutional layers of the CNN (e.g., a higher level of the adaptive CNN), configuration 3 implements a yet higher number of convolutional filters at one or more convolutional layers of the CNN (e.g., a higher level of the adaptive CNN), and so on, through configuration N, which implements the full number of available convolutional filters at each convolutional layer of the CNN (e.g., a highest level of the adaptive CNN).
- the adaptive CNN may have any number of configurations, N, such as 2, 4, 8, or more.
- a particular configuration of the adaptive CNN is applied to input image data 112 via adaptive CNN module 103.
- Such techniques are discussed further herein with respect to FIG. 6.
- regardless of the configuration of the adaptive CNN, for those convolutional filters that are applied at each convolutional layer of the adaptive CNN, the same convolutional filter coefficients are applied to input image data 112 and the feature maps corresponding to input image data 112.
- FIG. 2 is a flow diagram illustrating an example process 200 for performing object recognition using a CNN adaptive to available computational resources, arranged in accordance with at least some implementations of the present disclosure.
- Process 200 may include one or more operations 201-204 as illustrated in FIG. 2.
- Process 200 may be performed by any device or system discussed herein to perform inference using an adaptive CNN as discussed herein.
- Process 200 or portions thereof may be repeated for any number of images, images of a video sequence, frames of a video sequence, etc.
- the object recognition indicators generated by process 200 may be used by other artificial intelligence applications, presented to a user (e.g., as a bounding box over an image corresponding to image data 111), stored to memory, etc.
- Process 200 begins at operation 201, where an available computational resource level for implementing the adaptive CNN is monitored. Such monitoring may be performed continuously, at particular time intervals, as triggered by particular events (e.g., other software running, entry to a different power state, responsive to a user request), etc. Processing continues at operation 202, where a CNN configuration is selected based on the available computational resource monitoring. Such CNN configuration selection may be performed using any suitable technique or techniques. In an embodiment, the one or more available computational resource parameters are mapped to a particular CNN configuration using a look up table or similar mapping technique.
- processing continues at operation 203, where the input image (e.g., input image data 112) is processed by the adaptive CNN using the selected CNN configuration. Such processing may be performed as discussed further herein with respect to FIG. 6. Processing continues at operation 204, where the object detection indicators, if any, corresponding to the application of the adaptive CNN are output.
- processing continues at operation 201, where the available computational resource level for implementing the adaptive CNN is again monitored. If there is no change, the selected CNN configuration is used to process input images until a change is detected. If a change is detected, operations 202, 203, 204 are repeated to select a different CNN configuration for processing the next available input image data, implement the newly selected CNN configuration, and output object detection indicators, if any.
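The control flow of operations 201-204, including reselecting the configuration only when the monitored level changes, can be sketched as below. The `monitor_resources`, `select_config`, `run_adaptive_cnn`, and `emit_indicators` callables are hypothetical stand-ins for the system components described above.

```python
def run_inference_loop(frames, monitor_resources, select_config,
                       run_adaptive_cnn, emit_indicators):
    """Operations 201-204: monitor resources, reselect the CNN configuration
    only on a detected change, run inference per frame, output indicators."""
    last_level = None
    config = None
    outputs = []
    for frame in frames:
        level = monitor_resources()                   # operation 201
        if level != last_level:                       # change detected
            config = select_config(level)             # operation 202
            last_level = level
        indicators = run_adaptive_cnn(frame, config)  # operation 203
        emit_indicators(indicators)                   # operation 204
        outputs.append(indicators)
    return outputs
```

Monitoring could equally be driven by events (a power-state transition, a new process launching) rather than polled once per frame as sketched here.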
- when computational resource levels are lower, at one or more convolutional layers of the adaptive CNN, fewer convolutional filters are applied to the input image data and/or feature maps corresponding to the input image data. When computational resource levels are higher, at the one or more convolutional layers of the adaptive CNN, more convolutional filters are applied to the input image data and/or feature maps corresponding to the input image data. In both configurations, common convolutional filters share the same convolutional filter coefficients during inference.
- FIG. 3 is a flow diagram illustrating an example process 300 for training an adaptive CNN, arranged in accordance with at least some implementations of the present disclosure.
- Process 300 may include one or more operations 301-305 as illustrated in FIG. 3.
- Process 300 may be performed by any device or system discussed herein to train any adaptive CNN discussed herein.
- Process 300 or portions thereof may be repeated for any training, training sets, etc.
- the parameter weights generated by process 300 may be stored to memory and implemented via a processor of system 100 during inference, for example.
- Process 300 begins at operation 301, where a training corpus of images are attained.
- the training corpus may include sets of images that provide a ground truth for training.
- the training corpus may include any number of images, such as 15k images, 1.2M images, or more.
- the images of the training corpus have the same resolution and each image is of the same format, such as any format discussed with respect to image data 111 or input image data 112.
- a number of configurations for the adaptive CNN are selected. For example, a number N of possible CNN (e.g., neural net) configurations are selected.
- the number of configurations may be any suitable number, such as 2, 4, 8, or more. More configuration levels provide more freedom for selecting a more appropriate CNN configuration, taking into account the available computational resources (e.g., accounting for the speed vs. accuracy trade-off during inference), at the cost of more difficult and less accurate training. Fewer configurations provide faster and more accurate training at the cost of less selection granularity in inference.
- a number of convolutional filters are selected for each layer of the CNN. For example, for each configuration i ∈ 0..N−1, a number of used convolutional filters (e.g., output channels), CN_u,l, is selected for each layer l of the adaptive CNN.
- a full CNN architecture is selected including a number of convolutional layers, a number of convolutional filters in each layer, a fully connected layer configuration, etc.
- the full CNN architecture may include any number of convolutional layers each having any number of convolutional filters.
- each successive configuration, moving to less and less accurate CNN configurations, then eliminates one or more convolutional filters from one or more convolutional layers of the CNN. In an embodiment, moving from configuration to configuration, at least one convolutional filter is removed from each convolutional layer of the CNN.
- the number of used convolutional filters for a particular configuration, CN_u,l, is defined as shown in Equation (1), where W_l is the full number of convolutional filters available at layer l and i is the configuration index:

CN_u,l = W_l · (N − i) / N (1)
- Equation (1) may provide for a linear reduction in the number of convolutional filters at each convolutional layer of the CNN.
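A linear reduction of per-layer filter counts across configurations might be computed as below. This sketch assumes CN_u,l = ceil(W_l · (N − i) / N), with configuration index 0 denoting the full CNN; the patent's exact formula and indexing convention may differ.

```python
import math

def used_filters_per_layer(full_counts, config_index, num_configs):
    """Return the number of convolutional filters used at each layer for a
    given configuration; index 0 is the full CNN in this sketch's convention."""
    return [math.ceil(w * (num_configs - config_index) / num_configs)
            for w in full_counts]

# Example: 4 configurations over layers with 8, 16, and 32 filters.
print(used_filters_per_layer([8, 16, 32], 0, 4))  # full CNN: [8, 16, 32]
print(used_filters_per_layer([8, 16, 32], 3, 4))  # smallest: [2, 4, 8]
```

Rounding up with `ceil` guarantees every layer keeps at least one filter in even the smallest configuration.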
- FIG. 4 illustrates exemplary training of an example adaptive CNN 400, arranged in accordance with at least some implementations of the present disclosure.
- adaptive CNN 400 includes any number of convolutional layers 481, 482, 483, 484, and one or more fully connected layers 485.
- adaptive CNN 400 includes any number of configurations 491, 492, 493, 494 such that one or more of convolutional layers 481, 482, 483, 484 implement different numbers of convolutional filters between configurations 491, 492, 493, 494.
- each of available convolutional filters 401 are applied to input image data 490 to generate feature maps 402.
- each of available convolutional filters 403 are applied to feature maps 402 to generate feature maps 404.
- each of available convolutional filters 405 are applied to feature maps 404 (and subsequent feature maps, as applicable) to generate feature maps
- each of available convolutional filters 407 are applied to feature maps 406 (or feature maps 404 if no intervening convolutional layers are implemented) to generate a feature vector 408.
- each convolutional filter has a kernel size equal to the size of input feature maps 406, 426, 446, 466 such that convolutional layer 484 produces output channels with 1x1 sizes to provide feature vectors 408, 428, 448, 468.
- convolutional layer 484 may provide an output similar to a fully connected layer (e.g., a 1x1 channel provides a single feature). In other embodiments, convolutional layer 484 may be a fully connected layer.
- the term fully connected layer weights indicates the weights of a convolutional layer providing a feature vector or weights of any suitable fully connected layer to provide a feature vector.
- a fully connected layer 409 may be applied to feature vector 408 to generate class probabilities 410 (e.g., analogous to CNN output data 113), which are provided to a loss function 411.
- each convolutional layer may also include pooling and rectified linear unit (ReLU) layers or any other deep network operations, as is known in the art. Such operations are not shown for the sake of clarity of presentation.
- ReLU: rectified linear unit
- configuration 492 (e.g., a lower quality/higher speed CNN configuration)
- a subset 421 of available convolutional filters 401 are applied to input image data 490 to generate feature maps 422 (the dashed lines indicate the convolutional filter is not applied, the feature map is not generated, etc. in FIG. 4 and elsewhere herein).
- the number of feature maps 422 corresponds to the number of convolutional filters in subset 421.
- a subset 423 of available convolutional filters 403 are applied to feature maps 422 to generate feature maps 424.
- any additional convolutional layers 483 of configuration 492 subsets 425 of available convolutional filters 405 are applied to feature maps 424 (and subsequent feature maps, as applicable) to generate feature maps 426.
- a subset 427 of available convolutional filters 407 are applied to feature maps 426 (or feature maps 424 if no intervening convolutional layers are implemented) to generate a feature vector 428.
- each convolutional layer or fully connected layer implements fewer convolutional filters (or channels) to generate fewer feature maps (or a smaller feature vector).
- a subset 429 of available fully connected layer weights 409 is applied to feature vector 428 to generate class probabilities 430, which are provided to a loss function 431.
- configuration 493 (e.g., a yet lower quality/higher speed CNN configuration)
- a subset 441 of available convolutional filters 401 are applied to input image data 490 to generate feature maps 442 with the number of feature maps 442 corresponding to the number of convolutional filters in subset 441.
- a subset 443 of available convolutional filters 403 are applied to feature maps 442 to generate feature maps 444.
- subsets 445 of available convolutional filters 405 are applied to feature maps 444 (and subsequent feature maps, as applicable) to generate feature maps 446.
- a subset 447 of available convolutional filters 407 are applied to feature maps 446 (or feature maps 444 if no intervening convolutional layers are implemented) to generate a feature vector 448.
- each convolutional layer or fully connected layer implements fewer convolutional filters (or channels) to generate fewer feature maps (or smaller feature vector).
- a subset 449 of available fully connected layer weights 409 is applied to feature vector 448 to generate class probabilities 450 (e.g., analogous to CNN output data 113), which are provided to a loss function 451.
- configuration 494 (e.g., a lowest quality/highest speed CNN configuration)
- a subset 461 of available convolutional filters 401 are applied to input image data 490 to generate feature maps 462 with the number of feature maps 462 corresponding to the number of convolutional filters in subset 461.
- a subset 463 of available convolutional filters 403 are applied to feature maps 462 to generate feature maps 464.
- subsets 465 of available convolutional filters 405 are applied to feature maps 464 (and subsequent feature maps, as applicable) to generate feature maps 466.
- a subset 467 of available convolutional filters 407 are applied to feature maps 466 (or feature maps 464 if no intervening convolutional layers are implemented) to generate a feature vector 468.
- each convolutional layer or fully connected layer implements fewer convolutional filters (or channels) to generate fewer feature maps (or smaller feature vector).
- a subset 469 of available fully connected layer weights 409 is applied to feature vector 468 to generate class probabilities 479 (e.g., analogous to CNN output data 113), which are provided to a loss function 471.
- Loss functions 411, 431, 451, 471 determine loss values, which are summed at loss function 472.
- adaptive CNN 400 is trained to minimize the sum of loss functions 411, 431, 451, 471 at summed loss function 472.
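The summed-loss objective above can be sketched trivially: each configuration's loss contributes to one scalar that drives updates of the shared weights. The training-step calls in the trailing comments are hypothetical autograd-framework calls, not the patent's implementation.

```python
def summed_loss(per_config_losses):
    """Loss function 472: the sum of per-configuration loss values
    (analogous to loss functions 411, 431, 451, 471)."""
    return sum(per_config_losses)

# A shared-weight training step might then look like (hypothetical framework):
# loss = summed_loss([loss_full, loss_cfg2, loss_cfg3, loss_cfg4])
# loss.backward()      # gradients flow into the one shared parameter set
# optimizer.step()     # every configuration's forward pass updates it
```

Because all four forward passes read the same parameters, minimizing the sum forces those parameters to work for every configuration at once.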
- processing continues at operation 304, where the configurations of the adaptive CNN are trained in conjunction with one another to train convolutional filter weights that are shared across the configurations.
- filter weights are shared, via shared filter weights 412 (e.g., pretrained filter weights after training), such that the weights of available convolutional filters 401 are trained and implemented at each of configurations 491, 492, 493, 494.
- the filter weights of the convolutional filters in subset 461 in configuration 494 are the same as the filter weights of the corresponding convolutional filters in subset 441, subset 421, and available convolutional filters 401 of configurations 493, 492, 491, respectively.
- subset 461 implements two convolutional filters, A and B
- subset 441 implements four convolutional filters
- subset 421 implements six convolutional filters
- available convolutional filters 401 implement eight convolutional filters, A, B, C, D, E, F, G, and H. At each of configurations 491, 492, 493, 494, each of convolutional filters A, B, C, D, E, F, G, and H (when applicable) has shared filter weights 412.
- feature maps 402, 422, 442, 462 also all have the same resolution (although the number of feature maps 402, 422, 442, 462, or channels, changes between configurations 491, 492, 493, 494).
- filter weights are shared, via shared filter weights 413 (e.g., pretrained filter weights after training), such that the weights of available convolutional filters 403 are trained and implemented at each of configurations 491, 492, 493, 494.
- the filter weights applied by corresponding convolutional filters in subset 463, subset 443, subset 423, and available convolutional filters 403 are the same.
- Such filter weight sharing is also applied between subsets 465, 445, 425, and available convolutional filters 405 as shown with respect to shared weights 414 (e.g., pretrained filter weights after training) as well as between subsets 467, 447, 427, and available convolutional filters 407 as shown with respect to shared weights 415 (e.g., pretrained filter weights after training).
- fully connected layer weights 409, 429, 449, 469 are shared as shown with respect to shared weights 416.
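The weight sharing described above (subsets as nested slices of one filter bank) can be illustrated with a small sketch. Here the filter names A-H follow the example above, while the weight values and dictionary layout are hypothetical:

```python
# Illustrative sketch of shared filter weights: every configuration draws
# its filters from one bank, so the subset filters ARE the corresponding
# filters of the wider configurations (same weights, same objects), and
# the number of feature maps equals the number of filters in the subset.

filter_bank = {name: [0.1 * i] * 3 for i, name in enumerate("ABCDEFGH")}

subset_461 = [filter_bank[n] for n in "AB"]        # narrowest config: 2 filters
subset_441 = [filter_bank[n] for n in "ABCD"]      # 4 filters
subset_421 = [filter_bank[n] for n in "ABCDEF"]    # 6 filters
all_401    = [filter_bank[n] for n in "ABCDEFGH"]  # full config: 8 filters

# A training update to filter A is seen by every configuration:
filter_bank["A"][0] = 42.0
```

Because the subsets hold references into the same bank rather than copies, training any configuration updates the weights used by all of them, which is the essence of the shared-weight scheme.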
- training of adaptive CNN 400 may be performed using any suitable technique or techniques such that weight sharing is maintained.
- training configurations in conjunction with one another indicates the configurations are trained with shared filter weights and a shared loss function.
- channel wise drop out techniques may be used.
- FIG. 5 illustrates example channel wise drop out CNN training techniques 500, arranged in accordance with at least some implementations of the present disclosure.
- a dropout power setting for each configuration module 501 may provide indications 502, 503, 504, 505 to eliminate one or more convolutional filters from available convolutional filters 401, 403, 405, 407 during training to simulate subsets of available convolutional filters 401, 403, 405, 407 as illustrated with respect to subsets 421, 423, 425, 427.
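A minimal sketch of this channel-wise dropout idea follows; the function name and list-of-lists feature-map representation are illustrative assumptions, not taken from the patent:

```python
# Sketch of channel-wise dropout during training: to simulate a narrower
# configuration, the trailing channels of a layer's output are dropped
# (zeroed), leaving only the first `width` channels active.

def channel_dropout(feature_maps, width):
    """Keep the first `width` channel outputs; zero the rest."""
    return [fm if c < width else [0.0] * len(fm)
            for c, fm in enumerate(feature_maps)]

maps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 channels
narrow = channel_dropout(maps, 2)   # simulate a 2-filter subset
```

Unlike conventional random dropout, the dropped channels here are always the trailing ones, so the surviving prefix matches the filter subset used by the corresponding configuration at inference.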
- forward propagation and backward propagation are performed N times (once for each configuration) and batch gradient descents are accumulated.
- a forward propagation and a backward propagation are performed a number of times equal to the number of available configurations to determine batch gradient descents.
- a forward propagation and a backward propagation are performed for a single randomly or iteratively chosen configuration to determine batch gradient descents. That is, at each iteration of training adaptive CNN 400, forward and backward propagation are performed once for the randomly or iteratively chosen configuration.
- Such processing is repeated for any number of iterations (e.g., epochs) until a convergence condition is met, a maximum number of iterations is met, or the like.
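The two per-iteration schedules just described can be sketched as follows; `step(width)` is a hypothetical stand-in for one forward/backward pass of the configuration with the given channel width:

```python
# Hedged sketches of the two training schedules: (1) every iteration runs
# forward/backward once per configuration and accumulates batch gradients;
# (2) every iteration runs forward/backward for a single iteratively (or
# randomly) chosen configuration.

widths = [8, 6, 4, 2]   # illustrative channel widths, one per configuration

def train_all_configs(step, iterations):
    """Schedule 1: N forward/backward passes per iteration (N = #configs)."""
    for _ in range(iterations):
        for w in widths:
            step(w)

def train_one_config(step, iterations):
    """Schedule 2: one forward/backward pass per iteration, cycling configs."""
    for i in range(iterations):
        step(widths[i % len(widths)])   # iterative choice; a random choice also fits

calls = []
train_all_configs(calls.append, 3)   # 3 iterations x 4 configurations
n_all = len(calls)
calls.clear()
train_one_config(calls.append, 3)    # 3 iterations x 1 configuration
n_one = len(calls)
```

Schedule 1 costs more per iteration but gives every configuration a gradient contribution each step; schedule 2 keeps the per-iteration cost of ordinary training at the price of noisier, alternating updates.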
- processing continues at operation 305, where the resultant filter weights (e.g., parameter weights) of adaptive CNN 400 are output.
- the predefined architecture of adaptive CNN 400 and resultant filter weights after training may be stored to memory and/or transmitted to another device for implementation during an inference phase as discussed herein.
- adaptive CNN 400 may have any number of configurations. Furthermore, in the illustrated embodiment, between each of configurations 491, 492, 493, 494, each convolutional layer 481, 482, 483, 484 and fully connected layer 485 has a different number of convolutional filters and weights. In other embodiments, some of convolutional layers 481, 482, 483, 484 and/or fully connected layer 485 have the same number of convolutional filters or channels between some of configurations 491, 492, 493, 494. Furthermore, as discussed with respect to Equation (1), in some embodiments, the number of convolutional filters (or channels) may be reduced in a linear manner.
- a ratio of the number of convolutional filters in available convolutional filters 401 to the number of convolutional filters in subset 421 is the same as a ratio of the number of convolutional filters in available convolutional filters 403 to the number of convolutional filters in subset 423. Similar ratios between numbers of convolutional filters or channels along configurations 491, 492 may also be the same and such ratios may match between other combinations of configurations 491, 492, 493, 494.
- the number of convolutional filters in available convolutional filters 401 is not less than twice the number of convolutional filters in subset 421, which is not less than twice the number of convolutional filters in subset 441, which is not less than twice the number of convolutional filters in subset 461. Similar characteristics may be provided with respect to the number of convolutional filters in convolutional layers 482, 483, 484 and fully connected layer 485.
- the reduction in the number of convolutional filters or channels for one or more of convolutional layers 481, 482, 483, 484 and/or fully connected layer 485 between configurations 491, 492, 493, 494 may be non-linear.
- the reduction in the number of convolutional filters or channels between configuration 491 and configuration 492 may be less than the reduction in the number of convolutional filters or channels between configuration 492 and configuration 493, which may, in turn, be less than the reduction in the number of convolutional filters or channels between configuration 493 and configuration 494.
- Such reduction characteristics may advantageously offer more configurations at or near peak computational resource availability and therefore less loss in accuracy when minor computational resource reductions are needed.
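The two channel-count schedules discussed above (a linear, constant-ratio reduction versus a non-linear one with smaller cuts near full width) can be sketched as follows. The factors and fractions are illustrative assumptions, not values from the patent:

```python
# Sketch of the reduction schedules: a linear (constant-ratio) schedule
# shrinks every configuration by the same factor, while a non-linear
# schedule keeps more configurations near full width, so the first
# reductions cost little accuracy.

def linear_widths(full, ratio, n_configs):
    """Each configuration keeps `ratio` of the previous one's channels."""
    return [max(1, int(full * ratio ** k)) for k in range(n_configs)]

def nonlinear_widths(full, fractions):
    """Explicit per-configuration fractions of the full channel count."""
    return [max(1, int(full * f)) for f in fractions]

halving = linear_widths(64, 0.5, 4)                   # 64, 32, 16, 8
gentle = nonlinear_widths(64, [1.0, 0.9, 0.7, 0.4])   # smaller cuts near the top
```

With the `gentle` schedule, the successive reductions grow as the resource level drops, matching the characteristic that the cut between the highest-quality configurations is smaller than the cuts further down.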
- the configurations have different numbers of output channels (e.g., convolutional filters) for each layer of the adaptive CNN.
- the configurations are trained together (e.g., in conjunction with each other) using shared weights and common loss.
- the resultant pretrained model can be configured on the fly during inference for different computational resource budgets for execution. Using the discussed techniques, reasonable accuracy loss is maintained in the case of small computational resource budgets. Furthermore, the resultant full computational resource budget CNN provides high accuracy and, again, a reasonable loss as compared to training the full CNN alone.
- such adaptive CNNs offer a variety of advantages including the advantage of not skipping video images or frames due to low computational resources, not increasing the number of weights for implementation of the CNN, robust computer vision results even in low or highly variable computational resource environments, and the simplicity of training and operating a single CNN.
- FIG. 6 illustrates exemplary object recognition inference using example adaptive CNN 400, arranged in accordance with at least some implementations of the present disclosure.
- adaptive CNN 400, as implemented by adaptive CNN module 103, may be configured to one of configurations 491, 492, 493, 494 based on CNN configuration data 114 as received from controller 104.
- adaptive CNN 400 is configured to configuration 493.
- configuration 493 may be selected.
- only subset 441 of available convolutional filters 401 are applied to input image data 112 at convolutional layer 481
- only subset 443 of available convolutional filters 403 are applied to feature maps 442 at convolutional layer 482
- only subsets 445 of available convolutional filters 405 are applied to feature maps 444 at convolutional layer(s) 483 (if any)
- only subset 447 of available convolutional filters 407 are applied to feature maps 446 (or feature maps 444)
- only subset 449 of fully connected weights 409 is applied to feature vector 448 to generate class probabilities 650.
- the term only indicates that a portion or subset of a set or group is used while another portion or subset of the set or group is not used.
- the unused portion or subset may be characterized as unused, discarded, idle, etc.
- processing is performed such that subset 441 of available convolutional filters 401 of convolutional layer 481 are applied to input image data 112 to generate corresponding feature maps 442 having a number thereof that matches the number of convolutional filters in subset 441 (e.g., 16 convolutional filters out of 32 available convolutional filters).
- Subset 443 of available convolutional filters 403 of convolutional layer 482 are applied to feature maps 442 to generate corresponding feature maps 444 having a number thereof that matches the number of convolutional filters in subset 443 (e.g., 16 convolutional filters out of 32 available convolutional filters).
- one or more subsets 445 of one or more sets of available convolutional filters 405 of one or more convolutional layers 483 are applied to feature maps 444 (and subsequently formed feature maps) to generate corresponding feature maps 446 having a number thereof that matches the number of convolutional filters in final subset of subsets 445.
- Subset 447 of available convolutional filters 407 are applied to feature maps 446 (or feature maps 444 if no intervening one or more convolutional layers 483 are implemented) to generate feature vector 448 having a number of features thereof that matches the number of convolutional filters of subset 447 (e.g., 32 convolutional filters out of 64 available convolutional filters).
- class probabilities 650 which indicate a probability for each of the available classes that an object corresponding to the class is in input image data 112.
- class probabilities 650 may include a score in the range of zero to one for each available class (e.g., car, person, truck, dog, street light, etc.) with increasing probability scores indicating increasing likelihood that input image data 112 includes an object of the corresponding object class.
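For illustration only, one common way to produce per-class scores in the zero-to-one range, as described above, is a softmax over the final layer's outputs; the class list and logit values here are hypothetical:

```python
import math

# Illustrative softmax: converts final-layer outputs (logits) into scores
# in [0, 1], one per available class, that sum to one.

def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["car", "person", "truck", "dog", "street light"]
logits = [2.0, 0.5, 1.0, -1.0, 0.0]        # hypothetical final-layer outputs
probs = softmax(logits)
best = classes[probs.index(max(probs))]    # highest score -> most likely class
```

A downstream object recognition stage might threshold these scores or simply report the top-scoring class, as with the object recognition indicators discussed below.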
- Adaptive CNN 400 may be configured to any number of configurations such as configurations 491, 492, 493, 494 based on CNN configuration data 114. For example, in response to a change in the computational resource level available for processing subsequent input image data 112, configuration 491, 492, or 494 may be selected. In an embodiment, in response to a change to another computational resource level greater than the previously discussed computational resource level available for processing second input image data 112, configuration 491 or 492 may be selected. Relative to configuration 493, in both of configurations 491, 492, each convolutional filter in subset 441 and one or more additional convolutional filters of available convolutional filters 401 are applied at convolutional layer 481 to second input image data 112. Notably, subset 421 includes all convolutional filters in subset 441 as well as additional convolutional filters.
- Available convolutional filters 401 include all convolutional filters in subset 441 (and those in subset 421) and, indeed, include all available convolutional filters 401 of convolutional layer 481. Similarly, in both of configurations 491, 492, each convolutional filter in subsets 443, 445 and one or more additional convolutional filters of available convolutional filters 403, 405, respectively, are applied at convolutional layers 482, 483 to the corresponding feature maps, as illustrated. Furthermore, in both of configurations 491, 492, each convolutional filter in subset 447 and one or more additional convolutional filters of available convolutional filters 407 are applied at convolutional layer 484 to the corresponding feature maps. As discussed herein, convolutional filters shared between configurations apply the same filter coefficients. Similarly, fully connected layer weights shared between configurations apply the same coefficients and parameters.
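On-the-fly configuration selection of the kind performed by controller 104 can be sketched as a simple budget lookup; the relative cost values and helper name below are illustrative assumptions, not from the patent:

```python
# Hedged sketch of inference-time configuration selection: the controller
# maps the currently available computational resource level to the
# highest-quality configuration whose relative cost fits the budget.

CONFIGS = [                     # (configuration label, relative cost), widest first
    ("491", 1.00),              # full quality
    ("492", 0.60),
    ("493", 0.30),
    ("494", 0.10),              # lowest quality / highest speed
]

def select_config(available_budget):
    """Return the highest-quality configuration that fits the budget."""
    for name, cost in CONFIGS:
        if cost <= available_budget:
            return name
    return CONFIGS[-1][0]       # fall back to the cheapest configuration

choice_high = select_config(1.0)    # ample resources
choice_mid = select_config(0.35)    # reduced resources
```

Because all configurations share one pretrained weight bank, switching between them requires only changing which filter subsets are applied, not loading a different model.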
- configuration 491 is selected and processing is performed in analogy to the processing discussed with respect to configuration 493 such that all of available convolutional filters 401 of convolutional layer 481 are applied to input image data 112 to generate corresponding feature maps 402, all of available convolutional filters 403 of convolutional layer 482 are applied to feature maps 402 to generate corresponding feature maps 404, all available convolutional filters 405 of one or more convolutional layers 483 are applied to feature maps 404 (and subsequently formed feature maps) to generate corresponding feature maps 406, all available convolutional filters 407 are applied to feature maps 406 (or feature maps 404 if no intervening one or more convolutional layers 483 are implemented) to generate feature vector 408, and fully connected layer 409 is applied to feature vector 408 to generate class probabilities 610.
- configuration 494 is selected and processing is such that subset 461 of available convolutional filters 401 of convolutional layer 481 are applied to input image data 112 to generate corresponding feature maps 462, subset 463 of available convolutional filters 403 of convolutional layer 482 are applied to feature maps 462 to generate corresponding feature maps 464, if applicable, one or more subsets 465 of one or more sets of available convolutional filters 405 of one or more convolutional layers 483 are applied to feature maps 464 (and subsequently formed feature maps) to generate corresponding feature maps 466, subset 467 of available convolutional filters 407 are applied to feature maps 466 (or feature maps 464 if no intervening one or more convolutional layers 483 are implemented) to generate feature vector 468, and subset 469 of fully connected layer weights 409 is applied to feature vector 468 to generate class probabilities 679. Similar processing may be applied with respect to configuration 492 to generate class probabilities 630.
- class probabilities 610, 630, 650, or 679 are provided as output from adaptive CNN module 103 for use in any suitable image processing, object tracking, object recognition, artificial intelligence, or other application.
- class probabilities 610, 630, 650, or 679 are provided to an object recognition module 611 for use in object recognition or detection.
- object recognition indicator indicates any data or data structure indicating object recognition or detection such as one or more class probabilities, one or more locations of detected objects within image data, a flag indicating an object is detected in image data, etc.
- Adaptive CNN 400 may have any characteristics discussed with respect to FIG. 4. Notably, the convolutional filters, layers, shared weights, fully connected layers, etc. as shown in FIG. 6 are finalized, trained parameters of adaptive CNN 400 (e.g., pretrained parameters of adaptive CNN 400) used for implementation. Furthermore, the discussed processing may be repeated for any number of input images represented by input image data 112. Between any of such input images, a different one of configurations 491, 492, 493, 494 may be selected for processing. However, such configurations 491, 492, 493, 494 may also remain constant for any number of input images. Such changing between configurations 491, 492, 493, 494 may occur when computational resources vary, for example.
- FIG. 7 is a flow diagram illustrating an example process 700 for performing object recognition using a convolutional neural network that is adaptive based on available computational resources, arranged in accordance with at least some implementations of the present disclosure.
- Process 700 may include one or more operations 701-703 as illustrated in FIG. 7.
- Process 700 may form at least part of an object recognition process.
- process 700 may form at least part of an object recognition process performed by system 100 as discussed herein.
- process 700 will be described herein with reference to system 800 of FIG. 8.
- FIG. 8 is an illustrative diagram of an example system 800 for performing object recognition using a convolutional neural network that is adaptive based on available computational resources, arranged in accordance with at least some implementations of the present disclosure.
- system 800 includes one or more central processing units 801 (i.e., central processor(s)), a graphics processing unit 802 (i.e., graphics processor), and memory stores 803.
- graphics processing unit 802 may include or implement normalization module 102, adaptive CNN module 103, and controller 104. Such modules may be implemented to perform operations as discussed herein.
- memory stores 803 may store input image data, CNN characteristics and parameters data, convolutional filter coefficients, feature maps, feature vectors, CNN output data, class probabilities, or any other data or data structure discussed herein.
- normalization module 102, adaptive CNN module 103, and controller 104 are implemented via graphics processing unit 802.
- one or more or portions of normalization module 102, adaptive CNN module 103, and controller 104 are implemented via central processing units 801 or an image processing unit (not shown) of system 800.
- one or more or portions of normalization module 102, adaptive CNN module 103, and controller 104 are implemented via an image processing pipeline, graphics pipeline, or the like.
- Graphics processing unit 802 may include any number and type of graphics processing units that may provide the operations as discussed herein. Such operations may be implemented via software or hardware or a combination thereof.
- graphics processing unit 802 may include circuitry dedicated to manipulate image data, CNN data, etc. obtained from memory stores 803.
- Central processing units 801 may include any number and type of processing units or modules that may provide control and other high level functions for system 800 and/or provide any operations as discussed herein.
- Memory stores 803 may be any type of memory such as volatile memory (e.g., Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), etc.) or non-volatile memory (e.g., flash memory, etc.), and so forth.
- memory stores 803 may be implemented by cache memory.
- one or more or portions of normalization module 102, adaptive CNN module 103, and controller 104 are implemented via an execution unit (EU) of graphics processing unit 802.
- the EU may include, for example, programmable logic or circuitry such as a logic core or cores that may provide a wide array of programmable logic functions.
- one or more or portions of normalization module 102, adaptive CNN module 103, and controller 104 are implemented via dedicated hardware such as fixed function circuitry or the like.
- Fixed function circuitry may include dedicated logic or circuitry and may provide a set of fixed function entry points that may map to the dedicated logic for a fixed purpose or function.
- one or more or portions of normalization module 102, adaptive CNN module 103, and controller 104 are implemented via an application specific integrated circuit (ASIC).
- the ASIC may include integrated circuitry customized to perform the operations discussed herein.
- process 700 begins at operation 701, where, in response to a first computational resource level available for processing first input image data, only a first subset of available convolutional filters are applied at a convolutional layer of a convolutional neural network (CNN) to first feature maps corresponding to the first input image data.
- Processing continues at operation 702, where, in response to a change to a second computational resource level greater than the first computational resource level available for processing second input image data, the first subset and one or more additional convolutional filters of the available convolutional filters are applied at the convolutional layer of the CNN to second feature maps corresponding to the second input image data.
- applying the first subset of available convolutional filters to the first feature maps and applying the first subset of available convolutional filters to the second feature maps includes applying the same pretrained filter weights.
- applying the first subset of convolutional filters generates a number of third feature maps equal to the number of convolutional filters in the first subset of convolutional filters and applying the first subset and the one or more additional convolutional filters generates a number of fourth feature maps equal to the number of third feature maps plus the number of additional convolutional filters such that the third feature maps and the fourth feature maps are all of the same resolution.
- process 700 further includes applying, in response to the first computational resource level, only a second subset of second available convolutional filters at a second convolutional layer of the CNN to the first input image data and applying, in response to the second computational resource level, the second subset and one or more additional second convolutional filters of the second available convolutional filters at the second convolutional layer of the CNN to the second input image data.
- a ratio of a number of convolutional filters in the first subset to a number of convolutional filters in the first subset plus the one or more additional convolutional filters is the same as a second ratio of a number of convolutional filters in the second subset to a number of convolutional filters in the second subset plus the one or more additional second convolutional filters.
- process 700 further includes applying, in response to the first computational resource level, only a second subset of available fully connected channels at a second layer of the CNN to third feature maps corresponding to the first input image data and applying, in response to the second computational resource level, the second subset and one or more additional fully connected channels of the available fully connected channels at the second layer of the CNN to fourth feature maps corresponding to the second input image data.
- process 700 further includes applying, in response to a change to a third computational resource level less than the first computational resource level available for processing third input image data, only a second subset of available convolutional filters at the convolutional layer of the CNN to third feature maps corresponding to the third input image, such that the second subset has fewer convolutional filters than the first subset and each convolutional filter of the second subset is in the first subset.
- a number of convolutional filters in the first subset is not less than twice a number of convolutional filters in the second subset and a number of convolutional filters in the first subset and the one or more additional convolutional filters is not less than twice the number of convolutional filters in the first subset.
- first and second object recognition indicators are transmitted for the first and second images, respectively, based at least in part on said applications of the convolutional layer of the CNN.
- the first and second object recognition indicators may include any suitable object recognition or detection indicators such as class probability scores, one or more locations of detected objects within image data, or one or more flags indicating an object is detected in image data.
- process 700 further includes training the CNN.
- training the CNN includes selecting a number of available configurations for the CNN, determining a number of convolutional filters for application at each convolutional layer for each of the available configurations, and training each of the available configurations in conjunction with one another, such that common convolutional filters of the available configurations share filter weights in the training, to generate finalized weights for the CNN.
- training each of the available configurations includes, at each of a plurality of training iterations, performing a number of back-propagations equal to the number of available configurations to determine batch gradient descents and updating convolutional filter weights using the batch gradient descents.
- training each of the available configurations, at each of a plurality of training iterations includes a layer-wise dropout of convolutional filters at each convolutional layer to train the full CNN and, subsequently, reduced CNN configurations.
- Process 700 may provide for generating object recognition or detection data for any number of input images. Process 700 may be repeated any number of times either in series or in parallel for any number of input images, input images of a video sequence of input images, video frames, etc.
- Various components of the systems described herein may be implemented in software, firmware, and/or hardware and/or any combination thereof.
- various components of devices or systems discussed herein may be provided, at least in part, by hardware of a computing System-on-a-Chip (SoC) such as may be found in a computing system such as, for example, a computer, a laptop computer, a tablet, or a smart phone.
- although implementation of the example processes discussed herein may include the undertaking of all operations shown in the order illustrated, the present disclosure is not limited in this regard and, in various examples, implementation of the example processes herein may include only a subset of the operations shown, operations performed in a different order than illustrated, or additional operations.
- any one or more of the operations discussed herein may be undertaken in response to instructions provided by one or more computer program products.
- Such program products may include signal bearing media providing instructions that, when executed by, for example, a processor, may provide the functionality described herein.
- the computer program products may be provided in any form of one or more machine-readable media.
- a processor including one or more graphics processing unit(s) or processor core(s) may undertake one or more of the blocks of the example processes herein in response to program code and/or instructions or instruction sets conveyed to the processor by one or more machine-readable media.
- a machine-readable medium may convey software in the form of program code and/or instructions or instruction sets that may cause any of the devices and/or systems described herein to implement at least portions of the discussed operations, modules, or components discussed herein.
- module refers to any combination of software logic, firmware logic, hardware logic, and/or circuitry configured to provide the functionality described herein.
- the software may be embodied as a software package, code and/or instruction set or instructions, and "hardware", as used in any implementation described herein, may include, for example, singly or in any combination, hardwired circuitry, programmable circuitry, state machine circuitry, fixed function circuitry, execution unit circuitry, and/or firmware that stores instructions executed by programmable circuitry.
- the modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), system on-chip (SoC), and so forth.
- FIG. 9 is an illustrative diagram of an example system 900, arranged in accordance with at least some implementations of the present disclosure.
- system 900 may be a computing system although system 900 is not limited to this context.
- system 900 may be incorporated into a personal computer (PC), laptop computer, ultra-laptop computer, tablet, phablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet or smart television), mobile internet device (MID), messaging device, data communication device, peripheral device, gaming console, wearable device, display device, all-in-one device, two-in-one device, and so forth.
- system 900 includes a platform 902 coupled to a display 920.
- Platform 902 may receive content from a content device such as content services device(s) 930 or content delivery device(s) 940 or other similar content sources such as a camera or camera module or the like.
- a navigation controller 950 including one or more navigation features may be used to interact with, for example, platform 902 and/or display 920. Each of these components is described in greater detail below.
- platform 902 may include any combination of a chipset 905, processor 910, memory 912, antenna 913, storage 914, graphics subsystem 915, applications 916 and/or radio 918.
- Chipset 905 may provide intercommunication among processor 910, memory 912, storage 914, graphics subsystem 915, applications 916 and/or radio 918.
- chipset 905 may include a storage adapter (not depicted) capable of providing intercommunication with storage 914.
- Processor 910 may be implemented as Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors, x86 instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU). In various implementations, processor 910 may be dual-core processor(s), dual-core mobile processor(s), and so forth.
- Memory 912 may be implemented as a volatile memory device such as, but not limited to, a Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), or Static RAM (SRAM).
- Storage 914 may be implemented as a non-volatile storage device such as, but not limited to, a magnetic disk drive, optical disk drive, tape drive, an internal storage device, an attached storage device, flash memory, battery backed-up SDRAM (synchronous DRAM), and/or a network accessible storage device.
- storage 914 may include technology to increase the storage performance and enhanced protection for valuable digital media when multiple hard drives are included, for example.
- Graphics subsystem 915 may perform processing of images such as still images, graphics, or video for display. Graphics subsystem 915 may be a graphics processing unit (GPU), a visual processing unit (VPU), or an image processing unit, for example. In some examples, graphics subsystem 915 may perform scanned image rendering as discussed herein.
- An analog or digital interface may be used to communicatively couple graphics subsystem 915 and display 920.
- the interface may be any of a High-Definition Multimedia Interface, DisplayPort, wireless HDMI, and/or wireless HD compliant techniques.
- Graphics subsystem 915 may be integrated into processor 910 or chipset 905. In some implementations, graphics subsystem 915 may be a stand-alone device communicatively coupled to chipset 905.
- The image processing techniques described herein may be implemented in various hardware architectures. For example, image processing functionality may be integrated within a chipset. Alternatively, a discrete graphics and/or image processor and/or application specific integrated circuit may be used. As still another implementation, the image processing may be provided by a general purpose processor, including a multi-core processor. In further embodiments, the functions may be implemented in a consumer electronics device.
- Radio 918 may include one or more radios capable of transmitting and receiving signals using various suitable wireless communications techniques. Such techniques may involve communications across one or more wireless networks.
- Example wireless networks include (but are not limited to) wireless local area networks (WLANs), wireless personal area networks (WPANs), wireless metropolitan area network (WMANs), cellular networks, and satellite networks. In communicating across such networks, radio 918 may operate in accordance with one or more applicable standards in any version.
- Display 920 may include any flat panel monitor or display.
- Display 920 may include, for example, a computer display screen, touch screen display, video monitor, television-like device, and/or a television.
- Display 920 may be digital and/or analog.
- Display 920 may be a holographic display.
- Display 920 may be a transparent surface that may receive a visual projection.
- Such projections may convey various forms of information, images, and/or objects.
- For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application.
- Platform 902 may display user interface 922 on display 920.
- Content services device(s) 930 may be hosted by any national, international and/or independent service and thus accessible to platform 902 via the Internet, for example.
- Content services device(s) 930 may be coupled to platform 902 and/or to display 920.
- Platform 902 and/or content services device(s) 930 may be coupled to a network 960 to communicate (e.g., send and/or receive) media information to and from network 960.
- Content delivery device(s) 940 also may be coupled to platform 902 and/or to display 920.
- Content services device(s) 930 may include a cable television box, personal computer, network, telephone, Internet-enabled device or appliance capable of delivering digital information and/or content, and any other similar device capable of uni-directionally or bi-directionally communicating content between content providers and platform 902 and/or display 920, via network 960 or directly. It will be appreciated that the content may be communicated uni-directionally and/or bi-directionally to and from any one of the components in system 900 and a content provider via network 960. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth.
- Content services device(s) 930 may receive content such as cable television programming including media information, digital information, and/or other content.
- Content providers may include any cable or satellite television or radio or Internet content providers. The provided examples are not meant to limit implementations in accordance with the present disclosure in any way.
- Platform 902 may receive control signals from navigation controller 950 having one or more navigation features.
- The navigation features of navigation controller 950 may be used to interact with user interface 922, for example.
- Navigation controller 950 may be a pointing device: a computer hardware component (specifically, a human interface device) that allows a user to input spatial (e.g., continuous and multi-dimensional) data into a computer.
- Many systems such as graphical user interfaces (GUI), televisions, and monitors allow the user to control and provide data to the computer or television using physical gestures.
- Movements of the navigation features of navigation controller 950 may be replicated on a display (e.g., display 920) by movements of a pointer, cursor, focus ring, or other visual indicators displayed on the display.
- The navigation features located on navigation controller 950 may be mapped to virtual navigation features displayed on user interface 922, for example.
- Navigation controller 950 may not be a separate component but may be integrated into platform 902 and/or display 920. The present disclosure, however, is not limited to the elements or the context shown or described herein.
- Drivers may include technology to enable users to instantly turn platform 902 on and off, like a television, with the touch of a button after initial boot-up, when enabled, for example.
- Program logic may allow platform 902 to stream content to media adaptors or other content services device(s) 930 or content delivery device(s) 940 even when the platform is turned "off."
- Chipset 905 may include hardware and/or software support for 5.1 surround sound audio and/or high definition 9.1 surround sound audio, for example.
- Drivers may include a graphics driver for integrated graphics platforms.
- The graphics driver may include a peripheral component interconnect (PCI) Express graphics card.
- Any one or more of the components shown in system 900 may be integrated.
- Platform 902 and content services device(s) 930 may be integrated, or platform 902 and content delivery device(s) 940 may be integrated, or platform 902, content services device(s) 930, and content delivery device(s) 940 may be integrated, for example.
- Platform 902 and display 920 may be an integrated unit. Display 920 and content service device(s) 930 may be integrated, or display 920 and content delivery device(s) 940 may be integrated, for example. These examples are not meant to limit the present disclosure.
- System 900 may be implemented as a wireless system, a wired system, or a combination of both.
- When implemented as a wireless system, system 900 may include components and interfaces suitable for communicating over a wireless shared media, such as one or more antennas, transmitters, receivers, transceivers, amplifiers, filters, control logic, and so forth.
- An example of a wireless shared media may include portions of a wireless spectrum, such as the RF spectrum and so forth.
- When implemented as a wired system, system 900 may include components and interfaces suitable for communicating over wired communications media, such as input/output (I/O) adapters, physical connectors to connect the I/O adapter with a corresponding wired communications medium, a network interface card (NIC), disc controller, video controller, audio controller, and the like.
- Examples of wired communications media may include a wire, cable, metal leads, printed circuit board (PCB), backplane, switch fabric, semiconductor material, twisted-pair wire, co-axial cable, fiber optics, and so forth.
- Platform 902 may establish one or more logical or physical channels to communicate information.
- The information may include media information and control information.
- Media information may refer to any data representing content meant for a user. Examples of content may include, for example, data from a voice conversation, videoconference, streaming video, electronic mail (“email”) message, voice mail message, alphanumeric symbols, graphics, image, video, text and so forth. Data from a voice conversation may be, for example, speech information, silence periods, background noise, comfort noise, tones and so forth.
- Control information may refer to any data representing commands, instructions or control words meant for an automated system. For example, control information may be used to route media information through a system, or instruct a node to process the media information in a predetermined manner. The embodiments, however, are not limited to the elements or in the context shown or described in FIG. 9.
- FIG. 10 illustrates an example small form factor device 1000, arranged in accordance with at least some implementations of the present disclosure.
- System 900 may be implemented via device 1000.
- Other systems, components, or modules discussed herein, or portions thereof, may be implemented via device 1000.
- Device 1000 may be implemented as a mobile computing device having wireless capabilities.
- A mobile computing device may refer to any device having a processing system and a mobile power source or supply, such as one or more batteries, for example.
- Examples of a mobile computing device may include a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, smart device (e.g., smartphone, smart tablet or smart mobile television), mobile internet device (MID), messaging device, data communication device, cameras (e.g., point-and-shoot cameras, super-zoom cameras, digital single-lens reflex (DSLR) cameras), and so forth.
- Examples of a mobile computing device also may include computers that are arranged to be implemented by a motor vehicle or robot, or worn by a person, such as wrist computers, finger computers, ring computers, eyeglass computers, belt-clip computers, arm-band computers, shoe computers, clothing computers, and other wearable computers.
- A mobile computing device may be implemented as a smartphone capable of executing computer applications, as well as voice communications and/or data communications.
- Although voice communications and/or data communications may be described with a mobile computing device implemented as a smartphone by way of example, it may be appreciated that other embodiments may be implemented using other wireless mobile computing devices as well. The embodiments are not limited in this context.
- Device 1000 may include a housing with a front 1001 and a back 1002.
- Device 1000 includes a display 1004, an input/output (I/O) device 1006, a color camera 1021, a color camera 1022, and an integrated antenna 1008.
- I/O device 1006 may include any suitable I/O device for entering information into a mobile computing device. Examples for I/O device 1006 may include an alphanumeric keyboard, a numeric keypad, a touch pad, input keys, buttons, switches, microphones, speakers, voice recognition device and software, and so forth.
- Device 1000 may include color cameras 1021, 1022, and a flash 1010 integrated into back 1002 (or elsewhere) of device 1000.
- In other examples, color cameras 1021, 1022, and flash 1010 may be integrated into front 1001 of device 1000, or both front and back sets of cameras may be provided.
- Color cameras 1021, 1022 and a flash 1010 may be components of a camera module to originate color image data that may be processed into an image or streaming video that is output to display 1004 and/or communicated remotely from device 1000 via antenna 1008 for example.
- Various embodiments may be implemented using hardware elements, software elements, or a combination of both.
- Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth.
- Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
- One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein.
- Such representations known as IP cores may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
- The above embodiments are not limited in this regard and, in various implementations, the above embodiments may include undertaking only a subset of such features, undertaking a different order of such features, undertaking a different combination of such features, and/or undertaking features additional to those features explicitly listed.
- The scope of the embodiments should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2018/000828 WO2020122753A1 (en) | 2018-12-14 | 2018-12-14 | On the fly adaptive convolutional neural network for variable computational resources |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3895059A1 (en) | 2021-10-20 |
Family
ID=65278442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18842859.3A Pending EP3895059A1 (en) | 2018-12-14 | 2018-12-14 | On the fly adaptive convolutional neural network for variable computational resources |
Country Status (4)
Country | Link |
---|---|
US (1) | US11928860B2 (en) |
EP (1) | EP3895059A1 (en) |
CN (1) | CN113168502A (en) |
WO (1) | WO2020122753A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI696129B (en) * | 2019-03-15 | 2020-06-11 | 華邦電子股份有限公司 | Memory chip capable of performing artificial intelligence operation and operation method thereof |
US20210056357A1 (en) * | 2019-08-19 | 2021-02-25 | Board Of Trustees Of Michigan State University | Systems and methods for implementing flexible, input-adaptive deep learning neural networks |
CN111294512A (en) * | 2020-02-10 | 2020-06-16 | 深圳市铂岩科技有限公司 | Image processing method, image processing apparatus, storage medium, and image pickup apparatus |
CN114581767B (en) * | 2022-01-19 | 2024-03-22 | 上海土蜂科技有限公司 | Image processing system, method and computer device thereof |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102276339B1 (en) * | 2014-12-09 | 2021-07-12 | 삼성전자주식회사 | Apparatus and method for training convolutional neural network for approximation of convolutional neural network |
US10185891B1 (en) * | 2016-07-08 | 2019-01-22 | Gopro, Inc. | Systems and methods for compact convolutional neural networks |
WO2018084974A1 (en) | 2016-11-04 | 2018-05-11 | Google Llc | Convolutional neural network |
US10997502B1 (en) * | 2017-04-13 | 2021-05-04 | Cadence Design Systems, Inc. | Complexity optimization of trainable networks |
US20180336468A1 (en) * | 2017-05-16 | 2018-11-22 | Nec Laboratories America, Inc. | Pruning filters for efficient convolutional neural networks for image recognition in surveillance applications |
US10051423B1 (en) * | 2017-06-02 | 2018-08-14 | Apple Inc. | Time of flight estimation using a convolutional neural network |
US20200279156A1 (en) * | 2017-10-09 | 2020-09-03 | Intel Corporation | Feature fusion for multi-modal machine learning analysis |
- 2018-12-14 US US17/058,077 patent/US11928860B2/en active Active
- 2018-12-14 EP EP18842859.3A patent/EP3895059A1/en active Pending
- 2018-12-14 CN CN201880095126.8A patent/CN113168502A/en active Pending
- 2018-12-14 WO PCT/RU2018/000828 patent/WO2020122753A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN113168502A (en) | 2021-07-23 |
US11928860B2 (en) | 2024-03-12 |
US20210216747A1 (en) | 2021-07-15 |
WO2020122753A1 (en) | 2020-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11538164B2 (en) | Coupled multi-task fully convolutional networks using multi-scale contextual information and hierarchical hyper-features for semantic image segmentation | |
US11676278B2 (en) | Deep learning for dense semantic segmentation in video with automated interactivity and improved temporal coherence | |
US10885384B2 (en) | Local tone mapping to reduce bit depth of input images to high-level computer vision tasks | |
US10685262B2 (en) | Object recognition based on boosting binary convolutional neural network features | |
US20240029193A1 (en) | High fidelity interactive segmentation for video data with deep convolutional tessellations and context aware skip connections | |
US11928860B2 (en) | On the fly adaptive convolutional neural network for variable computational budget | |
US20240112035A1 (en) | 3d object recognition using 3d convolutional neural network with depth based multi-scale filters | |
US9860553B2 (en) | Local change detection in video | |
US20170076195A1 (en) | Distributed neural networks for scalable real-time analytics | |
US11164317B2 (en) | Real-time mask quality predictor | |
US20200267310A1 (en) | Single image ultra-wide fisheye camera calibration via deep learning | |
US11776263B2 (en) | Bidirectional pairing architecture for object detection in video | |
US20230368493A1 (en) | Method and system of image hashing object detection for image processing | |
US20240005628A1 (en) | Bidirectional compact deep fusion networks for multimodality visual analysis applications | |
WO2021253148A1 (en) | Input image size switchable network for adaptive runtime efficient image classification | |
WO2023028908A1 (en) | Dynamic temporal normalization for deep learning in video understanding applications | |
US20240005649A1 (en) | Poly-scale kernel-wise convolution for high-performance visual recognition applications |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | STAA | Information on the status of an ep patent application or granted ep patent | STATUS: UNKNOWN
 | STAA | Information on the status of an ep patent application or granted ep patent | STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | ORIGINAL CODE: 0009012
 | STAA | Information on the status of an ep patent application or granted ep patent | STATUS: REQUEST FOR EXAMINATION WAS MADE
20201112 | 17P | Request for examination filed |
 | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
 | DAV | Request for validation of the european patent (deleted) |
 | DAX | Request for extension of the european patent (deleted) |
 | STAA | Information on the status of an ep patent application or granted ep patent | STATUS: EXAMINATION IS IN PROGRESS
20230605 | 17Q | First examination report despatched |