WO2018027453A1 - Artificial neural network - Google Patents

Artificial neural network

Info

Publication number
WO2018027453A1
WO2018027453A1 (PCT/CN2016/093904)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
stochastic
artificial neural
linear unit
rectified linear
Prior art date
Application number
PCT/CN2016/093904
Other languages
French (fr)
Inventor
Xiaoheng JIANG
Original Assignee
Nokia Technologies Oy
Nokia Technologies (Beijing) Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy and Nokia Technologies (Beijing) Co., Ltd.
Priority to PCT/CN2016/093904 (WO2018027453A1)
Priority to EP16911909.6A (EP3497622A4)
Priority to CN201680088205.7A (CN109564633B)
Priority to US16/321,285 (US10956788B2)
Publication of WO2018027453A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

  • At least some embodiments of the present invention find industrial application in optimizing machine recognition, to, for example, reduce traffic accidents in self-driving vehicles.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

According to an example aspect of the present invention, there is provided an apparatus comprising memory configured to store data defining, at least partly, an artificial neural network, and at least one processing core configured to train the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.

Description

ARTIFICIAL NEURAL NETWORK

FIELD
The present invention relates to artificial neural networks, such as, for example, convolutional artificial neural networks.
BACKGROUND
Machine learning and machine recognition finds several applications, such as, for example, automated passport control at airports, where a digital image of a person’s face may be compared to biometric information, stored in a passport, characterizing the person’s face.
Another example of machine recognition is in handwriting or printed document text recognition, to render contents of books searchable, for example. A yet further example is pedestrian recognition, wherein, ultimately, a self-driving car is thereby seen as being enabled to become aware a pedestrian is ahead and the car can avoid running over the pedestrian.
In addition to visual recognition, spoken language may be the subject of machine recognition. When spoken language is recognized, it may be subsequently input to a parser to provide commands to a digital personal assistant, or it may be provided to a machine translation program to thereby obtain a text in another language, corresponding in meaning to the spoken language.
Machine recognition technologies employ algorithms engineered for this purpose. For example, artificial neural networks may be used to implement machine vision applications. Artificial neural networks may be referred to herein simply as neural networks. Machine recognition algorithms may comprise processing functions; in recognition of images, such processing functions may include, for example, filtering, such as morphological filtering, thresholding, edge detection, pattern recognition and object dimension measurement.
A neural network may comprise, for example, fully connected layers and convolutional layers. A fully connected layer may comprise a layer wherein all neurons have connections to all neurons on an adjacent layer, such as, for example, a previous layer. A convolutional layer may comprise a layer wherein neurons receive input from a part of a previous layer, such part being referred to as a receptive field, for example.
SUMMARY OF THE INVENTION
The invention is defined by the features of the independent claims. Some specific embodiments are defined in the dependent claims.
According to a first aspect of the present invention, there is provided an apparatus comprising memory configured to store data defining, at least partly, an artificial neural network, and at least one processing core configured to train the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
Various embodiments of the first aspect may comprise at least one feature from the following bulleted list:
· the stochastical selection comprises a random or pseudorandom stochastic selection
· the dataset comprises a plurality of test images and the at least one processing core is configured to vary the stochastically selected value for each test image
· the at least one processing core is configured to apply a first stochastic rectified linear unit between a first pair of convolutional layers in the artificial neural network, and a second stochastic rectified linear unit between a second pair of convolutional layers in the artificial neural network
· the at least one stochastic rectified linear unit is configured to produce a zero output from a negative input
· the at least one processing core is configured to implement a stochastic dropout function in the artificial neural network, the dropout feature stochastically setting half of activations within a layer to zero for each training sample
· the stochastical selection comprises that the value is randomly or pseudorandomly selected from the range (1 − a, 1 + a)
· the value a is 0.8
· the value a is 0.3
· the artificial neural network is a pattern recognition neural network.
According to a second aspect of the present invention, there is provided a method comprising storing data defining, at least partly, an artificial neural network, and training the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
Various embodiments of the second aspect may comprise at least one feature from the following bulleted list:
· the stochastical selection comprises a random or pseudorandom stochastic selection
· the dataset comprises a plurality of test images and the at least one processing core is configured to vary the stochastically selected value for each test image
· the method further comprises applying a first stochastic rectified linear unit between a first pair of convolutional layers in the artificial neural network, and applying a second stochastic rectified linear unit between a second pair of convolutional layers in the artificial neural network
· the at least one stochastic rectified linear unit is configured to produce a zero output from a negative input
· the method further comprises implementing a stochastic dropout function in the artificial neural network, the dropout feature stochastically setting half of activations within a layer to zero for each training sample
· the stochastical selection comprises that the value is randomly or pseudorandomly selected from the range (1 − a, 1 + a)
· the value a is 0.8
· the value a is 0.3
· the artificial neural network is a pattern recognition neural network.
According to a third aspect of the present invention, there is provided an apparatus comprising means for storing data defining, at least partly, an artificial neural network, and means for training the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
According to a fourth aspect of the present invention, there is provided a non-transitory computer readable medium having stored thereon a set of computer readable instructions that, when executed by at least one processor, cause an apparatus to at least store data defining, at least partly, an artificial neural network, and train the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
According to a fifth aspect of the present invention, there is provided a computer program configured to cause a method in accordance with the second aspect to be performed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGURE 1 illustrates an example system capable of supporting at least some embodiments of the present invention;
FIGURE 2 illustrates rectifiers;
FIGURE 3 illustrates an example apparatus capable of supporting at least some embodiments of the present invention;
FIGURE 4 illustrates a neural network in accordance with at least some embodiments of the present invention, and
FIGURE 5 is a flow graph of a method in accordance with at least some embodiments of the present invention.
EMBODIMENTS
Using an activation function with randomization, effects of overtraining a neural network to a specific training dataset may be alleviated. In detail, a stochastic rectified linear unit that varies a gradient of a linear function defining an output has been found to outperform classical rectified linear units. The stochastic rectified linear unit has also been found to outperform noisy rectified linear units, which use additive Gaussian noise to randomize the output.
FIGURE 1 illustrates an example system capable of supporting at least some embodiments of the present invention. FIGURE 1 has a view 110 of a road 101, on which a pedestrian 120 is walking. While described herein in connection with FIGURE 1 in terms of detecting pedestrians, the invention is not restricted thereto, but as the skilled person will understand, the invention is applicable also more generally to machine recognition in visual, audio or other kind of data. For example, bicyclist recognition, handwriting recognition, facial recognition, traffic sign recognition, voice recognition, language recognition, sign language recognition and/or spam email recognition may benefit from the present invention, depending on the embodiment in question.
In FIGURE 1, road 101 is imaged by a camera. The camera may be configured to capture a view 110 that covers the road, at least in part. The camera may be configured to pre-process image data obtained from an image capture device, such as a charge-coupled device, CCD, comprised in the camera. Examples of pre-processing include reduction to black and white, contrast adjustment and/or brightness balancing to increase a dynamic range present in the captured image. In some embodiments, the image data is also scaled to a bit depth suitable for feeding into an image recognition algorithm, such as AdaBoost, for example. Pre-processing may include selection of an area of interest, such as area 125, for example, for feeding into the image recognition algorithm. Pre-processing may be absent or limited in nature, depending on the embodiment. The camera may be installed, for example, in a car that is configured to drive itself, or collect training data. Alternatively, the camera may be installed in a car designed to be driven by a human driver, but to provide a warning and/or automatic braking if the car appears to be about to hit a pedestrian or an animal.
An image feed from the camera may be used to generate a test dataset for use in training a neural network. Such a dataset may comprise training samples. A training  sample may comprise a still image, such as a video image frame, or a short video clip, for example. Where the incoming data to be recognized is not visual data, the incoming data may comprise, for example, a vector of digital samples obtained from an analogue-to-digital converter. The analogue-to-digital converter may obtain an analogue feed from a microphone, for example, and generate the samples from the analogue feed. Overall, as discussed above, data of non-visual forms may also be the subject of machine recognition. For example, accelerometer or rotation sensor data may be used to detect whether a person is walking, running or falling. As a neural network may be trained to recognize objects in view 110, a training phase may precede a use phase, or test phase, of the neural network.
A challenge with training neural networks with test datasets is over-fitting of the neural network to the test dataset. As a neural network may comprise a large number of parameters, even millions of parameters, the network may become specialized in recognizing characteristics of the test dataset, rather than becoming specialized in performing the recognition task in a generic setting. To control the over-fitting problem, an element of randomization may be introduced between layers of the neural network.
One way to introduce an element of randomization between layers of the neural network is so-called dropout, where, during training, half of the activations are randomly, or stochastically, selected and set to zero. The selection may be re-done for each training sample, for example. Dropout may be seen as providing a way of approximately combining exponentially many different neural network architectures in an efficient manner. Dropout is typically applied to fully connected layers, where it produces more of a benefit; it does not seem to be similarly beneficial in convolutional layers.
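By way of illustration only (the patent gives no code), a minimal NumPy sketch of the dropout behaviour described above might look as follows; the function name is hypothetical, and the rescaling by the keep probability is a common convention assumed here rather than stated in the text.

```python
import numpy as np

def dropout_half(activations, rng=None):
    """Stochastically set roughly half of the activations to zero.

    Sketch only: the mask is meant to be redrawn for each training sample,
    and the surviving activations are rescaled by the keep probability
    (an assumption; the text only describes the zeroing).
    """
    rng = np.random.default_rng() if rng is None else rng
    activations = np.asarray(activations, dtype=float)
    keep = rng.random(activations.shape) < 0.5   # keep ~half of the units
    return activations * keep / 0.5
```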
Another way to introduce an element of randomization between layers of the neural network is stochastic pooling, wherein deterministic pooling operations, such as average and maximum pooling, are replaced with a stochastic procedure for regularizing convolutional neural networks. This procedure randomly picks the activation within each pooling region according to a multinomial distribution given by activities within the pooling region. In deep convolutional neural networks, pooling does not necessarily follow each layer. Consequently, stochastic pooling may be applied a few times. Stochastic  pooling needs to compute probabilities for each region at both training time and test time, resulting in an increased computational load in a device running the neural network.
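For comparison, the multinomial sampling step that stochastic pooling performs within one pooling region could be sketched roughly as follows; the helper name is hypothetical and the activations are assumed non-negative (e.g. post-rectifier), since the probabilities are formed from the activities themselves.

```python
import numpy as np

def stochastic_pool_region(region, rng=None):
    """Sample one activation from a pooling region, with probability
    proportional to the (non-negative) activity of each unit.

    A full implementation would apply this to every pooling region of every
    feature map, at both training and test time, hence the extra
    computational cost noted in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    acts = np.asarray(region, dtype=float).ravel()
    total = acts.sum()
    if total <= 0.0:                 # all-zero region: nothing to sample
        return 0.0
    probs = acts / total             # multinomial distribution over the region
    return acts[rng.choice(acts.size, p=probs)]
```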
The neural network is illustrated schematically in FIGURE 1 as first layer 130, rectifier 140 and second layer 150. An actual network may comprise more than two layers. Rectifier 140 may be comprised functionally in first layer 130 or second layer 150. Rectifier 140 may perform an activation function, and/or the rectifier may comprise a rectified linear unit, ReLU. The first and second layers may comprise convolutional layers. Alternatively, at least one, and optionally both, of first layer 130 and second layer 150 may comprise a fully connected layer.
Rectifier 140 may be configured to process an output of first layer 130, for input into second layer 150. For example, rectifier 140 may be configured to produce an output of zero from inputs that have negative values, effectively preventing negative values from being fed from first layer 130 to second layer 150. A traditional rectifier produces an output according to a function f such that f(x) = max(0, x). Values x may be comprised in the real numbers, represented in a digital system by floating-point values or an integer representation, for example.
A so-called noisy rectifier, NReLU, produces an output according to f such that f(x) = max(0, x + N(σ(x))), where N is Gaussian noise with variance σ(x), the Gaussian noise being employed to randomize the output of the rectifier. The variance may be obtained using all the units of one layer, for example.
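As a rough illustration, the noisy rectifier might be sketched as below; here the variance is passed in as a parameter rather than derived from the units of the layer, which is a simplification of the description above.

```python
import numpy as np

def nrelu(x, variance, rng=None):
    """Noisy rectifier: f(x) = max(0, x + N), where N is Gaussian noise whose
    variance would, per the text, be obtained from the units of the layer."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(variance), size=np.shape(x))
    return np.maximum(0.0, np.asarray(x, dtype=float) + noise)
```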
A stochastic rectifier, SReLU, in accordance with the present invention, operates by obtaining an output as f(x) = max(0, bx), such that multiplier b is randomly or pseudorandomly selected from the range (1 − a, 1 + a). The parameter a may take the value of 0.1, 0.3, 0.5, 0.8 or 0.9, for example. Multiplier b may be randomly or pseudorandomly re-obtained for each training sample, for example. Alternatively, multiplier b may be randomly or pseudorandomly re-obtained several times during training of the neural network, but not separately for each training sample. For example, multiplier b may be so re-obtained every ten, or every hundred, training samples. In other words, to obtain the output, the stochastic rectifier multiplies an input with a multiplier that is randomly selected. Put another way, a positive output is produced from a positive input such that the positive output is a linear function of the positive input, a gradient of the linear function having a variability. The variability may be random or pseudorandom, for example. In at least some embodiments of the SReLU, no noise is separately generated and added to obtain the output. For negative inputs, the stochastic rectifier may be arranged to return a zero output.
The stochastic rectifier, SReLU, may be used at training time, while at test time, also referred to simply as during use, a traditional rectifier may be used, wherein the output f(x) produced by input x is f(x) = max(0, x).
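Under the definitions above, the stochastic rectifier and its test-time counterpart could be sketched as follows; the function names are illustrative only, and one multiplier is drawn per call, corresponding to re-obtaining b once per training sample.

```python
import numpy as np

def srelu(x, a=0.5, rng=None):
    """Stochastic rectifier: f(x) = max(0, b*x), with the multiplier b drawn
    randomly or pseudorandomly from the range (1 - a, 1 + a)."""
    rng = np.random.default_rng() if rng is None else rng
    b = rng.uniform(1.0 - a, 1.0 + a)            # stochastically selected gradient
    return np.maximum(0.0, b * np.asarray(x, dtype=float))

def relu(x):
    """Traditional rectifier used at test time: f(x) = max(0, x)."""
    return np.maximum(0.0, np.asarray(x, dtype=float))
```

During training, srelu would be applied to a layer's activations, for example with a = 0.3 or a = 0.8 as in the embodiments listed above; at test time relu would be used in its place.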
The stochastic rectifier, as defined above, yields improved recognition results compared to both the traditional rectifier and the noisy rectifier. In a study conducted by the inventor, the following results were obtained, dropout being optionally used in a fully connected layer:
[Table of results omitted: recognition performance of the traditional, noisy and stochastic rectifiers, with and without dropout; original figure PCTCN2016093904-appb-000001.]
Introducing randomization into the neural network improves performance, since two similar training samples will produce similar, but not identical, responses under randomization. Thus the test dataset effectively becomes larger, leading to improved performance. Over-fitting is also prevented, since the neural network cannot fit exactly to the training samples, the training samples no longer producing identical, fully deterministic, outputs.
FIGURE 2 illustrates rectifiers. In the upper part of the figure, denoted a), is illustrated a traditional rectifier, ReLU, wherein f(x) = x for positive x and f(x) = 0 for negative or zero x. The response for positive x is linear with a gradient of unity.
In the middle part of the figure, denoted b), is a noisy rectifier, NReLU, wherein f(x) = max(0, x + N), N being Gaussian noise. The output of the rectifier for positive x lies between the two lines, denoted f(x) = x + 3σ and f(x) = x − 3σ. In other words, in NReLU, a randomly selected value is added to the output. Outputs of the NReLU will predominantly lie between the two lines, for positive inputs. For some slightly negative inputs, the NReLU may return a positive output in case the addition of Gaussian noise causes the output to exceed zero. Thus the upper, x + 3σ, line intersects the y-axis above the origin.
In the lower part of the figure, denoted c), is a stochastic rectifier, SReLU. The output of the rectifier for positive x lies between the two lines, denoted b1*x and b2*x. For negative x the output is zero. In other words, the output for a positive input is obtained by multiplying the input with a randomly selected value. In terms of the notation above, b1 = 1 + a and b2 = 1 − a. Expressed another way, a positive output is produced from a positive input such that the positive output is a linear function of the positive input, a gradient of the linear function having a variability. The SReLU may be configured, as illustrated, to return a zero output from a negative or zero input.
The benefit of SReLU over NReLU may be understood with reference to the figure: the range of variation in NReLU is constant, even for small input values, whereas in SReLU the range of variation decreases as the input approaches zero from the positive direction, which maintains signals in small-amplitude inputs better than NReLU does. Furthermore, compared to NReLU, SReLU is computationally more efficient, since SReLU directly multiplies each activation unit with the multiplier selected from the range. NReLU, on the other hand, calculates an input variance for each layer, and then adds a bias selected from a Gaussian distribution to each activation unit. SReLU may, in general, be employed in an artificial convolutional neural network.
FIGURE 3 illustrates an example apparatus capable of supporting at least some embodiments of the present invention. Illustrated is device 300, which may comprise, for example, a computing device such as a server, node or cloud computing device. Device 300 may be configured to run a neural network, such as is described herein. Comprised in device 300 is processor 310, which may comprise, for example, a single- or multi-core processor, wherein a single-core processor comprises one processing core and a multi-core processor comprises more than one processing core. Processor 310 may comprise more than one processor. A processing core may comprise, for example, a Cortex-A8 processing core by ARM Holdings or a Steamroller processing core produced by Advanced Micro Devices Corporation. Processor 310 may comprise at least one Qualcomm Snapdragon and/or Intel Core processor, for example. Processor 310 may comprise at least one application-specific integrated circuit, ASIC. Processor 310 may comprise at least one field-programmable gate array, FPGA. Processor 310 may be means for performing method steps in device 300. Processor 310 may be configured, at least in part by computer instructions, to perform actions.
Device 300 may comprise memory 320. Memory 320 may comprise random-access memory and/or permanent memory. Memory 320 may comprise at least one RAM chip. Memory 320 may comprise solid-state, magnetic, optical and/or holographic memory, for example. Memory 320 may be at least in part accessible to processor 310. Memory 320 may be at least in part comprised in processor 310. Memory 320 may be means for storing information. Memory 320 may comprise computer instructions that processor 310 is configured to execute. When computer instructions configured to cause processor 310 to perform certain actions are stored in memory 320, and device 300 overall is configured to run under the direction of processor 310 using computer instructions from memory 320, processor 310 and/or its at least one processing core may be considered to be configured to perform said certain actions. Memory 320 may be at least in part comprised in processor 310. Memory 320 may be at least in part external to device 300 but accessible to device 300. Computer instructions in memory 320 may comprise a plurality of applications or processes. For example, machine learning algorithms, such as an AdaBoost algorithm with its classifiers, may run in one application or process, a camera functionality may run in another application or process, and an output of a machine learning procedure may be provided to a further application or process, which may comprise an automobile driving process, for example, to cause a braking action to be triggered responsive to recognition of a pedestrian in a camera view.
Device 300 may comprise a transmitter 330. Device 300 may comprise a receiver 340. Transmitter 330 and receiver 340 may be configured to transmit and receive, respectively, information in accordance with at least one communication standard. Transmitter 330 may comprise more than one transmitter. Receiver 340 may comprise more than one receiver. Transmitter 330 and/or receiver 340 may be configured to operate in accordance with wireless local area network, WLAN, Ethernet, universal serial bus, USB, and/or worldwide interoperability for microwave access, WiMAX, standards, for example. Alternatively or additionally, a proprietary communication framework may be utilized.
Device 300 may comprise user interface, UI, 360. UI 360 may comprise at least one of a display, a keyboard, a touchscreen, a vibrator arranged to signal to a user by causing device 300 to vibrate, a speaker and a microphone. A user may be able to operate device 300 via UI 360, for example to configure machine learning parameters and/or to switch device 300 on and/or off.
Processor 310 may be furnished with a transmitter arranged to output information from processor 310, via electrical leads internal to device 300, to other devices comprised in device 300. Such a transmitter may comprise a serial bus transmitter arranged to, for example, output information via at least one electrical lead to memory 320 for storage therein. Alternatively to a serial bus, the transmitter may comprise a parallel bus transmitter. Likewise processor 310 may comprise a receiver arranged to receive information in processor 310, via electrical leads internal to device 300, from other devices comprised in device 300. Such a receiver may comprise a serial bus receiver arranged to, for example, receive information via at least one electrical lead from receiver 340 for processing in processor 310. Alternatively to a serial bus, the receiver may comprise a parallel bus receiver.
Device 300 may comprise further devices not illustrated in FIGURE 3. For example, where device 300 comprises a smartphone, it may comprise at least one digital camera. Some devices 300 may comprise a back-facing camera and a front-facing camera, wherein the back-facing camera may be intended for digital photography and the front-facing camera for video telephony. Device 300 may comprise a fingerprint sensor arranged to authenticate, at least in part, a user of device 300. In some embodiments, device 300 lacks at least one device described above.
Processor 310, memory 320, transmitter 330, receiver 340, and/or UI 360 may be interconnected by electrical leads internal to device 300 in a multitude of different ways. For example, each of the aforementioned devices may be separately connected to a master bus internal to device 300, to allow for the devices to exchange information. However, as the skilled person will appreciate, this is only one example and depending on the embodiment various ways of interconnecting at least two of the aforementioned devices may be selected without departing from the scope of the present invention.
FIGURE 4 illustrates a neural network in accordance with at least some embodiments of the present invention. The network comprises an input layer 410, which may have dimensions of 32 x 32, for example. Layers 420, 430 and 440 may have dimensions 32 x 32, with depth 128. Layer 420 may run 3x3 convolutional kernels with SReLU output, layer 430 may likewise run 3x3 convolutional kernels with SReLU output, and layer 440 may run 1x1 convolutional kernels with SReLU output. Each of layers 420, 430 and 440 outputs 128 feature channels.
Layers 450 and 460 may each have dimensions 32 x 32 with depth 192, and run 3x3 convolutional kernels with SReLU output. Layer 470 may run 1x1 convolutional kernels, apply SReLU to the output and implement a dropout, as described herein above.
Processing advances from layer 470 to layer 480 via a Max pooling procedure. Layers 480 and 490 may have dimensions 16 x 16, with depth 256, and they may run 3x3 convolutional kernels with SReLU output. Layer 4100 may have dimensions 16 x 16 with depth 256, with SReLU output and dropout. Processing advances from layer 4100 to layer 4110 via a Max pooling procedure. Layers 4110 and 4120 may have dimensions 8 x 8 with depth 512, and they may run 3x3 convolutional kernels with SReLU output. Layers 4130 and 4140 may have dimensions 8 x 8 with depth 512 and 10, respectively, running 1x1 convolutional kernels with SReLU and ReLU output, respectively. From layer 4140, which runs ten feature channels, processing may advance to a decision phase via an Average pooling procedure. Activations in each channel are averaged to generate one score for each category. The decision phase may comprise a 10-class softmax classifier, for example.
To generate a neural network with SReLU in accordance with the example in FIGURE 4, initially all convolutional layers may be provided with ReLU output, after which all except the last one may be replaced with SReLU output. The neural network may, in general, comprise an artificial convolutional neural network, for example.
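By way of a non-limiting illustration only, the FIGURE 4 stack may be sketched in PyTorch along the following lines. This is a hedged sketch rather than a definitive implementation of the described network: the input depth of 3, the uniform per-sample draw inside SReLU, the default value a = 0.3, the dropout probability of 0.5, the 1x1 kernel and 192 channels assumed for layer 470, the 1x1 kernel assumed for layer 4100 and the use of 'same' padding are all assumptions made here for concreteness, and the names SReLU, conv and figure4_net are illustrative only.

    import torch
    import torch.nn as nn

    class SReLU(nn.Module):
        """Stochastic rectified linear unit: negative inputs give zero; positive
        inputs are multiplied by a stochastically selected value. The uniform
        draw over (1 - a, 1 + a), one value per sample, is an assumption."""
        def __init__(self, a: float = 0.3):
            super().__init__()
            self.a = a

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            y = torch.relu(x)                      # zero output for negative inputs
            if self.training:
                # one stochastically selected value per sample (4-D input assumed)
                scale = torch.empty(x.shape[0], 1, 1, 1, device=x.device).uniform_(
                    1.0 - self.a, 1.0 + self.a)
                y = y * scale
            return y                               # in eval mode: plain max(0, x)

    def conv(in_ch, out_ch, k):
        # 'same' padding keeps the spatial dimensions stated for each layer
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2), SReLU())

    figure4_net = nn.Sequential(
        conv(3, 128, 3),                     # layer 420 (input depth 3 assumed)
        conv(128, 128, 3),                   # layer 430
        conv(128, 128, 1),                   # layer 440
        conv(128, 192, 3),                   # layer 450
        conv(192, 192, 3),                   # layer 460
        conv(192, 192, 1), nn.Dropout(0.5),  # layer 470 with dropout
        nn.MaxPool2d(2),                     # max pooling, 32 x 32 -> 16 x 16
        conv(192, 256, 3),                   # layer 480
        conv(256, 256, 3),                   # layer 490
        conv(256, 256, 1), nn.Dropout(0.5),  # layer 4100 with dropout
        nn.MaxPool2d(2),                     # max pooling, 16 x 16 -> 8 x 8
        conv(256, 512, 3),                   # layer 4110
        conv(512, 512, 3),                   # layer 4120
        conv(512, 512, 1),                   # layer 4130
        nn.Conv2d(512, 10, 1), nn.ReLU(),    # layer 4140 keeps plain ReLU output
        nn.AdaptiveAvgPool2d(1), nn.Flatten()  # average pooling -> 10 class scores
    )

    scores = figure4_net(torch.randn(2, 3, 32, 32))  # shape (2, 10), fed to softmax

Calling figure4_net.train() or figure4_net.eval() switches the stochastic scaling in SReLU and the dropout on or off, which corresponds to training with SReLU and thereafter running the network with plain rectified linear units.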
FIGURE 5 is a flow graph of a method in accordance with at least some embodiments of the present invention. The phases of the illustrated method may be performed in a device arranged to run the neural network, for example, by a control device of such a device.
Phase 510 comprises storing data defining, at least partly, an artificial neural network. Phase 520 comprises training the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
In use, after training, the stochastic rectified linear unit may be replaced in the artificial neural network with a rectified linear unit which returns an output f from input x according to f (x) = max (0, x) .
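A minimal sketch of this training-time behaviour and of the post-training replacement, assuming the stochastically selected value is drawn uniformly from (1-a, 1+a) once per invocation, for example once per test image, could look as follows in Python; the function names are illustrative.

    import numpy as np

    rng = np.random.default_rng()

    def srelu_train(x: np.ndarray, a: float = 0.3) -> np.ndarray:
        # positive inputs are multiplied by a stochastically selected value;
        # negative inputs produce a zero output
        scale = rng.uniform(1.0 - a, 1.0 + a)
        return np.maximum(0.0, x) * scale

    def relu_inference(x: np.ndarray) -> np.ndarray:
        # after training, the unit is replaced by f(x) = max(0, x)
        return np.maximum(0.0, x)

    x = np.array([-1.5, 0.0, 2.0])
    print(srelu_train(x))     # e.g. [0.  0.  2.3], the positive entry scaled
    print(relu_inference(x))  # [0. 0. 2.]

Under the uniform draw assumed here, the scaling averages to one over many draws, so replacing the stochastic rectified linear unit with the plain rectifier after training preserves the expected positive response while removing the randomness at inference time.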
It is to be understood that the embodiments of the invention disclosed are not limited to the particular structures, process steps, or materials disclosed herein, but are extended to equivalents thereof as would be recognized by those ordinarily skilled in the relevant arts. It should also be understood that terminology employed herein is used for the purpose of describing particular embodiments only and is not intended to be limiting.
Reference throughout this specification to one embodiment or an embodiment means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Where reference is made to a numerical value using a term such as, for example, about or substantially, the exact numerical value is also disclosed.
As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on their presentation in a common group without indications to the contrary. In addition, various embodiments and examples of the present invention may be referred to herein along with alternatives for the various components thereof. It is understood that such embodiments, examples, and alternatives are not to be construed as de facto equivalents of one another, but are to be considered as separate and autonomous representations of the present invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the preceding description, numerous specific details are provided, such as examples of lengths, widths, shapes, etc., to provide a thorough understanding of embodiments of the invention. One  skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
While the foregoing examples are illustrative of the principles of the present invention in one or more particular applications, it will be apparent to those of ordinary skill in the art that numerous modifications in form, usage and details of implementation can be made without the exercise of inventive faculty, and without departing from the principles and concepts of the invention. Accordingly, it is not intended that the invention be limited, except as by the claims set forth below.
The verbs “to comprise” and “to include” are used in this document as open limitations that neither exclude nor require the existence of also un-recited features. The features recited in dependent claims are mutually freely combinable unless otherwise explicitly stated. Furthermore, it is to be understood that the use of “a” or “an”, that is, a singular form, throughout this document does not exclude a plurality.
INDUSTRIAL APPLICABILITY
At least some embodiments of the present invention find industrial application in optimizing machine recognition, for example to reduce traffic accidents in self-driving vehicles.
ACRONYMS
CNN       convolutional neural network
NReLU     noisy ReLU
ReLU      rectified linear unit
SReLU    stochastic ReLU
REFERENCE SIGNS LIST
110 View
101 Road
125 Area of interest
120 Pedestrian
130 First layer
140 Rectifier
150 Second layer
300-360 Structure of device of FIGURE 3
410-4140 Layers of the neural network illustrated in FIGURE 4
510-520 Phases of the method of FIGURE 5

Claims (23)

  1. An apparatus comprising:
    - memory configured to store data defining, at least partly, an artificial neural network, and
    - at least one processing core configured to train the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
  2. The apparatus according to claim 1, wherein the stochastic selection comprises a random or pseudorandom selection.
  3. The apparatus according to claim 1 or 2, wherein the dataset comprises a plurality of test images and the at least one processing core is configured to vary the stochastically selected value for each test image.
  4. The apparatus according to any of claims 1-3, wherein the at least one processing core is configured to apply a first stochastic rectified linear unit between a first pair of convolutional layers in the artificial neural network, and a second stochastic rectified linear unit between a second pair of convolutional layers in the artificial neural network.
  5. The apparatus according to any of claims 1-4, wherein the at least one stochastic rectified linear unit is configured to produce a zero output from a negative input.
  6. The apparatus according to any of claims 1-5, wherein the at least one processing core is configured to implement a stochastic dropout function in the artificial neural network, the dropout function stochastically setting half of the activations within a layer to zero for each training sample.
  7. The apparatus according to any of claims 1-6, wherein the stochastic selection comprises that the value is randomly or pseudorandomly selected from the range (1-a, 1+a).
  8. The apparatus according to claim 7, wherein the value a is 0.8.
  9. The apparatus according to claim 7, wherein the value a is 0.3.
  10. The apparatus according to any of claims 1-9, wherein the artificial neural network is a pattern recognition neural network.
  11. A method comprising:
    - storing data defining, at least partly, an artificial neural network, and
    - training the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
  12. The method according to claim 11, wherein the stochastic selection comprises a random or pseudorandom selection.
  13. The method according to claim 11 or 12, wherein the test dataset comprises a plurality of test images and the stochastically selected value is varied for each test image.
  14. The method according to any of claims 11-13, further comprising applying a first stochastic rectified linear unit between a first pair of convolutional layers in the artificial neural network, and applying a second stochastic rectified linear unit between a second pair of convolutional layers in the artificial neural network.
  15. The method according to any of claims 11-14, wherein the at least one stochastic rectified linear unit is configured to produce a zero output from a negative input.
  16. The method according to any of claims 11-15, further comprising implementing a stochastic dropout function in the artificial neural network, the dropout function stochastically setting half of the activations within a layer to zero for each training sample.
  17. The method according to any of claims 11-16, wherein the stochastic selection comprises that the value is randomly or pseudorandomly selected from the range (1-a, 1+a).
  18. The method according to claim 17, wherein the value a is 0.8.
  19. The method according to claim 17, wherein the value a is 0.3.
  20. The method according to any of claims 11-19, wherein the artificial neural network is a pattern recognition neural network.
  21. An apparatus comprising:
    - means for storing data defining, at least partly, an artificial neural network;
    - means for training the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
  22. A non-transitory computer readable medium having stored thereon a set of computer readable instructions that, when executed by at least one processor, cause an apparatus to at least:
    - store data defining, at least partly, an artificial neural network;
    - train the artificial neural network by applying a test dataset to the artificial neural network with at least one stochastic rectified linear unit, the at least one stochastic rectified linear unit being configured to produce a positive output from a positive input by multiplying the input with a stochastically selected value.
  23. A computer program configured to cause a method in accordance with at least one of claims 11-20 to be performed.
PCT/CN2016/093904 2016-08-08 2016-08-08 Artificial neural network WO2018027453A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2016/093904 WO2018027453A1 (en) 2016-08-08 2016-08-08 Artificial neural network
EP16911909.6A EP3497622A4 (en) 2016-08-08 2016-08-08 Artificial neural network
CN201680088205.7A CN109564633B (en) 2016-08-08 2016-08-08 Artificial neural network
US16/321,285 US10956788B2 (en) 2016-08-08 2016-08-08 Artificial neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/093904 WO2018027453A1 (en) 2016-08-08 2016-08-08 Artificial neural network

Publications (1)

Publication Number Publication Date
WO2018027453A1 true WO2018027453A1 (en) 2018-02-15

Family

ID=61161079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/093904 WO2018027453A1 (en) 2016-08-08 2016-08-08 Artificial neural network

Country Status (4)

Country Link
US (1) US10956788B2 (en)
EP (1) EP3497622A4 (en)
CN (1) CN109564633B (en)
WO (1) WO2018027453A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960219B (en) * 2017-03-10 2021-04-16 百度在线网络技术(北京)有限公司 Picture identification method and device, computer equipment and computer readable medium
CN110570013B (en) * 2019-08-06 2023-04-07 山东省科学院海洋仪器仪表研究所 Single-station online wave period data prediction diagnosis method
US11390286B2 (en) * 2020-03-04 2022-07-19 GM Global Technology Operations LLC System and process for end to end prediction of lane detection uncertainty
CN116992249B (en) * 2023-09-28 2024-01-23 南京信息工程大学 Grid point forecast deviation correction method based on FMCNN-LSTM

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130343641A1 (en) * 2012-06-22 2013-12-26 Google Inc. System and method for labelling aerial images
US20160070673A1 (en) * 2014-09-04 2016-03-10 Qualcomm Incorporated Event-driven spatio-temporal short-time fourier transform processing for asynchronous pulse-modulated sampled signals
CN105678333A (en) * 2016-01-06 2016-06-15 浙江宇视科技有限公司 Congested area determining method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9978013B2 (en) * 2014-07-16 2018-05-22 Deep Learning Analytics, LLC Systems and methods for recognizing objects in radar imagery
US9646634B2 (en) 2014-09-30 2017-05-09 Google Inc. Low-rank hidden input layer for speech recognition neural network
US9892344B1 (en) * 2015-11-30 2018-02-13 A9.Com, Inc. Activation layers for deep learning networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3497622A4 *

Also Published As

Publication number Publication date
EP3497622A4 (en) 2020-03-25
US20190180148A1 (en) 2019-06-13
CN109564633A (en) 2019-04-02
US10956788B2 (en) 2021-03-23
CN109564633B (en) 2023-07-18
EP3497622A1 (en) 2019-06-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16911909

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2016911909

Country of ref document: EP

Effective date: 20190311