WO2021237727A1 - Method and apparatus of image processing - Google Patents

Method and apparatus of image processing Download PDF

Info

Publication number
WO2021237727A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
parameters
combination
updated
image processing
Prior art date
Application number
PCT/CN2020/093497
Other languages
French (fr)
Inventor
Chen Yu
Original Assignee
Siemens Aktiengesellschaft
Siemens Ltd., China
Priority date
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft, Siemens Ltd., China filed Critical Siemens Aktiengesellschaft
Priority to PCT/CN2020/093497 priority Critical patent/WO2021237727A1/en
Priority to CN202080101212.2A priority patent/CN115668277A/en
Publication of WO2021237727A1 publication Critical patent/WO2021237727A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Definitions

  • the present invention relates to techniques of computer vision and more particularly to a method, apparatus and computer-readable storage medium for image processing.
  • crowd density analysis and pedestrian detection technology are widely used in security, smart buildings and other fields.
  • Crowd counting and pedestrian detection in complex scenes are like the infrastructure in the field of computer vision, providing a perceptual basis for higher semantic and more complex tasks.
  • crowd counting and pedestrian detection are often solved by different neural network models, and pedestrian detection networks often do not predict face parts separately.
  • both crowd counting and pedestrian detection are required simultaneously.
  • the invention is based on deep learning technology, especially convolutional neural network, and integrates crowd counting and pedestrian detection in a model by designing a dual-engine multi-tasking lightweight framework.
  • the solutions provided can be used for other kinds of computer vision tasks.
  • solutions provided in the present disclosure can save computing resources, reduce memory consumption and improve computing efficiency, and can also be deployed on low-cost edge devices.
  • Embodiments of the present disclosure include methods, apparatuses for image processing.
  • a method for image processing includes following steps:
  • an apparatus for image processing includes:
  • an image acquisition module configured to acquire an image
  • a feature extraction module configured to extract at least one feature of the image via a first part of a neural network with a first set of parameters
  • an apparatus for image processing includes at least one processor; at least one memory, coupled to the at least one processor, configured to execute method according to the first aspect.
  • a computer-readable medium for image processing stores computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute method according to the first aspect.
  • a neural network which can be used in any above aspect of the present disclosure.
  • the neural network includes:
  • the first set of parameters are acquired through backward propagation; for combination of the first part and the (i+1) th part of the neural network, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1) th part of the neural network, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network.
  • the second set of parameters are acquired through backward propagation; for combination of the first part and the (i+1) th part of the neural network , i>1, the (i+1) th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1) th part of the neural network , the (i+1) th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network and based on the (i+1) th set of parameters updated through training for combination of the first part and the (i+1) th part of the neural network.
  • solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
  • the neural network can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
  • such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network has advanced performance while remaining lightweight.
  • one of the N image processing tasks is to output dense map of target objects
  • the outputs of the corresponding part of the neural network can include: dense maps of different components of the target objects; number of the target objects can be further counted based on weighted sum of the dense maps of different components of the target objects.
  • the weights used can be decided based on engineering practice and/or through tests.
  • one of the N image processing tasks is to output dense map of target objects
  • the corresponding part of the neural network can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  • the step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
  • FIG. 1 depicts a block diagram of an apparatus for image processing in accordance with one embodiment of the present disclosure.
  • FIG. 2 depicts structure of a CNN in accordance with one embodiment of the present disclosure.
  • FIG. 3 depicts training process of a CNN in accordance with one embodiment of the present disclosure.
  • FIG. 4 depicts a flow diagram of a method for image processing in accordance with one embodiment of the present disclosure.
  • FIG. 5 depicts an image processing system in accordance with one embodiment of the present disclosure.
  • FIG. 6 depicts training process of the system shown in FIG. 5
  • FIG. 7 depicts image processing process of the system shown in FIG. 5.
  • the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements.
  • the terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
  • Image processing solutions are proposed in this disclosure, which can be used to execute multiple tasks via a single neural network, such as the mentioned crowd counting and pedestrian detection. Now the present disclosure will be described hereinafter in details by referring to FIG. 1 to FIG. 7.
  • FIG. 1 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure.
  • the apparatus 10 for image processing presented in the present disclosure can be implemented as a network of computer processors, to execute following method 100 for image processing presented in the present disclosure.
  • the apparatus 10 can also be a single computer, as shown in FIG. 1, including at least one memory 101, which includes computer-readable medium, such as a random access memory (RAM) .
  • the apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101.
  • Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein.
  • the at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC) , a digital signal processor (DSP) , a central processing unit (CPU) , a graphics processing unit (GPU) , state machines, etc.
  • embodiments of computer-readable medium include, but not limited to a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions.
  • various other forms of computer-readable medium may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless.
  • the instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
  • the at least one memory 101 shown in FIG. 1 can contain an image processing program 20, when executed by the at least one processor 102, causing the at least one processor 102 to execute the method 100 for image processing presented in the present disclosure.
  • the image processing program 20 can include:
  • an image acquisition module 201 configured to acquire an image 30
  • a feature extraction module 202 configured to extract at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters
  • Image 30 to be processed can be taken by a camera 70 and sent to the apparatus 10 via the communication module 103 shown in the FIG. 1.
  • the image 30 can also be stored in the at least one memory 101.
  • the online training process of the neural network 40 can be executed with large amounts of data by a server 60, such as a high performance GPU server.
  • the file of neural network 40 (including parameters of each part of the neural network 40) can be transmitted via the communication module 103 to the apparatus 10 and also can be stored in the at least one memory 101, then the neural network 40 can be deployed on apparatus 10.
  • the neural network 40 can be a CNN.
  • the online training process can also be executed on the apparatus 10, which depends on device configuration and processing competence.
  • the online training program can be part of the image processing program 20 and can be pre-stored in the at least one memory 101.
  • Multiple tasks can be executed via the same CNN, which can save computing resources, and such processing may also comply with service logic.
  • Such computing resource saving solutions can also make the apparatus 10 applicable to deploy on an edge device.
  • the neural network 40 can include:
  • a first part 401 configured to extract at least one feature of the image 30 with a first set of parameters 51;
  • the first part 401 can extract shallow feature (s) and optionally, some of the second part 402 to the (N+1) th part 40 (N+1) can further extract deep feature (s) .
  • backward propagation can be executed for training each part of the neural network 40.
  • First different parts corresponding to different image processing tasks are trained independently, and finally the overall fine-tuning training process is performed.
  • first set of parameters 51 can be updated.
  • parameters of each part can also be updated.
  • as to the first set of parameters: for combination of the first part 401 and the second part 402 of the neural network 40, the first set of parameters are acquired through backward propagation with a large amount of image samples; for combination of the first part 401 and the (i+1) th part 40 (i+1) of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1) th part 40 (N+1) of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1) th part 40 (N+1) of the neural network 40.
  • as to the second to the (N+1) th set of parameters: for combination of the first part 401 and the second part 402 of the neural network 40, the second set of parameters are acquired through backward propagation; for combination of the first part 401 and the (i+1) th part 40 (i+1) of the neural network 40, i>1, the (i+1) th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1) th part 40 (N+1) of the neural network 40, the (i+1) th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1) th part 40 (N+1) of the neural network 40 and based on the (i+1) th set of parameters updated through training for combination of the first part 401 and the (i+1) th part 40 (i+1) of the neural network 40.
  • the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
  • one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the image processing module 203 is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise.
  • the weights used can be decided based on engineering practice and/or through tests.
  • one of the N image processing tasks is to output dense map of target objects
  • the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  • the scale-independent feature extraction module can be configured to extract features in multiple different scales
  • the step by step deconvolution module can include multiple pairs of convolution and deconvolution modules, wherein each pair corresponds to an up sample procedure.
  • the step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
  • the image acquisition module 201, the feature extraction module 202 and the image processing module 203 are described above as software modules of the image processing program 20; they can also be implemented via hardware, such as ASIC chips, and can be integrated into one chip or separately implemented and electrically connected.
  • the architecture shown in FIG. 1 above is merely exemplary and is used to explain the exemplary method 100 shown in FIG. 4.
  • One exemplary method 100 according to the present disclosure includes following steps:
  • S102 extracting at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters
  • the first set of parameters are acquired through backward propagation, for combination of the first part and the (i+1) th part of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1) th part of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network 40.
  • the second set of parameters are acquired through backward propagation, for combination of the first part and the (i+1) th part of the neural network 40, i>1, the (i+1) th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1) th part of the neural network 40, the (i+1) th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network 40 and based on the (i+1) th set of parameters updated through training for combination of the first part and the (i+1) th part of the neural network 40.
  • the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  • one of the N image processing tasks is to output dense map of target objects
  • the outputs of the corresponding part of the neural network 40 can include: dense maps of different components of the target objects
  • the method 100 can further include: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  • one of the N image processing tasks is to output dense map of target objects
  • the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  • a lightweight crowd counting and pedestrian detection system 80 for edge device is provided.
  • the system 80 can include:
  • an online training module 801 configured to train a neural network 812, used for image processing
  • an offline module 802 configured to execute image processing via the neural network 812.
  • the neural network model 812 can be trained with a large amount of image samples 811 via a server 81, such as a high performance GPU server. After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, including the feature extraction 831, fuzzy engine 832 and refinement engine 833.
  • a camera 82 can be connected to the edge device 83, and the neural network 812 can run on the edge device 83.
  • features can be extracted by feature extraction 831 from image to be processed, then fuzzy engine 832 will output dense map and refinement engine 833 will execute pedestrian detection.
  • the output dense map can be further processed by crowd counting 834 to count number of pedestrians, and by crowd flow analysis 835 to analyze the crowd flow.
  • the output bounding boxes of pedestrians and optional faces can be further processed by re-identification 836, such as finding a specific man from image, or finding people wearing masks.
  • step 1 the image samples are input into the neural network 812, and loss functions can be set as MSE and SSIM, parameter set 1 for feature extraction 831, and parameter set 2 for fuzzy engine 832 can be acquired through backward propagation .
  • step 2 the image samples are input into the neural network 812, and loss functions can be set as focal loss and L1 loss, through backward propagation, parameter set 1 will be updated to parameter set 1’ based on parameter set 1, and parameter set 3 for the refinement engine 833 will be acquired.
  • the image samples are input into the neural network 812, and loss functions can be set as weighted sum of MSE, SSIM, focal loss and L1 loss.
  • loss functions can be set as weighted sum of MSE, SSIM, focal loss and L1 loss.
  • step 1 and step 2 are added and parameters acquired during the first two steps will be used during the step 3.
  • the parameter sets are almost ready, so step 3 is only a slight adjustment, which allows the training process to converge quickly, saves computing power and shortens the whole training process.
  • parameter set 1”, parameter set 2’ and parameter set 3’ will be used for image processing.
  • FIG. 7 shows workflow of inference phase of the use case.
  • step S201 basic convolution is executed.
  • a convolution kernel with a stride can be used for the convolution operation, and a ReLU function can be used for activation, a batchnorm2d can be used for normalization, then a max pooling layer can be used.
  • a feature map with a quarter size of the original image can be obtained at this step, and amount of operations can be largely reduced.
  • a residual block can be used, features can be further extracted.
  • improved h-swish function can be used as the activation function.
  • the h-swish activation function as follow:
  • a residual block can be used here, and features can be further extracted. Also, an improved h-swish function can be used as the activation function.
  • different kinds of features can be extracted by steps S202 and S203 respectively.
  • other residual blocks can also be used to extract other kind of feature (s) here.
  • a squeeze-and-excitation bottleneck can be used.
  • an improved h-swish function can also be used as the activation function.
  • the feature attention mechanism can be used to make the neural network 812 learn more effective abstract features. Dimensions can be decreased and the amount of computation can be reduced. Important features can be found.
  • a squeeze-and-excitation bottleneck can be used. Also, an improved h-swish function can also be used as the activation function.
  • features extracted at step S205 will be used at steps S206, S207 and S208 respectively. At these steps, different scales of features will be extracted. Then, with repetition of steps S210 and S211, step by step deconvolution operations can be made to restore a precise dense map with not only outlines but also detailed information.
  • a convolution operation can be made.
  • no activation function is used here, and normalization can use batchnorm2d.
  • the scale-independent feature extraction module 8321 can include two kernel convolution modules.
  • the convolution modules both contain a batchnorm2d layer.
  • the first module can use a h-swish activation function, the second module may not use an activation function.
  • the number of feature channels in the output can be a quarter of the input.
  • the scale-independent feature extraction module 8321 can include two kernel convolution modules.
  • the convolution modules can both include a batchnorm2d layer.
  • the first module can use the h-swish activation function, the second module may not use an activation function.
  • the number of feature channels in the output can be a quarter of the input.
  • feature stitching techniques can be used to obtain feature maps which do not increase in number.
  • a convolution kernel and a ReLU activation function can be used, and instancenorm2d can be used as a normalization layer.
  • a deconvolution kernel can be used, and instancenorm2d can be used as a normalization layer.
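As a rough illustration of the convolution and deconvolution steps described in the bullets above, one up-sampling stage of the step-by-step deconvolution could be sketched in PyTorch as follows. This is a hedged sketch only: the kernel sizes, channel counts and layer ordering are assumptions, not the patent's concrete design.

```python
import torch
import torch.nn as nn

class UpsamplePair(nn.Module):
    """One stage of the step-by-step deconvolution: a convolution with ReLU and
    InstanceNorm2d, followed by a stride-2 transposed convolution with InstanceNorm2d
    that doubles the spatial resolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.InstanceNorm2d(out_channels),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(out_channels, out_channels, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(out_channels),
        )

    def forward(self, x):
        return self.deconv(self.conv(x))

pair = UpsamplePair(64, 32)
print(pair(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 32, 32, 32])
```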
  • bodies and heads can be predicted based on the dense maps.
  • the final number of people can be obtained by cascading the two results.
  • the formula can be as follows: C = ω1 · D_body + ω2 · D_head, wherein:
  • C is the result of crowd counting;
  • D_body is the sum of the body dense map;
  • D_head is the sum of the head dense map;
  • ω1 and ω2 can be set according to the scene. Such a weighted sum can effectively eliminate the influence of people overlapping each other, so that the final result can be more precise.
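Under the weighted-sum reading of the counting formula above, the counting step might look like the following PyTorch sketch; the weight values are placeholders to be set per scene, and the function name is illustrative only.

```python
import torch

def count_people(body_dense_map, head_dense_map, w1=0.5, w2=0.5):
    """C = w1 * D_body + w2 * D_head, where D_* is the integral of a component dense map."""
    d_body = body_dense_map.sum()   # sum of the body dense map
    d_head = head_dense_map.sum()   # sum of the head dense map
    return w1 * d_body + w2 * d_head

# dummy dense maps, only to show the call
body = torch.rand(1, 1, 128, 128)
head = torch.rand(1, 1, 128, 128)
print(float(count_people(body, head)))
```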
  • a bilinear interpolation can be used to get twice-enlarged feature maps without increasing the calculation amount.
  • at step S214, without adding calculation parameters, pixel-by-pixel addition can be used for feature fusion.
  • target center points of pedestrians and faces can be predicted, and predictions can be made in the form of heat maps.
  • step S216 on the feature map with a quarter size of the original image, height and width of the target can be predicted.
  • step S217 on the feature map with a quarter size of the original image, offset of the target center point can be predicted.
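The predictions described for steps S215 to S217 (centre-point heat maps, width/height and centre offsets on a quarter-resolution feature map) resemble common centre-point detectors. A minimal sketch of how such outputs are typically decoded into bounding boxes is given below; the thresholding, the 3x3 local-maximum step and the stride value are assumptions rather than the patent's exact post-processing.

```python
import torch
import torch.nn.functional as F

def decode_boxes(heatmap, wh, offset, score_thresh=0.3, stride=4):
    """Decode a centre heat map (1, C, H, W), width/height map (1, 2, H, W) and
    offset map (1, 2, H, W) into class-labelled bounding boxes in input coordinates."""
    # keep only local maxima of the heat map (a simple 3x3 peak selection)
    peaks = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)) & (heatmap > score_thresh)
    boxes = []
    for cls, y, x in zip(*torch.nonzero(peaks[0], as_tuple=True)):
        cx = (x + offset[0, 0, y, x]) * stride          # centre x, corrected by predicted offset
        cy = (y + offset[0, 1, y, x]) * stride          # centre y
        w = wh[0, 0, y, x] * stride                     # predicted width
        h = wh[0, 1, y, x] * stride                     # predicted height
        boxes.append((int(cls), float(cx - w / 2), float(cy - h / 2),
                      float(cx + w / 2), float(cy + h / 2), float(heatmap[0, cls, y, x])))
    return boxes
```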
  • the loss function can be set as follows:
  • L = L_mse + λ1 · L_ssim + L_focal_loss + λ2 · L_wh_L1loss + L_offset_L1loss
  • the cascading loss function includes MSE loss function, SSIM loss function, focal loss function, and L1 loss function.
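Read literally, the cascading loss combines an MSE term and an SSIM term for the dense map with a focal loss for the centre heat map and L1 losses for width/height and offsets. The sketch below shows one possible composition; the focal-loss and SSIM implementations are simplified stand-ins, and the lambda values are placeholders.

```python
import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Gaussian-heatmap focal loss in the style of centre-point detectors (simplified)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    loss_pos = -((1.0 - pred) ** alpha) * torch.log(pred + eps) * pos
    loss_neg = -((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred + eps) * neg
    return (loss_pos.sum() + loss_neg.sum()) / pos.sum().clamp(min=1.0)

def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """Very coarse single-window SSIM loss (1 - SSIM); real code would use a sliding window."""
    mu_x, mu_y = pred.mean(), gt.mean()
    cov = ((pred - mu_x) * (gt - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (pred.var() + gt.var() + c2))
    return 1.0 - ssim

def total_loss(pred, target, lambda1=1.0, lambda2=0.1):
    """L = L_mse + lambda1 * L_ssim + L_focal + lambda2 * L_wh_L1 + L_offset_L1."""
    return (F.mse_loss(pred["dense_map"], target["dense_map"])
            + lambda1 * ssim_loss(pred["dense_map"], target["dense_map"])
            + focal_loss(pred["heatmap"].sigmoid(), target["heatmap"])
            + lambda2 * F.l1_loss(pred["wh"], target["wh"])
            + F.l1_loss(pred["offset"], target["offset"]))
```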
  • the system 80 and method 100 can be used for dense crowd flow monitoring, pedestrian detection and tracking, etc.
  • the solution presented can be deployed on edge devices, and can give artificial intelligence capabilities to traditional image acquisition devices.
  • a computer-readable medium is also provided in the present disclosure, storing computer-executable instructions, which upon execution by a computer, enables the computer to execute any of the methods presented in this disclosure.
  • a computer program which is being executed by at least one processor and performs any of the methods presented in this disclosure.

Abstract

A method, apparatus, system and computer-readable medium for image processing are presented. A method includes: acquiring (S101) an image (30); extracting (S102) at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters; executing (S103) N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1) th part of the neural network (40) with an (i+1) th set of parameters, N is an integer and N≥2, i=1.. N.

Description

Method and apparatus of image processing Technical Field
The present invention relates to techniques of computer vision and more particularly to a method, apparatus and computer-readable storage medium for image processing.
Background Art
With the development of artificial intelligence technology, crowd density analysis and pedestrian detection technology are widely used in security, smart buildings and other fields. Crowd counting and pedestrian detection in complex scenes are like the infrastructure in the field of computer vision, providing a perceptual basis for higher-semantic and more complex tasks, such as pedestrian recognition, pedestrian flow estimation, video structured analysis, etc.
Summary of the Invention
In common solutions, crowd counting and pedestrian detection are often solved by different neural network models, and pedestrian detection networks often do not predict face parts separately. However, for an application scenario, both crowd counting and pedestrian detection are usually required simultaneously. The invention is based on deep learning technology, especially convolutional neural networks, and integrates crowd counting and pedestrian detection in one model by designing a dual-engine multi-tasking lightweight framework. The solutions provided can also be used for other kinds of computer vision tasks. Compared with current solutions based on separate models, the solutions provided in the present disclosure can save computing resources, reduce memory consumption and improve computing efficiency, and they can also be deployed on low-cost edge devices.
Embodiments of the present disclosure include methods, apparatuses for image processing.
According to a first aspect of the present disclosure, a method for image processing is presented. The method includes the following steps:
- acquiring an image;
- extracting at least one feature of the image via a first part of a neural network with a first set of parameters;
- executing N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1)  th part of the neural network (40) with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
According to a second aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes:
- an image acquisition module, configured to acquire an image;
- a feature extraction module, configured to extract at least one feature of the image via a first part of a neural network with a first set of parameters;
- an image processing module, configured to execute N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1)  th part of the neural network with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
According to a third aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes at least one processor; and at least one memory, coupled to the at least one processor, configured to execute the method according to the first aspect.
According to a fourth aspect of the present disclosure, a computer-readable medium for image processing is presented. The computer-readable medium stores computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute the method according to the first aspect.
According to a fifth aspect of the present disclosure, a neural network is presented which can be used in any above aspect of the present disclosure. The neural network includes:
- a first part, configured to extract at least one feature of the image with a first set of parameters;
- a second part to an (N+1)  th part (40 (N+1) ) , configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)  th part (40 (i+1) ) of the neural network (40) is configured to execute the i th image processing task with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
With solutions provided in the present disclosure, multiple tasks can be executed via a single neural network, which can save computing resources, and such processing may also comply with service logic. Such computing resource saving solutions can also make the solutions applicable to deploy on an edge device.
Optionally, for combination of the first part and the second part of the neural network, the first set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)  th part of the neural network, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network.
Optionally, for combination of the first part and the second part of the neural network , the second set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)  th part of the neural network , i>1, the (i+1)  th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network , the (i+1)  th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network and based on the (i+1)  th set of parameters updated through training for combination of the first part and the (i+1)  th part of the neural network.
Compared with current backward propagation process, in which only combination of all parts are involved, solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
Optionally, the neural network can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module. Such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network has advanced performance while remaining lightweight.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network can include: dense maps of different components of the target objects; number of the target objects can be further counted based on weighted sum of the dense maps of  different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise. The weights used can be decided based on engineering practice and/or through tests.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module. The step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
Brief Description of the Drawings
The above mentioned attributes and other features and advantages of the present technique and the manner of attaining them will become more apparent and the present technique itself will be better understood by reference to the following description of embodiments of the present technique taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram of an apparatus for image processing in accordance with one embodiment of the present disclosure.
FIG. 2 depicts structure of a CNN in accordance with one embodiment of the present disclosure.
FIG. 3 depicts training process of a CNN in accordance with one embodiment of the present disclosure.
FIG. 4 depicts a flow diagram of a method for image processing in accordance with one embodiment of the present disclosure.
FIG. 5 depicts an image processing system in accordance with one embodiment of the present disclosure.
FIG. 6 depicts training process of the system shown in FIG. 5
FIG. 7 depicts image processing process of the system shown in FIG. 5.
Reference Numbers:
10, an apparatus for image processing
101, at least one memory
102, at least one processor
103, a communication module
20, an image processing program
201, an image acquisition module
202, a feature extraction module
203, an image processing module
30, image acquired and to be processed
40, a neural network
401, a first part of the neural network 40
402, a second part of the neural network 40
40i, an i th part of the neural network 40
51, a first set of parameters of the first part 401 of the neural network 40
52, a second set of parameters of the second part 402 of the neural network 40
5i, an i th set of parameters of the i th part 40i of the neural network 40
60, a training server
70, a camera
100, a method for image processing
S101~S103, steps of method 100
80, a lightweight crowd counting and pedestrian detection system
801, online training module
802, offline module
81, a server
82, a camera
83, an edge device
811, image samples
812, neural network
813, loss function
831, feature extraction
832, fuzzy engine
833, refinement engine
834, crowd counting
835, crowd flow analysis
836, re-identification
200 a method for image processing
S201~S217, steps of method 200
Detailed Description of Example Embodiments
Hereinafter, above-mentioned and other features of the present technique are described in detail. Various embodiments are described with reference to the drawing, where like reference numerals are used to refer to like elements throughout. In the following description, for purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to explain, and not to limit the invention. It may be evident that such embodiments may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Image processing solutions are proposed in this disclosure, which can be used to execute multiple tasks via a single neural network, such as the mentioned crowd counting and pedestrian detection. Now the present disclosure will be described hereinafter in details by referring to FIG. 1 to FIG. 7.
FIG. 1 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure. The apparatus 10 for image processing presented in the present disclosure can be implemented as a network of computer processors, to execute the following method 100 for image processing presented in the present disclosure. The apparatus 10 can also be a single computer, as shown in FIG. 1, including at least one memory 101, which includes a computer-readable medium, such as a random access memory (RAM). The apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), state machines, etc. Embodiments of computer-readable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in FIG. 1 can contain an image processing program 20, when executed by the at least one processor 102, causing the at least one processor 102 to execute the method 100 for image processing presented in the present disclosure. The image processing program 20 can include:
- an image acquisition module 201, configured to acquire an image 30;
- a feature extraction module 202, configured to extract at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters;
- an image processing module 203, configured to execute N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an i+1 th part of the neural network 40 with an i+1 th set of parameters, N is an integer and N≥2, i=1.. N.
Image 30 to be processed can be taken by a camera 70 and sent to the apparatus 10 via the communication module 103 shown in the FIG. 1. The image 30 can also be stored in the at least one memory 101.
The online training process of the neural network 40 can be executed with large amounts of data by a server 60, such as a high performance GPU server. After training, the file of neural network 40 (including parameters of each part of the neural network 40) can be transmitted via the communication module 103 to the apparatus 10 and also can be stored in the at least one memory 101, then the neural network 40 can be deployed on apparatus 10. The neural network 40 can be a CNN.
However, the online training process can also be executed on the apparatus 10, which depends on device configuration and processing competence. In such a case, the online training program can be part of the image processing program 20 and can be pre-stored in the at least one memory 101.
Multiple tasks can be executed via the same CNN, which can save computing resources, and such processing may also comply with service logic. Such computing resource saving solutions can also make the apparatus 10 applicable to deploy on an  edge device.
As shown in FIG. 2, the neural network 40 can include:
- a first part 401, configured to extract at least one feature of the image 30 with a first set of parameters 51;
- a second part 402 to an (N+1)  th part 40 (N+1) , configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)  th part 40 (i+1) of the neural network 40 is configured to execute the i th image processing task with an (i+1)  th set of parameters 5 (i+1) , N is an integer and N≥2, i=1.. N.
It should be mentioned that the first part 401 can extract shallow feature(s) and, optionally, some of the second part 402 to the (N+1) th part 40 (N+1) can further extract deep feature(s).
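The split into a shared first part and N task-specific parts can be sketched in PyTorch roughly as follows. This is a minimal illustration under assumptions: the module names, layer sizes and the choice of two placeholder heads are not the concrete architecture of the disclosure.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """First part 401: extracts shared (shallow) features with the first set of parameters."""
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class MultiTaskNet(nn.Module):
    """A single network: one shared first part and N task-specific parts (heads)."""
    def __init__(self, heads):
        super().__init__()
        self.backbone = SharedBackbone()
        self.heads = nn.ModuleDict(heads)   # e.g. a dense-map head and a detection head

    def forward(self, x):
        feat = self.backbone(x)             # at least one feature of the image
        return {name: head(feat) for name, head in self.heads.items()}

# two 1x1-convolution placeholders standing in for the task-specific parts
net = MultiTaskNet({"dense_map": nn.Conv2d(64, 1, 1), "detection": nn.Conv2d(64, 4, 1)})
outputs = net(torch.randn(1, 3, 256, 256))
```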
Now referring to FIG. 3, backward propagation can be executed for training each part of the neural network 40. First different parts corresponding to different image processing tasks are trained independently, and finally the overall fine-tuning training process is performed. With combination of the first part 401 and each of other parts 40 (i+1) , first set of parameters 51 can be updated. And with the combination of all parts, parameters of each part can also be updated. Compared with current backward propagation process, in which only combination of all parts are involved, solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
In details, as to the first set of parameters, for combination of the first part 401 and the second part 402 of the neural network 40, the first set of parameters are acquired through backward propagation with large amount of image samples; for combination of the first part 401 and the (i+1)  th part 40 (i+1) of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1)  th part 40 (N+1) of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1)  th part 40 (N+1) of the neural network 40.
And, as to the second to the (N+1)  th set of parameters, for combination of the first part 401 and the second part 402 of the neural network 40, the second set of  parameters are acquired through backward propagation; for combination of the first part 401 and the (i+1)  th part 40 (i+1) of the neural network 40, i>1, the (i+1)  th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1)  th part 40 (N+1) of the neural network 40, the (i+1)  th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1)  th part 40 (N+1) of the neural network 40 and based on the (i+1)  th set of parameters updated through training for combination of the first part 401 and the (i+1)  th part 40 (i+1) of the neural network 40.
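One way to read the training schedule above is: train the first part together with each other part in turn, carrying the first set of parameters forward between stages, and finish with a joint fine-tuning pass over all parts. The sketch below follows that reading; the optimizer, learning rates and loss placeholders are assumptions.

```python
import torch

def train_combination(backbone, head, loss_fn, loader, epochs=1, lr=1e-3):
    """Backward propagation for the combination of the first part (backbone) and one other part (head)."""
    optimizer = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(head(backbone(images)), targets)
            loss.backward()
            optimizer.step()

def fine_tune_all(backbone, heads, loss_fns, loader, weights, epochs=1, lr=1e-4):
    """Final stage: the first part and all other parts are updated together,
    starting from the parameter sets obtained in the per-task stages."""
    params = list(backbone.parameters()) + [p for h in heads for p in h.parameters()]
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for images, target_list in loader:
            optimizer.zero_grad()
            feat = backbone(images)
            loss = sum(w * fn(h(feat), t)
                       for w, fn, h, t in zip(weights, loss_fns, heads, target_list))
            loss.backward()
            optimizer.step()

# Stage i (i = 1..N): train_combination(backbone, heads[i-1], loss_fns[i-1], loaders[i-1]);
# the backbone parameters carried out of stage i are the starting point of stage i+1.
# Final stage: fine_tune_all(backbone, heads, loss_fns, joint_loader, weights).
```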
Optionally, the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module. Such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network 40 has advanced performance while remaining lightweight.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the image processing module 203 is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise. The weights used can be decided based on engineering practice and/or through tests.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module. The scale-independent feature extraction module can be configured to extract features in multiple different scales, and the step by step deconvolution module can include multiple pairs of convolution and deconvolution modules, wherein each pair corresponds to an up-sample procedure. The step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
Although the image acquisition module 201, the feature extraction module 202 and the image processing module 203 are described above as software modules of the image processing program 20, they can also be implemented via hardware, such as ASIC chips. They can be integrated into one chip, or separately implemented and electrically connected.
It should be mentioned that the present disclosure may include apparatuses having different architecture than shown in FIG. 1. The architecture above is merely exemplary and used to explain the exemplary method 100 shown in FIG. 4.
Various methods in accordance with the present disclosure may be carried out. One exemplary method 100 according to the present disclosure includes the following steps:
S101: acquiring an image 30;
S102: extracting at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters;
S103: executing N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1)  th part of the neural network 40 with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
Optionally, for combination of the first part and the second part of the neural network 40, the first set of parameters are acquired through backward propagation, for combination of the first part and the (i+1)  th part of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network 40.
Optionally, for combination of the first part and the second part of the neural network 40, the second set of parameters are acquired through backward propagation, for combination of the first part and the (i+1)  th part of the neural network 40, i>1, the (i+1)  th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for  combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network 40, the (i+1)  th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network 40 and based on the (i+1)  th set of parameters updated through training for combination of the first part and the (i+1)  th part of the neural network 40.
Optionally, the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include: dense maps of different components of the target objects; the method 100 can further include: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
Following is a use case in which the solution provided in the present disclosure can be adopted. Referring to FIG. 5, in this use case, a lightweight crowd counting and pedestrian detection system 80 for edge device is provided. The system 80 can include:
- an online training module 801, configured to train a neural network 812, used for image processing;
- an offline module 802, configured to execute image processing via the neural network 812.
As to the online training module 801, the neural network model 812 can be trained with a large amount of image samples 811 via a server 81, such as a high performance GPU server. After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, including the feature extraction 831, fuzzy engine 832 and refinement engine 833.
As to the offline module 802, a camera 82 can be connected to the edge device 83, and the neural network 812 can run on the edge device 83. Firstly, features can be extracted by feature extraction 831 from the image to be processed, then the fuzzy engine 832 will output a dense map and the refinement engine 833 will execute pedestrian detection. The output dense map can be further processed by crowd counting 834 to count the number of pedestrians, and by crowd flow analysis 835 to analyze the crowd flow. The output bounding boxes of pedestrians and, optionally, faces can be further processed by re-identification 836, such as finding a specific person in the image, or finding people wearing masks.
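The data flow of the offline module can be summarised by the following sketch, assuming the three engines are callable modules; the function and variable names are illustrative only.

```python
def process_frame(image, feature_extraction, fuzzy_engine, refinement_engine):
    """Data flow of FIG. 5: shared features feed both engines (illustrative sketch only)."""
    features = feature_extraction(image)      # 831: shared feature extraction
    dense_map = fuzzy_engine(features)        # 832: dense map for counting / flow analysis
    detections = refinement_engine(features)  # 833: pedestrian (and face) bounding boxes
    people_count = dense_map.sum()            # 834: crowd counting from the dense map
    return dense_map, detections, people_count
```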
Now referring to FIG. 6, during the training phase, firstly, at step 1, the image samples are input into the neural network 812, and the loss functions can be set as MSE and SSIM; parameter set 1 for the feature extraction 831 and parameter set 2 for the fuzzy engine 832 can be acquired through backward propagation.
Then, at step 2, the image samples are input into the neural network 812, and the loss functions can be set as focal loss and L1 loss; through backward propagation, parameter set 1 will be updated to parameter set 1’ based on parameter set 1, and parameter set 3 for the refinement engine 833 will be acquired.
At last, at step 3, the image samples are input into the neural network 812, and the loss function can be set as a weighted sum of MSE, SSIM, focal loss and L1 loss. Through backward propagation, parameter set 1’ will be updated to parameter set 1”, parameter set 2 will be updated to parameter set 2’, and parameter set 3 will be updated to parameter set 3’.
Compared with conventional backward propagation, step 1 and step 2 are added, and the parameters acquired during the first two steps are used during step 3. In fact, after step 1 and step 2 the parameter sets are almost ready, so step 3 is only a slight adjustment, which allows the training process to converge quickly, saves computing power, and shortens the whole training process.
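To make the three-step schedule concrete, below is a minimal PyTorch-style sketch of it. The module structure, optimizer, loss stand-ins and data are illustrative assumptions, not the actual network of the use case; only the staging, i.e. which parameter sets each backward propagation updates, follows the description above.

```python
# Hedged sketch of the three-step training schedule (all modules/losses are stand-ins).
import torch
import torch.nn as nn

feature_extraction = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # parameter set 1
fuzzy_engine       = nn.Conv2d(16, 1, 1)                                       # parameter set 2
refinement_engine  = nn.Conv2d(16, 4, 1)                                       # parameter set 3

def run_stage(trainable, frozen, loss_fn, data, epochs=1, lr=1e-3):
    """One training stage: backward propagation only updates the `trainable` modules."""
    for m in frozen:
        m.requires_grad_(False)
    for m in trainable:
        m.requires_grad_(True)
    opt = torch.optim.Adam([p for m in trainable for p in m.parameters()], lr=lr)
    for _ in range(epochs):
        for images, dense_gt in data:
            opt.zero_grad()
            loss_fn(images, dense_gt).backward()
            opt.step()

mse = nn.MSELoss()
def density_loss(x, gt):    # stands in for the MSE + SSIM loss of the dense-map task
    return mse(fuzzy_engine(feature_extraction(x)), gt)
def detection_loss(x, gt):  # stands in for the focal + L1 losses of the detection task
    return refinement_engine(feature_extraction(x)).abs().mean()
def combined_loss(x, gt):   # weighted sum of both task losses, used at step 3
    return density_loss(x, gt) + 0.1 * detection_loss(x, gt)

data = [(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))]  # dummy samples
run_stage([feature_extraction, fuzzy_engine], [refinement_engine], density_loss, data)    # step 1
run_stage([feature_extraction, refinement_engine], [fuzzy_engine], detection_loss, data)  # step 2
run_stage([feature_extraction, fuzzy_engine, refinement_engine], [], combined_loss, data) # step 3
```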
During the inference phase, parameter set 1”, parameter set 2’ and parameter set 3’ will be used for image processing.
FIG. 7 shows the workflow of the inference phase of the use case.
At step S201, a basic convolution is executed. A convolution kernel with a stride can be used for the convolution operation, a ReLU function can be used for activation, batchnorm2d can be used for normalization, and then a max pooling layer can be used. A feature map with a quarter size of the original image can be obtained at this step, and the amount of operations can be largely reduced.
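A hedged sketch of such a convolution stem follows. The exact kernel size, stride and channel counts are not specified above, so the values below are assumptions; one stride-2 convolution plus one stride-2 max pooling yields a feature map at a quarter of each spatial dimension.

```python
# Hedged sketch of the basic convolution step S201 (layer sizes are assumptions).
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # strided convolution
    nn.BatchNorm2d(32),                                     # batchnorm2d normalization
    nn.ReLU(inplace=True),                                  # ReLU activation
    nn.MaxPool2d(kernel_size=2, stride=2),                  # max pooling layer
)

x = torch.randn(1, 3, 512, 512)
print(stem(x).shape)  # torch.Size([1, 32, 128, 128]): a quarter of each spatial dimension
```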
At step S202, a residual block can be used, and features can be further extracted. The following improved h-swish function can be used as the activation function:
(equation image PCTCN2020093497-appb-000001 in the original filing, not reproduced here)
At step S203, a residual block can also be used, and features can be further extracted. Again, an improved h-swish function can be used as the activation function.
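The improved h-swish itself is given only as an equation image in the original filing, so the sketch below falls back to the standard h-swish from MobileNetV3 as a stand-in, wrapped in a plain residual block of the kind steps S202/S203 describe; all layer sizes are assumptions.

```python
# Hedged sketch: standard h-swish stand-in and a plain residual block (steps S202/S203).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HSwish(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # standard h-swish: x * ReLU6(x + 3) / 6 (the patent's "improved" variant differs)
        return x * F.relu6(x + 3.0) / 6.0

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = HSwish()

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)  # skip connection of the residual block
```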
Different kinds of features can be extracted by steps S202 and S203 respectively. Optionally, other residual blocks can also be used here to extract other kinds of features.
At step S204, a squeeze-and-excitation bottleneck can be used, again with an improved h-swish function as the activation function. The feature attention mechanism can be used to make the neural network 812 learn more effective abstract features. Dimensions can be decreased, the amount of computation can be reduced, and important features can be found.
At step S205, a squeeze-and-excitation bottleneck can be used. Again, an improved h-swish function can be used as the activation function.
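A minimal squeeze-and-excitation block of the kind used at steps S204/S205 could look like the following; the reduction ratio and channel count are assumptions, and the block only illustrates the channel-attention ("feature attention") idea.

```python
# Hedged sketch of a squeeze-and-excitation (SE) block (steps S204/S205).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial average
        self.fc = nn.Sequential(                       # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # re-weight channels (feature attention)
```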
Features extracted at step S205 will be output to the fuzzy engine 832 for acquiring the dense map. Features extracted at steps S202, S203, S204 and S205 will be output to the refinement engine 833 for pedestrian detection.
Now comes the fuzzy engine 832. Features output by step S205 will be used at steps S206, S207 and S208 respectively. At these steps, different scales of features will be extracted. Then, with repetition of steps S210 and S211, step-by-step deconvolution operations can be performed to restore a precise dense map with not only outlines but also detailed information.
Step S206 is part of the scale-independent feature extraction module 8321; a convolution operation can be performed here. Optionally, no activation function is used, and batchnorm2d can be used for normalization.
Step S207 is also part of the scale-independent feature extraction module 8321 and can include two kernel convolution modules. The convolution modules both contain a batchnorm2d layer. The first module can use an h-swish activation function, while the second module may not use an activation function. The number of feature channels in the output can be a quarter of that of the input.
Step S208 is likewise part of the scale-independent feature extraction module 8321 and can include two kernel convolution modules. The convolution modules can both include a batchnorm2d layer. The first module can use the h-swish activation function, while the second module may not use an activation function. The number of feature channels in the output can be a quarter of that of the input.
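The following sketch illustrates branches of the kind steps S206 to S208 describe; the kernel sizes that would make the branches respond to different scales are not given above, so they are assumptions here.

```python
# Hedged sketch of scale-independent feature extraction branches (steps S206-S208).
import torch
import torch.nn as nn

class PlainConvBranch(nn.Module):
    """Step S206: one convolution with batchnorm2d and no activation."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.conv(x))

class TwoKernelBranch(nn.Module):
    """Steps S207/S208: two convolution modules, the first with h-swish, the second
    without an activation; output channels are a quarter of the input channels."""
    def __init__(self, in_ch: int, k: int = 3):
        super().__init__()
        out_ch = in_ch // 4
        self.block1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),
            nn.Hardswish(),                       # built-in hard-swish as a stand-in
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),               # no activation in the second module
        )

    def forward(self, x):
        return self.block2(self.block1(x))
```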
At step S209, feature stitching can be used to combine the branch outputs into feature maps whose number does not increase.
At step S210, a convolution kernel and a ReLU activation function can be used, and instancenorm2d can be used as the normalization layer.
At step S211, a deconvolution kernel can be used, and instancenorm2d can be used as the normalization layer.
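One repeated refinement-plus-deconvolution step (S210/S211) could be sketched as below; channel counts and the number of repetitions are assumptions.

```python
# Hedged sketch of one step-by-step deconvolution stage (steps S210/S211).
import torch
import torch.nn as nn

class DeconvStep(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.refine = nn.Sequential(                       # step S210: conv + instancenorm2d + ReLU
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
            nn.InstanceNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(                           # step S211: 2x deconvolution + instancenorm2d
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(out_ch),
        )

    def forward(self, x):
        return self.up(self.refine(x))

# Repeating the step restores a detailed, higher-resolution dense map step by step.
decoder = nn.Sequential(DeconvStep(64, 32), DeconvStep(32, 16), nn.Conv2d(16, 1, 1))
```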
At step S212, bodies and heads can be predicted based on the dense maps. The final number of people can be obtained by cascading the two results. The formula can be as follows:
C = λ1·D_body + λ2·D_head
Wherein C is the result of crowd counting, D_body is the sum of the body dense map, D_head is the sum of the head dense map, and λ1, λ2 can be set according to the scene. Such a weighted sum can effectively eliminate the influence of people overlapping each other, with which the final result can be more precise.
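As a small illustration, the cascaded count can be computed directly from the two predicted dense maps; the λ values here are placeholder assumptions.

```python
# Hedged example of the weighted cascading count C = λ1·D_body + λ2·D_head.
import torch

def crowd_count(body_dense: torch.Tensor, head_dense: torch.Tensor,
                lam1: float = 0.5, lam2: float = 0.5) -> float:
    """Each dense map integrates (sums) to an estimated count of that component."""
    return (lam1 * body_dense.sum() + lam2 * head_dense.sum()).item()
```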
Now comes the refinement engine 833.
At step S213, a bilinear interpolation can be used to get the twice-enlarged feature maps without increasing the calculation amount.
At step S214, without adding calculation parameters, pixel-by-pixel addition can be used to perform feature fusion.
At step S215, on the feature map with a quarter size of the original image, the target center points of pedestrians and faces can be predicted, and the predictions can be made in the form of heat maps.
At step S216, on the feature map with a quarter size of the original image, the height and width of the target can be predicted.
At step S217, on the feature map with a quarter size of the original image, the offset of the target center point can be predicted.
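A hedged sketch of these refinement-engine heads (steps S213 to S217), in the spirit of a CenterNet-style detector, is given below; channel counts and the two target classes (pedestrian, face) are assumptions.

```python
# Hedged sketch of the refinement-engine heads (steps S213-S217).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementHeads(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 2):
        super().__init__()
        self.heatmap = nn.Conv2d(channels, num_classes, 1)  # S215: center-point heat maps
        self.wh = nn.Conv2d(channels, 2, 1)                  # S216: width and height
        self.offset = nn.Conv2d(channels, 2, 1)              # S217: center-point offset

    def forward(self, shallow_feat, deep_feat):
        # S213: bilinear interpolation doubles the deep feature map at little extra cost.
        up = F.interpolate(deep_feat, scale_factor=2, mode="bilinear", align_corners=False)
        # S214: pixel-by-pixel addition fuses the features without adding parameters
        # (assumes matching channel counts and spatial sizes).
        fused = shallow_feat + up
        return torch.sigmoid(self.heatmap(fused)), self.wh(fused), self.offset(fused)
```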
The loss function can be set as follows:
L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss
The cascading loss function includes the MSE loss function, the SSIM loss function, the focal loss function, and the L1 loss functions.
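A hedged sketch of such a cascading loss follows; the weighting coefficients are assumptions, the focal loss follows the common CenterNet-style formulation, and the SSIM term is left as a placeholder since its exact form is not reproduced above.

```python
# Hedged sketch of the cascading multi-task loss (weights and SSIM term are stand-ins).
import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style focal loss on predicted center heat maps (gt == 1 at object centers)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_term = -((1.0 - pred) ** alpha) * torch.log(pred + eps) * pos
    neg_term = -((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred + eps) * neg
    return (pos_term + neg_term).sum() / pos.sum().clamp(min=1.0)

def total_loss(pred, gt, lam1=1.0, lam2=0.1):
    """L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss."""
    l_mse = F.mse_loss(pred["dense"], gt["dense"])
    l_ssim = torch.zeros(())          # placeholder: an SSIM dissimilarity term would go here
    l_focal = focal_loss(pred["heatmap"], gt["heatmap"])
    l_wh = F.l1_loss(pred["wh"], gt["wh"])
    l_offset = F.l1_loss(pred["offset"], gt["offset"])
    return l_mse + lam1 * l_ssim + l_focal + lam2 * l_wh + l_offset
```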
The system 80 and the method 100 can be used for dense crowd flow monitoring, pedestrian detection and tracking, etc. The solution presented can be deployed on edge devices and can give artificial intelligence capabilities to traditional image acquisition devices.
A computer-readable medium is also provided in the present disclosure, storing computer-executable instructions which, upon execution by a computer, enable the computer to execute any of the methods presented in this disclosure.
A computer program is also provided, which, when executed by at least one processor, performs any of the methods presented in this disclosure.
While the present technique has been described in detail with reference to certain embodiments, it should be appreciated that the present technique is not limited to those precise embodiments. Rather, in view of the present disclosure, which describes exemplary modes for practicing the invention, many modifications and variations would present themselves to those skilled in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.

Claims (20)

  1. A method (100) for image processing, comprising:
    - acquiring (S101) an image (30) ;
    - extracting (S102) at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters;
    - executing (S103) N image processing tasks based on the at least one feature respectively, wherein for the ith image processing task, via an (i+1)th part of the neural network (40) with an (i+1)th set of parameters, N is an integer and N≥2, i=1..N.
  2. the method (100) according to claim 1, wherein
    - for combination of the first part and the second part of the neural network (40) , the first set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) .
  3. the method (100) according to claim 1, wherein
    - for combination of the first part and the second part of the neural network (40) , the second set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the (i+1)th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the (i+1)th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) and based on the (i+1)th set of parameters updated through training for combination of the first part and the (i+1)th part of the neural network (40) .
  4. the method (100) according to claim 1, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  5. the method (100) according to claim 1, wherein
    - one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprises: dense maps of different components of the target objects;
    - the method (100) further comprises: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  6. the method (100) according to claim 1, wherein one of the N image processing tasks is to output dense map of target objects, and the corresponding part of the neural network (40) comprises:
    - a scale-independent feature extraction module, and
    - a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  7. An apparatus (10) for image processing, comprising:
    - an image acquisition module (201) , configured to acquire an image (30) ;
    - a feature extraction module (202) , configured to extract at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters;
    - an image processing module (203) , configured to execute N image processing tasks based on the at least one feature respectively, wherein for the ith image processing task, via an (i+1)th part of the neural network (40) with an (i+1)th set of parameters, N is an integer and N≥2, i=1..N.
  8. the apparatus (10) according to claim 7, wherein
    - for combination of the first part and the second part of the neural network (40) , the first set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) .
  9. the apparatus (10) according to claim 7, wherein
    - for combination of the first part and the second part of the neural network (40) , the second set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the (i+1)th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the (i+1)th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) and based on the (i+1)th set of parameters updated through training for combination of the first part and the (i+1)th part of the neural network (40) .
  10. the apparatus (10) according to claim 7, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  11. the apparatus (10) according to claim 7, wherein
    - one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprises: dense maps of different components of the target objects;
    - the image processing module (203) is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  12. the apparatus (10) according to claim 7, wherein one of the N image processing tasks is to output dense map of target objects, and the corresponding part  of the neural network (40) comprises:
    - a scale-independent feature extraction module, and
    - a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  13. An apparatus (10) for image processing, comprising:
    - at least one processor (102) ;
    - at least one memory (101) , coupled to the at least one processor (102) , configured to execute method according to any of claims 1~6.
  14. A computer-readable medium for image processing, storing computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute method according to any of claims 1~6.
  15. A neural network (40) , comprising:
    - a first part (401) , configured to extract at least one feature of the image (30) with a first set of parameters;
    - a second part (402) to an (N+1)th part (40(N+1)) , configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)th part (40(i+1)) of the neural network (40) is configured to execute the ith image processing task with an (i+1)th set of parameters, N is an integer and N≥2, i=1..N.
  16. the neural network (40) according to claim 15, wherein
    - for combination of the first part (401) and the second part (402) of the neural network (40) , the first set of parameters are acquired through backward propagation,
    - for combination of the first part (401) and the (i+1)th part (40(i+1)) of the neural network (40) , i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the ith part (40i) of the neural network (40) , and
    - for combination of the first part (401) and a series of the second part (402) to the (N+1)th part (40(N+1)) of the neural network (40) , the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the (N+1)th part (40(N+1)) of the neural network (40) .
  17. the neural network (40) according to claim 15, wherein
    - for combination of the first part (401) and the second part (402) of the neural network (40) , the second set of parameters are acquired through backward propagation,
    - for combination of the first part (401) and the (i+1)th part (40(i+1)) of the neural network (40) , i>1, the (i+1)th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the ith part (40i) of the neural network (40) , and
    - for combination of the first part (401) and a series of the second part (402) to the (N+1)th part (40(N+1)) of the neural network (40) , the (i+1)th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the (N+1)th part (40(N+1)) of the neural network (40) and based on the (i+1)th set of parameters updated through training for combination of the first part (401) and the (i+1)th part (40(i+1)) of the neural network (40) .
  18. the neural network (40) according to claim 15, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  19. the neural network (40) according to claim 15, wherein
    - one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprises: dense maps of different components of the target objects;
    - the corresponding part of the neural network (40) is further configured to: count number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  20. the neural network (40) according to claim 15, wherein one of the N image processing tasks is to output dense map of target objects, and the corresponding part of the neural network (40) comprises:
    - a scale-independent feature extraction module, and
    - a step by step deconvolution module receiving output of the scale-independent feature extraction module.