WO2021237727A1 - Method and apparatus of image processing - Google Patents
- Publication number
- WO2021237727A1 WO2021237727A1 PCT/CN2020/093497 CN2020093497W WO2021237727A1 WO 2021237727 A1 WO2021237727 A1 WO 2021237727A1 CN 2020093497 W CN2020093497 W CN 2020093497W WO 2021237727 A1 WO2021237727 A1 WO 2021237727A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- parameters
- combination
- updated
- image processing
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
Definitions
- the present invention relates to techniques of computer vision and more particularly to a method, apparatus and computer-readable storage medium for image processing.
- crowd density analysis and pedestrian detection technology are widely used in security, smart buildings and other fields.
- Crowd counting and pedestrian detection in complex scenes are like the infrastructure in the field of computer vision, providing a perceptual basis for higher semantic and more complex tasks.
- crowd counting and pedestrian detection are often solved by different neural network models, and pedestrian detection networks often do not predict face parts separately.
- both crowd counting and pedestrian detection are required simultaneously.
- the invention is based on deep learning technology, especially convolutional neural network, and integrates crowd counting and pedestrian detection in a model by designing a dual-engine multi-tasking lightweight framework.
- the solutions provided can be used for other kinds of computer vision tasks.
- solutions provided in the present disclosure can save computing resources, reduce memory consumption, improve computing efficiency, which can also be deployed on low-cost edge devices.
- Embodiments of the present disclosure include methods, apparatuses for image processing.
- a method for image processing includes following steps:
- an apparatus for image processing includes:
- an image acquisition module configured to acquire an image
- a feature extraction module configured to extract at least one feature of the image via a first part of a neural network with a first set of parameters
- an apparatus for image processing includes at least one processor; at least one memory, coupled to the at least one processor, configured to execute the method according to the first aspect.
- a computer-readable medium for image processing stores computer-executable instructions, wherein the computer-executable instructions, when executed, cause at least one processor to execute the method according to the first aspect.
- a neural network which can be used in any above aspect of the present disclosure.
- the neural network includes:
- for combination of the first part and the second part of the neural network, the first set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)-th part of the neural network, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i-th part of the neural network; and for combination of the first part and a series of the second part to the (N+1)-th part of the neural network, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)-th part of the neural network.
- for combination of the first part and the second part of the neural network, the second set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)-th part of the neural network, i>1, the (i+1)-th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i-th part of the neural network; and for combination of the first part and a series of the second part to the (N+1)-th part of the neural network, the (i+1)-th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)-th part of the neural network and based on the (i+1)-th set of parameters updated through training for combination of the first part and the (i+1)-th part of the neural network.
- solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
- the neural network can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
- such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network has advanced performance on the basis of being lightweight.
- one of the N image processing tasks is to output dense map of target objects
- the outputs of the corresponding part of the neural network can include: dense maps of different components of the target objects; number of the target objects can be further counted based on weighted sum of the dense maps of different components of the target objects.
- the weights used can be decided based on engineering practice and/or through tests.
- one of the N image processing tasks is to output dense map of target objects
- the corresponding part of the neural network can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
- the step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores step by step to achieve a precise dense map. For example, there are four clusters of target objects on an image, without the scale-independent feature extraction and step by step deconvolution, the output dense map can only include four fuzzy clusters, no details of each cluster can be seen.
- FIG. 1 depicts a block diagram of an apparatus for image processing in accordance with one embodiment of the present disclosure.
- FIG. 2 depicts structure of a CNN in accordance with one embodiment of the present disclosure.
- FIG. 3 depicts training process of a CNN in accordance with one embodiment of the present disclosure.
- FIG. 4 depicts a flow diagram of a method for image processing in accordance with one embodiment of the present disclosure.
- FIG. 5 depicts an image processing system in accordance with one embodiment of the present disclosure.
- FIG. 6 depicts training process of the system shown in FIG. 5
- FIG. 7 depicts image processing process of the system shown in FIG. 5.
- the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements.
- the terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
- Image processing solutions are proposed in this disclosure, which can be used to execute multiple tasks via a single neural network, such as the mentioned crowd counting and pedestrian detection. Now the present disclosure will be described hereinafter in details by referring to FIG. 1 to FIG. 7.
- FIG. 1 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure.
- the apparatus 10 for image processing presented in the present disclosure can be implemented as a network of computer processors, to execute the following method 100 for image processing presented in the present disclosure.
- the apparatus 10 can also be a single computer, as shown in FIG. 1, including at least one memory 101, which includes computer-readable medium, such as a random access memory (RAM) .
- the apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101.
- Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein.
- the at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC) , a digital signal processor (DSP) , a central processing unit (CPU) , a graphics processing unit (GPU) , state machines, etc.
- embodiments of computer-readable medium include, but not limited to a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions.
- various other forms of computer-readable medium may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless.
- the instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
- the at least one memory 101 shown in FIG. 1 can contain an image processing program 20, when executed by the at least one processor 102, causing the at least one processor 102 to execute the method 100 for image processing presented in the present disclosure.
- the image processing program 20 can include:
- an image acquisition module 201 configured to acquire an image 30
- a feature extraction module 202 configured to extract at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters
- Image 30 to be processed can be taken by a camera 70 and sent to the apparatus 10 via the communication module 103 shown in the FIG. 1.
- the image 30 can also be stored in the at least one memory 101.
- the online training process of the neural network 40 can be executed with large amounts of data by a server 60, such as a high performance GPU server.
- the file of neural network 40 (including parameters of each part of the neural network 40) can be transmitted via the communication module 103 to the apparatus 10 and also can be stored in the at least one memory 101, then the neural network 40 can be deployed on apparatus 10.
- the neural network 40 can be a CNN.
- the online training process can also be executed on the apparatus 10, depending on device configuration and processing capability.
- the online training program can be part of the image processing program 20 and can be pre-stored in the at least one memory 101.
- Multiple tasks can be executed via the same CNN, which can save computing resources, and such processing may also comply with service logic.
- Such computing resource saving solutions can also make the apparatus 10 applicable to deploy on an edge device.
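As a toy sketch (not the patented model), the resource saving can be pictured as a single shared backbone feeding several task heads, so the costly feature extraction runs only once per image; all names and the tiny stand-in computations below are illustrative assumptions:

```python
# Toy sketch (not the patented model): a single shared backbone feeds
# several task heads, so the costly feature extraction runs only once
# per image, which is what saves computing resources on an edge device.
def backbone(image):
    # stand-in for the first part of the network: per-row responses
    return [sum(row) for row in image]

def head_density(features):
    # stand-in for the crowd-counting head: a scalar "count"
    return sum(features)

def head_detection(features):
    # stand-in for the detection head: flag strong responses
    return [f > 2 for f in features]

def run_multitask(image):
    feats = backbone(image)   # computed once, shared by both heads
    return head_density(feats), head_detection(feats)

count, flags = run_multitask([[1, 2], [0, 1]])
```

Running both heads on the cached features is what distinguishes this layout from two separate single-task networks.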
- the neural network 40 can include:
- a first part 401 configured to extract at least one feature of the image 30 with a first set of parameters 51;
- the first part 401 can extract shallow feature (s) and optionally, some of the second part 402 to the (N+1) th part 40 (N+1) can further extract deep feature (s) .
- backward propagation can be executed for training each part of the neural network 40.
- First different parts corresponding to different image processing tasks are trained independently, and finally the overall fine-tuning training process is performed.
- first set of parameters 51 can be updated.
- parameters of each part can also be updated.
- for combination of the first part 401 and the second part 402 of the neural network 40, the first set of parameters are acquired through backward propagation with a large amount of image samples; for combination of the first part 401 and the (i+1)-th part 40 (i+1) of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i-th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1)-th part 40 (N+1) of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1)-th part 40 (N+1) of the neural network 40.
- for combination of the first part 401 and the second part 402 of the neural network 40, the second set of parameters are acquired through backward propagation; for combination of the first part 401 and the (i+1)-th part 40 (i+1) of the neural network 40, i>1, the (i+1)-th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i-th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1)-th part 40 (N+1) of the neural network 40, the (i+1)-th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1)-th part 40 (N+1) of the neural network 40 and based on the (i+1)-th set of parameters updated through training for combination of the first part 401
- the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
- one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the image processing module 203 is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise.
- the weights used can be decided based on engineering practice and/or through tests.
- one of the N image processing tasks is to output dense map of target objects
- the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
- the scale-independent feature extraction module can be configured to extract features in multiple different scales
- the step by step deconvolution module can include multiple pairs of convolution and deconvolution modules, wherein each pair corresponds to an up sample procedure.
- the step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores step by step to achieve a precise dense map. For example, there are four clusters of target objects on an image, without the scale-independent feature extraction and step by step deconvolution, the output dense map can only include four fuzzy clusters, no details of each cluster can be seen.
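A minimal sketch of the upsampling idea, assuming a 1-D transposed convolution (deconvolution) with stride 2 and no padding; the real module applies pairs of 2-D convolutions and deconvolutions, each pair doubling the resolution:

```python
# Minimal 1-D transposed convolution (deconvolution) sketch with
# stride 2 and no padding; the real module works on 2-D feature maps
# and pairs each deconvolution with a convolution.
def deconv1d(x, kernel, stride=2):
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for j, k in enumerate(kernel):
            out[i * stride + j] += v * k   # scatter input into output
    return out

up = deconv1d([1.0, 2.0], [1.0, 1.0])      # coarse map restored to 2x length
```

Each input value is scattered across stride-spaced output positions, which is why repeating such pairs step by step restores a dense map with fine detail rather than four fuzzy clusters.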
- the image acquisition module 201, the feature extraction module 202 and the image processing module 203 are described above as software modules of the image processing program 20. Also, they can be implemented via hardware, such as ASIC chips. They can be integrated into one chip, or separately implemented and electrically connected.
- The architecture shown in FIG. 1 is merely exemplary and is used to explain the exemplary method 100 shown in FIG. 4.
- One exemplary method 100 according to the present disclosure includes following steps:
- S102 extracting at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters
- for combination of the first part and the second part of the neural network 40, the first set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)-th part of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i-th part of the neural network 40; and for combination of the first part and a series of the second part to the (N+1)-th part of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)-th part of the neural network 40.
- for combination of the first part and the second part of the neural network 40, the second set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)-th part of the neural network 40, i>1, the (i+1)-th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i-th part of the neural network 40; and for combination of the first part and a series of the second part to the (N+1)-th part of the neural network 40, the (i+1)-th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)-th part of the neural network 40 and based on the (i+1)-th set of parameters updated through training for combination of the first part and the (i+1)-th part of the neural network 40.
- the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
- one of the N image processing tasks is to output dense map of target objects
- the outputs of the corresponding part of the neural network 40 can include: dense maps of different components of the target objects
- the method 100 can further include: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
- one of the N image processing tasks is to output dense map of target objects
- the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
- a lightweight crowd counting and pedestrian detection system 80 for edge device is provided.
- the system 80 can include:
- an online training module 801 configured to train a neural network 813, used for image processing
- an offline module 802 configured to execute image processing via neural network 813.
- the neural network model 813 can be trained with a large amount of image samples 811 via a server 81, such as a high performance GPU server. After training, the model of the neural network 813 can be deployed on the edge device 83 of the offline module 802, and includes the feature extraction 831, fuzzy engine 832 and refinement engine 833.
- a camera 82 can be connected to the edge device 83, and the neural network 812 can be running on the edge device 83.
- features can be extracted by feature extraction 831 from image to be processed, then fuzzy engine 832 will output dense map and refinement engine 833 will execute pedestrian detection.
- the output dense map can be further processed by crowd counting 834 to count number of pedestrians, and by crowd flow analysis 835 to analyze the crowd flow.
- the output bounding boxes of pedestrians and optional faces can be further processed by re-identification 836, such as finding a specific person in an image, or finding people wearing masks.
- at step 1, the image samples are input into the neural network 812, and the loss functions can be set as MSE and SSIM; parameter set 1 for the feature extraction 831 and parameter set 2 for the fuzzy engine 832 can be acquired through backward propagation.
- at step 2, the image samples are input into the neural network 812, and the loss functions can be set as focal loss and L1 loss; through backward propagation, parameter set 1 will be updated to parameter set 1’ based on parameter set 1, and parameter set 3 for the refinement engine 833 will be acquired.
- at step 3, the image samples are input into the neural network 812, and the loss functions can be set as a weighted sum of MSE, SSIM, focal loss and L1 loss.
- the losses of step 1 and step 2 are added, and the parameters acquired during the first two steps will be used during step 3.
- after the first two steps, the parameter sets are almost ready, so step 3 is only a slight adjustment, which makes the training process converge fast; computing power can be saved and the whole training process can be shortened.
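The three-step schedule described above can be illustrated with toy scalar parameters and quadratic stand-in losses (not the real MSE/SSIM/focal losses): step 2 starts from the set trained in step 1, so by step 3 every parameter set is already close to a good solution:

```python
# Toy illustration of the three-step schedule: scalar stand-in
# parameter "sets" trained by plain gradient descent on quadratic
# stand-in losses. All targets and learning rates are illustrative.
def sgd(params, grad_fn, steps=200, lr=0.05):
    for _ in range(steps):
        grads = grad_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

# step 1: shared "set 1" plus fuzzy-engine "set 2", loss (s1 + s2 - 2)^2
grad_step1 = lambda p: [2 * (p[0] + p[1] - 2.0)] * 2
set1, set2 = sgd([0.0, 0.0], grad_step1)

# step 2: reuse set 1 as the starting point; train "set 1'" together
# with refinement-engine "set 3" on a second loss (s1 + s3 - 3)^2
grad_step2 = lambda p: [2 * (p[0] + p[1] - 3.0)] * 2
set1p, set3 = sgd([set1, 0.0], grad_step2)
# step 3 (joint fine-tuning) would start from set1', set2 and set3,
# which are already near-optimal, so it converges quickly.
```

The warm start in step 2 is the point: reusing the parameters trained in the previous combination keeps each stage a small adjustment rather than training from scratch.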
- parameter set 1, parameter set 2’ and parameter set 3’ will be used for image processing.
- FIG. 7 shows workflow of inference phase of the use case.
- step S201 basic convolution is executed.
- a convolution kernel with a stride can be used for the convolution operation, and a ReLU function can be used for activation, a batchnorm2d can be used for normalization, then a max pooling layer can be used.
- a feature map with a quarter size of the original image can be obtained at this step, and amount of operations can be largely reduced.
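A small size-bookkeeping sketch, assuming the stride-2 convolution uses "same"-style padding and the max pooling also has stride 2 (both assumptions, since the patent does not give kernel details), shows why the feature map is a quarter the size of the original image:

```python
# Size bookkeeping for step S201, assuming "same"-style padding for the
# stride-2 convolution and a stride-2 max pooling afterwards.
def conv_out(size, stride=2):
    return (size + stride - 1) // stride     # ceil(size / stride)

def s201_output(h, w):
    h, w = conv_out(h), conv_out(w)          # stride-2 convolution: 1/2
    h, w = conv_out(h), conv_out(w)          # stride-2 max pooling: 1/4
    return h, w
```

Halving twice yields the quarter-size map, and every later layer then processes roughly 1/16 as many pixels, which is where the large reduction in operations comes from.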
- a residual block can be used, features can be further extracted.
- an improved h-swish function can be used as the activation function.
- the h-swish activation function is as follows: h-swish (x) = x·ReLU6 (x+3) /6.
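For reference, a sketch of the baseline h-swish activation (the MobileNetV3 form); the "improved" variant mentioned in the patent is not spelled out, so only the standard definition is shown:

```python
# Baseline h-swish (MobileNetV3 form); the patent's "improved" variant
# is not detailed, so this shows only the standard definition.
def relu6(x):
    return min(max(x, 0.0), 6.0)

def h_swish(x):
    # h-swish(x) = x * ReLU6(x + 3) / 6
    return x * relu6(x + 3.0) / 6.0
```

Unlike the sigmoid-based swish, this form needs only clamping and multiplication, which keeps it cheap on edge hardware.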
- different kinds of features can be extracted by steps S202 and S203 respectively.
- other residual blocks can also be used to extract other kind of feature (s) here.
- a squeeze-and-excitation bottleneck can be used.
- an improved h-swish function can also be used as the activation function.
- A feature attention mechanism can be used to make the neural network 812 learn more effective abstract features. Dimensions can be decreased, the amount of computation can be reduced, and important features can be found.
- the output of step S205 will be used at steps S206, S207 and S208 respectively. At these steps, different scales of features will be extracted. Then, with repetition of steps S210 and S211, step-by-step deconvolution operations can be made to restore a precise dense map with not only outlines but also detailed information.
- a convolution operation can be made.
- no activation function is used here, and normalization can use batchnorm2d.
- the scale-independent feature extraction module 8321 can include two kernel convolution modules.
- the convolution modules both contain a batchnorm2d layer.
- the first module can use a h-swish activation function, the second module may not use an activation function.
- the number of feature channels in the output can be a quarter of the input.
- feature stitching techniques can be used to obtain feature maps whose number does not increase.
- convolution kernel and a ReLU activation function can be used, instancenorm2d can be used as a normalization layer.
- a deconvolution kernel can be used, and instancenorm2d can be used as a normalization layer.
- bodies and heads can be predicted based on the dense maps.
- the final number of people can be obtained by cascading the two results.
- the formula can be as follows: C = α1·D_body + α2·D_head
- C is the result of crowd counting.
- D_body is the sum of the body dense map.
- D_head is the sum of the head dense map.
- α1, α2 can be set according to the scene. Such a weighted sum can effectively eliminate the influence of people overlapping each other, with which the final result can be more precise.
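The weighted-sum counting described above can be sketched as follows; the weight values and map sizes are illustrative placeholders, not values from the patent:

```python
# Weighted-sum counting over the component dense maps; alpha1 and
# alpha2 are scene-dependent weights (illustrative values here).
def crowd_count(body_map, head_map, alpha1=0.5, alpha2=0.5):
    d_body = sum(sum(row) for row in body_map)   # sum of body dense map
    d_head = sum(sum(row) for row in head_map)   # sum of head dense map
    return alpha1 * d_body + alpha2 * d_head

count = crowd_count([[1.0, 2.0]], [[3.0, 1.0]])  # 0.5*3.0 + 0.5*4.0
```

When bodies occlude each other, the head map still integrates to roughly the true count, so blending the two sums is more robust than either alone.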
- a bilinear interpolation can be used to get the twice enlarged feature maps without increasing the calculation amount.
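A parameter-free 2x bilinear upsampling can be sketched as below; align-corners sampling and an input of at least 2x2 are assumptions of this sketch, not statements from the patent:

```python
# Parameter-free 2x bilinear upsampling sketch (align-corners style
# sampling is an assumption; the input grid must be at least 2x2).
def bilinear_x2(grid):
    h, w = len(grid), len(grid[0])
    oh, ow = 2 * h, 2 * w
    out = []
    for i in range(oh):
        y = i * (h - 1) / (oh - 1)       # source row coordinate
        y0 = min(int(y), h - 2)
        fy = y - y0
        row = []
        for j in range(ow):
            x = j * (w - 1) / (ow - 1)   # source column coordinate
            x0 = min(int(x), w - 2)
            fx = x - x0
            row.append(grid[y0][x0] * (1 - fy) * (1 - fx)
                       + grid[y0][x0 + 1] * (1 - fy) * fx
                       + grid[y0 + 1][x0] * fy * (1 - fx)
                       + grid[y0 + 1][x0 + 1] * fy * fx)
        out.append(row)
    return out

up = bilinear_x2([[0.0, 1.0], [2.0, 3.0]])
```

Because the interpolation weights are fixed by geometry, this enlargement adds no learnable parameters, in contrast to a deconvolution layer.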
- step S214 without adding calculation parameters, pixel-by-pixel addition can be used to do feature fusion.
- target center points of pedestrians and faces can be predicted and predictions can be made in form of heat maps.
- step S216 on the feature map with a quarter size of the original image, height and width of the target can be predicted.
- step S217 on the feature map with a quarter size of the original image, offset of the target center point can be predicted.
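Putting steps S215 to S217 together, a CenterNet-style decoding can be sketched as below; the function name, the assumption that width/height are predicted in original-image pixels, and the downsampling factor of 4 (from the quarter-size feature map) are illustrative:

```python
# Hypothetical decoding for steps S215-S217: a target center (cx, cy)
# found on the quarter-size heat map, plus the predicted offset and
# width/height, gives a bounding box in original image coordinates.
# Names and decoding details are illustrative assumptions.
def decode_box(cx, cy, off_x, off_y, bw, bh, down=4):
    x = (cx + off_x) * down   # center x in the original image
    y = (cy + off_y) * down   # center y in the original image
    return (x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2)

box = decode_box(10, 8, 0.25, 0.5, 40.0, 80.0)
```

The sub-pixel offset compensates for the quantization introduced by locating centers on a map that is four times smaller than the input.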
- the loss function can be set as follows:
- L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss
- the cascading loss function includes MSE loss function, SSIM loss function, focal loss function, and L1 loss function.
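The cascading loss can be sketched as a plain weighted sum; the component loss values and the lambda weights below are stand-in placeholders, not values from the patent:

```python
# Stand-in composition of the cascading loss; the component losses are
# placeholder scalars here and lambda1/lambda2 are tunable weights.
def total_loss(l_mse, l_ssim, l_focal, l_wh_l1, l_offset_l1,
               lambda1=1.0, lambda2=0.1):
    return (l_mse + lambda1 * l_ssim + l_focal
            + lambda2 * l_wh_l1 + l_offset_l1)

total = total_loss(0.5, 0.2, 0.3, 1.0, 0.4)
```

Combining all terms into one scalar is what lets step 3 fine-tune the density and detection branches jointly through a single backward pass.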
- the system 80 and method 100 can be used for dense crowd flow monitoring, pedestrian detection and tracking, etc.
- the solution presented can be deployed on edge devices, and can give artificial intelligence capabilities to traditional image acquisition devices.
- a computer-readable medium is also provided in the present disclosure, storing computer-executable instructions, which upon execution by a computer, enables the computer to execute any of the methods presented in this disclosure.
- a computer program is also presented, which, when executed by at least one processor, performs any of the methods presented in this disclosure.
Abstract
A method, apparatus, system and computer-readable medium for image processing are presented. A method includes: acquiring (S101) an image (30); extracting (S102) at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters; and executing (S103) N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network (40) with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
Description
The present invention relates to techniques of computer vision and more particularly to a method, apparatus and computer-readable storage medium for image processing.
Background Art
With the development of artificial intelligence technology, crowd density analysis and pedestrian detection technology are widely used in security, smart buildings and other fields. Crowd counting and pedestrian detection in complex scenes are like the infrastructure in the field of computer vision, providing a perceptual basis for higher-semantic and more complex tasks, such as pedestrian recognition, pedestrian flow estimation, video structured analysis, etc.
Summary of the Invention
In common solutions, crowd counting and pedestrian detection are often solved by different neural network models, and pedestrian detection networks often do not predict face parts separately. However, for an application scenario, both crowd counting and pedestrian detection are usually required simultaneously. The invention is based on deep learning technology, especially convolutional neural networks, and integrates crowd counting and pedestrian detection in one model by designing a dual-engine multi-tasking lightweight framework. The solutions provided can also be used for other kinds of computer vision tasks. Compared with current solutions based on separate models, solutions provided in the present disclosure can save computing resources, reduce memory consumption and improve computing efficiency, and can also be deployed on low-cost edge devices.
Embodiments of the present disclosure include methods, apparatuses for image processing.
According to a first aspect of the present disclosure, a method for image processing is presented. The method includes the following steps:
- acquiring an image;
- extracting at least one feature of the image via a first part of a neural network with a first set of parameters;
- executing N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network (40) with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
According to a second aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes:
- an image acquisition module, configured to acquire an image;
- a feature extraction module, configured to extract at least one feature of the image via a first part of a neural network with a first set of parameters;
- an image processing module, configured to execute N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
According to a third aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes at least one processor; at least one memory, coupled to the at least one processor, configured to execute method according to the first aspect.
According to a fourth aspect of the present disclosure, a computer-readable medium for image processing is presented. The computer-readable medium stores computer-executable instructions, wherein the computer-executable instructions, when executed, cause at least one processor to execute the method according to the first aspect.
According to a fifth aspect of the present disclosure, a neural network is presented which can be used in any above aspect of the present disclosure. The neural network includes:
- a first part, configured to extract at least one feature of the image with a first set of parameters;
- a second part to an (N+1)-th part (40 (N+1)), configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)-th part (40 (i+1)) of the neural network (40) is configured to execute the i-th image processing task with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
With solutions provided in the present disclosure, multiple tasks can be executed via a single neural network, which can save computing resources, and such processing may also comply with service logic. Such computing resource saving solutions can also make the solutions applicable to deploy on an edge device.
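The single-network, multi-task structure described above can be illustrated with a minimal numpy sketch (the linear maps below are hypothetical stand-ins for the real convolutional parts; sizes are made up): the first part computes shared features once, and each of the N task-specific parts consumes the same features with its own parameter set.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_part(image, w_shared):
    # Shared feature extraction: one matrix multiply stands in for the backbone.
    return np.maximum(image @ w_shared, 0.0)  # ReLU-like nonlinearity

def task_part(features, w_task):
    # The (i+1)-th part: a task head with its own parameter set.
    return features @ w_task

image = rng.normal(size=(1, 16))           # flattened toy "image"
w_shared = rng.normal(size=(16, 8))        # first set of parameters
heads = [rng.normal(size=(8, 4)), rng.normal(size=(8, 2))]  # N = 2 task heads

features = first_part(image, w_shared)     # computed once
outputs = [task_part(features, w) for w in heads]  # N tasks share the features

print([o.shape for o in outputs])
```

Because the feature extraction runs once regardless of N, adding a task costs only its head, which is the source of the computing-resource saving claimed above.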
Optionally, for the combination of the first part and the second part of the neural network, the first set of parameters are acquired through backward propagation; for the combination of the first part and the (i+1)-th part of the neural network, i>1, the first set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network; and for the combination of the first part and a series of the second part to the (N+1)-th part of the neural network, the first set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network.
Optionally, for the combination of the first part and the second part of the neural network, the second set of parameters are acquired through backward propagation; for the combination of the first part and the (i+1)-th part of the neural network, i>1, the (i+1)-th set of parameters are acquired through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network; and for the combination of the first part and a series of the second part to the (N+1)-th part of the neural network, the (i+1)-th set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network and based on the (i+1)-th set of parameters updated through training for the combination of the first part and the (i+1)-th part of the neural network.
Compared with the current backward propagation process, in which only the combination of all parts is involved, the solutions presented herein can obtain very close parameters during each combination of the first part and another part, which converges quickly, saves computing power and speeds up the training process.
Optionally, the neural network can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module and a step-by-step deconvolution module. Such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network has advanced performance while remaining lightweight.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the outputs of the corresponding part of the neural network can include dense maps of different components of the target objects; the number of the target objects can then be counted based on a weighted sum of the dense maps of the different components. In case of objects overlapping each other, such an optional solution can make the counting result more precise. The weights used can be decided based on engineering practice and/or through tests.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network can include: a scale-independent feature extraction module, and a step-by-step deconvolution module receiving the output of the scale-independent feature extraction module. The step-by-step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step-by-step deconvolution, the output dense map would only include four fuzzy clusters, and no details of each cluster could be seen.
The above mentioned attributes and other features and advantages of the present technique and the manner of attaining them will become more apparent and the present technique itself will be better understood by reference to the following description of embodiments of the present technique taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram of an apparatus for image processing in accordance with one embodiment of the present disclosure.
FIG. 2 depicts structure of a CNN in accordance with one embodiment of the present disclosure.
FIG. 3 depicts training process of a CNN in accordance with one embodiment of the present disclosure.
FIG. 4 depicts a flow diagram of a method for image processing in accordance with one embodiment of the present disclosure.
FIG. 5 depicts an image processing system in accordance with one embodiment of the present disclosure.
FIG. 6 depicts training process of the system shown in FIG. 5
FIG. 7 depicts image processing process of the system shown in FIG. 5.
Reference Numbers:
10, an apparatus for image processing
101, at least one memory
102, at least one processor
103, a communication module
20, an image processing program
201, an image acquisition module
202, a feature extraction module
203, an image processing module
30, image acquired and to be processed
40, a neural network
401, a first part of the neural network 40
402, a second part of the neural network 40
40i, an i-th part of the neural network 40
51, a first set of parameters of the first part 401 of the neural network 40
52, a second set of parameters of the second part 402 of the neural network 40
5i, an i-th set of parameters of the i-th part 40i of the neural network 40
60, a training server
70, a camera
100, a method for image processing
S101~S103, steps of method 100
80, a lightweight crowd counting and pedestrian detection system
801, online training module
802, offline module
81, a server
82, a camera
83, an edge device
811, image samples
812, neural network
813, loss function
831, feature extraction
832, fuzzy engine
833, refinement engine
834, crowd counting
835, crowd flow analysis
836, re-identification
200, a method for image processing
S201~S217, steps of method 200
Detailed Description of Example Embodiments
Hereinafter, above-mentioned and other features of the present technique are described in detail. Various embodiments are described with reference to the drawing, where like reference numerals are used to refer to like elements throughout. In the following description, for purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to explain, and not to limit the invention. It may be evident that such embodiments may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Image processing solutions are proposed in this disclosure, which can be used to execute multiple tasks via a single neural network, such as the mentioned crowd counting and pedestrian detection. Now the present disclosure will be described hereinafter in detail by referring to FIG. 1 to FIG. 7.
FIG. 1 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure. The apparatus 10 for image processing presented in the present disclosure can be implemented as a network of computer processors, to execute the following method 100 for image processing presented in the present disclosure. The apparatus 10 can also be a single computer, as shown in FIG. 1, including at least one memory 101, which includes a computer-readable medium, such as a random access memory (RAM). The apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), state machines, etc. Embodiments of computer-readable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in FIG. 1 can contain an image processing program 20 which, when executed by the at least one processor 102, causes the at least one processor 102 to execute the method 100 for image processing presented in the present disclosure. The image processing program 20 can include:
- an image acquisition module 201, configured to acquire an image 30;
- a feature extraction module 202, configured to extract at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters;
- an image processing module 203, configured to execute N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network 40 with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
The online training process of the neural network 40 can be executed with large amounts of data by a training server 60, such as a high-performance GPU server. After training, the file of the neural network 40 (including the parameters of each part of the neural network 40) can be transmitted via the communication module 103 to the apparatus 10 and stored in the at least one memory 101; the neural network 40 can then be deployed on the apparatus 10. The neural network 40 can be a CNN.
However, the online training process can also be executed on the apparatus 10, depending on the device configuration and processing capability. In such a case, the online training program can be part of the image processing program 20 and can be pre-stored in the at least one memory 101.
Multiple tasks can be executed via the same CNN, which can save computing resources, and such processing may also comply with service logic. Such computing resource saving solutions can also make the apparatus 10 applicable to deploy on an edge device.
As shown in FIG. 2, the neural network 40 can include:
- a first part 401, configured to extract at least one feature of the image 30 with a first set of parameters 51;
- a second part 402 to an (N+1)-th part 40 (N+1), configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)-th part 40 (i+1) of the neural network 40 is configured to execute the i-th image processing task with an (i+1)-th set of parameters 5 (i+1), N is an integer and N≥2, i=1..N.
It should be mentioned that the first part 401 can extract shallow feature(s), and optionally, some of the second part 402 to the (N+1)-th part 40 (N+1) can further extract deep feature(s).
Now referring to FIG. 3, backward propagation can be executed for training each part of the neural network 40. First, the different parts corresponding to different image processing tasks are trained independently, and finally an overall fine-tuning training process is performed. With the combination of the first part 401 and each of the other parts 40 (i+1), the first set of parameters 51 can be updated. And with the combination of all parts, the parameters of each part can also be updated. Compared with the current backward propagation process, in which only the combination of all parts is involved, the solutions presented herein can obtain very close parameters during each combination of the first part and another part, which converges quickly, saves computing power and speeds up the training process.
In detail, as to the first set of parameters: for the combination of the first part 401 and the second part 402 of the neural network 40, the first set of parameters are acquired through backward propagation with a large amount of image samples; for the combination of the first part 401 and the (i+1)-th part 40 (i+1) of the neural network 40, i>1, the first set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part 401 and the i-th part 40i of the neural network 40; and for the combination of the first part 401 and a series of the second part 402 to the (N+1)-th part 40 (N+1) of the neural network 40, the first set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part 401 and the (N+1)-th part 40 (N+1) of the neural network 40.
And, as to the second to the (N+1)-th sets of parameters: for the combination of the first part 401 and the second part 402 of the neural network 40, the second set of parameters are acquired through backward propagation; for the combination of the first part 401 and the (i+1)-th part 40 (i+1) of the neural network 40, i>1, the (i+1)-th set of parameters are acquired through backward propagation, based on the first set of parameters updated through training for the combination of the first part 401 and the i-th part 40i of the neural network 40; and for the combination of the first part 401 and a series of the second part 402 to the (N+1)-th part 40 (N+1) of the neural network 40, the (i+1)-th set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part 401 and the (N+1)-th part 40 (N+1) of the neural network 40 and based on the (i+1)-th set of parameters updated through training for the combination of the first part 401 and the (i+1)-th part 40 (i+1) of the neural network 40.
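The staged training order just described can be illustrated with a toy problem (an assumption-laden sketch, not the patent's actual networks): scalar parameters stand in for the shared first part and two task heads, and plain gradient descent stands in for backward propagation. Stage 1 trains the shared part with head A; stage 2 trains head B starting from the already-updated shared parameters; stage 3 jointly fine-tunes everything.

```python
import numpy as np

t1, t2 = 2.0, 3.0          # made-up training targets for the two tasks
w, a, b = 0.5, 0.5, 0.5    # shared params, head A params, head B params
lr = 0.05

def grads(w, h, t):
    # Gradients of the per-task loss (w*h - t)^2 w.r.t. w and h.
    e = w * h - t
    return 2 * e * h, 2 * e * w

# Stage 1: combination of the shared part and head A.
for _ in range(500):
    gw, ga = grads(w, a, t1)
    w, a = w - lr * gw, a - lr * ga

# Stage 2: combination of the shared part and head B,
# starting from the w already updated in stage 1.
for _ in range(500):
    gw, gb = grads(w, b, t2)
    w, b = w - lr * gw, b - lr * gb

# Stage 3: joint fine-tuning on the summed loss.
for _ in range(500):
    gw1, ga = grads(w, a, t1)
    gw2, gb = grads(w, b, t2)
    w, a, b = w - lr * (gw1 + gw2), a - lr * ga, b - lr * gb

print((w * a - t1) ** 2, (w * b - t2) ** 2)
```

Because stages 1 and 2 leave the parameters near a joint optimum, stage 3 only makes a slight adjustment, mirroring the fast-convergence argument above.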
Optionally, the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module and a step-by-step deconvolution module. Such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network 40 has advanced performance while remaining lightweight.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the image processing module 203 is further configured to count the number of the target objects based on a weighted sum of the dense maps of the different components. In case of objects overlapping each other, such an optional solution can make the counting result more precise. The weights used can be decided based on engineering practice and/or through tests.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step-by-step deconvolution module receiving the output of the scale-independent feature extraction module. The scale-independent feature extraction module can be configured to extract features at multiple different scales, and the step-by-step deconvolution module can include multiple pairs of convolution and deconvolution modules, wherein each pair corresponds to an upsampling procedure. The step-by-step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step-by-step deconvolution, the output dense map would only include four fuzzy clusters, and no details of each cluster could be seen.
Although the image acquisition module 201, the feature extraction module 202 and the image processing module 203 are described above as software modules of the image processing program 20, they can also be implemented via hardware, such as ASIC chips. They can be integrated into one chip, or separately implemented and electrically connected.
It should be mentioned that the present disclosure may include apparatuses having different architecture than shown in FIG. 1. The architecture above is merely exemplary and used to explain the exemplary method 100 shown in FIG. 4.
Various methods in accordance with the present disclosure may be carried out. One exemplary method 100 according to the present disclosure includes the following steps:
S101: acquiring an image 30;
S102: extracting at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters;
S103: executing N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network 40 with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
Optionally, for the combination of the first part and the second part of the neural network 40, the first set of parameters are acquired through backward propagation; for the combination of the first part and the (i+1)-th part of the neural network 40, i>1, the first set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network 40; and for the combination of the first part and a series of the second part to the (N+1)-th part of the neural network 40, the first set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network 40.
Optionally, for the combination of the first part and the second part of the neural network 40, the second set of parameters are acquired through backward propagation; for the combination of the first part and the (i+1)-th part of the neural network 40, i>1, the (i+1)-th set of parameters are acquired through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network 40; and for the combination of the first part and a series of the second part to the (N+1)-th part of the neural network 40, the (i+1)-th set of parameters are updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network 40 and based on the (i+1)-th set of parameters updated through training for the combination of the first part and the (i+1)-th part of the neural network 40.
Optionally, the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module and a step-by-step deconvolution module.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the method 100 can further include: counting the number of the target objects based on a weighted sum of the dense maps of the different components of the target objects.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step-by-step deconvolution module receiving the output of the scale-independent feature extraction module.
The following is a use case in which the solution provided in the present disclosure can be adopted. Referring to FIG. 5, in this use case, a lightweight crowd counting and pedestrian detection system 80 for edge devices is provided. The system 80 can include:
- an online training module 801, configured to train a neural network 812 used for image processing;
- an offline module 802, configured to execute image processing via the neural network 812.
As to the online training module 801, the neural network 812 can be trained with a large amount of image samples 811 via the server 81, such as a high-performance GPU server. After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, including the feature extraction 831, the fuzzy engine 832 and the refinement engine 833.
As to the offline module 802, a camera 82 can be connected to the edge device 83, and the neural network 812 can run on the edge device 83. Firstly, features can be extracted by the feature extraction 831 from the image to be processed; then the fuzzy engine 832 will output a dense map and the refinement engine 833 will execute pedestrian detection. The output dense map can be further processed by crowd counting 834 to count the number of pedestrians, and by crowd flow analysis 835 to analyze the crowd flow. The output bounding boxes of pedestrians and optional faces can be further processed by re-identification 836, for example to find a specific person in an image, or to find people wearing masks.
Now referring to FIG. 6, during the training phase, firstly, at step 1, the image samples are input into the neural network 812, the loss functions can be set as MSE and SSIM, and parameter set 1 for the feature extraction 831 and parameter set 2 for the fuzzy engine 832 can be acquired through backward propagation.
Then, at step 2, the image samples are input into the neural network 812, and the loss functions can be set as focal loss and L1 loss; through backward propagation, parameter set 1 will be updated to parameter set 1’ based on parameter set 1, and parameter set 3 for the refinement engine 833 will be acquired.
At last, at step 3, the image samples are input into the neural network 812, and the loss function can be set as a weighted sum of MSE, SSIM, focal loss and L1 loss. Through backward propagation, parameter set 1’ will be updated to parameter set 1”, parameter set 2 will be updated to parameter set 2’, and parameter set 3 will be updated to parameter set 3’.
In comparison to the current backward propagation process, step 1 and step 2 are added, and the parameters acquired during the first two steps are used during step 3. In fact, after step 1 and step 2, the parameter sets are almost ready, so step 3 is only a slight adjustment, which allows the training process to converge quickly, saves computing power and shortens the whole training process.
During the inference phase, parameter set 1”, parameter set 2’ and parameter set 3’ will be used for image processing.
FIG. 7 shows the workflow of the inference phase of the use case.
At step S201, basic convolution is executed. A convolution kernel with a stride can be used for the convolution operation, a ReLU function can be used for activation, batchnorm2d can be used for normalization, and then a max pooling layer can be used. A feature map with a quarter of the size of the original image can be obtained at this step, and the amount of operations can be largely reduced.
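The quarter-size claim of step S201 can be checked with simple shape arithmetic (kernel size, stride and padding below are assumptions, since the patent does not specify them): a stride-2 convolution halves each spatial dimension, and a stride-2 max pooling halves it again.

```python
import numpy as np

def conv_out(size, kernel=3, stride=2, padding=1):
    # Standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

def maxpool2x2(x):
    # Non-overlapping 2x2 max pooling via reshape.
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

H = 512                            # hypothetical input image side length
h1 = conv_out(H)                   # after the stride-2 basic convolution
feat = np.arange(h1 * h1, dtype=float).reshape(h1, h1)
pooled = maxpool2x2(feat)          # after the stride-2 max pooling

print(H, h1, pooled.shape)
```

The resulting side length is H/4, i.e. the feature map has a quarter of the original image's width and height.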
At step S202, a residual block can be used and features can be further extracted. The following improved h-swish function can be used as the activation function.
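The exact form of the patent's improved h-swish is not reproduced in this text, so the sketch below shows the standard h-swish on which such variants are based, h-swish(x) = x · ReLU6(x + 3) / 6; treat it as an assumption, not the patent's precise formula.

```python
import numpy as np

def relu6(x):
    # ReLU capped at 6.
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    # Standard h-swish: a piecewise-polynomial approximation of swish.
    return x * relu6(x + 3.0) / 6.0

x = np.array([-4.0, -3.0, 0.0, 1.0, 3.0])
print(h_swish(x))
```

For x ≤ −3 the output is 0, for x ≥ 3 it equals x, and in between it interpolates smoothly, which is why it is cheap to compute on edge devices.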
At step S203, a residual block can also be used, and features can be further extracted. Also, an improved h-swish function can be used as the activation function.
Different kinds of features can be extracted by steps S202 and S203 respectively. Optionally, other residual blocks can also be used to extract other kind(s) of feature(s) here.
At step S204, a squeeze-and-excitation bottleneck can be used. Also, an improved h-swish function can be used as the activation function. A feature attention mechanism can be used to make the neural network 812 learn more effective abstract features. Dimensions can be decreased and the amount of computation can be reduced, while important features can be found.
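The squeeze-and-excitation idea of step S204 can be sketched in numpy (channel counts and reduction ratio are hypothetical): global-average-pool each channel ("squeeze"), pass the result through a small bottleneck MLP, and use a sigmoid gate to reweight the channels ("excitation").

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    # x: (C, H, W) feature map.
    s = x.mean(axis=(1, 2))              # squeeze: per-channel statistic, (C,)
    z = np.maximum(s @ w1, 0.0)          # dimension reduction + ReLU
    gate = sigmoid(z @ w2)               # excitation weights in (0, 1), (C,)
    return x * gate[:, None, None]       # channel-wise reweighting

C, H, W, r = 8, 4, 4, 2                  # channels, height, width, reduction
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C, C // r))        # C -> C/r bottleneck weights
w2 = rng.normal(size=(C // r, C))        # C/r -> C expansion weights
y = se_block(x, w1, w2)
print(y.shape)
```

Because the gate lies in (0, 1), the block can only attenuate channels, which is how it emphasizes the important features the text mentions.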
At step S205, a squeeze-and-excitation bottleneck can be used. Also, an improved h-swish function can also be used as the activation function.
Features extracted at step S205 will be output to the fuzzy engine 832 for acquiring the dense map. Features extracted at steps S202, S203, S204 and S205 will be output to the refinement engine 833 for pedestrian detection.
Now we come to the fuzzy engine 832. Features output by step S205 will be used at steps S206, S207 and S208 respectively. At these steps, different scales of features will be extracted. Then, with repetition of steps S210 and S211, step-by-step deconvolution operations can be made to restore a precise dense map with not only outlines but also detailed information.
Step S206 is a part of the scale-independent feature extraction module 8321; a convolution operation can be made here. Optionally, no activation function is used, and batchnorm2d can be used for normalization.
Step S207 is also a part of the scale-independent feature extraction module 8321 and can include two kernel convolution modules. The convolution modules both contain a batchnorm2d layer. The first module can use an h-swish activation function; the second module may not use an activation function. The number of feature channels in the output can be a quarter of that of the input.
Step S208 is also a part of the scale-independent feature extraction module 8321 and can include two kernel convolution modules. The convolution modules can both include a batchnorm2d layer. The first module can use the h-swish activation function; the second module may not use an activation function. The number of feature channels in the output can be a quarter of that of the input.
At step S209, feature stitching techniques can be used to obtain feature maps which do not increase in number.
At step S210, a convolution kernel and a ReLU activation function can be used, and instancenorm2d can be used as a normalization layer.
At the step S211, a deconvolution kernel can be used, and instancenorm2d can be used as a normalization layer.
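The repeated S210/S211 pattern can be sketched as follows: each convolution + deconvolution pair performs one 2x upsampling, so repeating it restores the dense map step by step. In this toy sketch, nearest-neighbour repetition stands in for a learned stride-2 transposed convolution, which has the same shape behaviour; the sizes are made up.

```python
import numpy as np

def upsample2x(x):
    # A stride-2 transposed convolution doubles each spatial dimension;
    # nearest-neighbour repetition is a simple shape-equivalent stand-in.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

dense = np.ones((16, 16))        # hypothetical low-resolution dense map
for _ in range(2):               # two conv/deconv pairs -> 4x restoration
    dense = upsample2x(dense)

print(dense.shape)
```

Restoring resolution in several small steps, rather than one large one, is what lets the module keep both outlines and details, as the text argues.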
At step S212, bodies and heads can be predicted based on the dense maps. The final number of people can be obtained by combining the two results. The formula can be as follows:
C = λ1·D_body + λ2·D_head
Wherein C is the result of crowd counting, D_body is the sum of the body dense map, D_head is the sum of the head dense map, and λ1, λ2 can be set according to the scene. Such a weighted sum can effectively eliminate the influence of people overlapping each other, with which the final result can be more precise.
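The counting formula can be worked through with made-up numbers (the maps and λ weights below are illustrative only; in practice λ1 and λ2 are tuned per scene):

```python
import numpy as np

# C = λ1·D_body + λ2·D_head, where each D is the sum of a dense map.
body_map = np.full((4, 4), 0.5)   # hypothetical body dense map, sums to 8.0
head_map = np.full((4, 4), 0.25)  # hypothetical head dense map, sums to 4.0
lam1, lam2 = 0.5, 0.5             # scene-dependent weights (assumed values)

C = lam1 * body_map.sum() + lam2 * head_map.sum()
print(C)  # 0.5*8.0 + 0.5*4.0 = 6.0
```

When bodies occlude each other but heads remain visible (or vice versa), the two terms compensate for each other, which is why the weighted sum is more robust than either map alone.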
Now we come to the refinement engine 833.
At step S213, bilinear interpolation can be used to get twice-enlarged feature maps without increasing the calculation amount.
At step S214, without adding calculation parameters, pixel-by-pixel addition can be used for feature fusion.
At step S215, on the feature map with a quarter of the size of the original image, target center points of pedestrians and faces can be predicted, and the predictions can be made in the form of heat maps.
At step S216, on the feature map with a quarter size of the original image, height and width of the target can be predicted.
At step S217, on the feature map with a quarter size of the original image, offset of the target center point can be predicted.
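The three outputs of steps S215–S217 (a center heat map, a width/height map and an offset map, all at quarter resolution) can be decoded into a bounding box. The decoding convention below (scale by 4, box = center ± wh/2) follows common center-based detectors and is an assumption, not the patent's exact procedure; all numbers are made up.

```python
import numpy as np

stride = 4                                  # quarter-size feature map
heat = np.zeros((8, 8)); heat[3, 5] = 0.9   # one confident center at (y=3, x=5)
wh = np.zeros((2, 8, 8)); wh[:, 3, 5] = (20.0, 12.0)   # height, width in pixels
off = np.zeros((2, 8, 8)); off[:, 3, 5] = (0.25, 0.5)  # sub-cell center offset

# Pick the strongest heat-map peak, refine it with the offset, scale up,
# then expand by half the predicted width/height on each side.
y, x = np.unravel_index(np.argmax(heat), heat.shape)
cy = (y + off[0, y, x]) * stride
cx = (x + off[1, y, x]) * stride
h, w = wh[0, y, x], wh[1, y, x]
box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)  # x1, y1, x2, y2
print(box)
```

The offset map is what recovers the sub-cell precision lost by predicting on a quarter-resolution grid.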
The loss function can be set as follows:
L = L_mse + α1·L_ssim + L_focal_loss + α2·L_wh_L1loss + L_offset_L1loss
The cascading loss function includes MSE loss function, SSIM loss function, focal loss function, and L1 loss function.
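The cascading loss can be sketched as a weighted sum of its terms. Only the MSE and L1 terms are computed concretely below; the SSIM and focal terms are passed in as placeholder values, and the α weights and all inputs are made-up numbers, not values from the patent.

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def l1(pred, target):
    return float(np.mean(np.abs(pred - target)))

def total_loss(d_pred, d_true, wh_pred, wh_true, off_pred, off_true,
               l_ssim, l_focal, a1=0.5, a2=0.1):
    # L = L_mse + a1*L_ssim + L_focal + a2*L_wh_L1 + L_offset_L1
    return (mse(d_pred, d_true) + a1 * l_ssim + l_focal
            + a2 * l1(wh_pred, wh_true) + l1(off_pred, off_true))

d_pred, d_true = np.ones((4, 4)), np.zeros((4, 4))      # dense maps
wh_pred, wh_true = np.full(2, 3.0), np.full(2, 1.0)     # width/height targets
off_pred, off_true = np.zeros(2), np.zeros(2)           # offsets already match

L = total_loss(d_pred, d_true, wh_pred, wh_true, off_pred, off_true,
               l_ssim=0.2, l_focal=0.3)
print(L)  # = 1.0 + 0.1 + 0.3 + 0.2 + 0.0 = 1.6 (up to float rounding)
```

The MSE/SSIM terms supervise the fuzzy engine's dense map, while the focal and L1 terms supervise the refinement engine's detection heads, so one backward pass trains both branches.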
The system 80 and method 100 can be used for dense crowd flow monitoring, pedestrian detection and tracking, etc. The solution presented can be deployed on edge devices, and can give artificial intelligence capabilities to traditional image acquisition devices.
A computer-readable medium is also provided in the present disclosure, storing computer-executable instructions, which upon execution by a computer, enables the computer to execute any of the methods presented in this disclosure.
A computer program is also provided in the present disclosure, which, when executed by at least one processor, performs any of the methods presented in this disclosure.
While the present technique has been described in detail with reference to certain embodiments, it should be appreciated that the present technique is not limited to those precise embodiments. Rather, in view of the present disclosure which describes exemplary modes for practicing the invention, many modifications and variations would present themselves, to those skilled in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.
Claims (20)
- A method (100) for image processing, comprising:- acquiring (S101) an image (30);- extracting (S102) at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters;- executing (S103) N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network (40) with an (i+1)-th set of parameters, N is an integer and N≥2, i=1..N.
- The method (100) according to claim 1, wherein- for combination of the first part and the second part of the neural network (40), the first set of parameters are acquired through backward propagation,- for combination of the first part and the (i+1)-th part of the neural network (40), i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i-th part of the neural network (40), and- for combination of the first part and a series of the second part to the (N+1)-th part of the neural network (40), the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)-th part of the neural network (40).
- The method (100) according to claim 1, wherein
  - for the combination of the first part and the second part of the neural network (40), the second set of parameters is acquired through backward propagation,
  - for the combination of the first part and the (i+1)-th part of the neural network (40), i > 1, the (i+1)-th set of parameters is acquired through backward propagation and based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network (40), and
  - for the combination of the first part and the series of the second part to the (N+1)-th part of the neural network (40), the (i+1)-th set of parameters is updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network (40), and based on the (i+1)-th set of parameters updated through training for the combination of the first part and the (i+1)-th part of the neural network (40).
- The method (100) according to claim 1, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
- The method (100) according to claim 1, wherein
  - one of the N image processing tasks is to output a dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprise dense maps of different components of the target objects;
  - the method (100) further comprises: counting the number of the target objects based on a weighted sum of the dense maps of the different components of the target objects.
- The method (100) according to claim 1, wherein one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network (40) comprises:
  - a scale-independent feature extraction module, and
  - a step-by-step deconvolution module receiving the output of the scale-independent feature extraction module.
- An apparatus (10) for image processing, comprising:
  - an image acquisition module (201), configured to acquire an image (30);
  - a feature extraction module (202), configured to extract at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters;
  - an image processing module (203), configured to execute N image processing tasks based on the at least one feature respectively, wherein the i-th image processing task is executed via an (i+1)-th part of the neural network (40) with an (i+1)-th set of parameters, N is an integer, N ≥ 2, and i = 1..N.
- The apparatus (10) according to claim 7, wherein
  - for the combination of the first part and the second part of the neural network (40), the first set of parameters is acquired through backward propagation,
  - for the combination of the first part and the (i+1)-th part of the neural network (40), i > 1, the first set of parameters is updated through backward propagation and based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network (40), and
  - for the combination of the first part and the series of the second part to the (N+1)-th part of the neural network (40), the first set of parameters is updated through backward propagation and based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network (40).
- The apparatus (10) according to claim 7, wherein
  - for the combination of the first part and the second part of the neural network (40), the second set of parameters is acquired through backward propagation,
  - for the combination of the first part and the (i+1)-th part of the neural network (40), i > 1, the (i+1)-th set of parameters is acquired through backward propagation and based on the first set of parameters updated through training for the combination of the first part and the i-th part of the neural network (40), and
  - for the combination of the first part and the series of the second part to the (N+1)-th part of the neural network (40), the (i+1)-th set of parameters is updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part and the (N+1)-th part of the neural network (40), and based on the (i+1)-th set of parameters updated through training for the combination of the first part and the (i+1)-th part of the neural network (40).
- The apparatus (10) according to claim 7, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
- The apparatus (10) according to claim 7, wherein
  - one of the N image processing tasks is to output a dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprise dense maps of different components of the target objects;
  - the image processing module (203) is further configured to count the number of the target objects based on a weighted sum of the dense maps of the different components of the target objects.
- The apparatus (10) according to claim 7, wherein one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network (40) comprises:
  - a scale-independent feature extraction module, and
  - a step-by-step deconvolution module receiving the output of the scale-independent feature extraction module.
- An apparatus (10) for image processing, comprising:
  - at least one processor (102);
  - at least one memory (101), coupled to the at least one processor (102), configured to execute the method according to any of claims 1 to 6.
- A computer-readable medium for image processing, storing computer-executable instructions, wherein the computer-executable instructions, when executed, cause at least one processor to execute the method according to any of claims 1 to 6.
- A neural network (40), comprising:
  - a first part (401), configured to extract at least one feature of an image (30) with a first set of parameters;
  - a second part (402) to an (N+1)-th part (40(N+1)), configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)-th part (40(i+1)) of the neural network (40) is configured to execute the i-th image processing task with an (i+1)-th set of parameters, N is an integer, N ≥ 2, and i = 1..N.
- The neural network (40) according to claim 15, wherein
  - for the combination of the first part (401) and the second part (402) of the neural network (40), the first set of parameters is acquired through backward propagation,
  - for the combination of the first part (401) and the (i+1)-th part (40(i+1)) of the neural network (40), i > 1, the first set of parameters is updated through backward propagation and based on the first set of parameters updated through training for the combination of the first part (401) and the i-th part (40i) of the neural network (40), and
  - for the combination of the first part (401) and the series of the second part (402) to the (N+1)-th part (40(N+1)) of the neural network (40), the first set of parameters is updated through backward propagation and based on the first set of parameters updated through training for the combination of the first part (401) and the (N+1)-th part (40(N+1)) of the neural network (40).
- The neural network (40) according to claim 15, wherein
  - for the combination of the first part (401) and the second part (402) of the neural network (40), the second set of parameters is acquired through backward propagation,
  - for the combination of the first part (401) and the (i+1)-th part (40(i+1)) of the neural network (40), i > 1, the (i+1)-th set of parameters is acquired through backward propagation and based on the first set of parameters updated through training for the combination of the first part (401) and the i-th part (40i) of the neural network (40), and
  - for the combination of the first part (401) and the series of the second part (402) to the (N+1)-th part (40(N+1)) of the neural network (40), the (i+1)-th set of parameters is updated through backward propagation, based on the first set of parameters updated through training for the combination of the first part (401) and the (N+1)-th part (40(N+1)) of the neural network (40), and based on the (i+1)-th set of parameters updated through training for the combination of the first part (401) and the (i+1)-th part (40(i+1)) of the neural network (40).
- The neural network (40) according to claim 15, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
- The neural network (40) according to claim 15, wherein
  - one of the N image processing tasks is to output a dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprise dense maps of different components of the target objects;
  - the corresponding part of the neural network (40) is further configured to count the number of the target objects based on a weighted sum of the dense maps of the different components of the target objects.
- The neural network (40) according to claim 15, wherein one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network (40) comprises:
  - a scale-independent feature extraction module, and
  - a step-by-step deconvolution module receiving the output of the scale-independent feature extraction module.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/093497 WO2021237727A1 (en) | 2020-05-29 | 2020-05-29 | Method and apparatus of image processing |
CN202080101212.2A CN115668277A (en) | 2020-05-29 | 2020-05-29 | Method and apparatus for image processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/093497 WO2021237727A1 (en) | 2020-05-29 | 2020-05-29 | Method and apparatus of image processing |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021237727A1 true WO2021237727A1 (en) | 2021-12-02 |
Family
ID=78745343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/093497 WO2021237727A1 (en) | 2020-05-29 | 2020-05-29 | Method and apparatus of image processing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115668277A (en) |
WO (1) | WO2021237727A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024007423A1 (en) * | 2022-07-06 | 2024-01-11 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Reference picture resampling (rpr) based super-resolution guided by partition information |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529402A (en) * | 2016-09-27 | 2017-03-22 | 中国科学院自动化研究所 | Multi-task learning convolutional neural network-based face attribute analysis method |
CN109523532A (en) * | 2018-11-13 | 2019-03-26 | 腾讯科技(深圳)有限公司 | Image processing method, device, computer-readable medium and electronic equipment |
CN109858372A (en) * | 2018-12-29 | 2019-06-07 | 浙江零跑科技有限公司 | A kind of lane class precision automatic Pilot structured data analysis method |
US20200118423A1 (en) * | 2017-04-05 | 2020-04-16 | Carnegie Mellon University | Deep Learning Methods For Estimating Density and/or Flow of Objects, and Related Methods and Software |
CN111144329A (en) * | 2019-12-29 | 2020-05-12 | 北京工业大学 | Light-weight rapid crowd counting method based on multiple labels |
CN111178253A (en) * | 2019-12-27 | 2020-05-19 | 深圳佑驾创新科技有限公司 | Visual perception method and device for automatic driving, computer equipment and storage medium |
Application Events
- 2020-05-29: CN application CN202080101212.2A (published as CN115668277A), status: Pending
- 2020-05-29: PCT application PCT/CN2020/093497 (published as WO2021237727A1), status: Active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN115668277A (en) | 2023-01-31 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | EP designated | The EPO has been informed by WIPO that EP was designated in this application (ref document number 20937480, country EP, kind code A1) |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | PCT application non-entry in European phase | Ref document number 20937480, country EP, kind code A1 |