WO2021237727A1 - Method and apparatus of image processing - Google Patents

Method and apparatus of image processing Download PDF

Info

Publication number
WO2021237727A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
parameters
combination
updated
image processing
Prior art date
Application number
PCT/CN2020/093497
Other languages
French (fr)
Inventor
Chen Yu
Original Assignee
Siemens Aktiengesellschaft
Siemens Ltd., China
Priority date
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft, Siemens Ltd., China filed Critical Siemens Aktiengesellschaft
Priority to PCT/CN2020/093497 priority Critical patent/WO2021237727A1/en
Priority to CN202080101212.2A priority patent/CN115668277A/en
Publication of WO2021237727A1 publication Critical patent/WO2021237727A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Definitions

  • the present invention relates to techniques of computer vision and more particularly to a method, apparatus and computer-readable storage medium for image processing.
  • crowd density analysis and pedestrian detection technology are widely used in security, smart buildings and other fields.
  • Crowd counting and pedestrian detection in complex scenes are like the infrastructure in the field of computer vision, providing a perceptual basis for higher semantic and more complex tasks.
  • crowd counting and pedestrian detection are often solved by different neural network models, and pedestrian detection networks often do not predict face parts separately.
  • both crowd counting and pedestrian detection are required simultaneously.
  • the invention is based on deep learning technology, especially convolutional neural network, and integrates crowd counting and pedestrian detection in a model by designing a dual-engine multi-tasking lightweight framework.
  • the solutions provided can be used for other kinds of computer vision tasks.
  • solutions provided in the present disclosure can save computing resources, reduce memory consumption and improve computing efficiency, and can also be deployed on low-cost edge devices.
  • Embodiments of the present disclosure include methods, apparatuses for image processing.
  • a method for image processing includes following steps:
  • an apparatus for image processing includes:
  • an image acquisition module configured to acquire an image
  • a feature extraction module configured to extract at least one feature of the image via a first part of a neural network with a first set of parameters
  • an apparatus for image processing includes at least one processor; at least one memory, coupled to the at least one processor, configured to execute method according to the first aspect.
  • a computer-readable medium for image processing stores computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute method according to the first aspect.
  • a neural network which can be used in any above aspect of the present disclosure.
  • the neural network includes:
  • the first set of parameters are acquired through backward propagation; for combination of the first part and the (i+1) th part of the neural network, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1) th part of the neural network, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network.
  • the second set of parameters are acquired through backward propagation; for combination of the first part and the (i+1) th part of the neural network , i>1, the (i+1) th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1) th part of the neural network , the (i+1) th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network and based on the (i+1) th set of parameters updated through training for combination of the first part and the (i+1) th part of the neural network.
  • solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
  • the neural network can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
  • such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network has advanced performance while remaining lightweight.
  • one of the N image processing tasks is to output dense map of target objects
  • the outputs of the corresponding part of the neural network can include: dense maps of different components of the target objects; number of the target objects can be further counted based on weighted sum of the dense maps of different components of the target objects.
  • the weights used can be decided based on engineering practice and/or through tests.
  • one of the N image processing tasks is to output dense map of target objects
  • the corresponding part of the neural network can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  • the step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
  • FIG. 1 depicts a block diagram of an apparatus for image processing in accordance with one embodiment of the present disclosure.
  • FIG. 2 depicts structure of a CNN in accordance with one embodiment of the present disclosure.
  • FIG. 3 depicts training process of a CNN in accordance with one embodiment of the present disclosure.
  • FIG. 4 depicts a flow diagram of a method for image processing in accordance with one embodiment of the present disclosure.
  • FIG. 5 depicts an image processing system in accordance with one embodiment of the present disclosure.
  • FIG. 6 depicts training process of the system shown in FIG. 5
  • FIG. 7 depicts image processing process of the system shown in FIG. 5.
  • the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements.
  • the terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
  • Image processing solutions are proposed in this disclosure, which can be used to execute multiple tasks via a single neural network, such as the mentioned crowd counting and pedestrian detection. Now the present disclosure will be described hereinafter in details by referring to FIG. 1 to FIG. 7.
  • FIG. 1 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure.
  • the apparatus 10 for image processing presented in the present disclosure can be implemented as a network of computer processors, to execute following method 100 for image processing presented in the present disclosure.
  • the apparatus 10 can also be a single computer, as shown in FIG. 1, including at least one memory 101, which includes computer-readable medium, such as a random access memory (RAM) .
  • the apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101.
  • Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein.
  • the at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC) , a digital signal processor (DSP) , a central processing unit (CPU) , a graphics processing unit (GPU) , state machines, etc.
  • embodiments of computer-readable medium include, but not limited to a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions.
  • various other forms of computer-readable medium may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless.
  • the instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
  • the at least one memory 101 shown in FIG. 1 can contain an image processing program 20, when executed by the at least one processor 102, causing the at least one processor 102 to execute the method 100 for image processing presented in the present disclosure.
  • the image processing program 20 can include:
  • an image acquisition module 201 configured to acquire an image 30
  • a feature extraction module 202 configured to extract at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters
  • Image 30 to be processed can be taken by a camera 70 and sent to the apparatus 10 via the communication module 103 shown in the FIG. 1.
  • the image 30 can also be stored in the at least one memory 101.
  • the online training process of the neural network 40 can be executed with large amounts of data by a server 60, such as a high performance GPU server.
  • the file of neural network 40 (including parameters of each part of the neural network 40) can be transmitted via the communication module 103 to the apparatus 10 and also can be stored in the at least one memory 101, then the neural network 40 can be deployed on apparatus 10.
  • the neural network 40 can be a CNN.
  • the online training process can also be executed on the apparatus 10, which depends on device configuration and processing competence.
  • the online training program can be part of the image processing program 20 and can be pre-stored in the at least one memory 101.
  • Multiple tasks can be executed via the same CNN, which can save computing resources, and such processing may also comply with service logic.
  • Such computing resource saving solutions can also make the apparatus 10 applicable to deploy on an edge device.
  • the neural network 40 can include:
  • a first part 401 configured to extract at least one feature of the image 30 with a first set of parameters 51;
  • the first part 401 can extract shallow feature (s) and optionally, some of the second part 402 to the (N+1) th part 40 (N+1) can further extract deep feature (s) .
  • backward propagation can be executed for training each part of the neural network 40.
  • First different parts corresponding to different image processing tasks are trained independently, and finally the overall fine-tuning training process is performed.
  • first set of parameters 51 can be updated.
  • parameters of each part can also be updated.
  • as to the first set of parameters: for combination of the first part 401 and the second part 402 of the neural network 40, the first set of parameters are acquired through backward propagation with a large amount of image samples; for combination of the first part 401 and the (i+1) th part 40 (i+1) of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1) th part 40 (N+1) of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1) th part 40 (N+1) of the neural network 40.
  • as to the second to the (N+1) th set of parameters: for combination of the first part 401 and the second part 402 of the neural network 40, the second set of parameters are acquired through backward propagation; for combination of the first part 401 and the (i+1) th part 40 (i+1) of the neural network 40, i>1, the (i+1) th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1) th part 40 (N+1) of the neural network 40, the (i+1) th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1) th part 40 (N+1) of the neural network 40 and based on the (i+1) th set of parameters updated through training for combination of the first part 401 and the (i+1) th part 40 (i+1) of the neural network 40.
  • the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
  • one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the image processing module 203 is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise.
  • the weights used can be decided based on engineering practice and/or through tests.
  • one of the N image processing tasks is to output dense map of target objects
  • the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  • the scale-independent feature extraction module can be configured to extract features in multiple different scales
  • the step by step deconvolution module can include multiple pairs of convolution and deconvolution modules, wherein each pair corresponds to an up sample procedure.
  • the step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
  • the image acquisition module 201, the feature extraction module 202 and the image processing module 203 are described above as software modules of the image processing program 20; they can also be implemented via hardware, such as ASIC chips, and can be integrated into one chip or separately implemented and electrically connected.
  • the architecture shown in FIG. 1 above is merely exemplary and is used to explain the exemplary method 100 shown in FIG. 4.
  • One exemplary method 100 according to the present disclosure includes following steps:
  • S102 extracting at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters
  • the first set of parameters are acquired through backward propagation, for combination of the first part and the (i+1) th part of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1) th part of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network 40.
  • the second set of parameters are acquired through backward propagation, for combination of the first part and the (i+1) th part of the neural network 40, i>1, the (i+1) th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1) th part of the neural network 40, the (i+1) th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1) th part of the neural network 40 and based on the (i+1) th set of parameters updated through training for combination of the first part and the (i+1) th part of the neural network 40.
  • the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  • one of the N image processing tasks is to output dense map of target objects
  • the outputs of the corresponding part of the neural network 40 can include: dense maps of different components of the target objects
  • the method 100 can further include: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  • one of the N image processing tasks is to output dense map of target objects
  • the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  • a lightweight crowd counting and pedestrian detection system 80 for edge device is provided.
  • the system 80 can include:
  • an online training module 801 configured to train a neural network 812, used for image processing
  • an offline module 802 configured to execute image processing via the neural network 812.
  • the neural network model 812 can be trained with a large amount of image samples 811 via a server 81, such as a high performance GPU server. After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, including the feature extraction 831, fuzzy engine 832 and refinement engine 833.
  • a camera 82 can be connected to the edge device 83, and the neural network 812 can run on the edge device 83.
  • features can be extracted by feature extraction 831 from image to be processed, then fuzzy engine 832 will output dense map and refinement engine 833 will execute pedestrian detection.
  • the output dense map can be further processed by crowd counting 834 to count number of pedestrians, and by crowd flow analysis 835 to analyze the crowd flow.
  • the output bounding boxes of pedestrians and optional faces can be further processed by re-identification 836, such as finding a specific man from image, or finding people wearing masks.
  • step 1 the image samples are input into the neural network 812, and loss functions can be set as MSE and SSIM, parameter set 1 for feature extraction 831, and parameter set 2 for fuzzy engine 832 can be acquired through backward propagation .
  • step 2 the image samples are input into the neural network 812, and loss functions can be set as focal loss and L1 loss, through backward propagation, parameter set 1 will be updated to parameter set 1’ based on parameter set 1, and parameter set 3 for the refinement engine 833 will be acquired.
  • the image samples are input into the neural network 812, and loss functions can be set as weighted sum of MSE, SSIM, focal loss and L1 loss.
  • loss functions can be set as weighted sum of MSE, SSIM, focal loss and L1 loss.
  • step 1 and step 2 are added and parameters acquired during the first two steps will be used during the step 3.
  • the parameter sets are almost ready, so step 3 is only a slight adjustment, which allows the training process to converge quickly, saves computing power and shortens the whole training process.
  • parameter set 1”, parameter set 2’ and parameter set 3’ will be used for image processing.
  • FIG. 7 shows workflow of inference phase of the use case.
  • step S201 basic convolution is executed.
  • a convolution kernel with a stride can be used for the convolution operation, and a ReLU function can be used for activation, a batchnorm2d can be used for normalization, then a max pooling layer can be used.
  • a feature map with a quarter size of the original image can be obtained at this step, and amount of operations can be largely reduced.
  • a residual block can be used, features can be further extracted.
  • improved h-swish function can be used as the activation function.
  • the h-swish activation function as follow:
  • a residual block can be used here, and features can be further extracted. Also, an improved h-swish function can be used as the activation function.
  • different kinds of features can be extracted by steps S202 and S203 respectively.
  • other residual blocks can also be used to extract other kind of feature (s) here.
  • a squeeze-and-excitation bottleneck can be used.
  • an improved h-swish function can also be used as the activation function.
  • the feature attention mechanism can be used to make the neural network 812 learn more effective abstract features. Dimensions can be decreased and the amount of computation can be reduced. Important features can be found.
  • a squeeze-and-excitation bottleneck can be used. Also, an improved h-swish function can also be used as the activation function.
  • features extracted at step S205 will be used at steps S206, S207 and S208 respectively. At these steps, different scales of features will be extracted. Then, with repetition of steps S210 and S211, step by step deconvolution operations can be made to restore a precise dense map with not only outlines but also detailed information.
  • a convolution operation can be made.
  • no activation function is used here, and normalization can use batchnorm2d.
  • the scale-independent feature extraction module 8321 can include two kernel convolution modules.
  • the convolution modules both contain a batchnorm2d layer.
  • the first module can use a h-swish activation function, the second module may not use an activation function.
  • the number of feature channels in the output can be a quarter of the input.
  • the scale-independent feature extraction module 8321 can include two kernel convolution modules.
  • the convolution modules can both include a batchnorm2d layer.
  • the first module can use the h-swish activation function, the second module may not use an activation function.
  • the number of feature channels in the output can be a quarter of the input.
  • feature stitching techniques can be used to obtain feature maps which do not increase in number.
  • a convolution kernel and a ReLU activation function can be used, and instancenorm2d can be used as a normalization layer.
  • a deconvolution kernel can be used, and instancenorm2d can be used as a normalization layer.
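As a rough illustration of the convolution and deconvolution steps described in the bullets above, one up-sampling stage of the step-by-step deconvolution could be sketched in PyTorch as follows. This is a hedged sketch only: the kernel sizes, channel counts and layer ordering are assumptions, not the patent's concrete design.

```python
import torch
import torch.nn as nn

class UpsamplePair(nn.Module):
    """One stage of the step-by-step deconvolution: a convolution with ReLU and
    InstanceNorm2d, followed by a stride-2 transposed convolution with InstanceNorm2d
    that doubles the spatial resolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.InstanceNorm2d(out_channels),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(out_channels, out_channels, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(out_channels),
        )

    def forward(self, x):
        return self.deconv(self.conv(x))

pair = UpsamplePair(64, 32)
print(pair(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 32, 32, 32])
```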
  • bodies and heads can be predicted based on the dense maps.
  • the final number of people can be obtained by cascading the two results.
  • the formula can be as follows: C = ω1 · D_body + ω2 · D_head, wherein:
  • C is the result of crowd counting;
  • D_body is the sum of the body dense map;
  • D_head is the sum of the head dense map;
  • ω1 and ω2 can be set according to the scene. Such a weighted sum can effectively eliminate the influence of people overlapping each other, so that the final result can be more precise.
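Under the weighted-sum reading of the counting formula above, the counting step might look like the following PyTorch sketch; the weight values are placeholders to be set per scene, and the function name is illustrative only.

```python
import torch

def count_people(body_dense_map, head_dense_map, w1=0.5, w2=0.5):
    """C = w1 * D_body + w2 * D_head, where D_* is the integral of a component dense map."""
    d_body = body_dense_map.sum()   # sum of the body dense map
    d_head = head_dense_map.sum()   # sum of the head dense map
    return w1 * d_body + w2 * d_head

# dummy dense maps, only to show the call
body = torch.rand(1, 1, 128, 128)
head = torch.rand(1, 1, 128, 128)
print(float(count_people(body, head)))
```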
  • a bilinear interpolation can be used to get twice-enlarged feature maps without increasing the calculation amount.
  • at step S214, without adding calculation parameters, pixel-by-pixel addition can be used for feature fusion.
  • target center points of pedestrians and faces can be predicted, and predictions can be made in the form of heat maps.
  • step S216 on the feature map with a quarter size of the original image, height and width of the target can be predicted.
  • step S217 on the feature map with a quarter size of the original image, offset of the target center point can be predicted.
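The predictions described for steps S215 to S217 (centre-point heat maps, width/height and centre offsets on a quarter-resolution feature map) resemble common centre-point detectors. A minimal sketch of how such outputs are typically decoded into bounding boxes is given below; the thresholding, the 3x3 local-maximum step and the stride value are assumptions rather than the patent's exact post-processing.

```python
import torch
import torch.nn.functional as F

def decode_boxes(heatmap, wh, offset, score_thresh=0.3, stride=4):
    """Decode a centre heat map (1, C, H, W), width/height map (1, 2, H, W) and
    offset map (1, 2, H, W) into class-labelled bounding boxes in input coordinates."""
    # keep only local maxima of the heat map (a simple 3x3 peak selection)
    peaks = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)) & (heatmap > score_thresh)
    boxes = []
    for cls, y, x in zip(*torch.nonzero(peaks[0], as_tuple=True)):
        cx = (x + offset[0, 0, y, x]) * stride          # centre x, corrected by predicted offset
        cy = (y + offset[0, 1, y, x]) * stride          # centre y
        w = wh[0, 0, y, x] * stride                     # predicted width
        h = wh[0, 1, y, x] * stride                     # predicted height
        boxes.append((int(cls), float(cx - w / 2), float(cy - h / 2),
                      float(cx + w / 2), float(cy + h / 2), float(heatmap[0, cls, y, x])))
    return boxes
```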
  • the loss function can be set as follows:
  • L = L_mse + λ1 · L_ssim + L_focal_loss + λ2 · L_wh_L1loss + L_offset_L1loss
  • the cascading loss function includes MSE loss function, SSIM loss function, focal loss function, and L1 loss function.
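Read literally, the cascading loss combines an MSE term and an SSIM term for the dense map with a focal loss for the centre heat map and L1 losses for width/height and offsets. The sketch below shows one possible composition; the focal-loss and SSIM implementations are simplified stand-ins, and the lambda values are placeholders.

```python
import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Gaussian-heatmap focal loss in the style of centre-point detectors (simplified)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    loss_pos = -((1.0 - pred) ** alpha) * torch.log(pred + eps) * pos
    loss_neg = -((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred + eps) * neg
    return (loss_pos.sum() + loss_neg.sum()) / pos.sum().clamp(min=1.0)

def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """Very coarse single-window SSIM loss (1 - SSIM); real code would use a sliding window."""
    mu_x, mu_y = pred.mean(), gt.mean()
    cov = ((pred - mu_x) * (gt - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (pred.var() + gt.var() + c2))
    return 1.0 - ssim

def total_loss(pred, target, lambda1=1.0, lambda2=0.1):
    """L = L_mse + lambda1 * L_ssim + L_focal + lambda2 * L_wh_L1 + L_offset_L1."""
    return (F.mse_loss(pred["dense_map"], target["dense_map"])
            + lambda1 * ssim_loss(pred["dense_map"], target["dense_map"])
            + focal_loss(pred["heatmap"].sigmoid(), target["heatmap"])
            + lambda2 * F.l1_loss(pred["wh"], target["wh"])
            + F.l1_loss(pred["offset"], target["offset"]))
```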
  • the system 80 and method 100 can be used for dense crowd flow monitoring, pedestrian detection and tracking, etc.
  • the solution presented can be deployed on edge devices, and can give artificial intelligence capabilities to traditional image acquisition devices.
  • a computer-readable medium is also provided in the present disclosure, storing computer-executable instructions, which upon execution by a computer, enables the computer to execute any of the methods presented in this disclosure.
  • a computer program which is being executed by at least one processor and performs any of the methods presented in this disclosure.

Abstract

A method, apparatus, system and computer-readable medium for image processing are presented. A method includes: acquiring (S101) an image (30); extracting (S102) at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters; executing (S103) N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1) th part of the neural network (40) with an (i+1) th set of parameters, N is an integer and N≥2, i=1.. N.

Description

Method and apparatus of image processing Technical Field
The present invention relates to techniques of computer vision and more particularly to a method, apparatus and computer-readable storage medium for image processing.
Background Art
With the development of artificial intelligence technology, crowd density analysis and pedestrian detection technology are widely used in security, smart buildings and other fields. Crowd counting and pedestrian detection in complex scenes are like the infrastructure in the field of computer vision, providing a perceptual basis for higher-semantic and more complex tasks, such as pedestrian recognition, pedestrian flow estimation, video structured analysis, etc.
Summary of the Invention
In common solutions, crowd counting and pedestrian detection are often solved by different neural network models, and pedestrian detection networks often do not predict face parts separately. However, for an application scenario, both crowd counting and pedestrian detection are usually required simultaneously. The invention is based on deep learning technology, especially convolutional neural networks, and integrates crowd counting and pedestrian detection in one model by designing a dual-engine multi-tasking lightweight framework. The solutions provided can also be used for other kinds of computer vision tasks. Compared with current solutions based on separate models, the solutions provided in the present disclosure can save computing resources, reduce memory consumption and improve computing efficiency, and they can also be deployed on low-cost edge devices.
Embodiments of the present disclosure include methods, apparatuses for image processing.
According to a first aspect of the present disclosure, a method for image processing is presented. The method includes the following steps:
- acquiring an image;
- extracting at least one feature of the image via a first part of a neural network with a first set of parameters;
- executing N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1)  th part of the neural network (40) with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
According to a second aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes:
- an image acquisition module, configured to acquire an image;
- a feature extraction module, configured to extract at least one feature of the image via a first part of a neural network with a first set of parameters;
- an image processing module, configured to execute N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1)  th part of the neural network with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
According to a third aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes at least one processor; and at least one memory, coupled to the at least one processor, configured to execute the method according to the first aspect.
According to a fourth aspect of the present disclosure, a computer-readable medium for image processing is presented. The computer-readable medium stores computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute the method according to the first aspect.
According to a fifth aspect of the present disclosure, a neural network is presented which can be used in any above aspect of the present disclosure. The neural network includes:
- a first part, configured to extract at least one feature of the image with a first set of parameters;
- a second part to an (N+1)  th part (40 (N+1) ) , configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)  th part (40 (i+1) ) of the neural network (40) is configured to execute the i th image processing task with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
With solutions provided in the present disclosure, multiple tasks can be executed via a single neural network, which can save computing resources, and such processing may also comply with service logic. Such computing resource saving solutions can also make the solutions applicable to deploy on an edge device.
Optionally, for combination of the first part and the second part of the neural network, the first set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)  th part of the neural network, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network.
Optionally, for combination of the first part and the second part of the neural network , the second set of parameters are acquired through backward propagation; for combination of the first part and the (i+1)  th part of the neural network , i>1, the (i+1)  th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network; and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network , the (i+1)  th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network and based on the (i+1)  th set of parameters updated through training for combination of the first part and the (i+1)  th part of the neural network.
Compared with current backward propagation process, in which only combination of all parts are involved, solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
Optionally, the neural network can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module. Such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network has advanced performance while remaining lightweight.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network can include: dense maps of different components of the target objects; number of the target objects can be further counted based on weighted sum of the dense maps of  different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise. The weights used can be decided based on engineering practice and/or through tests.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module. The step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
Brief Description of the Drawings
The above mentioned attributes and other features and advantages of the present technique and the manner of attaining them will become more apparent and the present technique itself will be better understood by reference to the following description of embodiments of the present technique taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram of an apparatus for image processing in accordance with one embodiment of the present disclosure.
FIG. 2 depicts structure of a CNN in accordance with one embodiment of the present disclosure.
FIG. 3 depicts training process of a CNN in accordance with one embodiment of the present disclosure.
FIG. 4 depicts a flow diagram of a method for image processing in accordance with one embodiment of the present disclosure.
FIG. 5 depicts an image processing system in accordance with one embodiment of the present disclosure.
FIG. 6 depicts training process of the system shown in FIG. 5
FIG. 7 depicts image processing process of the system shown in FIG. 5.
Reference Numbers:
10, an apparatus for image processing
101, at least one memory
102, at least one processor
103, a communication module
20, an image processing program
201, an image acquisition module
202, a feature extraction module
203, an image processing module
30, image acquired and to be processed
40, a neural network
401, a first part of the neural network 40
402, a second part of the neural network 40
40i, an i th part of the neural network 40
51, a first set of parameters of the first part 401 of the neural network 40
52, a second set of parameters of the second part 402 of the neural network 40
5i, an i th set of parameters of the i th part 40i of the neural network 40
60, a training server
70, a camera
100, a method for image processing
S101~S103, steps of method 100
80, a lightweight crowd counting and pedestrian detection system
801, online training module
802, offline module
81, a server
82, a camera
83, an edge device
811, image samples
812, neural network
813, loss function
831, feature extraction
832, fuzzy engine
833, refinement engine
834, crowd counting
835, crowd flow analysis
836, re-identification
200 a method for image processing
S201~S217, steps of method 200
Detailed Description of Example Embodiments
Hereinafter, above-mentioned and other features of the present technique are described in detail. Various embodiments are described with reference to the drawing, where like reference numerals are used to refer to like elements throughout. In the following description, for purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to explain, and not to limit the invention. It may be evident that such embodiments may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Image processing solutions are proposed in this disclosure, which can be used to execute multiple tasks via a single neural network, such as the mentioned crowd counting and pedestrian detection. Now the present disclosure will be described hereinafter in details by referring to FIG. 1 to FIG. 7.
FIG. 1 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure. The apparatus 10 for image processing presented in the present disclosure can be implemented as a network of computer processors, to execute the following method 100 for image processing presented in the present disclosure. The apparatus 10 can also be a single computer, as shown in FIG. 1, including at least one memory 101, which includes a computer-readable medium, such as a random access memory (RAM). The apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101, and when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), state machines, etc. Embodiments of computer-readable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in FIG. 1 can contain an image processing program 20, when executed by the at least one processor 102, causing the at least one processor 102 to execute the method 100 for image processing presented in the present disclosure. The image processing program 20 can include:
- an image acquisition module 201, configured to acquire an image 30;
- a feature extraction module 202, configured to extract at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters;
- an image processing module 203, configured to execute N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an i+1 th part of the neural network 40 with an i+1 th set of parameters, N is an integer and N≥2, i=1.. N.
Image 30 to be processed can be taken by a camera 70 and sent to the apparatus 10 via the communication module 103 shown in the FIG. 1. The image 30 can also be stored in the at least one memory 101.
The online training process of the neural network 40 can be executed with large amounts of data by a server 60, such as a high performance GPU server. After training, the file of neural network 40 (including parameters of each part of the neural network 40) can be transmitted via the communication module 103 to the apparatus 10 and also can be stored in the at least one memory 101, then the neural network 40 can be deployed on apparatus 10. The neural network 40 can be a CNN.
However, the online training process can also be executed on the apparatus 10, which depends on device configuration and processing competence. In such a case, the online training program can be part of the image processing program 20 and can be pre-stored in the at least one memory 101.
Multiple tasks can be executed via the same CNN, which can save computing resources, and such processing may also comply with service logic. Such computing resource saving solutions can also make the apparatus 10 applicable to deploy on an  edge device.
As shown in FIG. 2, the neural network 40 can include:
- a first part 401, configured to extract at least one feature of the image 30 with a first set of parameters 51;
- a second part 402 to an (N+1)  th part 40 (N+1) , configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)  th part 40 (i+1) of the neural network 40 is configured to execute the i th image processing task with an (i+1)  th set of parameters 5 (i+1) , N is an integer and N≥2, i=1.. N.
It should be mentioned that the first part 401 can extract shallow feature(s) and, optionally, some of the second part 402 to the (N+1) th part 40 (N+1) can further extract deep feature(s).
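The split into a shared first part and N task-specific parts can be sketched in PyTorch roughly as follows. This is a minimal illustration under assumptions: the module names, layer sizes and the choice of two placeholder heads are not the concrete architecture of the disclosure.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """First part 401: extracts shared (shallow) features with the first set of parameters."""
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class MultiTaskNet(nn.Module):
    """A single network: one shared first part and N task-specific parts (heads)."""
    def __init__(self, heads):
        super().__init__()
        self.backbone = SharedBackbone()
        self.heads = nn.ModuleDict(heads)   # e.g. a dense-map head and a detection head

    def forward(self, x):
        feat = self.backbone(x)             # at least one feature of the image
        return {name: head(feat) for name, head in self.heads.items()}

# two 1x1-convolution placeholders standing in for the task-specific parts
net = MultiTaskNet({"dense_map": nn.Conv2d(64, 1, 1), "detection": nn.Conv2d(64, 4, 1)})
outputs = net(torch.randn(1, 3, 256, 256))
```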
Now referring to FIG. 3, backward propagation can be executed for training each part of the neural network 40. First different parts corresponding to different image processing tasks are trained independently, and finally the overall fine-tuning training process is performed. With combination of the first part 401 and each of other parts 40 (i+1) , first set of parameters 51 can be updated. And with the combination of all parts, parameters of each part can also be updated. Compared with current backward propagation process, in which only combination of all parts are involved, solutions presented herein can obtain very close parameters during each combination of the first part and another part, which can quickly converge, save computing power and speed up the training process.
In details, as to the first set of parameters, for combination of the first part 401 and the second part 402 of the neural network 40, the first set of parameters are acquired through backward propagation with large amount of image samples; for combination of the first part 401 and the (i+1)  th part 40 (i+1) of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1)  th part 40 (N+1) of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1)  th part 40 (N+1) of the neural network 40.
And, as to the second to the (N+1)  th set of parameters, for combination of the first part 401 and the second part 402 of the neural network 40, the second set of  parameters are acquired through backward propagation; for combination of the first part 401 and the (i+1)  th part 40 (i+1) of the neural network 40, i>1, the (i+1)  th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the i th part 40i of the neural network 40; and for combination of the first part 401 and a series of the second part 402 to the (N+1)  th part 40 (N+1) of the neural network 40, the (i+1)  th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part 401 and the (N+1)  th part 40 (N+1) of the neural network 40 and based on the (i+1)  th set of parameters updated through training for combination of the first part 401 and the (i+1)  th part 40 (i+1) of the neural network 40.
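One way to read the training schedule above is: train the first part together with each other part in turn, carrying the first set of parameters forward between stages, and finish with a joint fine-tuning pass over all parts. The sketch below follows that reading; the optimizer, learning rates and loss placeholders are assumptions.

```python
import torch

def train_combination(backbone, head, loss_fn, loader, epochs=1, lr=1e-3):
    """Backward propagation for the combination of the first part (backbone) and one other part (head)."""
    optimizer = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(head(backbone(images)), targets)
            loss.backward()
            optimizer.step()

def fine_tune_all(backbone, heads, loss_fns, loader, weights, epochs=1, lr=1e-4):
    """Final stage: the first part and all other parts are updated together,
    starting from the parameter sets obtained in the per-task stages."""
    params = list(backbone.parameters()) + [p for h in heads for p in h.parameters()]
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for images, target_list in loader:
            optimizer.zero_grad()
            feat = backbone(images)
            loss = sum(w * fn(h(feat), t)
                       for w, fn, h, t in zip(weights, loss_fns, heads, target_list))
            loss.backward()
            optimizer.step()

# Stage i (i = 1..N): train_combination(backbone, heads[i-1], loss_fns[i-1], loaders[i-1]);
# the backbone parameters carried out of stage i are the starting point of stage i+1.
# Final stage: fine_tune_all(backbone, heads, loss_fns, joint_loader, weights).
```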
Optionally, the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module. Such a simplified structure and the combination of the above-mentioned modules can reduce redundant parameters and ensure that the neural network 40 has advanced performance while remaining lightweight.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include dense maps of different components of the target objects; the image processing module 203 is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects. In case of objects being overlapped by each other, such an optional solution can make the result of counting more precise. The weights used can be decided based on engineering practice and/or through tests.
Optionally, one of the N image processing tasks is to output a dense map of target objects, and the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module. The scale-independent feature extraction module can be configured to extract features in multiple different scales, and the step by step deconvolution module can include multiple pairs of convolution and deconvolution modules, wherein each pair corresponds to an up-sample procedure. The step by step deconvolution module takes advantage of the features of different scales extracted by the scale-independent feature extraction module, then restores them step by step to achieve a precise dense map. For example, if there are four clusters of target objects in an image, then without the scale-independent feature extraction and step by step deconvolution, the output dense map would only show four fuzzy clusters, with no details of each cluster visible.
Although the image acquisition module 201, the feature extraction module 202 and the image processing module 203 are described above as software modules of the image processing program 20, they can also be implemented via hardware, such as ASIC chips. They can be integrated into one chip, or separately implemented and electrically connected.
It should be mentioned that the present disclosure may include apparatuses having different architecture than shown in FIG. 1. The architecture above is merely exemplary and used to explain the exemplary method 100 shown in FIG. 4.
Various methods in accordance with the present disclosure may be carried out. One exemplary method 100 according to the present disclosure includes the following steps:
S101: acquiring an image 30;
S102: extracting at least one feature of the image 30 via a first part of a neural network 40 with a first set of parameters;
S103: executing N image processing tasks based on the at least one feature respectively, wherein for the i th image processing task, via an (i+1)  th part of the neural network 40 with an (i+1)  th set of parameters, N is an integer and N≥2, i=1.. N.
Optionally, for combination of the first part and the second part of the neural network 40, the first set of parameters are acquired through backward propagation, for combination of the first part and the (i+1)  th part of the neural network 40, i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network 40, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network 40.
Optionally, for combination of the first part and the second part of the neural network 40, the second set of parameters are acquired through backward propagation, for combination of the first part and the (i+1)  th part of the neural network 40, i>1, the (i+1)  th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for  combination of the first part and the i th part of the neural network 40, and for combination of the first part and a series of the second part to the (N+1)  th part of the neural network 40, the (i+1)  th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)  th part of the neural network 40 and based on the (i+1)  th set of parameters updated through training for combination of the first part and the (i+1)  th part of the neural network 40.
Optionally, the neural network 40 can include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a step-by-step deconvolution module.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network 40 can include: dense maps of different components of the target objects; the method 100 can further include: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
Optionally, one of the N image processing tasks is to output dense map of target objects, and the corresponding part of the neural network 40 can include: a scale-independent feature extraction module, and a step by step deconvolution module receiving output of the scale-independent feature extraction module.
Following is a use case in which the solution provided in the present disclosure can be adopted. Referring to FIG. 5, in this use case, a lightweight crowd counting and pedestrian detection system 80 for edge device is provided. The system 80 can include:
- an online training module 801, configured to train a neural network 812, used for image processing;
- an offline module 802, configured to execute image processing via the neural network 812.
As to the online training module 801, the neural network model 812 can be trained with a large amount of image samples 811 via a server 81, such as a high performance GPU server. After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, including the feature extraction 831, fuzzy engine 832 and refinement engine 833.
As to the offline module 802, a camera 82 can be connected to the edge device 83, and the neural network 812 can run on the edge device 83. Firstly, features can be extracted by feature extraction 831 from the image to be processed, then the fuzzy engine 832 will output a dense map and the refinement engine 833 will execute pedestrian detection. The output dense map can be further processed by crowd counting 834 to count the number of pedestrians, and by crowd flow analysis 835 to analyze the crowd flow. The output bounding boxes of pedestrians and, optionally, faces can be further processed by re-identification 836, such as finding a specific person in the image, or finding people wearing masks.
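The data flow of the offline module can be summarised by the following sketch, assuming the three engines are callable modules; the function and variable names are illustrative only.

```python
def process_frame(image, feature_extraction, fuzzy_engine, refinement_engine):
    """Data flow of FIG. 5: shared features feed both engines (illustrative sketch only)."""
    features = feature_extraction(image)      # 831: shared feature extraction
    dense_map = fuzzy_engine(features)        # 832: dense map for counting / flow analysis
    detections = refinement_engine(features)  # 833: pedestrian (and face) bounding boxes
    people_count = dense_map.sum()            # 834: crowd counting from the dense map
    return dense_map, detections, people_count
```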
Now referring to FIG. 6, during the training phase, firstly, at step 1, the image samples are input into the neural network 812, and the loss functions can be set as MSE and SSIM; parameter set 1 for the feature extraction 831 and parameter set 2 for the fuzzy engine 832 can be acquired through backward propagation.
Then, at step 2, the image samples are input into the neural network 812, and the loss functions can be set as focal loss and L1 loss; through backward propagation, parameter set 1 will be updated to parameter set 1’ based on parameter set 1, and parameter set 3 for the refinement engine 833 will be acquired.
At last, at step 3, the image samples are input into the neural network 812, and the loss function can be set as a weighted sum of MSE, SSIM, focal loss and L1 loss. Through backward propagation, parameter set 1’ will be updated to parameter set 1”, parameter set 2 will be updated to parameter set 2’, and parameter set 3 will be updated to parameter set 3’.
Compared with conventional backward propagation, step 1 and step 2 are added, and the parameters acquired during the first two steps are used during step 3. In fact, after step 1 and step 2 the parameter sets are almost ready, so step 3 is only a slight adjustment, which allows the training process to converge quickly, saves computing power, and shortens the whole training process.
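To make the three-step schedule concrete, below is a minimal PyTorch-style sketch of it. The module structure, optimizer, loss stand-ins and data are illustrative assumptions, not the actual network of the use case; only the staging, i.e. which parameter sets each backward propagation updates, follows the description above.

```python
# Hedged sketch of the three-step training schedule (all modules/losses are stand-ins).
import torch
import torch.nn as nn

feature_extraction = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # parameter set 1
fuzzy_engine       = nn.Conv2d(16, 1, 1)                                       # parameter set 2
refinement_engine  = nn.Conv2d(16, 4, 1)                                       # parameter set 3

def run_stage(trainable, frozen, loss_fn, data, epochs=1, lr=1e-3):
    """One training stage: backward propagation only updates the `trainable` modules."""
    for m in frozen:
        m.requires_grad_(False)
    for m in trainable:
        m.requires_grad_(True)
    opt = torch.optim.Adam([p for m in trainable for p in m.parameters()], lr=lr)
    for _ in range(epochs):
        for images, dense_gt in data:
            opt.zero_grad()
            loss_fn(images, dense_gt).backward()
            opt.step()

mse = nn.MSELoss()
def density_loss(x, gt):    # stands in for the MSE + SSIM loss of the dense-map task
    return mse(fuzzy_engine(feature_extraction(x)), gt)
def detection_loss(x, gt):  # stands in for the focal + L1 losses of the detection task
    return refinement_engine(feature_extraction(x)).abs().mean()
def combined_loss(x, gt):   # weighted sum of both task losses, used at step 3
    return density_loss(x, gt) + 0.1 * detection_loss(x, gt)

data = [(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))]  # dummy samples
run_stage([feature_extraction, fuzzy_engine], [refinement_engine], density_loss, data)    # step 1
run_stage([feature_extraction, refinement_engine], [fuzzy_engine], detection_loss, data)  # step 2
run_stage([feature_extraction, fuzzy_engine, refinement_engine], [], combined_loss, data) # step 3
```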
During the inference phase, parameter set 1”, parameter set 2’ and parameter set 3’ will be used for image processing.
FIG. 7 shows the workflow of the inference phase of the use case.
At step S201, a basic convolution is executed. A convolution kernel with a stride can be used for the convolution operation, a ReLU function can be used for activation, batchnorm2d can be used for normalization, and then a max pooling layer can be used. A feature map with a quarter size of the original image can be obtained at this step, and the amount of operations can be largely reduced.
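A hedged sketch of such a convolution stem follows. The exact kernel size, stride and channel counts are not specified above, so the values below are assumptions; one stride-2 convolution plus one stride-2 max pooling yields a feature map at a quarter of each spatial dimension.

```python
# Hedged sketch of the basic convolution step S201 (layer sizes are assumptions).
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # strided convolution
    nn.BatchNorm2d(32),                                     # batchnorm2d normalization
    nn.ReLU(inplace=True),                                  # ReLU activation
    nn.MaxPool2d(kernel_size=2, stride=2),                  # max pooling layer
)

x = torch.randn(1, 3, 512, 512)
print(stem(x).shape)  # torch.Size([1, 32, 128, 128]): a quarter of each spatial dimension
```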
At step S202, a residual block can be used, and features can be further extracted. The following improved h-swish function can be used as the activation function:
(equation image PCTCN2020093497-appb-000001 in the original filing, not reproduced here)
At step S203, a residual block can also be used, and features can be further extracted. Again, an improved h-swish function can be used as the activation function.
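The improved h-swish itself is given only as an equation image in the original filing, so the sketch below falls back to the standard h-swish from MobileNetV3 as a stand-in, wrapped in a plain residual block of the kind steps S202/S203 describe; all layer sizes are assumptions.

```python
# Hedged sketch: standard h-swish stand-in and a plain residual block (steps S202/S203).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HSwish(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # standard h-swish: x * ReLU6(x + 3) / 6 (the patent's "improved" variant differs)
        return x * F.relu6(x + 3.0) / 6.0

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = HSwish()

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)  # skip connection of the residual block
```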
Different kinds of features can be extracted by steps S202 and S203 respectively. Optionally, other residual blocks can also be used here to extract other kinds of features.
At step S204, a squeeze-and-excitation bottleneck can be used, again with an improved h-swish function as the activation function. The feature attention mechanism can be used to make the neural network 812 learn more effective abstract features. Dimensions can be decreased, the amount of computation can be reduced, and important features can be found.
At step S205, a squeeze-and-excitation bottleneck can be used. Again, an improved h-swish function can be used as the activation function.
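A minimal squeeze-and-excitation block of the kind used at steps S204/S205 could look like the following; the reduction ratio and channel count are assumptions, and the block only illustrates the channel-attention ("feature attention") idea.

```python
# Hedged sketch of a squeeze-and-excitation (SE) block (steps S204/S205).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial average
        self.fc = nn.Sequential(                       # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # re-weight channels (feature attention)
```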
Features extracted at step S205 will be output to the fuzzy engine 832 for acquiring the dense map. Features extracted at steps S202, S203, S204 and S205 will be output to the refinement engine 833 for pedestrian detection.
Now comes the fuzzy engine 832. Features output by step S205 will be used at steps S206, S207 and S208 respectively. At these steps, different scales of features will be extracted. Then, with repetition of steps S210 and S211, step-by-step deconvolution operations can be performed to restore a precise dense map with not only outlines but also detailed information.
Step S206 is part of the scale-independent feature extraction module 8321; a convolution operation can be performed here. Optionally, no activation function is used, and batchnorm2d can be used for normalization.
Step S207 is also part of the scale-independent feature extraction module 8321 and can include two kernel convolution modules. The convolution modules both contain a batchnorm2d layer. The first module can use an h-swish activation function, while the second module may not use an activation function. The number of feature channels in the output can be a quarter of that of the input.
Step S208 is likewise part of the scale-independent feature extraction module 8321 and can include two kernel convolution modules. The convolution modules can both include a batchnorm2d layer. The first module can use the h-swish activation function, while the second module may not use an activation function. The number of feature channels in the output can be a quarter of that of the input.
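The following sketch illustrates branches of the kind steps S206 to S208 describe; the kernel sizes that would make the branches respond to different scales are not given above, so they are assumptions here.

```python
# Hedged sketch of scale-independent feature extraction branches (steps S206-S208).
import torch
import torch.nn as nn

class PlainConvBranch(nn.Module):
    """Step S206: one convolution with batchnorm2d and no activation."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.conv(x))

class TwoKernelBranch(nn.Module):
    """Steps S207/S208: two convolution modules, the first with h-swish, the second
    without an activation; output channels are a quarter of the input channels."""
    def __init__(self, in_ch: int, k: int = 3):
        super().__init__()
        out_ch = in_ch // 4
        self.block1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),
            nn.Hardswish(),                       # built-in hard-swish as a stand-in
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),               # no activation in the second module
        )

    def forward(self, x):
        return self.block2(self.block1(x))
```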
At step S209, feature stitching can be used to combine the branch outputs into feature maps whose number does not increase.
At step S210, a convolution kernel and a ReLU activation function can be used, and instancenorm2d can be used as the normalization layer.
At step S211, a deconvolution kernel can be used, and instancenorm2d can be used as the normalization layer.
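One repeated refinement-plus-deconvolution step (S210/S211) could be sketched as below; channel counts and the number of repetitions are assumptions.

```python
# Hedged sketch of one step-by-step deconvolution stage (steps S210/S211).
import torch
import torch.nn as nn

class DeconvStep(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.refine = nn.Sequential(                       # step S210: conv + instancenorm2d + ReLU
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
            nn.InstanceNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(                           # step S211: 2x deconvolution + instancenorm2d
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(out_ch),
        )

    def forward(self, x):
        return self.up(self.refine(x))

# Repeating the step restores a detailed, higher-resolution dense map step by step.
decoder = nn.Sequential(DeconvStep(64, 32), DeconvStep(32, 16), nn.Conv2d(16, 1, 1))
```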
At step S212, bodies and heads can be predicted based on the dense maps. The final number of people can be obtained by cascading the two results. The formula can be as follows:
C = λ1·D_body + λ2·D_head
Wherein C is the result of crowd counting, D_body is the sum of the body dense map, D_head is the sum of the head dense map, and λ1, λ2 can be set according to the scene. Such a weighted sum can effectively eliminate the influence of people overlapping each other, with which the final result can be more precise.
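As a small illustration, the cascaded count can be computed directly from the two predicted dense maps; the λ values here are placeholder assumptions.

```python
# Hedged example of the weighted cascading count C = λ1·D_body + λ2·D_head.
import torch

def crowd_count(body_dense: torch.Tensor, head_dense: torch.Tensor,
                lam1: float = 0.5, lam2: float = 0.5) -> float:
    """Each dense map integrates (sums) to an estimated count of that component."""
    return (lam1 * body_dense.sum() + lam2 * head_dense.sum()).item()
```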
Now comes the refinement engine 833.
At step S213, a bilinear interpolation can be used to get the twice-enlarged feature maps without increasing the calculation amount.
At step S214, without adding calculation parameters, pixel-by-pixel addition can be used to perform feature fusion.
At step S215, on the feature map with a quarter size of the original image, the target center points of pedestrians and faces can be predicted, and the predictions can be made in the form of heat maps.
At step S216, on the feature map with a quarter size of the original image, the height and width of the target can be predicted.
At step S217, on the feature map with a quarter size of the original image, the offset of the target center point can be predicted.
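A hedged sketch of these refinement-engine heads (steps S213 to S217), in the spirit of a CenterNet-style detector, is given below; channel counts and the two target classes (pedestrian, face) are assumptions.

```python
# Hedged sketch of the refinement-engine heads (steps S213-S217).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementHeads(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 2):
        super().__init__()
        self.heatmap = nn.Conv2d(channels, num_classes, 1)  # S215: center-point heat maps
        self.wh = nn.Conv2d(channels, 2, 1)                  # S216: width and height
        self.offset = nn.Conv2d(channels, 2, 1)              # S217: center-point offset

    def forward(self, shallow_feat, deep_feat):
        # S213: bilinear interpolation doubles the deep feature map at little extra cost.
        up = F.interpolate(deep_feat, scale_factor=2, mode="bilinear", align_corners=False)
        # S214: pixel-by-pixel addition fuses the features without adding parameters
        # (assumes matching channel counts and spatial sizes).
        fused = shallow_feat + up
        return torch.sigmoid(self.heatmap(fused)), self.wh(fused), self.offset(fused)
```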
The loss function can be set as follows:
L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss
The cascading loss function includes the MSE loss function, the SSIM loss function, the focal loss function, and the L1 loss functions.
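A hedged sketch of such a cascading loss follows; the weighting coefficients are assumptions, the focal loss follows the common CenterNet-style formulation, and the SSIM term is left as a placeholder since its exact form is not reproduced above.

```python
# Hedged sketch of the cascading multi-task loss (weights and SSIM term are stand-ins).
import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style focal loss on predicted center heat maps (gt == 1 at object centers)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_term = -((1.0 - pred) ** alpha) * torch.log(pred + eps) * pos
    neg_term = -((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred + eps) * neg
    return (pos_term + neg_term).sum() / pos.sum().clamp(min=1.0)

def total_loss(pred, gt, lam1=1.0, lam2=0.1):
    """L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss."""
    l_mse = F.mse_loss(pred["dense"], gt["dense"])
    l_ssim = torch.zeros(())          # placeholder: an SSIM dissimilarity term would go here
    l_focal = focal_loss(pred["heatmap"], gt["heatmap"])
    l_wh = F.l1_loss(pred["wh"], gt["wh"])
    l_offset = F.l1_loss(pred["offset"], gt["offset"])
    return l_mse + lam1 * l_ssim + l_focal + lam2 * l_wh + l_offset
```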
The system 80 and the method 100 can be used for dense crowd flow monitoring, pedestrian detection and tracking, etc. The solution presented can be deployed on edge devices and can give artificial intelligence capabilities to traditional image acquisition devices.
A computer-readable medium is also provided in the present disclosure, storing computer-executable instructions which, upon execution by a computer, enable the computer to execute any of the methods presented in this disclosure.
A computer program is also provided, which, when executed by at least one processor, performs any of the methods presented in this disclosure.
While the present technique has been described in detail with reference to certain embodiments, it should be appreciated that the present technique is not limited to those precise embodiments. Rather, in view of the present disclosure, which describes exemplary modes for practicing the invention, many modifications and variations would present themselves to those skilled in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.

Claims (20)

  1. A method (100) for image processing, comprising:
    - acquiring (S101) an image (30) ;
    - extracting (S102) at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters;
    - executing (S103) N image processing tasks based on the at least one feature respectively, wherein for the ith image processing task, via an (i+1)th part of the neural network (40) with an (i+1)th set of parameters, N is an integer and N≥2, i=1..N.
  2. the method (100) according to claim 1, wherein
    - for combination of the first part and the second part of the neural network (40) , the first set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) .
  3. the method (100) according to claim 1, wherein
    - for combination of the first part and the second part of the neural network (40) , the second set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the (i+1)th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the (i+1)th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) and based on the (i+1)th set of parameters updated through training for combination of the first part and the (i+1)th part of the neural network (40) .
  4. the method (100) according to claim 1, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  5. the method (100) according to claim 1, wherein
    - one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprises: dense maps of different components of the target objects;
    - the method (100) further comprises: counting number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  6. the method (100) according to claim 1, wherein one of the N image processing tasks is to output dense map of target objects, and the corresponding part of the neural network (40) comprises:
    - a scale-independent feature extraction module, and
    - a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  7. An apparatus (10) for image processing, comprising:
    - an image acquisition module (201) , configured to acquire an image (30) ;
    - a feature extraction module (202) , configured to extract at least one feature of the image (30) via a first part of a neural network (40) with a first set of parameters;
    - an image processing module (203) , configured to execute N image processing tasks based on the at least one feature respectively, wherein for the ith image processing task, via an (i+1)th part of the neural network (40) with an (i+1)th set of parameters, N is an integer and N≥2, i=1..N.
  8. the apparatus (10) according to claim 7, wherein
    - for combination of the first part and the second part of the neural network (40) , the first set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) .
  9. the apparatus (10) according to claim 7, wherein
    - for combination of the first part and the second part of the neural network (40) , the second set of parameters are acquired through backward propagation,
    - for combination of the first part and the (i+1)th part of the neural network (40) , i>1, the (i+1)th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part and the ith part of the neural network (40) , and
    - for combination of the first part and a series of the second part to the (N+1)th part of the neural network (40) , the (i+1)th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part and the (N+1)th part of the neural network (40) and based on the (i+1)th set of parameters updated through training for combination of the first part and the (i+1)th part of the neural network (40) .
  10. the apparatus (10) according to claim 7, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  11. the apparatus (10) according to claim 7, wherein
    - one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprises: dense maps of different components of the target objects;
    - the image processing module (203) is further configured to count number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  12. the apparatus (10) according to claim 7, wherein one of the N image processing tasks is to output dense map of target objects, and the corresponding part  of the neural network (40) comprises:
    - a scale-independent feature extraction module, and
    - a step by step deconvolution module receiving output of the scale-independent feature extraction module.
  13. An apparatus (10) for image processing, comprising:
    - at least one processor (102) ;
    - at least one memory (101) , coupled to the at least one processor (102) , configured to execute method according to any of claims 1~6.
  14. A computer-readable medium for image processing, storing computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute method according to any of claims 1~6.
  15. A neural network (40) , comprising:
    - a first part (401) , configured to extract at least one feature of the image (30) with a first set of parameters;
    - a second part (402) to an (N+1)th part (40(N+1)) , configured to execute N image processing tasks based on the at least one feature respectively, wherein the (i+1)th part (40(i+1)) of the neural network (40) is configured to execute the ith image processing task with an (i+1)th set of parameters, N is an integer and N≥2, i=1..N.
  16. the neural network (40) according to claim 15, wherein
    - for combination of the first part (401) and the second part (402) of the neural network (40) , the first set of parameters are acquired through backward propagation,
    - for combination of the first part (401) and the (i+1)th part (40(i+1)) of the neural network (40) , i>1, the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the ith part (40i) of the neural network (40) , and
    - for combination of the first part (401) and a series of the second part (402) to the (N+1)th part (40(N+1)) of the neural network (40) , the first set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the (N+1)th part (40(N+1)) of the neural network (40) .
  17. the neural network (40) according to claim 15, wherein
    - for combination of the first part (401) and the second part (402) of the neural network (40) , the second set of parameters are acquired through backward propagation,
    - for combination of the first part (401) and the (i+1)th part (40(i+1)) of the neural network (40) , i>1, the (i+1)th set of parameters are acquired through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the ith part (40i) of the neural network (40) , and
    - for combination of the first part (401) and a series of the second part (402) to the (N+1)th part (40(N+1)) of the neural network (40) , the (i+1)th set of parameters are updated through backward propagation and based on the first set of parameters updated through training for combination of the first part (401) and the (N+1)th part (40(N+1)) of the neural network (40) and based on the (i+1)th set of parameters updated through training for combination of the first part (401) and the (i+1)th part (40(i+1)) of the neural network (40) .
  18. the neural network (40) according to claim 15, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, step-by-step deconvolution module.
  19. the neural network (40) according to claim 15, wherein
    - one of the N image processing tasks is to output dense map of target objects, and the outputs of the corresponding part of the neural network (40) comprises: dense maps of different components of the target objects;
    - the corresponding part of the neural network (40) is further configured to: count number of the target objects based on weighted sum of the dense maps of different components of the target objects.
  20. the neural network (40) according to claim 15, wherein one of the N image processing tasks is to output dense map of target objects, and the corresponding part of the neural network (40) comprises:
    - a scale-independent feature extraction module, and
    - a step by step deconvolution module receiving output of the scale-independent feature extraction module.