CN115668277A - Method and apparatus for image processing - Google Patents

Method and apparatus for image processing

Info

Publication number
CN115668277A
Authority
CN
China
Prior art keywords
neural network
parameters
combination
image processing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080101212.2A
Other languages
Chinese (zh)
Inventor
于晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Original Assignee
Siemens AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG
Publication of CN115668277A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure presents a method, apparatus, system, and computer-readable medium for image processing. The method comprises the following steps: acquiring (S101) an image (30); extracting (S102) at least one feature of the image (30) with a first set of parameters via a first portion of a neural network (40); and performing (S103) N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network (40), N being an integer, N ≥ 2, i = 1, …, N.

Description

Method and apparatus for image processing
Technical Field
The present invention relates to the art of computer vision, and more particularly, to a method, apparatus, and computer-readable storage medium for image processing.
Background
With the development of artificial intelligence technology, population density analysis and pedestrian detection technologies are widely used in security, intelligent buildings, and other fields. Population counting and pedestrian detection in complex scenes serve as a kind of infrastructure in the field of computer vision, providing a perceptual foundation for higher-level semantics and more complex tasks, such as pedestrian identification, pedestrian flow estimation, and video structuring analysis.
Disclosure of Invention
In common solutions, population counting and pedestrian detection are typically addressed by different neural network models, and pedestrian detection networks typically do not separately predict face portions. However, typical application scenarios require both population counting and pedestrian detection. The present invention is based on deep learning techniques, particularly convolutional neural networks, and integrates population counting and pedestrian detection into one model by designing a dual-engine, multi-task, lightweight framework. The provided solution can also be used for other categories of computer vision tasks. Compared to current solutions based on separate models, the solution provided in the present disclosure may save computational resources, reduce memory consumption, and improve computational efficiency, and it may also be deployed on low-cost edge devices.
Embodiments of the present disclosure include methods, apparatus, and computer program products for image processing.
According to a first aspect of the present disclosure, a method for image processing is presented. The method comprises the following steps:
-acquiring an image;
-extracting at least one feature of the image via a first part of the neural network using a first set of parameters;
-performing N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th part of the neural network (40), N being an integer, N ≥ 2, i = 1, …, N.
According to a second aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes:
-an image acquisition module configured to acquire an image;
-a feature extraction module configured to extract at least one feature of the image with a first set of parameters via a first part of the neural network;
-an image processing module configured to perform N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th part of the neural network, N being an integer, N ≥ 2, i = 1, …, N.
According to a third aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes: at least one processor; at least one memory coupled to the at least one processor configured to perform the method according to the first aspect.
According to a fourth aspect of the disclosure, a computer-readable medium for image processing is presented. The computer-readable medium stores computer-executable instructions, wherein the computer-executable instructions, when executed, cause at least one processor to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, a neural network is presented that may be used in any of the above aspects of the present disclosure. The neural network includes:
-a first part configured to extract at least one feature of an image using a first set of parameters;
-a second part (402) to an (N + 1)-th part (40 (N + 1)) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein the (i + 1)-th part (40 (i + 1)) of the neural network (40) is configured to perform the i-th image processing task using the (i + 1)-th set of parameters, N being an integer, N ≥ 2, i = 1, …, N.
With the solution provided in the present disclosure, multiple tasks can be performed via a single neural network, which can save computational resources, and this process can also conform to the business logic. This computational-resource-saving solution may also make the solution suitable for deployment on edge devices.
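Purely as an illustration, and not as a description of the claimed network itself, a minimal sketch of such a shared-first-part, multi-head structure is shown below in PyTorch; the framework, layer sizes, and module names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Sketch: one shared first part (backbone) feeding N task-specific parts."""
    def __init__(self, num_tasks: int = 2, channels: int = 64):
        super().__init__()
        # First part: shared feature extractor (first set of parameters).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Parts 2..N+1: one head per image processing task (sets 2..N+1).
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, 1, 1) for _ in range(num_tasks)]
        )

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)                  # at least one shared feature
        return [head(features) for head in self.heads]   # N task outputs

# Usage sketch: two tasks, e.g. a dense map and a detection-style map.
net = MultiTaskNet(num_tasks=2)
outputs = net(torch.randn(1, 3, 224, 224))
```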
Optionally, for a combination of the first and second parts of the neural network, a first set of parameters is obtained by back propagation; for a combination of the first part and the (i + 1)-th part of the neural network, i > 1, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network; and for a combination of the first part of the neural network and the series of the second part through the (N + 1)-th part, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1)-th part of the neural network.
Optionally, for a combination of the first and second parts of the neural network, a second set of parameters is obtained by back propagation; for a combination of the first part and the (i + 1)-th part of the neural network, i > 1, the (i + 1)-th set of parameters is obtained by back propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network; and for a combination of the first part of the neural network and the series of the second part through the (N + 1)-th part, the (i + 1)-th set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1)-th part of the neural network and on the (i + 1)-th set of parameters updated by training the combination of the first part and the (i + 1)-th part of the neural network.
In contrast to current back propagation processes that involve only the combination of all parts, the solution presented herein can obtain very close parameters during each combination of a first part and another part, which can converge quickly, save computational power, and speed up the training process.
Optionally, the neural network may comprise: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module. This simplified structure and the combination of the above modules can reduce redundant parameters and ensure a neural network with advanced performance on a lightweight basis.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the output of the corresponding portion of the neural network may include dense mappings of different components of the target object; the number of target objects may then be counted based on a weighted sum of the dense mappings of the different components of the target object. This optional solution may make the counting result more accurate in cases where objects overlap each other. The weights used may be determined based on engineering practice and/or by testing.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the corresponding portion of the neural network may include: a scale-independent feature extraction module and a stepwise deconvolution module that receives the output of the scale-independent feature extraction module. The stepwise deconvolution module utilizes the features of different scales extracted by the scale-independent feature extraction module, followed by a gradual recovery, to achieve an accurate dense mapping. For example, where there are four clusters of target objects in the image, without scale-independent feature extraction and stepwise deconvolution the output dense map may include only four blurry clusters, the details of which cannot be seen.
Drawings
The above-mentioned and other features and advantages of the present technology, and the manner of attaining them, will become more apparent and the present technology itself will be better understood by reference to the following description of embodiments of the present technology taken in conjunction with the accompanying drawings, wherein:
fig. 1 depicts a block diagram of an apparatus for image processing according to one embodiment of the present disclosure.
Fig. 2 depicts the structure of a CNN according to one embodiment of the present disclosure.
Fig. 3 depicts a training process for CNN according to one embodiment of the present disclosure.
Fig. 4 depicts a flow diagram of a method for image processing according to one embodiment of the present disclosure.
FIG. 5 depicts an image processing system according to one embodiment of the present disclosure.
FIG. 6 depicts the training process of the system shown in FIG. 5.
Fig. 7 depicts the image processing process of the system shown in fig. 5.
Reference numerals
10, apparatus for image processing
101, at least one memory
102, at least one processor
103, communication module
20, image processing program
201, image acquisition module
202, a feature extraction module
203, image processing module
30, images acquired and to be processed
40, neural network
401, first part of neural network 40
402, second part of the neural network 40
40i, part i of the neural network 40
51, first set of parameters of the first part 401 of the neural network 40
52, second set of parameters for the second portion 402 of the neural network 40
5i, ith set of parameters for the ith part 40i of the neural network 40
60, training server
70, video camera
100 method for image processing
S101-S103, steps of method 100
80, lightweight population counting and pedestrian detection system
801, on-line training module
802, offline module
81, server
82, video camera
83 edge device
811, image sample
812, neural network
813, loss function
831, feature extraction
832, blur engine
833, refinement engine
834, population count
835, population flow analysis
836, re-recognition
200, method for image processing
S201-S217, steps of method 200
Detailed Description
The above-described and other features of the present technology are described in detail below. Various embodiments are described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to illustrate rather than to limit the invention. It may be evident that such embodiment(s) may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles "a" and "the" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Image processing solutions are proposed in the present disclosure that can be used to perform multiple tasks, such as the mentioned population counting and pedestrian detection, via a single neural network. Now, the present disclosure will be described in detail below by referring to fig. 1 to 7.
Fig. 1 depicts a block diagram of an apparatus according to one embodiment of the present disclosure. The apparatus for image processing 10 presented in this disclosure may be implemented as a network of computer processors to perform the subsequent method for image processing 100 presented in this disclosure. Apparatus 10 may also be a single computer (as shown in fig. 1) comprising at least one memory 101, which includes a computer-readable medium such as a Random Access Memory (RAM). The apparatus 10 also includes at least one processor 102 coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101 and, when executed by the at least one processor 102, may cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a state machine, and so forth. Embodiments of a computer-readable medium include, but are not limited to, a floppy disk, a CD-ROM, a magnetic disk, a memory chip, a ROM, a RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may comprise code from any computer programming language, including, for example, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in fig. 1 may contain an image processing program 20 that, when executed by the at least one processor 102, causes the at least one processor 102 to perform the method 100 for image processing presented in the present disclosure. The image processing program 20 may include:
an image acquisition module 201 configured to acquire an image 30;
a feature extraction module 202 configured to extract at least one feature of the image 30 with a first set of parameters via a first portion of the neural network 40;
an image processing module 203 configured to perform N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network 40, N being an integer, N ≥ 2, i = 1, …, N.
The image to be processed 30 may be acquired by the camera 70 and sent to the device 10 via the communication module 103 shown in fig. 1. The image 30 may also be stored in at least one memory 101.
The online training process of the neural network 40 may be performed by a server 60 (e.g., a high-performance GPU server) with a large amount of data. After training, the file of the neural network 40 (including the parameters for each portion of the neural network 40) may be transmitted to the device 10 via the communication module 103 and may also be stored in the at least one memory 101, after which the neural network 40 may be deployed on the device 10. The neural network 40 may be a CNN.
However, the online training process may also be performed on the apparatus 10, depending on the device configuration and processing capabilities. In this case, the online training program may be part of the image processing program 20 and may be pre-stored in the at least one memory 101.
Multiple tasks may be performed via the same CNN, which may save computing resources, and this process may also conform to the business logic. This computing-resource-saving solution may also make the apparatus 10 suitable for deployment on edge devices.
As shown in fig. 2, the neural network 40 may include:
a first portion 401 configured to extract at least one feature of the image 30 using a first set of parameters 51;
a second part 402 to an (N + 1)-th part 40 (N + 1) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein the (i + 1)-th part 40 (i + 1) of the neural network 40 is configured to perform the i-th image processing task using the (i + 1)-th set of parameters 5 (i + 1), N being an integer, N ≥ 2, i = 1, …, N.
It is worth mentioning that the first portion 401 may extract shallow features, and optionally, some of the second portion 402 to the (N + 1) th portion 40 (N + 1) may further extract deep features.
Referring now to fig. 3, back propagation may be performed for training each portion of the neural network 40. First, the different parts corresponding to the different image processing tasks are trained independently, and finally an overall fine-tuning training process is performed. With the combination of the first portion 401 and each of the other portions 40 (i + 1), the first set of parameters 51 may be updated. With the combination of all the parts, the parameters of each part can also be updated. In contrast to current back propagation processes that involve only the combination of all parts, the solution presented herein can obtain very close parameters during each combination of the first part and another part, which can converge quickly, save computational power, and speed up the training process.
Specifically, with respect to the first set of parameters: for the combination of the first portion 401 and the second portion 402 of the neural network 40, the first set of parameters is obtained by back propagation using a large number of image samples; for the combination of the first portion 401 and the (i + 1)-th portion 40 (i + 1) of the neural network 40, i > 1, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the i-th portion 40i of the neural network 40; and for the combination of the first portion 401 and the series of the second portion 402 through the (N + 1)-th portion 40 (N + 1) of the neural network 40, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the (N + 1)-th portion 40 (N + 1) of the neural network 40.
And, with respect to the second to (N + 1)-th sets of parameters: for the combination of the first portion 401 and the second portion 402 of the neural network 40, the second set of parameters is obtained by back propagation; for the combination of the first portion 401 and the (i + 1)-th portion 40 (i + 1) of the neural network 40, i > 1, the (i + 1)-th set of parameters is obtained by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the i-th portion 40i of the neural network 40; and for the combination of the first portion 401 and the series of the second portion 402 through the (N + 1)-th portion 40 (N + 1) of the neural network 40, the (i + 1)-th set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the (N + 1)-th portion 40 (N + 1) of the neural network 40, and on the (i + 1)-th set of parameters updated by training the combination of the first portion 401 and the (i + 1)-th portion 40 (i + 1) of the neural network 40.
Optionally, the neural network 40 may include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module. This simplified structure and the combination of the above modules can reduce redundant parameters and ensure that the neural network 40 has advanced performance on a lightweight basis.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the output of the corresponding portion of the neural network 40 may include dense mappings of different components of the target object; the image processing module 203 is further configured to count the number of target objects based on a weighted sum of the dense mappings of the different components of the target objects. This optional solution may make the counting result more accurate in cases where the objects overlap each other. The weights used may be determined based on engineering practice and/or by testing.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the corresponding portion of the neural network 40 may include: a scale-independent feature extraction module and a stepwise deconvolution module that receives the output of the scale-independent feature extraction module. The scale-independent feature extraction module may be configured to extract features at a plurality of different scales, and the stepwise deconvolution module may include a plurality of pairs of convolution and deconvolution modules, each pair corresponding to an upsampling process. The stepwise deconvolution module utilizes the features of different scales extracted by the scale-independent feature extraction module, followed by a gradual recovery, to achieve an accurate dense mapping. For example, where there are four clusters of target objects in the image, without scale-independent feature extraction and stepwise deconvolution the output dense map may include only four blurry clusters, the details of which cannot be seen.
Although the image acquisition module 201, the feature extraction module 202, and the image processing module 203 are described above as software modules of the image processing program 20, they may also be implemented via hardware, such as an ASIC chip. They may be integrated into one chip or implemented separately and electrically connected.
It should be mentioned that the present disclosure may encompass devices having architectures that are different from that shown in fig. 1. The above architecture is merely exemplary and is used to explain the exemplary method 100 shown in fig. 4.
Various methods according to the present disclosure may be performed. An exemplary method 100 according to the present disclosure includes the steps of:
s101: acquiring an image 30;
s102: extracting at least one feature of the image 30 via a first portion of the neural network 40 using a first set of parameters;
s103: n image processing tasks are respectively executed based on at least one feature, wherein for the ith image processing task, the (i + 1) th group of parameters is utilized through the (i + 1) th part of the neural network 40, N is an integer and N is larger than or equal to 2, i =1 \8230, N.
Optionally, for a combination of the first and second portions of the neural network 40, a first set of parameters is obtained by back propagation; for a combination of the first and (i + 1)-th portions of the neural network 40, i > 1, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first and i-th portions of the neural network 40; and for a combination of the first portion of the neural network 40 and the series of the second to (N + 1)-th portions, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first and (N + 1)-th portions of the neural network 40.
Optionally, for a combination of the first and second portions of the neural network 40, a second set of parameters is obtained by back propagation; for a combination of the first and (i + 1)-th portions of the neural network 40, i > 1, the (i + 1)-th set of parameters is obtained by back propagation and based on the first set of parameters updated by training the combination of the first and i-th portions of the neural network 40; and for a combination of the first portion of the neural network 40 and the series of the second to (N + 1)-th portions, the (i + 1)-th set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first and (N + 1)-th portions of the neural network 40 and on the (i + 1)-th set of parameters updated by training the combination of the first and (i + 1)-th portions of the neural network 40.
Optionally, the neural network 40 may include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the output of the corresponding portion of the neural network 40 may include dense mappings of different components of the target object; the method 100 may further comprise: counting the number of target objects based on a weighted sum of the dense mappings of the different components of the target objects.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the corresponding portion of the neural network 40 may include: a scale-independent feature extraction module and a stepwise deconvolution module that receives the output of the scale-independent feature extraction module.
The following is a use case in which the solution provided in the present disclosure may be employed. Referring to fig. 5, in this use case, a lightweight population counting and pedestrian detection system 80 for an edge device is provided. The system 80 may include:
an online training module 801 configured to train a neural network 812 for image processing;
an offline module 802 configured to perform image processing via the neural network 812.
With respect to the online training module 801, the neural network model 812 may be trained with a large number of image samples 811 via the server 81 (e.g., a high-performance GPU server). After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, which includes the feature extraction 831, the blur engine 832, and the refinement engine 833.
With respect to the offline module 802, the camera 82 may be connected to the edge device 83, and the neural network 812 may run on the edge device 83. First, features may be extracted from the image to be processed by the feature extraction 831; then the blur engine 832 will output a dense map, and the refinement engine 833 will perform pedestrian detection. The output dense map may be further processed by the population count 834 for counting the number of pedestrians and by the population flow analysis 835 for analyzing the population flow. The output bounding boxes of pedestrians and, optionally, faces may be further processed by the re-recognition 836, for example to find a particular person in the image, or to find persons wearing masks.
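The following sketch illustrates, in Python, how the offline pipeline of fig. 5 could be wired together; all callables are hypothetical stand-ins for the modules named above, not an actual implementation.

```python
# Sketch of the offline (edge-side) pipeline of fig. 5, with hypothetical
# callables standing in for feature extraction 831, the blur engine 832,
# the refinement engine 833 and the downstream consumers 834-836.
def process_frame(image, backbone, blur_engine, refine_engine,
                  count_fn, flow_fn, reid_fn):
    features = backbone(image)              # feature extraction 831
    density_map = blur_engine(features)     # 832: dense map output
    boxes = refine_engine(features)         # 833: pedestrian / face boxes
    count = count_fn(density_map)           # 834: population count
    flow = flow_fn(density_map)             # 835: population flow analysis
    matches = reid_fn(image, boxes)         # 836: re-recognition
    return count, flow, matches
```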
Referring now to fig. 6, during the training phase, first, at step 1, the image samples are input into the neural network 812, and the loss function may be set to MSE and SSIM; parameter set 1 of the feature extraction 831 and parameter set 2 of the blur engine 832 may be obtained by back propagation.
Subsequently, at step 2, the image samples are input into the neural network 812, and the loss function can be set to the focal loss and the L1 loss; by back propagation, parameter set 1 will be updated to parameter set 1', and parameter set 3 of the refinement engine 833 will be obtained.
Finally, at step 3, the image samples are input into the neural network 812, and the loss function may be set to a weighted sum of the MSE, SSIM, focal, and L1 losses. By back propagation, parameter set 1' will be updated to parameter set 1", parameter set 2 will be updated to parameter set 2', and parameter set 3 will be updated to parameter set 3'.
Compared with the current back propagation practice, step 1 and step 2 are added, and the parameters acquired during these first two steps are used during step 3. In fact, after step 1 and step 2 the parameter sets are almost ready, so step 3 only slightly adjusts them, which allows the training process to converge quickly, saves computing power, and shortens the whole training process.
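For illustration only, the three-step staging could be organized as in the following Python/PyTorch sketch; the framework choice and the helper names are assumptions, and the loss implementations (MSE, SSIM, focal, L1) are assumed to be available elsewhere — only the staging of the combinations is shown.

```python
import torch

def train_combination(parameters, forward_fn, loss_fn, loader, lr=1e-3):
    """One training stage: back propagation through one combination of parts.
    Only the parameter sets passed in are updated."""
    opt = torch.optim.Adam(parameters, lr=lr)
    for images, targets in loader:
        loss = loss_fn(forward_fn(images), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Step 1: backbone + blur engine, loss = MSE + SSIM        -> sets 1 and 2
# Step 2: backbone + refinement engine, loss = focal + L1  -> set 1', set 3
# Step 3: all parts, weighted sum of the four losses       -> sets 1'', 2', 3'
```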
During the inference phase, parameter set 1", parameter set 2' and parameter set 3' will be used for image processing.
Fig. 7 shows the workflow of the inference phase of the use case.
At step S201, a basic convolution is performed. A strided convolution kernel may be used for the convolution operation, the ReLU function may be used for activation, batch normalization (BatchNorm2d) may be used for normalization, and a max pooling layer may then be applied. A feature map with one-fourth the size of the original image can be obtained at this step, and the amount of computation can be reduced to a large extent.
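As a non-limiting sketch of this stem (assuming PyTorch and illustrative channel counts), step S201 could look as follows:

```python
import torch.nn as nn

# Sketch of the basic convolution stem of step S201: a strided convolution,
# BatchNorm2d, ReLU, then max pooling, giving a feature map of 1/4 the
# original spatial size (channel counts are assumptions).
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # 1/2 size
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 1/4 size
)
```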
At step S202, a residual block may be used and features may be further extracted. The following modified h-swish function may be used as the activation function:

[Equation image in the original: the modified h-swish activation function]
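For reference, the standard h-swish on which such a modification is typically based is x·ReLU6(x + 3)/6; the sketch below (assuming PyTorch) shows this baseline form only, since the modified variant is given as an equation image in the original document.

```python
import torch
import torch.nn.functional as F

def h_swish(x: torch.Tensor) -> torch.Tensor:
    """Standard h-swish: x * ReLU6(x + 3) / 6 (the patent's modified variant
    is not reproduced here; this is the common baseline form)."""
    return x * F.relu6(x + 3.0) / 6.0
```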
At step S203, a residual block may likewise be used, and features may be further extracted. In addition, the modified h-swish function may be used as the activation function.
Different classes of features may be extracted through steps S202 and S203, respectively. Optionally, other residual blocks may also be used to extract other kinds of features herein.
At step S204, a squeeze-and-excitation bottleneck may be used, and the modified h-swish function may also be used as the activation function. The feature attention mechanism may be used to make the neural network 812 learn more effective abstract features; the dimensionality and the amount of computation can be reduced, and important features can be found.
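A minimal sketch of such a squeeze-and-excitation (feature attention) block, assuming PyTorch and an illustrative reduction ratio, is:

```python
import torch
import torch.nn as nn

class SEBottleneck(nn.Module):
    """Sketch of a squeeze-and-excitation block: global pooling squeezes the
    spatial dimensions, two small layers produce per-channel weights that
    re-scale the input features (reduction ratio is an assumption)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # a hard sigmoid could be used instead
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # important channels are emphasized
```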
At step S205, a further squeeze-and-excitation bottleneck may be used. Here, too, the modified h-swish function may be used as the activation function.
The features extracted at step S205 will be output to the blur engine 832 to obtain the dense mapping. The features extracted at steps S202, S203, S204 and S205 will be output to the refinement engine 833 for pedestrian detection.
Turning now to the blur engine 832: the features output by step S205 will be used at steps S206, S207 and S208, respectively, where features of different scales are extracted. Subsequently, by repeating steps S210 and S211, a stepwise deconvolution operation may be performed to restore an accurate dense mapping, containing not only contours but also detailed information.
At step S206, which is part of the scale-independent feature extraction module 8321, a convolution operation may be performed. Optionally, no activation function is used here, and BatchNorm2d may be used for normalization.
Step S207 is also part of the scale-independent feature extraction module 8321 and may include two kernel convolution modules. Both convolution modules contain BatchNorm2d layers. The first module may use an h-swish activation function, and the second module may use no activation function. The number of feature channels in the output may be one quarter of that of the input.
Step S208 is likewise part of the scale-independent feature extraction module 8321 and may include two kernel convolution modules. Both convolution modules contain BatchNorm2d layers. The first module may use an h-swish activation function, and the second module may use no activation function. The number of feature channels in the output may be one quarter of that of the input.
At step S209, a feature concatenation (stitching) technique may be used so that the number of feature maps does not increase.
At step S210, a convolution kernel and a ReLU activation function may be used, with instance normalization (InstanceNorm2d) as the normalization layer.
At step S211, a deconvolution kernel may be used, again with InstanceNorm2d as the normalization layer.
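As an illustration of steps S206-S211 taken together (the framework, kernel sizes, and channel counts below are assumptions, not the patented design), the blur engine could be sketched as follows:

```python
import torch
import torch.nn as nn

class BlurEngineSketch(nn.Module):
    """Sketch of the blur engine: three parallel branches extract features at
    different scales (S206-S208), their outputs are concatenated (S209), and a
    convolution + deconvolution pair gradually restores resolution (S210-S211)."""
    def __init__(self, in_ch: int = 128):
        super().__init__()
        quarter = in_ch // 4
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, quarter, 1), nn.BatchNorm2d(quarter))
        self.branch2 = nn.Sequential(nn.Conv2d(in_ch, quarter, 3, padding=1), nn.BatchNorm2d(quarter))
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, quarter, 5, padding=2), nn.BatchNorm2d(quarter))
        up_ch = 3 * quarter
        self.up = nn.Sequential(
            nn.Conv2d(up_ch, up_ch, 3, padding=1), nn.InstanceNorm2d(up_ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(up_ch, up_ch // 2, 4, stride=2, padding=1), nn.InstanceNorm2d(up_ch // 2),
        )
        self.to_density = nn.Conv2d(up_ch // 2, 1, 1)  # single-channel dense map

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([self.branch1(feats), self.branch2(feats), self.branch3(feats)], dim=1)
        return self.to_density(self.up(multi))
```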
At step S212, the body and the head may be predicted on the basis of density. The final number of people can be obtained by combining the two results. The formula can be as follows:
C = λ1·D_body + λ2·D_head
where C is the result of the population count, D_body is the sum of the body dense mapping, D_head is the sum of the head dense mapping, and λ1, λ2 can be set according to the scene. This weighted sum can effectively eliminate the effect of people overlapping, in which case the final result can be more accurate.
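A minimal sketch of this weighted-sum count (the weight values are scene-dependent assumptions) is:

```python
# Sketch of the weighted-sum count of step S212; density_body and density_head
# are the dense maps of the body and head, lambda1/lambda2 are illustrative.
def population_count(density_body, density_head, lambda1=0.5, lambda2=0.5):
    return lambda1 * density_body.sum() + lambda2 * density_head.sum()
```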
Now to the refinement engine 833.
At step S213, bilinear interpolation may be used to obtain a feature map enlarged by a factor of two without additional computation.
At step S214, pixel-wise addition may be used for feature fusion without adding computational parameters.
At step S215, on the feature map with one-fourth the size of the original image, the target center points of pedestrians and faces may be predicted, in the form of heat maps.
At step S216, on the feature map having a size of one-fourth of the original image, the height and width of the target may be predicted.
At step S217, on the feature map having a size of one-fourth of the original image, the deviation of the target center point may be predicted.
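A non-limiting sketch of the three prediction heads of steps S215-S217, assuming PyTorch and illustrative channel counts, is:

```python
import torch.nn as nn

class RefinementHeads(nn.Module):
    """Sketch of the refinement engine outputs (S215-S217): on the 1/4-size
    feature map, predict center-point heatmaps for pedestrian and face, the
    target height/width, and the center-point offset (channel counts assumed)."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.heatmap = nn.Conv2d(in_ch, 2, 1)  # two classes: pedestrian, face
        self.size = nn.Conv2d(in_ch, 2, 1)     # height and width
        self.offset = nn.Conv2d(in_ch, 2, 1)   # sub-pixel center offset

    def forward(self, x):
        return self.heatmap(x).sigmoid(), self.size(x), self.offset(x)
```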
The loss function can be set as follows:
L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss
the series loss function includes a MSE loss function, an SSIM loss function, a focus loss function, and an L1 loss function.
The system 80 and method 100 may be used for dense population flow monitoring, pedestrian detection and tracking, and the like. The presented solution may be deployed on edge devices and may provide artificial intelligence capabilities for traditional image acquisition devices.
Also provided in the present disclosure is a computer-readable medium storing computer-executable instructions that, when executed by a computer, cause the computer to perform any of the methods presented in the present disclosure.
The computer program, when being executed by at least one processor, performs any of the methods presented in this disclosure.
Although the present technology has been described in detail with reference to certain embodiments, it should be understood that the present technology is not limited to those precise embodiments. Indeed, in view of the present disclosure which describes exemplary modes for practicing the invention, many modifications and variations can be made by those skilled in the art without departing from the scope and spirit of the invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications and variations that fall within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

1. A method (100) for image processing, comprising:
-acquiring (S101) an image (30);
-extracting (S102) at least one feature of the image (30) with a first set of parameters via a first part of a neural network (40);
-performing (S103) N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task, the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network (40), N being an integer and N ≥ 2, i = 1, …, N.
2. The method (100) of claim 1, wherein
-for a combination of the first and second parts of the neural network (40), obtaining the first set of parameters by back propagation,
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, updating the first set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part of the neural network (40) and a series of the second to (N + 1) th parts, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1) th part of the neural network (40).
3. The method (100) of claim 1, wherein
-obtaining a second set of parameters by back propagation for a combination of the first and second parts of the neural network (40),
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, obtaining the (i + 1) th set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part and a series of the second part to the (N + 1) th part of the neural network (40), updating the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training a combination of the first part and the (N + 1) th part of the neural network (40) and based on the (i + 1) th set of parameters updated by training a combination of the first part and the (i + 1) th part of the neural network (40).
4. The method (100) of claim 1, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
5. The method (100) of claim 1, wherein
-one of the N image processing tasks is outputting a dense mapping of a target object, and the output of the corresponding part of the neural network (40) comprises: dense mappings of different components of the target object;
-the method (100) further comprises: counting a number of the target objects based on a weighted sum of the dense mappings of different components of the target objects.
6. The method (100) of claim 1, wherein one of the N image processing tasks is outputting a dense mapping of a target object, and the corresponding portion of the neural network (40) comprises:
-a scale-independent feature extraction module, and
-a stepwise deconvolution module receiving the output of the scale-independent feature extraction module.
7. An apparatus (10) for image processing, comprising:
-an image acquisition module (201) configured to acquire an image (30);
-a feature extraction module (202) configured to extract at least one feature of the image (30) with a first set of parameters via a first part of a neural network (40);
-an image processing module (203) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task, the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network (40), N being an integer and N ≥ 2, i = 1, …, N.
8. The apparatus (10) of claim 7, wherein
-for a combination of the first and second parts of the neural network (40), obtaining the first set of parameters by back propagation,
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part of the neural network (40) and a series of the second to (N + 1) th parts, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1) th part of the neural network (40).
9. The apparatus (10) of claim 7, wherein
-obtaining a second set of parameters by back propagation for a combination of the first part and the second part of the neural network (40),
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, obtaining the (i + 1) th set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part and a series of the second part to the (N + 1) th part of the neural network (40), updating the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training a combination of the first part and the (N + 1) th part of the neural network (40) and based on the (i + 1) th set of parameters updated by training a combination of the first part and the (i + 1) th part of the neural network (40).
10. The apparatus (10) of claim 7, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
11. The apparatus (10) of claim 7, wherein
-one of the N image processing tasks is outputting a dense mapping of a target object, and the output of the corresponding part of the neural network (40) comprises: dense mappings of different components of the target object;
-the image processing module (203) is further configured to count the number of target objects based on a weighted sum of the dense mappings of different components of the target objects.
12. The apparatus (10) of claim 7, wherein one of the N image processing tasks is outputting a dense mapping of a target object, and the corresponding portion of the neural network (40) includes:
-a scale-independent feature extraction module, and
-a stepwise deconvolution module receiving the output of the scale-independent feature extraction module.
13. An apparatus (10) for image processing, comprising:
-at least one processor (102);
-at least one memory (101), coupled to the at least one processor (102), configured to perform the method according to any one of claims 1 to 6.
14. A computer-readable medium storing computer-executable instructions for image processing, wherein the computer-executable instructions, when executed, cause at least one processor to perform the method of any of claims 1-6.
15. A neural network (40), comprising:
-a first part (401) configured to extract at least one feature of the image (30) using a first set of parameters;
-a second part (402) to an (N + 1)-th part (40 (N + 1)) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein the (i + 1)-th part (40 (i + 1)) of the neural network (40) is configured to perform the i-th image processing task with the (i + 1)-th set of parameters, N being an integer and N ≥ 2, i = 1, …, N.
16. The neural network (40) of claim 15, wherein
-for a combination of the first part (401) and the second part (402) of the neural network (40), obtaining the first set of parameters by back propagation,
-for a combination of the first part (401) and the (i + 1) th part (40 (i + 1)) of the neural network (40), i >1, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part (401) and the i-th part (40 i) of the neural network (40), and
-for a combination of the first portion (401) and a series of the second (402) to (N + 1) th portions (40 (N + 1)) of the neural network (40), updating the first set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first portion (401) and the (N + 1) th portion (40 (N + 1)) of the neural network (40).
17. The neural network (40) of claim 15, wherein
-obtaining a second set of parameters by back propagation for a combination of the first part (401) and the second part (402) of the neural network (40),
-for a combination of the first part (401) and the (i + 1) th part (40 (i + 1)) of the neural network (40), i >1, obtaining the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part (401) and the i-th part (40 i) of the neural network (40), and
-for a combination of the first portion (401) and a series of the second portion (402) to the (N + 1) th portion (40 (N + 1)) of the neural network (40), updating the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first portion (401) and the (N + 1) th portion (40 (N + 1)) of the neural network (40) and on the (i + 1) th set of parameters updated by training the combination of the first portion (401) and the (i + 1) th portion (40 (i + 1)) of the neural network (40).
18. The neural network (40) of claim 15, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
19. The neural network (40) of claim 15, wherein
-one of the N image processing tasks is outputting a dense mapping of a target object, and the output of the corresponding part of the neural network (40) comprises: dense mappings of different components of the target object;
-the corresponding part of the neural network (40) is further configured to: counting a number of the target objects based on a weighted sum of the dense mappings of different components of the target objects.
20. The neural network (40) of claim 15, wherein one of the N image processing tasks is outputting a dense mapping of a target object, and the corresponding portion of the neural network (40) includes:
-a scale-independent feature extraction module, and
-a stepwise deconvolution module receiving the output of the scale-independent feature extraction module.
CN202080101212.2A 2020-05-29 2020-05-29 Method and apparatus for image processing Pending CN115668277A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/093497 WO2021237727A1 (en) 2020-05-29 2020-05-29 Method and apparatus of image processing

Publications (1)

Publication Number Publication Date
CN115668277A true CN115668277A (en) 2023-01-31

Family

ID=78745343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080101212.2A Pending CN115668277A (en) 2020-05-29 2020-05-29 Method and apparatus for image processing

Country Status (2)

Country Link
CN (1) CN115668277A (en)
WO (1) WO2021237727A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024007423A1 (en) * 2022-07-06 2024-01-11 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Reference picture resampling (rpr) based super-resolution guided by partition information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529402B (en) * 2016-09-27 2019-05-28 中国科学院自动化研究所 The face character analysis method of convolutional neural networks based on multi-task learning
WO2018187632A1 (en) * 2017-04-05 2018-10-11 Carnegie Mellon University Deep learning methods for estimating density and/or flow of objects, and related methods and software
CN109523532B (en) * 2018-11-13 2022-05-03 腾讯医疗健康(深圳)有限公司 Image processing method, image processing device, computer readable medium and electronic equipment
CN109858372B (en) * 2018-12-29 2021-04-27 浙江零跑科技有限公司 Lane-level precision automatic driving structured data analysis method
CN111178253B (en) * 2019-12-27 2024-02-27 佑驾创新(北京)技术有限公司 Visual perception method and device for automatic driving, computer equipment and storage medium
CN111144329B (en) * 2019-12-29 2023-07-25 北京工业大学 Multi-label-based lightweight rapid crowd counting method

Also Published As

Publication number Publication date
WO2021237727A1 (en) 2021-12-02

Similar Documents

Publication Publication Date Title
Wu et al. Edge computing driven low-light image dynamic enhancement for object detection
CN112990211B (en) Training method, image processing method and device for neural network
CN111797983A (en) Neural network construction method and device
CN111507378A (en) Method and apparatus for training image processing model
CN112446398A (en) Image classification method and device
CN112132156A (en) Multi-depth feature fusion image saliency target detection method and system
CN110222718B (en) Image processing method and device
CN114418030B (en) Image classification method, training method and device for image classification model
Chen et al. Corse-to-fine road extraction based on local Dirichlet mixture models and multiscale-high-order deep learning
CN111401517A (en) Method and device for searching perception network structure
CN111368972A (en) Convolution layer quantization method and device thereof
Jain et al. AI-enabled object detection in UAVs: challenges, design choices, and research directions
CN111340758A (en) Novel efficient iris image quality evaluation method based on deep neural network
CN116681960A (en) Intelligent mesoscale vortex identification method and system based on K8s
Tran et al. Enhancement of robustness in object detection module for advanced driver assistance systems
CN113793341B (en) Automatic driving scene semantic segmentation method, electronic equipment and readable medium
US20230070439A1 (en) Managing occlusion in siamese tracking using structured dropouts
CN116432736A (en) Neural network model optimization method and device and computing equipment
CN115668277A (en) Method and apparatus for image processing
Islam et al. Self-supervised learning with local contrastive loss for detection and semantic segmentation
Obeso et al. Introduction of explicit visual saliency in training of deep cnns: Application to architectural styles classification
CN116309050A (en) Image super-resolution method, program product, storage medium and electronic device
CN116051561A (en) Lightweight pavement disease inspection method based on vehicle-mounted edge equipment
WO2022179599A1 (en) Perceptual network and data processing method
Sun et al. Semantic-aware 3D-voxel CenterNet for point cloud object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination