CN115668277A - Method and apparatus for image processing - Google Patents

Method and apparatus for image processing

Info

Publication number
CN115668277A
Authority
CN
China
Prior art keywords
neural network
parameters
combination
image processing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080101212.2A
Other languages
Chinese (zh)
Inventor
于晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Original Assignee
Siemens AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG
Publication of CN115668277A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure presents a method, apparatus, system, and computer-readable medium for image processing. The method comprises the following steps: acquiring (S101) an image (30); extracting (S102) at least one feature of the image (30) with a first set of parameters via a first portion of a neural network (40); and performing (S103) N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network (40), N being an integer, N ≥ 2, i = 1, …, N.

Description

Method and apparatus for image processing
Technical Field
The present invention relates to the art of computer vision, and more particularly, to a method, apparatus, and computer-readable storage medium for image processing.
Background
With the development of artificial intelligence technology, population density analysis and pedestrian detection technologies are widely used in security, intelligent buildings, and other fields. Population counting and pedestrian detection in complex scenes serve as a kind of infrastructure in the field of computer vision, providing a perceptual foundation for higher-level semantics and more complex tasks, such as pedestrian identification, pedestrian flow estimation, and video structuring analysis.
Disclosure of Invention
In common solutions, population counting and pedestrian detection are typically addressed by different neural network models, and pedestrian detection networks typically do not separately predict face portions. However, typical application scenarios require both population counting and pedestrian detection. The present invention is based on deep learning techniques, particularly convolutional neural networks, and integrates population counting and pedestrian detection into one model by designing a dual-engine, multi-task, lightweight framework. The provided solution can also be used for other categories of computer vision tasks. Compared to current solutions based on separate models, the solution provided in the present disclosure may save computational resources, reduce memory consumption, and improve computational efficiency, and it may also be deployed on low-cost edge devices.
Embodiments of the present disclosure include methods, apparatus, and computer program products for image processing.
According to a first aspect of the present disclosure, a method for image processing is presented. The method comprises the following steps:
-acquiring an image;
-extracting at least one feature of the image via a first part of the neural network using a first set of parameters;
-performing N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th part of the neural network (40), N being an integer, N ≥ 2, i = 1, …, N.
According to a second aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes:
-an image acquisition module configured to acquire an image;
-a feature extraction module configured to extract at least one feature of the image with a first set of parameters via a first part of the neural network;
-an image processing module configured to perform N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th part of the neural network, N being an integer, N ≥ 2, i = 1, …, N.
According to a third aspect of the present disclosure, an apparatus for image processing is presented. The apparatus includes: at least one processor; at least one memory coupled to the at least one processor configured to perform the method according to the first aspect.
According to a fourth aspect of the disclosure, a computer-readable medium for image processing is presented. The computer-readable medium stores computer-executable instructions, wherein the computer-executable instructions, when executed, cause at least one processor to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, a neural network is presented that may be used in any of the above aspects of the present disclosure. The neural network includes:
-a first part configured to extract at least one feature of an image using a first set of parameters;
-a second part (402) to an (N + 1)-th part (40 (N + 1)) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein the (i + 1)-th part (40 (i + 1)) of the neural network (40) is configured to perform the i-th image processing task using the (i + 1)-th set of parameters, N being an integer, N ≥ 2, i = 1, …, N.
With the solution provided in the present disclosure, multiple tasks can be performed via a single neural network, which can save computational resources, and this process can also conform to the business logic. This computational-resource-saving solution may also make the solution suitable for deployment on edge devices.
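Purely as an illustration, and not as a description of the claimed network itself, a minimal sketch of such a shared-first-part, multi-head structure is shown below in PyTorch; the framework, layer sizes, and module names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Sketch: one shared first part (backbone) feeding N task-specific parts."""
    def __init__(self, num_tasks: int = 2, channels: int = 64):
        super().__init__()
        # First part: shared feature extractor (first set of parameters).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Parts 2..N+1: one head per image processing task (sets 2..N+1).
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, 1, 1) for _ in range(num_tasks)]
        )

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)                  # at least one shared feature
        return [head(features) for head in self.heads]   # N task outputs

# Usage sketch: two tasks, e.g. a dense map and a detection-style map.
net = MultiTaskNet(num_tasks=2)
outputs = net(torch.randn(1, 3, 224, 224))
```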
Optionally, for a combination of the first and second parts of the neural network, a first set of parameters is obtained by back propagation; for a combination of the first part and the (i + 1)-th part of the neural network, i > 1, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network; and for a combination of the first part of the neural network and the series of the second part through the (N + 1)-th part, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1)-th part of the neural network.
Optionally, for a combination of the first and second parts of the neural network, a second set of parameters is obtained by back propagation; for a combination of the first part and the (i + 1)-th part of the neural network, i > 1, the (i + 1)-th set of parameters is obtained by back propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network; and for a combination of the first part of the neural network and the series of the second part through the (N + 1)-th part, the (i + 1)-th set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1)-th part of the neural network and on the (i + 1)-th set of parameters updated by training the combination of the first part and the (i + 1)-th part of the neural network.
In contrast to current back propagation processes that involve only the combination of all parts, the solution presented herein can obtain very close parameters during each combination of a first part and another part, which can converge quickly, save computational power, and speed up the training process.
Optionally, the neural network may comprise: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module. This simplified structure and the combination of the above modules can reduce redundant parameters and ensure a neural network with advanced performance on a lightweight basis.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the output of the corresponding portion of the neural network may include dense mappings of different components of the target object; the number of target objects may then be counted based on a weighted sum of the dense mappings of the different components of the target object. This optional solution may make the counting result more accurate in cases where objects overlap each other. The weights used may be determined based on engineering practice and/or by testing.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the corresponding portion of the neural network may include: a scale-independent feature extraction module and a stepwise deconvolution module that receives the output of the scale-independent feature extraction module. The stepwise deconvolution module utilizes the features of different scales extracted by the scale-independent feature extraction module, followed by a gradual recovery, to achieve an accurate dense mapping. For example, where there are four clusters of target objects in the image, without scale-independent feature extraction and stepwise deconvolution the output dense map may include only four blurry clusters, the details of which cannot be seen.
Drawings
The above-mentioned and other features and advantages of the present technology, and the manner of attaining them, will become more apparent and the present technology itself will be better understood by reference to the following description of embodiments of the present technology taken in conjunction with the accompanying drawings, wherein:
fig. 1 depicts a block diagram of an apparatus for image processing according to one embodiment of the present disclosure.
Fig. 2 depicts the structure of a CNN according to one embodiment of the present disclosure.
Fig. 3 depicts a training process for CNN according to one embodiment of the present disclosure.
Fig. 4 depicts a flow diagram of a method for image processing according to one embodiment of the present disclosure.
FIG. 5 depicts an image processing system according to one embodiment of the present disclosure.
FIG. 6 depicts the training process of the system shown in FIG. 5.
Fig. 7 depicts the image processing process of the system shown in fig. 5.
Reference numerals
10, apparatus for image processing
101, at least one memory
102, at least one processor
103, communication module
20, image processing program
201, image acquisition module
202, a feature extraction module
203, image processing module
30, images acquired and to be processed
40, neural network
401, first part of neural network 40
402, second part of the neural network 40
40i, part i of the neural network 40
51, first set of parameters of the first part 401 of the neural network 40
52, second set of parameters for the second portion 402 of the neural network 40
5i, ith set of parameters for the ith part 40i of the neural network 40
60, training server
70, video camera
100 method for image processing
S101-S103, steps of method 100
80, lightweight population counting and pedestrian detection system
801, on-line training module
802, offline module
81, server
82, video camera
83 edge device
811, image sample
812, neural network
813, loss function
831, feature extraction
832, blur engine
833, refinement engine
834, population count
835, population flow analysis
836, re-recognition
200, method for image processing
S201-S217, steps of method 200
Detailed Description
The above-described and other features of the present technology are described in detail below. Various embodiments are described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to illustrate rather than to limit the invention. It may be evident that such embodiment(s) may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles "a" and "the" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Image processing solutions are proposed in the present disclosure that can be used to perform multiple tasks, such as the mentioned population counting and pedestrian detection, via a single neural network. Now, the present disclosure will be described in detail below by referring to fig. 1 to 7.
Fig. 1 depicts a block diagram of an apparatus according to one embodiment of the present disclosure. The apparatus for image processing 10 presented in this disclosure may be implemented as a network of computer processors to perform the subsequent method for image processing 100 presented in this disclosure. Apparatus 10 may also be a single computer (as shown in fig. 1) comprising at least one memory 101, which includes a computer-readable medium such as a Random Access Memory (RAM). The apparatus 10 also includes at least one processor 102 coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101 and, when executed by the at least one processor 102, may cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a state machine, and so forth. Embodiments of a computer-readable medium include, but are not limited to, a floppy disk, a CD-ROM, a magnetic disk, a memory chip, a ROM, a RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may comprise code from any computer programming language, including, for example, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in fig. 1 may contain an image processing program 20 that, when executed by the at least one processor 102, causes the at least one processor 102 to perform the method 100 for image processing presented in the present disclosure. The image processing program 20 may include:
an image acquisition module 201 configured to acquire an image 30;
a feature extraction module 202 configured to extract at least one feature of the image 30 with a first set of parameters via a first portion of the neural network 40;
an image processing module 203 configured to perform N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network 40, N being an integer, N ≥ 2, i = 1, …, N.
The image to be processed 30 may be acquired by the camera 70 and sent to the device 10 via the communication module 103 shown in fig. 1. The image 30 may also be stored in at least one memory 101.
The online training process of the neural network 40 may be performed by a server 60 (e.g., a high-performance GPU server) with a large amount of data. After training, the file of the neural network 40 (including the parameters for each portion of the neural network 40) may be transmitted to the device 10 via the communication module 103 and may also be stored in the at least one memory 101, after which the neural network 40 may be deployed on the device 10. The neural network 40 may be a CNN.
However, the online training process may also be performed on the apparatus 10, depending on the device configuration and processing capabilities. In this case, the online training program may be part of the image processing program 20 and may be pre-stored in the at least one memory 101.
Multiple tasks may be performed via the same CNN, which may save computing resources, and this process may also conform to the business logic. This computing-resource-saving solution may also make the apparatus 10 suitable for deployment on edge devices.
As shown in fig. 2, the neural network 40 may include:
a first portion 401 configured to extract at least one feature of the image 30 using a first set of parameters 51;
a second part 402 to an (N + 1)-th part 40 (N + 1) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein the (i + 1)-th part 40 (i + 1) of the neural network 40 is configured to perform the i-th image processing task using the (i + 1)-th set of parameters 5 (i + 1), N being an integer, N ≥ 2, i = 1, …, N.
It is worth mentioning that the first portion 401 may extract shallow features, and optionally, some of the second portion 402 to the (N + 1) th portion 40 (N + 1) may further extract deep features.
Referring now to fig. 3, back propagation may be performed for training each portion of the neural network 40. First, the different parts corresponding to the different image processing tasks are trained independently, and finally an overall fine-tuning training process is performed. With the combination of the first portion 401 and each of the other portions 40 (i + 1), the first set of parameters 51 may be updated. With the combination of all the parts, the parameters of each part can also be updated. In contrast to current back propagation processes that involve only the combination of all parts, the solution presented herein can obtain very close parameters during each combination of the first part and another part, which can converge quickly, save computational power, and speed up the training process.
Specifically, with respect to the first set of parameters: for the combination of the first portion 401 and the second portion 402 of the neural network 40, the first set of parameters is obtained by back propagation using a large number of image samples; for the combination of the first portion 401 and the (i + 1)-th portion 40 (i + 1) of the neural network 40, i > 1, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the i-th portion 40i of the neural network 40; and for the combination of the first portion 401 and the series of the second portion 402 through the (N + 1)-th portion 40 (N + 1) of the neural network 40, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the (N + 1)-th portion 40 (N + 1) of the neural network 40.
And, with respect to the second to (N + 1)-th sets of parameters: for the combination of the first portion 401 and the second portion 402 of the neural network 40, the second set of parameters is obtained by back propagation; for the combination of the first portion 401 and the (i + 1)-th portion 40 (i + 1) of the neural network 40, i > 1, the (i + 1)-th set of parameters is obtained by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the i-th portion 40i of the neural network 40; and for the combination of the first portion 401 and the series of the second portion 402 through the (N + 1)-th portion 40 (N + 1) of the neural network 40, the (i + 1)-th set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first portion 401 and the (N + 1)-th portion 40 (N + 1) of the neural network 40, and on the (i + 1)-th set of parameters updated by training the combination of the first portion 401 and the (i + 1)-th portion 40 (i + 1) of the neural network 40.
Optionally, the neural network 40 may include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module. This simplified structure and the combination of the above modules can reduce redundant parameters and ensure that the neural network 40 has advanced performance on a lightweight basis.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the output of the corresponding portion of the neural network 40 may include dense mappings of different components of the target object; the image processing module 203 is further configured to count the number of target objects based on a weighted sum of the dense mappings of the different components of the target objects. This optional solution may make the counting result more accurate in cases where the objects overlap each other. The weights used may be determined based on engineering practice and/or by testing.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the corresponding portion of the neural network 40 may include: a scale-independent feature extraction module and a stepwise deconvolution module that receives the output of the scale-independent feature extraction module. The scale-independent feature extraction module may be configured to extract features at a plurality of different scales, and the stepwise deconvolution module may include a plurality of pairs of convolution and deconvolution modules, each pair corresponding to an upsampling process. The stepwise deconvolution module utilizes the features of different scales extracted by the scale-independent feature extraction module, followed by a gradual recovery, to achieve an accurate dense mapping. For example, where there are four clusters of target objects in the image, without scale-independent feature extraction and stepwise deconvolution the output dense map may include only four blurry clusters, the details of which cannot be seen.
Although the image acquisition module 201, the feature extraction module 202, and the image processing module 203 are described above as software modules of the image processing program 20, they may also be implemented via hardware, such as an ASIC chip. They may be integrated into one chip or implemented separately and electrically connected.
It should be mentioned that the present disclosure may encompass devices having architectures that are different from that shown in fig. 1. The above architecture is merely exemplary and is used to explain the exemplary method 100 shown in fig. 4.
Various methods according to the present disclosure may be performed. An exemplary method 100 according to the present disclosure includes the steps of:
s101: acquiring an image 30;
s102: extracting at least one feature of the image 30 via a first portion of the neural network 40 using a first set of parameters;
s103: n image processing tasks are respectively executed based on at least one feature, wherein for the ith image processing task, the (i + 1) th group of parameters is utilized through the (i + 1) th part of the neural network 40, N is an integer and N is larger than or equal to 2, i =1 \8230, N.
Optionally, for a combination of the first and second portions of the neural network 40, a first set of parameters is obtained by back propagation; for a combination of the first and (i + 1)-th portions of the neural network 40, i > 1, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first and i-th portions of the neural network 40; and for a combination of the first portion of the neural network 40 and the series of the second to (N + 1)-th portions, the first set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first and (N + 1)-th portions of the neural network 40.
Optionally, for a combination of the first and second portions of the neural network 40, a second set of parameters is obtained by back propagation; for a combination of the first and (i + 1)-th portions of the neural network 40, i > 1, the (i + 1)-th set of parameters is obtained by back propagation and based on the first set of parameters updated by training the combination of the first and i-th portions of the neural network 40; and for a combination of the first portion of the neural network 40 and the series of the second to (N + 1)-th portions, the (i + 1)-th set of parameters is updated by back propagation and based on the first set of parameters updated by training the combination of the first and (N + 1)-th portions of the neural network 40 and on the (i + 1)-th set of parameters updated by training the combination of the first and (i + 1)-th portions of the neural network 40.
Optionally, the neural network 40 may include: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the output of the corresponding portion of the neural network 40 may include dense mappings of different components of the target object; the method 100 may further comprise: counting the number of target objects based on a weighted sum of the dense mappings of the different components of the target objects.
Optionally, one of the N image processing tasks is outputting a dense mapping of the target object, and the corresponding portion of the neural network 40 may include: a scale-independent feature extraction module and a stepwise deconvolution module that receives the output of the scale-independent feature extraction module.
The following is a use case in which the solution provided in the present disclosure may be employed. Referring to fig. 5, in this use case, a lightweight population counting and pedestrian detection system 80 for an edge device is provided. The system 80 may include:
an online training module 801 configured to train a neural network 812 for image processing;
an offline module 802 configured to perform image processing via the neural network 812.
With respect to the online training module 801, the neural network model 812 may be trained with a large number of image samples 811 via the server 81 (e.g., a high-performance GPU server). After training, the model of the neural network 812 can be deployed on the edge device 83 of the offline module 802, which includes the feature extraction 831, the blur engine 832, and the refinement engine 833.
With respect to the offline module 802, the camera 82 may be connected to the edge device 83, and the neural network 812 may run on the edge device 83. First, features may be extracted from the image to be processed by the feature extraction 831; then the blur engine 832 will output a dense map, and the refinement engine 833 will perform pedestrian detection. The output dense map may be further processed by the population count 834 for counting the number of pedestrians and by the population flow analysis 835 for analyzing the population flow. The output bounding boxes of pedestrians and, optionally, faces may be further processed by the re-recognition 836, for example to find a particular person in the image, or to find persons wearing masks.
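The following sketch illustrates, in Python, how the offline pipeline of fig. 5 could be wired together; all callables are hypothetical stand-ins for the modules named above, not an actual implementation.

```python
# Sketch of the offline (edge-side) pipeline of fig. 5, with hypothetical
# callables standing in for feature extraction 831, the blur engine 832,
# the refinement engine 833 and the downstream consumers 834-836.
def process_frame(image, backbone, blur_engine, refine_engine,
                  count_fn, flow_fn, reid_fn):
    features = backbone(image)              # feature extraction 831
    density_map = blur_engine(features)     # 832: dense map output
    boxes = refine_engine(features)         # 833: pedestrian / face boxes
    count = count_fn(density_map)           # 834: population count
    flow = flow_fn(density_map)             # 835: population flow analysis
    matches = reid_fn(image, boxes)         # 836: re-recognition
    return count, flow, matches
```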
Referring now to fig. 6, during the training phase, first, at step 1, the image samples are input into the neural network 812, and the loss function may be set to MSE and SSIM; parameter set 1 of the feature extraction 831 and parameter set 2 of the blur engine 832 may be obtained by back propagation.
Subsequently, at step 2, the image samples are input into the neural network 812, and the loss function can be set to the focal loss and the L1 loss; by back propagation, parameter set 1 will be updated to parameter set 1', and parameter set 3 of the refinement engine 833 will be obtained.
Finally, at step 3, the image samples are input into the neural network 812, and the loss function may be set to a weighted sum of the MSE, SSIM, focal, and L1 losses. By back propagation, parameter set 1' will be updated to parameter set 1", parameter set 2 will be updated to parameter set 2', and parameter set 3 will be updated to parameter set 3'.
Compared with the current back propagation practice, step 1 and step 2 are added, and the parameters acquired during these first two steps are used during step 3. In fact, after step 1 and step 2 the parameter sets are almost ready, so step 3 only slightly adjusts them, which allows the training process to converge quickly, saves computing power, and shortens the whole training process.
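For illustration only, the three-step staging could be organized as in the following Python/PyTorch sketch; the framework choice and the helper names are assumptions, and the loss implementations (MSE, SSIM, focal, L1) are assumed to be available elsewhere — only the staging of the combinations is shown.

```python
import torch

def train_combination(parameters, forward_fn, loss_fn, loader, lr=1e-3):
    """One training stage: back propagation through one combination of parts.
    Only the parameter sets passed in are updated."""
    opt = torch.optim.Adam(parameters, lr=lr)
    for images, targets in loader:
        loss = loss_fn(forward_fn(images), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Step 1: backbone + blur engine, loss = MSE + SSIM        -> sets 1 and 2
# Step 2: backbone + refinement engine, loss = focal + L1  -> set 1', set 3
# Step 3: all parts, weighted sum of the four losses       -> sets 1'', 2', 3'
```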
During the inference phase, parameter set 1", parameter set 2' and parameter set 3' will be used for image processing.
Fig. 7 shows the workflow of the inference phase of the use case.
At step S201, a basic convolution is performed. A strided convolution kernel may be used for the convolution operation, the ReLU function may be used for activation, batch normalization (BatchNorm2d) may be used for normalization, and a max pooling layer may then be applied. A feature map with one-fourth the size of the original image can be obtained at this step, and the amount of computation can be reduced to a large extent.
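As a non-limiting sketch of this stem (assuming PyTorch and illustrative channel counts), step S201 could look as follows:

```python
import torch.nn as nn

# Sketch of the basic convolution stem of step S201: a strided convolution,
# BatchNorm2d, ReLU, then max pooling, giving a feature map of 1/4 the
# original spatial size (channel counts are assumptions).
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # 1/2 size
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 1/4 size
)
```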
At step S202, a residual block may be used and features may be further extracted. The following modified h-swish function may be used as the activation function:

[Equation image in the original: the modified h-swish activation function]
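For reference, the standard h-swish on which such a modification is typically based is x·ReLU6(x + 3)/6; the sketch below (assuming PyTorch) shows this baseline form only, since the modified variant is given as an equation image in the original document.

```python
import torch
import torch.nn.functional as F

def h_swish(x: torch.Tensor) -> torch.Tensor:
    """Standard h-swish: x * ReLU6(x + 3) / 6 (the patent's modified variant
    is not reproduced here; this is the common baseline form)."""
    return x * F.relu6(x + 3.0) / 6.0
```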
At step S203, a residual block may likewise be used, and features may be further extracted. In addition, the modified h-swish function may be used as the activation function.
Different classes of features may be extracted through steps S202 and S203, respectively. Optionally, other residual blocks may also be used to extract other kinds of features herein.
At step S204, a squeeze-and-excitation bottleneck may be used, and the modified h-swish function may also be used as the activation function. The feature attention mechanism may be used to make the neural network 812 learn more effective abstract features; the dimensionality and the amount of computation can be reduced, and important features can be found.
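A minimal sketch of such a squeeze-and-excitation (feature attention) block, assuming PyTorch and an illustrative reduction ratio, is:

```python
import torch
import torch.nn as nn

class SEBottleneck(nn.Module):
    """Sketch of a squeeze-and-excitation block: global pooling squeezes the
    spatial dimensions, two small layers produce per-channel weights that
    re-scale the input features (reduction ratio is an assumption)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # a hard sigmoid could be used instead
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # important channels are emphasized
```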
At step S205, a further squeeze-and-excitation bottleneck may be used. Here, too, the modified h-swish function may be used as the activation function.
The features extracted at step S205 will be output to the blur engine 832 to obtain the dense mapping. The features extracted at steps S202, S203, S204 and S205 will be output to the refinement engine 833 for pedestrian detection.
Turning now to the blur engine 832: the features output by step S205 will be used at steps S206, S207 and S208, respectively, where features of different scales are extracted. Subsequently, by repeating steps S210 and S211, a stepwise deconvolution operation may be performed to restore an accurate dense mapping, containing not only contours but also detailed information.
At step S206, which is part of the scale-independent feature extraction module 8321, a convolution operation may be performed. Optionally, no activation function is used here, and BatchNorm2d may be used for normalization.
Step S207 is also part of the scale-independent feature extraction module 8321 and may include two kernel convolution modules. Both convolution modules contain BatchNorm2d layers. The first module may use an h-swish activation function, and the second module may use no activation function. The number of feature channels in the output may be one quarter of that of the input.
Step S208 is likewise part of the scale-independent feature extraction module 8321 and may include two kernel convolution modules. Both convolution modules contain BatchNorm2d layers. The first module may use an h-swish activation function, and the second module may use no activation function. The number of feature channels in the output may be one quarter of that of the input.
At step S209, a feature concatenation (stitching) technique may be used so that the number of feature maps does not increase.
At step S210, a convolution kernel and a ReLU activation function may be used, with instance normalization (InstanceNorm2d) as the normalization layer.
At step S211, a deconvolution kernel may be used, again with InstanceNorm2d as the normalization layer.
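As an illustration of steps S206-S211 taken together (the framework, kernel sizes, and channel counts below are assumptions, not the patented design), the blur engine could be sketched as follows:

```python
import torch
import torch.nn as nn

class BlurEngineSketch(nn.Module):
    """Sketch of the blur engine: three parallel branches extract features at
    different scales (S206-S208), their outputs are concatenated (S209), and a
    convolution + deconvolution pair gradually restores resolution (S210-S211)."""
    def __init__(self, in_ch: int = 128):
        super().__init__()
        quarter = in_ch // 4
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, quarter, 1), nn.BatchNorm2d(quarter))
        self.branch2 = nn.Sequential(nn.Conv2d(in_ch, quarter, 3, padding=1), nn.BatchNorm2d(quarter))
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, quarter, 5, padding=2), nn.BatchNorm2d(quarter))
        up_ch = 3 * quarter
        self.up = nn.Sequential(
            nn.Conv2d(up_ch, up_ch, 3, padding=1), nn.InstanceNorm2d(up_ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(up_ch, up_ch // 2, 4, stride=2, padding=1), nn.InstanceNorm2d(up_ch // 2),
        )
        self.to_density = nn.Conv2d(up_ch // 2, 1, 1)  # single-channel dense map

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([self.branch1(feats), self.branch2(feats), self.branch3(feats)], dim=1)
        return self.to_density(self.up(multi))
```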
At step S212, the body and the head may be predicted on the basis of density. The final number of people can be obtained by combining the two results. The formula can be as follows:
C = λ1·D_body + λ2·D_head
where C is the result of the population count, D_body is the sum of the body dense mapping, D_head is the sum of the head dense mapping, and λ1, λ2 can be set according to the scene. This weighted sum can effectively eliminate the effect of people overlapping, in which case the final result can be more accurate.
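A minimal sketch of this weighted-sum count (the weight values are scene-dependent assumptions) is:

```python
# Sketch of the weighted-sum count of step S212; density_body and density_head
# are the dense maps of the body and head, lambda1/lambda2 are illustrative.
def population_count(density_body, density_head, lambda1=0.5, lambda2=0.5):
    return lambda1 * density_body.sum() + lambda2 * density_head.sum()
```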
Now to the refinement engine 833.
At step S213, bilinear interpolation may be used to obtain a feature map enlarged by a factor of two without additional computation.
At step S214, pixel-wise addition may be used for feature fusion without adding computational parameters.
At step S215, on the feature map with one-fourth the size of the original image, the target center points of pedestrians and faces may be predicted, in the form of heat maps.
At step S216, on the feature map having a size of one-fourth of the original image, the height and width of the target may be predicted.
At step S217, on the feature map having a size of one-fourth of the original image, the deviation of the target center point may be predicted.
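A non-limiting sketch of the three prediction heads of steps S215-S217, assuming PyTorch and illustrative channel counts, is:

```python
import torch.nn as nn

class RefinementHeads(nn.Module):
    """Sketch of the refinement engine outputs (S215-S217): on the 1/4-size
    feature map, predict center-point heatmaps for pedestrian and face, the
    target height/width, and the center-point offset (channel counts assumed)."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.heatmap = nn.Conv2d(in_ch, 2, 1)  # two classes: pedestrian, face
        self.size = nn.Conv2d(in_ch, 2, 1)     # height and width
        self.offset = nn.Conv2d(in_ch, 2, 1)   # sub-pixel center offset

    def forward(self, x):
        return self.heatmap(x).sigmoid(), self.size(x), self.offset(x)
```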
The loss function can be set as follows:
L = L_mse + λ1·L_ssim + L_focal_loss + λ2·L_wh_L1loss + L_offset_L1loss
the series loss function includes a MSE loss function, an SSIM loss function, a focus loss function, and an L1 loss function.
The system 80 and method 100 may be used for dense population flow monitoring, pedestrian detection and tracking, and the like. The presented solution may be deployed on edge devices and may provide artificial intelligence capabilities for traditional image acquisition devices.
Also provided in the present disclosure is a computer-readable medium storing computer-executable instructions that, when executed by a computer, cause the computer to perform any of the methods presented in the present disclosure.
The computer program, when being executed by at least one processor, performs any of the methods presented in this disclosure.
Although the present technology has been described in detail with reference to certain embodiments, it should be understood that the present technology is not limited to those precise embodiments. Indeed, in view of the present disclosure which describes exemplary modes for practicing the invention, many modifications and variations can be made by those skilled in the art without departing from the scope and spirit of the invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications and variations that fall within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

1. A method (100) for image processing, comprising:
-acquiring (S101) an image (30);
-extracting (S102) at least one feature of the image (30) with a first set of parameters via a first part of a neural network (40);
-performing (S103) N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task, the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network (40), N being an integer and N ≥ 2, i = 1, …, N.
2. The method (100) of claim 1, wherein
-for a combination of the first and second parts of the neural network (40), obtaining the first set of parameters by back propagation,
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, updating the first set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part of the neural network (40) and a series of the second to (N + 1) th parts, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1) th part of the neural network (40).
3. The method (100) of claim 1, wherein
-obtaining a second set of parameters by back propagation for a combination of the first and second parts of the neural network (40),
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, obtaining the (i + 1) th set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part and a series of the second part to the (N + 1) th part of the neural network (40), updating the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training a combination of the first part and the (N + 1) th part of the neural network (40) and based on the (i + 1) th set of parameters updated by training a combination of the first part and the (i + 1) th part of the neural network (40).
4. The method (100) of claim 1, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
5. The method (100) of claim 1, wherein
-one of the N image processing tasks is outputting a dense mapping of a target object, and the output of the corresponding part of the neural network (40) comprises: dense mappings of different components of the target object;
-the method (100) further comprises: counting a number of the target objects based on a weighted sum of the dense mappings of different components of the target objects.
6. The method (100) of claim 1, wherein one of the N image processing tasks is outputting a dense mapping of a target object, and the corresponding portion of the neural network (40) comprises:
-a scale-independent feature extraction module, and
-a stepwise deconvolution module receiving the output of the scale-independent feature extraction module.
7. An apparatus (10) for image processing, comprising:
-an image acquisition module (201) configured to acquire an image (30);
-a feature extraction module (202) configured to extract at least one feature of the image (30) with a first set of parameters via a first part of a neural network (40);
-an image processing module (203) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein for the i-th image processing task, the (i + 1)-th set of parameters is utilized via the (i + 1)-th portion of the neural network (40), N being an integer and N ≥ 2, i = 1, …, N.
8. The apparatus (10) of claim 7, wherein
-for a combination of the first and second parts of the neural network (40), obtaining the first set of parameters by back propagation,
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part of the neural network (40) and a series of the second to (N + 1) th parts, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part and the (N + 1) th part of the neural network (40).
9. The apparatus (10) of claim 7, wherein
-obtaining a second set of parameters by back propagation for a combination of the first part and the second part of the neural network (40),
-for a combination of the first part and the (i + 1) th part of the neural network (40), i >1, obtaining the (i + 1) th set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first part and the i-th part of the neural network (40), and
-for a combination of the first part and a series of the second part to the (N + 1) th part of the neural network (40), updating the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training a combination of the first part and the (N + 1) th part of the neural network (40) and based on the (i + 1) th set of parameters updated by training a combination of the first part and the (i + 1) th part of the neural network (40).
10. The apparatus (10) of claim 7, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
11. The apparatus (10) of claim 7, wherein
-one of the N image processing tasks is outputting a dense mapping of a target object, and the output of the corresponding part of the neural network (40) comprises: dense mappings of different components of the target object;
-the image processing module (203) is further configured to count the number of target objects based on a weighted sum of the dense mappings of different components of the target objects.
12. The apparatus (10) of claim 7, wherein one of the N image processing tasks is outputting a dense mapping of a target object, and the corresponding portion of the neural network (40) includes:
-a scale-independent feature extraction module, and
-a stepwise deconvolution module receiving the output of the scale-independent feature extraction module.
13. An apparatus (10) for image processing, comprising:
-at least one processor (102);
-at least one memory (101), coupled to the at least one processor (102), configured to perform the method according to any one of claims 1 to 6.
14. A computer-readable medium storing computer-executable instructions for image processing, wherein the computer-executable instructions, when executed, cause at least one processor to perform the method of any of claims 1-6.
15. A neural network (40), comprising:
-a first part (401) configured to extract at least one feature of the image (30) using a first set of parameters;
-a second part (402) to an (N + 1)-th part (40 (N + 1)) configured to perform N image processing tasks, respectively, based on the at least one feature, wherein the (i + 1)-th part (40 (i + 1)) of the neural network (40) is configured to perform the i-th image processing task with the (i + 1)-th set of parameters, N being an integer and N ≥ 2, i = 1, …, N.
16. The neural network (40) of claim 15, wherein
-for a combination of the first part (401) and the second part (402) of the neural network (40), obtaining the first set of parameters by back propagation,
-for a combination of the first part (401) and the (i + 1) th part (40 (i + 1)) of the neural network (40), i >1, updating the first set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part (401) and the i-th part (40 i) of the neural network (40), and
-for a combination of the first portion (401) and a series of the second (402) to (N + 1) th portions (40 (N + 1)) of the neural network (40), updating the first set of parameters by back-propagation and based on the first set of parameters updated by training the combination of the first portion (401) and the (N + 1) th portion (40 (N + 1)) of the neural network (40).
17. The neural network (40) of claim 15, wherein
-obtaining a second set of parameters by back propagation for a combination of the first part (401) and the second part (402) of the neural network (40),
-for a combination of the first part (401) and the (i + 1) th part (40 (i + 1)) of the neural network (40), i >1, obtaining the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first part (401) and the i-th part (40 i) of the neural network (40), and
-for a combination of the first portion (401) and a series of the second portion (402) to the (N + 1) th portion (40 (N + 1)) of the neural network (40), updating the (i + 1) th set of parameters by back propagation and based on the first set of parameters updated by training the combination of the first portion (401) and the (N + 1) th portion (40 (N + 1)) of the neural network (40) and on the (i + 1) th set of parameters updated by training the combination of the first portion (401) and the (i + 1) th portion (40 (i + 1)) of the neural network (40).
18. The neural network (40) of claim 15, wherein the neural network (40) comprises: a basic convolution layer, a basic residual module, a feature attention mechanism module, a feature fusion module, a scale-independent feature extraction module, and a stepwise deconvolution module.
19. The neural network (40) of claim 15, wherein
-one of the N image processing tasks is outputting a dense mapping of a target object, and the output of the corresponding part of the neural network (40) comprises: dense mappings of different components of the target object;
-the corresponding part of the neural network (40) is further configured to: counting a number of the target objects based on a weighted sum of the dense mappings of different components of the target objects.
20. The neural network (40) of claim 15, wherein one of the N image processing tasks is outputting a dense mapping of a target object, and the corresponding portion of the neural network (40) includes:
-a scale-independent feature extraction module, and
-a stepwise deconvolution module receiving the output of the scale-independent feature extraction module.
CN202080101212.2A 2020-05-29 2020-05-29 Method and apparatus for image processing Pending CN115668277A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/093497 WO2021237727A1 (en) 2020-05-29 2020-05-29 Method and apparatus of image processing

Publications (1)

Publication Number Publication Date
CN115668277A true CN115668277A (en) 2023-01-31

Family

ID=78745343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080101212.2A Pending CN115668277A (en) 2020-05-29 2020-05-29 Method and apparatus for image processing

Country Status (2)

Country Link
CN (1) CN115668277A (en)
WO (1) WO2021237727A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024007423A1 (en) * 2022-07-06 2024-01-11 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Reference picture resampling (rpr) based super-resolution guided by partition information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529402B (en) * 2016-09-27 2019-05-28 中国科学院自动化研究所 The face character analysis method of convolutional neural networks based on multi-task learning
WO2018187632A1 (en) * 2017-04-05 2018-10-11 Carnegie Mellon University Deep learning methods for estimating density and/or flow of objects, and related methods and software
CN109523532B (en) * 2018-11-13 2022-05-03 腾讯医疗健康(深圳)有限公司 Image processing method, image processing device, computer readable medium and electronic equipment
CN109858372B (en) * 2018-12-29 2021-04-27 浙江零跑科技有限公司 Lane-level precision automatic driving structured data analysis method
CN111178253B (en) * 2019-12-27 2024-02-27 佑驾创新(北京)技术有限公司 Visual perception method and device for automatic driving, computer equipment and storage medium
CN111144329B (en) * 2019-12-29 2023-07-25 北京工业大学 Multi-label-based lightweight rapid crowd counting method

Also Published As

Publication number Publication date
WO2021237727A1 (en) 2021-12-02

Similar Documents

Publication Publication Date Title
Wu et al. Edge computing driven low-light image dynamic enhancement for object detection
CN112990211B (en) Training method, image processing method and device for neural network
CN111797983A (en) Neural network construction method and device
CN111507378A (en) Method and apparatus for training image processing model
CN112446398A (en) Image classification method and device
CN112132156A (en) Multi-depth feature fusion image saliency target detection method and system
CN110222718B (en) Image processing method and device
CN114418030B (en) Image classification method, training method and device for image classification model
Chen et al. Corse-to-fine road extraction based on local Dirichlet mixture models and multiscale-high-order deep learning
CN111401517A (en) Method and device for searching perception network structure
CN111368972A (en) Convolution layer quantization method and device thereof
Jain et al. AI-enabled object detection in UAVs: challenges, design choices, and research directions
CN111340758A (en) Novel efficient iris image quality evaluation method based on deep neural network
CN116681960A (en) Intelligent mesoscale vortex identification method and system based on K8s
Tran et al. Enhancement of robustness in object detection module for advanced driver assistance systems
CN113793341B (en) Automatic driving scene semantic segmentation method, electronic equipment and readable medium
US20230070439A1 (en) Managing occlusion in siamese tracking using structured dropouts
CN116432736A (en) Neural network model optimization method and device and computing equipment
CN115668277A (en) Method and apparatus for image processing
Islam et al. Self-supervised learning with local contrastive loss for detection and semantic segmentation
Obeso et al. Introduction of explicit visual saliency in training of deep cnns: Application to architectural styles classification
CN116309050A (en) Image super-resolution method, program product, storage medium and electronic device
CN116051561A (en) Lightweight pavement disease inspection method based on vehicle-mounted edge equipment
WO2022179599A1 (en) Perceptual network and data processing method
Sun et al. Semantic-aware 3D-voxel CenterNet for point cloud object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination