WO2020207134A1 - Image processing method, apparatus, device, and computer-readable medium - Google Patents
- Publication number
- WO2020207134A1 (PCT/CN2020/076598)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input image
- image feature
- size
- feature
- candidate
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
Definitions
- the present disclosure relates to the field of image processing, and in particular to an image processing method, apparatus, device, and computer-readable medium for target detection.
- Target detection is one of the most basic applications in the field of computer vision. Across multiple images, the target to be detected can vary greatly in scale: it may be very large in some images but very small in others. Therefore, in target detection it is desirable to improve accuracy by considering image information at multiple scales.
- the present disclosure provides an image processing method, apparatus, device, and computer-readable medium.
- an image processing method including: determining a plurality of input image features according to an input image, wherein the sizes of the plurality of input image features are different from each other; for each input image feature of the plurality of input image features, taking that input image feature as a reference input image feature, and selecting, from the plurality of input image features, a first input image feature whose size is smaller than the size of the reference input image feature and a second input image feature whose size is larger than the size of the reference input image feature; determining candidate regions associated with the reference input image feature according to the reference input image feature, the first input image feature, and the second input image feature; and performing target detection according to the multiple candidate regions associated with the multiple input image features.
- determining the candidate regions associated with the reference input image feature according to the reference input image feature, the first input image feature, and the second input image feature includes: determining a first candidate region based on the reference input image feature and the first input image feature, and determining a second candidate region based on the reference input image feature and the second input image feature.
- performing target detection based on the candidate regions includes: performing pooling processing on the plurality of first candidate regions and the plurality of second candidate regions respectively associated with the plurality of input image features, so that the processed candidate regions have the same size; performing classification prediction on the processed candidate regions; and adjusting the borders of the candidate regions according to the predicted categories.
- determining the first candidate region according to the reference input image feature and the first input image feature includes: performing an up-sampling operation on the first input image feature, so that the size of the up-sampled first input image feature is enlarged to the size of the reference input image feature; combining the up-sampled first input image feature with the reference input image feature to obtain a first combined image feature whose size is the same as the size of the reference input image feature; and determining the first candidate region based on the first combined image feature.
- determining the second candidate region according to the reference input image feature and the second input image feature includes: performing a down-sampling operation on the second input image feature, so that the size of the down-sampled second input image feature is reduced to the size of the reference input image feature; combining the down-sampled second input image feature with the reference input image feature to obtain a second combined image feature whose size is the same as the size of the reference input image feature; and determining the second candidate region based on the second combined image feature.
- the image processing method further includes: for the reference input image feature, selecting from the plurality of input image features a third input image feature whose size is smaller than the size of the first input image feature, and performing an up-sampling operation on the third input image feature so that the size of the up-sampled third input image feature is enlarged to the size of the reference input image feature. In this case, obtaining the first combined image feature includes: combining the up-sampled third input image feature, the up-sampled first input image feature, and the reference input image feature to obtain a first combined image feature whose size is the same as the size of the reference input image feature.
- the image processing method further includes: for the reference input image feature, selecting from the plurality of input image features a fourth input image feature whose size is larger than the size of the second input image feature, and performing a down-sampling operation on the fourth input image feature so that the size of the down-sampled fourth input image feature is reduced to the size of the reference input image feature. In this case, obtaining the second combined image feature includes: combining the down-sampled fourth input image feature, the down-sampled second input image feature, and the reference input image feature to obtain a second combined image feature whose size is the same as the size of the reference input image feature.
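The multi-scale fusion described above can be sketched in a few lines of numpy. This is an illustrative sketch only, not the patented implementation: it assumes a 2x size relationship between adjacent levels, nearest-neighbour up-sampling, uniform average down-sampling, and element-wise summation as the combination step; `fuse_levels` is a hypothetical name.

```python
import numpy as np

def fuse_levels(smaller, reference, larger):
    """Up-sample the smaller feature, down-sample the larger one, then
    sum all three at the reference size (simplified fusion sketch)."""
    up = np.repeat(np.repeat(smaller, 2, axis=0), 2, axis=1)       # 2x up-sample
    h, w = larger.shape
    down = larger.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x down-sample
    return up + reference + down

smaller = np.ones((2, 2))      # a coarser-level feature
reference = np.ones((4, 4))    # the reference level
larger = np.ones((8, 8))       # a finer-level feature
fused = fuse_levels(smaller, reference, larger)  # shape (4, 4)
```

The fused feature keeps the reference level's spatial size while mixing in information from both the coarser and the finer levels.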
- the multiple input image features have the same number of channels.
- determining the first candidate region based on the first combined image feature includes: determining the first candidate region from the first combined image feature using a sliding window, selective search, the EdgeBox algorithm, or a region proposal network.
- determining a plurality of input image features according to the input image includes: transforming the input image using a deep residual network, and determining the multiple input image features corresponding to the input image according to the output of the deep residual network.
- an image processing device including: a feature determination module configured to determine a plurality of input image features according to an input image, wherein the sizes of the plurality of input image features are different from each other; a candidate region determination module configured to perform the following operations for each of the plurality of input image features to generate candidate regions: for a first input image feature, selecting from the plurality of input image features a second input image feature and a third input image feature, wherein the size of the second input image feature is smaller than the size of the first input image feature and the size of the third input image feature is larger than the size of the first input image feature, and determining the candidate regions according to the first input image feature, the second input image feature, and the third input image feature; and a target detection module configured to perform target detection according to the candidate regions.
- the candidate region determination module is further configured to: determine a first candidate region according to the reference input image feature and the first input image feature, and determine a second candidate region according to the reference input image feature and the second input image feature.
- the target detection module is further configured to: perform pooling processing on the plurality of first candidate regions and the plurality of second candidate regions respectively associated with the plurality of input image features, so that the processed candidate regions have the same size; perform classification prediction on the processed candidate regions; and adjust the borders of the candidate regions according to the predicted categories.
- the candidate region determination module further includes: an up-sampling module configured to perform an up-sampling operation on the first input image feature, so that the size of the up-sampled first input image feature is enlarged to the size of the reference input image feature; a combination module configured to combine the up-sampled first input image feature with the reference input image feature to obtain a first combined image feature whose size is the same as the size of the reference input image feature; and a region determination module configured to determine the first candidate region based on the first combined image feature.
- the candidate region determination module further includes: a down-sampling module configured to perform a down-sampling operation on the second input image feature, so that the size of the down-sampled second input image feature is reduced to the size of the reference input image feature; a combination module configured to combine the down-sampled second input image feature with the reference input image feature to obtain a second combined image feature whose size is the same as the size of the reference input image feature; and a region determination module configured to determine the second candidate region based on the second combined image feature.
- an image processing device including at least one processor and a memory storing program instructions.
- the at least one processor is configured to execute the program instructions to perform the image processing method described above.
- a computer-readable non-transitory storage medium with program instructions stored thereon.
- when the program instructions are executed by a computer, the computer is configured to perform the image processing method described above.
- a candidate region for image target detection can be determined according to image features that incorporate image information of multiple scales, thereby improving the accuracy of target detection.
- FIG. 1 shows an exemplary output result of target detection on an image.
- FIG. 2 shows an exemplary process of a target detection method according to an embodiment of the present disclosure.
- FIG. 3 shows a schematic block diagram of an image processing device according to an embodiment of the present disclosure.
- FIG. 4A shows a schematic block diagram of a feature determination module according to an embodiment of the present disclosure.
- FIG. 4B shows an example of a basic block constituting the deep residual network ResNet.
- FIG. 4C shows an example of a basic block constituting the deep residual network ResNet.
- FIG. 4D shows another example of a basic block constituting the deep residual network ResNet.
- FIG. 5 shows a schematic block diagram of a candidate region determination module according to an embodiment of the present disclosure.
- FIG. 6 shows a schematic block diagram of a target detection module according to an embodiment of the present disclosure.
- FIG. 7 shows a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
- FIG. 8A shows a schematic flowchart of a feature determination method according to an embodiment of the present disclosure.
- FIG. 8B shows a schematic diagram of a feature determination method according to an embodiment of the present disclosure.
- FIG. 9A shows a schematic flowchart of a method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 9B shows a schematic flowchart of a method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 10A shows an example of a method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 10B shows an example of a method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 10C shows an example of a method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 11A shows a schematic flowchart of another method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 11B shows a schematic flowchart of another method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 12A shows an example of another method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 12B shows an example of another method for determining a candidate region according to an embodiment of the present disclosure.
- FIG. 12C shows an example of a target detection process according to an embodiment of the present disclosure.
- FIG. 13 shows a schematic flowchart of a target detection method according to an embodiment of the present disclosure.
- FIG. 14 shows a schematic block diagram of a computing device according to an embodiment of the present disclosure.
- Fig. 1 shows an exemplary output result of target detection on an image.
- the computer can recognize that the picture includes a cat, and add a mark box and a text mark "cat" to the recognized object in the picture.
- convolutional neural networks have shown great advantages in the field of image processing, especially in target detection and classification.
- targets of the same type may show large differences in scale.
- common target detection algorithms based on convolutional neural networks have poor detection performance when processing images containing small targets.
- the present disclosure proposes an improved method for determining a candidate region containing a target to be detected in an image.
- Fig. 2 shows an exemplary process of a method for target detection according to an embodiment of the present disclosure.
- a trained convolutional neural network can be used to transform the input image into multiple image features whose scales differ from each other, such as C1, C2, C3, and C4 shown in FIG. 2.
- the convolutional neural network may have a multilayer structure.
- the convolutional neural network may include multiple convolutional layers and/or pooling layers. The output of any layer of the convolutional neural network including multiple convolutional layers can be used as the image feature of the input image.
- the size of image feature C1 can be represented as 16×16, the size of image feature C2 as 8×8, the size of image feature C3 as 4×4, and the size of image feature C4 as 2×2.
- the size of each image feature mentioned above may not be its actual size, but is only used to represent the proportional relationship between the sizes of the image features.
- for example, the actual size of C1 can be 1024×1024, the size of C2 can be 512×512, the size of C3 can be 256×256, and the size of C4 can be 128×128.
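The 2x proportional relationship among C1..C4 described above can be written out as a small helper. `pyramid_sizes` is a hypothetical name used here purely to make the halving relationship concrete.

```python
def pyramid_sizes(base, levels):
    """Spatial size (square) of each feature level when each level
    halves the previous one, as in the C1..C4 example above."""
    return [base // (2 ** i) for i in range(levels)]

sizes = pyramid_sizes(1024, 4)  # C1..C4 -> [1024, 512, 256, 128]
```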
- the image size mentioned here can be a size in pixels. Therefore, the larger the size of the image or image feature, the higher the resolution.
- Figure 2 only shows a possible example of transforming the input image into multiple image features of different scales.
- image features of different sizes can also be generated by adjusting the parameters of the aforementioned neural network.
- the image sizes of adjacent scales may follow a proportional relationship of 2 times, 3 times, or any other multiple.
- the input image can be transformed into multiple image features with different sizes, and the size of each image feature can be set arbitrarily according to actual needs.
- each candidate area for realizing target detection can be generated according to image information of different scales.
- the trained neural network can be used to process each image feature and output the position of the bounding box that may include the object in the input image.
- one or more of a sliding window, selective search, the EdgeBox algorithm, and a Region Proposal Network (RPN) can be used to process image features C1, C2, C3, and C4 of different sizes and to generate candidate regions for each image feature.
- the image features C1, C2, C3, and C4 can be processed by the above-mentioned methods to output the coordinates, in the input image, of the four vertices of each rectangular candidate region, so that the position of the candidate region in the input image can be determined.
- a pooling layer (for example, ROI Pooling) can be used to map candidate regions of different sizes to preset sizes.
- candidate regions of different sizes can be mapped to outputs of the same size.
- the pooling layer can achieve maximum pooling, minimum pooling, or average pooling.
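A minimal numpy sketch of the ROI-pooling idea (max-pooling variant) follows. `roi_max_pool` and its `(y0, x0, y1, x1)` region format are assumptions for illustration, not the patent's implementation: the point is only that a variable-size region is divided into a fixed grid of cells and each cell is pooled to one value.

```python
import numpy as np

def roi_max_pool(feature, roi, out_size):
    """Map a variable-size region of `feature` to a fixed
    out_size x out_size grid by max-pooling each grid cell."""
    y0, x0, y1, x1 = roi
    region = feature[y0:y1, x0:x1]
    h, w = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Guard against empty cells when the region is small.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out
```

Replacing `cell.max()` with `cell.min()` or `cell.mean()` gives the minimum- and average-pooling variants mentioned above.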
- a region-based convolutional neural network (R-CNN)
- Fig. 3 shows a schematic block diagram of an image processing device according to an embodiment of the present disclosure.
- the image processing apparatus 300 may include a feature determination module 310, a candidate region determination module 320 and a target detection module 330.
- the above modules can be connected to each other and exchange data.
- the feature determining module 310 may be configured to determine a plurality of input image features according to the input image, wherein the sizes of the multiple input image features are different from each other.
- the input image may include one or more objects to be detected.
- the input image may include various types of objects to be detected, such as people, animals, plants, indoor objects, and vehicles.
- the input image may include one or more channels, such as R, G, B and/or grayscale channels.
- the feature determination module 310 can determine, from the input image, an input image feature C1 with a size of 16×16, an input image feature C2 with a size of 8×8, an input image feature C3 with a size of 4×4, and an input image feature C4 with a size of 2×2.
- the input image features may be implemented in the form of tensors.
- the size of the input image feature C1 may be 16×16×64, where the third size component, 64, represents the dimensionality of C1, also known as the number of channels, and 16×16 represents the feature size on each channel of C1.
- the numbers of channels of the multiple input image features generated by the feature determination module 310 may be the same or different. Since the input image includes targets of different sizes, the target information contained in input image features of different scales is different.
- the candidate region determination module 320 may be used to determine candidate regions for the input image according to image information of different scales, based on the multiple input image features generated by the feature determination module 310.
- the candidate region determination module 320 may be configured to, for each input image feature of the plurality of input image features, use that input image feature as a reference input image feature and select, from the plurality of input image features, a first input image feature whose size is smaller than the size of the reference input image feature and a second input image feature whose size is larger than the size of the reference input image feature, and then determine candidate regions associated with the reference input image feature according to the reference input image feature, the first input image feature, and the second input image feature. For example, for the reference input image feature, the candidate region determination module 320 may generate a first candidate region according to a combined image feature that fuses the reference input image feature and the first input image feature.
- the candidate region determining module 320 may also generate a second candidate region according to a combined image feature fused with the above-mentioned reference input image feature and the above-mentioned second input image feature.
- the first candidate area and the second candidate area described above can be used in the next target detection step.
- the candidate region determining module 320 may generate a third candidate region based on a combined image feature fused with the reference input image feature, the first input image feature, and the second input image feature, and use the third candidate region for The next target detection step.
- the solutions provided by the present disclosure are not limited to the above examples.
- those skilled in the art can set the image processing device to select one or more of the first candidate area, the second candidate area, and the third candidate area for the next target detection step according to actual needs.
- the first candidate area, the second candidate area, and the third candidate area can all be used in the next target detection step.
- the target detection module 330 may be configured to perform target detection according to the aforementioned determined candidate area. In some embodiments, the target detection module 330 may classify the candidate region, and adjust the position and size of the bounding box of the candidate region according to the classification result. In some embodiments, the target detection module 330 may also output the probability that the object in the candidate area belongs to a certain preset category.
- the image processing apparatus 300 may further include an input/output module. Using the input/output module, the image processing device 300 can receive an input image on which image processing is to be performed, and output the result obtained by the image processing device 300 to the user.
- the output module can be implemented as a display screen. By displaying the target detection result shown in FIG. 1 on the display screen, the result obtained by the image processing device shown in FIG. 3 can be shown to the user.
- candidate regions can be generated based on input image features of different sizes, and for an input image feature of a specific size, that input image feature can be fused with input image features smaller than the specific size and/or input image features larger than the specific size, with the fused image features used to determine the candidate regions.
- through this fusion, an input image feature of a given scale can incorporate, at a deeper level, the image information of input image features of other scales, including the image information of small-sized targets. Therefore, the accuracy of candidate regions generated by the image processing device provided by the present disclosure is higher.
- Fig. 4A shows a schematic block diagram of a feature determination module according to an embodiment of the present disclosure.
- the feature determination module 310 may include an image decomposition module 311 and a dimension adjustment module 312.
- the above-mentioned modules can be connected to each other and exchange data.
- the image decomposition module 311 may be configured to decompose the input image into multiple input image features of different scales, wherein the multiple input image features may have the same or different numbers of channels.
- the image decomposition module 311 may use a deep residual network ResNet to decompose the input image.
- Fig. 4B shows an example of a basic block constituting the deep residual network ResNet.
- the input of this segment of the neural network is x
- the expected output is H(x), where H(x) is the desired complex mapping relationship.
- the training goal of the deep residual network is to make the residual F(x) = H(x) - x approach zero, so that stacking more layers of the neural network does not lead to a decrease in accuracy.
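The residual idea above can be illustrated with a toy block: the layers learn a residual F(x) and the block outputs H(x) = F(x) + x via a skip connection. `residual_block` and its scalar "weight" are hypothetical stand-ins for a learned layer, not the ResNet architecture itself.

```python
import numpy as np

def residual_block(x, weight):
    """Toy residual unit: output H(x) = F(x) + x."""
    fx = weight * x      # stand-in for the learned transform F(x)
    return fx + x        # skip connection adds the input back

x = np.array([1.0, 2.0, 3.0])
out = residual_block(x, 0.0)  # when F(x) is zero, the block is the identity
```

Because a zero residual leaves the identity mapping intact, adding such blocks cannot make the network's representation worse, which is why very deep residual networks remain trainable.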
- the deep residual network includes a basic five-layer structure consisting of conv1, conv2_x, conv3_x, conv4_x, and conv5_x, as shown in Table 1.
- Each layer of the five-layer structure can include the same or different weight layers to form deep residual networks of different depths.
- the conv1 layer has a [7×7, 64] convolutional layer.
- the conv2_x layer begins with a 3×3 max-pooling layer.
- the conv2_x layer also includes two successively connected groups of [3×3, 64] convolutional layers, as shown in FIG. 4C.
- alternatively, the conv2_x layer includes three successively connected groups of three convolutional layers with sizes [1×1, 64], [3×3, 64], and [1×1, 256], whose structure is shown in FIG. 4D. Using the structures shown in Table 1, deep residual networks of 18, 34, 50, 101, and 152 layers can be constructed.
- the outputs of the last layers of conv2_x, conv3_x, conv4_x, and conv5_x can be denoted as C1, C2, C3, and C4, respectively, and C1, C2, C3, and C4 can be regarded as the aforementioned input image features of different scales in this disclosure.
- image features with the same number of channels at multiple different scales can be obtained, and image features with different numbers of channels at multiple different scales can also be obtained.
- the dimension adjustment module 312 can be used to perform dimensional adjustments on the features at the multiple scales.
- the dimension adjustment module 312 may be configured to process the input image features generated by the image decomposition module 311, and determine multiple input image features C1, C2, C3, and C4 with the same number of channels.
- the dimension adjustment module 312 may use a convolution kernel with a size of 1×1 and a channel number of n to convolve the multiple input image features generated by the image decomposition module 311. In this way, the number of channels of the multiple input image features can be changed to the number n of channels of the 1×1 convolutional layer without changing the size of the input image features.
- the number of channels of the input image features can be set to the required number by choosing the value of n.
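The 1×1 convolution above reduces to a per-pixel matrix multiply over channels, which can be sketched with a single einsum. `conv1x1` is a hypothetical illustration of the channel-adjustment step, not the module's actual code; note the spatial size is untouched while the channel count changes.

```python
import numpy as np

def conv1x1(feature, kernel):
    """1x1 convolution: feature is H x W x C_in, kernel is C_in x C_out.
    Spatial size is preserved; only the channel count changes."""
    return np.einsum('hwc,cn->hwn', feature, kernel)

feat = np.random.rand(8, 8, 256)   # an input image feature with 256 channels
kernel = np.random.rand(256, 64)   # n = 64 output channels
out = conv1x1(feat, kernel)        # shape (8, 8, 64)
```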
- the dimension adjustment module 312 in the feature determination module 310 may be omitted. That is, the feature determination module 310 can output multiple input image features with different numbers of channels. In subsequent operation steps, when two image features with different numbers of channels need to be processed to have the same number of channels, a separate dimension adjustment module can be used to process the image features.
- FIG. 5 shows a schematic block diagram of the candidate area determination module 320 according to an embodiment of the present disclosure.
- the candidate area determination module 320 may include an up-sampling module 321, a down-sampling module 322, a combination module 323, and an area determination module 324.
- the above modules can be connected to each other and exchange data.
- the up-sampling module 321 may be configured to perform an up-sampling operation on image features.
- the up-sampling operation may interpolate image features at uniform intervals. For example, if a 2x up-sampling operation is performed, the size of an image feature can be increased from 2×2 to 4×4; if a 4x up-sampling operation is performed, the size can be increased from 2×2 to 8×8.
- up-sampling operations may include performing interpolation operations on the image, such as neighbor-based interpolation (e.g., bilinear interpolation, bicubic interpolation, spline interpolation), edge-based interpolation, and/or region-based interpolation.
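Of the interpolation schemes listed above, nearest-neighbour is the simplest to write down, so this sketch uses it; bilinear or bicubic interpolation would smooth the result but the size relationship is the same. `upsample_nn` is an illustrative name.

```python
import numpy as np

def upsample_nn(feature, factor):
    """Nearest-neighbour up-sampling: repeat each element `factor`
    times along both spatial axes."""
    return np.repeat(np.repeat(feature, factor, axis=0), factor, axis=1)

f = np.array([[1, 2],
              [3, 4]])
up = upsample_nn(f, 2)  # 2x2 -> 4x4
```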
- the down-sampling module 322 may be configured to perform down-sampling operations on image features.
- the down-sampling operation may extract data from image features at uniform intervals, thereby reducing the size of the image features to be processed. For example, if a 2x down-sampling operation is performed, the size of an image feature can be reduced from 4×4 to 2×2; if a 4x down-sampling operation is performed, the size can be reduced from 8×8 to 2×2.
- the down-sampling operation can map all pixels in a 2×2 area of the image feature into one pixel. For example, the weighted average of all pixels in the area can be used as the pixel value of the corresponding pixel in the down-sampled image feature.
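The 2×2-area mapping above, with uniform weights, is just average pooling; the weighted average described in the text is the general case. `downsample_avg` is an illustrative sketch under that uniform-weight assumption.

```python
import numpy as np

def downsample_avg(feature, factor):
    """Average-pool each factor x factor block into one pixel."""
    h, w = feature.shape
    return feature.reshape(h // factor, factor,
                           w // factor, factor).mean(axis=(1, 3))

f = np.arange(16.0).reshape(4, 4)
down = downsample_avg(f, 2)  # 4x4 -> 2x2
```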
- the combination module 323 may be configured to combine image features.
- the combination module 323 may be configured to use a trained convolutional neural network to combine image features. That is, a combined image feature combining image information of different image features is generated through a convolutional neural network.
- the combination module 323 may be configured to superimpose multiple image features of the same size.
- the combination module 323 may be configured to superimpose multiple image features with the same size and number of channels. For example, for multiple image features with the same size and number of channels, the combination module can directly sum the element values of the multiple image features at the same coordinate, using the sum as the value of the superimposed image feature at that coordinate. In other embodiments, the combination module 323 may be configured to superimpose multiple image features with different numbers of channels.
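The element-wise superposition just described is an ordinary array sum when the two features share size and channel count; this small sketch assumes H×W×C tensors as in the earlier 16×16×64 example.

```python
import numpy as np

# Two features with the same size (4x4) and channel count (8):
a = np.ones((4, 4, 8))
b = 2 * np.ones((4, 4, 8))
combined = a + b  # values at the same coordinate are summed: 1 + 2 = 3
```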
- the combination module may include a dimension adjustment unit, which may be configured to process the numbers of channels of the multiple image features so that the processed image features have the same number of channels, and to superimpose the multiple image features having the same number of channels.
- the combination module 323 can be used to generate a combined image feature that fuses image information of different scales.
- the area determination module 324 may be configured to determine the candidate area based on the combined image feature generated by the combination module 323.
- the region determining module 324 can use one or more of a sliding window, selective search, the EdgeBox algorithm, and a region proposal network (RPN) to perform image processing on the combined image feature and obtain candidate regions in it. Therefore, the candidate region determining module can determine candidate regions for detecting the target in the input image according to image features that fuse image information of different scales.
- Fig. 6 shows a schematic block diagram of a target detection module according to an embodiment of the present disclosure.
- the target detection module 330 may include a pooling module 331, a classification module 332, and an adjustment module 333.
- the above modules can be connected to each other and exchange data.
- the pooling module 331 may be configured to perform pooling processing on each candidate area generated by the candidate area determining module, so that the size of each candidate area after processing is the same. For example, the pooling module 331 may use ROI Pooling to map candidate regions of different sizes into fixed-size outputs.
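ROI Pooling can be illustrated with a simplified single-channel sketch that max-pools each sub-window of a candidate region into a fixed-size grid; the integer sub-window partitioning used here is one possible choice, not the patent's exact scheme.

```python
def roi_pool(feature, roi, out_size):
    """Map a rectangular candidate region to a fixed out_size x out_size grid
    by max-pooling each sub-window (a simplified sketch of ROI Pooling)."""
    x0, y0, x1, y1 = roi                 # region corners, end-exclusive
    h, w = y1 - y0, x1 - x0
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # integer sub-window boundaries for output cell (i, j)
            ys = y0 + i * h // out_size, y0 + (i + 1) * h // out_size
            xs = x0 + j * w // out_size, x0 + (j + 1) * w // out_size
            row.append(max(feature[y][x]
                           for y in range(ys[0], max(ys[1], ys[0] + 1))
                           for x in range(xs[0], max(xs[1], xs[0] + 1))))
        out.append(row)
    return out

feature = [[y * 8 + x for x in range(8)] for y in range(8)]
# A 4x6 candidate region pooled to a fixed 2x2 output.
print(roi_pool(feature, (1, 2, 7, 6), 2))  # [[27, 30], [43, 46]]
```

Regions of any size are thereby mapped to the same output size, so the subsequent classifier can operate on fixed-size inputs.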
- the classification module 332 may be configured to perform classification prediction on the processed candidate regions of the same size.
- the region-based convolutional neural network (R-CNN) that has been trained can be used to classify each candidate region.
- each candidate area can be input to the R-CNN network, and based on the output of the R-CNN network, the candidate area can be classified into one of various predefined categories such as "man", "woman", "cat", "dog", and "flower".
- it is also possible to determine the candidate area as a "background" category, which is used for candidate areas with poor classification results.
- the specific classification result can be specified by adjusting the parameters of the convolutional neural network used for classification.
- the classification module 332 can also predict the probability that a certain candidate area belongs to a certain category.
- the adjustment module 333 may be configured to adjust the border of the candidate area according to the category predicted by the classification module 332. According to the category determined by the classification module 332, the adjustment module 333 can adjust the boundary of the candidate area generated by the aforementioned candidate area determination module 320 by using bounding-box regression, so as to obtain a more accurate target bounding box.
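Bounding-box regression, as used by the adjustment module, is commonly parameterised by four deltas that shift the box centre and rescale its width and height. The patent does not specify the parameterisation, so the following sketch assumes the common R-CNN form; the function name and box format are illustrative.

```python
import math

def apply_deltas(box, deltas):
    """Adjust a candidate box with predicted regression deltas (tx, ty, tw, th):
    the centre is shifted proportionally to the box size and the width/height
    are scaled exponentially.  Box format: (x, y, w, h), (x, y) top-left."""
    x, y, w, h = box
    tx, ty, tw, th = deltas
    cx, cy = x + w / 2 + tx * w, y + h / 2 + ty * h   # shifted centre
    nw, nh = w * math.exp(tw), h * math.exp(th)       # rescaled size
    return (cx - nw / 2, cy - nh / 2, nw, nh)

# Zero deltas leave the box unchanged.
print(apply_deltas((10, 10, 20, 20), (0, 0, 0, 0)))  # (10.0, 10.0, 20.0, 20.0)
```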
- with the image processing device provided by the present disclosure, it is possible to decompose an input image into input image features of different sizes, and to determine, based on combined image features that fuse the image information of the input image features of different sizes, candidate areas that may contain objects in the input image. By considering image information at multiple scales, the accuracy of target detection can be improved.
- Fig. 7 shows a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
- the image processing apparatus shown in FIG. 3 to FIG. 6 may execute the image processing method shown in FIG. 7.
- in step S702, multiple input image features may be determined according to the input image.
- Step S702 can be performed by using the feature determination module shown in Fig. 3 and Fig. 4A.
- the feature determination module can use the VGG network or the aforementioned deep residual network ResNet to extract the input image features from the input image.
- each image feature extracted through the VGG or ResNet network has the same number of channels.
- in other embodiments, each image feature extracted through the VGG or ResNet network may have a different number of channels. Since the targets included in the input image have different sizes, the target information included in input image features of different scales is different.
- multiple input image features of different sizes obtained by using different convolution processing have different semantic information and detailed information.
- in step S704, a first input image feature and a second input image feature can be selected from the plurality of input image features, wherein the size of the first input image feature is smaller than the size of the reference input image feature, and the size of the second input image feature is greater than the size of the reference input image feature; the candidate area associated with the reference input image feature can be determined according to the reference input image feature, the first input image feature, and the second input image feature.
- the candidate area determination module shown in FIG. 3 and FIG. 5 may be used to perform step S704.
- the first input image feature, whose size is smaller than that of the reference input image feature, can be combined with the reference input image feature to generate a first combined image feature that fuses the image information of the reference input image feature and the first input image feature, where the first combined image feature has the same size as the reference input image feature.
- the first candidate area can be generated according to the image information of the first combined image feature.
- the second input image feature, whose size is larger than that of the reference input image feature, can be combined with the reference input image feature to generate a second combined image feature that fuses the image information of the reference input image feature and the second input image feature, where the second combined image feature has the same size as the reference input image feature.
- the second candidate area can be generated according to the image information of the second combined image feature.
- the first input image feature whose size is smaller than the reference input image feature, and the second input image feature whose size is larger than the reference input image feature, can both be combined with the reference input image feature to generate a third combined image feature that fuses the image information of the reference input image feature, the first input image feature, and the second input image feature.
- the third candidate region can be generated according to the image information of the third combined image feature.
- the technician can select one or more of the first candidate area, the second candidate area, and the third candidate area for the next target detection operation according to actual needs.
- the first candidate area, the second candidate area, and the third candidate area determined as described above can all be used for the next target detection operation.
- a part of the first candidate area, the second candidate area, and the third candidate area determined as described above may be selected according to a preset rule for the next target detection operation.
- in step S706, target detection may be performed according to the determined candidate area.
- the target detection module shown in FIG. 3 and FIG. 6 may be used to perform step S706.
- the target detection algorithm can classify the aforementioned candidate regions, and adjust the position and size of the bounding box of the candidate region according to the classification results.
- the target detection algorithm can also be used to output the probability that the candidate region belongs to a certain category.
- the method shown in FIG. 7 can be used to determine multiple candidate areas associated with the multiple input image features.
- candidate regions can be generated based on input image features of different sizes. For an input image feature of a specific size, the input image feature can be fused with input image features smaller than the specific size and/or input image features larger than the specific size, and the image features that fuse image information of multiple scales are used to determine the candidate areas of the input image. Since the aforementioned multiple input image features of different sizes obtained by different convolution processing have different semantic information and detailed information, determining candidate areas with fused input image features of different scales allows the image information of small-sized objects to be reflected in deep input image features. Therefore, the candidate regions generated by the image processing device provided by the present disclosure are more accurate.
- Fig. 8A shows a schematic flowchart of a feature determination method according to an embodiment of the present disclosure.
- the method shown in FIG. 8A may be performed using the feature determination module 310 shown in FIGS. 3 and 4A.
- Step S702 as shown in FIG. 7 can be implemented by using the flow shown in FIG. 8A.
- the feature determination step S702 may include step S7022.
- in step S7022, multiple input image features may be determined according to the input image.
- the image decomposition module shown in FIG. 4A can be used to extract input image features from the input image.
- the output of the last convolutional layer in the conv2_x, conv3_x, conv4_x, and conv5_x layers of the aforementioned deep residual network can be used as the input image feature of the input image.
- the input image features determined according to the input image may have the same number of channels or different channel numbers.
- the feature determination step S702 may further include step S7024.
- in step S7024, the channel numbers of multiple input image features may be dimensionally adjusted so that the multiple input image features have the same number of channels.
- Step S7024 may be performed by using the dimension adjustment module shown in FIG. 4A.
- a convolution layer with a size of 1×1 and a channel number of n may be used to convolve the multiple input image features generated in step S7022. In this way, the number of channels of the multiple input image features can be uniformly changed to the channel number n of the 1×1 convolutional layer without changing the size of the input image features.
- the number of channels of the input image feature can be set to the required number by setting the size of n.
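A 1×1 convolution changes only the channel number: at every spatial position the output channels are linear combinations of the input channels. A minimal sketch (the `[channels][height][width]` layout and the function name are assumptions):

```python
def conv1x1(feature, weights):
    """Apply a 1x1 convolution: at each spatial position the output channels
    are linear combinations of the input channels.  `feature` is [C][H][W];
    `weights` is [C_out][C_in].  The spatial size is unchanged while the
    channel number becomes C_out (the role of step S7024)."""
    c_in, h, w = len(feature), len(feature[0]), len(feature[0][0])
    assert all(len(row) == c_in for row in weights)
    return [[[sum(weights[co][ci] * feature[ci][y][x] for ci in range(c_in))
              for x in range(w)]
             for y in range(h)]
            for co in range(len(weights))]

# Reduce a 2-channel 2x2 feature to 1 channel with weights (0.5, 0.5).
f = [[[2, 4], [6, 8]],
     [[0, 2], [2, 0]]]
print(conv1x1(f, [[0.5, 0.5]]))  # [[[1.0, 3.0], [4.0, 4.0]]]
```

Choosing n rows in `weights` yields n output channels, which is how the required channel number is set.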
- step S7024 can be omitted.
- the multiple input image features output in step S7022 can be used as multiple input image features of the input image.
- step S7024 can also be omitted. That is, step S702 of the image processing method shown in FIG. 7 can output multiple input image features with different numbers of channels.
- an additional dimension adjustment step can be used to process the input image features.
- FIG. 8B shows a schematic diagram of a feature dimension adjustment method according to an embodiment of the present disclosure.
- an input image feature C1 with a size of 16×16×16, an input image feature C2 with a size of 8×8×64, an input image feature C3 with a size of 4×4×256, and an input image feature C4 with a size of 2×2×1024. Since the parameters of the convolutional network used to extract features from the input image are set such that the numbers of channels of C1, C2, C3, and C4 are different, the dimension adjustment module shown in FIG. 4A can be used to process C1, C2, C3, and C4 so that they have the same number of channels.
- the input image features C1, C2, C3, and C4 of different sizes can be normalized in terms of channel number, so that they are transformed into input image features C1', C2', C3', and C4' with the same number of channels, to facilitate subsequent processing.
- input image features C1, C2, C3, and C4 are taken as examples to explain the principle of the present disclosure, where the numbers of channels of C1, C2, C3, and C4 may be the same or different.
- FIGS. 9A and 9B show schematic flowcharts of a method for determining a candidate area according to an embodiment of the present disclosure.
- the method shown in FIGS. 9A and 9B can be performed by using the candidate area determination module shown in FIGS. 3 and 5.
- the candidate area determining method shown in FIG. 9A and 9B can be used to implement the candidate area determining step S704 shown in FIG. 7.
- FIG. 9A shows a method for determining a candidate area based on a combined image feature that fuses the image information of the reference input image feature and a first input image feature whose size is smaller than the size of the reference input image feature.
- an up-sampling operation may be performed on the first input image feature, so that the size of the up-sampled first input image feature is enlarged to the size of the reference input image feature.
- Step S9022 can be performed using an up-sampling module as shown in FIG. 5.
- in step S9024, the up-sampled first input image feature and the reference input image feature are combined to obtain the first combined image feature.
- Step S9024 can be performed by using a combination module as shown in FIG. 5.
- the reference input image feature and the up-sampled first input image feature may be superimposed.
- the element values at the same coordinates in the reference input image feature and the up-sampled first input image feature can be directly summed, and the sum can be used as the element value of the superimposed image feature at that coordinate.
- in the case where the reference input image feature and the up-sampled first input image feature have different numbers of channels, they can be processed, using the method provided in step S7024, to have the same number of channels, and the reference input image feature and the up-sampled first input image feature with the same number of channels can then be superimposed to generate the first combined image feature.
- FIG. 10A shows an example for combining image information of different scales shown in FIG. 9A.
- the size of C4 can be enlarged to the same 4×4 size as the input image feature C3 by using 2-fold up-sampling. Then, the up-sampled C4 and C3 can be superimposed to generate a 4×4 combined image feature that fuses the image information of C4 and the image information of C3.
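The 2-fold up-sampling and superposition of C4 and C3 can be sketched as follows, using nearest-neighbour up-sampling for illustration (the patent does not fix the interpolation method) and single-channel stand-ins for C4 and C3:

```python
def upsample(feature, k):
    """Nearest-neighbour up-sampling: each pixel is repeated into a k*k
    block, enlarging the feature map by factor k (one simple realisation
    of the up-sampling module)."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(k)]   # widen the row k times
        out.extend([wide[:] for _ in range(k)])     # repeat it k times
    return out

c4 = [[1, 2],
      [3, 4]]                        # 2x2 stand-in for C4
c3 = [[0] * 4 for _ in range(4)]     # 4x4 stand-in for C3
up = upsample(c4, 2)                 # enlarged to 4x4
fused = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(up, c3)]
print(fused[0])  # [1, 1, 2, 2]
```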
- the up-sampled C4 and C3 can be processed into the same number of channels by the aforementioned dimensional adjustment step, and the image features with the same number of channels (up-sampled C4 And C3) are superimposed.
- a trained convolutional neural network may be used to generate a combined image feature combining image information of C4 and C3 from input image features (C4 and C3) with different numbers of channels.
- the first candidate region may be generated according to the first combined image feature of the image information fused with the reference input image feature and the first input image feature.
- Step S9026 may be performed using the area determination module shown in FIG. 5.
- one or more of a sliding window, selective search, the EdgeBox algorithm, and a region proposal network (RPN) may be used to perform image processing on the combined image features and generate candidate regions.
- for example, the above algorithm for determining candidate regions can be executed on the image feature C4, the image feature fused with the image information of C4 and C3, the image feature fused with the image information of C3 and C2, and the image feature fused with the image information of C2 and C1, so as to generate a plurality of first candidate regions for the input image.
- when the candidate area is a rectangle, the image features C1, C2, C3, and C4 can be processed by the above method to output the coordinates of the four vertices of the rectangular first candidate area in the input image, so that the position of the candidate area in the input image can be determined.
- FIG. 9B shows a method for determining a candidate area based on a combined image feature of image information fused with a reference input image feature and a second input image feature whose size is larger than the size of the reference input image feature.
- a down-sampling operation may be performed on the second input image feature, so that the size of the down-sampled second input image feature is reduced to the size of the reference input image feature.
- Step S9042 can be performed using the down-sampling module as shown in FIG. 5.
- in step S9044, the down-sampled second input image feature and the reference input image feature may be combined to obtain a second combined image feature.
- Step S9044 can be performed by using a combination module as shown in FIG. 5.
- the reference input image feature and the down-sampled second input image feature may be superimposed.
- the element values at the same coordinates in the reference input image feature and the down-sampled second input image feature can be directly summed, and the sum can be used as the element value of the superimposed image feature at that coordinate.
- in the case where the reference input image feature and the down-sampled second input image feature have different numbers of channels, they can be processed, using the method provided in step S7024, to have the same number of channels, and the reference input image feature and the down-sampled second input image feature with the same number of channels can then be superimposed to generate the second combined image feature.
- FIG. 10B shows an example for combining image information of different scales shown in FIG. 9B.
- as shown in Fig. 10B, for an image feature C3 with a size of 4×4, 2-fold down-sampling can be used to reduce the size of C3 to the same 2×2 size as the image feature C4. Then, the down-sampled C3 and C4 can be superimposed to generate a 2×2 combined image feature that fuses the image information of C4 and the image information of C3.
- the down-sampled C3 and C4 can be directly superimposed.
- the down-sampled C3 and C4 can be processed to have the same number of channels, and the image features with the same number of channels (the down-sampled C3 and C4) can be superimposed .
- a trained convolutional neural network can be used to generate a combined image feature combining image information of C4 and C3 from image features (C4 and C3) with different numbers of channels.
- the second candidate region may be generated according to the second combined image feature of the image information fused with the reference input image feature and the second input image feature.
- Step S9046 may be performed using the area determination module shown in FIG. 5.
- the second combined image feature generated as described above can be used as input, and one or more of a sliding window, selective search, the EdgeBox algorithm, and a region proposal network (RPN) can be used to perform image processing on the second combined image feature and generate candidate regions.
- for example, the algorithm for determining candidate regions can be executed on the image feature C1, the image feature fused with the image information of C1 and C2, the image feature fused with the image information of C2 and C3, and the image feature fused with the image information of C3 and C4, so as to generate a plurality of second candidate regions for the input image.
- when the candidate area is a rectangle, the image features C1, C2, C3, and C4 can be processed by the above method to output the coordinates of the four vertices of the rectangular second candidate area in the input image, so that the position of the candidate area in the input image can be determined.
- in some embodiments, the third candidate area may be determined according to a third combined image feature that fuses the image information of the reference input image feature, a first input image feature whose size is smaller than the size of the reference input image feature, and a second input image feature whose size is larger than the size of the reference input image feature.
- for example, the up-sampling module shown in FIG. 5 can be used to up-sample the image feature C4 with a size of 2×2, enlarging C4 to the same 4×4 size as the image feature C3, and the down-sampling module can be used to down-sample the image feature C2 to the same 4×4 size.
- the up-sampled C4, the down-sampled C2, and the image feature C3 can then be combined.
- the superposition operation can be performed on the up-sampled C4, the down-sampled C2, and the image feature C3, and a third combined image feature fused with image information of C2, C3, and C4 with different sizes can be generated.
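Bringing C2 and C4 to C3's scale and superimposing all three, as for the third combined image feature above, can be sketched with single-channel stand-ins; block-average down-sampling and nearest-neighbour up-sampling are illustrative choices, and all names are assumptions.

```python
def avg_down(f, k):
    # block-average down-sampling by factor k (sketch)
    n = len(f) // k
    return [[sum(f[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(n)] for i in range(n)]

def nn_up(f, k):
    # nearest-neighbour up-sampling by factor k (sketch)
    return [[v for v in row for _ in range(k)] for row in f for _ in range(k)]

# Single-channel stand-ins for C2 (8x8), C3 (4x4) and C4 (2x2).
c2 = [[1] * 8 for _ in range(8)]
c3 = [[2] * 4 for _ in range(4)]
c4 = [[3] * 2 for _ in range(2)]

# Bring C2 and C4 to C3's 4x4 scale, then superimpose all three.
third = [[a + b + c for a, b, c in zip(r2, r3, r4)]
         for r2, r3, r4 in zip(avg_down(c2, 2), c3, nn_up(c4, 2))]
print(third[0][0])  # 6.0
```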
- a region determining module as shown in FIG. 5 can be used to generate a plurality of third candidate regions corresponding to the scale of C3 according to the third combined image feature. For example, when the candidate area is a rectangle, the coordinates of the four vertices of a plurality of third candidate areas of the rectangle in the input image can be output, so that the position of the candidate area in the input image can be determined.
- the technician can select one or more of the multiple first candidate regions, multiple second candidate regions, and multiple third candidate regions for the next target detection operation according to actual needs.
- the plurality of first candidate regions, the plurality of second candidate regions, and the plurality of third candidate regions determined as described above can all be used for the next target detection operation.
- a part of the plurality of first candidate regions, the plurality of second candidate regions, and the plurality of third candidate regions determined as described above may be selected for the next target detection operation according to a preset rule.
- FIG. 11A shows another method for determining a candidate region according to an embodiment of the present disclosure.
- the candidate area determination module shown in FIG. 3 and FIG. 5 may be used to perform the method shown in FIG. 11A.
- the method shown in FIG. 11A can be used to implement the candidate area determination step S704 shown in FIG. 7.
- in step S1102, an up-sampling operation may be performed on the first input image feature, so that the size of the up-sampled first input image feature is enlarged to the size of the reference input image feature.
- Step S1102 can be performed by using the up-sampling module shown in FIG. 5.
- a third input image feature whose size is smaller than the size of the first input image feature is selected from the plurality of input image features.
- in step S1104, an up-sampling operation may be performed on the third input image feature, so that the size of the up-sampled third input image feature is enlarged to the size of the reference input image feature.
- Step S1104 can be performed using the up-sampling module shown in FIG. 5.
- in step S1106, the up-sampled third input image feature, the up-sampled first input image feature, and the reference input image feature can be combined to obtain the first combined image feature.
- Step S1106 can be performed using the combination module shown in FIG. 5.
- in step S1108, the first candidate region associated with the reference input image feature may be determined based on the first combined image feature.
- Step S1108 may be performed using the area determination module shown in FIG. 5.
- with the method provided in FIG. 11A, when determining a candidate area for an input image feature of a specific size, a plurality of different input image features whose sizes are smaller than the specific size can be selected, and the image information of these input image features can be fused with the image information of the reference input image feature.
- the method provided in FIG. 11A can determine the first candidate area for a specific size by using a combined image feature that incorporates information of image features of more scales.
- the image feature can be fused with the information of the image feature of a smaller size, and the fused image information can be used to generate a candidate area.
- FIG. 11A only shows an example of fusing image information of three scales, however, the content of the present disclosure is not limited to this.
- one or more image features having a size smaller than the specific size may be selected based on a predetermined rule for generating a combined image feature. For example, all image information of image features smaller than the specific size can be fused with the image features of the specific size.
- FIG. 11B shows another method for determining a candidate region according to an embodiment of the present disclosure.
- the candidate area determination module shown in FIG. 3 and FIG. 5 may be used to perform the method shown in FIG. 11B.
- the method shown in FIG. 11B can be used to implement the candidate region determination step S704 shown in FIG. 7.
- in step S1112, a down-sampling operation may be performed on the second input image feature, so that the size of the down-sampled second input image feature is reduced to the size of the reference input image feature.
- the down-sampling module shown in FIG. 5 may be used to perform step S1112.
- a fourth input image feature whose size is larger than the size of the second input image feature is selected from the plurality of input image features.
- a down-sampling operation may be performed on the fourth input image feature, so that the size of the down-sampled fourth input image feature is reduced to the size of the reference input image feature.
- Step S1114 can be performed by using the down-sampling module shown in FIG. 5.
- in step S1116, the down-sampled fourth input image feature, the down-sampled second input image feature, and the reference input image feature can be combined to obtain a second combined image feature.
- Step S1116 may be performed using the combination module shown in FIG. 5.
- a second candidate region associated with the reference input image feature may be determined based on the second combined image feature.
- Step S1118 may be performed by using the area determination module shown in FIG. 5.
- the method shown in FIG. 11B uses a combined image feature that fuses image feature information of more scales to determine the second candidate region for a specific size.
- the image feature can be fused with information of an image feature of a larger size, and a candidate region can be generated using the fused image information.
- FIG. 11B only shows an example in which image information of three scales is fused, however, the content of the present disclosure is not limited to this.
- one or more image features having a size larger than the specific size may be selected based on a predetermined rule for generating a combined image feature. For example, all image information of image features larger than the specific size can be fused with the image features of the specific size.
- FIG. 12A and FIG. 12B respectively show schematic procedures for the candidate region determination method shown in FIG. 11A and FIG. 11B.
- a 4-fold down-sampling operation can be performed on C2 to reduce the size of C2 to 2×2, a 2-fold down-sampling operation can be performed on C3 to reduce the size of C3 to 2×2, and a combination operation can be performed on the down-sampled C2, the down-sampled C3, and C4.
- the method of the combined operation has been described in detail in the examples shown in FIGS. 10A-10C, and will not be repeated here.
- FIG. 12C shows an example of a target detection flow according to an embodiment of the present disclosure.
- all candidate regions obtained by the process shown in FIG. 12A and FIG. 12B can be used for subsequent target detection. That is, the target detection method that will be described next with reference to FIG. 13 is used to process all the candidate regions output by the process shown in FIGS. 12A and 12B to obtain the final target detection result.
- a part of all candidate regions output by the process shown in FIG. 12A and FIG. 12B may be selected for the target detection operation according to a preset rule.
- Fig. 13 shows a schematic flowchart of a target detection method according to an embodiment of the present disclosure.
- the method shown in FIG. 13 can be executed by using the target detection module shown in FIG. 3 and FIG. 6.
- the target detection step S706 in the image processing method shown in FIG. 7 can be realized by using the method shown in FIG. 13.
- in step S1302, pooling processing can be performed on the first candidate area and the second candidate area, which have different sizes, so that the size of each candidate area after processing is the same.
- Step S1302 can be performed by using the pooling module shown in FIG. 6.
- FIG. 13 only shows the first candidate region and the second candidate region as examples, the scope of the present disclosure is not limited to this. The skilled person can select one or more of the candidate regions generated by the aforementioned candidate region determination method to be used in the target detection method shown in FIG. 13 according to the actual situation.
- a pooling layer (for example, ROI Pooling) may be used to map candidate regions of different sizes to candidate regions with a preset fixed size.
- the technician can set the size of the candidate area output by the pooling layer by adjusting the parameters of the pooling layer.
- in step S1304, the processed candidate regions with the same size can be classified and predicted.
- Step S1304 can be performed by using the classification module shown in FIG. 6.
- for example, a trained region-based convolutional neural network (R-CNN) can be used to perform classification prediction on each candidate region.
- in step S1306, the border of the candidate area can be adjusted according to the predicted category. For example, bounding-box regression can be used to adjust the boundary of the candidate area according to the predicted category, so as to obtain a more accurate target bounding box. Step S1306 can be performed by using the adjustment module shown in FIG. 6.
- candidate regions can be generated based on input image features of different sizes. For an input image feature of a specific size, the input image feature can be fused with image features smaller than the specific size and/or image features larger than the specific size, and the candidate region for the input image is determined by using image features that fuse image information of multiple scales. By considering image information at multiple scales, the accuracy of target detection can be improved.
- Fig. 14 shows a schematic block diagram of a computing device.
- the image processing apparatus shown in FIGS. 3 to 6 can be realized by using the computing device shown in FIG. 14.
- the computing device 1400 may include a bus 1410, one or more CPUs 1420, a read-only memory (ROM) 1430, a random access memory (RAM) 1440, a communication port 1450 connected to a network, input/output components 1460, a hard disk 1470, etc.
- the storage device in the computing device 1400, such as the ROM 1430 or the hard disk 1470, can store various data or files used for computer processing and/or communication, as well as program instructions executed by the CPU.
- the computing device 1400 may also include a user interface 1480.
- the user interface 1480 can display the result output by the image processing apparatus as described above to the user.
- the architecture shown in FIG. 14 is only exemplary. When implementing different devices, one or more components in the computing device shown in FIG. 14 may be omitted according to actual needs.
- a computer-readable medium may take many forms, including tangible storage media, carrier wave media, or physical transmission media.
- Stable (non-volatile) storage media may include optical or magnetic disks, and other storage systems used in computers or similar devices that can implement the system components described in the figures.
- Unstable (volatile) storage media may include dynamic memory, such as the main memory of a computer platform.
- Tangible transmission media may include coaxial cables, copper cables, and optical fibers, such as the lines forming the bus inside a computer system.
- Carrier-wave transmission media can convey electrical signals, electromagnetic signals, acoustic-wave signals, light-wave signals, and the like.
- Common computer-readable media include: hard disks, floppy disks, magnetic tape, and any other magnetic media; CD-ROMs, DVDs, DVD-ROMs, and any other optical media; punch cards and any other physical storage media with hole patterns; RAM, PROM, EPROM, FLASH-EPROM, and any other memory chips or tapes; carrier waves, and the cables or connecting devices that carry them to transmit data or instructions; and any other media from which a computer can read program code and/or data.
- the “module” in this application may refer to logic stored in hardware, firmware, or a set of software instructions.
- the “module” referred to herein can be executed by software and/or hardware modules, or stored in any kind of computer-readable non-transitory medium or other storage device.
- a software module can be compiled and linked into an executable program.
- a software module can respond to information passed to it by itself or by other modules, and/or can be invoked when certain events or interrupts are detected.
- the software module may be provided on a computer-readable medium, and the software module may be configured to perform operations on a computing device (for example, the processor 220).
- the computer-readable medium here can be an optical disc, a digital optical disc, a flash drive, a magnetic disk, or any other kind of tangible medium.
- a software module can also be obtained as a digital download (where the digital download also includes data stored in a compressed package or an installation package, which must be decompressed or decoded before execution).
- the code of the software module here can be partially or completely stored in the storage device of the computing device that performs the operation, and used in the operation of the computing device.
- Software instructions can be embedded in firmware, such as erasable programmable read-only memory (EPROM).
- the hardware module may include logic units connected together, such as gates and flip-flops, and/or include programmable units, such as programmable gate arrays or processors.
- the modules or computing devices described herein are preferably implemented as software modules, but may also be implemented in hardware or firmware.
- the modules mentioned here are logical modules and are not limited by their specific physical form or memory.
- a module can be combined with other modules or divided into a series of sub-modules.
Claims (17)
- 1. An image processing method, comprising: determining a plurality of input image features according to an input image, wherein the plurality of input image features differ from one another in size; for each input image feature of the plurality of input image features, taking that input image feature as a reference input image feature, selecting from the plurality of input image features a first input image feature whose size is smaller than the size of the reference input image feature and a second input image feature whose size is larger than the size of the reference input image feature; determining a candidate region associated with the reference input image feature according to the reference input image feature, the first input image feature, and the second input image feature; and performing target detection according to a plurality of candidate regions respectively associated with the plurality of input image features.
- 2. The image processing method according to claim 1, wherein, for the reference input image feature, determining the candidate region associated with the reference input image feature according to the reference input image feature, the first input image feature, and the second input image feature comprises: determining a first candidate region according to the reference input image feature and the first input image feature, and determining a second candidate region according to the reference input image feature and the second input image feature.
- 3. The method according to claim 2, wherein, for the reference input image, the first candidate region and the second candidate region differ in size, and wherein performing target detection according to the plurality of candidate regions respectively associated with the plurality of input image features comprises: performing pooling processing on the plurality of first candidate regions and the plurality of second candidate regions respectively associated with the plurality of input image features, so that the processed candidate regions are of the same size; performing classification prediction on the processed candidate regions; and adjusting the borders of the candidate regions according to the predicted classes.
- 4. The image processing method according to claim 2 or 3, wherein determining the first candidate region according to the reference input image feature and the first input image feature comprises: performing an upsampling operation on the first input image feature, so that the size of the upsampled first input image feature is enlarged to the size of the reference input image feature; combining the upsampled first input image feature with the reference input image feature to obtain a first combined image feature of the same size as the reference input image feature; and determining the first candidate region based on the first combined image feature.
- 5. The image processing method according to any one of claims 2-4, wherein determining the second candidate region according to the reference input image feature and the second input image feature comprises: performing a downsampling operation on the second input image feature, so that the size of the downsampled second input image feature is reduced to the size of the reference input image feature; combining the downsampled second input image feature with the reference input image feature to obtain a second combined image feature of the same size as the reference input image feature; and determining the second candidate region based on the second combined image feature.
- 6. The image processing method according to claim 4 or 5, further comprising: for the reference input image feature, selecting from the plurality of input image features a third input image feature whose size is smaller than the size of the first input image feature; and performing an upsampling operation on the third input image feature, so that the size of the upsampled third input image feature is enlarged to the size of the reference input image feature; wherein combining the upsampled first input image feature with the reference input image feature to obtain the first combined image feature of the same size as the reference input image feature comprises: combining the upsampled third input image feature, the upsampled first input image feature, and the reference input image feature to obtain a first combined image feature of the same size as the first input image feature.
- 7. The image processing method according to claim 5 or 6, further comprising: for the reference input image feature, selecting from the plurality of input image features a fourth input image feature whose size is larger than the size of the second input image feature; and performing a downsampling operation on the fourth input image feature, so that the size of the downsampled fourth input image feature is reduced to the size of the reference input image feature; wherein combining the downsampled second input image feature with the reference input image feature to obtain the second combined image feature of the same size as the reference input image feature comprises: combining the downsampled fourth input image feature, the downsampled second input image feature, and the reference input image feature to obtain a second combined image feature of the same size as the reference input image feature.
- 8. The image processing method according to any one of claims 1-7, wherein the plurality of input image features have the same number of channels.
- 9. The image processing method according to any one of claims 4-8, wherein determining the first candidate region based on the first combined image feature comprises: determining the first candidate region based on the first combined image feature by using a sliding window, selective search, the EdgeBoxes algorithm, or a region proposal network.
- 10. The image processing method according to any one of claims 1-9, wherein determining the plurality of input image features according to the input image comprises: transforming the input image by using a deep residual network, and determining the plurality of input image features corresponding to the input image according to the output of the deep residual network.
- 11. An image processing apparatus, comprising: a feature determination module configured to determine a plurality of input image features according to an input image, wherein the plurality of input image features differ from one another in size; a candidate region determination module configured to perform the following operations for each input image feature of the plurality of input image features to generate candidate regions: for a first input image feature, selecting a second input image feature and a third input image feature from the plurality of input image features, wherein the size of the second input image feature is smaller than the size of the first input image feature and the size of the third input image feature is larger than the size of the first input image feature; and determining candidate regions according to the first input image feature, the second input image feature, and the third input image feature; and a target detection module configured to perform target detection according to the candidate regions.
- 12. The image processing apparatus according to claim 11, wherein, for a reference input image feature, the candidate region determination module is further configured to: determine a first candidate region according to the reference input image feature and the first input image feature, and determine a second candidate region according to the reference input image feature and the second input image feature.
- 13. The image processing apparatus according to claim 12, wherein, for the reference input image, the first candidate region and the second candidate region differ in size, and the target detection module is further configured to: perform pooling processing on the plurality of first candidate regions and the plurality of second candidate regions respectively associated with the plurality of input image features, so that the processed candidate regions are of the same size; perform classification prediction on the processed candidate regions; and adjust the borders of the candidate regions according to the predicted classes.
- 14. The image processing apparatus according to claim 12 or 13, wherein the candidate region determination module further comprises: an upsampling module configured to perform an upsampling operation on the first input image feature, so that the size of the upsampled first input image feature is enlarged to the size of the reference input image feature; a combination module configured to combine the upsampled first input image feature with the reference input image feature to obtain a first combined image feature of the same size as the reference input image feature; and a region determination module configured to determine the first candidate region based on the first combined image feature.
- 15. The image processing apparatus according to any one of claims 12-14, wherein the candidate region determination module further comprises: a downsampling module configured to perform a downsampling operation on the second input image feature, so that the size of the downsampled second input image feature is reduced to the size of the reference input image feature; a combination module configured to combine the downsampled second input image feature with the reference input image feature to obtain a second combined image feature of the same size as the reference input image feature; and a region determination module configured to determine the second candidate region based on the second combined image feature.
- 16. An image processing device, comprising at least one processor and a memory storing program instructions, wherein, when the program instructions are executed, the at least one processor is configured to perform the image processing method according to any one of claims 1-10.
- 17. A non-transitory computer-readable storage medium storing program instructions, wherein, when the program instructions are executed by a computer, the computer is configured to perform the image processing method according to any one of claims 1-10.
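The pooling step of claim 3 — bringing candidate regions of different sizes to a common size before classification — can be sketched as adaptive max pooling; the function name and the choice of max pooling are assumptions for illustration (ROI pooling in the Fast R-CNN family is the usual reference):

```python
import numpy as np

def adaptive_max_pool(region, out_h, out_w):
    """Pool an (H, W) region to a fixed (out_h, out_w) grid,
    so regions of different sizes become directly comparable."""
    h, w = region.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Integer bin edges covering the region evenly; each bin
            # is at least one cell wide.
            r0, r1 = i * h // out_h, max((i + 1) * h // out_h, i * h // out_h + 1)
            c0, c1 = j * w // out_w, max((j + 1) * w // out_w, j * w // out_w + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out

# Regions of different sizes pool to the same 2x2 shape.
a = adaptive_max_pool(np.arange(16.0).reshape(4, 4), 2, 2)
b = adaptive_max_pool(np.arange(36.0).reshape(6, 6), 2, 2)
print(a.shape, b.shape)  # (2, 2) (2, 2)
```

After such pooling, every candidate region yields a fixed-size feature that can feed a shared classifier, which is the point of the claimed size-equalizing step.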
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910285254.5 | 2019-04-10 | ||
CN201910285254.5A CN109977963B (zh) | 2019-04-10 | 2019-04-10 | Image processing method, device, apparatus, and computer-readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020207134A1 (zh) | 2020-10-15 |
Family
ID=67083889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/076598 WO2020207134A1 (zh) | Image processing method, apparatus and device, and computer-readable medium | 2019-04-10 | 2020-02-25 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109977963B (zh) |
WO (1) | WO2020207134A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112288657A (zh) * | 2020-11-16 | 2021-01-29 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Image processing method, image processing apparatus, and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977963B (zh) * | 2019-04-10 | 2021-10-15 | BOE Technology Group Co., Ltd. | Image processing method, device, apparatus, and computer-readable medium |
CN112784629A (zh) * | 2019-11-06 | 2021-05-11 | Ricoh Company, Ltd. | Image processing method and apparatus, and computer-readable storage medium |
CN113379738A (zh) * | 2021-07-20 | 2021-09-10 | Chongqing University | Image-based method and system for detecting and locating diseased trees |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968884A (zh) * | 2009-07-28 | 2011-02-09 | Sony Corporation | Method and device for detecting targets in video images |
CN106529527A (zh) * | 2016-09-23 | 2017-03-22 | Beijing SenseTime Technology Development Co., Ltd. | Object detection method and apparatus, data processing apparatus, and electronic device |
US20180089803A1 (en) * | 2016-03-21 | 2018-03-29 | Boe Technology Group Co., Ltd. | Resolving Method and System Based on Deep Learning |
CN108229488A (zh) * | 2016-12-27 | 2018-06-29 | Beijing SenseTime Technology Development Co., Ltd. | Method, apparatus, and electronic device for detecting object keypoints |
CN108876791A (zh) * | 2017-10-23 | 2018-11-23 | Beijing Megvii Technology Co., Ltd. | Image processing method, apparatus, system, and storage medium |
CN109977963A (zh) * | 2019-04-10 | 2019-07-05 | BOE Technology Group Co., Ltd. | Image processing method, device, apparatus, and computer-readable medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9965719B2 (en) * | 2015-11-04 | 2018-05-08 | Nec Corporation | Subcategory-aware convolutional neural networks for object detection |
CN107341517B (zh) * | 2017-07-07 | 2020-08-11 | Harbin Institute of Technology | Multi-scale small-object detection method based on inter-level feature fusion in deep learning |
CN107392901A (zh) * | 2017-07-24 | 2017-11-24 | Information and Communication Company of State Grid Shandong Electric Power Company | Method for intelligent automatic identification of transmission line components |
CN108764063B (zh) * | 2018-05-07 | 2020-05-19 | Huazhong University of Science and Technology | Feature-pyramid-based system and method for recognizing time-sensitive targets in remote sensing images |
CN109117876B (zh) * | 2018-07-26 | 2022-11-04 | Chengdu Kuaiyan Technology Co., Ltd. | Method for constructing a dense small-target detection model, the model, and a detection method |
CN109360633B (zh) * | 2018-09-04 | 2022-08-30 | Beijing SenseTime Technology Development Co., Ltd. | Medical image processing method and apparatus, processing device, and storage medium |
-
2019
- 2019-04-10 CN CN201910285254.5A patent/CN109977963B/zh active Active
-
2020
- 2020-02-25 WO PCT/CN2020/076598 patent/WO2020207134A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN109977963A (zh) | 2019-07-05 |
CN109977963B (zh) | 2021-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020207134A1 (zh) | Image processing method, apparatus and device, and computer-readable medium | |
US9892496B2 (en) | Edge-aware bilateral image processing | |
US9418319B2 (en) | Object detection using cascaded convolutional neural networks | |
WO2020078269A1 (zh) | Semantic segmentation method and apparatus for three-dimensional images, terminal, and storage medium | |
CN110473137B (zh) | Image processing method and device | |
KR102140340B1 (ko) | System for detecting image forgery through a convolutional neural network, and method of providing an uncorrected-detection service using the same | |
US10410350B2 (en) | Skip architecture neural network machine and method for improved semantic segmentation | |
US20190325203A1 (en) | Dynamic emotion recognition in unconstrained scenarios | |
JP6044134B2 (ja) | Apparatus, method, and program for image region segmentation based on an optimal image size | |
CN109996023B (zh) | Image processing method and device | |
CN113034358B (zh) | Super-resolution image processing method and related apparatus | |
CN109118490B (zh) | Image segmentation network generation method and image segmentation method | |
US20210209782A1 (en) | Disparity estimation | |
US20210183014A1 (en) | Determination of disparity | |
WO2023065665A1 (zh) | Image processing method and apparatus, device, storage medium, and computer program product | |
US20210150679A1 (en) | Using imager with on-purpose controlled distortion for inference or training of an artificial intelligence neural network | |
WO2020238120A1 (en) | System and method for single-modal or multi-modal style transfer and system for random stylization using the same | |
WO2021115061A1 (zh) | Image segmentation method and apparatus, and server | |
JP2020017082A (ja) | Image object extraction device and program | |
US20230005104A1 (en) | Method and electronic device for performing ai based zoom of image | |
CN116468702A (zh) | Chloasma assessment method and apparatus, electronic device, and computer-readable storage medium | |
WO2022033088A1 (zh) | Image processing method and apparatus, electronic device, and computer-readable medium | |
KR101592087B1 (ko) | Method for generating a map of interest using the position of a background image, and recording medium recording the same | |
US11797854B2 (en) | Image processing device, image processing method and object recognition system | |
WO2022257346A1 (zh) | Target detection method and apparatus, device, storage medium, and computer program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20787767 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20787767 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/05/2022) |
|