WO2023231022A1 - Image recognition method, self-moving device and storage medium - Google Patents
- Publication number
- WO2023231022A1 (PCT/CN2022/096975)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- channel
- pixel
- feature map
- recognized
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- The present application relates to the field of artificial intelligence, and in particular to an image recognition method, a self-moving device and a computer-readable storage medium.
- Various embodiments of the present application provide an image recognition method, a self-moving device and a storage medium.
- This application provides an image recognition method, which includes:
- sampling the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, where the total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel;
- This application also provides an image recognition device, which includes:
- an image acquisition module configured to acquire an image feature map of the image to be recognized, where the image feature map includes multiple image channels;
- a channel sampling module configured to sample the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, where the total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel;
- an input determination module configured to use all channel feature maps corresponding to the image feature map as the input feature map for convolution processing; and
- a convolution processing module configured to perform a preset number of convolution processes on the input feature map to obtain the recognition result of the image to be recognized.
- This application also provides a self-moving device, which includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program and, when executing the computer program, to implement the above image recognition method.
- The present application also provides a computer-readable storage medium that stores a computer program.
- When executed by a processor, the computer program causes the processor to implement the above image recognition method.
- Figure 1 is a schematic flow chart of an image recognition method provided by an embodiment of the present application.
- Figure 2 is a schematic diagram of an image feature map provided by an embodiment of the present application.
- FIG. 3a is a schematic diagram of a channel diagram of an image channel provided by an embodiment of the present application.
- Figure 3b is a schematic diagram of the channel feature map corresponding to the image channel provided by the embodiment of the present application.
- Figure 4 is a schematic flow chart for obtaining a channel characteristic map provided by an embodiment of the present application.
- Figure 5a is a schematic diagram of an image to be recognized provided by an embodiment of the present application.
- Figure 5b is a schematic diagram of a recognition result of an image to be recognized provided by an embodiment of the present application.
- Figure 6 is a schematic block diagram of an image recognition device provided by an embodiment of the present application.
- FIG. 7 is a schematic structural block diagram of a mobile device provided by an embodiment of the present application.
- Embodiments of the present application provide an image recognition method, an image recognition device, a self-moving device and a storage medium.
- The image recognition method can reduce the calculation amount of image recognition, thereby improving the efficiency of image recognition.
- The self-moving device can be a small self-moving device such as a lawn mower, a patrol robot or a mine-clearing robot, or a self-moving device for sanitation and cleaning, for food delivery, or for agricultural sowing.
- Figure 1 is a schematic flow chart of an image recognition method provided by an embodiment of the present application.
- This image recognition method samples the image feature map of the image to be recognized channel by channel to obtain channel feature maps, and uses the channel feature maps as the input of the convolution processing to obtain the recognition result. This reduces the size of the feature maps input to the convolution processing, which reduces the calculation amount of image recognition and improves its efficiency.
- the image recognition method specifically includes steps S101 to S104.
- S101: Acquire an image feature map of the image to be recognized, where the image feature map includes multiple image channels.
- The image to be recognized can be captured by a camera device provided on the self-moving device.
- For example, the image to be recognized can be obtained through a camera installed on a lawn mower.
- In that case, the image to be recognized is a grass image captured by the lawn mower.
- The method of obtaining the image to be recognized is not limited here.
- The image feature map is used to characterize the image features of the image to be recognized. It can be the original image of the image to be recognized, or the image obtained after feature extraction of the image to be recognized.
- The image features can include the color features, texture features, shape features and spatial relationship features of the image.
- The image feature map includes multiple image channels, and the channel maps corresponding to the image channels are superimposed to form the image feature map.
- Figure 2 is a schematic diagram of an image feature map provided by an embodiment of the present application.
- The image channels may include three channels: image channel 1, image channel 2 and image channel 3.
- The channel maps corresponding to image channel 1, image channel 2 and image channel 3 are superimposed to form the image feature map.
- The multiple image channels may be, for example, the three image channels read using OpenCV (Open Source Computer Vision Library, a cross-platform computer vision library): the B image channel, the G image channel and the R image channel.
- S102: Sample the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel.
- The total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel.
- The pixels of the channel map of the image channel are sampled, thereby obtaining multiple channel feature maps corresponding to each image channel.
- The total number of pixels in the channel feature maps obtained must be the same as the number of pixels in the channel map of the image channel. That is, every pixel in the channel map must be sampled, so that the sampled channel feature maps contain all of the pixels and the image information of the image to be recognized is preserved without loss.
- Figure 3a is a schematic diagram of a channel diagram of an image channel provided by an embodiment of the present application
- Figure 3b is a schematic diagram of a channel feature map corresponding to the image channel provided by an embodiment of the present application.
- The numbers on the pixels in the channel map represent the labels of the pixels.
- Pixel 1, pixel 3, pixel 9 and pixel 11 can be used as sampled pixels to obtain channel feature map 11 corresponding to the image channel;
- pixel 2, pixel 4, pixel 10 and pixel 12 are used as sampled pixels to obtain channel feature map 12 corresponding to the image channel;
- pixel 5, pixel 7, pixel 13 and pixel 15 are used as sampled pixels to obtain channel feature map 13 corresponding to the image channel;
- pixel 6, pixel 8, pixel 14 and pixel 16 are used as sampled pixels to obtain channel feature map 14 corresponding to the image channel.
- Sampling then stops, and channel feature map 11, channel feature map 12, channel feature map 13 and channel feature map 14 are used as the channel feature maps corresponding to the image channel.
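The split described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's implementation: the function name `split_channel` and the `step` parameter are hypothetical, and the 4*4 layout (pixels labelled 1 to 16 row by row) is assumed from the vertex labels 1, 4, 13 and 16 given for Figure 3a.

```python
def split_channel(channel, step=2):
    """Split one channel map into step*step sub-maps by interval sampling.

    `channel` is a list of rows; `step` is the sampling stride
    (sampling interval + 1). Both names are illustrative assumptions.
    """
    maps = []
    for dy in range(step):        # row offset of the starting pixel
        for dx in range(step):    # column offset of the starting pixel
            maps.append([row[dx::step] for row in channel[dy::step]])
    return maps


# Assumed 4x4 channel map with pixels labelled 1..16, row by row.
channel = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
feature_maps = split_channel(channel)
for m in feature_maps:
    print(m)

# Every pixel lands in exactly one sub-map, so the pixel count is preserved.
total = sum(len(row) for m in feature_maps for row in m)
print(total == 16)
```

The four printed maps correspond, in order, to channel feature maps 11, 12, 13 and 14 of the example.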
- Step S102 may include step S1021 and step S1022.
- S1021: The pixels of the channel map of the image channel are grouped to obtain multiple sampling pixel groups corresponding to each image channel, where the number of pixels in each sampling pixel group is the same.
- The pixels of the channel map of each image channel can be sampled and grouped in an orderly manner.
- The sampling method may be interval sampling with a single pixel as the starting point, or interval sampling with a group of adjacent pixels as the starting point.
- In the latter case, multiple adjacent pixels are taken as a group, and interval sampling is performed with this group of pixels as the starting point.
- For example, the adjacent pixels 1 and 5 in Figure 3a can be taken as a group and sampled every other pixel; the sampled pixels then include pixels 3 and 7.
- The step of grouping the pixels of each image channel to obtain multiple sampling pixel groups of each image channel may include:
- determining a starting pixel set among all pixels of the image channel according to the preset sampling interval N, where the starting pixel set is a pixel matrix of (N+1)*(N+1) pixels and the first pixel of the pixel matrix is any boundary vertex of the corresponding image channel; and, taking each pixel in the starting pixel set as a starting pixel, sampling the pixels of the image channel at the preset sampling interval to obtain the sampling pixel group of each starting pixel.
- The starting pixel set includes multiple pixels. Each pixel in the starting pixel set is the starting point of one sampling pass over the pixels in the channel map of the image channel, that is, the starting pixel of that sampling pass.
- The starting pixel set can be determined according to the boundary of the channel map of the image channel and the preset sampling interval N; that is, it is a pixel matrix of (N+1)*(N+1) pixels whose first pixel is any boundary vertex of the corresponding image channel.
- Specifically, a starting pixel can be determined from any boundary vertex of the channel map of the image channel, and pixels near that starting pixel are then selected according to the preset sampling interval as further starting pixels, giving the starting pixel set.
- In the channel map of the image channel, pixel 1 is at the upper-left boundary vertex, pixel 4 at the upper-right boundary vertex, pixel 13 at the lower-left boundary vertex and pixel 16 at the lower-right boundary vertex; any of these boundary vertices can be selected as the first pixel of the pixel matrix.
- If the preset sampling interval N is 1 and pixel 1 at the upper-left boundary vertex is the first pixel, a (1+1)*(1+1) pixel matrix is obtained. The other pixels in the pixel matrix containing pixel 1 are pixel 2, pixel 5 and pixel 6, and these three pixels are also used as starting pixels.
- The resulting starting pixel set has four pixels in total: pixel 1, pixel 2, pixel 5 and pixel 6.
- If the preset sampling interval N is 1 and pixel 4 at the upper-right boundary vertex is the first pixel, a (1+1)*(1+1) pixel matrix is likewise obtained. The other pixels in the pixel matrix containing pixel 4 are pixel 3, pixel 7 and pixel 8, and these three pixels are also used as starting pixels.
- The resulting starting pixel set has four pixels in total: pixel 3, pixel 4, pixel 7 and pixel 8.
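As a rough illustration of how a starting pixel set could be derived from a boundary vertex and the sampling interval N, the sketch below computes the labels of the (N+1)*(N+1) pixel matrix. The function name `starting_pixel_set` and the corner names are hypothetical, and pixels are assumed to be labelled from 1 in row-major order as in the 4*4 example.

```python
def starting_pixel_set(height, width, n, corner="upper_left"):
    """Labels of the (n+1)*(n+1) starting pixel matrix anchored at a corner.

    Labels are 1-based and row-major (an assumption matching the example).
    `corner` is one of "upper_left", "upper_right", "lower_left",
    "lower_right"; these names are illustrative only.
    """
    size = n + 1
    row0 = 0 if corner.startswith("upper") else height - size
    col0 = 0 if corner.endswith("left") else width - size
    return [r * width + c + 1
            for r in range(row0, row0 + size)
            for c in range(col0, col0 + size)]


upper_left = starting_pixel_set(4, 4, 1, "upper_left")
upper_right = starting_pixel_set(4, 4, 1, "upper_right")
print(upper_left)   # [1, 2, 5, 6]
print(upper_right)  # [3, 4, 7, 8]
```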
- Sampling is performed based on the starting pixels in the starting pixel set and the preset sampling interval. Specifically, one pixel in the starting pixel set is used as the starting point, and the pixels in the channel map of the image channel are sampled at the preset sampling interval. When this sampling pass is complete, the pixels it obtained form one sampling pixel group. Another pixel in the starting pixel set is then used as the starting point of the next pass, until every pixel in the starting pixel set has been used; this gives the multiple sampling pixel groups corresponding to one image channel. Executing this process for each image channel gives the multiple sampling pixel groups corresponding to each image channel.
- The preset sampling interval may be set in advance, and its value must be chosen so that, after sampling is complete, every sampling pixel group contains the same number of pixels.
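One way to check whether a candidate sampling interval satisfies the equal-group-size requirement is to count the pixels each starting offset would collect. The sketch below does this by brute force; `group_sizes` and `interval_is_valid` are hypothetical helper names, and the check assumes the starting pixel set is anchored at the upper-left vertex.

```python
def group_sizes(height, width, n):
    """Number of pixels collected from each of the (n+1)*(n+1) starting offsets."""
    step = n + 1
    return [len(range(dy, height, step)) * len(range(dx, width, step))
            for dy in range(step)
            for dx in range(step)]


def interval_is_valid(height, width, n):
    """True when every sampling pixel group has the same number of pixels."""
    return len(set(group_sizes(height, width, n))) == 1


print(interval_is_valid(4, 4, 1))      # True: four groups of 4 pixels
print(interval_is_valid(320, 320, 1))  # True
print(interval_is_valid(5, 4, 1))      # False: group sizes 6, 6, 4, 4
```

In this sketch the groups are equal in size exactly when both the height and the width are divisible by N+1.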
- The step of obtaining the image feature map of the image to be recognized may include: preprocessing the image to be recognized, the preprocessing including adjusting the resolution of the image to be recognized; and performing feature extraction on the preprocessed image to be recognized to obtain the image feature map.
- Preprocessing the image to be recognized can also include cropping, smoothing, filtering and similar operations.
- Preprocessing eliminates irrelevant information in the image to be recognized, recovers useful real information, enhances the detectability of relevant information and simplifies the data as far as possible, thereby increasing the reliability of image recognition.
- Adjusting the resolution brings each image to be recognized to the preset resolution, so that the number of pixels involved in image recognition is the same for every image. There is then no need to adjust the value of the preset sampling interval according to the resolution of the image to be recognized, which reduces the calculation amount of subsequent image recognition.
- In other words, the number of pixels in the images to be recognized is made consistent, so that the same sampling interval fits every image to be recognized and does not have to be adjusted to the size of each image, thereby reducing the amount of calculation during image recognition and improving its efficiency.
- Pixel 1 in the starting pixel set is used as the starting pixel for sampling. Horizontal sampling at an interval of one pixel from pixel 1 gives pixel 3. Vertical sampling at an interval of one pixel from the position of pixel 1 gives pixel 9, and horizontal sampling at an interval of one pixel from there gives pixel 11. This yields the sampling pixel group with pixel 1 as the starting pixel, which includes pixel 1, pixel 3, pixel 9 and pixel 11.
- Pixel 2 in the starting pixel set is used as the starting pixel for sampling. Horizontal sampling at an interval of one pixel gives pixel 4. Vertical sampling at an interval of one pixel from the position of pixel 2 gives pixel 10, and horizontal sampling at an interval of one pixel from there gives pixel 12, after which the third row of pixels has no other sampleable pixels. With pixel 2 as the starting pixel, the channel map of the image channel then has no other sampleable pixels, and this sampling pass is complete. This yields the sampling pixel group with pixel 2 as the starting pixel, which includes pixel 2, pixel 4, pixel 10 and pixel 12.
- Pixel 5 in the starting pixel set is used as the starting pixel for sampling. Horizontal sampling at an interval of one pixel from pixel 5 gives pixel 7. Vertical sampling at an interval of one pixel from the position of pixel 5 gives pixel 13, and horizontal sampling at an interval of one pixel from there gives pixel 15. This yields the sampling pixel group with pixel 5 as the starting pixel, which includes pixel 5, pixel 7, pixel 13 and pixel 15.
- Pixel 6 in the starting pixel set is used as the starting pixel for sampling. Horizontal sampling at an interval of one pixel gives pixel 8, after which that row has no other sampleable pixels. Vertical sampling at an interval of one pixel from the position of pixel 6 gives pixel 14, and horizontal sampling at an interval of one pixel from there gives pixel 16. This yields the sampling pixel group with pixel 6 as the starting pixel, which includes pixel 6, pixel 8, pixel 14 and pixel 16.
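The four sampling passes walked through above can be reproduced with a short sketch that, for a given starting pixel, steps across each row and then down the rows at the preset interval. The helper name `sample_group` is hypothetical, and the 1-based row-major labelling of the 4*4 channel map is assumed from the example.

```python
def sample_group(height, width, start, n):
    """Labels reached from `start` when skipping n pixels between samples.

    `start` is a 1-based, row-major pixel label (assumed labelling).
    """
    step = n + 1
    row0, col0 = divmod(start - 1, width)
    return [r * width + c + 1
            for r in range(row0, height, step)   # vertical sampling
            for c in range(col0, width, step)]   # horizontal sampling


# One sampling pass per pixel of the starting pixel set {1, 2, 5, 6}.
groups = {s: sample_group(4, 4, s, 1) for s in (1, 2, 5, 6)}
for start, group in groups.items():
    print(start, group)

# Together the groups cover every pixel exactly once.
covered = sorted(p for g in groups.values() for p in g)
print(covered == list(range(1, 17)))
```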
- S1022: The pixels in each sampling pixel group are combined into an image to obtain the multiple channel feature maps corresponding to each image channel. That is, for each image channel, the number of sampling pixel groups is the same as the number of channel feature maps obtained. For example, if four sampling pixel groups are sampled from an image channel, the number of channel feature maps corresponding to that image channel is also 4.
- The step of combining the pixels in the sampling pixel groups of each image channel to obtain the multiple channel feature maps corresponding to each image channel may include:
- splicing the pixels of a sampling pixel group according to their positions to obtain the channel feature map of that sampling pixel group; and completing the pixel splicing for all sampling pixel groups to obtain the multiple channel feature maps corresponding to each image channel.
- When the pixels in a sampling pixel group are combined, the combination can be based on the position of each pixel of the group in the image feature map, so that the feature information of the image feature map is not lost.
- For the multiple sampling pixel groups of each image channel, the pixels are spliced according to the position of each pixel in the image feature map; after all sampling pixel groups have been spliced, the multiple channel feature maps corresponding to each image channel are obtained.
- For example, for the sampling pixel group containing pixel 6, pixel 8, pixel 14 and pixel 16, the splicing can be based on the positions of these four pixels in the image feature map: pixel 6 is to the left of pixel 8, pixel 6 is above pixel 14, pixel 16 is to the right of pixel 14, and pixel 16 is below pixel 8. Splicing the four pixels in this arrangement gives channel feature map 14 shown in Figure 3b.
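A minimal sketch of position-based splicing follows. It arranges the pixels of a sampling pixel group into rows according to their original coordinates, so relative positions (left of, above, and so on) are preserved. The function name `splice` is hypothetical; for simplicity the sketch works on pixel labels rather than pixel values and assumes 1-based row-major labels.

```python
def splice(group, width):
    """Arrange sampled pixel labels into a 2-D map by their original positions."""
    # Convert each label to (row, column) in the original channel map.
    coords = sorted(divmod(label - 1, width) for label in group)
    rows = sorted({r for r, _ in coords})
    # Pixels from the same original row stay in the same output row,
    # ordered left to right by their original column.
    return [[r * width + c + 1 for (rr, c) in coords if rr == r]
            for r in rows]


feature_map_14 = splice([16, 6, 14, 8], width=4)
print(feature_map_14)  # [[6, 8], [14, 16]]
```

The input order of the group does not matter here, because the arrangement is derived from the original coordinates alone.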
- S103: Use all channel feature maps corresponding to the image feature map as the input feature map for convolution processing.
- All channel feature maps corresponding to the image feature map are the collection of the multiple channel feature maps corresponding to each image channel.
- The multiple channel feature maps corresponding to each image channel are superimposed, and the superimposed channel feature maps are used as the input feature map for convolution processing, so that the image recognition convolutional network can convolve these channel feature maps.
- Splicing the sampled pixels into channel feature maps reduces the size of each channel feature map, so that each channel feature map is reduced by a factor of n relative to the original image feature map, while the number of image channels of the input feature map is increased.
- For example, for an image feature map with a resolution of 320*320 and 3 channels, sampling every other pixel and combining the sampled pixels gives channel feature maps of size 160*160. Each channel feature map is 1/4 of the size of the original image feature map, with its width and height each reduced to 1/2 of the original.
- Each channel map yields 4 channel feature maps, so after the multiple channel feature maps corresponding to each image channel are superimposed, the number of image channels of the input feature map used for convolution processing is 4 times the original.
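The arithmetic in this example can be written out as a small helper. This is an illustrative sketch only; the name `sampled_shape` is hypothetical, and it assumes the height and width are divisible by N+1 so that all groups have the same size.

```python
def sampled_shape(height, width, channels, n):
    """Shape (height, width, channels) of the input feature map after sampling."""
    step = n + 1
    if height % step or width % step:
        raise ValueError("height and width must be divisible by n + 1")
    return height // step, width // step, channels * step * step


shape = sampled_shape(320, 320, 3, 1)
print(shape)  # (160, 160, 12)

# The total number of pixels is unchanged by the rearrangement.
print(shape[0] * shape[1] * shape[2] == 320 * 320 * 3)
```

With N = 1 the 3 original channels become 12 channels of size 160*160, which is the 4-fold channel expansion described above.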
- The channel feature maps obtained by sampling and splicing pixels channel by channel ensure that the input feature map of the image recognition convolutional network loses no information, while reducing the size of the input feature map of the image recognition convolutional network.
- Superimposing the multiple channel feature maps corresponding to each image channel increases the number of image channels of the input feature map used for convolution processing, and also reduces the calculation amount of the image recognition convolutional network during image recognition, thereby improving image recognition speed.
- S104: Perform a preset number of convolution processes on the input feature map to obtain the recognition result of the image to be recognized.
- The image recognition convolutional network performs a preset number of convolution processes on the input feature map to obtain the recognition result of the image to be recognized.
- The recognition result of the image to be recognized includes the image boundaries identified between different areas of the image to be recognized, and the result of image segmentation of the image to be recognized.
- Because the input feature map includes multiple channel feature maps, each channel feature map is convolved separately, and multiple convolution processing results are obtained.
- Feature fusion is performed on the multiple convolution processing results to obtain the final recognition result of the image to be recognized.
- Alternatively, a pre-recognition result is obtained for each channel feature map.
- The multiple pre-recognition results corresponding to the multiple channel feature maps are then spliced to obtain the final recognition result.
- For example, the resnet50 residual network can be used as the backbone network of the image recognition convolutional network to perform image recognition on an image to be recognized with a resolution of 320*320 and 3 image channels. In the resnet50 network, the image feature map of the image to be recognized is processed to obtain the channel feature map, and the channel feature map is used as the input data of the resnet50 network for convolution processing.
- Table 1 compares the size and number of channels of the output data of each layer of the image recognition convolutional network when the channel feature map is used as the input data of the resnet50 network with those obtained when the image feature map of the image to be recognized is used as the input data.
- Table 1 (output size*number of channels for each layer)

  Layer | Channel feature map as input | Image feature map as input
  1     | 320*320*32                   | 320*320*32
  2     | 160*160*64                   | 320*320*64
  3     | 80*80*128                    | 160*160*128
  4     | 40*40*256                    | 80*80*256
  5     | 20*20*512                    | 40*40*512
  6     | 10*10*1024                   | 20*20*1024
- The channel feature map is obtained, and the obtained channel feature map is used as the input data of the second layer. Table 1 therefore shows that, whether the image feature map or the channel feature map is used as the input data of the resnet50 network, the output data of each layer has the same number of output channels, but the size of the output feature map differs by a factor of 4.
- With the channel feature map as input, the output feature map of each layer of the resnet50 network from the second layer onward is only 1/4 of the size of the output feature map obtained with the image feature map as input. The smaller output feature map reduces the calculation amount of convolution and makes model inference faster.
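The factor of 4 can be checked directly from the Table 1 figures. The sketch below simply copies the table values and compares the number of output positions (height times width) per layer; the names `TABLE_1` and `spatial_ratio` are hypothetical. For a standard convolution with the same kernel size and channel counts, the number of multiply-accumulate operations is proportional to the number of output positions, so this ratio is a rough proxy for the per-layer saving, not a measured speed-up.

```python
# (layer, output with channel feature map as input, output with image feature map as input)
# Each output is (height, width, channels), copied from Table 1.
TABLE_1 = [
    (1, (320, 320, 32), (320, 320, 32)),
    (2, (160, 160, 64), (320, 320, 64)),
    (3, (80, 80, 128), (160, 160, 128)),
    (4, (40, 40, 256), (80, 80, 256)),
    (5, (20, 20, 512), (40, 40, 512)),
    (6, (10, 10, 1024), (20, 20, 1024)),
]


def spatial_ratio(with_sampling, without_sampling):
    """Ratio of output positions (height*width) without vs. with sampling."""
    return (without_sampling[0] * without_sampling[1]) / (
        with_sampling[0] * with_sampling[1])


ratios = {layer: spatial_ratio(a, b) for layer, a, b in TABLE_1}
for layer, ratio in ratios.items():
    print(layer, ratio)
```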
- Processing the image feature map of the image to be recognized and using the obtained channel feature map as the input feature map reduces the size of the output data of each layer of the convolutional network.
- The smaller the model parameter amount (param), the smaller the calculation amount of convolution, which in turn makes inference of the image recognition convolutional network faster.
- The instance segmentation model Yolact can also be used, with a resolution of 320*320.
- The step of performing a preset number of convolution processes on the input feature map to obtain the recognition result of the image to be recognized may include: performing a preset number of convolution calculations on the input feature map; and normalizing the results of the convolution calculations to obtain the recognition result of the image to be recognized.
- the recognition result of the image to be recognized includes the target area.
- the image recognition method also includes: obtaining the color characteristics of the target area; performing masking processing on the target area according to the color characteristics to obtain a mask map of the target area.
- the recognition result of the image to be recognized includes the target area.
- the image recognition method also includes: obtaining a label of the target area; performing masking processing on the target area according to the label of the target area to obtain a mask image of the target area.
- The target area refers to the movable area of the self-moving device.
- For example, the image to be recognized obtained by the self-moving device is shown in Figure 5a.
- The green grass area and the yellow dead grass area can be used as the movable area of the lawn mower robot. The movable area is the grass area where the lawn mower robot operates; the immovable area is the non-grass area, such as trees, side paths and moving vehicles.
- The image recognition result obtained is shown in Figure 5b, where the target area is the white area, that is, the movable area; the black area is the immovable area; and the boundary between the white area and the black area is the edge of the grassland.
- The target area is marked according to its label; for example, movable areas are marked as 1 and immovable areas are marked as 0. The same mask is applied to target areas with the same color feature, or to target areas with the same label.
- Alternatively, the color characteristics of the target area can be obtained, so that the target area can be masked based on its color characteristics.
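Label-based masking of this kind can be sketched as follows. The constants and the function name `mask_from_labels` are hypothetical, and the choice of 255 for white (movable) and 0 for black (immovable) is an assumption made to match the description of Figure 5b, not a value stated in the application.

```python
MOVABLE, IMMOVABLE = 1, 0   # labels as in the example: movable = 1, immovable = 0
WHITE, BLACK = 255, 0       # assumed 8-bit grey levels for the mask image


def mask_from_labels(labels):
    """Turn a per-pixel label map into a black-and-white mask image."""
    return [[WHITE if value == MOVABLE else BLACK for value in row]
            for row in labels]


# Tiny illustrative label map: grass (1) in the lower part, non-grass (0) above.
labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
mask = mask_from_labels(labels)
for row in mask:
    print(row)
```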
- The image recognition method samples the pixels of each image channel in the image feature map, so that the pixels in the resulting channel feature maps retain all the information of the image feature map while the number of image channels is increased, ensuring the integrity of the image information in the image to be recognized.
- Figure 6 is a schematic block diagram of an image recognition device further provided by an embodiment of the present application.
- the image recognition device is used to execute the aforementioned image recognition method.
- the image recognition device can be configured in a mobile device.
- the server can be an independent server or a server cluster.
- the terminal can be an electronic device such as a mobile phone, tablet, laptop, desktop computer, personal digital assistant, and wearable device.
- the image recognition device 200 includes: an image acquisition module 201 , a channel sampling module 202 , an input determination module 203 and a convolution processing module 204 .
- the image acquisition module 201 is configured to acquire an image feature map of an image to be recognized, where the image feature map includes multiple image channels.
- the channel sampling module 202 is configured to sample the pixel points of each image channel to obtain multiple channel feature maps corresponding to each image channel.
- the pixel points of the multiple channel feature maps corresponding to each image channel are The total number is equal to the total number of pixels in each image channel.
- the input determination module 203 is configured to use all channel feature maps corresponding to the image feature map as input feature maps for convolution processing.
- the convolution processing module 204 is configured to perform a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized.
- the above image recognition device can be implemented in the form of a computer program, and the computer program can be run on a self-moving device as shown in Figure 7.
- FIG. 7 is a schematic structural block diagram of a self-moving device provided by an embodiment of the present application.
- the self-moving device can be a small self-moving device such as a lawn mower, a patrol robot, or a mine-clearing robot.
- the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
- Non-volatile storage media stores operating systems and computer programs.
- the computer program includes program instructions, which when executed, can cause the processor to perform any image recognition method.
- the processor is used to provide computing and control capabilities to support the operation of the entire computer device.
- the internal memory provides an environment for the execution of the computer program in the non-volatile storage medium.
- when the computer program is executed by the processor, it can cause the processor to execute any of the image recognition methods.
- the network interface is used for network communication, such as sending assigned tasks.
- Those skilled in the art can understand that the structure shown in Figure 7 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution is applied. A specific computer device may include more or fewer parts than shown, combine certain parts, or have a different arrangement of parts.
- the processor can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
- the processor is used to run a computer program stored in the memory to implement the following steps:
- An image feature map of the image to be recognized is obtained, where the image feature map includes multiple image channels.
- the pixels of each image channel are sampled to obtain multiple channel feature maps corresponding to each image channel.
- the total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel.
- All channel feature maps corresponding to the image feature map are used as input feature maps for convolution processing.
- when implementing the sampling of the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, the processor is configured to implement:
- the pixel points of each image channel are grouped to obtain multiple sampled pixel point groups of each image channel.
- the pixels in the sampling pixel group of each image channel are combined to obtain multiple channel feature maps corresponding to each image channel.
- when implementing the grouping of the pixel points of each image channel to obtain multiple sampled pixel point groups of each image channel, the processor is configured to implement:
- a starting pixel point set is determined among all pixels of the image channel according to the preset sampling interval N; the starting pixel point set is a pixel matrix of (N+1)*(N+1) pixels, and the first pixel point of the pixel matrix is any boundary vertex of the corresponding image channel.
- the pixel points of the image channel are sampled at a preset sampling interval to obtain a sampled pixel point group for each starting pixel point.
- when implementing the combination of the pixels in the sampled pixel groups of each image channel to obtain multiple channel feature maps corresponding to each image channel, the processor is configured to implement:
- each pixel is spliced according to the position of each pixel in the sampled pixel group in the image feature map to obtain a channel feature map of the sampled pixel group.
- when implementing the acquisition of the image feature map of the image to be recognized, the processor is configured to implement:
- the image to be recognized is preprocessed, and the preprocessing includes adjusting the resolution of the image to be recognized.
- Feature extraction is performed on the preprocessed image to be recognized to obtain an image feature map.
- when performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized, the processor is configured to implement:
- the results obtained by the convolution calculation are normalized to obtain the recognition result of the image to be recognized.
- when the recognition result of the image to be recognized includes a target area and the preset number of convolution operations has been performed on the input feature map to obtain that recognition result, the processor is configured to implement:
- Mask processing is performed on the target area according to the color characteristics to obtain a mask image of the target area.
- when the recognition result of the image to be recognized includes a target area and the preset number of convolution operations has been performed on the input feature map to obtain that recognition result, the processor is configured to implement:
- Mask processing is performed on the target area according to the label of the target area to obtain a mask image of the target area.
- Embodiments of the present application also provide a computer-readable storage medium.
- the computer-readable storage medium stores a computer program.
- the computer program includes program instructions.
- the processor executes the program instructions to implement any image recognition method provided by the embodiments of the present application.
- the computer-readable storage medium may be an internal storage unit of the self-moving device described in the previous embodiment, such as a hard disk or memory of the self-moving device.
- the computer-readable storage medium may also be an external storage device of the self-moving device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the self-moving device.
Abstract
An image recognition method, comprising: acquiring an image feature map of an image to be recognized, the image feature map comprising a plurality of image channels; sampling pixel points of the image channels to obtain a plurality of channel feature maps corresponding to the image channels, the total number of pixel points of the plurality of channel feature maps corresponding to the image channels being equal to the total number of the pixel points of the image channels; using all the channel feature maps corresponding to the image feature map as an input feature map of convolution processing; and performing a preset number of times of convolution processing on the input feature map to obtain the recognition result of said image.
Description
The present application relates to the field of artificial intelligence, and in particular to an image recognition method, a self-moving device, and a computer-readable storage medium.
The statements herein merely provide background information related to the present application and do not necessarily constitute exemplary technology.
With the continuous advancement of computer technology and artificial intelligence technology, automatic working systems based on self-moving devices have gradually entered people's lives. While working, a self-moving device usually needs to capture images in real time and to segment and recognize regions of those images, in order to determine the working area and plan paths within it. For example, panoptic segmentation or instance segmentation from deep learning can be used to divide a captured image into working and non-working areas, so that the robot's work is controlled according to the recognized working area. However, this kind of image segmentation and recognition is computationally expensive, and the segmentation is inefficient.
Summary of the invention
Various embodiments of the present application provide an image recognition method, a self-moving device, and a storage medium.
In a first aspect, the present application provides an image recognition method, the method including:
acquiring an image feature map of an image to be recognized, the image feature map including multiple image channels;
sampling the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, where the total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel;
using all channel feature maps corresponding to the image feature map as the input feature map for convolution processing;
performing a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized.
In a second aspect, the present application also provides an image recognition device, the device including:
an image acquisition module configured to acquire an image feature map of an image to be recognized, the image feature map including multiple image channels;
a channel sampling module configured to sample the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, where the total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel;
an input determination module configured to use all channel feature maps corresponding to the image feature map as the input feature map for convolution processing;
a convolution processing module configured to perform a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized.
In a third aspect, the present application also provides a self-moving device that includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program and, when executing it, to implement the image recognition method described above.
In a fourth aspect, the present application also provides a computer-readable storage medium that stores a computer program; when executed by a processor, the computer program causes the processor to implement the image recognition method described above.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the application will become apparent from the description, the drawings, and the claims.
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flow chart of an image recognition method provided by an embodiment of the present application.
Figure 2 is a schematic diagram of an image feature map provided by an embodiment of the present application.
Figure 3a is a schematic diagram of the channel map of an image channel provided by an embodiment of the present application.
Figure 3b is a schematic diagram of the channel feature maps corresponding to that image channel provided by an embodiment of the present application.
Figure 4 is a schematic flow chart for obtaining channel feature maps provided by an embodiment of the present application.
Figure 5a is a schematic diagram of an image to be recognized provided by an embodiment of the present application.
Figure 5b is a schematic diagram of a recognition result of an image to be recognized provided by an embodiment of the present application.
Figure 6 is a schematic block diagram of an image recognition device provided by an embodiment of the present application.
Figure 7 is a schematic structural block diagram of a self-moving device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
The flow charts shown in the accompanying drawings are only examples; they need not include all contents and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps can be decomposed, combined, or partially merged, so the actual order of execution may change according to the actual situation.
It should be understood that the terminology used in the specification of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly dictates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Embodiments of the present application provide an image recognition method, an image recognition device, a self-moving device, and a storage medium. The image recognition method can reduce the amount of computation required for image recognition and thereby improve its efficiency. The self-moving device can be a small self-moving device such as a lawn mower, a patrol robot, or a mine-clearing robot, or it can be a sanitation and cleaning device, a food delivery device, or an agricultural sowing device.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and the features in them may be combined with each other.
Please refer to Figure 1, which is a schematic flow chart of an image recognition method provided by an embodiment of the present application. The method samples the image feature map of the image to be recognized channel by channel to obtain channel feature maps, and uses those channel feature maps as the input of the convolution processing to obtain the recognition result. This reduces the size of each channel feature map fed into the convolution processing, which reduces the amount of computation and improves the efficiency of image recognition.
As shown in Figure 1, the image recognition method includes steps S101 to S104.
S101. Acquire an image feature map of the image to be recognized, where the image feature map includes multiple image channels.
The image to be recognized can be captured by a camera mounted on the self-moving device. For example, when the self-moving device is a lawn mower, the image can be obtained from a camera on the lawn mower, in which case it is an image of the lawn captured by the mower. Alternatively, the image can be uploaded via Bluetooth, WiFi, or locally; the way the image is obtained is not limited here. The image feature map characterizes the image features of the image to be recognized. It can be the original image itself or a map obtained from it by feature extraction. Image features can include color features, texture features, shape features, and spatial relationship features.
The image feature map includes multiple image channels, and the channel maps corresponding to the individual image channels are stacked to form the image feature map. Please refer to Figure 2, which is a schematic diagram of an image feature map provided by an embodiment of the present application. As shown in Figure 2, the image channels may include three channels, image channel 1, image channel 2, and image channel 3, whose channel maps are stacked to form the image feature map. In a specific implementation, the multiple image channels may be, for example, the B, G, and R image channels read using OpenCV (Open Source Computer Vision Library, a cross-platform computer vision library).
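The relationship between an image feature map and its channel maps can be illustrated with a minimal sketch. It assumes the image is held as a nested list of height x width x channels in B, G, R order; the function name `split_channels` and the sample values are hypothetical and only serve the illustration.

```python
def split_channels(image):
    """Split an H x W x C nested-list image into C separate H x W channel maps."""
    num_channels = len(image[0][0])
    return [
        [[pixel[c] for pixel in row] for row in image]
        for c in range(num_channels)
    ]

# Hypothetical 2 x 2 image whose pixels are stored in (B, G, R) order.
image = [
    [(10, 20, 30), (11, 21, 31)],
    [(12, 22, 32), (13, 23, 33)],
]

b_channel, g_channel, r_channel = split_channels(image)
print(b_channel)  # channel map of the B image channel
print(r_channel)  # channel map of the R image channel
```

With OpenCV, an image loaded by `cv2.imread` is already in B, G, R order and the same split is commonly done with `cv2.split`; the pure-Python version above only makes the stacking explicit.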
S102. Sample the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, where the total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to the total number of pixels of that image channel.
For every image channel in the image feature map, the pixels of that channel's channel map are sampled to obtain the multiple channel feature maps corresponding to the channel.
After sampling, the total number of pixels in the resulting channel feature maps must equal the number of pixels in the channel map of the image channel. In other words, every pixel in the channel map must be sampled, so that the channel feature maps are complete and no image information from the image to be recognized is lost.
Please refer to Figures 3a and 3b. Figure 3a is a schematic diagram of the channel map of an image channel provided by an embodiment of the present application, and Figure 3b is a schematic diagram of the channel feature maps corresponding to that image channel. In Figure 3a, the number on each pixel of the channel map is the label of that pixel.
When sampling the pixels of this image channel, pixels 1, 3, 9, and 11 can be taken as one set of sampled pixels to obtain channel feature map 11; pixels 2, 4, 10, and 12 can be taken to obtain channel feature map 12; pixels 5, 7, 13, and 15 can be taken to obtain channel feature map 13; and pixels 6, 8, 14, and 16 can be taken to obtain channel feature map 14.
Sampling is repeated until every pixel in the channel map of the image channel has been sampled, then stopped, and the resulting channel feature maps 11, 12, 13, and 14 are used as the channel feature maps corresponding to this image channel.
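The example of Figures 3a and 3b can be reproduced with a short sketch. It assumes the 4 x 4 channel map is labeled 1 to 16 in row-major order, which is consistent with the pixel groups listed above; the function name `sample_channel` is hypothetical.

```python
def sample_channel(channel, stride):
    """Split one H x W channel map into stride * stride sub-maps by strided sampling."""
    return [
        [row[col_start::stride] for row in channel[row_start::stride]]
        for row_start in range(stride)
        for col_start in range(stride)
    ]

# 4 x 4 channel map with pixel labels as assumed for Figure 3a.
channel = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

feature_maps = sample_channel(channel, stride=2)
for feature_map in feature_maps:
    print(feature_map)

# Every pixel is sampled exactly once, so the pixel count is preserved.
total_pixels = sum(len(row) for fm in feature_maps for row in fm)
```

The four sub-maps correspond, in order, to channel feature maps 11, 12, 13, and 14 in the text.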
In one embodiment, please refer to Figure 4, which is a schematic flow chart for obtaining channel feature maps provided by an embodiment of the present application. As shown in Figure 4, step S102 may include steps S1021 and S1022.
S1021. Group the pixels of each image channel to obtain multiple sampled pixel groups for each image channel.
For each image channel, the pixels of its channel map are grouped to obtain the multiple sampled pixel groups corresponding to that channel. Every sampled pixel group contains the same number of pixels.
In a specific implementation, the pixels of each channel map can be sampled and grouped in an ordered way. Interval sampling can start from a single pixel, or from a group of adjacent pixels. In the latter case, several adjacent pixels are treated as one group, and interval sampling starts from that group. For example, adjacent pixels 1 and 5 in Figure 3a can be taken as a group and sampled with an interval of 1; the sampled pixel group then includes pixels 3 and 7.
In one embodiment, the step of grouping the pixels of each image channel to obtain multiple sampled pixel groups for each image channel may include:
determining a starting pixel set among all pixels of the image channel according to a preset sampling interval N, where the starting pixel set is a pixel matrix of (N+1)*(N+1) pixels whose first pixel is any boundary vertex of the corresponding image channel; and, taking each pixel in the starting pixel set as a starting pixel, sampling the pixels of the image channel at the preset sampling interval to obtain a sampled pixel group for each starting pixel.
When sampling an image channel, the starting pixel set is first determined from the pixels in the channel map of that channel. The starting pixel set contains multiple pixels, and each of them serves as the starting point of one sampling pass over the channel map, that is, as the starting pixel of that pass.
In the embodiments of the present application, the starting pixel set can be determined from the boundary of the channel map and the preset sampling interval N. That is, the starting pixel set is a pixel matrix of (N+1)*(N+1) pixels whose first pixel is any boundary vertex of the corresponding image channel.
In a specific implementation, one starting pixel can be determined from any boundary vertex of the channel map, and then, based on that pixel and the preset sampling interval, the pixels near it are also selected as starting pixels, which gives the starting pixel set.
For example, in Figure 3a the boundary vertices of the channel map are pixel 1 (upper left), pixel 4 (upper right), pixel 13 (lower left), and pixel 16 (lower right); any of them can be selected as the first pixel of the pixel matrix.
If the preset sampling interval N is 1 and pixel 1 at the upper-left boundary vertex is the first pixel, a (1+1)*(1+1) pixel matrix is obtained, that is, pixels 2, 5, and 6, which lie in the same pixel matrix as pixel 1, are also taken as starting pixels. The starting pixel set then contains four pixels: pixels 1, 2, 5, and 6.
If the preset sampling interval N is 1 and pixel 4 at the upper-right boundary vertex is the first pixel, a (1+1)*(1+1) pixel matrix is obtained, that is, pixels 3, 7, and 8, which lie in the same pixel matrix as pixel 4, are also taken as starting pixels. The starting pixel set then contains four pixels: pixels 3, 4, 7, and 8.
After the starting pixel set is determined, sampling is performed according to the starting pixels in the set and the preset sampling interval. Specifically, one pixel in the starting pixel set is used as the sampling starting point, the pixels in the channel map are sampled at the preset sampling interval, and the pixels obtained in this pass form one sampled pixel group. Another pixel in the starting pixel set is then used as the starting point of the next pass, until every pixel in the starting pixel set has been used, which yields the multiple sampled pixel groups of one image channel. Performing this process for every image channel yields the multiple sampled pixel groups corresponding to each image channel.
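The grouping procedure can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the upper-left boundary vertex is chosen as the first pixel, represents starting pixels by hypothetical (row, column) positions, and uses the hypothetical function names `starting_pixel_set`, `sampled_pixel_group`, `channel_feature_maps`, and `build_input_feature_map`.

```python
def starting_pixel_set(interval):
    """Positions of the (N+1) x (N+1) starting block anchored at the upper-left vertex."""
    size = interval + 1
    return [(r, c) for r in range(size) for c in range(size)]

def sampled_pixel_group(channel, start, interval):
    """Sample from `start`, skipping `interval` pixels between samples on both axes.

    Pixels keep their relative positions, so the result is already the
    stitched channel feature map for this starting pixel.
    """
    step = interval + 1
    row_start, col_start = start
    return [row[col_start::step] for row in channel[row_start::step]]

def channel_feature_maps(channel, interval):
    """One sampled pixel group (channel feature map) per starting pixel."""
    return {
        start: sampled_pixel_group(channel, start, interval)
        for start in starting_pixel_set(interval)
    }

def build_input_feature_map(channels, interval):
    """Collect the channel feature maps of all image channels as the convolution input."""
    stacked = []
    for channel in channels:
        stacked.extend(channel_feature_maps(channel, interval).values())
    return stacked

channel = [[4 * r + c + 1 for c in range(4)] for r in range(4)]  # labels 1..16
groups = channel_feature_maps(channel, interval=1)
inputs = build_input_feature_map([channel, channel, channel], interval=1)
print(list(groups))   # starting positions, i.e. pixels 1, 2, 5 and 6
print(len(inputs))    # 3 image channels x 4 groups each
```

With N = 1 the starting positions (0, 0), (0, 1), (1, 0), and (1, 1) correspond to pixels 1, 2, 5, and 6 of Figure 3a, and a three-channel feature map yields twelve smaller maps as the convolution input.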
In a specific implementation, the preset sampling interval can be set in advance, and its value must be chosen so that, after sampling is completed, every sampled pixel group contains the same number of pixels.
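The equal-size requirement can be made explicit with a short derivation. It assumes a channel map of height H and width W (symbols introduced here for illustration), a step s = N + 1 between sampled pixels, and starting pixels at offsets (a, b) inside the (N+1)*(N+1) starting block; i and j range over the non-negative integers.

```latex
\[
G_{a,b} = \{\, (a + s\,i,\; b + s\,j) \;:\; a + s\,i < H,\; b + s\,j < W \,\},
\qquad s = N + 1,\quad 0 \le a, b < s .
\]
\[
s \mid H \ \text{and}\ s \mid W
\;\Longrightarrow\;
|G_{a,b}| = \frac{H}{s}\cdot\frac{W}{s}\ \text{ for all } (a,b),
\qquad
\sum_{a=0}^{s-1}\sum_{b=0}^{s-1} |G_{a,b}| = s^{2}\cdot\frac{HW}{s^{2}} = HW .
\]
```

For the 4 x 4 channel map of Figure 3a with N = 1, this gives four groups of four pixels each, 16 pixels in total, which matches the requirement that the channel feature maps together contain every pixel of the image channel.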
由于预设采样间隔数值的设置需要保证在采样完成后,每个采样像素点组中的像素点个数相,因此,在一实施例中,所述获取待识别图像的图像特征图 的步骤可以包括:对待识别图像进行预处理,所述预处理包括调整待识别图像的分辨率;将预处理后的待识别图像进行特征提取,得到图像特征图。Since the setting of the preset sampling interval value needs to ensure that the number of pixels in each sampling pixel group is the same after the sampling is completed, therefore, in one embodiment, the step of obtaining the image feature map of the image to be recognized can be It includes: preprocessing the image to be recognized, the preprocessing including adjusting the resolution of the image to be recognized; performing feature extraction on the preprocessed image to be recognized to obtain an image feature map.
Preprocessing the image to be recognized may also include cropping, smoothing, filtering, and the like. Preprocessing removes irrelevant information from the image, recovers the useful real information, enhances the detectability of the relevant information, and simplifies the data as far as possible, thereby improving the reliability of image recognition.
For a given image, the higher the resolution, the more pixels the image contains and the clearer it is. Before image recognition is performed, the resolution of the image to be recognized is adjusted to a preset resolution, so that the same number of pixels takes part in recognition for every image. The preset sampling interval therefore does not need to be adjusted to the resolution of each image, which reduces the amount of computation in subsequent recognition. In other words, preprocessing makes the pixel count of all images to be recognized consistent, so a single sampling interval suits every image and no per-image adjustment of the interval is needed. This reduces the computation required for recognition and improves its efficiency.
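The requirement that every sampled pixel group contain the same number of pixels can be checked with a short sketch. The function name `group_sizes` and the use of NumPy slicing are illustrative assumptions, not part of the application; the sketch only assumes that sampling with an interval of N means taking every (N+1)-th pixel in each direction.

```python
import numpy as np

def group_sizes(height, width, interval):
    """Return the pixel count of every sampled group for one channel map."""
    step = interval + 1  # skipping `interval` pixels gives a stride of interval + 1
    channel = np.zeros((height, width))
    return [channel[r::step, c::step].size
            for r in range(step) for c in range(step)]

sizes_320 = group_sizes(320, 320, 1)  # preset resolution from the example: equal groups
sizes_5 = group_sizes(5, 5, 1)        # odd resolution: groups differ in size
print(sizes_320)
print(sizes_5)
```

Under these assumptions the groups are equal whenever the height and width are both multiples of N+1, which is one reason to resize every image to a preset resolution first.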
The sampling method described above is illustrated below with reference to Figures 3a and 3b. If the preset sampling interval is set to 1 and the starting pixel set includes pixel 1, pixel 2, pixel 5 and pixel 6, the sampling process is as follows:
Pixel 1 in the starting pixel set is used as the starting pixel. Sampling horizontally from pixel 1 with an interval of 1 (skipping one pixel) yields pixel 3, after which no other pixel in the first row can be sampled. Sampling vertically from the position of pixel 1 with an interval of 1 yields pixel 9, and sampling horizontally from there with an interval of 1 yields pixel 11, after which no other pixel in the third row can be sampled. With pixel 1 as the starting pixel, no other pixel in the channel map of the image channel can now be sampled, so this sampling pass is complete. The resulting sampled pixel group for starting pixel 1 includes pixel 1, pixel 3, pixel 9 and pixel 11.
Pixel 2 in the starting pixel set is used as the starting pixel. Sampling horizontally from pixel 2 with an interval of 1 yields pixel 4, after which no other pixel in the first row can be sampled. Sampling vertically from the position of pixel 2 with an interval of 1 yields pixel 10, and sampling horizontally from there with an interval of 1 yields pixel 12, after which no other pixel in the third row can be sampled. With pixel 2 as the starting pixel, no other pixel in the channel map of the image channel can now be sampled, so this sampling pass is complete. The resulting sampled pixel group for starting pixel 2 includes pixel 2, pixel 4, pixel 10 and pixel 12.
Pixel 5 in the starting pixel set is used as the starting pixel. Sampling horizontally from pixel 5 with an interval of 1 yields pixel 7, after which no other pixel in the second row can be sampled. Sampling vertically from the position of pixel 5 with an interval of 1 yields pixel 13, and sampling horizontally from there with an interval of 1 yields pixel 15, after which no other pixel in the fourth row can be sampled. With pixel 5 as the starting pixel, no other pixel in the channel map of the image channel can now be sampled, so this sampling pass is complete. The resulting sampled pixel group for starting pixel 5 includes pixel 5, pixel 7, pixel 13 and pixel 15.
Pixel 6 in the starting pixel set is used as the starting pixel. Sampling horizontally from pixel 6 with an interval of 1 yields pixel 8, after which no other pixel in the second row can be sampled. Sampling vertically from the position of pixel 6 with an interval of 1 yields pixel 14, and sampling horizontally from there with an interval of 1 yields pixel 16, after which no other pixel in the fourth row can be sampled. With pixel 6 as the starting pixel, no other pixel in the channel map of the image channel can now be sampled, so this sampling pass is complete. The resulting sampled pixel group for starting pixel 6 includes pixel 6, pixel 8, pixel 14 and pixel 16.
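The four sampling passes above can be reproduced with a minimal sketch. It assumes that the channel map of Figure 3a is a 4*4 grid numbered 1 to 16 row by row, which is consistent with the pixel groups listed above; the variable names are illustrative.

```python
import numpy as np

# Assumed layout of the channel map in Figure 3a: pixels 1..16, row by row.
channel = np.arange(1, 17).reshape(4, 4)
interval = 1
step = interval + 1  # an interval of 1 means every second pixel is taken

groups = {}
for r in range(step):          # rows of the starting pixel set
    for c in range(step):      # columns of the starting pixel set
        start = int(channel[r, c])
        # Sample horizontally and vertically from the start pixel with the same stride.
        groups[start] = channel[r::step, c::step].ravel().tolist()

for start, pixels in groups.items():
    print(start, pixels)
```

Each key is a starting pixel (1, 2, 5, 6) and each value is the sampled pixel group obtained from it.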
S1022. Combine the pixels in each sampled pixel group of each image channel to obtain the multiple channel feature maps corresponding to each image channel.
After the multiple sampled pixel groups of each image channel are obtained, the pixels in each group are combined into an image, which yields the multiple channel feature maps corresponding to each image channel. That is, for each image channel, the number of sampled pixel groups equals the number of channel feature maps obtained. For example, if sampling one image channel yields 4 pixel groups, that image channel has 4 corresponding channel feature maps.
In one embodiment, the step of combining the pixels in each sampled pixel group of each image channel to obtain the multiple channel feature maps corresponding to each image channel may include:
For each sampled pixel group, the pixels are spliced together according to their positions in the image feature map to obtain the channel feature map of that group; once the pixels of all sampled pixel groups have been spliced, the multiple channel feature maps corresponding to each image channel are obtained.
When the pixels of a sampled pixel group are combined, they can be arranged according to their positions in the image feature map, so that the feature information of the image feature map is not lost. The pixels of every pixel group in each image channel are spliced according to their positions in the image feature map, and once all pixel groups have been spliced, the multiple channel feature maps corresponding to each image channel are obtained.
As shown in Figures 3a and 3b, when pixel 6, pixel 8, pixel 14 and pixel 16 of one sampled pixel group are combined, their positions in the image feature map are used: pixel 6 is to the left of pixel 8 and above pixel 14, and pixel 16 is to the right of pixel 14 and below pixel 8. Splicing the four pixels according to this positional relationship yields the channel feature map 14 shown in Figure 3b.
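Splicing by relative position can be sketched as follows. The helper `splice` and the `(row, col, value)` tuple representation are hypothetical choices made for illustration; the coordinates assume the 4*4 row-by-row numbering used in the example.

```python
import numpy as np

def splice(group):
    """Arrange sampled pixels into a small map that preserves their relative positions.

    `group` is a list of (row, col, value) tuples taken from the image feature map.
    """
    rows = sorted({r for r, _, _ in group})
    cols = sorted({c for _, c, _ in group})
    out = np.zeros((len(rows), len(cols)), dtype=int)
    for r, c, value in group:
        # The rank of a pixel's row/column among the sampled rows/columns
        # becomes its position in the spliced channel feature map.
        out[rows.index(r), cols.index(c)] = value
    return out

# Pixels 6, 8, 14 and 16 at their zero-based (row, col) positions in the 4x4 map.
group = [(1, 1, 6), (1, 3, 8), (3, 1, 14), (3, 3, 16)]
feature_map = splice(group)
print(feature_map)
```

The result keeps pixel 6 to the left of pixel 8 and above pixel 14, matching the positional relationship described in the text.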
S103. Use all channel feature maps corresponding to the image feature map as the input feature map for convolution processing.
The channel feature maps corresponding to the image feature map are the collection of the multiple channel feature maps of every image channel. The channel feature maps of each image channel are stacked, and the stacked channel feature maps are used as the input feature map for convolution processing, so that the image recognition convolutional network can convolve them.
After the pixels of the image feature map are sampled channel by channel, the sampled pixels are spliced into channel feature maps. Each channel feature map is therefore smaller than the original image feature map by a factor of n, while the number of image channels in the input feature map increases.
For example, take an image feature map with a resolution of 320*320 and 3 channels. Sampling its pixels with an interval of 1 and combining the sampled pixels yields channel feature maps of size 160*160: the width and height are each reduced to 1/2 of the original, so the size is reduced to 1/4 of the original image feature map. At the same time, each channel map yields 4 channel feature maps, so after the channel feature maps of every image channel are stacked, the number of image channels in the input feature map used for convolution is 4 times the original.
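The shape arithmetic of this example can be verified with a short sketch. The function name `to_channel_feature_maps` and the channel-first `(C, H, W)` layout are assumptions made for illustration, and the random array is only a stand-in for a real image feature map.

```python
import numpy as np

def to_channel_feature_maps(feature_map, interval):
    """Split each channel of a (C, H, W) array into (interval + 1)**2 smaller maps."""
    step = interval + 1
    maps = [feature_map[ch, r::step, c::step]
            for ch in range(feature_map.shape[0])
            for r in range(step)
            for c in range(step)]
    return np.stack(maps)  # stack all channel feature maps along the channel axis

image_feature_map = np.random.rand(3, 320, 320)  # stand-in data, 3 channels of 320*320
stacked = to_channel_feature_maps(image_feature_map, interval=1)
print(stacked.shape)
```

With an interval of 1, the 3 channels become 12 channel feature maps of 160*160, and the total number of values is unchanged.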
It follows that the channel feature maps obtained by sampling and splicing pixels channel by channel leave the input feature map of the image recognition convolutional network without information loss: the size of the input feature map is reduced while the image information is preserved. In addition, stacking the channel feature maps of each image channel increases the number of image channels in the input feature map used for convolution, and it also reduces the computation the network performs during recognition, which increases recognition speed.
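One way to see that no information is lost is to check that the original channel map can be rebuilt exactly from its channel feature maps. The helpers `split` and `merge` below are illustrative assumptions, not functions named in the application.

```python
import numpy as np

def split(channel, step):
    """Sample one channel map into step*step channel feature maps."""
    return [channel[r::step, c::step] for r in range(step) for c in range(step)]

def merge(maps, step):
    """Inverse of `split`: write every channel feature map back to its original positions."""
    h, w = maps[0].shape
    channel = np.empty((h * step, w * step), dtype=maps[0].dtype)
    for i, m in enumerate(maps):
        r, c = divmod(i, step)  # recover the start pixel offset of this group
        channel[r::step, c::step] = m
    return channel

channel = np.arange(64).reshape(8, 8)
restored = merge(split(channel, 2), 2)
print(np.array_equal(channel, restored))
```

Because the mapping is a pure rearrangement of pixels, the round trip reproduces the input exactly.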
S104. Perform a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized.
The image recognition convolutional network performs a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized. The recognition result includes the image boundaries identified between different regions of the image, and the result of segmenting the image.
In a specific implementation, because the input feature map includes multiple channel feature maps, convolution is performed on each channel feature map separately, which produces multiple convolution results; these results can be fused to obtain the final recognition result. That is, convolving each channel feature map yields one pre-recognition result, and the pre-recognition results of all channel feature maps must be spliced together to obtain the final recognition result.
In a specific implementation, for example, the resnet50 residual network can be used as the backbone of the image recognition convolutional network to recognize an image with a resolution of 320*320 and 3 image channels. The image feature map of the image to be recognized is processed in the second layer of the resnet50 network to obtain the channel feature maps, which are then used as input data to the resnet50 network for convolution processing.
Table 1 compares the size and channel count of the output data of each layer of the image recognition convolutional network when the channel feature maps are used as input data to the resnet50 network with those obtained when the image feature map of the image to be recognized is used as input data.
Table 1
Layer | Channel feature maps as input data | Image feature map as input data |
1 | 320*320*32 | 320*320*32 |
2 | 160*160*64 | 320*320*64 |
3 | 80*80*128 | 160*160*128 |
4 | 40*40*256 | 80*80*256 |
5 | 20*20*512 | 40*40*512 |
6 | 10*10*1024 | 20*20*1024 |
Because the image feature map of the image to be recognized is processed in the second layer of the resnet50 network to obtain the channel feature maps, which then serve as the input data of the second layer, Table 1 shows that the two cases produce the same number of output channels at every layer but output feature maps that differ in size by a factor of 4. That is, when the channel feature maps are used as input data, the output feature map of every layer from the second layer onward is only 1/4 the size of the output feature map obtained when the image feature map is used as input data. The smaller output feature maps reduce the computation of each convolution and speed up model inference.
It can be understood that, for convolutional networks of different depths, the image feature map of the image to be recognized can be processed at different depths of the network to obtain the channel feature maps. For example, the image feature map can be processed at the second layer of the resnet50 network to obtain the channel feature maps.
It follows that processing the image feature map of the image to be recognized and using the resulting channel feature maps as the input feature map reduces the size of the output data of every convolutional layer. According to the model parameter formula, the smaller the model parameter quantity param, the smaller the computation of the convolution, and the faster the inference of the image recognition convolutional network.
The model parameter formula is param = n*(c*h*w), where n is the number of layers of the image recognition convolutional network, c is the number of image channels of the input data of each layer, w is the width of the input data of each layer, and h is the height of the input data of each layer.
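As a worked check of the 1/4 ratio, the c*h*w term can be evaluated layer by layer from the sizes in Table 1. Treating each table entry as the c*h*w term of one layer is an assumption made for this sketch; the formula above is stated for a network with the same data size at every layer.

```python
# (height, width, channels) per layer, copied from Table 1.
channel_input = [(320, 320, 32), (160, 160, 64), (80, 80, 128),
                 (40, 40, 256), (20, 20, 512), (10, 10, 1024)]
image_input = [(320, 320, 32), (320, 320, 64), (160, 160, 128),
               (80, 80, 256), (40, 40, 512), (20, 20, 1024)]

def volume(shape):
    """Return c*h*w for one layer."""
    h, w, c = shape
    return c * h * w

# Ratio of the two cases at each layer; the first layer is identical in both.
ratios = [volume(a) / volume(b) for a, b in zip(channel_input, image_input)]
print(ratios)
```

From the second layer onward every ratio is 0.25, which matches the factor of 4 read off Table 1.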
In this embodiment of the application, continuing with the mowing robot as an example, the instance segmentation model Yolact can be used instead of the resnet50 network described above as the backbone of the image recognition convolutional network. An image to be recognized with a resolution of 320*320 and 3 image channels is again used. The pixels of the image channels of the image feature map in a shallow network layer of the Yolact model are sampled to obtain the channel feature maps, which are then used as input data to the network layer following that shallow layer for convolution processing. The recognition speed reaches 6 to 8 FPS with almost unchanged accuracy, whereas existing methods recognize the traversable area at 2 to 3 FPS.
In one embodiment, the step of performing a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized may include: performing a preset number of convolution calculations on the input feature map; and normalizing the result of the convolution calculations to obtain the recognition result of the image to be recognized.
A preset number of convolution calculations is performed on the input feature map fed to the image recognition convolutional network, and the result of the convolution calculations is then normalized; the recognition result of the image to be recognized is obtained after normalization.
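The convolve-then-normalize step can be sketched as below. Everything specific here is an assumption for illustration: the application does not name the normalization, so softmax is used as one common choice; the kernels are random stand-ins for trained weights; and a standard multi-channel convolution (which sums over input channels) stands in for the per-channel convolution and fusion described earlier.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid cross-correlation of a (C, H, W) input with (K, C, kh, kw) kernels."""
    k, c, kh, kw = kernels.shape
    h, w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((k, h, w))
    for i in range(h):
        for j in range(w):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract the channel and kernel axes: one value per output channel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=3)
    return out

def softmax(x, axis=0):
    """Normalize scores so that they sum to 1 along `axis` (assumed normalization)."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((12, 8, 8))                     # stand-in for stacked channel feature maps
layers = [rng.standard_normal((4, 12, 3, 3)),  # preset number of convolutions: 2 here
          rng.standard_normal((2, 4, 3, 3))]
for kernels in layers:
    x = conv2d(x, kernels)
probs = softmax(x, axis=0)  # per-pixel scores over 2 hypothetical classes
print(probs.shape)
```

A real network would also include nonlinearities and learned weights; the sketch only shows the order of operations named in this embodiment.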
In one embodiment, the recognition result of the image to be recognized includes a target region. After step S104, the image recognition method further includes: obtaining the color feature of the target region; and masking the target region according to the color feature to obtain a mask image of the target region. In another embodiment, the recognition result of the image to be recognized includes a target region. After step S104, the image recognition method further includes: obtaining the label of the target region; and masking the target region according to the label to obtain a mask image of the target region.
Here, the target region is the movable area of the self-moving device. When the self-moving device is a mowing robot, the image to be recognized that it captures is as shown in Figure 5a. The green grass area and the yellow withered grass area can be treated as the movable area of the mowing robot, that is, the lawn area in which the robot works; the non-movable area is the non-lawn area, such as trees, paths along the edge, and moving vehicles. After image recognition, the result is as shown in Figure 5b: the target region is the white area, which is the movable area; the black area is the non-movable area; and the boundary between the white and black areas is the lawn edge. Alternatively, the target region can be marked by label, for example by marking every movable area as 1 and every non-movable area as 0. Target regions with the same color feature are given the same mask, or target regions with the same label are given the same mask.
When the mowing robot moves within the movable area, obstacles and other objects that affect its movement may be present. The color feature of the target region can therefore be obtained and the target region masked according to it, or the target region can be masked according to its label, to obtain a mask image of the target region. The lawn edge can be presented from the mask image, so that when objects in the target region are recognized, the mowing robot is kept from leaving the lawn edge. This facilitates path planning and obstacle recognition and improves the efficiency of path planning and automatic obstacle avoidance.
The image recognition method provided by the above embodiments samples the pixels of each image channel in the image feature map, so that the pixels in the resulting channel feature maps retain the information of the image feature map while the number of image channels increases, which preserves the completeness of the image information in the image to be recognized. At the same time, because the channel feature maps obtained by sampling are smaller than the image feature map, using them as the input feature map for convolution processing greatly reduces the computation of image recognition without losing image information, thereby improving the efficiency of image recognition.
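The label-based masking described for the mowing robot can be sketched as follows. The label array is hypothetical data, the value 255 for white is an assumed encoding of the mask image in Figure 5b, and the 4-neighbour rule for the lawn edge is one simple choice that the application does not specify.

```python
import numpy as np

# Hypothetical per-pixel labels: 1 = movable (lawn), 0 = non-movable.
labels = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]])

# Mask image: movable area white (255), non-movable area black (0).
mask = np.where(labels == 1, 255, 0).astype(np.uint8)

# Lawn edge: movable pixels with at least one non-movable 4-neighbour.
# Padding with 1 means the image border itself is not counted as an edge (assumption).
padded = np.pad(labels, 1, constant_values=1)
neighbours_all_movable = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                          padded[1:-1, :-2] & padded[1:-1, 2:]).astype(bool)
edge = (labels == 1) & ~neighbours_all_movable

print(mask)
print(edge.astype(int))
```

A path planner could then treat `mask` as the region the robot may enter and `edge` as the boundary it should not cross.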
Referring to Figure 6, Figure 6 is a schematic block diagram of an image recognition device also provided by an embodiment of the present application. The image recognition device is used to perform the image recognition method described above and may be configured in a self-moving device.
Here, the server may be a standalone server or a server cluster. The terminal may be an electronic device such as a mobile phone, tablet computer, notebook computer, desktop computer, personal digital assistant, or wearable device.
As shown in Figure 6, the image recognition device 200 includes an image acquisition module 201, a channel sampling module 202, an input determination module 203 and a convolution processing module 204.
The image acquisition module 201 is configured to obtain an image feature map of an image to be recognized, the image feature map including multiple image channels.
The channel sampling module 202 is configured to sample the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, where the total number of pixels in the channel feature maps corresponding to each image channel equals the total number of pixels in that image channel.
The input determination module 203 is configured to use all channel feature maps corresponding to the image feature map as the input feature map for convolution processing.
The convolution processing module 204 is configured to perform a preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized.
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working processes of the image recognition device and its modules described above can be found in the corresponding processes of the foregoing image recognition method embodiments and are not repeated here.
The image recognition device described above can be implemented in the form of a computer program, and the computer program can run on a self-moving device as shown in Figure 7.
Referring to Figure 7, Figure 7 is a schematic structural block diagram of a self-moving device provided by an embodiment of the present application. The self-moving device may be a small self-moving device such as a lawn mower, a patrol robot, or a mine-clearing robot.
Referring to Figure 7, the computer device includes a processor, a memory and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions which, when executed, cause the processor to perform any one of the image recognition methods.
The processor provides computing and control capabilities and supports the operation of the entire computer device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium. When the computer program is executed by the processor, it causes the processor to perform any one of the image recognition methods.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Figure 7 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
It should be understood that the processor may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In one embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps:
获取待识别图像的图像特征图,所述图像特征图包括多个图像通道。An image feature map of the image to be recognized is obtained, where the image feature map includes multiple image channels.
对每个所述图像通道的像素点进行采样,得到每个所述图像通道对应的多个通道特征图,每个所述图像通道对应的多个通道特征图的像素点总数等于每个所述图像通道的像素点总数。The pixels of each image channel are sampled to obtain multiple channel feature maps corresponding to each image channel. The total number of pixels of the multiple channel feature maps corresponding to each image channel is equal to each of the image channels. The total number of pixels in the image channel.
将所述图像特征图对应的所有通道特征图作为卷积处理的输入特征图。All channel feature maps corresponding to the image feature map are used as input feature maps for convolution processing.
对所述输入特征图进行预设次卷积处理,得到所述待识别图像的识别结果。Perform preset subconvolution processing on the input feature map to obtain the recognition result of the image to be recognized.
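As a worked check on the pixel-count condition in the sampling step, the following sketch assumes that a sampling interval of N corresponds to a stride of s = N + 1 (the reading suggested by the (N+1)*(N+1) starting pixel set described in a later embodiment) and that the channel height H and width W are both divisible by s. The symbols C, H, W and s are introduced here for illustration and do not appear in the application.

$$
\underbrace{s^{2}}_{\text{maps per channel}} \cdot \underbrace{\frac{H}{s}\cdot\frac{W}{s}}_{\text{pixels per map}} = H\,W,
\qquad
\text{number of input maps} = C\,s^{2},
\qquad s = N + 1 .
$$

For example, with C = 3, H = W = 640 and N = 1, each channel yields 4 channel feature maps of 320 x 320 pixels, so the convolution input has 12 maps, and 4 * 320 * 320 = 409600 = 640 * 640 pixels per original channel, so no pixel is discarded.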
In one embodiment, when sampling the pixels of each image channel to obtain the multiple channel feature maps corresponding to each image channel, the processor is configured to:
Group the pixels of each image channel to obtain multiple sampled pixel groups for each image channel.
Combine the pixels in each sampled pixel group of each image channel to obtain the multiple channel feature maps corresponding to each image channel.
In one embodiment, when grouping the pixels of each image channel to obtain the multiple sampled pixel groups of each image channel, the processor is configured to:
Determine a starting pixel set among all pixels of the image channel according to a preset sampling interval N, where the starting pixel set is a pixel matrix of (N+1)*(N+1) pixels whose first pixel is any corner vertex of the corresponding image channel.
Taking each pixel in the starting pixel set as a starting pixel, sample the pixels of the image channel at the preset sampling interval to obtain the sampled pixel group of each starting pixel.
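The grouping step above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes that an interval of N means N pixels are skipped between samples (a stride of N + 1), and that the starting pixel set is anchored at the top-left corner, although the text allows any corner. The function name `sample_groups` is hypothetical.

```python
def sample_groups(height, width, n):
    """Group the pixel coordinates of one image channel.

    Assumption: sampling interval n means a stride of n + 1, and the
    (n+1) x (n+1) starting pixel set sits at the top-left corner.
    Returns a dict mapping each starting pixel (r0, c0) to the list of
    (row, col) coordinates sampled from it.
    """
    stride = n + 1
    groups = {}
    for r0 in range(stride):          # rows of the starting pixel set
        for c0 in range(stride):      # columns of the starting pixel set
            groups[(r0, c0)] = [
                (r, c)
                for r in range(r0, height, stride)
                for c in range(c0, width, stride)
            ]
    return groups


# A 4 x 4 channel with interval N = 1 gives a 2 x 2 starting set, i.e. 4 groups.
groups = sample_groups(4, 4, 1)
```

Under these assumptions every pixel of the channel lands in exactly one group, which is what makes the later pixel-count equality hold.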
In one embodiment, when combining the pixels in the sampled pixel groups of each image channel to obtain the multiple channel feature maps corresponding to each image channel, the processor is configured to:
For each sampled pixel group, stitch the pixels together according to the position of each pixel of the group in the image feature map to obtain the channel feature map of that sampled pixel group.
Complete the pixel stitching for all sampled pixel groups to obtain the multiple channel feature maps corresponding to each image channel.
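A short sketch of the stitching step, under the same assumption that interval N means a stride of N + 1 with the starting set at the top-left corner. Python slicing keeps the sampled pixels in their original relative order, which is one way to respect "position in the image feature map". The name `split_channel` is hypothetical.

```python
def split_channel(channel, n):
    """Split one 2-D channel (list of rows) into (n+1)^2 channel feature maps.

    Each output map collects the pixels of one sampled pixel group and keeps
    them in the same relative row/column order as in the source channel.
    """
    stride = n + 1
    return [
        [row[c0::stride] for row in channel[r0::stride]]
        for r0 in range(stride)
        for c0 in range(stride)
    ]


# 4 x 4 channel whose pixel values are 0..15 in row-major order.
channel = [[r * 4 + c for c in range(4)] for r in range(4)]
maps = split_channel(channel, 1)

# Pixel totals before and after the split.
total_in = sum(len(row) for row in channel)
total_out = sum(len(row) for m in maps for row in m)
```

Applying `split_channel` to every channel of the image feature map and concatenating the results would give the full set of input maps for the convolution stage.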
In one embodiment, when obtaining the image feature map of the image to be recognized, the processor is configured to:
Preprocess the image to be recognized, where the preprocessing includes adjusting the resolution of the image to be recognized.
Perform feature extraction on the preprocessed image to obtain the image feature map.
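The text does not say how the resolution is adjusted. As one possible illustration, the sketch below uses nearest-neighbour resampling, which is an assumption chosen for brevity; the function name `resize_nearest` is hypothetical, and the feature extraction step is not shown because the application does not specify it here.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image (list of rows) with nearest-neighbour sampling.

    Each output pixel copies the source pixel whose index is the output
    index scaled by the size ratio and rounded down.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = [[1, 2],
         [3, 4]]
upscaled = resize_nearest(small, 4, 4)
downscaled = resize_nearest(upscaled, 2, 2)
```

Choosing a target resolution whose height and width are multiples of N + 1 would keep all channel feature maps the same size in the later sampling step.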
In one embodiment, when performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized, the processor is configured to:
Perform a preset number of convolution calculations on the input feature map.
Normalize the result of the convolution calculations to obtain the recognition result of the image to be recognized.
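The two steps above can be sketched in plain Python. Several choices here are assumptions rather than details from the application: zero padding with "same" output size, no activation function between layers, and a per-pixel softmax across output maps as the normalization. The names `conv2d_same`, `conv_layer`, `softmax` and `recognize` are hypothetical.

```python
import math


def conv2d_same(fmap, kernel):
    """Cross-correlate one 2-D map with an odd-sized square kernel, zero padded."""
    h, w = len(fmap), len(fmap[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:  # outside pixels count as 0
                        acc += fmap[rr][cc] * kernel[dr + k][dc + k]
            out[r][c] = acc
    return out


def conv_layer(fmaps, weights):
    """weights[o][i] is the kernel from input map i to output map o."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    outputs = []
    for per_input in weights:
        acc = [[0.0] * w for _ in range(h)]
        for fmap, kernel in zip(fmaps, per_input):
            partial = conv2d_same(fmap, kernel)
            for r in range(h):
                for c in range(w):
                    acc[r][c] += partial[r][c]
        outputs.append(acc)
    return outputs


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(fmaps, layers):
    """Apply len(layers) convolutions, then normalize per pixel across maps."""
    for weights in layers:
        fmaps = conv_layer(fmaps, weights)
    h, w = len(fmaps[0]), len(fmaps[0][0])
    return [[softmax([m[r][c] for m in fmaps]) for c in range(w)] for r in range(h)]


# Two identical input maps and two identity layers (1 x 1 kernels).
fmaps = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
identity = [[[[1.0]], [[0.0]]], [[[0.0]], [[1.0]]]]
probs = recognize(fmaps, [identity, identity])
box = conv2d_same([[1.0, 1.0], [1.0, 1.0]], [[1.0] * 3 for _ in range(3)])
```

Here `len(layers)` plays the role of the preset number of convolutions; a real network would use learned kernels and an optimized library rather than nested loops.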
In one embodiment, the recognition result of the image to be recognized includes a target area, and after performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized, the processor is further configured to:
Obtain the color features of the target area.
Mask the target area according to the color features to obtain a mask image of the target area.
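The application does not define the color feature or the masking rule in this passage. The sketch below is one hypothetical reading: the target area is a rectangle, the color feature is an RGB range, and the mask marks pixels of the area whose color falls inside that range. The function name `color_mask`, the rectangle convention and the threshold values are all assumptions.

```python
def color_mask(image, region, lower, upper):
    """Build a 0/1 mask for pixels of `region` whose RGB lies in [lower, upper].

    image:  list of rows of (r, g, b) tuples
    region: (top, left, bottom, right), bottom and right exclusive (assumed)
    """
    top, left, bottom, right = region
    mask = [[0] * len(image[0]) for _ in image]
    for r in range(top, bottom):
        for c in range(left, right):
            if all(lo <= v <= hi for v, lo, hi in zip(image[r][c], lower, upper)):
                mask[r][c] = 1
    return mask


# Hypothetical 2 x 2 image: the left column is greenish, the right is not.
image = [[(30, 200, 40), (200, 30, 30)],
         [(20, 180, 50), (10, 10, 10)]]
green_lower, green_upper = (0, 150, 0), (80, 255, 80)
full_mask = color_mask(image, (0, 0, 2, 2), green_lower, green_upper)
top_mask = color_mask(image, (0, 0, 1, 2), green_lower, green_upper)
```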
In one embodiment, the recognition result of the image to be recognized includes a target area, and after performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized, the processor is further configured to:
Obtain the label of the target area.
Mask the target area according to the label of the target area to obtain a mask image of the target area.
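Label-based masking can be illustrated with a per-pixel label map. This is a sketch under assumptions: the recognition result is taken to be a grid of class labels, and the label names `"grass"` and `"road"` and the function name `label_mask` are hypothetical.

```python
def label_mask(label_map, target_label):
    """Return a 0/1 mask that is 1 wherever the pixel label equals target_label."""
    return [[1 if label == target_label else 0 for label in row]
            for row in label_map]


# Hypothetical per-pixel labels produced by the recognition step.
label_map = [["grass", "road"],
             ["grass", "grass"]]
grass_mask = label_mask(label_map, "grass")
```

Compared with a color-based mask, this variant depends only on the predicted label, so it is unaffected by lighting changes in the source image.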
Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions, and when a processor executes the program instructions, any of the image recognition methods provided by the embodiments of the present application is implemented.
The computer-readable storage medium may be an internal storage unit of the self-moving device described in the foregoing embodiments, such as a hard disk or memory of the self-moving device. The computer-readable storage medium may also be an external storage device of the self-moving device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the self-moving device.
The above are only specific embodiments of the present application, and the protection scope of the present application is not limited to them. Any person skilled in the art can readily conceive of equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall fall within the protection scope of the present application. The protection scope of the present application shall therefore be determined by the protection scope of the claims.
Claims (10)
- 1. An image recognition method, comprising:
  obtaining an image feature map of an image to be recognized, the image feature map including multiple image channels;
  sampling the pixels of each image channel to obtain multiple channel feature maps corresponding to each image channel, wherein the total number of pixels in the multiple channel feature maps corresponding to an image channel equals the total number of pixels in that image channel;
  using all channel feature maps corresponding to the image feature map as an input feature map for convolution processing;
  performing a preset number of convolution operations on the input feature map to obtain a recognition result of the image to be recognized.
- 2. The image recognition method according to claim 1, wherein sampling the pixels of each image channel to obtain the multiple channel feature maps corresponding to each image channel comprises:
  grouping the pixels of each image channel to obtain multiple sampled pixel groups for each image channel;
  combining the pixels in each sampled pixel group of each image channel to obtain the multiple channel feature maps corresponding to each image channel.
- 3. The image recognition method according to claim 2, wherein grouping the pixels of each image channel to obtain the multiple sampled pixel groups of each image channel comprises:
  determining a starting pixel set among all pixels of the image channel according to a preset sampling interval N, the starting pixel set being a pixel matrix of (N+1)*(N+1) pixels whose first pixel is any corner vertex of the corresponding image channel;
  taking each pixel in the starting pixel set as a starting pixel, sampling the pixels of the image channel at the preset sampling interval to obtain the sampled pixel group of each starting pixel.
- 4. The image recognition method according to claim 2, wherein combining the pixels in the sampled pixel groups of each image channel to obtain the multiple channel feature maps corresponding to each image channel comprises:
  for each sampled pixel group, stitching the pixels together according to the position of each pixel of the group in the image feature map to obtain the channel feature map of that sampled pixel group;
  completing the pixel stitching for all sampled pixel groups to obtain the multiple channel feature maps corresponding to each image channel.
- 5. The image recognition method according to claim 1, wherein obtaining the image feature map of the image to be recognized comprises:
  preprocessing the image to be recognized, the preprocessing including adjusting the resolution of the image to be recognized;
  performing feature extraction on the preprocessed image to obtain the image feature map.
- 6. The image recognition method according to claim 1, wherein performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized comprises:
  performing a preset number of convolution calculations on the input feature map;
  normalizing the result of the convolution calculations to obtain the recognition result of the image to be recognized.
- 7. The image recognition method according to any one of claims 1 to 6, wherein the recognition result of the image to be recognized includes a target area, and after performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized, the method further comprises:
  obtaining the color features of the target area;
  masking the target area according to the color features to obtain a mask image of the target area.
- 8. The image recognition method according to claim 3, wherein the recognition result of the image to be recognized includes a target area, and after performing the preset number of convolution operations on the input feature map to obtain the recognition result of the image to be recognized, the method further comprises:
  obtaining the label of the target area;
  masking the target area according to the label of the target area to obtain a mask image of the target area.
- 9. A self-moving device, comprising a memory and a processor;
  the memory is configured to store a computer program;
  the processor is configured to execute the computer program and, when executing the computer program, to implement the image recognition method according to any one of claims 1 to 8.
- 10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the image recognition method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/096975 WO2023231022A1 (en) | 2022-06-02 | 2022-06-02 | Image recognition method, self-moving device and storage medium |
CN202280002338.3A CN115151950A (en) | 2022-06-02 | 2022-06-02 | Image recognition method, self-moving device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/096975 WO2023231022A1 (en) | 2022-06-02 | 2022-06-02 | Image recognition method, self-moving device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023231022A1 true WO2023231022A1 (en) | 2023-12-07 |
Family
ID=83416098
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/096975 WO2023231022A1 (en) | 2022-06-02 | 2022-06-02 | Image recognition method, self-moving device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115151950A (en) |
WO (1) | WO2023231022A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113156924A (en) * | 2020-01-07 | 2021-07-23 | 苏州宝时得电动工具有限公司 | Control method of self-moving equipment |
CN113256607A (en) * | 2021-06-17 | 2021-08-13 | 常州微亿智造科技有限公司 | Defect detection method and device |
US20210255638A1 (en) * | 2020-02-19 | 2021-08-19 | Positec Power Tools (Suzhou) Co., Ltd. | Area Division and Path Forming Method and Apparatus for Self-Moving Device and Automatic Working System |
CN113344849A (en) * | 2021-04-25 | 2021-09-03 | 山东师范大学 | Microemulsion head detection system based on YOLOv5 |
CN114373117A (en) * | 2021-12-30 | 2022-04-19 | 浙江大华技术股份有限公司 | Target detection method, device and system |
CN114494990A (en) * | 2021-12-21 | 2022-05-13 | 长视科技股份有限公司 | Target detection method, system, terminal equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110298346A (en) * | 2019-05-23 | 2019-10-01 | 平安科技(深圳)有限公司 | Image-recognizing method, device and computer equipment based on divisible convolutional network |
CN111161250B (en) * | 2019-12-31 | 2023-05-26 | 南遥科技(广东)有限公司 | Method and device for detecting dense houses by using multi-scale remote sensing images |
CN114494918A (en) * | 2020-11-11 | 2022-05-13 | 京东方科技集团股份有限公司 | Target identification method and system and readable storage medium |
CN114359382A (en) * | 2021-11-16 | 2022-04-15 | 海宁集成电路与先进制造研究院 | Cooperative target ball detection method based on deep learning and related device |
-
2022
- 2022-06-02 CN CN202280002338.3A patent/CN115151950A/en active Pending
- 2022-06-02 WO PCT/CN2022/096975 patent/WO2023231022A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN115151950A (en) | 2022-10-04 |