WO2022111352A1 - Target detection method and apparatus, storage medium, and terminal - Google Patents


Info

Publication number
WO2022111352A1
WO2022111352A1 · PCT/CN2021/131132 · CN2021131132W
Authority
WO
WIPO (PCT)
Prior art keywords
feature map; sample image; target; map; network model
Application number
PCT/CN2021/131132
Other languages
French (fr)
Chinese (zh)
Inventor
陈圣卫 (Chen Shengwei)
Original Assignee
展讯通信(上海)有限公司 (Spreadtrum Communications (Shanghai) Co., Ltd.)
Application filed by 展讯通信(上海)有限公司 (Spreadtrum Communications (Shanghai) Co., Ltd.)
Publication of WO2022111352A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • the present invention relates to the field of computer vision, and in particular, to a target detection method and device, a storage medium and a terminal.
  • Object detection is a challenging subject in the field of computer vision and is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, aerospace, and autonomous driving. Driven by advances in related technologies and industry demand, the requirements on the efficiency and accuracy of target detection keep rising.
  • CNN (Convolutional Neural Network)
  • the technical problem solved by the present invention is to provide an efficient and accurate target detection method, so as to improve the detection accuracy of small-sized targets.
  • an embodiment of the present invention provides a target detection method.
  • The method includes: acquiring a sample image, where the sample image includes a preset target; extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate the target area and the background area of the sample image, the target area is an area containing the preset target, and the background area is an area not containing the preset target; training the detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is calculated from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target; and using the trained detection network model to detect an image to be tested, so as to obtain a detection result of the preset target in the image to be tested.
  • each pixel of the initial feature map includes multiple channels
  • the initial feature map is obtained by down-sampling the sample image
  • performing semantic information enhancement processing on the initial feature map includes: Step 1: upsample the initial feature map by a factor of 2 to obtain a first feature map; Step 2: process the first feature map according to the channel attention mechanism to obtain a second feature map; Step 3: take the second feature map as the new initial feature map, and repeat Steps 1 to 3 until the total upsampling factor equals the downsampling factor; Step 4: perform a first convolution operation on the resulting feature map to obtain the first prediction map, where the number of channels of the first prediction map is 2.
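The four steps above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the patented implementation: nearest-neighbor upsampling, a simple global-average/sigmoid channel weighting, and a random 1 × 1 projection stand in for the trained modules, and all function names are our own.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def channel_attention(x):
    """Weight each channel by a sigmoid of its global average (a simple
    squeeze-style attention; the patent does not fix the exact form)."""
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))   # (C,) weights in (0, 1)
    return x * w[:, None, None]

def enhance(initial, down_factor, w_out):
    """Steps 1-3: upsample 2x + channel attention until the upsampling
    factor equals down_factor; Step 4: 1x1 convolution to 2 channels."""
    x, up = initial, 1
    while up < down_factor:
        x = channel_attention(upsample2x(x))
        up *= 2
    # a 1x1 convolution is a per-pixel linear map over channels
    return np.einsum('oc,chw->ohw', w_out, x)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))            # C=8, downsampled 4x
pred = enhance(feat, down_factor=4, w_out=rng.standard_normal((2, 8)))
print(pred.shape)                                  # (2, 64, 64)
```

With a 4× downsampled input, two upsampling passes restore the original resolution and the output has the 2 channels the first prediction map requires.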
  • Optionally, before processing the first feature map according to the channel attention mechanism, the method further includes: performing multiple second convolution operations on the first feature map, each second convolution operation using a 1 × 1 convolution kernel; and adding the result of the multiple second convolution operations to the first feature map to obtain a new first feature map.
  • Optionally, before processing the first feature map according to the channel attention mechanism, the method further includes: performing multiple third convolution operations on the first feature map, a 3 × 3 convolution kernel being used in at least one of them; and adding the result of the multiple third convolution operations to the first feature map to obtain a new first feature map.
  • Optionally, performing multiple second convolution operations on the first feature map includes: in each second convolution operation other than the first one, first applying batch normalization to the input and then applying the ReLU activation function.
  • Optionally, training the detection network model according to the first prediction map includes: Step A: calculate a first loss function value from the first prediction map, the sample image, and the first loss function, and calculate a second loss function value from the second prediction map, the sample image, and the second loss function; Step B: calculate the loss function value of the detection network model from the first and second loss function values; Step C: determine whether the loss function value exceeds a preset threshold; if so, adjust the parameters of the module for extracting the initial feature map in the detection network model, and if not, end the training of the detection network model; Step D: use the detection network model with adjusted parameters to extract the initial feature map again, and return to Step A.
  • Optionally, the first loss function is a Focal Loss function.
  • Optionally, the weight coefficient of the first loss function is determined by the ratio of preset targets to all targets in the sample image: the larger this ratio, the larger the weight coefficient of the first loss function.
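As a concrete illustration, the widely published Focal Loss formula and one plausible way to combine the two loss values (Steps A and B) are sketched below. Using the small-target ratio directly as a linear weight is our assumption; the patent only states that the first loss's weight grows with that ratio.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal Loss (Step A, first loss): down-weights easy examples so
    training focuses on hard, often small, targets."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balancing factor
    return float(np.mean(-at * (1 - pt) ** gamma * np.log(pt)))

def total_loss(l1, l2, small_target_ratio):
    """Step B: combine the two loss values. The linear weighting by the
    small-target ratio is our assumption."""
    w = small_target_ratio
    return w * l1 + (1 - w) * l2

l1 = focal_loss(np.array([1.0, 0.0]), np.array([1, 0]))  # perfect predictions
print(l1 < 1e-5, total_loss(1.0, 2.0, 0.25))             # True 1.75
```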
  • Optionally, before extracting the initial feature map of the sample image, the method further includes performing data enhancement on the sample image, where the data enhancement includes one or more of the following: adjusting the brightness and/or contrast of the sample image, rotating the sample image by a preset angle, and adding noise to the sample image.
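A minimal sketch of the three listed augmentations, assuming 8-bit images, 90-degree rotation steps, and Gaussian noise (the patent leaves the angle and the noise type open):

```python
import numpy as np

def augment(img, rng, brightness=0.2, contrast=0.2, angle_steps=1, noise_std=5.0):
    """Apply the three augmentations the embodiment lists: brightness and
    contrast jitter, rotation, and additive noise."""
    out = img.astype(np.float64)
    out = out * (1 + rng.uniform(-contrast, contrast))        # contrast
    out = out + rng.uniform(-brightness, brightness) * 255    # brightness
    out = np.rot90(out, k=angle_steps, axes=(0, 1))           # rotation
    out = out + rng.normal(0, noise_std, out.shape)           # noise
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 48, 3), dtype=np.uint8)
aug = augment(img, rng)
print(aug.shape)   # rotated 90 degrees: (48, 32, 3)
```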
  • the detection network model is a single-step detection network model.
  • Correspondingly, an embodiment of the present invention further provides a target detection device, the device including: an acquisition module for acquiring a sample image, the sample image including a preset target; and a processing module for extracting the initial feature map of the sample image and performing semantic information enhancement processing on the initial feature map to obtain the first prediction map of the sample image, the first prediction map being used to indicate the target area and background area of the sample image.
  • the target area is an area that includes the preset target
  • the background area is an area that does not include the preset target;
  • a training module for training the detection network model according to the first prediction map and the second prediction map to obtain a trained detection network model, where the second prediction map is calculated from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target;
  • a detection module for detecting the image to be tested with the trained detection network model, so as to obtain the detection result of the preset target in the image to be tested.
  • An embodiment of the present invention further provides a storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the above target detection method.
  • An embodiment of the present invention further provides a terminal, including a memory and a processor, the memory storing a computer program that can run on the processor; when running the computer program, the processor executes the steps of the above target detection method.
  • An embodiment of the present invention provides a target detection method, the method including: acquiring a sample image, where the sample image includes a preset target; extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate the target area and the background area of the sample image, the target area is an area containing the preset target, and the background area is an area not containing the preset target; training the detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is calculated from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target; and using the trained detection network model to detect the image to be tested, so as to obtain the detection result of the preset target in the image to be tested.
  • In this solution, the initial feature map of the sample image is first extracted, and then semantic information enhancement processing is performed on the initial feature map to obtain the first prediction map, which indicates the target area and the background area in the sample image.
  • The detection network model is then trained according to the first prediction map and the second prediction map, which indicates the bounding box of the preset target, so as to obtain the trained detection network model. Since the first prediction map can indicate the target area and the background area in the sample image, it can contain more semantic information about the preset target in the sample image.
  • Training the detection network model with the first prediction map lets the model learn the semantic information of the preset target well, so that when the image to be tested is subsequently detected, the detection result of the preset target can be obtained more accurately and with higher efficiency.
  • Further, the initial feature map is up-sampled and processed according to the channel attention mechanism until the total up-sampling factor equals the down-sampling factor used when extracting the initial feature map.
  • In this way, the first prediction map has the same size as the sample image, so that the loss function can subsequently be computed from the first prediction map and the sample image during training.
  • Each pixel in the initial feature map contains multiple channels. The channel attention mechanism is used to enhance semantic information: channels highly correlated with the preset target are strengthened and channels weakly correlated with it are weakened. Finally, a convolution operation is performed on the feature map and the number of channels of the first prediction map is set to 2, so that the first prediction map can intuitively reflect the semantic information of the preset target in a two-class manner (i.e., indicating the target area and the background area).
  • Further, the weight coefficient of the first loss function is determined by the ratio of preset targets to all targets in the sample image; the larger this ratio, the larger the weight coefficient of the first loss function. That is, the first prediction map obtained by the semantic information enhancement processing plays a greater role in training the detection network model, so that the detection network model performs well in detecting the preset target.
  • FIG. 1 is a schematic flowchart of a target detection method in an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a detection network model applicable to a target detection method in an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of the semantic information enhancement module in FIG. 2 .
  • FIG. 4 is a schematic structural diagram of the first residual module in FIG. 3 .
  • FIG. 5 is a schematic diagram of a first prediction map in an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a target detection apparatus in an embodiment of the present invention.
  • In the prior art, the convolutional neural networks used for small-sized target detection mainly include Feature Pyramid Networks (FPN), Generative Adversarial Networks (GAN), and Scale Normalization for Image Pyramids (SNIP).
  • FPN obtains more information of small-sized objects in the image by fusing features of different scales
  • GAN improves the detection accuracy by restoring the image information of small objects
  • SNIP uses multi-scale training and back-propagates gradients only for targets whose scale matches the pre-training scale, thereby improving detection accuracy.
  • To obtain more semantic information about small-sized targets during training, the usual approach is to deepen the network, that is, to increase the number of convolutional layers.
  • This approach requires greatly increasing the number of convolutional layers, which results in a complex, very deep network structure; subsequent detection of small-sized targets then takes a long time, so the practical performance of convolutional neural networks in small-sized target detection applications is not high.
  • an embodiment of the present invention provides a target detection method.
  • The method includes: acquiring a sample image, where the sample image includes a preset target; extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate the target area and the background area of the sample image, the target area is an area containing the preset target, and the background area is an area not containing the preset target; training the detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is calculated from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target; and using the trained detection network model to detect an image to be tested, so as to obtain a detection result of the preset target in the image to be tested.
  • In this solution, the initial feature map of the sample image is first extracted, and then semantic information enhancement processing is performed on the initial feature map to obtain the first prediction map, which indicates the target area and the background area in the sample image.
  • The detection network model is then trained according to the first prediction map and the second prediction map, which indicates the bounding box of the preset target, so as to obtain the trained detection network model. Since the first prediction map can indicate the target area and the background area in the sample image, it can contain more semantic information about the preset target in the sample image.
  • the detection network model learns the semantic information of the preset target well, and can quickly and accurately obtain the detection result of the preset target when the image to be tested is subsequently detected.
  • FIG. 1 is a schematic flowchart of a target detection method in an embodiment of the present invention.
  • the target detection method may be performed by a terminal, and the terminal may be various appropriate terminals, such as a mobile phone, a computer, an Internet of Things device, etc., but is not limited thereto.
  • the method can be used to detect whether the image to be tested contains a preset target, and can also be used to detect the specific position and category of the preset target in the image to be tested, but it is not limited thereto.
  • the image to be tested may be an image collected in real time by the terminal, an image pre-stored on the terminal, or an image received from the outside by the terminal, but is not limited thereto.
  • the preset target may be determined by the terminal according to an instruction received from the outside in advance, or may be determined by the terminal recognizing the sample image through various appropriate models.
  • the target detection method shown in FIG. 1 may specifically include the following steps:
  • Step S101: acquire a sample image, where the sample image includes a preset target;
  • Step S102: extract the initial feature map of the sample image, and perform semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate the target area and the background area of the sample image, the target area is an area that includes the preset target, and the background area is an area that does not include the preset target;
  • Step S103: train the detection network model according to the first prediction map and the second prediction map to obtain a trained detection network model, where the second prediction map is calculated from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target;
  • Step S104 Use the trained detection network model to detect the image to be tested, so as to obtain a detection result of the preset target in the image to be tested.
  • the terminal may acquire sample images from outside, or may select at least a part of the training set stored locally as sample images, and the sample images may include preset targets.
  • the preset target refers to a specific target object, for example, a traffic sign, a license plate, a human face, etc.
  • The preset target may be determined by the terminal according to an instruction received from the outside in advance, or may be determined by the terminal recognizing the sample image through various appropriate models.
  • The size of the preset target may not exceed a preset size, or may not exceed a preset proportion of the size of the sample image, but it is not limited thereto.
  • the preset size and preset ratio may be preset.
  • the sample image may include an identification graphic, and the identification graphic is used to indicate the position of the preset target in the sample image, and may also be used to indicate the category of the preset target in the sample image.
  • identification graphics of different shapes may be used to represent different types of preset targets.
  • step S102 before training the detection network model, the detection network model needs to be constructed first, and the detection network model may have various appropriate structures.
  • FIG. 2 is a schematic structural diagram of a detection network model applicable to a target detection method in an embodiment of the present invention.
  • a detection network model applicable to a target detection method in an embodiment of the present invention is described below in a non-limiting manner with reference to FIG. 2 .
  • the detection network model shown in FIG. 2 may include a feature extraction module 21 , a prediction module 22 , and a semantic information enhancement module 23 .
  • The detection network model may be a single-step detection network model (a network model in which the image to be tested only needs to be fed into the network once, and the detection result of the preset target is obtained directly in a single stage without a candidate-region proposal stage), or a two-step detection network model (a network model that first generates multiple candidate regions from the input image to be tested and then classifies those candidate regions), or any other appropriate network model, which is not limited here.
  • the detection network model is a single-step detection network model.
  • the feature extraction module 21 can be used to extract features in the sample image to obtain an initial feature map of the sample image.
  • the initial feature map may include position information of the preset target in the sample image, and may also include semantic information of the preset target.
  • The feature extraction module 21 can obtain an initial feature map by down-sampling the sample image multiple times. For example, the feature extraction module 21 downsamples the sample image by a factor of 2^n to obtain the initial feature map, where n is a positive integer.
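For illustration, 2^n downsampling can be mimicked with repeated 2 × 2 average pooling; a real feature extraction module would use strided or pooled convolutional layers instead.

```python
import numpy as np

def downsample_2n(img, n):
    """Downsample a (H, W) image by 2**n using repeated 2x2 average
    pooling (a stand-in for the strided convolutions of module 21)."""
    x = img.astype(np.float64)
    for _ in range(n):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2   # even crop
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return x

img = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
feat = downsample_2n(img, n=2)   # 2**2 = 4x downsampling
print(feat.shape)                # (16, 16)
```

Average pooling preserves the global mean, which makes the sketch easy to sanity-check.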
  • the initial feature map extracted by the feature extraction module 21 may be transmitted to the prediction module 22, and the prediction module 22 may perform calculation according to the initial feature map to obtain the second prediction map. Since the initial feature map contains the position information and semantic information of the preset target in the sample image, the second prediction map can be used to indicate the bounding box of the preset target in the sample image. Specifically, the prediction module 22 may calculate, but not limited to, the key point position, offset, and size of the preset target in the sample image according to the initial feature map.
  • the initial feature map extracted by the feature extraction module 21 can also be input to the semantic information enhancement module 23, and the semantic information enhancement module 23 performs semantic information enhancement processing on the initial feature map to obtain the first prediction map.
  • The semantic information enhancement module 23 includes at least one sub-module and a two-class prediction module, where the sub-module is used for up-sampling the initial feature map and extracting the semantic information of the preset target in it, and the two-class prediction module is used to indicate the target area and the background area in the first prediction map by binary classification, where the target area is an area containing the target and the background area is an area not containing the target, so as to enhance the semantic information of the first prediction map.
  • the number of the sub-modules also needs to be determined, and the number of the sub-modules is determined by the above-mentioned downsampling multiple.
  • When the feature extraction module 21 extracts the initial feature map of the sample image, it downsamples the sample image by a factor of 2^n, so the number of sub-modules is n, where n is a positive integer.
  • FIG. 3 shows a schematic structural diagram of the semantic information enhancement module 23 in FIG. 2 .
  • a non-limiting description of the semantic enhancement module in FIG. 2 is given below in conjunction with FIG. 3 .
  • FIG. 3 shows a schematic structural diagram of the semantic information enhancement module when the feature extraction module 21 performs 4 times downsampling on the sample image, which includes a first sub-module 31, a second sub-module 32 and a two-class prediction module 33.
  • both the first sub-module 31 and the second sub-module 32 include an upsampling module 34 , a first residual module 35 , a second residual module 36 , and a channel attention module 37 .
  • the upsampling module 34 can be used to upsample the initial feature map, so that the size of the first prediction map obtained by the semantic information enhancement module is consistent with the size of the sample image, so as to be used for training the loss function subsequently.
  • the first residual module 35 can be used to extract more feature information of the preset target in the images output by the upsampling module 34, and can also avoid gradient disappearance while extracting the feature information.
  • The first residual module 35 may include multiple convolutional layers. If each convolutional layer in the first residual module 35 adopts a 1 × 1 convolution kernel, the first residual module 35 can better extract the features of each pixel itself.
  • FIG. 4 shows a schematic structural diagram of the first residual module 35 in FIG. 3 .
  • a non-limiting description of the first residual module in FIG. 3 is given below in conjunction with FIG. 4 .
  • the first residual module 35 includes a first convolutional layer 41, a second convolutional layer 42 and a third convolutional layer 43.
  • the image output by the upsampling module 34 can be input to the first convolutional layer 41.
  • the first convolutional layer 41 can use a 1 ⁇ 1 convolution kernel.
  • the image output by the first convolution layer 41 can be input to the second convolution layer 42, and the second convolution layer 42 can use a 1 ⁇ 1 convolution kernel.
  • the second convolutional layer 42 may be a grouped convolutional layer.
  • the image output by the second convolution layer 42 can be input to the third convolution layer 43, and the third convolution layer 43 can use a 1 ⁇ 1 convolution kernel.
  • the output of the third convolution layer 43 can be added to the output of the first convolution layer 41 to obtain the output of the first residual module 35, which can avoid the problem of gradient disappearance in the detection network model.
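A toy version of this three-layer residual block, with each 1 × 1 convolution written as a per-pixel channel projection. Omitting the grouping of the middle layer and placing a ReLU between layers are our simplifications, not details fixed by the patent.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution on a (C, H, W) map: a linear map over channels
    applied independently at each pixel."""
    return np.einsum('oc,chw->ohw', w, x)

def first_residual(x, w1, w2, w3):
    """Figure-4 style block: three 1x1 convolutions, with the third
    layer's output added to the first layer's output (the skip that
    counters gradient vanishing)."""
    y1 = conv1x1(x, w1)
    y2 = np.maximum(conv1x1(y1, w2), 0)   # ReLU between layers (assumed)
    y3 = conv1x1(y2, w3)
    return y1 + y3                        # residual addition

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
out = first_residual(x, *ws)
print(out.shape)   # (8, 16, 16)
```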
  • The second residual module 36 can also be used to extract more feature information of the preset target from the images output by the upsampling module 34, while likewise avoiding gradient vanishing.
  • The second residual module 36 may include multiple convolution layers, at least one of which uses a 3 × 3 convolution kernel, so the second residual module 36 can better extract the receptive-field features corresponding to each pixel.
  • the second residual module 36 may include a fourth convolutional layer, a fifth convolutional layer, and a sixth convolutional layer.
  • the image output by the upsampling module 34 can be input to the fourth convolution layer, and the fourth convolution layer can use a 1 ⁇ 1 convolution kernel.
  • the image output by the fourth convolutional layer can be input to the fifth convolutional layer, and the fifth convolutional layer can use a 3 ⁇ 3 convolution kernel.
  • the fifth convolutional layer may be a grouped convolutional layer.
  • the image output by the fifth convolutional layer can be input to the sixth convolutional layer, and the sixth convolutional layer can use a 1 ⁇ 1 convolution kernel.
  • the output of the sixth convolutional layer can be added with the output of the fourth convolutional layer to obtain the output of the second residual module 36, which can avoid the problem of gradient disappearance in the detection network model.
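To see why the 3 × 3 kernel matters, the sketch below implements the extreme grouped case of the fifth layer (one 3 × 3 kernel per channel, i.e., depthwise convolution); each output pixel now aggregates its 3 × 3 neighborhood, which is how the block widens the per-pixel receptive field.

```python
import numpy as np

def depthwise_conv3x3(x, k):
    """3x3 convolution with groups == channels (one (3, 3) kernel per
    channel of the (C, H, W) input), zero padding, stride 1."""
    c, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += pad[:, dy:dy + h, dx:dx + w] * k[:, dy, dx][:, None, None]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
k = np.zeros((4, 3, 3))
k[:, 1, 1] = 1.0                  # identity kernel: only the center tap
y = depthwise_conv3x3(x, k)
print(np.allclose(y, x))          # True: identity kernel reproduces input
```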
  • The channel attention module 37 can be used to process the image output by the upsampling module 34 according to the channel attention mechanism. Specifically, for each pixel in the image, an activation function can be used to determine the weight value of each channel: the more relevant a channel is to the preset target, the greater its weight value. The image output by the upsampling module 34 is then weighted by these channel weights, so that the image output by the channel attention module 37 can well represent the semantic information of the preset target with stronger directivity.
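One plausible reading of module 37 is squeeze-and-excitation style attention. The two-layer transform below is our assumption; the patent only requires an activation function that yields per-channel weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention: global average
    pool per channel, a small two-layer transform, sigmoid weights in
    (0, 1), then per-channel rescaling of the (C, H, W) input."""
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    w = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excite: (C,) weights
    return x * w[:, None, None]              # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((4, 8))   # bottleneck to 4 (assumed ratio)
w2 = rng.standard_normal((8, 4))
y = se_attention(x, w1, w2)
print(y.shape)   # (8, 16, 16)
```

Because the weights lie in (0, 1), attention can only attenuate channels here; the "strengthening" the text describes is relative, i.e., relevant channels are attenuated less.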
  • The first residual module 35 and the second residual module 36 are optional. That is, the first sub-module 31 or the second sub-module 32 may include only the upsampling module 34 and the channel attention module 37, or may include the upsampling module 34 and the channel attention module 37 together with the first residual module 35 and/or the second residual module 36, but it is not limited thereto.
  • The two-class prediction module 33 can be used to adjust the number of channels of the image to 2. The two-class prediction module may include a seventh convolution layer whose number of filters is 2, so that the first prediction map output by the two-class prediction module can be used to indicate a target area and a background area, where the target area is an area containing the preset target and the background area is an area not containing the preset target.
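A 1 × 1 seventh convolution layer with 2 filters reduces to a per-pixel linear map to two scores; taking the argmax over the two channels then yields the binary target/background mask. The sketch below uses random weights purely for shape illustration.

```python
import numpy as np

def two_class_prediction(x, w):
    """Seventh convolution layer as a 1x1 convolution with 2 filters,
    giving per-pixel target/background scores; argmax over the two
    channels yields the binary mask the first prediction map encodes."""
    scores = np.einsum('oc,chw->ohw', w, x)    # (2, H, W) scores
    return scores, scores.argmax(axis=0)       # mask: 1 = target area

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64, 64))
scores, mask = two_class_prediction(x, rng.standard_normal((2, 8)))
print(scores.shape, mask.shape)   # (2, 64, 64) (64, 64)
```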
  • the terminal can input the sample image to the detection network model, and the detection network model can extract the initial feature map of the sample image, for example, the sample image can be down-sampled to extract the initial feature map .
  • each pixel of the initial feature map includes multiple channels.
  • the initial feature map may include position information of the preset target in the sample image, and may also include semantic information of the preset target.
  • the downsampling multiple is determined by the structure of the module for extracting the initial feature map, and the downsampling multiple for the sample image is the same as the downsampling multiple for the image to be tested.
  • the initial feature map can be extracted by using the feature extraction module 21 in FIG. 2 .
  • Before extracting the initial feature map, data enhancement may also be performed on the sample image; the data enhancement includes one or more of the following: adjusting the brightness and/or contrast of the sample image, rotating the sample image by a preset angle, and adding noise to the sample image, but it is not limited thereto.
  • the terminal can calculate the key point position, offset, and preset target size of the preset target in the sample image according to the initial feature map of the sample image, so as to obtain the second prediction map of the sample image, and the second prediction map can be used to indicate the bounding box of the preset target in the sample image.
  • semantic information enhancement processing may be performed on the initial feature map to obtain the first prediction map of the sample image.
  • the initial feature map can be transmitted to the semantic information enhancement module 23 in FIG. 2 for semantic information enhancement processing.
  • the initial feature map is obtained by downsampling the sample image
  • performing semantic information enhancement processing on the initial feature map may include: Step 1: upsample the initial feature map by a factor of 2 to obtain the first feature map; Step 2: process the first feature map according to the channel attention mechanism to obtain the second feature map; Step 3: use the second feature map as the new initial feature map; repeat Steps 1 to 3 until the cumulative upsampling multiple is equal to the downsampling multiple; Step 4: perform a first convolution operation on the initial feature map to obtain the first prediction map, wherein the number of channels of the first prediction map is 2.
  • the channel attention mechanism is used for processing, until the up-sampling factor of the initial feature map is the same as the down-sampling factor of the sample image.
  • the number of upsampling operations, or of processing passes through the channel attention mechanism, is determined by the downsampling multiple: if the downsampling multiple is 2^n, the number of upsampling operations (and of channel attention passes) is n, where n is a positive integer.
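The repeat-until-factors-match loop of Steps 1 to 3 can be sketched as follows. This is a minimal illustration assuming nearest-neighbor upsampling; the patent does not specify the interpolation method, and the channel attention step is only marked by a comment.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def restore_resolution(initial_feat, downsample_factor):
    """Repeat 2x upsampling until the cumulative upsampling factor
    equals the downsampling factor 2**n used during extraction."""
    feat = initial_feat
    upsampled = 1
    while upsampled < downsample_factor:
        feat = upsample2x(feat)   # Step 1
        # Step 2: channel attention would be applied to `feat` here
        upsampled *= 2            # Step 3: treat result as the new initial map
    return feat

fmap = np.zeros((28, 28, 64))            # e.g. a 224x224 image downsampled by 8
restored = restore_resolution(fmap, 8)
print(restored.shape)                    # (224, 224, 64)
```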
  • an activation function can be used to determine the weight value of each channel of a pixel: the more relevant a channel is to the preset target, the larger its weight value. Weighted calculation is then performed on the first feature map based on the weight value of each channel to obtain the second feature map, so the second feature map can represent the semantic information of the preset target more clearly and distinctly.
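The per-channel weighting described above can be sketched as a squeeze-and-excitation-style channel attention. The patent only specifies that an activation function produces a weight per channel, so the global-average-pooling "squeeze", the learned projection `w`, and the sigmoid below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w):
    """feat: (H, W, C) first feature map; w: (C, C) learned projection.
    Produces the second feature map by reweighting each channel."""
    squeeze = feat.mean(axis=(0, 1))     # per-channel descriptor, shape (C,)
    weights = sigmoid(w @ squeeze)       # activation -> weight in (0, 1) per channel
    return feat * weights                # broadcast: scale each channel

feat = np.random.rand(8, 8, 4)
out = channel_attention(feat, np.eye(4))
print(out.shape)                         # (8, 8, 4)
```

Since each weight lies in (0, 1), channels weakly related to the target are attenuated while strongly related channels are (relatively) preserved.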
  • a first convolution operation may be performed on the initial feature map; for example, the first convolution operation may be performed using the two-class prediction module 33 in FIG. 3.
  • in the first convolution operation the number of filters used is 2, and the resulting first prediction map has 2 channels. Therefore, the first prediction map can indicate the target area and the background area, where the target area refers to the area of the sample image that includes the preset target, and the background area refers to the area of the sample image that does not include the preset target.
  • FIG. 5 is a schematic diagram of a first prediction map according to an embodiment of the present invention, in which the target area 51 and the target area 52 are areas including the preset target, and the background area 53 does not include the preset target.
  • the size of the initial feature map can be restored to the size of the sample image, and at the same time the important channels in each pixel, that is, those with larger correlation with the preset target, can be screened out. The first prediction map is then obtained through the first convolution operation, so that it can intuitively reflect the target area and the background area of the sample image in a two-class manner and thus contain rich semantic information.
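A 2-filter convolution followed by a per-pixel argmax yields a binary target/background map like the one in FIG. 5. In this sketch the feature map and filter values are invented purely for illustration:

```python
import numpy as np

def two_class_prediction(feat, w):
    """feat: (H, W, C); w: (2, C) filters of the first convolution.
    Returns an (H, W) map with 1 for the target area, 0 for background."""
    logits = np.einsum('hwc,oc->hwo', feat, w)   # first prediction map, 2 channels
    return logits.argmax(axis=-1)

feat = np.zeros((4, 4, 3))
feat[1:3, 1:3, 0] = 5.0                  # a blob in a channel correlated with the target
w = np.array([[0., 1., 1.],              # hypothetical background filter
              [1., 0., 0.]])             # hypothetical target filter
mask = two_class_prediction(feat, w)
print(mask.sum())                        # 4 target pixels
```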
  • multiple second convolution operations may be performed on the first feature map, wherein each second convolution operation may use a 1×1 convolution kernel; the final result of the multiple second convolution operations is added to the first feature map to obtain a new first feature map.
  • batch normalization is performed on the first feature map first, and then the relu activation function is applied, so as to further optimize the detection network model.
  • the number of second convolution operations is three, and the second of these may be calculated in a grouped-convolution manner.
  • the first residual module 35 shown in FIG. 4 may be used to perform multiple second convolution operations.
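A 1×1 convolution is just a per-pixel linear map over channels, so the first residual module can be sketched as three such maps whose result is added back to the input. This is a simplified sketch: batch normalization and the grouped second convolution are omitted, and the filter counts are arbitrary.

```python
import numpy as np

def conv1x1(feat, w):
    """1x1 convolution: (H, W, Cin) x (Cout, Cin) -> (H, W, Cout)."""
    return np.einsum('hwc,oc->hwo', feat, w)

def relu(x):
    return np.maximum(x, 0.0)

def residual_1x1_block(feat, w1, w2, w3):
    """Three successive 1x1 convolutions whose final result is added
    back to the input to obtain the new first feature map."""
    branch = conv1x1(relu(conv1x1(relu(conv1x1(feat, w1)), w2)), w3)
    return feat + branch

c = 4
feat = np.random.rand(6, 6, c)
zero = np.zeros((c, c))
# with all-zero filters the branch vanishes and the block is the identity,
# which makes the residual connection easy to verify
out = residual_1x1_block(feat, zero, zero, zero)
print(np.allclose(out, feat))            # True
```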
  • multiple third convolution operations may be performed on the first feature map, at least one of which uses a 3×3 convolution kernel; the final result of the multiple third convolution operations is added to the first feature map to obtain a new first feature map.
  • third convolution operations other than the first can use the relu activation function and perform batch normalization processing, so as to further optimize the detection network model.
  • the number of third convolution operations is three, wherein the second third convolution operation adopts a 3×3 convolution kernel, and the third convolution operations other than the second use a 1×1 convolution kernel.
  • the second third convolution operation is calculated using grouped convolution, and the other third convolution operations are calculated using ordinary convolution.
  • the second residual module 36 shown in FIG. 3 may be used to perform multiple third convolution operations.
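The grouped calculation mentioned for the second (3×3) third convolution splits the channels into groups and convolves each group independently, cutting the multiply count by the number of groups. The sketch below is an illustrative reimplementation; the group count, channel sizes, and the identity-kernel check are arbitrary choices, not values from the patent.

```python
import numpy as np

def grouped_conv3x3(feat, kernels):
    """Grouped 3x3 convolution, stride 1, zero 'same' padding.
    feat: (H, W, C); kernels: (G, Cg, Cg, 3, 3) with C = G * Cg.
    Each group of Cg channels is convolved independently, which cuts
    the multiplications by a factor of G versus ordinary convolution."""
    h, w, c = feat.shape
    g, cg = kernels.shape[0], kernels.shape[1]
    assert c == g * cg, "channel count must split evenly into groups"
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(feat)
    for gi in range(g):
        sl = slice(gi * cg, (gi + 1) * cg)
        for i in range(h):
            for j in range(w):
                patch = padded[i:i + 3, j:j + 3, sl].transpose(2, 0, 1)  # (Cg, 3, 3)
                out[i, j, sl] = np.einsum('oikl,ikl->o', kernels[gi], patch)
    return out

# Identity kernels (a centered delta per channel) leave the input unchanged,
# a quick sanity check of the indexing.
g, cg = 2, 2
kernels = np.zeros((g, cg, cg, 3, 3))
for gi in range(g):
    for o in range(cg):
        kernels[gi, o, o, 1, 1] = 1.0
feat = np.random.rand(5, 5, 4)
out = grouped_conv3x3(feat, kernels)
print(np.allclose(out, feat))            # True
```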
  • the detection network model is trained according to the first prediction map and the second prediction map to obtain a trained detection network model.
  • the loss function of the detection network model consists of a first loss function and a second loss function.
  • the loss function of the detection network model can be expressed by the following formula: Loss = λsemantic·Lsemantic + λmodel·Lmodel, where Loss is the loss function of the detection network model, Lsemantic is the first loss function, Lmodel is the second loss function, λsemantic is the weight coefficient of the first loss function, and λmodel is the weight coefficient of the second loss function.
  • training the detection network model may include the following steps: Step A: calculate a first loss function value according to the first prediction map, the sample image, and the first loss function, and calculate a second loss function value according to the second prediction map, the sample image, and the second loss function; Step B: calculate the loss function value of the detection network model according to the first loss function value and the second loss function value; Step C: determine whether the loss function value exceeds a preset threshold; if so, adjust the parameters of the module used to extract the initial feature map in the detection network model; if not, end the training of the detection network model; Step D: use the detection network model with adjusted parameters to extract the initial feature map of the sample image, perform semantic information enhancement processing on the initial feature map to obtain the first prediction map, and perform prediction on the sample image according to the initial feature map to obtain the second prediction map;
  • Steps A to E are repeatedly performed until it is determined in Step C that the loss function value does not exceed the preset threshold, that is, until Step C ends the training of the detection network model.
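The training steps above can be summarized as a schematic Python skeleton. The model methods (`extract_features`, `enhance`, `predict_boxes`, the two loss methods, and `update_feature_extractor`) are hypothetical stand-ins, not the patent's actual implementation; the dummy model exists only so the loop can run.

```python
def train_detection_model(model, sample_image, threshold,
                          lam_semantic, lam_model, max_iters=100):
    """Repeat Steps A-D until the combined loss no longer exceeds the threshold."""
    for _ in range(max_iters):
        feat = model.extract_features(sample_image)   # initial feature map
        pred1 = model.enhance(feat)                   # first prediction map
        pred2 = model.predict_boxes(feat)             # second prediction map
        # Step A: the two loss terms
        l_sem = model.semantic_loss(pred1, sample_image)
        l_mod = model.box_loss(pred2, sample_image)
        # Step B: Loss = lam_semantic * L_semantic + lam_model * L_model
        loss = lam_semantic * l_sem + lam_model * l_mod
        # Step C: stop once the loss no longer exceeds the preset threshold
        if loss <= threshold:
            break
        # Step D: adjust the parameters of the feature-extraction module
        model.update_feature_extractor(loss)
    return model

class _DummyModel:
    """Stand-in whose loss halves on every parameter update."""
    def __init__(self):
        self.loss_scale = 1.0
    def extract_features(self, img): return img
    def enhance(self, feat): return feat
    def predict_boxes(self, feat): return feat
    def semantic_loss(self, pred, img): return self.loss_scale
    def box_loss(self, pred, img): return self.loss_scale
    def update_feature_extractor(self, loss): self.loss_scale *= 0.5

m = train_detection_model(_DummyModel(), None, threshold=0.1,
                          lam_semantic=0.5, lam_model=0.5)
print(m.loss_scale)                                   # 0.0625
```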
  • the preset threshold value may be received by the terminal from the outside, or may be determined by the terminal through calculation. It can be seen that, in the process of training the detection network model, by repeatedly adjusting the parameters of the module used to extract the initial feature map (for example, the feature extraction module 21 in FIG. 2), the detection network model can learn sufficient semantic information of the preset target.
  • since the first prediction map indicates the target area and the background area by means of binary classification and thus contains rich semantic information, when calculating the first loss function value according to the first prediction map and the sample image, there is no need to deepen the detection network model in order to make it learn more semantic information.
  • the first loss function may be the FocalLoss function, that is, Lsemantic = -(1 - pt)^γ·log(pt), where pt equals p for the target area and 1 - p otherwise, p is the predicted probability of the detection network model for the preset target, and γ is the focusing parameter.
  • the first loss function is the FocalLoss function
  • the problem of unbalanced positive and negative samples in the sample image can be solved.
  • when the preset target is a small-sized target, since the number of small-sized targets in the sample image is generally small, using the FocalLoss function as the first loss function can avoid the problem of insufficient training due to the small number of small-sized targets.
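A minimal sketch of the focal loss for a single prediction, using the standard form from the focal loss literature, FL(pt) = -(1 - pt)^γ·log(pt); the patent does not state its γ or any α balancing factor, so γ = 2 below is just the common default.

```python
import math

def focal_loss(p, target, gamma=2.0):
    """Focal loss for one binary prediction.
    p: predicted probability of the preset target; target: 1 or 0.
    The (1 - p_t)**gamma factor down-weights easy, well-classified
    examples so abundant background pixels do not dominate training."""
    p_t = p if target == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def cross_entropy(p, target):
    p_t = p if target == 1 else 1.0 - p
    return -math.log(p_t)

# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1)
easy, hard = focal_loss(0.9, 1), focal_loss(0.1, 1)
print(easy < cross_entropy(0.9, 1) and hard < cross_entropy(0.1, 1))  # True
print(hard / easy > cross_entropy(0.1, 1) / cross_entropy(0.9, 1))    # True
```

The second check shows why this helps imbalanced data: the ratio of hard-example loss to easy-example loss is much larger under focal loss than under plain cross-entropy.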
  • the weight coefficient of the first loss function is determined by the proportion of the preset target among all the targets in the sample image: the larger the proportion of the preset target, the larger the weight coefficient of the first loss function.
  • the terminal may first obtain the image to be tested, which may be an image collected by the terminal in real time, an image received from the outside in real time, or an image stored locally in advance, but is not limited thereto.
  • the trained detection network model is used to detect the image to be tested. If it is detected that the image to be tested contains the preset target, the position and range of the preset target can be output.
  • a bounding box (Bounding Box) can be used to mark the position and range of the preset target.
  • when performing multi-target detection, that is, when the preset target has multiple categories, the terminal can also identify the category information of the preset target at the same time.
  • the extracted feature map (which can be extracted, for example, by the feature extraction module 21 in FIG. 2) can contain rich semantic information, so that the detection result of the preset target can be obtained by performing calculation according to the feature map of the image to be tested (for example, using the prediction module 22 in FIG. 2).
  • the feature map of the image to be tested is not subjected to semantic information enhancement processing; that is, the prediction map used to indicate the bounding box of the preset target is calculated directly according to the feature map.
  • FIG. 6 is a schematic structural diagram of a target detection apparatus in an embodiment of the present invention.
  • the target detection apparatus in an embodiment of the present invention may include an acquisition module 61 , a processing module 62 , a training module 63 , and a detection module 64 .
  • the acquisition module 61 is used to acquire a sample image, and the sample image includes a preset target;
  • the processing module 62 is used to extract the initial feature map of the sample image and perform semantic information enhancement processing on the initial feature map to obtain the first prediction map of the sample image;
  • the first prediction map is used to indicate the target area and the background area of the sample image, the target area being the area containing the preset target and the background area being an area that does not contain the preset target;
  • the training module 63 is used to train the detection network model according to the first prediction map and the second prediction map, so as to obtain a trained detection network model, wherein the second prediction map is obtained by calculating the sample image according to the initial feature map, and the second prediction map is used to indicate the bounding box of the preset target;
  • the detection module 64 is used to detect the image to be tested using the trained detection network model, so as to obtain the detection result of the preset target in the image to be tested.
  • for more details on the working principle, working mode, and beneficial effects of the target detection apparatus in the embodiment of the present invention, reference may be made to the above related descriptions of FIGS. 1 to 5, which will not be repeated here.
  • An embodiment of the present invention further provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the target detection method described in FIG. 1 are executed.
  • the storage medium may be a computer-readable storage medium, for example, may include non-volatile memory (non-volatile) or non-transitory (non-transitory) memory, and may also include optical disks, mechanical hard disks, solid-state disks, and the like.
  • An embodiment of the present invention further provides a terminal, including a memory and a processor, where the memory stores computer instructions that can be run on the processor, and the processor executes the steps of the target detection method described in FIG. 1 when running the computer instructions.
  • the terminal may be a terminal device such as a computer, a tablet computer, or a mobile phone, but is not limited thereto.
  • the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • by way of example but not limitation, many forms of RAM are available, such as:
  • SRAM: static random access memory
  • DRAM: dynamic random access memory
  • SDRAM: synchronous dynamic random access memory
  • DDR SDRAM: double data rate synchronous dynamic random access memory
  • ESDRAM: enhanced synchronous dynamic random access memory
  • SLDRAM: synchlink dynamic random access memory
  • DR RAM: direct rambus random access memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A target detection method and apparatus, a storage medium, and a terminal. The method comprises: acquiring a sample image; extracting an initial feature map of the sample image, and performing semantic information enhancement on the initial feature map, so as to obtain a first prediction map of the sample image, the first prediction map being used to indicate a target area and a background area of the sample image, and the target area being an area that comprises a preset target, and the background area being an area that does not comprise the preset target; and training a detection network model according to the first prediction map and a second prediction map, to obtain a trained detection network model, and detecting an image to be tested by using the trained detection network model, to obtain a detection result of the preset target in the image to be tested. In the technical solution of the present invention, a preset target in an image to be tested can be efficiently and accurately detected.

Description

Target detection method and apparatus, storage medium, and terminal
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 30, 2020, with application number 202011373448.X and entitled "Target detection method and apparatus, storage medium, and terminal", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer vision, and in particular, to a target detection method and apparatus, a storage medium, and a terminal.
Background
Target detection is currently a challenging subject in the field of computer vision, and is widely used in many fields such as robot navigation, intelligent video surveillance, industrial inspection, aerospace, and autonomous driving. Owing to the development of related technologies and the needs of industry, the requirements for the efficiency and accuracy of target detection are becoming ever higher.
With the rapid development of deep learning technology, more and more target detection is performed with convolutional neural networks (CNN), which have gradually replaced traditional image processing algorithms. Although the detection accuracy of convolutional neural networks in target detection tasks has repeatedly reached new highs, their detection accuracy for small-sized targets (for example, targets not exceeding a preset size, or targets whose size as a proportion of the image they belong to does not exceed a preset ratio) is not high; the detection accuracy for small-sized targets is usually only half that for normal-sized targets.
Therefore, there is an urgent need for an efficient and accurate target detection method to improve the detection accuracy of small-sized targets.
Summary of the Invention
The technical problem solved by the present invention is to provide an efficient and accurate target detection method, so as to improve the detection accuracy of small-sized targets.
To solve the above technical problem, an embodiment of the present invention provides a target detection method, the method including: acquiring a sample image, where the sample image includes a preset target; extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate a target area and a background area of the sample image, the target area being an area that contains the preset target and the background area being an area that does not contain the preset target; training a detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is obtained by performing calculation on the sample image according to the initial feature map and is used to indicate a bounding box of the preset target; and detecting an image to be tested using the trained detection network model to obtain a detection result of the preset target in the image to be tested.
Optionally, each pixel of the initial feature map includes multiple channels, and the initial feature map is obtained by downsampling the sample image; performing semantic information enhancement processing on the initial feature map then includes: Step 1: upsampling the initial feature map by a factor of 2 to obtain a first feature map; Step 2: processing the first feature map according to a channel attention mechanism to obtain a second feature map; Step 3: taking the second feature map as a new initial feature map, and repeating Steps 1 to 3 until the upsampling multiple is equal to the downsampling multiple; Step 4: performing a first convolution operation on the initial feature map to obtain the first prediction map, where the number of channels of the first prediction map is 2.
Optionally, before the first feature map is processed according to the channel attention mechanism, the method further includes: performing multiple second convolution operations on the first feature map, each second convolution operation using a 1×1 convolution kernel; and adding the results of the multiple second convolution operations to the first feature map to obtain a new first feature map.
Optionally, before the first feature map is processed according to the channel attention mechanism, the method further includes: performing multiple third convolution operations on the first feature map, at least one of the multiple third convolution operations using a 3×3 convolution kernel; and adding the results of the multiple third convolution operations to the first feature map to obtain a new first feature map.
Optionally, performing multiple second convolution operations on the first feature map includes: in each second convolution operation other than the first, first performing batch normalization on the first feature map, and then applying the relu activation function.
Optionally, the loss function of the detection network model is Loss = λsemantic·Lsemantic + λmodel·Lmodel, where Loss is the loss function of the detection network model, Lsemantic is the first loss function, Lmodel is the second loss function, λsemantic is the weight coefficient of the first loss function, and λmodel is the weight coefficient of the second loss function; training the detection network model according to the first prediction map then includes: Step A: calculating a first loss function value according to the first prediction map, the sample image, and the first loss function, and calculating a second loss function value according to the second prediction map, the sample image, and the second loss function; Step B: calculating the loss function value of the detection network model according to the first loss function value and the second loss function value; Step C: determining whether the loss function value exceeds a preset threshold; if so, adjusting the parameters of the module used to extract the initial feature map in the detection network model; if not, ending the training of the detection network model; Step D: extracting the initial feature map of the sample image using the detection network model with adjusted parameters, performing semantic information enhancement processing on the initial feature map to obtain the first prediction map, and performing prediction on the sample image according to the initial feature map to obtain the second prediction map; and repeating Steps A to E until it is determined in Step C that the loss function value does not exceed the preset threshold.
Optionally, the first loss function is the FocalLoss function.
Optionally, the weight coefficient of the first loss function is determined by the proportion of the preset target among all the targets in the sample image: the larger the proportion of the preset target, the larger the weight coefficient of the first loss function.
Optionally, before extracting the initial feature map of the sample image, the method further includes: performing data enhancement on the sample image, the data enhancement including one or more of the following: adjusting the brightness and/or contrast of the sample image, rotating the sample image by a preset angle, and adding noise to the sample image.
Optionally, the detection network model is a single-step detection network model.
To solve the above technical problem, an embodiment of the present invention further provides a target detection apparatus, the apparatus including: an acquisition module, configured to acquire a sample image, where the sample image includes a preset target; a processing module, configured to extract an initial feature map of the sample image and perform semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate a target area and a background area of the sample image, the target area being an area that contains the preset target and the background area being an area that does not contain the preset target; a training module, configured to train a detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is obtained by performing calculation on the sample image according to the initial feature map and is used to indicate a bounding box of the preset target; and a detection module, configured to detect an image to be tested using the trained detection network model to obtain a detection result of the preset target in the image to be tested.
An embodiment of the present invention further provides a storage medium on which a computer program is stored, where the computer program, when run by a processor, executes the steps of the above target detection method.
An embodiment of the present invention further provides a terminal, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor executes the steps of the above target detection method when running the computer program.
与现有技术相比,本发明实施例的技术方案具有以下有益效果:Compared with the prior art, the technical solutions of the embodiments of the present invention have the following beneficial effects:
本发明实施例提供一种目标检测方法,所述方法包括:获取样本图像,所述样本图像包括预设目标;提取所述样本图像的初始特征图,并对所述初始特征图进行语义信息增强处理,以得到所述样本图像的第一预测图,所述第一预测图用于指示所述样本图像的目标区域和背景区域,所述目标区域为包含所述预设目标的区域,所述背景区域为未包含所述预设目标的区域;根据所述第一预测图和第二预测图对检 测网络模型进行训练,以得到训练后的检测网络模型,其中,所述第二预测图是根据所述初始特征图对所述样本图像进行计算得到的,所述第二预测图用于指示所述预设目标的边界框;采用训练后的检测网络模型检测待测图像,以得到所述待测图像中所述预设目标的检测结果。本发明实施例的方案中,采用样本图像训练检测网络模型时,首先提取样本图像的初始特征图,然后对初始特征图进行语义信息增强处理,得到能够指示样本图像中的目标区域和背景区域的第一预测图,再根据第一预测图和能够指示预设目标边界框的第二预测图训练检测网络模型,以得到训练后的检测网络模型。由于第一预测图能够指示样本图像中的目标区域和背景区域,因此,第一预测图能够包含较多的样本图像中预设目标的语义信息,采用第一预测图训练检测网络模型,可以使得该检测网络模型很好地学习到预设目标的语义信息,在后续检测待测图像时,能够更准确地得到预设目标的检测结果,而且检测效率更高。An embodiment of the present invention provides a target detection method, the method includes: acquiring a sample image, where the sample image includes a preset target; extracting an initial feature map of the sample image, and performing semantic information enhancement on the initial feature map process to obtain a first prediction map of the sample image, the first prediction map is used to indicate the target area and background area of the sample image, the target area is the area containing the preset target, the The background area is an area that does not contain the preset target; the detection network model is trained according to the first prediction map and the second prediction map to obtain a trained detection network model, wherein the second prediction map is Calculated on the sample image according to the initial feature map, the second prediction map is used to indicate the bounding box of the preset target; the trained detection network model is used to detect the image to be tested to obtain the The detection result of the preset target in the image to be tested. 
In the solution of the embodiment of the present invention, when using the sample image to train the detection network model, the initial feature map of the sample image is first extracted, and then the semantic information enhancement processing is performed on the initial feature map to obtain the target area and the background area in the sample image. the first prediction map, and then train the detection network model according to the first prediction map and the second prediction map capable of indicating the preset target bounding box, so as to obtain the trained detection network model. Since the first prediction map can indicate the target area and the background area in the sample image, the first prediction map can contain more semantic information of the preset target in the sample image. Using the first prediction map to train the detection network model can make The detection network model learns the semantic information of the preset target well, and when the image to be tested is subsequently detected, the detection result of the preset target can be obtained more accurately, and the detection efficiency is higher.
进一步,本发明实施例中,对初始特征图进行语义信息增强处理时,对初始特征图进行上采样并根据通道注意力机制对其进行处理,直至上采样倍数与提取初始特征图时下采样的倍数相同。通过上采样处理,可以使得第一预测图的大小与样本图像相同,以便后续根据第一预测图和样本图像训练损失函数。初始特征图中每个像素点包含多个通道,采用通道注意力机制进行语义信息增强处理,强化与预设目标相关性大的通道,弱化与预设目标相关性小的通道,最后对初始特征图进行卷积运算,将第一预测图的通道数调整为2,使得第一预测图能够采用二分类的方式(也即,指示目标区域和背景区域)直观地体现预设目标的语义信息。Further, in the embodiment of the present invention, when the semantic information enhancement processing is performed on the initial feature map, the initial feature map is up-sampled and processed according to the channel attention mechanism, until the up-sampling multiple is equal to the down-sampling multiple when extracting the initial feature map. same. Through the upsampling process, the size of the first prediction map can be made the same as the sample image, so that the loss function can be subsequently trained according to the first prediction map and the sample image. Each pixel in the initial feature map contains multiple channels, and the channel attention mechanism is used to enhance semantic information, strengthen the channel with high correlation with the preset target, weaken the channel with small correlation with the preset target, and finally analyze the initial feature. Convolution operation is performed on the image, and the number of channels of the first prediction image is adjusted to 2, so that the first prediction image can intuitively reflect the semantic information of the preset target in a two-class manner (ie, indicating the target area and the background area).
Further, in an embodiment of the present invention, the weight coefficient of the first loss function is determined by the proportion of the preset target among all targets in the sample image: the larger this proportion, the larger the weight coefficient of the first loss function. In other words, the first prediction map obtained through the semantic information enhancement processing plays a greater role in training the detection network model, so that the detection network model achieves good performance in detecting the preset target.
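The weighting described above can be sketched as follows. This is a minimal illustration, not the patent's exact formula: the linear weighting scheme and the use of the raw target-count ratio as the weight coefficient are assumptions made for the example.

```python
# Minimal sketch of weighting the two training losses: the first loss (from
# the first prediction map) receives a weight proportional to the preset
# target's share of all targets in the sample image. The linear combination
# below is an illustrative assumption, not the patent's stated formula.

def total_loss(loss1, loss2, num_preset_targets, num_all_targets):
    w1 = num_preset_targets / num_all_targets  # larger share -> larger weight
    return w1 * loss1 + loss2

print(total_loss(0.8, 0.5, 3, 4))  # weight 0.75 -> 0.75 * 0.8 + 0.5 = 1.1
```

Under this scheme, a sample image in which most targets are of the preset type makes the semantic-enhancement branch dominate the training signal, matching the behavior described above.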
Description of Drawings
FIG. 1 is a schematic flowchart of a target detection method in an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a detection network model to which a target detection method in an embodiment of the present invention is applicable.
FIG. 3 is a schematic structural diagram of the semantic information enhancement module in FIG. 2.
FIG. 4 is a schematic structural diagram of the first residual module in FIG. 3.
FIG. 5 is a schematic diagram of a first prediction map in an embodiment of the present invention.
FIG. 6 is a schematic structural diagram of a target detection apparatus in an embodiment of the present invention.
Detailed Description
As described in the background art, there is an urgent need for an efficient and accurate target detection method that improves the detection accuracy of small-size targets.
Through research, the inventors of the present invention found that, in the prior art, the convolutional neural networks used for small-size target detection mainly include Feature Pyramid Networks (FPN), Generative Adversarial Networks (GAN), and Scale Normalization for Image Pyramids (SNIP). FPN fuses features at different scales to obtain more information about small-size targets in an image; GAN improves detection accuracy by restoring the image information of small targets; and SNIP, on the basis of multi-scale training, back-propagates gradients only for targets whose scale matches the pre-training scale, thereby improving detection accuracy.
Regardless of which convolutional neural network structure is used, improving detection performance for small-size targets requires that the network fully learn the semantic information of such targets during training. However, because small-size targets occupy a small proportion of the image, are often blurry, and have low resolution, the semantic information that a convolutional neural network can extract while learning their feature information is very limited. The network's ability to express the feature information of small-size targets is therefore weak.
To make a convolutional neural network obtain more semantic information of small-size targets, the network depth is usually increased, that is, more convolutional layers are added so that the network acquires more semantic information of small-size targets during training. However, this approach requires a large increase in the number of convolutional layers, resulting in a complex and deep network structure that takes a long time to detect small-size targets, so the convolutional neural network performs poorly in practical small-size target detection applications.
To solve the above technical problem, an embodiment of the present invention provides a target detection method. The method includes: acquiring a sample image, where the sample image includes a preset target; extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate a target region and a background region of the sample image, the target region being a region that contains the preset target and the background region being a region that does not contain the preset target; training a detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is computed from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target; and detecting an image to be tested with the trained detection network model to obtain a detection result of the preset target in the image to be tested.
In the solution of this embodiment, when the sample image is used to train the detection network model, the initial feature map of the sample image is first extracted, semantic information enhancement processing is then performed on the initial feature map to obtain the first prediction map indicating the target region and the background region in the sample image, and the detection network model is then trained according to the first prediction map and the second prediction map capable of indicating the bounding box of the preset target, so as to obtain a trained detection network model. Since the first prediction map can indicate the target region and the background region in the sample image, it can contain rich semantic information of the preset target. Training the detection network model with the first prediction map therefore enables the model to learn the semantic information of the preset target well, so that the detection result of the preset target can be obtained quickly and accurately when an image to be tested is subsequently detected.
In order to make the above objects, features, and beneficial effects of the present invention more comprehensible, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a target detection method in an embodiment of the present invention. The target detection method may be performed by a terminal, which may be any suitable terminal, such as a mobile phone, a computer, or an Internet of Things device, but is not limited thereto. The method may be used to detect whether an image to be tested contains a preset target, or to detect the specific position and category of the preset target in the image to be tested, but is not limited thereto. The image to be tested may be an image captured by the terminal in real time, an image pre-stored on the terminal, or an image received by the terminal from the outside, but is not limited thereto. The preset target may be determined by the terminal according to an instruction received in advance from the outside, or determined by the terminal by recognizing sample images with any suitable model.
The target detection method shown in FIG. 1 may specifically include the following steps:
Step S101: acquiring a sample image, where the sample image includes a preset target;
Step S102: extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate a target region and a background region of the sample image, the target region being a region that contains the preset target and the background region being a region that does not contain the preset target;
Step S103: training a detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is computed from the sample image according to the initial feature map and is used to indicate the bounding box of the preset target;
Step S104: detecting an image to be tested with the trained detection network model to obtain a detection result of the preset target in the image to be tested.
In a specific implementation of step S101, the terminal may acquire sample images from the outside, or select at least a part of a locally stored training set as sample images, where the sample images may include the preset target.
Further, the preset target refers to a specific target object, for example, a traffic sign, a license plate, or a human face. The preset target may be determined by the terminal according to an instruction received in advance from the outside, or determined by the terminal by recognizing sample images with any suitable model.
In addition, further conditions may be imposed on the preset target. For example, on the basis of a specific target object, its size may be required not to exceed a preset size, or not to exceed a preset proportion of the size of the image, but this is not limiting. The preset size and preset proportion may be set in advance.
Further, the sample image may include an identification graphic, which is used to indicate the position of the preset target in the sample image and may also indicate the category of the preset target. For example, in a multi-target detection scenario (that is, when there are multiple preset targets), identification graphics of different shapes may represent different categories of preset targets.
In a specific implementation of step S102, before the detection network model is trained, it needs to be constructed; the detection network model may have any suitable structure.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a detection network model to which a target detection method in an embodiment of the present invention is applicable. A non-limiting description of this model is given below with reference to FIG. 2.
The detection network model shown in FIG. 2 may include a feature extraction module 21, a prediction module 22, and a semantic information enhancement module 23.
Further, the detection network model may be a single-step detection network model (that is, a model that obtains the detection result of the preset target in a single stage after the image to be tested is fed into the network, without a candidate-region proposal stage), a two-step detection network model (that is, a model that first generates multiple candidate regions from the input image and then classifies the candidate regions), or any other suitable network model; no limitation is imposed here. As a non-limiting example, the detection network model is a single-step detection network model.
Further, the feature extraction module 21 may be used to extract features from the sample image to obtain an initial feature map of the sample image. The initial feature map may include position information of the preset target in the sample image, and may also include semantic information of the preset target.
Further, the feature extraction module 21 may obtain the initial feature map by down-sampling the sample image multiple times. For example, the feature extraction module 21 down-samples the sample image by a factor of 2^n to obtain the initial feature map, where n is a positive integer.
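The spatial effect of this 2^n-fold down-sampling can be sketched as follows; the helper name and the assumption that the image dimensions are divisible by 2^n are illustrative, not from the patent.

```python
# Minimal sketch: spatial size of the initial feature map after the feature
# extraction module down-samples an H x W sample image by a factor of 2^n.
# Assumes H and W are multiples of 2^n (a common convention, assumed here).

def feature_map_size(h, w, n):
    factor = 2 ** n
    return h // factor, w // factor

print(feature_map_size(512, 512, 2))  # 4x down-sampling -> (128, 128)
```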
Further, the initial feature map extracted by the feature extraction module 21 may be transmitted to the prediction module 22, which may perform computations on it to obtain the second prediction map. Since the initial feature map contains the position information and semantic information of the preset target in the sample image, the second prediction map may be used to indicate the bounding box of the preset target in the sample image. Specifically, the prediction module 22 may compute, from the initial feature map, the key-point positions, offsets, and sizes of the preset target in the sample image, but is not limited thereto.
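One way key points, offsets, and sizes can be turned into a bounding box is sketched below. The patent does not fix the exact decoding formula; the CenterNet-style decoding used here (grid-cell center plus sub-pixel offset, scaled by the down-sampling stride, then expanded by the predicted size) is an illustrative assumption.

```python
# Minimal sketch of recovering a bounding box from the quantities the
# prediction module computes: a key point (cx, cy) on the down-sampled grid,
# a sub-pixel offset (dx, dy), and a target size (w, h). The decoding scheme
# is an assumption; the patent only names the predicted quantities.

def decode_box(cx, cy, dx, dy, w, h, stride):
    """Map a grid-cell key point back to image coordinates, then expand by size."""
    x = (cx + dx) * stride
    y = (cy + dy) * stride
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

print(decode_box(10, 8, 0.5, 0.25, 32, 16, stride=4))  # (26.0, 25.0, 58.0, 41.0)
```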
Further, the initial feature map extracted by the feature extraction module 21 may also be input to the semantic information enhancement module 23, which performs semantic information enhancement processing on it to obtain the first prediction map. The semantic information enhancement module 23 includes at least one sub-module and a two-class prediction module. The sub-modules are used to up-sample the initial feature map and to extract the semantic information of the preset target from it, and the two-class prediction module is used to indicate, by means of binary classification, the target region and the background region (the region that contains a target and the region that does not, respectively), so as to enhance the semantic information of the prediction.
Further, when the detection network model is constructed, the number of sub-modules also needs to be determined; it is determined by the down-sampling factor described above. Specifically, if the feature extraction module 21 down-samples the sample image by a factor of 2^n when extracting the initial feature map, the number of sub-modules is n, where n is a positive integer.
Referring to FIG. 3, FIG. 3 shows a schematic structural diagram of the semantic information enhancement module 23 in FIG. 2. A non-limiting description of the module is given below with reference to FIG. 3.
FIG. 3 shows the structure of the semantic information enhancement module when the feature extraction module 21 down-samples the sample image by a factor of 4; it includes a first sub-module 31, a second sub-module 32, and a two-class prediction module 33.
Further, the first sub-module 31 and the second sub-module 32 each include an up-sampling module 34, a first residual module 35, a second residual module 36, and a channel attention module 37.
Further, the up-sampling module 34 may be used to up-sample the initial feature map so that the first prediction map obtained through the semantic information enhancement module has the same size as the sample image, for subsequent use in training the loss function.
Further, the first residual module 35 may be used to extract more feature information of the preset target from the image output by the up-sampling module 34, and can also avoid vanishing gradients while extracting feature information. The first residual module 35 may include multiple convolutional layers; since each convolutional layer in it uses a 1×1 convolution kernel, the first residual module 35 can better extract the features of each individual pixel.
FIG. 4 shows a schematic structural diagram of the first residual module 35 in FIG. 3. A non-limiting description of the first residual module is given below with reference to FIG. 4.
The first residual module 35 includes a first convolutional layer 41, a second convolutional layer 42, and a third convolutional layer 43. The image output by the up-sampling module 34 may be input to the first convolutional layer 41, which may use a 1×1 convolution kernel. The image output by the first convolutional layer 41 may be input to the second convolutional layer 42, which may also use a 1×1 convolution kernel; as a non-limiting example, the second convolutional layer 42 may be a grouped convolutional layer. The image output by the second convolutional layer 42 may be input to the third convolutional layer 43, which may use a 1×1 convolution kernel.
Further, the output of the third convolutional layer 43 may be added to the output of the first convolutional layer 41 to obtain the output of the first residual module 35, which avoids the vanishing-gradient problem in the detection network model.
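The structure described above (three 1×1 convolutions with a skip connection from the first layer's output) can be sketched in plain Python. The weight shapes are illustrative assumptions, and the grouping of the second layer is omitted for brevity; a 1×1 convolution reduces to a per-pixel linear combination across channels.

```python
# Minimal sketch of the first residual module over a feature map stored as
# nested lists [channel][row][col]. Weights are illustrative assumptions;
# the grouped structure of the second layer is omitted here.

def conv1x1(fmap, weights):
    """1x1 convolution: a per-pixel linear combination across channels.
    weights[o][i] maps input channel i to output channel o."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(weights[o][i] * fmap[i][y][x] for i in range(c_in))
              for x in range(w)] for y in range(h)]
            for o in range(len(weights))]

def first_residual_module(fmap, w41, w42, w43):
    """conv41 -> conv42 -> conv43, then add conv43's output to conv41's."""
    out41 = conv1x1(fmap, w41)
    out43 = conv1x1(conv1x1(out41, w42), w43)
    c, h, w = len(out41), len(out41[0]), len(out41[0][0])
    return [[[out41[o][y][x] + out43[o][y][x] for x in range(w)]
             for y in range(h)] for o in range(c)]
```

The final addition is the skip connection: even if the stacked convolutions contribute little, the gradient can still flow through the identity path, which is what prevents it from vanishing.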
Further, continuing to refer to FIG. 3, the second residual module 36 may also be used to extract more feature information of the preset target from the image output by the up-sampling module 34, and can likewise avoid vanishing gradients while extracting more feature information. The second residual module 36 may include multiple convolutional layers, at least one of which uses a 3×3 convolution kernel, so the second residual module 36 can better extract features of the receptive field corresponding to each pixel. In a non-limiting embodiment, the second residual module 36 may include a fourth convolutional layer, a fifth convolutional layer, and a sixth convolutional layer. The image output by the up-sampling module 34 may be input to the fourth convolutional layer, which may use a 1×1 convolution kernel; its output may be input to the fifth convolutional layer, which may use a 3×3 convolution kernel and, as a non-limiting example, may be a grouped convolutional layer; and the output of the fifth convolutional layer may be input to the sixth convolutional layer, which may use a 1×1 convolution kernel.
Further, the output of the sixth convolutional layer may be added to the output of the fourth convolutional layer to obtain the output of the second residual module 36, which avoids the vanishing-gradient problem in the detection network model.
Further, the channel attention module 37 may be used to process the image output by the up-sampling module 34 according to a channel attention mechanism. Specifically, for each pixel of the image, an activation function may be used to determine a weight value for each channel; the more relevant a channel is to the preset target, the larger its weight value. The image output by the up-sampling module 34 is then weighted by the channel weights, so that the image output by the channel attention module 37 can well represent the semantic information of the preset target, with stronger directivity.
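A minimal channel-attention step can be sketched as follows. The patent only states that an activation function produces per-channel weights; deriving each weight from the sigmoid of the channel's mean (a squeeze-and-excitation-style scheme without learned layers) is an illustrative assumption.

```python
import math

# Minimal sketch of channel attention over a [channel][row][col] feature map.
# How the per-channel scores are computed is an assumption: here each
# channel's weight is the sigmoid of its mean activation, so channels with
# stronger responses (assumed more target-relevant) are strengthened.

def channel_attention(fmap):
    weighted = []
    for channel in fmap:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        w = 1.0 / (1.0 + math.exp(-mean))  # sigmoid -> weight in (0, 1)
        weighted.append([[w * v for v in row] for row in channel])
    return weighted
```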
It should be noted that the first residual module 35 and the second residual module 36 are optional; that is, the first sub-module 31 or the second sub-module 32 may include only the up-sampling module 34 and the channel attention module 37, or may include the up-sampling module 34 and the channel attention module 37 together with the first residual module 35 and/or the second residual module 36, but is not limited thereto.
Further, the two-class prediction module 33 may be used to adjust the number of channels of the image to 2. The two-class prediction module may contain a seventh convolutional layer whose number of filters is 2, so that the first prediction map output by the two-class prediction module can indicate the target region and the background region, where the target region is a region containing the preset target and the background region is a region not containing the preset target.
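The two-class prediction step can be sketched as a 1×1 convolution with two filters followed by a per-pixel argmax. The filter weights and the argmax read-out are illustrative assumptions; the patent specifies only that the layer has 2 filters so the output has 2 channels.

```python
# Minimal sketch of the two-class prediction module: two 1x1 filters turn a
# C-channel feature map into a 2-channel score map, and a per-pixel argmax
# yields a target/background mask. Filter weights are illustrative.

def two_class_predict(fmap, filters):
    """fmap: [channel][row][col]; filters: two weight vectors over channels.
    Returns a [row][col] mask: 1 = target region, 0 = background region."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    scores = [[[sum(filters[k][i] * fmap[i][y][x] for i in range(c))
                for x in range(w)] for y in range(h)] for k in range(2)]
    return [[1 if scores[1][y][x] > scores[0][y][x] else 0
             for x in range(w)] for y in range(h)]
```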
Continuing to refer to FIG. 1, after the detection network model is constructed, the terminal may input the sample image into it, and the detection network model may extract the initial feature map of the sample image, for example by down-sampling the sample image. Each pixel of the initial feature map includes multiple channels. The initial feature map may include position information of the preset target in the sample image, and may also include semantic information of the preset target.
It should be noted that the down-sampling factor is determined by the structure of the module that extracts the initial feature map, and the down-sampling factor applied to the sample image is the same as that applied to the image to be tested. As a further example, the feature extraction module 21 in FIG. 2 may be used to extract the initial feature map.
Further, before the initial feature map of the sample image is extracted, data enhancement may be performed on the sample image. The data enhancement includes one or more of the following: adjusting the brightness and/or contrast of the sample image, rotating the sample image by a preset angle, and adding noise to the sample image, but is not limited thereto.
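The three listed augmentations can be sketched as follows on a grayscale image stored as nested lists. The parameter names, the 90-degree rotation, and the uniform-noise model are illustrative choices, not from the patent.

```python
import random

# Minimal sketch of the listed augmentations on a grayscale image stored as
# [row][col] values in [0, 255]. All parameter choices are illustrative.

def adjust_brightness_contrast(img, alpha=1.0, beta=0.0):
    """out = clamp(alpha * pixel + beta): alpha scales contrast, beta shifts brightness."""
    return [[min(255.0, max(0.0, alpha * v + beta)) for v in row] for row in img]

def rotate_90(img):
    """Rotate the image by a preset angle of 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def add_noise(img, scale=5.0, seed=0):
    """Add uniform noise in [-scale, scale], clamped to the valid range."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, v + rng.uniform(-scale, scale))) for v in row]
            for row in img]
```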
Further, the terminal may compute, from the initial feature map of the sample image, the key-point positions, offsets, and sizes of the preset target in the sample image to obtain the second prediction map of the sample image, which may be used to indicate the bounding box of the preset target in the sample image.
Further, semantic information enhancement processing may be performed on the initial feature map to obtain the first prediction map of the sample image. For example, the initial feature map may be transmitted to the semantic information enhancement module 23 in FIG. 2 for semantic information enhancement processing.
In a non-limiting embodiment, where the initial feature map is obtained by down-sampling the sample image, performing semantic information enhancement processing on the initial feature map may include: Step 1: up-sampling the initial feature map by a factor of 2 to obtain a first feature map; Step 2: processing the first feature map according to the channel attention mechanism to obtain a second feature map; Step 3: taking the second feature map as a new initial feature map; repeating Steps 1 to 3 until the cumulative up-sampling factor equals the down-sampling factor; and Step 4: performing a first convolution operation on the initial feature map to obtain the first prediction map, where the number of channels of the first prediction map is 2.
Specifically, the image obtained from each up-sampling is processed with the channel attention mechanism until the factor by which the initial feature map has been up-sampled equals the factor by which the sample image was down-sampled. The number of up-sampling steps, and likewise the number of channel-attention steps, is determined by the down-sampling factor: if the down-sampling factor is 2^n, the number of up-sampling steps (and of channel-attention steps) is n, where n is a positive integer.
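The repeated Steps 1 to 3 can be sketched as a loop. Nearest-neighbor up-sampling and an identity placeholder for the attention step are illustrative assumptions; the patent does not fix the interpolation method.

```python
# Minimal sketch of the enhancement loop: repeat (2x up-sampling followed by
# channel attention) n times for a 2^n down-sampling factor, restoring the
# feature map to the sample image's spatial size. Nearest-neighbor
# up-sampling and the identity attention default are illustrative.

def upsample2x(fmap):
    """Nearest-neighbor 2x up-sampling of a [channel][row][col] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            doubled = [v for v in row for _ in (0, 1)]  # duplicate each column
            rows.append(doubled)
            rows.append(list(doubled))                  # duplicate each row
        out.append(rows)
    return out

def enhance(fmap, n, attention=lambda f: f):
    """Steps 1-3 repeated until the up-sampling factor reaches 2^n."""
    for _ in range(n):
        fmap = attention(upsample2x(fmap))
    return fmap

big = enhance([[[1.0]]], n=2)  # a 1x1 map restored to 4x4
```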
Further, when the channel attention mechanism is used to process the first feature map, for each pixel of the first feature map, an activation function may be used to determine a weight value for each channel of the pixel; the more relevant a channel is to the preset target, the larger its weight value. The first feature map is then weighted by the channel weights to obtain the second feature map, so that the second feature map can clearly represent the semantic information of the preset target, with stronger directivity.
Further, when the up-sampling factor equals the down-sampling factor, the first convolution operation may be performed on the initial feature map, for example by the two-class prediction module 33 in FIG. 3. In the first convolution operation, the number of filters used is 2 and the resulting first prediction map has 2 channels; the first prediction map can therefore indicate the target region (a region of the sample image that contains the preset target) and the background region (a region of the sample image that does not). Referring to FIG. 5, FIG. 5 is a schematic diagram of a first prediction map in an embodiment of the present invention, in which target region 51 and target region 52 are regions containing the preset target, while background region 53 does not contain the preset target.
Thus, through the semantic information enhancement processing, the initial feature map is restored to the size of the sample image while the important channels of each pixel (those highly correlated with the preset target) are selected, and the first convolution operation then yields the first prediction map. The first prediction map intuitively expresses the target region and the background region of the sample image in a two-class manner and therefore contains rich semantic information.
Further, continuing to refer to FIG. 1, before the first feature map is processed according to the channel attention mechanism, multiple second convolution operations may also be performed on the first feature map, each using a 1×1 convolution kernel; the final result of the multiple second convolution operations is added to the first feature map to obtain a new first feature map.
进一步地,第一次第二卷积运算之外的其他第二卷积运算中,均先对第一特征图进行批标准化处理,再使用relu激活函数。以使得检测网络模型更加优化。Further, in other second convolution operations other than the first second convolution operation, batch normalization is performed on the first feature map first, and then the relu activation function is used. In order to make the detection network model more optimized.
在一个非限制性实施例中,第二卷积运算的次数为3。其中,对第一特征图进行第二次第二卷积运算时,可以采用分组卷积的方式进行计算。In one non-limiting embodiment, the number of second convolution operations is three. Wherein, when performing the second second convolution operation on the first feature map, the calculation may be performed in a grouped convolution manner.
在另一个非限制性实施例中,可以采用图4所示的第一残差模块35进行多次第二卷积运算。In another non-limiting embodiment, the first residual module 35 shown in FIG. 4 may be used to perform multiple second convolution operations.
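A minimal sketch of the first residual module described above, implementing each 1×1 convolution as a per-pixel linear map over channels. The pre-activation ordering (batch normalization, then ReLU, before each second convolution operation other than the first), the simplified batch normalization, and all shapes are illustrative assumptions; the grouped-convolution variant of the second operation is omitted for brevity.

```python
import numpy as np

def conv1x1(x, w):
    # a 1x1 convolution is a per-pixel linear map over the channel axis
    h, wd, c = x.shape
    return (x.reshape(-1, c) @ w).reshape(h, wd, -1)

def batch_norm(x, eps=1e-5):
    # simplified batch normalization over the spatial dimensions
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def first_residual_module(x, ws):
    """Multiple second convolution operations with 1x1 kernels; every
    operation except the first is preceded by batch normalization and a
    ReLU activation, and the final result is added to the input to give
    the new first feature map."""
    out = conv1x1(x, ws[0])
    for w in ws[1:]:
        out = conv1x1(np.maximum(batch_norm(out), 0.0), w)
    return x + out  # residual addition

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 16))                           # assumed feature map
ws = [0.1 * rng.standard_normal((16, 16)) for _ in range(3)]  # 3 operations
y = first_residual_module(x, ws)
print(y.shape)  # (8, 8, 16)
```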
Further, before the first feature map is processed according to the channel attention mechanism, multiple third convolution operations may also be performed on the first feature map, at least one of which uses a 3×3 convolution kernel; the final result of the multiple third convolution operations is added to the first feature map to obtain a new first feature map.

Further, in the process of performing the multiple third convolution operations, each third convolution operation other than the first one may apply batch normalization and use the ReLU activation function, so that the detection network model is better optimized.

In a non-limiting embodiment, the number of third convolution operations is 3, where the second third convolution operation uses a 3×3 convolution kernel and the other third convolution operations use 1×1 convolution kernels. In addition, the second third convolution operation is computed using grouped convolution, while the other third convolution operations are computed using ordinary convolution.

In another non-limiting embodiment, the second residual module 36 shown in FIG. 3 may be used to perform the multiple third convolution operations.
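The grouped convolution used for the 3×3 operation above reduces its parameter count by the number of groups, since each filter only sees the input channels of its own group. A small sketch of the arithmetic follows; the channel count 64 and group count 8 are assumed values, not taken from the embodiments.

```python
def conv_params(k, c_in, c_out, groups=1):
    """Weight count of a k x k convolution layer; with grouped convolution
    each filter only sees c_in / groups input channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return k * k * (c_in // groups) * c_out

c = 64                                         # assumed channel count
ordinary = conv_params(3, c, c)                # ordinary 3x3 convolution
grouped = conv_params(3, c, c, groups=8)       # grouped 3x3 convolution
print(ordinary, grouped, ordinary // grouped)  # 36864 4608 8
```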
In a specific implementation of step S103, the detection network model is trained according to the first prediction map and the second prediction map to obtain a trained detection network model.

Specifically, the loss function of the detection network model consists of a first loss function and a second loss function, and may be expressed by the following formula:
Loss = λ_semantic · L_semantic + λ_model · L_model,

where Loss is the loss function of the detection network model, L_semantic is the first loss function, L_model is the second loss function, λ_semantic is the weight coefficient of the first loss function, and λ_model is the weight coefficient of the second loss function, with λ_semantic + λ_model = 1.
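The weighted combination above can be sketched as follows; the loss values and λ_semantic = 0.1 are illustrative numbers only (λ_semantic = 0.1 matches the 10% small-target embodiment described later).

```python
def total_loss(l_semantic, l_model, lam_semantic):
    """Loss = λ_semantic * L_semantic + λ_model * L_model, where the two
    weight coefficients sum to 1."""
    lam_model = 1.0 - lam_semantic
    return lam_semantic * l_semantic + lam_model * l_model

# illustrative first and second loss function values
print(round(total_loss(0.8, 0.4, lam_semantic=0.1), 4))  # 0.44
```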
Further, training the detection network model may include the following steps. Step A: calculate a first loss function value according to the first prediction map, the sample image, and the first loss function, and calculate a second loss function value according to the second prediction map, the sample image, and the second loss function. Step B: calculate the loss function value of the detection network model according to the first loss function value and the second loss function value. Step C: determine whether the loss function value exceeds a preset threshold; if so, adjust the parameters of the module used to extract the initial feature map in the detection network model; if not, end the training of the detection network model. Step D: extract the initial feature map of the sample image using the detection network model with adjusted parameters, perform semantic information enhancement processing on the initial feature map to obtain the first prediction map, and perform prediction on the sample image according to the initial feature map to obtain the second prediction map.
Steps A to D are repeated until it is determined in Step C that the loss function value does not exceed the preset threshold, that is, until Step C jumps to ending the training of the detection network model. The preset threshold may be received by the terminal from the outside, or may be determined by the terminal through calculation. It follows that, in the process of training the detection network model, by repeatedly adjusting the parameters of the module used to extract the initial feature map (for example, the feature extraction module 21 in FIG. 2), the detection network model can learn sufficient semantic information of the preset target. Since the first prediction map indicates the target area and the background area by means of binary classification and thus contains rich semantic information, when the first loss function value is calculated according to the first prediction map and the sample image, there is no need to deepen the detection network model for it to learn more semantic information.
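Steps A through D can be sketched as a loop. The model interface below (forward(), losses(), adjust_feature_extractor()) is entirely hypothetical, standing in for the modules of the detection network model; the dummy model only demonstrates the control flow.

```python
def train(model, sample_images, preset_threshold, max_iters=100):
    """Steps A-D: compute the two loss function values, combine them into
    the loss of the detection network model, and keep adjusting the
    feature-extraction parameters until the loss no longer exceeds the
    preset threshold (at which point Step C ends the training)."""
    for _ in range(max_iters):
        first_pred, second_pred = model.forward(sample_images)
        # Step A: first and second loss function values
        l_sem, l_mod = model.losses(first_pred, second_pred, sample_images)
        # Step B: loss function value of the detection network model
        loss = model.lam_semantic * l_sem + (1.0 - model.lam_semantic) * l_mod
        # Step C: end training once the loss does not exceed the threshold
        if loss <= preset_threshold:
            break
        # Step D: adjust the module that extracts the initial feature map
        model.adjust_feature_extractor(loss)
    return model

class DummyModel:
    """Stand-in for the detection network model, for demonstration only."""
    lam_semantic = 0.1
    def __init__(self):
        self.scale = 1.0
    def forward(self, images):
        return None, None               # first and second prediction maps
    def losses(self, first_pred, second_pred, images):
        return self.scale, self.scale   # pretend both losses shrink together
    def adjust_feature_extractor(self, loss):
        self.scale *= 0.5               # pretend each adjustment halves them

m = train(DummyModel(), [], preset_threshold=0.1)
print(m.scale)  # 0.0625
```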
As a non-limiting embodiment, the first loss function may be the Focal Loss function, that is:

L_semantic = FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t = p if y = 1, and p_t = 1 - p if y = 0,

where γ ≥ 0 is the focusing parameter of the Focal Loss; y = 1 indicates that the sample image is a positive sample, that is, the sample image contains the preset target; y = 0 indicates that the sample image is a negative sample, that is, the sample image does not contain the preset target; and p is the probability predicted by the detection network model for the preset target. It should be noted that when the first loss function is the Focal Loss function, the problem of imbalance between positive and negative samples in the sample images can be alleviated. When the preset target is a small-sized target, the number of small-sized targets in the sample images is generally small, so using the Focal Loss function as the first loss function avoids insufficient training caused by the small number of small-sized targets.
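A minimal numpy sketch of the standard Focal Loss under the definitions above; γ = 2.0 is a common value from the Focal Loss literature, not one stated in this description. The well-classified positive (p = 0.9) is strongly down-weighted relative to the harder negative (p = 0.3).

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Focal Loss: p_t = p for positive samples (y = 1) and 1 - p for
    negative samples (y = 0); the (1 - p_t)**gamma factor down-weights
    easy, well-classified examples."""
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

p = np.array([0.9, 0.3])  # predicted probabilities for the preset target
y = np.array([1, 0])      # a positive sample and a negative sample
vals = focal_loss(p, y)
print(np.round(vals, 4))  # [0.0011 0.0321]
```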
Further, the weight coefficient of the first loss function is determined by the proportion of the preset target among all targets in the sample images: the larger this proportion, the larger the weight coefficient of the first loss function, that is, the greater the role played by the first prediction map (obtained through the semantic information enhancement processing) in training the detection network model, so that the detection network model performs well in detecting the preset target. In a non-limiting embodiment, the preset target is a target whose size is smaller than 32×32; if the proportion of the preset target in the sample images is 10%, then λ_semantic = 0.1 and λ_model = 0.9.
In a specific implementation of step S104, the terminal may first acquire an image to be tested. The image to be tested may be collected by the terminal in real time, received from the outside in real time, or pre-stored locally, but is not limited thereto.

Further, the trained detection network model is used to detect the image to be tested. If it is detected that the image to be tested contains the preset target, the position and range of the preset target may be output; for example, the position and range of the preset target may be marked in the image to be tested with a bounding box.

Further, when multi-target detection is performed, that is, when the preset target has multiple categories, the terminal may also identify the category information of the preset target at the same time.

Since the trained detection network model has fully learned the semantic information of the preset target, when the image to be tested is detected, the extracted feature map (which may, for example, be extracted by the feature extraction module 21 in FIG. 2) contains rich semantic information; the detection result of the preset target can then be obtained by computing on the feature map of the image to be tested (for example, by the prediction module 22 in FIG. 2).

It should be noted that, when the image to be tested is detected, no semantic information enhancement processing is performed on its feature map; that is, the prediction map indicating the bounding box of the preset target is computed directly from the feature map.
Referring to FIG. 6, which shows a target detection apparatus in an embodiment of the present invention, the target detection apparatus may include an acquisition module 61, a processing module 62, a training module 63, and a detection module 64.

The acquisition module 61 is configured to acquire a sample image, the sample image including a preset target. The processing module 62 is configured to extract an initial feature map of the sample image and perform semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, where the first prediction map is used to indicate a target area and a background area of the sample image, the target area being an area containing the preset target and the background area being an area not containing the preset target. The training module 63 is configured to train the detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, where the second prediction map is computed from the sample image according to the initial feature map and is used to indicate a bounding box of the preset target. The detection module 64 is configured to detect an image to be tested using the trained detection network model to obtain a detection result of the preset target in the image to be tested.

For more details on the working principle, working mode, and beneficial effects of the target detection apparatus in this embodiment of the present invention, reference may be made to the above descriptions of FIG. 1 to FIG. 5, which will not be repeated here.
An embodiment of the present invention further provides a storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the target detection method described above with reference to FIG. 1 are executed. The storage medium may be a computer-readable storage medium and may include, for example, non-volatile or non-transitory memory, as well as optical discs, mechanical hard disks, solid-state drives, and the like.

An embodiment of the present invention further provides a terminal, including a memory and a processor, where the memory stores computer instructions that can run on the processor, and the processor, when running the computer instructions, executes the steps of the target detection method described above with reference to FIG. 1. The terminal may be a terminal device such as a computer, a tablet computer, or a mobile phone, but is not limited thereto.
Specifically, in this embodiment of the present invention, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The term "plurality" in the embodiments of the present application refers to two or more.

The descriptions such as "first" and "second" in the embodiments of the present application are used only to illustrate and distinguish the described objects; they imply no order, do not indicate any particular limitation on the number of devices in the embodiments of the present application, and do not constitute any limitation on the embodiments of the present application.

Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the protection scope of the present invention shall be subject to the scope defined by the claims.

Claims (13)

  1. A target detection method, characterized in that the method comprises:

    acquiring a sample image, the sample image including a preset target;

    extracting an initial feature map of the sample image, and performing semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, the first prediction map being used to indicate a target area and a background area of the sample image, the target area being an area containing the preset target, and the background area being an area not containing the preset target;

    training a detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, wherein the second prediction map is computed from the sample image according to the initial feature map, and the second prediction map is used to indicate a bounding box of the preset target; and

    detecting an image to be tested using the trained detection network model to obtain a detection result of the preset target in the image to be tested.
  2. The target detection method according to claim 1, characterized in that each pixel of the initial feature map includes a plurality of channels, and the initial feature map is obtained by downsampling the sample image; performing the semantic information enhancement processing on the initial feature map then comprises:

    step one: upsampling the initial feature map by a factor of 2 to obtain a first feature map;

    step two: processing the first feature map according to a channel attention mechanism to obtain a second feature map;

    step three: taking the second feature map as a new initial feature map;

    repeating step one to step three until the upsampling multiple is equal to the downsampling multiple; and

    step four: performing a first convolution operation on the initial feature map to obtain the first prediction map, wherein the number of channels of the first prediction map is 2.
  3. The target detection method according to claim 2, characterized in that, before the first feature map is processed according to the channel attention mechanism, the method further comprises:

    performing multiple second convolution operations on the first feature map, each second convolution operation using a 1×1 convolution kernel; and

    adding the result of the multiple second convolution operations to the first feature map to obtain a new first feature map.
  4. The target detection method according to claim 2 or 3, characterized in that, before the first feature map is processed according to the channel attention mechanism, the method further comprises:

    performing multiple third convolution operations on the first feature map, at least one of the multiple third convolution operations using a 3×3 convolution kernel; and

    adding the result of the multiple third convolution operations to the first feature map to obtain a new first feature map.
  5. The target detection method according to claim 3, characterized in that performing the multiple second convolution operations on the first feature map comprises:

    in each second convolution operation other than the first one, first performing batch normalization on the first feature map and then using the ReLU activation function.
  6. The target detection method according to claim 1, characterized in that the loss function of the detection network model is Loss = λ_semantic · L_semantic + λ_model · L_model, where Loss is the loss function of the detection network model, L_semantic is the first loss function, L_model is the second loss function, λ_semantic is the weight coefficient of the first loss function, and λ_model is the weight coefficient of the second loss function;

    training the detection network model according to the first prediction map then comprises:

    step A: calculating a first loss function value according to the first prediction map, the sample image, and the first loss function, and calculating a second loss function value according to the second prediction map, the sample image, and the second loss function;

    step B: calculating the loss function value of the detection network model according to the first loss function value and the second loss function value;

    step C: determining whether the loss function value exceeds a preset threshold; if so, adjusting the parameters of the module used to extract the initial feature map in the detection network model; if not, ending the training of the detection network model;

    step D: extracting the initial feature map of the sample image using the detection network model with adjusted parameters, performing semantic information enhancement processing on the initial feature map to obtain the first prediction map, and performing prediction on the sample image according to the initial feature map to obtain the second prediction map; and

    repeating step A to step D until it is determined in step C that the loss function value does not exceed the preset threshold.
  7. The target detection method according to claim 6, characterized in that the first loss function is the Focal Loss function.
  8. The target detection method according to claim 6, characterized in that the weight coefficient of the first loss function is determined by the proportion of the preset target among all targets in the sample image, and the larger the proportion of the preset target among all targets, the larger the weight coefficient of the first loss function.
  9. The target detection method according to claim 1, characterized in that, before the initial feature map of the sample image is extracted, the method further comprises:

    performing data enhancement on the sample image, the data enhancement including one or more of the following: adjusting the brightness and/or contrast of the sample image, rotating the sample image by a preset angle, and adding noise to the sample image.
  10. The target detection method according to claim 1, characterized in that the detection network model is a single-step detection network model.
  11. A target detection apparatus, characterized in that the apparatus comprises:

    an acquisition module, configured to acquire a sample image, the sample image including a preset target;

    a processing module, configured to extract an initial feature map of the sample image and perform semantic information enhancement processing on the initial feature map to obtain a first prediction map of the sample image, the first prediction map being used to indicate a target area and a background area of the sample image, the target area being an area containing the preset target, and the background area being an area not containing the preset target;

    a training module, configured to train a detection network model according to the first prediction map and a second prediction map to obtain a trained detection network model, wherein the second prediction map is computed from the sample image according to the initial feature map, and the second prediction map is used to indicate a bounding box of the preset target; and

    a detection module, configured to detect an image to be tested using the trained detection network model to obtain a detection result of the preset target in the image to be tested.
  12. A storage medium having a computer program stored thereon, characterized in that, when the computer program is run by a processor, the steps of the target detection method according to any one of claims 1 to 10 are executed.
  13. A terminal, comprising a memory and a processor, the memory storing a computer program that can run on the processor, characterized in that the processor, when running the computer program, executes the steps of the target detection method according to any one of claims 1 to 10.
PCT/CN2021/131132 — Target detection method and apparatus, storage medium, and terminal — filed 2021-11-17, priority date 2020-11-30, published as WO2022111352A1.

Application claiming priority: CN202011373448.X, filed 2020-11-30, published as CN112446378B (Target detection method and device, storage medium and terminal).

Family publications: CN112446378B (CN); WO2022111352A1 (WO).

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147709A (en) * 2022-07-06 2022-10-04 西北工业大学 Underwater target three-dimensional reconstruction method based on deep learning
CN116055895A (en) * 2023-03-29 2023-05-02 荣耀终端有限公司 Image processing method and related device
CN116071309A (en) * 2022-12-27 2023-05-05 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Method, device, equipment and storage medium for detecting sound scanning defect of component
CN116704206A (en) * 2023-06-12 2023-09-05 中电金信软件有限公司 Image processing method, device, computer equipment and storage medium
CN117649358A (en) * 2024-01-30 2024-03-05 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN117670755A (en) * 2024-01-31 2024-03-08 四川泓宝润业工程技术有限公司 Detection method and device for lifting hook anti-drop device, storage medium and electronic equipment
CN117994251A (en) * 2024-04-03 2024-05-07 华中科技大学同济医学院附属同济医院 Method and system for evaluating severity of diabetic foot ulcer based on artificial intelligence

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446378B (en) * 2020-11-30 2022-09-16 展讯通信(上海)有限公司 Target detection method and device, storage medium and terminal
CN113283453B (en) * 2021-06-15 2023-08-08 深圳大学 Target detection method, device, computer equipment and storage medium
CN114663904A (en) * 2022-04-02 2022-06-24 成都卫士通信息产业股份有限公司 PDF document layout detection method, device, equipment and medium
WO2023221013A1 (en) * 2022-05-19 2023-11-23 中国科学院深圳先进技术研究院 Small object detection method and apparatus based on feature fusion, device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190156144A1 (en) * 2017-02-23 2019-05-23 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN110751134A (en) * 2019-12-23 2020-02-04 长沙智能驾驶研究院有限公司 Target detection method, storage medium and computer device
CN111597945A (en) * 2020-05-11 2020-08-28 济南博观智能科技有限公司 Target detection method, device, equipment and medium
CN111738231A (en) * 2020-08-06 2020-10-02 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN112446378A (en) * 2020-11-30 2021-03-05 展讯通信(上海)有限公司 Target detection method and device, storage medium and terminal

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521592B (en) * 2011-11-30 2013-06-12 苏州大学 Multi-feature fusion salient region extracting method based on non-clear region inhibition
CN106529565B (en) * 2016-09-23 2019-09-13 北京市商汤科技开发有限公司 Model of Target Recognition training and target identification method and device calculate equipment
CN110765948A (en) * 2019-10-24 2020-02-07 长沙品先信息技术有限公司 Target detection and identification method and system based on unmanned aerial vehicle
CN111079632A (en) * 2019-12-12 2020-04-28 上海眼控科技股份有限公司 Training method and device of text detection model, computer equipment and storage medium
CN111259930B (en) * 2020-01-09 2023-04-25 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism
CN111626350B (en) * 2020-05-25 2021-05-18 腾讯科技(深圳)有限公司 Target detection model training method, target detection method and device
CN111914804A (en) * 2020-08-18 2020-11-10 中科弘云科技(北京)有限公司 Multi-angle rotation remote sensing image small target detection method

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147709A (en) * 2022-07-06 2022-10-04 西北工业大学 Underwater target three-dimensional reconstruction method based on deep learning
CN115147709B (en) * 2022-07-06 2024-03-19 西北工业大学 Underwater target three-dimensional reconstruction method based on deep learning
CN116071309A (en) * 2022-12-27 2023-05-05 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Method, device, equipment and storage medium for detecting sound scanning defect of component
CN116071309B (en) * 2022-12-27 2024-05-17 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Method, device, equipment and storage medium for detecting sound scanning defect of component
CN116055895A (en) * 2023-03-29 2023-05-02 荣耀终端有限公司 Image processing method and related device
CN116055895B (en) * 2023-03-29 2023-08-22 荣耀终端有限公司 Image processing method and device, chip system and storage medium
CN116704206A (en) * 2023-06-12 2023-09-05 中电金信软件有限公司 Image processing method, device, computer equipment and storage medium
CN117649358A (en) * 2024-01-30 2024-03-05 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN117649358B (en) * 2024-01-30 2024-04-16 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN117670755A (en) * 2024-01-31 2024-03-08 四川泓宝润业工程技术有限公司 Detection method and device for lifting hook anti-drop device, storage medium and electronic equipment
CN117670755B (en) * 2024-01-31 2024-04-26 四川泓宝润业工程技术有限公司 Detection method and device for lifting hook anti-drop device, storage medium and electronic equipment
CN117994251A (en) * 2024-04-03 2024-05-07 华中科技大学同济医学院附属同济医院 Method and system for evaluating severity of diabetic foot ulcer based on artificial intelligence

Also Published As

Publication number Publication date
CN112446378B (en) 2022-09-16
CN112446378A (en) 2021-03-05

Similar Documents

Publication Title
WO2022111352A1 (en) Target detection method and apparatus, storage medium, and terminal
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN110310264B (en) DCNN-based large-scale target detection method and device
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
WO2021244621A1 (en) Scenario semantic parsing method based on global guidance selective context network
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
WO2016124103A1 (en) Picture detection method and device
CN110443258B (en) Character detection method and device, electronic equipment and storage medium
CN112949767B (en) Sample image increment, image detection model training and image detection method
CN110020658B (en) Salient object detection method based on multitask deep learning
CN110781980B (en) Training method of target detection model, target detection method and device
CN113139543A (en) Training method of target object detection model, target object detection method and device
CN111046971A (en) Image recognition method, device, equipment and computer readable storage medium
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN111062347B (en) Traffic element segmentation method in automatic driving, electronic equipment and storage medium
CN110991412A (en) Face recognition method and device, storage medium and electronic equipment
CN111695397A (en) Ship identification method based on YOLO and electronic equipment
Mu et al. Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN114663654B (en) Improved YOLOv4 network model and small target detection method
CN116468702A (en) Chloasma assessment method, device, electronic equipment and computer readable storage medium
CN112308061B (en) License plate character recognition method and device
CN113989632A (en) Bridge detection method and device for remote sensing image, electronic equipment and storage medium
CN110738225B (en) Image recognition method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21896853
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 21896853
Country of ref document: EP
Kind code of ref document: A1