WO2021083126A1 - Target detection and intelligent driving methods and apparatuses, device, and storage medium - Google Patents


Info

Publication number
WO2021083126A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
different scales
similarity
maps
Prior art date
Application number
PCT/CN2020/123918
Other languages
French (fr)
Chinese (zh)
Inventor
吕书畅
程光亮
石建萍
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201911063316.4A (external priority, patent CN112749602A)
Priority claimed from CN201911054823.1A (external priority, patent CN112749710A)
Application filed by 北京市商汤科技开发有限公司
Priority to JP2021539414A (patent JP2022535473A)
Priority to KR1020217020811A (patent KR20210098515A)
Publication of WO2021083126A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformation in the plane of the image
    • G06T3/40: Scaling the whole image or part thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Definitions

  • This application relates to the field of image processing, and in particular to a method, apparatus, device, and storage medium for target detection and intelligent driving.
  • Single-sample semantic segmentation is an emerging problem in the fields of computer vision and intelligent image processing.
  • Single-sample semantic segmentation aims to use a single training sample of a certain category to give the segmentation model the ability to recognize the pixels of that category.
  • The proposal of single-sample semantic segmentation can effectively reduce the sample collection and annotation costs of traditional image semantic segmentation.
  • Single-sample image semantic segmentation aims to train on only a single sample of a certain category of objects, so that the segmentation model has the ability to recognize all pixels of objects of that category.
  • A target query can locate the target contained in an image by means of image semantic segmentation.
  • Image semantic segmentation includes single-sample image semantic segmentation. Traditional image semantic segmentation requires a large number of training images for all categories of objects to ensure model performance, which brings extremely high labeling costs.
  • The purpose of this application is to provide a target detection and intelligent driving method, apparatus, device, and storage medium, to solve the existing technical problem of low target detection accuracy.
  • A target detection method is provided, which includes: performing feature extraction at a plurality of different scales on a first image and a second image, respectively, to obtain a plurality of first feature maps of different scales and a plurality of second feature maps of different scales; and determining the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales. The label of the first image is the result of labeling the target to be queried contained in the first image.
  • An intelligent driving method is provided, which includes: collecting road images; using the target detection method described above to query the collected road images for the target to be queried according to a support image and the label of the support image, wherein the label of the support image is the result of labeling the target contained in the support image that is of the same category as the target to be queried; and controlling, according to the query result, the intelligent driving device that collects the road images.
  • A target detection apparatus is provided, which includes a feature extraction module and a determination module. The feature extraction module is used to perform feature extraction at a plurality of different scales on a first image and a second image, respectively, to obtain a plurality of first feature maps of different scales and a plurality of second feature maps of different scales. The determination module is used to determine the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
  • An intelligent driving apparatus is provided, which includes: a collection module for collecting road images; a query module for using the target detection method described above to query the collected road images for the target to be queried according to a support image and the label of the support image, wherein the label of the support image is the result of labeling the target contained in the support image that is of the same category as the target to be queried; and a control module for controlling, according to the query result, the intelligent driving device that collects the road images.
  • A target detection device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the target detection method described above when the program is executed.
  • A smart driving device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the smart driving method described above when the program is executed.
  • A computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the target detection method described above are realized, or the steps of the smart driving method described above are realized.
  • A chip for running instructions is provided, including a memory and a processor; the memory stores code and data and is coupled with the processor, and the processor runs the code in the memory so that the chip executes the steps of the target detection method described above, or the steps of the smart driving method described above.
  • A program product containing instructions is provided; when the program product runs on a computer, the computer is made to execute the steps of the target detection method described above, or the steps of the smart driving method described above.
  • A computer program is provided; when the computer program is executed by a processor, it executes the steps of the target detection method described above, or the steps of the smart driving method described above.
  • The feature expression ability of the first image and the second image is improved, so that more information for judging the similarity between the first image and the second image can be obtained. Subsequent target detection thus has a richer feature input when facing a single sample, which improves the segmentation accuracy of single-sample semantic segmentation and thereby the target detection accuracy.
  • FIG. 1 is a flowchart of a target detection method provided by an embodiment of the application.
  • FIG. 2 is a schematic structural diagram of a target detection model provided by an embodiment of the application.
  • FIG. 3 is a flowchart of a target detection method provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a symmetric cascade structure provided by an embodiment of the application.
  • FIG. 5 is a flowchart of a target detection method provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of a target detection model provided by another embodiment of this application.
  • FIG. 7 is a schematic flowchart of a target query method provided by another embodiment of this application.
  • FIG. 8 is a schematic flowchart of a target query method provided by another embodiment of this application.
  • FIG. 9 is a schematic flowchart of a target query method provided by still another embodiment of this application.
  • FIG. 10 is a schematic flowchart of a target query method provided by another embodiment of this application.
  • FIG. 11 is a schematic flowchart of a smart driving method provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of a target detection process provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of a generation module and an aggregation module provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram comparing the similarity feature extraction method in the target query method provided by the embodiment of the application with the extraction method in the related technology.
  • FIG. 15 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • FIG. 16 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • FIG. 18 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
  • The single-sample image semantic segmentation deep learning model performs feature extraction on the query set image and the support set image respectively, where the query set image is the image that needs to be queried, and the support set image contains the target to be queried.
  • the target to be queried in the support set image is labeled in advance to obtain the label information.
  • Combining the label information, the target in the query set image is determined by the similarity between the features of the support set image and the feature of the query set image.
  • However, the deep learning model expresses the support set image as a single feature vector, so the feature expression ability of the support set image is limited. This leads to an insufficient ability of the model to describe the similarity between the support set image features and the query image pixel features, resulting in low accuracy of the target query.
  • the first image may be the above-mentioned support set image
  • the second image may be the above-mentioned query set image.
  • In the embodiments of this application, features at multiple different scales are extracted from the first image and the second image. The first image and the second image are thus expressed as multiple features of different scales, which improves their feature expression ability, so that more information for judging the similarity between the first image and the second image can be obtained, thereby improving the accuracy of the target query.
  • FIG. 1 is a flowchart of a target detection method provided by an embodiment of the application.
  • the embodiments of the present application provide a target detection method. The specific steps of the method are as follows:
  • Step 101 Perform multiple feature extractions of different scales on the first image and the second image, respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  • the second image is an image for which a target query needs to be performed.
  • Through the target query, the pixel area where the target to be queried is located in the second image can be detected.
  • the target to be queried can be determined according to actual conditions, for example, it can be an animal, plant, person, vehicle, etc., which is not limited here.
  • The label of the first image may be contour information, pixel information, etc. of the target to be queried in the first image, which is not limited here.
  • The label may be a binarized label, in which the pixel value of the area where the target is located is different from the pixel values of other areas in the image.
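  As an illustrative sketch (the array and function names here are hypothetical, not from the application), a binarized label of this kind can be built by setting the pixels of the target area to 1 and all other pixels to 0:

```python
import numpy as np

def binarize_label(annotation: np.ndarray, target_class: int) -> np.ndarray:
    """Return a binarized label: 1 where the target class lies, 0 elsewhere."""
    return (annotation == target_class).astype(np.float32)

# Toy 4x4 annotation map in which class 5 marks the target to be queried.
annotation = np.array([
    [0, 0, 5, 5],
    [0, 0, 5, 5],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
label = binarize_label(annotation, target_class=5)
```

  The resulting map distinguishes the target region from the background purely by pixel value, matching the binarized-label description above.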
  • the target detection method of this embodiment can be applied to the target detection process of a vehicle.
  • The vehicle can be an autonomous vehicle or a vehicle equipped with an Advanced Driver Assistance System (ADAS). It is understandable that the target detection method can also be applied to robots.
  • the first image and the second image may be acquired by an image acquisition device on the vehicle, and the image acquisition device may be a camera, such as a monocular camera, a binocular camera, and the like.
  • Multiple features of different scales can be extracted from the first image through a feature extraction algorithm to obtain multiple first feature maps of different scales; likewise, multiple features of different scales can be extracted from the second image to obtain multiple second feature maps of different scales.
  • The feature extraction algorithm can be a convolutional neural network (CNN), a local binary pattern (LBP) algorithm, a scale-invariant feature transform (SIFT) algorithm, a histogram of oriented gradients (HOG) algorithm, etc., which is not limited here.
  • the target detection method of this embodiment can be applied to the target detection model shown in FIG. 2.
  • the target detection model 20 includes: a feature extraction network 21, a scale transformation module 22 and a convolution network 23.
  • The feature extraction network 21 is a neural network, and it can adopt an existing network architecture, such as a VGG (Visual Geometry Group) network, a ResNet network, or another general image feature extraction network.
  • The first image and the second image can be input into the feature extraction network 21 at the same time for feature extraction at multiple different scales; alternatively, two feature extraction networks 21 with the same network architecture and network parameters can be set up, and the first image and the second image are respectively input into the two networks for feature extraction at multiple different scales.
  • multiple different scales can be pre-designated, and for each scale, feature extraction of the scale is performed on the first image and the second image respectively to obtain the first feature map and the second feature map of the scale.
  • Step 102 Determine the target to be queried in the second image according to the first feature maps of multiple different scales, the label of the first image, and the second feature maps of corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
  • For the first feature map and the second feature map of each scale, the label information of the first image can be combined to obtain a similarity map that characterizes the similarity between the first feature map and the second feature map at that scale. Then, through the similarity maps of the different scales, the target to be queried in the second image can be determined.
  • In summary, multiple first feature maps of different scales and multiple second feature maps of different scales are obtained; the target to be queried in the second image is determined according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales, where the label of the first image is the result of labeling the target to be queried contained in the first image. The feature expression ability of the first image and the second image is thereby improved, so that more information for judging the similarity between the first image and the second image can be obtained.
  • The first image contains a target of the same category as the target to be queried; the posture, texture, color, and other attributes of that target in the first image may differ from those of the target to be queried in the second image. For example, if the target to be queried is a traffic light, the traffic lights contained in the first image may be arranged vertically while the traffic lights in the second image are arranged horizontally; that is, the states of the targets in the first image and the second image can be inconsistent.
  • multiple feature extractions of different scales are performed on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales, including:
  • Step 301 Perform feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map.
  • the feature extraction network 21 includes a first convolution module 211, a second convolution module 212, and a third convolution module 213.
  • the first convolution module 211 includes three convolution layers connected in sequence
  • the second convolution module 212 and the third convolution module 213 each include one convolution layer.
  • The first image and the second image can be simultaneously input into the first convolution module 211 shown in FIG. 2, and the first convolution module 211 outputs a feature extraction result for each of the first image and the second image. These results are then input into the second convolution module 212, which outputs its own feature extraction results for the two images and passes them to the third convolution module 213. The third convolution module 213 continues feature extraction according to the results output by the second convolution module 212 and outputs the feature extraction result of the first image and that of the second image, which are the first feature map and the second feature map, respectively.
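  The chained modules above can be sketched as follows. This is a toy NumPy illustration, not the application's network: the random 3x3 kernels stand in for learned weights, and the same kernel list is reused for both images to mimic shared network parameters.

```python
import numpy as np

def conv2d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive same-padded 2D convolution on a single-channel map."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def extract_features(image: np.ndarray, kernels: list) -> np.ndarray:
    """Module 211 (three conv layers), then module 212 (one), then module 213 (one)."""
    x = image
    for k in kernels:  # 5 kernels in total: 3 + 1 + 1
        x = np.maximum(conv2d(x, k), 0.0)  # convolution + ReLU
    return x

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(5)]
first_image = rng.random((8, 8))
second_image = rng.random((8, 8))
first_feature_map = extract_features(first_image, kernels)
second_feature_map = extract_features(second_image, kernels)
```

  Both outputs keep the spatial size of the inputs here; in the actual model the subsequent scale transformation module produces the different sizes.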
  • Step 302 Perform multiple scale transformations on the first feature map and the second feature map to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  • The first feature map and the second feature map are respectively input into the scale transformation module 22, which performs multiple scale conversions on each of them, so that the first image and the second image are respectively expressed as multiple feature maps of different sizes.
  • performing multiple scale conversions on the first feature map and the second feature map respectively includes: performing down-sampling on the first feature map and the second feature map at least twice, respectively.
  • Performing down-sampling on the first feature map and the second feature map at least twice each includes: down-sampling the first feature map at a first sampling rate to obtain a first feature map down-sampled by a first multiple relative to the first image, and then down-sampling that result at a second sampling rate to obtain a first feature map down-sampled by a second multiple relative to the first image, where the second multiple is greater than the first multiple.
  • Correspondingly, the second feature map is down-sampled at the first sampling rate to obtain a second feature map down-sampled by the first multiple relative to the second image, and that result is then down-sampled at the second sampling rate to obtain a second feature map down-sampled by the second multiple relative to the second image.
  • After the first feature map down-sampled by the second multiple relative to the first image and the second feature map down-sampled by the second multiple relative to the second image have been obtained, the method of the embodiment of the present application further includes: down-sampling these two feature maps at a third sampling rate to obtain a first feature map down-sampled by a third multiple relative to the first image and a second feature map down-sampled by a third multiple relative to the second image, where the third multiple is greater than the second multiple.
  • the first multiple, the second multiple, and the third multiple are 8 times, 16 times, and 32 times, respectively.
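  A minimal sketch of the three chained down-sampling steps, using average pooling as a stand-in for whatever sampling operation the units actually implement: the first step down-samples by 8x, and each of the next two down-samples the previous output by a further 2x, yielding 16x and 32x overall.

```python
import numpy as np

def downsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a (H, W) map by `factor` along both axes."""
    h, w = x.shape[0] // factor, x.shape[1] // factor
    return x[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

def cascade(feature_map: np.ndarray):
    """Three sampling steps in sequence: 8x, then 16x, then 32x overall."""
    s8 = downsample(feature_map, 8)   # first step: 8x
    s16 = downsample(s8, 2)           # second step: 8 * 2 = 16x overall
    s32 = downsample(s16, 2)          # third step: 16 * 2 = 32x overall
    return s8, s16, s32

feature_map = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
f8, f16, f32 = cascade(feature_map)
```

  The same cascade is applied independently to the first and second feature maps, which is what makes the structure of FIG. 4 symmetrical.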
  • the scale conversion module 22 may adopt a symmetrical cascade structure.
  • The symmetrical cascade structure includes two cascade structures arranged symmetrically with each other, where each cascade structure includes three successively connected sampling units.
  • For ease of description, the two cascade structures are referred to as the first cascade structure 41 and the second cascade structure 42, the three sampling units of the first cascade structure are called the first sampling unit, the second sampling unit, and the third sampling unit, and the three sampling units of the second cascade structure are called the fourth sampling unit, the fifth sampling unit, and the sixth sampling unit.
  • The sampling rates of the first sampling unit and the fourth sampling unit are the same, the sampling rates of the second sampling unit and the fifth sampling unit are the same, and the sampling rates of the third sampling unit and the sixth sampling unit are the same.
  • The first sampling unit and the fourth sampling unit respectively use the first sampling rate to sample the first feature map and the second feature map, thereby outputting a first feature map and a second feature map that are down-sampled by 8 times relative to the first image and the second image, respectively.
  • the symmetric cascade structure shown in FIG. 4 may be used to perform multiple scale conversions on the first feature map and the second feature map respectively.
  • The first feature map is input into the first sampling unit, the second sampling unit, and the third sampling unit sequentially, and these units perform down-sampling at different sampling rates, thereby outputting first feature maps down-sampled 8 times, 16 times, and 32 times relative to the size of the first image.
  • Likewise, the second feature map is input into the fourth sampling unit, the fifth sampling unit, and the sixth sampling unit sequentially, and these units perform down-sampling at different sampling rates, thereby outputting second feature maps down-sampled 8 times, 16 times, and 32 times relative to the size of the second image.
  • The first cascade structure 41 and the second cascade structure 42 may also be two-level cascade structures, in which case the first cascade structure 41 and the second cascade structure 42 each include two sampling units connected in sequence.
  • Determining the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales includes: determining multiple first feature vectors of different scales according to the multiple first feature maps of different scales and the label of the first image; calculating the multiple first feature vectors of different scales and the second feature maps of corresponding scales according to a preset calculation rule to obtain a calculation result; determining the mask image of the second image according to the calculation result; and determining the target to be queried in the second image according to the mask image.
  • the preset calculation rules include: inner product calculation rules, or cosine distance calculation rules.
  • the label of the first image refers to information indicating the target or the category of the object in the image.
  • The first feature map of each scale and the label of the first image can be combined to form a feature vector. For example, the first feature maps down-sampled 8, 16, and 32 times relative to the first image are each combined with the correspondingly interpolated label of the first image to form feature vectors, hereinafter referred to as the first feature vector, the second feature vector, and the third feature vector. Then an inner product operation is performed between the first feature vector and the second feature map down-sampled 8 times relative to the second image, between the second feature vector and the second feature map down-sampled 16 times relative to the second image, and between the third feature vector and the second feature map down-sampled 32 times relative to the second image, to obtain three probability maps of different scales.
  • The sizes of the three probability maps of different scales are the same as those of the first, second, and third feature vectors, respectively; equivalently, they are the same as the sizes of the first or second feature maps down-sampled 8, 16, and 32 times relative to the first image or the second image. After that, these three probability maps are input into the convolutional network 23, which concatenates the three probability maps and convolves the concatenated result, so as to output the mask image of the second image and achieve the target detection effect on the second image.
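  The per-scale computation can be sketched as follows. This is an assumed realization with illustrative names: the feature vector is obtained by masked average pooling of the support feature map with the per-scale binarized label, and each probability map is the inner product of that vector with every pixel feature of the query map.

```python
import numpy as np

def masked_prototype(support_feat: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Pool a (C, H, W) support feature map over the labeled target area."""
    masked = support_feat * label[None]          # zero out non-target pixels
    return masked.sum(axis=(1, 2)) / (label.sum() + 1e-8)

def probability_map(query_feat: np.ndarray, proto: np.ndarray) -> np.ndarray:
    """Inner product of the feature vector with every query pixel feature."""
    return np.einsum('chw,c->hw', query_feat, proto)

rng = np.random.default_rng(1)
channels = 4
scales = [(8, 8), (4, 4), (2, 2)]  # shapes of the 8x, 16x, 32x feature maps
prob_maps = []
for h, w in scales:
    support_feat = rng.random((channels, h, w))
    query_feat = rng.random((channels, h, w))
    label = np.zeros((h, w))
    label[: h // 2] = 1.0              # toy binarized label, resized per scale
    proto = masked_prototype(support_feat, label)
    prob_maps.append(probability_map(query_feat, proto))
# In the model, the three probability maps are then concatenated and
# convolved (convolutional network 23) to produce the mask image.
```

  A cosine-distance rule would only add per-pixel normalization of `query_feat` and `proto` before the same inner product.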
  • In an embodiment, determining the target to be queried in the second image includes: using the first feature maps of multiple different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for third feature maps of corresponding scales, to determine the target to be queried in the second image; the third feature maps are determined according to the second image, and the second feature map and the third feature map of the same scale are different.
  • Compared with the foregoing embodiment, this embodiment adds a third feature map that is guided by the inner product operation results of different scales obtained in the foregoing embodiment, thereby further improving the accuracy of subsequent target detection.
• the third feature map can be extracted using a feature extraction network other than the feature extraction network 21 shown in FIG. 2.
• the feature extraction network for the third feature map differs from that for the first and second feature maps in network architecture and network parameters; for example, the convolution kernels are different.
  • FIG. 5 is a flowchart of a target detection method provided by another embodiment of this application.
  • the target detection method provided in this embodiment specifically includes the following steps:
• Step 501: Determine multiple first feature vectors of different scales according to the multiple first feature maps of different scales and the label of the first image.
  • Step 502 Calculate multiple first feature vectors of different scales and second feature maps of corresponding scales according to a preset calculation rule to obtain multiple mask images of different scales.
  • the mask image obtained in this step will be used as guidance information to guide the third feature map.
  • Step 503 Determine the target to be queried in the second image according to the multiplication result of the multiple mask images of different scales and the third feature map of corresponding scales.
• multiplying the multiple mask images of different scales with the third feature maps of the corresponding scales means that, for a mask image and a third feature map of the same scale, the value (a scalar) of the mask image at each position is multiplied by the value (a vector) of the third feature map at the same position.
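the scalar-times-vector multiplication described above amounts to broadcasting the mask over the channel dimension. A minimal NumPy sketch, where the array sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.random((16, 16))               # (H, W): one scalar per position
third_feat = rng.random((64, 16, 16))     # (C, H, W): one vector per position

# broadcast the mask scalar over the channel dimension at every position
guided = mask[None, :, :] * third_feat    # (C, H, W)
```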
  • the method of this embodiment can be applied to the detection model shown in FIG. 6.
• the detection model shown in FIG. 6 differs from the detection model shown in FIG. 2 in that some convolutional layers are added on the basis of the feature extraction network 21 shown in FIG. 2, and a third cascade structure is added on the basis of the symmetric cascade structure shown in FIG. 2.
  • the structure of the third cascade structure is the same as the structure of the first cascade structure or the second cascade structure, and its implementation principle can be referred to the introduction of the foregoing embodiment.
  • the detection model 60 includes a feature extraction network 61, a scale conversion module 62 and a convolutional network 63.
  • the feature extraction network 61 includes a fourth convolution module 611, a fifth convolution module 612, a sixth convolution module 613, a seventh convolution module 614, an eighth convolution module 615, a ninth convolution module 616, and a Ten convolution module 617.
  • the sixth convolution module 613 (the third convolution module 213 in FIG. 2) is also connected to the seventh convolution module 614 and the fourth convolution module.
  • the eighth convolution module 615, the ninth convolution module 616, and the tenth convolution module 617 are sequentially connected.
  • the outputs of the sixth convolution module 613 and the seventh convolution module 614 are also used as the input of the eighth convolution module 615 and the ninth convolution module 616, respectively.
  • the output of the tenth convolution module 617 is used as the input of the third cascade structure 33.
• the seventh convolution module 614 performs feature extraction according to the output result of the sixth convolution module 613 to obtain the first feature map and the second feature map, which are then input to the scale conversion module 62.
• the scale conversion module 62 has the same structure and principle as the scale conversion module 22 shown in FIG. 2.
  • the scale conversion module 62 performs different scale conversions on the first feature map and the second feature map.
  • the label information of the first image is also input into the scale conversion module 62.
  • the scale conversion module 62 outputs a plurality of mask images mask32x, mask16x, and mask8x of different scales according to the first feature map, the second feature map of different scales, and the label information of the first image.
• Mask32x, mask16x, and mask8x respectively represent mask images downsampled by 32, 16, and 8 times relative to the first feature map or the second feature map.
• the mask images mask32x, mask16x, and mask8x output by the scale conversion module 62 are then multiplied, at corresponding pixel positions, with the second feature maps output by the third cascade structure that are downsampled by 8, 16, and 32 times relative to the second image, resulting in three probability maps.
• after that, the three probability maps are input into the convolutional network for convolution and related operations, so as to realize target detection on the second image.
  • the feature map extracted by the sixth convolution module 613 can also be directly input into the third cascade structure.
  • this embodiment may also directly input the feature map for the first image and the feature map for the second image output by the sixth convolution module 613 into the first cascade structure and the second cascade structure, respectively.
• the first convolution module, the second convolution module, and the third convolution module shown in FIG. 2 form a standard VGG network architecture; those skilled in the art can increase or decrease the number of convolution modules in the VGG network architecture shown in FIG. 2 according to actual needs.
• a plurality of first feature vectors of different scales are determined according to a plurality of first feature maps of different scales and the label of the first image; the plurality of first feature vectors of different scales are then calculated with the second feature maps of corresponding scales according to a preset calculation rule to obtain a calculation result; a mask image of the second image is determined according to the calculation result; and the target to be queried in the second image is determined according to the mask image.
• the multiple mask images at different scales can guide the segmentation of the second feature maps at the corresponding scales (the mask images mask32x, mask16x, and mask8x output by the scale conversion module 62 are multiplied, at corresponding pixel positions, with the second feature maps output by the third cascade structure that are downsampled by 8, 16, and 32 times relative to the second image).
• since the output result of the fifth convolution module 612 for the second image is input to the sixth convolution module, the sixth convolution module can fuse the output result of the fifth convolution module with the output result for the second image and then perform feature extraction again. In this way, richer feature information can be extracted, and during backpropagation the fed-back loss function also carries richer information, so the network parameters of each convolution module in the feature extraction network can be adjusted more effectively. Therefore, in the subsequent target detection process, the detection accuracy of the detection model can be further improved.
  • FIG. 7 is a schematic flowchart of a target detection method provided by another embodiment of this application. This embodiment describes in detail the specific implementation process of determining the target to be queried in the second image based on multiple first feature maps of different scales and label information of the first image, and second feature maps of corresponding scales. As shown in Figure 7, the method includes:
  • S701 Perform feature extraction of multiple different scales on the first image and the second image respectively, and generate multiple first feature maps of different scales and multiple second feature maps of different scales.
  • S701 is similar to S101 in the embodiment of FIG. 1, and will not be repeated here.
• S702. Determine multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales; a similarity map of one scale represents the similarity between the first feature map and the second feature map of that scale.
  • the similarity map of each scale contains the similarity information of the features between the first feature map and the second feature map of the scale.
• S702 may include: determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label information of the first image; and multiplying the plurality of first feature vectors of different scales element by element with the second feature maps of corresponding scales to obtain multiple similarity maps of different scales.
• for each scale, the first feature map of the scale and the label information of the first image may be multiplied to obtain the first feature vector of the scale; the first feature vector of this scale and the second feature map of this scale are then multiplied element by element to obtain the similarity map of this scale.
• in the similarity map of this scale, a vector at each pixel location expresses the similarity between the first feature vector and the second feature map at that location.
• This embodiment generates similarity maps of different scales by multiplying multiple first feature vectors of different scales element by element with second feature maps of corresponding scales. Replacing the inner product or cosine distance with element-wise multiplication allows the similarity map of each scale to contain multi-channel similarity information, expresses the similarity features more fully, and further improves the accuracy of the target query.
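the contrast between the two operations can be sketched in a few lines of NumPy; the channel count and spatial sizes are assumptions for illustration:

```python
import numpy as np

C, H, W = 64, 16, 16
rng = np.random.default_rng(0)
vec = rng.random(C)              # first feature vector at this scale
feat2 = rng.random((C, H, W))    # second feature map at this scale

# inner product collapses the channel dimension: single-channel similarity map
inner = np.einsum("c,chw->hw", vec, feat2)    # (H, W)

# element-wise multiplication keeps every channel: multi-channel similarity map
elementwise = vec[:, None, None] * feat2      # (C, H, W)
```

summing the element-wise result over channels recovers the inner product, which shows that the multi-channel map retains strictly more information for the subsequent convolutions to exploit.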
  • similarity maps of different scales can be converted into similarity maps of the same scale through upsampling, and then integrated to obtain an integrated similarity map.
  • it can be implemented by either of the following two implementation manners, which will be described separately below.
• S703 may include: up-sampling the multiple similarity maps of different scales to obtain multiple similarity maps of the same scale; and adding the multiple similarity maps of the same scale to obtain the integrated similarity map.
• the multiple similarity maps of different scales may each be up-sampled to the same scale and then added, so as to obtain the integrated similarity map.
• take three similarity maps A, B, and C as an example, with scales m1, m2, and m3, where m1 > m2 > m3. B and C can be up-sampled separately to scale m1, and then A and the up-sampled B and C are added to obtain the integrated similarity map; at this time, the scale of the integrated similarity map is m1.
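a minimal NumPy sketch of this first integration manner; the channel count, toy scales, and nearest-neighbour up-sampling are assumptions (a real implementation might use bilinear interpolation):

```python
import numpy as np

def upsample_nearest(x, factor):
    # nearest-neighbour up-sampling of the two spatial axes
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

rng = np.random.default_rng(0)
A = rng.random((4, 32, 32))   # similarity map at scale m1
B = rng.random((4, 16, 16))   # similarity map at scale m2
C = rng.random((4, 8, 8))     # similarity map at scale m3

# raise B and C to scale m1, then add all three
integrated = A + upsample_nearest(B, 2) + upsample_nearest(C, 4)
```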
• in other embodiments, S703 may include: up-sampling the similarity map of the smallest scale to the next-smallest scale, adding the two, and repeating until a single similarity map remains.
• take three similarity maps A, B, and C as an example, with scales m1, m2, and m3, where m1 > m2 > m3.
• C can be up-sampled first to scale m2, and B and the up-sampled C are added to obtain a new similarity map D, whose scale is m2. D is then up-sampled to scale m1, and A and the up-sampled D are added to obtain the final integrated similarity map.
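the progressive pairwise integration can be sketched the same way; the channel count, toy scales, and nearest-neighbour up-sampling are assumptions for illustration:

```python
import numpy as np

def upsample_nearest(x, factor):
    # nearest-neighbour up-sampling of the two spatial axes
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

rng = np.random.default_rng(0)
A = rng.random((4, 32, 32))   # scale m1
B = rng.random((4, 16, 16))   # scale m2
C = rng.random((4, 8, 8))     # scale m3

D = B + upsample_nearest(C, 2)            # m3 -> m2, add to B
integrated = A + upsample_nearest(D, 2)   # m2 -> m1, add to A
```

with nearest-neighbour up-sampling the progressive route gives the same result as up-sampling everything to m1 in one shot; with bilinear interpolation the two manners can differ slightly.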
  • S704 Determine the target to be queried in the second image according to the integrated similarity map.
  • S704 is similar to S102 in the embodiment of FIG. 1, and will not be repeated here.
• multiple similarity maps of different scales are determined based on the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales; the multiple similarity maps of different scales are then integrated to obtain an integrated similarity map, from which the target to be queried in the second image is determined. Integrating similarities at multiple scales allows the integrated similarity map to carry information at multiple scales, further improving the accuracy of the target query.
  • FIG. 8 is a schematic flowchart of a target detection method provided by another embodiment of this application.
• the difference between this embodiment and the embodiment in FIG. 7 is that, after determining the multiple similarity maps of different scales in S702 and before integrating them in S703, the multiple similarity maps of different scales are multiplied element by element with the third feature maps of the corresponding scales to obtain multiple processed similarity maps of different scales.
  • the method includes:
  • S801 Perform feature extraction of multiple different scales on the second image and the first image respectively, and generate multiple first feature maps of different scales and multiple second feature maps of different scales.
  • S801 is similar to S101 in the embodiment of FIG. 1, and will not be repeated here.
  • S802 is similar to S702 in the embodiment of FIG. 7, and will not be repeated here.
  • S804 is similar to S704 in the embodiment of FIG. 7, and will not be repeated here.
  • S805 Determine the target to be queried in the second image according to the integrated similarity map.
• the plurality of similarity maps of different scales, determined according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, are multiplied element by element with the third feature maps of the second image; the multiple similarity maps of different scales can thus guide the segmentation of the second image, thereby further improving the accuracy of the target query.
  • Fig. 9 is a flowchart of a target detection method provided by an embodiment of the present application.
  • the target detection method of the foregoing embodiment is executed by a neural network, which is trained by the following steps:
• Step 901: Perform feature extraction of a plurality of different scales on the first sample image and the second sample image respectively to obtain a plurality of fourth feature maps of different scales and a plurality of fifth feature maps of different scales; wherein both the first sample image and the second sample image contain objects of the first category.
• Step 902: Determine the object of the first category in the second sample image according to the fourth feature maps of multiple different scales, the label of the first sample image, and the fifth feature maps of the corresponding scales; the label of the first sample image is the result of labeling the objects of the first category contained in the first sample image.
• Step 903: Adjust the network parameters of the neural network according to the difference between the determined object of the first category in the second sample image and the label of the second sample image; the label of the second sample image is the result of labeling the objects of the first category contained in the second sample image.
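the adjustment in Step 903 can be sketched as computing a pixel-wise loss between the predicted mask and the second sample image's label. The loss function, array sizes, and helper names below are assumptions for illustration; the application does not specify a particular loss:

```python
import numpy as np

def binary_cross_entropy(pred, label, eps=1e-7):
    # pixel-wise BCE between the predicted first-category mask and the label
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-(label * np.log(pred) + (1 - label) * np.log(1 - pred)).mean())

rng = np.random.default_rng(0)
pred = rng.random((32, 32))                           # predicted object mask
label = (rng.random((32, 32)) > 0.5).astype(float)    # second sample image's label
loss = binary_cross_entropy(pred, label)
# in a real framework, loss.backward() / optimizer.step() would then adjust
# the network parameters of each module in the feature extraction network
```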
  • the above-mentioned target query method is realized by a neural network, and the neural network may be trained first before the target query is performed.
  • a first sample image and a second sample image containing objects of the same category can be obtained from a training set containing multiple sample images, and this object is the target to be queried in the training process.
  • the training set may include multiple subsets, and the sample images in each subset contain objects of the same category.
• the categories may include vehicles, pedestrians, traffic lights (i.e., traffic signal lights), and the like.
• the acquired first sample image and second sample image may both include traffic lights, which serve as the target to be queried during this training. The traffic lights in the first sample image are labeled to obtain the label of the first sample image, and the traffic lights in the second sample image are labeled to obtain the label of the second sample image.
  • the training process of this embodiment is similar to the process of the target detection method of the foregoing embodiment, and the specific implementation process can refer to the introduction of the foregoing embodiment.
• during training, the first sample image and the second sample image need to contain objects of the same category, so that the neural network can learn the association between images of the same category.
• the category used in training need not be the category used later; for example, traffic lights can be used to train the neural network, while street lights can be used to test the neural network or in applying the neural network.
  • FIG. 10 is a schematic flowchart of a target detection method provided by still another embodiment of this application.
• in this embodiment, the method of testing the neural network trained in the embodiment of FIG. 9 is described in detail.
  • the method may further include:
  • test images including objects of the same category may be pre-formed into a test image set, and multiple test image sets may be formed into a total test set.
  • the first test image and the second test image are selected from a set of test images, and the neural network is tested through the first test image and the second test image.
  • the neural network can be tested through the first test image and the second test image containing street lights.
  • one sample can be selected as the first test image for each test category in the test image set.
  • one image is selected as the first test image for each category (a total of 20 categories).
• the test images are then input into the model shown in FIG. 2 or FIG. 5 for evaluation, where the test images in a test data pair contain targets of the same type.
  • the test may be performed after 100 trainings, or the test may be performed after 120 trainings.
• the target detection method of this embodiment can still detect such targets accurately.
• the method of randomly selecting test data pairs in the embodiments of the present application can also reduce the task's strong dependence on samples, accurately detect categories whose samples are difficult to collect in actual application scenarios, avoid the problem of uneven category selection caused by traditional randomly selected test pairs, and solve the problem of floating evaluation indicators caused by the varying quality of support samples. For example, in the target detection task in automatic driving, a target category for which the scene does not provide a large number of training samples can still be accurately detected.
  • FIG. 11 is a schematic flowchart of a smart driving method provided by an embodiment of the application. As shown in Figure 11, the method may include:
  • S1103 Control the smart driving device that collects road images according to the query result.
  • the smart driving device may include an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistant System (ADAS), a robot, and the like.
• the road image is used as the above-mentioned second image, and the support image is used as the above-mentioned first image; the intelligent driving device is then controlled according to the target detection result.
• the control may include controlling intelligent driving equipment such as an autonomous vehicle or robot to perform operations such as deceleration, braking, and steering, or sending instructions such as deceleration, braking, and steering to the driver of an ADAS-equipped vehicle. For example, if the query result shows that the traffic light in front of the smart driving device is red, the smart driving device is controlled to decelerate and stop; if the query result shows that there is a pedestrian in front of the smart driving device, the smart driving device is controlled to brake.
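a hedged sketch of how a query result might drive the control decision; the category strings and command names are assumptions for illustration, not part of the application:

```python
def control_command(query_result):
    """Map a target-query result to a control command for the smart
    driving device. Strings here are hypothetical examples."""
    if query_result == "red_light":
        return "decelerate_and_stop"   # red traffic light ahead
    if query_result == "pedestrian":
        return "brake"                 # pedestrian ahead
    return "continue"                  # no relevant target detected
```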
  • FIG. 12 is a schematic diagram of a target detection process provided by an embodiment of this application.
• the first image is input to the first convolutional neural network to obtain multiple first feature maps of different scales of the first image, and the second image is input to the second convolutional neural network to obtain multiple second feature maps of different scales of the second image.
  • the second feature map of the second image, the first feature map of the first image, and the label information of the first image are input to the generating module to obtain similarity maps of multiple scales.
  • the similarity maps of multiple scales are input to the aggregation module to obtain the integrated similarity map.
  • Input the integrated similarity map to the third convolutional neural network to obtain the semantic segmentation map of the second image, so as to realize the target detection of the second image.
  • FIG. 13 is a schematic diagram of a generation module and an aggregation module provided by an embodiment of the application.
  • conv represents the convolutional layer
  • pool represents the pooling process.
• the feature map of the first image is input to the first convolution channel of the generating module 131 to obtain multiple first feature maps of different scales, which are then multiplied and pooled with the label information of the first image to obtain multiple feature vectors of different scales of the first image.
• the feature map of the second image is input to the second convolution channel of the generating module 131 to obtain a plurality of second feature maps of different scales.
  • Multiple feature maps of different scales of the second image are respectively multiplied element by element with feature vectors of corresponding scales to obtain multiple similarity maps of different scales.
  • the generating module 131 outputs multiple similarity maps of different scales to the aggregation module 132, and the aggregation module 132 integrates the multiple similarity maps of different scales, and outputs the integrated similarity maps.
• FIG. 14 is a schematic diagram comparing the similarity feature extraction method in the target detection method provided by an embodiment of the application with similarity feature extraction through inner product or cosine distance.
  • the left part of the figure is a schematic diagram of similarity features extracted by inner product or cosine distance.
  • the right part of the figure is a schematic diagram of extracting similarity features by multiplying the vectors of corresponding pixel positions.
• the method proposed in the embodiment of the present application uses element-wise multiplication to change the output similarity map from single-channel to multi-channel, which retains the channel information of the similarity, and can be combined with subsequent convolution and nonlinear operations to express similarity features more reasonably, thereby further improving the accuracy of target detection.
  • FIG. 15 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • the target detection device provided by the embodiment of the present application can execute the processing flow provided in the embodiment of the target detection method.
  • the target detection device 150 provided in this embodiment includes: a feature extraction module 151 and a determination module 152;
• the feature extraction module 151 is configured to perform feature extraction of multiple different scales on the first image and the second image to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
• the determining module 152 is configured to determine the target to be queried in the second image according to the first feature maps of multiple different scales, the label of the first image, and the second feature maps of the corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
• when the feature extraction module 151 performs feature extraction of multiple different scales on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales, it specifically includes: performing feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map; and performing multiple scale transformations on the first feature map and the second feature map respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
• when the feature extraction module 151 performs multiple scale transformations on the first feature map and the second feature map respectively, it specifically includes: performing down-sampling on the first feature map and the second feature map at least twice, respectively.
  • the determining module 152 determines the target to be queried in the second image according to multiple first feature maps of different scales and labels of the first image, and second feature maps of corresponding scales, it specifically includes: The first feature maps of different scales and the labels of the first image determine multiple first feature vectors of different scales; the multiple first feature vectors of different scales and the second feature maps of corresponding scales are combined according to a preset calculation rule The calculation is performed to obtain the calculation result; the mask image of the second image is determined according to the calculation result; the target to be queried in the second image is determined according to the mask image.
• when the determining module 152 determines the target to be queried in the second image according to the first feature maps of multiple different scales, the label of the first image, and the second feature maps of corresponding scales, it may also specifically include: using the first feature maps of different scales, the label of the first image, and the second feature maps of the corresponding scales as the guidance information of the third feature maps of the corresponding scales to determine the target to be queried in the second image; wherein the third feature map is determined according to the second image, and the second feature map and the third feature map of the same scale are different.
• when the determining module 152 uses the first feature maps of multiple different scales, the label of the first image, and the second feature maps of corresponding scales as the guidance information of the third feature maps of corresponding scales to determine the target to be queried in the second image, it specifically includes: determining a plurality of first feature vectors of different scales according to the first feature maps of different scales and the label of the first image; calculating the first feature vectors of different scales with the second feature maps of corresponding scales according to preset calculation rules to obtain multiple mask images of different scales; and determining the target to be queried in the second image according to the result of multiplying the multiple mask images of different scales with the third feature maps of the corresponding scales.
  • the preset calculation rules include: inner product calculation rules, or cosine distance calculation rules.
  • the determining module 152 determines the target to be queried in the second image according to multiple first feature maps of different scales and label information of the first image, and second feature maps of corresponding scales, which specifically includes: The first feature map of different scales, the label information of the first image, and the second feature map of the corresponding scale determine multiple similarity maps of different scales; a similarity map of one scale represents the first feature map and the second feature of the scale Similarity of the graphs; integrate multiple similarity graphs of different scales to obtain an integrated similarity graph; determine the target to be queried in the second image according to the integrated similarity graph.
  • the determining module 152 determines multiple similarity maps of different scales according to multiple first feature maps of different scales, label information of the first image, and second feature maps of corresponding scales, which specifically includes: The first feature map and the label information of the first image are determined to determine multiple first feature vectors of different scales; the multiple first feature vectors of different scales and the second feature maps of corresponding scales are multiplied element by element to obtain multiple Similarity graphs of different scales.
• the determining module 152 integrates multiple similarity maps of different scales to obtain an integrated similarity map, which specifically includes: up-sampling the multiple similarity maps of different scales to obtain multiple similarity maps of the same scale; and adding the multiple similarity maps of the same scale to obtain the integrated similarity map.
• alternatively, the determining module 152 integrates a plurality of similarity maps of different scales to obtain an integrated similarity map, which specifically includes: forming a similarity map set from the plurality of similarity maps of different scales; up-sampling the similarity map of the smallest scale to obtain a similarity map of the same scale as the second-smallest similarity map; adding the obtained similarity map to the second-smallest similarity map to obtain a new similarity map; forming a new similarity map set from the similarity maps that have not been up-sampled or added, together with the new similarity map; and repeating the up-sampling step and the adding step until the last similarity map is obtained, which is the integrated similarity map.
  • the determining module 152 is further configured to: multiply a plurality of similarity maps of different scales and a third feature map of corresponding scales element by element to obtain a plurality of processed similarity maps of different scales; wherein, the third The feature map is determined according to the second image, and the first feature map and the third feature map of the same scale are different; the processed similarity maps of different scales are integrated to obtain an integrated similarity map.
  • the target detection device is implemented by a neural network
• the device further includes: a training module 153, configured to obtain the neural network by training with the following steps: performing feature extraction of multiple different scales on the first sample image and the second sample image respectively to obtain multiple fourth feature maps of different scales and multiple fifth feature maps of different scales, wherein the first sample image and the second sample image both contain objects of the first category; determining the objects of the first category in the second sample image according to the fourth feature maps of different scales, the label of the first sample image, and the fifth feature maps of the corresponding scales, the label of the first sample image being the result of labeling the objects of the first category contained in the first sample image; and adjusting the network parameters of the neural network according to the difference between the determined objects of the first category in the second sample image and the label of the second sample image, the label of the second sample image being the result of labeling the objects of the first category contained in the second sample image.
• the device further includes: a testing module 154 for testing the trained neural network. The testing module specifically uses the following steps to test the trained neural network: performing feature extraction of multiple different scales on the first test image and the second test image respectively to obtain multiple first test feature maps of different scales and multiple second test feature maps of different scales, wherein the first test image and the second test image are derived from a test image set, and each test image in the test image set includes objects of the same category; and determining the target to be queried in the second test image according to the first test feature maps of multiple different scales, the label of the first test image, and the second test feature maps of corresponding scales; the label of the first test image is the result of labeling the target to be queried contained in the first test image.
  • the target detection device provided in the embodiments of the present application can be used to implement the above target detection method embodiments; its implementation principles and technical effects are similar and are not repeated here.
  • FIG. 16 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
  • the intelligent driving device 160 provided in this embodiment includes: an acquisition module 161, a query module 162, and a control module 163. The acquisition module 161 is configured to collect road images; the query module 162 is configured to query the collected road images for the target to be queried according to a support image and the label of the support image, by using the target detection method provided in the embodiments of the present application, where the label of the support image is the result of labeling the target contained in the support image that belongs to the same category as the target to be queried; and the control module 163 is configured to control, according to the query result, the intelligent driving device that collects the road images.
  • for the implementation of the smart driving device provided in the embodiment of the present application, reference may be made to the foregoing smart driving method; the implementation principles and technical effects are similar and are not repeated here.
  • FIG. 17 is a schematic diagram of the hardware structure of a target detection device provided by an embodiment of the application.
  • the target detection device provided in the embodiment of the present application can execute the processing flow provided in the embodiment of the target detection method.
  • the target detection device 170 provided in this embodiment includes: at least one processor 171 and a memory 172.
  • the target detection device 170 also includes a communication component 173. The processor 171, the memory 172, and the communication component 173 are connected by a bus 174.
  • At least one processor 171 executes the computer-executable instructions stored in the memory 172, so that the at least one processor 171 executes the above target detection method.
  • FIG. 18 is a schematic diagram of the hardware structure of a smart driving device provided by an embodiment of the application.
  • the smart driving device provided in the embodiment of the present application can execute the processing flow provided in the smart driving method embodiment.
  • the smart driving device 180 provided in this embodiment includes: at least one processor 181 and a memory 182.
  • the smart driving device 180 also includes a communication component 183. The processor 181, the memory 182, and the communication component 183 are connected by a bus 184.
  • At least one processor 181 executes the computer-executable instructions stored in the memory 182, so that the at least one processor 181 executes the above intelligent driving method.
  • the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in this application may be directly performed by a hardware processor, or performed by a combination of hardware and software modules in the processor.
  • the memory may include a high-speed RAM, and may also include a non-volatile memory (NVM), such as at least one disk memory.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on.
  • the buses in the drawings of this application are not limited to only one bus or one type of bus.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the target detection method or the intelligent driving method are implemented.
  • an embodiment of the present application further provides a chip for executing instructions.
  • the chip includes a memory and a processor.
  • the memory stores code and data.
  • the memory is coupled with the processor.
  • the processor runs the code in the memory so that the chip is used to execute the steps of the above-mentioned target detection method or smart driving method.
  • the embodiment of the present application further provides a program product containing instructions which, when the program product runs on a computer, cause the computer to execute the steps of the above target detection method or smart driving method.
  • the embodiment of the present application further provides a computer program which, when executed by a processor, executes the steps of the above target detection method or smart driving method.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

Target detection and intelligent driving methods and apparatuses, a device, and a storage medium. The target detection method comprises: performing feature extraction at multiple different scales on a first image and a second image respectively, to obtain first feature maps of multiple different scales and second feature maps of multiple different scales (S101); and determining a target to be queried in the second image according to the first feature maps of multiple different scales, a label of the first image, and the second feature maps of the corresponding scales, the label of the first image being the result of labeling the target to be queried comprised in the first image (S102). Expressing the first image and the second image as features of multiple different scales improves the feature expression capability of the two images, thereby improving the accuracy of target detection.

Description

Target detection and intelligent driving methods, apparatuses, devices, and storage media

This application claims priority to the Chinese invention patent application filed with the Chinese Patent Office on October 31, 2019 under application No. 201911054823.1 and entitled "Target detection, intelligent driving method, device, equipment and storage medium", and to the Chinese invention patent application filed with the Chinese Patent Office on October 31, 2019 under application No. 201911063316.4 and entitled "Target query method, device, equipment and storage medium", both of which are incorporated into this application by reference in their entirety.
Technical field

This application relates to the field of image processing, and in particular to target detection and intelligent driving methods, apparatuses, devices, and storage media.

Background

Single-sample semantic segmentation is an emerging problem in the fields of computer vision and intelligent image processing. It aims to give a segmentation model the ability to recognize the pixels of a category from a single training sample of that category. Single-sample semantic segmentation can effectively reduce the sample collection and annotation costs of traditional image semantic segmentation.

Single-sample image semantic segmentation aims to enable a segmentation model to recognize all pixels of an object of a certain category after training on only a single sample of that category. A target query can locate the target contained in an image by means of image semantic segmentation, and image semantic segmentation includes single-sample image semantic segmentation. Traditional image semantic segmentation requires a large number of training images for every object category to guarantee model performance, which incurs extremely high annotation costs.
Summary

The purpose of this application is to provide target detection and intelligent driving methods, apparatuses, devices, and storage media, so as to solve the existing technical problem of low target detection accuracy.

To solve the above technical problem, the technical solution of this application is implemented as follows:

In one embodiment, a target detection method is provided, including: performing feature extraction at multiple different scales on a first image and a second image respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales; and determining a target to be queried in the second image according to the multiple first feature maps of different scales, a label of the first image, and the second feature maps of the corresponding scales, the label of the first image being the result of labeling the target to be queried contained in the first image.

In another embodiment, an intelligent driving method is provided, including: collecting a road image; querying the collected road image for a target to be queried according to a support image and a label of the support image, by using the target detection method described above, where the label of the support image is the result of labeling the target contained in the support image that belongs to the same category as the target to be queried; and controlling, according to the query result, the intelligent driving device that collects the road image.
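The collect-query-control cycle of the intelligent driving method can be illustrated with the following stub pipeline. Every name here (QueryResult, control_decision, drive_step, the detector callable, and the 5% area threshold) is hypothetical; the detector stands in for the target detection method, and the policy is a toy decision rule, not the claimed control logic:

```python
from dataclasses import dataclass

@dataclass
class QueryResult:
    found: bool          # whether the target to be queried appears in the road image
    area_ratio: float    # fraction of road-image pixels classified as the target

def control_decision(result: QueryResult, slow_down_ratio: float = 0.05) -> str:
    """Toy control policy: decelerate when the queried target (e.g. a pedestrian)
    occupies a noticeable part of the road image, otherwise keep cruising."""
    if result.found and result.area_ratio >= slow_down_ratio:
        return "decelerate"
    return "cruise"

def drive_step(road_image, support_image, support_label, detector) -> str:
    """One perception-decision cycle: query the road image for the target
    described by (support_image, support_label), then derive a control action."""
    result = detector(road_image, support_image, support_label)
    return control_decision(result)
```

A real system would replace `detector` with the neural-network target detection method and map the decision onto actuator commands.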
In another embodiment, a target detection apparatus is provided, including a feature extraction module and a determination module. The feature extraction module is configured to perform feature extraction at multiple different scales on a first image and a second image respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales. The determination module is configured to determine a target to be queried in the second image according to the multiple first feature maps of different scales, a label of the first image, and the second feature maps of the corresponding scales, the label of the first image being the result of labeling the target to be queried contained in the first image.

In another embodiment, an intelligent driving apparatus is provided, including: an acquisition module configured to collect road images; a query module configured to query the collected road images for a target to be queried according to a support image and a label of the support image, by using the target detection method described above, where the label of the support image is the result of labeling the target contained in the support image that belongs to the same category as the target to be queried; and a control module configured to control, according to the query result, the intelligent driving device that collects the road images.

In another embodiment, a target detection device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the target detection method described above when executing the program.

In another embodiment, a smart driving device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the smart driving method described above when executing the program.

In another embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the target detection method or the steps of the smart driving method.

In yet another embodiment, a chip for running instructions is provided. The chip includes a memory and a processor; the memory stores code and data and is coupled with the processor; the processor runs the code in the memory so that the chip executes the steps of the target detection method described above, or the steps of the smart driving method described above.

In yet another embodiment, a program product containing instructions is provided; when the program product runs on a computer, the computer executes the steps of the target detection method described above, or the steps of the smart driving method described above.

In yet another embodiment, a computer program is provided; when executed by a processor, the computer program executes the steps of the target detection method described above, or the steps of the smart driving method described above.
As can be seen from the above technical solutions, in the above embodiments, obtaining first feature maps and second feature maps of different scales improves the feature expression capability of the first image and the second image, so that more information for judging the similarity between the first image and the second image can be obtained. Subsequent target detection therefore has richer feature input when facing a single sample, which improves the segmentation accuracy of single-sample semantic segmentation and thus the target detection accuracy.
Description of the drawings

The following drawings only schematically illustrate and explain the application, and do not limit its scope:

FIG. 1 is a flowchart of a target detection method provided by an embodiment of the application;

FIG. 2 is a schematic structural diagram of a target detection model provided by an embodiment of the application;

FIG. 3 is a flowchart of a target detection method provided by an embodiment of the application;

FIG. 4 is a schematic structural diagram of a symmetric cascade structure provided by an embodiment of the application;

FIG. 5 is a flowchart of a target detection method provided by an embodiment of the application;

FIG. 6 is a schematic structural diagram of a target detection model provided by another embodiment of the application;

FIG. 7 is a schematic flowchart of a target query method provided by another embodiment of the application;

FIG. 8 is a schematic flowchart of a target query method provided by another embodiment of the application;

FIG. 9 is a schematic flowchart of a target query method provided by still another embodiment of the application;

FIG. 10 is a schematic flowchart of a target query method provided by yet another embodiment of the application;

FIG. 11 is a schematic flowchart of a smart driving method provided by an embodiment of the application;

FIG. 12 is a schematic diagram of a target detection process provided by an embodiment of the application;

FIG. 13 is a schematic diagram of a generation module and an aggregation module provided by an embodiment of the application;

FIG. 14 is a schematic comparison of the similarity feature extraction manner in the target query method provided by an embodiment of the application with the extraction manner in the related art;

FIG. 15 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the application;

FIG. 16 is a schematic structural diagram of a smart driving apparatus provided by an embodiment of the application;

FIG. 17 is a schematic structural diagram of a target detection device provided by an embodiment of the application;

FIG. 18 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them.
In the prior art, a deep learning model for single-sample image semantic segmentation performs feature extraction on a query set image and a support set image separately, where the query set image is the image on which a target query needs to be performed, the support set image contains the target to be queried, and the target to be queried in the support set image is labeled in advance to obtain label information. Combining the label information, the target in the query set image is determined through the similarity between the features of the support set image and the features of the query set image.

However, in the prior art, the deep learning model expresses the support set image as a single feature vector, which limits the feature expression capability for the support set image. This in turn leaves the model insufficiently able to describe the similarity between the support set image features and the pixel features of the query image, resulting in low target query accuracy.
In the embodiments of the present application, the first image may be the above support set image, and the second image may be the above query set image. By performing feature extraction at multiple different scales on the first image and the second image, the two images are expressed as features of multiple different scales, which improves their feature expression capability. More information for judging the similarity between the first image and the second image can thus be obtained, improving the accuracy of the target query.

The technical solutions of the present application and how they solve the above technical problems are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below in conjunction with the accompanying drawings.
FIG. 1 is a flowchart of a target detection method provided by an embodiment of the application. In view of the above technical problems in the prior art, the embodiments of the present application provide a target detection method with the following specific steps:

Step 101: Perform feature extraction at multiple different scales on a first image and a second image respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.

In this embodiment, the second image is the image on which a target query needs to be performed; through the target query, the pixel region where the target to be queried is located in the second image can be detected. The target to be queried can be determined according to the actual situation and may be, for example, an animal, a plant, a person, a vehicle, etc., which is not limited here. The label information may be contour information, pixel information, etc. of the target to be queried in the first image, which is also not limited here. Optionally, the label information may be a binarized label, in which the pixel values of the region where the target is located differ from those of the other regions in the image.

The target detection method of this embodiment can be applied to the target detection process of a vehicle; the vehicle may be an autonomous vehicle or a vehicle equipped with an Advanced Driver Assistance System (ADAS). It is understandable that the target detection method can also be applied to robots. Taking a vehicle as an example, the first image and the second image may be acquired by an image acquisition device on the vehicle, and the image acquisition device may be a camera, such as a monocular camera or a binocular camera.
In this embodiment, a feature extraction algorithm can be used to perform feature extraction at multiple different scales on the first image to obtain multiple first feature maps of different scales, and on the second image to obtain multiple second feature maps of different scales. The feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm, an LBP (Local Binary Pattern) algorithm, a SIFT (Scale-Invariant Feature Transform) algorithm, a HOG (Histogram of Oriented Gradients) algorithm, etc., which is not limited here.

In this embodiment, when the feature extraction algorithm is a CNN algorithm, the target detection method of this embodiment can be applied to the target detection model shown in FIG. 2. As shown in FIG. 2, the target detection model 20 includes a feature extraction network 21, a scale transformation module 22, and a convolution network 23. The feature extraction network 21 is a neural network and may adopt an existing network architecture, such as a VGG (Visual Geometry Group) network, a ResNet network, or another general image feature extraction network. For example, the first image and the second image can be input into the feature extraction network 21 simultaneously for feature extraction at multiple different scales; alternatively, two feature extraction networks 21 with the same network architecture and the same network parameters can be set up, and the first image and the second image are input into the two networks respectively for feature extraction at multiple different scales. For example, multiple different scales can be specified in advance; for each scale, feature extraction at that scale is performed on the first image and the second image respectively, to obtain the first feature map and the second feature map of that scale.
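The multi-scale extraction described above can be sketched as follows. This is an illustrative substitute, not the embodiment's network: 2x2 average pooling stands in for the learned feature extraction network 21 (VGG/ResNet in the embodiment), and what the sketch shows is only the production of feature maps at several scales by one shared function applied identically to both images (mirroring the two weight-shared networks):

```python
def avg_pool2(img):
    """Halve spatial resolution by 2x2 average pooling (img: H x W list of lists,
    H and W even). Stands in for one downsampling stage of a real CNN."""
    h, w = len(img), len(img[0])
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1] +
              img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w // 2)] for i in range(h // 2)]

def multi_scale_features(img, num_scales=3):
    """Return feature maps at `num_scales` different scales for one image.
    Applying this same function to the first and the second image mirrors
    the shared-parameter (Siamese) setup described above."""
    feats, cur = [], [row[:] for row in img]
    for _ in range(num_scales):
        feats.append(cur)
        cur = avg_pool2(cur)
    return feats
```

Because the function (and, in a real model, the network weights) is identical for both inputs, identical images yield identical feature pyramids.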
Step 102: Determine the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of the corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.

In this embodiment, for the first feature map and the second feature map of each scale, the label information of the first image can be combined to obtain a similarity map characterizing the similarity between the first feature map and the second feature map of that scale. The target to be queried in the second image can then be determined through the similarity maps of the different scales.

In this embodiment, feature extraction at multiple different scales is performed on the first image and the second image respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales; the target to be queried in the second image is determined according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of the corresponding scales, the label of the first image being the result of labeling the target to be queried contained in the first image. Since first and second feature maps of different scales are obtained, the feature expression capability of the first image and the second image is improved, so that more information for judging the similarity between the two images can be obtained. Subsequent target detection therefore has richer feature input when facing a single sample, which improves the segmentation accuracy of single-sample semantic segmentation and thus the target detection accuracy.
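The per-scale similarity maps of Step 102 can be combined as sketched below. Nearest-neighbour upsampling followed by averaging and thresholding is an assumed fusion rule for illustration only; in the embodiment the actual combination is left to the network (e.g. the convolution network 23):

```python
def upsample_nearest(m, factor):
    """Nearest-neighbour upsampling of a 2-D similarity map (list of lists)."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        out.extend([wide[:] for _ in range(factor)])     # repeat rows
    return out

def fuse_similarity_maps(sim_maps, out_size):
    """Bring every per-scale similarity map to the output resolution and average
    them into one fused map (assumed fusion rule)."""
    fused = [[0.0] * out_size for _ in range(out_size)]
    for m in sim_maps:
        up = upsample_nearest(m, out_size // len(m))
        for i in range(out_size):
            for j in range(out_size):
                fused[i][j] += up[i][j] / len(sim_maps)
    return fused

def predict_target_mask(fused, threshold=0.5):
    """Pixels whose fused similarity exceeds the threshold are taken as the
    target to be queried in the second image (threshold is illustrative)."""
    return [[v > threshold for v in row] for row in fused]
```

The key point mirrored from the text: several scales contribute evidence, and the final per-pixel decision is made on the fused map.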
In the embodiments of the present application, if the first image contains a target of the same category as the target to be queried, the pose, texture, color, and other attributes of that target in the first image may differ from those of the corresponding target in the second image. For example, if the target to be queried is a traffic light and the traffic light contained in the first image is arranged vertically, then a traffic light contained in the second image may be arranged horizontally, and the states of the traffic lights in the first image and the second image may be inconsistent.
As shown in FIG. 3, performing feature extraction at multiple different scales on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales includes the following steps:
Step 301: Perform feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map.
As shown in FIG. 2, the feature extraction network 21 includes a first convolution module 211, a second convolution module 212, and a third convolution module 213, where the first convolution module 211 includes three convolutional layers connected in sequence, and the second convolution module 212 and the third convolution module 213 each include one convolutional layer.
For example, the first image and the second image may both be input into the first convolution module 211 shown in FIG. 2, which outputs a feature extraction result for each image. These results are then input into the second convolution module 212, which outputs a further feature extraction result for each image. The outputs of the second convolution module 212 are in turn input into the third convolution module 213, which continues feature extraction and finally outputs the feature extraction result of the first image and the feature extraction result of the second image, namely the first feature map and the second feature map, respectively.
Step 302: Perform multiple scale transformations on the first feature map and the second feature map respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
As shown in FIG. 2, the first feature map and the second feature map are respectively input into the scale transformation module 22, which performs multiple scale transformations on each of them, so that the first image and the second image are each expressed as multiple feature maps of different sizes.
Optionally, performing multiple scale transformations on the first feature map and the second feature map respectively includes: performing down-sampling at least twice on each of the first feature map and the second feature map.
Optionally, performing down-sampling at least twice on each of the first feature map and the second feature map includes: down-sampling the first feature map and the second feature map at a first sampling rate, to obtain a first feature map down-sampled by a first factor relative to the first image and a second feature map down-sampled by the first factor relative to the second image; and then down-sampling, at a second sampling rate, the first feature map down-sampled by the first factor relative to the first image and the second feature map down-sampled by the first factor relative to the second image, to obtain a first feature map down-sampled by a second factor relative to the first image and a second feature map down-sampled by the second factor relative to the second image, where the second factor is greater than the first factor.
For example, the first feature map is down-sampled at the first sampling rate to obtain a first feature map down-sampled by the first factor relative to the first image; this result is then down-sampled at the second sampling rate to obtain a first feature map down-sampled by the second factor relative to the first image, where the second factor is greater than the first factor. Similarly, the second feature map is down-sampled at the first sampling rate to obtain a second feature map down-sampled by the first factor relative to the second image; this result is then down-sampled at the second sampling rate to obtain a second feature map down-sampled by the second factor relative to the second image.
Optionally, after the first feature map and the second feature map are down-sampled at the first sampling rate to obtain the first feature map down-sampled by the first factor relative to the first image and the second feature map down-sampled by the first factor relative to the second image, the method of the embodiments of the present application further includes: down-sampling, at a third sampling rate, the first feature map down-sampled by the second factor relative to the first image and the second feature map down-sampled by the second factor relative to the second image, to obtain a first feature map down-sampled by a third factor relative to the first image and a second feature map down-sampled by the third factor relative to the second image, where the third factor is greater than the second factor. Optionally, the first factor, the second factor, and the third factor are 8, 16, and 32, respectively.
In an optional example, the scale transformation module 22 may adopt a symmetric cascade structure. As shown in FIG. 4, the symmetric cascade structure includes two cascade structures arranged symmetrically with each other, where each cascade structure includes three sampling units connected in sequence. For ease of understanding, the two cascade structures are hereinafter referred to as a first cascade structure 41 and a second cascade structure 42; the three sampling units included in the first cascade structure are referred to as a first sampling unit, a second sampling unit, and a third sampling unit, and the three sampling units included in the second cascade structure are referred to as a fourth sampling unit, a fifth sampling unit, and a sixth sampling unit. The sampling rates of the first sampling unit and the fourth sampling unit are the same, the sampling rates of the second sampling unit and the fifth sampling unit are the same, and the sampling rates of the third sampling unit and the sixth sampling unit are the same. For example, the first sampling unit and the fourth sampling unit sample the first feature map and the second feature map at the first sampling rate respectively, thereby outputting a first feature map and a second feature map down-sampled by a factor of 8 relative to the first image and the second image; the second sampling unit and the fifth sampling unit continue sampling the outputs of the first sampling unit and the fourth sampling unit at the second sampling rate respectively, thereby outputting a first feature map and a second feature map down-sampled by a factor of 16 relative to the first image and the second image; and the third sampling unit and the sixth sampling unit continue sampling the outputs of the second sampling unit and the fifth sampling unit at the third sampling rate respectively, thereby outputting a first feature map and a second feature map down-sampled by a factor of 32 relative to the first image and the second image.
In this embodiment, the symmetric cascade structure shown in FIG. 4 may be used to perform multiple scale transformations on the first feature map and the second feature map respectively. For example, when the first cascade structure 41 is used to transform the first feature map to different scales, the first feature map is passed in sequence through the first sampling unit, the second sampling unit, and the third sampling unit, which perform down-sampling at different sampling rates, thereby outputting first feature maps down-sampled by factors of 8, 16, and 32 relative to the size of the first image. Similarly, when the second cascade structure 42 is used to transform the second feature map to different scales, the second feature map is passed in sequence through the fourth sampling unit, the fifth sampling unit, and the sixth sampling unit, which perform down-sampling at different sampling rates, thereby outputting second feature maps down-sampled by factors of 8, 16, and 32 relative to the size of the second image.
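As a rough illustration of the cascade described above, the following NumPy sketch uses strided slicing as a stand-in for the sampling units (in the patent these are learned network layers, and the 256×256×64 input size is an assumption chosen only for illustration):

```python
import numpy as np

def downsample(feat, stride):
    """Down-sample an (H, W, C) feature map by keeping every `stride`-th
    pixel -- a crude stand-in for a learned sampling unit."""
    return feat[::stride, ::stride, :]

def cascade(feat):
    """Cascade of three sampling units: each unit further down-samples the
    previous unit's output, yielding 8x, 16x and 32x maps overall."""
    f8 = downsample(feat, 8)   # first sampling unit:  8x overall
    f16 = downsample(f8, 2)    # second sampling unit: 16x overall
    f32 = downsample(f16, 2)   # third sampling unit:  32x overall
    return f8, f16, f32

# The two cascade structures use the same sampling rates, so the same
# function is applied to both the first and the second feature map.
first = np.random.rand(256, 256, 64)
second = np.random.rand(256, 256, 64)
for f in (first, second):
    s8, s16, s32 = cascade(f)
    print(s8.shape, s16.shape, s32.shape)  # (32, 32, 64) (16, 16, 64) (8, 8, 64)
```

The symmetry of the two cascade structures is what allows first and second feature maps of matching scales to be compared later.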
It should be understood that the first cascade structure 41 and the second cascade structure 42 described above may also each be a two-level cascade structure; for example, the first cascade structure 41 and the second cascade structure 42 may each include two sampling units connected in sequence.
Optionally, determining the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of the corresponding scales includes: determining multiple first feature vectors of different scales according to the multiple first feature maps of different scales and the label of the first image; computing the multiple first feature vectors of different scales against the second feature maps of the corresponding scales according to a preset calculation rule to obtain a calculation result; determining a mask image of the second image according to the calculation result; and determining the target to be queried in the second image according to the mask image. Optionally, the preset calculation rule includes an inner-product calculation rule or a cosine-distance calculation rule. Here, the label of the first image refers to information indicating the category of a target or object in the image.
Taking the inner product as the preset calculation rule as an example, as shown in FIG. 2, the first feature map of each scale and the label of the first image can together form a feature vector. For example, the first feature maps down-sampled by factors of 8, 16, and 32 relative to the first image are each combined with the label of the first image through an interpolation operation to form a feature vector, hereinafter referred to as the first feature vector, the second feature vector, and the third feature vector. An inner-product operation is then performed between the first feature vector and the second feature map down-sampled by a factor of 8 relative to the second image, between the second feature vector and the second feature map down-sampled by a factor of 16 relative to the second image, and between the third feature vector and the second feature map down-sampled by a factor of 32 relative to the second image, yielding three probability maps of different scales. The sizes of the three probability maps are the same as those of the first feature vector, the second feature vector, and the third feature vector, respectively; equivalently, they are the same as those of the first or second feature maps down-sampled by factors of 8, 16, and 32 relative to the first or second image. After that, the three probability maps are input into the convolutional network 23, which concatenates them and convolves the concatenated result, thereby outputting the mask image of the second image and achieving target detection on the second image.
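The per-pixel inner product between a support feature vector and a query feature map can be sketched as follows (a minimal NumPy illustration; the channel count and map sizes are hypothetical, and in the patent these maps are produced inside a trained network):

```python
import numpy as np

def probability_map(support_vec, query_feat):
    """Per-pixel inner product between a support feature vector of shape (C,)
    and a query feature map of shape (H, W, C): one scalar per pixel."""
    return np.einsum("hwc,c->hw", query_feat, support_vec)

# Hypothetical sizes for the three scales (8x, 16x, 32x of a 256x256 image).
C = 64
vecs = [np.random.rand(C) for _ in range(3)]
feats = [np.random.rand(32, 32, C),
         np.random.rand(16, 16, C),
         np.random.rand(8, 8, C)]
prob_maps = [probability_map(v, f) for v, f in zip(vecs, feats)]
print([p.shape for p in prob_maps])  # [(32, 32), (16, 16), (8, 8)]
```

Each probability map has the spatial size of its scale's feature map, matching the text's observation that the probability maps share the sizes of the down-sampled feature maps.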
Optionally, determining the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of the corresponding scales includes: using the multiple first feature maps of different scales, the label of the first image, and the second feature maps of the corresponding scales as guidance information for third feature maps of the corresponding scales, to determine the target to be queried in the second image, where the third feature maps are determined according to the second image, and, for the same scale, the second feature map and the third feature map are different. Compared with the above embodiment, this embodiment adds a process of using the third feature maps to guide the inner-product results of different scales obtained in the above embodiment, thereby further improving the accuracy of subsequent target detection. The third feature maps may be extracted by a feature extraction network other than the feature extraction network 21 shown in FIG. 2; the network architecture and network parameters of that feature extraction network differ from those used for the first and second feature maps, for example, in the convolution kernels.
FIG. 5 is a flowchart of a target detection method provided by another embodiment of this application. On the basis of the above embodiment, the target detection method provided in this embodiment specifically includes the following steps:
Step 501: Determine multiple first feature vectors of different scales according to the multiple first feature maps of different scales and the label of the first image.
Step 502: Compute the multiple first feature vectors of different scales against the second feature maps of the corresponding scales according to a preset calculation rule, to obtain mask images at multiple different scales.
The mask images obtained in this step serve as guidance information for the third feature maps.
Step 503: Determine the target to be queried in the second image according to the results of multiplying the mask images of the multiple different scales with the third feature maps of the corresponding scales.
In this embodiment, multiplying the mask images of multiple different scales with the third feature maps of the corresponding scales means that, at each position shared by a mask image and a third feature map of the same scale, the value of the mask image (a scalar) is multiplied by the value of the third feature map (a vector).
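The scalar-times-vector multiplication described above amounts to broadcasting the single-channel mask over the channels of the feature map. A minimal NumPy sketch, with assumed sizes:

```python
import numpy as np

def guide(mask, feat):
    """Multiply a single-channel mask (H, W) into a feature map (H, W, C):
    at every pixel, the scalar mask value scales the C-dimensional
    feature vector at the same position."""
    return mask[:, :, None] * feat

mask8 = np.random.rand(32, 32)       # mask at the 8x-down-sampled scale
feat8 = np.random.rand(32, 32, 64)   # third feature map at the same scale
guided = guide(mask8, feat8)
print(guided.shape)  # (32, 32, 64)
```

The guided map keeps the feature map's channel dimension, so the mask acts as a spatial weighting rather than replacing the features.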
The method of this embodiment can be applied to the detection model shown in FIG. 6. The detection model shown in FIG. 6 differs from the detection model shown in FIG. 2 in that some convolutional layers are added on the basis of the feature extraction network 21 shown in FIG. 2, and a third cascade structure is added on the basis of the symmetric cascade structure shown in FIG. 2. The third cascade structure has the same structure as the first cascade structure or the second cascade structure, and its implementation principle can be found in the description of the above embodiments.
As shown in FIG. 6, the detection model 60 includes a feature extraction network 61, a scale transformation module 62, and a convolutional network 63. The feature extraction network 61 includes a fourth convolution module 611, a fifth convolution module 612, a sixth convolution module 613, a seventh convolution module 614, an eighth convolution module 615, a ninth convolution module 616, and a tenth convolution module 617. The fourth convolution module 611, the fifth convolution module 612, and the sixth convolution module 613 have the same network architecture and network parameters as the first convolution module 211, the second convolution module 212, and the third convolution module 213 shown in FIG. 2; for their function and principle, reference may be made to the description of the embodiment shown in FIG. 2, and this embodiment mainly details the differences between FIG. 6 and FIG. 2. It can be seen that, on the basis of the feature extraction network 21 shown in FIG. 2, the seventh convolution module 614 is connected after the sixth convolution module 613 (the third convolution module 213 in FIG. 2), and the eighth convolution module 615, the ninth convolution module 616, and the tenth convolution module 617 are connected in sequence after the fourth convolution module 611 (the first convolution module 211 in FIG. 2). The outputs of the sixth convolution module 613 and the seventh convolution module 614 also serve as the inputs of the eighth convolution module 615 and the ninth convolution module 616, respectively, and the output of the tenth convolution module 617 serves as the input of the third cascade structure 33. The seventh convolution module 614 performs feature extraction on the outputs of the sixth convolution module 613 to obtain the first feature map and the second feature map, which are then input into the scale transformation module 62. The scale transformation module 62 has the same structure and principle as the scale transformation module 22 shown in FIG. 2, and performs transformations to different scales on the first feature map and the second feature map; at the same time, the label information of the first image is also input into the scale transformation module 62. According to the first feature maps and second feature maps of multiple different scales and the label information of the first image, the scale transformation module 62 outputs multiple mask images of different scales, mask32x, mask16x, and mask8x, which represent mask images down-sampled by factors of 32, 16, and 8 relative to the first feature map or the second feature map, respectively. The mask images mask32x, mask16x, and mask8x output by the scale transformation module 62 are then multiplied, at corresponding pixel positions, with the second feature maps output by the third cascade structure for the second image, which are down-sampled by factors of 8, 16, and 32 relative to the second image, yielding three probability maps. After that, the three probability maps are input into the convolutional network for convolution and other operations, thereby achieving target detection on the second image.
Optionally, in this embodiment, the feature map extracted by the sixth convolution module 613 may also be input directly into the third cascade structure.
Optionally, in this embodiment, the feature map for the first image and the feature map for the second image output by the sixth convolution module 613 may also be input directly into the first cascade structure and the second cascade structure, respectively.
Optionally, the first convolution module, the second convolution module, and the third convolution module shown in FIG. 2 form a standard VGG network architecture, and those skilled in the art can increase or decrease the number of convolution modules according to actual needs, on the basis of the VGG network architecture shown in FIG. 2 and the fourth, fifth, sixth, and seventh convolution modules in FIG. 6. In the embodiments of the present application, multiple first feature vectors of different scales are determined according to the multiple first feature maps of different scales and the label of the first image; the multiple first feature vectors of different scales are then computed against the second feature maps of the corresponding scales according to a preset calculation rule to obtain a calculation result; a mask image of the second image is determined according to the calculation result; and the target to be queried in the second image is determined according to the mask image. The mask images at multiple different scales can provide similarity guidance for the segmentation of the second feature maps at the corresponding scales (the mask images mask32x, mask16x, and mask8x output by the scale transformation module 62 are multiplied, at corresponding pixel positions, with the second feature maps output by the third cascade structure for the second image, which are down-sampled by factors of 8, 16, and 32 relative to the second image). In addition, taking the sixth convolution module as an example, since the output of the fifth convolution module 612 for the second image is also input into the sixth convolution module, the sixth convolution module can fuse the output of the fifth convolution module with the output for the second image and then perform feature extraction again. In this way, richer feature information can be extracted, and during back-propagation the loss fed back also carries richer information, so that the network parameters of each convolution module in the feature extraction network are better adjusted. Therefore, the detection accuracy of the detection model can be further improved in the subsequent target detection process.
FIG. 7 is a schematic flowchart of a target detection method provided by yet another embodiment of this application. This embodiment describes in detail a specific implementation process of determining the target to be queried in the second image according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of the corresponding scales. As shown in FIG. 7, the method includes:
S701: Perform feature extraction at multiple different scales on the first image and the second image respectively, to generate multiple first feature maps of different scales and multiple second feature maps of different scales.
In this embodiment, S701 is similar to S101 in the embodiment of FIG. 1, and will not be repeated here.
S702: Determine multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of the corresponding scales, where the similarity map of a given scale characterizes the similarity between the first feature map and the second feature map of that scale.
In this embodiment, the similarity map of each scale contains similarity information between the features of the first feature map and the second feature map of that scale.
Optionally, S702 may include: determining multiple first feature vectors of different scales according to the multiple first feature maps of different scales and the label information of the first image; and multiplying the multiple first feature vectors of different scales element-wise with the second feature maps of the corresponding scales to obtain multiple similarity maps of different scales.
In this embodiment, for the first feature map of each scale, a multiplication operation may be performed between the first feature map of that scale and the label information of the first image to obtain the first feature vector of that scale. The first feature vector of that scale is then multiplied element-wise with the second feature map of that scale to obtain the similarity map of that scale. In the similarity map of a given scale, the similarity between the first feature vector and the second feature map at each pixel position is expressed by a vector.
Consider, by contrast, using the inner product or the cosine distance to express the similarity between two feature maps as a single-channel similarity map, and then performing semantic segmentation on that single-channel similarity map to implement the target query. Taking the inner product as an example, computing the inner product of the two feature vectors located at the same position in the two feature maps yields one value per pixel position. Since each pixel position in the resulting similarity map corresponds to only one value, only single-channel feature information can be characterized; single-channel feature information cannot fully express the features of the support-set image, resulting in insufficient capability to describe the similarity between feature maps and thus lower accuracy of the target query. In this embodiment, similarity maps of different scales are generated by multiplying the multiple first feature vectors of different scales element-wise with the second feature maps of the corresponding scales. Replacing the inner product or the cosine distance with element-wise multiplication allows the similarity map of each scale to contain multi-channel similarity information, so that the similarity features are expressed more fully, further improving the accuracy of the target query.
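The contrast between the single-channel inner-product similarity and the multi-channel element-wise similarity can be sketched as follows (NumPy, hypothetical sizes). Note that summing the element-wise similarity over channels recovers the inner product, which illustrates why the element-wise form retains strictly more information:

```python
import numpy as np

def inner_product_similarity(support_vec, query_feat):
    """Single-channel similarity: one scalar per pixel position."""
    return np.einsum("hwc,c->hw", query_feat, support_vec)

def elementwise_similarity(support_vec, query_feat):
    """Multi-channel similarity: broadcasting the support vector over the
    query map keeps one similarity value per channel at every pixel."""
    return query_feat * support_vec  # shape (H, W, C)

vec = np.random.rand(64)
feat = np.random.rand(16, 16, 64)
print(inner_product_similarity(vec, feat).shape)  # (16, 16)
print(elementwise_similarity(vec, feat).shape)    # (16, 16, 64)
```

A channel-wise sum of the element-wise map reproduces the inner-product map, so the element-wise similarity is a strict refinement of the single-channel one.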
S703: Integrate the multiple similarity maps of different scales to obtain an integrated similarity map.
In this embodiment, the similarity maps of different scales can be converted into similarity maps of the same scale through up-sampling and then integrated to obtain the integrated similarity map. This can be achieved by either of the following two implementations, which are described separately below.
In the first implementation, S703 may include: upsampling the multiple similarity maps of different scales to obtain multiple similarity maps of the same scale; and adding the multiple similarity maps of the same scale to obtain the integrated similarity map.
In this embodiment, the multiple similarity maps of different scales may each be upsampled to the same scale and then added together to obtain the integrated similarity map. For example, suppose there are three similarity maps A, B, and C with scales m1, m2, and m3 respectively, where m1 > m2 > m3. B and C can each be upsampled so that their scales become m1, and A is then added to the upsampled B and C to obtain the integrated similarity map, whose scale is m1. Alternatively, a scale m4 with m4 > m1 can be specified; A, B, and C are each upsampled to scale m4, and the upsampled A, B, and C are then added to obtain the integrated similarity map, whose scale is m4.
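The first implementation (upsample every map to a common scale, then sum) can be sketched as below. Nearest-neighbour upsampling via `np.repeat` stands in for whatever interpolation the actual network would use, and the map sizes are assumptions for illustration:

```python
import numpy as np

def upsample_nearest(x, target):
    """Nearest-neighbour upsampling of a square (s, s) map to (target, target).
    Assumes target is an integer multiple of the input size."""
    factor = target // x.shape[0]
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

# Three single-channel similarity maps A, B, C with scales m1 > m2 > m3
A = np.ones((8, 8))   # m1 = 8
B = np.ones((4, 4))   # m2 = 4
C = np.ones((2, 2))   # m3 = 2

# Upsample B and C to m1, then add all three maps
integrated = A + upsample_nearest(B, 8) + upsample_nearest(C, 8)
print(integrated.shape)  # (8, 8)
```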
In the second implementation, S703 may include:
forming a similarity map set from the multiple similarity maps of different scales;
upsampling the smallest-scale similarity map in the set to obtain a similarity map with the same scale as the second-smallest similarity map;
adding the resulting similarity map to the second-smallest similarity map to obtain a new similarity map; and
forming a new similarity map set from the new similarity map and the similarity maps in the set that have not yet undergone upsampling or addition, and repeating the upsampling and addition steps until a final similarity map is obtained; this final similarity map is the integrated similarity map.
This implementation is illustrated with three similarity maps. Suppose there are three similarity maps A, B, and C with scales m1, m2, and m3 respectively, where m1 > m2 > m3. C is first upsampled so that its scale becomes m2, and B is added to the upsampled C to obtain a new similarity map D with scale m2. D is then upsampled so that its scale becomes m1, and A is added to the upsampled D to obtain the final integrated similarity map.
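The second, coarse-to-fine implementation can be sketched with the same hypothetical maps: repeatedly upsample the current smallest map, add it to the next-larger one, and continue until a single map remains.

```python
import numpy as np

def upsample_nearest(x, target):
    # Nearest-neighbour upsampling; assumes target is a multiple of the size
    factor = target // x.shape[0]
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

# Similarity maps sorted from largest scale (A) to smallest (C)
maps = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]  # A, B, C

# Walk from the smallest map upward: C is upsampled and added to B,
# giving D; D is upsampled and added to A, giving the integrated map.
acc = maps[-1]
for larger in reversed(maps[:-1]):
    acc = larger + upsample_nearest(acc, larger.shape[0])

print(acc.shape)  # (8, 8)
```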
S704: Determine the target to be queried in the second image according to the integrated similarity map.
In this embodiment, S704 is similar to S102 in the embodiment of FIG. 1 and is not described again here.
In this embodiment, multiple similarity maps of different scales are determined according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales; the multiple similarity maps of different scales are then integrated to obtain an integrated similarity map, and the target to be queried in the second image is determined according to the integrated similarity map. By integrating similarity maps at multiple different scales, the integrated similarity map contains multi-scale feature information, further improving the accuracy of the target query.
FIG. 8 is a schematic flowchart of a target detection method provided by another embodiment of this application. This embodiment differs from the embodiment of FIG. 7 in that, after the multiple similarity maps of different scales are determined in S702 and before they are integrated in S703, each similarity map is further multiplied element by element with the third feature map of the corresponding scale to obtain processed similarity maps of multiple different scales. As shown in FIG. 8, the method includes:
S801: Perform feature extraction at multiple different scales on the second image and the first image respectively, generating multiple first feature maps of different scales and multiple second feature maps of different scales.
In this embodiment, S801 is similar to S101 in the embodiment of FIG. 1 and is not described again here.
S802: Determine multiple similarity maps of different scales according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales; the similarity map at each scale characterizes the similarity between the first feature map and the second feature map at that scale.
In this embodiment, S802 is similar to S702 in the embodiment of FIG. 7 and is not described again here.
S803: Multiply the multiple similarity maps of different scales element by element with the third feature maps of the corresponding scales to obtain processed similarity maps of multiple different scales, where the third feature maps are determined according to the second image, and the second feature map and the third feature map at the same scale are different.
S804: Integrate the processed similarity maps of multiple different scales to obtain an integrated similarity map.
In this embodiment, S804 is similar to S704 in the embodiment of FIG. 7 and is not described again here.
In this embodiment, when feature extraction is performed on the second image, not only are multiple second feature maps of different scales extracted, but multiple third feature maps of different scales are extracted as well. For each scale, different feature extraction methods, such as two neural networks with different network parameters, can be applied to the second image to obtain the second feature map and the third feature map at that scale respectively.
After the multiple similarity maps of different scales are determined according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales, the similarity map at each scale is multiplied element by element with the third feature map at that scale to obtain the processed similarity map at that scale. The processed similarity maps of the multiple different scales are then integrated to obtain the integrated similarity map.
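Under the same assumed shapes used in the earlier sketches, S803 amounts to one additional element-wise product per scale before integration; the third feature map must match the similarity map's shape at that scale:

```python
import numpy as np

C, H, W = 8, 4, 4
# Similarity map and third feature map at one scale (values assumed)
similarity = np.random.rand(C, H, W)
third_feat = np.random.rand(C, H, W)   # extracted from the second image

# S803: element-wise multiplication lets the multi-scale similarity
# information guide the query image's own features
processed = similarity * third_feat
print(processed.shape)  # (8, 4, 4)
```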
S805: Determine the target to be queried in the second image according to the integrated similarity map.
In this embodiment, the multiple similarity maps of different scales, determined according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales, are multiplied element by element with the third feature maps of the second image. In this way, the similarity maps at multiple different scales guide the segmentation of the second image, further improving the accuracy of the target query.
FIG. 9 is a flowchart of a target detection method provided by an embodiment of the present application.
As shown in FIG. 9, the target detection method of the foregoing embodiments is executed by a neural network, and the neural network is trained through the following steps:
Step 901: Perform feature extraction at multiple different scales on a first sample image and a second sample image respectively to obtain multiple fourth feature maps of different scales and multiple fifth feature maps of different scales, where both the first sample image and the second sample image contain objects of a first category.
Step 902: Determine the objects of the first category in the second sample image according to the fourth feature maps of multiple different scales, the label of the first sample image, and the fifth feature maps of the corresponding scales, where the label of the first sample image is the result of annotating the objects of the first category contained in the first sample image.
Step 903: Adjust the network parameters of the neural network according to the difference between the determined objects of the first category in the second sample image and the label of the second sample image, where the label of the second sample image is the result of annotating the objects of the first category contained in the second sample image.
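Steps 901 to 903 together describe a standard supervised training loop: predict, measure the difference from the label, adjust parameters. A deliberately toy sketch of that loop (a one-parameter "network" on synthetic data; none of the names or values come from the patent) is:

```python
import numpy as np

# Toy stand-in for steps 901-903: a one-parameter "network" predicts a
# label from a feature; the parameter is adjusted from the difference
# between prediction and label. All data and the model form are invented.
rng = np.random.default_rng(0)
features = rng.random(100)          # stands in for extracted features
labels = 2.0 * features             # stands in for second-sample labels

w, lr = 0.0, 0.1
for _ in range(200):
    pred = w * features                             # step 902: predict
    grad = 2 * np.mean((pred - labels) * features)  # step 903: difference
    w -= lr * grad                                  # adjust parameters

print(round(w, 3))  # converges toward 2.0
```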
In this embodiment, the target query described above is implemented by a neural network, which may be trained before target queries are performed. Specifically, a first sample image and a second sample image containing objects of the same category may be obtained from a training set containing multiple sample images; these objects are the targets to be queried during this round of training. The training set may include multiple subsets, and the sample images in each subset all contain objects of the same category. For example, the categories may include vehicles, pedestrians, and traffic lights, and the first sample image and the second sample image obtained may both contain traffic lights, which then serve as the target to be queried during this round of training. The traffic lights in the first sample image are annotated to obtain the label of the first sample image, and the traffic lights in the second sample image are annotated to obtain the label of the second sample image.
The training process of this embodiment is similar to the target detection method of the foregoing embodiments; for the specific implementation, refer to the description of those embodiments. It should be noted that in this embodiment, the first sample image and the second sample image must contain objects of the same category, so that the neural network is trained to recognize the association between images of the same category. For example, traffic lights may be used to train the neural network in the training phase, while street lights may be used to test or apply the neural network in the testing or application phase.
FIG. 10 is a schematic flowchart of a target detection method provided by yet another embodiment of this application. This embodiment describes in detail how the neural network trained in the embodiment of FIG. 9 is tested. As shown in FIG. 10, the method may further include:
S1001: Perform feature extraction at multiple different scales on a first test image and a second test image respectively to obtain multiple first test feature maps of different scales and multiple second test feature maps of different scales, where the first test image and the second test image come from one test image set, and each test image in the test image set includes objects of the same category.
S1002: Determine the target to be queried in the second test image according to the first test feature maps of multiple different scales, the label of the first test image, and the second test feature maps of the corresponding scales, where the label of the first test image is the result of annotating the target to be queried contained in the first test image.
In this embodiment, test images that include objects of the same category may be grouped in advance into one test image set, and multiple test image sets form an overall test set. When testing the neural network, a first test image and a second test image are selected from one test image set, and the neural network is tested with them. For example, the neural network can be tested with a first test image and a second test image that both contain street lights.
In one example, one sample may be selected from the test image set as the first test image for each test category. For example, in the PASCAL VOC test image set, one image is selected as the first test image for each of the 20 categories. During testing, each sample in the test image set is paired with the first test image of its corresponding category to form a test data pair, which is then input into the model shown in FIG. 2 or FIG. 5 for evaluation; the test images in a test data pair contain targets of the same category. This avoids the uneven category selection caused by traditionally selecting test data pairs at random, and also resolves the fluctuation of evaluation metrics caused by varying sample quality. Optionally, a test may be run after every 100 training iterations, or after every 120 training iterations; those skilled in the art can adjust this according to actual needs, which is not specifically limited in this embodiment.
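The deterministic pairing described above — one fixed first test image per category, paired with every test sample of that category — can be sketched as follows (the category names, file names, and data structures are all invented for illustration):

```python
# One fixed "first test image" per category (assumed identifiers)
first_test_images = {"cat": "support_cat.jpg", "dog": "support_dog.jpg"}

# The remaining test samples, each tagged with its category
test_samples = [("q1.jpg", "cat"), ("q2.jpg", "dog"), ("q3.jpg", "cat")]

# Pair every test sample with its category's fixed support image,
# instead of sampling support images at random
test_pairs = [(first_test_images[cat], img) for img, cat in test_samples]
print(test_pairs)
```

Fixing the support image per category removes the variance that random support selection would otherwise introduce into the evaluation metrics.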
With the trained neural network of the embodiments of this application, even a category whose corresponding training images account for a low proportion of the training image set, or a category that has never been learned, can be accurately detected by the target detection method of this embodiment. In addition, the method of constructing test data pairs in the embodiments of this application reduces the task's strong dependence on samples, enables accurate detection of category samples that are difficult to collect in practical application scenarios, avoids the uneven category selection caused by traditional randomly selected test pairs, and resolves the fluctuation of evaluation metrics caused by varying support-sample quality. For example, in a target detection task in autonomous driving, a target category in the scene for which a large number of training samples is not available can still be accurately detected.
FIG. 11 is a schematic flowchart of an intelligent driving method provided by an embodiment of this application. As shown in FIG. 11, the method may include:
S1101: Collect a road image.
S1102: Using the target detection method described above, query the collected road image for the target to be queried according to a support image and the label of the support image, where the label of the support image is the result of annotating the targets in the support image that belong to the same category as the target to be queried.
S1103: Control the intelligent driving device that collects the road image according to the query result.
In this embodiment, the intelligent driving device may include an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistant System (ADAS), a robot, and the like. For example, a road image collected by the intelligent driving device while driving or while stopped may be obtained, and target detection may then be performed on the road image using the target detection method described above. When the target detection method is applied, the road image serves as the second image and the support image serves as the first image. The intelligent driving device is then controlled according to the target detection result. For example, an intelligent driving device such as an autonomous vehicle or a robot can be directly controlled to decelerate, brake, or steer, or deceleration, braking, or steering instructions can be sent to the driver of a vehicle equipped with ADAS. For example, if the query result shows that the traffic light in front of the intelligent driving device is red, the device is controlled to decelerate and stop; if the query result shows that a pedestrian appears in front of the device, the device is controlled to brake.
FIG. 12 is a schematic diagram of a target detection process provided by an embodiment of this application. The first image is input into a first convolutional neural network to obtain multiple first feature maps of the first image at different scales, and the second image is input into a second convolutional neural network to obtain multiple second feature maps of the second image at different scales. The second feature maps of the second image, the first feature maps of the first image, and the label information of the first image are input into a generation module to obtain similarity maps at multiple scales. The similarity maps at multiple scales are input into an aggregation module to obtain an integrated similarity map. The integrated similarity map is input into a third convolutional neural network to obtain a semantic segmentation map of the second image, thereby realizing target detection on the second image.
FIG. 13 is a schematic diagram of the generation module and the aggregation module provided by an embodiment of this application. In the figure, conv denotes a convolutional layer and pool denotes pooling. The feature map of the first image is input into the first convolution channel of the generation module 131 to obtain multiple first feature maps of different scales, which are then multiplied with the label information of the first image and pooled to obtain multiple feature vectors of the first image at different scales. The feature map of the second image is input into the second convolution channel of the generation module 131 to obtain multiple second feature maps of different scales. The multiple feature maps of the second image at different scales are each multiplied element by element with the feature vector of the corresponding scale to obtain multiple similarity maps of different scales. The generation module 131 outputs the multiple similarity maps of different scales to the aggregation module 132, which integrates them and outputs the integrated similarity map.
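Reading the generation module end to end, one scale of its pipeline can be sketched as below (a NumPy sketch under assumed shapes, with masked average pooling standing in for the pooling in the figure): the support feature map is masked by the label, pooled into a per-scale vector, and broadcast-multiplied with the query feature map of that scale.

```python
import numpy as np

C, H, W = 8, 4, 4
support_feat = np.random.rand(C, H, W)   # first feature map at one scale
query_feat = np.random.rand(C, H, W)     # second feature map, same scale
label_mask = np.zeros((H, W))            # support label: 1 inside the target
label_mask[1:3, 1:3] = 1.0

# Multiply by the label mask, then average-pool over the masked region
# to obtain the support feature vector for this scale
masked = support_feat * label_mask[None, :, :]
support_vec = masked.sum(axis=(1, 2)) / label_mask.sum()

# Broadcast element-wise multiplication -> multi-channel similarity map
similarity = support_vec[:, None, None] * query_feat
print(similarity.shape)  # (8, 4, 4)
```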
FIG. 14 is a schematic comparison between the similarity-feature extraction method in the target detection method provided by an embodiment of this application and extraction of similarity features through an inner product or cosine distance. The left part of the figure illustrates similarity-feature extraction through an inner product or cosine distance; the right part illustrates similarity-feature extraction by multiplying the vectors at corresponding pixel positions. Compared with the inner product or cosine distance, the element-wise multiplication proposed in the embodiments of this application changes the output similarity map from single-channel to multi-channel. This preserves the channel information of the similarity, and subsequent convolution and nonlinear operations can further express the similarity features appropriately, thereby further improving the accuracy of target detection.
FIG. 15 is a schematic structural diagram of a target detection apparatus provided by an embodiment of this application. The target detection apparatus provided by this embodiment of this application can execute the processing flow provided by the embodiments of the target detection method. As shown in FIG. 15, the target detection apparatus 150 provided by this embodiment includes a feature extraction module 151 and a determination module 152. The feature extraction module 151 is configured to perform feature extraction at multiple different scales on a first image and a second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales. The determination module 152 is configured to determine the target to be queried in the second image according to the first feature maps of multiple different scales, the label of the first image, and the second feature maps of the corresponding scales, where the label of the first image is the result of annotating the target to be queried contained in the first image.
Optionally, when the feature extraction module 151 performs feature extraction at multiple different scales on the first image and the second image respectively to obtain the multiple first feature maps of different scales and the multiple second feature maps of different scales, it specifically: performs feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map; and performs multiple scale transformations on the first feature map and the second feature map respectively to obtain the multiple first feature maps of different scales and the multiple second feature maps of different scales.
Optionally, when the feature extraction module 151 performs multiple scale transformations on the first feature map and the second feature map respectively, it specifically downsamples the first feature map and the second feature map at least twice each.
Optionally, when the determination module 152 determines the target to be queried in the second image according to the first feature maps of multiple different scales, the label of the first image, and the second feature maps of the corresponding scales, it specifically: determines multiple first feature vectors of different scales according to the first feature maps of multiple different scales and the label of the first image; computes the multiple first feature vectors of different scales with the second feature maps of the corresponding scales according to a preset calculation rule to obtain a calculation result; determines a mask image of the second image according to the calculation result; and determines the target to be queried in the second image according to the mask image.
Optionally, when the determination module 152 determines the target to be queried in the second image according to the first feature maps of multiple different scales, the label of the first image, and the second feature maps of the corresponding scales, it specifically: determines the image to be queried in the second image using the first feature maps of multiple different scales, the label of the first image, and the second feature maps of the corresponding scales as guidance information for the third feature maps of the corresponding scales, where the third feature maps are determined according to the second image, and the second feature map and the third feature map at the same scale are different.
Optionally, when the determination module 152 determines the image to be queried in the second image using the first feature maps of multiple different scales, the label of the first image, and the second feature maps of the corresponding scales as guidance information for the third feature maps of the corresponding scales, it specifically: determines multiple first feature vectors of different scales according to the first feature maps of multiple different scales and the label of the first image; computes the multiple first feature vectors of different scales with the second feature maps of the corresponding scales according to a preset calculation rule to obtain mask images at multiple different scales; and determines the target to be queried in the second image according to the result of multiplying the mask images of multiple different scales with the third feature maps of the corresponding scales.
Optionally, the preset calculation rule includes an inner-product calculation rule or a cosine-distance calculation rule.
Optionally, when the determination module 152 determines the target to be queried in the second image according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales, it specifically: determines multiple similarity maps of different scales according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales, where the similarity map at each scale characterizes the similarity between the first feature map and the second feature map at that scale; integrates the multiple similarity maps of different scales to obtain an integrated similarity map; and determines the target to be queried in the second image according to the integrated similarity map.
Optionally, when the determination module 152 determines the multiple similarity maps of different scales according to the first feature maps of multiple different scales, the label information of the first image, and the second feature maps of the corresponding scales, it specifically: determines multiple first feature vectors of different scales according to the first feature maps of multiple different scales and the label information of the first image; and multiplies the multiple first feature vectors of different scales element by element with the second feature maps of the corresponding scales to obtain the multiple similarity maps of different scales.
Optionally, when the determination module 152 integrates the multiple similarity maps of different scales to obtain the integrated similarity map, it specifically: upsamples the multiple similarity maps of different scales to obtain multiple similarity maps of the same scale; and adds the multiple similarity maps of the same scale to obtain the integrated similarity map.
Optionally, when the determination module 152 integrates the multiple similarity maps of different scales to obtain the integrated similarity map, it specifically: forms a similarity map set from the multiple similarity maps of different scales; upsamples the smallest-scale similarity map in the set to obtain a similarity map with the same scale as the second-smallest similarity map; adds the resulting similarity map to the second-smallest similarity map to obtain a new similarity map; and forms a new similarity map set from the new similarity map and the similarity maps that have not yet undergone upsampling or addition, repeating the upsampling and addition steps until a final similarity map is obtained, where the final similarity map is the integrated similarity map.
可选的，确定模块152还用于：将多个不同尺度的相似度图和相应尺度的第三特征图逐元素相乘，得到处理后的多个不同尺度的相似度图；其中，第三特征图根据第二图像确定，且同一尺度的第一特征图和第三特征图不同；将处理后的多个不同尺度的相似度图整合，得到整合后的相似度图。Optionally, the determining module 152 is further configured to: multiply, element by element, the multiple similarity maps of different scales by third feature maps of corresponding scales to obtain multiple processed similarity maps of different scales, where the third feature maps are determined according to the second image, and the first feature map and the third feature map of the same scale are different; and integrate the multiple processed similarity maps of different scales to obtain the integrated similarity map.
可选的，目标检测装置由神经网络实现，该装置还包括：训练模块153，用于采用以下步骤训练得到神经网络，该步骤包括：分别对第一样本图像和第二样本图像进行多个不同尺度的特征提取，得到多个不同尺度的第四特征图和多个不同尺度的第五特征图；其中，第一样本图像和第二样本图像均包含第一类别的对象；根据多个不同尺度的第四特征图和第一样本图像的标签，以及相应尺度的第五特征图，确定第二样本图像中的第一类别的对象；第一样本图像的标签是对第一样本图像中包含的第一类别的对象进行标注的结果；根据确定的第二样本图像中的第一类别的对象以及第二样本图像的标签之间的差异，调整神经网络的网络参数；第二样本图像的标签是对第二样本图像中包含的第一类别的对象进行标注的结果。Optionally, the target detection apparatus is implemented by a neural network, and the apparatus further includes a training module 153 configured to obtain the neural network by training with the following steps: performing feature extraction of multiple different scales on a first sample image and a second sample image respectively to obtain multiple fourth feature maps of different scales and multiple fifth feature maps of different scales, where both the first sample image and the second sample image contain objects of a first category; determining the objects of the first category in the second sample image according to the multiple fourth feature maps of different scales, the label of the first sample image, and the fifth feature maps of corresponding scales, the label of the first sample image being the result of labeling the objects of the first category contained in the first sample image; and adjusting the network parameters of the neural network according to the difference between the determined objects of the first category in the second sample image and the label of the second sample image, the label of the second sample image being the result of labeling the objects of the first category contained in the second sample image.
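The adjust-by-difference training step can be illustrated with a deliberately tiny stand-in: a single weight vector `w` plays the role of the network parameters, a label-guided support feature replaces the multi-scale feature maps, and a squared error between the predicted query response and the query label plays the role of the difference. Every name and the linear model here are assumptions for illustration, not the claimed architecture.

```python
import numpy as np

def train_step(w, support_feat, support_label, query_feat, query_label, lr=0.1):
    guide = support_feat * support_label          # label-guided support features
    pred = float(np.dot(w * guide, query_feat))   # predicted response on the query
    err = pred - query_label                      # difference from the query label
    grad = guide * query_feat * err               # gradient of 0.5 * err**2 w.r.t. w
    return w - lr * grad                          # adjust the "network parameters"

# toy episode: both images contain the same category, label = 1.0
w = np.zeros(4)
s_feat = np.array([1.0, 0.5, 0.2, 0.8])
q_feat = np.array([0.9, 0.4, 0.3, 0.7])
for _ in range(50):
    w = train_step(w, s_feat, 1.0, q_feat, 1.0)
final_pred = float(np.dot(w * s_feat, q_feat))
```

After repeated steps the predicted query response approaches the query label, mirroring how the parameter adjustment reduces the difference used as the training signal.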
可选的，该装置还包括：测试模块154，用于对训练完成的神经网络进行测试；测试模块具体采用以下步骤对训练完成的神经网络进行测试：分别对第一测试图像和第二测试图像进行多个不同尺度的特征提取，得到多个不同尺度的第一测试特征图和多个不同尺度的第二测试特征图；其中，第一测试图像和第二测试图像来源于一个测试图像集，测试图像集中的各个测试图像均包括同一类别的对象；根据多个不同尺度的第一测试特征图和第一测试图像的标签，以及相应尺度的第二测试特征图，确定第二测试图像中的待查询目标；第一测试图像的标签是对第一测试图像中包含的待查询目标进行标注的结果。Optionally, the apparatus further includes a testing module 154 configured to test the trained neural network, specifically by the following steps: performing feature extraction of multiple different scales on a first test image and a second test image respectively to obtain multiple first test feature maps of different scales and multiple second test feature maps of different scales, where the first test image and the second test image come from one test image set, and each test image in the test image set includes objects of the same category; and determining the target to be queried in the second test image according to the multiple first test feature maps of different scales, the label of the first test image, and the second test feature maps of corresponding scales, the label of the first test image being the result of labeling the target to be queried contained in the first test image.
本申请实施例提供的目标检测装置,可用于执行上述的目标检测方法实施例,其实现原理和技术效果类似,本实施例此处不再赘述。The target detection device provided in the embodiment of the present application can be used to implement the above-mentioned target detection method embodiment, and its implementation principles and technical effects are similar, and will not be repeated here in this embodiment.
图16为本申请一实施例提供的智能行驶装置的结构示意图。如图16所示，本实施例提供的智能行驶装置160包括：采集模块161、查询模块162和控制模块163；其中，采集模块161，用于采集道路图像；查询模块162，用于采用本申请实施例提供的目标检测方法，根据支持图像以及支持图像的标签对采集到的道路图像进行待查询目标的查询；其中，支持图像的标签是对支持图像中包含的与待查询目标同一类别的目标进行标注的结果；控制模块163，用于根据查询结果对采集道路图像的智能行驶设备进行控制。FIG. 16 is a schematic structural diagram of an intelligent driving apparatus provided by an embodiment of the application. As shown in FIG. 16, the intelligent driving apparatus 160 provided in this embodiment includes: an acquisition module 161, a query module 162, and a control module 163. The acquisition module 161 is configured to collect road images; the query module 162 is configured to query the collected road images for a target to be queried according to a support image and the label of the support image, using the target detection method provided by the embodiments of the application, where the label of the support image is the result of labeling the targets contained in the support image that belong to the same category as the target to be queried; and the control module 163 is configured to control, according to the query result, the intelligent driving device that collects the road images.
本申请实施例提供的智能行驶装置的实施可以参考前述的智能行驶方法,其实现原理和技术效果类似,本实施例此处不再赘述。The implementation of the smart driving device provided in the embodiment of the present application can refer to the foregoing smart driving method, and the implementation principle and technical effect are similar, and the details are not described herein again in this embodiment.
图17为本申请一实施例提供的目标检测设备的硬件结构示意图。本申请实施例提供的目标检测设备可以执行目标检测方法实施例提供的处理流程,如图17所示,本实施例提供的目标检测设备170包括:至少一个处理器171和存储器172。该目标检测设备170还包括通信部件173。其中,处理器171、存储器172以及通信部件173通过总线174连接。FIG. 17 is a schematic diagram of the hardware structure of a target detection device provided by an embodiment of the application. The target detection device provided in the embodiment of the present application can execute the processing flow provided in the embodiment of the target detection method. As shown in FIG. 17, the target detection device 170 provided in this embodiment includes: at least one processor 171 and a memory 172. The target detection device 170 also includes a communication component 173. Among them, the processor 171, the memory 172, and the communication component 173 are connected by a bus 174.
在具体实现过程中,至少一个处理器171执行所述存储器172存储的计算机执行指令,使得至少一个处理器171执行如上的目标检测方法。In a specific implementation process, at least one processor 171 executes the computer-executable instructions stored in the memory 172, so that the at least one processor 171 executes the above target detection method.
处理器171的具体实现过程可参见上述目标检测方法实施例，其实现原理和技术效果类似，本实施例此处不再赘述。For the specific implementation process of the processor 171, refer to the foregoing target detection method embodiment; the implementation principles and technical effects are similar, and will not be repeated here in this embodiment.
图18为本申请一实施例提供的智能行驶设备的硬件结构示意图。本申请实施例提供的智能行驶设备可以执行智能行驶方法实施例提供的处理流程,如图18所示,本实施例提供的智能行驶设备180包括:至少一个处理器181和存储器182。该智能行驶设备180还包括通信部件183。其中,处理器181、存储器182以及通信部件183通过总线184连接。FIG. 18 is a schematic diagram of the hardware structure of a smart driving device provided by an embodiment of the application. The smart driving device provided in the embodiment of the present application can execute the processing flow provided in the smart driving method embodiment. As shown in FIG. 18, the smart driving device 180 provided in this embodiment includes: at least one processor 181 and a memory 182. The smart driving device 180 also includes a communication component 183. Among them, the processor 181, the memory 182, and the communication component 183 are connected by a bus 184.
在具体实现过程中,至少一个处理器181执行所述存储器182存储的计算机执行指令,使得至少一个处理器181执行如上的智能行驶方法。In a specific implementation process, at least one processor 181 executes the computer-executable instructions stored in the memory 182, so that the at least one processor 181 executes the above intelligent driving method.
处理器181的具体实现过程可参见上述智能行驶方法实施例,其实现原理和技术效果类似,本实施例此处不再赘述。For the specific implementation process of the processor 181, refer to the foregoing embodiment of the smart driving method, and its implementation principles and technical effects are similar, and will not be repeated here in this embodiment.
在上述的图17和图18所示的实施例中，应理解，处理器可以是中央处理单元（英文：Central Processing Unit，简称：CPU），还可以是其他通用处理器、数字信号处理器（英文：Digital Signal Processor，简称：DSP）、专用集成电路（英文：Application Specific Integrated Circuit，简称：ASIC）等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合申请所公开的方法的步骤可以直接体现为硬件处理器执行完成，或者用处理器中的硬件及软件模块组合执行完成。In the embodiments shown in FIG. 17 and FIG. 18 above, it should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in this application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
存储器可能包含高速RAM存储器,也可能还包括非易失性存储NVM,例如至少一个磁盘存储器。The memory may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory.
总线可以是工业标准体系结构（Industry Standard Architecture，ISA）总线、外部设备互连（Peripheral Component Interconnect，PCI）总线或扩展工业标准体系结构（Extended Industry Standard Architecture，EISA）总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示，本申请附图中的总线并不限定仅有一根总线或一种类型的总线。The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the buses in the drawings of this application are not limited to only one bus or one type of bus.
在另一个实施例中,本申请实施例中还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现所述目标检测方法或智能行驶方法的步骤。In another embodiment, the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the target detection method or the intelligent driving method are realized.
在再一个实施例中,本申请实施例还提供一种运行指令的芯片,所述芯片包括存储器、处理器,所述存储器中存储代码和数据,所述存储器与所述处理器耦合,所述处理器运行所述存储器中的代码使得所述芯片用于执行上述目标检测方法或智能行驶方法的步骤。In still another embodiment, an embodiment of the present application further provides a chip for executing instructions. The chip includes a memory and a processor. The memory stores code and data. The memory is coupled with the processor. The processor runs the code in the memory so that the chip is used to execute the steps of the above-mentioned target detection method or smart driving method.
在又一个实施例中,本申请实施例还提供一种包含指令的程序产品,当所述程序产品在计算机上运行时,使得所述计算机执行上述目标检测方法或智能行驶方法的步骤。In yet another embodiment, the embodiment of the present application further provides a program product containing instructions, which when the program product runs on a computer, causes the computer to execute the steps of the above-mentioned target detection method or smart driving method.
在又一个实施例中,本申请实施例还提供了一种计算机程序,当所述计算机程序被处理器执行时,用于执行上述的目标检测方法或智能行驶方法的步骤。In yet another embodiment, the embodiment of the present application further provides a computer program, when the computer program is executed by a processor, it is used to execute the steps of the above-mentioned target detection method or smart driving method.
在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
上述以软件功能单元的形式实现的集成的单元，可以存储在一个计算机可读取存储介质中。上述软件功能单元存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）或处理器（processor）执行本申请各个实施例所述方法的部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（Read-Only Memory，ROM）、随机存取存储器（Random Access Memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit implemented in the form of a software functional unit as described above may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
本领域技术人员可以清楚地了解到，为描述的方便和简洁，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将装置的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。上述描述的装置的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above functional modules is used as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. For the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
最后应说明的是：以上各实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述各实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分或者全部技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or equivalently replace some or all of the technical features therein; and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (36)

  1. 一种目标检测方法,其特征在于,包括:A target detection method is characterized in that it comprises:
    分别对第一图像和第二图像进行多个不同尺度的特征提取,得到多个不同尺度的第一特征图和多个不同尺度的第二特征图;Performing multiple feature extractions of different scales on the first image and the second image, respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales;
    根据多个不同尺度的第一特征图和所述第一图像的标签,以及相应尺度的所述第二特征图,确定所述第二图像中的待查询目标;所述第一图像的标签是对所述第一图像中包含的待查询目标进行标注的结果。According to a plurality of first feature maps of different scales and labels of the first image, and the second feature maps of corresponding scales, the target to be queried in the second image is determined; the label of the first image is The result of marking the target to be queried contained in the first image.
  2. 根据权利要求1所述的方法，其特征在于，所述分别对第一图像和第二图像进行多个不同尺度的特征提取，得到多个不同尺度的第一特征图和多个不同尺度的第二特征图，包括：The method according to claim 1, wherein the performing feature extraction of multiple different scales on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales comprises:
    分别对所述第一图像和所述第二图像进行特征提取,得到第一特征图和第二特征图;Performing feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map;
    分别对所述第一特征图和所述第二特征图进行多次尺度变换,得到多个不同尺度的第一特征图和多个不同尺度的第二特征图。Perform multiple scale transformations on the first feature map and the second feature map, respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  3. 根据权利要求2所述的方法,其特征在于,所述分别对所述第一特征图和所述第二特征图进行多次尺度变换,包括:The method according to claim 2, wherein said performing multiple scale transformations on said first feature map and said second feature map respectively comprises:
    对所述第一特征图和所述第二特征图分别进行至少两次降采样。The first feature map and the second feature map are down-sampled at least twice, respectively.
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述根据多个不同尺度的第一特征图和所述第一图像的标签,以及相应尺度的所述第二特征图,确定所述第二图像中的待查询目标,包括:The method according to any one of claims 1 to 3, wherein the first feature map and the label of the first image according to a plurality of different scales, and the second feature map of corresponding scales, Determining the target to be queried in the second image includes:
    根据多个不同尺度的第一特征图和所述第一图像的标签,确定多个不同尺度的第一特征向量;Determine a plurality of first feature vectors of different scales according to a plurality of first feature maps of different scales and labels of the first image;
    将所述多个不同尺度的第一特征向量与相应尺度的所述第二特征图按照预设计算规则进行计算,得到计算结果;Calculating the plurality of first feature vectors of different scales and the second feature map of corresponding scales according to a preset calculation rule to obtain a calculation result;
    根据所述计算结果,确定所述第二图像的掩码图像;Determine the mask image of the second image according to the calculation result;
    根据所述掩码图像,确定所述第二图像中的待查询目标。According to the mask image, the target to be queried in the second image is determined.
  5. 根据权利要求1-3任一项所述的方法,其特征在于,所述根据多个不同尺度的第一特征图和所述第一图像的标签,以及相应尺度的所述第二特征图,确定所述第二图像中的待查询目标,包括:The method according to any one of claims 1 to 3, wherein the first feature map and the label of the first image according to a plurality of different scales, and the second feature map of corresponding scales, Determining the target to be queried in the second image includes:
    以多个不同尺度的第一特征图、所述第一图像的标签以及相应尺度的所述第二特征图作为相应尺度的第三特征图的指导信息，确定所述第二图像中的待查询图像；using multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for third feature maps of corresponding scales, determining the image to be queried in the second image;
    其中,所述第三特征图根据所述第二图像确定,且同一尺度的第二特征图和第三特征图不同。Wherein, the third feature map is determined according to the second image, and the second feature map and the third feature map of the same scale are different.
  6. 根据权利要求5所述的方法，其特征在于，所述以多个不同尺度的第一特征图、所述第一图像的标签以及相应尺度的所述第二特征图作为相应尺度的第三特征图的指导信息，确定所述第二图像中的待查询图，包括：The method according to claim 5, wherein the using multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for third feature maps of corresponding scales to determine the image to be queried in the second image comprises:
    根据多个不同尺度的第一特征图和所述第一图像的标签,确定多个不同尺度的第一特征向量;Determine a plurality of first feature vectors of different scales according to a plurality of first feature maps of different scales and labels of the first image;
    将所述多个不同尺度的第一特征向量与相应尺度的所述第二特征图按照预设计算规则进行计算,得到多个不同尺度下的掩码图像;Calculating the plurality of first feature vectors of different scales and the second feature map of corresponding scales according to a preset calculation rule to obtain a plurality of mask images at different scales;
    根据多个不同尺度的掩码图像和相应尺度的所述第三特征图相乘的结果,确定所述第二图像中的待查询目标。According to a multiplication result of a plurality of mask images of different scales and the third feature map of corresponding scales, the target to be queried in the second image is determined.
  7. 根据权利要求4或6所述的方法,其特征在于,所述预设计算规则包括:内积的计算规则,或者余弦距离的计算规则。The method according to claim 4 or 6, wherein the preset calculation rules include: inner product calculation rules or cosine distance calculation rules.
  8. 根据权利要求1所述的方法，其特征在于，所述根据所述多个不同尺度的第一特征图和所述第一图像的标签信息，以及相应尺度的第二特征图，确定所述第二图像中的待查询目标，包括：The method according to claim 1, wherein the determining the target to be queried in the second image according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales comprises:
    根据多个不同尺度的所述第一特征图、所述第一图像的标签信息和对应尺度的第二特征图确定多个不同尺度的相似度图；一个尺度的相似度图表征该尺度的第一特征图和第二特征图的相似性；determining multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, where a similarity map of one scale represents the similarity between the first feature map and the second feature map of that scale;
    将多个不同尺度的相似度图整合,得到整合后的相似度图;Integrate multiple similarity maps of different scales to obtain an integrated similarity map;
    根据整合后的相似度图,确定所述第二图像中的待查询目标。According to the integrated similarity map, the target to be queried in the second image is determined.
  9. 根据权利要求8所述的方法，其特征在于，所述根据多个不同尺度的所述第一特征图、所述第一图像的标签信息和对应尺度的第二特征图确定多个不同尺度的相似度图，包括：The method according to claim 8, wherein the determining multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales comprises:
    根据多个不同尺度的第一特征图和所述第一图像的标签信息,确定多个不同尺度的第一特征向量;Determine a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label information of the first image;
    将所述多个不同尺度的第一特征向量与相应尺度的所述第二特征图逐元素相乘,得到多个不同尺度的相似度图。The multiple first feature vectors of different scales and the second feature map of corresponding scales are multiplied element by element to obtain multiple similarity maps of different scales.
  10. 根据权利要求8或9所述的方法，其特征在于，所述将多个不同尺度的相似度图整合，得到整合后的相似度图，包括：The method according to claim 8 or 9, wherein the integrating multiple similarity maps of different scales to obtain an integrated similarity map comprises:
    对多个不同尺度的相似度图进行上采样,得到多个尺度相同的相似度图;Up-sampling multiple similarity maps of different scales to obtain multiple similarity maps of the same scale;
    对多个尺度相同的相似度图相加,得到整合后的相似度图。Add multiple similarity maps with the same scale to obtain an integrated similarity map.
  11. 根据权利要求8或9所述的方法,其特征在于,所述将多个不同尺度的相似度图整合,得到整合后的相似度图,包括:The method according to claim 8 or 9, wherein the integrating a plurality of similarity maps of different scales to obtain an integrated similarity map comprises:
    所述多个不同尺度的相似度图构成相似度图集合;The multiple similarity graphs of different scales constitute a similarity graph set;
    对所述相似度图集合中尺度最小的相似度图进行上采样,得到与尺度第二小的相似度图相同尺度的相似度图;Up-sampling the similarity map with the smallest scale in the set of similarity maps to obtain a similarity map with the same scale as the second-smallest similarity map;
    将得到的相似度图与尺度第二小的相似度图相加,得到新的相似度图;Add the obtained similarity map to the second-smallest similarity map to obtain a new similarity map;
    将所述相似度图集合中未经过上采样处理或者相加处理的相似度图与新的相似度图构成新的相似度图集合，重复执行上采样的步骤和相加的步骤，直至得到最后一个相似度图，所得到的最后一个相似度图为整合后的相似度图。forming a new similarity map set from the new similarity map and the similarity maps in the similarity map set that have not undergone the up-sampling or adding processing, and repeating the up-sampling step and the adding step until the last similarity map is obtained, the last similarity map obtained being the integrated similarity map.
  12. 根据权利要求8-11任一项所述的方法，其特征在于，所述根据多个不同尺度的所述第一特征图、所述第一图像的标签信息和对应尺度的第二特征图确定多个不同尺度的相似度图之后，将多个不同尺度的相似度图整合，得到整合后的相似度图之前，所述方法还包括：The method according to any one of claims 8-11, wherein after the determining multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, and before the integrating the multiple similarity maps of different scales to obtain the integrated similarity map, the method further comprises:
    将多个不同尺度的相似度图和相应尺度的第三特征图逐元素相乘，得到处理后的多个不同尺度的相似度图；其中，所述第三特征图根据所述第二图像确定，且同一尺度的第一特征图和第三特征图不同；multiplying, element by element, the multiple similarity maps of different scales by third feature maps of corresponding scales to obtain multiple processed similarity maps of different scales; wherein the third feature maps are determined according to the second image, and the first feature map and the third feature map of the same scale are different;
    将多个不同尺度的相似度图整合,得到整合后的相似度图,包括:Integrate multiple similarity maps of different scales to obtain an integrated similarity map, including:
    将处理后的多个不同尺度的相似度图整合,得到整合后的相似度图。The processed similarity maps of different scales are integrated to obtain an integrated similarity map.
  13. 根据权利要求1-12任一项所述的方法,其特征在于,所述目标检测方法由神经网络执行,所述神经网络采用以下步骤训练得到:The method according to any one of claims 1-12, wherein the target detection method is executed by a neural network, and the neural network is trained by the following steps:
    分别对第一样本图像和第二样本图像进行多个不同尺度的特征提取，得到多个不同尺度的第四特征图和多个不同尺度的第五特征图；其中，所述第一样本图像和所述第二样本图像均包含第一类别的对象；performing feature extraction of multiple different scales on a first sample image and a second sample image respectively to obtain multiple fourth feature maps of different scales and multiple fifth feature maps of different scales; wherein both the first sample image and the second sample image contain objects of a first category;
    根据多个不同尺度的第四特征图和所述第一样本图像的标签,以及相应尺度的所述第五特征图,确定所述第二样本图像中的所述第一类别的对象;所述第一样本图像的标签是对所述第一样本图像中包含的所述第一类别的对象进行标注的结果;Determine the object of the first category in the second sample image according to a plurality of fourth feature maps of different scales and labels of the first sample image, and the fifth feature map of corresponding scales; The label of the first sample image is a result of labeling the objects of the first category contained in the first sample image;
    根据确定的所述第二样本图像中的所述第一类别的对象以及所述第二样本图像的标签之间的差异,调整所述神经网络的网络参数;所述第二样本图像的标签是对所述第二样本图像中包含的所述第一类别的对象进行标注的结果。Adjust the network parameters of the neural network according to the determined difference between the object of the first category in the second sample image and the label of the second sample image; the label of the second sample image is The result of labeling the objects of the first category included in the second sample image.
  14. 根据权利要求13所述的方法,其特征在于,在所述神经网络训练完成后,所述方法还包括:对训练完成的神经网络进行测试;The method according to claim 13, characterized in that, after the neural network training is completed, the method further comprises: testing the trained neural network;
    采用以下步骤对训练完成的神经网络进行测试:Use the following steps to test the trained neural network:
    分别对第一测试图像和第二测试图像进行多个不同尺度的特征提取,得到多个不同尺度的第一测试特征图和多个不同尺度的第二测试特征图;Performing multiple feature extractions of different scales on the first test image and the second test image, respectively, to obtain multiple first test feature maps of different scales and multiple second test feature maps of different scales;
    其中,所述第一测试图像和所述第二测试图像来源于一个测试图像集,所述测试图像集中的各个测试图像均包括同一类别的对象;Wherein, the first test image and the second test image are derived from a test image set, and each test image in the test image set includes objects of the same category;
    根据多个不同尺度的第一测试特征图和所述第一测试图像的标签，以及相应尺度的所述第二测试特征图，确定所述第二测试图像中的待查询目标；所述第一测试图像的标签是对所述第一测试图像中包含的待查询目标进行标注的结果。determining the target to be queried in the second test image according to multiple first test feature maps of different scales, the label of the first test image, and the second test feature maps of corresponding scales; the label of the first test image being the result of labeling the target to be queried contained in the first test image.
  15. 一种智能行驶方法,其特征在于,包括:An intelligent driving method, characterized in that it includes:
    采集道路图像;Collect road images;
    采用如权利要求1-14任一项所述的方法根据支持图像以及所述支持图像的标签对采集到的道路图像进行待查询目标的查询；其中，所述支持图像的标签是对所述支持图像中包含的与所述待查询目标同一类别的目标进行标注的结果；using the method according to any one of claims 1-14 to query the collected road images for a target to be queried according to a support image and the label of the support image; wherein the label of the support image is the result of labeling the targets contained in the support image that belong to the same category as the target to be queried;
    根据查询结果对采集道路图像的智能行驶设备进行控制。According to the query results, the intelligent driving equipment that collects road images is controlled.
  16. 一种目标检测装置,其特征在于,包括:特征提取模块和确定模块;A target detection device is characterized by comprising: a feature extraction module and a determination module;
    所述特征提取模块,用于分别对第一图像和第二图像进行多个不同尺度的特征提取,得到多个不同尺度的第一特征图和多个不同尺度的第二特征图;The feature extraction module is configured to perform feature extraction of a plurality of different scales on the first image and the second image respectively to obtain a plurality of first feature maps of different scales and a plurality of second feature maps of different scales;
    所述确定模块,用于根据多个不同尺度的第一特征图和所述第一图像的标签,以及相应尺度的所述第二特征图,确定所述第二图像中的待查询目标;所述第一图像的标签是对所述第一图像中包含的待查询目标进行标注的结果。The determining module is configured to determine the target to be queried in the second image according to a plurality of first feature maps of different scales and labels of the first image, and the second feature maps of corresponding scales; The label of the first image is a result of labeling the target to be queried contained in the first image.
  17. 根据权利要求16所述的装置，其特征在于，所述特征提取模块在分别对第一图像和第二图像进行多个不同尺度的特征提取，得到多个不同尺度的第一特征图和多个不同尺度的第二特征图时，具体包括：The device according to claim 16, wherein when the feature extraction module performs feature extraction of multiple different scales on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales, it specifically comprises:
    分别对所述第一图像和所述第二图像进行特征提取,得到第一特征图和第二特征图;Performing feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map;
    分别对所述第一特征图和所述第二特征图进行多次尺度变换,得到多个不同尺度的第一特征图和多个不同尺度的第二特征图。Perform multiple scale transformations on the first feature map and the second feature map, respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  18. 根据权利要求17所述的装置,其特征在于,所述特征提取模块在分别对所述第一特征图和所述第二特征图进行多次尺度变换时,具体包括:The device according to claim 17, wherein when the feature extraction module performs multiple scale transformations on the first feature map and the second feature map respectively, it specifically comprises:
    对所述第一特征图和所述第二特征图分别进行至少两次降采样。The first feature map and the second feature map are down-sampled at least twice, respectively.
  19. The apparatus according to any one of claims 16-18, wherein the determining module determining the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales specifically comprises:
    determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label of the first image;
    calculating with the plurality of first feature vectors of different scales and the second feature maps of corresponding scales according to a preset calculation rule, to obtain a calculation result;
    determining a mask image of the second image according to the calculation result;
    determining the target to be queried in the second image according to the mask image.
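Claim 19's pipeline — pooling the label-marked region of the first feature map into a feature vector, scoring every position of the second feature map against it with the preset rule, and turning the result into a mask image — can be sketched as follows. The [channel][row][col] nested-list layout, the averaging, the fixed threshold, and the function names are illustrative assumptions, not taken from the application:

```python
def masked_average_vector(feat, label_mask):
    """First feature vector: average the first feature map over the positions
    the label marks as the query target (one value per channel)."""
    c = len(feat)
    total = [0.0] * c
    count = 0
    for i in range(len(label_mask)):
        for j in range(len(label_mask[0])):
            if label_mask[i][j]:
                count += 1
                for k in range(c):
                    total[k] += feat[k][i][j]
    return [t / count for t in total]

def similarity_mask(vec, feat2, threshold=0.5):
    """Inner product of the feature vector with every spatial position of the
    second feature map; thresholding the scores yields a binary mask image."""
    h, w = len(feat2[0]), len(feat2[0][0])
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            score = sum(vec[k] * feat2[k][i][j] for k in range(len(vec)))
            mask[i][j] = 1 if score > threshold else 0
    return mask
```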
  20. The apparatus according to any one of claims 16-18, wherein the determining module determining the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales specifically comprises:
    determining the target to be queried in the second image by using the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for third feature maps of corresponding scales;
    wherein the third feature maps are determined according to the second image, and a second feature map and a third feature map of a same scale are different.
  21. The apparatus according to claim 20, wherein the determining module determining the target to be queried in the second image by using the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for the third feature maps of corresponding scales specifically comprises:
    determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label of the first image;
    calculating with the plurality of first feature vectors of different scales and the second feature maps of corresponding scales according to a preset calculation rule, to obtain a plurality of mask images at different scales;
    determining the target to be queried in the second image according to a result of multiplying the plurality of mask images of different scales with the third feature maps of corresponding scales.
  22. The apparatus according to claim 19, wherein the preset calculation rule comprises: a calculation rule of an inner product, or a calculation rule of a cosine distance.
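The two preset calculation rules named in claim 22 differ only in normalisation: the cosine rule divides the inner product by the vector magnitudes, so the score depends on direction rather than feature magnitude. A small sketch (the claim says "cosine distance"; the snippet computes the closely related cosine similarity, a common implementation choice and an assumption here):

```python
import math

def inner_product(u, v):
    """Preset rule 1: plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Preset rule 2 (as similarity): inner product normalised by magnitudes."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return inner_product(u, v) / (norm_u * norm_v)
```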
  23. The apparatus according to claim 16, wherein the determining module determining the target to be queried in the second image according to the plurality of first feature maps of different scales, label information of the first image, and the second feature maps of corresponding scales specifically comprises:
    determining a plurality of similarity maps of different scales according to the plurality of first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales; a similarity map of one scale represents the similarity between the first feature map and the second feature map of that scale;
    integrating the plurality of similarity maps of different scales, to obtain an integrated similarity map;
    determining the target to be queried in the second image according to the integrated similarity map.
  24. The apparatus according to claim 23, wherein the determining module determining the plurality of similarity maps of different scales according to the plurality of first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales specifically comprises:
    determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label information of the first image;
    multiplying the plurality of first feature vectors of different scales with the second feature maps of corresponding scales element by element, to obtain the plurality of similarity maps of different scales.
  25. The apparatus according to claim 23 or 24, wherein the determining module integrating the plurality of similarity maps of different scales to obtain the integrated similarity map specifically comprises:
    up-sampling the plurality of similarity maps of different scales, to obtain a plurality of similarity maps of a same scale;
    adding the plurality of similarity maps of the same scale, to obtain the integrated similarity map.
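Claim 25's integration can be sketched as: bring every similarity map to a common (here, the largest) scale, then add element-wise. Nearest-neighbour up-sampling, square maps, and integer scale factors are illustrative assumptions; the claim does not specify the up-sampling method:

```python
def upsample_nearest(sim, factor):
    """Nearest-neighbour up-sampling of a 2D similarity map."""
    return [
        [sim[i // factor][j // factor] for j in range(len(sim[0]) * factor)]
        for i in range(len(sim) * factor)
    ]

def integrate_all(sim_maps):
    """Up-sample every similarity map to the largest scale, then add."""
    target_h = max(len(s) for s in sim_maps)
    scaled = [upsample_nearest(s, target_h // len(s)) if len(s) < target_h else s
              for s in sim_maps]
    h, w = len(scaled[0]), len(scaled[0][0])
    return [[sum(s[i][j] for s in scaled) for j in range(w)] for i in range(h)]
```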
  26. The apparatus according to claim 23 or 24, wherein the determining module integrating the plurality of similarity maps of different scales to obtain the integrated similarity map specifically comprises:
    the plurality of similarity maps of different scales constituting a similarity map set;
    up-sampling the similarity map with the smallest scale in the similarity map set, to obtain a similarity map of the same scale as the similarity map with the second smallest scale;
    adding the obtained similarity map to the similarity map with the second smallest scale, to obtain a new similarity map;
    forming a new similarity map set from the new similarity map and the similarity maps in the similarity map set that have not undergone up-sampling or addition, and repeating the up-sampling step and the addition step until a last similarity map is obtained, the last obtained similarity map being the integrated similarity map.
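Claim 26 integrates progressively instead: repeatedly up-sample the smallest-scale map, add it to the next smallest, and carry the sum forward until one map remains. A sketch under the same illustrative assumptions as before (square maps, nearest-neighbour up-sampling, scales related by integer factors):

```python
def upsample_nearest(sim, factor):
    """Nearest-neighbour up-sampling of a 2D similarity map."""
    return [[sim[i // factor][j // factor] for j in range(len(sim[0]) * factor)]
            for i in range(len(sim) * factor)]

def integrate_coarse_to_fine(sim_maps):
    """Up-sample the smallest-scale map to the next scale, add, and repeat
    with the result until a single integrated similarity map remains."""
    maps = sorted(sim_maps, key=len)   # ascending by scale
    acc = maps[0]
    for nxt in maps[1:]:
        up = upsample_nearest(acc, len(nxt) // len(acc))
        acc = [[up[i][j] + nxt[i][j] for j in range(len(nxt[0]))]
               for i in range(len(nxt))]
    return acc
```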
  27. The apparatus according to any one of claims 23-26, wherein the determining module is further configured to:
    multiply the plurality of similarity maps of different scales with third feature maps of corresponding scales element by element, to obtain a plurality of processed similarity maps of different scales; wherein the third feature maps are determined according to the second image, and a first feature map and a third feature map of a same scale are different;
    integrate the plurality of processed similarity maps of different scales, to obtain the integrated similarity map.
  28. The apparatus according to any one of claims 16-27, wherein the target detection apparatus is implemented by a neural network, and the apparatus further comprises: a training module, configured to train the neural network by the following steps:
    performing feature extraction at a plurality of different scales on a first sample image and a second sample image respectively, to obtain a plurality of fourth feature maps of different scales and a plurality of fifth feature maps of different scales; wherein the first sample image and the second sample image both contain objects of a first category;
    determining the objects of the first category in the second sample image according to the plurality of fourth feature maps of different scales, a label of the first sample image, and the fifth feature maps of corresponding scales; the label of the first sample image is a result of annotating the objects of the first category contained in the first sample image;
    adjusting network parameters of the neural network according to a difference between the determined objects of the first category in the second sample image and a label of the second sample image; the label of the second sample image is a result of annotating the objects of the first category contained in the second sample image.
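Claim 28's training loop adjusts the network parameters from the disagreement between the predicted first-category objects and the second sample image's label. As a toy stand-in for that update (a real implementation would back-propagate a segmentation loss through the network), the sketch below treats a single similarity threshold as the only learnable parameter; the update rule and all names are illustrative assumptions:

```python
def iou(pred, label):
    """Overlap between predicted and labelled binary masks."""
    inter = sum(p & l for pr, lr in zip(pred, label) for p, l in zip(pr, lr))
    union = sum(p | l for pr, lr in zip(pred, label) for p, l in zip(pr, lr))
    return inter / union if union else 1.0

def train_threshold(score_map, label_mask, steps=50, lr=0.05):
    """Toy analogue of claim 28's parameter adjustment: nudge a similarity
    threshold to reduce the disagreement between the predicted mask and the
    second sample image's label."""
    t = 0.0
    for _ in range(steps):
        pred = [[1 if s > t else 0 for s in row] for row in score_map]
        false_pos = sum(p and not l for pr, lr in zip(pred, label_mask)
                        for p, l in zip(pr, lr))
        false_neg = sum(l and not p for pr, lr in zip(pred, label_mask)
                        for p, l in zip(pr, lr))
        t += lr * (false_pos - false_neg)   # raise t on false positives
    return t
```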
  29. The apparatus according to claim 28, wherein the apparatus further comprises:
    a test module, configured to test the trained neural network;
    the test module specifically tests the trained neural network by the following steps:
    performing feature extraction at a plurality of different scales on a first test image and a second test image respectively, to obtain a plurality of first test feature maps of different scales and a plurality of second test feature maps of different scales;
    wherein the first test image and the second test image are derived from a test image set, and each test image in the test image set includes objects of a same category;
    determining the target to be queried in the second test image according to the plurality of first test feature maps of different scales, a label of the first test image, and the second test feature maps of corresponding scales; the label of the first test image is a result of annotating the target to be queried contained in the first test image.
  30. An intelligent driving apparatus, comprising:
    an acquisition module, configured to acquire road images;
    a query module, configured to query an acquired road image for a target to be queried according to a support image and a label of the support image by using the method according to any one of claims 1-14; wherein the label of the support image is a result of annotating a target contained in the support image that belongs to a same category as the target to be queried;
    a control module, configured to control, according to a query result, an intelligent driving device that acquires the road images.
  31. A target detection device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1-14.
  32. An intelligent driving device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to claim 15.
  33. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the target detection method according to any one of claims 1-14, or the program, when executed by a processor, implements the intelligent driving method according to claim 15.
  34. A chip for running instructions, wherein the chip comprises a memory and a processor, the memory stores code and data, the memory is coupled to the processor, and the processor runs the code in the memory such that the chip executes the target detection method according to any one of claims 1-14, or the processor runs the code in the memory such that the chip executes the intelligent driving method according to claim 15.
  35. A program product containing instructions, wherein when the program product runs on a computer, the computer is caused to execute the target detection method according to any one of claims 1-14, or when the program product runs on a computer, the computer is caused to execute the intelligent driving method according to claim 15.
  36. A computer program, wherein when the computer program is executed by a processor, it is used to execute the target detection method according to any one of claims 1-14, or when the computer program is executed by a processor, it is used to execute the intelligent driving method according to claim 15.
PCT/CN2020/123918 2019-10-31 2020-10-27 Target detection and intelligent driving methods and apparatuses, device, and storage medium WO2021083126A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021539414A JP2022535473A (en) 2019-10-31 2020-10-27 Target detection, intelligent driving methods, devices, equipment and storage media
KR1020217020811A KR20210098515A (en) 2019-10-31 2020-10-27 Target detection, intelligent driving method, apparatus, device and storage medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911063316.4 2019-10-31
CN201911054823.1 2019-10-31
CN201911063316.4A CN112749602A (en) 2019-10-31 2019-10-31 Target query method, device, equipment and storage medium
CN201911054823.1A CN112749710A (en) 2019-10-31 2019-10-31 Target detection and intelligent driving method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021083126A1 true WO2021083126A1 (en) 2021-05-06

Family

ID=75715793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123918 WO2021083126A1 (en) 2019-10-31 2020-10-27 Target detection and intelligent driving methods and apparatuses, device, and storage medium

Country Status (3)

Country Link
JP (1) JP2022535473A (en)
KR (1) KR20210098515A (en)
WO (1) WO2021083126A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109255352A (en) * 2018-09-07 2019-01-22 北京旷视科技有限公司 Object detection method, apparatus and system
CN109886286A (en) * 2019-01-03 2019-06-14 武汉精测电子集团股份有限公司 Object detection method, target detection model and system based on cascade detectors

Non-Patent Citations (1)

Title
GREGORY KOCH; RICHARD ZEMEL; RUSLAN SALAKHUTDINOV: "Siamese Neural Networks for One-shot Image Recognition", Proceedings of the ICML Deep Learning Workshop, vol. 2, Lille, France, 10-11 July 2015, pages 1-8, XP055445904 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113313662A (en) * 2021-05-27 2021-08-27 北京沃东天骏信息技术有限公司 Image processing method, device, equipment and storage medium
CN113643239A (en) * 2021-07-15 2021-11-12 上海交通大学 Abnormity detection method, device and medium based on memory mechanism
CN113643239B (en) * 2021-07-15 2023-10-27 上海交通大学 Abnormality detection method, device and medium based on memory mechanism
CN113642415A (en) * 2021-07-19 2021-11-12 南京南瑞信息通信科技有限公司 Face feature expression method and face recognition method

Also Published As

Publication number Publication date
KR20210098515A (en) 2021-08-10
JP2022535473A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
JP7289918B2 (en) Object recognition method and device
Kamal et al. Automatic traffic sign detection and recognition using SegU-Net and a modified Tversky loss function with L1-constraint
WO2021083126A1 (en) Target detection and intelligent driving methods and apparatuses, device, and storage medium
WO2022126377A1 (en) Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
US20230076266A1 (en) Data processing system, object detection method, and apparatus thereof
Wang et al. Centernet-auto: A multi-object visual detection algorithm for autonomous driving scenes based on improved centernet
JP2016062610A (en) Feature model creation method and feature model creation device
WO2022237139A1 (en) Lanesegnet-based lane line detection method and system
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
US11340700B2 (en) Method and apparatus with image augmentation
CN110956119B (en) Method for detecting target in image
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN116783620A (en) Efficient three-dimensional object detection from point clouds
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN114764856A (en) Image semantic segmentation method and image semantic segmentation device
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN112395962A (en) Data augmentation method and device, and object identification method and system
CN116188999A (en) Small target detection method based on visible light and infrared image data fusion
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
WO2022217434A1 (en) Cognitive network, method for training cognitive network, and object recognition method and apparatus
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
CN112749602A (en) Target query method, device, equipment and storage medium
CN114627183A (en) Laser point cloud 3D target detection method
CN112749710A (en) Target detection and intelligent driving method, device, equipment and storage medium

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application — Ref document number: 20881806; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase — Ref document number: 20217020811; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase — Ref document number: 2021539414; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase — Ref country code: DE
122 EP: PCT application non-entry in European phase — Ref document number: 20881806; Country of ref document: EP; Kind code of ref document: A1
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established — Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/09/2022)
122 EP: PCT application non-entry in European phase — Ref document number: 20881806; Country of ref document: EP; Kind code of ref document: A1