WO2021083126A1 - Target detection and intelligent driving method, apparatus, device and storage medium - Google Patents

Target detection and intelligent driving method, apparatus, device and storage medium

Info

Publication number
WO2021083126A1
WO2021083126A1, PCT/CN2020/123918, CN2020123918W
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
different scales
similarity
maps
Prior art date
Application number
PCT/CN2020/123918
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
吕书畅
程光亮
石建萍
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201911054823.1A external-priority patent/CN112749710A/zh
Priority claimed from CN201911063316.4A external-priority patent/CN112749602A/zh
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to KR1020217020811A priority Critical patent/KR20210098515A/ko
Priority to JP2021539414A priority patent/JP2022535473A/ja
Publication of WO2021083126A1 publication Critical patent/WO2021083126A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Definitions

  • This application relates to the field of image processing, and in particular to a method, apparatus, device, and storage medium for target detection and intelligent driving.
  • Single-sample semantic segmentation is an emerging problem in the field of computer vision and intelligent image processing.
  • Single-sample semantic segmentation aims to use a single training sample of a certain category to give the segmentation model the ability to recognize the pixels of that category.
  • The proposal of single-sample semantic segmentation can effectively reduce the sample collection and annotation costs of traditional image semantic segmentation.
  • Single-sample image semantic segmentation aims to train on only a single sample of a certain category of object, so that the segmentation model has the ability to recognize all pixels of that object.
  • A target query can locate the target contained in an image by means of image semantic segmentation.
  • Image semantic segmentation includes single-sample image semantic segmentation. Traditional image semantic segmentation requires a large number of training images for all categories of objects to ensure model performance, which brings extremely high labeling costs.
  • The purpose of this application is to provide a target detection and intelligent driving method, apparatus, device, and storage medium, so as to solve the existing technical problem of low target detection accuracy.
  • A target detection method is provided, which includes: performing feature extraction at a plurality of different scales on a first image and a second image, respectively, to obtain a plurality of first feature maps of different scales and a plurality of second feature maps of different scales; and determining the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
  • An intelligent driving method is provided, which includes: collecting road images; using the target detection method described above to query the collected road images for the target to be queried according to the supporting image and the label of the supporting image, wherein the label of the supporting image is the result of labeling the target contained in the supporting image that is of the same category as the target to be queried; and controlling, according to the query result, the intelligent driving device that collects the road images.
  • A target detection apparatus is provided, which includes a feature extraction module and a determining module. The feature extraction module is used to perform feature extraction at a plurality of different scales on a first image and a second image, respectively, to obtain a plurality of first feature maps of different scales and a plurality of second feature maps of different scales. The determining module is used to determine the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
  • An intelligent driving apparatus is provided, which includes: a collection module for collecting road images; a query module for using the target detection method described above to query the collected road images for the target to be queried according to the support image and the label of the support image, wherein the label of the support image is the result of labeling the target contained in the support image that is of the same category as the target to be queried; and a control module for controlling, according to the query result, the intelligent driving equipment that collects the road images.
  • A target detection device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the above-mentioned target detection method when executing the program.
  • A smart driving device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the smart driving method described above when executing the program.
  • A computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the target detection method are realized, or the steps of the smart driving method described above are realized.
  • A chip for running instructions is provided, including a memory and a processor; the memory stores code and data, and the memory is coupled with the processor. The processor runs the code in the memory so that the chip executes the steps of the above-mentioned target detection method, or the steps of the above-mentioned smart driving method.
  • A program product containing instructions is provided; when the program product runs on a computer, the computer is made to execute the steps of the above-mentioned target detection method, or the steps of the smart driving method described above.
  • A computer program is provided; when the computer program is executed by a processor, it is used to execute the steps of the above-mentioned target detection method, or the steps of the smart driving method described above.
  • By performing feature extraction at multiple different scales, the feature expression ability of the first image and the second image is improved, so that more information for judging the similarity between the first image and the second image can be obtained. This gives subsequent target detection a richer feature input when facing a single sample, thereby improving the segmentation accuracy of single-sample semantic segmentation and hence the target detection accuracy.
  • FIG. 1 is a flowchart of a target detection method provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of a target detection model provided by an embodiment of the application.
  • FIG. 3 is a flowchart of a target detection method provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a symmetric cascade structure provided by an embodiment of the application.
  • FIG. 5 is a flowchart of a target detection method provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of a target detection model provided by another embodiment of this application.
  • FIG. 7 is a schematic flowchart of a target query method provided by another embodiment of this application.
  • FIG. 8 is a schematic flowchart of a target query method provided by another embodiment of this application.
  • FIG. 9 is a schematic flowchart of a target query method provided by still another embodiment of this application.
  • FIG. 10 is a schematic flowchart of a target query method provided by another embodiment of this application.
  • FIG. 11 is a schematic flowchart of a smart driving method provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of a target detection process provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of a generation module and an aggregation module provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of comparison between the similarity feature extraction method in the target query method provided by the embodiment of the application and the extraction method in the related technology;
  • FIG. 15 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • FIG. 16 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • Fig. 18 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
  • In a typical single-sample image semantic segmentation deep learning model, feature extraction is performed on the query set image and the support set image respectively, where the query set image is the image that needs to be queried, and the support set image contains the target to be queried.
  • the target to be queried in the support set image is labeled in advance to obtain the label information.
  • Combining the label information, the target in the query set image is determined by the similarity between the features of the support set image and the feature of the query set image.
  • However, such a deep learning model expresses the support set image as a single feature vector, so the feature expression ability of the support set image is limited. This leads to an insufficient ability of the model to describe the similarity between the support set image features and the query image pixel features, resulting in low accuracy of the target query.
  • In the embodiments of this application, the first image may be the above-mentioned support set image, and the second image may be the above-mentioned query set image.
  • In the embodiments of this application, the first image and the second image are each subjected to feature extraction at multiple different scales, so that the first image and the second image are expressed as multiple features of different scales. This improves the feature expression ability of the first image and the second image, so that more information for judging the similarity between the first image and the second image can be obtained, thereby improving the accuracy of the target query.
  • FIG. 1 is a flowchart of a target detection method provided by an embodiment of the application.
  • the embodiments of the present application provide a target detection method. The specific steps of the method are as follows:
  • Step 101 Perform multiple feature extractions of different scales on the first image and the second image, respectively, to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  • The second image is an image on which a target query needs to be performed. Through the target query, the pixel area in the second image where the target to be queried is located can be detected.
  • the target to be queried can be determined according to actual conditions, for example, it can be an animal, plant, person, vehicle, etc., which is not limited here.
  • the label information may be contour information, pixel information, etc. of the target to be queried in the first image, which is not limited here.
  • The label information may be a binarized label, in which the pixel value of the pixel area where the target is located differs from the pixel values of other areas in the image.
  • the target detection method of this embodiment can be applied to the target detection process of a vehicle.
  • the vehicle can be an autonomous vehicle or a vehicle equipped with an Advanced Driver Assistance Systems (ADAS) system. It is understandable that the target detection method can also be applied to robots.
  • the first image and the second image may be acquired by an image acquisition device on the vehicle, and the image acquisition device may be a camera, such as a monocular camera, a binocular camera, and the like.
  • Specifically, the first image can be subjected to feature extraction at multiple different scales through a feature extraction algorithm to obtain multiple first feature maps of different scales; similarly, the second image can be subjected to feature extraction at multiple different scales to obtain multiple second feature maps of different scales.
  • The feature extraction algorithm can be a CNN (Convolutional Neural Network) algorithm, an LBP (Local Binary Pattern) algorithm, a SIFT (Scale-Invariant Feature Transform) algorithm, an HOG (Histogram of Oriented Gradients) algorithm, etc., which is not limited here.
  • the target detection method of this embodiment can be applied to the target detection model shown in FIG. 2.
  • the target detection model 20 includes: a feature extraction network 21, a scale transformation module 22 and a convolution network 23.
  • the feature extraction network 21 is a neural network, and the feature extraction network 21 can adopt an existing network architecture, such as a VGG (Visual Geometry Group) network, a Resnet network, or other general image feature extraction networks.
  • The first image and the second image can be input into the feature extraction network 21 at the same time for feature extraction at multiple different scales; alternatively, two feature extraction networks 21 with the same network architecture and network parameters can be set up, and the first image and the second image are respectively input into the two feature extraction networks 21 to perform feature extraction at multiple different scales on the first image and the second image, respectively.
  • multiple different scales can be pre-designated, and for each scale, feature extraction of the scale is performed on the first image and the second image respectively to obtain the first feature map and the second feature map of the scale.
  • Step 102 Determine the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
  • For each scale, the label information of the first image can be combined with the first feature map and the second feature map of that scale to obtain a similarity map that characterizes the similarity between the first feature map and the second feature map of that scale. Then, through the similarity maps of different scales, the target to be queried in the second image can be determined.
  • In this way, multiple first feature maps of different scales and multiple second feature maps of different scales are obtained, and the target to be queried in the second image is determined according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales, where the label of the first image is the result of labeling the target to be queried contained in the first image. Expressing the first image and the second image at multiple scales improves their feature expression ability, so that more information for judging the similarity between the first image and the second image can be obtained.
  • The first image contains a target of the same category as the target to be queried; however, the posture, texture, color and other information of that target in the first image may differ from those of the target to be queried in the second image. For example, if the target to be queried is a traffic light, the traffic lights contained in the first image may be arranged vertically while the traffic lights in the second image are arranged horizontally; that is, the state of the traffic lights in the first image and in the second image can be inconsistent.
  • multiple feature extractions of different scales are performed on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales, including:
  • Step 301 Perform feature extraction on the first image and the second image respectively to obtain a first feature map and a second feature map.
  • the feature extraction network 21 includes a first convolution module 211, a second convolution module 212, and a third convolution module 213.
  • the first convolution module 211 includes three convolution layers connected in sequence
  • the second convolution module 212 and the third convolution module 213 each include one convolution layer.
  • Specifically, the first image and the second image can be simultaneously input into the first convolution module 211 shown in FIG. 2, which outputs corresponding feature extraction results for the first image and the second image respectively. These results are then input into the second convolution module 212, which in turn outputs its own feature extraction results for the first image and the second image and passes them to the third convolution module 213. The third convolution module 213 continues feature extraction on the output of the second convolution module 212 and outputs the feature extraction result of the first image and the feature extraction result of the second image, which are the first feature map and the second feature map, respectively.
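  • A minimal PyTorch-style sketch of such a shared-weight backbone is given below; the layer sizes and channel counts are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Stand-in for feature extraction network 21: conv module 211 (three layers),
    conv modules 212 and 213 (one layer each). Channel counts are illustrative."""
    def __init__(self):
        super().__init__()
        self.conv_module1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.conv_module2 = nn.Sequential(nn.Conv2d(128, 256, 3, padding=1), nn.ReLU())
        self.conv_module3 = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.conv_module3(self.conv_module2(self.conv_module1(x)))

backbone = SharedBackbone()  # the same architecture and parameters serve both images
first_feature_map = backbone(torch.randn(1, 3, 256, 256))   # from the first (support) image
second_feature_map = backbone(torch.randn(1, 3, 256, 256))  # from the second (query) image
```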
  • Step 302 Perform multiple scale transformations on the first feature map and the second feature map to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  • The first feature map and the second feature map are respectively input into the scale transformation module 22, which performs multiple scale conversions on each of them, so that the first image and the second image are each expressed as multiple feature maps of different sizes.
  • performing multiple scale conversions on the first feature map and the second feature map respectively includes: performing down-sampling on the first feature map and the second feature map at least twice, respectively.
  • Performing down-sampling at least twice may include: using a first sampling rate to down-sample the first feature map, obtaining a first feature map that is down-sampled by a first multiple relative to the first image; and then using a second sampling rate to continue down-sampling that result, obtaining a first feature map that is down-sampled by a second multiple relative to the first image, where the second multiple is greater than the first multiple. Likewise, the first sampling rate is also used to down-sample the second feature map, obtaining a second feature map that is down-sampled by the first multiple relative to the second image; the second sampling rate is then used to continue down-sampling that result, obtaining a second feature map that is down-sampled by the second multiple relative to the second image.
  • After the first feature map down-sampled by the second multiple relative to the first image and the second feature map down-sampled by the second multiple relative to the second image are obtained, the method of the embodiment of the present application may further include: using a third sampling rate to down-sample these two feature maps, obtaining a first feature map that is down-sampled by a third multiple relative to the first image and a second feature map that is down-sampled by a third multiple relative to the second image, where the third multiple is greater than the second multiple.
  • For example, the first multiple, the second multiple, and the third multiple may be 8, 16, and 32, respectively.
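  • As an illustration only, the following sketch produces 8x, 16x, and 32x down-sampled versions of a feature map by cascading three pooling steps; it assumes the input feature map is at the resolution of the original image, which the patent does not require.

```python
import torch
import torch.nn.functional as F

def cascade_downsample(feature_map):
    feat_8x = F.avg_pool2d(feature_map, kernel_size=8, stride=8)  # first multiple: 8x
    feat_16x = F.avg_pool2d(feat_8x, kernel_size=2, stride=2)     # second multiple: 16x
    feat_32x = F.avg_pool2d(feat_16x, kernel_size=2, stride=2)    # third multiple: 32x
    return feat_8x, feat_16x, feat_32x

f8, f16, f32 = cascade_downsample(torch.randn(1, 64, 256, 256))
print(f8.shape, f16.shape, f32.shape)  # spatial sizes 32x32, 16x16, 8x8
```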
  • the scale conversion module 22 may adopt a symmetrical cascade structure.
  • The symmetrical cascade structure includes two cascade structures arranged symmetrically with each other, where each cascade structure includes three successively connected sampling units.
  • For ease of description, the two cascade structures are referred to as the first cascade structure 41 and the second cascade structure 42, respectively. The three sampling units included in the first cascade structure are referred to as the first sampling unit, the second sampling unit, and the third sampling unit; the three sampling units included in the second cascade structure are referred to as the fourth sampling unit, the fifth sampling unit, and the sixth sampling unit, respectively.
  • the sampling rates of the first sampling unit and the fourth sampling unit are the same, the sampling rates of the second sampling unit and the fifth sampling unit are the same, and the sampling rates of the third sampling unit and the sixth sampling unit are the same.
  • The first sampling unit and the fourth sampling unit respectively use the first sampling rate to sample the first feature map and the second feature map, thereby outputting a first feature map and a second feature map that are down-sampled by 8 times relative to the first image and the second image, respectively.
  • the symmetric cascade structure shown in FIG. 4 may be used to perform multiple scale conversions on the first feature map and the second feature map respectively.
  • Specifically, the first feature map is input into the first sampling unit, the second sampling unit, and the third sampling unit in sequence, so that the first sampling unit, the second sampling unit, and the third sampling unit perform down-sampling at different sampling rates, thereby outputting first feature maps that are down-sampled 8, 16, and 32 times relative to the size of the first image.
  • Similarly, the second feature map is input into the fourth sampling unit, the fifth sampling unit, and the sixth sampling unit in sequence, so that they perform down-sampling at different sampling rates, thereby outputting second feature maps that are down-sampled 8, 16, and 32 times relative to the size of the second image.
  • The first cascade structure 41 and the second cascade structure 42 may also be two-level cascade structures, that is, each includes two sampling units connected in sequence.
  • In an implementation, determining the target to be queried in the second image according to the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales includes: determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label of the first image; calculating the plurality of first feature vectors of different scales and the second feature maps of corresponding scales according to a preset calculation rule to obtain a calculation result; determining a mask image of the second image according to the calculation result; and determining the target to be queried in the second image according to the mask image.
  • the preset calculation rules include: inner product calculation rules, or cosine distance calculation rules.
  • the label of the first image refers to information indicating the target or the category of the object in the image.
  • The first feature map of each scale can be combined with the label of the first image to form a feature vector. For example, the first feature maps that are down-sampled 8, 16, and 32 times relative to the first image are each combined with the interpolated label of the first image to form feature vectors, hereinafter referred to as the first feature vector, the second feature vector, and the third feature vector. The first feature vector then undergoes an inner product operation with the second feature map down-sampled 8 times relative to the second image, the second feature vector with the second feature map down-sampled 16 times, and the third feature vector with the second feature map down-sampled 32 times, yielding three probability maps of different scales.
  • The sizes of the three probability maps of different scales correspond to the feature maps from which they were computed; that is, they match the sizes of the first feature maps or second feature maps down-sampled 8, 16, and 32 times relative to the first image or the second image. After that, these three probability maps are input into the convolutional network 23, which connects (concatenates) the three probability maps and convolves the connected result, so as to output the mask image of the second image and achieve the target detection effect on the second image.
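  • The following sketch illustrates, under assumed tensor shapes, how a per-scale support feature vector can be formed from the first feature map and the interpolated label, and compared with the second feature map by an inner product to obtain a single-channel probability map; the helper names are hypothetical, not the patent's.

```python
import torch
import torch.nn.functional as F

def support_vector(first_feat, label):
    # Interpolate the binary label to the scale of the first feature map and average the
    # support features over the target region, giving one feature vector per scale.
    label_small = F.interpolate(label, size=first_feat.shape[-2:], mode='nearest')
    return (first_feat * label_small).sum(dim=(2, 3)) / (label_small.sum(dim=(2, 3)) + 1e-6)

def probability_map(vec, second_feat):
    # Inner product between the support vector and every pixel feature of the query map,
    # yielding a single-channel probability map at that scale.
    return (second_feat * vec[:, :, None, None]).sum(dim=1, keepdim=True)

first_feat_8x = torch.randn(1, 256, 32, 32)
second_feat_8x = torch.randn(1, 256, 32, 32)
label = torch.randint(0, 2, (1, 1, 256, 256)).float()
prob_8x = probability_map(support_vector(first_feat_8x, label), second_feat_8x)
# The 8x, 16x and 32x probability maps would then be brought to a common size,
# concatenated, and passed through the convolutional network 23 to output the mask.
```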
  • In another implementation, determining the target to be queried in the second image includes: using the plurality of first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for the third feature maps of corresponding scales to determine the target to be queried in the second image; wherein the third feature map is determined according to the second image, and the second feature map and the third feature map of the same scale are different.
  • Compared with the foregoing embodiment, this embodiment adds a third feature map and uses the inner product operation results of different scales obtained in the foregoing embodiment to guide it, thereby further improving the accuracy of subsequent target detection.
  • The third feature map can be extracted by a feature extraction network other than the feature extraction network 21 shown in FIG. 2; that is, the network architecture and network parameters of the feature extraction network used for the third feature map are different from those used for the first and second feature maps, for example, the convolution kernels are different.
  • FIG. 5 is a flowchart of a target detection method provided by another embodiment of this application.
  • the target detection method provided in this embodiment specifically includes the following steps:
  • Step 501 Determine multiple first feature vectors of different scales according to multiple first feature maps of different scales and labels of the first images.
  • Step 502 Calculate multiple first feature vectors of different scales and second feature maps of corresponding scales according to a preset calculation rule to obtain multiple mask images of different scales.
  • the mask image obtained in this step will be used as guidance information to guide the third feature map.
  • Step 503 Determine the target to be queried in the second image according to the multiplication result of the multiple mask images of different scales and the third feature map of corresponding scales.
  • Here, multiplying the mask images of different scales with the third feature maps of corresponding scales means that, for a mask image and a third feature map of the same scale, the value (a scalar) of the mask image at each position is multiplied by the value (a vector) of the third feature map at the same position.
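  • A one-line illustration of this scalar-by-vector multiplication via broadcasting (shapes assumed):

```python
import torch

mask_map = torch.rand(1, 1, 32, 32)       # per-scale mask image: a scalar per position
third_feat = torch.randn(1, 256, 32, 32)  # third feature map: a vector per position

# Broadcasting multiplies the scalar at each position with the feature vector there.
guided = mask_map * third_feat            # shape (1, 256, 32, 32)
```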
  • the method of this embodiment can be applied to the detection model shown in FIG. 6.
  • The detection model shown in FIG. 6 differs from the detection model shown in FIG. 2 in that some convolutional layers are added on the basis of the feature extraction network 21 shown in FIG. 2, and a third cascade structure is added on the basis of the symmetric cascade structure shown in FIG. 2.
  • the structure of the third cascade structure is the same as the structure of the first cascade structure or the second cascade structure, and its implementation principle can be referred to the introduction of the foregoing embodiment.
  • the detection model 60 includes a feature extraction network 61, a scale conversion module 62 and a convolutional network 63.
  • The feature extraction network 61 includes a fourth convolution module 611, a fifth convolution module 612, a sixth convolution module 613, a seventh convolution module 614, an eighth convolution module 615, a ninth convolution module 616, and a tenth convolution module 617.
  • The sixth convolution module 613 (corresponding to the third convolution module 213 in FIG. 2) is also connected to the seventh convolution module 614 and the eighth convolution module 615.
  • the eighth convolution module 615, the ninth convolution module 616, and the tenth convolution module 617 are sequentially connected.
  • the outputs of the sixth convolution module 613 and the seventh convolution module 614 are also used as the input of the eighth convolution module 615 and the ninth convolution module 616, respectively.
  • the output of the tenth convolution module 617 is used as the input of the third cascade structure 33.
  • The seventh convolution module 614 performs feature extraction on the output results of the sixth convolution module 613 to obtain the first feature map and the second feature map, which are then input into the scale conversion module 62.
  • The scale conversion module 62 has the same structure and principle as the scale conversion module 22 shown in FIG. 2.
  • the scale conversion module 62 performs different scale conversions on the first feature map and the second feature map.
  • the label information of the first image is also input into the scale conversion module 62.
  • the scale conversion module 62 outputs a plurality of mask images mask32x, mask16x, and mask8x of different scales according to the first feature map, the second feature map of different scales, and the label information of the first image.
  • Mask32x, mask16x, and mask8x respectively represent mask images that are down-sampled 32, 16, and 8 times relative to the first feature map or the second feature map.
  • The mask images mask32x, mask16x, and mask8x output by the scale conversion module 62 are then multiplied, at corresponding pixel positions, with the feature maps of corresponding scales output by the third cascade structure, which are down-sampled 8, 16, and 32 times relative to the second image, resulting in three probability maps. After that, the three probability maps are input into the convolutional network for convolution and other operations, so as to realize target detection on the second image.
  • the feature map extracted by the sixth convolution module 613 can also be directly input into the third cascade structure.
  • this embodiment may also directly input the feature map for the first image and the feature map for the second image output by the sixth convolution module 613 into the first cascade structure and the second cascade structure, respectively.
  • The first convolution module, the second convolution module, and the third convolution module shown in FIG. 2 form a standard VGG network architecture. Those skilled in the art can increase or decrease the number of convolution modules in the VGG network architecture shown in FIG. 2 according to actual needs.
  • In this embodiment, a plurality of first feature vectors of different scales are determined according to the plurality of first feature maps of different scales and the label of the first image; the plurality of first feature vectors of different scales and the second feature maps of corresponding scales are then calculated according to a preset calculation rule to obtain a calculation result, a mask image of the second image is determined according to the calculation result, and the target to be queried in the second image is determined according to the mask image.
  • Multiple mask images at different scales can guide the segmentation of the second image at the corresponding scales; specifically, the mask images mask32x, mask16x, and mask8x output by the scale conversion module 62 are multiplied, at corresponding pixel positions, with the feature maps output by the third cascade structure for the second image, which are down-sampled 8, 16, and 32 times relative to the second image.
  • In addition, since the output result of the fifth convolution module 612 for the second image is input into the sixth convolution module, the sixth convolution module can fuse the output result of the fifth convolution module with the output result for the second image and then perform feature extraction again. In this way, richer feature information can be extracted, and during back-propagation the fed-back loss function can also carry richer information, making it easier to adjust the network parameters of each convolution module in the feature extraction network. Therefore, in the subsequent target detection process, the detection accuracy of the detection model can be further improved.
  • FIG. 7 is a schematic flowchart of a target detection method provided by another embodiment of this application. This embodiment describes in detail the specific implementation process of determining the target to be queried in the second image based on multiple first feature maps of different scales and label information of the first image, and second feature maps of corresponding scales. As shown in Figure 7, the method includes:
  • S701 Perform feature extraction of multiple different scales on the first image and the second image respectively, and generate multiple first feature maps of different scales and multiple second feature maps of different scales.
  • S701 is similar to S101 in the embodiment of FIG. 1, and will not be repeated here.
  • S702. Determine multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales; a similarity map of one scale represents the similarity between the first feature map and the second feature map of that scale.
  • the similarity map of each scale contains the similarity information of the features between the first feature map and the second feature map of the scale.
  • S702 may include: determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label information of the first image; and multiplying the plurality of first feature vectors of different scales element by element with the second feature maps of corresponding scales to obtain multiple similarity maps of different scales.
  • the first feature map of the scale and the label information of the first image may be multiplied to obtain the first feature vector of the scale. Then the first feature vector of this scale and the second feature map of this scale are multiplied element by element to obtain the similarity map of this scale.
  • In the similarity map of this scale, a vector at each pixel location expresses the similarity between the first feature vector and the second feature map at that location.
  • This embodiment generates similarity maps of different scales by multiplying the first feature vectors of different scales element by element with the second feature maps of corresponding scales. Replacing the inner product or cosine distance with element-by-element multiplication allows the similarity map of each scale to contain multi-channel similarity information, makes the similarity feature expression fuller, and further improves the accuracy of the target query.
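  • A minimal sketch of the element-by-element similarity, assuming a support vector of C channels and a query feature map of shape (N, C, H, W):

```python
import torch

def similarity_map(first_vec, second_feat):
    # Element-by-element product of the support feature vector with every pixel feature
    # of the query feature map; the result keeps C channels of similarity per pixel,
    # instead of the single channel produced by an inner product or cosine distance.
    return second_feat * first_vec[:, :, None, None]

sim = similarity_map(torch.randn(1, 256), torch.randn(1, 256, 32, 32))  # (1, 256, 32, 32)
```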
  • In S703, the multiple similarity maps of different scales are integrated to obtain an integrated similarity map: the similarity maps of different scales can be converted into similarity maps of the same scale through up-sampling and then integrated.
  • it can be implemented by either of the following two implementation manners, which will be described separately below.
  • In a first implementation manner, S703 may include: up-sampling the multiple similarity maps of different scales to obtain multiple similarity maps of the same scale; and adding the multiple similarity maps of the same scale to obtain the integrated similarity map.
  • That is, the multiple similarity maps of different scales may be respectively up-sampled to the same scale and then added to obtain the integrated similarity map. Taking three similarity maps A, B, and C as an example, suppose their scales are m1, m2, and m3, where m1 > m2 > m3. B and C can be up-sampled separately to increase their scales to m1, and then A and the up-sampled B and C are added to obtain the integrated similarity map; at this time, the scale of the integrated similarity map is m1.
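  • A sketch of this first manner (up-sample everything to the largest scale, then add), with assumed shapes:

```python
import torch
import torch.nn.functional as F

def aggregate_upsample_all(sims):
    # sims: similarity maps ordered from the largest scale (A) to the smallest (C).
    target_size = sims[0].shape[-2:]
    out = sims[0]
    for s in sims[1:]:
        out = out + F.interpolate(s, size=target_size, mode='bilinear', align_corners=False)
    return out  # integrated similarity map at scale m1

A = torch.randn(1, 256, 64, 64)
B = torch.randn(1, 256, 32, 32)
C = torch.randn(1, 256, 16, 16)
integrated = aggregate_upsample_all([A, B, C])  # shape (1, 256, 64, 64)
```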
  • In a second implementation manner, S703 may include: up-sampling and adding the similarity maps step by step, from the smallest scale to the largest scale. Take three similarity maps A, B, and C as an example, where their scales are m1, m2, and m3 and m1 > m2 > m3. C can be up-sampled first to increase its scale to m2, and then B and the up-sampled C are added to obtain a new similarity map D, whose scale is m2. D is then up-sampled to increase its scale to m1, and A and the up-sampled D are added to obtain the final integrated similarity map.
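  • A sketch of this second, progressive manner, with assumed shapes:

```python
import torch
import torch.nn.functional as F

def aggregate_progressive(sims):
    # sims ordered from the smallest scale (C) to the largest (A): up-sample step by step
    # and add, so the final integrated similarity map is at the largest scale.
    out = sims[0]
    for s in sims[1:]:
        out = F.interpolate(out, size=s.shape[-2:], mode='bilinear', align_corners=False) + s
    return out

C = torch.randn(1, 256, 16, 16)
B = torch.randn(1, 256, 32, 32)
A = torch.randn(1, 256, 64, 64)
integrated = aggregate_progressive([C, B, A])  # shape (1, 256, 64, 64)
```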
  • S704 Determine the target to be queried in the second image according to the integrated similarity map.
  • S704 is similar to S102 in the embodiment of FIG. 1, and will not be repeated here.
  • In this embodiment, multiple similarity maps of different scales are determined based on the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales; the multiple similarity maps of different scales are then integrated to obtain the integrated similarity map, and the target to be queried in the second image is determined according to the integrated similarity map. This integrates similarities at multiple scales, so that the integrated similarity includes information from multiple scales, further improving the accuracy of the target query.
  • FIG. 8 is a schematic flowchart of a target detection method provided by another embodiment of this application.
  • The difference between this embodiment and the embodiment of FIG. 7 is that, after the multiple similarity maps of different scales are determined in S702 and before they are integrated in S703, the multiple similarity maps of different scales are multiplied element by element with the third feature maps of corresponding scales to obtain multiple processed similarity maps of different scales.
  • the method includes:
  • S801 Perform feature extraction of multiple different scales on the second image and the first image respectively, and generate multiple first feature maps of different scales and multiple second feature maps of different scales.
  • S801 is similar to S101 in the embodiment of FIG. 1, and will not be repeated here.
  • S802 is similar to S702 in the embodiment of FIG. 7, and will not be repeated here.
  • S804 is similar to S704 in the embodiment of FIG. 7, and will not be repeated here.
  • S805 Determine the target to be queried in the second image according to the integrated similarity map.
  • In this embodiment, the multiple similarity maps of different scales, determined according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, are multiplied element by element with the third feature maps of the second image, so that the multiple similarity maps of different scales can be used to guide the segmentation of the second image, thereby further improving the accuracy of the target query.
  • Fig. 9 is a flowchart of a target detection method provided by an embodiment of the present application.
  • the target detection method of the foregoing embodiment is executed by a neural network, which is trained by the following steps:
  • Step 901 Perform feature extraction at a plurality of different scales on the first sample image and the second sample image respectively to obtain a plurality of fourth feature maps of different scales and a plurality of fifth feature maps of different scales; wherein both the first sample image and the second sample image contain objects of the first category.
  • Step 902 Determine the object of the first category in the second sample image according to the plurality of fourth feature maps of different scales, the label of the first sample image, and the fifth feature maps of corresponding scales; the label of the first sample image is the result of labeling the objects of the first category contained in the first sample image.
  • Step 903 Adjust the network parameters of the neural network according to the difference between the determined object of the first category in the second sample image and the label of the second sample image; the label of the second sample image is the result of labeling the objects of the first category contained in the second sample image.
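  • A hedged sketch of one training step is shown below; `network` stands for the model of FIG. 2 or FIG. 6, and the binary cross-entropy loss is an assumption, since the patent does not fix a particular loss function.

```python
import torch
import torch.nn as nn

def train_step(network, optimizer, first_img, first_label, second_img, second_label):
    criterion = nn.BCEWithLogitsLoss()                 # assumed loss; the patent does not specify
    pred = network(first_img, first_label, second_img)  # predicted mask logits for the second image
    loss = criterion(pred, second_label)                # difference from the second sample's label
    optimizer.zero_grad()
    loss.backward()                                     # back-propagate the loss
    optimizer.step()                                    # adjust the network parameters
    return loss.item()
```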
  • the above-mentioned target query method is realized by a neural network, and the neural network may be trained first before the target query is performed.
  • a first sample image and a second sample image containing objects of the same category can be obtained from a training set containing multiple sample images, and this object is the target to be queried in the training process.
  • the training set may include multiple subsets, and the sample images in each subset contain objects of the same category.
  • the categories may include vehicles, pedestrians, traffic lights (ie, traffic lights), etc.
  • the acquired first sample image and second sample image may both include traffic lights. Use the traffic lights as the target to be queried during this training. Label the traffic lights in the first sample image to obtain the label of the first sample image. Label the traffic lights in the second sample image to obtain the label of the second sample image.
  • the training process of this embodiment is similar to the process of the target detection method of the foregoing embodiment, and the specific implementation process can refer to the introduction of the foregoing embodiment.
  • During training, the first sample image and the second sample image need to contain objects of the same category, so that the neural network can learn the association between images of the same category. The category used for training and the category used for testing or application can differ; for example, traffic lights can be used to train the neural network, while street lights can be used to test the neural network or to apply the neural network.
  • FIG. 10 is a schematic flowchart of a target detection method provided by still another embodiment of this application.
  • the test method of the trained neural network in the embodiment of FIG. 9 is described in detail.
  • the method may further include:
  • test images including objects of the same category may be pre-formed into a test image set, and multiple test image sets may be formed into a total test set.
  • the first test image and the second test image are selected from a set of test images, and the neural network is tested through the first test image and the second test image.
  • the neural network can be tested through the first test image and the second test image containing street lights.
  • one sample can be selected as the first test image for each test category in the test image set.
  • one image is selected as the first test image for each category (a total of 20 categories).
  • The test data pair is then input into the model shown in FIG. 2 or FIG. 5 for evaluation, where the test images in the test data pair contain the same type of target.
  • The test may be performed after 100 rounds of training, or after 120 rounds of training.
  • Even so, the target detection method of this embodiment can still achieve accurate detection.
  • In addition, the method of selecting test data pairs in the embodiments of the present application can reduce the task's strong dependence on samples, accurately detect categories whose samples are difficult to collect in actual application scenarios, avoid the problem of uneven category selection caused by traditional randomly selected test pairs, and alleviate the problem of fluctuating evaluation indicators caused by the varying quality of support samples. For example, in the target detection task in automatic driving, a target category for which the scene does not provide a large number of training samples can also be accurately detected.
  • FIG. 11 is a schematic flowchart of a smart driving method provided by an embodiment of the application. As shown in FIG. 11, the method may include:
  • S1101 Collect road images.
  • S1102 Use the target detection method described above to query the collected road images for the target to be queried according to the supporting image and the label of the supporting image.
  • S1103 Control the smart driving device that collects the road images according to the query result.
  • the smart driving device may include an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistant System (ADAS), a robot, and the like.
  • the road image is used as the above-mentioned second image, and the supporting image is used as the above-mentioned first image. Then the intelligent driving equipment is controlled according to the target detection result.
  • Controlling the smart driving device may include controlling intelligent driving equipment such as autonomous vehicles or robots to perform operations such as deceleration, braking, and steering, or sending instructions such as deceleration, braking, and steering to the driver of an ADAS-equipped vehicle. For example, if the query result shows that the traffic indicator in front of the smart driving device is red, the smart driving device is controlled to slow down and stop; if the query result shows that there is a pedestrian in front of the smart driving device, the smart driving device is controlled to brake.
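  • Purely as an illustration of this control logic (the category names and returned actions are hypothetical):

```python
def control_from_query(query_result):
    # `query_result` is assumed to name the detected target; the mapping below merely
    # illustrates the control behaviour described in the text.
    if query_result == "red_traffic_light":
        return ["decelerate", "stop"]
    if query_result == "pedestrian":
        return ["brake"]
    return ["continue"]

print(control_from_query("red_traffic_light"))  # ['decelerate', 'stop']
```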
  • FIG. 12 is a schematic diagram of a target detection process provided by an embodiment of this application.
  • As shown in FIG. 12, the first image is input into the first convolutional neural network to obtain multiple first feature maps of different scales for the first image, and the second image is input into the second convolutional neural network to obtain multiple second feature maps of different scales for the second image.
  • the second feature map of the second image, the first feature map of the first image, and the label information of the first image are input to the generating module to obtain similarity maps of multiple scales.
  • the similarity maps of multiple scales are input to the aggregation module to obtain the integrated similarity map.
  • Input the integrated similarity map to the third convolutional neural network to obtain the semantic segmentation map of the second image, so as to realize the target detection of the second image.
  • FIG. 13 is a schematic diagram of a generation module and an aggregation module provided by an embodiment of the application.
  • conv represents the convolutional layer
  • pool represents the pooling process.
  • The feature map of the first image is input into the first convolution channel of the generating module 131 to obtain multiple first feature maps of different scales, which are then multiplied with the label information of the first image and pooled to obtain multiple feature vectors of different scales for the first image. The feature map of the second image is input into the second convolution channel of the generating module 131 to obtain a plurality of second feature maps of different scales.
  • Multiple feature maps of different scales of the second image are respectively multiplied element by element with feature vectors of corresponding scales to obtain multiple similarity maps of different scales.
  • the generating module 131 outputs multiple similarity maps of different scales to the aggregation module 132, and the aggregation module 132 integrates the multiple similarity maps of different scales, and outputs the integrated similarity maps.
  • FIG. 14 is a schematic diagram comparing the similarity feature extraction method in the target detection method provided by an embodiment of the application with similarity feature extraction through inner product or cosine distance.
  • the left part of the figure is a schematic diagram of similarity features extracted by inner product or cosine distance.
  • the right part of the figure is a schematic diagram of extracting similarity features by multiplying the vectors of corresponding pixel positions.
  • The method proposed in the embodiment of the present application uses element-wise multiplication to change the output similarity map from a single channel to multiple channels, which retains the channel information of the similarity, and at the same time can be combined with subsequent convolution and nonlinear operations to express the similarity features more fully, thereby further improving the accuracy of target detection.
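  • A small sketch contrasting the two, with assumed shapes: the inner product or cosine distance yields one channel, while element-wise multiplication keeps all channels.

```python
import torch
import torch.nn.functional as F

second_feat = torch.randn(1, 256, 32, 32)  # query feature map (N, C, H, W)
first_vec = torch.randn(1, 256)            # support feature vector (N, C)
expanded = first_vec[:, :, None, None].expand_as(second_feat)

# Related art: inner product / cosine distance collapses channels into a single value.
cos_sim = F.cosine_similarity(second_feat, expanded, dim=1)  # shape (1, 32, 32)

# Proposed: element-wise multiplication keeps a multi-channel similarity map that later
# convolution and nonlinear operations can exploit.
mul_sim = second_feat * expanded                              # shape (1, 256, 32, 32)
```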
  • FIG. 15 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • the target detection device provided by the embodiment of the present application can execute the processing flow provided in the embodiment of the target detection method.
  • the target detection device 150 provided in this embodiment includes: a feature extraction module 151 and a determination module 152;
  • the extraction module 151 is used for extracting multiple features of different scales on the first image and the second image to obtain multiple first feature maps of different scales and multiple second feature maps of different scales;
  • The determining module 152 is used to determine the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales; the label of the first image is the result of labeling the target to be queried contained in the first image.
  • When the feature extraction module 151 performs feature extraction at multiple different scales on the first image and the second image respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales, it specifically includes: extracting features from the first image and the second image respectively to obtain a first feature map and a second feature map; and performing multiple scale transformations on the first feature map and the second feature map respectively to obtain multiple first feature maps of different scales and multiple second feature maps of different scales.
  • When the feature extraction module 151 performs multiple scale transformations on the first feature map and the second feature map respectively, it specifically includes: performing down-sampling on the first feature map and the second feature map at least twice, respectively.
  • when the determining module 152 determines the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales, this specifically includes: determining multiple first feature vectors of different scales according to the first feature maps of different scales and the label of the first image; calculating the multiple first feature vectors of different scales with the second feature maps of corresponding scales according to a preset calculation rule to obtain a calculation result; determining a mask image of the second image according to the calculation result; and determining the target to be queried in the second image according to the mask image.
  • when the determining module 152 determines the target to be queried in the second image according to the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales, this specifically includes: using the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for the third feature maps of corresponding scales to determine the target to be queried in the second image; wherein the third feature maps are determined according to the second image, and the second feature map and the third feature map of the same scale are different.
  • when the determining module 152 uses the multiple first feature maps of different scales, the label of the first image, and the second feature maps of corresponding scales as guidance information for the third feature maps of corresponding scales to determine the target to be queried in the second image, this specifically includes: determining a plurality of first feature vectors of different scales according to the plurality of first feature maps of different scales and the label of the first image; calculating the plurality of first feature vectors of different scales with the second feature maps of corresponding scales according to the preset calculation rules to obtain multiple mask images of different scales; and determining the target to be queried in the second image according to the result of multiplying the multiple mask images of different scales with the third feature maps of corresponding scales; a minimal sketch of this guidance step is given below.
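A minimal sketch of the guidance step, assuming PyTorch; the sigmoid squashing of the mask image and all names and shapes are illustrative assumptions, not details taken from the application:

```python
import torch

def mask_image(first_vec, second_feat):
    # preset calculation rule (inner product variant): one similarity value per pixel
    return (second_feat * first_vec.view(-1, 1, 1)).sum(dim=0, keepdim=True)   # (1, H, W)

def guided_features(first_vecs, second_feats, third_feats):
    # use the per-scale mask images as guidance for the third feature maps of the same scale
    out = []
    for vec, s_feat, t_feat in zip(first_vecs, second_feats, third_feats):
        mask = torch.sigmoid(mask_image(vec, s_feat))   # squashing to (0, 1) is an assumption
        out.append(t_feat * mask)                       # broadcast over all channels
    return out

C = 64
first_vecs   = [torch.randn(C) for _ in range(3)]
second_feats = [torch.randn(C, s, s) for s in (32, 16, 8)]
third_feats  = [torch.randn(C, s, s) for s in (32, 16, 8)]  # from the second image, but different from second_feats
guided = guided_features(first_vecs, second_feats, third_feats)
print([g.shape for g in guided])
```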
  • the preset calculation rules include: an inner product calculation rule, or a cosine distance calculation rule; both are sketched below.
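Both rules can be sketched as follows (PyTorch assumed; shapes are hypothetical):

```python
import torch
import torch.nn.functional as F

def inner_product_rule(vec, feat):
    # vec: (C,), feat: (C, H, W) -> (H, W) similarity per pixel
    return torch.einsum("c,chw->hw", vec, feat)

def cosine_distance_rule(vec, feat):
    # same inputs, but both sides are L2-normalised first
    v = F.normalize(vec, dim=0)
    f = F.normalize(feat, dim=0)
    return torch.einsum("c,chw->hw", v, f)

feat = torch.randn(64, 16, 16)
vec = torch.randn(64)
print(inner_product_rule(vec, feat).shape, cosine_distance_rule(vec, feat).shape)
```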
  • when the determining module 152 determines the target to be queried in the second image according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, this specifically includes: determining multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, where a similarity map of one scale represents the similarity between the first feature map and the second feature map of that scale; integrating the multiple similarity maps of different scales to obtain an integrated similarity map; and determining the target to be queried in the second image according to the integrated similarity map.
  • when the determining module 152 determines multiple similarity maps of different scales according to the multiple first feature maps of different scales, the label information of the first image, and the second feature maps of corresponding scales, this specifically includes: determining multiple first feature vectors of different scales according to the first feature maps of different scales and the label information of the first image; and multiplying the multiple first feature vectors of different scales element by element with the second feature maps of corresponding scales to obtain multiple similarity maps of different scales.
  • when the determining module 152 integrates multiple similarity maps of different scales to obtain an integrated similarity map, this specifically includes: up-sampling the multiple similarity maps of different scales to obtain multiple similarity maps of the same scale, and obtaining the integrated similarity map from the multiple similarity maps of the same scale.
  • when the determining module 152 integrates a plurality of similarity maps of different scales to obtain an integrated similarity map, this specifically includes: forming a similarity map set from the plurality of similarity maps of different scales; up-sampling the smallest similarity map in the set to obtain a similarity map of the same scale as the second-smallest similarity map; adding the obtained similarity map to the second-smallest similarity map to obtain a new similarity map; forming a new similarity map set from the new similarity map and the similarity maps that have not yet been up-sampled or added, and repeating the up-sampling and adding steps until only one similarity map remains; the last similarity map obtained is the integrated similarity map. A minimal sketch of this coarse-to-fine integration is given below.
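A minimal sketch of this coarse-to-fine integration, assuming PyTorch; bilinear up-sampling is an assumption, since the application does not fix the interpolation mode:

```python
import torch
import torch.nn.functional as F

def aggregate(sim_maps):
    # sim_maps: list of (C, H_s, W_s) similarity maps at different scales.
    # Repeatedly up-sample the smallest map to the size of the next-smallest one and add them,
    # until a single integrated similarity map remains.
    maps = sorted(sim_maps, key=lambda m: m.shape[-1])   # smallest spatial size first
    current = maps[0]
    for nxt in maps[1:]:
        up = F.interpolate(current[None], size=nxt.shape[-2:],
                           mode="bilinear", align_corners=False)[0]
        current = up + nxt
    return current                                        # same scale as the largest input map

sims = [torch.randn(64, 8, 8), torch.randn(64, 16, 16), torch.randn(64, 32, 32)]
print(aggregate(sims).shape)                              # torch.Size([64, 32, 32])
```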
  • the determining module 152 is further configured to: multiply a plurality of similarity maps of different scales element by element with the third feature maps of corresponding scales to obtain a plurality of processed similarity maps of different scales, wherein the third feature maps are determined according to the second image and the first feature map and the third feature map of the same scale are different; and integrate the processed similarity maps of different scales to obtain the integrated similarity map; a minimal sketch of this variant is given below.
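A minimal sketch of this variant, assuming PyTorch; summation after bilinear up-sampling is used here only as one simple way to integrate the processed maps, and is not taken from the application:

```python
import torch
import torch.nn.functional as F

def reweight_and_integrate(sim_maps, third_feats):
    # element-wise multiply each similarity map with the third feature map of the same scale,
    # then up-sample everything to the largest scale and sum
    processed = [s * t for s, t in zip(sim_maps, third_feats)]
    target = max(processed, key=lambda m: m.shape[-1]).shape[-2:]
    resized = [F.interpolate(p[None], size=target, mode="bilinear", align_corners=False)[0]
               for p in processed]
    return sum(resized)

sims        = [torch.randn(64, s, s) for s in (8, 16, 32)]
third_feats = [torch.randn(64, s, s) for s in (8, 16, 32)]
print(reweight_and_integrate(sims, third_feats).shape)    # torch.Size([64, 32, 32])
```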
  • the target detection device is implemented by a neural network
  • the device further includes: a training module 153 for training the neural network by using the following steps:
  • performing feature extraction at multiple different scales on the first sample image and the second sample image, respectively, to obtain multiple fourth feature maps of different scales and multiple fifth feature maps of different scales, wherein the first sample image and the second sample image both contain objects of the first category; determining the objects of the first category in the second sample image according to the multiple fourth feature maps of different scales, the label of the first sample image, and the fifth feature maps of corresponding scales, the label of the first sample image being the result of labeling the objects of the first category contained in the first sample image; and adjusting the network parameters of the neural network according to the difference between the determined objects of the first category in the second sample image and the label of the second sample image, the label of the second sample image being the result of labeling the objects of the first category contained in the second sample image. A minimal training-loop sketch is given below.
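A minimal, self-contained training-loop sketch in PyTorch; the toy network, the binary cross-entropy loss, and the optimizer are illustrative assumptions and are not the architecture or loss of the application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFewShotSeg(nn.Module):
    # deliberately tiny stand-in network, used only to show the episodic update
    def __init__(self, c=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(c, 1, 1)

    def forward(self, support_img, support_label, query_img):
        s_feat = self.backbone(support_img)
        q_feat = self.backbone(query_img)
        mask = support_label.unsqueeze(1).float()
        vec = (s_feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1.0)  # (B, C)
        sim = q_feat * vec[:, :, None, None]           # element-wise, multi-channel similarity
        return self.head(sim)                          # (B, 1, H, W) logits for the query image

model = ToyFewShotSeg()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(2):                                   # each step: one first/second sample pair
    support_img = torch.randn(2, 3, 64, 64)
    query_img = torch.randn(2, 3, 64, 64)
    support_label = (torch.rand(2, 64, 64) > 0.8)       # both images contain the same category
    query_label = (torch.rand(2, 64, 64) > 0.8).float()

    logits = model(support_img, support_label, query_img)
    loss = F.binary_cross_entropy_with_logits(logits.squeeze(1), query_label)
    opt.zero_grad()
    loss.backward()
    opt.step()                                          # adjust the network parameters
```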
  • the device further includes: a testing module 154 for testing the trained neural network; the testing module specifically uses the following steps to test the trained neural network: performing feature extraction at multiple different scales on the first test image and the second test image, respectively, to obtain multiple first test feature maps of different scales and multiple second test feature maps of different scales, wherein the first test image and the second test image are drawn from a test image set and each test image in the test image set includes objects of the same category; and determining the target to be queried in the second test image according to the multiple first test feature maps of different scales, the label of the first test image, and the second test feature maps of corresponding scales; the label of the first test image is the result of labeling the target to be queried contained in the first test image. One plausible way to score such a test is sketched below.
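The application does not specify a test metric; one plausible way to score such a test, sketched here purely as an assumption, is the intersection-over-union between the predicted mask on the second test image and its label:

```python
import torch

def iou(pred_mask, gt_mask, eps=1e-6):
    # pred_mask, gt_mask: (H, W) binary masks for the target to be queried in a test image
    inter = (pred_mask & gt_mask).sum().float()
    union = (pred_mask | gt_mask).sum().float()
    return (inter + eps) / (union + eps)

# pairs of (predicted mask on the second test image, label of the second test image)
test_pairs = [((torch.rand(64, 64) > 0.5), (torch.rand(64, 64) > 0.5)) for _ in range(4)]
scores = [iou(p, g) for p, g in test_pairs]
print(sum(scores) / len(scores))                        # mean IoU over the test image set
```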
  • the target detection device provided in the embodiment of the present application can be used to implement the above-mentioned target detection method embodiment, and its implementation principles and technical effects are similar, and will not be repeated here in this embodiment.
  • FIG. 16 is a schematic structural diagram of a smart driving device provided by an embodiment of the application.
  • the intelligent driving device 160 provided in this embodiment includes: an acquisition module 161, a query module 162, and a control module 163; wherein the acquisition module 161 is used to collect road images; the query module 162 is used to query the collected road images for the target to be queried according to the support image and the label of the support image, by adopting the target detection method provided by the embodiment of the application, wherein the label of the support image is the result of labeling the target contained in the support image that is of the same category as the target to be queried; and the control module 163 is used to control the intelligent driving device that collects the road images according to the query result. A minimal sketch of this flow is given below.
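A minimal sketch of this flow, assuming PyTorch; the coverage-threshold control policy, the threshold value, and all names are purely illustrative assumptions, not the control strategy of the application, and the detection step is only indicated by a placeholder result:

```python
import torch

def control_from_query(query_mask, area_threshold=0.02):
    # toy policy: if the queried object (defined by the support image and its label)
    # covers enough of the road image, decide to decelerate
    coverage = query_mask.float().mean().item()
    return "decelerate" if coverage > area_threshold else "keep_speed"

# road_image would come from the vehicle camera; the mask below stands in for the
# query result produced by the target detection method above
query_mask = (torch.rand(256, 256) > 0.97)               # hypothetical detection result
print(control_from_query(query_mask))
```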
  • the implementation of the smart driving device provided in the embodiment of the present application can refer to the foregoing smart driving method, and the implementation principle and technical effect are similar, and the details are not described herein again in this embodiment.
  • FIG. 17 is a schematic diagram of the hardware structure of a target detection device provided by an embodiment of the application.
  • the target detection device provided in the embodiment of the present application can execute the processing flow provided in the embodiment of the target detection method.
  • the target detection device 170 provided in this embodiment includes: at least one processor 171 and a memory 172.
  • the target detection device 170 also includes a communication component 173, wherein the processor 171, the memory 172, and the communication component 173 are connected by a bus 174.
  • At least one processor 171 executes the computer-executable instructions stored in the memory 172, so that the at least one processor 171 executes the above target detection method.
  • FIG. 18 is a schematic diagram of the hardware structure of a smart driving device provided by an embodiment of the application.
  • the smart driving device provided in the embodiment of the present application can execute the processing flow provided in the smart driving method embodiment.
  • the smart driving device 180 provided in this embodiment includes: at least one processor 181 and a memory 182.
  • the smart driving device 180 also includes a communication component 183, wherein the processor 181, the memory 182, and the communication component 183 are connected by a bus 184.
  • At least one processor 181 executes the computer-executable instructions stored in the memory 182, so that the at least one processor 181 executes the above intelligent driving method.
  • the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like; the steps of the method disclosed in connection with the application may be directly embodied as being executed and completed by a hardware processor, or by a combination of hardware and software modules in the processor.
  • the memory may include a high-speed RAM memory, and may also include a non-volatile memory (NVM), such as at least one disk memory.
  • the bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the buses in the drawings of this application are not limited to only one bus or one type of bus.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the target detection method or the intelligent driving method are realized.
  • an embodiment of the present application further provides a chip for executing instructions.
  • the chip includes a memory and a processor.
  • the memory stores code and data.
  • the memory is coupled with the processor.
  • the processor runs the code in the memory so that the chip is used to execute the steps of the above-mentioned target detection method or smart driving method.
  • the embodiment of the present application further provides a program product containing instructions, which when the program product runs on a computer, causes the computer to execute the steps of the above-mentioned target detection method or smart driving method.
  • the embodiment of the present application further provides a computer program, when the computer program is executed by a processor, it is used to execute the steps of the above-mentioned target detection method or smart driving method.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
PCT/CN2020/123918 2019-10-31 2020-10-27 目标检测、智能行驶方法、装置、设备及存储介质 WO2021083126A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020217020811A KR20210098515A (ko) 2019-10-31 2020-10-27 표적 검출, 지능형 주행 방법, 장치, 디바이스 및 저장매체
JP2021539414A JP2022535473A (ja) 2019-10-31 2020-10-27 ターゲット検出、インテリジェント走行方法、装置、機器及び記憶媒体

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911054823.1 2019-10-31
CN201911054823.1A CN112749710A (zh) 2019-10-31 2019-10-31 目标检测、智能行驶方法、装置、设备及存储介质
CN201911063316.4 2019-10-31
CN201911063316.4A CN112749602A (zh) 2019-10-31 2019-10-31 目标查询方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021083126A1 true WO2021083126A1 (zh) 2021-05-06

Family

ID=75715793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123918 WO2021083126A1 (zh) 2019-10-31 2020-10-27 目标检测、智能行驶方法、装置、设备及存储介质

Country Status (3)

Country Link
JP (1) JP2022535473A (ja)
KR (1) KR20210098515A (ja)
WO (1) WO2021083126A1 (ja)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (zh) * 2018-08-30 2019-02-15 西安电子科技大学 基于特征融合和深度学习的小目标检测方法
CN109255352A (zh) * 2018-09-07 2019-01-22 北京旷视科技有限公司 目标检测方法、装置及系统
CN109886286A (zh) * 2019-01-03 2019-06-14 武汉精测电子集团股份有限公司 基于级联检测器的目标检测方法、目标检测模型及系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gregory Koch, Richard Zemel, Ruslan Salakhutdinov, "Siamese Neural Networks for One-shot Image Recognition", ICML Deep Learning Workshop, Vol. 2, Lille, France, 10-11 July 2015, pages 1-8, XP055445904 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313662A (zh) * 2021-05-27 2021-08-27 北京沃东天骏信息技术有限公司 图像处理方法、装置、设备及存储介质
CN113643239A (zh) * 2021-07-15 2021-11-12 上海交通大学 一种基于记存机制的异常检测方法、装置和介质
CN113643239B (zh) * 2021-07-15 2023-10-27 上海交通大学 一种基于记存机制的异常检测方法、装置和介质
CN113642415A (zh) * 2021-07-19 2021-11-12 南京南瑞信息通信科技有限公司 人脸特征表达方法及人脸识别方法
CN113642415B (zh) * 2021-07-19 2024-06-04 南京南瑞信息通信科技有限公司 人脸特征表达方法及人脸识别方法

Also Published As

Publication number Publication date
JP2022535473A (ja) 2022-08-09
KR20210098515A (ko) 2021-08-10

Similar Documents

Publication Publication Date Title
JP7289918B2 (ja) 物体認識方法及び装置
Kamal et al. Automatic traffic sign detection and recognition using SegU-Net and a modified Tversky loss function with L1-constraint
WO2022126377A1 (zh) 检测车道线的方法、装置、终端设备及可读存储介质
CN112132156B (zh) 多深度特征融合的图像显著性目标检测方法及系统
CN112528878A (zh) 检测车道线的方法、装置、终端设备及可读存储介质
US20230076266A1 (en) Data processing system, object detection method, and apparatus thereof
Wang et al. Centernet-auto: A multi-object visual detection algorithm for autonomous driving scenes based on improved centernet
JP2016062610A (ja) 特徴モデル生成方法及び特徴モデル生成装置
WO2022237139A1 (zh) 一种基于LaneSegNet的车道线检测方法及系统
CN110781744A (zh) 一种基于多层次特征融合的小尺度行人检测方法
US11340700B2 (en) Method and apparatus with image augmentation
WO2021083126A1 (zh) 目标检测、智能行驶方法、装置、设备及存储介质
CN110956119B (zh) 一种图像中目标检测的方法
WO2022217434A1 (zh) 感知网络、感知网络的训练方法、物体识别方法及装置
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN116783620A (zh) 根据点云的高效三维对象检测
CN114913498A (zh) 一种基于关键点估计的并行多尺度特征聚合车道线检测方法
CN115631344A (zh) 一种基于特征自适应聚合的目标检测方法
CN112395962A (zh) 数据增广方法及装置、物体识别方法及系统
CN116188999A (zh) 一种基于可见光和红外图像数据融合的小目标检测方法
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
CN112749602A (zh) 目标查询方法、装置、设备及存储介质
CN113223037A (zh) 一种面向大规模数据的无监督语义分割方法及系统
CN112446292B (zh) 一种2d图像显著目标检测方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881806

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20217020811

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021539414

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881806

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/09/2022)
