WO2022134464A1 - Method, apparatus, electronic device and storage medium for determining the localization confidence of target detection - Google Patents


Publication number
WO2022134464A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate frame
target
feature
map
feature map
Prior art date
Application number
PCT/CN2021/096419
Other languages
English (en)
French (fr)
Inventor
楼文杰
王瑞
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022134464A1 publication Critical patent/WO2022134464A1/zh

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
              • G06N3/08 Learning methods
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T3/00 Geometric image transformations in the plane of the image
            • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
          • G06T7/00 Image analysis
            • G06T7/10 Segmentation; Edge detection
              • G06T7/11 Region-based segmentation
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/20 Image preprocessing
              • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
            • G06V10/40 Extraction of image or video features
            • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V10/764 Arrangements using pattern recognition or machine learning using classification, e.g. of video objects
              • G06V10/82 Arrangements using pattern recognition or machine learning using neural networks

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a method, apparatus, electronic device and storage medium for determining the localization confidence of target detection.
  • Object detection is one of the basic tasks of computer vision: it separates objects from the background in complex images, and specifies the class of each object and the location of its region frame.
  • target detection is widely used in many fields: image segmentation tasks feed the frame information obtained by target detection into a segmentation network; image-to-text generation tasks feed the features of the regions of interest obtained by target detection into a codec; and target tracking tasks analyze the frame positions obtained by target detection.
  • the inventors found that most current object detection methods use the classification confidence as the benchmark score of a candidate frame. However, among the information quantified for candidate boxes, classification quality is not correlated with localization quality. To improve the accuracy of evaluating candidate frames, a fully connected neural network branch can be added to a two-stage target detection model to evaluate the localization accuracy of candidate frames. However, the inventors realized that this evaluation method can only be applied to two-stage target detection models, and its evaluation of the localization accuracy of candidate frames depends on the preceding pooling operation, so the information used to evaluate localization accuracy is limited and one-sided.
  • in other words, existing methods of determining the localization accuracy of a candidate frame are relatively simplistic and of low accuracy.
  • the embodiments of the present application provide a method, apparatus, electronic device, and storage medium for determining the localization confidence of target detection. By splicing the bottom-layer feature maps, the evaluation accuracy of the localization confidence of candidate frames is improved.
  • an embodiment of the present application provides a method for determining the reliability of target detection and positioning, including:
  • the second mask map of each candidate frame is spliced with the N second feature maps to obtain the first target feature map corresponding to each candidate frame, where the first target feature map is a three-dimensional matrix with dimensions W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate frame and of each second feature map;
  • the location reliability of each candidate frame is determined according to the first target feature map corresponding to each candidate frame.
  • an apparatus for determining the localization confidence of target detection, including:
  • an acquisition unit for acquiring an image to be detected
  • a processing unit configured to input the image to be detected into a target detection network for target detection, and obtain at least one candidate frame, a first mask map of each candidate frame, and N first feature maps, where the N first feature maps are the N feature maps on the N channels of the first network layer of the target detection network, the first network layer is located at the bottom of the target detection network, and N is an integer greater than 1;
  • the first mask maps of the candidate frames are expanded to obtain the second mask map of each candidate frame;
  • N second feature maps are intercepted from the N first feature maps;
  • the second mask map of each candidate frame is spliced with the N second feature maps to obtain the first target feature map corresponding to each candidate frame, where the first target feature map is a three-dimensional matrix with dimensions W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate frame and of each second feature map; according to the first target feature map corresponding to each candidate frame, the localization confidence of each candidate frame is determined.
  • an embodiment of the present application provides an electronic device, including a processor connected to a memory, where the memory is used for storing a computer program and the processor is used for executing the computer program stored in the memory, so that the electronic device performs the following method:
  • the second mask map of each candidate frame is spliced with the N second feature maps to obtain the first target feature map corresponding to each candidate frame, where the first target feature map is a three-dimensional matrix with dimensions W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate frame and of each second feature map;
  • the location reliability of each candidate frame is determined according to the first target feature map corresponding to each candidate frame.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program causes a computer to execute the following method:
  • the second mask map of each candidate frame is spliced with the N second feature maps to obtain the first target feature map corresponding to each candidate frame, where the first target feature map is a three-dimensional matrix with dimensions W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate frame and of each second feature map;
  • the location reliability of each candidate frame is determined according to the first target feature map corresponding to each candidate frame.
  • it can be seen that the first mask map is first expanded to increase its scale, and only then are the second feature maps intercepted.
  • as a result, information surrounding the candidate frame is also intercepted, yielding global information related to the candidate frame; in addition, the second feature maps are intercepted from the first feature maps at the bottom of the network. Since the depth of the bottom network layer is relatively shallow, the first feature maps contain a large amount of spatial position information, so the intercepted second feature maps also contain a large amount of spatial position information for the region selected by the candidate frame.
  • furthermore, the second feature maps and the mask map are spliced to determine the localization confidence, instead of the mask map alone being used, which enriches the information available for determining the localization confidence; because the first target feature map obtained by splicing includes both the spatial position information of the region selected by the candidate frame and the global information of that region, the accuracy of the localization confidence is improved.
  • FIG. 1 is a schematic flowchart of a method for determining the reliability of target detection and location determination provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of obtaining a mask map and intercepting a feature map according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of splicing N second feature maps and second mask maps according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of duplicating and combining feature vectors according to an embodiment of the present application.
  • FIG. 5 is a block diagram of functional units of a target detection and location reliability determination device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an apparatus for determining the reliability of target detection and position determination according to an embodiment of the present application.
  • the technical solution of the present application may relate to the technical field of artificial intelligence, for example, it can be applied to image processing and other scenarios for region extraction, thereby improving the evaluation accuracy of the location reliability of candidate frames, thereby promoting the construction of smart cities.
  • the data involved in this application such as various images and/or location reliability, may be stored in a database, or may be stored in a blockchain, which is not limited in this application.
  • FIG. 1 is a schematic flowchart of a method for determining the reliability of target detection and location according to an embodiment of the present application. The method is applied to a device for determining the position reliability of target detection. The method of this embodiment includes the following steps:
  • the target detection and location reliability determination device acquires the image to be detected.
  • the to-be-detected image may be various types of images.
  • it can be a face image
  • the target detection is to detect the area where the face is located in the image to be detected
  • it can be a medical image
  • the target detection is to detect the area where the lesion is located in the image to be detected.
  • This application does not limit the types of images to be detected.
  • the target detection localization confidence determination device inputs the to-be-detected image into a target detection network for target detection, and obtains at least one candidate frame, a first mask map of each candidate frame, and N first feature maps, where the N first feature maps are the N feature maps on the N channels of the first network layer of the target detection network, the first network layer is located at the bottom of the target detection network, and N is an integer greater than 1.
  • the N channels are in one-to-one correspondence with the N first feature maps.
  • the target detection network may be a trained neural network, such as Fast-CNN network, Fast-RCNN network, YOLO network, SSD network, IOU-net network, and so on.
  • the target detection network in this application is illustrated using an IoU-Net network as an example, and the IoU-Net network uses a feature pyramid network (Feature Pyramid Networks, FPN) as its backbone.
  • features of the image to be recognized are extracted by the FPN, and the feature map obtained at each layer of the pyramid is input into two branch networks, the classification branch network (class subnet) and the box branch network (box subnet), to locate the candidate frames and classify the targets within them, obtaining at least one candidate frame and the first mask map of each candidate frame, where the first mask map of each candidate frame indicates the probability that each pixel in the image region corresponding to the candidate frame belongs to the target. For example, a pixel value of 0 indicates that the pixel does not belong to the target, and a pixel value of 1 indicates that it does.
  • the N first feature maps are N feature maps output from the process of feature extraction through the N channels of the first network layer.
  • since the target detection network includes multiple network layers, and each network layer includes multiple channels, first obtain the size specified by the target detection network, i.e., the size required of images input to the network. It should be understood that if the size of the image to be detected does not meet this specified size, the image is expanded or cropped so that it does; then, the first network layer of the target detection network is determined according to the size of the second mask map of each candidate frame, the size specified by the target detection network, and a preset hyperparameter.
  • the first network layer is located at the bottom of the target detection network. Because the depth of these network layers is relatively shallow, the extracted feature maps contain a large amount of spatial position information from the original image (the image to be detected), so the first target feature map obtained by subsequent splicing also contains a large amount of spatial position information, making the localization confidence of the resulting candidate frames more accurate.
  • the first network layer can be represented by formula (1): k = ⌊k0 + log2(√(W*H) / √(W0*H0))⌋ (1)
  • k represents the position of the first network layer among the multiple network layers, that is, the k-th network layer of the target detection network is taken as the first network layer;
  • k0 is a preset hyperparameter, which may for example be set to 4;
  • W and H are the width and height of the second mask map, respectively;
  • W0 and H0 are the width and height specified by the target detection network.
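As a hedged sketch (the exact form of formula (1) is not reproduced in this text, so the standard FPN level-assignment rule consistent with the variable definitions above is assumed), selecting the first network layer could look like:

```python
import math

def first_network_layer(W, H, W0, H0, k0=4):
    """Pick the index k of the first network layer.

    Assumes formula (1) follows the standard FPN level-assignment rule:
    k = floor(k0 + log2(sqrt(W * H) / sqrt(W0 * H0))), where W, H are the
    width/height of the second mask map, W0, H0 the size specified by the
    target detection network, and k0 a preset hyperparameter (e.g. 4).
    """
    return math.floor(k0 + math.log2(math.sqrt(W * H) / math.sqrt(W0 * H0)))
```

For example, a 56*56 mask map on a network whose specified input size is 224*224 maps to layer k = 2, a shallow (bottom) layer, consistent with the text's emphasis on bottom-layer feature maps.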
  • the numbers in Figure 2 represent the number of channels of the FPN and the sizes (width and height) of the feature maps.
  • 256 and KA represent numbers of FPN channels, and W and H represent the width and height of the feature map, respectively.
  • the sizes of feature maps mentioned later likewise refer to their width and height, and will not be described again.
  • the device for determining position reliability of target detection expands the first mask map of each candidate frame to obtain a second mask map of each candidate frame.
  • the first mask map may be expanded in a preset padding manner. For example, five pixels with zero values may be added on each of the top, bottom, left, and right sides of the first mask map to obtain the second mask map.
  • other expansion modes are also possible, and this application does not limit the expansion modes.
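The padding expansion described above can be sketched with NumPy; the pad width of five is just the example value from the text:

```python
import numpy as np

def expand_mask(first_mask, pad=5):
    # Add `pad` rows/columns of zero-valued pixels on the top, bottom,
    # left and right of the first mask map to obtain the second mask map.
    return np.pad(first_mask, pad_width=pad, mode="constant", constant_values=0)
```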
  • the target detection and location reliability determination device intercepts N second feature maps from the N first feature maps, where the N second feature maps and the N first feature maps are in one-to-one correspondence.
  • each of the N second feature maps has the same size as the second mask map of each candidate frame, and the region of each second feature map in the image to be detected is the same as the region selected by each candidate frame in the image to be detected; N is an integer greater than 1. Therefore, the N second feature maps can be cut out from the N first feature maps according to the region framed by each candidate frame.
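A minimal sketch of the interception step, assuming the candidate-frame region has already been mapped to feature-map coordinates (the `(x1, y1, x2, y2)` box layout is an illustrative choice, not fixed by the text):

```python
import numpy as np

def intercept_second_feature_maps(first_feature_maps, box):
    # Cut the same candidate-frame region out of each of the N first
    # feature maps, yielding N second feature maps of identical size.
    x1, y1, x2, y2 = box
    return [fm[y1:y2, x1:x2] for fm in first_feature_maps]
```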
  • the target detection localization confidence determination device splices (stacks along the channel dimension, i.e., vertically splices) the second mask map of each candidate frame with the N second feature maps to obtain the first target feature map corresponding to each candidate frame, where the first target feature map is a three-dimensional matrix with dimensions W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate frame and of each second feature map.
  • the present application does not limit the splicing order of the second mask map and the N second feature maps.
  • the second mask map may also be spliced between two second feature maps, and so on.
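The splicing itself can be sketched as stacking along a new channel axis; the ordering below (mask first) is one of the orders the text allows:

```python
import numpy as np

def splice(second_mask, second_feature_maps):
    # Stack the W1*H1 second mask map and the N W1*H1 second feature maps
    # into one W1 x H1 x (N+1) three-dimensional matrix.
    return np.stack([second_mask] + list(second_feature_maps), axis=-1)
```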
  • the target detection location reliability determination device determines the location reliability of each candidate frame according to the first target feature map corresponding to each candidate frame.
  • specifically, the first target feature map of each candidate frame is input into a fully convolutional network, which performs feature extraction on the first target feature map (feature extraction is performed on multiple channels) to obtain the second target feature map of each candidate frame (one second target feature map is obtained on each channel); then, two convolutional layers with a kernel size of 1*1 successively perform convolution on the second target feature map of each candidate frame (i.e., convolution is performed on the second target feature maps on the multiple channels respectively) to obtain the feature vector of each candidate frame; finally, the feature vector of each candidate frame is normalized to obtain the localization confidence of each candidate frame.
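The confidence head can be sketched as follows. The weights `w1` and `w2` stand in for the two learned 1*1 convolutions (a 1*1 convolution is a per-pixel matrix multiply over the channel axis), and the mean pooling and sigmoid are plausible choices for collapsing and normalizing, not details fixed by the text:

```python
import numpy as np

def localization_confidence(first_target_map, w1, w2):
    """first_target_map: W1 x H1 x (N+1); w1: (N+1) x C; w2: C x 1."""
    h = np.maximum(first_target_map @ w1, 0.0)   # 1*1 conv + ReLU
    h = h @ w2                                   # second 1*1 conv
    score = h.mean()                             # pool to a single value
    return 1.0 / (1.0 + np.exp(-score))          # normalize into [0, 1]
```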
  • it can be seen that the first mask map is first expanded to increase its size, and only then are the second feature maps intercepted.
  • as a result, information surrounding the candidate frame is also cut out, yielding global information related to the candidate frame; in addition, the second feature maps are cut from the first feature maps at the bottom of the network, and because the depth of the bottom network layer is relatively shallow, the first feature maps contain a large amount of spatial position information; therefore, the extracted second feature maps contain a large amount of spatial position information for the region selected by the candidate frame.
  • moreover, the second feature maps and the mask map are spliced to determine the localization confidence, rather than the mask map alone being used, which enriches the information available; because the first target feature map obtained by splicing includes both the spatial position information of the region selected by the candidate frame and the global information of that region, the accuracy of the localization confidence is improved.
  • further, the classification confidence of each candidate frame can be obtained, where the classification confidence is produced in the process of performing target detection on the image to be detected and will not be described again; then, the target confidence of each candidate frame is determined according to the classification confidence and the localization confidence of each candidate frame.
  • NMS (Non-Maximum Suppression)
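A plain NMS sketch over target confidences; how the classification and localization confidences are combined into a target confidence (e.g. by multiplication) is not fixed here and is an assumption:

```python
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, drop any box overlapping it beyond
    # `thresh`, and repeat on the remaining boxes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```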
  • the image to be detected may be a face image
  • the target detection may be to detect a face region in the face image. Therefore, the target framed in each candidate frame is the face area in the face image. Since the target detection accuracy of the present application is relatively high, the accuracy of the face region framed by the target candidate frame is relatively high.
  • the to-be-detected image may be a vehicle image
  • the target detection may be to detect a vehicle area in the vehicle image. Therefore, the target framed by each candidate frame is the vehicle area in the vehicle image. Since the target detection accuracy of the present application is relatively high, the accuracy of the vehicle area framed by the target candidate frame is relatively high.
  • the image to be detected may be a medical image
  • the target detection may be to detect a target in the medical image
  • the target in the medical image may be a lesion area, an organ, a tissue, etc.
  • this application takes the detection of a lesion area in a medical image as an example for description, and the detection methods of other targets are similar to this, and will not be described again. Since the target detection accuracy of the present application is relatively high, the accuracy of the lesion area framed by the target candidate frame is relatively high.
  • the implementation of performing target detection on the medical image may be: inputting the medical image into the target detection network to obtain the third feature map of the medical image; then obtaining the medical knowledge graph corresponding to the medical field to which the medical image belongs, where the medical knowledge graph is a graph composed of medical entities and their corresponding descriptions.
  • for example, the medical knowledge graph of gastric cancer includes the medical entity "gastric cancer" with the corresponding description "gastric cancer is a malignant tumor originating from the gastric mucosal epithelium", together with related content such as the differentiation of gastric cancer, gastric cancer symptoms, and the diffusion and metastasis pathways of gastric cancer.
  • the medical knowledge graph is then vectorized to obtain the first feature vector corresponding to it. Because the medical knowledge graph is essentially a relationship composed of multiple medical texts, each medical text contained in the graph can be vectorized by a method similar to word embedding to obtain the text vector corresponding to that text; finally, the multiple text vectors corresponding to the multiple medical texts are spliced to obtain the first feature vector corresponding to the medical knowledge graph.
  • the first feature vector is copied and combined multiple times to obtain a first three-dimensional matrix.
  • the data of each layer in the first three-dimensional matrix is the same, and the dimensions of each layer are the same as those of the third feature map of the medical image, that is, W2 and H2; this is equivalent to copying the first feature vector W2 times horizontally and H2 times vertically. Then, the first three-dimensional matrix is spliced with the third feature map of the medical image to obtain a fourth feature map.
  • the splicing method is similar to the splicing method in FIG. 3 and will not be described again.
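The copy-and-combine step (FIG. 4) amounts to tiling the vector; a sketch, with the output laid out as width * height * channels:

```python
import numpy as np

def tile_vector(vec, W, H):
    # Copy a length-C feature vector W times horizontally and H times
    # vertically, producing a W x H x C matrix whose layers are identical.
    return np.tile(np.asarray(vec), (W, H, 1))
```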
  • target detection is performed according to the fourth feature map to obtain at least one candidate frame corresponding to the lesion in the medical image and a first mask map corresponding to each candidate frame.
  • it can be seen that the medical knowledge graph corresponding to the medical image is spliced in. Because the medical knowledge graph describes information related to the targets of the medical image (for example, the location information of lesions), this is equivalent to adding prior knowledge to the target detection process, so that candidate frames will not appear at locations where they should not, improving the accuracy of target detection in medical images.
  • in practical applications, the region selected by a candidate frame is generally larger than the lesion. Therefore, the lesion in the target candidate frame can also be segmented based on the target candidate frame, so as to accurately display the lesion region within it. Specifically, feature extraction is performed on the image framed by the target candidate frame to obtain a fifth feature map; for example, the image framed by the target candidate frame can be cut out of the image to be detected and then input into a trained image segmentation network, which performs feature extraction on it to obtain the fifth feature map.
  • the image segmentation network can be a fully convolutional network (Fully Convolutional Networks, FCN).
  • the body part corresponding to the medical image is then encoded to obtain the second feature vector corresponding to that body part; for example, the GB2312 code of the Chinese name of the body part can be used as the second feature vector. Finally, the second feature vector is copied and combined multiple times to obtain a second three-dimensional matrix, where the manner of copying and combining the second feature vector is similar to that shown in FIG. 4 and will not be described again.
  • further, the position information of the target candidate frame in the image to be detected is encoded to obtain a third feature vector. The position information can be represented by the coordinate information of the target candidate frame, namely the pixel coordinates of its upper-left and lower-right corners, which are output by the target detection network during target detection. The pixel coordinates of the upper-left corner and the lower-right corner are then spliced, and the spliced coordinates are used as the third feature vector.
  • for example, if the pixel coordinates of the upper-left corner are (4, 3) and the pixel coordinates of the lower-right corner are (16, 9), the third feature vector obtained by splicing is (4, 3, 16, 9). Then, the third feature vector is copied and combined multiple times to obtain a third three-dimensional matrix.
  • the data of each layer in the third three-dimensional matrix is the same, the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map, and the manner of copying and combining the third feature vector is similar to that of FIG. 4 and will not be described again.
  • next, the second three-dimensional matrix, the third three-dimensional matrix and the fifth feature map are spliced to obtain a sixth feature map; the splicing method is similar to that shown in FIG. 3 and will not be described again. Then, the lesion in the target candidate frame is segmented according to the fully convolutional network and the sixth feature map.
  • it can be seen that the body part to which the medical image belongs is fused in, so that it can be known in advance, from prior knowledge, which regions of that body part may contain lesions and what kinds of lesions; then, by combining the position of the target candidate frame, it can be determined which region of the body part the target candidate frame belongs to, and thus which lesions may appear there and of what type. This reduces the false positive rate of lesion segmentation within the target candidate frame: lesions are not segmented in regions of the target candidate frame where they are unlikely to appear, and lesion types that do not belong to the target candidate frame are not segmented, thereby reducing the false positive rate of lesion segmentation.
  • FIG. 5 is a block diagram of functional units of a target detection and location reliability determination device provided by an embodiment of the present application.
  • the device 500 for determining the position reliability of target detection includes: an acquisition unit 501 and a processing unit 502, wherein:
  • an acquisition unit 501 configured to acquire an image to be detected
  • the processing unit 502 is configured to input the image to be detected into a target detection network for target detection, and obtain at least one candidate frame, the first mask map of each candidate frame, and N first feature maps, where the N first feature maps are the N feature maps on the N channels of the first network layer of the target detection network, the first network layer is located at the bottom of the target detection network, and N is an integer greater than 1;
  • the second mask map of each candidate frame is spliced with the N second feature maps to obtain the first target feature map corresponding to each candidate frame, wherein the first target feature map is A three-dimensional matrix with dimension W 1 *H 1 *(N+1), W 1 *H 1 is the second mask map of each candidate frame and the size of each second feature map;
  • the first target feature map corresponding to each candidate frame is used to determine the location reliability of each candidate frame.
  • the processing unit 502 before cutting out the N second feature maps from the N first feature maps, the processing unit 502 is further configured to:
  • the first network layer of the target detection network is determined according to the size of the second mask map of each candidate frame, the size specified by the target detection network, and the preset hyperparameters.
  • the processing unit in terms of determining the location reliability of each candidate frame according to the first target feature map corresponding to each candidate frame, is specifically configured to:
  • the second target feature map corresponding to each candidate frame is successively convolved through two convolution layers with a convolution kernel size of 1*1, to obtain a feature vector corresponding to each candidate frame;
  • the feature vector corresponding to each candidate frame is normalized to obtain the location reliability of each candidate frame.
  • the processing unit 502 is further configured to:
  • determine the target confidence of each candidate box according to the classification confidence and the localization confidence of each candidate box;
  • perform non-maximum suppression on the at least one candidate box to obtain a target candidate box among the at least one candidate box.
  • the image to be detected is a medical image.
  • in inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box and a first mask map of each of the at least one candidate box, the processing unit 502 is specifically configured to:
  • copy and combine the first feature vector multiple times to obtain a first three-dimensional matrix, where the data of every layer of the first three-dimensional matrix is the same and the size of each layer of the first three-dimensional matrix is the same as the size of the third feature map;
  • the processing unit 502 is further configured to:
  • encode the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copy and combine the third feature vector multiple times to obtain a third three-dimensional matrix, where the data of every layer of the third three-dimensional matrix is the same and the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map;
  • segment the target in the target candidate box.
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device includes a processor and a memory.
  • the electronic device may further include a transceiver.
  • an electronic device 600 includes a transceiver 601, a processor 602, and a memory 603, connected by a bus 604.
  • the memory 603 is used to store computer programs and data, and can transmit the stored data to the processor 602.
  • the processor 602 is used to read the computer program in the memory 603 to perform the following operations:
  • concatenate the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
  • determine the localization confidence of each candidate box according to its corresponding first target feature map.
  • before the N second feature maps are cropped from the N first feature maps, the processor 602 is further configured to read the computer program in the memory 603 to perform the following operations:
  • determine the first network layer of the object detection network according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
  • in determining the localization confidence of each candidate box according to its corresponding first target feature map, the processor 602 is specifically configured to perform the following operations:
  • convolve the second target feature map corresponding to each candidate box successively through two convolution layers with a 1*1 kernel to obtain a feature vector corresponding to each candidate box;
  • normalize the feature vector corresponding to each candidate box to obtain the localization confidence of each candidate box.
  • the processor 602 is further configured to read the computer program in the memory 603 to perform the following operations:
  • determine the target confidence of each candidate box according to the classification confidence and the localization confidence of each candidate box;
  • perform non-maximum suppression on the at least one candidate box to obtain a target candidate box among the at least one candidate box.
  • the image to be detected is a medical image.
  • the processor 602 is specifically configured to perform the following operations:
  • copy and combine the first feature vector multiple times to obtain a first three-dimensional matrix, where the data of every layer of the first three-dimensional matrix is the same and the size of each layer of the first three-dimensional matrix is the same as the size of the third feature map;
  • the processor 602 is further configured to read the computer program in the memory 603 to perform the following operations:
  • encode the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copy and combine the third feature vector multiple times to obtain a third three-dimensional matrix, where the data of every layer of the third three-dimensional matrix is the same and the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map;
  • segment the target in the target candidate box.
  • specifically, the transceiver 601 may be the acquisition unit 501 of the apparatus 500 for determining the localization confidence of object detection in the embodiment shown in FIG. 5, and the processor 602 may be the processing unit 502 of the apparatus 500 for determining the localization confidence of object detection in the embodiment shown in FIG. 5.
  • the apparatus for determining the localization confidence of object detection in this application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a handheld computer, a notebook computer, a mobile Internet device (MID), a wearable device, and the like.
  • the above apparatuses for determining the localization confidence of object detection are merely examples, not an exhaustive list; the apparatus includes but is not limited to those listed above.
  • in practical applications, the above apparatus for determining the localization confidence of object detection may further include an intelligent vehicle-mounted terminal, a computer device, and the like.
  • embodiments of the present application further provide a computer-readable storage medium storing a computer program, and the computer program is executed by a processor to implement some or all of the steps of any method for determining the localization confidence of object detection as described in the foregoing method embodiments.
  • the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.
  • embodiments of the present application further provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method for determining the localization confidence of object detection as described in the foregoing method embodiments.
  • the disclosed apparatus may be implemented in other manners.
  • the apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical or in other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of a software program module.
  • if the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory.
  • a computer device, which may be a personal computer, a server, a network device, etc.
  • the aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.


Abstract

A method, apparatus, electronic device, and storage medium for determining the localization confidence of object detection, relating to the field of artificial intelligence. The method includes: acquiring an image to be detected (101); inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps (102); expanding the first mask map of each candidate box to obtain a second mask map of each candidate box (103); cropping N second feature maps from the N first feature maps (104); concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box (105); and determining the localization confidence of each candidate box according to its corresponding first target feature map (106). This improves the accuracy of the localization confidence.

Description

Method, apparatus, electronic device, and storage medium for determining the localization confidence of object detection
This application claims priority to Chinese patent application No. 202011572377.6, filed with the China National Intellectual Property Administration on December 25, 2020 and entitled "Method, apparatus, electronic device, and storage medium for determining the localization confidence of object detection", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method, apparatus, electronic device, and storage medium for determining the localization confidence of object detection.
Background
Object detection separates objects from the background in a complex image and indicates the class of each object and the position of its bounding box; it is one of the fundamental tasks of computer vision. As a general preprocessing paradigm, object detection is widely used in many fields: image segmentation tasks feed the box information produced by object detection into a segmentation network, image captioning tasks feed the region-of-interest features obtained by object detection into an encoder-decoder structure, and object tracking tasks analyze the box positions obtained by object detection.
The inventors found that most current object detection methods use the classification confidence as the score of a candidate box. However, in the information quantified into a candidate box, classification quality is not correlated with localization quality. To improve the accuracy of evaluating candidate boxes, a fully connected neural network branch has been added to two-stage object detection models to evaluate the localization accuracy of candidate boxes. The inventors realized, however, that this evaluation method can only be applied to two-stage object detection models, and that its evaluation of localization accuracy depends on a preceding pooling operation, so the information used to evaluate localization accuracy is limited and one-sided.
Existing object detection therefore determines the localization accuracy of candidate boxes in a single, low-accuracy way.
Summary
Embodiments of this application provide a method, apparatus, electronic device, and storage medium for determining the localization confidence of object detection, which improve the accuracy of evaluating the localization confidence of candidate boxes by concatenating bottom-layer feature maps.
In a first aspect, an embodiment of this application provides a method for determining the localization confidence of object detection, including:
acquiring an image to be detected;
inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
expanding the first mask map of each candidate box to obtain a second mask map of each candidate box;
cropping N second feature maps from the N first feature maps;
concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
In a second aspect, an embodiment of this application provides an apparatus for determining the localization confidence of object detection, including:
an acquisition unit, configured to acquire an image to be detected;
a processing unit, configured to: input the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1; expand the first mask map of each candidate box to obtain a second mask map of each candidate box; crop N second feature maps from the N first feature maps; concatenate the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map; and determine the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
In a third aspect, an embodiment of this application provides an electronic device, including a processor connected to a memory, where the memory is used to store a computer program and the processor is used to execute the computer program stored in the memory, so that the electronic device performs the following method:
acquiring an image to be detected;
inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
expanding the first mask map of each candidate box to obtain a second mask map of each candidate box;
cropping N second feature maps from the N first feature maps;
concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program causes a computer to perform the following method:
acquiring an image to be detected;
inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
expanding the first mask map of each candidate box to obtain a second mask map of each candidate box;
cropping N second feature maps from the N first feature maps;
concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
In the embodiments of this application, after the first mask map of each candidate box is obtained, the first mask map is first expanded, enlarging its extent, so that when the second feature maps are later cropped, the surroundings of the candidate box are also cropped out, capturing global information related to the candidate box. In addition, the second feature maps are cropped from the first feature maps at the bottom of the network; because the bottom network layers are shallow, these first feature maps contain a large amount of spatial position information, so the cropped second feature maps also contain a large amount of spatial position information corresponding to the region framed by each candidate box. The second feature maps are then concatenated with the mask map to determine the localization confidence, rather than using the mask map alone, which enriches the information used for the determination; because the concatenated first target feature map contains both the spatial position information of the region framed by the candidate box and the global information of that region, the accuracy of the localization confidence is improved.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for determining the localization confidence of object detection provided by an embodiment of this application;
FIG. 2 is a schematic diagram of obtaining mask maps and cropping feature maps provided by an embodiment of this application;
FIG. 3 is a schematic diagram of concatenating N second feature maps with a second mask map provided by an embodiment of this application;
FIG. 4 is a schematic diagram of copying and combining a feature vector provided by an embodiment of this application;
FIG. 5 is a block diagram of the functional units of an apparatus for determining the localization confidence of object detection provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of an apparatus for determining the localization confidence of object detection provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The terms "first", "second", "third", and "fourth" in the specification, the claims, and the drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The technical solution of this application may relate to the field of artificial intelligence and may be applied to scenarios such as image processing for region extraction, thereby improving the accuracy of evaluating the localization confidence of candidate boxes and promoting the construction of smart cities. Optionally, the data involved in this application, such as the various images and/or localization confidences, may be stored in a database or in a blockchain, which is not limited by this application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for determining the localization confidence of object detection provided by an embodiment of this application. The method is applied to an apparatus for determining the localization confidence of object detection, and includes the following steps:
101: The apparatus for determining the localization confidence of object detection acquires an image to be detected.
Illustratively, the image to be detected may be of various types. For example, it may be a face image, in which case object detection detects the region of the face in the image; or it may be a medical image, in which case object detection detects the region of the lesion in the image. This application does not limit the type of the image to be detected.
102: The apparatus inputs the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1.
The N channels correspond one-to-one to the N first feature maps.
Illustratively, the object detection network may be a trained neural network, for example a Fast-CNN network, a Fast-RCNN network, a YOLO network, an SSD network, an IoU-Net network, and so on. This application takes an IoU-Net network as an example to describe the process of determining the localization confidence, and the IoU-Net network is built on a Feature Pyramid Network (FPN) backbone.
Illustratively, as shown in FIG. 2, features of the image to be detected are extracted by the FPN, and the feature map obtained at each pyramid level is fed into two branch networks, a classification branch (class subnet) and a box branch (box subnet), which localize candidate boxes and classify the objects in them, yielding at least one candidate box and a first mask map for each candidate box, where the first mask map of each candidate box represents the probability that pixels in the image region corresponding to the box belong to the object; for example, a pixel with value 0 does not belong to the object, and a pixel with value 1 belongs to the object.
In addition, during object detection on the image to be detected, features at different scales are extracted through each network layer of the object detection network. The N first feature maps are therefore the N feature maps output by the feature extraction performed on the N channels of the first network layer.
Illustratively, since the object detection network includes multiple network layers and each network layer includes multiple channels, the size specified by the object detection network, i.e. the size required of images input to the network, is obtained first. It should be understood that if the size of the image to be detected does not meet the size specified by the object detection network, the image needs to be padded or cropped so that it does. The first network layer of the object detection network is then determined according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
In general, the first network layer is located at the bottom of the object detection network: because these layers are shallow, the extracted feature maps contain a large amount of spatial position information from the original image (the image to be detected), so the first target feature map obtained by the subsequent concatenation also carries abundant spatial position information, which makes the resulting localization confidence of the candidate boxes more accurate.
Illustratively, the first network layer may be expressed by formula (1):
Figure PCTCN2021096419-appb-000001
where k denotes the position of the first network layer among the multiple network layers, that is, the k-th network layer of the object detection network is taken as the first network layer; k 0 is a preset hyperparameter, which may be set to 4, for example; W and H are the width and height of the second mask map; and W 0 and H 0 are the size specified by the object detection network.
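Since the image of formula (1) is not reproduced here, the layer-selection rule can be sketched in the style of the standard FPN RoI-assignment formula using the variable names from the text; the exact form of the patent's equation (in particular the logarithmic ratio of mask size to specified size) is an assumption for illustration:

```python
import math

def select_layer(w, h, w0, h0, k0=4, k_min=2, k_max=5):
    """Pick the pyramid layer index k for a second mask map of size (w, h).

    Assumed FPN-style rule: k = floor(k0 + log2(sqrt(w*h) / sqrt(w0*h0))),
    where (w0, h0) is the input size specified by the detection network and
    k0 is the preset hyperparameter; k is clamped to the valid layer range.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / math.sqrt(w0 * h0)))
    return max(k_min, min(k_max, k))

# A mask much smaller than the specified input size maps to a shallow (bottom) layer.
print(select_layer(56, 56, 800, 800))    # small box -> low k
print(select_layer(800, 800, 800, 800))  # full-size box -> k0
```

Under this rule, small candidate boxes are routed to the shallow, high-resolution layers that carry the most spatial position information, which matches the text's motivation for using bottom layers.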
It should be understood that the numbers in FIG. 2 denote the channel counts of the FPN and the sizes (width and height) of the feature maps; for example, 256 and KA denote FPN channel counts of 256 and KA, and W and H denote the width and height of a feature map. Wherever the size of a feature map is mentioned below, it means the width and height of the feature map, and this is not repeated.
103: The apparatus expands the first mask map of each candidate box to obtain the second mask map of each candidate box.
Illustratively, the first mask map may be expanded in a preset padding manner; for example, five pixels with value zero may be added on each of the four sides of the first mask map to obtain the second mask map. Of course, other expansion manners are possible, and this application does not limit the expansion manner.
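The expansion step above can be sketched with NumPy; the five-pixel zero border is the example given in the text, not a fixed requirement:

```python
import numpy as np

def expand_mask(mask, pad=5):
    """Expand a first mask map into a second mask map by zero-padding
    `pad` pixels on each of the four sides (top, bottom, left, right)."""
    return np.pad(mask, pad_width=pad, mode="constant", constant_values=0)

first_mask = np.ones((4, 6), dtype=np.float32)   # toy 4x6 mask
second_mask = expand_mask(first_mask)
print(second_mask.shape)  # (14, 16): each side grew by 5 pixels
```

The enlarged mask is what makes the later crop take in the surroundings of the candidate box, i.e. the global context the text refers to.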
104: The apparatus crops N second feature maps from the N first feature maps, where the N second feature maps correspond one-to-one to the N first feature maps.
Each of the N second feature maps has the same size as the second mask map of each candidate box, and the region of each second feature map in the image to be detected is the same as the region framed by that candidate box in the image to be detected; N is an integer greater than 1. Therefore, the N second feature maps can be cropped from the N first feature maps according to the region framed by each candidate box.
105: The apparatus concatenates the second mask map of each candidate box with the N second feature maps to obtain the first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map.
Illustratively, the second mask map of each candidate box and the N second feature maps are concatenated (stacked channel-wise), yielding the first target feature map of each candidate box. As shown in FIG. 3, this produces a three-dimensional matrix of dimension W1*H1*(N+1), where W1*H1 is the size of the second mask map of each candidate box and of each second feature map. It should be understood that this application does not limit the concatenation order of the second mask map and the N second feature maps; for example, the second mask map may also be inserted between two second feature maps, and so on.
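A minimal NumPy sketch of step 105: stacking the second mask map with the N cropped second feature maps along the channel axis gives a W1*H1*(N+1) tensor (placing the mask last is one of the orders the text allows; the toy sizes are illustrative):

```python
import numpy as np

W1, H1, N = 16, 16, 256  # toy sizes; N is the channel count of the first network layer

second_mask = np.random.rand(W1, H1)       # second mask map of one candidate box
second_feats = np.random.rand(W1, H1, N)   # N cropped second feature maps

# Concatenate along the channel axis: the mask becomes the (N+1)-th channel.
first_target = np.concatenate([second_feats, second_mask[..., None]], axis=-1)
print(first_target.shape)  # (16, 16, 257), i.e. W1 * H1 * (N + 1)
```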
106: The apparatus determines the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
Illustratively, as shown in FIG. 2, after the first target feature map of each candidate box is obtained, it is input into a fully convolutional network, which extracts features from it (on multiple channels separately) to obtain the second target feature map of each candidate box (one per channel); then the second target feature map of each candidate box is convolved successively through two convolution layers with a 1*1 kernel (i.e. the second target feature maps on the channels are convolved separately) to obtain a feature vector for each candidate box, and the feature vector of each candidate box is normalized to obtain the localization confidence of each candidate box.
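Since a 1*1 convolution acts as an independent linear map at every spatial position, the confidence head described above can be sketched in plain NumPy; the channel widths, the ReLU between the two convolutions, the spatial pooling, and the sigmoid used for the final normalization are assumptions for illustration, not details fixed by the text:

```python
import numpy as np

def conv1x1(x, w):
    """1*1 convolution: per-pixel linear map from C_in to C_out channels.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return np.einsum("hwc,cd->hwd", x, w)

rng = np.random.default_rng(0)
feat = rng.standard_normal((7, 7, 32))   # second target feature map of one box (toy)
w1 = rng.standard_normal((32, 8)) * 0.1  # first 1*1 convolution
w2 = rng.standard_normal((8, 1)) * 0.1   # second 1*1 convolution

h = np.maximum(conv1x1(feat, w1), 0.0)   # ReLU between the two convolutions (assumed)
v = conv1x1(h, w2).mean(axis=(0, 1))     # collapse to a feature vector (assumed pooling)
loc_conf = 1.0 / (1.0 + np.exp(-v))      # normalize into (0, 1)
print(loc_conf.shape)                    # (1,)
```

The essential point mirrored from the text is that the head is fully convolutional end to end, so no RoI pooling step limits the information used to score localization.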
It can be seen that, in this embodiment of the application, after the first mask map of each candidate box is obtained, the first mask map is first expanded, enlarging its extent, so that when the second feature maps are cropped, the surroundings of the candidate box are also cropped out, capturing global information related to the candidate box. In addition, the second feature maps are cropped from the bottom of the network; because the bottom network layers are shallow, the first feature maps contain a large amount of spatial position information, so the cropped second feature maps also contain a large amount of spatial position information about the region framed by the candidate box. The second feature maps are then concatenated with the mask map to determine the localization confidence, rather than using the mask map alone, which enriches the information used; because the concatenated first target feature map contains both the spatial position information of the region framed by the candidate box and the global information of that region, the accuracy of the localization confidence is improved.
In one implementation of this application, after the localization confidence of each candidate box is determined, the classification confidence of each candidate box can be obtained; it is produced during object detection on the image to be detected and is not described again. The target confidence of each candidate box is then determined from its classification confidence and localization confidence, i.e. the classification confidence of each candidate box is multiplied by its localization confidence to obtain its target confidence. Finally, non-maximum suppression (NMS) is performed on the at least one candidate box according to the target confidence of each box to obtain the target candidate box among the at least one candidate box, i.e. the candidate boxes are filtered and boxes that repeatedly frame the same object are removed. It should be understood that, because the determined localization confidence is highly accurate, the resulting target confidence is also relatively accurate, so the retained target candidate boxes are accurate as well, which improves the accuracy of object detection.
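A hedged sketch of this post-processing: the target confidence is the product of classification and localization confidence, and standard IoU-based non-maximum suppression is run on that combined score (the 0.5 IoU threshold is an assumed default, not stated in the text):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, cls_conf, loc_conf, iou_thr=0.5):
    """Keep candidate boxes greedily by descending target confidence
    = cls_conf * loc_conf, suppressing overlapping duplicates."""
    scores = np.asarray(cls_conf) * np.asarray(loc_conf)
    order = np.argsort(-scores)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.95, 0.8], [0.9, 0.5, 0.9]))  # duplicate of box 0 suppressed
```

Note how box 1, despite the highest classification confidence, loses to box 0 once localization confidence enters the product: this is exactly the decoupling of classification quality from localization quality that motivates the method.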
In one implementation of this application, the image to be detected may be a face image, and object detection may detect the face region in the face image. The object framed by each candidate box is then a face region. Because the object detection of this application is highly accurate, the face region framed by the target candidate box is accurate.
In one implementation of this application, the image to be detected may be a vehicle image, and object detection may detect the vehicle region in the vehicle image. The object framed by each candidate box is then a vehicle region. Because the object detection of this application is highly accurate, the vehicle region framed by the target candidate box is accurate.
In one implementation of this application, the image to be detected may be a medical image, and object detection may detect the object in the medical image, where the object may be a lesion region, an organ, a tissue, and so on. This application takes the detection of lesion regions in medical images as an example; other objects are detected similarly and are not described again. Because the object detection of this application is highly accurate, the lesion region framed by the target candidate box is accurate.
In one implementation of this application, when the image to be detected is a medical image, object detection on the medical image may be implemented as follows: the medical image is input into the object detection network to obtain a third feature map of the medical image; then a medical knowledge graph corresponding to the medical field to which the medical image belongs is obtained, where the medical knowledge graph is a graph composed of medical entities, the descriptions corresponding to those entities (i.e. explanations of each medical entity), and the medical plans corresponding to those entities. For example, a gastric cancer knowledge graph includes the medical entity "gastric cancer", whose description is "gastric cancer is a malignant tumor originating from the gastric mucosal epithelium", and whose medical plan covers the differentiation of gastric cancer, its symptoms, its routes of spread and metastasis, and so on. The medical knowledge graph is then vectorized to obtain a first feature vector corresponding to the medical knowledge graph. Since a medical knowledge graph is in essence a relation composed of multiple medical texts, each medical text it contains can be vectorized separately by a word-embedding-like method to obtain a text vector for each medical text; for example, the "entities" and treatment plans in the graph are vectorized separately. Finally, the multiple text vectors corresponding to the multiple medical texts are concatenated to obtain the first feature vector corresponding to the medical knowledge graph.
Further, after the first feature vector corresponding to the medical knowledge graph is obtained, it is copied and combined multiple times to obtain a first three-dimensional matrix. As shown in FIG. 4, the first feature vector is copied and combined into a first three-dimensional matrix in which the data of every layer is the same and the size of each layer equals the size of the third feature map of the medical image, i.e. each layer has size W2 and H2; this is equivalent to copying the first feature vector W2 times horizontally and H2 times vertically. The first three-dimensional matrix is then concatenated with the third feature map of the medical image to obtain a fourth feature map, in a manner similar to the concatenation in FIG. 3, which is not described again. Finally, object detection is performed according to the fourth feature map to obtain at least one candidate box corresponding to the lesion in the medical image and the first mask map corresponding to each candidate box.
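The copy-and-combine operation of FIG. 4 amounts to tiling a length-C vector across every spatial position of a W2*H2 grid; a minimal sketch with toy sizes (the embedding and feature map contents are placeholders):

```python
import numpy as np

def tile_vector(vec, w, h):
    """Copy a feature vector w times horizontally and h times vertically into
    a 3-D matrix of shape (h, w, len(vec)); every position holds the same data."""
    return np.tile(np.asarray(vec), (h, w, 1))

first_vec = np.array([0.1, 0.2, 0.3])  # vectorized knowledge-graph embedding (toy)
third_feat = np.zeros((8, 8, 64))      # third feature map of the medical image (toy)

m = tile_vector(first_vec, third_feat.shape[1], third_feat.shape[0])
fourth_feat = np.concatenate([third_feat, m], axis=-1)  # splice as in FIG. 3
print(m.shape, fourth_feat.shape)  # (8, 8, 3) (8, 8, 67)
```

Tiling followed by channel concatenation is a common way to inject a global embedding into every spatial location of a convolutional feature map.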
It can be seen that, in this embodiment of the application, the medical knowledge graph corresponding to the medical image is fused in during object detection on the medical image. Because the knowledge graph describes information related to the object in the medical image (for example, the position of the lesion), this amounts to injecting prior knowledge into the object detection process, so candidate boxes do not appear in positions where they should not, improving the accuracy of object detection on medical images.
In one implementation of this application, after the target candidate box corresponding to the lesion is obtained, since the region framed by a candidate box is generally larger than the lesion, the lesion in the target candidate box can further be segmented based on the box to display the lesion region precisely. Specifically, features are extracted from the image framed by the target candidate box to obtain a fifth feature map; for example, the framed image can be cropped from the image to be detected and input into a trained image segmentation network, which extracts its features to obtain the fifth feature map, where the image segmentation network may be a fully convolutional network (FCN). The body part corresponding to the medical image is encoded to obtain a second feature vector corresponding to that body part; the body part can be mapped to obtain this vector, for example, the GB2312 code of the Chinese name of the body part can be used as its second feature vector. Finally, the second feature vector is copied and combined multiple times to obtain a second three-dimensional matrix, where the copying and combining is similar to the manner shown in FIG. 4 and is not described again.
Further, the position information of the target candidate box in the image to be detected is encoded to obtain a third feature vector. The position information can be reflected by the coordinate information of the target candidate box, for example the pixel coordinates of its top-left corner and bottom-right corner, which can be output by the object detection network during detection. The top-left and bottom-right pixel coordinates are concatenated, and the concatenated coordinates serve as the third feature vector. For example, if the top-left pixel coordinates are (4, 3) and the bottom-right pixel coordinates are (16, 9), the concatenated third feature vector is (4, 3, 16, 9). The third feature vector is then copied and combined multiple times to obtain a third three-dimensional matrix; likewise, the data of every layer of the third three-dimensional matrix is the same, the size of each layer equals the size of the fifth feature map, and the copying and combining is similar to the manner of FIG. 4, which is not described again.
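The position encoding above can be sketched directly from the example in the text: the top-left and bottom-right pixel coordinates of the target candidate box are concatenated into the third feature vector, which is then tiled into the third three-dimensional matrix (the 12*12 fifth-feature-map size is a toy assumption):

```python
import numpy as np

def encode_box_position(top_left, bottom_right):
    """Concatenate the corner coordinates into the third feature vector."""
    return np.array([*top_left, *bottom_right], dtype=np.float32)

vec = encode_box_position((4, 3), (16, 9))  # third feature vector (4, 3, 16, 9)

# Tile to match the size of the fifth feature map (toy size 12x12).
third_matrix = np.tile(vec, (12, 12, 1))
print(third_matrix.shape)  # (12, 12, 4)
```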
Finally, the second three-dimensional matrix, the third three-dimensional matrix, and the fifth feature map are concatenated to obtain a sixth feature map, in a manner similar to that shown in FIG. 3, which is not described again; the lesion in the target candidate box is then segmented according to the fully convolutional network and the sixth feature map.
It can be seen that, in this embodiment of the application, the body part to which the medical image belongs is fused in when segmenting the lesion in the target candidate box, so that prior knowledge indicates in advance which regions of that body part can contain lesions and of what kind; by further fusing in the position of the target candidate box, the region of the body part to which the box belongs can be determined, and therefore which regions of the box can contain lesions and their types. This lowers the false positive rate of lesion segmentation for the lesion in the target candidate box: lesions are not segmented in regions of the target candidate box where lesions cannot appear, nor are lesion types segmented that do not belong to the target candidate box, thereby reducing the false positive rate of lesion segmentation.
Referring to FIG. 5, FIG. 5 is a block diagram of the functional units of an apparatus for determining the localization confidence of object detection provided by an embodiment of this application. The apparatus 500 for determining the localization confidence of object detection includes an acquisition unit 501 and a processing unit 502, wherein:
the acquisition unit 501 is configured to acquire an image to be detected;
the processing unit 502 is configured to input the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
expand the first mask map of each candidate box to obtain the second mask map of each candidate box;
crop N second feature maps from the N first feature maps;
concatenate the second mask map of each candidate box with the N second feature maps to obtain the first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map; and determine the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
In some possible implementations, before cropping the N second feature maps from the N first feature maps, the processing unit 502 is further configured to:
obtain the size specified by the object detection network;
determine the first network layer of the object detection network according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
In some possible implementations, in determining the localization confidence of each candidate box according to its corresponding first target feature map, the processing unit is specifically configured to:
extract features from the first target feature map corresponding to each candidate box to obtain the second target feature map corresponding to each candidate box;
convolve the second target feature map corresponding to each candidate box successively through two convolution layers with a 1*1 kernel to obtain the feature vector corresponding to each candidate box;
normalize the feature vector corresponding to each candidate box to obtain the localization confidence of each candidate box.
In some possible implementations, after the localization confidence of each candidate box is determined according to the target feature map, the processing unit 502 is further configured to:
obtain the classification confidence of each candidate box;
determine the target confidence of each candidate box according to the classification confidence of each candidate box and the localization confidence of each candidate box;
perform non-maximum suppression on the at least one candidate box according to the target confidence of each candidate box to obtain the target candidate box among the at least one candidate box.
In some possible implementations, the image to be detected is a medical image.
In some possible implementations, in inputting the image to be detected into the object detection network for object detection to obtain at least one candidate box and the first mask map of each of the at least one candidate box, the processing unit 502 is specifically configured to:
input the medical image into the object detection network to obtain a third feature map of the medical image;
obtain the medical knowledge graph corresponding to the medical field to which the medical image belongs;
vectorize the medical knowledge graph to obtain a first feature vector corresponding to the medical knowledge graph;
copy and combine the first feature vector multiple times to obtain a first three-dimensional matrix, where the data of every layer of the first three-dimensional matrix is the same and the size of each layer of the first three-dimensional matrix is the same as the size of the third feature map;
concatenate the first three-dimensional matrix with the third feature map of the medical image to obtain a fourth feature map, and perform object detection on the medical image according to the fourth feature map to obtain at least one candidate box corresponding to the object in the medical image and the first mask map corresponding to each candidate box.
In some possible implementations, the processing unit 502 is further configured to:
extract features from the image framed by the target candidate box to obtain a fifth feature map;
encode the body part corresponding to the medical image to obtain a second feature vector corresponding to the body part, and copy and combine the second feature vector multiple times to obtain a second three-dimensional matrix, where the data of every layer of the second three-dimensional matrix is the same and the size of each layer of the second three-dimensional matrix is the same as the size of the fifth feature map;
encode the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copy and combine the third feature vector multiple times to obtain a third three-dimensional matrix, where the data of every layer of the third three-dimensional matrix is the same and the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map;
concatenate the second three-dimensional matrix, the third three-dimensional matrix, and the fifth feature map to obtain a sixth feature map;
segment the object in the target candidate box according to the sixth feature map.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of this application. The electronic device includes a processor and a memory, and may optionally further include a transceiver. For example, as shown in FIG. 6, an electronic device 600 includes a transceiver 601, a processor 602, and a memory 603, connected to each other by a bus 604. The memory 603 is used to store computer programs and data, and can transmit the stored data to the processor 602.
The processor 602 is used to read the computer program in the memory 603 to perform the following operations:
controlling the transceiver 601 to acquire an image to be detected;
inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, where the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
expanding the first mask map of each candidate box to obtain the second mask map of each candidate box;
cropping N second feature maps from the N first feature maps;
concatenating the second mask map of each candidate box with the N second feature maps to obtain the first target feature map corresponding to each candidate box, where the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
In some possible implementations, before the N second feature maps are cropped from the N first feature maps, the processor 602 is further configured to read the computer program in the memory 603 to perform the following operations:
obtaining the size specified by the object detection network;
determining the first network layer of the object detection network according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
In some possible implementations, in determining the localization confidence of each candidate box according to its corresponding first target feature map, the processor 602 is specifically configured to perform the following operations:
extracting features from the first target feature map corresponding to each candidate box to obtain the second target feature map corresponding to each candidate box;
convolving the second target feature map corresponding to each candidate box successively through two convolution layers with a 1*1 kernel to obtain the feature vector corresponding to each candidate box;
normalizing the feature vector corresponding to each candidate box to obtain the localization confidence of each candidate box.
In some possible implementations, after the localization confidence of each candidate box is determined according to the target feature map, the processor 602 is further configured to read the computer program in the memory 603 to perform the following operations:
obtaining the classification confidence of each candidate box;
determining the target confidence of each candidate box according to the classification confidence of each candidate box and the localization confidence of each candidate box;
performing non-maximum suppression on the at least one candidate box according to the target confidence of each candidate box to obtain the target candidate box among the at least one candidate box.
In some possible implementations, the image to be detected is a medical image.
In some possible implementations, in inputting the image to be detected into the object detection network for object detection to obtain at least one candidate box and the first mask map of each of the at least one candidate box, the processor 602 is specifically configured to perform the following operations:
inputting the medical image into the object detection network to obtain a third feature map of the medical image;
obtaining the medical knowledge graph corresponding to the medical field to which the medical image belongs;
vectorizing the medical knowledge graph to obtain a first feature vector corresponding to the medical knowledge graph;
copying and combining the first feature vector multiple times to obtain a first three-dimensional matrix, where the data of every layer of the first three-dimensional matrix is the same and the size of each layer of the first three-dimensional matrix is the same as the size of the third feature map;
concatenating the first three-dimensional matrix with the third feature map of the medical image to obtain a fourth feature map, and performing object detection on the medical image according to the fourth feature map to obtain at least one candidate box corresponding to the object in the medical image and the first mask map corresponding to each candidate box.
In some possible implementations, the processor 602 is further configured to read the computer program in the memory 603 to perform the following operations:
extracting features from the image framed by the target candidate box to obtain a fifth feature map;
encoding the body part corresponding to the medical image to obtain a second feature vector corresponding to the body part, and copying and combining the second feature vector multiple times to obtain a second three-dimensional matrix, where the data of every layer of the second three-dimensional matrix is the same and the size of each layer of the second three-dimensional matrix is the same as the size of the fifth feature map;
encoding the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copying and combining the third feature vector multiple times to obtain a third three-dimensional matrix, where the data of every layer of the third three-dimensional matrix is the same and the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map;
concatenating the second three-dimensional matrix, the third three-dimensional matrix, and the fifth feature map to obtain a sixth feature map;
segmenting the object in the target candidate box according to the sixth feature map.
Specifically, the transceiver 601 may be the acquisition unit 501 of the apparatus 500 for determining the localization confidence of object detection in the embodiment described in FIG. 5, and the processor 602 may be the processing unit 502 of the apparatus 500 in that embodiment.
It should be understood that the apparatus for determining the localization confidence of object detection in this application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a handheld computer, a notebook computer, a mobile Internet device (MID), a wearable device, and the like. The above apparatuses for determining the localization confidence of object detection are merely examples, not an exhaustive list; the apparatus includes but is not limited to those listed above. In practical applications, the apparatus may further include an intelligent vehicle-mounted terminal, a computer device, and the like.
Embodiments of this application further provide a computer-readable storage medium storing a computer program, and the computer program is executed by a processor to implement some or all of the steps of any method for determining the localization confidence of object detection as described in the foregoing method embodiments.
Optionally, the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.
Embodiments of this application further provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method for determining the localization confidence of object detection as described in the foregoing method embodiments.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of combinations of actions, but a person skilled in the art should know that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Secondly, a person skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by this application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for a part that is not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of this application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as a limitation on this application.

Claims (20)

  1. A method for determining the localization confidence of object detection, comprising:
    acquiring an image to be detected;
    inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, wherein the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
    expanding the first mask map of each candidate box to obtain a second mask map of each candidate box;
    cropping N second feature maps from the N first feature maps;
    concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, wherein the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
    determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
  2. The method according to claim 1, wherein before cropping the N second feature maps from the N first feature maps, the method further comprises:
    obtaining the size specified by the object detection network;
    determining the first network layer of the object detection network according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
  3. The method according to claim 1, wherein the determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box comprises:
    extracting features from the first target feature map corresponding to each candidate box to obtain a second target feature map corresponding to each candidate box;
    convolving the second target feature map corresponding to each candidate box successively through two convolution layers with a 1*1 kernel to obtain a feature vector corresponding to each candidate box;
    normalizing the feature vector corresponding to each candidate box to obtain the localization confidence of each candidate box.
  4. The method according to any one of claims 1-3, wherein after the localization confidence of each candidate box is determined according to the target feature map, the method further comprises:
    obtaining the classification confidence of each candidate box;
    determining the target confidence of each candidate box according to the classification confidence of each candidate box and the localization confidence of each candidate box;
    performing non-maximum suppression on the at least one candidate box according to the target confidence of each candidate box to obtain a target candidate box among the at least one candidate box.
  5. The method according to any one of claims 1-3, wherein the image to be detected comprises a medical image.
  6. The method according to claim 5, wherein the inputting the image to be detected into the object detection network for object detection to obtain at least one candidate box and the first mask map of each of the at least one candidate box comprises:
    inputting the medical image into the object detection network to obtain a third feature map of the medical image;
    obtaining a medical knowledge graph corresponding to the medical field to which the medical image belongs;
    vectorizing the medical knowledge graph to obtain a first feature vector corresponding to the medical knowledge graph;
    copying and combining the first feature vector multiple times to obtain a first three-dimensional matrix, wherein the data of every layer of the first three-dimensional matrix is the same and the size of each layer of the first three-dimensional matrix is the same as the size of the third feature map;
    concatenating the first three-dimensional matrix with the third feature map of the medical image to obtain a fourth feature map, and performing object detection on the medical image according to the fourth feature map to obtain at least one candidate box corresponding to the object in the medical image and the first mask map corresponding to each candidate box.
  7. The method according to claim 6, further comprising:
    extracting features from the image framed by the target candidate box to obtain a fifth feature map;
    encoding the body part corresponding to the medical image to obtain a second feature vector corresponding to the body part, and copying and combining the second feature vector multiple times to obtain a second three-dimensional matrix, wherein the data of every layer of the second three-dimensional matrix is the same and the size of each layer of the second three-dimensional matrix is the same as the size of the fifth feature map;
    encoding the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copying and combining the third feature vector multiple times to obtain a third three-dimensional matrix, wherein the data of every layer of the third three-dimensional matrix is the same and the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map;
    concatenating the second three-dimensional matrix, the third three-dimensional matrix, and the fifth feature map to obtain a sixth feature map;
    segmenting the object in the target candidate box according to the sixth feature map.
  8. An apparatus for determining the localization confidence of object detection, comprising:
    an acquisition unit, configured to acquire an image to be detected;
    a processing unit, configured to input the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, wherein the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
    expand the first mask map of each candidate box to obtain a second mask map of each candidate box;
    crop N second feature maps from the N first feature maps;
    concatenate the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, wherein the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
    determine the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
  9. An electronic device, comprising a processor and a memory, wherein the processor is connected to the memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory so that the electronic device performs the following method:
    acquiring an image to be detected;
    inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, wherein the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
    expanding the first mask map of each candidate box to obtain a second mask map of each candidate box;
    cropping N second feature maps from the N first feature maps;
    concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, wherein the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
    determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
  10. The electronic device according to claim 9, wherein before cropping the N second feature maps from the N first feature maps, the processor is further configured to perform:
    obtaining the size specified by the object detection network;
    determining the first network layer of the object detection network according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
  11. The electronic device according to claim 9, wherein performing the determining of the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box comprises:
    extracting features from the first target feature map corresponding to each candidate box to obtain a second target feature map corresponding to each candidate box;
    convolving the second target feature map corresponding to each candidate box successively through two convolution layers with a 1*1 kernel to obtain a feature vector corresponding to each candidate box;
    normalizing the feature vector corresponding to each candidate box to obtain the localization confidence of each candidate box.
  12. The electronic device according to any one of claims 9-11, wherein after the localization confidence of each candidate box is determined according to the target feature map, the processor is further configured to perform:
    obtaining the classification confidence of each candidate box;
    determining the target confidence of each candidate box according to the classification confidence of each candidate box and the localization confidence of each candidate box;
    performing non-maximum suppression on the at least one candidate box according to the target confidence of each candidate box to obtain a target candidate box among the at least one candidate box.
  13. The electronic device according to any one of claims 9-11, wherein the image to be detected comprises a medical image, and performing the inputting of the image to be detected into the object detection network for object detection to obtain at least one candidate box and the first mask map of each of the at least one candidate box comprises:
    inputting the medical image into the object detection network to obtain a third feature map of the medical image;
    obtaining a medical knowledge graph corresponding to the medical field to which the medical image belongs;
    vectorizing the medical knowledge graph to obtain a first feature vector corresponding to the medical knowledge graph;
    copying and combining the first feature vector multiple times to obtain a first three-dimensional matrix, wherein the data of every layer of the first three-dimensional matrix is the same and the size of each layer of the first three-dimensional matrix is the same as the size of the third feature map;
    concatenating the first three-dimensional matrix with the third feature map of the medical image to obtain a fourth feature map, and performing object detection on the medical image according to the fourth feature map to obtain at least one candidate box corresponding to the object in the medical image and the first mask map corresponding to each candidate box.
  14. The electronic device according to claim 13, wherein the processor is further configured to perform:
    extracting features from the image framed by the target candidate box to obtain a fifth feature map;
    encoding the body part corresponding to the medical image to obtain a second feature vector corresponding to the body part, and copying and combining the second feature vector multiple times to obtain a second three-dimensional matrix, wherein the data of every layer of the second three-dimensional matrix is the same and the size of each layer of the second three-dimensional matrix is the same as the size of the fifth feature map;
    encoding the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copying and combining the third feature vector multiple times to obtain a third three-dimensional matrix, wherein the data of every layer of the third three-dimensional matrix is the same and the size of each layer of the third three-dimensional matrix is the same as the size of the fifth feature map;
    concatenating the second three-dimensional matrix, the third three-dimensional matrix, and the fifth feature map to obtain a sixth feature map;
    segmenting the object in the target candidate box according to the sixth feature map.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following method:
    acquiring an image to be detected;
    inputting the image to be detected into an object detection network for object detection to obtain at least one candidate box, a first mask map of each candidate box, and N first feature maps, wherein the N first feature maps are the feature maps on the N channels of a first network layer of the object detection network, the first network layer is located at the bottom of the object detection network, and N is an integer greater than 1;
    expanding the first mask map of each candidate box to obtain a second mask map of each candidate box;
    cropping N second feature maps from the N first feature maps;
    concatenating the second mask map of each candidate box with the N second feature maps to obtain a first target feature map corresponding to each candidate box, wherein the first target feature map is a three-dimensional matrix of dimension W1*H1*(N+1), and W1*H1 is the size of the second mask map of each candidate box and of each second feature map;
    determining the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box.
  16. The computer-readable storage medium according to claim 15, wherein before cropping the N second feature maps from the N first feature maps, the computer program, when executed by the processor, is further used to implement:
    obtaining the size specified by the object detection network;
    determining the first network layer of the object detection network according to the size of the second mask map of each candidate box, the size specified by the object detection network, and a preset hyperparameter.
  17. The computer-readable storage medium according to claim 15, wherein performing the determining of the localization confidence of each candidate box according to the first target feature map corresponding to each candidate box comprises:
    extracting features from the first target feature map corresponding to each candidate box to obtain a second target feature map corresponding to each candidate box;
    convolving the second target feature map corresponding to each candidate box successively through two convolution layers with a 1*1 kernel to obtain a feature vector corresponding to each candidate box;
    normalizing the feature vector corresponding to each candidate box to obtain the localization confidence of each candidate box.
  18. The computer-readable storage medium according to any one of claims 15-17, wherein after the localization confidence of each candidate box is determined according to the target feature map, the computer program, when executed by the processor, is further used to implement:
    obtaining the classification confidence of each candidate box;
    determining the target confidence of each candidate box according to the classification confidence of each candidate box and the localization confidence of each candidate box;
    performing non-maximum suppression on the at least one candidate box according to the target confidence of each candidate box to obtain a target candidate box among the at least one candidate box.
  19. The computer-readable storage medium according to any one of claims 15-17, wherein the image to be detected comprises a medical image; and the inputting the image to be detected into a target detection network for target detection to obtain at least one candidate box and a first mask map of each candidate box in the at least one candidate box comprises:
    inputting the medical image into the target detection network to obtain a third feature map of the medical image;
    obtaining a medical knowledge graph corresponding to the medical field to which the medical image belongs;
    vectorizing the medical knowledge graph to obtain a first feature vector corresponding to the medical knowledge graph;
    copying and combining the first feature vector multiple times to obtain a first three-dimensional matrix, wherein every layer of the first three-dimensional matrix holds the same data and has the same size as the third feature map;
    concatenating the first three-dimensional matrix with the third feature map of the medical image to obtain a fourth feature map, and performing target detection on the medical image according to the fourth feature map to obtain at least one candidate box corresponding to a target in the medical image and the first mask map corresponding to each candidate box.
  20. The computer-readable storage medium according to claim 19, wherein the computer program, when executed by the processor, is further configured to implement:
    performing feature extraction on the image framed by the target candidate box to obtain a fifth feature map;
    encoding the body part to which the medical image corresponds to obtain a second feature vector corresponding to the body part, and copying and combining the second feature vector multiple times to obtain a second three-dimensional matrix, wherein every layer of the second three-dimensional matrix holds the same data and has the same size as the fifth feature map;
    encoding the position information of the target candidate box in the image to be detected to obtain a third feature vector, and copying and combining the third feature vector multiple times to obtain a third three-dimensional matrix, wherein every layer of the third three-dimensional matrix holds the same data and has the same size as the fifth feature map;
    concatenating the second three-dimensional matrix, the third three-dimensional matrix and the fifth feature map to obtain a sixth feature map;
    segmenting the target in the target candidate box according to the sixth feature map.
PCT/CN2021/096419 2020-12-25 2021-05-27 Method and apparatus for determining localization confidence in target detection, electronic device and storage medium WO2022134464A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011572377.6 2020-12-25
CN202011572377.6A CN112668573B (zh) 2020-12-25 2020-12-25 Method and apparatus for determining localization confidence in target detection, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022134464A1 (zh)

Family

ID=75410156

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096419 WO2022134464A1 (zh) 2020-12-25 2021-05-27 Method and apparatus for determining localization confidence in target detection, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112668573B (zh)
WO (1) WO2022134464A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272086A (zh) * 2022-09-29 2022-11-01 Hangzhou Taimei Xingcheng Pharmaceutical Technology Co., Ltd. Medical image stitching method and apparatus, electronic device and storage medium
CN116563665A (zh) * 2023-04-25 2023-08-08 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method for a target detection model, target detection method, apparatus and device
CN117351485A (zh) * 2023-12-04 2024-01-05 Shenzhen Sailu Medical Technology Co., Ltd. Augmentation method and apparatus, device and storage medium for gene sequencing training datasets

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668573B (zh) * 2020-12-25 2022-05-10 Ping An Technology (Shenzhen) Co., Ltd. Method and apparatus for determining localization confidence in target detection, electronic device and storage medium
CN114898155B (zh) * 2022-05-18 2024-05-28 Ping An Technology (Shenzhen) Co., Ltd. Vehicle damage assessment method, apparatus, device and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464230A (zh) * 2017-08-23 2017-12-12 BOE Technology Group Co., Ltd. Image processing method and apparatus
CN110070072A (zh) * 2019-05-05 2019-07-30 Xiamen Meitu Zhijia Technology Co., Ltd. Method for generating an object detection model
CN110633661A (zh) * 2019-08-31 2019-12-31 Nanjing University of Science and Technology Remote sensing image target detection method fused with semantic segmentation
US20200050871A1 (en) * 2017-11-28 2020-02-13 Visual Semantics, Inc. Method and apparatus for integration of detected object identifiers and semantic scene graph networks for captured visual scene behavior estimation
CN110909642A (zh) * 2019-11-13 2020-03-24 Nanjing University of Science and Technology Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111667468A (zh) * 2020-05-28 2020-09-15 Ping An Technology (Shenzhen) Co., Ltd. Neural-network-based OCT image lesion detection method, apparatus and medium
CN111967595A (zh) * 2020-08-17 2020-11-20 Chengdu Shuzhilian Technology Co., Ltd. Candidate box annotation method and system, model training method and target detection method
CN112001841A (zh) * 2020-07-14 2020-11-27 Goertek Inc. Method and apparatus for extracting image regions to be detected, and product defect detection system
CN112668573A (zh) * 2020-12-25 2021-04-16 Ping An Technology (Shenzhen) Co., Ltd. Method and apparatus for determining localization confidence in target detection, electronic device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7702673B2 (en) * 2004-10-01 2010-04-20 Ricoh Co., Ltd. System and methods for creation and use of a mixed media environment
CN106682669A (zh) * 2016-12-15 2017-05-17 Shenzhen Harzone Technology Co., Ltd. Image processing method and mobile terminal
CN108416776B (zh) * 2018-03-16 2021-04-30 BOE Technology Group Co., Ltd. Image recognition method, image recognition apparatus, computer product and readable storage medium
CN109389030B (zh) * 2018-08-23 2022-11-29 Ping An Technology (Shenzhen) Co., Ltd. Facial landmark detection method, apparatus, computer device and storage medium



Also Published As

Publication number Publication date
CN112668573B (zh) 2022-05-10
CN112668573A (zh) 2021-04-16


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21908475

Country of ref document: EP

Kind code of ref document: A1