WO2021196896A1 - Target detection method and apparatus, electronic device, and readable storage medium - Google Patents

Target detection method and apparatus, electronic device, and readable storage medium

Info

Publication number
WO2021196896A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
information
target
target detection
amount
Prior art date
Application number
PCT/CN2021/075822
Other languages
English (en)
French (fr)
Inventor
尚太章
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2021196896A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/231 - Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Definitions

  • This application relates to the field of image processing technology, and more specifically, to a target detection method, device, electronic equipment, and readable storage medium.
  • with the development of artificial intelligence, target detection has been widely used in autonomous driving, pedestrian detection, license plate recognition, and electronic devices such as mobile phones and AR glasses, which has also allowed a large number of intelligent algorithms to be integrated into these electronic devices to further improve their intelligence.
  • although target detection has been developed for many years, many problems remain to be solved; for example, detection efficiency is too low.
  • This application proposes a target detection method, device, electronic equipment, and readable storage medium to improve the above-mentioned defects.
  • an embodiment of the present application provides a target detection method, including: performing feature extraction on an image to be detected to obtain multiple layers of the image to be detected; obtaining the amount of information corresponding to each layer; finding the layer whose amount of information meets a preset condition, as the target layer; and using the target layer to obtain a target detection result.
  • an embodiment of the present application also provides a target detection device, which includes: a feature acquisition module, an information amount acquisition module, a target layer acquisition module, and a detection result acquisition module.
  • the feature acquisition module is used to perform feature extraction on the image to be detected to obtain multiple layers of the image to be detected.
  • the information amount acquisition module is used to obtain the amount of information corresponding to each layer.
  • the target layer acquisition module is used to find the layer whose amount of information meets the preset conditions, as the target layer.
  • the detection result obtaining module is used to obtain the target detection result by using the target layer.
  • the embodiments of the present application also provide an electronic device, including one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to execute the above-mentioned method.
  • an embodiment of the present application also provides a computer-readable medium; the computer-readable storage medium stores program code, and the program code can be invoked by a processor to execute the foregoing method.
  • in the target detection method, device, electronic device, and readable storage medium provided by the embodiments of the present application, feature extraction is first performed on the image to be detected to obtain multiple layers of the image, the amount of information corresponding to each layer is then obtained, a layer whose amount of information meets a preset condition is found as the target layer, and the target layer is used to obtain the target detection result.
  • By introducing the amount of information, this application makes the target layer acquired for target detection more effective, which not only improves the accuracy of the target detection result but also speeds up target detection to a certain extent.
  • FIG. 1 shows a method flowchart of a target detection method provided by an embodiment of the present application
  • FIG. 2 shows a structure diagram of a target detection model in a target detection method provided by an embodiment of the present application
  • FIG. 3 shows a schematic diagram of an information amount calculation module in a target detection method provided by an embodiment of the present application
  • FIG. 4 shows a method flowchart of a target detection method provided by another embodiment of the present application.
  • FIG. 5 shows a flowchart of step S403 in the target detection method provided by another embodiment of the present application.
  • FIG. 6 shows a method flowchart of a target detection method provided by another embodiment of the present application.
  • FIG. 7A shows a schematic diagram of a target layer including multiple candidate frames in a target detection method provided by another embodiment of the present application.
  • FIG. 7B shows a schematic diagram of a target candidate frame obtained through de-duplication processing in a target detection method provided by another embodiment of the present application.
  • FIG. 8 shows a flowchart of acquiring a target detection model in a target detection method provided by another embodiment of the present application.
  • FIG. 9 shows a block diagram of a target detection device provided by an embodiment of the present application.
  • FIG. 10 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 11 shows a storage unit provided by an embodiment of the present application for storing or carrying program code for implementing the target detection method according to the embodiment of the present application.
  • the target detection algorithm is different from a classification algorithm: it must not only identify the category to which the target belongs but also detect the target's position in the picture, so it is more difficult than classification.
  • although target detection algorithms have been developed for many years, many problems remain to be solved; for example, overlap and occlusion between targets, small targets in the picture, intra-class similarity of targets, and the wide variety of object types in nature can all lead to low target detection accuracy.
  • in addition, the network models used for target detection have high computational complexity and large latency compared with other network models, and they also occupy more memory, which makes target detection models difficult to deploy on terminal devices and limits their practicality.
  • Target detection algorithms can be divided into two categories, namely one-stage target detection algorithms and two-stage target detection algorithms.
  • one-stage target detection algorithms mainly include the SSD (Single Shot MultiBox Detector) series and the YOLO (You Only Look Once) series
  • two-stage target detection algorithms are mainly represented by the RCNN series, including RCNN, Fast-RCNN, Faster-RCNN, R-FCN, Mask-RCNN, etc.
  • the one-stage target detection algorithm has low time complexity and good practicability. It can be deployed in terminal equipment, but its detection accuracy is relatively low.
  • the two-stage target detection algorithm has higher accuracy than the one-stage target detection algorithm, but the time complexity of the network model is high, so it is more difficult to deploy it on the terminal device.
  • Deep learning technology has occupied a dominant position in the field of computer vision, and many target detection algorithms have emerged rapidly, and have quickly become the main method in the field of target detection.
  • the relative size of the small target can be increased by increasing the resolution of the input picture, thereby improving the detection effect of the small target.
  • a larger and deeper feature extraction network model is used to extract feature information, so that more effective feature information can be extracted, the characterization ability of features can be improved, and the detection effect of small targets can be improved by optimizing features.
  • although higher-level features can represent the high-level semantic information of the picture, the information of a small target is likely to be lost at the upper layers, so using low-level feature information can improve the detection of small targets to a certain extent.
  • an embodiment of the present application provides a target detection method.
  • FIG. 1 shows a target detection method provided by an embodiment of the present application, and the method is applied to the electronic devices of FIG. 10 and FIG. 11.
  • the target detection method is used to improve the accuracy of target detection results.
  • the target detection method includes: step S101 to step S104.
  • Step S101 Perform feature extraction on the image to be detected to obtain multiple layers of the image to be detected.
  • the embodiment of the present application may use a trained target detection model to perform feature extraction on the image to be detected, that is, input the image to be detected into the target detection model, and use the target detection model to perform feature extraction on the image to be detected.
  • Feature extraction refers to using the target detection model to extract the information of the image to be detected and to determine whether each point in the image to be detected belongs to an image feature; the result of feature extraction is to divide the points of the image into different subsets, which often correspond to isolated points, continuous curves, or continuous regions.
  • in addition, the image to be detected may be one image or multiple images, and the image format may include bmp, jpg, png, and so on; the format is not specifically limited here.
  • the image features include color features, texture features, shape features, and spatial relationship features.
  • the color features are pixel-based features and are mainly used to represent the surface properties of the scene corresponding to the image. Similar to the color feature, the texture feature is also used to express the surface properties of the scene corresponding to the image. Unlike the color feature, the texture feature is not based on pixel features. It needs to be statistically calculated in an area containing multiple pixels.
  • the shape feature is the shape of the target segmented in the image. The shape feature can be divided into the contour feature and the area feature. The contour feature of the image is mainly for the outer boundary of the object, and the area feature of the image is related to the entire shape area.
  • the spatial relationship feature refers to the mutual spatial positions or relative directional relationships among multiple targets segmented from the image; these relationships can be divided into connection/adjacency, overlapping/occlusion, and containment/inclusion relationships. Spatial position information can be divided into relative and absolute spatial position information: relative spatial position information emphasizes the relative arrangement of targets, such as above/below and left/right relationships, while absolute spatial position information emphasizes the distance and orientation between targets.
  • Step S102 Obtain the amount of information corresponding to each of the layers.
  • the amount of information in the embodiments of this application refers to the number of feature points contained in the feature maps of a layer; different layers correspond to different amounts of information, and the larger the amount of information, the more feature points the layer contains.
  • Obtaining the amount of information corresponding to each layer includes: separating the feature maps in each layer according to the number of channels to obtain multiple feature vectors, then obtaining the covariance matrix of the feature vectors, and obtaining the amount of information corresponding to each layer according to the covariance matrix. In the calculation formula for the amount of information, c is the number of channels and cov_ij is the covariance matrix.
  • in order to explain the calculation of the amount of information more clearly, the embodiment of the present application provides the structure diagram of the target detection model shown in FIG. 2. After the image to be detected is input into the target detection model, the model can perform feature extraction on it to obtain multiple layers.
  • L3, L4, L5, L6, and L7 shown in FIG. 2 are all layers, and each layer contains different feature maps, which also means that the numbers of feature points they contain differ. The layers can then be passed to the information amount calculation module, which calculates the number of feature points contained in each layer.
  • referring to FIG. 3, which shows a specific example of the information amount calculation module for one layer of FIG. 2 (taking the L3 layer as an example), a layer can include multiple feature maps, and these feature maps can be separated according to the number of channels.
  • a feature map can be a feature matrix, obtained mainly by the target detection model using its convolutional layers to perform feature extraction on the image to be detected.
  • each feature map can include multiple feature points, and feature maps with different feature points differ: some feature maps extract the contour of the object to be detected, some extract its shape, and some extract its strongest feature.
  • for example, feature map A extracts a cat's eyes, feature map B extracts the cat's ears, and feature map C extracts the cat's overall outline.
  • in the embodiment of the present application, separating the feature maps of the L3 layer according to the number of channels yields c vectors of size m*n, where c is the number of channels. The covariance matrix of these c vectors is then calculated, and finally the amount of information corresponding to the L3 layer is obtained according to the covariance matrix; a code sketch of this procedure is given below.
  • it should be noted that because the amount of information is calculated for each region of the image to be detected, the amount of information of some layers may be 0. This indicates that the layer corresponding to that amount of information contains no useful feature information, so the layer can be removed directly and not used for subsequent target detection, which reduces the interference of useless layers to a certain extent.
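  • For illustration, the per-layer information amount can be sketched in Python as follows. The publication reproduces the exact formula only as an image, so the final aggregation over the covariance matrix below is an assumption; the procedure (split the feature maps by channel, flatten each channel into a vector, compute the c x c covariance matrix) follows the text, and the function name is illustrative:

    import numpy as np

    def layer_information(feature_map):
        # feature_map: array of shape (c, m, n) holding c channel feature maps.
        c, m, n = feature_map.shape
        vectors = feature_map.reshape(c, m * n)  # c vectors of length m*n
        cov = np.cov(vectors)                    # c x c covariance matrix
        # Assumed aggregation: mean absolute covariance entry.
        return float(np.abs(cov).sum() / (c * c))

  • For example, a layer of 256 channels of 38*38 feature maps would be passed in as an array of shape (256, 38, 38); a layer whose score comes out as 0 would be removed as described above.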
  • Step S103 Search for a layer whose amount of information meets a preset condition, and use it as a target layer.
  • after the amount of information of each layer is obtained, the multiple layers can be screened by comparing these amounts, so as to keep the layers containing more feature points and remove those containing fewer. The embodiment of the present application can screen the layers by judging whether the amount of information satisfies a preset condition, for example, whether it is greater than a preset threshold: if it is greater than the preset threshold, the layer corresponding to the amount of information is retained; if it is less than the preset threshold, the layer is removed.
  • Step S104 Obtain a target detection result by using the target layer.
  • the target detection result can be obtained according to the target layer, and the target layer can contain position information of the target to be detected in the image to be detected and category information of the target to be detected.
  • the target detection result can be obtained by comparing and analyzing this information.
  • the position information of the target to be detected in the image to be detected may include the width and height of the image corresponding to the target to be detected, as well as the coordinates of the center point of the target image to be detected, the coordinates of the upper left corner, or the coordinates of the lower right corner.
  • the target detection result may also include the confidence level and the probability of the target to be detected. Among them, the range of confidence can be [0, 1], and the introduction of these two pieces of information can make the description of the target detection result more accurate.
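  • As an illustration of the fields described above, a detection result could be represented as follows; this is a minimal sketch, and the field names are assumptions rather than names taken from the patent:

    from dataclasses import dataclass

    @dataclass
    class DetectionResult:
        category: str       # category of the target to be detected
        probability: float  # classification probability of the target
        confidence: float   # confidence level, in the range [0, 1]
        cx: float           # x-coordinate of the center point of the box
        cy: float           # y-coordinate of the center point of the box
        width: float        # width of the image region of the target
        height: float       # height of the image region of the target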
  • the target detection method proposed in the embodiments of the present application acquires multiple layers, and the multiple layers contain enough feature points to realize small target detection.
  • the acquisition of these layers can make the detection of small targets more accurate.
  • in addition, this application uses the amount of information to screen these layers and remove useless ones, which not only improves the accuracy of small-target detection but also speeds up target detection to a certain extent.
  • another embodiment of the present application provides a target detection method; referring to FIG. 4, the target detection method may include steps S401 to S404.
  • Step S401 Perform feature extraction on the image to be detected to obtain multiple layers of the image to be detected.
  • Step S402 Obtain the amount of information corresponding to each of the layers.
  • Step S403 Find a layer with an information amount greater than a preset threshold, and use it as a target layer.
  • the amount of information is the number of feature points contained in all feature maps of a layer. After the amount of information of each layer is obtained, it can be judged whether each layer's amount of information is greater than the preset threshold; if it is, the layer corresponding to that amount of information is retained.
  • for example, after feature extraction is performed on the image to be detected, 5 layers can be obtained: layer L3, layer L4, layer L5, layer L6, and layer L7. By calculation, the amounts of information l3 = 10 for L3, l4 = 7 for L4, l5 = 3 for L5, l6 = 9 for L6, and l7 = 8 for L7 can be obtained.
  • the preset threshold C can be set to 6, and C is compared with l3, l4, l5, l6, and l7 respectively; it can then be seen that l5 is less than the preset threshold C, while l3, l4, l6, and l7 are all greater than C.
  • at this time, the layers L3, L4, L6, and L7 corresponding to l3, l4, l6, and l7 are used as target layers.
  • the preset threshold may be an empirical value, or the average of the amounts of information of all layers.
  • in the above example, the preset threshold C may also be the average of l3, l4, l5, l6, and l7; a layer is retained if its amount of information is greater than the average and removed if it is smaller.
  • multiple preset thresholds can also be used: when the layers are divided into multiple categories according to the feature information they contain, each category can correspond to its own preset threshold, which makes the finally obtained target layers more accurate and effective.
  • how the preset threshold is set is not specifically limited in the embodiments of the present application and can be chosen according to actual needs.
  • searching for a layer with an information amount greater than a preset threshold may include the steps shown in FIG. 5, and it can be known from FIG. 5 that step S403 includes step S4031 to step S4032.
  • Step S4031 From the multiple layers, search for layers whose amount of information is greater than the preset threshold, as candidate layers.
  • in one embodiment, there may be more than one layer whose amount of information is greater than the preset threshold; all of these can be used as candidate layers, while layers whose amount of information is less than the preset threshold can be removed directly.
  • for example, in the above example, l3, l4, l6, and l7 are all greater than the preset threshold C, so the layers L3, L4, L6, and L7 corresponding to l3, l4, l6, and l7 can all be used as candidate layers, while layer L5, whose amount of information l5 is less than the preset threshold, can be removed directly and not used as a candidate layer.
  • Step S4032 From the multiple candidate layers, search for a layer that meets the specified requirements and use it as the target layer.
  • the order of each layer to be selected can be determined based on the order of the amount of information in descending order to obtain the target sequence, and then the top N candidate layers in the target sequence are used as the target layer.
  • N is a positive integer.
  • for example, the candidate layers finally obtained by comparison with the preset threshold are L3, L4, L6, and L7, with amounts of information l3 = 10, l4 = 7, l6 = 9, and l7 = 8; sorting these amounts gives the order l3, l6, l7, l4, from which the order of the candidate layers can be determined, yielding the target sequence.
  • the target sequence is: candidate layer L3, candidate layer L6, candidate layer L7, and candidate layer L4; at this time, the top 3 candidate layers in the target sequence can be used as target layers.
  • the target layers then include candidate layer L3, candidate layer L6, and candidate layer L7.
  • in other embodiments, the order of the candidate layers may instead be determined in ascending order of the amount of information to obtain the target sequence, and the last M candidate layers in the target sequence are used as target layers.
  • M is a positive integer that can be set according to empirical values; in the embodiment of the present application, M can be set to 3, that is, the last three candidate layers in the target sequence are used as target layers.
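  • The threshold screening of step S4031 and the top-N selection of step S4032 can be tied together in a short sketch; the mean-as-threshold default mirrors one option mentioned above, and all names are illustrative:

    def select_target_layers(info, threshold=None, top_n=3):
        # info: mapping from layer name to information amount, e.g.
        # {"L3": 10, "L4": 7, "L5": 3, "L6": 9, "L7": 8}
        if threshold is None:
            # One option from the text: use the average as the preset threshold.
            threshold = sum(info.values()) / len(info)
        # Step S4031: keep layers whose amount exceeds the threshold.
        candidates = [name for name, amount in info.items() if amount > threshold]
        # Step S4032: sort candidates by amount, descending; keep the top N.
        candidates.sort(key=lambda name: info[name], reverse=True)
        return candidates[:top_n]

  • With the worked example above (threshold C = 6), select_target_layers returns ["L3", "L6", "L7"], matching the target layers found in the text.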
  • Step S404 Obtain a target detection result by using the target layer.
  • after the multiple layers are obtained, the target detection method proposed in this embodiment first obtains the amount of information corresponding to each layer and then compares it with the preset threshold to find the amounts that meet the condition and the layers corresponding to them.
  • comparing the amounts of information not only makes the acquisition of target layers more accurate but also removes layers with little information, which speeds up the acquisition of detection results to a certain extent; the target detection method proposed in this application can therefore obtain target detection results more accurately and effectively.
  • yet another embodiment of the present application provides a target detection method; referring to FIG. 6, the target detection method may include steps S601 to S606.
  • Step S601 Perform feature extraction on the image to be detected to obtain multiple layers of the image to be detected.
  • Step S602 Obtain the amount of information corresponding to each of the layers.
  • Step S603 Search for a layer whose amount of information meets a preset condition, and use it as a target layer.
  • Step S604 Obtain the recognition result and candidate frame of each feature map in each target layer, and each candidate frame corresponds to a recognition result.
  • the recognition result and candidate frame of each feature map in each target layer can be obtained, and each candidate frame corresponds to a recognition result.
  • the recognition result may include the category of the object to be detected, the classification probability of the object to be detected, and the position information of the object to be detected in the image to be detected.
  • the target layer can include multiple candidate layers. Therefore, one object to be detected can correspond to multiple candidate frames. It should be noted that each layer includes at least one feature map, and the target layer includes at least one layer.
  • Step S605 De-duplicate the candidate frames to obtain the remaining candidate frames.
  • after the candidate frames of each feature map of each layer are obtained, the candidate frames can be deduplicated and the better candidate frames retained.
  • the embodiment of the present application may use non-maximum suppression (NMS) to deduplicate the candidate frames.
  • non-maximum suppression is used to suppress elements that are not maxima, removing redundant candidate frames and retaining the best candidate frame.
  • as shown in FIG. 7A, the target layer finally obtained through feature extraction and layer searching on the image to be detected may include 4 candidate frames, A, B, C, and D, where the target to be detected corresponding to candidate frames A and D is a dog, and the target to be detected corresponding to candidate frames B and C is a cat.
  • each candidate frame corresponds to a recognition result, and the recognition result includes the classification probability of the candidate frame. This application can sort the candidate frames by classification probability: for example, the classification probability of candidate frame A is 0.9, that of candidate frame D is 0.7, that of candidate frame B is 0.85, and that of candidate frame C is 0.8.
  • all candidate frames are first sorted by classification probability, and A, with the largest probability, is marked. Then, starting from A, it is judged whether the overlap IOU between A and each of candidate frames B, C, and D is greater than a preset threshold; if it is, that candidate frame is discarded. By calculation, IOU_AD = 0.9, IOU_AB = 0, and IOU_AC = 0; since IOU_AD = 0.9 is greater than the preset threshold 0.5, candidate frame D can be removed.
  • next, candidate frame B can be marked, and it is determined whether the overlap between this candidate frame and the other candidate frames meets the preset condition; if it does, the candidate frame that meets the preset condition is removed. Applying non-maximum suppression to FIG. 7A yields the view shown in FIG. 7B.
  • the embodiments of the present application can also use Soft-NMS, Softer-NMS, IoU-guided NMS, or Yes-Net to deduplicate candidate frames; which method is used to deduplicate the multiple candidate frames is not specifically limited here.
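  • A minimal greedy non-maximum suppression sketch in Python, assuming boxes in (x1, y1, x2, y2) format and the IoU threshold of 0.5 used in the example above; this shows the generic algorithm, not code from the patent:

    def iou(a, b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def nms(boxes, scores, iou_threshold=0.5):
        # Keep the highest-scoring box, drop boxes overlapping it above
        # the threshold, and repeat with the next highest remaining score.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [i for i in order
                     if iou(boxes[best], boxes[i]) <= iou_threshold]
        return keep

  • In the example above, A (score 0.9) is kept first and D is dropped because IOU_AD = 0.9 exceeds 0.5; B is then processed against the remaining frames in the same way.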
  • Step S606 Obtain a target detection result according to the remaining candidate frames and the recognition results corresponding to the remaining candidate frames.
  • in another embodiment, before feature extraction is performed on the image to be detected to obtain multiple layers, a target detection model may first be obtained and used in the target detection method proposed in the embodiments of the present application.
  • the acquisition of the target detection model is shown in FIG. 8; from the figure, it can be seen that it includes steps S801 to S805.
  • Step S801 Input the training data set to the neural network model.
  • the neural network model is mainly used for target detection and can be a network model such as ResNet, MobileNet, Xception, or DenseNet.
  • the training data set is mainly used to continuously train and optimize the neural network model; that is, the neural network model continuously adjusts its network parameters based on the input training data set.
  • the training data set can be ImageNet data set, PASCAL VOC data set, CIFAR data set or COCO data set, etc.
  • the training data set can also be obtained by manual shooting, or a web crawler can be used to obtain, from the Internet, image data sets of the classified targets under different scales, positions, and illuminations, automatically converting the related images into pictures of fixed size and fixed format, for example, into 32*32 jpg pictures.
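  • As a sketch of the picture-normalization step just described, assuming the Pillow library; the directory names are illustrative:

    from pathlib import Path
    from PIL import Image

    def normalize_images(src_dir, dst_dir, size=(32, 32)):
        # Convert crawled pictures into fixed-size, fixed-format 32*32 JPEGs.
        dst = Path(dst_dir)
        dst.mkdir(parents=True, exist_ok=True)
        for path in Path(src_dir).iterdir():
            if path.suffix.lower() in {".bmp", ".jpg", ".jpeg", ".png"}:
                img = Image.open(path).convert("RGB").resize(size)
                img.save(dst / (path.stem + ".jpg"), "JPEG")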
  • Step S802 Perform a feature extraction operation.
  • the feature extraction operation includes using the neural network model to perform feature extraction on each image in the training data set to obtain multiple first layers.
  • Step S803 Acquire the first information amount corresponding to each of the first layers.
  • obtaining the first information amount corresponding to each first layer includes: obtaining the number of executions of the feature extraction operation, and if the number of executions of the feature extraction operation is greater than a specified value, obtaining the first information amount corresponding to each first layer acquired this time.
  • this application can obtain a piece of record information, the record information is used to record the number of executions of the feature extraction operation, and when the number of executions of the feature extraction operation is greater than a specified value, the operation of obtaining the amount of information is executed.
  • the ImageNet data set can be input to the neural network model to train the target detection model. Before the final target detection model is obtained, the neural network model needs to be trained for multiple epochs, from which multiple layers are obtained, some acquired early in training and some later. Because the layers acquired by the neural network model early in training are usually noise feature maps, these noise feature maps can be removed directly, that is, their information amounts are not calculated.
  • the embodiment of this application does not add the information amount calculation module at the beginning of model training; instead, it adds the module after N epochs of training, that is, the first information amount of the first layers is obtained only when the number of executions of the feature extraction operation is greater than the specified value, where a first layer is a layer obtained by performing the current feature extraction operation.
  • in addition, when the number of executions of the feature extraction operation is less than or equal to the specified value, the first information amount corresponding to the first layers acquired this time is not obtained, that is, the information amount acquisition operation is not performed. For example, if the number of executions of feature extraction is 18 and the specified value is 19, then 18 is less than 19, and the first information amount of each first layer acquired this time does not need to be calculated.
  • conversely, if the number of executions of feature extraction is 20 and the specified value is 19, then 20 is greater than 19, and in this case the first information amount of each first layer acquired this time needs to be calculated.
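  • The epoch gate described above can be sketched as follows, reusing the layer_information sketch given earlier. model.extract_features and model.compute_loss are hypothetical helpers, not an API from the patent, and a positive information amount stands in for the preset condition of step S804:

    def train_detector(model, optimizer, data_loader, num_epochs, specified_value=19):
        for epoch in range(1, num_epochs + 1):
            for images, targets in data_loader:
                first_layers = model.extract_features(images)          # step S802
                if epoch > specified_value:                            # step S803 gate
                    # Early layers are usually noise feature maps, so their
                    # information amounts are only computed after enough epochs.
                    first_layers = [layer for layer in first_layers
                                    if layer_information(layer) > 0]   # step S804
                loss = model.compute_loss(first_layers, targets)       # step S805
                optimizer.zero_grad()
                loss.backward()   # gradient descent on the loss data
                optimizer.step()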
  • Step S804 From the plurality of first layers, search for a first layer whose first information amount satisfies a preset condition as the first target layer.
  • Step S805 Obtain the loss data of each first layer in the first target layer, and train the neural network model in combination with the loss data to obtain a target detection model.
  • in one embodiment, after the loss data of the first layers are obtained, the gradient descent method can be used in combination with the loss data to minimize the loss function of the neural network model, while the weight parameters of the neural network model are adjusted backward layer by layer.
  • the loss function can include a classification loss function and a localization loss function.
  • the classification loss function is used to predict the category of the target to be detected, while the localization loss function is used to refine the finally obtained candidate frames; by combining these two loss functions, the positioning of the final detection frame can be realized dynamically.
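  • A minimal sketch of the combined loss, assuming common choices (cross-entropy for classification, smooth L1 for localization); the patent does not name the exact loss functions:

    import torch.nn.functional as F

    def detection_loss(class_logits, class_targets, box_preds, box_targets):
        # Classification loss: predicts the category of the target to be detected.
        cls_loss = F.cross_entropy(class_logits, class_targets)
        # Localization loss: refines the finally obtained candidate frames.
        loc_loss = F.smooth_l1_loss(box_preds, box_targets)
        return cls_loss + loc_loss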
  • the target detection method proposed in the embodiment of the application adds the calculation of the amount of information when the target detection model is acquired, which can make target detection more accurate.
  • meanwhile, after multiple candidate frames are obtained, this application can use non-maximum suppression to deduplicate them, so that a more accurate target detection result can be obtained, and the finally obtained target detection model can be applied to the target detection method.
  • in addition, because this application can obtain multiple layers containing enough feature points, a larger and deeper feature extraction model is not needed when selecting the neural network model; such a model not only requires a lot of memory but also detects targets slowly.
  • this application needs only a simple feature extraction model for target detection, so the target detection method proposed in this application can not only improve the accuracy of small-target detection but also effectively reduce its running time.
  • an embodiment of the present application proposes a target detection device 900.
  • the target detection device 900 is used to improve the accuracy of target detection.
  • the target detection device 900 includes: a feature acquisition module 901, an information amount acquisition module 902, a target layer acquisition module 903, and a detection result acquisition module 904.
  • the feature acquisition module 901 is configured to perform feature extraction on the image to be detected to obtain multiple layers of the image to be detected.
  • further, the feature acquisition module 901 is also used to input a training data set to a neural network model and perform a feature extraction operation, where the feature extraction operation includes using the neural network model to perform feature extraction on each image in the training data set to obtain multiple first layers; to obtain the first information amount corresponding to each first layer; to find, from the multiple first layers, the first layers whose first information amount meets a preset condition, as first target layers; and to obtain the loss data of each first layer in the first target layers and train the neural network model in combination with the loss data to obtain a target detection model, the target detection model being used in the target detection method.
  • further, the feature acquisition module 901 is also configured to acquire the number of executions of the feature extraction operation, and if the number of executions of the feature extraction operation is greater than a specified value, to acquire the first information amount corresponding to each first layer acquired this time.
  • the information amount obtaining module 902 is used to obtain the information amount corresponding to each of the layers.
  • the target layer obtaining module 903 is configured to find a layer whose information content meets a preset condition and serve as the target layer.
  • the target layer obtaining module 903 is also used to find a layer whose information amount is greater than a preset threshold, as the target layer. Specifically, a layer with an amount of information greater than a preset threshold is searched for as a candidate layer, and a layer that meets a specified requirement is searched for among the plurality of candidate layers as a target layer.
  • further, the target layer obtaining module 903 is also used to determine the order of the candidate layers by amount of information in descending order to obtain a target sequence, and to use the top N layers in the target sequence as target layers, where N is a positive integer.
  • the detection result obtaining module 904 is configured to obtain the target detection result by using the target layer.
  • further, the detection result obtaining module 904 is also used to obtain the recognition result and candidate frames of each feature map in each target layer, where each candidate frame corresponds to one recognition result; to deduplicate the candidate frames to obtain the remaining candidate frames; and to obtain the target detection result according to the remaining candidate frames and their corresponding recognition results.
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • the functional modules in the various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • FIG. 10 shows a structural block diagram of an electronic device 1000 according to an embodiment of the present application.
  • the electronic device 1000 may be an electronic device capable of running application programs, such as a smart phone, a tablet computer, or an e-book.
  • the electronic device 1000 in this application may include one or more of the following components: a processor 1010, a memory 1020, and one or more application programs, where one or more application programs may be stored in the memory 1020 and configured to be configured by One or more processors 1010 execute, and one or more programs are configured to execute the methods described in the foregoing method embodiments.
  • the processor 1010 may include one or more processing cores.
  • the processor 1010 uses various interfaces and lines to connect the various parts of the entire electronic device 1000, and executes the various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1020 and calling data stored in the memory 1020.
  • optionally, the processor 1010 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA).
  • the processor 1010 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used for rendering and drawing of display content; the modem is used for processing wireless communication. It can be understood that the above-mentioned modem may not be integrated into the processor 1010, but may be implemented by a communication chip alone.
  • the memory 1020 may include random access memory (RAM) or read-only memory (ROM).
  • the memory 1020 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 1020 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like.
  • the data storage area can also store data created by the electronic device 1000 in use (such as a phone book, audio and video data, and chat records).
  • FIG. 11 shows a structural block diagram of a computer-readable storage medium 2000 provided by an embodiment of the present application.
  • the computer-readable storage medium 2000 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
  • the computer-readable storage medium 2000 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 2000 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 2000 has storage space for program code 2010 for executing any of the method steps of the above methods; the program code can be read from or written into one or more computer program products.
  • the program code 2010 may, for example, be compressed in a suitable form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a target detection method and apparatus, an electronic device, and a readable storage medium, belonging to the field of image processing technology. The method includes: performing feature extraction on an image to be detected to obtain multiple layers of the image to be detected; obtaining the amount of information corresponding to each layer; finding a layer whose amount of information meets a preset condition, as a target layer; and using the target layer to obtain a target detection result. This application obtains the target layer according to the amount of information, so that the target detection result can be obtained more accurately and effectively.

Description

Target detection method and apparatus, electronic device, and readable storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application No. CN202010256223.X, entitled "Target detection method and apparatus, electronic device, and readable storage medium", filed with the China Patent Office on April 2, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of image processing technology, and more specifically, to a target detection method and apparatus, an electronic device, and a readable storage medium.
Background
With the development of artificial intelligence technology, target detection has been widely used in autonomous driving, pedestrian detection, license plate recognition, and electronic devices such as mobile phones and AR glasses, which has also allowed a large number of intelligent algorithms to be integrated into these electronic devices to further improve their intelligence. Although target detection has been developed for many years, many problems remain to be solved; for example, detection efficiency is too low.
Summary
This application proposes a target detection method and apparatus, an electronic device, and a readable storage medium to remedy the above defects.
In a first aspect, an embodiment of this application provides a target detection method, including: performing feature extraction on an image to be detected to obtain multiple layers of the image to be detected; obtaining the amount of information corresponding to each layer; finding a layer whose amount of information meets a preset condition, as a target layer; and using the target layer to obtain a target detection result.
In a second aspect, an embodiment of this application further provides a target detection apparatus, including: a feature acquisition module, an information amount acquisition module, a target layer acquisition module, and a detection result acquisition module. The feature acquisition module is used to perform feature extraction on an image to be detected to obtain multiple layers of the image to be detected. The information amount acquisition module is used to obtain the amount of information corresponding to each layer. The target layer acquisition module is used to find a layer whose amount of information meets a preset condition, as a target layer. The detection result acquisition module is used to obtain a target detection result by using the target layer.
In a third aspect, an embodiment of this application further provides an electronic device, including one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to execute the above method.
In a fourth aspect, an embodiment of this application further provides a computer-readable medium; the computer-readable storage medium stores program code, and the program code can be invoked by a processor to execute the above method.
In the target detection method and apparatus, electronic device, and readable storage medium provided by the embodiments of this application, feature extraction is first performed on the image to be detected to obtain multiple layers of the image, the amount of information corresponding to each layer is then obtained, a layer whose amount of information meets a preset condition is found as the target layer, and the target layer is used to obtain the target detection result. By introducing the amount of information, this application makes the target layer used for detection more effective, which not only improves the accuracy of the detection result but also speeds up detection to a certain extent.
Other features and advantages of the embodiments of this application will be set forth in the following description, become partly apparent from the description, or be understood by implementing the embodiments of this application. The objectives and other advantages of the embodiments of this application can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those skilled in the art, other drawings can be obtained from them without creative effort.
FIG. 1 shows a method flowchart of a target detection method provided by an embodiment of this application;
FIG. 2 shows a structure diagram of a target detection model in a target detection method provided by an embodiment of this application;
FIG. 3 shows a schematic diagram of an information amount calculation module in a target detection method provided by an embodiment of this application;
FIG. 4 shows a method flowchart of a target detection method provided by another embodiment of this application;
FIG. 5 shows a flowchart of step S403 in the target detection method provided by another embodiment of this application;
FIG. 6 shows a method flowchart of a target detection method provided by yet another embodiment of this application;
FIG. 7A shows a schematic diagram of a target layer including multiple candidate frames in a target detection method provided by yet another embodiment of this application;
FIG. 7B shows a schematic diagram of target candidate frames obtained through deduplication in a target detection method provided by yet another embodiment of this application;
FIG. 8 shows a flowchart of acquiring a target detection model in a target detection method provided by yet another embodiment of this application;
FIG. 9 shows a block diagram of a target detection apparatus provided by an embodiment of this application;
FIG. 10 shows a structural block diagram of an electronic device provided by an embodiment of this application;
FIG. 11 shows a storage unit, provided by an embodiment of this application, for storing or carrying program code for implementing the target detection method according to the embodiments of this application.
Detailed description
In order to enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below in conjunction with the drawings in the embodiments of this application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In the description of this application, the terms "first", "second", and the like are used only to distinguish descriptions and cannot be understood as indicating or implying relative importance.
With the continuous development and improvement of deep learning, people no longer focus only on the accuracy of network models but also on their light weight and miniaturization, so as to improve the practical application value of network models on terminal devices. The target detection algorithm differs from a classification algorithm: it must not only identify the category to which a target belongs but also detect the target's position in the picture, so it is more difficult than classification. Although target detection algorithms have been developed for many years, many problems remain to be solved; for example, overlap and occlusion between targets, small targets in the picture, intra-class similarity of targets, and the wide variety of object types in nature can all lead to low target detection accuracy. In addition, the network models used for target detection have high computational complexity and large latency compared with other network models, and they also occupy more memory, which makes target detection models difficult to deploy on terminal devices and limits their practicality.
Target detection algorithms can be divided into two main categories: one-stage and two-stage target detection algorithms. One-stage target detection algorithms mainly include the SSD (Single Shot MultiBox Detector) series and the YOLO (You Only Look Once) series, while two-stage target detection algorithms are mainly represented by the RCNN series, including RCNN, Fast-RCNN, Faster-RCNN, R-FCN, Mask-RCNN, and so on. One-stage target detection algorithms have low time complexity and good practicality and can be deployed on terminal devices, but their detection accuracy is relatively low. Two-stage target detection algorithms are more accurate than one-stage algorithms, but the time complexity of their network models is high, so deploying them on terminal devices is difficult.
At present, the detection of small targets is becoming more and more important in practical applications, especially in AR (Augmented Reality) glasses scenarios, which place extremely high requirements on small-target detection. When a smartphone is used as the smart terminal for target detection, the user can move the phone close to the target to increase the target's size in the picture, directly skipping the difficulty of small-target detection. When AR glasses are used, however, if the target is far away, the user cannot simply walk up close to it wearing the glasses to detect and recognize it; yet without approaching the object, small targets are hard to detect, which greatly degrades the user experience of AR glasses. Therefore, compared with mobile phones, the requirements for small-target detection in AR glasses scenarios are particularly strict.
Small-target detection has always been difficult in the field of computer vision. In the past, researchers used morphological methods, wavelet transforms, and other methods for small-target detection; with the rapid development of deep learning, different model architectures such as AlexNet, VGG, Inception, and MobileNet have emerged one after another. Deep learning technology has come to dominate the field of computer vision, and many target detection algorithms have quickly emerged and become the main methods in the field. Existing deep learning approaches to small-target detection mainly take three routes. First, the relative size of small targets can be increased by raising the resolution of the input picture, thereby improving small-target detection. Second, a larger and deeper feature extraction network can be used to extract more effective feature information and improve the representational power of the features, improving small-target detection by optimizing the features. Third, low-level feature information can be used to preserve the precision and recall of small-target detection as much as possible: although higher-level features can represent the high-level semantic information of the picture, the information of small targets is likely to be lost at the upper layers, so using low-level feature information can improve small-target detection to a certain extent.
However, each of the above routes has its own problems. Using feature maps from different layers for target detection can certainly extract more features, but different pictures yield different feature maps even at the same layer. For example, the feature maps of L3, L4, ..., L7 may be used as the feature maps for target detection, detecting targets of different sizes respectively. Because input pictures differ in size, the size and position of the objects they contain also differ, so the information contained in the finally obtained L3, L4, ..., L7 features also differs. The method usually adopted in current technology is to perform target detection on all layers separately and then use the NMS (Non-Maximum Suppression) algorithm to remove unreasonable detection boxes and obtain the final ones. However, for different pictures, the positions and sizes of the objects differ, so not all layers contain the target's position and category information. If all layers are used for target detection, many useless boxes are generated; these may be confused with the correct detection boxes and cause detection errors, and they also increase the computation time of the NMS stage, because the more candidate boxes there are, the more time NMS requires.
Therefore, in order to solve the above problems, an embodiment of this application provides a target detection method. Referring to FIG. 1, which shows a target detection method provided by an embodiment of this application, the method is applied to the electronic devices of FIG. 10 and FIG. 11 and is used to improve the accuracy of target detection results. In a specific embodiment, the target detection method includes steps S101 to S104.
Step S101: Perform feature extraction on an image to be detected to obtain multiple layers of the image to be detected.
The embodiment of this application may use a trained target detection model to perform feature extraction on the image to be detected, that is, input the image to be detected into the target detection model and use the model to extract its features. Feature extraction refers to using the target detection model to extract the information of the image to be detected and to determine whether each point in the image belongs to an image feature; the result of feature extraction is to divide the points of the image into different subsets, which often correspond to isolated points, continuous curves, or continuous regions. In addition, the image to be detected may be one image or multiple images, and the image format may include bmp, jpg, png, and so on; the format is not specifically limited here.
In one embodiment, the image features include color features, texture features, shape features, and spatial relationship features. Color features are pixel-based features mainly used to represent the surface properties of the scene corresponding to the image. Texture features, like color features, are used to express the surface properties of the scene, but unlike color features they are not pixel-based and must be computed statistically over a region containing multiple pixels. Shape features describe the shape of a target segmented from the image and can be divided into contour features, which concern the outer boundary of an object, and region features, which concern the entire shape region. Spatial relationship features refer to the mutual spatial positions or relative directional relationships among multiple targets segmented from the image; these relationships can be divided into connection/adjacency, overlapping/occlusion, and containment/inclusion relationships. Spatial position information can be divided into relative and absolute spatial position information: the former emphasizes the relative arrangement of targets, such as above/below and left/right relationships, while the latter emphasizes the distance and orientation between targets.
Step S102: Obtain the amount of information corresponding to each layer.
In the embodiments of this application, the amount of information refers to the number of feature points contained in the feature maps of a layer; different layers correspond to different amounts of information, and the larger the amount of information, the more feature points the layer contains. Obtaining the amount of information corresponding to each layer includes: separating the feature maps in each layer according to the number of channels to obtain multiple feature vectors, then obtaining the covariance matrix of the feature vectors, and obtaining the amount of information corresponding to each layer according to the covariance matrix.
The amount of information is calculated as follows:
Figure PCTCN2021075822-appb-000001
where c is the number of channels and cov_ij is the covariance matrix.
In order to explain the calculation of the amount of information more clearly, the embodiment of this application provides the structure diagram of the target detection model shown in FIG. 2. As can be seen from FIG. 2, after the image to be detected is input into the target detection model, the model can perform feature extraction on it to obtain multiple layers. L3, L4, L5, L6, and L7 shown in FIG. 2 are all layers, and each layer contains different feature maps, which also means that the numbers of feature points they contain differ. The layers can then be passed to the information amount calculation module, which calculates the number of feature points contained in each layer.
Referring to FIG. 3, which gives a specific example of the information amount calculation module for one layer of FIG. 2 (taking the L3 layer as an example), a layer can include multiple feature maps, and these feature maps can be separated according to the number of channels. A feature map can be a feature matrix, obtained mainly by the target detection model using its convolutional layers to perform feature extraction on the image to be detected. Each feature map can include multiple feature points, and feature maps with different feature points differ: some feature maps extract the contour of the target to be detected, some extract its shape, and some extract its strongest feature. For example, feature map A extracts a cat's eyes, feature map B extracts the cat's ears, and feature map C extracts the cat's overall outline. In the embodiment of this application, separating the feature maps of the L3 layer according to the number of channels yields c vectors of size m*n, where c is the number of channels; the covariance matrix of these c vectors is then calculated, and the amount of information corresponding to the L3 layer is finally obtained from the covariance matrix.
It should be noted that because the amount of information is calculated for each region of the image to be detected, the amount of information of some layers may be 0, indicating that the corresponding layer contains no useful feature information; such a layer can be removed directly and not used for subsequent target detection, which reduces the interference of useless layers to a certain extent.
Step S103: Find a layer whose amount of information meets a preset condition, as a target layer.
After the amount of information of each layer is obtained, the multiple layers can be screened by comparing these amounts, so as to keep the layers containing more feature points and remove those containing fewer. The embodiment of this application can screen the layers by judging whether the amount of information satisfies a preset condition, for example, whether it is greater than a preset threshold: if it is greater than the preset threshold, the layer corresponding to the amount of information is retained; if it is less than the preset threshold, the layer is removed.
Step S104: Use the target layer to obtain a target detection result.
In one embodiment, after the target layer is obtained, the target detection result can be obtained according to the target layer; the target layer can contain the position information of the target to be detected in the image to be detected and the category information of the target, and the target detection result can be obtained by comparing and analyzing this information. The position information of the target in the image may include the width and height of the image corresponding to the target, as well as the coordinates of the center point, upper-left corner, or lower-right corner of the target image. In addition, the target detection result may also include the confidence level and the probability of the target to be detected, where the confidence level may range over [0, 1]; introducing these two pieces of information makes the description of the target detection result more accurate.
In order to detect small targets more accurately, the target detection method proposed in the embodiment of this application acquires multiple layers containing enough feature points to detect small targets, which makes small-target detection more accurate. Meanwhile, this application uses the amount of information to screen these layers and remove useless ones, which not only improves the accuracy of small-target detection but also speeds up target detection to a certain extent.
Another embodiment of this application provides a target detection method. Referring to FIG. 4, the target detection method may include steps S401 to S404.
Step S401: Perform feature extraction on an image to be detected to obtain multiple layers of the image to be detected.
Step S402: Obtain the amount of information corresponding to each layer.
Step S403: Find a layer whose amount of information is greater than a preset threshold, as a target layer.
From the above description, the amount of information is the number of feature points contained in all feature maps of a layer. After the amount of information of each layer is obtained, it can be judged whether each amount is greater than the preset threshold; if so, the corresponding layer is retained. For example, after feature extraction is performed on the image to be detected, 5 layers can be obtained: L3, L4, L5, L6, and L7, with calculated amounts of information l3 = 10 for L3, l4 = 7 for L4, l5 = 3 for L5, l6 = 9 for L6, and l7 = 8 for L7. The embodiment of this application can set the preset threshold C to 6 and compare C with l3, l4, l5, l6, and l7 respectively: l5 is less than C, while l3, l4, l6, and l7 are all greater than C, so the layers L3, L4, L6, and L7 corresponding to l3, l4, l6, and l7 are used as target layers.
In one embodiment, the preset threshold may be an empirical value or the average of the amounts of information of all layers; in the above example, the preset threshold C may also be the average of l3, l4, l5, l6, and l7, with a layer retained if its amount of information is greater than the average and removed otherwise. In addition, multiple preset thresholds can be used when obtaining target layers: when the layers are divided into multiple categories according to the feature information they contain, each category can correspond to its own preset threshold, which makes the finally obtained target layers more accurate and effective. How the preset threshold is set is not specifically limited in the embodiments of this application and can be chosen according to actual needs.
In addition, finding layers whose amount of information is greater than the preset threshold may include the steps shown in FIG. 5; as can be seen from FIG. 5, step S403 includes steps S4031 and S4032.
Step S4031: From the multiple layers, find layers whose amount of information is greater than the preset threshold, as candidate layers.
In one embodiment, there may be more than one layer whose amount of information is greater than the preset threshold; these layers can all be taken as candidate layers, and layers whose amount of information is less than the preset threshold can be removed directly. For example, in the above example, the comparison shows that l3, l4, l6, and l7 are all greater than the preset threshold C, so the corresponding layers L3, L4, L6, and L7 can all be used as candidate layers, while layer L5, whose amount of information l5 is less than the preset threshold, can be removed directly and not used as a candidate layer.
Step S4032: From the multiple candidate layers, find layers that meet a specified requirement, as target layers.
After the multiple candidate layers are obtained, layers that meet the specified requirement can be found as target layers. Specifically, the candidate layers can be ordered by amount of information in descending order to obtain a target sequence, and the top N candidate layers in the target sequence are used as target layers, where N is a positive integer. For example, the candidate layers finally obtained by comparison with the preset threshold are L3, L4, L6, and L7, with amounts of information l3 = 10, l4 = 7, l6 = 9, and l7 = 8; sorting these amounts gives the order l3, l6, l7, l4, from which the order of the candidate layers is determined, yielding the target sequence: candidate layer L3, candidate layer L6, candidate layer L7, candidate layer L4. The top 3 candidate layers in the target sequence can then be used as target layers, namely candidate layers L3, L6, and L7.
In other embodiments, the candidate layers may instead be ordered by amount of information in ascending order to obtain the target sequence, and the last M candidate layers in the target sequence are used as target layers, where M is a positive integer that can be set empirically; in the embodiment of this application, M can be set to 3, that is, the last three candidate layers of the target sequence are used as target layers.
Step S404: Use the target layer to obtain a target detection result.
After the multiple layers are obtained, the target detection method proposed in this embodiment first obtains the amount of information corresponding to each layer and then compares it with the preset threshold to find the amounts that meet the condition and the layers corresponding to them. Comparing the amounts of information not only makes the acquisition of target layers more accurate but also removes layers with little information, which speeds up the acquisition of detection results to a certain extent; the target detection method proposed in this application can therefore obtain target detection results more accurately and effectively.
Yet another embodiment of this application provides a target detection method. Referring to FIG. 6, the target detection method may include steps S601 to S606.
Step S601: Perform feature extraction on an image to be detected to obtain multiple layers of the image to be detected.
Step S602: Obtain the amount of information corresponding to each layer.
Step S603: Find a layer whose amount of information meets a preset condition, as a target layer.
Step S604: Obtain the recognition result and candidate frames of each feature map in each target layer, where each candidate frame corresponds to one recognition result.
After the target layers are obtained, the embodiment of this application can obtain the recognition result and candidate frames of each feature map in each target layer, where each candidate frame corresponds to one recognition result. The recognition result may include the category of the target to be detected, the classification probability of the target, and the position information of the target in the image to be detected. From the above description, the target layers may include multiple candidate layers, so one target to be detected can correspond to multiple candidate frames. It should be noted that each layer includes at least one feature map, and the target layer includes at least one layer.
Step S605: Deduplicate the candidate frames to obtain the remaining candidate frames.
After the candidate frames of each feature map of each layer are obtained, the candidate frames can be deduplicated and the better candidate frames retained. The embodiment of this application may use non-maximum suppression to deduplicate the candidate frames; non-maximum suppression suppresses elements that are not maxima, removing redundant candidate frames while retaining the best one. As shown in FIG. 7A, the target layer finally obtained through feature extraction and layer searching on the image to be detected may include 4 candidate frames, A, B, C, and D, where the target corresponding to candidate frames A and D is a dog and the target corresponding to candidate frames B and C is a cat. From the above description, each candidate frame corresponds to one recognition result, and the recognition result includes the classification probability of the candidate frame; this application can sort the candidate frames by classification probability, for example, the classification probability of candidate frame A is 0.9, that of D is 0.7, that of B is 0.85, and that of C is 0.8. All candidate frames are first sorted by classification probability, and A, with the largest probability, is marked; then, starting from A, it is judged whether the overlap IOU between A and each of candidate frames B, C, and D is greater than a preset threshold, and if so, that candidate frame is discarded. By calculation, IOU_AD = 0.9, IOU_AB = 0, and IOU_AC = 0; since IOU_AD = 0.9 is greater than the preset threshold 0.5, candidate frame D can be removed. Candidate frame B can then be marked, and it is judged whether its overlap with the other candidate frames meets the preset condition; frames that meet the condition are removed. Applying non-maximum suppression to FIG. 7A yields the view shown in FIG. 7B. In addition, the embodiments of this application may also use Soft-NMS, Softer-NMS, IoU-guided NMS, or Yes-Net to deduplicate the candidate frames; which method is used is not specifically limited here.
Step S606: Obtain a target detection result according to the remaining candidate frames and their corresponding recognition results.
In another embodiment, before feature extraction is performed on the image to be detected to obtain multiple layers, a target detection model may first be obtained and used in the target detection method proposed in the embodiments of this application. The acquisition of the target detection model is shown in FIG. 8; as can be seen from the figure, it includes steps S801 to S805.
Step S801: Input a training data set to a neural network model.
In the embodiments of this application, the neural network model is mainly used for target detection and can be a network model such as ResNet, MobileNet, Xception, or DenseNet. The training data set is mainly used to continuously train and optimize the neural network model; that is, the neural network model continuously adjusts its network parameters based on the input training data set. The training data set can be the ImageNet data set, the PASCAL VOC data set, the CIFAR data set, the COCO data set, or the like. The training data set can also be obtained by manual shooting, or a web crawler can be used to obtain, from the Internet, image data sets of the classified targets under different scales, positions, and illuminations, automatically converting the related images into pictures of fixed size and fixed format, for example, into 32*32 jpg pictures.
Step S802: Perform a feature extraction operation, which includes using the neural network model to perform feature extraction on each image in the training data set to obtain multiple first layers.
Step S803: Obtain the first information amount corresponding to each first layer.
In one embodiment, obtaining the first information amount corresponding to each first layer includes: obtaining the number of executions of the feature extraction operation, and if that number is greater than a specified value, obtaining the first information amount corresponding to each first layer acquired this time. In other words, this application can keep a piece of record information that records the number of executions of the feature extraction operation, and the information amount acquisition operation is executed only when that number is greater than the specified value. In a specific embodiment, the ImageNet data set can be input to the neural network model to train the target detection model; before the final target detection model is obtained, the neural network model needs to be trained for multiple epochs, from which multiple layers are obtained, some acquired early in training and some later. Because the layers acquired by the neural network model early in training are usually noise feature maps, these noise feature maps can be removed directly, that is, their information amounts are not calculated.
The embodiment of this application does not add the information amount calculation module at the beginning of model training; instead, it adds the module after N epochs of training, that is, the first information amount of the first layers is obtained only when the number of executions of the feature extraction operation is greater than the specified value, where a first layer is a layer obtained by performing the current feature extraction operation. In addition, when the number of executions of the feature extraction operation is less than or equal to the specified value, the first information amount corresponding to the first layers acquired this time is not obtained, that is, the information amount acquisition operation is not performed. For example, if the number of executions of feature extraction is 18 and the specified value is 19, then 18 is less than 19 and the first information amount of each first layer acquired this time does not need to be calculated; if the number of executions is 20 and the specified value is 19, then 20 is greater than 19 and the first information amount of each first layer acquired this time needs to be calculated.
Step S804: From the multiple first layers, find first layers whose first information amount meets a preset condition, as first target layers.
Step S805: Obtain the loss data of each first layer in the first target layers, and train the neural network model in combination with the loss data to obtain a target detection model.
In one embodiment, after the loss data of the first layers are obtained, the gradient descent method can be used in combination with the loss data to minimize the loss function of the neural network model, while the weight parameters of the neural network model are adjusted backward layer by layer. The loss function can include a classification loss function and a localization loss function: the classification loss function is used to predict the category of the target to be detected, while the localization loss function is used to refine the finally obtained candidate frames; combining the two loss functions dynamically realizes the positioning of the final detection frame.
The target detection method proposed in the embodiment of this application adds the calculation of the amount of information when the target detection model is acquired, which can make target detection more accurate; meanwhile, after multiple candidate frames are obtained, this application can use non-maximum suppression to deduplicate them, so that a more accurate target detection result can be obtained, and the finally obtained target detection model can be applied to the target detection method. In addition, because this application can obtain multiple layers containing enough feature points, a larger and deeper feature extraction model is not needed when selecting the neural network model; such a model not only requires a lot of memory but also detects targets slowly. This application needs only a simple feature extraction model for target detection, so the target detection method proposed in this application can not only improve the accuracy of small-target detection but also effectively reduce its running time.
Referring to FIG. 9, an embodiment of this application proposes a target detection apparatus 900 for improving the accuracy of target detection. In a specific embodiment, the target detection apparatus 900 includes: a feature acquisition module 901, an information amount acquisition module 902, a target layer acquisition module 903, and a detection result acquisition module 904.
The feature acquisition module 901 is used to perform feature extraction on an image to be detected to obtain multiple layers of the image to be detected.
Further, the feature acquisition module 901 is also used to input a training data set to a neural network model and perform a feature extraction operation, where the feature extraction operation includes using the neural network model to perform feature extraction on each image in the training data set to obtain multiple first layers; to obtain the first information amount corresponding to each first layer; to find, from the multiple first layers, the first layers whose first information amount meets a preset condition, as first target layers; and to obtain the loss data of each first layer in the first target layers and train the neural network model in combination with the loss data to obtain a target detection model, the target detection model being used in the target detection method.
Further, the feature acquisition module 901 is also used to obtain the number of executions of the feature extraction operation, and if that number is greater than a specified value, obtain the first information amount corresponding to each first layer acquired this time.
The information amount acquisition module 902 is used to obtain the amount of information corresponding to each layer.
The target layer acquisition module 903 is used to find a layer whose amount of information meets a preset condition, as a target layer.
Further, the target layer acquisition module 903 is also used to find layers whose amount of information is greater than a preset threshold, as target layers; specifically, to find layers whose amount of information is greater than the preset threshold, as candidate layers, and to find, from the multiple candidate layers, the layers meeting a specified requirement, as target layers.
Further, the target layer acquisition module 903 is also used to determine the order of the candidate layers by amount of information in descending order to obtain a target sequence, and to use the top N layers in the target sequence as target layers, where N is a positive integer.
The detection result acquisition module 904 is used to obtain a target detection result by using the target layer.
Further, the detection result acquisition module 904 is also used to obtain the recognition result and candidate frames of each feature map in each target layer, where each candidate frame corresponds to one recognition result; to deduplicate the candidate frames to obtain the remaining candidate frames; and to obtain the target detection result according to the remaining candidate frames and their corresponding recognition results.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided by this application, the coupling between modules may be electrical, mechanical, or in other forms.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The integrated modules can be implemented in the form of hardware or in the form of software functional modules.
Referring to FIG. 10, which shows a structural block diagram of an electronic device 1000 provided by an embodiment of this application, the electronic device 1000 may be a smartphone, a tablet computer, an e-book reader, or another electronic device capable of running application programs. The electronic device 1000 in this application may include one or more of the following components: a processor 1010, a memory 1020, and one or more application programs, where the one or more application programs may be stored in the memory 1020 and configured to be executed by the one or more processors 1010, and the one or more programs are configured to execute the methods described in the foregoing method embodiments.
The processor 1010 may include one or more processing cores. The processor 1010 uses various interfaces and lines to connect the various parts of the electronic device 1000 and executes the various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1020 and calling data stored in the memory 1020. Optionally, the processor 1010 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 1010 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like, where the CPU mainly handles the operating system, user interface, application programs, and so on, the GPU is responsible for rendering and drawing display content, and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1010 and may instead be implemented by a separate communication chip.
The memory 1020 may include random access memory (RAM) or read-only memory (ROM). The memory 1020 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1020 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the electronic device 1000 in use (such as a phone book, audio and video data, and chat records).
Referring to FIG. 11, which shows a structural block diagram of a computer-readable storage medium 2000 provided by an embodiment of this application, the computer-readable storage medium 2000 stores program code, and the program code can be invoked by a processor to execute the methods described in the foregoing method embodiments.
The computer-readable storage medium 2000 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 2000 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 2000 has storage space for program code 2010 for executing any of the method steps of the above methods; the program code can be read from or written into one or more computer program products. The program code 2010 may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features therein, and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. A target detection method, comprising:
    performing feature extraction on an image to be detected to obtain multiple layers of the image to be detected;
    acquiring an amount of information corresponding to each of the layers;
    searching for a layer whose amount of information meets a preset condition, as a target layer; and
    obtaining a target detection result by using the target layer.
  2. The method according to claim 1, wherein the amount of information is the number of feature points contained in the layer; and
    searching for a layer whose amount of information meets a preset condition, as a target layer, comprises:
    from the multiple layers, searching for a layer whose amount of information is greater than a preset threshold, as the target layer.
  3. The method according to claim 2, wherein searching for a layer whose amount of information is greater than a preset threshold, as the target layer, comprises:
    from the multiple layers, searching for a layer whose amount of information is greater than the preset threshold, as a candidate layer; and
    from the multiple candidate layers, searching for a layer that meets a specified requirement, as the target layer.
  4. The method according to claim 3, wherein from the multiple candidate layers, searching for a layer that meets a specified requirement, as the target layer, comprises:
    determining the ranking of each candidate layer in descending order of the amount of information to obtain a target sequence; and
    taking the top N layers in the target sequence as target layers, wherein N is a positive integer.
  5. The method according to claim 2, wherein before, from the multiple layers, searching for a layer whose amount of information is greater than a preset threshold, as the target layer, the method comprises:
    determining an average of the amounts of information corresponding to all layers, and taking the average as the preset threshold.
  6. The method according to claim 1, wherein the layer comprises at least one feature map, and the target layer comprises at least one layer; and
    obtaining a target detection result by using the target layer comprises:
    acquiring a recognition result and candidate boxes of each feature map in each target layer, wherein each candidate box corresponds to one recognition result;
    deduplicating the candidate boxes to obtain remaining candidate boxes; and
    obtaining the target detection result according to the remaining candidate boxes and the recognition results corresponding to the remaining candidate boxes.
  7. The method according to claim 6, wherein deduplicating the candidate boxes to obtain remaining candidate boxes comprises:
    deduplicating the candidate boxes by non-maximum suppression to obtain the remaining candidate boxes, wherein the non-maximum suppression is used to suppress non-maximum elements.
  8. The method according to claim 6, wherein deduplicating the candidate boxes to obtain remaining candidate boxes comprises:
    determining the classification probability corresponding to each candidate box, and sorting the candidate boxes according to the classification probabilities to obtain a sorting result;
    determining, according to the sorting result, the candidate box with the largest classification probability, and taking that candidate box as the maximum candidate box;
    acquiring the degree of overlap between the maximum candidate box and the other candidate boxes, and determining whether the degree of overlap between the maximum candidate box and the other candidate boxes meets a preset condition; and
    if the degree of overlap between the maximum candidate box and the other candidate boxes meets the preset condition, removing the maximum candidate box to obtain the remaining candidate boxes.
  9. The method according to claim 8, wherein determining whether the degree of overlap between the maximum candidate box and the other candidate boxes meets a preset condition comprises:
    determining whether the degree of overlap between the maximum candidate box and the other candidate boxes is greater than an overlap threshold; and
    if it is greater, the degree of overlap between the maximum candidate box and the other candidate boxes meets the preset condition.
  10. The method according to claim 8, further comprising:
    if the degree of overlap between the maximum candidate box and the other candidate boxes does not meet the preset condition, retaining the candidate boxes that meet the preset condition, acquiring a new candidate box with the largest classification probability according to the sorting result, and repeating the candidate box deduplication process to obtain the remaining candidate boxes.
  11. The method according to claim 1, wherein before performing feature extraction on the image to be detected to obtain multiple layers, the method comprises:
    inputting a training data set into a neural network model;
    executing a feature extraction operation, wherein the feature extraction operation comprises performing feature extraction on each image in the training data set by using the neural network model to obtain multiple first layers;
    acquiring a first amount of information corresponding to each of the first layers;
    from the multiple first layers, searching for a first layer whose first amount of information meets a preset condition, as a first target layer; and
    acquiring loss data of each first layer among the first target layers, and training the neural network model with the loss data to obtain a target detection model, wherein the target detection model is used in the target detection method.
  12. The method according to claim 11, wherein acquiring a first amount of information corresponding to each of the first layers comprises:
    acquiring the number of executions of the feature extraction operation; and
    if the number of executions of the feature extraction operation is greater than a specified value, acquiring the first amount of information corresponding to each first layer obtained this time.
  13. The method according to claim 12, further comprising:
    if the number of executions of the feature extraction operation is less than or equal to the specified value, continuing to monitor the number of executions of the feature extraction operation, and executing the first information amount acquisition operation once the number of executions of feature extraction is greater than the specified value.
  14. The method according to claim 11, wherein training the neural network model with the loss data to obtain a target detection model comprises:
    minimizing the loss function of the neural network model according to the gradient descent method and the loss data; and
    adjusting the weight parameters of the neural network model backward layer by layer to obtain the target detection model.
  15. The method according to any one of claims 1 to 14, wherein acquiring an amount of information corresponding to each of the layers comprises:
    separating the feature maps in each layer by the number of channels to obtain multiple feature vectors; and
    acquiring a covariance matrix of each feature vector, and acquiring the amount of information corresponding to each layer according to the covariance matrix.
  16. The method according to claim 15, wherein the formula for calculating the amount of information is:
    Figure PCTCN2021075822-appb-100001
    wherein c is the number of channels and cov_ij is the covariance matrix.
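For reference only (and not as part of the claims): since the formula itself appears only as an image in the publication, the NumPy sketch below merely illustrates the stated procedure of channel-wise separation followed by a covariance computation; the final aggregation of the cov_ij entries into one score is an assumed stand-in for the unreproduced formula:

```python
import numpy as np

def layer_information(feature_map):
    # feature_map: array of shape (c, h, w), i.e. c channels.
    c, h, w = feature_map.shape
    # Separate the feature map by channel into c feature vectors.
    vectors = feature_map.reshape(c, h * w)
    # Covariance matrix across the c channel vectors: shape (c, c).
    cov = np.cov(vectors)
    # Assumed aggregation of cov_ij into a single amount of information.
    return float(np.abs(cov).sum() / (c * c))
```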
  17. The method according to any one of claims 1 to 14, wherein obtaining a target detection result by using the target layer comprises:
    determining position information of a target to be detected in the image to be detected, and determining category information of the target to be detected; and
    obtaining the target detection result according to the position information and the category information.
  18. A target detection apparatus, comprising:
    a feature acquisition module, configured to perform feature extraction on an image to be detected to obtain multiple layers of the image to be detected;
    an information amount acquisition module, configured to acquire an amount of information corresponding to each of the layers;
    a target layer acquisition module, configured to search for a layer whose amount of information meets a preset condition, as a target layer; and
    a detection result acquisition module, configured to obtain a target detection result by using the target layer.
  19. An electronic device, comprising:
    one or more processors;
    a memory; and
    one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to execute the method according to any one of claims 1 to 17.
  20. A computer-readable storage medium, wherein program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the method according to any one of claims 1 to 17.
PCT/CN2021/075822 2020-04-02 2021-02-07 Target detection method, device, electronic equipment and readable storage medium WO2021196896A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010256223.XA CN111444976A (zh) 2020-04-02 2020-04-02 Target detection method, device, electronic equipment and readable storage medium
CN202010256223.X 2020-04-02

Publications (1)

Publication Number Publication Date
WO2021196896A1 true WO2021196896A1 (zh) 2021-10-07

Family

ID=71651021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075822 WO2021196896A1 (zh) Target detection method, device, electronic equipment and readable storage medium 2020-04-02 2021-02-07

Country Status (2)

Country Link
CN (1) CN111444976A (zh)
WO (1) WO2021196896A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444976A (zh) 2020-04-02 2020-07-24 Oppo广东移动通信有限公司 Target detection method, device, electronic equipment and readable storage medium
CN115150614A (zh) * 2021-03-30 2022-10-04 中国电信股份有限公司 Image feature transmission method, apparatus, and system
CN113283322A (zh) * 2021-05-14 2021-08-20 柳城牧原农牧有限公司 Livestock trauma detection method, apparatus, device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4540661B2 (ja) * 2006-02-28 2010-09-08 三洋電機株式会社 Object detection device
CN105184771A (zh) * 2015-08-12 2015-12-23 西安斯凯智能科技有限公司 Adaptive moving target detection system and detection method
CN107301383B (zh) * 2017-06-07 2020-11-24 华南理工大学 Road surface traffic sign recognition method based on Fast R-CNN
JP2019061484A (ja) * 2017-09-26 2019-04-18 キヤノン株式会社 Image processing apparatus, control method therefor, and program
CN109740537B (zh) * 2019-01-03 2020-09-15 广州广电银通金融电子科技有限公司 Method and system for accurately annotating pedestrian image attributes in crowd video images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951055A (zh) * 2017-03-10 2017-07-14 广东欧珀移动通信有限公司 Display control method and apparatus for a mobile terminal, and mobile terminal
CN109308386A (zh) * 2018-09-11 2019-02-05 深圳市彬讯科技有限公司 Engineering drawing wall recognition method and apparatus, and electronic device
CN109582410A (zh) * 2018-10-17 2019-04-05 广州视源电子科技股份有限公司 Screenshot method, apparatus, device, and computer-readable storage medium
CN110796016A (zh) * 2019-09-30 2020-02-14 万翼科技有限公司 Engineering drawing recognition method, electronic device, and related products
CN111444976A (zh) * 2020-04-02 2020-07-24 Oppo广东移动通信有限公司 Target detection method, device, electronic equipment and readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114030958A (zh) * 2021-10-27 2022-02-11 北京云迹科技有限公司 Elevator scheduling method, apparatus, device, and medium
CN116664828A (zh) * 2023-04-15 2023-08-29 北京中科航星科技有限公司 Intelligent equipment image information processing system and method
CN116664828B (zh) * 2023-04-15 2023-12-15 北京中科航星科技有限公司 Intelligent equipment image information processing system and method

Also Published As

Publication number Publication date
CN111444976A (zh) 2020-07-24

Similar Documents

Publication Publication Date Title
WO2021196896A1 (zh) Target detection method, device, electronic equipment and readable storage medium
WO2021169723A1 (zh) Image recognition method and apparatus, electronic device, and storage medium
CN108475331B (zh) Method, apparatus, system, and computer-readable medium for object detection
CN112733749B (zh) Real-time pedestrian detection method incorporating an attention mechanism
WO2020164282A1 (zh) YOLO-based image target recognition method and apparatus, electronic device, and storage medium
CN108596944B (zh) Method, apparatus, and terminal device for extracting a moving target
WO2018103608A1 (zh) Text detection method, apparatus, and storage medium
CN105184763B (zh) Image processing method and apparatus
WO2019114036A1 (zh) Face detection method and apparatus, computer apparatus, and computer-readable storage medium
US8750573B2 (en) Hand gesture detection
US9483701B1 (en) System and method for using segmentation to identify object location in images
WO2022033095A1 (zh) Text region positioning method and apparatus
CN111797709B (zh) Real-time dynamic gesture trajectory recognition method based on regression detection
CN106156777B (zh) Text image detection method and apparatus
JP2003016448A (ja) Event clustering of images using foreground/background segmentation
Aytekin et al. Visual saliency by extended quantum cuts
CN109948457B (zh) Real-time target recognition method based on convolutional neural networks and CUDA acceleration
Singh et al. Content-based image retrieval based on supervised learning and statistical-based moments
Singh et al. A novel position prior using fusion of rule of thirds and image center for salient object detection
Hu et al. RGB-D image multi-target detection method based on 3D DSF R-CNN
KR102576157B1 (ko) High-speed object detection method and apparatus using an artificial neural network
CN112560856B (zh) License plate detection and recognition method, apparatus, device, and storage medium
Zhu et al. Scene text relocation with guidance
CN114724175B (zh) Pedestrian image detection network, detection method, training method, electronic device, and medium
CN114639143B (zh) Artificial intelligence-based portrait archiving method, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21781500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21781500

Country of ref document: EP

Kind code of ref document: A1