CN111275082A - Indoor object target detection method based on improved end-to-end neural network


Info

Publication number: CN111275082A
Application number: CN202010039334.5A
Authority: CN (China)
Prior art keywords: target, image, neural network, convolutional neural, training set
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 陈略峰, 吴敏, 曹卫华, 张平平
Current Assignee: China University of Geosciences
Original Assignee: China University of Geosciences
Application filed by: China University of Geosciences
Priority date / Filing date: 2020-01-14
Publication date: 2020-06-12
Family ID: 71003008

Classifications

    • G06F 18/24 (Pattern recognition; analysing; classification techniques)
    • G06F 18/214 (Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting)
    • G06N 3/045 (Neural networks; architectures, e.g. interconnection topology; combinations of networks)
    • G06N 3/08 (Neural networks; learning methods)
    • G06V 2201/07 (Indexing scheme relating to image or video recognition or understanding; target detection)


Abstract

The invention discloses an indoor object target detection method based on an improved end-to-end neural network. The method comprises: marking each target in a training set with labeling boxes to obtain the category and position information of each target in the training set; initializing a convolutional neural network and preprocessing the training set; dividing each preprocessed training-set image into M × N grid cells; selecting initial candidate boxes using the grid cells; detecting targets in each grid cell to obtain the class confidences of the target categories; setting the output of the convolutional neural network according to the class confidences to obtain the final prediction boxes; training the convolutional neural network to obtain the trained network; and testing the image of the target to be detected with the trained convolutional neural network to determine the category and location of the target object. The invention introduces a feature extraction scheme of pooling first and convolving afterwards into the neural network, reducing the loss of feature information while achieving fast indoor target detection.

Description

Indoor object target detection method based on improved end-to-end neural network
Technical Field
The invention relates to the field of image recognition, and in particular to an indoor object target detection method based on an improved end-to-end neural network.
Background
An intelligent robot must operate in a complex environment in which surroundings, climate, weather, illumination and scenery change in real time, and external factors such as pedestrians and obstacles with varying postures and unpredictable motions may be present during operation. These factors pose great challenges to the robot, which makes research on environment perception algorithms for intelligent mobile robots both significant and difficult. Indoor space is a common working scene for the intelligent emotional robot. Compared with outdoor environments, indoor environments are often more cluttered, making it harder for the robot to understand its surroundings. In addition, the individual demands of people in modern society give everyday articles diverse and varied shapes, which is another challenge for environment understanding. Building descriptions of the objects in the environment and of the relations between an object and its surroundings is of great significance for task execution by an emotional robot. For example, robot navigation requires recognizing and locating objects; face and gesture interaction requires perceiving the surrounding environment (including objects and people); and interacting persons must be recognized and tracked. Establishing environment perception is an essential step for the robot to build a cognitive model of its environment, and it provides information support for the robot's subsequent diversified operations. Scene objects typically include people, tables, chairs and the like; when they appear in the same scene, especially in a complex indoor environment, the difficulty of detection increases significantly. Therefore, accurately detecting objects in a complex indoor environment is one of the difficulties of environment perception technology.
Indoor object target detection comprises three parts: extraction of candidate boxes, detection of the target to be detected, and recognition and localization of the object target. Object detection technology has developed over decades of research and has made great progress in both detection accuracy and speed. Mainstream detection approaches include Deformable Part Models (DPM), deep networks (DN) and decision forests (DF). Traditional detection methods rely on hand-designed feature extractors, achieving object detection by training classifiers on features such as Haar features, Histograms of Oriented Gradients (HOG) and Local Binary Patterns (LBP). However, hand-designed detection features can hardly adapt to the large appearance changes of dynamic objects. Deep networks can learn features directly from image pixels, improving object detector performance. Deep networks are also widely applied in pedestrian detection; with the construction of large-scale training datasets and the continuous growth of hardware computing power, deep network architectures have achieved great success in diverse visual tasks. In target detection, the main approaches are two-stage detectors, such as the R-CNN (Region-CNN), Fast R-CNN and Faster R-CNN series, and one-stage detectors, such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector) and CornerNet, which have pushed the accuracy and speed of target detection to new heights. The YOLO neural network is one of the best target detection architectures at present, and it is particularly notable for its real-time detection performance.
Learning-based feature representation has received wide attention and research. Compared with hand-designed features, deep learning features are extracted directly from raw image pixels through a deep network structure, which turns the feature design problem into a network architecture problem. This greatly reduces unnecessary feature design details, and the high-level feature maps of a deep neural network exhibit certain semantic properties; deep learning based methods have achieved the best results in related international benchmarks such as PASCAL VOC and the ImageNet Large Scale Visual Recognition Challenge. Although deep learning yields more essential feature representations, training such networks requires a large amount of data because of the large number of parameters to learn, so the computation is heavy and needs further optimization.
Indoor object target detection can be applied to the environment-perception information processing of an emotional robot system, further improving the machine's cognitive and decision-analysis abilities and enhancing the intelligence and adaptability of human-computer interaction. In particular, by perceiving and reacting to the environment on the basis of visual information of different modalities, richer information can be obtained, creating conditions for higher-level machine intelligence.
Disclosure of Invention
The technical problem to be solved by the invention is the low processing speed and heavy computational load of the prior art; to this end, the invention provides an indoor object target detection method based on an improved end-to-end neural network.
The technical scheme adopted by the invention to solve this technical problem is as follows: an indoor object target detection method based on an improved end-to-end neural network is constructed, comprising the following steps:
S1, constructing an end-to-end convolutional neural network comprising a plurality of pooling layers for reducing image resolution, a plurality of convolutional layers for extracting image features, 1 fully connected layer and 1 classification output layer;
S2, acquiring a target image dataset, constructing a training set based on it, annotating each image in the training set with labeling boxes, and determining the category and position information of each predefined target in the training-set images;
S3, inputting the training set annotated with labeling boxes into the convolutional neural network constructed in step S1 and initializing the network; the input image data first passes through 1 pooling layer that reduces the image resolution, is then fed into the convolutional layer connected to that pooling layer for image feature extraction, has its feature vectors weighted and combined by the fully connected layer, and finally passes through the classification output layer, thereby preprocessing the training-set images;
S4, dividing each image in the preprocessed training set into M × N grid cells; selecting initial candidate boxes for each image using the M × N grid cells obtained by the division, each grid cell randomly generating B initial candidate boxes, for M × N × B initial candidate boxes in total;
S5, detecting the predefined targets in each grid cell obtained by the division to obtain M × N × B class confidences of the target categories; setting the output of the convolutional neural network according to the obtained class confidences and determining the final target prediction boxes;
S6, taking the training set annotated with labeling boxes as the input of the convolutional neural network and the target prediction boxes obtained in step S5 as its output, and training the convolutional neural network to obtain the convolutional neural network finally used for target detection;
S7, inputting the image on which target detection is to be performed into the convolutional neural network trained in step S6 and carrying out indoor object target detection.
Implementing the indoor object target detection method based on the improved end-to-end neural network yields the following beneficial effects:
1. the invention designs an improved end-to-end neural network model and introduces a feature extraction scheme of pooling first and convolving afterwards, reducing the loss of feature information while achieving fast indoor target detection;
2. while the model is improved and fine-tuned, detection and result optimization are carried out on a self-made picture dataset of the emotional robot's human-computer-interaction indoor environment and on the VOC2007 dataset, improving detection performance in the indoor environment;
3. the improved end-to-end neural network model of the invention is verified and analyzed through experimental results: the learning model can be converted from a general model into a specific model for target detection, and the scene categories related to environmental information can be continuously enriched, so that the dataset is enriched and applied to an emotional robot interaction system.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a diagram of the indoor object target detection process based on the improved end-to-end convolutional neural network model of the present invention;
FIG. 2 is a diagram of the improved end-to-end neural network model architecture;
FIG. 3 is a comparison graph of target recognition at different grid scales.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
The detection method based on the improved end-to-end convolutional neural network model unifies candidate box extraction, feature extraction, target classification and target localization in a single neural network: the constructed network extracts candidate regions from the image and predicts the positions and probabilities of targets from the features of the whole image. The indoor-environment target detection problem is thereby converted into a regression problem, truly realizing end-to-end detection.
Please refer to FIG. 1, which illustrates the indoor object target detection process based on the improved end-to-end convolutional neural network model of the present invention. The input image is divided into M × N grid cells, and each cell is given B initial candidate boxes of different specifications; the parameters M, N and B are all positive integers greater than or equal to 1. As shown in FIG. 1, prediction candidate boxes are extracted via the convolutional network, and the number of candidate boxes per image is M × N × B.
The specific implementation steps are as follows:
Step 1: construct an end-to-end convolutional neural network comprising a plurality of pooling layers for reducing image resolution, a plurality of convolutional layers for extracting image features, 1 fully connected layer and 1 classification output layer. Referring to FIG. 2, the improved end-to-end convolutional neural network model provided in this embodiment includes 18 convolutional layers for extracting image features, 6 pooling layers for reducing image resolution, 1 classification output layer and 1 fully connected layer. With this structure, on the one hand, the loss of feature information is reduced by the fully connected layer attached to the convolutional layers; on the other hand, a feature extraction scheme of pooling first and convolving afterwards is introduced into the improved end-to-end network, reducing the loss of feature information while achieving fast indoor target detection. Finally, the convolutional neural network module provided by this embodiment converts the detection problem into a regression problem, truly realizing end-to-end detection.
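To make the pooling-before-convolution scheme concrete, the following sketch is a minimal PyTorch illustration under our own assumptions (the channel widths, kernel sizes and 5-stage depth are ours, not the patent's exact 18-convolution/6-pooling layout). It stacks stages in which a pooling layer halves the resolution before the subsequent convolutions extract features, ending in the fully connected layer and the output layer:

```python
import torch
import torch.nn as nn

class PoolThenConvBlock(nn.Module):
    """One 'pool first, then convolve' stage: halve the resolution, then extract features."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(2, 2)                # reduce image pixels first
        self.conv = nn.Sequential(                    # then extract image features
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.conv(self.pool(x))

class ToyEndToEndDetector(nn.Module):
    """Illustrative skeleton: 5 pool-then-conv stages (448 -> 14), a fully
    connected layer, and an output layer sized M x N x B x 6 per the description."""
    def __init__(self, grid=14, boxes=2, vals=6):
        super().__init__()
        chans = [16, 32, 64, 128, 256, 256]           # assumed widths, not from the patent
        self.stem = nn.Conv2d(3, chans[0], 3, padding=1)
        self.stages = nn.Sequential(*[PoolThenConvBlock(chans[i], chans[i + 1])
                                      for i in range(5)])
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(chans[-1] * grid * grid, 1024),       # fully connected layer
            nn.LeakyReLU(0.1),
            nn.Linear(1024, grid * grid * boxes * vals),    # classification output layer
        )
        self.shape = (grid, grid, boxes, vals)

    def forward(self, x):                             # x: (batch, 3, 448, 448)
        return self.fc(self.stages(self.stem(x))).view(-1, *self.shape)

out = ToyEndToEndDetector()(torch.randn(1, 3, 448, 448))
print(out.shape)   # torch.Size([1, 14, 14, 2, 6])
```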
Step 2: acquiring a target image data set, constructing a training set based on the target image data set, labeling a labeling frame of each image in the training set, and determining the category and position information of each predefined target in the images of the training set;
the target image data set comprises an image data set for carrying out an indoor interaction environment of the emotional robot and a VOC2007 related data set, wherein the data set of the indoor environment is manufactured according to the two data sets, the data set is divided into a training set and a testing set, each image in the training set is labeled through labeling frame labeling software, each target in the images of the training set is determined, and the category and the position information of each target in the images of the training set are obtained; wherein:
in this step, the construction process of the training set and the test set is as follows: ten thousand images are selected from the collected indoor environment images and VOC data sets of the emotional robot as data sets, eight thousand images in the data sets are used as training sets, and the remaining two thousand images are used as testing sets; the training set is used for subsequent convolutional neural network training, and the test set is used for testing the accuracy of finally obtained positioning data when the training set is input into the trained convolutional neural network;
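A minimal sketch of this 8000/2000 split; the shuffling, seed and id format are our own illustrative choices, since the patent only fixes the counts:

```python
import random

def split_dataset(image_ids, train_n=8000, test_n=2000, seed=0):
    """Split the ten thousand collected images into training and test sets."""
    assert len(image_ids) >= train_n + test_n
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)                 # random but reproducible split
    return ids[:train_n], ids[train_n:train_n + test_n]

train_ids, test_ids = split_dataset([f"{i:06d}" for i in range(1, 10001)])
print(len(train_ids), len(test_ids))                 # 8000 2000
```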
In this step, the predefined targets are set as follows: according to the emotional robot's interaction scenes and objects, the predefined targets are four classes present in the images, namely pedestrians, chairs on which people sit, tables and computer displays. Since the existing PASCAL VOC 2007 covers 20 object classes, in this embodiment, to improve detection accuracy, unnecessary labels are removed to suit the recognition of target objects in the indoor environment; the 4 object classes defined above (chairs, tables, people and computer displays) are regarded as the predefined specific model. The output of the last layer (the classification output layer) of the convolutional neural network constructed in the invention corresponds directly to the labels, so this can be realized by controlling the number of outputs of that layer;
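The label reduction and the resulting output size can be sketched as follows. The mapping of the four indoor targets onto the VOC class names 'person', 'chair', 'diningtable' and 'tvmonitor' is our assumption, and the 6-values-per-box output layout is taken from step 6 below:

```python
# 20 PASCAL VOC 2007 classes, of which only 4 indoor targets are kept.
VOC_CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
               "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
               "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
INDOOR_CLASSES = ["person", "chair", "diningtable", "tvmonitor"]  # assumed mapping

M, N, B = 14, 14, 2        # grid cells and candidate boxes per cell
VALS_PER_BOX = 6           # [X, Y, W, H, Conf(Object), Conf]

# The classification output layer's size follows directly from this layout:
print(M * N * B * VALS_PER_BOX)   # 2352 output values per image
```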
In this step, the labeling boxes are annotated as follows: each target in the training-set images (pedestrians, chairs on which people sit, tables and computer displays) is marked with a labeling box to obtain the category and position information of every indoor-environment object target in the training-set images. The category information is the class to which the name of the object target belongs, and the position information consists of the coordinates of the center point of the labeling box together with its width and height; the category and position information obtained at this point is stored in xml format in the Annotations folder.
After the completed xml annotation files are converted into txt files suitable for target detection with the improved end-to-end neural network model, a folder for storing the dataset is created under HOME, and three sub-folders named Annotations, ImageSets and JPEGImages are created inside it. The indoor image files are uniformly converted to 'jpg' format, renamed starting from '000001.jpg' following the official PASCAL VOC naming scheme, and finally stored in the JPEGImages folder.
Annotating the image data, i.e. labeling the category and position information of the targets, specifically comprises: saving the annotation information as a file of the same name in 'xml' format and storing it in the Annotations folder; generating a training sample set and a test sample set from the existing data in proportion, generating a 'train.txt' file and a 'test.txt' file that store the absolute path information of the training and test samples, and placing these 'txt' files in the Main folder under the ImageSets folder.
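The patent does not spell out the xml-to-txt conversion; the sketch below assumes the standard PASCAL VOC xml layout and a common normalized '<class> <cx> <cy> <w> <h>' txt line format:

```python
import xml.etree.ElementTree as ET

CLASSES = ["person", "chair", "diningtable", "tvmonitor"]  # assumed VOC names for the 4 targets

def voc_xml_to_txt(xml_path, txt_path):
    """Convert one PASCAL VOC annotation file into one txt line per kept box."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:              # drop the unnecessary VOC labels
            continue
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # Center coordinates plus width/height, normalized by the image size.
        cx, cy = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
        w, h = (xmax - xmin) / img_w, (ymax - ymin) / img_h
        lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```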
Step 3: input the training set annotated with labeling boxes into the convolutional neural network constructed in step 1 and initialize the network; the input image data first passes through 1 pooling layer that reduces the image resolution, is then fed into the convolutional layer connected to that pooling layer for image feature extraction, has its feature vectors weighted and combined by the fully connected layer, and finally passes through the classification output layer, thereby preprocessing the training-set images; wherein:
the convolutional neural network is initialized, the training set annotated with labeling boxes is input into it, and the images in the training set are preprocessed. The preprocessing comprises one or more of rotation, contrast enhancement, tilting and scaling; after preprocessing the images carry a certain amount of distortion, and training on the distorted images increases the accuracy of the final image recognition.
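A minimal sketch of these four distortions using Pillow; the concrete angles, factors and probabilities are our own illustrative choices, and in practice the labeling boxes would have to be transformed together with the geometric distortions (omitted here):

```python
import random
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> Image.Image:
    """Apply one or more of: rotation, contrast enhancement, tilt (shear), scaling."""
    if random.random() < 0.5:                            # rotation
        img = img.rotate(random.uniform(-15, 15))
    if random.random() < 0.5:                            # contrast enhancement
        img = ImageEnhance.Contrast(img).enhance(random.uniform(0.8, 1.5))
    if random.random() < 0.5:                            # tilt via an affine shear
        shear = random.uniform(-0.2, 0.2)
        img = img.transform(img.size, Image.Transform.AFFINE,
                            (1, shear, 0, 0, 1, 0))
    if random.random() < 0.5:                            # scaling
        s = random.uniform(0.8, 1.2)
        img = img.resize((int(img.width * s), int(img.height * s)))
    return img
```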
Step 4: in this embodiment, each image in the preprocessed training set is divided into 14 × 14 grid cells; as in YOLO, the grid cells are used to detect the target objects, and the initial candidate boxes are selected with them. Each grid cell randomly generates two initial candidate boxes (alternatively, their widths and heights are defined in advance from experience), for 14 × 14 × 2 candidate boxes in total; the image size is the size specified by the neural network model.
This embodiment changes the grid partition produced by the multi-layer convolution and pooling from the original 7 × 7 to 14 × 14 to increase the size of the network feature map. FIG. 3 compares target recognition under different grid scales: the left side shows target recognition with a 7 × 7 grid and the right side with a 14 × 14 grid. As FIG. 3 shows, the system can predict only 1 target with the 7 × 7 grid, whereas the improved scheme proposed in this embodiment recognizes 2 targets with the 14 × 14 grid. When an image contains multiple target objects, and in particular small ones, the finer grid increases the ability to extract small-target features, so small targets can be recognized. The various targets are the elements that make up different environments, and environments can be distinguished through the recognition of these objects.
In the improved end-to-end convolutional neural network model of the invention, the selected input image size is smaller than that of the image to be detected, which guarantees processing speed and allows rapid class recognition; generally 448 × 448 or 416 × 416 is chosen.
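To make the grid responsibility rule concrete (the cell containing an object's center is the one that detects it, as stated in step 6 below), here is a short sketch under assumed normalized center coordinates; the function and variable names are ours:

```python
def responsible_cell(cx, cy, grid_m=14, grid_n=14):
    """Return the (row, col) of the grid cell containing the box center.

    cx, cy are the labeling-box center coordinates normalized to [0, 1).
    """
    col = min(int(cx * grid_n), grid_n - 1)
    row = min(int(cy * grid_m), grid_m - 1)
    return row, col

# Example: a chair centered at (0.53, 0.71) falls into cell (9, 7) of a 14x14 grid.
print(responsible_cell(0.53, 0.71))   # (9, 7)
```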
Step 5: detect the predefined targets in each grid cell obtained by the division to obtain the 14 × 14 × 2 class confidences of the target categories; set the output of the convolutional neural network according to the obtained class confidences and determine the final target prediction boxes.
In this step, the target prediction boxes are generated as follows (a code sketch follows this list):
(1) first, an initial detection box is generated from the initial preset coordinate points;
(2) next, a dynamic detection box is predicted: the generated detection box is iteratively refined to produce the latest detection box;
(3) then, the coincidence degree of the latest detection box is calculated; if it is greater than or equal to the preset coincidence threshold, the latest detection box is kept; if it is smaller than the threshold, the dynamic detection box prediction continues;
(4) finally, based on the coincidence degree of the detection boxes, the latest detection box that was kept is taken as the target prediction box for detecting the object.
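The patent gives no code for this loop; the sketch below is one possible reading of it, taking the coincidence degree to be the intersection-over-union ratio defined in this step and using a caller-supplied refine_step function as a stand-in for the network's iterative prediction:

```python
def iou(a, b):
    """Coincidence degree: intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / max(area_a + area_b - inter, 1e-9)

def predict_box(initial_box, refine_step, threshold=0.5, max_iters=10):
    """Iteratively refine a detection box until its overlap with the previous
    iterate reaches the preset coincidence threshold, then keep it.
    threshold and max_iters are illustrative values, not from the patent."""
    box = initial_box
    for _ in range(max_iters):
        new_box = refine_step(box)           # stand-in for the network's prediction
        if iou(new_box, box) >= threshold:   # stable enough: keep as prediction box
            return new_box
        box = new_box                        # otherwise continue predicting
    return box
```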
In this step, the class confidence of the target category is calculated as follows:
target detection is carried out on the basis of the target prediction boxes, and for each target prediction box it is predicted whether the target to be discriminated is present; the discrimination result is given by the confidence $\mathrm{Conf(Object)}$, calculated as

$$\mathrm{Conf(Object)} = \Pr(\mathrm{Object}) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$$

where $\Pr(\mathrm{Object})$ indicates whether an object falls into the grid cell corresponding to the candidate box. If an object falls into the cell, the target confidence of the corresponding candidate box is $\mathrm{Conf(Object)} = \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$; otherwise, the candidate box is determined to contain no object and $\mathrm{Conf(Object)} = 0$. The target confidence can therefore be written as

$$\mathrm{Conf(Object)} = \begin{cases} \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}, & \text{if an object falls into the cell} \\ 0, & \text{otherwise,} \end{cases}$$

where $\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$ denotes the ratio of the intersection area to the union area of the predicted box and the ground-truth box:

$$\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} = \frac{\mathrm{area}(B_{\mathrm{pred}} \cap B_{\mathrm{truth}})}{\mathrm{area}(B_{\mathrm{pred}} \cup B_{\mathrm{truth}})}$$
Step 6: take the training-set images annotated with labeling boxes in step 2 as the input of the convolutional neural network and the training-set images with the final target prediction boxes obtained in step 5 as its output, and train the convolutional neural network to obtain the final weights and the trained network. Training the convolutional neural network comprises the following steps:
(1) first, an image to be detected is received and resized according to a preset requirement to generate a first detection image; the first detection image is input into the convolutional neural network for matching and recognition, generating initial candidate boxes, classification identification information and the classification probability values corresponding to that information. During training, each picture in the dataset is annotated with the center coordinates of its objects; when an object falls into a certain grid cell, that cell is responsible for detecting the object, and the two candidate boxes generated by that cell share the category;
(2) next, based on the classification probability values, it is determined whether each initial candidate box has recognized the target object, and the initial candidate boxes that successfully recognize the target object are taken as target prediction boxes. Prediction and discrimination of the target object are performed on the obtained target prediction boxes: with the conditional probability of predicting the target object set to $\Pr(\mathrm{Person} \mid \mathrm{Object})$, the confidence $\mathrm{Conf}$ of the target object in a target prediction box is defined as

$$\mathrm{Conf} = \Pr(\mathrm{Person} \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$$

where $\Pr(\mathrm{Object})$ is used to judge whether an object falls into the grid cell corresponding to the target prediction box, and $\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$ represents the ratio of the intersection area to the union area of the predicted box and the ground-truth box.
It should further be noted that if the probability of recognizing the object in a detection box exceeds the classification probability value, the indoor object is enclosed by the detection box and the object in the picture has been recognized; if the classification probability value is smaller than the preset classification probability threshold, recognition is repeated until the classification probability value exceeds that threshold. The neural network model performs multi-layer convolution operations on the image.
(3) finally, for each target prediction box, the probability of the target object and the position of the bounding box are predicted; the predicted values output for each target prediction box are
[X, Y, W, H, Conf(Object), Conf]
where X and Y are the offsets of the predicted box center relative to its grid cell boundary, and W and H are the width and height of the predicted box as ratios of the whole image. For each input image, the final network output is the M × N × B × [X, Y, W, H, Conf(Object), Conf] vector.
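A sketch of decoding this output back into absolute-coordinate boxes; this is a plain NumPy illustration under our assumptions about the normalization, since the patent only fixes the 6-value layout:

```python
import numpy as np

def decode_output(pred, img_w, img_h):
    """Decode an (M, N, B, 6) network output into absolute-coordinate boxes.

    Each box is [X, Y, W, H, Conf(Object), Conf]: X, Y are center offsets
    within the grid cell, W, H are width/height ratios of the whole image.
    """
    M, N, B, _ = pred.shape
    boxes = []
    for row in range(M):
        for col in range(N):
            for b in range(B):
                x, y, w, h, conf_obj, conf = pred[row, col, b]
                cx = (col + x) / N * img_w            # cell offset -> absolute center
                cy = (row + y) / M * img_h
                bw, bh = w * img_w, h * img_h         # ratios -> absolute size
                boxes.append((cx - bw / 2, cy - bh / 2,
                              cx + bw / 2, cy + bh / 2, conf_obj, conf))
    return boxes

# Example with random output for a 14x14 grid and 2 boxes per cell:
boxes = decode_output(np.random.rand(14, 14, 2, 6), img_w=448, img_h=448)
```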
Step 7: test the indoor-environment images of the test set with the trained convolutional neural network and the final weights to determine the target categories and locations in the indoor environment.
The invention provides an indoor object target detection method based on an improved end-to-end neural network model and uses a deep neural network to carry out object target detection experiments in the emotional robot interaction environment. The end-to-end neural network model is improved, and the experimental results are verified and analyzed. The experimental results show that the improved end-to-end model raises the average precision of object detection on the self-made dataset. Based on the deep neural network, the learning model can be converted from a general model into a specific model for target detection; the scene categories related to environmental information can be continuously enriched, so that the dataset is enriched and applied to an emotional robot interaction system.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. An indoor object target detection method based on an improved end-to-end neural network, characterized by comprising the following steps:
S1, constructing an end-to-end convolutional neural network comprising a plurality of pooling layers for reducing image resolution, a plurality of convolutional layers for extracting image features, 1 fully connected layer and 1 classification output layer;
S2, acquiring a target image dataset, constructing a training set based on the target image dataset, annotating each image in the training set with labeling boxes, and determining the category and position information of each predefined target in the training-set images;
S3, inputting the training set annotated with labeling boxes into the convolutional neural network constructed in step S1 and initializing the network; the input data first passes through 1 pooling layer that adjusts the image resolution, is then fed into the convolutional layer connected to that pooling layer for image feature extraction, has its feature vectors weighted and combined by the fully connected layer, and outputs results through the classification output layer, thereby preprocessing the training-set images;
S4, dividing each image in the preprocessed training set into M × N grid cells; selecting initial candidate boxes for each image using the M × N grid cells obtained by the division, each grid cell randomly generating B initial candidate boxes, for M × N × B initial candidate boxes in total, wherein the parameters M, N and B are all positive integers greater than or equal to 1;
S5, detecting the predefined targets in each grid cell obtained by the division to obtain M × N × B class confidences of the target categories; setting the output of the convolutional neural network according to the obtained class confidences and determining the final target prediction boxes;
S6, taking the training set annotated with labeling boxes as the input of the convolutional neural network and the target prediction boxes obtained in step S5 as its output, and training the convolutional neural network to obtain the final convolutional neural network for indoor object target detection;
S7, inputting the image on which indoor object target detection is to be performed into the convolutional neural network trained in step S6 to obtain the target detection result.
2. The indoor object target detection method according to claim 1, wherein in step S2 the target image dataset comprises an image dataset of the emotional robot's indoor interaction environment and the VOC2007 dataset, and image annotation software is used to annotate each image in the training set with labeling boxes to obtain the category and position information of each target in the training-set images.
3. The indoor object target detection method according to claim 2, wherein the predefined targets are set, according to the emotional robot's interaction scenes and objects, to the pedestrians, chairs on which people sit, tables and computer displays contained in the image.
4. The indoor object target detection method according to claim 1, wherein in step S4 each preprocessed training-set image is divided into 14 × 14 grid cells; initial candidate boxes are selected with the grid cells, 2 initial candidate boxes being randomly generated in each grid cell, for 14 × 14 × 2 initial candidate boxes in total.
5. The indoor object target detection method according to claim 1, wherein in step S5 target detection is performed on the target prediction boxes, whether the target to be discriminated exists in each target prediction box is predicted on the basis of the confidence $\mathrm{Conf(Object)}$, and the confidence of any target prediction box in which no target exists is set to 0; wherein the confidence is defined by

$$\mathrm{Conf(Object)} = \Pr(\mathrm{Object}) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$$

where $\Pr(\mathrm{Object})$ is used to judge whether an object falls into the grid cell corresponding to the target prediction box; if a target object exists in the grid cell, the target confidence is set to $\mathrm{Conf(Object)} = \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$; otherwise, it is determined that no target object exists in the target prediction box and the confidence is set to $\mathrm{Conf(Object)} = 0$; $\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$ denotes the ratio of the intersection area to the union area of the predicted box and the ground-truth box.
6. The indoor object target detection method according to claim 1, wherein in step S6 the training of the convolutional neural network comprises the following steps:
S51, receiving an image to be detected and resizing it according to a preset requirement to generate a first detection image; inputting the first detection image into the convolutional neural network for matching and recognition to generate initial candidate boxes, classification identification information and the classification probability values corresponding to that information;
S52, determining, based on the classification probability values, whether each initial candidate box has recognized the target object, and taking the initial candidate boxes that successfully recognize the target object as target prediction boxes; performing prediction and discrimination of the target object on the obtained target prediction boxes, setting the conditional probability of predicting the target object to $\Pr(\mathrm{Person} \mid \mathrm{Object})$, and defining the confidence $\mathrm{Conf}$ of the target object in a target prediction box as

$$\mathrm{Conf} = \Pr(\mathrm{Person} \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$$

where $\Pr(\mathrm{Object})$ is used to judge whether an object falls into the grid cell corresponding to the target prediction box, and $\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$ represents the ratio of the intersection area to the union area of the predicted box and the ground-truth box;
S53, predicting, for each target prediction box, the probability of the target object and the position of the bounding box, the predicted values output for each target prediction box being
[X, Y, W, H, Conf(Object), Conf]
where X and Y are the offsets of the predicted box center relative to its grid cell boundary and W and H are the width and height of the predicted box as ratios of the whole image; for each input image, the final network output is the M × N × B × [X, Y, W, H, Conf(Object), Conf] vector.

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2020-06-12)