CN111753581A - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number
CN111753581A
Authority
CN
China
Prior art keywords
feature
image
detection
detected
module
Prior art date
Legal status
Pending
Application number
CN201910239061.6A
Other languages
Chinese (zh)
Inventor
徐法明
何洪亮
林建华
王进
Current Assignee
Rainbow Software Co ltd
ArcSoft Corp Ltd
Original Assignee
Rainbow Software Co ltd
Priority date
Filing date
Publication date
Application filed by Rainbow Software Co., Ltd.
Priority to CN201910239061.6A
Publication of CN111753581A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Abstract

The invention discloses a target detection method and a target detection device. Wherein, the method comprises the following steps: acquiring an image to be detected; inputting an image to be detected into a preset model for detection to obtain a detection result, wherein the detection comprises depth feature extraction, and the depth feature extraction is used for extracting abstract feature expression layer by layer from the image to be detected to obtain a feature space. The method solves the technical problem that in the prior art, model training is time-consuming in the process of detecting the target through the image.

Description

Target detection method and device
Technical Field
The invention relates to the field of machine learning, in particular to a target detection method and device.
Background
With the development of computer technology, image processing technology provides convenience for people's life and work. Image processing technology enhances the visual effect of a picture through various algorithms, such as contrast enhancement, noise reduction and the like. Currently, the most common applications of image processing techniques include face detection and recognition, e.g., payment by face recognition or clocking in at work.
However, the current detection algorithm still needs to be improved in terms of model training and detection speed.
Disclosure of Invention
The embodiment of the invention provides a target detection method and a target detection device, which at least solve the technical problem that in the prior art, model training is time-consuming in the process of detecting a target through an image.
According to an aspect of an embodiment of the present invention, there is provided a target detection method, including: acquiring an image to be detected; inputting an image to be detected into a preset model for detection to obtain a detection result, wherein the detection comprises depth feature extraction, and the depth feature extraction is used for extracting abstract feature expression layer by layer from the image to be detected to obtain a feature space.
Further, the feature space includes a plurality of feature vectors.
Further, the target detection method further comprises the steps of carrying out first processing on the image to be detected through the convolution layer to obtain a first characteristic; performing second processing on the first characteristic through the pooling layer to obtain a second characteristic; and extracting abstract feature expression by performing at least one first processing and at least one second processing to obtain a feature space.
Further, the detecting further includes feature alignment, and the feature alignment performs feature sampling on the abstract feature expression using different sampling rates to obtain a size-aligned third feature.
Further, the detecting further comprises feature abstraction and fusion, the feature abstraction and fusion comprising: performing a plurality of branch convolution operations on the third characteristic to obtain a plurality of branch results; performing point-by-point summation operation on the multiple branch results to obtain multiple summation results; and performing connection operation on the plurality of summation results to obtain a fourth characteristic.
Further, the detecting further comprises: inputting the fourth characteristic into the full-connection layer to obtain a prediction result; and classifying and regressing the prediction result to obtain a detection result.
Further, classifying and regressing the prediction results includes filtering negative examples.
Further, filtering the negative examples includes setting a ratio of the positive examples to the negative examples, and filtering the negative examples according to the confidence ranking.
Further, in the target detection method, the detection result comprises at least one of the following: whether the target to be detected exists in the image to be detected, and the position, the type and the feature point positions of the target to be detected.
Further, before the image to be detected is input to the preset model for detection to obtain a detection result, the target detection method further includes: and acquiring an image sample, and training the neural network model to obtain a preset model.
Further, the target detection method further includes: obtaining a loss function according to an analysis result and an actual result output by the neural network model; and performing iterative optimization on the loss function.
Further, the preset model comprises a cascade structure of multi-stage detectors, wherein each stage of detectors comprises at least one convolutional layer and at least one pooling layer.
Further, the predetermined model includes a single stage detector, wherein the single stage detector includes at least one convolutional layer and at least one pooling layer.
Further, the target detection method further includes: and monitoring the target in the specific closed space according to the detection result.
According to another aspect of the embodiments of the present invention, there is also provided an object detection apparatus, including: the acquisition module is used for acquiring an image to be detected; the detection module is used for inputting the image to be detected into a preset model for detection to obtain a detection result, wherein the detection comprises depth feature extraction, the depth feature extraction is used for extracting abstract feature expression layer by layer from the image to be detected to obtain a feature space, and the feature space comprises a plurality of feature vectors.
Further, the detection module includes: the first processing module is used for carrying out first processing on the image to be detected through the convolution layer to obtain a first characteristic; the second processing module is used for carrying out second processing on the first characteristics through the pooling layer to obtain second characteristics; and the extraction module is used for extracting the abstract feature expression by performing at least one first processing and at least one second processing to obtain a feature space.
Further, the detecting further includes feature alignment, and the feature alignment performs feature sampling on the abstract feature expression using different sampling rates to obtain a size-aligned third feature.
Further, the object detection device further includes: the fusion module is used for performing multiple branch convolution operations on the third feature to obtain multiple branch results; performing point-by-point summation operation on the multiple branch results to obtain multiple summation results; and performing connection operation on the plurality of summation results to obtain a fourth characteristic.
Further, the detection module further comprises: the third processing module is used for inputting the fourth characteristic into the full-connection layer to obtain a prediction result; and the fourth processing module is used for classifying and regressing the prediction result to obtain a detection result.
Further, the fourth processing module includes: and the filtering module is used for filtering the negative sample.
Further, the filtering module is further configured to set a ratio of the positive samples to the negative samples, and filter the negative samples according to the confidence ranking.
Further, the detection result includes at least one of: whether the image to be detected has the target to be detected, the position, the type and the position of the characteristic point of the target to be detected.
Further, the object detection device further includes: and the training module is used for acquiring the image sample and training the neural network model to obtain a preset model.
Further, the training module comprises: the fifth processing module is used for obtaining a loss function according to an analysis result and an actual result output by the neural network model; and the iteration module is used for performing iterative optimization on the loss function.
Further, the object detection device further includes: and the tracking module is used for tracking the target to be searched according to the detection result.
According to another aspect of the embodiments of the present invention, there is also provided an in-vehicle living body monitoring system, including: any one of the above target detection devices; and a vehicle state detection device for detecting a vehicle state to obtain vehicle condition information and activating the target detection device when the vehicle condition information satisfies a preset condition.
Further, the vehicle state includes at least one of: whether the vehicle is stopped; whether the engine is shut down; whether the door is locked; the temperature condition in the vehicle.
Further, the system also comprises an information transmission module which is used for transmitting the detection result obtained by the target detection device to one or more of the client, the control module and the alarm module.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein the apparatus on which the storage medium is located is controlled to execute the object detection method when the program runs.
According to another aspect of the embodiments of the present invention, there is also provided a processor configured to execute a program, where the program executes an object detection method.
In the embodiment of the invention, a mode of extracting image features layer by layer is adopted, after an image to be detected corresponding to a target to be detected is obtained, the image to be detected is input into a preset model for detection, and a detection result is obtained, wherein the preset model is obtained by extracting the features of an image sample layer by layer based on a neural network model. In the process of extracting image features of the image to be detected layer by layer, the expression capability, the detection precision and the detection speed of the model are further improved. In addition, the scheme provided by the application is optimized and simplified in the aspect of network structure, for example, decomposition of convolution layers, combination of different operation layers and the like, so that the detection speed is further increased, and the technical problem that in the prior art, model training is time-consuming in the process of detecting the target through the image is solved. Moreover, the single-stage detection scheme provided by the application can realize an end-to-end training process, only performs one-time training on the model, and only performs one-time filtering on the negative sample, and does not need to perform sample filtering on each layer of the preset model, so that the single-stage detection scheme has the characteristics of small calculated amount and high detection speed, and the model training speed is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of target detection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative face detection region according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of an alternative body detection region according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative face detection region according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of an alternative body detection region according to an embodiment of the present invention;
FIG. 6 is a flow chart of an alternative target detection method according to an embodiment of the present invention;
FIG. 7 is a flow chart of an alternative target detection method according to an embodiment of the present invention; and
fig. 8 is a schematic diagram of an object detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of an object detection method. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one here. In addition, the scheme provided by the present application can be applied to computers and mobile phone platforms, and can also be applied to platforms such as surveillance cameras and unmanned aerial vehicles.
Fig. 1 is a flowchart of an object detection method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
and S102, acquiring an image to be detected.
In an optional scheme, a user obtains an image to be detected through a camera device in an enclosed space, where the enclosed space may be a vehicle such as an automobile or a bus, or a place such as an elevator or a storage room, and the camera device may be a camera built into or independently installed in the enclosed space, or a built-in or independently installed electronic device equipped with a camera.
The camera device can be connected with a detection terminal (for example, a mobile phone, a computer, etc.) of a user, and after the user sends an instruction for obtaining an image, the camera device sends the image to be detected to the detection terminal for detection. Optionally, the instruction for acquiring the image may include a time period corresponding to the image to be detected and an acquisition frequency of the camera device.
And S104, inputting the image to be detected into a preset model for detection to obtain a detection result, wherein the detection comprises depth feature extraction, and the depth feature extraction is realized by extracting abstract feature expression layer by layer from the image to be detected to obtain a feature space.
In step S104, the feature space includes a plurality of feature vectors.
It should be noted that the preset model is a model obtained by training a neural network model based on an image sample, where the image sample is a sample including a large number of images, for example, for an animal image, the image sample is an image sample including an animal and an image sample not including an animal, and the preset model is trained by using a large number of image samples, so that the generalization capability of the preset model can be improved, and the detection accuracy of the preset model can be improved. In addition, in step S104, the preset model obtains the feature space by adopting a layer-by-layer extraction method, and further obtains the detection result.
In an alternative scheme, the preset model comprises a cascade structure of multiple stages of detectors, each stage of detector is used for filtering a part of negative samples, and the detection precision of the image can be improved.
In an alternative, the preset model comprises a single-stage detector. Compared with a multi-stage detector, the single-stage detector can achieve an end-to-end training process, trains the model only once, and filters the negative samples only once, without filtering samples at each layer of the preset model.
In addition, it should be further noted that the detection result in step S104 may be the detection of the content of the target to be detected, and the detection result includes at least one of the following: whether the target to be detected exists in the image to be detected, and the position, the type and the feature point positions of the target to be detected. For example, in the case that the target to be detected is a pet dog, the detection result may include: whether the image to be detected contains the dog, the position of the dog (for example, in an enclosed space, in the center of a road, or beside a roadside garbage bin), the type of the dog (for example, the hair length of the dog, the color of the dog's hair, the breed of the dog and the like), and the positions of the feature points.
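For illustration only, the detection result described above can be pictured as a simple data structure; the field names in the sketch below are assumptions made for this illustration and are not prescribed by the present embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DetectionResult:
    """Illustrative container for a detection result: presence of the target,
    its position, its type, and its feature point positions (field names are
    hypothetical)."""
    target_present: bool                                  # whether the target to be detected appears
    box: Tuple[int, int, int, int] = (0, 0, 0, 0)         # position of the target as (x, y, w, h)
    category: str = ""                                    # type of the target, e.g. the breed of a dog
    keypoints: List[Tuple[int, int]] = field(default_factory=list)  # feature point positions
```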
In an alternative arrangement, the target in a particular enclosed space may be monitored based on the detection results. Taking monitoring whether a child or pet is present in a vehicle as an example, after the image pickup device sends the acquired image (i.e., the image to be detected) to the detection terminal, the detection terminal detects the image through the preset model to obtain a detection result. When the detection result shows that a child or pet is present in the car, the car owner can be informed in time so as to avoid danger to the child or pet left in the car.
Based on the schemes defined in the above steps S102 to S104, it can be known that, after the image to be detected is obtained, the image to be detected is input to the preset model for detection in a manner of extracting image features layer by layer, so as to obtain a detection result, where the detection includes depth feature extraction, and the depth feature extraction obtains a feature space by extracting abstract feature expression layer by layer from the image to be detected.
It is easy to notice that, in the process of extracting image features layer by layer from the image to be detected, when the preset model comprises a single-stage detector, the model needs to be trained only once, that is, the image is detected in a single stage. This saves model training time and increases the model training speed, thereby solving the technical problem in the prior art that model training is time-consuming in the process of detecting a target through an image.
Optionally, before the image to be detected is input to the preset model for detection, and a detection result is obtained, the detection terminal needs to train the preset model. Specifically, the detection terminal firstly obtains an image sample, and then trains the neural network model based on the image sample to obtain a preset model.
Specifically, the detection terminal can obtain a loss function according to an analysis result and an actual result output by the neural network model, and then perform iterative optimization on the loss function to obtain a preset model. Optionally, in the process of performing iterative optimization on the loss function, a model corresponding to the minimum value of the output value of the loss function may be used as a preset model.
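As a minimal sketch of the training procedure just described, the loop below computes a loss between the network output (analysis result) and the ground truth (actual result), iteratively optimizes it, and keeps the parameters with the lowest loss value; the use of PyTorch, the cross-entropy loss and the SGD optimizer are assumptions, since the embodiment does not fix them.

```python
import torch
import torch.nn as nn

def train_preset_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    """Train the neural network model on image samples to obtain the preset model (sketch)."""
    criterion = nn.CrossEntropyLoss()                      # loss between analysis result and actual result
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):                                # iterative optimization of the loss function
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            if loss.item() < best_loss:                    # keep the model with the minimum loss value
                best_loss = loss.item()
                best_state = {k: v.clone() for k, v in model.state_dict().items()}
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```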
It should be noted that the image sample includes a first type image and a second type image, where the first type image is an image including a preset type of object, and the second type image is an image not including the preset type of object, for example, if the preset type is an animal, the first type image is an image including the animal, and the second type image is an image not including the animal. Specifically, after acquiring a plurality of first-type images and a plurality of second-type images, the detection terminal may obtain an image sample according to the first-type images and the second-type images.
Optionally, the neural network model at least includes: a convolutional layer, a pooling layer, and an output layer. The detection terminal may perform the first processing on the image to be detected through the convolution layer to obtain the first feature. Then, the second processing is performed on the first feature through the pooling layer to obtain the second feature, and the abstract feature expression is extracted to obtain the feature space by carrying out at least one first processing and at least one second processing, wherein the first processing is convolution processing, and the second feature at least comprises the main features of the image to be detected.
In an alternative, the predetermined model comprises a cascade of multiple stages of detectors, each stage of said detectors comprising at least one of said convolutional layers and at least one of said pooling layers.
In an alternative, the predetermined model comprises a single stage detector comprising at least one convolutional layer and at least one pooling layer.
It should be noted that, in the process of performing convolution processing on the image to be detected based on the convolution layer, convolution result characteristics of different filter operators can be obtained, that is, the first characteristic is obtained. The first feature is then input to the pooling layer and processed to obtain a maximum or average feature, i.e., the second feature described above. And repeating the process, namely taking the output of the current layer as the input of the next layer, and finally extracting the abstract feature expression of the image to be detected to obtain the feature space.
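The layer-by-layer extraction described above can be sketched as a stack of convolution (first processing) and pooling (second processing) layers whose output feeds the next layer; the channel counts, kernel sizes and the use of PyTorch are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class DepthFeatureExtractor(nn.Module):
    """Layer-by-layer extraction of abstract feature expression (sketch)."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first processing: convolution -> first feature
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # second processing: pooling -> second feature
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # output of the current layer feeds the next layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Returns the abstract feature expression (feature space) of the input image.
        return self.layers(image)

# Usage: features = DepthFeatureExtractor()(torch.randn(1, 3, 224, 224))
```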
Further, it should be noted that for an animal, the main features of each image may include, but are not limited to, the length of the animal's ear, the position of the ear, the shape and size of the nose, the shape and size of the mouth, the length of the tail, etc.
Further, the preset model also performs feature alignment operation, feature abstraction and fusion, classification and regression on the image to be detected. Specifically, the preset model performs feature sampling on the abstract feature expression by using different sampling rates to obtain a third feature with aligned size. And then performing multiple branch convolution operations on the third characteristic to obtain multiple branch results, and performing point-by-point summation operation on the multiple branch results to obtain multiple summation results. And then, connecting the plurality of summation results to obtain a fourth feature, inputting the fourth feature into a full-connection layer to obtain a prediction result, and classifying and regressing the prediction result to obtain a detection result.
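One possible way to realize the feature alignment step is sketched below: each region's feature map is pooled at a rate determined by its own size so that all regions come out with the same spatial dimensions (the size-aligned third feature). Adaptive average pooling is an assumption made for this sketch; the embodiment does not name a specific sampling operator.

```python
import torch
import torch.nn.functional as F

def align_features(region_features, output_size=(7, 7)):
    """Feature-sample each region at a size-dependent rate so that all outputs
    share the same spatial size (a sketch of the size-aligned third feature)."""
    return [F.adaptive_avg_pool2d(f, output_size) for f in region_features]

# Usage: feature maps of different spatial sizes are aligned to 7x7.
aligned = align_features([torch.randn(1, 32, 14, 20), torch.randn(1, 32, 9, 9)])
```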
Optionally, the process of classifying and regressing the prediction result is essentially a process of filtering the negative samples. Specifically, the ratio of positive samples to negative samples is set, the negative samples are then sorted according to confidence, and the negative samples are filtered according to the sorting result.
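A minimal sketch of this negative-sample filtering is given below: the ratio between positive and negative samples is fixed, the negatives are ranked by confidence, and only the highest-ranked negatives are kept for the loss. The 3:1 ratio and the use of PyTorch tensors are assumptions for illustration.

```python
import torch

def filter_negatives(scores: torch.Tensor, labels: torch.Tensor, neg_pos_ratio: int = 3) -> torch.Tensor:
    """Return a boolean mask keeping all positives and, per the set ratio,
    only the negatives with the highest confidence scores (a sketch)."""
    pos_mask = labels == 1
    neg_mask = labels == 0
    num_keep = int(pos_mask.sum().item()) * neg_pos_ratio    # set proportion of positives to negatives
    neg_scores = scores[neg_mask]                            # confidence of each negative sample
    keep_neg = torch.zeros_like(neg_mask)
    if num_keep > 0 and neg_scores.numel() > 0:
        top = torch.topk(neg_scores, min(num_keep, neg_scores.numel())).indices  # confidence ranking
        keep_neg[neg_mask.nonzero(as_tuple=True)[0][top]] = True
    return pos_mask | keep_neg                               # samples retained after filtering
```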
It should be noted that, after obtaining the feature space, the detection terminal learns whether the region to be detected contains a preset type of target (e.g., whether the face of a pet is contained), and simultaneously extracts features in the region to be detected. The region to be detected is an effective detection region for detecting the image to be detected, and for an animal, the region to be detected may include a face detection region and a body detection region, as shown in fig. 2 and 3, where fig. 2 shows the face detection region and fig. 3 shows the body detection region.
Optionally, the area to be detected is taken as a face detection area for example. Firstly, after a face detection area in an image to be detected is determined, size information of the face detection area is determined, then a sampling rate corresponding to the size information is determined, and the sampling rate is used for sampling the face detection area in the image to be detected, so that a sampled third feature is obtained. Further, the detection terminal performs multiple branch convolution operations on the sampled third feature to obtain multiple branch results, then performs point-by-point summation operations on the multiple branch results to obtain multiple summation results, performs multiple branch convolution operations on the multiple summation results, and performs feature connection operations on the branch convolution results. Repeating the steps to obtain the characteristics of higher abstraction and higher expression capability, namely obtaining the fourth characteristic.
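The abstraction-and-fusion step just described can be sketched as a small module: several branch convolutions are applied to the aligned feature, neighbouring branch results are summed point by point, and the summation results are concatenated into the fourth feature. The branch count, kernel size and pairing of branches are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Multi-branch convolution, point-by-point summation and concatenation (sketch)."""
    def __init__(self, channels: int = 32, branches: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(branches)]
        )

    def forward(self, third_feature: torch.Tensor) -> torch.Tensor:
        outs = [conv(third_feature) for conv in self.branches]          # multiple branch results
        sums = [outs[i] + outs[i + 1] for i in range(len(outs) - 1)]    # point-by-point summation
        return torch.cat(sums, dim=1)                                   # connection -> fourth feature
```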
It should be noted that the neural network model further includes a fully connected layer. After the fourth feature is obtained, the detection terminal may perform classification and regression based on the fourth feature to achieve detection and key point positioning (for example, the reference numerals 1 to 11 in fig. 4 are key points) of the face detection region of the target to be detected and detection and positioning of the body detection region of the target to be detected.
Optionally, for the face detection region, the detection terminal first obtains coordinate values of the face features of the target of the preset type included in the region to be detected, then predicts the fourth feature based on the full connection layer to obtain a prediction result, calculates a loss function of the coordinate values and the prediction result, and finally performs iterative training on the loss function, and takes a result corresponding to the minimum output value of the loss function as a processing result.
Optionally, for the body detection area, the detection terminal first obtains coordinate values of body features of a preset type of target included in the area to be detected. Then the fourth feature is predicted based on the fully connected layer to obtain a prediction result, a loss function of the coordinate values and the prediction result is calculated, the loss function is iteratively trained, the result corresponding to the minimum output value of the loss function is taken as the processing result, and the detection result is marked at the upper left corner of the picture, as shown in fig. 5.
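As an illustration of the coordinate-regression step described for the face and body detection regions, the sketch below maps the fourth feature through a fully connected layer to predicted key-point coordinates and compares them with the annotated coordinate values. The input dimension, the number of key points (11, following the numbering in fig. 4) and the Smooth L1 loss are assumptions.

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Fully connected prediction of key-point coordinates plus a regression loss (sketch)."""
    def __init__(self, in_features: int = 32 * 7 * 7, num_keypoints: int = 11):
        super().__init__()
        self.fc = nn.Linear(in_features, num_keypoints * 2)   # (x, y) for each key point
        self.loss_fn = nn.SmoothL1Loss()

    def forward(self, fourth_feature: torch.Tensor, target_coords: torch.Tensor):
        pred = self.fc(torch.flatten(fourth_feature, start_dim=1))  # prediction result
        return pred, self.loss_fn(pred, target_coords)              # loss to be iteratively minimized
```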
In an alternative, fig. 6 and 7 show a target detection method based on a face detection region and a target detection method based on a body detection region, respectively. As can be seen from fig. 6 and 7, the methods adopted by the two methods are substantially the same, and both methods are that after an image to be detected is obtained, depth feature extraction is performed on the image to be detected to obtain a region to be detected and features corresponding to the region to be detected, then alignment operation is performed on the features of the region to be detected based on size information of the region to be detected, and feature abstraction and fusion are performed on the aligned features. What is different is that for the face detection area, the positioning, classification and key point positioning of the face area are realized; for the body detection region, a localization of the body region and a classification of the body region are realized.
It should be noted that the solution provided in the present application can be applied to target identification as well as target detection. For example, a user's pet dog is lost, and in order to find it, the user obtains, through a client, images (i.e., images to be identified) captured by the respective image pickup devices (e.g., cameras) installed in and around the user's home. After an image pickup device sends the acquired image (namely the image to be identified) to the detection terminal, the detection terminal identifies the image through the preset model to obtain an identification result. The recognition terminal then acquires a target image of a target (e.g., the lost pet dog) input by the user. The recognition terminal identifies the target image and the images to be identified according to the identification result, and screens out, from the images to be identified, the target images containing the target to be identified.
Further, the recognition terminal can also acquire position information of each image pickup device. Optionally, after obtaining the target image, the recognition terminal determines a target image capturing device that transmits the target image, and determines the appearance position of the target according to the position of the target image capturing device. And then determining the appearance time of the target according to the acquisition time of the target image acquired by the target camera device. And finally, the recognition terminal draws a moving path of the target according to the appearance position and the appearance time of the target, and the user tracks the target according to the moving path.
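A minimal sketch of the path-drawing step follows: each camera that captured the target contributes an appearance time and an appearance position (the camera's installation location), and ordering the sightings by time yields the moving path. The (time, position) representation is an assumption made for illustration.

```python
from typing import List, Tuple

Sighting = Tuple[float, Tuple[float, float]]   # (appearance time, appearance position)

def build_moving_path(sightings: List[Sighting]) -> List[Tuple[float, float]]:
    """Order the target's appearance positions by appearance time to obtain the moving path."""
    return [position for _, position in sorted(sightings, key=lambda s: s[0])]

# Usage: three sightings from different cameras, ordered into a path.
path = build_moving_path([(10.0, (3.0, 4.0)), (2.0, (0.0, 0.0)), (6.5, (1.0, 2.0))])
```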
In addition, the scheme provided by the application can also be applied to the special effect aspect of the pet face, such as virtual sunglasses, virtual blush and the like. In addition, the scheme provided by the application can also be applied to the aspect of preprocessing the pet photo, for example, sharpening, enhancing and the like are carried out on the pet face area.
Based on the content described in the embodiment, the scheme provided by the application adopts a mode of extracting features layer by layer and abstract fusion of the features, so that the expression capability, the detection precision and the detection speed of the model are further improved. In addition, the scheme provided by the application is optimized and simplified in the aspect of network structure, for example, decomposition of convolution layers, combination of different operation layers and the like, so that the detection speed is further improved. Moreover, the single-stage detection scheme provided by the application can realize an end-to-end training process, only performs one-time training on the model, and only performs one-time filtering on the negative sample, and does not need to perform sample filtering on each layer of the preset model, so that the single-stage detection scheme has the characteristics of small calculated amount and high detection speed, and the model training speed is improved.
Example 2
According to an embodiment of the present invention, there is also provided an embodiment of an object detection apparatus, where fig. 8 is a schematic diagram of an object detection apparatus according to an embodiment of the present invention, and as shown in fig. 8, the apparatus includes: an acquisition module 801 and a detection module 803.
The acquisition module 801 is used for acquiring an image to be detected; the detection module 803 is configured to input the image to be detected into a preset model for detection, so as to obtain a detection result, where the detection includes depth feature extraction, and the depth feature extraction obtains a feature space by extracting abstract feature expression layer by layer from the image to be detected.
It should be noted here that the acquiring module 801 and the detecting module 803 correspond to steps S102 to S104 of the above embodiment, and the two modules are the same as the example and the application scenario realized by the corresponding steps, but are not limited to the disclosure of the above embodiment.
Optionally, the feature space includes a plurality of feature vectors.
In an alternative, the detection module comprises: the first processing module is used for carrying out first processing on the image to be detected through the convolution layer to obtain a first characteristic; the second processing module is used for carrying out second processing on the first characteristics through the pooling layer to obtain second characteristics; and the extraction module is used for extracting the abstract feature expression by performing at least one first processing and at least one second processing to obtain a feature space.
Optionally, the detecting further includes feature alignment, and the feature alignment performs feature sampling on the abstract feature expression using different sampling rates to obtain a third feature with aligned size.
Optionally, the target detection apparatus further includes: the fusion module is used for abstracting and fusing the features, and is also used for performing a plurality of branch convolution operations on the third features to obtain a plurality of branch results; performing point-by-point summation operation on the multiple branch results to obtain multiple summation results; and performing connection operation on the plurality of summation results to obtain a fourth characteristic.
In an optional aspect, the detection module further includes: the third processing module is used for inputting the fourth characteristic into the full-connection layer to obtain a prediction result; and the fourth processing module is used for classifying and regressing the prediction result to obtain a detection result.
Wherein, the fourth processing module includes: and the filtering module is used for filtering the negative sample.
Optionally, the filtering module is further configured to set a ratio between the positive sample and the negative sample, and filter the negative sample according to the confidence ranking.
Optionally, the detection result includes at least one of: whether the image to be detected has the target to be detected, the position, the type and the position of the characteristic point of the target to be detected.
In an optional aspect, the object detection apparatus further includes: and the training module is used for acquiring the image sample and training the neural network model to obtain a preset model.
Wherein, the training module includes: the fifth processing module is used for obtaining a loss function according to an analysis result and an actual result output by the neural network model; and the iteration module is used for performing iterative optimization on the loss function.
In an optional aspect, the object detection apparatus further includes: and the tracking module is used for tracking the target to be searched according to the detection result.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided an in-vehicle living body monitoring system including the target detecting device in embodiment 2; and vehicle state detection means for detecting a vehicle state to obtain vehicle condition information, and activating the target detection means when the vehicle condition information satisfies a preset condition.
In one alternative, the vehicle state includes at least one of: whether the vehicle is stopped; whether the engine is shut down; whether the door is locked; the temperature condition in the vehicle. The above-mentioned vehicle state may be detected by various sensors, for example, by a vehicle acceleration sensor to detect whether the vehicle is stopped and the time of the stop; the temperature condition in the vehicle is detected by the temperature sensor.
In an alternative, the preset condition may be any combination of one or more of the following: the vehicle is stopped and the stop time exceeds a preset value; the engine is off and the engine-off time exceeds a preset value; the vehicle door is closed; the temperature in the vehicle is lower than or higher than a preset value. As will be appreciated by those skilled in the art, the preset condition is primarily for detecting that the vehicle is in an unattended state.
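For illustration, one possible combination of the preset conditions listed above is sketched below; the field names and threshold values are assumptions, and other combinations of the listed conditions are equally possible.

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    """Hypothetical vehicle condition information reported by the vehicle state detection device."""
    stopped_seconds: float        # how long the vehicle has been stopped
    engine_off_seconds: float     # how long the engine has been off
    doors_locked: bool            # whether the doors are locked
    cabin_temp_c: float           # temperature condition in the vehicle

def should_activate_detection(state: VehicleState) -> bool:
    """Activate the target detection device when one example combination of
    preset conditions indicates an unattended vehicle (thresholds are assumed)."""
    return (state.stopped_seconds > 60.0
            and state.engine_off_seconds > 60.0
            and state.doors_locked
            and (state.cabin_temp_c < 5.0 or state.cabin_temp_c > 35.0))
```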
In an optional scheme, the monitoring system further comprises an information transmission module, which is used for transmitting the detection result obtained by the target detection device to one or more of the client, the control module and the alarm module. Specifically, the client may be the user's mobile phone or a communication device carried at all times. The control module, upon receiving the detection result of the vehicle state, may start a corresponding device of the vehicle; for example, when it is detected that the temperature in the vehicle is lower than or exceeds a certain value, the ventilation apparatus or the air conditioning apparatus is turned on. The alarm module can remind the car owner through a preset alert mode, where the preset alert mode includes a combination of one or more of the following: text alert, picture alert, vibration alert, flashing-light alert and sound alert. After a living body is detected in the vehicle, alarm information is sent to a preset terminal, where the sending mode includes but is not limited to: sending information to the alarm module via the network, a short message or a telephone call.
Through this embodiment, when a living body (such as a child or a pet) is present in the car, the car owner can be informed in time, so as to avoid incidents, which have occurred repeatedly in recent years, of children or pets being left in vehicles.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein the apparatus in which the storage medium is located is controlled to execute the object detection method in embodiment 1 when the program runs.
Example 5
According to another aspect of the embodiments of the present invention, there is also provided a processor configured to run a program, where the program executes the object detection method in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (30)

1. A method of object detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a preset model for detection to obtain a detection result, wherein the detection comprises depth feature extraction, and the depth feature extraction is used for extracting abstract feature expression layer by layer from the image to be detected to obtain a feature space.
2. The method of claim 1, wherein the feature space comprises a plurality of feature vectors.
3. The method according to claim 1, wherein the depth feature extraction is performed by extracting abstract feature expression layer by layer from the image to be detected, and obtaining a feature space comprises:
carrying out first processing on the image to be detected through the convolution layer to obtain a first characteristic;
performing second processing on the first characteristic through the pooling layer to obtain a second characteristic;
and extracting the abstract feature expression by performing at least one first processing and at least one second processing to obtain the feature space.
4. The method of claim 1, wherein the detecting further comprises feature alignment, the feature alignment performing feature sampling on the abstract feature expression using different sampling rates to obtain a size-aligned third feature.
5. The method of claim 4, wherein the detecting further comprises feature abstraction and fusion, the feature abstraction and fusion comprising:
performing a plurality of branch convolution operations on the third feature to obtain a plurality of branch results;
performing point-by-point summation operation on the multiple branch results to obtain multiple summation results;
and performing connection operation on the plurality of summation results to obtain a fourth characteristic.
6. The method of claim 5, wherein the detecting further comprises:
inputting the fourth feature into a full-connection layer to obtain a prediction result;
and classifying and regressing the prediction result to obtain the detection result.
7. The method of claim 6, wherein classifying and regressing the prediction results comprises filtering negative examples.
8. The method of claim 7, wherein filtering the negative examples comprises setting a ratio of positive examples to the negative examples, and filtering the negative examples according to a confidence ranking.
9. The method of claim 1, wherein the detection result comprises at least one of: whether the image to be detected has the target to be detected, the position and the type of the target to be detected and the position of the characteristic point.
10. The method according to claim 1, wherein before the image to be detected is input to a preset model for detection, the method further comprises:
and acquiring an image sample, and training a neural network model to obtain the preset model.
11. The method of claim 10, wherein obtaining image samples and training a neural network model comprises:
obtaining a loss function according to an analysis result and an actual result output by the neural network model;
and performing iterative optimization on the loss function.
12. The method of claim 1, wherein the predetermined model comprises a cascaded configuration of multiple stages of detectors, wherein each stage of the detectors comprises at least one convolutional layer and at least one pooling layer.
13. The method of claim 1, wherein the predetermined model comprises a single stage detector, wherein the single stage detector comprises at least one convolutional layer and at least one pooling layer.
14. The method of claim 1, further comprising: and monitoring the target in the specific closed space according to the detection result.
15. An object detection device, comprising:
the acquisition module is used for acquiring an image to be detected;
the detection module is used for inputting the image to be detected into a preset model for detection to obtain a detection result, wherein the detection comprises depth feature extraction, and the depth feature extraction is used for extracting abstract feature expression layer by layer from the image to be detected to obtain a feature space.
16. The apparatus of claim 15, wherein the feature space comprises a plurality of feature vectors.
17. The apparatus of claim 15, wherein the detection module comprises:
the first processing module is used for carrying out first processing on the image to be detected through the convolution layer to obtain a first characteristic;
the second processing module is used for carrying out second processing on the first characteristics through the pooling layer to obtain second characteristics;
and the extraction module is used for extracting the abstract feature expression by performing at least one first processing and at least one second processing to obtain the feature space.
18. The apparatus of claim 15, wherein the detecting further comprises feature alignment, the feature alignment performing feature sampling on the abstract feature expression using different sampling rates to obtain a size-aligned third feature.
19. The apparatus of claim 18, further comprising: the fusion module is used for performing multiple branch convolution operations on the third feature to obtain multiple branch results; performing point-by-point summation operation on the multiple branch results to obtain multiple summation results; and performing connection operation on the plurality of summation results to obtain a fourth characteristic.
20. The apparatus of claim 19, wherein the detection module further comprises:
the third processing module is used for inputting the fourth characteristic into a full-connection layer to obtain a prediction result;
and the fourth processing module is used for classifying and regressing the prediction result to obtain the detection result.
21. The apparatus of claim 20, wherein the fourth processing module comprises: and the filtering module is used for filtering the negative sample.
22. The apparatus of claim 21, wherein the filtering module is further configured to set a ratio of positive samples to negative samples, and filter the negative samples according to a confidence ranking.
23. The apparatus of claim 15, wherein the detection result comprises at least one of: whether the image to be detected has the target to be detected, the position and the type of the target to be detected and the position of the characteristic point.
24. The apparatus of claim 15, further comprising:
and the training module is used for acquiring an image sample and training the neural network model to obtain the preset model.
25. The apparatus of claim 24, wherein the training module comprises:
the fifth processing module is used for obtaining a loss function according to an analysis result and an actual result output by the neural network model;
and the iteration module is used for performing iterative optimization on the loss function.
26. An in-vehicle living body monitoring system, comprising:
the object detection device of any one of claims 15 to 25; and
and the vehicle state detection device is used for detecting the vehicle state to obtain vehicle condition information and starting the target detection device when the vehicle condition information meets a preset condition.
27. The system of claim 26, wherein the vehicle state comprises at least one of: whether the vehicle is stopped; whether the engine is shut down; whether the door is locked; the temperature condition in the vehicle.
28. The system of claim 26, further comprising: and the information transmission module is used for transmitting the detection result obtained by the target detection device to one or more of the client, the control module and the alarm module.
29. A storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the object detection method of any one of claims 1 to 14.
30. A processor configured to run a program, wherein the program is configured to perform the object detection method of any one of claims 1 to 14 when the program is run.
CN201910239061.6A 2019-03-27 2019-03-27 Target detection method and device Pending CN111753581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910239061.6A CN111753581A (en) 2019-03-27 2019-03-27 Target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910239061.6A CN111753581A (en) 2019-03-27 2019-03-27 Target detection method and device

Publications (1)

Publication Number Publication Date
CN111753581A true CN111753581A (en) 2020-10-09

Family

ID=72672078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910239061.6A Pending CN111753581A (en) 2019-03-27 2019-03-27 Target detection method and device

Country Status (1)

Country Link
CN (1) CN111753581A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005657A1 (en) * 2017-06-30 2019-01-03 Baidu Online Network Technology (Beijing) Co., Ltd . Multiple targets-tracking method and apparatus, device and storage medium
CN107403141A (en) * 2017-07-05 2017-11-28 中国科学院自动化研究所 Method for detecting human face and device, computer-readable recording medium, equipment
CN108520219A (en) * 2018-03-30 2018-09-11 台州智必安科技有限责任公司 A kind of multiple dimensioned fast face detecting method of convolutional neural networks Fusion Features
CN108830188A (en) * 2018-05-30 2018-11-16 西安理工大学 Vehicle checking method based on deep learning
CN109271854A (en) * 2018-08-07 2019-01-25 北京市商汤科技开发有限公司 Based on method for processing video frequency and device, video equipment and storage medium
CN109446964A (en) * 2018-10-19 2019-03-08 天津天地伟业投资管理有限公司 Face detection analysis method and device based on end-to-end single-stage multiple scale detecting device
CN109345770A (en) * 2018-11-14 2019-02-15 深圳市尼欧科技有限公司 A kind of child leaves in-vehicle alarm system and child leaves interior alarm method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张鸿 (Zhang Hong): "基于人工智能的多媒体数据挖掘和应用实例" (Multimedia Data Mining and Application Examples Based on Artificial Intelligence), Wuhan University Press (武大出版社), page 51 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024061194A1 (en) * 2022-09-19 2024-03-28 虹软科技股份有限公司 Sample label acquisition method and lens failure detection model training method

Similar Documents

Publication Publication Date Title
US11196966B2 (en) Identifying and locating objects by associating video data of the objects with signals identifying wireless devices belonging to the objects
US20180338120A1 (en) Intelligent event summary, notifications, and video presentation for audio/video recording and communication devices
US10762646B2 (en) Neighborhood alert mode for triggering multi-device recording, multi-camera locating, and multi-camera event stitching for audio/video recording and communication devices
US11232685B1 (en) Security system with dual-mode event video and still image recording
US20180233010A1 (en) Neighborhood alert mode for triggering multi-device recording, multi-camera motion tracking, and multi-camera event stitching for audio/video recording and communication devices
WO2019184573A1 (en) Passenger-related item loss mitigation
CN109766755B (en) Face recognition method and related product
CN106557768A (en) The method and device is identified by word in picture
CN111415347B (en) Method and device for detecting legacy object and vehicle
CN108986474A (en) Fix duty method, apparatus, computer equipment and the computer storage medium of traffic accident
CN105404860A (en) Method and device for managing information of lost person
US11195408B1 (en) Sending signals for help during an emergency event
US10769909B1 (en) Using sensor data to detect events
US11659144B1 (en) Security video data processing systems and methods
CN104134364A (en) Real-time traffic signal identification method and system with self-learning capacity
CN109389029A (en) Looking-for-person method, device, system and storage medium based on automobile data recorder
US10733857B1 (en) Automatic alteration of the storage duration of a video
CN114170585B (en) Dangerous driving behavior recognition method and device, electronic equipment and storage medium
CN113971821A (en) Driver information determination method and device, terminal device and storage medium
CN111783654A (en) Vehicle weight identification method and device and electronic equipment
CN111753581A (en) Target detection method and device
US20220335725A1 (en) Monitoring presence or absence of an object using local region matching
CN110853184A (en) Analysis processing method, device, equipment and storage medium for vehicle-mounted computing information
CN113593256B (en) Unmanned aerial vehicle intelligent driving-away control method and system based on city management and cloud platform
US11243959B1 (en) Generating statistics using electronic device data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination