CN112652013A

CN112652013A - Camera object finding method based on deep learning

Info

Publication number: CN112652013A
Application number: CN202110082166.2A
Authority: CN
Inventors: 段强; 李锐; 王建华
Original assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Current assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date: 2021-01-21
Filing date: 2021-01-21
Publication date: 2021-04-13

Abstract

The invention discloses a camera object finding method based on deep learning, which belongs to the technical field of deep learning and image processing. By combining deep-learning-based target detection with feature extraction, the method can greatly improve the object detection rate, the detection accuracy, and the number of article categories that can be detected.

Description

Camera object finding method based on deep learning

Technical Field

The invention relates to the technical field of deep learning and image processing, in particular to a camera object finding method based on deep learning.

Background

The concept of the Internet of Things is now widespread, and a huge number of monitoring cameras are installed in public spaces and homes, covering most of the time and space in which people live. This ubiquitous video data can be used for monitoring and can also be extended to other applications. For example, some camera-based object finding algorithms already exist, but most of them rely on traditional image comparison, so their object detection rate is not high and their detection accuracy is low.

Disclosure of Invention

In view of the above defects, the technical task of the invention is to provide a camera object finding method based on deep learning that can greatly improve the object detection rate, the detection accuracy, and the number of article categories that can be detected.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A camera object finding method based on deep learning performs video analysis on a real-time or historical video through target detection or local feature extraction, and locates the current position, or the last position of appearance, of the object to be found.

Target detection and feature extraction algorithms are deployed, data from a monitoring camera is accessed, the category or a sample image of the article to be found is obtained from the user, video analysis is performed to detect or match the article, and the current position of the article, or the position where it last appeared in the history record, is reported.

Preferably, the target detection performs video analysis on, and locates, a specified article in a real-time surveillance video or a stored surveillance video, based on a general-purpose target detection algorithm.

Further, the target detection algorithm includes EfficientDet, YOLO, and/or SSD.

Preferably, the target detection algorithm may, where necessary, be fine-tuned on the user's own data, which is derived from the user's labelling of articles that are easily lost.

Preferably, the local feature extraction is based on deep learning: features are extracted from a sample image of the given article and compared against features extracted from the video in order to locate the article.

Furthermore, the feature extraction network used for local feature extraction includes GeoDesc and/or HardNet. No supervision information is required: an image sample of the article to be found is given and a feature point set is generated from it, a feature point set is likewise generated from a monitoring image, and the two point sets are matched using the FLANN or BruteForce method.

Preferably, when real-time positioning fails, the video clip in which the article last appeared is provided.

Preferably, the method is implemented as follows:

1) deploying a deep learning framework and the target detection and feature extraction algorithms on an edge server or a cloud server, and accessing the camera data;

2) converting all frames of the video into images, and applying uniform preprocessing to all the images;

3) the user provides a category name or a sample image of the article to be found, and the video analysis mode is selected according to the information provided;

4) when the article category is given, first checking whether it is among the categories supported by the target detection and, if so, performing the target detection task on the video;

5) when the target detection fails, or when a sample image of the article is given, extracting and matching image features;

6) when either of the two modes detects the article to be found, returning the time and the position at which the article was detected.
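Steps 3) to 6) amount to a small dispatch routine. The following is a minimal sketch of that control flow under stated assumptions, not the claimed implementation: `detect` and `match_features` are hypothetical callables standing in for the deployed target detection and feature matching algorithms, and the names `Sighting`, `find_object`, and `supported_classes` are illustrative only. Frames are assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


@dataclass
class Sighting:
    timestamp: float  # seconds from the start of the video
    box: Box          # where the article was found in that frame
    mode: str         # "detection" or "feature_matching"


def find_object(
    frames: Iterable[Tuple[float, object]],
    detect: Callable[[object, str], Optional[Box]],            # hypothetical detector
    match_features: Callable[[object, object], Optional[Box]],  # hypothetical matcher
    supported_classes: Sequence[str],
    category: Optional[str] = None,
    sample_image: Optional[object] = None,
) -> Optional[Sighting]:
    """Return the latest sighting of the article, or None if it was never seen."""
    frames = list(frames)  # the frames may be scanned twice
    last: Optional[Sighting] = None

    # Step 4: a category name was given and the detector supports it.
    if category is not None and category in supported_classes:
        for ts, frame in frames:
            box = detect(frame, category)
            if box is not None:
                last = Sighting(ts, box, "detection")

    # Step 5: detection found nothing, or only a sample image was given.
    if last is None and sample_image is not None:
        for ts, frame in frames:
            box = match_features(sample_image, frame)
            if box is not None:
                last = Sighting(ts, box, "feature_matching")

    # Step 6: report the time and position of the article (None if not found).
    return last
```

Because later frames overwrite earlier ones, the returned sighting is the current position when the article is still in view and the last position of appearance otherwise.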

The invention also claims a camera object finding device based on deep learning, which comprises: at least one memory and at least one processor;

the at least one memory is used for storing a machine readable program;

the at least one processor is used for calling the machine readable program and executing the method.

The invention also claims a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the above-described method.

Compared with the prior art, the camera object finding method based on deep learning has the following beneficial effects:

By using deep-learning-based target detection and feature extraction, the method can greatly improve object detection performance, including the detection accuracy and the number of article categories that can be detected. As system redundancy, the method also uses deep-learning-based local feature extraction; compared with traditional methods such as SIFT and SURF, a learned feature extractor is more robust and yields more feature points and more discriminative feature descriptors.

Drawings

Fig. 1 is a flowchart of a camera object finding method based on deep learning according to an embodiment of the present invention.

Detailed Description

The present invention will be further described with reference to the following specific examples.

When people lose something, they can usually recall faint traces of it, and a well-chosen reminder is often enough to make them remember at once where the object is. Video analysis using deep-learning-based target detection or feature extraction locates the current or last position of the article and thereby provides clues for finding it.

The embodiment of the invention provides a camera object finding method based on deep learning, which performs video analysis on a real-time or historical video through target detection or local feature extraction, and locates the current position, or the last position of appearance, of the object to be found.

The camera object finding method based on deep learning has two modes:

target detection, which performs video analysis on, and locates, a specified article in a real-time surveillance video or a stored surveillance video, based on a general-purpose target detection algorithm;

local feature extraction, which is based on deep learning and locates the article by extracting features from a sample image of the given article and comparing them against features extracted from the video.

When real-time positioning fails, the video clip in which the article last appeared is provided.
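One way to select that clip is to take the most recent contiguous run of frames in which the article was visible. The sketch below is illustrative only: the function name `last_appearance_clip`, the per-frame `visible` flags, and the `padding` parameter are assumptions that do not appear in the original text.

```python
from typing import List, Optional, Tuple


def last_appearance_clip(
    timestamps: List[float],
    visible: List[bool],
    padding: float = 2.0,  # assumed context, in seconds, kept around the run
) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds of the last run of frames showing the article."""
    if len(timestamps) != len(visible):
        raise ValueError("timestamps and visible must have the same length")

    # Walk backwards to the most recent frame in which the article is visible.
    end_idx = None
    for i in range(len(visible) - 1, -1, -1):
        if visible[i]:
            end_idx = i
            break
    if end_idx is None:
        return None  # the article never appears in this recording

    # Extend backwards over the contiguous run of visible frames.
    start_idx = end_idx
    while start_idx > 0 and visible[start_idx - 1]:
        start_idx -= 1

    # Add padding on both sides, clamped to the bounds of the recording.
    start = max(timestamps[0], timestamps[start_idx] - padding)
    end = min(timestamps[-1], timestamps[end_idx] + padding)
    return (start, end)
```

The `visible` flags would come from whichever mode is active, target detection or feature matching, applied frame by frame to the stored video.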

The target detection can use advanced target detection algorithms such as EfficientDet, YOLO, and SSD. Where necessary, the algorithm can be fine-tuned on the user's own data, which is derived from the user's labelling of easily lost articles.

EfficientDet is a family of target detection algorithms published by Google in November 2019, comprising eight models, D0 to D7. It offers state-of-the-art (SOTA) results under different device constraints and consistently achieves better efficiency than the prior art across a wide range of resource constraints. In particular, with a single model and a single scale, EfficientDet-D7 achieves a state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs; compared with previous detectors, it has 4 to 9 times fewer parameters and uses 13 to 42 times fewer FLOPs.

YOLO formulates target detection as a regression problem over bounding boxes and classification confidence. The whole image is used as input and divided into an S x S grid; each cell predicts B bounding boxes (x, y, w, h), each with a confidence equal to the probability that the box contains an object multiplied by the IOU between the predicted box and the ground-truth box. Multiplying this box confidence by the cell's conditional class probability gives the class-specific confidence score.
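The quantities involved can be made concrete with a short sketch. It follows the original YOLO formulation, in which the class-specific confidence score is the product of the conditional class probability, the objectness probability, and the IOU with the ground-truth box; the function names `iou` and `class_confidence` are illustrative. At inference time no ground-truth box is available, so the network outputs a predicted confidence that was trained towards this target.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two centre-format boxes."""
    # Convert centre format to corner coordinates.
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2

    # The overlap is zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def class_confidence(p_class_given_object: float, p_object: float, box_iou: float) -> float:
    """Class-specific confidence score: Pr(class | object) * Pr(object) * IOU."""
    return p_class_given_object * p_object * box_iou
```

For example, two 2 x 2 boxes whose centres are one unit apart horizontally overlap in a 1 x 2 region, giving an IOU of 2 / 6.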

SSD abstracts the solution space of the target detection problem into a set of default bounding boxes with preset scales and aspect ratios. For each default box it predicts a class label and box offsets so that the box frames the object more tightly, and for a single picture it combines the predictions from multiple feature maps of different sizes in order to handle objects of different sizes.
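The default-box idea can be sketched as follows. This is a simplified illustration, assuming a square feature map and a single scale per map; the function names `default_boxes` and `decode` are illustrative, and the variance scaling used by common SSD implementations is omitted.

```python
from math import exp, sqrt
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalised to [0, 1]


def default_boxes(fmap_size: int, scale: float, aspect_ratios: Sequence[float]) -> List[Box]:
    """Generate default boxes for one square feature map of fmap_size x fmap_size cells."""
    boxes: List[Box] = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Each default box is centred on its feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Width and height keep the area near scale**2 for every aspect ratio.
                boxes.append((cx, cy, scale * sqrt(ar), scale / sqrt(ar)))
    return boxes


def decode(default: Box, offsets: Box) -> Box:
    """Apply predicted offsets (dx, dy, dw, dh) to a default box."""
    d_cx, d_cy, d_w, d_h = default
    dx, dy, dw, dh = offsets
    # Centre offsets are relative to the default box size; sizes are log-scaled.
    return (d_cx + dx * d_w, d_cy + dy * d_h, d_w * exp(dw), d_h * exp(dh))
```

Calling `default_boxes` once per feature map, with a small scale for the large maps and a large scale for the small maps, gives the multi-scale set of boxes described above.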

Deep-learning-based local feature extraction can use feature extraction networks such as GeoDesc and HardNet. It requires no supervision information: an image sample of the article to be found is given and a feature point set is generated from it, a feature point set is likewise generated from a monitoring image, and the two point sets are matched using the FLANN or BruteForce method.
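The brute-force variant of that matching step can be sketched in plain Python. This is an illustrative stand-in for a library matcher, not the patented implementation: descriptors are assumed to be fixed-length float vectors already produced by the feature extraction network, and the ratio test, the `ratio` value, and the `min_matches` threshold are assumptions added to make the example complete.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_match(
    query: List[Descriptor],   # descriptors from the sample image of the article
    train: List[Descriptor],   # descriptors from one monitoring image
    ratio: float = 0.8,        # assumed ratio-test threshold
) -> List[Tuple[int, int]]:
    """Return (query_index, train_index) pairs that pass the ratio test."""
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs two candidate neighbours
    for qi, q in enumerate(query):
        # Compare against every train descriptor and rank by distance.
        ranked = sorted((l2(q, t), ti) for ti, t in enumerate(train))
        (best_d, best_i), (second_d, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best_d < ratio * second_d:
            matches.append((qi, best_i))
    return matches


def article_present(matches: List[Tuple[int, int]], min_matches: int = 10) -> bool:
    """Declare the article found when enough descriptors agree (assumed threshold)."""
    return len(matches) >= min_matches
```

Brute-force matching compares every pair of descriptors, so its cost grows with the product of the two set sizes; FLANN trades exactness for speed by using approximate nearest-neighbour search.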

The embodiment of the invention provides a camera object finding method based on deep learning, which comprises the following implementation processes:

The embodiment of the invention also provides a camera object finding device based on deep learning, which comprises: at least one memory and at least one processor;

the at least one memory is used for storing a machine readable program;

the at least one processor is configured to invoke the machine readable program to execute the method for camera object finding based on deep learning described in the above embodiments.

An embodiment of the present invention further provides a computer-readable medium having computer instructions stored thereon which, when executed by a processor, cause the processor to execute the camera object finding method based on deep learning described in the above embodiment of the present invention. Specifically, a system or an apparatus equipped with a storage medium may be provided, the storage medium storing software program code that realizes the functions of any of the above-described embodiments, and a computer (or a CPU or MPU) of the system or apparatus is caused to read out and execute the program code stored in the storage medium.

In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.

Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.

Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.

Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.

While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that the technical means of the various embodiments described above may be combined to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims

1. A camera object finding method based on deep learning, characterized in that video analysis is performed on a real-time or historical video through target detection or local feature extraction, and the current position, or the last position of appearance, of the object to be found is located.

2. The camera object finding method based on deep learning according to claim 1, characterized in that the target detection performs video analysis on, and locates, a specified article in a real-time surveillance video or a stored surveillance video, based on a general-purpose target detection algorithm.

3. The camera object finding method based on deep learning according to claim 2, characterized in that the target detection algorithm comprises EfficientDet, YOLO, and/or SSD.

4. The camera object finding method based on deep learning according to claim 3, characterized in that the target detection algorithm can be fine-tuned on the user's own data, which is derived from the user's labelling of easily lost articles.

5. The camera object finding method based on deep learning according to claim 1, characterized in that the local feature extraction is based on deep learning, and features are extracted from a sample image of a given article and compared against the video in order to locate the article.

6. The camera object finding method based on deep learning according to claim 5, characterized in that the feature extraction network used for local feature extraction comprises GeoDesc and/or HardNet; an image sample of the article to be found is given and a feature point set is generated from it, a monitoring image is given and a feature point set is generated from it, and the two point sets are matched using the FLANN or BruteForce method.

7. The camera object finding method based on deep learning according to claim 1, characterized in that the video clip in which the article last appeared is provided when real-time positioning fails.

8. The camera object finding method based on deep learning according to any one of claims 1 to 7, characterized in that the method is implemented as follows:

9. A camera object finding device based on deep learning, characterized by comprising: at least one memory and at least one processor;

the at least one memory is used for storing a machine readable program;

the at least one processor is configured to invoke the machine readable program to perform the method of any of claims 1 to 8.

10. A computer readable medium, characterized in that it has stored thereon computer instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 8.