CN117636201A - Camera object detection method and system based on edge computing - Google Patents


Info

Publication number
CN117636201A
CN117636201A
Authority
CN
China
Prior art keywords
roi
camera
object detection
video frame
region
Prior art date
Legal status
Pending
Application number
CN202311417833.3A
Other languages
Chinese (zh)
Inventor
赵泽钧
袁苇
张宏辉
Current Assignee
Fujian Newland Communication Science Technologies Co ltd
Original Assignee
Fujian Newland Communication Science Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Newland Communication Science Technologies Co ltd
Priority to CN202311417833.3A
Publication of CN117636201A
Legal status: Pending


Abstract

The invention provides a camera object detection method and system based on edge computing, belonging to the technical field of camera object detection. The method comprises the following steps: step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame; step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera; step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host; and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network. The advantage of the invention is that the accuracy and efficiency of camera object detection are greatly improved.

Description

Camera object detection method and system based on edge computing
Technical Field
The invention relates to the technical field of camera object detection, and in particular to a camera object detection method and system based on edge computing.
Background
Internet-of-things platforms (the cloud) are widely used for visual data analysis in many fields, such as homes, businesses, robotics, and autonomous driving. Because many cameras lack sufficient computing and storage resources, they cannot perform complex tasks, such as object detection, that require a neural network; images therefore need to be transmitted to the cloud for object detection.
As the number of cameras grows, the high data volume from camera image sensors strains the bandwidth of the shared channel; for example, a camera transmitting 1920x1080 video at 30 fps requires a bandwidth of about 1492 megabits per second (Mbps). Under limited bandwidth, a camera must use compression to reduce the transmitted data volume, but compressed images degrade the accuracy of cloud-side detection; if images are not compressed, their transmission time increases and the efficiency of camera object detection is low.
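The bandwidth figure above can be checked with a short calculation, assuming an uncompressed 24-bit RGB stream:

```python
# Raw bandwidth of a 1920x1080, 30 fps, 24-bit RGB video stream.
width, height, fps, bits_per_pixel = 1920, 1080, 30, 24
bits_per_second = width * height * fps * bits_per_pixel
mbps = bits_per_second / 1_000_000
print(f"{mbps:.0f} Mbps")  # about 1493 Mbps before compression
```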
Therefore, providing a camera object detection method and system based on edge computing that improves the accuracy and efficiency of camera object detection is a technical problem to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a camera object detection method and system based on edge computing that can improve the accuracy and efficiency of camera object detection.
In a first aspect, the present invention provides a camera object detection method based on edge computing, comprising the following steps:
step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame;
step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera;
step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host;
and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network.
Further, in step S20, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
Further, the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
Further, step S30 specifically includes:
the camera parses the received prediction result to obtain the ROI region and the non-ROI region, looks up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encodes the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then sends the encoded video frame to the cloud host in real time.
Further, in step S40, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
In a second aspect, the present invention provides a camera object detection system based on edge computing, comprising:
a video frame receiving module, used for receiving, via a cloud host using edge computing, the video frames sent by a camera frame by frame;
an ROI region prediction module, used by the cloud host to predict the ROI region in the next video frame from the received frames based on an ROI prediction model, and to send the prediction result to the camera;
a differential encoding module, used to have the camera differentially encode the video frames to be transmitted based on the received prediction result and then send them to the cloud host;
an object detection module, used by the cloud host to perform object detection on the video frames based on an object detection model built on the R-FCN network.
Further, in the ROI region prediction module, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
Further, the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
Further, the differential encoding module is specifically configured to:
have the camera parse the received prediction result to obtain the ROI region and the non-ROI region, look up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then send the encoded video frame to the cloud host in real time.
Further, in the object detection module, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
The invention has the following advantages:
a cloud host receives the video frames sent by a camera frame by frame; the cloud host predicts the ROI (region of interest) in the next video frame from the received frames based on an ROI prediction model, and sends a prediction result comprising the ROI region and the non-ROI region of the next frame to the camera; the camera differentially encodes the video frames to be transmitted based on the prediction result, so that the compression ratio of the non-ROI region is higher than that of the ROI region, which both compresses the overall size of the video frame to a certain extent and preserves the image quality of the ROI region; the encoded video frames are sent to the cloud host in real time; and finally the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network. Through the cooperation of the camera and the cloud host, the ROI region and the non-ROI region of each video frame are differentially encoded (compressed), improving the accuracy of object detection while satisfying the bandwidth constraint, and ultimately greatly improving the accuracy and efficiency of camera object detection.
Drawings
The invention will be further described below by way of example embodiments with reference to the accompanying drawings.
Fig. 1 is a flowchart of the camera object detection method based on edge computing according to the present invention.
Fig. 2 is a schematic structural diagram of the camera object detection system based on edge computing according to the present invention.
Detailed Description
The overall idea of the technical solution in the embodiments of the application is as follows: through the cooperation of the camera and the cloud host, the ROI region and the non-ROI region of each video frame are differentially encoded, improving the accuracy of object detection while satisfying the bandwidth constraint, and thereby improving the accuracy and efficiency of camera object detection.
Referring to figs. 1 and 2, a preferred embodiment of the camera object detection method based on edge computing according to the present invention comprises the following steps:
step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame; because the cloud host detects the video frames frame by frame, the frames output by the camera cannot maintain association with one another through identifiers or similar means;
step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera;
step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host;
and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network.
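The four steps form a feedback loop between the camera and the cloud host. A minimal sketch of that loop, using hypothetical stand-in functions (`detect_objects`, `predict_roi`, `encode`) for the R-FCN detector, the ROI prediction model, and the camera-side encoder, which are not part of the original disclosure:

```python
def detect_objects(frame):
    """Hypothetical stand-in for R-FCN detection on the cloud host (step S40)."""
    return [(50, 50, 120, 160)]  # one bounding box per frame

def predict_roi(boxes):
    """Hypothetical stand-in for SORT + Kalman ROI prediction (step S20)."""
    return boxes  # assume stationary objects in this sketch

def encode(frame, roi):
    """Hypothetical stand-in for the camera-side differential encoder (step S30)."""
    return {"frame": frame, "roi": roi}

roi = None  # no prediction is available for the first frame
for frame_id in range(3):
    packet = encode(frame_id, roi)           # camera encodes using the last prediction
    boxes = detect_objects(packet["frame"])  # cloud host receives and detects (S10/S40)
    roi = predict_roi(boxes)                 # cloud host predicts the next-frame ROI (S20)
```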
In step S20, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm; since detection is performed frame by frame, this association is indispensable for ensuring correct results;
the motion detection module is used for predicting states of the unassociated objects in the association module through a Kalman filter, and further obtaining a prediction result of the ROI (region of interest) and the non-RO I region of the video frame including the next frame.
The prediction result guides the camera to adopt a multi-quality-level compression algorithm, so that the ROI region retains higher quality and the non-ROI region lower quality, satisfying the dual requirements of bandwidth and object detection accuracy.
The state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates; the Kalman filter assumes that the state variables follow a Gaussian distribution and a linear dynamic motion model;
the RO I region and non-ROI region constitute a complete video frame.
Step S30 specifically includes:
the camera parses the received prediction result to obtain the ROI region and the non-ROI region, looks up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, has the encoder differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then sends the encoded video frame to the cloud host in real time.
That is, the ROI regions and non-ROI regions are encoded with different QF values (compression quality factors). When different QF values are used to encode different areas of a video frame, the image quality of the non-ROI areas is reduced, which degrades detection when new objects appear in those areas. The R-FCN network therefore needs to be calibrated for JPEG images at different QF values: output images encoded with the corresponding parameters are used to retrain the R-FCN network, divided into 5 levels representing different QF values, and the R-FCN network is then further trained with the Caffe framework at a lower learning rate.
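A minimal sketch of such QF-based differential encoding, using JPEG via Pillow. The two-level `QF_TABLE` and the separate ROI/background streams are illustrative assumptions; a real encoder would merge both regions into a single bitstream:

```python
from io import BytesIO
from PIL import Image

# Hypothetical ROI mapping table: region type -> JPEG quality factor (QF).
QF_TABLE = {"roi": 90, "non_roi": 30}

def encode_differential(frame, roi_box):
    """Encode the ROI crop at high quality and the full frame at low quality.

    Returns (roi_bytes, background_bytes); merging the two regions into one
    bitstream is omitted from this sketch.
    """
    roi = frame.crop(roi_box)
    roi_buf, bg_buf = BytesIO(), BytesIO()
    roi.save(roi_buf, format="JPEG", quality=QF_TABLE["roi"])
    frame.save(bg_buf, format="JPEG", quality=QF_TABLE["non_roi"])
    return roi_buf.getvalue(), bg_buf.getvalue()

# A synthetic gray frame with a predicted ROI box (x1, y1, x2, y2):
frame = Image.new("RGB", (640, 480), (128, 128, 128))
roi_bytes, bg_bytes = encode_differential(frame, (100, 100, 300, 300))
```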
In step S40, the R-FCN network consists of a feature extractor that depends on a region proposal network (RPN), and is configured to generate a corresponding bounding box and a class probability (confidence score) for each object in the video frame.
A preferred embodiment of the camera object detection system based on edge computing according to the invention comprises the following modules:
a video frame receiving module, used for receiving, via a cloud host using edge computing, the video frames sent by a camera frame by frame; because the cloud host detects the video frames frame by frame, the frames output by the camera cannot maintain association with one another through identifiers or similar means;
an ROI region prediction module, used by the cloud host to predict the ROI region in the next video frame from the received frames based on an ROI prediction model, and to send the prediction result to the camera;
a differential encoding module, used to have the camera differentially encode the video frames to be transmitted based on the received prediction result and then send them to the cloud host;
an object detection module, used by the cloud host to perform object detection on the video frames based on an object detection model built on the R-FCN network.
In the ROI region prediction module, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm; since detection is performed frame by frame, this association is indispensable for ensuring correct results;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI (region of interest) and the non-ROI region of the next video frame.
The prediction result guides the camera to adopt a multi-quality-level compression algorithm, so that the ROI region retains higher quality and the non-ROI region lower quality, satisfying the dual requirements of bandwidth and object detection accuracy.
The state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates; the Kalman filter assumes that the state variables follow a Gaussian distribution and a linear dynamic motion model;
the ROI region and the non-ROI region together constitute a complete video frame.
The differential encoding module is specifically configured to:
have the camera parse the received prediction result to obtain the ROI region and the non-ROI region, look up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, have the encoder differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then send the encoded video frame to the cloud host in real time.
That is, the ROI regions and non-ROI regions are encoded with different QF values (compression quality factors). When different QF values are used to encode different areas of a video frame, the image quality of the non-ROI areas is reduced, which degrades detection when new objects appear in those areas. The R-FCN network therefore needs to be calibrated for JPEG images at different QF values: output images encoded with the corresponding parameters are used to retrain the R-FCN network, divided into 5 levels representing different QF values, and the R-FCN network is then further trained with the Caffe framework at a lower learning rate.
In the object detection module, the R-FCN network consists of a feature extractor that depends on a region proposal network (RPN), and is configured to generate a corresponding bounding box and a class probability (confidence score) for each object in the video frame.
In summary, the invention has the following advantages:
a cloud host receives the video frames sent by a camera frame by frame; the cloud host predicts the ROI (region of interest) in the next video frame from the received frames based on an ROI prediction model, and sends a prediction result comprising the ROI region and the non-ROI region of the next frame to the camera; the camera differentially encodes the video frames to be transmitted based on the prediction result, so that the compression ratio of the non-ROI region is higher than that of the ROI region, which both compresses the overall size of the video frame to a certain extent and preserves the image quality of the ROI region; the encoded video frames are sent to the cloud host in real time; and finally the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network. Through the cooperation of the camera and the cloud host, the ROI region and the non-ROI region of each video frame are differentially encoded (compressed), improving the accuracy of object detection while satisfying the bandwidth constraint, and ultimately greatly improving the accuracy and efficiency of camera object detection.
While specific embodiments of the invention have been described above, those skilled in the art will appreciate that the described embodiments are illustrative only and are not intended to limit the scope of the invention; equivalent modifications and variations made in light of the spirit of the invention shall be covered by the claims of the present invention.

Claims (10)

1. A camera object detection method based on edge computing, characterized by comprising the following steps:
step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame;
step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera;
step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host;
and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network.
2. The camera object detection method based on edge computing according to claim 1, characterized in that: in step S20, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
3. The camera object detection method based on edge computing according to claim 2, characterized in that: the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
4. The camera object detection method based on edge computing according to claim 1, characterized in that: step S30 specifically includes:
the camera parses the received prediction result to obtain the ROI region and the non-ROI region, looks up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encodes the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then sends the encoded video frame to the cloud host in real time.
5. The camera object detection method based on edge computing according to claim 1, characterized in that: in step S40, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
6. A camera object detection system based on edge computing, characterized by comprising the following modules:
a video frame receiving module, used for receiving, via a cloud host using edge computing, the video frames sent by a camera frame by frame;
an ROI region prediction module, used by the cloud host to predict the ROI region in the next video frame from the received frames based on an ROI prediction model, and to send the prediction result to the camera;
a differential encoding module, used to have the camera differentially encode the video frames to be transmitted based on the received prediction result and then send them to the cloud host;
an object detection module, used by the cloud host to perform object detection on the video frames based on an object detection model built on the R-FCN network.
7. The camera object detection system based on edge computing according to claim 6, characterized in that: in the ROI region prediction module, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
8. The camera object detection system based on edge computing according to claim 7, characterized in that: the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
9. The camera object detection system based on edge computing according to claim 6, characterized in that: the differential encoding module is specifically configured to:
have the camera parse the received prediction result to obtain the ROI region and the non-ROI region, look up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then send the encoded video frame to the cloud host in real time.
10. The camera object detection system based on edge computing according to claim 6, characterized in that: in the object detection module, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
CN202311417833.3A (priority and filing date 2023-10-30): Camera object detection method and system based on edge computing, Pending, CN117636201A (en)

Priority Applications (1)

Application Number: CN202311417833.3A; Priority Date: 2023-10-30; Filing Date: 2023-10-30; Title: Camera object detection method and system based on edge computing


Publications (1)

Publication Number: CN117636201A (en); Publication Date: 2024-03-01

Family

ID=90031124




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination