CN117636201A - Camera object detection method and system based on edge computing - Google Patents


Info

Publication number
CN117636201A
CN117636201A
Authority
CN
China
Prior art keywords
roi
camera
object detection
video frame
region
Prior art date
Legal status
Pending
Application number
CN202311417833.3A
Other languages
Chinese (zh)
Inventor
赵泽钧
袁苇
张宏辉
Current Assignee
Fujian Newland Communication Science Technologies Co ltd
Original Assignee
Fujian Newland Communication Science Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Newland Communication Science Technologies Co ltd
Priority to CN202311417833.3A
Publication of CN117636201A
Legal status: Pending


Abstract

The invention provides a camera object detection method and system based on edge computing, belonging to the technical field of camera object detection. The method comprises the following steps: step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame; step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera; step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host; and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network. The advantage of the invention is that the accuracy and efficiency of camera object detection are greatly improved.

Description

Camera object detection method and system based on edge computing
Technical Field
The invention relates to the technical field of camera object detection, and in particular to a camera object detection method and system based on edge computing.
Background
Internet-of-things platforms (the cloud) are widely used for visual data analysis in many fields, such as homes, businesses, robotics, and autonomous driving. Because many cameras lack sufficient computing and storage resources, they cannot perform complex tasks, such as object detection, that require a neural network; images therefore need to be transmitted to the cloud for object detection.
As the number of cameras grows, the high data volume from camera image sensors strains the bandwidth of the shared channel; for example, a camera transmitting 1920x1080 video at 30 fps requires a bandwidth of about 1492 megabits per second (Mbps). Under limited bandwidth, a camera must use compression to reduce the transmitted data volume, but compressed images degrade the accuracy of cloud-side detection; if images are not compressed, their transmission time increases and the efficiency of camera object detection is low.
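The bandwidth figure above can be checked with a short calculation, assuming an uncompressed 24-bit RGB stream:

```python
# Raw bandwidth of a 1920x1080, 30 fps, 24-bit RGB video stream.
width, height, fps, bits_per_pixel = 1920, 1080, 30, 24
bits_per_second = width * height * fps * bits_per_pixel
mbps = bits_per_second / 1_000_000
print(f"{mbps:.0f} Mbps")  # about 1493 Mbps before compression
```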
Therefore, providing a camera object detection method and system based on edge computing that improves the accuracy and efficiency of camera object detection is a technical problem to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a camera object detection method and system based on edge computing that can improve the accuracy and efficiency of camera object detection.
In a first aspect, the present invention provides a camera object detection method based on edge computing, comprising the following steps:
step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame;
step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera;
step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host;
and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network.
Further, in step S20, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
Further, the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
Further, step S30 specifically includes:
the camera parses the received prediction result to obtain the ROI region and the non-ROI region, looks up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encodes the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then sends the encoded video frame to the cloud host in real time.
Further, in step S40, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
In a second aspect, the present invention provides a camera object detection system based on edge computing, comprising:
a video frame receiving module, used for receiving, via a cloud host using edge computing, the video frames sent by a camera frame by frame;
an ROI region prediction module, used by the cloud host to predict the ROI region in the next video frame from the received frames based on an ROI prediction model, and to send the prediction result to the camera;
a differential encoding module, used to have the camera differentially encode the video frames to be transmitted based on the received prediction result and then send them to the cloud host;
an object detection module, used by the cloud host to perform object detection on the video frames based on an object detection model built on the R-FCN network.
Further, in the ROI region prediction module, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
Further, the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
Further, the differential encoding module is specifically configured to:
have the camera parse the received prediction result to obtain the ROI region and the non-ROI region, look up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then send the encoded video frame to the cloud host in real time.
Further, in the object detection module, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
The invention has the following advantages:
a cloud host receives the video frames sent by a camera frame by frame; the cloud host predicts the ROI (region of interest) in the next video frame from the received frames based on an ROI prediction model, and sends a prediction result comprising the ROI region and the non-ROI region of the next frame to the camera; the camera differentially encodes the video frames to be transmitted based on the prediction result, so that the compression ratio of the non-ROI region is higher than that of the ROI region, which both compresses the overall size of the video frame to a certain extent and preserves the image quality of the ROI region; the encoded video frames are sent to the cloud host in real time; and finally the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network. Through the cooperation of the camera and the cloud host, the ROI region and the non-ROI region of each video frame are differentially encoded (compressed), improving the accuracy of object detection while satisfying the bandwidth constraint, and ultimately greatly improving the accuracy and efficiency of camera object detection.
Drawings
The invention will be further described below by way of example embodiments with reference to the accompanying drawings.
Fig. 1 is a flowchart of the camera object detection method based on edge computing according to the present invention.
Fig. 2 is a schematic structural diagram of the camera object detection system based on edge computing according to the present invention.
Detailed Description
The overall idea of the technical solution in the embodiments of the application is as follows: through the cooperation of the camera and the cloud host, the ROI region and the non-ROI region of each video frame are differentially encoded, improving the accuracy of object detection while satisfying the bandwidth constraint, and thereby improving the accuracy and efficiency of camera object detection.
Referring to figs. 1 and 2, a preferred embodiment of the camera object detection method based on edge computing according to the present invention comprises the following steps:
step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame; because the cloud host detects the video frames frame by frame, the frames output by the camera cannot maintain association with one another through identifiers or similar means;
step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera;
step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host;
and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network.
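The four steps form a feedback loop between the camera and the cloud host. A minimal sketch of that loop, using hypothetical stand-in functions (`detect_objects`, `predict_roi`, `encode`) for the R-FCN detector, the ROI prediction model, and the camera-side encoder, which are not part of the original disclosure:

```python
def detect_objects(frame):
    """Hypothetical stand-in for R-FCN detection on the cloud host (step S40)."""
    return [(50, 50, 120, 160)]  # one bounding box per frame

def predict_roi(boxes):
    """Hypothetical stand-in for SORT + Kalman ROI prediction (step S20)."""
    return boxes  # assume stationary objects in this sketch

def encode(frame, roi):
    """Hypothetical stand-in for the camera-side differential encoder (step S30)."""
    return {"frame": frame, "roi": roi}

roi = None  # no prediction is available for the first frame
for frame_id in range(3):
    packet = encode(frame_id, roi)           # camera encodes using the last prediction
    boxes = detect_objects(packet["frame"])  # cloud host receives and detects (S10/S40)
    roi = predict_roi(boxes)                 # cloud host predicts the next-frame ROI (S20)
```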
In step S20, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm; since detection is performed frame by frame, this association is indispensable for ensuring correct results;
the motion detection module is used for predicting states of the unassociated objects in the association module through a Kalman filter, and further obtaining a prediction result of the ROI (region of interest) and the non-RO I region of the video frame including the next frame.
The prediction result guides the camera to adopt a multi-quality-level compression algorithm, so that the ROI region retains higher quality and the non-ROI region lower quality, satisfying the dual requirements of bandwidth and object detection accuracy.
The state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates; the Kalman filter assumes that the state variables follow a Gaussian distribution and a linear dynamic motion model;
the RO I region and non-ROI region constitute a complete video frame.
Step S30 specifically includes:
the camera parses the received prediction result to obtain the ROI region and the non-ROI region, looks up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, has the encoder differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then sends the encoded video frame to the cloud host in real time.
That is, the ROI regions and non-ROI regions are encoded with different QF values (compression quality factors). When different QF values are used to encode different areas of a video frame, the image quality of the non-ROI areas is reduced, which degrades detection when new objects appear in those areas. The R-FCN network therefore needs to be calibrated for JPEG images at different QF values: output images encoded with the corresponding parameters are used to retrain the R-FCN network, divided into 5 levels representing different QF values, and the R-FCN network is then further trained with the Caffe framework at a lower learning rate.
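A minimal sketch of such QF-based differential encoding, using JPEG via Pillow. The two-level `QF_TABLE` and the separate ROI/background streams are illustrative assumptions; a real encoder would merge both regions into a single bitstream:

```python
from io import BytesIO
from PIL import Image

# Hypothetical ROI mapping table: region type -> JPEG quality factor (QF).
QF_TABLE = {"roi": 90, "non_roi": 30}

def encode_differential(frame, roi_box):
    """Encode the ROI crop at high quality and the full frame at low quality.

    Returns (roi_bytes, background_bytes); merging the two regions into one
    bitstream is omitted from this sketch.
    """
    roi = frame.crop(roi_box)
    roi_buf, bg_buf = BytesIO(), BytesIO()
    roi.save(roi_buf, format="JPEG", quality=QF_TABLE["roi"])
    frame.save(bg_buf, format="JPEG", quality=QF_TABLE["non_roi"])
    return roi_buf.getvalue(), bg_buf.getvalue()

# A synthetic gray frame with a predicted ROI box (x1, y1, x2, y2):
frame = Image.new("RGB", (640, 480), (128, 128, 128))
roi_bytes, bg_bytes = encode_differential(frame, (100, 100, 300, 300))
```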
In step S40, the R-FCN network consists of a feature extractor that depends on a region proposal network (RPN), and is configured to generate a corresponding bounding box and a class probability (confidence score) for each object in the video frame.
A preferred embodiment of the camera object detection system based on edge computing according to the invention comprises the following modules:
a video frame receiving module, used for receiving, via a cloud host using edge computing, the video frames sent by a camera frame by frame; because the cloud host detects the video frames frame by frame, the frames output by the camera cannot maintain association with one another through identifiers or similar means;
an ROI region prediction module, used by the cloud host to predict the ROI region in the next video frame from the received frames based on an ROI prediction model, and to send the prediction result to the camera;
a differential encoding module, used to have the camera differentially encode the video frames to be transmitted based on the received prediction result and then send them to the cloud host;
an object detection module, used by the cloud host to perform object detection on the video frames based on an object detection model built on the R-FCN network.
In the ROI region prediction module, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm; since detection is performed frame by frame, this association is indispensable for ensuring correct results;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI (region of interest) and the non-ROI region of the next video frame.
The prediction result guides the camera to adopt a multi-quality-level compression algorithm, so that the ROI region retains higher quality and the non-ROI region lower quality, satisfying the dual requirements of bandwidth and object detection accuracy.
The state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates; the Kalman filter assumes that the state variables follow a Gaussian distribution and a linear dynamic motion model;
the ROI region and the non-ROI region together constitute a complete video frame.
The differential encoding module is specifically configured to:
have the camera parse the received prediction result to obtain the ROI region and the non-ROI region, look up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, have the encoder differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then send the encoded video frame to the cloud host in real time.
That is, the ROI regions and non-ROI regions are encoded with different QF values (compression quality factors). When different QF values are used to encode different areas of a video frame, the image quality of the non-ROI areas is reduced, which degrades detection when new objects appear in those areas. The R-FCN network therefore needs to be calibrated for JPEG images at different QF values: output images encoded with the corresponding parameters are used to retrain the R-FCN network, divided into 5 levels representing different QF values, and the R-FCN network is then further trained with the Caffe framework at a lower learning rate.
In the object detection module, the R-FCN network consists of a feature extractor that depends on a region proposal network (RPN), and is configured to generate a corresponding bounding box and a class probability (confidence score) for each object in the video frame.
In summary, the invention has the following advantages:
a cloud host receives the video frames sent by a camera frame by frame; the cloud host predicts the ROI (region of interest) in the next video frame from the received frames based on an ROI prediction model, and sends a prediction result comprising the ROI region and the non-ROI region of the next frame to the camera; the camera differentially encodes the video frames to be transmitted based on the prediction result, so that the compression ratio of the non-ROI region is higher than that of the ROI region, which both compresses the overall size of the video frame to a certain extent and preserves the image quality of the ROI region; the encoded video frames are sent to the cloud host in real time; and finally the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network. Through the cooperation of the camera and the cloud host, the ROI region and the non-ROI region of each video frame are differentially encoded (compressed), improving the accuracy of object detection while satisfying the bandwidth constraint, and ultimately greatly improving the accuracy and efficiency of camera object detection.
While specific embodiments of the invention have been described above, those skilled in the art will appreciate that the described embodiments are illustrative only and are not intended to limit the scope of the invention; equivalent modifications and variations made in light of the spirit of the invention shall be covered by the claims of the present invention.

Claims (10)

1. A camera object detection method based on edge computing, characterized by comprising the following steps:
step S10, a cloud host using edge computing receives video frames sent by a camera frame by frame;
step S20, the cloud host predicts the ROI region in the next video frame from the received frames based on an ROI prediction model, and sends the prediction result to the camera;
step S30, the camera differentially encodes the video frames to be transmitted based on the received prediction result and then sends them to the cloud host;
and step S40, the cloud host performs object detection on the video frames based on an object detection model built on the R-FCN network.
2. The camera object detection method based on edge computing according to claim 1, characterized in that: in step S20, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
3. The camera object detection method based on edge computing according to claim 2, characterized in that: the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
4. The camera object detection method based on edge computing according to claim 1, characterized in that: step S30 specifically includes:
the camera parses the received prediction result to obtain the ROI region and the non-ROI region, looks up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encodes the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then sends the encoded video frame to the cloud host in real time.
5. The camera object detection method based on edge computing according to claim 1, characterized in that: in step S40, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
6. A camera object detection system based on edge computing, characterized by comprising the following modules:
a video frame receiving module, used for receiving, via a cloud host using edge computing, the video frames sent by a camera frame by frame;
an ROI region prediction module, used by the cloud host to predict the ROI region in the next video frame from the received frames based on an ROI prediction model, and to send the prediction result to the camera;
a differential encoding module, used to have the camera differentially encode the video frames to be transmitted based on the received prediction result and then send them to the cloud host;
an object detection module, used by the cloud host to perform object detection on the video frames based on an object detection model built on the R-FCN network.
7. The camera object detection system based on edge computing according to claim 6, characterized in that: in the ROI region prediction module, the ROI prediction model consists of an association module and a motion detection module;
the association module performs association prediction for each object in the video frame using the SORT algorithm;
the motion detection module predicts the states of the objects left unassociated by the association module using a Kalman filter, thereby obtaining a prediction result comprising the ROI region and the non-ROI region of the next video frame.
8. The camera object detection system based on edge computing according to claim 7, characterized in that: the state prediction includes predictions of visible state variables and latent state variables;
the visible state variables are the vertex coordinates of the bounding box of the ROI region, and the latent state variables are the velocities of those vertex coordinates;
the ROI region and the non-ROI region together constitute a complete video frame.
9. The camera object detection system based on edge computing according to claim 6, characterized in that: the differential encoding module is specifically configured to:
have the camera parse the received prediction result to obtain the ROI region and the non-ROI region, look up the encoding modes corresponding to the ROI region and the non-ROI region in a preset ROI mapping table, differentially encode the ROI region and the non-ROI region of the video frame to be transmitted with those modes so that the compression ratio of the non-ROI region is higher than that of the ROI region, and then send the encoded video frame to the cloud host in real time.
10. The camera object detection system based on edge computing according to claim 6, characterized in that: in the object detection module, the R-FCN network consists of a feature extractor that depends on a region proposal network, and is configured to generate a corresponding bounding box and a class probability for each object in the video frame.
CN202311417833.3A (priority and filing date 2023-10-30): Camera object detection method and system based on edge computing, Pending, CN117636201A (en)

Priority Applications (1)

Application Number: CN202311417833.3A; Priority Date: 2023-10-30; Filing Date: 2023-10-30; Title: Camera object detection method and system based on edge computing


Publications (1)

Publication Number: CN117636201A (en); Publication Date: 2024-03-01

Family

ID=90031124




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination