CN114154018A

CN114154018A - Cloud-edge collaborative video stream processing method and system for unmanned system

Info

Publication number: CN114154018A
Application number: CN202210117381.6A
Authority: CN
Inventors: 郭锐; 陈忠; 施晓东; 徐俊瑜; 马驰; 李磊; 姚传明; 王超; 刘佳
Original assignee: CETC 28 Research Institute
Current assignee: CETC 28 Research Institute
Priority date: 2022-02-08
Filing date: 2022-02-08
Publication date: 2022-03-08
Anticipated expiration: 2042-02-08
Also published as: CN114154018B

Abstract

The invention discloses a cloud-edge collaborative video stream processing method and system for an unmanned system. The method comprises the following steps: the terminal unattended platform cuts the collected video stream, and sends each cut frame of picture to a designated edge terminal device for processing according to a preset distribution strategy; the edge terminal equipment responds to the picture identification request, adds the picture into a cloud sending queue to wait for sending to the cloud, and adds the picture into a local identification queue to wait for identification; the edge terminal equipment takes out the picture to be identified from the local identification queue for identification processing; according to the confidence coefficient obtained by local picture recognition, the recognition result with the confidence coefficient larger than or equal to the specified threshold value is taken as a correct recognition result to be stored; and the edge terminal equipment gives a final picture identification result according to the local identification result and the cloud identification result. According to the invention, the video information processing efficiency of the unattended system in a complex environment can be improved through cloud-edge cooperative processing.

Description

Cloud-edge collaborative video stream processing method and system for unmanned system

Technical Field

The invention relates to unmanned system video stream processing, in particular to an unmanned system-oriented cloud-side collaborative video stream processing method and system.

Background

The unattended system collects video streams through video monitoring and recording equipment to perform real-time analysis and target detection. Most of existing unmanned systems send video streams collected from a terminal unmanned platform to a cloud for processing. However, if all the image data are executed in the cloud, huge overhead is caused to the transmission network; if all detection tasks are executed at the edge end, the edge end equipment is required to have better computing and storing hardware configuration, and the method depends on continuous and stable guarantee, and is difficult to be suitable for the complex and changeable environment of the edge end. Therefore, a single cloud processing or a single edge processing is unrealistic, which needs to combine the cloud computing and the edge computing technologies to construct a cloud-edge collaborative video stream processing system, thereby improving the video information processing efficiency of the unattended system in a complex environment.

Disclosure of Invention

The purpose of the invention is as follows: in order to solve the technical problems, the invention provides a cloud-edge collaborative video stream processing method and system for an unmanned system, which optimize the video stream processing capability of the unmanned system in a cloud-edge collaborative mode by utilizing the low time-delay property of an edge end and the resource sufficiency of a cloud end.

The technical scheme is as follows: a cloud edge collaborative video stream processing method facing an unmanned system comprises the following steps:

the terminal unattended platform cuts the collected video stream, and sends each cut frame of picture to the appointed edge end equipment for processing according to a preset distribution strategy;

the edge terminal equipment responds to the picture identification request, adds the picture into a cloud sending queue to wait for sending to the cloud, and adds the picture into a local identification queue to wait for identification;

the edge terminal equipment identifies the picture data in the local identification queue;

the edge terminal equipment identifies the obtained confidence coefficient f according to the local picture, and the confidence coefficient f is larger than or equal to the specified threshold value

The recognition result of (2) is saved as a correct recognition result;

the edge end equipment judges whether a cloud end identification result is received or not, if the edge end equipment receives the cloud end identification result, comprehensive judgment and scoring are carried out according to the time and the accuracy of the cloud end and the local identification of the edge end, when the score is larger than or equal to a threshold value s, the cloud end identification result is judged to be superior to the local identification result, and the edge end equipment returns the cloud end identification result to the human-computer interaction interface; otherwise, the edge terminal equipment returns a local identification result to the man-machine interaction interface;

and if the edge terminal equipment does not receive the identification result of the cloud, directly returning the local identification result to the human-computer interaction interface.

Wherein the preset allocation policy includes: selecting the edge terminal equipment with the minimum expected waiting time to process the picture according to the identification queue length and the identification time of the adjacent edge terminal equipment stored in the local database:

where d represents the selected edge device, N represents the total number of edge devices,

indicating the number of pictures waiting to be identified in the ith edge device,

is shown as

The individual edge devices identify the average time of a picture.

The specified threshold value

The identification result is obtained according to the test set, and the values are as follows:

where M is the number of pictures of the object to be detected in the test set,

the number of correctly identified pictures in the test set.

Further, comprehensive judgment and scoring are carried out according to the duration of cloud identification and the accuracy of promotion, and the formula is as follows:

the accuracy of the promotion is represented, the accuracy of the cloud model and the accuracy of the edge end are obtained through the test set, and the difference between the accuracy of the cloud model and the accuracy of the edge end is the difference

A value of (d);

indicating slave transmissionThe time that the picture has passed to the cloud to return the results,

and the average time delay from the edge end to the cloud end is represented.

An unmanned system-oriented cloud-edge collaborative video stream processing system comprises:

the terminal unattended platform is used for cutting the collected video stream and sending each frame of cut pictures to appointed edge-end equipment for processing according to a preset distribution strategy;

the edge terminal equipment is used for responding to the picture identification request, adding the picture to be identified into a cloud sending queue to wait for sending to the cloud, and adding the picture to be identified into a local identification queue to wait for identification; determining a final recognition result of the picture based on the local recognition result and the cloud recognition result;

the cloud end is used for identifying the pictures sent by the edge end equipment and returning an identification result to the edge end equipment;

the edge terminal device determines a final recognition result of the picture based on the local recognition result and the cloud recognition result, and the determination comprises the following steps:

the edge terminal equipment identifies the picture data in the local identification queue, and according to the confidence coefficient f obtained by local picture identification, the confidence coefficient f is greater than or equal to the specified threshold value

The recognition result of (2) is saved as a correct recognition result;

the edge end equipment judges whether the cloud end identification result exists or not, if the edge end equipment receives the cloud end identification result, comprehensive judgment and scoring are carried out according to the time and the accuracy of cloud end and edge end local identification, when the score is larger than or equal to a threshold value s, the cloud end identification result is judged to be superior to the local identification result, and the edge end equipment returns the cloud end identification result to the human-computer interaction interface; otherwise, the edge terminal equipment returns a local identification result;

An unmanned system oriented video stream processing device comprising:

the data acquisition module is used for acquiring a picture identification request of each frame of picture after video stream cutting from the tail end unattended platform, wherein each frame of picture after video stream cutting is distributed according to a preset distribution strategy;

the data queue module is used for responding to the picture identification request, adding the picture to be identified into a cloud sending queue to wait for being sent to the cloud, and adding the picture to be identified into a local identification queue to wait for being identified;

the local picture identification module is used for identifying the picture data in the local identification queue and making the confidence coefficient f be more than or equal to a specified threshold value according to the confidence coefficient f obtained by local picture identification

The recognition result of (2) is saved as a correct recognition result;

the cloud side comprehensive analysis decision module is used for judging whether a cloud side identification result is received or not, if the cloud side identification result is received, comprehensive judgment and scoring are carried out according to the time and the accuracy of the cloud side and the local identification respectively, when the score is larger than or equal to a threshold value s, the cloud side identification result is judged to be superior to the local identification result, and the cloud side identification result is returned to the human-computer interaction interface; otherwise, returning a local identification result; and if the identification result of the cloud is not received, directly returning the local identification result to the human-computer interaction interface.

A computer device, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the programs when executed by the processors implement the steps of:

acquiring a picture identification request of each frame of picture after video stream cutting from a terminal unattended platform, wherein each frame of picture after video stream cutting is distributed according to a preset distribution strategy;

responding to the picture identification request, adding the picture to be identified into a cloud end sending queue to wait for sending to the cloud end, and adding the picture to be identified into a local identification queue to wait for identification;

identifying the picture data in the local identification queue, and according to the confidence f obtained by local picture identification, making the confidence f be greater than or equal to a specified threshold value

The recognition result of (2) is saved as a correct recognition result;

judging whether a cloud identification result is received or not, if the cloud identification result is received, comprehensively judging and scoring according to the time and accuracy of cloud and local identification respectively, judging that the cloud identification result is superior to the local identification result when the score is greater than or equal to a threshold value s, and returning the cloud identification result to the human-computer interaction interface; otherwise, returning a local identification result; and if the identification result of the cloud is not received, directly returning the local identification result to the human-computer interaction interface.

Has the advantages that: according to the method, each frame of picture cut from the video stream by the edge end is sent to the edge end device with the minimum estimated waiting time for processing the picture, the edge end device also sends a copy of picture to the cloud end device after receiving the picture, the edge end and the cloud end are processed simultaneously, if the cloud end is not processed, the identification result of the edge end is obtained, if the cloud end has a return result, the return result is scored according to time delay and improvement precision to comprehensively judge, and the low ductility of the edge end and the resource sufficiency of the cloud end can be combined through cloud-edge cooperative processing, so that the efficiency of unmanned system video stream processing is improved.

Drawings

Fig. 1 is a flowchart of a method for processing a cloud-edge collaborative video stream for an unmanned system according to an embodiment of the present invention;

fig. 2 is a schematic diagram of an object detection process according to an embodiment of the invention.

Detailed Description

The technical scheme of the invention is further explained by combining the attached drawings.

The invention provides a collaborative video stream processing method and system for an unmanned system. Cloud refers to a cloud server or any platform or infrastructure that provides cloud computing services. The video stream processing method based on the system comprises the following steps: firstly, the unattended platform collects video streams and then sends pictures in the videos to the edge end device, then the edge end device and the cloud end device conduct cooperative processing, if a picture processing result is obtained through cloud end return, scoring is comprehensively judged according to time delay and precision improvement rate, whether the recognition result of the cloud end needs to be used is determined, and if the cloud end does not return the result, the picture processing result of the edge end is obtained. Referring to fig. 1, the method specifically includes the following steps:

(1) data distribution:

and the tail end unattended platform cuts the collected video stream and sends each cut frame of picture to the edge end equipment. Suppose there is

An edge terminal device, each edge terminal device

Maintain a queue with

And waiting for detection and identification of the picture.

Is shown as

The average time of identifying one picture by each edge terminal device is obtained by averaging the identification time of the pictures with the same type and a large number of samples in the model training process. The specific sending edge end equipment is determined by real-time task allocation. The edge end determines which close device to process the current picture through the real-time task distribution system. Real-time task scoringThe matching system selects an edge device with the smallest expected waiting time for processing the picture. Namely:

each edge device acquires the neighboring edge devices through timing data synchronization

And

values and stored in a local mini-database SQLite, able to be based on

And

the dynamic change of the value triggers a change of the distribution strategy.

(2) And (3) cloud edge collaborative image processing:

(2-1) for edge terminal device

When receiving a picture identification request, the terminal device is added into the cloud sending queue

Waiting to be sent to the cloud end, and adding the picture to be identified into a local identification queue

Awaiting recognition.

(2-2) whenever edge end device

From the local queue when a picture has been identified

And taking out the next picture for local identification processing. The identification model of the edge-end device is as follows: yolov3 was used first for detecting common objects in the video stream and then ResNet-152 or MobileNet was used for more accurate and specific classification of the detected objects and labeling them. Fig. 2 is a process diagram of target detection.

(2-3) edge terminal device

Obtaining a confidence level after identifying a picture

When is coming into contact with

When the picture does not identify the target, the picture returns a result without identifying the target. I.e. only confidence levels greater than or equal to a specified threshold

Is saved to be less than

The recognition results of (2) are all discarded.

The value of (a) is selected to maximize the accuracy of the whole edge terminal. Is provided with a test center

A picture of the object to be detected in

In the result of (A), there are

The sheet is correct and the sheet is,

the values of (A) are as follows:

(2-4) when the picture is identified by the cloud equipment, the cloud identification model is as follows: yolov3 was used first for detecting common objects in the video stream and then ResNet-152 or MobileNet was used for more accurate and specific classification of the detected objects and labeling them. Fig. 2 is a process diagram of target detection. Cloud equipment returns detection result to edge equipment

。

(2-5) if edge terminal device

When local identification is completed, the identification result of the cloud is also received, comprehensive judgment and scoring are carried out according to the time and the accuracy of the respective identification of the cloud and the edge, and the scoring details are as follows:

wherein

Expressing the accuracy of the promotion, obtaining the accuracy of the cloud model and the accuracy of the edge end through the test set, and then performing difference to obtain the accuracy of the cloud model and the accuracy of the edge end

The value of (c).

Representing the time elapsed from sending the picture to the cloud returning the results,

and the average time delay from the edge end to the cloud end is represented. Here, the

Represents the proportion of cloud processing time delay to end-to-end time delay, because

Representing the one-way average delay from the edge end to the cloud end,

indicating the time elapsed from sending the picture to returning the result to the cloud, is bi-directional, and therefore

Multiply by 2 times and

a comparison is made.

(2-6) when score is larger than or equal to a threshold value s, judging that the cloud identification result is superior to the edge end identification result, and returning the cloud identification result to the human-computer interaction interface by the edge end equipment; otherwise, judging that the edge end identification result is superior to the cloud end identification result, and returning the local identification result to the human-computer interaction interface by the edge end equipment. In an embodiment of the present invention, the threshold s is 0.6.

(3) And outputting a result:

and after the identification is finished, returning the identified result to the user, and storing the result in the cloud database.

The present invention also provides an unmanned system-oriented video stream processing device, which corresponds to an edge device in the video stream processing system, and the device includes:

The recognition result of (2) is saved as a correct recognition result;

The data acquisition module obtains each frame of picture to be processed according to a preset distribution strategy, wherein the distribution strategy is as follows: the video stream processing equipment synchronizes the length of a data queue of the video stream processing equipment and the average processing time of a picture at regular time, and each video stream processing equipment selects equipment with the minimum expected waiting time to process the picture according to the identification queue length and the identification time of the adjacent edge end equipment stored in the local database:

where d denotes a selected video stream processing apparatus, N denotes a total number of video stream processing apparatuses,

at the i-th video streamThe number of pictures to be detected and identified in the management device,

is shown as

A video stream processing apparatus identifies an average time of a picture.

The identification model of the local picture identification module is as follows: yolov3 was used first for detecting common objects in the video stream and then ResNet-152 or MobileNet was used for more accurate and specific classification of the detected objects and labeling them.

Specified threshold for use by local picture recognition module

where M is the number of pictures of the object to be detected in the test set,

the number of correctly identified pictures in the test set.

The cloud side comprehensive analysis decision module carries out comprehensive judgment and scoring according to the time and the accuracy rate respectively identified by the cloud side and the local, and the formula is as follows:

wherein

The accuracy of the promotion is represented and obtained by the difference between the accuracy of the cloud and the accuracy of the local area;

representing the average latency from the video stream processing device to the cloud.

The present invention also provides a computer apparatus comprising:

one or more processors;

a memory; and

The recognition result of (2) is saved as a correct recognition result;

The steps executed by the processor correspond to the functions implemented by the functional modules of the video stream processing device facing the unmanned system, and the specific calculation method involved in each step may refer to the corresponding description of the video stream processing device, which is not described herein again.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A cloud edge collaborative video stream processing method for an unmanned system is characterized by comprising the following steps:

the terminal unattended platform cuts the collected video stream, and sends each cut frame of picture to a designated edge terminal device for processing according to a preset distribution strategy;

the edge terminal equipment responds to the picture identification request, adds the picture to be identified into a cloud sending queue to wait for sending to the cloud, and adds the picture into a local identification queue to wait for local identification;

The recognition result of (2) is saved as a correct recognition result;

the edge end equipment judges whether a cloud end identification result is received or not, if the edge end equipment receives the cloud end identification result, comprehensive judgment and scoring are carried out according to the time and the accuracy of the cloud end and the local identification of the edge end, when the score is larger than or equal to a threshold value s, the cloud end identification result is judged to be superior to the local identification result, and the edge end equipment returns the cloud end identification result to the human-computer interaction interface; otherwise, the edge terminal equipment returns a local identification result;

2. The unmanned-system-oriented cloud-edge collaborative video stream processing method according to claim 1, wherein the preset allocation policy includes: selecting the edge equipment with the minimum expected waiting time to process the picture according to the identification queue length and the identification time of the adjacent edge equipment stored in the local database of each edge equipment:

is shown as

The individual edge devices identify the average time of a picture.

3. The unmanned-system-oriented cloud-edge collaborative video streaming method according to claim 1, wherein the specified threshold is set

where M is the number of pictures of the object to be detected in the test set,

the number of correctly identified pictures in the test set.

4. The unmanned-system-oriented cloud-edge collaborative video stream processing method according to claim 1, wherein comprehensive judgment and scoring are performed according to time and accuracy rates respectively identified by a cloud end and an edge end, and a formula is as follows:

wherein

The accuracy of the promotion is represented, and the accuracy is obtained by making a difference between the accuracy of the cloud end and the accuracy of the edge end;

and the average time delay from the edge end to the cloud end is represented.

5. The unmanned-system-oriented cloud-edge collaborative video stream processing method according to claim 1, wherein a local identification model of an edge device performs primary detection by using Yolov3, and performs secondary detection and marking on an object which is detected for the first time by using ResNet-152 or MobileNet.

6. An unmanned system-oriented cloud-edge collaborative video stream processing system is characterized by comprising:

The recognition result of (2) is saved as a correct recognition result;

7. The unmanned-system-oriented cloud-edge collaborative video stream processing system according to claim 6, wherein the edge device performs comprehensive judgment and scoring according to the time and accuracy respectively identified by the cloud and the edge, and the formula is as follows:

wherein

and the average time delay from the edge end to the cloud end is represented.

8. The unmanned-system-oriented cloud-edge collaborative video streaming system according to claim 6, wherein the preset allocation policy comprises: selecting the edge equipment with the minimum expected waiting time to process the picture according to the identification queue length and the identification time of the adjacent edge equipment stored in the local database of each edge equipment:

is shown as

The individual edge devices identify the average time of a picture.

9. The unmanned-system-oriented cloud-edge-collaborative video streaming system of claim 6, wherein the specified threshold is

where M is the number of pictures of the object to be detected in the test set,

the number of correctly identified pictures in the test set.

10. An unmanned system-oriented video stream processing apparatus, comprising:

the data acquisition module is used for acquiring a picture identification request of each frame of picture after video stream cutting from the tail end unattended platform, wherein each frame of picture after video stream cutting is distributed by the tail end unattended platform according to a preset distribution strategy;

Is recognized as a resultSaved as a correct recognition result;

11. A computer device, comprising:

one or more processors;

a memory; and

acquiring a picture identification request of each frame of picture after video stream cutting from the tail-end unattended platform, wherein each frame of picture after video stream cutting is distributed by the tail-end unattended platform according to a preset distribution strategy;

The recognition result of (2) is saved as a correct recognition result;