CN117037158B - Urban brain cloud edge cooperative computing method and device based on video semantic driving

Urban brain cloud edge cooperative computing method and device based on video semantic driving

Info

Publication number
CN117037158B
Authority
CN
China
Prior art keywords
semantic
node
target
result
license plate
Prior art date
Legal status
Active
Application number
CN202311298523.4A
Other languages
Chinese (zh)
Other versions
CN117037158A (en)
Inventor
高丰
郑宇化
孙铭鸽
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311298523.4A
Publication of CN117037158A
Application granted
Publication of CN117037158B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations (Scenes; Scene-specific elements)
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/625: License plates (Text, e.g. of license plates, overlay texts or captions on TV images)
    • G06V 2201/07: Target detection (Indexing scheme relating to image or video recognition or understanding)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an urban brain cloud edge collaborative computing method and device driven by video semantics, comprising the following steps: acquiring video stream data; performing semantic detection on the video stream data to obtain a semantic detection result; extracting a target picture from the video stream data according to the semantic detection result; determining a node that has established communication with the current edge node and still has processing resources as the target node; and sending the target picture to the target node so that the target node extracts a semantic result from the target picture. The method addresses the limited bandwidth between edge nodes, the difficulty of transmitting video streams directly, and the limited load capacity of edge servers. It provides efficient cloud-edge collaborative processing under sudden traffic bursts and effectively ensures that services such as target tracking and traffic statistics run without interruption.

Description

Urban brain cloud edge cooperative computing method and device based on video semantic driving
Technical Field
The application relates to the technical field of video processing, in particular to a video semantic driving-based urban brain cloud edge collaborative computing method and device.
Background
A video system is an important component of intelligent transportation: it can monitor road traffic, vehicle movement, traffic violations, and the like, and provides data support for intelligent traffic applications.
Efficiently processing the massive video stream data of an urban brain poses a great challenge to the back-end information system. In edge-cloud collaborative processing, an edge server collects the video streams of multiple nearby cameras and sends the result data produced by real-time processing (including but not limited to traffic flow, vehicle tracking information, pedestrian flow, and the like) to the cloud-side server, greatly reducing the communication overhead and computing load of the cloud server.
However, in existing cloud-edge collaboration scenarios, the complete video stream must be transmitted to an adjacent node, which then reprocesses it with a video target detection method. On top of this, a sudden traffic burst can overload the edge server so that it cannot process in real time, interrupting service data.
Disclosure of Invention
In order to overcome the problems in the related art, the specification provides a video semantic driving-based urban brain cloud edge collaborative computing method and device.
In a first aspect, the present application provides a video semantic driving-based urban brain cloud edge collaborative computing method, including:
acquiring video stream data;
carrying out semantic detection on the video stream data to obtain a semantic detection result;
extracting a target picture from the video stream data according to the semantic detection result;
determining a node which establishes communication with the current edge node and has processing resources as a target node;
and sending the target picture to the target node so that the target node extracts the semantic result of the target picture to obtain the semantic result.
Optionally, the node establishing communication with the current edge node and having processing resources comprises: an edge node adjacent to the current edge node and a center node;
the determining a node that establishes communication with the current edge node and has processing resources as a target node includes:
determining a target node from the adjacent edge nodes according to the image processing queue length corresponding to the adjacent edge nodes under the condition that the image processing queue length of the current edge node exceeds a preset queue length threshold;
and under the condition that the image processing queue lengths of all the adjacent edge nodes exceed the preset queue length threshold, determining the center node as the target node.
Optionally, the determining the target node from the adjacent edge nodes according to the image processing queue lengths corresponding to the adjacent edge nodes includes:
and determining the adjacent edge node with the shortest image processing queue length as a target node.
Optionally, the performing semantic detection on the video stream data to obtain a semantic detection result includes: and detecting the license plate region of the video stream data to obtain a license plate region detection result.
Optionally, the detecting the license plate region of the video stream data to obtain a license plate region detection result includes:
extracting features of each frame of picture corresponding to the video stream data by utilizing a pre-trained license plate detection model to obtain feature images with different scales; the license plate detection model is obtained by training based on an SSD target detection algorithm;
generating a plurality of candidate frames on the feature map using anchor frames of different aspect ratios and sizes;
classifying and regression predicting a plurality of candidate frames to obtain the class probability and the position information of the candidate frames;
and determining a final candidate frame from the plurality of candidate frames through a non-maximum suppression algorithm according to the class probability and the position information, and taking the class probability and the position information corresponding to the final candidate frame as the license plate region detection result.
Optionally, the extracting a target picture from the video stream data according to the semantic detection result includes:
extracting an initial target picture containing a license plate from the video stream data according to the category probability;
and according to the position information, intercepting a license plate region from the initial target picture to serve as a final target picture.
In a second aspect, the present application further provides a video semantic driving-based urban brain cloud edge collaborative computing method, applied to a system that comprises a current edge node and a target node, the method comprising the following steps:
the current edge node executes the video semantic driving-based city brain cloud edge collaborative computing method;
and the target node extracts semantic results from the target picture to obtain semantic results.
Optionally, the target node performs semantic result extraction on the target picture to obtain a semantic result, which includes:
Storing the target pictures obtained from the current edge node into an image processing queue of the target node, and sequentially extracting semantic results from the target pictures in the image processing queue to obtain semantic results.
Optionally, the target node performs semantic result extraction on the target picture to obtain a semantic result, which includes:
and the target node carries out license plate recognition on the target picture to obtain a license plate recognition result.
Optionally, the target node performs license plate recognition on the target picture to obtain a license plate recognition result, including:
preprocessing the target picture to obtain a preprocessed picture;
extracting the characteristics of the preprocessed picture;
identifying the extracted features by utilizing a plurality of decision trees in a pre-trained random forest model to obtain a plurality of initial identification results; the initial recognition result corresponds to the decision tree, and the random forest model is obtained based on training of a random forest algorithm;
voting or averaging the initial recognition results to obtain a final license plate recognition result.
Optionally, the video semantic driving-based city brain cloud edge collaborative computing system further comprises a central node;
After the target node extracts the semantic result of the target picture and obtains the semantic result, the method further comprises the following steps:
the target node sends the semantic result to a central node;
and the central node acquires the semantic result and performs target tracking or flow statistics processing based on the semantic result to acquire a video service processing result.
In a third aspect, the present application further provides a video semantic driving-based city brain cloud edge cooperative computing device, including:
the data acquisition module is used for acquiring video stream data;
the semantic detection module is used for carrying out semantic detection on the video stream data to obtain a semantic detection result;
the target picture acquisition module is used for extracting a target picture from the video stream data according to the semantic detection result;
a target node determining module, configured to determine a node that establishes communication with a current edge node and has processing resources, as a target node;
and the target picture sending module is used for sending the target picture to the target node so that the target node can extract the semantic result of the target picture and obtain the semantic result.
In a fourth aspect, the present application further provides a video semantic driving-based urban brain cloud edge cooperative computing device, applied to a system that includes a current edge node and a target node, the device comprising:
The current edge node processing module is used for executing the video semantic driving-based city brain cloud edge collaborative computing method according to any one of the above;
and the target node processing module is used for extracting the semantic result of the target picture by the target node to obtain the semantic result.
In a fifth aspect, the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements the above-mentioned urban brain cloud edge collaborative computing method based on video semantic driving.
In a sixth aspect, the application further provides an electronic device, where the device includes a memory and a processor, where the memory is configured to store computer instructions that can be executed on the processor, and the processor is configured to implement the above-mentioned urban brain cloud edge collaborative computing method based on video semantic driving when the computer instructions are executed.
According to the video semantic driving-based city brain cloud edge collaborative computing method and device, semantic detection is first performed on the received multi-channel video stream data at the current edge node, so that target pictures containing semantic information are identified within the streams. A target node is then selected from the adjacent nodes and the center node according to their available processing resources, and the node that still has processing capacity carries out the further semantic result identification. The current edge node therefore never needs to send complete video stream data to the target node for service processing. This resolves the limited bandwidth between edge nodes, the difficulty of transmitting video streams directly, and the limited load capacity of edge servers; under sudden traffic bursts, the method provides efficient cloud-edge collaborative processing and effectively ensures that services such as target tracking and traffic statistics are not interrupted.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a collaborative processing scenario in the related art;
FIG. 2 is one of the flow diagrams of the video semantic driving-based city brain cloud edge collaborative computing method shown in the present application;
FIG. 3 is a second flow chart of the video semantic driving-based city brain cloud edge collaborative computing method shown in the application;
fig. 4 is a schematic diagram of a message timing transmission flow shown in the present application;
FIG. 5 is a schematic diagram of a target node determination process shown in the present application;
FIG. 6 is a third flow chart of the collaborative computing method of the urban brain cloud edge based on video semantic driving shown in the application;
FIG. 7 is one of the block diagrams of a video semantic drive-based urban brain cloud computing device shown in the present application;
FIG. 8 is a second block diagram of a video semantic drive-based urban brain cloud computing device;
Fig. 9 is a block diagram of the electronic device shown in the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "first," "second," and the like in the description and in the claims, are not used for any order, quantity, or importance, but are used for distinguishing between different elements. Likewise, the terms "a" or "an" and the like do not denote a limitation of quantity, but rather denote the presence of at least one. "plurality" or "plurality" means two or more. The word "comprising" or "comprises", and the like, means that elements or items appearing before "comprising" or "comprising" are encompassed by the element or item recited after "comprising" or "comprising" and equivalents thereof, and that other elements or items are not excluded. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
The application provides a video semantic driving-based urban brain cloud edge collaborative computing method and device. The present application will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
Before describing the video semantic driving-based city brain cloud edge collaborative computing method and device provided by the application in detail, the terms mentioned in the application are explained:
video semantic driving: the video semantic driving is a novel video analysis technology, semantic information in a video is extracted by utilizing an artificial intelligence technology, and the volume of data processed later is greatly reduced, so that smaller flow transmission, higher-efficiency video analysis and faster video processing are realized. Unlike traditional video analysis techniques, video semantic driving not only analyzes the acquired complete image, but also extracts and analyzes semantic information in the image in combination with business characteristics, thereby realizing more efficient analysis and faster processing.
Urban brain: the urban brain is an artificial intelligent center constructed for urban traffic management, public safety, emergency management, grid prevention and control, medical and health, travel, environmental protection, urban refined management and the like by utilizing new generation information technologies such as artificial intelligence, big data, 5G, internet of things, digital twinning, VR, AR and the like, promotes construction and opens up various urban digital management platforms, real-time full-scale urban data is utilized, running short boards are corrected in real time, urban public resources are optimized, and high-quality breakthrough of urban management mode, service mode and digital industry development is realized.
Bian Yun collaborative calculation: bian Yun collaborative computing is that by the collaborative computing of the edge side and the cloud side, the edge side cleans, generalizes and infers a great deal of local data, and provides some processed information to the cloud side so as to support the business operation of the cloud side. The edge computing technology provides real-time edge intelligent services nearby by fusing network, computing, storage and application core capabilities at the network edge side close to the object or data source. The edge server receives the data at the end side to process locally, but when burst large traffic occurs, the edge server can offload tasks to the adjacent edge server, so that high-speed real-time processing is ensured.
Edge node, neighbor node and cooperative node: in the edge cloud cooperative computing process, an edge node refers to an edge server which directly receives video data streams and processes service processing. The edge nodes have cost sensitivity, and according to road condition and flow statistical information, one edge node normally receives 5-8 paths of high-definition video signals; the adjacent nodes are adjacent nodes formed by geographically adjacent edge nodes, and the adjacent nodes mutually send the length of a Hello message exchange task processing queue; the cooperative nodes refer to other nodes which need to perform cooperative processing under the conditions of overload of local nodes and the like, and the cooperative nodes are called cooperative nodes, and can be edge nodes or cloud side nodes.
Fig. 2 is the first flow chart of the video-semantic-driving-based urban brain cloud edge collaborative computing method shown in the application, and fig. 3 is the second. As shown in fig. 2 and fig. 3, the method runs on any edge node; taking one edge node as the current edge node, it includes:
Step S201, video stream data is acquired.
In this step, as the edge-cloud collaboration in fig. 1 shows, each edge node is connected to the video streams of multiple cameras. Taking one edge node as an example, the multiple channels of video stream data corresponding to the current edge node are acquired.
Step S202, carrying out semantic detection on the video stream data to obtain a semantic detection result.
In the step, semantic detection is performed on each frame of picture in the video stream data through a semantic detection algorithm preset on the current edge node, so that a semantic detection result is obtained.
For example, license plate detection, pedestrian detection, object detection, road detection, etc. may be performed on the video stream data by using the target detection algorithm, and a license plate detection result, a pedestrian detection result, an object detection result, a road detection result, etc. are correspondingly obtained.
The target detection algorithm may be R-CNN series, SPP-Net, or the like, or may be YOLO series, SSD, or the like, and the present application is not limited thereto.
And step S203, extracting a target picture from the video stream data according to the semantic detection result.
In this step, if the semantic detection result indicates that a frame in the video stream contains the specific semantic information (which is defined by the semantic detection target: for license plate detection, for example, the semantic information is that the picture contains a license plate), that frame is extracted as a target picture. If the semantic detection result indicates that a frame contains no semantic information, the frame is discarded.
Step S204, determining a node having processing resources and establishing communication with the current edge node as a target node.
In this step, the target node is determined from the neighboring edge nodes and the center node according to the processing resource conditions of the edge node and the center node neighboring the current edge node.
The adjacent edge nodes and the center node all maintain communication connections with the current edge node. During cloud-edge or edge-edge collaborative computing, timed messages are exchanged between the current edge node and the adjacent edge nodes and the center node to keep these links alive. As shown in fig. 4, in the timed message sending process, the current edge node first obtains its own ID information and current image processing queue length n, then obtains the address table (ID information and so on) of the adjacent edge nodes. When the timer fires, the current edge node sends a Hello message to each adjacent edge node according to the address table; the Hello message carries the current edge node's ID and its image processing queue length n, so that adjacent nodes track one another's queue lengths in real time.
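Purely as an illustration, the sketch below models the timed Hello exchange in Python; the JSON message fields, the UDP transport, and the one-second period are assumptions, since the application does not specify a wire format.

    import json
    import socket
    import time

    # Hypothetical sketch of the timed Hello-message exchange between adjacent
    # edge nodes; the JSON fields, UDP transport, and 1 s period are assumptions.
    class EdgeNodeBeacon:
        def __init__(self, node_id, neighbor_addrs, image_queue, period_s=1.0):
            self.node_id = node_id                # ID information of this edge node
            self.neighbor_addrs = neighbor_addrs  # address table: [(host, port), ...]
            self.image_queue = image_queue        # local image processing queue (list)
            self.period_s = period_s
            self.neighbor_queue_lengths = {}      # neighbor ID -> last reported length
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        def broadcast_hello(self):
            # The Hello message carries this node's ID and its queue length n
            msg = json.dumps({"id": self.node_id,
                              "queue_len": len(self.image_queue)}).encode()
            for addr in self.neighbor_addrs:
                self.sock.sendto(msg, addr)

        def on_hello(self, raw):
            # Record the queue length reported by an adjacent edge node
            hello = json.loads(raw)
            self.neighbor_queue_lengths[hello["id"]] = hello["queue_len"]

        def run(self):
            while True:                           # send on every timer tick
                self.broadcast_hello()
                time.sleep(self.period_s)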
Step S205, sending the target picture to the target node, so that the target node performs semantic result extraction on the target picture, and obtains a semantic result.
In this step, after the target node has been determined, a link is established between the current edge node and the target node via a connection request, and the current edge node then sends the target picture to the target node, realizing semantic transmission (taking license plate detection as an example, the current edge node sends a sequence of local license plate screenshots to the target node). The semantic result acquisition algorithm preset on the target node then extracts the semantic result from the target picture containing semantic information.
For example, a license plate recognition algorithm may be used to perform license plate recognition on a license plate in a target picture, so as to obtain a license plate recognition result, where the license plate recognition result refers to a license plate number.
The human face recognition algorithm can be used for carrying out human face recognition on pedestrians in the target picture so as to obtain a human face recognition result, namely pedestrian information.
Or the object recognition algorithm is utilized to perform object recognition on objects (including dynamic and static, wherein the dynamic includes vehicles and the like, and the static includes signboards and the like) in the target picture so as to obtain an object recognition result.
And the road category can be obtained by identifying the road in the target picture by using a road detection algorithm.
The license plate recognition algorithm can be based on machine learning, such as decision trees or support vector machines, or based on deep learning, such as the SSD, YOLO, or R-CNN families.
The face recognition algorithm may be a mature face recognition algorithm such as PNet, RNet, ONet, and is not limited thereto.
After the semantic result is obtained, it can be used for subsequent processing such as target tracking and traffic statistics. The central node aggregates the business data, such as target tracking and traffic statistics, that the target node or the current edge node produces from the semantic results.
It should be noted that if the current edge node itself has processing resources, it extracts the semantic result from the target picture locally, using its own preset semantic result acquisition algorithm. In that case the current edge node processes the collected multi-channel video stream data itself, and no data needs to migrate to an adjacent edge node or the central node, avoiding the service interruption that extreme conditions such as traffic bursts would otherwise cause during cloud-edge and edge-edge collaboration.
With the video semantic drive-based city brain cloud edge collaborative computing method provided by the application, semantic detection is first performed on the received multi-channel video stream data at the current edge node, so that target pictures containing semantic information are identified within the streams. A target node is then selected from the adjacent nodes and the center node according to their processing resources, and the node that still has processing capacity performs the further semantic result extraction. The current edge node therefore does not need to send complete video stream data to the target node for service processing, which resolves the limited bandwidth between edge nodes, the difficulty of transmitting video streams directly, and the limited load capacity of edge servers. Under sudden traffic bursts, the method provides efficient cloud-edge collaborative processing and effectively ensures that services such as target tracking and traffic statistics are not interrupted.
In another embodiment of the step S202, taking license plate detection as an example, the detecting the license plate region of the video stream data to obtain a license plate region detection result includes:
extracting the features of each frame of picture corresponding to the video stream data by using a pre-trained license plate detection model to obtain feature maps of different scales;
the license plate detection model is obtained by training based on the SSD target detection algorithm;
generating a plurality of candidate frames on the feature maps using anchor frames of different aspect ratios and sizes;
classifying and performing regression prediction on the plurality of candidate frames to obtain the class probability and position information of the candidate frames;
and determining a final candidate frame from the plurality of candidate frames through a non-maximum suppression algorithm according to the class probability and the position information, and taking the class probability and the position information corresponding to the final candidate frame as the license plate region detection result.
License plate detection in the related art builds on fast, general-purpose target detection over video streams, improving efficiency while maintaining accuracy. The main approaches are video target detection combined with tracking, and video-stream target detection that migrates or fuses features based on motion information.
Video target detection combined with tracking is a common approach. Its basic idea is to run still-image target detection on every frame of the video, track the target boxes with a multi-target tracking algorithm, and use the tracking results to correct earlier detections, improving stability and continuity. Its advantage is that existing single-frame detectors and multi-target trackers can be reused without designing extra network structures or training procedures. Its disadvantage is that it depends on the performance of both the single-frame detector and the multi-target tracker: an error in either degrades the overall video target detection. A representative work is T-CNN, which proposes a tracking-and-regression-based video object detection framework: Fast R-CNN first detects objects in each frame, MDNet then performs multi-object tracking over the detections, and a regression network finally optimizes and re-ranks the tracking results.
Video-stream target detection based on feature migration or fusion with motion information uses motion cues such as optical flow to estimate feature changes between adjacent frames, then propagates or fuses features from key frames to other frames, reducing repeated computation and improving consistency. Its advantage is that motion information strengthens the spatio-temporal information in the video, improving detection accuracy and robustness. Its disadvantage is the extra computation of motion information such as optical flow, which increases computation and time overhead. A representative work is Deep Feature Flow, which proposes optical-flow-based feature propagation: features of key frames are mapped to other frames through optical flow, a fusion module merges the propagated features with the current frame's features, and a detection module then performs target detection.
In the present method, the SSD algorithm first detects license plate regions; license plate recognition is then performed on the target pictures that contain plates; and the central node performs target tracking or traffic statistics after collecting the license plate recognition results. No step depends on the joint performance of a single-frame detector and a multi-target tracker, and no motion information such as optical flow needs to be computed, so detection accuracy is maintained while efficiency improves.
Specifically, the SSD (Single Shot MultiBox Detector) algorithm used in the method is a single-stage target detection method. It combines a deep convolutional neural network with multi-scale feature maps, so target localization and class prediction are performed simultaneously. The SSD target detection pipeline for license plate detection consists of the following stages:
Multi-scale feature map generation: SSD uses a base convolutional neural network (typically VGG, ResNet, or similar) as the feature extractor, and obtains feature maps of different scales by adding extra convolution layers after different network layers. These feature maps have different receptive fields, enabling detection of targets of different sizes.
Anchor box generation: on the feature maps of different scales, the SSD algorithm generates candidate boxes by centring anchor boxes of different aspect ratios and sizes at each location. The anchor boxes are a set of predefined rectangles covering objects of different shapes and sizes.
Feature map classification and regression: for the generated candidate boxes, SSD performs classification and regression prediction through convolution and fully connected layers. Classification prediction uses a softmax function to output the probability that each candidate box belongs to each class; regression prediction adjusts the position and size of the anchor boxes to better match the license plate.
Non-Maximum Suppression (NMS): to remove redundant candidate boxes, SSD uses the NMS algorithm to select the final detections. NMS sorts and filters the candidate boxes by confidence and overlap, keeping detection boxes that are high-confidence and non-overlapping.
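As an illustration of this stage, the following is a minimal NumPy sketch of greedy non-maximum suppression; the box format and the IoU threshold of 0.5 are assumptions for illustration, not values taken from the application.

    import numpy as np

    # Greedy NMS: keep the highest-confidence box, drop boxes that overlap it
    # beyond the IoU threshold, and repeat on the remainder.
    def nms(boxes, scores, iou_threshold=0.5):
        """boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
        order = np.argsort(scores)[::-1]        # candidates by descending confidence
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            rest = order[1:]
            # Intersection of the kept box with all remaining boxes
            x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + areas - inter)
            order = rest[iou <= iou_threshold]  # discard heavily overlapping boxes
        return keep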
In addition, the training process also comprises the following steps:
Loss function: SSD trains the model with a multi-task loss function comprising a classification loss and a position regression loss. The classification loss uses a cross-entropy loss function to measure the difference between the class predictions and the ground-truth labels. The position regression loss uses a Smooth L1 loss function to measure the difference between the predicted bounding box and the ground-truth bounding box. The license plate detection model is obtained by training with this multi-task loss on the training images and their label information.
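As a point of reference, a minimal PyTorch-style sketch of such a multi-task loss is given below; the weighting factor alpha, the restriction of the regression loss to matched (positive) anchors, and the normalization by their count follow the common SSD formulation and are assumptions here rather than details taken from the application.

    import torch
    import torch.nn.functional as F

    # Sketch of the SSD multi-task loss: cross-entropy classification loss plus
    # Smooth L1 localization loss on anchors matched to a license plate.
    def ssd_multibox_loss(cls_logits, cls_targets, loc_preds, loc_targets, alpha=1.0):
        # cls_logits: (N, A, C); cls_targets: (N, A) with 0 = background
        # loc_preds, loc_targets: (N, A, 4) box offsets
        cls_loss = F.cross_entropy(cls_logits.reshape(-1, cls_logits.size(-1)),
                                   cls_targets.reshape(-1), reduction="sum")
        pos = cls_targets > 0                      # anchors matched to a plate
        loc_loss = F.smooth_l1_loss(loc_preds[pos], loc_targets[pos], reduction="sum")
        num_pos = pos.sum().clamp(min=1).float()   # normalize by matched anchors
        return (cls_loss + alpha * loc_loss) / num_pos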
In another embodiment of the step S203, taking license plate detection as an example, the extracting the target picture from the video stream data according to the semantic detection result includes:
extracting an initial target picture containing a license plate from the video stream data according to the category probability;
and according to the position information, intercepting a license plate region from the initial target picture to serve as a final target picture.
In this embodiment, to further reduce the computation in cloud-edge or edge-edge collaboration, the picture containing the license plate is cropped. Specifically, an initial target picture is selected according to the class probability in the license plate region detection result: if the probability of containing a license plate region exceeds the probability threshold, the frame becomes an initial target picture; if it is below the threshold, the frame contains no license plate region and is not processed. The license plate region is then cropped from the initial target picture according to the position information in the detection result, and the cropped picture serves as the final target picture.
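The extraction step can be pictured with the short sketch below; the probability threshold of 0.5 and the function name are illustrative assumptions, not values from the application.

    # Keep a frame only if the plate class probability clears the threshold,
    # then crop the plate region predicted by the detector.
    def extract_target_picture(frame, class_prob, box, prob_threshold=0.5):
        """frame: HxWxC image array; box: (x1, y1, x2, y2) from the detector."""
        if class_prob < prob_threshold:
            return None                      # no license plate: discard the frame
        x1, y1, x2, y2 = (int(v) for v in box)
        return frame[y1:y2, x1:x2]           # cropped region = final target picture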
In another embodiment of step S204, the node establishing communication with the current edge node and having processing resources includes: an edge node adjacent to the current edge node and a center node.
The determining a node that establishes communication with the current edge node and has processing resources as a target node includes:
and under the condition that the image processing queue length of the current edge node exceeds a preset queue length threshold, determining a target node from the adjacent edge nodes according to the image processing queue length corresponding to the adjacent edge nodes.
And under the condition that the lengths of the image processing queues of the adjacent edge nodes exceed a preset queue length threshold value, determining the center node as a target node.
In this embodiment, it is first determined whether the current edge node has the capacity to process the target picture, i.e. whether the length n of the image processing queue Q on the current edge node is less than or equal to the preset queue length threshold n_max. If n ≤ n_max, the target picture is encapsulated as an object and appended to the image processing queue Q of the current edge node, forming a new Q. The target pictures in the new queue Q are then processed in sequence by the semantic result acquisition algorithm preset on the current edge node to obtain the semantic results.
If the length n of the image processing queue on the current edge node is greater than n_max, it is determined that the current edge node lacks the capacity to acquire semantic results, and semantic result acquisition is completed through edge-cloud or edge-edge collaboration.
As shown in fig. 5, through the timed messages exchanged between adjacent edge nodes (each message carries the adjacent node's ID information and its current image processing queue length, e.g. a Hello message), it is determined whether the length n' of the image processing queue Q' on an adjacent edge node is less than or equal to that node's preset queue length threshold n'_max. If n' ≤ n'_max, the adjacent edge node is determined as the target node: the target picture is encapsulated as an object, sent to the target node, and appended to its image processing queue Q' to form a new Q', whose target pictures are processed by the semantic result acquisition algorithm preset on the target node to obtain the semantic result.
Further, by comparing the lengths n' of the image processing queues Q' of the several adjacent edge nodes in communication with the current edge node, the node with the minimum queue length n_min is determined as the target node.
If, after all adjacent edge nodes have been compared, the length n' of every adjacent node's image processing queue Q' exceeds its preset queue length threshold, it is determined that no adjacent edge node has processing capacity, and semantic result acquisition is completed in cloud-edge collaboration mode; that is, the central node of the cloud is determined as the target node. The current edge node encapsulates the target picture as an object and sends it to the central node, where the preset semantic result acquisition algorithm acquires the semantic result from the target picture.
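The decision cascade just described can be summarized in a short sketch; the function and variable names are illustrative assumptions, and the application itself does not prescribe an implementation.

    CENTER_NODE = "center"

    # Offloading cascade: local queue first, then the least-loaded adjacent
    # node whose queue is within its threshold, then the cloud center node.
    def choose_target_node(local_len, local_max, neighbor_lens, neighbor_maxes):
        """neighbor_lens / neighbor_maxes: dicts keyed by adjacent-node ID."""
        if local_len <= local_max:
            return "local"                               # n <= n_max: process here
        eligible = {nid: ln for nid, ln in neighbor_lens.items()
                    if ln <= neighbor_maxes[nid]}        # n' <= n'_max
        if eligible:
            return min(eligible, key=eligible.get)       # shortest queue wins
        return CENTER_NODE                               # edge-cloud collaboration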
It should be noted that, the lengths of the image processing queues on different edge nodes are different, and the current edge node can obtain the lengths of the image processing queues on the adjacent edge nodes through real-time communication with the adjacent edge nodes. The preset queue length thresholds on different edge nodes may be the same or different, which is not limited.
Each edge node and each center node are preset with a semantic detection algorithm and a semantic result acquisition algorithm.
In addition, each edge node is initialized at the beginning, and after initialization, the edge nodes acquire respective preset queue length thresholds and ID information of each edge node, wherein the ID information is used as a basis for communication between the edge nodes.
Fig. 6 is a third flow chart of a video-semantic-driving-based urban brain cloud edge cooperative computing method shown in the present application, as shown in fig. 6, and the video-semantic-driving-based urban brain cloud edge cooperative computing method is applied to a target node and a current edge node, where the target node is a node determined by the video-semantic-driving-based urban brain cloud edge cooperative computing method in any of the embodiments, and the method includes:
step S601, the current edge node executes the above-mentioned city brain cloud edge collaborative computing method based on video semantic driving.
In this step, the current edge node determines the target node through the above-mentioned video semantic driving-based city brain cloud edge cooperative computing method, and sends the target picture to the target node, specifically see the above description.
Step S602, the target node performs semantic result extraction on the target picture to obtain a semantic result.
In the step, after the target node acquires the target picture, the target picture is identified by utilizing a semantic result acquisition algorithm preset on the target node, so that a semantic result is obtained.
For example, a license plate recognition algorithm may be used to perform license plate recognition on a target picture including a license plate region, so as to obtain a license plate recognition result.
And the face recognition algorithm can be used for carrying out face recognition on the target picture containing the pedestrian to obtain a face recognition result.
With the urban brain cloud edge collaborative computing method based on video semantic driving, the target node acquires semantic results only from the target pictures that contain semantic information in the video stream data; it does not need to obtain the complete video stream data from the current edge node. This relieves the transmission pressure between the current edge node and the adjacent edge node or central node, and reduces the computing pressure on the target node, so the cloud-edge collaborative processing capability improves, services such as target tracking and traffic statistics are effectively kept uninterrupted, and the system can cope with sudden high-traffic scenarios.
In another embodiment of the step S602, the extracting, by the target node, the semantic result of the target picture to obtain the semantic result includes:
storing the target pictures obtained from the current edge node into an image processing queue of the target node, and sequentially extracting semantic results from the target pictures in the image processing queue to obtain semantic results. That is, the target node stores the obtained target picture into the existing image processing queue, and then sequentially extracts the semantic result of the target picture in the image processing queue by using a preset semantic result acquisition algorithm, thereby obtaining the semantic result.
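A minimal sketch of this queue-and-process behaviour, assuming a FIFO queue and a placeholder extraction routine, might look as follows.

    import queue
    import threading

    image_queue = queue.Queue()                  # the target node's image queue

    def extract_semantic_result(picture):
        # Placeholder for the preset semantic result acquisition algorithm,
        # e.g. license plate recognition on the received plate picture.
        return picture

    def worker():
        while True:
            picture = image_queue.get()          # pictures handled in arrival order
            result = extract_semantic_result(picture)
            # the result would then be forwarded to the central node
            image_queue.task_done()

    threading.Thread(target=worker, daemon=True).start()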
In another embodiment of the step S602, taking license plate recognition as an example, the target node performs license plate recognition on the target picture to obtain a license plate recognition result, including:
and preprocessing the target picture to obtain a preprocessed picture.
And extracting the characteristics of the preprocessed picture.
And identifying the extracted features by utilizing a plurality of decision trees in the pre-trained random forest model to obtain a plurality of initial identification results.
The initial recognition result corresponds to the decision tree, and the random forest model is obtained based on training of a random forest algorithm.
Voting or averaging the initial recognition results to obtain a final license plate recognition result.
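For illustration, the following sketch shows per-tree prediction followed by majority voting, using scikit-learn; reading predictions from model.estimators_ and decoding them through model.classes_ relies on scikit-learn internals and is an assumption, as is treating each segmented plate character as one sample.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def recognize_characters(features, model: RandomForestClassifier):
        """features: (num_chars, num_features) array, one row per character."""
        # Initial recognition results: one prediction per decision tree; the
        # forest's internal trees emit encoded class indices.
        per_tree = np.stack([t.predict(features) for t in model.estimators_])
        final = []
        for votes in per_tree.T:                 # vote across trees per character
            idx, counts = np.unique(votes.astype(int), return_counts=True)
            final.append(model.classes_[idx[np.argmax(counts)]])
        return final                             # final license plate characters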
In this embodiment, the license plate recognition algorithm is a random forest, an ensemble of decision trees trained on the collected data. The training process is as follows:
First, several training sets are constructed by randomly drawing samples from the collected data with replacement (bootstrap sampling). Each training set is usually the same size as the original data set, but some samples may appear repeatedly while others are never selected.
Then, decision trees are built with random feature selection: at each node of each decision tree, only a randomly chosen subset of all features is considered, and the best feature within that subset is selected for the split. For each training set, a decision tree is constructed using the randomly selected feature subsets. The trees can be built with a common decision tree algorithm, such as CART (Classification and Regression Trees) or others, without limitation.
Next is the integration of random forests, which combines multiple decision trees into a random forest model. The prediction results of the decision tree may be integrated in particular by voting (classification problem) or averaging (regression problem).
Repeating the steps to construct a plurality of decision trees and form a random forest model. The size of the random forest may be controlled by setting the number of decision trees or other termination conditions.
Finally, the evaluation dataset is used to evaluate the performance and generalization ability of the random forest model. Common evaluation indexes include accuracy, precision, recall, F1 value, and the like. And after the performance and generalization capability of the random forest model meet certain requirements, license plate recognition is performed by using the random forest model.
After the trained random forest model is obtained, the target picture is first preprocessed (image denoising, grayscale conversion, binarization, and so on), features are extracted from the preprocessed picture, and the extracted features are fed into the trained random forest model. Each decision tree in the model predicts an initial recognition result, so the number of initial results equals the number of trees. Finally, voting or averaging over the initial results yields the final license plate recognition result.
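A minimal training sketch covering these steps is shown below; scikit-learn's RandomForestClassifier performs the bootstrap sampling and per-split random feature selection internally, and the synthetic data, tree count, and 34-class label space stand in for real character features as assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64))              # placeholder character features
    y = rng.integers(0, 34, size=1000)           # placeholder character labels
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(
        n_estimators=100,      # number of decision trees in the forest
        max_features="sqrt",   # random feature subset considered at each split
        bootstrap=True,        # training sets drawn with replacement
    ).fit(X_tr, y_tr)

    print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))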
Compared with the traditional license plate recognition algorithm, the license plate recognition algorithm based on the random forest has good robustness and accuracy, can effectively process noise and complex background in license plate images, and is suitable for various illumination conditions and license plate types.
Further, the video semantic driving-based city brain cloud edge collaborative computing system further comprises a central node;
after the target node extracts the semantic result of the target picture and obtains the semantic result, the method further comprises the following steps:
and the target node sends the semantic result to a central node.
And the central node acquires the semantic result and performs target tracking or flow statistics processing based on the semantic result to acquire a video service processing result.
It should be noted that, when the target node is an adjacent edge node, the semantic result needs to be sent to the central node, and if the target node is the central node, the semantic result does not need to be sent any more. In addition, semantic results in the current edge node also need to be sent to the central node.
After the central node obtains the semantic result, the central node performs subsequent business processing such as target tracking, flow statistics and the like.
In addition, the edge nodes in the application use the MLU220 intelligent module as the SoC edge acceleration chip. The module is built on Cambricon's domestically developed MLUv02 architecture and, in a credit-card-sized form factor, delivers a single-system solution of 16 TOPS of AI compute at only 15 W. The MLU220 module is widely applicable to edge computing scenarios such as smart power, intelligent manufacturing, intelligent rail transit, and smart energy, and supports highly diverse AI workloads including vision, speech, natural language processing, and traditional machine learning, enabling edge intelligence solutions for many businesses. Manufactured in TSMC's 16 nm process, it offers high compute, low power consumption, and rich I/O interfaces. The MLU220 chip adopts Cambricon's innovations in processor architecture; the architecture is Cambricon's latest-generation intelligent processor MLUv02, achieving up to 32 TOPS (INT4) of compute with power consumption as low as 10 W. This hardware helps ensure that the video-semantic-driven urban brain cloud-edge collaborative computation is more efficient.
The application also provides a video-semantic-drive-based city brain cloud edge cooperative computing device, which corresponds to a video-semantic-drive-based city brain cloud edge cooperative computing method, as shown in fig. 7, fig. 7 is one of block diagrams of the video-semantic-drive-based city brain cloud edge cooperative computing device, and the device comprises:
A data acquisition module 701, configured to acquire video stream data;
the semantic detection module 702 is configured to perform semantic detection on the video stream data to obtain a semantic detection result;
a target picture obtaining module 703, configured to extract a target picture from the video stream data according to the semantic detection result;
a target node determining module 704, configured to determine, as a target node, a node that establishes communication with a current edge node and has processing resources;
and the target picture sending module 705 is configured to send the target picture to the target node, so that the target node performs semantic result extraction on the target picture, and obtains a semantic result.
The application also provides a video-semantic-drive-based city brain cloud edge cooperative computing device, corresponding to the video-semantic-drive-based city brain cloud edge cooperative computing method above. As shown in fig. 8, which is the second block diagram of the device, the device is applied to a system that includes a current edge node and a target node, and includes:
a current edge node processing module 801, configured to execute the video semantic driving-based city brain cloud edge collaborative computing method according to any one of the above description by using the current edge node;
And the target node processing module 802 is configured to extract a semantic result from the target image by using the target node, and obtain a semantic result.
The implementation process of the functions and actions of each module in the urban brain cloud edge cooperative computing device based on video semantic driving is specifically described in the implementation process of corresponding steps in the urban brain cloud edge cooperative computing method based on video semantic driving, and is not described herein again.
Embodiments of the present application also provide an electronic device, as shown in fig. 9, where the electronic device 900 may include a computer-readable storage medium 990, where the computer-readable storage medium 990 may store a program that may be called by the processor 910, and may include a non-volatile storage medium. In some embodiments, electronic device 900 may include memory 980 and interface 970. In some embodiments, electronic device 900 may also include other hardware depending on the application.
The computer readable storage medium 990 of the embodiment of the present application has a program stored thereon, which when executed by the processor 910, is used to implement the above-described urban brain cloud edge collaborative computing method based on video semantic driving.
The present application may take the form of a computer program product embodied on one or more computer-readable storage media 990 (including, but not limited to, magnetic disk storage, CD-ROM, and optical storage) having program code embodied therein. The computer-readable storage medium 990, whether permanent or non-permanent, removable or non-removable, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of the computer-readable storage medium 990 include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video semantic driving-based urban brain cloud edge collaborative computing method according to any embodiment of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (12)

1. A video semantic driving-based urban brain cloud edge collaborative computing method, characterized by comprising the following steps:
Acquiring video stream data;
carrying out semantic detection on the video stream data to obtain a semantic detection result;
extracting a target picture from the video stream data according to the semantic detection result;
determining a node which establishes communication with the current edge node and has processing resources as a target node;
sending the target picture to the target node, so that the target node extracts a semantic result from the target picture to obtain the semantic result;
wherein the performing semantic detection on the video stream data to obtain a semantic detection result comprises:
performing license plate region detection on the video stream data to obtain a license plate region detection result;
wherein the performing license plate region detection on the video stream data to obtain a license plate region detection result comprises the following steps:
extracting features from each frame picture of the video stream data by using a pre-trained license plate detection model to obtain feature maps of different scales, wherein the license plate detection model is trained based on an SSD target detection algorithm;
generating a plurality of candidate boxes on the feature maps using anchor boxes of different aspect ratios and sizes;
performing classification and regression prediction on the plurality of candidate boxes to obtain class probabilities and position information of the candidate boxes;
and determining a final candidate box from the plurality of candidate boxes through a non-maximum suppression algorithm according to the class probabilities and the position information of the candidate boxes, and taking the class probability and the position information corresponding to the final candidate box as the license plate region detection result.
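For illustration only (not part of the claimed subject matter), the candidate-box filtering and non-maximum suppression steps of claim 1 can be sketched in Python. The sketch assumes the SSD outputs are already available as arrays of corner-format boxes and class probabilities; the function names and thresholds are assumptions:

```python
# Minimal sketch of SSD post-processing: confidence filtering followed by
# greedy non-maximum suppression over the candidate boxes.
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.45):
    """Greedy NMS; boxes are (N, 4) in (x1, y1, x2, y2) pixel coordinates."""
    order = scores.argsort()[::-1]          # candidate indices, best first
    keep = []
    while order.size > 0:
        i = int(order[0])
        keep.append(i)
        rest = order[1:]
        if rest.size == 0:
            break
        # IoU of the current best box with all remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # suppress heavily overlapping boxes
    return keep

def plate_detection_result(class_probs: np.ndarray, boxes: np.ndarray,
                           score_thresh: float = 0.5):
    """Return (class probability, position) pairs for the surviving boxes."""
    mask = class_probs >= score_thresh
    probs, cand = class_probs[mask], boxes[mask]
    return [(float(probs[k]), cand[k]) for k in nms(cand, probs)]
```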
2. The method of claim 1, wherein the node that establishes communication with the current edge node and has processing resources comprises: edge nodes adjacent to the current edge node and a central node;
wherein the determining a node that establishes communication with the current edge node and has processing resources as a target node comprises:
determining the target node from the adjacent edge nodes according to the image processing queue lengths corresponding to the adjacent edge nodes under the condition that the image processing queue length of the current edge node exceeds a preset queue length threshold;
and under the condition that the image processing queue lengths of all the adjacent edge nodes exceed the preset queue length threshold, determining the central node as the target node.
3. The method of claim 2, wherein the determining the target node from the adjacent edge nodes according to the image processing queue lengths corresponding to the adjacent edge nodes comprises:
determining the adjacent edge node with the shortest image processing queue length as the target node.
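The selection logic of claims 2 and 3 can be sketched as follows; the threshold value and the return convention are illustrative assumptions only:

```python
# Hypothetical sketch of target-node selection in claims 2-3.
from typing import List, Tuple, Union

def pick_target_node(current_len: int, neighbor_lens: List[int],
                     threshold: int = 8) -> Union[str, Tuple[str, int]]:
    """Return 'local', ('neighbor', index), or 'central'.

    Offloading happens only when the current edge node's image processing
    queue exceeds the preset threshold; the adjacent node with the shortest
    queue is preferred, and the central node is the fallback when every
    adjacent node's queue also exceeds the threshold.
    """
    if current_len <= threshold:
        return "local"                          # keep processing on this node
    eligible = [(qlen, i) for i, qlen in enumerate(neighbor_lens)
                if qlen <= threshold]
    if eligible:
        return ("neighbor", min(eligible)[1])   # shortest image processing queue
    return "central"
```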
4. The method of claim 1, wherein extracting the target picture from the video stream data according to the semantic detection result comprises:
extracting an initial target picture containing a license plate from the video stream data according to the class probability;
and according to the position information, cropping the license plate region from the initial target picture as the final target picture.
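A minimal sketch of claim 4, assuming the position information is a corner-format pixel box and using an assumed probability threshold:

```python
# Hypothetical sketch of claim 4: select frames by class probability, then
# crop the license plate region indicated by the position information.
import numpy as np

def extract_target_picture(frame: np.ndarray, class_prob: float,
                           box, prob_thresh: float = 0.5):
    """box is (x1, y1, x2, y2) in pixels from the license plate detection result."""
    if class_prob < prob_thresh:
        return None                        # no confident license plate in frame
    x1, y1, x2, y2 = (int(v) for v in box)
    return frame[y1:y2, x1:x2].copy()      # cropped region as final target picture
```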
5. A video semantic driving-based urban brain cloud edge collaborative computing method, characterized in that a video semantic driving-based urban brain cloud edge collaborative computing system comprises a current edge node and a target node, and the method comprises the following steps:
the current edge node executes the video semantic driving-based urban brain cloud edge collaborative computing method according to any one of claims 1 to 4;
and the target node extracts a semantic result from the target picture to obtain the semantic result.
6. The method of claim 5, wherein the target node performs semantic result extraction on the target picture to obtain a semantic result, comprising:
Storing the target pictures obtained from the current edge node into an image processing queue of the target node, and sequentially extracting semantic results from the target pictures in the image processing queue to obtain semantic results.
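The queueing behavior of claim 6 might look like the following sketch; the queue type and the callback interface are hypothetical:

```python
# Hypothetical sketch of claim 6: the target node stores received target
# pictures in a queue and extracts semantic results in arrival order.
import queue

def target_node_worker(q: "queue.Queue", extract, on_result) -> None:
    while True:
        picture = q.get()            # target picture from the current edge node
        on_result(extract(picture))  # sequential semantic result extraction
        q.task_done()
```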
7. The method of claim 6, wherein the target node performs semantic result extraction on the target picture to obtain a semantic result, comprising:
and the target node carries out license plate recognition on the target picture to obtain a license plate recognition result.
8. The method of claim 7, wherein the target node performs license plate recognition on the target picture to obtain a license plate recognition result, comprising:
preprocessing the target picture to obtain a preprocessed picture;
extracting the characteristics of the preprocessed picture;
identifying the extracted features by using a plurality of decision trees in a pre-trained random forest model to obtain a plurality of initial recognition results, wherein the initial recognition results correspond one-to-one to the decision trees, and the random forest model is trained based on a random forest algorithm;
voting or averaging the initial recognition results to obtain a final license plate recognition result.
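An illustrative sketch of the ensemble step in claims 7 and 8 using scikit-learn; the function name and the treatment of each segmented plate character as one classification are assumptions, and the model is taken to be already trained:

```python
# Hypothetical sketch of claims 7-8: each decision tree in a pre-trained
# random forest gives an initial recognition result, and majority voting
# produces the final result for one license plate character.
from collections import Counter
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def recognize_character(model: RandomForestClassifier, features: np.ndarray):
    """features is a (1, n_features) vector for one segmented character."""
    # The sub-trees predict encoded class indices; map back via classes_.
    votes = [model.classes_[int(tree.predict(features)[0])]
             for tree in model.estimators_]
    return Counter(votes).most_common(1)[0][0]  # majority vote
```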
9. The method of any one of claims 5-8, wherein the video semantic driving-based urban brain cloud edge collaborative computing system further comprises a central node;
after the target node extracts the semantic result from the target picture to obtain the semantic result, the method further comprises:
the target node sends the semantic result to the central node;
and the central node receives the semantic result, and performs target tracking or traffic flow statistics processing based on the semantic result to obtain a video service processing result.
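The central node's role in claim 9 might be sketched as follows; the message format and the statistics kept are illustrative assumptions:

```python
# Hypothetical sketch of claim 9: the central node collects semantic results
# and derives simple video service outputs from them.
from collections import Counter

class CentralNode:
    def __init__(self):
        self.flow = Counter()  # per-camera traffic flow statistics
        self.track = {}        # plate -> last camera seen, coarse target tracking

    def on_semantic_result(self, camera_id: str, plate: str) -> None:
        self.flow[camera_id] += 1
        self.track[plate] = camera_id
```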
10. A video semantic driving-based urban brain cloud edge collaborative computing device, characterized by comprising:
the data acquisition module is used for acquiring video stream data;
the semantic detection module is used for carrying out semantic detection on the video stream data to obtain a semantic detection result;
the semantic detection of the video stream data is performed, and obtaining a semantic detection result comprises:
detecting license plate areas of the video stream data to obtain license plate area detection results;
the detecting the license plate region of the video stream data to obtain a license plate region detection result comprises the following steps:
extracting features of each frame of picture corresponding to the video stream data by utilizing a pre-trained license plate detection model to obtain feature images with different scales; the license plate detection model is obtained by training based on an SSD target detection algorithm;
Generating a plurality of candidate frames on the feature map using anchor frames of different aspect ratios and sizes;
classifying and regression predicting a plurality of candidate frames to obtain the class probability and the position information of the candidate frames;
determining a final candidate frame from the plurality of candidate frames through a non-maximum suppression algorithm according to the class probability and the position information of the candidate frames, and taking the class probability and the position information corresponding to the final candidate frame as a license plate region detection result;
the target picture acquisition module is used for extracting a target picture from the video stream data according to the semantic detection result;
a target node determining module, configured to determine a node that establishes communication with a current edge node and has processing resources, as a target node;
and the target picture sending module, used for sending the target picture to the target node, so that the target node extracts a semantic result from the target picture to obtain the semantic result.
11. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-9.
12. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1-9 when executing the program.
CN202311298523.4A 2023-10-09 2023-10-09 Urban brain cloud edge cooperative computing method and device based on video semantic driving Active CN117037158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311298523.4A CN117037158B (en) 2023-10-09 2023-10-09 Urban brain cloud edge cooperative computing method and device based on video semantic driving

Publications (2)

Publication Number Publication Date
CN117037158A (en) 2023-11-10
CN117037158B (en) 2024-01-09

Family

ID=88639404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311298523.4A Active CN117037158B (en) 2023-10-09 2023-10-09 Urban brain cloud edge cooperative computing method and device based on video semantic driving

Country Status (1)

Country Link
CN (1) CN117037158B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113535B2 (en) * 2019-11-08 2021-09-07 Second Spectrum, Inc. Determining tactical relevance and similarity of video sequences
CN112581590B (en) * 2020-12-28 2021-06-08 广东工业大学 Unmanned aerial vehicle cloud edge terminal cooperative control method for 5G security rescue networking

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021857A (en) * 2006-10-20 2007-08-22 鲍东山 Video searching system based on content analysis
CA2868784A1 (en) * 2012-03-27 2013-10-03 Charles P. Pace Video compression repository and model reuse
CN109886286A (en) * 2019-01-03 2019-06-14 武汉精测电子集团股份有限公司 Object detection method, target detection model and system based on cascade detectors
KR102051829B1 (en) * 2019-05-31 2020-01-08 임철규 Bigdata Previewport Multi-resolution video transmission CCTV using License plate detection and Bigdata Previewport Multi-resolution video transmission traffic control method using License plate detection
CN110378332A (en) * 2019-06-14 2019-10-25 上海咪啰信息科技有限公司 A kind of container terminal case number (CN) and Train number recognition method and system
DE102021128292A1 (en) * 2020-11-09 2022-05-12 Nvidia Corporation SCALABLE SEMANTIC IMAGE SEARCH WITH DEEP TEMPLATE MATCHING
CN112817755A (en) * 2021-01-22 2021-05-18 西安交通大学 Edge cloud cooperative deep learning target detection method based on target tracking acceleration
CN114241002A (en) * 2021-12-14 2022-03-25 中国电信股份有限公司 Target tracking method, system, device and medium based on cloud edge cooperation
CN114971574A (en) * 2022-06-14 2022-08-30 北京航天长峰科技工业集团有限公司 Multi-mode information composite sensing and fusion framework and method based on cloud edge cooperation
CN115223007A (en) * 2022-06-30 2022-10-21 蔚来汽车科技(安徽)有限公司 Method and system for inclined target training and detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Vehicle detection algorithm based on the DeR-FCN model; Wang Ling; Li Houbo; Wang Peng; Sun Shuangzi; Computer Engineering and Design (10); full text *
Knowledge-system-based video service quality assurance model and simulation; Li Yongqiang; Zhu Jiang; Journal of Image and Graphics (04); full text *
A survey of object detection algorithms; Fang Luping; He Hangjiang; Zhou Guomin; Computer Engineering and Applications (13); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant