CN114299370A - Internet of things scene perception method and device based on cloud edge cooperation - Google Patents

Internet of things scene perception method and device based on cloud edge cooperation

Info

Publication number
CN114299370A
CN114299370A
Authority
CN
China
Prior art keywords
scene
instance
perception
dynamic
examples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111478787.9A
Other languages
Chinese (zh)
Inventor
邵苏杰
郭少勇
徐思雅
李鸣
张栋
于泉杰
邵聪章
李易
邱雪松
亓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
China Electronics Standardization Institute
Original Assignee
Beijing University of Posts and Telecommunications
China Electronics Standardization Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, China Electronics Standardization Institute filed Critical Beijing University of Posts and Telecommunications
Priority to CN202111478787.9A priority Critical patent/CN114299370A/en
Publication of CN114299370A publication Critical patent/CN114299370A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an Internet of things scene perception method and device based on cloud-edge cooperation. The method comprises the following steps: acquiring initial scene data to be perceived, the initial scene data comprising image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining the static instances, dynamic instances and abnormal instances corresponding to the image data; and inputting the static instances, the dynamic instances, the abnormal instances and the perception results of the sensing data into a local multi-instance scene fusion model for processing, to obtain the local scene output by the local multi-instance scene fusion model. The dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server. By virtue of the dynamic instance perception model, the method provided by the invention can effectively reduce the perception processing delay of the Internet of things scene and improve the adaptability and perception accuracy for highly dynamic scenes in the Internet of things scene.

Description

Internet of things scene perception method and device based on cloud edge cooperation
Technical Field
The invention relates to the technical field of cloud services, and in particular to a method and a device for perceiving an Internet of things scene based on cloud-edge cooperation. The invention further relates to an electronic device and a processor-readable storage medium.
Background
In recent years, with the rapid development of fifth-generation mobile communication technology (5G), cloud computing technology and artificial intelligence technology, the concept of the "Internet of Everything" has received increasing attention and development in the industry. Scene information perception is an important cornerstone for realizing the Internet of Everything; it involves the collection and processing of video image data and various kinds of sensing data in a specific scene, and the scene information obtained through perception serves the upper-layer applications of the Internet of things system and the decisions of managers.
At present, scene perception for the intelligent Internet of things is generally realized in one of two ways: cloud computing and edge computing. The cloud computing mode, as the most common scene information perception method at present, meets the scene information perception requirements to a certain extent, but struggles to satisfy practical application requirements in terms of real-time performance, accuracy and resource utilization. The edge computing mode processes data at the edge of the network, so it has lower processing delay and can also reduce the load on the network, but the computing capacity of an edge server is limited and cannot meet the demands of the complex computing tasks involved in scene perception. Therefore, designing a real-time and accurate Internet of things scene perception scheme based on cloud-edge collaboration, one that both guarantees high-quality perception of different types of information in the scene and meets real-time requirements, is of great practical significance.
Disclosure of Invention
Therefore, the invention provides an Internet of things scene perception method and device based on cloud-edge coordination, aiming to overcome the defect that, owing to the strong limitations of prior-art Internet of things scene perception schemes, the perception accuracy and real-time performance for specific instances in an Internet of things scene are poor.
In a first aspect, the invention provides a method for sensing a scene of an internet of things based on cloud-edge collaboration, which includes: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
Further, the static examples, the dynamic examples, the abnormal examples, and the sensing results of the sensing data are input into a local multi-example scene fusion model for processing, so as to obtain a local scene output by the local multi-example scene fusion model, which specifically includes:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
Further, performing instance awareness on the initial scene data based on a dynamic instance awareness model, and determining a static instance, a dynamic instance, and an abnormal instance corresponding to the image data, specifically: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
Further, the method for sensing the scene of the internet of things based on cloud edge coordination further comprises the following steps: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
Further, the method for sensing the scene of the internet of things based on cloud-edge coordination further includes, before acquiring initial scene data to be sensed:
and pre-training the dynamic instance perception model deployed in the edge server based on a pre-deployed deep neural network model in the cloud server so as to realize sharing of part of target parameters of each dynamic instance perception model in each edge server in the training process of the dynamic instance perception model.
In a second aspect, the present invention further provides a device for sensing a scene of an internet of things based on cloud-edge coordination, including: the data extraction unit is used for acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
the instance perception unit is used for carrying out instance perception on the initial scene data based on a dynamic instance perception model and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and the scene synthesis unit is used for inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
Further, the scene synthesis unit is specifically configured to:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
Further, the example sensing unit is specifically configured to: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
Further, the scene synthesis unit is specifically configured to: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
Further, the internet of things scene sensing device based on cloud edge coordination further includes, before acquiring initial scene data to be sensed: a model training unit;
the model training unit is used for pre-training the dynamic instance perception model deployed in the edge server based on a deep neural network model pre-deployed in the cloud server, so that part of the target parameters of each dynamic instance perception model in each edge server are shared during the dynamic instance perception model training process.
In a third aspect, the present invention also provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above Internet of things scene perception method based on cloud-edge cooperation.
In a fourth aspect, the present invention further provides a processor-readable storage medium, where a computer program is stored on the processor-readable storage medium, and when the computer program is executed by a processor, the steps of the method for scene awareness of internet of things based on cloud edge coordination are implemented.
According to the Internet of things scene perception method based on cloud edge coordination, static examples, dynamic examples and abnormal examples corresponding to image data are determined through example perception of initial scene data, and perception results of the static examples, the dynamic examples, the abnormal examples and the sensing data are input into a local multi-example scene fusion model deployed in an edge server to be processed, so that a local scene output by the local multi-example scene fusion model is obtained, the Internet of things scene perception processing time delay can be effectively reduced, and the adaptability and perception accuracy of high dynamic scenes in the Internet of things scene are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow diagram of a scene sensing method of the internet of things based on cloud edge collaboration provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a deep neural network model training process based on cloud edge coordination according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training differentiation process of an abnormal case perception model provided by an embodiment of the present invention;
FIG. 4 is a flow chart of local scene awareness provided by an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a deep neural network model in a cloud server and an edge server according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a comparison between an inference delay and a transmission delay for processing a key point sensing task according to two calculation schemes provided by the embodiment of the present invention;
FIG. 7 is a diagram illustrating a comparison of loss_keypoint when the ShareLayer is set to the first 0, 4 and 8 layers of keypoint_head, for the third loading batch, according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an internet of things scene sensing device based on cloud edge collaboration provided by an embodiment of the present invention;
fig. 9 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the internet of things scene perception method based on cloud edge coordination is described in detail below. As shown in fig. 1, which is a schematic flow chart of a scene sensing method of the internet of things based on cloud edge coordination according to an embodiment of the present invention, a specific implementation process includes the following steps:
step 101: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data.
In the embodiment of the present invention, before executing this step, the dynamic instance awareness model deployed in the edge server needs to be pre-trained based on the deep neural network model pre-deployed in the cloud server, and part of target parameters of each dynamic instance awareness model in each edge server are shared in the dynamic instance awareness model training process, so as to improve the convergence speed of the dynamic instance awareness model in a high dynamic scene.
As shown in fig. 2, the invention provides a cloud-edge-based collaborative neural network model training method for a human body posture key point recognition task in dynamic instance perception, which is used for training a deep neural network model for an internet of things scene at each edge server, namely a dynamic instance perception model; meanwhile, the dynamic instance perception model deployed in the edge server is pre-trained on the basis of a deep neural network model pre-deployed in the cloud server to reduce the load of the edge server.
In the embodiment of the invention, the deep neural network model deployed on the cloud server is called CloudNet, and the lightweight deep neural network model deployed on the edge server is called EdgeNet. The method trains the EdgeNet based on the scene information acquired in real time to obtain the dynamic instance perception model of the practical application.
Specifically, the keypoint_head (the head network for feature points) of the ROI head (the head network for a region of interest) in the EdgeNet is divided into two parts, namely a shared layer (ShareLayer) and an adaptive layer (AdaptLayer). The ShareLayer consists of the first m layers of the keypoint_head; it is used to extract features common to the instances in each sub-scene and is trained jointly by all EdgeNets. The AdaptLayer is the remainder of the keypoint_head; it is used to adapt to the distinct characteristics of the instances in each sub-scene and differs from one EdgeNet to another. Here, ROI (region of interest) refers to a region of interest in machine vision and image processing. As for obtaining the label data required for EdgeNet training, the label data can be generated automatically by CloudNet, because CloudNet has a large number of neural network layers and can achieve higher recognition precision. Compared with traditional manual labeling, automatically generating label data with CloudNet greatly reduces the workload and speeds up model training and updating.
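As an illustration of this split, the following sketch (assuming a PyTorch-style keypoint_head organized as an nn.Sequential of layers; the split index m and the helper names are illustrative, not the exact Detectron2 structure) shows how the first m layers could be separated out as the ShareLayer and the remainder kept as the AdaptLayer:

```python
import torch.nn as nn

def split_keypoint_head(keypoint_head: nn.Sequential, m: int = 4):
    """Split a keypoint_head into the ShareLayer (first m layers, trained jointly
    by all EdgeNets) and the AdaptLayer (remaining layers, scene-specific)."""
    layers = list(keypoint_head.children())
    share_layer = nn.Sequential(*layers[:m])
    adapt_layer = nn.Sequential(*layers[m:])
    return share_layer, adapt_layer

def share_layer_state(keypoint_head: nn.Sequential, m: int = 4):
    """Collect only the ShareLayer parameters, e.g. for uploading to the cloud server."""
    share_layer, _ = split_keypoint_head(keypoint_head, m)
    return {name: p.detach().clone() for name, p in share_layer.named_parameters()}
```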
In the implementation of the invention, the EdgeNet training process comprises an EdgeNet initialization process, an EdgeNet learning process and a ShareLayer update process.
The EdgeNet initialization process is as follows. Let $W_c$ denote the parameters of CloudNet, $W_{c,b}$ the part of the parameters shared between the CloudNet backbone network and the EdgeNet, and $W_{c,rs}$ the parameters of the ShareLayer in the ROI head of CloudNet; let $W_e$ denote the parameters of EdgeNet, $W_{e,b}$ the parameters of the backbone network in EdgeNet, and $W_{e,rs}$, $W_{e,ra}$ the parameters of the ShareLayer and the AdaptLayer in the ROI head of EdgeNet, respectively. Given a training set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, EdgeNet is trained to optimize a loss function of the form
$$\min_{W_e} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f(x_i; W_e), y_i\big).$$
In the EdgeNet initialization phase, CloudNet is first trained with a large amount of data and sent down to the edge server. After the edge server receives CloudNet, it first takes $W_{c,b}$ and $W_{c,rs}$ from CloudNet as the values of $W_{e,b}$ and $W_{e,rs}$ in EdgeNet, then initializes $W_{e,ra}$ to random values, and finally combines $W_{e,b}$, $W_{e,rs}$ and $W_{e,ra}$ into $W_e$, which is optimized to complete the EdgeNet training. To save computing resources, the parameters of the backbone part of EdgeNet are frozen during training and only the ROI head is fine-tuned. $\hat{W}_e$ denotes the value of $W_e$ after the update, and $\cup$ denotes the union of two parameter sets.
The specific steps of Algorithm 1 (EdgeNet initialization) are as follows. Input: the training set $\mathcal{D}$ and CloudNet with parameters $W_c$. Output: EdgeNet and the Label-builder. Step 1: the cloud server sends CloudNet (with parameter values $W_c$) to the edge server. Step 2: the edge server processes CloudNet and generates the three copies $W_c$, $W_{c,b}$ and $W_{c,rs}$. Step 3: the edge server connects the backbone network with the ShareLayer and the AdaptLayer in the ROI head to construct EdgeNet. Step 4: $W_{e,ra}$ is initialized to random values. Step 5: $W_e = W_{c,b} \cup W_{e,rs} \cup W_{e,ra}$. Step 6: EdgeNet is trained, yielding the updated parameters $\hat{W}_e$. Step 7: return EdgeNet with parameter values $\hat{W}_e$ and the Label-builder with parameter values $W_c$.
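A minimal sketch of Algorithm 1 follows; it assumes PyTorch-style models exposing backbone, share_layer and adapt_layer submodules, which is an illustrative simplification of the actual Detectron2 module layout:

```python
import copy
import torch

def init_edgenet(cloudnet, edgenet):
    """Algorithm 1 (sketch): initialize EdgeNet from CloudNet on the edge server."""
    # Steps 1-3: reuse the CloudNet backbone and ShareLayer weights (W_c,b and W_c,rs)
    edgenet.backbone.load_state_dict(cloudnet.backbone.state_dict(), strict=False)
    edgenet.share_layer.load_state_dict(cloudnet.share_layer.state_dict())
    # Step 4: the AdaptLayer (W_e,ra) starts from random values
    for p in edgenet.adapt_layer.parameters():
        torch.nn.init.normal_(p, std=0.01)
    # Freeze the backbone so that only the ROI head is fine-tuned (saves edge resources)
    for p in edgenet.backbone.parameters():
        p.requires_grad = False
    # Step 7: the Label-builder is a frozen copy of CloudNet used to generate labels
    label_builder = copy.deepcopy(cloudnet).eval()
    return edgenet, label_builder
```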
The EdgeNet learning process is as follows. The invention further trains EdgeNet using scene information acquired in real time, with the label data required for training generated by the Label-builder obtained in the EdgeNet initialization step. The label data generated by the Label-builder is assumed to be accurate. After receiving scene data, the edge server obtains a perception result using EdgeNet and stores it in a database; when the edge server is idle, it generates label data for the scene data using the Label-builder and trains EdgeNet based on this label data. Let $\hat{W}_e$ denote the parameters of EdgeNet at the end of the EdgeNet initialization stage. When the number of instances in newly acquired scenes accumulates to $M$, EdgeNet training starts; the newly acquired $M$ instances and their labels are recorded as $\mathcal{D}_M = \{(x_j, y_j)\}_{j=1}^{M}$, and EdgeNet is trained to optimize a loss function of the form
$$\min_{W_e} \frac{1}{M} \sum_{j=1}^{M} \mathcal{L}\big(f(x_j; W_e), y_j\big).$$
The specific steps of Algorithm 2 (EdgeNet learning) are as follows. Input: EdgeNet with parameters $\hat{W}_e$, the newly acquired instance set $\mathcal{D}_M$, and the Label-builder. Output: the updated EdgeNet. Step 1: the edge server obtains the labels $\{y_j\}$ of the scene instances $\{x_j\}$ using the Label-builder. Step 2: EdgeNet is trained on $\mathcal{D}_M$, yielding the updated parameters $\hat{W}_e'$. Step 3: return EdgeNet with parameter values $\hat{W}_e'$.
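The learning loop of Algorithm 2 could be sketched roughly as follows; the buffer threshold M, the optimizer settings and the training_loss helper are illustrative assumptions rather than values specified in the text:

```python
import torch

def edgenet_learning_step(edgenet, label_builder, scene_buffer, M=100, lr=1e-3):
    """Algorithm 2 (sketch): retrain EdgeNet once M new scene instances have accumulated."""
    if len(scene_buffer) < M:
        return edgenet  # keep accumulating newly acquired instances
    # Step 1: the Label-builder (frozen CloudNet copy) generates labels while the edge is idle
    with torch.no_grad():
        labels = [label_builder(x) for x in scene_buffer]
    # Step 2: fine-tune only the trainable (ROI head) parameters on the labelled data
    params = [p for p in edgenet.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr)
    for x, y in zip(scene_buffer, labels):
        loss = edgenet.training_loss(x, y)  # assumed helper returning the keypoint loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scene_buffer.clear()
    return edgenet
```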
The ShareLayer update process is as follows. The invention sets a shared layer ShareLayer (with parameters $W_{e,rs}$) in the ROI head of EdgeNet; this layer is trained jointly by all EdgeNets, in order to further accelerate the convergence of EdgeNet training and to extract features common to instances in different scenes. During real-time training, after each EdgeNet on each edge server has been trained a certain number of times, the parameters of the ShareLayer in its ROI head are extracted and uploaded to the cloud server, and the cloud server aggregates the gradients produced by the ShareLayer during the training of each EdgeNet using a preset FedAvg algorithm to update the ShareLayer. Let $\bar{W}_{rs}$ denote the updated ShareLayer parameters; the detailed ShareLayer update procedure is as follows.
Input: the ShareLayers $W^{(i)}_{e,rs}$ of the $N$ EdgeNets with their training batch sizes $N_i$, and the ShareLayer $W_{c,rs}$ of CloudNet. Output: the updated ShareLayer. Step 1: each edge server separates the ShareLayer parameters $W^{(i)}_{e,rs}$ from its EdgeNet and uploads them to the cloud server. Step 2: the gradients $g_i = W^{(i)}_{e,rs} - W_{c,rs}$ are calculated. Step 3: the weighted average gradient $\bar{g} = \sum_{i=1}^{N} \frac{N_i}{\sum_{k=1}^{N} N_k}\, g_i$ is calculated. Step 4: $\bar{W}_{rs} = W_{c,rs} + \bar{g}$. Step 5: return the updated ShareLayer $\bar{W}_{rs}$.
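A minimal sketch of the ShareLayer aggregation on the cloud server, following the FedAvg-style weighting described above (the state dictionaries are assumed to share identical keys; the weighting by training batch size is taken from the procedure, the rest is illustrative):

```python
def aggregate_sharelayer(cloud_share, edge_shares, batch_sizes):
    """FedAvg-style ShareLayer update (sketch).
    cloud_share:  dict of ShareLayer tensors held by the cloud server (W_c,rs)
    edge_shares:  list of dicts uploaded by the N edge servers (W_e,rs of each EdgeNet)
    batch_sizes:  list of the N training batch sizes N_i, used as aggregation weights
    """
    total = float(sum(batch_sizes))
    updated = {}
    for key, base in cloud_share.items():
        # gradient of edge server i relative to the cloud copy, weighted by its batch size
        weighted_grad = sum(
            (n / total) * (share[key] - base)
            for share, n in zip(edge_shares, batch_sizes)
        )
        updated[key] = base + weighted_grad
    return updated
```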
After the ShareLayer is updated, the cloud server issues the updated ShareLayer parameters to each edge server, and each edge server calls steps 3-7 of Algorithm 1 again to train its EdgeNet, thereby completing one cycle of EdgeNet initialization, EdgeNet learning and ShareLayer updating, and finally obtaining a dynamic instance perception model that satisfies the application conditions.
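Putting the three phases together, one full cycle could be orchestrated roughly as follows (a sketch reusing the helper functions from the earlier sketches; the attribute names on the edge-server objects and the number of local retraining rounds before a ShareLayer update are illustrative assumptions):

```python
def training_cycle(cloudnet, edge_servers, rounds_before_update=2):
    """One EdgeNet initialization / EdgeNet learning / ShareLayer update cycle (sketch)."""
    # EdgeNet initialization: every edge server bootstraps its EdgeNet from CloudNet
    for es in edge_servers:
        es.edgenet, es.label_builder = init_edgenet(cloudnet, es.edgenet)
    # EdgeNet learning: each edge server retrains on newly accumulated scene instances
    for _ in range(rounds_before_update):
        for es in edge_servers:
            es.edgenet = edgenet_learning_step(es.edgenet, es.label_builder, es.scene_buffer)
    # ShareLayer update: the cloud aggregates the shared parameters and redistributes them
    edge_shares = [share_layer_state(es.edgenet.keypoint_head) for es in edge_servers]
    sizes = [es.last_batch_size for es in edge_servers]
    merged = aggregate_sharelayer(share_layer_state(cloudnet.keypoint_head), edge_shares, sizes)
    for es in edge_servers:
        for name, p in es.edgenet.share_layer.named_parameters():
            p.data.copy_(merged[name])
    return edge_servers
```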
Scene instance extraction can be implemented in this step. Specifically, in the scene of the internet of things, the initial scene information includes two categories, namely, sensing data (such as voltage, temperature, humidity and the like) and image data (such as personnel, inspection robots, faulty equipment and the like), wherein the image data is divided into three categories, namely a static example, a dynamic example and an abnormal example. In the scene instance extraction stage, initial scene information is firstly classified to obtain static instances, dynamic instances, abnormal instances and sensing data.
The sensing data can be collected by preset sensing data acquisition equipment and are processed using a preset N-shot K-way small-sample fusion algorithm model. Specifically, for a piece of sensing data $x$, $K$ support-set classes are first constructed, each class containing $N$ samples $(S_1, \ldots, S_N)$. The objective of the perception algorithm model is to determine which support-set class the collected sensing data should belong to, with an optimization objective of the form
$$\hat{k} = \arg\max_{k \in \{1, \ldots, K\}} \mathrm{sim}\big(x, \{S^{(k)}_1, \ldots, S^{(k)}_N\}\big),$$
where $\mathrm{sim}(\cdot)$ measures the similarity between the sensing data and a support-set class. Through this N-shot K-way small-sample learning, the sensing data in the Internet of things scene are fused into $K$ classes, reducing the amount of computation required for subsequent analysis by upper-layer applications.
The image data are obtained by preset image acquisition equipment and comprise a plurality of example types, and the image data are divided into three types, namely static examples, dynamic examples and abnormal examples. The method adopts a Mask-RCNN model to detect and extract three types of target examples from image data and generate a Mask so as to complete perception processing by using a corresponding perception algorithm model (namely an example perception model) in subsequent steps.
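As an illustration of this extraction step, detection and mask generation with a pretrained Mask R-CNN could be done with Detectron2 roughly as follows; the mapping from COCO class ids to the static/dynamic/abnormal categories is a made-up placeholder, since in the patent the model would be trained on the corresponding Internet of things scene labels:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

CATEGORY_OF_CLASS = {0: "dynamic"}  # e.g. COCO "person" -> dynamic instance; placeholder mapping

def extract_instances(image_path):
    """Run Mask R-CNN and group the detections into static / dynamic / abnormal instances."""
    outputs = predictor(cv2.imread(image_path))
    instances = outputs["instances"].to("cpu")
    grouped = {"static": [], "dynamic": [], "abnormal": []}
    for cls, mask in zip(instances.pred_classes.tolist(), instances.pred_masks):
        grouped[CATEGORY_OF_CLASS.get(cls, "abnormal")].append(mask.numpy())
    return grouped
```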
Step 102: carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server.
In the specific implementation process of this step, the initial scene data may be input into a preset dynamic instance perception model for instance perception, so as to obtain a static instance, a dynamic instance, and an abnormal instance corresponding to the image data. The dynamic instance perception model is obtained by training with preset marking data as training samples.
It should be noted that static instances refer to relatively fixed objects in the Internet of things scene, such as factory buildings, generator sets and pipelines. In the embodiment of the invention, 3D models of the static instances can be preset in the edge server when implementing the scene instance perception process, and once a static instance in the scene is identified by the Mask-RCNN, the preset 3D model is directly called to participate in the fusion of the local scene. Dynamic instances refer to objects that change in real time in the Internet of things scene, such as workers; their 3D models are obtained by first detecting key points and then synthesizing the 3D model from the key point parameters (a sketch of this pipeline follows after this paragraph). Taking the generation of a 3D human body model as an example, the parameters of the key points of the human body are first detected using the keypoint_head branch of Mask RCNN, the SMPL (Skinned Multi-Person Linear Model) is then used to parameterize the human body model and generate the 3D human body model based on the detected parameters, and the perception of the human body instance (i.e. the dynamic instance) is thus completed. Abnormal instances are abnormal information in the Internet of things scene, such as illegal intrusion and faulty equipment; their characteristic is that their features cannot be predicted in advance, so their 3D models can only be perceived directly from scene images.
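The dynamic-instance pipeline mentioned above could be sketched as follows, using a pretrained Detectron2 keypoint model for the keypoint detection step; the fit_smpl_from_keypoints callable is a hypothetical placeholder, since fitting SMPL parameters to detected keypoints needs an additional regression or optimization model that the text does not detail:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

kp_cfg = get_cfg()
kp_cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
kp_cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
keypoint_predictor = DefaultPredictor(kp_cfg)

def perceive_dynamic_instances(image, fit_smpl_from_keypoints):
    """Detect human body keypoints, then hand them to an SMPL fitter (hypothetical callable)
    to obtain parameterized 3D human models for local-scene fusion."""
    instances = keypoint_predictor(image)["instances"].to("cpu")
    models_3d = []
    for kps in instances.pred_keypoints:  # one (17, 3) array of (x, y, score) per person
        models_3d.append(fit_smpl_from_keypoints(kps.numpy()))
    return models_3d
```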
According to the method, the abnormal instances are sensed through the Mesh-RCNN, the abnormal instances in the initial image are identified based on the Mask-RCNN, the sub-images of the abnormal instances are segmented, and then Mesh data of the instances are generated by the Mesh-RCNN based on the sub-images. The method is used for respectively training the Mesh-RCNN models for different types of abnormal instances, and when the abnormal instances are sensed, the corresponding Mesh-RCNN models are selected from the model base according to corresponding label data to generate Mesh data of the abnormal instances.
Fig. 3 shows the training differentiation process of Mesh-RCNN. Mask-RCNN is first trained based on labeled data so that it can detect several specific types of abnormal instances, which are marked with respective labels. In the initial state, the Mesh-RCNN model corresponding to each label is a generic model obtained by initialization from a pre-trained model deployed on the cloud server. When a certain number of instances corresponding to a label have accumulated, the mesh labels of these instances are obtained through a related labeling technique, and the Mesh-RCNN model corresponding to that label is trained based on the mesh labels. When a new type of abnormal instance needs to be detected, Mask-RCNN can be retrained based on labeled data so that the new abnormal instance can be detected, and the Mesh-RCNN training process is then repeated.
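A minimal sketch of the per-label model library described above; the model object and its clone/finetune/predict interface are stand-ins, since Mesh R-CNN is a separate codebase whose exact API is not reproduced here:

```python
class AnomalyMeshLibrary:
    """Keep one mesh-generation model per abnormal-instance label, specialized lazily."""

    def __init__(self, pretrained_generic_model, min_samples=50):
        self.generic = pretrained_generic_model   # generic model initialized from the cloud
        self.models = {}                          # label -> specialized Mesh-RCNN model
        self.buffers = {}                         # label -> accumulated (sub_image, mesh_label)
        self.min_samples = min_samples

    def add_labeled_instance(self, label, sub_image, mesh_label):
        self.buffers.setdefault(label, []).append((sub_image, mesh_label))
        if len(self.buffers[label]) >= self.min_samples:
            # Specialize a copy of the generic model once enough mesh labels are accumulated
            model = self.models.get(label, self.generic.clone())   # assumed helper
            model.finetune(self.buffers[label])                    # assumed training helper
            self.models[label] = model
            self.buffers[label] = []

    def generate_mesh(self, label, sub_image):
        """Select the Mesh-RCNN model matching the label and generate mesh data."""
        model = self.models.get(label, self.generic)
        return model.predict(sub_image)                            # assumed inference helper
```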
Step 103: and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
As shown in fig. 4, in this step, the static examples, the dynamic examples, the abnormal examples, and the sensing results of the sensing data are input into a local multi-example scene fusion model, the static examples, the dynamic examples, and the abnormal examples are combined to obtain corresponding three-dimensional scenes, and the sensing data are matched into the three-dimensional scenes to obtain the local scenes. Furthermore, at least one local scene can be fused at the cloud server end, so that a global scene can be synthesized based on the local scenes. Specifically, after the initial scene information is acquired by the acquisition device in step 101, a local scene is obtained by performing three stages of instance extraction, instance perception and scene fusion based on a pre-deployed model in the edge server. And finally, synthesizing each local scene at the cloud server side to obtain a global scene after each local scene is obtained based on the perception of each edge server.
In the scene synthesis process, the 3D models of the static examples, the dynamic examples and the abnormal examples are combined based on the edge server, the sensing data are matched into the three-dimensional scene, and the perception of the local scene is finally completed. It should be noted that, in the implementation process, two copies of the synthesized local scene are retained: one is stored in the edge server to provide real-time service for local users, and the other is uploaded to the cloud server to synthesize a global scene.
Specifically, denote the feature extraction process by F, the instance perception process by P, the instance synthesis process by B, the initial data by Ori, the sensing data in the Internet of things scene by Sen, the dynamic instances by Dym, the static instances by Sta, the abnormal instances by Gen, the local scene perception result by Loc and the global scene perception result by Gol. From a data-flow perspective, the multi-instance perception and fusion process of a local scene can be expressed as
$$\mathrm{Loc} = B\big(P(F(\mathrm{Ori}))\big), \qquad F(\mathrm{Ori}) = \{\mathrm{Sen}, \mathrm{Sta}, \mathrm{Dym}, \mathrm{Gen}\},$$
and the global scene composition process at the cloud server side can be expressed as
$$\mathrm{Gol} = B\big(\mathrm{Loc}_1, \ldots, \mathrm{Loc}_n\big).$$
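Expressed as a small composition of functions, the local and global fusion steps above might look like the following sketch, where all stage functions are placeholders standing for the models described earlier:

```python
def perceive_local_scene(ori, F, P, B):
    """Loc = B(P(F(Ori))): extract instances and sensing data, perceive each, fuse locally."""
    sen, sta, dym, gen = F(ori)                        # instance / sensing-data extraction
    perceived = P(sen=sen, sta=sta, dym=dym, gen=gen)  # per-type perception results
    return B(perceived)                                # local 3D scene: kept at the edge and uploaded

def compose_global_scene(local_scenes, B_cloud):
    """Gol = B(Loc_1, ..., Loc_n): the cloud server fuses the uploaded local scenes."""
    return B_cloud(local_scenes)
```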
the following is illustrated with reference to specific examples:
the specific process verifies that the cloud edge cooperation-based deep neural network model training method is verified in a human body posture key point recognition task. A cloud edge collaborative training method of Mask RCNN is constructed based on a available Decctron 2 framework in target detection, CloudNet and EdgeNet are keypoint _ RCNN _ R _50_ FPN _3x models provided based on Decctron 2, and detailed network structures of the models are shown in FIG. 5. The network structure of the CloudNet is the same as that of the original model, the backbone network of the EdgeNet reserves stem, res2 and res3 parts in the backbone network of the original model, and the rest parts are the same as that of the original model. ShareLayer extracting common characteristics of parameters is the front 4 layers of FPN, Box Head, Box _ predictor and keypoint _ Head of Mask RCNN, and AdapteLayer is the back 4 layers of keypoint _ Head. The preset image data with the human body posture key point labels are used as real-time scene data, and the real-time scene data acquisition process is simulated by loading the image data in batches in the training process.
The specific process mainly compares two aspects: the average scene perception delay, and the training effect of the cloud-edge collaborative training method.
Fig. 6 shows the processing delay and transmission delay of the human body posture key point detection task when it is executed by the cloud server and by the edge server, where the inference delay is actually measured by the simulation system, the transmission delay is summarized from online data reports, and the data in the figure are the transmission delays in a WiFi environment. Compared with a purely cloud computing processing mode, sinking the scene instance perception task to the edge server reduces the total processing delay by about 39.8%. The inference delay is reduced by 38.6%, and the transmission delay is reduced by 26.9%.
To verify the effect of the cloud-edge collaborative training method on EdgeNet training, the process of acquiring scene data and training EdgeNet in practice is simulated. Initially, the cloud server has already been deployed with a pre-trained CloudNet (the deep neural network model). The edge server first initializes EdgeNet (the initial dynamic instance perception model) based on a preset set of 100 images; images are then loaded in batches of 100, and after each batch has been loaded the edge server retrains EdgeNet. After the edge servers have retrained EdgeNet twice, they jointly update the ShareLayer with the cloud server, which sends the update result back to the edge servers; after one more batch of images is loaded, the edge servers train EdgeNet based on the merged ShareLayer. Thus, after three batches of images have been loaded, one round of the EdgeNet initialization, EdgeNet learning and ShareLayer update cycle is completed. The invention simulates 12 loading batches in total, i.e. the processing cycle is completed 4 times, and the dynamic instance perception model is finally obtained.
The invention examines how loss_keypoint changes with the number of iterations when the ShareLayer is set to the first 0, 4 and 8 layers of keypoint_head, respectively; setting the ShareLayer to the first 4 layers of keypoint_head corresponds to the method provided by the invention. Fig. 7 shows the comparison of loss_keypoint during EdgeNet training with the above three settings for the 3rd loading batch. As can be seen, setting ShareLayer_num to 4 yields the smallest loss_keypoint, which is 7.428% lower than with ShareLayer_num set to 0 and 4.856% lower than with ShareLayer_num set to 8.
Aiming at the problem that the existing cloud computing mode and edge computing mode struggle to meet the real-time and accuracy requirements of Internet of things scene perception, the invention provides an Internet of things scene perception method based on cloud-edge cooperation. First, the characteristics and perception requirements of scene data such as sensing data and image data in Internet of things scenes are analyzed, and a scene information perception method that distinguishes dynamic instances, static instances and abnormal instances is designed, supporting edge perception of local scene information and cloud synthesis of the global scene. Second, for the high-precision recognition network model (i.e. the dynamic instance perception model) for dynamic instances that change at high frequency in the scene, a deep neural network model training method based on cloud-edge cooperation is designed: the cloud assists the training of the neural network models on the edge servers and shares part of the parameters of each edge neural network model to improve the convergence speed of the models, yielding the dynamic instance perception model and effectively reducing perception processing delay and model training time.
By adopting the internet of things scene perception method based on cloud edge coordination, the time delay of the perception processing of the internet of things scene can be effectively reduced, and the adaptability and perception accuracy of the high-dynamic scene in the internet of things scene are improved.
Corresponding to the method for sensing the scene of the internet of things based on cloud-edge coordination, the invention further provides a device for sensing the scene of the internet of things based on cloud-edge coordination. Since the embodiment of the device is similar to the method embodiment described above, the description is relatively simple, and please refer to the description in the method embodiment section, and the following description of the embodiment of the internet of things scene sensing device based on cloud edge coordination is only illustrative. Fig. 8 is a schematic structural diagram of an internet of things scene sensing device based on cloud edge coordination according to an embodiment of the present invention.
The invention relates to an Internet of things scene sensing device based on cloud edge collaboration, which specifically comprises:
a data extraction unit 801, configured to acquire initial scene data to be perceived; the initial scene data comprises image data and sensing data;
an instance perception unit 802, configured to perform instance perception on the initial scene data based on a dynamic instance perception model, and determine a static instance, a dynamic instance, and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server;
and a scene synthesis unit 803, configured to input the static instance, the dynamic instance, the abnormal instance, and the sensing result of the sensing data into a local multi-instance scene fusion model for processing, so as to obtain a local scene output by the local multi-instance scene fusion model.
Further, the scene synthesis unit is specifically configured to:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
Further, the example sensing unit is specifically configured to: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
Further, the scene synthesis unit is specifically configured to: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
Further, the internet of things scene sensing device based on cloud edge coordination further includes, before acquiring initial scene data to be sensed: a model training unit;
the model training unit is used for pre-training the dynamic instance perception model deployed in the edge server based on a deep neural network model pre-deployed in the cloud server so as to realize sharing of part of target parameters of each dynamic instance perception model in each edge server in the dynamic instance perception model training process.
By adopting the Internet of things scene perception device based on cloud-edge coordination, static instances, dynamic instances and abnormal instances corresponding to the image data are determined by carrying out instance perception on the initial scene data, and the perception results of the static instances, the dynamic instances, the abnormal instances and the sensing data are input into the local multi-instance scene fusion model deployed in the edge server for processing, so that the local scene output by the local multi-instance scene fusion model is obtained; the Internet of things scene perception processing delay can thereby be effectively reduced, and the adaptability and perception accuracy for highly dynamic scenes in the Internet of things scene are improved.
Corresponding to the method for sensing the scene of the internet of things based on cloud edge cooperation, the invention further provides electronic equipment. Since the embodiment of the electronic device is similar to the above method embodiment, the description is simple, and please refer to the description of the above method embodiment, and the electronic device described below is only schematic. Fig. 9 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor (processor)901, a memory (memory)902 and a communication bus 903, wherein the processor 901 and the memory 902 complete communication with each other through the communication bus 903 and communicate with the outside through a communication interface 904. The processor 901 may invoke logic instructions in the memory 902 to perform a cloud-edge coordination-based internet of things scene awareness method, the method comprising: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server; and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
Furthermore, the logic instructions in the memory 902 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a Memory chip, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present invention further provide a computer program product, where the computer program product includes a computer program stored on a processor-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the method for scene awareness of the internet of things based on cloud edge coordination provided by the above-mentioned method embodiments. The method comprises the following steps: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server; and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
In another aspect, an embodiment of the present invention further provides a processor-readable storage medium, where a computer program is stored on the processor-readable storage medium, and when the computer program is executed by a processor, the computer program is implemented to perform the method for sensing a scene of an internet of things based on cloud edge coordination provided in the foregoing embodiments. The method comprises the following steps: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server; and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A scene perception method of the Internet of things based on cloud edge collaboration is characterized by comprising the following steps:
acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
2. The internet of things scene perception method based on cloud edge collaboration according to claim 1, wherein the static examples, the dynamic examples, the abnormal examples and the perception results of the sensing data are input into a local multi-example scene fusion model to be processed, and a local scene output by the local multi-example scene fusion model is obtained, specifically including:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
3. The internet of things scene perception method based on cloud-edge collaboration as claimed in claim 1, wherein instance perception is performed on the initial scene data based on a dynamic instance perception model, and a static instance, a dynamic instance, and an abnormal instance corresponding to the image data are determined, specifically: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
4. The internet of things scene awareness method based on cloud-edge collaboration as claimed in claim 1, further comprising: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
5. The internet of things scene perception method based on cloud-edge collaboration as claimed in claim 1, further comprising, before obtaining initial scene data to be perceived:
and pre-training the dynamic instance perception model deployed in the edge server based on a pre-deployed deep neural network model in the cloud server so as to realize sharing of part of target parameters of each dynamic instance perception model in each edge server in the training process of the dynamic instance perception model.
6. An Internet of things scene perception device based on cloud edge coordination, characterized by comprising:
the data extraction unit is used for acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
the instance perception unit is used for carrying out instance perception on the initial scene data based on a dynamic instance perception model and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and the scene synthesis unit is used for inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
7. The internet of things scene awareness apparatus based on cloud edge collaboration as claimed in claim 6, wherein the scene synthesis unit is specifically configured to:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
8. The internet of things scene awareness apparatus based on cloud edge collaboration as claimed in claim 6, wherein the instance awareness unit is specifically configured to: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method for scene awareness of internet of things based on cloud edge coordination according to any one of claims 1 to 5.
10. A processor-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for scene awareness of internet of things based on cloud-edge collaboration as claimed in any one of claims 1 to 5.
CN202111478787.9A 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation Pending CN114299370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111478787.9A CN114299370A (en) 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111478787.9A CN114299370A (en) 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation

Publications (1)

Publication Number Publication Date
CN114299370A true CN114299370A (en) 2022-04-08

Family

ID=80966407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111478787.9A Pending CN114299370A (en) 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation

Country Status (1)

Country Link
CN (1) CN114299370A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760214A (en) * 2022-04-21 2022-07-15 中国地质大学(北京) Service anomaly detection method based on edge-cloud cooperative network
CN114760214B (en) * 2022-04-21 2023-12-08 中国地质大学(北京) Service abnormality detection method based on edge-cloud cooperative network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination