CN114299370A - Internet of things scene perception method and device based on cloud edge cooperation - Google Patents

Internet of things scene perception method and device based on cloud edge cooperation

Info

Publication number
CN114299370A
CN114299370A
Authority
CN
China
Prior art keywords
scene
instance
perception
dynamic
examples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111478787.9A
Other languages
Chinese (zh)
Inventor
邵苏杰
郭少勇
徐思雅
李鸣
张栋
于泉杰
邵聪章
李易
邱雪松
亓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
China Electronics Standardization Institute
Original Assignee
Beijing University of Posts and Telecommunications
China Electronics Standardization Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, China Electronics Standardization Institute filed Critical Beijing University of Posts and Telecommunications
Priority to CN202111478787.9A priority Critical patent/CN114299370A/en
Publication of CN114299370A publication Critical patent/CN114299370A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an Internet of things scene perception method and device based on cloud-edge cooperation. The method comprises the following steps: acquiring initial scene data to be perceived, the initial scene data comprising image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining the static instances, dynamic instances and abnormal instances corresponding to the image data; and inputting the static instances, the dynamic instances, the abnormal instances and the perception results of the sensing data into a local multi-instance scene fusion model for processing, to obtain the local scene output by the local multi-instance scene fusion model. The dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server. By virtue of the dynamic instance perception model, the method provided by the invention can effectively reduce the perception processing delay of the Internet of things scene and improve the adaptability and perception accuracy for highly dynamic scenes in the Internet of things scene.

Description

Internet of things scene perception method and device based on cloud edge cooperation
Technical Field
The invention relates to the technical field of cloud services, and in particular to a method and a device for perceiving an Internet of things scene based on cloud-edge cooperation. The invention further relates to an electronic device and a processor-readable storage medium.
Background
In recent years, with the rapid development of fifth-generation mobile communication technology (5G), cloud computing technology and artificial intelligence technology, the concept of the "Internet of Everything" has received increasing attention and development in the industry. Scene information perception is an important cornerstone for realizing the Internet of Everything; it involves the collection and processing of video image data and various kinds of sensing data in a specific scene, and the scene information obtained through perception serves the upper-layer applications of the Internet of things system and the decisions of managers.
At present, scene perception for the intelligent Internet of things is generally realized in one of two ways: cloud computing and edge computing. The cloud computing mode, as the most common scene information perception method at present, meets the scene information perception requirements to a certain extent, but struggles to satisfy practical application requirements in terms of real-time performance, accuracy and resource utilization. The edge computing mode processes data at the edge of the network, so it has lower processing delay and can also reduce the load on the network, but the computing capacity of an edge server is limited and cannot meet the demands of the complex computing tasks involved in scene perception. Therefore, designing a real-time and accurate Internet of things scene perception scheme based on cloud-edge collaboration, one that both guarantees high-quality perception of different types of information in the scene and meets real-time requirements, is of great practical significance.
Disclosure of Invention
Therefore, the invention provides an Internet of things scene perception method and device based on cloud-edge coordination, aiming to overcome the defect that, owing to the strong limitations of prior-art Internet of things scene perception schemes, the perception accuracy and real-time performance for specific instances in an Internet of things scene are poor.
In a first aspect, the invention provides a method for sensing a scene of an internet of things based on cloud-edge collaboration, which includes: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
Further, the static examples, the dynamic examples, the abnormal examples, and the sensing results of the sensing data are input into a local multi-example scene fusion model for processing, so as to obtain a local scene output by the local multi-example scene fusion model, which specifically includes:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
Further, performing instance awareness on the initial scene data based on a dynamic instance awareness model, and determining a static instance, a dynamic instance, and an abnormal instance corresponding to the image data, specifically: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
Further, the method for sensing the scene of the internet of things based on cloud edge coordination further comprises the following steps: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
Further, the method for sensing the scene of the internet of things based on cloud-edge coordination further includes, before acquiring initial scene data to be sensed:
and pre-training the dynamic instance perception model deployed in the edge server based on a pre-deployed deep neural network model in the cloud server so as to realize sharing of part of target parameters of each dynamic instance perception model in each edge server in the training process of the dynamic instance perception model.
In a second aspect, the present invention further provides a device for sensing a scene of an internet of things based on cloud-edge coordination, including: the data extraction unit is used for acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
the instance perception unit is used for carrying out instance perception on the initial scene data based on a dynamic instance perception model and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and the scene synthesis unit is used for inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
Further, the scene synthesis unit is specifically configured to:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
Further, the example sensing unit is specifically configured to: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
Further, the scene synthesis unit is specifically configured to: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
Further, the internet of things scene sensing device based on cloud edge coordination further includes, before acquiring initial scene data to be sensed: a model training unit;
the model training unit is used for pre-training the dynamic instance perception model deployed in the edge server based on a deep neural network model pre-deployed in the cloud server, so that part of the target parameters of each dynamic instance perception model in each edge server are shared during the dynamic instance perception model training process.
In a third aspect, the present invention also provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above Internet of things scene perception method based on cloud-edge cooperation.
In a fourth aspect, the present invention further provides a processor-readable storage medium, where a computer program is stored on the processor-readable storage medium, and when the computer program is executed by a processor, the steps of the method for scene awareness of internet of things based on cloud edge coordination are implemented.
According to the Internet of things scene perception method based on cloud edge coordination, static examples, dynamic examples and abnormal examples corresponding to image data are determined through example perception of initial scene data, and perception results of the static examples, the dynamic examples, the abnormal examples and the sensing data are input into a local multi-example scene fusion model deployed in an edge server to be processed, so that a local scene output by the local multi-example scene fusion model is obtained, the Internet of things scene perception processing time delay can be effectively reduced, and the adaptability and perception accuracy of high dynamic scenes in the Internet of things scene are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow diagram of a scene sensing method of the internet of things based on cloud edge collaboration provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a deep neural network model training process based on cloud edge coordination according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training differentiation process of an abnormal case perception model provided by an embodiment of the present invention;
FIG. 4 is a flow chart of local scene awareness provided by an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a deep neural network model in a cloud server and an edge server according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a comparison between an inference delay and a transmission delay for processing a key point sensing task according to two calculation schemes provided by the embodiment of the present invention;
FIG. 7 is a diagram illustrating a comparison of loss_keypoint when the ShareLayer is set to the first 0, 4 and 8 layers of keypoint_head, for the third loading batch, according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an internet of things scene sensing device based on cloud edge collaboration provided by an embodiment of the present invention;
fig. 9 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the internet of things scene perception method based on cloud edge coordination is described in detail below. As shown in fig. 1, which is a schematic flow chart of a scene sensing method of the internet of things based on cloud edge coordination according to an embodiment of the present invention, a specific implementation process includes the following steps:
step 101: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data.
In the embodiment of the present invention, before executing this step, the dynamic instance awareness model deployed in the edge server needs to be pre-trained based on the deep neural network model pre-deployed in the cloud server, and part of target parameters of each dynamic instance awareness model in each edge server are shared in the dynamic instance awareness model training process, so as to improve the convergence speed of the dynamic instance awareness model in a high dynamic scene.
As shown in fig. 2, the invention provides a cloud-edge-based collaborative neural network model training method for a human body posture key point recognition task in dynamic instance perception, which is used for training a deep neural network model for an internet of things scene at each edge server, namely a dynamic instance perception model; meanwhile, the dynamic instance perception model deployed in the edge server is pre-trained on the basis of a deep neural network model pre-deployed in the cloud server to reduce the load of the edge server.
In the embodiment of the invention, the deep neural network model deployed on the cloud server is called CloudNet, and the lightweight deep neural network model deployed on the edge server is called EdgeNet. The method trains the EdgeNet based on the scene information acquired in real time to obtain the dynamic instance perception model of the practical application.
Specifically, the keypoint_head (the head network for feature points) of the ROI head (the head network for a region of interest) in the EdgeNet is divided into two parts, namely a shared layer (ShareLayer) and an adaptive layer (AdaptLayer). The ShareLayer consists of the first m layers of the keypoint_head; it is used to extract features common to the instances in each sub-scene and is trained jointly by all EdgeNets. The AdaptLayer is the remainder of the keypoint_head; it is used to adapt to the distinct characteristics of the instances in each sub-scene and differs from one EdgeNet to another. Here, ROI (region of interest) refers to a region of interest in machine vision and image processing. As for obtaining the label data required for EdgeNet training, the label data can be generated automatically by CloudNet, because CloudNet has a large number of neural network layers and can achieve higher recognition precision. Compared with traditional manual labeling, automatically generating label data with CloudNet greatly reduces the workload and speeds up model training and updating.
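As an illustration of this split, the following sketch (assuming a PyTorch-style keypoint_head organized as an nn.Sequential of layers; the split index m and the helper names are illustrative, not the exact Detectron2 structure) shows how the first m layers could be separated out as the ShareLayer and the remainder kept as the AdaptLayer:

```python
import torch.nn as nn

def split_keypoint_head(keypoint_head: nn.Sequential, m: int = 4):
    """Split a keypoint_head into the ShareLayer (first m layers, trained jointly
    by all EdgeNets) and the AdaptLayer (remaining layers, scene-specific)."""
    layers = list(keypoint_head.children())
    share_layer = nn.Sequential(*layers[:m])
    adapt_layer = nn.Sequential(*layers[m:])
    return share_layer, adapt_layer

def share_layer_state(keypoint_head: nn.Sequential, m: int = 4):
    """Collect only the ShareLayer parameters, e.g. for uploading to the cloud server."""
    share_layer, _ = split_keypoint_head(keypoint_head, m)
    return {name: p.detach().clone() for name, p in share_layer.named_parameters()}
```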
In the implementation of the invention, the EdgeNet training process comprises an EdgeNet initialization process, an EdgeNet learning process and a ShareLayer update process.
The EdgeNet initialization process is as follows. Let $W_c$ denote the parameters of CloudNet, $W_{c,b}$ the part of the parameters shared between the CloudNet backbone network and the EdgeNet, and $W_{c,rs}$ the parameters of the ShareLayer in the ROI head of CloudNet; let $W_e$ denote the parameters of EdgeNet, $W_{e,b}$ the parameters of the backbone network in EdgeNet, and $W_{e,rs}$, $W_{e,ra}$ the parameters of the ShareLayer and the AdaptLayer in the ROI head of EdgeNet, respectively. Given a training set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, EdgeNet is trained to optimize a loss function of the form
$$\min_{W_e} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f(x_i; W_e), y_i\big).$$
In the EdgeNet initialization phase, CloudNet is first trained with a large amount of data and sent down to the edge server. After the edge server receives CloudNet, it first takes $W_{c,b}$ and $W_{c,rs}$ from CloudNet as the values of $W_{e,b}$ and $W_{e,rs}$ in EdgeNet, then initializes $W_{e,ra}$ to random values, and finally combines $W_{e,b}$, $W_{e,rs}$ and $W_{e,ra}$ into $W_e$, which is optimized to complete the EdgeNet training. To save computing resources, the parameters of the backbone part of EdgeNet are frozen during training and only the ROI head is fine-tuned. $\hat{W}_e$ denotes the value of $W_e$ after the update, and $\cup$ denotes the union of two parameter sets.
The specific steps of Algorithm 1 (EdgeNet initialization) are as follows. Input: the training set $\mathcal{D}$ and CloudNet with parameters $W_c$. Output: EdgeNet and the Label-builder. Step 1: the cloud server sends CloudNet (with parameter values $W_c$) to the edge server. Step 2: the edge server processes CloudNet and generates the three copies $W_c$, $W_{c,b}$ and $W_{c,rs}$. Step 3: the edge server connects the backbone network with the ShareLayer and the AdaptLayer in the ROI head to construct EdgeNet. Step 4: $W_{e,ra}$ is initialized to random values. Step 5: $W_e = W_{c,b} \cup W_{e,rs} \cup W_{e,ra}$. Step 6: EdgeNet is trained, yielding the updated parameters $\hat{W}_e$. Step 7: return EdgeNet with parameter values $\hat{W}_e$ and the Label-builder with parameter values $W_c$.
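A minimal sketch of Algorithm 1 follows; it assumes PyTorch-style models exposing backbone, share_layer and adapt_layer submodules, which is an illustrative simplification of the actual Detectron2 module layout:

```python
import copy
import torch

def init_edgenet(cloudnet, edgenet):
    """Algorithm 1 (sketch): initialize EdgeNet from CloudNet on the edge server."""
    # Steps 1-3: reuse the CloudNet backbone and ShareLayer weights (W_c,b and W_c,rs)
    edgenet.backbone.load_state_dict(cloudnet.backbone.state_dict(), strict=False)
    edgenet.share_layer.load_state_dict(cloudnet.share_layer.state_dict())
    # Step 4: the AdaptLayer (W_e,ra) starts from random values
    for p in edgenet.adapt_layer.parameters():
        torch.nn.init.normal_(p, std=0.01)
    # Freeze the backbone so that only the ROI head is fine-tuned (saves edge resources)
    for p in edgenet.backbone.parameters():
        p.requires_grad = False
    # Step 7: the Label-builder is a frozen copy of CloudNet used to generate labels
    label_builder = copy.deepcopy(cloudnet).eval()
    return edgenet, label_builder
```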
The EdgeNet learning process is as follows. The invention further trains EdgeNet using scene information acquired in real time, with the label data required for training generated by the Label-builder obtained in the EdgeNet initialization step. The label data generated by the Label-builder is assumed to be accurate. After receiving scene data, the edge server obtains a perception result using EdgeNet and stores it in a database; when the edge server is idle, it generates label data for the scene data using the Label-builder and trains EdgeNet based on this label data. Let $\hat{W}_e$ denote the parameters of EdgeNet at the end of the EdgeNet initialization stage. When the number of instances in newly acquired scenes accumulates to $M$, EdgeNet training starts; the newly acquired $M$ instances and their labels are recorded as $\mathcal{D}_M = \{(x_j, y_j)\}_{j=1}^{M}$, and EdgeNet is trained to optimize a loss function of the form
$$\min_{W_e} \frac{1}{M} \sum_{j=1}^{M} \mathcal{L}\big(f(x_j; W_e), y_j\big).$$
The specific steps of Algorithm 2 (EdgeNet learning) are as follows. Input: EdgeNet with parameters $\hat{W}_e$, the newly acquired instance set $\mathcal{D}_M$, and the Label-builder. Output: the updated EdgeNet. Step 1: the edge server obtains the labels $\{y_j\}$ of the scene instances $\{x_j\}$ using the Label-builder. Step 2: EdgeNet is trained on $\mathcal{D}_M$, yielding the updated parameters $\hat{W}_e'$. Step 3: return EdgeNet with parameter values $\hat{W}_e'$.
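The learning loop of Algorithm 2 could be sketched roughly as follows; the buffer threshold M, the optimizer settings and the training_loss helper are illustrative assumptions rather than values specified in the text:

```python
import torch

def edgenet_learning_step(edgenet, label_builder, scene_buffer, M=100, lr=1e-3):
    """Algorithm 2 (sketch): retrain EdgeNet once M new scene instances have accumulated."""
    if len(scene_buffer) < M:
        return edgenet  # keep accumulating newly acquired instances
    # Step 1: the Label-builder (frozen CloudNet copy) generates labels while the edge is idle
    with torch.no_grad():
        labels = [label_builder(x) for x in scene_buffer]
    # Step 2: fine-tune only the trainable (ROI head) parameters on the labelled data
    params = [p for p in edgenet.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr)
    for x, y in zip(scene_buffer, labels):
        loss = edgenet.training_loss(x, y)  # assumed helper returning the keypoint loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scene_buffer.clear()
    return edgenet
```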
The ShareLayer update process is as follows. The invention sets a shared layer ShareLayer (with parameters $W_{e,rs}$) in the ROI head of EdgeNet; this layer is trained jointly by all EdgeNets, in order to further accelerate the convergence of EdgeNet training and to extract features common to instances in different scenes. During real-time training, after each EdgeNet on each edge server has been trained a certain number of times, the parameters of the ShareLayer in its ROI head are extracted and uploaded to the cloud server, and the cloud server aggregates the gradients produced by the ShareLayer during the training of each EdgeNet using a preset FedAvg algorithm to update the ShareLayer. Let $\bar{W}_{rs}$ denote the updated ShareLayer parameters; the detailed ShareLayer update procedure is as follows.
Input: the ShareLayers $W^{(i)}_{e,rs}$ of the $N$ EdgeNets with their training batch sizes $N_i$, and the ShareLayer $W_{c,rs}$ of CloudNet. Output: the updated ShareLayer. Step 1: each edge server separates the ShareLayer parameters $W^{(i)}_{e,rs}$ from its EdgeNet and uploads them to the cloud server. Step 2: the gradients $g_i = W^{(i)}_{e,rs} - W_{c,rs}$ are calculated. Step 3: the weighted average gradient $\bar{g} = \sum_{i=1}^{N} \frac{N_i}{\sum_{k=1}^{N} N_k}\, g_i$ is calculated. Step 4: $\bar{W}_{rs} = W_{c,rs} + \bar{g}$. Step 5: return the updated ShareLayer $\bar{W}_{rs}$.
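A minimal sketch of the ShareLayer aggregation on the cloud server, following the FedAvg-style weighting described above (the state dictionaries are assumed to share identical keys; the weighting by training batch size is taken from the procedure, the rest is illustrative):

```python
def aggregate_sharelayer(cloud_share, edge_shares, batch_sizes):
    """FedAvg-style ShareLayer update (sketch).
    cloud_share:  dict of ShareLayer tensors held by the cloud server (W_c,rs)
    edge_shares:  list of dicts uploaded by the N edge servers (W_e,rs of each EdgeNet)
    batch_sizes:  list of the N training batch sizes N_i, used as aggregation weights
    """
    total = float(sum(batch_sizes))
    updated = {}
    for key, base in cloud_share.items():
        # gradient of edge server i relative to the cloud copy, weighted by its batch size
        weighted_grad = sum(
            (n / total) * (share[key] - base)
            for share, n in zip(edge_shares, batch_sizes)
        )
        updated[key] = base + weighted_grad
    return updated
```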
After the ShareLayer is updated, the cloud server issues the updated ShareLayer parameters to each edge server, and each edge server calls steps 3-7 of Algorithm 1 again to train its EdgeNet, thereby completing one cycle of EdgeNet initialization, EdgeNet learning and ShareLayer updating, and finally obtaining a dynamic instance perception model that satisfies the application conditions.
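Putting the three phases together, one full cycle could be orchestrated roughly as follows (a sketch reusing the helper functions from the earlier sketches; the attribute names on the edge-server objects and the number of local retraining rounds before a ShareLayer update are illustrative assumptions):

```python
def training_cycle(cloudnet, edge_servers, rounds_before_update=2):
    """One EdgeNet initialization / EdgeNet learning / ShareLayer update cycle (sketch)."""
    # EdgeNet initialization: every edge server bootstraps its EdgeNet from CloudNet
    for es in edge_servers:
        es.edgenet, es.label_builder = init_edgenet(cloudnet, es.edgenet)
    # EdgeNet learning: each edge server retrains on newly accumulated scene instances
    for _ in range(rounds_before_update):
        for es in edge_servers:
            es.edgenet = edgenet_learning_step(es.edgenet, es.label_builder, es.scene_buffer)
    # ShareLayer update: the cloud aggregates the shared parameters and redistributes them
    edge_shares = [share_layer_state(es.edgenet.keypoint_head) for es in edge_servers]
    sizes = [es.last_batch_size for es in edge_servers]
    merged = aggregate_sharelayer(share_layer_state(cloudnet.keypoint_head), edge_shares, sizes)
    for es in edge_servers:
        for name, p in es.edgenet.share_layer.named_parameters():
            p.data.copy_(merged[name])
    return edge_servers
```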
Scene instance extraction can be implemented in this step. Specifically, in the scene of the internet of things, the initial scene information includes two categories, namely, sensing data (such as voltage, temperature, humidity and the like) and image data (such as personnel, inspection robots, faulty equipment and the like), wherein the image data is divided into three categories, namely a static example, a dynamic example and an abnormal example. In the scene instance extraction stage, initial scene information is firstly classified to obtain static instances, dynamic instances, abnormal instances and sensing data.
The sensing data can be collected by preset sensing data acquisition equipment and are processed using a preset N-shot K-way small-sample fusion algorithm model. Specifically, for a piece of sensing data $x$, $K$ support-set classes are first constructed, each class containing $N$ samples $(S_1, \ldots, S_N)$. The objective of the perception algorithm model is to determine which support-set class the collected sensing data should belong to, with an optimization objective of the form
$$\hat{k} = \arg\max_{k \in \{1, \ldots, K\}} \mathrm{sim}\big(x, \{S^{(k)}_1, \ldots, S^{(k)}_N\}\big),$$
where $\mathrm{sim}(\cdot)$ measures the similarity between the sensing data and a support-set class. Through this N-shot K-way small-sample learning, the sensing data in the Internet of things scene are fused into $K$ classes, reducing the amount of computation required for subsequent analysis by upper-layer applications.
The image data are obtained by preset image acquisition equipment and comprise a plurality of example types, and the image data are divided into three types, namely static examples, dynamic examples and abnormal examples. The method adopts a Mask-RCNN model to detect and extract three types of target examples from image data and generate a Mask so as to complete perception processing by using a corresponding perception algorithm model (namely an example perception model) in subsequent steps.
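As an illustration of this extraction step, detection and mask generation with a pretrained Mask R-CNN could be done with Detectron2 roughly as follows; the mapping from COCO class ids to the static/dynamic/abnormal categories is a made-up placeholder, since in the patent the model would be trained on the corresponding Internet of things scene labels:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

CATEGORY_OF_CLASS = {0: "dynamic"}  # e.g. COCO "person" -> dynamic instance; placeholder mapping

def extract_instances(image_path):
    """Run Mask R-CNN and group the detections into static / dynamic / abnormal instances."""
    outputs = predictor(cv2.imread(image_path))
    instances = outputs["instances"].to("cpu")
    grouped = {"static": [], "dynamic": [], "abnormal": []}
    for cls, mask in zip(instances.pred_classes.tolist(), instances.pred_masks):
        grouped[CATEGORY_OF_CLASS.get(cls, "abnormal")].append(mask.numpy())
    return grouped
```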
Step 102: carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server.
In the specific implementation process of this step, the initial scene data may be input into a preset dynamic instance perception model for instance perception, so as to obtain a static instance, a dynamic instance, and an abnormal instance corresponding to the image data. The dynamic instance perception model is obtained by training with preset marking data as training samples.
It should be noted that static instances refer to relatively fixed objects in the Internet of things scene, such as factory buildings, generator sets and pipelines. In the embodiment of the invention, 3D models of the static instances can be preset in the edge server when implementing the scene instance perception process, and once a static instance in the scene is identified by the Mask-RCNN, the preset 3D model is directly called to participate in the fusion of the local scene. Dynamic instances refer to objects that change in real time in the Internet of things scene, such as workers; their 3D models are obtained by first detecting key points and then synthesizing the 3D model from the key point parameters (a sketch of this pipeline follows after this paragraph). Taking the generation of a 3D human body model as an example, the parameters of the key points of the human body are first detected using the keypoint_head branch of Mask RCNN, the SMPL (Skinned Multi-Person Linear Model) is then used to parameterize the human body model and generate the 3D human body model based on the detected parameters, and the perception of the human body instance (i.e. the dynamic instance) is thus completed. Abnormal instances are abnormal information in the Internet of things scene, such as illegal intrusion and faulty equipment; their characteristic is that their features cannot be predicted in advance, so their 3D models can only be perceived directly from scene images.
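The dynamic-instance pipeline mentioned above could be sketched as follows, using a pretrained Detectron2 keypoint model for the keypoint detection step; the fit_smpl_from_keypoints callable is a hypothetical placeholder, since fitting SMPL parameters to detected keypoints needs an additional regression or optimization model that the text does not detail:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

kp_cfg = get_cfg()
kp_cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
kp_cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
keypoint_predictor = DefaultPredictor(kp_cfg)

def perceive_dynamic_instances(image, fit_smpl_from_keypoints):
    """Detect human body keypoints, then hand them to an SMPL fitter (hypothetical callable)
    to obtain parameterized 3D human models for local-scene fusion."""
    instances = keypoint_predictor(image)["instances"].to("cpu")
    models_3d = []
    for kps in instances.pred_keypoints:  # one (17, 3) array of (x, y, score) per person
        models_3d.append(fit_smpl_from_keypoints(kps.numpy()))
    return models_3d
```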
According to the method, the abnormal instances are sensed through the Mesh-RCNN, the abnormal instances in the initial image are identified based on the Mask-RCNN, the sub-images of the abnormal instances are segmented, and then Mesh data of the instances are generated by the Mesh-RCNN based on the sub-images. The method is used for respectively training the Mesh-RCNN models for different types of abnormal instances, and when the abnormal instances are sensed, the corresponding Mesh-RCNN models are selected from the model base according to corresponding label data to generate Mesh data of the abnormal instances.
Fig. 3 shows the training differentiation process of Mesh-RCNN. Mask-RCNN is first trained based on labeled data so that it can detect several specific types of abnormal instances, which are marked with respective labels. In the initial state, the Mesh-RCNN model corresponding to each label is a generic model obtained by initialization from a pre-trained model deployed on the cloud server. When a certain number of instances corresponding to a label have accumulated, the mesh labels of these instances are obtained through a related labeling technique, and the Mesh-RCNN model corresponding to that label is trained based on the mesh labels. When a new type of abnormal instance needs to be detected, Mask-RCNN can be retrained based on labeled data so that the new abnormal instance can be detected, and the Mesh-RCNN training process is then repeated.
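A minimal sketch of the per-label model library described above; the model object and its clone/finetune/predict interface are stand-ins, since Mesh R-CNN is a separate codebase whose exact API is not reproduced here:

```python
class AnomalyMeshLibrary:
    """Keep one mesh-generation model per abnormal-instance label, specialized lazily."""

    def __init__(self, pretrained_generic_model, min_samples=50):
        self.generic = pretrained_generic_model   # generic model initialized from the cloud
        self.models = {}                          # label -> specialized Mesh-RCNN model
        self.buffers = {}                         # label -> accumulated (sub_image, mesh_label)
        self.min_samples = min_samples

    def add_labeled_instance(self, label, sub_image, mesh_label):
        self.buffers.setdefault(label, []).append((sub_image, mesh_label))
        if len(self.buffers[label]) >= self.min_samples:
            # Specialize a copy of the generic model once enough mesh labels are accumulated
            model = self.models.get(label, self.generic.clone())   # assumed helper
            model.finetune(self.buffers[label])                    # assumed training helper
            self.models[label] = model
            self.buffers[label] = []

    def generate_mesh(self, label, sub_image):
        """Select the Mesh-RCNN model matching the label and generate mesh data."""
        model = self.models.get(label, self.generic)
        return model.predict(sub_image)                            # assumed inference helper
```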
Step 103: and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
As shown in fig. 4, in this step, the static examples, the dynamic examples, the abnormal examples, and the sensing results of the sensing data are input into a local multi-example scene fusion model, the static examples, the dynamic examples, and the abnormal examples are combined to obtain corresponding three-dimensional scenes, and the sensing data are matched into the three-dimensional scenes to obtain the local scenes. Furthermore, at least one local scene can be fused at the cloud server end, so that a global scene can be synthesized based on the local scenes. Specifically, after the initial scene information is acquired by the acquisition device in step 101, a local scene is obtained by performing three stages of instance extraction, instance perception and scene fusion based on a pre-deployed model in the edge server. And finally, synthesizing each local scene at the cloud server side to obtain a global scene after each local scene is obtained based on the perception of each edge server.
In the scene synthesis process, the 3D models of the static examples, the dynamic examples and the abnormal examples are combined based on the edge server, the sensing data are matched into the three-dimensional scene, and the perception of the local scene is finally completed. It should be noted that, in the implementation process, two copies of the synthesized local scene are retained: one is stored in the edge server to provide real-time service for local users, and the other is uploaded to the cloud server to synthesize a global scene.
Specifically, denote the feature extraction process by F, the instance perception process by P, the instance synthesis process by B, the initial data by Ori, the sensing data in the Internet of things scene by Sen, the dynamic instances by Dym, the static instances by Sta, the abnormal instances by Gen, the local scene perception result by Loc and the global scene perception result by Gol. From a data-flow perspective, the multi-instance perception and fusion process of a local scene can be expressed as
$$\mathrm{Loc} = B\big(P(F(\mathrm{Ori}))\big), \qquad F(\mathrm{Ori}) = \{\mathrm{Sen}, \mathrm{Sta}, \mathrm{Dym}, \mathrm{Gen}\},$$
and the global scene composition process at the cloud server side can be expressed as
$$\mathrm{Gol} = B\big(\mathrm{Loc}_1, \ldots, \mathrm{Loc}_n\big).$$
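Expressed as a small composition of functions, the local and global fusion steps above might look like the following sketch, where all stage functions are placeholders standing for the models described earlier:

```python
def perceive_local_scene(ori, F, P, B):
    """Loc = B(P(F(Ori))): extract instances and sensing data, perceive each, fuse locally."""
    sen, sta, dym, gen = F(ori)                        # instance / sensing-data extraction
    perceived = P(sen=sen, sta=sta, dym=dym, gen=gen)  # per-type perception results
    return B(perceived)                                # local 3D scene: kept at the edge and uploaded

def compose_global_scene(local_scenes, B_cloud):
    """Gol = B(Loc_1, ..., Loc_n): the cloud server fuses the uploaded local scenes."""
    return B_cloud(local_scenes)
```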
the following is illustrated with reference to specific examples:
the specific process verifies that the cloud edge cooperation-based deep neural network model training method is verified in a human body posture key point recognition task. A cloud edge collaborative training method of Mask RCNN is constructed based on a available Decctron 2 framework in target detection, CloudNet and EdgeNet are keypoint _ RCNN _ R _50_ FPN _3x models provided based on Decctron 2, and detailed network structures of the models are shown in FIG. 5. The network structure of the CloudNet is the same as that of the original model, the backbone network of the EdgeNet reserves stem, res2 and res3 parts in the backbone network of the original model, and the rest parts are the same as that of the original model. ShareLayer extracting common characteristics of parameters is the front 4 layers of FPN, Box Head, Box _ predictor and keypoint _ Head of Mask RCNN, and AdapteLayer is the back 4 layers of keypoint _ Head. The preset image data with the human body posture key point labels are used as real-time scene data, and the real-time scene data acquisition process is simulated by loading the image data in batches in the training process.
The specific process mainly compares two aspects: the average scene perception delay, and the training effect of the cloud-edge collaborative training method.
Fig. 6 shows the processing delay and transmission delay of the human body posture key point detection task when it is executed by the cloud server and by the edge server, where the inference delay is actually measured by the simulation system, the transmission delay is summarized from online data reports, and the data in the figure are the transmission delays in a WiFi environment. Compared with a purely cloud computing processing mode, sinking the scene instance perception task to the edge server reduces the total processing delay by about 39.8%. The inference delay is reduced by 38.6%, and the transmission delay is reduced by 26.9%.
To verify the effect of the cloud-edge collaborative training method on EdgeNet training, the process of acquiring scene data and training EdgeNet in practice is simulated. Initially, the cloud server has already been deployed with a pre-trained CloudNet (the deep neural network model). The edge server first initializes EdgeNet (the initial dynamic instance perception model) based on a preset set of 100 images; images are then loaded in batches of 100, and after each batch has been loaded the edge server retrains EdgeNet. After the edge servers have retrained EdgeNet twice, they jointly update the ShareLayer with the cloud server, which sends the update result back to the edge servers; after one more batch of images is loaded, the edge servers train EdgeNet based on the merged ShareLayer. Thus, after three batches of images have been loaded, one round of the EdgeNet initialization, EdgeNet learning and ShareLayer update cycle is completed. The invention simulates 12 loading batches in total, i.e. the processing cycle is completed 4 times, and the dynamic instance perception model is finally obtained.
The invention examines how loss_keypoint changes with the number of iterations when the ShareLayer is set to the first 0, 4 and 8 layers of keypoint_head, respectively; setting the ShareLayer to the first 4 layers of keypoint_head corresponds to the method provided by the invention. Fig. 7 shows the comparison of loss_keypoint during EdgeNet training with the above three settings for the 3rd loading batch. As can be seen, setting ShareLayer_num to 4 yields the smallest loss_keypoint, which is 7.428% lower than with ShareLayer_num set to 0 and 4.856% lower than with ShareLayer_num set to 8.
Aiming at the problem that the existing cloud computing mode and edge computing mode struggle to meet the real-time and accuracy requirements of Internet of things scene perception, the invention provides an Internet of things scene perception method based on cloud-edge cooperation. First, the characteristics and perception requirements of scene data such as sensing data and image data in Internet of things scenes are analyzed, and a scene information perception method that distinguishes dynamic instances, static instances and abnormal instances is designed, supporting edge perception of local scene information and cloud synthesis of the global scene. Second, for the high-precision recognition network model (i.e. the dynamic instance perception model) for dynamic instances that change at high frequency in the scene, a deep neural network model training method based on cloud-edge cooperation is designed: the cloud assists the training of the neural network models on the edge servers and shares part of the parameters of each edge neural network model to improve the convergence speed of the models, yielding the dynamic instance perception model and effectively reducing perception processing delay and model training time.
By adopting the internet of things scene perception method based on cloud edge coordination, the time delay of the perception processing of the internet of things scene can be effectively reduced, and the adaptability and perception accuracy of the high-dynamic scene in the internet of things scene are improved.
Corresponding to the method for sensing the scene of the internet of things based on cloud-edge coordination, the invention further provides a device for sensing the scene of the internet of things based on cloud-edge coordination. Since the embodiment of the device is similar to the method embodiment described above, the description is relatively simple, and please refer to the description in the method embodiment section, and the following description of the embodiment of the internet of things scene sensing device based on cloud edge coordination is only illustrative. Fig. 8 is a schematic structural diagram of an internet of things scene sensing device based on cloud edge coordination according to an embodiment of the present invention.
The invention relates to an Internet of things scene sensing device based on cloud edge collaboration, which specifically comprises:
a data extraction unit 801, configured to acquire initial scene data to be perceived; the initial scene data comprises image data and sensing data;
an instance perception unit 802, configured to perform instance perception on the initial scene data based on a dynamic instance perception model, and determine a static instance, a dynamic instance, and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server;
and a scene synthesis unit 803, configured to input the static instance, the dynamic instance, the abnormal instance, and the sensing result of the sensing data into a local multi-instance scene fusion model for processing, so as to obtain a local scene output by the local multi-instance scene fusion model.
Further, the scene synthesis unit is specifically configured to:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
Further, the example sensing unit is specifically configured to: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
Further, the scene synthesis unit is specifically configured to: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
Further, the internet of things scene sensing device based on cloud edge coordination further includes, before acquiring initial scene data to be sensed: a model training unit;
the model training unit is used for pre-training the dynamic instance perception model deployed in the edge server based on a deep neural network model pre-deployed in the cloud server so as to realize sharing of part of target parameters of each dynamic instance perception model in each edge server in the dynamic instance perception model training process.
By adopting the Internet of things scene perception device based on cloud-edge coordination, static instances, dynamic instances and abnormal instances corresponding to the image data are determined by carrying out instance perception on the initial scene data, and the perception results of the static instances, the dynamic instances, the abnormal instances and the sensing data are input into the local multi-instance scene fusion model deployed in the edge server for processing, so that the local scene output by the local multi-instance scene fusion model is obtained; the Internet of things scene perception processing delay can thereby be effectively reduced, and the adaptability and perception accuracy for highly dynamic scenes in the Internet of things scene are improved.
Corresponding to the method for sensing the scene of the internet of things based on cloud edge cooperation, the invention further provides electronic equipment. Since the embodiment of the electronic device is similar to the above method embodiment, the description is simple, and please refer to the description of the above method embodiment, and the electronic device described below is only schematic. Fig. 9 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor (processor)901, a memory (memory)902 and a communication bus 903, wherein the processor 901 and the memory 902 complete communication with each other through the communication bus 903 and communicate with the outside through a communication interface 904. The processor 901 may invoke logic instructions in the memory 902 to perform a cloud-edge coordination-based internet of things scene awareness method, the method comprising: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server; and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
Furthermore, the logic instructions in the memory 902 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a Memory chip, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present invention further provide a computer program product, where the computer program product includes a computer program stored on a processor-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the method for scene awareness of the internet of things based on cloud edge coordination provided by the above-mentioned method embodiments. The method comprises the following steps: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server; and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
In another aspect, an embodiment of the present invention further provides a processor-readable storage medium, where a computer program is stored on the processor-readable storage medium, and when the computer program is executed by a processor, the computer program is implemented to perform the method for sensing a scene of an internet of things based on cloud edge coordination provided in the foregoing embodiments. The method comprises the following steps: acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data; carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model which is finished in advance based on cloud edge collaborative training and is deployed in an edge server; and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A scene perception method of the Internet of things based on cloud edge collaboration is characterized by comprising the following steps:
acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
carrying out instance perception on the initial scene data based on a dynamic instance perception model, and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
2. The internet of things scene perception method based on cloud edge collaboration according to claim 1, wherein the static examples, the dynamic examples, the abnormal examples and the perception results of the sensing data are input into a local multi-example scene fusion model to be processed, and a local scene output by the local multi-example scene fusion model is obtained, specifically including:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
3. The internet of things scene perception method based on cloud-edge collaboration as claimed in claim 1, wherein instance perception is performed on the initial scene data based on a dynamic instance perception model, and a static instance, a dynamic instance, and an abnormal instance corresponding to the image data are determined, specifically: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
4. The internet of things scene awareness method based on cloud-edge collaboration as claimed in claim 1, further comprising: and fusing at least one local scene at the cloud server side to synthesize a global scene based on the local scene.
5. The internet of things scene perception method based on cloud-edge collaboration as claimed in claim 1, further comprising, before obtaining initial scene data to be perceived:
and pre-training the dynamic instance perception model deployed in the edge server based on a pre-deployed deep neural network model in the cloud server so as to realize sharing of part of target parameters of each dynamic instance perception model in each edge server in the training process of the dynamic instance perception model.
6. An Internet of things scene perception device based on cloud edge coordination, characterized by comprising:
the data extraction unit is used for acquiring initial scene data to be perceived; the initial scene data comprises image data and sensing data;
the instance perception unit is used for carrying out instance perception on the initial scene data based on a dynamic instance perception model and determining a static instance, a dynamic instance and an abnormal instance corresponding to the image data; the dynamic instance perception model is a deep neural network model trained in advance based on cloud-edge collaboration and deployed in an edge server;
and the scene synthesis unit is used for inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model for processing to obtain a local scene output by the local multi-example scene fusion model.
7. The internet of things scene awareness apparatus based on cloud edge collaboration as claimed in claim 6, wherein the scene synthesis unit is specifically configured to:
and inputting the static examples, the dynamic examples, the abnormal examples and the sensing results of the sensing data into a local multi-example scene fusion model, combining the static examples, the dynamic examples and the abnormal examples to obtain corresponding three-dimensional scenes, and matching the sensing data into the three-dimensional scenes to obtain the local scenes.
8. The internet of things scene awareness apparatus based on cloud edge collaboration as claimed in claim 6, wherein the instance awareness unit is specifically configured to: and inputting the initial scene data into the dynamic instance perception model for instance perception to obtain a static instance, a dynamic instance and an abnormal instance corresponding to the image data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method for scene awareness of internet of things based on cloud edge coordination according to any one of claims 1 to 5.
10. A processor-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for scene awareness of internet of things based on cloud-edge collaboration as claimed in any one of claims 1 to 5.
CN202111478787.9A 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation Pending CN114299370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111478787.9A CN114299370A (en) 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111478787.9A CN114299370A (en) 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation

Publications (1)

Publication Number Publication Date
CN114299370A true CN114299370A (en) 2022-04-08

Family

ID=80966407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111478787.9A Pending CN114299370A (en) 2021-12-06 2021-12-06 Internet of things scene perception method and device based on cloud edge cooperation

Country Status (1)

Country Link
CN (1) CN114299370A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760214A (en) * 2022-04-21 2022-07-15 中国地质大学(北京) Service anomaly detection method based on edge-cloud cooperative network
CN114760214B (en) * 2022-04-21 2023-12-08 中国地质大学(北京) Service abnormality detection method based on edge-cloud cooperative network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination