CN110210378B - Embedded video image analysis method and device based on edge calculation

Info

Publication number
CN110210378B
CN110210378B
Authority
CN
China
Prior art keywords
convolution
current
target
neural network
convolution kernel
Prior art date
Legal status
Active
Application number
CN201910461504.6A
Other languages
Chinese (zh)
Other versions
CN110210378A (en)
Inventor
张江辉
马敏
田西兰
赵洪立
蔡红军
王曙光
夏勇
夏鹏
王斌
刘丽莎
吴昭
吴颖
李江涛
孙龙
吴涛
姜欢欢
刘海飞
常沛
张玉营
Current Assignee
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN201910461504.6A
Publication of CN110210378A
Application granted
Publication of CN110210378B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an embedded video image analysis method and device based on edge computing, applied to analyzing camera images in a video monitoring network, where the video monitoring network comprises a plurality of cameras connected to a monitoring center. The method comprises the following steps: recognizing a preset target in the video captured by a camera; for a preset target recognized in the video, acquiring attribute features of the preset target and/or scene attribute features of the preset target, where the attribute features of the preset target include its type, position in the image, quantity and the like, and the scene attribute features of the preset target include one or a combination of the shooting time, shooting place and shooting angle of the original image; and uploading the acquired attribute features and scene attribute features of the preset target to the monitoring center corresponding to the camera, for constructing a video big-data analysis application system. Applying the embodiments of the invention saves cost.

Description

Embedded video image analysis method and device based on edge calculation
Technical Field
The invention relates to an image identification method and device, in particular to an embedded video image analysis method and device based on edge calculation.
Background
Images, and video images in particular, carry a richness of information that other information acquisition means can hardly match, making them the most intuitive and reliable source of information for humans and a long-standing focus of attention and reliance. With the development of social security technology, monitoring networks based on surveillance video images play an important role in fields such as security and traffic. At present, large numbers of video surveillance cameras are deployed on urban roads, on expressways, in shopping malls, at stations and in many other places, forming surveillance video networks with wide coverage. The vast number of video images these networks generate every day accumulates into large-scale surveillance video data resources. However, because an image itself is unstructured information, it cannot be mined directly with big-data technology, so a large volume of information-rich surveillance video images cannot be processed in real time and effectively utilized. In most cases, the analysis and interpretation of surveillance video images still relies mainly on manual work, which is inefficient and cannot meet scenarios with high timeliness requirements for security monitoring, such as real-time monitoring of the overall road traffic situation during urban rush hours, or the rapid tracking and detection of vehicles and persons involved in violent crimes. In such scenarios, surveillance video images over a wide area must be analyzed and processed in real time to form a more complete and clear situation picture, providing accurate and powerful information support for traffic control and case-investigation decisions.
To address the above problems, the conventional approach is to build a "cloud" computing center in the background and analyze the video images with the strong computing power of the "cloud" center. However, a video camera can generate tens of megabits of video data per second, and when the raw video images produced by thousands or tens of thousands of deployed cameras come together, the resulting data volume not only poses a huge challenge to the transmission capability of the video monitoring network but also easily overwhelms the computing capability of the cloud center. This approach therefore requires a major overhaul and upgrade of the existing video data transmission network and a substantial increase in the computing capability of the cloud computing center, which in turn leads to high cost.
Therefore, the prior art suffers from the technical problem that upgrading a traditional surveillance video system is costly.
Disclosure of Invention
The invention provides an embedded video image analysis method and device based on edge computing, the device being an edge computing device, to solve the prior-art technical problem that upgrading a traditional surveillance video system is costly.
The invention solves the technical problems through the following technical scheme:
the embodiment of the invention provides an embedded video image analysis method based on edge calculation, which is applied to a camera in a video monitoring network, wherein the video monitoring network comprises a plurality of cameras in communication connection with a monitoring center, and the method comprises the following steps:
identifying a preset target from a video shot by a camera, wherein the preset target comprises: one or a combination of a person, a vehicle, a building;
the method comprises the steps of acquiring attribute features of a preset target and/or scene attribute features of the preset target aiming at the preset target identified from a video, wherein the attribute features of the preset target comprise: the method comprises the steps that a preset target is a vehicle, and the type, body color, license plate, vehicle position and the like of the vehicle are identified; the preset target is a person, and the gender, age, clothing, position and the like of the person are identified; the preset target is one of buildings, and the type, the position and the like of the preset target are identified; the scene attribute characteristics of the preset target include: one or a combination of shooting time, shooting place and shooting angle of the original image;
and uploading the acquired attribute characteristics of the preset target and the scene attribute characteristics of the preset target to a monitoring center corresponding to the camera.
Optionally, before the preset target is identified from the video captured by the camera, the method further includes:
the method comprises the steps of acquiring an original image in video stream data shot by a camera, and taking the original image as a video shot by the camera.
Optionally, the acquiring an original image in video stream data captured by a camera includes:
acquiring model data of a camera, and searching a video coding format of the camera from a pre-stored model data-video coding format list according to the model data of the camera;
and decoding the video stream data shot by the camera by using a decoding method corresponding to the video coding format, and restoring an original image shot by the camera.
Optionally, the identifying a preset target from a video shot by a camera includes:
the method comprises the following steps that an ARM is used as a main control unit, an FPGA is used as a core acceleration unit to construct a hardware computing framework for identifying a preset target; based on the hardware architecture, a preset target contained in each original image China is identified by utilizing a pre-constructed convolutional neural network model, wherein the preset target comprises: one or a combination of a person, a vehicle, a building.
Optionally, the process of constructing the pre-constructed target convolutional neural network is as follows:
constructing an initial convolutional neural network having an input layer, a convolutional layer, a pooling layer, a fully-connected layer and an output layer, and training it;
acquiring a conversion matrix aiming at pruning operation according to the number of convolution kernels in a target convolution neural network obtained after preset pruning and the number of convolution kernels in the constructed initial convolution neural network;
acquiring the minimum reconstruction error of each convolution kernel in the initial convolution neural network according to the conversion matrix and the weight of each convolution kernel;
and eliminating the convolution kernels of which the corresponding minimum reconstruction errors exceed a preset numerical range to obtain the constructed target convolution neural network.
Optionally, the obtaining a conversion matrix for the pruning operation according to the number of convolution kernels in the target convolution neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolution neural network includes:
according to the number of convolution kernels in the target convolutional neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolutional neural network, a transformation matrix for the pruning operation is obtained using the formula
Y = (N × c × k_h × k_w)^{-1} · n × c × k_h × k_w
wherein Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the corresponding feature map; k_h × k_w is the size of the convolution kernel; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.
Optionally, the obtaining a minimum reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weight of each convolution kernel includes:
based on the transformation matrix and the weights of the respective convolution kernels, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network using the formula
min_{β,W} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2,  subject to ||β||_0 ≤ c′
wherein min denotes minimization; β is the selection vector coefficient corresponding to the channels, of length c; β_i is the indicator for the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W_i^T is the transpose of the weight matrix slice of the i-th channel; c′ is the number of channels retained after pruning; c is the number of channels of the corresponding feature map; and ||·||_0 is the zero norm.
Optionally, the obtaining a minimum reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weight of each convolution kernel includes:
for each convolution kernel, based on the transformation matrix and the weights of the respective convolution kernels, obtaining the reconstruction error of each convolution kernel in the initial convolutional neural network using the formula
min_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′ and, for any i, ||W_i||_F = 1
wherein β is the selection vector coefficient corresponding to the channels, of length c; β_i is the indicator for the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W_i^T is the transpose of the weight matrix slice of the i-th channel; λ is the penalty coefficient; ||·||_1 is the L1 norm; i denotes any channel index; c′ is the number of channels retained after pruning; c is the number of channels of the corresponding feature map; and ||·||_0 is the zero norm.
Optionally, the removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, to obtain the constructed target convolutional neural network, includes:
taking the initial convolutional neural network as the current network model, and, for each convolution kernel in the current convolutional layer of the current network model, removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range;
for each convolution kernel remaining after the removal, keeping the weight matrix of the convolution kernel unchanged, and obtaining the current value of the selection vector coefficient corresponding to the channels of length c as
argmin_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′
wherein argmin returns the value of β that minimizes the expression;
determining whether ||β||_0 converges;
if so, obtaining the weights of the convolution kernels corresponding to the minimized reconstruction error as
argmin_{W} || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2
taking the current value of the selection vector coefficient corresponding to the channels of length c and the weights of the convolution kernels corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weights of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weights;
if not, updating the penalty coefficient by a preset step and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channels of length c, until ||β||_0 converges;
and taking the updated current network model as the current network model, taking the next convolutional layer after the current convolutional layer as the current convolutional layer, returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned, and taking the pruned current network model as the target convolutional neural network model.
Optionally, the removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, to obtain the constructed target convolutional neural network, includes:
taking the initial convolutional neural network as the current network model, and, for each convolution kernel in the current convolutional layer of the current network model, removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range;
for each convolution kernel remaining after the removal, obtaining the current value of the selection vector coefficient corresponding to the channels of length c as
argmin_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′
wherein argmin returns the value of β that minimizes the expression;
obtaining the current weights of the convolution kernels corresponding to the reconstruction error as
argmin_{W} || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2
judging whether the reconstruction error corresponding to the current value of the selection vector coefficient and the current weights of the convolution kernels converges;
if so, taking the current value of the selection vector coefficient corresponding to the channels of length c and the weights of the convolution kernels corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weights of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weights;
if not, updating the penalty coefficient by a preset step and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channels of length c, until the reconstruction error corresponding to the current value of the selection vector coefficient and the current weights of the convolution kernels converges;
and taking the updated current network model as the current network model, taking the next convolutional layer after the current convolutional layer as the current convolutional layer, returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned, and taking the pruned current network model as the target convolutional neural network model.
Optionally, the step of using the pruned current network model as a target convolutional neural network model includes:
quantizing the model parameters in the pruned current network model by using a linear quantization algorithm, and converting 32-bit floating point numbers into 8-bit integers;
coding the current network model after the model parameters are quantized by using a Huffman coding algorithm;
and taking the coded current network model as a target convolutional neural network model.
Optionally, when using the pre-trained convolutional neural network model for identification, the n × m convolutional kernel operation is split into n × m multiplication operations and n × m-1 addition operations, and,
when n x m is an odd number, taking n x m-1 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-1 times of addition operation is completed to obtain an operation result of an n x m convolution kernel;
when n x m is an even number, taking n x m-2 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-2 times of addition operation is completed; and summing the sum of the n x m-2 times of addition operation and the addition operation which does not participate in the operation to obtain an operation result of the n x m convolution kernel.
The embodiment of the invention provides an embedded video image analysis device based on edge calculation, which is applied to a camera in a video monitoring network, wherein the video monitoring network comprises a plurality of cameras in communication connection with a monitoring center, and the device comprises:
the identification module is used for identifying a preset target from a video shot by a camera, wherein the preset target comprises: one or a combination of a person, a vehicle, a building;
the first obtaining module is configured to, for a preset target identified from a video, obtain an attribute feature of the preset target and/or a scene attribute feature of the preset target, where the attribute feature of the preset target includes: when the preset target is a vehicle, one or a combination of the type, the body color, the license plate and the position of the vehicle; when the preset target is a person, one or a combination of the sex, the age, the clothing and the position of the person; when the preset target is a building, one or a combination of the position and the type of the building; the scene attribute characteristics of the preset target include: one or a combination of shooting time, shooting place and shooting angle of the original image;
and the uploading module is used for uploading the acquired attribute characteristics of the preset target and the scene attribute characteristics of the preset target to a monitoring center corresponding to the camera.
Optionally, the embodiment of the present invention further includes: and the second acquisition module is used for acquiring an original image in the video stream data shot by the camera and taking the original image as a video shot by the camera.
Optionally, the second obtaining module is configured to:
acquiring the model data of a camera, and searching the video coding format of the camera from a pre-stored model data-video coding format list according to the model data of the camera;
and decoding the video stream data shot by the camera by using a decoding method corresponding to the video coding format, and restoring an original image shot by the camera.
Optionally, the identification module is configured to:
the method comprises the following steps that an ARM is used as a main control unit, an FPGA is used as a core acceleration unit to construct a hardware computing architecture for recognizing a preset target; based on the hardware architecture, a preset target contained in each original image China is identified by utilizing a pre-constructed convolutional neural network model, wherein the preset target comprises: one or a combination of a person, a vehicle, a building.
Optionally, the process of constructing the pre-constructed target convolutional neural network is as follows:
constructing an initial convolutional neural network having an input layer, a convolutional layer, a pooling layer, a fully-connected layer and an output layer, and training it;
acquiring a conversion matrix aiming at pruning operation according to the number of convolution kernels in a target convolution neural network obtained after preset pruning and the number of convolution kernels in the constructed initial convolution neural network;
acquiring the minimum reconstruction error of each convolution kernel in the initial convolution neural network according to the conversion matrix and the weight of each convolution kernel;
and eliminating the convolution kernels of which the corresponding minimum reconstruction errors exceed the preset numerical range to obtain the constructed target convolution neural network.
Optionally, the obtaining a conversion matrix for the pruning operation according to the number of convolution kernels in the target convolution neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolution neural network includes:
according to the number of convolution kernels in the target convolutional neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolutional neural network, a transformation matrix for the pruning operation is obtained using the formula
Y = (N × c × k_h × k_w)^{-1} · n × c × k_h × k_w
wherein Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the corresponding feature map; k_h × k_w is the size of the convolution kernel; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.
Optionally, the obtaining a minimum reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weight of each convolution kernel includes:
based on the transformation matrix and the weights of the respective convolution kernels, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network using the formula
min_{β,W} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2,  subject to ||β||_0 ≤ c′
wherein min denotes minimization; β is the selection vector coefficient corresponding to the channels, of length c; β_i is the indicator for the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W_i^T is the transpose of the weight matrix slice of the i-th channel; c′ is the number of channels retained after pruning; c is the number of channels of the corresponding feature map; and ||·||_0 is the zero norm.
Optionally, the obtaining, according to the transformation matrix and the weight of each convolution kernel, a minimized reconstruction error of each convolution kernel in the initial convolutional neural network includes:
for each convolution kernel, based on the transformation matrix and the weights of the respective convolution kernels, obtaining the reconstruction error of each convolution kernel in the initial convolutional neural network using the formula
min_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′ and, for any i, ||W_i||_F = 1
wherein β is the selection vector coefficient corresponding to the channels, of length c; β_i is the indicator for the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W_i^T is the transpose of the weight matrix slice of the i-th channel; λ is the penalty coefficient; ||·||_1 is the L1 norm; i denotes any channel index; c′ is the number of channels retained after pruning; c is the number of channels of the corresponding feature map; and ||·||_0 is the zero norm.
Optionally, the removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, to obtain the constructed target convolutional neural network, includes:
taking the initial convolutional neural network as the current network model, and, for each convolution kernel in the current convolutional layer of the current network model, removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range;
for each convolution kernel remaining after the removal, keeping the weight matrix of the convolution kernel unchanged, and obtaining the current value of the selection vector coefficient corresponding to the channels of length c as
argmin_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′
wherein argmin returns the value of β that minimizes the expression;
determining whether ||β||_0 converges;
if so, obtaining the weights of the convolution kernels corresponding to the minimized reconstruction error as
argmin_{W} || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2
taking the current value of the selection vector coefficient corresponding to the channels of length c and the weights of the convolution kernels corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weights of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weights;
if not, updating the penalty coefficient by a preset step and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channels of length c, until ||β||_0 converges;
and taking the updated current network model as the current network model, taking the next convolutional layer after the current convolutional layer as the current convolutional layer, returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned, and taking the pruned current network model as the target convolutional neural network model.
Optionally, the removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, to obtain the constructed target convolutional neural network, includes:
taking the initial convolutional neural network as the current network model, and, for each convolution kernel in the current convolutional layer of the current network model, removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range;
for each convolution kernel remaining after the removal, obtaining the current value of the selection vector coefficient corresponding to the channels of length c as
argmin_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′
wherein argmin returns the value of β that minimizes the expression;
obtaining the current weights of the convolution kernels corresponding to the reconstruction error as
argmin_{W} || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2
judging whether the reconstruction error corresponding to the current value of the selection vector coefficient and the current weights of the convolution kernels converges;
if so, taking the current value of the selection vector coefficient corresponding to the channels of length c and the weights of the convolution kernels corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weights of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weights;
if not, updating the penalty coefficient by a preset step and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channels of length c, until the reconstruction error corresponding to the current value of the selection vector coefficient and the current weights of the convolution kernels converges;
and taking the updated current network model as the current network model, taking the next convolutional layer after the current convolutional layer as the current convolutional layer, returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned, and taking the pruned current network model as the target convolutional neural network model.
Optionally, the step of using the pruned current network model as a target convolutional neural network model includes:
quantizing the model parameters in the pruned current network model by using a linear quantization algorithm;
coding the current network model after the model parameters are quantized by using a Huffman coding algorithm;
and taking the coded current network model as a target convolutional neural network model.
Optionally, when using the pre-trained convolutional neural network model for identification, the n × m convolutional kernel operation is split into n × m multiplication operations and n × m-1 addition operations, and,
when n x m is an odd number, taking the n x m-1 addition operations as the current operations, summing every two operations in the current operations to obtain summed operation results, taking the summed operation results as the current operations, and returning to the step of summing every two operations in the current operations, until the n x m-1 addition operations have all been summed, to obtain the operation result of the n x m convolution kernel;
when n x m is an even number, taking n x m-2 of the addition operations as the current operations, summing every two operations in the current operations to obtain summed operation results, taking the summed operation results as the current operations, and returning to the step of summing every two operations in the current operations, until the n x m-2 addition operations have all been summed; and summing the sum of the n x m-2 addition operations with the addition operation that did not participate, to obtain the operation result of the n x m convolution kernel.
Compared with the prior art, the invention has the following advantages:
by applying the embodiments of the invention, the target attribute features and scene attribute features are extracted from the captured video stream data at the camera end, which avoids the need for the cloud computing center to simultaneously receive, store, analyze and compute thousands of video channels, reduces the requirements for upgrading the transmission bandwidth and the computing capability of the cloud computing center, and thus saves cost.
Drawings
Fig. 1 is a schematic flowchart of an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating the principle of an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating convolutional neural network compression in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a pruning flow of the initial convolutional neural network in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 5 is another schematic diagram of the initial convolutional neural network pruning flow in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 6 is a schematic diagram of the data flow before and after network parameter quantization in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 7 is a schematic diagram of the FPGA implementation flow in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating the convolution acceleration operation in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating the pooling acceleration operation in an embedded video image analysis method based on edge computing according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an embedded video image analysis device based on edge computing according to an embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
The embodiments of the invention provide an embedded video image analysis method and device based on edge computing. The embedded video image analysis method based on edge computing provided by the embodiments of the invention is introduced first.
It should be noted that the embodiments of the present invention are preferably applied to analyzing the image content of cameras in existing large video monitoring networks for security, transportation and the like, where such a network generally includes a plurality of cameras in communication connection with a monitoring center.
Example 1
Fig. 1 is a schematic flowchart of an embedded video image analysis method based on edge computing according to an embodiment of the present invention, and fig. 2 is a schematic diagram of the principle of the embedded video image analysis method based on edge computing according to an embodiment of the present invention; as shown in figs. 1 and 2, the method includes:
s101: identifying a preset target from a video shot by a camera, wherein the preset target comprises: one or a combination of a person, a vehicle, a building.
Fig. 3 is a schematic diagram illustrating the compression flow of the target convolutional neural network in an embedded video image analysis method based on edge computing according to an embodiment of the present invention; fig. 4 is a schematic diagram illustrating a pruning flow of the initial convolutional neural network in an embedded video image analysis method based on edge computing according to an embodiment of the present invention; fig. 5 is another schematic diagram of the initial convolutional neural network pruning flow in an embedded video image analysis method based on edge computing according to an embodiment of the present invention. As shown in figs. 3 and 4,
specifically, the step may include the following steps:
a: an initial convolutional neural network with an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer is constructed and trained.
It can be understood that a convolutional neural network with a common structure can be used; a training set composed of monitoring images is then used to train the constructed convolutional neural network, which automatically adjusts its weight parameters according to the training set, while the hyper-parameters of the convolutional neural network are adjusted manually. Training of the convolutional neural network is thereby completed and the initial convolutional neural network is obtained.
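As an illustration of step A, the following Python sketch (assuming PyTorch, with layer sizes, class count and training settings chosen purely as assumptions rather than values taken from the patent) shows one way an initial convolutional neural network with input, convolutional, pooling, fully-connected and output layers could be constructed and trained on labelled monitoring images.

```python
# Illustrative sketch only: a minimal CNN with input, convolution, pooling,
# fully-connected and output layers, trained on labelled surveillance images.
# Layer sizes, class count and optimiser settings are assumptions.
import torch
import torch.nn as nn

class InitialConvNet(nn.Module):
    def __init__(self, num_classes=3):          # e.g. person / vehicle / building
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # pooling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(         # fully-connected + output layer
            nn.Flatten(),
            nn.Linear(64 * 56 * 56, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                        # x: (batch, 3, 224, 224)
        return self.classifier(self.features(x))

def train_initial_network(model, loader, epochs=10, lr=1e-3):
    """Adjust the weight parameters from the training set; hyper-parameters
    (epochs, lr) are tuned manually, as described in the text."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimiser.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimiser.step()
    return model
```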
B: and acquiring a conversion matrix aiming at pruning operation according to the number of convolution kernels in the target convolution neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolution neural network.
Specifically, according to the number n of convolution kernels in the target convolutional neural network obtained after the preset pruning and the number N of convolution kernels in the constructed initial convolutional neural network, the transformation matrix for the pruning operation is obtained using the formula
Y = (N × c × k_h × k_w)^{-1} · n × c × k_h × k_w
wherein Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the corresponding feature map; k_h × k_w is the size of the convolution kernel; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.
C: and acquiring the minimum reconstruction error of each convolution kernel in the initial convolution neural network according to the conversion matrix and the weight of each convolution kernel.
Specifically, the C step may be a C1 step or a C2 step.
C1: based on the transformation matrix Y and the weights of the respective convolution kernels, the minimized reconstruction error of each convolution kernel in the initial convolutional neural network is obtained using the formula
min_{β,W} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2,  subject to ||β||_0 ≤ c′
wherein min denotes minimization; β is the selection vector coefficient corresponding to the channels, of length c; β_i is the indicator for the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W_i^T is the transpose of the weight matrix slice of the i-th channel; c′ is the number of channels retained after pruning; c is the number of channels of the corresponding feature map; and ||·||_0 is the zero norm.
C2: for each convolution kernel, based on the transformation matrix Y and the weights of the respective convolution kernels, the reconstruction error of each convolution kernel in the initial convolutional neural network is obtained using the formula
min_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′ and, for any i, ||W_i||_F = 1
wherein β is the selection vector coefficient corresponding to the channels, of length c; β_i is the indicator for the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W_i^T is the transpose of the weight matrix slice of the i-th channel; λ is the penalty coefficient; ||·||_1 is the L1 norm; i denotes any channel index; c′ is the number of channels retained after pruning; c is the number of channels of the corresponding feature map; and ||·||_0 is the zero norm.
In practical applications, the method of obtaining the minimized reconstruction error is also called the LASSO regression method.
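As a hedged illustration of this LASSO-style step, the following Python sketch solves the relaxed channel-selection problem with scikit-learn's Lasso solver. The variable names, array shapes and the use of scikit-learn are assumptions made for illustration and are not prescribed by the patent.

```python
# Illustrative sketch of the LASSO-style channel selection described above.
# X_slices[i] is the slice matrix X_i of the i-th channel (N samples x k_h*k_w),
# W[i] is the matching weight slice (n kernels x k_h*k_w), and Y is the
# response matrix (N x n) that the pruned layer should reproduce.
import numpy as np
from sklearn.linear_model import Lasso

def solve_beta(X_slices, W, Y, lam):
    """Solve the relaxed problem: min_beta ||Y - sum_i beta_i X_i W_i^T||^2 + lam*||beta||_1."""
    c = len(X_slices)
    # Column i holds (X_i @ W_i^T) flattened, so Y ~ Z @ beta is exactly the
    # channel-wise sum inside the reconstruction-error formula above.
    Z = np.stack([(X_slices[i] @ W[i].T).ravel() for i in range(c)], axis=1)
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    lasso.fit(Z, Y.ravel())
    return lasso.coef_          # beta: one coefficient per channel
```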
D: and eliminating the convolution kernels of which the corresponding minimum reconstruction errors exceed a preset numerical range to obtain the constructed target convolution neural network.
Specifically, the D step may be a D1 step or a D2 step.
D1: taking the initial convolutional neural network as the current network model, and, for each convolution kernel in the current convolutional layer of the current network model, removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range;
for each convolution kernel remaining after the removal, keeping the weight matrix of the convolution kernel unchanged, and obtaining the current value of the selection vector coefficient corresponding to the channels of length c as
argmin_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′
wherein argmin returns the value of β that minimizes the expression;
determining whether ||β||_0 converges;
if so, obtaining the weights of the convolution kernels corresponding to the minimized reconstruction error as
argmin_{W} || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2
taking the current value of the selection vector coefficient corresponding to the channels of length c and the weights of the convolution kernels corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weights of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weights;
if not, updating the penalty coefficient by a preset step and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channels of length c, until ||β||_0 converges;
D2: taking the initial convolutional neural network as the current network model, and, for each convolution kernel in the current convolutional layer of the current network model, removing the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range;
for each convolution kernel remaining after the removal, obtaining the current value of the selection vector coefficient corresponding to the channels of length c as
argmin_{β} (1/(2N)) · || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2 + λ · ||β||_1,  subject to ||β||_0 ≤ c′
wherein argmin returns the value of β that minimizes the expression;
obtaining the current weights of the convolution kernels corresponding to the reconstruction error as
argmin_{W} || Y − Σ_{i=1}^{c} β_i · X_i · W_i^T ||_F^2
judging whether the reconstruction error corresponding to the current value of the selection vector coefficient and the current weights of the convolution kernels converges;
if so, taking the current value of the selection vector coefficient corresponding to the channels of length c and the weights of the convolution kernels corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weights of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weights;
if not, updating the penalty coefficient by a preset step and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channels of length c, until the reconstruction error corresponding to the current value of the selection vector coefficient and the current weights of the convolution kernels converges;
and taking the updated network model as the current network model, taking the next convolutional layer after the current convolutional layer as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the convolution kernels whose corresponding minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned, thereby simplifying the structure of the network model and effectively reducing the amount of computation.
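The alternating procedure of steps D1 and D2 could, under the same assumptions as the sketch above, look roughly like the following Python sketch: β is re-solved with the penalty raised by a preset step until ||β||_0 meets the channel budget, after which the retained weights are refitted by least squares. solve_beta() is the hypothetical helper sketched earlier; the step size and convergence test are illustrative assumptions.

```python
# Illustrative sketch of the alternating pruning loop for one convolutional
# layer: W fixed -> solve beta (LASSO); penalty raised until ||beta||_0 fits
# the budget c'; then beta fixed -> refit the surviving weights.
import numpy as np

def prune_conv_layer(X_slices, W, Y, c_keep, lam=1e-4, step=2.0, max_iter=50):
    beta = np.zeros(len(X_slices))
    for _ in range(max_iter):
        beta = solve_beta(X_slices, W, Y, lam)          # W fixed, solve beta
        if np.count_nonzero(beta) <= c_keep:            # ||beta||_0 has converged
            break
        lam *= step                                     # update penalty by a preset step
    kept = np.flatnonzero(beta)
    # beta fixed: refit the retained weights by least squares,
    # min_W ||Y - X' W'^T||_F^2 with X' built from the kept, beta-scaled slices.
    X_prime = np.concatenate([beta[i] * X_slices[i] for i in kept], axis=1)
    W_prime, *_ = np.linalg.lstsq(X_prime, Y, rcond=None)
    return kept, W_prime.T      # retained channel indices and refitted weights
```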
Fig. 6 is a schematic diagram of the data flow before and after quantization in an embedded video image analysis method based on edge computing according to an embodiment of the present invention. As shown in fig. 6, a linear quantization algorithm is used to quantize the weight parameters of the pruned current network model; the quantized model parameters are then coded with a Huffman coding algorithm, and the coded current network model is taken as the final target convolutional neural network model. The storage size of the network model after quantization and coding is smaller, which reduces the storage requirement.
In practical applications, the parameters of a trained model are stored as 32-bit floating-point numbers, and a model trained on a large CNN occupies hundreds of megabits of storage space; the model can therefore be compressed further by changing the parameter storage mode, namely by a parameter quantization compression algorithm. In practical use of the algorithm, the quantization parameters need to be set according to the characteristics of the network structure. The quantization method can be as follows:
counting the maximum and minimum values of the parameters, dividing all parameters by the difference between the maximum and the minimum, multiplying the resulting quotients by 256 so as to map the values onto the interval 0-255, and obtaining 8-bit parameters after quantization, thereby converting 32-bit floating-point numbers into 8-bit integers. In practical applications, the quantization algorithm may also be a non-linear quantization algorithm, etc.
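A minimal Python sketch of such a linear quantization step is given below; it follows a common variant that also subtracts the minimum value before scaling so that negative weights land in the 0-255 range, which is an assumption rather than the exact mapping recited above.

```python
# Illustrative sketch: 32-bit floating-point parameters are mapped onto the
# 0-255 interval and stored as 8-bit integers, together with the scale and
# offset needed to recover approximate values at inference time.
import numpy as np

def quantize_linear(weights: np.ndarray):
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)   # 8-bit parameters
    return q, scale, w_min

def dequantize_linear(q: np.ndarray, scale: float, w_min: float):
    return q.astype(np.float32) * scale + w_min
```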
Embodiments of the present invention exploit the non-uniform distribution of the effective weights by quantizing the weights and applying variable-length coding, i.e., Huffman coding, so that the weights are represented with variable-length codes without loss of training accuracy.
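For illustration only, the following Python sketch builds a Huffman code table over the quantized 8-bit weights (for example the array q from the previous sketch); the table-building routine is a generic textbook construction, not code taken from the patent.

```python
# Illustrative sketch of Huffman-coding the quantised 8-bit weights: frequent
# values receive short codes, exploiting the non-uniform weight distribution.
import heapq
from collections import Counter

def huffman_code_table(symbols):
    """Build {symbol: bitstring} from an iterable of quantised weight values."""
    heap = [[freq, idx, [sym, ""]] for idx, (sym, freq)
            in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    idx = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]      # left branch
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]      # right branch
        heapq.heappush(heap, [lo[0] + hi[0], idx] + lo[2:] + hi[2:])
        idx += 1
    return {sym: code for sym, code in heap[0][2:]}

# Usage sketch: table = huffman_code_table(q.ravel())
#               encoded = "".join(table[s] for s in q.ravel())
```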
E: deploying the target convolutional neural network on an embedded hardware platform; the embedded computing platform then identifies the preset target according to the content contained in each image of the video images captured by the camera.
In practical applications, the preset target to be identified in this step may be a target in a preset target list set manually, and the preset target list may be updated manually by an operator, or may be automatically identified by a system, and then automatically added.
In general, important targets are, for example, a person whose range of activity exceeds a set range, a person who enters a restricted area, a vehicle, a person wearing special clothing, or the like.
As shown in fig. 4, after a certain convolution kernel is culled, redundant neurons corresponding to the convolution kernel should be removed to simplify the network structure.
By applying the embodiments of the invention, the method compresses the weights of the convolutional neural network, thereby reducing and simplifying the structure of the network, lowering the storage required for the weight parameters, and enabling the target convolutional neural network to achieve the same operating speed and an equivalent target detection and recognition effect in an embedded environment with fewer computing resources and less storage.
S102: for a preset target identified from the video, obtaining attribute features of the preset target and/or scene attribute features of the preset target to form a description of the scene image content, wherein the attribute features of the preset target include: when the preset target is a vehicle, the recognized type, body color, license plate, position in the image of the vehicle, and the like; when the preset target is a person, the recognized gender, age, clothing, position in the image of the person, and the like; when the preset target is a building, the recognized type, position in the image of the building, and the like; and the scene attribute features of the preset target include: one or a combination of the shooting time, shooting place and shooting angle of the original image.
In practical applications, the recognition result can be used to describe the extracted target attribute features and/or scene attribute features of the preset target in a structured way; a complete information frame combining the image with its corresponding description information is generated, and a formatted information code stream conforming to the TCP/IP protocol is formed and uploaded to the network.
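One possible, purely illustrative way to package such a structured description into a formatted code stream over TCP/IP is sketched below; the field names, JSON encoding, port and camera identifier are assumptions, since the patent only requires a formatted information code stream conforming to the TCP/IP protocol.

```python
# Illustrative sketch: pack a recognition result and its scene attributes
# into a length-prefixed TCP frame and send it to the monitoring centre.
import json
import socket
import struct

def upload_result(host: str, port: int, record: dict) -> None:
    payload = json.dumps(record).encode("utf-8")
    frame = struct.pack("!I", len(payload)) + payload      # 4-byte length prefix
    with socket.create_connection((host, port)) as sock:
        sock.sendall(frame)

record = {
    "camera_id": "cam-042",                 # assumed identifier
    "capture_time": "2019-05-29T08:15:00",  # scene attribute: shooting time
    "location": "gate-3",                   # scene attribute: shooting place
    "targets": [
        {"type": "vehicle", "color": "white", "plate": "A12345", "bbox": [120, 80, 360, 240]},
    ],
}
# upload_result("monitoring.center.example", 9000, record)
```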
S103: and uploading the acquired attribute characteristics of the preset target and the scene attribute characteristics of the preset target to a monitoring center corresponding to the camera.
The obtained data is transmitted to a monitoring center through a video transmission network, so that the monitoring center can process the data received based on a big data technology, complete the tasks of storage, analysis, retrieval, statistics and the like of video images and contents thereof, and meet the requirement of a city or an area on rapid analysis processing and application of massive video contents.
By applying the embodiment shown in fig. 1 of the invention, the target attribute features and scene attribute features are extracted from the captured video stream data at the camera end, which avoids the need for the cloud computing center to simultaneously receive, store, analyze and compute thousands of video channels, reduces the requirements for upgrading the transmission bandwidth and the computing capability of the cloud computing center, and thus saves cost.
In addition, the device to which the method of the embodiment shown in fig. 1 of the present invention is applied can be in data connection with a plurality of cameras, and one device analyzes and uploads images shot by the plurality of cameras, so that the number of deployed devices is reduced, and the cost is saved.
Example 2
On the basis of embodiment 1 of the present invention, before the step S101, the method further includes:
s104: an original image in video stream data captured by a camera is acquired.
Specifically, model data of a camera can be acquired, and a video coding format of the camera is searched from a pre-stored model data-video coding format list according to the model data of the camera; and decoding the video stream data shot by the camera by using a decoding method corresponding to the video coding format, and restoring the original image shot by the camera.
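For illustration, the following Python sketch shows the shape of such a lookup-and-decode step; the camera model names, the contents of the model data to video coding format list, and the use of OpenCV (backed by FFmpeg) as the decoder are assumptions made for the sketch.

```python
# Illustrative sketch of the decoding step: look up the video coding format
# for the camera model in a pre-stored table, then decode the stream to
# recover the original frames.
import cv2

MODEL_TO_CODEC = {            # pre-stored "model data -> video coding format" list (example entries)
    "HIK-DS-2CD3T25": "h264",
    "DAHUA-IPC-HFW4": "h265",
}

def decode_frames(camera_model: str, stream_url: str):
    codec = MODEL_TO_CODEC.get(camera_model)
    if codec is None:
        raise ValueError(f"unknown camera model: {camera_model}")
    # OpenCV delegates to FFmpeg, which applies the H.264/H.265 decoder
    # matching the stream; the looked-up codec could equally be used to pick
    # a hardware decode path on the embedded platform.
    capture = cv2.VideoCapture(stream_url)
    while True:
        ok, frame = capture.read()          # 'frame' is the restored original image
        if not ok:
            break
        yield frame
    capture.release()
```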
The inventors also found that, in the security field, camera manufacturers are numerous, construction time spans are large, and monitoring video terminal devices differ in specification, image compression format and coding format. Therefore, in the embodiment of the invention, different decoding strategies are adopted according to the coding formats of monitoring cameras of different specifications: the original images, image parameters and other information are decoded and recovered from the code stream output by the camera, and the preset target is then identified, so that compatibility with existing cameras of different models is achieved.
Example 3
On the basis of embodiment 1 of the present invention, further, when embodiment 1 is executed, an ARM (Advanced RISC Machine) processor may be used as the main control unit and an FPGA (Field Programmable Gate Array) as the acceleration unit to construct the hardware core platform architecture for identifying the preset target. Based on this hardware architecture, the preset target contained in each original image is identified by using the pre-constructed convolutional neural network model, wherein the preset target includes: one or a combination of a person, a vehicle and a building.
Specifically, when identification is performed with the pre-trained convolutional neural network model, the n × m convolution kernel operation may be split into n × m multiplication operations and n × m-1 addition operations, and,
when n x m is an odd number, taking n x m-1 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-1 times of addition operation is completed to obtain an operation result of an n x m convolution kernel;
when n x m is an even number, taking n x m-2 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-2 times of addition operation is completed; and summing the sum of the n x m-2 times of addition operation and the addition operation which does not participate in the operation to obtain an operation result of the n x m convolution kernel.
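The following sketch mirrors this split in software: an n × m kernel window is reduced with n × m multiplications followed by a pairwise (binary-tree) summation, which is the structure that maps onto parallel adders in the FPGA and covers both the odd and the even case distinguished above (an odd leftover term is simply carried to the next level). This is an illustrative model, not the Verilog implementation.

```python
def conv_kernel_tree(window, kernel):
    """window, kernel: flat lists of n*m numbers.
    Returns the n x m convolution-kernel result using n*m multiplications
    and a pairwise addition tree (n*m - 1 additions in total)."""
    products = [w * k for w, k in zip(window, kernel)]  # n*m multiplications
    current = products
    while len(current) > 1:
        nxt = [current[i] + current[i + 1] for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:          # odd count: carry the leftover term
            nxt.append(current[-1])
        current = nxt                      # each pass sums pairs in parallel
    return current[0]

# Example: a 3 x 3 kernel -> 9 multiplications and 8 additions.
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1, 1, 0, -1, 1, 0, -1]
print(conv_kernel_tree(window, kernel))    # -6
```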
Fig. 7 is a schematic diagram illustrating an FPGA implementation flow in an embedded video image parsing method based on edge calculation according to an embodiment of the present invention; fig. 8 is a schematic diagram illustrating a convolution acceleration operation in an embedded video image parsing method based on edge calculation according to an embodiment of the present invention; fig. 9 is a schematic diagram illustrating a pooling acceleration operation in an embedded video image parsing method based on edge calculation according to an embodiment of the present invention. As shown in fig. 7 to fig. 9, the processing method according to the embodiment of the present invention may be deployed on an FPGA so as to perform the above-described operations of the embodiment of the present invention.
An ARM + FPGA heterogeneous processing architecture is adopted: the ARM serves as the control unit and mainly completes algorithm scheduling and task management, while the FPGA serves as the core acceleration unit and accelerates the main operations of the neural network such as convolution and pooling, improving algorithm operation efficiency and meeting real-time processing requirements.
Currently, in the first aspect, deep learning algorithms are generally implemented in the Python language with deep learning frameworks such as Caffe, TensorFlow and PyTorch. Although such frameworks greatly reduce the difficulty of algorithm development, installing a deep learning framework on an embedded platform occupies most of its resources, so the algorithm cannot meet real-time processing requirements. Therefore, the embodiment of the invention implements the convolutional network in the C/C++ language, avoiding the use of a deep learning framework, saving embedded platform resources and effectively improving the processing speed of the algorithm.
In the second aspect, no mature acceleration chip is currently available for embedded applications. The invention adopts an ARM + FPGA hardware processing architecture that mimics the CPU (Central Processing Unit) + GPU (Graphics Processing Unit) architecture: the parallel processing capability of the FPGA is used to accelerate convolutional neural network computation, implemented in the Verilog HDL (Hardware Description Language), achieving a GPU-like acceleration effect and meeting real-time processing requirements.
Because the convolutional neural network contains a large number of convolution and pooling operations, which consume considerable DSP (digital signal processing) resources and RAM (random access memory) storage resources, the ZYNQ7100 FPGA, which has relatively abundant computing and storage resources, is selected as the core of system processing in the embodiment of the invention. As shown in fig. 4, the chip provides 2020 DSP slices, and each multiplication or addition consumes 2 DSP slices, so more than 1000 multiplication or addition operations can be performed in parallel in one clock cycle; the internal storage capacity is 26.5 Mb, which meets the data caching requirements of optical images, convolution templates, feature maps and the like; and the internal logic unit provides 444K logic cells, offering ample logic resources for complex logic operations and control. Table 1 lists the FPGA models suitable for use in embodiments of the present invention.
TABLE 1
[Table 1, listing suitable FPGA models, is rendered as an image in the original publication.]
In practical application, each multiplication operation in the deep learning network model occupies 2 DSP slices, and each addition also occupies 2 DSP slices. Taking the representative target detection and identification algorithm SSD (Single Shot MultiBox Detector) as an example, the input of the 1st convolution layer in the network is 300 (image length) × 300 (image width) × 3 (image channels), and there are 64 convolution kernels of size 3 × 3. The number of multiplications required is 300 × 300 × 3 × 3 × 3 × 64 = 155,520,000.
If all of this were executed in a single pass in the FPGA, more than 300 million DSP slices would be needed, which clearly cannot be satisfied in practice. Therefore, a convolution acceleration architecture is designed in the FPGA. Taking the 3 × 3 convolution as an example, as shown in fig. 8, one kernel operation consists of 9 multiplications and 8 additions, so a single operation occupies 9 × 2 + 8 × 2 = 34 DSP slices. Since the ZYNQ7100 FPGA provides 2020 DSP slices in total, about 59 such operations can be computed in parallel in one clock. Taking the first convolution layer as an example, the input is 300 × 300 × 3 and there are 64 convolution kernels of size 3 × 3, so the 3 × 3 convolution architecture in the FPGA must be invoked 300 × 300 × 3 × 64 = 17,280,000 times. With 59 parallel computations per clock, this operation requires 17,280,000 / 59 ≈ 292,881 clocks. The clock frequency of the ZYNQ7100 is 250 MHz, so the theoretical processing time is about 1.17 ms. This figure does not account for latency or data-read time and is the upper bound achievable by theoretical calculation.
The deep learning network is mainly composed of convolutional layers and pooling layers, and the pooling operation can also be accelerated in the FPGA; the acceleration architecture of the maximum pooling operation in the FPGA is shown in fig. 9. The algorithm adopts 2 × 2 maximum pooling, and each pooling step occupies 3 DSP slices. Since the ZYNQ7100 provides 2020 DSP slices in total, about 670 pooling structures can be executed in parallel. Taking pooling layer 1 as an example, its input is 300 × 300 × 64, requiring 150 × 150 × 64 = 1,440,000 pooling operations; executed in parallel, this takes about 1,440,000 / 670 ≈ 2150 clocks, i.e. roughly 0.0085 ms. Most of the computation time therefore lies in the convolution operations.
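The resource and timing estimates above can be reproduced with the following back-of-the-envelope calculation; the figures (2 DSP slices per multiply or add, 3 per pooling step, 2020 slices in total, 250 MHz clock) are taken directly from the analysis above, and the script is purely illustrative.

```python
DSP_TOTAL = 2020            # DSP slices available in the ZYNQ7100
CLOCK_HZ = 250e6            # FPGA clock frequency

# 3x3 convolution, SSD layer 1: input 300x300x3, 64 kernels of size 3x3.
dsp_per_conv = 9 * 2 + 8 * 2                  # 9 multiplies + 8 adds, 2 DSP slices each = 34
conv_parallel = DSP_TOTAL // dsp_per_conv     # ~59 kernel windows per clock
conv_ops = 300 * 300 * 3 * 64                 # 17,280,000 kernel applications
conv_ms = conv_ops / conv_parallel / CLOCK_HZ * 1e3
print(f"conv: {conv_ops} ops, {conv_parallel} in parallel -> {conv_ms:.2f} ms")   # ~1.17 ms

# 2x2 max pooling, pooling layer 1: input 300x300x64.
pool_parallel = 670                           # ~2020 / 3 DSP slices per pooling step
pool_ops = 150 * 150 * 64                     # 1,440,000 pooling steps
pool_ms = pool_ops / pool_parallel / CLOCK_HZ * 1e3
print(f"pool: {pool_ops} ops, {pool_parallel} in parallel -> {pool_ms:.4f} ms")   # ~0.0086 ms
```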
Based on this comprehensive analysis, the SSD target detection and identification algorithm takes about 230 milliseconds per frame on this platform, i.e. a processing speed above 4 frames/second, which meets the requirement of near-real-time analysis and processing. As technology advances, the neural network algorithm can be conveniently ported to a more advanced FPGA hardware platform or an AI chip to further increase the processing speed of the algorithm.
By applying embodiment 3 of the invention, the operation of the convolutional neural network can be accelerated in the FPGA.
In practical application, when AI (Artificial Intelligence) technology is used to analyze the images generated by a video monitoring system, an AI smart camera can replace the traditional camera, so that the camera not only captures images but also understands their content, realizing the conversion from image to information and greatly simplifying the load of back-end big data processing. The smart camera is therefore a future development trend. However, AI smart cameras are expensive: even a low-end model costs upwards of ten thousand yuan, and a mid-range model costs over one hundred thousand or even several hundred thousand yuan. For the large number of widely deployed traditional monitoring video terminals already in place, replacing them all with AI smart cameras would be enormously costly and would cause huge resource waste and duplicated construction, so such replacement is not worthwhile.
Therefore, in embodiment 3 of the present invention, by using the embedded edge computing platform, content analysis can be performed simultaneously on the video images of multiple existing cameras so as to identify the preset target, and the attribute information of the preset target is then sent to the monitoring center through the video monitoring network. Distributed processing thus relieves the pressure of analyzing a large number of video images at the same time and saves a great deal of cost.
Corresponding to the embodiment of the invention shown in fig. 1, the embodiment of the invention also provides an embedded video image analysis device based on edge calculation.
Fig. 10 is a schematic structural diagram of an embedded video image parsing apparatus based on edge computation according to an embodiment of the present invention. As shown in fig. 10, the apparatus is applied to parsing the image content of cameras in a video monitoring network and can simultaneously process the images of a plurality of cameras communicatively connected to a monitoring center, and the apparatus includes:
an identifying module 1001, configured to identify a preset target from a video captured by a camera, where the preset target includes: one or a combination of a person, a vehicle, a building;
a first obtaining module 1002, configured to obtain, for a preset target identified from the video, attribute features of the preset target and/or scene attribute features of the preset target, where the attribute features of the preset target include: when the preset target is a vehicle, one or a combination of the type, body color, license plate and position of the vehicle; when the preset target is a person, one or a combination of the sex, age, clothing and position of the person; when the preset target is a building, one or a combination of the type and position of the building; and the scene attribute features of the preset target include: one or a combination of the shooting time, shooting place and shooting angle of the original image;
an uploading module 1003, configured to upload the acquired attribute features of the preset target and the scene attribute features of the preset target to a monitoring center corresponding to the camera.
By applying the embodiment shown in fig. 10 of the invention, the target attribute characteristics and the scene attribute characteristics in the shot video stream data are extracted and transmitted at the position close to the monitoring camera, the original system architecture is not changed, the problem that a background simultaneously analyzes a large number of video images is solved, and the cost of upgrading and transforming the monitoring video network is saved.
In a specific implementation manner of the embodiment of the present invention, on the basis of the embodiment shown in fig. 10 of the present invention, the apparatus further includes:
and the second acquisition module is used for acquiring an original image in the video stream data shot by the camera and taking the original image as a video shot by the camera.
In a specific implementation manner of the embodiment of the present invention, the second obtaining module is configured to:
acquiring model data of a camera, and searching a video coding format of the camera from a pre-stored model data-video coding format list according to the model data of the camera;
and decoding the video stream data shot by the camera by using a decoding method corresponding to the video coding format, and restoring an original image shot by the camera.
In a specific implementation manner of the embodiment of the present invention, the identification module 1001 is configured to:
an ARM is used as the main control unit and an FPGA is used as the core acceleration unit to construct a hardware computing architecture for identifying the preset target; based on the hardware architecture, the preset target contained in each original image is identified by using the pre-constructed convolutional neural network model, wherein the preset target includes: one or a combination of a person, a vehicle and a building.
In a specific implementation manner of the embodiment of the present invention, the identification module 1001 includes a construction unit, configured to:
constructing an initial convolutional neural network with an input layer, convolutional layers, pooling layers, fully-connected layers and an output layer, and training it;
acquiring a conversion matrix aiming at pruning operation according to the number of convolution kernels in a target convolution neural network obtained after preset pruning and the number of convolution kernels in the constructed initial convolution neural network;
acquiring the minimum reconstruction error of each convolution kernel in the initial convolution neural network according to the conversion matrix and the weight of each convolution kernel;
and eliminating the convolution kernels of which the corresponding minimum reconstruction errors exceed a preset numerical range to obtain the constructed target convolution neural network.
In a specific implementation manner of the embodiment of the present invention, the building unit is configured to:
according to the number of convolution kernels in the target convolutional neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolutional neural network, a transformation matrix for the pruning operation is obtained by using the formula Y = (N × c × k_h × k_w)^(-1) · n × c × k_h × k_w, wherein,
Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the corresponding feature map; k_h × k_w is the size of the convolution kernel; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.
In a specific implementation manner of the embodiment of the present invention, the building unit is configured to:
based on the transformation matrix and the weights of the respective convolution kernels, the minimized reconstruction error of each convolution kernel in the initial convolutional neural network is obtained by using the formula

min_{β,W} (1/(2N)) ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2, subject to ‖β‖_0 ≤ c′,

wherein min is the minimum value evaluation function; β is the selection vector coefficient corresponding to the channels, of length c; β_i is the selection marker of the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; ‖·‖_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ is the summation function; X_i is the slice matrix of the i-th channel; W^T is the transpose of the convolution kernel weight matrix; c′ is the number of channels reserved after pruning; c is the number of channels of the corresponding feature map; and ‖·‖_0 is the zero norm.
In a specific implementation manner of the embodiment of the present invention, the building unit is configured to:
for each convolution kernel, a transformation matrix is generated based on the transformation matrix and the weights of the respective convolution kernels, using a formula,
Figure BDA0002078200930000211
and acquiring the reconstruction error of each convolution kernel in the initial convolution neural network, wherein,
beta is a selection vector coefficient corresponding to the channel with the length of c; beta is a i Marking the batch of the ith channel; w is a weight matrix of the convolution kernel; n is the number of convolution kernels in the initial convolution neural network; | | non-woven hair F Is a norm function; y is a conversion matrix for pruning operation; sigma is a summation function; x i A slice matrix for the ith channel; w T A transpose matrix that is a weight matrix of the convolution kernel; λ is a penalty coefficient; | | non-woven hair 1 Is a norm function;
Figure BDA0002078200930000213
is any one of i; c' is the number of channels reserved after pruning; c is the number of channels corresponding to the characteristic diagram; | | non-woven hair 0 Is a zero norm function.
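A small NumPy sketch of evaluating this penalized reconstruction error for a given channel-selection vector β is shown below; the tensor shapes and the reading of Y as a response matrix follow the usual channel-pruning formulation and are assumptions for illustration, not the patented implementation.

```python
import numpy as np

def reconstruction_error(Y, X, W, beta, lam):
    """Y: (N_samples, n) response matrix; X: list of c slice matrices, each (N_samples, k);
    W: (c, n, k) per-channel kernel weights; beta: (c,) selection coefficients;
    lam: penalty coefficient. Returns the penalized reconstruction error."""
    c = len(X)
    approx = sum(beta[i] * X[i] @ W[i].T for i in range(c))   # sum_i beta_i X_i W_i^T
    return np.linalg.norm(Y - approx, "fro") ** 2 + lam * np.abs(beta).sum()

# Toy example with c = 4 channels.
rng = np.random.default_rng(0)
N_s, n, k, c = 32, 8, 9, 4
X = [rng.normal(size=(N_s, k)) for _ in range(c)]
W = rng.normal(size=(c, n, k))
Y = sum(X[i] @ W[i].T for i in range(c))       # unpruned response
beta = np.array([1.0, 1.0, 0.0, 1.0])          # drop channel 2
print(reconstruction_error(Y, X, W, beta, lam=0.1))
```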
In a specific implementation manner of the embodiment of the present invention, the building unit is configured to:
taking the initial convolutional neural network as a current network model, and removing a convolutional kernel of which the corresponding minimum reconstruction error exceeds a preset numerical range aiming at each convolutional kernel in a current convolutional layer in the current network model;
for each convolution kernel left after the elimination, the weight matrix of the convolution kernel is kept unchanged, and the current value of the selection vector coefficient corresponding to the channel with the length of c is obtained by using the formula

β̂ = argmin_β (1/(2N)) ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2 + λ‖β‖_1, subject to ‖β‖_0 ≤ c′,

wherein β̂ is the current value of the selection vector coefficient corresponding to the channel with the length of c, and argmin is the function returning the argument that minimizes the expression;
determining whether ‖β‖_0 converges;
if so, the weight of the convolution kernel corresponding to the minimized reconstruction error is obtained by using the formula

Ŵ = argmin_W ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2;

the current value of the selection vector coefficient corresponding to the channel with the length of c and the weight of the convolution kernel corresponding to the minimized reconstruction error are taken as the target selection vector coefficient and the target convolution kernel weight of the convolution kernel, and the current network model is updated according to the target selection vector coefficient and the target convolution kernel weight;
if not, updating the penalty coefficient according to a preset step length, and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channel with the length of c, until ‖β‖_0 converges;
and taking the updated current network model as the current network model, taking the next convolutional layer of the current convolutional layer as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the convolution kernels whose corresponding minimized reconstruction errors exceed the preset numerical range, until every convolutional layer of the current network model has been pruned; the pruned current network model is taken as the target convolutional neural network model.
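The alternating procedure described above (fix W and solve for β under an L1 penalty, grow the penalty until ‖β‖_0 converges to the target number of channels, then re-solve the retained weights by least squares) can be sketched as follows for a single layer. The simple iterative soft-thresholding used for the β-step is a stand-in for any LASSO solver, and all shapes and parameters are illustrative assumptions rather than the patented implementation.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prune_layer(Y, X, W, c_keep, lam=1e-3, lam_step=2.0, ista_iters=200):
    """One-layer channel-pruning sketch: X is a list of c slice matrices (N_s, k),
    W is (c, n, k), Y is the (N_s, n) response; c_keep = c' channels to retain.
    Assumes at least one channel survives the selection."""
    c = len(X)
    A = np.stack([X[i] @ W[i].T for i in range(c)], axis=-1)   # (N_s, n, c)
    A2 = A.reshape(-1, c)                                      # flatten samples x outputs
    y2 = Y.reshape(-1)
    L = np.linalg.norm(A2, 2) ** 2                             # Lipschitz constant for ISTA
    while True:
        beta = np.zeros(c)
        for _ in range(ista_iters):                            # ISTA for the LASSO beta-step
            grad = A2.T @ (A2 @ beta - y2)
            beta = soft_threshold(beta - grad / L, lam / L)
        if np.count_nonzero(beta) <= c_keep:                   # ||beta||_0 has converged
            break
        lam *= lam_step                                        # otherwise increase the penalty
    keep = np.flatnonzero(beta)
    # Least-squares re-fit of the retained weights with beta fixed (W-step).
    Xk = np.concatenate([beta[i] * X[i] for i in keep], axis=1)   # (N_s, c'*k)
    Wk, *_ = np.linalg.lstsq(Xk, Y, rcond=None)                   # (c'*k, n)
    return keep, Wk.T.reshape(Y.shape[1], len(keep), -1)          # (n, c', k)
```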
In a specific implementation manner of the embodiment of the present invention, the building unit is configured to:
taking the initial convolutional neural network as a current network model, and removing a convolutional kernel of which the corresponding minimum reconstruction error exceeds a preset numerical range aiming at each convolutional kernel in a current convolutional layer in the current network model;
for each convolution kernel remaining after the elimination, the current value of the selection vector coefficient corresponding to the channel with the length of c is obtained by using the formula

β̂ = argmin_β (1/(2N)) ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2 + λ‖β‖_1, subject to ‖β‖_0 ≤ c′,

wherein β̂ is the current value of the selection vector coefficient corresponding to the channel with the length of c, and argmin is the function returning the argument that minimizes the expression;
the current weight of the convolution kernel corresponding to the reconstruction error is obtained by using the formula

Ŵ = argmin_W ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2;
judging whether the reconstruction error corresponding to the current value of the selected vector coefficient and the current weight of the convolution kernel is converged;
if yes, taking the current value of the selection vector coefficient corresponding to the channel with the length of c and the weight of the convolution kernel corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weight of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weight;
if not, updating the penalty coefficient according to a preset step length, and returning to the step of acquiring the current value of the selection vector coefficient corresponding to the channel with the length of c until the reconstruction error corresponding to the current value of the selection vector coefficient and the current weight of the convolution kernel is converged;
and taking the updated current network model as the current network model, taking the next convolutional layer of the current convolutional layer as the current convolutional layer, returning to execute the step of removing the convolutional cores of which the corresponding minimum reconstruction errors exceed the preset numerical range aiming at each convolutional core in the current convolutional layer in the current network model until each convolutional layer of the current network model is pruned, and taking the pruned current network model as the target convolutional neural network model.
In a specific implementation manner of the embodiment of the present invention, the building unit is configured to:
quantizing the model parameters in the current network model after pruning by using a quantization algorithm;
then, coding the current network model after the model parameters are quantized by using a Huffman coding algorithm;
and taking the coded current network model as a target convolutional neural network model.
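A compact sketch of this post-pruning compression step (uniform weight quantization followed by Huffman coding of the quantized indices) is given below; the 8-bit uniform quantizer and the frequency-based code construction are illustrative assumptions, not the specific quantization and coding algorithms of the embodiment.

```python
import heapq
from collections import Counter
import numpy as np

def quantize(weights, bits=8):
    """Uniform quantization of a weight array to 2**bits levels."""
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / (2 ** bits - 1)
    if scale == 0:                             # constant array: avoid division by zero
        scale = 1.0
    idx = np.round((weights - lo) / scale).astype(np.uint8)
    return idx, lo, scale                      # indices plus parameters to dequantize

def huffman_codebook(symbols):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    heap = [[cnt, i, {s: ""}] for i, (s, cnt) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                         # degenerate case: a single symbol
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        lo_node = heapq.heappop(heap)
        hi_node = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in lo_node[2].items()}
        merged.update({s: "1" + code for s, code in hi_node[2].items()})
        heapq.heappush(heap, [lo_node[0] + hi_node[0], lo_node[1], merged])
    return heap[0][2]

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
idx, lo, scale = quantize(w)
book = huffman_codebook(idx.tolist())
encoded_bits = sum(len(book[s]) for s in idx.tolist())
print(f"{w.nbytes * 8} bits raw -> {encoded_bits} bits after quantize + Huffman")
```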
In a specific implementation manner of the embodiment of the present invention, the identification module 1001 is configured to:
when using a previously trained convolutional neural network model for identification, the n x m convolutional kernel operation is divided into n x m multiplication operations and n x m-1 addition operations, and,
when n x m is an odd number, taking n x m-1 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-1 times of addition operation is completed to obtain an operation result of an n x m convolution kernel;
when n x m is an even number, taking n x m-2 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-2 times of addition operation is completed; and summing the sum of the n x m-2 times of addition operation and the addition operation which does not participate in the operation to obtain an operation result of the n x m convolution kernel.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. An embedded video image analysis method based on edge calculation is characterized in that the method is applied to cameras in a video monitoring network, the video monitoring network comprises a plurality of cameras which are in communication connection with a monitoring center, and the method comprises the following steps:
the method for recognizing the preset target from the video shot by the camera comprises the following steps:
the method comprises the following steps that an ARM is used as a main control unit, and an FPGA is used as a core acceleration unit to construct a hardware computing framework for identifying a preset target; based on the hardware computing architecture, recognizing preset targets contained in each original image by using a pre-constructed convolutional neural network model, wherein the preset targets comprise: one or a combination of a person, a vehicle, a building;
the method comprises the steps of acquiring attribute features of a preset target and/or scene attribute features of the preset target aiming at the preset target identified from a video, wherein the attribute features of the preset target comprise: when the preset target is a vehicle, one or a combination of the type, the body color, the license plate and the position of the vehicle; when the preset target is a person, one or a combination of the sex, the age, the clothing and the position of the person; when the preset target is a building, one or a combination of the position and the type of the building; the scene attribute characteristics of the preset target include: one or a combination of shooting time, shooting place and shooting angle of the original image;
and uploading the acquired attribute characteristics of the preset target and the scene attribute characteristics of the preset target to a monitoring center corresponding to the camera.
2. The embedded video image parsing method based on edge calculation as claimed in claim 1, wherein before the preset target is identified from the video captured by the camera, the method further comprises:
acquiring the model data of a camera, and searching the video coding format of the camera from a pre-stored model data-video coding format list according to the model data of the camera;
decoding video stream data shot by the camera by using a decoding method corresponding to the video coding format, and restoring an original image shot by the camera;
and taking the original image as a video shot by a camera.
3. The embedded video image parsing method based on edge computation of claim 1, wherein the pre-constructed target convolutional neural network is constructed by the following process:
constructing an initial convolutional neural network with an input layer, a convolutional layer, a pooling layer, a fully-connected layer and an output layer, and training;
acquiring a conversion matrix aiming at pruning operation according to the number of convolution kernels in a target convolution neural network obtained after preset pruning and the number of convolution kernels in the constructed initial convolution neural network;
acquiring the minimum reconstruction error of each convolution kernel in the initial convolution neural network according to the conversion matrix and the weight of each convolution kernel;
and eliminating the convolution kernels of which the corresponding minimum reconstruction errors exceed the preset numerical range to obtain the constructed target convolution neural network.
4. The method according to claim 3, wherein the obtaining of the transformation matrix for the pruning operation according to the number of convolution kernels in the target convolution neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolution neural network comprises:
according to the number of convolution kernels in the target convolution neural network obtained after the preset pruning and the number of convolution kernels in the constructed initial convolution neural network, using the formula Y = (N × c × k_h × k_w)^(-1) · n × c × k_h × k_w, a transformation matrix for the pruning operation is obtained, wherein,
Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolution neural network; c is the number of channels corresponding to the feature map; k_h × k_w is the size of the convolution kernel; and n is the number of convolution kernels in the target convolution neural network obtained after pruning.
5. The embedded video image parsing method based on edge computation of claim 3, wherein the obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weight of each convolution kernel comprises:
based on the transformation matrix and the weights of the individual convolution kernels, using the formula

min_{β,W} (1/(2N)) ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2, subject to ‖β‖_0 ≤ c′,

obtaining a minimized reconstruction error of each convolution kernel in the initial convolutional neural network, wherein,
min is the minimum value evaluation function; β is the selection vector coefficient corresponding to the channel with length c; β_i is the selection marker of the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolution neural network; ‖·‖_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ is the summation function; X_i is the slice matrix of the i-th channel; W^T is the transpose of the weight matrix of the convolution kernel; c′ is the number of channels reserved after pruning; c is the number of channels corresponding to the feature map; and ‖·‖_0 is the zero norm.
6. The embedded video image parsing method based on edge computation of claim 3, wherein the obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weight of each convolution kernel comprises:
for each convolution kernel, based on the transformation matrix and the weights of the respective convolution kernels, using the formula

min_β (1/(2N)) ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2 + λ‖β‖_1, subject to ‖β‖_0 ≤ c′ and ∀i ‖W_i‖_F = 1,

obtaining the reconstruction error of each convolution kernel in the initial convolutional neural network, wherein,
β is the selection vector coefficient corresponding to the channel with length c; β_i is the selection marker of the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolution neural network; ‖·‖_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ is the summation function; X_i is the slice matrix of the i-th channel; W^T is the transpose of the weight matrix of the convolution kernel; λ is the penalty coefficient; ‖·‖_1 is the L1 norm; ∀i denotes any i; c′ is the number of channels reserved after pruning; c is the number of channels corresponding to the feature map; and ‖·‖_0 is the zero norm.
7. The embedded video image analysis method based on edge computing as claimed in claim 6, wherein the removing the convolution kernel whose corresponding minimum reconstruction error exceeds the preset value range to obtain the constructed target convolution neural network comprises:
taking the initial convolutional neural network as a current network model, and removing a convolutional kernel of which the corresponding minimum reconstruction error exceeds a preset numerical range aiming at each convolutional kernel in a current convolutional layer in the current network model;
for each convolution kernel left after the elimination, the weight matrix of the convolution kernel is kept unchanged, and by using a formula,
Figure FDA0003957172620000041
obtain the current value of the select vector coefficient corresponding to the channel of length c, wherein->
Figure FDA0003957172620000042
Selecting vector coefficients for a channel of length cA previous value; argmin is a function minimum variable evaluation function;
determining whether ‖β‖_0 converges;
if so, using the formula

Ŵ = argmin_W ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2,

obtaining the weight of the convolution kernel corresponding to the minimized reconstruction error; taking the current value of the selection vector coefficient corresponding to the channel with the length of c and the weight of the convolution kernel corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weight of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weight;
if not, updating the penalty coefficient according to a preset step length, and returning to the step of obtaining the current value of the selection vector coefficient corresponding to the channel with the length of c, until ‖β‖_0 converges;
and taking the updated current network model as the current network model, taking the next convolutional layer of the current convolutional layer as the current convolutional layer, returning to execute the step of removing, for each convolution kernel in the current convolutional layer in the current network model, the convolution kernels whose corresponding minimized reconstruction errors exceed the preset numerical range, until each convolutional layer of the current network model is pruned, and taking the pruned current network model as the target convolutional neural network model.
8. The embedded video image analysis method based on edge computing according to claim 6, wherein the removing the convolution kernel whose corresponding minimized reconstruction error exceeds the preset numerical range to obtain the constructed target convolution neural network comprises:
taking the initial convolutional neural network as a current network model, and removing a convolutional kernel of which the corresponding minimum reconstruction error exceeds a preset numerical range aiming at each convolutional kernel in a current convolutional layer in the current network model;
for each convolution kernel remaining after the elimination, using the formula

β̂ = argmin_β (1/(2N)) ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2 + λ‖β‖_1, subject to ‖β‖_0 ≤ c′,

obtaining a current value of the selection vector coefficient corresponding to the channel with the length of c, wherein,
β̂ is the current value of the selection vector coefficient corresponding to the channel with the length of c; argmin is the function returning the argument that minimizes the expression;
using the formula

Ŵ = argmin_W ‖Y − Σ_{i=1}^{c} β_i · X_i · W_i^T‖_F^2,

obtaining the current weight of the convolution kernel corresponding to the reconstruction error;
judging whether the reconstruction error corresponding to the current value of the selected vector coefficient and the current weight of the convolution kernel is converged;
if yes, taking the current value of the selection vector coefficient corresponding to the channel with the length of c and the weight of the convolution kernel corresponding to the minimized reconstruction error as the target selection vector coefficient and the target convolution kernel weight of the convolution kernel, and updating the current network model according to the target selection vector coefficient and the target convolution kernel weight;
if not, updating the penalty coefficient according to a preset step length, and returning to the step of acquiring the current value of the selection vector coefficient corresponding to the channel with the length of c until the reconstruction error corresponding to the current value of the selection vector coefficient and the current weight of the convolution kernel is converged;
and taking the updated current network model as the current network model, taking the next convolutional layer of the current convolutional layer as the current convolutional layer, returning to execute the step of removing, for each convolution kernel in the current convolutional layer in the current network model, the convolution kernels whose corresponding minimized reconstruction errors exceed the preset value range, until each convolutional layer of the current network model is pruned, and taking the pruned current network model as the target convolutional neural network model.
9. The embedded video image parsing method based on edge calculation of claim 1, wherein the n x m convolution kernel operation is split into n x m multiplication operations and n x m-1 addition operations when using the pre-trained convolution neural network model for identification, and,
when n x m is an odd number, taking n x m-1 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-1 times of addition operation is completed to obtain an operation result of an n x m convolution kernel;
when n x m is an even number, taking n x m-2 times of addition operation as current operation, summing every two operations in the current operation to obtain a summed operation result, taking the summed operation result as current operation, and returning to execute the step of summing every two operations in the current operation to obtain the summed operation result until the summation of the n x m-2 times of addition operation is completed; and summing the sum of the n x m-2 times of addition operation and the addition operation which does not participate in the operation to obtain an operation result of the n x m convolution kernel.
CN201910461504.6A 2019-05-30 2019-05-30 Embedded video image analysis method and device based on edge calculation Active CN110210378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910461504.6A CN110210378B (en) 2019-05-30 2019-05-30 Embedded video image analysis method and device based on edge calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910461504.6A CN110210378B (en) 2019-05-30 2019-05-30 Embedded video image analysis method and device based on edge calculation

Publications (2)

Publication Number Publication Date
CN110210378A CN110210378A (en) 2019-09-06
CN110210378B true CN110210378B (en) 2023-04-07

Family

ID=67789601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910461504.6A Active CN110210378B (en) 2019-05-30 2019-05-30 Embedded video image analysis method and device based on edge calculation

Country Status (1)

Country Link
CN (1) CN110210378B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796245B (en) * 2019-10-25 2022-03-22 浪潮电子信息产业股份有限公司 Method and device for calculating convolutional neural network model
CN112991192B (en) * 2019-12-18 2023-07-25 杭州海康威视数字技术股份有限公司 Image processing method, device, equipment and system thereof
CN112541096B (en) * 2020-07-27 2023-01-24 中咨数据有限公司 Video monitoring method for smart city
CN112486677B (en) * 2020-11-25 2024-01-12 深圳市中博科创信息技术有限公司 Data graph transmission method and device
CN113542600B (en) * 2021-07-09 2023-05-12 Oppo广东移动通信有限公司 Image generation method, device, chip, terminal and storage medium
CN114170619B (en) * 2021-10-18 2022-08-19 中标慧安信息技术股份有限公司 Data checking method and system based on edge calculation
CN114566052B (en) * 2022-04-27 2022-08-12 华南理工大学 Method for judging rotation of highway traffic flow monitoring equipment based on traffic flow direction
CN116664872A (en) * 2023-07-26 2023-08-29 成都实时技术股份有限公司 Embedded image recognition method, medium and system based on edge calculation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012151777A1 (en) * 2011-05-09 2012-11-15 上海芯启电子科技有限公司 Multi-target tracking close-up shooting video monitoring system
CN107506695A (en) * 2017-07-28 2017-12-22 武汉理工大学 Video monitoring equipment failure automatic detection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012151777A1 (en) * 2011-05-09 2012-11-15 上海芯启电子科技有限公司 Multi-target tracking close-up shooting video monitoring system
CN107506695A (en) * 2017-07-28 2017-12-22 武汉理工大学 Video monitoring equipment failure automatic detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improvement of a person target detection algorithm in complex scene images; 郝叶林 et al.; Journal of Wuyi University (Natural Science Edition); 2018-02-15 (Issue 01); full text *

Also Published As

Publication number Publication date
CN110210378A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110210378B (en) Embedded video image analysis method and device based on edge calculation
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
CN111445026A (en) Deep neural network multi-path reasoning acceleration method for edge intelligent application
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN112819252A (en) Convolutional neural network model construction method
CN114071141A (en) Image processing method and equipment
Chakraborty et al. MAGIC: Machine-learning-guided image compression for vision applications in Internet of Things
CN114169506A (en) Deep learning edge computing system framework based on industrial Internet of things platform
CN114331837A (en) Method for processing and storing panoramic monitoring image of protection system of extra-high voltage converter station
CN117354467A (en) Intelligent optimized transmission system for image data
CN112399177A (en) Video coding method and device, computer equipment and storage medium
CN111314707A (en) Data mapping identification method, device and equipment and readable storage medium
WO2023029559A1 (en) Data processing method and apparatus
CN108675071B (en) Cloud cooperative intelligent chip based on artificial neural network processor
Gao et al. Triple-partition network: collaborative neural network based on the ‘end device-edge-cloud’
CN112884118A (en) Neural network searching method, device and equipment
CN116644783A (en) Model training method, object processing method and device, electronic equipment and medium
CN116095183A (en) Data compression method and related equipment
CN112926517B (en) Artificial intelligence monitoring method
CN115189474A (en) Power distribution station electric energy meter identification method and system based on raspberry group 4B
CN113919479B (en) Method for extracting data features and related device
CN112446859A (en) Satellite-borne thermal infrared camera image cloud detection method based on deep learning
CN115409150A (en) Data compression method, data decompression method and related equipment
CN114170560B (en) Multi-device edge video analysis system based on deep reinforcement learning
CN109919203A (en) A kind of data classification method and device based on Discrete Dynamic mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant