CN113743221A - Multi-view pedestrian behavior identification method and system under edge computing architecture - Google Patents

Multi-view pedestrian behavior identification method and system under edge computing architecture

Info

Publication number
CN113743221A
CN113743221A (application CN202110891098.4A); granted publication CN113743221B
Authority
CN
China
Prior art keywords
video data
human behavior
behavior
visual angles
different visual
Prior art date
Legal status
Granted
Application number
CN202110891098.4A
Other languages
Chinese (zh)
Other versions
CN113743221B (en)
Inventor
王雪
游伟
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110891098.4A
Publication of CN113743221A
Application granted
Publication of CN113743221B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a multi-view human behavior recognition method and system under an edge computing architecture, belonging to the technical field of human behavior recognition. The method comprises the following steps: a camera group shoots the same scene from different viewing angles to obtain human behavior video data of the different viewing angles and transmits the data to edge computing nodes connected to the camera group; the edge computing nodes collect, store, and preprocess the to-be-recognized human behavior video data of the different viewing angles within the same time period, and input the preprocessed data into a human behavior feature encoder to obtain multi-view human behavior feature vectors; a cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing nodes and inputs them into a human behavior recognition model to obtain human behavior recognition results. Because human behavior features are extracted on the edge computing nodes and the cloud server only performs human behavior classification, the computing load of the cloud server is reduced and the real-time performance of recognition is improved; by collecting and exploiting multi-view human behavior information, the expressive power of the features is enhanced and the accuracy of human behavior recognition is improved.

Description

Multi-view pedestrian behavior identification method and system under edge computing architecture
Technical Field
The present application relates to the field of human behavior recognition technologies, and in particular to a multi-view human behavior recognition method and system under an edge computing architecture.
Background
Human behavior recognition technology can judge the behaviors and intentions of people from the image data of a camera group, and is of great significance for improving the automation and intelligence of security monitoring systems and for ensuring the stability and order of social production and life. In existing human behavior recognition methods, the image data collected by cameras must be uploaded to a cloud server, a large amount of video data is stored on the cloud server, and data labels are annotated by manually reviewing the videos.
In the related art, a self-supervised learning approach is adopted to reduce the workload of manual annotation. However, on the one hand, when the human body is occluded by an object or by itself, the recognition accuracy of the self-supervised learning method is low. On the other hand, the self-supervised learning method runs on the cloud server, occupies a large amount of its computing resources, and causes high latency in human behavior recognition tasks.
Disclosure of Invention
The application discloses a multi-view human behavior recognition method and system under an edge computing architecture, intended to solve, or at least partially solve, the above problems.
In a first aspect, an embodiment of the present invention discloses a multi-view human behavior recognition method under an edge computing architecture, the method comprising:

a camera group shoots the same scene from different viewing angles to obtain to-be-recognized human behavior video data of the different viewing angles, and transmits the to-be-recognized human behavior video data of the different viewing angles to edge computing nodes connected to the camera group;

the edge computing nodes collect and store the to-be-recognized human behavior video data of the different viewing angles within the same time period, perform data preprocessing on it, input the preprocessed data into a human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmit the multi-view human behavior feature vectors to a cloud server;

and the cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing nodes and inputs them into a human behavior recognition model to obtain the human behavior recognition results of the to-be-recognized human behavior video data of the different viewing angles.
Optionally, the method further comprises:
the camera group shoots the same scene from different viewing angles to obtain first sample human behavior video data of the different viewing angles, and transmits the first sample human behavior video data of the different viewing angles to the edge computing nodes connected to the camera group;

the edge computing node collects and stores the first sample human behavior video data of the different viewing angles within the same time period, performs data preprocessing on it, and trains a preset human behavior self-supervised feature learning model based on the preprocessed first sample human behavior video data of the different viewing angles to obtain the human behavior feature encoder.
Optionally, the method further comprises:
the camera group shoots the same scene from different viewing angles to obtain second sample human behavior video data of the different viewing angles, and transmits the second sample human behavior video data of the different viewing angles to the edge computing node connected to the camera group;

the edge computing node uploads a preset number of second sample human behavior video data of the different viewing angles, collects and stores the second sample human behavior video data of the different viewing angles within the same time period, performs data preprocessing on it, inputs the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmits the multi-view human behavior feature vectors to the cloud server;

the cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing node and the preset number of second sample human behavior video data of the different viewing angles, and trains a preset model according to the behavior category labels annotated on the preset number of second sample human behavior video data and the multi-view human behavior feature vectors to obtain the human behavior recognition model.
Optionally, the data preprocessing of the human behavior video data of different viewing angles includes:

determining skeleton data of the human behavior video data of the different viewing angles from the video data;

preprocessing the skeleton data of the human behavior video data of the different viewing angles to obtain skeleton sequences of the human behavior video data of the different viewing angles;

and fusing the skeleton sequences of the human behavior video data of the different viewing angles to obtain a fused skeleton segment sequence.
Optionally, the method further comprises:
reordering the fused skeleton segment sequence obtained after preprocessing according to a plurality of ordering modes, and annotating each with an ordering-mode label;

the training of the preset human behavior self-supervised feature learning model based on the preprocessed first sample human behavior video data of the different viewing angles comprises:

inputting the reordered fused skeleton segment sequences and their ordering-mode labels into the human behavior self-supervised feature learning model for training.
Optionally, the determining of the skeleton data of the human behavior video data of different viewing angles from the video data includes:

calculating the positions of the human pose keypoints in each frame of the human behavior video data of the different viewing angles, the positions of the human pose keypoints being the skeleton data of the human behavior video data of the different viewing angles;

the skeleton data are computed as

$$P_i^t = \left[\, (x_j, y_j) \,\right]_{j=1}^{N}$$

where $P_i^t$ is the skeleton data extracted from $I_i^t$, the t-th frame image of the camera numbered i; x and y are the horizontal and vertical coordinates of a human pose keypoint in the image, j is the index of a keypoint, and N is the total number of human pose keypoints.
Optionally, the preprocessing of the skeleton data of the human behavior video data of the different viewing angles to obtain the skeleton sequences of the human behavior video data of the different viewing angles includes:

subtracting, from the coordinates of each human pose keypoint in a frame image, the mean of the coordinates of all human pose keypoints in that frame image:

$$\tilde{x}_j = x_j - \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad \tilde{y}_j = y_j - \frac{1}{N}\sum_{k=1}^{N} y_k$$

determining the skeleton feature of each frame image:

$$s_i^t = \left[\, \tilde{x}_1, \tilde{y}_1, \tilde{x}_2, \tilde{y}_2, \ldots, \tilde{x}_N, \tilde{y}_N \,\right]$$

determining the skeleton sequence of the human behavior video data of each viewing angle:

$$S_i = \left[\, s_i^1, s_i^2, \ldots, s_i^T \,\right]$$

normalizing the skeleton sequences of the human behavior video data of the different viewing angles:

$$\tilde{S}_i = \frac{S_i}{\max \left| S_i \right|}$$

where $s_i^t$ is the skeleton feature of the t-th frame image of the camera numbered i, $S_i$ is the skeleton sequence of the human behavior video data of one viewing angle, and $\tilde{S}_i$ is the normalized skeleton sequence of the human behavior video data of that viewing angle.
Optionally, the fusing of the skeleton sequences of the human behavior video data of the different viewing angles to obtain a fused skeleton segment sequence includes:

dividing the skeleton sequence of each viewing angle equally along time nodes to obtain a plurality of skeleton segments;

randomly extracting one of the skeleton segments corresponding to each time node, and fusing the skeleton segments extracted at the multiple time nodes to obtain the fused skeleton segment sequence of the multi-view human behavior video data.
Optionally, the human behavior recognition model outputs the human behavior recognition prediction result according to the following formula:

$$M = \underset{1 \le i \le K}{\arg\max}\; f_{\mathrm{fusion}}\!\left(\left[\, g(X_1), g(X_2), g(X_3) \,\right]\right)(i)$$

where $f_{\mathrm{fusion}}$ is the human behavior classifier, $\left[\, g(X_1), g(X_2), g(X_3) \,\right]$ is the multi-view human behavior feature vector, M is the human behavior recognition prediction result, (i) denotes the i-th element of a vector, and K is the number of behavior classes to be recognized.
In a second aspect, an embodiment of the present invention discloses a multi-view human behavior recognition system under an edge computing architecture, the system comprising:

a camera group, configured to shoot the same scene from different viewing angles to obtain to-be-recognized human behavior video data of the different viewing angles, and to transmit the to-be-recognized human behavior video data of the different viewing angles to the edge computing nodes connected to it;

edge computing nodes, configured to receive the human behavior video data of the different viewing angles transmitted by the camera group, perform data preprocessing on it, and train a preset human behavior self-supervised feature learning model based on the preprocessed human behavior video data of the different viewing angles to obtain the human behavior feature encoder; and further configured to transmit the human behavior video data of the different viewing angles to the cloud server, preprocess the human behavior video data of the different viewing angles, input the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmit the feature vectors to the cloud server;

a cloud server, configured to receive the multi-view human behavior feature vectors and the human behavior video data of the different viewing angles uploaded by the edge computing nodes, determine manually annotated behavior category labels for the human behavior video data of the different viewing angles, and train a preset model according to the manually annotated behavior category labels and the multi-view human behavior feature vectors to obtain the human behavior recognition model; and further configured to receive the multi-view human behavior feature vectors uploaded by the edge computing nodes and input them into the human behavior recognition model to obtain the human behavior recognition results of the human behavior video data of the different viewing angles.
Compared with the prior art, the present application has the following advantages:

The present application provides a multi-view human behavior recognition method and system under an edge computing architecture, comprising the following steps: a camera group shoots the same scene from different viewing angles to obtain human behavior video data of the different viewing angles and transmits it to the edge computing nodes connected to the camera group; the edge computing nodes collect, store, and preprocess the to-be-recognized human behavior video data of the different viewing angles within the same time period and input the preprocessed data into a human behavior feature encoder to obtain multi-view human behavior feature vectors; a cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing nodes and inputs them into a human behavior recognition model to obtain the human behavior recognition results of the human behavior video data of the different viewing angles.

Because human behavior feature extraction is performed on the edge computing nodes, the cloud server only needs to perform behavior classification, which reduces the computing load of the cloud server and improves the real-time performance of recognition. By collecting to-be-recognized human behavior video data of different viewing angles, the discriminability of the features is increased, their expressive power is further improved, and the accuracy of human behavior recognition is improved. Moreover, the multi-view human behavior features can be learned in a self-supervised manner using the computing resources of the edge computing nodes, without manual annotation of human behavior data.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of a multi-view human behavior recognition system under an edge computing architecture according to an embodiment of the present invention;

FIG. 2 is a flowchart of the steps of a multi-view human behavior recognition method under an edge computing architecture according to an embodiment of the present invention;

FIG. 3 is a flowchart of the training steps of the human behavior feature encoder in an embodiment of the present invention;

FIG. 4 is a flowchart of the data preprocessing steps for human behavior video data of different viewing angles in an embodiment of the present invention;

FIG. 5 is a diagram of the keypoint numbering of the skeleton data in an embodiment of the present invention;

FIG. 6 is a schematic diagram of multi-view skeleton segment fusion in an embodiment of the present invention;

FIG. 7 is a diagram of the human behavior self-supervised feature learning model in an embodiment of the present invention;

FIG. 8 is a schematic diagram of the human behavior feature encoder in an embodiment of the present invention;

FIG. 9 is a flowchart of the training steps of the human behavior recognition model in an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of the human behavior classifier in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the related art, a self-supervised learning approach is adopted: a deep neural network is first trained on a pretext task; the network obtained from the pretext-task training is then used as a feature encoder; finally, the feature encoder extracts human behavior features, and a classifier of simple structure (a fully connected layer, a nearest-neighbor classifier, a support vector machine, etc.) is trained with the features and labels of a small number of samples.

However, self-supervised learning methods for skeleton data are still few, and they have the following shortcomings:

1. They target only single-view cameras; when the human body is occluded by an object or by itself, the human behavior recognition accuracy drops sharply. A large number of cameras are now deployed in public areas, and multiple cameras often shoot the same scene from different viewing angles. A single-view self-supervised learning method cannot exploit the advantages of multi-view cameras; single-view features are not highly discriminative, which increases the difficulty of the subsequent classification task and limits the improvement of human behavior recognition accuracy.

2. The self-supervised learning method runs on the cloud server, occupies a large amount of its computing resources, and causes high latency in human behavior recognition tasks.
Hence, the technical idea of the present invention: a camera group shoots the same scene from different viewing angles to obtain human behavior video data of the different viewing angles and transmits it to the edge computing nodes connected to the camera group; the edge computing nodes collect, store, and preprocess the to-be-recognized human behavior video data of the different viewing angles within the same time period and input the preprocessed data into a human behavior feature encoder to obtain multi-view human behavior feature vectors; a cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing nodes and inputs them into a human behavior recognition model to obtain the human behavior recognition results of the human behavior video data of the different viewing angles. Because human behavior feature extraction is performed on the edge computing nodes, the cloud server only needs to perform behavior classification, which reduces its computing load and improves the real-time performance of recognition. Collecting to-be-recognized human behavior video data of different viewing angles increases the discriminability and expressive power of the features and improves the accuracy of human behavior recognition. And the multi-view human behavior features can be learned in a self-supervised manner using the computing resources of the edge computing nodes, without manual annotation of human behavior data.
Referring to fig. 1, the present invention provides a multi-view human behavior recognition system under an edge computing architecture, the system comprising:

a camera group, configured to shoot the same scene from different viewing angles to obtain to-be-recognized human behavior video data of the different viewing angles, and to transmit the to-be-recognized human behavior video data of the different viewing angles to the edge computing nodes connected to it;

edge computing nodes, configured to receive the human behavior video data of the different viewing angles transmitted by the camera group, perform data preprocessing on it, and train a preset human behavior self-supervised feature learning model based on the preprocessed human behavior video data of the different viewing angles to obtain the human behavior feature encoder; and further configured to transmit the human behavior video data of the different viewing angles to the cloud server, preprocess the human behavior video data of the different viewing angles, input the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmit the feature vectors to the cloud server;

a cloud server, configured to receive the multi-view human behavior feature vectors and the human behavior video data of the different viewing angles uploaded by the edge computing nodes, determine manually annotated behavior category labels for the human behavior video data of the different viewing angles, and train a preset model according to the manually annotated behavior category labels and the multi-view human behavior feature vectors to obtain the human behavior recognition model; and further configured to receive the multi-view human behavior feature vectors uploaded by the edge computing nodes and input them into the human behavior recognition model to obtain the human behavior recognition results of the human behavior video data of the different viewing angles.
In this embodiment, the camera group is composed of a plurality of cameras and serves as the sensing end. The cameras are network high-definition CCD cameras with a resolution of 1280 × 720 and a frame rate of 25 frames/second, and should support the RTSP real-time streaming protocol. The edge computing node is a high-performance workstation equipped with an Intel Xeon E5-2640 v4 @ 2.4 GHz processor, 64 GB of memory, and an NVIDIA RTX 3090 GPU. The software platform of the edge computing node uses Anaconda to build the Python runtime environment, with the NVIDIA CUDA runtime library, the cuDNN acceleration software, and the PyTorch deep learning library installed; the OpenPose pose-estimation toolkit is installed and its Python interface program is compiled. The cloud uses a rack-mounted cloud server equipped with an Intel Xeon Silver 4114 @ 2.2 GHz processor, 128 GB of memory, and an NVIDIA TESLA V100 GPU; its software platform likewise uses Anaconda to build the Python runtime environment, with the NVIDIA CUDA runtime library, the cuDNN acceleration software, and the PyTorch deep learning library installed. The cameras and the edge computing nodes, and the edge computing nodes and the cloud server, are connected by network cables and communicate using fixed IP addresses.
Based on the same inventive concept, an embodiment of the present invention provides a multi-view human behavior recognition method under an edge computing architecture; the implementation environment of the method may be the multi-view human behavior recognition system under an edge computing architecture shown in fig. 1. Referring to fig. 2, fig. 2 is a flowchart of the steps of a multi-view human behavior recognition method under an edge computing architecture according to an embodiment of the present application; the method includes the following steps:

Step S201: the camera group shoots the same scene from different viewing angles to obtain to-be-recognized human behavior video data of the different viewing angles, and transmits the to-be-recognized human behavior video data of the different viewing angles to the edge computing nodes connected to it.

When shooting the same scene, several cameras are erected around the scene at different angles or at different heights, so that video data of the same scene at different viewing angles, that is, to-be-recognized human behavior video data of different viewing angles, can be collected; the data is transmitted through network cables and gateway devices to the edge computing nodes deployed near the camera group.

Step S202: the edge computing node collects and stores the to-be-recognized human behavior video data of the different viewing angles within the same time period, performs data preprocessing on it, inputs the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmits the multi-view human behavior feature vectors to the cloud server.
After the edge computing node receives the video streams of the to-be-recognized human behavior video data of the different viewing angles, a synchronized multi-view image acquisition task is executed on the node. Specifically, human behavior video clips of the different viewing angles within the same time period are continuously collected and stored on the edge node:

$$V_i = \left[\, I_i^1, I_i^2, \ldots, I_i^T \,\right], \qquad i = 1, 2, \ldots, C$$

where C is the number of cameras, i is the camera number, T is the number of image frames per video clip, and $I_i^t$ is the t-th frame image of the i-th camera. The collected human behavior video data of the different viewing angles then undergo multi-view skeleton extraction, skeleton data preprocessing, and multi-view skeleton sequence fusion, and are input into the human behavior feature encoder, which outputs the multi-view human behavior feature vectors corresponding to the to-be-recognized human behavior video data of the different viewing angles; all the multi-view human behavior features are concatenated into one feature vector, which is uploaded to the cloud server. Using human behavior video data that shoot the same scene from different viewing angles solves the problem of insufficient discriminability of single-view features, improves the expressive power of the features, lowers the difficulty of the classification task, and improves the accuracy of human behavior classification.
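As an illustration of the synchronized acquisition task, the following is a minimal sketch assuming OpenCV and RTSP streams; the camera URLs, the clip length, and the lock-step read loop are illustrative assumptions, not details taken from the patent.

```python
# Sketch of synchronized multi-view clip collection on an edge node.
import cv2
import numpy as np

CAMERA_URLS = [                      # hypothetical fixed-IP RTSP endpoints
    "rtsp://192.168.1.101/stream",
    "rtsp://192.168.1.102/stream",
    "rtsp://192.168.1.103/stream",
]
T = 96  # frames per clip, e.g. just under 4 s at 25 frames/second

def collect_clips(urls=CAMERA_URLS, t=T):
    """Return V_i = [I_i^1, ..., I_i^T] for each camera i = 1..C."""
    caps = [cv2.VideoCapture(u) for u in urls]
    clips = [[] for _ in urls]
    for _ in range(t):
        for i, cap in enumerate(caps):
            ok, frame = cap.read()   # read in lock-step for rough synchrony
            if not ok:
                raise RuntimeError(f"camera {i} dropped its stream")
            clips[i].append(frame)
    for cap in caps:
        cap.release()
    return [np.stack(c) for c in clips]
```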
Step S203: the cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing nodes, inputs them into the human behavior recognition model, and obtains the human behavior recognition results of the to-be-recognized human behavior video data of the different viewing angles.

The cloud server receives the human behavior feature vector uploaded by the edge computing node, inputs the feature vector into the human behavior recognition model on the cloud server, and completes the final behavior recognition task. Because the feature extraction task is completed by the edge computing node, only the human behavior recognition task is executed in the cloud; the feature extraction task for the to-be-recognized human behavior video data of the different viewing angles is not. This reduces the computing load of the cloud server and improves the real-time performance of behavior recognition.

In this embodiment, through the above steps, the human behavior feature encoder obtained by self-supervised learning of multi-view human behavior features on the edge computing nodes allows human behavior feature extraction to be performed on the edge nodes, so that the cloud server only needs to run a simple human behavior classification model; this reduces the computing load of the cloud server and improves the real-time performance of human behavior recognition. At the same time, using the multi-view human behavior information improves the expressive power of the features and the accuracy of human behavior recognition.

The processing flow for the to-be-recognized data in multi-view human behavior recognition is similar to the processing flow for the sample data; the only differences are the target object and the subsequent operations.
As shown in fig. 3, training the preset human behavior self-supervised feature learning model includes the following steps:

Step S200-1: the camera group shoots the same scene from different viewing angles to obtain first sample human behavior video data of the different viewing angles, and transmits the first sample human behavior video data of the different viewing angles to the edge computing nodes connected to it;

Step S200-2: the edge computing node collects and stores the first sample human behavior video data of the different viewing angles within the same time period, performs data preprocessing on it, and trains the preset human behavior self-supervised feature learning model based on the preprocessed first sample human behavior video data of the different viewing angles to obtain the human behavior feature encoder.
In this embodiment, after the cameras finish collecting the first sample human behavior video data of the different viewing angles, they transmit the collected data as video streams to the edge computing node directly connected to them. After the edge computing node receives the video streams, a synchronized multi-view image acquisition task is executed on the node: human behavior video clips of the different viewing angles within the same time period are continuously collected and stored on the edge node as

$$V_i = \left[\, I_i^1, I_i^2, \ldots, I_i^T \,\right], \qquad i = 1, 2, \ldots, C$$

where C is the number of cameras, i is the camera number, T is the number of image frames per video clip, and $I_i^t$ is the t-th frame image of the i-th camera. The collected human behavior video data of the different viewing angles serve as source data for multi-view skeleton extraction, skeleton data preprocessing, multi-view skeleton sequence fusion, and training of the human behavior self-supervised feature learning model, yielding the human behavior feature encoder. In this way, the multi-view human behavior features are learned in a self-supervised manner using the computing resources of the edge computing nodes, without manual annotation of human behavior data.
As shown in fig. 4, the data preprocessing of the human behavior video data of the different viewing angles includes the steps of:

Step S200-2-1: determining skeleton data of the human behavior video data of the different viewing angles from the video data;

Step S200-2-2: preprocessing the skeleton data of the human behavior video data of the different viewing angles to obtain skeleton sequences of the human behavior video data of the different viewing angles;

Step S200-2-3: fusing the skeleton sequences of the human behavior video data of the different viewing angles to obtain a fused skeleton segment sequence.
Further, in step S200-2-1, determining the skeleton data of the human behavior video data of the different viewing angles from the video data includes:

calculating the positions of the human pose keypoints in each frame of the human behavior video data of the different viewing angles, the positions of the human pose keypoints being the skeleton data of the human behavior video data of the different viewing angles;

the skeleton data are computed as

$$P_i^t = \left[\, (x_j, y_j) \,\right]_{j=1}^{N}$$

where $P_i^t$ is the skeleton data extracted from $I_i^t$, the t-th frame image of the camera numbered i; x and y are the horizontal and vertical coordinates of a human pose keypoint in the image, j is the index of a keypoint, and N is the total number of human pose keypoints.

In this embodiment, this calculation is applied either to the to-be-recognized human behavior video data of different viewing angles or to the sample human behavior video data of different viewing angles. The skeleton comprises 18 human pose keypoints in total, numbered as shown in fig. 5.
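For concreteness, the per-frame skeleton extraction can be sketched as follows; the `estimate_pose` helper is a hypothetical stand-in for the pose estimator installed on the edge node (e.g. the OpenPose Python interface) and is not an API defined by the patent.

```python
# Sketch of per-frame skeleton extraction, assuming N = 18 pose keypoints.
import numpy as np

N_KEYPOINTS = 18  # keypoint numbering as shown in fig. 5

def estimate_pose(frame):
    """Hypothetical wrapper: return an (N, 2) array of (x_j, y_j) positions."""
    raise NotImplementedError("bind to the pose estimator on the edge node")

def skeleton_data(clip):
    """clip: iterable of frames I_i^t -> (T, N, 2) skeleton data for camera i."""
    return np.stack([estimate_pose(frame) for frame in clip])
```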
Further, in step S200-2-2, preprocessing the skeleton data of the human behavior video data of the different viewing angles to obtain the skeleton sequences of the human behavior video data of the different viewing angles includes the following operations.

The mean of the coordinates of all human pose keypoints in a frame image is subtracted from the coordinates of each human pose keypoint in that frame image:

$$\tilde{x}_j = x_j - \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad \tilde{y}_j = y_j - \frac{1}{N}\sum_{k=1}^{N} y_k$$

The skeleton feature of each frame image is determined:

$$s_i^t = \left[\, \tilde{x}_1, \tilde{y}_1, \tilde{x}_2, \tilde{y}_2, \ldots, \tilde{x}_N, \tilde{y}_N \,\right]$$

The skeleton sequence of the human behavior video data of each viewing angle is determined:

$$S_i = \left[\, s_i^1, s_i^2, \ldots, s_i^T \,\right]$$

The skeleton sequences of the human behavior video data of the different viewing angles are normalized:

$$\tilde{S}_i = \frac{S_i}{\max \left| S_i \right|}$$

where $s_i^t$ is the skeleton feature of the t-th frame image of the camera numbered i, $S_i$ is the skeleton sequence of the human behavior video data of one viewing angle, and $\tilde{S}_i$ is the normalized skeleton sequence of the human behavior video data of that viewing angle.
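Under the formulas above, this preprocessing reduces to a few NumPy operations; note that the max-absolute-value normalization in this sketch is an assumption, since the patent gives the normalization formula only as an image.

```python
# Sketch of skeleton preprocessing: per-frame centering, flattening into
# per-frame skeleton features, and sequence normalization.
import numpy as np

def preprocess_skeleton(P):
    """P: (T, N, 2) keypoints for one camera -> normalized sequence (T, 2N)."""
    centered = P - P.mean(axis=1, keepdims=True)  # subtract per-frame mean
    S = centered.reshape(P.shape[0], -1)          # s_i^t = [x~1, y~1, x~2, ...]
    return S / np.abs(S).max()                    # assumed normalization
```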
Further, in step S200-2-3, fusing the skeleton sequences of the human behavior video data of the different viewing angles to obtain the fused skeleton segment sequence includes:

dividing the skeleton sequence of each viewing angle equally along time nodes to obtain a plurality of skeleton segments;

randomly extracting one of the skeleton segments corresponding to each time node, and fusing the skeleton segments extracted at the multiple time nodes to obtain the fused skeleton segment sequence of the multi-view human behavior video data.

In this embodiment, 3 cameras are used as an example (C = 3); the procedure is shown in fig. 6. First, each skeleton sequence $\tilde{S}_i$ is divided in temporal order into 3 skeleton segments $\tilde{S}_i^{(1)}, \tilde{S}_i^{(2)}, \tilde{S}_i^{(3)}$. Then the set of first skeleton segments of the 3 cameras, $\{\tilde{S}_1^{(1)}, \tilde{S}_2^{(1)}, \tilde{S}_3^{(1)}\}$, is constructed, and one segment $X_1$ is randomly extracted from it. In the same way, random segments $X_2$ and $X_3$ are extracted from $\{\tilde{S}_1^{(2)}, \tilde{S}_2^{(2)}, \tilde{S}_3^{(2)}\}$ and $\{\tilde{S}_1^{(3)}, \tilde{S}_2^{(3)}, \tilde{S}_3^{(3)}\}$ respectively, yielding a skeleton segment sequence $[X_1, X_2, X_3]$ that contains multi-view human behavior information.
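A short sketch of this fusion step, assuming C = 3 cameras whose sequences have already been preprocessed into (T, 2N) arrays:

```python
# Sketch of multi-view skeleton segment fusion: split each camera's sequence
# into temporal segments, then draw one segment per time slot from a
# randomly chosen camera.
import numpy as np

def fuse_segments(sequences, n_segments=3, rng=None):
    """sequences: C arrays of shape (T, 2N) -> fused segments [X_1, ..., X_n]."""
    if rng is None:
        rng = np.random.default_rng()
    per_camera = [np.array_split(S, n_segments) for S in sequences]
    fused = []
    for slot in range(n_segments):
        cam = rng.integers(len(sequences))   # random viewing angle per slot
        fused.append(per_camera[cam][slot])  # segment X_{slot+1}
    return fused
```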
It should be noted that the to-be-recognized human behavior video data, the second sample human behavior video data, and the first sample human behavior video data all undergo data preprocessing, and the preprocessing processes are identical; only the operations executed afterwards differ. The to-be-recognized human behavior video data and the second sample human behavior video data are input into the human behavior feature encoder to extract multi-view human behavior feature vectors, whereas the first sample human behavior video data is input into the human behavior self-supervised feature learning model for training.

The fused skeleton segment sequence obtained after data preprocessing can be used as input to train the human behavior self-supervised feature learning model, which further improves the extraction accuracy of the human behavior feature encoder.
The fused skeleton segment sequence obtained after preprocessing can be reordered according to a plurality of ordering modes and annotated with an ordering-mode label;

the training of the preset human behavior self-supervised feature learning model based on the preprocessed first sample human behavior video data of the different viewing angles then comprises:

inputting the reordered fused skeleton segment sequences and their ordering-mode labels into the human behavior self-supervised feature learning model for training.

In this embodiment, the fused skeleton segment sequences obtained after preprocessing are randomly reordered to obtain reordered skeleton segment sequences $[X_{\sigma(1)}, X_{\sigma(2)}, X_{\sigma(3)}]$, where σ is a permutation of {1, 2, 3}. There are 6 ordering modes in total; the ordering modes and their labels are listed in Table 1 (a sketch of the reordering follows the table). The reordered skeleton segment sequences and their ordering-mode labels are input into the self-supervised feature learning model for training.
TABLE 1

Ordering mode      Label
[X1, X2, X3]       0
[X1, X3, X2]       1
[X2, X1, X3]       2
[X2, X3, X1]       3
[X3, X1, X2]       4
[X3, X2, X1]       5
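The reordering pretext task can be sketched directly from Table 1; the enumeration order of `itertools.permutations` happens to reproduce the label assignment of the table.

```python
# Sketch of the reordering pretext task: shuffle the fused segments
# [X1, X2, X3] and return the Table 1 label of the ordering used.
import itertools
import random

ORDERINGS = list(itertools.permutations(range(3)))  # label k -> row k of Table 1

def reorder(fused, rng=random):
    """fused: [X1, X2, X3] -> (reordered segment sequence, ordering-mode label)."""
    label = rng.randrange(len(ORDERINGS))
    reordered = [fused[j] for j in ORDERINGS[label]]
    return reordered, label
```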
As shown in fig. 7, after the reordered skeleton segment sequence and the ordering-mode label are input into the human behavior self-supervised feature learning model, the model must determine the ordering mode from the input skeleton segment sequence. Here g denotes the human behavior feature encoder and h denotes the ordering-mode classifier. The training process of the human behavior self-supervised feature learning model is as follows: the 3 skeleton segments obtained by multi-view skeleton segment fusion are input separately into the human behavior feature encoder g, which encodes each segment into a 128-dimensional feature; the 3 128-dimensional features are concatenated into a 384-dimensional human behavior feature vector. This vector is input into the ordering-mode classifier h, which outputs a prediction of the ordering mode. Let $\hat{y}$ denote the probability distribution of the predicted ordering mode:

$$\hat{y} = h\!\left(\left[\, g(X_1), g(X_2), g(X_3) \,\right]\right)$$

The ground-truth ordering mode y is generated from the ordering-mode label by one-hot encoding, and the model loss is computed with the following cross-entropy loss function:

$$L = -\sum_{i=1}^{6} y(i) \log \hat{y}(i)$$

The human behavior self-supervised feature learning model is trained by stochastic gradient descent, yielding the human behavior feature encoder g and the ordering-mode classifier h.
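A minimal PyTorch sketch of this training loop follows. The `Encoder` module is assumed to match fig. 8 (a concrete sketch of it appears after the next paragraph), and the two-layer structure of the ordering-mode classifier h is an assumption, as the patent does not specify it.

```python
# Sketch of the self-supervised training step: g encodes each segment to
# 128-D, h predicts one of the 6 orderings, cross-entropy loss, SGD.
import torch
import torch.nn as nn

g = Encoder()  # assumed fig. 8 encoder, sketched after the next paragraph
h = nn.Sequential(nn.Linear(384, 128), nn.ReLU(),
                  nn.Linear(128, 6))       # ordering-mode classifier (assumed)
opt = torch.optim.SGD(list(g.parameters()) + list(h.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()            # cross-entropy over the 6 orderings

def train_step(x1, x2, x3, labels):
    """x1..x3: (B, 36, 32) segment batches; labels: (B,) ordering labels."""
    feat = torch.cat([g(x1), g(x2), g(x3)], dim=1)  # 3 x 128-D -> 384-D
    loss = loss_fn(h(feat), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```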
In a possible embodiment, the structure of the human behavior feature encoder is shown in fig. 8; its input is a skeleton segment and its output is a feature vector of fixed length. Taking a skeleton sequence of length T = 96 as an example, after the sequence is divided into 3 equal segments, each segment has length 32; each frame in a segment contains the x and y coordinates of 18 keypoints, so the data dimension input to the feature encoder is (32, 36). In fig. 8, "Conv1D" denotes a one-dimensional convolutional layer, and the 3 numbers in parentheses after "Conv1D" are the number of output channels, the convolution kernel size, and the convolution stride of that layer. "BN" is a batch-normalization layer, used to prevent vanishing or exploding gradients and to speed up training; ReLU (rectified linear unit) is the activation function used by the feature encoder. The numbers outside the boxes are the dimensions of the output feature maps of the encoder layers. The encoder processes a skeleton segment as follows. The segment is passed through 6 convolutional layers; all 6 have kernel size 6, the 3rd and 6th layers have stride 2, and the remaining layers have stride 1. The first 3 convolutional layers each output 64 channels, and the last 3 each output 128 channels. A residual connection is used across every group of 3 convolutional layers; the residual branch is also a one-dimensional convolution, with kernel size 1 and stride 2. After the 6 convolutional layers, the output feature map has dimension (8, 128). This feature map is input into a max-pooling layer with stride 4, whose output has dimension (2, 128), and then into a flattening layer that arranges it into a one-dimensional vector; the vector passes through a fully connected layer with output dimension (256) and then a fully connected layer with output dimension (128), producing the 128-dimensional feature vector output by the encoder.
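A PyTorch sketch of the fig. 8 encoder under the stated layer parameters; the padding values and the activation between the two fully connected layers are assumptions chosen to reproduce the stated feature-map dimensions.

```python
# Sketch of the fig. 8 encoder: two residual groups of three Conv1D(kernel 6)
# layers (64 then 128 channels, stride 2 on the 3rd layer of each group),
# max pooling with stride 4, then FC-256 and FC-128.
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, stride):
    pad = "same" if stride == 1 else 2   # padding is an assumed choice
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=6, stride=stride, padding=pad),
        nn.BatchNorm1d(c_out),
        nn.ReLU(),
    )

class Encoder(nn.Module):
    """Skeleton segment (B, 36, 32) -> 128-dimensional feature vector."""
    def __init__(self, c_in=36, feat_dim=128):
        super().__init__()
        self.block1 = nn.Sequential(conv_bn(c_in, 64, 1),
                                    conv_bn(64, 64, 1),
                                    conv_bn(64, 64, 2))
        self.skip1 = nn.Conv1d(c_in, 64, kernel_size=1, stride=2)
        self.block2 = nn.Sequential(conv_bn(64, 128, 1),
                                    conv_bn(128, 128, 1),
                                    conv_bn(128, 128, 2))
        self.skip2 = nn.Conv1d(64, 128, kernel_size=1, stride=2)
        self.head = nn.Sequential(
            nn.MaxPool1d(4),                 # (B, 128, 8) -> (B, 128, 2)
            nn.Flatten(),                    # -> (B, 256)
            nn.Linear(256, 256), nn.ReLU(),  # FC with output dimension 256
            nn.Linear(256, feat_dim),        # FC with output dimension 128
        )

    def forward(self, x):
        x = self.block1(x) + self.skip1(x)   # residual across 3 conv layers
        x = self.block2(x) + self.skip2(x)
        return self.head(x)
```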
As shown in fig. 9, training the human behavior recognition model includes the following steps:

Step S200-3: the camera group shoots the same scene from different viewing angles to obtain second sample human behavior video data of the different viewing angles, and transmits the second sample human behavior video data of the different viewing angles to the edge computing node connected to it;

Step S200-4: the edge computing node uploads a preset number of second sample human behavior video data of the different viewing angles, collects and stores the second sample human behavior video data of the different viewing angles within the same time period, performs data preprocessing on it, inputs the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmits the multi-view human behavior feature vectors to the cloud server;

Step S200-5: the cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing node and the preset number of second sample human behavior video data of the different viewing angles, and trains a preset model according to the behavior category labels annotated on the preset number of second sample human behavior video data and the multi-view human behavior feature vectors to obtain the human behavior recognition model.
In this embodiment, the human behavior feature encoder obtained by self-supervised training can extract human behavior features that carry sufficient information and are well discriminable. Therefore, the edge nodes upload only a small number of human behavior video clips to the cloud server; human behavior category labels are annotated for these clips manually, and a human behavior classifier of simple structure is constructed. After training, a human behavior recognition model with a high recognition rate and a simple structure is obtained without manually annotating a large number of labels, which saves annotation effort. The human behavior classifier $f_{\mathrm{action}}$ is trained using the human behavior feature vectors uploaded by the edge nodes and the manually annotated behavior category labels; with the trained classifier, the cloud server only needs to perform human behavior classification, which reduces its computing load and improves the real-time performance of recognition.
In a possible implementation, the structure of the human behavior classifier $f_{\mathrm{action}}$, trained with the behavior category labels, is shown in fig. 10. The input of the classifier is a 384-dimensional human behavior feature vector, and it comprises 2 fully connected layers. The output of the first fully connected layer fc1 is 256-dimensional, with a ReLU activation function. The output dimension of the second fully connected layer fc2 equals the number K of behavior classes to be recognized, with a Softmax activation function. The model is trained with a cross-entropy loss function using stochastic gradient descent. With the trained classifier, the human behavior recognition task is completed by the following human behavior recognition model:

$$M = \underset{1 \le i \le K}{\arg\max}\; f_{\mathrm{fusion}}\!\left(\left[\, g(X_1), g(X_2), g(X_3) \,\right]\right)(i)$$

where $f_{\mathrm{fusion}}$ is the human behavior classifier, $\left[\, g(X_1), g(X_2), g(X_3) \,\right]$ is the multi-view human behavior feature vector, M is the human behavior recognition prediction result, (i) denotes the i-th element of a vector, and K is the number of behavior classes to be recognized.
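A sketch of the fig. 10 classifier and the prediction rule above; K is application-specific, and Softmax is folded into the training loss, since argmax over the logits gives the same prediction M at inference.

```python
# Sketch of the fig. 10 classifier: fc1 (384 -> 256, ReLU), fc2 (256 -> K),
# and argmax over the class scores as the prediction M.
import torch
import torch.nn as nn

K = 10  # number of behavior classes to recognize (application-specific)

f_fusion = nn.Sequential(
    nn.Linear(384, 256), nn.ReLU(),  # fc1: 384-D feature -> 256-D, ReLU
    nn.Linear(256, K),               # fc2: logits over the K behavior classes
)

def recognize(feat):
    """feat: (B, 384) multi-view feature vectors -> predicted class indices M."""
    with torch.no_grad():
        return f_fusion(feat).argmax(dim=1)
```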
Based on the same inventive concept, an embodiment of the present application provides a readable storage medium storing a multi-view human behavior recognition program under an edge computing architecture; when executed by a processor, the program implements the steps of the multi-view human behavior recognition method under an edge computing architecture according to the first aspect of the embodiments of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The multi-view human behavior recognition method and system under an edge computing architecture provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A multi-view human behavior recognition method under an edge computing architecture, the method comprising:

a camera group shoots the same scene from different viewing angles to obtain to-be-recognized human behavior video data of the different viewing angles, and transmits the to-be-recognized human behavior video data of the different viewing angles to edge computing nodes connected to the camera group;

the edge computing nodes collect and store the to-be-recognized human behavior video data of the different viewing angles within the same time period, perform data preprocessing on it, input the preprocessed data into a human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmit the multi-view human behavior feature vectors to a cloud server;

and the cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing nodes and inputs them into a human behavior recognition model to obtain the human behavior recognition results of the to-be-recognized human behavior video data of the different viewing angles.
2. The method of claim 1, further comprising:
the camera group captures the same scene from different visual angles to obtain first-sample human behavior video data of the different visual angles, and transmits the first-sample human behavior video data of the different visual angles to the edge computing node connected with the camera group; and
the edge computing node collects and stores the first-sample human behavior video data of the different visual angles within the same time period, performs data preprocessing on the first-sample human behavior video data of the different visual angles within the same time period, and trains a preset human behavior self-supervised feature learning model on the preprocessed first-sample human behavior video data of the different visual angles to obtain the human behavior feature encoder.
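A hedged sketch of this edge-side self-supervised training, assuming a PyTorch encoder and an ordering-prediction head (the framework, architecture, and hyperparameters are illustrative assumptions; the patent fixes none of them):

```python
# A minimal PyTorch sketch (assumed framework) of edge-side self-supervised
# training: the encoder learns by predicting which ordering was applied to a
# fused skeleton segment sequence (see claim 5).
import torch
import torch.nn as nn

def train_self_supervised(encoder, order_head, loader, epochs=10, lr=1e-3):
    """loader yields (reordered_sequence, ordering_label) pairs."""
    params = list(encoder.parameters()) + list(order_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for seq, order_label in loader:
            logits = order_head(encoder(seq))   # predict the ordering class
            loss = loss_fn(logits, order_label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder  # serves as the trained human behavior feature encoder
```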
3. The method of claim 2, further comprising:
the camera group captures the same scene from different visual angles to obtain second-sample human behavior video data of the different visual angles, and transmits the second-sample human behavior video data of the different visual angles to the edge computing node connected with the camera group;
the edge computing node uploads a preset number of the second-sample human behavior video data of the different visual angles to the cloud server, collects and stores the second-sample human behavior video data of the different visual angles within the same time period, performs data preprocessing on the second-sample human behavior video data of the different visual angles within the same time period, inputs the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmits the multi-view human behavior feature vectors to the cloud server; and
the cloud server receives the multi-view human behavior feature vectors uploaded by the edge computing node and the preset number of second-sample human behavior video data of the different visual angles, and trains a preset model according to behavior category labels annotated on the preset number of second-sample human behavior video data and the multi-view human behavior feature vectors, to obtain the human behavior recognition model.
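The cloud-side supervised training of claim 3 might look as follows; the linear classification head, optimizer, and hyperparameters are assumptions for illustration, with PyTorch again used only as an example framework:

```python
# A hedged sketch of cloud-side supervised training: a simple linear head (an
# assumption; the patent leaves the model unspecified) is fitted on uploaded
# multi-view feature vectors and their manually annotated behavior labels.
import torch
import torch.nn as nn

def train_recognition_model(feat_dim, num_classes, loader, epochs=20, lr=1e-3):
    """loader yields (multi_view_feature_vector, behavior_label) pairs."""
    model = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for features, label in loader:
            loss = loss_fn(model(features), label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model  # the human behavior recognition model
```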
4. The method according to claim 1 or 2, wherein the data preprocessing of the human behavior video data of the different visual angles comprises:
determining skeleton data of the human behavior video data of the different visual angles from the human behavior video data of the different visual angles;
preprocessing the skeleton data of the human behavior video data of the different visual angles to obtain skeleton sequences of the human behavior video data of the different visual angles; and
fusing the skeleton sequences of the human behavior video data of the different visual angles to obtain a fused skeleton segment sequence.
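As a sketch under assumed helper names, the three preprocessing steps of claim 4 chain together as follows; `extract_skeletons`, `to_sequences`, and `fuse_segments` are hypothetical stand-ins for the operations detailed in claims 6 to 8:

```python
# The three claim-4 steps chained in order; the three helper callables are
# hypothetical stand-ins for the operations of claims 6, 7, and 8 respectively.
def preprocess(videos_per_view, extract_skeletons, to_sequences, fuse_segments):
    skeletons = [extract_skeletons(video) for video in videos_per_view]  # claim 6
    sequences = [to_sequences(skel) for skel in skeletons]               # claim 7
    return fuse_segments(sequences)                                      # claim 8
```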
5. The method of claim 4, further comprising:
reordering the fused skeleton segment sequence obtained after the preprocessing according to a plurality of ordering modes, and attaching a corresponding ordering-mode label; and
wherein training the preset human behavior self-supervised feature learning model on the preprocessed first-sample human behavior video data of the different visual angles comprises:
inputting the reordered fused skeleton segment sequences and their ordering-mode labels into the human behavior self-supervised feature learning model for training.
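A possible rendering of this reordering step: the fused skeleton segments are shuffled under a fixed dictionary of permutations, and the permutation index becomes the self-supervision label. The number of orderings (4) and the fixed seed are arbitrary assumptions; the patent does not specify them:

```python
# Hypothetical rendering of the claim-5 reordering; num_orderings must not
# exceed len(segments)! (the number of available permutations).
import itertools
import random

def make_ordering_samples(segments, num_orderings=4, seed=0):
    """Return (reordered_segments, ordering_label) training pairs."""
    rng = random.Random(seed)
    perms = list(itertools.permutations(range(len(segments))))
    chosen = rng.sample(perms, num_orderings)  # fixed dictionary of orderings
    return [([segments[i] for i in perm], label)
            for label, perm in enumerate(chosen)]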
6. The method of claim 4, wherein determining the skeleton data of the human behavior video data of the different visual angles from the human behavior video data of the different visual angles comprises:
calculating the positions of the human body posture key points in each frame image of the human behavior video data of the different visual angles, the positions of the human body posture key points being the skeleton data of the human behavior video data of the different visual angles;
the skeleton data being calculated as

$$P_t^i = \left\{ \left( x_j^{t,i},\; y_j^{t,i} \right) \,\middle|\, j = 1, 2, \dots, N \right\}$$

wherein $P_t^i$ denotes the skeleton data of the $t$-th frame image from the $i$-th camera, $x$ and $y$ respectively denote the horizontal and vertical coordinates of a human body posture key point in the image, $j$ is the index of the human body posture key point, and $N$ is the total number of human body posture key points.
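A sketch of this keypoint extraction, assuming a hypothetical `estimate_pose(frame)` that returns the N (x, y) key points of one frame; the patent does not name a specific pose estimator:

```python
# A sketch of the claim-6 extraction; `estimate_pose(frame)` is a hypothetical
# pose estimator returning the N (x, y) key points of a single frame.
import numpy as np

def skeleton_data(frames, estimate_pose, num_keypoints):
    """P[t, j] = (x_j, y_j) of key point j in frame t, per the claim-6 formula."""
    P = np.zeros((len(frames), num_keypoints, 2))
    for t, frame in enumerate(frames):
        P[t] = estimate_pose(frame)  # shape (N, 2)
    return P
```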
7. The method of claim 6, wherein preprocessing the skeleton data of the human behavior video data of the different visual angles to obtain the skeleton sequences of the human behavior video data of the different visual angles comprises:
subtracting, from the coordinate position of each human body posture key point in each frame image, the mean value of the coordinate positions of all human body posture key points in that frame image:

$$\hat{x}_j^{t,i} = x_j^{t,i} - \frac{1}{N} \sum_{k=1}^{N} x_k^{t,i}, \qquad \hat{y}_j^{t,i} = y_j^{t,i} - \frac{1}{N} \sum_{k=1}^{N} y_k^{t,i}$$

determining the skeleton feature of each frame image:

$$s_t^i = \left[ \hat{x}_1^{t,i}, \hat{y}_1^{t,i}, \hat{x}_2^{t,i}, \hat{y}_2^{t,i}, \dots, \hat{x}_N^{t,i}, \hat{y}_N^{t,i} \right]$$

determining the skeleton sequence of the human behavior video data of each visual angle:

$$S^i = \left[ s_1^i, s_2^i, \dots, s_T^i \right]$$

and normalizing the skeleton sequences of the human behavior video data of the different visual angles:

$$\tilde{S}^i = \operatorname{Norm}\left( S^i \right)$$

wherein $s_t^i$ is the skeleton feature of the $t$-th frame image from the $i$-th camera, $T$ is the number of frames, $S^i$ is the skeleton sequence of the human behavior video data of the $i$-th visual angle, $\operatorname{Norm}(\cdot)$ denotes the normalization operation, and $\tilde{S}^i$ is the normalized skeleton sequence of the human behavior video data of the $i$-th visual angle.
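A numerical rendering of these steps for one camera view: per-frame mean-centering, flattening into per-frame skeleton features, and stacking into a sequence. The original normalization formula is rendered only as an image in the source, so the max-absolute scaling below is an assumption:

```python
# Numerical rendering of the claim-7 pipeline for one camera view; the
# max-absolute scaling in the last line is an assumed normalization.
import numpy as np

def skeleton_sequence(P):
    """P: (T, N, 2) key points -> normalized (T, 2N) skeleton sequence."""
    centered = P - P.mean(axis=1, keepdims=True)  # subtract per-frame key-point mean
    S = centered.reshape(P.shape[0], -1)          # s_t = [x^_1, y^_1, ..., x^_N, y^_N]
    return S / (np.abs(S).max() + 1e-8)           # assumed max-abs normalization
```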
8. The method of claim 4, wherein fusing the skeleton sequences of the human behavior video data of the different visual angles to obtain the fused skeleton segment sequence comprises:
dividing the skeleton sequence of the human behavior video data of each visual angle equally along time nodes to obtain a plurality of skeleton segments; and
for each time node, randomly extracting any one of the skeleton segments corresponding to that time node across the visual angles, and fusing the skeleton segments corresponding to the plurality of time nodes to obtain the fused skeleton segment sequence of the human behavior video data of the different visual angles.
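A sketch of this fusion, under the assumption that all views share the same frame count T and that T is divisible by the number of segments:

```python
# Sketch of the claim-8 fusion: split each view's (T, D) sequence into equal
# time segments, then pick each time slot's segment from a randomly chosen view.
import random

def fuse_segments(sequences, num_segments, seed=None):
    """sequences: per-view (T, D) arrays, equal T divisible by num_segments."""
    rng = random.Random(seed)
    step = len(sequences[0]) // num_segments
    fused = []
    for k in range(num_segments):
        view = rng.randrange(len(sequences))       # random view for this slot
        fused.append(sequences[view][k * step:(k + 1) * step])
    return fused  # the fused skeleton segment sequence
```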
9. The method of claim 3, wherein the human behavior recognition model outputs the human behavior recognition prediction result according to the following formula:

$$m = \arg\max_{i \in \{1, \dots, K\}} f_{\mathrm{fusion}}\left( \left[ g(X_1), g(X_2), g(X_3) \right] \right)(i)$$

wherein $f_{\mathrm{fusion}}$ is the human behavior classifier, $g(X_1), g(X_2), g(X_3)$ are the multi-view human behavior feature vectors, $m$ is the human behavior recognition prediction result, $(i)$ denotes taking the $i$-th element of the vector, and $K$ is the number of behavior categories to be recognized.
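A worked version of this prediction, assuming `f_fusion` maps the concatenated multi-view feature vectors to a K-dimensional class-score vector:

```python
# Worked version of the claim-9 prediction; `f_fusion` is assumed to return a
# K-dimensional class-score vector for the concatenated multi-view features.
import numpy as np

def predict_behavior(f_fusion, g1, g2, g3):
    """m = argmax_i f_fusion([g(X1), g(X2), g(X3)])(i)."""
    scores = f_fusion(np.concatenate([g1, g2, g3]))  # K behavior-class scores
    return int(np.argmax(scores))                    # prediction result m
```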
10. A multi-view human behavior identification system under an edge computing architecture, the system comprising:
a camera group, configured to capture the same scene from different visual angles to obtain to-be-identified human behavior video data of the different visual angles, and to transmit the to-be-identified human behavior video data of the different visual angles to an edge computing node connected with the camera group;
the edge computing node, configured to receive the human behavior video data of the different visual angles transmitted by the camera group, perform data preprocessing on the human behavior video data of the different visual angles, and train a preset human behavior self-supervised feature learning model on the data-preprocessed human behavior video data of the different visual angles to obtain a human behavior feature encoder; and further configured to transmit human behavior video data of the different visual angles to a cloud server, perform data preprocessing on the human behavior video data of the different visual angles, input the preprocessed data into the human behavior feature encoder to obtain multi-view human behavior feature vectors, and transmit the multi-view human behavior feature vectors to the cloud server; and
the cloud server, configured to receive the multi-view human behavior feature vectors and the human behavior video data of the different visual angles uploaded by the edge computing node, determine manually annotated behavior category labels for the human behavior video data of the different visual angles, and train a preset model according to the manually annotated behavior category labels and the multi-view human behavior feature vectors to obtain a human behavior recognition model; and further configured to receive the multi-view human behavior feature vectors uploaded by the edge computing node, and input the human behavior feature vectors into the human behavior recognition model to obtain human behavior recognition results for the human behavior video data of the different visual angles.
CN202110891098.4A 2021-08-04 2021-08-04 Multi-view pedestrian behavior identification method and system under edge computing architecture Active CN113743221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110891098.4A CN113743221B (en) 2021-08-04 2021-08-04 Multi-view pedestrian behavior identification method and system under edge computing architecture


Publications (2)

Publication Number Publication Date
CN113743221A true CN113743221A (en) 2021-12-03
CN113743221B CN113743221B (en) 2022-05-20

Family

ID=78730161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110891098.4A Active CN113743221B (en) 2021-08-04 2021-08-04 Multi-view pedestrian behavior identification method and system under edge computing architecture

Country Status (1)

Country Link
CN (1) CN113743221B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038420A (en) * 2017-11-21 2018-05-15 华中科技大学 Human body behavior recognition method based on depth video
CN109558781A (en) * 2018-08-02 2019-04-02 北京市商汤科技开发有限公司 Multi-view video recognition method and apparatus, device, and storage medium
CN109785322A (en) * 2019-01-31 2019-05-21 北京市商汤科技开发有限公司 Monocular human body pose estimation network training method, image processing method, and apparatus
CN109886172A (en) * 2019-02-01 2019-06-14 深圳市商汤科技有限公司 Video behavior recognition method and apparatus, electronic device, storage medium, and product
CN110458944A (en) * 2019-08-08 2019-11-15 西安工业大学 Human skeleton reconstruction method based on dual-view Kinect joint point fusion
CN111275583A (en) * 2020-01-20 2020-06-12 上海大学 Service method based on face recognition and database
CN111405241A (en) * 2020-02-21 2020-07-10 中国电子技术标准化研究院 Edge computing method and system for video surveillance
CN112347875A (en) * 2020-10-26 2021-02-09 清华大学 Edge-collaborative target detection method and device based on region division
CN112884822A (en) * 2021-02-09 2021-06-01 北京工业大学 Skeleton extraction method for multi-view human body image sequences based on the RepNet model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAORAN WANG et al.: "Skeleton edge motion networks for human action recognition", Neurocomputing *
YOU Wei, WANG Xue: "Research on an edge computing method for human behavior skeleton feature recognition", Chinese Journal of Scientific Instrument *
WANG Jiaqiang: "Design and implementation of an elderly care system based on fall detection", China Masters' Theses Full-text Database, Information Science and Technology Series *
PEI Xiaomin et al.: "Human behavior recognition method using a deep learning network with spatio-temporal feature fusion", Infrared and Laser Engineering *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565970A (en) * 2022-01-27 2022-05-31 内蒙古工业大学 High-precision multi-angle behavior recognition method based on deep learning

Also Published As

Publication number Publication date
CN113743221B (en) 2022-05-20

Similar Documents

Publication Publication Date Title
CN110235138B (en) System and method for appearance search
US9251425B2 (en) Object retrieval in video data using complementary detectors
CN108846365B (en) Detection method and device for fighting behavior in video, storage medium and processor
CN111784685A (en) Power transmission line defect image identification method based on cloud edge cooperative detection
CN110991283A (en) Re-recognition and training data acquisition method and device, electronic equipment and storage medium
CN111798456A (en) Instance segmentation model training method and device and instance segmentation method
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN110781964A (en) Human body target detection method and system based on video image
CN113298789A (en) Insulator defect detection method and system, electronic device and readable storage medium
CN112668410B (en) Sorting behavior detection method, system, electronic device and storage medium
CN112381132A (en) Target object tracking method and system based on fusion of multiple cameras
CN111177469A (en) Face retrieval method and face retrieval device
CN112070071B (en) Method and device for labeling objects in video, computer equipment and storage medium
CN112132130B (en) Real-time license plate detection method and system for whole scene
CN111626090A (en) Moving target detection method based on depth frame difference convolutional neural network
CN113743221B (en) Multi-view pedestrian behavior identification method and system under edge computing architecture
CN113516102A (en) Deep learning parabolic behavior detection method based on video
CN113033523B (en) Method and system for constructing falling judgment model and falling judgment method and system
Jeon et al. Leveraging future trajectory prediction for multi-camera people tracking
CN113936175A (en) Method and system for identifying events in video
CN112288702A (en) Road image detection method based on Internet of vehicles
CN115937492A (en) Transformer equipment infrared image identification method based on feature identification
CN115115713A (en) Unified space-time fusion all-around aerial view perception method
CN113869122A (en) Distribution network engineering reinforced control method
CN113824989A (en) Video processing method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant