CN107194559B - Workflow identification method based on three-dimensional convolutional neural network - Google Patents

Workflow identification method based on three-dimensional convolutional neural network Download PDF

Info

Publication number
CN107194559B
Authority
CN
China
Prior art keywords
frame
workflow
neural network
video
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710335309.XA
Other languages
Chinese (zh)
Other versions
CN107194559A (en)
Inventor
胡海洋
丁佳民
陈洁
胡华
程凯明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Taoyi Data Technology Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201710335309.XA priority Critical patent/CN107194559B/en
Publication of CN107194559A publication Critical patent/CN107194559A/en
Application granted granted Critical
Publication of CN107194559B publication Critical patent/CN107194559B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0633Workflow analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation


Abstract

The invention discloses a workflow identification method based on a three-dimensional convolutional neural network. Existing approaches divide the different process tasks in advance and manually label the different action behaviors while analyzing the video, which does not meet the automation requirement of intelligent manufacturing. The invention first provides an inter-frame difference method with an adaptive threshold, mainly used to segment the regions of moving objects from a complex background, thereby reducing the time complexity of subsequent feature extraction and model training; second, the 3D convolutional neural network is improved so that it can fully adapt to a factory environment with multiple monitoring devices, and views from different angles are fused by weight in a view pooling layer; finally, a new action division method is provided that automatically divides continuous production actions in the video, thereby realizing an automated workflow identification process.

Description

Workflow identification method based on three-dimensional convolutional neural network
Technical Field
The invention belongs to the technical field of workflow identification and is used for quickly and accurately identifying and detecting production and manufacturing processes.
Background
Intelligent manufacturing is a further development of manufacturing automation: artificial intelligence techniques are widely applied to links of the industrial manufacturing process such as engineering design, process design, production scheduling and fault diagnosis, making the manufacturing process intelligent and greatly improving productivity. Workflow recognition has attracted attention from industry and the research community as an important technical direction of intelligent manufacturing. Cameras installed in a manufacturing workshop capture the whole process of production scheduling on a production line; the video is then processed so that the industrial production flow can be identified and monitored quickly and accurately, which plays an important role in protecting the personal safety of staff, reducing production overhead, ensuring product quality, and optimizing production scheduling and process specification.
However, workflow identification has its own complexities and particularities. First, a production workshop contains many machines, transport vehicles, auxiliary equipment and other objects that often occlude one another, and the similarity of different process operations together with frequent changes of light intensity in the workshop challenge video and image analysis. Furthermore, the dynamic nature of the production workflow makes the identification process complex and prone to bias: different tasks in a workflow tend to have different execution times, and there is no explicit boundary between the start and end of a task; these tasks may involve both human and machine actions, and actions irrelevant to the workflow must be distinguished from the actual production tasks. These aspects make conventional motion/pose recognition methods that rely on target detection and tracking difficult to adapt to complex factory manufacturing environments. In addition, although some researchers have studied workflow identification, how to automatically divide the production processes/actions of the image sequence in a video has not been clearly defined; most existing work divides the different process tasks in advance and manually labels the different action behaviors while analyzing the video, which obviously does not meet the automation requirement of intelligent manufacturing.
Disclosure of Invention
To address this state of research, the invention provides a workflow identification framework with stronger robustness. In this framework, an inter-frame difference method with an adaptive threshold is first provided, mainly used to segment the regions of moving objects from a complex background, thereby reducing the time complexity of subsequent feature extraction and model training; second, the 3D convolutional neural network is improved so that it can fully adapt to a factory environment with multiple monitoring devices, and views from different angles are fused by weight in a view pooling layer; finally, a new action division method is provided that automatically divides continuous production actions in the video, thereby realizing an automated workflow identification process.
The method comprises the following specific steps:
step (1), exporting a workflow video containing multiple visual angles from a data set, and acquiring the video resolution and the frame number of the workflow video at each visual angle;
step (2), initializing the inter-frame difference threshold of the workflow video of each visual angle; performing steps (3) to (11) on the workflow video of each visual angle respectively;
step (3), setting t to be 2;
step (4), reading three consecutive video frames t-1, t and t+1, and performing graying and median filtering on the three frames;
step (5), performing an inter-frame difference operation on the first pair of frames (t-1, t) and the second pair of frames (t, t+1) respectively to obtain two inter-frame difference images;
step (6), dynamically updating an interframe difference threshold according to the two interframe difference images obtained in the step (5); the method for dynamically updating the interframe difference threshold comprises the following steps:
6.1 setting l = 1; the inter-frame difference threshold of frame t is initialized as τ_1^t = (max{d_k} + min{d_k}) / 2, where d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the difference image, and min{d_k} is the minimum pixel value in the difference image;
6.2 letting τ_{l+1}^t = (1/2) · [ (1/N_1) · Σ_{d_k ≥ τ_l^t} d_k + (1/N_2) · Σ_{d_k < τ_l^t} d_k ], where N_1 and N_2 are the total numbers of pixels with d_k ≥ τ_l^t and with d_k < τ_l^t, respectively;
6.3 if τ_{l+1}^t = τ_l^t, assigning τ_{l+1}^t to τ_1^t as the final threshold of frame t; otherwise, letting l = l + 1 and repeating step 6.2;
step (7), performing binarization on the current frame according to the inter-frame difference threshold obtained in step (6): pixels larger than the threshold are set to 1, and pixels smaller than the threshold are set to 0;
step (8), performing an AND operation on the two inter-frame difference images to obtain a three-frame difference image, and obtaining the center coordinates of the interest points by a blob extraction method;
step (9), segmenting the extracted interest points from the original image of the current frame;
step (10), incrementing t by 1 and repeating steps (4) to (9) until t equals the index of the last frame of the workflow video minus 1, keeping the segmentation size of step (9) unchanged throughout; storing the interest-point images obtained in step (9) in each iteration, in order, as interest-point videos, and classifying the interest-point videos according to the classification rules of the data set;
step (11), randomly selecting 90% of the interest point videos obtained in the step (10) as a training set, and taking the rest as a test set;
step (12), constructing a multi-view three-dimensional convolution neural network, and initializing the number of training rounds to be 5000; the multi-view three-dimensional convolution neural network construction method comprises the following steps:
12.1 convolution and pooling operations are as follows:
initializing a four-dimensional convolution kernel of size 9 × 10 for the first convolution layer, with a sigmoid activation function; the first pooling layer has a window size of 2 and a stride of 2;
initializing a four-dimensional convolution kernel of size 9 × 7 × 30 for the second convolution layer, with a sigmoid activation function; the second pooling layer has a window size of 2 and a stride of 2;
initializing a four-dimensional convolution kernel of size 9 × 8 × 5 × 50 for the third convolution layer, with a sigmoid activation function; the third pooling layer has a window size of 2 and a stride of 2;
initializing a four-dimensional convolution kernel of size 4 × 3 × 150 for the fourth convolution layer, with a sigmoid activation function; the fourth pooling layer has a window size of 2 and a stride of 2;
12.2 initializing each feature map weight parameter α_{t1} in the weighted-average view pooling layer to a random value in [0,1], with Σ_{t1} α_{t1} = 1; the weighted-average view pooling operation in the weighted-average view pooling layer is
a = Σ_{t1} [ exp(α_{t1}) / Σ_{t1′} exp(α_{t1′}) ] · p_{t1},
where a is the weighted-average feature map after the weighted-average view pooling operation, t1 is the serial number of a pooled feature map after the convolution and pooling operations, α_{t1} is the weight of the pooled feature map with serial number t1, exp denotes the exponential function with base e, and p_{t1} is the pooled feature map with serial number t1;
12.3, respectively initializing a convolution kernel of 3000 × 1500 and 1500 × 750 for the first two fully-connected layers, and setting an activation function as Relu; inputting the weighted average characteristic graph after the weighted average view pooling operation into the front two fully-connected layers;
12.4 initialize a 750 × 14 convolution kernel for the last fully connected layer and set the Softmax classification function.
Step (13), randomly selecting 20 videos from a training set corresponding to the workflow videos of each visual angle, inputting the 20 videos into the multi-view three-dimensional convolution neural network in the step (12) for feature training, and outputting training errors;
step (14), randomly selecting 10 videos from a training set corresponding to the workflow videos of each visual angle, inputting the 10 videos into a multi-view three-dimensional convolutional neural network for verification, and obtaining the accuracy of classification and identification of the multi-view three-dimensional convolutional neural network;
step (15), repeating the steps (13) to (14), and subtracting 1 from the number of training rounds each time until the number of training rounds is 0 to obtain a trained multi-view three-dimensional convolution neural network;
step (16), testing the multi-view three-dimensional convolution neural network in the step (15) by using a test set corresponding to the workflow video of each visual angle;
step (17), acquiring the resolution and the frame number of the newly input workflow video, and initializing an interframe difference threshold; setting t to be 2;
step (18), extracting the center coordinates of the interest points in two adjacent frames according to steps (4) to (8), and calculating the distance between the two center coordinates; if the distance is greater than a set threshold T, marking the frame as a motion state S1; otherwise, marking it as a relatively static state S0;
step (19), incrementing t by 1 and repeating step (18) until t equals the index of the last frame of the newly input workflow video minus 1; counting the numbers of consecutive S0 and S1 states; when the number of consecutive S0 or S1 states is greater than or equal to N, segmenting the target interest points in the frames corresponding to those consecutive S0 or S1 states and storing them in a frame queue; otherwise, discarding the frames corresponding to those consecutive S0 or S1 states.
step (20), for each set of frames in the frame queue corresponding to consecutive S0 or S1 states, extracting consecutive key frames starting from the i-th frame, where i > 5, so that the number of key frames is the same as the number of frames of each classified video segment in the data set.
Step (21), inputting the videos formed by the key frames in the step (20) according to the sequence into the multi-view three-dimensional convolution neural network trained in the step (15) to classify and recognize the staff behaviors;
and (22) comparing the behavior type obtained in the step (21) with a predefined standard workflow.
The invention has the following beneficial effects:
the workflow identification method based on the three-dimensional convolution neural network mainly comprises the following functional modules: the device comprises a moving object segmentation module, a behavior identification module and an action division module.
The moving target segmentation module mainly segments target interest points from the image and video sequences. Because the target motion in a workflow video sequence is relatively large while the background is basically static, two adjacent frames can be subtracted to obtain an inter-frame difference image, and the moving target can then be segmented according to the relation between the pixel difference and the threshold. The adaptive three-frame difference method adopted here performs an AND operation on the inter-frame difference images obtained from the first pair and the second pair of the three video frames to obtain a three-frame difference image, and the threshold is adjusted automatically according to the preceding inter-frame difference image, which effectively suppresses the influence of noise;
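As an illustration only, this adaptive three-frame difference could be sketched in Python with OpenCV and NumPy as below; the function names, the 5-pixel median filter and the convergence tolerance eps are assumptions of the sketch, not values taken from the patent.

```python
import cv2
import numpy as np

def adaptive_threshold(diff, eps=0.5):
    """Iteratively refine the threshold of one inter-frame difference image
    (the scheme of steps 6.1-6.3: mean of the two class means until convergence)."""
    tau = (float(diff.max()) + float(diff.min())) / 2.0       # initial threshold
    while True:
        high, low = diff[diff >= tau], diff[diff < tau]
        new_tau = 0.5 * ((high.mean() if high.size else 0.0)
                         + (low.mean() if low.size else 0.0))
        if abs(new_tau - tau) < eps:                          # converged
            return new_tau
        tau = new_tau

def three_frame_difference(prev_bgr, cur_bgr, next_bgr):
    """Binary moving-region mask of the middle frame from three consecutive frames."""
    gray = [cv2.medianBlur(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), 5)
            for f in (prev_bgr, cur_bgr, next_bgr)]
    d1 = cv2.absdiff(gray[1], gray[0])                        # first pair (t-1, t)
    d2 = cv2.absdiff(gray[2], gray[1])                        # second pair (t, t+1)
    b1 = (d1 > adaptive_threshold(d1)).astype(np.uint8)
    b2 = (d2 > adaptive_threshold(d2)).astype(np.uint8)
    return cv2.bitwise_and(b1, b2)                            # AND of the two binary maps
```

The returned mask can then be passed to a blob-extraction routine such as cv2.connectedComponentsWithStats to obtain the interest-point centers used in the later steps.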
The behavior recognition module performs behavior recognition on the moving target using a 3D convolutional neural network with multi-view learning capability. To achieve multi-view fusion, a view-pooling layer is used to fuse the information of all views. The multi-view 3D-CNN contains several independent 3D-CNNs that extract features from the image sequences of different views; the feature descriptors extracted from different views are then fused in the view pooling layer, where view-related features are learned; finally, a fully connected neural network (FNN) with a softmax classifier performs the final identification;
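A minimal sketch of the weighted-average view pooling in plain NumPy, assuming (consistently with the formula of step 12.2) that the per-view pooled feature maps are combined with softmax-normalized learnable weights; the array shapes and names are illustrative.

```python
import numpy as np

def weighted_average_view_pooling(view_features, alpha):
    """Fuse per-view pooled feature maps into a single weighted-average map.

    view_features: array of shape (V, ...) holding one pooled feature map per view.
    alpha:         array of shape (V,) with one learnable weight per view.
    """
    w = np.exp(alpha) / np.exp(alpha).sum()                 # normalized view weights
    return np.tensordot(w, view_features, axes=([0], [0]))  # weighted sum over views

# usage: three views, each yielding a 50-channel 4x4x4 pooled volume (illustrative sizes)
views = np.random.rand(3, 50, 4, 4, 4)
fused = weighted_average_view_pooling(views, np.random.rand(3))   # shape (50, 4, 4, 4)
```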
The action partitioning module defines two states: a motion state and a relatively static state. The center coordinate of the interest point is taken for each frame; when the interest point moves, its center coordinate moves as well. The difference between the interest-point center coordinates of two adjacent frames can therefore represent the state of the current interest point, and dynamic/static partitioning is achieved in this way;
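A sketch of this state-based division, assuming the per-frame interest-point centers have already been extracted; the distance threshold T and the minimum run length N correspond to the parameters of steps (18) and (19), and the helper names are illustrative.

```python
import math

S0, S1 = "static", "moving"          # relatively static / motion states

def label_states(centers, T):
    """Label each pair of adjacent frames from the displacement of the interest-point center."""
    states = []
    for (x1, y1), (x2, y2) in zip(centers[:-1], centers[1:]):
        states.append(S1 if math.hypot(x2 - x1, y2 - y1) > T else S0)
    return states

def split_actions(states, N):
    """Keep only runs of identical states with length >= N; each kept run is one
    candidate action segment given as (start index, end index, state)."""
    segments, start = [], 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            if i - start >= N:
                segments.append((start, i - 1, states[start]))
            start = i
    return segments
```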
The workflow identification method provided by the invention effectively addresses two problems of workflow identification in complex environments: first, the mutual occlusion of machines, transport vehicles, auxiliary instruments and other objects in a production workshop, the similarity of different process operations, and the influence of frequent light-intensity changes in the workshop on workflow identification; second, how to automatically divide the production processes/actions of the image sequence in a video.
Drawings
FIG. 1 is a schematic diagram of a multi-view three-dimensional convolutional neural network construction;
fig. 2 is a schematic diagram of the division of the working flow.
Detailed Description
The invention is further illustrated by the following figures and examples.
First, concept definition and symbol description are performed:
τ_l^t: inter-frame difference threshold, where t denotes the current frame number and l ≥ 1 denotes the recursion order; d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the difference image, and min{d_k} is the minimum pixel value in the difference image; N_1 and N_2 denote the total numbers of pixels with d_k ≥ τ_l^t and with d_k < τ_l^t, respectively.
a: the weighted-average feature map after the weighted-average view pooling operation.
t1: serial number of a pooled feature map after the convolution and pooling operations.
α_{t1}: weight of the pooled feature map with serial number t1.
p_{t1}: pooled feature map with serial number t1.
Secondly, the workflow identification method based on the three-dimensional convolution neural network comprises the following implementation steps:
(1) Moving object segmentation: video monitoring equipment on a production line is usually mounted at a high position, so most of the area in the monitoring picture is factory background irrelevant to workflow identification; extracting feature vectors directly from the whole picture would greatly increase the difficulty of feature extraction and the computation time. Therefore, a three-frame difference method with an adaptive threshold is used to segment the moving object (interest point) parts of the video, reducing the workload of the later steps. Specifically, the method comprises the following steps:
(1.1) exporting the multi-view workflow video from the data set, and acquiring the video resolution and the frame number of the workflow video at each view;
(1.2) initializing the inter-frame difference threshold of the workflow video of each visual angle and setting t = 2; steps (1.3)-(1.9) are performed on the workflow video of each visual angle respectively;
(1.3) reading a video frame t and two adjacent frames t-1 and t +1 thereof, and carrying out graying and median filtering processing on the three video frames;
(1.4) performing interframe difference operation on the first two frames and the second two frames respectively to obtain two interframe difference images;
(1.5) dynamically updating the interframe difference threshold according to the two interframe difference images obtained in the step (1.4), wherein the updating method comprises the following steps:
(1.5.1) setting l = 1; the inter-frame difference threshold of frame t is initialized as τ_1^t = (max{d_k} + min{d_k}) / 2, where d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the difference image, and min{d_k} is the minimum pixel value in the difference image;
(1.5.2) letting τ_{l+1}^t = (1/2) · [ (1/N_1) · Σ_{d_k ≥ τ_l^t} d_k + (1/N_2) · Σ_{d_k < τ_l^t} d_k ], where N_1 and N_2 are the total numbers of pixels with d_k ≥ τ_l^t and with d_k < τ_l^t, respectively;
(1.5.3) if τ_{l+1}^t = τ_l^t, assigning τ_{l+1}^t to τ_1^t as the final threshold of frame t; otherwise, letting l = l + 1 and repeating step (1.5.2);
(1.6) carrying out binarization processing on the current frame (namely the middle frame) according to the inter-frame differential threshold obtained in the step (1.5), wherein pixel points larger than the inter-frame differential threshold are set as 1, and pixel points smaller than the inter-frame differential threshold are set as 0;
(1.7) performing an AND operation on the two inter-frame difference images to obtain a three-frame difference image, and obtaining the center coordinates of the interest points by a blob extraction method (see the sketch after step (1.9));
(1.8) segmenting the extracted interest points from the original image of the current frame;
(1.9) incrementing t by 1 and repeating steps (1.3)-(1.8) until t equals the index of the last frame of the workflow video minus 1, keeping the segmentation size of step (1.8) unchanged throughout; storing the interest-point images obtained in step (1.8) in each iteration, in order, as interest-point videos, and classifying the interest-point videos according to the classification rules of the data set;
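Steps (1.7)-(1.8) could be realized roughly as below; cv2.connectedComponentsWithStats is used here as one possible blob-extraction routine, and the fixed 64 × 64 crop size is an assumption of the sketch, not a value given in the patent.

```python
import cv2
import numpy as np

def crop_interest_point(frame, mask, crop=(64, 64)):
    """Locate the largest blob in the binary three-frame difference mask and cut a
    fixed-size patch around its centroid from the original frame."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
    if n < 2:                                               # only background found
        return None
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])    # skip background label 0
    cx, cy = centroids[largest].astype(int)
    h, w = crop
    x0 = int(np.clip(cx - w // 2, 0, frame.shape[1] - w))
    y0 = int(np.clip(cy - h // 2, 0, frame.shape[0] - h))
    return frame[y0:y0 + h, x0:x0 + w]
```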
(2) Behavior identification based on a multi-view three-dimensional convolutional neural network: an inspection of current manufacturing production lines shows that the same working scene is often monitored synchronously and in real time by multiple cameras from different angles, so as to guarantee product quality and worker safety. Exploiting this characteristic, multi-view feature extraction and fusion effectively reduces the influence of the complex factory environment on behavior recognition and improves recognition accuracy. The specific steps are as follows:
(2.1) selecting 90% of the interest point videos obtained in the step (1) as a training set, and taking the rest as a test set;
(2.2) constructing a multi-view three-dimensional convolutional neural network (see FIG. 1) and initializing the number of training rounds to 5000; the network is constructed as follows (an illustrative code sketch follows after step (2.6)):
the operation processes of convolution and pooling are (2.2.1) - (2.2.4):
(2.2.1) initializing a four-dimensional convolution kernel of size 9 x 10 for the first convolution layer, the activation function being sigmoid, the first pooling layer window size being 2, and the step size being 2;
(2.2.2) initializing a four-dimensional convolution kernel of size 9 x 7 x 30 for the second convolutional layer, with an activation function of sigmoid, a second pooling layer window size of 2, and a step size of 2;
(2.2.3) initializing a four-dimensional convolution kernel of size 9 x 8 x 5 x 50 for the third convolutional layer, with an activation function of sigmoid, a third pooling layer window size of 2, and a step size of 2;
(2.2.4) initializing a four-dimensional convolution kernel with a size of 4 x 3 x 150 for the fourth convolution layer, with an activation function of sigmoid, a fourth pooling layer window size of 2, and a step size of 2;
(2.2.5) initializing each feature map weight parameter α_{t1} in the weighted-average view pooling layer to a random value in [0,1], with Σ_{t1} α_{t1} = 1; the weighted-average view pooling layer (WAVP) computes
a = Σ_{t1} [ exp(α_{t1}) / Σ_{t1′} exp(α_{t1′}) ] · p_{t1},
with the symbols defined in the concept definitions above;
(2.2.6) initializing a convolution kernel of 3000 × 1500 and 1500 × 750 for the first two fully connected layers respectively, and setting the activation function to Relu; inputting the weighted average characteristic graph after the weighted average view pooling operation into the front two fully-connected layers;
(2.2.7) initializing a 750 × 14 convolution kernel for the last fully connected layer and setting the Softmax classification function, where 14 is the number of action classes.
(2.3) randomly selecting 20 videos from the training set of the workflow videos of all the visual angles, inputting the 20 videos into the multi-view three-dimensional convolution neural network in the step (2.2) for feature training, and outputting training errors;
(2.4) randomly selecting 10 videos from the training set of the workflow videos of each visual angle, inputting the 10 videos into the multi-view three-dimensional convolutional neural network for verification, and obtaining the accuracy of classification and identification of the multi-view three-dimensional convolutional neural network;
(2.5) repeating the steps (2.3) - (2.4), and subtracting 1 from the number of training rounds each time until the number of training rounds is 0 to obtain a trained multi-view three-dimensional convolutional neural network;
(2.6) testing the multi-view three-dimensional convolutional neural network of step (2.5) using the test set corresponding to the workflow video of each visual angle;
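A possible PyTorch sketch of the multi-view network of FIG. 1 is given below. The filter counts (10, 30, 50, 150), the fully connected sizes (1500, 750, 14) and the sigmoid/ReLU/softmax placement follow (2.2.1)-(2.2.7); the exact 3D kernel shapes, which the text only partly specifies, and the flattening before the first fully connected layer are assumptions of the sketch, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, k):
    """One Conv3d + sigmoid + 2x2x2 max-pooling stage, as in (2.2.1)-(2.2.4)."""
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=k, padding=tuple(s // 2 for s in k)),
        nn.Sigmoid(),
        nn.MaxPool3d(kernel_size=2, stride=2),
    )

class ViewBranch3DCNN(nn.Module):
    """Per-view 3D-CNN branch; kernel shapes are illustrative."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, 10, (9, 7, 7)),
            conv_block(10, 30, (9, 7, 7)),
            conv_block(30, 50, (9, 8, 5)),
            conv_block(50, 150, (4, 3, 3)),
        )

    def forward(self, x):                  # x: (batch, channels, frames, H, W)
        return self.features(x).flatten(1)

class MultiView3DCNN(nn.Module):
    """One branch per view, weighted-average view pooling over branch outputs,
    then fully connected layers 1500 -> 750 -> 14 (softmax applied by the loss)."""
    def __init__(self, num_views=3, num_classes=14):
        super().__init__()
        self.branches = nn.ModuleList(ViewBranch3DCNN() for _ in range(num_views))
        self.alpha = nn.Parameter(torch.rand(num_views))     # learnable view weights
        self.classifier = nn.Sequential(
            nn.LazyLinear(1500), nn.ReLU(),                  # input size inferred at first call
            nn.Linear(1500, 750), nn.ReLU(),
            nn.Linear(750, num_classes),
        )

    def forward(self, views):              # views: list of per-view clips, one tensor each
        feats = torch.stack([b(v) for b, v in zip(self.branches, views)])
        w = torch.softmax(self.alpha, dim=0)                 # weighted-average view pooling
        fused = (w[:, None, None] * feats).sum(dim=0)
        return self.classifier(fused)
```

Training this network with a cross-entropy loss over the 14 action classes reproduces the softmax classification of (2.2.7); the 20-video training batches and 10-video validation batches per round correspond to steps (2.3)-(2.4).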
(3) State-based action division method: in a real environment, worker actions usually occur continuously; to recognize them, the actions must first be divided so that each action can be recognized separately. It is observed that a displacement occurs between the operations of taking a part, handling the part and placing the part, and between taking the welding tool and welding the part (see FIG. 2). Therefore, the actions can be divided according to the motion state of the worker. The specific steps are as follows:
(3.1) acquiring the resolution and the frame number of the newly input video, and initializing an interframe difference threshold; setting t to be 2;
(3.2) extracting the center coordinates of the interest points in two adjacent frames according to steps (1.3) to (1.7), and calculating the distance between the two center coordinates; if the distance is greater than a manually set threshold T, marking the frame as a motion state S1; otherwise, marking it as a relatively static state S0;
(3.3) incrementing t by 1 and repeating step (3.2) until t equals the index of the last frame of the newly input video minus 1; counting the numbers of consecutive S0 and S1 states; when the number of consecutive S0 or S1 states is greater than or equal to N, where N > 10, segmenting the target interest points in the corresponding frames by the method of (1.8) and storing them in a frame queue; otherwise, discarding the frames corresponding to those consecutive S0 or S1 states.
(3.4) for each set of frames in the frame queue corresponding to consecutive S0 or S1 states, extracting consecutive key frames starting from the i-th frame, where i > 5, so that the number of key frames is the same as the number of frames of each classified video segment in the data set.
(3.5) inputting the video formed by the key frames in the step (3.4) in sequence into the trained multi-view three-dimensional convolution neural network in the step (2.5) to classify and recognize the staff behaviors;
(3.6) comparing the behavior categories obtained in (3.5) with a predefined standard workflow.
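Step (3.6) is described only as a comparison with a predefined standard workflow; one simple way to perform it, shown here purely as an assumption rather than as the patent's method, is to align the recognized action sequence with the expected sequence position by position and report deviations.

```python
def check_against_standard(recognized, standard):
    """Compare a recognized action sequence with the predefined standard workflow and
    return a list of (position, expected, observed) deviations."""
    deviations = []
    for i, expected in enumerate(standard):
        observed = recognized[i] if i < len(recognized) else None
        if observed != expected:
            deviations.append((i, expected, observed))
    for i in range(len(standard), len(recognized)):          # extra, unexpected actions
        deviations.append((i, None, recognized[i]))
    return deviations

# usage with illustrative action labels
standard = ["take_part", "handle_part", "place_part", "take_welder", "weld_part"]
print(check_against_standard(["take_part", "handle_part", "weld_part"], standard))
```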

Claims (1)

1. A workflow identification method based on a three-dimensional convolutional neural network, characterized by comprising the following specific steps:
step (1), exporting a workflow video containing multiple visual angles from a data set, and acquiring the video resolution and the frame number of the workflow video at each visual angle;
step (2), initializing the inter-frame difference threshold of the workflow video of each visual angle; performing steps (3) to (11) on the workflow video of each visual angle respectively;
step (3), setting t to be 2;
step (4), reading three consecutive video frames t-1, t and t+1, and performing graying and median filtering on the three frames;
step (5), performing an inter-frame difference operation on the first pair of frames (t-1, t) and the second pair of frames (t, t+1) respectively to obtain two inter-frame difference images;
step (6), dynamically updating an interframe difference threshold according to the two interframe difference images obtained in the step (5); the method for dynamically updating the interframe difference threshold comprises the following steps:
6.1 setting l = 1; the inter-frame difference threshold of frame t is initialized as τ_1^t = (max{d_k} + min{d_k}) / 2, where d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the difference image, and min{d_k} is the minimum pixel value in the difference image;
6.2 letting τ_{l+1}^t = (1/2) · [ (1/N_1) · Σ_{d_k ≥ τ_l^t} d_k + (1/N_2) · Σ_{d_k < τ_l^t} d_k ], where N_1 and N_2 are the total numbers of pixels with d_k ≥ τ_l^t and with d_k < τ_l^t, respectively;
6.3 if τ_{l+1}^t = τ_l^t, assigning τ_{l+1}^t to τ_1^t as the final threshold of frame t; otherwise, letting l = l + 1 and repeating step 6.2;
step (7), performing binarization on the current frame according to the inter-frame difference threshold obtained in step (6): pixels larger than the threshold are set to 1, and pixels smaller than the threshold are set to 0;
step (8), performing an AND operation on the two inter-frame difference images to obtain a three-frame difference image, and obtaining the center coordinates of the interest points by a blob extraction method;
step (9), segmenting the extracted interest points from the original image of the current frame;
step (10), incrementing t by 1 and repeating steps (4) to (9) until t equals the index of the last frame of the workflow video minus 1, keeping the segmentation size of step (9) unchanged throughout; storing the interest-point images obtained in step (9) in each iteration, in order, as interest-point videos, and classifying the interest-point videos according to the classification rules of the data set;
step (11), randomly selecting 90% of the interest point videos obtained in the step (10) as a training set, and taking the rest as a test set;
step (12), constructing a multi-view three-dimensional convolution neural network, and initializing the number of training rounds to be 5000; the multi-view three-dimensional convolution neural network construction method comprises the following steps:
12.1 convolution and pooling operations are as follows:
initializing a four-dimensional convolution kernel of size 9 × 10 for the first convolution layer, with a sigmoid activation function; the first pooling layer has a window size of 2 and a stride of 2;
initializing a four-dimensional convolution kernel of size 9 × 7 × 30 for the second convolution layer, with a sigmoid activation function; the second pooling layer has a window size of 2 and a stride of 2;
initializing a four-dimensional convolution kernel of size 9 × 8 × 5 × 50 for the third convolution layer, with a sigmoid activation function; the third pooling layer has a window size of 2 and a stride of 2;
initializing a four-dimensional convolution kernel of size 4 × 3 × 150 for the fourth convolution layer, with a sigmoid activation function; the fourth pooling layer has a window size of 2 and a stride of 2;
12.2 initializing each feature map weight parameter α_{t1} in the weighted-average view pooling layer to a random value in [0,1], with Σ_{t1} α_{t1} = 1; the weighted-average view pooling operation in the weighted-average view pooling layer is
a = Σ_{t1} [ exp(α_{t1}) / Σ_{t1′} exp(α_{t1′}) ] · p_{t1},
where a is the weighted-average feature map after the weighted-average view pooling operation, t1 is the serial number of a pooled feature map after the convolution and pooling operations, α_{t1} is the weight of the pooled feature map with serial number t1, exp denotes the exponential function with base e, and p_{t1} is the pooled feature map with serial number t1;
12.3, respectively initializing a convolution kernel of 3000 × 1500 and 1500 × 750 for the first two fully-connected layers, and setting an activation function as Relu; inputting the weighted average characteristic graph after the weighted average view pooling operation into the front two fully-connected layers;
12.4, initializing a 750 × 14 convolution kernel for the last fully-connected layer and setting a Softmax classification function;
step (13), randomly selecting 20 videos from the training set corresponding to the workflow videos of each visual angle, inputting the videos into the multi-view three-dimensional convolution neural network in the step (12) for feature training, and outputting training errors;
step (14), randomly selecting 10 videos from a training set corresponding to the workflow videos of each visual angle, inputting the 10 videos into a multi-view three-dimensional convolutional neural network for verification, and obtaining the accuracy of classification and identification of the multi-view three-dimensional convolutional neural network;
step (15), repeating the steps (13) to (14), and subtracting 1 from the number of training rounds each time until the number of training rounds is 0 to obtain a trained multi-view three-dimensional convolution neural network;
step (16), testing the multi-view three-dimensional convolution neural network in the step (15) by using a test set corresponding to the workflow video of each visual angle;
step (17), acquiring the resolution and the frame number of the newly input workflow video, and initializing an interframe difference threshold; setting t to be 2;
step (18), extracting the center coordinates of the interest points in two adjacent frames according to steps (4) to (8), and calculating the distance between the two center coordinates; if the distance is greater than a set threshold T, marking the frame as a motion state S1; otherwise, marking it as a relatively static state S0;
step (19), incrementing t by 1 and repeating step (18) until t equals the index of the last frame of the newly input workflow video minus 1; counting the numbers of consecutive S0 and S1 states; when the number of consecutive S0 or S1 states is greater than or equal to N, where N > 10, segmenting the target interest points in the frames corresponding to those consecutive S0 or S1 states and storing them in a frame queue; otherwise, discarding the frames corresponding to those consecutive S0 or S1 states;
step (20), for each set of frames in the frame queue corresponding to consecutive S0 or S1 states, extracting consecutive key frames starting from the i-th frame, where i > 5, so that the number of key frames is the same as the number of frames of each classified video segment in the data set;
step (21), inputting the videos formed by the key frames in the step (20) according to the sequence into the multi-view three-dimensional convolution neural network trained in the step (15) to classify and recognize the staff behaviors;
and (22) comparing the behavior type obtained in the step (21) with a predefined standard workflow.
CN201710335309.XA 2017-05-12 2017-05-12 Workflow identification method based on three-dimensional convolutional neural network Active CN107194559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710335309.XA CN107194559B (en) 2017-05-12 2017-05-12 Workflow identification method based on three-dimensional convolutional neural network


Publications (2)

Publication Number Publication Date
CN107194559A CN107194559A (en) 2017-09-22
CN107194559B true CN107194559B (en) 2020-06-05

Family

ID=59873285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710335309.XA Active CN107194559B (en) 2017-05-12 2017-05-12 Workflow identification method based on three-dimensional convolutional neural network

Country Status (1)

Country Link
CN (1) CN107194559B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10032136B1 (en) * 2012-07-30 2018-07-24 Verint Americas Inc. System and method of scheduling work within a workflow with defined process goals
CN107798297B (en) * 2017-09-28 2021-03-23 成都大熊智能科技有限责任公司 Method for automatically extracting stable frame based on inter-frame difference
CN107766292B (en) * 2017-10-30 2020-12-29 中国科学院计算技术研究所 Neural network processing method and processing system
CN108875931B (en) * 2017-12-06 2022-06-21 北京旷视科技有限公司 Neural network training and image processing method, device and system
CN108010538B (en) * 2017-12-22 2021-08-24 北京奇虎科技有限公司 Audio data processing method and device and computing equipment
CN108447048B (en) * 2018-02-23 2021-09-14 天津大学 Convolutional neural network image feature processing method based on attention layer
CN108235003B (en) * 2018-03-19 2020-03-06 天津大学 Three-dimensional video quality evaluation method based on 3D convolutional neural network
CN108681690B (en) * 2018-04-04 2021-09-03 浙江大学 Assembly line personnel standard operation detection system based on deep learning
CN109065165B (en) * 2018-07-25 2021-08-17 东北大学 Chronic obstructive pulmonary disease prediction method based on reconstructed airway tree image
CN109068174B (en) * 2018-09-12 2019-12-27 上海交通大学 Video frame rate up-conversion method and system based on cyclic convolution neural network
CN110969217B (en) * 2018-09-28 2023-11-17 杭州海康威视数字技术股份有限公司 Method and device for image processing based on convolutional neural network
CN109145874B (en) * 2018-09-28 2023-07-04 大连民族大学 Application of measuring difference between continuous frames of video and convolution characteristic diagram in obstacle detection of vision sensing part of autonomous automobile
CN109409294B (en) * 2018-10-29 2021-06-22 南京邮电大学 Object motion trajectory-based classification method and system for ball-stopping events
CN109635843B (en) * 2018-11-14 2021-06-18 浙江工业大学 Three-dimensional object model classification method based on multi-view images
CN109711454B (en) * 2018-12-21 2020-07-31 电子科技大学 Feature matching method based on convolutional neural network
CN110704653A (en) * 2019-09-09 2020-01-17 上海慧之建建设顾问有限公司 Method for searching component by graph in BIM (building information modeling) model and graph-text searching system
CN111160410B (en) * 2019-12-11 2023-08-08 北京京东乾石科技有限公司 Object detection method and device
CN111144262B (en) * 2019-12-20 2023-05-16 北京容联易通信息技术有限公司 Process anomaly detection method based on monitoring video
CN111310801B (en) * 2020-01-20 2024-02-02 桂林航天工业学院 Mixed dimension flow classification method and system based on convolutional neural network
CN112116195B (en) * 2020-07-21 2024-04-16 蓝卓数字科技有限公司 Railway beam production procedure identification method based on example segmentation
CN112016409A (en) * 2020-08-11 2020-12-01 艾普工华科技(武汉)有限公司 Deep learning-based process step specification visual identification determination method and system
CN114299128A (en) * 2021-12-30 2022-04-08 咪咕视讯科技有限公司 Multi-view positioning detection method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217214A (en) * 2014-08-21 2014-12-17 广东顺德中山大学卡内基梅隆大学国际联合研究院 Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
WO2017031088A1 (en) * 2015-08-15 2017-02-23 Salesforce.Com, Inc Three-dimensional (3d) convolution with 3d batch normalization
CN106203283A (en) * 2016-06-30 2016-12-07 重庆理工大学 Based on Three dimensional convolution deep neural network and the action identification method of deep video
CN106407903A (en) * 2016-08-31 2017-02-15 四川瞳知科技有限公司 Multiple dimensioned convolution neural network-based real time human body abnormal behavior identification method

Also Published As

Publication number Publication date
CN107194559A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
CN107194559B (en) Workflow identification method based on three-dimensional convolutional neural network
Santosh et al. Tracking multiple moving objects using gaussian mixture model
CN111126115B (en) Violent sorting behavior identification method and device
CN110298297A (en) Flame identification method and device
CN109460719A (en) A kind of electric operating safety recognizing method
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
Chetverikov et al. Dynamic texture as foreground and background
CN110751097B (en) Semi-supervised three-dimensional point cloud gesture key point detection method
CN108734109B (en) Visual target tracking method and system for image sequence
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
Zhao et al. Background subtraction based on deep pixel distribution learning
CN111886600A (en) Device and method for instance level segmentation of image
CN106023249A (en) Moving object detection method based on local binary similarity pattern
Gaba et al. Motion detection, tracking and classification for automated Video Surveillance
CN108345835B (en) Target identification method based on compound eye imitation perception
Abdullah et al. Objects detection and tracking using fast principle component purist and kalman filter.
Ali et al. Deep Learning Algorithms for Human Fighting Action Recognition.
KR101690050B1 (en) Intelligent video security system
Nosheen et al. Efficient Vehicle Detection and Tracking using Blob Detection and Kernelized Filter
CN113936034A (en) Apparent motion combined weak and small moving object detection method combined with interframe light stream
Arif et al. People counting in extremely dense crowd using blob size optimization
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network
CN114821441A (en) Deep learning-based airport scene moving target identification method combined with ADS-B information
Sawalakhe et al. Foreground background traffic scene modeling for object motion detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220831

Address after: Room 405, 6-8 Jiaogong Road, Xihu District, Hangzhou City, Zhejiang Province, 310013

Patentee after: Hangzhou Taoyi Data Technology Co.,Ltd.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University

TR01 Transfer of patent right