CN109977818A - Action recognition method and system based on spatial features and multi-target detection - Google Patents
Action recognition method and system based on spatial features and multi-target detection
- Publication number: CN109977818A
- Application number: CN201910192305.XA
- Authority
- CN
- China
- Prior art keywords
- target
- direction vector
- video
- target detection
- targets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses an action recognition method and system based on spatial features and multi-target detection. The method comprises: step S1, decomposing the action to be detected into individual decomposed targets, collecting a data set for each decomposed target, and training on each target data set with deep learning to obtain a target detection model for the decomposed targets; step S2, continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the location of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of those features across the video stream, and merging targets that keep approaching each other into a new target; step S3, extracting the merged new target and, once all targets of one decomposed action have been merged down to only a main target and a secondary target, judging that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
Description
Technical field
The present invention relates to the field of image processing, and in particular to an action recognition method and system based on spatial features and multi-target detection.
Background art
The rapid development of science and technology has created a growing demand for understanding and analyzing video content. Surveillance cameras, now found everywhere, give us access to massive amounts of video, yet the traditional approach of processing video manually is inefficient and inaccurate. Techniques that combine computers with vision algorithms to replace manual methods have therefore attracted wide attention: they greatly reduce labor costs, speed up event processing, and improve both the accuracy of event discovery and the timeliness of event handling. Within video content analysis, action recognition is a very important branch whose detection quality and performance are crucial for detecting abnormal events and behaviors, so action recognition has strong social value.
A survey of existing action recognition methods shows that most of them either use deep convolutional neural networks to detect a single target in single frames, or extract features and then classify them with an SVM classifier; others extract optical-flow features of a single target across consecutive frames, train a classifier with deep learning, and obtain a prediction by vote counting. All of these methods, however, extract features from the acting single target itself, so when the action is comparatively complex, certain features are easily overlooked and detection performance suffers.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide an action recognition method and system based on spatial features and multi-target detection, which improves recognition accuracy by simplifying compound actions and exploiting the spatial features between multiple targets.
To achieve the above object, the present invention proposes an action recognition method based on spatial features and multi-target detection, comprising the following steps:
Step S1: decompose the action to be detected into individual decomposed targets, collect a data set for each decomposed target, and train on each target data set with deep learning to obtain a target detection model for the decomposed targets.
Step S2: continuously acquire a video stream, detect targets in the input video stream with the target detection model, obtain the location of each single target in the video image, compute the direction-vector features between targets, compare the variation trend of those features across the video stream, and merge targets that keep approaching each other into a new target.
Step S3: extract the merged new target; once all targets of one decomposed action have been merged down to only a main target and a secondary target, judge that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
Preferably, step S1 further comprises:
Step S100: decompose the action to be detected, obtaining several decomposed targets.
Step S101: collect a data set for each decomposed target, obtaining multiple target data sets.
Step S102: preprocess the collected target data sets.
Step S103: train each target data set with a YoloV3 network to obtain the target detection model for each decomposed target.
Preferably, in step S102 the preprocessing includes, but is not limited to: translating and mirroring the targets in the data-set images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping part of the target image, and jittering and padding the images.
Preferably, step S2 further comprises:
Step S200: run target detection on the current video frame with the target detection model, obtain the location of each target, compute the direction vectors between the targets, and extract the features of the direction vector between each pair of targets.
Step S201: run target detection on the next video frame with the target detection model, obtain the location of each target, compute the direction vectors between the targets, and extract the direction-vector features.
Step S202: compare the direction-vector features obtained from the two consecutive frames, examine the variation trend of the length and direction features across the video stream, and merge targets that keep approaching each other into a new target.
Preferably, in step S202 it is judged whether each pair of targets shows an approaching trend; if so, processing continues with the next frame, returning to step S201, until the two targets nearly coincide and are merged into a new target.
Preferably, in step S202, if the length of the direction vector between a pair of targets keeps decreasing across consecutive frames while its direction stays consistent, the two targets are judged to be continually approaching.
Preferably, the direction consistency of two direction vectors between video frames is expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} is the direction vector between the targets in frame t_1, u'_{n-1,n} is the direction vector between the targets in frame t_2, and (x_n, y_n) are the position coordinates of a target. As the video stream keeps coming in, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
Preferably, in step S202, whether two targets should be merged into a new target is decided by the IOU between them.
Preferably, in step S202, the IOU between the two targets is computed, and the two are merged into a new target if it exceeds a threshold.
To achieve the above objects, the present invention also provides an action recognition system based on spatial features and multi-target detection, comprising:
a target detection model training and acquisition unit, which decomposes the action to be detected into decomposed targets, collects a data set for each decomposed target, and trains on each target data set with deep learning to obtain a target detection model for the decomposed targets;
a target detection unit, which continuously acquires a video stream, detects targets in the input video stream with the target detection model, obtains the location of each single target in the video image, computes the direction-vector features between targets, compares the variation trend of those features across the video stream, and merges targets that keep approaching each other into a new target;
an action recognition unit, which extracts the merged new target and, once all targets of one decomposed action have been merged down to only a main target and a secondary target, judges that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
Compared with the prior art, the action recognition method and system of the present invention decompose an action into multiple simple targets, build a target detection model, and make full use of the spatial vector features between the multiple targets in the video: through the inter-frame variation of the vectors, the action is detected from the motion and positional relationships of the targets across consecutive frames, thereby improving the accuracy of action recognition.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the action recognition method based on spatial features and multi-target detection of the present invention;
Fig. 2 shows the network structure of the YoloV3 network in a specific embodiment of the invention;
Fig. 3 is a schematic diagram of the IOU computation in a specific embodiment of the invention;
Fig. 4 is the system architecture diagram of the action recognition system based on spatial features and multi-target detection of the present invention;
Fig. 5 is a detailed structure diagram of the target detection model training and acquisition unit in a specific embodiment of the invention;
Fig. 6 is a detailed structure diagram of the target detection unit in a specific embodiment of the invention;
Fig. 7 is a flow chart of the action recognition method based on spatial features and multi-target detection in a specific embodiment of the invention.
Specific embodiments
Embodiments of the present invention are described below through specific examples and with reference to the drawings; those skilled in the art can readily understand further advantages and effects of the invention from the contents disclosed in this specification. The invention can also be implemented or applied through other different specific examples, and the details in this specification can be modified and changed in various ways from different perspectives and for different applications without departing from the spirit of the invention.
Fig. 1 is a flow chart of the steps of the action recognition method based on spatial features and multi-target detection of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S1: decompose the action to be detected into individual decomposed targets. For example, the action of drinking can be decomposed into three targets: cup, hand, and mouth (e.g. the decomposition of the drinking action into these three targets is defined manually). Then collect a data set for each decomposed target: some data sets can be found in public online data sets, while others require collecting pictures and annotating them with software such as labelImg. Each target data set is then trained with deep learning to obtain the target detection model for the decomposed targets. In the present invention, one target detection model is trained on the data sets of all decomposed targets, and this single model can detect all the targets in the data sets.
Specifically, step S1 further comprises:
Step S100: decompose the action to be detected, obtaining several decomposed targets.
Step S101: collect a data set for each decomposed target, obtaining multiple target data sets. For example, pictures containing each decomposed target are collected from the network, and pictures containing the same decomposed target are gathered together to form that target's data set. Pictures containing the decomposed targets, or a single decomposed target, can be downloaded from the Internet, and the targets in the pictures are marked with an annotation tool to form the data set of the decomposed target, which contains the original images and the annotation files generated by the marking.
Step S102: preprocess the collected target data sets. Specifically, to improve target detection performance, before the target data sets are trained with deep learning, the images in the collected data sets are preprocessed. The preprocessing includes, but is not limited to: translating and mirroring the targets in the images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping part of the target image, and jittering and padding the images.
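As an illustration, a few of the augmentations listed above can be sketched with NumPy as follows. This is a minimal sketch: the function name, noise standard deviation, and salt-and-pepper fraction are our own assumptions for illustration, not values given in the patent.

```python
import numpy as np

def augment(image, rng=None):
    """Apply mirror, Gaussian noise, and salt-and-pepper noise to one
    H x W x C uint8 image (bounding boxes would need the same flip)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = image.astype(np.float32)

    # Horizontal mirror with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]

    # Additive Gaussian noise (sigma = 5 is an illustrative choice).
    out = out + rng.normal(0.0, 5.0, size=out.shape)

    # Salt-and-pepper noise: force ~1% of pixels to black or white.
    mask = rng.random(out.shape[:2])
    out[mask < 0.01] = 0.0
    out[mask > 0.99] = 255.0

    return np.clip(out, 0, 255).astype(np.uint8)
```

In practice each augmented copy would be written back into the data set alongside its (correspondingly transformed) annotation file.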
Step S103: train each target data set with a YoloV3 network to obtain the target detection model for each decomposed target.
In the present invention, the YoloV3 network is built from residual blocks composed of 3*3 and 1*1 convolutions as its basic components. It detects targets at three output scales, 13*13, 26*26 and 52*52, and the detected targets are then merged by NMS to obtain the final targets. The YoloV3 network structure is shown in Fig. 2. The residual-block part contains 21 convolutional layers (several 3*3 and 1*1 convolutional layers), the rest being res layers; the YOLO part consists of the feature-interaction layers of the yolo network, divided into three scales. Within each scale, local feature interaction is realized through convolution kernels; the effect is similar to that of a fully connected layer, except that the local feature interaction between feature maps is realized through convolution kernels (3*3 and 1*1). In this specific embodiment, the leftmost is the smallest-scale yolo layer: it takes a 13*13 feature map as input, applies a series of convolution operations, outputs a 13*13 feature map, and performs classification and position regression on it. The middle is the medium-scale yolo layer: it applies a series of convolution operations to the feature map output by the smallest-scale yolo layer, outputs a 26*26 feature map, and then performs classification and position regression. The rightmost is the large-scale yolo layer: it applies a series of convolution operations to the feature map output by the medium-scale yolo layer, outputs a 52*52 feature map, and then performs classification and position regression.
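The NMS step that merges the detections from the three scales can be illustrated with a standard greedy sketch; the 0.45 overlap threshold and the function names are assumptions for illustration, not values from the patent.

```python
def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.
    Returns the indices of the kept boxes, highest score first."""
    def overlap(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score wins
        keep.append(best)
        # Drop every remaining box that overlaps the winner too much.
        order = [i for i in order
                 if overlap(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

A real YoloV3 pipeline applies this per class, after filtering detections by confidence.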
Step S2: continuously acquire the video stream, detect targets in the input video stream with the target detection model, obtain the location of each single target in the video image, compute the direction-vector features between targets, compare the variation trend of those features across the video stream, and merge targets that keep approaching each other into a new target. In this specific embodiment, the direction vectors between each pair of targets are computed for the previous and current frames, and the spatial features of the direction vector, i.e. the variation trend of its length and direction, are tracked across the video stream. If the length of the direction vector between two targets keeps decreasing while its direction stays consistent between frames, the two targets are continually approaching; the IOU between the two targets is then computed, and if it exceeds a threshold the two are merged into a new target.
Specifically, step S2 further comprises:
Step S200: run target detection on the current video frame with the target detection model trained in S1, obtain the location of each target, compute the direction vectors between the targets, and extract the features of the direction vector between each pair of targets.
In this specific embodiment, the direction vectors between pairs of targets are obtained as follows. Suppose the locations of the single targets in the image are represented as (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes frame t_1 of the video and (x_n, y_n) are the position coordinates of a target. The direction vectors between targets can then be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
...
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} is the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the direction-vector features are the length feature and the direction feature of the direction vector; extracting the feature of the direction vector between a pair of targets simply means computing the vector's length and direction. The length feature of the direction vector can be expressed as:
|u_{n-1,n}| = sqrt((x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2)
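The pairwise direction vectors and their length feature can be computed as in this sketch; the helper name and the dictionary layout are ours, and the shared time index t_1 is dropped since it is the same for every vector within one frame.

```python
import math

def direction_vectors(positions):
    """Pairwise direction vectors between detected target centres.

    `positions` is a list of (x, y) centres in one frame; returns a dict
    mapping (i, j) -> ((dx, dy), length), following the u_{i,j} notation
    above with the length being the Euclidean norm."""
    vecs = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            vecs[(i, j)] = ((dx, dy), math.hypot(dx, dy))
    return vecs
```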
Step S201: run target detection on the next frame with the target detection model, obtain the location of each target, compute the direction vectors between the targets, and extract the direction-vector features. The target detection and direction-vector computation here are the same as in step S200 and are not repeated.
Step S202: compare the direction-vector features obtained in step S201 with those obtained from the previous frame, examine the variation trend of the length and direction features across the video stream, and merge targets that keep approaching each other into a new target. That is, judge whether each pair of targets shows an approaching trend; if so, continue with the next frame, returning to step S201, until the two targets are very close (e.g. nearly coincide) and the continually approaching targets are merged into a new target. If two targets show no approaching trend over several consecutive frames, they are unrelated, and the relationship between them is no longer tracked.
In the present invention, the approaching trend between a pair of targets is judged from the feature that the length of the direction vector between them keeps decreasing across consecutive frames while its direction stays consistent. The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} is the direction vector between the targets in frame t_1 and u'_{n-1,n} is the direction vector between the targets in frame t_2. As the video stream keeps coming in, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
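A minimal sketch of the two-part test above, shrinking length plus positive dot product, assuming 2-D vectors and a function name of our own choosing:

```python
def approaching(u_prev, u_curr):
    """True when a pair of targets is drawing closer between consecutive
    frames: the direction vector is getting shorter (|u| > |u'|) and
    still points the same way (u . u' > 0), per the inequalities above."""
    px, py = u_prev
    cx, cy = u_curr
    len_prev = (px * px + py * py) ** 0.5
    len_curr = (cx * cx + cy * cy) ** 0.5
    dot = px * cx + py * cy
    return len_prev > len_curr and dot > 0
```

In the stream loop, this test would be applied per target pair per frame, and a pair dropped from consideration after several consecutive failures.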
In this specific embodiment, whether two targets should be merged into a new target is decided by the IOU (Intersection-over-Union) between them. Specifically, when the IOU between two targets exceeds a threshold T, the two targets are merged into one new target. The IOU between two targets can be expressed as:
IOU = area(A ∩ B) / area(A ∪ B)
where A and B denote the two targets; the IOU computation is illustrated in Fig. 3.
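The IOU test and the merge into a new target can be sketched as follows, with targets as (x1, y1, x2, y2) corner boxes; the 0.5 default threshold is our assumption, since the patent leaves T unspecified.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def maybe_merge(a, b, threshold=0.5):
    """When IOU exceeds the threshold T, return the new target as the
    smallest box enclosing both; otherwise return None."""
    if iou(a, b) > threshold:
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    return None
```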
Step S3: extract the merged new target; once all targets of one decomposed action have been merged down to only a main target and a secondary target, judge that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
In this specific embodiment, the distinction between the main target and the secondary target is as follows: among the multiple targets of a decomposed action, the main target is the one whose motion changes little between video frames, while the remaining decomposed targets keep moving between frames, and the single new target they are finally merged into is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features of the direction vector between the two targets, i.e. its length, direction and length variation, together with the IOU between the two targets.
Fig. 4 is the system architecture diagram of the action recognition system based on spatial features and multi-target detection of the present invention. As shown in Fig. 4, the system comprises:
a target detection model training and acquisition unit 401, which decomposes the action to be detected into decomposed targets, collects a data set for each decomposed target, and trains on each target data set with deep learning to obtain the target detection model for the decomposed targets. For example, the action of drinking can be decomposed into three targets, cup, hand, and mouth; a data set is collected for each decomposed target, some from public online data sets and some by collecting pictures and annotating them with software such as labelImg. Each target data set is then trained with deep learning to obtain the target detection model for the decomposed targets; this model can detect all the targets in the data sets.
Specifically, as shown in Fig. 5, the target detection model training and acquisition unit 401 further comprises:
an action decomposition unit 4010, which decomposes the action to be detected, obtaining several decomposed targets;
a target data set collection unit 4011, which collects a data set for each decomposed target, obtaining multiple target data sets; for example, pictures containing each decomposed target are collected from the network, and pictures containing the same decomposed target are gathered together to form that target's data set;
a preprocessing unit 4012, which preprocesses the collected target data sets. Specifically, to improve target detection performance, before the target data sets are trained with deep learning, the images in the collected data sets are preprocessed by unit 4012. The preprocessing includes, but is not limited to: translating and mirroring the targets in the images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping part of the target image, and jittering and padding the images;
a model training unit 4013, which trains each target data set with a YoloV3 network to obtain the target detection model for each decomposed target.
In the present invention, the YoloV3 network is built from residual blocks composed of 3*3 and 1*1 convolutions as its basic components; it detects targets at three output scales, 13*13, 26*26 and 52*52, and the detected targets are then merged by NMS to obtain the final targets.
a target detection unit 402, which continuously acquires the video stream, detects targets in the input video stream with the target detection model, obtains the location of each single target in the video image, computes the direction-vector features between targets, compares the variation trend of those features across the video stream, and merges targets that keep approaching each other into a new target. In this specific embodiment, the target detection unit 402 computes the direction vectors between each pair of targets for the previous and current frames and extracts the spatial features of the direction vector, i.e. the variation trend of its length and direction across the video stream. If the length of the direction vector between two targets keeps decreasing while its direction stays consistent between frames, the two targets are continually approaching; the IOU between them is then computed, and if it exceeds a threshold the two are merged into a new target.
Specifically, as shown in Fig. 6, the target detection unit 402 further comprises:
a previous-frame target detection module 4021, which runs target detection on the current video frame with the target detection model, obtains the location of each target, computes the direction vectors between the targets, and extracts the features of the direction vector between each pair of targets.
In this specific embodiment, the direction vectors between pairs of targets are obtained as follows. Suppose the locations of the single targets in the image are represented as (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes frame t_1 of the video and (x_n, y_n) are the position coordinates of a target. The direction vectors between targets can then be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
...
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} is the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the direction-vector features are the length feature and the direction feature of the direction vector; extracting the feature of the direction vector between a pair of targets simply means computing the vector's length and direction. The length feature of the direction vector can be expressed as:
|u_{n-1,n}| = sqrt((x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2)
a next-frame target detection module 4022, which runs target detection on the next frame with the target detection model, obtains the location of each target, computes the direction vectors between the targets, and extracts the direction-vector features. The target detection and direction-vector computation here are the same as in step S200 and are not repeated.
a trend judgment processing module 4023, which compares the direction-vector features obtained by the next-frame target detection module 4022 with those obtained from the previous frame and judges whether each pair of targets shows an approaching trend; if so, processing continues with the next frame, returning to the next-frame target detection module 4022, until the two targets are very close (e.g. nearly coincide) and are merged. If two targets show no approaching trend over several consecutive frames, they are unrelated, and the relationship between them is no longer tracked.
In the present invention, the approaching trend between a pair of targets is judged from the feature that the length of the direction vector between them keeps decreasing across consecutive frames while its direction stays consistent. The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} is the direction vector between the targets in frame t_1 and u'_{n-1,n} is the direction vector between the targets in frame t_2. As the video stream keeps coming in, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
In this specific embodiment, whether two targets should be merged into a new target is decided by the IOU (Intersection-over-Union) between them. Specifically, when the IOU between two targets exceeds a threshold T, the two targets are merged into one new target. The IOU between two targets can be expressed as:
IOU = area(A ∩ B) / area(A ∪ B)
where A and B denote the two targets.
an action recognition unit 403, which extracts the merged new target and, once all targets of one decomposed action have been merged down to only a main target and a secondary target, judges that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
In this specific embodiment, the distinction between the main target and the secondary target is as follows: among the multiple targets of a decomposed action, the main target is the one whose motion changes little between video frames, while the remaining decomposed targets keep moving between frames, and the single new target they are finally merged into is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features of the direction vector between the two targets, i.e. its length, direction and length variation, together with the IOU between the two targets.
Fig. 7 is a flowchart of the action identification method based on spatial features and multi-target detection according to this embodiment of the invention. In this embodiment, the action recognition process based on spatial features and multi-target detection is as follows:
Step 1: decompose the action to be detected into multiple targets, collect a data set for those targets, pre-process the data set, and train on it with the YoloV3 network in deep learning to obtain the target detection model.
In this embodiment, the pre-processing methods applied to the data set include: translating the targets in the images, mirroring, adding Gaussian noise or salt-and-pepper noise at the target positions, randomly cropping part of the target images, and image jitter and padding operations. The network structure of YoloV3 uses residual blocks built from 3×3 and 1×1 convolutions as its basic elements; it detects targets at three different output sizes and then merges the detections by NMS to obtain the final targets, the output scales being 13×13, 26×26 and 52×52 respectively.
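The pre-processing operations listed above might be sketched as below; the shift size, noise level and salt-and-pepper density are illustrative values, not parameters given in the patent:

```python
import numpy as np

def augment(img, rng=None):
    """Produce augmented variants of one training image: mirror, translation,
    Gaussian noise, and salt-and-pepper noise (illustrative parameters)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    variants = []
    variants.append(np.fliplr(img))                 # mirror operation
    variants.append(np.roll(img, shift=5, axis=1))  # 5-pixel translation
    noisy = img.astype(np.float64) + rng.normal(0, 10, img.shape)
    variants.append(np.clip(noisy, 0, 255).astype(img.dtype))  # Gaussian noise
    sp = img.copy()                                 # salt-and-pepper noise
    mask = rng.random(img.shape) < 0.02
    sp[mask] = rng.choice(np.array([0, 255], dtype=img.dtype),
                          size=int(mask.sum()))
    variants.append(sp)
    return variants
```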
Step 2: input a video frame, perform target detection on that frame, obtain the location information of the targets, compute the direction vectors between the targets, and extract the features of the pairwise direction vectors.
In this embodiment of the invention, the method for obtaining the pairwise direction vectors between targets is as follows. The locations of single targets in the image can be represented as (x1, y1, t1), (x2, y2, t1), ..., (x_n, y_n, t1), where t1 denotes the t1-th video frame and (x_n, y_n) denotes the position coordinates of a target. The direction vectors may be expressed as:
u_{1,2} = (x1 - x2, y1 - y2, t1)
u_{1,n} = (x1 - x_n, y1 - y_n, t1)
...
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t1)
where u_{n-1,n} denotes the direction vector between the (n-1)-th target and the n-th target.
In this embodiment of the invention, the length feature of a direction vector may be expressed as:
|u_{n-1,n}| = √((x_{n-1} - x_n)² + (y_{n-1} - y_n)²)
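The pairwise direction vectors and their length features can be computed in one pass over the detected target positions; this sketch keeps only the spatial (x, y) components, and the function name is illustrative:

```python
def pairwise_vectors(positions):
    """positions: list of (x, y) centre coordinates of the targets detected
    in one frame. Returns the direction vector u_{i,j} = (x_i - x_j,
    y_i - y_j) and its Euclidean length for every target pair, matching the
    formulas above."""
    vecs, lengths = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            vecs[(i, j)] = (dx, dy)
            lengths[(i, j)] = (dx * dx + dy * dy) ** 0.5
    return vecs, lengths
```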
Step 3: input the video stream, again perform target detection and compute the pairwise direction vectors between targets, compare them with the earlier direction vectors, and judge whether each pair of targets shows an approaching trend; keep inputting video until the targets are close enough to be merged.
The direction comparison of two inter-frame direction vectors may be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)·(x'_{n-1} - x'_n) + (y_{n-1} - y_n)·(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t1 and u'_{n-1,n} denotes the direction vector between targets in video frame t2.
As the video stream keeps arriving, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend; when the IOU between the two targets exceeds some threshold T, the two targets are merged into one new target. The IOU between the two targets may be expressed as:
IOU = (A ∩ B) / (A ∪ B)
where A and B denote the regions of the two targets.
Step 4: when all the targets of an action decomposition have been merged until only the main target and the secondary target remain, judge the occurrence of the action from the features of the direction vector the two generate between video frames and from the IOU between the target positions. In this embodiment, among the multiple targets of an action decomposition, the target whose movement changes little is called the main target; the remaining single targets keep moving between frames and are synthesized into one new target, and this finally obtained new target is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features formed by the direction vector between the two targets, namely the vector length, direction and change in length, together with the IOU between the two targets.
In summary, the action identification method and system based on spatial features and multi-target detection of the present invention decompose an action into multiple simple targets and establish target detection models, make full use of the spatial vector features among multiple targets in the video and the inter-frame vector variation features, and detect the action through the continuous inter-frame motion relationships and positional relationships of the multiple targets, thereby achieving the purpose of improving action recognition accuracy.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the present invention. Therefore, the scope of the present invention should be as listed in the claims.
Claims (10)
1. An action identification method based on spatial features and multi-target detection, comprising the following steps:
Step S1, decomposing the action to be detected to obtain decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for each decomposition target;
Step S2, continuously acquiring a video stream, detecting targets in the input video stream using the target detection models to obtain the location of each single target in the video image, computing the direction vector features between targets, comparing the variation trends of the direction vector features in the video stream, and merging continuously approaching targets into new targets;
Step S3, extracting the synthesized new targets; when all the targets of an action decomposition have been merged until only the main target and the secondary target remain, judging the occurrence of the action from the features of the direction vector the two generate between video frames and from the IOU between the target positions.
2. The action identification method based on spatial features and multi-target detection of claim 1, wherein step S1 further comprises:
Step S100, decomposing the action to be detected to obtain several decomposition targets;
Step S101, collecting a data set for each decomposition target to obtain multiple target data sets;
Step S102, pre-processing the collected target data sets;
Step S103, training on each target data set using the YoloV3 network to obtain a target detection model for each decomposition target.
3. The action identification method based on spatial features and multi-target detection of claim 2, wherein in step S102, the pre-processing includes but is not limited to: translating the targets in the images of the target data sets, mirroring, adding Gaussian noise or salt-and-pepper noise at the target positions, randomly cropping part of the target images, and image jitter and padding operations.
4. The action identification method based on spatial features and multi-target detection of claim 1, wherein step S2 further comprises:
Step S200, performing target detection on the current frame of the video stream using the target detection models, obtaining the location of each target, computing the direction vectors between the targets, and extracting the features of the pairwise direction vectors;
Step S201, performing target detection on the next video frame using the target detection models, obtaining the location of each target, computing the direction vectors between the targets, and extracting the features of the direction vectors;
Step S202, comparing the direction vector features obtained from the preceding and following frames, comparing the variation trends of the length and direction features of the direction vectors in the video stream, and merging continuously approaching targets into new targets.
5. The action identification method based on spatial features and multi-target detection of claim 4, wherein in step S202, it is judged whether each pair of targets shows an approaching trend; if an approaching trend exists, the next video frame is processed and step S201 is repeated until the pairwise targets are close enough to overlap and are merged into a new target.
6. The action identification method based on spatial features and multi-target detection of claim 5, wherein in step S202, if across the preceding and following video frames the length of the direction vector between a pair of targets keeps decreasing and its direction remains consistent between frames, the two targets are judged to be continuously approaching.
7. The action identification method based on spatial features and multi-target detection of claim 6, wherein the direction comparison of two inter-frame direction vectors is expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)·(x'_{n-1} - x'_n) + (y_{n-1} - y_n)·(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t1, u'_{n-1,n} denotes the direction vector between targets in video frame t2, and (x_n, y_n) denotes the position coordinates of a target; as the video stream keeps arriving, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
8. The action identification method based on spatial features and multi-target detection of claim 5, wherein in step S202, whether two targets should be merged into a new target is decided by the size of the IOU between them.
9. The action identification method based on spatial features and multi-target detection of claim 8, wherein in step S202, the IOU between two targets is computed, and the two targets are merged into a new target if the IOU exceeds some threshold.
10. An action recognition system based on spatial features and multi-target detection, comprising:
a target detection model training and acquisition unit, for decomposing the action to be detected to obtain decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for each decomposition target;
a target detection unit, for continuously acquiring a video stream, detecting targets in the input video stream using the target detection models to obtain the location of each single target in the video image, computing the direction vector features between targets, comparing the variation trends of the direction vector features in the video stream, and merging continuously approaching targets into new targets;
an action recognition unit, for extracting the synthesized new targets; when all the targets of an action decomposition have been merged until only the main target and the secondary target remain, judging the occurrence of the action from the features of the direction vector the two generate between video frames and from the IOU between the target positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910192305.XA CN109977818A (en) | 2019-03-14 | 2019-03-14 | A kind of action identification method and system based on space characteristics and multi-target detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977818A true CN109977818A (en) | 2019-07-05 |
Family
ID=67078860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910192305.XA Pending CN109977818A (en) | 2019-03-14 | 2019-03-14 | A kind of action identification method and system based on space characteristics and multi-target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977818A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942011A (en) * | 2019-11-18 | 2020-03-31 | 上海极链网络科技有限公司 | Video event identification method, system, electronic equipment and medium |
CN111695638A (en) * | 2020-06-16 | 2020-09-22 | 兰州理工大学 | Improved YOLOv3 candidate box weighted fusion selection strategy |
WO2021017291A1 (en) * | 2019-07-31 | 2021-02-04 | 平安科技(深圳)有限公司 | Darkflow-deepsort-based multi-target tracking detection method, device, and storage medium |
CN112418278A (en) * | 2020-11-05 | 2021-02-26 | 中保车服科技服务股份有限公司 | Multi-class object detection method, terminal device and storage medium |
CN112967320A (en) * | 2021-04-02 | 2021-06-15 | 浙江华是科技股份有限公司 | Ship target detection tracking method based on bridge collision avoidance |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663429A (en) * | 2012-04-11 | 2012-09-12 | 上海交通大学 | Method for motion pattern classification and action recognition of moving target |
US20170039431A1 (en) * | 2015-08-03 | 2017-02-09 | Beijing Kuangshi Technology Co., Ltd. | Video monitoring method, video monitoring apparatus and video monitoring system |
CN108288032A (en) * | 2018-01-08 | 2018-07-17 | 深圳市腾讯计算机系统有限公司 | Motion characteristic acquisition methods, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ke et al. | Multi-dimensional traffic congestion detection based on fusion of visual features and convolutional neural network | |
CN109977818A (en) | A kind of action identification method and system based on space characteristics and multi-target detection | |
CN109934176B (en) | Pedestrian recognition system, recognition method, and computer-readable storage medium | |
US10735694B2 (en) | System and method for activity monitoring using video data | |
CN103593464B (en) | Video fingerprint detecting and video sequence matching method and system based on visual features | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN110378931A (en) | A kind of pedestrian target motion track acquisition methods and system based on multi-cam | |
Lin et al. | Learning a scene background model via classification | |
CN110084165A (en) | The intelligent recognition and method for early warning of anomalous event under the open scene of power domain based on edge calculations | |
CN105931270A (en) | Video keyframe extraction method based on movement trajectory analysis | |
CN108875456A (en) | Object detection method, object detecting device and computer readable storage medium | |
Lovanshi et al. | Human pose estimation: benchmarking deep learning-based methods | |
CN116402850A (en) | Multi-target tracking method for intelligent driving | |
CN114677633A (en) | Multi-component feature fusion-based pedestrian detection multi-target tracking system and method | |
Liu et al. | A stochastic attribute grammar for robust cross-view human tracking | |
CN113780145A (en) | Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium | |
Wang et al. | Paul: Procrustean autoencoder for unsupervised lifting | |
Abdullah et al. | Vehicle counting using deep learning models: a comparative study | |
Lin et al. | Overview of 3d human pose estimation | |
Huang et al. | A detection method of individual fare evasion behaviours on metros based on skeleton sequence and time series | |
Liang et al. | Multiple object tracking by reliable tracklets | |
Chen et al. | Understanding dynamic associations: Gait recognition via cross-view spatiotemporal aggregation network | |
Zhang | [Retracted] An Intelligent and Fast Dance Action Recognition Model Using Two‐Dimensional Convolution Network Method | |
CN111832475B (en) | Face false detection screening method based on semantic features | |
CN106372650A (en) | Motion prediction-based compression tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | ||
Effective date of abandoning: 20230404 |