CN109977818A - Action recognition method and system based on spatial features and multi-target detection

Info

Publication number
CN109977818A
CN109977818A
Authority
CN
China
Prior art keywords
target
direction vector
video
target detection
targets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910192305.XA
Other languages
Chinese (zh)
Inventor
刘维
张奕
李滇博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jilian Network Technology Co Ltd
Original Assignee
Shanghai Jilian Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jilian Network Technology Co Ltd
Priority to CN201910192305.XA
Publication of CN109977818A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an action recognition method and system based on spatial features and multi-target detection. The method comprises: step S1, decomposing the action to be detected to obtain the individual decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for the decomposition targets; step S2, continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the position of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of the direction-vector features across the video stream, and merging continuously approaching targets into a new target; step S3, extracting the merged new targets and, when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judging whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.

Description

Action recognition method and system based on spatial features and multi-target detection
Technical field
The present invention relates to the technical field of image processing, and more particularly to an action recognition method and system based on spatial features and multi-target detection.
Background art
The rapid development of science and technology has created an ever-growing demand for understanding and analyzing video content. Surveillance cameras, now found everywhere, give us access to massive amounts of video, while the traditional approach of processing video manually is inefficient and not very accurate. Techniques that combine computers with vision algorithms to replace manual methods have therefore received wide attention: they not only greatly reduce labor costs and improve the efficiency of handling events, but also improve the accuracy with which events are discovered and the timeliness with which they are handled. Within video content analysis, action recognition is a very important branch, and its detection effectiveness and performance are crucial for detecting anomalous events and behaviors, so action recognition has a strong social impact.
An investigation of existing action recognition methods shows that most of them are based on convolutional neural networks from deep learning, either performing single-target detection on single frames or extracting features and then classifying with an SVM classifier; others extract the optical-flow features of a single target across consecutive frames, train a classifier based on deep learning, and obtain the prediction by vote counting. However, all of these methods extract features from the acting single target itself, so when the action is comparatively complex, certain features are easily ignored, which leads to poor detection results.
Summary of the invention
To overcome the above deficiencies of the prior art, the purpose of the present invention is to provide an action recognition method and system based on spatial features and multi-target detection that improve the accuracy of action recognition by simplifying complex actions and by exploiting the spatial features between multiple targets.
To achieve the above object, the present invention proposes an action recognition method based on spatial features and multi-target detection, comprising the following steps:
Step S1: decompose the action to be detected to obtain the individual decomposition targets, collect a data set for each decomposition target, and train on each target data set based on deep learning to obtain a target detection model for the decomposition targets;
Step S2: continuously acquire a video stream, detect targets in the input video stream with the target detection model, obtain the position of each single target in the video image, compute the direction-vector features between targets, compare the variation trend of the direction-vector features across the video stream, and merge continuously approaching targets into a new target;
Step S3: extract the merged new targets; when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judge whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.
Preferably, step S1 further comprises:
Step S100: decompose the action to be detected to obtain several decomposition targets;
Step S101: collect a data set for each decomposition target to obtain multiple target data sets;
Step S102: pre-process the acquired target data sets;
Step S103: train on each target data set with a YoloV3 network to obtain a target detection model for the decomposition targets.
Preferably, in step S102 the pre-processing includes but is not limited to: translating and mirroring the targets in the images of the target data set, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping parts of the target images, and jittering and padding the images.
Preferably, step S2 further comprises:
Step S200: perform target detection on the current frame of the video stream with the target detection model, obtain the position of each target, compute the direction vectors between the targets, and extract the features of the pairwise direction vectors;
Step S201: perform target detection on the next video frame with the target detection model, obtain the position of each target, compute the direction vectors between the targets, and extract the direction-vector features;
Step S202: compare the direction-vector features obtained from consecutive video frames, compare the variation trend of the length and direction features of the direction vectors across the video stream, and merge continuously approaching targets into a new target.
Preferably, in step S202 it is judged whether an approaching trend exists between each pair of targets; if it does, the next video frame is processed and the procedure returns to step S201, until the two targets approach to the point of coinciding, whereupon they are merged into a new target.
Preferably, in step S202, if the length of the direction vector between two targets keeps decreasing in consecutive video frames and its direction remains consistent between frames, the two targets are continuously approaching.
Preferably, the direction consistency of two direction vectors between video frames is expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t_1, u'_{n-1,n} denotes the direction vector between targets in video frame t_2, and (x_n, y_n) denotes the position coordinates of a target.
As the video stream continues to arrive, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets have an approaching trend.
Preferably, in step S202, whether two targets should be merged into one new target is determined by the magnitude of the IOU between them.
Preferably, in step S202, the IOU between the two targets is computed, and the two targets are merged into a new target if it exceeds a threshold.
To achieve the above objects, the present invention also provides an action recognition system based on spatial features and multi-target detection, comprising:
a target detection model training and acquisition unit, for decomposing the action to be detected to obtain the decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for the decomposition targets;
a target detection unit, for continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the position of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of the direction-vector features across the video stream, and merging continuously approaching targets into a new target; and
an action recognition unit, for extracting the merged new targets and, when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judging whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.
Compared with the prior art, the action recognition method and system based on spatial features and multi-target detection of the present invention decompose an action into multiple simple targets and build a target detection model, make full use of the spatial vector features between multiple targets in a video, and detect the action through the inter-frame variation of the vectors, that is, through the movement and positional relationships of multiple targets across consecutive frames, thereby achieving the purpose of improving action recognition accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the action recognition method based on spatial features and multi-target detection of the present invention;
Fig. 2 shows the network structure of the YoloV3 network in an embodiment of the invention;
Fig. 3 is a schematic diagram of the IOU computation in an embodiment of the invention;
Fig. 4 is the system architecture diagram of the action recognition system based on spatial features and multi-target detection of the present invention;
Fig. 5 is a detailed structural diagram of the target detection model training and acquisition unit in an embodiment of the invention;
Fig. 6 is a detailed structural diagram of the target detection unit in an embodiment of the invention;
Fig. 7 is a flow chart of the action recognition method based on spatial features and multi-target detection of an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the present invention are described below through specific examples and with reference to the drawings; those skilled in the art can readily understand further advantages and effects of the invention from the contents disclosed in this specification. The invention may also be implemented or applied through other, different specific examples, and the details in this specification may be modified and changed in various ways from different viewpoints and for different applications without departing from the spirit of the invention.
Fig. 1 is a flow chart of the steps of the action recognition method based on spatial features and multi-target detection of the present invention. As shown in Fig. 1, the method includes the following steps.
Step S1: decompose the action to be detected to obtain the individual decomposition targets. For example, the action of drinking can be decomposed into three targets: wineglass, hand and mouth (i.e. the drinking action is manually defined as decomposing into the three targets wineglass, hand and mouth; an illustrative decomposition table is sketched below). A data set is collected for each decomposition target; some of these data sets can be found in public data sets online, while others require collecting pictures and annotating them with software such as labelImg. Each target data set is then trained based on deep learning to obtain a target detection model for the decomposition targets. In the present invention, a single target detection model is trained on the data sets of all decomposition targets, and this target detection model can detect all targets in the data sets.
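The decomposition can be held in a simple lookup table. The minimal Python sketch below encodes the patent's drinking example; the dictionary structure and names are illustrative assumptions, not part of the invention.

```python
# A minimal sketch of the manual action decomposition in step S1.
# The "drinking" entry follows the patent's example; the dict
# structure and names are illustrative assumptions.
ACTION_DECOMPOSITION = {
    "drinking": ["wineglass", "hand", "mouth"],
}
```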
Specifically, step S1 further comprises:
Step S100: decompose the action to be detected to obtain several decomposition targets.
Step S101: collect a data set for each decomposition target to obtain multiple target data sets. For example, pictures containing each decomposition target are collected from the network, and pictures containing the same decomposition target are gathered together to form the target data set of that decomposition target. Pictures containing each decomposition target, or pictures of a single decomposition target, can also be downloaded from the Internet, and the targets in the pictures are then marked with an annotation tool to form the target data set of the decomposition target, which comprises the original images and the annotation files generated by the marking.
Step S102: pre-process the acquired target data sets. Specifically, to improve target detection performance, the images in the acquired target data sets are pre-processed before the data sets are trained based on deep learning. The pre-processing includes but is not limited to: translating and mirroring the targets in the images of the target data set, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping parts of the target images, and jittering and padding the images. A sketch of such pre-processing is given below.
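The following is a minimal Python sketch of the step S102 augmentations, using NumPy and OpenCV; the function name `augment`, the probabilities and the noise amplitudes are assumed values for illustration, not values specified by the invention.

```python
# A hedged sketch of step S102: mirroring, translation, Gaussian noise
# and salt-and-pepper noise. Probabilities and amplitudes are assumed.
import cv2
import numpy as np

def augment(image: np.ndarray) -> np.ndarray:
    h, w = image.shape[:2]
    out = image.copy()
    if np.random.rand() < 0.5:                      # horizontal mirror
        out = cv2.flip(out, 1)
    if np.random.rand() < 0.5:                      # random translation
        tx, ty = np.random.randint(-w // 10, w // 10 + 1, size=2)
        m = np.float32([[1, 0, tx], [0, 1, ty]])
        out = cv2.warpAffine(out, m, (w, h))
    if np.random.rand() < 0.5:                      # Gaussian noise
        noise = np.random.normal(0.0, 8.0, out.shape)
        out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if np.random.rand() < 0.5:                      # salt-and-pepper noise
        mask = np.random.rand(h, w)
        out[mask < 0.01] = 0                        # pepper
        out[mask > 0.99] = 255                      # salt
    return out
```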
Step S103: train on each target data set with a YoloV3 network to obtain a target detection model for the decomposition targets.
In the present invention, the YoloV3 network uses residual blocks formed from 3*3 and 1*1 convolutions as its basic components and detects targets at three outputs of different sizes; the detected targets are then merged by NMS to obtain the final targets, with output scales of 13*13, 26*26 and 52*52 respectively. The network structure of the YoloV3 network is shown in Fig. 2. The residual-block part contains 21 convolutional layers in total (comprising several 3*3 and 1*1 convolutions), and the remaining layers are res layers. The YOLO part is the feature interaction layer of the yolo network and is divided into three scales; within each scale, local feature interaction is realized through convolution kernels, acting similarly to a fully connected layer but realizing local feature interaction between feature maps through 3*3 and 1*1 convolution kernels. In this embodiment, the leftmost is the smallest-scale yolo layer: it takes a 13*13 feature map as input, outputs a 13*13 feature map after a series of convolution operations, and performs classification and position regression on that basis. The middle is the medium-scale yolo layer: it applies a series of convolution operations to the feature map output by the smallest-scale yolo layer, outputs a 26*26 feature map, and then performs classification and position regression. The rightmost is the large-scale yolo layer: it applies a series of convolution operations to the feature map output by the medium-scale yolo layer, outputs a 52*52 feature map, and then performs classification and position regression. A sketch of the NMS merging step follows.
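The patent gives no code for merging the detections from the three scales; below is a hedged sketch of standard greedy non-maximum suppression, assuming each detection is an [x1, y1, x2, y2] box with a confidence score. The 0.45 threshold is an assumed value.

```python
# A sketch of greedy NMS over the detections gathered from YoloV3's
# 13*13, 26*26 and 52*52 outputs. Box format and threshold are assumed.
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.45):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; returns kept indices."""
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the top box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thr]        # drop overlapping duplicates
    return keep
```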
Step S2: continuously acquire a video stream, detect targets in the input video stream with the target detection model, obtain the position of each single target in the video image, compute the direction-vector features between targets, compare the variation trend of the direction-vector features across the video stream, and merge continuously approaching targets into a new target. In this embodiment, the direction vectors between each pair of targets are computed for consecutive video frames, and the spatial features of the direction vectors, namely the variation trend of vector length and direction across the video stream, are extracted. If the length of the direction vector between two targets keeps decreasing and its direction remains consistent between video frames, the two targets are continuously approaching; the IOU between the two targets is then computed, and if it exceeds a threshold the two are merged into a new target.
Specifically, step S2 further comprises:
Step S200: perform target detection on the current frame of the video stream with the target detection model trained in S1, obtain the position of each target, compute the direction vectors between the targets, and extract the features of the pairwise direction vectors.
In this embodiment, the pairwise direction vectors between targets are obtained as follows. Suppose the positions of the single targets in the image are denoted (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes the t_1-th video frame and (x_n, y_n) denotes the position coordinates of a target. The direction vectors between targets can be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} denotes the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the direction-vector features generally refer to the length feature and the direction feature of a direction vector; extracting the features of the pairwise direction vectors therefore means computing the length and direction of the direction vectors. The length feature of a direction vector can be expressed as |u_{n-1,n}| = \sqrt{(x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2}.
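These pairwise vectors and the length feature can be computed directly from the detected positions; a minimal Python sketch follows, where `positions` is an assumed list of (x, y) target centres for one frame.

```python
# A sketch of step S200's pairwise direction vectors u_{i,j} and the
# length feature |u|; `positions` is an assumed list of (x, y) centres.
import math

def direction_vectors(positions):
    """Return {(i, j): (dx, dy)} for every target pair i < j."""
    vectors = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            (xi, yi), (xj, yj) = positions[i], positions[j]
            vectors[(i, j)] = (xi - xj, yi - yj)
    return vectors

def length(u):
    """Length feature |u| of a direction vector."""
    return math.hypot(u[0], u[1])
```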
Step S201: perform target detection on the next frame with the target detection model, obtain the position of each target, compute the direction vectors between the targets, and extract the direction-vector features. The target detection and direction-vector computation here are the same as in step S200 and are not repeated.
Step S202: compare the direction-vector features obtained in step S201 with those obtained from the previous video frame, compare the variation trend of the length and direction features of the direction vectors across the video stream, and merge continuously approaching targets into a new target. That is, judge whether an approaching trend exists between each pair of targets; if it does, continue with the next video frame and return to step S201 until the two targets are very close (e.g. nearly coincident), and merge the continuously approaching targets into a new target. If no approaching trend exists between two targets over several consecutive frames, the two targets are unrelated, and the relationship between them is no longer considered.
In the present invention, the approaching trend between a pair of targets is judged from two features: the length of their pairwise direction vector keeps decreasing in consecutive video frames, and its direction remains consistent. The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t_1 and u'_{n-1,n} denotes the direction vector between targets in video frame t_2. As the video stream continues to arrive, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets have an approaching trend.
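In code, the approaching-trend test reduces to the two inequalities above; the sketch below assumes `u` and `u_next` are the (dx, dy) direction vectors of the same target pair in frames t_1 and t_2.

```python
# A sketch of the approaching-trend test: the vector must shrink
# (|u| > |u'|) while keeping a consistent direction (u · u' > 0).
def is_approaching(u, u_next):
    dot = u[0] * u_next[0] + u[1] * u_next[1]
    shrinking = u[0] ** 2 + u[1] ** 2 > u_next[0] ** 2 + u_next[1] ** 2
    return shrinking and dot > 0
```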
In this embodiment, target merging is determined by the magnitude of the IOU (Intersection-over-Union) between two targets: when the IOU between two targets exceeds a threshold T, the two targets can be merged into one new target. The IOU between two targets can be expressed as:
IOU = |A ∩ B| / |A ∪ B|
where A and B denote the two targets; a schematic diagram of the IOU is shown in Fig. 3.
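A minimal sketch of the IOU test and the merge into a new target follows; boxes are assumed (x1, y1, x2, y2) tuples, T = 0.5 is an assumed value, and taking the enclosing box as the merged target is an illustrative choice the patent does not prescribe.

```python
# A sketch of the step S202 merge: compute IOU(A, B) = |A∩B| / |A∪B| and,
# if it exceeds the threshold T, fuse the two boxes into one new target.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_targets(a, b, T=0.5):
    """Return the enclosing box as the new target, or None if IOU <= T."""
    if iou(a, b) > T:
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    return None
```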
Step S3: extract the merged new targets; when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judge whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.
In this embodiment, the main target and the secondary target are distinguished as follows: among the multiple targets of a decomposed action, the one whose motion changes little between video frames is the main target, while the remaining decomposition targets keep moving between frames, and the single new target into which they finally merge is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features formed by the direction vector between the two targets, namely the vector's length and direction and the variation of that length, together with the IOU between the two targets.
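Putting the pieces together, step S3 can be sketched as follows, reusing the `is_approaching` and `iou` sketches above; the per-frame box sequences, the requirement that every consecutive frame pair show an approaching trend, and the threshold T are all illustrative assumptions.

```python
# A hedged end-to-end sketch of step S3: the action is judged to occur
# when the secondary target keeps approaching the main target frame by
# frame and their boxes finally overlap with IOU above the threshold T.
def action_occurred(main_boxes, sec_boxes, T=0.5):
    """main_boxes, sec_boxes: per-frame (x1, y1, x2, y2) sequences."""
    def centre(b):
        return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)
    for pm, ps, cm, cs in zip(main_boxes, sec_boxes,
                              main_boxes[1:], sec_boxes[1:]):
        u_prev = tuple(p - q for p, q in zip(centre(pm), centre(ps)))
        u_next = tuple(p - q for p, q in zip(centre(cm), centre(cs)))
        if not is_approaching(u_prev, u_next):
            return False
    return iou(main_boxes[-1], sec_boxes[-1]) > T
```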
Fig. 4 is the system architecture diagram of the action recognition system based on spatial features and multi-target detection of the present invention. As shown in Fig. 4, the system comprises:
a target detection model training and acquisition unit 401, for decomposing the action to be detected to obtain the decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for the decomposition targets. For example, the action of drinking can be decomposed into the three targets wineglass, hand and mouth; a data set is collected for each decomposition target, some of which can be found in public data sets online while others require collecting pictures and annotating them with software such as labelImg; each target data set is then trained based on deep learning to obtain a target detection model for the decomposition targets, and this target detection model can detect all targets in the data sets.
Specifically, as shown in Fig. 5, the target detection model training and acquisition unit 401 further comprises:
an action decomposition unit 4010, for decomposing the action to be detected to obtain several decomposition targets;
a target data set collection unit 4011, for collecting a data set for each decomposition target to obtain multiple target data sets, for example by collecting pictures containing each decomposition target from the network and gathering the pictures containing the same decomposition target together to form the target data set of that decomposition target;
a pre-processing unit 4012, for pre-processing the acquired target data sets. Specifically, to improve target detection performance, before a target data set is trained based on deep learning, the images in the acquired target data set are pre-processed by the pre-processing unit 4012; the pre-processing includes but is not limited to translating and mirroring the targets in the images of the target data set, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping parts of the target images, and jittering and padding the images; and
a model training unit 4013, for training on each target data set with a YoloV3 network to obtain a target detection model for the decomposition targets.
In the present invention, the YoloV3 network uses residual blocks formed from 3*3 and 1*1 convolutions as its basic components and detects targets at three outputs of different sizes; the detected targets are then merged by NMS to obtain the final targets, with output scales 13*13, 26*26 and 52*52.
a target detection unit 402, for continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the position of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of the direction-vector features across the video stream, and merging continuously approaching targets into a new target. In this embodiment, the target detection unit 402 computes the pairwise direction vectors between targets for consecutive video frames and extracts the spatial features of the direction vectors, namely the variation trend of vector length and direction across the video stream; if the length of the direction vector between two targets keeps decreasing and its direction remains consistent between video frames, the two targets are continuously approaching, whereupon the IOU between the two targets is computed and, if it exceeds a threshold, the two are merged into a new target.
Specifically, as shown in Fig. 6, the target detection unit 402 further comprises:
a previous-frame target detection module 4021, which performs target detection on the current frame of the video stream with the target detection model, obtains the position of each target, computes the direction vectors between the targets, and extracts the features of the pairwise direction vectors.
In this embodiment, the pairwise direction vectors between targets are obtained as follows. Suppose the positions of the single targets in the image are denoted (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes the t_1-th video frame and (x_n, y_n) denotes the position coordinates of a target. The direction vectors between targets can be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} denotes the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the direction-vector features generally refer to the length feature and the direction feature of a direction vector; extracting the features of the pairwise direction vectors therefore means computing the length and direction of the direction vectors, and the length feature of a direction vector can be expressed as |u_{n-1,n}| = \sqrt{(x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2}.
a subsequent-frame target detection unit 4022, for performing target detection on the next frame with the target detection model, obtaining the position of each target, computing the direction vectors between the targets, and extracting the direction-vector features; the target detection and direction-vector computation here are the same as in step S200 and are not repeated; and
a trend judgment processing unit 4023, which compares the direction-vector features obtained by the subsequent-frame target detection unit 4022 with those obtained from the previous video frame and judges whether an approaching trend exists between each pair of targets; if it does, the next frame is processed and control returns to the subsequent-frame target detection unit 4022 until the two targets are very close (e.g. nearly coincident), whereupon the targets are merged. If no approaching trend exists between two targets over several consecutive frames, the two targets are unrelated, and the relationship between them is no longer considered.
In the present invention, the approaching trend between a pair of targets is judged from two features: the length of their pairwise direction vector keeps decreasing in consecutive video frames, and its direction remains consistent. The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t_1 and u'_{n-1,n} denotes the direction vector between targets in video frame t_2. As the video stream continues to arrive, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets have an approaching trend.
In this embodiment, target merging is determined by the magnitude of the IOU (Intersection-over-Union) between two targets: when the IOU between two targets exceeds a threshold T, the two targets can be merged into one new target. The IOU between two targets can be expressed as:
IOU = |A ∩ B| / |A ∪ B|
where A and B denote the two targets.
an action recognition unit 403, for extracting the merged new targets and, when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judging whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.
In this embodiment, the main target and the secondary target are distinguished as follows: among the multiple targets of a decomposed action, the one whose motion changes little between video frames is the main target, while the remaining decomposition targets keep moving between frames, and the single new target into which they finally merge is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features formed by the direction vector between the two targets, namely the vector's length and direction and the variation of that length, together with the IOU between the two targets.
Fig. 7 is a flow chart of the action recognition method based on spatial features and multi-target detection of an embodiment of the present invention. In this embodiment, the action recognition flow based on spatial features and multi-target detection is as follows.
Step 1: decompose the action to be detected into multiple targets, collect the data sets of those targets, pre-process the data sets, and train on them with the YoloV3 network in deep learning to obtain the target detection model.
In this embodiment, the data sets are pre-processed by translating and mirroring the targets in the images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping parts of the target images, and jittering and padding the images. The YoloV3 network uses residual blocks formed from 3*3 and 1*1 convolutions as its basic components and detects targets at three outputs of different sizes; the detected targets are then merged by NMS to obtain the final targets, with output scales 13*13, 26*26 and 52*52.
Step 2: input a video frame, perform target detection on the frame, obtain the positions of the targets, compute the direction vectors between the targets, and extract the features of the pairwise direction vectors.
In this embodiment, the pairwise direction vectors between targets are obtained as follows: the positions of the single targets in the image are denoted (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes the t_1-th video frame and (x_n, y_n) denotes the position coordinates of a target; the direction vectors can be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} denotes the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the length feature of a direction vector can be expressed as |u_{n-1,n}| = \sqrt{(x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2}.
Step 3: input the video stream, perform target detection in the same way again and compute the pairwise direction vectors between the targets, compare them with the previous direction vectors, and judge whether an approaching trend exists between each pair of targets; continue inputting video until the targets are very close, then merge the targets.
The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t_1 and u'_{n-1,n} denotes the direction vector between targets in video frame t_2.
As the video stream continues to arrive, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets have an approaching trend; when the IOU between the two targets exceeds a threshold T, the two targets can be merged into one new target. The IOU between two targets can be expressed as:
IOU = |A ∩ B| / |A ∪ B|
where A and B denote the two targets.
Step 4: when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judge whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions. In this embodiment, among the multiple targets into which an action is decomposed, the target that moves and changes little is called the main target; the remaining single targets keep moving between frames and merge into one new target, and the new target finally obtained is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features formed by the direction vector between the two targets, namely the vector's length and direction and the variation of that length, together with the IOU between the two targets.
In summary, the action recognition method and system based on spatial features and multi-target detection of the present invention decompose an action into multiple simple targets and build a target detection model, make full use of the spatial vector features between multiple targets in a video, and detect the action through the inter-frame variation of the vectors, that is, through the movement and positional relationships of multiple targets across consecutive frames, thereby achieving the purpose of improving action recognition accuracy.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. The scope of the present invention should therefore be as listed in the claims.

Claims (10)

1. An action recognition method based on spatial features and multi-target detection, comprising the following steps:
step S1: decomposing the action to be detected to obtain the decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for the decomposition targets;
step S2: continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the position of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of the direction-vector features across the video stream, and merging continuously approaching targets into a new target;
step S3: extracting the merged new targets and, when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judging whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.
2. The action recognition method based on spatial features and multi-target detection of claim 1, wherein step S1 further comprises:
step S100: decomposing the action to be detected to obtain several decomposition targets;
step S101: collecting a data set for each decomposition target to obtain multiple target data sets;
step S102: pre-processing the acquired target data sets;
step S103: training on each target data set with a YoloV3 network to obtain a target detection model for the decomposition targets.
3. The action recognition method based on spatial features and multi-target detection of claim 2, wherein in step S102 the pre-processing includes but is not limited to: translating and mirroring the targets in the images of the target data set, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping parts of the target images, and jittering and padding the images.
4. The action recognition method based on spatial features and multi-target detection of claim 1, wherein step S2 further comprises:
step S200: performing target detection on the current frame of the video stream with the target detection model, obtaining the position of each target, computing the direction vectors between the targets, and extracting the features of the pairwise direction vectors;
step S201: performing target detection on the next video frame with the target detection model, obtaining the position of each target, computing the direction vectors between the targets, and extracting the direction-vector features;
step S202: comparing the direction-vector features obtained from consecutive video frames, comparing the variation trend of the length and direction features of the direction vectors across the video stream, and merging continuously approaching targets into a new target.
5. The action recognition method based on spatial features and multi-target detection of claim 4, wherein in step S202 it is judged whether an approaching trend exists between each pair of targets; if it does, the next video frame is processed and the procedure returns to step S201, until the two targets approach to the point of coinciding, whereupon they are merged into a new target.
6. The action recognition method based on spatial features and multi-target detection of claim 5, wherein in step S202, if the length of the direction vector between two targets keeps decreasing in consecutive video frames and its direction remains consistent between frames, the two targets are continuously approaching.
7. The action recognition method based on spatial features and multi-target detection of claim 6, wherein the direction consistency of two direction vectors between video frames is expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
wherein u_{n-1,n} denotes the direction vector between targets in video frame t_1, u'_{n-1,n} denotes the direction vector between targets in video frame t_2, and (x_n, y_n) denotes the position coordinates of a target; and
as the video stream continues to arrive, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets have an approaching trend.
8. The action recognition method based on spatial features and multi-target detection of claim 5, wherein in step S202 whether two targets should be merged into one new target is determined by the magnitude of the IOU between them.
9. The action recognition method based on spatial features and multi-target detection of claim 8, wherein in step S202 the IOU between the two targets is computed, and the two targets are merged into a new target if it exceeds a threshold.
10. An action recognition system based on spatial features and multi-target detection, comprising:
a target detection model training and acquisition unit, for decomposing the action to be detected to obtain the decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for the decomposition targets;
a target detection unit, for continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the position of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of the direction-vector features across the video stream, and merging continuously approaching targets into a new target; and
an action recognition unit, for extracting the merged new targets and, when all targets of a decomposed action have merged so that only the main target and the secondary target remain, judging whether the action has occurred from the features of the direction vector the two targets generate between video frames and the IOU between their positions.
CN201910192305.XA 2019-03-14 2019-03-14 Action recognition method and system based on spatial features and multi-target detection Pending CN109977818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910192305.XA CN109977818A (en) 2019-03-14 2019-03-14 Action recognition method and system based on spatial features and multi-target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910192305.XA CN109977818A (en) 2019-03-14 2019-03-14 Action recognition method and system based on spatial features and multi-target detection

Publications (1)

Publication Number Publication Date
CN109977818A 2019-07-05

Family

ID=67078860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910192305.XA Pending CN109977818A (en) 2019-03-14 2019-03-14 Action recognition method and system based on spatial features and multi-target detection

Country Status (1)

Country Link
CN (1) CN109977818A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663429A (en) * 2012-04-11 2012-09-12 上海交通大学 Method for motion pattern classification and action recognition of moving target
US20170039431A1 (en) * 2015-08-03 2017-02-09 Beijing Kuangshi Technology Co., Ltd. Video monitoring method, video monitoring apparatus and video monitoring system
CN108288032A (en) * 2018-01-08 2018-07-17 深圳市腾讯计算机系统有限公司 Motion characteristic acquisition methods, device and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021017291A1 (en) * 2019-07-31 2021-02-04 平安科技(深圳)有限公司 Darkflow-deepsort-based multi-target tracking detection method, device, and storage medium
CN110942011A (en) * 2019-11-18 2020-03-31 上海极链网络科技有限公司 Video event identification method, system, electronic equipment and medium
CN111695638A (en) * 2020-06-16 2020-09-22 兰州理工大学 Improved YOLOv3 candidate box weighted fusion selection strategy
CN112418278A (en) * 2020-11-05 2021-02-26 中保车服科技服务股份有限公司 Multi-class object detection method, terminal device and storage medium
CN112967320A (en) * 2021-04-02 2021-06-15 浙江华是科技股份有限公司 Ship target detection tracking method based on bridge collision avoidance
CN112967320B (en) * 2021-04-02 2023-05-30 浙江华是科技股份有限公司 Ship target detection tracking method based on bridge anti-collision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20230404