CN109977818A - Action recognition method and system based on spatial features and multi-target detection - Google Patents
Action recognition method and system based on spatial features and multi-target detection
- Publication number: CN109977818A
- Application number: CN201910192305.XA
- Authority
- CN
- China
- Prior art keywords
- target
- direction vector
- video
- target detection
- targets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses an action recognition method and system based on spatial features and multi-target detection. The method comprises: step S1, decomposing the action to be detected into individual decomposed targets, collecting a data set for each decomposed target, and training on each target data set with deep learning to obtain a target detection model for the decomposed targets; step S2, continuously acquiring a video stream, detecting targets in the input video stream with the target detection model, obtaining the location of each single target in the video image, computing the direction-vector features between targets, comparing the variation trend of those features across the video stream, and merging targets that keep approaching each other into a new target; step S3, extracting the merged new target and, once all targets of one decomposed action have been merged down to only a main target and a secondary target, judging that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
Description
Technical field
The present invention relates to the field of image processing, and in particular to an action recognition method and system based on spatial features and multi-target detection.
Background art
The rapid development of science and technology has created a growing demand for understanding and analyzing video content. Surveillance cameras, now found everywhere, give us access to massive amounts of video, yet the traditional approach of processing video manually is inefficient and inaccurate. Techniques that combine computers with vision algorithms to replace manual methods have therefore attracted wide attention: they greatly reduce labor costs, speed up event processing, and improve both the accuracy of event discovery and the timeliness of event handling. Within video content analysis, action recognition is a very important branch whose detection quality and performance are crucial for detecting abnormal events and behaviors, so action recognition has strong social value.
A survey of existing action recognition methods shows that most of them either use deep convolutional neural networks to detect a single target in single frames, or extract features and then classify them with an SVM classifier; others extract optical-flow features of a single target across consecutive frames, train a classifier with deep learning, and obtain a prediction by vote counting. All of these methods, however, extract features from the acting single target itself, so when the action is comparatively complex, certain features are easily overlooked and detection performance suffers.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide an action recognition method and system based on spatial features and multi-target detection, which improves recognition accuracy by simplifying compound actions and exploiting the spatial features between multiple targets.
To achieve the above object, the present invention proposes an action recognition method based on spatial features and multi-target detection, comprising the following steps:
Step S1: decompose the action to be detected into individual decomposed targets, collect a data set for each decomposed target, and train on each target data set with deep learning to obtain a target detection model for the decomposed targets.
Step S2: continuously acquire a video stream, detect targets in the input video stream with the target detection model, obtain the location of each single target in the video image, compute the direction-vector features between targets, compare the variation trend of those features across the video stream, and merge targets that keep approaching each other into a new target.
Step S3: extract the merged new target; once all targets of one decomposed action have been merged down to only a main target and a secondary target, judge that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
Preferably, step S1 further comprises:
Step S100: decompose the action to be detected, obtaining several decomposed targets.
Step S101: collect a data set for each decomposed target, obtaining multiple target data sets.
Step S102: preprocess the collected target data sets.
Step S103: train each target data set with a YoloV3 network to obtain the target detection model for each decomposed target.
Preferably, in step S102 the preprocessing includes, but is not limited to: translating and mirroring the targets in the data-set images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping part of the target image, and jittering and padding the images.
Preferably, step S2 further comprises:
Step S200: run target detection on the current video frame with the target detection model, obtain the location of each target, compute the direction vectors between the targets, and extract the features of the direction vector between each pair of targets.
Step S201: run target detection on the next video frame with the target detection model, obtain the location of each target, compute the direction vectors between the targets, and extract the direction-vector features.
Step S202: compare the direction-vector features obtained from the two consecutive frames, examine the variation trend of the length and direction features across the video stream, and merge targets that keep approaching each other into a new target.
Preferably, in step S202 it is judged whether each pair of targets shows an approaching trend; if so, processing continues with the next frame, returning to step S201, until the two targets nearly coincide and are merged into a new target.
Preferably, in step S202, if the length of the direction vector between a pair of targets keeps decreasing across consecutive frames while its direction stays consistent, the two targets are judged to be continually approaching.
Preferably, the direction consistency of two direction vectors between video frames is expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} is the direction vector between the targets in frame t_1, u'_{n-1,n} is the direction vector between the targets in frame t_2, and (x_n, y_n) are the position coordinates of a target. As the video stream keeps coming in, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
Preferably, in step S202, whether two targets should be merged into a new target is decided by the IOU between them.
Preferably, in step S202, the IOU between the two targets is computed, and the two are merged into a new target if it exceeds a threshold.
To achieve the above objects, the present invention also provides an action recognition system based on spatial features and multi-target detection, comprising:
a target detection model training and acquisition unit, which decomposes the action to be detected into decomposed targets, collects a data set for each decomposed target, and trains on each target data set with deep learning to obtain a target detection model for the decomposed targets;
a target detection unit, which continuously acquires a video stream, detects targets in the input video stream with the target detection model, obtains the location of each single target in the video image, computes the direction-vector features between targets, compares the variation trend of those features across the video stream, and merges targets that keep approaching each other into a new target;
an action recognition unit, which extracts the merged new target and, once all targets of one decomposed action have been merged down to only a main target and a secondary target, judges that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
Compared with the prior art, the action recognition method and system of the present invention decompose an action into multiple simple targets, build a target detection model, and make full use of the spatial vector features between the multiple targets in the video: through the inter-frame variation of the vectors, the action is detected from the motion and positional relationships of the targets across consecutive frames, thereby improving the accuracy of action recognition.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the action recognition method based on spatial features and multi-target detection of the present invention;
Fig. 2 shows the network structure of the YoloV3 network in a specific embodiment of the invention;
Fig. 3 is a schematic diagram of the IOU computation in a specific embodiment of the invention;
Fig. 4 is the system architecture diagram of the action recognition system based on spatial features and multi-target detection of the present invention;
Fig. 5 is a detailed structure diagram of the target detection model training and acquisition unit in a specific embodiment of the invention;
Fig. 6 is a detailed structure diagram of the target detection unit in a specific embodiment of the invention;
Fig. 7 is a flow chart of the action recognition method based on spatial features and multi-target detection in a specific embodiment of the invention.
Specific embodiments
Embodiments of the present invention are described below through specific examples and with reference to the drawings; those skilled in the art can readily understand further advantages and effects of the invention from the contents disclosed in this specification. The invention can also be implemented or applied through other different specific examples, and the details in this specification can be modified and changed in various ways from different perspectives and for different applications without departing from the spirit of the invention.
Fig. 1 is a flow chart of the steps of the action recognition method based on spatial features and multi-target detection of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S1: decompose the action to be detected into individual decomposed targets. For example, the action of drinking can be decomposed into three targets: cup, hand, and mouth (e.g. the decomposition of the drinking action into these three targets is defined manually). Then collect a data set for each decomposed target: some data sets can be found in public online data sets, while others require collecting pictures and annotating them with software such as labelImg. Each target data set is then trained with deep learning to obtain the target detection model for the decomposed targets. In the present invention, one target detection model is trained on the data sets of all decomposed targets, and this single model can detect all the targets in the data sets.
Specifically, step S1 further comprises:
Step S100: decompose the action to be detected, obtaining several decomposed targets.
Step S101: collect a data set for each decomposed target, obtaining multiple target data sets. For example, pictures containing each decomposed target are collected from the network, and pictures containing the same decomposed target are gathered together to form that target's data set. Pictures containing the decomposed targets, or a single decomposed target, can be downloaded from the Internet, and the targets in the pictures are marked with an annotation tool to form the data set of the decomposed target, which contains the original images and the annotation files generated by the marking.
Step S102: preprocess the collected target data sets. Specifically, to improve target detection performance, before the target data sets are trained with deep learning, the images in the collected data sets are preprocessed. The preprocessing includes, but is not limited to: translating and mirroring the targets in the images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping part of the target image, and jittering and padding the images.
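As an illustration, a few of the augmentations listed above can be sketched with NumPy as follows. This is a minimal sketch: the function name, noise standard deviation, and salt-and-pepper fraction are our own assumptions for illustration, not values given in the patent.

```python
import numpy as np

def augment(image, rng=None):
    """Apply mirror, Gaussian noise, and salt-and-pepper noise to one
    H x W x C uint8 image (bounding boxes would need the same flip)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = image.astype(np.float32)

    # Horizontal mirror with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]

    # Additive Gaussian noise (sigma = 5 is an illustrative choice).
    out = out + rng.normal(0.0, 5.0, size=out.shape)

    # Salt-and-pepper noise: force ~1% of pixels to black or white.
    mask = rng.random(out.shape[:2])
    out[mask < 0.01] = 0.0
    out[mask > 0.99] = 255.0

    return np.clip(out, 0, 255).astype(np.uint8)
```

In practice each augmented copy would be written back into the data set alongside its (correspondingly transformed) annotation file.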
Step S103: train each target data set with a YoloV3 network to obtain the target detection model for each decomposed target.
In the present invention, the YoloV3 network is built from residual blocks composed of 3*3 and 1*1 convolutions as its basic components. It detects targets at three output scales, 13*13, 26*26 and 52*52, and the detected targets are then merged by NMS to obtain the final targets. The YoloV3 network structure is shown in Fig. 2. The residual-block part contains 21 convolutional layers (several 3*3 and 1*1 convolutional layers), the rest being res layers; the YOLO part consists of the feature-interaction layers of the yolo network, divided into three scales. Within each scale, local feature interaction is realized through convolution kernels; the effect is similar to that of a fully connected layer, except that the local feature interaction between feature maps is realized through convolution kernels (3*3 and 1*1). In this specific embodiment, the leftmost is the smallest-scale yolo layer: it takes a 13*13 feature map as input, applies a series of convolution operations, outputs a 13*13 feature map, and performs classification and position regression on it. The middle is the medium-scale yolo layer: it applies a series of convolution operations to the feature map output by the smallest-scale yolo layer, outputs a 26*26 feature map, and then performs classification and position regression. The rightmost is the large-scale yolo layer: it applies a series of convolution operations to the feature map output by the medium-scale yolo layer, outputs a 52*52 feature map, and then performs classification and position regression.
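The NMS step that merges the detections from the three scales can be illustrated with a standard greedy sketch; the 0.45 overlap threshold and the function names are assumptions for illustration, not values from the patent.

```python
def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.
    Returns the indices of the kept boxes, highest score first."""
    def overlap(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score wins
        keep.append(best)
        # Drop every remaining box that overlaps the winner too much.
        order = [i for i in order
                 if overlap(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

A real YoloV3 pipeline applies this per class, after filtering detections by confidence.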
Step S2: continuously acquire the video stream, detect targets in the input video stream with the target detection model, obtain the location of each single target in the video image, compute the direction-vector features between targets, compare the variation trend of those features across the video stream, and merge targets that keep approaching each other into a new target. In this specific embodiment, the direction vectors between each pair of targets are computed for the previous and current frames, and the spatial features of the direction vector, i.e. the variation trend of its length and direction, are tracked across the video stream. If the length of the direction vector between two targets keeps decreasing while its direction stays consistent between frames, the two targets are continually approaching; the IOU between the two targets is then computed, and if it exceeds a threshold the two are merged into a new target.
Specifically, step S2 further comprises:
Step S200: run target detection on the current video frame with the target detection model trained in S1, obtain the location of each target, compute the direction vectors between the targets, and extract the features of the direction vector between each pair of targets.
In this specific embodiment, the direction vectors between pairs of targets are obtained as follows. Suppose the locations of the single targets in the image are represented as (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes frame t_1 of the video and (x_n, y_n) are the position coordinates of a target. The direction vectors between targets can then be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
...
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} is the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the direction-vector features are the length feature and the direction feature of the direction vector; extracting the feature of the direction vector between a pair of targets simply means computing the vector's length and direction. The length feature of the direction vector can be expressed as:
|u_{n-1,n}| = sqrt((x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2)
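The pairwise direction vectors and their length feature can be computed as in this sketch; the helper name and the dictionary layout are ours, and the shared time index t_1 is dropped since it is the same for every vector within one frame.

```python
import math

def direction_vectors(positions):
    """Pairwise direction vectors between detected target centres.

    `positions` is a list of (x, y) centres in one frame; returns a dict
    mapping (i, j) -> ((dx, dy), length), following the u_{i,j} notation
    above with the length being the Euclidean norm."""
    vecs = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            vecs[(i, j)] = ((dx, dy), math.hypot(dx, dy))
    return vecs
```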
Step S201: run target detection on the next frame with the target detection model, obtain the location of each target, compute the direction vectors between the targets, and extract the direction-vector features. The target detection and direction-vector computation here are the same as in step S200 and are not repeated.
Step S202: compare the direction-vector features obtained in step S201 with those obtained from the previous frame, examine the variation trend of the length and direction features across the video stream, and merge targets that keep approaching each other into a new target. That is, judge whether each pair of targets shows an approaching trend; if so, continue with the next frame, returning to step S201, until the two targets are very close (e.g. nearly coincide) and the continually approaching targets are merged into a new target. If two targets show no approaching trend over several consecutive frames, they are unrelated, and the relationship between them is no longer tracked.
In the present invention, the approaching trend between a pair of targets is judged from the feature that the length of the direction vector between them keeps decreasing across consecutive frames while its direction stays consistent. The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} is the direction vector between the targets in frame t_1 and u'_{n-1,n} is the direction vector between the targets in frame t_2. As the video stream keeps coming in, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
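A minimal sketch of the two-part test above, shrinking length plus positive dot product, assuming 2-D vectors and a function name of our own choosing:

```python
def approaching(u_prev, u_curr):
    """True when a pair of targets is drawing closer between consecutive
    frames: the direction vector is getting shorter (|u| > |u'|) and
    still points the same way (u . u' > 0), per the inequalities above."""
    px, py = u_prev
    cx, cy = u_curr
    len_prev = (px * px + py * py) ** 0.5
    len_curr = (cx * cx + cy * cy) ** 0.5
    dot = px * cx + py * cy
    return len_prev > len_curr and dot > 0
```

In the stream loop, this test would be applied per target pair per frame, and a pair dropped from consideration after several consecutive failures.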
In this specific embodiment, whether two targets should be merged into a new target is decided by the IOU (Intersection-over-Union) between them. Specifically, when the IOU between two targets exceeds a threshold T, the two targets are merged into one new target. The IOU between two targets can be expressed as:
IOU = area(A ∩ B) / area(A ∪ B)
where A and B denote the two targets; the IOU computation is illustrated in Fig. 3.
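The IOU test and the merge into a new target can be sketched as follows, with targets as (x1, y1, x2, y2) corner boxes; the 0.5 default threshold is our assumption, since the patent leaves T unspecified.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def maybe_merge(a, b, threshold=0.5):
    """When IOU exceeds the threshold T, return the new target as the
    smallest box enclosing both; otherwise return None."""
    if iou(a, b) > threshold:
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    return None
```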
Step S3: extract the merged new target; once all targets of one decomposed action have been merged down to only a main target and a secondary target, judge that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
In this specific embodiment, the distinction between the main target and the secondary target is as follows: among the multiple targets of a decomposed action, the main target is the one whose motion changes little between video frames, while the remaining decomposed targets keep moving between frames, and the single new target they are finally merged into is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features of the direction vector between the two targets, i.e. its length, direction and length variation, together with the IOU between the two targets.
Fig. 4 is the system architecture diagram of the action recognition system based on spatial features and multi-target detection of the present invention. As shown in Fig. 4, the system comprises:
a target detection model training and acquisition unit 401, which decomposes the action to be detected into decomposed targets, collects a data set for each decomposed target, and trains on each target data set with deep learning to obtain the target detection model for the decomposed targets. For example, the action of drinking can be decomposed into three targets, cup, hand, and mouth; a data set is collected for each decomposed target, some from public online data sets and some by collecting pictures and annotating them with software such as labelImg. Each target data set is then trained with deep learning to obtain the target detection model for the decomposed targets; this model can detect all the targets in the data sets.
Specifically, as shown in Fig. 5, the target detection model training and acquisition unit 401 further comprises:
an action decomposition unit 4010, which decomposes the action to be detected, obtaining several decomposed targets;
a target data set collection unit 4011, which collects a data set for each decomposed target, obtaining multiple target data sets; for example, pictures containing each decomposed target are collected from the network, and pictures containing the same decomposed target are gathered together to form that target's data set;
a preprocessing unit 4012, which preprocesses the collected target data sets. Specifically, to improve target detection performance, before the target data sets are trained with deep learning, the images in the collected data sets are preprocessed by unit 4012. The preprocessing includes, but is not limited to: translating and mirroring the targets in the images, adding Gaussian noise and salt-and-pepper noise at the target positions, randomly cropping part of the target image, and jittering and padding the images;
a model training unit 4013, which trains each target data set with a YoloV3 network to obtain the target detection model for each decomposed target.
In the present invention, the YoloV3 network is built from residual blocks composed of 3*3 and 1*1 convolutions as its basic components; it detects targets at three output scales, 13*13, 26*26 and 52*52, and the detected targets are then merged by NMS to obtain the final targets.
a target detection unit 402, which continuously acquires the video stream, detects targets in the input video stream with the target detection model, obtains the location of each single target in the video image, computes the direction-vector features between targets, compares the variation trend of those features across the video stream, and merges targets that keep approaching each other into a new target. In this specific embodiment, the target detection unit 402 computes the direction vectors between each pair of targets for the previous and current frames and extracts the spatial features of the direction vector, i.e. the variation trend of its length and direction across the video stream. If the length of the direction vector between two targets keeps decreasing while its direction stays consistent between frames, the two targets are continually approaching; the IOU between them is then computed, and if it exceeds a threshold the two are merged into a new target.
Specifically, as shown in Fig. 6, the target detection unit 402 further comprises:
a previous-frame target detection module 4021, which runs target detection on the current video frame with the target detection model, obtains the location of each target, computes the direction vectors between the targets, and extracts the features of the direction vector between each pair of targets.
In this specific embodiment, the direction vectors between pairs of targets are obtained as follows. Suppose the locations of the single targets in the image are represented as (x_1, y_1, t_1), (x_2, y_2, t_1), ..., (x_n, y_n, t_1), where t_1 denotes frame t_1 of the video and (x_n, y_n) are the position coordinates of a target. The direction vectors between targets can then be expressed as:
u_{1,2} = (x_1 - x_2, y_1 - y_2, t_1)
u_{1,n} = (x_1 - x_n, y_1 - y_n, t_1)
...
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t_1)
where u_{n-1,n} is the direction vector between the (n-1)-th target and the n-th target.
In this embodiment, the direction-vector features are the length feature and the direction feature of the direction vector; extracting the feature of the direction vector between a pair of targets simply means computing the vector's length and direction. The length feature of the direction vector can be expressed as:
|u_{n-1,n}| = sqrt((x_{n-1} - x_n)^2 + (y_{n-1} - y_n)^2)
a next-frame target detection module 4022, which runs target detection on the next frame with the target detection model, obtains the location of each target, computes the direction vectors between the targets, and extracts the direction-vector features. The target detection and direction-vector computation here are the same as in step S200 and are not repeated.
a trend judgment processing module 4023, which compares the direction-vector features obtained by the next-frame target detection module 4022 with those obtained from the previous frame and judges whether each pair of targets shows an approaching trend; if so, processing continues with the next frame, returning to the next-frame target detection module 4022, until the two targets are very close (e.g. nearly coincide) and are merged. If two targets show no approaching trend over several consecutive frames, they are unrelated, and the relationship between them is no longer tracked.
In the present invention, the approaching trend between a pair of targets is judged from the feature that the length of the direction vector between them keeps decreasing across consecutive frames while its direction stays consistent. The direction consistency of two direction vectors between video frames can be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)(x'_{n-1} - x'_n) + (y_{n-1} - y_n)(y'_{n-1} - y'_n)
where u_{n-1,n} is the direction vector between the targets in frame t_1 and u'_{n-1,n} is the direction vector between the targets in frame t_2. As the video stream keeps coming in, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
In this specific embodiment, whether two targets should be merged into a new target is decided by the IOU (Intersection-over-Union) between them. Specifically, when the IOU between two targets exceeds a threshold T, the two targets are merged into one new target. The IOU between two targets can be expressed as:
IOU = area(A ∩ B) / area(A ∪ B)
where A and B denote the two targets.
an action recognition unit 403, which extracts the merged new target and, once all targets of one decomposed action have been merged down to only a main target and a secondary target, judges that the action has occurred from the direction-vector features the two generate between video frames and from the IOU between their positions.
In this specific embodiment, the distinction between the main target and the secondary target is as follows: among the multiple targets of a decomposed action, the main target is the one whose motion changes little between video frames, while the remaining decomposed targets keep moving between frames, and the single new target they are finally merged into is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features of the direction vector between the two targets, i.e. its length, direction and length variation, together with the IOU between the two targets.
Fig. 7 is a flowchart of the action identification method based on spatial features and multi-target detection according to this embodiment of the invention. In this embodiment, the action recognition process based on spatial features and multi-target detection is as follows:
Step 1: decompose the action to be detected into multiple targets, collect a data set for those targets, pre-process the data set, and train on it with the YoloV3 network in deep learning to obtain the target detection model.
In this embodiment, the pre-processing methods applied to the data set include: translating the targets in the images, mirroring, adding Gaussian noise or salt-and-pepper noise at the target positions, randomly cropping part of the target images, and image jitter and padding operations. The network structure of YoloV3 uses residual blocks built from 3×3 and 1×1 convolutions as its basic elements; it detects targets at three different output sizes and then merges the detections by NMS to obtain the final targets, the output scales being 13×13, 26×26 and 52×52 respectively.
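The pre-processing operations listed above might be sketched as below; the shift size, noise level and salt-and-pepper density are illustrative values, not parameters given in the patent:

```python
import numpy as np

def augment(img, rng=None):
    """Produce augmented variants of one training image: mirror, translation,
    Gaussian noise, and salt-and-pepper noise (illustrative parameters)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    variants = []
    variants.append(np.fliplr(img))                 # mirror operation
    variants.append(np.roll(img, shift=5, axis=1))  # 5-pixel translation
    noisy = img.astype(np.float64) + rng.normal(0, 10, img.shape)
    variants.append(np.clip(noisy, 0, 255).astype(img.dtype))  # Gaussian noise
    sp = img.copy()                                 # salt-and-pepper noise
    mask = rng.random(img.shape) < 0.02
    sp[mask] = rng.choice(np.array([0, 255], dtype=img.dtype),
                          size=int(mask.sum()))
    variants.append(sp)
    return variants
```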
Step 2: input a video frame, perform target detection on that frame, obtain the location information of the targets, compute the direction vectors between the targets, and extract the features of the pairwise direction vectors.
In this embodiment of the invention, the method for obtaining the pairwise direction vectors between targets is as follows. The locations of single targets in the image can be represented as (x1, y1, t1), (x2, y2, t1), ..., (x_n, y_n, t1), where t1 denotes the t1-th video frame and (x_n, y_n) denotes the position coordinates of a target. The direction vectors may be expressed as:
u_{1,2} = (x1 - x2, y1 - y2, t1)
u_{1,n} = (x1 - x_n, y1 - y_n, t1)
...
u_{n-1,n} = (x_{n-1} - x_n, y_{n-1} - y_n, t1)
where u_{n-1,n} denotes the direction vector between the (n-1)-th target and the n-th target.
In this embodiment of the invention, the length feature of a direction vector may be expressed as:
|u_{n-1,n}| = √((x_{n-1} - x_n)² + (y_{n-1} - y_n)²)
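The pairwise direction vectors and their length features can be computed in one pass over the detected target positions; this sketch keeps only the spatial (x, y) components, and the function name is illustrative:

```python
def pairwise_vectors(positions):
    """positions: list of (x, y) centre coordinates of the targets detected
    in one frame. Returns the direction vector u_{i,j} = (x_i - x_j,
    y_i - y_j) and its Euclidean length for every target pair, matching the
    formulas above."""
    vecs, lengths = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            vecs[(i, j)] = (dx, dy)
            lengths[(i, j)] = (dx * dx + dy * dy) ** 0.5
    return vecs, lengths
```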
Step 3: input the video stream, again perform target detection and compute the pairwise direction vectors between targets, compare them with the earlier direction vectors, and judge whether each pair of targets shows an approaching trend; keep inputting video until the targets are close enough to be merged.
The direction comparison of two inter-frame direction vectors may be expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)·(x'_{n-1} - x'_n) + (y_{n-1} - y_n)·(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t1 and u'_{n-1,n} denotes the direction vector between targets in video frame t2.
As the video stream keeps arriving, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend; when the IOU between the two targets exceeds some threshold T, the two targets are merged into one new target. The IOU between the two targets may be expressed as:
IOU = (A ∩ B) / (A ∪ B)
where A and B denote the regions of the two targets.
Step 4: when all the targets of an action decomposition have been merged until only the main target and the secondary target remain, judge the occurrence of the action from the features of the direction vector the two generate between video frames and from the IOU between the target positions. In this embodiment, among the multiple targets of an action decomposition, the target whose movement changes little is called the main target; the remaining single targets keep moving between frames and are synthesized into one new target, and this finally obtained new target is called the secondary target. The secondary target keeps approaching the main target in the video stream, and whether the action occurs is judged from the spatial features formed by the direction vector between the two targets, namely the vector length, direction and change in length, together with the IOU between the two targets.
In summary, the action identification method and system based on spatial features and multi-target detection of the present invention decompose an action into multiple simple targets and establish target detection models, make full use of the spatial vector features among multiple targets in the video and the inter-frame vector variation features, and detect the action through the continuous inter-frame motion relationships and positional relationships of the multiple targets, thereby achieving the purpose of improving action recognition accuracy.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the present invention. Therefore, the scope of the present invention should be as listed in the claims.
Claims (10)
1. An action identification method based on spatial features and multi-target detection, comprising the following steps:
Step S1, decomposing the action to be detected to obtain decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for each decomposition target;
Step S2, continuously acquiring a video stream, detecting targets in the input video stream using the target detection models to obtain the location of each single target in the video image, computing the direction vector features between targets, comparing the variation trends of the direction vector features in the video stream, and merging continuously approaching targets into new targets;
Step S3, extracting the synthesized new targets; when all the targets of an action decomposition have been merged until only the main target and the secondary target remain, judging the occurrence of the action from the features of the direction vector the two generate between video frames and from the IOU between the target positions.
2. The action identification method based on spatial features and multi-target detection of claim 1, wherein step S1 further comprises:
Step S100, decomposing the action to be detected to obtain several decomposition targets;
Step S101, collecting a data set for each decomposition target to obtain multiple target data sets;
Step S102, pre-processing the collected target data sets;
Step S103, training on each target data set using the YoloV3 network to obtain a target detection model for each decomposition target.
3. The action identification method based on spatial features and multi-target detection of claim 2, wherein in step S102, the pre-processing includes but is not limited to: translating the targets in the images of the target data sets, mirroring, adding Gaussian noise or salt-and-pepper noise at the target positions, randomly cropping part of the target images, and image jitter and padding operations.
4. The action identification method based on spatial features and multi-target detection of claim 1, wherein step S2 further comprises:
Step S200, performing target detection on the current frame of the video stream using the target detection models, obtaining the location of each target, computing the direction vectors between the targets, and extracting the features of the pairwise direction vectors;
Step S201, performing target detection on the next video frame using the target detection models, obtaining the location of each target, computing the direction vectors between the targets, and extracting the features of the direction vectors;
Step S202, comparing the direction vector features obtained from the preceding and following frames, comparing the variation trends of the length and direction features of the direction vectors in the video stream, and merging continuously approaching targets into new targets.
5. The action identification method based on spatial features and multi-target detection of claim 4, wherein in step S202, it is judged whether each pair of targets shows an approaching trend; if an approaching trend exists, the next video frame is processed and step S201 is repeated until the pairwise targets are close enough to overlap and are merged into a new target.
6. The action identification method based on spatial features and multi-target detection of claim 5, wherein in step S202, if across the preceding and following video frames the length of the direction vector between a pair of targets keeps decreasing and its direction remains consistent between frames, the two targets are judged to be continuously approaching.
7. The action identification method based on spatial features and multi-target detection of claim 6, wherein the direction comparison of two inter-frame direction vectors is expressed as:
u_{n-1,n} · u'_{n-1,n} = (x_{n-1} - x_n)·(x'_{n-1} - x'_n) + (y_{n-1} - y_n)·(y'_{n-1} - y'_n)
where u_{n-1,n} denotes the direction vector between targets in video frame t1, u'_{n-1,n} denotes the direction vector between targets in video frame t2, and (x_n, y_n) denotes the position coordinates of a target; as the video stream keeps arriving, if |u_{n-1,n}| > |u'_{n-1,n}| and u_{n-1,n} · u'_{n-1,n} > 0, the two targets show an approaching trend.
8. The action identification method based on spatial features and multi-target detection of claim 5, wherein in step S202, whether two targets should be merged into a new target is decided by the size of the IOU between them.
9. The action identification method based on spatial features and multi-target detection of claim 8, wherein in step S202, the IOU between two targets is computed, and the two targets are merged into a new target if the IOU exceeds some threshold.
10. An action recognition system based on spatial features and multi-target detection, comprising:
a target detection model training and acquisition unit, for decomposing the action to be detected to obtain decomposition targets, collecting a data set for each decomposition target, and training on each target data set based on deep learning to obtain a target detection model for each decomposition target;
a target detection unit, for continuously acquiring a video stream, detecting targets in the input video stream using the target detection models to obtain the location of each single target in the video image, computing the direction vector features between targets, comparing the variation trends of the direction vector features in the video stream, and merging continuously approaching targets into new targets;
an action recognition unit, for extracting the synthesized new targets; when all the targets of an action decomposition have been merged until only the main target and the secondary target remain, judging the occurrence of the action from the features of the direction vector the two generate between video frames and from the IOU between the target positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910192305.XA CN109977818A (en) | 2019-03-14 | 2019-03-14 | A kind of action identification method and system based on space characteristics and multi-target detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977818A true CN109977818A (en) | 2019-07-05 |
Family
ID=67078860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910192305.XA Pending CN109977818A (en) | 2019-03-14 | 2019-03-14 | A kind of action identification method and system based on space characteristics and multi-target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977818A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942011A (en) * | 2019-11-18 | 2020-03-31 | 上海极链网络科技有限公司 | Video event identification method, system, electronic equipment and medium |
CN111695638A (en) * | 2020-06-16 | 2020-09-22 | 兰州理工大学 | Improved YOLOv3 candidate box weighted fusion selection strategy |
WO2021017291A1 (en) * | 2019-07-31 | 2021-02-04 | 平安科技(深圳)有限公司 | Darkflow-deepsort-based multi-target tracking detection method, device, and storage medium |
CN112418278A (en) * | 2020-11-05 | 2021-02-26 | 中保车服科技服务股份有限公司 | Multi-class object detection method, terminal device and storage medium |
CN112967320A (en) * | 2021-04-02 | 2021-06-15 | 浙江华是科技股份有限公司 | Ship target detection tracking method based on bridge collision avoidance |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663429A (en) * | 2012-04-11 | 2012-09-12 | 上海交通大学 | Method for motion pattern classification and action recognition of moving target |
US20170039431A1 (en) * | 2015-08-03 | 2017-02-09 | Beijing Kuangshi Technology Co., Ltd. | Video monitoring method, video monitoring apparatus and video monitoring system |
CN108288032A (en) * | 2018-01-08 | 2018-07-17 | 深圳市腾讯计算机系统有限公司 | Motion characteristic acquisition methods, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ke et al. | Multi-dimensional traffic congestion detection based on fusion of visual features and convolutional neural network | |
CN109977818A (en) | A kind of action identification method and system based on space characteristics and multi-target detection | |
CN109934176B (en) | Pedestrian recognition system, recognition method, and computer-readable storage medium | |
US10735694B2 (en) | System and method for activity monitoring using video data | |
CN103593464B (en) | Video fingerprint detecting and video sequence matching method and system based on visual features | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN110378931A (en) | A kind of pedestrian target motion track acquisition methods and system based on multi-cam | |
Lin et al. | Learning a scene background model via classification | |
CN110084165A (en) | The intelligent recognition and method for early warning of anomalous event under the open scene of power domain based on edge calculations | |
CN105931270A (en) | Video keyframe extraction method based on movement trajectory analysis | |
CN108875456A (en) | Object detection method, object detecting device and computer readable storage medium | |
Lovanshi et al. | Human pose estimation: benchmarking deep learning-based methods | |
CN116402850A (en) | Multi-target tracking method for intelligent driving | |
CN114677633A (en) | Multi-component feature fusion-based pedestrian detection multi-target tracking system and method | |
Liu et al. | A stochastic attribute grammar for robust cross-view human tracking | |
CN113780145A (en) | Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium | |
Wang et al. | Paul: Procrustean autoencoder for unsupervised lifting | |
Abdullah et al. | Vehicle counting using deep learning models: a comparative study | |
Lin et al. | Overview of 3d human pose estimation | |
Huang et al. | A detection method of individual fare evasion behaviours on metros based on skeleton sequence and time series | |
Liang et al. | Multiple object tracking by reliable tracklets | |
Chen et al. | Understanding dynamic associations: Gait recognition via cross-view spatiotemporal aggregation network | |
Zhang | [Retracted] An Intelligent and Fast Dance Action Recognition Model Using Two‐Dimensional Convolution Network Method | |
CN111832475B (en) | Face false detection screening method based on semantic features | |
CN106372650A (en) | Motion prediction-based compression tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | ||
Effective date of abandoning: 20230404 |