CN109977912A - Video human body key point detection method, apparatus, computer equipment and storage medium - Google Patents

Video human body key point detection method, apparatus, computer equipment and storage medium

Info

Publication number
CN109977912A
Authority
CN
China
Prior art keywords
image
detected
flow field
optical flow
feature
Prior art date
Legal status
Granted
Application number
CN201910276687.4A
Other languages
Chinese (zh)
Other versions
CN109977912B (en)
Inventor
张樯
张挺
李斌
李司同
崔洪
Current Assignee
Beijing Institute of Environmental Features
Original Assignee
Beijing Institute of Environmental Features
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Environmental Features filed Critical Beijing Institute of Environmental Features
Priority to CN201910276687.4A
Publication of CN109977912A
Application granted
Publication of CN109977912B
Legal status: Active


Classifications

    • G06T 7/269 Analysis of motion using gradient-based methods
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30196 Human being; Person

Abstract

The present invention relates to a video human body key point detection method, comprising: extracting multiple frames of images to be detected from a video to be detected; obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected; fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected; and inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected. By extracting the optical flow field between frames, the image to be detected is enhanced, which improves the accuracy of video key point detection.

Description

Video human body key point detection method, apparatus, computer equipment and storage medium
Technical field
The present invention relates to the field of image processing, and in particular to a video human body key point detection method and apparatus, a computer device and a storage medium.
Background technique
Human body key point detection studies how to accurately identify and locate each key point of the human body in an image; it is the basis of many computer vision applications such as action recognition and human-computer interaction.
At present, video human body key point detection generally follows one of two approaches, "bottom-up" or "top-down". Both, however, simply decompose the video into frames and process them one by one with a single-frame algorithm, without exploiting the temporal information between frames, which leads to low key point detection accuracy.
Summary of the invention
The purpose of the present invention is to provide a video human body key point detection method and apparatus, a computer device and a readable storage medium, which can effectively improve the accuracy of human body key point detection in video.
The purpose of the present invention is achieved through the following technical solutions:
A video human body key point detection method, the method comprising:
extracting multiple frames of images to be detected from a video to be detected;
obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected;
fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected;
inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
In one embodiment, the images to be detected include a current frame image and at least one historical frame image, and the step of extracting multiple frames of images to be detected from the video to be detected comprises:
extracting the current frame image from the video to be detected;
extracting at least one historical frame image from the video to be detected, the capture time of the historical frame image being before, and adjacent to, the current frame image.
In one embodiment, the step of obtaining the optical flow field between the multiple frames of images to be detected and obtaining the feature map of each image to be detected comprises:
obtaining the optical flow field between the current frame image and the historical frame image;
obtaining the current feature map of the current frame image, and obtaining the historical feature map of the historical frame image.
In one embodiment, the step of obtaining the optical flow field between the current frame image and the historical frame image comprises:
inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image.
In one embodiment, the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises:
aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map;
performing temporal fusion of the aligned feature map and the current feature map to obtain the enhanced feature map.
In one embodiment, the step of obtaining the optical flow field between the current frame image and the historical frame image comprises:
inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image, and also obtaining a scale field, the scale field having the same dimensions as the feature map.
In one embodiment, the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises:
aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map;
multiplying the aligned feature map by the scale field to obtain a refined feature map;
performing temporal fusion of the refined feature map and the current feature map to obtain the enhanced feature map.
A video human body key point detection apparatus, the apparatus comprising:
an image extraction module for extracting multiple frames of images to be detected from a video to be detected;
an optical flow feature extraction module for obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected;
an image enhancement module for fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected;
a key point detection module for inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the above steps when executing the computer program.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above steps.
The video human body key point detection method provided by the present invention extracts multiple frames of images to be detected from a video to be detected; obtains the optical flow field between the multiple frames of images to be detected, and obtains a feature map of each image to be detected; fuses the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected; and inputs the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected. By extracting the optical flow field between frames, the image to be detected is enhanced, which improves the accuracy of video key point detection.
Detailed description of the invention
Fig. 1 is a diagram of the application environment of the video human body key point detection method in one embodiment;
Fig. 2 is a flow diagram of the video human body key point detection method in one embodiment;
Fig. 3 is a flow diagram of the video human body key point detection method in another embodiment;
Fig. 4 is a structural block diagram of the video human body key point detection apparatus in another embodiment;
Fig. 5 is a diagram of the internal structure of the computer device in one embodiment.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and do not limit its scope of protection.
The video human body key point detection method provided by the present application can be applied in the environment shown in Fig. 1. The application environment includes a server 104 and a camera device 102. The server 104 obtains the video to be detected from the camera device 102 and extracts multiple frames of images to be detected from it; the server 104 obtains the optical flow field between the multiple frames of images to be detected, and obtains a feature map of each image to be detected; the server 104 fuses the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected; the server 104 inputs the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected. The server may be implemented as an independent server or as a cluster of multiple servers; the camera device may be any device with a camera function, such as a camera, a video camera or a mobile phone.
In one embodiment, as shown in Fig. 2, a video human body key point detection method is provided. Taking the method as applied to the server in Fig. 1 as an example, it comprises the following steps:
Step S202: extract multiple frames of images to be detected from the video to be detected.
In this step, the images to be detected include a current frame image and at least one historical frame image.
In a specific implementation, the extraction of multiple frames of images to be detected from the video to be detected in step S202 comprises:
1) extracting the current frame image from the video to be detected;
2) extracting at least one historical frame image from the video to be detected, the capture time of the historical frame image being before, and adjacent to, the current frame image.
For example, three adjacent frames can be extracted, with the last frame used as the current frame image and the preceding two frames used as historical frame images.
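As a minimal sketch of this step, the following Python/OpenCV snippet reads a sliding window of three adjacent frames from a video file; the window size of three and the file path are illustrative assumptions, not values fixed by the patent.

```python
# Read a sliding window of three adjacent frames: the newest frame is the
# current frame image, the two before it are the historical frame images.
import cv2
from collections import deque

cap = cv2.VideoCapture("input.mp4")  # path is an assumption for illustration
window = deque(maxlen=3)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    window.append(frame)
    if len(window) == 3:
        *historical_frames, current_frame = window
        # ... pass current_frame and historical_frames to the detector ...
cap.release()
```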
Step S204: obtain the optical flow field between the multiple frames of images to be detected, and obtain the feature map of each image to be detected.
In this step, optical flow estimation is a method of computing object motion from the changes of object surface and shape between two observation moments. Optical flow characterizes the motion information between two images: it reflects the instantaneous velocity of each pixel moving from the previous frame to the following frame.
In this embodiment, the inter-frame optical flow is estimated with a Flownet2S network.
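Public Flownet2S implementations vary, so as a stand-in illustration the snippet below uses torchvision's RAFT model to show the interface of a "preset neural optical flow network": two frames in, a two-channel flow field out. This is an assumption-laden sketch, not the patent's network.

```python
# Illustrative only: torchvision's RAFT stands in for Flownet2S to show the
# flow-estimation interface (two RGB frames in, a (N, 2, H, W) flow field out).
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()
preprocess = weights.transforms()

def estimate_flow(frame_prev: torch.Tensor, frame_cur: torch.Tensor) -> torch.Tensor:
    """frame_*: (N, 3, H, W) float tensors in [0, 1]; returns flow (N, 2, H, W)."""
    img1, img2 = preprocess(frame_prev, frame_cur)
    with torch.no_grad():
        flow_predictions = model(img1, img2)  # list of iterative refinements
    return flow_predictions[-1]               # final, most refined flow field
```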
As shown in Fig. 3, in one embodiment, obtaining the optical flow field between the multiple frames of images to be detected and obtaining the feature map of each image to be detected in step S204 comprises:
Step S410: obtain the optical flow field between the current frame image and the historical frame image;
Step S420: obtain the current feature map of the current frame image, and obtain the historical feature map of the historical frame image.
In one embodiment, obtaining the optical flow field between the current frame image and the historical frame image in step S410 may comprise: inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image.
Specifically, let M_{i→k} denote the two-dimensional optical flow field from frame i to frame k computed by Flownet2S. Suppose a pixel at position p in frame i moves to position q in frame k; then q = p + δp, where δp = M_{i→k}(p). Before feature alignment, the optical flow needs to be scaled to the same size as the feature map by bilinear interpolation. Since δp is generally fractional, feature alignment is realized by formula (1):

f^c_{i→k}(p) = Σ_q G(q, p + δp) · f^c_i(q)    (1)

where c denotes a channel of the feature map f, q traverses every coordinate on the feature map, and G is the bilinear interpolation kernel. Since G is two-dimensional, it can be decomposed into the product of two one-dimensional kernels, as shown in formula (2):

G(q, p + δp) = g(q_x, p_x + δp_x) · g(q_y, p_y + δp_y)    (2)

where g(a, b) = max(0, 1 − |a − b|). Since only very few terms in the sum are non-zero, the computation is fast.
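The alignment of formulas (1) and (2) can be sketched with PyTorch's grid_sample, whose bilinear mode implements exactly the kernel g(a, b) = max(0, 1 − |a − b|); tensor shapes and the pixel-unit flow convention are assumptions for illustration.

```python
# A minimal sketch of flow-guided feature alignment per formulas (1)-(2).
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp feature map `feat` (N, C, H, W) by optical flow `flow` (N, 2, H, W).

    flow[:, 0] is the horizontal (x) displacement and flow[:, 1] the vertical
    (y) displacement, both in pixels at the feature-map resolution.
    """
    n, _, h, w = feat.shape
    # Base sampling grid: the coordinate p of every output location.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    # q = p + delta_p, with delta_p read from the flow field.
    x_q = xs.unsqueeze(0) + flow[:, 0]
    y_q = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * x_q / (w - 1) - 1.0, 2.0 * y_q / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```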
In another embodiment, so that the aligned features are more favorable for detection, obtaining the optical flow field between the current frame image and the historical frame image in step S410 may also comprise: inputting the current frame image and the historical frame image into the preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image, and also obtaining a scale field, the scale field having the same dimensions as the feature map.
Specifically, besides the optical flow field, Flownet2S also outputs a scale field S_{i→k} with the same dimensions as the feature map.
Step S206: fuse the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected.
In this step, the feature maps are fused with a GRU (Gated Recurrent Unit); specifically, temporal feature fusion is performed with the convolutional form of the GRU, i.e. ConvGRU.
In one embodiment, when only the optical flow field is obtained in step S410, fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected in step S206 comprises:
1) aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map;
2) performing temporal fusion of the aligned feature map and the current feature map to obtain the enhanced feature map.
Specifically, inside a GRU unit the input information is processed according to formula (3). The new state h_t of the GRU unit is a weighted sum of the previous state h_{t−1} and the memory state h'_t. The update gate z_t determines how much of the memory state is used to compute the new state h_t, and the reset gate r_t controls the degree of influence of the previous state h_{t−1} on the memory state. Unlike the fully connected GRU, * here denotes convolution, ⊙ denotes element-wise multiplication, σ is the sigmoid function, w are weights to be learned, and b are bias terms:

z_t = σ(x_t * w_xz + h_{t−1} * w_hz + b_z),
r_t = σ(x_t * w_xr + h_{t−1} * w_hr + b_r),
h'_t = tanh(x_t * w_xh + (r_t ⊙ h_{t−1}) * w_hh + b_h),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h'_t    (3)
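A compact ConvGRU cell following formula (3) can be sketched in PyTorch as below. Concatenating x_t and h_{t−1} before a single convolution is mathematically equivalent to the separate w_x· and w_h· terms above; the kernel size and channel counts are illustrative assumptions rather than the patent's configuration.

```python
# A minimal ConvGRU cell sketch matching formula (3).
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        # Each conv sees [x_t, h] concatenated along channels, equivalent to
        # the sum x_t * w_x. + h * w_h. in formula (3).
        self.conv_z = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)
        self.conv_r = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)
        self.conv_h = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h_prev], dim=1)
        z = torch.sigmoid(self.conv_z(xh))   # update gate z_t
        r = torch.sigmoid(self.conv_r(xh))   # reset gate r_t
        h_cand = torch.tanh(self.conv_h(torch.cat([x, r * h_prev], dim=1)))
        return (1 - z) * h_prev + z * h_cand  # new state h_t
```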
In another embodiment, when both the optical flow field and the scale field are obtained in step S410, fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected in step S206 comprises:
1) aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map;
2) multiplying the aligned feature map by the scale field to obtain a refined feature map; specifically, the scale field S_{i→k} is multiplied element-wise with the spatially aligned feature map to obtain the refined feature map;
3) performing temporal fusion of the refined feature map and the current feature map to obtain the enhanced feature map.
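Putting the pieces together, a hedged end-to-end sketch of this scale-field variant of step S206, reusing the warp_features and ConvGRUCell helpers sketched earlier; all tensor shapes are illustrative, and the assignment of input and previous state in the GRU is an assumption of this sketch.

```python
# End-to-end sketch of the scale-field variant of step S206. All tensors
# are dummies standing in for real network outputs.
import torch

n, c, h, w = 1, 256, 64, 64
hist_feat = torch.randn(n, c, h, w)    # historical feature map
cur_feat = torch.randn(n, c, h, w)     # current feature map
flow = torch.randn(n, 2, h, w)         # optical flow field M_{i->k}
scale_field = torch.randn(n, c, h, w)  # scale field S_{i->k}, same dims as feature map

aligned_feat = warp_features(hist_feat, flow)  # spatial alignment, formula (1)
refined_feat = aligned_feat * scale_field      # element-wise scale refinement
gru = ConvGRUCell(in_ch=c, hid_ch=c)
# Temporal fusion: the refined historical features act as the previous state,
# the current features as the input (an assumption of this sketch).
enhanced_feat = gru(cur_feat, refined_feat)
```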
Step S208: input the enhanced feature map into the preset neural network to obtain the human body key points in the image to be detected.
In this step, image-based human body key point detection is performed with Mask-RCNN. The network structure of Mask-RCNN is mainly composed of three parts: a feature extraction network at the bottom, a candidate box generation network in the middle layer, and task-specific sub-networks at the head.
The bottom feature extraction network extracts rich features from the image: its input is the original image and its output is a feature map. To extract better features, the VGG network used in Faster-RCNN is replaced with a residual network, whose feature representation ability is stronger. At the same time, because targets of different sizes often appear in an image, detecting from a feature map of a single scale easily causes missed detections: for a backbone such as ResNet, shallow features have high resolution but a low semantic level, while deep features have a high semantic level but low resolution. Using an FPN network as the backbone fuses the information of different scales, and the multi-scale feature maps it outputs are of great significance for the subsequent target detection, semantic segmentation and key point detection.
The intermediate candidate box generation network distinguishes targets from background and generates target candidate boxes; the feature map is then cropped according to these candidate boxes. The method used in Faster-RCNN is RoI Pooling, which maps a region of the original image to the corresponding convolutional feature region and pools it to a fixed size, normalizing the region to the input size of the convolutional network. Mask-RCNN instead uses a ROIAlign layer to keep the extracted features aligned with the input: it avoids quantizing the ROI boundaries or bins, and uses bilinear interpolation to compute the input feature values at four fixed sampling locations in each ROI bin and aggregate the results. The ROIAlign layer finally outputs a feature map of size 7 × 7 to the subsequent sub-task networks.
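For illustration, torchvision's roi_align implements exactly this quantization-free cropping; the feature shape, box coordinates and feature stride below are assumptions, and sampling_ratio=2 gives the 2 × 2 = 4 sampling locations per bin described above.

```python
# Crop fixed-size, alignment-preserving features for each candidate box.
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 50, 68)  # backbone feature map (assumed shape)
# Each box: (batch_index, x1, y1, x2, y2) in original-image coordinates.
boxes = torch.tensor([[0.0, 10.5, 20.25, 200.0, 310.0]])
roi_feats = roi_align(
    feat, boxes, output_size=(7, 7),
    spatial_scale=1 / 16,  # feature stride of this backbone level (assumed)
    sampling_ratio=2,      # 2x2 = 4 sampling locations per bin
    aligned=True,
)
print(roi_feats.shape)     # torch.Size([1, 256, 7, 7])
```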
The task-specific sub-network sits at the top; for human body key point detection it consists of 8 layers of 3 × 3 convolutions. Since the accuracy of key point detection is very sensitive to the resolution of the feature map, a deconvolution layer and a bilinear interpolation layer are cascaded afterwards, so that the finally output result has a scale of 56 × 56.
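A sketch of this keypoint sub-network in PyTorch follows; the channel counts, the number of key points, and the exact upsampling factors (one stride-2 deconvolution followed by 4× bilinear interpolation so that 7 × 7 becomes 56 × 56) are illustrative assumptions.

```python
# Keypoint head sketch: eight 3x3 convolutions, then deconvolution plus
# bilinear upsampling to produce 56x56 per-keypoint heatmaps.
import torch
import torch.nn as nn

def keypoint_head(num_keypoints: int = 17, ch: int = 512) -> nn.Sequential:
    layers = []
    in_ch = 256  # channels of the ROIAlign feature map (assumed)
    for _ in range(8):
        layers += [nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True)]
        in_ch = ch
    return nn.Sequential(
        *layers,
        nn.ConvTranspose2d(ch, num_keypoints, 4, stride=2, padding=1),  # 7x7 -> 14x14
        nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),  # -> 56x56
    )

# heatmaps = keypoint_head()(roi_feats)  # roi_feats: (num_rois, 256, 7, 7)
```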
In the above video human body key point detection method, multiple frames of images to be detected are extracted from the video to be detected; the optical flow field between the multiple frames of images to be detected is obtained, and a feature map of each image to be detected is obtained; the feature maps are fused according to the optical flow field to obtain an enhanced feature map of the image to be detected; and the enhanced feature map is input into a preset neural network to obtain the human body key points in the image to be detected. By extracting the optical flow field between frames, the image to be detected is enhanced, which improves the accuracy of video key point detection.
As shown in Fig. 4, which is a structural diagram of the video human body key point detection apparatus in one embodiment, this embodiment provides a video human body key point detection apparatus including an image extraction module 401, an optical flow feature extraction module 402, an image enhancement module 403 and a key point detection module 404, in which:
the image extraction module 401 is used for extracting multiple frames of images to be detected from a video to be detected;
the optical flow feature extraction module 402 is used for obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected;
the image enhancement module 403 is used for fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected;
the key point detection module 404 is used for inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
For the specific limitations of the video human body key point detection apparatus, reference may be made to the limitations of the video human body key point detection method above, which are not repeated here. Each module in the above apparatus may be realized in whole or in part by software, hardware or a combination thereof. The modules may be embedded in hardware form in, or independent of, the processor of a computer device, or stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
As shown in Fig. 5, which is a diagram of the internal structure of the computer device in one embodiment, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a device bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store sequences of control information, and the computer-readable instructions, when executed by the processor, cause the processor to realize a video human body key point detection method. The processor of the computer device provides the computing and control capability that supports the operation of the entire device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to execute a video human body key point detection method. The network interface of the computer device is used to communicate with a terminal. Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is proposed, including a memory, a processor and a computer program stored on the memory and runnable on the processor, the processor realizing the following steps when executing the computer program: extracting multiple frames of images to be detected from a video to be detected; obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected; fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected; and inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
In one of the embodiments, when the processor executes the computer program, the images to be detected include a current frame image and at least one historical frame image, and the step of extracting multiple frames of images to be detected from the video to be detected comprises: extracting the current frame image from the video to be detected; and extracting at least one historical frame image from the video to be detected, the capture time of the historical frame image being before, and adjacent to, the current frame image.
In one of the embodiments, when the processor executes the computer program, the step of obtaining the optical flow field between the multiple frames of images to be detected and obtaining the feature map of each image to be detected comprises: obtaining the optical flow field between the current frame image and the historical frame image; and obtaining the current feature map of the current frame image and the historical feature map of the historical frame image.
In one of the embodiments, when the processor executes the computer program, the step of obtaining the optical flow field between the current frame image and the historical frame image comprises: inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image.
In one of the embodiments, when the processor executes the computer program, the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises: aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map; and performing temporal fusion of the aligned feature map and the current feature map to obtain the enhanced feature map.
In one of the embodiments, when the processor executes the computer program, the step of obtaining the optical flow field between the current frame image and the historical frame image comprises: inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image, and also obtaining a scale field, the scale field having the same dimensions as the feature map.
In one of the embodiments, when the processor executes the computer program, the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises: aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map; multiplying the aligned feature map by the scale field to obtain a refined feature map; and performing temporal fusion of the refined feature map and the current feature map to obtain the enhanced feature map.
In one embodiment, a storage medium storing computer-readable instructions is proposed; when executed by one or more processors, the computer-readable instructions cause the one or more processors to execute the following steps: extracting multiple frames of images to be detected from a video to be detected; obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected; fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected; and inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
In one of the embodiments, when the computer-readable instructions are executed by the processor, the images to be detected include a current frame image and at least one historical frame image, and the step of extracting multiple frames of images to be detected from the video to be detected comprises: extracting the current frame image from the video to be detected; and extracting at least one historical frame image from the video to be detected, the capture time of the historical frame image being before, and adjacent to, the current frame image.
In one of the embodiments, when the computer-readable instructions are executed by the processor, the step of obtaining the optical flow field between the multiple frames of images to be detected and obtaining the feature map of each image to be detected comprises: obtaining the optical flow field between the current frame image and the historical frame image; and obtaining the current feature map of the current frame image and the historical feature map of the historical frame image.
In one of the embodiments, when the computer-readable instructions are executed by the processor, the step of obtaining the optical flow field between the current frame image and the historical frame image comprises: inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image.
In one of the embodiments, when the computer-readable instructions are executed by the processor, the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises: aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map; and performing temporal fusion of the aligned feature map and the current feature map to obtain the enhanced feature map.
In one of the embodiments, when the computer-readable instructions are executed by the processor, the step of obtaining the optical flow field between the current frame image and the historical frame image comprises: inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image, and also obtaining a scale field, the scale field having the same dimensions as the feature map.
In one of the embodiments, when the computer-readable instructions are executed by the processor, the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises: aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map; multiplying the aligned feature map by the scale field to obtain a refined feature map; and performing temporal fusion of the refined feature map and the current feature map to obtain the enhanced feature map.
It should be understood that, although the steps in the flow charts of the accompanying drawings are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict ordering constraint on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flow charts may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims (10)

1. A video human body key point detection method, characterized in that the method comprises:
extracting multiple frames of images to be detected from a video to be detected;
obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected;
fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected;
inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
2. The method according to claim 1, characterized in that the images to be detected include a current frame image and at least one historical frame image, and the step of extracting multiple frames of images to be detected from the video to be detected comprises:
extracting the current frame image from the video to be detected;
extracting at least one historical frame image from the video to be detected, the capture time of the historical frame image being before, and adjacent to, the current frame image.
3. The method according to claim 2, characterized in that the step of obtaining the optical flow field between the multiple frames of images to be detected and obtaining the feature map of each image to be detected comprises:
obtaining the optical flow field between the current frame image and the historical frame image;
obtaining the current feature map of the current frame image, and obtaining the historical feature map of the historical frame image.
4. The method according to claim 3, characterized in that the step of obtaining the optical flow field between the current frame image and the historical frame image comprises:
inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image.
5. The method according to claim 4, characterized in that the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises:
aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map;
performing temporal fusion of the aligned feature map and the current feature map to obtain the enhanced feature map.
6. The method according to claim 3, characterized in that the step of obtaining the optical flow field between the current frame image and the historical frame image comprises:
inputting the current frame image and the historical frame image into a preset neural optical flow network to obtain the optical flow field between the current frame image and the historical frame image, and also obtaining a scale field, the scale field having the same dimensions as the feature map.
7. The method according to claim 6, characterized in that the step of fusing the feature maps according to the optical flow field to obtain the enhanced feature map of the image to be detected comprises:
aligning the historical feature map to the current feature map according to the optical flow field to obtain an aligned feature map;
multiplying the aligned feature map by the scale field to obtain a refined feature map;
performing temporal fusion of the refined feature map and the current feature map to obtain the enhanced feature map.
8. A video human body key point detection apparatus, characterized in that the apparatus comprises:
an image extraction module for extracting multiple frames of images to be detected from a video to be detected;
an optical flow feature extraction module for obtaining the optical flow field between the multiple frames of images to be detected, and obtaining a feature map of each image to be detected;
an image enhancement module for fusing the feature maps according to the optical flow field to obtain an enhanced feature map of the image to be detected;
a key point detection module for inputting the enhanced feature map into a preset neural network to obtain the human body key points in the image to be detected.
9. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, realizes the steps of the method of any one of claims 1 to 7.
CN201910276687.4A 2019-04-08 2019-04-08 Video human body key point detection method and device, computer equipment and storage medium Active CN109977912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910276687.4A CN109977912B (en) 2019-04-08 2019-04-08 Video human body key point detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910276687.4A CN109977912B (en) 2019-04-08 2019-04-08 Video human body key point detection method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109977912A (en) 2019-07-05
CN109977912B CN109977912B (en) 2021-04-16

Family

ID=67083370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910276687.4A Active CN109977912B (en) 2019-04-08 2019-04-08 Video human body key point detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109977912B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853074A (en) * 2019-10-09 2020-02-28 天津大学 Video target detection network system for enhancing target by utilizing optical flow
CN111160237A (en) * 2019-12-27 2020-05-15 智车优行科技(北京)有限公司 Head pose estimation method and apparatus, electronic device, and storage medium
CN111914756A (en) * 2020-08-03 2020-11-10 北京环境特性研究所 Video data processing method and device
CN112053327A (en) * 2020-08-18 2020-12-08 南京理工大学 Video target detection method and system, storage medium and server
CN112199978A (en) * 2019-07-08 2021-01-08 北京地平线机器人技术研发有限公司 Video object detection method and device, storage medium and electronic equipment
CN113901909A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Video-based target detection method and device, electronic equipment and storage medium
CN115909508A (en) * 2023-01-06 2023-04-04 浙江大学计算机创新技术研究院 Image key point enhancement detection method under single-person sports scene

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8116624B1 (en) * 2007-01-29 2012-02-14 Cirrex Systems Llc Method and system for evaluating an optical device
CN106529419A (en) * 2016-10-20 2017-03-22 北京航空航天大学 Automatic detection method for significant stack type polymerization object in video
CN108229336A (en) * 2017-12-13 2018-06-29 北京市商汤科技开发有限公司 Video identification and training method and device, electronic equipment, program and medium
CN108242062A (en) * 2017-12-27 2018-07-03 北京纵目安驰智能科技有限公司 Method for tracking target, system, terminal and medium based on depth characteristic stream
CN108776974A (en) * 2018-05-24 2018-11-09 南京行者易智能交通科技有限公司 A kind of real-time modeling method method suitable for public transport scene
CN109117701A (en) * 2018-06-05 2019-01-01 东南大学 Pedestrian's intension recognizing method based on picture scroll product
CN109508643A (en) * 2018-10-19 2019-03-22 北京陌上花科技有限公司 Image processing method and device for pornographic images

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8116624B1 (en) * 2007-01-29 2012-02-14 Cirrex Systems Llc Method and system for evaluating an optical device
CN106529419A (en) * 2016-10-20 2017-03-22 北京航空航天大学 Automatic detection method for significant stack type polymerization object in video
CN108229336A (en) * 2017-12-13 2018-06-29 北京市商汤科技开发有限公司 Video identification and training method and device, electronic equipment, program and medium
CN108242062A (en) * 2017-12-27 2018-07-03 北京纵目安驰智能科技有限公司 Method for tracking target, system, terminal and medium based on depth characteristic stream
CN108776974A (en) * 2018-05-24 2018-11-09 南京行者易智能交通科技有限公司 A kind of real-time modeling method method suitable for public transport scene
CN109117701A (en) * 2018-06-05 2019-01-01 东南大学 Pedestrian's intension recognizing method based on picture scroll product
CN109508643A (en) * 2018-10-19 2019-03-22 北京陌上花科技有限公司 Image processing method and device for pornographic images

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199978A (en) * 2019-07-08 2021-01-08 北京地平线机器人技术研发有限公司 Video object detection method and device, storage medium and electronic equipment
CN110853074A (en) * 2019-10-09 2020-02-28 天津大学 Video target detection network system for enhancing target by utilizing optical flow
CN110853074B (en) * 2019-10-09 2023-06-27 天津大学 Video target detection network system for enhancing targets by utilizing optical flow
CN111160237A (en) * 2019-12-27 2020-05-15 智车优行科技(北京)有限公司 Head pose estimation method and apparatus, electronic device, and storage medium
CN111914756A (en) * 2020-08-03 2020-11-10 北京环境特性研究所 Video data processing method and device
CN112053327A (en) * 2020-08-18 2020-12-08 南京理工大学 Video target detection method and system, storage medium and server
CN112053327B (en) * 2020-08-18 2022-08-23 南京理工大学 Video target detection method and system, storage medium and server
CN113901909A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Video-based target detection method and device, electronic equipment and storage medium
CN113901909B (en) * 2021-09-30 2023-10-27 北京百度网讯科技有限公司 Video-based target detection method and device, electronic equipment and storage medium
CN115909508A (en) * 2023-01-06 2023-04-04 浙江大学计算机创新技术研究院 Image key point enhancement detection method under single-person sports scene

Also Published As

Publication number Publication date
CN109977912B (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN109977912A (en) Video human body key point detection method, apparatus, computer equipment and storage medium
Chen et al. Learning spatial attention for face super-resolution
CN112733797B (en) Method, device and equipment for correcting sight of face image and storage medium
WO2020134818A1 (en) Image processing method and related product
Nasrollahi et al. Extracting a good quality frontal face image from a low-resolution video sequence
CN111160164A (en) Action recognition method based on human body skeleton and image fusion
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
WO2020233427A1 (en) Method and apparatus for determining features of target
CN112580521B (en) Multi-feature true and false video detection method based on MAML (maximum likelihood markup language) element learning algorithm
Zhou et al. A lightweight hand gesture recognition in complex backgrounds
KR102551835B1 (en) Active interaction method, device, electronic equipment and readable storage medium
CN110853039B (en) Sketch image segmentation method, system and device for multi-data fusion and storage medium
CN111914756A (en) Video data processing method and device
Vieriu et al. On HMM static hand gesture recognition
CN112712019A (en) Three-dimensional human body posture estimation method based on graph convolution network
CN116092178A (en) Gesture recognition and tracking method and system for mobile terminal
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN110021036A (en) Infrared target detection method, apparatus, computer equipment and storage medium
CN116309983A (en) Training method and generating method and device of virtual character model and electronic equipment
CN109492755B (en) Image processing method, image processing apparatus, and computer-readable storage medium
CN111476868B (en) Animation generation model training and animation generation method and device based on deep learning
Xiong et al. Extraction of hand gestures with adaptive skin color models and its applications to meeting analysis
Qian et al. Multi-Scale tiny region gesture recognition towards 3D object manipulation in industrial design
TWI734297B (en) Multi-task object recognition system sharing multi-range features
KR102591082B1 (en) Method and apparatus for creating deep learning-based synthetic video contents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant