CN109345569A

CN109345569A - Human movement capture system based on multi-view image collection

Info

Publication number: CN109345569A
Application number: CN201811277128.7A
Authority: CN
Inventors: 黄佳维
Original assignee: Anhui Void Space Information Technology Co Ltd
Current assignee: Anhui Void Space Information Technology Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-02-15

Abstract

The invention discloses a human movement capture system based on multi-view image collection, comprising a control terminal. The control terminal is connected to a recording module, which contains multiple cameras; the cameras photograph human motion, and the large number of projected images obtained under different viewing angles serve as samples. The recording module is connected to a CNN module, which comprises a sample input unit and a sample output unit: the sample input unit handles the input of samples and the sample output unit handles the output of samples. A CNN is trained on the large number of samples to obtain an inference network. The CNN module is connected to an application module, in whose scene multiple fixed cameras photograph the movement of the human body. The invention captures human motion effectively and avoids problems such as complicated equipment and a restricted range of human motion; the equipment is simple, the computation time is short, the system is easy to use, and the accuracy of the result is relatively high.

Description

Human movement capture system based on multi-view image collection

Technical field

The present invention relates to the technical field of human movement capture systems, and more particularly to a human movement capture system based on multi-view image collection.

Background technique

A motion capture system is equipment for accurately measuring the motion of a moving object in three-dimensional space. Based on the principles of computer graphics, several video capture devices arranged in space record the motion of the moving object (tracker) in the form of images; a computer then processes the image data to obtain the space coordinates (X, Y, Z) of the different objects (trackers) at each unit of measured time. The current mainstream methods are optical, mechanical, electromagnetic and acoustic motion capture, but all of them suffer from drawbacks: the systems are expensive, processing takes a long time, they are inconvenient to use, and they place strict requirements on the environment.

Summary of the invention

The purpose of the present invention is to overcome the disadvantages of the prior art by proposing a human movement capture system based on multi-view image collection.

To achieve the above purpose, the present invention adopts the following technical solution:

A human movement capture system based on multi-view image collection comprises a control terminal connected to a recording module. The recording module contains multiple cameras, which photograph human motion; the large number of projected images obtained under different viewing angles serve as samples. The recording module is connected to a CNN module, which comprises a sample input unit and a sample output unit: the sample input unit handles the input of samples and the sample output unit handles the output of samples. A CNN is trained on the large number of samples to obtain an inference network. The CNN module is connected to an application module, in whose scene multiple fixed cameras photograph the movement of the human body; the captured images are fed into the inference network obtained in the CNN module to obtain the projected position C of each key node under the different viewing angles. The application module is connected to a realization module. In the realization module, with the camera positions and the projected positions of the spatial point in the different cameras known, iterative calculation by the least-squares method yields the spatial position (X, Y, Z) of the human motion key nodes.

Preferably, the recording module comprises a shooting unit, a storage unit and a marking unit: the shooting unit photographs human motion with cameras, the storage unit stores the projected images taken by the cameras, and the marking unit labels the human body key nodes in the stored images.
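
For illustration only, one sample produced by the shooting, storage and marking units could be represented as in the following Python sketch. The source does not specify any data layout; the class name `AnnotatedSample`, its fields, the helper `group_by_frame`, the file paths and the key-node name `left_elbow` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AnnotatedSample:
    """One stored projected image and its labelled key nodes (hypothetical layout)."""
    camera_id: int      # which camera took the image (shooting unit)
    frame_index: int    # position of the frame within the recording
    image_path: str     # where the image was saved (storage unit)
    # key-node name -> (x, y) pixel position in the image (marking unit)
    key_nodes: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def label(self, name: str, x: float, y: float) -> None:
        """Record the projected position of one key node in this image."""
        self.key_nodes[name] = (x, y)


def group_by_frame(samples: List[AnnotatedSample]) -> Dict[int, List[AnnotatedSample]]:
    """Collect the views of each frame so that multi-view samples stay together."""
    frames: Dict[int, List[AnnotatedSample]] = {}
    for sample in samples:
        frames.setdefault(sample.frame_index, []).append(sample)
    return frames


# Two views of the same frame, each with one labelled key node (made-up values).
s0 = AnnotatedSample(camera_id=0, frame_index=0, image_path="cam0/000000.png")
s0.label("left_elbow", 312.0, 240.5)
s1 = AnnotatedSample(camera_id=1, frame_index=0, image_path="cam1/000000.png")
s1.label("left_elbow", 298.5, 251.0)
frames = group_by_frame([s0, s1])
```

Grouping the views by frame keeps together the projections of the same pose under different viewing angles, which is the form in which they are later needed to recover a spatial position.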

Preferably, the CNN module is a convolutional neural network whose input samples are the images taken by the cameras in the recording module and whose output samples are the projected positions of the human skeleton key nodes in the images.

Preferably, the application module has 3-6 cameras, fixed at different viewing angles, which photograph human motion to obtain the corresponding key-point projected positions.

Preferably, the realization module calculates the spatial position P(X, Y, Z) of a key node from its projected positions C(X, Y) under the different viewing angles.
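
The relation between a spatial position P(X, Y, Z) and one of its projected positions C(X, Y) can be sketched in Python as follows. The source does not state the camera model; this sketch assumes a standard pinhole camera described by a 3x4 projection matrix, and the function name `project`, the matrix `T_front` and all numeric values are hypothetical.

```python
from typing import Sequence, Tuple

Matrix3x4 = Sequence[Sequence[float]]


def project(T: Matrix3x4, P: Sequence[float]) -> Tuple[float, float]:
    """Project the space point P(X, Y, Z) with the 3x4 matrix T (assumed pinhole model)."""
    X, Y, Z = P
    # Homogeneous image coordinates: each row of T applied to (X, Y, Z, 1).
    u, v, w = (row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in T)
    if w == 0:
        raise ValueError("point lies in the principal plane of the camera")
    return (u / w, v / w)


# Hypothetical camera at the origin looking along +Z, with a focal length of
# 500 pixels and principal point (320, 240).
T_front = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

C = project(T_front, (0.2, -0.1, 2.0))
print(C)  # (370.0, 215.0)
```

A single projection loses depth information, which is why projections from several cameras at different viewing angles are needed to recover P.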

Preferably, the units in the recording module, CNN module, application module and realization module transmit data by wired or wireless means.

Compared with the prior art, the beneficial effects of the present invention are as follows:

Human motion is photographed by multiple (3-6) cameras to obtain image information under different viewing angles; a convolutional neural network is trained on a large number of samples, and the trained network puts the images captured by the cameras into one-to-one correspondence with the corresponding human skeleton key nodes; from the key-node projection information obtained under the different viewing angles, the spatial position (X, Y, Z) of the human motion key nodes is obtained by least-squares iteration. Working from the captured image information, the system effectively avoids problems such as complicated equipment and a restricted range of human motion; the equipment is simple and easy to use, and the accuracy of the result is relatively high.

Detailed description of the invention

Fig. 1 is a schematic diagram of the system structure of the human movement capture system based on multi-view image collection proposed by the present invention;

Fig. 2 is a schematic diagram of the system structure of the recording module of the human movement capture system based on multi-view image collection proposed by the present invention;

Fig. 3 is a schematic diagram of the system structure of the CNN module of the human movement capture system based on multi-view image collection proposed by the present invention.

Specific embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.

Referring to Figs. 1-3, the human movement capture system based on multi-view image collection comprises a control terminal connected to a recording module. The recording module comprises a shooting unit, a storage unit and a marking unit: the shooting unit photographs human motion with cameras, the storage unit stores the projected images taken by the cameras, and the marking unit labels the human body key nodes in the stored images. The recording module contains multiple cameras, which photograph human motion; the large number of projected images obtained under different viewing angles serve as samples.

The recording module is connected to a CNN module. The CNN module is a convolutional neural network whose input samples are the images taken by the cameras in the recording module and whose output samples are the projected positions of the human skeleton key nodes in the images. The CNN module comprises a sample input unit, which handles the input of samples, and a sample output unit, which handles the output of samples. The CNN is trained on the large number of samples to obtain an inference network.

The CNN module is connected to an application module. The application module has 3-6 cameras, fixed at different viewing angles, which photograph human motion to obtain the projected position of each single key point. In the scene of the application module, the multiple fixed cameras photograph the movement of the human body, and the captured images are fed into the inference network obtained in the CNN module to obtain the projected position C of each key node under the different viewing angles.

The application module is connected to a realization module, which calculates the spatial position P(X, Y, Z) of a key node from its projected positions C(X, Y) under the different viewing angles. In the realization module, with the camera positions and the projected positions of the spatial point in the different cameras known, iterative calculation by the least-squares method yields the spatial position (X, Y, Z) of the human motion key nodes. The units in the recording module, CNN module, application module and realization module transmit data by wired or wireless means.

Human motion is photographed by multiple (3-6) cameras to obtain image information under different viewing angles; a convolutional neural network is trained on a large number of samples, and the trained network puts the images captured by the cameras into one-to-one correspondence with the corresponding human skeleton key nodes; from the key-node projection information obtained under the different viewing angles, the spatial position (X, Y, Z) of the human motion key nodes is obtained by least-squares iteration. Working from the captured image information, the system effectively avoids problems such as complicated equipment and a restricted range of human motion; the equipment is simple and easy to use, and the accuracy of the result is relatively high.

Embodiment: human motion is photographed by multiple (3-6) cameras to obtain image information under different viewing angles; a convolutional neural network is trained on a large number of samples, and the trained network puts the images captured by the cameras into one-to-one correspondence with the corresponding human skeleton key nodes; from the key-node projection information obtained under the different viewing angles, the spatial position (X, Y, Z) of the human motion key nodes is obtained by least-squares iteration.

Let P be the position of a key node in space, let C be the projected position of that node in a camera, and let T be the transformation matrix of a spatial point for the camera at a given viewing angle. The projection then obeys the functional relation C = F(T, P). The algorithm is implemented as follows. Assume an initial value Pi for P (a random value). Because the position of each camera is fixed, the transformation F is determined, so Ci = F(T, Pi). The projected coordinates obtained in this way differ from the actual projected coordinates by an error E = |Ci - C|. Using the least-squares method, Gauss-Newton iteration reduces E so that Pi approaches the true value P; the final Pi is the required space coordinate P(X, Y, Z).
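
The iteration described in this embodiment can be sketched in Python as follows. This is a minimal sketch rather than the implementation of the invention: it assumes that F is a pinhole projection with a 3x4 matrix T per camera, it sums the squared error over all cameras, and it starts from a fixed initial guess in front of the cameras instead of a random value, because plain Gauss-Newton may fail to converge from a poor starting point. The names `homogeneous`, `project`, `solve3`, `triangulate` and `camera`, and all numeric values, are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def homogeneous(T: Matrix, P: Sequence[float]) -> Tuple[float, float, float]:
    """Apply each row of the 3x4 matrix T to (X, Y, Z, 1)."""
    X, Y, Z = P
    u, v, w = (r[0] * X + r[1] * Y + r[2] * Z + r[3] for r in T)
    return u, v, w


def project(T: Matrix, P: Sequence[float]) -> Tuple[float, float]:
    """C = F(T, P) under the assumed pinhole model."""
    u, v, w = homogeneous(T, P)
    return u / w, v / w


def solve3(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the 3x3 system A x = b by Gaussian elimination with partial pivoting."""
    M = [list(A[i]) + [b[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular normal equations; the views may be degenerate")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))) / M[i][i]
    return x


def triangulate(cameras, observations, P0, iterations=20, tol=1e-10):
    """Gauss-Newton: refine Pi so that the projections Ci approach the observed C."""
    P = list(P0)
    for _ in range(iterations):
        JTJ = [[0.0] * 3 for _ in range(3)]  # accumulates J^T J over all cameras
        JTr = [0.0] * 3                      # accumulates J^T r over all cameras
        for T, C in zip(cameras, observations):
            u, v, w = homogeneous(T, P)
            residual = (u / w - C[0], v / w - C[1])  # Ci - C for this camera
            # Jacobian of the projection with respect to (X, Y, Z).
            J = [
                [(T[0][j] * w - u * T[2][j]) / (w * w) for j in range(3)],
                [(T[1][j] * w - v * T[2][j]) / (w * w) for j in range(3)],
            ]
            for a in range(2):
                for i in range(3):
                    JTr[i] += J[a][i] * residual[a]
                    for j in range(3):
                        JTJ[i][j] += J[a][i] * J[a][j]
        delta = solve3(JTJ, [-g for g in JTr])  # normal equations of the linearized problem
        P = [p + d for p, d in zip(P, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return P


def camera(cx: float, cy: float) -> List[List[float]]:
    """Hypothetical camera centred at (cx, cy, 0), looking along +Z, focal length 500 px."""
    return [
        [500.0, 0.0, 320.0, -500.0 * cx],
        [0.0, 500.0, 240.0, -500.0 * cy],
        [0.0, 0.0, 1.0, 0.0],
    ]


cams = [camera(-1.0, 0.0), camera(1.0, 0.0), camera(0.0, 1.0)]
P_true = (0.2, -0.1, 2.0)                      # made-up key-node position
obs = [project(T, P_true) for T in cams]       # stands in for the CNN outputs C
P_est = triangulate(cams, obs, (0.0, 0.0, 1.0))
```

In this noise-free example the estimate `P_est` converges to `P_true`. With real CNN outputs the projected positions carry error, and the least-squares solution is the point whose projections best agree with all cameras at once.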

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A human movement capture system based on multi-view image collection, comprising a control terminal, characterized in that the control terminal is connected to a recording module; the recording module contains multiple cameras, which photograph human motion, and the large number of projected images obtained under different viewing angles serve as samples; the recording module is connected to a CNN module, which comprises a sample input unit and a sample output unit, the sample input unit handling the input of samples and the sample output unit handling the output of samples; a CNN is trained on the large number of samples to obtain an inference network; the CNN module is connected to an application module, in whose scene multiple fixed cameras photograph the movement of the human body, and the captured images are fed into the inference network obtained in the CNN module to obtain the projected position C of each key node under the different viewing angles; the application module is connected to a realization module, in which, with the camera positions and the projected positions of the spatial point in the different cameras known, iterative calculation by the least-squares method yields the spatial position (X, Y, Z) of the human motion key nodes.

2. The human movement capture system based on multi-view image collection according to claim 1, characterized in that the recording module comprises a shooting unit, a storage unit and a marking unit; the shooting unit photographs human motion with cameras, the storage unit stores the projected images taken by the cameras, and the marking unit labels the human body key nodes in the stored images.

3. The human movement capture system based on multi-view image collection according to claim 1, characterized in that the CNN module is a convolutional neural network whose input samples are the images taken by the cameras in the recording module and whose output samples are the projected positions of the human skeleton key nodes in the images.

4. The human movement capture system based on multi-view image collection according to claim 1, characterized in that the application module has 3-6 cameras, fixed at different viewing angles, which photograph human motion to obtain the projected position of each single key point.

5. The human movement capture system based on multi-view image collection according to claim 1, characterized in that the realization module calculates the spatial position P(X, Y, Z) of a key node from its projected positions C(X, Y) under the different viewing angles.

6. The human movement capture system based on multi-view image collection according to claim 1, characterized in that the units in the recording module, CNN module, application module and realization module transmit data by wired or wireless means.