CN108573197A - Video action detection method and device - Google Patents

Video action detection method and device

Info

Publication number
CN108573197A
CN108573197A
Authority
CN
China
Prior art keywords
neural network
frame image
video
information
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710146933.5A
Other languages
Chinese (zh)
Inventor
刘春晖
厉扬豪
胡越予
刘家瑛
郭宗明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201710146933.5A
Publication of CN108573197A
Current legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a video action detection method and device. The method includes: selecting a training video from a training set, and obtaining the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image; processing the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image; optimizing the neural network according to the identification information and recognition result and the prediction information and prediction result; repeating the above steps until the neural network converges; and, after the neural network converges, processing the skeleton data of every frame image in a video under test with the neural network to obtain the corresponding identification information and prediction information. The video action detection method and device provided by the present invention can recognize every frame image in a video without manually extracting video clips, improving detection efficiency and accuracy.

Description

Video action detection method and device
Technical field
The present invention relates to computer vision technology, and in particular to a video action detection method and device.
Background technology
The goal of video action detection is, given a video sequence, to identify the segments in which actions occur and their corresponding types. The development of Microsoft's Kinect device has made human joint skeleton data much easier to acquire; the joint skeleton is a more abstract representation of a person and is of great help for action detection and prediction problems.
In the prior art, video action detection is built on action recognition. The task of action recognition is, given a short video clip, to identify its action type. Traditional action recognition methods extract hand-crafted features of the video images, such as histograms of oriented gradients, and classify them. On this basis, it has also been proposed to use the motion trajectories and optical flow of adjacent frames as a new feature, combine them with the traditional features, compress and encode them with Fisher vectors, and then classify. These methods all treat the video as a whole, so only one action type can be recognized per video segment. When a long video contains multiple actions, video clips must be extracted manually and the above methods applied to each clip in turn, which is inefficient and inaccurate.
Summary of the invention
The present invention provides a video action detection method and device to solve the technical problem of inefficient video action recognition in the prior art.
The present invention provides a video action detection method, including:
selecting a training video from a training set, and obtaining the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image;
processing the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image;
optimizing the neural network according to the identification information and recognition result and the prediction information and prediction result;
repeating the above steps until the neural network converges;
after the neural network converges, processing the skeleton data of every frame image in a video under test with the neural network to obtain the corresponding identification information and prediction information.
Further, processing the skeleton data of each frame image with the neural network to obtain the identification information and prediction information of each frame image includes:
inputting the skeleton data of each frame image separately into a feature extraction part of the neural network to obtain corresponding feature information;
inputting the feature information into a multi-task part of the neural network to obtain the identification information and prediction information of each frame image.
Further, optimizing the neural network according to the identification information and recognition result and the prediction information and prediction result includes:
calculating an identification error according to the identification information and the recognition result;
calculating a prediction error according to the prediction information and the prediction result;
obtaining a total error as the weighted sum of the identification error and the prediction error, and back-propagating the neural network parameters using stochastic gradient descent.
Further, after the neural network converges, processing the skeleton data of every frame image in the video under test with the neural network to obtain the corresponding identification information and prediction information includes:
after the neural network converges, obtaining the skeleton data of the video under test;
inputting the skeleton data of each frame image in the video under test into the feature extraction part of the neural network to obtain corresponding feature information;
inputting the feature information corresponding to the video under test into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image;
determining, according to the identification information and the prediction information, the action category corresponding to each frame image in the video and the action end time or the start time of the next action.
Further, obtaining the recognition result and prediction result for the action in each frame image of the training video includes:
playing the training video to a user, and receiving the action category and the action end time or start time of each frame image input by the user according to the training video;
determining the recognition result and prediction result according to the action category and the action end time or start time.
The present invention also provides a video action detection device, including:
an acquisition module, configured to select a training video from a training set and obtain the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image;
a processing module, configured to process the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image;
an optimization module, configured to optimize the neural network according to the identification information and recognition result and the prediction information and prediction result;
a repetition module, configured to judge whether the neural network converges, trigger the acquisition module if the judgment result is no, and trigger a detection module if the judgment result is yes;
the detection module, configured to process, after the neural network converges, the skeleton data of every frame image in a video under test with the neural network to obtain the corresponding identification information and prediction information.
Further, the processing module is specifically configured to:
input the skeleton data of each frame image separately into the feature extraction part of the neural network to obtain corresponding feature information;
input the feature information into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image.
Further, the optimization module is specifically configured to:
calculate an identification error according to the identification information and the recognition result;
calculate a prediction error according to the prediction information and the prediction result;
obtain a total error as the weighted sum of the identification error and the prediction error, and back-propagate the neural network parameters using stochastic gradient descent.
Further, the detection module is specifically configured to:
obtain, after the neural network converges, the skeleton data of the video under test;
input the skeleton data of each frame image in the video under test into the feature extraction part of the neural network to obtain corresponding feature information;
input the feature information corresponding to the video under test into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image;
determine, according to the identification information and the prediction information, the action category corresponding to each frame image in the video and the action end time or the start time of the next action.
Further, the acquisition module is specifically configured to:
select a training video from the training set, and obtain the skeleton data of each frame image in the training video;
play the training video to a user, and receive the action category and the action end time or start time of each frame image input by the user according to the training video;
determine the recognition result and prediction result according to the action category and the action end time or start time.
The video action detection method and device provided by the present invention select a training video from a training set; obtain the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image; process the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image; optimize the neural network according to the identification information and recognition result and the prediction information and prediction result; and repeat the above steps until the neural network converges. After the neural network converges, the skeleton data of every frame image in a video under test is processed with the neural network to obtain the corresponding identification information and prediction information. Each frame image in the video can thus be recognized without manually extracting video clips, improving detection efficiency and accuracy.
Description of the drawings
Fig. 1 is a flowchart of the video action detection method provided by Embodiment One of the present invention;
Fig. 2 is a structural diagram of the video action detection device provided by Embodiment Two of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms used in the embodiments of the present application are for the purpose of describing specific embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present application are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
Depending on the context, the word "if" as used herein may be interpreted as "when", "while", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (the stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a product or system including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the product or system including that element.
Embodiment One
Embodiment One of the present invention provides a video action detection method. Fig. 1 is a flowchart of the video action detection method provided by Embodiment One of the present invention. As shown in Fig. 1, the method in this embodiment may include:
Step 101: select a training video from the training set, and obtain the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image.
Here, the training set may be a set containing multiple training videos, and a training video may be a video used for training the neural network.
This embodiment uses a neural network to recognize the action corresponding to each frame image in a video under test. The specific method can be divided into two parts: a training process and a detection process. Steps 101 to 104 can be regarded as the training process, in which the training videos in the training set are used to optimize the neural network; step 105 can be regarded as the detection process, in which the trained neural network is used to process the video under test and analyze the actions in it.
Specifically, obtaining the recognition result and prediction result for the action in each frame image of the training video may include: playing the training video to a user, and receiving the action category and the action end time or start time of each frame image input by the user according to the training video; and determining the recognition result and prediction result according to the action category and the action end time or start time.
The recognition result of a frame image indicates the action category of the frame image. Action categories may include, but are not limited to: sitting down, standing up, walking, pouring, and raising a hand. The prediction result of a frame image indicates the time at which the action corresponding to the frame image ends, or the start time of the next action. For example, if the action category of a certain frame image is walking and the walking action ends 10 frames later, the prediction result may be 10 frames. The action category and the action end time or start time of each image can be determined and input by the user according to the training video.
The skeleton data of each frame image may include information such as the position and size of each joint of the human skeleton in the image. The joints may include, but are not limited to: head, left shoulder, right shoulder, left wrist, right wrist, and so on. The acquisition of the skeleton data of an image belongs to the prior art and is not described again in this embodiment.
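For concreteness, a per-frame training sample assembled from these inputs might look as follows. This is an illustrative Python sketch only; the field names (joints, action_onehot, frames_to_end), the joint list, and the coordinate values are hypothetical and do not appear in the patent.

# Hypothetical per-frame training sample, assuming M = 4 action categories
# and five tracked joints; none of these names are taken from the patent.
frame_sample = {
    "joints": {                      # skeleton data: (x, y) position per joint
        "head":           (0.51, 0.12),
        "left_shoulder":  (0.44, 0.25),
        "right_shoulder": (0.58, 0.25),
        "left_wrist":     (0.40, 0.47),
        "right_wrist":    (0.62, 0.18),
    },
    "action_onehot": [0, 0, 0, 1],   # recognition result z_t: (sit, stand up, walk, raise hand)
    "frames_to_end": 10,             # prediction result L_t: frames until the current action ends
}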
Step 102: process the skeleton data of each frame image with the neural network to obtain the identification information and prediction information of each frame image.
In this embodiment, the neural network mainly includes a feature extraction part and a multi-task part. Each of the two parts may in turn include several components; for example, the feature extraction part includes long short-term memory (LSTM) neural networks, and the multi-task part includes fully connected neural networks.
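A minimal sketch of such a two-part network is given below, in Python with PyTorch, assuming per-frame skeleton vectors of dimension input_dim and num_classes action categories. The class name, layer sizes, and the plain regression head are illustrative assumptions rather than the patent's exact architecture; in particular, the patent's prediction branch additionally applies the class-level selection described later in step 102.

import torch
import torch.nn as nn

class ActionDetectionNet(nn.Module):
    """Sketch: feature extraction part (LSTM) + multi-task part (FC heads).
    Layer sizes and the plain regression head are assumptions."""
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        # feature extraction part: temporal modelling of the skeleton sequence
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc_feat = nn.Linear(hidden_dim, hidden_dim)
        self.drop = nn.Dropout(p=0.5)        # random parameter loss against overfitting
        # multi-task part: per-frame classification and countdown regression
        self.fc_cls = nn.Linear(hidden_dim, num_classes)
        self.fc_reg = nn.Linear(hidden_dim, 1)

    def forward(self, skeleton_seq):         # skeleton_seq: (batch, N, input_dim)
        h, _ = self.lstm(skeleton_seq)       # intermediate parameters h_1 ... h_N
        g = self.drop(torch.relu(self.fc_feat(h)))   # feature information g_1 ... g_N
        y = torch.softmax(self.fc_cls(g), dim=-1)    # identification information y_t
        p = self.fc_reg(g).squeeze(-1)               # prediction information p_t
        return y, p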
In the feature extraction part of the neural network, the model extracts action features by stacking three layers of a composite network. In each layer, a recurrent neural network models and stores the temporal characteristics of the action, a fully connected layer then processes the features carried by the hidden states together with the joint correlation information, and a mechanism of randomly dropping parameters is introduced afterwards to mitigate overfitting in neural network learning.
In the multi-task part of the neural network, in order to obtain a more accurate action detection result, the training set is pre-processed so that each action is subdivided into three parts: before the action occurs, while the action is occurring, and when the action is about to end. Using the output of the feature extraction part, a fully connected neural network classifies the action type of each frame, so that the recognition and detection tasks are completed at the same time. On the basis of the per-frame classification, the multi-task part handles the action prediction task, which is mainly divided into predicting the start of an action and predicting its end. A selection neural network uses the action classification obtained during detection to screen the parameters of the fully connected neural network at the action-type level, converting the prediction problem into a regression problem of counting down to the action's occurrence.
Processing the skeleton data of each frame image with the neural network to obtain the identification information and prediction information of each frame image may include: inputting the skeleton data of each frame image separately into the feature extraction part of the neural network to obtain corresponding feature information; and inputting the feature information into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image.
Here, the identification information indicates the action category of the image obtained by the neural network computation, and the prediction information indicates the end time of the action, or the start time of the next action, obtained by the neural network computation.
Specifically, suppose the training set contains S training videos, denoted V_0 … V_{S-1}, where each training video may include one or more actions. A training video is randomly selected from the training set; suppose the selected training video is V_i and consists of N frame images, denoted f_1 … f_N.
First, the skeleton data f_1 … f_N of the frame images in the training video is input into the feature extraction part of the neural network to obtain the corresponding feature information. The specific steps are as follows:
Step 1021: input the skeleton data into the long short-term memory (LSTM) neural network (a kind of recurrent neural network), and obtain the corresponding outputs h_1 … h_N using formula (1):
h_1, h_2, …, h_N = LSTM(f_1, f_2, …, f_N)   (1)
In formula (1), f_t is the skeleton data corresponding to the t-th frame image of the video, h_t is the intermediate parameter of the t-th frame image and is an L × 1 vector, and LSTM is the long short-term memory neural network function. The parameters of this function are generated randomly at initialization and are later corrected according to the error of step 103.
Step 1022: input the computed h_1 … h_N into the fully connected neural network, and obtain the corresponding results using formula (2).
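The published text does not reproduce formula (2) itself. Given the dimensions defined just below (h_i and g_i both L × 1 vectors, W an L × L matrix), it presumably takes the form of a standard fully connected layer; the following is a reconstruction under that assumption, not the patent's literal formula:

g_i(j) = Σ_{k=1..L} W_{k,j} · h_i(k),   j = 1, …, L   (2)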
In formula (2), W_{k,j} is the matrix to be solved in the fully connected neural network function, an L × L matrix; its parameters are random numbers at initialization and are later corrected according to the error of step 103. L depends on the vector length of the skeleton information. g_i denotes the feature information of the i-th frame image, and h_i denotes the intermediate parameter of the i-th frame image obtained by formula (1), with i ranging from 1 to N. Both h_i and g_i are L × 1 vectors, so j ranges from 1 to L.
Step 1023: after obtaining g_i(j), randomly discard some of the values of the vector within a preset range to prevent overfitting.
Steps 1021 to 1023 are repeated three times to finally obtain the corresponding feature information g_1 … g_N, where g_i ∈ R^(L×1), i.e. a real vector of length L. The values of L in the three passes are [100, 120, 100] respectively; by varying L from low to high and back to low, the processing goes from low dimension to high dimension and back again, which makes the neural network more robust.
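The three-pass stack can be sketched as follows, again in PyTorch and assuming the dimensions [100, 120, 100] stated above; the class and key names are illustrative:

import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the stacked feature extraction part: three passes of
    LSTM (step 1021) + fully connected layer (step 1022) + dropout (step 1023)."""
    def __init__(self, input_dim, dims=(100, 120, 100), p_drop=0.5):
        super().__init__()
        self.layers = nn.ModuleList()
        prev = input_dim
        for L in dims:                                 # low -> high -> low dimension
            self.layers.append(nn.ModuleDict({
                "lstm": nn.LSTM(prev, L, batch_first=True),
                "fc":   nn.Linear(L, L),
                "drop": nn.Dropout(p_drop),
            }))
            prev = L

    def forward(self, x):                              # x: (batch, N, input_dim)
        for layer in self.layers:
            h, _ = layer["lstm"](x)                    # h_1 ... h_N, formula (1)
            x = layer["drop"](layer["fc"](h))          # g_1 ... g_N, formula (2) + dropout
        return x                                       # final feature information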
Then, after the feature information of each frame image is obtained, the feature information g_1 … g_N is input into the multi-task part of the neural network to obtain the identification information and prediction information of each frame. The specific steps are as follows:
Step 1024: input the feature information g_1 … g_N into the fully connected neural network, and obtain the identification information using formula (3).
In formula (3), W'_{k,j} is the matrix to be solved of the fully connected neural network (an L × M matrix, as the dimensions below imply); its parameters are random numbers at initialization and are later corrected according to the error of step 103. g_t is the feature information of the t-th frame image, an L × 1 vector with L = 100. y_t denotes the identification information of the t-th frame image, i.e. the result vector for judging the frame's action category; y_t is an M × 1 vector, so j in formula (3) ranges from 1 to M, where M is the number of action categories. Each value in the vector expresses the confidence of the corresponding action.
Suppose there are 4 action categories, namely sitting, standing up, walking and raising a hand, so that M = 4. Then y_t = (0, 0.1, 0.2, 0.5) means that the confidences of sitting, standing up, walking and raising a hand are 0, 0.1, 0.2 and 0.5 respectively. The higher the confidence, the more likely the image corresponds to that action; raising a hand has the highest confidence, 0.5, indicating that the action in the image is most likely raising a hand.
Finally, the prediction information is obtained according to the skeleton data and the identification information. The specific steps are as follows:
Step 1025: generate the prediction matrix using formula (4).
In formula (4), W'' is the matrix to be solved of the fully connected neural network, an L × L matrix; its parameters are random numbers at initialization and are later corrected according to the error of step 103. g_t is the feature information of the t-th frame image, and p' is an L × 1 vector.
The data in the vector p'_t are grouped M values at a time and converted into a matrix G_{k,j}, where G is an (L/M) × M matrix.
Step 1026: select from the matrix using the final result y_t obtained in step 1024 and formula (5).
In formula (5), p'' is an (L/M) × 1 vector, and j ranges from 1 to L/M.
Step 1027: obtain the prediction information using formula (6).
In formula (6), W''' is the matrix to be solved of the fully connected neural network, a 1 × (L/M) matrix; its parameters are random numbers at initialization and are later corrected according to the error of step 103. p_t ∈ R is the prediction of the frame's action occurrence; the value represents the length of time from the current frame until the action ends or until the next action occurs.
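Putting steps 1025 to 1027 together, the regression branch can be sketched as below. Reading formula (5) as weighting the columns of G by the class confidences y_t is an interpretation of the text, not the patent's literal computation, and all names and sizes are illustrative:

import torch

def predict_countdown(g_t, W2, y_t, W3):
    """Sketch of steps 1025-1027. Shapes: g_t (L,), W2 (L, L), y_t (M,), W3 (L//M,).
    An interpretation of formulas (4)-(6), not the patent's literal formulas."""
    L, M = W2.shape[0], y_t.shape[0]
    p1 = W2 @ g_t                    # formula (4): prediction vector p', length L
    G = p1.reshape(L // M, M)        # group p' into an (L/M) x M matrix G
    p2 = G @ y_t                     # formula (5): select columns by class confidence -> p''
    return W3 @ p2                   # formula (6): scalar countdown prediction p_t

# usage sketch with the illustrative sizes L = 100, M = 4 (untrained weights)
L, M = 100, 4
g_t = torch.randn(L)
y_t = torch.softmax(torch.randn(M), dim=0)
W2, W3 = torch.randn(L, L), torch.randn(L // M)
print(predict_countdown(g_t, W2, y_t, W3))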
Step 103: optimize the neural network according to the identification information and recognition result and the prediction information and prediction result.
In this embodiment, optimizing the neural network according to the identification information and recognition result and the prediction information and prediction result may include: calculating an identification error according to the identification information and the recognition result; calculating a prediction error according to the prediction information and the prediction result; obtaining a total error as the weighted sum of the identification error and the prediction error; and back-propagating the neural network parameters using stochastic gradient descent.
Specifically, the results obtained in step 102 are compared with the actual results from step 101 to obtain the total error, formula (7):
err = err_cls + λ · err_pred   (7)
The total error is thus a weighted combination of the identification error err_cls and the prediction error err_pred, where λ is a manually chosen coefficient, generally taken as 0.1.
The identification error is calculated by the following formula:
Here, y_t is the identification information of the t-th frame image obtained in step 102, and y_{t,k} denotes the value corresponding to the k-th action category in the identification information. z_t is the recognition result of the t-th frame image obtained in step 101, i.e. the correct prediction vector, and z_{t,k} denotes the value corresponding to the k-th action category in the recognition result, with k ranging from 1 to M. The value corresponding to the correct action is 1 and the values corresponding to the other actions are 0. Suppose there are 4 action categories, namely sitting, standing up, walking and raising a hand: if the user identifies the action in a certain frame image as raising a hand, the recognition result is z_t = (0, 0, 0, 1); if the user identifies the action in another frame image as standing up, the recognition result is z_t = (0, 1, 0, 0).
The prediction error is calculated by the following formula:
Here, p_t is the prediction information obtained in step 102, and L_t is the prediction result obtained in step 101, i.e. the correct prediction information.
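The published text does not reproduce the two error formulas themselves. A standard choice consistent with the variables defined above would be a cross-entropy identification error and a squared prediction error; the following is a reconstruction under that assumption, not the patent's literal formulas:

err_cls = −Σ_{t=1..N} Σ_{k=1..M} z_{t,k} · ln(y_{t,k})

err_pred = Σ_{t=1..N} (p_t − L_t)²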
On the basis of the total error obtained by formula (7), the neural network parameters are back-propagated using stochastic gradient descent.
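A minimal training-step sketch combining the two error terms, assuming the ActionDetectionNet sketched earlier and the reconstructed error forms above; the 1e-8 numerical guard and the calling convention are illustrative:

import torch
import torch.nn.functional as F

def training_step(model, optimizer, skeleton_seq, z, countdown, lam=0.1):
    """One pass of step 103. skeleton_seq: (1, N, D) skeleton data;
    z: (N,) ground-truth class indices; countdown: (N,) frames-to-end.
    The loss forms are assumptions, not the patent's literal formulas."""
    y, p = model(skeleton_seq)                    # identification / prediction information
    # y holds softmax probabilities (see ActionDetectionNet), so take log for NLL
    err_cls = F.nll_loss(torch.log(y.squeeze(0) + 1e-8), z)   # assumed cross-entropy
    err_pred = F.mse_loss(p.squeeze(0), countdown)            # assumed squared error
    total = err_cls + lam * err_pred              # formula (7): weighted sum, lambda ~ 0.1
    optimizer.zero_grad()
    total.backward()                              # back-propagation
    optimizer.step()                              # gradient descent update
    return total.item()

Passing a torch.optim.SGD optimizer makes this update the stochastic-gradient-descent back-propagation described above.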
Step 104: repeat the above steps until the neural network converges.
Specifically, steps 101 to 103 may be repeated. When step 101 is subsequently repeated, a training video that has not yet been processed may be selected from the training set, so that the neural network is better optimized, until the neural network converges.
Step 105: after the neural network converges, process the skeleton data of every frame image in the video under test with the neural network to obtain the corresponding identification information and prediction information.
After the neural network converges, the video under test can be detected with the neural network. Specifically, the skeleton data of the video under test may first be obtained, and the skeleton data of each frame image in the video under test is input into the feature extraction part of the neural network to obtain the corresponding feature information. Then, the feature information corresponding to the video under test may be input into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image; the identification information and prediction information can be computed with the formulas in step 102. Finally, the action category corresponding to each frame image in the video and the action end time or the start time of the next action are determined according to the identification information and the prediction information: the identification information of a frame image contains the confidence of each action category, and the action category with the highest confidence can be regarded as the action in that frame image, while the prediction information directly indicates how many frames, or how much time, remain until the action ends or the next action starts.
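The detection process can then be sketched as follows, again assuming the ActionDetectionNet above; the category names are illustrative:

import torch

@torch.no_grad()
def detect_actions(model, skeleton_seq, category_names):
    """Sketch of step 105: per-frame action category plus countdown.
    skeleton_seq: (1, N, D); category_names: list of M labels (illustrative)."""
    model.eval()
    y, p = model(skeleton_seq)              # identification / prediction information
    classes = y.squeeze(0).argmax(dim=-1)   # highest-confidence category per frame
    return [(category_names[c], frames_left)              # e.g. ("raise hand", 10.0)
            for c, frames_left in zip(classes.tolist(), p.squeeze(0).tolist())]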
On the basis of the technical solution provided in this embodiment, the order of the steps can be adapted. For example, the acquisition in step 101 of the recognition result and prediction result for the action in each frame image of the training video may be adjusted to be executed after step 102.
In practical applications, the user can determine the action category of each frame image and the action end time or the start time of the next action according to the training videos in the training set. With the method in the above steps, the neural network can be optimized according to information such as the actual action categories in the video, so that the identification information and prediction information output by the neural network are close enough to the actual values. The video under test can then be processed with the neural network to obtain its identification information and prediction information. Since each frame in the video is processed, multiple actions in one video can be recognized, and the end time of each action can also be predicted.
The video action detection method provided by this embodiment selects a training video from a training set; obtains the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image; processes the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image; optimizes the neural network according to the identification information and recognition result and the prediction information and prediction result; and repeats the above steps until the neural network converges. After the neural network converges, the skeleton data of every frame image in a video under test is processed with the neural network to obtain the corresponding identification information and prediction information. Each frame image in the video can thus be recognized without manually extracting video clips, improving detection efficiency and accuracy.
Embodiment Two
Embodiment Two of the present invention provides a video action detection device. Fig. 2 is a structural diagram of the video action detection device provided by Embodiment Two of the present invention. As shown in Fig. 2, the device in this embodiment may include:
an acquisition module 201, configured to select a training video from a training set and obtain the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image;
a processing module 202, configured to process the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image;
an optimization module 203, configured to optimize the neural network according to the identification information and recognition result and the prediction information and prediction result;
a repetition module 204, configured to judge whether the neural network converges, trigger the acquisition module 201 if the judgment result is no, and trigger the detection module 205 if the judgment result is yes;
the detection module 205, configured to process, after the neural network converges, the skeleton data of every frame image in a video under test with the neural network to obtain the corresponding identification information and prediction information.
The video action detection device in this embodiment can be used to execute the video action detection method described in Embodiment One. Its implementation principle can be found in Embodiment One and is not repeated here.
The video action detection device provided by this embodiment selects a training video from a training set; obtains the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image; processes the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image; optimizes the neural network according to the identification information and recognition result and the prediction information and prediction result; and repeats the above steps until the neural network converges. After the neural network converges, the skeleton data of every frame image in a video under test is processed with the neural network to obtain the corresponding identification information and prediction information. Each frame image in the video can thus be recognized without manually extracting video clips, improving detection efficiency and accuracy.
Further, the processing module 202 is specifically configured to:
input the skeleton data of each frame image separately into the feature extraction part of the neural network to obtain corresponding feature information;
input the feature information into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image.
Further, the optimization module 203 is specifically configured to:
calculate an identification error according to the identification information and the recognition result;
calculate a prediction error according to the prediction information and the prediction result;
obtain a total error as the weighted sum of the identification error and the prediction error, and back-propagate the neural network parameters using stochastic gradient descent.
Further, the detection module 205 is specifically configured to:
obtain, after the neural network converges, the skeleton data of the video under test;
input the skeleton data of each frame image in the video under test into the feature extraction part of the neural network to obtain corresponding feature information;
input the feature information corresponding to the video under test into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image;
determine, according to the identification information and the prediction information, the action category corresponding to each frame image in the video and the action end time or the start time of the next action.
Further, the acquisition module 201 is specifically configured to:
select a training video from the training set, and obtain the skeleton data of each frame image in the training video;
play the training video to a user, and receive the action category and the action end time or start time of each frame image input by the user according to the training video;
determine the recognition result and prediction result according to the action category and the action end time or start time.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A video action detection method, characterized by comprising:
selecting a training video from a training set, and obtaining the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image;
processing the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image;
optimizing the neural network according to the identification information and recognition result and the prediction information and prediction result;
repeating the above steps until the neural network converges;
after the neural network converges, processing the skeleton data of every frame image in a video under test with the neural network to obtain the corresponding identification information and prediction information.
2. The method according to claim 1, characterized in that processing the skeleton data of each frame image with the neural network to obtain the identification information and prediction information of each frame image comprises:
inputting the skeleton data of each frame image separately into a feature extraction part of the neural network to obtain corresponding feature information;
inputting the feature information into a multi-task part of the neural network to obtain the identification information and prediction information of each frame image.
3. The method according to claim 1, characterized in that optimizing the neural network according to the identification information and recognition result and the prediction information and prediction result comprises:
calculating an identification error according to the identification information and the recognition result;
calculating a prediction error according to the prediction information and the prediction result;
obtaining a total error as the weighted sum of the identification error and the prediction error, and back-propagating the neural network parameters using stochastic gradient descent.
4. The method according to any one of claims 1 to 3, characterized in that, after the neural network converges, processing the skeleton data of every frame image in the video under test with the neural network to obtain the corresponding identification information and prediction information comprises:
after the neural network converges, obtaining the skeleton data of the video under test;
inputting the skeleton data of each frame image in the video under test into the feature extraction part of the neural network to obtain corresponding feature information;
inputting the feature information corresponding to the video under test into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image;
determining, according to the identification information and the prediction information, the action category corresponding to each frame image in the video and the action end time or the start time of the next action.
5. The method according to claim 4, characterized in that obtaining the recognition result and prediction result for the action in each frame image of the training video comprises:
playing the training video to a user, and receiving the action category and the action end time or start time of each frame image input by the user according to the training video;
determining the recognition result and prediction result according to the action category and the action end time or start time.
6. A video action detection device, characterized by comprising:
an acquisition module, configured to select a training video from a training set and obtain the skeleton data of each frame image in the training video and the recognition result and prediction result for the action in each image;
a processing module, configured to process the skeleton data of each frame image with a neural network to obtain the identification information and prediction information of each frame image;
an optimization module, configured to optimize the neural network according to the identification information and recognition result and the prediction information and prediction result;
a repetition module, configured to judge whether the neural network converges, trigger the acquisition module if the judgment result is no, and trigger a detection module if the judgment result is yes;
the detection module, configured to process, after the neural network converges, the skeleton data of every frame image in a video under test with the neural network to obtain the corresponding identification information and prediction information.
7. The device according to claim 6, characterized in that the processing module is specifically configured to:
input the skeleton data of each frame image separately into the feature extraction part of the neural network to obtain corresponding feature information;
input the feature information into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image.
8. The device according to claim 6, characterized in that the optimization module is specifically configured to:
calculate an identification error according to the identification information and the recognition result;
calculate a prediction error according to the prediction information and the prediction result;
obtain a total error as the weighted sum of the identification error and the prediction error, and back-propagate the neural network parameters using stochastic gradient descent.
9. The device according to any one of claims 6 to 8, characterized in that the detection module is specifically configured to:
obtain, after the neural network converges, the skeleton data of the video under test;
input the skeleton data of each frame image in the video under test into the feature extraction part of the neural network to obtain corresponding feature information;
input the feature information corresponding to the video under test into the multi-task part of the neural network to obtain the identification information and prediction information of each frame image;
determine, according to the identification information and the prediction information, the action category corresponding to each frame image in the video and the action end time or the start time of the next action.
10. The device according to claim 9, characterized in that the acquisition module is specifically configured to:
select a training video from the training set, and obtain the skeleton data of each frame image in the training video;
play the training video to a user, and receive the action category and the action end time or start time of each frame image input by the user according to the training video;
determine the recognition result and prediction result according to the action category and the action end time or start time.
CN201710146933.5A 2017-03-13 2017-03-13 Video actions detection method and device Pending CN108573197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710146933.5A CN108573197A (en) 2017-03-13 2017-03-13 Video actions detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710146933.5A CN108573197A (en) 2017-03-13 2017-03-13 Video actions detection method and device

Publications (1)

Publication Number Publication Date
CN108573197A (en) 2018-09-25

Family

ID=63578612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710146933.5A Pending CN108573197A (en) 2017-03-13 2017-03-13 Video actions detection method and device

Country Status (1)

Country Link
CN (1) CN108573197A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264668A (en) * 2019-07-10 2019-09-20 四川长虹电器股份有限公司 Multi-strategy elderly care method based on machine vision technology
CN111382764A (en) * 2018-12-29 2020-07-07 北大方正集团有限公司 Neural network model establishing method and device and computer readable storage medium
CN111382306A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Method and device for inquiring video frame
CN112364695A (en) * 2020-10-13 2021-02-12 杭州城市大数据运营有限公司 Behavior prediction method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082926A (en) * 2007-07-03 2007-12-05 浙江大学 Modeling approach used for trans-media digital city scenic area
CN106022229A (en) * 2016-05-11 2016-10-12 北京航空航天大学 Abnormal behavior identification method in error BP Adaboost network based on video motion information feature extraction and adaptive boost algorithm
CN106471492A (en) * 2014-06-24 2017-03-01 谷歌公司 Indexing actions for resources

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082926A (en) * 2007-07-03 2007-12-05 浙江大学 Modeling approach used for trans-media digital city scenic area
CN106471492A (en) * 2014-06-24 2017-03-01 谷歌公司 Indexing actions for resources
CN106022229A (en) * 2016-05-11 2016-10-12 北京航空航天大学 Abnormal behavior identification method in error BP Adaboost network based on video motion information feature extraction and adaptive boost algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yanghao Li et al.: "Online Human Action Detection using Joint Classification-Regression Recurrent Neural Networks", arXiv:1604.05633v2 [cs.CV], 26 Jul 2016 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382306A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Method and device for inquiring video frame
CN111382306B (en) * 2018-12-28 2023-12-01 杭州海康威视数字技术股份有限公司 Method and device for inquiring video frame
CN111382764A (en) * 2018-12-29 2020-07-07 北大方正集团有限公司 Neural network model establishing method and device and computer readable storage medium
CN111382764B (en) * 2018-12-29 2024-02-13 新方正控股发展有限责任公司 Neural network model building method and device for face recognition or gesture recognition and computer readable storage medium
CN110264668A (en) * 2019-07-10 2019-09-20 四川长虹电器股份有限公司 Multi-strategy elderly care method based on machine vision technology
CN112364695A (en) * 2020-10-13 2021-02-12 杭州城市大数据运营有限公司 Behavior prediction method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Zhang et al. Dynamic hand gesture recognition based on short-term sampling neural networks
CN107403154B (en) Gait recognition method based on dynamic vision sensor
CN110287844B (en) Traffic police gesture recognition method based on convolutional pose machines and long short-term memory network
CN108875708A (en) Behavior analysis method, device, equipment, system and storage medium based on video
JP6517681B2 (en) Image pattern learning apparatus, method and program
CN111881705A (en) Data processing, training and recognition method, device and storage medium
CN110084228A (en) Automatic hazardous behavior identification method based on two-stream convolutional neural networks
Kumar et al. An object detection technique for blind people in real-time using deep neural network
US11417095B2 (en) Image recognition method and apparatus, electronic device, and readable storage medium using an update on body extraction parameter and alignment parameter
CN108629326A (en) Action behavior recognition method and device for a target body
CN107680119A (en) Tracking algorithm based on spatio-temporal context fusing multiple features and a scale filter
CN110070029B (en) Gait recognition method and device
CN108573197A (en) Video action detection method and device
CN104361316B (en) Dimension emotion recognition method based on multi-scale time sequence modeling
CN108304820A (en) Face detection method, device and terminal device
JP2012178036A (en) Similarity evaluation device and method, and similarity evaluation program and storage medium for the same
CN113642431A (en) Training method and device of target detection model, electronic equipment and storage medium
CN111914676A (en) Human body tumbling detection method and device, electronic equipment and storage medium
CN112149616A (en) Figure interaction behavior recognition method based on dynamic information
CN110705428A (en) Facial age recognition system and method based on impulse neural network
CN114495006A (en) Detection method and device for left-behind object and storage medium
CN112926522A (en) Behavior identification method based on skeleton posture and spatio-temporal graph convolutional network
CN112906520A (en) Gesture coding-based action recognition method and device
JP2018005638A (en) Image recognition model learning device, image recognition unit, method and program
Putra et al. The performance of Siamese neural network for face recognition using different activation functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180925