WO2014101219A1

WO2014101219A1 - Action recognition method and television

Info

Publication number: WO2014101219A1
Application number: PCT/CN2012/088111
Authority: WO
Inventors: 葛中峰; 刘丽丽; 刘卫东
Original assignee: 青岛海信信芯科技有限公司
Priority date: 2012-12-31
Filing date: 2012-12-31
Publication date: 2014-07-03

Abstract

Disclosed are an action recognition method and a television, which are applied in an electronic device comprising a camera and having a video playing function. The method comprises: obtaining a first image of a first area comprising a first user at a first moment via the camera; obtaining a second image of the first area comprising the first user at a second moment after the first moment via the camera; on the basis of the first image, obtaining a first barycentric coordinate location of the first user at the first moment; on the basis of the second image, obtaining a second barycentric coordinate location of the first user at the second moment; and at least on the basis of the first barycentric coordinate location and the second barycentric coordinate location, recognizing and determining an operation action of the first user between the first moment and the second moment.

Description

Action recognition method and television

Technical field

The present application belongs to the field of pattern recognition, and specifically relates to a motion recognition method and a television set.

Background

A somatosensory game is a new type of video game that is controlled through changes in body movement. It breaks away from the traditional input method of pressing buttons on a controller, so that players can immerse themselves in the game.

When playing a somatosensory game on the Wii or PS Move platform, the player is still inconvenienced, because some auxiliary equipment must be worn to capture the body movement. When playing a somatosensory game on the Xbox platform with the Kinect peripheral, no controller is needed. However, Kinect relies on a combination of a depth camera and a color camera to recognize human motion.

In the prior art, human body motion can also be recognized by using a 2D camera. In one such method, the character image data, the size, and the predetermined display position of a preset initial image are required as reference data; the currently acquired character image is adjusted against this reference data, and the part of the character image corresponding to the reference data is cut out and displayed at the predetermined position.

In the process of implementing the technical solutions of the embodiments of the present application, it was found that at least the following technical problem exists in the prior art:

The prior art method needs to extract character feature data, and because the feature extraction algorithm is highly complex and computationally expensive, the recognition process is complicated.

Summary of the invention

The embodiment of the invention provides a motion recognition method, which is used to solve the technical problem in the prior art that the program operation complexity is high, the amount of calculation is large, and the recognition process is complicated, and which achieves the technical effect of recognizing actions with a simple algorithm and a small amount of program computation.

A motion recognition method is applied to an electronic device that includes a camera and has a video playback function, the method comprising:

At a first moment, obtaining, by the camera, a first image of a first area including a first user;

At a second moment after the first moment, obtaining, by the camera, a second image of the first area including the first user;

Obtaining, based on the first image, a first centroid coordinate position of the first user at the first moment; and obtaining, based on the second image, a second centroid coordinate position of the first user at the second moment;

Determining, based at least on the first centroid coordinate position and the second centroid coordinate position, an operation action of the first user between the first moment and the second moment.

A television set includes a camera, and the television set includes:

An image obtaining module, configured to obtain, by the camera, a first image of a first area including a first user at a first moment; and to obtain, by the camera, a second image of the first area including the first user at a second moment after the first moment;

A first obtaining module, configured to obtain, based on the first image, a first centroid coordinate position of the first user at the first moment; and to obtain, based on the second image, a second centroid coordinate position of the first user at the second moment;

An identification module, configured to determine, based at least on the first centroid coordinate position and the second centroid coordinate position, an operation action of the first user between the first moment and the second moment.

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

1. In the embodiment of the present invention, a first image of the first user is obtained at a first moment, a second image of the first user is obtained at a second moment after the first moment, a first centroid coordinate position and a second centroid coordinate position are obtained from the first image and the second image respectively, and the operation action of the first user between the first moment and the second moment is recognized based at least on the first centroid coordinate position and the second centroid coordinate position. This solves the technical problem in the prior art that the program operation complexity is high, the amount of calculation is large, and the recognition process is complicated, and achieves the technical effect of recognizing actions with a simple algorithm and a small amount of program computation. For example, the prior art requires the character image data, the size, and the predetermined display position of a preset initial image as reference data, recognizes the currently acquired character image to obtain character feature data, adjusts it in proportion to the reference data, and then cuts out the part of the character image corresponding to the reference data and displays it at a predetermined position. The present invention only needs to determine the centroid position in the captured image of the person and take differences of the centroid coordinates; the action of the person can be recognized from the change in the centroid position, so the algorithm is simple and the amount of program computation is small;

2. Because the action can be recognized using only the data processing capability and the built-in 2D camera of an existing high-end TV, the hardware cost is not increased and the functions of the TV are enriched.

Description of the drawings

FIG. 1 is a flow chart of a motion recognition method according to an embodiment of the present invention;

FIG. 2(a) is a schematic diagram of a foreground image before removing a shadow image according to an embodiment of the present invention;

FIG. 2(b) is a schematic diagram of a foreground image after removing a shadow image according to an embodiment of the present invention;

FIGS. 3(a)-3(k) are schematic diagrams of various actions in the basic action model library according to an embodiment of the present invention; and

FIG. 4 is a structural diagram of a television set according to an embodiment of the present invention.

Detailed description

The embodiment of the invention provides a motion recognition method, which is used to solve the technical problem in the prior art that the program operation complexity is high, the amount of calculation is large, and the recognition process is complicated, and which achieves the technical effect of recognizing actions with a simple algorithm and a small amount of program computation.

To solve the above problems, the general idea of the technical solution in the embodiment of the present invention is as follows:

Obtaining, by the camera, a first image of a first area including a first user at a first moment; obtaining, by the camera, a second image of the first area including the first user at a second moment after the first moment; obtaining, based on the first image, a first centroid coordinate position of the first user at the first moment; obtaining, based on the second image, a second centroid coordinate position of the first user at the second moment; and determining, based at least on the first centroid coordinate position and the second centroid coordinate position, the operation action of the first user between the first moment and the second moment. This solves the technical problem in the prior art that the program operation complexity is high, the amount of calculation is large, and the recognition process is complicated, and achieves the technical effect of recognizing actions with a simple algorithm and a small amount of program computation.

In order to better understand the above technical solutions, the above technical solutions will be described in detail below in conjunction with the drawings and specific embodiments.

An embodiment of the present invention provides a motion recognition method, which is applied to an electronic device that includes a camera and has a video playback function. The electronic device may be an existing high-end television equipped with a 2D camera. Moreover, the television itself has a certain data processing capability: after an image containing the user is obtained through the camera, this capability can be used to analyze the image and thereby identify the action performed by the user.

Referring to FIG. 1, the method includes:

Step 101: When the first user is not in the first area, obtain a first background image of the first area by using the camera.

After the first background image is collected, perform step 102: at a first moment, obtain, by the camera, a first image of the first area including the first user; and at a second moment after the first moment, obtain, by the camera, a second image of the first area including the first user.

After step 102 is completed, perform step 103: obtain, based on the first image, a first centroid coordinate position of the first user at the first moment; and obtain, based on the second image, a second centroid coordinate position of the first user at the second moment.

In a specific implementation process, first, before the user enters the image collection area, the first background image containing only the background area is collected by the camera. Then, at the first moment after the user enters the image collection area, and at the second moment after the first moment, the first image and the second image containing the user and the background area are collected. After any image is obtained through the camera, it must be pre-processed for color restoration to remove the effect of uneven illumination on the true color of the image, thereby restoring the true color of the image.

Usually, the white balance method is used to correct and restore the color. Since the RGB components of white light are equal (R=G=B=255), once white light is corrected, light of other colors is corrected as well. Similarly, since the RGB components of gray light are also equal, gray light can also be corrected to achieve white balance. Specifically, the white balance method uses the total reflection theory: the brightest point on the image is assumed to be a white point, and this white point is used as the reference object for automatically white-balancing the image. In practical engineering applications, the brightest point is defined as the point at which the value of R+G+B on the image is the maximum. Then, based on the three color component values of the brightest point, and using Equation 1:

$$R_A = \frac{R'}{R} R_B, \quad G_A = \frac{G'}{G} G_B, \quad B_A = \frac{B'}{B} B_B$$

the color component values of each pixel in the color-corrected image can be obtained, and a color restored image can be obtained. Here, $R$, $G$, $B$ are the three color component values of the brightest point on the original image; $R'$, $G'$, $B'$ are the three color component values of the brightest point after white balance (usually 255 or slightly smaller); $R_B$, $G_B$, $B_B$ are the three color component values of each pixel on the original image; and $R_A$, $G_A$, $B_A$ are the three color component values of each pixel after white balance.

In the embodiment of the present application, obtaining the first centroid coordinate position of the first user at the first moment based on the first image specifically includes:

Performing color correction processing on the first image, obtaining a first color restored image, and performing color correction processing on the first background image to obtain a second color restored image;

Obtaining a first processed image based on the first color restored image and the second color restored image, wherein the first processed image includes a foreground image composed of pixels having a first color value and a second background image composed of pixels having a second color value different from the first color value;

Obtaining, based on the first processed image, the first centroid coordinate position of the first user at the first moment.

In a specific implementation process, after the first image and the first background image are processed by the color correction pre-processing described above, the first color restored image wb_Pre_rgb and the second color restored image wb_Bg_rgb are respectively obtained. Using these two images and Equation 2:

$$Fg = \begin{cases} 255, & |wb\_Pre_i - wb\_Bg_i| > T,\ i = r, g, b \\ 0, & \text{else} \end{cases}$$

a binarized first processed image can be obtained, where i = r, g, b denote the three channels r, g, and b respectively. Equation 2 shows that as long as the difference between the first color restored image and the second color restored image in any one channel is greater than the threshold T, that portion is the foreground image and is set to white; the rest is the second background image and is set to black, resulting in a first processed image containing only two color values. In the implementation process of the present application, after the second image or any other image is obtained, the same binarization processing must be performed on it to distinguish the foreground portion from the background portion of the image.

Further, obtaining the first centroid coordinate position of the first user at the first moment based on the first processed image specifically includes:

Determining whether there is a shadow image in the foreground image;

When the shadow image exists in the foreground image, removing the shadow image to obtain a de-shadowed first processed image;

Obtaining, based on the de-shadowed first processed image, the first centroid coordinate position of the first user at the first moment.
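
The color correction of Equation 1 and the binarization of Equation 2 can be illustrated with a minimal sketch. It is not the patented implementation: the function names `white_balance` and `binarize`, the nested-list image layout, the target white value (255, 255, 255), and the threshold of 30 are all assumptions chosen for the example.

```python
# Illustrative sketch only. An image is a list of rows; each pixel is an (r, g, b) tuple.

def white_balance(image, target=(255, 255, 255)):
    """Equation 1: scale each channel so the brightest point (max R+G+B) maps to `target`."""
    brightest = max((px for row in image for px in row), key=sum)
    # Assumption: a channel that is zero at the brightest point is left unscaled.
    gains = [t / c if c else 1.0 for t, c in zip(target, brightest)]
    return [[tuple(min(255, round(v * g)) for v, g in zip(px, gains)) for px in row]
            for row in image]


def binarize(frame, background, threshold):
    """Equation 2: white (255) where any channel differs by more than `threshold`, else black (0)."""
    return [[255 if any(abs(a - b) > threshold for a, b in zip(p, q)) else 0
             for p, q in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# A 2x2 background and a frame in which one pixel is covered by the user.
background = [[(100, 100, 100), (100, 100, 100)],
              [(100, 100, 100), (200, 180, 160)]]
frame = [[(100, 100, 100), (30, 40, 50)],
         [(100, 100, 100), (200, 180, 160)]]

wb_bg = white_balance(background)
wb_frame = white_balance(frame)
mask = binarize(wb_frame, wb_bg, threshold=30)
print(mask)
```

Only the pixel that changed between the background and the frame is marked as foreground; the brightest point of each image is mapped to white by the correction.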

In the specific implementation process, the obtained foreground image may contain a shadow image caused by the user's shadow, as shown in FIG. 2(a). The shadow image is a continuous region containing a small number of white points and is separated from the user's image. If the shadow image exists in the foreground image, it is removed to obtain the de-shadowed first processed image, as shown in FIG. 2(b).
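
One way to remove such a small, separate white region is connected-component filtering, sketched below. The source does not state how the shadow is detected, so this technique, the function name `remove_small_regions`, the use of 4-connectivity, and the `min_pixels` parameter are all assumptions made for illustration.

```python
from collections import deque

# Illustrative sketch only. A mask is a list of rows of 0 (black) or 255 (white).

def remove_small_regions(mask, min_pixels):
    """Return a copy of `mask` with 4-connected white regions smaller than `min_pixels` blanked."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in mask]
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected white region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_pixels:
                for y, x in region:
                    cleaned[y][x] = 0
    return cleaned


# The 6-pixel block stands for the user; the isolated pixel stands for a shadow.
shadow_mask = [
    [0, 255, 255, 0, 0],
    [0, 255, 255, 0, 0],
    [0, 255, 255, 0, 255],
    [0, 0, 0, 0, 0],
]
cleaned = remove_small_regions(shadow_mask, min_pixels=3)
```

In practice the size threshold would have to be tuned to the camera resolution and the expected size of the user in the frame.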

Further, obtaining, based on the de-shadowed first processed image, the first centroid coordinate position of the first user at the first moment specifically includes:

Obtaining an abscissa value and an ordinate value of each pixel constituting the foreground image, based on a first coordinate system composed of an X axis and a Y axis in the de-shadowed first processed image;

Obtaining, based on the abscissa value and the ordinate value of each pixel, a first abscissa value and a first ordinate value of the first centroid of the first user at the first moment, thereby obtaining the first centroid coordinate position.

In a specific implementation process, after the de-shadowed first processed image is obtained, a first coordinate system composed of an X axis and a Y axis is established in it. In the embodiment of the present application, the right side of the X axis is the positive direction and the upper side of the Y axis is the positive direction. Based on this coordinate system, the abscissa and ordinate values of each pixel in the first processed image can be obtained. Furthermore, the first abscissa value $x_0$ and the first ordinate value $y_0$ of the first centroid can be obtained with Equation 3:

$$x_0 = \frac{1}{N}\sum_{i=1}^{N} x_i, \quad y_0 = \frac{1}{N}\sum_{i=1}^{N} y_i$$

where $x_i$ and $y_i$ are the abscissa and ordinate values of the respective pixels in the foreground image of the first processed image, and $N$ is the total number of pixels in the foreground image. In the implementation process of the present application, the centroid coordinates $(x_0, y_0)$ of the foreground image of each frame can be obtained by the above method.

After step 103 is completed, step 104 is performed: determining, based at least on the first centroid coordinate position and the second centroid coordinate position, the operation action of the first user between the first moment and the second moment.
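
The centroid calculation described above amounts to averaging the coordinates of the white pixels. The sketch below assumes the origin is at the bottom-left corner of the image (the source only fixes the axis directions), and the function name `centroid` is hypothetical.

```python
# Illustrative sketch only. A mask is a list of rows of 0 (black) or 255 (white).

def centroid(mask):
    """Return (x0, y0), the mean coordinates of the white pixels, with X rightward and Y upward."""
    height = len(mask)
    xs, ys = [], []
    for row_index, row in enumerate(mask):
        for col_index, value in enumerate(row):
            if value:
                xs.append(col_index)
                # Row 0 is the top of the image, so flip the index to make Y point upward.
                ys.append(height - 1 - row_index)
    if not xs:
        return None  # no foreground pixels in this frame
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


user_mask = [
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
x0, y0 = centroid(user_mask)
print(x0, y0)
```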

In the implementation of the present application, step 104 specifically includes:

Obtaining a first difference by subtracting the first abscissa value of the first centroid from the second abscissa value of the second centroid;

Determining whether the first difference is greater than a first threshold;

When the first difference is greater than the first threshold, determining that the operation action of the first user between the first moment and the second moment is a right movement;

When the first difference is not greater than the first threshold, determining whether the first difference is less than a second threshold; when the first difference is less than the second threshold, determining that the operation action of the first user between the first moment and the second moment is a left movement.

Obtaining a second difference by subtracting the first ordinate value of the first centroid from the second ordinate value of the second centroid;

Determining whether the second difference is greater than a third threshold;

When the second difference is greater than the third threshold, determining that the operation action of the first user between the first moment and the second moment is a jump action;

When the second difference is not greater than the third threshold, determining whether the second difference is less than a fourth threshold; when the second difference is less than the fourth threshold, determining that the operation action of the first user between the first moment and the second moment is a squat action.
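
The threshold comparisons of step 104 can be sketched as follows. The function name `classify_step` and the threshold values are assumptions; in particular, the second and fourth thresholds are taken to be negative here, which matches the later use of $-T_{lr}$ and $-T_s$ but is not stated at this point in the text.

```python
# Illustrative sketch only. Centroids are (x, y) tuples with X rightward and Y upward.

def classify_step(first, second, t1, t2, t3, t4):
    """Return the list of actions suggested by two consecutive centroid positions."""
    actions = []
    dx = second[0] - first[0]  # first difference (abscissa)
    dy = second[1] - first[1]  # second difference (ordinate)
    if dx > t1:
        actions.append("right")
    elif dx < t2:
        actions.append("left")
    if dy > t3:
        actions.append("jump")
    elif dy < t4:
        actions.append("squat")
    return actions


moved_right = classify_step((10, 10), (18, 10), t1=5, t2=-5, t3=4, t4=-4)
squatted = classify_step((10, 10), (10, 3), t1=5, t2=-5, t3=4, t4=-4)
stayed = classify_step((10, 10), (11, 11), t1=5, t2=-5, t3=4, t4=-4)
print(moved_right, squatted, stayed)
```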

In the embodiment of the present application, after step 103, the method further includes:

Taking i from 3 to N in turn, obtaining the i-th abscissa value and the i-th ordinate value of the i-th centroid of the first user at the i-th moment after the second moment, where N is an integer greater than or equal to 4.

In the specific implementation process, the camera typically acquires 15 to 25 frames per second. The sampling rate of a standard camera is 25 frames/second, and that of an industrial camera can reach 60 frames/second, 200 frames/second, or even higher; on a television, however, the PAL system uses 25 frames per second and the NTSC system uses 30 frames per second. When the first frame is obtained, initialization is performed first: the centroid coordinates of the foreground image in the first frame are assigned simultaneously to three preset parameters, namely the reference frame centroid coordinates reference_frame $(r\_x_0, r\_y_0)$, the previous frame centroid coordinates previous_frame $(p\_x_0, p\_y_0)$, and the current frame centroid coordinates current_frame $(c\_x_0, c\_y_0)$. At different moments, the three preset parameters change according to the user's action. From these three parameters, the following can be obtained: the difference between the current frame centroid ordinate and the reference centroid ordinate, $dy_{cr} = c\_y_0 - r\_y_0$; the difference between the current frame centroid abscissa and the reference centroid abscissa, $dx_{cr} = c\_x_0 - r\_x_0$; the difference between the current frame centroid ordinate and the previous frame centroid ordinate, $dy_{cp} = c\_y_0 - p\_y_0$; and the difference between the current frame centroid abscissa and the previous frame centroid abscissa, $dx_{cp} = c\_x_0 - p\_x_0$.
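
The three parameters and the four differences derived from them can be kept in a small record, sketched below. The class name `CentroidTracker`, its method names, and the rule that the previous-frame value becomes the old current-frame value on every new frame are assumptions; the source only says that the parameters change according to the user's action.

```python
from dataclasses import dataclass

# Illustrative sketch only. Centroids are (x, y) tuples.

@dataclass
class CentroidTracker:
    reference: tuple  # reference_frame (r_x0, r_y0)
    previous: tuple   # previous_frame (p_x0, p_y0)
    current: tuple    # current_frame (c_x0, c_y0)

    @classmethod
    def from_first_frame(cls, centroid):
        # Initialization: all three parameters start at the first frame's centroid.
        return cls(centroid, centroid, centroid)

    def update(self, centroid):
        # Assumed update rule: shift current into previous, then store the new centroid.
        self.previous = self.current
        self.current = centroid

    def differences(self):
        return {
            "dx_cr": self.current[0] - self.reference[0],
            "dy_cr": self.current[1] - self.reference[1],
            "dx_cp": self.current[0] - self.previous[0],
            "dy_cp": self.current[1] - self.previous[1],
        }


tracker = CentroidTracker.from_first_frame((10, 20))
tracker.update((13, 21))
tracker.update((17, 19))
diffs = tracker.differences()
print(diffs)
```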

In the embodiment of the present application, when the first difference is greater than the first threshold, determining that the operation action of the first user between the first moment and the second moment is a right movement specifically includes:

When the first difference is greater than the first threshold, determining that the operation action of the first user between the first moment and the second moment has a right-shift trend;

Determining, based on the second abscissa value and at least one of the i-th abscissa values, whether the operation action has a right-shift end flag;

When the right-shift end flag is present, determining that the operation action is a right movement; or

When the first difference is less than the second threshold, determining that the operation action of the first user between the first moment and the second moment is a left movement specifically includes:

When the first difference is less than the second threshold, determining that the operation action of the first user between the first moment and the second moment has a left-shift trend;

Determining, based on the second abscissa value and at least one of the i-th abscissa values, whether the operation action has a left-shift end flag;

When the left-shift end flag is present, determining that the operation action is a left movement.

In the specific implementation process, when the first centroid coordinate position of the first moment is assigned to the reference centroid coordinates, then based on the first centroid coordinate position and the second centroid coordinate position, if $dx_{cr} > T_{lr}$, it is determined that the first user's operation action has a right-shift trend; if $dx_{cr} < -T_{lr}$, it is determined that the first user's operation action has a left-shift trend. When the operation action of the first user has a right-shift trend, it is determined whether there is a right-shift end flag; when the right-shift end flag is present, the first user's operation action is a right movement. The present application provides only two implementations of the right-shift end flag, and those skilled in the art may also use other methods as the flag for the end of the right shift.

Manner 1: When the operation action has a right-shift trend, that is, $dx_{cr} > T_{lr}$, and the sign of $dx_{cp}$ changes from positive to negative, it is judged whether $|dx_{cp}|$ is greater than the threshold $T_n$. If $|dx_{cp}|$ is not greater than $T_n$, the right shift has not ended, and the sign change of $dx_{cp}$ is only interference caused by the user swaying left and right. If $|dx_{cp}|$ is greater than $T_n$, a right-shift end flag is present, and it is determined that the operation action of the first user between the first moment and the second moment is a right movement.

Manner 2: When the operation action has a right-shift trend, that is, $dx_{cr} > T_{lr}$, and the sign of $dx_{cp}$ remains positive, it is determined whether $|dx_{cp}|$ is less than 2 over a preset number of consecutive frames, for example 5 consecutive frames or 6 consecutive frames. If $|dx_{cp}|$ is less than 2 throughout, a right-shift end flag is present, and it is determined that the operation action of the first user between the first moment and the second moment is a right movement.

In addition, after it is determined that the operation action between the first moment and the second moment is a right movement, the centroid coordinates of the current frame are assigned to the reference frame centroid coordinates, and the operation action between subsequent moments is then determined.

In the specific implementation process, the process of determining that the operation action is a left movement is exactly the opposite of that for the right movement. A person of ordinary skill in the art can derive it from the right-movement process, so it is not repeated here.
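
The two right-shift end flags can be sketched as a single pass over the per-frame centroid abscissas. The function name `detect_right_move`, the parameter names, and the choice to treat a zero step as "not negative" in Manner 2 are assumptions; the first element of the list plays the role of the reference frame.

```python
# Illustrative sketch only. `xs` holds one centroid abscissa per frame; xs[0] is the reference.

def detect_right_move(xs, t_lr, t_n, still_limit=2, still_frames=5):
    """Return the frame index at which a right movement is confirmed, or None."""
    reference = xs[0]
    still_run = 0      # consecutive near-stationary frames (Manner 2)
    prev_dx_cp = 0
    for i in range(1, len(xs)):
        dx_cr = xs[i] - reference
        dx_cp = xs[i] - xs[i - 1]
        if dx_cr > t_lr:  # right-shift trend
            # Manner 1: step sign flips from positive to negative by more than t_n.
            if prev_dx_cp > 0 and dx_cp < 0 and abs(dx_cp) > t_n:
                return i
            # Manner 2: small non-negative steps for `still_frames` consecutive frames.
            if dx_cp >= 0 and abs(dx_cp) < still_limit:
                still_run += 1
                if still_run >= still_frames:
                    return i
            else:
                still_run = 0
        else:
            still_run = 0
        prev_dx_cp = dx_cp
    return None


stopped = detect_right_move([0, 5, 10, 15, 16, 17, 18, 19, 20], t_lr=8, t_n=3)
reversed_back = detect_right_move([0, 6, 12, 7], t_lr=5, t_n=3)
jitter_only = detect_right_move([0, 6, 12, 11, 12, 13], t_lr=5, t_n=3)
print(stopped, reversed_back, jitter_only)
```

A left-movement detector would mirror this logic with the signs reversed, as the text notes.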

In the embodiment of the present application, when the second difference is greater than the third threshold, determining that the operation action of the first user between the first moment and the second moment is a jump action specifically includes:

When the second difference is greater than the third threshold, determining that the operation action of the first user between the first moment and the second moment has a jump trend;

Determining, based on the second ordinate value and at least one of the i-th ordinate values, whether the operation action has a jump end flag;

When the jump end flag is present, determining that the operation action is a jump action; or

When the second difference is less than the fourth threshold, determining that the operation action of the first user between the first moment and the second moment is a squat action specifically includes:

When the second difference is less than the fourth threshold, determining that the operation action of the first user between the first moment and the second moment has a squat trend;

Determining, based on the second ordinate value and at least one of the i-th ordinate values, whether the operation action has a squat end flag;

When the squat end flag is present, determining that the operation action is a squat action.

In the specific implementation process, when the first centroid coordinate position of the first moment is assigned to the reference centroid coordinate position, then based on the first centroid coordinate position and the second centroid coordinate position, if $dy_{cr} > T_j$, it is determined that the operation action of the first user has a jump trend; if $dy_{cr} < -T_s$, it is determined that the operation action of the first user has a squat trend. When the operation action of the first user has a jump trend, it is determined whether there is a jump end flag; when the jump end flag is present, the operation action of the first user is a jump action. The jump end flag is determined as follows: within a certain period of time, the sign of $dy_{cp}$ is positive in the first half of the period and negative in the second half, and finally $dy_{cr} < T_j$. If this is the case, a jump end flag is present, and it is determined that the operation action of the first user between the first moment and the second moment is a jump action.

In addition, after it is determined that the operation action between the first moment and the second moment is a jump action, the reference coordinates are not replaced. In the specific implementation process, the process of determining that the operation action is a squat action is exactly the opposite of that for the jump action. A person of ordinary skill in the art can derive it from the jump-action process, so it is not repeated here.
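
The jump end flag can be sketched as a small state machine over the per-frame centroid ordinates. The function name `detect_jump`, the state names, and the decision to confirm the jump on the first frame where $dy_{cr}$ drops below $T_j$ during the falling phase are assumptions made for the example.

```python
# Illustrative sketch only. `ys` holds one centroid ordinate per frame; ys[0] is the reference.

def detect_jump(ys, t_j):
    """Return the frame index at which a jump action is confirmed, or None."""
    reference = ys[0]
    state = "idle"
    for i in range(1, len(ys)):
        dy_cr = ys[i] - reference
        dy_cp = ys[i] - ys[i - 1]
        if state == "idle" and dy_cr > t_j and dy_cp > 0:
            state = "rising"    # jump trend: centroid is above the threshold and moving up
        elif state == "rising" and dy_cp < 0:
            state = "falling"   # sign of dy_cp has turned from positive to negative
        if state == "falling" and dy_cr < t_j:
            return i            # jump end flag: centroid is back below the threshold
    return None


jumped = detect_jump([0, 4, 9, 12, 8, 3, 0], t_j=5)
still_up = detect_jump([0, 4, 9, 12, 12, 12], t_j=5)
small_bounce = detect_jump([0, 1, 0, 1], t_j=5)
print(jumped, still_up, small_bounce)
```

Because the reference coordinates are not replaced after a jump, the same reference value could be reused for the next detection.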

Obtaining, based on the first image, a first area of a first body part image of a first body part of the first user at the first moment;

Obtaining, based on the second image, a second area of the first body part image at the second moment.

In a specific implementation, a body part that does not change while the user moves, for example the face, is selected as the first body part. After the first image and the second image are pre-processed to obtain processed images, the first area and the second area are obtained based on the number of pixel points in the foreground image of the processed image: the more pixel points, the larger the area; conversely, the fewer pixel points, the smaller the area.

In the embodiment of the present application, determining, based at least on the first centroid coordinate position and the second centroid coordinate position, the operation action of the first user between the first moment and the second moment is specifically:

Determining, based on the first centroid coordinate position, the second centroid coordinate position, the first area, and the second area, the operation action of the first user between the first moment and the second moment.

In a specific implementation process, after it is determined from the first centroid coordinate position and the second centroid coordinate position that the operation action of the first user between the first moment and the second moment is a jump action, it is determined whether the first area is larger than the second area. If the first area is larger than the second area, the first user has moved away from the camera, and the operation action of the first user is determined to be a backward jump; if the first area is smaller than the second area, the first user has moved closer to the camera, and the operation action of the first user is determined to be a forward jump.

As described above, the first area and the second area of the first body part make it possible to determine whether the user has moved forward or backward, which allows actions to be recognized more accurately, enriches the recognition methods, and allows more actions to be recognized.

In addition, changes in the centroid position and in the area of the body part image can also be used to determine other user actions.
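
The area comparison can be sketched by counting white pixels in the binarized body part image at each moment. The function names `region_area` and `jump_direction`, and the plain "jump" result when the two areas are equal, are assumptions for the example.

```python
# Illustrative sketch only. A mask is a list of rows of 0 (black) or 255 (white).

def region_area(mask):
    """Area of a body part image, measured as its number of white pixels."""
    return sum(1 for row in mask for value in row if value)


def jump_direction(first_area, second_area):
    """Refine a detected jump into a forward or backward jump from the two areas."""
    if first_area > second_area:
        return "backward jump"  # the body part shrank: user moved away from the camera
    if first_area < second_area:
        return "forward jump"   # the body part grew: user moved closer to the camera
    return "jump"               # assumption: no depth change detected


face_first = [[255, 255], [255, 255]]
face_second = [[255, 0], [0, 0]]
direction = jump_direction(region_area(face_first), region_area(face_second))
print(direction)
```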

In the embodiment of the present application, after step 104, the method further includes:

Matching the operation action with a standard action model in the action model library to obtain a first matching result; and when the first matching result indicates that the match is successful, displaying the standard action model corresponding to the operation action.

In the specific implementation process, as shown in FIGS. 3(a)-3(k), the action model library includes various basic models corresponding to actions such as squat, jump, left shift, right shift, and no action. The models are established on the following basis: J. H. Yoo et al. established a human body line graph model based on knowledge of human anatomy. Assuming that the total height of the human body is H, the relative lengths of the various parts of the human body can be obtained, as shown in Table 1 below:

Moreover, based on the user's operation action and the proportions of the parts of the human body, the coordinates of 17 joint points of the human body can be obtained, from which the real skeleton model of the user during the action can be obtained and displayed on the television. At the same time, the data of the skeleton model can be transmitted to the upper-layer application for game development, so that the skeleton model can be used to control the characters on the game screen in a somatosensory game, giving the player a feeling of immersion in the game.

In addition, based on the coordinate position of each pixel in the foreground image, the ordinate value $y_{top}$ of the highest point in the foreground image, the ordinate value $y_{bottom}$ of the lowest point, the leftmost abscissa value $x_{left}$ and the rightmost abscissa value $x_{right}$ can also be obtained. Further, when the user is standing, various operation actions of the user in the standing state can be determined based on $W' = x_{right} - x_{left}$, $H' = y_{top} - y_{bottom}$, the centroid $(x_0, y_0)$, and the standard W and H values.

For example, if $H' - H > T_h$, the user is raising a hand. In this state, each point in the foreground image is traversed and the highest ordinate value corresponding to each abscissa value is recorded, giving an array. If the graph formed by the array is bimodal, the action is determined to be standing with both hands raised. If the graph is unimodal, the action is determined to be standing with one hand raised, and the sign of $x_{top} - x_0$ at the peak distinguishes a raised left hand from a raised right hand. If $H' - H < T_h$ and $|W' - W| > T_w$, a hand or foot is extended in the horizontal direction: if $|x_{right} - x_0| - |x_{left} - x_0| < -T_1$, the right hand or right foot is extended; if $|x_{right} - x_0| - |x_{left} - x_0| > T_1$, the left hand or left foot is extended; otherwise, the user is standing with both hands extended.

Based on the same concept, an embodiment of the present invention provides a television set including a camera. Referring to FIG. 4, the television includes:

a background obtaining submodule, configured to obtain, by the camera, a first background image of the first area when the first user is not in the first area;

An image obtaining module 401, configured to obtain, through the camera, a first image of a first area including a first user at a first moment, and to obtain, through the camera, a second image of the first area including the first user at a second moment after the first moment;

a first obtaining module 402, configured to obtain, based on the first image, a first centroid coordinate position of the first user at the first moment, and to obtain, based on the second image, a second centroid coordinate position of the first user at the second moment.

In a specific implementation process, before the user enters the image collection area, a first background image containing only the background area is collected by the camera. Then, at a first moment after the user enters the image collection area and at a second moment after the first moment, a first image and a second image containing both the user and the background area are collected. After any image is obtained through the camera, it must be pre-processed for color restoration to remove the effect of uneven illumination on the true color of the image.

The white balance method is usually used for this color correction. Since the RGB components of white light are equal (R=G=B=255), once white light is corrected, light of other colors is corrected as well. Similarly, since the RGB components of gray light are also equal, gray light can also be used to achieve white balance. Specifically, the white balance method relies on total reflection theory: the brightest point of the image is assumed to be a white point and is used as the reference for automatically white balancing the image. In practical engineering applications, the brightest point is defined as the point where the value of R+G+B is largest in the image. Then, from the three color component values of the brightest point and Equation 1:

$$R_A = \frac{R'_{max}}{R_{max}} R_B, \quad G_A = \frac{G'_{max}}{G_{max}} G_B, \quad B_A = \frac{B'_{max}}{B_{max}} B_B \tag{1}$$

the color component values of each pixel in the color-corrected image can be obtained, giving the color restored image. Here $R_{max}$, $G_{max}$, $B_{max}$ are the three color component values of the brightest point in the original image; $R'_{max}$, $G'_{max}$, $B'_{max}$ are the three color component values of the brightest point after white balance (usually 255 or slightly smaller); $R_B$, $G_B$, $B_B$ are the three color component values of each pixel in the original image; and $R_A$, $G_A$, $B_A$ are the three color component values of each pixel after white balance.

In the embodiment of the present application, the first obtaining module specifically includes:

the background obtaining submodule;

a color restoration submodule, configured to perform color correction processing on the first image to obtain a first color restored image, and to perform color correction processing on the first background image to obtain a second color restored image;

an image processing submodule, configured to obtain a first processed image based on the first color restored image and the second color restored image, wherein the first processed image includes a foreground image composed of pixels having a first color value and a second background image composed of pixels having a second color value different from the first color value;

In a specific implementation process, after the first image and the first background image are processed by the color correction pre-processing described above, the first color restored image wb_Pre_rgb and the second color restored image wb_Bg_rgb are obtained, respectively. Using these two images and Equation 2:

$$F(x, y) = \begin{cases} 255, & |wb\_Pre_i(x, y) - wb\_Bg_i(x, y)| > T \ \text{for some } i \in \{r, g, b\} \\ 0, & \text{else} \end{cases} \tag{2}$$

a first processed image after binarization can be obtained, where $F(x, y)$ denotes the pixel value of the first processed image and i = r, g, b denote the three channels r, g, b, respectively. Equation 2 states that if the difference between the first color restored image and the second color restored image in any one channel is greater than the threshold T, the pixel belongs to the foreground image and is set to white; the rest is the second background image and is set to black, so the first processed image contains only two color values. In the implementation process of the present application, after the second image or any other image is obtained, the same binarization must be applied to it to separate the foreground from the background.

In the embodiment of the present application, the first obtaining module further includes:

a judging submodule, configured to judge whether a shadow image exists in the foreground image;

a de-shadowed image obtaining submodule, configured to remove the shadow image when the shadow image exists in the foreground image, to obtain a de-shadowed first processed image;

In the specific implementation process, the obtained foreground image may contain a shadow image caused by the user's shadow, as shown in FIG. 2(a). The shadow image is a connected region containing a small number of white points and is separated from the image of the user. If the shadow image exists in the foreground image, it is removed to obtain the de-shadowed first processed image, as shown in FIG. 2(b).
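The color correction and binarization steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: images are assumed to be nested lists of (r, g, b) tuples, the function names `white_balance` and `binarize` are hypothetical, the per-channel difference is taken as an absolute difference, and foreground pixels are written as 255.

```python
def white_balance(image, target=255):
    """Scale each channel so that the brightest pixel (largest R+G+B)
    is mapped to `target` in all three channels (Equation 1)."""
    brightest = max((px for row in image for px in row), key=sum)
    balanced = []
    for row in image:
        new_row = []
        for px in row:
            # If the brightest point has a zero channel, that channel is
            # left unchanged to avoid division by zero (sketch assumption).
            new_row.append(tuple(
                min(255, round(c * target / m)) if m else c
                for c, m in zip(px, brightest)
            ))
        balanced.append(new_row)
    return balanced


def binarize(frame, background, threshold):
    """Return a mask that is 255 where any channel of `frame` differs from
    `background` by more than `threshold`, and 0 elsewhere (Equation 2)."""
    return [
        [255 if any(abs(a - b) > threshold for a, b in zip(fp, bp)) else 0
         for fp, bp in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


frame = white_balance([[(250, 250, 250), (100, 50, 0)]])
background = white_balance([[(250, 250, 250), (250, 250, 250)]])
print(binarize(frame, background, threshold=30))
```

Shadow removal is not covered here; in this sketch it would be a separate pass over the mask that discards small connected regions separated from the user.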

In the embodiment of the present application, the first obtaining module further includes:

a creating submodule, configured to create, in the first processed image, a first coordinate system composed of an X axis and a Y axis;

a first obtaining submodule, configured to obtain, based on the first coordinate system, the abscissa value and the ordinate value of each pixel constituting the foreground image;

a second obtaining submodule, configured to obtain, based on the abscissa value and the ordinate value of each pixel, a first abscissa value and a first ordinate value of the first centroid of the first user at the first moment, thereby obtaining the first centroid coordinate position.

Furthermore, the first abscissa value $x_0$ and the first ordinate value $y_0$ of the first centroid can be obtained from the formulas $x_0 = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $y_0 = \frac{1}{N}\sum_{i=1}^{N} y_i$, where $x_i$ and $y_i$ are the abscissa and ordinate values of the individual pixels of the foreground image in the first processed image, and N is the total number of pixels in the foreground image. In the implementation process of the present application, the centroid coordinates $(x_0, y_0)$ of the foreground image of each frame can be obtained by this method.

In the embodiment of the present application, the television further includes:

The identification module 403 is configured to determine, according to the first centroid coordinate position and the second centroid coordinate position, an operation action of the first user between the first moment and the second moment.
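The centroid on which the identification module relies is the mean of the foreground pixel coordinates. A minimal sketch follows, assuming the binarized image is a nested list in which non-zero values mark foreground pixels; the function name `centroid` is hypothetical, and the origin is assumed to be at the bottom-left corner so that the Y axis points upward.

```python
def centroid(mask):
    """Return (x0, y0), the mean abscissa and ordinate of the foreground
    (non-zero) pixels of `mask`, or None if there is no foreground."""
    height = len(mask)
    xs, ys = [], []
    for row_index, row in enumerate(mask):
        for col_index, value in enumerate(row):
            if value:
                xs.append(col_index)
                # Rows are stored top to bottom; flip them so that y grows
                # upward (assumption: origin at the bottom-left corner).
                ys.append(height - 1 - row_index)
    if not xs:
        return None
    n = len(xs)
    return (sum(xs) / n, sum(ys) / n)


mask = [
    [0, 255, 255],
    [0, 255, 255],
    [0, 0, 0],
]
print(centroid(mask))
```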

In the embodiment of the present application, the identifying module 403 specifically includes:

a first difference obtaining submodule, configured to obtain a first difference by subtracting the first abscissa value of the first centroid from the second abscissa value of the second centroid;

a first judging submodule, configured to judge whether the first difference is greater than a first threshold;

a first determining submodule, configured to determine, when the first difference is greater than the first threshold, that the operation action of the first user between the first moment and the second moment is a right movement;

a second judging submodule, configured to judge, when the first difference is not greater than the first threshold, whether the first difference is less than a second threshold;

a second determining submodule, configured to determine, when the first difference is less than the second threshold, that the operation action of the first user between the first moment and the second moment is a left movement;

a second difference obtaining submodule, configured to obtain a second difference by subtracting the first ordinate value of the first centroid from the second ordinate value of the second centroid;

a third judging submodule, configured to judge whether the second difference is greater than a third threshold;

a third determining submodule, configured to determine, when the second difference is greater than the third threshold, that the operation action of the first user between the first moment and the second moment is a jump action;

a fourth judging submodule, configured to judge, when the second difference is not greater than the third threshold, whether the second difference is less than a fourth threshold;

and a fourth determining submodule, configured to determine, when the second difference is less than the fourth threshold, that the operation action of the first user between the first moment and the second moment is a squat action.
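The threshold comparisons performed by these submodules can be sketched as follows. The function name `classify_action` and the string labels are hypothetical; the sketch also assumes that the horizontal checks run before the vertical ones, an ordering the text does not specify, and that the X axis is positive to the right and the Y axis positive upward.

```python
def classify_action(first, second, t1, t2, t3, t4):
    """Classify the action between two moments from two centroid positions.

    first, second: (x, y) centroids at the first and second moments.
    t1, t2: first and second thresholds for the abscissa difference.
    t3, t4: third and fourth thresholds for the ordinate difference.
    """
    first_difference = second[0] - first[0]
    second_difference = second[1] - first[1]
    if first_difference > t1:
        return "right movement"
    if first_difference < t2:
        return "left movement"
    if second_difference > t3:
        return "jump"
    if second_difference < t4:
        return "squat"
    return "no action"


print(classify_action((100, 80), (130, 82), t1=20, t2=-20, t3=15, t4=-15))
```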

In the embodiment of the present application, the television further includes:

a second obtaining module, configured to take i from 3 to N in sequence and obtain an i-th abscissa value and an i-th ordinate value of the i-th centroid of the first user at an i-th moment after the second moment, where N is an integer greater than or equal to 4.

In the embodiment of the present application, the first determining submodule specifically includes:

a first determining unit, configured to determine, when the first difference is greater than the first threshold, that the operation action of the first user between the first moment and the second moment has a right-shift trend;

a first judging unit, configured to judge, based on the second abscissa value and at least one i-th abscissa value, whether a right-shift end flag exists in the operation action;

a second determining unit, configured to determine, when the right-shift end flag exists, that the operation action is a right movement;

or, the second determining submodule specifically includes:

a third determining unit, configured to determine, when the first difference is less than the second threshold, that the operation action of the first user between the first moment and the second moment has a left-shift trend;

a second judging unit, configured to judge, based on the second abscissa value and at least one i-th abscissa value, whether a left-shift end flag exists in the operation action;

and a fourth determining unit, configured to determine, when the left-shift end flag exists, that the operation action is a left movement.

In the specific implementation process, the first centroid coordinate at the first moment is assigned to the reference centroid coordinate, and the abscissa difference $dx_{cr}$ is obtained from the first centroid coordinate position and the second centroid coordinate position. If $dx_{cr} > T_{lr}$, the operation action of the first user is determined to have a right-shift trend; if $dx_{cr} < -T_{lr}$, it is determined to have a left-shift trend. When the operation action of the first user has a right-shift trend, it is judged whether a right-shift end flag exists; if it does, the operation action of the first user is a right movement. The application only provides two implementations of the right shift end flag, and those skilled in the art may also use other methods as the flag for the right shift end.

In addition, after the operation action between the first moment and the second moment is determined to be a right movement, the centroid coordinates of the current frame are assigned to the reference-frame centroid coordinates, and the operation action in the next time interval is then determined.
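The trend, end-flag, and reference-update sequence described above can be sketched as a small tracker. The end-flag rule used here is purely hypothetical: the passage does not define the flag, so this sketch assumes the shift has ended once the centroid abscissa changes by less than `still_eps` between consecutive frames. The class name `ShiftTracker` and all parameter names are likewise illustrative.

```python
class ShiftTracker:
    """Track left/right shifts from per-frame centroid abscissa values."""

    def __init__(self, t_lr, still_eps=1.0):
        self.t_lr = t_lr            # trend threshold T_lr
        self.still_eps = still_eps  # hypothetical end-flag tolerance
        self.ref_x = None           # reference centroid abscissa
        self.prev_x = None          # previous-frame centroid abscissa
        self.trend = None           # "right", "left", or None

    def update(self, x):
        """Feed the current centroid abscissa; return "right", "left",
        or None when no completed movement is recognized on this frame."""
        if self.ref_x is None:
            self.ref_x = self.prev_x = x
            return None
        dx_cr = x - self.ref_x    # current minus reference
        dx_cp = x - self.prev_x   # current minus previous frame
        self.prev_x = x
        if self.trend is None:
            if dx_cr > self.t_lr:
                self.trend = "right"
            elif dx_cr < -self.t_lr:
                self.trend = "left"
            return None
        if abs(dx_cp) < self.still_eps:
            # Assumed end flag: the centroid has stopped moving.
            action, self.trend = self.trend, None
            # The current frame becomes the new reference frame.
            self.ref_x = x
            return action
        return None


tracker = ShiftTracker(t_lr=20)
results = [tracker.update(x) for x in [100, 110, 125, 140, 140.5, 141]]
print(results)
```

Keeping the trend and the end flag as separate states means a movement is reported once, when it completes, rather than on every frame in which the threshold is exceeded.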

In the embodiment of the present application, the third determining submodule specifically includes:

a fifth determining unit, configured to determine, when the second difference is greater than the third threshold, that the operation action of the first user between the first moment and the second moment has a jump trend;

a third judging unit, configured to judge, based on the second ordinate value and at least one i-th ordinate value, whether a jump end flag exists in the operation action;

a sixth determining unit, configured to determine, when the jump end flag exists, that the operation action is a jump action;

or, the fourth determining submodule specifically includes:

a seventh determining unit, configured to determine, when the second difference is less than the fourth threshold, that the operation action of the first user between the first moment and the second moment has a squat trend;

a fourth judging unit, configured to judge, based on the second ordinate value and at least one i-th ordinate value, whether a squat end flag exists in the operation action;

and an eighth determining unit, configured to determine, when the squat end flag exists, that the operation action is a squat action.

In the specific implementation process, the first centroid coordinate position at the first moment is assigned to the reference centroid coordinate position, and the ordinate difference $dy_{cr}$ is obtained from the first centroid coordinate position and the second centroid coordinate position. If $dy_{cr} > T_j$, the operation action of the first user is determined to have a jump trend; if $dy_{cr} < -T_s$, it is determined to have a squat trend. When the operation action of the first user has a jump trend, it is judged whether a jump end flag exists; if it does, the operation action of the first user is a jump action. The jump end flag is determined as follows: within a certain period of time, the sign of $dy_{cp}$ is positive in the first half of the period and negative in the second half, and finally $dy_{cr} < T_j$. If this is the case, the jump end flag exists, and the operation action of the first user between the first moment and the second moment is determined to be a jump action.
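The jump trend and jump end flag can be sketched as follows. This is one possible reading, not a definitive implementation: $dy_{cp}$ is assumed to be the ordinate difference between consecutive frames, "first half positive, second half negative" is interpreted as a rising phase followed by a falling phase, and the function name `is_jump` is hypothetical.

```python
def is_jump(ys, t_j):
    """Decide whether a sequence of centroid ordinates forms a jump.

    ys[0] is the reference ordinate; ys[1:] are the following frames.
    t_j is the jump threshold T_j. The Y axis is assumed to point upward.
    """
    if len(ys) < 3:
        return False
    ref = ys[0]
    # Jump trend: some frame rises more than T_j above the reference.
    if max(y - ref for y in ys[1:]) <= t_j:
        return False
    # dy_cp for each frame (assumed: current minus previous), zeros ignored.
    deltas = [b - a for a, b in zip(ys, ys[1:]) if b != a]
    rising = 0
    while rising < len(deltas) and deltas[rising] > 0:
        rising += 1
    falling = deltas[rising:]
    # Rising phase followed by a purely falling phase.
    up_then_down = rising > 0 and len(falling) > 0 and all(d < 0 for d in falling)
    # Finally dy_cr < T_j: the centroid has come back near the reference.
    landed = ys[-1] - ref < t_j
    return up_then_down and landed


print(is_jump([100, 110, 125, 130, 120, 105, 101], t_j=20))
```

A squat could be handled symmetrically with $-T_s$ and a falling phase followed by a rising phase, although the text does not spell out the squat end flag.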

In the embodiment of the present application, the television further includes:

An area obtaining module, configured to obtain, based on the first image, a first area of a first body part image of a first body part of the first user at the first moment, and to obtain, based on the second image, a second area of the first body part image at the second moment.

In the embodiment of the present application, the identifying module is specifically configured to:

In a specific implementation process, after it is determined, according to the first centroid coordinate position and the second centroid coordinate position, that the operation action of the first user between the first moment and the second moment is a jump action, it is determined whether the first area is larger than the second area. If the first area is larger than the second area, the first user has moved away from the camera, and the operation action of the first user is determined to be a backward jump action; if the first area is smaller than the second area, the first user has moved closer to the camera, and the operation action of the first user is determined to be a forward jump action.

In the embodiment of the present application, the television further includes:

a matching module, configured to match the operation action with a standard action model in the action model library to obtain a first matching result;

a display module, configured to display, when the first matching result indicates that the matching is successful, the standard action model corresponding to the operation action.

In the specific implementation process, as shown in Fig. 3 (a) ~ (k), the action model library includes various basic models corresponding to actions such as squatting, jumping, moving left, moving right, and no action. The models are built on the following basis: J. H. Yoo et al. established a human body line graph model based on knowledge of human anatomy. Assuming that the total height of the human body is H, the relative lengths of the various parts of the human body can be obtained, as shown in Table 1 below:

Table 1

In addition, based on the coordinate position of each pixel in the foreground image, the ordinate value $y_{top}$ of the highest point in the foreground image, the ordinate value $y_{bottom}$ of the lowest point, the leftmost abscissa value $x_{left}$ and the rightmost abscissa value $x_{right}$ can also be obtained. Further, when the user is standing, various operation actions of the user in the standing state can be determined based on $W' = x_{right} - x_{left}$, $H' = y_{top} - y_{bottom}$, the centroid $(x_0, y_0)$, and the standard W and H values.

For example, if $H' - H > T_h$, the user is raising a hand. In this state, each point in the foreground image is traversed and the highest ordinate value corresponding to each abscissa value is recorded, giving an array. If the graph formed by the array is bimodal, the action is determined to be standing with both hands raised. If the graph is unimodal, the action is determined to be standing with one hand raised, and the sign of $x_{top} - x_0$ at the peak distinguishes a raised left hand from a raised right hand. If $H' - H < T_h$ and $|W' - W| > T_w$, a hand or foot is extended in the horizontal direction: if $|x_{right} - x_0| - |x_{left} - x_0| < -T_1$, the right hand or right foot is extended; if $|x_{right} - x_0| - |x_{left} - x_0| > T_1$, the left hand or left foot is extended; otherwise, the user is standing with both hands extended.

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

1. In the embodiment of the present invention, the first image of the first user is obtained at the first moment, the second image of the first user is obtained at the second moment after the first moment, the first centroid coordinate position and the second centroid coordinate position are obtained based on the first image and the second image, respectively, and the operation action of the first user between the first moment and the second moment is determined based on the first centroid coordinate position and the second centroid coordinate position. This solves the technical problem in the prior art that the recognition process is complicated because of the high complexity and large computational load of the program, and achieves the technical effect of recognizing actions with a simpler algorithm and a smaller computational load. For example, the prior art needs to use the preset person image data, size and predetermined display position of an initial image as reference data, identify the currently acquired person image and acquire the person's feature data information, then scale it proportionally against the reference data, intercept the person-part image that matches the reference data, and display it at the predetermined position. The present invention, by contrast, only needs to determine the position of the centroid in the captured person image and compute differences of the centroid coordinates; the movement of the person can then be recognized from the change in centroid position, so the algorithm is simple and the computational load is small;

2. Because actions can be recognized using only the data processing capability and the built-in 2D camera of existing high-end televisions, the hardware cost is not increased, and the functions of the television are enriched.

It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and modifications of the invention

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although the preferred embodiment of the invention has been described, it will be apparent to those skilled in the < Therefore, the appended claims are intended to be construed as including the preferred embodiments and the modifications

Claims


1. A motion recognition method, applied to an electronic device that includes a camera and has a video playback function, wherein the method includes:

And determining, at least based on the first centroid coordinate position and the second centroid coordinate position, an operation action of the first user between the first moment and the second moment.

2. The method according to claim 1, wherein the obtaining the first centroid coordinate position of the first user at the first moment comprises:

Obtaining, by the camera, a first background image of the first area when the first user is not in the first area;

Obtaining a first processed image based on the first image and the first background image, wherein the first processed image includes a foreground image composed of pixels having a first color value and a second background image composed of pixels having a second color value different from the first color value.

3. The method according to claim 2, wherein after the obtaining the first processed image, the method further comprises:

Creating a first coordinate system composed of an X axis and a Y axis in the first processed image;

Obtaining an abscissa value and an ordinate value of each pixel point constituting the foreground image based on the first coordinate system;

4. The method according to claim 3, wherein, when the right side of the X axis of the coordinate system is the positive direction and the upper side of the Y axis is the positive direction, the determining, at least based on the first centroid coordinate position and the second centroid coordinate position, of the operation action of the first user between the first moment and the second moment is specifically:

Determining whether the first difference is greater than a first threshold;

Determining, when the first difference is greater than the first threshold, that the operation action of the first user between the first moment and the second moment is a right movement;

Determining whether the first difference is less than a second threshold when the first difference is not greater than the first threshold;

Determining, when the first difference is less than the second threshold, that the operation action of the first user between the first moment and the second moment is a left movement.

5. The method according to claim 3, wherein, when the right side of the X axis of the coordinate system is the positive direction and the upper side of the Y axis is the positive direction, the determining, at least based on the first centroid coordinate position and the second centroid coordinate position, of the operation action of the first user between the first moment and the second moment is specifically:

Determining whether the second difference is greater than a third threshold;

Determining whether the second difference is less than a fourth threshold when the second difference is not greater than the third threshold;

Determining, when the second difference is less than the fourth threshold, that the operation action of the first user between the first moment and the second moment is a squat action.

6. The method according to claim 3, wherein, after obtaining, based on the first image, the first centroid coordinate position of the first user at the first moment and obtaining, based on the second image, the second centroid coordinate position of the first user at the second moment, the method further includes:

Determining, based on the second image, a second area of the first body part image at the second moment;

Determining whether the first area is greater than the second area;

When the first area is greater than the second area, determining that the operation action of the first user between the first moment and the second moment is a backward movement;

When the first area is smaller than the second area, determining that the operation action of the first user between the first moment and the second moment is a forward movement.

7. The method according to claim 6, wherein the determining, at least based on the first centroid coordinate position and the second centroid coordinate position, of the operation action of the first user between the first moment and the second moment is specifically:

8. A television set comprising a camera, wherein the television set comprises:

An image obtaining module, configured to obtain, by the camera at a first moment, a first image of a first area including a first user, and to obtain, by the camera at a second moment after the first moment, a second image of the first area including the first user;

9. The television set according to claim 8, wherein the television further comprises:

10. The television set according to claim 9, wherein the identification module is specifically configured to: