WO2022078154A1

WO2022078154A1 - Display device and media asset playing method

Info

Publication number: WO2022078154A1
Application number: PCT/CN2021/119052
Authority: WO
Inventors: 王光强 (Wang Guangqiang); 赖园园 (Lai Yuanyuan); 薛梅 (Xue Mei); 刘金刚 (Liu Jingang)
Original assignee: 聚好看科技股份有限公司 (Juhaokan Technology Co., Ltd.)
Priority date: 2020-10-15
Filing date: 2021-09-17
Publication date: 2022-04-21
Also published as: CN116324700A

Abstract

Embodiments of the present application provide a display device and a media asset playing method. The display device comprises a display and a controller connected to the display. The controller is configured to: receive a media asset playing instruction input by a user; in response to the media asset playing instruction, obtain a target video corresponding to the instruction; when no control is arranged above a first playing window corresponding to the target video, play the target video in the first playing window; and when a control is arranged above the first playing window, move the display position of the target video within the first playing window in a direction away from the control, so that the center of the target video picture is displayed close to the center of a target display area of the first playing window that is not blocked by the control. The control is opaque and blocks one side of the first playing window. The present application improves the display effect of media asset playback.

Description

Display device and media asset playing method

This application claims priority to Chinese patent application No. 202011102193.3 filed on October 15, 2020, Chinese patent application No. 202110275148.6 filed on March 15, 2021, and Chinese patent application No. 202110448074.1 filed on April 25, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of display technologies, and in particular, to a display device and a method for playing media assets.

Background

Following fitness videos on a TV has become a popular way to exercise. To help users judge whether their movements are standard, some TVs capture images of the user through a camera and display the user images and the fitness video on the TV at the same time, so that users can see their own movements on the TV and compare them with the movements in the fitness video.

At present, most TVs have a display ratio of 16:9, and fitness videos are usually 16:9 as well. If the TV plays only the fitness video, the video can be displayed in full screen. If the fitness video and the user images are displayed at the same time, the user image occupies part of the TV's display area, so the area available to the fitness video may no longer be 16:9. In the related art, when the aspect ratio of a video does not match that of the playback window, the video is usually scaled down so that it fits inside the window. However, this produces black borders around the video and makes the video smaller. For fitness videos, a smaller picture makes some movements hard to see, which seriously degrades the user's viewing experience.
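The related-art "scale to fit" behavior described above can be illustrated with a short Python sketch. It assumes a 1920x1080 screen and a hypothetical 480-pixel-wide camera window, which leaves a 1440x1080 area for the video; the function name `fit_inside` is illustrative only.

```python
def fit_inside(src_w, src_h, win_w, win_h):
    """Uniformly scale a (src_w x src_h) frame so it fits entirely in the window.

    Returns (scaled_w, scaled_h, bar_x, bar_y), where bar_x / bar_y are the
    widths of the black borders left/right and top/bottom of the scaled frame.
    """
    # The smaller of the two ratios guarantees that neither dimension overflows.
    scale = min(win_w / src_w, win_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    # Leftover space is split evenly on both sides and shown as black borders.
    bar_x = (win_w - scaled_w) // 2
    bar_y = (win_h - scaled_h) // 2
    return scaled_w, scaled_h, bar_x, bar_y


print(fit_inside(1920, 1080, 1440, 1080))
```

With the 1440-pixel-wide area, the 16:9 video shrinks to 1440x810 and gains 135-pixel black borders above and below, which is the size loss the background section describes.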

Summary of the invention

In a first aspect, the present application provides a display device, the display device comprising:

a display;

a controller, connected to the display, the controller being configured to:

receive a media asset playback instruction input by a user;

in response to the media asset playback instruction, obtain a target video corresponding to the media asset playback instruction;

when no control is arranged above a first playback window corresponding to the target video, play the target video in the first playback window;

when a control is arranged above the first playback window corresponding to the target video, move the display position of the target video within the first playback window in a direction away from the control, so that the center of the target video picture is displayed close to the center of a target display area of the first playback window that is not blocked by the control, wherein the control is opaque and blocks one side of the first playback window.

In a second aspect, the present application provides a method for playing media assets, the method comprising:

receive a media asset playback instruction input by a user;

Description of drawings

FIG. 1 shows a schematic diagram of an operation scenario between a display device and a control device in some embodiments;

FIG. 2 shows a schematic diagram of the fitness home page in some embodiments;

FIG. 3 shows a schematic diagram of a media asset details interface in some embodiments;

FIG. 4 shows a schematic diagram of a playback mode selection interface in some embodiments;

FIG. 5 shows a schematic diagram of a full-screen playback interface in the normal mode in some embodiments;

FIG. 6 shows a schematic diagram of a dual-window playback interface in the follow-up mode in some embodiments;

FIG. 7 shows a schematic diagram of a dual-window playback interface in the follow-up mode in some embodiments;

FIG. 8 shows a schematic diagram of image movement in some embodiments;

FIG. 9 shows a schematic diagram of the effect after the image is moved in some embodiments;

FIG. 10 shows a schematic diagram of a display interface in some embodiments;

FIG. 11 shows a schematic diagram of the dotting interaction of the target media asset in some embodiments;

FIG. 12 shows a schematic diagram of the scoring interaction of the target media asset in some embodiments;

FIG. 13 shows a flowchart of the scoring method during follow-up practice in some embodiments;

FIG. 14 shows a schematic diagram of scoring during follow-up practice in some embodiments;

FIG. 15 shows a schematic diagram of scoring after follow-up practice in some embodiments;

FIG. 16 shows a schematic diagram of an exception handling interface of a display device in some embodiments;

FIG. 17 shows a schematic diagram of an exception handling interface of a display device in some embodiments;

FIG. 18 shows a schematic diagram of an exception handling interface of a display device in some embodiments.

Detailed description

In order to make the purpose and implementation of the present application clearer, the exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, rather than all, of the embodiments of the present application.

FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in FIG. 1, a user can operate the display device 200 through the smart device 300 or the control apparatus 100.

In some embodiments, the control apparatus 100 may be a remote controller that communicates with the display device via the infrared protocol, the Bluetooth protocol, or other short-range communication methods, and controls the display device 200 wirelessly or by wire. The user can control the display device 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and the like.

In some embodiments, a smart device 300 (e.g., a mobile terminal, a tablet computer, a computer, or a notebook computer) can also be used to control the display device 200, for example, through an application running on the smart device.

In some embodiments, the display device 200 can also be controlled in ways other than through the control apparatus 100 and the smart device 300. For example, a voice command acquisition module configured inside the display device 200 can directly receive the user's voice commands for control, or the user's voice commands can be received through a voice control device provided outside the display device 200.

In some embodiments, the display device 200 also performs data communication with the server 400. The display device 200 may communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 may provide various content and interactions to the display device 200. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.

In some embodiments, the user may input user commands on a graphical user interface (GUI) displayed on the display 260, and the user input interface receives the user input commands through the graphical user interface (GUI). Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.

In some embodiments, a "user interface" is a medium interface for interaction and information exchange between an application program or operating system and a user; it converts between the internal form of information and a form acceptable to the user. A commonly used form of user interface is the graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed graphically. It can consist of icons, windows, controls, and other interface elements displayed on the display screen of an electronic device, where controls can include icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, and other visual interface elements.

In some embodiments, the display device can directly enter the interface of a preset video-on-demand (VOD) program after startup. As shown in FIG. 2, the interface of the VOD program can include at least a navigation bar and a content display area located below the navigation bar, and the content displayed in the content display area changes with the control selected in the navigation bar. Programs in the application layer can be integrated into the VOD program and displayed through a control in the navigation bar, or displayed after the application control in the navigation bar is selected.

In some embodiments, after startup the display device can directly enter the display interface of the last selected signal source or a signal source selection interface. The signal source can be at least one of a preset VOD program, an HDMI interface, a live TV interface, and the like. After the user selects a signal source, the display shows the content obtained from that signal source.

In some embodiments, as shown in FIG. 2, the navigation bar may be provided with multiple controls, such as "My", "Channel", "Video", "Fitness", "VIP", "Education", "Mall", "Games", and applications, with different navigation bar controls corresponding to different channel interfaces. A user who wants to exercise can select the "Fitness" control; the interface shown in FIG. 2 is the interface displayed after the "Fitness" control is selected. In this interface, the user can select a fitness video and exercise along with it.

Referring to FIG. 3, after the user clicks a fitness video on the interface shown in FIG. 2, the display device requests the server to send the corresponding details page data according to the configuration parameters of the selected video control, and then enters the media asset details interface shown in FIG. 3 according to the received details page data. As shown in FIG. 3, the media asset details interface may or may not display multiple course subsection controls for the fitness video. After the user clicks one of the course subsection controls or the start training control, the display device obtains the fitness video corresponding to that course subsection control or details page, and then enters the playback mode selection interface. In the following description, the fitness video being played is also referred to as the target video.

Referring to FIG. 4, the playback mode selection interface can display three mode controls. The playback mode corresponding to the first mode control is the normal mode, also called the first mode; the playback mode corresponding to the second mode control is the follow-up mode, also called the second mode; and the playback mode corresponding to the third mode control is the movie viewing mode, also called the third mode. Each mode control can display an explanation of its playback mode. Exemplarily, the explanation of the normal mode can be: "Camera off: watch the complete teaching video to become familiar with the training movements"; the explanation of the follow-up mode can be: "Camera on: get a real-time comparison of your movements to make them more standard"; and the explanation of the movie viewing mode can be: "Camera off: exercise while you watch, with no compromise on results".

According to the above explanations, in some embodiments, in the normal mode the display device does not activate the camera and sets only one playback window on a new interface to play the fitness video. In the follow-up mode, the display device activates the camera and sets two playback windows on a new interface of the display to play the images captured by the camera and the fitness video simultaneously. In the movie viewing mode, the display device does not activate the camera and sets two playback windows on a new interface of the display to play a fitness video and a movie simultaneously.
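The relationship between the three playback modes, the camera, and the playback windows can be summarized as a small configuration table. This is a hypothetical Python sketch; the names `PlayMode`, `ModeConfig`, and `MODE_CONFIGS` are illustrative and are not taken from the application.

```python
from dataclasses import dataclass
from enum import Enum


class PlayMode(Enum):
    NORMAL = 1      # first mode
    FOLLOW_UP = 2   # second mode
    MOVIE = 3       # third mode (movie viewing)


@dataclass(frozen=True)
class ModeConfig:
    camera_on: bool   # whether the camera is activated in this mode
    windows: tuple    # content of each playback window, in creation order


# One entry per mode, following the explanations shown on each mode control.
MODE_CONFIGS = {
    PlayMode.NORMAL: ModeConfig(False, ("fitness video",)),
    PlayMode.FOLLOW_UP: ModeConfig(True, ("fitness video", "camera image")),
    PlayMode.MOVIE: ModeConfig(False, ("fitness video", "movie")),
}
```

Keeping the mapping in one table means the interface-building code only has to look up the selected mode, rather than branching on it in several places.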

In some embodiments, the display device may be provided with a camera, which may be a pop-up (liftable) camera or a fixed camera. The camera can capture user images to obtain local camera data, and the controller of the display device can display the local camera data on the display, so that users can see their own movements on the display.

In some embodiments, the display device is not provided with a built-in camera but can be connected to one, such as an external camera connected via USB. The camera captures user images, and the controller of the display device can display the local data captured by the camera on the display of the display device.

The user can select a playback mode to watch the fitness video according to the above explanation.

In some embodiments, if the user clicks the normal mode control on the playback mode selection interface shown in FIG. 4, the display device generates a media asset playback instruction that includes the playback mode and information of the target video. Here the playback mode is the normal mode, and the information of the target video includes the playback address of the target video. According to the media asset playback instruction, the display device obtains the video data stream of the target video from its playback address and, because the playback mode is the normal mode, generates on a new interface a first playback window for playing the target video. In some embodiments, the first playback window may be a full-screen window with a display ratio of 16:9.

In some embodiments, since the ratio of a fitness video is generally 16:9 and the ratio of the full-screen window is also 16:9, the two ratios are consistent. In the normal mode, the display device generates only the first playback window and no other windows, so nothing blocks the content displayed in the first playback window. Therefore, after generating the first playback window, the display device can, given that the playback mode in the media asset playback instruction is the normal mode, directly play the fitness video in full screen in the first playback window.

In some embodiments, in order to play the fitness video in full screen in the first playback window, the display device may scale the image of the target video so that the size of the scaled image matches the size of the full-screen window.

In some embodiments, the display device may scale the target video as follows: parse the video data stream of the target video to obtain its video frame sequence, take the first frame image in the sequence, obtain a scaling ratio from the ratio of the image height to the height of the target display area, and then scale the video frame sequence of the target video according to that ratio. Exemplarily, if the image height of the target video is 100 and the height of the full-screen window of the display device is 1000, where height can be measured as the number of pixels in the vertical direction, the scaling ratio is 100:1000 = 1:10. According to this ratio, the video frame sequence of the target video is enlarged 10 times, so that the enlarged image of the target video fills the entire full-screen window.
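The height-based scaling step can be sketched in Python as follows. The heights 100 and 1000 come from the example above; the frame width of 178 is a hypothetical value added only to show that the width is scaled by the same factor, and `height_scale` is an illustrative name.

```python
def height_scale(image_w, image_h, area_h):
    """Uniformly scale a frame so that its height equals area_h.

    Returns (factor, scaled_w, scaled_h). The same factor would be applied to
    every frame of the sequence, so the aspect ratio of the video is preserved.
    """
    if image_h <= 0:
        raise ValueError("image height must be positive")
    factor = area_h / image_h          # e.g. 1000 / 100 -> 10.0 (ratio 1:10)
    return factor, round(image_w * factor), round(image_h * factor)


print(height_scale(178, 100, 1000))
```

Deriving the factor from the first frame only is reasonable because all frames of one video stream normally share the same resolution.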

In some embodiments, after scaling the target video, the display device obtains the video frame sequence of the scaled target video and sends it to the first playback window, so that the first playback window can play the video frame sequence continuously.

Referring to FIG. 5, which is a schematic diagram of a full-screen playback interface in the normal playback mode according to some embodiments, the target video can be played in full screen in the normal mode. The person in FIG. 5 may be a fitness coach, and the area around the person may represent the background. Usually, in the image of the target video, the person is displayed in the center, with background images on the left and right sides behind the person.

In some embodiments, on the display interface of FIG. 5, the user can call up a control list that includes a switching control; after the user selects the switching control, the display switches to the follow-up mode display shown in FIG. 6. Alternatively, while the interface shown in FIG. 5 is displayed, the display can be switched to the follow-up mode display shown in FIG. 6 through a preset key value. Similarly, the display of FIG. 6 can be switched back to the display of FIG. 5 by the same means.

In some embodiments, if the user clicks the follow-up mode control on the playback mode selection interface shown in FIG. 4, the display device generates a media asset playback instruction that includes the playback mode and information of the target video. Here the playback mode is the follow-up mode, and the information of the target video includes the playback address of the target video. According to the media asset playback instruction, the display device obtains the video data stream of the target video from its playback address and, because the playback mode is the follow-up mode, generates on a new interface a first playback window for playing the target video and a second playback window for playing the local camera data.

In some embodiments, the second playback window is superimposed above the first playback window.

In some embodiments, the second playback window has the same height as the first playback window, and either the left border of the second playback window coincides with the left border of the first playback window, or the right border of the second playback window coincides with the right border of the first playback window.

In some embodiments, the position of the window can be realized by setting the coordinate parameters of the window in the interface.
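Setting the coordinate parameters of the second playback window so that it matches the first window's height and shares its left or right border can be sketched as below. This Python sketch works in screen pixels; `Rect` and `overlay_window` are hypothetical names, and the 480-pixel overlay width is an assumed example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self):
        return self.left + self.width


def overlay_window(first, overlay_width, align="right"):
    """Place a second window with the same top and height as `first`.

    With align="right" its right border coincides with the right border of
    `first`; with align="left" the left borders coincide.
    """
    if align == "right":
        left = first.right - overlay_width
    elif align == "left":
        left = first.left
    else:
        raise ValueError("align must be 'left' or 'right'")
    return Rect(left, first.top, overlay_width, first.height)


first = Rect(0, 0, 1920, 1080)
print(overlay_window(first, 480))
```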

In some embodiments, referring to FIG. 6, in the follow-up mode the first playback window can be a full-screen window with a display ratio of 16:9, and the second playback window can be a window with the same height as the display, whose display ratio can be determined according to the shooting parameters of the camera. The second playback window can be displayed as an overlay on one side of the first playback window; for example, it can be displayed on the right side of the first playback window with its right edge coinciding with the right edge of the first playback window, thereby blocking the display area on the right side of the first playback window. If the display device scaled the target video to the size of the full-screen window and displayed the scaled image directly in that window according to the normal window display logic, part of the target video could not be seen because of the second playback window, and the user would see only the part of the target video not blocked by it. When the target video is a fitness video, the user needs to follow the movements of the fitness coach, who is the person in the target video and is usually located in the middle of the frame; the second playback window may therefore block the coach's body and impair the viewing effect of the fitness video.

In some embodiments, in order to reduce the probability that the fitness coach's body is blocked, the display device may determine, in the follow-up mode, a target display area in the first playback window, namely the area not blocked by the second playback window. The display device can then shift the image of the target video to the left within the target display area, so that the user can see a relatively complete exercise movement in the target display area.

In some embodiments, the target display area is determined according to the position coordinates of the first playback window and of the second playback window: removing the region covered by the second playback window from the region of the first playback window yields the position coordinates of the target display area.

It should be noted that the target display area refers to the preferred display area in the first playback window, and content displayed in the target display area is not blocked by other images. For example, if the interface contains, in addition to the first playback window, a second playback window that partially blocks the first playback window, the display device can determine the area of the first playback window that lies to the left of the second playback window and is not blocked by it as the target display area. Referring to FIG. 7, the second playback window is superimposed on the upper right side of the first playback window, and the area on the left side of the first playback window that is not blocked by the second playback window can be determined as the target display area of the first playback window.

It should be noted that in the follow-up mode the first playback window is blocked by the second playback window, and the display device determines the target display area according to the position of the second playback window on the first playback window. In some video playback scenarios, the first playback window used for video playback may be blocked not by a second playback window playing local camera data but by other images, for example by an opaque control used to display pictures. In such cases, the method provided by the embodiments of the present application can also be applied to determine a target display area and play the video in it, achieving a better playback effect. If the control blocks the first playback window in the same way as the second playback window does, that is, it blocks one side of the first playback window and one of its edges coincides with an edge of the first playback window, the target display area can be determined as the rectangular area of the first playback window that is not blocked by the control. If none of the edges of the control coincide with those of the first playback window, the target display area would be small no matter on which side of the control it is placed; in that case the entire display area of the first playback window is determined as the target display area, that is, the target video is played in full screen in the first playback window.
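The rule for deriving the target display area, whether the blocker is the second playback window or an opaque control, can be sketched in Python as follows. Rectangles are assumed to be `(left, top, right, bottom)` tuples in screen pixels, and `target_area_for_control` is a hypothetical name. The sketch assumes, as in the layouts described here, that a qualifying blocker is a full-height strip touching the left or right edge; any other placement falls back to the whole window.

```python
def target_area_for_control(window, control):
    """Return the target display area of `window` given an opaque `control`.

    Both arguments are (left, top, right, bottom) tuples in pixels.
    """
    wl, wt, wr, wb = window
    cl, ct, cr, cb = control
    full_height = ct <= wt and cb >= wb
    if full_height and cr >= wr and cl > wl:
        # Control shares the right edge: keep the unblocked part to its left.
        return (wl, wt, cl, wb)
    if full_height and cl <= wl and cr < wr:
        # Control shares the left edge: keep the unblocked part to its right.
        return (cr, wt, wr, wb)
    # No shared edge: any side area would be small, so use the whole window.
    return window


print(target_area_for_control((0, 0, 1920, 1080), (1440, 0, 1920, 1080)))
```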

In some embodiments, in the follow-up mode, since the target display area is only part of the first playback window, its size ratio may not be 16:9. The display device can first scale the target video and then shift the scaled image to the left, so that the center line of the shifted image is close to, or coincides with, the center line of the target display area, allowing the user to see a relatively complete image of the person in the target display area.

In some embodiments, during software execution, the target display area may be determined, and the first moving distance of the target video may be determined directly according to the position parameters of the first playback window and the second playback window in the playback interface. For example, the target video is scaled by the player during playback. To keep the displayed content from being deformed, the image is generally scaled by the same factor in the height and width directions. A common scaling rule is to determine the scaling factor from the height (or width): after scaling, the image fills the height (or width) of the playback window, and black can be inserted in the other dimension. Either the height direction or the width direction may be used as the reference. The first moving distance that the image of the target video needs to move during display can be determined according to the width parameter of the first playback window and the width parameter of the second playback window, so that the middle of the target video image frame is displayed, as far as possible, in the unblocked area of the first playback window.

Referring to FIG. 8, after the image of the target video is shifted to the left by the first moving distance D, the person in the target video is closer to the left side of the screen than before the shift.

Referring to FIG. 9, after the image of the target video is shifted to the left by the first moving distance D, the background image on the right side of the person in the target video can be blocked by the second playback window while the person is not, which achieves a good display effect.

In some embodiments, the staggered display of the first playback window and the second playback window in FIG. 6 and FIG. 9 only expresses that two independently controlled playback windows exist, and does not represent the actual superimposed display effect.

In order to calculate the first moving distance D, in some embodiments, the image of the target video may be scaled to the size of the first playback window to obtain the scaled image width, which may be called the width to be displayed; the width of the second playback window is obtained; the difference between these two widths gives the width of the target display area; and half of the difference between the width to be displayed and the width of the target display area is taken as the first moving distance. With this calculation, the first moving distance D by which the left side of the target video is shifted out equals the distance by which the right side of the target video is blocked by the second playback window, so that the center line of the target video coincides with the center line of the target display area. This calculation method may be called the averaging method; it is simple and fast and can quickly determine the first moving distance D.
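The averaging method can be written out as a short Python sketch. It takes the target display area width as the first window's width minus the second window's width, which equals "width to be displayed minus second window width" when the image is scaled to the first window's size. The 1920 and 480 pixel widths are assumed example values, and `average_shift` is a hypothetical name.

```python
def average_shift(scaled_width, first_width, second_width):
    """First moving distance D by the averaging method.

    The target display area is the part of the first window not covered by the
    second window; D is half of the amount by which the scaled image is wider
    than that area, so the image center lands on the area's center.
    """
    target_width = first_width - second_width
    if target_width <= 0:
        raise ValueError("second window must be narrower than the first")
    return (scaled_width - target_width) / 2


d = average_shift(1920, 1920, 480)
print(d)
```

With these widths D is 240 pixels: the image center moves from 960 to 720, which is the center of the 1440-pixel target display area, and 240 pixels of the image's right side remain under the 480-pixel second window.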

In some embodiments, the first moving distance may be any distance less than the difference in width of the two playback windows.

In some embodiments, the first moving distance is not greater than the width of the second playback window.

In some embodiments, the first moving distance may be determined directly according to the width of the second playback window. Exemplarily, half of the width of the second playback window is used as the distance to be moved, and the first moving distance is not greater than the distance to be moved.

In some embodiments, the first moving distance may also be obtained from the difference between the width of the scaled video and the width of the second playback window; for example, half of this difference may be used as the first moving distance.
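The bound described in the preceding paragraph, where half of the second window's width is the distance to be moved and the first moving distance may not exceed it, can be sketched as a clamp. The lower bound of 0 (no shift to the right) is an added assumption, not stated in the text, and `clamp_shift` is a hypothetical name.

```python
def clamp_shift(requested, second_width):
    """Limit a requested left shift to the range [0, second_width / 2].

    second_width / 2 is the "distance to be moved"; the lower bound of 0 is an
    assumption of this sketch.
    """
    limit = second_width / 2
    return max(0.0, min(requested, limit))


print(clamp_shift(300, 480))
```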

In some embodiments, the determination of the first movement distance is performed according to a position parameter of the playback window.

In some embodiments, the fitness trainer in the target video may not be located in the middle of the image. At this time, if the first moving distance D offset from the left side of the target video is set to the right side of the target video and is blocked by the second playback window The method with the same distance will cause the fitness trainer in the target video to be skewed to the left or right, and the display effect is not good. In order to solve this technical problem, other methods can also be used to calculate the first moving distance D. For example, human body recognition can also be performed on the image frame of the target video. After identifying the human body, that is, the fitness coach, the central axis of the human body is symmetrically extended to both sides until the width of the display area containing the human body is the same as the width of the target display area. At this time, the difference between the width starting point of the display area containing the human body and the width starting point of the first playback window is used as the first moving distance. The first moving distance obtained by this calculation method may be the same as the first moving distance obtained by the above-mentioned averaging method. Not the same, if the fitness coach is to the left in the media image of the target video, then the first moving distance obtained by this calculation method is smaller than the first moving distance obtained by the above-mentioned averaging method, if the fitness coach is offset in the media image of the target video. Right, then the first moving distance obtained by this calculation method is greater than the first moving distance obtained by the above-mentioned averaging method.
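The body-centered alternative can be sketched in Python as below. The horizontal position of the coach's central axis (`body_center_x`, in scaled-image pixels with the image's left edge at the first window's left edge) is assumed to come from a human body recognition step that is not shown. The optional clamp to `[0, scaled_width - target_width]` is an added assumption that keeps black areas out of the target display area; it is not part of the described method.

```python
def body_centered_shift(body_center_x, target_width, window_left=0, scaled_width=None):
    """First moving distance D that centers the detected body in the target area.

    The area containing the body is extended symmetrically from the body's
    central axis until it is target_width wide; D is the distance from the
    first window's starting point to that area's starting point.
    """
    area_start = body_center_x - target_width / 2
    shift = area_start - window_left
    if scaled_width is not None:
        # Assumption: keep the scaled image covering the whole target area.
        shift = max(0.0, min(shift, scaled_width - target_width))
    return shift


print(body_centered_shift(800, 1440))
```

For a 1440-pixel target area, a coach centered at x = 960 gives D = 240 (the same as the averaging method), a coach left of center at x = 800 gives a smaller D of 80, and one right of center at x = 1100 gives a larger D of 380.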

After obtaining the first moving distance D, in some embodiments, the display device may implement the image offset display of the target video by dynamically setting the display of the surfaceView (planar view). The offset output function of surfaceView can be: layoutParam.setMargins(0-D,0,0,0). As shown in Figure 9, the offset output function indicates that the left offset is D, so that the left starting point of the media asset image is (0-D), and the starting point on the left side of the media asset image in the first playback window is 0, Since the first playback window is displaying an image, the image needs to be displayed from the position where the starting point is 0. Therefore, according to the above-mentioned offset output function, when displaying each frame of the target video in the first playback window, the first playback window starts to display the media assets from the pixel point on the right side of the target video at the first moving distance D in the first playback window. Image, the image within the first moving distance D exceeds the display range of the first playback window, and the first playback window will not display this part of the image, which realizes the display effect of shifting the image of the target video to the left and displaying it in the target video. On the right side of the image, the first playback window still has a part of the display area, and this part of the display area can display black borders. By placing the second playback window on top, the second playback window can cover the right side of the character in the image of the target video. part of the background image and the above black border, the user will not see this part of the background image and the black border, and will not affect the user's viewing experience. Wherein, the topping method of the second playback window may be setZOrderOnTop(true) to topping.
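The geometric effect of the negative left margin can be modeled without Android by tracking which image columns land where. This Python sketch does not call setMargins or setZOrderOnTop; it only computes, in pixels, the visible image columns, the width of the black border on the right, and the image columns that remain in the unblocked area. `offset_layout` is a hypothetical name, and 1920/240/480 are assumed example values.

```python
def offset_layout(scaled_width, window_width, shift, overlay_width):
    """Model drawing an image of scaled_width at x = -shift inside a window.

    Returns (visible_src, black_border, unblocked_src, border_hidden):
      visible_src   -- [start, end) image columns shown in the first window
      black_border  -- width of the empty strip at the window's right side
      unblocked_src -- [start, end) image columns not covered by the overlay
      border_hidden -- True if the overlay fully covers the black border
    """
    # Window x where the shifted image ends (clipped to the window).
    drawn_right = min(window_width, scaled_width - shift)
    visible_src = (shift, shift + drawn_right)
    black_border = window_width - drawn_right
    unblocked_width = window_width - overlay_width
    unblocked_src = (shift, shift + min(unblocked_width, drawn_right))
    border_hidden = black_border <= overlay_width
    return visible_src, black_border, unblocked_src, border_hidden


print(offset_layout(1920, 1920, 240, 480))
```

With a 240-pixel shift, image columns 240 to 1680 fill the unblocked area, so its midpoint is column 960, the center of the original frame, while the 240-pixel black border falls entirely under the 480-pixel second window.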

With the surfaceView set dynamically as described above, the playback interface of the display device in the follow-up mode is shown in Figure 10. The central part of the target video's image is displayed in the target display area of the first playback window, and the user image is displayed in the second playback window.

In some embodiments, after receiving the target video, the display device decodes it, scales it according to the parameters of the first playback window, and then displays each image frame at the position shifted left by the first moving distance. The part of the image outside the playback area of the first playback window cannot be displayed, and the edge of the image frame close to the second playback window is blocked by the second playback window. As a result, the central area of the image frame is rendered in the unoccluded area of the first playback window.
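The decode-scale-shift sequence can be expressed as a mapping from source-frame columns to the unoccluded part of the first playback window. This sketch assumes the frame is scaled uniformly so that its width equals the width of the first playback window. `visible_source_range` is a hypothetical helper, not a function from the described system.

```python
def visible_source_range(src_width, window_width, blocked_width, d):
    """Return the (start, end) source-pixel columns shown in the unoccluded area.

    Assumes the frame is scaled so its width equals window_width, then shifted left
    by d window pixels, and that the rightmost blocked_width window pixels are covered."""
    scale = window_width / src_width                      # source px -> window px
    start = d / scale                                     # first source column inside the window
    end = (window_width - blocked_width + d) / scale      # source column at the occlusion edge
    return start, min(end, float(src_width))


# A 960-px-wide decoded frame in a 1920-px window, 640 px covered, D = 320
print(visible_source_range(960, 1920, 640, 320))  # (160.0, 800.0)
```

In this example the visible source range is centred on column 480, the middle of the 960-pixel frame.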

In some embodiments, the second playback window plays the video data captured by the camera.

As the above embodiments show, the display device detects the area in which the target video is not blocked and determines it as the target display area. After scaling the target video, it offsets the image of the target video so that it is displayed in the target display area. This solves the poor display effect caused by the second playback window partially blocking the target video in the follow-up mode, and improves the user's viewing experience.

In some embodiments, the video playback application can also score the user's actions according to preset scoring rules. Users can then learn whether their actions are standard without comparing them with the actions in the target media asset themselves.

In some embodiments, one preset scoring rule is to compare the image of the target media asset with the user image in real time. The score of the user action is determined by the similarity between the user action in the user image and the action in the target media asset: the higher the similarity, the higher the score.

However, during playback of the target media asset, the user needs some time after seeing an action to perform it. If the user image is captured too early or too late, the user's action is likely to receive a low score. In addition, the target media asset keeps playing. By the time the user performs an action, the picture may already have switched to another action, which directly lowers the user's score.

To solve this problem, another preset scoring rule is used. When the target media asset plays a specific action, the image of the target media asset at that moment is acquired, and multiple user images are then collected in succession. The action in each user image is compared with the action in the target media asset's image to obtain multiple scores. The highest of these scores is used as the score for the action, which improves scoring accuracy.
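The following is a sketch of the best-of-several scoring rule. It assumes some similarity function `score_fn(user_image, asset_image)` that returns a numeric score. The scorer itself is not specified here, so it is passed in as a parameter.

```python
from typing import Callable, Sequence, TypeVar

Img = TypeVar("Img")


def score_action(asset_image: Img, user_images: Sequence[Img],
                 score_fn: Callable[[Img, Img], float]) -> float:
    """Compare each captured user image with the asset image and keep the best score,
    so a user who reacts slightly early or late is still credited."""
    if not user_images:
        raise ValueError("no user images captured for this action")
    return max(score_fn(u, asset_image) for u in user_images)


# Toy usage: "images" are numbers and the score falls with the difference.
print(score_action(10, [4, 8, 11, 15], lambda u, a: 100 - abs(u - a)))  # 99
```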

In some embodiments, the specific actions used for scoring in the target media asset may be determined according to an action library. The action library may include multiple sample pictures of different character actions and the action data for each sample picture. The actions can be common fitness movements, such as squats and raising a hand.

In some embodiments, the action data in the action library may include the coordinate positions and types of the skeleton key points of the character in each sample picture. The skeleton key points may be obtained by a trained skeleton key point detection model. Exemplary skeleton key point types are:

- nose and neck
- left shoulder, left elbow, left wrist
- right shoulder, right elbow, right wrist
- left hip, left knee, left ankle
- right hip, right knee, right ankle
- left eye, right eye, left ear, right ear

The skeleton key point detection model can be based on a deep neural network. The network is trained on a large number of pictures with manually annotated skeleton key points, so that it learns to identify skeleton key points. Skeleton key points can also be obtained by manual annotation.

In some embodiments, the action data in the action library may further include the positional relationships between adjacent skeleton key points. Different character actions can be distinguished by these relationships.

In some embodiments, the action data in the action library also includes the action difficulty of the character action in each sample picture, which the operator can set. For example, the action difficulty may range from 0 to 10, with a larger value indicating a more difficult action.

In some embodiments, the action data in the action library further includes an action identifier. Each character action may have a different action identifier, such as an action number. The other action data and the sample picture for an action number can be quickly looked up in the action library by that number.
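One possible in-memory shape for action library entries is sketched below, under stated assumptions. Each entry holds the 18 key point types listed above, an operator-set difficulty from 0 to 10, and an action number used as the lookup key. The class and field names, and storing the sample picture as a path string, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# The 18 skeleton key point types listed above.
KEYPOINT_TYPES = (
    "nose", "neck",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
    "left_eye", "right_eye", "left_ear", "right_ear",
)


@dataclass
class ActionEntry:
    action_id: int                              # action identifier, e.g. an action number
    sample_picture: str                         # path/URI of the sample picture (hypothetical)
    keypoints: Dict[str, Tuple[float, float]]   # key point type -> (x, y)
    difficulty: int = 0                         # 0-10, set by the operator

    def __post_init__(self):
        unknown = set(self.keypoints) - set(KEYPOINT_TYPES)
        if unknown:
            raise ValueError(f"unknown key point types: {sorted(unknown)}")
        if not 0 <= self.difficulty <= 10:
            raise ValueError("difficulty must be in 0-10")


class ActionLibrary:
    """Action entries indexed by action number for fast lookup."""

    def __init__(self):
        self._by_id: Dict[int, ActionEntry] = {}

    def add(self, entry: ActionEntry) -> None:
        self._by_id[entry.action_id] = entry

    def get(self, action_id: int) -> ActionEntry:
        return self._by_id[action_id]


lib = ActionLibrary()
lib.add(ActionEntry(7, "squat.jpg", {"left_hip": (0.4, 0.5), "left_knee": (0.4, 0.7)}, difficulty=3))
print(lib.get(7).difficulty)  # 3
```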

In some embodiments, determining the image frames of the target media asset in which the scoring actions appear may be referred to as dotting the target media asset. See FIG. 11, which is an interaction diagram of dotting the target media asset according to some embodiments.

As shown in FIG. 11, the operator can dot the target media asset through the first tool processing server, the media asset service server, and the media asset content server. The roles of the three servers are as follows.

The first tool processing server performs the dotting, and the action library can be stored on it.

The media asset service server holds the media asset information of each target media asset. This information can be the original information provided by the provider of the target media asset, such as the playback address, resolution, duration, and type. It can also be information processed by the operator. For example, processed information may contain new items such as a corrected media asset type and media asset labels, with the original type deleted; the corrected type may be fitness.

The media asset content server may be the server to which the content provider uploads the video stream file and original information of the target media asset.

In Figure 11, the first tool processing server, the media asset service server, and the media asset content server are distinguished by their respective functions. In actual implementation, each server may be deployed on one hardware device or across multiple hardware devices. The three servers may also all be deployed on one hardware device. This is not specifically limited in the embodiments of the present application.

In some embodiments, the operator may input a dotting instruction for the target media asset to the first tool processing server. The dotting instruction may include the media asset ID of the target media asset. Using this ID, the first tool processing server may obtain the media asset information of the target media asset from the media asset service server.

In some embodiments, after the content provider uploads a new media asset to the media asset content server, the media asset service server may generate media asset information from the asset's original information. The first tool processing server can monitor this newly generated information in real time. It uses the media asset type to decide whether the new media asset is a target media asset:

- If the type is a preset dotting type, such as fitness, the new media asset becomes a target media asset.
- Otherwise, the new media asset is skipped, and the first tool processing server moves on to the next newly uploaded media asset.
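A minimal sketch of this filtering step follows. It assumes that media asset information arrives as dictionaries with a `"type"` field and that fitness is the only preset dotting type. Both the field name and the type set are assumptions for illustration.

```python
PRESET_DOTTING_TYPES = {"fitness"}  # assumption: the preset type(s) that need dotting


def select_target_assets(new_asset_infos):
    """Yield newly generated media asset info whose type is a preset dotting type;
    other assets are skipped and the next one is examined."""
    for info in new_asset_infos:
        if info.get("type") in PRESET_DOTTING_TYPES:
            yield info


infos = [{"id": 1, "type": "fitness"}, {"id": 2, "type": "movie"}]
print([i["id"] for i in select_target_assets(infos)])  # [1]
```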

In some embodiments, the content provider may already have dotted a new media asset when uploading it to the media asset content server. In that case, it sets a dot label in the asset's original information to indicate that the asset has been dotted.

When the media asset service server processes the original information into media asset information, it checks any dot label it finds. A label that conforms to a preset specification is retained, for example one containing the timeline playback times of the dotted video frames. A non-conforming label is deleted.

When processing a target media asset, the first tool processing server can therefore check whether its media asset information contains a dot label. If it does, the target media asset has already been dotted. If it does not, the media asset is treated as a target media asset to be dotted.

In some implementations, the media asset type of a newly uploaded asset is initially not a dotting type. After some time, the media asset service server re-examines the media asset information and adds a type attribute marking the asset as a dotting type. In this case, the first tool processing server can monitor changed media asset information on the media asset service server in real time. If the changed information has a dotting type and no dot label, the media asset is determined to be a target media asset to be dotted.

In some embodiments, after dotting a media asset, the content provider may also generate a dot file and store it in the asset's original information. The media asset service server can retain the dot file in the media asset information when processing the original information. When processing a target media asset, the first tool processing server can therefore check whether its media asset information contains a dot file. If it does, the target media asset has already been dotted. If there is neither a dot file nor a dot label, the media asset is treated as a target media asset to be dotted.

In some embodiments, the first tool processing server may determine that a target media asset has already been dotted. If the media asset information was obtained in response to a dotting instruction, the server may generate a prompt message so that the operator knows the asset has already been dotted. If the information was obtained automatically from the media asset service server, the target media asset can simply be skipped and the next one processed.
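The following sketch shows the "already dotted?" check and the prompt-or-skip branch described in the preceding paragraphs. It assumes that the media asset information is a dictionary and that the dot label and dot file are stored under the hypothetical keys `"dot_label"` and `"dot_file"`.

```python
def is_dotted(info: dict) -> bool:
    """An asset counts as dotted if its media asset information carries a dot label or a dot file."""
    return bool(info.get("dot_label") or info.get("dot_file"))


def next_step(info: dict, requested_by_operator: bool) -> str:
    """Decide what the first tool processing server does with one target media asset."""
    if not is_dotted(info):
        return "dot"
    # Already dotted: tell the operator if they asked for it, otherwise move on silently.
    return "notify_operator" if requested_by_operator else "skip"


print(next_step({}, True))                        # dot
print(next_step({"dot_file": "a.json"}, True))    # notify_operator
print(next_step({"dot_label": [1200]}, False))    # skip
```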

In some embodiments, the content provider's dotting method may differ from that of the first tool processing server. After learning that the target media asset has already been dotted, the operator can therefore input a re-dotting instruction to the first tool processing server so that it dots the asset again.

In some embodiments, after confirming that the target media asset needs to be dotted, the first tool processing server proceeds as follows:

1. It obtains the video stream file of the target media asset from the media asset content server, using the playback address in the media asset information.
2. It parses the video stream file to obtain the video frames of the target media asset.
3. It performs character action recognition on the video frames one by one.
4. If a recognized character action is one of the character actions in the action library, it generates a dot record that includes at least the playback time of the video frame.

In some embodiments, the first tool processing server may detect the skeleton key points in a video frame using the trained skeleton key point detection model. It then compares the relative positions of adjacent key points in the video frame with those of the corresponding key points in each sample picture in the action library. For example, suppose the left shoulder, left elbow, and left wrist key points lie on a straight line in a video frame of the target media asset. If the same three key points also lie on a straight line in the action data of a sample picture, the action in the video frame can be regarded as extending the left arm.
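One way to test the "on a straight line" relationship is to measure the angle at the middle key point and treat values near 180 degrees as collinear. This is a sketch of that check. The 10-degree tolerance and the dictionary-of-coordinates input are assumptions, not values given in the text.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident key points")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def is_straight(a, b, c, tol_deg=10.0):
    """Treat three key points as collinear if the angle at the middle one is near 180 degrees."""
    return abs(180.0 - joint_angle(a, b, c)) <= tol_deg


def matches_left_arm_extended(frame_kp, sample_kp, tol_deg=10.0):
    """Frame and sample both show a straight left shoulder-elbow-wrist chain."""
    chain = ("left_shoulder", "left_elbow", "left_wrist")
    return (is_straight(*(frame_kp[k] for k in chain), tol_deg=tol_deg)
            and is_straight(*(sample_kp[k] for k in chain), tol_deg=tol_deg))
```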

In some embodiments, the first tool processing server may detect that the character action in a video frame is one of the character actions in the action library. It then obtains the playback time of that video frame and the action identifier of the matched action, and generates a dot record containing both. The interval between adjacent video frames is usually only tens of milliseconds, so the playback time in the dot record can be accurate to the millisecond. This makes it easy to locate the video frame.
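A dot record can be as small as the pair of values named above. This sketch assumes a constant frame rate when converting a frame index to a millisecond playback time. A real decoder would normally supply presentation timestamps directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DotRecord:
    play_time_ms: int   # playback time of the dotted video frame, in milliseconds
    action_id: int      # identifier of the matched action in the action library


def frame_play_time_ms(frame_index: int, fps: float) -> int:
    """Playback time of a frame, rounded to the millisecond (assumes a constant frame rate)."""
    return round(frame_index * 1000 / fps)


print(DotRecord(frame_play_time_ms(250, 25), 7))  # DotRecord(play_time_ms=10000, action_id=7)
```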

In some embodiments, the dotted video frames in a target media asset may be too dense. Users may then be unable to keep up with the actions during playback and receive low scores as a result.

To avoid this, after detecting that the character action in a video frame is one of the character actions in the action library, the first tool processing server can first check a dotting condition. If the condition is met, it dots the frame. If not, it skips the video frame and continues with the next one.

An exemplary dotting condition is that the frame's playback time exceeds the playback time in the previous dot record by more than a preset time. At most one dot is therefore made within the preset time, which can be set to 10 seconds or another duration.

In some embodiments, to keep the dotted video frames from being too dense, no action recognition is performed on the video frames within the preset time after a dot. Character action recognition resumes with the video frames after the preset time.
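The following is a sketch of the minimum-interval dotting loop. It follows the second variant, in which frames inside the preset time after a dot are not even passed to recognition. `recognize` is a stand-in for the key point comparison and returns an action identifier or `None`. A frame is dotted only if its playback time exceeds the previous dot's time by more than `min_interval_ms`.

```python
def dot_video(frames, recognize, min_interval_ms=10_000):
    """frames: iterable of (play_time_ms, image); recognize(image) -> action_id or None.

    Returns (play_time_ms, action_id) dot records, at most one per min_interval_ms."""
    records = []
    last_dot = None
    for t, image in frames:
        if last_dot is not None and t - last_dot <= min_interval_ms:
            continue  # inside the preset time after the previous dot: skip recognition
        action_id = recognize(image)
        if action_id is not None:
            records.append((t, action_id))
            last_dot = t
    return records


# One frame per second for 25 s, every frame recognised as action 1
frames = [(t, t) for t in range(0, 26_000, 1000)]
print(dot_video(frames, lambda img: 1))  # [(0, 1), (11000, 1), (22000, 1)]
```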

In some embodiments, aggregation happens once detection finishes. This may be after all video frames of the target media asset have been detected, or after detection reaches the frames within a preset time of the last video frame. The first tool processing server then aggregates the dot records with the timeline of the target media asset to generate a dot file and/or a dot label. These are stored in the media asset information of the target media asset.
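The text does not define a dot file format. The sketch below serialises the dot records against the asset's timeline as JSON, and builds a dot label as the list of dotted playback times. The schema and field names are entirely hypothetical.

```python
import json


def build_dot_file(asset_id, duration_ms, records):
    """Serialise (play_time_ms, action_id) dot records along the asset timeline (hypothetical schema)."""
    marks = sorted(records)  # order by playback time
    return json.dumps({
        "asset_id": asset_id,
        "duration_ms": duration_ms,
        "dots": [{"play_time_ms": t, "action_id": a} for t, a in marks],
    })


def build_dot_label(records):
    """A compact dot label: just the dotted playback times on the timeline."""
    return [t for t, _ in sorted(records)]


doc = json.loads(build_dot_file("A1", 60_000, [(11_000, 2), (0, 1)]))
print(doc["dots"][0])                             # {'play_time_ms': 0, 'action_id': 1}
print(build_dot_label([(11_000, 2), (0, 1)]))     # [0, 11000]
```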

In some embodiments, only a dot file may be generated without a dot label, or only a dot label without a dot file.

In some embodiments, after generating the dot file and/or dot label, the first tool processing server may generate a notification message if dotting was performed in response to a dotting instruction. This lets the operator know that dotting of the target media asset is complete. If the target media asset was identified automatically, the server can move on to the next target media asset.

In some embodiments, the first tool processing server may further generate a dotting library for the target media asset from the action data of the character actions in the dot records. It can store the dotting library in the media asset information of the target media asset on the media asset service server. The media asset service server can be configured to send the dotting library to the display device along with the media asset information. The first tool processing server may also store the dotting library locally.

When the user watches the target media asset in the follow-up mode, the display device can use the dot records obtained in the above embodiments. When playback reaches the playback time of a video frame in a dot record, the display device collects user images and compares the user actions in them with the action in the media asset. The comparison results can be used to score the user actions.

FIG. 12 is a schematic diagram of the scoring interaction for a target media asset according to some embodiments. As shown in FIG. 12, a user may watch a target media asset from the media asset content server on the display device. The second tool processing server may then interact with the display device to score the user's actions and generate a follow-up practice record. It sends this record back to the display device for display.

In Figure 12, the second tool processing server and the media asset content server are distinguished according to their respective functions. In actual implementation, each server may be deployed on one hardware device or may be deployed on multiple hardware devices. Both servers may also be deployed on one hardware device, which is not specifically limited in this embodiment of the present application.

In some embodiments, the display device can detect the dot label in the media asset information and confirm from it that the target media asset supports action scoring. It then obtains the dot file from the media asset information to get the dot records of the target media asset.

In some embodiments, the display device can also check whether the media asset information contains a dot file and/or a dot label. If so, it obtains the dot records of the target media asset from the dot file and/or dot label.

During the playback of the target video, the user can follow the target video to make corresponding actions.

In some embodiments, when the display device detects that the target video has reached the time of a dot record, it may acquire the media asset image of the target video at that moment. It then collects multiple user images over time and sends the media asset image and the user images to the second tool processing server.

For example, the display device can upload one user image at each time interval, up to a preset number of user images per dot record. The interval may be 100 milliseconds with a preset number of 10, or 50 milliseconds with a preset number of 20.
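The upload schedule for one dot record follows directly from the interval and the preset number. This sketch assumes that the first user image is taken at the dot time itself. The text does not say whether the first upload happens at the dot or one interval later.

```python
def capture_times_ms(dot_time_ms, interval_ms=100, count=10):
    """Times at which user images are captured and uploaded for one dot record
    (assumption: the first image is taken at the dot time itself)."""
    return [dot_time_ms + i * interval_ms for i in range(count)]


print(capture_times_ms(5000)[:3], capture_times_ms(5000)[-1])   # [5000, 5100, 5200] 5900
print(len(capture_times_ms(0, 50, 20)), capture_times_ms(0, 50, 20)[-1])  # 20 950
```

Under this assumption, both example settings spread the uploads over roughly the first second after the dot.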

In some embodiments, after receiving the media asset image and the user images, the second tool processing server may compare each user image with the media asset image in chronological order, obtaining an action score for each user image. For example, the comparison may proceed as follows:

1. The trained skeleton key point detection model detects the skeleton key points in the user image and in the media asset image.
2. The relative positions of adjacent key points in the user image are compared with those of the corresponding key points in the media asset image. In other words, the action data of the two images are compared, giving the errors of the relative positions.
3. The similarity between the user action and the media asset action is computed from these errors and the action difficulty of the media asset image.
4. The action score of the user action is derived from the similarity.

The mapping from relative position errors to similarity, and from similarity and action difficulty to action score, can be defined in advance and adjusted. For example, if the same number of relative positions have errors within a preset range, a more difficult action yields a higher action score.
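The mappings are left configurable in the text, so the following sketch picks simple hypothetical ones. Similarity is the share of adjacent key point pairs whose relative-position error is within a tolerance. The score is `similarity * (80 + 2 * difficulty)`, so a higher difficulty gives a higher score at equal similarity, with a maximum of 100. The pair list, the tolerance and both formulas are illustrative assumptions.

```python
import math

# A subset of adjacent skeleton key point pairs (illustrative).
ADJACENT_PAIRS = (
    ("neck", "left_shoulder"), ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("neck", "right_shoulder"), ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
)


def relative_positions(kp):
    """Vector from each key point to its adjacent key point; pairs missing a point are skipped."""
    return {(a, b): (kp[b][0] - kp[a][0], kp[b][1] - kp[a][1])
            for a, b in ADJACENT_PAIRS if a in kp and b in kp}


def action_score(user_kp, asset_kp, difficulty, tol=0.05):
    """Hypothetical mappings: similarity = share of pairs with error <= tol;
    score = similarity * (80 + 2 * difficulty), i.e. at most 80-100 for difficulty 0-10."""
    user_rel, asset_rel = relative_positions(user_kp), relative_positions(asset_kp)
    common = user_rel.keys() & asset_rel.keys()
    if not common:
        return 0.0
    within = sum(1 for p in common if math.dist(user_rel[p], asset_rel[p]) <= tol)
    similarity = within / len(common)
    return similarity * (80 + 2 * difficulty)
```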

In some embodiments, the upload volume from the display device can be reduced to improve scoring efficiency. The display device may send a playback instruction for the target media asset to the second tool processing server. In response, the second tool processing server downloads the action library from the media asset service server or the first tool processing server. The display device then uploads the user images and the action identifier, without the media asset image. The second tool processing server uses the action identifier to find the corresponding sample picture in the action library. It compares the action data of each user image with that sample picture's action data to obtain the action score.

In some embodiments, the second tool processing server may instead download the dotting library from the first tool processing server in response to the playback instruction. It then compares the action data of the user image with the corresponding action data in the dotting library of the target media asset to obtain the action score. Because the action library may be large, this avoids slow downloads of it and slow lookups in it.

In some embodiments, the action library and/or the dotting library can also be stored directly on the second tool processing server. This saves the time the server would otherwise spend downloading them.

In some embodiments, the second tool processing server stops comparing subsequent user images with the media asset image when a termination condition for the current comparison is reached. For example, the termination condition may be any of the following:

- A first preset number of user images have been compared. The first preset number may be 10.
- Comparison for the next action is needed, for example because the next media asset image has been received.
- A second preset number of consecutive action scores show a downward trend. The second preset number may be 3.
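The three example termination conditions can be combined into a single check, sketched below. It assumes that a "downward trend" means the last N scores are strictly decreasing, as in an 85, 83, 80 sequence. The function names are hypothetical.

```python
def is_declining(scores, n):
    """True if the last n scores are strictly decreasing."""
    if len(scores) < n:
        return False
    tail = scores[-n:]
    return all(a > b for a, b in zip(tail, tail[1:]))


def should_stop(scores, next_action_arrived, max_images=10, declining_run=3):
    """Stop comparing once enough images were scored, the next action has started,
    or the most recent scores keep falling."""
    return (len(scores) >= max_images
            or next_action_arrived
            or is_declining(scores, declining_run))


print(should_stop([85, 83, 80], False))   # True: three falling scores
print(should_stop([80, 83, 85], False))   # False
```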

Users need some time after seeing the image of the target media asset to perform the action. After completing it, they may return to an initial state (such as standing upright) or move on to the next action. The action scores of the user images collected over time, in chronological order, therefore roughly form a downward-opening parabola.

The vertex of the parabola is the highest action score, which can be used as the follow-up score for this action. The follow-up score can also be determined in other ways. For example, the few lowest scores can be removed, and the average of the remaining scores used as the follow-up score.
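Both ways of turning a run of per-image scores into one follow-up score are sketched below. The maximum is the parabola's vertex. The alternative drops the `drop_lowest` lowest scores and averages the rest. The parameter name is an assumption.

```python
def follow_score(scores, drop_lowest=0):
    """Follow-up score for one action: the maximum score, or (if drop_lowest > 0)
    the mean of the scores left after removing the drop_lowest lowest ones."""
    if not scores:
        raise ValueError("no scores for this action")
    if drop_lowest == 0:
        return max(scores)
    kept = sorted(scores)[drop_lowest:]
    if not kept:
        raise ValueError("drop_lowest removes every score")
    return sum(kept) / len(kept)


scores = [60, 75, 88, 92, 85, 70]              # rises then falls over time
print(follow_score(scores))                    # 92
print(follow_score(scores, drop_lowest=2))     # 85.0
```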

In some embodiments, FIG. 13 shows a follow-up mode control method executed by the controller 250 of the display device; that is, the controller 250 is the execution subject of the method. The method includes the following steps:

Step S10: in response to an operation that starts a training item video, display the training item video in the first window of the follow-up interface. In the second window, display the local image from the video stream collected and sent by the image collector. This step is the basis and premise of the user's follow-up training: it lets the user train under the guidance of the follow-up interface.

Step S20: in response to the training item video being played to a key frame, periodically acquire follow-up images corresponding to the key frame from the video stream.

Step S30: compare the follow-up action in each follow-up image with the standard action in the key frame, and obtain the training score of the follow-up action in each follow-up image.

If the training item video has not reached the key frame of a dotting position, it keeps playing until a dotting position is reached. In the present application, each time the training item video plays a key frame, the follow-up images for that key frame are acquired periodically from the video stream.

For example, a preset period can be set, and one follow-up image frame acquired per period. The follow-up image for a key frame is an image collected while the user imitates the standard action shown in that key frame. Suppose one follow-up image frame is acquired every 100 ms. The follow-up action of the human body identified in each frame is compared with the standard action, and the training score of each frame's follow-up action is obtained.

In some embodiments, a timestamp is set for each frame in the video stream collected by the image collector 232. Each timestamp is the frame's acquisition time with delay compensation applied. The compensation removes the delay caused by transmitting the image from the image collector 232 to the controller 250.

Before periodically acquiring follow-up images, the controller 250 compares the frame timestamps with the physical time at which the key frame is played. This locates the follow-up images corresponding to the key frame accurately. The present application locates follow-up images by physical time, that is, the time of the dotting position, rather than by progress bar time. This more precise time matching improves the accuracy of follow-up image acquisition.

For example, the image transmission delay may be about 150 ms, meaning each frame reaches the controller 250 150 ms after capture. The timestamp of each frame can then be set to its acquisition time advanced by 150 ms.
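The following sketch shows delay compensation followed by periodic lookup against the key frame's physical time. It assumes the controller knows when each frame arrived, and it uses the example 150 ms delay. Frames are located with Python's standard `bisect` module on the sorted compensated timestamps. `compensate` and `periodic_indices` are hypothetical names.

```python
from bisect import bisect_left

TRANSMISSION_DELAY_MS = 150  # example delay from the text; device-specific in practice


def compensate(arrival_ms, delay_ms=TRANSMISSION_DELAY_MS):
    """Timestamp a frame by moving its arrival time back by the transmission delay."""
    return arrival_ms - delay_ms


def periodic_indices(timestamps_ms, start_ms, period_ms=100, count=10):
    """Indices of the first frame at or after start_ms + k * period_ms, for k < count.
    timestamps_ms must be sorted ascending; stops when the stream runs out."""
    out = []
    for k in range(count):
        i = bisect_left(timestamps_ms, start_ms + k * period_ms)
        if i == len(timestamps_ms):
            break
        if not out or out[-1] != i:  # avoid picking the same frame twice
            out.append(i)
    return out


stamps = [compensate(t) for t in range(1000, 1240, 40)]  # arrivals every 40 ms
print(stamps)                                            # [850, 890, 930, 970, 1010, 1050]
print(periodic_indices(stamps, 900, 100, 3))             # [2, 4]
```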

Step S40: calculate the action matching degree between the standard action and the follow-up action, based on the maximum training score of the follow-up action across the follow-up images.

In some embodiments, before step S40, the method further includes: in response to a termination condition being reached, stopping the acquisition of follow-up images from the video stream.

In some embodiments, the controller 250 determines that the termination condition is reached when the training item video plays to the next key frame. Acquisition of follow-up images must stop before the video switches from the standard action at the current dotting position to the next one. This ensures the collected follow-up images relate to the current standard action. In this case, acquisition is bounded by the dotting positions, and all frames between two dotting positions are acquired.

There is a time interval between the key frames of two adjacent dotting positions. For example, the "1/4 lunge squat" action is held for 20 seconds before the next standard action, so the two dotting positions are 20 seconds apart. At one follow-up image frame every 100 ms, that interval yields 200 frames, and a training score must be calculated for each of them. This wastes the computing and processing resources of the controller 250 and lowers the efficiency of action scoring and action matching.

In addition, the applicant's research found that during follow-up training, the training score of each action roughly follows a downward-opening parabola. While the user adjusts their limbs toward the standard action according to the action standardization prompts, the training score rises. When the user gets tired or wants to switch actions, the follow-up action matches the standard action less well, and the score falls. Once the score shows a significant downward trend, or is low, there is no need to keep acquiring follow-up images from the video stream. The score of each follow-up action also needs to take the user's reaction time into account.

Therefore, in some embodiments, each time the controller 250 acquires a frame of follow-up image, it increments the count of acquired frames by 1, so that the number of acquired frames is accurately recorded and updated while follow-up images are being acquired. When the number of acquired frames is detected to be equal to the second quantity threshold, it is determined that the termination condition is reached. The second quantity threshold is a preset value, such as 10 frames, that limits the maximum number of follow-up images the controller 250 may acquire for each dotting position; its value is not limited. In this case, the quantity threshold constrains the acquisition of follow-up images: the number of follow-up images obtained equals the second quantity threshold, and subsequent frames are filtered out. This terminates relatively ineffective follow-up image acquisition and score determination in time, improves the matching degree of the user's score, improves the efficiency of calculating scores and action matching degrees, and improves the user experience.
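The frame-count termination condition can be sketched as a small counter. This is a minimal illustration, not the applicant's implementation: the class and method names are hypothetical, and 10 is only the example value for the second quantity threshold given in the text.

```python
class FrameCountTermination:
    """Hypothetical sketch: stop acquiring once the second quantity threshold is hit."""

    def __init__(self, second_quantity_threshold: int = 10) -> None:
        if second_quantity_threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = second_quantity_threshold
        self.acquired_frames = 0

    def on_frame_acquired(self) -> bool:
        """Count one acquired follow-up image; return True once termination is reached."""
        self.acquired_frames += 1
        # The text says "equal to"; >= also covers any overshoot defensively.
        return self.acquired_frames >= self.threshold
```

A fresh counter is assumed per dotting position, so the threshold caps the frames scored for each standard action.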

Alternatively, in some embodiments, each time the controller 250 acquires a frame of follow-up image, it matches the training score of the corresponding follow-up action, so as to obtain the variation trend of the training scores. For example, if the i-th frame scores 85 points, the (i+1)-th frame 83 points, and the (i+2)-th frame 80 points, the training score is clearly decreasing. The lower the training score, the lower the matching degree between the follow-up action and the standard action, so higher training scores should be retained and lower ones filtered out. To this end, if the training scores corresponding to M consecutive frames of follow-up images show a decreasing trend, it is determined that the termination condition is reached, where M is the first quantity threshold, for example 3; the value of M is not limited. In this case, the score trend/trajectory during the user's follow-up practice is used to terminate relatively ineffective follow-up image acquisition and score determination in time, which improves the matching degree of the user's score, improves the efficiency of calculating scores and action matching degrees, and improves the user experience.
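The trend-based termination condition can be sketched with a sliding window of the last M scores. This assumes "decreasing trend" means strictly decreasing across the M consecutive frames, as in the 85, 83, 80 example; the names are hypothetical.

```python
from collections import deque
from typing import Deque


class DecreasingTrendTermination:
    """Hypothetical sketch: stop when the last M training scores strictly decrease."""

    def __init__(self, first_quantity_threshold: int = 3) -> None:
        if first_quantity_threshold < 2:
            raise ValueError("a trend needs at least two frames")
        self.m = first_quantity_threshold
        # Only the most recent M scores are kept.
        self.recent: Deque[float] = deque(maxlen=first_quantity_threshold)

    def on_score(self, score: float) -> bool:
        """Record one frame's training score; return True once termination is reached."""
        self.recent.append(score)
        if len(self.recent) < self.m:
            return False
        values = list(self.recent)
        return all(a > b for a, b in zip(values, values[1:]))
```

With M = 3, feeding 85, 83 and 80 returns True on the third score, while a sequence that rises in between (80, 85, 83, 84) never triggers termination.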

Whichever embodiment of the termination condition is used, it is necessary to determine whether the termination condition is reached while follow-up images are being acquired. If it is not reached, follow-up image frames continue to be acquired every preset period and their training scores matched; if it is reached, acquisition of follow-up images from the video stream stops, and step S40 is executed.

Whatever termination condition is adopted, suppose that when it is reached, N frames of follow-up images have been obtained in total, and the training score of the follow-up action in the j-th frame is $\mathrm{Score}_j$, $1 \le j \le N$. In this application, the score of the user's follow-up action imitating the standard action in the key frame is named the target score, which is the maximum training score of the follow-up action: target score $= \max_{1 \le j \le N} \mathrm{Score}_j$.
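The acquire, score, check and select loop can be sketched as below. This is an assumption-laden illustration: `frames` stands in for images sampled every preset period between the current and the next dotted key frame (running out of frames models reaching the next key frame), and `score_frame` and `termination_reached` are hypothetical callables for the comparison against the key frame and for either termination condition.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

F = TypeVar("F")  # type of one captured follow-up image (hypothetical)


def select_target_score(
    frames: Iterable[F],
    score_frame: Callable[[F], float],
    termination_reached: Callable[[List[float]], bool],
) -> Optional[Tuple[float, F]]:
    """Score frames in acquisition order until termination; return (target score, target image)."""
    scores: List[float] = []
    best: Optional[Tuple[float, F]] = None
    for frame in frames:
        score = score_frame(frame)
        scores.append(score)
        # Keep only the best frame so far; the other frames can be discarded.
        if best is None or score > best[0]:
            best = (score, frame)
        if termination_reached(scores):
            break
    # None means no follow-up image was acquired for this key frame.
    return best
```

For example, with scores 70, 85, 83, 80, 90 and a "three consecutive falling scores" condition, acquisition stops at the fourth frame:

```python
frames = ["f1", "f2", "f3", "f4", "f5"]
scores = {"f1": 70, "f2": 85, "f3": 83, "f4": 80, "f5": 90}
falling_for_3 = lambda s: len(s) >= 3 and s[-3] > s[-2] > s[-1]
print(select_target_score(frames, scores.__getitem__, falling_for_3))  # (85, 'f2')
```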

In some embodiments, after the target score of the user's follow-up of the standard action at a dotting position is obtained by matching, the target score may be recorded to facilitate subsequent statistics of the final score.

Step S50, controlling the display to display action matching prompt information in the second window according to the action matching degree.

In some embodiments, the action matching prompt information includes the accuracy rate of the action. The action matching degree between the standard action and the follow-up action can be calculated from the target score and displayed at a designated position in the second window of the follow-up interface, where it is presented to the user as an accuracy rate (a ratio value).

In some embodiments, the action matching prompt information further includes encouraging words. According to the action matching degree, matching encouraging words such as "Good", "Great", and "Perfect" can be displayed in the second window of the follow-up interface, each corresponding to a range of action matching degrees. For example, when the action matching degree is more than 90%, the encouraging word "Perfect" is displayed.
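The range-to-word mapping can be sketched as an ordered band table. Only the "more than 90% gives Perfect" band comes from the text; the 75% and 60% bounds for "Great" and "Good" are hypothetical placeholders, and returning `None` (no encouraging word) below the lowest band is likewise an assumption.

```python
from typing import Optional, Sequence, Tuple

# (exclusive lower bound in percent, word), checked from highest to lowest.
# Only the 90 -> "Perfect" band is from the text; the others are hypothetical.
ENCOURAGEMENT_BANDS: Sequence[Tuple[float, str]] = (
    (90.0, "Perfect"),
    (75.0, "Great"),
    (60.0, "Good"),
)


def encouragement_for(matching_degree_percent: float) -> Optional[str]:
    """Return the encouraging word for an action matching degree, or None."""
    for lower_bound, word in ENCOURAGEMENT_BANDS:
        if matching_degree_percent > lower_bound:
            return word
    return None
```

Keeping the bands in a table rather than an if/elif chain makes each word's range easy to tune without touching the lookup logic.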

In some embodiments, the action matching prompt information further includes action standardization prompt information. The action matching degree measures how far the follow-up action deviates from the standard action; for example, when the action matching degree is lower than a preset threshold, the user's follow-up action is not standard, and the user needs to be prompted to correct it. While the follow-up images are displayed in the second window, action standardization prompt information is displayed in the second window of the follow-up interface according to the deviations in position, limb posture, and the like between the standard action and the follow-up action in each frame of follow-up image. This lets users see the deficiencies of their follow-up actions and correct and adjust them, raising the training score of the follow-up action until the highest score of the action (that is, the target score) is reached.

It should be noted that the action matching prompt information is not limited to the examples in the above embodiments. Any information content determined by analyzing the action matching degree falls within the category of action matching prompt information and can be displayed in the follow-up interface according to actual needs.

In some embodiments, since the target score is the maximum training score among the N frames of follow-up images obtained, only the target follow-up image corresponding to the target score may be retained, and the other N-1 frames may be deleted. In this way, after acquisition of follow-up images terminates, the second window of the follow-up interface displays only the best follow-up action, the one that matches the standard action most closely, and the retained target follow-up image is displayed whenever the follow-up images for that standard action are viewed.

Before acquisition of follow-up images terminates, the second window of the follow-up interface displays each frame of follow-up image in acquisition order, and displays action standardization prompt information according to the deviation between the follow-up action and the standard action, so that the user can gradually correct the follow-up action until the best follow-up action is reached. After the termination condition is reached, the second window keeps displaying only the target follow-up image corresponding to the best follow-up action/target score until the next dotted key frame arrives, at which point follow-up image acquisition, the UI change of the second window, and the above scoring and selection process start again.

In some embodiments, when the training item video is played to the end, that is, when the progress bar reaches its end point, follow-up practice of the current training item ends, and the user's final score for the follow-up practice needs to be counted. Alternatively, when the user exits the training item video after the follow-up time exceeds a preset duration, for example after practicing for 2 minutes, the follow-up practice also ends and the user's final score is counted.

In some embodiments, in response to the training item video being played to the end point, since the user's follow-up practice has traversed the standard actions at all dotting positions in the video, the controller 250 counts the training scores (that is, the target scores) of the target follow-up images corresponding to all key frames in the training item video, and accumulates them with weights to obtain the final score. In other words, the cumulative sum of the target scores the user obtains for each standard action is the final score of this follow-up training.

In some embodiments, the controller 250 responds to an operation of exiting the training item video after the follow-up time exceeds the preset duration. In this case the user has followed only part of the video, longer than the preset duration, without traversing the standard actions of the entire video, so the controller counts the training scores of the target follow-up images of the key frames traversed so far and accumulates them with weights to obtain the final score. For example, if the user quits the follow-up practice after 3 minutes, and 6 dotted key frames were traversed in those 3 minutes, the user has completed imitation training of 6 standard actions at exit, and the cumulative sum of the target scores of these 6 actions is the final score of this follow-up practice.
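Both final-score cases reduce to a weighted sum over the target scores recorded so far. In this sketch, which is illustrative only, `target_scores` is assumed to map each traversed key frame (dotting position) to its recorded target score, so the same function serves a full playback and an early exit. Default weights of 1 give the plain cumulative sum the text describes; per-key-frame weights are a hypothetical extension.

```python
from typing import Mapping, Optional


def final_score(
    target_scores: Mapping[int, float],
    weights: Optional[Mapping[int, float]] = None,
) -> float:
    """Accumulate the target scores of the key frames traversed so far."""
    total = 0.0
    for key_frame, score in target_scores.items():
        # A missing weight defaults to 1, i.e. a plain cumulative sum.
        weight = 1.0 if weights is None else weights.get(key_frame, 1.0)
        total += weight * score
    return total
```

For the 3-minute example with 6 traversed key frames, only those 6 entries would be present in `target_scores`.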

In some embodiments, after the final score of each follow-up practice is counted, a training report can also be output so that the user can learn the details of the current training. The controller 250 controls the display 275 to display a training report interface as shown in FIG. 17. The training report interface shows the user the final score together with information such as the energy consumed by the follow-up training, the accuracy rate of the actions, and the training duration. The training report interface can also include a retraining control and a switching control. When triggered, the retraining control repeats the training process of the training item that has just ended; the switching control switches to the next training item video in the training item list and starts the control flow of the follow-up training mode for that video.

In some embodiments, while the training report interface is displayed, the user can press the “return” button on the remote control to exit the training report interface and return to the training item list interface shown in FIG. 12. The user can then either select another video to follow from the training item list or stop training and exit the fitness application.

Even when the image collector is not affected by external factors such as power failure, being switched off by a physical switch, malfunction, or occupation by another application, the follow-up user is moving and may leave the portrait capture area/aperture during follow-up practice. In that case no actual portrait can be recognized or detected in the follow-up images collected by the image collector 232, and the follow-up training is invalid.

In some embodiments, to avoid invalid follow-up training, the controller 250 controls the display 275 to pause the training item video played in the first window, and at the same time pauses the statistics accumulated during the follow-up practice, such as the training duration, energy consumption, accuracy rate, target scores corresponding to the dotted key frames, and final score, so that the follow-up mode is suspended while waiting for the user's portrait to be recognized again. Meanwhile, the second window is controlled to display prompt information asking the user to move into the portrait capture area/aperture for re-recognition; the UI at this time is as shown in FIG. 18. When the user's portrait is successfully recognized, the training item video resumes in the first window from the current paused frame, and the training duration, energy consumption, accuracy rate, target scores, final score, and so on continue to be counted on the basis of the original training data.
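The pause/resume behaviour can be sketched as a small session object that freezes its statistics while no portrait is recognized. This is a hypothetical sketch: the field names, the millisecond playback position and the kcal unit are assumptions, and only training duration and energy are modelled; the other statistics would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class FollowUpSession:
    """Hypothetical sketch of follow-up statistics that freeze while paused."""

    training_seconds: float = 0.0
    energy_kcal: float = 0.0
    paused: bool = False
    pause_position_ms: int = 0

    def on_portrait_lost(self, playback_position_ms: int) -> None:
        # Remember the current frame and stop accumulating statistics.
        self.paused = True
        self.pause_position_ms = playback_position_ms

    def on_portrait_recognized(self) -> int:
        """Leave the paused state; return the position to resume playback from."""
        self.paused = False
        return self.pause_position_ms

    def tick(self, seconds: float, kcal: float) -> None:
        # Statistics build on the original data only while not paused.
        if not self.paused:
            self.training_seconds += seconds
            self.energy_kcal += kcal
```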

As can be seen from the technical solutions of the above embodiments, in the present application, when the training item video is played to a key frame, a frame of follow-up image is acquired every preset period and compared with the key frame to give the training score of the follow-up action in each follow-up image. When the termination condition is met, acquisition of follow-up images from the video stream stops, the highest training score among the acquired frames is selected as the target score for this key frame, and the action matching degree is calculated from the target score. The termination condition constrains follow-up image acquisition by the second quantity threshold or by the variation trend/trajectory of the training scores, so acquisition and scoring of invalid follow-up images can be terminated in time, which reduces the processing resources consumed by the controller 250 and improves the scoring efficiency and accuracy of the follow-up mode. By collecting a certain number of follow-up images within a certain period and taking the highest score as the target score for the standard action, this application keeps the score of each action at its best matching degree and avoids low scores caused by delays in the user's response, which improves the user experience and the user's confidence and enthusiasm for training.

In some embodiments, after obtaining a follow-up score, the second tool processing server can send it to the display device, so that the display device can display a score prompt corresponding to the score. Referring to FIG. 14, the score prompt can be "GOOD" and can be superimposed on the user's image.

In some embodiments, after obtaining a follow-up score, the second tool processing server may calculate the accuracy rate of the user's actions from the follow-up scores accumulated since the target media asset started playing, and send the accuracy rate to the display device for display.

In some embodiments, if the user wants to stop follow-up practice while the target media asset is playing, the user can input an instruction to end video playback to the display device. The display device then sends information indicating the end of the follow-up practice to the second tool processing server. On receiving this information, the second tool processing server generates a follow-up practice record from all the follow-up scores and sends it to the display device, so that the display device can display the follow-up practice record to the user.

In some embodiments, after playback of the target media asset ends, the display device may send information indicating the end of the follow-up practice to the second tool processing server. On receiving this information, the second tool processing server generates a follow-up practice record from all the follow-up scores and sends it to the display device, so that the display device can display the follow-up practice record to the user.

Referring to FIG. 15, which is a schematic diagram of an exemplary follow-up record interface, the follow-up record can display the training score, energy consumption, accuracy rate, and training duration. Exemplarily, the training score can be the average of the follow-up scores, the accuracy rate can be the average of the similarities, the training duration is the playback duration of the target media asset, and the energy consumption can be determined according to preset calculation rules.
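The exemplary record fields can be sketched as plain averages. The record type and function names are hypothetical. Energy consumption is deliberately left out because the text only says it follows preset calculation rules, which are not specified.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class FollowUpRecord:
    """Hypothetical follow-up record; energy consumption is omitted (rules unspecified)."""

    training_score: float
    accuracy: float
    training_duration_s: float


def build_record(
    follow_up_scores: Sequence[float],
    similarities: Sequence[float],
    playback_duration_s: float,
) -> FollowUpRecord:
    """Average the follow-up scores and similarities; duration is the playback duration."""
    if not follow_up_scores or not similarities:
        raise ValueError("at least one score and one similarity are required")
    return FollowUpRecord(
        training_score=mean(follow_up_scores),
        accuracy=mean(similarities),
        training_duration_s=playback_duration_s,
    )
```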

In some embodiments, the display device can also handle abnormal situations before or during follow-up practice. For example, when the controller of the display device cannot receive a signal from the camera assembly, the display device can pause playback of the target media asset and display an abnormality prompt. Referring to FIG. 16, the abnormality prompt may include "Camera not detected" and may be displayed in the window of the user image.

In some embodiments, the second tool processing server may also handle abnormal situations during follow-up training. For example, if the second tool processing server detects no skeleton key points in the user image, it may send an abnormality prompt and a play pause instruction to the display device, so that the display device pauses playback of the target media asset according to the play pause instruction and displays the abnormality prompt. Referring to FIG. 17, the abnormality prompt may include "There is no one in front of the camera, pause playback" and may be displayed in the window of the user image.

In some embodiments, the second tool processing server's handling of abnormal situations further includes the following: during follow-up practice, if the second tool processing server detects that the positions of the skeleton key points in the user image have not changed within a period of time, it may send an abnormality prompt and a play pause instruction to the display device, so that the display device pauses playback of the target media asset according to the play pause instruction and displays the abnormality prompt. Referring to FIG. 18, the abnormality prompt may be two arrows pointing to the person in the user image and may be displayed in the window of the user image.
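The two server-side checks (no skeleton key points, and key points that stay still for a period) can be sketched as one detector. The key-point format (a list of (x, y) pixel tuples), the 10-second stillness period, the 2-pixel tolerance and the returned codes are all hypothetical. A non-None result stands for "send an abnormality prompt plus a play pause instruction".

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) key-point position in pixels


class AbnormalityDetector:
    """Hypothetical sketch of the no-person and no-movement checks."""

    def __init__(self, still_seconds: float = 10.0, tolerance_px: float = 2.0) -> None:
        self.still_seconds = still_seconds
        self.tolerance_px = tolerance_px
        self._reference: Optional[List[Point]] = None
        self._reference_time = 0.0

    def check(self, keypoints: Sequence[Point], timestamp_s: float) -> Optional[str]:
        """Return an abnormality code to send with a pause instruction, or None."""
        if not keypoints:
            self._reference = None
            return "NO_PERSON"
        if (
            self._reference is None
            or len(self._reference) != len(keypoints)
            or self._moved(keypoints)
        ):
            # The pose changed: restart the stillness timer from this frame.
            self._reference = list(keypoints)
            self._reference_time = timestamp_s
            return None
        if timestamp_s - self._reference_time >= self.still_seconds:
            return "NO_MOVEMENT"
        return None

    def _moved(self, keypoints: Sequence[Point]) -> bool:
        assert self._reference is not None
        return any(
            math.dist(a, b) > self.tolerance_px
            for a, b in zip(self._reference, keypoints)
        )
```

Comparing against a reference pose, instead of only the previous frame, keeps slow drift below the tolerance from resetting the timer on every frame.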

It can be seen that during follow-up practice, the second tool processing server can perform both the scoring of user actions and the handling of abnormal situations. Because complex data processing such as skeleton key point detection and score calculation is performed on the server, the hardware requirements on the display device are low, which helps the display device run smoothly. In some embodiments, when the display device has relatively capable hardware, the operations of the second tool processing server can also be performed by the display device. In this case, the display device needs to download the action library or the management library before scoring, and does not need to interact with the second tool processing server during scoring, which reduces the occupation of network resources.

It can be seen from the above embodiments that, in the embodiments of the present application, the target media asset is dotted in advance, and during scoring the user images are scored against the dotted video frames. This solves the problem that, with real-time comparison, the target media asset may already have moved on to another action by the time the user performs an action, which lowers the user's action score; the scoring accuracy of the follow-up mode is therefore improved. Multiple scores are obtained by comparing multiple user images with the video frame corresponding to each dotting record, and the highest score is taken as the follow-up score, which reduces the probability of a low follow-up score. Further, when the target media asset is dotted, adjacent dotting positions are separated by a certain number of video frames, which prevents overly dense dotting from leaving users unable to keep up with each action and improves the user experience.

For the convenience of explanation, the above description has been made in conjunction with specific embodiments. However, the above discussion in some embodiments is not intended to be exhaustive or to limit implementations to the specific forms disclosed above. Numerous modifications and variations are possible in light of the above teachings. The above embodiments have been chosen and described to better explain the principles and practical applications, so as to enable those skilled in the art to better utilize the embodiments and various modified embodiments suitable for specific use considerations.

Claims

1. A display device, comprising:

a display;

a controller, connected to the display, the controller being configured to:

Receive the media asset playback instruction input by the user;

In response to the media asset playback instruction, obtain a target video corresponding to the media asset playback instruction;

When the control is not set above the first playback window corresponding to the target video, the target video is played in the first playback window;

When the control is provided above the first playback window corresponding to the target video, move the display position of the target video in the first playback window in a direction away from the control, so that the center position of the picture of the target video is displayed close to the center position of the target display area in the first playback window that is not blocked by the control, wherein the control is opaque and blocks one side of the first playback window.
2. The display device according to claim 1, wherein the control comprises a second playback window, the second playback window comprises a window generated by the controller in response to the media asset playback instruction, and the second playback window is used to play received local camera data.
3. The display device according to claim 1, wherein the controller is further configured to, before moving the display position of the target video away from the control in the first play window:

determining the to-be-displayed width of the target video when the target video is completely displayed in the first playback window in the height direction;

Determine a first moving distance according to the to-be-displayed width and the width of the target display area, the first moving distance being the distance by which the display position of the target video needs to be moved away from the control in the first play window.
4. The display device according to claim 3, wherein the controller is configured to determine the to-be-displayed width of the target video when the target video is completely displayed in the first play window in the height direction by:

Obtain the scaling ratio according to the ratio of the image height of the target video to the height of the target display area;

Obtain the to-be-displayed width of the target video according to the image width of the target video and the scaling ratio.
5. The display device according to claim 3, wherein the controller is further configured to determine the first moving distance according to the to-be-displayed width and the width of the target display area by:

Taking half of the difference between the width to be displayed and the width of the target display area as the first moving distance.
6. The display device according to claim 1, wherein the target display area is determined according to the position coordinates of the first play window and the position coordinates of the control.
7. The display device according to claim 1, wherein the height of the control is the same as the height of the first play window.
8. The display device according to claim 1, wherein the right side of the control coincides with the right side of the first play window.
9. The display device according to claim 1, wherein the controller is further configured to, after acquiring the target video corresponding to the media asset play instruction in response to the media asset play instruction:

In response to the play mode corresponding to the media asset play instruction being a first mode, load a first play page, wherein the first play page includes the first play window but does not include a second play window, the first play window is used to play the target video, and the control includes the second play window;

In response to the play mode corresponding to the media asset play instruction being a second mode, load a second play page, wherein the second play page includes the first play window and a second play window located above the first play window, the first play window is used to play the target video, and the second play window is used to play received local camera data.
10. A method for playing media assets, comprising:

Receive the media asset playback instruction input by the user;

In response to the media asset playback instruction, obtain a target video corresponding to the media asset playback instruction;

When the control is not set above the first playback window corresponding to the target video, the target video is played in the first playback window;

When the control is provided above the first playback window corresponding to the target video, moving the display position of the target video in the first playback window in a direction away from the control, so that the center position of the picture of the target video is displayed close to the center position of the target display area in the first playback window that is not blocked by the control, wherein the control is opaque and blocks one side of the first playback window.
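The geometry in the claims (target display area from window and control coordinates, scaling ratio, to-be-displayed width, and a first moving distance of half the width difference) can be sketched as follows. This illustration assumes, as in the claims on control height and right-side alignment, that the control is as tall as the first play window and flush with its right edge, so "away from the control" means to the left. `Rect` and both function names are hypothetical, and the sketch simply returns the formula's value without handling a non-positive result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical screen rectangle: top-left corner plus size, in pixels."""

    x: float
    y: float
    width: float
    height: float


def target_display_area(window: Rect, control: Rect) -> Rect:
    """Part of the first play window left uncovered by a right-docked, full-height control."""
    return Rect(window.x, window.y, control.x - window.x, window.height)


def first_moving_distance(image_w: float, image_h: float, area: Rect) -> float:
    """Leftward shift that centres the height-fitted picture on the target area."""
    scale = image_h / area.height      # image height / target area height
    to_display_w = image_w / scale     # width when the full height is shown
    return (to_display_w - area.width) / 2
```

For a 1920 x 1080 window with a 640-pixel-wide control on the right, the target area is 1280 pixels wide. A 1920 x 1080 (or 3840 x 2160) video then moves left by 320 pixels, which brings its center from x = 960 to x = 640, the center of the target area.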