CN116980654B - Interaction method, device, equipment and storage medium based on video teaching

Interaction method, device, equipment and storage medium based on video teaching

Info

Publication number
CN116980654B
CN116980654B
Authority
CN
China
Prior art keywords
video
target
sequence data
action
virtual digital
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311228761.8A
Other languages
Chinese (zh)
Other versions
CN116980654A (en)
Inventor
潘孟姣
孙健
张远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Small Sugar Technology Co ltd
Original Assignee
Beijing Small Sugar Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Small Sugar Technology Co ltd filed Critical Beijing Small Sugar Technology Co ltd
Priority to CN202311228761.8A priority Critical patent/CN116980654B/en
Publication of CN116980654A publication Critical patent/CN116980654A/en
Application granted granted Critical
Publication of CN116980654B publication Critical patent/CN116980654B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • G06T13/403D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/06Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/08Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/08Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • G09B5/12Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations different stations being capable of presenting different information simultaneously
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing

Abstract

The invention discloses an interaction method, device, equipment and storage medium based on video teaching, relates to the technical field of image communication, and aims to solve the problem that video pictures in the video teaching process cannot meet users' learning needs. The method, applied to a server, comprises the following steps: acquiring action sequence data corresponding to a target video based on a target video identifier; processing the action sequence data corresponding to the target video based on video processing parameters to obtain target action sequence data; and sending the target action sequence data to the client that sent the video request; or driving a virtual digital person based on the target action sequence data to generate a virtual digital person action video, and sending the virtual digital person action video to the client that sent the video request as the requested video. According to the embodiment of the invention, the teaching video can be processed through interaction with the user during video teaching, which alleviates the problem that the learning user cannot see the teaching content clearly.

Description

Interaction method, device, equipment and storage medium based on video teaching
Technical Field
The present invention relates to the field of image communication technologies, and in particular, to an interaction method, apparatus, device, and storage medium based on video teaching.
Background
With the development of internet technology, it has become increasingly convenient for people to learn all kinds of knowledge through networks. With the rise of applications (APP) on various platforms and the development of video technology, teaching or learning through videos has become an increasingly popular mode of instruction. For example, content providing users record various teaching videos and upload them to certain platforms, which push them to learning users using platform resources, and the learning users can learn the relevant knowledge by watching the teaching videos. The teaching video content is, for example, various academic subjects for students, or cooking, gardening, dance, handwriting, knitting, musical instrument playing and the like for the general public.
For teaching videos that teach a learner a specific action, such as a knitting teaching video, a dance teaching video, a handwriting teaching video or a musical instrument playing teaching video, a content providing user typically shows the learner the action to be learned using a certain video recording method. Taking dance as an example, a dance teaching video generally provides a front-view dance demonstration by the dancer first, and then demonstrates the actions while explaining them. In some dance teaching videos, to help the learner watch the movements of body parts outside the dancer's front view angle, a mirror is usually used as an aid, so that the dance movements can be presented from the back view angle while being presented from the front view angle, helping the learner master the movements correctly. In other dance teaching videos, the dancer may also break the dance movements into groups, demonstrating and explaining them group by group. Teaching videos of other content have similar arrangements, such as fingering and hand position when a violin is played.
With the development of terminal technology and internet application technology, more and more platforms use terminals, especially mobile terminals, as carriers of applications (APP), so the main carriers of teaching videos are also mobile terminals. The popularity and portability of mobile terminals make it convenient for learners to watch teaching videos and learn specific actions, lowering the learning threshold: besides enrolling in a professional dance class to learn with a dance teacher on site, dance lovers can also learn from dance teaching videos on mobile phones, tablet computers and the like.
However, when a learner uses such teaching videos through a platform, there are often inconveniences; for example, the learner cannot see the content displayed in the teaching video clearly because the terminal screen is too small. Of course, some platforms provide a picture enlarging function, but it only enlarges the whole video picture. For the learner, such enlarging cannot target the part the learner wants to magnify, and after the whole picture is enlarged the desired part may even fall outside the current picture; moreover, enlarging the picture cannot solve unclear views caused by the shooting angle. It can therefore be seen that there is still room for improvement in such platforms and applications in order to provide better services to learning users.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide an interaction method, apparatus, electronic device and storage medium based on video teaching, so as to solve the technical problem that video pictures in the video teaching process cannot meet the learning requirement of users.
In a first aspect, an embodiment of the present invention provides an interaction method based on video teaching, applied to a server, wherein when a video request from a client is received during target video playback, the interaction method includes:
acquiring a target video identifier and video processing parameters from the video request;
acquiring action sequence data corresponding to the target video based on the target video identifier, wherein the action sequence data comprises action data formed according to the video frame sequence, and the action data of each video frame is formed by the three-dimensional coordinates of the skeletal key points of a target object in the target video;
processing the action sequence data corresponding to the target video based on the video processing parameters to obtain target action sequence data; and
sending the target action sequence data to the client that sent the video request; or driving a virtual digital person based on the target action sequence data to generate a virtual digital person action video, and sending the virtual digital person action video to the client that sent the video request as the requested video.
In a second aspect, an embodiment of the present invention further provides another video teaching-based interaction method, applied to a client, where the method includes:
responding to a video processing instruction received from a user during target video playback, acquiring action sequence data corresponding to the target video from a server, wherein the video processing instruction comprises video processing parameters, the action sequence data comprises action data formed according to the video frame sequence, and the action data of each video frame is formed by the three-dimensional coordinates of the skeletal key points of a target object in the target video;
in response to the action sequence data acquired from the server being the original action sequence data corresponding to the target video, processing the original action sequence data based on the video processing parameters to obtain target action sequence data;
driving a virtual digital person based on the target action sequence data to generate a virtual digital person action video, and taking the virtual digital person action video as the requested video;
in response to the action sequence data acquired from the server being target action sequence data already processed by the server based on the video processing parameters, driving a virtual digital person based on the target action sequence data to generate a virtual digital person action video, and taking the virtual digital person action video as the requested video; and playing the requested video.
In a third aspect, an embodiment of the present invention provides an interaction device based on video teaching, which is applied to a server, and includes a parameter acquisition module, an action sequence data processing module, and a request response module, where the parameter acquisition module is configured to acquire a target video identifier and a video processing parameter from a video request sent by a client; the motion sequence data processing module obtains motion sequence data corresponding to a target video based on a target video identifier, wherein the motion sequence data comprises motion data formed according to a video frame sequence, and the motion data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video; processing motion sequence data corresponding to the target video based on the video processing parameters to obtain target motion sequence data; the request response module is configured to send the target motion sequence data to a client that made the video request, or to send the virtual digital human motion video as a requested video to a client that made the video request.
In a fourth aspect, an embodiment of the present invention further provides an interaction device based on video teaching, which is applied to a client, and includes a user operation acquisition module, a data request module, a data processing module, a video generation module and a playing module, where the user operation acquisition module is configured to monitor a user operation in a target video playing process to receive a video processing instruction of a user, and at least acquire a video processing parameter from the video processing instruction; the data request module is configured to respond to the video processing instruction of a user received in the target video playing process, and acquire action sequence data corresponding to a target video from a server, wherein the action sequence data comprises action data formed according to video frame sequences, and the action data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video; the data processing module is configured to respond to the action sequence data acquired from the server as original action sequence data corresponding to a target video, and process the original action sequence data based on the video processing parameters to obtain target action sequence data; the video generation module is configured to drive a virtual digital person to generate a virtual digital person action video based on the target action sequence data obtained by the data processing module or the target action sequence data received from the server, and the virtual digital person action video is used as a requested video; the play module is configured to play the requested video.
In a fifth aspect, embodiments of the present invention also provide an electronic device including a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the video teaching-based interaction method applied to a server or to a client as described above.
In a sixth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement a video-teaching-based interaction method as described above for application to a server or for application to a client.
According to the invention, the teaching video is processed through interaction with the user in the video teaching process, so that the problem that the learning user cannot see the teaching content clearly due to the small screen, shooting angle and the like when watching the teaching video is solved.
Drawings
In order to more clearly describe the technical solution of the embodiments of the present invention, the following description briefly describes the drawings in the embodiments of the present invention.
Fig. 1 is a schematic block diagram of a video application system according to one embodiment of the invention.
Fig. 2 is a flowchart of a video teaching-based processing method according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method for generating dancer motion sequence data in dance teaching video according to one embodiment of the present invention.
FIG. 4 is a flowchart of a method for performing a rotation process on a target dance video based on rotation parameters according to one embodiment of the present invention.
Fig. 5 is a flowchart of an interactive method based on video teaching according to an embodiment of the present invention.
Fig. 6 is a schematic view of a playback screen structure of a terminal device screen according to an embodiment of the present invention.
Fig. 7 is a schematic view of a playback screen of a terminal device screen according to another embodiment of the present invention.
Fig. 8 is a flowchart of an interactive method based on video teaching according to a second embodiment of the present invention.
Fig. 9 is a flowchart of a method for processing motion sequence data corresponding to a target video based on video processing parameters by a second client to obtain target motion sequence data according to an embodiment of the present invention.
Fig. 10 is a functional block diagram of a first interactive apparatus 100 applied to a server-side based video teaching according to an embodiment of the present invention.
Fig. 11 is a functional block diagram of a second interactive apparatus 200 applied to a client-side based video teaching according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It will be appreciated that such embodiments are provided to make the principles and spirit of the invention clear and thorough, and to enable those skilled in the art to better understand and practice them. The exemplary embodiments provided herein are merely some, but not all, embodiments of the invention. All other embodiments obtained by one of ordinary skill in the art without undue burden based on the embodiments herein fall within the scope of the present invention.
Those skilled in the art will appreciate that embodiments of the invention may be implemented as a method, apparatus, system, device, computer readable storage medium, or computer program product. Accordingly, the present disclosure may be embodied in at least one of the following forms: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, terms such as first and second are used solely to distinguish one entity (or action) from another entity (or action) without necessarily requiring or implying any order or relationship between such entities (or actions). In this document, an element (e.g., a component, a composition, a process, a step) defined by the phrase "comprising …" does not exclude the presence of other elements beyond those listed, i.e., it may include other elements not explicitly listed. Any elements in the figures and their numbers are used herein for illustration and not limitation, and any naming in the figures is used for distinction only and does not have any limiting meaning.
The principles and spirit of the present invention are explained in detail below with reference to several exemplary or representative embodiments thereof.
The invention provides an interaction method, an interaction device, electronic equipment and a storage medium based on video teaching, which are used for processing teaching videos through interaction with users in the video teaching process so as to meet the requirements of the users on teaching video pictures.
Fig. 1 is a schematic block diagram of a video application system according to one embodiment of the invention. The video application system includes a terminal device 102 and a server 104. Wherein the terminal device 102 may comprise at least one of: smart phones, tablet computers, notebook computers, desktop computers, smart televisions, and the like. The terminal device 102 is provided with a video application client, and the video application client may be a client providing a video application, or a client embedded with an applet (different functions), or a client logged in through a browser. The user may operate on the terminal device 102, for example, the user may open a client installed on the terminal device 102, input an instruction through a client operation, or the user may open a browser installed on the terminal device 102 and input an instruction through a browser operation. After the terminal device 102 receives the instruction input by the user, the request information containing the instruction is transmitted to the server 104 as the video application server. The server 104 performs a corresponding process after receiving the request information, and then returns the process result information to the terminal device 102. User instructions are completed through a series of data processing and information interaction. The video application system in this embodiment serves as a platform, and can provide various teaching videos or teaching videos in a specific field for users. For example, teaching videos of various knitting needle methods, dance videos, musical instrument teaching videos, and the like are provided. The users of the video application system include two types, one type is video providing users and the other type is video application users. The video providing user creates teaching video, and the teaching video is uploaded to the server 104, and the server 104 stores the teaching video created by the video providing user in the database 106 and distributes the teaching video to the client according to preset business logic. A user can browse and play teaching videos through a client installed on the terminal device 102 of the user.
Fig. 2 is a flowchart of a video teaching-based processing method according to an embodiment of the present invention, the method including:
step S110, obtaining a target video identifier and a video processing parameter from the video request.
Step S120, acquiring action sequence data corresponding to a target video based on a target video identifier, wherein the action sequence data comprises action data formed according to a video frame sequence, and the action data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video. In the following description, one frame in the motion sequence data is referred to as a motion frame, which corresponds to a video frame of the target video and has the same time information and frame id.
Step S130, processing the motion sequence data corresponding to the target video based on the video processing parameters to obtain target motion sequence data.
Step S140, driving the virtual digital person based on the target motion sequence data to generate a virtual digital person motion video.
And step S150, playing the virtual digital human action video as a requested video.
The method flow shown in fig. 2 may be implemented based on the system shown in fig. 1. For example, various data, such as the teaching videos made by video providing users, personal data of users and intermediate data of the service processing process, are stored in the database 106, and the server 104 cooperates with the terminal device 102 to implement the method shown in fig. 2. For example, the server 104 performs steps S110 to S140 when receiving a video request from a client in the terminal device 102, obtains a virtual digital human action video, and sends it to the client, which performs step S150. Alternatively, the server 104 performs steps S110 to S130 to obtain target motion sequence data and sends it to the client, and the client performs step S140 to obtain a virtual digital human action video and performs step S150. Alternatively, the server 104 performs steps S110 to S120 to obtain the motion sequence data corresponding to the target video and sends it to the client, and the client performs steps S130 to S150. When a video application user watching a teaching video cannot see the actions clearly, the user can set processing parameters such as the rotation angle, the magnification factor and the local magnification part for the target video currently being watched, and obtain, through the processing method shown in fig. 2, a video that satisfies those video processing parameters, thereby meeting the user's requirements for the teaching video and improving the user's application experience.
For the server 104, the teaching videos it manages include videos already stored in the database 106, called stock videos, and teaching videos newly received within a preset time period, such as teaching videos newly received within the last 24 hours of each day, called incremental videos. For stock videos, the server 104 processes each stock video to obtain motion sequence data of the target object of the video; in an alternative embodiment, it obtains motion sequence data for multiple views or visual orientations according to the action type of the target object in the video. For example, a dance video is processed to obtain motion sequence data for the front, back or side of the dancer in the video, and a cooking teaching video is processed to obtain motion sequence data for the front, the side and the top view of the cook in the video. For incremental videos, the server 104 processes them periodically to obtain one or more motion sequence data for each incremental video. Of course, the client in the terminal device 102 may also process a video to obtain the motion sequence data of the target object in the video.
FIG. 3 is a flowchart of a method for generating dancer motion sequence data in dance teaching video according to one embodiment of the present invention. The method briefly comprises the following steps:
Step S111, generating first motion sequence data of the dancer according to a skeleton node recognition algorithm and feature data of the dancer in the video, wherein each frame of data in the first motion sequence data comprises a frame id, time point information, and coordinate vectors of adjacent skeleton joint points (each comprising a joint point name and the joint point's three-dimensional coordinates). In this embodiment the joint points of the skeleton are used as the skeleton key points that represent the motion. The storage format of each motion frame in the motion sequence data is, for example: [frame id: xx; time information: xx; joint 1 name: three-dimensional coordinates of the adjacent joint point; joint 2 name: three-dimensional coordinates of the adjacent joint point; ...].
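To make the storage format above concrete, the following is a minimal data-structure sketch; Python is used purely for illustration, and the names MotionFrame, time_ms and joints are assumptions rather than terms defined in this embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]  # three-dimensional coordinates (x, y, z) of a joint point

@dataclass
class MotionFrame:
    frame_id: int            # same id as the corresponding video frame
    time_ms: int             # time information shared with the corresponding video frame
    joints: Dict[str, Vec3]  # joint point name -> three-dimensional coordinates of the adjacent joint point

# A motion sequence is simply the per-frame action data ordered by the video frame sequence.
MotionSequence = List[MotionFrame]

example_frame = MotionFrame(
    frame_id=0,
    time_ms=0,
    joints={"left_shoulder": (0.21, 1.45, 0.03), "left_elbow": (0.35, 1.20, 0.05)},
)
```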
Step S112, identifying the visual orientation of the dancer in the target video. In this embodiment, three visual orientations of the dancer in the target video are preset: front, back and side. To judge the visual orientation of the dancer in the target video, a preset number of video frames (such as 100 frames) are extracted from the video as working frames for identification, and the facial features of the dancer in each working frame are identified; this embodiment likewise classifies the facial features into three types, front, back and side. When the proportion of working frames containing a certain type of facial features to the total number of working frames is higher than a threshold (such as 70%), the visual orientation of the dancer in the target video is determined to be that type, i.e. front, back or side.
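A minimal sketch of this identification step, assuming a placeholder facial-feature classifier (classify_face) and the 100-frame sample and 70% threshold mentioned above; the fallback behaviour when no orientation reaches the threshold is not specified in the text and is assumed here.

```python
from collections import Counter
from typing import Callable, Sequence

def identify_visual_orientation(
    frames: Sequence,                        # decoded frames of the target video
    classify_face: Callable[[object], str],  # placeholder: returns "front", "back" or "side"
    sample_size: int = 100,
    threshold: float = 0.7,
) -> str:
    """Sample working frames, classify the dancer's facial features in each,
    and adopt an orientation once its share of working frames exceeds the threshold."""
    step = max(1, len(frames) // sample_size)
    working_frames = list(frames[::step])[:sample_size]
    if not working_frames:
        return "front"                       # assumed fallback, not specified in the text
    counts = Counter(classify_face(f) for f in working_frames)
    label, count = counts.most_common(1)[0]
    if count / len(working_frames) >= threshold:
        return label
    return "front"                           # assumed fallback, not specified in the text
```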
Step S113, converting the first motion sequence data based on the visual orientation and the target orientation of the dancer in the target video to obtain corresponding second motion sequence data. For example, when the visual orientation of the dancer in the target video is the front and the target orientation is the back, the dancer's body needs to be rotated 180 degrees clockwise about the Y axis, so the three-dimensional coordinates (x, y, z) of each joint point in the front first motion sequence data can be converted into (-x, y, -z) to obtain the second motion sequence data of the back; when the visual orientation of the dancer in the target video is the front and the target orientation is the side, the dancer's body needs to be rotated 90 degrees or 270 degrees clockwise about the Y axis, and the three-dimensional coordinates (x, y, z) of each joint point in the front first motion sequence data can be converted into (-z, y, x) or (z, y, -x), thereby obtaining the third motion sequence data of the side.
Through the above processing, the first motion sequence data of the front, the second motion sequence data of the back and the third motion sequence data of the side can be obtained for one dance video. For videos of other action classes, motion sequence data of different view angles or different visual orientations are obtained as required. For example, for a cooking-class video, first motion sequence data for the front and second motion sequence data for the top-down view may be obtained.
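The front-to-back and front-to-side conversions of step S113 can be sketched as follows; the keys side_left and side_right for the 90-degree and 270-degree cases are illustrative labels, not terms used in this embodiment.

```python
def convert_orientation(joints, target):
    """Convert front-facing joint coordinates to another visual orientation by
    rotating about the Y axis, as described in step S113."""
    converted = {}
    for name, (x, y, z) in joints.items():
        if target == "back":          # 180-degree clockwise rotation about the Y axis
            converted[name] = (-x, y, -z)
        elif target == "side_left":   # 90-degree clockwise rotation about the Y axis
            converted[name] = (-z, y, x)
        elif target == "side_right":  # 270-degree clockwise rotation about the Y axis
            converted[name] = (z, y, -x)
        else:                         # "front": coordinates unchanged
            converted[name] = (x, y, z)
    return converted
```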
In step S130, when the video processing parameters are different, the processing of the motion sequence data corresponding to the target video is different. For example, when the video processing parameter is rotation, a rotation direction and a rotation angle are included at this time, wherein the rotation direction is divided into, for example, clockwise and counterclockwise. The action sequence data corresponding to the target video at this time includes front, back and side. Referring to fig. 4, fig. 4 is a flowchart illustrating a method for performing a rotation process on a target dance video based on rotation parameters according to an embodiment of the present invention. In this embodiment, it is assumed that the video processing parameter is rotated clockwise by β degrees along the Y-axis. The processing method comprises the following steps:
step S131, the visual orientation of the target object in the target dance video is identified.
Step S132, determining whether the visual orientation of the target object is front or back, if front, executing step S133, and if back, executing step S135.
Step S133, acquiring original action sequence data of the front face.
Step S134, the x coordinate and the z coordinate in the three-dimensional coordinates of each bone joint point in each action frame are calculated based on the formulas (1-1) and (1-2), and the three-dimensional coordinates (x, y, z) of each joint point are converted into (x_change, y, z_change), so that the processing of the action sequence data is completed.
(1-1)
(1-2)
Step S135, acquiring original motion sequence data of the back surface.
And step S136, calculating the x coordinate and the z coordinate in the three-dimensional coordinates of each bone joint point in each action frame based on the formulas (1-3) and (1-4), and converting the three-dimensional coordinates (x, y, z) of each joint point into (x_change, y, z_change) to finish the processing of the action sequence data.
(1-3)
(1-4)
The coordinates in each frame are calculated frame by frame to obtain the target action sequence data.
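Formulas (1-1) to (1-4) are not reproduced in this text; the sketch below therefore uses a standard clockwise rotation about the Y axis as an assumption of what the per-joint calculation of x_change and z_change could look like, with y preserved.

```python
import math

def rotate_about_y(joints, beta_degrees):
    """Rotate each joint's three-dimensional coordinates clockwise by beta degrees
    about the Y axis; only the x and z coordinates change."""
    beta = math.radians(beta_degrees)
    cos_b, sin_b = math.cos(beta), math.sin(beta)
    rotated = {}
    for name, (x, y, z) in joints.items():
        x_change = x * cos_b + z * sin_b
        z_change = -x * sin_b + z * cos_b
        rotated[name] = (x_change, y, z_change)
    return rotated
```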
When the video processing parameter is a magnification, the parameter typically also includes a magnification position, for example, a user-specified position, or a preset position, where the user-specified position is prioritized. In processing based on the magnification, similar to the flow as in fig. 4, first, the visual orientation of the target object in the target video is recognized, and the original motion sequence data of the same visual orientation is processed.
In one embodiment, when the front-side original motion sequence data is magnified, it is assumed that the three-dimensional coordinates of a joint point in each motion frame of the original motion sequence data are (x, y, z). First, the data in the sequence are proportionally scaled in three-dimensional space about the coordinate origin, with a magnification factor of alpha. The proportionally scaled data are then translated, taking the coordinates of the specified position as the origin, to obtain the converted three-dimensional coordinates (x_big, y_big, z_big); see the following formulas (2-1), (2-2), (2-3):
(2-1)
(2-2)
(2-3)
Wherein W is the height of the target object (such as the dancer in a dance video) in the target video when the arms are raised, H is the width of the target object in the target video when the arms are opened, and mu and delta are the horizontal movement coefficient and the vertical movement coefficient respectively. The relationship between the magnification position and the coefficients is shown in Table 1 below:
TABLE 1
For example, when the magnification position is the whole body, the coordinates are magnified by α times and remain unchanged, and the corresponding three-dimensional coordinates are (αx, αy, αz).
When the enlarged position is the upper body, the coordinates need to be moved down along the Y-axis, and the corresponding three-dimensional coordinates are (alpha x, alpha Y-alpha H/2, alpha z).
When the enlarged position is the lower body, the coordinates are moved up along the Y-axis, and the corresponding three-dimensional coordinates are (αx, αy+αh/2, αz).
When the enlarged position is an upper-left body joint such as the dancer's left arm or left wrist, the coordinates move left along the X-axis and down along the Y-axis, and the corresponding three-dimensional coordinates are (αx-αW/2, αy-αH/2, αz), and so on.
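A sketch of the magnification processing for the examples listed above; since Table 1 and formulas (2-1) to (2-3) are not reproduced here, only the whole-body, upper-body, lower-body and left-arm cases given in the text are encoded, and the position keys are illustrative assumptions.

```python
def magnify_joints(joints, alpha, position, W, H):
    """Scale all joint coordinates by alpha about the coordinate origin, then shift
    them so the magnified position stays in view, following the listed examples."""
    offsets = {
        "whole_body": (0.0, 0.0),                        # (alpha*x, alpha*y, alpha*z)
        "upper_body": (0.0, -alpha * H / 2),             # move down along the Y axis
        "lower_body": (0.0, alpha * H / 2),              # move up along the Y axis
        "left_arm":   (-alpha * W / 2, -alpha * H / 2),  # move left along X, down along Y
    }
    dx, dy = offsets.get(position, (0.0, 0.0))
    return {name: (alpha * x + dx, alpha * y + dy, alpha * z)
            for name, (x, y, z) in joints.items()}
```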
When the video processing parameters include both the rotation parameters and the magnification parameters, the magnification processing may be performed first, and then the rotation processing may be performed.
When there are a plurality of original motion sequence data, the original motion sequence data is processed to obtain a plurality of target motion sequence data. Accordingly, in step S140, the virtual digital person is driven based on the plurality of target motion sequence data to generate a plurality of virtual digital person motion videos.
In one embodiment, the present invention provides a variety of preset virtual digital figures, such as an ethnic dance figure, a street dance teenager figure, a ballet figure, a male figure, and the like. Where the target video providing user has given authorization, bone data, skin data, clothing data and the like of the target object are collected to constitute an avatar of the target object. In step S140, when the virtual digital person is driven based on the target motion sequence data to generate a virtual digital human action video, if the target video providing user has given authorization, the virtual digital figure is generated according to the target object in the target video; if not, an avatar of the same class as the target object in the target video is employed for the virtual digital person.
In a further embodiment, when the video processing parameters include a local magnification part, the local magnification part is taken as the center of the video picture when the virtual digital human action video is generated, so that the action of the part the user wants to zoom in on is kept within the generated video picture. In another further embodiment, when the virtual digital human action video is generated, the local magnification part is highlighted by circling it, and the area outside the local magnification part is faded or blurred, so that the magnified part is emphasized and the other parts are weakened, making the magnified part stand out in the video picture.
When the client executes step S150 to play the virtual digital person action video as the requested video, there may be a plurality of different play modes, for example, when only one virtual digital person action video is obtained, the virtual digital person action video and the target video are played synchronously or only the virtual digital person action video is played. In synchronous playing, the two can be played in different windows or combined together and played in one window.
When a plurality of virtual digital human action videos are obtained, they may be combined into a second combined video, which is either played synchronously with the target video or played alone as the requested video.
Alternatively, the plurality of virtual digital human action videos and the target video may be combined into a third combined video, and the third combined video is played as the requested video.
In all of the above video compositions, the videos are composited according to their time information, so that the actions of the several identical target objects in the composite video are synchronized, making it convenient for the learner to see the magnified actions he wants to see, the actions after rotation by a certain angle, and so on.
Application example one
Fig. 5 is a flowchart of an interactive method based on video teaching according to an embodiment of the present invention. In this application embodiment, a video application client is installed in the terminal device 102, and a video application server is installed in the server 104. The method comprises the following steps:
in step S101a, the user starts a video application by clicking on a video application icon in the screen of the terminal device 102.
After receiving the start instruction of the user, the client in the terminal device 102 interacts with the server in the server 104 to complete the start of the video application in step S201a, and after the video application is started, the server pushes video data to the client, so that the user can browse a corresponding video list and can play by clicking a certain video.
In step S102a, the user selects a video as the target video and clicks the play button.
In step S202a, the client sends a target video data request to the server based on the play operation of the user.
In step S301a, the server reads the target video data from the database 106 based on the target video data request, and in step S302a sends the target video data to the client.
In step S203a, the client receives the target video data and plays the target video, and provides the video processing parameter options. In one embodiment, to facilitate user setting of video processing parameters, the client provides parameter setting buttons and provides various parameter items available for setting in the form of a menu list. Parameter items include, but are not limited to:
1) Rotation angle (clockwise 0 degrees, 30 degrees, 60 degrees, 90 degrees, etc. based on the front face, and a desired angle value may be input);
2) Amplifying position (joints such as whole body, upper body, lower body, left arm, left wrist, etc.);
3) Magnification (1-fold, 1.5-fold, 2-fold, etc.);
4) Dancer figure (virtual digital person figure A, virtual digital person figure B, the author's own figure where authorized, etc.);
5) Picture structure (front + rotation/magnification, back + rotation/magnification, front + back + rotation/magnification, etc.).
The various parameter items include default parameter values, and when the user selects the parameter item and does not determine a specific parameter value, the default parameter values are adopted.
In another embodiment, the client captures the user's operations on the current device screen and determines the corresponding video processing parameters based on the preset parameter type corresponding to the screen operation. For example, a screen operation in which two contact points on the screen slide outward simultaneously is interpreted as "magnification", the midpoint of the line between the two points is the magnification position, and the average sliding distance of the two contact points is taken as the magnification factor. For another example, a screen operation in which a single contact point slides along an arc clockwise or counterclockwise is interpreted as "rotation", the contact point is the rotation base point, the direction of the arc is the rotation direction, and the angle of the arc is taken as the rotation angle. By capturing and computing the screen operations, the video processing parameters set by the user can be determined, simplifying the parameter setting steps and facilitating user operation.
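A sketch of how such screen operations might be mapped to video processing parameters; the touch-event structure (a start point and an end point per contact) and the arc-angle calculation are simplifying assumptions rather than part of this embodiment.

```python
import math

def gesture_to_parameters(touches):
    """Map captured screen operations to video processing parameters as described above.
    touches is assumed to be a list of (start_point, end_point) pairs, one per contact."""
    if len(touches) == 2:
        # Two contact points sliding outward simultaneously -> magnification.
        (s1, e1), (s2, e2) = touches
        midpoint = ((s1[0] + s2[0]) / 2, (s1[1] + s2[1]) / 2)  # midpoint of the two-point line
        factor = (math.dist(s1, e1) + math.dist(s2, e2)) / 2   # average sliding distance
        return {"type": "magnify", "position": midpoint, "factor": factor}
    if len(touches) == 1:
        # A single contact point sliding along an arc -> rotation.
        (start, end), = touches
        # Simplified: the signed angle of the start-to-end vector stands in for the arc angle.
        angle = math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))
        direction = "clockwise" if angle < 0 else "counterclockwise"
        return {"type": "rotate", "base_point": start, "direction": direction, "angle": abs(angle)}
    return {}
```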
In step S103a, the user sets video processing parameters such as magnification, rotation angle, and the like.
In step S204a, the client sends a video request to the server, where the video request includes video processing parameters.
After receiving the video request sent by the client, the server obtains the target video identifier and the video processing parameter from the video request in step S303 a.
In step S304a, the server obtains the motion sequence data corresponding to the target video based on the target video identifier. When there is no motion sequence data corresponding to the target video in the database 106, the target video is processed according to the flow shown in fig. 3 to obtain one or more motion sequence data.
In step S305a, the server processes the motion sequence data corresponding to the target video based on the video processing parameters to obtain target motion sequence data, and the processing procedure is described in the foregoing, which is not repeated herein, and at this time, one or more target motion sequence data are obtained.
In step S306a, the server sends the target action sequence data to the client that sent the video request.
In step S205a, after receiving the target motion sequence data, the client drives the virtual digital person to generate a corresponding virtual digital person motion video based on the target motion sequence data. When a plurality of target motion sequence data exist, a plurality of virtual digital human motion videos are generated.
Step S206a, playing the virtual digital human action video. The client plays the target video and the virtual human action video in a default picture structure, for example, when the target video is in a front visual orientation, the front and the rotated or enlarged video are synchronously played, and when the target video is in a back visual orientation, the back and the rotated or enlarged video are synchronously played, see fig. 6, fig. 6 is a schematic diagram of a play picture structure of a screen of the terminal device according to an embodiment of the present invention. Or front, back and rotated/enlarged video synchronized play, see fig. 7. Fig. 7 is a schematic diagram of a play frame structure of a terminal device screen according to another embodiment of the present invention. Of course, the picture structure may also be set by the user as a video playing parameter, and when the user sets the picture structure in the video playing parameter, the picture structure set by the user is preferentially adopted. The picture structure can be formed by a plurality of playing windows or a composite video played by one playing window. When synthesizing video, synthesis is performed according to the layout of the picture structure.
Application example II
Fig. 8 is a flowchart of an interactive method based on video teaching according to a second embodiment of the present invention. In this application embodiment, a video application client is installed in the terminal device 102, and a video application server is installed in the server 104. The overall interaction steps of this embodiment are the same as those of the first application embodiment, except that when the client sends the video request in step S204b, the video request does not include the video processing parameters, and after obtaining the motion sequence data corresponding to the target video, the server sends the motion sequence data corresponding to the target video to the client in step S305b. In step S205b, the client processes the motion sequence data corresponding to the target video based on the video processing parameters to obtain the target motion sequence data. Fig. 9 is a flowchart of a method by which the client of the second embodiment processes the motion sequence data corresponding to the target video based on the video processing parameters to obtain the target motion sequence data. The method comprises the following steps:
Step S2051b, respectively identifies a first visual orientation of the target object in the target video and a second visual orientation of the original motion sequence data.
Step S2052b, determining whether the second visual orientation is the same as the first visual orientation, if so, in step S2053b, calculating three-dimensional coordinates of the target object skeleton key point in each frame in the original motion sequence data based on the video processing parameters to obtain target three-dimensional coordinates, where the target three-dimensional coordinates of the target object skeleton key point in each frame constitute target motion sequence data; if the second visual orientation and the first visual orientation are not the same, then in step S2054b, an orientation difference of the second visual orientation and the first visual orientation is calculated.
Step S2055b, calculating the three-dimensional coordinates of the target object skeleton key point in each frame in the original motion sequence data based on the azimuth difference and the video processing parameter to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key point in each frame form the target motion sequence data.
When there are a plurality of original motion sequence data, the original motion sequence data whose visual orientation is the same as the first visual orientation of the target object in the target video is calculated first; then, for each of the other original motion sequence data, the orientation difference between its visual orientation and the first visual orientation of the target object in the target video is determined, and that original motion sequence data is calculated based on the orientation difference and the video processing parameters, thereby obtaining a plurality of target motion sequence data.
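A sketch of this client-side processing, assuming the front, side and back orientations correspond to 0, 90 and 180 degrees and reusing a Y-axis rotation function such as the one sketched earlier; the data layout is an assumption.

```python
def process_on_client(original_sequences, first_visual_orientation, rotation_angle, rotate_fn):
    """For each original motion sequence, compensate the orientation difference from the
    target video's first visual orientation, then apply the requested rotation angle."""
    orientation_angles = {"front": 0, "side": 90, "back": 180}  # assumed mapping
    target_sequences = []
    for orientation, sequence in original_sequences.items():
        difference = orientation_angles[orientation] - orientation_angles[first_visual_orientation]
        processed = []
        for frame in sequence:                            # frames as in the earlier MotionFrame sketch
            joints = rotate_fn(frame.joints, difference)  # compensate the orientation difference
            joints = rotate_fn(joints, rotation_angle)    # apply the requested video processing parameter
            processed.append((frame.frame_id, frame.time_ms, joints))
        target_sequences.append(processed)
    return target_sequences
```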
Then in step S206b, the client drives the virtual digital person to generate a corresponding virtual digital person action video based on the target action sequence data.
When a plurality of target motion sequence data exist, a plurality of virtual digital human motion videos are generated. In step S207b, the virtual digital human action video is played. Wherein, since the step of processing the motion sequence data corresponding to the target video based on the video processing parameter to obtain the target motion sequence data is completed at the client, when the client sends the video request in step S204b, only the target video identifier is included in the request, and the video processing parameter is not included. Other steps, such as steps S101b to S103b implemented by the user, steps S201b to S204b implemented by the client, and steps S301b to S304b implemented by the server, are the same as those of the first embodiment, and are not described herein.
In addition, in the above two application embodiments, when the database 106 of the server side does not have the motion sequence data corresponding to the target video, the server side or the client side may process the target video to obtain the corresponding motion sequence data, and then process the motion sequence data according to the video processing parameters.
In another aspect, the present invention further provides an interaction device based on video teaching, referring to fig. 10, fig. 10 is a schematic block diagram of a first interaction device 100 based on video teaching applied to a server according to an embodiment of the present invention, where the first interaction device 100 includes a parameter obtaining module 110, an action sequence data processing module 120, a first video generating module 130, and a request responding module 140. When the server receives the video request sent by the client, in this embodiment, the parameter obtaining module 110 obtains the target video identifier and the video processing parameter from the video request sent by the client, and sends the target video identifier and the video processing parameter to the action sequence data processing module 120. The video processing parameters include, but are not limited to, one or more of rotation angle, magnification, local magnification location, picture structure, and virtual human figure. The motion sequence data processing module 120 obtains motion sequence data corresponding to a target video from the database 106 based on the target video identifier, wherein the motion sequence data includes motion data composed in a video frame sequence, the motion data of each video frame is composed of three-dimensional coordinates of a skeletal key point of the target object in the target video, and processes the motion sequence data corresponding to the target video based on the video processing parameters to obtain target motion sequence data. The first video generation module 130 is an optional module capable of driving a virtual digital person based on the target motion sequence data to generate a virtual digital person motion video. The request response module 140 may send the target motion sequence data to the client that sends the video request, or send the virtual digital human motion video as the requested video to the client that sends the video request, or send the obtained motion sequence data corresponding to the target video to the client that sends the video request.
When the motion sequence data corresponding to the target video identified by the target video identifier cannot be obtained from the database 106, the motion sequence data processing module 120 may process the target video according to the flow shown in fig. 3 to obtain one or more motion sequence data corresponding to the target video. When there are a plurality of original motion sequence data corresponding to the target video, the motion sequence data processing module 120 processes the plurality of original motion sequence data to obtain a plurality of target motion sequence data. Correspondingly, the first video generating module 130 obtains a plurality of virtual digital human action videos.
For example, for dance video, the motion sequence data corresponding to the target video are front, back and side motion sequence data, when the visual orientation of the target video is front, the motion sequence data processing module 120 calculates the motion sequence data identical to the visual orientation of the current target video according to the video processing parameters to obtain one target motion sequence data, that is, obtains one target motion sequence data of the front according to the motion sequence data of the front, then calculates the motion sequence data of the back according to the 180 degree orientation difference between the front and the back and the video processing parameters to obtain the target motion sequence data of the back, and similarly calculates the motion sequence data of the side according to the 90 degree orientation difference between the front and the side and the video processing parameters to obtain the target motion sequence data of the side.
The first video generating module 130 obtains a front virtual digital person motion video, a back virtual digital person motion video, and a side virtual digital person motion video based on the front target motion sequence data, the back target motion sequence data, and the side target motion sequence data, respectively.
Further, the first video generation module 130 synthesizes the virtual digital human action video with the target video to generate a first synthesized video after generating the virtual digital human action video, at which time the request response module 140 sends the first synthesized video as a requested video to the client that issued the video request.
Optionally, the first video generating module 130 synthesizes the multiple virtual digital human action videos into a second synthesized video after generating the multiple virtual digital human action videos, at this time, the request responding module 140 sends the second synthesized video as the requested video to the client that sends the video request; or the first video generating module 130 synthesizes the plurality of virtual digital human action videos and the target video into a third synthesized video, and the request responding module 140 transmits the third synthesized video as the requested video to the client that issued the video request.
Fig. 11 is a schematic block diagram of a second interaction device 200 based on video teaching applied to a client according to an embodiment of the present invention. The second interaction device 200 includes a user operation acquisition module 210, a data request module 220, a data processing module 230, a second video generation module 240, and a play module 250, where the client includes an interaction module 201 connected to receive a user's instruction to play a target video and to play the target video. The user operation acquisition module 210 is configured to monitor user operations during target video playback and, when the interaction module 201 receives a video processing instruction from the user, to acquire at least the video processing parameters from the video processing instruction. In one embodiment, the data processing module 230 provides video processing parameter options, which are displayed to the user through the interaction module 201; when the user sets the video processing parameter options, the corresponding video processing parameters are acquired. In another embodiment, the data processing module 230 captures the user's operations on the device screen in the interaction module 201 and determines the corresponding video processing parameters based on the preset parameter types corresponding to the screen operations. The video processing parameters include at least one or more of rotation angle, magnification, local magnification location, picture structure, and virtual digital human figure. The user operation acquisition module 210 sends the video processing parameters to the data request module 220.
The interaction module 201 includes, for example, an input unit such as a hardware keyboard, a software keyboard, or a touch unit, and an output unit such as a display screen.
The data request module 220, upon receiving a video processing instruction from the user during target video playback, obtains action sequence data corresponding to the target video from the server. The data obtained from the server may be the original action sequence data corresponding to the target video, one or more pieces of target action sequence data obtained by the server through processing based on the video processing parameters, or one or more virtual digital human action videos generated by the server after processing the original action sequence data corresponding to the target video based on the video processing parameters.
When the motion sequence data obtained from the server by the data request module 220 is the original motion sequence data corresponding to the target video, it is sent to the data processing module 230, which processes the original motion sequence data based on the video processing parameters to obtain one or more pieces of target motion sequence data and sends them to the second video generating module 240. The second video generating module 240 drives the virtual digital person to generate a virtual digital person action video based on the target action sequence data obtained by the data processing module or received from the server, and sends the virtual digital person action video as the requested video to the play module 250. The play module 250 plays the requested video through the display screen in the interaction module 201.
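For illustration, the client-side dispatch just described can be sketched as follows, reusing the placeholder helpers from the earlier server-side sketch; the payload structure ("kind" and "data" fields) is an assumption introduced for the example.

```python
def handle_server_payload(payload: dict, params: dict):
    """Dispatch on what the server returned, mirroring the three cases above."""
    if payload["kind"] == "original_sequence":
        # data processing module 230: derive the target sequence locally, then render
        target = transform_sequence(payload["data"], params)
        return render_virtual_human(target)
    if payload["kind"] == "target_sequence":
        # second video generating module 240: render directly from the server's result
        return render_virtual_human(payload["data"])
    # already a virtual digital human action video: hand it to the play module 250
    return payload["data"]
```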
When the data request module 220 obtains the virtual digital human action video from the server, it sends the virtual digital human action video to the play module 250.
Optionally, when processing the original motion sequence data corresponding to the target video based on the video processing parameters to obtain the target motion sequence data, the data processing module 230 performs the following steps (a minimal numeric sketch follows this list):
respectively identifying a first visual orientation of the target object in the target video and a second visual orientation of the original action sequence data;
in response to the second visual orientation being the same as the first visual orientation, calculating the three-dimensional coordinates of the target object skeleton key points in each frame of the original action sequence data based on the video processing parameters to obtain target three-dimensional coordinates, where the target three-dimensional coordinates of the target object skeleton key points in each frame form the target action sequence data;
in response to the second visual orientation being different from the first visual orientation, calculating the orientation difference between the second visual orientation and the first visual orientation; and
calculating the three-dimensional coordinates of the target object skeleton key points in each frame of the original action sequence data based on the orientation difference and the video processing parameters to obtain target three-dimensional coordinates, where the target three-dimensional coordinates of the target object skeleton key points in each frame form the target action sequence data.
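A minimal numeric sketch of these steps, assuming the two visual orientations are expressed as angles about the vertical axis and that the only video processing parameters considered are a rotation angle and a magnification factor; the function and parameter names are illustrative.

```python
import numpy as np


def to_target_sequence(original: np.ndarray, first_orientation: float,
                       second_orientation: float, rotation: float = 0.0,
                       magnification: float = 1.0) -> np.ndarray:
    """original has shape (num_frames, num_joints, 3); angles are in degrees."""
    # The orientation difference is zero when the two orientations are the same.
    difference = first_orientation - second_orientation
    theta = np.radians(difference + rotation)
    rot = np.array([
        [np.cos(theta),  0.0, np.sin(theta)],
        [0.0,            1.0, 0.0],
        [-np.sin(theta), 0.0, np.cos(theta)],
    ])
    # Transform every frame's keypoints, then apply the magnification factor.
    return (original @ rot.T) * magnification
```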
Optionally, after generating a plurality of virtual digital human action videos, the second video generating module 240 further synthesizes them into a second synthesized video and sends the second synthesized video as the requested video to the play module 250; or synthesizes the plurality of virtual digital human action videos together with the target video into a third synthesized video and sends the third synthesized video as the requested video to the play module 250.
Optionally, when the video processing parameters include a local magnification part, the second video generating module 240 takes the local magnification part as the center of the video frame when generating the virtual digital human action video; and/or, when generating the virtual digital human action video, displays the local magnification part in a circled manner and applies fading or blurring to the region outside the local magnification part.
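As an illustrative OpenCV sketch of the circled display and the fading/blurring of the surrounding area, one frame-level treatment might look as follows; the circle radius, blur kernel size, and highlight color are assumptions.

```python
import cv2
import numpy as np


def highlight_local_part(frame: np.ndarray, center: tuple, radius: int) -> np.ndarray:
    """Keep the local magnification part sharp, blur the rest, and circle the part."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.circle(mask, center, radius, 255, -1)              # region kept sharp
    blurred = cv2.GaussianBlur(frame, (31, 31), 0)         # faded/blurred surroundings
    out = np.where(mask[..., None] == 255, frame, blurred)
    cv2.circle(out, center, radius, (0, 255, 0), 2)        # circled display
    return out
```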
When generating the virtual digital person action video, the second video generating module 240 may, with the user's authorization based on the target video, generate the virtual digital person figure according to the target object in the target video; or adopt a virtual digital human figure similar to the target object in the target video; or use the virtual digital human figure specified by the user in the video processing parameters.
The invention also provides an electronic device comprising a processor and a memory storing computer program instructions; when the processor executes the computer program instructions, it implements the video teaching-based interaction method of any of the above embodiments. The electronic device in the present invention may be the terminal device 102 in fig. 1 or the server 104 in fig. 1. Fig. 12 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 12, the electronic device may include a processor 601 and a memory 602 storing computer program instructions.
In particular, the processor 601 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.
Memory 602 may include mass storage for data or instructions. By way of example, and not limitation, memory 602 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of the above. The memory 602 may include removable or non-removable (or fixed) media, where appropriate. Memory 602 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 602 is a non-volatile solid-state memory.
The memory may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions; when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to aspects of the present disclosure.
The processor 601 reads and executes the computer program instructions stored in the memory 602 to implement any of the video teaching based interaction methods of the above embodiments.
In one example, the electronic device may also include a communication interface 603 and a bus 610. As shown in fig. 12, the processor 601, the memory 602, and the communication interface 603 are connected to each other through a bus 610 and perform communication with each other. The electronic device in the embodiment of the invention can be a server or other computing devices, and also can be a cloud server.
The communication interface 603 is mainly used for implementing communication among the modules, apparatuses, units, and/or devices in the embodiment of the present invention.
Bus 610 includes hardware, software, or both, coupling the components of the electronic device to each other. By way of example, and not limitation, bus 610 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of the above. Bus 610 may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.
In addition, in combination with the video teaching-based interaction method in the above embodiments, an embodiment of the present invention may be implemented by providing a computer storage medium. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the video teaching-based interaction methods of the above embodiments.
The invention also provides a computer program product comprising computer program instructions which, when executed by a processor, implement any of the video teaching based interaction methods of the above embodiments. The computer program product is, for example, a software installation package, a plug-in, etc.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
The functional blocks shown in the above structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), suitable firmware, plug-ins, function cards, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include non-transitory computer-readable storage media such as electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, and fiber optic media; the machine-readable medium may also include Radio Frequency (RF) links and the like. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims (21)

1. The interactive method based on video teaching is characterized by being applied to a server, and when a video request from a client is received in a target video playing process, the interactive method comprises the following steps:
acquiring a target video identification and video processing parameters from the video request;
acquiring action sequence data corresponding to a target video based on a target video identifier, wherein the action sequence data comprises action data formed according to a video frame sequence, and the action data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video;
processing action sequence data corresponding to the target video based on the video processing parameters to obtain target action sequence data; and
transmitting the target action sequence data to a client side which sends the video request; or driving a virtual digital person based on the target motion sequence data to generate a virtual digital person motion video, and sending the virtual digital person motion video to a client side sending the video request as a requested video;
wherein the step of processing the motion sequence data corresponding to the target video based on the video processing parameters to obtain the target motion sequence data comprises:
identifying a visual orientation of a target object in a target video;
acquiring original action sequence data of a target object with the same visual orientation; and
and calculating the three-dimensional coordinates of the target object skeleton key points in each video frame in the original motion sequence data based on the video processing parameters to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key points of each video frame form target motion sequence data.
2. The video-teaching based interaction method of claim 1, further comprising, after generating the virtual digital human action video: and synthesizing the virtual digital human action video and the target video together to generate a first synthesized video, and sending the first synthesized video to a client side sending the video request as a requested video.
3. The interactive method according to claim 1, wherein when the video processing parameters are plural, plural target motion sequence data are obtained based on different video processing parameters and the original motion sequence data, respectively; or alternatively
obtaining a plurality of target action sequence data based on the video processing parameters and the original action sequence data of a plurality of different visual orientations corresponding to the target video;
correspondingly, the virtual digital person is driven based on the plurality of target motion sequence data to generate a plurality of virtual digital person motion videos.
4. The interactive video teaching-based method according to claim 3, wherein a plurality of virtual digital human action videos are synthesized into a second synthesized video after being generated, and the second synthesized video is transmitted as a requested video to a client that makes the video request; or synthesizing the plurality of virtual digital human action videos and the target video into a third synthesized video, and sending the third synthesized video to a client side sending the video request as the requested video.
5. The interactive method according to claim 1, wherein the video processing parameters include at least one or more of rotation angle, magnification, local magnification location, picture structure, and virtual digital character.
6. The interactive method according to claim 5, wherein when the video processing parameters include a local enlargement part, the local enlargement part is taken as the center of a video frame when generating a virtual digital human action video; and/or when the virtual digital human action video is generated, the local enlargement part is displayed in a circled manner, and the area outside the local enlargement part is subjected to fading or blurring treatment.
7. The video-teaching based interaction method of claim 1, further comprising: providing authorization of a user based on the target video, and generating a virtual digital human figure according to a target object in the target video; or adopting a virtual digital human figure which is similar to a target object in the target video; or use a user-specified virtual digital persona in the video processing parameters.
8. An interactive method based on video teaching, which is applied to a client, the method comprising:
responding to a video processing instruction received by a user in a target video playing process, acquiring action sequence data corresponding to a target video from a server, wherein the video processing instruction comprises video processing parameters, the action sequence data comprises action data formed according to video frame sequences, and the action data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video;
responding to the action sequence data acquired from a server as original action sequence data corresponding to a target video, and processing the original action sequence data based on the video processing parameters to acquire target action sequence data;
driving a virtual digital person based on the target motion sequence data to generate a virtual digital person motion video, and taking the virtual digital person motion video as a requested video;
responding to the action sequence data acquired from the server to be target action sequence data processed by the server based on the video processing parameters, driving a virtual digital person based on the target action sequence data to generate a virtual digital person action video, and taking the virtual digital person action video as a requested video; and
playing the requested video;
the step of processing the original motion sequence data corresponding to the target video based on the video processing parameters to obtain target motion sequence data comprises the following steps:
respectively identifying a first visual orientation of a target object in a target video and a second visual orientation of original action sequence data;
responding to the second visual orientation and the first visual orientation to be the same, and calculating the three-dimensional coordinates of the target object skeleton key points in each frame in the original action sequence data based on the video processing parameters to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key points in each frame form target action sequence data;
responsive to the second visual orientation and the first visual orientation being different, calculating an orientation difference of the second visual orientation and the first visual orientation; and
and calculating the three-dimensional coordinates of the target object skeleton key points in each frame in the original action sequence data based on the orientation difference and the video processing parameters to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key points in each frame form target action sequence data.
9. The video-teaching based interaction method of claim 8, further comprising, after generating the virtual digital human action video: and synthesizing the virtual digital human action video and the target video together to generate a first synthesized video, and taking the first synthesized video as a requested video.
10. The video teaching based interaction method according to claim 8, wherein when the target motion sequence data is plural, the virtual digital person is driven based on the plural target motion sequence data to generate plural virtual digital person motion videos, respectively; wherein each target motion sequence data corresponds to one or more video processing parameters or the visual orientation of each target motion sequence data is different.
11. The video-teaching based interaction method of claim 10, further comprising, after generating the plurality of virtual digital human action videos: synthesizing the plurality of virtual digital human action videos into a second synthesized video, and taking the second synthesized video as a requested video; or synthesizing the plurality of virtual digital human action videos and the target video into a third synthesized video, and taking the third synthesized video as the requested video.
12. The interactive method according to claim 8, wherein the video processing parameters include at least one or more of rotation angle, magnification, local magnification location, picture structure, and virtual digital character.
13. The interactive method according to claim 12, wherein when the video processing parameters include a local enlargement part, the local enlargement part is taken as the center of a video frame when generating a virtual digital human action video; and/or when the virtual digital human action video is generated, the local enlargement part is displayed in a circled manner, and the area outside the local enlargement part is subjected to fading or blurring treatment.
14. The video-teaching based interaction method of claim 8, further comprising: providing authorization of a user based on the target video, and generating a virtual digital human figure according to a target object in the target video; or adopting a virtual digital human figure which is similar to a target object in the target video; or use a user-specified virtual digital persona in the video processing parameters.
15. The video-teaching based interaction method of claim 8, further comprising: playing the requested video synchronously with the target video.
16. The video teaching-based interaction method according to claim 8, further comprising, during the playing of the target video: providing video processing parameter options, and correspondingly, acquiring corresponding video processing parameters based on the setting of the user in the video processing parameter options.
17. The video teaching-based interaction method according to claim 8, further comprising, during the playing of the target video: capturing the operation of a user on the screen of the current device, and determining corresponding video processing parameters based on the preset parameter types corresponding to the screen operation.
18. An interaction device based on video teaching is characterized in that the interaction device is applied to a server and comprises:
the parameter acquisition module is configured to acquire a target video identification and video processing parameters from a video request sent by a client;
the motion sequence data processing module is configured to acquire motion sequence data corresponding to a target video based on a target video identifier, wherein the motion sequence data comprises motion data formed according to a video frame sequence, and the motion data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video; processing action sequence data corresponding to the target video based on the video processing parameters to obtain target action sequence data; and
a request response module configured to send the target motion sequence data to a client that issued the video request, or to drive a virtual digital person based on the target motion sequence data to generate a virtual digital person motion video, the virtual digital person motion video being sent as a requested video to the client that issued the video request;
wherein the step of processing the motion sequence data corresponding to the target video based on the video processing parameters to obtain the target motion sequence data comprises:
identifying a visual orientation of a target object in a target video;
acquiring original action sequence data of a target object with the same visual orientation; and
and calculating the three-dimensional coordinates of the target object skeleton key points in each video frame in the original motion sequence data based on the video processing parameters to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key points of each video frame form target motion sequence data.
19. An interactive device based on video teaching, which is applied to a client, comprises:
the user operation acquisition module is configured to monitor user operation in a target video playing process so as to receive a video processing instruction of a user and at least acquire video processing parameters from the video processing instruction;
the data request module is configured to respond to the video processing instruction of a user received in the target video playing process, and acquire action sequence data corresponding to a target video from a server, wherein the action sequence data comprises action data formed according to the sequence of video frames, and the action data of each video frame is formed by three-dimensional coordinates of a target object skeleton key point in the target video;
the data processing module is configured to respond to the action sequence data acquired from the server as original action sequence data corresponding to the target video, and process the original action sequence data based on the video processing parameters to acquire target action sequence data;
the video generation module is configured to drive the virtual digital person to generate a virtual digital person action video based on the target action sequence data obtained by the data processing module or the target action sequence data received from the server, and the virtual digital person action video is used as a requested video; and
a play module configured to play the requested video;
the step of processing the original motion sequence data based on the video processing parameters to obtain target motion sequence data comprises the following steps:
respectively identifying a first visual orientation of a target object in a target video and a second visual orientation of original action sequence data;
responding to the second visual orientation and the first visual orientation to be the same, and calculating the three-dimensional coordinates of the target object skeleton key points in each frame in the original action sequence data based on the video processing parameters to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key points in each frame form target action sequence data;
responsive to the second visual orientation and the first visual orientation being different, calculating an orientation difference of the second visual orientation and the first visual orientation; and
and calculating the three-dimensional coordinates of the target object skeleton key points in each frame in the original action sequence data based on the orientation difference and the video processing parameters to obtain target three-dimensional coordinates, wherein the target three-dimensional coordinates of the target object skeleton key points in each frame form target action sequence data.
20. An electronic device comprising a processor and a memory storing computer program instructions; the electronic device, when executing the computer program instructions, implements the video-teaching-based interaction method applied to a server according to any one of claims 1 to 7, or implements the video-teaching-based interaction method applied to a client according to any one of claims 8 to 17.
21. A computer-readable storage medium, having stored thereon computer program instructions which, when executed by a processor, implement the video-teaching based interaction method applied to a server according to any of claims 1-7 or the video-teaching based interaction method applied to a client according to any of claims 8-17.
CN202311228761.8A 2023-09-22 2023-09-22 Interaction method, device, equipment and storage medium based on video teaching Active CN116980654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311228761.8A CN116980654B (en) 2023-09-22 2023-09-22 Interaction method, device, equipment and storage medium based on video teaching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311228761.8A CN116980654B (en) 2023-09-22 2023-09-22 Interaction method, device, equipment and storage medium based on video teaching

Publications (2)

Publication Number Publication Date
CN116980654A CN116980654A (en) 2023-10-31
CN116980654B true CN116980654B (en) 2024-01-19

Family

ID=88475322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311228761.8A Active CN116980654B (en) 2023-09-22 2023-09-22 Interaction method, device, equipment and storage medium based on video teaching

Country Status (1)

Country Link
CN (1) CN116980654B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860243A (en) * 2020-07-07 2020-10-30 华中师范大学 Robot action sequence generation method
CN113596537A (en) * 2020-04-30 2021-11-02 聚好看科技股份有限公司 Display device and playing speed method
CN113642394A (en) * 2021-07-07 2021-11-12 北京搜狗科技发展有限公司 Action processing method, device and medium for virtual object
CN113656640A (en) * 2021-08-23 2021-11-16 成都拟合未来科技有限公司 Fitness training method, system, device and medium
CN114245210A (en) * 2021-09-22 2022-03-25 北京字节跳动网络技术有限公司 Video playing method, device, equipment and storage medium
CN114399713A (en) * 2022-01-11 2022-04-26 袁鲁荣 Online dance teaching action analysis system
CN114842547A (en) * 2022-01-11 2022-08-02 南京工业大学 Sign language teaching method, device and system based on gesture action generation and recognition
CN116320534A (en) * 2023-03-23 2023-06-23 北京卡路里信息技术有限公司 Video production method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200185006A1 (en) * 2018-12-06 2020-06-11 Ran Tene System and method for presenting a visual instructional video sequence according to features of the video sequence

Also Published As

Publication number Publication date
CN116980654A (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN111556278B (en) Video processing method, video display device and storage medium
KR20220138398A (en) Creating Augmented Reality Sessions Using Skeleton Tracking
EP3889912B1 (en) Method and apparatus for generating video
US20180103201A1 (en) Device and method for panoramic image processing
CN116457829A (en) Personalized avatar real-time motion capture
CN116508063A (en) Body animation sharing and remixing
JP2005250950A (en) Marker presentation portable terminal, expanded sense of reality system, and its operation method
JP2008521110A (en) Personal device with image capture function for augmented reality resources application and method thereof
US10887195B2 (en) Computer system, remote control notification method and program
KR20170012979A (en) Electronic device and method for sharing image content
CN111242704B (en) Method and electronic equipment for superposing live character images in real scene
CN116546149B (en) Dance teaching interaction method, device, equipment and medium based on virtual digital person
CN110930220A (en) Display method, display device, terminal equipment and medium
CN112866773B (en) Display equipment and camera tracking method in multi-person scene
CN112073770B (en) Display device and video communication data processing method
CN111290722A (en) Screen sharing method, device and system, electronic equipment and storage medium
EP1667010A1 (en) Information processing device for setting background image, information display method, and program
CN116980654B (en) Interaction method, device, equipment and storage medium based on video teaching
CN107995538B (en) Video annotation method and system
CN113875227A (en) Information processing apparatus, information processing method, and program
CN111939561B (en) Display device and interaction method
CN116980717B (en) Interaction method, device, equipment and storage medium based on video decomposition processing
CN113076436A (en) VR device theme background recommendation method and system
WO2024051467A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN113587812B (en) Display equipment, measuring method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant