WO2022116751A1 - Interaction method and apparatus, and terminal, server and storage medium - Google Patents

Interaction method and apparatus, and terminal, server and storage medium

Info

Publication number
WO2022116751A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
frame data
body part
information
action icon
Prior art date
Application number
PCT/CN2021/127010
Other languages
French (fr)
Chinese (zh)
Inventor
Cong Yandong (丛延东)
Original Assignee
Beijing ByteDance Network Technology Co., Ltd. (北京字节跳动网络技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co., Ltd.
Publication of WO2022116751A1 publication Critical patent/WO2022116751A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/235 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an interaction method, device, terminal, server and storage medium.
  • body recognition technology, as a branch of computer vision processing technology, has an increasingly wide range of applications, such as video-based fitness training, video-based dance teaching, and video-based game experiences. How to apply the body recognition results of user images collected by a camera to the guidance and evaluation of the user's body movements, so as to improve the user's action experience, remains a problem to be solved.
  • the embodiments of the present disclosure provide an interaction method, apparatus, terminal, server, and storage medium, which can improve user interaction experience.
  • an embodiment of the present disclosure provides an interaction method, which is applied to a client, including:
  • the second image frame data is image frame data at a preset time point after the first image frame data
  • the evaluation result is determined according to the degree of matching between the state information of the target human body part and the action icon in the second image frame data.
  • an embodiment of the present disclosure further provides an interaction method, applied to a server, including:
  • the position data of human body parts of the same image frame in the plurality of candidate videos are fused to obtain a standard position data set;
  • the preset position information of the action icon corresponding to the target body part is determined by using the searched position data, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
  • an embodiment of the present disclosure further provides an interaction device, which is configured on a client and includes:
  • a first acquisition module configured to collect and display the first image frame data of the user
  • a first determining module configured to identify at least one human body part in the first image frame data, and determine the position information of the human body part
  • a display position determination module configured to determine the display position of the action icon based on the position information of the at least one human body part and the preset position information of the action icon corresponding to the human body part, and to display the action icon at the display position;
  • a second collection module configured to collect and display second image frame data of the user; wherein, the second image frame data is image frame data at a preset time point after the first image frame data;
  • a second determination module configured to determine the target human body part associated with the action icon in the second image frame data and the state information of the target human body part
  • An evaluation module configured to determine an evaluation result according to the degree of matching between the state information of the target human body part in the second image frame data and the action icon.
  • an embodiment of the present disclosure further provides an interaction device, which is configured on a server and includes:
  • a position data extraction module configured to obtain a plurality of candidate videos, and extract the position data of human body parts of each image frame in the plurality of candidate videos;
  • a standard position data set determination module configured to fuse the body part position data of the same image frame in the plurality of candidate videos based on preset rules to obtain a standard position data set
  • a position data search module configured to search for the position data of the target body part in the standard position data set in at least one image frame of the plurality of candidate videos
  • a preset position information determination module configured to determine the preset position information of the action icon corresponding to the target body part by using the searched position data, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
  • an embodiment of the present disclosure further provides a terminal, including a memory, a processor, and a camera, wherein:
  • the camera is used to collect the user's image frame data in real time
  • a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes any interaction method provided by the embodiments of the present disclosure.
  • an embodiment of the present disclosure further provides a server, including a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes any interaction method provided by the embodiments of the present disclosure.
  • an embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the processor executes any interaction method provided by the embodiments of the present disclosure.
  • an embodiment of the present disclosure further provides a computer program product, wherein the computer program product includes computer program instructions, and when executed by a processor, the computer program instructions cause the processor to execute any interaction method provided by the embodiments of the present disclosure.
  • the client can call the camera to collect the first image frame data and the second image frame data of the user in real time and display them. The first image frame data is the image frame data collected earlier: the client first identifies, in real time, the human body part in the first image frame data and determines its position information, and then combines this with the preset position information of the action icon to determine the exact display position of the action icon on the first image frame data. That is, as the position of the human body part changes, the display position of the action icon on the first image frame data can be adjusted (or corrected) in real time. Finally, the evaluation result is determined according to the degree of matching between the state information of the target human body part associated with the action icon in the second image frame data and the action icon.
  • the embodiment of the present disclosure realizes the effective combination of the user image frame data collected by the camera and the action icon to be displayed in the image frame data, dynamically adjusts the display position of the action icon according to the position of the user's body part, and accurately evaluates the state of the user's body part information to improve the user's interactive experience.
  • FIG. 1 is a flowchart of an interaction method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of image frame data displaying action icons according to an embodiment of the present disclosure
  • FIG. 3 is a flowchart of another interaction method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of image frame data showing an action icon and a guiding video animation provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of image frame data showing an animation of an evaluation result provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of displaying a shared video on the same screen according to an embodiment of the present disclosure
  • FIG. 7 is a flowchart of another interaction method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an interaction apparatus according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of another interaction apparatus provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
  • FIG. 1 is a flowchart of an interaction method provided by an embodiment of the present disclosure, which is applied to a client.
  • the method is applicable to scenarios in which the user image frame data collected by the camera in real time needs to be combined with the action icons to be displayed on that image frame data, and the state information of the human body part in the collected user image frame data needs to be evaluated.
  • the method can be executed by an interactive device configured on the client, and the device can be implemented by software and/or hardware.
  • the client mentioned in the embodiment of the present disclosure may include any client with a video interaction function, and the terminal device on which the client is installed may include, but not limited to, a smart phone, a tablet computer, a notebook, and the like.
  • the types of the state information of the user's body parts may include, but are not limited to, state information of body parts related to dance games, dance training, fitness movements, teaching actions, and the like; that is, the embodiments of the present disclosure are applicable to application scenarios such as games, fitness, and teaching.
  • the interaction method provided by the embodiment of the present disclosure may include S101-S106:
  • S101 Collect and display first image frame data of a user.
  • the user can pre-select the entire set of action video content to be completed and, before starting to execute the relevant actions, touch the image capture control (or video recording control) on the client interface to trigger an image capture request; in response, the client calls the camera to collect the user's image frame data in real time and display it on the interface.
  • the first image frame data may be any image frame data collected by the camera in real time, and the word "first" does not have any limited meaning in order.
  • the body parts identified in the first image frame data include at least one of a head, an arm, a hand, a foot, and a leg.
  • in S102, human body recognition technology can be used to identify the human body parts in the collected user image frame data in real time and simultaneously determine the position information of those parts; the position information may specifically be the position information of key points on the human body parts.
  • for the implementation principle of the human body recognition technology, reference may be made to the prior art, which is not specifically limited in the embodiments of the present disclosure.
  • the preset position information of the action icon is used to constrain the display position of the action icon in the user image frame data, and can be pre-determined by the server in the development stage, and then delivered to the client.
  • the preset position information of the action icon may include relative position information between the to-be-displayed position of the action icon and the corresponding body part.
  • the client may determine whether an action icon needs to be displayed in the currently collected first image frame data based on the collection time information (or video recording time information) of the user's first image frame data. For example, when recording a dance action video with a duration of 30 seconds, it may be preset that the action icon needs to be displayed in the user image frame data collected in real time at the 5th, 15th, and 25th seconds of the recording. Therefore, while the user completes the dance actions, the client can determine whether the action icon needs to be displayed in the current image frame data based on the collection time information of the user's current image frame data or the current recording time of the dance action video. In addition, the collection time information of the image frame data and the video recording time information can be determined from each other: if the client records the collection time of the first collected frame of user image data as 0 seconds, the collection time of the current image frame data is also the current recording time of the video.
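  • as an illustrative sketch of the time-based decision above (not the patent's actual implementation; the preset times, tolerance window, and function name are assumptions for illustration), the client could map the current recording time to a show/hide decision roughly as follows:

```python
# Hypothetical sketch: decide whether an action icon should be displayed in
# the frame currently being collected, based on the video recording time.
# The preset display times (5 s, 15 s, 25 s) come from the example above;
# the tolerance window is an assumption.
PRESET_DISPLAY_TIMES = (5.0, 15.0, 25.0)  # seconds into the recording
DISPLAY_WINDOW = 0.5                      # tolerance around each preset time

def should_display_icon(frame_time: float) -> bool:
    """frame_time: collection time of the current frame, in seconds,
    with the first collected frame recorded as 0 seconds."""
    return any(abs(frame_time - t) <= DISPLAY_WINDOW
               for t in PRESET_DISPLAY_TIMES)
```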
  • the client can also determine whether the action icon needs to be displayed in the current image frame data based on a predetermined display correspondence between specific image frame data and the action icon. For example, if the client collects user image frame data showing a specified body movement, it displays the action icon in that image frame data, where the specified body movement is the body movement that the display correspondence requires to be present in the image frame data.
  • the client dynamically determines the display position of the action icon in the user's image frame data, thereby ensuring that the action icon is displayed accurately in the user's image frame data.
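  • since the preset position information is described as relative position information between the icon and its body part, the display position could be computed, in a minimal hypothetical sketch (names and the 2D pixel-coordinate convention are assumptions), by offsetting the recognized body part position:

```python
def icon_display_position(part_pos, preset_offset):
    """Combine the recognized body part position with the icon's preset
    relative position (delivered by the server) to get the display position.

    part_pos:      (x, y) pixel position of the body part in the current frame
    preset_offset: (dx, dy) of the icon relative to that body part
    """
    return (part_pos[0] + preset_offset[0], part_pos[1] + preset_offset[1])
```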
  • the preset position information of the action icons may also be called chart information, which defines the relative position information of the human body parts and the action icons, and the action icons may also be called note points.
  • the preset position information of the action icon may be obtained based on the position data of the body part corresponding to the action icon in the standard data set.
  • the image frame data in which the action icon needs to be displayed may be predetermined according to the display requirement of the action icon (for example, the action icon is displayed at a specified moment of video recording).
  • for example, the developer can pre-determine that when the dance game progresses to the Nth second, an action icon is displayed at a preset position in the user's image frame data, such as the shoulder; the developer then determines the relative position information between the action icon and the preset position, based on the position data of the preset position in the standard data set for the image frame data of the Nth second, as the preset position information of the action icon.
  • the standard dataset is obtained by fusing the position data of human body parts of the same image frame in multiple candidate videos (referring to at least two) based on preset rules.
  • the same image frame in the multiple candidate videos presents the same state information of human body parts, for example, the same human body action information; for instance, dance videos of the same dance recorded by different people can all be used as candidate videos.
  • the position data of human body parts in each frame of image data in each candidate video can be obtained by using a motion capture system to perform motion capture.
  • the server may determine the weight value of each candidate video; and then, based on the weight value of each candidate video, perform a weighted average calculation on the position data of human body parts of the same image frame in the multiple candidate videos to obtain a standard position data set.
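  • the weighted-average fusion described above can be sketched as follows (a simplified illustration assuming 2D keypoint coordinates and one weight per candidate video; function and variable names are hypothetical):

```python
def fuse_positions(candidate_positions, weights):
    """Weighted average of the position data of the same body part in the
    same image frame across multiple candidate videos.

    candidate_positions: one (x, y) per candidate video
    weights:             one weight per candidate video
    """
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(candidate_positions, weights)) / total
    y = sum(w * p[1] for p, w in zip(candidate_positions, weights)) / total
    return (x, y)
```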
  • the weight value of each candidate video can be determined according to video interaction information and/or video publisher information. For example, the more video interaction information a video has, the higher its weight value; likewise, if the video publisher is a well-known person, the video's weight value is higher.
  • the plurality of candidate videos may be obtained based on preset video screening information, the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes the likes and/or comments of the videos.
  • Internet data can be screened to select published videos whose number of likes exceeds the like threshold and whose number of comments exceeds the comment threshold as candidate videos.
  • Each threshold can be flexibly set.
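  • the screening step might look like the following sketch (the threshold values and field names are purely illustrative assumptions; as noted above, each threshold can be set flexibly):

```python
LIKE_THRESHOLD = 10_000   # illustrative value only
COMMENT_THRESHOLD = 1_000  # illustrative value only

def select_candidates(videos):
    """videos: iterable of dicts with 'likes' and 'comments' counts.
    Keep only published videos exceeding both thresholds."""
    return [v for v in videos
            if v["likes"] > LIKE_THRESHOLD and v["comments"] > COMMENT_THRESHOLD]
```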
  • a standard position data set is thus obtained that integrates the position characteristics of the human body parts of different people, reasonably optimizes the position information of human body parts displayed in the video, improves the video quality, and optimizes the placement of action icons. At the same time, it also helps to improve the public's recognition and acceptance of the optimized video effects.
  • the action icon can be displayed in the user image frame data in any available style, and the display style can include the shape, color, dynamic effect, and static effect of the action icon, which can be designed in advance and are not specifically limited in the embodiments of the present disclosure.
  • FIG. 2 is a schematic diagram of image frame data showing an action icon provided by an embodiment of the present disclosure, which is used to illustrate the embodiment of the present disclosure and should not be construed as a specific limitation to the embodiment of the present disclosure.
  • the current image frame data of the user displays a circular first action icon 21 and an arrow-shaped second action icon 22 .
  • the first action icon 21 may be used to guide the user to move the hand to the position of the first action icon 21, and the second action icon 22 may be used to guide the user to swipe the hand in the direction of the arrow.
  • the number of action icons that can be displayed in each image frame data is not specifically limited in this embodiment of the present disclosure.
  • S104 Collect and display second image frame data of the user; wherein, the second image frame data is image frame data at a preset time point after the first image frame data.
  • the collection interval between the second image frame data and the first image frame data is not specifically limited in this embodiment of the present disclosure, that is, the specific value of the preset time point can be set flexibly.
  • neither the second image frame data nor the first image frame data refers to one specific frame of image data; each can refer to multiple frames of image data, although there is an order of image acquisition between them.
  • the state information of the user's body part displayed in the first image frame data and the second image frame data may change continuously.
  • the action icon displayed in the first image frame data may continue to be displayed in the second image frame data, or may not be displayed based on the determined display position of the action icon.
  • the collection time interval between the first image frame data and the second image frame data is usually very small. Therefore, continuing to display the action icon in the second image frame data based on the determined display position will not cause a large change in the display position of the action icon; that is, the display positions of the action icon in the first image frame data and the second image frame data are consistent to a certain extent.
  • at least one human body part in the second image frame data can also be identified and its position information determined; then, based on the position information of the at least one human body part and the preset position information of the action icon corresponding to that human body part, the display position of the action icon in the second image frame data is determined, and the action icon is displayed there.
  • the action icon includes an emoticon icon, and in the process of displaying the first image frame data or displaying the second image frame data, it further includes:
  • the display position of the expression icon is determined, and the expression icon is displayed at the determined display position.
  • for example, if the user pouts, the expression icon matching the pouting expression is determined to be a "heart" or a "kiss"; then, based on facial expression recognition technology, the position of the user's mouth is identified, a preset area around the mouth (which can be set flexibly) is determined as the display position of the "heart" or "kiss", and the corresponding special-effect icon is displayed in that preset area, making the interaction more interesting.
  • S105 Determine the target human body part associated with the action icon in the second image frame data and the state information of the target human body part.
  • the target body part associated with the action icon in the second image frame data is related to the action video content to be completed pre-selected by the user.
  • the client may determine the target human body part associated with the action icon in the second image frame data based on the playback time information of the background music or the collection time information of the second image frame data; the target human body part may include at least one of a head, arms, hands, feet, and legs. For example, when the background music plays to the Nth second, or the collection time of the second image frame data is the Nth second, the target human body part associated with the action icon in the second image frame data is determined to be the user's hand.
  • the state information of the target body part includes position information of the target body part and/or action information formed by the target body part. For example, when the background music is played to the Nth second, or the collection time of the second image frame data is the Nth second, the user's hand is placed on the user's shoulder, or the user's hand presents an OK gesture, or the user's hand presents a clapping. action etc.
  • S106 Determine the evaluation result according to the matching degree between the state information of the target human body part and the action icon in the second image frame data.
  • the client may determine the user's evaluation result in the second image frame data based on the matching results of multiple dimensions. The higher the matching degree, the better the evaluation result.
  • the evaluation result can be displayed in the second image frame data.
  • the evaluation results can be presented in the form of numbers, text, and/or letters, and dynamic special effects can also be added during presentation to enhance the visual effect of the interface.
  • after the client determines the user's evaluation result in the current image frame data, it can also combine it with the user's evaluation results in previously collected image frame data to determine the user's cumulative evaluation result and display it.
  • the effective response area of the action icon may be determined according to the display position and/or display style of the action icon. For example, an area with a preset size and a preset shape may be determined based on the display position of the action icon as its effective response area; or the shape area corresponding to the display style of the action icon may be determined as its effective response area; or, based on the shape area of the action icon, an area of a preset shape smaller or larger than that shape area may be determined as its effective response area; or the effective response area may be determined based on both the display position and the display style of the action icon. This can be set flexibly, and how to determine the effective response area of the action icon may be predetermined by the server.
  • when the position of the human body part falls within the effective response area of the action icon, the position matching degree is high; otherwise, the position matching degree is poor. It can be seen that the higher the position matching degree, the better the evaluation result.
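  • one hypothetical realization of the position matching degree, assuming a circular effective response area centred on the icon's display position (the linear falloff is an assumption, not the patent's method):

```python
def position_match(part_pos, icon_pos, radius):
    """Matching degree in [0, 1]: 1.0 when the body part is at the centre of
    the icon's circular effective response area, falling off linearly to 0.0
    at (or beyond) the boundary of radius `radius`."""
    dx = part_pos[0] - icon_pos[0]
    dy = part_pos[1] - icon_pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    return max(0.0, 1.0 - dist / radius)
```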
  • other manners of determining the position matching degree between the target human body part and the action icon can also be flexibly adopted by those skilled in the art.
  • taking the case where the state information of the target human body part associated with the action icon includes the action information formed by the human body part as an example, determining the evaluation result according to the degree of matching between the state information of the target human body part in the second image frame data and the action icon includes:
  • the action information formed by the body parts includes but is not limited to dance game action information.
  • the action matching degree between the action information formed by the target human body part in the second image frame data and the standard action information can be determined. For example, for an OK gesture, the key point coordinates of the user's hand when presenting the gesture can be extracted and compared with the hand key point coordinates corresponding to the standard OK gesture to determine the action matching degree.
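  • the keypoint comparison could be sketched like this (a simplified illustration; a real system would normalize for translation and scale before comparing, and the tolerance is an assumed parameter):

```python
def action_match(user_kps, standard_kps, tolerance):
    """Matching degree in [0, 1] from the mean distance between corresponding
    key points (e.g., hand key points of the user's OK gesture vs. the
    standard OK gesture). Smaller mean distance -> higher matching degree."""
    dists = [((u[0] - s[0]) ** 2 + (u[1] - s[1]) ** 2) ** 0.5
             for u, s in zip(user_kps, standard_kps)]
    mean_dist = sum(dists) / len(dists)
    return max(0.0, 1.0 - mean_dist / tolerance)
```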
  • the client can call the camera to collect and display the first image frame data and the second image frame data of the user in real time, wherein the first image frame data is the image frame data collected earlier. The human body part in the first image frame data is identified in real time and its position information determined; combined with the preset position information of the action icon, the exact display position of the action icon on the first image frame data is then determined. That is, as the position of the human body part changes, the display position of the action icon on the first image frame data can be adjusted (or corrected) in real time. Finally, the evaluation result is determined according to the degree of matching between the state information of the target human body part associated with the action icon in the second image frame data and the action icon.
  • the embodiment of the present disclosure realizes the effective combination of the user image frame data collected by the camera and the action icon to be displayed in the image frame data, dynamically adjusts the display position of the action icon according to the position of the user's body part, and accurately evaluates the state of the user's body part information to improve the user's interactive experience.
  • FIG. 3 is a flowchart of another interaction method provided by an embodiment of the present disclosure, which is further optimized and expanded based on the foregoing technical solution, and may be combined with the foregoing optional implementation manners.
  • the interaction method provided by the embodiment of the present disclosure may include S201-S209:
  • S201 Collect and display the first image frame data of the user.
  • S203 Determine the display position of the action icon based on the position information of at least one body part and the preset position information of the action icon corresponding to the body part.
  • the client can determine the current display style of the action icon according to the playback time information of the current background music or the collection time information of the first image frame data.
  • the display styles of action icons at different time points can be the same or different.
  • the action icon is displayed in a display style at the display position.
  • S206 Collect and display second image frame data of the user; wherein, the second image frame data is image frame data at a preset time point after the first image frame data.
  • the guidance information includes at least one of a guidance video animation, a guidance picture and a guidance instruction.
  • the guiding instructions can also be played in the form of voice.
  • the guidance information may be obtained based on the standard data set in the foregoing embodiments. Taking a guide video animation or a guide picture as an example, it can be obtained by importing the standard data set into a human body model and performing image processing. Specifically, developers can use the server to import the standard data set into the human body model based on existing 3D animation production principles, generate the guide video animation through model rendering (or obtain guide pictures in the form of screenshots), and then deliver them from the server to the client.
  • the standard data set integrates the location characteristics of different people's body parts, and obtaining guidance information based on the standard data set can improve the reference value of the guidance information and improve the public's recognition and acceptance of the guidance information.
  • the guidance information may be directly superimposed and displayed in the second image frame data, or may be displayed in the second image frame data in the form of an independent play window or the like.
  • the specific display position of the guidance information in the second image frame data is not limited in the embodiment of the present disclosure, and may be, for example, the lower right, upper right, upper left, or lower left of the image.
  • the client can also dynamically adjust the display position of the guidance information based on the position of the user's body parts in the image frame data, so as to avoid overlapping display of the body parts and the guidance information. For example, if the client detects that the user's limbs are positioned toward the right in the second image frame data, the guidance information may be displayed toward the left in the second image frame data.
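The side-switching rule just described can be expressed very simply. This sketch assumes the body position is summarized by the horizontal center of the detected body parts; the function name and the left/right return values are illustrative, not from the disclosure.

```python
# Hypothetical sketch: pick the side for the guidance overlay opposite the
# user's body, to avoid overlapping display. Names are illustrative.

def guidance_side(body_center_x, frame_width):
    """Return which side of the frame the guidance information should occupy."""
    if body_center_x > frame_width / 2:  # user biased to the right
        return "left"
    return "right"

print(guidance_side(500, 640))  # user on the right -> guidance on the left
print(guidance_side(100, 640))  # user on the left  -> guidance on the right
```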
  • FIG. 4 is a schematic diagram of image frame data showing action icons and guiding video animation provided by an embodiment of the present disclosure, which is used to illustrate the embodiment of the present disclosure and should not be construed as a specific limitation to the embodiment of the present disclosure.
  • the current image frame data displays a first action icon 21 and a second action icon 22 ; at the same time, a guide video animation 23 is displayed at the lower left of the current image frame data to guide the user to complete correct body movements.
  • S208 Determine the target human body part associated with the action icon and the state information of the target human body part in the second image frame data.
  • S209 Determine the evaluation result according to the matching degree between the state information of the target human body part and the action icon in the second image frame data.
  • the method further includes:
  • the evaluation result animation is determined according to the evaluation result; the specific implementation of the evaluation result animation (or called action determination animation) can be flexibly set, and the embodiment of the present disclosure does not make specific limitations;
  • the animation display position of the evaluation result animation in the second image frame data is determined, and the evaluation result animation is displayed in the animation display position.
  • the display position of the evaluation result animation may or may not coincide with the display position of the action icon.
  • the evaluation result animation can be displayed on the display position of the action icon, and the action icon is hidden at the same time, so as to generate an interface effect of special switching and transformation.
  • FIG. 5 is a schematic diagram of image frame data showing an evaluation result animation provided by an embodiment of the present disclosure, which is used to illustrate the embodiment of the present disclosure and should not be construed as a specific limitation to the embodiment of the present disclosure.
  • the position of the user's hand matches the display position of the action icon at the shoulder to a high degree (that is, the degree of coincidence with the effective response area of the action icon is high), so the evaluation result of the user's hand movement is "perfect". Therefore, a circular evaluation result animation 51 is displayed in the image frame data, and the word "perfect" is displayed in the evaluation result animation.
  • the evaluation result animation 51 can dynamically change the size of the circle and change the display color during the presentation process.
  • Image frame data showing animation of evaluation results can be used as valid video frame data.
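One way to realize the coincidence-based evaluation described above is to score the distance between the target body part and the icon's effective response area, then map the score to a rating. This is a sketch under assumptions: the effective response area is modeled as a circle, and the score thresholds and rating labels ("perfect", "good", "average") are illustrative, not specified by the disclosure.

```python
# Hypothetical sketch: matching degree between a body part position and an
# action icon's effective response area (modeled as a circle), mapped to an
# evaluation result. Thresholds and labels are illustrative assumptions.

def matching_degree(part_pos, icon_center, radius):
    """Matching degree in [0, 1]: 1 at the icon center, 0 at/beyond the
    boundary of the effective response area."""
    dx = part_pos[0] - icon_center[0]
    dy = part_pos[1] - icon_center[1]
    dist = (dx * dx + dy * dy) ** 0.5
    return max(0.0, 1.0 - dist / radius)

def evaluate(degree):
    """Map a matching degree to an evaluation result label."""
    if degree >= 0.8:
        return "perfect"
    if degree >= 0.4:
        return "good"
    return "average"

# Hand exactly on the icon center -> perfect; hand on the boundary -> average.
print(evaluate(matching_degree((100, 100), (100, 100), 50)))
print(evaluate(matching_degree((100, 100), (130, 140), 50)))
```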
  • the method further includes:
  • the first shared video is generated; since the user's image frame data belongs to an image sequence collected in real time, a complete user video can be obtained based on the first image frame data and the second image frame data, and the action icons, guidance information, and evaluation result animations can be displayed in the corresponding image frames of the shared video;
  • a first video sharing request is sent to the server; wherein the first video sharing request carries the first shared video and the user identifier of the sharing object, and the user identifier of the sharing object is used by the server to determine the second shared video shared by the sharing object; the second shared video and the first shared video may be videos of the same action content recorded by different people; the number of sharing objects may be one or more, and correspondingly, the second shared video may refer to one video or multiple videos;
  • the composite video returned by the server is received; wherein, the composite video is obtained by the server synthesizing the first shared video and the second shared video for display on the same screen.
  • the same-screen display can be a left-right split-screen display, or a top-bottom split-screen display. Depending on the number of users participating in the video sharing, the same-screen display method is different.
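The left-right split-screen composition mentioned above can be sketched by concatenating corresponding frames of the two shared videos row by row. This is a minimal illustration using plain nested lists as frames; a real implementation would operate on decoded video frames (e.g., pixel arrays), and the function name is an assumption.

```python
# Hypothetical sketch: left-right same-screen composition of two video frames
# of equal height. Frames are lists of rows; each row is a list of pixels.

def compose_side_by_side(frame_a, frame_b):
    """Concatenate two equal-height frames into one left-right split frame."""
    assert len(frame_a) == len(frame_b), "frames must have equal height"
    return [row_a + row_b for row_a, row_b in zip(frame_a, frame_b)]

a = [[1, 1], [1, 1]]  # 2x2 frame from the sharing initiator's video
b = [[2, 2], [2, 2]]  # 2x2 frame from the sharing object's video
print(compose_side_by_side(a, b))  # [[1, 1, 2, 2], [1, 1, 2, 2]]
```

Applying this per frame pair across the two shared videos yields the composite video; a top-bottom split is the analogous column-wise concatenation (`frame_a + frame_b`).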
  • after the current client generates the first shared video, it can switch from the current interface to a sharing object selection interface according to a sharing object selection operation triggered by the current user, so that the current user can determine at least one sharing object; after obtaining the user identifier of the sharing object selected by the current user, the client switches back to the current interface, and generates and sends a first video sharing request to the server according to a video sharing operation triggered by the current user.
  • the client controlled by the sharing object can also perform the same operations as above, so as to share the second shared video to the server.
  • the client controlled by the current user (i.e., the sharing initiator) and the client controlled by the sharing object may simultaneously send video sharing requests to the server on the basis of communication between the users. After the server completes the video synthesis, it can send the composite video to the client controlled by the current user and the client controlled by the sharing object respectively.
  • FIG. 6 is a schematic diagram of displaying a shared video on the same screen provided by an embodiment of the present disclosure, specifically taking two people participating in video sharing as an example to illustrate the embodiment of the present disclosure, and should not be construed as a specific limitation of the embodiment of the present disclosure.
  • user A and user B are each other's sharing objects, and the client controlled by the sharing initiator and the client controlled by the sharing object can simultaneously display the shared videos of the two.
  • the display position of the action icon is above the shoulder. User A's hand is displayed above the shoulder, that is, the matching degree between the hand position and the display position of the action icon is high, and user A's evaluation result is "perfect"; user B's hand is displayed on the right side of the body, that is, the matching degree between the hand position and the display position of the action icon is low, and user B's evaluation result is "average".
  • different evaluation result animations are shown in Fig. 6.
  • for user A, the evaluation result animation is an animation formed by a star pattern, and the word "perfect" is displayed in the star pattern; for user B, the evaluation result animation is an animation formed by a circular pattern, and the word "average" is displayed in the circular pattern.
  • before displaying the first image frame data, the method further includes:
  • the current mode is switched to the image synchronization sharing mode; that is, in the image synchronization sharing mode, after the current user determines the sharing object, the image frame data obtained locally in real time and the image frame data shared by the sharing object can be displayed on the same screen.
  • the display effect on the same screen can refer to the display effect shown in Figure 6.
  • the method further includes:
  • the first shared image frame data and the second shared image frame data are shared in real time by the sharing object, and the sharing object is predetermined by the user.
  • the synchronous display of the first shared image frame data and the second shared image frame data between different clients can be realized directly through client-to-client interaction, or indirectly through data transfer via the server.
  • the sharing object may be determined before or after the current user triggers the video synchronization operation. After the client controlled by the current user switches from the current mode to the image synchronization sharing mode, it may send a mode switching notification carrying the user identifier of the sharing object to the server, so as to notify the server to forward, in real time, the first shared image frame data and the second shared image frame data shared by the sharing object to the client controlled by the current user.
  • while the client controlled by the current user displays the image frame data obtained in real time, it also shares the image frame data to the server in real time, so that the client controlled by the sharing object can also synchronously display the current user's image frame data after performing the aforementioned operations.
  • Contents such as action icons, guidance information, and evaluation result animations can also be displayed synchronously during the display of image frame data on the same screen.
  • the client currently controlled by the user and the client controlled by the shared object can be switched to the image synchronization sharing mode at the same time on the basis of mutual communication between users.
  • image frame data of different users can be displayed on the same screen in the same client through image sharing and synthesis, which improves the interest of image interaction or video interaction.
  • FIG. 7 is a flowchart of another interaction method provided by an embodiment of the present disclosure, applied to a server, and the method may be executed by an interaction apparatus configured on the server, and the apparatus may be implemented by software and/or hardware.
  • the interaction method applied to the server provided by the embodiment of the present disclosure may be executed in cooperation with the interaction method applied to the client provided by the embodiment of the present disclosure.
  • the interaction method provided by the embodiment of the present disclosure may include S301-S304:
  • S301 Acquire multiple candidate videos, and extract the position data of human body parts of each image frame in the multiple candidate videos.
  • S303 Search for the position data of the target human body part in at least one image frame in the multiple candidate videos in the standard position data set.
  • the standard action information corresponding to the action icon may also be determined based on the action information formed in the image frame by the target body part corresponding to the action icon.
  • acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos including:
  • the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes the amount of likes and/or comments of the video;
  • the interaction method provided by the embodiment of the present disclosure further includes:
  • the guide information includes at least one of a guide video animation, a guide picture and a guide instruction.
  • the position data of human body parts of the same image frame in multiple candidate videos are fused to obtain a standard position data set, including:
  • a weighted average calculation is performed on the position data of the human body parts of the same image frame in the multiple candidate videos to obtain a standard position data set.
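The weighted-average fusion can be sketched as below, where each candidate video contributes its per-frame body part position with a weight (e.g., derived from the video's interaction information). The function name, the (x, y) tuple representation, and the assumption that all candidate videos are frame-aligned are illustrative.

```python
# Hypothetical sketch: fuse per-frame body part positions across candidate
# videos via a weighted average to obtain the standard position data set.
# Assumes the candidate videos are frame-aligned; names are illustrative.

def fuse_positions(videos, weights):
    """Weighted average of body part positions, frame by frame.

    videos:  videos[v][f] is the (x, y) position of the tracked body part
             in frame f of candidate video v.
    weights: one weight per candidate video.
    """
    total = sum(weights)
    n_frames = len(videos[0])
    fused = []
    for f in range(n_frames):
        x = sum(w * videos[v][f][0] for v, w in enumerate(weights)) / total
        y = sum(w * videos[v][f][1] for v, w in enumerate(weights)) / total
        fused.append((x, y))
    return fused

# Two candidate videos, one frame each; the second weighted twice as much.
print(fuse_positions([[(0.0, 0.0)], [(3.0, 6.0)]], [1.0, 2.0]))  # [(2.0, 4.0)]
```

Fusing across many candidate videos in this way averages out individual idiosyncrasies, which is why the standard data set reflects the location characteristics of different people's body parts.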
  • the interaction method provided by the embodiment of the present disclosure further includes:
  • the first video sharing request carries the first shared video and the user identifier of the sharing object, and the first shared video is generated by the client based on the collected first image frame data and second image frame data;
  • the second shared video and the first shared video may include image frames of human body parts showing the same state information
  • the interaction method provided by the embodiment of the present disclosure further includes:
  • the server may determine a standard position data set based on the position data of human body parts of each image frame in the multiple candidate videos, determine the preset position information of the action icon corresponding to the target human body part based on the position data of the target human body part in the standard position data set, and send it to the client, so that the client can dynamically determine the accurate display position of the action icon in the image frame data in combination with the position information of the human body part identified from the currently displayed image frame data, that is, achieve the effect of dynamically adjusting the display position of the action icon in the user's image frame data as the position of the user's body parts changes; at the same time, the client also determines the evaluation result based on the matching degree between the action icon and the state information of the target human body part associated with the action icon in the user's image frame data collected in real time.
  • the embodiment of the present disclosure realizes an effective combination of the user image frame data collected by the camera and the action icon to be displayed in the image frame data, dynamically adjusts the display position of the action icon according to the position of the user's body parts, and accurately evaluates the state information of the user's body parts, thereby improving the user's interactive experience.
  • the shared video of multiple people can be displayed on the same screen in the client, which improves the fun of video sharing.
  • FIG. 8 is a schematic structural diagram of an interaction apparatus according to an embodiment of the present disclosure.
  • the apparatus may be configured in a client, and may be implemented by software and/or hardware.
  • the client mentioned in the embodiment of the present disclosure may include any client with a video interaction function, and the terminal device on which the client is installed may include, but not limited to, a smart phone, a tablet computer, a notebook, and the like.
  • the interaction apparatus 400 may include a first collection module 401, a first determination module 402, a display position determination module 403, a second collection module 404, a second determination module 405, and an evaluation module 406, wherein:
  • the first collection module 401 is used to collect and display the first image frame data of the user
  • a first determining module 402 configured to identify at least one human body part in the first image frame data, and determine the position information of the human body part
  • the display position determination module 403 is configured to determine the display position of the action icon based on the position information of at least one human body part and the preset position information of the action icon corresponding to the human body part, and display the action icon at the display position;
  • the second collection module 404 is configured to collect and display the second image frame data of the user; wherein, the second image frame data is the image frame data at a preset time point after the first image frame data;
  • the second determination module 405 is configured to determine the target human body part associated with the action icon and the state information of the target human body part in the second image frame data;
  • the evaluation module 406 is configured to determine the evaluation result according to the matching degree between the state information of the target human body part and the action icon in the second image frame data.
  • the state information of the target body part includes position information of the target body part and/or action information formed by the target body part.
  • the state information of the target body part includes position information of the target body part
  • Evaluation module 406 includes:
  • an effective response area determination unit used to determine the effective response area of the action icon in the second image frame data
  • the first evaluation result determination unit is configured to determine the position matching degree between the position information of the target human body part and the effective response area of the action icon, and determine the evaluation result according to the position matching degree.
  • the state information of the target body part includes action information formed by the target body part
  • Evaluation module 406 includes:
  • a standard action information determining unit used for determining standard action information corresponding to the action icon
  • the second evaluation result determination unit is configured to determine the action matching degree between the action information formed by the target human body part in the second image frame data and the standard action information, and determine the evaluation result according to the action matching degree.
  • the preset position information of the action icon is obtained based on the position data of the body part corresponding to the action icon in the standard data set;
  • the standard dataset is obtained by fusing the position data of human body parts of the same image frame in multiple candidate videos based on preset rules.
  • the plurality of candidate videos are obtained based on preset video screening information
  • the preset video screening information includes video interaction information and/or video publisher information
  • the video interaction information includes the amount of likes and/or comments of the video.
  • the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
  • the guide information display module is used for displaying guide information on the second image frame data, so as to guide the user to change the state information of the target body part associated with the action icon.
  • the guide information includes at least one of a guide video animation, a guide picture and a guide instruction.
  • the display position determination module 403 includes:
  • a display position determination unit configured to determine the display position of the action icon based on the position information of at least one human body part and the preset position information of the action icon corresponding to the human body part;
  • Action icon display unit used to display the action icon in the display position
  • the action icon display unit includes:
  • a display style determination subunit used for determining the display style of the action icon based on the playback time information of the background music or based on the collection time information of the first image frame data
  • the action icon display subunit is used to display the action icon in the display style in the display position.
  • the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
  • the evaluation result animation determination module is used to determine the evaluation result animation according to the evaluation result
  • the animation display module is used to determine the animation display position of the evaluation result animation in the second image frame data by using the display position of the action icon, and display the evaluation result animation in the animation display position.
  • the second determining module 405 includes:
  • an associated human body part determination unit for determining the target human body part associated with the action icon in the second image frame data
  • a state information determination unit used to determine the state information of the target human body part
  • the associated human body part determining unit is specifically configured to: determine the target human body part associated with the action icon in the second image frame data based on the playing time information of the background music or the collection time information of the second image frame data.
  • the action icon includes an emoticon icon
  • the interaction apparatus 400 provided in this embodiment of the present disclosure further includes:
  • a user expression recognition module used to identify the user expression in the first image frame data or the second image frame data, and determine the expression icon matching the user expression
  • the expression icon display module is used to determine the display position of the expression icon based on the position information of the facial features forming the user's expression on the first image frame data or the second image frame data, and display the expression icon in the determined display position.
  • the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
  • a first shared video generation module configured to generate a first shared video based on the collected first image frame data and second image frame data
  • a sharing request sending module configured to send a first video sharing request to the server according to the user's video sharing operation; wherein, the first video sharing request carries the first sharing video and the user ID of the sharing object, and the user ID of the sharing object is used for The server determines the second shared video shared by the shared object;
  • the composite video receiving module is used to receive the composite video returned by the server; wherein, the composite video is obtained by the server after synthesizing the first shared video and the second shared video and displaying on the same screen.
  • the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
  • the mode switching module is used to switch from the current mode to the image synchronization sharing mode according to the user's image synchronization operation;
  • a first on-screen display module configured to receive the first shared image frame data in real time, and display the first shared image frame data and the first image frame data on the same screen;
  • the second on-screen display module is used to receive the second shared image frame data in real time, and display the second shared image frame data and the second image frame data on the same screen;
  • the first shared image frame data and the second shared image frame data are shared in real time by the sharing object, and the sharing object is predetermined by the user.
  • the action information formed by the body parts includes dance game-like action information.
  • the body part identified in the first image frame data or the second image frame data includes at least one of a head, an arm, a hand, a foot, and a leg.
  • the interaction device configured on the client provided by the embodiment of the present disclosure can execute any interaction method applied to the client provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 9 is a schematic structural diagram of another interaction apparatus provided by an embodiment of the present disclosure.
  • the apparatus may be configured in a server, and may be implemented by software and/or hardware.
  • the interaction apparatus 500 may include a position data extraction module 501, a standard position data set determination module 502, a position data search module 503, and a preset position information determination module 504, wherein:
  • the position data extraction module 501 is used to obtain a plurality of candidate videos, and extract the position data of human body parts of each image frame in the plurality of candidate videos;
  • the standard position data set determination module 502 is configured to fuse the body part position data of the same image frame in the multiple candidate videos based on preset rules to obtain a standard position data set;
  • the position data search module 503 is used to search for the position data of the target human body part in the standard position data set in at least one image frame in the multiple candidate videos;
  • the preset position information determination module 504 is used to determine the preset position information of the action icon corresponding to the target body part by using the searched position data, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
  • the location data extraction module 501 includes:
  • a video screening unit configured to obtain multiple candidate videos based on preset video screening information; wherein the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes video likes and/or the amount of comments;
  • the position data extraction unit is used for extracting the position data of human body parts of each image frame in the multiple candidate videos.
  • the interaction apparatus 500 provided by the embodiment of the present disclosure further includes:
  • a guidance information generation module for generating guidance information based on a standard location data set
  • the guidance information sending module is used to send guidance information to the client, so that the client can display the guidance information on the collected user image frame data, and guide the user to change the state information of the target body part associated with the action icon in the image frame data.
  • the guide information includes at least one of a guide video animation, a guide picture and a guide instruction.
  • the standard location data set determination module 502 includes:
  • a video weight determination unit for determining the weight value of each candidate video
  • the standard position data set determination unit is configured to perform weighted average calculation on the position data of human body parts of the same image frame in the multiple candidate videos based on the weight value of each candidate video to obtain the standard position data set.
  • the interaction apparatus 500 provided by the embodiment of the present disclosure further includes:
  • the video sharing request receiving module is used for receiving the first video sharing request sent by the client; wherein, the first video sharing request carries the first sharing video and the user identifier of the sharing object, and the first sharing video is obtained by the client based on the collected first video image frame data and second image frame data are generated;
  • a shared video determination module configured to determine the second shared video shared by the shared object based on the user identification of the shared object; wherein the second shared video and the first shared video may include image frames of human body parts showing the same state information;
  • a video synthesis module used to synthesize the first shared video and the second shared video into a composite video displayed on the same screen
  • the composite video sending module is used to send the composite video to the client.
  • the interaction apparatus 500 provided by the embodiment of the present disclosure further includes:
  • a first shared image receiving module configured to receive the first shared image frame data shared by the shared object in real time; wherein, the shared object is predetermined by the user;
  • the first shared image sending module is used to send the first shared image frame data to the client in real time, so that the client displays the first shared image frame data and the locally collected first image frame data on the same screen; the first shared image frame data and the first image frame data locally collected by the client can display human body parts with the same state information;
  • the second shared image receiving module is configured to receive the second shared image frame data shared by the shared object in real time; wherein, the shared object is predetermined by the user;
  • the second shared image sending module is used to send the second shared image frame data to the client in real time, so that the client displays the second shared image frame data and the locally collected second image frame data on the same screen; the second shared image frame data and the second image frame data locally collected by the client can display human body parts with the same state information.
  • the interaction device configured on the server provided by the embodiment of the present disclosure can execute the interaction method applied to the server provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 10 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure, which is used to exemplarily describe a terminal that implements the interaction method provided by the embodiment of the present disclosure.
  • the terminals in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the terminal shown in FIG. 10 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the terminal 600 includes one or more processors 601 , a memory 602 and a camera 605 .
  • the camera 605 is used to collect image frame data of the user in real time.
  • Processor 601 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in terminal 600 to perform desired functions.
  • Memory 602 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory, among others.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 601 may execute the program instructions to implement the interaction method applied to the client provided by the embodiments of the present disclosure, and may also implement other desired functions.
  • Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
  • the interaction method applied to the client may include: collecting and displaying first image frame data of a user; identifying at least one human body part in the first image frame data, and determining the position information of the human body part; determining the display position of an action icon based on the position information of the at least one human body part and the preset position information of the action icon corresponding to the body part, and displaying the action icon at the display position; collecting and displaying second image frame data of the user, where the second image frame data is image frame data at a preset time point after the first image frame data; determining the target body part associated with the action icon in the second image frame data and the state information of the target body part; and determining an evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon.
  • terminal 600 may also perform other optional implementations provided by the method embodiments of the present disclosure.
  • the terminal 600 may also include an input device 603 and an output device 604, these components being interconnected by a bus system and/or other form of connection mechanism (not shown).
  • the input device 603 may also include, for example, a keyboard, a mouse, and the like.
  • the output device 604 can output various information to the outside, including the determined distance information, direction information, and the like.
  • the output device 604 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.
  • terminal 600 may also include any other appropriate components according to the specific application.
  • FIG. 11 is a schematic structural diagram of a server provided by an embodiment of the present disclosure, which is used to exemplarily describe a server that implements the interaction method provided by the embodiment of the present disclosure.
  • the server shown in FIG. 11 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • server 700 includes one or more processors 701 and memory 702 .
  • Processor 701 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in server 700 to perform desired functions.
  • Memory 702 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory, among others.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 701 may execute the program instructions to implement the interaction method applied to the server provided by the embodiments of the present disclosure, and may also implement other desired functions.
  • Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
  • the interaction method applied to the server may include: acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos; fusing the body part position data of the same image frame across the multiple candidate videos based on preset rules to obtain a standard position data set; looking up, in the standard position data set, the position data of the target body part in at least one image frame of the multiple candidate videos; and determining, using the found position data, the preset position information of the action icon corresponding to the target body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
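The server-side fusion step can be sketched as follows. For illustration only, the "preset rule" is assumed here to be a per-keypoint average across the candidate videos, and each frame's position data is assumed to be a mapping from body-part names to (x, y) keypoints; neither representation is specified by the disclosure.

```python
# Hypothetical sketch of the server-side fusion into a standard position data set.
from statistics import mean

def build_standard_position_set(candidate_videos):
    """candidate_videos: list of videos; each video is a list of frames;
    each frame maps a body-part name to an (x, y) keypoint."""
    num_frames = min(len(video) for video in candidate_videos)
    standard = []
    for i in range(num_frames):
        frames = [video[i] for video in candidate_videos]
        parts = set().union(*(frame.keys() for frame in frames))
        fused = {}
        for part in parts:
            points = [frame[part] for frame in frames if part in frame]
            # Fuse the same frame's keypoint across candidate videos by averaging.
            fused[part] = (mean(p[0] for p in points), mean(p[1] for p in points))
        standard.append(fused)
    return standard

# Two candidate recordings of the same two-frame sequence.
videos = [
    [{"shoulder": (100, 200)}, {"shoulder": (110, 205)}],
    [{"shoulder": (104, 196)}, {"shoulder": (114, 203)}],
]
standard_set = build_standard_position_set(videos)
print(standard_set[0]["shoulder"])
```

The averaged keypoints then serve as the standard position data set from which the preset position information of each action icon can be derived.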
  • server 700 may also execute other optional implementations provided by the method embodiments of the present disclosure.
  • the server 700 may also include an input device 703 and an output device 704 interconnected by a bus system and/or other form of connection mechanism (not shown).
  • the input device 703 may also include, for example, a keyboard, a mouse, and the like.
  • the output device 704 can output various information to the outside, including the determined distance information, direction information, and the like.
  • the output devices 704 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.
  • server 700 may also include any other appropriate components according to the specific application.
  • the embodiments of the present disclosure may also provide computer program products, which include computer program instructions that, when executed by a processor, cause the processor to execute any of the interaction methods applied to the client or applied to the server provided by the embodiments of the present disclosure.
  • the computer program product may carry program code for performing the operations of the embodiments of the present disclosure, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on a user terminal or server, partly on a user terminal or server, as a stand-alone software package, partly on a user terminal or server and partly on a remote terminal or server, or entirely on a remote terminal or server.
  • an embodiment of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the computer program instructions cause the processor to execute any interaction method applied to the client or applied to the server provided by the embodiments of the present disclosure.
  • the interaction method applied to the client may include: collecting and displaying first image frame data of a user; identifying at least one human body part in the first image frame data, and determining the position information of the human body part; determining the display position of an action icon based on the position information of the at least one human body part and the preset position information of the action icon corresponding to the body part, and displaying the action icon at the display position; collecting and displaying second image frame data of the user, where the second image frame data is image frame data at a preset time point after the first image frame data; determining the target human body part associated with the action icon in the second image frame data and the state information of the target human body part; and determining an evaluation result according to the degree of matching between the state information of the target human body part in the second image frame data and the action icon.
  • the interaction method applied to the server may include: acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos; fusing the body part position data of the same image frame across the multiple candidate videos based on preset rules to obtain a standard position data set; looking up, in the standard position data set, the position data of the target human body part in at least one image frame of the multiple candidate videos; and determining, using the found position data, the preset position information of the action icon corresponding to the target human body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
  • the computer program instructions may also cause the processor to execute other optional implementations provided by the method embodiments of the present disclosure.
  • a computer-readable storage medium can employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present disclosure relate to an interaction method and apparatus, and a terminal, a server and a storage medium. The method may comprise: acquiring first image frame data of a user and displaying the first image frame data; identifying at least one human body part in the first image frame data, and determining position information of the human body part; determining a display position of an action icon on the first image frame data, and displaying the action icon at the display position; collecting second image frame data of the user and displaying the second image frame data; determining a target human body part, associated with the action icon, and state information of the target human body part in the second image frame data; and determining an evaluation result according to the matching degree between the state information of the target human body part in the second image frame data and the action icon. According to the embodiments of the present disclosure, the display position of an action icon can be dynamically adjusted according to the position of a human body part of a user, and the state information of the human body part of the user is accurately evaluated, so that the interaction experience of the user is improved.

Description

Interaction method, apparatus, terminal, server and storage medium

This application claims priority to the Chinese patent application No. 202011399864.7, entitled "Interaction Method, Apparatus, Terminal, Server and Storage Medium", filed with the China Patent Office on December 02, 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical field of image processing, and in particular to an interaction method, apparatus, terminal, server and storage medium.

Background

At present, body recognition technology, as a branch of computer vision processing, is applied in an increasingly wide range of fields, for example, video-based fitness training, video-based dance teaching, and video-based game experiences. How to apply the body recognition results obtained from user images captured by a camera to the guidance and evaluation of the user's body movements, so as to improve the user's action experience, remains a problem to be solved.

Summary of the Invention

In order to solve the above technical problems, or at least partially solve them, the embodiments of the present disclosure provide an interaction method, apparatus, terminal, server and storage medium, which can improve the user's interaction experience.
In a first aspect, an embodiment of the present disclosure provides an interaction method, applied to a client, including:

collecting first image frame data of a user and displaying it;

identifying at least one human body part in the first image frame data, and determining position information of the human body part;

determining a display position of an action icon based on the position information of the at least one human body part and preset position information of the action icon corresponding to the human body part, and displaying the action icon at the display position;

collecting second image frame data of the user and displaying it, wherein the second image frame data is image frame data at a preset time point after the first image frame data;

determining a target human body part associated with the action icon in the second image frame data, and state information of the target human body part; and

determining an evaluation result according to the degree of matching between the state information of the target human body part in the second image frame data and the action icon.
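As an illustration of the last step, the matching degree between the state information of the target body part and the action icon can, in the simplest case, be reduced to the distance between the part's keypoint and the icon's display position at the moment the icon fires. The grades and thresholds below are hypothetical, not taken from the disclosure:

```python
import math

# Hypothetical scoring rule: the closer the target body part is to the action
# icon when the icon fires, the higher the evaluation grade.
def evaluate_match(part_pos, icon_pos, perfect_radius=30, good_radius=60):
    distance = math.dist(part_pos, icon_pos)  # Euclidean distance (Python 3.8+)
    if distance <= perfect_radius:
        return "perfect"
    if distance <= good_radius:
        return "good"
    return "miss"

print(evaluate_match((105, 200), (100, 200)))  # distance 5
print(evaluate_match((150, 200), (100, 200)))  # distance 50
print(evaluate_match((200, 200), (100, 200)))  # distance 100
```

Richer state information (e.g., limb orientation or velocity) could enter the same scoring function as additional terms.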
In a second aspect, an embodiment of the present disclosure further provides an interaction method, applied to a server, including:

acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos;

fusing the body part position data of the same image frame across the multiple candidate videos based on preset rules, to obtain a standard position data set;

looking up, in the standard position data set, the position data of a target body part in at least one image frame of the multiple candidate videos; and

determining, using the found position data, the preset position information of the action icon corresponding to the target body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
In a third aspect, an embodiment of the present disclosure further provides an interaction apparatus, configured on a client, including:

a first collection module, configured to collect and display first image frame data of a user;

a first determination module, configured to identify at least one human body part in the first image frame data and determine position information of the human body part;

a display position determination module, configured to determine the display position of an action icon based on the position information of the at least one human body part and preset position information of the action icon corresponding to the human body part, and to display the action icon at the display position;

a second collection module, configured to collect and display second image frame data of the user, wherein the second image frame data is image frame data at a preset time point after the first image frame data;

a second determination module, configured to determine a target human body part associated with the action icon in the second image frame data, and state information of the target human body part; and

an evaluation module, configured to determine an evaluation result according to the degree of matching between the state information of the target human body part in the second image frame data and the action icon.
In a fourth aspect, an embodiment of the present disclosure further provides an interaction apparatus, configured on a server, including:

a position data extraction module, configured to acquire multiple candidate videos and extract body part position data of each image frame in the multiple candidate videos;

a standard position data set determination module, configured to fuse the body part position data of the same image frame across the multiple candidate videos based on preset rules, to obtain a standard position data set;

a position data search module, configured to look up, in the standard position data set, the position data of a target body part in at least one image frame of the multiple candidate videos; and

a preset position information determination module, configured to determine, using the found position data, the preset position information of the action icon corresponding to the target body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
In a fifth aspect, an embodiment of the present disclosure further provides a terminal, including a memory, a processor and a camera, wherein: the camera is used to collect image frame data of a user in real time; and the memory stores a computer program which, when executed by the processor, causes the processor to execute any interaction method provided by the embodiments of the present disclosure.

In a sixth aspect, an embodiment of the present disclosure further provides a server, including a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute any interaction method provided by the embodiments of the present disclosure.

In a seventh aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute any interaction method provided by the embodiments of the present disclosure.

In an eighth aspect, an embodiment of the present disclosure further provides a computer program product, including computer program instructions which, when run by a processor, cause the processor to execute any interaction method provided by the embodiments of the present disclosure.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have at least the following advantages. In the embodiments of the present disclosure, the client can call a camera to collect the first image frame data and the second image frame data of the user in real time and display them, where the first image frame data is the earlier-collected image frame data. The client first identifies the human body part in the first image frame data in real time and determines its position information, and then, combining the preset position information of the action icon, determines the exact display position of the action icon on the first image frame data; that is, as the position of the body part changes, the display position of the action icon on the first image frame data can be adjusted (or corrected) in real time. Finally, the evaluation result is determined according to the degree of matching between the state information of the target body part associated with the action icon in the second image frame data and the action icon. The embodiments of the present disclosure effectively combine the user image frame data collected by the camera with the action icon to be displayed in that image frame data, dynamically adjust the display position of the action icon according to the position of the user's body part, and accurately evaluate the state information of the user's body part, thereby improving the user's interaction experience.
附图说明Description of drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.
为了更清楚地说明本公开实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the accompanying drawings that are required to be used in the description of the embodiments or the prior art will be briefly introduced below. In other words, on the premise of no creative labor, other drawings can also be obtained from these drawings.
FIG. 1 is a flowchart of an interaction method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of image frame data displaying an action icon provided by an embodiment of the present disclosure;

FIG. 3 is a flowchart of another interaction method provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of image frame data displaying an action icon and a guiding video animation provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of image frame data displaying an evaluation result animation provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of displaying a shared video on the same screen provided by an embodiment of the present disclosure;

FIG. 7 is a flowchart of another interaction method provided by an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an interaction apparatus provided by an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of another interaction apparatus provided by an embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of a server provided by an embodiment of the present disclosure.
Detailed Description

In order to understand the above objects, features and advantages of the present disclosure more clearly, the solutions of the present disclosure are further described below. It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.

Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure can also be implemented in ways other than those described herein; obviously, the embodiments in the specification are only a part of the embodiments of the present disclosure, not all of them.
FIG. 1 is a flowchart of an interaction method provided by an embodiment of the present disclosure, applied to a client. The method is applicable to the situation of combining the user image frame data collected by a camera in real time with the action icons to be displayed on that image frame data, and evaluating the state information of the human body parts in the user image frame data collected in real time. The method can be executed by an interaction apparatus configured on the client, and the apparatus can be implemented in software and/or hardware. The client mentioned in the embodiments of the present disclosure may include any client with a video interaction function, and the terminal device on which the client is installed may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, and the like.

In the embodiments of the present disclosure, the types of state information of the user's body parts may include, but are not limited to, state information of body parts related to dance games, dance training, fitness movements, teaching movements, and the like; that is, the embodiments of the present disclosure are applicable to a variety of application scenarios such as games, fitness and teaching.
As shown in FIG. 1, the interaction method provided by the embodiment of the present disclosure may include S101-S106.

S101. Collect and display first image frame data of a user.

Exemplarily, the user may pre-select the entire set of action video content to be completed and, before starting to perform the relevant actions, trigger an image collection request by touching an image collection control (or a video recording control) on the client interface. In response to the image collection request, the client calls the camera to collect the user's image frame data in real time and displays it on the interface. The first image frame data may be any image frame data collected by the camera in real time, and the word "first" does not impose any ordering.
S102. Identify at least one human body part in the first image frame data, and determine the position information of the human body part.

The human body part identified in the first image frame data includes at least one of a head, an arm, a hand, a foot and a leg. In S102, human body recognition technology can be used to identify the human body parts in the collected user image frame data in real time, and at the same time determine the position information of the body parts; the position information may specifically be the position information of key points on the body parts. For the implementation principle of human body recognition technology, reference may be made to the prior art, which is not specifically limited in the embodiments of the present disclosure.
S103. Based on the position information of at least one human body part and the preset position information of the action icon corresponding to the human body part, determine the display position of the action icon, and display the action icon at the display position.

The preset position information of the action icon is used to constrain the display position of the action icon in the user image frame data; it can be predetermined by the server at the development stage and then delivered to the client. The preset position information of the action icon may include the relative position information between the to-be-displayed position of the action icon and the corresponding human body part.
客户端可以基于用户的第一图像帧数据的采集时间信息(或者视频录制时间信息),确定当前采集的第一图像帧数据中是否需要展示动作图标。例如,针对录制一段时长为30秒的舞蹈动作视频的情况,预先设定当视频录制到第5秒、第15秒和第25秒等时刻时,需要在实时采集的用户图像帧数据中展示动作图标,因此在用户完成该舞蹈动作过程中,客户端可以基于用户的当前图像帧数据的采集时间信息或者舞蹈动作视频的当前录制时间信息,确定当前图像帧数据中是否需要展示动作图标。并且,图像帧数据的采集时间信息与视频录制时间信息可以相互确定,假设客户端将采集的第一帧用户图像数据的采集时间信息记为0秒,则当前图像帧数据的采集时间即视频的录制时间。The client may determine whether an action icon needs to be displayed in the currently collected first image frame data based on the collection time information (or video recording time information) of the user's first image frame data. For example, in the case of recording a dance action video with a duration of 30 seconds, it is preset that when the video is recorded at the 5th, 15th, and 25th seconds, the action needs to be displayed in the user image frame data collected in real time. Therefore, when the user completes the dance action, the client can determine whether the action icon needs to be displayed in the current image frame data based on the acquisition time information of the user's current image frame data or the current recording time information of the dance action video. In addition, the collection time information of the image frame data and the video recording time information can be mutually determined. If the client records the collection time information of the first frame of user image data collected as 0 seconds, the collection time of the current image frame data is the time of the video. recording time.
The client may also determine whether an action icon needs to be displayed in the current image frame data based on a predetermined display correspondence between specific image frame data and action icons. For example, if the client collects user image frame data showing a specified body movement, it displays the action icon in that frame, where the specified body movement is the movement required to be present in the image frame data designated by the aforementioned display correspondence.
During the real-time collection of user image frame data, the client dynamically determines the display position of the action icon in the user image frame data based on the position information of the body part recognized in the first image frame data and the preset position information of the action icon corresponding to that part, thereby ensuring that the icon is displayed accurately. Taking a dance game scene as an example, the preset position information of the action icons may also be called chart information, which defines the relative positions of body parts and action icons; the action icons may also be called note points.
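The dynamic placement step can be sketched as follows; the chart structure, joint names, and pixel offsets are illustrative assumptions, not values given by the disclosure:

```python
# Hypothetical sketch of dynamically placing an action icon: the chart supplies
# a per-icon anchor body part and a relative offset, and the icon's screen
# position is recomputed from the part's detected position in each frame.
CHART = {
    # icon id -> (anchor body part, (dx, dy) offset in pixels)
    "note_1": ("left_shoulder", (0, -40)),
    "note_2": ("right_hand", (30, 0)),
}

def icon_display_position(icon_id, body_part_positions):
    """body_part_positions maps part name -> (x, y) detected in this frame."""
    part, (dx, dy) = CHART[icon_id]
    x, y = body_part_positions[part]
    return (x + dx, y + dy)
```

As the detected shoulder or hand coordinates change from frame to frame, the computed icon position moves with them, which is the real-time adjustment described above.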
In one possible implementation, the preset position information of an action icon may be derived from the position data, in a standard data set, of the body part corresponding to the icon. Specifically, for a complete action video, the image frames in which action icons need to be displayed can be predetermined according to the display requirements of the icons (for example, displaying an action icon at a specified moment of the video recording). Taking a 20-second dance game as an example, during the game development stage the developer may specify that, when the game reaches the Nth second, an action icon is displayed at a preset body part in the user's image frame data, such as the shoulder; the developer then determines, from the position data of that preset part in the standard data set for the Nth-second image frame, the relative position between the action icon and the part, which serves as the icon's preset position information.
The standard data set is obtained by fusing, according to preset rules, the body-part position data of the same image frame across multiple (at least two) candidate videos. The same image frame in the candidate videos (for example, the Nth frame of each candidate video) presents the same body-part state information, for example the same body action information; for instance, dance videos of the same dance recorded by different people can serve as candidate videos. The body-part position data in each frame of each candidate video can be obtained through motion capture using a motion capture system. Exemplarily, the server may determine a weight value for each candidate video, and then, based on those weights, compute a weighted average of the body-part position data of the same image frame across the candidate videos to obtain the standard position data set. The weight value of each candidate video may be determined from video interaction information and/or video publisher information: for example, the more interaction information a video has, the larger its weight; and if the publisher is a well-known figure, the video's weight is larger.
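The weighted-average fusion can be sketched as follows; the specific weighting rule (likes-based, doubled for well-known publishers) is an illustrative assumption, since the disclosure leaves the preset rules flexible:

```python
# Hypothetical sketch of building the standard position data set: for each
# frame index, the same body part's positions across candidate videos are
# combined by a weighted average; the weighting rule below is illustrative.
def video_weight(likes, is_famous_publisher):
    w = 1.0 + likes / 10000.0
    return w * 2.0 if is_famous_publisher else w

def fuse_positions(positions, weights):
    """positions: list of (x, y) for the same part in the same frame of each
    candidate video; weights: one weight per candidate video."""
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return (x, y)
```

A more heavily weighted video pulls the fused position toward its own body-part coordinates, which is how the position characteristics of different people are synthesized.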
Further, the multiple candidate videos may be obtained based on preset video screening information, which includes video interaction information and/or video publisher information; the video interaction information includes the number of likes and/or comments of a video. Exemplarily, in this embodiment of the present disclosure, for videos presenting the same body-part state information, videos whose likes exceed a like threshold, whose comments exceed a comment threshold, and which are published by people of relatively high renown can be screened out of Internet data as candidate videos. Each threshold can be set flexibly.
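A screening step of this kind might look as follows; the field names and threshold values are assumptions, and since the disclosure allows the criteria to be combined with "and/or", an OR of the two branches (interaction thresholds, or a well-known publisher) is assumed here:

```python
# Hypothetical sketch of candidate-video screening: keep videos whose likes
# and comments both exceed flexible thresholds, or whose publisher is well
# known. All field names and thresholds are illustrative.
LIKE_THRESHOLD = 50000
COMMENT_THRESHOLD = 2000

def select_candidates(videos):
    return [
        v for v in videos
        if (v["likes"] > LIKE_THRESHOLD and v["comments"] > COMMENT_THRESHOLD)
        or v["famous_publisher"]
    ]
```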
By fusing the body-part position data of the same image frame across multiple candidate videos into a standard position data set, the body-part position characteristics of different people can be synthesized, the body-part position information displayed in the video can be reasonably optimized, the reference value of the video improved, and the display positions of the action icons optimized; at the same time, this helps improve public recognition and acceptance of the optimized video effect.
In addition, provided the visual effect of the interface is preserved, the action icons may be displayed in the user image frame data in any available style. The display style may include information such as the icon's shape, color, dynamic effects, and static effects, and may be designed in advance according to actual needs; the embodiments of the present disclosure do not specifically limit it.
FIG. 2 is a schematic diagram of image frame data displaying action icons according to an embodiment of the present disclosure; it is intended to illustrate the embodiment and should not be construed as a specific limitation. As shown in FIG. 2, the user's current image frame data displays a circular first action icon 21 and an arrow-shaped second action icon 22. The first action icon 21 may be used to guide the user to move a hand to the position of the first action icon 21, and the second action icon 22 may be used to guide the user to swipe a hand in the direction of the arrow. The number of action icons displayable in each image frame is not specifically limited in the embodiments of the present disclosure.
S104. Collect and display second image frame data of the user, where the second image frame data is image frame data at a preset time point after the first image frame data.
The collection interval between the second image frame data and the first image frame data is not specifically limited in the embodiments of the present disclosure; that is, the specific value of the preset time point can be set flexibly. Neither the second nor the first image frame data refers exclusively to one specific frame; each may refer to multiple frames of image data, with only their collection order fixed. As the user's image frames are collected in real time, the state information of the user's body parts displayed in the first and second image frame data may change continuously. In addition, an action icon displayed in the first image frame data may, based on its already determined display position, continue to be displayed in the second image frame data, or may not be displayed.
In the embodiments of the present disclosure, because the user's image frame data is collected in real time, the collection interval between the first and second image frame data is usually very small; therefore, continuing to display the action icon in the second image frame data at its already determined position does not cause a large change in the icon's display position. That is, the icon's display positions in the first and second image frame data are consistent to a certain degree. Of course, after collecting the user's second image frame data, the client may also identify at least one body part in it and determine that part's position information, and then, based on the position information of the at least one body part and the preset position information of the corresponding action icon, determine and display the icon's position in the second image frame data.
In one possible implementation, the action icons include expression icons, and the process of displaying the first image frame data or the second image frame data further includes:
identifying the user's expression in the first image frame data or the second image frame data, and determining an expression icon matching the user's expression;
determining the display position of the expression icon based on the position information of the facial features forming the user's expression in the first image frame data or the second image frame data, and displaying the expression icon at the determined display position.
For example, if, based on facial expression recognition technology, the user's expression in the first or second image frame data is identified as a pout, an expression icon matching the pout, such as a "heart" or a "kiss", is determined; then, based on the position of the user's mouth, a preset region around the mouth (which can be set flexibly) is determined as the display position of the "heart" or "kiss", and the corresponding effect icon is displayed in that preset region, thereby making the interaction more engaging.
S105. Determine the target human body part associated with the action icon in the second image frame data and the state information of that target body part.
The target body part associated with the action icon in the second image frame data is related to the action video content that the user has pre-selected to perform. In one possible implementation, the client may determine the target body part associated with the action icon in the second image frame data based on the playback time information of the background music or the collection time information of the second image frame data; the target body part may include at least one of the head, arms, hands, feet, and legs. For example, when the background music has played to the Nth second, or the collection time of the second image frame data is the Nth second, the target body part associated with the action icon in the second image frame data is determined to be the user's hand.
The state information of the target body part includes the position information of that part and/or the action information formed by it. For example, when the background music has played to the Nth second, or the collection time of the second image frame data is the Nth second, the user's hand may be placed on the user's shoulder, or may form an OK gesture, or may perform a clapping motion, and so on.
S106. Determine an evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon.
In the embodiments of the present disclosure, for each action icon, both its preset position information in the image frame data and the standard action information of the associated body part can be set in advance. That is, depending on the state information of the body part, the degree of matching between the state information and the action icon may include a position match degree and an action match degree; the client can therefore determine the user's evaluation result in the second image frame data from the matching results along multiple dimensions. The higher the match degree, the better the evaluation result. The evaluation result may be displayed in the second image frame data, and may be rendered in forms such as numbers, text, and/or English words; dynamic effects may also be added during display to enhance the visual effect of the interface.
During the real-time collection of user image frame data, after the client determines the user's evaluation result in the current image frame data, it may also combine that result with the user's evaluation results in previously collected frames to determine and display the user's cumulative evaluation result. Of course, if the user's evaluation result in the current image frame data is poor, the accumulated evaluation result may also be reset to zero.
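The accumulation-with-reset behavior can be sketched as follows; the grade names and point values are assumptions made for illustration, since the disclosure does not fix a scoring scheme:

```python
# Hypothetical sketch of cumulative scoring across frames: per-frame results
# accumulate, and a poor result clears the accumulated total. Grade names and
# point values are illustrative assumptions.
GRADE_POINTS = {"perfect": 100, "good": 50, "miss": 0}

class ScoreTracker:
    def __init__(self):
        self.total = 0

    def update(self, grade):
        if grade == "miss":          # poor result: clear the accumulated score
            self.total = 0
        else:
            self.total += GRADE_POINTS[grade]
        return self.total
```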
In one possible implementation, taking the case where the state information of the target body part associated with the action icon includes the part's position information as an example, determining the evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon includes:
determining the effective response area of the action icon in the second image frame data;
determining the position match degree between the position information of the target body part and the effective response area of the action icon, and determining the evaluation result according to the position match degree.
The effective response area of an action icon may be determined from its display position and/or display style. For example, a region of preset area and preset shape may be determined based on the icon's display position as its effective response area; or the shape region corresponding to the icon's display style may be determined as its effective response area; or, based on the icon's shape region, a region of a preset shape with an area smaller or larger than that shape region may be determined as its effective response area; or the effective response area may be determined from both the display position and the display style. The specific choice can be made flexibly, and how the effective response area is determined may be predetermined by the server.
If the target body part lies within the effective response area of the action icon, and the distance between the part's position and the center of that area is less than a first distance threshold (whose value can be set flexibly), then the position match degree between the part's position and the icon's effective response area is high; otherwise, if either condition is not satisfied, the position match degree is poor. Evidently, the higher the position match degree, the better the evaluation result.
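The two-condition position check can be sketched as follows; a circular effective response area is assumed here for simplicity (the disclosure allows other shapes), and the radius and threshold values are illustrative:

```python
# Hypothetical sketch of the position-match check: the body part must lie
# inside the icon's effective response area (assumed circular) AND be closer
# to its center than the first distance threshold. Values are illustrative.
import math

def position_match(part_pos, area_center, area_radius, first_distance_threshold):
    d = math.dist(part_pos, area_center)
    inside_area = d <= area_radius
    close_to_center = d < first_distance_threshold
    return "high" if (inside_area and close_to_center) else "poor"
```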
Of course, other ways of determining the position match degree between the target body part and the action icon can also be flexibly adopted by those skilled in the art. For example, the distance between the display position coordinates of the action icon in the first or second image frame data and the position coordinates of the target body part in the second image frame data may be computed directly: if the computed distance is less than a second distance threshold (whose value can be set flexibly), the position match degree between the associated body part and the action icon in the second image frame data is high, and the corresponding evaluation result is good; if the computed distance is greater than or equal to the second distance threshold, the position match degree is poor, and the corresponding evaluation result is poor.
In one possible implementation, taking the case where the state information of the target body part associated with the action icon includes the action information formed by the part as an example, determining the evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon includes:
determining the standard action information corresponding to the action icon, where the standard action information corresponding to different action icons can be determined in advance on the server;
determining the action match degree between the action information formed by the target body part in the second image frame data and the standard action information, and determining the evaluation result according to the action match degree.
The action information formed by body parts includes, but is not limited to, dance-game action information. Exemplarily, the action match degree between the action information formed by the target body part in the second image frame data and the standard action information may be determined based on key point matching technology. For example, for an OK gesture, the key point coordinates of the user's hand while forming the gesture may be extracted and compared against the hand key point coordinates corresponding to the standard OK gesture to determine the action match degree.
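A key-point comparison of this kind can be sketched as follows; the disclosure names the technique but not a scoring formula, so the mean-distance score below is an illustrative assumption, and both point lists are assumed to be pre-normalized to a common scale:

```python
# Hypothetical sketch of key-point matching for the action match degree: the
# detected hand key points are compared point by point against the standard
# gesture's key points, and the mean distance is turned into a match score.
import math

def action_match_degree(detected_keypoints, standard_keypoints):
    """Both arguments: equal-length lists of (x, y) key-point coordinates,
    assumed already normalized to a common scale."""
    dists = [math.dist(p, q) for p, q in zip(detected_keypoints, standard_keypoints)]
    mean_dist = sum(dists) / len(dists)
    return max(0.0, 1.0 - mean_dist)   # 1.0 = identical, 0.0 = far apart
```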
In the embodiments of the present disclosure, the client may call the camera to collect and display the user's first and second image frame data in real time, the first image frame data being the data collected earlier. The client first identifies the body parts in the first image frame data in real time and determines their position information, then combines this with the preset position information of the action icons to determine the icons' accurate display positions on the first image frame data; that is, as body-part positions change, the icons' display positions on the first image frame data can be adjusted (or corrected) in real time. Finally, the evaluation result is determined according to the degree of matching between the state information of the target body part associated with the action icon in the second image frame data and the action icon. The embodiments of the present disclosure thereby effectively combine the user image frame data collected by the camera with the action icons to be displayed in that data, dynamically adjust the icons' display positions according to the positions of the user's body parts, accurately evaluate the state information of those body parts, and improve the user's interactive experience.
FIG. 3 is a flowchart of another interaction method provided by an embodiment of the present disclosure, further optimized and expanded on the basis of the above technical solution; it can be combined with each of the above optional implementations.
As shown in FIG. 3, the interaction method provided by the embodiment of the present disclosure may include S201-S209:
S201. Collect and display first image frame data of the user.
S202. Identify at least one human body part in the first image frame data, and determine the position information of the body part.
S203. Determine the display position of the action icon based on the position information of at least one body part and the preset position information of the action icon corresponding to the body part.
It should be noted that, for the details of S201-S203, refer to S101-S103 above, respectively.
S204. Determine the display style of the action icon based on the playback time information of the background music or the collection time information of the first image frame data.
The display style used for the action icons at different playback times of the background music (for example, at the 3rd second or at the 7th second), or at different image-frame collection times (or video recording times), is predetermined during the action development stage; therefore, the client can determine the icon's current display style from the current playback time information of the background music or the collection time information of the first image frame data. The display styles of the action icons at different time points may be the same or different.
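A time-indexed style lookup of this kind can be sketched as follows; the style table, its fields, and the time points are illustrative assumptions made here, since the disclosure leaves the styles to be designed in advance:

```python
# Hypothetical sketch of style selection by playback time: display styles are
# authored per time point during development and looked up at run time. The
# table below picks the latest style whose start time has been reached.
STYLE_TIMELINE = [
    # (start_second, style)
    (0.0, {"shape": "circle", "color": "blue", "effect": "pulse"}),
    (3.0, {"shape": "circle", "color": "gold", "effect": "glow"}),
    (7.0, {"shape": "arrow", "color": "red", "effect": "slide"}),
]

def style_at(playback_second):
    current = STYLE_TIMELINE[0][1]
    for start, style in STYLE_TIMELINE:
        if playback_second >= start:
            current = style
    return current
```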
S205. Display the action icon at the display position using the display style.
S206. Collect and display second image frame data of the user, where the second image frame data is image frame data at a preset time point after the first image frame data.
It should be noted that, for the details of S206, refer to S104 above.
S207. Display guidance information on the second image frame data to guide the user to change the state information of the target body part associated with the action icon.
The guidance information includes at least one of a guidance video animation, a guidance picture, and a guidance instruction; in addition, a guidance instruction may also be played as voice. The guidance information may be derived from the standard data set of the preceding embodiments. Taking a guidance video animation or a guidance picture as an example, it may be obtained by importing the standard data set of the preceding embodiments into a human body model and performing image processing. Specifically, based on existing three-dimensional animation production principles, developers can use the server to import the standard data set into the body model, generate the guidance video animation through model rendering, or obtain guidance pictures in the form of screenshots, and then deliver them from the server to the client.
The standard data set synthesizes the body-part position characteristics of different people; deriving the guidance information from it improves the information's reference value and the public's recognition and acceptance of it.
The guidance information may be directly overlaid on the second image frame data, or displayed in it in a form such as an independent playback window. The specific display position of the guidance information in the second image frame data is not limited by the embodiments of the present disclosure; it may, for example, be the lower right, upper right, upper left, or lower left of the image.
Further, during the real-time collection of user image frame data, the client may also dynamically adjust the display position of the guidance information based on the positions of the user's body parts in the image frame data, so as to avoid overlap between body parts and guidance information. For example, if the client detects that the user's limbs are positioned toward the right of the second image frame data, the guidance information can be displayed toward the left of the second image frame data.
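The overlap-avoidance rule can be sketched as follows; the bounding-box format and the left/right decision based on the frame midline are illustrative assumptions:

```python
# Hypothetical sketch of keeping the guidance window away from the user: if
# the detected body region sits in the right half of the frame, show the
# guidance on the left, and vice versa.
def guidance_side(body_bbox, frame_width):
    """body_bbox: (x_min, y_min, x_max, y_max) of the detected user region."""
    body_center_x = (body_bbox[0] + body_bbox[2]) / 2
    return "left" if body_center_x > frame_width / 2 else "right"
```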
FIG. 4 is a schematic diagram of image frame data displaying action icons and a guidance video animation according to an embodiment of the present disclosure; it is intended to illustrate the embodiment and should not be construed as a specific limitation. As shown in FIG. 4, the current image frame data displays the first action icon 21 and the second action icon 22; at the same time, a guidance video animation 23 is displayed at the lower left of the current image frame data to guide the user through the correct body movement.
S208. Determine the target human body part associated with the action icon in the second image frame data and the state information of the target body part.
S209. Determine the evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon.
It should be noted that, for the details of S208-S209, refer to S105-S106 above, respectively.
On the basis of the above technical solution, after determining the evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon, the method further includes:
determining an evaluation result animation according to the evaluation result, where the specific implementation of the evaluation result animation (or action judgment animation) can be set flexibly and is not specifically limited by the embodiments of the present disclosure;
using the display position of the action icon to determine the animation display position of the evaluation result animation in the second image frame data, and displaying the evaluation result animation at that animation display position.
Exemplarily, the display position of the evaluation result animation may or may not coincide with that of the action icon. For example, after the evaluation result animation is determined, it may be displayed at the icon's display position while the icon is simultaneously hidden, producing an interface effect of the icon switching into the animation.
Displaying the evaluation result animation improves the visual effect of the interface and makes the user's video recording more engaging. FIG. 5 is a schematic diagram of image frame data displaying an evaluation result animation according to an embodiment of the present disclosure; it is intended to illustrate the embodiment and should not be construed as a specific limitation. As shown in FIG. 5, the position of the user's hand movement matches the position of the action icon at the shoulder closely (that is, it overlaps well with the icon's effective response area), so the user's hand movement is evaluated as perfect; accordingly, a circular evaluation result animation 51 displaying the word "Perfect" is shown in the image frame data. During display, the evaluation result animation 51 can dynamically change the size of the circle, change its display color, and so on. Image frame data displaying the evaluation result animation can serve as valid video frame data.
On the basis of the above technical solution, in a possible implementation manner, after the evaluation result is determined according to the matching degree between the state information of the target body part in the second image frame data and the action icon, the method further includes:
generating a first shared video based on the collected first image frame data and second image frame data; since the user's image frame data belongs to an image sequence collected in real time, a complete user video can be obtained from the first image frame data and the second image frame data, and the action icons, guidance information, evaluation result animations and the like can be displayed in the corresponding image frame data of the shared video;
sending a first video sharing request to the server according to the user's video sharing operation; wherein the first video sharing request carries the first shared video and the user identifier of the sharing object, and the user identifier of the sharing object is used by the server to determine a second shared video shared by the sharing object; the second shared video and the first shared video may be videos of the same action content recorded by different people; there may be one or more sharing objects, and correspondingly the second shared video may refer to one video or to multiple videos;
receiving a composite video returned by the server; wherein the composite video is obtained by the server synthesizing the first shared video and the second shared video for same-screen display. For the specific implementation of video synthesis, reference may be made to the prior art. The same-screen display may be a left-right split screen or a top-bottom split screen; the layout differs with the number of users participating in the video sharing.
Exemplarily, after the current client generates the first shared video, it may switch from the current interface to a sharing-object selection interface according to a sharing-object selection operation triggered by the current user, so that the current user can select at least one sharing object. After obtaining the user identifiers of the sharing objects selected by the current user, the client switches back to the current interface and, according to the video sharing operation triggered by the current user, generates the first video sharing request and sends it to the server. The client controlled by a sharing object may perform the same operations to share the second shared video to the server. Moreover, on the basis of communication between the users, the client controlled by the current user (i.e., the sharing initiator) and the client controlled by the sharing object may send video sharing requests to the server at the same time. After the server completes the video synthesis, it may send the composite video to both the client controlled by the current user and the client controlled by the sharing object.
FIG. 6 is a schematic diagram of same-screen display of shared videos provided by an embodiment of the present disclosure; it takes two people participating in video sharing as an example to illustrate the embodiment and should not be construed as a specific limitation on it. As shown in FIG. 6, user A and user B are each other's sharing objects, and the client controlled by the sharing initiator and the client controlled by the sharing object can display the two users' shared videos at the same time. In FIG. 6, the display position of the action icon is above the shoulder. User A's hand is shown above the shoulder, i.e., the hand position closely matches the display position of the action icon, so user A's evaluation result is "perfect"; user B's hand is shown on the right side of the body, i.e., the hand position matches the display position of the action icon poorly, so user B's evaluation result is "average". Furthermore, FIG. 6 shows different evaluation result animations for the different results: for user A, the animation is formed by a star pattern with the word "perfect" displayed inside it; for user B, the animation is formed by a circular pattern with the word "average" displayed inside it.
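The left-right split-screen composition of FIG. 6 can be sketched per frame as a horizontal concatenation of the two users' frames. This is a minimal illustration using NumPy arrays as frames; the disclosure refers video synthesis to the prior art, so the layout rule shown here (two participants mapped to a left-right split) is an assumption.

```python
import numpy as np

def compose_side_by_side(frame_a, frame_b):
    """Compose two equal-height video frames (H x W x C arrays) left-right on one screen."""
    assert frame_a.shape[0] == frame_b.shape[0], "frame heights must match"
    # Concatenating along the width axis yields the split-screen frame.
    return np.concatenate([frame_a, frame_b], axis=1)
```

Applying this to every pair of time-aligned frames of the first and second shared videos yields the composite video returned to the clients.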
On the basis of the above technical solution, in a possible implementation manner, before displaying the first image frame data, the method further includes:
switching from the current mode to an image synchronization sharing mode according to the user's image synchronization operation; that is, in the image synchronization sharing mode, after the current user determines the sharing object, the client can display the image frame data acquired locally in real time while also displaying the image frame data acquired in real time by the client controlled by the sharing object; for the same-screen display effect, reference may be made to FIG. 6.
Correspondingly, in the process of displaying the first image frame data and the second image frame data, the method further includes:
receiving first shared image frame data in real time, and displaying the first shared image frame data and the first image frame data on the same screen;
receiving second shared image frame data in real time, and displaying the second shared image frame data and the second image frame data on the same screen;
wherein the first shared image frame data and the second shared image frame data are shared in real time by the sharing object, and the sharing object is predetermined by the user.
Exemplarily, the synchronous display of the first shared image frame data and the second shared image frame data across different clients may be realized directly through client-to-client interaction, or through data relay between the two clients via the server.
Exemplarily, the sharing object may be determined before or after the current user triggers the image synchronization operation. After the client controlled by the current user switches from the current mode to the image synchronization sharing mode, it may send a mode switching notification to the server. The notification may carry the user identifier of the sharing object, so as to notify the server to forward, in real time, the first shared image frame data and the second shared image frame data received from the sharing object to the client controlled by the current user. At the same time, while displaying the image frame data acquired in real time, the client controlled by the current user also shares that image frame data to the server in real time, so that after the client controlled by the sharing object performs the foregoing operations, it can likewise display the current user's image frame data synchronously. Content such as action icons, guidance information and evaluation result animations can also be displayed synchronously during the same-screen display of the image frame data. Moreover, on the basis of communication between the users, the client controlled by the current user and the client controlled by the sharing object may switch to the image synchronization sharing mode at the same time.
In the embodiments of the present disclosure, through image sharing and synthesis, the image frame data of different users can be displayed on the same screen in the same client, which makes image or video interaction more engaging.
FIG. 7 is a flowchart of another interaction method provided by an embodiment of the present disclosure, applied to a server. The method may be executed by an interaction apparatus configured on the server, and the apparatus may be implemented in software and/or hardware.
The interaction method applied to the server provided by the embodiments of the present disclosure may be executed in cooperation with the interaction method applied to the client provided by the embodiments of the present disclosure. For content not described in detail in the following embodiments, reference may be made to the explanations in the above embodiments.
As shown in FIG. 7, the interaction method provided by the embodiment of the present disclosure may include S301-S304:
S301: acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos.
S302: fusing the body part position data of the same image frame across the multiple candidate videos based on a preset rule, to obtain a standard position data set.
S303: looking up, in the standard position data set, the position data of a target body part in at least one image frame of the multiple candidate videos.
S304: determining, by using the looked-up position data, preset position information of the action icon corresponding to the target body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
While determining the preset position information of the action icon, the standard action information corresponding to the action icon may also be determined based on the action information formed in the image frames by the target body part corresponding to the action icon.
In a possible implementation manner, acquiring multiple candidate videos and extracting the body part position data of each image frame in the multiple candidate videos includes:
acquiring multiple candidate videos based on preset video screening information; wherein the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes the number of likes and/or comments of a video;
extracting body part position data of each image frame in the multiple candidate videos.
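The screening step can be sketched as a simple filter over video metadata. The field names `likes` and `comments` and the thresholds below are hypothetical, chosen only to illustrate screening by video interaction information; the disclosure does not fix a metadata schema or concrete cutoffs.

```python
def select_candidates(videos, min_likes=10_000, min_comments=500):
    """Keep only videos whose interaction info clears illustrative thresholds.

    `videos` is a list of metadata dicts with hypothetical keys
    "likes" and "comments"; thresholds are example values.
    """
    return [
        v for v in videos
        if v["likes"] >= min_likes and v["comments"] >= min_comments
    ]
```

Publisher-based screening could be added analogously, e.g. by also checking a publisher-identifier field against an allowlist.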
In a possible implementation manner, the interaction method provided by the embodiment of the present disclosure further includes:
generating guidance information based on the standard position data set;
sending the guidance information to the client, so that the client displays the guidance information on the collected user image frame data and guides the user to change the state information of the target body part associated with the action icon in the image frame data.
In a possible implementation manner, the guidance information includes at least one of a guidance video animation, a guidance picture and a guidance instruction.
In a possible implementation manner, fusing the body part position data of the same image frame across the multiple candidate videos based on a preset rule to obtain a standard position data set includes:
determining a weight value for each candidate video;
performing a weighted average calculation on the body part position data of the same image frame across the multiple candidate videos, based on the weight value of each candidate video, to obtain the standard position data set.
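The weighted-average fusion of S302 can be sketched as follows for a single image frame. The per-part dictionary layout and the weight values are illustrative assumptions; the disclosure does not fix a data format.

```python
def fuse_positions(per_video_positions, weights):
    """Weighted average of one frame's body-part positions across candidate videos.

    per_video_positions: one {part_name: (x, y)} dict per candidate video,
    all for the same frame index. weights: one weight per video.
    Returns the fused {part_name: (x, y)} entry of the standard position data set.
    """
    total = sum(weights)
    fused = {}
    for part in per_video_positions[0]:
        x = sum(w * pos[part][0] for w, pos in zip(weights, per_video_positions)) / total
        y = sum(w * pos[part][1] for w, pos in zip(weights, per_video_positions)) / total
        fused[part] = (x, y)
    return fused
```

Running this over every frame index yields the full standard position data set used in S303 and S304.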
In a possible implementation manner, the interaction method provided by the embodiment of the present disclosure further includes:
receiving a first video sharing request sent by the client; wherein the first video sharing request carries the first shared video and the user identifier of the sharing object, and the first shared video is generated by the client based on the collected first image frame data and second image frame data;
determining, based on the user identifier of the sharing object, a second shared video shared by the sharing object; wherein the second shared video and the first shared video may include image frames of body parts showing the same state information;
synthesizing the first shared video and the second shared video into a composite video for same-screen display;
sending the composite video to the client.
In a possible implementation manner, the interaction method provided by the embodiment of the present disclosure further includes:
receiving first shared image frame data shared in real time by a sharing object; wherein the sharing object is predetermined by the user;
sending the first shared image frame data to the client in real time, so that the client displays the first shared image frame data and the locally collected first image frame data on the same screen; wherein the first shared image frame data and the first image frame data collected locally by the client may show body parts with the same state information;
receiving second shared image frame data shared in real time by the sharing object; wherein the sharing object is predetermined by the user;
sending the second shared image frame data to the client in real time, so that the client displays the second shared image frame data and the locally collected second image frame data on the same screen; wherein the second shared image frame data and the second image frame data collected locally by the client may show body parts with the same state information.
In the embodiments of the present disclosure, the server can determine a standard position data set based on the position data of body parts in each image frame of multiple candidate videos, then determine the preset position information of the action icon corresponding to the target body part based on the position data of the target body part in the standard position data set, and deliver it to the client. The client, combining this with the position information of the body parts identified in the currently displayed image frame data, dynamically determines an accurate display position of the action icon in the image frame data, achieving the effect that the display position of the action icon in the user's image frame data is adjusted dynamically as the position of the user's body parts changes. At the same time, the client determines the evaluation result based on the matching degree between the action icon and the state information of the associated target body part in the user image frame data collected in real time. The embodiments of the present disclosure thus effectively combine the user image frame data collected by the camera with the action icon to be displayed in that data, dynamically adjust the display position of the action icon according to the position of the user's body parts, accurately evaluate the state information of the user's body parts, and improve the user's interaction experience.
In addition, through the interaction between the server and the client, shared videos of multiple people can be displayed on the same screen in the client, which makes video sharing more engaging.
FIG. 8 is a schematic structural diagram of an interaction apparatus provided by an embodiment of the present disclosure. The apparatus may be configured in a client and may be implemented in software and/or hardware. The client mentioned in the embodiments of the present disclosure may be any client with a video interaction function, and the terminal device on which the client is installed may include, but is not limited to, a smartphone, a tablet computer, a notebook computer, and the like.
As shown in FIG. 8, the interaction apparatus 400 provided by the embodiment of the present disclosure may include a first collection module 401, a first determination module 402, a display position determination module 403, a second collection module 404, a second determination module 405 and an evaluation module 406, wherein:
the first collection module 401 is configured to collect and display first image frame data of a user;
the first determination module 402 is configured to identify at least one body part in the first image frame data and determine position information of the body part;
the display position determination module 403 is configured to determine the display position of an action icon based on the position information of the at least one body part and preset position information of the action icon corresponding to the body part, and display the action icon at the display position;
the second collection module 404 is configured to collect and display second image frame data of the user; wherein the second image frame data is image frame data at a preset time point after the first image frame data;
the second determination module 405 is configured to determine a target body part associated with the action icon in the second image frame data, as well as state information of the target body part;
the evaluation module 406 is configured to determine an evaluation result according to the matching degree between the state information of the target body part in the second image frame data and the action icon.
In a possible implementation manner, the state information of the target body part includes position information of the target body part and/or action information formed by the target body part.
In a possible implementation manner, the state information of the target body part includes position information of the target body part;
the evaluation module 406 includes:
an effective response area determination unit, configured to determine an effective response area of the action icon in the second image frame data;
a first evaluation result determination unit, configured to determine the position matching degree between the position information of the target body part and the effective response area of the action icon, and determine the evaluation result according to the position matching degree.
In a possible implementation manner, the state information of the target body part includes action information formed by the target body part;
the evaluation module 406 includes:
a standard action information determination unit, configured to determine standard action information corresponding to the action icon;
a second evaluation result determination unit, configured to determine the action matching degree between the action information formed by the target body part in the second image frame data and the standard action information, and determine the evaluation result according to the action matching degree.
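One way to realize an action matching degree, sketched here as an assumption since the disclosure does not specify the metric, is cosine similarity between the observed keypoint vector and the standard action vector, followed by thresholding into an evaluation result; the thresholds are illustrative.

```python
import math

def action_match(observed, standard):
    """Action matching degree as cosine similarity of flattened keypoint vectors."""
    dot = sum(a * b for a, b in zip(observed, standard))
    na = math.sqrt(sum(a * a for a in observed))
    nb = math.sqrt(sum(b * b for b in standard))
    return dot / (na * nb) if na and nb else 0.0

def grade(match, perfect=0.95, good=0.8):
    """Map a matching degree in [0, 1] to an evaluation result; thresholds illustrative."""
    if match >= perfect:
        return "perfect"
    if match >= good:
        return "good"
    return "average"
```

A perfectly reproduced standard action yields a matching degree of 1.0 and grades "perfect"; an orthogonal (unrelated) pose yields 0.0 and grades "average".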
In a possible implementation manner, the preset position information of the action icon is obtained based on the position data, in a standard data set, of the body part corresponding to the action icon;
the standard data set is obtained by fusing the body part position data of the same image frame across multiple candidate videos based on a preset rule.
In a possible implementation manner, the multiple candidate videos are obtained based on preset video screening information; the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes the number of likes and/or comments of a video.
In a possible implementation manner, the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
a guidance information display module, configured to display guidance information on the second image frame data, so as to guide the user to change the state information of the target body part associated with the action icon.
In a possible implementation manner, the guidance information includes at least one of a guidance video animation, a guidance picture and a guidance instruction.
In a possible implementation manner, the display position determination module 403 includes:
a display position determination unit, configured to determine the display position of the action icon based on the position information of the at least one body part and the preset position information of the action icon corresponding to the body part;
an action icon display unit, configured to display the action icon at the display position;
the action icon display unit includes:
a display style determination subunit, configured to determine a display style of the action icon based on the playback time information of background music or on the collection time information of the first image frame data;
an action icon display subunit, configured to display the action icon at the display position in the display style.
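The display-style selection performed by the subunit above can be sketched as a mapping from background-music playback time to a style that cycles with the beat. The beat length and the style names are hypothetical placeholders, not values from the disclosure.

```python
def pick_style(playback_ms, beat_ms=500, styles=("pulse", "glow", "spin")):
    """Choose an action-icon display style from background-music playback time.

    The style cycles once per beat; beat length and style names are illustrative.
    """
    return styles[(playback_ms // beat_ms) % len(styles)]
```

The same function applies unchanged if collection time of the first image frame data is used in place of music playback time.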
In a possible implementation manner, the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
an evaluation result animation determination module, configured to determine an evaluation result animation according to the evaluation result;
an animation display module, configured to determine an animation display position of the evaluation result animation in the second image frame data by using the display position of the action icon, and display the evaluation result animation at the animation display position.
In a possible implementation manner, the second determination module 405 includes:
an associated body part determination unit, configured to determine the target body part associated with the action icon in the second image frame data;
a state information determination unit, configured to determine the state information of the target body part;
wherein the associated body part determination unit is specifically configured to determine the target body part associated with the action icon in the second image frame data based on the playback time information of background music or on the collection time information of the second image frame data.
In a possible implementation manner, the action icon includes an emoticon icon, and the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
a user expression recognition module, configured to recognize a user expression in the first image frame data or the second image frame data, and determine an emoticon icon matching the user expression;
an emoticon icon display module, configured to determine a display position of the emoticon icon based on the position information, in the first image frame data or the second image frame data, of the facial features forming the user expression, and display the emoticon icon at the determined display position.
In a possible implementation manner, the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
a first shared video generation module, configured to generate a first shared video based on the collected first image frame data and second image frame data;
a sharing request sending module, configured to send a first video sharing request to the server according to the user's video sharing operation; wherein the first video sharing request carries the first shared video and the user identifier of the sharing object, and the user identifier of the sharing object is used by the server to determine a second shared video shared by the sharing object;
a composite video receiving module, configured to receive a composite video returned by the server; wherein the composite video is obtained by the server synthesizing the first shared video and the second shared video for same-screen display.
In a possible implementation manner, the interaction apparatus 400 provided by the embodiment of the present disclosure further includes:
a mode switching module, configured to switch from the current mode to an image synchronization sharing mode according to the user's image synchronization operation;
a first same-screen display module, configured to receive first shared image frame data in real time, and display the first shared image frame data and the first image frame data on the same screen;
a second same-screen display module, configured to receive second shared image frame data in real time, and display the second shared image frame data and the second image frame data on the same screen;
wherein the first shared image frame data and the second shared image frame data are shared in real time by the sharing object, and the sharing object is predetermined by the user.
In a possible implementation manner, the action information formed by the body parts includes dance-game action information.
In a possible implementation manner, the body part identified in the first image frame data or the second image frame data includes at least one of a head, an arm, a hand, a foot and a leg.
The interaction apparatus configured on the client provided by the embodiments of the present disclosure can execute any interaction method applied to the client provided by the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method. For content not described in detail in the apparatus embodiments of the present disclosure, reference may be made to the description in any method embodiment of the present disclosure.
图9为本公开实施例提供的另一种交互装置的结构示意图,该装置可以配置于服务器中,可以采用软件和/或硬件实现。FIG. 9 is a schematic structural diagram of another interaction apparatus provided by an embodiment of the present disclosure. The apparatus may be configured in a server, and may be implemented by software and/or hardware.
如图9所示,本公开实施例提供的交互装置500可以包括位置数据提取模块501、标准位置数据集确定模块502、位置数据查找模块503和预设位置信息确定模块504,其中:As shown in FIG. 9 , the interaction apparatus 500 provided by the embodiment of the present disclosure may include a position data extraction module 501, a standard position data set determination module 502, a position data search module 503, and a preset position information determination module 504, wherein:
位置数据提取模块501,用于获取多个候选视频,并提取多个候选视频中各图像帧的人体部位位置数据;The position data extraction module 501 is used to obtain a plurality of candidate videos, and extract the position data of human body parts of each image frame in the plurality of candidate videos;
标准位置数据集确定模块502,用于基于预设的规则对多个候选视频中同一图像帧的人体部位位置数据进行融合,得到标准位置数据集;The standard position data set determination module 502 is configured to fuse the body part position data of the same image frame in the multiple candidate videos based on preset rules to obtain a standard position data set;
位置数据查找模块503,用于查找多个候选视频中至少一个图像帧中的目标人体部位在标准位置数据集中的位置数据;The position data search module 503 is used to search for the position data of the target human body part in the standard position data set in at least one image frame in the multiple candidate videos;
预设位置信息确定模块504,用于利用查找的位置数据确定与目标人体部位对应的动作图标的预设位置信息,以参与确定动作图标在客户端展示的图像帧数据中的展示位置。The preset position information determination module 504 is used to determine the preset position information of the action icon corresponding to the target body part by using the searched position data, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
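As a concrete illustration of how modules 503 and 504 might cooperate, the following minimal Python sketch looks up the fused standard position of a target body part for a given frame and derives the action icon's preset position from it. The normalized [0, 1] coordinate convention, the dictionary layout, and the fixed offset are illustrative assumptions, not details specified by this disclosure:

```python
def preset_icon_position(standard_positions, frame_index, body_part, offset=(0.05, -0.05)):
    """Look up the fused position of `body_part` at `frame_index` in the
    standard position data set and derive the action icon's preset
    position by applying a small offset (hypothetical scheme).

    `standard_positions` maps (frame_index, body_part) -> (x, y),
    with coordinates normalized to [0, 1].
    """
    x, y = standard_positions[(frame_index, body_part)]
    # Clamp so the icon's preset position stays inside the frame.
    ix = min(max(x + offset[0], 0.0), 1.0)
    iy = min(max(y + offset[1], 0.0), 1.0)
    return (ix, iy)
```

The client can then combine this preset position information with the body part position it detects locally to decide where to render the icon.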
在一种可能的实施方式下,位置数据提取模块501包括:In a possible implementation manner, the location data extraction module 501 includes:
视频筛选单元,用于基于预设视频筛选信息,获取多个候选视频;其中,预设视频筛选信息包括视频交互信息和/或视频发布者信息,视频交互信息包括视频的点赞量和/或评论量;A video screening unit, configured to obtain multiple candidate videos based on preset video screening information; wherein the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes the number of likes and/or comments of the video;
位置数据提取单元,用于提取多个候选视频中各图像帧的人体部位位置数据。The position data extraction unit is used for extracting the position data of human body parts of each image frame in the multiple candidate videos.
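The screening performed by the video screening unit can be sketched as follows. This is a minimal illustration in Python; the dict-based video metadata, the threshold values, and the publisher allowlist are all hypothetical choices, since the disclosure only states that screening uses interaction information and/or publisher information:

```python
def screen_candidate_videos(videos, min_likes=1000, min_comments=100, trusted_publishers=None):
    """Select candidate videos by interaction metrics and/or publisher.

    `videos` is a list of dicts with hypothetical keys: 'id', 'likes',
    'comments', 'publisher'. Threshold values are illustrative only.
    """
    selected = []
    for v in videos:
        by_interaction = v["likes"] >= min_likes or v["comments"] >= min_comments
        by_publisher = trusted_publishers is not None and v["publisher"] in trusted_publishers
        if by_interaction or by_publisher:
            selected.append(v)
    return selected
```

Highly-liked or heavily-commented videos pass on interaction alone, while videos from a designated publisher pass regardless of their metrics.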
在一种可能的实施方式下,本公开实施例提供的交互装置500还包括:In a possible implementation manner, the interaction apparatus 500 provided by the embodiment of the present disclosure further includes:
引导信息生成模块,用于基于标准位置数据集生成引导信息;a guidance information generation module for generating guidance information based on a standard location data set;
引导信息发送模块,用于向客户端发送引导信息,以使客户端在采集的用户图像帧数据上展示引导信息,并引导用户改变图像帧数据中与动作图标关联的目标人体部位的状态信息。The guidance information sending module is used to send guidance information to the client, so that the client can display the guidance information on the collected user image frame data, and guide the user to change the state information of the target body part associated with the action icon in the image frame data.
在一种可能的实施方式下,引导信息包括引导视频动画、引导图片和引导指令中的至少一种。In a possible implementation manner, the guide information includes at least one of a guide video animation, a guide picture and a guide instruction.
在一种可能的实施方式下,标准位置数据集确定模块502包括:In a possible implementation, the standard location data set determination module 502 includes:
视频权重确定单元,用于确定每个候选视频的权重值;a video weight determination unit for determining the weight value of each candidate video;
标准位置数据集确定单元,用于基于每个候选视频的权重值,对多个候选视频中同一图像帧的人体部位位置数据进行加权平均计算,得到标准位置数据集。The standard position data set determination unit is configured to perform weighted average calculation on the position data of human body parts of the same image frame in the multiple candidate videos based on the weight value of each candidate video to obtain the standard position data set.
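A minimal sketch of the weighted-average fusion performed by the standard position data set determination unit, assuming each candidate video contributes one (x, y) coordinate for the same body part in the same (time-aligned) frame, and that the per-video weight values are already known; how the weights are derived is left open here:

```python
def fuse_body_part_positions(positions, weights):
    """Weighted average of per-video (x, y) positions for one body part
    in one aligned frame. `positions` and `weights` are parallel lists.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    fx = sum(w * x for w, (x, y) in zip(weights, positions)) / total
    fy = sum(w * y for w, (x, y) in zip(weights, positions)) / total
    return (fx, fy)
```

Running this per frame and per body part over all candidate videos yields the standard position data set.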
在一种可能的实施方式下,本公开实施例提供的交互装置500还包括:In a possible implementation manner, the interaction apparatus 500 provided by the embodiment of the present disclosure further includes:
视频分享请求接收模块,用于接收客户端发送第一视频分享请求;其中,第一视频分享请求中携带第一分享视频和分享对象的用户标识,第一分享视频由客户端基于采集的第一图像帧数据和第二图像帧数据生成;The video sharing request receiving module is used to receive the first video sharing request sent by the client; wherein the first video sharing request carries the first shared video and the user identifier of the sharing object, and the first shared video is generated by the client based on the collected first image frame data and second image frame data;
分享视频确定模块,用于基于分享对象的用户标识,确定分享对象分享的第二分享视频;其中,第二分享视频和第一分享视频可以包括展示相同状态信息的人体部位的图像帧;a shared video determination module, configured to determine the second shared video shared by the shared object based on the user identification of the shared object; wherein the second shared video and the first shared video may include image frames of human body parts showing the same state information;
视频合成模块,用于将第一分享视频和第二分享视频合成同屏展示的合成视频;a video synthesis module, used to synthesize the first shared video and the second shared video into a composite video displayed on the same screen;
合成视频发送模块,用于将合成视频发送至客户端。The composite video sending module is used to send the composite video to the client.
在一种可能的实施方式下,本公开实施例提供的交互装置500还包括:In a possible implementation manner, the interaction apparatus 500 provided by the embodiment of the present disclosure further includes:
第一分享图像接收模块,用于接收分享对象实时分享的第一分享图像帧数据;其中,分享对象由用户预先确定;a first shared image receiving module, configured to receive the first shared image frame data shared by the shared object in real time; wherein, the shared object is predetermined by the user;
第一分享图像发送模块,用于将第一分享图像帧数据实时发送至客户端,以使客户端将第一分享图像帧数据和本地采集的第一图像帧数据进行同屏展示;其中,第一分享图像帧数据与客户端本地采集的第一图像帧数据可以展示具有相同状态信息的人体部位;The first shared image sending module is used to send the first shared image frame data to the client in real time, so that the client displays the first shared image frame data and the locally collected first image frame data on the same screen; wherein the first shared image frame data and the first image frame data locally collected by the client may show body parts with the same state information;
第二分享图像接收模块,用于接收分享对象实时分享的第二分享图像帧数据;其中,分享对象由用户预先确定;The second shared image receiving module is configured to receive the second shared image frame data shared by the shared object in real time; wherein, the shared object is predetermined by the user;
第二分享图像发送模块,用于将第二分享图像帧数据实时发送至客户端,以使客户端将第二分享图像帧数据和本地采集的第二图像帧数据进行同屏展示;其中,第二分享图像帧数据与客户端本地采集的第二图像帧数据可以展示具有相同状态信息的人体部位。The second shared image sending module is used to send the second shared image frame data to the client in real time, so that the client displays the second shared image frame data and the locally collected second image frame data on the same screen; wherein the second shared image frame data and the second image frame data locally collected by the client may show body parts with the same state information.
本公开实施例所提供的配置于服务器的交互装置可执行本公开实施例所提供的应用于服务器的交互方法,具备执行方法相应的功能模块和有益效果。本公开装置实施例中未详尽描述的内容可以参考本公开任意方法实施例中的描述。The interaction device configured on the server provided by the embodiment of the present disclosure can execute the interaction method applied to the server provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method. For the content that is not described in detail in the apparatus embodiment of the present disclosure, reference may be made to the description in any method embodiment of the present disclosure.
图10为本公开实施例提供的一种终端的结构示意图,用于对实现本公开实施例提供的交互方法的终端进行示例性说明。本公开实施例中的终端可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图10示出的终端仅仅是一个示例,不应对本公开实施例的功能和占用范围带来任何限制。FIG. 10 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure, which exemplarily illustrates a terminal implementing the interaction method provided by the embodiments of the present disclosure. The terminals in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal shown in FIG. 10 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
如图10所示,终端600包括一个或多个处理器601、存储器602和摄像头605。As shown in FIG. 10 , the terminal 600 includes one or more processors 601 , a memory 602 and a camera 605 .
摄像头605用于实时采集用户的图像帧数据。The camera 605 is used to collect image frame data of the user in real time.
处理器601可以是中央处理单元(CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元,并且可以控制终端600中的其他组件以执行期望的功能。 Processor 601 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in terminal 600 to perform desired functions.
存储器602可以包括一个或多个计算机程序产品,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器601可以运行程序指令,以实现本公开实施例提供的应用于客户端的交互方法,还可以实现其他期望的功能。在计算机可读存储介质中还可以存储诸如输入信号、信号分量、噪声分量等各种内容。 Memory 602 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory, among others. Non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 601 may execute the program instructions to implement the interaction method applied to the client provided by the embodiments of the present disclosure, and may also implement other desired functions. Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
其中,应用于客户端的交互方法可以包括:采集用户的第一图像帧数据并展示;识别第一图像帧数据中的至少一个人体部位,并确定人体部位的位置信息;基于至少一个人体部位的位置信息,以及与人体部位对应的动作图标的预设位置信息,确定动作图标的展示位置,并在展示位置展示动作图标;采集用户的第二图像帧数据并展示;其中,第二图像帧数据是第一图像帧数据之后预设时间点的图像帧数据;确定第二图像帧数据中与动作图标关联的目标人体部位以及目标人体部位的状态信息;根据第二图像帧数据中目标人体部位的状态信息与动作图标的匹配度,确定评估结果。The interaction method applied to the client may include: collecting and displaying first image frame data of the user; identifying at least one human body part in the first image frame data, and determining position information of the body part; determining a display position of an action icon based on the position information of the at least one body part and preset position information of the action icon corresponding to the body part, and displaying the action icon at the display position; collecting and displaying second image frame data of the user, where the second image frame data is image frame data at a preset time point after the first image frame data; determining a target body part associated with the action icon in the second image frame data and state information of the target body part; and determining an evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon.
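The last steps of the client-side method (matching the target body part against the action icon's effective response area and grading the result) can be sketched as follows. The circular response area, the linear fall-off of the matching degree, and the grade thresholds are illustrative assumptions; the disclosure leaves the exact matching rule open:

```python
import math

def evaluate_action(part_pos, icon_center, response_radius):
    """Return (match_degree, grade) for one action icon.

    match_degree is 1.0 when the target body part lies exactly on the
    icon's center and falls off linearly to 0.0 at the edge of the
    (assumed circular) effective response area.
    """
    d = math.dist(part_pos, icon_center)
    match = max(0.0, 1.0 - d / response_radius)
    # Hypothetical grade thresholds for the evaluation result.
    if match >= 0.8:
        grade = "PERFECT"
    elif match >= 0.5:
        grade = "GOOD"
    elif match > 0.0:
        grade = "OK"
    else:
        grade = "MISS"
    return match, grade
```

The resulting grade could then drive the evaluation result animation displayed at the icon's position.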
应当理解,终端600还可以执行本公开方法实施例提供的其他可选实施方案。It should be understood that the terminal 600 may also perform other optional implementations provided by the method embodiments of the present disclosure.
在一个示例中,终端600还可以包括:输入装置603和输出装置604,这些组件通过总线系统和/或其他形式的连接机构(未示出)互连。In one example, the terminal 600 may also include an input device 603 and an output device 604, these components being interconnected by a bus system and/or other form of connection mechanism (not shown).
此外,该输入装置603还可以包括例如键盘、鼠标等等。In addition, the input device 603 may also include, for example, a keyboard, a mouse, and the like.
该输出装置604可以向外部输出各种信息,包括确定出的距离信息、方向信息等。该输出装置604可以包括例如显示器、扬声器、打印机、以及通信网络及其所连接的远程输出设备等等。The output device 604 can output various information to the outside, including the determined distance information, direction information, and the like. The output device 604 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.
当然,为了简化,图10中仅示出了该终端600中与本公开有关的组件中的一些,省略了诸如总线、输入/输出接口等等的组件。除此之外,根据具体应用情况,终端600还可以包括任何其他适当的组件。Of course, for simplicity, only some of the components in the terminal 600 related to the present disclosure are shown in FIG. 10 , and components such as a bus, an input/output interface, and the like are omitted. Besides, the terminal 600 may also include any other appropriate components according to the specific application.
图11为本公开实施例提供的一种服务器的结构示意图,用于对实现本公开实施例提供的交互方法的服务器进行示例性说明。图11示出的服务器仅仅是一个示例,不应对本公开实施例的功能和占用范围带来任何限制。FIG. 11 is a schematic structural diagram of a server provided by an embodiment of the present disclosure, which exemplarily illustrates a server implementing the interaction method provided by the embodiments of the present disclosure. The server shown in FIG. 11 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
如图11所示,服务器700包括一个或多个处理器701和存储器702。As shown in FIG. 11 , server 700 includes one or more processors 701 and memory 702 .
处理器701可以是中央处理单元(CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元,并且可以控制服务器700中的其他组件以执行期望的功能。 Processor 701 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in server 700 to perform desired functions.
存储器702可以包括一个或多个计算机程序产品,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器701可以运行程序指令,以实现本公开实施例提供的应用于服务器的交互方法,还可以实现其他期望的功能。在计算机可读存储介质中还可以存储诸如输入信号、信号分量、噪声分量等各种内容。 Memory 702 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory, among others. Non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 701 may execute the program instructions to implement the interaction method applied to the server provided by the embodiments of the present disclosure, and may also implement other desired functions. Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
其中,应用于服务器的交互方法可以包括:获取多个候选视频,并提取多个候选视频中各图像帧的人体部位位置数据;基于预设的规则对多个候选视频中同一图像帧的人体部位位置数据进行融合,得到标准位置数据集;查找多个候选视频中至少一个图像帧中的目标人体部位在标准位置数据集中的位置数据;利用查找的位置数据确定与目标人体部位对应的动作图标的预设位置信息,以参与确定动作图标在客户端展示的图像帧数据中的展示位置。The interaction method applied to the server may include: acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos; fusing the body part position data of the same image frame in the multiple candidate videos based on preset rules to obtain a standard position data set; searching the standard position data set for position data of a target body part in at least one image frame of the multiple candidate videos; and using the found position data to determine preset position information of an action icon corresponding to the target body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
应当理解,服务器700还可以执行本公开方法实施例提供的其他可选实施方案。It should be understood that the server 700 may also execute other optional implementations provided by the method embodiments of the present disclosure.
在一个示例中,服务器700还可以包括:输入装置703和输出装置704,这些组件通过总线系统和/或其他形式的连接机构(未示出)互连。In one example, the server 700 may also include an input device 703 and an output device 704 interconnected by a bus system and/or other form of connection mechanism (not shown).
此外,该输入装置703还可以包括例如键盘、鼠标等等。In addition, the input device 703 may also include, for example, a keyboard, a mouse, and the like.
该输出装置704可以向外部输出各种信息,包括确定出的距离信息、方向信息等。该输出装置704可以包括例如显示器、扬声器、打印机、以及通信网络及其所连接的远程输出设备等等。The output device 704 can output various information to the outside, including the determined distance information, direction information, and the like. The output devices 704 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.
当然,为了简化,图11中仅示出了该服务器700中与本公开有关的组件中的一些,省略了诸如总线、输入/输出接口等等的组件。除此之外,根据具体应用情况,服务器700还可以包括任何其他适当的组件。Of course, for simplicity, only some of the components in the server 700 related to the present disclosure are shown in FIG. 11 , and components such as buses, input/output interfaces, and the like are omitted. Besides, the server 700 may also include any other appropriate components according to the specific application.
除了上述方法和设备以外,本公开的实施例还可以是计算机程序产品,其包括计算机程序指令,计算机程序指令在被处理器运行时使得处理器执行本公开实施例所提供的应用于客户端或应用于服务器的任意交互方法。In addition to the above methods and devices, an embodiment of the present disclosure may also be a computer program product, which includes computer program instructions that, when executed by a processor, cause the processor to execute any interaction method applied to the client or applied to the server provided by the embodiments of the present disclosure.
计算机程序产品可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例操作的程序代码,程序设计语言包括面向对象的程序设计语言,诸如Java、C++等,还包括常规的过程式程序设计语言,诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户终端或服务器上执行、部分地在用户终端或服务器上执行、作为一个独立的软件包执行、部分在用户终端或服务器上且部分在远程终端或服务器上执行、或者完全在远程终端或服务器上执行。The computer program product may write program code for performing operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages, such as Java, C++, etc., as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code may execute entirely on a user terminal or server, partly on a user terminal or server, as a stand-alone software package, partly on a user terminal or server and partly on a remote terminal or server, or entirely on a remote terminal or server.
此外,本公开的实施例还可以是计算机可读存储介质,其上存储有计算机程序指令,计算机程序指令在被处理器运行时使得处理器执行本公开实施例所提供的应用于客户端或应用于服务器的任意交互方法。In addition, an embodiment of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored; the computer program instructions, when executed by a processor, cause the processor to execute any interaction method applied to the client or applied to the server provided by the embodiments of the present disclosure.
一方面,应用于客户端的交互方法可以包括:采集用户的第一图像帧数据并展示;识别第一图像帧数据中的至少一个人体部位,并确定人体部位的位置信息;基于至少一个人体部位的位置信息,以及与人体部位对应的动作图标的预设位置信息,确定动作图标的展示位置,并在展示位置展示动作图标;采集用户的第二图像帧数据并展示;其中,第二图像帧数据是第一图像帧数据之后预设时间点的图像帧数据;确定第二图像帧数据中与动作图标关联的目标人体部位以及目标人体部位的状态信息;根据第二图像帧数据中目标人体部位的状态信息与动作图标的匹配度,确定评估结果。In one aspect, the interaction method applied to the client may include: collecting and displaying first image frame data of the user; identifying at least one human body part in the first image frame data, and determining position information of the body part; determining a display position of an action icon based on the position information of the at least one body part and preset position information of the action icon corresponding to the body part, and displaying the action icon at the display position; collecting and displaying second image frame data of the user, where the second image frame data is image frame data at a preset time point after the first image frame data; determining a target body part associated with the action icon in the second image frame data and state information of the target body part; and determining an evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon.
另一方面,应用于服务器的交互方法可以包括:获取多个候选视频,并提取多个候选视频中各图像帧的人体部位位置数据;基于预设的规则对多个候选视频中同一图像帧的人体部位位置数据进行融合,得到标准位置数据集;查找多个候选视频中至少一个图像帧中的目标人体部位在标准位置数据集中的位置数据;利用查找的位置数据确定与目标人体部位对应的动作图标的预设位置信息,以参与确定动作图标在客户端展示的图像帧数据中的展示位置。In another aspect, the interaction method applied to the server may include: acquiring multiple candidate videos, and extracting body part position data of each image frame in the multiple candidate videos; fusing the body part position data of the same image frame in the multiple candidate videos based on preset rules to obtain a standard position data set; searching the standard position data set for position data of a target body part in at least one image frame of the multiple candidate videos; and using the found position data to determine preset position information of an action icon corresponding to the target body part, so as to participate in determining the display position of the action icon in the image frame data displayed by the client.
应当理解,计算机程序指令在被处理器运行时,还可以使得处理器执行本公开方法实施例提供的其他可选实施方案。It should be understood that, when the computer program instructions are executed by the processor, the processor may also cause the processor to execute other optional implementations provided by the method embodiments of the present disclosure.
计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以包括但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。A computer-readable storage medium can employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
需要说明的是,在本文中,诸如“第一”和“第二”等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that, in this document, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprising", "including" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
以上仅是本公开的具体实施方式,使本领域技术人员能够理解或实现本公开。对这些实施例的多种修改对本领域的技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本公开的精神或范围的情况下,在其它实施例中实现。因此,本公开将不会被限制于本文的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。The above are only specific embodiments of the present disclosure, so that those skilled in the art can understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not to be limited to the embodiments herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (21)

  1. 一种交互方法,其特征在于,应用于客户端,包括:An interaction method, characterized in that, applied to a client, comprising:
    采集用户的第一图像帧数据并展示;Collect the user's first image frame data and display it;
    识别所述第一图像帧数据中的至少一个人体部位,并确定所述人体部位的位置信息;Identifying at least one human body part in the first image frame data, and determining the position information of the human body part;
    基于所述至少一个人体部位的位置信息,以及与所述人体部位对应的动作图标的预设位置信息,确定所述动作图标的展示位置,并在所述展示位置展示所述动作图标;determining the display position of the action icon based on the position information of the at least one body part and the preset position information of the action icon corresponding to the body part, and displaying the action icon at the display position;
    采集所述用户的第二图像帧数据并展示;其中,所述第二图像帧数据是所述第一图像帧数据之后预设时间点的图像帧数据;collecting and displaying second image frame data of the user; wherein, the second image frame data is image frame data at a preset time point after the first image frame data;
    确定所述第二图像帧数据中与所述动作图标关联的目标人体部位以及所述目标人体部位的状态信息;determining the target body part associated with the action icon in the second image frame data and the state information of the target body part;
    根据所述第二图像帧数据中所述目标人体部位的状态信息与所述动作图标的匹配度,确定评估结果。The evaluation result is determined according to the degree of matching between the state information of the target human body part and the action icon in the second image frame data.
  2. 根据权利要求1所述的方法,其特征在于,所述目标人体部位的状态信息包括所述目标人体部位的位置信息和/或所述目标人体部位形成的动作信息。The method according to claim 1, wherein the state information of the target body part includes position information of the target body part and/or motion information formed by the target body part.
  3. 根据权利要求2所述的方法,其特征在于,所述目标人体部位的状态信息包括所述目标人体部位的位置信息;The method according to claim 2, wherein the state information of the target body part comprises position information of the target body part;
所述根据所述第二图像帧数据中所述目标人体部位的状态信息与所述动作图标的匹配度,确定评估结果,包括:The determining the evaluation result according to the matching degree between the state information of the target human body part in the second image frame data and the action icon includes:
    确定所述动作图标在所述第二图像帧数据中的有效响应区域;determining the effective response area of the action icon in the second image frame data;
    确定所述目标人体部位的位置信息和所述动作图标的有效响应区域的位置匹配度,并根据所述位置匹配度确定所述评估结果。The position matching degree between the position information of the target body part and the effective response area of the action icon is determined, and the evaluation result is determined according to the position matching degree.
  4. 根据权利要求2所述的方法,其特征在于,所述目标人体部位的状态信息包括所述目标人体部位形成的动作信息;The method according to claim 2, wherein the state information of the target body part comprises action information formed by the target body part;
    所述根据所述第二图像帧数据中所述目标人体部位的状态信息与所述动作图标的匹配度,确定评估结果,包括:The determining the evaluation result according to the matching degree between the state information of the target human body part and the action icon in the second image frame data includes:
    确定所述动作图标对应的标准动作信息;determining the standard action information corresponding to the action icon;
    确定所述第二图像帧数据中所述目标人体部位形成的动作信息和所述标准动作信息的动作匹配度,并根据所述动作匹配度确定所述评估结果。Determine the motion matching degree between the motion information formed by the target human body part and the standard motion information in the second image frame data, and determine the evaluation result according to the motion matching degree.
  5. 根据权利要求1所述的方法,其特征在于,所述动作图标的预设位置信息是基于与所述动作图标对应的人体部位在标准数据集中的位置数据得到;The method according to claim 1, wherein the preset position information of the action icon is obtained based on the position data of the body part corresponding to the action icon in a standard data set;
    所述标准数据集是基于预设的规则对多个候选视频中同一图像帧的人体部位位置数据进行融合得到。The standard data set is obtained by fusing human body position data of the same image frame in multiple candidate videos based on preset rules.
6. 根据权利要求5所述的方法,其特征在于,所述多个候选视频是基于预设视频筛选信息得到,所述预设视频筛选信息包括视频交互信息和/或视频发布者信息,所述视频交互信息包括视频的点赞量和/或评论量。The method according to claim 5, wherein the plurality of candidate videos are obtained based on preset video screening information, the preset video screening information includes video interaction information and/or video publisher information, and the video interaction information includes the number of likes and/or comments of the video.
  7. 根据权利要求1所述的方法,其特征在于,在展示所述用户的第二图像帧数据的过程中,还包括:The method according to claim 1, wherein in the process of displaying the second image frame data of the user, the method further comprises:
    在所述第二图像帧数据上展示引导信息,以引导所述用户改变所述目标人体部位的状态信息。Guiding information is displayed on the second image frame data to guide the user to change the state information of the target body part.
  8. 根据权利要求7所述的方法,其特征在于,所述引导信息包括引导视频动画、引导图片和引导指令中的至少一种。The method according to claim 7, wherein the guidance information comprises at least one of a guidance video animation, a guidance picture and a guidance instruction.
  9. 根据权利要求1所述的方法,其特征在于,所述在所述展示位置展示所述动作图标,包括:The method according to claim 1, wherein the displaying the action icon at the display position comprises:
    基于背景音乐的播放时间信息或者基于所述第一图像帧数据的采集时间信息,确定所述动作图标的展示样式;Determine the display style of the action icon based on the playback time information of the background music or based on the collection time information of the first image frame data;
    在所述展示位置采用所述展示样式展示所述动作图标;Display the action icon in the display position using the display style;
    所述确定所述第二图像帧数据中与所述动作图标关联的目标人体部位,包括:The determining of the target body part associated with the action icon in the second image frame data includes:
    基于所述背景音乐的播放时间信息或者基于所述第二图像帧数据的采集时间信息,确定所述第二图像帧数据中与所述动作图标关联的目标人体部位。Based on the playing time information of the background music or based on the collection time information of the second image frame data, the target human body part associated with the action icon in the second image frame data is determined.
  10. 根据权利要求1所述的方法,其特征在于,在所述根据所述第二图像帧数据中所述目标人体部位的状态信息与所述动作图标的匹配度,确定评估结果之后,还包括:The method according to claim 1, wherein after determining the evaluation result according to the matching degree between the state information of the target human body part in the second image frame data and the action icon, the method further comprises:
    根据所述评估结果确定评估结果动画;determining an evaluation result animation according to the evaluation result;
    利用所述动作图标的展示位置,确定所述评估结果动画在所述第二图像帧数据中的动画展示位置,并在所述动画展示位置展示所述评估结果动画。Using the display position of the action icon, the animation display position of the evaluation result animation in the second image frame data is determined, and the evaluation result animation is displayed at the animation display position.
  11. 根据权利要求1所述的方法,其特征在于,所述动作图标包括表情图标,在展示所述第一图像帧数据或者展示所述第二图像帧数据的过程中,还包括:The method according to claim 1, wherein the action icon comprises an emoticon icon, and in the process of displaying the first image frame data or displaying the second image frame data, further comprising:
    识别所述第一图像帧数据或者所述第二图像帧数据中的用户表情,并确定与所述用户表情匹配的表情图标;Identifying the user's expression in the first image frame data or the second image frame data, and determining an expression icon matching the user's expression;
    基于所述第一图像帧数据或者所述第二图像帧数据上形成所述用户表情的五官的位置信息,确定所述表情图标的展示位置,并将所述表情图标展示在确定的展示位置。Based on the position information of the facial features forming the user's expression on the first image frame data or the second image frame data, the display position of the expression icon is determined, and the expression icon is displayed at the determined display position.
  12. 根据权利要求1所述的方法,其特征在于,在所述根据所述第二图像帧数据中所述目标人体部位的状态信息与所述动作图标的匹配度,确定评估结果之后,还包括:The method according to claim 1, wherein after determining the evaluation result according to the matching degree between the state information of the target human body part in the second image frame data and the action icon, the method further comprises:
    基于采集的所述第一图像帧数据和所述第二图像帧数据,生成第一分享视频;generating a first shared video based on the collected first image frame data and the second image frame data;
根据所述用户的视频分享操作,向服务器发送第一视频分享请求;其中,所述第一视频分享请求中携带所述第一分享视频和分享对象的用户标识,所述分享对象的用户标识用于所述服务器确定所述分享对象分享的第二分享视频;Sending a first video sharing request to the server according to the video sharing operation of the user; wherein the first video sharing request carries the first shared video and the user identifier of the sharing object, and the user identifier of the sharing object is used by the server to determine the second shared video shared by the sharing object;
    接收所述服务器返回的合成视频;其中,所述合成视频由所述服务器将所述第一分享视频和所述第二分享视频合成同屏展示后得到。Receive a composite video returned by the server; wherein, the composite video is obtained by the server synthesizing the first shared video and the second shared video for display on the same screen.
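The server-side same-screen composition in claim 12 can be sketched as pairing up frames from the two shared videos and joining each pair side by side. This is an assumed, minimal model — frames are represented here as row-major 2-D lists of pixel values rather than real decoded video frames, and `compose_same_screen` is an illustrative name.

```python
def compose_same_screen(video_a, video_b):
    """Pair frames from two shared videos and join each pair side by side.

    Frames are row-major 2-D lists of pixels (assumed equal height);
    the shorter video bounds the length of the composite.
    """
    composite = []
    for frame_a, frame_b in zip(video_a, video_b):
        # Same-screen display: each output row is the row from A followed by the row from B.
        composite.append([ra + rb for ra, rb in zip(frame_a, frame_b)])
    return composite

# Two tiny one-frame "videos", each frame 2x2 pixels.
v1 = [[[1, 2], [3, 4]]]
v2 = [[[5, 6], [7, 8]]]
print(compose_same_screen(v1, v2))
```

A real implementation would resize both streams to a common height before concatenation (e.g. with OpenCV's `hconcat`) and re-encode the result.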
  13. The method according to claim 1, further comprising, before displaying the first image frame data:
    switching from a current mode to an image synchronization sharing mode according to an image synchronization operation of the user;
    correspondingly, the process of displaying the first image frame data and the second image frame data further comprises:
    receiving first shared image frame data in real time, and displaying the first shared image frame data and the first image frame data on the same screen;
    receiving second shared image frame data in real time, and displaying the second shared image frame data and the second image frame data on the same screen;
    wherein the first shared image frame data and the second shared image frame data are shared in real time by a sharing object, and the sharing object is predetermined by the user.
  14. The method according to claim 2, wherein the action information formed by the body part comprises dance-game action information.
  15. An interaction method applied to a server, comprising:
    obtaining a plurality of candidate videos, and extracting body part position data of each image frame in the plurality of candidate videos;
    fusing the body part position data of the same image frame across the plurality of candidate videos according to a preset rule to obtain a standard position data set;
    looking up, in the standard position data set, the position data of a target body part in at least one image frame of the plurality of candidate videos;
    determining preset position information of an action icon corresponding to the target body part using the retrieved position data, the preset position information being used in determining the display position of the action icon in the image frame data displayed by a client.
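The "preset rule" for fusion in claim 15 is left open; one plausible sketch averages each body part's coordinates across the candidate videos, frame by frame, to build the standard position data set, then looks up the fused position to preset the action icon. The function names and the averaging rule are assumptions for illustration.

```python
def fuse_standard_positions(candidate_videos):
    """Fuse per-frame body-part positions across candidate videos by averaging.

    candidate_videos: list of videos; each video is a list of frames; each frame
    maps a body-part name to an (x, y) position. Returns one fused frame list
    (the standard position data set), bounded by the shortest video.
    """
    n_frames = min(len(v) for v in candidate_videos)
    standard = []
    for i in range(n_frames):
        fused = {}
        for part in candidate_videos[0][i]:
            xs = [v[i][part][0] for v in candidate_videos]
            ys = [v[i][part][1] for v in candidate_videos]
            fused[part] = (sum(xs) / len(xs), sum(ys) / len(ys))
        standard.append(fused)
    return standard

def icon_preset_position(standard, frame_idx, target_part):
    """Look up the fused position of the target body part to preset an action icon."""
    return standard[frame_idx][target_part]

videos = [
    [{"left_hand": (100, 200)}],  # candidate video 1
    [{"left_hand": (110, 210)}],  # candidate video 2
]
std = fuse_standard_positions(videos)
print(icon_preset_position(std, 0, "left_hand"))
```

Other fusion rules (median, outlier-rejected mean) would fit the same interface.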
  16. An interaction apparatus configured on a client, comprising:
    a first acquisition module, configured to acquire and display first image frame data of a user;
    a first determination module, configured to identify at least one body part in the first image frame data and determine position information of the body part;
    a display position determination module, configured to determine a display position of an action icon based on the position information of the at least one body part and preset position information of the action icon corresponding to the body part, and to display the action icon at the display position;
    a second acquisition module, configured to acquire and display second image frame data of the user, wherein the second image frame data is image frame data at a preset time point after the first image frame data;
    a second determination module, configured to determine a target body part associated with the action icon in the second image frame data and state information of the target body part;
    an evaluation module, configured to determine an evaluation result according to the degree of matching between the state information of the target body part in the second image frame data and the action icon.
  17. An interaction apparatus configured on a server, comprising:
    a position data extraction module, configured to obtain a plurality of candidate videos and extract body part position data of each image frame in the plurality of candidate videos;
    a standard position data set determination module, configured to fuse the body part position data of the same image frame across the plurality of candidate videos according to a preset rule to obtain a standard position data set;
    a position data lookup module, configured to look up, in the standard position data set, the position data of a target body part in at least one image frame of the plurality of candidate videos;
    a preset position information determination module, configured to determine preset position information of an action icon corresponding to the target body part using the retrieved position data, the preset position information being used in determining the display position of the action icon in the image frame data displayed by a client.
  18. A terminal, comprising a memory, a processor and a camera, wherein:
    the camera is configured to capture image frame data of a user in real time;
    the memory stores a computer program which, when executed by the processor, causes the processor to perform the interaction method of any one of claims 1-14.
  19. A server, comprising a memory and a processor, wherein:
    the memory stores a computer program which, when executed by the processor, causes the processor to perform the interaction method of claim 15.
  20. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the interaction method of any one of claims 1-14, or the interaction method of claim 15.
  21. A computer program product comprising computer program instructions which, when run by a processor, cause the processor to perform the interaction method of any one of claims 1-14, or the interaction method of claim 15.
PCT/CN2021/127010 2020-12-02 2021-10-28 Interaction method and apparatus, and terminal, server and storage medium WO2022116751A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011399864.7 2020-12-02
CN202011399864.7A CN112560605B (en) 2020-12-02 2020-12-02 Interaction method, device, terminal, server and storage medium

Publications (1)

Publication Number Publication Date
WO2022116751A1

Family

ID=75048069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127010 WO2022116751A1 (en) 2020-12-02 2021-10-28 Interaction method and apparatus, and terminal, server and storage medium

Country Status (2)

Country Link
CN (1) CN112560605B (en)
WO (1) WO2022116751A1 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560605B (en) * 2020-12-02 2023-04-18 北京字节跳动网络技术有限公司 Interaction method, device, terminal, server and storage medium
CN113727147A (en) * 2021-08-27 2021-11-30 上海哔哩哔哩科技有限公司 Gift presenting method and device for live broadcast room
CN113723307B (en) * 2021-08-31 2024-09-06 上海掌门科技有限公司 Social sharing method, equipment and computer readable medium based on push-up detection
CN113946210B (en) * 2021-09-16 2024-01-23 武汉灏存科技有限公司 Action interaction display system and method
CN113742630B (en) * 2021-09-16 2023-12-15 阿里巴巴新加坡控股有限公司 Image processing method, electronic device, and computer storage medium
CN113923361B (en) * 2021-10-19 2024-07-09 北京字节跳动网络技术有限公司 Data processing method, apparatus, device, and computer readable storage medium
CN116320583A (en) * 2023-03-20 2023-06-23 抖音视界有限公司 Video call method, device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107920269A (en) * 2017-11-23 2018-04-17 乐蜜有限公司 Video generation method, device and electronic equipment
CN108833818A (en) * 2018-06-28 2018-11-16 腾讯科技(深圳)有限公司 video recording method, device, terminal and storage medium
CN109068081A (en) * 2018-08-10 2018-12-21 北京微播视界科技有限公司 Video generation method, device, electronic equipment and storage medium
CN109600559A (en) * 2018-11-29 2019-04-09 北京字节跳动网络技术有限公司 A kind of special video effect adding method, device, terminal device and storage medium
CN109618183A (en) * 2018-11-29 2019-04-12 北京字节跳动网络技术有限公司 A kind of special video effect adding method, device, terminal device and storage medium
CN110888532A (en) * 2019-11-25 2020-03-17 深圳传音控股股份有限公司 Man-machine interaction method and device, mobile terminal and computer readable storage medium
CN112560605A (en) * 2020-12-02 2021-03-26 北京字节跳动网络技术有限公司 Interaction method, device, terminal, server and storage medium

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101564594B (en) * 2008-04-25 2012-11-07 财团法人工业技术研究院 Interactive type limb action recovery method and system
CN102622591B (en) * 2012-01-12 2013-09-25 北京理工大学 3D (three-dimensional) human posture capturing and simulating system
CN102622509A (en) * 2012-01-21 2012-08-01 天津大学 Three-dimensional game interaction system based on monocular video
US9082312B2 (en) * 2012-05-09 2015-07-14 Antennasys, Inc. Physical activity instructional apparatus
WO2014041032A1 (en) * 2012-09-11 2014-03-20 L.I.F.E. Corporation S.A. Wearable communication platform
CN104461012B (en) * 2014-12-25 2017-07-11 中国科学院合肥物质科学研究院 A kind of dance training assessment system based on digital field and wireless motion capture equipment
CN104866108B (en) * 2015-06-05 2018-03-23 中国科学院自动化研究所 Multifunctional dance experiencing system
CN105635669B (en) * 2015-12-25 2019-03-01 北京迪生数字娱乐科技股份有限公司 The movement comparison system and method for data and real scene shooting video are captured based on three-dimensional motion
CN108326878A (en) * 2017-01-18 2018-07-27 王怀亮 A kind of limb action electronic switching equipment, instruction identification method and recording/playback method
CN107240049B (en) * 2017-05-10 2020-04-03 中国科学技术大学先进技术研究院 Automatic evaluation method and system for remote action teaching quality in immersive environment
CN107349594B (en) * 2017-08-31 2019-03-19 华中师范大学 A kind of action evaluation method of virtual Dance System
CN109389054A (en) * 2018-09-21 2019-02-26 北京邮电大学 Intelligent mirror design method based on automated graphics identification and action model comparison
CN109589563B (en) * 2018-12-29 2021-06-22 南京华捷艾米软件科技有限公司 Dance posture teaching and assisting method and system based on 3D motion sensing camera
CN110141850B (en) * 2019-01-30 2023-10-20 腾讯科技(深圳)有限公司 Action control method, device, electronic equipment and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150421A (en) * 2023-04-23 2023-05-23 深圳竹云科技股份有限公司 Image display method, device, computer equipment and storage medium
CN117455466A (en) * 2023-12-22 2024-01-26 南京三百云信息科技有限公司 Method and system for remote evaluation of automobile
CN117455466B (en) * 2023-12-22 2024-03-08 南京三百云信息科技有限公司 Method and system for remote evaluation of automobile

Also Published As

Publication number Publication date
CN112560605B (en) 2023-04-18
CN112560605A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
WO2022116751A1 (en) Interaction method and apparatus, and terminal, server and storage medium
US11158102B2 (en) Method and apparatus for processing information
CN109462776B (en) Video special effect adding method and device, terminal equipment and storage medium
CN109313812B (en) Shared experience with contextual enhancements
WO2022121601A1 (en) Live streaming interaction method and apparatus, and device and medium
WO2022083383A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
WO2020029523A1 (en) Video generation method and apparatus, electronic device, and storage medium
CN106575361B (en) Method for providing visual sound image and electronic equipment for implementing the method
CN109729372B (en) Live broadcast room switching method, device, terminal, server and storage medium
CN107533360A (en) A kind of method for showing, handling and relevant apparatus
WO2022007565A1 (en) Image processing method and apparatus for augmented reality, electronic device and storage medium
WO2023030010A1 (en) Interaction method, and electronic device and storage medium
US20150347461A1 (en) Display apparatus and method of providing information thereof
WO2021023047A1 (en) Facial image processing method and device, terminal, and storage medium
US20190174069A1 (en) System and Method for Autonomously Recording a Visual Media
CN109600559B (en) Video special effect adding method and device, terminal equipment and storage medium
CN112596694B (en) Method and device for processing house source information
CN111541951B (en) Video-based interactive processing method and device, terminal and readable storage medium
CN113923462A (en) Video generation method, live broadcast processing method, video generation device, live broadcast processing device and readable medium
WO2022206335A1 (en) Image display method and apparatus, device, and medium
WO2020155915A1 (en) Method and apparatus for playing back audio
WO2024007833A1 (en) Video playing method and apparatus, and device and storage medium
US11886484B2 (en) Music playing method and apparatus based on user interaction, and device and storage medium
US20230209125A1 (en) Method for displaying information and computer device
WO2021228200A1 (en) Method for realizing interaction in three-dimensional space scene, apparatus and device

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21899785; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.09.2023))
122 Ep: pct application non-entry in european phase (Ref document number: 21899785; Country of ref document: EP; Kind code of ref document: A1)