WO2023273500A1 - Data display method, apparatus, electronic device, computer program, and computer-readable storage medium

Info

Publication number
WO2023273500A1
Authority
WO
WIPO (PCT)
Prior art keywords
anchor
real
head
real anchor
special effect
Prior art date
Application number
PCT/CN2022/085941
Other languages
French (fr)
Chinese (zh)
Inventor
邱丰
王佳梨
王权
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023273500A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/2187 — Live feed (under H04N 21/218, Source of audio or video content, e.g. local disk arrays)
    • H04N 21/431 — Generation of visual interfaces for content selection or interaction; content or additional data rendering
    • H04N 21/4312 — Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/435 — Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/47205 — End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/4781 — Games
    • H04N 21/485 — End-user interface for client configuration

Definitions

  • The present disclosure relates to the technical field of image processing, and in particular to a data presentation method, apparatus, electronic device, computer program, and computer-readable storage medium.
  • During a live broadcast, the host is generally required to face the display screen of the host terminal so as to enhance the interaction between the host and the audience.
  • If the anchor's face disappears from the display screen, this not only degrades the display of the animation special effects added for the anchor, but also reduces the viewing experience of the audience watching the live video.
  • If the audience then leaves the live broadcast room, this also indirectly affects the anchor's live broadcast experience and the popularity of the broadcast.
  • Embodiments of the present disclosure provide at least a data display method, apparatus, electronic device, computer program, and computer-readable storage medium.
  • In a first aspect, an embodiment of the present disclosure provides a data display method, including: acquiring multiple frames of video images of a real anchor during a live broadcast; detecting the head pose of the real anchor in each frame of video image; and, when it is determined from the head poses corresponding to the multiple frames of video images that the length of time the real anchor's head has been in a specified pose meets a special-effect triggering requirement, displaying a target special effect animation in the live video picture, where the live video picture displays a virtual anchor model driven by the real anchor.
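  • A minimal sketch of this flow, assuming hypothetical helpers for frame iteration, per-frame pose detection, and effect rendering (the 2-second trigger duration and all names here are illustrative, not values taken from the disclosure):

    import time
    from dataclasses import dataclass
    from typing import Callable, Iterable, Optional

    @dataclass
    class HeadPose:
        pose_type: str        # e.g. "head_down" or "head_up"
        is_specified: bool    # whether this counts as a specified pose

    TRIGGER_SECONDS = 2.0     # illustrative duration; not fixed by the disclosure

    def run_live_loop(frames: Iterable,
                      detect_head_pose: Callable[[object], HeadPose],
                      show_effect: Callable[[str], None]) -> None:
        pose_start: Optional[float] = None
        for frame in frames:
            pose = detect_head_pose(frame)          # per-frame detection (step S103)
            if pose.is_specified:
                pose_start = pose_start or time.time()
                if time.time() - pose_start >= TRIGGER_SECONDS:
                    show_effect(pose.pose_type)     # triggering requirement met (step S105)
            else:
                pose_start = None                   # pose interrupted: reset the timer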
  • In the related art, when the head of the real anchor is detected to be in a specified pose for a long time, the head of the virtual anchor model displayed in the live video picture may shake, affecting the anchor's live broadcast experience and the audience's viewing experience.
  • In the embodiments of the present disclosure, by displaying the virtual anchor model in the live video picture, the interest and interactivity of the live broadcast can be enhanced.
  • In addition, by displaying the target special effect animation that drives the virtual anchor model in the live video picture, the head of the virtual anchor model can be kept in a stable playback state while the display content of the live picture is enriched, so that the picture is no longer monotonous; this solves the problem of abnormal display of the virtual anchor model that arises in traditional live broadcast scenes when the facial image of the real anchor cannot be matched.
  • In a possible implementation, detecting the head pose of the real anchor in each frame of video image includes: when it is determined that the face of the real anchor is facing the video capture device, determining the first facial orientation of the real anchor at the current moment; determining the change information of the real anchor's head pose according to the first facial orientation, where the change information characterizes how the first facial orientation changes; and determining the head pose of the real anchor in each frame of video image based on the change information.
  • In this method, by determining the change information of the real anchor's head pose from the first facial orientation at the current moment and then determining the head pose from that change information, the timing information in the video sequence (that is, adjacent video images) can be used to analyze how the real anchor's head pose changes.
  • Compared with determining the head pose from a single video image, the technical solution of the present disclosure can improve the accuracy of the head pose and thus obtain more accurate pose results.
  • In a possible implementation, determining the head pose of the real anchor in each frame of video image based on the change information includes: when it is determined from the change information that the first facial orientation has increased to exceed a first threshold, determining that the head pose of the real anchor has changed from a non-specified pose to the specified pose.
  • In a possible implementation, determining the head pose of the real anchor in each frame of video image based on the change information includes: when it is determined from the change information that the first facial orientation, having exceeded the first threshold, has decreased to below a second threshold, determining that the head pose of the real anchor has changed from the specified pose to a non-specified pose, where the second threshold is smaller than the first threshold.
  • In a possible implementation, detecting the head pose of the real anchor in each frame of video image includes: when it is determined that the face of the real anchor is not facing the video capture device, processing the live video picture through a deep learning model to obtain the head pose of the real anchor, and determining from the head pose whether the real anchor's head is in the specified pose.
  • When the face of the real anchor is not facing the video capture device, performing pose estimation on the live video picture through the deep learning model can improve the estimation accuracy of the real anchor's head pose.
  • In a possible implementation, processing the live video picture through the deep learning model to obtain the head pose of the real anchor includes: acquiring a target reference image frame, where the target reference image frame includes at least one of the following image frames: the N image frames preceding the live video picture in the video sequence to which it belongs, and the first M image frames of that video sequence, N and M being positive integers greater than zero; and processing the live video picture and the target reference image frame through the deep learning model to obtain the head pose of the real anchor.
  • In this way, the real anchor's head pose determined from the N image frames (or the M image frames) can serve as guidance information for the live video picture to be processed at the current moment, guiding the deep learning model in predicting the real anchor's head pose in the current picture and yielding more accurate head pose detection results.
  • In a possible implementation, detecting the head pose of the real anchor in each frame of video image includes: performing feature point detection on the face of the real anchor in the video image to obtain a feature point detection result, where the feature point detection result characterizes the feature information of the real anchor's facial feature points; determining a second facial orientation of the real anchor from the feature point detection result, where the second facial orientation characterizes the orientation of the real anchor's face relative to the video capture device; and determining the head pose of the real anchor from the second facial orientation.
  • Through the feature point detection result, the orientation of the real anchor's face relative to the video capture device can be determined, for example whether the face is frontal to the device or turned to the side. Since a complete facial image cannot be collected when the real anchor's face is turned sideways, the accuracy of the head pose would suffer in that case; by determining the head pose separately for the frontal and non-frontal cases, the accuracy of the real anchor's head pose can be improved, as sketched below.
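  • One common way to turn detected facial feature points into a facial orientation (offered here as a sketch, not as the method of the disclosure) is a perspective-n-point fit against a generic 3D face template; the template coordinates and the focal-length guess below are rough, conventional values:

    import cv2
    import numpy as np

    # Approximate 3D reference positions (mm) for six landmarks: nose tip, chin,
    # left/right eye outer corner, left/right mouth corner. Generic values, not
    # taken from the disclosure.
    MODEL_POINTS = np.array([
        (0.0, 0.0, 0.0),          # nose tip
        (0.0, -330.0, -65.0),     # chin
        (-225.0, 170.0, -135.0),  # left eye outer corner
        (225.0, 170.0, -135.0),   # right eye outer corner
        (-150.0, -150.0, -125.0), # left mouth corner
        (150.0, -150.0, -125.0),  # right mouth corner
    ])

    def face_orientation(image_points: np.ndarray, frame_size) -> np.ndarray:
        """Return (pitch, yaw, roll) in degrees from six 2D landmarks
        (float64 array of shape (6, 2)); frame_size is (height, width)."""
        h, w = frame_size
        focal = w  # crude focal-length guess: the image width
        camera_matrix = np.array([[focal, 0, w / 2],
                                  [0, focal, h / 2],
                                  [0, 0, 1]], dtype=np.float64)
        ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points, camera_matrix,
                                   np.zeros(4))     # assume no lens distortion
        rot, _ = cv2.Rodrigues(rvec)                 # rotation vector -> matrix
        angles, *_ = cv2.RQDecomp3x3(rot)            # Euler angles in degrees
        return np.asarray(angles)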
  • In a possible implementation, displaying the target special effect animation in the live video picture includes: determining the pose type of the head pose; determining the special effect animation matching the pose type; using the matching special effect animation as the target special effect animation that drives the virtual anchor model; and displaying the target special effect animation in the live video picture.
  • In this way, different special effect animations are triggered for different head poses and pose types, which enriches the displayed special effects, increases the fun of the live broadcast, and provides users with a better live broadcast experience.
  • In a possible implementation, displaying the target special effect animation in the live video picture includes: determining the type information of each viewer watching the live broadcast of the virtual anchor model driven by the real anchor; determining the special effect animation matching the type information; using the matching special effect animation as the target special effect animation displayed with the virtual anchor model; and sending the target special effect animation to the viewer terminal so that it is displayed there.
  • In this way, the probability that viewers continue watching the live broadcast can be increased, reducing viewer loss; while maintaining the popularity of the real anchor's live broadcast, the corresponding interactive fun is also increased.
  • In a second aspect, an embodiment of the present disclosure provides a data display apparatus, including: an acquisition part configured to acquire multiple frames of video images of a real anchor during a live broadcast; a detection part configured to detect the head pose of the real anchor in each frame of video image; and a special-effect adding part configured to display a target special effect animation in the live video picture when it is determined, from the head poses corresponding to the multiple frames of video images, that the length of time the real anchor's head has been in a specified pose meets the special-effect triggering requirement, where the live video picture displays a virtual anchor model driven by the real anchor.
  • In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
  • In a fourth aspect, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
  • In a fifth aspect, an embodiment of the present disclosure provides a computer program, including computer-readable code; when the code runs in an electronic device, a processor in the electronic device implements the above method.
  • Fig. 1 shows a first flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 2 shows a schematic diagram of the effect of a live video picture of a real anchor provided by an embodiment of the present disclosure;
  • Fig. 3 shows a second flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 4 shows a first schematic diagram of the orientation information between the real anchor and the video capture device provided by an embodiment of the present disclosure;
  • Fig. 5 shows a second schematic diagram of the orientation information between the real anchor and the video capture device provided by an embodiment of the present disclosure;
  • Fig. 6 shows a third flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 7 shows a fourth flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 8 shows a fifth flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 9 shows a third schematic diagram of the orientation information between the real anchor and the video capture device provided by an embodiment of the present disclosure;
  • Fig. 10 shows a sixth flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 11 shows a seventh flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 12 shows an eighth flowchart of a data presentation method provided by an embodiment of the present disclosure;
  • Fig. 13 shows a schematic diagram of a data display apparatus provided by an embodiment of the present disclosure;
  • Fig. 14 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • In the related art, the anchor is generally required to face the display screen of the anchor terminal during a live broadcast so as to enhance the interaction between the anchor and the audience.
  • If the anchor's face disappears from the display screen, this not only degrades the display of the animation special effects added for the anchor, but also reduces the viewing experience of the audience watching the live video.
  • If the audience then leaves the live broadcast room, this also indirectly affects the anchor's live broadcast experience and the popularity of the broadcast.
  • Based on this, the present disclosure provides a data presentation method.
  • The technical solution provided by the present disclosure can be applied in virtual live broadcast scenarios.
  • A virtual live broadcast scenario can be understood as one in which a preset virtual anchor model, such as a red panda, a little rabbit, or a cartoon character, replaces the actual image of the real anchor during the live broadcast; that is, the above-mentioned virtual anchor model is shown in the live video picture.
  • At the same time, the virtual anchor model can also be used to interact with the real anchor and the audience.
  • During the live broadcast, the camera of the live broadcast device can collect video images containing the real anchor; the electronic device then captures the head of the real anchor contained in the video images to obtain the real anchor's head pose.
  • Next, the electronic device can generate a corresponding driving signal, which drives the virtual anchor model in the live video picture to perform the action corresponding to the real anchor, and displays the virtual anchor model through the live video picture.
  • The real anchor can preset one or more virtual anchor models through the electronic device; for example, "the YYY character model in game XXX" can be preset as a virtual anchor model. When starting a virtual live broadcast at the current moment, one of the preset virtual anchor models can then be selected as the virtual anchor model for the current session.
  • The virtual anchor model may be a 2D model or a 3D model.
  • In addition to determining the virtual anchor model in the manner described above, the electronic device can also reshape a virtual anchor model for the real anchor in the video images after acquiring the multiple frames of video images.
  • In implementation, the electronic device can recognize the real anchor contained in the video image and reshape the virtual anchor model for the real anchor according to the recognition result.
  • The recognition result may include at least one of the following: the gender of the real anchor, the appearance characteristics of the real anchor, the clothing characteristics of the real anchor, and the like.
  • After the recognition result is obtained, the electronic device may search the virtual anchor model database for a model matching the recognition result to serve as the real anchor's virtual anchor model. For example, if the recognition result indicates that the real anchor wears a peaked cap and hip-hop style clothes during the live broadcast, the electronic device may search the database for a virtual anchor model matching "peaked cap" or "hip-hop style".
  • After the recognition result is obtained, the electronic device can also construct a corresponding virtual anchor model for the real anchor in real time through a model building module, based on the recognition result.
  • When constructing the virtual anchor model in real time, the electronic device can also use the virtual anchor models used in past virtual live broadcasts initiated by the real anchor as a reference for constructing the model driven by the real anchor at the current moment.
  • Here, the animation displayed on the viewer side's live viewing interface is the animation of the virtual anchor model performing the corresponding action.
  • Both the virtual anchor model and the video image containing the real anchor can be displayed in the live video picture at the live broadcast end.
  • For example, the virtual anchor model can be displayed on the left side of the live video picture, while the video image of the real anchor is displayed at position 21 in the lower-right corner of the picture.
  • Here, the target special effect animation includes multiple animation frames.
  • When the electronic device drives the virtual anchor model to perform a specified action, it can generate multiple animation frames and combine them to obtain the target special effect animation.
  • By displaying the target special effect animation that drives the virtual anchor model in the live video picture, the head of the virtual anchor model can be kept in a stable playback state while the display content of the picture is enriched, so that the live picture is no longer monotonous; this solves the problem of abnormal display of the virtual anchor model that arises in traditional live broadcast scenes when the facial image of the real anchor cannot be matched.
  • The execution subject of the data display method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability, for example a terminal device, a server, or another live broadcast device capable of supporting virtual live broadcast.
  • In some possible implementations, the data presentation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • The data presentation method can be applied in any virtual live broadcast scenario, such as a chat live broadcast scenario or a game live broadcast scenario, which is not specifically limited in the present disclosure.
  • Referring to Fig. 1, a flowchart of a data presentation method provided by an embodiment of the present disclosure, the method includes steps S101 to S105:
  • The head pose can characterize the angle between the plane corresponding to the real anchor's face and the horizontal plane, and/or the angle between that plane and the plane in which the lens of the video capture device lies.
  • According to the head pose, the pose of the real anchor's head relative to the video capture device of the real anchor terminal can be determined, for example a head-raised pose, a head-down pose, or a level-gaze pose, where the level-gaze pose can be understood as the state in which the real anchor looks straight ahead, with the direction of the face roughly parallel to the horizontal plane.
  • When multiple real anchors appear in the video image, the head pose of every real anchor can be detected, or only that of a designated real anchor among them; this is not specifically limited here.
  • The specified pose can be understood as the head pose of the real anchor when the real anchor's face in the video image is in an invalid display state.
  • For example, it may be the head pose when the real anchor's face remains fixed for a long time, when the face disappears from the live video picture, when only part of the face is displayed, or when the real anchor does not face the video capture device for a long time.
  • In embodiments of the present disclosure, the specified pose includes poses such as: head down, head up, head down to the lower left, head down to the lower right, head up to the upper left, and head up to the upper right; they are not listed one by one here.
  • The target special effect animation can be understood as a special effect animation matching the specified pose.
  • The special effect animations matching different specified poses may be the same or different.
  • For each specified pose, one or more matching special effect animations may be preset, each corresponding to a different special-effect triggering requirement.
  • The target special effect animation may include a model animation and, in addition, material special effects.
  • The model animation may be the animation produced when specified limbs of the virtual anchor model are driven to perform corresponding actions, for example a finger-heart gesture, a greeting gesture, or a goodbye gesture.
  • The material special effect can be a preset dynamic or static sticker effect.
  • The material special effect may match the model animation, or match the specified pose of the real anchor.
  • When the material special effect matches the model animation, it can be displayed at a specified position in the live video picture while the model animation is displayed; when switching to the next model animation, the picture can switch to the material effect corresponding to the next model action.
  • When the material special effect matches the specified pose of the real anchor, it can be displayed continuously in the live video picture once the length of time the real anchor has been in the specified pose meets the triggering requirement, until the real anchor's head is detected to no longer be in the specified pose.
  • For example, the specified pose can be the real anchor keeping the head down for a long time.
  • In this case, the target special effect animation can include both model animations and material special effects.
  • The model animations can include a "finger heart" animation and a "greeting" animation of the virtual anchor model.
  • The material special effects can be sticker effects matching the model animations, for example a "Hello" sticker and a heart sticker.
  • While the "greeting" animation is displayed in the live video picture, the "Hello" sticker can be shown at the same time; while the "finger heart" animation is displayed, the heart sticker can be shown at the same time.
  • In this way, the content displayed in the live video picture can be enriched, improving the user's live broadcast experience.
  • In some embodiments, displaying the target special effect animation in the live video picture includes the following steps:
  • After the target special effect animation is determined, it may be requested from the server; it is then displayed in the live video picture of the live broadcast device where the real anchor is located, and the video stream corresponding to the target special effect animation is pushed to the devices of the viewer terminals so that it also plays there.
  • The number of target special effect animations may be one or more.
  • When there are multiple target special effect animations, they can be set to play in a loop until the real anchor's head is detected to no longer be in the specified pose; when there is one target special effect animation, it can likewise be set to play in a loop until the real anchor's head is no longer in the specified pose.
  • In a game live broadcast scenario, the virtual anchor model and the real-time game picture can be displayed in the live video picture at the same time; for example, the real-time game picture on the left side of the live video picture and the virtual anchor model on the right.
  • When the special-effect triggering requirement is met, the target special effect animation can be determined.
  • The target special effect animation can be, for example, a special effect animation of the virtual anchor model dancing, or one in which the virtual anchor model reminds the audience: "Please wait a moment; the excitement will continue shortly."
  • In some embodiments, a database containing mapping relationships can be created in advance, storing the various special effect animations; the mapping relationships characterize the correspondence between each specified pose and special effect animations, and/or between each specified pose's special-effect triggering requirement and special effect animations.
  • After the specified pose and the triggering requirement are determined, the special effect animation having a mapping relationship with them can be looked up in the database according to the mapping relationship, and the target special effect animation determined from the result.
  • In step S101, after the start instruction of the real anchor's live broadcast is detected, collection of the live video of the real anchor begins; the live video contains multiple frames of video images.
  • After the multiple frames of video images are acquired, step S103 is performed to detect the head pose of the real anchor in each frame of video image, which, as shown in Fig. 3, includes the following steps:
  • Feature point detection can be performed on the real anchor's face in the video image through a face detection network model to obtain the feature information of the real anchor's facial feature points.
  • The feature points can be understood as the feature points of the real anchor's facial features.
  • The number of feature points can be set according to actual needs; typically, 84 facial feature points may be selected.
  • The feature information of the feature points can be understood as the number of feature points, the labels of the feature points, the classification information of each feature point (for example, an eye, mouth, or nose feature point), and the feature value corresponding to each point.
  • The number of feature points affects the accuracy of the determined head pose of the real anchor: the larger the number of feature points, the higher the accuracy of the calculated head pose, and the smaller the number, the lower the accuracy.
  • In some embodiments, the number of feature points can be dynamically adjusted according to the remaining device memory of the real anchor terminal; for example, when the remaining memory is greater than a preset threshold, a detection result with a larger number of feature points may be selected for determining the real anchor's face orientation, as sketched below.
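  • A sketch of such memory-based adjustment; psutil as the memory probe, the 2 GiB threshold, and the hypothetical 106-point denser set are assumptions for illustration (84 is the typical count mentioned above):

    import psutil  # assumed dependency for querying available memory

    def choose_landmark_count(threshold_bytes: int = 2 * 1024**3) -> int:
        """Pick a denser landmark set when the terminal has memory to spare."""
        free = psutil.virtual_memory().available
        return 106 if free > threshold_bytes else 84  # denser vs. typical set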
  • After the feature point detection result is obtained, the face orientation of the real anchor (that is, the above-mentioned second facial orientation) can be determined according to it.
  • One optional implementation is to input the feature point detection result into a neural network model and process it through the model to obtain the real anchor's face orientation (that is, the second facial orientation).
  • Another optional implementation is to judge the classification information of the feature points contained in the detection result: if the classification information indicates that not all facial feature points are included, it can be determined that the real anchor's face is turned sideways to the video capture device; if all facial feature points are included, it can be determined that the real anchor's face is frontal to the video capture device.
  • The second facial orientation characterizes the orientation of the real anchor's face relative to the video capture device; the orientation information can be understood as the angle and distance of the real anchor's face relative to the video capture device of the real anchor terminal to which the real anchor belongs.
  • The video capture device is installed on the real anchor terminal; when the angle between the horizontal direction of the real anchor's face and the X-axis of the coordinate system of the video capture device is less than or equal to a specified threshold, the real anchor's face is determined to be frontal to the video capture device.
  • When that angle is greater than the specified threshold, the real anchor's face is determined to be sideways to the video capture device.
  • The specified threshold may be set to any value between 0 and 30 degrees, which is not specifically limited here.
  • After the second facial orientation is determined, it can be used to determine whether the real anchor's face is facing the video capture device, as in the sketch below.
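  • In code, this frontal/sideways decision reduces to a single comparison (the default threshold value is illustrative, within the 0-30 degree range above):

    def is_frontal(face_x_angle_deg: float, threshold_deg: float = 30.0) -> bool:
        """Frontal if the angle between the face's horizontal direction and the
        X-axis of the capture device's coordinate system is within the threshold."""
        return abs(face_x_angle_deg) <= threshold_deg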
  • If the face of the real anchor is frontal to the video capture device, the head pose of the real anchor is determined by threshold comparison.
  • Threshold comparison can be understood as determining whether the real anchor's head pose is the specified pose by comparing the change information of the head pose with preset thresholds.
  • If the face of the real anchor is not frontal to the video capture device, the head pose of the real anchor is determined through the neural network model.
  • Through the feature point detection result, the orientation of the real anchor's face relative to the video capture device can be determined, for example frontal to the device or turned to the side. Since a complete facial image cannot be collected when the real anchor's face is turned sideways, the accuracy of the head pose would suffer in that case.
  • By determining the head pose separately for the frontal and non-frontal cases, the accuracy of the real anchor's head pose can be improved.
  • In some embodiments, when the face of the real anchor is frontal to the video capture device, step S103, detecting the head pose of the real anchor in each frame of video image, includes the following steps:
  • First, the historical facial orientations can be obtained, where the historical facial orientations are the facial orientations of the real anchor determined from the video images collected at multiple historical moments before the current moment; they represent the historical angles between the plane of the real anchor's face and the horizontal plane at each historical moment.
  • Then, the historical facial orientations and the first facial orientation determined at the current moment can be combined to determine the change information of the real anchor's head pose; that is, the change information of the first facial orientation is determined from the historical angles and the current angle between the face plane and the horizontal plane.
  • Here, the first facial orientation represents the degree of inclination of the real anchor's face relative to the imaging plane of the video capture device.
  • The first facial orientation may be the angle between the real anchor's face and the horizontal plane, or the angle between the real anchor's face and the imaging plane of the video capture device; other included angles that characterize the degree of inclination may also be used.
  • The change information can be understood as trend information such as the first facial orientation gradually increasing, together with the magnitude of the increase, or gradually decreasing, together with the magnitude of the decrease.
  • The historical facial orientations are determined from the video images corresponding to multiple consecutive historical moments. For example, if the current moment is moment k, the historical moments can be moments k-n through k-1, and the historical facial orientations are those of the real anchor determined from the video images collected at moments k-n through k-1.
  • In some embodiments, when determining the head pose of the real anchor in each frame of video image from the change information, the change information can be compared with threshold transition intervals, where the threshold transition intervals are multiple transition intervals determined from multiple thresholds.
  • The change process of the real anchor's head pose can be determined through the threshold transition intervals, and the head pose at the current moment then determined from that change process, as sketched below.
  • Compared with determining the head pose from a single video image, the technical solution provided by the present disclosure can improve the accuracy of the head pose and obtain more accurate pose results.
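  • A sketch of accumulating the change information over a sliding window of historical orientations; the window length and the use of a simple first-to-last difference as the trend measure are assumptions:

    from collections import deque

    class OrientationHistory:
        """Track the first facial orientation over moments k-n .. k and report
        how it is changing."""
        def __init__(self, window: int = 10):
            self.angles = deque(maxlen=window)  # historical face/horizontal angles

        def update(self, angle_deg: float) -> dict:
            self.angles.append(angle_deg)
            if len(self.angles) < 2:
                return {"trend": "unknown", "delta": 0.0}
            delta = self.angles[-1] - self.angles[0]   # net change over the window
            trend = ("increasing" if delta > 0
                     else "decreasing" if delta < 0 else "flat")
            return {"trend": trend, "delta": abs(delta)}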
  • In some embodiments, step S13, determining the head pose of the real anchor in each frame of video image based on the change information, can be implemented by executing S13-1 or S13-2, as follows:
  • The first threshold may be set according to the angle range of the first facial orientation defined for the specified pose in the actual live broadcast scene.
  • For example, the first threshold may be set to any value in [27, 33], such as 30.
  • The embodiment of the present disclosure does not limit the specific value of the first threshold.
  • If the first facial orientation has not increased beyond the first threshold, head pose detection may simply continue on the collected video images.
  • After it is determined that the head pose of the real anchor has changed to the specified pose (for example, a head-down or head-up pose), a pose adjustment prompt can be sent to the real anchor, prompting the real anchor to adjust the head pose at the current moment.
  • The threshold A1 may be one of multiple thresholds greater than the first threshold; for example, A1 may be selected as 50 degrees, 60 degrees, 70 degrees, and so on. A1 can take multiple arbitrary values in [30, 90], which is not specifically limited in the present disclosure.
  • In some embodiments, the first threshold can be set according to the angle range of the first facial orientation defined for the specified pose in the actual live broadcast scene, and the second threshold according to the angle range of the first facial orientation defined for the non-specified pose.
  • In implementation, the first threshold can be set to any value in [27, 33], for example 30; the second threshold can be set to any value in [17, 23], for example 20.
  • For example, real anchor M broadcasts on the live platform through the real anchor terminal. After anchor M opens the live broadcast room, video images begin to be collected, and the head pose of the real anchor is determined in the manner described above.
  • Suppose the target angle between the real anchor's face and the imaging plane of the video capture device is alpha, and the change information shows alpha gradually increasing. While alpha increases from 0 to more than 20 degrees but less than 30 degrees, it is determined that the real anchor is not bowing or raising the head; once alpha increases beyond 30 degrees, it is determined that the real anchor is bowing or raising the head. Conversely, when alpha gradually decreases from an angle greater than 30 degrees into the interval between 20 and 30 degrees, it is determined that the real anchor is still bowing or raising the head; only when alpha continues to decrease below 20 degrees is it determined that the real anchor is not bowing or raising the head.
  • In the related single-threshold approach, a threshold is preset and the angle between the real anchor's face orientation and the horizontal plane is compared with it to determine whether the real anchor is in the specified pose.
  • However, single-threshold detection may misrecognize the real anchor's specified pose and thus trigger the corresponding special effect animation by mistake, giving the real anchor and the audience a bad live broadcast experience.
  • In the technical solution of the present disclosure, by comparing the change information of the target angle with the first threshold and the second threshold, the head pose of the real anchor is determined through multi-threshold comparison, which improves the accuracy of the real anchor's head pose and prevents the frequent head-pose flip-flops brought by a single-threshold solution, as sketched below.
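  • A sketch of this two-threshold (hysteresis) decision, using the 30/20-degree values from the example above:

    FIRST_THRESHOLD = 30.0   # degrees; enter the specified pose
    SECOND_THRESHOLD = 20.0  # degrees; leave the specified pose (must be < first)

    class PoseHysteresis:
        """Two-threshold decision matching the alpha example: enter the
        specified pose above 30 degrees, leave it only below 20 degrees."""
        def __init__(self):
            self.in_specified_pose = False

        def update(self, alpha: float) -> bool:
            if not self.in_specified_pose and alpha > FIRST_THRESHOLD:
                self.in_specified_pose = True       # non-specified -> specified
            elif self.in_specified_pose and alpha < SECOND_THRESHOLD:
                self.in_specified_pose = False      # specified -> non-specified
            # Between 20 and 30 degrees the previous state is kept, which is
            # what suppresses the flip-flops a single threshold would cause.
            return self.in_specified_pose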
  • In some embodiments, when the face of the real anchor is not frontal to the video capture device, step S103, detecting the head pose of the real anchor in each frame of video image, can be implemented by executing S21-S22, as follows:
  • When it is detected that the face of the real anchor is not facing the video capture device, the live video picture can be input into the deep learning model and processed by it to obtain the head pose of the real anchor.
  • Before the live video picture is input into the deep learning model, the model needs to be trained. Specifically, images of multiple real anchors at various angles relative to the video capture device can be collected and input into the deep learning model for training; the live video picture is then processed through the trained model to obtain the head pose of the real anchor.
  • The output data of the deep learning model can be a vector indicating at least one of the following information: whether the head is in a specified pose, the pose type of the specified pose (for example, a head-down pose or a head-up pose), the estimated angle between the real anchor's face orientation and the horizontal plane, and the orientation of the real anchor's face relative to the video capture device. A sketch of decoding such an output vector follows.
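  • The disclosure names the fields of the output vector but not their order or encoding, so the layout assumed in this sketch is hypothetical:

    import numpy as np

    POSE_TYPES = ["head_down", "head_up"]  # illustrative pose-type labels

    def decode_head_pose(output: np.ndarray) -> dict:
        """Decode a model output vector under an assumed field layout."""
        return {
            "is_specified_pose": bool(output[0] > 0.5),       # binary score
            "pose_type": POSE_TYPES[int(np.argmax(output[1:3]))],
            "face_horizontal_angle_deg": float(output[3]),    # estimated angle
            "facing_camera_frontally": bool(output[4] > 0.5), # orientation info
        }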
  • When it is determined from the head pose that the real anchor's head is in the specified pose and the triggering requirement is met, the live picture displays the target special effect animation.
  • In some embodiments, prompt information can also be generated for the real anchor, prompting the real anchor to move the video capture device so that the real anchor's face can be frontal to it.
  • For example, when the video capture device is set separately from the real anchor terminal and placed on the left side of the terminal, the live video image collected by the device contains the left side of the real anchor's face.
  • In the above embodiment, when the face of the real anchor is not frontal to the video capture device, pose estimation is performed on the live video picture through the deep learning model, which can improve the estimation accuracy of the real anchor's head pose.
  • In some embodiments, S21 can be implemented by executing S21-1 to S21-2, as shown in Fig. 10, as follows:
  • S21-1: Acquire a target reference image frame, where the target reference image frame includes at least one of the following image frames: the N image frames preceding the live video picture in the video sequence to which it belongs, and the first M image frames of that video sequence, N and M being positive integers greater than zero;
  • S21-2: Process the live video picture and the target reference image frame through the deep learning model to obtain the head pose of the real anchor.
  • In order to further improve the accuracy of the real anchor's head pose, the electronic device can determine the head pose at the current moment by combining, through the deep learning model, the timing information of the video sequence during the real anchor's live broadcast.
  • In implementation, the N image frames preceding the live video picture at the current moment can be determined; the N image frames, the output data corresponding to each of them, and the live video picture collected at the current moment are then input into the deep learning model for processing to obtain the head pose of the real anchor.
  • In general, the head poses of the real anchor in adjacent live video pictures of the video sequence tend to be the same pose.
  • Therefore, the head pose of the real anchor in the current live video picture can be predicted by combining the timing information in the video sequence: the head pose determined from the N image frames serves as guidance information for the live video picture to be processed at the current moment, guiding the deep learning model's prediction and yielding more accurate head pose detection results.
  • In implementation, the first M image frames of the video sequence may also be determined; the M image frames, the output data corresponding to each of them, and the live video picture collected at the current moment are input into the deep learning model for processing to obtain the head pose of the real anchor.
  • In general, at the start of a live broadcast the real anchor faces the video capture device in order to debug the equipment. Therefore, when predicting the live video picture to be processed at the current moment, the M image frames, their corresponding output data, and the current picture can be input into the deep learning model together for processing.
  • Since the M image frames can be understood as frames collected while the real anchor's face is frontal to the video capture device, they may contain the real anchor's complete face.
  • The deep learning model can then compare the real anchor in the current picture with the real anchor in the M image frames, which guides the model in predicting the head pose in the current picture and yields more accurate detection results.
  • In implementation, both the N image frames preceding the current picture and the first M image frames of the video sequence may be determined; the N and M image frames, the output data corresponding to each frame, and the live video picture collected at the current moment are then input into the deep learning model together to obtain the real anchor's head pose, as sketched below.
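  • A sketch of this reference-frame-guided inference; `model` is a stand-in callable, and the values of N and M and the way references are packed are assumptions:

    from collections import deque
    from typing import Callable, List

    class ReferenceGuidedEstimator:
        """Feed the current picture together with the first M frames of the
        session and the N most recent frames (plus their earlier outputs)."""
        def __init__(self, model: Callable, n_recent: int = 4, m_first: int = 4):
            self.model = model
            self.m_first: List = []                  # first M frames + outputs
            self.recent = deque(maxlen=n_recent)     # last N frames + outputs
            self.m_limit = m_first

        def estimate(self, frame) -> object:
            refs = self.m_first + list(self.recent)  # target reference image frames
            pose = self.model(frame, refs)           # guided prediction
            if len(self.m_first) < self.m_limit:
                self.m_first.append((frame, pose))   # frames from the session start
            self.recent.append((frame, pose))
            return pose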
  • After the head pose of the real anchor in each video image is detected in the manner described above, the length of time the real anchor's head has been in the specified pose can be determined from the head poses corresponding to the multiple frames of video images.
  • When that length of time meets the special-effect triggering requirement, the target special effect animation is displayed in the live video picture.
  • In some embodiments, the target special effect animation can also be displayed when the specified pose meets at least one additional special-effect triggering requirement, for example: the position of the head in the video image meets the triggering requirement.
  • In this way, the display modes of the special effect animation can be enriched, providing a richer interactive experience for real anchors and audiences.
  • In some embodiments, step S105, displaying the target special effect animation in the live video picture, includes the following steps:
  • In embodiments of the present disclosure, different special effect animations are set for head poses of different pose types. After the pose type of the head pose is determined, the model animation and/or material special effects matching the pose type can be looked up in a data table, and the found model animation and/or material special effects used as the target special effect animation that drives the virtual anchor model.
  • The target special effect animation can be one special effect animation or multiple special effect animations.
  • When there is one target special effect animation, it can be played cyclically in the video sequence corresponding to the live video picture.
  • When there are multiple target special effect animations, each can be played sequentially in the video sequence corresponding to the live video picture.
  • When the material special effect matches the model animation, it can be played in the live video picture in sequence, following its corresponding model animation.
  • When the material special effect matches the specified pose, it can be played in a loop in the live video picture without following the model animation, as in the sketch below.
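  • A sketch of the lookup-and-playback logic; the table contents and effect names are illustrative, and rendering is replaced by a print:

    import itertools
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class TargetEffect:
        model_animations: List[str]       # e.g. ["finger_heart", "greeting"]
        paired_stickers: List[str]        # follow their model animation in sequence
        pose_stickers: List[str] = field(default_factory=list)  # loop independently

    # Illustrative table; in the described system this lives in a data table.
    EFFECT_TABLE = {
        "head_down": TargetEffect(["finger_heart", "greeting"],
                                  ["heart_sticker", "hello_sticker"]),
    }

    def play(effect: TargetEffect, still_in_pose: Callable[[], bool]) -> None:
        # Animation-matched stickers are shown in step with their model
        # animation, cycling for as long as the specified pose holds.
        for anim, sticker in itertools.cycle(zip(effect.model_animations,
                                                 effect.paired_stickers)):
            if not still_in_pose():
                break
            print(f"playing {anim} with sticker {sticker}")  # stand-in for rendering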
  • In some embodiments, displaying the target special effect animation in the live video picture in step S105 or S1052 may also include the following steps, as shown in Fig. 12:
  • S33: Determine the special effect animation matching the type information, use the matching special effect animation as the target special effect animation displayed with the virtual anchor model, and send the target special effect animation to the viewer terminal so that it is displayed there.
  • In implementation, the type information of each viewer can be determined, and the type information may include at least one of the following: gender, age, region, occupation, hobby, and rating.
  • After the type information is determined, the special effect animation matching it can be looked up in the database and used as the target special effect animation; the target special effect animation is then sent to the viewer terminal and played in the live video picture displayed there, as sketched below.
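  • A sketch of selecting an effect by viewer type information; the keys, effect names, and fallback are hypothetical:

    # Illustrative mapping from viewer type information to a special effect.
    VIEWER_EFFECTS = {
        ("gender", "female"): "sparkle_greeting",
        ("age", "teen"): "cartoon_wave",
    }
    DEFAULT_EFFECT = "please_wait_notice"  # e.g. "connection in progress, don't leave"

    def effect_for_viewer(viewer_info: dict) -> str:
        """Pick the first matching effect for this viewer's type information."""
        for key, value in viewer_info.items():
            effect = VIEWER_EFFECTS.get((key, value))
            if effect:
                return effect              # send this to the viewer terminal
        return DEFAULT_EFFECT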
  • In a specific implementation, the real anchor may keep the head down for a long time during the live broadcast.
  • In this case, the facial expression of the real anchor cannot be captured, so the virtual anchor model cannot be displayed normally in the live video picture.
  • If a viewer enters the live broadcast room and sees a virtual anchor model that cannot be displayed normally, the viewing experience suffers, and the viewer may leave the room.
  • With the data display method of the embodiments of the present application, a corresponding special effect animation can be displayed for the audience, for example: "The real anchor is handling a connection; please don't leave." This increases the probability that viewers continue watching the live broadcast, reduces viewer loss, and, while maintaining the popularity of the real anchor's live broadcast, also adds interactive fun.
  • Those skilled in the art can understand that, in the above methods of the specific implementations, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • Based on the same inventive concept, an embodiment of the present disclosure also provides a data display apparatus corresponding to the data display method. Since the problem-solving principle of the apparatus is similar to that of the above data display method, the implementation of the apparatus can refer to the implementation of the method, and repeated descriptions are omitted.
  • Referring to Fig. 13, a schematic diagram of a data display apparatus provided by an embodiment of the present disclosure, the apparatus includes: an acquisition part 51, a detection part 52, and a special-effect adding part 53, wherein:
  • the acquisition part 51 is configured to acquire multiple frames of video images of the real anchor during the live broadcast;
  • the detection part 52 is configured to detect the head pose of the real anchor in each frame of video image;
  • the special-effect adding part 53 is configured to display the target special effect animation in the live video picture when it is determined, from the head poses corresponding to the multiple frames of video images, that the length of time the real anchor's head has been in a specified pose meets the special-effect triggering requirement; the live video picture displays a virtual anchor model driven by the real anchor.
  • By displaying the virtual anchor model in the live video picture, the interest and interactivity of the live broadcast can be enhanced.
  • In addition, by displaying the target special effect animation that drives the virtual anchor model in the live video picture, the head of the virtual anchor model can be kept in a stable playback state while the display content of the picture is enriched, so that the live picture is no longer monotonous; this solves the problem of abnormal display of the virtual anchor model that arises in traditional live broadcast scenes when the facial image of the real anchor cannot be matched.
  • the detection part 52 is further configured to: determine the first face orientation of the real anchor at the current moment when it is determined that the face of the real anchor faces the video capture device;
  • the first facial orientation determines the change information of the head posture of the real anchor; the change information is used to characterize the change information of the first facial orientation; the video image of each frame is determined based on the change information The head pose of the real anchor in .
  • the detection part 52 is further configured to: determine that the head posture of the real anchor changes from a non-specified posture to the specified posture when it is determined, according to the change information, that the first facial orientation increases to exceed a first threshold.
  • the detection part 52 is further configured to: determine that the head posture of the real anchor changes from the specified posture to a non-specified posture when it is determined, according to the change information, that the first facial orientation drops from exceeding the first threshold to below a second threshold, where the second threshold is smaller than the first threshold.
  • the detection part 52 is further configured to: when it is determined that the face of the real anchor is not facing the video capture device frontally, process the live video picture through a deep learning model to obtain the head posture of the real anchor, and determine whether the head of the real anchor is in the specified posture according to the head posture.
  • the detection part 52 is further configured to: acquire a target reference image frame, where the target reference image frame includes at least one of the following image frames: the N image frames preceding the live video picture in the video sequence to which the live video picture belongs, and the first M image frames in the video sequence to which the live video picture belongs, N and M being positive integers greater than zero; and process the live video picture and the target reference image frame through the deep learning model to obtain the head posture of the real anchor.
  • the detection part 52 is further configured to: perform feature point detection on the face of the real anchor in the video image to obtain a feature point detection result, where the feature point detection result is used to characterize the feature information of the facial feature points of the real anchor; determine a second facial orientation of the real anchor according to the feature point detection result, where the second facial orientation is used to characterize the orientation information of the face of the real anchor relative to the video capture device; and determine the head posture of the real anchor according to the second facial orientation.
  • the special effect adding part 53 is further configured to: determine the posture type of the head posture; determine the special effect animation matching the posture type, use the matching special effect animation as the target special effect animation displayed by driving the virtual anchor model, and display the target special effect animation in the live video picture.
  • the special effect adding part 53 is further configured to: determine the type information of each viewer watching the live broadcast of the virtual anchor model driven by the real anchor; determine the special effect animation matching the type information, use the matching special effect animation as the target special effect animation displayed by driving the virtual anchor model, and send the target special effect animation to the viewer-side terminal so that the target special effect animation is displayed on the viewer-side terminal.
  • the embodiment of the present disclosure also provides an electronic device 600; as shown in FIG. 14, which is a schematic structural diagram of the electronic device 600 provided in the embodiment of the present disclosure, the electronic device includes:
  • a processor 61, a memory 62, and a bus 63. The memory 62 is configured to store execution instructions and includes an internal memory 621 and an external memory 622; the internal memory 621 is configured to temporarily store computation data for the processor 61 and the data exchanged with the external memory 622 (such as a hard disk), and the processor 61 exchanges data with the external memory 622 through the internal memory 621. When the electronic device 600 runs, the processor 61 communicates with the memory 62 through the bus 63, so that the processor 61 executes the following instructions:
  • acquire multiple frames of video images of the real anchor during the live broadcast; detect the head posture of the real anchor in each frame of the video images; and when it is determined, according to the head postures corresponding to the multiple frames of video images, that the length of time the head of the real anchor has been in the specified posture meets the special effect triggering requirement, display the target special effect animation in the live video picture;
  • the live video picture shows a virtual anchor model driven by the real anchor.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the data display method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Embodiments of the present disclosure also provide a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the data display method described in the foregoing method embodiments. For details, reference may be made to the foregoing method embodiments, which will not be repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK) and so on.
  • the parts described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional part in each embodiment of the present disclosure may be integrated into one processing unit, each part may exist separately physically, or two or more parts may be integrated into one part.
  • if the functions are implemented in the form of software functional parts and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor.
  • in essence, the technical solution of the present disclosure, or the part thereof that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the method provided by the disclosed technical solution can improve the accuracy of the head posture, so as to obtain more accurate posture results.
  • the orientation information of the real anchor relative to the video capture device can be determined, for example, whether the real anchor is facing the video capture device frontally or is sideways to the video capture device.
  • since a complete facial image cannot be captured when the real anchor is sideways to the video capture device, the accuracy of the real anchor's head posture is affected in that case.
  • the accuracy of the real anchor's head posture can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Provided in embodiments of the present disclosure are a data display method, an apparatus, an electronic device, a computer program, and a computer-readable storage medium. The method comprises: acquiring multiple frames of video images of a real anchor during a live broadcast; detecting the head posture of the real anchor in each frame of the video images; and when it is determined, according to the head postures corresponding to the multiple frames of video images, that the length of time the head of the real anchor has been in a specified posture satisfies a special effect triggering requirement, displaying a target special effect animation in the live video picture, wherein a virtual anchor model driven by the real anchor is displayed in the live video picture.

Description

Data display method, apparatus, electronic device, computer program, and computer-readable storage medium
Cross-Reference to Related Applications
The present disclosure is based on and claims priority to Chinese patent application No. 202110728854.1, filed on June 29, 2021 and entitled "Data display method, apparatus, electronic device, and computer-readable storage medium", the entire content of which is hereby incorporated into the present application by reference.
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to a data display method, apparatus, electronic device, computer program, and computer-readable storage medium.
Background
With the development of network technology, real-time video communication such as webcasting has become an increasingly popular form of entertainment. During a live broadcast, the anchor is generally required to face the display screen of the anchor-side terminal, so as to enhance the interaction between the anchor and the audience. In some special cases, when the anchor's face disappears from the display screen, this not only affects the display of the animation special effects added for the anchor, but also degrades the experience of the audience watching the live video. In addition, when viewers leave the live broadcast room, the anchor's live broadcast experience and the popularity of the live broadcast are indirectly affected.
Summary
Embodiments of the present disclosure provide at least a data display method, apparatus, electronic device, computer program, and computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data display method, including: acquiring multiple frames of video images of a real anchor during a live broadcast; detecting the head posture of the real anchor in each frame of the video images; and when it is determined, according to the head postures corresponding to the multiple frames of video images, that the length of time the head of the real anchor has been in a specified posture meets a special effect triggering requirement, displaying a target special effect animation in the live video picture, where the live video picture shows a virtual anchor model driven by the real anchor.
In the field of virtual live broadcasting, when the head of the real anchor is detected to have been in a specified posture for a long time, the head of the virtual anchor model displayed in the live video picture may jitter, which degrades both the anchor's live broadcast experience and the viewing experience. In the technical solution of the present disclosure, displaying a virtual anchor model in the live video picture enhances the interest and interactivity of the live broadcast. Further, when it is determined that the length of time the head of the real anchor has been in the specified posture meets the special effect triggering requirement, the target special effect animation corresponding to driving the virtual anchor model is displayed in the live video picture. This keeps the head of the virtual anchor model in a stable playback state and enriches the display content of the live video picture, so that the picture is no longer monotonous, thereby solving the problem in traditional live broadcast scenes that the virtual anchor model is displayed abnormally when the facial picture of the real anchor cannot be matched.
In an optional implementation, detecting the head posture of the real anchor in each frame of the video images includes: when it is determined that the face of the real anchor is facing the video capture device frontally, determining a first facial orientation of the real anchor at the current moment; determining change information of the head posture of the real anchor according to the first facial orientation, where the change information is used to characterize the change of the first facial orientation; and determining the head posture of the real anchor in each frame of the video images based on the change information.
In the above implementation, the change information of the head posture of the real anchor is determined according to the first facial orientation of the real anchor at the current moment, and the head posture is then determined from that change information. This allows the temporal information in the video sequence (that is, adjacent video images) to be used to analyze how the head posture of the real anchor changes. Compared with determining the head posture from a single video frame, the method provided by the technical solution of the present disclosure improves the accuracy of the head posture and yields more accurate posture results.
In an optional implementation, determining the head posture of the real anchor in each frame of the video images based on the change information includes: when it is determined according to the change information that the first facial orientation increases to exceed a first threshold, determining that the head posture of the real anchor changes from a non-specified posture to the specified posture.
In an optional implementation, determining the head posture of the real anchor in each frame of the video images based on the change information includes: when it is determined according to the change information that the first facial orientation drops from exceeding the first threshold to below a second threshold, determining that the head posture of the real anchor changes from the specified posture to a non-specified posture, where the second threshold is smaller than the first threshold.
In the above implementation, by comparing the change information of the target angle with the first threshold and the second threshold, the head posture of the real anchor is determined through multi-threshold comparison. This improves the accuracy of the head posture of the real anchor and prevents the frequent changes of the head posture that a single-threshold solution would cause.
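The two-threshold logic above can be pictured with a short sketch. The following Python fragment is illustrative only: the angle variable, the threshold values, and the state representation are assumptions made for the example, not values prescribed by the disclosure.

```python
class HeadPostureState:
    """Two-threshold (hysteresis) tracking of the specified posture.
    Entering the specified posture requires the facial orientation to
    exceed the first threshold; leaving it requires the orientation to
    drop below the smaller second threshold, so jitter around a single
    threshold cannot flip the state back and forth."""

    def __init__(self, first_threshold=30.0, second_threshold=20.0):
        assert second_threshold < first_threshold
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.in_specified_posture = False

    def update(self, facial_orientation):
        """facial_orientation: the first facial orientation for the
        current frame, expressed here as an angle in degrees."""
        if not self.in_specified_posture and facial_orientation > self.first_threshold:
            self.in_specified_posture = True   # non-specified -> specified
        elif self.in_specified_posture and facial_orientation < self.second_threshold:
            self.in_specified_posture = False  # specified -> non-specified
        return self.in_specified_posture
```

With these example values, an orientation sequence such as 29, 31, 28, 31 would flip a single-threshold detector at every frame, whereas the state above changes only once, which is the frequent-change problem the two-threshold comparison is described as preventing.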
In an optional implementation, detecting the head posture of the real anchor in each frame of the video images includes: when it is determined that the face of the real anchor is not facing the video capture device frontally, processing the live video picture through a deep learning model to obtain the head posture of the real anchor, and determining, according to the head posture, whether the head of the real anchor is in the specified posture.
In the above implementation, when the face of the real anchor is turned sideways to the video capture device, the complete facial feature points cannot be shown in the live video picture. Since incomplete facial feature points affect the result of determining the head posture, performing posture estimation on the live video picture through a deep learning model to obtain the head posture of the real anchor improves the estimation accuracy of the head posture.
In an optional implementation, processing the live video picture through the deep learning model to obtain the head posture of the real anchor includes: acquiring a target reference image frame, where the target reference image frame includes at least one of the following image frames: the N image frames preceding the live video picture in the video sequence to which the live video picture belongs, and the first M image frames in the video sequence to which the live video picture belongs, N and M being positive integers greater than zero; and processing the live video picture and the target reference image frame through the deep learning model to obtain the head posture of the real anchor.
In the above implementation, the temporal information in the video sequence is combined to predict the head posture of the real anchor in the live video picture at the current moment. The head posture determined from the N image frames (or the M image frames) serves as guidance information for the live video picture to be processed at the current moment, guiding the deep learning model to predict the head posture of the real anchor in the current picture and thus obtain a more accurate head posture detection result.
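As a rough sketch of how such reference frames might be assembled into a model input, consider the following Python fragment. The function name, the defaults for N and M, and the plain-list representation of frames are assumptions made for illustration, and the deep learning model itself is deliberately left abstract.

```python
def gather_model_input(sequence, n=4, m=2):
    """Collects the current live frame plus its target reference frames:
    the N frames immediately preceding it in the video sequence and the
    first M frames of that sequence (N, M > 0).

    `sequence` holds the frames seen so far, oldest first; the current
    live frame is sequence[-1]."""
    current = sequence[-1]
    preceding_n = sequence[max(0, len(sequence) - 1 - n):-1]
    first_m = sequence[:m]
    # The reference frames carry the temporal guidance information; here
    # they are simply concatenated ahead of the current frame before being
    # handed to whatever pose-estimation network is in use.
    return first_m + preceding_n + [current]
```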
In an optional implementation, detecting the head posture of the real anchor in each frame of the video images includes: performing feature point detection on the face of the real anchor in the video image to obtain a feature point detection result, where the feature point detection result is used to characterize feature information of the facial feature points of the real anchor; determining a second facial orientation of the real anchor according to the feature point detection result, where the second facial orientation is used to characterize the orientation information of the face of the real anchor relative to the video capture device; and determining the head posture of the real anchor according to the second facial orientation.
In the above implementation, the second facial orientation of the real anchor is determined according to the feature point detection result of the face of the real anchor in the video image, so the orientation information of the real anchor relative to the video capture device can be determined; for example, the real anchor faces the video capture device frontally, or the real anchor is turned sideways to the video capture device. When the real anchor is sideways to the video capture device, a complete facial image cannot be captured, which affects the accuracy of the head posture. Determining the head posture separately for the frontal and non-frontal cases improves the accuracy of the head posture of the real anchor.
In an optional implementation, displaying the target special effect animation in the live video picture includes: determining the posture type of the head posture; determining a special effect animation matching the posture type, using the matching special effect animation as the target special effect animation displayed by driving the virtual anchor model, and displaying the target special effect animation in the live video picture.
In the above implementation, triggering different types of special effect animations according to different posture types of the head posture enriches the display content of the special effect animations, thereby making the live broadcast more interesting and providing users with a better live broadcast experience.
In an optional implementation, displaying the target special effect animation in the live video picture includes: determining type information of each viewer watching the live broadcast of the virtual anchor model driven by the real anchor; determining a special effect animation matching the type information, using the matching special effect animation as the target special effect animation displayed by driving the virtual anchor model, and sending the target special effect animation to the viewer-side terminal so that the target special effect animation is displayed on the viewer-side terminal.
In the above implementation, a matching target special effect animation is determined according to the type information of each viewer and displayed on the viewer-side terminal. This increases the probability that viewers continue to watch the live broadcast and reduces viewer loss, maintaining the popularity of the real anchor's live broadcast while adding interactive fun.
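A minimal sketch of this per-viewer matching is given below. The viewer type labels, the animation identifiers, and the `send_to_terminal` callback are all hypothetical names introduced for the example, since the disclosure does not fix a concrete viewer taxonomy or transport mechanism.

```python
# Hypothetical mapping from viewer type information to an animation id.
ANIMATION_BY_VIEWER_TYPE = {
    "new_viewer": "anim_welcome",
    "regular_viewer": "anim_please_stay",
    "subscriber": "anim_exclusive_dance",
}

def dispatch_target_animations(viewers, send_to_terminal):
    """Picks the special effect animation matching each viewer's type and
    pushes it to that viewer's terminal for display."""
    for viewer in viewers:
        animation = ANIMATION_BY_VIEWER_TYPE.get(viewer["type"], "anim_default")
        send_to_terminal(viewer["id"], animation)
```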
In a second aspect, an embodiment of the present disclosure provides a data display apparatus, including: an acquisition part configured to acquire multiple frames of video images of a real anchor during a live broadcast; a detection part configured to detect the head posture of the real anchor in each frame of the video images; and a special effect adding part configured to display a target special effect animation in the live video picture when it is determined, according to the head postures corresponding to the multiple frames of video images, that the length of time the head of the real anchor has been in a specified posture meets a special effect triggering requirement, where the live video picture shows a virtual anchor model driven by the real anchor.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
An embodiment of the present disclosure provides a computer program including computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device implements the above method upon executing the code.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The accompanying drawings here are incorporated into and constitute a part of this specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a first flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of the effect of a live video picture on the real anchor side provided by an embodiment of the present disclosure;
FIG. 3 shows a second flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 4 shows a first schematic diagram of the orientation information between a real anchor and a video capture device provided by an embodiment of the present disclosure;
FIG. 5 shows a second schematic diagram of the orientation information between a real anchor and a video capture device provided by an embodiment of the present disclosure;
FIG. 6 shows a third flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 7 shows a fourth flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 8 shows a fifth flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 9 shows a third schematic diagram of the orientation information between a real anchor and a video capture device provided by an embodiment of the present disclosure;
FIG. 10 shows a sixth flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 11 shows a seventh flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 12 shows an eighth flowchart of a data display method provided by an embodiment of the present disclosure;
FIG. 13 shows a schematic diagram of a data display apparatus provided by an embodiment of the present disclosure;
FIG. 14 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition or explanation in subsequent figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that, during a live broadcast, the anchor is generally required to face the display screen of the anchor-side terminal so as to enhance the interaction between the anchor and the audience. In some special cases, when the anchor's face disappears from the display screen, this not only affects the display of the animation special effects added for the anchor, but also degrades the experience of the audience watching the live video. In addition, when viewers leave the live broadcast room, the anchor's live broadcast experience and the popularity of the live broadcast are indirectly affected.
Based on the above research, the present disclosure provides a data display method. The technical solution provided by the present disclosure can be applied to virtual live broadcast scenes. A virtual live broadcast scene can be understood as one in which a preset virtual anchor model, such as a red panda, a little rabbit, or a cartoon character, is used in place of the actual image of the real anchor for the live broadcast; in this case, what is shown in the live video picture is the virtual anchor model. The virtual anchor model can also be used for interaction between the real anchor and the audience.
For example, the camera of the live broadcast device may capture video images containing the real anchor; the electronic device then captures the head of the real anchor contained in the video images to obtain the head posture of the real anchor. After determining the head posture, the electronic device can generate a corresponding driving signal, which is used to drive the virtual anchor model in the live video picture to perform an action corresponding to that of the real anchor, and the picture of the virtual anchor model performing the action is shown in the live video picture.
In an optional implementation, the real anchor may preset one or more virtual anchor models through the electronic device; for example, "character YYY in game XXX" may be preset as a virtual anchor model. In this way, when the virtual live broadcast at the current moment is started, one of the preset virtual anchor models can be selected as the virtual anchor model for the current moment. The virtual anchor model may be a 2D model or a 3D model.
In another optional implementation, in addition to determining the virtual anchor model for the real anchor in the manner described above, the electronic device may also, after acquiring the multiple frames of video images, rebuild a virtual anchor model for the real anchor in the video images.
For example, the electronic device may recognize the real anchor contained in the video images, and rebuild the virtual anchor model for the real anchor according to the recognition result. The recognition result may include at least one of the following: the gender of the real anchor, the appearance features of the real anchor, the clothing features of the real anchor, and the like.
Based on the recognition result, the electronic device may search the virtual anchor model library for a model matching the recognition result as the virtual anchor model of the real anchor. For example, if it is determined from the recognition result that the real anchor wears a peaked cap and hip-hop-style clothes during the live broadcast, the electronic device may search the virtual anchor model library for a virtual anchor model matching "peaked cap" or "hip-hop style" as the virtual anchor model of the real anchor.
In addition to searching the virtual anchor model library for a model matching the recognition result, the electronic device may also construct a corresponding virtual anchor model for the real anchor in real time through a model building module based on the recognition result.
Here, when constructing the virtual anchor model in real time, the electronic device may also use the virtual anchor models used in virtual live broadcasts initiated by the real anchor in the past as a reference to construct the virtual anchor model driven by the real anchor at the current moment.
Through the manner of determining the virtual anchor model described above, a corresponding virtual anchor model can be personalized for the real anchor, avoiding stereotyped virtual anchor models. At the same time, a personalized virtual anchor model can leave a deeper impression on the audience.
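One way to picture the library search described above is a simple tag-overlap match, sketched below. The library entries and tag names are invented for the example, and a production system would presumably match on richer attributes than flat tags.

```python
# Illustrative virtual anchor model library; each entry carries tags that
# can be compared against the recognition result of the real anchor.
MODEL_LIBRARY = [
    {"name": "red_panda", "tags": {"cute", "animal"}},
    {"name": "hiphop_character", "tags": {"peaked_cap", "hip-hop"}},
]

def match_virtual_model(recognition_tags, library=MODEL_LIBRARY):
    """Returns the library model sharing the most tags with the recognition
    result (gender, appearance, clothing features, ...); returns None when
    nothing overlaps, so the caller can build a model from scratch instead."""
    best = max(library, key=lambda model: len(model["tags"] & recognition_tags))
    return best if best["tags"] & recognition_tags else None
```

For instance, `match_virtual_model({"peaked_cap", "hip-hop"})` would pick the hip-hop character, while an empty overlap would fall through to real-time model construction.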
For the viewer side, what is shown in the live viewing interface on the viewer side is the animation of the virtual anchor model performing the corresponding actions. For the broadcast side, the live video picture on the broadcast side may show the virtual anchor model and may also show the video image containing the real anchor; for example, as shown in FIG. 2, the virtual anchor model may be shown on the left side of the live video picture, and the video image of the real anchor may be shown at position 21 in the lower right corner of the live video picture.
In the embodiments of the present disclosure, the target special effect animation contains multiple animation frames. When driving the virtual anchor model to perform a specified action, the electronic device can generate multiple animation frames and obtain the target special effect animation by combining them.
In the embodiments of the present disclosure, by displaying the target special effect animation corresponding to driving the virtual anchor model in the live video picture, the head of the virtual anchor model can be kept in a stable playback state, and the display content of the live video picture is enriched so that the picture is no longer monotonous, thereby solving the problem in traditional live broadcast scenes that the virtual anchor model is displayed abnormally when the facial picture of the real anchor cannot be matched.
To facilitate understanding of this embodiment, a data display method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the data display method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capability, such as a terminal device, a server, or another live broadcast device capable of supporting virtual live broadcasting. In some possible implementations, the data display method may be implemented by a processor invoking computer-readable instructions stored in a memory.
In the embodiments of the present disclosure, the data display method can be applied to any virtual live broadcast scene, such as a chat live broadcast scene or a game live broadcast scene, which is not specifically limited in the present disclosure.
Referring to FIG. 1, which is a flowchart of a data display method provided by an embodiment of the present disclosure, the method includes steps S101 to S105, where:
S101. Acquire multiple frames of video images of a real anchor during a live broadcast.
S103. Detect the head posture of the real anchor in each frame of the video images.
Here, the head posture can be used to characterize the angle between the plane corresponding to the face of the real anchor and the horizontal plane, and/or the angle between the plane corresponding to the face of the real anchor and the plane of the lens of the video capture device, and/or the angle between the plane corresponding to the face of the real anchor and the plane of the real-anchor-side terminal.
In the embodiments of the present disclosure, the posture of the head of the real anchor relative to the video capture device of the real-anchor-side terminal can be determined according to the head posture, for example, a head-raised posture, a head-lowered posture, or a level-gaze posture, where the level-gaze posture can be understood as a state in which the face of the real anchor is relatively parallel to the horizontal plane.
In the embodiments of the present disclosure, when the video image contains multiple real anchors, the head posture of each real anchor may be detected, or the head posture of a designated real anchor among the multiple real anchors may be detected, which is not specifically limited in the present disclosure.
S105. When it is determined, according to the head postures corresponding to the multiple frames of video images, that the length of time the head of the real anchor has been in a specified posture meets the special effect triggering requirement, display the target special effect animation in the live video picture, where the live video picture shows a virtual anchor model driven by the real anchor.
Here, the specified posture can be understood as the head posture of the real anchor when the face of the real anchor in the video image is in an invalid display state. For example, it may be the head posture when the face of the real anchor stays fixed for a long time, when the face of the real anchor disappears from the live video picture, when only part of the face of the real anchor is shown in the live video picture, or when the real anchor is not facing the video capture device frontally for a long time.
For example, the specified posture includes the following postures: a head-down posture, a head-up posture, a posture of lowering the head to the lower left, a posture of lowering the head to the lower right, a posture of raising the head to the upper left, and a posture of raising the head to the upper right, which are not enumerated one by one here.
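Steps S101 to S105 can be summarized in a small sketch that accumulates how long the head has stayed in a specified posture and fires the special effect once the duration requirement is met. The three-second requirement and the use of a monotonic clock are illustrative assumptions; the disclosure does not fix a concrete duration.

```python
import time

class SpecialEffectTrigger:
    """Fires the target special effect once the real anchor's head has
    been in a specified posture for long enough (steps S103/S105)."""

    def __init__(self, required_seconds=3.0):
        self.required_seconds = required_seconds
        self.entered_at = None
        self.triggered = False

    def on_frame(self, in_specified_posture, now=None):
        now = time.monotonic() if now is None else now
        if not in_specified_posture:
            # Head left the specified posture: reset the timer and effect.
            self.entered_at, self.triggered = None, False
            return False
        if self.entered_at is None:
            self.entered_at = now
        if not self.triggered and now - self.entered_at >= self.required_seconds:
            self.triggered = True  # display the target special effect animation
        return self.triggered
```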
Here, the target special effect animation can be understood as a special effect animation matching the specified posture. The special effect animations matching a specified posture may be the same as or different from each other. For example, for the head-down posture or the head-up posture, one or more matching special effect animations may be preset, with each special effect animation corresponding to a different special effect triggering requirement.
In the embodiments of the present disclosure, the target special effect animation may contain a model animation and, in addition, material special effects. The model animation may be the animation of driving specified limbs of the virtual anchor model to perform a corresponding action, for example, a finger-heart gesture, a greeting gesture, or a goodbye gesture. The material special effects may be preset dynamic or static sticker special effects. Here, a material special effect may be a special effect matching the model animation, or a special effect matching the specified posture of the real anchor.
When the material special effect matches the model animation, while the model animation is shown in the live video picture, the material special effect can also be shown at a designated display position in the live video picture; when switching to the next model animation, the material special effect corresponding to the next model action can be switched in and displayed in the live video picture.
When the material special effect matches the specified posture of the real anchor, once it is detected that the length of time the real anchor has been in the specified posture meets the special effect triggering requirement, the material special effect can be displayed continuously in the live video picture until it is detected that the head of the real anchor is no longer in the specified posture.
For example, in a virtual game live broadcast scene, the specified posture may be the real anchor keeping the head down for a long time, and the target special effect animation may contain a model animation and material special effects. Here, the model animation may contain an animation of the virtual anchor model making a "finger heart" and an animation of the virtual anchor model "greeting", and the material special effect may be a sticker special effect matching the model animation; for example, the sticker special effects may be "Hello" and a heart sticker.
In this way, when the real anchor keeps the head down for a long time, the "greeting" animation and the "finger heart" animation can be displayed in a loop in the live video picture in turn, until it is detected that the head of the real anchor is no longer in the specified posture.
When the "greeting" animation is shown in the live video picture, the "Hello" sticker special effect can be shown in the picture at the same time; when the "finger heart" animation is shown, the heart sticker special effect can be shown at the same time.
By setting the target special effect animation to contain both model animations and material special effects, the content shown in the live video picture can be enriched, improving the user's live broadcast experience.
In an optional implementation, displaying the target special effect animation in the live video picture includes the following steps:
When it is detected that the length of time the head of the real anchor has been in the specified posture meets the special effect triggering requirement, the target special effect animation may be requested from the server. The target special effect animation is then shown in the live video picture of the live broadcast device on the real anchor side, and the video stream corresponding to the target special effect animation is pushed to the device on the viewer side, so that the target special effect animation is played on the live viewing interface of the viewer-side device.
In the embodiments of the present disclosure, the number of target special effect animations may be one or more. For example, multiple target special effect animations may be set to play in a loop until it is detected that the head of the real anchor is no longer in the specified posture; alternatively, a single target special effect animation may be set to play in a loop until it is detected that the head of the real anchor is no longer in the specified posture.
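This looping behaviour could be expressed as below; `still_in_specified_posture` and `play` are placeholder callables standing in for the posture check and the rendering step, neither of which is specified by the disclosure.

```python
import itertools

def loop_target_animations(animations, still_in_specified_posture, play):
    """Plays one or more target special effect animations in a loop until
    the head of the real anchor is no longer in the specified posture."""
    for animation in itertools.cycle(animations):
        if not still_in_specified_posture():
            break
        play(animation)
```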
For example, for a game live broadcast scene, the virtual anchor model and the real-time game picture can be shown simultaneously in the live video picture; for instance, the real-time game picture can be shown on the left side of the live video picture and the virtual anchor model on the right side. When the time the real anchor's head has been in the head-down posture meets the special effect triggering requirement, the target special effect animation can be determined; for example, it may be a special effect animation of the virtual anchor model dancing, or a special effect animation of the virtual anchor model reminding the audience "Please wait a moment, the excitement will continue shortly".
In the embodiments of the present disclosure, a database containing mapping relationships may be created in advance. The database stores multiple special effect animations and contains mapping relationships used to characterize the mapping between each specified posture and the special effect animations, and/or the mapping between the special effect triggering requirement corresponding to each specified posture and the special effect animations.
Before the target special effect animation is shown in the live video picture, the special effect animation having a mapping relationship with the specified posture and the special effect triggering requirement can be looked up in the database according to the mapping relationships, and the target special effect animation is determined based on the found special effect animation.
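An in-memory stand-in for such a mapping database might look like the following; the posture names, durations, and animation identifiers are purely illustrative assumptions.

```python
# Each specified posture maps to (trigger requirement in seconds, animation).
EFFECT_MAPPING = {
    "head_down": [(3.0, "anim_please_wait"), (10.0, "anim_dance")],
    "head_up": [(3.0, "anim_hello")],
}

def find_target_animation(posture, seconds_in_posture):
    """Returns the animation whose trigger requirement is satisfied by the
    time already spent in the posture, preferring the longest-duration
    requirement met; returns None when no requirement is satisfied."""
    satisfied = [(need, anim)
                 for need, anim in EFFECT_MAPPING.get(posture, [])
                 if seconds_in_posture >= need]
    return max(satisfied)[1] if satisfied else None
```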
For the above step S101, after the live broadcast start instruction of the real anchor is detected, the live video of the real anchor during the live broadcast starts to be collected, where the live video contains multiple frames of video images.
After the multiple frames of video images are collected, step S103 is performed to detect the head posture of the real anchor in each frame of the video images; as shown in FIG. 3, this may include the following steps:
S1031. Perform feature point detection on the face of the real anchor in the video image to obtain a feature point detection result, where the feature point detection result is used to characterize the feature information of the facial feature points of the real anchor.
S1032. Determine a second facial orientation of the real anchor according to the feature point detection result, where the second facial orientation is used to characterize the orientation information of the face of the real anchor relative to the video capture device.
S1033. Determine the head posture of the real anchor according to the second facial orientation.
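A deliberately simplified, self-contained sketch of steps S1031 to S1033 follows. It estimates a yaw-like angle from the horizontal asymmetry of three landmarks instead of running a real 84-point detector and 3D pose solver, so everything here, from the landmark inputs to the 30-degree cut-off, is an assumption made for illustration.

```python
import math

def second_facial_orientation(left_eye, right_eye, nose_tip):
    """S1032: a crude facial-orientation estimate from S1031's feature
    points. A frontal face has the nose tip roughly centred between the
    eyes; the more the face turns, the larger the horizontal asymmetry.
    All points are (x, y) pixel coordinates."""
    d_left = abs(nose_tip[0] - left_eye[0])
    d_right = abs(right_eye[0] - nose_tip[0])
    asymmetry = (d_right - d_left) / max(d_left + d_right, 1e-6)
    return math.degrees(math.asin(max(-1.0, min(1.0, asymmetry))))

def head_posture(orientation_deg, specified_threshold=30.0):
    """S1033: classify the head posture from the second facial orientation."""
    return "frontal" if abs(orientation_deg) <= specified_threshold else "turned"
```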
For each frame of video image, feature point detection can be performed on the face of the real anchor in the video image through a face detection network model, so as to obtain the feature information of the facial feature points of the real anchor.
Here, the feature points can be understood as the feature points of the facial features of the real anchor, and the number of feature points can be set according to actual needs; in general, 84 facial feature points may be chosen. The feature information of the feature points can be understood as the number of feature points, the labels of the feature points, the classification information of each feature point (for example, whether it belongs to the eye feature points, the mouth feature points, or the nose feature points), and the feature value corresponding to each feature point.
It should be noted that the number of feature points can affect the accuracy of the determined head posture of the real anchor; for example, the more feature points, the higher the accuracy of the computed head posture, and vice versa. Therefore, the number of feature points can be adjusted dynamically according to the remaining device memory of the real-anchor-side terminal. For example, when the remaining memory of the real-anchor-side terminal is greater than a preset threshold, a feature point detection result with a larger number of feature points can be chosen, so that the face orientation of the real anchor is determined from that feature point detection result.
By dynamically setting the number of feature points, a more accurate facial orientation can be obtained when the memory of the real-anchor-side terminal meets the computing requirement, thereby improving the accuracy of the head posture.
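The memory-based adjustment reduces to a tiny selection rule, sketched below; the 512 MB threshold and the reduced 34-point count are invented for the example, while the 84-point figure is the count mentioned above.

```python
def choose_feature_point_count(free_memory_mb, memory_threshold_mb=512):
    """More feature points give a more accurate head posture but cost more
    to compute and store, so fall back to a smaller set when the remaining
    memory of the anchor-side terminal is below the preset threshold."""
    return 84 if free_memory_mb > memory_threshold_mb else 34
```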
After feature point detection is performed on the real anchor's face and the feature point detection result is obtained, the face orientation of the real anchor (that is, the above-mentioned second facial orientation) can be determined from the detection result.
In an optional implementation, the feature point detection result can be input into a neural network model, which processes the result to obtain the face orientation of the real anchor (that is, the second facial orientation).
In another optional implementation, the classification information of the feature points contained in the detection result is examined. If, according to the classification information, the detected feature points do not cover all of the facial feature points, it can be determined that the real anchor is facing the video capture device sideways. If the detected feature points cover all of the facial features, it can be determined that the real anchor is frontally facing the video capture device.
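The second implementation above can be sketched as a simple completeness check over the landmark classes; the class labels and the detection-result layout are assumptions for illustration, not taken from the disclosure:

    REQUIRED_CLASSES = {"eye", "nose", "mouth"}  # assumed class labels

    def is_frontal(detection_result: list[dict]) -> bool:
        """Return True when every facial-feature class is present in the
        feature point detection result (frontal facing), and False when
        some classes are missing (sideways facing)."""
        detected_classes = {point["class"] for point in detection_result}
        return REQUIRED_CLASSES.issubset(detected_classes)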
Here, the second facial orientation characterizes the orientation information of the real anchor's face relative to the video capture device; this orientation information can be understood as the angle and distance of the real anchor's face relative to the video capture device of the real anchor's terminal.
FIG. 4 and FIG. 5 show the angle between the real anchor's face and the video capture device.
As shown in FIG. 4, the video capture device is mounted on the real anchor's terminal. When the angle between the horizontal plane of the real anchor's face and the X-axis of the coordinate system of the video capture device is less than or equal to a specified threshold, it is determined that the real anchor's face is frontally facing the video capture device.
As shown in FIG. 5, the video capture device is mounted on the real anchor's terminal. When the angle between the horizontal plane of the real anchor's face and the X-axis of the coordinate system of the video capture device is greater than the specified threshold, it is determined that the real anchor's face is facing the video capture device sideways.
In the embodiment of the present disclosure, the specified threshold may be set to any value between 0 and 30 degrees, which is not specifically limited here.
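The comparison of FIG. 4 and FIG. 5 can be expressed directly; the threshold value here is one example choice from the 0 to 30 degree range above:

    SPECIFIED_THRESHOLD_DEG = 30.0  # any value in [0, 30] per the text

    def is_frontally_facing(face_plane_angle_deg: float) -> bool:
        """FIG. 4 case when the angle between the face's horizontal plane
        and the capture device's X-axis is at most the threshold;
        FIG. 5 case (sideways) otherwise."""
        return abs(face_plane_angle_deg) <= SPECIFIED_THRESHOLD_DEG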
After the face orientation is determined, it can be used to determine whether the real anchor's face is frontally facing the video capture device.
When it is determined that the real anchor's face is frontally facing the video capture device, the head pose of the real anchor is determined by threshold comparison. Here, threshold comparison can be understood as comparing the change information of the real anchor's head pose with preset thresholds to determine whether the head pose is a specified pose. When it is determined that the real anchor's face is facing the video capture device sideways, the head pose of the real anchor is determined through a neural network model.
In the above implementation, by determining the second facial orientation of the real anchor from the result of feature point detection on the real anchor's face in the video image, the orientation information of the real anchor relative to the video capture device can be determined, for example, whether the real anchor is frontally facing the video capture device or facing it sideways. When the real anchor faces the video capture device sideways, a complete facial image cannot be captured, which affects the accuracy of the real anchor's head pose. By determining the head pose separately for the frontal and non-frontal (for example, sideways) cases, the accuracy of the real anchor's head pose can be improved.
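Putting the two branches together, a hedged dispatch sketch (pose_from_thresholds and pose_model are hypothetical placeholders for the threshold routine of Case 1 and the neural network model of Case 2 below):

    def estimate_head_pose(frame, detection_result, history, pose_model):
        """Frontal faces go through threshold comparison on the orientation
        history; sideways faces go through the neural network model."""
        if is_frontal(detection_result):          # from the earlier sketch
            return pose_from_thresholds(history)  # hypothetical; see Case 1
        return pose_model.predict(frame)          # neural branch; see Case 2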
The two cases, frontal facing and sideways facing, are described in detail below.
Case 1: the real anchor's face is frontally facing the video capture device.
In this case, as shown in FIG. 6, step S103 of detecting the head pose of the real anchor in each frame of the video image includes the following steps:
S11. When it is determined that the real anchor's face is frontally facing the video capture device, determine the first facial orientation of the real anchor at the current moment.
S12. Determine the change information of the real anchor's head pose according to the first facial orientation, where the change information characterizes the change of the first facial orientation.
S13. Determine the head pose of the real anchor in each frame of the video image based on the change information.
In the embodiment of the present disclosure, if it is determined that the real anchor's face is frontally facing the video capture device, historical facial orientations can be obtained, where a historical facial orientation is the face orientation of the real anchor determined from video images captured at multiple historical moments before the current moment; it can characterize the historical angle between the plane of the real anchor's face and the horizontal plane at each historical moment.
After the historical facial orientations are obtained, they can be combined with the first facial orientation determined at the current moment to determine the change information of the real anchor's head pose; that is, the change information of the first facial orientation is determined from the historical angles and the angle between the plane of the face and the horizontal plane at the current moment.
Here, the first facial orientation characterizes the degree of inclination of the real anchor's face relative to the imaging plane of the video capture device. For example, the first facial orientation may be the angle between the real anchor's face and the horizontal plane, or the angle between the real anchor's face and the imaging plane of the video capture device. Other angles capable of characterizing this inclination may also be used.
Here, the change information can be understood as trend information such as whether the first facial orientation is gradually increasing, and by what amount, or gradually decreasing, and by what amount.
It should be noted that the historical facial orientations are the face orientations determined from the video images corresponding to multiple consecutive historical moments. For example, if the current moment is moment k, the historical moments may be moments k-n through k-1, and the historical facial orientations are those determined from the video images captured at moments k-n through k-1.
In the embodiment of the present disclosure, when determining the head pose of the real anchor in each frame of video image according to the change information, the change information can be compared with threshold transition intervals, where the threshold transition intervals are multiple transition intervals determined from multiple thresholds. The change process of the real anchor's head pose can be determined through these intervals, and the head pose of the real anchor at the current moment can then be determined from that change process.
In the above implementation, by determining the change information of the real anchor's head pose from the first facial orientation at the current moment and the historical facial orientations at historical moments, and then determining the head pose from that change information, the temporal information in the video sequence (that is, adjacent video images) can be used to analyze the changes of the real anchor's head pose. Compared with determining the head pose from a single video frame, the method provided by the technical solution of the present disclosure can improve the accuracy of the head pose and thus yield more accurate pose results.
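One way to realize the change information over moments k-n through k-1, assuming a fixed window size n and per-frame angles in degrees (both assumptions; the disclosure does not fix them):

    from collections import deque

    class OrientationHistory:
        """Sliding window of first-facial-orientation angles for the
        historical moments k-n .. k-1 plus the current moment k."""

        def __init__(self, window: int = 10):  # window size n is assumed
            self._angles = deque(maxlen=window)

        def update(self, angle_deg: float) -> float | None:
            """Append the current angle and return the change relative to
            the previous moment (> 0 increasing, < 0 decreasing,
            None when there is no history yet)."""
            delta = angle_deg - self._angles[-1] if self._angles else None
            self._angles.append(angle_deg)
            return delta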
In an optional implementation, as shown in FIG. 7, the above step S13 of determining the head pose of the real anchor in each frame of the video image based on the change information can be implemented by performing S13-1 or S13-2, as follows:
S13-1. When it is determined according to the change information that the first facial orientation has increased to exceed a first threshold, determine that the head pose of the real anchor has changed from a non-specified pose to the specified pose.
In some embodiments, for S13-1, the first threshold can be set according to the angle range of the first facial orientation defined for the specified pose in the actual live-streaming scene. When it is determined from the change information that the target angle of the first facial orientation is gradually increasing, and the first facial orientation increases from below the first threshold to above it, it is determined that the head pose of the real anchor has changed to the specified pose.
Illustratively, the first threshold can be set to any value in [27, 33]; for example, it can be set to 30. When it is determined from the change information that the first facial orientation has increased to exceed 30 degrees, it is determined that the head pose of the real anchor has changed to the specified pose. The embodiment of the present disclosure does not limit the specific value of the first threshold.
Here, after it is determined that the first facial orientation has increased beyond the first threshold, head pose detection can continue on the captured video images. When it is detected that the first facial orientation, after exceeding the first threshold, continues to increase beyond a threshold A1, it is determined that the specified pose of the real anchor (for example, a head-down or head-up pose) is too severe. In this case, a pose adjustment prompt can be sent to the real anchor to prompt the real anchor to adjust the current head pose.
Here, the threshold A1 may be any of multiple thresholds greater than the first threshold; for example, A1 may be 50 degrees, or 60 degrees, 70 degrees, and so on. It can be understood that A1 can be selected as any value in [30, 90], which is not specifically limited in the present disclosure.
S13-2. When it is determined according to the change information that the first facial orientation has decreased from above the first threshold to below a second threshold, determine that the head pose of the real anchor has changed from the specified pose to a non-specified pose, where the second threshold is smaller than the first threshold.
In some embodiments, for S13-2, the first threshold can be set according to the angle range of the first facial orientation defined for the specified pose in the actual live-streaming scene, and the second threshold according to the angle range defined for the non-specified pose. Illustratively, the first threshold can be set to any value in [27, 33], for example 30, and the second threshold to any value in [17, 23], for example 20. When the change information shows a decrease from above the first threshold to below the second threshold, it is determined that the head pose of the real anchor has changed to a non-specified pose.
The above S13-1 and S13-2 are illustrated below with an actual scene; the process is described as follows:
A real anchor M streams on a live-streaming platform through the real anchor's terminal. After the real anchor M opens the live room, video images begin to be captured, and the head pose of the real anchor is determined in the manner described above.
Assume the target angle between the real anchor's face and the imaging plane of the video capture device (that is, the first facial orientation) is alpha. If the change information shows alpha gradually increasing: when alpha increases from 0 to above 20 degrees but has not reached 30 degrees, it is determined that the real anchor is not in a head-down or head-up pose; when alpha increases beyond 30 degrees, it is determined that the real anchor is in a head-down or head-up pose. Conversely, when alpha gradually decreases from above 30 degrees into the interval between 20 and 30 degrees, it is determined that the real anchor is still in a head-down or head-up pose, until alpha continues to decrease below 20 degrees, at which point it is determined that the real anchor is no longer in a head-down or head-up pose.
In an optional head-down detection solution, a single threshold is preset, and whether the real anchor is in the specified pose is determined by comparing the angle between the real anchor's face orientation and the horizontal plane with that threshold. However, when the real anchor nods, the target angle may frequently cross above and below the threshold. Since nodding is not the specified pose, a single-threshold detection technique may misrecognize the specified pose of the real anchor and thus wrongly trigger the corresponding special effect animation, giving the real anchor and the audience a poor live-streaming experience.
In the technical solution of the present disclosure, by comparing the change information of the target angle with the first threshold and the second threshold, the head pose of the real anchor can be determined through multi-threshold comparison, improving the accuracy of the real anchor's head pose and preventing the frequent head-pose flips caused by single-threshold solutions.
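The double-threshold logic of S13-1 and S13-2, including the severity threshold A1, can be sketched as a small state machine; the numeric values are the examples given above:

    FIRST_THRESHOLD_DEG = 30.0     # enter the specified pose (from [27, 33])
    SECOND_THRESHOLD_DEG = 20.0    # leave the specified pose (from [17, 23])
    SEVERITY_THRESHOLD_DEG = 50.0  # threshold A1 (any value in [30, 90])

    def update_specified_pose(in_pose: bool, alpha_deg: float) -> tuple[bool, bool]:
        """Return (new pose state, whether to prompt a pose adjustment)."""
        if not in_pose and alpha_deg > FIRST_THRESHOLD_DEG:
            in_pose = True       # S13-1: non-specified -> specified
        elif in_pose and alpha_deg < SECOND_THRESHOLD_DEG:
            in_pose = False      # S13-2: specified -> non-specified
        # Between the two thresholds the previous state is kept, so the
        # brief angle swings of a nod do not flip the result back and forth.
        prompt = in_pose and alpha_deg > SEVERITY_THRESHOLD_DEG
        return in_pose, prompt

With these example values, alpha rising from 0 to 25 degrees leaves the state unchanged, rising past 30 degrees enters the specified pose, and only falling below 20 degrees leaves it, which matches the worked scenario above.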
Case 2: the real anchor's face is not frontally facing the video capture device (for example, it is facing sideways).
In this case, as shown in FIG. 8, step S103 of detecting the head pose of the real anchor in each frame of the video image can be implemented by performing S21 and S22, as follows:
S21. When it is determined that the real anchor's face is not frontally facing the video capture device, process the live video picture through a deep learning model to obtain the head pose of the real anchor.
S22. Determine whether the real anchor's head is in the specified pose according to the head pose.
In the embodiment of the present disclosure, when it is detected that the real anchor's face is not frontally facing the video capture device, the live video picture can be input into a deep learning model, which processes it to obtain the head pose of the real anchor.
Before the live video picture is input into the deep learning model, the model needs to be trained. Specifically, images of multiple real anchors at various angles relative to the video capture picture can be collected and input into the deep learning model for training; the trained model is then used to analyze the live video picture to obtain the head pose of the real anchor.
In an optional implementation, the output data of the deep learning model may be a vector indicating at least one of the following: whether the head is in the specified pose, the pose type of the specified pose (for example, a head-down or head-up pose), the estimated angle between the real anchor's face orientation and the horizontal plane, and the orientation information of the real anchor's face relative to the video capture device.
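A possible decoded layout of that output vector, with field names chosen for illustration only:

    from dataclasses import dataclass

    @dataclass
    class HeadPoseOutput:
        """Decoded form of the model's output vector described above."""
        is_specified_pose: bool    # whether the head is in the specified pose
        pose_type: str             # e.g. "head_down" or "head_up"
        angle_estimate_deg: float  # estimated face-to-horizontal angle
        orientation: str           # face orientation relative to the camera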
In the embodiment of the present disclosure, when it is determined from the output data of the deep learning model that the real anchor's head is in the specified pose, and the specified pose meets the special-effect trigger requirements, the target special effect animation is displayed in the live video picture.
When it is determined from the output data of the deep learning model that the real anchor's head is in a non-specified pose and the real anchor's face is facing the video capture device sideways, prompt information can be generated for the real anchor, prompting the real anchor to move the video capture device so that the real anchor's face can frontally face it.
For example, as shown in FIG. 9, the video capture device is set up separately from the real anchor's terminal and placed on the left side of the terminal. When the real anchor faces the display screen of the terminal, the live video picture captured by the video capture device contains the left side of the real anchor's face. When it is detected that the real anchor is frontally facing the display screen of the terminal but facing the video capture device sideways, it is determined that the special-effect trigger condition is not met, and prompt information needs to be generated for the real anchor to prompt an adjustment of the position of the video capture device.
In the above implementation, when the real anchor's face is facing the video capture device sideways, the complete set of facial feature points cannot be shown in the live video picture. Since incomplete facial feature points affect the result of head pose determination, pose estimation is instead performed on the live video picture through the deep learning model to obtain the head pose of the real anchor, which can improve the estimation accuracy of the real anchor's head pose.
In an optional implementation, as shown in FIG. 10, the above S21 can be implemented by performing S21-1 and S21-2, as follows:
S21-1. When it is determined that the real anchor's face is not frontally facing the video capture device, acquire target reference image frames, where the target reference image frames include at least one of the following: the N image frames preceding the live video picture in the video sequence to which it belongs, and the first M image frames of that video sequence, N and M being positive integers greater than zero.
S21-2. Process the live video picture and the target reference image frames through the deep learning model to obtain the head pose of the real anchor.
In the embodiment of the present disclosure, to further improve the accuracy of the real anchor's head pose, the electronic device can use the deep learning model together with the temporal information of the video sequence during the live stream to determine the head pose of the real anchor at the current moment.
In an optional implementation, the N image frames preceding the live video picture corresponding to the current moment can be determined in the video sequence. The acquired N image frames, the output data corresponding to each of them, and the live video picture captured at the current moment are then input into the deep learning model for processing, so as to obtain the head pose of the real anchor.
Here, since the real anchor's head movements do not change very frequently during the live stream, the head poses of the real anchor corresponding to adjacent live video pictures in the video sequence may be the same. In this case, the head pose of the real anchor in the current live video picture can be predicted by combining the temporal information in the video sequence: the head pose determined from the N image frames serves as guidance information for the live video picture to be processed at the current moment, guiding the deep learning model to predict the head pose of the real anchor in the current picture and obtain a more accurate detection result.
In another optional implementation, the first M image frames of the video sequence can be determined. The acquired M image frames, the output data corresponding to each of them, and the live video picture captured at the current moment are then input into the deep learning model for processing, so as to obtain the head pose of the real anchor.
Here, when the real anchor starts the live stream, the real anchor's face is frontally facing the video capture device in order to set up and test the anchor-side equipment. Therefore, when predicting for the live video picture to be processed at the current moment, the M image frames, the output data corresponding to each of them, and the live video picture captured at the current moment can be input into the deep learning model for processing, so as to obtain the head pose of the real anchor.
Since the M image frames can be understood as frames captured while the real anchor's face was frontally facing the video capture device, they may contain the real anchor's complete face. In this way, the deep learning model can compare the picture of the real anchor in the current live video picture with the picture of the real anchor in the M image frames, guiding the model to predict the head pose of the real anchor in the current picture and obtain a more accurate detection result.
In yet another optional implementation, both the N image frames preceding the current live video picture and the first M image frames of the video sequence can be determined. The acquired N and M image frames, the output data corresponding to each image frame, and the live video picture captured at the current moment are then input into the deep learning model for processing, so as to obtain the head pose of the real anchor.
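Assembling the model input for the three variants above might look as follows; the list-based frame store, the default N and M, and the dictionary layout are assumptions for illustration:

    def build_model_input(frames: list, outputs: list, k: int,
                          n: int = 5, m: int = 3,
                          use_previous: bool = True,
                          use_first: bool = True) -> dict:
        """Collect the current picture plus the chosen reference frames:
        the N frames before moment k and/or the first M frames of the
        sequence, each paired with its earlier model output."""
        refs, ref_outputs = [], []
        if use_previous:
            refs += frames[max(0, k - n):k]
            ref_outputs += outputs[max(0, k - n):k]
        if use_first:
            refs += frames[:m]
            ref_outputs += outputs[:m]
        return {"current": frames[k],
                "references": refs,
                "reference_outputs": ref_outputs}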
In the embodiment of the present disclosure, once the head pose of the real anchor in the video images has been detected in the manner described above, the target special effect animation is displayed in the live video picture when it is determined, from the head poses corresponding to multiple frames of video images, that the length of time the real anchor's head has been in the specified pose meets the special-effect trigger requirements.
In an optional implementation, the target special effect animation can also be displayed in the live video picture when the specified pose meets at least one of the following special-effect trigger requirements (a combined check is sketched after this list):
the number of times the real anchor's head has been in the specified pose meets the special-effect trigger requirements;
the state type of the specified pose of the real anchor's head meets the special-effect trigger requirements;
the position of the real anchor's head in the video image while in the specified pose meets the special-effect trigger requirements.
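The sketch below combines the duration requirement with the optional requirements just listed; every threshold and the centered-region check are assumptions for illustration, since the disclosure does not fix concrete values:

    def should_trigger(duration_s: float, pose_count: int, pose_type: str,
                       head_box: tuple, frame_size: tuple) -> bool:
        """Trigger the target effect only when duration, count, state type,
        and head position all meet their (assumed) requirements."""
        x, y, w, h = head_box
        frame_w, frame_h = frame_size
        centered = (frame_w * 0.2 <= x and x + w <= frame_w * 0.8
                    and y + h <= frame_h * 0.9)  # assumed position region
        return (duration_s >= 3.0        # assumed minimum duration
                and pose_count >= 2      # assumed minimum repeat count
                and pose_type in ("head_down", "head_up")
                and centered)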
In the above implementation, by setting multiple special-effect trigger requirements, the ways in which special effect animations are displayed can be enriched, providing a richer interactive experience for the real anchor and the audience.
In an optional implementation, based on FIG. 1 and as shown in FIG. 11, the above step S105 of displaying the target special effect animation in the live video picture includes the following steps:
S1051. Determine the pose type of the head pose.
S1052. Determine the special effect animation matching the pose type, use the matched special effect animation as the target special effect animation displayed by driving the virtual anchor model, and display the target special effect animation in the live video picture.
In the embodiment of the present disclosure, different special effect animations are set for head poses of different pose types. After the pose type of the head pose is determined, the model animation and/or material special effects matching that pose type can be looked up in a data table; the found model animation and/or material special effects are used as the target special effect animation displayed by driving the virtual anchor model, and the target special effect animation is displayed on the live video picture.
It can be understood that the target special effect animation may be a single special effect animation or multiple special effect animations. When there is one target special effect animation, it can be played in a loop in the video sequence corresponding to the live video picture. When there are multiple target special effect animations, each of them can be played in turn in the video sequence corresponding to the live video picture.
When a material special effect matches a model animation, the material special effect can follow the corresponding model animation and be played in turn, in a loop, in the live video picture. When a material special effect matches the specified pose, it can be played in a loop in the live video picture without following any model animation.
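The data-table lookup of S1052 can be sketched as a simple mapping; the pose types and animation names are illustrative assumptions:

    EFFECTS_BY_POSE_TYPE = {
        "head_down": ["nap_model_animation", "zzz_material_effect"],
        "head_up": ["stargazing_model_animation"],
    }

    def target_effects(pose_type: str) -> list[str]:
        """Look up the model animations and/or material effects matched to
        the pose type; one effect loops, several play in turn."""
        return EFFECTS_BY_POSE_TYPE.get(pose_type, [])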
In the above implementation, triggering different types of special effect animations according to different head pose types can enrich the content displayed in the live video picture, making the virtual live stream more entertaining and providing users with a better live-streaming experience.
In an optional implementation, based on FIG. 1 or FIG. 11, displaying the target special effect animation in the live video picture in the above step S105 or S1052 may also, as shown in FIG. 12, include the following steps:
S31. Determine the type information of each viewer watching the live stream of the virtual anchor model driven by the real anchor.
S33. Determine the special effect animation matching the type information, use the matched special effect animation as the target special effect animation displayed by driving the virtual anchor model, and send the target special effect animation to the viewer's terminal so as to display it on the viewer's terminal.
In the embodiment of the present disclosure, the display of different types of special effect animations can be triggered for different viewers. First, the type information of each viewer can be determined; the type information may include at least one of the following: gender, age, region, occupation, hobbies, and level.
After the above type information is obtained, the special effect animation matching the type information can be looked up in a database as the target special effect animation. The target special effect animation is then sent to the viewer's terminal so that it is played on the live video picture displayed on the viewer's terminal.
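Matching per-viewer effects might be sketched as below; the type fields mirror the list above, while the matching rules and animation names are invented purely for illustration:

    def effect_for_viewer(viewer: dict) -> str:
        """Pick the target effect from the viewer's type information."""
        if viewer.get("age", 0) < 18:
            return "cartoon_wait_animation"
        if "games" in viewer.get("hobbies", ()):
            return "game_style_wait_animation"
        if viewer.get("level", 0) >= 10:
            return "vip_wait_animation"
        return "default_wait_animation"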
For example, a real anchor may keep the head down for a long time during a live stream. While the real anchor's head is down, the real anchor's facial expressions cannot be captured, so the virtual anchor model cannot be displayed normally in the live video picture. If a viewer enters the live room and sees a virtual anchor model that is not displaying normally, the viewing experience suffers and the viewer may leave the room. In the above situation where the virtual anchor model cannot be displayed normally because of the real anchor's head pose, the data display method of the embodiment of the present application can display a corresponding special effect animation for the viewer, for example: "The anchor is in a co-streaming session, please don't leave." This increases the probability that viewers keep watching the stream, reduces viewer churn, and, while maintaining the popularity of the real anchor's stream, also adds interactive fun.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure also provide a data display apparatus corresponding to the data display method. Since the problem-solving principle of the apparatus in the embodiments of the present disclosure is similar to that of the above data display method, the implementation of the apparatus can refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 13, which is a schematic diagram of a data display apparatus provided by an embodiment of the present disclosure, the apparatus includes an acquisition part 51, a detection part 52, and a special-effect adding part 53, where:
the acquisition part 51 is configured to acquire multiple frames of video images of the real anchor during the live stream;
the detection part 52 is configured to detect the head pose of the real anchor in each frame of the video images;
the special-effect adding part 53 is configured to display the target special effect animation in the live video picture when it is determined, according to the head poses corresponding to the multiple frames of video images, that the length of time the real anchor's head has been in a specified pose meets the special-effect trigger requirements; the live video picture displays the virtual anchor model driven by the real anchor.
In the technical solution of the present disclosure, displaying the virtual anchor model in the live video picture can make the live stream more entertaining and interactive. Further, when it is determined that the length of time the real anchor's head has been in the specified pose meets the special-effect trigger requirements, the target special effect animation corresponding to driving the virtual anchor model can be displayed in the live video picture, ensuring that the head of the virtual anchor model remains in a stable playback state while also enriching the content displayed in the live video picture, so that the picture is no longer overly monotonous. This solves the problem in traditional live-streaming scenes where the virtual anchor model is displayed abnormally when the real anchor's facial picture cannot be matched.
In a possible implementation, the detection part 52 is further configured to: when it is determined that the real anchor's face is frontally facing the video capture device, determine the first facial orientation of the real anchor at the current moment; determine the change information of the real anchor's head pose according to the first facial orientation, where the change information characterizes the change of the first facial orientation; and determine the head pose of the real anchor in each frame of the video images based on the change information.
In a possible implementation, the detection part 52 is further configured to: when it is determined according to the change information that the first facial orientation has increased to exceed a first threshold, determine that the head pose of the real anchor has changed from a non-specified pose to the specified pose.
In a possible implementation, the detection part 52 is further configured to: when it is determined according to the change information that the first facial orientation has decreased from above the first threshold to below a second threshold, determine that the head pose of the real anchor has changed from the specified pose to a non-specified pose, where the second threshold is smaller than the first threshold.
In a possible implementation, the detection part 52 is further configured to: when it is determined that the real anchor's face is not frontally facing the video capture device, process the live video picture through a deep learning model to obtain the head pose of the real anchor, and determine whether the real anchor's head is in the specified pose according to the head pose.
In a possible implementation, the detection part 52 is further configured to: acquire the target reference image frames, where the target reference image frames include at least one of the following: the N image frames preceding the live video picture in the video sequence to which it belongs, and the first M image frames of that video sequence, N and M being positive integers greater than zero; and process the live video picture and the target reference image frames through the deep learning model to obtain the head pose of the real anchor.
In a possible implementation, the detection part 52 is further configured to: perform feature point detection on the face of the real anchor in the video image to obtain a feature point detection result, where the feature point detection result characterizes the feature information of the facial feature points of the real anchor; determine the second facial orientation of the real anchor according to the feature point detection result, where the second facial orientation characterizes the orientation information of the real anchor's face relative to the video capture device; and determine the head pose of the real anchor according to the second facial orientation.
In a possible implementation, the special-effect adding part 53 is further configured to: determine the pose type of the head pose; determine the special effect animation matching the pose type, use the matched special effect animation as the target special effect animation displayed by driving the virtual anchor model, and display the target special effect animation in the live video picture.
In a possible implementation, the special-effect adding part 53 is further configured to: determine the type information of each viewer watching the live stream of the virtual anchor model driven by the real anchor; determine the special effect animation matching the type information, use the matched special effect animation as the target special effect animation displayed by driving the virtual anchor model, and send the target special effect animation to the viewer's terminal so as to display it on the viewer's terminal.
For descriptions of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
Corresponding to the data display method in FIG. 1, an embodiment of the present disclosure also provides an electronic device 600. As shown in FIG. 14, which is a schematic structural diagram of the electronic device 600 provided by the embodiment of the present disclosure, the device includes:
a processor 61, a memory 62, and a bus 63. The memory 62 is configured to store execution instructions and includes an internal memory 621 and an external memory 622; the internal memory 621, also called main memory, is configured to temporarily store operation data of the processor 61 and data exchanged with the external memory 622, such as a hard disk. The processor 61 exchanges data with the external memory 622 through the internal memory 621. When the electronic device 600 runs, the processor 61 communicates with the memory 62 through the bus 63, so that the processor 61 executes the following instructions:
acquire multiple frames of video images of the real anchor during the live stream;
detect the head pose of the real anchor in each frame of the video images;
when it is determined, according to the head poses corresponding to the multiple frames of video images, that the length of time the real anchor's head has been in a specified pose meets the special-effect trigger requirements, display the target special effect animation in the live video picture; the live video picture displays the virtual anchor model driven by the real anchor.
An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the data display method described in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure also provides a computer program product carrying program code, where the instructions included in the program code can be used to execute the steps of the data display method described in the above method embodiments; for details, refer to the above method embodiments, which are not repeated here.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the parts is only a logical functional division, and there may be other division methods in actual implementation; for another example, multiple parts or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or parts, and may be electrical, mechanical, or in other forms.
The parts described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional parts in the embodiments of the present disclosure may be integrated into one processing unit, each part may exist physically alone, or two or more parts may be integrated into one part.
If the functions are implemented in the form of software functional parts and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field can still, within the technical scope disclosed by the present disclosure, modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
In the embodiments of the present disclosure, displaying the virtual anchor model in the live video picture can make the live stream more entertaining and interactive. Further, when it is determined that the length of time the real anchor's head has been in the specified pose meets the special-effect trigger requirements, the target special effect animation corresponding to driving the virtual anchor model can be displayed in the live video picture, ensuring that the head of the virtual anchor model remains in a stable playback state while also enriching the content displayed in the live video picture, so that the picture is no longer overly monotonous. This solves the problem in traditional live-streaming scenes where the virtual anchor model is displayed abnormally when the real anchor's facial picture cannot be matched. Moreover, by determining the change information of the real anchor's head pose from the first facial orientation at the current moment, and then determining the head pose from that change information, the temporal information in the video sequence (that is, adjacent video images) can be used to analyze the changes of the real anchor's head pose; compared with determining the head pose from a single video frame, the method provided by the technical solution of the present disclosure can improve the accuracy of the head pose and thus yield more accurate pose results. In addition, by comparing the change information of the target angle with the first threshold and the second threshold, the head pose of the real anchor can be determined through multi-threshold comparison, improving the accuracy of the real anchor's head pose and preventing the frequent head-pose flips caused by single-threshold solutions. Furthermore, by determining the second facial orientation of the real anchor from the result of feature point detection on the real anchor's face in the video image, the orientation information of the real anchor relative to the video capture device can be determined, for example, whether the real anchor is frontally facing the video capture device or facing it sideways. When the real anchor faces the video capture device sideways, a complete facial image cannot be captured, which affects the accuracy of the real anchor's head pose; by determining the head pose separately for the frontal and non-frontal cases, the accuracy of the real anchor's head pose can be improved.

Claims (13)

  1. A data display method, comprising:
    acquiring multiple frames of video images of a real anchor during a live broadcast;
    detecting a head pose of the real anchor in each frame of the video images;
    in a case where it is determined, according to the head poses corresponding to the multiple frames of video images, that a length of time for which a head of the real anchor is in a designated pose satisfies a special effect trigger requirement, displaying a target special effect animation in a live video picture, wherein the live video picture displays a virtual anchor model driven by the real anchor.
  2. The method according to claim 1, wherein the detecting the head pose of the real anchor in each frame of the video images comprises:
    in a case where it is determined that a face of the real anchor frontally faces a video capture device, determining a first facial orientation of the real anchor at a current moment;
    determining change information of the head pose of the real anchor according to the first facial orientation, wherein the change information is used to characterize a change in the first facial orientation;
    determining the head pose of the real anchor in each frame of the video images based on the change information.
  3. The method according to claim 2, wherein the determining the head pose of the real anchor in each frame of the video images based on the change information comprises:
    in a case where it is determined, according to the change information, that the first facial orientation has increased to exceed a first threshold, determining that the head pose of the real anchor has changed from a non-designated pose to the designated pose.
  4. The method according to claim 2 or 3, wherein the determining the head pose of the real anchor in each frame of the video images based on the change information comprises:
    in a case where it is determined, according to the change information, that the first facial orientation has decreased from exceeding a first threshold to being less than a second threshold, determining that the head pose of the real anchor has changed from the designated pose to a non-designated pose, wherein the second threshold is less than the first threshold.
  5. The method according to any one of claims 1 to 4, wherein the detecting the head pose of the real anchor in each frame of the video images comprises:
    in a case where it is determined that the face of the real anchor does not frontally face the video capture device, processing the live video picture through a deep learning model to obtain the head pose of the real anchor, and determining, according to the head pose, whether the head of the real anchor is in the designated pose.
  6. The method according to claim 5, wherein the processing the live video picture through the deep learning model to obtain the head pose of the real anchor comprises:
    acquiring a target reference image frame, wherein the target reference image frame comprises at least one of the following image frames: N image frames preceding the live video picture in a video sequence to which the live video picture belongs, or the first M image frames in the video sequence to which the live video picture belongs, N and M being positive integers greater than zero;
    processing the live video picture and the target reference image frame through the deep learning model to obtain the head pose of the real anchor.
  7. The method according to any one of claims 1 to 6, wherein the detecting the head pose of the real anchor in each frame of the video images comprises:
    performing feature point detection on the face of the real anchor in the video image to obtain a feature point detection result, wherein the feature point detection result is used to characterize feature information of facial feature points of the real anchor;
    determining a second facial orientation of the real anchor according to the feature point detection result, wherein the second facial orientation is used to characterize orientation information of the face of the real anchor relative to the video capture device;
    determining the head pose of the real anchor according to the second facial orientation.
  8. The method according to any one of claims 1 to 7, wherein the displaying the target special effect animation in the live video picture comprises:
    determining a pose type of the head pose;
    determining a special effect animation matching the pose type, taking the matched special effect animation as the target special effect animation displayed by driving the virtual anchor model, and displaying the target special effect animation in the live video picture.
  9. The method according to any one of claims 1 to 8, wherein the displaying the target special effect animation in the live video picture comprises:
    determining type information of each viewer watching the live broadcast of the virtual anchor model driven by the real anchor;
    determining a special effect animation matching the type information, taking the matched special effect animation as the target special effect animation displayed by driving the virtual anchor model, and sending the target special effect animation to a viewer-side terminal, so as to display the target special effect animation on the viewer-side terminal.
  10. A data display apparatus, comprising:
    an acquisition part, configured to acquire multiple frames of video images of a real anchor during a live broadcast;
    a detection part, configured to detect a head pose of the real anchor in each frame of the video images;
    a special effect adding part, configured to, in a case where it is determined, according to the head poses corresponding to the multiple frames of video images, that a length of time for which a head of the real anchor is in a designated pose satisfies a special effect trigger requirement, display a target special effect animation in a live video picture, wherein the live video picture displays a virtual anchor model driven by the real anchor.
  11. An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the data display method according to any one of claims 1 to 9.
  12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a processor, performs the steps of the data display method according to any one of claims 1 to 9.
  13. A computer program, comprising computer-readable code which, when run in an electronic device, causes a processor in the electronic device to implement the steps of the data display method according to any one of claims 1 to 9.
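
To make the duration-based trigger condition of claim 1 concrete, the following is a minimal sketch in Python; the frame rate, the 2-second requirement, and the detect_head_pose and play_effect helpers are hypothetical placeholders for components the claims do not specify.

```python
# Minimal sketch (hypothetical frame rate, duration, and helper functions):
# fire the target special effect once the real anchor's head has stayed in
# the designated pose across enough consecutive video frames.

FPS = 30                # assumed capture rate of the video images
REQUIRED_SECONDS = 2.0  # assumed special effect trigger requirement
REQUIRED_FRAMES = int(FPS * REQUIRED_SECONDS)

def process_stream(frames, detect_head_pose, play_effect):
    """Count consecutive designated-pose frames; trigger the effect once met."""
    consecutive = 0
    for frame in frames:
        if detect_head_pose(frame) == "designated":
            consecutive += 1
            if consecutive == REQUIRED_FRAMES:
                play_effect("target_special_effect_animation")
        else:
            consecutive = 0  # the pose was interrupted; restart the count
```

Counting consecutive frames rather than wall-clock time keeps the check deterministic per frame; under the stated assumptions, 60 consecutive designated-pose frames at 30 fps correspond to a 2-second requirement.
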
PCT/CN2022/085941 2021-06-29 2022-04-08 Data display method, apparatus, electronic device, computer program, and computer-readable storage medium WO2023273500A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110728854.1 2021-06-29
CN202110728854.1A CN113453034B (en) 2021-06-29 2021-06-29 Data display method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2023273500A1

Family

ID=77813960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/085941 WO2023273500A1 (en) 2021-06-29 2022-04-08 Data display method, apparatus, electronic device, computer program, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113453034B (en)
WO (1) WO2023273500A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113453034B (en) * 2021-06-29 2023-07-25 上海商汤智能科技有限公司 Data display method, device, electronic equipment and computer readable storage medium
CN113850746A (en) * 2021-09-29 2021-12-28 北京字跳网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114092678A (en) * 2021-11-29 2022-02-25 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114363685A (en) * 2021-12-20 2022-04-15 咪咕文化科技有限公司 Video interaction method and device, computing equipment and computer storage medium
CN114125569B (en) * 2022-01-27 2022-07-15 阿里巴巴(中国)有限公司 Live broadcast processing method and device
CN115147312B (en) * 2022-08-10 2023-07-14 深圳因应特科技有限公司 Facial skin-polishing special-effect simplified identification system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093490B (en) * 2013-02-02 2015-08-26 浙江大学 Based on the real-time face animation method of single video camera
US10210648B2 (en) * 2017-05-16 2019-02-19 Apple Inc. Emojicon puppeting
CN107493515B (en) * 2017-08-30 2021-01-01 香港乐蜜有限公司 Event reminding method and device based on live broadcast
CN110139115B (en) * 2019-04-30 2020-06-09 广州虎牙信息科技有限公司 Method and device for controlling virtual image posture based on key points and electronic equipment
CN112069863B (en) * 2019-06-11 2022-08-19 荣耀终端有限公司 Face feature validity determination method and electronic equipment
CN110933452B (en) * 2019-12-02 2021-12-03 广州酷狗计算机科技有限公司 Method and device for displaying lovely face gift and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160300100A1 (en) * 2014-11-10 2016-10-13 Intel Corporation Image capturing apparatus and method
CN109960986A (en) * 2017-12-25 2019-07-02 北京市商汤科技开发有限公司 Human face posture analysis method, device, equipment, storage medium and program
CN109803165A (en) * 2019-02-01 2019-05-24 北京达佳互联信息技术有限公司 Method, apparatus, terminal and the storage medium of video processing
CN110557625A (en) * 2019-09-17 2019-12-10 北京达佳互联信息技术有限公司 live virtual image broadcasting method, terminal, computer equipment and storage medium
CN112543343A (en) * 2020-11-27 2021-03-23 广州华多网络科技有限公司 Live broadcast picture processing method and device based on live broadcast with wheat and electronic equipment
CN113453034A (en) * 2021-06-29 2021-09-28 上海商汤智能科技有限公司 Data display method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113453034B (en) 2023-07-25
CN113453034A (en) 2021-09-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22831331

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE