CN117041645A - Video playing method and device based on digital person, electronic equipment and storage medium

Video playing method and device based on digital person, electronic equipment and storage medium

Info

Publication number
CN117041645A
Authority
CN
China
Prior art keywords
target
digital person
character
role
digital
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310880242.3A
Other languages
Chinese (zh)
Inventor
钟彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202310880242.3A priority Critical patent/CN117041645A/en
Publication of CN117041645A publication Critical patent/CN117041645A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure relates to a digital person-based video playing method and apparatus, an electronic device, and a storage medium. The method includes: during playback of a target video, acquiring the target role currently producing voice content in the target video and acquiring a sign language animation corresponding to the voice content; acquiring a digital person matched with the target role; and displaying the sign language animation through the digital person in the target video. By obtaining the digital person corresponding to the target role and the corresponding sign language animation in the target video, the embodiments of the disclosure diversify digital person avatars, and because the digital person is matched to the target role, a viewer can promptly tell which role is speaking from the digital person being displayed.

Description

Video playing method and device based on digital person, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of digital persons, and in particular to a digital person-based video playing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of the internet and mobile terminals, more and more users watch videos on mobile terminals, including deaf-mute users. Because of the special needs of deaf-mute users, the spoken content of a video needs to be translated into sign language.
In the related art, a sign language animation can be produced in advance from the video content before the video is played, and during playback the sign language animation of a digital person is displayed synchronously in the lower left corner of the screen. If several people in the video speak at the same time, their speech cannot all be presented well through the same digital person's sign language animation, and even if a deaf user sees the animation clearly, it is unclear which speaker's words are being translated.
Disclosure of Invention
The disclosure provides a video playing method and device based on digital people, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a digital person-based video playing method, the method including:
during playback of a target video, acquiring the target role currently producing voice content in the target video, and acquiring a sign language animation corresponding to the voice content;
acquiring a digital person matched with the target role;
and displaying the sign language animation corresponding to the digital person in the target video.
According to another aspect of the present disclosure, there is provided a digital person-based video playback apparatus, the apparatus including:
the first information acquisition module is configured to, during playback of a target video, acquire the target role currently producing voice content in the target video and acquire a sign language animation corresponding to the voice content;
the second information acquisition module is configured to acquire a digital person matched with the target role;
and the display module is configured to display the sign language animation corresponding to the digital person in the target video.
According to a third aspect of the present disclosure, an electronic device is provided. The electronic device includes a memory and a processor, the memory having a computer program stored thereon, and the processor implementing the method described above when executing the program.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method of the present disclosure.
According to the digital person-based video playing method and apparatus, electronic device, and storage medium provided by the present disclosure, during playback of a target video, the target role currently producing voice content in the target video is acquired, together with the sign language animation corresponding to the voice content; the digital person matched with the target role is acquired, and the sign language animation corresponding to the digital person is displayed in the target video. By obtaining the digital person corresponding to the target role and the corresponding sign language animation in the target video, the embodiments of the disclosure diversify digital person avatars, and because the digital person is matched to the target role, a viewer can promptly tell which role is speaking from the digital person being displayed.
Drawings
Further details, features and advantages of the present disclosure are disclosed in the following description of exemplary embodiments, with reference to the following drawings, wherein:
FIG. 1 is a schematic diagram of role marking in video provided in an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram of binding between roles and digital persons provided by an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of character position determination provided by an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a character and digital person display in a video provided in accordance with an exemplary embodiment of the present disclosure;
FIG. 5 is a flowchart of a digital person-based video playback method provided in an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of functional blocks of a digital person-based video playback device provided in an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure;
FIG. 8 is a block diagram of a computer system according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
With the continuous development of the internet and mobile terminals, more and more users watch videos on mobile terminals, including deaf-mute users. Because of the special needs of deaf-mute users, the spoken content of a video needs to be translated into sign language.
In the related art, a sign language animation can be produced in advance from the video content before the video is played, and during playback the sign language animation of a digital person is displayed synchronously in the lower left corner of the screen. If several people in the video speak at the same time, a single digital person's sign language animation cannot present all of their speech well; that is, the current digital person is too uniform, so that even if a deaf user sees the animation clearly, it is unclear which speaker's words are being translated.
In addition, the related art has the drawbacks that the displayed digital person avatar is too fixed and cannot be adjusted according to user settings, and that the display position of the digital person on the screen is too fixed, typically the lower left corner of the screen.
To solve these technical problems, the method determines the target characters appearing in a video by analyzing the video content and generates a personalized digital person for each target character; generates the sign language animation of the corresponding digital person by analyzing each target character's speech; and determines the display position of each digital person by tracking the position of its target character in the video frames. During playback, the sign language animations of the digital persons corresponding to different target characters can then be displayed at suitable positions in each video frame. In this way, different digital persons are rendered individually for different target characters, and the speech of multiple target characters can be presented accurately through the sign language animations of different digital persons; that is, the sign language animation of each digital person accurately presents the speech of the target character corresponding to that digital person, improving the video viewing experience of deaf-mute users.
In the embodiments provided by the present disclosure, it is first necessary to mark the position of the start frame of each character's speech in the video, and to mark the video frames in which each character appears.
In an embodiment, a start frame is the first frame in which a character in the video starts speaking a sentence; if the video contains many characters and each character speaks many times, each character corresponds to many start frames. By reading the original video content, analyzing the video and the corresponding audio frame by frame, and detecting audio features such as frequency and amplitude, it can be determined whether a frame contains character dialogue. If it does, and the preceding consecutive frames contain no dialogue, the frame is marked as a start frame, the position of the start frame is marked, and the speaking character is recorded. If several characters speak at the same time, the information of all of them is recorded together, and the character information, including character IDs and the like, is written into the detected video frame's information; such a frame is a marked frame.
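As an illustration only, the start-frame marking described above can be sketched in Python as follows; frames (an iterator over frame triples), has_dialogue (the frequency/amplitude check), and identify_speakers are hypothetical stand-ins that this disclosure does not name.

from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

@dataclass
class StartFrameMark:
    frame_index: int                                        # position of the start frame
    character_ids: List[str] = field(default_factory=list)  # characters starting to speak

def mark_start_frames(frames: Iterable[Tuple[int, object, object]],
                      has_dialogue: Callable[[object], bool],
                      identify_speakers: Callable[[object, object], List[str]]) -> List[StartFrameMark]:
    marks: List[StartFrameMark] = []
    prev_had_dialogue = False
    for frame_index, video_frame, audio_frame in frames:
        dialogue = has_dialogue(audio_frame)
        # A start frame: dialogue is present here but absent in the preceding frame(s).
        if dialogue and not prev_had_dialogue:
            marks.append(StartFrameMark(frame_index,
                                        identify_speakers(video_frame, audio_frame)))
        prev_had_dialogue = dialogue
    return marks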
For example, the video includes a character A, a character B, a character C, a character D, and a character E. As shown in fig. 1, the start frame of each character's speech can be marked in the video; the start frame can be regarded as the first frame in which the character starts to speak, and the character's continuous speech can span multiple frames starting from the start frame.
After the characters in the video are determined, a digital person can be set for each character for display in the video. For example, images of the leading and supporting characters in the video can be acquired and uploaded to a server; through AI image recognition, the server obtains materials from a 3D digital person assembly resource material library, generates three relatively similar digital person avatars for each character, and binds them to the character ID. Of course, one or more digital avatars may be generated as needed, the user may edit a generated avatar, or a corresponding avatar may be generated according to the user's selections or operations. The digital person in the embodiment may be 3D, 2D, and so on, as required; the embodiment is described taking a 3D digital person as an example.
In the embodiment, the 3D digital person avatars can be obtained from the server and displayed for their respective characters. A deaf-mute viewer can select a favorite digital person avatar for each character from the generated and displayed avatars, as shown in fig. 2, and each avatar is bound one-to-one with its corresponding character, so that every leading and supporting character is bound to a unique digital person, and the binding relation is recorded.
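A minimal sketch of how the one-to-one binding record might be kept, assuming a simple in-memory mapping; the AvatarBinding name and its methods are illustrative and do not come from this disclosure.

class AvatarBinding:
    """Records the one-to-one binding between character IDs and digital person avatars."""

    def __init__(self):
        self._avatar_by_character = {}

    def bind(self, character_id: str, avatar_id: str) -> None:
        # One-to-one: re-binding a character replaces its previous avatar.
        self._avatar_by_character[character_id] = avatar_id

    def avatar_for(self, character_id: str):
        return self._avatar_by_character.get(character_id)

# Usage: the viewer picks one of the candidate avatars generated for each character.
bindings = AvatarBinding()
bindings.bind("character_A", "avatar_A_2")
bindings.bind("character_B", "avatar_B_1")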
During video playback, the video frames can be parsed frame by frame to find the digital person to display, and each frame is checked against the marked start frames. If the current frame is a start frame, the character ID is obtained from the frame information, and the corresponding 3D virtual digital person is obtained from the binding relation between digital persons and characters.
In the embodiment provided by the disclosure, after the correspondence between digital persons and characters is determined in the above manner, the voice content in the video can be analyzed to determine the number of digital persons in the dialogue and their rendering positions.
In the embodiment, starting from a marked frame read in the video, the audio frames of the video are read ahead until two consecutive frames contain no dialogue information; the audio content of those frames is extracted, and the dialogue can be converted into text through an AI speech-to-text function.
For example, if a start frame is detected at frame 500 of the video, a thread is started to quickly parse the video frame content from frame 500 onward and acquire the audio information of each frame. If no dialogue content is detected at frame 800, frame 801 is checked as well; if frame 801 still contains no dialogue, the dialogue is 800 - 500 = 300 frames long. These 300 frames, which may contain the dialogue of one or more characters, are read ahead, their audio information is extracted, and the dialogue content is converted into text through the speech-to-text function.
From the marked frame, the digital person information is obtained, for example whether one or more characters are speaking; the extracted dialogue text is split by character, the speech of each character is extracted, and the number of digital persons to render is determined by the number of speaking characters.
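The pre-read-until-silence loop and the per-character split can be sketched as follows, under stated assumptions: read_audio_frame, has_dialogue, speech_to_text, and split_by_speaker stand in for the frame reader, the dialogue detector, the AI speech-to-text function, and the speaker separation step, none of which this disclosure names concretely.

from typing import Callable, Dict, List, Optional

def collect_dialogue_frames(read_audio_frame: Callable[[int], Optional[object]],
                            has_dialogue: Callable[[object], bool],
                            start_frame: int,
                            silence_run: int = 2) -> List[object]:
    """Read audio frames ahead from a marked start frame until `silence_run`
    consecutive frames (two, in the example above) contain no dialogue."""
    frames: List[object] = []
    silent = 0
    index = start_frame
    while silent < silence_run:
        audio = read_audio_frame(index)
        if audio is not None and has_dialogue(audio):
            silent = 0
            frames.append(audio)
        else:
            silent += 1
        index += 1
    return frames

def dialogue_per_character(frames: List[object],
                           speech_to_text: Callable[[List[object]], str],
                           split_by_speaker: Callable[[str, List[object]], Dict[str, str]]) -> Dict[str, str]:
    text = speech_to_text(frames)           # e.g. the 300 pre-read frames -> one transcript
    per_character = split_by_speaker(text, frames)
    # The number of digital persons to render equals len(per_character).
    return per_character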
In the embodiment, by analyzing the video frame with AI image recognition, it is determined whether the speaking character is a leading or supporting character and which character the speech belongs to; the coordinates of the speaking character's head center relative to the upper left corner of the television screen are determined, the size of the speaking character's head on the screen is calculated, and this information is written into the detected video frame's header information. If the speaking character is detected not to be on screen, the coordinate point is recorded as the origin of coordinates.
For example, as shown in fig. 3, the head coordinates of a speaking character are (x, y), where:
x = size + headWidth / 2
y = top + headHeight / 2
Here, size is the distance between the left side of the character's head and the left side of the screen, top is the distance between the character's head and the top of the screen, headWidth is the width of the character's head, and headHeight is the height of the character's head.
After the character's position in the screen or video frame is determined in the manner described above, the digital person can be rendered and its position on the screen or in the video frame determined.
In the embodiment, the digital person has already been bound to its sign language animation for rendering, and the digital person avatar of the speaking character and the coordinate position of the character's head in each frame are obtained as described above. According to the obtained digital person and the character's dialogue content, the dialogue is analyzed by AI, and the corresponding sign language animation material is obtained from the digital person material information base and bound to the character's digital person. The drawing position of the digital person is then determined from the head coordinates (x, y) of the character in each frame and the character's face orientation.
If the character faces the right side of the screen, then when size is greater than the digital person's rendering width drawWidth, the digital person is rendered to the left of the character's head position, with center point coordinates (drawX, drawY):
drawX = size - drawWidth / 2
drawY = top + drawHeight / 2
Here, drawWidth is the rendering width of the digital person, drawHeight is the rendering height of the digital person, and top is the distance between the character's head and the top of the screen, as above.
In this case, the character faces the right side of the screen and the character's left side has room to display the digital person, so the digital person is displayed on the character's left. This prevents the digital person from obscuring the speaking character and reduces its interference with the played content as much as possible.
If the character faces the left side of the screen, then when the distance right between the right side of the character's head and the right side of the screen is greater than drawWidth, the digital person is rendered to the right of the character's head position, with center point coordinates (drawX, drawY):
drawX = right + headWidth + drawWidth / 2
drawY = top + drawHeight / 2
at this time, the character faces to the left side of the screen, and the right side of the character can accommodate the display space of the digital person, so that the digital person is displayed on the right side of the character, the digital person can be prevented from affecting the speaking of the character, and the interference of the digital person to the playing content can be reduced as much as possible.
If the character faces the screen with the front or back of the head, or the character's head is perpendicular to the edge of the screen, the digital person corresponding to the character is drawn above the speaking character's head; when top > drawHeight, the center point coordinates of the drawn digital person are (drawX, drawY):
drawX = right + drawWidth / 2
drawY = top - drawHeight / 2
if none of the three conditions is satisfied, that is, none of the edge distance determinations is satisfied, drawing a digital person under the head position of the character, and drawing a digital person center point coordinate as (draw x, draw y):
drawX=right+drawWidth/2
drawY=top+drawHeiht+drawHeiht/2
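Putting the four cases together, the placement rule can be sketched as follows; it follows the formulas exactly as written above, and the Head fields (with size, top, and right as defined in the surrounding text) and the facing values are assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class Head:
    size: float    # distance from the screen's left edge to the left side of the head
    top: float     # distance from the screen's top edge to the top of the head
    right: float   # distance from the right side of the head to the screen's right edge
    width: float   # headWidth
    height: float  # headHeight
    facing: str    # "left", "right", or "front" (front or back toward the screen)

def digital_person_center(h: Head, draw_width: float, draw_height: float):
    """Return (drawX, drawY), the digital person's center point, per the cases above."""
    if h.facing == "right" and h.size > draw_width:    # room on the character's left
        return (h.size - draw_width / 2, h.top + draw_height / 2)
    if h.facing == "left" and h.right > draw_width:    # room on the character's right
        return (h.right + h.width + draw_width / 2, h.top + draw_height / 2)
    if h.facing == "front" and h.top > draw_height:    # room above the head
        return (h.right + draw_width / 2, h.top - draw_height / 2)
    # Fallback: none of the edge-distance checks passed, so draw below the head.
    return (h.right + draw_width / 2, h.top + h.height + draw_height / 2)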
As shown in fig. 4, once the sign language animation binding is complete and the digital person's rendering position is determined as in the above embodiment, the digital person's animation is played during video playback, following the video frames at the corresponding position. In fig. 4, the head of character 11 faces right, talking with character 12, whose head faces left; digital person 21 is associated with character 11, and digital person 22 is associated with character 12. Since character 11 is far enough from the left boundary of the video frame to accommodate digital person 21, digital person 21 can be positioned behind character 11; when character 11 speaks, digital person 21 performs the corresponding sign language action, i.e., the sign language animation of digital person 21 is played, and likewise for digital person 22. When the position of character 11 in the video frame changes, the position of digital person 21 changes accordingly so as not to affect the user's normal viewing, and when character 11 is not speaking, digital person 21 is automatically hidden. Of course, the digital person can also follow its character and stay visible in the video, according to the user's needs.
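An illustrative per-frame loop tying the pieces together, reusing digital_person_center and AvatarBinding from the sketches above; track_heads, sign_animation_for, and render_animation are hypothetical stand-ins for the head tracker, the animation lookup, and the renderer, and are not APIs named by this disclosure.

def render_digital_persons(frame, bindings, track_heads, sign_animation_for,
                           render_animation, draw_width, draw_height):
    for character_id, head in track_heads(frame).items():
        animation = sign_animation_for(character_id, frame)
        if animation is None:
            # Character is not speaking in this frame: its digital person stays hidden.
            continue
        avatar = bindings.avatar_for(character_id)
        x, y = digital_person_center(head, draw_width, draw_height)
        render_animation(frame, avatar, animation, center=(x, y))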
Based on the above embodiments, the embodiments provided by the present disclosure can generate digital persons from the character objects of television program content, and parse and extract dialogue content from the program video frames to bind sign language animations, so that each digital person displays its sign language animation following its character's position in the video. Character dialogue content can be extracted from the video frames, sign language bound according to the dialogue content, the rendering position of the digital person determined from the position of the character's head information in the video, and the digital person rendered frame by frame as that position changes.
Moreover, digital persons can be matched to the video characters and selected by the user, giving better operability and a better user experience. In the embodiment, multiple digital persons can be rendered simultaneously according to the characters' positions on the screen and their speech, making it convenient for deaf-mute viewers to follow the video content and distinguish the speakers, and greatly improving their viewing experience of non-news programs. In addition, deaf-mute viewers need no longer be limited to watching news programs that come with sign language animation: the embodiment can be applied to TV series, movies, variety shows, sports, and other video programs, expanding the range of video content available to deaf-mute viewers.
Based on the foregoing embodiment, in still another embodiment provided by the present disclosure, there is further provided a video playing method based on a digital person, as shown in fig. 5, the method may include the following steps:
in step S510, during the playing process of the target video, a target character that currently generates the voice content in the target video is acquired, and a sign language animation corresponding to the voice content is acquired.
In an embodiment, character information of each target character in each video frame of the target video, and the voice content corresponding to each target character, may be acquired. Specifically, the target video generally includes many video frames, and there may be several target characters in the target video; for example, the character IDs of the target characters may be character A, character B, character C, character D, character E, character F, character G, and so on. The character information of a target character may include the character ID of the target character, and may further include an image of the target character, the target position of the target character's head in the video frame, the head orientation of the target character, and the like; the embodiments of the disclosure do not specifically limit the character information of the target character.
In addition to acquiring the character information of the target characters, the voice content corresponding to each target character needs to be acquired, so that the sign language animation of the digital person corresponding to each target character can be obtained accurately in the subsequent steps.
In an embodiment, for each target character, a digital person corresponding to the target character and a target position of the digital person in the target video frame may be determined based on character information of the target character.
Specifically, the character information of the target character may include an image of the target character, the target position of the character's head in the video frame, the head orientation of the character, and the like. The digital person corresponding to the target character may be determined based on the image of the target character; for example, if the target character is female, a digital person representing a female may be drawn. The target position of the digital person in the video frame may also be determined based on the target position of the target character; for example, the digital person of a target character may be drawn beside that character, so that a deaf user can better tell which target character the digital person's sign language animation is translating.
In the embodiment, the voice content corresponding to the target character can be converted into corresponding sign language actions, and the sign language actions are bound to the digital person so that the digital person performs them, yielding the digital person's sign language animation.
In step S520, a digital person matching the target character is acquired.
In an embodiment, since multiple characters may exist in the target video, different characters may produce voice content at different times, for example one character speaking to another. Therefore, so that deaf-mute viewers know immediately which character is speaking, the corresponding digital person can be obtained, and each speaking character can be represented by a different digital person avatar; when a target character speaks, the corresponding digital person is matched from among the digital persons, and the corresponding sign language animation is displayed through it.
In step S530, a sign language animation corresponding to the digital person is displayed in the target video.
In the embodiment provided by the disclosure, the sign language animation can be displayed through the digital person corresponding to the target character, so that the audience immediately obtains the target character's speech through the sign language animation; and since different characters correspond to different digital persons, the audience can also tell which character is speaking from the corresponding digital person, which can greatly improve the user's video viewing experience.
In addition, in the embodiment, the character information corresponding to the target character in the target video can be acquired, the position of the digital person in the target video determined based on that character information, and the sign language animation corresponding to the digital person displayed at the target position in the target video.
After the position of the digital person in the video frame of the target video and the digital person's sign language animation are determined, the sign language animation can be played at the target position in the target video frame. In an embodiment, the target frame may be a video frame in which the target character appears and the digital person has corresponding voice content, that is, the target character is speaking. The target position can be a specific position in the target video frame that does not overlap the target character's position, so that the digital person's avatar is not superimposed on the target character and does not interfere with the user's normal viewing. Further, the target position may also be a position relative to the target character in the target video frame, for example behind the target character or another unobtrusive position, so as to avoid affecting the user's normal viewing.
According to the digital person-based video playing method provided by the embodiments of the present disclosure, by acquiring the character information of the target character in the target video and the corresponding voice content, the digital person corresponding to the target character, the digital person's target position in the target video frame, and the sign language animation corresponding to the voice content are determined, so that during playback the digital person's sign language animation is played at the target position in the target video frame. Because the digital person corresponding to the target character and the position of its sign language animation in the target video frame can be determined, the embodiments of the application diversify digital person avatars, allow the user to promptly tell which character is speaking from the displayed digital person, and, by playing the sign language animation at the target position of the target video frame, prevent the digital person from interfering with the user's normal viewing of the video.
Based on the above embodiment, in still another embodiment provided in the present disclosure, the step S510 may specifically further include the following steps:
in step S511, the audio content of the target video is acquired.
In step S512, it is determined whether the audio content includes the voice content of the target character.
In step S513, in the case where the audio content includes the voice content of the target character, character information of the target character in the target video frame of the target video is acquired.
In an embodiment, the audio content may be analyzed, for example using audio features such as frequency and amplitude, to identify whether it includes the speech of a target character; if so, the position of the target character's speech in the audio can be located and mapped to the corresponding position in the target video. If several target characters are detected in the audio content, the character information of each target character, such as character A, character B, and so on, may be recorded in the target video, together with the positions of the video frames in which each target character appears, as shown in fig. 1; the details are described in the above embodiments and are not repeated here.
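One possible realization of the frequency/amplitude check, offered only as an assumption of how such a detector might look: a frame counts as dialogue when its RMS amplitude exceeds a floor and enough of its energy lies in a typical speech band. All thresholds are illustrative, not values from this disclosure.

import numpy as np

def has_dialogue(samples: np.ndarray, sample_rate: int,
                 rms_floor: float = 0.01,
                 speech_band: tuple = (85.0, 4000.0),
                 band_ratio: float = 0.5) -> bool:
    """Heuristic dialogue detector over one audio frame of float samples in [-1, 1]."""
    rms = float(np.sqrt(np.mean(samples ** 2)))
    if rms < rms_floor:
        return False  # too quiet to be speech
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    in_band = spectrum[(freqs >= speech_band[0]) & (freqs <= speech_band[1])].sum()
    return bool(in_band / max(spectrum.sum(), 1e-12) >= band_ratio)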
Based on the above embodiment, in yet another embodiment provided by the present disclosure, the method may further include the steps of:
Step S541, acquire the audio content of the target video.
In step S542, in the case where it is determined that the target video contains the target character based on the audio content, a start frame and an end frame of the target character in the target video are acquired, and a video frame between the start frame and the end frame is taken as a target video frame.
In step S543, the audio content corresponding to the target video frame is converted into text content.
Step S544, splitting the text content, determining the number of target roles in the target video, and obtaining the voice content corresponding to each target role.
In an embodiment, several target video frames, that is, start frames in which a target character begins to speak, may be marked in the target video. To obtain the speech of each target character, the audio content of a preset number of video frames may be read ahead from a target video frame until two consecutive frames contain no character dialogue; the read audio content is then converted into text through the AI speech-to-text function, and finally the text is split to obtain the text corresponding to each target character, where the text corresponding to a target character is that character's spoken voice content.
According to the technical scheme provided by the embodiments of the present disclosure, by pre-reading the voice content of a preset number of video frames, the voice content corresponding to each target character can be determined accurately, so that a sign language animation can be prepared in advance for each target character based on that voice content; this prevents the digital person's sign language animation from lagging during video playback.
Based on the above embodiment, in yet another embodiment provided by the present disclosure, the method may further include the steps of:
step S521, acquiring the character image of the target character, the head position of the head of the target character in the target video frame, and the head orientation of the target character.
Step S522, digital person resource materials matched with the character image are obtained, and the obtained digital person resource materials are utilized to generate digital persons corresponding to the target character.
Step S523, determining the target position of the digital person corresponding to the target character in the target video frame based on the head position of the head of the target character in the target video frame and the head orientation of the target character.
In an embodiment, an image of each target character may be acquired and uploaded to a server; the server obtains an image recognition result through AI image recognition and can acquire materials from the 3D digital person assembly resource material library to generate a digital person for each target character.
In the embodiment, the acquired digital person assembly resource materials can be combined to obtain several digital persons corresponding to the target character. A selection trigger operation of the target user on a digital person is received, and in response to the selection trigger operation, the digital person corresponding to the target character is determined from among the several digital persons. For example, images of the leading and supporting characters in the video may be acquired and uploaded to a server; through AI image recognition, the server obtains materials from the 3D digital person assembly resource material library, generates several relatively similar digital person avatars for each character, and binds them to the character ID. The user can edit a generated avatar, or a corresponding avatar can be generated according to the user's selection or operation. Thus the user can choose among several digital persons, and when a selection trigger operation on a digital person is received, the corresponding digital person is determined for the user.
In the embodiment, the target position of the digital person corresponding to the target character in the target video frame can be determined from the head position of the target character in the target video frame and the head orientation of the target character: the direction the head faces, for example left or right, together with the distance of the target character from the boundary, determines the position of the digital person in the target video frame.
For example, in the case that the head position of the target character meets the first preset condition, determining that the position of the digital person corresponding to the target character is located at the first side of the target character; the first preset condition comprises that the head direction of the target character is the second side, and the first distance is larger than the drawing width of the digital person corresponding to the target character; the first side is opposite to the second side; the first distance is a distance of a head of the target character from a first side boundary of the target video frame. Determining a target position of the digital person corresponding to the target role in the target video frame based on the first distance, the second distance, the drawing width of the digital person and the drawing height of the digital person; the second distance is the distance between the head of the target character and the top boundary of the target video frame.
In the embodiments, the first side corresponds to the left side in the above embodiments, and the second side corresponds to the right side in the above embodiments. If the target character faces to the right side of the screen, and the left side of the target character can accommodate the display space of the digital person, the digital person is displayed on the left side of the target character, so that the digital person can be prevented from affecting the speaking of the character, and the interference of the digital person to the played content can be reduced as much as possible. If the target character faces to the left side of the screen and the right side of the target character can accommodate the display space of the digital person, the digital person is displayed on the right side of the character, so that the digital person can be prevented from affecting the speaking of the character and the interference of the digital person to the playing content can be reduced as much as possible.
In an embodiment, in the case where the head position of the target character meets a second preset condition, it is determined that the position of the digital person corresponding to the target character is above the target character; the second preset condition is that the head orientation is perpendicular to the plane of the screen and the second distance is greater than the drawing height of the digital person corresponding to the target character, the second distance being the distance between the head of the target character and the top boundary of the target video frame. The target position of the digital person corresponding to the target character in the target video frame is determined based on the first distance, the second distance, the drawing width of the digital person, and the drawing height of the digital person, the first distance being the distance between the head of the target character and the first side boundary of the target video frame. If the target character faces the screen with the front or back, or the character's head is perpendicular to the edge of the screen, the digital person corresponding to the target character is drawn above the target character, which prevents the digital person from obscuring the normally played video content.
In the case where each functional module is divided according to its corresponding function, an embodiment of the disclosure provides a digital person-based video playing device, which can be a server or a chip applied in a server. Fig. 6 is a schematic block diagram of the functional modules of a digital person-based video playing device according to an exemplary embodiment of the present disclosure.
As shown in fig. 6, the digital person-based video playback apparatus includes:
the first information acquisition module 10 is configured to, during playback of a target video, acquire the target role currently producing voice content in the target video and acquire a sign language animation corresponding to the voice content;
the second information acquisition module 20 is configured to acquire a digital person matched with the target role;
and the display module 30 is configured to display the sign language animation corresponding to the digital person in the target video.
In yet another embodiment provided by the present disclosure, the display module is specifically configured to:
acquiring role information corresponding to the target role in the target video;
and determining the position of the digital person in the target video based on the role information, and displaying the sign language animation corresponding to the digital person at the target position of the target video.
In yet another embodiment provided by the present disclosure, the first information obtaining module is specifically configured to:
acquiring the audio content of the target video;
determining whether the audio content includes voice content of a target character;
and acquiring the role information of the target role in the target video frame of the target video under the condition that the audio content comprises the voice content of the target role.
In yet another embodiment provided by the present disclosure, the apparatus further comprises:
the audio content acquisition module is used for acquiring the audio content of the target video;
the video frame acquisition module is used for acquiring a start frame and an end frame of the target role in the target video and taking a video frame between the start frame and the end frame as a target video frame under the condition that the target video contains the target role based on the audio content;
the conversion module is used for converting the audio content corresponding to the target video frame into text content;
and the content splitting module is used for splitting the text content, determining the number of the target roles in the target video and obtaining the voice content corresponding to each target role.
In yet another embodiment provided by the present disclosure, the apparatus further includes a processing module, specifically configured to:
acquiring a role image of the target role, a head position of a head of the target role in a target video frame and a head orientation of the target role;
acquiring digital human resource materials matched with the character image, and generating digital people corresponding to the target character by utilizing the acquired digital human resource materials;
and determining the target position of the digital person corresponding to the target role in the target video frame based on the head position of the head of the target role in the target video frame and the head orientation of the target role.
In a further embodiment provided by the present disclosure, the processing module is specifically further configured to:
under the condition that the head position of the target role meets a first preset condition, determining that the position of the digital person corresponding to the target role is located at the first side of the target role; the first preset condition comprises that the head direction of the target role is a second side, and the first distance is larger than the drawing width of the digital person corresponding to the target role; the first side is opposite to the second side; the first distance is the distance between the head of the target character and the first side boundary of the target video frame;
determining a target position of the digital person corresponding to the target role in a target video frame based on the first distance, the second distance, the drawing width of the digital person and the drawing height of the digital person; the second distance is a distance between the head of the target character and the top boundary of the target video frame.
In a further embodiment provided by the present disclosure, the processing module is specifically further configured to:
under the condition that the head position of the target role meets a second preset condition, determining that the position of the digital person corresponding to the target role is positioned at the top of the target role; the second preset condition is that the head direction is perpendicular to the plane where the screen is located, and the second distance is larger than the drawing height of the digital person corresponding to the target role; the second distance is the distance between the head of the target character and the top boundary of the target video frame;
determining a target position of the digital person corresponding to the target role in a target video frame based on the first distance, the second distance, the drawing width of the digital person and the drawing height of the digital person; the first distance is a distance of a head of the target character from a first side boundary of the target video frame.
In a further embodiment provided by the present disclosure, the processing module is specifically further configured to:
synthesizing the acquired digital person assembly resource materials to obtain a plurality of digital persons corresponding to the target roles;
and receiving a selection triggering operation of a target user for the digital person, and responding to the selection triggering operation to determine the digital person corresponding to the target role from the plurality of digital persons.
For the device parts, reference may be made specifically to the description of the above embodiments, and details are not repeated here.
According to the digital person-based video playing device provided by the embodiments of the present disclosure, the role information of the target role in the target video and the voice content corresponding to the target role are acquired; the digital person corresponding to the role information and the sign language animation corresponding to the voice content are acquired, so that during playback the sign language animation corresponding to the digital person is displayed at the target position of the target video. By obtaining the digital person corresponding to the target role information and the corresponding sign language animation in the target video, the digital person avatars can be diversified, the user can promptly tell which role is speaking from the displayed digital person, and, because the digital person's sign language animation is played at the target position of the target video frame, the digital person is prevented from interfering with the user's normal viewing of the video.
The embodiment of the disclosure also provides an electronic device, including: at least one processor; a memory for storing the at least one processor-executable instruction; wherein the at least one processor is configured to execute the instructions to implement the above-described methods disclosed by embodiments of the present disclosure.
Fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in fig. 7, the electronic device 1800 includes at least one processor 1801 and a memory 1802 coupled to the processor 1801, the processor 1801 may perform corresponding steps in the above-described methods disclosed by embodiments of the present disclosure.
The processor 1801 may also be referred to as a central processing unit (CPU), and may be an integrated circuit chip with signal processing capabilities. The steps of the above-described methods disclosed in the embodiments of the present disclosure may be accomplished by integrated logic circuits of hardware in the processor 1801 or by instructions in the form of software. The processor 1801 may be a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the methods disclosed in connection with the embodiments of the present disclosure may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The processor 1801 reads the information in the memory 1802 and, in combination with its hardware, performs the steps of the methods described above.
In addition, in the case of being implemented by software and/or firmware, the various operations/processes according to the present disclosure may be installed from a storage medium or a network onto a computer system having a dedicated hardware structure, for example the computer system 1900 shown in fig. 8, which is capable of performing various functions, including the functions described above, when the various programs are installed. Fig. 8 is a block diagram of a computer system according to an exemplary embodiment of the present disclosure.
Computer system 1900 is intended to represent various forms of digital electronic computing devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. It may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the computer system 1900 includes a computing unit 1901, and the computing unit 1901 can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1902 or a computer program loaded from a storage unit 1908 into a Random Access Memory (RAM) 1903. In the RAM 1903, various programs and data required for the operation of the computer system 1900 may also be stored. The computing unit 1901, ROM 1902, and RAM 1903 are connected to each other via a bus 1904. An input/output (I/O) interface 1905 is also connected to bus 1904.
Various components in the computer system 1900 are connected to the I/O interface 1905, including an input unit 1906, an output unit 1907, the storage unit 1908, and a communication unit 1909. The input unit 1906 may be any type of device capable of inputting information to the computer system 1900; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 1907 may be any type of device capable of presenting information and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1908 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 1909 allows the computer system 1900 to exchange information/data with other devices over a network, such as the Internet, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver and/or chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 1901 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1901 performs the various methods and processes described above. For example, in some embodiments, the methods disclosed in the embodiments of the present disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the computer system 1900 via the ROM 1902 and/or the communication unit 1909. In some embodiments, the computing unit 1901 may be configured to perform the methods of the disclosed embodiments by any other suitable means (for example, by means of firmware).
The disclosed embodiments also provide a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the methods disclosed in the embodiments of the present disclosure described above.
A computer-readable storage medium in the embodiments of the present disclosure may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specifically, the computer-readable storage medium may include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer-readable medium may be included in the electronic device, or it may exist separately without being incorporated into the electronic device.
The disclosed embodiments also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the above-described methods of the disclosed embodiments.
In the embodiments of the present disclosure, computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules, components, or units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a module, component, or unit does not constitute a limitation of the module, component, or unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The above description is merely illustrative of some embodiments of the present disclosure and of the principles of the applied technology. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of the features described above, and also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A digital person-based video playback method, the method comprising:
during playback of a target video, acquiring a target character that produces the voice content currently occurring in the target video, and acquiring a sign language animation corresponding to the voice content;
acquiring a digital person matched with the target character;
and displaying, in the target video, the sign language animation corresponding to the digital person.
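(Illustration only, not part of the claims.) The flow of claim 1 can be pictured as a minimal Python sketch. The Segment type, the library lookup in match_digital_person, and the sign()/print() placeholders for sign-language synthesis and rendering are all assumptions, since the claim fixes no data structures or APIs:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    voice_content: str           # speech transcribed for this span of the video
    character_id: Optional[str]  # character producing the speech, if any

def match_digital_person(character_id, library):
    # Assumed lookup: the library maps character ids to digital person assets.
    return library[character_id]

def play_with_sign_language(segments, library):
    """For each speech segment, overlay a sign language animation performed
    by a digital person matched to the speaking character."""
    for seg in segments:
        if seg.character_id is None:
            continue  # no voice content in this segment, nothing to overlay
        person = match_digital_person(seg.character_id, library)
        animation = f"sign({seg.voice_content})"           # synthesis placeholder
        print(f"overlay {person} performing {animation}")  # rendering placeholder
```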
2. The method of claim 1, wherein displaying the sign language animation corresponding to the digital person in the target video comprises:
acquiring character information corresponding to the target character in the target video;
and determining a target position of the digital person in the target video based on the character information, and displaying the sign language animation corresponding to the digital person at the target position in the target video.
3. The method according to claim 2, wherein the method further comprises:
acquiring the audio content of the target video;
determining whether the audio content includes voice content of a target character;
and acquiring character information of the target character in a target video frame of the target video in a case where the audio content includes the voice content of the target character.
4. The method according to claim 1, wherein the method further comprises:
acquiring the audio content of the target video;
in a case where it is determined, based on the audio content, that the target video contains the target character, acquiring a start frame and an end frame of the target character in the target video, and taking the video frames between the start frame and the end frame as target video frames;
converting the audio content corresponding to the target video frames into text content;
and splitting the text content to determine the number of target characters in the target video and to obtain the voice content corresponding to each target character.
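(Illustration only, not part of the claims.) A minimal sketch of the audio-driven steps of claim 4, assuming the audio has already been transcribed and attributed to speakers by hypothetical ASR and diarization stages; the claim itself only requires converting the audio to text and splitting the text per character:

```python
from dataclasses import dataclass

@dataclass
class SpeechSpan:
    speaker: str    # character attributed to this span (diarization assumed)
    start_s: float  # span start in seconds
    end_s: float    # span end in seconds
    text: str       # transcribed text (ASR assumed)

def target_frame_range(spans, fps):
    """Map the earliest start and latest end of the detected speech to a
    start frame and an end frame; the frames in between are the target
    video frames of claim 4."""
    start = min(s.start_s for s in spans)
    end = max(s.end_s for s in spans)
    return int(start * fps), int(end * fps)

def split_by_character(spans):
    """Group the transcribed text by speaking character; the number of keys
    is the number of target characters in the target video."""
    per_character = {}
    for s in spans:
        per_character[s.speaker] = (per_character.get(s.speaker, "") + " " + s.text).strip()
    return per_character
```

For example, `split_by_character([SpeechSpan("A", 1.0, 2.5, "hello"), SpeechSpan("B", 3.0, 4.0, "hi")])` yields one text per character, so the number of keys gives the number of target characters.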
5. The method according to any one of claims 1 to 4, further comprising:
acquiring a character image of the target character, a head position of the target character's head in a target video frame, and a head orientation of the target character;
acquiring digital person resource materials matched with the character image, and generating a digital person corresponding to the target character using the acquired digital person resource materials;
and determining a target position of the digital person corresponding to the target character in the target video frame based on the head position of the target character in the target video frame and the head orientation of the target character.
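(Illustration only, not part of the claims.) Claim 5 leaves the matching of resource materials to the character image unspecified; one plausible reading is a nearest-neighbour search over appearance feature vectors, as in the following sketch, where the feature extraction producing the vectors is assumed:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_materials(character_vec, materials):
    """Pick the digital person resource material whose appearance vector is
    most similar to the target character's image features."""
    return max(materials, key=lambda name: cosine(character_vec, materials[name]))
```

For example, `match_materials([0.9, 0.1], {"young_male": [0.8, 0.2], "old_female": [0.1, 0.9]})` returns "young_male".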
6. The method of claim 5, wherein determining the target position of the digital person corresponding to the target character in the target video frame based on the head position of the target character in the target video frame and the head orientation of the target character comprises:
in a case where the head position of the target character satisfies a first preset condition, determining that the position of the digital person corresponding to the target character is on a first side of the target character, wherein the first preset condition includes that the head orientation of the target character points to a second side and that a first distance is greater than a drawing width of the digital person corresponding to the target character, the first side is opposite to the second side, and the first distance is the distance between the head of the target character and the first-side boundary of the target video frame;
and determining the target position of the digital person corresponding to the target character in the target video frame based on the first distance, a second distance, the drawing width of the digital person, and a drawing height of the digital person, wherein the second distance is the distance between the head of the target character and the top boundary of the target video frame.
7. The method of claim 5, wherein determining the target position of the digital person corresponding to the target character in the target video frame based on the head position of the target character in the target video frame and the head orientation of the target character comprises:
in a case where the head position of the target character satisfies a second preset condition, determining that the position of the digital person corresponding to the target character is above the target character, wherein the second preset condition includes that the head orientation is perpendicular to the plane of the screen and that a second distance is greater than a drawing height of the digital person corresponding to the target character, the second distance being the distance between the head of the target character and the top boundary of the target video frame;
and determining the target position of the digital person corresponding to the target character in the target video frame based on a first distance, the second distance, a drawing width of the digital person, and the drawing height of the digital person, wherein the first distance is the distance between the head of the target character and the first-side boundary of the target video frame.
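(Illustration only, not part of the claims.) The placement conditions of claims 6 and 7 can be sketched as pure geometry. Here the first side is taken to be the left edge of the frame (so the second side is the right), the head position is given in pixels from the left and top boundaries, and the returned coordinates are the top-left corner of the drawn digital person; all of these conventions are assumptions, as the claims fix only the conditions and the quantities involved:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Layout:
    head_x: int        # head position: pixels from the left (first-side) boundary
    head_y: int        # head position: pixels from the top boundary
    orientation: str   # "right" (toward the second side) or "screen" (facing the viewer)
    draw_w: int        # drawing width of the digital person
    draw_h: int        # drawing height of the digital person

def place_digital_person(l: Layout) -> Optional[Tuple[int, int]]:
    first_distance = l.head_x    # head to the first-side boundary (claim 6)
    second_distance = l.head_y   # head to the top boundary (claim 7)
    # Claim 6: the head faces the second side and the first side has enough room.
    if l.orientation == "right" and first_distance > l.draw_w:
        return (first_distance - l.draw_w, max(0, second_distance - l.draw_h // 2))
    # Claim 7: the head faces the screen and there is enough room above the head.
    if l.orientation == "screen" and second_distance > l.draw_h:
        return (max(0, first_distance - l.draw_w // 2), second_distance - l.draw_h)
    return None  # neither preset condition holds; fall back to a default placement
```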
8. The method of claim 5, wherein generating the digital person corresponding to the target character using the acquired digital person resource materials comprises:
synthesizing the acquired digital person resource materials to obtain a plurality of digital persons corresponding to the target character;
and receiving a selection trigger operation of a target user and, in response to the selection trigger operation, determining the digital person corresponding to the target character from the plurality of digital persons.
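(Illustration only, not part of the claims.) Claim 8's candidate synthesis and user selection reduce to generating several assembled variants and honouring a selection trigger; the string-based assembly and the input() prompt below are stand-ins for the real asset pipeline and player UI:

```python
def assemble_candidates(materials, n=3):
    # Assumed assembly step: combine the matched materials into n candidates.
    return [f"digital_person_{i}({'+'.join(materials)})" for i in range(n)]

def select_digital_person(candidates):
    for i, c in enumerate(candidates):
        print(f"[{i}] {c}")
    choice = int(input("Select a digital person: "))  # the selection trigger operation
    return candidates[choice]
```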
9. A digital person-based video playback device, the device comprising:
a first information acquisition module, configured to, during playback of a target video, acquire a target character that produces the voice content currently occurring in the target video and acquire a sign language animation corresponding to the voice content;
a second information acquisition module, configured to acquire a digital person matched with the target character;
and a display module, configured to display, in the target video, the sign language animation corresponding to the digital person.
10. An electronic device, comprising:
at least one processor;
a memory for storing instructions executable by the at least one processor;
wherein the at least one processor is configured to execute the instructions to implement the method of any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310880242.3A CN117041645A (en) 2023-07-18 2023-07-18 Video playing method and device based on digital person, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310880242.3A CN117041645A (en) 2023-07-18 2023-07-18 Video playing method and device based on digital person, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117041645A true CN117041645A (en) 2023-11-10

Family

ID=88632643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310880242.3A Pending CN117041645A (en) 2023-07-18 2023-07-18 Video playing method and device based on digital person, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117041645A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination