CN112988100A - Video playing method and device - Google Patents

Video playing method and device

Info

Publication number
CN112988100A
CN112988100A (application CN202110382925.7A)
Authority
CN
China
Prior art keywords
sentence
audio
text
keyword
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110382925.7A
Other languages
Chinese (zh)
Inventor
胡其斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhangmen Science and Technology Co Ltd
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhangmen Science and Technology Co Ltd filed Critical Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN202110382925.7A priority Critical patent/CN112988100A/en
Publication of CN112988100A publication Critical patent/CN112988100A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14: Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F 3/1407: General aspects irrespective of display type, e.g. determination of decimal point position, display with fixed or driving decimal point, suppression of non-significant zeros
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/74: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a video playing method and device, relating to the field of artificial intelligence, including computer vision and deep learning. A specific implementation comprises the following steps: acquiring a text, determining keywords of the text, and acquiring, from an animation set, the animations indicated by the animation tags matching the keywords; for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence; and ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction. The method and device can convert text into animations in real time to realize video playing, so that text is turned into a multimedia form in real time, enriching the user's visual experience. In addition, matching on keywords yields animations that accurately fit the text, improving the playing effect of the video.

Description

Video playing method and device
Technical Field
The application relates to the field of computer technology, in particular to artificial intelligence including computer vision and deep learning, and specifically to a video playing method and device.
Background
Text is a common form of static presentation, typically a sentence or a combination of sentences carrying a complete, coherent meaning.
With the increasing performance of electronic device hardware, real-time synthesis technology that converts text into sound has become practical. In the related art, a server can synthesize audio from text through Text-To-Speech (TTS) technology, so that text can be converted into another output form and delivered as audio.
Disclosure of Invention
A video playing method, a video playing device, an electronic device and a storage medium are provided.
According to a first aspect, a video playing method is provided, comprising: acquiring a text, determining keywords of the text, and acquiring, from an animation set, the animations indicated by the animation tags matching the keywords, wherein each sentence of the text has at least one keyword; for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence; and ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
According to a second aspect, a video playing apparatus is provided, comprising: an acquiring unit configured to acquire a text, determine keywords of the text, and acquire, from an animation set, the animations indicated by the animation tags matching the keywords, wherein each sentence of the text has at least one keyword; a rendering unit configured to, for each sentence in the order of the sentences in the text, acquire and play the audio corresponding to the sentence, and render, from the acquired animations, the animation corresponding to the keyword of the sentence; and an ending unit configured to end the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
According to a third aspect, an electronic device is provided, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any embodiment of the video playing method.
According to a fourth aspect, a non-transitory computer-readable storage medium is provided, storing computer instructions for causing a computer to perform the method of any embodiment of the video playing method.
According to a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of any embodiment of the video playing method.
According to the scheme of the application, text can be converted into animations in real time to realize video playing, turning the static, on-page display of an article into dynamic video playing, so that text is converted into a multimedia form in real time, enriching the user's visual experience. In addition, matching on keywords yields animations that accurately fit the text, improving the playing effect of the video.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of playing a video according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a method of playing a video according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method of playing a video according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a video playing apparatus according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing the video playing method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below through its embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of a video playback method or a video playback apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, for example a background server supporting the terminal devices 101, 102, and 103. The background server may analyze and otherwise process received data such as the text, and feed back a processing result (e.g., an animation) to the terminal device.
It should be noted that the video playing method provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103, and accordingly, the video playing apparatus may be disposed in the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of playing a video according to the present application is shown. The video playing method comprises the following steps:
step 201, obtaining a text, determining keywords of the text, and obtaining an active graph indicated by an active graph label matched with the keywords in an active graph set, wherein the keywords exist in each sentence of the text.
In this embodiment, an execution subject (for example, the terminal device shown in fig. 1) on which the video playing method is executed may acquire a text and determine a keyword of the text. After that, the execution body may obtain an animation corresponding to the keyword. The execution main body can extract the keywords of the text in the device, and can also send the text to other electronic devices (such as a server) and receive the keywords returned by the other electronic devices.
The corresponding dynamic graph of the keyword refers to the dynamic graph indicated by the dynamic graph label matched with the keyword. The execution main body or other electronic equipment can match the keywords with the moving picture labels in the moving picture set, so as to obtain a matching result. The execution body may obtain an animation indicated by the matching animation tag in the matching result.
Corresponding keywords may exist in each sentence, and the keyword existing in each sentence may be at least one. The device or other electronic devices can have a picture set, and the pictures in the picture set have corresponding keywords. For example, the keywords of a plurality of pictures for playing basketball may be "playing basketball", and may also be "playing basketball" or "sports".
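For illustration only, the animation set described here can be thought of as a small tag-indexed library. The sketch below makes that concrete; the class name, tags, and file paths are assumptions for the example, not part of the patent.
```python
from dataclasses import dataclass

@dataclass
class AnimationEntry:
    tag: str   # animation tag, e.g. "playing basketball"
    path: str  # path to the animated-image file

# Illustrative animation set; tags and paths are invented for the example.
ANIMATION_SET = [
    AnimationEntry("playing basketball", "anims/basketball.gif"),
    AnimationEntry("smiling", "anims/smile.gif"),
    AnimationEntry("sports", "anims/sports.gif"),
]

def exact_matches(keyword):
    """Return every animation whose tag equals the keyword."""
    return [e for e in ANIMATION_SET if e.tag == keyword]
```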
Step 202, for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence.
In this embodiment, the execution body may, following the order (i.e., reading order) of the sentences in the text, acquire for each sentence the audio corresponding to it and play that audio. The execution body may also render the animation corresponding to the sentence's keyword, so that the animation is displayed while the audio plays, thereby realizing video playing. The rendered animation is one of the acquired animations.
Step 203, ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
In this embodiment, the execution body may end the playing flow of the video when it determines that the animations corresponding to the keywords of all sentences have finished rendering, or when a play-stop instruction is received.
The method provided by this embodiment can convert text into animations in real time to realize video playing, turning the static, on-page display of an article into dynamic video playing, so that text is converted into a multimedia form in real time, enriching the user's visual experience. In addition, matching on keywords yields animations that accurately fit the text, improving the playing effect of the video.
In some optional implementations of this embodiment, the matching is semantic fuzzy matching, and the matching result may be generated by: acquiring the semantics of the keyword and the semantics of the animation tags in the animation set; and determining, among the animation tags of the animation set, the tags whose semantics are the same as or similar to those of the keyword as the matching animation tags.
In these optional implementations, the execution body or another electronic device may acquire the semantics of the keyword and of the tags in the animation set, and determine the matching tags by semantics. Fuzzy matching here means that, among all tags in the animation set, those semantically identical or similar to the keyword are taken as matches.
"Semantically similar" may mean that the similarity between the keyword and a tag exceeds a preset threshold, or that tags are selected in descending order of their similarity to the keyword.
By matching semantically, these implementations can increase the accuracy of the animations matched to the keywords.
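As a concrete, non-authoritative reading of this semantic fuzzy matching, one could embed the keyword and the tags and compare them by cosine similarity. The sketch below uses the sentence-transformers library and a multilingual model purely as an example; the patent does not name a model, and the 0.6 threshold is an assumption.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def match_tags(keyword, tags, threshold=0.6):
    """Return tags semantically the same as or similar to the keyword."""
    kw_vec = model.encode(keyword, convert_to_tensor=True)
    tag_vecs = model.encode(tags, convert_to_tensor=True)
    sims = util.cos_sim(kw_vec, tag_vecs)[0]  # cosine similarity per tag
    # Keep tags above the threshold, best first; the patent equally allows
    # taking tags in descending order of similarity (a top-k selection).
    ranked = sorted(zip(tags, sims.tolist()), key=lambda p: p[1], reverse=True)
    return [tag for tag, score in ranked if score >= threshold]
```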
In some optional implementations of this embodiment, the animations in the animation set may be generated by: acquiring an avatar, and making an animation from a plurality of consecutive expression-action pictures of the avatar, where the expression-action pictures present the avatar's facial expressions and/or body actions; and adding an animation tag to the animation, the tagged animation being taken as one animation in the animation set, where the tag indicates the expressions and/or actions presented in the consecutive expression-action pictures.
In these alternative implementations, the execution body or another electronic device may carry out this generation step. Taking the execution body as an example: when generating an animation, it may acquire a plurality of pictures showing the avatar's facial expressions and/or body actions, i.e., the expression-action pictures. Taken together, these pictures present an avatar with continuous facial expressions and/or body movements. The execution body may also add tags to the produced animation, the tags indicating the expressions and/or actions it presents, such as "smiling" or "playing basketball".
These implementations can generate animations with expressive actions, together with tags that vividly describe what the avatar in the animation presents.
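A minimal sketch of this generation step, assuming the frames already exist as image files: Pillow can assemble consecutive expression-action frames into a single animated image, and the tag can be recorded alongside it. The frame paths, timing, and tag are illustrative assumptions.
```python
from PIL import Image

def make_tagged_animation(frame_paths, out_path, tag):
    # Load the consecutive expression-action frames of the avatar.
    frames = [Image.open(p).convert("RGB") for p in frame_paths]
    # save_all + append_images writes all frames into one animated image.
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=100, loop=0)  # 100 ms per frame, loop forever
    # The tag indicates the expression/action the frames present.
    return {"tag": tag, "path": out_path}

entry = make_tagged_animation(
    [f"avatar/smile_{i:02d}.png" for i in range(8)],  # assumed frame files
    "anims/smile.gif",
    tag="smiling",
)
```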
In some optional implementations of this embodiment, the rendering in step 202 of the animation corresponding to the keyword of the sentence may include: rendering, from the acquired animations, the animation corresponding to the keyword of the sentence, and rendering the sentence as a subtitle on an upper layer of the animation.
In these optional implementations, the execution body may render not only the animation corresponding to the sentence's keyword, but also the sentence itself as the subtitle of the rendered animation, i.e., the subtitle of the video, drawn on a layer above the animation.
These implementations can display, for each animation, the subtitle that corresponds to it, explaining the animation through the subtitle and thereby helping the viewer understand the video content.
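A sketch of the subtitle overlay, assuming Pillow is used for compositing. The font file, size, and placement are assumptions; the patent only requires that the sentence be rendered as a subtitle on a layer above the animation.
```python
from PIL import Image, ImageDraw, ImageFont

def draw_subtitle(frame, sentence):
    """Composite the sentence as a subtitle onto one animation frame."""
    frame = frame.convert("RGBA")
    overlay = Image.new("RGBA", frame.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    # Font file and size are assumptions; any CJK-capable font would do.
    font = ImageFont.truetype("NotoSansCJK-Regular.ttc", 24)
    w, h = frame.size
    # Center the subtitle near the bottom of the upper layer.
    draw.text((w // 2, h - 40), sentence, font=font,
              fill=(255, 255, 255, 255), anchor="mm")
    return Image.alpha_composite(frame, overlay)
```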
In some optional implementations of this embodiment, the rendering in step 202 of the animation corresponding to the keyword of the sentence may include: acquiring a compressed texture file package comprising the texture features of the animation; and decompressing the compressed texture file package, binding the animation with the decompressed texture, and rendering the binding result.
In these alternative implementations, the execution body may acquire (e.g., generate in advance) the compressed texture file package, and may render the animation by parsing the texture package and binding the animation with its texture.
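The patent does not specify the texture package format or the rendering API, so the following is only a hedged sketch: it assumes the package is a ZIP archive with one texture per frame and a hypothetical TextureRenderer object standing in for whatever graphics layer the terminal uses.
```python
import io
import zipfile

def render_from_texture_package(package_bytes, renderer):
    """Decompress a texture package and render each bound frame."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        for name in sorted(pkg.namelist()):      # one archive entry per frame
            texture = pkg.read(name)             # decompressed texture bytes
            tex_id = renderer.upload(texture)    # hypothetical: create GPU texture
            renderer.bind(tex_id)                # bind the animation frame's texture
            renderer.draw_frame()                # render the binding result
```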
In some optional implementations of this embodiment, the number of keywords of each sentence is at least one, and the acquiring of the animations indicated by the matching animation tags includes: sending an image request including the keywords to a server, where the server searches the animation set for the animation whose tag has the highest matching degree with the keywords and returns it; and receiving the animation returned by the server as the animation indicated by the matching tag.
In these alternative implementations, the execution body may send an image request including the keywords to the server, and the server may determine the animation corresponding to the keywords. The server may determine at least one best-matching animation; when more than one animation is returned, the execution body may select among the returned results.
These implementations let the server carry out the animation search, which consumes considerable computing resources, thereby improving the efficiency of acquiring animations.
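A sketch of the terminal-side request, assuming a JSON-over-HTTP interface. The endpoint URL and field names are invented for the example; the patent only states that an image request carrying the keywords is sent and the best-matching animation is returned.
```python
import requests

def request_animation(keywords):
    """Ask the server for the animation whose tag best matches the keywords."""
    resp = requests.post(
        "https://example-server/api/animation/match",  # hypothetical endpoint
        json={"keywords": keywords},                   # at least one per sentence
        timeout=5,
    )
    resp.raise_for_status()
    return resp.content  # bytes of the highest-scoring animated image
```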
In some optional implementations of this embodiment, the method is applied to a terminal on which a video application is installed, and acquiring the text includes: in response to the video application being started, acquiring an input URL address through the video application; and determining the page indicated by the URL address, and parsing the text out of the page.
In these alternative implementations, the execution body, being a terminal, may acquire the URL address input by the user (e.g., copied or typed manually) through the video application when the application starts. It may then determine the page the address points to and parse that page to extract the text.
These implementations can extract the text contained in a page, providing one way of acquiring text.
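A sketch of this text-acquisition step using requests and BeautifulSoup; the patent does not prescribe a specific parser, so this is just one plausible realization.
```python
import requests
from bs4 import BeautifulSoup

def extract_text(url):
    """Fetch the page at the user-supplied URL and parse out its plain text."""
    html = requests.get(url, timeout=5).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):  # drop non-content elements
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)
```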
In some optional implementations of this embodiment, acquiring and playing the audio corresponding to the sentence includes: inputting the sentence into a trained audio synthesis model, obtaining the audio output by the model, and taking that audio as the audio corresponding to the sentence.
In these alternative implementations, the audio synthesis model performs the conversion from text to speech, i.e., to audio. The execution body may input the sentence into the model and obtain the audio it outputs.
In these implementations, when the execution body is a terminal device, it can synthesize the audio locally through the audio synthesis model.
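A sketch of local, per-sentence audio synthesis. pyttsx3 is used here only as a stand-in for the trained audio synthesis model, which the patent does not identify.
```python
import pyttsx3

def synthesize(sentence, out_path):
    """Synthesize one sentence to an audio file on the terminal."""
    engine = pyttsx3.init()
    engine.save_to_file(sentence, out_path)  # queue synthesis of the sentence
    engine.runAndWait()                      # block until the file is written
    return out_path

audio_file = synthesize("He is playing basketball.", "sentence_001.wav")
```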
With continued reference to fig. 3, fig. 3 is a flow chart of still another embodiment of the video playing method according to the present application.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of playing a video is shown. The process 400 includes the following steps:
step 401, obtaining a text, determining keywords of the text, and obtaining an active graph indicated by an active graph label matched with the keywords in an active graph set, where each sentence of the text has the keywords respectively.
In this embodiment, an execution subject (for example, the terminal device shown in fig. 1) on which the video playing method is executed may acquire a text and determine a keyword of the text. After that, the execution body may obtain an animation corresponding to the keyword. The execution main body can extract the keywords of the text in the device, and can also send the text to other electronic devices (such as a server) and receive the keywords returned by the other electronic devices.
Step 402, performing an audio acquisition step: and determining at least one sentence which is not acquired from each sentence according to the sequence, and acquiring the audio corresponding to the at least one sentence.
In this embodiment, the executing body may execute the audio obtaining step, specifically, determine at least one sentence that is not obtained from each sentence according to the above sequence, and obtain the audio corresponding to the at least one sentence. For example, the next sentence may be read sequentially according to the sequence, and the audio of the sentence may be obtained.
Step 403, extracting an animation corresponding to the keyword of at least one sentence from the obtained animation, rendering the animation, and playing the obtained audio.
In this embodiment, the execution body may extract an animation corresponding to the keyword of at least one sentence from the acquired animation, and render the animation. The execution main body can not only perform motion picture rendering, but also play audio so as to realize playing in a multimedia form.
Step 404, in response to determining that rendering of the motion picture corresponding to the keyword of each sentence is completed or receiving a play stop instruction, ending the video play flow.
In this embodiment, the execution main body may end the video playback flow when it is determined that rendering of the motion picture corresponding to the keyword of each sentence is completed, or when a display stop instruction is received.
The embodiment can acquire at least one statement at a time and convert the at least one statement into a part of the video so as to realize accurate conversion of the video.
In some optional implementations of this embodiment, the acquiring and playing of the audio corresponding to each sentence in its order in the text, and the rendering of the animation corresponding to the sentence's keyword, may further include: performing the audio acquisition step again, obtaining the audio generated by this execution of the step; and, for the sentence corresponding to that audio, extracting from the acquired animations the animation corresponding to the keyword of the sentence, rendering the animation, and playing the audio.
In these optional implementations, the execution body may perform the audio acquisition step again to obtain the audio this execution generates. It may then extract the animation corresponding to the keyword of the sentence that audio belongs to (e.g., at least one sentence, determined in order, whose audio had not yet been acquired), render that animation, and play the audio.
By completing several rounds of audio generation and playing together with animation extraction and rendering, these optional implementations realize continuous playing of the video.
In some optional application scenarios of these implementations, the method may further include: displaying an animation presenting a talking avatar in response to no animation corresponding to the keyword of the at least one sentence being extracted from the acquired animations.
In these optional application scenarios, when no animation corresponding to the keyword can be extracted, the execution body may display a default animation presenting a talking avatar. Specifically, the animation presenting the talking avatar is acquired in advance; "presenting a talking avatar" means that the animation shows an avatar that is speaking.
These optional application scenarios use an animation of a talking avatar to keep the picture vivid, which helps compensate for the degraded experience when no animation matching the text's keywords can be displayed.
Optionally, before displaying the animation presenting the talking avatar, the method may further include: uploading the at least one sentence to a server so that the server synthesizes an animation presenting a talking avatar speaking the at least one sentence; and receiving the animation synthesized by the server.
In these optional application scenarios, the execution body may upload at least one sentence to the server in advance, so that the server synthesizes the animation to be presented when no keyword-matched animation is found. Specifically, the animation can present an avatar narrating the sentence, letting the user grasp, in a vivid form, what the text originally intended to express.
In some optional implementations of this embodiment, performing the audio acquisition step again may include: performing the audio acquisition step again in response to determining that the audio has finished playing and that there are sentences whose audio has not yet been acquired.
Specifically, the execution body may perform the audio acquisition step again when the current audio has finished playing and an unprocessed sentence remains (that is, there is a next sentence). For example, it may first determine whether the last generated audio has finished playing and, if so, determine whether any sentence remains unprocessed; if there is one, it performs the audio acquisition step again.
These optional implementations convert the text into video sentence by sentence, avoiding the heavy consumption of runtime resources that generating the whole video at once would cause.
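Putting the pieces of this embodiment together, a hedged sketch of the sentence-by-sentence playing loop might look as follows; every helper here is an assumption standing in for a step described above.
```python
def play_video(sentences, stop_requested, helpers):
    """Sentence-by-sentence playing flow (all helpers are assumed)."""
    for sentence in sentences:                       # reading order of the text
        if stop_requested():                         # play-stop instruction
            break
        audio = helpers.synthesize(sentence)         # audio acquisition step
        anim = helpers.find_animation(sentence)      # animation for its keyword
        if anim is None:                             # no keyword match found
            anim = helpers.default_talking_avatar()  # fallback animation
        helpers.render(anim, subtitle=sentence)
        helpers.play(audio)
        helpers.wait_until_played(audio)             # only then take the next sentence
    # every sentence rendered, or playing was stopped early
```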
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a video playing apparatus, which corresponds to the embodiment of the method shown in fig. 2, and besides the features described below, the embodiment of the apparatus may further include the same or corresponding features or effects as the embodiment of the method shown in fig. 2. The device can be applied to various electronic equipment.
As shown in fig. 5, the video playing apparatus 500 of this embodiment includes: an acquiring unit 501, a rendering unit 502, and an ending unit 503. The acquiring unit 501 is configured to acquire a text, determine keywords of the text, and acquire, from an animation set, the animations indicated by the animation tags matching the keywords, where each sentence of the text has at least one keyword; the rendering unit 502 is configured to, for each sentence in the order of the sentences in the text, acquire and play the audio corresponding to the sentence, and render, from the acquired animations, the animation corresponding to the keyword of the sentence; the ending unit 503 is configured to end the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
In this embodiment, for the specific processing of the acquiring unit 501, the rendering unit 502, and the ending unit 503 of the video playing apparatus 500 and their technical effects, refer to the descriptions of steps 201, 202, and 203 in the embodiment corresponding to fig. 2; they are not repeated here.
In some optional implementations of this embodiment, the matching is semantic fuzzy matching, and the matching result is generated by: acquiring the semantics of the keywords and the semantics of the animation tags in the animation set; and determining, among the animation tags of the animation set, the tags whose semantics are the same as or similar to those of the keywords as the matching animation tags.
In some optional implementations of this embodiment, the animations in the animation set are generated by: acquiring an avatar, and making an animation from a plurality of consecutive expression-action pictures of the avatar, where the expression-action pictures present the avatar's facial expressions and/or body actions; and adding an animation tag to the animation, the tagged animation being taken as one animation in the animation set, where the tag indicates the expressions and/or actions presented in the consecutive expression-action pictures.
In some optional implementations of this embodiment, the rendering unit is further configured to render the animation corresponding to the keyword of the sentence as follows: rendering, from the acquired animations, the animation corresponding to the keyword of the sentence, and rendering the sentence as a subtitle on an upper layer of the animation.
In some optional implementations of this embodiment, the rendering unit is further configured to render the animation corresponding to the keyword of the sentence as follows: acquiring a compressed texture file package comprising the texture features of the animation; and decompressing the compressed texture file package, binding the animation with the decompressed texture, and rendering the binding result.
In some optional implementations of this embodiment, the number of keywords of each sentence is at least one, and the acquiring unit is further configured to acquire the animations indicated by the matching animation tags as follows: sending an image request including the keywords to a server, where the server searches the animation set for the animation whose tag has the highest matching degree with the keywords and returns it; and receiving the animation returned by the server as the animation indicated by the matching tag.
In some optional implementations of this embodiment, the apparatus is applied to a terminal on which a video application is installed, and the acquiring unit is further configured to acquire the text as follows: in response to the video application being started, acquiring an input URL address through the video application; and determining the page indicated by the URL address, and parsing the text out of the page.
In some optional implementations of this embodiment, the rendering unit is further configured to perform the per-sentence audio acquiring, playing, and animation rendering as follows: performing an audio acquisition step: determining, according to the order, at least one sentence whose audio has not yet been acquired, and acquiring the audio corresponding to the at least one sentence; and extracting, from the acquired animations, the animation corresponding to the keyword of the at least one sentence, rendering the animation, and playing the acquired audio.
In some optional implementations of this embodiment, the rendering unit is further configured to perform the per-sentence audio acquiring, playing, and animation rendering as follows: performing the audio acquisition step again, obtaining the audio generated by this execution of the step; and, for the sentence corresponding to the audio, extracting from the acquired animations the animation corresponding to the keyword of the sentence, rendering the animation, and playing the audio.
In some optional implementations of this embodiment, the apparatus further includes: a display unit configured to display an animation presenting a talking avatar in response to no animation corresponding to the keyword of the at least one sentence being extracted from the acquired animations.
In some optional implementations of this embodiment, the apparatus further includes: an uploading unit configured to upload the at least one sentence to a server before the animation presenting the talking avatar is displayed, so that the server synthesizes an animation presenting a talking avatar speaking the at least one sentence; and a receiving unit configured to receive the animation synthesized by the server.
In some optional implementations of this embodiment, performing the audio acquisition step again includes: performing the audio acquisition step again in response to determining that the audio has finished playing and that there are sentences whose audio has not yet been acquired.
In some optional implementations of this embodiment, the rendering unit is further configured to acquire and play the audio corresponding to the sentence as follows: inputting the sentence into a trained audio synthesis model, obtaining the audio output by the model, and taking that audio as the audio corresponding to the sentence.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
Fig. 6 is a block diagram of an electronic device for the video playing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information for a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as needed. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 6 takes one processor 601 as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the video playing method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method of playing a video provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the video playing method in the embodiments of the present application (for example, the acquiring unit 501, the rendering unit 502, and the ending unit 503 shown in fig. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, that is, implements the video playing method in the above method embodiments.
The memory 602 may include a program storage area and a data storage area; the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created by the use of the video playing electronic device, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and such remote memory may be connected to the video playing electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the video playing method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the video-playing electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, audio, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises from computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that remedies the defects of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquiring unit, a rendering unit, and an ending unit. For example, the ending unit may also be described as a unit that ends the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs that, when executed by the apparatus, cause the apparatus to: acquire a text, determine keywords of the text, and acquire, from an animation set, the animations indicated by the animation tags matching the keywords, where each sentence of the text has at least one keyword; for each sentence, in the order of the sentences in the text, acquire and play the audio corresponding to the sentence, and render, from the acquired animations, the animation corresponding to the keyword of the sentence; and end the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
The above description is only a preferred embodiment of the application and an illustration of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention disclosed here is not limited to the particular combination of the above features, and also covers other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, arrangements in which the above features are replaced with (but not limited to) features with similar functions disclosed in the present application.

Claims (15)

1. A method of playing a video, the method comprising:
acquiring a text, determining keywords of the text, and acquiring, from an animation set, the animations indicated by the animation tags matching the keywords, wherein each sentence of the text has at least one keyword;
for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence;
and ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
2. The method of claim 1, wherein the matching is semantic fuzzy matching;
the matching result is generated by:
acquiring the semantics of the keywords and the semantics of the animation tags in the animation set;
and determining, among the animation tags of the animation set, the tags whose semantics are the same as or similar to those of the keywords as the matching animation tags.
3. The method according to claim 1 or 2, wherein the animations in the animation set are generated by:
acquiring an avatar, and making an animation from a plurality of consecutive expression-action pictures of the avatar, wherein the expression-action pictures present facial expressions and/or body actions of the avatar;
and adding an animation tag to the animation, and taking the tagged animation as one animation in the animation set, wherein the animation tag indicates the expressions and/or actions presented in the plurality of consecutive expression-action pictures.
4. The method according to claim 1, wherein the rendering, from the acquired animations, of the animation corresponding to the keyword of the sentence comprises:
rendering, from the acquired animations, the animation corresponding to the keyword of the sentence, and rendering the sentence as a subtitle on an upper layer of the animation.
5. The method according to claim 1 or 4, wherein the rendering, from the acquired animations, of the animation corresponding to the keyword of the sentence comprises:
acquiring a compressed texture file package comprising texture features of the animation;
and decompressing the compressed texture file package, binding the animation with the decompressed texture, and rendering the binding result.
6. The method of claim 1, wherein the number of keywords of each sentence is at least one;
the acquiring, from the animation set, of the animations indicated by the animation tags matching the keywords comprises:
sending an image request including the keywords to a server, wherein the server searches the animation set for the animation whose tag has the highest matching degree with the keywords and returns that animation;
and receiving the animation returned by the server as the animation indicated by the matching animation tag.
7. The method of claim 1, wherein the method is applied to a terminal, which is installed with a video application;
the acquiring the text comprises:
in response to the video application being started, acquiring an input URL address through the video application;
and determining a page indicated by the URL address, and analyzing a text from the page.
8. The method according to claim 1, wherein the acquiring and playing, for each sentence in the order of the sentences in the text, of the audio corresponding to the sentence, and the rendering of the animation corresponding to the keyword of the sentence from the acquired animations, comprise:
performing an audio acquisition step: determining, according to the order, at least one sentence whose audio has not yet been acquired, and acquiring the audio corresponding to the at least one sentence;
and extracting, from the acquired animations, the animation corresponding to the keyword of the at least one sentence, rendering the animation, and playing the acquired audio.
9. The method according to claim 8, wherein the acquiring and playing, for each sentence in the order of the sentences in the text, of the audio corresponding to the sentence, and the rendering of the animation corresponding to the keyword of the sentence from the acquired animations, further comprise:
performing the audio acquisition step again, obtaining the audio generated by this execution of the audio acquisition step;
and, for the sentence corresponding to the audio, extracting from the acquired animations the animation corresponding to the keyword of the sentence, rendering the animation, and playing the audio.
10. The method of claim 8 or 9, wherein the method further comprises:
displaying an animation presenting a talking avatar in response to no animation corresponding to the keyword of the at least one sentence being extracted from the acquired animations.
11. The method of claim 10, wherein before the displaying of the animation presenting the talking avatar, the method further comprises:
uploading the at least one sentence to a server, so that the server synthesizes an animation presenting a talking avatar speaking the at least one sentence;
and receiving the animation synthesized by the server.
12. The method of claim 8 or 9, wherein said re-performing an audio acquisition step comprises:
and in response to determining that the audio is played completely and that there are sentences which have not been acquired, executing the audio acquiring step again.
13. The method of claim 1, wherein the obtaining and playing the audio corresponding to the sentence comprises:
and inputting the sentence into the trained audio synthesis model to obtain the audio output from the audio synthesis model, and taking the audio as the audio corresponding to the sentence.
14. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
15. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-13.
CN202110382925.7A 2021-04-09 2021-04-09 Video playing method and device Pending CN112988100A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110382925.7A CN112988100A (en) 2021-04-09 2021-04-09 Video playing method and device

Publications (1)

Publication Number Publication Date
CN112988100A (en) 2021-06-18

Family

ID=76339629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382925.7A Pending CN112988100A (en) 2021-04-09 2021-04-09 Video playing method and device

Country Status (1)

Country Link
CN (1) CN112988100A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
US20180286421A1 (en) * 2017-03-31 2018-10-04 Hong Fu Jin Precision Industry (Shenzhen) Co. Ltd. Sharing method and device for video and audio data presented in interacting fashion
CN107832382A (en) * 2017-10-30 2018-03-23 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on word generation video
CN109344291A (en) * 2018-09-03 2019-02-15 腾讯科技(武汉)有限公司 A kind of video generation method and device
CN110381266A (en) * 2019-07-31 2019-10-25 百度在线网络技术(北京)有限公司 A kind of video generation method, device and terminal

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113891133A (en) * 2021-12-06 2022-01-04 阿里巴巴达摩院(杭州)科技有限公司 Multimedia information playing method, device, equipment and storage medium
WO2023138634A1 (en) * 2022-01-19 2023-07-27 阿里巴巴(中国)有限公司 Virtual human control method, apparatus, device, and storage medium

Similar Documents

Publication Publication Date Title
US10521946B1 (en) Processing speech to drive animations on avatars
CN112131988B (en) Method, apparatus, device and computer storage medium for determining virtual character lip shape
CN112259072A (en) Voice conversion method and device and electronic equipment
JP2021192222A (en) Video image interactive method and apparatus, electronic device, computer readable storage medium, and computer program
US11423907B2 (en) Virtual object image display method and apparatus, electronic device and storage medium
EP3902280A1 (en) Short video generation method and platform, electronic device, and storage medium
CN108846886B (en) AR expression generation method, client, terminal and storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN111862277A (en) Method, apparatus, device and storage medium for generating animation
JP7263660B2 (en) Video processing method, device, electronic device and storage medium
WO2017218038A1 (en) Server-based conversion of autoplay content to click-to-play content
CN112541957A (en) Animation generation method, animation generation device, electronic equipment and computer readable medium
CN112102449A (en) Virtual character generation method, virtual character display device, virtual character equipment and virtual character medium
CN112988100A (en) Video playing method and device
CN111984825A (en) Method and apparatus for searching video
CN111158924A (en) Content sharing method and device, electronic equipment and readable storage medium
WO2019085625A1 (en) Emotion picture recommendation method and apparatus
CN111309200A (en) Method, device, equipment and storage medium for determining extended reading content
US11615714B2 (en) Adaptive learning in smart products based on context and learner preference modes
CN112843681A (en) Virtual scene control method and device, electronic equipment and storage medium
CN112328088A (en) Image presenting method and device
CN113542802B (en) Video transition method and device
CN113542888A (en) Video processing method and device
CN113965798A (en) Video information generating and displaying method, device, equipment and storage medium
CN111524123A (en) Method and apparatus for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination