CN112988100A - Video playing method and device - Google Patents

Video playing method and device

Info

Publication number
CN112988100A
CN112988100A (application CN202110382925.7A)
Authority
CN
China
Prior art keywords
sentence
audio
text
keyword
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110382925.7A
Other languages
Chinese (zh)
Inventor
胡其斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhangmen Science and Technology Co Ltd
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhangmen Science and Technology Co Ltd filed Critical Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN202110382925.7A priority Critical patent/CN112988100A/en
Publication of CN112988100A publication Critical patent/CN112988100A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14: Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F 3/1407: General aspects irrespective of display type, e.g. determination of decimal point position, display with fixed or driving decimal point, suppression of non-significant zeros
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/74: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a video playing method and device, relating to the field of artificial intelligence, including computer vision and deep learning. A specific implementation comprises the following steps: acquiring a text, determining keywords of the text, and acquiring, from an animation set, the animations indicated by the animation tags matching the keywords; for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence; and ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction. The method and device can convert text into animations in real time to realize video playing, so that text is turned into a multimedia form in real time, enriching the user's visual experience. In addition, matching on keywords yields animations that accurately fit the text, improving the playing effect of the video.

Description

Video playing method and device
Technical Field
The application relates to the field of computer technology, in particular to artificial intelligence including computer vision and deep learning, and specifically to a video playing method and device.
Background
Text is a common form of static presentation, typically a sentence or a combination of sentences carrying a complete, coherent meaning.
With the increasing performance of electronic device hardware, real-time synthesis technology that converts text into sound has become practical. In the related art, a server can synthesize audio from text through Text-To-Speech (TTS) technology, so that text can be converted into another output form and delivered as audio.
Disclosure of Invention
A video playing method, a video playing device, an electronic device and a storage medium are provided.
According to a first aspect, a video playing method is provided, comprising: acquiring a text, determining keywords of the text, and acquiring, from an animation set, the animations indicated by the animation tags matching the keywords, wherein each sentence of the text has at least one keyword; for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence; and ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
According to a second aspect, a video playing apparatus is provided, comprising: an acquiring unit configured to acquire a text, determine keywords of the text, and acquire, from an animation set, the animations indicated by the animation tags matching the keywords, wherein each sentence of the text has at least one keyword; a rendering unit configured to, for each sentence in the order of the sentences in the text, acquire and play the audio corresponding to the sentence, and render, from the acquired animations, the animation corresponding to the keyword of the sentence; and an ending unit configured to end the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
According to a third aspect, an electronic device is provided, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any embodiment of the video playing method.
According to a fourth aspect, a non-transitory computer-readable storage medium is provided, storing computer instructions for causing a computer to perform the method of any embodiment of the video playing method.
According to a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of any embodiment of the video playing method.
According to the scheme of the application, text can be converted into animations in real time to realize video playing, turning the static, on-page display of an article into dynamic video playing, so that text is converted into a multimedia form in real time, enriching the user's visual experience. In addition, matching on keywords yields animations that accurately fit the text, improving the playing effect of the video.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of playing a video according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a method of playing a video according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method of playing a video according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a video playing apparatus according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing the video playing method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below through its embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of a video playback method or a video playback apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, for example a background server supporting the terminal devices 101, 102, and 103. The background server may analyze and otherwise process received data such as the text, and feed back a processing result (e.g., an animation) to the terminal device.
It should be noted that the video playing method provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103, and accordingly, the video playing apparatus may be disposed in the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of playing a video according to the present application is shown. The video playing method comprises the following steps:
step 201, obtaining a text, determining keywords of the text, and obtaining an active graph indicated by an active graph label matched with the keywords in an active graph set, wherein the keywords exist in each sentence of the text.
In this embodiment, an execution subject (for example, the terminal device shown in fig. 1) on which the video playing method is executed may acquire a text and determine a keyword of the text. After that, the execution body may obtain an animation corresponding to the keyword. The execution main body can extract the keywords of the text in the device, and can also send the text to other electronic devices (such as a server) and receive the keywords returned by the other electronic devices.
The corresponding dynamic graph of the keyword refers to the dynamic graph indicated by the dynamic graph label matched with the keyword. The execution main body or other electronic equipment can match the keywords with the moving picture labels in the moving picture set, so as to obtain a matching result. The execution body may obtain an animation indicated by the matching animation tag in the matching result.
Corresponding keywords may exist in each sentence, and the keyword existing in each sentence may be at least one. The device or other electronic devices can have a picture set, and the pictures in the picture set have corresponding keywords. For example, the keywords of a plurality of pictures for playing basketball may be "playing basketball", and may also be "playing basketball" or "sports".
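For illustration only, the animation set described here can be thought of as a small tag-indexed library. The sketch below makes that concrete; the class name, tags, and file paths are assumptions for the example, not part of the patent.
```python
from dataclasses import dataclass

@dataclass
class AnimationEntry:
    tag: str   # animation tag, e.g. "playing basketball"
    path: str  # path to the animated-image file

# Illustrative animation set; tags and paths are invented for the example.
ANIMATION_SET = [
    AnimationEntry("playing basketball", "anims/basketball.gif"),
    AnimationEntry("smiling", "anims/smile.gif"),
    AnimationEntry("sports", "anims/sports.gif"),
]

def exact_matches(keyword):
    """Return every animation whose tag equals the keyword."""
    return [e for e in ANIMATION_SET if e.tag == keyword]
```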
Step 202, for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence.
In this embodiment, the execution body may, following the order (i.e., reading order) of the sentences in the text, acquire for each sentence the audio corresponding to it and play that audio. The execution body may also render the animation corresponding to the sentence's keyword, so that the animation is displayed while the audio plays, thereby realizing video playing. The rendered animation is one of the acquired animations.
Step 203, ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
In this embodiment, the execution body may end the playing flow of the video when it determines that the animations corresponding to the keywords of all sentences have finished rendering, or when a play-stop instruction is received.
The method provided by this embodiment can convert text into animations in real time to realize video playing, turning the static, on-page display of an article into dynamic video playing, so that text is converted into a multimedia form in real time, enriching the user's visual experience. In addition, matching on keywords yields animations that accurately fit the text, improving the playing effect of the video.
In some optional implementations of this embodiment, the matching is semantic fuzzy matching, and the matching result may be generated by: acquiring the semantics of the keyword and the semantics of the animation tags in the animation set; and determining, among the animation tags of the animation set, the tags whose semantics are the same as or similar to those of the keyword as the matching animation tags.
In these optional implementations, the execution body or another electronic device may acquire the semantics of the keyword and of the tags in the animation set, and determine the matching tags by semantics. Fuzzy matching here means that, among all tags in the animation set, those semantically identical or similar to the keyword are taken as matches.
"Semantically similar" may mean that the similarity between the keyword and a tag exceeds a preset threshold, or that tags are selected in descending order of their similarity to the keyword.
By matching semantically, these implementations can increase the accuracy of the animations matched to the keywords.
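As a concrete, non-authoritative reading of this semantic fuzzy matching, one could embed the keyword and the tags and compare them by cosine similarity. The sketch below uses the sentence-transformers library and a multilingual model purely as an example; the patent does not name a model, and the 0.6 threshold is an assumption.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def match_tags(keyword, tags, threshold=0.6):
    """Return tags semantically the same as or similar to the keyword."""
    kw_vec = model.encode(keyword, convert_to_tensor=True)
    tag_vecs = model.encode(tags, convert_to_tensor=True)
    sims = util.cos_sim(kw_vec, tag_vecs)[0]  # cosine similarity per tag
    # Keep tags above the threshold, best first; the patent equally allows
    # taking tags in descending order of similarity (a top-k selection).
    ranked = sorted(zip(tags, sims.tolist()), key=lambda p: p[1], reverse=True)
    return [tag for tag, score in ranked if score >= threshold]
```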
In some optional implementations of this embodiment, the animations in the animation set may be generated by: acquiring an avatar, and making an animation from a plurality of consecutive expression-action pictures of the avatar, where the expression-action pictures present the avatar's facial expressions and/or body actions; and adding an animation tag to the animation, the tagged animation being taken as one animation in the animation set, where the tag indicates the expressions and/or actions presented in the consecutive expression-action pictures.
In these alternative implementations, the execution body or another electronic device may carry out this generation step. Taking the execution body as an example: when generating an animation, it may acquire a plurality of pictures showing the avatar's facial expressions and/or body actions, i.e., the expression-action pictures. Taken together, these pictures present an avatar with continuous facial expressions and/or body movements. The execution body may also add tags to the produced animation, the tags indicating the expressions and/or actions it presents, such as "smiling" or "playing basketball".
These implementations can generate animations with expressive actions, together with tags that vividly describe what the avatar in the animation presents.
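A minimal sketch of this generation step, assuming the frames already exist as image files: Pillow can assemble consecutive expression-action frames into a single animated image, and the tag can be recorded alongside it. The frame paths, timing, and tag are illustrative assumptions.
```python
from PIL import Image

def make_tagged_animation(frame_paths, out_path, tag):
    # Load the consecutive expression-action frames of the avatar.
    frames = [Image.open(p).convert("RGB") for p in frame_paths]
    # save_all + append_images writes all frames into one animated image.
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=100, loop=0)  # 100 ms per frame, loop forever
    # The tag indicates the expression/action the frames present.
    return {"tag": tag, "path": out_path}

entry = make_tagged_animation(
    [f"avatar/smile_{i:02d}.png" for i in range(8)],  # assumed frame files
    "anims/smile.gif",
    tag="smiling",
)
```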
In some optional implementations of this embodiment, the rendering in step 202 of the animation corresponding to the keyword of the sentence may include: rendering, from the acquired animations, the animation corresponding to the keyword of the sentence, and rendering the sentence as a subtitle on an upper layer of the animation.
In these optional implementations, the execution body may render not only the animation corresponding to the sentence's keyword, but also the sentence itself as the subtitle of the rendered animation, i.e., the subtitle of the video, drawn on a layer above the animation.
These implementations can display, for each animation, the subtitle that corresponds to it, explaining the animation through the subtitle and thereby helping the viewer understand the video content.
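A sketch of the subtitle overlay, assuming Pillow is used for compositing. The font file, size, and placement are assumptions; the patent only requires that the sentence be rendered as a subtitle on a layer above the animation.
```python
from PIL import Image, ImageDraw, ImageFont

def draw_subtitle(frame, sentence):
    """Composite the sentence as a subtitle onto one animation frame."""
    frame = frame.convert("RGBA")
    overlay = Image.new("RGBA", frame.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    # Font file and size are assumptions; any CJK-capable font would do.
    font = ImageFont.truetype("NotoSansCJK-Regular.ttc", 24)
    w, h = frame.size
    # Center the subtitle near the bottom of the upper layer.
    draw.text((w // 2, h - 40), sentence, font=font,
              fill=(255, 255, 255, 255), anchor="mm")
    return Image.alpha_composite(frame, overlay)
```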
In some optional implementations of this embodiment, the rendering in step 202 of the animation corresponding to the keyword of the sentence may include: acquiring a compressed texture file package comprising the texture features of the animation; and decompressing the compressed texture file package, binding the animation with the decompressed texture, and rendering the binding result.
In these alternative implementations, the execution body may acquire (e.g., generate in advance) the compressed texture file package, and may render the animation by parsing the texture package and binding the animation with its texture.
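The patent does not specify the texture package format or the rendering API, so the following is only a hedged sketch: it assumes the package is a ZIP archive with one texture per frame and a hypothetical TextureRenderer object standing in for whatever graphics layer the terminal uses.
```python
import io
import zipfile

def render_from_texture_package(package_bytes, renderer):
    """Decompress a texture package and render each bound frame."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        for name in sorted(pkg.namelist()):      # one archive entry per frame
            texture = pkg.read(name)             # decompressed texture bytes
            tex_id = renderer.upload(texture)    # hypothetical: create GPU texture
            renderer.bind(tex_id)                # bind the animation frame's texture
            renderer.draw_frame()                # render the binding result
```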
In some optional implementations of this embodiment, the number of keywords of each sentence is at least one, and the acquiring of the animations indicated by the matching animation tags includes: sending an image request including the keywords to a server, where the server searches the animation set for the animation whose tag has the highest matching degree with the keywords and returns it; and receiving the animation returned by the server as the animation indicated by the matching tag.
In these alternative implementations, the execution body may send an image request including the keywords to the server, and the server may determine the animation corresponding to the keywords. The server may determine at least one best-matching animation; when more than one animation is returned, the execution body may select among the returned results.
These implementations let the server carry out the animation search, which consumes considerable computing resources, thereby improving the efficiency of acquiring animations.
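A sketch of the terminal-side request, assuming a JSON-over-HTTP interface. The endpoint URL and field names are invented for the example; the patent only states that an image request carrying the keywords is sent and the best-matching animation is returned.
```python
import requests

def request_animation(keywords):
    """Ask the server for the animation whose tag best matches the keywords."""
    resp = requests.post(
        "https://example-server/api/animation/match",  # hypothetical endpoint
        json={"keywords": keywords},                   # at least one per sentence
        timeout=5,
    )
    resp.raise_for_status()
    return resp.content  # bytes of the highest-scoring animated image
```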
In some optional implementations of this embodiment, the method is applied to a terminal on which a video application is installed, and acquiring the text includes: in response to the video application being started, acquiring an input URL address through the video application; and determining the page indicated by the URL address, and parsing the text out of the page.
In these alternative implementations, the execution body, being a terminal, may acquire the URL address input by the user (e.g., copied or typed manually) through the video application when the application starts. It may then determine the page the address points to and parse that page to extract the text.
These implementations can extract the text contained in a page, providing one way of acquiring text.
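A sketch of this text-acquisition step using requests and BeautifulSoup; the patent does not prescribe a specific parser, so this is just one plausible realization.
```python
import requests
from bs4 import BeautifulSoup

def extract_text(url):
    """Fetch the page at the user-supplied URL and parse out its plain text."""
    html = requests.get(url, timeout=5).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):  # drop non-content elements
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)
```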
In some optional implementations of this embodiment, acquiring and playing the audio corresponding to the sentence includes: inputting the sentence into a trained audio synthesis model, obtaining the audio output by the model, and taking that audio as the audio corresponding to the sentence.
In these alternative implementations, the audio synthesis model performs the conversion from text to speech, i.e., to audio. The execution body may input the sentence into the model and obtain the audio it outputs.
In these implementations, when the execution body is a terminal device, it can synthesize the audio locally through the audio synthesis model.
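A sketch of local, per-sentence audio synthesis. pyttsx3 is used here only as a stand-in for the trained audio synthesis model, which the patent does not identify.
```python
import pyttsx3

def synthesize(sentence, out_path):
    """Synthesize one sentence to an audio file on the terminal."""
    engine = pyttsx3.init()
    engine.save_to_file(sentence, out_path)  # queue synthesis of the sentence
    engine.runAndWait()                      # block until the file is written
    return out_path

audio_file = synthesize("He is playing basketball.", "sentence_001.wav")
```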
With continued reference to fig. 3, fig. 3 is a flow chart of still another embodiment of the video playing method according to the present application.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of playing a video is shown. The process 400 includes the following steps:
step 401, obtaining a text, determining keywords of the text, and obtaining an active graph indicated by an active graph label matched with the keywords in an active graph set, where each sentence of the text has the keywords respectively.
In this embodiment, an execution subject (for example, the terminal device shown in fig. 1) on which the video playing method is executed may acquire a text and determine a keyword of the text. After that, the execution body may obtain an animation corresponding to the keyword. The execution main body can extract the keywords of the text in the device, and can also send the text to other electronic devices (such as a server) and receive the keywords returned by the other electronic devices.
Step 402, performing an audio acquisition step: and determining at least one sentence which is not acquired from each sentence according to the sequence, and acquiring the audio corresponding to the at least one sentence.
In this embodiment, the executing body may execute the audio obtaining step, specifically, determine at least one sentence that is not obtained from each sentence according to the above sequence, and obtain the audio corresponding to the at least one sentence. For example, the next sentence may be read sequentially according to the sequence, and the audio of the sentence may be obtained.
Step 403, extracting an animation corresponding to the keyword of at least one sentence from the obtained animation, rendering the animation, and playing the obtained audio.
In this embodiment, the execution body may extract an animation corresponding to the keyword of at least one sentence from the acquired animation, and render the animation. The execution main body can not only perform motion picture rendering, but also play audio so as to realize playing in a multimedia form.
Step 404, in response to determining that rendering of the motion picture corresponding to the keyword of each sentence is completed or receiving a play stop instruction, ending the video play flow.
In this embodiment, the execution main body may end the video playback flow when it is determined that rendering of the motion picture corresponding to the keyword of each sentence is completed, or when a display stop instruction is received.
The embodiment can acquire at least one statement at a time and convert the at least one statement into a part of the video so as to realize accurate conversion of the video.
In some optional implementations of this embodiment, the acquiring and playing of the audio corresponding to each sentence in its order in the text, and the rendering of the animation corresponding to the sentence's keyword, may further include: performing the audio acquisition step again, obtaining the audio generated by this execution of the step; and, for the sentence corresponding to that audio, extracting from the acquired animations the animation corresponding to the keyword of the sentence, rendering the animation, and playing the audio.
In these optional implementations, the execution body may perform the audio acquisition step again to obtain the audio this execution generates. It may then extract the animation corresponding to the keyword of the sentence that audio belongs to (e.g., at least one sentence, determined in order, whose audio had not yet been acquired), render that animation, and play the audio.
By completing several rounds of audio generation and playing together with animation extraction and rendering, these optional implementations realize continuous playing of the video.
In some optional application scenarios of these implementations, the method may further include: displaying an animation presenting a talking avatar in response to no animation corresponding to the keyword of the at least one sentence being extracted from the acquired animations.
In these optional application scenarios, when no animation corresponding to the keyword can be extracted, the execution body may display a default animation presenting a talking avatar. Specifically, the animation presenting the talking avatar is acquired in advance; "presenting a talking avatar" means that the animation shows an avatar that is speaking.
These optional application scenarios use an animation of a talking avatar to keep the picture vivid, which helps compensate for the degraded experience when no animation matching the text's keywords can be displayed.
Optionally, before displaying the animation presenting the talking avatar, the method may further include: uploading the at least one sentence to a server so that the server synthesizes an animation presenting a talking avatar speaking the at least one sentence; and receiving the animation synthesized by the server.
In these optional application scenarios, the execution body may upload at least one sentence to the server in advance, so that the server synthesizes the animation to be presented when no keyword-matched animation is found. Specifically, the animation can present an avatar narrating the sentence, letting the user grasp, in a vivid form, what the text originally intended to express.
In some optional implementations of this embodiment, performing the audio acquisition step again may include: performing the audio acquisition step again in response to determining that the audio has finished playing and that there are sentences whose audio has not yet been acquired.
Specifically, the execution body may perform the audio acquisition step again when the current audio has finished playing and an unprocessed sentence remains (that is, there is a next sentence). For example, it may first determine whether the last generated audio has finished playing and, if so, determine whether any sentence remains unprocessed; if there is one, it performs the audio acquisition step again.
These optional implementations convert the text into video sentence by sentence, avoiding the heavy consumption of runtime resources that generating the whole video at once would cause.
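Putting the pieces of this embodiment together, a hedged sketch of the sentence-by-sentence playing loop might look as follows; every helper here is an assumption standing in for a step described above.
```python
def play_video(sentences, stop_requested, helpers):
    """Sentence-by-sentence playing flow (all helpers are assumed)."""
    for sentence in sentences:                       # reading order of the text
        if stop_requested():                         # play-stop instruction
            break
        audio = helpers.synthesize(sentence)         # audio acquisition step
        anim = helpers.find_animation(sentence)      # animation for its keyword
        if anim is None:                             # no keyword match found
            anim = helpers.default_talking_avatar()  # fallback animation
        helpers.render(anim, subtitle=sentence)
        helpers.play(audio)
        helpers.wait_until_played(audio)             # only then take the next sentence
    # every sentence rendered, or playing was stopped early
```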
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a video playing apparatus, which corresponds to the embodiment of the method shown in fig. 2, and besides the features described below, the embodiment of the apparatus may further include the same or corresponding features or effects as the embodiment of the method shown in fig. 2. The device can be applied to various electronic equipment.
As shown in fig. 5, the video playing apparatus 500 of this embodiment includes: an acquiring unit 501, a rendering unit 502, and an ending unit 503. The acquiring unit 501 is configured to acquire a text, determine keywords of the text, and acquire, from an animation set, the animations indicated by the animation tags matching the keywords, where each sentence of the text has at least one keyword; the rendering unit 502 is configured to, for each sentence in the order of the sentences in the text, acquire and play the audio corresponding to the sentence, and render, from the acquired animations, the animation corresponding to the keyword of the sentence; the ending unit 503 is configured to end the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
In this embodiment, for the specific processing of the acquiring unit 501, the rendering unit 502, and the ending unit 503 of the video playing apparatus 500 and their technical effects, refer to the descriptions of steps 201, 202, and 203 in the embodiment corresponding to fig. 2; they are not repeated here.
In some optional implementations of this embodiment, the matching is semantic fuzzy matching, and the matching result is generated by: acquiring the semantics of the keywords and the semantics of the animation tags in the animation set; and determining, among the animation tags of the animation set, the tags whose semantics are the same as or similar to those of the keywords as the matching animation tags.
In some optional implementations of this embodiment, the animations in the animation set are generated by: acquiring an avatar, and making an animation from a plurality of consecutive expression-action pictures of the avatar, where the expression-action pictures present the avatar's facial expressions and/or body actions; and adding an animation tag to the animation, the tagged animation being taken as one animation in the animation set, where the tag indicates the expressions and/or actions presented in the consecutive expression-action pictures.
In some optional implementations of this embodiment, the rendering unit is further configured to render the animation corresponding to the keyword of the sentence as follows: rendering, from the acquired animations, the animation corresponding to the keyword of the sentence, and rendering the sentence as a subtitle on an upper layer of the animation.
In some optional implementations of this embodiment, the rendering unit is further configured to render the animation corresponding to the keyword of the sentence as follows: acquiring a compressed texture file package comprising the texture features of the animation; and decompressing the compressed texture file package, binding the animation with the decompressed texture, and rendering the binding result.
In some optional implementations of this embodiment, the number of keywords of each sentence is at least one, and the acquiring unit is further configured to acquire the animations indicated by the matching animation tags as follows: sending an image request including the keywords to a server, where the server searches the animation set for the animation whose tag has the highest matching degree with the keywords and returns it; and receiving the animation returned by the server as the animation indicated by the matching tag.
In some optional implementations of this embodiment, the apparatus is applied to a terminal on which a video application is installed, and the acquiring unit is further configured to acquire the text as follows: in response to the video application being started, acquiring an input URL address through the video application; and determining the page indicated by the URL address, and parsing the text out of the page.
In some optional implementations of this embodiment, the rendering unit is further configured to perform the per-sentence audio acquiring, playing, and animation rendering as follows: performing an audio acquisition step: determining, according to the order, at least one sentence whose audio has not yet been acquired, and acquiring the audio corresponding to the at least one sentence; and extracting, from the acquired animations, the animation corresponding to the keyword of the at least one sentence, rendering the animation, and playing the acquired audio.
In some optional implementations of this embodiment, the rendering unit is further configured to perform the per-sentence audio acquiring, playing, and animation rendering as follows: performing the audio acquisition step again, obtaining the audio generated by this execution of the step; and, for the sentence corresponding to the audio, extracting from the acquired animations the animation corresponding to the keyword of the sentence, rendering the animation, and playing the audio.
In some optional implementations of this embodiment, the apparatus further includes: a display unit configured to display an animation presenting a talking avatar in response to no animation corresponding to the keyword of the at least one sentence being extracted from the acquired animations.
In some optional implementations of this embodiment, the apparatus further includes: an uploading unit configured to upload the at least one sentence to a server before the animation presenting the talking avatar is displayed, so that the server synthesizes an animation presenting a talking avatar speaking the at least one sentence; and a receiving unit configured to receive the animation synthesized by the server.
In some optional implementations of this embodiment, performing the audio acquisition step again includes: performing the audio acquisition step again in response to determining that the audio has finished playing and that there are sentences whose audio has not yet been acquired.
In some optional implementations of this embodiment, the rendering unit is further configured to acquire and play the audio corresponding to the sentence as follows: inputting the sentence into a trained audio synthesis model, obtaining the audio output by the model, and taking that audio as the audio corresponding to the sentence.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
Fig. 6 is a block diagram of an electronic device for the video playing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information for a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as needed. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 6 takes one processor 601 as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the video playing method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method of playing a video provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the video playing method in the embodiments of the present application (for example, the acquiring unit 501, the rendering unit 502, and the ending unit 503 shown in fig. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, that is, implements the video playing method in the above method embodiments.
The memory 602 may include a program storage area and a data storage area; the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created by the use of the video playing electronic device, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and such remote memory may be connected to the video playing electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the video playing method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the video-playing electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, audio, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises from computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that remedies the defects of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquiring unit, a rendering unit, and an ending unit. For example, the ending unit may also be described as a unit that ends the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs that, when executed by the apparatus, cause the apparatus to: acquire a text, determine keywords of the text, and acquire, from an animation set, the animations indicated by the animation tags matching the keywords, where each sentence of the text has at least one keyword; for each sentence, in the order of the sentences in the text, acquire and play the audio corresponding to the sentence, and render, from the acquired animations, the animation corresponding to the keyword of the sentence; and end the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
The above description is only a preferred embodiment of the application and an illustration of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention disclosed here is not limited to the particular combination of the above features, and also covers other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, arrangements in which the above features are replaced with (but not limited to) features with similar functions disclosed in the present application.

Claims (15)

1. A method of playing a video, the method comprising:
acquiring a text, determining keywords of the text, and acquiring, from an animation set, the animations indicated by the animation tags matching the keywords, wherein each sentence of the text has at least one keyword;
for each sentence, in the order of the sentences in the text, acquiring and playing the audio corresponding to the sentence, and rendering, from the acquired animations, the animation corresponding to the keyword of the sentence;
and ending the playing flow of the video in response to determining that the animations corresponding to the keywords of the sentences have been rendered or receiving a play-stop instruction.
2. The method of claim 1, wherein the matching is semantic fuzzy matching;
the matching result is generated by:
acquiring the semantics of the keywords and the semantics of the animation tags in the animation set;
and determining, among the animation tags of the animation set, the tags whose semantics are the same as or similar to those of the keywords as the matching animation tags.
3. The method according to claim 1 or 2, wherein the animations in the animation set are generated by:
acquiring an avatar, and making an animation from a plurality of consecutive expression-action pictures of the avatar, wherein the expression-action pictures present facial expressions and/or body actions of the avatar;
and adding an animation tag to the animation, and taking the tagged animation as one animation in the animation set, wherein the animation tag indicates the expressions and/or actions presented in the plurality of consecutive expression-action pictures.
4. The method according to claim 1, wherein the rendering, from the acquired animations, of the animation corresponding to the keyword of the sentence comprises:
rendering, from the acquired animations, the animation corresponding to the keyword of the sentence, and rendering the sentence as a subtitle on an upper layer of the animation.
5. The method according to claim 1 or 4, wherein the rendering, from the acquired animations, of the animation corresponding to the keyword of the sentence comprises:
acquiring a compressed texture file package comprising texture features of the animation;
and decompressing the compressed texture file package, binding the animation with the decompressed texture, and rendering the binding result.
6. The method of claim 1, wherein the number of keywords of each sentence is at least one;
the acquiring, from the animation set, of the animations indicated by the animation tags matching the keywords comprises:
sending an image request including the keywords to a server, wherein the server searches the animation set for the animation whose tag has the highest matching degree with the keywords and returns that animation;
and receiving the animation returned by the server as the animation indicated by the matching animation tag.
7. The method of claim 1, wherein the method is applied to a terminal, which is installed with a video application;
the acquiring the text comprises:
in response to the video application being started, acquiring an input URL address through the video application;
and determining a page indicated by the URL address, and analyzing a text from the page.
8. The method according to claim 1, wherein the acquiring and playing, for each sentence in the order of the sentences in the text, of the audio corresponding to the sentence, and the rendering of the animation corresponding to the keyword of the sentence from the acquired animations, comprise:
performing an audio acquisition step: determining, according to the order, at least one sentence whose audio has not yet been acquired, and acquiring the audio corresponding to the at least one sentence;
and extracting, from the acquired animations, the animation corresponding to the keyword of the at least one sentence, rendering the animation, and playing the acquired audio.
9. The method according to claim 8, wherein the acquiring and playing, for each sentence in the order of the sentences in the text, of the audio corresponding to the sentence, and the rendering of the animation corresponding to the keyword of the sentence from the acquired animations, further comprise:
performing the audio acquisition step again, obtaining the audio generated by this execution of the audio acquisition step;
and, for the sentence corresponding to the audio, extracting from the acquired animations the animation corresponding to the keyword of the sentence, rendering the animation, and playing the audio.
10. The method of claim 8 or 9, wherein the method further comprises:
displaying an animation presenting a talking avatar in response to no animation corresponding to the keyword of the at least one sentence being extracted from the acquired animations.
11. The method of claim 10, wherein before the displaying of the animation presenting the talking avatar, the method further comprises:
uploading the at least one sentence to a server, so that the server synthesizes an animation presenting a talking avatar speaking the at least one sentence;
and receiving the animation synthesized by the server.
12. The method of claim 8 or 9, wherein said re-performing an audio acquisition step comprises:
and in response to determining that the audio is played completely and that there are sentences which have not been acquired, executing the audio acquiring step again.
13. The method of claim 1, wherein the obtaining and playing the audio corresponding to the sentence comprises:
and inputting the sentence into the trained audio synthesis model to obtain the audio output from the audio synthesis model, and taking the audio as the audio corresponding to the sentence.
14. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
15. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-13.
CN202110382925.7A 2021-04-09 2021-04-09 Video playing method and device Pending CN112988100A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110382925.7A CN112988100A (en) 2021-04-09 2021-04-09 Video playing method and device

Publications (1)

Publication Number Publication Date
CN112988100A (en) 2021-06-18

Family

ID=76339629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382925.7A Pending CN112988100A (en) 2021-04-09 2021-04-09 Video playing method and device

Country Status (1)

Country Link
CN (1) CN112988100A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
US20180286421A1 (en) * 2017-03-31 2018-10-04 Hong Fu Jin Precision Industry (Shenzhen) Co. Ltd. Sharing method and device for video and audio data presented in interacting fashion
CN107832382A (en) * 2017-10-30 2018-03-23 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on word generation video
CN109344291A (en) * 2018-09-03 2019-02-15 腾讯科技(武汉)有限公司 A kind of video generation method and device
CN110381266A (en) * 2019-07-31 2019-10-25 百度在线网络技术(北京)有限公司 A kind of video generation method, device and terminal

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113891133A (en) * 2021-12-06 2022-01-04 阿里巴巴达摩院(杭州)科技有限公司 Multimedia information playing method, device, equipment and storage medium
WO2023138634A1 (en) * 2022-01-19 2023-07-27 阿里巴巴(中国)有限公司 Virtual human control method, apparatus, device, and storage medium

Similar Documents

Publication Publication Date Title
US10521946B1 (en) Processing speech to drive animations on avatars
CN112131988B (en) Method, apparatus, device and computer storage medium for determining virtual character lip shape
CN112259072A (en) Voice conversion method and device and electronic equipment
JP2021192222A (en) Video image interactive method and apparatus, electronic device, computer readable storage medium, and computer program
US11423907B2 (en) Virtual object image display method and apparatus, electronic device and storage medium
EP3902280A1 (en) Short video generation method and platform, electronic device, and storage medium
CN108846886B (en) AR expression generation method, client, terminal and storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN111862277A (en) Method, apparatus, device and storage medium for generating animation
JP7263660B2 (en) Video processing method, device, electronic device and storage medium
WO2017218038A1 (en) Server-based conversion of autoplay content to click-to-play content
CN112541957A (en) Animation generation method, animation generation device, electronic equipment and computer readable medium
CN112102449A (en) Virtual character generation method, virtual character display device, virtual character equipment and virtual character medium
CN112988100A (en) Video playing method and device
CN111984825A (en) Method and apparatus for searching video
CN111158924A (en) Content sharing method and device, electronic equipment and readable storage medium
WO2019085625A1 (en) Emotion picture recommendation method and apparatus
CN111309200A (en) Method, device, equipment and storage medium for determining extended reading content
US11615714B2 (en) Adaptive learning in smart products based on context and learner preference modes
CN112843681A (en) Virtual scene control method and device, electronic equipment and storage medium
CN112328088A (en) Image presenting method and device
CN113542802B (en) Video transition method and device
CN113542888A (en) Video processing method and device
CN113965798A (en) Video information generating and displaying method, device, equipment and storage medium
CN111524123A (en) Method and apparatus for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination