CN111031373A - Video playing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111031373A
Authority
CN
China
Prior art keywords
video
video stream
user
information
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911336145.8A
Other languages
Chinese (zh)
Inventor
王骎
刘勇
齐萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911336145.8A priority Critical patent/CN111031373A/en
Publication of CN111031373A publication Critical patent/CN111031373A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4882Data services, e.g. news ticker for displaying messages, e.g. warnings, reminders

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a video playing method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of video playing. The specific implementation scheme is as follows: voice information input by a user is acquired during the playing of a first video stream; a target video stream is selected, according to the voice information, from a plurality of second video streams associated with the first video stream, where the first video stream and each second video stream are located on different branches of a target video that adopts a tree structure; and the target video stream is played. The embodiments of the application enrich the ways in which a user can interact with a playing video, strengthen the user's sense of participation while watching, and improve the viewing experience.

Description

Video playing method and device, electronic equipment and computer readable storage medium
Technical Field
The application relates to the field of computer technology, and in particular to video playing technology.
Background
At present, when a user needs to adjust what is being played while watching a video, the adjustment can usually only be made manually, for example by clicking a pop-up selection box. The existing way of adjusting video playing content is therefore limited, and the user experience is poor.
Disclosure of Invention
The embodiments of the application provide a video playing method, a video playing apparatus, an electronic device, and a computer-readable storage medium, so as to solve the problem that the existing way of adjusting video playing content is limited.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a video playing method, including:
acquiring voice information input by a user in the playing process of the first video stream;
selecting a target video stream from a plurality of second video streams associated with the first video stream according to the voice information; the first video stream and each second video stream are positioned on different branches of a target video, and the target video adopts a tree structure;
and playing the target video stream.
Therefore, compared with the prior art, in which the playing content of a video can be adjusted only manually, this enriches the ways in which the user can interact with the playing video, strengthens the user's sense of participation while watching, and improves the viewing experience.
Optionally, the acquiring the voice information input by the user includes:
acquiring mark information of the first video stream; wherein the marking information comprises a video branch condition corresponding to the first video stream;
prompting the user for the video branch condition;
and receiving the voice information input by the user based on the prompt information.
Therefore, the user can conveniently select the required video stream to play.
Optionally, after obtaining the mark information of the first video stream and before selecting the target video stream, the method further includes:
asynchronously acquiring the plurality of second video streams according to the video branch condition;
preloading the plurality of second video streams;
wherein the target video stream is selected from a preloaded plurality of second video streams.
Therefore, by means of the preloading process, the fluency of video switching can be ensured, and therefore after the target video stream is selected based on user input, the currently played video stream can be smoothly switched to the target video stream for playing.
Optionally, the acquiring the voice information input by the user includes:
collecting environmental sound information, wherein the environmental sound information comprises the voice information and environmental noise information;
and inputting the environmental sound information into a pre-trained voice extraction model corresponding to the user to obtain the voice information.
Thus, by means of the pre-trained speech extraction model, the speech information input by the user can be accurately identified from the noisy environment sound.
Optionally, the prompting the user of the video branching condition includes:
prompting the user of the video branch condition in at least one of the following modes:
voice broadcast mode, text display mode.
Therefore, the effect of quickly and accurately prompting the user can be achieved by means of voice broadcasting and/or text display modes.
In a second aspect, an embodiment of the present application provides a video playing apparatus, including:
the first acquisition module is used for acquiring voice information input by a user in the playing process of the first video stream;
a selection module for selecting a target video stream from a plurality of second video streams associated with the first video stream according to the voice information; the first video stream and each second video stream are positioned on different branches of a target video, and the target video adopts a tree structure;
and the playing module is used for playing the target video stream.
Optionally, the first obtaining module includes:
an acquisition unit configured to acquire tag information of the first video stream; wherein the marking information comprises a video branch condition corresponding to the first video stream;
the prompting unit is used for prompting the video branch condition of a user;
and the receiving unit is used for receiving the voice information input by the user based on the prompt information.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain the tag information of the first video stream, and then asynchronously obtain the plurality of second video streams according to the video branching condition;
and the loading module is used for preloading the plurality of second video streams so as to select a target video stream from the preloaded plurality of second video streams.
Optionally, the first obtaining module includes:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring environmental sound information, and the environmental sound information comprises the voice information and environmental noise information;
and the extracting unit is used for inputting the environmental sound information into a pre-trained voice extracting model corresponding to the user to obtain the voice information.
Optionally, the prompting unit is specifically configured to:
prompting the user of the video branch condition in at least one of the following modes:
voice broadcast mode, text display mode.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a video playback method as described above.
In a fourth aspect, the present application further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause the computer to execute the video playing method described above.
One embodiment of the above application has the following advantages or benefits: it enriches the ways in which a user can interact with a playing video, strengthens the user's sense of participation while watching, and improves the viewing experience. The technical means adopted are as follows: according to voice information input by the user, a target video stream is selected from a plurality of second video streams associated with the currently played first video stream, where the first video stream and each second video stream are located on different branches of the target video, and the selected target video stream is played. This solves the technical problem that the existing way of adjusting video playing content is limited, and achieves the technical effect of enriching user interaction with the playing video, enhancing the sense of participation while watching, and improving the viewing experience.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a flowchart of a video playing method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a video uploading and playing process according to an embodiment of the present application;
fig. 3 is a block diagram of a video playback device for implementing a video playback method according to an embodiment of the present application;
fig. 4 is a block diagram of an electronic device for implementing a video playing method according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, fig. 1 is a flowchart of a video playing method provided in an embodiment of the present application, applied to an electronic device, and as shown in fig. 1, the method includes the following steps:
step 101: and acquiring voice information input by a user in the playing process of the first video stream.
In this embodiment, the user may input the voice information in a noisy environment. In order to accurately acquire the voice information input by the user, the acquisition process in step 101 may include: first, collecting environmental sound information, where the environmental sound information includes the voice information input by the user and environmental noise information (such as the sound of the playing video and other ambient noise); and then inputting the collected environmental sound information into a pre-trained voice extraction model corresponding to the user to obtain the voice information.
Understandably, the environmental sound information can be collected by a sound collector such as a microphone. The voice extraction model is mainly used to identify the voice information of the corresponding user and can be understood as a voice filtering module; the voice extraction models of different users are usually different. Thus, by means of the pre-trained voice extraction model, the voice information input by the user can be accurately identified from noisy environmental sound.
In one embodiment, taking user A as an example, the training process of the voice extraction model corresponding to user A may include the following. First, some common voice samples of user A are collected, together with dialogue audio VF containing those voice samples plus noise. Then, the collected voice samples are processed with a trained model for extracting the user's voiceprint features (also called a speaker encoder): a voice sample is input and a feature vector SV is output, with one feature vector SV per voice sample; when a plurality of feature vectors SV are obtained, the average of the L2-regularized SVs can be used as the voiceprint feature PF of user A. In one embodiment, the network used by the voiceprint-feature model may be a three-layer network based on Long Short-Term Memory (LSTM) trained with a Generalized Estimation Equation (GEE) loss; the input speech may take the form of a 1600 ms log-Mel spectrum of the audio, and the output feature vector SV may have a width of X. Finally, based on a pre-constructed basic model, the voiceprint feature PF and the dialogue audio VF are taken as input and the voice sample (i.e. the voice of user A with irrelevant noise removed) as output, and the voice extraction model corresponding to user A is obtained by training. For example, the basic model may be a network of time-dimension masks: a soft mask may be generated and multiplied with the magnitude spectrum of the noisy dialogue audio VF to produce an enhanced magnitude spectrum; the phase of the noisy audio is then attached to the enhanced magnitude spectrum, and the enhanced audio with irrelevant audio removed is obtained by the Inverse Short-Time Fourier Transform (ISTFT).
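The mask-and-ISTFT enhancement step described above can be sketched as follows. This is a minimal illustration only: the soft mask here is a random placeholder standing in for the model's output, and the sample rate, window size, and signal are illustrative assumptions rather than values from the patent.

```python
# Sketch of the time-dimension-mask enhancement step: multiply a soft mask
# with the noisy magnitude spectrum, reattach the noisy phase, invert with ISTFT.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.randn(fs)                 # stand-in for the dialogue audio VF

f, t, spec = stft(noisy, fs=fs, nperseg=400)
magnitude, phase = np.abs(spec), np.angle(spec)

# Placeholder for the mask the trained basic model would output, values in [0, 1].
soft_mask = np.random.rand(*magnitude.shape)

enhanced_spec = (soft_mask * magnitude) * np.exp(1j * phase)  # reattach noisy phase
_, enhanced = istft(enhanced_spec, fs=fs, nperseg=400)        # enhanced audio
```

Because the mask is bounded by 1, each bin of the enhanced magnitude spectrum is no larger than the corresponding noisy bin, which is the attenuating behavior the trained mask is meant to have for non-target audio.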
Step 102: selecting a target video stream from a plurality of second video streams associated with the first video stream according to the voice information.
In this embodiment, the first video stream and each second video stream are located on different branches of the target video, and the target video adopts a tree structure. That the target video adopts a tree structure means that the video stream index of the target video uses a tree storage structure to realize a tree-shaped video playing path. The target video may include video streams on a plurality of different branches, and playback may switch between them, for example from the video stream of the current branch to a video stream of the next-level branch. The video streams of the different branches may be set according to predefined rules.
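The tree-shaped video stream index described above can be sketched as a simple node structure. The class and field names below are illustrative assumptions for exposition, not identifiers from the patent.

```python
# Hypothetical sketch of a tree-shaped video-stream index: each node is one
# branch of the target video, and its children are the next-level branches.
from dataclasses import dataclass, field


@dataclass
class VideoStreamNode:
    stream_id: str
    url: str
    branch_condition: str = ""              # e.g. a plot prompt shown to the user
    children: list["VideoStreamNode"] = field(default_factory=list)

    def add_branch(self, child: "VideoStreamNode") -> "VideoStreamNode":
        self.children.append(child)
        return child


# Root branch plays first; sub-branches correspond to different plot trends.
root = VideoStreamNode("stream_1", "https://example.com/s1.m3u8",
                       branch_condition="Choose ending 1 or ending 2")
ending1 = root.add_branch(VideoStreamNode("stream_2", "https://example.com/s2.m3u8"))
ending2 = root.add_branch(VideoStreamNode("stream_3", "https://example.com/s3.m3u8"))
```

Switching branches then amounts to walking from the current node to one of its children once the user's choice is known.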
For example, when a video uploader uploads a video, the video in the parallel video list to be uploaded can be uploaded in a dragging manner according to a custom rule such as a plot development trend, so as to form a tree-shaped video structure and control the video contents of different branches. For example, in a tree-like video structure, the video content on the root branch is the first playing part, and the video content on different subbranches corresponds to different plot trends.
Optionally, in this embodiment, the target video stream may be selected through recognition of the user's intention. The process of selecting the target video stream in step 102 may be: firstly, performing intention recognition on voice information input by a user to obtain the intention of the user; then, a target video stream is selected from a plurality of second video streams associated with the first video stream according to the user intent. When the intention recognition is performed, the voice information to be recognized (i.e. the voice information input by the user) can be input into the intention recognition model trained in advance, and the user intention matched with the voice information can be obtained. For the training process of the intention recognition model, an existing method may be adopted, and this embodiment does not limit this. Therefore, through the identification of the user intention, the video stream of the corresponding branch can be dynamically switched to be played according to the user intention, and the video playing of the personalized visual angle is presented to the user.
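The selection step can be sketched as follows. Simple keyword matching stands in here for the pre-trained intention recognition model the embodiment mentions; the branch table and phrases are made-up examples.

```python
# Illustrative selection of a target stream from the second video streams:
# match the recognized user intent text against per-branch keywords.
from typing import Optional


def select_target_stream(intent_text: str,
                         branches: dict[str, list[str]]) -> Optional[str]:
    """branches maps stream_id -> keywords describing that branch."""
    intent = intent_text.lower()
    for stream_id, keywords in branches.items():
        if any(kw in intent for kw in keywords):
            return stream_id
    return None  # no branch matched; keep playing the current stream


branches = {
    "stream_2": ["ending 1", "happy"],
    "stream_3": ["ending 2", "sad"],
}
choice = select_target_stream("play the happy ending", branches)
```

A real system would replace the keyword table with the intent model's output, but the control flow (recognized intent in, branch stream id out) is the same.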
Step 103: and playing the target video stream.
In one embodiment, taking the target video including video stream 1, video stream 2 and video stream 3 as an example, assuming that video stream 1 is the beginning part of the target video and is the description part of event a, and video stream 2 and video stream 3 are the video parts respectively associated with video stream 1 and located in the next branch, and represent different ends of event a, during the playing of video stream 1, one of video stream 2 and video stream 3 can be selected for playing by means of the voice information input by the user.
In another embodiment, taking the target video as a sports game video as an example: the game video includes a video stream 0 stored at the root node (here, a summary of the game), and a video stream 3 and a video stream 4 stored at different child nodes, where video stream 3 is the game content narrated in language 1 and video stream 4 is the game content narrated in language 2. During the playing of video stream 0, if the user wishes to hear language 1, video stream 3 can be played by inputting corresponding voice information, such as "watch video stream 3"; if the user wishes to hear language 2, video stream 4 can be played by inputting corresponding voice information, such as "watch video stream 4".
According to the video playing method, the target video stream can be selected from the plurality of second video streams associated with the currently played first video stream according to the voice information input by the user, the first video stream and each second video stream are located on different branches of the target video, and the selected target video stream is played. Therefore, compared with the prior art that the playing content of the video can be adjusted only in a manual mode, the interactive mode of the user and the playing video can be enriched, the participation sense of the user when watching the video is enhanced, and the watching experience of the user is improved.
In this embodiment, in order to facilitate the user to select a desired video stream for playing, the video stream in the target video may be marked, and the marking information includes indication information indicating that the video stream is a part of the interactive video, a video branch condition corresponding to the video stream, and the like, so as to obtain the marking information in the playing process of the video stream and prompt based on the marking information. The video branching condition may be set based on a user requirement, for example, a scenario branching condition or a video version branching condition, which is not limited in this embodiment.
Optionally, the process of acquiring the voice information input by the user in step 101 may include:
acquiring mark information of the first video stream; wherein the marking information comprises a video branch condition corresponding to the first video stream;
Prompting the user for the video branch condition;
and receiving the voice information input by the user based on the prompt information.
In a specific implementation process, when prompting the corresponding video branch condition, at least one of a voice broadcasting mode and a text display mode can be adopted for prompting. Therefore, the effect of quickly and accurately prompting the user can be achieved.
In one embodiment, take a target video that includes a video stream 1, a video stream 2, and a video stream 3, where video stream 1 is the beginning portion of the target video and describes an event A, and the video branching condition of video stream 1 is a scenario branching condition. The scenario branching condition is that video stream 1 has two associated video portions on the next-level branch, namely video stream 2 and video stream 3, which represent ending 1 and ending 2 of event A respectively. During the playing of video stream 1, the terminal may obtain the scenario branching condition of video stream 1 and prompt it, so that the user can select, based on the prompt information, whether to play video stream 2 or video stream 3.
Further, after obtaining the mark information of the first video stream, before selecting the target video stream, the method may further include: and asynchronously acquiring the plurality of second video streams according to the video branch condition, and preloading the plurality of second video streams. Thereafter, a target video stream may be selected from the preloaded plurality of second video streams. Therefore, by means of the preloading process, the fluency of video switching can be ensured, and therefore after the target video stream is selected based on user input, the currently played video stream can be smoothly switched to the target video stream for playing.
In one embodiment, to ensure the fluency of dynamic video switching, after the video stream of the parent node (i.e. the parent branch) is played to a certain progress, all the video streams of the corresponding child nodes (i.e. the child branches) may be asynchronously pulled for preloading, so that a target video stream may be selected from the preloaded video streams based on the voice information input by the user, and smoothly switched to the target video stream for playing.
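The asynchronous preload can be sketched as below. `fetch_stream` is a made-up stand-in for the real media pull; the point of the sketch is that all child-branch streams are fetched concurrently once the parent branch reaches its trigger point, so the later switch does not stall.

```python
# Hypothetical sketch of asynchronously pulling all child-node video streams
# for preloading, so the selected one can be switched to smoothly.
import asyncio


async def fetch_stream(stream_id: str) -> bytes:
    await asyncio.sleep(0)                  # placeholder for network I/O
    return f"data:{stream_id}".encode()


async def preload_children(child_ids: list[str]) -> dict[str, bytes]:
    # gather() starts all fetches concurrently rather than one after another.
    payloads = await asyncio.gather(*(fetch_stream(s) for s in child_ids))
    return dict(zip(child_ids, payloads))


cache = asyncio.run(preload_children(["stream_2", "stream_3"]))
```

After the user's voice input is resolved to a target stream, playback reads from `cache` instead of starting a fresh network fetch.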
In addition, in order to facilitate the user to select a desired video stream for playing, the user may be prompted to perform an input operation based on the preset branch node in this embodiment. That is, before acquiring the voice information input by the user, the method further includes: detecting whether a target video is played to a preset branch node or not; and prompting a user to execute the operation of inputting voice information under the condition that the target video is played to the preset branch node. It should be noted that the predetermined branch node may represent a switching point between video streams of different branches. For example, taking the target video including video stream 1, video stream 2 and video stream 3 as an example, assuming that video stream 1 is a beginning portion of the target video and is a description portion of event a, and video stream 2 and video stream 3 are video portions at different branches respectively associated with video stream 1 and represent different ends of event a, the preset branch node may be selected as a branch end point corresponding to video stream 1.
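The branch-node check above can be sketched as a simple progress comparison. The times, tolerance, and function name are illustrative assumptions, not values from the patent.

```python
# Illustrative detection of the preset branch node: once playback progress is
# within a small tolerance of the branch end point of the current stream, the
# player prompts the user to input voice information.

def should_prompt(position_s: float, branch_point_s: float,
                  tolerance_s: float = 0.5) -> bool:
    """True once playback is within tolerance_s seconds of the branch point."""
    return position_s >= branch_point_s - tolerance_s


branch_point = 120.0                        # e.g. the branch end point of video stream 1
```

In practice the player would poll `should_prompt` on each progress tick and, on the first True, issue the voice-broadcast or text prompt described earlier.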
The video uploading and playing process in the embodiment of the present application is described below with reference to fig. 2.
In the embodiment of the present application, as shown in fig. 2, for a video uploading end, a video uploading person (or a video creator) may record a video at first, and then edit the recorded video, for example, set video stream contents of different branches of the video according to a user-defined rule, and store a video stream index by using a tree-shaped storage structure, so as to implement a tree-shaped video playing path, and upload the edited video to a dynamic streaming media system. For a video viewer, the speech extraction model may be trained first, such as using the method described above; then, if the user carries out voice input when watching the video, acquiring environmental sound information comprising the voice information of the user, and extracting the voice information of the user in the environmental sound information by utilizing a pre-trained voice extraction model to obtain the voice information of the user; finally, intention recognition is carried out on the voice information of the user to obtain the intention of the user, a video stream scheduling strategy is determined according to the intention of the user by means of a dynamic streaming media system to select the target video stream, and the selected target video stream is pushed to a video watching end to be played.
Referring to fig. 3, fig. 3 is a block diagram of a video playing apparatus for implementing a video playing method according to an embodiment of the present application, and as shown in fig. 3, the video playing apparatus 30 includes:
a first obtaining module 31, configured to obtain voice information input by a user in a playing process of a first video stream;
a selection module 32, configured to select a target video stream from a plurality of second video streams associated with the first video stream according to the voice information; the first video stream and each second video stream are positioned on different branches of a target video, and the target video adopts a tree structure;
and a playing module 33, configured to play the target video stream.
Optionally, the first obtaining module includes:
an acquisition unit configured to acquire tag information of the first video stream; wherein the marking information comprises a video branch condition corresponding to the first video stream;
the prompting unit is used for prompting the video branch condition of a user;
and the receiving unit is used for receiving the voice information input by the user based on the prompt information.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain the tag information of the first video stream, and then asynchronously obtain the plurality of second video streams according to the video branching condition;
and the loading module is used for preloading the plurality of second video streams so as to select a target video stream from the preloaded plurality of second video streams.
Optionally, the first obtaining module includes:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring environmental sound information, and the environmental sound information comprises the voice information and environmental noise information;
and the extracting unit is used for inputting the environmental sound information into a pre-trained voice extracting model corresponding to the user to obtain the voice information.
Optionally, the prompting unit is specifically configured to:
prompting the user of the video branch condition in at least one of the following modes:
voice broadcast mode, text display mode.
It can be understood that the video playing apparatus 30 according to the embodiment of the present application can implement each process implemented in the method embodiment shown in fig. 1 and achieve the same beneficial effects, and for avoiding repetition, details are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 4 is a block diagram of an electronic device for implementing a video playing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors 401, a memory 402, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 4, one processor 401 is taken as an example.
Memory 402 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to execute the video playing method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the video playback method provided by the present application.
The memory 402, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the video playing method in the embodiment of the present application (e.g., the first obtaining module 31, the selecting module 32, and the playing module 34 shown in fig. 3). The processor 401 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 402, that is, implements the video playing method in the above-described method embodiment.
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function, and the storage data area may store data created according to the use of the electronic device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401; such remote memory may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the video playing method may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the video playing method; such input devices include, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 404 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solutions of the embodiments of the present application, the ways in which a user can interact with a playing video are enriched, the user's sense of participation while watching the video is enhanced, and the technical effect of improving the user's viewing experience is achieved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A video playback method, comprising:
acquiring voice information input by a user in the playing process of the first video stream;
selecting a target video stream from a plurality of second video streams associated with the first video stream according to the voice information; the first video stream and each second video stream are positioned on different branches of a target video, and the target video adopts a tree structure;
and playing the target video stream.
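For illustration only, and not as part of the claimed subject matter, the flow of claim 1 may be sketched as follows. All names, keywords, and stream URLs here are hypothetical; the sketch assumes the recognized speech has already been converted to text and that each branch of the tree-structured target video is keyed by a keyword:

```python
from dataclasses import dataclass, field

@dataclass
class VideoNode:
    """One branch of a tree-structured target video (hypothetical)."""
    stream_url: str
    prompt: str = ""                               # branch condition shown to the user
    children: dict = field(default_factory=dict)   # keyword -> VideoNode

def select_branch(node: VideoNode, voice_text: str):
    """Select the child (second video stream) whose keyword appears in the
    recognized speech; return None when no branch condition is met."""
    for keyword, child in node.children.items():
        if keyword in voice_text:
            return child
    return None

# A two-branch story where the viewer says "left" or "right".
root = VideoNode("intro.m3u8", "Say 'left' or 'right'", {
    "left": VideoNode("left.m3u8"),
    "right": VideoNode("right.m3u8"),
})

target = select_branch(root, "let's go left please")   # -> the "left" branch
```

A real implementation would feed `target.stream_url` to the player and repeat the selection at every branch point of the tree.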
2. The method of claim 1, wherein the obtaining the voice information input by the user comprises:
acquiring marking information of the first video stream; wherein the marking information comprises a video branch condition corresponding to the first video stream;
prompting the user for the video branch condition;
and receiving the voice information input by the user based on the prompt information.
3. The method of claim 2, wherein after the obtaining the tag information of the first video stream and before the selecting the target video stream, the method further comprises:
asynchronously acquiring the plurality of second video streams according to the video branch condition;
preloading the plurality of second video streams;
wherein the target video stream is selected from a preloaded plurality of second video streams.
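The asynchronous acquisition and preloading of claim 3 may be sketched, for illustration only, with a thread pool that fetches every candidate branch while the first video stream is still playing. `fetch_stream` is a hypothetical stand-in for downloading the opening segments of a stream:

```python
from concurrent.futures import Future, ThreadPoolExecutor

def fetch_stream(url: str) -> bytes:
    # Stand-in for an HTTP fetch of the first segments of a branch stream.
    return f"segments-of-{url}".encode()

def preload_branches(urls: list[str]) -> dict[str, Future]:
    """Kick off downloads of every candidate second video stream without
    blocking playback of the first video stream."""
    pool = ThreadPoolExecutor(max_workers=4)
    return {url: pool.submit(fetch_stream, url) for url in urls}

futures = preload_branches(["left.m3u8", "right.m3u8"])
# Later, once the user's voice input selects a branch, only that future is consumed,
# so the chosen branch can start without a loading gap:
chosen = futures["left.m3u8"].result()
```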
4. The method of claim 1, wherein the obtaining the voice information input by the user comprises:
collecting environmental sound information, wherein the environmental sound information comprises the voice information and environmental noise information;
and inputting the environmental sound information into a pre-trained voice extraction model corresponding to the user to obtain the voice information.
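Claim 4 relies on a pre-trained voice extraction model, whose internals the application does not specify. As a crude illustrative stand-in only (not the claimed model), frame-energy gating can separate a louder speech burst from steady environmental noise:

```python
import numpy as np

def extract_voice(ambient: np.ndarray, frame: int = 256, factor: float = 2.0) -> np.ndarray:
    """Hypothetical stand-in for the learned model: keep only frames whose
    energy rises well above the estimated noise floor; zero out the rest."""
    n = len(ambient) // frame * frame
    frames = ambient[:n].reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)
    floor = np.median(energy)            # rough noise-floor estimate
    mask = energy > factor * floor       # frames that likely contain speech
    return (frames * mask[:, None]).ravel()

# Synthetic check: low-level noise with one loud "speech" burst in the middle.
rng = np.random.default_rng(0)
ambient = 0.01 * rng.standard_normal(2048)
ambient[768:1024] += np.sin(np.linspace(0, 40 * np.pi, 256))
voice = extract_voice(ambient)           # noise-only frames are suppressed
```

A production system would instead run the environmental sound through the user-specific trained model the claim refers to; the gating above only illustrates the input/output shape of such a step.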
5. The method of claim 2, wherein said prompting the user for the video branch condition comprises:
prompting the user of the video branch condition in at least one of the following modes:
voice broadcast mode, text display mode.
6. A video playback apparatus, comprising:
the first acquisition module is used for acquiring voice information input by a user in the playing process of the first video stream;
a selection module for selecting a target video stream from a plurality of second video streams associated with the first video stream according to the voice information; the first video stream and each second video stream are positioned on different branches of a target video, and the target video adopts a tree structure;
and the playing module is used for playing the target video stream.
7. The apparatus of claim 6, wherein the first obtaining module comprises:
an acquisition unit configured to acquire tag information of the first video stream; wherein the marking information comprises a video branch condition corresponding to the first video stream;
the prompting unit is used for prompting the user of the video branch condition;
and the receiving unit is used for receiving the voice information input by the user based on the prompt information.
8. The apparatus of claim 7, further comprising:
a second obtaining module, configured to obtain the tag information of the first video stream, and then asynchronously obtain the plurality of second video streams according to the video branching condition;
and the loading module is used for preloading the plurality of second video streams so as to select a target video stream from the preloaded plurality of second video streams.
9. The apparatus of claim 6, wherein the first obtaining module comprises:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring environmental sound information, and the environmental sound information comprises the voice information and environmental noise information;
and the extracting unit is used for inputting the environmental sound information into a pre-trained voice extracting model corresponding to the user to obtain the voice information.
10. The apparatus according to claim 7, wherein the prompting unit is specifically configured to:
prompting the user of the video branch condition in at least one of the following modes:
voice broadcast mode, text display mode.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN201911336145.8A 2019-12-23 2019-12-23 Video playing method and device, electronic equipment and computer readable storage medium Pending CN111031373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911336145.8A CN111031373A (en) 2019-12-23 2019-12-23 Video playing method and device, electronic equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN111031373A true CN111031373A (en) 2020-04-17

Family

ID=70211536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911336145.8A Pending CN111031373A (en) 2019-12-23 2019-12-23 Video playing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111031373A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472456A (en) * 2015-11-27 2016-04-06 北京奇艺世纪科技有限公司 Video playing method and device
US20180070143A1 (en) * 2016-09-02 2018-03-08 Sony Corporation System and method for optimized and efficient interactive experience
US20180296916A1 (en) * 2017-04-14 2018-10-18 Penrose Studios, Inc. System and method for spatial and immersive computing
CN108769814A (en) * 2018-06-01 2018-11-06 腾讯科技(深圳)有限公司 Video interaction method, device and readable medium
CN109788350A (en) * 2019-01-18 2019-05-21 北京睿峰文化发展有限公司 It is a kind of that the seamless method and apparatus continuously played are selected based on video display plot
CN109982142A (en) * 2017-12-28 2019-07-05 优酷网络技术(北京)有限公司 Video broadcasting method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809175A (en) * 2019-09-27 2020-02-18 腾讯科技(深圳)有限公司 Video recommendation method and device
CN112687275A (en) * 2020-12-25 2021-04-20 北京中科深智科技有限公司 Voice filtering method and filtering system
CN113423003A (en) * 2021-06-10 2021-09-21 山东云缦智能科技有限公司 Method for playing interactive video
CN113901190A (en) * 2021-10-18 2022-01-07 深圳追一科技有限公司 Man-machine interaction method and device based on digital human, electronic equipment and storage medium
CN114466201A (en) * 2022-02-21 2022-05-10 上海哔哩哔哩科技有限公司 Live stream processing method and device
CN114466201B (en) * 2022-02-21 2024-03-19 上海哔哩哔哩科技有限公司 Live stream processing method and device
CN114979770A (en) * 2022-06-28 2022-08-30 北京爱奇艺科技有限公司 Video playing method and device, electronic equipment and storage medium
CN114979770B (en) * 2022-06-28 2024-02-02 北京爱奇艺科技有限公司 Video playing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111031373A (en) Video playing method and device, electronic equipment and computer readable storage medium
CN110933487B (en) Method, device and equipment for generating click video and storage medium
CN112131988B (en) Method, apparatus, device and computer storage medium for determining virtual character lip shape
CN110751940B (en) Method, device, equipment and computer storage medium for generating voice packet
CN110085244B (en) Live broadcast interaction method and device, electronic equipment and readable storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN112365877A (en) Speech synthesis method, speech synthesis device, electronic equipment and storage medium
CN111726682B (en) Video clip generation method, device, equipment and computer storage medium
WO2022000983A1 (en) Video processing method and apparatus, and electronic device and storage medium
CN111177453A (en) Method, device and equipment for controlling audio playing and computer readable storage medium
CN110647617B (en) Training sample construction method of dialogue guide model and model generation method
CN111538862A (en) Method and device for explaining video
CN111935502A (en) Video processing method, video processing device, electronic equipment and storage medium
CN112000781A (en) Information processing method and device in user conversation, electronic equipment and storage medium
CN111158924A (en) Content sharing method and device, electronic equipment and readable storage medium
CN112269867A (en) Method, device, equipment and storage medium for pushing information
CN112530419A (en) Voice recognition control method and device, electronic equipment and readable storage medium
CN112581933B (en) Speech synthesis model acquisition method and device, electronic equipment and storage medium
CN111883101B (en) Model training and speech synthesis method, device, equipment and medium
CN110674338B (en) Voice skill recommendation method, device, equipment and storage medium
CN105162839A (en) Data processing method, data processing device and data processing system
CN111970560A (en) Video acquisition method and device, electronic equipment and storage medium
CN111638787A (en) Method and device for displaying information
CN114422844B (en) Barrage material generation method, recommendation method, device, equipment, medium and product
CN111653263B (en) Volume adjusting method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20200417)