CN113891156A

CN113891156A - Video playing method, video playing device, electronic equipment, storage medium and program product

Info

Publication number: CN113891156A
Application number: CN202111332449.4A
Authority: CN
Inventors: 马宏伟; 田晓丽; 王文浩; 王心杰; 曹洪伟
Original assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2022-01-04

Abstract

The present disclosure provides a video playing method, apparatus, electronic device, storage medium and program product, which relate to the technical field of artificial intelligence, and in particular to the technical fields of intelligent interaction, NLP, computer vision, voice technology, intelligent recommendation, and the like. The specific implementation scheme is as follows: during playback of a first video, determining a scene that appears in a plurality of current consecutive shots; determining at least one video segment in the first video that is associated with the scene; and performing plot perspective on the at least one video segment.

Description

Video playing method, video playing device, electronic equipment, storage medium and program product

Technical Field

The present disclosure relates to the field of artificial intelligence technology, specifically to the technical fields of intelligent interaction, NLP, computer vision, speech technology, intelligent recommendation, etc., and can be applied to scenes such as multimedia playing.

Background

With the continuous development of artificial intelligence technology, intelligent interaction, especially voice intelligent interaction, is more and more widely applied to intelligent equipment. At present, enriching the voice interaction function of an intelligent terminal to meet the video watching requirements of different users becomes a problem to be solved urgently in the industry.

Disclosure of Invention

The present disclosure provides a video playing method, apparatus, device, storage medium and computer program product.

According to an aspect of the present disclosure, there is provided a video playing method, including: during playback of a first video, determining a scene that appears in a plurality of current consecutive shots; determining at least one video segment in the first video that is associated with the scene; and performing plot perspective on the at least one video segment.

According to another aspect of the present disclosure, there is provided a video playing apparatus including a scene determining module, a video segment determining module and a plot perspective module. The scene determining module is configured to determine, during playback of a first video, a scene that appears in a plurality of current consecutive shots; the video segment determining module is configured to determine at least one video segment in the first video that is associated with the scene; and the plot perspective module is configured to perform plot perspective on the at least one video segment.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

Fig. 1 illustrates a system architecture of a video playing method and apparatus suitable for embodiments of the present disclosure;

Fig. 2A illustrates a flow chart of a video playback method according to an embodiment of the present disclosure;

Fig. 2B illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 2C illustrates a schematic diagram of a video playing method according to another embodiment of the present disclosure;

Fig. 2D illustrates a schematic diagram of a video playing method according to yet another embodiment of the present disclosure;

Fig. 2E illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 2F illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 2G illustrates a block diagram of a video playback device according to an embodiment of the present disclosure;

Fig. 3A illustrates a flow chart of a video playback method according to another embodiment of the present disclosure;

Fig. 3B illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 3C illustrates a schematic diagram of a video playing method according to another embodiment of the present disclosure;

Fig. 3D illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 3E illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 3F illustrates a block diagram of a video playback device according to another embodiment of the present disclosure;

Fig. 4A illustrates a flow chart of a video playback method according to yet another embodiment of the present disclosure;

Fig. 4B illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 4C illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 4D illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 4E illustrates a block diagram of a video playback device according to yet another embodiment of the present disclosure;

Fig. 5A illustrates a flow chart of a video playback method according to yet another embodiment of the present disclosure;

Fig. 5B illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 5C illustrates a schematic diagram of a video playing method according to another embodiment of the present disclosure;

Fig. 5D illustrates a schematic diagram of a video playing method according to still another embodiment of the present disclosure;

Fig. 5E illustrates a block diagram of a video playback device according to yet another embodiment of the present disclosure;

Fig. 6A illustrates a flow chart of a video playback method according to yet another embodiment of the present disclosure;

Fig. 6B illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 6C illustrates a schematic diagram of a video playing method according to an embodiment of the present disclosure;

Fig. 7 illustrates a block diagram of an electronic device for implementing a video playing method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Video refers to content presented in a multimedia format by capturing, recording, processing, storing, transmitting, and reproducing a series of still images as electrical signals. Because video combines sound, pictures and the like, it offers a better communication effect and viewing experience than traditional formats such as text.

At present, video is developing rapidly and its content is increasingly diverse. Intelligent terminals are also developing rapidly and support video playing, recording, processing and the like, so video has become one of the mainstream means of communication and attracts a large audience.

When the intelligent terminal is used for playing videos, the following situations generally occur:

1) When a TV series or the like is played on the intelligent terminal, by default each episode is played continuously in its entirety. The current playing progress can be displayed below the video, and the user can selectively watch specific content by manually adjusting the progress bar.

In some cases, users would rather watch certain segments of a video than the full version. For example, a TV series typically includes multiple episodes, each about 40 minutes long; a user with limited time cannot watch the whole series, or the user may be mainly interested in the character played by a particular actor in the series.

2) The plot of a video such as a TV series keeps changing. While watching the video, or after watching it continuously for a while, a user may want to know some information about what is currently playing, such as which episode is playing, who certain characters in the current video are, or who sings the music currently playing (including interlude songs, opening themes, ending themes, etc.).

3) Because of the specific background of some videos, a user may not understand the plot and cannot quickly become immersed in the video, which impairs the viewing experience. For example, in a period drama, the era in which the plot takes place is far removed from the user's own time, so the language habits, living habits and the like presented in the drama differ from those of the user. When watching such videos, the user finds it difficult to understand the corresponding context.

4) While watching a video, a user may be interrupted by other things, for example talking to someone else or temporarily leaving the screen. In such cases, the user can only pause the current video manually.

5) After finishing the current video, the user is typically recommended other videos of a similar type, but the user may also want to learn more about other types of works that appear in, or are related to, the content of the current video.

The present disclosure will be described in detail below with reference to the drawings and specific embodiments.

A system architecture of a video playing method and apparatus suitable for the embodiments of the present disclosure is introduced as follows.

Fig. 1 illustrates a system architecture of a video playing method and apparatus suitable for the embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be used in other environments or scenarios.

As shown in fig. 1, the system architecture 100 in the embodiment of the present disclosure may include terminal devices 101A, 101B, 101C, a network 102, and a server 103. The network 102 serves as a medium for providing communication links between the terminal devices 101A, 101B, 101C and the server 103. The network 102 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 101A, 101B, 101C to interact with the server 103 over the network 102 to receive or send messages or the like. Various messaging client applications, such as a video player application, a web browser application, a search-type application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the terminal devices 101A, 101B, 101C.

The terminal devices 101A, 101B, 101C may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 103 may be a server that provides various services, such as a background management server (for example only) that provides support for websites browsed by users using the terminal devices 101A, 101B, 101C. The background management server can analyze and process received data such as user requests and feed back the processing result to the terminal device.

It should be noted that the video playing method provided by the embodiment of the present disclosure may generally be executed by the server 103. Accordingly, the video playing apparatus provided by the embodiment of the present disclosure may generally be deployed in the server 103. The video playing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 103 and is capable of communicating with the terminal devices 101A, 101B, 101C (and/or the server 103). Accordingly, the video playing apparatus provided in the embodiments of the present disclosure may also be deployed in a server or a server cluster different from the server 103 and capable of communicating with the terminal devices 101A, 101B, 101C (and/or the server 103).

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

According to an embodiment of the present disclosure, a video playing method is provided, which can implement a series-viewing assistant function during video playback.

As shown in fig. 2A, a video playing method 200 according to an embodiment of the present disclosure includes performing operations S210 to S220 in the process of playing a first video.

In operation S210, at least one video clip is determined according to the intention of the current user.

In operation S220, the playing of the first video is paused, and the following operations are performed: sequentially playing each video clip in at least one video clip according to a time sequence; or one or more video clips in the at least one video clip are played according to the selection of the current user.

It should be understood that in the embodiments of the present disclosure, the first video may be a relatively complete video file, such as an episode in a television series, for example, a first episode of a television series. The at least one video segment may be one or more video segments captured from the first video, or one or more video segments captured from other videos different from the first video, and the embodiment of the present disclosure is not limited herein.

In operation S210, the first video may be understood as a relatively complete, longer video, such as the video file corresponding to one complete episode; the first video may be, for example, a movie or an episode of a TV series. The first video includes at least one video segment. The user's intention may be understood as the user's viewing preference for the first video, e.g. a preference for content associated with a certain actor or for content associated with a certain character. Each video segment may be understood as a part of the video content of the complete video file, and when the first video includes a plurality of video segments, the video segments may be contiguous in time or separated in time.

In operation S220, at least one video segment has been determined according to the intention of the current user, and the determined video segment or segments are then played. Because the video segment or segments may be part of the first video, the first video and the segments are not played simultaneously; instead, the playing of the first video is paused first, and then the segments are played. In addition, when the at least one video segment includes a plurality of video segments, they may be played sequentially in time order or in an order designated by the current user. The time order may be understood as the order, from earliest to latest, in which the video segments appear in the first video.
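As a minimal illustrative sketch of the ordering logic in operation S220 (not the implementation of the disclosure), the following Python code assumes a hypothetical `Clip` record and a hypothetical helper `playback_order`; neither name comes from the disclosure. It returns the segments in time order when the user makes no selection, and in the user's designated order otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Clip:
    """A video segment inside a longer video (hypothetical record)."""
    clip_id: str
    start_s: float  # start offset within the source video, in seconds
    end_s: float    # end offset within the source video, in seconds


def playback_order(clips: Sequence[Clip],
                   selected_ids: Optional[Sequence[str]] = None) -> List[Clip]:
    """Return the clips to play once the first video has been paused.

    Without a selection, every clip is played in time order.
    With a selection, only the chosen clips are played, in the order given.
    """
    if selected_ids is None:
        return sorted(clips, key=lambda c: c.start_s)
    by_id = {c.clip_id: c for c in clips}
    return [by_id[i] for i in selected_ids if i in by_id]


# Example data (assumed): three clips listed out of time order.
clips = [Clip("P2", 600.0, 660.0), Clip("P1", 120.0, 180.0), Clip("P3", 1500.0, 1590.0)]
default_order = [c.clip_id for c in playback_order(clips)]
chosen_order = [c.clip_id for c in playback_order(clips, ["P3", "P1"])]
print(default_order)
print(chosen_order)
```

Unknown identifiers in the selection are silently ignored in this sketch; an actual system might instead ask the user to choose again.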

It should be understood that, because of personal preference or external factors (for example, not having enough time to watch the complete first video), some users would rather watch one or more particular video segments than the complete first video. The technical solution of the embodiment of the present disclosure can determine and play at least one video segment that the user wants to watch according to the current user's intention, that is, control the terminal device to play the corresponding video based on that intention. Because the user's intention reflects the user's preferences and needs, the technical solution can accommodate personal preference or external factors and select for playback the video segments that meet the user's actual viewing needs. Meanwhile, watching only one or more short video segments takes less time than watching the entire first video, which meets the needs of users who are short of time.

It should be noted that, in the implementation of the present disclosure, one or more small video segments may be cut from each large video (e.g., the first video) in advance, and each small video segment may be labeled with an associated label, where the label of the video segment is used as an identifier of the video segment, and the core content or the subject content of the associated video segment may be described by one or two keywords. Further, after the preprocessing, in the video playing process, once the intention of the current user is found to match with a certain label or several labels, at least one video clip matching with the label or the several labels can be obtained to meet the watching requirement of the user.
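The label-based preprocessing and matching described above might be sketched as follows. This is only an illustration under stated assumptions: the table `SEGMENT_LABELS`, the label strings, and the functions `build_label_index` and `match_segments` are hypothetical and do not appear in the disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical preprocessing output: segment id -> labels attached when the segment was cut.
SEGMENT_LABELS: Dict[str, Set[str]] = {
    "P1": {"actor X"},
    "P2": {"actor X", "palace"},
    "P3": {"actor Y"},
}


def build_label_index(segment_labels: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the mapping so that each label points at the segments carrying it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for segment_id, labels in segment_labels.items():
        for label in labels:
            index[label].add(segment_id)
    return index


def match_segments(index: Dict[str, Set[str]], intent_labels: Iterable[str]) -> List[str]:
    """Return ids of segments that carry at least one label matched by the intention."""
    matched: Set[str] = set()
    for label in intent_labels:
        matched |= index.get(label, set())
    return sorted(matched)


index = build_label_index(SEGMENT_LABELS)
print(match_segments(index, ["actor X"]))
```

Building the inverted index once during preprocessing keeps the lookup at playback time to a few dictionary accesses per label.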

In one embodiment, the video playing method may further include performing the following operations during the playing of the first video.

In response to obtaining the voice instruction issued by the current user, the intent of the current user is determined based on the voice instruction.

For example, as shown in fig. 2B, a voice instruction issued by the current user may be obtained, and by parsing the voice instruction, the intention of the current user may be determined, and then at least one video segment matching the intention of the user may be determined according to the intention of the user, where fig. 2B exemplarily shows three video segments matching the current intention of the user, which are: video clip P1, video clip P2, and video clip P3. Subsequently, according to the selection of the user or according to a default mode, the playing of the first video (the video currently playing) can be paused, and simultaneously, the corresponding playing flow of the video clips is started.

In the technical solution of the embodiment of the present disclosure, a user can control video playing through voice instructions. It should be understood that, compared with manual control, voice control is less constrained by distance and by the object being operated. Therefore, the user can control video playing more conveniently through voice interaction, and this control mode does not interfere with watching the video.

It should be noted that, determining the current user intention based on the voice instruction may be to recognize the received voice instruction, convert the voice instruction into a text instruction, and perform semantic analysis based on text information in the text instruction to determine the user intention expressed by the voice instruction.
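The text-to-intention step might look like the following toy sketch. It assumes that speech recognition has already produced the text instruction (no recognizer is shown), and it substitutes a simple keyword rule for real semantic analysis; the `Intent` record and `parse_intent` function are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    kind: str    # "actor", "character" or "scene"
    target: str  # the name mentioned by the user, lower-cased


def parse_intent(text: str) -> Optional[Intent]:
    """Toy semantic analysis: find '<kind> <name>' in the normalized text instruction."""
    # Collapse whitespace, lower-case, and drop trailing punctuation.
    normalized = " ".join(text.lower().split()).rstrip(".!?")
    match = re.search(r"\b(actor|character|scene)\s+(.+)$", normalized)
    if match is None:
        return None  # no recognizable viewing intention
    return Intent(kind=match.group(1), target=match.group(2))


print(parse_intent("Only see the segments of actor X"))
print(parse_intent("What time is it?"))
```

A production system would more plausibly use a trained natural language understanding model; the rule above only makes the data flow from text to structured intention concrete.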

It should be further noted that, in the embodiment of the present disclosure, the intelligent terminal with a voice interaction function may be used to perform video playing, so that, in the video playing process, a user may perform voice interaction with the intelligent terminal to convey a user intention of the user, and then the server may select and control the intelligent terminal to play a corresponding video clip according to the intention conveyed by the user, so as to meet the watching requirement of the user.

Illustratively, according to the video playing method of the embodiment of the present disclosure, the at least one video segment may include at least one of the following: a scenario segment regarding a specified actor, a scenario segment regarding a specified character, a scenario segment regarding a specified scene.

The user's intention can be roughly classified along three dimensions: specified actors, specified characters, and specified scenes. In the technical solution of the embodiment of the present disclosure, video segments are distinguished along these three dimensions, so that the user's viewing needs can be matched accurately.

Illustratively, the scenario segment about a specified actor may be a segment including one specified actor, or a segment including a plurality of specified actors; in the latter case, the specified actors all appear in the same video segment. Likewise, the scenario segment about a specified character may be a segment including one specified character, or a segment including a plurality of specified characters; in the latter case, the specified characters all appear in the same video segment.
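The requirement that several specified actors appear in the same video segment can be expressed as a subset test. The sketch below assumes a hypothetical annotation table `SEGMENT_CAST` and a hypothetical function `segments_with_all`; both are illustrative and not taken from the disclosure.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical annotation: which actors appear in each pre-cut segment.
SEGMENT_CAST: Dict[str, FrozenSet[str]] = {
    "P1": frozenset({"actor X"}),
    "P2": frozenset({"actor X", "actor Y"}),
    "P3": frozenset({"actor Y"}),
}


def segments_with_all(cast_by_segment: Dict[str, FrozenSet[str]],
                      wanted: Iterable[str]) -> List[str]:
    """Keep only the segments in which every specified actor appears together."""
    wanted_set = frozenset(wanted)
    return sorted(
        segment_id
        for segment_id, cast in cast_by_segment.items()
        if wanted_set <= cast  # subset test: all wanted actors are in this segment
    )


print(segments_with_all(SEGMENT_CAST, ["actor X"]))
print(segments_with_all(SEGMENT_CAST, ["actor X", "actor Y"]))
```

The same test applies unchanged to specified characters if the table stores character names instead of actor names.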

The scenario segment about a specified scene may be understood as a segment related to the scene in which the characters in the plot are located, and may also be referred to as a highlight segment of the video. For example, in some period-drama videos, a specified scene may be a palace, and the series of plot segments that take place in the palace scene may be understood as scenario segments about the palace.

In another embodiment of the present disclosure, the at least one video segment may include at least one of: a specified scenario segment in the first video, and a specified scenario segment in a video other than the first video.

For example, as shown in fig. 2C, in one embodiment, at least one video segment (e.g., a designated scenario segment) in the first video (i.e., a currently playing video, such as a first episode of a certain drama) may be determined according to the current user's intention, and at least one video segment (e.g., a designated scenario segment) in the other videos (e.g., other videos associated with the currently playing video, such as a first episode of a certain drama being a currently playing video and second to fifth episodes of a certain drama being other videos referred to in this embodiment) may also be determined according to the current user's intention. Fig. 2C shows three video clips in the first video, respectively video clip P11, video clip P12, and video clip P13. In addition, fig. 2C also shows three video clips, namely a video clip Pn1, a video clip Pn2, and a video clip Pn3, in other videos.

It should be understood that a work such as a TV series includes a plurality of episodes. Each episode is played separately, and each episode, although only part of the entire work, is itself a complete video when played. When a user is watching a certain episode, namely the first video, the scenario segments specified by the user may appear in the first video currently being watched and may also appear in other videos. The other video may be understood as a video that, like the first video, is a complete video file, but that is different from the first video.

In the technical scheme of the embodiment of the disclosure, the video segments in the first video currently watched by the user can be determined according to the intention of the current user, and the video segments in other videos different from the first video can also be determined, that is, all the video segments can be determined from the complete video works for the user to watch, so that the viewing experience of the user is improved.

Illustratively, in the video playing method of the embodiment of the present disclosure, the first video and the other videos are different episodes of the same TV series.
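Separating matches found in the currently playing episode from matches found in other episodes of the same series might be sketched as follows. The `EpisodeClip` record and the `split_by_source` function are hypothetical names, and the episode numbers and offsets are made-up example data.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class EpisodeClip:
    episode: int    # episode number within the series
    clip_id: str
    start_s: float  # offset within that episode, in seconds


def split_by_source(clips: Sequence[EpisodeClip],
                    current_episode: int) -> Dict[str, List[str]]:
    """Separate matches in the first (current) video from matches in other episodes."""
    # Order by episode first, then by position inside the episode.
    ordered = sorted(clips, key=lambda c: (c.episode, c.start_s))
    return {
        "first_video": [c.clip_id for c in ordered if c.episode == current_episode],
        "other_videos": [c.clip_id for c in ordered if c.episode != current_episode],
    }


matches = [
    EpisodeClip(3, "Pn2", 900.0),
    EpisodeClip(1, "P12", 600.0),
    EpisodeClip(1, "P11", 60.0),
    EpisodeClip(2, "Pn1", 30.0),
]
result = split_by_source(matches, current_episode=1)
print(result)
```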

Illustratively, the video playing method according to the embodiment of the present disclosure may further include: after each video segment in the at least one video segment has been played in time order, returning to the position at which the first video was paused and continuing to play the first video.

Taking the case where the first video is an episode of a TV series as an example, when the user does not select a particular video segment, each video segment in the at least one video segment can be played in time order, so that the user does not miss any segment. After the playing is completed, playback can return to the position at which the first video was paused and continue from there. It should be appreciated that while the user has not yet viewed every video segment, the at least one video segment determined according to the user's intention may be played. Once the user has finished watching the video segments, the user knows part of the plot of the first video but not the complete plot; returning to the paused position and continuing to play the first video gives the user the choice of watching the complete first video.

Illustratively, the video playing method according to the embodiment of the present disclosure may further include: after one or more video segments in the at least one video segment have been played according to the selection of the current user, returning to the position at which the first video was paused and continuing to play the first video.

Still taking the case where the first video is an episode of a TV series as an example, when the current user selects certain video segments to play, each of the selected video segments may be played in time order to match the user's intention. After the playing is completed, playback can return to the position at which the first video was paused and continue, giving the user the option of watching the complete first video.
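The pause, play and resume behaviour can be illustrated with a small state model. This is a sketch under assumptions: `PlaybackSession` is a hypothetical class, actual rendering is replaced by log entries, and the positions are example values.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PlaybackSession:
    """Tracks the position of the first video while segments are played (hypothetical model)."""
    position_s: float = 0.0
    paused_at_s: Optional[float] = None
    log: List[str] = field(default_factory=list)

    def play_clips(self, clips: List[Tuple[str, float]]) -> None:
        """Pause the first video, play the (clip_id, start_s) clips in time order, then resume."""
        self.paused_at_s = self.position_s  # remember the pause position
        self.log.append(f"pause@{self.paused_at_s}")
        for clip_id, _start_s in sorted(clips, key=lambda c: c[1]):
            self.log.append(f"clip:{clip_id}")
        self.position_s = self.paused_at_s  # return to the pause position
        self.paused_at_s = None
        self.log.append(f"resume@{self.position_s}")


session = PlaybackSession(position_s=754.0)
session.play_clips([("P2", 600.0), ("P1", 120.0)])
print(session.log)
```

Storing the pause position before any segment starts is what allows the first video to continue from exactly where the user left off.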

In another embodiment, the video playing method may further include: while the intelligent terminal is controlled to play the first video, in response to the current user's inquiry about peripheral information of the first video, controlling the intelligent terminal to display the corresponding peripheral information about the first video.

As shown in fig. 2D, after the current user inquires about the peripheral information of the first video, the intelligent terminal may be controlled to display the corresponding peripheral information about the first video while it continues to play the first video.

The peripheral information may be understood as information related to the first video, such as the name of the background music or the name of the actor playing a character.

According to the technical solution of the embodiment of the present disclosure, when the user inquires about peripheral information of the first video while watching it, the corresponding peripheral information is displayed to the user. That is, information interaction is supported while the user watches the first video without affecting its normal playing, which improves the user's viewing experience.

Illustratively, the current user may inquire about the peripheral information of the first video by voice or through a dialog box.

Illustratively, the peripheral information of the first video may be presented in a separate dialog box overlaid on the first video.
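One possible way to answer such an inquiry is to look up time-stamped metadata at the current playing position, as in the sketch below. The table `PERIPHERAL_INFO`, its entries, and the function `lookup_peripheral_info` are all hypothetical; the disclosure does not specify how peripheral information is stored.

```python
from typing import List, Optional, Tuple

# Hypothetical metadata for one video: (start_s, end_s, kind, value).
PERIPHERAL_INFO: List[Tuple[float, float, str, str]] = [
    (0.0, 90.0, "music", "Opening theme, sung by Singer A"),
    (90.0, 400.0, "actor", "Character M is played by Actor X"),
    (300.0, 400.0, "music", "Interlude, sung by Singer B"),
]


def lookup_peripheral_info(position_s: float, kind: str) -> Optional[str]:
    """Return the first entry of the requested kind that covers the current position."""
    for start_s, end_s, entry_kind, value in PERIPHERAL_INFO:
        if entry_kind == kind and start_s <= position_s < end_s:
            return value
    return None  # nothing known for this position


# The lookup does not touch playback state, so the first video keeps playing.
print(lookup_peripheral_info(350.0, "music"))
```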

It should be noted that the video playing method according to the embodiment of the present disclosure may be executed by a video playing system, which may interact, as a cloud service, with the terminal that plays the video. The system may include a voice interaction subsystem, an information pushing subsystem, and the like. The voice interaction subsystem allows the user to issue instructions by voice while watching the first video, and the information pushing subsystem determines the information related to an instruction; for example, in the technical solution of the embodiment of the present disclosure, the information pushing subsystem may determine the video segments associated with the user's intention.

Illustratively, as shown in fig. 2E, the playing picture of the first video may be presented in area A of the user interface. During the playing of the first video, if the user says "see only the segment of actor X", the content of the voice instruction may be displayed in area B of the user interface, superimposed on the playing picture of the first video; that is, "see only the segment of actor X" is displayed as an overlay.

Illustratively, as shown in fig. 2E and 2F, during the playing of the first video, if the user voice indicates "only see the segment of actor X", in response to the voice indication, an option associated with the content of the voice indication or the like may be displayed in the C area in the user interface, i.e., superimposed on the playing frame of the first video, so as to facilitate the user to select the corresponding video segment for playing. For example, if the user voice indicates "see only segments of actor X", options such as "watch full video", "see only actor X", "see only actor Y", "see only actors X and Y", "see only essence segments" may be displayed in the C region in the user interface, i.e., superimposed on the play screen of the first video, to facilitate the user to select play.

According to the embodiment of the present disclosure, a video playing device is further provided, which can implement a series of query-assistant functions during video playing.

As shown in fig. 2G, the video playing device 200G according to the embodiment of the present disclosure includes a video clip determining module 210 and a video clip playing module 220.

The video segment determining module 210 is configured to determine at least one video segment according to the intention of the current user during the playing of the first video. In an embodiment, the video segment determining module 210 may be configured to perform the operation S210, which is not described herein again.

The video clip playing module 220 is configured to pause playing of the first video, and perform the following operations: sequentially playing each video clip in at least one video clip according to a time sequence; or one or more video clips in the at least one video clip are played according to the selection of the current user. In an embodiment, the video segment playing module 220 may be configured to perform the operation S220, which is not described herein again.

The video playing device according to the embodiment of the present disclosure may further include a user intention confirming module. The user intention confirming module is configured to, in response to acquiring a voice instruction issued by the current user during the playing of the first video, determine the intention of the current user based on the voice instruction.

In the video playing device according to the embodiment of the present disclosure, the at least one video segment includes at least one of: a storyline segment of a specified actor; a storyline segment of a specified character; a storyline segment of a specified scene.

The video playing device according to the embodiment of the present disclosure, wherein the at least one video segment includes at least one of: a designated storyline segment in a first video; a designated storyline segment in a video other than the first video.

According to the video playing device of the embodiment of the present disclosure, the first video and the other videos are different episodes of the same television series.

The video playing device according to the embodiment of the present disclosure may further include a first video playing module. The first video playing module is configured to return to the paused position of the first video and resume playing the first video after the video clip playing module has finished playing each of the at least one video clip in time sequence.

The video playing device according to the embodiment of the present disclosure may further include a second video playing module. The second video playing module is configured to return to the paused position of the first video and resume playing the first video after the video clip playing module has finished playing the one or more video clips selected by the current user from the at least one video clip.

The video playing device according to the embodiment of the present disclosure may further include a peripheral information display module. The peripheral information display module is configured to display, during the playing of the first video, the corresponding peripheral information of the first video in response to the current user's inquiry about that information.

It should be understood that the embodiments of the apparatus portion of the present disclosure correspond to the embodiments of the method portion described above, and the details are not repeated here.

According to the embodiment of the present disclosure, another video playing method is also provided, which can implement the episodic perspective function during video playing.

As shown in fig. 3A, a video playing method 300A according to an embodiment of the present disclosure includes performing operations S310 to S330 during playing of a first video.

In operation S310, a scene appearing in the current plurality of consecutive shots is determined.

In operation S320, at least one video clip associated with the scene in the first video is determined.

In operation S330, episodic perspective is performed on the at least one video clip.

In operation S310, the first video may be understood as the video currently being played, which is a relatively complete video file that is presented in full when played, such as a movie or an episode of a television series (for example, the first episode of a certain drama). The at least one video clip may be one or more video clips extracted from the first video, each of which is associated with the scene appearing in the plurality of shots currently being played. For example, the storyline presented by each of the one or more video clips takes place in the scene that appears in the plurality of shots currently being played. A shot can be understood as a shot picture, that is, the continuous picture captured by an imaging device such as a camera from the start to the end of one recording, or the segment between two cut points. A scene may be understood to include environmental information such as the architectural setting, the geographic landmark setting, and the like.

In operation S320, determining at least one video clip associated with the scene in the first video may specifically be as follows. The first video is divided into a plurality of video units at a preset time interval, and each video unit is analyzed, for example by image recognition, subtitle recognition, and the like. Each video unit is then marked with a scene-related label according to the recognition result, and adjacent video units whose labels are consistent are merged, finally yielding the at least one video clip associated with the scene.
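The unit labelling and merging described for operation S320 can be illustrated with a minimal sketch. It assumes that image and subtitle recognition have already produced one scene label per fixed-length video unit; the names `Clip`, `merge_units` and `clips_for_scene` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start: float  # seconds from the start of the first video
    end: float    # seconds from the start of the first video
    scene: str    # scene label shared by every unit in the clip


def merge_units(unit_labels, unit_seconds):
    """Merge adjacent video units whose scene labels are identical.

    unit_labels[i] is the (assumed, pre-computed) scene label of unit i.
    """
    clips = []
    for index, scene in enumerate(unit_labels):
        start = index * unit_seconds
        end = start + unit_seconds
        if clips and clips[-1].scene == scene:
            clips[-1].end = end                    # same scene: extend the open clip
        else:
            clips.append(Clip(start, end, scene))  # new scene: start a new clip
    return clips


def clips_for_scene(unit_labels, unit_seconds, scene):
    """Return the merged clips associated with one scene."""
    return [c for c in merge_units(unit_labels, unit_seconds) if c.scene == scene]
```

With 10-second units labelled palace, palace, garden, palace, this sketch yields two palace clips: one covering the first two units and one covering the last unit.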

In operation S330, episodic perspective can be understood as presenting a plot introduction of a video clip, so that the user can quickly learn the plot content of that video clip and use it as a reference for selective viewing.

When watching a video, a user may wish to watch particular storyline segments associated with a scene. For example, for costume dramas, geographical documentaries, and the like, the user may want to watch the storyline segments set in a palace or those set in a particular geographical environment.

It should be understood that a given storyline segment generally takes place in a specific scene, and the scene remains basically unchanged until the plot moves on, so the correlation between scene and storyline is high.

In another embodiment, performing episodic perspective on at least one video clip in operation S330 may include: performing episodic perspective on each of the at least one video clip with a corresponding floating window.

Illustratively, as shown in fig. 3B, after at least one video clip associated with a scene is determined according to the scene, episodic perspective may be performed on the at least one video clip by means of floating windows. Fig. 3B exemplarily shows a case in which three video clips, namely a video clip P1, a video clip P2, and a video clip P3, are determined according to the scene, and episodic perspective is performed on the video clip P1 with a floating window.

In the technical solution of the embodiment of the present disclosure, performing episodic perspective on each of the at least one video clip with a corresponding floating window may specifically be understood as: displaying a corresponding floating window for each video clip, and displaying the corresponding plot information in that floating window. The floating window may be displayed floating above the window that plays the first video, so that the user can more conveniently obtain the episodic perspective information in the floating window.

According to the video playing method of the embodiment of the present disclosure, performing episode perspective with the corresponding floating windows respectively may include displaying at least one of the following information of the corresponding video clips in the corresponding floating windows respectively: a video screenshot, a point in time that the current video clip appears in the first video, and a synopsis of the current video clip.

Video is a predominantly visual medium, and users rely mainly on vision to acquire the information a video conveys. It will be appreciated that images are more intuitive than text; the video screenshot described above may, for example, present the character information, scene information, and the like of the current video clip. The user can plan his or her viewing time with reference to the time point at which the current video clip appears in the first video. It should be appreciated that a video clip contains less content than the first video, so a more refined plot synopsis can be presented to the user when episodic perspective is performed on the video clip.

According to the video playing method of the embodiment of the present disclosure, performing episodic perspective on each of the at least one video clip with a corresponding floating window may include: sequentially displaying the floating windows, which correspond one-to-one to the video clips, in the order of the time points of the video clips in the first video.

It should be understood that in a video work the plot generally develops along its timeline, and users generally watch in chronological order. Therefore, in the technical solution of the embodiment of the present disclosure, the floating windows corresponding one-to-one to the video clips are displayed in the order of the time points of the video clips in the first video. This matches the user's viewing habits: the user views the floating-window information in the order in which the plot develops, and can thus understand the plot of the first video quickly and completely without confusion.
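As an illustration of the ordering described above, the following minimal sketch models a floating window carrying the three kinds of information mentioned earlier (video screenshot, time point, synopsis) and sorts the windows by time point. The class and function names, and the use of whole seconds for time points, are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatingWindow:
    time_point: int   # seconds at which the clip appears in the first video
    screenshot: str   # path or URL of the video screenshot (hypothetical)
    synopsis: str     # plot synopsis of the clip


def format_time_point(seconds):
    """Render a time point in whole seconds as mm:ss for display."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def windows_in_display_order(windows):
    """Order windows by the time point of their clip in the first video."""
    return sorted(windows, key=lambda w: w.time_point)
```

Sorting by `time_point` keeps the display order aligned with the plot timeline regardless of the order in which the clips were detected.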

According to the video playing method of the embodiment of the present disclosure, performing episodic perspective on at least one video clip may include: determining the order of the time points at which the video clips of the at least one video clip appear in the first video, and displaying the video screenshot and detailed plot introduction of the earliest of those video clips.

In the embodiment of the present disclosure, showing the video screenshot and detailed plot introduction of the earliest video clip serves mainly to demonstrate the episodic perspective function to the user. It should be understood that, compared with sequentially showing the floating windows of the video clips, this embodiment shows only the video screenshot and detailed plot introduction of the earliest video clip and involves no automatic switching between floating windows. The user therefore generally dwells longer on this screenshot and introduction, and accordingly the plot introduction shown to the user can be more detailed, hence the detailed plot introduction mentioned above.

In another embodiment, the video playing method may further include performing the following operations.

According to the selection of the current user, the playing of the first video is paused, and the one or more video clips selected by the current user from the at least one video clip are played.

Illustratively, as shown in fig. 3C, the playing of the first video may be paused according to the selection 1 of the current user, and one or more of the at least one video clips selected by the current user (e.g., the video clip P1, the video clip P2, and the video clip P3 are selected, and the several video clips are sequentially played) may be played to support the user to selectively watch the video clips.

It should be noted that playing one or more video clips selected by the current user from the at least one video clip may be understood as follows: the first video includes one or more video clips, and the user may select which of them to play. Specifically, either a single video clip or a plurality of video clips may be selected for playing. It should be understood that a plurality of selected video clips may be played sequentially in the order in which they appear in the first video.

In some cases, the user is more inclined to view a portion of the video clip in the first video. The technical scheme of the embodiment of the disclosure also supports playing the video clips selected by the user to provide different playing options, and meets various requirements of the user.

It should be appreciated that, when the user has not watched every video clip in the first video, the one or more video clips selected by the user may be played. After watching them, the user knows part of the plot of the first video but not the complete plot. At this point, playback may return to the paused position of the first video and continue from there, giving the user the option of watching the complete first video.
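The pause, selective playback, and resume flow described above can be sketched as a simple playback plan. The function name `plan_playback` and the tuple-based action format are hypothetical; a real player would issue equivalent commands to its own playback interface.

```python
def plan_playback(paused_at, selected_clips):
    """Build the ordered list of player actions for selective viewing.

    paused_at      -- position (seconds) at which the first video is paused
    selected_clips -- (start, end) pairs, in seconds, chosen by the user
    """
    actions = [("pause_first_video", paused_at)]
    # Play the selected clips in the order they appear in the first video.
    for start, end in sorted(selected_clips):
        actions.append(("play_clip", start, end))
    # Return to the paused position and continue the first video.
    actions.append(("resume_first_video", paused_at))
    return actions
```

Because the plan always ends with a resume action at the stored position, the user keeps the option of watching the complete first video after the selected clips finish.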

For example, the video playing method of the embodiment of the present disclosure may further allow the user to issue a voice instruction indicating the video clips, related to a certain scene, that the user wants to watch. The method may be executed by a video playing system, which may interact, as a cloud service, with the terminal that plays the video. The system may include a voice interaction subsystem, an information pushing subsystem, and the like. The voice interaction subsystem allows the user to issue instructions by voice, and the information pushing subsystem determines the information related to a given instruction. For example, in the technical solution of the embodiment of the present disclosure, the information pushing subsystem may determine the video clips related to each scene.

Illustratively, as shown in fig. 3D, the play screen of the first video and a progress bar 301 representing its playing progress may be shown in region A of the user interface. During the playing of the first video, at least one storyline segment associated with the scene appearing in the current several shots may be presented in sequence along the progress bar 301. For example, the episodic perspective information of these storyline segments may be presented in sequence in regions D1 to Dn of the user interface, and may include the corresponding screenshot, the time point at which the segment appears in the whole video, and a plot synopsis. Region A may correspond to a large window, and regions D1 to Dn may each correspond to an independent small window displayed floating above the large window.

For example, as shown in fig. 3E, when performing episodic perspective, the screenshot and detailed plot introduction of the earliest storyline segment, or of the one selected by the user, may be shown enlarged in region E of the user interface.

According to the embodiment of the present disclosure, a video playing device is further provided, which can implement the episodic perspective function during video playing.

As shown in fig. 3F, a video playing device 300F according to an embodiment of the present disclosure includes: a scene determining module 310, a video clip determining module 320, and an episodic perspective module 330.

A scene determining module 310, configured to determine a scene appearing in the plurality of current shots during playing of the first video. In an embodiment, the scene determining module 310 may be configured to perform the operation S310, which is not described herein again.

A video clip determining module 320 for determining at least one video clip associated with a scene in a first video; in an embodiment, the video segment determining module 320 may be configured to perform the operation S320, which is not described herein again.

An episodic perspective module 330, configured to perform episodic perspective on the at least one video clip. In an embodiment, the episodic perspective module 330 may be configured to perform the operation S330, which is not described herein again.

According to the video playing device of the embodiment of the present disclosure, the episodic perspective module may include an episodic perspective submodule. The episodic perspective submodule may be configured to perform episodic perspective on each of the at least one video clip with a corresponding floating window.

According to the video playing device of the embodiment of the present disclosure, the performing of the episode perspective by the corresponding floating windows may include displaying at least one of the following information of the corresponding video clips in the corresponding floating windows respectively: a video screenshot, a point in time that the current video clip appears in the first video, and a synopsis of the current video clip.

According to the video playing device of the embodiment of the present disclosure, the episode perspective submodule may include a floating window display unit. The floating window display unit may be configured to sequentially display the floating windows corresponding to the video segments one to one according to a sequence of time points of the video segments in the first video.

In the video playing device according to the embodiment of the present disclosure, performing episodic perspective on at least one video clip may include: determining the order of the time points at which the video clips of the at least one video clip appear in the first video, and displaying the video screenshot and detailed plot introduction of the earliest of those video clips.

The video playing device according to the embodiment of the present disclosure may further include a video clip playing module. The video clip playing module may be configured to pause playing of the first video according to a selection of a current user, and play one or more video clips selected by the current user in the at least one video clip.

The video playing device according to the embodiment of the present disclosure may further include a first video playing module. The first video playing module may be configured to return to a pause playing position of the first video and continue to play the first video after the playing of one or more video segments of the at least one video segment is completed according to the selection of the current user.

According to the embodiment of the disclosure, another video playing method is provided, which can realize information recommendation based on subtitles, music, characters and other objects (such as plants, animals, buildings and the like) appearing in the video during the video playing process, and realize information recommendation based on the first video after the first video is completely played.

Fig. 4A schematically shows a flow chart of a video playing method according to an embodiment of the present disclosure.

As shown in fig. 4A, a video playing method 400A according to an embodiment of the present disclosure includes performing operations S410 to S420 during playing of a first video.

In operation S410, in response to the output content containing the predetermined information, recommendation information associated with the predetermined information is acquired.

In operation S420, the intelligent terminal is controlled to display recommendation information in a designated area of a screen.

In operation S410, the predetermined information includes at least one of: predetermined subtitles, predetermined music, predetermined characters, predetermined props, predetermined vegetation, and predetermined buildings. The first video may be understood as a complete video file that is presented in full when played, and may be, for example, a movie or an episode of a television series. The output content may be understood as the content presented when the first video is played, including visual content such as pictures and subtitles, as well as audio content such as background music and voice-over. It should be understood that the output content is determined by the needs of the current story and is rich in content, and may include buildings, characters, props, vegetation, and so forth. The recommendation information may be understood as information that is associated with the predetermined information and more detailed than it, which helps the user understand related matters such as the plot and background of the first video.

It should be understood that, when watching a costume drama for example, the user may be unable to become immersed quickly because the characters' language and living habits differ from the user's own, and the viewing experience may suffer because the period in which the plot is set is far removed from the user's own time.

For example, the predetermined information and the recommendation information may be labeled respectively, and the association between the recommendation information and the predetermined information may be realized by matching the corresponding labels of the predetermined information and the recommendation information.
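The label matching described above can be sketched as follows. The label strings and the list-of-pairs catalog layout are purely illustrative assumptions; the disclosure does not specify how labels are represented or stored.

```python
def recommend_for(detected_labels, catalog):
    """Return recommendation texts whose labels overlap the detected labels.

    detected_labels -- labels attached to predetermined information found
                       in the output content (hypothetical format)
    catalog         -- list of (labels, text) pairs of recommendation information
    """
    detected = set(detected_labels)
    # A recommendation matches if it shares at least one label.
    return [text for labels, text in catalog if detected & set(labels)]


# Hypothetical catalog for illustration only.
CATALOG = [
    ({"subtitle:verse"}, "Source and author of the quoted verse"),
    ({"music:theme"}, "Full version of the theme music"),
    ({"character:lead", "character:rival"}, "Relationship between the two characters"),
]
```

For instance, `recommend_for(["music:theme"], CATALOG)` selects only the music entry, while an unknown label selects nothing.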

According to the technical solution of the embodiment of the present disclosure, displaying the recommendation information that matches the predetermined information gives the user watching the first video auxiliary information beyond the plot itself. Because the recommendation information is more detailed, it helps the user understand the predetermined information in greater depth and become immersed quickly, thereby improving the viewing experience.

Illustratively, in the video playing method according to the embodiment of the present disclosure, controlling the intelligent terminal to display the recommendation information in the designated area of the screen may include: controlling the intelligent terminal to display the recommendation information in the designated area of the screen in the form of a floating layer.

Illustratively, as shown in fig. 4B, a play screen of a first video may be presented within an a window in the user interface. During the playing of the first video, corresponding recommendation information can be given according to one or more of currently output subtitles, background music/episodes, and characters, plants, animals and buildings appearing in the picture, and the recommendation information or relevant parts of the recommendation information can be presented in a B window in the user interface. Wherein, the window B is suspended and superposed on the upper layer of the window A.

Illustratively, as shown in fig. 4C, the play screen of the first video may be presented in window A of the user interface. During the playing of the first video, when a verse such as "Clouds call to mind her robes, flowers her face" appears in the subtitles, related information such as the source, author, and opening line of the verse may be displayed in window B of the user interface. At the same time, corresponding prompt information may be displayed in window B, for example a prompt that the user can click to confirm, or say "view more", to see more about the poem.

Illustratively, as shown in fig. 4C and 4D, when the user confirms "view more" by clicking or by voice, detailed information about the poem, such as its title, the dynasty to which it belongs, its full text, and a related analysis, may be presented in window B of the user interface. This helps the user better understand the characters' thoughts and emotions, the plot, and the atmosphere the plot is meant to convey, and can make the viewing experience more immersive.

In addition, in this embodiment, the floating layer may be understood as a display layer, which may be displayed in a floating manner to be superimposed on other windows or other display layers so as to be used for displaying the recommendation information, and the floating layer may be in a translucent or transparent state to prevent the other windows or other display layers located below from being blocked.

In another embodiment, in the above-mentioned video playing method, in the case that the predetermined subtitle includes an ancient poem or a line of ancient verse, the recommendation information associated with the predetermined subtitle may include at least one of: the ancient poem that contains the line, the author of the ancient poem, the dynasty to which the author belongs, and an analysis of the ancient poem; and/or, in the case that the predetermined subtitle includes an allusion or a classical Chinese expression, the recommendation information associated with the predetermined subtitle includes at least one of: the source of the allusion or classical Chinese expression, and an explanation of the allusion or classical Chinese expression.

For example, in a costume drama, the characters' manner of speaking is far removed from that of a present-day user. When the predetermined subtitle includes ancient verse, an allusion, or a classical Chinese expression, the verse may follow a specific form such as the seven-character quatrain, and the meanings of the characters and words used may differ from their present-day meanings, which may affect the user's understanding of the video.

According to the technical solution of the embodiment of the present disclosure, recommendation information that helps the user understand such specific content in the video (ancient verse, allusions, and classical Chinese expressions) can be provided in step with the video being watched, so that the user can become immersed quickly and the viewing experience is improved.

In another embodiment, in the above-mentioned video playing method, in a case where the predetermined information includes predetermined music, the recommendation information associated with the predetermined music may include at least one of: emotion expressed by predetermined music, full version of predetermined music, creator of predetermined music.

Music is generally used to help set the atmosphere of the currently played content. During video playback, if content other than the current video picture appears on the screen before the predetermined music has established that atmosphere, the user's attention may be diverted and the viewing experience affected. Therefore, for example, the recommendation information associated with the predetermined music may be presented only after the predetermined music has played for a predetermined time.
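The timing rule in the preceding paragraph can be sketched as a simple check. The function name and the use of seconds as the time unit are assumptions, and the actual length of the predetermined time is not specified in the disclosure.

```python
def should_show_music_info(music_started_at, now, min_play_seconds):
    """Decide whether music-related recommendation information may be shown.

    music_started_at -- playback time (seconds) at which the predetermined
                        music began, or None if it is not playing
    now              -- current playback time in seconds
    min_play_seconds -- the predetermined time the music must have played
    """
    if music_started_at is None:
        return False  # no predetermined music is playing
    return now - music_started_at >= min_play_seconds
```

Deferring the display in this way lets the music set the atmosphere before any overlay competes for the user's attention.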

In addition, users empathize strongly with music and can enjoy it independently of the current video. Providing background information about the predetermined music during video playback can therefore meet the user's wish to learn more about that music.

According to the technical solution of the embodiment of the present disclosure, comprehensive recommendation information can be provided to the user, helping the user become immersed in the plot quickly and improving the viewing experience.

In another embodiment, in the above-mentioned video playing method, in the case that the predetermined information includes a predetermined character, the recommendation information associated with the predetermined character may include at least one of: the life story of the predetermined character, and the relationships between the predetermined character and other characters.

Some videos involve a large number of characters or complex relationships between them. Especially when the user watches such a video at intervals, the user may forget parts of the plot, the relationships between characters, and the like. The technical solution of the embodiment of the present disclosure can help the user quickly recall the content related to a character, improving the viewing experience.

It should be noted that the duration of a video and the content it can show directly are limited, and each user's subjective impressions when watching differ. The technical solution of the embodiment of the present disclosure can extend the content of the video by displaying recommendation information, improving the user's viewing experience and viewing quality.

According to the video playing method of the embodiment of the present disclosure, displaying the recommendation information in the designated area of the screen in the form of a floating layer may include: displaying key information of the recommendation information in the designated area of the screen in the form of a floating layer, and, in response to receiving a viewing instruction issued by the current user, displaying the complete recommendation information.

It should be understood that the size of the picture in which the video is played is limited, and the length of the recommendation information and the size of the area in which it is presented are limited accordingly. According to the technical solution of the embodiment of the present disclosure, displaying only the key information of the recommendation information in the designated area of the screen in the form of a floating layer offers the user the choice of viewing the recommendation information without disturbing video viewing. When the user is interested in the recommendation information, the complete recommendation information can be viewed by clicking or the like.
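The two-stage display described above (key information first, complete information after a viewing instruction) can be sketched as a small state holder. The class `RecommendationCard` and its method names are hypothetical and only illustrate the behaviour.

```python
from dataclasses import dataclass


@dataclass
class RecommendationCard:
    key_info: str           # short text shown in the floating layer by default
    full_info: str          # complete recommendation information
    expanded: bool = False  # becomes True after a viewing instruction

    def on_view_instruction(self):
        """Handle a viewing instruction (e.g. a click or a voice command)."""
        self.expanded = True

    def visible_text(self):
        """Return the text the floating layer should currently display."""
        return self.full_info if self.expanded else self.key_info
```

Keeping the card collapsed by default limits how much of the video picture the floating layer covers until the user asks for more.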

In another embodiment, the video playing method may further include, in response to playback of the first video being completed, performing at least one of the following operations: recommending, through voice broadcast, that the user watch other episodes related to the first video; recommending that the user watch a music short associated with the first video; recommending that the user listen to music associated with the first video; recommending that the user watch a skit associated with the first video; recommending that the user watch a music drama associated with the first video; recommending that the user read a novel associated with the first video.

It should be understood that the content presented by the first video may also exist as other types of works. For example, the content of the first video may also have a skit version, or may be adapted from a corresponding novel. The first video may include background music during playing, and the background music may in turn have a corresponding music short or music drama. Because different types of works give the user very different subjective experiences, recommending other works related to the first video, according to the technical solution of the embodiment of the present disclosure, provides the user with a richer range of works. By viewing the different types of works related to the first video, the user can understand the content of the first video more fully and explore beyond it, which improves the user's viewing experience.

According to the technical solution of the embodiment of the present disclosure, richer content related to the first video can be provided to the user through convenient voice broadcast. Moreover, the content related to the first video is not limited to video and may be other types of works, such as the music shorts, music, novels, and dramas described above.

Note that the music described above may be understood as a work expressed only in sound. A music short, unlike music, may be understood as a work expressed in both pictures and sound. A music drama, unlike both music and a music short, may be understood as a work that combines music and drama.

For example, the video playing method of the embodiment of the present disclosure may be executed by a video playing system, which may interact, as a cloud service, with a terminal that plays the video. The system may include a voice interaction subsystem, an information recommendation subsystem, and the like. The voice interaction subsystem allows the user to issue instructions by voice, and the information recommendation subsystem recommends information to the user according to recommendation conditions. For example, in the technical solution of the embodiment of the present disclosure, the information recommendation subsystem may determine a video clip associated with predetermined information according to the predetermined information contained in the output content.

According to the embodiment of the present disclosure, a video playing apparatus is further provided, which can implement functions such as information recommendation based on subtitles, music, people, and other objects (such as plants, animals, buildings) appearing in a video during video playing, and information recommendation based on a first video after the first video is completely played.

As shown in fig. 4E, the video playing apparatus 400E according to the embodiment of the present disclosure includes: a recommendation information acquisition module 410 and a recommendation information presentation module 420.

The recommendation information acquisition module 410 may be configured to, in response to the output content containing predetermined information during playing of the first video, acquire recommendation information associated with the predetermined information, where the predetermined information includes at least one of: predetermined subtitles, predetermined music, predetermined characters, predetermined props, predetermined vegetation, predetermined buildings. In an embodiment, the recommendation information acquisition module 410 may be configured to perform the operation S410, which is not described herein again.

The recommendation information presentation module 420 may be configured to present recommendation information in a designated area of the screen. In an embodiment, the recommendation information presenting module 420 may be configured to perform the operation S420, which is not described herein again.

According to the video playing device in the embodiment of the present disclosure, the recommendation information presentation module may include a recommendation information presentation sub-module. The recommendation information presentation sub-module may be configured to present the recommendation information in floating-layer form in the designated area of the screen.

According to the video playing device of the embodiment of the present disclosure, in a case where the predetermined subtitles include a line of ancient poetry or an ancient poem, the recommendation information associated with the predetermined subtitles may include at least one of: the ancient poem that contains the line (or the ancient poem itself), the author of the ancient poem, the dynasty to which the author belongs, and an analysis of the ancient poem; and/or, in a case where the predetermined subtitles include a classical allusion or a literary expression, the recommendation information associated with the predetermined subtitles includes at least one of: the origin of the classical allusion or literary expression, or an analysis of the classical allusion or literary expression.

According to the video playing apparatus of the embodiment of the present disclosure, in a case where the predetermined information includes predetermined music, the recommendation information associated with the predetermined music may include at least one of: emotion expressed by predetermined music, full version of predetermined music, creator of predetermined music.

According to the video playing device of the embodiment of the present disclosure, in a case where the predetermined information includes a predetermined character, the recommendation information associated with the predetermined character may include at least one of: the life story of the predetermined character, the relationships between the predetermined character and other characters.
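As a rough illustration of how the kinds of predetermined information listed above could map to recommendation fields, the following sketch uses a plain lookup table. The category keys, field names, and the `build_recommendation` helper are hypothetical; the disclosure does not specify a data structure, and props, vegetation, and buildings are omitted because no fields are enumerated for them here.

```python
# Hypothetical category keys; each field list mirrors the items enumerated above.
RECOMMENDATION_FIELDS = {
    "ancient_poetry_subtitle": ["poem", "author", "dynasty", "analysis"],
    "allusion_subtitle": ["origin", "analysis"],
    "music": ["emotion", "full_version", "creator"],
    "character": ["life_story", "relationships"],
}


def build_recommendation(kind: str, knowledge: dict) -> dict:
    """Select the fields relevant to `kind` from a knowledge record.

    Fields missing from the record are skipped, which matches the
    "at least one of" wording; unknown kinds yield an empty result.
    """
    fields = RECOMMENDATION_FIELDS.get(kind, [])
    return {name: knowledge[name] for name in fields if name in knowledge}


music_record = {"emotion": "melancholy", "creator": "unknown composer", "year": 1990}
music_recommendation = build_recommendation("music", music_record)
```

Here `music_recommendation` keeps only the `emotion` and `creator` entries, since `year` is not a listed field and `full_version` is absent from the record.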

According to the video playing device in the embodiment of the present disclosure, the recommendation information presentation sub-module may include a key information presentation unit and a complete information presentation unit.

The key information display unit may be configured to display the key information of the recommendation information in the designated area of the screen in floating-layer form. The complete information display unit may be configured to display the complete information of the recommendation information in response to receiving a viewing instruction from the current user.

The video playing device according to the embodiment of the present disclosure may further include an operation execution module.

The operation execution module may be configured to, in response to playback of the first video being completed, perform at least one of the following operations: recommending, through voice broadcast, that the user watch other episodes related to the first video; recommending that the user watch a music short associated with the first video; recommending that the user listen to music associated with the first video; recommending that the user watch a skit associated with the first video; recommending that the user watch a music drama associated with the first video; recommending that the user read a novel associated with the first video.

According to the embodiment of the disclosure, another video playing method is also provided, which can realize an intelligent control function in the video playing process.

As shown in fig. 5A, a video playing method 500A according to an embodiment of the present disclosure includes performing operations S510 to S520 during playing of a first video.

In operation S510, at least one user within a preset area in front of a screen is detected.

In operation S520, in response to detecting that a side face of a user among the at least one user is facing the screen, or detecting that at least one user in front of the screen has left the preset area, the first video is controlled to pause playing.

While watching a video, a user may be interrupted by other matters, such as talking to someone else or temporarily leaving. In such cases, the user would otherwise have to pause the current video manually.

In operation S510, the first video may be understood as a complete video file, for example a movie or an episode of a TV series. The preset area may be understood as a region set in advance in the space in front of the screen. It should be understood that the preset area is chosen such that a user whose face is within it can view all or part of the content displayed on the screen.

It should be understood that a user's front face facing the screen means that the user's eyes look directly at the screen, or form an angle with the screen that lies within a range in which the user can view all or part of the content displayed on the screen. In operation S520, a side face facing the screen means that the angle formed between the user's eyes and the screen is outside that range, in which case the user cannot view all or part of the content displayed on the screen.
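The distinction between a front face and a side face can be sketched as a simple threshold test on an angle. This is only an illustration under stated assumptions: the disclosure gives no numeric range, so `MAX_VIEWING_ANGLE_DEG` is a hypothetical value, and the angle is assumed to be the deviation (in degrees) of the user's line of sight from looking straight at the screen, as estimated by some upstream face-analysis step that is not shown.

```python
MAX_VIEWING_ANGLE_DEG = 45.0  # hypothetical threshold; the source gives no value


def classify_face(angle_deg: float, max_angle_deg: float = MAX_VIEWING_ANGLE_DEG) -> str:
    """Return "front" if the viewing angle is within range, otherwise "side".

    `angle_deg` is assumed to be 0 when the user looks straight at the
    screen; the sign only indicates turning left or right.
    """
    return "front" if abs(angle_deg) <= max_angle_deg else "side"


looking_straight = classify_face(0.0)
turned_away = classify_face(70.0)
```

In practice the threshold would depend on screen size and viewing distance, which is why it is exposed as a parameter rather than fixed.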

According to the technical solution of the embodiment of the present disclosure, detecting at least one user in the preset area in front of the screen while the first video is playing yields state information about how the user is viewing the video, namely whether the user's side face is facing the screen or the user has left the preset area. Either condition indicates that the user is not watching the video, and in response the first video is controlled to pause playing.

It should be understood that a side face facing the screen and leaving the preset area are both accurate indicators that a user is not watching the video. The technical solution of the embodiment of the present disclosure can therefore pause the first video in a way that closely matches the moments at which the user is actually not watching.
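The pause condition of operation S520 can be written as a small predicate. The sketch below assumes that an upstream detector (face recognition, infrared sensing, or image recognition; none of them shown) already produces one observation per tracked user; the `UserObservation` type and `should_pause` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UserObservation:
    in_preset_area: bool  # False once the user has left the preset area
    front_face: bool      # True if the front face is toward the screen


def should_pause(observations: Iterable[UserObservation]) -> bool:
    """Pause if any tracked user has left the area or shows a side face."""
    return any(
        (not obs.in_preset_area) or (not obs.front_face)
        for obs in observations
    )


both_watching = [UserObservation(True, True), UserObservation(True, True)]
one_turned_away = [UserObservation(True, True), UserObservation(True, False)]
one_left = [UserObservation(True, True), UserObservation(False, False)]
```

With these inputs, `should_pause(both_watching)` is false, while the other two lists each trigger a pause.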

In another embodiment, the video playing method may further include the following operations.

In response to detecting that all users in the preset area in front of the screen are facing the screen, playback of the first video is restarted from the pause position.

Illustratively, as shown in fig. 5B, when the first video is paused, playback of the first video is restarted in response to detecting that all users in the preset area in front of the screen are facing the screen, which ensures that the first video plays only while all users are watching it.

In some cases, several users watch the first video in front of the screen together. The technical solution of the embodiment of the present disclosure can keep the viewing progress of all users consistent when they watch the first video at the same time, which improves each user's viewing experience. That is, when at least one of the users is not watching the first video, playback of the first video is paused, and when all of the users are facing the screen (i.e., watching the first video), the first video is played.
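The pause and restart rule for several viewers can be sketched as a tiny state update. The `Player` class here is a toy stand-in for a real player, and `update_playback` is a hypothetical helper; the only behaviour modelled is that playback pauses when any viewer stops facing the screen and restarts from the same position once all viewers face it again.

```python
from typing import Sequence


class Player:
    """Toy player used only for illustration."""

    def __init__(self) -> None:
        self.playing = True
        self.position = 0.0  # seconds; left unchanged while paused


def update_playback(player: Player, facing_flags: Sequence[bool]) -> bool:
    """Apply the multi-viewer rule and return whether the video is playing.

    `facing_flags` holds one flag per user in the preset area (True when
    the front face is toward the screen). An empty sequence means that
    nobody is in the area, which is treated as "not watching".
    """
    everyone_watching = len(facing_flags) > 0 and all(facing_flags)
    if player.playing and not everyone_watching:
        player.playing = False  # pause; position stays at the pause point
    elif not player.playing and everyone_watching:
        player.playing = True   # restart from the stored position
    return player.playing


demo_player = Player()
demo_player.position = 12.5
state_after_turn = update_playback(demo_player, [True, False])
state_after_return = update_playback(demo_player, [True, True])
```

Because the position is never modified while paused, restarting naturally continues from the pause position.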

After at least one user has left the preset area, in response to detecting that the user has returned to the preset area in front of the screen, playback of the first video is restarted from the pause position.

Illustratively, as shown in fig. 5C, when the first video is paused and at least one user has left the preset area, playback of the first video is restarted in response to detecting that the user has returned to the preset area in front of the screen, which ensures that the first video plays only while all users are watching it.

When a plurality of users are located in front of the screen and watch the first video at the same time, it may happen that a certain user temporarily leaves the preset area in front of the screen due to other reasons, and returns to the preset area in front of the screen after a certain time interval to watch the first video continuously.

According to the technical scheme of the embodiment of the disclosure, the first video can be controlled to pause playing according to the detected user state of the preset area in front of the screen, and the first video can be controlled to restart playing according to the detected user state of the preset area in front of the screen (the user state comprises that the user returns to the preset area in front of the screen), so that the video watching conditions of the user can be adapted.

For example, at the beginning of the first video playing, the number of users in a preset area in front of the screen and the watching state of the users can be determined as a reference for subsequently controlling the pausing or playing of the first video. The user viewing state here refers to a state of whether or not a user is present in a preset area, and a state of a front face or a side face of the user located in the preset area facing the screen. It should be understood that the number of users in the preset area before the screen at different time points may be compared with the number of users in the preset area before the screen at the initial time of playing the first video, so as to determine whether any user returns to the preset area before the screen.
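The comparison against the initial viewer count can be sketched as follows. `ViewerBaseline` is a hypothetical helper; it assumes that some detector (not shown) reports how many users are currently inside the preset area.

```python
class ViewerBaseline:
    """Remembers how many users were present when playback started."""

    def __init__(self, initial_count: int) -> None:
        self.initial_count = initial_count

    def someone_missing(self, current_count: int) -> bool:
        # Fewer users than at the start: at least one user has left.
        return current_count < self.initial_count

    def all_returned(self, current_count: int) -> bool:
        # The count is back to (or above) the initial number of users.
        return current_count >= self.initial_count


baseline = ViewerBaseline(initial_count=3)
left_flag = baseline.someone_missing(2)
back_flag = baseline.all_returned(3)
```

Counting alone cannot tell which user has returned, so a sketch like this would typically be combined with the front-face check described for the later embodiment.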

After at least one user has left the preset area, in response to detecting that the user has returned to the preset area in front of the screen with the front face facing the screen, playback of the first video is restarted from the pause position.

Illustratively, as shown in fig. 5D, when the first video is paused and at least one user has left the preset area, playback of the first video is restarted in response to detecting that the user has returned to the preset area in front of the screen with the front face facing the screen, which ensures that the first video plays only while all users are watching it.

It should be appreciated that a user's return to the preset area in front of the screen does not by itself reliably indicate that the user is watching the first video. According to the technical solution of the embodiment of the present disclosure, after detecting that the user has returned to the preset area in front of the screen, it can further be detected whether the user's front face is facing the screen. When the returning user's front face is facing the screen, this indicates more accurately that the user is watching the first video, and playback of the first video is restarted at that point.

Illustratively, according to the video playing method of the embodiment of the present disclosure, face recognition may be performed on at least one user in a preset area in front of the screen to determine whether a side face of the at least one user faces the screen.

According to the technical scheme of the embodiment of the disclosure, whether the side face of the user faces the screen can be determined more accurately through face recognition, and then the first video can be accurately controlled to pause.

Illustratively, the face recognition may be performed by a camera of the playing device of the first video.

It should be further noted that, in the technical solution of the embodiment of the present disclosure, all the related user information acquisition is based on obtaining the authorization or permission of the user.

Illustratively, according to the video playing method of the embodiment of the present disclosure, a preset area in front of a screen may be subjected to infrared sensing to determine whether at least one user in front of the screen has left the preset area.

It should be understood that when determining whether a user leaves a predetermined area in front of the screen based on the infrared sensing, it is necessary to ensure that the terminal playing the first video has the function of the infrared sensing. Specifically, the playing device of the first video may be provided with an infrared sensor to realize the function of infrared sensing.

Illustratively, according to a video playing method of a further embodiment of the present disclosure, image recognition may be performed on a preset area in front of a screen to determine whether at least one user in front of the screen has left the preset area.

It will be appreciated that determining whether a user has left the preset area in front of the screen based on image recognition requires the terminal playing the first video to have the relevant image recognition configuration and functions, for example an image acquisition device such as a camera. At present, users generally watch videos on intelligent terminals that support video playing, such as mobile phones or computers, and such terminals are generally equipped with image acquisition devices such as cameras.

According to the embodiment of the disclosure, another video playing device is also provided, which can realize an intelligent control function in the video playing process.

As shown in fig. 5E, the video playback device 500E according to the embodiment of the present disclosure includes a detection module 510 and a control module 520.

The detecting module 510 may be configured to detect at least one user in a preset area in front of the screen during the playing of the first video.

The control module 520 may be configured to control the first video to pause playing in response to detecting that a side face of at least one user is facing the screen or detecting that at least one user in front of the screen has left a preset area.

The video playing device according to the embodiment of the present disclosure may further include a first restart module. The first restarting module may be configured to restart the playing of the first video from the pause playing position in response to detecting that all users in the preset area in front of the screen are facing the screen.

The video playing device according to the embodiment of the present disclosure may further include a second restart module. The second restarting module may be configured to restart the playing of the first video from the pause playing position in response to detecting that the user returns to the preset area before the screen after the at least one user has left the preset area.

The video playing device according to the embodiment of the present disclosure may further include a third restart module. The third restart module may be configured to restart the playing of the first video from the pause position in response to detecting, after the at least one user has left the preset area, that the user has returned to the preset area in front of the screen with the front face facing the screen.

According to the video playing device disclosed by the embodiment of the disclosure, face recognition can be performed on at least one user in a preset area in front of a screen to determine whether a side face of the user faces towards the screen.

According to the video playing device disclosed by the embodiment of the disclosure, infrared sensing can be performed on the preset area in front of the screen to determine whether at least one user in front of the screen leaves the preset area.

According to the video playing device in the embodiment of the disclosure, image recognition can be performed on the preset area in front of the screen to determine whether at least one user in front of the screen has left the preset area.

According to the embodiment of the disclosure, another video playing method is provided, and intelligent control can be realized through voice interaction in the video playing process.

As shown in fig. 6A, a video playing method 600A according to an embodiment of the present disclosure includes performing operations S610 to S640 during playing of a first video.

In operation S610, episodic perspective is performed based on at least one video clip associated with a currently playing scene.

In operation S620, in response to receiving a voice query of a current user, a user intention expressed by the voice query is determined.

In operation S630, at least one video clip associated with the user's intention is determined.

In operation S640, the first video is controlled to pause playing and the episodic perspective mode is ended, and one of the following operations is performed: sequentially playing each video clip of the at least one video clip associated with the user intention in chronological order; or, according to the selection of the current user, playing one or more video clips of the at least one video clip associated with the user intention.

It should be noted that performing episodic perspective based on at least one video clip associated with the currently playing scene in operation S610, determining the user intention expressed by the voice query in response to receiving the voice query of the current user in operation S620, and determining at least one video clip associated with the user's intention in operation S630 have all been described in detail in the video playing methods of the above embodiments and are not repeated herein.

The first video may be understood as a relatively complete, longer video, such as a video file that presents a complete work or episode when played, for example a movie or an episode of a TV series. The first video comprises at least one video clip. Each video clip is a part of the first video, and a video clip and the first video cannot be played simultaneously. The user intention may be understood as what the user is inclined to watch while viewing the first video.

According to the technical solution of the embodiment of the present disclosure, the episodic perspective lets the user quickly grasp the plot content of the video clips. After at least one video clip associated with the user intention is determined, the first video can be controlled to pause playing, the episodic perspective mode can be ended, and the related video clips can be played, so that video is played selectively according to the user's intention.

In another embodiment, the video playing method may further include: during playing of the first video, of each video clip, or of the one or more video clips, in response to the output content containing predetermined information, acquiring recommendation information associated with the predetermined information; and displaying the recommendation information in a designated area of the screen.

The video playing method provided in this embodiment is the same as or similar to the method provided in the foregoing embodiment, and the disclosure is not repeated herein.

In another embodiment, the video playing method may further include: detecting at least one user in a preset area in front of a screen in the process of playing a first video or each video clip or one or more video clips; and controlling the playing of the currently playing video to be paused in response to detecting that a side face of at least one user faces the screen or detecting that at least one user in front of the screen leaves a preset area.

In another embodiment, in the above video playing method, the at least one video segment associated with the user intention may include at least one of: a storyline segment of a specified actor; a storyline segment of a specified character; a storyline segment of a specified scene.

In another embodiment, in the above video playing method, the at least one video segment associated with the user intention may include at least one of: a designated storyline segment in a first video; a designated storyline segment in a video other than the first video.

In another embodiment, in the above video playing method, the first video and the other videos are different episodes of the same television series.

In another embodiment, the video playing method may further include: after each video clip of the at least one video clip has been played in chronological order, returning to the pause position of the first video and continuing to play the first video.

In another embodiment, the video playing method may further include: after the one or more video clips selected by the current user from the at least one video clip associated with the user intention have been played, returning to the pause position of the first video and continuing to play the first video.
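Operation S640 together with the return to the pause position can be sketched as building a play queue and a playback plan. All names here (`Clip`, `build_queue`, `playback_plan`) are hypothetical, and clips are identified by title only for brevity; the sketch assumes the clips associated with the user intention have already been determined.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Clip:
    title: str
    start: float  # time point (seconds) at which the clip appears


def build_queue(clips: Sequence[Clip],
                selected_titles: Optional[Sequence[str]] = None) -> List[Clip]:
    """Chronological order by default; otherwise only the user's selection."""
    if selected_titles is None:
        return sorted(clips, key=lambda clip: clip.start)
    by_title = {clip.title: clip for clip in clips}
    return [by_title[t] for t in selected_titles if t in by_title]


def playback_plan(pause_position: float, queue: Sequence[Clip]) -> List[Tuple[str, object]]:
    """Pause the first video, play the queued clips, then resume at the pause point."""
    plan: List[Tuple[str, object]] = [("pause_first_video", pause_position)]
    plan += [("play_clip", clip.title) for clip in queue]
    plan.append(("resume_first_video", pause_position))
    return plan


clips = [Clip("duel", 300.0), Clip("first meeting", 120.0), Clip("farewell", 900.0)]
chronological = [clip.title for clip in build_queue(clips)]
selected_plan = playback_plan(1500.0, build_queue(clips, ["farewell"]))
```

Storing the pause position in the plan itself makes the final step independent of whatever the clip playback does to the player's current position.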

In another embodiment, the video playing method may further include: during playing of the first video, when the current user asks about peripheral information of the first video, displaying the corresponding peripheral information about the first video at the same time.

In another embodiment, in the above video playing method, performing episodic perspective based on at least one video clip associated with a currently playing scene may include: for each video clip of the at least one video clip associated with the currently playing scene, performing episodic perspective with a corresponding floating window.

In another embodiment, in the above video playing method, performing episodic perspective with the corresponding floating windows may include displaying, in each floating window, at least one of the following items of information about the corresponding video clip: a video screenshot, the time point at which the video clip appears in the first video, and a synopsis of the video clip.

In another embodiment, in the above video playing method, performing episodic perspective with a corresponding floating window for each video clip of the at least one video clip associated with the currently playing scene may include: displaying the floating windows corresponding to the video clips one by one, in the order of the time points at which the video clips appear in the first video.
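The ordering of the floating windows can be sketched as a sort by time point followed by building one window description per clip. `ClipSummary`, `format_time_point`, and `floating_windows` are hypothetical names, and each window is represented as a plain dictionary rather than a real UI element.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ClipSummary:
    screenshot: str  # e.g. a path to the video screenshot
    time_point: int  # seconds from the start of the first video
    synopsis: str


def format_time_point(seconds: int) -> str:
    """Format a time point in seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def floating_windows(clips: Sequence[ClipSummary]) -> List[Dict[str, str]]:
    """One window per clip, ordered by when the clip appears in the first video."""
    ordered = sorted(clips, key=lambda clip: clip.time_point)
    return [
        {
            "screenshot": clip.screenshot,
            "time_point": format_time_point(clip.time_point),
            "synopsis": clip.synopsis,
        }
        for clip in ordered
    ]


windows = floating_windows([
    ClipSummary("shot_b.png", 725, "The two rivals meet again."),
    ClipSummary("shot_a.png", 125, "The rivals first meet."),
])
```

A display loop would then show the entries of `windows` one at a time, starting with the earliest clip.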

In another embodiment, in the above video playing method, performing episodic perspective based on at least one video clip associated with the currently playing scene may include: determining the order of the time points at which the video clips of the at least one video clip associated with the currently playing scene appear in the first video, and displaying the video screenshot and the detailed plot description of the video clip with the earliest time point.

In another embodiment, the video playing method may further include: before receiving the voice query of the current user, pausing the playing of the first video according to the selection of the current user, and playing one or more video clips selected by the current user from the at least one video clip associated with the currently playing scene.

In another embodiment, the video playing method may further include: after the one or more video clips selected by the current user from the at least one video clip associated with the currently playing scene have been played, returning to the pause position of the first video and continuing to play the first video.

In another embodiment, in the above video playing method, the predetermined information may include at least one of: predetermined captions, predetermined music, predetermined personas, predetermined props, predetermined vegetation, predetermined buildings.

In another embodiment, in the above video playing method, presenting the recommendation information in a designated area of the screen may include: displaying the recommendation information in floating-layer form in the designated area of the screen.

In another embodiment, in the above video playing method, in a case where the predetermined subtitles include a line of ancient poetry or an ancient poem, the recommendation information associated with the predetermined subtitles may include at least one of: the ancient poem that contains the line (or the ancient poem itself), the author of the ancient poem, the dynasty to which the author belongs, and an analysis of the ancient poem; and/or, in a case where the predetermined subtitles include a classical allusion or a literary expression, the recommendation information associated with the predetermined subtitles includes at least one of: the origin of the classical allusion or literary expression, or an analysis of the classical allusion or literary expression.

In another embodiment, in the above video playing method, the presenting the recommendation information in a specified area of the screen in a floating layer form may include: displaying key information of recommendation information in a specified area of a screen in a floating layer mode; and in response to receiving a viewing instruction sent by the current user, displaying the complete information of the recommendation information.

In another embodiment, the video playing method may further include: in response to detecting that all users in the preset area in front of the screen are facing the screen, restarting the playing of the first video from the pause position.

In another embodiment, the video playing method may further include: after at least one user has left the preset area, in response to detecting that the user has returned to the preset area in front of the screen, restarting the playing of the first video from the pause position.

In another embodiment, the video playing method may further include: after at least one user has left the preset area, in response to detecting that the user has returned to the preset area in front of the screen with the front face facing the screen, restarting the playing of the first video from the pause position.

In another embodiment, in the above video playing method, face recognition is performed on at least one user in a preset area in front of the screen to determine whether a side face of the at least one user faces the screen.

In another embodiment, in the above video playing method, a preset area in front of the screen is subjected to infrared sensing to determine whether at least one user in front of the screen has left the preset area.

In another embodiment, in the above video playing method, image recognition is performed on a preset area in front of the screen to determine whether at least one user in front of the screen has left the preset area.

The embodiment of the disclosure further provides a video playing device, which can realize intelligent control through voice interaction in the video playing process.

As shown in fig. 6C, the video playing apparatus 600C according to the embodiment of the present disclosure includes an episodic perspective module 610, an intention determining module 620, a segment determining module 630, and a play control module 640.

The episodic perspective module 610 may be configured to perform episodic perspective based on at least one video clip associated with a currently playing scene during the playing of the first video. In an embodiment, the episodic perspective module 610 may be configured to perform the operation S610, which is not described herein again.

The intent determination module 620 may be configured to determine, in response to receiving a voice query of the current user, a user intent expressed by the voice query. In an embodiment, the intent determination module 620 may be configured to perform the operation S620, which is not described herein again.

The segment determination module 630 may be configured to determine at least one video segment associated with the user intent. In an embodiment, the segment determination module 630 may be configured to perform the operation S630, which is not described herein again.

The play control module 640 may be configured to control the first video to pause playing, end the episodic perspective mode, and perform one of the following operations: sequentially playing, in chronological order, each video clip of the at least one video clip associated with the user intent; or playing, according to the selection of the current user, one or more video clips of the at least one video clip associated with the user intent. In an embodiment, the play control module 640 may be configured to perform the operation S640, which is not described herein again.
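The behaviour of the play control module can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `Clip`, `PlayControl`, and `play_intent_clips` are hypothetical, clips are assumed to carry a start time in seconds, and the user's selection is assumed to arrive as a list of indices into the clip list.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Clip:
    start: float  # time point (seconds) at which the clip appears in the first video
    end: float
    title: str


class PlayControl:
    """Toy play controller for clips associated with a user intent."""

    def __init__(self, position: float):
        self.position = position                # current position in the first video
        self.paused_at: Optional[float] = None  # remembered pause position
        self.perspective_mode = True            # episodic perspective is active

    def play_intent_clips(self,
                          clips: Sequence[Clip],
                          selection: Optional[Sequence[int]] = None) -> List[Clip]:
        # Pause the first video and end the episodic perspective mode.
        self.paused_at = self.position
        self.perspective_mode = False
        if selection is None:
            # No selection: play every clip in chronological order.
            return sorted(clips, key=lambda c: c.start)
        # Otherwise play only the clips the current user picked.
        return [clips[i] for i in selection]
```

Recording `paused_at` keeps the pause position available, so the first video can later continue from where it stopped.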

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the electronic device 700 includes a computing unit 701, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

A number of components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the video playing method. For example, in some embodiments, the video playing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the video playing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the video playing method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

In the technical solution of the present disclosure, the acquisition, recording, storage, and application of the data involved all comply with the relevant laws and regulations and do not violate public order and good customs.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A video playing method, comprising: in the course of playing a first video,

determining a scene appearing in a plurality of current continuous shots;

determining at least one video clip in the first video associated with the scene; and

performing episodic perspective on the at least one video clip.

2. The method of claim 1, wherein performing episodic perspective on the at least one video clip comprises:

performing episodic perspective on each video clip of the at least one video clip by using a corresponding floating window.

3. The method of claim 2, wherein performing episodic perspective with the corresponding floating windows respectively comprises presenting at least one of the following information of the corresponding video clip in the corresponding floating windows respectively: a video screenshot, a point in time that a current video clip appears in the first video, and an episode profile of the current video clip.

4. The method of claim 2 or 3, wherein performing episodic perspective on each video clip of the at least one video clip by using a corresponding floating window comprises:

sequentially displaying the floating windows, which correspond one-to-one to the video clips, in the order of the time points at which the video clips appear in the first video.

5. The method of any of claims 1 to 4, wherein performing episodic perspective on the at least one video clip comprises:

determining the order of the time points at which the video clips of the at least one video clip appear in the first video, and displaying the video screenshot and the detailed plot introduction of the video clip with the earliest time point.

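6. The method of any of claims 1 to 5, further comprising:

pausing playing of the first video according to a selection of the current user, and playing one or more video clips selected by the current user from the at least one video clip.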

7. The method of claim 6, further comprising: after the playing of the one or more video clips of the at least one video clip is completed according to the selection of the current user,

returning to the paused position of the first video and continuing to play the first video.

8. A video playing apparatus, comprising:

a scene determination module for determining, in the course of playing a first video, a scene appearing in a plurality of current continuous shots;

a video clip determination module for determining at least one video clip associated with the scene in the first video; and

an episodic perspective module for performing episodic perspective on the at least one video clip.

9. The apparatus of claim 8, wherein the episodic perspective module comprises:

an episodic perspective sub-module for performing episodic perspective on each video clip of the at least one video clip by using a corresponding floating window.

10. The apparatus of claim 9, wherein performing episodic perspective with the corresponding floating windows respectively comprises presenting at least one of the following information of the corresponding video clip in the corresponding floating windows respectively: a video screenshot, a point in time that a current video clip appears in the first video, and an episode profile of the current video clip.

11. The apparatus of claim 9 or 10, wherein the episodic perspective sub-module comprises:

a floating window display unit for sequentially displaying the floating windows, which correspond one-to-one to the video clips, in the order of the time points at which the video clips appear in the first video.

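12. The apparatus of any of claims 8 to 11, wherein performing episodic perspective on the at least one video clip comprises:

determining the order of the time points at which the video clips of the at least one video clip appear in the first video, and displaying the video screenshot and the detailed plot introduction of the video clip with the earliest time point.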

13. The apparatus of any of claims 8 to 12, further comprising:

a video clip playing module for pausing playing of the first video according to a selection of the current user and playing one or more video clips selected by the current user from the at least one video clip.

14. The apparatus of claim 13, further comprising:

a first video playing module for returning to the paused position of the first video and continuing to play the first video after the one or more video clips of the at least one video clip have been played according to the selection of the current user.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.