CN113873323B - Video playing method, device, electronic equipment and medium

Info

Publication number
CN113873323B
Authority
CN
China
Prior art keywords
video
target
attribute
candidate
information
Prior art date
Legal status
Active
Application number
CN202110857081.7A
Other languages
Chinese (zh)
Other versions
CN113873323A (en)
Inventor
侯在鹏
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110857081.7A
Publication of CN113873323A
Application granted
Publication of CN113873323B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles

Abstract

The disclosure provides a video playing method, a video playing device, an electronic device and a medium, relating to the field of computer technology and in particular to video playing, cloud computing and cloud services. The specific implementation scheme is as follows: a target video attribute is selected from candidate video attributes according to attribute information of the candidate video attributes in a target video, the candidate video attributes including at least one of: entity, speech or bullet screen; and video content to be played is extracted from the target video according to the target video attribute. The method automatically locates and plays the video content a user wants, without the user having to manually control the playing progress, and thereby improves video playing efficiency.

Description

Video playing method, device, electronic equipment and medium
Technical Field
The disclosure relates to the field of computer technology, in particular to video playing, cloud computing and cloud services, and specifically to a video playing method, a device, an electronic device and a medium.
Background
Video resources such as movies, television series and variety shows are mainstream today, and they generally run longer than an hour, so it takes a user a long time to watch a video in full.
Existing video playback software typically only lets the user manually drag a progress bar to roughly locate the video clip they want to watch.
Disclosure of Invention
The disclosure provides a video playing method, a device, an electronic device and a medium for improving video playing efficiency.
According to an aspect of the present disclosure, there is provided a video playing method, including:
selecting a target video attribute from the candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, speech or bullet screen;
and extracting video content to be played from the target video according to the target video attribute.
According to another aspect of the present disclosure, there is provided a video playing device including:
the target video attribute selection module is used for selecting target video attributes from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, speech or bullet screen;
and the video content extraction module is used for extracting video content to be played from the target video according to the target video attribute.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any embodiment of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a video playing method according to an embodiment of the present disclosure;
FIG. 2A is a flowchart of a video playing method according to an embodiment of the present disclosure;
FIG. 2B is an interface schematic diagram of target bullet screen information according to an embodiment of the present disclosure;
FIG. 2C is an interface schematic diagram of target bullet screen totals according to an embodiment of the present disclosure;
FIG. 3A is a flowchart of a video playing method according to an embodiment of the present disclosure;
FIG. 3B is an interface schematic diagram of candidate entity information according to an embodiment of the present disclosure;
FIG. 4A is a flowchart of a video playing method according to an embodiment of the present disclosure;
FIG. 4B is an interface schematic diagram of target speech information according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a video playing method according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a video playing device according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the video playing method disclosed in the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The applicant has found that, when playing a video with existing video playing software, a user who wants to watch a particular video clip usually has to manually drag the progress bar at the bottom of the player to adjust the playing progress and locate that clip. In most cases, however, the user is watching the video for the first time and does not know the specific playing time of the clip, so the progress bar must be adjusted repeatedly to position it accurately. This greatly reduces video playing efficiency and leads to a poor viewing experience.
Fig. 1 is a flowchart of a video playing method according to an embodiment of the present disclosure, applicable to skip playback of a target video. The method of this embodiment may be executed by the video playing device disclosed in the embodiments of the present disclosure; the device may be implemented in software and/or hardware and integrated on any electronic device with computing capability.
As shown in fig. 1, the video playing method disclosed in this embodiment may include:
s101, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, speech, or barrage.
The target video is a video that the user is watching or about to watch. It may be a video resource stored locally on the client, in which case the target video is played locally, or a video resource stored in the cloud, in which case the target video is played online. The candidate video attributes are extracted from the video content of the target video and include at least one of entity, speech or bullet screen. The entity attribute represents the entities appearing in the video content of the target video, such as characters, buildings, cars or landscapes; the speech attribute represents the lines spoken by the characters in the video content; and the bullet screen attribute represents the bullet screen comments posted on any video frame of the target video by users who have watched or are watching it.
In one embodiment, attribute information is extracted from the target video in advance according to the candidate video attributes of the target video, and the attribute information corresponding to each candidate video attribute is determined. This may specifically involve the following three cases A, B and C:
A. When the candidate video attribute is the entity attribute, the entity information contained in each video frame of the target video is identified by an entity recognition algorithm and used as the attribute information corresponding to the entity attribute. Each piece of entity information is associated with the video frames it belongs to, so at least one video frame containing the entity information can be determined from any piece of entity information. Optionally, if the entity is a character, the face information in each video frame of the target video is identified by a face recognition algorithm and used as the attribute information corresponding to the entity attribute.
B. When the candidate video attribute is the speech attribute, the audio data of each video frame of the target video is recognized by speech recognition technology to determine the speech information contained in the target video, which is used as the attribute information corresponding to the speech attribute; alternatively, the subtitles in each video frame of the target video are recognized by optical character recognition technology to determine the speech information. Each piece of speech information is associated with the video frames it belongs to, so at least one video frame containing the speech information can be determined from any piece of speech information.
C. When the candidate video attribute is the bullet screen attribute, the bullet screen information contained in each video frame of the target video is identified by optical character recognition technology and used as the attribute information corresponding to the bullet screen attribute. Each piece of bullet screen information is associated with the video frames it belongs to, so at least one video frame containing the bullet screen information can be determined from any piece of bullet screen information.
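A minimal sketch of this pre-extraction step is shown below, with hypothetical recognizer callables standing in for the entity recognition, speech recognition and OCR engines, none of which the disclosure names:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# A recognizer maps one video frame to the attribute info it contains,
# e.g. {"character A"} for the entity attribute or {"speech 3"} for speech.
# These callables are hypothetical stand-ins; the disclosure names the
# techniques (entity recognition, speech recognition, OCR) but no library.
Recognizer = Callable[[object], Set[str]]

def build_attribute_index(
    frames: List[object],
    recognizers: Dict[str, Recognizer],
) -> Dict[str, Dict[str, List[int]]]:
    """For each candidate video attribute, map every attribute info item
    to the indices of the video frames it is associated with."""
    index: Dict[str, Dict[str, List[int]]] = {
        attr: defaultdict(list) for attr in recognizers
    }
    for i, frame in enumerate(frames):
        for attr, recognize in recognizers.items():
            for info in recognize(frame):
                index[attr][info].append(i)  # associate info with its frame
    return index
```

With such an index in place, looking up the frames for any selected attribute info item is a dictionary access, which is what makes the later skip playback steps cheap.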
After the attribute information of each candidate video attribute is determined, the candidate video attributes and their attribute information are displayed as controls in the interface of the target video. While watching the target video, the user can select candidate attributes and attribute information according to their own viewing needs.
Specifically, after the user selects any candidate video attribute from the controls, the controls display the attribute information included in that candidate video attribute, and the user then selects at least one piece of target attribute information from it. The at least one piece of target attribute information selected by the user is acquired, and if the user selects no other candidate video attribute afterwards, the candidate video attribute is taken as the target video attribute and the selected attribute information as the target attribute information of the target video attribute. For example, if the user selects only the attribute information "character A" and "character B" of the candidate video attribute "entity", then "entity" is taken as the target video attribute, and "character A" and "character B" as its target attribute information.
If the user goes on to select at least one other candidate video attribute, the attribute information of those other candidate video attributes is screened according to the attribute information of the first-selected candidate video attribute, and the remaining attribute information after screening is displayed for the user to choose from. Finally, the first-selected candidate video attribute and the other selected candidate video attributes together serve as the target video attributes, and the attribute information selected from both together serves as the target attribute information of the target video attributes.
For example, suppose the user first selects the attribute information "character A" and "character B" of the candidate video attribute "entity". If the user then selects the other candidate video attribute "speech", which includes the four attribute information items "speech 1", "speech 2", "speech 3" and "speech 4", the speech information "speech 1" and "speech 2" that does not belong to "character A" or "character B" is removed, leaving the remaining attribute information "speech 3" and "speech 4". If the user selects "speech 3", then "entity" and "speech" together serve as the target video attributes, and "character A", "character B" and "speech 3" together as their target attribute information.
Selecting the target video attribute from the candidate video attributes according to the attribute information of the candidate video attributes in the target video lays the foundation for extracting the video content to be played according to the target video attribute.
S102, extracting video content to be played from the target video according to the target video attribute.
In one embodiment, when there is one target video attribute and it is the entity attribute, the target video frames associated with the target entity information are determined according to the target entity information corresponding to the entity attribute and the pre-established association between entity information and video frames, and those frames are extracted to generate the video content to be played. For example, if the target entity information is "character A" and an association has been established in advance between "character A" and the 100th to 200th video frames of the target video, the 100th to 200th video frames are extracted to generate the video content to be played.
In another embodiment, when there is one target video attribute and it is the speech attribute, the target video frames associated with the target speech information are determined according to the target speech information corresponding to the speech attribute and the pre-established association between speech information and video frames, and those frames are extracted to generate the video content to be played. For example, if the target speech information is "what shall we eat at noon today" and an association has been established in advance between that line and the 20th to 40th video frames of the target video, the 20th to 40th video frames are extracted to generate the video content to be played.
In another embodiment, when there is one target video attribute and it is the bullet screen attribute, the target video frames associated with the target bullet screen information are determined according to the target bullet screen information corresponding to the bullet screen attribute and the pre-established association between bullet screen information and video frames, and those frames are extracted to generate the video content to be played. For example, if the target bullet screen information is "what beautiful scenery" and associations have been established in advance between that comment and the 30th to 50th, 80th to 90th and 110th to 120th video frames of the target video, those frames are extracted to generate the video content to be played.
In another embodiment, when there are at least two target video attributes, take "entity" and "speech" as an example. The first target video frames associated with the target entity information are determined according to the target entity information corresponding to the entity attribute and the pre-established association between entity information and video frames; then, from the first target video frames, the second target video frames associated with the target speech information are determined according to the target speech information corresponding to the speech attribute and the pre-established association between speech information and video frames, and the second target video frames are extracted to generate the video content to be played.
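The lookup logic of these cases can be sketched against the `build_attribute_index` output above. Intersection across attributes follows directly from the two-attribute example; union within one attribute (e.g. "character A" or "character B") is assumed from the earlier examples:

```python
from typing import Dict, List, Optional, Set

def frames_for_selection(
    index: Dict[str, Dict[str, List[int]]],
    selection: Dict[str, List[str]],
) -> List[int]:
    """Union the frames of all info items chosen under one attribute,
    then intersect across attributes, so multi-attribute selections
    jump only to frames satisfying all selected attributes."""
    result: Optional[Set[int]] = None
    for attr, infos in selection.items():
        frames: Set[int] = set()
        for info in infos:
            frames.update(index[attr].get(info, []))  # union within attribute
        result = frames if result is None else result & frames  # intersect
    return sorted(result or [])

# e.g. frames showing character A or B that also carry "speech 3":
# frames_for_selection(index, {"entity": ["character A", "character B"],
#                              "speech": ["speech 3"]})
```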
By selecting a target video attribute from the candidate video attributes according to attribute information of the candidate video attributes in the target video, where the candidate video attributes include at least one of entity, speech or bullet screen, and extracting the video content to be played from the target video according to the target video attribute, this embodiment automatically locates and plays the video content the user wants without the user having to manually control the playing progress, improves video playing efficiency, supports multi-dimensional video skipping, and meets users' personalized viewing needs.
Fig. 2A is a flowchart of a video playing method according to an embodiment of the present disclosure, applicable to the case where there is one target video attribute and it is the bullet screen attribute. It further optimizes and expands the above technical solution and may be combined with the optional embodiments above.
As shown in fig. 2A, the video playing method disclosed in this embodiment may include:
s201, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, speech, or barrage.
S202, determining the number of occurrences of candidate bullet screen information in the target video when the target video attribute is the bullet screen attribute.
In one embodiment, the bullet screen resources posted by users in the target video are fed into an OCR (Optical Character Recognition) interface, the bullet screen content of the target video is analyzed by an OCR algorithm to obtain the candidate bullet screen information contained in the target video, and the number of times each piece of candidate bullet screen information appears in the target video is counted. For example, if the bullet screen "highlight scene" appears 10 times in total in the target video, its number of occurrences is 10.
S203, selecting the target bullet screen information to be displayed from the candidate bullet screen information according to the numbers of occurrences.
In one embodiment, the numbers of occurrences of the candidate bullet screen information in the target video are sorted in descending order, and the target numbers of occurrences are determined from the sorting result; for example, the three largest counts are taken as the target numbers of occurrences. The candidate bullet screen information corresponding to the target numbers of occurrences is then taken as the target bullet screen information to be displayed.
For example, if the target numbers of occurrences are "100", "95" and "90", and the corresponding candidate bullet screen information is "highlight scene", "the male lead appears" and "famous scene" respectively, then "highlight scene", "the male lead appears" and "famous scene" are taken as the target bullet screen information to be displayed.
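A compact sketch of S202 and S203, assuming the OCR stage has already produced a frame-to-bullet-screen-texts mapping:

```python
from collections import Counter
from typing import Dict, List

def top_bullet_screen_info(
    bullets_by_frame: Dict[int, List[str]], k: int = 3
) -> List[str]:
    """Count how often each bullet screen text occurs across the whole
    video and keep the k most frequent items as the target bullet screen
    information to display (k = 3 matches the example above)."""
    counts = Counter(
        text for texts in bullets_by_frame.values() for text in texts
    )
    return [text for text, _ in counts.most_common(k)]
```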
S204, extracting video content to be played from the target video according to the target bullet screen information.
In one embodiment, the user selects at least one piece of the displayed target bullet screen information as index bullet screen information, and the video content to be played is extracted from the target video according to the index bullet screen information selected by the user and the pre-established association between bullet screen information and video frames.
Optionally, S204 includes:
extracting video frames including the target bullet screen information from the target video; and generating the video content to be played according to the extracted video frames.
Specifically, the target video frames associated with the index bullet screen information are determined according to the index bullet screen information selected by the user and the pre-established association between bullet screen information and video frames, and the target video frames are extracted to generate the video content to be played.
Extracting the video frames that include the target bullet screen information from the target video and generating the video content to be played from them achieves the effect of generating the content on the basis of the bullet screen information selected by the user, meeting the user's personalized viewing needs.
Fig. 2B is an interface schematic diagram of target bullet screen information according to an embodiment of the present disclosure. As shown in FIG. 2B, 20, 21 and 22 each represent a piece of target bullet screen information; the user may select at least one of them as index bullet screen information, and the video content to be played is then extracted from the target video 23 according to the selected index bullet screen information and the pre-established association between bullet screen information and video frames.
In this embodiment, when the target video attribute is the bullet screen attribute, the number of occurrences of candidate bullet screen information in the target video is determined, the target bullet screen information to be displayed is selected from the candidate bullet screen information according to those numbers, and the video content to be played is then extracted from the target video according to the target bullet screen information. This automatically locates and plays the video content the user wants without the user having to manually control the playing progress, improves video playing efficiency, and supports skipping through the video along the bullet screen dimension to meet personalized viewing needs. Moreover, because the target bullet screen information is selected by occurrence count, the corresponding video frames are usually highlight segments, which improves the viewing experience.
On the basis of the above embodiment, the method further includes:
and under the condition that the target video attribute is the bullet screen attribute, determining the total number of bullet screens of each candidate video frame in the target video, and extracting video content to be played from the target video according to those totals.
In one embodiment, the candidate video frames are sorted in descending order by their total number of bullet screens, and the target totals are determined from the sorting result; for example, the three largest totals are taken as the target totals. The candidate video frames corresponding to the target totals are then determined, the target totals are displayed together with the playing times of those target video frames, and the user selects among them to determine the index target video frames, from which the video content to be played is extracted.
For example, if the user selects the target total with the highest bullet screen count, the target video frame corresponding to that total is taken as the index target video frame, which is then extracted from the target video frames as the video content to be played.
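Unlike S202 and S203, which rank bullet screen texts, this variant ranks the frames themselves by how many bullet screens they carry; a minimal sketch under the same assumed frame-to-texts mapping:

```python
from typing import Dict, List, Tuple

def top_frames_by_bullet_total(
    bullets_by_frame: Dict[int, List[str]], k: int = 3
) -> List[Tuple[int, int]]:
    """Return the k candidate frames with the largest total bullet screen
    counts as (frame index, total) pairs; the player can show each total
    next to the playback time of its frame."""
    totals = {frame: len(texts) for frame, texts in bullets_by_frame.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
```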
Fig. 2C is an interface schematic diagram of target bullet screen totals according to an embodiment of the present disclosure. As shown in FIG. 2C, 24, 25 and 26 each represent a target bullet screen total; the user may select at least one of them to determine the index target video frames, which are then extracted from the target video 23 as the video content to be played.
Fig. 3A is a flowchart of a video playing method according to an embodiment of the present disclosure, applicable to the case where there is one target video attribute and it is the entity attribute. It further optimizes and expands the above technical solution and may be combined with the optional embodiments above.
As shown in fig. 3A, the video playing method disclosed in this embodiment may include:
s301, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, speech, or barrage.
S302, when the target video attribute is the entity attribute, determining the candidate entity information to be displayed that is included in the target video.
In one embodiment, the video frames of the target video are imported into a recognition interface in batches, the entity information included in the target video is analyzed frame by frame by a preset entity recognition algorithm, and the candidate entity information to be displayed is output. The total play time of the video frames to which each piece of candidate entity information belongs is displayed together with that candidate entity information.
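A minimal sketch of this frame-by-frame pass, assuming a hypothetical `detect_entities` callable and a fixed frame rate, neither of which is specified in the disclosure:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

def entity_play_time(
    frames: List[object],
    detect_entities: Callable[[object], Set[str]],
    frame_duration: float = 1 / 25,  # assumed 25 fps
) -> Dict[str, float]:
    """Accumulate, for each detected entity, the total play time in
    seconds of the frames it appears in, so the interface can display
    each candidate entity together with its total duration."""
    seconds: Dict[str, float] = defaultdict(float)
    for frame in frames:
        for entity in detect_entities(frame):
            seconds[entity] += frame_duration
    return dict(seconds)
```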
S303, selecting target entity information from the candidate entity information, and extracting video content to be played from the target video according to the target entity information.
In one embodiment, the user selects at least one piece of the displayed candidate entity information as target entity information, and the video frames associated with the target entity information are extracted from the target video as the video content to be played, according to the target entity information selected by the user and the pre-established association between entity information and video frames. The user's selection may be a touch operation, such as a single click, a double click or a drag, or a voice operation, for example speaking the voice instruction "I want to see the segments that include XX".
Fig. 3B is an interface schematic diagram of candidate entity information according to an embodiment of the present disclosure. As shown in FIG. 3B, 27, 28 and 29 each represent a piece of candidate entity information; the user may select at least one of them as target entity information, and the video content to be played is then extracted from the target video 23 according to the selected target entity information and the pre-established association between entity information and video frames.
In this embodiment, when the target video attribute is the entity attribute, the candidate entity information to be displayed is determined, target entity information is selected from it, and the video content to be played is extracted from the target video according to the target entity information. This automatically locates and plays the video content the user wants without the user having to manually control the playing progress, improves video playing efficiency, and supports skipping through the video along the entity dimension to meet users' personalized viewing needs.
Fig. 4A is a flowchart of a video playing method according to an embodiment of the present disclosure, applicable to the case where there is one target video attribute and it is the speech attribute. It further optimizes and expands the above technical solution and may be combined with the optional embodiments above.
As shown in fig. 4A, the video playing method disclosed in this embodiment may include:
s401, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, speech, or barrage.
S402, determining the heat value of candidate speech information in the target video when the target video attribute is the speech attribute.
In one embodiment, the video frames of the target video are recognized by OCR technology or speech recognition technology to determine the candidate speech information included in the target video, and the heat value of each piece of candidate speech information is then obtained by a weighted calculation over factors such as how many times it has been searched, how many times it has been browsed, and how many web pages quote it.
S403, selecting target speech information to be displayed from the candidate speech information according to the heat value.
In one embodiment, the candidate speech information is sorted in descending order by heat value, and the target speech information is determined from the sorting result; for example, the candidate speech information corresponding to the three largest heat values is taken as the target speech information. The target speech information is then displayed together with the playing times of the video frames it belongs to.
In another embodiment, the candidate speech information is matched against preset classic lines, and any candidate speech information matching a classic line is taken as target speech information according to the matching result. The classic lines are themselves obtained according to the heat values of lines.
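A minimal sketch of the heat-value ranking in S402 and S403; the weights and signal names are assumptions, since the disclosure lists the factors but not how they are weighted:

```python
from typing import Dict, List

# Assumed weights over the popularity signals named in the text
# (search count, browse count, number of quoting web pages).
WEIGHTS = {"searches": 0.5, "browses": 0.3, "pages": 0.2}

def heat_value(stats: Dict[str, float]) -> float:
    """Weighted sum of the popularity signals for one candidate line."""
    return sum(w * stats.get(signal, 0.0) for signal, w in WEIGHTS.items())

def top_speech_info(
    candidates: Dict[str, Dict[str, float]], k: int = 3
) -> List[str]:
    """Sort candidate lines by heat value, highest first, and keep the
    k hottest as the target speech information to display."""
    ranked = sorted(
        candidates, key=lambda line: heat_value(candidates[line]), reverse=True
    )
    return ranked[:k]
```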
S404, extracting video content to be played from the target video according to the target speech information.
In one embodiment, the user selects at least one piece of the displayed target speech information as index speech information, and the video content to be played is extracted from the target video according to the index speech information selected by the user and the pre-established association between speech information and video frames.
Fig. 4B is an interface schematic diagram of target speech information according to an embodiment of the present disclosure. As shown in FIG. 4B, 30, 31 and 32 each represent a piece of target speech information; the user may select at least one of them as index speech information, and the video content to be played is then extracted from the target video 23 according to the selected index speech information and the pre-established association between speech information and video frames.
In this embodiment, when the target video attribute is the speech attribute, the heat values of candidate speech information in the target video are determined, the target speech information to be displayed is selected according to those heat values, and the video content to be played is then extracted from the target video according to the target speech information. This automatically locates and plays the video content the user wants without the user having to manually control the playing progress, improves video playing efficiency, and supports skipping through the video along the speech dimension to meet personalized viewing needs. Moreover, because the target speech information is selected by heat value, the corresponding video frames are usually highlight segments, which improves the viewing experience.
On the basis of the above embodiment, before "determining the heat value of the candidate speech information in the target video" in S402, the method further includes:
a) And under the condition that the target video has subtitles, carrying out optical character recognition on video frames included in the target video, and determining the candidate line information included in the target video.
Specifically, when the target video has subtitles, optical character recognition is performed on the video content within the position coordinates of the subtitles in the target video, and the candidate speech information included in the target video is determined.
B) When the target video has no subtitles, performing speech recognition on the video frames of the target video to determine the candidate speech information included in the target video.
Specifically, when the target video has no subtitles, speech recognition is performed on the video frames of the target video, the audio data of each video frame is converted into text, and the candidate speech information is determined from the recognized text.
Performing optical character recognition on the video frames when the target video has subtitles, and speech recognition when it does not, means the candidate speech information can be determined whether or not the target video contains subtitles, which improves the flexibility and reliability of candidate speech information determination.
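The branch can be sketched as below, with hypothetical `ocr` and `asr` callables standing in for the OCR engine and speech recognizer:

```python
from typing import Callable, List, Set

def extract_candidate_speech(
    frames: List[object],
    has_subtitles: bool,
    ocr: Callable[[object], str],   # reads the subtitle region of a frame
    asr: Callable[[object], str],   # transcribes the frame's audio data
) -> Set[str]:
    """Pick the recognizer by subtitle availability, then collect the
    non-empty recognized lines as the candidate speech information."""
    recognize = ocr if has_subtitles else asr
    return {text for text in (recognize(frame) for frame in frames) if text}
```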
Fig. 5 is a flowchart of a video playing method according to an embodiment of the present disclosure, applicable to the case where there are at least two target video attributes. It further optimizes and expands the above technical solution and may be combined with the optional embodiments above.
As shown in fig. 5, the video playing method disclosed in this embodiment may include:
s501, acquiring target attribute information of any candidate video attribute selected from attribute information of the candidate video attribute in the target video.
In one embodiment, if the user wants to select at least two target video attributes from the candidate video attributes, at least one target attribute information of any candidate video selected by the user first is acquired. For example, when the user first selects attribute information "character a" and "character B" of the candidate video attribute of "entity", the user sets "character a" and "character B" as target attribute information.
S502, screening the attribute information of the candidate video attributes other than the selected one according to the target attribute information of that candidate video attribute, to obtain the remaining attribute information of the other candidate video attributes.
In one embodiment, the attribute information of the candidate video attributes other than the selected one is screened according to the acquired target attribute information: the attribute information that conflicts with the target attribute information is removed, only the attribute information that intersects with the target attribute information is retained as the remaining attribute information, and the remaining attribute information is displayed to the user.
For example, if the target attribute information is "character A" and "character B", and the attribute information of the "speech" attribute includes "speech 1", "speech 2", "speech 3" and "speech 4", and it is determined from the correspondence between lines and characters that "speech 1" and "speech 2" are spoken by "character C", then "speech 1" and "speech 2" conflict with "character A" and "character B" and are removed, and "speech 3" and "speech 4" are taken as the remaining attribute information.
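A minimal sketch of this screening step; the line-to-speaker mapping is an assumed input derived from the correspondence between lines and characters mentioned above:

```python
from typing import Dict, List

def remaining_speech_info(
    selected_characters: List[str],
    speaker_of_line: Dict[str, str],  # assumed mapping, e.g. {"speech 1": "character C"}
) -> List[str]:
    """Drop lines spoken by characters outside the user's first selection,
    keeping only the speech info consistent with it (the 'remaining
    attribute information' offered for the second selection)."""
    keep = set(selected_characters)
    return [line for line, who in speaker_of_line.items() if who in keep]

# remaining_speech_info(["character A", "character B"],
#     {"speech 1": "character C", "speech 2": "character C",
#      "speech 3": "character A", "speech 4": "character B"})
# -> ["speech 3", "speech 4"]
```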
S503, determining the target video attributes and their target attribute information according to the target attribute information of the candidate video attribute and the remaining attribute information of the other candidate video attributes.
In one embodiment, the user selects at least one piece of remaining attribute information from the displayed remaining attribute information of the other candidate video attributes. According to the user's selection, the other candidate video attributes selected are taken as secondary candidate video attributes and the selected remaining attribute information as secondary attribute information. The first-selected candidate video attribute and the secondary candidate video attributes then together serve as the target video attributes, and the first-selected target attribute information and the secondary attribute information together serve as the target attribute information of the target video attributes.
For example, assuming that the attribute information "character a" and "character B" of the candidate video attribute of "entity" is selected for the first time by the user, and the remaining attribute information "speech 3" and "speech 4" of the other candidate video attribute of "speech" is selected for the second time by the user, the "entity" and "speech" are taken together as the target video attribute, and the "character a", "character B", "speech 3" and "speech 4" are taken together as the target attribute information of the target video attribute.
S504, extracting video content to be played from the target video according to the target video attribute and the target attribute information of the target video attribute.
In this embodiment, the target attribute information selected from the attribute information of any candidate video attribute in the target video is acquired; the attribute information of the other candidate video attributes is screened according to that target attribute information to obtain their remaining attribute information; and the target video attributes and their target attribute information are determined from the target attribute information of the first-selected candidate video attribute and the remaining attribute information of the other candidate video attributes. This avoids playback conflicts between pieces of target attribute information, i.e. the problem of suddenly jumping to the video frames of one piece of target attribute information while the frames of another are playing, and thereby ensures a better viewing experience.
Fig. 6 is a schematic structural diagram of a video playing device according to an embodiment of the present disclosure, applicable to skip playback of a target video. The device of this embodiment may be implemented in software and/or hardware and integrated on any electronic device with computing capability.
As shown in fig. 6, the video playing device 60 disclosed in the present embodiment may include a target video attribute selection module 61 and a video content extraction module 62, wherein:
a target video attribute selection module 61, configured to select a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video; the candidate video attributes include at least one of: entity, speech or bullet screen;
the video content extraction module 62 is configured to extract video content to be played from the target video according to the target video attribute.
Optionally, the video content extraction module 62 is specifically configured to:
under the condition that the target video attribute is the bullet screen attribute, determining the number of occurrences of candidate bullet screen information in the target video;
selecting target bullet screen information to be displayed from the candidate bullet screen information according to the numbers of occurrences;
and extracting video content to be played from the target video according to the target bullet screen information.
Optionally, the video content extraction module 62 is specifically further configured to:
extracting video frames including the target bullet screen information from the target video;
and generating the video content to be played according to the extracted video frames.
Optionally, the video content extraction module 62 is specifically further configured to:
under the condition that the target video attribute is an entity, determining candidate entity information to be displayed, which is included in the target video;
and selecting target entity information from the candidate entity information, and extracting video content to be played from the target video according to the target entity information.
Optionally, the video content extraction module 62 is specifically further configured to:
under the condition that the target video attribute is the speech attribute, determining the heat value of candidate speech information in the target video;
selecting target speech information to be displayed from the candidate speech information according to the heat value;
and extracting video content to be played from the target video according to the target speech information.
Optionally, the device further includes a speech information determining module, specifically configured to:
Under the condition that the target video has subtitles, carrying out optical character recognition on video frames included in the target video, and determining the candidate speech information included in the target video;
and under the condition that the target video does not have subtitles, performing voice recognition on video frames included in the target video, and determining the candidate speech information included in the target video.
Optionally, the target video attribute selection module 61 is specifically configured to:
acquiring target attribute information of any candidate video attribute selected from attribute information of the candidate video attribute in the target video;
screening attribute information of other candidate video attributes except the candidate video attribute according to the target attribute information of the candidate video attribute to obtain residual attribute information of the other candidate video attributes;
and determining the target video attribute and the target attribute information of the target video attribute according to the target attribute information of the candidate video attribute and the residual attribute information of other candidate video attributes.
The video playing device 60 disclosed in the embodiments of the present disclosure may execute the video playing method disclosed in the embodiments of the present disclosure, and has the corresponding functional modules and beneficial effects of the execution method. Reference is made to the description of any method embodiment of the disclosure for details not explicitly described in this embodiment.
In the technical solutions of the present disclosure, the acquisition, storage and application of the user personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as PDAs, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, the video playing method. For example, in some embodiments, the video playing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the video playing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the video playing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; this is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (15)

1. A video playing method, comprising:
selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video, wherein the candidate video attributes include at least one of: entity, lines, or bullet screen (barrage);
extracting video content to be played from the target video according to the target video attribute;
wherein selecting the target video attribute from the candidate video attributes according to the attribute information of the candidate video attributes in the target video comprises:
extracting attribute information of the target video in advance according to the candidate video attributes in the target video, and determining the attribute information corresponding to each candidate video attribute;
displaying, in an interface of the target video, each candidate video attribute and the attribute information corresponding to it in the form of a control, so that a user selects the target video attribute from the candidate video attributes according to the user's own viewing requirements;
when the user selects any candidate video attribute from the control, displaying, by the control, the attribute information included in that candidate video attribute, so that the user continues to select at least one piece of target attribute information from the attribute information, the at least one piece of target attribute information of the candidate video attribute being used as target attribute information of the target video attribute;
if the user continues to select at least one other candidate video attribute, filtering the attribute information of the other candidate video attribute according to the at least one piece of attribute information of the first-selected candidate video attribute to obtain the remaining attribute information of the other candidate video attribute; and taking the first-selected candidate video attribute and the at least one other currently selected candidate video attribute together as the target video attributes, and taking the attribute information of the first-selected candidate video attribute and the remaining attribute information of the at least one other currently selected candidate video attribute together as the target attribute information of the target video attributes.
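By way of illustration only, the following is a minimal Python sketch of the selection flow recited in claim 1. The attribute index, the co-occurrence map used for filtering, and all names and data in it are hypothetical stand-ins for the pre-extracted attribute information the claim assumes; nothing below is part of the claimed method itself.

# Hypothetical pre-extracted index: candidate video attribute -> attribute information.
ATTRIBUTE_INDEX = {
    "entity": ["Character A", "Character B", "Landmark X"],
    "lines": ["line-001", "line-002", "line-003"],
    "barrage": ["funny", "plot twist", "spoiler"],
}

# Hypothetical co-occurrence map: which attribute information items appear in the
# same video segments; used to filter a second attribute by the first selection.
CO_OCCURS = {
    "Character A": {"line-001", "line-003", "funny"},
    "Character B": {"line-002", "plot twist"},
}

def select_target_attributes(first_attr, first_info, second_attr=None):
    # Claim-1 style selection: the first-selected attribute keeps its chosen
    # information; an optional second attribute is filtered down to the
    # remaining information compatible with that first choice.
    target_attrs = [first_attr]
    target_info = {first_attr: list(first_info)}
    if second_attr is not None:
        allowed = set().union(*(CO_OCCURS.get(i, set()) for i in first_info))
        target_attrs.append(second_attr)
        target_info[second_attr] = [i for i in ATTRIBUTE_INDEX[second_attr] if i in allowed]
    return target_attrs, target_info

attrs, info = select_target_attributes("entity", ["Character A"], "lines")
print(attrs, info)  # ['entity', 'lines'] {'entity': ['Character A'], 'lines': ['line-001', 'line-003']}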
2. The method of claim 1, wherein extracting the video content to be played from the target video according to the target video attribute comprises:
in a case where the target video attribute is the bullet screen (barrage), determining the number of occurrences of candidate barrage information in the target video;
selecting target barrage information to be displayed from the candidate barrage information according to the number of occurrences; and
extracting the video content to be played from the target video according to the target barrage information.
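One way to read claim 2, sketched in Python: tally how often each bullet-screen text occurs and keep the most frequent as the target barrage information. The stream below is made-up data.

from collections import Counter

# Hypothetical barrage stream: (timestamp in seconds, bullet-screen text).
barrage_stream = [
    (12.0, "plot twist"), (12.4, "funny"), (30.2, "plot twist"),
    (31.0, "plot twist"), (55.7, "funny"), (80.1, "spoiler"),
]

# Number of occurrences of each piece of candidate barrage information.
occurrences = Counter(text for _, text in barrage_stream)

# Target barrage information selected by occurrence count (claim 2).
target_barrage, count = occurrences.most_common(1)[0]
print(target_barrage, count)  # plot twist 3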
3. The method of claim 2, wherein extracting the video content to be played from the target video according to the target barrage information comprises:
extracting video frames comprising the target barrage information from the target video; and
generating the video content to be played according to the extracted video frames.
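Claim 3 then keeps only the frames on which the target barrage appears and stitches them into the content to be played. A sketch over a hypothetical per-frame annotation table rather than a real decoder:

# Hypothetical per-frame annotations: frame index -> barrage texts visible on it.
frame_barrages = {
    300: ["plot twist"], 301: ["plot twist", "funny"],
    755: ["plot twist"], 2002: ["spoiler"],
}

def frames_for_barrage(annotations, target_text):
    # Indices of the frames that include the target barrage information.
    return sorted(i for i, texts in annotations.items() if target_text in texts)

def frames_to_segments(frame_indices, fps=25.0):
    # Group consecutive frame indices into (start_sec, end_sec) playable segments.
    segments = []
    for idx in frame_indices:
        if segments and idx == segments[-1][1] + 1:
            segments[-1][1] = idx
        else:
            segments.append([idx, idx])
    return [(start / fps, (end + 1) / fps) for start, end in segments]

hits = frames_for_barrage(frame_barrages, "plot twist")
print(frames_to_segments(hits))  # [(12.0, 12.08), (30.2, 30.24)]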
4. The method of claim 1, wherein extracting the video content to be played from the target video according to the target video attribute comprises:
in a case where the target video attribute is the entity, determining candidate entity information to be displayed that is included in the target video; and
selecting target entity information from the candidate entity information, and extracting the video content to be played from the target video according to the target entity information.
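For claim 4 the same extraction pattern applies with an entity recognizer in place of barrage text; detect_entities below is a hypothetical stand-in returning canned results, where a real system would run face or object recognition on decoded frames.

def detect_entities(frame_index):
    # Hypothetical detector with canned results; a real implementation would
    # run face/object recognition on the decoded frame.
    canned = {100: {"Character A"}, 101: {"Character A", "Character B"}}
    return canned.get(frame_index, set())

def frames_for_entity(frame_indices, target_entity):
    # Keep the frames in which the selected target entity information appears.
    return [i for i in frame_indices if target_entity in detect_entities(i)]

print(frames_for_entity(range(99, 103), "Character B"))  # [101]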
5. The method of claim 1, wherein extracting the video content to be played from the target video according to the target video attribute comprises:
in a case where the target video attribute is the lines, determining a heat value of candidate line information in the target video;
selecting target line information to be displayed from the candidate line information according to the heat value; and
extracting the video content to be played from the target video according to the target line information.
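Claim 5 ranks candidate lines by a heat value. The source does not define how the heat value is computed, so the sketch below simply assumes a pre-computed score per line (for example, aggregated replay or like counts):

# Hypothetical heat values for candidate line information.
line_heat = {"line-001": 874, "line-002": 12953, "line-003": 402}

# Target line information selected by heat value (claim 5).
target_line = max(line_heat, key=line_heat.get)
print(target_line, line_heat[target_line])  # line-002 12953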
6. The method of claim 5, further comprising, before determining the heat value of the candidate line information in the target video:
in a case where the target video has subtitles, performing optical character recognition on video frames included in the target video to determine the candidate line information included in the target video; and
in a case where the target video does not have subtitles, performing speech recognition on audio included in the target video to determine the candidate line information included in the target video.
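Claim 6 branches on whether the target video carries subtitles: optical character recognition on frames when it does, speech recognition otherwise. A sketch of the OCR path using OpenCV and pytesseract; recognize_speech is a hypothetical placeholder, since the source names no particular recognition engine.

import cv2            # pip install opencv-python
import pytesseract    # pip install pytesseract (requires the tesseract binary)

def lines_via_ocr(video_path, sample_every=25):
    # OCR sampled frames of a subtitled video to collect candidate line information.
    lines, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            text = pytesseract.image_to_string(frame).strip()
            if text:
                lines.append(text)
        idx += 1
    cap.release()
    return lines

def recognize_speech(video_path):
    # Hypothetical ASR path for videos without subtitles; a real system would
    # demux the audio track and feed it to a speech recognition engine.
    raise NotImplementedError("plug in a speech recognition engine here")

def candidate_lines(video_path, has_subtitles):
    return lines_via_ocr(video_path) if has_subtitles else recognize_speech(video_path)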
7. A video playing apparatus, comprising:
a target video attribute selection module, configured to select a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video, wherein the candidate video attributes include at least one of: entity, lines, or bullet screen (barrage); and
a video content extraction module, configured to extract video content to be played from the target video according to the target video attribute;
wherein the target video attribute selection module is specifically configured to: extract attribute information of the target video in advance according to the candidate video attributes in the target video, and determine the attribute information corresponding to each candidate video attribute; and display, in an interface of the target video, each candidate video attribute and the attribute information corresponding to it in the form of a control, so that a user selects the target video attribute from the candidate video attributes according to the user's own viewing requirements;
the target video attribute selection module is further specifically configured to: when the user selects any candidate video attribute from the control, display, by the control, the attribute information included in that candidate video attribute, so that the user continues to select at least one piece of target attribute information from the attribute information, the at least one piece of target attribute information of the candidate video attribute being used as target attribute information of the target video attribute; and
if the user continues to select at least one other candidate video attribute, filter the attribute information of the other candidate video attribute according to the at least one piece of attribute information of the first-selected candidate video attribute to obtain the remaining attribute information of the other candidate video attribute; and take the first-selected candidate video attribute and the at least one other currently selected candidate video attribute together as the target video attributes, and take the attribute information of the first-selected candidate video attribute and the remaining attribute information of the at least one other currently selected candidate video attribute together as the target attribute information of the target video attributes.
8. The apparatus of claim 7, wherein the video content extraction module is specifically configured to:
in a case where the target video attribute is the bullet screen (barrage), determine the number of occurrences of candidate barrage information in the target video;
select target barrage information to be displayed from the candidate barrage information according to the number of occurrences; and
extract the video content to be played from the target video according to the target barrage information.
9. The apparatus of claim 8, wherein the video content extraction module is further specifically configured to:
extract video frames comprising the target barrage information from the target video; and
generate the video content to be played according to the extracted video frames.
10. The apparatus of claim 7, wherein the video content extraction module is further specifically configured to:
in a case where the target video attribute is the entity, determine candidate entity information to be displayed that is included in the target video; and
select target entity information from the candidate entity information, and extract the video content to be played from the target video according to the target entity information.
11. The apparatus of claim 7, wherein the video content extraction module is further specifically configured to:
in a case where the target video attribute is the lines, determine a heat value of candidate line information in the target video;
select target line information to be displayed from the candidate line information according to the heat value; and
extract the video content to be played from the target video according to the target line information.
12. The apparatus of claim 11, further comprising a line information determination module specifically configured to:
in a case where the target video has subtitles, perform optical character recognition on video frames included in the target video to determine the candidate line information included in the target video; and
in a case where the target video does not have subtitles, perform speech recognition on audio included in the target video to determine the candidate line information included in the target video.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
CN202110857081.7A 2021-07-28 2021-07-28 Video playing method, device, electronic equipment and medium Active CN113873323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110857081.7A CN113873323B (en) 2021-07-28 2021-07-28 Video playing method, device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN113873323A CN113873323A (en) 2021-12-31
CN113873323B (en) 2023-08-29

Family

ID=78990273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110857081.7A Active CN113873323B (en) 2021-07-28 2021-07-28 Video playing method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113873323B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115119039A (en) * 2022-06-29 2022-09-27 北京奇艺世纪科技有限公司 Video playing system, method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105163178A (en) * 2015-08-28 2015-12-16 北京奇艺世纪科技有限公司 Method and device for locating video playing position
CN106095804A (en) * 2016-05-30 2016-11-09 维沃移动通信有限公司 The processing method of a kind of video segment, localization method and terminal
CN110225369A (en) * 2019-07-16 2019-09-10 百度在线网络技术(北京)有限公司 Video selection playback method, device, equipment and readable storage medium storing program for executing
CN111988663A (en) * 2020-08-28 2020-11-24 北京百度网讯科技有限公司 Method, device and equipment for positioning video playing node and storage medium
CN112328829A (en) * 2020-10-27 2021-02-05 维沃移动通信(深圳)有限公司 Video content retrieval method and device


Also Published As

Publication number Publication date
CN113873323A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN110532404B (en) Source multimedia determining method, device, equipment and storage medium
CN113873323B (en) Video playing method, device, electronic equipment and medium
CN114168793A (en) Anchor display method, device, equipment and storage medium
CN113656125A (en) Virtual assistant generation method and device and electronic equipment
CN111309200B (en) Method, device, equipment and storage medium for determining extended reading content
CN114880498B (en) Event information display method and device, equipment and medium
CN114125498B (en) Video data processing method, device, equipment and storage medium
CN113988294A (en) Method for training prediction network, image processing method and device
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN113760162A (en) Method, apparatus, device and storage medium for displaying information
CN113923477A (en) Video processing method, video processing device, electronic equipment and storage medium
CN113327311A (en) Virtual character based display method, device, equipment and storage medium
CN113627363B (en) Video file processing method, device, equipment and storage medium
CN113473178B (en) Video processing method, video processing device, electronic equipment and computer readable storage medium
CN113360797B (en) Information processing method, apparatus, device, storage medium, and computer program product
CN113627354B Model training and video processing method, apparatus, device and storage medium
CN113823283B (en) Information processing method, apparatus, storage medium, and program product
CN116992057A (en) Method, device and equipment for processing multimedia files in storage equipment
CN113934918A (en) Searching method and device for live broadcast, electronic equipment and storage medium
CN115391649A (en) Information recommendation method and device and electronic equipment
CN116980638A (en) Video bullet screen generation method, device, equipment, storage medium and program product
CN115329897A (en) Feature extraction model training method and device and electronic equipment
CN117097955A (en) Video processing method, device, equipment and storage medium
CN115562615A (en) Volume adjusting method, device, equipment and storage medium
CN113849689A (en) Audio and video data processing method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant