CN111182358A - Video processing method, video playing method, device, equipment and storage medium - Google Patents

Video processing method, video playing method, device, equipment and storage medium

Info

Publication number
CN111182358A
Authority
CN
China
Prior art keywords
video
playback
segments
live
segment
Prior art date
Legal status
Granted
Application number
CN201911389183.XA
Other languages
Chinese (zh)
Other versions
CN111182358B (en)
Inventor
符德恩
Current Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN201911389183.XA
Publication of CN111182358A
Application granted
Publication of CN111182358B
Status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47: End-user applications
    • H04N 21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47217: End-user interface for interacting with content, for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45: Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/462: Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application provides a video processing method, a video playing method, an apparatus, a device, and a storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring a recorded video of a live video; performing content recognition on the recorded video to obtain n video segments, where n is a positive integer; determining selection weights respectively corresponding to the n video segments; selecting m video segments from the n video segments according to those selection weights, where m is a positive integer less than or equal to n; and generating a playback video of the live video from the m video segments. In the related art, the playback video of a live video is as long as the live broadcast itself, so watching it consumes considerable data traffic and time. In the technical solution provided by the embodiments of the application, some segments of the live video are selected to generate its playback video, which shortens the playback video and reduces both the data traffic consumed in watching it and the time cost.

Description

Video processing method, video playing method, device, equipment and storage medium
Technical Field
Embodiments of the present application relate to the technical field of artificial intelligence, and in particular to a video processing method, a video playing method, an apparatus, a device, and a storage medium.
Background
With the development of computer network technology, live video is increasingly popular with users. In a live video broadcast, viewers watch over the Internet what is happening where the anchor is, such as live singing, live shopping, or live cooking.
When viewers cannot watch the live video in time, or want to watch the live content again, they need to watch a playback video of the live broadcast. In the related art, the playback video is generated after the live broadcast ends and has the same duration and content as the live broadcast; users can watch it directly.
In the related art above, when the live broadcast is long, the playback video is equally long, which makes watching it costly in both data traffic and time.
Disclosure of Invention
Embodiments of the present application provide a video processing method, a video playing method, an apparatus, a device, and a storage medium, which can solve the problem in the related art that long playback videos make watching them costly in data traffic and time. The technical solutions are as follows:
in one aspect, an embodiment of the present application provides a video processing method, where the method includes:
acquiring a recorded video of live video;
performing content identification on the recorded video to obtain n video segments, wherein n is a positive integer;
determining selection weights corresponding to the n video segments respectively, wherein the selection weights are used for indicating the selection priority of the video segments;
selecting m video segments from the n video segments according to the selection weights respectively corresponding to the n video segments, wherein m is a positive integer less than or equal to n;
and generating a playback video of the live video according to the m video segments.
On the other hand, an embodiment of the present application provides a video playing method, where the method includes:
displaying a focus interface of a video live broadcast application;
displaying, in the focus interface, information entries of at least one anchor user account followed by the target user account;
receiving a playback video playing instruction in an information item corresponding to a target anchor user account;
in response to the playback video playing instruction, playing a playback video of the target anchor user account;
wherein the playback video is a portion of video content extracted from a recorded video of a live video of the target anchor user account.
In another aspect, an embodiment of the present application provides a video processing apparatus, where the apparatus includes:
the video acquisition module is used for acquiring a recorded video of live video;
the content identification module is used for carrying out content identification on the recorded video to obtain n video segments, wherein n is a positive integer;
the weight determining module is used for determining selection weights corresponding to the n video segments respectively, and the selection weights are used for indicating the selection priority of the video segments;
a segment selection module, configured to select m video segments from the n video segments according to selection weights corresponding to the n video segments, where m is a positive integer smaller than or equal to n;
and a video generation module, configured to generate a playback video of the live video according to the m video segments.
In another aspect, an embodiment of the present application provides a video playing apparatus, where the apparatus includes:
the interface display module is used for displaying a focus interface of the video live broadcast application;
the item display module is used for displaying, in the focus interface, information entries of at least one anchor user account followed by the target user account;
the instruction receiving module is used for receiving a playback video playing instruction in an information item corresponding to the target anchor user account;
the video playing module is used for playing, in response to the playback video playing instruction, the playback video of the target anchor user account;
wherein the playback video is a portion of video content extracted from a recorded video of a live video of the target anchor user account.
In yet another aspect, an embodiment of the present application provides a computer device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the video processing method according to the above aspect, or implement the video playing method according to the above aspect.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the video processing method according to the above aspect, or to implement the video playing method according to the above aspect.
In a further aspect, an embodiment of the present application provides a computer program product, which is configured to, when executed by a processor, implement the video processing method described above, or implement the video playing method described above.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
content recognition is performed on the recorded video of a live video to obtain multiple video segments, and some of them are selected, according to their respective selection weights, to generate the playback video of the live video. In the related art, the playback video of a live video is as long as the live broadcast itself, so watching it consumes considerable data traffic and time. By selecting only some segments of the live video to generate its playback video, the technical solution provided by the embodiments of the present application shortens the playback video, reducing both the data traffic consumed in watching it and the time cost.
Drawings
FIG. 1 is a schematic illustration of an implementation environment provided by one embodiment of the present application;
fig. 2 is a flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a flowchart of a video processing method according to another embodiment of the present application;
FIG. 4 illustrates a diagram of content recognition;
FIG. 5 is a diagram illustrating transition animation weights;
fig. 6 is a flowchart of a video playing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a focus interface;
FIG. 8 illustrates a flowchart of displaying a playback video;
fig. 9 is a block diagram of a video processing apparatus provided by an embodiment of the present application;
fig. 10 is a block diagram of a video processing apparatus according to another embodiment of the present application;
fig. 11 is a block diagram of a video playback device according to an embodiment of the present application;
fig. 12 is a block diagram of a video playback device according to another embodiment of the present application;
fig. 13 is a block diagram of a terminal according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
AI (Artificial Intelligence) denotes the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
CV (Computer Vision) is the science of how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and performing further graphics processing so that the result becomes an image better suited for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The key technologies of speech technology are ASR (Automatic Speech Recognition), TTS (Text To Speech) synthesis, and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and speech is expected to become one of the most promising modes of human-computer interaction.
NLP (Natural Language Processing) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. NLP is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, the language people use every day, so it is closely tied to the study of linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
ML (Machine Learning) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It specializes in how computers simulate or implement human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, AI has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service.
The solutions provided in the embodiments of the present application involve AI technologies such as CV, speech technology, NLP, and ML, which are used to perform content recognition on the recorded video.
Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the present application is shown. The implementation environment may include: a terminal 10 and a server 20.
A client running a target application is installed on the terminal 10. The client has video playing capability and is used to play the videos that an anchor records in a live broadcast room. The client may be a social application client, an instant messaging application client, a live streaming application client, or the like.
Optionally, the client also has video capture capability and is used by the anchor to record the live video in the live broadcast room and send it to the server 20.
The terminal 10 may be an electronic device such as a mobile phone, a tablet computer, a PC (Personal Computer), an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player.
The server 20 may be a server, a server cluster composed of a plurality of servers, or a cloud computing service center. The server 20 may communicate with the terminal 10 through a wired or wireless network.
The technical solution of the present application will be described below by means of several embodiments.
Referring to fig. 2, a flowchart of a video processing method according to an embodiment of the present application is shown. In the present embodiment, the method is mainly illustrated as being applied to a terminal or a server (referred to as a computer device in the present embodiment) in the implementation environment shown in fig. 1. The method may include the steps of:
step 201, acquiring a recorded video of live video.
A live video is the video that an anchor broadcasts from a live broadcast room; through the Internet, viewers can watch events happening at the anchor's location in real time, such as live singing, live shopping, and live cooking.
So that viewers who cannot watch the live broadcast in time, or who want to watch the live content again, are not left out, the live video can be recorded in real time to generate a recorded video of the live video. The recorded video has exactly the same duration and content as the live video. After the recorded video is generated, users can watch it directly.
The computer device can obtain the recorded video of the live video from a video recording device. Optionally, the video recording device is an electronic device with a video recording function, such as a mobile phone, a video recorder, or a video camera, which is not limited in the embodiments of the present application.
Step 202, performing content identification on the recorded video to obtain n video segments, wherein n is a positive integer.
After the recorded video is acquired, content recognition can be performed on it, and n video segments are obtained based on the recognition result. A video segment is a video containing part of the content of the recorded video.
Content recognition identifies the content included in the recorded video, such as songs, skits, talent shows, viewer messages, and so on. Other kinds of content may also be included, which is not limited in the embodiments of the present application.
Step 203, determining the selection weights corresponding to the n video segments respectively.
After obtaining the n video segments, the selection weight corresponding to each video segment may be further determined.
The selection weight indicates the selection priority of a video segment. Taking a target video segment as an example: the higher its selection priority, the more likely the target video segment is to be selected for generating the playback video of the live video.
Step 204, selecting m video segments from the n video segments according to the selection weights respectively corresponding to the n video segments, wherein m is a positive integer less than or equal to n.
After the weights corresponding to the n video segments are obtained, m video segments can be selected from the n video segments based on the selection weights.
How to select m video segments from the n video segments according to the selection weights corresponding to the n video segments is described in detail in the following embodiment of fig. 3, and details thereof are not repeated here.
Step 205, generating a playback video of the live video according to the m video segments.
After the m video segments are determined, a live playback video can be generated based on the m video segments.
In summary, in the technical solution provided by this embodiment of the present application, content recognition is performed on the recorded video of a live video to obtain multiple video segments, and some of them are selected, according to their respective selection weights, to generate the playback video of the live video. In the related art, the playback video of a live video is as long as the live broadcast itself, so watching it consumes considerable data traffic and time. By selecting only some segments of the live video to generate its playback video, the technical solution provided by this embodiment shortens the playback video, reducing both the data traffic consumed in watching it and the time cost.
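As a concrete illustration of steps 201 to 205, the following is a minimal sketch in Python. All names (VideoSegment, generate_playback, and so on) are illustrative assumptions, not identifiers from the patent, and content recognition and weight determination are taken as given inputs.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class VideoSegment:
        start: float          # start time in the recorded video, in seconds
        end: float            # end time, in seconds
        weight: float = 0.0   # selection weight determined in step 203

    def generate_playback(segments: List[VideoSegment], m: int) -> List[VideoSegment]:
        # Step 204: pick the m segments with the highest selection weights.
        chosen = sorted(segments, key=lambda s: s.weight, reverse=True)[:m]
        # Step 205: restore chronological order before generating the playback video.
        return sorted(chosen, key=lambda s: s.start)

    segments = [VideoSegment(0, 60, 0.3), VideoSegment(60, 180, 0.9),
                VideoSegment(180, 240, 0.5)]
    print(generate_playback(segments, 2))  # keeps the two highest-weight segments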
Referring to fig. 3, a flow chart of a video processing method according to another embodiment of the present application is shown. In the present embodiment, the method is mainly exemplified by being applied to a terminal or a server (referred to as a computer device in the present embodiment) of the implementation environment shown in fig. 1. The method may include the steps of:
step 301, acquiring a recorded video of live video.
This step is the same as or similar to the content of step 201 in the embodiment of fig. 2, and is not described here again.
Optionally, the recorded video may be recorded by the anchor; after finishing recording and obtaining the recorded video, the anchor may publish it to the network, for example to a live video application through the anchor user account. In some other embodiments, the recorded video may instead be recorded by a user while watching the live video and then published to the network.
In addition, the computer device may directly obtain the recorded video from a network, and may also obtain the recorded video from a local storage, which is not limited in this embodiment of the application.
Step 302, performing content identification on the recorded video to obtain n video segments, where n is a positive integer.
This step is the same as or similar to the step 202 in the embodiment of fig. 2, and is not described here again.
Optionally, performing content recognition on the recorded video to obtain n video segments may include: identifying the start time and end time of a song from the recorded video and extracting the video segment between them to obtain a song video segment; or identifying the start time and end time of a skit from the recorded video and extracting the video segment between them to obtain a skit video segment; or identifying the start time and end time of a message from the recorded video and extracting the video segment between them to obtain a message video segment; or identifying the start time and end time of a mic-link (Lianmai) session from the recorded video and extracting the video segment between them to obtain a mic-link video segment; or identifying the start time and end time of gift-giving from the recorded video and extracting the video segment between them to obtain a gift-giving video segment; or identifying the start time and end time of cheering from the recorded video and extracting the video segment between them to obtain a game-win video segment. The video segments may also include talent video segments (such as dancing video segments, instrument performance video segments, vocal mimicry video segments, and the like). The embodiments of the present application do not limit the types of video segments included in the recorded video.
Optionally, the recorded video may contain multiple video segments of each type, and these may be ordered by their time of occurrence in the recorded video.
Illustratively, fig. 4 schematically shows content recognition. Fig. 4(a) illustrates recognition of a song video segment, fig. 4(b) recognition of a skit video segment, fig. 4(c) recognition of a message video segment, fig. 4(d) recognition of a mic-link video segment, and fig. 4(e) recognition of a gift-giving video segment.
Optionally, CV, speech technology, NLP, ML, and other technologies may be used for content recognition, which is not limited in the embodiments of the present application.
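As an illustration of this extraction step, the sketch below assumes a recognizer that has already produced (label, start time, end time) events for the recorded video; turning those events into labeled, chronologically ordered video segments is then straightforward. The event format and all names are assumptions, not part of the patent.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class LabeledSegment:
        label: str     # e.g. "song", "skit", "message", "mic_link", "gift", "game_win"
        start: float   # recognized start time, in seconds
        end: float     # recognized end time, in seconds

    def extract_segments(events: List[Tuple[str, float, float]]) -> List[LabeledSegment]:
        # One recognized event (from the CV / speech / NLP models) yields one video
        # segment spanning its start and end times; segments are then ordered by
        # their time of occurrence in the recorded video.
        segments = [LabeledSegment(label, start, end) for label, start, end in events]
        return sorted(segments, key=lambda seg: seg.start)

    print(extract_segments([("song", 30.0, 210.0), ("gift", 95.0, 110.0)]))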
Step 303, for the ith video segment of the n video segments, obtaining the label information of the ith video segment.
After the n video segments are obtained, tag information for each video segment can be acquired. The tag information characterizes the video segment and includes a content tag indicating the content classification of the segment; for example, the content tag may include at least one of the following: song, skit, message, gift, game win, and talent.
When the content tag is "song", the segment is a song video segment; when the content tag is "skit", the segment is a skit video segment; when the content tag is "message", the segment is a message video segment; when the content tag is "gift", the segment is a gift-giving video segment; when the content tag is "game win", the segment is a game-win video segment; and so on.
Step 304, determining a selection weight corresponding to the ith video segment according to the tag information of the ith video segment, wherein i is a positive integer less than or equal to n.
Then, the selection weight corresponding to each segment can be determined according to the label information of each segment. The selection weight is used to indicate the selection priority of the video segment.
Optionally, the tag information may include k tags, where k is an integer greater than 1. In addition to the content tag, the k tags include at least one of the following: a time-period tag, a usage tag, a time tag, and a duration tag. The time-period tag indicates the phase of the recorded video in which the segment falls; for example, the time-period tag may include at least one of: an opening phase, a warm-up phase, a plateau phase, a climax phase, and a closing phase. The usage tag indicates the probability that the segment will be used to generate the playback video; for example, a usage tag of 100% means the segment is certain to be used to generate the playback video, while a usage tag of 50% means the segment has a 50% probability of being used. The time tag indicates when the segment ends in the recorded video; for example, a time tag of 00:20:12 means the segment ends at 00:20:12 of the recorded video. The duration tag indicates the length of the segment, for example 10 minutes, 5 minutes, or 2 minutes.
In this case, determining the selection weight corresponding to the ith video segment according to the tag information of the ith video segment includes the following steps:
(1) Acquire the weight scores respectively corresponding to the k tags of the ith video segment.
Since the ith video segment carries k tags, a weight score can be obtained for each of them.
(2) Determine the selection weight corresponding to the ith video segment according to the weight scores respectively corresponding to the k tags.
After the weight score of each tag is obtained, the selection weight corresponding to the ith video segment can be derived from these scores.
Optionally, the weight scores corresponding to the k tags may be summed directly to give the selection weight of the ith video segment; or they may be combined by weighted summation; or they may be summed and averaged. The embodiments of the present application do not limit the specific manner of determining the selection weight of the ith video segment from the weight scores of the k tags.
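The three combination schemes above can be written down directly. The sketch below is illustrative only; the patent does not fix the scheme or any concrete per-tag weights.

    from typing import Dict, Optional

    def selection_weight(tag_scores: Dict[str, float],
                         tag_weights: Optional[Dict[str, float]] = None,
                         mode: str = "sum") -> float:
        # tag_scores maps each of the k tags of the i-th segment to its weight score.
        if mode == "sum":        # direct summation
            return sum(tag_scores.values())
        if mode == "weighted":   # weighted summation, with assumed per-tag weights
            return sum(tag_weights[tag] * score for tag, score in tag_scores.items())
        if mode == "mean":       # summation followed by averaging
            return sum(tag_scores.values()) / len(tag_scores)
        raise ValueError(f"unknown mode: {mode}")

    scores = {"content": 8.0, "time_period": 3.0, "duration": 1.0}
    print(selection_weight(scores))               # 12.0
    print(selection_weight(scores, mode="mean"))  # 4.0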
Optionally, in step (1) above, acquiring the weight scores respectively corresponding to the k tags of the ith video segment may include the following steps:
<1> For the content tag, acquire the virtual item information corresponding to the ith video segment.
The virtual item information indicates the statistics of the virtual items that the anchor user received while the ith video segment was being broadcast live.
Optionally, the statistics may be quantities or resource values (e.g., money amounts, or other virtual resources representing value such as coins or gold bullion).
A virtual item is an item that a viewer watching the live video gives to the anchor; each virtual item has an item type and an item quantity. The item type includes, but is not limited to, at least one of: virtual rose, virtual cake, virtual sports car, virtual airplane, virtual star, virtual light, virtual ring, virtual watch, virtual high-heeled shoe, virtual heart, and the like. Optionally, the item quantity is counted in numbers, or in converted virtual unit values.
For example, a viewer may send the anchor the virtual item "ninety-nine roses", whose item type is "virtual rose" and whose item quantity is 99, as a gift to the anchor. For another example, a viewer sends the virtual item "one sports car", whose item type is "virtual sports car" and whose item quantity is 1. If the virtual unit value is a point score, with one rose worth two points and one sports car worth fifty points, the item quantity can also be calculated from the converted points.
<2> Determine the weight score corresponding to the content tag of the ith video segment according to the virtual item information corresponding to the ith video segment.
After the virtual item information corresponding to the ith video segment is acquired, the weight score corresponding to the content tag of the ith video segment may be determined based on the virtual item information corresponding to the ith video segment.
It should be noted that when k = 1, the ith video segment has only one tag, namely the content tag. In this case, the virtual item information corresponding to the ith video segment can likewise be acquired, and the weight score corresponding to the content tag determined from it.
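Using the rose and sports-car example above (one rose worth two points, one sports car worth fifty), the weight score of a content tag could be derived from the converted point total of the virtual items received during the segment. This is one possible reading; the conversion table and the direct mapping from points to score are assumptions.

    from typing import Dict

    # Assumed virtual unit values, following the example in the text.
    UNIT_VALUE = {"virtual_rose": 2, "virtual_sports_car": 50}

    def content_tag_score(item_counts: Dict[str, int]) -> float:
        # item_counts maps item type -> quantity received while the segment was
        # live; the score is the total of the converted virtual unit values.
        return float(sum(UNIT_VALUE.get(item, 1) * n for item, n in item_counts.items()))

    # 99 roses and 1 sports car received during the i-th segment:
    print(content_tag_score({"virtual_rose": 99, "virtual_sports_car": 1}))  # 248.0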
When the recorded video includes only video segments associated with the anchor, the following step 305 may be performed after step 304.
Step 305, selecting m video segments from the n video segments in descending order of selection weight.
After the selection weights are determined, the n video segments can be sorted by selection weight in descending order and the top m segments selected.
When the recorded video includes both video segments related to the anchor and video segments related to users, the following steps 306 and 307 may be performed after step 304.
Step 306, obtaining the attribute information corresponding to the n video segments respectively.
The attribute information indicates whether a video segment is an anchor video segment or a user video segment. An anchor video segment is a segment related to the anchor, such as a song video segment, a skit video segment, a talent video segment, or a game-win video segment. A user video segment is a segment related to a user (viewer), such as a message video segment, a mic-link (Lianmai) video segment, or a gift-giving video segment.
Step 307, selecting, from the n video segments according to the selection weights and the selection ratio between anchor video segments and user video segments, at least one video segment belonging to the anchor video segments and at least one video segment belonging to the user video segments, to obtain m video segments.
When the n video segments include both anchor video segments and user video segments, m video segments can further be selected from the n video segments according to the selection weights and the selection ratio between anchor and user video segments, such that the m video segments include at least one anchor video segment and at least one user video segment.
Optionally, the selection ratio may be set according to the actual situation, which is not limited in the embodiments of the present application. For example, anchor video segments and user video segments are selected at a ratio of 4:1. Based on this ratio, the top a anchor video segments and the top b user video segments by selection weight can be selected, with a:b = 4:1.
Optionally, the user video segments may be video segments of a target user, so that the final playback video includes both the anchor and the target user.
Illustratively, the target user may submit his or her face image to the computer device (terminal or server); the computer device can then identify, from the recorded video, the anchor video segments and the user video segments of the target user based on the face image, while user video segments of other users are not identified. The generated playback video thus includes only the anchor video segments and the target user's video segments, realizing customized playback video generation.
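A minimal sketch of the 4:1 selection in steps 306 and 307, assuming both groups have already been sorted by selection weight in descending order; the function name and the divisibility assumption on m are illustrative only.

    from typing import List

    def select_by_ratio(anchor_clips: List, user_clips: List,
                        m: int, ratio: int = 4) -> List:
        # Take the top a anchor segments and top b user segments by selection
        # weight, with a : b = ratio : 1. Assumes m is divisible by ratio + 1
        # and that each list holds enough segments.
        b = m // (ratio + 1)
        a = m - b
        return anchor_clips[:a] + user_clips[:b]

    print(select_by_ratio(["a1", "a2", "a3", "a4", "a5"], ["u1", "u2"], m=5))
    # ['a1', 'a2', 'a3', 'a4', 'u1']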
Step 308, clipping one video sub-segment from each of the m video segments to obtain m video sub-segments.
After the m video segments are obtained, one video sub-segment can be clipped from each of them, yielding m video sub-segments.
Optionally, the duration of each video sub-segment may be determined by the set duration of the playback video to be generated. For example, if the set playback duration is 3 minutes, each video sub-segment may be 10 s long; if the set duration is 4 minutes, each sub-segment may be 20 s; and if the set duration is 5 minutes, each sub-segment may be 30 s.
Step 309, splicing the m video sub-segments to generate a playback video of the live video.
After the m video sub-segments are obtained, they can be spliced in the chronological order of the m video segments in the recorded video, thereby generating the playback video of the live video.
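Steps 308 and 309 can be sketched as follows. Where within each segment the sub-segment is cut is not fixed by the text, so taking it from the head of the segment is an assumption, as is the mapping table from playback duration to sub-segment length (which follows the 3/4/5-minute examples above).

    from typing import List, Tuple

    # Assumed mapping from the set playback duration to sub-segment length
    # (3 min -> 10 s, 4 min -> 20 s, 5 min -> 30 s, as in the examples above).
    SUB_SEGMENT_SECONDS = {3: 10, 4: 20, 5: 30}

    def clip_and_splice(segments: List[Tuple[float, float]],
                        playback_minutes: int) -> List[Tuple[float, float]]:
        # segments holds the (start, end) times of the m selected segments.
        sub_len = SUB_SEGMENT_SECONDS[playback_minutes]
        ordered = sorted(segments)  # chronological order in the recorded video
        # Cut one sub-segment from the head of each segment; the resulting
        # (start, end) windows are concatenated to form the playback video.
        return [(start, min(end, start + sub_len)) for start, end in ordered]

    print(clip_and_splice([(60.0, 180.0), (0.0, 60.0)], playback_minutes=3))
    # [(0.0, 10.0), (60.0, 70.0)]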
Optionally, the splicing the m video sub-segments to generate a playback video of the live video may include the following steps:
(1) selecting at least one transition animation from a transition animation library;
(2) splicing the m video sub-segments and the at least one transition animation to generate the playback video of the live video.
A transition animation connects two adjacent video sub-segments; using transition animations from the transition animation library makes the spliced playback video smoother.
Optionally, the transition animation library may include at least one of the following transition animations: fade-in, fade-out, center wipe, dissolve, zoom, signal-interference effect, and card transition. A fade-in gradually brightens the picture from dark blur to full clarity, marking the beginning of a passage in the unfolding of events; it can be used before the first of the m video sub-segments ordered chronologically. A fade-out gradually dims the picture from full clarity until it disappears, marking the end of a passage; it can be used after the last of the m video sub-segments ordered chronologically.
Optionally, each transition animation corresponds to a usage weight indicating its usage priority. The transitions used between the remaining sub-segments (those other than the first and the last) can be chosen according to these usage weights.
Illustratively, as shown in FIG. 5, the usage weights of the transition animations may include C-1, C-2, C-3, and C-4, increasing in that order.
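The transition rules above (fade-in first, fade-out last, weighted choice in between) can be sketched as below. The concrete usage weights for C-1 through C-4 are assumptions in the spirit of FIG. 5, and a weighted random choice is only one way of turning usage priority into a selection.

    import random
    from typing import List

    # Assumed usage weights, increasing in the order C-1, C-2, C-3, C-4.
    TRANSITION_WEIGHTS = {"C-1": 1, "C-2": 2, "C-3": 3, "C-4": 4}

    def splice_with_transitions(sub_segments: List[str]) -> List[str]:
        # Fade in before the first sub-segment, fade out after the last one,
        # and pick each interior transition with probability proportional to
        # its usage weight.
        timeline = ["fade_in", sub_segments[0]]
        names = list(TRANSITION_WEIGHTS)
        weights = list(TRANSITION_WEIGHTS.values())
        for clip in sub_segments[1:]:
            timeline += [random.choices(names, weights=weights)[0], clip]
        timeline.append("fade_out")
        return timeline

    print(splice_with_transitions(["sub1", "sub2", "sub3"]))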
In summary, in the technical solution provided by this embodiment of the present application, content recognition is performed on the recorded video of a live video to obtain multiple video segments, and some of them are selected, according to their respective selection weights, to generate the playback video of the live video. In the related art, the playback video of a live video is as long as the live broadcast itself, so watching it consumes considerable data traffic and time. By selecting only some segments of the live video to generate its playback video, the technical solution provided by this embodiment shortens the playback video, reducing both the data traffic consumed in watching it and the time cost.
In addition, because the playback video highlights the selected content of the recorded video, the value of the played-back content is greatly improved, users' attachment to the playback function is strengthened, the distance between the anchor and users is shortened, and user activity is maintained.
In addition, when the recorded video includes both anchor video segments and user video segments, the selection can be performed at a set ratio, ensuring that the finally generated playback video also includes both anchor and user video segments and is therefore closer to the recorded video.
In addition, when the playback video is generated by splicing video sub-segments, using transition animations from the transition animation library makes the spliced playback video smoother.
Referring to fig. 6, a flowchart of a video playing method according to an embodiment of the present application is shown. In the present embodiment, the method is mainly applied to the terminal in the implementation environment shown in fig. 1 for illustration. The method may include the steps of:
step 601, displaying a focus interface of the video live broadcast application.
A target application can be installed on the terminal; the target application has a video playing function and is used to play videos recorded by an anchor in a live broadcast room. The target application may be a live video application, a social application client, an instant messaging application client, or the like, which is not limited in the embodiments of the present application.
The terminal can display the focus interface of the live video application. The focus interface is a user interface for displaying updates from the anchor user accounts followed in the live video application.
Step 602, displaying information items of at least one anchor user account focused by the target user account in the focus interface.
The target user account is the user account logged in to the target application. Through the target user account the user may follow at least one anchor user account, so that information entries of the followed anchor user accounts can be displayed in the focus interface.
An information entry is information published by an anchor user account; it may be a video, text, or a combination of both, which is not limited in the embodiments of the present application. The video may be a recorded live video published by the anchor user account.
Illustratively, FIG. 7 shows a schematic view of one focus interface. The focus interface 70 displays information entries of two anchor user accounts followed by the target user account: one anchor user account is "SpongeBob", whose information entry is "video A"; the other is "Patrick Star", whose information entry is "video B".
Step 603, receiving a playback video playing instruction in the information entry corresponding to the target anchor user account.
Thereafter, the terminal may receive a playback video playing instruction in the information entry corresponding to the target anchor user account; the instruction instructs the terminal to play the playback video in the information entry of the target anchor user account.
Step 604, in response to the playback video playing instruction, playing the playback video of the target anchor user account; the playback video is part of the video content extracted from the recorded video of a live video of the target anchor user account.
As to how to obtain the playback video of the recorded video, reference may be made to the description of the embodiments in fig. 2 and fig. 3, and details are not repeated here.
Optionally, the focus interface further includes a playback video setting control, where the playback video setting control is used to trigger display of the playback video setting interface.
In this case, the terminal may further perform the following steps:
(1) a trigger signal corresponding to a playback video setting control in the focus interface is received.
The user can click the playback video setting control in the above-mentioned focus interface, and correspondingly, the terminal can receive a trigger signal corresponding to the playback video setting control.
(2) In response to the trigger signal, a playback video setting interface is displayed.
After receiving the trigger signal, the terminal may display a playback video setting interface, where the playback video setting interface is used to set a playback video.
Optionally, the playback video setting interface may be displayed as a pop-up window over the focus interface. In some other embodiments, it may also be displayed in other forms, which is not limited in the embodiments of the present application.
The playback video setting interface comprises a playback opening control and a duration setting control, the playback opening control is used for opening the playback video, and the duration setting control is used for setting the duration of the playback video.
Optionally, the duration setting control may include a plurality of controls, and each of the duration setting controls is configured to set the playback duration to be one duration.
For example, the duration setting control includes 3 controls, such as "3 minutes", "4 minutes", and "5 minutes", wherein if the user selects the duration setting control "3 minutes", the playback videos displayed in the above-mentioned focus interface are all 3 minutes; if the user selects a time length setting control of 4 minutes, the playback videos displayed in the attention interface are all 4 minutes; if the user selects the time length setting control of "5 minutes", the playback videos displayed in the above-mentioned focus interface are all 5 minutes.
Illustratively, as shown in fig. 8, the focus interface 70 further includes a playback video setting control 71, such as "smart playback". The user may tap the playback video setting control 71; correspondingly, the terminal receives the trigger signal corresponding to the control 71 and displays a playback video setting interface 72. The playback video setting interface 72 includes a playback start control 73 and a duration setting control 74; the user may enable the playback video function through the playback start control 73 and set the duration of the playback video through the duration setting control 74. The playback video 75 can then be displayed in the focus interface 70.
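The setting flow of FIG. 8 amounts to a small piece of client state; a minimal sketch follows, with all names assumed.

    from dataclasses import dataclass

    @dataclass
    class PlaybackSettings:
        enabled: bool = False       # toggled by the playback start control (73)
        duration_minutes: int = 3   # set by the duration setting control (74)

    def on_duration_selected(settings: PlaybackSettings, minutes: int) -> None:
        # Every playback video shown in the focus interface then uses this length.
        if minutes not in (3, 4, 5):
            raise ValueError("unsupported playback duration")
        settings.duration_minutes = minutes

    settings = PlaybackSettings(enabled=True)
    on_duration_selected(settings, 4)
    print(settings)  # PlaybackSettings(enabled=True, duration_minutes=4)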
To sum up, in the technical solution provided by this embodiment of the present application, after a playback video playing instruction is received in the information entry corresponding to a target anchor user account, the playback video of the target anchor user account is played, where the playback video is part of the video content extracted from the recorded video of a live video of that account. In the related art, the playback video of a live video is as long as the live broadcast itself, so watching it consumes considerable data traffic and time. By selecting only some segments of the live video to generate its playback video, the technical solution provided by this embodiment shortens the playback video, reducing both the data traffic consumed in watching it and the time cost.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 9, a block diagram of a video processing apparatus according to an embodiment of the present application is shown. The apparatus has functions of implementing the above video processing method examples; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be disposed on the computer device. The apparatus 900 may include: a video acquisition module 910, a content identification module 920, a weight determination module 930, a segment selection module 940, and a video generation module 950.
The video obtaining module 910 is configured to obtain a recorded video of a live video.
A content identification module 920, configured to perform content identification on the recorded video to obtain n video segments, where n is a positive integer.
A weight determining module 930, configured to determine selection weights corresponding to the n video segments, where the selection weights are used to indicate selection priorities of the video segments.
A segment selecting module 940, configured to select m video segments from the n video segments according to the selection weights corresponding to the n video segments, where m is a positive integer smaller than or equal to n.
A video generating module 950, configured to generate a playback video of the live video according to the m video segments.
In summary, in the technical solution provided by this embodiment of the present application, content recognition is performed on the recorded video of a live video to obtain multiple video segments, and some of them are selected, according to their respective selection weights, to generate the playback video of the live video. In the related art, the playback video of a live video is as long as the live broadcast itself, so watching it consumes considerable data traffic and time. By selecting only some segments of the live video to generate its playback video, the technical solution provided by this embodiment shortens the playback video, reducing both the data traffic consumed in watching it and the time cost.
In some possible designs, as shown in fig. 10, the weight determining module 930 includes: an information acquisition unit 931 and a weight determination unit 932.
An information obtaining unit 931, configured to obtain, for an ith video segment of the n video segments, tag information of the ith video segment.
A weight determining unit 932, configured to determine a selection weight corresponding to the ith video segment according to the tag information of the ith video segment, where i is a positive integer less than or equal to n.
In some possible designs, the tag information includes k tags, where k is an integer greater than 1;
the weight determining unit 932 is configured to obtain weight scores corresponding to the k labels of the ith video segment respectively; and determining the selection weight corresponding to the ith video segment according to the weight scores corresponding to the k labels respectively.
In some possible designs, the tag information includes a content tag; the weight determining unit 932 is configured to acquire, for the content tag, virtual item information corresponding to the ith video segment, where the virtual item information indicates the statistics of the virtual items received by the anchor user while the ith video segment was being broadcast live; and determine the weight score corresponding to the content tag of the ith video segment according to the virtual item information corresponding to the ith video segment.
In some possible designs, the segment selection module 940 is configured to select the m video segments from the n video segments in descending order of selection weight.
In some possible designs, as shown in fig. 10, the segment selection module 940 includes: an attribute acquisition unit 941 and a segment selection unit 942.
An attribute acquisition unit 941, configured to acquire attribute information respectively corresponding to the n video segments, where the attribute information indicates whether a video segment is an anchor video segment or a user video segment.
A segment selection unit 942, configured to select, from the n video segments according to the selection weights and the selection ratio between anchor video segments and user video segments, at least one video segment belonging to the anchor video segments and at least one video segment belonging to the user video segments, to obtain the m video segments.
In some possible designs, as shown in fig. 10, the video generation module 950 includes a segment clipping unit 951 and a video generating unit 952.
The segment clipping unit 951 is configured to clip one video sub-segment from each of the m video segments, so as to obtain m video sub-segments.
The video generating unit 952 is configured to splice the m video sub-segments to generate the playback video of the live video.
In some possible designs, the video generating unit 952 is configured to select at least one transition animation from a transition animation library, and splice the m video sub-segments with the at least one transition animation to generate the playback video of the live video, where each transition animation connects two adjacent video sub-segments.
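One way to sketch the clip-and-splice step is with the moviepy library (moviepy 1.x API); the file paths, cut points, and the single transition clip below are placeholders, not values from the patent:

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

recording = VideoFileClip("recorded_live.mp4")
transition = VideoFileClip("transition.mp4")  # one animation from the library

# One sub-segment clipped from each selected segment (times in seconds).
subsegments = [recording.subclip(120, 150), recording.subclip(900, 940)]

# Interleave the transition animation between adjacent sub-segments.
parts = []
for i, clip in enumerate(subsegments):
    if i > 0:
        parts.append(transition)
    parts.append(clip)

concatenate_videoclips(parts, method="compose").write_videofile("playback.mp4")
```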
In some possible designs, the content identification module 920 is configured to: identify the start time and the end time of a song in the recorded video, and extract the video segment between them to obtain a song video segment; or identify the start time and the end time of applause in the recorded video, and extract the video segment between them to obtain an applause video segment; or identify the start time and the end time of a message interaction in the recorded video, and extract the video segment between them to obtain a message video segment; or identify the start time and the end time of a mic-link (co-streaming) session in the recorded video, and extract the video segment between them to obtain a mic-link video segment.
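Assuming upstream detectors emit (label, start, end) intervals for songs, applause, messages, and mic-link sessions, segment extraction reduces to slicing the recording at those boundaries; the interval values and all names below are illustrative assumptions:

```python
detected = [
    ("song", 120.0, 300.0),
    ("applause", 305.0, 320.0),
    ("mic_link", 900.0, 1500.0),
]

def extract_segments(recording_path: str, intervals):
    """Turn each detected (label, start, end) interval into one video segment record."""
    return [{"label": label, "start": start, "end": end, "source": recording_path}
            for label, start, end in intervals]

segments = extract_segments("recorded_live.mp4", detected)  # -> n video segments
```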
Referring to fig. 11, a block diagram of a video playing apparatus according to an embodiment of the present application is shown. The apparatus has the function of implementing the above examples of the video playing method; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the terminal described above, or may be provided on the terminal. The apparatus 1100 may include: an interface display module 1110, an entry display module 1120, an instruction receiving module 1130, and a video playing module 1140.
The interface display module 1110 is configured to display a follow interface of a live streaming application.
The entry display module 1120 is configured to display, in the follow interface, information entries of at least one anchor user account followed by the target user account.
The instruction receiving module 1130 is configured to receive a playback video playing instruction in an information entry corresponding to a target anchor user account.
The video playing module 1140 is configured to play the playback video of the target anchor user account in response to the playback video playing instruction, where the playback video is a portion of video content extracted from the recorded video of a live video of the target anchor user account.
To sum up, in the technical solution provided by the embodiments of the present application, after a playback video playing instruction is received in the information entry corresponding to a target anchor user account, the playback video of the target anchor user account is played, where the playback video is a portion of video content extracted from the recorded video of a live video of that account. In the related art, the playback video of a live video is long, so watching it consumes a large amount of data traffic and time. In the technical solution provided by the embodiments of the present application, only some segments of the live video are selected to generate the playback video, which shortens the duration of the playback video and reduces both the data traffic consumed and the time cost of watching it.
In some possible designs, as shown in fig. 12, the apparatus further includes: a signal receiving module 1150 and a setting display module 1160.
The signal receiving module 1150 is configured to receive a trigger signal corresponding to a playback video setting control in the follow interface.
The setting display module 1160 is configured to display a playback video setting interface in response to the trigger signal, where the playback video setting interface includes a playback start control used to enable the playback video function, and a duration setting control used to set the duration of the playback video.
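A minimal sketch of the state behind such a setting interface; the class, field names, and default values are assumptions introduced for illustration:

```python
from dataclasses import dataclass

@dataclass
class PlaybackSettings:
    enabled: bool = False        # playback start control: generate a playback video?
    duration_seconds: int = 180  # duration setting control: target playback length

# A user turns the feature on and asks for a two-minute playback video.
settings = PlaybackSettings(enabled=True, duration_seconds=120)
```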
It should be noted that, when the apparatus provided in the foregoing embodiments implements its functions, the division into the above functional modules is merely used as an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for details of their specific implementation, refer to the method embodiments, which are not repeated here.
Referring to fig. 13, a block diagram of a terminal according to an embodiment of the present application is shown. In general, terminal 1300 includes: a processor 1301 and a memory 1302.
The processor 1301 may include one or more processing cores, for example, a 4-core processor or an 8-core processor. The processor 1301 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1301 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1301 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1302 may include one or more computer-readable storage media, which may be non-transitory. The memory 1302 may also include a high-speed random access memory and a non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1302 is used to store at least one instruction, at least one program, a code set, or an instruction set, which is executed by the processor 1301 to implement the video processing method or the video playing method provided by the method embodiments of the present application.
In some embodiments, terminal 1300 may further optionally include: a peripheral interface 1303 and at least one peripheral. Processor 1301, memory 1302, and peripheral interface 1303 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral device interface 1303 via a bus, signal line, or circuit board. Specifically, the peripheral device may include: at least one of a communication interface 1304, a display screen 1305, an audio circuit 1306, a camera assembly 1307, a positioning assembly 1308, and a power supply 1309.
Those skilled in the art will appreciate that the configuration shown in fig. 13 is not intended to be limiting with respect to terminal 1300 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Referring to fig. 14, a schematic structural diagram of a server according to an embodiment of the present application is shown. Specifically:
the server 1400 includes a CPU (Central Processing Unit) 1401, a system memory 1404 including a RAM (Random Access Memory) 1402 and a ROM (Read-Only Memory) 1403, and a system bus 1405 connecting the system memory 1404 and the central processing unit 1401. The server 1400 also includes a basic I/O (Input/Output) system 1406 that facilitates the transfer of information between devices within the computer, and a mass storage device 1407 for storing an operating system 1413, application programs 1414, and other program modules 1415.
The basic input/output system 1406 includes a display 1408 for displaying information and an input device 1409, such as a mouse, keyboard, etc., for user input of information. Wherein the display 1408 and input device 1409 are both connected to the central processing unit 1401 via an input-output controller 1410 connected to the system bus 1405. The basic input/output system 1406 may also include an input/output controller 1410 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1410 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1407 is connected to the central processing unit 1401 through a mass storage controller (not shown) connected to the system bus 1405. The mass storage device 1407 and its associated computer-readable media provide non-volatile storage for the server 1400. That is, the mass storage device 1407 may include a computer readable medium (not shown) such as a hard disk or CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), flash memory or other solid-state memory technologies, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 1404 and the mass storage device 1407 described above may be collectively referred to as memory.
The server 1400 may also operate in accordance with various embodiments of the present application by connecting to remote computers over a network, such as the internet. That is, the server 1400 may be connected to the network 1412 through the network interface unit 1411 coupled to the system bus 1405, or the network interface unit 1411 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes at least one instruction, at least one program, set of codes, or set of instructions stored in the memory and configured to be executed by one or more processors to implement the video processing method described above.
In an exemplary embodiment, a computer device is also provided. The computer device may be a terminal or a server. The computer device comprises a processor and a memory, wherein at least one instruction, at least one program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to realize the video processing method or realize the video playing method.
In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which when executed by a processor implements the above-mentioned video processing method, or implements the above-mentioned video playing method.
In an exemplary embodiment, a computer program product is also provided; when executed by a processor, the computer program product implements the above video processing method or the above video playing method.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of video processing, the method comprising:
acquiring a recorded video of a live video;
performing content identification on the recorded video to obtain n video segments, wherein n is a positive integer;
determining selection weights corresponding to the n video segments respectively, wherein the selection weights are used for indicating the selection priority of the video segments;
selecting m video segments from the n video segments according to the selection weights corresponding to the n video segments respectively, wherein m is a positive integer less than or equal to n;
and generating the playback video of the live video according to the m video clips.
2. The method according to claim 1, wherein the determining the selection weights corresponding to the n video segments comprises:
for the ith video segment in the n video segments, acquiring tag information of the ith video segment;
and determining the selection weight corresponding to the ith video segment according to the tag information of the ith video segment, wherein i is a positive integer less than or equal to n.
3. The method of claim 2, wherein the tag information comprises k tags, wherein k is an integer greater than 1;
the determining the selection weight corresponding to the ith video segment according to the tag information of the ith video segment comprises:
acquiring weight scores respectively corresponding to the k tags of the ith video segment;
and determining the selection weight corresponding to the ith video segment according to the weight scores respectively corresponding to the k tags.
4. The method of claim 3, wherein the tag information comprises a content tag;
the acquiring weight scores respectively corresponding to the k tags of the ith video segment comprises:
for the content tag, acquiring virtual item information corresponding to the ith video segment, wherein the virtual item information is used for indicating a statistical result of virtual items received by an anchor user during the live broadcast of the ith video segment;
and determining the weight score corresponding to the content tag of the ith video segment according to the virtual item information corresponding to the ith video segment.
5. The method according to claim 1, wherein said selecting m video segments from the n video segments according to the selection weights respectively corresponding to the n video segments comprises:
and selecting the m video segments from the n video segments in descending order of the selection weights.
6. The method according to claim 1, wherein said selecting m video segments from the n video segments according to the selection weights respectively corresponding to the n video segments comprises:
acquiring attribute information corresponding to the n video segments respectively, wherein the attribute information is used for indicating whether a video segment belongs to an anchor video segment or a user video segment;
and according to a selection ratio between anchor video segments and user video segments, selecting, according to the selection weights, at least one video segment belonging to the anchor video segments and at least one video segment belonging to the user video segments from the n video segments, so as to obtain the m video segments.
7. The method of claim 1, wherein the generating the playback video of the live video from the m video segments comprises:
respectively clipping one video sub-segment from each of the m video segments to obtain m video sub-segments;
and splicing the m video sub-segments to generate a playback video of the live video.
8. The method of claim 7, wherein the splicing the m video sub-segments to generate the playback video of the live video comprises:
selecting at least one transition animation from a transition animation library;
splicing the m video sub-segments and the at least one transition animation to generate a playback video of the live video;
wherein the transition animation is used for connecting two adjacent video sub-segments.
9. The method according to any one of claims 1 to 8, wherein the content recognition of the recorded video to obtain n video segments comprises:
identifying the starting time and the ending time of a song from the recorded video, and extracting a video segment between the starting time and the ending time of the song to obtain a song video segment;
or,
identifying the starting time and the ending time of applause from the recorded video, and extracting a video segment between the starting time and the ending time of the applause to obtain an applause video segment;
or,
identifying the starting time and the ending time of a message from the recorded video, and extracting a video segment between the starting time and the ending time of the message to obtain a message video segment;
or,
and identifying the starting time and the ending time of a mic-link (co-streaming) session from the recorded video, and extracting the video segment between the starting time and the ending time of the mic-link session to obtain a mic-link video segment.
10. A video playing method, the method comprising:
displaying a follow interface of a live streaming application;
displaying, in the follow interface, information entries of at least one anchor user account followed by a target user account;
receiving a playback video playing instruction in an information entry corresponding to a target anchor user account;
responding to the playback video playing instruction, and playing the playback video of the target anchor user account;
wherein the playback video is a portion of video content extracted from a recorded video of a live video of the target anchor user account.
11. The method of claim 10, further comprising:
receiving a trigger signal corresponding to a playback video setting control in the follow interface;
displaying a playback video setting interface in response to the trigger signal;
wherein the playback video setting interface comprises a playback start control and a duration setting control, the playback start control is used for enabling the playback video function, and the duration setting control is used for setting the duration of the playback video.
12. A video processing apparatus, characterized in that the apparatus comprises:
the video acquisition module is used for acquiring a recorded video of a live video;
the content identification module is used for carrying out content identification on the recorded video to obtain n video segments, wherein n is a positive integer;
the weight determining module is used for determining selection weights corresponding to the n video segments respectively, and the selection weights are used for indicating the selection priority of the video segments;
a segment selection module, configured to select m video segments from the n video segments according to selection weights corresponding to the n video segments, where m is a positive integer smaller than or equal to n;
and the video generation module is used for generating the playback video of the live video according to the m video clips.
13. A video playing apparatus, the apparatus comprising:
the interface display module is used for displaying a follow interface of a live streaming application;
the entry display module is used for displaying, in the follow interface, information entries of at least one anchor user account followed by a target user account;
the instruction receiving module is used for receiving a playback video playing instruction in an information entry corresponding to a target anchor user account;
the video playing module is used for responding to the playback video playing instruction and playing the playback video of the target anchor user account;
wherein the playback video is a portion of video content extracted from a recorded video of a live video of the target anchor user account.
14. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of any one of claims 1 to 9 or to implement the method of claim 10 or 11.
15. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of any one of claims 1 to 9 or to implement the method of claim 10 or 11.
CN201911389183.XA 2019-12-30 2019-12-30 Video processing method, video playing method, device, equipment and storage medium Active CN111182358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911389183.XA CN111182358B (en) 2019-12-30 2019-12-30 Video processing method, video playing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111182358A true CN111182358A (en) 2020-05-19
CN111182358B CN111182358B (en) 2021-09-28

Family

ID=70650451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911389183.XA Active CN111182358B (en) 2019-12-30 2019-12-30 Video processing method, video playing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111182358B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10313721B1 (en) * 2016-06-22 2019-06-04 Amazon Technologies, Inc. Live streaming media content using on-demand manifests
CN108062409A (en) * 2017-12-29 2018-05-22 北京奇艺世纪科技有限公司 Generation method, device and the electronic equipment of live video summary
CN109104642A (en) * 2018-09-26 2018-12-28 北京搜狗科技发展有限公司 A kind of video generation method and device

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112543342B (en) * 2020-11-26 2023-03-14 腾讯科技(深圳)有限公司 Virtual video live broadcast processing method and device, storage medium and electronic equipment
CN112543342A (en) * 2020-11-26 2021-03-23 腾讯科技(深圳)有限公司 Virtual video live broadcast processing method and device, storage medium and electronic equipment
US11991423B2 (en) 2020-11-26 2024-05-21 Tencent Technology (Shenzhen) Company Limited Virtual video live streaming processing method and apparatus, storage medium and electronic device
WO2022111110A1 (en) * 2020-11-26 2022-06-02 腾讯科技(深圳)有限公司 Virtual video livestreaming processing method and apparatus, storage medium, and electronic device
CN112929736A (en) * 2021-01-22 2021-06-08 宁波方太厨具有限公司 Intelligent cooking video generation method, electronic equipment and readable storage medium
CN113824972A (en) * 2021-05-31 2021-12-21 腾讯科技(深圳)有限公司 Live video processing method, device and equipment and computer readable storage medium
CN113824972B (en) * 2021-05-31 2024-01-09 深圳市雅阅科技有限公司 Live video processing method, device, equipment and computer readable storage medium
CN114095772A (en) * 2021-12-08 2022-02-25 广州方硅信息技术有限公司 Virtual object display method and system under live microphone connection and computer equipment
CN114095772B (en) * 2021-12-08 2024-03-12 广州方硅信息技术有限公司 Virtual object display method, system and computer equipment under continuous wheat direct sowing
CN114245171A (en) * 2021-12-15 2022-03-25 百度在线网络技术(北京)有限公司 Video editing method, video editing device, electronic equipment and media
CN114245171B (en) * 2021-12-15 2023-08-29 百度在线网络技术(北京)有限公司 Video editing method and device, electronic equipment and medium
CN114390368B (en) * 2021-12-29 2022-12-16 腾讯科技(深圳)有限公司 Live video data processing method and device, equipment and readable medium
CN114390368A (en) * 2021-12-29 2022-04-22 腾讯科技(深圳)有限公司 Live video data processing method and device, equipment and readable medium
CN114466201A (en) * 2022-02-21 2022-05-10 上海哔哩哔哩科技有限公司 Live stream processing method and device
CN114466201B (en) * 2022-02-21 2024-03-19 上海哔哩哔哩科技有限公司 Live stream processing method and device
WO2024140231A1 (en) * 2022-12-30 2024-07-04 北京字跳网络技术有限公司 Method and apparatus for generating livestream playback, device and storage medium

Also Published As

Publication number Publication date
CN111182358B (en) 2021-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant