WO2020199303A1 - Method, Apparatus, Server and Storage Medium for Generating Live Video Highlights (直播视频集锦的生成方法、装置、服务器及存储介质) - Google Patents

Method, apparatus, server and storage medium for generating live video highlights (直播视频集锦的生成方法、装置、服务器及存储介质)

Info

Publication number
WO2020199303A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
live video
live
highlight
highlights
Prior art date
Application number
PCT/CN2019/086049
Other languages
English (en)
French (fr)
Inventor
郑振贵
孙磊
陈祥祥
Original Assignee
网宿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网宿科技股份有限公司
Priority to EP19765403.1A (published as EP3739888A1)
Priority to US16/569,866 (published as US11025964B2)
Publication of WO2020199303A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8549 Creating video summaries, e.g. movie trailer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B 27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B 27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2183 Cache memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • The embodiments of the present application relate to the field of video processing, and in particular to a method, apparatus, server, and storage medium for generating live video highlights.
  • Online live video streaming is a broadcast service built on network resources: video is captured and uploaded to the network simultaneously, so that users can watch first-hand live content on the network as it happens.
  • This online live broadcast service is widely used in webcast scenarios such as real-time press conferences, exhibitions, product launches, product introductions, live sales shows, online concerts, company receptions, business meetings, celebrations, performances, sports games, gaming competitions, securities analysis, and distance education.
  • During a live broadcast, the audience usually takes a strong interest in the highlight moments of the live video. To let the audience watch these highlights repeatedly, the highlights are usually edited manually offline after the live broadcast ends and then published as video-on-demand content for viewers to find and watch.
  • However, because offline editing is performed only after the live broadcast ends, users can watch the highlights on demand only after the broadcast is over, which degrades the viewing experience.
  • In addition, generating edited videos requires a large amount of human labor, which is inefficient and cannot keep pace with the booming live-streaming industry.
  • The purpose of some embodiments of this application is to provide a method, apparatus, server, and storage medium for generating live video highlights, so that users can watch highlight replays of the live video while watching the live broadcast, and so that highlight editing is automated, improving editing efficiency.
  • An embodiment of the present application provides a method for generating live video highlights, including: recognizing the live frames of the live video and determining whether a target image element is present in a live frame; if the target image element is present in a live frame, saving the segment of the live video containing that frame as a highlight segment; when a synthesis condition is met, merging the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video; and switching the output live video to the highlight composite video.
  • An embodiment of the present application also provides an apparatus for generating live video highlights, including: a first identification module, a storage module, a second identification module, a synthesis module, and an output module;
  • the first identification module is configured to recognize the live frames of the live video and determine whether a target image element is present in a live frame;
  • the storage module is configured to save, when a target image element is recognized in a live frame, the segment of the live video containing that frame as a highlight segment;
  • the second identification module is configured to identify whether the synthesis condition is met;
  • the synthesis module is configured to merge, when the synthesis condition is met, the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video;
  • the output module is configured to switch the output live video to the highlight composite video.
  • An embodiment of the present application also provides a server, including: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above-mentioned method for generating live video highlights.
  • An embodiment of the present application also provides a storage medium storing a computer program; when the computer program is executed by a processor, the above-mentioned method for generating live video highlights is implemented.
  • In the embodiments of the present application, the server recognizes whether a target image element is present in a live frame.
  • When the target image element is recognized, the live frame has the characteristics of a highlight moment, and the live segment containing that frame is determined to be a highlight segment. When the server recognizes that the synthesis condition is met, the recognized and saved highlight segment is synthesized into the live video to obtain the highlight composite video, and the highlight composite video is output, so that users can watch highlight replays of the live video while watching the live broadcast, which improves the timeliness of highlight playback. Moreover, no manual identification of highlights is needed during editing, which saves considerable human resources and resolves the inefficiency of manual editing.
  • In an example, merging the highlight segment into the live video to obtain the highlight composite video specifically includes: decoding the highlight segment and decoding the live video; merging the decoded highlight segment with the decoded live video to obtain composite data; and re-encoding the composite data to obtain the highlight composite video. The highlight segment is saved as encoded video data, which is compact and easy to store; when the highlight segment needs to be merged with the live video, it is decoded, merged, and then re-encoded. This guarantees a correct merge of the highlight segment and the live video, and the re-encoded data is compressed so that it occupies less storage space, which facilitates transmission of the highlight composite video and improves transmission efficiency.
  • In an example, the composite data includes video composite data and audio composite data. Merging the decoded highlight segment with the decoded live video specifically includes: combining each video frame of the decoded highlight segment with each video frame of the decoded live video in a preset mode to obtain the video composite data; and combining the audio stream of the decoded highlight segment with the audio stream of the decoded live video by mixing to obtain the audio composite data. In this way, the user can hear the audio of the highlight segment and see its picture while watching the live video, so that the user obtains all the information of the played video.
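  • A minimal sketch of this decode-merge-re-encode pipeline, driving the ffmpeg command-line tool from Python: it decodes both inputs, overlays the highlight picture on the live picture, mixes the two audio tracks, and re-encodes the result. The file names, overlay position, and codec choices are assumptions for illustration.

        import subprocess

        def composite_highlight(live_src: str, highlight_src: str, out: str) -> None:
            """Decode, merge, and re-encode: overlay the highlight picture-in-picture
            on the live picture and mix the two audio streams into one track."""
            filter_graph = (
                # Shrink the highlight to a half-size window...
                "[1:v]scale=iw/2:ih/2[pip];"
                # ...and overlay it near the top-left corner of the live picture.
                "[0:v][pip]overlay=10:10[v];"
                # Mix the live audio with the highlight audio.
                "[0:a][1:a]amix=inputs=2:duration=first[a]"
            )
            subprocess.run([
                "ffmpeg", "-y",
                "-i", live_src,          # input 0: live video
                "-i", highlight_src,     # input 1: saved highlight segment
                "-filter_complex", filter_graph,
                "-map", "[v]", "-map", "[a]",
                "-c:v", "libx264", "-c:a", "aac",  # re-encode (and compress)
                out,
            ], check=True)

        composite_highlight("live.ts", "highlight.ts", "highlight_composite.mp4")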
  • In an example, before determining whether a target image element is present in a live frame, the method further includes: caching the live video data packets of the live video in real time. Saving the segment of the live video containing the frame as a highlight segment specifically includes: saving the currently cached live video data packets as the highlight segment. This saves the storage space needed for saving highlight segments.
  • In an example, before saving the cached live video data packets as a highlight segment, the method further includes: determining whether the playing duration of the cached live video data packets exceeds a preset upper limit; and if it does, discarding part of the data of the cached live video data packets. This controls the length of the highlight segment and reduces the memory required to save it. Discarding part of the data also makes the highlight content more prominent, prevents redundant information from consuming the user's attention, and makes the highlight clips easier to watch.
  • In an example, discarding part of the data of the cached live video data packets specifically includes: discarding video frame sequences in the currently cached live video data packets from front to back according to the playback order, until the playing duration of the remaining cached live video data packets does not exceed the preset upper limit. This ensures that the highlight segment stays within the preset duration and its data length stays within the preset upper limit, thereby reducing the memory required to save it.
  • Because the cached live video data packets are discarded from first to last, the timeliness of the highlight segment is guaranteed; and because the highlight segment is shorter after discarding, the subsequent synthesis operations with the live video are reduced, which improves the efficiency of obtaining the highlight composite video.
  • In an example, the live video data packets saved as the highlight segment are specifically cached live video data packets whose first video frame is a key frame.
  • The key frame here refers to a key frame in video encoding. Making the first video frame of the obtained highlight segment a key frame ensures that decoding, synthesis, encoding, compression, and switching of the highlight video proceed normally, and avoids playback problems in the highlight composite video synthesized from the highlight segment.
  • In an example, after it is recognized that the synthesis condition is met, the method further includes: if the number of saved highlight segments is greater than one, splicing the multiple highlight segments into one highlight collection; merging the highlight segment into the live video is then specifically merging the highlight collection into the live video. This lets users watch highlight clips captured at different points in time, gives users continuous visual enjoyment, and improves the viewing experience.
  • In an example, splicing multiple highlight segments into a highlight collection includes: editing each highlight segment; performing timestamp repair on each edited highlight segment; and splicing the timestamp-repaired highlight segments into a highlight collection in timestamp order. This ensures the continuity of the highlight collection and avoids freezes and timestamp rollbacks during playback.
  • In an example, switching the output live video to the highlight composite video includes: determining, according to the timestamp of the live video, whether the switching time point has been reached; and when it has, switching the output video from the live video to the highlight composite video. This ensures that the user does not miss exciting live content, and switches to the highlight composite video while the live broadcast is showing content the user is less interested in, so that the video is switched at an accurate time.
  • In an example, the method further includes: determining the remaining duration of the highlight composite video; and when the remaining duration is zero, switching the output highlight composite video back to the live video. In this way, playback returns to the live video once the highlight composite video finishes.
  • The highlight composite video and the live video thus connect seamlessly during playback, which keeps the user's viewing continuous and improves the viewing experience.
  • In an example, after switching the output live video to the highlight composite video, the method further includes: determining the remaining duration of the highlight composite video; when the remaining duration is zero, determining whether the first video frame of the live video is a key frame; and when the first video frame of the live video is a key frame, switching the output highlight composite video back to the live video. Requiring both that the remaining duration of the highlight composite video is zero and that the first video frame of the live video is a key frame ensures that the switched-to video plays normally, without data loss, stuttering, rollback, or other playback problems, and again guarantees a seamless connection between the highlight composite video and the live video, so the user watches continuously and the viewing experience is improved.
  • In an example, recognizing a live frame and determining whether a target image element is present in it specifically includes: recognizing the live frame according to a pre-established image recognition model and determining whether the target image element is present; the image recognition model is trained on collected image features.
  • In an example, using the pre-established image recognition model to recognize the live frame and determine whether a target image element is present specifically includes: obtaining a designated area in the live frame; identifying, according to the pre-established image recognition model, whether the target image element is present in the designated area; and if the target image element is present in the designated area, determining that the target image element is present in the live frame. This reduces the amount of feature-value computation.
  • In an example, identifying whether the target image element is present in the designated area includes: cropping a target area from the designated area according to the size of the target image element; inputting the feature values of the target area into the image recognition model and judging from its output whether the target image element is present in the target area; and if not, offsetting the position of the target area according to a preset strategy and identifying whether the target image element is present in the offset target area. This preserves the accuracy of image recognition while reducing the amount of feature-value computation and improving its efficiency.
  • In an example, after obtaining the designated area in the live frame, the method further includes obtaining a grayscale image of the designated area; identifying, according to the pre-established image recognition model, whether the target image element is present in the designated area is then specifically identifying, according to the model, whether the target image element is present in the grayscale image.
  • Compared with color images, grayscale images retain a high degree of discrimination, which reduces the amount of feature-value computation while preserving the accuracy of image recognition.
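  • A minimal sketch of the designated-area and grayscale steps, assuming decoded frames arrive as BGR arrays; the region coordinates are merely illustrative:

        import cv2
        import numpy as np

        def designated_area_gray(frame: np.ndarray, x: int, y: int,
                                 w: int, h: int) -> np.ndarray:
            """Crop the designated area from a decoded BGR frame and convert it
            to grayscale so feature extraction works on luminance only."""
            region = frame[y:y + h, x:x + w]
            return cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)

        # Example: an 80x80 designated area in the lower-left corner of a 720p
        # frame, where a banner such as "five kills" is expected to appear.
        frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a decoded frame
        gray = designated_area_gray(frame, x=0, y=640, w=80, h=80)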
  • Fig. 1 is a flowchart of a method for generating live video highlights according to the first embodiment of the present application;
  • Fig. 2 is a schematic diagram of the target area selection process in the first embodiment of the present application;
  • Fig. 3 is a flowchart of a method for generating live video highlights according to the second embodiment of the present application;
  • Fig. 4 is a flowchart of a method for generating live video highlights according to the third embodiment of the present application;
  • Fig. 5 is a flowchart of a method for generating live video highlights according to the fourth embodiment of the present application;
  • Fig. 6 is a flowchart of a method for switching between the live video and the highlight composite video according to the fourth embodiment of the present application;
  • Fig. 7 is a schematic diagram of the server's processing of live video data in the fourth embodiment of the present application;
  • Fig. 8 is a schematic structural diagram of an apparatus for generating live video highlights according to the fifth embodiment of the present application;
  • Fig. 9 is a schematic structural diagram of a server according to the sixth embodiment of the present application.
  • The first embodiment of the present application relates to a method for generating live video highlights, including: recognizing the live frames of the live video and determining whether a target image element is present in a live frame; if so, saving the segment of the live video containing that frame as a highlight segment; when the synthesis condition is recognized as met, merging the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video; and switching the output live video to the highlight composite video.
  • The implementation details of the method for generating live video highlights of this embodiment are described below. The following content is provided for ease of understanding and is not necessary for implementing this solution. The specific process is shown in Figure 1.
  • Step 101: Recognize the live frames of the live video.
  • Specifically, the server receives the live video, processes it, and outputs the processed live video to the client, which plays it so that the user can watch the live broadcast.
  • When the server processes the received live video, it decodes it; the decoded live video data is a sequence of video frames, and the live frames can be recognized periodically or frame by frame.
  • In an example, the server may be a streaming media server.
  • The streaming media server compresses continuous audio and video information and places it on a network server for users to download and watch, which improves the viewing experience.
  • Step 102: Determine whether a target image element is present in the live frame. If yes, go to step 103; if no, return to step 101.
  • Specifically, the target image element in a live frame may be an identifier that represents the video content, for example the word "kill" or "success", or a related icon, appearing at highlight moments in a game live broadcast. When an identifier identical or similar to the target image element is recognized, it is determined that the target image element is present in the live frame.
  • In an example, an image recognition model may be used to recognize the live frame and determine whether a target image element is present. Automatically recognizing highlight frames with a pre-trained image recognition model obtains highlights more accurately and quickly, improves editing efficiency, and reduces manpower consumption.
  • Step 103: Save the segment of the live video containing the live frame as a highlight segment. Specifically, when the target image element is identified in a live frame, the live video containing that frame is judged to be a highlight segment.
  • In an example, a live video of preset duration may be used as the highlight segment, or a live video composed of several consecutive frames may be used as the highlight segment.
  • To establish the image recognition model in advance: collect a large number of target image elements from the video frames to be recognized and use the collected target-element pictures as positive samples for training; in addition, collect a large number of non-target images as negative samples. Train the image recognition model on the collected pictures.
  • In an example, image recognition through the model proceeds as follows:
  • The live video is decoded in real time to obtain decoded frames.
  • Pixel values are cropped from a region of specified position and size in each frame.
  • The chroma component is discarded and only luminance is kept. Because luminance usually has a higher degree of discrimination than chroma, this greatly reduces the amount of computation while preserving recognition accuracy.
  • The grayscale image is then resized; it may have any aspect ratio, which has no effect on its basic characteristics.
  • Resizing the images to the same size as used in training improves the accuracy of image recognition, and the amount of computation is small and relatively stable.
  • HOG (Histogram of Oriented Gradients) features are computed for the resized grayscale image.
  • The features are classified with an SVM (Support Vector Machine).
  • The resulting value is the category number corresponding to the current cropped area.
  • The category number represents the type of the target image element; for example, the target image element "five kills" corresponds to category number 001, the target image element "success" corresponds to category number 002, and so on.
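  • A minimal sketch of this HOG-plus-SVM classification, assuming OpenCV and scikit-learn. The HOG window geometry, the dummy training data, and the exact category numbering are assumptions; real training would use the collected positive and negative sample pictures.

        import cv2
        import numpy as np
        from sklearn.svm import LinearSVC

        # HOG descriptor: 48x48 window, 16x16 blocks, 8x8 stride and cells, 9 bins.
        hog = cv2.HOGDescriptor((48, 48), (16, 16), (8, 8), (8, 8), 9)

        def hog_features(gray_crop: np.ndarray) -> np.ndarray:
            resized = cv2.resize(gray_crop, (48, 48))  # normalise size before HOG
            return hog.compute(resized).ravel()

        # Dummy stand-ins for the collected samples; labels follow the text's
        # scheme, e.g. 0 = no target element, 1 = "five kills", 2 = "success".
        rng = np.random.default_rng(0)
        train_images = [rng.integers(0, 256, (50, 50), dtype=np.uint8) for _ in range(30)]
        train_labels = [i % 3 for i in range(30)]

        clf = LinearSVC().fit(np.stack([hog_features(im) for im in train_images]),
                              train_labels)

        def classify(gray_crop: np.ndarray) -> int:
            """Return the category number for one candidate target area."""
            return int(clf.predict(hog_features(gray_crop)[None, :])[0])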
  • In an example, the position, size, and number of the designated areas can be preset.
  • In an example, the designated area can be placed in the upper-left corner, lower-left corner, upper-right corner, lower-right corner, or middle of the live frame.
  • In a live video, the target image element usually appears at a fixed position in the live frame.
  • For example, the target image element "five kills" usually appears in the lower-left corner of the live frame; to ensure recognition accuracy while appropriately reducing feature-value computation, the designated area can be placed where the target image element usually appears, namely the lower-left corner of the live frame.
  • The designated area is usually selected to be larger than the target image element.
  • In an example, a target area of the same size as the target image element can be cropped from the designated area and checked for the target image element; if the target image element is absent, the position of the target area is offset according to a preset strategy.
  • The preset strategy can be to offset the cropped area by one pixel in any direction, or to offset it by a specified distance in any direction.
  • In an example, the target area is selected as follows. Suppose the target image element is 50 pixels * 50 pixels and the obtained designated area is 80 pixels * 80 pixels. A 50 * 50 image is cropped from the designated area as the target area: as shown on the left of Figure 2, a 50 * 50 target area (shaded) is cropped from the designated area (the square with a white background), and the target image element is judged according to the image recognition model; that is, the feature values of the target area are input into the model, and its output determines whether the target image element is present in the target area.
  • If the target image element is not present, the cropped area is translated in some direction by a specified interval, and, as shown in the middle image of Figure 2, a second 50 * 50 target area (shaded) is cropped and checked in the same way.
  • Translation by the specified distance continues in this manner until the target image element is found in a cropped target area, or the entire designated area has been searched.
  • When target areas are cropped at a large specified interval, the number of areas to crop is greatly reduced, but accuracy is slightly affected. Therefore, in practical applications, a fast coarse-to-fine search can be used.
  • Specifically, compute the matching degree between each cropped target area and the target image element, select the target areas with higher matching degree, use them as starting points for the next round of precise search, perform a small-range refined search around each selected starting point, and again keep the best matches.
  • In an example, the specified interval can be reduced from ten pixels in the first round to five pixels in the second. This makes the second round's cropping more accurate and raises the probability of locating the exact position of the target image element.
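  • A minimal sketch of this coarse-to-fine search, with the matching degree abstracted as a caller-supplied scoring function (for instance an SVM decision value); the step sizes follow the ten-pixel/five-pixel example above:

        import numpy as np
        from typing import Callable, Tuple

        def coarse_to_fine_search(
            gray_region: np.ndarray,                 # grayscale designated area
            target_hw: Tuple[int, int],              # (height, width) of the element
            score: Callable[[np.ndarray], float],    # matching degree, higher = better
            coarse_step: int = 10,
            fine_step: int = 5,
        ) -> Tuple[int, int, float]:
            th, tw = target_hw
            max_y = gray_region.shape[0] - th
            max_x = gray_region.shape[1] - tw

            def scan(y0, y1, x0, x1, step):
                best = (y0, x0, float("-inf"))
                for y in range(y0, min(y1, max_y) + 1, step):
                    for x in range(x0, min(x1, max_x) + 1, step):
                        s = score(gray_region[y:y + th, x:x + tw])
                        if s > best[2]:
                            best = (y, x, s)
                return best

            # Round 1: coarse scan over the whole designated area.
            y, x, _ = scan(0, max_y, 0, max_x, coarse_step)
            # Round 2: refined scan in a small neighbourhood of the coarse winner.
            return scan(max(0, y - coarse_step), y + coarse_step,
                        max(0, x - coarse_step), x + coarse_step, fine_step)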
  • Step 104: Merge the highlight segment into the live video according to the synthesis condition to obtain the highlight composite video.
  • Specifically, the synthesis condition identifies a moment at which the user would want the highlight replayed, which can be the end of a round or match, or an intermission; for example, the moment when the first round of a live game has just ended and the second round is about to begin.
  • At such a moment, the content the live video is showing is usually content the user pays little attention to.
  • Merging the highlight segment into the live video at this synthesis moment means the composite video carries the highlights, so the user can watch the highlight replay while the less interesting content plays.
  • In an example, the synthesis condition can also be that the number of highlight segments reaches a preset number. This controls the total duration of the highlights; when the count reaches the preset number, the highlight segments are merged into the live video and can be played promptly.
  • The highlight clips are thus provided to users in time, ensuring the timeliness of highlight playback.
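  • A minimal sketch of this count-based synthesis condition; the threshold value and the byte-string representation of a segment are assumptions:

        class HighlightBuffer:
            """Collects saved highlight segments and reports when the
            count-based synthesis condition is met."""

            def __init__(self, max_segments: int = 3):
                self.max_segments = max_segments
                self.segments: list[bytes] = []

            def add(self, segment: bytes) -> None:
                self.segments.append(segment)

            def synthesis_condition_met(self) -> bool:
                return len(self.segments) >= self.max_segments

            def drain(self) -> list[bytes]:
                """Hand the segments over for splicing/compositing and reset."""
                out, self.segments = self.segments, []
                return out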
  • In an example, an image recognition model can also be used to determine whether the synthesis condition is met.
  • This image recognition model is trained on a large number of images that can serve as synthesis moments.
  • The images used for this training usually contain image elements whose content the user pays little attention to, such as elements that represent the end of a round or match, or an intermission.
  • With this recognition model, the server can determine whether the currently playing live video meets the synthesis condition.
  • The synthesis moment, that is, the timing for synthesizing the highlights with the live video, can thus be determined.
  • Using an image recognition model to determine the synthesis timing improves the efficiency of that determination and reduces manpower consumption.
  • It also ensures, to a degree, that the live content playing at the synthesis moment is content the user cares relatively little about, so the user's viewing of the live broadcast is not disturbed.
  • Step 105: Switch the output live video to the highlight composite video.
  • Specifically, the server outputs video to the client, which presents the video content to the user.
  • Before the switch, the video played by the client is the live video.
  • When the server outputs the highlight composite video instead, the client plays it, and the user watches the highlight replay.
  • In an example, the switch-back condition may be: determine the remaining duration of the highlight composite video, and once it has finished playing, switch the output from the highlight composite video back to the live video so the user can continue watching the live broadcast.
  • In an example, the first video frame of the switched-to live video is a key frame.
  • Live video data is organized as video frame sequences, each consisting of one key frame and multiple non-key frames. Switching at a key frame ensures that the switched-to video plays normally, without missing data or unrecognizable video data.
  • In this embodiment, the server recognizes whether a target image element is present in a live frame; when it is recognized, the live frame has the characteristics of a highlight moment, and the live clip containing it can be determined to be a highlight segment. When the server recognizes that the synthesis condition is met, the recognized and saved highlight segment is synthesized into the live video to obtain the highlight composite video, which is then output, so that users can watch highlight replays while watching the live broadcast, improving the timeliness of highlight playback. Moreover, no manual identification of highlights is needed during editing, which saves considerable human resources and resolves the inefficiency of manual editing.
  • The second embodiment of the present application relates to a method for generating live video highlights.
  • The second embodiment further refines the first embodiment.
  • The refinement is as follows: in the second embodiment, the highlight segment is decoded and the live video is decoded; the decoded highlight segment is merged with the decoded live video to obtain composite data; and the composite data is re-encoded to obtain the highlight composite video.
  • The specific flowchart is shown in Figure 3.
  • Step 201: Recognize the live frames of the live video.
  • Step 202: Determine whether a target image element is present in the live frame. If yes, go to step 203; if no, return to step 201.
  • Step 203: Save the segment of the live video containing the live frame as a highlight segment.
  • Steps 201 to 203 are the same as steps 101 to 103 in the first embodiment and are not repeated here.
  • Step 204: When the synthesis condition is met, decode the highlight segment and the live video. Specifically, the decoding method corresponding to the encoding method is used to restore the digital data to the content it represents.
  • The decoded data is divided into audio data and video data.
  • The video data is a series of video frame sequences, each of which can represent a picture or an action; decoding makes it convenient to process the pictures in the video.
  • Step 205: Merge the decoded highlight segment with the decoded live video to obtain composite data.
  • Specifically, the decoded highlight segment can be divided into audio data and video data.
  • The decoded live video can likewise be divided into audio data and video data.
  • The audio data of the highlight segment is merged with the audio data of the live video, and the video data of the highlight segment is merged with the video data of the live video; the merged audio data and video data are collectively called composite data.
  • In an example, the audio stream of the decoded highlight segment and the audio stream of the decoded live video can be combined by mixing to obtain the audio composite data. Specifically: obtain the audio data of the live video and adjust its volume; obtain the audio data of the highlight segment and adjust its volume; and mix the two adjusted audio streams into one composite audio stream. This lets users hear the audio of the live video and of the highlights at the same time, so they obtain all the information of the played video.
  • In an example, the mixed audio can play the two streams on different channels, or use one stream as the main audio and the other as background.
  • The specific mixing method is not limited here.
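  • A minimal sketch of this mixing step on decoded PCM samples, assuming 16-bit audio and illustrative gain values:

        import numpy as np

        def mix_audio(live_pcm: np.ndarray, highlight_pcm: np.ndarray,
                      live_gain: float = 0.3, highlight_gain: float = 1.0) -> np.ndarray:
            """Adjust each stream's volume, then sum them into one track."""
            n = min(len(live_pcm), len(highlight_pcm))
            mixed = (live_pcm[:n].astype(np.float32) * live_gain
                     + highlight_pcm[:n].astype(np.float32) * highlight_gain)
            # Clamp to the int16 range to avoid wrap-around distortion.
            return np.clip(mixed, -32768, 32767).astype(np.int16)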
  • In an example, each video frame of the decoded highlight segment can be merged with each video frame of the decoded live video in a preset mode to obtain the video composite data; the preset mode can be an overlay mode or a parallel (side-by-side) mode.
  • In the overlay mode, each video frame of the live video can be shrunk into a small playback window while each video frame of the highlight segment serves as the background, with each shrunken live frame overlaid at a fixed position on each highlight frame, such as the upper-left or lower-right corner.
  • The composition of the video pictures can also adopt different size arrangements.
  • For example, each video frame of the highlight segment can instead be shrunk and overlaid on each video frame of the live video, so the highlight plays in the small window;
  • or the live video and the highlights can be displayed side by side in windows of the same size, and so on, which is not repeated here.
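  • A minimal sketch of the two picture-composition modes on decoded frames; the scale factor and margin are illustrative:

        import cv2
        import numpy as np

        def overlay_pip(background: np.ndarray, inset: np.ndarray,
                        scale: float = 0.25, margin: int = 10) -> np.ndarray:
            """Overlay mode: shrink `inset` and paste it into the upper-left
            corner of `background`."""
            h, w = background.shape[:2]
            small = cv2.resize(inset, (int(w * scale), int(h * scale)))
            out = background.copy()
            out[margin:margin + small.shape[0], margin:margin + small.shape[1]] = small
            return out

        def side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
            """Parallel mode: equal-height windows displayed side by side."""
            h = min(left.shape[0], right.shape[0])
            fit = lambda f: cv2.resize(f, (f.shape[1] * h // f.shape[0], h))
            return np.hstack([fit(left), fit(right)])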
  • Step 206: Re-encode the composite data to obtain the highlight composite video.
  • Specifically, the composite data is encoded with a preset encoding method. This facilitates transmission of the highlight composite video and avoids data loss in transit.
  • The encoded data can also be compressed, reducing the memory occupied during transmission and improving the efficiency of data transmission.
  • Step 207: Switch the output live video to the highlight composite video. This step is identical to step 105 in the first embodiment and is not repeated here.
  • In this embodiment, decoding before merging ensures the stability of the merged video; and because the audio data of the highlight segment is merged with the audio data of the live video at the same time as the video data is merged, users can hear the audio of the live video and of the highlights simultaneously, obtain all the information of the played video, and enjoy a better viewing experience. Re-encoding and compressing the composite data of the highlight segment and the live video saves storage space for the highlight composite video.
  • The third embodiment of the present application relates to a method for generating live video highlights.
  • The third embodiment is an improvement on the second embodiment.
  • The improvement is as follows: in the third embodiment, live video data packets are cached during the live broadcast, and when the target image element is identified, the cached live video data packets are saved as a highlight segment.
  • The specific flowchart is shown in Figure 4.
  • Step 301: Recognize the live frames of the live video.
  • Step 302: Determine whether a target image element is present in the live frame. If yes, go to step 303; if no, return to step 301.
  • Steps 301 and 302 are the same as steps 201 and 202 in the second embodiment and are not repeated here.
  • Step 303: Determine whether the playing duration of the cached live video data packets exceeds the preset upper limit. If yes, go to step 304; if no, go to step 307.
  • Specifically, the live video data packets are cached in real time while it is determined whether a target image element is present in the live frames.
  • When judging whether the duration of the currently cached live video data packets exceeds the preset upper limit, the duration of the cached data is calculated from its timestamps. If the calculated duration exceeds the preset upper limit, part of the data of the cached live video data packets is discarded; if it does not, the cached live video data packets meet the length requirement of a highlight video and are saved as a highlight segment.
  • Step 304: Discard the earliest video frame sequence in the cached live video data packets. Specifically, the data of the cached live video data packets is discarded in order of the video frame sequences. This preserves the timeliness of the highlight segment and ensures that it is the segment that most impressed the user.
  • Step 305: Determine whether the first video frame of the cached live video data packets is a key frame. If yes, return to step 303; if no, go to step 306.
  • Ensuring that the first video frame of the cached live video data packets is a key frame ensures that the first video frame of the saved highlight segment is a key frame, so that decoding, synthesis, encoding, and switching of the highlight video proceed normally, avoiding operation errors that would cause playback problems in the highlight composite video synthesized from the segment.
  • Step 306: Discard the first video frame in the cached live video data packets. Specifically, when the first video frame is a non-key frame, it is discarded, ensuring that the cached live video data packets start with a key frame.
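  • A minimal sketch of the caching and trimming logic of steps 303 to 306, with packets modelled as timestamped records; the duration cap is an assumed value:

        from collections import deque
        from dataclasses import dataclass

        @dataclass
        class Packet:
            pts: float          # presentation timestamp, in seconds
            is_keyframe: bool
            data: bytes

        class HighlightCache:
            """Rolling cache of live packets: trim the oldest data while the
            cached duration exceeds the cap, then drop leading non-key frames
            so the saved highlight starts on a key frame."""

            def __init__(self, max_seconds: float = 15.0):
                self.max_seconds = max_seconds
                self.packets: deque = deque()

            def duration(self) -> float:
                return (self.packets[-1].pts - self.packets[0].pts
                        if self.packets else 0.0)

            def push(self, pkt: Packet) -> None:
                self.packets.append(pkt)
                # Steps 303/304: discard from the front while over the cap.
                while self.duration() > self.max_seconds:
                    self.packets.popleft()
                # Steps 305/306: ensure the first cached frame is a key frame.
                while self.packets and not self.packets[0].is_keyframe:
                    self.packets.popleft()

            def save_highlight(self) -> list:
                """Step 307: snapshot the cached packets as a highlight segment."""
                return list(self.packets)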
  • Step 307: Save the cached live video data packets as a highlight segment.
  • Step 308: When the synthesis condition is met, decode the highlight segment and the live video.
  • Step 309: Merge the decoded highlight segment with the decoded live video to obtain composite data.
  • Step 310: Re-encode the composite data to obtain the highlight composite video.
  • Step 311: Switch the output live video to the highlight composite video.
  • Steps 308 to 311 are the same as steps 204 to 207 in the second embodiment and are not repeated here.
  • In this embodiment, the length of the highlight segment is controlled, the memory required for saving it is reduced, and the amount of video synthesis work is reduced, improving the efficiency of obtaining the highlight composite video.
  • Moreover, the cached data is discarded from front to back in units of video frame sequences, and the first video frame of the cached live video data is guaranteed to be a key frame, so highlight clips can be synthesized smoothly and the video plays back normally, avoiding missing content in the played highlight composite video.
  • The fourth embodiment of the present application relates to a method for generating live video highlights.
  • The fourth embodiment is an improvement on the third embodiment.
  • The improvement is as follows: in the fourth embodiment, multiple highlight segments are spliced into one highlight collection, and the highlight segments are edited.
  • The specific flowchart is shown in Figure 5.
  • Step 401: Recognize the live frames of the live video.
  • Step 402: Determine whether a target image element is present in the live frame. If yes, go to step 403; if no, return to step 401.
  • Step 403: Determine whether the playing duration of the cached live video data packets exceeds the preset upper limit. If yes, go to step 404; if no, go to step 407.
  • Step 404: Discard the earliest video frame sequence in the cached live video data packets.
  • Step 405: Determine whether the first video frame of the cached live video data packets is a key frame. If yes, return to step 403; if no, go to step 406.
  • Step 406: Discard the first video frame in the cached live video data packets.
  • Step 407: Save the cached live video data packets as a highlight segment.
  • Steps 401 to 407 are the same as steps 301 to 307 in the third embodiment and are not repeated here.
  • Step 408: Determine whether the synthesis condition is met. If yes, go to step 409; if no, return to step 401.
  • Specifically, the synthesis condition may be a moment when the viewer is not paying attention to the current live content; for example, in a game live broadcast, the first round has just ended and the second round is loading. Users are usually still absorbed in the previous round and pay little attention to the opening of the second. Before the synthesis condition is recognized, one or more highlight segments may have been saved from the live video, and multiple recognized highlight segments may be discontinuous in time. Therefore, before merging these highlight segments with the live video, the multiple segments need to be pre-processed so that users can watch a continuous video without playback problems such as freezes and rollbacks.
  • Step 409: When there are multiple highlight segments, splice them into one highlight collection.
  • Specifically, each highlight segment can first be edited: for example, crop each video frame of the segment to remove unimportant parts of the picture and make the highlights stand out; or add embellishments to the picture, such as animation effects and caption text, to make the highlight frames more vivid and interesting.
  • After editing, each highlight segment needs to be adjusted according to the timestamps of its video data.
  • In an example, the playing duration of a highlight segment can be reduced by adjusting the playback speed, making its content more compact.
  • In an example, the playback order of multiple highlight segments can be determined by their timestamps, and splicing is performed in the order the timestamps indicate.
  • For example, the first highlight segment is clipped from the live video between 3 minutes 10 seconds and 3 minutes 25 seconds, and the second highlight segment is clipped from the live video between 5 minutes 15 seconds and 5 minutes 40 seconds.
  • After timestamp repair, the first 15 seconds of the highlight collection is the first highlight segment and the following 25 seconds is the second highlight segment, so the second segment plays immediately after the first finishes, which improves the smoothness of the transitions and ensures continuous playback of the multiple highlight segments.
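  • A minimal sketch of this timestamp repair, reusing the Packet record from the caching sketch above; the fixed frame interval (an assumed 30 fps) is what separates consecutive clips so the collection plays back gap-free:

        FRAME_INTERVAL = 1 / 30  # assumed frame duration in seconds

        def splice_highlights(clips: list) -> list:
            """Rebase each clip's timestamps so every clip starts right where
            the previous one ends (e.g. the 3:10-3:25 and 5:15-5:40 clips above
            become seconds 0-15 and 15-40 of the collection)."""
            spliced = []
            cursor = 0.0
            for clip in sorted(clips, key=lambda c: c[0].pts):  # timestamp order
                start = clip[0].pts
                for pkt in clip:
                    spliced.append(Packet(pts=cursor + (pkt.pts - start),
                                          is_keyframe=pkt.is_keyframe,
                                          data=pkt.data))
                cursor = spliced[-1].pts + FRAME_INTERVAL  # next clip follows on
            return spliced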
  • Step 410: Decode the highlight collection and the live video.
  • Step 411: Merge the decoded highlight collection with the decoded live video to obtain composite data.
  • Step 412: Re-encode the composite data to obtain the highlight composite video.
  • Step 413: Switch the output live video to the highlight composite video.
  • Steps 410 to 413 are the same as steps 308 to 311 in the third embodiment and are not repeated here.
  • In this embodiment, each highlight segment is edited, timestamp repair is performed on each edited segment, and the timestamp-repaired segments are spliced into one highlight collection in timestamp order. This ensures continuous playback of multiple highlight segments and improves the smoothness of their transitions.
  • The switching process between the live video and the highlight composite video in this embodiment is shown in Figure 6.
  • Step 601: Output the live video.
  • Step 602: Determine whether the switching time point has been reached. If yes, go to step 603; otherwise, return to step 601. Specifically, when outputting the live video, the video to be broadcast can be cached in advance in the streaming media server, and the first frame in the cache queue is output.
  • For example, if synthesis starts at frame 101 of the live video, that is, the timestamp of the first video frame of the current highlight composite video is the same as the timestamp of the 100th frame in the live video cache queue, then the switching time point from the live video to the highlight composite video is set to the timestamp of the 100th frame of the live video.
  • If the live frame being output from the streaming media server to the client when synthesis completes is the 80th frame, the switching time point (the timestamp of the 100th frame) has not yet been reached; the live video continues to play until the switching time point is met, and then the currently output live video is switched to the highlight composite video.
  • Step 603: When the first video frame is a key frame, switch to the highlight composite video.
  • Step 604: Determine whether the remaining duration of the highlight composite video is zero. If yes, go to step 605; if no, go to step 606. Specifically, if the remaining duration is not zero, the highlight composite video has not finished and should continue playing; when the remaining duration reaches zero, the output must be switched to keep playback continuous and prevent a blank video from playing.
  • Step 605: When the first video frame is a key frame, switch back to the live video.
  • Step 606: Continue to output the highlight composite video until its remaining duration is zero.
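  • A minimal sketch of this switching logic as a small state machine; the routing interface and field names are assumptions, and Packet is the record from the caching sketch above:

        class OutputSwitcher:
            """Steps 601-606: switch the output from the live stream to the
            highlight composite at the agreed timestamp, then back once the
            composite is fully played out, switching only on key frames."""

            def __init__(self, switch_pts: float):
                self.switch_pts = switch_pts        # e.g. timestamp of live frame 100
                self.playing_composite = False

            def route(self, live_frame: Packet, comp_remaining: float,
                      comp_first_is_key: bool) -> str:
                if not self.playing_composite:
                    # Steps 602/603: switch once the live timestamp reaches the
                    # switch point and the composite starts on a key frame.
                    if live_frame.pts >= self.switch_pts and comp_first_is_key:
                        self.playing_composite = True
                        return "composite"
                    return "live"
                # Steps 604-606: play the composite until exhausted, then return
                # to the live stream at its next key frame (step 605).
                if comp_remaining <= 0 and live_frame.is_keyframe:
                    self.playing_composite = False
                    return "live"
                return "composite"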
  • In a specific example, the server's processing of the live video data is shown in Figure 7.
  • The live video data of a game broadcast is output to the client through the server, and the client provides a window for the user to watch the live game.
  • This application is directed to the server's processing of the video data, so that the processed data can be played directly by the client for the user to watch.
  • When the server receives the audio and video data of the game broadcast, it puts the received audio and video data packets into the cache queue, decodes the received game audio and video, and performs image recognition on the decoded video frames through the first image recognition model.
  • When the target image element is recognized, the live content playing at that moment is content the user cares about, for example a successful kill or an impressive play in the game broadcast, and the frames showing such content are saved as a highlight segment.
  • When saving the highlight segment, it can be judged from the timestamps whether the duration of the cached data (the audio and video data packets in the cache queue) exceeds the preset upper limit.
  • If the preset upper limit is exceeded, part of the cached data in the cache queue is discarded so that the duration of the cached data stays within the preset upper limit, and the cached data of the game broadcast is then saved as a highlight segment.
  • When it is recognized that the synthesis condition is met, the obtained highlight segments or highlight collection need to be played.
  • The highlight segments (or highlight collection) are decoded, the live video after the synthesis moment is decoded, and the decoded live video and highlight segments (or highlight collection) are merged into a highlight composite video.
  • The highlight composite video contains both the highlights and the live video.
  • In an example, the user can choose to use the highlights as the background of the played video with the live video shown in a small window; likewise, when the user is focused on the live video, the live video can be the background and the highlights shown in the small window; the two video contents can also be displayed side by side. The specific display method depends on the situation and is not limited here.
  • After the highlight composite video is played out, the server switches the output highlight composite video back to the live video to ensure the continuity of video playback.
  • The fifth embodiment of the present application relates to an apparatus for generating live video highlights. As shown in Figure 8, it includes: a first identification module 81, a storage module 82, a second identification module 83, a synthesis module 84, and an output module 85;
  • the first identification module 81 is configured to recognize the live frames while the live video is being output and determine whether a target image element is present in a live frame;
  • the storage module 82 is configured to save, when the first identification module recognizes that a target image element is present in a live frame, the segment of the live video containing that frame as a highlight segment;
  • the second identification module 83 is configured to identify whether the synthesis condition is met;
  • the synthesis module 84 is configured to merge, when the synthesis condition is met, the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video;
  • the output module 85 is configured to output the live video and, after the highlight composite video is obtained, switch the output video from the live video to the highlight composite video.
  • the device for generating live video highlights may also include a decoding module and an encoding module; the decoding module is used to decode the highlights and the live video; the synthesis module is specifically used to merge the decoded highlights with the decoded live video , To obtain the synthesized data; the encoding module is used to re-encode the synthesized data to obtain the highlight synthesized video.
  • the synthesis data may include: video synthesis data and audio synthesis data; the synthesis module is specifically used to merge each video frame of the decoded highlight with each video frame of the decoded live video in a preset mode to obtain the video synthesis data ; Combine the decoded audio stream of the highlight segment with the decoded audio stream of the live video in the form of mixing to obtain audio synthesis data.
  • the device for generating live video highlights may further include a cache module; the cache module is used to cache live video data packets of the live video in real time; the storage module is specifically used to save the currently cached live video data packets as highlights.
  • the storage module is also used for discarding part of the cached live video data packets when the playing duration of the currently cached live video data packets exceeds the preset upper limit, and saves the discarded live video data packets as highlights.
  • the storage module when discarding part of the cached live video data packets, is specifically used to discard the video frame sequence in the currently cached live video data packets according to the playback sequence of the live video data packets, from front to back, until discarded
  • the playback duration of the buffered live video data packet does not exceed the preset upper limit.
  • the live video data package stored by the storage module is specifically a cached live video data package whose first video frame is a key frame.
  • the device for generating live video highlights may also include a splicing module; the splicing module is used to splice multiple highlights into one highlight when the number of highlights is greater than one; the merging module is specifically used to merge the highlights into the live video in.
  • the splicing module is also used to edit and process the highlights, and repair the timestamps of the edited highlights, and splice the highlights after the timestamp repair into a highlight in the order of the timestamps.
  • the output module is specifically configured to switch the output video from the live video to the highlight composite video when the time stamp of the live video meets the switching time point.
  • the output module is also used to switch the output of the highlight composite video to the live video when the remaining time of the highlight composite video is zero.
  • the output module is also used to switch the output highlight composite video to the live video when the remaining time length of the highlight composite video is zero and the first video frame of the live video is determined as a key frame.
  • the first recognition module specifically recognizes the live picture according to a pre-established image recognition model and determines whether a target image element exists in the live picture; the image recognition model is trained on collected image features.
  • the first recognition module is specifically configured to obtain a designated area in the live picture and, according to the pre-established image recognition model, identify whether a target image element exists in the designated area; if a target image element exists in the designated area, it determines that a target image element exists in the live picture.
  • when identifying whether a target image element exists in the designated area, the first recognition module is specifically used to crop a target region from the designated area according to the size of the target image element; input the feature values of the target region into the image recognition model and determine from the model's output whether the target image element exists in the target region; and, if it does not, offset the position of the target region according to a preset strategy and identify whether the target image element exists in the offset target region.
  • the first recognition module is also used to obtain a grayscale image of the designated area and identify, according to the pre-established image recognition model, whether a target image element exists in the grayscale image.
  • the user can thus watch highlights of the live video while watching the live broadcast, which improves the timeliness of highlight playback. Moreover, no manual identification of highlight segments is needed during editing, which saves considerable human resources and solves the inefficiency of manual editing.
  • this embodiment is a device embodiment corresponding to the first embodiment and can be implemented in cooperation with the first embodiment.
  • the related technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here.
  • the related technical details mentioned in this embodiment can also be applied to the first embodiment.
  • the modules involved in this embodiment are all logical modules; in practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • this embodiment does not introduce units that are not closely related to solving the technical problem proposed by the present application, but this does not mean that no other units exist in this embodiment.
  • the sixth embodiment of the present application relates to a server. As shown in FIG. 9, it includes at least one processor 901 and a memory 902 communicatively connected to the at least one processor 901; the memory 902 stores instructions executable by the at least one processor 901, and the instructions are executed by the at least one processor 901 so that the at least one processor 901 can execute the above-described method for generating live video highlights.
  • the memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges; the bus connects one or more processors and the various circuits of the memory together.
  • the bus may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are all well known in the art and therefore not further described herein.
  • the bus interface provides an interface between the bus and the transceiver.
  • the transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium.
  • the data processed by the processor is transmitted over the wireless medium through the antenna; further, the antenna also receives data and transfers it to the processor.
  • the processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
  • the memory can be used to store data used by the processor when performing operations.
  • the seventh embodiment of the present application relates to a computer-readable storage medium that stores a computer program.
  • when the computer program is executed by a processor, the above method embodiments are implemented.
  • the program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
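The following is a minimal, hypothetical Python sketch of how the logical modules above could be wired together. All class, method, and attribute names are illustrative assumptions rather than the patent's reference implementation; recognition, condition checking, and synthesis are reduced to stubs, and the cache-trimming loop is sketched separately in the description below.

```python
# Hypothetical wiring of the logical modules (illustrative names only).

class HighlightDevice:
    def __init__(self, recognizer, condition_checker):
        self.recognizer = recognizer                # first recognition module
        self.condition_checker = condition_checker  # second recognition module
        self.cache = []                             # cache module: live packets
        self.highlights = []                        # storage module: saved segments

    def on_live_packet(self, packet):
        """Cache module: buffer live video data packets in real time."""
        self.cache.append(packet)

    def on_decoded_frame(self, frame):
        """First recognition + storage modules: when a target image element is
        found in the live picture, save the currently cached packets."""
        if self.recognizer.has_target_element(frame):
            self.highlights.append(list(self.cache))

    def select_output(self, live_stream):
        """Second recognition + synthesis + output modules: switch the output
        to the composite once the synthesis condition is met."""
        if self.highlights and self.condition_checker.is_met(live_stream):
            return self.synthesize(self.highlights, live_stream)
        return live_stream

    def synthesize(self, highlights, live_stream):
        """Stub for the decode -> merge -> re-encode pipeline."""
        raise NotImplementedError
```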

Abstract

Some embodiments of the present application relate to the field of video processing and disclose a method, apparatus, server, and storage medium for generating live video highlights. The method for generating live video highlights in the embodiments of the present application includes: identifying a live picture of a live video (101) and determining whether a target image element exists in the live picture (102); if a target image element exists in the live picture, saving the segment of live video containing the live picture as a highlight segment (103); when a synthesis condition is met, merging the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video (104); and switching the output live video to the highlight composite video (105). Users can thus watch highlight segments that the live video has already played while continuing to watch the live broadcast, which improves the timeliness of highlight playback; moreover, no manual identification of highlight segments is needed during editing, which saves considerable human resources and solves the inefficiency of manual editing.

Description

Method, apparatus, server, and storage medium for generating live video highlights
Cross-reference
This application claims priority to Chinese Patent Application No. 201910262673.7, filed on April 2, 2019 and entitled "Method, apparatus, server, and storage medium for generating live video highlights", which is incorporated herein by reference in its entirety.
Technical field
The embodiments of the present application relate to the field of video processing, and in particular to a method, apparatus, server, and storage medium for generating live video highlights.
Background
Online live video streaming is a live broadcast service carried over network resources: video shot on site is uploaded to the network synchronously, so that users can see first-hand, on-site information on the network at the same time. Such network live broadcast services are widely used for press conferences, exhibitions, product launches and promotions, on-site sales demonstrations, online concerts, company receptions, business meetings, ceremonies, performances, sports competitions, e-sports matches, securities analysis, distance education, and other live streaming services conducted in real time.
During live video playback, viewers usually take a strong interest in the highlight moments of the live video. To let viewers watch these highlights repeatedly, the video is usually edited offline by hand after the broadcast ends and then published on demand for viewers to find and watch.
The inventors found at least the following problems in the related art: because offline editing happens only after the broadcast ends, users can watch the highlights on demand only after the broadcast is over, which harms the viewing experience. Moreover, producing the edited video requires substantial human resources and is inefficient, which cannot meet the needs of the booming live streaming industry.
Summary
The purpose of some embodiments of the present application is to provide a method, apparatus, server, and storage medium for generating live video highlights, so that users can watch live video highlights while watching the live broadcast, and so that the highlights are edited automatically, improving editing efficiency.
To solve the above technical problem, an embodiment of the present application provides a method for generating live video highlights, including: identifying a live picture of a live video and determining whether a target image element exists in the live picture; if a target image element exists in the live picture, saving the segment of live video containing the live picture as a highlight segment; when a synthesis condition is met, merging the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video; and switching the output live video to the highlight composite video.
An embodiment of the present application also provides an apparatus for generating live video highlights, including a first recognition module, a storage module, a second recognition module, a synthesis module, and an output module. The first recognition module is used to identify a live picture of a live video and determine whether a target image element exists in the live picture; the storage module is used to save the segment of live video containing the live picture as a highlight segment when a target image element is recognized in the live picture; the second recognition module is used to identify whether a synthesis condition is met; the synthesis module is used to merge the highlight segment into the live video according to the synthesis condition when the condition is met, to obtain a highlight composite video; and the output module is used to switch the output live video to the highlight composite video.
An embodiment of the present application also provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above method for generating live video highlights.
An embodiment of the present application also provides a storage medium storing a computer program which, when executed by a processor, implements the above method for generating live video highlights.
Compared with the prior art, in the embodiments of the present application, the server identifies during the broadcast whether a target image element exists in the live picture; when a target image element is recognized, the live picture has the characteristics of a highlight moment, and the live segment containing it can be judged to be a highlight segment. When the server recognizes that the synthesis condition is met, it merges the recognized and saved highlight segments into the live video to obtain the live video highlight collection and outputs the composite, so that users watch highlights that have already been played while watching the broadcast, improving the timeliness of highlight playback. Moreover, no manual identification of highlight segments is needed during editing, which saves considerable human resources and solves the inefficiency of manual editing.
In addition, merging the highlight segment into the live video to obtain the highlight composite video specifically includes: decoding the highlight segment and decoding the live video; merging the decoded highlight segment with the decoded live video to obtain synthesized data; and re-encoding the synthesized data to obtain the highlight composite video. The highlight segment is saved as encoded video data, which is convenient to store; when it needs to be merged with the live video it is decoded, and after merging it is re-encoded, which both guarantees the merge and allows the data to be compressed. Compressed data occupies less storage space, facilitating transmission of the highlight composite video and improving its transmission efficiency.
In addition, the synthesized data includes video synthesis data and audio synthesis data, and merging the decoded highlight segment with the decoded live video specifically includes: merging each video frame of the decoded highlight segment with each video frame of the decoded live video in a preset mode to obtain the video synthesis data; and mixing the decoded audio stream of the highlight segment with the decoded audio stream of the live video to obtain the audio synthesis data. In this way the user can hear the audio of the highlight segment and see its picture while watching the live video, so the user obtains all the information of the played video.
In addition, before determining whether a target image element exists in the live picture, the method further includes: caching live video data packets of the live video in real time; saving the segment of live video containing the live picture as a highlight segment then specifically includes: saving the currently cached live video data packets as the highlight segment. This saves the storage space occupied by the highlight segments.
In addition, before saving the cached live video data packets as the highlight segment, the method further includes: determining whether the duration of the cached live video data packets exceeds a preset upper limit; if it does, discarding part of the cached live video data packets. This controls the length of the highlight segment and reduces the memory needed to save it; discarding part of the data also makes the highlight's focus more prominent, prevents redundant information from occupying the user's attention, and makes it easier for the user to watch the exciting video segment.
In addition, discarding part of the cached live video data packets specifically includes: discarding the video frame sequences in the currently cached live video data packets one by one, in the playback order of the packets from front to back, until the playback duration of the remaining cached packets no longer exceeds the preset upper limit. This keeps the highlight segment within the preset duration and its data length within the preset upper limit, reducing the memory needed to save it; since the cached packets are discarded from oldest to newest, the timeliness of the highlight is preserved, and because the trimmed highlight is shorter, fewer synthesis operations with the live video are needed later, improving the efficiency of obtaining the highlight composite video.
In addition, the live video data packets saved as the highlight segment are specifically cached live video data packets whose first video frame is a key frame. The key frame here is a key frame in the video-coding sense; making the first video frame of the obtained highlight a key frame guarantees that decoding, synthesis, encoding, compression, and switching of the highlight proceed normally, avoiding playback problems in the highlight composite video synthesized from the segments.
In addition, before the synthesis moment is recognized, the method further includes: if more than one highlight segment is saved, splicing the multiple segments into one highlight collection; merging the highlight segment into the live video is then specifically merging the highlight collection into the live video. This lets users watch highlight segments captured at different points in time, giving them continuous visual enjoyment and improving the viewing experience.
In addition, splicing multiple highlight segments into one highlight collection specifically includes: editing each highlight segment; performing timestamp repair on each edited segment; and splicing the timestamp-repaired segments into one highlight collection in timestamp order. This guarantees the continuity of the highlights and avoids stuttering or jumping back during playback.
In addition, switching the output live video to the highlight composite video specifically includes: determining from the timestamp of the live video whether the switching time point is met; when it is met, switching the output video from the live video to the highlight composite video. This ensures that users do not miss exciting live content while switching to the composite at moments the user is not interested in, achieving a switch at an accurate time point.
In addition, after switching the output live video to the highlight composite video, the method further includes: determining the remaining duration of the highlight composite video; when the remaining duration is zero, switching the output highlight composite video back to the live video. The switch back happens once the composite finishes playing, so the composite and the live video connect seamlessly during playback, guaranteeing continuity for the viewer and improving the viewing experience.
In addition, after switching the output live video to the highlight composite video, the method further includes: determining the remaining duration of the highlight composite video; when it is zero, determining whether the first video frame of the live video is a key frame; and, when it is, switching the output highlight composite video to the live video. Requiring both that the remaining duration be zero and that the first video frame of the live video be a key frame guarantees that the video after switching plays normally, without data loss, stuttering, or jumping back; it also guarantees the seamless connection of the composite and the live video during playback, so users can watch continuously, improving the viewing experience.
In addition, identifying the live picture and determining whether a target image element exists in it specifically includes: identifying the live picture according to a pre-established image recognition model and determining whether a target image element exists in it, where the image recognition model is trained on collected image features. Automatically identifying highlights with a pre-trained image recognition model yields highlights more accurately and quickly, improves editing efficiency, and reduces labor.
In addition, identifying the live picture according to the pre-established image recognition model and determining whether a target image element exists in it specifically includes: obtaining a designated area in the live picture; identifying, according to the pre-established image recognition model, whether a target image element exists in the designated area; and, if it does, determining that a target image element exists in the live picture. This reduces the amount of feature-value computation.
In addition, identifying whether a target image element exists in the designated area specifically includes: cropping a target region from the designated area according to the size of the target image element; inputting the feature values of the target region into the image recognition model and determining from its output whether the target image element exists in the target region; and, if it does not, offsetting the position of the target region according to a preset strategy and identifying whether the target image element exists in the offset target region. This guarantees recognition accuracy while reducing the amount of feature-value computation and improving its efficiency.
In addition, after obtaining the designated area in the live picture, the method further includes obtaining a grayscale image of the designated area; identifying whether a target image element exists in the designated area according to the pre-established image recognition model is then specifically identifying whether a target image element exists in the grayscale image. A grayscale image has higher discriminability than a color image, reducing feature computation while guaranteeing recognition accuracy.
Brief description of the drawings
One or more embodiments are illustrated by the figures in the corresponding drawings; these illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated the figures are not drawn to scale.
FIG. 1 is a flowchart of the method for generating live video highlights according to the first embodiment of the present application;
FIG. 2 is a schematic diagram of the target-region selection process in the first embodiment of the present application;
FIG. 3 is a flowchart of the method for generating live video highlights according to the second embodiment of the present application;
FIG. 4 is a flowchart of the method for generating live video highlights according to the third embodiment of the present application;
FIG. 5 is a flowchart of the method for generating live video highlights according to the fourth embodiment of the present application;
FIG. 6 is a flowchart of the method for switching between the live video and the highlights in the fourth embodiment of the present application;
FIG. 7 is a schematic diagram of the server's processing of live video data in the fourth embodiment of the present application;
FIG. 8 is a schematic structural diagram of the apparatus for live video highlights in the fifth embodiment of the present application;
FIG. 9 is a schematic structural diagram of the server in the sixth embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art will appreciate that many technical details are given in each embodiment so that the reader can better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these details, and with various changes and modifications based on the following embodiments.
The division into the following embodiments is for convenience of description and does not constitute any limitation on the specific implementation of the present application; the embodiments may be combined with and refer to each other where they do not contradict.
The first embodiment of the present application relates to a method for generating live video highlights, including: identifying a live picture of a live video and determining whether a target image element exists in the live picture; if a target image element exists, saving the segment of live video containing the live picture as a highlight segment; when the synthesis moment is recognized, merging the highlight segment into the live video according to the synthesis moment to obtain a highlight composite video; and switching the output live video to the highlight composite video. The implementation details of this embodiment's method are described below; the following is provided only for ease of understanding and is not required for implementing this solution. The specific flow is shown in FIG. 1.
Step 101: identify the live picture of the live video. Specifically, during a live broadcast the server receives the live video, processes it, and outputs the processed live video to clients, which play it so that users can watch. When processing the received live video, the server decodes it; the decoded live video data is a sequence of video frames, and image recognition can be performed on the live picture at fixed intervals or frame by frame. In practice the server may be a streaming media server on the streaming service side, which compresses continuous audio and video information and places it on a network server so users can watch while downloading, improving the viewing experience.
Step 102: determine whether a target image element exists in the live picture. If yes, proceed to step 103; if no, return to step 101. Specifically, the target image element may be a marker in the live picture that represents the video content, for example the words "kill" or "success" or related icons that appear in highlight moments of a game broadcast; if the same or a similar marker appears, it is determined that a target image element exists in the live picture. In practice the live picture can be recognized by an image recognition model that determines whether a target image element exists in it. Automatically identifying highlights with a pre-trained model yields highlights more accurately and quickly, improves editing efficiency, and reduces labor.
Step 103: save the segment of live video containing the live picture as a highlight segment. Specifically, when a target image element is recognized in the live picture, the live video containing that picture is judged to be a highlight segment; a preset duration of live video may be taken as the highlight segment, or a live video composed of several consecutive frames may be taken as the highlight segment.
The following describes in detail how an image recognition model recognizes the live picture and determines whether a target image element exists in it:
An image recognition model is established in advance. A large number of target image elements from the video pictures to be recognized are collected and used as positive sample pictures for training; in addition, a large number of non-target image elements are collected as negative sample pictures, and the image recognition model is trained on the collected pictures.
The method of image recognition with the model is specifically as follows:
First, the live video is decoded in real time to obtain decoded frames. For each frame, the pixel values of a region of designated position and size are cropped; only the luminance component (grayscale image) of the live picture is kept and the chrominance is discarded, and since luminance usually has higher discriminability than chrominance, this greatly reduces computation while preserving recognition accuracy. Second, the cropped grayscale image is resized, usually to a size that is small but does not lose too much detail (for example 64x64). The resized grayscale image may have any aspect ratio, which does not affect its basic features; when training the image recognition model, the training images can be resized to the same size, which improves recognition accuracy and keeps the computation small and stable. Third, HOG (histogram of oriented gradients) features are computed on the resized image to obtain its multi-dimensional feature representation. Finally, the pre-trained SVM (support vector machine) model file is loaded and the computed multi-dimensional features are passed to the model; the resulting value is the class index corresponding to the current cropped region. The class index can represent the kind of target image element: for example, the target image element "Penta Kill" corresponds to class index 001, the target image element "success" to class index 002, and so on.
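As a concrete illustration, the sketch below crops a grayscale region, resizes it to 64x64, computes HOG features, and classifies them with a pre-trained SVM. It uses OpenCV, scikit-image, and joblib as stand-ins for the unspecified implementations; the model file name and the meaning of the class indices are assumptions.

```python
# Hypothetical sketch of the per-frame recognition step described above.
import cv2
import joblib
from skimage.feature import hog

classifier = joblib.load("svm_model.pkl")  # assumed pre-trained SVM model file

def classify_region(frame_bgr, x, y, w, h):
    """Crop the designated region, keep only luminance, resize, extract HOG
    features, and return the SVM's class index (e.g. 0 = no target element)."""
    region = frame_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)  # luminance only
    gray = cv2.resize(gray, (64, 64))                # small but detail-preserving
    features = hog(gray, orientations=9,
                   pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return int(classifier.predict([features])[0])    # class index of this region
```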
In the above image recognition process, the position, size, and number of the designated areas may be preset. For example, a designated area may be selected at the upper-left, lower-left, upper-right, or lower-right corner of the live picture or anywhere in its middle, or the entire live picture may serve as one designated area. In live video the target image element is usually fixed at a certain position of the picture; for example, in game broadcasts the words "Penta Kill" usually appear at the lower-left corner. To guarantee recognition accuracy while appropriately reducing feature computation, the designated area can therefore be selected at the position where the target image element usually appears, that is, the lower-left corner of the live picture.
Specifically, the designated area is usually selected to be larger than the target image element. When recognizing the designated area, a target region of the same size as the target image element can be cropped within it and checked for the target element; if the element is absent, the position of the target region is offset according to a preset strategy, which may shift the cropped region by one pixel in any direction, or by a specified interval in any direction. The target region is selected as follows. For example, with a target image element of 50x50 pixels and a designated area of 80x80 pixels, a 50x50 image is cropped in the designated area as the target region: as shown on the left of FIG. 2, a 50x50 target region (shaded) is cropped in the designated area (the white square), and the image recognition model judges whether the target element exists in it, for example by inputting the region's feature values into the model and reading its output. If the target element is not in the target region, the cropped region is shifted by the specified interval in some direction; as shown in the middle of FIG. 2, a second 50x50 target region (shaded) is cropped, and shifting continues by the same rule until a cropped target region contains the target element or the entire designated area has been searched. Cropping at a specified interval greatly reduces the number of target regions but slightly affects precision, so in practice a fast range-search algorithm can be used: for example, after cropping target regions at ten-pixel intervals, compute each region's matching degree with the target element, select the regions with higher matching degree as starting points for the next round of precise search, and perform a small-range refined search around them to find better-matching positions. For a region with a higher matching degree, the interval can be reduced from the ten pixels of the first round to five pixels in the second round, making the second round's results more precise and raising the probability of locating the exact position where the target element appears.
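The coarse-to-fine search can be sketched as follows. Here `match_score` stands in for the unspecified matching-degree computation, the strides (ten pixels, then five) follow the example in the text, and the other parameters are illustrative.

```python
# Hypothetical sketch of the coarse-to-fine target-region search.

def search_designated_area(area, target_h, target_w, match_score,
                           coarse_step=10, fine_step=5, threshold=0.8):
    """Slide a target-sized window over the designated area (a 2-D image
    array): first at a coarse stride, then refine around the best coarse
    hits. Returns the (x, y) of a match, or None."""
    h, w = area.shape[:2]
    # Round 1: coarse scan at a ten-pixel interval.
    coarse = [(x, y, match_score(area[y:y + target_h, x:x + target_w]))
              for y in range(0, h - target_h + 1, coarse_step)
              for x in range(0, w - target_w + 1, coarse_step)]
    coarse.sort(key=lambda c: c[2], reverse=True)
    # Round 2: refine around the best coarse positions at a five-pixel interval.
    for x0, y0, _ in coarse[:3]:
        for dy in range(-coarse_step, coarse_step + 1, fine_step):
            for dx in range(-coarse_step, coarse_step + 1, fine_step):
                x, y = x0 + dx, y0 + dy
                if 0 <= x <= w - target_w and 0 <= y <= h - target_h:
                    if match_score(area[y:y + target_h, x:x + target_w]) >= threshold:
                        return x, y
    return None  # target element not found in the designated area
```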
Step 104: merge the highlight segment into the live video according to the synthesis condition to obtain the highlight composite video. Specifically, the synthesis condition corresponds to a moment at which the user would want the highlights replayed, which may be the end of a game or match, or a half-time break; for example, it may be the moment when the first round of a game broadcast has ended and the second round is just beginning. The live content playing at the synthesis moment is usually content the user does not pay attention to; merging the highlight segment into the live video at the synthesis moment places the highlights in the composite video, so that while watching content they are not interested in, users also receive the highlights: they get a replay of the exciting content while the dullness of the live video is avoided, improving the viewing experience. The synthesis condition may also be that the number of highlight segments reaches a preset number; this controls the duration of the highlights, and merging the segments into the live video once the preset number is reached delivers the highlights to users in time through the composite video, guaranteeing the timeliness of highlight playback.
In practice, whether the synthesis condition is met can also be judged by image recognition of the picture. The image recognition model is trained on a large number of images that can serve as synthesis moments; these training images usually carry image elements of content the user does not pay attention to, such as elements representing the end of a game or match or a half-time break. A model trained this way can determine from the currently playing live video whether the synthesis condition is met; the synthesis condition determines the synthesis moment, i.e., the timing for merging the highlights with the live video. Judging the synthesis timing with an image recognition model improves the efficiency of determining it and reduces labor, and to some extent ensures that the live content playing at the synthesis moment is content the user pays relatively little attention to, avoiding interference with the user's viewing of the live video.
Step 105: switch the output live video to the highlight composite video. Specifically, the server outputs video to the client, which presents the video content to the user: when the server outputs the live video, the video played by the client is the live video; when the server outputs the highlight composite video, the client plays the composite and the user can watch the replay of the highlights.
After switching the output live video to the highlight composite video, whether a switch-back condition is met can also be judged, and when it is met, the output highlight composite video is switched to the live video. The switch-back condition may be the remaining duration of the composite: once the composite finishes playing, the output is switched back to the live video so the user continues watching the broadcast. When switching the output composite back to the live video, the first video frame of the live video being switched to is a key frame; live video data is organized in units of video frame sequences, each consisting of one key frame and multiple non-key frames, and switching at a key frame guarantees that the video after switching plays normally, without data loss or unrecognizable video data.
Compared with the prior art, in the embodiments of the present application the server identifies during the broadcast whether a target image element exists in the live picture; when a target image element is recognized, the live picture has the characteristics of a highlight moment, and the live segment containing it can be judged to be a highlight segment. When the server recognizes the synthesis instruction, it merges the recognized and saved highlight segments into the live video to obtain the live video highlight collection and outputs the composite, so that users watch highlights that have already been played while watching the broadcast, improving the timeliness of highlight playback. Moreover, no manual identification of highlight segments is needed during editing, which saves considerable human resources and solves the inefficiency of manual editing.
The second embodiment of the present application relates to a method for generating live video highlights. The second embodiment refines the first in that, in the second embodiment, the highlight segment is decoded and the live video is decoded; the decoded highlight segment is merged with the decoded live video to obtain synthesized data; and the synthesized data is re-encoded to obtain the highlight composite video. The specific flow is shown in FIG. 3.
Step 201: identify the live picture of the live video.
Step 202: determine whether a target image element exists in the live picture. If yes, proceed to step 203; if no, return to step 201.
Step 203: save the segment of live video containing the live picture as a highlight segment.
Steps 201 to 203 are identical to steps 101 to 103 in the first embodiment and are not repeated here.
Step 204: when the synthesis condition is met, decode the highlight segment and the live video. Specifically, the decoding method corresponding to the encoding method is used to restore the coded data to the content it represents. The decoded video data is divided into audio data and video data; the video data is a sequence of video frames, each sequence representing a picture or an action. Decoding makes it convenient to process the pictures in the video.
Step 205: merge the decoded highlight segment with the decoded live video to obtain synthesized data. Specifically, the decoded highlight segment can be divided into audio data and video data, and likewise the decoded live video; the audio data of the highlight segment is merged with the audio data of the live video, and the video data of the highlight segment with the video data of the live video; the merged audio and video data are collectively called the synthesized data.
When synthesizing the audio data, the decoded audio stream of the highlight segment can be mixed with the decoded audio stream of the live video to obtain the audio synthesis data. Specifically, the audio data of the live video is obtained and its volume adjusted; the audio data of the highlight segment is obtained and its volume adjusted; and the two adjusted audio streams are mixed to form the synthesized audio stream. This lets the user hear the audio of the live video and of the highlight segment at the same time, so the user obtains all the information of the played video. The mixed audio may play the two audio tracks on different channels, or one track may serve as the main audio with the other as background; the specific mixing method is not limited here.
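A bare-bones version of the mixing just described, assuming both decoded streams are 16-bit PCM sample arrays at the same sample rate; the gain values are illustrative (here the highlight audio is the main track and the live audio a quiet background).

```python
# Hypothetical sketch of mixing the two decoded audio streams.
import numpy as np

def mix_audio(live_pcm, highlight_pcm, live_gain=0.3, highlight_gain=1.0):
    """Volume-adjust both int16 PCM streams and mix them into one track."""
    n = min(len(live_pcm), len(highlight_pcm))
    mixed = (live_gain * live_pcm[:n].astype(np.float32)
             + highlight_gain * highlight_pcm[:n].astype(np.float32))
    # Clip back to the 16-bit range before re-encoding.
    return np.clip(mixed, -32768, 32767).astype(np.int16)
```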
When merging the video data, each video frame of the decoded highlight segment can be merged with each video frame of the decoded live video in a preset mode to obtain the video synthesis data, where the preset mode may be an overlay mode or a side-by-side mode. Specifically, each frame of the live video may be shrunk to serve as a small window of the client's playback, with each frame of the highlight segment as the background, and the shrunken live frames overlaid at some position on the highlight frames, such as the upper-left or lower-right corner. In this way the user does not miss the live information while watching the highlight video. It is worth mentioning that the composition can also use other size modes: for example, shrinking the highlight frames and overlaying them on the live frames so that the highlight plays in a small window, or displaying the live video and the highlight side by side in windows of the same size; these are not enumerated one by one here.
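The overlay ("picture-in-picture") mode can be sketched per frame as below; the window scale and corner are illustrative choices, and both frames are assumed to be same-format color images.

```python
# Hypothetical sketch of the overlay mode: the shrunken live frame is pasted
# onto the highlight frame, here in the upper-left corner.
import cv2

def overlay_frames(highlight_frame, live_frame, scale=0.25, margin=10):
    """Return a composite frame: highlight as background, live as a small window."""
    h, w = highlight_frame.shape[:2]
    small = cv2.resize(live_frame, (int(w * scale), int(h * scale)))
    out = highlight_frame.copy()
    sh, sw = small.shape[:2]
    out[margin:margin + sh, margin:margin + sw] = small  # paste the small window
    return out
```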
Step 206: re-encode the synthesized data to obtain the highlight composite video. Specifically, the synthesized data is encoded with a preset encoding method, which facilitates the transmission of the highlight composite video and avoids data loss during transmission; after encoding, the encoded data can also be compressed, reducing the memory occupied during transmission and improving the efficiency of data transmission.
Step 207: switch the output live video to the highlight composite video. This step is identical to step 105 in the first embodiment and is not repeated here.
In this embodiment, the highlight segment and the live video are merged after decoding, guaranteeing the stability of the merged video. Merging the audio data of the highlight segment and the live video along with their video data lets the user hear both audio tracks at the same time and obtain all the information of the played video, improving the viewing experience. Re-encoding and compressing the synthesized data of the highlight segment and the live video saves storage space for the highlight composite video.
The third embodiment of the present application relates to a method for generating live video highlights. The third embodiment improves on the second in that, in the third embodiment, live video data packets are cached during the broadcast, and when a target image element is recognized the cached live video data packets are saved as the highlight segment. The specific flow is shown in FIG. 4.
Step 301: identify the live picture of the live video.
Step 302: determine whether a target image element exists in the live picture. If yes, proceed to step 303; if no, return to step 301.
Steps 301 and 302 are identical to steps 201 and 202 in the second embodiment and are not repeated here.
Step 303: determine whether the playback duration of the cached live video data packets exceeds the preset upper limit; if yes, proceed to step 304; if no, proceed to step 307. Specifically, during the broadcast the live video data packets are cached in real time while the live picture is checked for a target image element. When a target image element is recognized, whether the playback duration of the currently cached packets exceeds the preset upper limit is judged: the duration of the cached data is computed from its timestamps and compared with the preset upper limit. If it exceeds the limit, part of the cached packet data is discarded; if not, the cached packets meet the length requirement for a highlight segment and are saved as the highlight segment.
Step 304: discard the earliest video frame sequence in the cached live video data packets. Specifically, the data of the cached packets is discarded in the order of the video frame sequences, which preserves the timeliness of the highlight and ensures it is the segment that left the deepest impression on the user.
Step 305: determine whether the first video frame of the data in the cached live video data packets is a key frame; if yes, return to step 303; if no, proceed to step 306. Specifically, keeping the first video frame of the cached packets a key frame guarantees that the first video frame of the saved highlight is a key frame, which ensures that decoding, synthesis, encoding, and switching of the highlight proceed normally and avoids operational errors that would cause playback problems in the composite video synthesized from the highlights.
Step 306: discard the first video frame in the live video data packets. Specifically, when the first video frame of the data in the packets is a non-key frame, it is discarded, thereby guaranteeing that the packets start with a key frame.
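Steps 303 to 306 amount to the following trimming loop. The frame-sequence grouping (GOP-like groups that begin at a key frame) and the field names are assumptions for illustration.

```python
# Hypothetical sketch of steps 303-306: trim the cache until its playback
# duration fits the preset upper limit and it starts on a key frame.

def trim_cache(frame_sequences, preset_cap_seconds):
    """frame_sequences: list of frame lists, oldest first; each frame is
    assumed to carry .timestamp (seconds) and .is_keyframe."""
    def duration(seqs):
        frames = [f for seq in seqs for f in seq]
        return frames[-1].timestamp - frames[0].timestamp if frames else 0.0

    # Steps 303/304: discard the earliest frame sequence while the cache is too long.
    while duration(frame_sequences) > preset_cap_seconds:
        frame_sequences.pop(0)

    # Steps 305/306: drop leading non-key frames so the cache starts on a key frame.
    if frame_sequences:
        first_seq = frame_sequences[0]
        while first_seq and not first_seq[0].is_keyframe:
            first_seq.pop(0)
    return frame_sequences
```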
Step 307: save the cached live video data packets as the highlight segment.
Step 308: when the synthesis condition is met, decode the highlight segment and the live video.
Step 309: merge the decoded highlight segment with the decoded live video to obtain synthesized data.
Step 310: re-encode the synthesized data to obtain the highlight composite video.
Step 311: switch the output live video to the highlight composite video.
Steps 307 to 311 are identical to steps 204 to 207 in the second embodiment and are not repeated here.
In this embodiment, the length of the highlight segment is controlled, reducing the memory needed to save it, and the synthesis operations on the video can be reduced, improving the efficiency of obtaining the highlight composite video. In addition, when the duration of the cached live video data packets exceeds the preset upper limit, the cached data is discarded from front to back in units of video frame sequences, and the first video frame of the cached data is guaranteed to be a key frame; this ensures the highlight can be synthesized smoothly and plays normally, avoiding missing content in the played composite video.
The fourth embodiment of the present application relates to a method for generating live video highlights. The fourth embodiment improves on the third in that, in the fourth embodiment, multiple highlight segments are spliced into one highlight collection and the collection is edited. The specific flow is shown in FIG. 5.
Step 401: identify the live picture of the live video.
Step 402: determine whether a target image element exists in the live picture. If yes, proceed to step 403; if no, return to step 401.
Step 403: determine whether the playback duration of the cached live video data packets exceeds the preset upper limit; if yes, proceed to step 404; if no, proceed to step 407.
Step 404: discard the earliest video frame sequence in the cached live video data packets.
Step 405: determine whether the first video frame of the data in the cached live video data packets is a key frame; if yes, return to step 403; if no, proceed to step 406.
Step 406: discard the first video frame in the live video data packets.
Step 407: save the cached live video data packets as a highlight segment.
Steps 401 to 407 are identical to steps 301 to 307 in the third embodiment and are not repeated here.
Step 408: determine whether the synthesis condition is met; if yes, proceed to step 409; if no, return to step 401. Specifically, the synthesis moment is recognized by a pre-trained image recognition model; the synthesis condition may be a moment at which viewers pay no attention to the current live content, for example when the first round of a game broadcast has just ended and the second round is beginning. Users are usually still immersed in the previous round and pay little attention to the opening of the second. Before the synthesis moment is recognized, the live video may have produced one or more highlight segments, and the recognized segments may be non-consecutive in time; therefore, before they are merged with the live video, the multiple segments need to be processed in advance so that the user watches a continuous video without stuttering or jumping back.
Step 409: when there are multiple highlight segments, splice them into one highlight collection. Specifically, before splicing, each highlight segment can be edited: for example, each video frame of a segment can be cropped to remove unimportant parts of the picture and emphasize the exciting content, and other decorative elements, such as animation effects and narration text, can be added to make the pictures more vivid and interesting. After editing, the segments also need to be adjusted according to the timestamps of their video data; the playback duration can be reduced by adjusting the playback speed, making the content of the highlight more compact.
After each segment is processed, the multiple segments are spliced into one highlight collection. The playback order of the segments can be determined from each segment's timestamps, and splicing in the order the timestamps indicate makes the different segments connect more smoothly. For example, if the first highlight segment is edited from 3 min 10 s to 3 min 25 s of the live video and the second from 5 min 15 s to 5 min 40 s, splicing the two yields a 40-second highlight collection whose first 15 seconds are the first segment and whose last 25 seconds are the second; the second segment then plays immediately after the first finishes, improving the fluency of the transitions and guaranteeing continuous playback of the multiple segments.
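The timestamp repair amounts to re-basing each segment so that playback times run continuously across the spliced collection; the frame fields are assumptions, and segments are assumed non-empty.

```python
# Hypothetical sketch of timestamp repair while splicing: every segment's
# timestamps are shifted so the collection plays back continuously from 0.

def splice_highlights(segments):
    """segments: list of non-empty frame lists; each frame is assumed to
    carry a .timestamp in seconds. Segments are spliced in original order."""
    segments = sorted(segments, key=lambda seg: seg[0].timestamp)
    collection, next_start = [], 0.0
    for seg in segments:
        base = seg[0].timestamp
        for frame in seg:
            frame.timestamp = next_start + (frame.timestamp - base)
            collection.append(frame)
        # A real muxer would also add one frame interval here.
        next_start = collection[-1].timestamp
    return collection
```

With the example above, the first segment is re-based to 0-15 s and the second to 15-40 s, giving the 40-second collection described in the text.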
Step 410: decode the highlight collection and the live video.
Step 411: merge the decoded highlight collection with the decoded live video to obtain synthesized data.
Step 412: re-encode the synthesized data to obtain the highlight composite video.
Step 413: switch the output live video to the highlight composite video.
Steps 410 to 413 are identical to steps 308 to 311 in the third embodiment and are not repeated here.
In this embodiment, each highlight segment is edited; the edited segments undergo timestamp repair; and the timestamp-repaired segments are spliced into one highlight collection in timestamp order. This guarantees continuous playback of multiple highlight segments and improves the fluency of their transitions.
In practice, to guarantee the continuity of video playback, switching between the live video and the highlight composite video is required; the specific flow is shown in FIG. 6.
Step 601: output the live video.
Step 602: determine whether the switching time point is met; if yes, proceed to step 603, otherwise return to step 601. Specifically, when outputting the live video, the video to be broadcast can be cached in advance in the streaming media server. For example, after caching a live video queue 100 frames long in the cache queue, the first frame of the queue is output to the client. Suppose synthesis starts at frame 101 of the live video, i.e., the timestamp of the first video frame of the current highlight composite video equals the timestamp of frame 100 in the live video cache queue; then, to guarantee continuity of playback when switching the output live video to the composite, the switching time point is set to the timestamp of frame 100 of the live video. If, when synthesis completes, the streaming server is outputting frame 80 of the live video to the client, the switching time point is not yet met (the timestamp of frame 100 has not been reached), so the live video continues to play until the switching time point is met, at which point the currently output live video is switched to the highlight composite video.
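A minimal sketch of this switching logic, assuming the switch point is stored as the timestamp of the last live frame before the composite begins (frame 100 in the example above):

```python
# Hypothetical sketch of steps 601-603: output live frames until the frame
# whose timestamp reaches the switch point, then output the composite.

def output_with_switch(live_frames, composite_frames, switch_ts):
    for frame in live_frames:
        yield frame                       # step 601: output the live video
        if frame.timestamp >= switch_ts:  # step 602: switching time point met
            break
    for frame in composite_frames:        # step 603: output the composite
        yield frame
    # Steps 604-606: when the composite runs out (remaining duration zero),
    # the caller switches back to the live stream at its next key frame.
```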
Step 603: when the first video frame is a key frame, switch to the highlight composite video.
Step 604: determine whether the remaining time of the highlight composite video is zero; if yes, proceed to step 605; if no, proceed to step 606. Specifically, if the remaining duration of the composite is not zero, it has not finished playing and must continue to play; when the remaining duration is zero, the video must be switched to guarantee the continuity of playback and prevent a blank in playback.
Step 605: when the first video frame is a key frame, switch to the live video.
Step 606: continue outputting the highlight composite video until its remaining duration is zero.
The following takes a game broadcast as an example to explain the process of obtaining the highlight composite video; the processing of the live video data is shown in FIG. 7.
The audio and video data of the game broadcast is output to clients through the server, and the client provides the user with a window for watching the broadcast. The present application concerns the server's processing of the video data, so that the processed data can be played directly by the client for users to watch. When the server receives the audio and video data of the game broadcast, it puts the received audio and video packets into the cache queue, decodes the received game audio and video, and performs image recognition on the decoded video pictures with the first image recognition model. When a target image element is recognized, the live video playing at that moment is content the user cares about, for example a successful kill or brilliant positioning in the game; the pictures where such content appears are saved as highlight segments. When saving a highlight segment, whether the duration of the cached data in the queue (the buffered audio and video packets) exceeds the preset upper limit can be judged from the timestamps; if it does, part of the cached data is discarded so that the duration stays within the limit, and the cached data of the game broadcast is saved as the highlight segment. When multiple highlight segments are saved, they are edited and spliced into one highlight collection.
The obtained highlight segments or highlight collection need to be played at the synthesis moment. When the synthesis moment is recognized, the highlight segment or collection is decoded, the live video after the synthesis moment is decoded at the same time, and the decoded live video is merged with the highlight segment (or collection) into the highlight composite video, which contains both the highlights and the live video. When the live video output by the server reaches the synthesis moment, the output is switched to the composite, so the user watches highlights that have already been played while watching the broadcast. In addition, if the user wants to focus on the highlights, the highlights can serve as the background of the played video with the live video shown in a small window; likewise, when the user focuses on the live video, the live video serves as the background with the highlights shown in a small window; the two can also be displayed side by side. The specific display mode depends on the circumstances and is not limited here.
When the highlight composite video finishes playing, i.e., its remaining time is zero, the server switches the output composite back to the live video to guarantee the continuity of playback.
The step divisions of the above methods are only for clarity of description; in implementation, steps may be combined into one or a step may be split into several, and as long as the same logical relationship is included they fall within the protection scope of this patent; adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow also falls within the protection scope of this patent.
The fifth embodiment of the present application relates to an apparatus for generating live video highlights. As shown in FIG. 8, it includes: a first recognition module 81, a storage module 82, a second recognition module 83, a synthesis module 84, and an output module 85. The first recognition module 81 is used to identify the live picture when the live video is output and determine whether a target image element exists in the live picture; the storage module 82 is used to save the segment of live video containing the live picture as a highlight segment when the first recognition module recognizes a target image element in the live picture; the second recognition module 83 is used to identify whether the synthesis condition is met; the synthesis module 84 is used to merge the highlight segments into the live video according to the synthesis condition when the condition is met, to obtain the highlight composite video; and the output module 85 is used to output the live video and, after the highlight composite video is obtained, to switch the output video from the live video to the highlight composite video.
In addition, the apparatus for generating live video highlights may also include a decoding module and an encoding module; the decoding module is used to decode the highlight segments and the live video; the synthesis module is specifically used to merge the decoded highlight segments with the decoded live video to obtain synthesized data; and the encoding module is used to re-encode the synthesized data to obtain the highlight composite video.
In addition, the synthesized data may include video synthesis data and audio synthesis data; the synthesis module is specifically used to merge each video frame of the decoded highlight segment with each video frame of the decoded live video in a preset mode to obtain the video synthesis data, and to mix the decoded audio stream of the highlight segment with the decoded audio stream of the live video to obtain the audio synthesis data.
In addition, the apparatus may further include a cache module; the cache module is used to cache live video data packets of the live video in real time, and the storage module is specifically used to save the currently cached live video data packets as a highlight segment.
In addition, the storage module is also used to discard part of the cached live video data packets when the playback duration of the currently cached packets exceeds the preset upper limit, and to save the packets remaining after the discard as a highlight segment.
In addition, when discarding part of the cached live video data packets, the storage module is specifically used to discard the video frame sequences in the currently cached packets one by one, in the playback order of the packets from front to back, until the playback duration of the remaining cached packets no longer exceeds the preset upper limit.
In addition, the live video data packets stored by the storage module are specifically cached live video data packets whose first video frame is a key frame.
In addition, the apparatus may further include a splicing module; the splicing module is used to splice multiple highlight segments into one highlight collection when the number of highlight segments is greater than one, and the synthesis module is specifically used to merge the highlight collection into the live video.
In addition, the splicing module is also used to edit each highlight segment, perform timestamp repair on the edited segments, and splice the timestamp-repaired segments into one highlight collection in timestamp order.
In addition, the output module is specifically used to switch the output video from the live video to the highlight composite video when the timestamp of the live video meets the switching time point.
In addition, the output module is also used to switch the output highlight composite video to the live video when the remaining time of the highlight composite video is zero.
In addition, the output module is also used to switch the output highlight composite video to the live video when the remaining duration of the highlight composite video is zero and the first video frame of the live video is determined to be a key frame.
In addition, the first recognition module specifically identifies the live picture according to a pre-established image recognition model and determines whether a target image element exists in the live picture, where the image recognition model is trained on collected image features.
In addition, the first recognition module is specifically used to obtain a designated area in the live picture and, according to the pre-established image recognition model, identify whether a target image element exists in the designated area; if a target image element exists in the designated area, it determines that a target image element exists in the live picture.
In addition, when identifying whether a target image element exists in the designated area, the first recognition module is specifically used to crop a target region from the designated area according to the size of the target image element; input the feature values of the target region into the image recognition model and determine from the model's output whether the target image element exists in the target region; and, if it does not, offset the position of the target region according to a preset strategy and identify whether the target image element exists in the offset target region.
In addition, the first recognition module is also used to obtain a grayscale image of the designated area and identify, according to the pre-established image recognition model, whether a target image element exists in the grayscale image.
Compared with the prior art, with the embodiments of the present application users watch, while watching the live broadcast, highlight segments that the live video has already played, which improves the timeliness of highlight playback; and no manual identification of highlight segments is needed during editing, which saves considerable human resources and solves the inefficiency of manual editing.
It is not difficult to see that this embodiment is an apparatus embodiment corresponding to the first embodiment and can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the related technical details mentioned in this embodiment can also be applied in the first embodiment.
It is worth mentioning that the modules involved in this embodiment are all logical modules; in practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. Moreover, to highlight the innovative part of the present application, this embodiment does not introduce units that are not closely related to solving the technical problem proposed by the present application, but this does not mean that no other units exist in this embodiment.
The sixth embodiment of the present application relates to a server. As shown in FIG. 9, it includes at least one processor 901 and a memory 902 communicatively connected to the at least one processor 901; the memory 902 stores instructions executable by the at least one processor 901, and the instructions are executed by the at least one processor 901 so that the at least one processor 901 can execute the above method for generating live video highlights.
The memory and the processor are connected by a bus; the bus may include any number of interconnected buses and bridges and connects one or more processors and the various circuits of the memory together. The bus may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are all well known in the art and therefore not described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. The data processed by the processor is transmitted over the wireless medium through the antenna; further, the antenna also receives data and transfers it to the processor.
The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions; the memory can be used to store data used by the processor when performing operations.
The seventh embodiment of the present application relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.
That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program; the program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
Those of ordinary skill in the art can understand that the above embodiments are specific embodiments for implementing the present application, and in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present application.

Claims (19)

  1. A method for generating live video highlights, comprising:
    identifying a live picture of a live video and determining whether a target image element exists in the live picture;
    if a target image element exists in the live picture, saving the segment of live video containing the live picture as a highlight segment;
    when a synthesis condition is met, merging the highlight segment into the live video according to the synthesis condition to obtain a highlight composite video;
    switching the output live video to the highlight composite video.
  2. The method for generating live video highlights according to claim 1, wherein merging the highlight segment into the live video to obtain the highlight composite video specifically comprises:
    decoding the highlight segment, and decoding the live video;
    merging the decoded highlight segment with the decoded live video to obtain synthesized data;
    re-encoding the synthesized data to obtain the highlight composite video.
  3. The method for generating live video highlights according to claim 2, wherein the synthesized data comprises: video synthesis data and audio synthesis data;
    merging the decoded highlight segment with the decoded live video specifically comprises:
    merging each video frame of the decoded highlight segment with each video frame of the decoded live video in a preset mode to obtain the video synthesis data;
    mixing the decoded audio stream of the highlight segment with the decoded audio stream of the live video to obtain the audio synthesis data.
  4. The method for generating live video highlights according to claim 1, wherein, before determining whether a target image element exists in the live picture, the method further comprises:
    caching live video data packets of the live video in real time;
    saving the segment of live video containing the live picture as a highlight segment specifically comprises:
    saving the currently cached live video data packets as the highlight segment.
  5. The method for generating live video highlights according to claim 4, wherein, before saving the currently cached live video data packets as the highlight segment, the method further comprises:
    determining whether the playback duration of the currently cached live video data packets exceeds a preset upper limit;
    if it exceeds the preset upper limit, discarding part of the cached live video data packets.
  6. The method for generating live video highlights according to claim 5, wherein discarding part of the cached live video data packets specifically comprises:
    discarding the video frame sequences in the currently cached live video data packets one by one, in the playback order of the live video data packets from front to back, until the playback duration of the remaining cached live video data packets no longer exceeds the preset upper limit.
  7. The method for generating live video highlights according to claim 4, wherein the live video data packets saved as the highlight segment are specifically: cached live video data packets whose first video frame is a key frame.
  8. The method for generating live video highlights according to claim 1, wherein, before merging the highlight segment into the live video according to the synthesis condition, the method further comprises:
    if more than one highlight segment is saved, splicing the multiple highlight segments into one highlight collection;
    merging the highlight segment into the live video is specifically: merging the highlight collection into the live video.
  9. The method for generating live video highlights according to claim 8, wherein splicing the multiple highlight segments into one highlight collection specifically comprises:
    editing each highlight segment;
    performing timestamp repair on each edited highlight segment;
    splicing the timestamp-repaired highlight segments into one highlight collection in timestamp order.
  10. The method for generating live video highlights according to any one of claims 1 to 9, wherein switching the output live video to the highlight composite video specifically comprises:
    determining, according to the timestamp of the live video, whether the switching time point is met;
    when the switching time point is met, switching the output video from the live video to the highlight composite video.
  11. The method for generating live video highlights according to any one of claims 1 to 9, wherein, after switching the output live video to the highlight composite video, the method further comprises:
    determining the remaining duration of the highlight composite video;
    when the remaining duration is zero, switching the output highlight composite video to the live video.
  12. The method for generating live video highlights according to any one of claims 1 to 9, wherein, after switching the output live video to the highlight composite video, the method further comprises:
    determining the remaining duration of the highlight composite video;
    when the remaining duration of the highlight composite video is zero, determining whether the first video frame of the live video is a key frame;
    when the first video frame of the live video is a key frame, switching the output highlight composite video to the live video.
  13. The method for generating live video highlights according to any one of claims 1 to 9, wherein identifying the live picture and determining whether a target image element exists in the live picture specifically comprises:
    identifying the live picture according to a pre-established image recognition model and determining whether a target image element exists in the live picture, wherein the image recognition model is trained on collected image features.
  14. The method for generating live video highlights according to claim 13, wherein identifying the live picture according to the pre-established image recognition model and determining whether a target image element exists in the live picture specifically comprises:
    obtaining a designated area in the live picture;
    identifying, according to the pre-established image recognition model, whether a target image element exists in the designated area;
    if a target image element exists in the designated area, determining that a target image element exists in the live picture.
  15. The method for generating live video highlights according to claim 14, wherein identifying whether a target image element exists in the designated area specifically comprises:
    cropping a target region from the designated area according to the size of the target image element;
    inputting the feature values of the target region into the image recognition model, and determining, according to the output of the image recognition model, whether the target image element exists in the target region;
    if the target image element does not exist in the target region, offsetting the position of the target region according to a preset strategy, and identifying whether the target image element exists in the offset target region.
  16. The method for generating live video highlights according to claim 14, wherein, after obtaining the designated area in the live picture, the method further comprises obtaining a grayscale image of the designated area;
    identifying, according to the pre-established image recognition model, whether a target image element exists in the designated area is specifically: identifying, according to the pre-established image recognition model, whether a target image element exists in the grayscale image.
  17. An apparatus for generating live video highlights, comprising: a first recognition module, a storage module, a second recognition module, a synthesis module, and an output module;
    the first recognition module is used to identify a live picture of a live video and determine whether a target image element exists in the live picture;
    the storage module is used to save the segment of live video containing the live picture as a highlight segment when a target image element is recognized in the live picture;
    the second recognition module is used to identify whether a synthesis condition is met;
    the synthesis module is used to merge the highlight segment into the live video according to the synthesis condition when the synthesis condition is met, to obtain a highlight composite video;
    the output module is used to switch the output live video to the highlight composite video.
  18. A server, comprising:
    at least one processor; and,
    a memory communicatively connected to the at least one processor; wherein,
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method for generating live video highlights according to any one of claims 1 to 16.
  19. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for generating live video highlights according to any one of claims 1 to 16.
PCT/CN2019/086049 2019-04-02 2019-05-08 Method, apparatus, server, and storage medium for generating live video highlights WO2020199303A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19765403.1A EP3739888A1 (en) 2019-04-02 2019-05-08 Live stream video highlight generation method and apparatus, server, and storage medium
US16/569,866 US11025964B2 (en) 2019-04-02 2019-09-13 Method, apparatus, server, and storage medium for generating live broadcast video of highlight collection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910262673.7A 2019-04-02 2019-04-02 Method, apparatus, server, and storage medium for generating live video highlights
CN201910262673.7 2019-04-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/569,866 Continuation US11025964B2 (en) 2019-04-02 2019-09-13 Method, apparatus, server, and storage medium for generating live broadcast video of highlight collection

Publications (1)

Publication Number Publication Date
WO2020199303A1 true WO2020199303A1 (zh) 2020-10-08

Family

ID=66902961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/086049 WO2020199303A1 (zh) Method, apparatus, server, and storage medium for generating live video highlights 2019-04-02 2019-05-08

Country Status (3)

Country Link
EP (1) EP3739888A1 (zh)
CN (1) CN109862388A (zh)
WO (1) WO2020199303A1 (zh)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN110337027A (zh) * 2019-07-11 2019-10-15 北京字节跳动网络技术有限公司 Video generation method and apparatus, and electronic device
  • CN110505234B (zh) * 2019-08-30 2022-07-26 湖南快乐阳光互动娱乐传媒有限公司 Live-to-on-demand conversion method and apparatus based on a B/S architecture
  • CN112533023B (zh) * 2019-09-19 2022-02-25 聚好看科技股份有限公司 Method for generating co-hosted chorus works, and display device
  • CN110830847B (zh) * 2019-10-24 2022-05-06 杭州威佩网络科技有限公司 Method and apparatus for clipping match video segments, and electronic device
  • CN110798699A (zh) * 2019-11-27 2020-02-14 北京翔云颐康科技发展有限公司 Live video streaming method and apparatus
  • CN111212321A (zh) * 2020-01-10 2020-05-29 上海摩象网络科技有限公司 Video processing method, apparatus and device, and computer storage medium
  • CN111212299B (zh) * 2020-01-16 2022-02-11 广州酷狗计算机科技有限公司 Method, apparatus, server, and storage medium for obtaining live video tutorials
  • CN111246126A (zh) * 2020-03-11 2020-06-05 广州虎牙科技有限公司 Broadcast-directing switching method, system, apparatus, device, and medium based on a live streaming platform
  • CN111770359B (zh) * 2020-06-03 2022-10-11 苏宁云计算有限公司 Event video editing method and system, and computer-readable storage medium
  • CN111800648A (zh) * 2020-06-30 2020-10-20 北京玩在一起科技有限公司 Method and system for editing live streams of e-sports events
  • CN112153469A (zh) * 2020-09-07 2020-12-29 北京达佳互联信息技术有限公司 Multimedia resource playback method, apparatus, terminal, server, and storage medium
  • CN112188221B (zh) * 2020-09-24 2023-04-07 广州虎牙科技有限公司 Playback control method and apparatus, computer device, and storage medium
  • CN112261425B (zh) * 2020-10-20 2022-07-12 成都中科大旗软件股份有限公司 Method and system for live video streaming and recorded playback
  • CN112672218A (zh) * 2020-12-16 2021-04-16 福州凌云数据科技有限公司 Editing method for generating videos in batches
  • CN114697700A (zh) * 2020-12-28 2022-07-01 北京小米移动软件有限公司 Video editing method, video editing apparatus, and storage medium
  • CN115119044B (zh) * 2021-03-18 2024-01-05 阿里巴巴新加坡控股有限公司 Video processing method, device, and system, and computer storage medium
  • CN113286157A (zh) * 2021-04-06 2021-08-20 北京达佳互联信息技术有限公司 Video playback method and apparatus, electronic device, and storage medium
  • CN112804556B (zh) * 2021-04-08 2021-06-22 广州无界互动网络科技有限公司 PC-oriented live data processing method, apparatus, and system
  • CN113329263B (zh) * 2021-05-28 2023-10-17 努比亚技术有限公司 Game video highlight production method and device, and computer-readable storage medium
  • CN113507624B (zh) * 2021-09-10 2021-12-21 明品云(北京)数据科技有限公司 Video information recommendation method and system
  • CN114257875B (zh) * 2021-12-16 2024-04-09 广州博冠信息科技有限公司 Data transmission method and apparatus, electronic device, and storage medium
  • CN114501058A (zh) * 2021-12-24 2022-05-13 北京达佳互联信息技术有限公司 Video generation method and apparatus, electronic device, and storage medium
  • CN114007084B (zh) * 2022-01-04 2022-09-09 秒影工场(北京)科技有限公司 Cloud storage method and apparatus for video segments
  • CN114339309A (zh) * 2022-01-11 2022-04-12 北京易智时代数字科技有限公司 Video-based content addition method
  • CN114827454B (zh) * 2022-03-15 2023-10-24 荣耀终端有限公司 Video acquisition method and apparatus
  • CN115802101A (zh) * 2022-11-25 2023-03-14 深圳创维-RGB电子有限公司 Short video generation method and apparatus, electronic device, and storage medium
  • CN116095355B (zh) * 2023-01-18 2024-06-21 百果园技术(新加坡)有限公司 Video display control method and apparatus therefor, device, medium, and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310894A1 (en) * 2014-04-23 2015-10-29 Daniel Stieglitz Automated video logging methods and systems
CN108540854A (zh) * 2018-03-29 2018-09-14 努比亚技术有限公司 直播视频剪辑方法、终端及计算机可读存储介质
CN108769801A (zh) * 2018-05-28 2018-11-06 广州虎牙信息科技有限公司 短视频的合成方法、装置、设备及存储介质
CN109089128A (zh) * 2018-07-10 2018-12-25 武汉斗鱼网络科技有限公司 一种视频处理方法、装置、设备及介质
CN109194978A (zh) * 2018-10-15 2019-01-11 广州虎牙信息科技有限公司 直播视频剪辑方法、装置和电子设备

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030056213A1 (en) * 2001-05-16 2003-03-20 Mcfaddin James E. Method and system for delivering a composite information stream over a computer network
CN101751926B (zh) * 2008-12-10 2012-07-04 华为技术有限公司 Signal encoding and decoding methods and apparatuses, and codec system
CN102547214B (zh) * 2012-02-22 2013-05-29 腾讯科技(深圳)有限公司 Video encoding method and terminal for multi-party video communication
CN103605953B (zh) * 2013-10-31 2018-06-19 电子科技大学 Vehicle object-of-interest detection method based on sliding-window search
CN105744292B (zh) * 2016-02-02 2017-10-17 广东欧珀移动通信有限公司 Video data processing method and apparatus
US10390082B2 (en) * 2016-04-01 2019-08-20 Oath Inc. Computerized system and method for automatically detecting and rendering highlights from streaming videos
CN107682717B (zh) * 2017-08-29 2021-07-20 百度在线网络技术(北京)有限公司 Service recommendation method, apparatus, device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310894A1 (en) * 2014-04-23 2015-10-29 Daniel Stieglitz Automated video logging methods and systems
CN108540854A (zh) * 2018-03-29 2018-09-14 努比亚技术有限公司 Live video editing method, terminal, and computer-readable storage medium
CN108769801A (zh) * 2018-05-28 2018-11-06 广州虎牙信息科技有限公司 Short video synthesis method, apparatus and device, and storage medium
CN109089128A (zh) * 2018-07-10 2018-12-25 武汉斗鱼网络科技有限公司 Video processing method, apparatus, device, and medium
CN109194978A (zh) * 2018-10-15 2019-01-11 广州虎牙信息科技有限公司 Live video editing method and apparatus, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3739888A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112511889A (zh) * 2020-11-17 2021-03-16 北京达佳互联信息技术有限公司 Video playback method, apparatus, terminal, and storage medium
CN112929696A (zh) * 2021-01-26 2021-06-08 广州欢网科技有限责任公司 Method and apparatus for splicing multi-episode film and television content, storage medium, and electronic device
CN114727138A (zh) * 2022-03-31 2022-07-08 大众问问(北京)信息科技有限公司 Commodity information processing method and apparatus, and computer device
CN114727138B (zh) * 2022-03-31 2023-12-19 大众问问(北京)信息科技有限公司 Commodity information processing method and apparatus, and computer device
CN115022663A (zh) * 2022-06-15 2022-09-06 北京奇艺世纪科技有限公司 Live stream processing method and apparatus, electronic device, and medium

Also Published As

Publication number Publication date
EP3739888A4 (en) 2020-11-18
EP3739888A1 (en) 2020-11-18
CN109862388A (zh) 2019-06-07

Similar Documents

Publication Publication Date Title
WO2020199303A1 (zh) Method, apparatus, server, and storage medium for generating live video highlights
US10581947B2 (en) Video production system with DVE feature
US11025964B2 (en) Method, apparatus, server, and storage medium for generating live broadcast video of highlight collection
CN106921866B (zh) Multi-video broadcast-directing method and device for assisting live streaming
CN106060578B (zh) Method and system for generating video data
CN108282598B (zh) Software broadcast-directing system and method
JP6267961B2 (ja) Video providing method and transmission device
JP3198980B2 (ja) Image display device and moving image retrieval system
CN101778257B (zh) Method for generating video summary segments in digital video-on-demand
CN107105315A (zh) Live streaming method, live streaming method for an anchor client, anchor client, and device
US20020051081A1 (en) Special reproduction control information describing method, special reproduction control information creating apparatus and method therefor, and video reproduction apparatus and method therefor
KR102246305B1 (ko) Method, apparatus, and system for providing an augmented media service
CN109618179A (zh) Fast playback-start method and apparatus for ultra-high-definition live video
CN109891896A (zh) Anchors for live streams
US11037603B1 (en) Computing system with DVE template selection and video content item generation feature
US10257436B1 (en) Method for using deep learning for facilitating real-time view switching and video editing on computing devices
US11451858B2 (en) Method and system of processing information flow and method of displaying comment information
CN106060627A (zh) Audio processing method and apparatus based on multi-channel live streaming
CN110536164A (zh) Display method, video data processing method, and related devices
CN107592549A (zh) Panoramic video playback and photographing system based on two-way communication
CN110996021A (zh) Broadcast-directing switching method, electronic device, and computer-readable storage medium
KR102069897B1 (ko) Method for generating user video and apparatus therefor
KR102313309B1 (ko) Customer-customized live broadcasting system
CN115734007B (zh) Video editing method, apparatus, and medium, and video processing system
CN116801006A (zh) Method, device, and storage medium for merging co-streaming live feeds

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019765403

Country of ref document: EP

Effective date: 20190917

NENP Non-entry into the national phase

Ref country code: DE