WO2021017496A1 - Directing method, device and computer-readable storage medium - Google Patents

Directing method, device and computer-readable storage medium

Info

Publication number
WO2021017496A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
event
preset
editing strategy
guide
Prior art date
Application number
PCT/CN2020/080867
Other languages
English (en)
French (fr)
Inventor
梅涛
左佳伟
姚霆
王林芳
刘武
徐俊
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司, 北京京东世纪贸易有限公司
Publication of WO2021017496A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/28: Mobile studios
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Definitions

  • the present disclosure relates to the field of multimedia technology, and in particular, to a broadcast guide method, device and computer-readable storage medium.
  • In the related art, cameramen collect video images of the event site and transmit the images from multiple camera positions at the event site to the directing vehicle.
  • The director team on the directing vehicle then edits the multi-camera video images based on professional knowledge and experience to form the guide video.
  • a technical problem solved by the present disclosure is how to automatically guide the video images at the event site.
  • a directing method is provided, including: performing video analysis on the video images of the event site to detect whether a preset event occurs at the event site; when a preset event is detected at the event site, looking up an event and editing strategy mapping table to determine the video editing strategy corresponding to the preset event; and editing the video images according to the video editing strategy to obtain the guide video.
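The detect-then-look-up loop described above can be sketched as follows. This is a minimal illustration only: the event names, strategy contents, and the `detect_event` callable are assumptions for the sketch, not part of the disclosure.

```python
# Minimal sketch of the directing loop: detect a preset event in the live
# video, look up the event-to-editing-strategy mapping table, and return
# the strategy to apply. All names below are illustrative.

EVENT_TO_STRATEGY = {
    "goal": ["goal scorer", "goalkeeper", "coach", "audience"],
    "foul": ["slow motion A", "slow motion B", "slow motion C"],
}

def direct_step(frames, detect_event):
    """One iteration of the directing loop.

    frames: the current video frames from the event site.
    detect_event: video-analysis callable returning an event name or None.
    Returns the editing strategy to apply, or None (keep the host camera).
    """
    event = detect_event(frames)            # video analysis step
    if event is None:
        return None                         # no preset event detected
    return EVENT_TO_STRATEGY.get(event)     # mapping-table lookup
```

In a real system `detect_event` would wrap the pre-trained neural network described later, and the table would be mined from historical guide videos.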
  • the video editing strategy is a sequence of multiple targets; editing the video images according to the video editing strategy to obtain the guide video includes: determining, among the multiple video streams at the event site, the video images containing each target; and playing the video images containing each target in turn according to the sequence to form the guide video.
  • the method further includes: before sequentially playing the video images containing each target, respectively adjusting the video images containing each target to a close-up shot of the corresponding target.
  • the video editing strategy is slow-motion playback of the preset event from different perspectives; editing the video images according to the video editing strategy to obtain the guide video includes: determining, among the multiple video streams at the event site, the video images from different perspectives that contain the preset event; and sequentially playing the slow-motion replay of the preset event in those video images to form the guide video.
  • sequentially playing the slow-motion replay of the preset event in video images from different perspectives includes: determining the start time of the preset event; performing video analysis on the video images of the event site to determine the end time of the preset event; and playing back the preset event in slow motion from the different perspectives over the interval between the start time and the end time.
  • the video editing strategy is a visual special effect; editing the video images according to the video editing strategy to obtain the guide video includes: detecting the position information, in the video image, of the target associated with the preset event; and rendering the visual special effect in the video image according to the position information to obtain the guide video.
  • rendering the visual special effect in the video image according to the position information includes: rendering the visual special effect into the video image on the server according to the position information and transmitting the rendered video image to the client; or transmitting the position information, the video image and the identifier of the visual special effect to the client, and rendering the visual special effect in the video image on the client according to the position information and the identifier.
  • the method further includes: extracting preset events and their corresponding optional video editing strategies from existing guide videos; processing the preset events and their optional video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each preset event; and using the preset events and their corresponding video editing strategies to construct the event and editing strategy mapping table.
  • performing video analysis on the video images of the event site to detect whether a preset event occurs includes: inputting a single video stream of the event site into a pre-trained neural network for video analysis, or synchronously inputting multiple video streams of the event site into the pre-trained neural network for video analysis, to obtain the probability that a preset event occurs at the event site; and judging, according to the probability, whether a preset event has occurred at the event site.
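The probability-then-judgment step can be sketched as below. The network itself is stubbed out as a dict of per-event scores, and the 0.8 threshold is an assumed value, not one stated in the disclosure.

```python
# Hedged sketch of the detection step: the pre-trained network is replaced
# by a dict of per-event probabilities, and a threshold decides whether a
# preset event occurred at the event site.

def detect_preset_event(event_scores, threshold=0.8):
    """event_scores: dict mapping event name -> network output probability.
    Returns the most probable event if it clears the threshold, else None."""
    if not event_scores:
        return None
    event, score = max(event_scores.items(), key=lambda kv: kv[1])
    return event if score >= threshold else None
```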
  • it further includes: using the current time to look up the time and editing strategy mapping table to determine the video editing strategy corresponding to the current time.
  • it further includes: extracting the optional video editing strategies corresponding to each time period from existing guide videos; processing each time period and its optional video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each time period; and using each time period and its corresponding editing strategy to construct the time and editing strategy mapping table.
  • the method further includes: determining the target detection frame of the movable target in the video image; and adjusting the position and/or angle of the camera according to the position of the target detection frame, so that the target detection frame is located in a preset area of the video image.
  • the method further includes: adjusting the zoom factor of the camera according to the area of the target detection frame, so that the area of the target detection frame is within the preset numerical range.
  • it further includes: adopting an automatic color equalization algorithm to adjust the color and brightness of the video image; and adopting an adaptive contrast enhancement algorithm to adjust the contrast of the video image.
  • a directing device is provided, including: an event detection module configured to perform video analysis on the video images of the event site to detect whether a preset event occurs at the event site; a mapping table search module configured to, when a preset event is detected at the event site, look up the event and editing strategy mapping table to determine the video editing strategy corresponding to the preset event; and a video editing module configured to edit the video images according to the video editing strategy to obtain the guide video.
  • the video editing strategy is a sequence of multiple targets; the video editing module is configured to: determine, among the multiple video streams at the event site, the video images containing each target; and play the video images containing each target in turn according to the sequence to form the guide video.
  • the video editing module is further configured to: before sequentially playing the video images containing each target, adjust the video images containing each target to a close-up shot of the corresponding target.
  • the video editing strategy is slow-motion playback of the preset event from different perspectives; the video editing module is configured to: determine, among the multiple video streams at the event site, the video images from different perspectives that contain the preset event; and sequentially play the slow-motion replay of the preset event in those video images to form the guide video.
  • the video editing module is configured to: determine the start time at which the preset event occurs; perform video analysis on the video images of the event site to determine the end time of the preset event; and play back the preset event in slow motion from the different perspectives over the interval between the start time and the end time.
  • the video editing strategy is a visual special effect; the video editing module is configured to: detect the position information, in the video image, of the target associated with the preset event; and render the visual special effect in the video image according to the position information to obtain the guide video.
  • the video editing module is configured to: render the visual special effect into the video image on the server according to the position information and transmit the rendered video image to the client; or transmit the position information, the video image and the identifier of the visual special effect to the client, and render the visual special effect in the video image on the client according to the position information and the identifier.
  • it further includes an event and editing strategy mapping table building module configured to: extract preset events and their corresponding optional video editing strategies from existing guide videos; process the preset events and their optional video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each preset event; and use the preset events and their corresponding video editing strategies to construct the event and editing strategy mapping table.
  • the event detection module is configured to: input a single video stream of the event site into a pre-trained neural network for video analysis, or synchronously input multiple video streams of the event site into the pre-trained neural network for video analysis, to obtain the probability that a preset event occurs at the event site; and judge, according to the probability, whether a preset event has occurred at the event site.
  • the mapping table search module is further configured to use the current time to search the time and editing strategy mapping table to determine the video editing strategy corresponding to the current time.
  • it also includes a time and editing strategy mapping table building module configured to: extract the optional video editing strategies corresponding to each time period from existing guide videos; process each time period and its optional video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each time period; and use each time period and its corresponding editing strategy to construct the time and editing strategy mapping table.
  • it further includes a camera adjustment module configured to: determine the target detection frame of the movable target in the video image; and adjust the position and/or angle of the camera according to the position of the target detection frame, so that the target detection frame is located in a preset area of the video image.
  • the camera adjustment module is further configured to adjust the zoom factor of the camera according to the area of the target detection frame, so that the area of the target detection frame is within a preset value range.
  • it further includes a picture adjustment module configured to: use an automatic color equalization algorithm to adjust the color and brightness of the video picture; and use an adaptive contrast enhancement algorithm to adjust the contrast of the video picture.
  • a broadcasting directing device including: a memory; and a processor coupled to the memory, the processor being configured to execute the aforementioned directing method based on instructions stored in the memory.
  • a computer-readable storage medium wherein the computer-readable storage medium stores computer instructions, and the instructions are executed by a processor to implement the aforementioned broadcasting directing method.
  • the present disclosure can automatically guide the video images at the event site, reduce the dependence on directors, and reduce the labor cost in the guide process.
  • Fig. 1 shows a schematic flow chart of a directing method of some embodiments of the present disclosure.
  • Fig. 2 shows a schematic flow chart of a directing method according to other embodiments of the present disclosure.
  • Fig. 3 shows a schematic structural diagram of a broadcasting guide device according to some embodiments of the present disclosure.
  • Fig. 4 shows a schematic structural diagram of a broadcasting guide device according to other embodiments of the present disclosure.
  • The inventor found through research that the traditional directing method incurs high labor and time costs.
  • For example, directing an event usually requires 40 to 60 staff, and the director team needs to deploy and prepare for the event 1 to 3 days in advance, which takes a long time.
  • the present disclosure provides a directing method that can automatically direct the video images of the event site, reduce the dependence on the director, and reduce the labor cost in the directing process.
  • Fig. 1 shows a schematic flow chart of a directing method of some embodiments of the present disclosure. As shown in Figure 1, this embodiment includes steps S101 to S105.
  • In step S101, video analysis is performed on the video images of the event site to detect whether a preset event occurs at the event site.
  • The neural network is usually a convolutional neural network, on which computer vision algorithms such as face detection, human tracking, object detection, video scene recognition and video action/event recognition can be run to detect the persons, actions, events and so on in the video images.
  • the position coordinates of the face, the human body, and the specified object can be detected, so as to detect the preset event occurring at the event scene in real time.
  • the preset events can be goals, fouls, offsides, free kicks, and so on.
  • If no preset event is detected at the event site, the method returns to step S101; if a preset event is detected at the event site, step S103 is executed.
  • In step S103, the event and editing strategy mapping table is searched to determine the video editing strategy corresponding to the preset event.
  • the event and editing strategy mapping table contains the mapping relationship between different preset events and different video editing strategies.
  • For example, the video editing strategy corresponding to the preset event "goal" is the camera switching strategy "(host position ->) goal scorer -> goalkeeper -> coach -> audience (-> host position)";
  • the video editing strategy corresponding to "foul" is the slow-motion replay strategy "(host position ->) foul slow motion from perspective A -> foul slow motion from perspective B -> foul slow motion from perspective C (-> host position)";
  • and the video editing strategy corresponding to the preset event "free kick" is the rendering special effect "add an arrow from the football to the goal and display the distance".
  • When constructing the mapping table, all related operations of the human director, such as switching shots and making slow motion, can be recorded, together with the time point of each video switch and the identifier of the camera position switched to.
  • the existing guide video can be divided into several video segments, and video analysis algorithms such as video actions, events, and person recognition can be used to label each video segment.
  • a data mining algorithm is used to process the preset event and its corresponding optional video editing strategy, and the video editing strategy corresponding to the preset event is obtained.
  • For example, for the preset event "foul", the optional video editing strategies include "foul slow motion from perspective A -> B -> C -> D", "foul slow motion from perspective A -> B -> C -> E", and so on.
  • After data mining, the video editing strategy corresponding to the preset event "foul" is "foul slow motion from perspective A -> B -> C".
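One simple realization of this data-mining step is to keep, for each preset event, the strategy most frequently observed in historical guide videos. This is a sketch under that assumption (real systems might use frequent-sequence mining instead), and the observed data below is made up.

```python
# Pick the most frequent editing strategy observed for one preset event.
from collections import Counter

def mine_strategy(observed_strategies):
    """observed_strategies: list of strategy tuples seen for one event
    in historical guide videos. Returns the most frequent one as a list."""
    best, _count = Counter(observed_strategies).most_common(1)[0]
    return list(best)

# Made-up observations for the preset event "foul".
observed_foul = [
    ("slow A", "slow B", "slow C"),
    ("slow A", "slow B", "slow C", "slow D"),
    ("slow A", "slow B", "slow C"),
]
```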
  • In step S105, the video images are edited according to the video editing strategy to obtain the guide video.
  • Step S105 is described below for three situations.
  • the video editing strategy is a sequence of multiple targets.
  • Take the video editing strategy corresponding to the preset event "goal", i.e. the camera switching strategy "(host position ->) goal scorer -> goalkeeper -> coach -> audience (-> host position)", as an example.
  • the video images containing each target are determined respectively.
  • Suppose the goal scorer is detected in the video image of camera A,
  • the goalkeeper is detected in the video image of camera B,
  • the coach is detected in the video image of camera C,
  • and the audience is detected in the video image of camera D.
  • According to the sequence, the video images containing each target are played in turn: the output is switched from the host position to camera A, camera B, camera C and camera D, and then back to the host position, forming the guide video.
  • the playing time of each camera position can be set according to actual needs.
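The camera-switching playback above can be sketched as a schedule of (camera, seconds) pairs. The camera assignments and the 3-second shot length are illustrative assumptions; as the text notes, the playing time of each position would be set per actual needs.

```python
# Build the switch schedule for a target-sequence editing strategy.

def build_switch_schedule(target_to_camera, target_order, seconds_per_shot=3):
    """Map each target in the strategy order to its camera and shot length."""
    return [(target_to_camera[t], seconds_per_shot) for t in target_order]

cameras = {"goal scorer": "A", "goalkeeper": "B", "coach": "C", "audience": "D"}
order = ["goal scorer", "goalkeeper", "coach", "audience"]
```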
  • the video images containing the various targets may be adjusted to the close-up shots of the corresponding targets.
  • For example, the target detection frame of the goalkeeper in the video image is obtained through video analysis, and the zoom factor of the camera is then adjusted according to the area of the detection frame so that the area falls within a preset numerical range, such as about 50% of the video image area.
  • the video editing strategy is slow motion playback of preset events from different perspectives.
  • Again take the video editing strategy corresponding to the preset event "foul", i.e. the slow-motion replay strategy "(host position ->) foul slow motion from perspective A -> perspective B -> perspective C (-> host position)", as an example.
  • First, among the multiple video streams at the event site, the video images from different perspectives that contain the preset event are determined. Suppose the preset event "foul" is detected in the video images provided by the three cameras A, B and C. Then the slow-motion replay of the preset event in the video images from these different perspectives is played in sequence to form the guide video.
  • The start time of the preset event is the timestamp in the video at which the preset event is first detected. Video analysis is then performed on the video images of the event site to determine the end time of the preset event: if the preset event "foul" is detected in a video frame M but not in the next frame N, the timestamp of frame M can be recorded as the end time of the event "foul". Finally, within the interval between the start time and the end time, the preset event is played back in slow motion from the different perspectives in sequence.
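The interval-location rule above (start at the first detected frame, end at the last consecutive detected frame, i.e. frame M) can be sketched as:

```python
# Locate the replay interval of a preset event from per-frame detections.

def event_interval(frame_times, detections):
    """frame_times: frame timestamps in seconds; detections: parallel bools.
    Returns (start, end) of the first run of detected frames, or (None, None)."""
    start = end = None
    for t, hit in zip(frame_times, detections):
        if hit:
            if start is None:
                start = t          # first frame containing the event
            end = t                # keep extending to the latest hit
        elif start is not None:
            break                  # event ended at the previous frame (frame M)
    return start, end
```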
  • the video editing strategy is visual special effects.
  • the video editing strategy corresponding to the preset event "free kick” can be a rendering special effect of "adding an arrow from the football to the goal and displaying the distance”.
  • the video editing strategy corresponding to the preset event "offside” may be a rendering special effect "drawing an offside line determined by the rearmost defender”.
  • the same preset event can correspond to a continuous video editing strategy.
  • the video editing strategy corresponding to the preset event "Goal” may be an additional rendering special effect "AR effect of flying colorful confetti in the stadium" after the shot is switched.
  • When rendering visual special effects, video analysis algorithms such as video action recognition, image semantic segmentation and human detection can be used not only to detect the time at which the preset event occurs and insert the rendered effect in real time, but also to detect the position information of the target associated with the preset event in the video image (such as the position of the offside player, or the position of the football before the kick is taken). The visual special effect is then rendered in the video image according to this position information to obtain the guide video.
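The "free kick" effect mentioned earlier (an arrow from the football to the goal plus a distance label) can be sketched from the detected positions. The metres-per-pixel scale is an assumed calibration value, and the overlay is returned as data rather than drawn.

```python
# Compute the free-kick overlay: arrow endpoints plus a distance label.
import math

def free_kick_overlay(ball_xy, goal_xy, metres_per_pixel=0.05):
    """ball_xy, goal_xy: detected pixel coordinates in the video image."""
    dx = goal_xy[0] - ball_xy[0]
    dy = goal_xy[1] - ball_xy[1]
    distance_m = math.hypot(dx, dy) * metres_per_pixel
    return {"arrow": (ball_xy, goal_xy), "label": f"{distance_m:.1f} m"}
```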
  • For server-side rendering, a graphics library such as OpenGL (Open Graphics Library) can be used.
  • Alternatively, the position information, the video image and the identifier of the visual special effect are transmitted to the client, and the visual special effect is rendered in the video image on the client according to the position information and the identifier.
  • For example, an iOS client can call the ARKit toolkit to implement visual effect rendering on the client side,
  • and an Android client can call the ARCore toolkit to implement visual effect rendering on the client side.
  • The human detection algorithm mentioned above may specifically be Mask R-CNN (Mask Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), etc.;
  • the image semantic segmentation algorithm may be FCN (Fully Convolutional Networks), DeepLab, etc.
  • The present disclosure uses artificial intelligence to render visual special effects in the guide video automatically, quickly, efficiently and accurately, speeding up special-effect production and overcoming the staffing bottleneck caused by the complexity of special-effect production in traditional directing.
  • This embodiment analyzes the video images based on computer vision technology, can identify the content and events in the video more accurately and quickly, and automatically edits the video images to form the guide video. It thereby achieves automatic directing of the event-site video, reduces dependence on human directors, removes the constraints that director headcount and skill impose on the directing process, lowers labor costs, and to a certain extent avoids the wrong and missed judgments that occur when multiple video streams are monitored manually.
  • the directing method provided in this embodiment is easy to deploy and implement, can save the deployment time cost required before directing, and is suitable for the directing process of sports games, concerts and other activities.
  • step S104 is further included.
  • the current time is used to look up the time and editing strategy mapping table to determine the video editing strategy corresponding to the current time.
  • the time and editing strategy mapping table is shown in Table 1.
  • the time and editing strategy mapping table defines what content should be played in what time period.
  • An example of the construction process of the time and editing strategy mapping table is as follows. First, the optional video editing strategies corresponding to each time period are extracted from existing guide videos; then a data mining algorithm processes each time period and its optional strategies to obtain the video editing strategy for that period; finally, each time period and its corresponding strategy are used to construct the time and editing strategy mapping table. Since this construction process is similar to that of the event and editing strategy mapping table, it is not detailed here. Those skilled in the art will understand that the event and editing strategy mapping table can be used to form the guide video during the activity, while the time and editing strategy mapping table can be used outside the activity.
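The time-based lookup can be sketched as below: outside detected events, the strategy is chosen by which time period the current timestamp falls into. The periods (seconds from kickoff) and strategy names are made up for illustration.

```python
# Look up the editing strategy for the current time in a time-period table.

TIME_TABLE = [
    ((0, 900), "wide shot"),          # first 15 minutes
    ((900, 2700), "follow the ball"),
]

def strategy_for_time(t, table=TIME_TABLE):
    """Return the strategy whose half-open period [start, end) contains t."""
    for (start, end), strategy in table:
        if start <= t < end:
            return strategy
    return None                       # no period matches
```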
  • Fig. 2 shows a schematic flow chart of a directing method according to other embodiments of the present disclosure. As shown in FIG. 2, after step S105 in the embodiment corresponding to FIG. 1, this embodiment further includes step S206 to step S210.
  • In step S206, the target detection frame of the movable target in the video image is determined.
  • the aforementioned object detection algorithm can be used to obtain the target detection frame of the football in the video picture.
  • In step S207, the position and/or angle of the camera is adjusted according to the position of the target detection frame, so that the target detection frame is located in a preset area of the video image.
  • For example, suppose a player with the ball is detected in the video recorded by the camera at position A. If the coordinates of the ball's detection frame move toward the left of the video image, the camera is moved to the left, or its shooting angle is adjusted to the left, to keep the football in the middle of the video image.
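The pan decision of step S207 can be sketched as comparing the detection-box centre with a central region of the frame. The 10% tolerance is an assumed fraction of the frame width.

```python
# Decide a pan direction from the horizontal position of the detection box.

def pan_direction(box_centre_x, frame_width, tolerance=0.10):
    """Return 'pan left', 'pan right', or 'hold' to keep the box centred."""
    offset = box_centre_x / frame_width - 0.5   # signed offset from centre
    if offset < -tolerance:
        return "pan left"
    if offset > tolerance:
        return "pan right"
    return "hold"
```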
  • step S208 is further included.
  • the zoom factor of the camera is adjusted according to the area of the target detection frame, so that the area of the target detection frame is within the preset numerical range.
  • For example, if the football occupies less than 2% of the video image, the zoom factor of the camera is increased to ensure that the football can be seen clearly in the video image.
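The zoom rule of step S208 can be sketched as nudging the zoom factor until the box area is within the preset range. The 2% lower bound comes from the text; the upper bound and step size are assumptions.

```python
# Adjust the zoom factor so the detection-box area stays within a preset range.

def adjust_zoom(zoom, box_area, frame_area, lo=0.02, hi=0.60, step=1.1):
    ratio = box_area / frame_area
    if ratio < lo:
        return zoom * step        # target too small: zoom in
    if ratio > hi:
        return zoom / step        # target too large: zoom out
    return zoom                   # already within the preset range
```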
  • step S209 is further included.
  • In step S209, an automatic color equalization algorithm is adopted to adjust the color and brightness of the video image.
  • step S210 is further included.
  • In step S210, an adaptive contrast enhancement algorithm is adopted to adjust the contrast of the video image.
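In the spirit of step S210, a hedged sketch of contrast adjustment is shown below as a simple global stretch around the mean. Real adaptive contrast enhancement operates on local neighbourhoods; the gain value here is an assumption.

```python
# Simple global contrast stretch around the mean pixel value (0-255 range).

def stretch_contrast(pixels, gain=1.5):
    """Amplify the deviation of each pixel from the mean, clipped to [0, 255]."""
    mean = sum(pixels) / len(pixels)
    return [max(0, min(255, round(mean + gain * (p - mean)))) for p in pixels]
```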
  • This embodiment achieves automatic control of the cameras through artificial intelligence, so that the cameras can respond to video-capture demands more quickly and efficiently, improving work efficiency while reducing the labor cost of capturing video images.
  • this embodiment realizes the automatic collection of video images, thereby realizing the automation and intelligence of the entire process of signal collection, guide video production, and special effects production in the guide process.
  • Fig. 3 shows a schematic structural diagram of a broadcasting guide device according to some embodiments of the present disclosure.
  • the broadcasting guide device 30 in this embodiment includes:
  • the event detection module 301 is configured to perform video analysis on the video images at the event site to detect whether a preset event occurs at the event site;
  • the mapping table search module 304 is configured to, when a preset event is detected at the event site, look up the event and editing strategy mapping table to determine the video editing strategy corresponding to the preset event;
  • the video editing module 305 is configured to edit the video screen according to the video editing strategy to obtain the guide video.
  • the video editing strategy is a sequence of multiple targets; the video editing module 305 is configured to: determine, among the multiple video streams at the event site, the video images containing each target; and play the video images containing each target in turn according to the sequence to form the guide video.
  • the video editing module 305 is further configured to: adjust the video images containing each target to a close-up shot of the corresponding target before sequentially playing the video images containing each target.
  • the video editing strategy is slow-motion playback of the preset event from different perspectives; the video editing module 305 is configured to: determine, among the multiple video streams at the event site, the video images from different perspectives that contain the preset event; and sequentially play the slow-motion replay of the preset event in those video images to form the guide video.
  • the video editing module 305 is configured to: determine the start time at which the preset event occurs; perform video analysis on the video images of the event site to determine the end time of the preset event; and play back the preset event in slow motion from the different perspectives over the interval between the start time and the end time.
  • the video editing strategy is a visual special effect; the video editing module 305 is configured to: detect the position information, in the video image, of the target associated with the preset event; and render the visual special effect in the video image according to the position information to obtain the guide video.
  • the video editing module 305 is configured to: render the visual special effect into the video image on the server according to the position information and transmit the rendered video image to the client; or transmit the position information, the video image and the identifier of the visual special effect to the client, and render the visual special effect in the video image on the client according to the position information and the identifier.
  • it further includes an event and editing strategy mapping table construction module 302 configured to: extract preset events and their corresponding optional video editing strategies from existing guide videos; process the preset events and their optional video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each preset event; and use the preset events and their corresponding video editing strategies to construct the event and editing strategy mapping table.
  • an event and editing strategy mapping table construction module 302 configured to: extract preset events and their corresponding optional video editing strategies from the existing guide videos; The event and its corresponding optional video editing strategy are processed to obtain the video editing strategy corresponding to the preset event; the preset event and its corresponding video editing strategy are used to construct the event and editing strategy mapping table.
  • the event detection module 301 is configured to: input a single video feed of the event site, or synchronously input multiple video feeds of the event site, into a pre-trained neural network for video analysis, to obtain the probability that a preset event occurs at the event site; and judge from the probability whether a preset event has occurred at the event site.
  • the mapping table search module 304 is further configured to search the time and editing strategy mapping table using the current time, to determine the video editing strategy corresponding to the current time.
  • it further includes a time and editing strategy mapping table construction module 303, configured to: extract the candidate video editing strategies corresponding to each time period from existing guide videos; process each time period and its candidate video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each time period; and construct the time and editing strategy mapping table from the time periods and their corresponding editing strategies.
  • This embodiment analyzes video feeds based on computer vision technology, so it can recognize the content and events in the feeds more accurately and quickly and automatically edit them into a guide video. The video feeds of the event site are thereby directed automatically, reducing reliance on directing personnel, removing the constraints that the number and skill of directors place on the directing process, lowering its labor cost, and, to a certain extent, avoiding the wrong and missed judgments caused by manually monitoring multiple feeds.
  • the directing method provided in this embodiment is easy to deploy and implement, saves the deployment time needed before directing, and is suitable for directing sports games, concerts, and similar live events.
  • it further includes a camera adjustment module 306, configured to: determine the detection box of a movable target in the video feed; and adjust the position and/or angle of the camera according to the position of the detection box, so that the detection box lies in a preset region of the video feed.
  • the camera adjustment module 306 is further configured to adjust the zoom factor of the camera according to the area of the detection box, so that the area of the detection box falls within a preset numerical range.
  • it further includes an image adjustment module 307, configured to: adjust the color and brightness of the video feed with an automatic color equalization algorithm; and adjust the contrast of the video feed with an adaptive contrast enhancement algorithm.
  • This embodiment uses artificial intelligence to control the cameras automatically, so that they respond more quickly and efficiently to the demands of video capture, improving efficiency while lowering the labor cost of capturing the video feeds.
  • Fig. 4 shows a schematic structural diagram of a broadcasting guide device according to other embodiments of the present disclosure.
  • the broadcast directing device 40 of this embodiment includes a memory 410 and a processor 420 coupled to the memory 410, the processor 420 being configured to execute the directing method in any of the foregoing embodiments based on instructions stored in the memory 410.
  • the memory 410 may include, for example, system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, the operating system, application programs, a boot loader (Boot Loader), and other programs.
  • the broadcast directing device 40 may also include an input/output interface 430, a network interface 440, a storage interface 450, and the like. These interfaces 430, 440, 450, the memory 410, and the processor 420 may be connected, for example, via a bus 460.
  • the input and output interface 430 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 440 provides a connection interface for various networked devices.
  • the storage interface 450 provides a connection interface for external storage devices such as SD cards and U disks.
  • the present disclosure further provides a computer-readable storage medium on which computer instructions are stored; when the instructions are executed by a processor, the directing method in any of the foregoing embodiments is implemented.
  • the embodiments of the present disclosure can be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • these computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure provides a broadcast directing method and device, and a computer-readable storage medium, relating to the field of multimedia technology. The broadcast directing method includes: performing video analysis on video feeds of an event site to detect whether a preset event occurs at the event site; when a preset event is detected at the event site, searching an event and editing strategy mapping table to determine the video editing strategy corresponding to the preset event; and editing the video feeds according to the video editing strategy to obtain a guide video. The present disclosure can automatically direct the video feeds of an event site, reducing reliance on directing personnel and lowering the labor cost of the directing process.

Description

Broadcast directing method and device, and computer-readable storage medium
Cross-reference to related applications
This application is based on, and claims priority from, CN application No. 201910701261.9 filed on July 31, 2019, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of multimedia technology, and in particular to a broadcast directing method and device, and a computer-readable storage medium.
Background
Currently, live directing at sports games, concerts, and similar events is performed under manual control. Camera operators capture video of the event site and transmit the feeds from multiple camera positions to an outside broadcast van, where the directing team edits the multi-camera feeds based on professional knowledge and directing experience to form the directed broadcast picture.
Summary
One technical problem addressed by the present disclosure is how to automatically direct the video feeds of an event site.
According to one aspect of embodiments of the present disclosure, a broadcast directing method is provided, including: performing video analysis on video feeds of an event site to detect whether a preset event occurs at the event site; when a preset event is detected at the event site, searching an event and editing strategy mapping table to determine the video editing strategy corresponding to the preset event; and editing the video feeds according to the video editing strategy to obtain a guide video.
In some embodiments, the video editing strategy is a sequence of multiple targets; editing the video feeds according to the video editing strategy to obtain a guide video includes: among the multiple video feeds of the event site, determining the video feeds containing each target; and playing the video feeds containing each target in the given order to form the guide video.
In some embodiments, the method further includes: before playing the video feeds containing each target in sequence, adjusting each such feed to a close-up shot of the corresponding target.
In some embodiments, the video editing strategy is slow-motion replay of the preset event from different viewing angles; editing the video feeds according to the video editing strategy to obtain a guide video includes: among the multiple video feeds of the event site, determining the feeds from different viewing angles that contain the preset event; and playing slow-motion replays of the preset event in the feeds from the different viewing angles in sequence to form the guide video.
In some embodiments, playing the slow-motion replays of the preset event in the feeds from the different viewing angles in sequence includes: determining the start time at which the preset event occurs; performing video analysis on the event-site feeds to determine the end time of the preset event; and playing, in sequence, slow-motion replays of the preset event between the start time and the end time in the feeds from the different viewing angles.
In some embodiments, the video editing strategy is a visual effect; editing the video feeds according to the video editing strategy to obtain a guide video includes: detecting the position information, in the video feed, of the target associated with the preset event; and rendering the visual effect in the video feed according to the position information to obtain the guide video.
In some embodiments, rendering the visual effect in the video feed according to the position includes: rendering the visual effect into the video feed on the server according to the position information, and transmitting the rendered feed to the client; or transmitting the position information, the video feed, and the identifier of the visual effect to the client, and rendering the visual effect in the feed at the client according to the position information and the effect identifier.
In some embodiments, the method further includes: extracting preset events and their corresponding candidate video editing strategies from existing guide videos; processing the preset events and their candidate strategies with a data mining algorithm to obtain the video editing strategy corresponding to each preset event; and constructing the event and editing strategy mapping table from the preset events and their corresponding video editing strategies.
In some embodiments, performing video analysis on the video feeds of the event site to detect whether a preset event occurs includes: inputting a single video feed of the event site, or synchronously inputting multiple video feeds of the event site, into a pre-trained neural network for video analysis, to obtain the probability that a preset event occurs at the event site; and judging from the probability whether a preset event has occurred.
In some embodiments, the method further includes: searching the time and editing strategy mapping table using the current time, to determine the video editing strategy corresponding to the current time.
In some embodiments, the method further includes: extracting the candidate video editing strategies corresponding to each time period from existing guide videos; processing each time period and its candidate strategies with a data mining algorithm to obtain the video editing strategy corresponding to each time period; and constructing the time and editing strategy mapping table from the time periods and their corresponding editing strategies.
In some embodiments, the method further includes: determining the detection box of a movable target in the video feed; and adjusting the position and/or angle of the camera according to the position of the detection box, so that the box lies in a preset region of the feed.
In some embodiments, the method further includes: adjusting the camera's zoom factor according to the area of the detection box, so that the area falls within a preset numerical range.
In some embodiments, the method further includes: adjusting the color and brightness of the video feed with an automatic color equalization algorithm; and adjusting the contrast of the video feed with an adaptive contrast enhancement algorithm.
According to another aspect of embodiments of the present disclosure, a broadcast directing device is provided, including: an event detection module, configured to perform video analysis on video feeds of an event site to detect whether a preset event occurs at the event site; a mapping table search module, configured to search the event and editing strategy mapping table when a preset event is detected, to determine the video editing strategy corresponding to the preset event; and a video editing module, configured to edit the video feeds according to the video editing strategy to obtain a guide video.
In some embodiments, the video editing strategy is a sequence of multiple targets; the video editing module is configured to: among the multiple video feeds of the event site, determine the feeds containing each target; and play the feeds containing each target in the given order to form the guide video.
In some embodiments, the video editing module is further configured to adjust the feeds containing each target to close-ups of the corresponding targets before playing them in sequence.
In some embodiments, the video editing strategy is slow-motion replay of the preset event from different viewing angles; the video editing module is configured to: among the multiple video feeds of the event site, determine the feeds from different angles that contain the preset event; and play slow-motion replays of the preset event in those feeds in sequence to form the guide video.
In some embodiments, the video editing module is configured to: determine the start time of the preset event; perform video analysis on the event-site feeds to determine its end time; and play, in sequence, slow-motion replays of the preset event between the start and end times in the feeds from the different angles.
In some embodiments, the video editing strategy is a visual effect; the video editing module is configured to: detect the position information, in the video feed, of the target associated with the preset event; and render the visual effect in the feed according to the position information to obtain the guide video.
In some embodiments, the video editing module is configured to: render the visual effect into the feed on the server according to the position information and transmit the rendered feed to the client; or transmit the position information, the feed, and the identifier of the visual effect to the client, which renders the effect in the feed according to the position information and the effect identifier.
In some embodiments, the device further includes an event and editing strategy mapping table construction module, configured to: extract preset events and their candidate video editing strategies from existing guide videos; process them with a data mining algorithm to obtain the strategy corresponding to each preset event; and construct the event and editing strategy mapping table from the preset events and their strategies.
In some embodiments, the event detection module is configured to: input a single video feed of the event site, or synchronously input multiple feeds, into a pre-trained neural network for video analysis to obtain the probability that a preset event occurs; and judge from the probability whether a preset event has occurred.
In some embodiments, the mapping table search module is further configured to search the time and editing strategy mapping table using the current time, to determine the strategy corresponding to the current time.
In some embodiments, the device further includes a time and editing strategy mapping table construction module, configured to: extract the candidate video editing strategies corresponding to each time period from existing guide videos; process the periods and their candidate strategies with a data mining algorithm to obtain the strategy for each period; and construct the time and editing strategy mapping table from the periods and their strategies.
In some embodiments, the device further includes a camera adjustment module, configured to: determine the detection box of a movable target in the video feed; and adjust the position and/or angle of the camera according to the position of the box, so that it lies in a preset region of the feed.
In some embodiments, the camera adjustment module is further configured to adjust the camera's zoom factor according to the area of the detection box, so that the area falls within a preset numerical range.
In some embodiments, the device further includes a picture adjustment module, configured to: adjust the color and brightness of the video feed with an automatic color equalization algorithm; and adjust its contrast with an adaptive contrast enhancement algorithm.
According to yet another aspect of embodiments of the present disclosure, a broadcast directing device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the foregoing broadcast directing method based on instructions stored in the memory.
According to still another aspect of embodiments of the present disclosure, a computer-readable storage medium is provided, storing computer instructions which, when executed by a processor, implement the foregoing broadcast directing method.
The present disclosure can automatically direct the video feeds of an event site, reducing reliance on directing personnel and lowering the labor cost of the directing process.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 shows a schematic flowchart of a broadcast directing method according to some embodiments of the present disclosure.
Fig. 2 shows a schematic flowchart of a broadcast directing method according to other embodiments of the present disclosure.
Fig. 3 shows a schematic structural diagram of a broadcast directing device according to some embodiments of the present disclosure.
Fig. 4 shows a schematic structural diagram of a broadcast directing device according to other embodiments of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
The inventors found that traditional directing methods have high labor costs and high time costs. Taking sports-event directing as an example, directing one live event usually requires 40 to 60 staff, and the directing team must deploy and prepare at the site 1 to 3 days in advance, which is time-consuming. Based on this analysis, the present disclosure provides a broadcast directing method that can automatically direct the video feeds of an event site, reducing reliance on directing personnel and lowering the labor cost of the directing process.
Some embodiments of the broadcast directing method of the present disclosure are first described with reference to Fig. 1.
Fig. 1 shows a schematic flowchart of a broadcast directing method according to some embodiments of the present disclosure. As shown in Fig. 1, this embodiment includes steps S101 to S105.
In step S101, video analysis is performed on the video feeds of the event site to detect whether a preset event occurs at the event site.
Those skilled in the art should understand that a neural network with image analysis capability can be trained by feeding it annotated training images. The neural network is typically a convolutional neural network, on which computer vision algorithms such as face detection, human body tracking, object detection, video scene recognition, and video action/event recognition can run, so as to detect the people, actions, events, and so on in a video feed.
By performing video analysis on frames of the event-site video feeds with the trained convolutional neural network, the position coordinates of faces, human bodies, and specified objects can be detected, so that preset events occurring at the event site are detected in real time. Taking a soccer match as an example, the preset events may include goals, fouls, offsides, free kicks, and so on.
During video analysis, a single video feed of the event site may be input into the pre-trained neural network, or multiple video feeds of the event site may be synchronously input into the pre-trained neural network, to obtain the probability that a preset event occurs at the event site; whether a preset event has occurred is then judged from the probability.
Single-feed analysis is limited by the camera's recording angle: if players occlude each other in that feed, the information in the feed is incomplete and a foul may not be detected accurately. By contrast, multi-feed analysis can use algorithms such as early fusion and late fusion to obtain a more complete understanding of the people, actions, and scenes at the event site, jointly detecting preset events from multiple angles and improving detection accuracy through data fusion.
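The late-fusion idea above can be sketched briefly. The per-view classifier is stubbed out here (in the embodiment it would be a pre-trained convolutional neural network), and the frame representation and threshold are illustrative assumptions, not details from the patent:

```python
from statistics import mean

def view_event_probability(frame):
    # Hypothetical stand-in for a CNN forward pass on one view's frame.
    return frame["foul_score"]

def detect_preset_event(frames, threshold=0.5):
    """Late fusion: average the per-view event probabilities, then threshold.

    `frames` holds one synchronized frame per camera feed; a single-element
    list degenerates to single-feed analysis.
    """
    prob = mean(view_event_probability(f) for f in frames)
    return prob, prob >= threshold

# One view is occluded (low score), but the other angles still see the foul.
views = [{"foul_score": 0.2}, {"foul_score": 0.9}, {"foul_score": 0.8}]
prob, occurred = detect_preset_event(views)
```

Averaging is the simplest fusion rule; a learned fusion layer, as implied by the early/late fusion mentioned in the text, would weight the views instead.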
If no preset event is detected at the event site, the method returns to step S101. If a preset event is detected at the event site, step S103 is executed.
In step S103, the event and editing strategy mapping table is searched to determine the video editing strategy corresponding to the preset event.
The event and editing strategy mapping table contains the mappings between different preset events and different video editing strategies. For example, the video editing strategy corresponding to the preset event "goal" is the camera-switching strategy "(main camera ->) scorer -> goalkeeper -> coach -> audience (-> main camera)"; the strategy corresponding to the preset event "foul" is the slow-motion replay strategy "(main camera ->) foul slow motion from view A -> foul slow motion from view B -> foul slow motion from view C (-> main camera)"; and the strategy corresponding to the preset event "free kick" is the rendered effect "add an arrow from the ball to the goal and display the distance".
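A minimal sketch of such a mapping table is a plain lookup structure; the event names, camera labels, and strategy encoding below are illustrative, not the patent's actual table:

```python
# (strategy_kind, strategy_payload) pairs keyed by preset event.
EVENT_STRATEGY_TABLE = {
    "goal":      ("cut_sequence", ["scorer", "goalkeeper", "coach", "audience"]),
    "foul":      ("slow_motion", ["view_A", "view_B", "view_C"]),
    "free_kick": ("visual_effect", "arrow_with_distance"),
}

def lookup_strategy(event):
    # Returns None when the detected event is not a preset event,
    # in which case detection simply continues (back to step S101).
    return EVENT_STRATEGY_TABLE.get(event)
```

The payload type varies with the strategy kind, mirroring the three cases handled in step S105 below.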
The construction of the event and editing strategy mapping table is illustrated below.
First, preset events and their corresponding candidate video editing strategies are extracted from existing guide videos.
For example, after human directors direct a soccer match, the directing equipment can record all of their camera switches, slow-motion production, and related operations, as well as the time points at which the director switched feeds and the identifiers of the cameras switched to. Based on the data set formed from these records, the existing guide video can be divided into a number of video segments, and each segment can be labeled using video analysis algorithms such as action, event, and person recognition.
Then, a data mining algorithm processes the preset events and their corresponding candidate video editing strategies to obtain the video editing strategy corresponding to each preset event.
For example, after the preset event "foul", the candidate video editing strategies include "foul slow motion from view A -> view B -> view C -> view D", "foul slow motion from view A -> view B -> view C -> view E", and so on. Data mining methods such as association-rule mining and frequent-itemset mining can conclude that the video editing strategy corresponding to the preset event "foul" is "foul slow motion from view A -> view B -> view C".
Finally, the event and editing strategy mapping table is constructed from the preset events and their corresponding video editing strategies.
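The mining step above can be sketched with a frequency count; `Counter` stands in for the association-rule / frequent-itemset mining the text names, and the logged strategy sequences are made-up examples:

```python
from collections import Counter

def mine_strategy(observed_strategies):
    """Pick the editing strategy most frequently used by human directors
    for one preset event.

    Real frequent-itemset mining would also credit partial overlaps
    between camera sequences; a plain majority count ignores that.
    """
    counts = Counter(tuple(s) for s in observed_strategies)
    best, _ = counts.most_common(1)[0]
    return list(best)

# Hypothetical logs of how directors handled the "foul" event.
foul_logs = [
    ["A_slow", "B_slow", "C_slow"],
    ["A_slow", "B_slow", "C_slow"],
    ["A_slow", "B_slow", "C_slow", "E_slow"],
]
```

Running `mine_strategy(foul_logs)` would select the common A -> B -> C sequence for the mapping table.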
In step S105, the video feeds are edited according to the video editing strategy to obtain the guide video.
Step S105 is described below for three cases.
(1) The video editing strategy is a sequence of multiple targets.
Take again the strategy for the preset event "goal": the camera-switching strategy "(main camera ->) scorer -> goalkeeper -> coach -> audience (-> main camera)". Among the multiple video feeds of the event site, the feed containing each target is determined. Suppose the scorer is detected in camera A's feed, the goalkeeper in camera B's, the coach in camera C's, and the audience in camera D's. The feeds containing each target are then played in the given order, i.e., switching from the main camera to cameras A, B, C, and D in turn and then back to the main camera, to form the guide video. The playing duration of each camera can be set as required.
In some embodiments, before the feeds containing each target are played in sequence, each such feed may be adjusted to a close-up of the corresponding target. For example, the goalkeeper's detection box is obtained by video analysis, and the camera's zoom factor is adjusted according to the area of that box so that it falls within a preset numerical range, e.g. about 50% of the frame area.
(2) The video editing strategy is slow-motion replay of the preset event from different viewing angles.
Take again the strategy for the preset event "foul": the slow-motion replay strategy "(main camera ->) foul slow motion from view A -> view B -> view C (-> main camera)". Among the multiple video feeds of the event site, the feeds from different viewing angles that contain the preset event are determined; suppose the event "foul" is detected in the feeds from cameras A, B, and C. Slow-motion replays of the preset event in the feeds from the different viewing angles are then played in sequence to form the guide video.
When playing those slow-motion replays in sequence, the start time of the preset event may first be determined, i.e., the timestamp in the video feed at which the preset event is detected. Video analysis is then performed on the event-site feeds to determine the end time of the preset event. If the event "foul" is detected in frame M but not in the next frame N, the timestamp of frame M can be recorded as the end time of the "foul". Finally, slow-motion replays of the preset event between the start time and the end time, in the feeds from the different viewing angles, are played in sequence, i.e., slow motion of the preset event from the different angles is played.
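The start/end-time logic above can be sketched as a scan over per-frame detections; using frame indices as timestamps is a simplifying assumption:

```python
def replay_window(frame_events, event="foul"):
    """Find the first contiguous run of the preset event.

    The start time is the first frame where the event is detected; the end
    time is the last frame before it disappears (frame M in the text).
    `frame_events` is a list of per-frame detected-event sets.
    """
    start = end = None
    for t, events in enumerate(frame_events):
        if event in events and start is None:
            start = t
        elif event not in events and start is not None:
            end = t - 1
            break
    if start is not None and end is None:
        end = len(frame_events) - 1  # event still ongoing at the last frame
    return start, end

def replay_plan(frame_events, views, event="foul"):
    # One slow-motion clip per camera angle, played in sequence.
    start, end = replay_window(frame_events, event)
    return [(view, start, end) for view in views]
```

A production system would key the window to real media timestamps and pad it slightly so the replay includes the build-up to the event.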
(3) The video editing strategy is a visual effect.
For example, the strategy for the preset event "free kick" may be the rendered effect "add an arrow from the ball to the goal and display the distance". As another example, the strategy for the preset event "offside" may be the rendered effect "draw the offside line determined by the last defender". Those skilled in the art should understand that a single preset event may correspond to consecutive editing strategies; for example, the strategy for "goal" may be camera switching followed by the rendered effect "an AR effect of colored confetti flying in the stadium".
When rendering visual effects, video analysis algorithms such as video action recognition, image semantic segmentation, and human body detection can be used both to insert rendered effects in real time at the moment the preset event occurs and to detect the position information, in the video feed, of the target associated with the preset event (for example, the position of the offside player, or of the ball before a free kick is taken). The visual effect is then rendered in the video feed according to the position information to obtain the guide video.
Rendering may be done on the server by calling OpenGL (Open Graphics Library) according to the position information, with the rendered feed then transmitted to the client. Alternatively, the position information, the video feed, and the identifier of the visual effect are transmitted to the client, which renders the effect in the feed according to the position information and the effect identifier; for example, an iOS client can call the ARKit toolkit, and an Android client the ARCore toolkit, for client-side rendering.
Those skilled in the art should understand that the human body detection algorithms mentioned above may be Mask R-CNN (Mask Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), and so on, and the image semantic segmentation algorithms may be FCN (Fully Convolutional Networks), DeepLab, and so on. Semantic segmentation yields the accurate position and outline of a player, so the player can be cut out from the background of the feed and pre-built 3D or 2D models from a visual-effects library can be invoked to add, say, a flame effect to the player or a background effect to the feed. The present disclosure thus uses artificial intelligence to render visual effects in the guide video automatically, quickly, efficiently, and accurately, speeding up effect production and overcoming the shortcoming of traditional directing, where complex effect production leaves staff with insufficient time for rendering.
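The server-versus-client split above can be sketched as producing a draw instruction from the position information and then choosing a delivery path. The overlay dict format and coordinate convention are illustrative assumptions; actual rasterization (OpenGL server-side, ARKit/ARCore client-side) is out of scope here:

```python
def offside_line_overlay(defender_pos, pitch_width):
    """Given the (x, y) position of the last defender in pitch coordinates,
    produce a draw instruction for an offside line spanning the pitch.
    """
    x, _ = defender_pos
    return {"effect": "offside_line",
            "from": (x, 0.0), "to": (x, pitch_width)}

def dispatch(overlay, render_on_server):
    # Server-side path: rasterize and stream rendered pixels.
    # Client-side path: send the position info plus the effect identifier
    # and let the client render locally.
    if render_on_server:
        return ("rendered_video", overlay)
    return ("metadata", overlay["effect"], overlay["from"], overlay["to"])
```

Client-side rendering trades a small metadata payload for per-device rendering work, which is the bandwidth/compute trade-off implicit in the two alternatives.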
This embodiment analyzes video feeds based on computer vision technology, so the content and events in the feeds can be recognized more accurately and quickly and the feeds are automatically edited into a guide video. The video feeds of the event site are thereby directed automatically, reducing reliance on directing personnel, removing the constraints that the number and skill of directors place on the directing process, lowering its labor cost, and, to a certain extent, avoiding the wrong and missed judgments caused by manually monitoring multiple feeds. In addition, the directing method provided in this embodiment is easy to deploy and implement, saves the deployment time needed before directing, and is suitable for directing sports games, concerts, and similar live events.
Some embodiments further include step S104. In step S104, the time and editing strategy mapping table is searched using the current time, to determine the video editing strategy corresponding to the current time. An example of the time and editing strategy mapping table is shown in Table 1.
Table 1 (published as an image in the original document: Figure PCTCN2020080867-appb-000001)
The time and editing strategy mapping table defines what content should be played in which time period. Its construction is illustrated as follows: first, the candidate video editing strategies corresponding to each time period are extracted from existing guide videos; then a data mining algorithm processes each time period and its candidate strategies to obtain the video editing strategy for that period; finally, the table is built from the time periods and their corresponding editing strategies. Since this construction closely parallels that of the event and editing strategy mapping table, it is not elaborated here. Those skilled in the art should understand that the event and editing strategy mapping table can be used to form the guide video while the activity is in progress, and the time and editing strategy mapping table can be used outside the activity itself.
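A sketch of the time-based lookup follows; since the patent's Table 1 is published only as an image, the periods and strategy names here are invented placeholders:

```python
TIME_STRATEGY_TABLE = [
    # (start_minute, end_minute, strategy) -- half-open intervals.
    (0, 15, "pre_match_panorama"),
    (15, 105, "follow_the_ball"),
    (105, 120, "post_match_interviews"),
]

def strategy_for_time(minute):
    # Linear scan is fine for a handful of periods; bisect on the start
    # times would serve a long table.
    for start, end, strategy in TIME_STRATEGY_TABLE:
        if start <= minute < end:
            return strategy
    return None
```

During the match the event-driven table takes precedence; this time-driven table covers the stretches where no preset event fires.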
Other embodiments of the broadcast directing method of the present disclosure are described below with reference to Fig. 2, to explain the automatic camera adjustment process.
Fig. 2 shows a schematic flowchart of a broadcast directing method according to other embodiments of the present disclosure. As shown in Fig. 2, after step S105 of the embodiment corresponding to Fig. 1, this embodiment further includes steps S206 to S210.
In step S206, the detection box of a movable target in the video feed is determined.
For example, the aforementioned object detection algorithm can yield the detection box of the ball in the video feed.
In step S207, the position and/or angle of the camera is adjusted according to the position of the detection box, so that the detection box lies in a preset region of the video feed.
For example, in a soccer match, a player dribbling past is detected in the feed of camera A; when the coordinates of the ball's detection box move toward the left of the feed, the camera is moved to the left, or its angle is turned to the left, to keep the ball in the middle of the feed.
Some embodiments further include step S208. In step S208, the camera's zoom factor is adjusted according to the area of the detection box, so that the area falls within a preset numerical range.
For example, when the ball occupies less than 2% of the video feed, the camera's magnification is increased to ensure the ball remains clearly visible.
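Steps S206 to S208 can be sketched as a control rule mapping the detection box to pan/zoom commands; the one-third centering band and the 2%/10% area thresholds are illustrative (only the 2% lower bound appears in the text):

```python
def camera_commands(box, frame_w, frame_h, min_frac=0.02, max_frac=0.10):
    """box = (x, y, w, h) of the target detection box in pixels.

    Emits pan commands to keep the box center in the middle third of the
    frame, and zoom commands to keep its area within the preset range.
    """
    x, y, w, h = box
    cx = x + w / 2
    commands = []
    if cx < frame_w / 3:
        commands.append("pan_left")
    elif cx > 2 * frame_w / 3:
        commands.append("pan_right")
    frac = (w * h) / (frame_w * frame_h)
    if frac < min_frac:
        commands.append("zoom_in")
    elif frac > max_frac:
        commands.append("zoom_out")
    return commands
```

A real PTZ controller would rate-limit and smooth these commands so the feed does not jitter when the box hovers near a threshold.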
Some embodiments further include step S209. In step S209, an Automatic Color Equalization algorithm is used to adjust the color and brightness of the video feed.
Some embodiments further include step S210. In step S210, an Adaptive Contrast Enhancement algorithm is used to adjust the contrast of the video feed.
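As a rough illustration of this kind of per-feed correction, the sketch below applies a global percentile contrast stretch to one channel. This is a deliberate simplification: real Automatic Color Equalization and Adaptive Contrast Enhancement are spatially adaptive, which a global stretch is not:

```python
def stretch_channel(values, lo_pct=0.05, hi_pct=0.95):
    """Clip one channel's intensities to the [lo_pct, hi_pct] percentiles
    and rescale to the full 0-255 range.

    `values` is a flat list of 0-255 intensities for one color channel.
    """
    s = sorted(values)
    lo = s[int(lo_pct * (len(s) - 1))]
    hi = s[int(hi_pct * (len(s) - 1))]
    if hi == lo:
        return values[:]  # flat channel: nothing to stretch
    return [round(255 * (min(max(v, lo), hi) - lo) / (hi - lo)) for v in values]
```

Clipping the extreme percentiles keeps a few hot pixels from dictating the scale, which is the usual motivation for percentile-based stretching.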
This embodiment uses artificial intelligence to control the cameras automatically, so that they respond more quickly and efficiently to the demands of video capture, improving work efficiency while lowering the labor cost of capturing the video feeds.
Moreover, on top of guide-video editing and playback, this embodiment automates video capture itself, achieving end-to-end automation and intelligence of signal acquisition, guide-video production, and effects production in the directing workflow.
Some embodiments of the broadcast directing device of the present disclosure are described below with reference to Fig. 3.
Fig. 3 shows a schematic structural diagram of a broadcast directing device according to some embodiments of the present disclosure. As shown in Fig. 3, the broadcast directing device 30 of this embodiment includes:
an event detection module 301, configured to perform video analysis on video feeds of the event site to detect whether a preset event occurs at the event site; a mapping table search module 304, configured to search the event and editing strategy mapping table when a preset event is detected, to determine the video editing strategy corresponding to the preset event; and a video editing module 305, configured to edit the video feeds according to the video editing strategy to obtain the guide video.
In some embodiments, the video editing strategy is a sequence of multiple targets; the video editing module 305 is configured to: among the multiple video feeds of the event site, determine the feeds containing each target; and play the feeds containing each target in the given order to form the guide video.
In some embodiments, the video editing module 305 is further configured to adjust the feeds containing each target to close-ups of the corresponding targets before playing them in sequence.
In some embodiments, the video editing strategy is slow-motion replay of the preset event from different viewing angles; the video editing module 305 is configured to: among the multiple video feeds of the event site, determine the feeds from different angles that contain the preset event; and play slow-motion replays of the preset event in those feeds in sequence to form the guide video.
In some embodiments, the video editing module 305 is configured to: determine the start time of the preset event; perform video analysis on the event-site feeds to determine its end time; and play, in sequence, slow-motion replays of the preset event between the start and end times in the feeds from the different angles.
In some embodiments, the video editing strategy is a visual effect; the video editing module 305 is configured to: detect the position information, in the video feed, of the target associated with the preset event; and render the visual effect in the feed according to the position information to obtain the guide video.
In some embodiments, the video editing module 305 is configured to: render the visual effect into the feed on the server according to the position information and transmit the rendered feed to the client; or transmit the position information, the feed, and the identifier of the visual effect to the client, which renders the effect in the feed according to the position information and the effect identifier.
In some embodiments, the device further includes an event and editing strategy mapping table construction module 302, configured to: extract preset events and their candidate video editing strategies from existing guide videos; process them with a data mining algorithm to obtain the strategy corresponding to each preset event; and construct the event and editing strategy mapping table from the preset events and their strategies.
In some embodiments, the event detection module 301 is configured to: input a single video feed of the event site, or synchronously input multiple feeds, into a pre-trained neural network for video analysis to obtain the probability that a preset event occurs; and judge from the probability whether a preset event has occurred.
In some embodiments, the mapping table search module 304 is further configured to search the time and editing strategy mapping table using the current time, to determine the strategy corresponding to the current time.
In some embodiments, the device further includes a time and editing strategy mapping table construction module 303, configured to: extract the candidate video editing strategies corresponding to each time period from existing guide videos; process the periods and their candidate strategies with a data mining algorithm to obtain the strategy for each period; and construct the time and editing strategy mapping table from the periods and their strategies.
This embodiment analyzes video feeds based on computer vision technology, so the content and events in the feeds can be recognized more accurately and quickly and the feeds are automatically edited into a guide video. The video feeds of the event site are thereby directed automatically, reducing reliance on directing personnel, removing the constraints that the number and skill of directors place on the directing process, lowering its labor cost, and, to a certain extent, avoiding the wrong and missed judgments caused by manually monitoring multiple feeds. In addition, the directing method provided by this embodiment is easy to deploy and implement, saves the deployment time needed before directing, and is suitable for directing sports games, concerts, and similar live events.
In some embodiments, the device further includes a camera adjustment module 306, configured to: determine the detection box of a movable target in the video feed; and adjust the position and/or angle of the camera according to the position of the box, so that it lies in a preset region of the feed.
In some embodiments, the camera adjustment module 306 is further configured to adjust the camera's zoom factor according to the area of the detection box, so that the area falls within a preset numerical range.
In some embodiments, the device further includes a picture adjustment module 307, configured to: adjust the color and brightness of the video feed with an automatic color equalization algorithm; and adjust its contrast with an adaptive contrast enhancement algorithm.
This embodiment uses artificial intelligence to control the cameras automatically, so that they respond more quickly and efficiently to the demands of video capture, improving work efficiency while lowering the labor cost of capturing the video feeds.
Other embodiments of the broadcast directing device of the present disclosure are described below with reference to Fig. 4.
Fig. 4 shows a schematic structural diagram of a broadcast directing device according to other embodiments of the present disclosure. As shown in Fig. 4, the broadcast directing device 40 of this embodiment includes a memory 410 and a processor 420 coupled to the memory 410, the processor 420 being configured to execute the broadcast directing method in any of the foregoing embodiments based on instructions stored in the memory 410.
The memory 410 may include, for example, system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), and other programs.
The broadcast directing device 40 may further include an input/output interface 430, a network interface 440, a storage interface 450, and so on. These interfaces 430, 440, 450, the memory 410, and the processor 420 may be connected, for example, via a bus 460. The input/output interface 430 provides a connection interface for input/output devices such as a display, mouse, keyboard, and touch screen. The network interface 440 provides a connection interface for various networked devices. The storage interface 450 provides a connection interface for external storage devices such as SD cards and USB drives.
The present disclosure further includes a computer-readable storage medium on which computer instructions are stored; when the instructions are executed by a processor, the broadcast directing method in any of the foregoing embodiments is implemented.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within its protection scope.

Claims (17)

  1. A broadcast directing method, comprising:
    performing video analysis on video feeds of an event site to detect whether a preset event occurs at the event site;
    when a preset event is detected at the event site, searching an event and editing strategy mapping table to determine the video editing strategy corresponding to the preset event;
    editing the video feeds according to the video editing strategy to obtain a guide video.
  2. The broadcast directing method of claim 1, wherein the video editing strategy is a sequence of multiple targets;
    said editing the video feeds according to the video editing strategy to obtain a guide video comprises:
    among multiple video feeds of the event site, determining the video feeds containing each of the targets;
    playing the video feeds containing each of the targets in the given order to form the guide video.
  3. The broadcast directing method of claim 2, further comprising:
    before playing the video feeds containing each of the targets in sequence, adjusting each such video feed to a close-up shot of the corresponding target.
  4. The broadcast directing method of claim 1, wherein the video editing strategy is slow-motion replay of the preset event from different viewing angles;
    said editing the video feeds according to the video editing strategy to obtain a guide video comprises:
    among multiple video feeds of the event site, determining the video feeds from different viewing angles that contain the preset event;
    playing slow-motion replays of the preset event in the video feeds from the different viewing angles in sequence to form the guide video.
  5. The broadcast directing method of claim 4, wherein said playing the slow-motion replays of the preset event in the video feeds from the different viewing angles in sequence comprises:
    determining the start time at which the preset event occurs;
    performing video analysis on the video feeds of the event site to determine the end time of the preset event;
    playing, in sequence, slow-motion replays of the preset event between the start time and the end time in the video feeds from the different viewing angles.
  6. The broadcast directing method of claim 1, wherein the video editing strategy is a visual effect;
    said editing the video feeds according to the video editing strategy to obtain a guide video comprises:
    detecting position information, in the video feed, of the target associated with the preset event;
    rendering the visual effect in the video feed according to the position information to obtain the guide video.
  7. The broadcast directing method of claim 6, wherein said rendering the visual effect in the video feed according to the position comprises:
    rendering the visual effect into the video feed on a server according to the position information, and transmitting the rendered video feed to a client;
    or,
    transmitting the position information, the video feed, and an identifier of the visual effect to a client, and rendering the visual effect in the video feed at the client according to the position information and the identifier of the visual effect.
  8. The broadcast directing method of claim 1, further comprising:
    extracting the preset events and their corresponding candidate video editing strategies from existing guide videos;
    processing the preset events and their corresponding candidate video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each preset event;
    constructing the event and editing strategy mapping table from the preset events and their corresponding video editing strategies.
  9. The broadcast directing method of claim 1, wherein said performing video analysis on the video feeds of the event site to detect whether a preset event occurs comprises:
    inputting a single video feed of the event site into a pre-trained neural network for video analysis, or synchronously inputting multiple video feeds of the event site into a pre-trained neural network for video analysis, to obtain the probability that the preset event occurs at the event site;
    judging from the probability whether a preset event occurs at the event site.
  10. The broadcast directing method of claim 1, further comprising:
    searching a time and editing strategy mapping table using the current time, to determine the video editing strategy corresponding to the current time.
  11. The broadcast directing method of claim 10, further comprising:
    extracting the candidate video editing strategies corresponding to each time period from existing guide videos;
    processing each time period and its corresponding candidate video editing strategies with a data mining algorithm to obtain the video editing strategy corresponding to each time period;
    constructing the time and editing strategy mapping table from the time periods and their corresponding editing strategies.
  12. The broadcast directing method of claim 1, further comprising:
    determining a detection box of a movable target in the video feed;
    adjusting the position and/or angle of a camera according to the position of the detection box, so that the detection box lies in a preset region of the video feed.
  13. The broadcast directing method of claim 12, further comprising:
    adjusting the zoom factor of the camera according to the area of the detection box, so that the area of the detection box falls within a preset numerical range.
  14. The broadcast directing method of claim 1, further comprising:
    adjusting the color and brightness of the video feed with an automatic color equalization algorithm;
    adjusting the contrast of the video feed with an adaptive contrast enhancement algorithm.
  15. A broadcast directing device, comprising:
    an event detection module, configured to perform video analysis on video feeds of an event site to detect whether a preset event occurs at the event site;
    a mapping table search module, configured to search an event and editing strategy mapping table when a preset event is detected at the event site, to determine the video editing strategy corresponding to the preset event;
    a video editing module, configured to edit the video feeds according to the video editing strategy to obtain a guide video.
  16. A broadcast directing device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute the broadcast directing method of any one of claims 1 to 14 based on instructions stored in the memory.
  17. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the broadcast directing method of any one of claims 1 to 14.
PCT/CN2020/080867 2019-07-31 2020-03-24 导播方法、装置及计算机可读存储介质 WO2021017496A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910701261.9A CN111787243B (zh) 2019-07-31 2019-07-31 导播方法、装置及计算机可读存储介质
CN201910701261.9 2019-07-31

Publications (1)

Publication Number Publication Date
WO2021017496A1 true WO2021017496A1 (zh) 2021-02-04

Family

ID=72755071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080867 WO2021017496A1 (zh) 2019-07-31 2020-03-24 导播方法、装置及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN111787243B (zh)
WO (1) WO2021017496A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389660B (zh) * 2021-12-22 2024-04-12 广州开得联智能科技有限公司 录播的导播方法、装置、设备及存储介质
CN116152711B (zh) * 2022-08-25 2024-03-22 北京凯利时科技有限公司 基于多模态的导播方法和系统以及计算机程序产品

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104519310A (zh) * 2013-09-29 2015-04-15 深圳锐取信息技术股份有限公司 一种远程导播控制系统
WO2017113019A1 (en) * 2015-12-30 2017-07-06 Steve Mann Recompressive sensing, resparsified sampling, and lightspacetimelapse: means, apparatus, and methods for spatiotemporal and spatiotonal timelapse and infinitely long media or multimedia recordings in finite memory
CN107888974A (zh) * 2016-09-30 2018-04-06 北京视连通科技有限公司 一种基于场景或特定对象的即时视频合成方法与系统
CN107995533A (zh) * 2012-12-08 2018-05-04 周成 视频中弹出跟踪对象的视频的方法
EP3468211A1 (en) * 2016-06-02 2019-04-10 Alibaba Group Holding Limited Video playing control method and apparatus, and video playing system
CN109922375A (zh) * 2017-12-13 2019-06-21 上海聚力传媒技术有限公司 直播中的事件展示方法、播放终端、视频系统及存储介质

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100686723B1 (ko) * 2005-07-20 2007-02-26 삼성전자주식회사 방송프로그램정보 표시방법 및 영상처리장치
EP2479991A3 (en) * 2006-12-04 2014-01-08 Lynx System Developers, Inc. Autonomous systems and methods for still and moving picture production
CN101650722B (zh) * 2009-06-01 2011-10-26 南京理工大学 基于音视频融合的足球视频精彩事件检测方法
CN201750957U (zh) * 2010-06-02 2011-02-23 五兆整合设计有限公司 体感运动游戏机的组成
US9684435B2 (en) * 2014-01-13 2017-06-20 Disney Enterprises, Inc. Camera selection interface for producing a media presentation
CN104394363B (zh) * 2014-11-21 2018-03-23 阔地教育科技有限公司 一种在线课堂导播方法及系统
CN105049764B (zh) * 2015-06-17 2018-05-25 武汉智亿方科技有限公司 一种基于多个定位摄像头的教学用图像跟踪方法及系统
CN106251334B (zh) * 2016-07-18 2019-03-01 华为技术有限公司 一种摄像机参数调整方法、导播摄像机及系统
CN106341711B (zh) * 2016-09-27 2019-09-24 成都西可科技有限公司 一种多机位视频直播回放方法及系统
CN108513081B (zh) * 2017-02-27 2020-12-29 杭州海康威视数字技术股份有限公司 一种用于课堂教学的录播方法、装置及系统
CN107087121B (zh) * 2017-04-20 2020-08-21 广州华多网络科技有限公司 一种基于运动检测的自动导播方法及装置
CN108282598B (zh) * 2017-05-19 2020-12-15 广州华多网络科技有限公司 一种软件导播系统及方法
CN107241611B (zh) * 2017-05-27 2019-09-24 蜜蜂四叶草动漫制作(北京)有限公司 一种直播联动装置及直播联动系统
CN109326310B (zh) * 2017-07-31 2022-04-08 西梅科技(北京)有限公司 一种自动剪辑的方法、装置及电子设备
US10432987B2 (en) * 2017-09-15 2019-10-01 Cisco Technology, Inc. Virtualized and automated real time video production system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107995533A (zh) * 2012-12-08 2018-05-04 周成 视频中弹出跟踪对象的视频的方法
CN104519310A (zh) * 2013-09-29 2015-04-15 深圳锐取信息技术股份有限公司 一种远程导播控制系统
WO2017113019A1 (en) * 2015-12-30 2017-07-06 Steve Mann Recompressive sensing, resparsified sampling, and lightspacetimelapse: means, apparatus, and methods for spatiotemporal and spatiotonal timelapse and infinitely long media or multimedia recordings in finite memory
EP3468211A1 (en) * 2016-06-02 2019-04-10 Alibaba Group Holding Limited Video playing control method and apparatus, and video playing system
CN107888974A (zh) * 2016-09-30 2018-04-06 北京视连通科技有限公司 一种基于场景或特定对象的即时视频合成方法与系统
CN109922375A (zh) * 2017-12-13 2019-06-21 上海聚力传媒技术有限公司 直播中的事件展示方法、播放终端、视频系统及存储介质

Also Published As

Publication number Publication date
CN111787243B (zh) 2021-09-03
CN111787243A (zh) 2020-10-16

Similar Documents

Publication Publication Date Title
JP5667943B2 (ja) コンピュータ実行画像処理方法および仮想再生ユニット
US10771760B2 (en) Information processing device, control method of information processing device, and storage medium
US11157742B2 (en) Methods and systems for multiplayer tagging for ball game analytics generation with a mobile computing device
CN109326310B (zh) 一种自动剪辑的方法、装置及电子设备
Stensland et al. Bagadus: An integrated real-time system for soccer analytics
CN101639354B (zh) 对象跟踪的设备和方法
US8611723B2 (en) System for relating scoreboard information with event video
JP2018504814A (ja) 放送において目標を追跡及びタグ付けするためのシステム及び方法
JP2009505553A (ja) ビデオストリームへの視覚効果の挿入を管理するためのシステムおよび方法
WO2018223554A1 (zh) 一种多源视频剪辑播放方法及系统
TWI537872B (zh) 辨識二維影像產生三維資訊之方法
JP2004500756A (ja) ビデオシーケンスと時空正規化との調整および合成
WO2021017496A1 (zh) 导播方法、装置及计算机可读存储介质
Pidaparthy et al. Keep your eye on the puck: Automatic hockey videography
KR20140126936A (ko) 실시간 영상에 프라이버시 마스킹 툴을 제공하는 장치 및 방법
US11978254B2 (en) Systems and methods for providing video presentation and video analytics for live sporting events
TWI601425B (zh) 一種串接攝影畫面以形成一物件軌跡的方法
CN109460724B (zh) 基于对象检测的停球事件的分离方法和系统
CN111741325A (zh) 视频播放方法、装置、电子设备及计算机可读存储介质
JP2019101892A (ja) オブジェクト追跡装置及びそのプログラム
US10025986B1 (en) Method and apparatus for automatically detecting and replaying notable moments of a performance
CN111402289A (zh) 基于深度学习的人群表演误差检测方法
US11514678B2 (en) Data processing method and apparatus for capturing and analyzing images of sporting events
CN112287771A (zh) 用于检测视频事件的方法、装置、服务器和介质
KR20150017564A (ko) Ar 글래스를 이용한 스포츠 경기 판정 시스템 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20847958

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20847958

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12-09-2022)
