CN106534618B - Method, device and system for realizing pseudo-live commentary - Google Patents

Method, device and system for realizing pseudo-live commentary

Info

Publication number
CN106534618B
CN106534618B (application CN201611052152.1A)
Authority
CN
China
Prior art keywords
live
video
picture
image
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611052152.1A
Other languages
Chinese (zh)
Other versions
CN106534618A (en)
Inventor
周文亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Guangzhou UCWeb Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou UCWeb Computer Technology Co Ltd filed Critical Guangzhou UCWeb Computer Technology Co Ltd
Priority to CN201611052152.1A priority Critical patent/CN106534618B/en
Publication of CN106534618A publication Critical patent/CN106534618A/en
Application granted granted Critical
Publication of CN106534618B publication Critical patent/CN106534618B/en

Classifications

    • H04N 5/76: Television signal recording
    • H04N 5/2224: Studio circuitry, devices and equipment related to virtual studio applications
    • H04N 5/2621: Cameras specially adapted for the electronic generation of special effects during image pickup
    • H04N 5/272: Means for inserting a foreground image in a background image (inlay, outlay)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Studio Devices (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method, a device and a system for realizing pseudo-live commentary, used to generate a video that comments on a live scene. A camera is used to shoot the commentary video. A memory stores the live video of the scene and the commentary video. A processor extracts, from the live video and the commentary video respectively, a live picture and a commentary picture corresponding to the same moment, crops the image of a specified object from the commentary picture, and superimposes that image onto the live picture to form a pseudo-live commentary picture. The invention provides a sense of immersion similar to on-site commentary, together with vivid forms of expression that neither on-site nor off-site commentary offers.

Description

Method, device and system for realizing pseudo-live commentary
Technical Field
The invention relates to the field of commentary video processing, and in particular to a method, a device and a system for realizing pseudo-live commentary.
Background
At present, competitions and events are commented on in only two ways: on-site commentary and off-site commentary.
On-site commentary offers the immersion of close-range narration at the venue, but the on-site viewing angle is limited and lacks the detailed observation angles available off-site, so the commentary is not rich in detail; moreover, because the freedom of movement of non-participants at a competition or event is restricted, many situations cannot be expressed well.
Off-site commentary lacks the close-range immersion of on-site commentary; although its observation angle is better, its forms of expression are limited and many situations still cannot be expressed well.
Therefore, there remains a need for a commentary scheme that provides a sense of immersion similar to on-site commentary while offering vivid forms of expression that neither on-site nor off-site commentary can provide.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, a device and a system for realizing pseudo-live commentary, which offer a sense of immersion similar to on-site commentary together with vivid forms of expression that neither on-site nor off-site commentary provides.
According to one aspect of the present invention, there is provided a system for generating a video that comments on a live scene. The system may include: a camera, which may be used to shoot the commentary video; a memory, which may be used to store the live video of the scene and the commentary video; and a processor, which may be used to extract, from the live video and the commentary video respectively, a live picture and a commentary picture corresponding to the same moment, crop the image of a specified object from the commentary picture, and superimpose the image of the specified object onto the live picture, thereby forming a pseudo-live commentary picture.
In this way, the image of a specified object is superimposed onto the live picture to form a pseudo-live commentary picture. For example, a tactical diagram of the two competing sides, or the commentator himself, can be superimposed onto the live picture as the specified object, which greatly enriches the commentary content, makes the commentary more vivid, and improves both its effectiveness and the audience's interest.
Preferably, the system may further comprise an audio acquisition device, which may be used to collect commentary audio data, that is, the audio track of the commentary video. The processor may extract live audio data and commentary audio data corresponding to the same moment from the live video and the commentary video respectively, synthesize composite audio data from the live audio data and the commentary audio data, and synthesize a pseudo-live commentary video from the pseudo-live commentary pictures and the composite audio data.
In this way, the pseudo-live commentary pictures and the composite audio data are combined into a complete pseudo-live commentary video. Through the specified object, for example the image of the commentator, indications can be made directly in the live picture, accompanied by the corresponding narration and dubbing, so that introductions of players and tactics and descriptions of on-site details become far more vivid; the commentator can even personally demonstrate actions such as running, positioning and tactical execution in the pseudo-live commentary video, which greatly enriches the commentary content and improves its effectiveness.
According to another aspect of the present invention, there is also provided an apparatus for generating a video that comments on a live scene. The apparatus may include: a picture extraction unit, which may be used to extract, from the live video and the commentary video respectively, a live picture and a commentary picture corresponding to the same moment; an image cropping unit, which may be used to crop the image of a specified object from the commentary picture; and a superimposing unit, which may be used to superimpose the image of the specified object onto the live picture, thereby forming a pseudo-live commentary picture.
Preferably, the superimposing unit may superimpose the image of the specified object onto the live picture based on the position of the image of the specified object within the commentary picture.
Preferably, the apparatus may further comprise an image processing unit, which may be configured to process the image of the specified object. The image processing may include at least one of the following: applying a selected special effect to the image of the specified object; processing the grayscale or background of the image of the specified object; and scaling the image of the specified object based on the size of the live picture.
Preferably, the apparatus may further comprise: an audio extraction unit, which may extract live audio data and commentary audio data corresponding to the same moment from the live video and the commentary video respectively; an audio synthesis unit, which may synthesize composite audio data from the live audio data and the commentary audio data; and an audio/video synthesis unit, which may synthesize the pseudo-live commentary video from the pseudo-live commentary pictures and the composite audio data.
According to yet another aspect of the present invention, there is also provided a method of generating a video that comments on a live scene. The method may include: extracting, from the live video and the commentary video respectively, a live picture and a commentary picture corresponding to the same moment; cropping the image of a specified object from the commentary picture; and superimposing the image of the specified object onto the live picture, thereby forming a pseudo-live commentary picture.
In this way, the image of a specified object is superimposed onto the live picture to form a pseudo-live commentary picture. For example, a tactical diagram of the two competing sides, or the commentator himself, can be superimposed onto the live picture as the specified object, which greatly enriches the commentary content, makes the commentary more vivid, and improves both its effectiveness and the audience's interest.
For example, the specified object may include the commentator, who may personally demonstrate running, positioning, tactical execution and so on in the commentary picture; superimposing the commentator's image onto the live picture to form a pseudo-live commentary picture lets the audience understand more clearly the actual movement of the contestants.
Preferably, the image of the specified object may be superimposed onto the live picture based on the position of the image of the specified object within the commentary picture.
For example, in the commentary video the commentator may stand in for the positions of the participants on the field in order to comment in detail on their actions. The image of the specified object may then be superimposed onto the live picture based on the commentator's position in the commentary picture (possibly with some offset). In a continuous sequence of commentary pictures, the superimposed image follows the commentator's movement, so that the commentator can illustrate, for example, the running of the players through his own motion.
Preferably, the shooting angle of the commentary video substantially coincides with that of the live video; alternatively, where the two shooting angles do not coincide, the method may further include performing an image conversion on the commentary picture so that it corresponds to the same shooting angle as the live video.
In this way, the image of the specified object in the commentary video can be superimposed onto the live picture on equal terms and blends into it better.
Preferably, the image of the specified object may also undergo image processing, which may include at least one of: applying a selected special effect to the image of the specified object; processing the grayscale or background of the image of the specified object; and scaling the image of the specified object based on the size of the live picture.
Processing the image of the specified object in this way lets it match the live picture better, or realizes various desired visual effects, producing a high-quality pseudo-live commentary picture.
Preferably, live audio data and commentary audio data corresponding to the same moment may be extracted from the live video and the commentary video respectively; composite audio data may be synthesized from the live audio data and the commentary audio data; and the pseudo-live commentary video may be synthesized from the pseudo-live commentary pictures and the composite audio data.
The resulting pseudo-live commentary video gives the user a sense of immersion similar to on-site commentary while providing vivid forms of expression that neither on-site nor off-site commentary offers, increasing the user's interest.
Preferably, the commentary audio data may also undergo audio processing.
Various special effects can be applied to the commentary audio data; for example, cheering and applause effects can be added, or the commentator's voice can be pitch-shifted to produce a desired sound effect, making the pseudo-live commentary video more vivid.
With the method, device and system for realizing pseudo-live commentary according to the invention, a sense of immersion similar to on-site commentary can be achieved together with vivid forms of expression that neither on-site nor off-site commentary possesses.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 is a block diagram of a video generation device according to one embodiment of the present invention.
Fig. 2 is a flowchart of a video generation method according to an embodiment of the present invention.
Fig. 3 is a flowchart of a video generation method according to another embodiment of the present invention.
Fig. 4 is a block diagram of a video generation system according to one embodiment of the invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Before describing the present invention, the concepts of "live video" and "commentary video" as used herein are briefly explained.
The live video is the video to be commented on. It may be video of a competition or event venue (for example, a football match), shot or captured by an on-site camera or by a terminal device used by an on-site user (for example, a mobile phone or iPad). The live video may also be video of a game scene; the game may be a virtual game such as a mobile or web game, or a live-action game (for example, live-action CS). For virtual games such as mobile and web games, the live video can be obtained by filming or screen-capturing the game device.
The commentary video may be shot off-site or on-site. For reasons of venue, equipment and the like, richer commentary effects can usually be achieved when shooting off-site. The commentary video may be shot or captured with a camera or any other device with a shooting function. It may contain only the commentator, or other related content as well. For example, when commenting on a football match, the commentator may personally demonstrate running, positioning, tactical execution and other actions, or present a tactical diagram of the match, for the sake of better commentary; the commentary video then includes the commentator, the persons involved in the action, the tactical diagram and so on.
In addition, live videos and commentary videos can be downloaded from a server. The server (for example, a portal or video site) can provide various video sources, enriching the pool of live and commentary material.
The server may also provide an upload service so that users can upload live videos and commentary videos to it.
In particular, the server may serve specific persons (for example, those responsible for the video of a competition or event), or it may serve general users. A person who owns a live or commentary video can upload it to the server, and anyone who needs one can download it. Thus, a given live video may have one or more commentary videos: for the live video of one match, different commentators may comment in different ways, yielding multiple commentary videos.
Fig. 1 is a block diagram of a video generation device according to one embodiment of the present invention.
Fig. 2 is a flowchart of a video generation method according to an embodiment of the present invention.
As shown in fig. 1, the video generation apparatus 10 of the present invention may include a picture extraction unit 11, an image cropping unit 12, and a superimposing unit 14.
As shown in fig. 2, in step S100 a live picture and a commentary picture corresponding to the same moment can be extracted from the live video and the commentary video respectively, for example by the picture extraction unit 11.
The live video and the commentary video are obtained as described above, and this is not repeated here.
Extracting the live picture and the commentary picture at the same moment keeps the commentary picture synchronized with the live picture, avoiding as far as possible any mismatch between the content of the two.
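As a minimal sketch (not part of the patent), picking the frames for "the same moment" reduces to mapping one timestamp to a frame index in each video; the function name and frame rates below are illustrative assumptions:

```python
def frame_index(t_seconds, fps):
    """Map a timestamp to the nearest frame index in a video with the given frame rate."""
    return round(t_seconds * fps)

# Frames for the same moment t in both videos, even when their frame
# rates differ (e.g. live video at 30 fps, commentary video at 25 fps):
live_idx = frame_index(12.4, 30)   # index into the live video
comm_idx = frame_index(12.4, 25)   # index into the commentary video
```

A real implementation would decode those two frames from their respective streams; the point is only that synchronization is by shared timestamp, not by shared index.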
Further, when the commentary video and the live video are shot, the shooting angle of the commentary video preferably substantially coincides with that of the live video. Here, the shooting angle is the angle between the shooting direction (the optical axis of the camera) and a reference direction (for example, the horizontal plane, or a vertical line perpendicular to the ground). For example, if the live video is shot with the camera pointing at the ground at 45° from the horizontal, the commentary video should preferably also be shot pointing at the ground at 45° from the horizontal.
Where the shooting angles of the commentary video and the live video do not coincide, the commentary pictures may be subjected to an image conversion so that they correspond to the same shooting angle as the live video. For example, the live video may be shot pointing at the ground at 45° from the horizontal while the commentary video is shot horizontally toward the commentator's area. The commentary picture then needs to be converted to correspond to the live video's shooting angle. Any distortion in the converted picture can be further processed so that it blends better into the live picture.
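Such a viewpoint conversion between two camera angles is commonly modeled as a planar homography; the following is an illustrative sketch (not from the patent) of mapping a single commentary-picture point through an assumed 3x3 homography matrix `H`:

```python
def warp_point(H, x, y):
    """Map a point (x, y) in the commentary picture through a 3x3
    homography H relating the commentary camera view to the live
    camera view, returning its position in the live picture."""
    xn = H[0][0] * x + H[0][1] * y + H[0][2]
    yn = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xn / w, yn / w

# With the identity homography the point is unchanged:
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

In practice `H` would be estimated from correspondences between the two views (or from the known camera poses), and the full commentary picture would be resampled through it rather than single points.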
The commentary video is preferably shot against a solid-color background, which makes later processing of the commentary picture easier. The aspect ratio can be kept at around 16:9.
In step S200, the image of the specified object may be cropped from the commentary picture, for example by the image cropping unit 12.
The extracted commentary picture may contain both the area occupied by the commentator and the areas occupied by auxiliary objects (such as a football, a table, or a tactical diagram). These areas may be the important parts of the commentary picture, so they are referred to as images of the specified object, and they are cropped from the commentary picture. Other parts, such as the background, may or may not be cropped, as needed.
The specified objects in the commentary pictures at different moments may differ, and so may the cropped images: at one moment the area occupied by the commentator may be cropped, and at the next the area occupied by a tactical diagram the commentator has presented.
The specified object may also be a single region covering both the commentator and an auxiliary object. For example, if the commentator uses a football to analyze a player's movements while commenting on a match, the commentator and the football form a whole, and it may be necessary to crop the image of the entire area they occupy. Alternatively, the areas occupied by the commentator and the football can be cropped separately and the two images combined in later processing. The image of the specified object cropped at any given moment is not particularly limited.
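With the solid-color backdrop mentioned above, separating the specified object from the background amounts to simple chroma keying. A minimal sketch (not from the patent; the backdrop color, tolerance, and function names are illustrative assumptions):

```python
BACKDROP = (0, 255, 0)  # hypothetical solid green backdrop colour

def is_backdrop(pixel, backdrop=BACKDROP, tol=60):
    """A pixel whose Euclidean colour distance to the backdrop is
    below tol is treated as background."""
    return sum((a - b) ** 2 for a, b in zip(pixel, backdrop)) ** 0.5 < tol

def object_mask(frame):
    """frame: rows of (r, g, b) pixels -> boolean mask that is True
    wherever a pixel belongs to the specified object (not backdrop)."""
    return [[not is_backdrop(p) for p in row] for row in frame]
```

A production system would use a proper keyer (soft edges, spill suppression), but the mask produced here is exactly what the superimposition step consumes.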
In step S300, the image of the specified object may be superimposed onto the live picture, for example by the superimposing unit 14, thereby forming a pseudo-live commentary picture.
The image of the specified object cropped from the commentary picture at a given moment is superimposed onto the live picture at that same moment to form the pseudo-live commentary picture for that moment. The image of the specified object can be placed at a suitable position in the live picture according to the circumstances: the lower-left corner, the lower-right corner, the left or right side of the whole live picture, or elsewhere. Alternatively, when the commentary concerns a specific contestant, the image of the specified object may be superimposed next to that contestant in the live picture so that the audience watching the pseudo-live commentary can tell more easily whom the commentator is referring to. The exact position and manner of superimposition are not limited.
When superimposing, the cropped images of the specified objects can be superimposed onto the live picture as a whole, or the separately cropped images can be reprocessed first and then superimposed.
The superimposing unit 14 may also superimpose the image of the specified object onto the live picture based on its position in the commentary picture. For example, if the specified object is in the middle of the commentary picture, its image is superimposed in the middle of the live picture.
Alternatively, the superimposing unit 14 may superimpose the images based on the relative positions of the specified objects in the commentary picture. For example, in football commentary a commentator may use several specified objects to explain the match more vividly; when superimposing, the images of these objects need to be placed in the live picture according to their relative positions in the commentary picture. If they are not superimposed according to those relative positions, the narration in the pseudo-live commentary video may become unclear.
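The superimposition itself can be sketched as a masked copy of object pixels onto the live picture at a chosen offset. This is an illustrative stand-in (not from the patent); real compositing would blend with soft alpha rather than copy pixels outright:

```python
def superimpose(live, obj, mask, top, left):
    """Copy the masked object pixels onto a copy of the live picture,
    with the object's top-left corner placed at (top, left).
    live/obj: rows of pixels; mask: rows of booleans (True = keep)."""
    out = [row[:] for row in live]          # leave the live picture intact
    for i, mrow in enumerate(mask):
        for j, keep in enumerate(mrow):
            if keep and 0 <= top + i < len(out) and 0 <= left + j < len(out[0]):
                out[top + i][left + j] = obj[i][j]
    return out
```

Choosing `(top, left)` from the object's position in the commentary picture gives the position-preserving variant described above; choosing a fixed corner gives the picture-in-picture variant.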
As shown in fig. 1, the video generation apparatus 10 may further include an image processing unit 13.
In the method shown in fig. 2, after step S200 and before step S300, the image processing unit 13 may process the image of the specified object.
It will be appreciated that the image processing unit 13 and its operations are not essential to implementing the solution of the invention.
The image processing unit 13 may apply a selected special effect to the image of the specified object.
Special-effect processing refers to effects that do not normally occur in reality. The selected effect may exaggerate the form or actions of the specified object, for example enlarging the commentator's head or exaggerating a jump. It may also slow down an action of the specified object so that it is displayed slowly and seen more clearly by the audience, or repeat an action several times to emphasize it. For instance, when a player completes a beautiful shot at a critical moment of a football match, the shooting action can be slowed down for a clear, unhurried display, or replayed repeatedly to highlight it. Other special effects are possible and are not particularly limited here.
The image processing unit 13 may also process the grayscale or background of the image of the specified object.
Owing to differences in shooting location, shooting angle, brightness and so on, the live picture and the commentary picture at the same moment will not match perfectly. The grayscale or background of the image of the specified object can therefore be processed. For example, to present a high-quality picture of a football match, the live video is shot with optimal parameters, such as the brightness and color parameters corresponding to a high-quality picture; to match the commentary picture to those parameters, the grayscale parameters of the image of the specified object may be adjusted. As another example, the solid-color background of the commentary picture may be replaced with a background that matches the live picture.
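A crude sketch of such a brightness match (an illustrative stand-in, not the patent's method, for full color grading): scale the object's pixel intensities so their mean equals the live picture's mean brightness.

```python
def match_brightness(obj_pixels, target_mean):
    """Scale object pixel intensities (0-255) so their mean brightness
    matches target_mean, the mean of the live picture; results are
    clamped back into the 0-255 range."""
    mean = sum(obj_pixels) / len(obj_pixels)
    gain = target_mean / mean if mean else 1.0
    return [min(255, max(0, round(p * gain))) for p in obj_pixels]
```

A real pipeline would match per-channel histograms or apply a color transfer, but the goal is the same: the superimposed object should not look brighter or darker than its surroundings.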
The image processing unit 13 may also scale the image of the specified object based on the size of the live picture.
For example, when the specified object is a person, the person in the commentary picture at a given moment can be scaled to the same proportion as the live picture so that the image blends in better. When commenting on a table-tennis match, if the specified object cropped from the commentary picture is the ball, the image may need to be scaled with the ball's small size in mind and, where necessary, appropriately enlarged.
The images of several specified objects can be scaled as a whole; or, when they are superimposed separately, each image can be scaled according to the size of the live picture and then superimposed onto it according to its position in the commentary picture.
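As a minimal illustration (not from the patent), the scaling step can be sketched as nearest-neighbour resampling of the cropped image by a factor derived from the live picture's size:

```python
def scale(frame, factor):
    """Nearest-neighbour scaling of a picture given as rows of pixels."""
    h, w = len(frame), len(frame[0])
    nh, nw = max(1, int(h * factor)), max(1, int(w * factor))
    return [[frame[int(i / factor)][int(j / factor)] for j in range(nw)]
            for i in range(nh)]
```

Bilinear or bicubic interpolation would look better in practice; nearest-neighbour is used here only to keep the sketch self-contained.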
The implementation of pseudo-live commentary according to the present invention has so far been described in detail with reference to figs. 1 and 2. The scheme above mainly concerns superimposing the image of the specified object from the commentary picture onto the live picture. It should be appreciated that the commentary audio and the live audio may also be synthesized.
As shown in fig. 1, the video generation apparatus 10 of the present invention may further include an audio extraction unit 15, an audio synthesis unit 16, and an audio/video synthesis unit 17.
Fig. 3 is a flowchart of a video generation method according to another embodiment of the present invention.
For steps S100, S200 and S300 shown in fig. 3, and for the image processing performed between steps S200 and S300, reference may be made to the corresponding description of fig. 2 above; they are not repeated here. Only the steps not mentioned above are described below. Note that there is no strict ordering between the steps that produce the pseudo-live commentary picture (steps S100 to S300) and the steps that form the composite audio data (steps S400 and S500); they may be performed simultaneously or one after the other, in either order.
As shown in fig. 3, in step S400, live audio data and commentary audio data corresponding to the same time may be extracted from the live video and the commentary video, respectively, by the audio extraction unit 15, for example.
The live audio data of the live video may include all audio data of the live, including for example audio data of a player's self-encouragement, or audio data of a live audience cheering, audio data of a referee in the live, a broadcast notification, etc.
The audio extraction unit 15 may also automatically filter some non-essential live audio data when extracting the live audio data. For example, when a person speaks near a video capture device, the content of the utterance is easily captured, but the content has no significance for generating live video, and then automatically filtering the noise while extracting the live audio data may make the resulting pseudo-live commentary video clearer.
The commentary audio data may include audio data of commentary by the commentator, audio data of interaction between the commentator and other people, audio data generated when the commentator performs certain actions, and the like.
When extracting commentary audio data, since any audio captured during the commentary may be important for the pseudo live commentary, the audio extraction unit 15 may extract all of it. Of course, the audio extraction unit 15 may instead automatically filter out unimportant parts of the commentary audio data.
Extracting live audio data and commentary audio data that correspond to the same time ensures that the live audio matches the commentary audio.
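Assuming both recordings start at the same wall-clock moment and share one sample rate, extracting same-time audio reduces to taking the same sample window from each PCM stream. A minimal sketch; the function name and its parameters are illustrative, not from the patent.

```python
def extract_aligned_audio(live_pcm, commentary_pcm, start_s, dur_s, rate=44100):
    # Both streams are assumed to begin at the same moment and to use the
    # same sample rate, so one index window selects same-time audio in each.
    i = int(start_s * rate)
    n = int(dur_s * rate)
    return live_pcm[i:i + n], commentary_pcm[i:i + n]
```

If the two recordings do not start simultaneously, a per-stream offset would first have to be established (e.g. from container timestamps) before the windows line up.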
In step S500, the composite audio data may be synthesized based on the live audio data and the commentary audio data, for example, by the audio synthesis unit 16.
The audio synthesis unit 16 may directly synthesize composite audio data based on the live audio data and the commentary audio data.
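Direct synthesis can be as simple as a sample-wise sum of the two tracks, clipped to the sample range. The following is a sketch for signed 16-bit PCM; the gain parameter is an illustrative addition, not part of the patent.

```python
def mix_pcm(live, commentary, comment_gain=1.0):
    # Sample-wise sum of the two tracks, clipped to the signed 16-bit range
    # so that loud passages do not wrap around.
    out = []
    for l, c in zip(live, commentary):
        s = int(l + comment_gain * c)
        out.append(max(-32768, min(32767, s)))
    return out
```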
The audio synthesizing unit 16 may further perform audio processing on the commentary audio data, and then synthesize composite audio data based on the processed commentary audio data and the live audio data.
The audio processing may include filtering out unimportant parts of the commentary audio data, such as incidental chatter from people near the audio capture device. It may also include surround-sound processing of the commentary audio data to improve sound quality and give the audience a sense of being physically present. It may further include various special-effect processing, for example adding sound effects such as cheering and applause, or pitch-shifting the commentator's voice to produce a desired effect, so that the pseudo live commentary video is more vivid.
The audio processing may also include adjusting the volume of the commentary audio. For example, when a soccer match is being commentated, the commentator's voice at an exciting moment may be drowned out by audience cheering in the live audio data, so that it cannot be heard clearly when the pseudo live commentary video is played. In that case the volume of the commentary audio data should be adjusted relative to the live audio data at the same moment, so that the commentary is not drowned out.
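One way to keep the commentary above the crowd noise is window-by-window automatic gain: measure the loudness (RMS) of the live track, and raise the commentary gain wherever the commentary would otherwise be masked. A sketch under assumed parameters; the window size and target ratio are illustrative choices, not values from the patent.

```python
import math

def _rms(samples):
    # Root-mean-square loudness of one window of PCM samples.
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def auto_level_commentary(live, commentary, window=1024, target_ratio=1.5):
    # Per window: if the commentary RMS falls below target_ratio times the
    # live RMS, boost the commentary so the crowd noise does not mask it.
    out = []
    for i in range(0, len(commentary), window):
        lw = live[i:i + window]
        cw = commentary[i:i + window]
        l_rms, c_rms = _rms(lw), _rms(cw)
        gain = 1.0
        if c_rms and target_ratio * l_rms > c_rms:
            gain = target_ratio * l_rms / c_rms
        out.extend(int(s * gain) for s in cw)
    return out
```

A production implementation would smooth the gain between windows to avoid audible pumping; that refinement is omitted here for brevity.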
In step S600, the pseudo live commentary video may be synthesized based on the pseudo live commentary picture and the composite audio data, for example by the audio-video synthesis unit 17.
During synthesis, care must be taken to keep the pseudo live commentary picture and the composite audio matched: the pseudo live commentary video is synthesized from the pseudo live commentary picture and the composite audio data corresponding to the same moment.
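Keeping picture and audio matched amounts to pairing each pseudo live commentary frame with the composite audio samples spanning the same moment. A minimal sketch; the frame rate and sample rate are illustrative, and a real container muxer (e.g. for MP4 or MKV) would interleave these pairs with proper timestamps.

```python
def mux(frames, composite_audio, fps=25, rate=44100):
    # Each video frame covers 1/fps seconds, i.e. rate // fps audio samples;
    # pairing by index keeps picture and sound aligned to the same moment.
    spf = rate // fps
    return [(frame, composite_audio[i * spf:(i + 1) * spf])
            for i, frame in enumerate(frames)]
```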
In this way, the pseudo live commentary picture and the composite audio data are synthesized into a complete pseudo live commentary video. The specified object, such as the commentator's image, can point things out within the live picture, accompanied by the corresponding commentary, dubbing, and so on. Character introductions, tactical analysis, and descriptions of on-scene details become more vivid, and the commentator can even personally demonstrate actions such as running, positioning, or executing tactics inside the pseudo live commentary video. This greatly enriches the commentary content and improves commentary efficiency.
So far, the pseudo live commentary scheme of the present invention has been described in detail with reference to figs. 1 to 3. This scheme may be implemented in a video generation system as described below.
Fig. 4 is a block diagram of a video generation system according to one embodiment of the invention.
As shown in fig. 4, the video generation system 50 of the present invention may include a camera 51, a memory 52, and a processor 53.
The camera 51 is used to capture the commentary video. The device that captures this video may be any device with a camera function, such as a mobile phone or an iPad.
In addition, a separate camera may be used to capture the live video. That camera need not belong to the system of the present invention; the system 50 may obtain the live video over a network or the like.
The memory 52 is used to store the live video and the commentary video of the live scene.
The memory 52 may store live video and commentary video captured by the camera, as well as live video and commentary video downloaded from a server. The memory 52 may also upload stored live and commentary videos to the server.
The processor 53 is configured to extract a live picture and a commentary picture corresponding to the same moment from the live video and the commentary video, respectively, capture an image of a specified object from the commentary picture, and superimpose the image of the specified object onto the live picture, thereby forming a pseudo live commentary picture.
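The processor's scale-and-superimpose step can be sketched on images represented as 2-D pixel grids. Nearest-neighbour scaling and the transparent-pixel convention below are illustrative choices (the patent does not specify a scaling algorithm or matting method).

```python
def scale_nearest(img, new_h, new_w):
    # Nearest-neighbour scaling so the commentator cut-out fits the live frame.
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def overlay(live, cutout, top, left, transparent=0):
    # Copy the live frame, then paste the cut-out at (top, left); pixels equal
    # to `transparent` (e.g. a keyed-out background) are skipped.
    out = [row[:] for row in live]
    for dy, row in enumerate(cutout):
        for dx, px in enumerate(row):
            if px != transparent:
                out[top + dy][left + dx] = px
    return out
```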
In addition, the system may further include an audio acquisition device 54 for collecting commentary audio data, i.e. the audio data of the commentary video. The device that collects the commentary audio data may likewise be any device with a sound-collection function, such as a mobile phone or an iPad.
In addition, another audio acquisition device may collect the live audio data, i.e. the audio data of the live video. That device need not belong to the system of the present invention; the system may obtain the live audio data over a network or the like.
The memory 52 may also be used to store the live audio data and the commentary audio data.
The processor 53 may also extract live audio data and commentary audio data corresponding to the same time from the live video and the commentary video, respectively, synthesize composite audio data based on the live audio data and the commentary audio data, and synthesize a pseudo-live commentary video based on the pseudo-live commentary picture and the composite audio data.
In this embodiment, the processor 53 may execute the method described in detail above with reference to fig. 2 to fig. 3, which is not described herein again.
The pseudo live commentary implementation method, apparatus, and system according to the present invention have been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the invention may also be implemented as a computer program comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention. Alternatively, the method according to the present invention may also be implemented as a computer program product comprising a computer readable medium having stored thereon a computer program for executing the above-mentioned functions defined in the above-mentioned method of the present invention. Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (9)

1. A system for generating a commentary video of a live scene, comprising:
a camera for capturing an off-site commentary video;
a memory for storing a live video of the live scene and the commentary video;
an audio acquisition device for collecting commentary audio data, the commentary audio data being the audio data of the commentary video; and
a processor for extracting a live picture and a commentary picture corresponding to the same moment from the live video and the commentary video, respectively, capturing an image of a specified object from the commentary picture, scaling the image of the specified object based on the size of the live picture, and superimposing the scaled image of the specified object onto the live picture, thereby forming a pseudo live commentary picture for that moment,
wherein the processor extracts live audio data and the commentary audio data corresponding to the same moment from the live video and the commentary video, synthesizes composite audio data based on the live audio data and the commentary audio data, and synthesizes a pseudo live commentary video based on the pseudo live commentary picture and the composite audio data at the same moment.
2. An apparatus for generating a commentary video of a live scene, comprising:
a picture extraction unit for extracting a live picture and a commentary picture corresponding to the same moment from a live video and an off-site commentary video, respectively;
an image capturing unit for capturing an image of a specified object from the commentary picture;
an image processing unit for performing image processing on the image of the specified object, the image processing including: scaling the image of the specified object based on the size of the live picture;
a superimposing unit for superimposing the processed image of the specified object onto the live picture so as to form a pseudo live commentary picture for that moment;
an audio extraction unit for extracting live audio data and commentary audio data corresponding to the same moment from the live video and the commentary video, respectively;
an audio synthesis unit for synthesizing composite audio data based on the live audio data and the commentary audio data; and
an audio-video synthesis unit for synthesizing a pseudo live commentary video based on the pseudo live commentary picture and the composite audio data at the same moment.
3. The apparatus of claim 2, wherein:
the superimposing unit superimposes the image of the specified object onto the live picture based on the position of the image of the specified object in the commentary picture.
4. The apparatus of claim 2, the image processing further comprising at least one of:
the image processing unit performing selected special-effect processing on the image of the specified object;
the image processing unit processing the grayscale or the background of the image of the specified object.
5. A method of generating a commentary video of a live scene, the method comprising:
extracting a live picture and a commentary picture corresponding to the same moment from a live video of the live scene and an off-site commentary video, respectively;
capturing an image of a specified object from the commentary picture;
performing image processing on the image of the specified object, the image processing including: scaling the image of the specified object based on the size of the live picture;
superimposing the processed image of the specified object onto the live picture so as to form a pseudo live commentary picture for that moment;
extracting live audio data and commentary audio data corresponding to the same moment from the live video and the commentary video, respectively;
synthesizing composite audio data based on the live audio data and the commentary audio data; and
synthesizing a pseudo live commentary video based on the pseudo live commentary picture and the composite audio data at the same moment.
6. The method of claim 5, wherein
the image of the specified object is superimposed onto the live picture based on the position of the image of the specified object in the commentary picture.
7. The method of claim 5, wherein
the shooting angle of the commentary video is substantially consistent with that of the live video; or,
in the case that the shooting angles are not consistent, the method further comprises: performing image conversion processing on the commentary picture so that it corresponds to the same shooting angle as the live video.
8. The method of claim 5, the image processing further comprising at least one of:
performing selected special-effect processing on the image of the specified object;
processing the grayscale or the background of the image of the specified object.
9. The method of claim 5, further comprising:
performing audio processing on the commentary audio data.
CN201611052152.1A 2016-11-24 2016-11-24 Method, device and system for realizing pseudo field explanation Expired - Fee Related CN106534618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611052152.1A CN106534618B (en) 2016-11-24 2016-11-24 Method, device and system for realizing pseudo field explanation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611052152.1A CN106534618B (en) 2016-11-24 2016-11-24 Method, device and system for realizing pseudo field explanation

Publications (2)

Publication Number Publication Date
CN106534618A CN106534618A (en) 2017-03-22
CN106534618B (en) 2020-05-12

Family

ID=58357045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611052152.1A Expired - Fee Related CN106534618B (en) 2016-11-24 2016-11-24 Method, device and system for realizing pseudo field explanation

Country Status (1)

Country Link
CN (1) CN106534618B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107888843A (en) * 2017-10-13 2018-04-06 深圳市迅雷网络技术有限公司 Sound mixing method, device, storage medium and the terminal device of user's original content
CN108632543B (en) * 2018-03-26 2020-07-07 Oppo广东移动通信有限公司 Image display method, image display device, storage medium and electronic equipment
CN110392273B (en) * 2019-07-16 2023-08-08 北京达佳互联信息技术有限公司 Audio and video processing method and device, electronic equipment and storage medium
CN112822544B (en) * 2020-12-31 2023-10-20 广州酷狗计算机科技有限公司 Video material file generation method, video synthesis method, device and medium
CN114979741A (en) * 2021-02-20 2022-08-30 腾讯科技(北京)有限公司 Method and device for playing video, computer equipment and storage medium
CN114449252B (en) * 2022-02-12 2023-08-01 北京蜂巢世纪科技有限公司 Method, device, equipment, system and medium for dynamically adjusting field video based on explanation audio

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105791958A (en) * 2016-04-22 2016-07-20 北京小米移动软件有限公司 Method and device for live broadcasting game
CN105959719A (en) * 2016-06-27 2016-09-21 徐文波 Video live broadcast method, device and system
CN106162221A (en) * 2015-03-23 2016-11-23 阿里巴巴集团控股有限公司 The synthetic method of live video, Apparatus and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN105654471B (en) * 2015-12-24 2019-04-09 武汉鸿瑞达信息技术有限公司 Augmented reality AR system and method applied to internet video live streaming



Similar Documents

Publication Publication Date Title
CN106534618B (en) Method, device and system for realizing pseudo field explanation
CN109889914B (en) Video picture pushing method and device, computer equipment and storage medium
CN107852476B (en) Moving picture playback device, moving picture playback method, moving picture playback system, and moving picture transmission device
US10382680B2 (en) Methods and systems for generating stitched video content from multiple overlapping and concurrently-generated video instances
CN107105315A (en) Live broadcasting method, the live broadcasting method of main broadcaster's client, main broadcaster's client and equipment
KR101713772B1 (en) Apparatus and method for pre-visualization image
CN105939481A (en) Interactive three-dimensional virtual reality video program recorded broadcast and live broadcast method
US8885022B2 (en) Virtual camera control using motion control systems for augmented reality
JP2020086983A (en) Image processing device, image processing method, and program
KR20150105058A (en) Mixed reality type virtual performance system using online
KR101817145B1 (en) system and method for chroma-key composing using multi-layers
JP4981370B2 (en) Movie generation system and movie generation method
US20110304735A1 (en) Method for Producing a Live Interactive Visual Immersion Entertainment Show
CN112426712A (en) Method and device for generating live broadcast picture of game event
JP2009076060A (en) Image composition apparatus and image composition processing program
CN113395540A (en) Virtual broadcasting system, virtual broadcasting implementation method, device and equipment, and medium
EP2515548A1 (en) A competition tracking system
CN107172413A (en) Method and system for displaying video of real scene
JP2005020607A (en) Composite image output device and composite image output processing program
JP2009088729A (en) Composite image output device and composite image output processing program
JP2009086785A (en) Composite image output device and composite image output processing program
CN117793324A (en) Virtual rebroadcast reconstruction system, real-time generation system and pre-generation system
CN110213560A (en) A kind of immersion video broadcasting method and system
KR102200239B1 (en) Real-time computer graphics video broadcasting service system
CN114288645A (en) Picture generation method, system, device and computer storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200812

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping radio square B tower 13 floor 02 unit self

Patentee before: Guangzhou Aijiuyou Information Technology Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200512

Termination date: 20201124