CN107241646B - Multimedia video editing method and device - Google Patents


Info

Publication number
CN107241646B
Authority
CN
China
Prior art keywords
video
data
image
audio
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710566432.2A
Other languages
Chinese (zh)
Other versions
CN107241646A (en)
Inventor
邵可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201710566432.2A priority Critical patent/CN107241646B/en
Publication of CN107241646A publication Critical patent/CN107241646A/en
Application granted granted Critical
Publication of CN107241646B publication Critical patent/CN107241646B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a method and a device for editing multimedia video, relates to the field of multimedia technology, and mainly aims to solve the problem that a short video clipped from an existing live broadcast or short video cannot be edited. The main technical scheme comprises: acquiring a multimedia file; decoding the video data and audio data in the multimedia file; rendering the video data and performing audio-track processing on the audio data; and encoding the processed video data and audio data to obtain the multimedia video. The method is mainly used for editing multimedia video.

Description

Multimedia video editing method and device
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a method and an apparatus for editing a multimedia video.
Background
With the rapid development of internet technology, people are no longer satisfied with communicating through mobile phone calls alone; social platforms built on multimedia technologies such as online live broadcast and short videos have become the main means of communication among users.
Currently, when a user performs a live broadcast or records a short video on a terminal device, a small segment of the video can be clipped out and stored. For example, if a girl is dancing on a live broadcast platform and the user wants to keep a recording of her spin, the short clip of the spin must be cut out of the live video. Once such a clip has been captured, editing it to enhance the playing effect of its content becomes an urgent problem to be solved.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for editing a multimedia video, and mainly aims to solve the problem that a short video captured from an existing live video or a small video cannot be edited.
According to an aspect of the present invention, there is provided a multimedia video editing method, including:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data;
and coding the processed video data and the processed audio data to obtain the multimedia video.
Further, the rendering the video data and the audio track processing the audio data comprise:
receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier;
rendering the video data according to the video effect identifier in the effect identifier, and processing the audio data according to the audio effect identifier in the effect identifier.
Further, the rendering the video data according to the video effect identifier in the effect identifiers comprises:
extracting image data of each frame in the video data, and carrying out filter processing on the image data;
and identifying a target image in the image data after filter processing according to the video effect identifier, and performing synthesis rendering on the target image.
Further, the identifying a target image in the image data after filter processing according to the video effect identifier, and performing composite rendering on the target image includes:
if the video effect identifier is recognized to be a composite three-dimensional image, segmenting the target image, and coloring and synthesizing the target image, the segmented target image and the rendering image according to a preset coloring rule, wherein the preset coloring rule is used for reflecting the position display relation among the target image, the segmented target image and the rendering image.
Further, the processing the audio data according to the audio effect identifier in the effect identifiers comprises:
collecting discrete audio track data in the audio data according to a preset time interval;
and effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
Further, the decoding video data and audio data in the multimedia file comprises:
and respectively decoding the video data and the audio data in the multimedia file according to the video track and the audio track.
Further, after the rendering processing of the video data and the audio track processing of the audio data, the method further includes:
and when a real-time preview request is received, displaying the video data and the audio data.
Further, the method further comprises:
and receiving a speed adjusting instruction, and adjusting the playing speed of video data and audio data in the multimedia video according to speed information carried in the speed adjusting instruction.
According to an aspect of the present invention, there is provided an editing apparatus for multimedia video, comprising:
the acquiring unit is used for acquiring the multimedia file;
a decoding unit for decoding video data and audio data in the multimedia file;
the processing unit is used for rendering the video data and carrying out audio track processing on the audio data;
and the coding unit is used for coding the processed video data and the processed audio data to obtain the multimedia video.
Further, the processing unit includes:
the receiving module is used for receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier;
and the processing module is used for rendering the video data according to the video effect identifier in the effect identifier and processing the audio data according to the audio effect identifier in the effect identifier.
Further, the processing module comprises:
the extraction submodule is used for extracting the image data of each frame in the video data and carrying out filter processing on the image data;
and the synthesis submodule is used for identifying a target image in the image data after the filter processing according to the video effect identifier and performing synthesis rendering on the target image.
The synthesis sub-module is specifically configured to, if it is identified that the video effect identifier is a synthesized stereo image, segment the target image, and perform color synthesis on the target image, the segmented target image, and the rendered image according to a preset coloring rule, where the preset coloring rule is used to reflect a position display relationship among the target image, the segmented target image, and the rendered image.
Further, the processing module further comprises:
the acquisition submodule is used for acquiring discrete audio track data in the audio data according to a preset time interval;
and the superposition submodule is used for effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
The decoding unit is specifically configured to decode video data and audio data in the multimedia file according to a video track and an audio track, respectively.
Further, the apparatus further comprises:
and the display unit is used for displaying the video data and the audio data when a real-time preview request is received.
Further, the apparatus further comprises:
and the adjusting unit is used for receiving the speed adjusting instruction and adjusting the playing speed of the video data and the audio data in the multimedia video according to the speed information carried in the speed adjusting instruction.
According to one aspect of the invention, there is provided a memory device having stored therein a plurality of instructions adapted to be loaded and executed by a processor to:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data;
and coding the processed video data and the processed audio data to obtain the multimedia video.
According to one aspect of the present invention, there is provided a mobile terminal comprising a processor adapted to implement various instructions; and a storage device adapted to store a plurality of instructions, the instructions adapted to be loaded and executed by the processor to:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data;
and coding the processed video data and the processed audio data to obtain the multimedia video.
By the technical scheme, the technical scheme provided by the embodiment of the invention at least has the following advantages:
the invention provides a method and a device for editing a multimedia video, which are characterized by firstly obtaining a multimedia file, then decoding video data and audio data in the multimedia file, then rendering the video data, carrying out audio track processing on the audio data, and finally coding the processed video data and the processed audio data to obtain the multimedia video. Compared with the existing method that the short video intercepted from the live video or the small video cannot be edited, the method and the device provided by the embodiment of the invention respectively process the video data and the audio data by decoding the video data and the audio data in the multimedia file, and realize the editing of the live video or the intercepted video after the video is coded into the multimedia video, thereby increasing the playing effect of the short video, enabling the video to be more vivid, enabling characters in the edited video to be more attached to the rendered image, and improving the use efficiency of the video.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating an editing method for a multimedia video according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating another multimedia video editing method according to a second embodiment of the present invention;
fig. 3 shows a block diagram of an editing apparatus for multimedia video according to a third embodiment of the present invention;
fig. 4 shows a block diagram of another apparatus for editing multimedia video according to the fourth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a method for editing a multimedia video, as shown in fig. 1, the method includes:
101. and acquiring the multimedia file.
The multimedia file may be a video file in any of various formats, such as MP4, MKV, or 3GP; the embodiment of the present invention is not specifically limited in this respect. The multimedia file may be obtained by shooting with a camera of the terminal device, captured from an online live video, or extracted directly from the storage space of the terminal device; this, too, is not specifically limited.
It should be noted that, to make the multimedia video convenient to edit and to use on the terminal device, a maximum playing time should be set when the video is captured or recorded. The finally generated multimedia video is then a short video, which allows the present editing method to run on terminal devices with limited memory.
102. And decoding video data and audio data in the multimedia file.
Decoding may be realized by reading the video data and the audio data in the multimedia file separately, that is, by decoding the video stream and the audio stream back into raw video data and raw audio data.
It should be noted that, in the embodiment of the present invention, the decoding process may be completed by a single media decoder: the multimedia file is handed to the media decoder, and the video data and the audio data are obtained automatically.
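The single-decoder flow above can be sketched as follows. This is an illustrative stand-in, not the patent's implementation: a real terminal device would hand the file to a platform decoder such as Android's MediaCodec or FFmpeg, and the `Packet` type and `demux_and_decode` function here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Packet:
    track: str       # "video" or "audio"
    payload: bytes   # stand-in for an encoded frame or audio chunk

def demux_and_decode(packets: List[Packet]) -> Tuple[List[bytes], List[bytes]]:
    """Split a multimedia stream into its video track and audio track
    (step 102). A real decoder would also decompress each payload."""
    video_frames = [p.payload for p in packets if p.track == "video"]
    audio_samples = [p.payload for p in packets if p.track == "audio"]
    return video_frames, audio_samples
```

The per-track split mirrors step 202 below, where the video track and audio track are decoded separately so each can be processed on its own.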
103. Rendering the video data and performing audio track processing on the audio data.
The rendering processing includes adding a bitmap in a video, adding a dynamic image, adjusting an image effect of the video image, and the like, and the audio track processing includes adding different audio tracks for combination, adding a sound effect, and the like.
Video data and audio data may be processed separately or in association with each other. For example, when adding a background to a video, only the video data needs background rendering and the audio track data is left untouched. By contrast, when spoken words are to be displayed in the video, the audio track data must be processed first: the speech in the audio track is recognized and converted into text, the corresponding glyph images from the character library are added to the video data, and the video data and audio track data are thereby processed together.
104. And coding the processed video data and the processed audio data to obtain the multimedia video.
And the coding is a coding for matching the rendered video data with the rendered audio data so as to obtain a multimedia video corresponding to the smooth audio and video.
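Steps 101 to 104 can be summarized as a small pipeline. This is a sketch under stated assumptions: `render` and `track_process` are placeholder per-frame and per-sample callbacks, and "encoding" is reduced to pairing the two processed streams by index so that audio and video stay matched, as the paragraph above requires.

```python
def edit_multimedia(decoded_video, decoded_audio, render, track_process):
    """Process both streams (step 103), then 're-encode' them (step 104)
    by pairing video frames with audio samples so playback stays in sync."""
    processed_video = [render(frame) for frame in decoded_video]
    processed_audio = [track_process(sample) for sample in decoded_audio]
    return list(zip(processed_video, processed_audio))
```

A real encoder would of course produce a compressed container with shared timestamps rather than a list of pairs.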
The invention provides a method for editing a multimedia video. Compared with the existing situation, in which a short video clipped from a live broadcast or a short video cannot be edited, the method decodes the video data and audio data in the multimedia file, processes each stream separately, and encodes the result back into a multimedia video, thereby making live or clipped videos editable.
An embodiment of the present invention provides another method for editing a multimedia video, as shown in fig. 2, the method includes:
201. and acquiring the multimedia file.
This step is the same as step 101 shown in fig. 1, and is not described herein again.
It should be noted that the method for editing a multimedia video according to the embodiment of the present invention may be embedded in other live-broadcast or video-recording applications, with the editing performed through a called interface; it may also be written as a standalone application that obtains the multimedia file by invoking the camera directly. The embodiment of the present invention is not limited in this respect.
202. And respectively decoding the video data and the audio data in the multimedia file according to the video track and the audio track.
The video track carries the content of video playback and the audio track carries the content of audio playback. To decode the video data and audio track data in the multimedia file so that each can be processed separately, decoding must be performed per track.
203. And receiving a processing instruction input by a user.
Wherein, the processing instruction carries the effect identifier. The processing instruction is used to instruct the system to perform specific video editing on a video, and the effect identifier is information identifying that different video and audio effects can be achieved, such as rendering an image, converting a voice and a text, adding a background, and the like.
It should be noted that, if the rendered image is an image input by a user, the image may be transmitted through a processing instruction. In addition, the bitmap of the image to be rendered and the added background may be a preset image or an image input by a user, which is not specifically limited in the embodiment of the present invention.
204. Rendering the video data according to the video effect identifier in the effect identifier, and processing the audio data according to the audio effect identifier in the effect identifier.
The video effect identifier is an identifier for processing a video effect in video data, the audio effect identifier is an identifier for processing an audio effect in audio data, and in order to further add different images corresponding to different effects, the video data or the audio data needs to be processed according to the video or audio effect identifier.
The video and the audio are respectively subjected to rendering processing and sound effect processing according to the video effect identifier and the audio effect identifier in the effect identifier, so that the image and the sound are respectively edited, and the performance of video editing is optimized.
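The split between video effect identifiers and audio effect identifiers described above amounts to a dispatch table. A minimal sketch follows; the handler registries, identifier names, and `apply_effects` function are all hypothetical illustrations, not the patent's interface.

```python
def apply_effects(effect_ids, video_frames, audio_samples,
                  video_effects, audio_effects):
    """Route each identifier carried by the processing instruction to its
    video or audio handler (step 204); unknown identifiers are skipped,
    mirroring the 'not specifically limited' wording of the text."""
    for eid in effect_ids:
        if eid in video_effects:
            video_frames = [video_effects[eid](f) for f in video_frames]
        elif eid in audio_effects:
            audio_samples = [audio_effects[eid](s) for s in audio_samples]
    return video_frames, audio_samples
```

Registering new effects then only means adding an entry to the relevant dictionary, which is one way to keep image editing and sound editing independent.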
For the embodiment of the present invention, the step of rendering the video data according to the video effect identifier in the effect identifier may specifically include: extracting image data of each frame in the video data, and carrying out filter processing on the image data; and identifying a target image in the image data after filter processing according to the video effect identifier, and performing synthesis rendering on the target image.
Since the decoded video data consists of frame-by-frame image information, adding an image to the video means adding it to the image information of every frame, and filter processing is applied before that image information is processed, so that the desired filtered video effect is obtained. The target image is the object to which a bitmap is to be added, or the object to be rendered; the embodiment of the present invention is not specifically limited. For example, when the video effect identifier is "add background image", the target image is a person or animal, and when it is the word-bubble (speech-to-text) special effect, the target image is a face or a mouth.
It should be noted that, in the composite rendering, a bitmap to be added is added to each frame of image, and positions of the added bitmaps in each frame are different, so that the rendered image in the video playing is dynamic.
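The filter-then-overlay flow, with the bitmap placed at a different position in each frame, can be sketched as below. This is a toy model under stated assumptions: frames and the bitmap are represented as sparse `{(x, y): pixel}` dictionaries, and `composite_dynamic` is a hypothetical name, not the patent's routine.

```python
def composite_dynamic(frames, bitmap, positions, filter_fn):
    """Filter every frame, then paste `bitmap` at a per-frame offset so
    the rendered overlay moves as the video plays (the 'dynamic' effect).
    The overlay simply overwrites the filtered background pixels."""
    out = []
    for frame, (ox, oy) in zip(frames, positions):
        filtered = {pos: filter_fn(px) for pos, px in frame.items()}
        for (bx, by), px in bitmap.items():
            filtered[(ox + bx, oy + by)] = px
        out.append(filtered)
    return out
```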
For the embodiment of the present invention, the step of identifying the target image in the image data after filter processing according to the video effect identifier, and performing composite rendering on the target image may specifically include: if the video effect identifier is recognized to be a composite three-dimensional image, segmenting the target image, and coloring and synthesizing the target image, the segmented target image and the rendering image according to a preset coloring rule, wherein the preset coloring rule is used for reflecting the position display relation among the target image, the segmented target image and the rendering image.
A composite stereoscopic image displays the bitmap as a layered, virtual-reality-style stereoscopic dynamic image by exploiting parallax; the stereoscopic effect depends on whether, where, and how much of the added bitmap is displayed in each frame.
It should be noted that if the video effect identifier is "composite stereoscopic image", the specific steps are to segment the target image and then color and composite the target image, the segmented target image, and the rendered image according to a preset coloring rule. In general the target image of a composite stereoscopic image is a person, so the image information of each frame must be segmented to separate the person from the background. The rendered image is the bitmap to be added, and the preset coloring rule is the policy deciding whether the rendered image is shown or hidden where it covers the target image. The specific policy depends on the bitmap and on the position of the person, and the embodiment of the present invention is not specifically limited.
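The coloring rule is, per pixel, a stacking-order decision among the three layers. The sketch below is one possible reading of the rule, not the patent's definition: `shade_pixel` and its flag are hypothetical, with the segmented person and the overlay encoded as `None` where they are absent.

```python
def shade_pixel(background, person, overlay, overlay_behind_person):
    """Per-pixel preset coloring rule: choose which layer is displayed.
    `person` is the segmented target image (None outside the person);
    `overlay` is the rendered bitmap (None where it is absent); the flag
    says whether the overlay sits behind the person, which is what makes
    the composite look layered/stereoscopic."""
    if overlay is not None and not overlay_behind_person:
        return overlay        # overlay in front: always shown
    if person is not None:
        return person         # person occludes a behind-layer overlay
    if overlay is not None:
        return overlay        # behind-layer overlay over plain background
    return background
```

Applying this rule at every pixel of every frame yields the colored, composited output described above.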
For the embodiment of the present invention, the step of processing the audio data according to the audio effect identifier in the effect identifier may specifically include: collecting discrete audio track data in the audio data according to a preset time interval; and effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
To superpose different audio tracks properly, rather than simply adding their volumes, the audio data must be discretized: discrete audio track data is collected at a preset time interval, which may be 1 second, 0.05 second, and so on; the embodiment of the present invention is not specifically limited. Effective superposition may combine the discrete audio track data of multiple preset audio tracks; besides the discrete data taken from the audio in the multimedia file, the other preset tracks may be stored in the cache or hard disk of the terminal device. For example, if the collected discrete audio track data is a child reading poetry aloud, and the preset track to be superposed is the background music "Moonlight over the Lotus Pond", the two sounds are superposed.
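One simple reading of "effective superposition" is a clamped sample-wise sum rather than a plain volume addition. The sketch below assumes 16-bit PCM samples, which the patent does not specify; `superpose_tracks` is a hypothetical helper.

```python
def superpose_tracks(track_a, track_b, limit=32767):
    """Sum two discrete audio tracks sample by sample, clamping to the
    16-bit PCM range (an assumed format) so the mix does not wrap —
    'effective' superposition rather than plain volume addition."""
    n = max(len(track_a), len(track_b))
    a = track_a + [0] * (n - len(track_a))   # pad the shorter track
    b = track_b + [0] * (n - len(track_b))
    return [max(-limit - 1, min(limit, x + y)) for x, y in zip(a, b)]
```

Padding the shorter track with silence lets a short voice clip be mixed over longer background music.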
It should be noted that an open-source audio processing tool, such as FFmpeg, may be used for the audio processing.
205. And when a real-time preview request is received, displaying the video data and the audio data.
A real-time preview request is a request input by the user to preview the state of the currently processed video or audio. It instructs the system to simulate playback of the processed video image, which can be browsed frame by frame or played as a video, and likewise to simulate playback of the processed audio; in general the request also covers displaying the unprocessed original image and original audio. When exactly the real-time preview request is received is not specifically limited by the embodiment of the present invention.
206. And coding the processed video data and the processed audio data to obtain the multimedia video.
This step is the same as step 104 shown in fig. 1, and will not be described herein again.
207. And receiving a speed adjusting instruction, and adjusting the playing speed of video data and audio data in the multimedia video according to speed information carried in the speed adjusting instruction.
The speed adjustment instruction carries speed information, which may be speed-up or speed-down, and specific data may be carried in the speed information, which is not specifically limited in the embodiment of the present invention.
It should be noted that, the specific method for adjusting the speed may be to adjust the number of frames of the image within 1 second for the video data to achieve the adjustment of the speed of playing the video, and to adjust the speed of playing the audio track within the preset time for the audio data to achieve the adjustment of the speed of playing the audio.
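The frame-count adjustment described above can be sketched as resampling which source frame is shown at each output tick. This is an illustrative nearest-frame scheme, not necessarily the patent's method, and `adjust_speed` is a hypothetical name.

```python
def adjust_speed(frames, factor):
    """Change playback speed by resampling the frame sequence:
    factor > 1 plays faster (frames are dropped), factor < 1 plays
    slower (frames are repeated), while the display rate stays fixed."""
    out_count = max(1, int(len(frames) / factor))
    return [frames[min(len(frames) - 1, int(i * factor))]
            for i in range(out_count)]
```

The audio track would need the analogous resampling (or a pitch-preserving time stretch) over the same interval so that sound and picture remain aligned.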
For the embodiment of the present invention, a specific application scenario may be as follows, although the scenarios are not limited to it: a multimedia file of a boy reading aloud online is clipped; the video data and audio data of the reading are decoded according to the video track and the audio track; and if the effect identifier input by the user is the speech-to-text (word-bubble) conversion identifier, the video data and the audio data are processed separately.
The embodiment of the invention decodes the video data and audio data in a multimedia file, renders the video data according to the video effect identifier, effectively superposes the audio data according to the audio effect identifier, and encodes the result into a multimedia video. This makes live or clipped videos editable, enhances the playing effect of short videos, makes the videos more vivid, improves the display of video content, lets characters in the edited video fit the rendered images more closely, allows short videos to be designed and recorded for different needs, widens the application of short videos, and improves the usability of the video.
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides an apparatus for editing a multimedia video, as shown in fig. 3, the apparatus includes: an acquisition unit 31, a decoding unit 32, a processing unit 33, and an encoding unit 34.
An acquisition unit 31 for acquiring a multimedia file; the acquiring unit 31 executes a function module for acquiring a multimedia file for an editing apparatus of a multimedia video.
A decoding unit 32 for decoding video data and audio data in the multimedia file; the decoding unit 32 is a functional module for an editing device of multimedia video to execute decoding of video data and audio data in the multimedia file.
A processing unit 33, configured to perform rendering processing on the video data and perform audio track processing on the audio data; the processing unit 33 is a functional module for performing rendering processing on the video data and performing audio track processing on the audio data for an editing apparatus of a multimedia video.
And an encoding unit 34, configured to encode the processed video data and the processed audio data to obtain a multimedia video. The encoding unit 34 is a functional module for performing encoding on the processed video data and the processed audio data by the editing apparatus of the multimedia video to obtain the multimedia video.
The invention provides a device for editing multimedia video. Compared with the existing situation, in which a short video clipped from a live broadcast or a short video cannot be edited, the embodiment of the invention decodes the video data and audio data in a multimedia file, processes each stream separately, and encodes the result into a multimedia video, thereby making live or clipped videos editable, enhancing the playing effect of short videos, making the videos more vivid, letting characters in the edited video fit the rendered images more closely, and improving the usability of the video.
Further, as an implementation of the method shown in fig. 2, an embodiment of the present invention provides another apparatus for editing a multimedia video, as shown in fig. 4, the apparatus includes: an acquisition unit 41, a decoding unit 42, a processing unit 43, an encoding unit 44, a presentation unit 45, and an adjustment unit 46.
An acquisition unit 41 for acquiring a multimedia file;
a decoding unit 42 for decoding video data and audio data in the multimedia file;
a processing unit 43, configured to perform rendering processing on the video data and perform audio track processing on the audio data;
and an encoding unit 44, configured to encode the processed video data and the processed audio data to obtain a multimedia video.
Specifically, in order to process video and audio according to the requirement of the user, the processing unit 43 includes:
a receiving module 4301, configured to receive a processing instruction input by a user, where the processing instruction carries an effect identifier;
and the processing module 4302 is configured to render the video data according to the video effect identifier in the effect identifier, and process the audio data according to the audio effect identifier in the effect identifier.
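The receiving and processing modules amount to a dispatch on the two components of the effect identifier. The sketch below is a hypothetical illustration of that dispatch: the effect names, the instruction layout, and the toy effects themselves are all assumptions, not part of the disclosed apparatus.

```python
# Hypothetical dispatch: a processing instruction carries one effect identifier,
# which is split into a video part and an audio part and handled separately.

VIDEO_EFFECTS = {"stereo": lambda frames: [f + "|stereo" for f in frames]}
AUDIO_EFFECTS = {"echo": lambda samples: [s * 2 for s in samples]}

def handle_instruction(instruction, frames, samples):
    video_id = instruction["effect"]["video"]   # video effect identifier
    audio_id = instruction["effect"]["audio"]   # audio effect identifier
    return (VIDEO_EFFECTS[video_id](frames), AUDIO_EFFECTS[audio_id](samples))

frames, samples = handle_instruction(
    {"effect": {"video": "stereo", "audio": "echo"}},
    ["f0", "f1"], [1, 2, 3],
)
```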
Specifically, in order to implement the processing steps of the video data, the processing module 4302 includes:
an extraction submodule 430201, configured to extract image data of each frame in the video data, and perform filter processing on the image data;
and the synthesis submodule 430202 is configured to identify a target image in the image data after the filter processing according to the video effect identifier, and perform synthesis rendering on the target image.
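The extraction sub-module's per-frame filter step can be sketched as below, under the assumption (ours, not the patent's) that a frame is a list of grayscale pixel values and the "filter" is a simple brightness lift clamped to the valid range.

```python
# Sketch of the extraction sub-module: walk the image data of each frame and
# apply a filter. The brightness-lift filter is an assumed example.

def extract_and_filter(video_frames, lift=10, max_value=255):
    filtered = []
    for frame in video_frames:                      # image data of each frame
        filtered.append([min(max_value, p + lift)   # filter processing, clamped
                         for p in frame])
    return filtered

frames = extract_and_filter([[0, 100, 250], [5, 255, 30]])
# frames -> [[10, 110, 255], [15, 255, 40]]
```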
The synthesis sub-module 430202 is specifically configured to, if it is identified that the video effect identifier is a synthesized stereo image, segment the target image, and perform coloring synthesis on the target image, the segmented target image, and the rendered image according to a preset coloring rule, where the preset coloring rule is used to reflect a position display relationship among the target image, the segmented target image, and the rendered image.
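A minimal sketch of such a preset coloring rule, assuming one-dimensional "images" of pixels and a binary segmentation mask: where the mask marks the segmented target, the target stays on top and the rendered image is hidden behind it; elsewhere the rendered image is shown. The patent only states that the rule encodes this position display relationship, so the mask-based formulation here is an illustrative assumption.

```python
# Minimal sketch of a preset coloring rule over 1-D "images".
# mask[i] == 1 marks pixels belonging to the segmented target image.

def composite(target, mask, overlay):
    out = []
    for t, m, o in zip(target, mask, overlay):
        # Position display relation: the rendered overlay is hidden wherever
        # it would cover the segmented target, and displayed everywhere else.
        out.append(t if m else o)
    return out

result = composite([10, 20, 30, 40], [0, 1, 1, 0], [99, 99, 99, 99])
# result -> [99, 20, 30, 99]: the overlay shows only outside the target region
```

In a real renderer this per-pixel choice would typically be a blend with an alpha channel rather than a hard select, but the hard select makes the role of the segmentation explicit.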
Specifically, in order to implement the processing steps of the audio data, the processing module 4302 further includes:
an acquisition sub-module 430203, configured to acquire discrete audio track data in the audio data at preset time intervals;
and the superposition submodule 430204 is configured to effectively superimpose the discrete audio track data with a preset audio track according to the audio effect identifier.
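The acquisition and superposition sub-modules can be sketched together as follows. The additive, clamped mix is our assumption of what "effectively superposing" means; the interval and limit values are likewise illustrative.

```python
# Hedged sketch: sample the decoded track at a preset interval and mix
# ("effectively superpose") it with a preset audio track, clamping the sum.

def superpose(track, preset, interval=2, limit=127):
    sampled = track[::interval]                 # discrete track data at preset intervals
    mixed = [min(limit, max(-limit, a + b))     # additive superposition, clamped
             for a, b in zip(sampled, preset)]
    return mixed

out = superpose([10, 0, 20, 0, 120, 0], [5, 5, 20], interval=2)
# sampled -> [10, 20, 120]; mixed -> [15, 25, 127] (last value clamped at 127)
```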
The decoding unit 42 is specifically configured to decode video data and audio data in the multimedia file according to a video track and an audio track, respectively.
Further, to allow the user to preview the rendered video and the processed audio at any time, the apparatus further comprises:
a display unit 45, configured to display the video data and the audio data when a live preview request is received.
Further, to allow the playback speed of the video to be adjusted at will, the apparatus further comprises:
the adjusting unit 46 is configured to receive a speed adjusting instruction, and adjust the playing speed of the video data and the audio data in the multimedia video according to the speed information carried in the speed adjusting instruction.
The embodiment of the invention provides another apparatus for editing multimedia video. The apparatus decodes the video data and the audio data in a multimedia file, renders the video data according to a video effect identifier, effectively superposes the audio data according to an audio effect identifier, and then encodes the result into a multimedia video. This enables the editing of live or captured videos, enriches the playback effect of short videos, makes the videos more vivid, improves the display effect of the video content, fits the people in the edited videos more closely to the rendered images, allows recorded short videos to be tailored to different requirements, broadens the uses of short videos, and improves the usability of the videos.
An embodiment of the present invention provides a storage device, in which a plurality of instructions are stored, the instructions being adapted to be loaded and executed by a processor: acquiring a multimedia file; decoding video data and audio data in the multimedia file; rendering the video data and performing audio track processing on the audio data; and coding the processed video data and the processed audio data to obtain the multimedia video.
The embodiment of the invention provides a mobile terminal, which comprises a processor adapted to implement various instructions, and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to: acquire a multimedia file; decode video data and audio data in the multimedia file; render the video data and perform audio track processing on the audio data; and encode the processed video data and the processed audio data to obtain the multimedia video.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the related features of the method and the apparatus described above may refer to one another. In addition, "first", "second", and the like in the above embodiments serve only to distinguish the embodiments and do not indicate that one embodiment is better or worse than another.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be understood by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the multimedia video editing method and apparatus according to the embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
The embodiment of the invention discloses:
a1, a method for editing multimedia video, comprising:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data;
and coding the processed video data and the processed audio data to obtain the multimedia video.
A2, according to the method of A1, the rendering processing of the video data and the audio track processing of the audio data comprise:
receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier;
rendering the video data according to the video effect identifier in the effect identifier, and processing the audio data according to the audio effect identifier in the effect identifier.
A3, according to the method of A2, the rendering of the video data according to the video effect identifier in the effect identifier includes:
extracting image data of each frame in the video data, and carrying out filter processing on the image data;
and identifying a target image in the image data after filter processing according to the video effect identifier, and performing synthesis rendering on the target image.
A4, according to the method of A3, the identifying of a target image in the filter-processed image data according to the video effect identifier, and the composite rendering of the target image, include:
if the video effect identifier is recognized to be a composite three-dimensional image, segmenting the target image, and coloring and synthesizing the target image, the segmented target image and the rendering image according to a preset coloring rule, wherein the preset coloring rule is used for reflecting the position display relation among the target image, the segmented target image and the rendering image.
A5, according to the method of A2, the processing of the audio data according to the audio effect identifier in the effect identifier includes:
collecting discrete audio track data in the audio data according to a preset time interval;
and effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
A6, the method of A1, the decoding video data and audio data in the multimedia file comprising:
and respectively decoding the video data and the audio data in the multimedia file according to the video track and the audio track.
A7, according to the method of A1, after the rendering processing of the video data and the audio track processing of the audio data, the method further comprises:
and when a real-time preview request is received, displaying the video data and the audio data.
A8, the method of A1, the method further comprising:
and receiving a speed adjusting instruction, and adjusting the playing speed of video data and audio data in the multimedia video according to speed information carried in the speed adjusting instruction.
B9, an apparatus for editing a multimedia video, comprising:
the acquiring unit is used for acquiring the multimedia file;
a decoding unit for decoding video data and audio data in the multimedia file;
the processing unit is used for rendering the video data and carrying out audio track processing on the audio data;
and the coding unit is used for coding the processed video data and the processed audio data to obtain the multimedia video.
B10, the apparatus according to B9, the processing unit comprising:
the receiving module is used for receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier;
and the processing module is used for rendering the video data according to the video effect identifier in the effect identifier and processing the audio data according to the audio effect identifier in the effect identifier.
B11, the apparatus of B10, the processing module comprising:
the extraction submodule is used for extracting the image data of each frame in the video data and carrying out filter processing on the image data;
and the synthesis submodule is used for identifying a target image in the image data after the filter processing according to the video effect identifier and performing synthesis rendering on the target image.
B12, the device according to B11,
the synthesis sub-module is specifically configured to, if it is identified that the video effect identifier is a synthesized stereo image, segment the target image, and perform color synthesis on the target image, the segmented target image, and the rendered image according to a preset coloring rule, where the preset coloring rule is used to reflect a position display relationship among the target image, the segmented target image, and the rendered image.
B13, the apparatus of B10, the processing module further comprising:
the acquisition submodule is used for acquiring discrete audio track data in the audio data according to a preset time interval;
and the superposition submodule is used for effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
B14, the device according to B9,
the decoding unit is specifically configured to decode video data and audio data in the multimedia file according to a video track and an audio track, respectively.
B15, the apparatus of B9, the apparatus further comprising:
and the display unit is used for displaying the video data and the audio data when a real-time preview request is received.
B16, the apparatus of B9, the apparatus further comprising:
and the adjusting unit is used for receiving the speed adjusting instruction and adjusting the playing speed of the video data and the audio data in the multimedia video according to the speed information carried in the speed adjusting instruction.
C17, a storage device having stored therein a plurality of instructions adapted to be loaded and executed by a processor:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data;
and coding the processed video data and the processed audio data to obtain the multimedia video.
D18, a mobile terminal comprising a processor adapted to implement various instructions; and a storage device adapted to store a plurality of instructions, the instructions adapted to be loaded and executed by the processor to:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data;
and coding the processed video data and the processed audio data to obtain the multimedia video.

Claims (12)

1. A method for editing a multimedia video, comprising:
acquiring a multimedia file; decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data, including:
receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier; rendering the video data according to the video effect identifier in the effect identifiers, and processing the audio data according to the audio effect identifier in the effect identifiers;
coding the processed video data and the processed audio data to obtain a multimedia video;
the rendering the video data according to the video effect identifier in the effect identifiers comprises:
extracting image data of each frame in the video data, and carrying out filter processing on the image data;
identifying a target image in the image data processed by the filter according to the video effect identifier, and performing synthesis rendering on the target image;
the identifying a target image in the image data after the filter processing according to the video effect identifier and the synthesizing and rendering the target image comprise:
if the video effect identifier is recognized to be a composite stereo image, segmenting the target image, and coloring and synthesizing the target image, the segmented target image and the rendered image according to a preset coloring rule, wherein the preset coloring rule is used for reflecting a position display relation among the target image, the segmented target image and the rendered image, the rendered image is a bitmap to be added, and the position display relation specifies, when the rendered image covers the target image, whether the rendered image is displayed, how much of it is displayed, and whether it needs to be hidden.
2. The method of claim 1, wherein the processing the audio data according to the audio effect identifier of the effect identifiers comprises:
collecting discrete audio track data in the audio data according to a preset time interval;
and effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
3. The method of claim 1, wherein the decoding video data and audio data in the multimedia file comprises:
and respectively decoding the video data and the audio data in the multimedia file according to the video track and the audio track.
4. The method of claim 1, wherein after the rendering the video data and the soundtrack processing the audio data, the method further comprises:
and when a real-time preview request is received, displaying the video data and the audio data.
5. The method of claim 1, further comprising:
and receiving a speed adjusting instruction, and adjusting the playing speed of video data and audio data in the multimedia video according to speed information carried in the speed adjusting instruction.
6. An apparatus for editing a multimedia video, comprising:
the acquiring unit is used for acquiring the multimedia file;
a decoding unit for decoding video data and audio data in the multimedia file;
a processing unit, configured to perform rendering processing on the video data and perform audio track processing on the audio data, including: the receiving module is used for receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier; the processing module is used for rendering the video data according to the video effect identifier in the effect identifiers and processing the audio data according to the audio effect identifier in the effect identifiers;
the encoding unit is used for encoding the processed video data and the processed audio data to obtain a multimedia video;
the processing module comprises:
the extraction submodule is used for extracting the image data of each frame in the video data and carrying out filter processing on the image data;
the synthesis submodule is used for identifying a target image in the image data after the filter processing according to the video effect identifier and performing synthesis rendering on the target image;
the composition sub-module is specifically configured to, if it is identified that the video effect identifier is a composite stereo image, segment the target image, and perform coloring composition on the target image, the segmented target image, and the rendered image according to a preset coloring rule, where the preset coloring rule is used to reflect a position display relationship among the target image, the segmented target image, and the rendered image, the rendered image is a bitmap that needs to be added, and the position display relationship refers to whether the rendered image is displayed, how much it is displayed, and whether the rendered image needs to be hidden when the rendered image covers the target image.
7. The apparatus of claim 6, wherein the processing module further comprises:
the acquisition submodule is used for acquiring discrete audio track data in the audio data according to a preset time interval;
and the superposition submodule is used for effectively superposing the discrete audio track data and a preset audio track according to the audio effect identifier.
8. The apparatus of claim 6,
the decoding unit is specifically configured to decode video data and audio data in the multimedia file according to a video track and an audio track, respectively.
9. The apparatus of claim 6, further comprising:
and the display unit is used for displaying the video data and the audio data when a real-time preview request is received.
10. The apparatus of claim 6, further comprising:
and the adjusting unit is used for receiving the speed adjusting instruction and adjusting the playing speed of the video data and the audio data in the multimedia video according to the speed information carried in the speed adjusting instruction.
11. A memory device having stored therein a plurality of instructions adapted to be loaded and executed by a processor:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data, including:
receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier; rendering the video data according to the video effect identifier in the effect identifiers, and processing the audio data according to the audio effect identifier in the effect identifiers;
coding the processed video data and the processed audio data to obtain a multimedia video;
the rendering the video data according to the video effect identifier in the effect identifiers comprises:
extracting image data of each frame in the video data, and carrying out filter processing on the image data;
identifying a target image in the image data processed by the filter according to the video effect identifier, and performing synthesis rendering on the target image;
the identifying a target image in the image data after the filter processing according to the video effect identifier and the synthesizing and rendering the target image comprise:
if the video effect identifier is recognized to be a composite stereo image, segmenting the target image, and coloring and synthesizing the target image, the segmented target image and the rendered image according to a preset coloring rule, wherein the preset coloring rule is used for reflecting a position display relation among the target image, the segmented target image and the rendered image, the rendered image is a bitmap to be added, and the position display relation specifies, when the rendered image covers the target image, whether the rendered image is displayed, how much of it is displayed, and whether it needs to be hidden.
12. A mobile terminal comprising a processor adapted to implement various instructions; and a storage device adapted to store a plurality of instructions, the instructions adapted to be loaded and executed by the processor to:
acquiring a multimedia file;
decoding video data and audio data in the multimedia file;
rendering the video data and performing audio track processing on the audio data, including:
receiving a processing instruction input by a user, wherein the processing instruction carries an effect identifier; rendering the video data according to the video effect identifier in the effect identifiers, and processing the audio data according to the audio effect identifier in the effect identifiers;
coding the processed video data and the processed audio data to obtain a multimedia video;
the rendering the video data according to the video effect identifier in the effect identifiers comprises:
extracting image data of each frame in the video data, and carrying out filter processing on the image data;
identifying a target image in the image data processed by the filter according to the video effect identifier, and performing synthesis rendering on the target image;
the identifying a target image in the image data after the filter processing according to the video effect identifier and the synthesizing and rendering the target image comprise:
if the video effect identifier is recognized to be a composite stereo image, segmenting the target image, and coloring and synthesizing the target image, the segmented target image and the rendered image according to a preset coloring rule, wherein the preset coloring rule is used for reflecting a position display relation among the target image, the segmented target image and the rendered image, the rendered image is a bitmap to be added, and the position display relation specifies, when the rendered image covers the target image, whether the rendered image is displayed, how much of it is displayed, and whether it needs to be hidden.
Application CN201710566432.2A (priority date 2017-07-12, filing date 2017-07-12): Multimedia video editing method and device; granted as CN107241646B, legal status active.
Publications (2): CN107241646A, published 2017-10-10; CN107241646B, published 2020-08-14.

Family

ID=59990913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710566432.2A Active CN107241646B (en) 2017-07-12 2017-07-12 Multimedia video editing method and device

Country Status (1)

Country Link
CN (1) CN107241646B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108234479A (en) * 2017-12-29 2018-06-29 北京百度网讯科技有限公司 For handling the method and apparatus of information
CN109168027B (en) * 2018-10-25 2020-12-11 北京字节跳动网络技术有限公司 Instant video display method and device, terminal equipment and storage medium
CN109543560A (en) * 2018-10-31 2019-03-29 百度在线网络技术(北京)有限公司 Dividing method, device, equipment and the computer storage medium of personage in a kind of video
CN109587552B (en) * 2018-11-26 2021-06-15 Oppo广东移动通信有限公司 Video character sound effect processing method and device, mobile terminal and storage medium
CN111343499A (en) * 2018-12-18 2020-06-26 北京奇虎科技有限公司 Video synthesis method and device
CN111355960B (en) * 2018-12-21 2021-05-04 北京字节跳动网络技术有限公司 Method and device for synthesizing video file, mobile terminal and storage medium
CN111866404B (en) * 2019-04-25 2022-04-29 华为技术有限公司 Video editing method and electronic equipment
CN112533058A (en) * 2019-09-17 2021-03-19 西安中兴新软件有限责任公司 Video processing method, device, equipment and computer readable storage medium
CN111460183B (en) * 2020-03-30 2024-02-13 北京金堤科技有限公司 Method and device for generating multimedia file, storage medium and electronic equipment
CN111818385B (en) * 2020-07-22 2022-08-09 Oppo广东移动通信有限公司 Video processing method, video processing device and terminal equipment
CN113315928B (en) * 2021-05-25 2022-03-22 南京慕映影视科技有限公司 Multimedia file making system and method
CN114007077B (en) * 2021-11-17 2023-09-01 北京百度网讯科技有限公司 Method and device for processing multimedia resources, electronic equipment and storage medium
CN114979766B (en) * 2022-05-11 2023-11-21 深圳市闪剪智能科技有限公司 Audio and video synthesis method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638658A (en) * 2012-03-01 2012-08-15 盛乐信息技术(上海)有限公司 Method and system for editing audio-video
CN103049908A (en) * 2012-12-10 2013-04-17 北京百度网讯科技有限公司 Method and device for generating stereoscopic video file
CN103327361A (en) * 2012-11-22 2013-09-25 中兴通讯股份有限公司 Method, device and system for obtaining real-time video communication playback data flow
CN104732593A (en) * 2015-03-27 2015-06-24 厦门幻世网络科技有限公司 Three-dimensional animation editing method based on mobile terminal
CN106373170A (en) * 2016-08-31 2017-02-01 北京云图微动科技有限公司 Video making method and video making device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080002942A1 (en) * 2006-05-24 2008-01-03 Peter White Method and apparatus for creating a custom track

Also Published As

Publication number Publication date
CN107241646A (en) 2017-10-10

Similar Documents

Publication Publication Date Title
CN107241646B (en) Multimedia video editing method and device
US11482192B2 (en) Automated object selection and placement for augmented reality
CN108200446B (en) On-line multimedia interaction system and method of virtual image
US11218739B2 (en) Live video broadcast method, live broadcast device and storage medium
CN108307229B (en) Video and audio data processing method and device
CN105340014B (en) Touch optimization design for video editing
CN111741326B (en) Video synthesis method, device, equipment and storage medium
CN112291627A (en) Video editing method and device, mobile terminal and storage medium
TWI556639B (en) Techniques for adding interactive features to videos
CN112637670B (en) Video generation method and device
US20170242833A1 (en) Systems and Methods to Generate Comic Books or Graphic Novels from Videos
WO2023202095A1 (en) Point cloud media encoding method and apparatus, point cloud media decoding method and apparatus, and electronic device and storage medium
CN108965746A (en) Image synthesizing method and system
KR20160119218A (en) Sound image playing method and device
CN104580837A (en) Video director engine based on GPU+CPU+IO architecture and using method thereof
CN111246196B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN105872827A (en) Live broadcast method and device of application interface in mobile terminal
CN112422844A (en) Method, device and equipment for adding special effect in video and readable storage medium
CN113923504B (en) Video preview moving picture generation method and device
US10043302B2 (en) Method and apparatus for realizing boot animation of virtual reality system
CN113395569B (en) Video generation method and device
CN106792219B (en) Live broadcast playback method and device
CN113497963A (en) Video processing method, device and equipment
CN114500879A (en) Video data processing method, device, equipment and storage medium
US9807350B2 (en) Automated personalized imaging system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant