CN109218630B - Multimedia information processing method and device, terminal and storage medium


Info

Publication number
CN109218630B
Authority
CN
China
Prior art keywords
multimedia information
camera
synthesis
synthesized
determining
Legal status
Active
Application number
CN201710546333.8A
Other languages
Chinese (zh)
Other versions
CN109218630A (en)
Inventor
Zhao Liang (赵亮)
Feng Chiwei (冯驰伟)
Zhang Zhongbao (张中宝)
Wang Wentao (王文涛)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710546333.8A
Publication of CN109218630A
Application granted
Publication of CN109218630B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; studio devices; studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects
    • H04N 5/265: Mixing

Abstract

The embodiments of the invention disclose a multimedia information processing method and device, a terminal, and a storage medium, where the method includes: a client acquires at least two pieces of multimedia information shot by at least two of the two or more cameras of a terminal; determines synthesis parameters for synthesizing the at least two pieces of multimedia information; synthesizes the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information; receives a first operation, where the first operation is used to send the synthesized multimedia information to a server corresponding to the client; and, in response to the first operation, sends the synthesized multimedia information to the server corresponding to the client.

Description

Multimedia information processing method and device, terminal and storage medium
Technical Field
The present invention relates to internet technologies, and in particular, to a method and an apparatus for processing multimedia information, a terminal, and a storage medium.
Background
Nowadays, more and more clients, such as most social applications (APPs) and instant messaging applications (for example, QQ and WeChat), support recording videos or taking pictures, or directly selecting locally available videos or pictures, applying simple processing, and then publishing them. Prior art schemes use the video recording and photographing functions provided by a single camera of a mobile phone: a user can record a video or take a picture inside the APP and apply some simple processing after shooting. For example, when such a client publishes a video or picture, only one camera is used at a time; for instance, the front camera shoots a self-portrait video, or the rear camera records an event or news, and the material is published after simple processing once shooting is complete. Some clients also support selecting a local video for editing, such as video cropping, and then publishing the edited video or picture.
However, the video/picture acquisition schemes of these clients take a single form: the expressed content is only what a single camera captures, so the content is neither rich nor stereoscopic enough.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a multimedia information processing method and apparatus, a terminal, and a storage medium to solve at least one problem in the prior art.
The technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the present invention provides a multimedia information processing method, including:
a client acquires at least two pieces of multimedia information shot by at least two of the two or more cameras of a terminal;
determining synthesis parameters for synthesizing the at least two pieces of multimedia information;
synthesizing the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information;
receiving a first operation, where the first operation is used to send the synthesized multimedia information to a server corresponding to the client;
and, in response to the first operation, sending the synthesized multimedia information to the server corresponding to the client.
An embodiment of the present invention provides a multimedia information processing apparatus, including an acquiring unit, a determining unit, a synthesizing unit, a receiving unit, and a sending unit, where:
the acquiring unit is configured to acquire at least two pieces of multimedia information shot by at least two of the two or more cameras of a terminal;
the determining unit is configured to determine synthesis parameters for synthesizing the at least two pieces of multimedia information;
the synthesizing unit is configured to synthesize the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information;
the receiving unit is configured to receive a first operation, where the first operation is used to send the synthesized multimedia information to a server corresponding to the client;
and the sending unit is configured to, in response to the first operation, send the synthesized multimedia information to the server corresponding to the client.
An embodiment of the present invention provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is configured to implement the multimedia information processing method when executing the program.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the multimedia information processing method described above.
In the embodiments of the present invention, a client acquires at least two pieces of multimedia information shot by at least two of the two or more cameras of a terminal; determines synthesis parameters for synthesizing the at least two pieces of multimedia information; synthesizes the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information; receives a first operation, where the first operation is used to send the synthesized multimedia information to a server corresponding to the client; and, in response to the first operation, sends the synthesized multimedia information to the server corresponding to the client. In this way, the client can synthesize the images acquired by multiple cameras and finally generate an independent, complete video for publishing. The same scene can thus be expressed from multiple dimensions, or several different scenes can be expressed at once, so that a user watching the video can understand the situation on both sides of the camera from multiple dimensions; the content is richer, the expression is more stereoscopic, the sense of presence and the interest are greatly increased, and the user experience is improved.
Drawings
FIG. 1A is a schematic diagram of a system architecture;
FIG. 1B is a schematic diagram of a flow chart of a multimedia information processing method according to an embodiment of the present invention;
FIG. 2A is a schematic diagram of synthesis styles or templates according to an embodiment of the present invention;
FIG. 2B is a diagram illustrating an exemplary determination of a composite location area based on user operations according to an embodiment of the invention;
FIG. 2C is a diagram illustrating an exemplary process of determining a blank window area according to an embodiment of the invention;
FIG. 2D is a schematic diagram illustrating an example of boundary extraction according to an embodiment of the invention;
FIG. 3A is a schematic view of a flow chart of a video synthesis method according to an embodiment of the present invention;
FIG. 3B is a schematic diagram of a first scenario according to an embodiment of the present invention;
FIG. 3C is a schematic diagram of a second scenario according to an embodiment of the present invention;
FIG. 3D is a schematic diagram of a third synthesis example according to an embodiment of the present invention;
FIG. 3E is a schematic diagram of a fourth synthesis example according to an embodiment of the present invention;
FIG. 4 is a schematic view of a flow chart of a video synthesis method according to an embodiment of the present invention;
FIG. 5 is a schematic view of a flow chart of implementing a video synthesis method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a multimedia information processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a hardware entity of a terminal according to an embodiment of the present invention.
Detailed Description
For a better understanding of the various embodiments of the present invention, the relevant terminology is now explained as follows:
front camera: a camera arranged on the same side as a main screen of the terminal; if the terminal only comprises one display screen, the screen is a main screen, and the camera arranged on one side of the display screen is a front camera; some terminals set up more than one display screen, including the display screen more than two, the display screen that shows the interface of using so is main screen, and leading camera is the camera in the same face with main screen so. Taking a mobile phone as an example, the mobile phone only includes one display screen, and then the camera above the display screen of the mobile phone is a front-facing camera, and generally speaking, the front-facing camera is used for self-shooting.
A rear camera: a camera arranged on the same side of the main screen of the terminal; if the terminal only comprises one display screen, the screen is a main screen, and the camera arranged on the back of the display screen is a rear camera; some terminals set up more than one display screen, including more than two display screens, the display screen that shows the interface of using is the main screen then, and the display screen that does not show application interface is secondary screen, and the rear camera is the camera that is in the same face with secondary screen then. Taking a mobile phone as an example, the mobile phone only includes one display screen, and then the camera above the display screen of the mobile phone is a front-facing camera, and the camera on the back of the display screen of the mobile phone is a rear-facing camera, and is located on the opposite side of the display screen of the mobile phone.
Two cameras: the front camera and the rear camera are collectively called.
Front video: and video content collected by the front camera.
Post-video: and video content collected by the rear camera.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
FIG. 1A is a schematic diagram of a system architecture. FIG. 1A illustrates a communication system 1 that includes a terminal 11 and a server 12. Various clients (i.e., applications) are installed and run on the terminal 11, including social software, instant messaging software, self-media software, and information sharing software; the server 12 is the server corresponding to those clients. There may be one or more terminals 11 and servers 12, so the system 1 includes one or more terminals 11 with clients installed and one or more servers 12, connected through a network 13. In the embodiments of the present invention, the network-side server 12 can interact with the terminal 11 through the client: the terminal 11 sends material to be published to the server 12, and the server then publishes the received material. The material to be published includes text and multimedia information, where the multimedia information includes at least one of a photo and a video, or a combination of text and photos.
It should be noted that some embodiments of the present invention may be based on the system architecture presented in fig. 1A.
An embodiment of the present invention provides a multimedia information processing method applied to a terminal. The functions implemented by the method can be realized by a processor in the terminal calling program code; the program code can be stored in a computer storage medium and, in the implementation process, can take the form of a client installed on and run by the terminal. As can be seen from the above, the terminal includes at least a processor and a storage medium. FIG. 1B is a schematic flow chart of an implementation of a multimedia information processing method according to an embodiment of the present invention; as shown in FIG. 1B, the method includes:
Step S101, a client acquires at least two pieces of multimedia information shot by at least two of the two or more cameras of a terminal;
In this embodiment, "two or more" includes two, three, four, and so on.
In this embodiment, the camera may be any component capable of image acquisition.
In this embodiment, the client includes any client capable of uploading and publishing multimedia information, such as social software, instant messaging software, self-media software, information sharing software, and information uploading software. A user may publish captured multimedia information through social software (such as Facebook, Qzone, or Weibo), send it to friends through instant messaging software, upload it through self-media software, or share it through information sharing software (such as the video platforms YouTube and Bilibili, or photo-sharing platforms).
In this example, the multimedia information includes at least one of a photo and a video; for example, a photo or video taken by the user may serve as multimedia information, and a photo or video the user has edited, by rotation, cropping, filters, beautification, adding text, or enhancing saturation, brightness, or contrast, may also serve as multimedia information.
In this embodiment, the terminal includes two or more cameras, for example two, three, or four. A typical mobile terminal such as a smartphone or tablet computer generally includes two cameras, namely a first camera and a second camera; a mobile terminal with special functions, for example a three-dimensional (3D) imaging function, an Augmented Reality (AR) function, or a Virtual Reality (VR) function, may include three or more cameras. The following takes two cameras as an example: the terminal includes a first camera and a second camera, and step S101 includes: the client acquires first multimedia information shot by the first camera and second multimedia information shot by the second camera. Generally, when the terminal includes two cameras, the first camera and the second camera may be the front camera and the rear camera, respectively, in which case step S101 includes: the client acquires first multimedia information shot by the front camera and second multimedia information shot by the rear camera.
Step S102, determining synthesis parameters for synthesizing the at least two pieces of multimedia information;
In this example, the synthesis parameters include any one or more parameters by which at least two pieces of multimedia information can be synthesized into a whole, such as a synthesis style or synthesis template, a synthesis position, a foreground tag, a background tag, and synthesis rules.
The synthesis style or synthesis template is a parameter that lays out the at least two pieces of collected multimedia information; it may be preset by default and updated periodically by the server, or it may be a template style or template set by the user.
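Purely as an illustration of how such parameters might be grouped, the following Python sketch collects the parameter types named above into one structure; every name and type in it is an assumption made for illustration, not something defined by the patent:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical container for the synthesis parameters named above:
# a synthesis style/template, a synthesis position, and foreground/
# background tags. All names and types are illustrative assumptions.
@dataclass
class SynthesisParams:
    template_id: Optional[str] = None  # synthesis style or template, e.g. "b1".."b3" of FIG. 2A
    position_area: Optional[Tuple[int, int, int, int]] = None  # (x, y, width, height)
    foreground_ids: List[str] = field(default_factory=list)    # media tagged as foreground
    background_id: Optional[str] = None                        # media tagged as background
```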
Step S103, synthesizing the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information;
in other embodiments, the terminal includes a front camera and a rear camera, the synthesis parameter includes a synthesis location area, and the determining the synthesis parameter for synthesizing the at least two pieces of multimedia information includes: determining a synthesis position area of the first multimedia information shot by the front camera and the second multimedia information shot by the rear camera; correspondingly, the synthesizing the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information includes: and adding the first multimedia information shot by the front camera to the synthesis position area to obtain synthesized multimedia information.
Step S104, receiving a first operation, wherein the first operation is used for sending the synthesized multimedia to a server corresponding to the client;
and step S105, responding to the first operation, and sending the synthesized multimedia to a server corresponding to the client.
In the embodiments of the present invention, a user is allowed to start multiple cameras of the terminal simultaneously during shooting; the images collected by the multiple cameras are combined, previewed, and displayed, various combined rendering effects are added, and finally an independent, complete video is generated and published. The meaning of the video can thus be expressed from multiple dimensions, and a viewer of the video can understand the situation on both sides of the camera from multiple dimensions, so that the content is richer, the expression is more stereoscopic, the sense of presence and the interest are greatly increased, and the user experience is improved.
The first operation, and the second, third, and fourth operations in the following embodiments, may be user operations. The type of operation depends on the input device of the terminal: if the input device is a mouse, the operation is a click operation, and if the input device is a touch screen, the operation is a touch operation. The input device of the terminal may also be of another type. The terminal receives the user's operation through the touch screen or mouse, generates an instruction according to the operation, and then executes the instruction. For example, the first operation serves to send the synthesized multimedia information to the server corresponding to the client, so the terminal generates the instruction "send the synthesized multimedia information to the server corresponding to the client" according to the first operation, executes the instruction, and sends the synthesized multimedia information to that server. The second, third, and fourth operations below are similar to the first operation and can be understood by reference to it.
For example, diagram a1 of FIG. 2A is a second image (second multimedia information) captured by the rear camera, and diagram a2 is a first image (first multimedia information) captured by the front camera; diagrams b1 to b3 show synthesis styles or templates according to which the first image and the second image are synthesized. The template of diagram b1 includes regions b11 and b12 to which the first and second images are added: when synthesizing, the first or second image may be added to region b11 and the second or first image to region b12, and diagram c1 shows an interface in which the first image is added to region b12 and the second image to region b11. The template of diagram b2 likewise includes regions b13 and b14; diagram c2 shows an interface in which the first image is added to region b13 and the second image to region b14. The template of diagram b3 includes regions b15 and b16; diagram c3 shows an interface in which the first image is added to region b15 and the second image to region b16. In addition, region b15 in the template of diagram b3 may or may not be fixed; the user may set the position of b15, or the terminal may determine it by itself.
As can be seen from FIG. 2A, multimedia information synthesized using a synthesis style or template can display at least two pieces of collected multimedia information simultaneously in one interface. In the information synthesized with the templates of diagrams b1 and b2, the original pieces of multimedia information (the first and second images) are independent of each other, while in the information synthesized with the template of diagram b3 they are nested within each other; that is, the template of diagram b3 distinguishes between a foreground and a background.
In the implementation process, when the original pieces of multimedia information in the synthesis style or template are nested within each other (as in diagram b3 of FIG. 2A), the synthesis parameters include a synthesis position area. Correspondingly, determining the synthesis parameters for synthesizing the at least two pieces of multimedia information includes a process of determining the synthesis position area, which can be carried out in the following two modes:
Mode one: receiving a third operation and, based on the third operation, determining the multimedia information that serves as the background from the at least two pieces of multimedia information; the terminal then displays the background multimedia information on its display interface. A fourth operation is received, where the fourth operation is used to determine the synthesis position area of the other multimedia information on the multimedia information serving as the background; based on the fourth operation, the synthesis position area of the other multimedia information on the background is determined, and the other multimedia information is added onto the background according to the synthesis position area. The other multimedia information includes all or part of the at least two pieces of multimedia information other than the piece serving as the background.
In other embodiments, where the terminal includes a front camera and a rear camera, the step in mode one of determining the multimedia information that serves as the background may be omitted; that is, mode one includes the following two steps: step SA11, receiving a second operation directed at the second multimedia information; step SA12, determining the synthesis position area based on the position corresponding to the second operation. In this example, the second multimedia information is shot by the rear camera and is therefore suited to serve as the background, while the first multimedia information is shot by the front camera and is therefore suited to serve as the foreground; the second multimedia information can thus be displayed on the terminal's display screen, and a second operation by the user on the second multimedia information is received, where the second operation is used to determine the synthesis position area of the first multimedia information on the second multimedia information serving as the background. Of course, if the user wants to change the background, a swap operation can be provided in the implementation process, with which the user changes the background from the second multimedia information to the first multimedia information.
In the embodiments shown in SA11 and SA12, the terminal needs to determine which of the multimedia information shot by the front and rear cameras comes from the rear camera, so as to use it as the background. Therefore, in other embodiments the method further includes: the client determines the multimedia information shot by the rear camera from the attributes of the first and second multimedia information, where the attributes of multimedia information include the file size, format, source identifier, and shooting time; if the multimedia information is a picture, the shooting time is an instant, and if it is a video, the shooting time includes the start time and the duration. The source identifier indicates whether the multimedia information was shot by the front camera or the rear camera. The multimedia information shot by the rear camera (the second multimedia information) is then used as the background.
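A minimal sketch of this selection, assuming a hypothetical data model (the patent names the attributes but defines no concrete structure, so MediaItem and its fields are illustrative only):

```python
from dataclasses import dataclass

# Hypothetical media attributes: only the source identifier matters here.
@dataclass
class MediaItem:
    path: str
    source: str  # source identifier: "front" or "rear"

def pick_background(items):
    """Use the rear-camera item as the background; the rest are foreground."""
    background = next((m for m in items if m.source == "rear"), items[0])
    foregrounds = [m for m in items if m is not background]
    return background, foregrounds
```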
Taking FIG. 2B as an example: after the client acquires the at least two pieces of multimedia information, it reads their attributes and uses as the background the multimedia information whose source identifier marks the rear camera. Suppose the client uses the second multimedia information of diagram a1 as the background; the user then performs a second operation on the second multimedia information, for example drawing a circle 21 on it as the synthesis position area, and the client determines the synthesis position area, for example circle 20, from the position corresponding to the second operation. The client then adds the first multimedia information to the second multimedia information according to the synthesis position area, obtaining the synthesized multimedia information.
Mode one provides a manner of determining the synthesis position area based on user operations; mode two provides a manner in which the client determines the synthesis position area automatically. In mode two, the synthesis parameters include a foreground tag, a background tag, and a synthesis position area, and mode two includes the following steps:
Step SB11, determining a foreground tag or a background tag for each of the at least two pieces of multimedia information;
Step SB12, performing image recognition on the multimedia information carrying the background tag to obtain an empty window area;
In other embodiments, the image recognition may include color recognition and image texture recognition. Generally, an area with high color consistency carries little effective information; likewise, an area with little image texture carries little effective information. Areas with high color consistency and areas with little texture can therefore be determined to be empty window areas. In the implementation process, an area whose texture features meet a preset condition may be determined to be the empty window area. When color recognition is used, this embodiment further includes a step of determining the color value of the empty window area.
Step SB13, determining the empty window area as the synthesis position area.
Correspondingly, in step S103, synthesizing the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information includes: adding the multimedia information carrying the foreground tag to the synthesis position area to obtain the synthesized multimedia information.
In this embodiment, the empty window area is a color-consistent area containing no text. The color-consistent area can be found by color recognition; for example, a pixel region in which the color differences fall within a threshold range may be determined to be color-consistent. Performing color recognition on the multimedia information carrying the background tag to obtain the empty window area includes: performing color recognition on that multimedia information to obtain the color-consistent areas on it. Referring to FIG. 2C, suppose the multimedia information with the background tag is diagram A or diagram B of FIG. 2C: analyzing diagram A yields color-consistent areas 2c1 and 2c2, and analyzing diagram B yields color-consistent areas 2c3 and 2c4; areas containing text are then excluded from the color-consistent areas to obtain the empty window area. In other embodiments, the method further includes judging whether the empty window area is larger than a preset pixel area. For example, a two-dimensional code area generally needs at least 100 x 100 pixels; if a candidate area is too small, the region available for compositing is too small, it fails the pixel-area threshold, and it cannot be treated as an empty window area. If the condition of being larger than the preset pixel area is met, the area is determined to be the empty window area.
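The following Python sketch illustrates one plausible reading of this empty-window search: scan the background image in blocks, keep blocks whose color spread stays within a tolerance, and accept a candidate region only if it covers at least the preset pixel area (100 x 100 pixels in the example above). The block size, tolerance, and search strategy are assumptions, and the text-exclusion step is omitted for brevity:

```python
import numpy as np

def find_empty_window(img, block=20, color_tol=12, min_px=100):
    """Return (x, y, w, h) of a color-consistent region, or None.

    img is an HxWx3 uint8 array. Text exclusion is not implemented here.
    """
    h, w, _ = img.shape
    gh, gw = h // block, w // block
    consistent = np.zeros((gh, gw), dtype=bool)
    for by in range(gh):
        for bx in range(gw):
            patch = img[by*block:(by+1)*block, bx*block:(bx+1)*block].reshape(-1, 3)
            # per-channel spread; a small spread means high color consistency
            spread = patch.max(axis=0).astype(int) - patch.min(axis=0).astype(int)
            consistent[by, bx] = bool((spread <= color_tol).all())
    need = -(-min_px // block)  # blocks needed to cover min_px pixels (ceil)
    # naive search for a square of consistent blocks meeting the size threshold
    for by in range(gh - need + 1):
        for bx in range(gw - need + 1):
            if consistent[by:by+need, bx:bx+need].all():
                return (bx*block, by*block, need*block, need*block)
    return None
```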
In this embodiment, since the multimedia information may be video, the synthesis may be video synthesis. Because video is composed of frames, video from multiple cameras, for example the front and rear cameras, must be synthesized frame by frame. Suppose the front camera produces three frames [a1, a2, a3] and the rear camera produces three frames [b1, b2, b3]; the synthesized video then also has three frames [c1, c2, c3], where c1 may be the result of synthesizing a1 and b1, written c1 = a1 + b1; c1 may also be the result of synthesizing a1 and b3 (c1 = a1 + b3), or of synthesizing a2 and b3 (c1 = a2 + b3). The relationships expressed by these formulas are embodied in association information. Thus, when the multimedia information is video, synthesizing the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information includes: determining a foreground tag or a background tag for each of the at least two pieces of multimedia information; performing image recognition on the piece carrying the background tag to obtain an empty window area; determining the empty window area as the synthesis position area; determining the frame order of each piece; and establishing association information between the frame order of the foreground-tagged piece and the frame order of the background-tagged piece. Correspondingly, adding the foreground-tagged multimedia information to the synthesis position area to obtain the synthesized multimedia information includes: adding the foreground-tagged multimedia information to the synthesis position area according to the association information. In other embodiments, it is judged whether the number of frames of the background-tagged piece matches that of the foreground-tagged piece; if not, null frames are used for the correspondence. That is, if the background-tagged piece has fewer frames than the foreground-tagged piece, there will at times be no background frame while the foreground frames play out. In other embodiments, the two may instead be aligned to the smaller frame count.
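A minimal sketch of the frame association and null-frame padding described above; compose_frame stands in for whatever per-frame compositing is used (c1 = a1 + b1 in the notation above):

```python
from itertools import zip_longest

def synthesize_video(fg_frames, bg_frames, compose_frame):
    """Pair foreground and background frames one to one, padding the
    shorter stream with None ("null frames") when the counts differ."""
    out = []
    for fg, bg in zip_longest(fg_frames, bg_frames, fillvalue=None):
        if bg is None:        # background exhausted: nothing to overlay onto
            out.append(fg)
        elif fg is None:      # foreground exhausted: keep the bare background
            out.append(bg)
        else:
            out.append(compose_frame(fg, bg))
    return out

# Alternative mentioned in the text: align both streams to the smaller count.
def synthesize_video_min(fg_frames, bg_frames, compose_frame):
    return [compose_frame(f, b) for f, b in zip(fg_frames, bg_frames)]
```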
When several pieces of multimedia information are synthesized, the question arises of which piece is superimposed on which; this embodiment therefore gives each acquired video or image a tag indicating whether it is the foreground or the background of the synthesized video or image. Correspondingly, when the multimedia information is video, synthesizing the at least two pieces according to the synthesis parameters to obtain synthesized multimedia information includes: determining a foreground tag or a background tag for each piece; performing image recognition on the background-tagged piece to obtain an empty window area; and determining the empty window area as the synthesis position area. Adding the foreground-tagged multimedia information to the synthesis position area to obtain the synthesized multimedia information then includes: step S31A, performing boundary detection on the foreground-tagged multimedia information to obtain the area outside the boundary; step S32A, color-filling the area outside the boundary according to the color value of the synthesis position area to obtain the filled foreground-tagged multimedia information; and step S33A, adding the color-filled foreground-tagged multimedia information to the synthesis position area to obtain the synthesized multimedia information. As shown in FIG. 2D: suppose the foreground-tagged multimedia information is diagram a of FIG. 2D; the result of boundary extraction (also called edge detection) on diagram a is shown in diagram b, the area inside the boundary in diagram c, and the area outside the boundary in diagram d. Diagram d is then color-filled according to the color value of the synthesis position area; if that color value is green, the shaded area drawn with oblique lines in diagram d is filled with green.
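As a sketch of steps S31A and S32A, assuming OpenCV as the image library (the patent names none), the subject boundary can be extracted with edge detection and everything outside it filled with the color value of the synthesis position area:

```python
import cv2
import numpy as np

def fill_outside_boundary(fg_bgr, area_color_bgr):
    """Fill everything outside the detected subject boundary with the
    color value of the synthesis position area (OpenCV 4 signatures)."""
    gray = cv2.cvtColor(fg_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # boundary extraction (edge detection)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)  # inside = 255
    out = fg_bgr.copy()
    out[mask == 0] = area_color_bgr  # S32A: fill the area outside the boundary
    return out
```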
In this embodiment, synthesizing the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information includes: determining a foreground tag or a background tag for each piece; performing image recognition on the background-tagged piece to obtain an empty window area; and determining the empty window area as the synthesis position area. Adding the foreground-tagged multimedia information to the synthesis position area to obtain the synthesized multimedia information then includes: step S31B, performing boundary extraction on the foreground-tagged multimedia information to obtain the area inside the boundary; and step S32B, adding the area inside the boundary to the synthesis position area to obtain the synthesized multimedia information. As shown in FIG. 2D: suppose the foreground-tagged multimedia information is diagram a of FIG. 2D; the result of boundary extraction (also called edge detection) on diagram a is shown in diagram b, the area inside the boundary in diagram c, and diagram c is added to the synthesis position area to obtain the synthesized multimedia information.
In other embodiments of the present invention, terminals differ by type: some terminals support turning on multiple cameras simultaneously, while others support turning on only one camera at a time. This embodiment also provides different solutions for this difference in terminal capability. Correspondingly, in step S101, acquiring the at least two pieces of multimedia information shot by at least two of the terminal's two or more cameras includes: step S31C, judging whether the terminal supports simultaneous shooting by the at least two cameras; step S32C, if the terminal supports simultaneous shooting by the at least two cameras, calling the at least two cameras to shoot simultaneously and acquiring the at least two pieces of multimedia information they capture; step S33C, if the terminal is determined not to support simultaneous shooting by the at least two cameras, calling a default camera of the at least two cameras to shoot first and, after obtaining the multimedia information shot by the default camera, calling the other cameras of the at least two cameras to shoot in sequence and acquiring the multimedia information they shoot in turn.
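A sketch of the branching in steps S31C to S33C; the camera API used here (open_camera, supports_concurrent_capture, capture, close) is entirely hypothetical, standing in for whatever interface the platform provides:

```python
def acquire_media(device, camera_ids, default_id=0):
    """Capture one clip per camera, simultaneously if the device allows it."""
    if device.supports_concurrent_capture(camera_ids):           # S31C
        sessions = [device.open_camera(cid) for cid in camera_ids]
        clips = [s.capture() for s in sessions]                  # S32C: shoot simultaneously
        for s in sessions:
            s.close()
        return clips
    # S33C: shoot with the default camera first, then the others in turn
    ordered = [default_id] + [cid for cid in camera_ids if cid != default_id]
    clips = []
    for cid in ordered:
        session = device.open_camera(cid)
        clips.append(session.capture())
        session.close()
    return clips
```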
The embodiments of the present invention can help a user simultaneously use the recording functions of a phone's multiple front and rear cameras to acquire, jointly display, and process real-time video, and finally synthesize a single video for outward publication.
In the embodiments of the present invention, a user can acquire video with multiple front and rear cameras of the terminal, simultaneously or in sequence. Real-time processing of the image content is supported during acquisition, such as various filter effects, stickers, mosaics, and bullet-screen text. Before the video file is formally generated, merging and displaying the processed content acquired by the multiple cameras is supported, and the user can choose the merging scheme and effect in real time, such as the position, size, shape, and rotation angle of each video/picture when superimposed; the various rendering effects supported during acquisition also remain available. When the user confirms the final composite effect, a single video file containing all the content is generated for outward publication.
For different types of terminals: some terminals support turning on multiple cameras simultaneously, and some devices support turning on only one camera at a time. The present technique provides different solutions for this difference in device capability. FIG. 3A is a schematic flow chart of an implementation of a video synthesis method according to an embodiment of the present invention, where steps S303 to S311 on the left side of FIG. 3A form the flow for a device that supports turning on only one camera at a time, and steps S313 to S318 on the right side form the flow for a device that supports turning on multiple cameras simultaneously. As shown in FIG. 3A, the method includes:
Step S301, entering the function entry;
Here, the user operates the APP on the terminal, and the terminal enters the function entry according to the user's operation; for example, the user opens the APP and taps the "camera" button to enter the video recording page and start recording. Each step of recording has corresponding prompts and guidance for the user's operation. For example, an instant messaging APP is installed on the user's phone; as shown in FIG. 3B, when the user is chatting with one of his friends on the chat interface 30, say he wants to share his video with the friend "Maffylee" 31, he taps the function entry 32; in this example the camera icon 32 serves as the function entry. After the recording interface is entered, the phone's camera starts working; if the front camera starts working, the user's head portrait 33 is collected (see FIG. 3C), and the camera's real-time preview is displayed on the phone screen. The screen has a button 34 for switching between the front and rear cameras, a button 35 for turning on the flash, and the like. When recording is finished, the user can tap the stop button 36. In other embodiments of the present invention, the recording interface may let the user select corresponding effects to process the collected data in real time, where the processing effects 37 include filters, stickers, and so on; the user then taps the record button 36 to start recording the video, the stop button and the record button being the same button.
After multiple segments of video have been shot, the video or picture can be selected for editing, such as adjusting the video composite effect; adjusting the size, position, and orientation of the superimposed video/picture; adding text, watermarks, and emoticons; applying mosaics to the video or picture; changing the background music; and adding filter effects. The added effects support real-time preview and can be sent after the user confirms.
Step S302, judging whether simultaneous recording with multiple front and rear cameras is supported; if yes, go to step S313; if not, go to step S303;
Step S303, starting any one camera for framing and preview;
Here, since the terminal supports starting only one camera for recording, it may start a default camera; if the started default camera is not the one the user wants, the user can switch on the interface. For example, if the terminal's default camera is the rear camera but the user wants to start the front camera instead, the user operates on the interface to switch the started camera, that is, from the rear camera to the front camera.
Here, after the recording interface is entered, the phone's camera starts working; assuming the front camera starts working, the user's head portrait 33 is collected (see FIG. 3C), and the camera's real-time preview is displayed on the phone screen (see also diagram a of FIG. 3D and diagram a of FIG. 3E). The screen has a button 34 for switching between the front and rear cameras, a button 35 for turning on the flash, and the like. When recording is finished, the user can tap the stop button 36. In other embodiments of the present invention, the recording interface may let the user select corresponding effects to process the collected data in real time, where the processing effects 37 include filters, stickers, and so on; the user then taps the record button 36 to start recording the video, the stop button and the record button being the same button.
Step S304, preprocessing the collected data;
Here, the terminal collects the original video data and then preprocesses it, for example by adding filter effects, cropping the video, and so on. Referring to diagram b of FIG. 3D or FIG. 3E, icons 43 and 44 are editing templates; when the user selects a template, the corresponding effect is added to the original image a captured by the camera. If the user selects template 43, the corresponding effects 41 and 42 are added to image a.
Step S305, generating a video/picture;
Here, the terminal generates the video/picture from the preprocessed video data; referring to diagram c of FIG. 3D or FIG. 3E, after diagram b is edited, the video/picture shown in diagram c is generated.
Step S306, confirming the video/picture by playback;
Here, the user can confirm the generated video by playback: if the user wants playback confirmation, the user taps the playback button on the interface, and the terminal plays the video back accordingly. If the user is not satisfied with the video just generated, steps S303 to S305 can be repeated until the user is satisfied with the generated video/picture.
Step S307, starting another camera for framing and preview, and superimposing the captured content onto the previous video for display;
Here, this continues from steps S303 to S305: those steps form the flow in which one camera generates a video/picture, and steps S307 to S309, which are similar to them, form the flow in which another camera generates a video/picture.
Here, after the user has shot a video/picture with the front camera and wants to shoot one with the rear camera, the user selects the switch button on the interface to switch from the front camera to the rear camera; the terminal thereby starts the other camera (the rear camera) for framing and preview and begins recording.
Step S308, preprocessing the collected data;
Here, the terminal collects the original video data and then preprocesses it;
Step S309, generating another video/picture;
Here, the terminal generates a video/picture from the video data preprocessed in step S308;
Step S310, superimposed preview of multiple videos/pictures and effect adjustment;
Here, the terminal synthesizes the video/picture generated in step S309 with the video/picture generated in step S306. Referring to FIG. 3D or FIG. 3E: suppose diagram c is superimposed onto an image of a table; a synthesis position area is determined in the table image, diagram c can be cropped, the cropped image is placed at the position shown by icon 47, and the composite video or picture is then adjusted, for example by adding text 46 and a smiling face 45.
Here, the user may also confirm the video generated in step S309 by playback: if the user wants playback confirmation, the user taps the playback button on the interface, and the terminal plays it back. If the user is not satisfied with the video generated in step S309, steps S307 to S309 may be repeated until the user is satisfied with the generated video/picture.
In other embodiments of the present invention, the original video content collected in step S307 may be superimposed onto the video/picture generated in step S306, or it may not be. If the original video content collected in step S307 is superimposed onto the video/picture generated in the previous step S306, the superimposed video is obtained once the processing of step S309 completes.
Step S311, confirming the composite effect;
Here, a preview option may be displayed on the terminal's interface. After the user selects preview, a confirmation dialog pops up: if the user is satisfied with the composite effect, the user performs a confirmation operation; if not, the user performs a cancellation operation instead. After receiving the confirmation operation, the terminal jumps to the publishing interface; after receiving the user's cancellation operation, it returns to the compositing interface so that synthesis can be performed again.
Step S312, publishing;
Continuing from step S311: if the user performs the confirmation operation, the flow jumps to the publishing interface, where the user performs the publishing operation. After receiving the publishing operation, the terminal publishes, that is, uploads the synthesized material (including photos and videos) to the server, and the server receives the upload and publishes it. If the client the user uses is WeChat, the user's friends can see the synthesized material in Moments after publication; if the client is video sharing software, other visitors can see the synthesized material the user uploaded.
Step S313, starting multiple cameras for framing and preview;
After the recording interface is entered, the phone's cameras start working; if the front camera starts working, the user's head portrait 33 is collected (see FIG. 3C), and the camera's real-time preview is displayed on the phone screen. The screen has a button 34 for switching between the front and rear cameras, a button 35 for turning on the flash, and the like. When recording is finished, the user can tap the stop button 36. In other embodiments of the present invention, the recording interface may let the user select corresponding effects to process the collected data in real time, where the processing effects 37 include filters, stickers, and so on; the user then taps the record button 36 to start recording the video, the stop button and the record button being the same button.
Step S314, processing the multiple video streams in real time;
In this example, the terminal may process the multiple video streams in real time. In other embodiments, the processing need not be real-time; that is, an earlier video may already have been processed, and only the synthesis step happens in real time.
Step S315, generating a video/picture;
In this example, a synthesized video/picture is generated from the images processed in step S314.
Step S316, confirming the video/picture by playback;
Step S317, superimposed preview of multiple videos/pictures and effect adjustment;
In this example, step S317 can refer to step S310 described above.
Step S318, confirming the composite effect.
In this example, step S318 can refer to step S311 described above.
For video recorded in real time, multiple segments of video can be synthesized into one video according to the user's editing effect, following the video synthesis logic shown below. FIG. 4 is a schematic flow chart of an implementation of a video synthesis method according to an embodiment of the present invention; as shown in FIG. 4, the method includes:
Step S401, starting the camera;
Here, the user operates the APP on the terminal, and the terminal enters the function entry according to the user's operation; for example, the user opens the APP, taps the "camera" button, and enters the video recording page to start recording, whereupon the terminal starts the camera. Each step of recording has corresponding prompts and guidance for the user's operation.
Step S402, preprocessing the data collected by the camera;
The preprocessing of camera data may include: the scene is projected through the lens as an optical image onto the surface of the image sensor, converted into an electrical signal, converted into a digital image signal by analog-to-digital (A/D) conversion, sent to a digital signal processing (DSP) chip for processing, and then transmitted through a data interface such as the Universal Serial Bus (USB) to a processor of the terminal such as the Central Processing Unit (CPU).
Step S403, handling data from multiple cameras in parallel;
Step S404, data preprocessing: transcoding and adding rendering effects;
Step S405, storing the data in a cache or a local file;
Step S406, playback preview and effect confirmation;
In steps S401 to S406: after the terminal enters the camera page, the phone's camera starts working, and the camera's real-time preview is displayed on the phone screen. Buttons on the screen support switching between the front and rear cameras and turning the flash on or off. Meanwhile, this entry can let the user select corresponding effects, such as filters and stickers, to process the collected data in real time. Tapping the record button starts recording the video.
Step S407, the terminal judges whether data collection is finished;
Here, the terminal judges whether data collection is finished; if not, collection continues, and if so, the flow proceeds to step S408. In the implementation process, whether data collection is finished can be judged from the user's selection operation.
Step S408, video synthesis;
Here, the video synthesis of step S408 includes determining the synthesis parameters and synthesizing the video according to them; the synthesis parameters include a foreground tag, a background tag, a synthesis position area, and the like.
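Purely as an illustration of how the sketches given earlier in this description might fit together at this step (pick_background, find_empty_window, and synthesize_video are the earlier sketches; first_frame_of, frames_of, and overlay_at are additional hypothetical stand-ins):

```python
def synthesize(clips, first_frame_of, frames_of, overlay_at):
    """Hypothetical glue code: tag the rear-camera clip as background,
    find an empty window area in it, and compose frame by frame."""
    background, foregrounds = pick_background(clips)        # background tag from source identifier
    area = find_empty_window(first_frame_of(background))    # synthesis position area
    if area is None:
        raise ValueError("no empty window area large enough for compositing")
    def compose(fg_frame, bg_frame):
        return overlay_at(bg_frame, fg_frame, area)         # paste foreground into the area
    return synthesize_video(frames_of(foregrounds[0]), frames_of(background), compose)
```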
Step S409, rendering data;
here, the data rendering includes: after the multi-segment video is shot, the video or the picture can be selected to be edited, such as adjusting the video synthesis effect, adjusting the size, the position, the direction and the like of the overlapped video/picture, adding published characters, watermarks, expressions, playing mosaic on the video or the picture, modifying background music, adding a filter effect and the like.
Step S410, real-time display and synthesis;
step S411, playback preview and effect confirmation;
here, a preview option may be displayed on an interface of the terminal, after the user selects preview, a confirmation dialog box pops up, if the user is satisfied with the composition effect, the user may perform a confirmation operation, if the user is not satisfied with the composition effect, the user may not perform the confirmation operation, thereby performing a cancellation operation, after the terminal receives the confirmation operation, the terminal may jump to a distribution interface, and if the terminal receives the cancellation operation of the user, the terminal may perform the composition interface again, thereby performing composition again.
And step S412, issuing.
For this example, see step S312.
In the embodiment, the imaging data frame is captured by the camera of the mobile phone in real time, and during previewing, the data frame is processed in real time to finish the presentation of various video effects, so that the previewing is provided for a user. The user can start recording the video after confirming the effect. After the video is generated, the rendering effect selected by the user is also generated into the video file.
In this embodiment, a mobile phone that supports starting several cameras at once can record and synthesize simultaneously. A mobile phone that does not support simultaneous recording records one video file at a time; after the several segments have been recorded in turn, they are gathered and displayed overlaid together, and an entry for later editing is provided to the user. The synthesis logic is then started to combine the segments into one video according to the effect the user edited.
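The shape of that capability check and fallback can be sketched as below; the camera object with a record(duration_s) method is a hypothetical interface, and the thread-pool concurrency is only one possible mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def record_clips(cameras, supports_concurrent, duration_s):
    """Record one clip per camera: all at once when the device supports it,
    otherwise sequentially, one camera after another."""
    if supports_concurrent:
        with ThreadPoolExecutor(max_workers=len(cameras)) as pool:
            return list(pool.map(lambda cam: cam.record(duration_s), cameras))
    # Sequential fallback: each clip is recorded in turn, then synthesized later.
    return [cam.record(duration_s) for cam in cameras]
```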
For videos that are not recorded in real time, such as locally existing videos, multiple segments can be synthesized into one video according to the user's editing effect, following the synthesis logic below. Fig. 5 is a schematic flowchart of an implementation of a video synthesis method according to an embodiment of the present invention; as shown in fig. 5, the method includes:
step S501, video synthesis is started;
Here, the user operates the APP on the terminal, and the terminal enters the function entry according to the user's operation; for example, the user opens the APP and clicks the "camera" button to begin the video synthesis phase, using video cache/file data from local storage or received from other devices, according to the user's selection.
Step S502, analyzing video cache/file data;
Here, the terminal parses the video cache/file data and, once parsing is complete, generates a file in the format specified by the APP.
Step S503, determining whether the data is a single video; if so, go to step S508, otherwise go to step S504.
If the data is not a single video, it corresponds to multiple cameras and needs to be synthesized; a single video can skip synthesis. The user may select one local file and acquire the other in real time, or both files may be local.
Step S504, video frame is synthesized frame by frame;
Here, step S504 synthesizes the multiple videos from step S503 into a single video file, frame by frame.
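Frame-by-frame synthesis of two local files might look like the sketch below, assuming OpenCV, a fixed (x, y, w, h) synthesis region, and placeholder paths and codec; a real implementation would also carry the audio track (step S507).

```python
import cv2

def compose_videos(bg_path, fg_path, out_path, region):
    """Frame-by-frame synthesis: paste each foreground frame into the
    background's synthesis region and write the result as one file."""
    x, y, w, h = region
    bg, fg = cv2.VideoCapture(bg_path), cv2.VideoCapture(fg_path)
    fps = bg.get(cv2.CAP_PROP_FPS) or 30.0
    size = (int(bg.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(bg.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok_b, frame_b = bg.read()
        ok_f, frame_f = fg.read()
        if not ok_b:
            break
        if ok_f:                       # a shorter foreground simply stops overlaying
            frame_b[y:y + h, x:x + w] = cv2.resize(frame_f, (w, h))
        out.write(frame_b)
    for c in (bg, fg, out):
        c.release()
```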
Step S505, rendering a composite frame;
In this example, composite frame rendering covers the same editing operations as the data rendering of step S409: adjusting the synthesis effect; resizing, repositioning, or reorienting the overlaid video/picture; adding text, watermarks, or emoticons; applying a mosaic; changing the background music; or adding a filter effect. The added effects support real-time preview and can be published after the user confirms.
Step S506, generating a video file;
Step S507, incorporating an audio effect (one possible realization is sketched after this flow);
Step S508, pre-publishing processing;
Step S509, video synthesis ends.
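As a hedged illustration of steps S506 to S507, the snippet below muxes a music track into the synthesized video with the ffmpeg command line; the file names are placeholders, and ffmpeg is only one tool that could do this.

```python
import subprocess

# Sketch of step S507: mux a background-music track into the synthesized
# video without re-encoding the video stream. Paths are placeholders.
subprocess.run([
    "ffmpeg", "-y",
    "-i", "synthesized.mp4",     # video produced in step S506
    "-i", "music.aac",           # audio effect / background music
    "-c:v", "copy",              # keep the video stream as-is
    "-c:a", "aac",
    "-shortest",                 # stop at the shorter of the two streams
    "out.mp4",
], check=True)
```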
An advantage of the invention is that it exploits the mobile phone's multi-camera capability: when shooting a video, every camera can be put to use, with the main video carrying the primary content while the other cameras record auxiliary content that is superimposed on it, so the resulting video carries richer information. The invention thus better supports users in expressing their feelings, experiences, or moods from multiple dimensions.
The embodiment of the present invention further provides a multimedia information processing apparatus. Each unit of the apparatus, each module of each unit, and each sub-module of each module can be implemented by a processor in a terminal, or of course by a logic circuit. In implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like. The terminal may be implemented as a computing device, which in practice may be any of various electronic devices with information processing capability, for example a mobile phone, a tablet computer, a desktop computer, or a personal digital assistant.
Fig. 6 is a schematic diagram of a composition structure of a multimedia information processing apparatus according to an embodiment of the present invention, and as shown in fig. 6, the apparatus 600 includes an obtaining unit 601, a determining unit 602, a synthesizing unit 603, a receiving unit 604, and a sending unit 605, where:
the acquiring unit 601 is configured to acquire at least two pieces of multimedia information that are respectively captured by at least two cameras in the two or more cameras of the terminal;
the determining unit 602 is configured to determine a synthesis parameter for synthesizing the at least two pieces of multimedia information;
the synthesizing unit 603 is configured to synthesize the at least two pieces of multimedia information according to the synthesis parameters to obtain synthesized multimedia information;
the receiving unit 604 is configured to receive a first operation, where the first operation is used to send the synthesized multimedia to a server corresponding to the client;
the sending unit 605 is configured to send the synthesized multimedia to a server corresponding to the client in response to the first operation.
In other embodiments of the present invention, the terminal includes a front camera and a rear camera, the synthesis parameter includes a synthesis position area, and the determining unit is configured to determine that the first multimedia information captured by the front camera is synthesized in the synthesis position area of the second multimedia information captured by the rear camera; and the synthesis unit is used for adding the first multimedia information shot by the front camera to the synthesis position area to obtain synthesized multimedia information.
In other embodiments of the present invention, the determining unit comprises a receiving module and a first determining module, wherein: the receiving module is used for receiving a second operation aiming at the second multimedia information; the first determining module is configured to determine a synthesized location area based on the location corresponding to the second operation.
In other embodiments of the present invention, the synthesis parameters include a foreground tag, a background tag, and a synthesis position area, and the determining unit includes a second determining module, an identifying module, and a third determining module, wherein: the second determining module is configured to determine a foreground tag or a background tag for each of the at least two pieces of multimedia information; the identifying module is configured to perform image recognition on the multimedia information with the background tag to obtain an empty window area; the third determining module is configured to determine the empty window area as the synthesis position area; and the synthesis unit includes an adding module configured to add the multimedia information with the foreground tag to the synthesis position area to obtain the synthesized multimedia information.
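One simple way such an identifying module could find an empty window area is to scan for blocks of near-uniform color; the block size and variance threshold below are assumptions, not values from the patent.

```python
import numpy as np

def find_empty_window(frame, block=64, var_threshold=40.0):
    """Scan fixed-size blocks and return the first one whose color variance
    falls below a threshold, i.e. a near-uniform 'empty window' region."""
    h, w = frame.shape[:2]
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = frame[y:y + block, x:x + block].astype(np.float32)
            if patch.var() < var_threshold:
                return (x, y, block, block)    # candidate synthesis position region
    return None                                # no sufficiently uniform area found
```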
In other embodiments of the present invention, the synthesis unit further comprises a fourth determination module and a setup module, wherein: the fourth determining module is configured to determine a frame sequence of each of the at least two multimedia messages; the establishing module is used for establishing association information between the frame sequence of the multimedia information with the foreground label and the frame sequence of the multimedia information with the background label; and the adding module is used for adding the multimedia information with the foreground label in the synthesis position area according to the associated information to obtain the synthesized multimedia information.
In other embodiments of the present invention, the adding module includes a detecting sub-module, a filling sub-module and an adding sub-module, wherein: the detection submodule is used for carrying out boundary detection on the multimedia information with the foreground label to obtain an area outside the boundary; the filling submodule is used for carrying out color filling on the area outside the boundary according to the color value of the synthesis position area to obtain the filled multimedia information with the foreground label; and the adding submodule is used for adding the multimedia information with the foreground label after the color filling into the synthesis position area to obtain the synthesized multimedia information.
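A rough sketch of this detect-and-fill step using OpenCV contour detection follows; the binarization threshold and default fill color are illustrative, and the OpenCV 4.x findContours signature is assumed.

```python
import cv2
import numpy as np

def fill_outside_boundary(fg, bg_color=(0, 128, 0)):
    """Detect the foreground's outer boundary and repaint everything outside
    it with the synthesis region's color value so the paste blends in."""
    gray = cv2.cvtColor(fg, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    inside = np.zeros_like(gray)
    cv2.drawContours(inside, contours, -1, 255, thickness=cv2.FILLED)
    filled = np.empty_like(fg)
    filled[:] = bg_color                          # color value of the synthesis area
    filled[inside == 255] = fg[inside == 255]     # keep pixels inside the boundary
    return filled
```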
In other embodiments of the present invention, the adding module includes an extracting sub-module and an adding sub-module, wherein: the extraction submodule is used for extracting the boundary of the multimedia information with the foreground label to obtain an area in the boundary; and the adding submodule is used for adding the area in the boundary to the synthesis position area to obtain the synthesized multimedia information.
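The extraction alternative can be sketched as a masked copy: only pixels inside the detected boundary are written into the synthesis position area. The threshold is again an assumption, and the overlay is assumed to fit inside the background frame.

```python
import cv2

def paste_inside_boundary(bg, fg, top_left):
    """Copy only the pixels inside the foreground's boundary into the
    synthesis position area of the background frame (modified in place)."""
    x, y = top_left
    gray = cv2.cvtColor(fg, cv2.COLOR_BGR2GRAY)
    mask = gray > 10                                   # inside-boundary pixels
    roi = bg[y:y + fg.shape[0], x:x + fg.shape[1]]     # view into bg
    roi[mask] = fg[mask]
    return bg
```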
In other embodiments of the present invention, the obtaining unit includes a fifth determining module and a first obtaining module, wherein: the fifth determining module is configured to determine that the terminal supports the at least two cameras to perform shooting simultaneously, and call the at least two cameras to perform shooting simultaneously; the first obtaining module is configured to obtain the at least two pieces of multimedia information captured by the at least two cameras.
In other embodiments of the present invention, the obtaining unit includes a sixth determining module, a second obtaining module and a third obtaining module, wherein: the sixth determining module is configured to call a default one of the at least two cameras to perform shooting when it is determined that the terminal does not support simultaneous shooting by the at least two cameras; the second obtaining module is configured to, after obtaining the multimedia information shot by the default one of the cameras, call the other cameras of the at least two cameras except the default one of the cameras to shoot in sequence; and the third acquisition module is used for acquiring the multimedia information which is shot by the other cameras in sequence.
The above description of the embodiment of the apparatus is similar to the above description of the embodiment of the method, and has similar beneficial effects to the embodiment of the method, and therefore, the description thereof is omitted. For technical details not disclosed in the embodiments of the apparatus according to the invention, reference is made to the description of the embodiments of the method according to the invention for understanding.
In the embodiment of the present invention, if the multimedia information processing method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The embodiment of the invention provides a computer storage medium, wherein computer-executable instructions are stored in the computer storage medium and used for executing a multimedia information processing method provided by the embodiment of the invention.
An embodiment of the present invention provides a multimedia information processing apparatus, including:
a storage medium configured to store executable instructions;
and the processor is configured to execute the stored executable instructions, and the executable instructions are used for executing the multimedia information processing method.
The above descriptions of the storage medium and apparatus embodiments are similar to the description of the method embodiments and have similar beneficial effects, so they are omitted here for brevity. For technical details not disclosed in the storage medium and apparatus embodiments of the invention, refer to the description of the method embodiments of the invention.
Fig. 7 is a schematic diagram of a hardware entity of a terminal in an embodiment of the present invention. As shown in fig. 7, the hardware entity of the terminal 700 includes: a processor 701, a communication interface 702, an input module 703, a display module 704, and a memory 705, wherein:
The processor 701 generally controls the overall operation of the terminal 700. For example, the input module 703 may be implemented as a touch screen that outputs user operation data describing the operation characteristics (touch point position, number of touch points, and trigger pressure) to the processor 701; the processor 701 parses this data to determine which function the user triggered in the display interface and generates the corresponding display data, so that the display module 704 loads the page for the triggered function.
The communication interface 702 enables the terminal to communicate with other terminals or servers over a network.
The input module 703 may be configured to receive input character information and generate signal inputs related to user settings and function controls. The input module may include a touch surface, which may collect a touch operation by a user (e.g., a user's operation on or near the touch surface using a finger, a stylus, or any other suitable object or attachment), acquire a signal from the touch operation, convert the signal into touch coordinates, send the touch coordinates to the processor 701 for processing, and receive and execute a command from the processor 701.
The display module 704 may be configured to display the functions performed by the processor 701 and related information.
The memory 705 is configured to store instructions and applications executable by the processor 701, and may also cache data to be processed or already processed by the processor 701 and by modules of the terminal 700 (for example image data, audio data, voice communication data, and video communication data); it may be implemented by FLASH memory (FLASH) or Random Access Memory (RAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A multimedia information processing method, applied to a client, the method comprising:
acquiring at least two pieces of multimedia information, wherein the at least two pieces of multimedia information are obtained by shooting with at least two cameras of a terminal, each camera shooting at least one of the pieces;
determining synthesis parameters for synthesizing the at least two pieces of multimedia information;
determining a foreground label or a background label for each of the at least two pieces of multimedia information; determining an area of consistent color, varying by less than a threshold value, in the multimedia information with the background label as an empty window area;
determining the empty window area as a synthesis position area, and adding the multimedia information with the foreground label to the synthesis position area according to the frame sequence of each piece of multimedia information to obtain synthesized multimedia information;
displaying the synthesized multimedia information, and acquiring the display scheme and effect selected in real time;
formally generating multimedia information according to the display scheme and effect selected in real time;
receiving a first operation, wherein the first operation is used for sending the formally generated multimedia information to a server corresponding to the client;
and responding to the first operation, and sending the formally generated multimedia information to a server corresponding to the client.
2. The method of claim 1, wherein the terminal comprises a first camera and a second camera, wherein the composition parameter comprises a composition location area, and wherein determining the composition parameter for composition of the at least two multimedia information comprises: and determining a synthesis position area where the first multimedia information shot by the first camera is synthesized in the second multimedia information shot by the second camera.
3. The method of claim 2, wherein the adding the multimedia information with the foreground label to the synthesis position area according to the frame sequence of each multimedia information to obtain synthesized multimedia information comprises: and adding the first multimedia information shot by the first camera to the synthesis position area according to the frame sequence of each multimedia information to obtain synthesized multimedia information.
4. The method of claim 2, wherein the determining that the first multimedia information captured by the first camera is combined in the combination position area of the second multimedia information captured by the second camera comprises:
receiving a second operation for the second multimedia information;
and determining the synthetic position area based on the position corresponding to the second operation.
5. The method of claim 1, wherein the adding multimedia information with foreground labels to the composition position area according to the frame sequence of each multimedia information to obtain the composed multimedia information comprises:
determining a frame order of each of the at least two multimedia information;
establishing association information between the frame sequence of the multimedia information with the foreground label and the frame sequence of the multimedia information with the background label;
and adding the multimedia information with the foreground label in the synthesis position area according to the associated information to obtain the synthesized multimedia information.
6. The method of claim 1, wherein the adding multimedia information with foreground label to the synthesis position area according to the frame sequence of each multimedia information to obtain synthesized multimedia information, further comprises:
carrying out boundary detection on multimedia information with a foreground label to obtain an area outside the boundary;
performing color filling on the area outside the boundary according to the color value of the synthesis position area to obtain filled multimedia information with a foreground label;
and adding the multimedia information with the foreground label after color filling into the synthesis position area according to the frame sequence of each multimedia information to obtain the synthesized multimedia information.
7. The method of claim 1, wherein the adding multimedia information with foreground label to the synthesis position area according to the frame sequence of each multimedia information to obtain synthesized multimedia information, further comprises:
extracting the boundary of the multimedia information with the foreground label to obtain an area in the boundary;
and adding the area in the boundary to the synthesis position area according to the frame sequence of each piece of multimedia information to obtain synthesized multimedia information.
8. The method according to any one of claims 1 to 7, wherein said obtaining at least two multimedia messages comprises:
when the terminal supports the at least two cameras to shoot simultaneously, the at least two cameras are called to shoot simultaneously;
and acquiring the at least two pieces of multimedia information shot by the at least two cameras.
9. The method according to any one of claims 1 to 7, wherein said obtaining at least two multimedia messages comprises:
when the terminal does not support the simultaneous shooting of the at least two cameras, calling a default camera of the at least two cameras to shoot;
after the multimedia information shot by the default camera is obtained, other cameras except the default camera in the at least two cameras are called to shoot in sequence;
and acquiring multimedia information which is shot by the other cameras in sequence.
10. A multimedia information processing apparatus characterized by comprising an acquisition unit, a determination unit, a synthesis unit, a reception unit, and a transmission unit, wherein:
the acquiring unit is used for acquiring at least two pieces of multimedia information, wherein the at least two pieces of multimedia information are obtained by shooting through at least two cameras in the terminal, and at least one piece of the at least two pieces of multimedia information is obtained by shooting through each camera;
the determining unit is configured to determine a synthesis parameter for synthesizing the at least two pieces of multimedia information;
the synthesis unit is used for adding the multimedia information with the foreground label in a synthesis position area according to the frame sequence of each multimedia information to obtain synthesized multimedia information; displaying the synthesized multimedia information, and acquiring a real-time selected display scheme and effect; formally generating multimedia information according to the real-time selected display scheme and effect;
the receiving unit is used for receiving a first operation, and the first operation is used for sending the formally generated multimedia information to a server corresponding to the client;
the sending unit is used for responding to the first operation and sending the formally generated multimedia information to a server corresponding to the client;
wherein the determining unit comprises: a second determining module, configured to determine a foreground tag or a background tag for each of the at least two pieces of multimedia information;
an identification module, configured to determine an area of consistent color, varying by less than a threshold value, in the multimedia information with the background label as an empty window area, and to determine the empty window area as a synthesis position area;
and a third determining module, configured to determine the empty window area as the synthesis position area.
11. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is configured to implement the multimedia information processing method of any one of claims 1 to 9 when executing the program.
12. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the multimedia information processing method of any one of claims 1 to 9.
CN201710546333.8A 2017-07-06 2017-07-06 Multimedia information processing method and device, terminal and storage medium Active CN109218630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710546333.8A CN109218630B (en) 2017-07-06 2017-07-06 Multimedia information processing method and device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN109218630A CN109218630A (en) 2019-01-15
CN109218630B true CN109218630B (en) 2022-04-12

Family

ID=64993520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710546333.8A Active CN109218630B (en) 2017-07-06 2017-07-06 Multimedia information processing method and device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN109218630B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110545385A (en) * 2019-09-23 2019-12-06 维沃移动通信有限公司 image processing method and terminal equipment
CN110855905B (en) * 2019-11-29 2021-10-22 联想(北京)有限公司 Video processing method and device and electronic equipment
CN111429585A (en) * 2020-03-30 2020-07-17 北京字节跳动网络技术有限公司 Image generation method and device, electronic equipment and computer readable storage medium
CN111897603B (en) * 2020-07-23 2024-02-09 上海瑾盛通信科技有限公司 Data distribution method, device, electronic equipment and storage medium
CN111866391B (en) * 2020-07-30 2022-07-22 努比亚技术有限公司 Video shooting method and device and computer readable storage medium
CN112954210B (en) * 2021-02-08 2023-04-18 维沃移动通信(杭州)有限公司 Photographing method and device, electronic equipment and medium
CN113727024B (en) * 2021-08-30 2023-07-25 北京达佳互联信息技术有限公司 Method, device, electronic equipment and storage medium for generating multimedia information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428555A (en) * 2013-08-06 2013-12-04 乐视网信息技术(北京)股份有限公司 Multi-media file synthesis method, system and application method
CN103685958A (en) * 2013-12-12 2014-03-26 联想(北京)有限公司 Information processing method and electronic equipment
EP3065390A1 (en) * 2013-10-29 2016-09-07 Kyocera Corporation Image correction parameter output device, camera system, and correction parameter output method
CN105979163A (en) * 2015-11-06 2016-09-28 乐视移动智能信息技术(北京)有限公司 Shooting method and device
CN106375662A (en) * 2016-09-22 2017-02-01 宇龙计算机通信科技(深圳)有限公司 Photographing method and device based on double cameras, and mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6192483B2 (en) * 2013-10-18 2017-09-06 任天堂株式会社 Information processing program, information processing apparatus, information processing system, and information processing method

Also Published As

Publication number Publication date
CN109218630A (en) 2019-01-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant