CN111464749A - Method, device, equipment and storage medium for image synthesis - Google Patents


Info

Publication number
CN111464749A
Authority
CN
China
Prior art keywords: video frame, image, frame, animation, information
Legal status: Granted
Application number: CN202010384399.3A
Other languages: Chinese (zh)
Other versions: CN111464749B (en)
Inventor: 刘春宇
Current Assignee: Guangzhou shiyinlian Software Technology Co., Ltd.
Original Assignee: Guangzhou Kugou Computer Technology Co., Ltd.
Application filed by Guangzhou Kugou Computer Technology Co., Ltd.
Priority to CN202010384399.3A
Published as CN111464749A; application granted and published as CN111464749B
Legal status: Active

Classifications

    All classifications fall under H (Electricity) → H04 (Electric communication technique) → H04N (Pictorial communication, e.g. television):

    • H04N 23/00 — Cameras or camera modules comprising electronic image sensors; control thereof
    • H04N 23/80 — Camera processing pipelines; components thereof
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/44008 — Processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44016 — Processing of video elementary streams, involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/816 — Monomedia components involving special video data, e.g. 3D video
    • H04N 5/262 — Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects
    • H04N 5/265 — Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for image synthesis, belonging to the technical field of image processing. The method comprises the following steps: acquiring a first position at which a target three-dimensional animation is added, taking a first local image at the first position in a first video frame as a reference image, and adding a first animation frame of the target three-dimensional animation at the first position in the first video frame; when a second video frame is captured, determining a second local image in the second video frame that satisfies a similar condition with the reference image, and determining a second position of the second local image in the second video frame; acquiring position change information of the device between the capture moments of the first video frame and the second video frame, and acquiring attitude information of the device at the capture moment of the second video frame; and adding the current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the attitude information, the first position, and the second position. The method and apparatus enrich the ways in which images can be added to a video.

Description

Method, device, equipment and storage medium for image synthesis
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for image synthesis.
Background
With the development of internet technology and the progress of image processing technology, more and more functions can be realized in short videos and live webcasts. For example, in the process of shooting a video, various decorative images can be added in a video picture to enrich the display content of the video.
In the prior art, a user can select different decorative images during video shooting, and the terminal displays the selected decorative image in the currently captured video picture. For example, if the decorative image selected by the user is a photo frame, the terminal displays the photo frame over the video picture, with the four edges of the frame covering the edge area of the picture, creating the effect of the video playing inside a photo frame.
In the process of implementing the present application, the inventor finds that the prior art has at least the following problems:
at present, only two-dimensional images can be added to a video picture, so the ways of adding images are limited.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, a device, and a storage medium for image synthesis, enriching the ways of adding images to a video. The technical solution is as follows:
in one aspect, a method for image synthesis is provided, the method comprising:
receiving an adding instruction of a target three-dimensional animation, and acquiring a first position corresponding to the adding instruction;
acquiring a first local image at the first position in a currently shot first video frame as a reference image, and adding a first animation frame of the target three-dimensional animation to the first position in the first video frame to obtain a first composite video frame;
when a second video frame is shot, determining a second local image which meets a similar condition with the reference image in the second video frame, and determining a second position of the second local image in the second video frame;
acquiring position change information of the device between the shooting time of the video frame to which the reference image belongs and the shooting time of the second video frame, and acquiring attitude information of the device at the shooting time of the second video frame;
and adding a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the attitude information, the first position and the second position to obtain a second composite video frame.
Optionally, the first local image at the first position is the image enclosed by a rectangle of a preset size centered at the first position.
Optionally, the determining a second local image in the second video frame that satisfies a similar condition with the reference image includes:
acquiring characteristic information of the reference image;
acquiring a plurality of local images in the second video frame based on preset acquisition positions, and acquiring characteristic information of the plurality of local images, wherein the size of each of the plurality of local images is the same as the size of the reference image;
and determining, as the second local image, the local image whose characteristic information has the highest similarity to the characteristic information of the reference image among the plurality of local images.
Optionally, after determining a second local image in the second video frame that satisfies a similar condition with the reference image, the method further includes:
and updating the reference image into the second local image.
Optionally, when a second video frame is captured, determining a second local image in the second video frame, which satisfies a similar condition with the reference image, includes:
setting the interval frame number as an initial value, and when a second video frame which is separated from the first video frame by the interval frame number is shot, determining a second local image which meets similar conditions with the reference image in the second video frame;
after the determining a second local image of the second video frame that satisfies a similar condition with the reference image, the method further includes:
acquiring the calculation duration taken to determine, among the characteristic information of the plurality of local images, the second local image with the highest similarity to the characteristic information of the reference image;
determining a target value corresponding to the duration range in which the calculation duration falls, based on a pre-stored correspondence between duration ranges and interval frame numbers;
and adjusting the interval frame number to the target value.
Optionally, the adding a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the attitude information, the first position, and the second position to obtain a second composite video frame includes:
determining coordinate adjustment information of the target three-dimensional animation based on the position change information and the attitude information;
based on the coordinate adjustment information, adjusting the space coordinates of each image point in the target three-dimensional animation to obtain an adjusted target three-dimensional animation;
acquiring an adjusted second animation frame corresponding to the shooting time of the second video frame in the adjusted target three-dimensional animation;
determining two-dimensional coordinate transformation information of the adjusted second animation frame based on the first position and the second position;
converting the space coordinates of each image point in the adjusted second animation frame into two-dimensional coordinates based on the two-dimensional coordinate conversion information to obtain a two-dimensional animation frame corresponding to the second animation frame;
and adding the two-dimensional animation frame to the second position in the second video frame to obtain a second composite video frame.
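The claims above leave the concrete form of the coordinate adjustment and the two-dimensional coordinate conversion unspecified. As a minimal sketch, assume a translation-only coordinate adjustment and an orthographic conversion that drops the depth axis and offsets by the second position; the function names are illustrative, not from the patent:

```python
def adjust_points(points, translation):
    """Apply the coordinate adjustment derived from the device's position
    change and attitude information (modelled here as a pure translation)."""
    tx, ty, tz = translation
    return [(x + tx, y + ty, z + tz) for x, y, z in points]

def project_to_frame(points, second_position):
    """Orthographic stand-in for the two-dimensional coordinate conversion:
    drop the depth axis and offset each point by the tracked second position."""
    px, py = second_position
    return [(x + px, y + py) for x, y, z in points]
```

A full implementation would build the adjustment from a rotation derived from the attitude information and a perspective projection from the camera intrinsics; this sketch only shows where those two transforms sit in the pipeline.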
In another aspect, there is provided an apparatus for performing image synthesis, the apparatus including:
the receiving module is configured to receive an adding instruction of a target three-dimensional animation and acquire a first position corresponding to the adding instruction;
a first synthesizing module, configured to obtain a first local image at the first position in a currently-shot first video frame as a reference image, and add a first animation frame of the target three-dimensional animation to the first position in the first video frame to obtain a first synthesized video frame;
the determining module is configured to determine a second local image meeting a similar condition with the reference image in a second video frame when the second video frame is shot, and determine a second position of the second local image in the second video frame;
the acquisition module is configured to acquire position change information of the device between the shooting time of the video frame to which the reference image belongs and the shooting time of the second video frame, and acquire attitude information of the device at the shooting time of the second video frame;
and the second synthesis module is configured to add a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the attitude information, the first position and the second position to obtain a second synthesized video frame.
Optionally, the first local image at the first position is the image enclosed by a rectangle of a preset size centered at the first position.
Optionally, the determining module is configured to:
acquiring characteristic information of the reference image;
acquiring a plurality of local images in the second video frame based on preset acquisition positions, and acquiring characteristic information of the plurality of local images, wherein the size of each of the plurality of local images is the same as the size of the reference image;
and determining, as the second local image, the local image whose characteristic information has the highest similarity to the characteristic information of the reference image among the plurality of local images.
Optionally, the apparatus further includes an update module configured to: and updating the reference image into the second local image.
Optionally, the determining module is configured to:
setting the interval frame number as an initial value, and when a second video frame which is separated from the first video frame by the interval frame number is shot, determining a second local image which meets similar conditions with the reference image in the second video frame;
the apparatus further comprises an adjustment module configured to:
acquiring the calculation duration taken to determine, among the characteristic information of the plurality of local images, the second local image with the highest similarity to the characteristic information of the reference image;
determining a target value corresponding to the duration range in which the calculation duration falls, based on a pre-stored correspondence between duration ranges and interval frame numbers;
and adjusting the interval frame number to the target value.
Optionally, the second synthesis module is configured to:
determining coordinate adjustment information of the target three-dimensional animation based on the position change information and the attitude information;
based on the coordinate adjustment information, adjusting the space coordinates of each image point in the target three-dimensional animation to obtain an adjusted target three-dimensional animation;
acquiring an adjusted second animation frame corresponding to the shooting time of the second video frame in the adjusted target three-dimensional animation;
determining two-dimensional coordinate transformation information of the adjusted second animation frame based on the first position and the second position;
converting the space coordinates of each image point in the adjusted second animation frame into two-dimensional coordinates based on the two-dimensional coordinate conversion information to obtain a two-dimensional animation frame corresponding to the second animation frame;
and adding the two-dimensional animation frame to the second position in the second video frame to obtain a second composite video frame.
In yet another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction is stored, and loaded and executed by the processor to implement the operations performed by the method for image synthesis as described above.
In yet another aspect, a computer-readable storage medium having at least one instruction stored therein is provided, the at least one instruction being loaded and executed by a processor to implement the operations performed by the method for image synthesis as described above.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
A three-dimensional animation is added to captured video frames, a reference image is set at the position where the three-dimensional animation is added, pose transformation information of the device between video frames is acquired, and the display position, display size, and display orientation of the three-dimensional animation are adjusted accordingly, so that the three-dimensional animation appears genuinely embedded in the scene of the video frames, enriching the ways of adding images to a video.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a flowchart of a method for image synthesis provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for image synthesis provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for image synthesis provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for image synthesis provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a method for image synthesis provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a method for image synthesis provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus for image synthesis provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a terminal for image synthesis provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a method for performing image synthesis provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The method for image synthesis in the present application can be implemented by a terminal. The terminal can run an application with a video shooting function, such as a live-streaming application or a short video application; it can be provided with components such as an acceleration sensor, a gyroscope, an orientation sensor, a screen, and a camera; and it has a communication function and can connect to the Internet. The terminal may be a mobile phone, a tablet computer, a smart wearable device, or the like. In the embodiments provided in the present application, the scheme is described in detail taking the terminal being a mobile phone and the corresponding application being a short video application as an example; other cases are similar and are not described again.
According to the image synthesis method, a 3D (three-dimensional) animation can be added to the video picture captured by the terminal's camera and displayed on the terminal's screen, and the display angle and position of the 3D animation can be adjusted according to the pose changes of the phone, so that the animation appears to blend into the real scene of the video picture. The 3D animation may be static or dynamic.
Fig. 1 is a flowchart of an image synthesis method according to an embodiment of the present application. Referring to fig. 1, the embodiment includes:
step 101, receiving an adding instruction of a target three-dimensional animation, and acquiring a first position corresponding to the adding instruction.
In implementation, a user can enable the shooting function in a short video application through the terminal and enter a display interface for video shooting, in which the video picture currently captured by the terminal's camera is displayed. The display interface also provides a three-dimensional-animation adding option. When the user taps this option, a three-dimensional-animation selection list pops up in the display interface, containing display images corresponding to a plurality of three-dimensional animations. The user can browse these display images and select the target three-dimensional animation to add to the video picture by tapping the corresponding display image. When the terminal detects that the user has tapped a display image corresponding to a three-dimensional animation, the adding instruction for the target three-dimensional animation is triggered, and the corresponding three-dimensional-animation resources can be downloaded locally. The first position corresponding to the adding instruction may be a position preset by a technician, for example, the center of the screen. Alternatively, the first position can be selected by the user: after selecting the target three-dimensional animation to be displayed, the user can tap any position on the terminal screen to trigger the adding instruction, and the tapped position is the first position corresponding to the adding instruction. The first position may be expressed as two-dimensional coordinates in an image coordinate system. As shown in fig. 2, the image coordinate system may take the upper-left corner of the currently displayed video picture (i.e., the first video frame) as its origin, the upper boundary of the video picture as the abscissa axis, and the left boundary of the video picture as the ordinate axis.
Step 102, a first local image at a first position in a currently shot first video frame is obtained as a reference image, and a first animation frame of the target three-dimensional animation is added to the first position in the first video frame to obtain a first composite video frame.
In implementation, after the terminal receives the adding instruction for the target three-dimensional animation, it may render the three-dimensional-animation resource corresponding to the target three-dimensional animation, project the target three-dimensional animation at a preset angle to obtain a corresponding two-dimensional image, and display the two-dimensional image at the first position of the first video frame. For example, if the target three-dimensional animation is a rotating 3D ball, the center of the ball may be placed at the first position. The three-dimensional-animation resources may comprise three-dimensional model resources and skeletal animation resources. The three-dimensional model resources may be fbx 3D animation models, and the skeletal animation resources comprise the three-dimensional coordinates of each moving skeleton point of the 3D model in the corresponding animation coordinate system at different timestamps, together with the positional relations between the moving skeleton points and the pixel points of the 3D model. For the target three-dimensional animation, the positions of the moving skeleton points can be determined from their three-dimensional coordinates at different timestamps, and the three-dimensional coordinates of each pixel point of the 3D model at different timestamps in the animation coordinate system can then be determined from the positional relations between the moving skeleton points and the pixel points; that is, each timestamp corresponds to one animation frame. When the target three-dimensional animation is added to the first video frame, the animation frame corresponding to the first timestamp of the target three-dimensional animation can be determined as the first animation frame. The first animation frame is then converted into a two-dimensional image according to a preset two-dimensional coordinate conversion matrix and displayed at the first position of the first video frame.
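Since each timestamp corresponds to one animation frame, selecting the animation frame for a given capture time reduces to a lookup over the timestamp list. A minimal sketch, assuming sorted millisecond timestamps (the function name and clamping behaviour are assumptions for illustration):

```python
import bisect

def animation_frame_index(timestamps, capture_time):
    """Index of the animation frame whose timestamp is the latest one not
    after capture_time; clamps to the first frame for early capture times."""
    i = bisect.bisect_right(timestamps, capture_time) - 1
    return max(i, 0)
```

For example, with frames stamped at 0, 40, and 80 ms, a capture at 50 ms selects the frame stamped 40 ms.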
In order to achieve the purpose that the target three-dimensional animation is truly added to the video picture, namely the relative position of the 3D animation in the video scene is not changed, a reference image can be arranged in the video picture in the process of displaying the target three-dimensional animation, and the display position of the target three-dimensional animation in the video picture and the position of the reference image in the video picture are kept relatively unchanged. Such as a first local image around a first position in a first video frame as a reference image.
The first local image at the first position may be the image enclosed by a rectangle of a preset size centered at the first position. As shown in fig. 3, if the coordinates of the first position A are (100, 100) and the preset rectangle size is 50 × 50, the coordinates of the four vertices of the rectangle are a(75, 75), b(125, 75), c(125, 125), and d(75, 125), and the pixel points of the first local image are those enclosed by this rectangle.
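The worked example above can be reproduced directly: given the center point and the preset rectangle size, the four vertices follow by offsetting half the size in each direction. A sketch (the patent does not prescribe an implementation):

```python
def patch_vertices(center, size):
    """Vertices a, b, c, d of the preset-size rectangle centred at `center`,
    in the order top-left, top-right, bottom-right, bottom-left."""
    cx, cy = center
    half = size // 2
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]
```

With center (100, 100) and size 50 this yields the vertices (75, 75), (125, 75), (125, 125), and (75, 125) from the example.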
And 103, when the second video frame is shot, determining a second local image which meets similar conditions with the reference image in the second video frame, and determining a second position of the second local image in the second video frame.
In implementation, during video shooting, the displayed content of each video frame changes with the position and shooting angle of the device, and the display position of the three-dimensional animation can be determined in each video frame according to the reference image. As shown in fig. 4, the second video frame is a video frame subsequent to the first video frame; when the second video frame is captured, a second local image that satisfies the similar condition with the reference image in the first video frame can be determined in the second video frame. The position corresponding to the second local image is then determined as the display position of the three-dimensional animation in the second video frame (i.e., the second position); for example, the position of the center point of the second local image may be determined as the second position.
The similar condition may be that, among a plurality of local images in the second video frame, the local image with the highest similarity to the characteristic information of the reference image is determined. The corresponding processing is as follows: acquiring characteristic information of the reference image; acquiring a plurality of local images in the second video frame based on preset acquisition positions, and acquiring characteristic information of the plurality of local images, wherein the size of each local image is the same as the size of the reference image; and determining, as the second local image, the local image whose characteristic information has the highest similarity to that of the reference image.
In implementation, after the first position in the first video frame is determined, the coordinate values of the four vertices of the rectangle frame with the preset size and the first video frame as the center may be input into a preset feature information extraction model together with the first video frame, so as to obtain the feature information of the first local image, i.e. the feature information of the reference image. Then, when a second video frame is shot, the preset acquisition positions of the feature information of the plurality of local images and the second video frame can be sequentially input into the feature information extraction model to respectively obtain the feature information of the plurality of local images. Then, the similarity between the feature information of the plurality of local images and the feature information of the reference image is calculated, and the local image with the highest similarity to the feature information of the reference image is used as the second local image.
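The feature information extraction model and the similarity measure are not specified in the text. A minimal sketch using raw grayscale pixels as a stand-in for the model-extracted "characteristic information" and sum of squared differences (SSD) as an inverse similarity; all names are illustrative:

```python
def patch_features(image, x, y, size):
    """Flatten the size x size pixel block whose top-left corner is (x, y);
    this raw-pixel vector stands in for the model-extracted feature vector."""
    return [image[y + dy][x + dx] for dy in range(size) for dx in range(size)]

def best_match(reference_features, frame, candidate_positions, size):
    """Among the preset acquisition positions, return the one whose local
    image differs least from the reference (lowest SSD = highest similarity)."""
    def ssd(feats):
        return sum((a - b) ** 2 for a, b in zip(reference_features, feats))
    return min(candidate_positions,
               key=lambda pos: ssd(patch_features(frame, pos[0], pos[1], size)))
```

In practice the learned extraction model would replace `patch_features`, and the candidate grid would cover the second video frame at the preset acquisition positions.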
In addition, in order to more reliably determine, in each video frame, the local image with the highest similarity to the characteristic information of the reference image, the reference image may be updated: after the second local image is determined, the reference image may be updated to the second local image.
In implementation, since the shooting angle and position may be changed continuously, the display angle and display size of the reference image determined in the first video frame may also be changed in the subsequent video frame, and if the first partial image in the first video frame is determined as the reference image all the time, it may result in that the corresponding partial image cannot be matched in the subsequent video frame, or that an erroneous partial image is matched. The second partial image may be used as a reference image for the next frame after the second partial image is determined. By analogy, each time a local image of which the current video frame and the reference image satisfy the similar condition is determined, the reference image is updated to the corresponding local image.
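The update strategy described above amounts to a per-frame loop in which the matched patch replaces the reference. A sketch where `match_fn(reference, frame)` is an assumed callable returning the matched position and patch:

```python
def track(frames, reference, match_fn):
    """Match the current reference in each processed frame, record the matched
    position, and update the reference to the matched patch so that gradual
    changes in viewpoint accumulate into the reference."""
    positions = []
    for frame in frames:
        position, patch = match_fn(reference, frame)
        positions.append(position)
        reference = patch  # update the reference image to the matched local image
    return positions
```

Keeping the original first local image as a fixed reference would fail once the viewpoint drifts; updating per match keeps consecutive comparisons between visually adjacent frames.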
Due to the limited computing power of the terminal processor, a local image matching the reference image may not be detected for every captured video frame, so several interval frames may exist between the first video frame and the second video frame. The corresponding processing is as follows: the interval frame number is set to an initial value, and when a second video frame separated from the first video frame by the interval frame number is shot, a second local image in the second video frame that satisfies the similarity condition with the reference image is determined. In addition, because the processors of different terminals have different computing capabilities, the interval frame number can be adjusted according to the computing capability of the processor. In practical application, the calculation duration for determining, from the feature information of the plurality of local images, the second local image with the highest similarity to the feature information of the reference image can be acquired; a target value corresponding to the duration range in which the calculation duration falls is determined based on a pre-stored correspondence between duration ranges and interval frame numbers; and the interval frame number is adjusted to the target value.
In implementation, in order to reduce the amount of calculation, an interval frame number may be set, that is, a number of video frames may be skipped between two video frames on which similarity detection against the reference image is performed. The specific interval frame number may be set by the skilled person. The next frame after the frame count following the first video frame reaches the interval frame number is determined as the second video frame, and the second local image that satisfies the similarity condition with the reference image is then determined in the second video frame. The interval frame number may generally be set to 5, that is, local image feature information extraction is performed on the next video frame after every 5 frames, to obtain the local image that satisfies the similarity condition with the reference image. In addition, the interval frame number can be adjusted to account for differences in the computing power of the processors of different terminals. The calculation duration of the terminal can be determined by recording the time when extraction of local image feature values in a video frame is started and the time when the second local image with the highest similarity to the feature information of the reference image is obtained; this calculation duration reflects the computing power of the terminal's processor. The skilled person can pre-store the correspondence between duration ranges and interval frame numbers in the terminal, and the interval frame number matched with the current terminal's computing capability is determined according to the calculation duration.
For example, the interval frame number corresponding to 0.01s to 0.03s may be 3, the interval frame number corresponding to 0.03s to 0.06s may be 5, and the interval frame number corresponding to 0.06s to 0.10s may be 8. When the calculation duration of the terminal is detected to be 0.04s, the interval frame number may be set to 5, so that smooth display of the target three-dimensional animation can be ensured.
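The duration-to-interval lookup can be sketched directly from the example values above (the table contents and the fallback default are illustrative, not mandated by the method):

```python
# Pre-stored correspondence between calculation-duration ranges (seconds)
# and interval frame numbers, using the example values above.
INTERVAL_TABLE = [
    (0.01, 0.03, 3),
    (0.03, 0.06, 5),
    (0.06, 0.10, 8),
]

def interval_frame_number(duration, table=INTERVAL_TABLE, default=5):
    # Pick the interval frame number whose duration range contains the
    # measured calculation duration; fall back to a default otherwise.
    for low, high, frames in table:
        if low <= duration < high:
            return frames
    return default
```

A measured duration of 0.04s falls in the second range and yields an interval of 5 frames, matching the example.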
And 104, acquiring position change information of the equipment between the shooting time of the video frame to which the reference image belongs and the shooting time of the second video frame, and acquiring attitude information of the equipment at the shooting time of the second video frame.
In implementation, in order to ensure that the target three-dimensional animation achieves the effect of being integrated with the scene in the video, the size, angle, position and the like of the target three-dimensional animation can be adjusted by acquiring the position change information and posture information of the camera. The position change information can be acquired through the terminal system and is the difference between the spatial position data of the terminal at the shooting time of the video frame to which the reference image belongs and that at the shooting time of the second video frame. The posture information can be the rotation angle of the terminal in space when shooting the video, and can be represented by the shooting direction of the terminal's camera and the direction pointing straight out of the top of the terminal; if the terminal is a mobile phone, the direction pointing out of the top is parallel to the long edge of the phone's screen.
And 105, adding a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the posture information, the first position and the second position to obtain a second composite video frame.
In implementation, the display size and display angle of the three-dimensional animation in the current video frame can be adjusted according to the position change information of the terminal between the shooting times of the video frames. The display position of the target three-dimensional animation is adjusted from the first position in the first video frame to the second position in the second video frame through the difference between the first position and the second position in the image coordinate system, so that the target three-dimensional animation is adjusted along with changes in the captured video picture, achieving the effect of being integrated with the video scene.
Optionally, the processing of using the position change information, the posture information, the first position and the second position to adjust the display of the target three-dimensional animation in the video may be as follows: determining coordinate adjustment information of the target three-dimensional animation based on the position change information and the posture information; adjusting the spatial coordinates of each image point in the target three-dimensional animation based on the coordinate adjustment information to obtain the adjusted target three-dimensional animation; acquiring the adjusted second animation frame corresponding to the shooting time of the second video frame in the adjusted target three-dimensional animation; determining two-dimensional coordinate conversion information of the adjusted second animation frame based on the first position and the second position; converting the spatial coordinates of each image point in the adjusted second animation frame into two-dimensional coordinates based on the two-dimensional coordinate conversion information to obtain a two-dimensional animation frame corresponding to the second animation frame; and adding the two-dimensional animation frame at the second position in the second video frame to obtain the second composite video frame.
In implementation, the terminal may include a gyroscope, an acceleration sensor, a direction sensor, and the like, and may be capable of acquiring position information and orientation information of the terminal. The short video application program can acquire the position information of the first video frame shooting moment and the second video frame shooting moment in the process of shooting the video, and determine the position change information of the terminal according to the difference value of the position information of the first video frame shooting moment and the position information of the second video frame shooting moment. And acquiring attitude information of the terminal at the shooting moment of the second video frame, wherein the attitude information can be information such as the current shooting direction of a camera of the terminal. And then, coordinate adjustment information of the target three-dimensional animation is formed through the position change information and the posture information, and coordinate values of all animation frames in the target three-dimensional animation under an animation coordinate system are adjusted. And determining a second animation frame corresponding to the shooting moment of the second video frame in the target three-dimensional animation according to the time difference between the shooting moment of the second video frame and the shooting moment of the first video frame, wherein the second animation frame is adjusted according to the coordinate adjustment information. 
Then, two-dimensional coordinate conversion information is determined according to the coordinate values of the acquired first position and second position in the image coordinate system. The adjusted second animation frame is converted into a two-dimensional image displayed in the second video frame, that is, the spatial coordinates of each image point in the adjusted second animation frame are converted into two-dimensional coordinates to obtain the two-dimensional animation frame corresponding to the second animation frame. The two-dimensional animation frame is then added at the second position in the second video frame to obtain the second composite video frame.
The position change information is a position change vector; the posture information consists of a first direction vector that is the same as the shooting direction of the device at the shooting time of the second video frame and a second direction vector that is parallel to the long edge of the picture captured by the device; and the coordinate adjustment information is a coordinate adjustment matrix composed of the position change vector, the first direction vector and the second direction vector.
The position change information may be represented by a position change vector, for example, if the position information at the shooting time of the first video frame is (1, 1, 2), and the position information at the shooting time of the second video frame is (3, 4, 5), the position change information may be (2, 3, 3). As shown in fig. 9, the attitude information may be represented by two direction vectors, a first direction vector that is the same as the shooting direction of the camera of the terminal at the shooting time of the second video frame, and a second direction vector that is parallel to the long side of the terminal shooting screen. The coordinate adjustment information may be a coordinate adjustment matrix composed of the position change vector, the first direction vector, and the second direction vector. For example, if the position change information is (2, 3, 3), the first direction vector is (0, 0, -1), and the second direction vector is (0, 1, 0), the coordinate adjustment information is [ (2, 3, 3), (0, 0, -1), (0, 1, 0) ].
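The computation in the example above can be sketched directly: the position change vector is the componentwise difference of the two spatial positions, and the coordinate adjustment matrix is simply the three vectors stacked as rows (a literal reading of the bracketed example; the actual matrix layout in an implementation may differ).

```python
import numpy as np

def position_change(p_first, p_second):
    # Difference between the terminal's spatial positions at the two
    # shooting times, e.g. (1, 1, 2) -> (3, 4, 5) gives (2, 3, 3).
    return tuple(b - a for a, b in zip(p_first, p_second))

def coordinate_adjustment_matrix(pos_change, first_dir, second_dir):
    # Stack the position-change vector and the two posture direction
    # vectors into a 3x3 coordinate adjustment matrix, as in the example.
    return np.array([pos_change, first_dir, second_dir], dtype=float)
```

With the example values, the rows of the resulting matrix are (2, 3, 3), (0, 0, -1) and (0, 1, 0).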
The two-dimensional coordinate transformation information is a two-dimensional coordinate transformation matrix, and the processing for determining the two-dimensional coordinate transformation information of the adjusted second animation frame based on the first position and the second position may be as follows: acquiring an initial two-dimensional coordinate transformation matrix for transforming an animation frame of a pre-stored three-dimensional animation into a two-dimensional image; determining an adjustment matrix based on the displacement value of the second position relative to the first position; and determining the product of the initial two-dimensional coordinate transformation matrix and the adjustment matrix as the two-dimensional coordinate transformation matrix of the second animation frame of the target three-dimensional animation.
The initial two-dimensional coordinate transformation matrix is a projection matrix. Through it, the three-dimensional coordinate value of each pixel point of an animation frame of the three-dimensional animation in the animation coordinate system can be transformed, according to a preset projection direction, into a two-dimensional coordinate value in the image coordinate system, thereby obtaining the two-dimensional image corresponding to the three-dimensional animation. The adjustment matrix includes the offset value of the second position relative to the first position in the image coordinate system, which can be obtained as the vector difference of the coordinate values of the second position and the first position in the image coordinate system. For example, if the coordinate value of the first position is (100, 100) and the coordinate value of the second position is (220, 225), the offset value is (120, 125). The adjustment matrix may further include preset scaling and rotation values for the three-dimensional animation, which scale and rotate the adjusted three-dimensional animation again; generally the scaling value is (1, 1, 1) and the rotation value is (0, 0, 0), that is, the adjusted three-dimensional animation is not rotated or scaled again. In addition, since the offset value is obtained from two-dimensional coordinate values and includes only the vector difference along the horizontal and vertical axes, an offset vector may be formed by appending a preset value for the third coordinate axis to the offset value; this preset value is generally set to 0, so the offset vector becomes (120, 125, 0).
Then, the adjustment matrix [ (120, 125, 0), (1, 1, 1), (0, 0, 0) ] is formed from the offset vector and the preset scaling and rotation values. Finally, the initial two-dimensional coordinate transformation matrix is multiplied by the adjustment matrix to obtain the two-dimensional coordinate conversion matrix for converting the second animation frame of the target three-dimensional animation into a two-dimensional image.
After the two-dimensional coordinate conversion matrix for converting the second animation frame of the target three-dimensional animation into a two-dimensional image is obtained, the matrix is multiplied by the adjusted spatial coordinate value of each image point in the second animation frame to obtain the projection of the second animation frame in the image coordinate system, namely the two-dimensional animation frame displayed at the second position of the second video frame. For example, the three-dimensional animation added by the user in the video frame may be a cylinder, and the user may move the mobile phone during shooting as shown in fig. 5; the corresponding display effect of the cylinder in the video frame can be as shown in fig. 6. After the second composite video frame is obtained, when a third video frame is shot, a third local image in the third video frame that satisfies the similarity condition with the reference image can be determined, at which point the reference image can be updated to the second local image. Then the third position of the third local image in the third video frame is determined; the position change information and posture information of the device between the shooting time of the video frame to which the reference image belongs and the shooting time of the third video frame are acquired; a third animation frame of the target three-dimensional animation is obtained according to the acquired position change information, the posture information, the second position and the third position; and the third animation frame is added to the third video frame to obtain a third composite video frame. By analogy, when the fourth video frame, the fifth video frame and so on are shot, the target three-dimensional animation is adjusted in the same manner, so that the effect of realistically adding the three-dimensional animation to the scene in the video frames is achieved.
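The final projection step can be illustrated with a deliberately simplified stand-in for the matrix multiplication described above: an orthographic projection (dropping the z axis) followed by a translation to the second position. A real implementation would apply the full two-dimensional coordinate conversion matrix instead.

```python
import numpy as np

def project_to_frame(points_3d, second_pos):
    # Project each 3D image point of the animation frame to 2D (here a
    # simple orthographic drop of the z coordinate, as an illustration)
    # and translate it to the second position in the video frame.
    pts = np.asarray(points_3d, dtype=float)
    return pts[:, :2] + np.asarray(second_pos, dtype=float)
```

Each row of the result is a two-dimensional coordinate in the image coordinate system, i.e. a point of the two-dimensional animation frame added at the second position.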
According to the method and the device, the three-dimensional animation is added into the shot video frame, the reference image of the three-dimensional animation adding position is set, the pose transformation information of the equipment between the video frames is obtained, and the display position, the display size and the display direction of the three-dimensional animation are adjusted, so that the three-dimensional animation is truly added into the scene in the video frame, and the mode of adding the image in the video is enriched.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
Fig. 7 is a schematic structural diagram of an apparatus for image synthesis according to an embodiment of the present application, where the apparatus may be a terminal in the foregoing embodiment, and as shown in fig. 7, the apparatus includes:
the receiving module 710 is configured to receive an adding instruction of a target three-dimensional animation, and obtain a first position corresponding to the adding instruction;
a first synthesizing module 720, configured to obtain a first local image at the first position in a currently-captured first video frame as a reference image, and add a first animation frame of the target three-dimensional animation to the first position in the first video frame to obtain a first synthesized video frame;
a determining module 730, configured to, when a second video frame is captured, determine a second local image in the second video frame, which satisfies a similar condition with the reference image, and determine a second position of the second local image in the second video frame;
an obtaining module 740, configured to obtain position change information of the device between the shooting time of the video frame to which the reference image belongs and the shooting time of the second video frame, and obtain posture information of the device at the shooting time of the second video frame;
a second composition module 750 configured to add a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the pose information, the first position, and the second position, resulting in a second composite video frame.
Optionally, the first local image at the first position is an image surrounded by a rectangle frame with a preset size and centered at the first position.
Optionally, the determining module 730 is configured to:
acquiring characteristic information of the reference image;
acquiring a plurality of local images in the second video frame based on preset acquisition positions of the characteristic information of the plurality of local images, and acquiring the characteristic information of the plurality of local images, wherein the sizes of the plurality of local images are the same as the size of the reference image;
and determining a second local image with the highest similarity with the characteristic information of the reference image in the characteristic information of the plurality of local images.
Optionally, the apparatus further includes an update module configured to: and updating the reference image into the second local image.
Optionally, the determining module 730 is configured to:
setting the interval frame number as an initial value, and when a second video frame which is separated from the first video frame by the interval frame number is shot, determining a second local image which meets similar conditions with the reference image in the second video frame;
the apparatus further comprises an adjustment module configured to:
acquiring the calculation time length of a second partial image with the highest similarity to the feature information of the first partial image in the feature information of the plurality of second partial images;
determining a target value corresponding to the time length range in which the calculated time length is located based on the corresponding relation of the pre-stored time length range and the number of interval frames;
and adjusting the interval frame number to be the target value.
Optionally, the second synthesis module 750 is configured to:
determining coordinate adjustment information of the target three-dimensional animation based on the position change information and the posture information;
based on the coordinate adjustment information, adjusting the space coordinates of each image point in the target three-dimensional animation to obtain an adjusted target three-dimensional animation;
acquiring an adjusted second animation frame corresponding to the shooting time of the second video frame in the adjusted target three-dimensional animation;
determining two-dimensional coordinate transformation information of the adjusted second animation frame based on the first position and the second position;
converting the space coordinates of each image point in the adjusted second animation frame into two-dimensional coordinates based on the two-dimensional coordinate conversion information to obtain a two-dimensional animation frame corresponding to the second animation frame;
and adding the two-dimensional animation frame to the second position in the second video frame to obtain a second composite video frame.
It should be noted that: in the image synthesis apparatus provided in the above embodiment, only the division of the functional modules is illustrated when image synthesis is performed, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the above described functions. In addition, the apparatus for performing image synthesis and the method for performing image synthesis provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 8 shows a block diagram of a terminal 800 according to an exemplary embodiment of the present application. The terminal 800 may be: smart phones, tablet computers, and the like. The terminal 800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
In general, the terminal 800 includes: a processor 801 and a memory 802.
The processor 801 may include a main processor and a coprocessor. The main processor is a processor for processing data in the wake-up state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 802 is used to store at least one instruction for execution by processor 801 to implement the method for image synthesis provided by the method embodiments herein.
In some embodiments, the terminal 800 may further include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a touch screen display 805, a camera 806, an audio circuit 807, a positioning component 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 804 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 804 converts an electrical signal into an electromagnetic signal to be transmitted, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 805 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 805 is a touch display screen, it also has the ability to capture touch signals on or over its surface. The touch signals may be input to the processor 801 as control signals for processing, and the display screen 805 may then also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard.
The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 801 for processing or inputting the electric signals to the radio frequency circuit 804 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 800. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic location of the terminal 800 to implement navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou System of China, the GLONASS System of Russia, or the Galileo System of the European Union.
Power supply 809 is used to provide power to various components in terminal 800. The power supply 809 can be ac, dc, disposable or rechargeable. When the power source 809 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the touch screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 812 may detect a body direction and a rotation angle of the terminal 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the terminal 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 813 may be disposed on the side bezel of terminal 800 and/or underneath touch display 805. When the pressure sensor 813 is disposed on the side frame of the terminal 800, the holding signal of the user to the terminal 800 can be detected, and the processor 801 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at a lower layer of the touch display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 805. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 814 is used to collect a user's fingerprint, and the processor 801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user's identity according to the collected fingerprint. When the identity is identified as trusted, the processor 801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 814 may be arranged on the front, back, or side of the terminal 800. When a physical button or a manufacturer Logo is arranged on the terminal 800, the fingerprint sensor 814 may be integrated with the physical button or the manufacturer Logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch display 805 based on the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness of the touch display 805 is increased; when the ambient light intensity is low, the display brightness of the touch display 805 is decreased. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
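The ambient-light-driven brightness control described above amounts to a mapping from a measured light level to a display brightness. The lux thresholds and the linear interpolation in this sketch are illustrative assumptions, not values from the patent:

```python
def adjust_brightness(ambient_lux, min_lux=10.0, max_lux=1000.0):
    """Map an ambient light reading (lux) to a display brightness in [0.1, 1].

    Readings at or below min_lux map to the dimmest setting, readings at or
    above max_lux map to full brightness, and values in between are
    interpolated linearly. All thresholds are illustrative.
    """
    if ambient_lux <= min_lux:
        return 0.1          # floor so the screen never goes fully dark
    if ambient_lux >= max_lux:
        return 1.0
    span = (ambient_lux - min_lux) / (max_lux - min_lux)
    return 0.1 + 0.9 * span
```

A real implementation would typically also smooth successive readings so the backlight does not flicker when the sensor value hovers near a threshold.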
A proximity sensor 816, also known as a distance sensor, is typically provided on the front panel of the terminal 800 and is used to collect the distance between the user and the front surface of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 is gradually decreasing, the processor 801 controls the touch display 805 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 816 detects that the distance is gradually increasing, the processor 801 controls the touch display 805 to switch from the dark-screen state to the bright-screen state.
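The proximity-driven screen switching can be sketched as a small decision function over successive distance readings. The distance threshold is an illustrative assumption:

```python
def screen_state(prev_distance, curr_distance, threshold=5.0):
    """Decide screen state from two successive proximity readings (cm).

    Moving closer than the threshold darkens the screen (e.g. the phone is
    raised to the ear); moving away past it lights the screen again. The
    5 cm threshold is an illustrative value.
    """
    if curr_distance < prev_distance and curr_distance < threshold:
        return "dark"
    if curr_distance > prev_distance and curr_distance >= threshold:
        return "bright"
    return "unchanged"
```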
Those skilled in the art will appreciate that the configuration shown in fig. 8 is not intended to be limiting of terminal 800 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in a terminal to perform the method of image synthesis in the above embodiments is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method of performing image synthesis, comprising:
receiving an adding instruction of a target three-dimensional animation, and acquiring a first position corresponding to the adding instruction;
acquiring a first local image at the first position in a currently shot first video frame as a reference image, and adding a first animation frame of the target three-dimensional animation to the first position in the first video frame to obtain a first composite video frame;
when a second video frame is shot, determining a second local image in the second video frame that satisfies a similarity condition with the reference image, and determining a second position of the second local image in the second video frame;
acquiring position change information of the equipment between the shooting time of the video frame to which the reference image belongs and the shooting time of the second video frame, and acquiring attitude information of the equipment at the shooting time of the second video frame;
and adding a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the posture information, the first position and the second position to obtain a second composite video frame.
2. The method of claim 1, wherein the first local image at the first location is an image surrounded by a rectangle of a predetermined size centered on the first location.
3. The method of claim 1, wherein the determining a second local image in the second video frame that satisfies a similarity condition with the reference image comprises:
acquiring characteristic information of the reference image;
acquiring a plurality of local images at preset acquisition positions in the second video frame, and acquiring feature information of the plurality of local images, wherein the size of each of the plurality of local images is the same as the size of the reference image;
and determining a second local image with the highest similarity with the characteristic information of the reference image in the characteristic information of the plurality of local images.
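As a rough illustration of the matching step in claim 3, the sketch below compares a reference patch against candidate patches taken at preset positions and keeps the most similar one. Using raw pixels with normalized cross-correlation as the "feature information", and the candidate-position list itself, are assumptions for illustration; the patent does not specify the feature:

```python
import numpy as np

def best_match(frame, reference, positions):
    """Return the candidate position whose patch best matches the reference.

    frame: 2-D grayscale array; reference: 2-D patch; positions: iterable of
    (row, col) top-left corners to test. Similarity is the normalized
    cross-correlation of the flattened patches (an illustrative feature).
    """
    ref = reference.astype(float).ravel()
    ref = (ref - ref.mean()) / (ref.std() + 1e-9)
    h, w = reference.shape
    best_pos, best_score = None, -np.inf
    for r, c in positions:
        patch = frame[r:r + h, c:c + w].astype(float).ravel()
        if patch.size != ref.size:           # candidate falls off the frame
            continue
        patch = (patch - patch.mean()) / (patch.std() + 1e-9)
        score = float(np.dot(ref, patch)) / ref.size  # 1.0 = identical patch
        if score > best_score:
            best_pos, best_score = (r, c), score
    return best_pos, best_score
```

The returned position plays the role of the "second position" in claim 1; updating `reference` to the winning patch corresponds to claim 4.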
4. The method of claim 3, wherein after determining a second local image in the second video frame that satisfies a similarity condition with the reference image, the method further comprises:
and updating the reference image into the second local image.
5. The method of claim 3, wherein the determining, when a second video frame is captured, a second local image of the second video frame that satisfies a similarity condition with the reference image comprises:
setting the interval frame number to an initial value, and when a second video frame separated from the first video frame by the interval frame number is shot, determining a second local image in the second video frame that satisfies a similarity condition with the reference image;
after the determining of a second local image in the second video frame that satisfies a similarity condition with the reference image, the method further includes:
acquiring the computation duration of determining, among the feature information of the plurality of local images, the second local image with the highest similarity to the feature information of the reference image;
determining, based on a pre-stored correspondence between duration ranges and interval frame numbers, a target value corresponding to the duration range in which the computation duration falls;
and adjusting the interval frame number to the target value.
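The adaptive interval of claim 5 amounts to a lookup from how long the matching computation took to how many frames to skip before matching again. The duration ranges and target values below are invented for illustration; the patent only requires that such a correspondence be pre-stored:

```python
# Pre-stored correspondence between computation-duration ranges (ms) and
# interval frame numbers. Ranges and target values are illustrative.
DURATION_TO_INTERVAL = [
    (0.0, 10.0, 1),           # fast matching: re-match almost every frame
    (10.0, 30.0, 3),
    (30.0, float("inf"), 6),  # slow matching: skip more frames
]

def adjust_interval(computation_ms):
    """Return the interval frame number for a measured matching duration."""
    for low, high, interval in DURATION_TO_INTERVAL:
        if low <= computation_ms < high:
            return interval
    raise ValueError("duration not covered by any range")
```

The effect is a simple feedback loop: when matching is expensive, the device matches less often, trading tracking freshness for frame rate.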
6. The method of claim 1, wherein adding a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the pose information, the first position, and the second position to obtain a second composite video frame comprises:
determining coordinate adjustment information of the target three-dimensional animation based on the position change information and the posture information;
based on the coordinate adjustment information, adjusting the space coordinates of each image point in the target three-dimensional animation to obtain an adjusted target three-dimensional animation;
acquiring an adjusted second animation frame corresponding to the shooting time of the second video frame in the adjusted target three-dimensional animation;
determining two-dimensional coordinate transformation information of the adjusted second animation frame based on the first position and the second position;
converting the spatial coordinates of each image point in the adjusted second animation frame into two-dimensional coordinates based on the two-dimensional coordinate transformation information, to obtain a two-dimensional animation frame corresponding to the second animation frame;
and adding the two-dimensional animation frame to the second position in the second video frame to obtain a second composite video frame.
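The final step of claim 6, converting the adjusted animation frame's spatial coordinates to two-dimensional frame coordinates anchored at the second position, resembles a standard pinhole projection followed by a translation. The focal length and camera-space convention below are placeholder assumptions, not parameters given in the patent:

```python
import numpy as np

def project_points(points_3d, focal, center_2d):
    """Project 3-D animation points to 2-D frame coordinates.

    points_3d: (N, 3) array in camera space with z > 0; focal: assumed focal
    length in pixels; center_2d: (x, y) position in the frame where the
    animation is anchored (the 'second position' of the claim).
    """
    pts = np.asarray(points_3d, dtype=float)
    x = focal * pts[:, 0] / pts[:, 2] + center_2d[0]  # perspective divide by depth
    y = focal * pts[:, 1] / pts[:, 2] + center_2d[1]
    return np.stack([x, y], axis=1)
```

A point on the optical axis lands exactly at the anchor position, and points farther from the camera (larger z) are drawn closer to it, giving the perspective shrinkage the claim's 3D-to-2D conversion implies.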
7. An apparatus for performing image synthesis, comprising:
the receiving module is configured to receive an adding instruction of a target three-dimensional animation and acquire a first position corresponding to the adding instruction;
a first synthesizing module, configured to obtain a first local image at the first position in a currently-shot first video frame as a reference image, and add a first animation frame of the target three-dimensional animation to the first position in the first video frame to obtain a first synthesized video frame;
the determining module is configured to determine, when a second video frame is shot, a second local image in the second video frame that satisfies a similarity condition with the reference image, and determine a second position of the second local image in the second video frame;
the acquisition module is configured to acquire position change information of the device between the shooting time of the video frame to which the reference image belongs and the shooting time of the second video frame, and acquire attitude information of the device at the shooting time of the second video frame;
and the second synthesis module is configured to add a current second animation frame of the target three-dimensional animation to the second video frame based on the position change information, the posture information, the first position and the second position to obtain a second synthesized video frame.
8. The apparatus of claim 7, wherein the first local image at the first location is an image surrounded by a rectangle of a predetermined size centered on the first location.
9. The apparatus of claim 7, wherein the determination module is configured to:
acquiring characteristic information of the reference image;
acquiring a plurality of local images at preset acquisition positions in the second video frame, and acquiring feature information of the plurality of local images, wherein the size of each of the plurality of local images is the same as the size of the reference image;
and determining a second local image with the highest similarity with the characteristic information of the reference image in the characteristic information of the plurality of local images.
10. The apparatus of claim 9, further comprising an update module configured to: and updating the reference image into the second local image.
11. The apparatus of claim 9, wherein the determination module is configured to:
setting the interval frame number to an initial value, and when a second video frame separated from the first video frame by the interval frame number is shot, determining a second local image in the second video frame that satisfies a similarity condition with the reference image;
the apparatus further comprises an adjustment module configured to:
acquiring the computation duration of determining, among the feature information of the plurality of local images, the second local image with the highest similarity to the feature information of the reference image;
determining, based on a pre-stored correspondence between duration ranges and interval frame numbers, a target value corresponding to the duration range in which the computation duration falls;
and adjusting the interval frame number to the target value.
12. The apparatus of claim 7, wherein the second synthesis module is configured to:
determining coordinate adjustment information of the target three-dimensional animation based on the position change information and the posture information;
based on the coordinate adjustment information, adjusting the space coordinates of each image point in the target three-dimensional animation to obtain an adjusted target three-dimensional animation;
acquiring an adjusted second animation frame corresponding to the shooting time of the second video frame in the adjusted target three-dimensional animation;
determining two-dimensional coordinate transformation information of the adjusted second animation frame based on the first position and the second position;
converting the spatial coordinates of each image point in the adjusted second animation frame into two-dimensional coordinates based on the two-dimensional coordinate transformation information, to obtain a two-dimensional animation frame corresponding to the second animation frame;
and adding the two-dimensional animation frame to the second position in the second video frame to obtain a second composite video frame.
13. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the method for image synthesis according to any one of claims 1 to 6.
14. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by the method for image synthesis according to any one of claims 1 to 6.
CN202010384399.3A 2020-05-07 2020-05-07 Method, device, equipment and storage medium for image synthesis Active CN111464749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010384399.3A CN111464749B (en) 2020-05-07 2020-05-07 Method, device, equipment and storage medium for image synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010384399.3A CN111464749B (en) 2020-05-07 2020-05-07 Method, device, equipment and storage medium for image synthesis

Publications (2)

Publication Number Publication Date
CN111464749A true CN111464749A (en) 2020-07-28
CN111464749B CN111464749B (en) 2021-05-25

Family

ID=71680676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010384399.3A Active CN111464749B (en) 2020-05-07 2020-05-07 Method, device, equipment and storage medium for image synthesis

Country Status (1)

Country Link
CN (1) CN111464749B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768454A (en) * 2020-08-05 2020-10-13 腾讯科技(深圳)有限公司 Pose determination method, device, equipment and storage medium
CN112817768A (en) * 2021-02-26 2021-05-18 北京梧桐车联科技有限责任公司 Animation processing method, device, equipment and computer readable storage medium
CN112927176A (en) * 2021-01-29 2021-06-08 北京百度网讯科技有限公司 Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium
CN112995533A (en) * 2021-02-04 2021-06-18 上海哔哩哔哩科技有限公司 Video production method and device
CN113038010A (en) * 2021-03-12 2021-06-25 Oppo广东移动通信有限公司 Video processing method, video processing device, storage medium and electronic equipment
CN113766119A (en) * 2021-05-11 2021-12-07 腾讯科技(深圳)有限公司 Virtual image display method, device, terminal and storage medium
CN114470750A (en) * 2021-07-06 2022-05-13 荣耀终端有限公司 Display method of image frame stream, electronic device and storage medium
CN115086637A (en) * 2022-05-31 2022-09-20 立体视医学技术(深圳)有限公司 3D video production method and device, electronic equipment and storage medium
WO2023151524A1 (en) * 2022-02-11 2023-08-17 北京字跳网络技术有限公司 Image display method and apparatus, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120147148A1 (en) * 2010-12-09 2012-06-14 Panasonic Corporation Imaging device
WO2013129187A1 (en) * 2012-02-29 2013-09-06 株式会社Jvcケンウッド Image processing device, image processing method, and image processing program
CN107295327A (en) * 2016-04-05 2017-10-24 富泰华工业(深圳)有限公司 Light-field camera and its control method
CN107466474A (en) * 2015-05-26 2017-12-12 谷歌公司 Captured for the omnibearing stereo of mobile device
CN108604366A (en) * 2016-01-06 2018-09-28 德克萨斯仪器股份有限公司 Use the three-dimensional rendering around view of predetermined viewpoint look-up table
CN110998672A (en) * 2017-05-24 2020-04-10 古野电气株式会社 Video generation device and video generation method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120147148A1 (en) * 2010-12-09 2012-06-14 Panasonic Corporation Imaging device
WO2013129187A1 (en) * 2012-02-29 2013-09-06 株式会社Jvcケンウッド Image processing device, image processing method, and image processing program
CN107466474A (en) * 2015-05-26 2017-12-12 谷歌公司 Captured for the omnibearing stereo of mobile device
CN108604366A (en) * 2016-01-06 2018-09-28 德克萨斯仪器股份有限公司 Use the three-dimensional rendering around view of predetermined viewpoint look-up table
CN107295327A (en) * 2016-04-05 2017-10-24 富泰华工业(深圳)有限公司 Light-field camera and its control method
CN110998672A (en) * 2017-05-24 2020-04-10 古野电气株式会社 Video generation device and video generation method

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768454A (en) * 2020-08-05 2020-10-13 腾讯科技(深圳)有限公司 Pose determination method, device, equipment and storage medium
WO2022028129A1 (en) * 2020-08-05 2022-02-10 腾讯科技(深圳)有限公司 Pose determination method and apparatus, and electronic device and storage medium
CN111768454B (en) * 2020-08-05 2023-12-22 腾讯科技(深圳)有限公司 Pose determination method, pose determination device, pose determination equipment and storage medium
CN112927176A (en) * 2021-01-29 2021-06-08 北京百度网讯科技有限公司 Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium
CN112927176B (en) * 2021-01-29 2024-03-12 北京百度网讯科技有限公司 Image synthesis method, image synthesis device, electronic equipment and computer readable storage medium
CN112995533A (en) * 2021-02-04 2021-06-18 上海哔哩哔哩科技有限公司 Video production method and device
CN112817768A (en) * 2021-02-26 2021-05-18 北京梧桐车联科技有限责任公司 Animation processing method, device, equipment and computer readable storage medium
CN112817768B (en) * 2021-02-26 2024-05-03 北京梧桐车联科技有限责任公司 Animation processing method, device, equipment and computer readable storage medium
CN113038010A (en) * 2021-03-12 2021-06-25 Oppo广东移动通信有限公司 Video processing method, video processing device, storage medium and electronic equipment
CN113038010B (en) * 2021-03-12 2022-11-29 Oppo广东移动通信有限公司 Video processing method, video processing device, storage medium and electronic equipment
CN113766119B (en) * 2021-05-11 2023-12-05 腾讯科技(深圳)有限公司 Virtual image display method, device, terminal and storage medium
CN113766119A (en) * 2021-05-11 2021-12-07 腾讯科技(深圳)有限公司 Virtual image display method, device, terminal and storage medium
CN114470750B (en) * 2021-07-06 2022-12-30 荣耀终端有限公司 Display method of image frame stream, electronic device and storage medium
CN114470750A (en) * 2021-07-06 2022-05-13 荣耀终端有限公司 Display method of image frame stream, electronic device and storage medium
WO2023151524A1 (en) * 2022-02-11 2023-08-17 北京字跳网络技术有限公司 Image display method and apparatus, electronic device, and storage medium
CN115086637A (en) * 2022-05-31 2022-09-20 立体视医学技术(深圳)有限公司 3D video production method and device, electronic equipment and storage medium
CN115086637B (en) * 2022-05-31 2024-03-19 立体视医学技术(深圳)有限公司 3D video production method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111464749B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN111464749B (en) Method, device, equipment and storage medium for image synthesis
CN110502954B (en) Video analysis method and device
CN110992493B (en) Image processing method, device, electronic equipment and storage medium
CN110841285B (en) Interface element display method and device, computer equipment and storage medium
CN109815150B (en) Application testing method and device, electronic equipment and storage medium
CN110555882A (en) Interface display method, device and storage medium
CN108965922B (en) Video cover generation method and device and storage medium
CN110599593B (en) Data synthesis method, device, equipment and storage medium
CN109947338B (en) Image switching display method and device, electronic equipment and storage medium
CN109886208B (en) Object detection method and device, computer equipment and storage medium
CN113763228B (en) Image processing method, device, electronic equipment and storage medium
CN111897429A (en) Image display method, image display device, computer equipment and storage medium
CN112565806B (en) Virtual gift giving method, device, computer equipment and medium
CN110839174A (en) Image processing method and device, computer equipment and storage medium
CN111447389A (en) Video generation method, device, terminal and storage medium
CN113384880A (en) Virtual scene display method and device, computer equipment and storage medium
WO2022199102A1 (en) Image processing method and device
CN113032590B (en) Special effect display method, device, computer equipment and computer readable storage medium
CN112396076A (en) License plate image generation method and device and computer storage medium
CN112381729B (en) Image processing method, device, terminal and storage medium
CN112257594A (en) Multimedia data display method and device, computer equipment and storage medium
CN110263695B (en) Face position acquisition method and device, electronic equipment and storage medium
CN113592874B (en) Image display method, device and computer equipment
CN111988664B (en) Video processing method, video processing device, computer equipment and computer-readable storage medium
CN111369434B (en) Method, device, equipment and storage medium for generating spliced video covers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210823

Address after: 510000 self compiled 1-18-4, No. 315, middle Huangpu Avenue, Tianhe District, Guangzhou City, Guangdong Province (office only)

Patentee after: Guangzhou shiyinlian Software Technology Co.,Ltd.

Address before: No. 315, Huangpu Avenue middle, Tianhe District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU KUGOU COMPUTER TECHNOLOGY Co.,Ltd.