CN111652831A - Object fusion method and device, computer-readable storage medium and electronic equipment

Info

Publication number
CN111652831A
CN111652831A
Authority
CN
China
Prior art keywords
video frame
straight line
target video
camera
determining
Prior art date
Legal status
Granted
Application number
CN202010601645.6A
Other languages
Chinese (zh)
Other versions
CN111652831B (en)
Inventor
余自强
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010601645.6A
Publication of CN111652831A
Application granted
Publication of CN111652831B
Current legal status: Active

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 - Pattern recognition
            • G06F 18/20 - Analysing
              • G06F 18/25 - Fusion techniques
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/045 - Combinations of networks
              • G06N 3/08 - Learning methods
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00 - Image enhancement or restoration
            • G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
          • G06T 7/00 - Image analysis
            • G06T 7/70 - Determining position or orientation of objects or cameras
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 - Image acquisition modality
              • G06T 2207/10016 - Video; Image sequence
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20212 - Image combination
                • G06T 2207/20221 - Image fusion; Image merging
        • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00 - Scenes; Scene-specific elements
            • G06V 20/40 - Scenes; Scene-specific elements in video content
              • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The application provides an object fusion method, an object fusion apparatus, a computer-readable storage medium and an electronic device, and relates to the technical field of computer vision. The method includes: determining structural straight lines of a target video frame in a video file; determining vanishing point straight lines of the target video frame from the structural straight lines according to a straight line selection operation; determining a camera pose according to the vanishing point straight lines of the target video frame; and fusing an object to be fused into the target video frame according to the camera pose. With this technical solution, the degree of fusion between the object to be fused and the video file can be improved, thereby improving the fusion effect.

Description

Object fusion method and device, computer-readable storage medium and electronic equipment
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an object fusion method, an object fusion apparatus, a computer-readable storage medium, and an electronic device.
Background
In general, there is sometimes a need to add additional elements to a video file; for example, advertisers need to add advertising elements to a television show. Existing approaches for meeting this need are generally the following: the additional elements are integrated into the video scene at an early stage of video production, so that the finished video file already contains the required additional elements; alternatively, a mapping of the additional elements is added directly to the video file. However, these approaches generally suffer from a low degree of fusion between the additional elements and the video file, which easily leads to a poor fusion effect.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
An object fusion method, an object fusion device, a computer-readable storage medium, and an electronic device are provided, which can improve the fusion degree between an object to be fused and a video file, and further improve the fusion effect.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to a first aspect of the present application, there is provided an object fusion method, comprising:
determining a structural straight line of a target video frame in a video file;
determining a vanishing point straight line of the target video frame from the structural straight lines according to straight line selection operation;
determining the camera attitude according to the vanishing point straight line of the target video frame;
and fusing the object to be fused into the target video frame according to the camera posture.
In an exemplary embodiment of the present application, before determining a structural straight line of a target video frame in a video file, the method further includes:
and determining a target video frame from the video file according to the detected video frame selection operation.
In an exemplary embodiment of the present application, determining a structural straight line of a target video frame in a video file includes:
extracting the features of the target video frame to obtain a feature map of the target video frame;
predicting to obtain a straight line connection point corresponding to the target video frame according to the feature map;
and generating a structural straight line according to the straight line connecting points.
In an exemplary embodiment of the present application, generating a structural straight line from straight line connection points includes:
predicting a reference structure straight line according to the straight line connecting points;
and screening the reference structure straight line according to the characteristic diagram to obtain the structure straight line of the target video frame.
In an exemplary embodiment of the present application, determining a camera pose from a vanishing point line of a target video frame comprises:
determining vanishing points used for representing different directions in the target video frame according to vanishing point straight lines used for representing different directions in the target video frame;
determining a camera rotation matrix according to vanishing points used for representing different directions and the camera internal parameter matrix;
calculating a camera translation vector according to vanishing points used for representing different directions;
the camera pose is calculated from the camera translation vector and the camera rotation matrix.
In an exemplary embodiment of the present application, determining vanishing points in a target video frame for representing different directions according to vanishing point straight lines in the target video frame for representing different directions respectively includes:
determining linear equations respectively corresponding to vanishing point straight lines respectively used for representing different directions in a target video frame;
and calculating vanishing points for representing different directions in the target video frame according to a linear equation.
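As an illustration of this step, the following sketch computes a vanishing point as the intersection of two selected vanishing point straight lines in homogeneous coordinates. The helper names and example coordinates are illustrative assumptions, not part of the present application.

```python
import numpy as np

def line_through(p1, p2):
    # Homogeneous line coefficients (a, b, c) with ax + by + c = 0 through two pixel points.
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])

def vanishing_point(line_a, line_b):
    # Intersection of two homogeneous lines; assumes the lines are not parallel in the image.
    vp = np.cross(line_a, line_b)
    return vp[:2] / vp[2]

# Two structural straight lines selected as vanishing point straight lines for one direction.
vp_x = vanishing_point(line_through((100, 400), (600, 350)),
                       line_through((120, 600), (640, 480)))
```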
In an exemplary embodiment of the present application, vanishing points for characterizing different directions are respectively used for characterizing an x direction and a y direction, and a camera rotation matrix is determined according to the vanishing points for characterizing different directions and a camera internal reference matrix, including:
determining a reference vector in the x direction according to the camera internal reference matrix, the vanishing point in the x direction and a preset adjusting factor;
determining a reference vector in the y direction according to the camera internal reference matrix, the vanishing point in the y direction and a preset adjusting factor;
calculating a cross product result of the reference vector in the x direction and the reference vector in the y direction;
and combining the reference vector in the x direction, the reference vector in the y direction and the cross multiplication result to obtain a camera rotation matrix.
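A minimal sketch of this construction is shown below. It reads the "preset adjusting factor" as the scale that normalises the back-projected vanishing points, which is an interpretation made for illustration rather than a detail stated above.

```python
import numpy as np

def rotation_from_vanishing_points(K, vp_x, vp_y):
    # Back-project each vanishing point with the inverse intrinsic matrix and normalise.
    K_inv = np.linalg.inv(K)
    r1 = K_inv @ np.array([vp_x[0], vp_x[1], 1.0])   # reference vector in the x direction
    r1 /= np.linalg.norm(r1)
    r2 = K_inv @ np.array([vp_y[0], vp_y[1], 1.0])   # reference vector in the y direction
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)                            # cross product result
    return np.column_stack([r1, r2, r3])             # combined camera rotation matrix
```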
In an exemplary embodiment of the present application, after combining the reference vector in the x direction, the reference vector in the y direction, and the cross-multiplied result to obtain the camera rotation matrix, the method further includes:
determining rotation angles of the camera in the x direction, the y direction and the z direction respectively according to the camera rotation matrix;
determining the actual position of the camera according to the rotation angle; wherein the actual position of the camera is the position of the camera in the real three-dimensional space.
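The sketch below shows one common way of recovering rotation angles about the x, y and z axes from a rotation matrix; the Euler-angle convention used here is an assumption, since no convention is fixed above.

```python
import numpy as np

def rotation_angles(R):
    # Decompose a 3x3 rotation matrix into angles about the x, y and z axes (degrees).
    sy = np.hypot(R[0, 0], R[1, 0])
    if sy > 1e-6:
        x = np.arctan2(R[2, 1], R[2, 2])
        y = np.arctan2(-R[2, 0], sy)
        z = np.arctan2(R[1, 0], R[0, 0])
    else:  # near gimbal lock
        x = np.arctan2(-R[1, 2], R[1, 1])
        y = np.arctan2(-R[2, 0], sy)
        z = 0.0
    return np.degrees([x, y, z])
```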
In an exemplary embodiment of the present application, calculating a camera translation vector from vanishing points characterizing different directions includes:
determining coordinates respectively corresponding to vanishing points used for representing different directions;
and calculating a homography matrix according to the coordinates and calculating a translation vector of the camera according to the homography matrix.
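The following sketch illustrates one standard way of recovering a camera translation vector from a plane-induced homography of the form H ~ K [r1 r2 t], together with assembling the pose from the rotation matrix and translation vector. How the homography itself is built from the vanishing point coordinates is not reproduced here, so this is an assumed decomposition for illustration only.

```python
import numpy as np

def translation_from_homography(K, H):
    # Assume H maps a world plane to the image up to scale: H ~ K [r1 r2 t].
    M = np.linalg.inv(K) @ H
    scale = np.linalg.norm(M[:, 0])   # ||K^-1 h1|| fixes the unknown homography scale
    return M[:, 2] / scale            # camera translation vector

def camera_pose(R, t):
    # 3x4 pose matrix [R | t] combining the camera rotation matrix and translation vector.
    return np.hstack([R, t.reshape(3, 1)])
```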
In an exemplary embodiment of the present application, if an object to be fused is a three-dimensional object, fusing the object to be fused into a target video frame according to a camera pose includes:
performing dimension conversion on an object to be fused according to the camera posture, wherein the object to be fused after the dimension conversion is a two-dimensional object;
and fusing the object to be fused after the dimension conversion into the target video frame.
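As an illustration of such a dimension conversion, the sketch below projects the vertices of a three-dimensional object into the target video frame using the intrinsic matrix and the 3x4 camera pose; function and variable names are placeholders.

```python
import numpy as np

def project_object(points_3d, K, pose):
    # points_3d: (N, 3) array of object vertices; pose: 3x4 matrix [R | t].
    pts = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # homogeneous world points
    proj = (K @ pose @ pts.T).T                                  # homogeneous image coordinates
    return proj[:, :2] / proj[:, 2:3]                            # 2D pixel coordinates in the frame
```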
In an exemplary embodiment of the present application, after fusing the object to be fused into the target video frame according to the camera pose, the method further includes:
determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame; and the time points corresponding to the other video frames are later than the target video frame.
In an exemplary embodiment of the present application, determining vanishing point straight lines of other video frames in a video file according to the vanishing point straight lines of the target video frame includes:
determining a vanishing point straight line of a first video frame in other video frames according to the vanishing point straight line of the target video frame; wherein the first video frame is adjacent to the target video frame;
determining vanishing point straight lines of second video frames in other video frames according to the vanishing point straight lines of the first video frame until the vanishing point straight lines of all other video frames are determined;
the first video frame is adjacent to the target video frame, and the second video frame is adjacent to the first video frame.
In an exemplary embodiment of the present application, determining a vanishing point straight line of a first video frame of other video frames according to a vanishing point straight line of a target video frame includes:
determining all structural straight lines in the first video frame;
calculating the distances between all the structural straight lines and vanishing point straight lines of the target video frame respectively;
and determining the structural straight line corresponding to the shortest distance as the vanishing point straight line of the first video frame.
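A minimal sketch of this nearest-line matching is given below; the endpoint-based distance measure is an illustrative choice, since the distance definition is not specified above.

```python
import numpy as np

def nearest_structural_line(prev_vp_line, candidate_lines):
    # Each line is ((x1, y1), (x2, y2)); returns the candidate closest to the previous
    # frame's vanishing point straight line under a simple endpoint distance.
    def dist(a, b):
        d1 = np.linalg.norm(np.subtract(a[0], b[0])) + np.linalg.norm(np.subtract(a[1], b[1]))
        d2 = np.linalg.norm(np.subtract(a[0], b[1])) + np.linalg.norm(np.subtract(a[1], b[0]))
        return min(d1, d2)  # endpoints may be stored in either order
    return min(candidate_lines, key=lambda line: dist(prev_vp_line, line))
```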
In an exemplary embodiment of the present application, after determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight line of the target video frame, the method further includes:
determining camera postures corresponding to other video frames according to vanishing point straight lines of the other video frames;
and respectively fusing the objects to be fused into other video frames according to the camera postures and the camera parameters corresponding to the other video frames.
According to a second aspect of the present application, there is provided an object fusion apparatus including a straight line determination unit, a camera pose determination unit, and an object fusion unit, wherein:
the straight line determining unit is used for determining a structural straight line of a target video frame in the video file;
the straight line determining unit is also used for determining a vanishing point straight line of the target video frame from the structural straight line according to the straight line selecting operation;
the camera attitude determination unit is used for determining the camera attitude according to the vanishing point straight line of the target video frame;
and the object fusion unit is used for fusing the object to be fused into the target video frame according to the camera posture.
In an exemplary embodiment of the present application, the apparatus further includes a video frame selecting unit, wherein:
and the video frame selecting unit is used for determining the target video frame from the video file according to the detected video frame selecting operation before the straight line determining unit determines the structural straight line of the target video frame in the video file.
In an exemplary embodiment of the present application, the straight line determining unit determines a structural straight line of a target video frame in a video file, including:
extracting the features of the target video frame to obtain a feature map of the target video frame;
predicting to obtain a straight line connection point corresponding to the target video frame according to the feature map;
and generating a structural straight line according to the straight line connecting points.
In an exemplary embodiment of the present application, the straight line determining unit generates the structural straight line from the straight line connecting points, including:
predicting a reference structure straight line according to the straight line connecting points;
and screening the reference structure straight line according to the characteristic diagram to obtain the structure straight line of the target video frame.
In an exemplary embodiment of the present application, the camera pose determination unit determines the camera pose from a vanishing point straight line of a target video frame, including:
determining vanishing points used for representing different directions in the target video frame according to vanishing point straight lines used for representing different directions in the target video frame;
determining a camera rotation matrix according to vanishing points used for representing different directions and the camera internal parameter matrix;
calculating a camera translation vector according to vanishing points used for representing different directions;
the camera pose is calculated from the camera translation vector and the camera rotation matrix.
In an exemplary embodiment of the present application, the camera pose determination unit determines vanishing points representing different directions in the target video frame according to vanishing point straight lines respectively representing different directions in the target video frame, including:
determining linear equations respectively corresponding to vanishing point straight lines respectively used for representing different directions in a target video frame;
and calculating vanishing points for representing different directions in the target video frame according to a linear equation.
In an exemplary embodiment of the present application, vanishing points for characterizing different directions are respectively used for characterizing an x direction and a y direction, and the camera pose determination unit determines a camera rotation matrix according to the vanishing points for characterizing different directions and a camera internal reference matrix, including:
determining a reference vector in the x direction according to the camera internal reference matrix, the vanishing point in the x direction and a preset adjusting factor;
determining a reference vector in the y direction according to the camera internal reference matrix, the vanishing point in the y direction and a preset adjusting factor;
calculating a cross product result of the reference vector in the x direction and the reference vector in the y direction;
and combining the reference vector in the x direction, the reference vector in the y direction and the cross multiplication result to obtain a camera rotation matrix.
In an exemplary embodiment of the present application, after the camera pose determination unit combines the reference vector in the x direction, the reference vector in the y direction, and the cross-multiplied result to obtain the camera rotation matrix, the apparatus further includes a rotation angle determination unit and a position determination unit, where:
a rotation angle determining unit for determining rotation angles of the camera in the x direction, the y direction and the z direction according to the camera rotation matrix;
a position determination unit for determining the actual position of the camera according to the rotation angle; wherein the actual position of the camera is the position of the camera in the real three-dimensional space.
In an exemplary embodiment of the present application, the camera pose determination unit calculates a camera translation vector from vanishing points characterizing different directions, including:
determining coordinates respectively corresponding to vanishing points used for representing different directions;
and calculating a homography matrix according to the coordinates and calculating a translation vector of the camera according to the homography matrix.
In an exemplary embodiment of the present application, if the object to be fused is a three-dimensional object, fusing the object to be fused into the target video frame according to the camera pose by an object fusion unit, including:
performing dimension conversion on an object to be fused according to the camera posture, wherein the object to be fused after the dimension conversion is a two-dimensional object;
and fusing the object to be fused after the dimension conversion into the target video frame.
In an exemplary embodiment of the present application, the straight line determining unit is further configured to determine vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame after the object fusing unit fuses the object to be fused into the target video frame according to the camera pose; and the time points corresponding to the other video frames are later than the target video frame.
In an exemplary embodiment of the present application, the determining a vanishing point straight line of other video frames in the video file according to the vanishing point straight line of the target video frame by the straight line determining unit includes:
determining a vanishing point straight line of a first video frame in other video frames according to the vanishing point straight line of the target video frame; wherein the first video frame is adjacent to the target video frame;
determining vanishing point straight lines of second video frames in other video frames according to the vanishing point straight lines of the first video frame until the vanishing point straight lines of all other video frames are determined;
the first video frame is adjacent to the target video frame, and the second video frame is adjacent to the first video frame.
In an exemplary embodiment of the present application, the determining a vanishing point straight line of a first video frame of the other video frames according to the vanishing point straight line of the target video frame by the straight line determining unit includes:
determining all structural straight lines in the first video frame;
calculating the distances between all the structural straight lines and vanishing point straight lines of the target video frame respectively;
and determining the structural straight line corresponding to the shortest distance as the vanishing point straight line of the first video frame.
In an exemplary embodiment of the present application, the camera pose determining unit is further configured to determine, after the straight line determining unit determines the vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame, camera poses corresponding to the other video frames according to the vanishing point straight lines of the other video frames;
and the object fusion unit is also used for respectively fusing the objects to be fused into other video frames according to the camera postures and the camera parameters corresponding to the other video frames.
According to a third aspect of the present application, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to a fourth aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
The exemplary embodiments of the present application may have some or all of the following advantages:
in the object fusion method provided by an example embodiment of the present application, a structural straight line of a target video frame in a video file may be determined; determining a vanishing point straight line of the target video frame from the structural straight lines according to straight line selection operation; determining the camera attitude according to the vanishing point straight line of the target video frame; and fusing the object to be fused into the target video frame according to the camera posture. According to the technical description, on one hand, the camera attitude can be determined based on the vanishing point straight line of the target video frame needing object fusion, and then the object to be fused is fused into the target video frame based on the camera attitude, so that the fusion degree between the object to be fused and the video file is improved, and the fusion effect is improved. In another aspect of the application, a vanishing point straight line of a target video frame can be determined through a straight line selection operation, so that the interactivity with a user is enhanced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic diagram illustrating an exemplary system architecture to which an object fusion method and an object fusion apparatus according to an embodiment of the present application may be applied;
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present application;
FIG. 3 schematically shows a flow diagram of an object fusion method according to an embodiment of the present application;
FIG. 4 schematically illustrates a structural diagram of a line analysis model according to one embodiment of the present application;
FIG. 5 schematically shows a diagram of a target video frame in an embodiment in accordance with the present application;
FIG. 6 schematically shows a diagram of a target video frame comprising structural lines in accordance with an embodiment of the present application;
FIG. 7 schematically illustrates a user interface diagram for collecting straight line selection operations in accordance with an embodiment of the present application;
FIG. 8 schematically illustrates a diagram of a target video frame including a vanishing point line in an embodiment in accordance with the present application;
FIG. 9 schematically shows a two-dimensional schematic view at different camera poses according to an embodiment of the present application;
FIG. 10 schematically illustrates a vanishing point diagram according to an embodiment of the present application;
FIG. 11 schematically shows a diagram of object fusion results according to an embodiment of the application;
FIG. 12 schematically shows a flow diagram of an object fusion method according to an embodiment of the present application;
fig. 13 schematically shows a block diagram of an object fusion apparatus in an embodiment according to the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present application.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which an object fusion method and an object fusion apparatus according to an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like. The server 105 may be an independent physical server, a server cluster or a distributed system including a plurality of physical servers, or a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
It should be noted that the cloud server described above may provide basic cloud computing services by using cloud technology. Cloud technology refers to a hosting technology for unifying serial resources such as hardware, software, network and the like in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
In addition, cloud computing included in the above-described basic cloud computing service refers to a delivery and use mode of IT infrastructure, namely obtaining required resources in an on-demand, easily-extensible manner through a network; generalized cloud computing refers to a delivery and use mode of services, namely obtaining required services in an on-demand, easily-extensible manner through a network. Such services may be IT and software services, Internet-related services, or other services. Cloud computing is a product of the development and fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization and load balancing.
And cloud storage included in the basic cloud computing service is a new concept extended and developed from the cloud computing concept. A distributed cloud storage system (hereinafter referred to as a storage system) refers to a storage system that integrates a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) through application software or application interfaces, using functions such as cluster application, grid technology and a distributed storage file system, so that they work cooperatively and provide data storage and service access functions externally. At present, a storage method of a storage system is as follows: logical volumes are created, and when a logical volume is created, it is allocated physical storage space, which may be composed of the disks of one or several storage devices. A client stores data on a certain logical volume, that is, the data is stored on a file system; the file system divides the data into a plurality of parts, each part being an object, and an object contains not only the data but also additional information such as a data identifier (ID, identity). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests to access the data, the file system can allow the client to access the data according to the storage location information of each object. The process of allocating physical storage space to the logical volume by the storage system is specifically as follows: the physical storage space is divided in advance into stripes according to a set of capacity measures of the objects to be stored in the logical volume (the measures often have a large margin with respect to the capacity of the objects actually to be stored) and Redundant Array of Independent Disks (RAID); a logical volume can be understood as a stripe, whereby physical storage space is allocated to the logical volume.
And a database included in the basic cloud computing service can be regarded as an electronic filing cabinet, that is, a place for storing electronic files, in which a user can add, query, update and delete data, among other operations. A "database" is a collection of data that is stored together in a manner that can be shared by multiple users, has as little redundancy as possible, and is independent of the application.
And big data included in the basic cloud computing service refers to a data set that cannot be captured, managed and processed by conventional software tools within a certain time range; it is a massive, high-growth-rate and diversified information asset that requires new processing modes to provide stronger decision-making power, insight discovery and process optimization capability. With the advent of the cloud era, big data has attracted more and more attention, and big data requires special techniques to effectively process large amounts of data within a tolerable elapsed time. Technologies suitable for big data include large-scale parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.
The object fusion method provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the object fusion device is generally disposed in the server 105. However, it is easily understood by those skilled in the art that the object fusion method provided in the embodiment of the present application may also be executed by the terminal device 101, 102, or 103, and accordingly, the object fusion apparatus may also be disposed in the terminal device 101, 102, or 103, which is not particularly limited in this exemplary embodiment. For example, in one exemplary embodiment, the server 105 may determine a structural straight line of a target video frame in a video file; determining a vanishing point straight line of the target video frame from the structural straight lines according to straight line selection operation; determining the camera attitude according to the vanishing point straight line of the target video frame; and fusing the object to be fused into the target video frame according to the camera posture.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU)201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 210 as necessary, so that a computer program read out therefrom is installed into the storage section 208 as necessary.
In particular, according to embodiments of the present application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU)201, performs various functions defined in the methods and apparatus of the present application. The method of the present application may be implemented based on artificial intelligence. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
In general, there is sometimes a need to add additional elements to a video file, for example, advertisers need to add advertising elements to a television show. The existing Video-In is a soft embedded advertisement form, and has the advantages of high reach rate, small cooperation risk, budget conservation and the like compared with the traditional advertisement. However, the current advertisement implantation method generally cannot be used for advertisement implantation in combination with different characteristics (such as different shooting angles) of the video file, so that the problems of hard implantation effect and low fusion degree between the implanted object and the video file are easily caused.
In view of the above, the present exemplary embodiment provides an object fusion method. The object fusion method may be applied to the server 105, and may also be applied to one or more of the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. Referring to fig. 3, the object fusion method may include the following steps S310 to S340:
step S310: and determining a structural straight line of the target video frame in the video file.
Step S320: and determining a vanishing point straight line of the target video frame from the structural straight lines according to the straight line selection operation.
Step S330: and determining the camera attitude according to the vanishing point straight line of the target video frame.
Step S340: and fusing the object to be fused into the target video frame according to the camera posture.
When the method and the device are applied to the field of advertisement implantation, the structural straight line in the video frame needing advertisement implantation can be determined, the straight line selected by the user from the structural straight line is determined as the vanishing point straight line, and the camera posture of the video frame can be determined according to the vanishing point straight line. Furthermore, the implanted object (namely, the object to be fused) can be fused into the video frame according to the camera posture, so that the advertisement implantation effect can be improved, the problem that the advertisement implantation is hard is solved, the fusion degree between the implanted object and the video file is higher, and the fusion effect is better.
By implementing the method shown in fig. 3, the camera pose can be determined based on the vanishing point straight line of the target video frame to be subjected to object fusion, and then the object to be fused is fused into the target video frame based on the camera pose, so that the fusion degree between the object to be fused and the video file is improved, and the fusion effect is improved. In addition, the vanishing point straight line of the target video frame can be determined through the straight line selection operation, and the interaction with the user is enhanced.
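As an illustration of how the four steps could be wired together, a minimal Python sketch follows; every callable it receives is a placeholder for the corresponding operation described in the sections below, not an API defined by the present application.

```python
def fuse_object_into_frame(frame, obj, detect_lines, select_vp_lines, estimate_pose, render):
    # Sketch of the flow of steps S310-S340; the four callables are supplied by the caller.
    structural_lines = detect_lines(frame)         # step S310: structural straight lines
    vp_lines = select_vp_lines(structural_lines)   # step S320: straight line selection operation
    pose = estimate_pose(vp_lines)                 # step S330: camera pose from vanishing point lines
    return render(frame, obj, pose)                # step S340: fuse the object into the frame
```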
The above steps of the present exemplary embodiment will be described in more detail below.
In step S310, a structural straight line of the target video frame in the video file is determined.
Where a video file is composed of a series of video Frames, usually expressed as Frames Per Second (FPS), each video frame being an image, a moving image may be created when played in sequence, for example, 30FPS may indicate that 30 video Frames are included in a second of video. In addition, the target video frame may be any one of a video frame designated by a user, a default video frame, or a video file, and the embodiment of the present application is not limited. In addition, optionally, the number of the target video frames may be one or more, and in the embodiment of the present application, the number of the target video frames is 1 as an example. In addition, the structural straight line may be a straight line segment for constituting a three-dimensional structure/two-dimensional structure in the video frame. Optionally, the structural straight line may be a spatial structural straight line in a three-dimensional space represented by the video frame, or may also be a planar structural straight line in a two-dimensional plane represented by the video frame, which is not limited in the embodiment of the present application.
Further, optionally, before step S310, the method may further include the following steps: displaying cover entries of a plurality of video files on a display interface, and loading and displaying the cover entries of other video files according to the received user sliding up operation/sliding down operation; and when the selection operation is detected, selecting the video files to be played from the plurality of video files and playing.
In this embodiment of the present application, optionally, before determining a structural straight line of a target video frame in a video file, the method further includes: and determining a target video frame from the video file according to the detected video frame selection operation.
The video frame selection operation may specifically be a click operation, a touch screen operation, a voice input operation, a gesture control operation, an information input operation, or the like, which is not limited in the embodiments of the present application.
Specifically, in one aspect, the manner of determining the target video frame from the video file according to the detected video frame selection operation may be: when a user operation for pausing the playing of the video file (such as a click operation, a touch screen operation or a gesture control operation) is detected, outputting a window containing a video frame selection control; and when a video frame selection operation acting on the video frame selection control is detected, determining the video frame shown in the pause interface of the video file as the target video frame, so that the user can conveniently select the target video frame at any time while watching the video, and the interactivity with the user is enhanced.
On the other hand, the method for determining the target video frame from the video file according to the detected video frame selection operation may be as follows: when a voice input operation triggering the starting of the voice detection function is detected, voice is converted into text (for example, I selects 3 minutes and 20 seconds of video) in response to the voice input operation, and a target video frame is determined from a video file according to the corresponding semantics of the text.
In another aspect, the method for determining the target video frame from the video file according to the detected video frame selection operation may be: when the information input operation is detected, input information corresponding to the information input operation is acquired, time information (such as 3 minutes and 20 seconds) in the input information is extracted, and a video frame corresponding to the time information is determined as a target video frame.
Therefore, the optional embodiment can provide a video frame selection function, enhance interactivity and be beneficial to improving the object fusion effect.
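As a small illustration of the time-based selection described above, the sketch below maps the time information extracted from an input (for example, "3 minutes and 20 seconds") to a frame index; the constant frame rate and the function name are assumptions made for illustration.

```python
def frame_index_for_time(minutes, seconds, fps=30):
    # Map a time point in the video to the index of the target video frame.
    return int(round((minutes * 60 + seconds) * fps))

target_index = frame_index_for_time(3, 20, fps=30)  # 6000 for a 30 FPS video file
```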
In this embodiment of the present application, optionally, determining a structural straight line of a target video frame in a video file includes: extracting the features of the target video frame to obtain a feature map of the target video frame; predicting to obtain a straight line connection point corresponding to the target video frame according to the feature map; and generating a structural straight line according to the straight line connecting points.
In addition, optionally, before performing feature extraction on the target video frame to obtain the feature map of the target video frame, the method further includes the following step: preprocessing the target video frame, where the preprocessing may include resizing, tone adjustment and/or filter adjustment, and the like, and the embodiments of the present application are not limited in this respect.
Specifically, the features of the target video frame may be extracted to obtain its feature map as follows: down-sampling the target video frame through a convolution layer to obtain a first intermediate feature; inputting the first intermediate feature into residual units and down-sampling it through a plurality of residual units (for example, 64 residual units) to obtain a second intermediate feature; performing max pooling on the second intermediate feature to obtain a third intermediate feature; down-sampling the third intermediate feature through 4 residual units with a stride of 2 in a Stacked Hourglass Network (SHN) to obtain a fourth intermediate feature; and performing nearest-neighbor interpolation upsampling on the fourth intermediate feature to obtain the feature map of the target video frame. The convolution kernel of the convolution layer is 7 × 7 with a sliding stride of 2, and the input and output of each residual unit are 256-dimensional.
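A simplified PyTorch-style sketch of such a feature extractor is given below. The reduced number of residual units and the plain stride-2 convolutions standing in for the stacked hourglass stage are illustrative assumptions, not the actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUnit(nn.Module):
    # Residual unit whose input and output are both 256-dimensional.
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return F.relu(x + self.body(x))

class BackboneSketch(nn.Module):
    # 7x7 stride-2 convolution, 256-d residual units, max pooling, further stride-2
    # downsampling, then nearest-neighbour upsampling back to the feature-map resolution.
    def __init__(self, num_res=4):
        super().__init__()
        self.stem = nn.Conv2d(3, 256, kernel_size=7, stride=2, padding=3)
        self.res = nn.Sequential(*[ResidualUnit() for _ in range(num_res)])
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.down = nn.Sequential(*[nn.Conv2d(256, 256, 3, stride=2, padding=1)
                                    for _ in range(4)])

    def forward(self, frame):                      # frame: (B, 3, H, W)
        x = self.pool(self.res(self.stem(frame)))
        y = self.down(x)
        return F.interpolate(y, size=x.shape[-2:], mode="nearest")  # feature map
```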
Specifically, the dimension corresponding to the feature map is W × H, and the straight line connection points corresponding to the target video frame may be predicted from the feature map as follows: the W × H feature map is divided into b regions of dimension Wb × Hb; a likelihood probability feature map J and an offset feature map O are calculated for each of the b cells, where the offset feature map O contains all real connection points in the target video frame; all real connection points are then screened through J'(b) to obtain the K straight line connection points with the highest confidence corresponding to the target video frame, where K and b are positive integers. The expressions for the likelihood probability feature map J, the offset feature map O and J'(b) are given by the formulas shown in the original figures. In those formulas, "otherwise" denotes the remaining cases, i indicates that the i-th cell is currently being processed, p denotes a vertex position of the matrix V in the region, the dimension corresponding to the matrix V is Wb × Hb, the range of O(b) is given by the corresponding formula, and N(b) denotes the 8 cells in the vicinity of b.
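A possible NumPy sketch of the top-K connection point decoding described above follows; the cell size, the value of K and the array layouts are illustrative assumptions.

```python
import numpy as np

def top_k_junctions(J, O, cell_size=4, K=100):
    # J: (Hb, Wb) per-cell likelihoods; O: (2, Hb, Wb) per-cell offsets of the connection point.
    flat = np.argsort(J, axis=None)[::-1][:K]      # indices of the K most confident cells
    ys, xs = np.unravel_index(flat, J.shape)
    points = []
    for y, x in zip(ys, xs):
        dy, dx = O[:, y, x]                        # sub-cell offset predicted for this cell
        points.append(((x + dx) * cell_size, (y + dy) * cell_size))
    return points                                  # straight line connection points in pixels
```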
In addition, optionally, the manner of generating the structural straight lines according to the straight line connection points may specifically be: connecting the straight line connection points pairwise, and filtering out the crossing line segments to obtain the structural straight lines corresponding to the target video frame.
Therefore, by implementing the optional embodiment, the straight line connecting point in the target video frame can be determined in a feature extraction mode, so that the structural straight line is generated according to the straight line connecting point, and further the subsequent object fusion is performed according to the structural straight line, and the fusion effect is improved.
Further optionally, generating a structural straight line according to the straight line connection point includes: predicting a reference structure straight line according to the straight line connecting points; and screening the reference structure straight line according to the characteristic diagram to obtain the structure straight line of the target video frame.
Specifically, the manner of predicting the reference structure straight lines according to the straight line connection points may be as follows: positive samples (S+) and negative samples (S-) are extracted from the real labels by a static line segment sampler to obtain a positive sample set D+ and a negative sample set D-, where the number of negative samples is O(|V|²) and the number of positive samples is O(|E|), with O(|V|²) >> O(|E|), and the O(|V|²) negative samples are sampled (S-); further, each predicted straight line connection point can be matched to the real connection points by a dynamic line segment sampler, and the optimal real connection point m_i corresponding to each straight line connection point is calculated, where the distance between the straight line connection point and m_i is less than a preset threshold; candidate line segments are generated from each straight line connection point and its corresponding m_i, where i1, i2 ∈ {1, 2, ..., K} and i1 ≠ i2; if (m_i1, m_i2) ∈ S+, the candidate line segment is added to the positive sample set D+, and if (m_i1, m_i2) ∈ S-, it is added to the negative sample set D-; finally, N candidate line segments are selected from D+, D- and D as the reference structure straight lines, where L denotes the set of straight line connection points (the detailed expressions are given by the formulas shown in the original figures).
In addition, the manner of screening the reference structure straight lines according to the feature map to obtain the structural straight lines of the target video frame may specifically be as follows: the endpoint coordinates of each reference structure straight line are input into the pooling layer, so that the pooling layer calculates intermediate point coordinates q_k from the endpoint coordinates and, according to q_k, calculates and feeds back a feature vector corresponding to the reference structure straight line; the fed-back feature vector is input into a fully connected layer, so that the fully connected layer determines the classification corresponding to each reference structure straight line according to the feature map, where the classification result comprises a first set and a second set, the reference structure straight lines in the first set are the required structural straight lines of the target video frame, and the reference structure straight lines in the second set are not structural straight lines of the target video frame; the reference structure straight lines in the first set are determined as the structural straight lines of the target video frame. The first set and the second set denote categories only and carry no priority.
Referring to fig. 4, fig. 4 schematically illustrates a structural diagram of a line analysis model according to an embodiment of the present application. The line analysis model 400 shown in fig. 4 is used to perform the above-described determination of the structural straight lines of the target video frame in the video file, and may be an end-to-end line analysis model (L-CNN) using a single unified neural network. Specifically, after the target video frame 401 is determined, feature extraction may be performed on the target video frame through the backbone network 402 to obtain a feature map 403 of the target video frame; further, the feature map 403 may be input to the connection point prediction module 404, so that the connection point prediction module 404 predicts the straight line connection points 405 corresponding to the target video frame according to the feature map; further, the straight line connection points 405 may be input to the straight line sampling module 406, so that the straight line sampling module 406 predicts the reference structure straight lines 407 from the straight line connection points; furthermore, the reference structure straight lines 407 may be input into the straight line correction module 408, so that the pooling layer 4081 in the straight line correction module 408 screens the reference structure straight lines according to the feature map to obtain a screening result including a first set and a second set, where the first set is represented by a check mark, the second set is represented by a cross mark, and the line segments in the first set are determined as the structural straight lines 4082 of the target video frame.
Therefore, by implementing the optional embodiment, the subsequent object fusion can be facilitated by predicting the structural straight line in the target video frame, and the fusion degree of the target video frame and the object can be improved by performing the object fusion according to the structural straight line, so that the fusion effect is improved.
In step S320, a vanishing point straight line of the target video frame is determined from the structural straight lines according to the straight line selection operation.
The straight line selection operation may be a touch screen operation, a click operation, or the like; the embodiment of the present application is not limited in this respect. In addition, the vanishing point associated with the vanishing point straight lines is the intersection point of the projections, on the image, of straight lines that are parallel in space; two vanishing point straight lines correspond to one vanishing point, and a straight line passing through the vanishing point is a vanishing point straight line. A vanishing point straight line may be a target structural straight line selected by the user from the plurality of structural straight lines for determining a vanishing point in the target video frame. The number of vanishing point straight lines in the target video frame is at least two; in the embodiment of the present application, four vanishing point straight lines are taken as an example.
Specifically, the vanishing point straight lines of the target video frame may be determined from the structural straight lines according to the straight line selection operation as follows: the determined structural straight lines of the target video frame are displayed in the target video frame and a straight line selection operation is detected; if no straight line selection operation is detected, straight lines are selected at random as the vanishing point straight lines, and if a straight line selection operation is detected, the structural straight lines corresponding to the straight line selection operation are determined as the vanishing point straight lines of the target video frame. Further optionally, after the structural straight lines corresponding to the straight line selection operation are determined as the vanishing point straight lines of the target video frame, the method may further include the following steps: if a direction defining operation is detected, the direction corresponding to each vanishing point straight line is determined according to the direction defining operation, where the direction of each vanishing point straight line may be the x direction or the y direction; if no direction defining operation is detected, the direction of each vanishing point straight line is defined by default.
Referring to fig. 5, fig. 5 schematically illustrates a target video frame according to an embodiment of the present application. The method and the device can identify, from the target video frame, the structural straight lines used for representing the spatial structure. Referring to fig. 6 in addition to fig. 5, fig. 6 schematically illustrates a target video frame including structural straight lines according to an embodiment of the present application. A plurality of structural straight lines, which may be used to represent the spatial structure of the target video frame, are schematically shown in fig. 6. Further, when a straight line selection operation is detected, the vanishing point straight lines of the target video frame can be determined from the structural straight lines. Referring to fig. 7, fig. 7 schematically illustrates a user interface for capturing straight line selection operations according to an embodiment of the present application. As shown in fig. 7, the user interface 700 includes an x-axis straight line selection control 701 for providing the function of selecting vanishing point straight lines in the x-axis direction, a y-axis straight line selection control 702 for providing the function of selecting vanishing point straight lines in the y-axis direction, a fusion position selection control 703 for selecting the fusion position of the object to be fused, a target video frame 704 including structural straight lines, and an object to be fused 705. The user can select desired structural straight lines from 704 through 701 and 702 as vanishing point straight lines, and can select a corresponding fusion position for 705 in 704 through 703. Referring to fig. 8, fig. 8 schematically illustrates a target video frame including vanishing point straight lines according to an embodiment of the present application, obtained when straight line selection operations triggering 701 and 702 are detected. The vanishing point straight lines shown in fig. 8 correspond to the straight line selection operations. Specifically, the user can determine four structural straight lines through straight line selection operations to serve as vanishing point straight lines, and the vanishing points corresponding to the target video frame can then be determined according to these four structural straight lines.
In step S330, the camera pose is determined according to the vanishing point straight line of the target video frame.
In particular, different camera poses, i.e. different shooting directions of the camera in three-dimensional space, produce different two-dimensional images. The camera pose may be represented by a camera pose matrix that describes how points in the world coordinate system are transformed into the camera coordinate system; the inverse of the camera pose matrix corresponds to the camera external reference matrix. Referring to fig. 9, fig. 9 schematically illustrates a two-dimensional schematic view of different camera poses according to an embodiment of the present application. As shown in fig. 9, when the camera is in the camera pose 911, the corresponding two-dimensional image is 910; when the camera is in the camera pose 921, the corresponding two-dimensional image is 920; when the camera is in the camera pose 931, the corresponding two-dimensional image is 930; when the camera is in the camera pose 941, the corresponding two-dimensional image is 940; when the camera is in the camera pose 951, the corresponding two-dimensional image is 950; and when the camera is in the camera pose 961, the corresponding two-dimensional image is 960. It can be seen that the two-dimensional images captured by the camera in different camera poses are different.
In this embodiment, optionally, determining the camera pose according to the vanishing point straight line of the target video frame includes: determining vanishing points used for representing different directions in the target video frame according to vanishing point straight lines used for representing different directions in the target video frame; determining a camera rotation matrix according to vanishing points used for representing different directions and the camera internal parameter matrix; calculating a camera translation vector according to vanishing points used for representing different directions; the camera pose is calculated from the camera translation vector and the camera rotation matrix.
The vanishing point straight lines for representing different directions may include a vanishing point straight line for representing an x direction and a vanishing point straight line for representing a y direction.
As an alternative embodiment, determining vanishing points representing different directions in a target video frame according to vanishing point straight lines respectively representing different directions in the target video frame includes: determining linear equations respectively corresponding to vanishing point straight lines respectively used for representing different directions in a target video frame; and calculating vanishing points for representing different directions in the target video frame according to a linear equation.
Specifically, the manner of determining the line equations respectively corresponding to vanishing point lines respectively used for representing different directions in the target video frame may specifically be: determining intersection point coordinates of vanishing point straight lines respectively used for representing different directions in a target video frame; and calculating linear equations respectively corresponding to vanishing point straight lines according to the intersection point coordinates. The way of calculating the linear equations respectively corresponding to vanishing point straight lines according to the intersection point coordinates may specifically be: and determining two target intersection point coordinates corresponding to each vanishing point straight line, and performing cross multiplication on the two target intersection point coordinates to obtain a straight line equation of the vanishing point straight line.
Further optionally, the way of calculating vanishing points representing different directions in the target video frame according to the linear equation may specifically be: and performing cross multiplication on linear equations of vanishing point straight lines corresponding to the same direction to further obtain vanishing points corresponding to all directions respectively.
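In homogeneous coordinates both cross products are one-liners. The sketch below (illustrative NumPy, with made-up point coordinates) builds a line equation from two of its intersection points and then intersects two lines of the same direction to obtain the vanishing point.

```python
import numpy as np

def line_from_points(p1, p2):
    """Homogeneous line equation l = p1 x p2 through two image points (x, y)."""
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])

def vanishing_point(line_a, line_b):
    """Vanishing point as the homogeneous intersection of two lines of the same direction."""
    return np.cross(line_a, line_b)  # homogeneous (x, y, w)

# Example: two vanishing point straight lines assumed to represent the x direction.
lx1 = line_from_points((120.0, 310.0), (640.0, 280.0))
lx2 = line_from_points((150.0, 520.0), (660.0, 430.0))
Vx = vanishing_point(lx1, lx2)
print(Vx / Vx[2])  # inhomogeneous vanishing point in pixel coordinates (if w != 0)
```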
Referring to fig. 10, fig. 10 schematically illustrates a vanishing point diagram according to an embodiment of the present application, based on fig. 8. As shown in fig. 10, the vanishing point Vx corresponding to the x direction can be determined according to the vanishing point straight line equations in the x direction, and the vanishing point Vy corresponding to the y direction can be determined according to the vanishing point straight line equations in the y direction. Each of the vanishing points Vx and Vy is determined by two vanishing point straight line equations in the same direction.
Therefore, by implementing the optional embodiment, the vanishing points in the corresponding direction can be determined through the vanishing point straight line selected by the user, so that the fusion of the objects to be fused according to the vanishing points is facilitated, and the fusion effect is improved.
As an optional implementation manner, the vanishing points for characterizing different directions are respectively used for characterizing the x direction and the y direction, and the camera rotation matrix is determined according to the vanishing points for characterizing different directions and the camera internal reference matrix, including: determining a reference vector in the x direction according to the camera internal reference matrix, the vanishing point in the x direction and a preset adjusting factor; determining a reference vector in the y direction according to the camera internal reference matrix, the vanishing point in the y direction and a preset adjusting factor; calculating a cross product result of the reference vector in the x direction and the reference vector in the y direction; and combining the reference vector in the x direction, the reference vector in the y direction and the cross multiplication result to obtain a camera rotation matrix.
The camera internal reference matrix may be used to represent the focal length, the pixel size, and the like, and may specifically be expressed as

K = [[f/dx, 0, u0], [0, f/dy, v0], [0, 0, 1]]

where f is the focal length (unit: mm); dx and dy are the pixel sizes; u0 and v0 represent the image center of the target video frame; f/dx represents the normalized focal length in the x-axis direction; and f/dy represents the normalized focal length in the y-axis direction. In addition, the preset adjustment factor may be any constant.
Specifically, the reference vector in the x direction may be determined according to the camera internal reference matrix, the vanishing point in the x direction, and the preset adjustment factor as follows: the camera internal reference matrix K, the vanishing point Vx in the x direction, and the preset adjustment factor z are substituted into the expression r1 = K^(-1) · z · Vx to calculate the reference vector r1 in the x direction. Further, the reference vector in the y direction may be determined according to the camera internal reference matrix, the vanishing point in the y direction, and the preset adjustment factor as follows: K, the vanishing point Vy in the y direction, and z are substituted into the expression r2 = K^(-1) · z · Vy to calculate the reference vector r2 in the y direction. Further, the cross product of the reference vector in the x direction and the reference vector in the y direction may be calculated as r3 = r1 × r2. Further, the reference vector r1 in the x direction, the reference vector r2 in the y direction, and the cross product result r3 may be combined to obtain the camera rotation matrix R = (r1 r2 r3).
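A compact numerical sketch of this step follows. It assumes the adjustment factor z is absorbed by normalizing r1 and r2 to unit length (the text only says z may be any constant); the intrinsic matrix values are made up, while the vanishing point coordinates reuse the example values given later in the text.

```python
import numpy as np

def rotation_from_vanishing_points(K, Vx, Vy):
    """Build the camera rotation matrix R = (r1 r2 r3) from two vanishing points.

    K      : 3x3 camera internal reference (intrinsic) matrix.
    Vx, Vy : homogeneous vanishing points for the x and y directions.
    Assumption: the preset adjustment factor z is absorbed by normalizing
    r1 and r2 to unit length (consistent with ||r1|| = ||r2|| = 1).
    """
    K_inv = np.linalg.inv(K)
    r1 = K_inv @ np.asarray(Vx, dtype=float)
    r2 = K_inv @ np.asarray(Vy, dtype=float)
    r1 /= np.linalg.norm(r1)
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)                 # third axis from the cross product
    return np.column_stack([r1, r2, r3])  # R = (r1 r2 r3)

# Illustrative intrinsics; vanishing points taken from the example coordinates in the text.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
Vx = (-43179712.0, 2756165.0, 5347.0)
Vy = (-13688880.0, 44914255.0, -40825.0)
R = rotation_from_vanishing_points(K, Vx, Vy)
print(np.round(R, 3))
```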
Therefore, by implementing the optional embodiment, the camera rotation matrix can be obtained according to vanishing point calculation, and the rotation matrix is an important element in the camera attitude matrix, so that more accurate camera attitude can be determined by determining the rotation matrix, and the object fusion effect can be improved.
As an optional implementation manner, after combining the reference vector in the x direction, the reference vector in the y direction, and the cross-multiplied result to obtain the camera rotation matrix, the method further includes: determining rotation angles of the camera in the x direction, the y direction and the z direction respectively according to the camera rotation matrix; determining the actual position of the camera according to the rotation angle; wherein the actual position of the camera is the position of the camera in the real three-dimensional space.
Specifically, the rotation angles of the camera in the x, y and z directions may be determined according to the camera rotation matrix as follows: r3 in the camera rotation matrix R = (r1 r2 r3) is substituted into the corresponding formulas, e.g. β = sin^(-1)(r3(2)), to calculate the rotation angle α of the camera about the z axis and the rotation angle β of the camera, and the rotation angle γ of the camera about the x axis is determined from R = (r1 r2 r3). The actual position of the camera can then be determined from the rotation angles [α β γ], i.e. the actual camera position [X Y Z 1] of the camera in the real three-dimensional space. The actual position of the camera can be used for determining the camera pose and performing dimension conversion on the object to be fused.
Therefore, by implementing the optional embodiment, the actual position of the camera in the real space can be determined according to the camera rotation matrix, so that the camera posture can be favorably determined, and the subsequent object fusion effect is improved.
As an alternative embodiment, the calculation of the camera translation vector from vanishing points characterizing different directions comprises: determining coordinates respectively corresponding to vanishing points used for representing different directions; and calculating a homography matrix according to the coordinates and calculating a translation vector of the camera according to the homography matrix.
The vanishing points Vx and Vy used for characterizing different directions are obtained by cross-multiplying the straight line equations in the corresponding directions, e.g. Vx = (-43179712, 2756165, 5347) and Vy = (-13688880, 44914255, -40825). In addition, the camera translation vector is used for representing the translation of the camera relative to the world coordinate system, and the homography matrix is a plane homography matrix used for representing the perspective transformation between a plane in the real world and the corresponding image.
Specifically, the homography matrix may be calculated according to the coordinates and the camera translation vector may be calculated according to the homography matrix as follows: the homography matrix H is calculated according to the coordinates, and the camera translation vector t is then calculated from the homography matrix H, where ‖r1‖ = ‖r2‖ = 1.
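The exact expression for t is not reproduced in the text above, so the sketch below uses the commonly taught plane-homography factorization H = s · K · [r1 r2 t] (an assumption, offered only as one way to realize this step): the scale is fixed so that ‖r1‖ = ‖r2‖ = 1, and the translation vector is the rescaled third column.

```python
import numpy as np

def translation_from_homography(K, H):
    """Recover the camera translation vector from a plane homography.

    Assumes the standard factorization H = s * K @ [r1 r2 t] for a world
    plane at z = 0; the scale s is chosen so that ||r1|| = ||r2|| = 1.
    This is a textbook decomposition, not necessarily the exact expression
    used in the patent.
    """
    A = np.linalg.inv(K) @ H          # A = s * [r1 r2 t]
    s = (np.linalg.norm(A[:, 0]) + np.linalg.norm(A[:, 1])) / 2.0
    t = A[:, 2] / s                   # translation vector, up to the plane scale
    return t
```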
therefore, by implementing the optional embodiment, the translation vector of the camera can be obtained through calculation, and the camera posture can be determined according to the translation vector, so that the object to be fused can be fused according to the camera posture.
In addition, the camera pose may be calculated according to the camera translation vector and the camera rotation matrix as follows: based on the projection expression

s · [Uimg, Vimg, 1]^T = K · (R · [X, Y, Z]^T + t)

the camera rotation matrix R = (r1 r2 r3) and the camera translation vector t relate the actual camera position [X Y Z 1] to the image coordinates, where Uimg and Vimg represent the horizontal and vertical coordinates, in the two-dimensional image plane, of the fusion position of the object to be fused; further, the camera pose can be determined from R and t, e.g. as the pose matrix [R t; 0 1].
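A short sketch of how R and t can be assembled into a pose matrix and used to map a world point to the image; variable names are illustrative, and the 4x4 layout [R t; 0 1] is the usual convention assumed here.

```python
import numpy as np

def compose_pose(R, t):
    """4x4 camera pose matrix [R t; 0 1] from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float)
    return T

def project_point(K, R, t, Xw):
    """Project a world point Xw = (X, Y, Z) to image coordinates (Uimg, Vimg)."""
    Xc = R @ np.asarray(Xw, dtype=float) + t   # world -> camera coordinates
    uvw = K @ Xc                               # camera -> homogeneous image coordinates
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```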
In step S340, the object to be fused is fused into the target video frame according to the camera pose.
In this embodiment of the application, optionally, if the object to be fused is a three-dimensional object, fusing the object to be fused into the target video frame according to the camera pose includes: performing dimension conversion on the object to be fused according to the camera pose, where the object to be fused after the dimension conversion is a two-dimensional object (that is, the object in three-dimensional space is mapped into a two-dimensional image through the above expression according to the internal reference matrix and the calculated camera pose matrix, i.e. the 3D object material is rendered to generate two-dimensional pixel points corresponding to the video frame); and fusing the object to be fused after the dimension conversion into the target video frame.
Specifically, the object to be fused may be dimension-converted according to the camera pose, so that the object to be fused after the dimension conversion is a two-dimensional object, as follows: the object to be fused is rendered according to the camera pose to generate two-dimensional pixel points corresponding to the target video frame; the camera parameters are used to characterize the camera configuration and related parameters (e.g., aperture size, depth of field) and the like. Further, the method may further include the following operation: when an adjustment operation for the object to be fused is detected, the adjustment operation is responded to, where the adjustment operation may include a size adjustment operation, a position adjustment operation, and the like; the embodiment of the application is not limited in this respect. Further, the object to be fused after the dimension conversion may be fused into the target video frame. Referring to fig. 11, fig. 11 schematically illustrates an object fusion result diagram according to an embodiment of the present application, showing the target video frame fused with the object to be fused.
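As a rough illustration of the dimension conversion and fusion, the sketch below projects the vertices of a 3D object with the pose obtained above and writes the resulting pixels into a copy of the target video frame; real rendering of 3D object material would of course involve rasterization, depth testing and shading, which are omitted here.

```python
import numpy as np

def fuse_object_into_frame(frame, K, R, t, object_points, color=(0, 255, 0)):
    """Minimal fusion sketch: project 3D object points and paint them into the frame.

    frame         : (H, W, 3) uint8 image (the target video frame).
    object_points : (N, 3) array of 3D points of the object to be fused.
    """
    H, W = frame.shape[:2]
    out = frame.copy()
    for Xw in object_points:
        Xc = R @ Xw + t                       # dimension conversion: 3D -> camera space
        if Xc[2] <= 0:                        # behind the camera, skip
            continue
        u, v, w = K @ Xc
        x, y = int(round(u / w)), int(round(v / w))
        if 0 <= x < W and 0 <= y < H:
            out[y, x] = color                 # fuse the projected pixel into the frame
    return out
```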
Therefore, by implementing the optional embodiment, the dimension of the object to be fused can be converted, so that the object can be fused in the target video frame more easily, and the object fusion effect is improved.
In this embodiment of the application, optionally, after the object to be fused is fused into the target video frame according to the camera pose, the method further includes: determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame; and the time points corresponding to the other video frames are later than the target video frame.
Specifically, the manner of determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame may specifically be: determining a vanishing point straight line of an adjacent next video frame in the video file according to the vanishing point straight line of the target video frame, and further determining the vanishing point straight line in each video frame after the next video frame according to the vanishing point straight line in the previous video frame; the time points corresponding to the other video frames may be the playing time corresponding to each of the other video frames. In addition, it should be noted that the number of other video frames in the video file may be a preset number (e.g., 200 frames), or may also be a user-defined number, which may be beneficial for a user to perform object fusion on a video clip required in the video file.
Therefore, the implementation of the optional embodiment can realize the tracking of the vanishing point straight line in the video frames, and a user only needs to determine the vanishing point straight line in one video frame without determining the vanishing point straight line in each frame, so that the operation can be simplified, and the use experience of the user can be improved.
In this optional embodiment, determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight line of the target video frame includes: determining a vanishing point straight line of a first video frame in other video frames according to the vanishing point straight line of the target video frame; wherein the first video frame is adjacent to the target video frame; determining vanishing point straight lines of second video frames in other video frames according to the vanishing point straight lines of the first video frame until the vanishing point straight lines of all other video frames are determined; the first video frame is adjacent to the target video frame, and the second video frame is adjacent to the first video frame.
Determining a vanishing point straight line of a first video frame in other video frames according to the vanishing point straight line of the target video frame, wherein the determining comprises the following steps: determining all structural straight lines in the first video frame; calculating the distances between all the structural straight lines and vanishing point straight lines of the target video frame respectively; and determining the structural straight line corresponding to the shortest distance as the vanishing point straight line of the first video frame.
Specifically, the distances between all the structural straight lines and the vanishing point straight lines of the target video frame may be calculated as follows: the distance d between each structural straight line and a vanishing point straight line of the target video frame is calculated from the endpoint coordinates, e.g. as the sum of the distances from the endpoints (xi, yi) of the structural straight line to the vanishing point straight line L, where xi and yi represent the horizontal and vertical coordinates, L represents a straight line, i represents the i-th endpoint of the straight line, and i is a positive integer.
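A minimal tracking sketch follows, assuming the distance between a candidate structural straight line and the tracked vanishing point straight line is taken as the summed point-to-line distance of its two endpoints (one plausible reading of the expression above):

```python
import numpy as np

def point_line_distance(x, y, line):
    """Distance from point (x, y) to a homogeneous line (a, b, c) with ax + by + c = 0."""
    a, b, c = line
    return abs(a * x + b * y + c) / np.hypot(a, b)

def track_vanishing_line(prev_line, candidate_segments):
    """Pick, among the structural straight lines of the next frame, the one closest
    to the vanishing point straight line of the previous frame.

    prev_line          : homogeneous line (a, b, c) from the previous frame.
    candidate_segments : list of ((x1, y1), (x2, y2)) structural straight lines.
    """
    def segment_distance(seg):
        (x1, y1), (x2, y2) = seg
        return point_line_distance(x1, y1, prev_line) + point_line_distance(x2, y2, prev_line)

    return min(candidate_segments, key=segment_distance)
```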
Therefore, by implementing the optional embodiment, vanishing point straight lines in other video frames can be determined according to vanishing point straight lines in the target video frame in a straight line tracking manner, so that the user operation can be simplified, and the use experience of the user can be improved.
In this embodiment of the present application, optionally, after determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight line of the target video frame, the method further includes: determining camera postures corresponding to other video frames according to vanishing point straight lines of the other video frames; and respectively fusing the objects to be fused into other video frames according to the camera postures and the camera parameters corresponding to the other video frames.
Specifically, the manner of determining the camera pose corresponding to the other video frame according to the vanishing point straight line of the other video frame may specifically be: and determining the camera pose corresponding to each other video frame according to the vanishing point straight lines of other video frames and the camera pose of the target video frame. In addition, for fusing the object to be fused to other video frames according to the camera pose and the camera parameters corresponding to the other video frames, the camera parameters may include a camera internal parameter matrix.
Therefore, the optional embodiment can be implemented to perform object fusion on other video frames according to the target video frame, so that the object fusion efficiency is improved.
Referring to fig. 12, fig. 12 schematically illustrates a flow diagram of an object fusion method according to an embodiment of the present application. As shown in fig. 12, the object fusion method includes: step S1200 to step S1270, wherein:
step S1200: and determining a target video frame from the video file according to the detected video frame selection operation, performing feature extraction on the target video frame to obtain a feature map of the target video frame, and predicting according to the feature map to obtain a straight line connection point corresponding to the target video frame.
Step S1210: and predicting a reference structure straight line according to the straight line connecting points, screening the reference structure straight line according to the characteristic diagram to obtain a structure straight line of the target video frame, and further determining straight line equations respectively corresponding to vanishing point straight lines respectively used for representing different directions in the target video frame.
Step S1220: and calculating vanishing points for representing different directions in the target video frame according to a linear equation.
Step S1230: determining a reference vector in the x direction according to the camera internal reference matrix, the vanishing point in the x direction and a preset adjusting factor, determining a reference vector in the y direction according to the camera internal reference matrix, the vanishing point in the y direction and the preset adjusting factor, further calculating a cross product result of the reference vector in the x direction and the reference vector in the y direction, and combining the reference vector in the x direction, the reference vector in the y direction and the cross product result to obtain the camera rotation matrix.
Step S1240: determining coordinates respectively corresponding to vanishing points used for representing different directions, calculating a homography matrix according to the coordinates, calculating a camera translation vector according to the homography matrix, further calculating a camera posture according to the camera translation vector and the camera rotation matrix, performing dimension conversion on an object to be fused according to the camera posture, wherein the object to be fused after the dimension conversion is a two-dimensional object, and fusing the object to be fused after the dimension conversion into a target video frame.
Step S1250: determining all structural straight lines in the first video frame, calculating the distances between all the structural straight lines and the vanishing point straight lines of the target video frame respectively, and further determining the structural straight line corresponding to the shortest distance as the vanishing point straight line of the first video frame; the first video frame is adjacent to the target video frame, and the corresponding time points of other video frames are later than the target video frame.
Step S1260: and determining the vanishing point straight lines of the second video frames in other video frames according to the vanishing point straight lines of the first video frame until the vanishing point straight lines of all other video frames are determined.
Step S1270: and determining camera postures corresponding to other video frames according to vanishing point straight lines of the other video frames, and respectively fusing the object to be fused into the other video frames according to the camera postures and the camera parameters corresponding to the other video frames.
It should be noted that steps S1200 to S1270 correspond to the steps and embodiments shown in fig. 3, and for the specific implementation of steps S1200 to S1270, please refer to the steps and embodiments shown in fig. 3, which are not described herein again.
It can be seen that, by implementing the method shown in fig. 12, the camera pose can be determined based on the vanishing point straight line of the target video frame that needs to be subjected to object fusion, and then the object to be fused is fused into the target video frame based on the camera pose, so that the fusion degree between the object to be fused and the video file is improved, and the fusion effect is improved. In addition, the vanishing point straight line of the target video frame can be determined through the straight line selection operation, and the interaction with the user is enhanced.
Further, in the present exemplary embodiment, an object fusion apparatus is also provided. Referring to fig. 13, the object fusion apparatus 1300 may include a straight line determination unit 1301, a camera pose determination unit 1302, and an object fusion unit 1303, wherein:
a straight line determining unit 1301, configured to determine a structural straight line of a target video frame in a video file;
the straight line determining unit 1301 is further configured to determine a vanishing point straight line of the target video frame from the structural straight line according to a straight line selecting operation;
a camera pose determination unit 1302, configured to determine a camera pose according to a vanishing point straight line of a target video frame;
and an object fusion unit 1303, configured to fuse the object to be fused into the target video frame according to the camera pose.
It can be seen that, by implementing the apparatus shown in fig. 13, the camera pose can be determined based on the vanishing point straight line of the target video frame that needs to be subject fused, and then the subject to be fused is fused into the target video frame based on the camera pose, so as to improve the fusion degree between the subject to be fused and the video file, and improve the fusion effect. In addition, the vanishing point straight line of the target video frame can be determined through the straight line selection operation, and the interaction with the user is enhanced.
In an exemplary embodiment of the present application, the apparatus further includes a video frame selecting unit (not shown), wherein:
and the video frame selecting unit is configured to determine the target video frame from the video file according to the detected video frame selecting operation before the straight line determining unit 1301 determines the structural straight line of the target video frame in the video file.
Therefore, the optional embodiment can provide a video frame selection function, enhance interactivity and be beneficial to improving the object fusion effect.
In an exemplary embodiment of the present application, the straight line determining unit 1301 determines a structural straight line of a target video frame in a video file, including:
extracting the features of the target video frame to obtain a feature map of the target video frame;
predicting to obtain a straight line connection point corresponding to the target video frame according to the feature map;
and generating a structural straight line according to the straight line connecting points.
Therefore, by implementing the optional embodiment, the straight line connecting point in the target video frame can be determined in a feature extraction mode, so that the structural straight line is generated according to the straight line connecting point, and further the subsequent object fusion is performed according to the structural straight line, and the fusion effect is improved.
In an exemplary embodiment of the present application, the straight line determining unit 1301 generates a structural straight line from the straight line connection point, including:
predicting a reference structure straight line according to the straight line connecting points;
and screening the reference structure straight line according to the characteristic diagram to obtain the structure straight line of the target video frame.
Therefore, by implementing the optional embodiment, the subsequent object fusion can be facilitated by predicting the structural straight line in the target video frame, and the fusion degree of the target video frame and the object can be improved by performing the object fusion according to the structural straight line, so that the fusion effect is improved.
In an exemplary embodiment of the present application, the camera pose determination unit 1302 determines the camera pose according to a vanishing point straight line of the target video frame, including:
determining vanishing points used for representing different directions in the target video frame according to vanishing point straight lines used for representing different directions in the target video frame;
determining a camera rotation matrix according to vanishing points used for representing different directions and the camera internal parameter matrix;
calculating a camera translation vector according to vanishing points used for representing different directions;
the camera pose is calculated from the camera translation vector and the camera rotation matrix.
The camera pose determining unit 1302 determines vanishing points representing different directions in the target video frame according to vanishing point straight lines representing different directions in the target video frame, including:
determining linear equations respectively corresponding to vanishing point straight lines respectively used for representing different directions in a target video frame;
and calculating vanishing points for representing different directions in the target video frame according to a linear equation.
Therefore, by implementing the optional embodiment, the vanishing points in the corresponding direction can be determined through the vanishing point straight line selected by the user, so that the fusion of the objects to be fused according to the vanishing points is facilitated, and the fusion effect is improved.
The vanishing points used for representing different directions are respectively used for representing the x direction and the y direction, and the camera pose determining unit 1302 determines the camera rotation matrix according to the vanishing points used for representing different directions and the camera internal reference matrix, including:
determining a reference vector in the x direction according to the camera internal reference matrix, the vanishing point in the x direction and a preset adjusting factor;
determining a reference vector in the y direction according to the camera internal reference matrix, the vanishing point in the y direction and a preset adjusting factor;
calculating a cross product result of the reference vector in the x direction and the reference vector in the y direction;
and combining the reference vector in the x direction, the reference vector in the y direction and the cross multiplication result to obtain a camera rotation matrix.
Therefore, by implementing the optional embodiment, the camera rotation matrix can be obtained according to vanishing point calculation, and the rotation matrix is an important element in the camera attitude matrix, so that more accurate camera attitude can be determined by determining the rotation matrix, and the object fusion effect can be improved.
After the camera pose determining unit 1302 combines the reference vector in the x direction, the reference vector in the y direction, and the cross-product result to obtain the camera rotation matrix, the apparatus further includes a rotation angle determining unit (not shown) and a position determining unit (not shown), wherein:
a rotation angle determining unit for determining rotation angles of the camera in the x direction, the y direction and the z direction according to the camera rotation matrix;
a position determination unit for determining the actual position of the camera according to the rotation angle; wherein the actual position of the camera is the position of the camera in the real three-dimensional space.
Therefore, by implementing the optional embodiment, the actual position of the camera in the real space can be determined according to the camera rotation matrix, so that the camera posture can be favorably determined, and the subsequent object fusion effect is improved.
The camera pose determination unit 1302 calculates a camera translation vector according to vanishing points representing different directions, including:
determining coordinates respectively corresponding to vanishing points used for representing different directions;
and calculating a homography matrix according to the coordinates and calculating a translation vector of the camera according to the homography matrix.
Therefore, by implementing the optional embodiment, the translation vector of the camera can be obtained through calculation, and the camera posture can be determined according to the translation vector, so that the object to be fused can be fused according to the camera posture.
In an exemplary embodiment of the present application, if the object to be fused is a three-dimensional object, the fusing the object to be fused into the target video frame according to the camera pose by the object fusion unit 1303, including:
performing dimension conversion on an object to be fused according to the camera posture, wherein the object to be fused after the dimension conversion is a two-dimensional object;
and fusing the object to be fused after the dimension conversion into the target video frame.
Therefore, by implementing the optional embodiment, the dimension of the object to be fused can be converted, so that the object can be fused in the target video frame more easily, and the object fusion effect is improved.
In an exemplary embodiment of the present application, the straight line determining unit 1301 is further configured to determine vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame after the object fusing unit 1303 fuses the object to be fused into the target video frame according to the camera pose; and the time points corresponding to the other video frames are later than the target video frame.
Therefore, the implementation of the optional embodiment can realize the tracking of the vanishing point straight line in the video frames, and a user only needs to determine the vanishing point straight line in one video frame without determining the vanishing point straight line in each frame, so that the operation can be simplified, and the use experience of the user can be improved.
In an exemplary embodiment of the present application, the straight line determining unit 1301 determines vanishing point straight lines of other video frames in the video file according to the vanishing point straight line of the target video frame, including:
determining a vanishing point straight line of a first video frame in other video frames according to the vanishing point straight line of the target video frame; wherein the first video frame is adjacent to the target video frame;
determining vanishing point straight lines of second video frames in other video frames according to the vanishing point straight lines of the first video frame until the vanishing point straight lines of all other video frames are determined;
the first video frame is adjacent to the target video frame, and the second video frame is adjacent to the first video frame.
Specifically, the straight line determining unit 1301 determines a vanishing point straight line of a first video frame in other video frames according to the vanishing point straight line of the target video frame, including:
determining all structural straight lines in the first video frame;
calculating the distances between all the structural straight lines and vanishing point straight lines of the target video frame respectively;
and determining the structural straight line corresponding to the shortest distance as the vanishing point straight line of the first video frame.
Therefore, by implementing the optional embodiment, vanishing point straight lines in other video frames can be determined according to vanishing point straight lines in the target video frame in a straight line tracking manner, so that the user operation can be simplified, and the use experience of the user can be improved.
In an exemplary embodiment of the present application, the camera pose determining unit 1302 is further configured to, after the straight line determining unit 1301 determines the vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame, determine the camera poses corresponding to the other video frames according to the vanishing point straight lines of the other video frames;
and the object fusion unit 1303 is further configured to fuse the objects to be fused into the other video frames according to the camera poses and the camera parameters corresponding to the other video frames.
Therefore, the optional embodiment can be implemented to perform object fusion on other video frames according to the target video frame, so that the object fusion efficiency is improved.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the object fusion method described above for the details that are not disclosed in the embodiments of the apparatus of the present application, because each functional module of the object fusion apparatus of the exemplary embodiment of the present application corresponds to a step of the exemplary embodiment of the object fusion method described above.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An object fusion method, comprising:
determining a structural straight line of a target video frame in a video file;
determining a vanishing point straight line of the target video frame from the structural straight lines according to straight line selection operation;
determining the camera attitude according to the vanishing point straight line of the target video frame;
and fusing the object to be fused into the target video frame according to the camera posture.
2. The method of claim 1, wherein determining a structural line of a target video frame in a video file comprises:
extracting the features of the target video frame to obtain a feature map of the target video frame;
predicting to obtain a straight line connection point corresponding to the target video frame according to the feature map;
and generating the structural straight line according to the straight line connecting point.
3. The method of claim 2, wherein generating the structural line from the line connection points comprises:
predicting a reference structure straight line according to the straight line connecting points;
and screening the reference structure straight line according to the characteristic diagram to obtain the structure straight line of the target video frame.
4. The method of claim 1, wherein determining a camera pose from a vanishing point line of the target video frame comprises:
determining vanishing points used for representing different directions in the target video frame according to vanishing point straight lines used for representing different directions in the target video frame;
determining a camera rotation matrix according to the vanishing points used for representing different directions and the camera internal parameter matrix;
calculating a translation vector of the camera according to the vanishing points used for representing different directions;
calculating a camera pose from the camera translation vector and the camera rotation matrix.
5. The method of claim 4, wherein determining vanishing points in the target video frame for representing different directions according to vanishing point straight lines in the target video frame for representing different directions respectively comprises:
determining linear equations respectively corresponding to vanishing point straight lines respectively used for representing different directions in the target video frame;
and calculating vanishing points which are used for representing different directions in the target video frame according to the linear equation.
6. The method of claim 4, wherein the vanishing points characterizing different directions are respectively used for characterizing an x-direction and a y-direction, and wherein determining a camera rotation matrix from the vanishing points characterizing different directions and a camera internal reference matrix comprises:
determining a reference vector in the x direction according to the camera internal reference matrix, the vanishing point in the x direction and a preset adjusting factor;
determining a reference vector of the y direction according to the camera internal reference matrix, the vanishing point of the y direction and the preset adjusting factor;
calculating a cross product result of the reference vector in the x direction and the reference vector in the y direction;
and combining the reference vector in the x direction, the reference vector in the y direction and the cross multiplication result to obtain the camera rotation matrix.
7. The method of claim 4, wherein computing a camera translation vector from the vanishing points characterizing different directions comprises:
determining coordinates respectively corresponding to the vanishing points used for representing different directions;
and calculating a homography matrix according to the coordinates and calculating the camera translation vector according to the homography matrix.
8. The method according to claim 4, wherein if the object to be fused is a three-dimensional object, fusing the object to be fused to the target video frame according to the camera pose comprises:
performing dimension conversion on the object to be fused according to the camera posture, wherein the object to be fused after the dimension conversion is a two-dimensional object;
and fusing the object to be fused after the dimension conversion into the target video frame.
9. The method of claim 1, wherein after fusing the object to be fused into the target video frame according to the camera pose, the method further comprises:
determining vanishing point straight lines of other video frames in the video file according to the vanishing point straight lines of the target video frame; and the time points corresponding to the other video frames are later than the target video frame.
10. The method of claim 9, wherein determining vanishing point lines for other video frames in the video file from the vanishing point lines for the target video frame comprises:
determining a vanishing point straight line of a first video frame in the other video frames according to the vanishing point straight line of the target video frame; wherein the first video frame is adjacent to the target video frame;
determining vanishing point straight lines of second video frames in the other video frames according to the vanishing point straight lines of the first video frame until the vanishing point straight lines of all the other video frames are determined;
wherein the first video frame is adjacent to the target video frame and the second video frame is adjacent to the first video frame.
11. The method of claim 10, wherein determining the vanishing point straight line of the first video frame of the other video frames according to the vanishing point straight line of the target video frame comprises:
determining all structural straight lines in the first video frame;
calculating the distances between all the structural straight lines and vanishing point straight lines of the target video frame respectively;
and determining the structural straight line corresponding to the shortest distance as the vanishing point straight line of the first video frame.
12. The method of claim 11, wherein after determining vanishing point lines of other video frames in the video file according to the vanishing point lines of the target video frame, the method further comprises:
determining camera postures corresponding to the other video frames according to the vanishing point straight lines of the other video frames;
and respectively fusing the objects to be fused into the other video frames according to the camera postures corresponding to the other video frames and the camera parameters.
13. An object fusion apparatus, comprising:
the straight line determining unit is used for determining a structural straight line of a target video frame in the video file;
the straight line determining unit is further used for determining a vanishing point straight line of the target video frame from the structural straight lines according to straight line selection operation;
the camera attitude determination unit is used for determining the camera attitude according to the vanishing point straight line of the target video frame;
and the object fusion unit is used for fusing the object to be fused into the target video frame according to the camera posture.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-12.
15. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-12 via execution of the executable instructions.
CN202010601645.6A 2020-06-28 2020-06-28 Object fusion method and device, computer-readable storage medium and electronic equipment Active CN111652831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010601645.6A CN111652831B (en) 2020-06-28 2020-06-28 Object fusion method and device, computer-readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010601645.6A CN111652831B (en) 2020-06-28 2020-06-28 Object fusion method and device, computer-readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111652831A true CN111652831A (en) 2020-09-11
CN111652831B CN111652831B (en) 2022-04-19

Family

ID=72348481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010601645.6A Active CN111652831B (en) 2020-06-28 2020-06-28 Object fusion method and device, computer-readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111652831B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894366A (en) * 2009-05-21 2010-11-24 北京中星微电子有限公司 Method and device for acquiring calibration parameters and video monitoring system
CN109405829A (en) * 2018-08-28 2019-03-01 桂林电子科技大学 Pedestrian's method for self-locating based on smart phone audio-video Multi-source Information Fusion
CN110599605A (en) * 2019-09-10 2019-12-20 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN110930459A (en) * 2019-10-29 2020-03-27 北京经纬恒润科技有限公司 Vanishing point extraction method, camera calibration method and storage medium
CN110926405A (en) * 2019-12-04 2020-03-27 中科新松有限公司 ARV attitude measurement method based on monocular vision vanishing point detection

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070910A (en) * 2020-11-13 2020-12-11 蚂蚁智信(杭州)信息技术有限公司 Data processing method and device
CN112070910B (en) * 2020-11-13 2021-03-05 蚂蚁智信(杭州)信息技术有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN111652831B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN106846497B (en) Method and device for presenting three-dimensional map applied to terminal
CN110163076A (en) A kind of image processing method and relevant apparatus
CN111325271B (en) Image classification method and device
CN111931877B (en) Target detection method, device, equipment and storage medium
WO2021093679A1 (en) Visual positioning method and device
CN113190757A (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
US20150113474A1 (en) Techniques for navigation among multiple images
US10726614B2 (en) Methods and systems for changing virtual models with elevation information from real world image processing
CN113220251B (en) Object display method, device, electronic equipment and storage medium
CN113313832B (en) Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
CN112954450A (en) Video processing method and device, electronic equipment and storage medium
CN113705520A (en) Motion capture method and device and server
CN108597034B (en) Method and apparatus for generating information
CN114511661A (en) Image rendering method and device, electronic equipment and storage medium
CN114117128A (en) Method, system and equipment for video annotation
CN112733641A (en) Object size measuring method, device, equipment and storage medium
CN114565916A (en) Target detection model training method, target detection method and electronic equipment
CN111652831B (en) Object fusion method and device, computer-readable storage medium and electronic equipment
US20080111814A1 (en) Geometric tagging
Cui et al. Fusing surveillance videos and three‐dimensional scene: A mixed reality system
CN111984803B (en) Multimedia resource processing method and device, computer equipment and storage medium
CN113284237A (en) Three-dimensional reconstruction method, system, electronic equipment and storage medium
CN112085842B (en) Depth value determining method and device, electronic equipment and storage medium
KR102572415B1 (en) Method and apparatus for creating a natural three-dimensional digital twin through verification of a reference image
CN109145681B (en) Method and device for judging target rotation direction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant