CN116456039A - Video synthesis method, device and system

Info

Publication number
CN116456039A
CN116456039A
Authority
CN
China
Prior art keywords
imaging
target object
video
video image
target
Prior art date
Legal status
Pending
Application number
CN202210022166.8A
Other languages
Chinese (zh)
Inventor
张莉娜
张明
屈小刚
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210022166.8A
Priority to PCT/CN2023/071293 (published as WO2023131327A1)
Publication of CN116456039A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing
    • H04N5/2624 Studio circuits for obtaining an image which is composed of whole input images, e.g. splitscreen

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a video synthesis method, device and system, belonging to the technical field of video processing. At each of a plurality of acquisition times, a management device acquires N frames of video images captured by N cameras deployed in a target scene. From the N frames of video images corresponding to each acquisition time, the management device acquires one target video image that includes an imaging of a target object, and then synthesizes a video stream corresponding to the target object from the multiple target video images corresponding to the multiple acquisition times. The cameras capture clear video images of different areas of the target scene. Because the management device selects, for each acquisition time, one video image containing an imaging of the target object for video synthesis, some camera can always capture a clear moving picture of the target object as it moves through the shooting areas of different cameras, so the synthesized video stream can provide a clear picture of the target object's movement throughout the target scene.

Description

Video synthesis method, device and system
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video synthesis method, apparatus, and system.
Background
Video is typically recorded while athletes train and compete, so that coaches can analyze their movement data and then tailor training programs to individual athletes. However, because sports grounds are generally large, a camera in a fixed position cannot clearly capture an athlete's movement over the full course; a video stream captured by a single fixed camera therefore cannot guarantee a clear, full-course motion picture of an individual athlete.
Disclosure of Invention
The application provides a video synthesis method, device and system, aiming to synthesize, for a single moving object, a video stream that provides clear pictures of the object's movement throughout a scene.
In a first aspect, a video synthesis method is provided, the method being applied to a management device. The management device acquires N frames of video images captured, at each of a plurality of acquisition times, by N cameras deployed in a target scene, where N ≥ 2. The management device acquires one target video image from the N frames of video images corresponding to each acquisition time, where the target video image includes an imaging of a target object. The management device then synthesizes a video stream corresponding to the target object from the multiple target video images corresponding to the multiple acquisition times, where the video stream reflects the activity information of the target object in the target scene.
In this application, a plurality of cameras are fixedly deployed in the target scene with different shooting areas, so each camera can capture clear video images of its own area of the target scene. The management device selects, from the frames captured by the cameras at the same acquisition time, one video image containing an imaging of the target object for video synthesis. Because each camera captures clear video images of its corresponding area, some camera can always capture a clear moving picture of the target object as it moves across the different shooting areas; the synthesized video stream therefore provides a clear picture of the target object's movement throughout the target scene, that is, the clarity of the target object's moving picture in the synthesized video stream is guaranteed. In addition, because the cameras are fixedly deployed, the camera parameters can be preset according to the required shooting areas and need not be adjusted during shooting, so the implementation is simple.
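To make the flow concrete, the following minimal Python sketch (not part of the patent) arranges one selected frame per acquisition time into a stream; contains_target and imaging_quality are hypothetical placeholders for the detection and imaging-effect-scoring steps described below.

```python
# Minimal sketch of the first-aspect flow; contains_target() and
# imaging_quality() are hypothetical placeholders for the detection and
# scoring steps described in this document.
from typing import Callable, List, Sequence

def synthesize_target_stream(
    frames_per_instant: Sequence[Sequence[object]],  # N frames per acquisition time
    contains_target: Callable[[object], bool],
    imaging_quality: Callable[[object], float],
) -> List[object]:
    stream: List[object] = []
    for n_frames in frames_per_instant:              # one acquisition time
        candidates = [f for f in n_frames if contains_target(f)]
        if candidates:
            # keep the frame whose imaging of the target object is best
            stream.append(max(candidates, key=imaging_quality))
    return stream                                    # already in time order
```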
Optionally, the implementation manner of obtaining a frame of target video image from the N frames of video images corresponding to each acquisition time by the management device includes: the management equipment acquires all the video images to be selected including the imaging of the target object in the N frames of video images corresponding to each acquisition time, and then acquires the target video image from all the video images to be selected.
Optionally, the N cameras include a first camera and a second camera, the first camera and the second camera having a common view area. The implementation manner of acquiring all the video images to be selected including the imaging of the target object in the N frames of video images corresponding to each acquisition time by the management equipment comprises the following steps: when a target object is located in a common view area of a first camera and a second camera at a first acquisition time, the management device takes a first video image acquired by the first camera at the first acquisition time and a second video image acquired by the second camera at the first acquisition time as to-be-selected video images corresponding to the first acquisition time.
Accordingly, the implementation manner of the management device to acquire the target video image from all the video images to be selected may include: the management device obtains a first imaging of the target object in a first video image and a second imaging of the target object in a second video image. In response to the imaging effect of the first imaging being better than the imaging effect of the second imaging, the management device takes the first video image as a target video image corresponding to the first acquisition time.
In the application, the management device may use, as the target video image, a video image that includes imaging of the target object and has an optimal imaging effect of the target object in N frames of video images acquired at the same acquisition time, so as to be used for synthesizing a video stream corresponding to the target object. The definition of the moving picture of the target object in the synthesized video stream can be further improved, so that the synthesized video stream better reflects the moving characteristic of the target object, and the moving characteristic of the target object can be analyzed.
Optionally, the imaging effect of the first imaging is better than that of the second imaging when one or more of the following conditions are satisfied: the imaging area of the first imaging is larger than that of the second imaging; the first imaging includes more bone points than the second imaging; or the confidence of the bone data of the first imaging is greater than that of the second imaging.
In general, a larger imaging area can represent more detail, and a greater number of imaged bone points or a higher confidence of the bone data better reflects the activity characteristics of the target object. Therefore, the larger the imaging area, the more bone points an imaging includes, and the higher the confidence of its bone data, the better its imaging effect can be judged to be.
Optionally, an implementation in which the management device obtains the second imaging of the target object in the second video image includes: after acquiring the first imaging of the target object in the first video image, the management device acquires a first imaging position of a first key point of the target object in the first video image. The management device determines a second imaging position of the first key point in the second video image from the first imaging position based on a pixel coordinate mapping relationship between the first camera and the second camera. The management device then determines the second imaging of the target object in the second video image from the second imaging position.
In this application, because the pixel coordinate mapping relationship between two adjacent cameras is predetermined, when the target object moves into the common-view area of two adjacent cameras, the management device can track and identify the target object across cameras according to the correlation between the geometric imaging positions of the target object in the video images captured by the two cameras. This approach does not depend on any unique feature of the target object and, through flexible deployment and calibration of the cameras, can be applied to a variety of scenes.
Optionally, M cameras are deployed in the target scene, any two adjacent cameras among the M cameras have a common-view area, M ≥ N, and the N cameras belong to the M cameras. The management device stores a plurality of homography matrices, each reflecting the pixel coordinate mapping relationship between a group of two adjacent cameras among the M cameras.
In this method, deploying more cameras in the target scene improves the accuracy of cross-camera tracking and identification of the target object, while selecting video images captured by fewer cameras for synthesis improves the fluency of the synthesized video stream. That is, setting M > N can ensure both the accuracy and the fluency of the synthesized video stream.
Alternatively, after acquiring a target video image, the management device may crop it so that the imaging of the target object is located in the central area of the cropped video image. Based on the multiple acquisition times, the management device then arranges the cropped video images in time order to obtain the video stream corresponding to the target object.
In this application, the management device may crop every acquired target video image so that the imaging of the target object is in the central area of all video images of the finally synthesized video stream. This achieves a focus-following effect on the target object, gives the synthesized video stream a better display effect, and makes the playback picture smoother and more coherent, thereby improving the viewing experience.
Optionally, the management device may further determine a horizontal position of the second key point of the target object under the world coordinate system according to an imaging position of the second key point in the target video image, and generate the motion trail of the target object according to the horizontal positions of the second key point under the world coordinate system at a plurality of acquisition moments respectively.
In the present application, after acquiring the bone data of the target object, the management device may further perform motion analysis on the target object based on the bone data, including but not limited to determining a motion track of the target object, calculating the number of steps of the target object, calculating the displacement of the target object, calculating the motion speed of the target object, and the like.
Optionally, after the management device acquires the target video image, the management device may further acquire an imaging position of a skeletal point of the target object in the target video image, and display a play screen of the video stream on the play interface, where the skeletal point of the target object is displayed on the imaging of the target object in the play screen.
In the application, when the video stream corresponding to the target object is synthesized, the management device can encapsulate the imaging position of the skeleton point of the target object and the corresponding video image code together, so that when the playing picture of the video stream corresponding to the target object is displayed, the skeleton point of the target object can be displayed on the imaging of the target object in the playing picture, and the analysis of the activity condition of the target object is facilitated.
In a second aspect, a management device is provided. The management device comprises a plurality of functional modules that interact to implement the method of the first aspect and embodiments thereof described above. The plurality of functional modules may be implemented based on software, hardware, or a combination of software and hardware, and the plurality of functional modules may be arbitrarily combined or divided based on the specific implementation.
In a third aspect, there is provided a management apparatus comprising: a processor and a memory;
the memory is used for storing a computer program, and the computer program comprises program instructions;
the processor is configured to invoke the computer program to implement the method in the first aspect and embodiments thereof.
In a fourth aspect, a computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the method of the first aspect and embodiments thereof described above.
In a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of the first aspect and embodiments thereof described above.
In a sixth aspect, a chip is provided, the chip comprising programmable logic circuits and/or program instructions, which when the chip is run, implement the method of the first aspect and embodiments thereof described above.
Drawings
Fig. 1 is a schematic structural diagram of a video compositing system according to an embodiment of the application;
FIG. 2 is a schematic diagram of the relative positions of two adjacent cameras according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a camera distribution position according to an embodiment of the present application;
fig. 4 is a schematic flow chart of a video synthesizing method according to an embodiment of the present application;
FIG. 5 is a schematic view of a distribution of skeletal points of a human body according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a mapping of pixel coordinates between two cameras according to an embodiment of the present application;
fig. 7 is a schematic diagram of comparison before and after cropping of a video image according to an embodiment of the present application;
fig. 8 is a schematic diagram of a playing interface according to an embodiment of the present application;
fig. 9 is a schematic diagram of a motion trajectory of a target object according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a management device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a management device according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a management device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a management device according to an embodiment of the present application;
fig. 14 is a block diagram of a management apparatus provided in an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a video compositing system according to an embodiment of the application. As shown in fig. 1, the video composition system includes: a media source 101 and a management device 102.
The media source 101 is used to provide multiple video streams. Referring to fig. 1, the media source 101 includes a plurality of cameras 1011. Each camera 1011 is used to capture a video stream. The times and frequencies at which the plurality of cameras 1011 capture images are the same. Alternatively, the synchronous shooting of the plurality of cameras 1011 may be achieved using a camera synchronization technique. The number of cameras in fig. 1 is used as an exemplary illustration only and not as a limitation on the video composition system provided by embodiments of the present application.
Optionally, any two adjacent cameras of the plurality of cameras 1011 have a common view area. Wherein, the two cameras have a common view area, which means that the shooting areas of the two cameras have a coincidence area. For example, fig. 2 is a schematic diagram of the relative positions of two adjacent cameras according to an embodiment of the present application. As shown in fig. 2, the photographing region of the camera a is a region a, and the photographing region of the camera B is a region B. The region a and the region B have a superimposed region c, which is a common view region of the camera a and the camera B.
Alternatively, the plurality of cameras 1011 may be arranged in an annular, fan-shaped, linear, or other irregular layout; the camera arrangement may be designed according to the actual deployment scenario. For example, if multiple cameras are used to capture video of athletes moving on a closed-loop speed skating track, the cameras may be deployed in an annular arrangement around the track. Fig. 3 is a schematic diagram of a camera distribution position according to an embodiment of the present application. As shown in fig. 3, 20 cameras, denoted cameras 1-20, are disposed near the speed skating track. The 20 cameras are in an annular arrangement, and their shooting directions face the track. Optionally, the union of the shooting areas of the 20 cameras completely covers the entire track, i.e., when an athlete moves on the track, at least one of the 20 cameras can capture a video image containing an imaging of the athlete at every acquisition time.
The management device 102 is configured to analyze the multiple video streams from the multiple cameras 1011 in the media source 101, extract the video images containing an imaging of a target object, and synthesize a video stream corresponding to the target object. Every frame of video image in this stream includes an imaging of the target object; the stream may also be called the synthesized video stream corresponding to the target object. Optionally, the frame rate of the video stream synthesized by the management device 102 equals the frequency at which the cameras 1011 capture images: since at least one camera covers the target object at each acquisition time while it moves in the scene where the cameras 1011 are deployed, the management device 102 can obtain from the multiple video streams, for each acquisition time, one video image containing an imaging of the target object, and finally synthesize a video stream whose frame rate equals the cameras' capture frequency. Alternatively, the management device 102 may be one device or multiple devices, for example a server, a server cluster of several servers, or a cloud computing service center.
Alternatively, the management device 102 may employ a target detection algorithm to identify a target object in a video image captured by a single camera and a target tracking algorithm to determine the imaging of the target object in a video image captured subsequently by the camera. When the target object moves from the shooting area of the camera to the common view area of the camera and the adjacent camera of the camera, the management device 102 can determine the imaging of the target object in the video images acquired by the adjacent cameras according to the correlation of imaging geometric positions of the target object in the video images acquired by the adjacent two cameras, so as to realize cross-camera tracking identification of the target object.
In the embodiment of the present application, the plurality of cameras 1011 in the media source 101 are all fixedly disposed, and the camera parameters of each camera are preset. In the shooting process, the shooting area and the shooting focus of each camera are fixed, so that the image coordinate system of each camera is fixed, and then the pixel coordinates of the imaging of the common view area of two adjacent cameras in the two adjacent cameras have a fixed mapping relation. The management device 102 may have stored therein a plurality of homography matrices, each homography matrix for reflecting a pixel coordinate mapping relationship between a set of two adjacent cameras. The homography matrix is understood here as a transformation matrix between the image coordinate systems of two adjacent cameras.
After completing the deployment and calibration of the multiple cameras, the management device 102 may generate the homography matrix reflecting the pixel coordinate mapping relationship between two adjacent cameras based on the pixel coordinates, in each of the two cameras' image coordinate systems, of a plurality of points in the cameras' common-view area. For example, referring to the example shown in fig. 3, let the homography matrix from camera 1 to camera 2 be H12, and let a marker point M lie in the common-view area of cameras 1 and 2. If the pixel coordinates of M in the video image captured by camera 1 are (x1m, y1m) and in the video image captured by camera 2 are (x2m, y2m), then (x2m, y2m) = H12 * (x1m, y1m). Likewise, let the homography matrix from camera 2 to camera 3 be H23, and let a marker point N lie in the common-view area of cameras 2 and 3; if the pixel coordinates of N are (x2n, y2n) in camera 2's image and (x3n, y3n) in camera 3's image, then (x3n, y3n) = H23 * (x2n, y2n). The image coordinate system is the coordinate system whose origin is the top-left vertex of the image captured by a camera, with the x-axis and y-axis along the length and width directions of the image.
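As an illustrative sketch only, the calibration and mapping just described could be implemented as follows; the use of OpenCV's findHomography is an assumption (the patent does not name an estimation method), and the marker coordinates are made up.

```python
# Sketch: estimate H12 from marker points in the common-view area of
# cameras 1 and 2, then map a pixel from camera 1 into camera 2.
# cv2.findHomography is an assumed implementation choice; the marker
# coordinates below are illustrative, not from the patent.
import numpy as np
import cv2

pts_cam1 = np.array([[100, 200], [400, 210], [380, 500], [120, 480]], dtype=np.float32)
pts_cam2 = np.array([[ 80, 190], [390, 205], [360, 495], [100, 470]], dtype=np.float32)

H12, _ = cv2.findHomography(pts_cam1, pts_cam2)  # 3x3, camera 1 -> camera 2

def map_point(H: np.ndarray, xy) -> tuple:
    """Apply a 3x3 homography to one pixel coordinate (homogeneous form)."""
    x, y, w = H @ np.array([xy[0], xy[1], 1.0])
    return (x / w, y / w)

x2m, y2m = map_point(H12, (100, 200))  # (x2m, y2m) = H12 * (x1m, y1m)
```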
Optionally, the management device 102 may also select any one of the M cameras and generate a transformation matrix from that camera's image coordinate system to the two-dimensional world coordinate system, that is, a homography matrix from the image coordinate system to the two-dimensional world coordinate system. For example, a plurality of markers can be placed in the camera's shooting area and their horizontal positions in the world coordinate system identified; the management device then computes the transformation matrix from the markers' horizontal positions in the world coordinate system and their pixel coordinates in the image coordinate system. The management device can then compute the transformation matrix from every camera's image coordinate system to the two-dimensional world coordinate system, based on the chosen camera's transformation matrix and the homography matrices reflecting the pixel coordinate mappings between adjacent cameras. For example, referring to the example shown in fig. 3, where the homography matrix from camera 1 to camera 2 is H12, from camera 2 to camera 3 is H23, and the transformation matrix from camera 2's image coordinate system to the two-dimensional world coordinate system is H2w, the transformation matrix H1w of camera 1 satisfies H1w = H12 × H2w, and the transformation matrix H3w of camera 3 satisfies H3w = H32 × H2w, where H32 is the inverse of H23. In general, given the transformation matrix Hiw from camera i's image coordinate system to the two-dimensional world coordinate system, the transformation matrix Hjw of camera j satisfies Hjw = Hji × Hiw, where Hji = Hj(j+1) × … × H(i-1)i if i > j, and Hji = Hj(j-1) × … × H(i+1)i if i < j, each H(k)(k-1) being the inverse of H(k-1)(k). Here i and j are positive integers, camera i is the i-th and camera j the j-th of the M cameras. The world coordinate system describes the position of a camera in the real world, as well as the real-world position of an object appearing in an image captured by the camera. The x-axis and y-axis of the world coordinate system lie in the horizontal plane and the z-axis is perpendicular to it. The two-dimensional world coordinate system in this embodiment refers to the horizontal coordinate system formed by the x-axis and y-axis, so a horizontal position in the world coordinate system may be represented by two-dimensional horizontal coordinates (x, y).
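A sketch of this chaining rule follows, under the assumption of a column-vector convention (so factors compose right-to-left, while the patent writes the products left-to-right):

```python
# Sketch of Hjw = Hji x Hiw: derive camera j's image->world transform by
# chaining adjacent-camera homographies through camera i. Column-vector
# convention is assumed, so factors compose right-to-left.
import numpy as np

def world_transform(j: int, i: int, Hiw: np.ndarray, adjacent: dict) -> np.ndarray:
    """adjacent[(k, k+1)] is the homography from camera k to camera k+1."""
    Hji = np.eye(3)
    if i > j:
        for k in range(j, i):                     # H_j(j+1), ..., H_(i-1)i
            Hji = adjacent[(k, k + 1)] @ Hji
    elif i < j:
        for k in range(j, i, -1):                 # inverses: H_k(k-1) = H_(k-1)k^-1
            Hji = np.linalg.inv(adjacent[(k - 1, k)]) @ Hji
    return Hiw @ Hji                              # camera j -> camera i -> world

# e.g. H3w from H2w: world_transform(3, 2, H2w, {(2, 3): H23}) = H2w @ inv(H23)
```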
Fig. 4 is a flow chart of a video synthesizing method according to an embodiment of the present application. The method may be applied to a management device 102 in a video composition system as shown in fig. 1. As shown in fig. 4, the method includes:
step 401, the management device acquires N frames of video images acquired by N cameras deployed in the target scene at each acquisition time of a plurality of acquisition times.
Wherein N ≥ 2. Optionally, M cameras are deployed in the target scene, and any two adjacent cameras among the M cameras have a common-view area, where M ≥ N. The management device stores a plurality of homography matrices, each reflecting the pixel coordinate mapping relationship between a group of two adjacent cameras among the M cameras. The N cameras belong to the M cameras. If M = N, the N cameras include all cameras deployed in the target scene; if M > N, the N cameras include a portion of the cameras deployed in the target scene, in which case two adjacent cameras among the N cameras may or may not have a common-view area. The N selected cameras are deployed uniformly in the target scene, so that the union of their shooting areas covers the whole target scene as far as possible.
For example, referring to the example shown in fig. 3, where the target scene is a speed skating scene, 20 cameras are deployed near the speed skating track (i.e., M = 20), and video images captured by 8 of them may be selected for synthesizing the video stream (i.e., N = 8); the 8 cameras may include, for example, camera 2, camera 4, camera 6, camera 9, camera 12, camera 14, camera 16, and camera 19. Assuming the track is 400 meters long, one camera can be selected every 50 meters.
Because a larger common-view area between two adjacent cameras yields a more accurate computed homography matrix for the pixel coordinate mapping between them, more cameras can be deployed in the target scene: a higher deployment density improves the accuracy of the computed homography matrices and hence of cross-camera tracking and identification of the target object. On the other hand, when the management device synthesizes the video stream, switching positions too frequently when selecting video images makes the viewing angle change too fast, degrading video fluency and harming the viewing experience.
In the embodiment of the application, the accuracy of cross-camera tracking and identifying of the target object can be improved by arranging more cameras in the target scene, and the fluency of the synthesized video stream can be improved by selecting video images acquired by fewer cameras for synthesizing the video stream. Thus, the accuracy and fluency of the synthesized video stream can be improved at the same time.
Step 402, the management device acquires a frame of target video image from the N frames of video images corresponding to each acquisition time, where the target video image includes imaging of a target object.
N frames of video images corresponding to each acquisition time come from N cameras respectively. Optionally, the target object is located within a shooting area of at least one of the N cameras at each acquisition instant.
Optionally, the implementation procedure of step 402 includes the following steps 4021 to 4022.
In step 4021, the management apparatus acquires all the candidate video images including the imaging of the target object in the N frames of video images corresponding to each acquisition time.
Optionally, when the target object is located in the shooting area of only one camera at a certain acquisition time, the N frames of video images corresponding to the acquisition time include one frame of video image to be selected. When a target object is located in shooting areas of two or more cameras at a certain acquisition time, the N frames of video images corresponding to the acquisition time comprise two or more frames of video images to be selected.
Optionally, the N cameras include a first camera and a second camera. The first camera and the second camera have a common view region. When a target object is located in a common view area of a first camera and a second camera at a first acquisition time, the management device takes a first video image acquired by the first camera at the first acquisition time and a second video image acquired by the second camera at the first acquisition time as to-be-selected video images corresponding to the first acquisition time.
In step 4022, the management apparatus acquires a frame of target video image from all the video images to be selected.
Optionally, if the number of the to-be-selected video images corresponding to a certain acquisition time acquired by the management device is greater than 1, the management device may use, as the target video image, the to-be-selected video image with the optimal imaging effect of the target object of all the to-be-selected video images corresponding to the acquisition time. Or, the management device may also use any one of all the video images to be selected corresponding to the acquisition time as the target video image.
Alternatively, in combination with the related description with reference to step 4021, the management device may acquire the first imaging of the target object in the first video image and the second imaging of the target object in the second video image after acquiring the first video image and the second video image. In response to the imaging effect of the first imaging being better than the imaging effect of the second imaging, the management device takes the first video image as a target video image corresponding to the first acquisition time.
Optionally, the imaging effect of the first imaging is better than that of the second imaging when one or more of the following conditions are satisfied: the imaging area of the first imaging is larger than that of the second imaging; the first imaging includes more bone points than the second imaging; or the confidence of the bone data of the first imaging is greater than that of the second imaging. The imaging area of the first imaging refers to the area of the target object's imaging in the first video image, and likewise for the second imaging in the second video image. The bone points counted for the first and second imaging are those directly imaged, not inferred ones. The confidence of the bone data is the overall confidence over all bone points, including both the bone points directly reflected in the imaging and those that are not; the positions of bone points not reflected in the imaging can be inferred by a related algorithm, and the confidence of such inferred bone points is generally lower.
In general, a larger imaging area can represent more detail, and a greater number of imaged bone points or a higher confidence of the bone data better reflects the activity characteristics of the target object. Therefore, the larger the imaging area, the more bone points an imaging includes, and the higher the confidence of its bone data, the better its imaging effect can be judged to be.
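A sketch of this selection rule follows; the patent does not fix a precedence among the three indicators, so the lexicographic ordering below is an assumption.

```python
# Sketch of choosing the target video image among candidates by imaging
# effect. The lexicographic ordering of the three indicators is an
# assumption; the patent lists them without a fixed precedence.
from dataclasses import dataclass
from typing import List

@dataclass
class CandidateImaging:
    camera_id: int
    imaging_area: float       # pixel area of the target object's imaging
    num_bone_points: int      # directly imaged (not inferred) bone points
    bone_confidence: float    # overall confidence of the bone data

def pick_target_image(candidates: List[CandidateImaging]) -> CandidateImaging:
    return max(candidates, key=lambda c: (c.imaging_area,
                                          c.num_bone_points,
                                          c.bone_confidence))

best = pick_target_image([CandidateImaging(1, 5200.0, 15, 0.91),
                          CandidateImaging(2, 4100.0, 17, 0.88)])
# -> camera 1 wins on imaging area under this assumed ordering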
Optionally, the target object is a human body. Skeletal points of the human body include, but are not limited to, the nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. For example, fig. 5 is a schematic diagram of a distribution of skeletal points of a human body according to an embodiment of the present application. As shown in fig. 5, the human body may include 17 skeletal points: nose 0, left eye 1, right eye 2, left ear 3, right ear 4, left shoulder 5, right shoulder 6, left elbow 7, right elbow 8, left wrist 9, right wrist 10, left hip 11, right hip 12, left knee 13, right knee 14, left ankle 15, and right ankle 16. The following examples in this application are described taking the target object being a human body as an example.
In this embodiment of the present application, the management device may use, as the target video image, a video image that includes imaging of the target object and has an optimal imaging effect of the target object in N frames of video images acquired at the same acquisition time, to be used for synthesizing a video stream corresponding to the target object. The definition of the moving picture of the target object in the synthesized video stream can be improved, so that the synthesized video stream better reflects the moving characteristic of the target object, and the moving characteristic of the target object can be analyzed.
Optionally, in this embodiment of the present application, taking an example that the target object reaches the photographing area of the first camera first and then reaches the photographing area of the second camera, a process of implementing the management apparatus to acquire the first imaging of the target object in the first video image and the second imaging of the target object in the second video image will be described, where the implementing process includes the following steps S11 to S14.
In step S11, the management apparatus acquires a first imaging of the target object in a first video image.
Optionally, the first camera is the first camera to track and identify the target object, and the management device may employ a target detection algorithm to identify the target object in the captured video images. After identifying the target object, the management device may also assign it a globally unique identifier and use this identifier to distinguish the imagings of the target object in the video images of the respective cameras. Finally, based on the identifier of the target object, multi-camera tracking and identification of the target object can be realized by unifying the global mapping relationship following the idea of a union-find algorithm. Alternatively, the first camera may not be the first camera to track and identify the target object, i.e., the target object moves from another camera's shooting area into the first camera's shooting area; in that case, the process by which the management device first acquires the imaging of the target object in a video image captured by the first camera may follow the process by which it acquires the second imaging of the target object in the second video image captured by the second camera.
After the target object reaches the shooting area of the first camera and the management device acquires the imaging of the target object in the video image acquired by the first camera, the management device can determine the imaging of the target object in the video image acquired by the first camera later by adopting a target tracking algorithm in the process that the target object moves in the shooting area of the first camera.
In this embodiment of the application, after acquiring the imaging of the target object in a video image, the management device may further determine the imaging position of each bone point of the target object and encapsulate the bone-point imaging positions together with the corresponding encoded video image for subsequent analysis. The imaging position of a bone point can be represented by pixel coordinates.
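One hypothetical shape for such a per-frame record is sketched below; all field names are illustrative, not from the patent.

```python
# Hypothetical per-frame record pairing the encoded video image with the
# bone-point imaging positions, as described above; field names are
# illustrative.
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class FrameRecord:
    capture_time_ms: int
    camera_id: int
    encoded_image: bytes                          # e.g. one encoded frame payload
    bone_points: Dict[str, Tuple[float, float]]   # bone point -> pixel (x, y)

record = FrameRecord(capture_time_ms=40, camera_id=2, encoded_image=b"...",
                     bone_points={"left_hip": (512.0, 384.0),
                                  "right_hip": (540.0, 386.0)})
```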
In step S12, the management apparatus acquires a first imaging position of a first key point of the target object in the first video image.
Alternatively, the first keypoint of the target object may be derived based on one or more skeletal points of the target object. For example, the height of the hip of the human body is taken as a calibration height, and the middle point of the left hip and the right hip of the human body (namely, the center point of the human body) can be taken as a first key point. The first keypoint may refer broadly to one or more keypoints.
In step S13, the management apparatus determines a second imaging position of the first keypoint in the second video image from the first imaging position based on the pixel coordinate mapping relationship between the first camera and the second camera.
Optionally, the first camera and the second camera are two adjacent cameras among the M cameras. The management device may determine the second imaging position of the first key point in the second video image from the first imaging position based on the homography matrix from the first camera to the second camera. For example, referring to the example shown in fig. 3, if the first camera is camera 1, the second camera is camera 2, the homography matrix from camera 1 to camera 2 is H12, and the first imaging position is (x1p, y1p), then the second imaging position (x2p, y2p) satisfies: (x2p, y2p) = H12 * (x1p, y1p).
Alternatively, the first camera and the second camera are not adjacent among the M cameras. Assuming a third camera lies between the first camera and the second camera, the management device may determine the second imaging position of the first key point in the second video image from the first imaging position based on the homography matrix from the first camera to the third camera and the homography matrix from the third camera to the second camera. For example, referring to the example shown in fig. 3, if the first camera is camera 1, the second camera is camera 3, the homography matrix from camera 1 to camera 2 is H12 and from camera 2 to camera 3 is H23, and the first imaging position is (x'1p, y'1p), then the second imaging position (x'2p, y'2p) satisfies: (x'2p, y'2p) = H12 * H23 * (x'1p, y'1p).
In step S14, the management device determines a second imaging of the target object in the second video image from the second imaging position.
Alternatively, the management device may determine a human body detection frame according to the second imaging position and take the human body imaging located in that detection frame in the second video image as the second imaging. Or the management device may detect all human body imagings in the second video image and take as the second imaging the one whose first-key-point imaging position is closest to the second imaging position. If the management device detects only one human body imaging in the second video image, that imaging can be taken directly as the second imaging of the target object, without performing steps S12 to S14 above.
For example, fig. 6 is a schematic diagram of mapping pixel coordinates between two cameras according to an embodiment of the present application. As shown in fig. 6, the left image is a first video image, and the right image is a second video image. The first video image includes a first image of the target object. The first imaging position of the first key point p in the first video image is p1, and the second imaging position of the first key point p in the second video image, which is obtained based on the pixel coordinate mapping relation between the first camera and the second camera, is p2. Referring to fig. 6, a second imaging of the target object in the second video image may be determined based on the second imaging position p2.
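A sketch of steps S12 to S14 under the assumptions above follows; the per-person key-point detections in the second video image are taken as given.

```python
# Sketch of steps S12-S14: project the first key point into the second
# camera's image with the homography, then take the detected person whose
# key point lands closest to that projected position. Detections are
# assumed given; distance-based matching follows the "closest" rule above.
import numpy as np

def map_point(H, xy):
    x, y, w = H @ np.array([xy[0], xy[1], 1.0])
    return np.array([x / w, y / w])

def match_across_cameras(H_1to2: np.ndarray, keypoint_cam1, detections_cam2) -> int:
    """detections_cam2: per-person first-key-point pixel positions (x, y).
    Returns the index of the detection identified as the target object."""
    p2 = map_point(H_1to2, keypoint_cam1)          # second imaging position
    dists = [np.linalg.norm(p2 - np.asarray(d)) for d in detections_cam2]
    return int(np.argmin(dists))
```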
Optionally, the target scene is an athletes' training or competition field. Multiple athletes are typically active in the target scene, so a video image captured by a single camera may include multiple person imagings. Constrained by the field scale and shooting angles, however, the cameras often cannot capture faces, so tracking and identifying a single target via face recognition is difficult. In this embodiment, by predetermining the pixel coordinate mapping relationship between two adjacent cameras, when the target object moves into the common-view area of the two cameras, the management device can track and identify the target object across cameras according to the correlation between the geometric imaging positions of the target object in the video images captured by the two cameras. This approach does not depend on any unique feature of the target object and, through flexible deployment and calibration of the cameras, can be applied to a variety of scenes.
Step 403, the management device synthesizes the video stream corresponding to the target object according to the multi-frame target video images corresponding to the multiple acquisition moments.
The video stream of the target object is used to reflect the activity information of the target object in the target scene. Video images in the video stream corresponding to the target object are arranged according to time sequence.
Alternatively, after acquiring a target video image, the management device may crop it so that the imaging of the target object is located in the central area of the cropped video image. Accordingly, the implementation of step 403 includes: based on the multiple acquisition times, the management device arranges the cropped video images in time order to obtain the video stream corresponding to the target object. Optionally, the size of the cropping window may be set in advance, and the management device crops the original video image with the imaging of the target object's body center point as the center of the cropping window. For example, fig. 7 is a schematic diagram of a comparison before and after cropping of a video image according to an embodiment of the present application. As shown in fig. 7, the imaging of the target object in the cropped video image is located in the central region of the image, so the cropped video image highlights the target object better than the uncropped one. The management device may crop every acquired target video image so that the imaging of the target object is in the central area of all video images of the finally synthesized video stream. This achieves a focus-following effect on the target object, gives the synthesized video stream a better display effect, and makes the playback picture smoother and more coherent, improving the viewing experience. In addition, the management device may also apply smoothing filtering or similar processing to the cropped video images.
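A sketch of the cropping step follows; the window size and the clamp-at-the-border policy are assumptions.

```python
# Sketch of cropping a target video image around the target object's body
# center; window size and the clamp-at-the-border policy are assumptions.
import numpy as np

def crop_centered(frame: np.ndarray, center_xy, crop_w: int = 640,
                  crop_h: int = 360) -> np.ndarray:
    h, w = frame.shape[:2]
    x0 = int(round(center_xy[0] - crop_w / 2))
    y0 = int(round(center_xy[1] - crop_h / 2))
    x0 = max(0, min(x0, w - crop_w))   # keep the window inside the image
    y0 = max(0, min(y0, h - crop_h))
    return frame[y0:y0 + crop_h, x0:x0 + crop_w]

# The cropped frames, arranged in acquisition-time order, form the output
# video stream with the target object kept in the central area.
```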
In this embodiment of the application, a plurality of cameras are fixedly deployed in the target scene with different shooting areas, so each camera can capture clear video images of its own area of the target scene. The management device selects, from the frames captured by the cameras at the same acquisition time, one video image containing an imaging of the target object for video synthesis. Because each camera captures clear video images of its corresponding area, some camera can always capture a clear moving picture of the target object as it moves across the different shooting areas; the synthesized video stream therefore provides a clear picture of the target object's movement throughout the target scene, that is, the clarity of the target object's moving picture in the synthesized video stream is guaranteed. In addition, because the cameras are fixedly deployed, the camera parameters can be preset according to the required shooting areas and need not be adjusted during shooting, so the implementation is simple.
Optionally, after the management device synthesizes the video stream of the target object, the following step 404 may also be performed.
Step 404, the management device outputs a video stream corresponding to the target object.
Optionally, the management device has a display function. The management device outputs the video stream corresponding to the target object, which may be that the management device displays a play screen of the video stream corresponding to the target object on the play interface. Optionally, the management device may further obtain an imaging position of a skeletal point of the target object in the target video image, and when the management device displays a play frame of the video stream corresponding to the target object on the play interface, the skeletal point of the target object may be displayed on the imaging of the target object in the play frame.
Optionally, the management device outputs the video stream corresponding to the target object, or the management device sends the video stream corresponding to the target object to the terminal, so that the terminal can display the playing picture of the video stream corresponding to the target object on the playing interface. For example, in response to receiving a play request from a terminal, the management apparatus transmits a video stream corresponding to a target object to the terminal. The play request may carry the identification of the target object.
For example, fig. 8 is a schematic diagram of a playing interface according to an embodiment of the present application. As shown in fig. 8, a play screen of the video stream corresponding to the target object is displayed on the play interface Z. Wherein a plurality of skeletal points are displayed on the imaging of the target object (only 9 skeletal points are shown as schematic illustrations).
In the embodiment of the application, when the management device synthesizes the video stream corresponding to the target object, the imaging position of the skeleton point of the target object and the corresponding video image code can be packaged together, so that when the playing picture of the video stream corresponding to the target object is displayed, the skeleton point of the target object can be displayed on the imaging of the target object in the playing picture, and the analysis of the activity condition of the target object is facilitated.
Optionally, after acquiring the skeleton data of the target object, the management device may further perform motion analysis on the target object based on the skeleton data, including but not limited to determining a motion trajectory of the target object, calculating a step number of the target object, calculating a displacement of the target object, calculating a motion speed of the target object, and the like. When the management device or the terminal displays the playing picture of the video stream corresponding to the target object, the real-time motion analysis result of the target object can be displayed in a superimposed manner on the playing picture, for example, the real-time motion track, the real-time step number, the real-time displacement, the real-time speed and the like of the target object are displayed in a superimposed manner on the playing picture, so that the motion analysis of the target object is further facilitated. Alternatively, the management apparatus may transmit the bone data of the target object to the analysis apparatus after acquiring the bone data of the target object, and the analysis apparatus may perform the motion analysis of the target object. That is, the synthesis of the video stream and the motion analysis of the target object may be performed by one device, or may be performed by a plurality of devices, which is not limited in the embodiment of the present application.
Optionally, the implementation process of determining the motion trail of the target object by the management device includes: the management device determines the horizontal position of the second key point under the world coordinate system according to the imaging position of the second key point of the target object in the target video image. And the management equipment generates a motion track of the target object according to the horizontal positions of the second key points respectively at a plurality of acquisition moments under the world coordinate system.
Alternatively, the second keypoint of the target object may be derived based on one or more skeletal points of the target object. For example, the ground is used as a calibration height, and the middle points of the left ankle and the right ankle of the human body are used as second key points. The second keypoint may refer broadly to one or more keypoints. The management device may determine a horizontal position of the second key point of the target object in the world coordinate system based on a transformation matrix of the image coordinate system of the camera acquiring the target video image to the two-dimensional world coordinate system according to an imaging position of the second key point in the target video image.
For example, fig. 9 is a schematic diagram of a motion trajectory of a target object according to an embodiment of the present application. As shown in fig. 9, the target object moves on the speed skating track; the second key point of the target object has two-dimensional horizontal coordinates (xt1, yt1) at acquisition time t1, (xt2, yt2) at time t2, (xt3, yt3) at time t3, (xt4, yt4) at time t4, and (xt5, yt5) at time t5, finally yielding a motion trajectory in the horizontal plane. The two-dimensional horizontal coordinates reflect the horizontal position in the world coordinate system.
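A sketch of the trajectory computation follows, reusing the homography-application form from the sketches above:

```python
# Sketch of trajectory generation: map the second key point's pixel
# position in each selected frame to a horizontal world position with that
# camera's image->world transform, collecting positions in time order.
import numpy as np

def world_position(H_img2world: np.ndarray, pixel_xy) -> tuple:
    x, y, w = H_img2world @ np.array([pixel_xy[0], pixel_xy[1], 1.0])
    return (x / w, y / w)                 # two-dimensional horizontal coordinates

def build_trajectory(samples) -> list:
    """samples: time-ordered (H_img2world, pixel_xy) pairs, one per
    acquisition time; returns [(x_t1, y_t1), (x_t2, y_t2), ...]."""
    return [world_position(H, p) for H, p in samples]
```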
Alternatively, when calculating the number of steps of the target object, the management device may count one step each time the left and right ankles cross, based on the synthesized video stream of the target object, thereby realizing step counting.
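One hedged reading of this step counter is sketched below: a step is registered whenever the ankles cross, detectable as a sign change of their offset along the direction of travel (using one axis as the travel direction is an assumption for illustration).

```python
# Hedged sketch of step counting: a step is counted each time the left and
# right ankles cross, detected as a sign change of their offset along one
# axis taken as the direction of travel (an assumption for illustration).
def count_steps(left_ankle_xs, right_ankle_xs) -> int:
    steps, prev_sign = 0, 0
    for lx, rx in zip(left_ankle_xs, right_ankle_xs):
        sign = (lx - rx > 0) - (lx - rx < 0)    # -1, 0, or +1
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            steps += 1                           # ankles crossed between samples
        if sign != 0:
            prev_sign = sign
    return steps

print(count_steps([1, 2, 3, 2, 1, 0], [2, 2, 1, 1, 2, 3]))  # -> 2
```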
The order of the steps of the video synthesis method provided in the embodiments of this application can be adjusted appropriately, and steps can be added or removed as the situation requires. Any variation readily conceivable by those skilled in the art within the technical scope disclosed in this application is intended to be covered by the protection scope of this application. For example, the method may be applied in athletes' training or competition scenes to synthesize an athlete's full-course motion video. It may also be applied in emergency-escape command scenes, synthesizing a real-time video for each person so that escape routes can be formulated for each individual's actual situation, improving the probability of escape. It can further be applied in tourist attractions to synthesize a full-journey video of a tourist at the attraction. Besides humans, the target object may be an animal, so the scheme can also be applied to animal protection scenes and the like. The application scenario of the method is not limited in the embodiments of this application and is not described in detail here. In addition, the cameras deployed in the target scene may also be implemented by remotely controlled drones and the like.
In summary, in the video synthesis method provided in the embodiments of the present application, a plurality of cameras are fixedly deployed in a target scene with different shooting areas, so that each camera can capture clear video images of its own area of the target scene. The management device selects, from the multiple frames of video images captured by the multiple cameras at the same acquisition time, one frame that contains the imaging of the target object for video synthesis. Because each camera captures clear video images of its corresponding area, some camera can always capture a clear moving picture of the target object as it moves through the shooting areas of different cameras, so the synthesized video stream provides a clear picture of the whole-course movement of the target object in the target scene; that is, the definition of the moving picture of the target object in the synthesized video stream is ensured. In addition, because the cameras are fixedly deployed, camera parameters can be preset according to the required shooting areas and need not be adjusted during shooting, which makes the implementation simple. By predetermining the pixel coordinate mapping relationship between two adjacent cameras, the management device can track and identify the target object across cameras based on the geometric correlation of the target object's imaging in the video images captured by the two cameras when it moves into their common view area. This approach does not depend on unique appearance features of the target object and, with flexible camera deployment and calibration, can be adapted to various scenes.
Fig. 10 is a schematic structural diagram of a management device according to an embodiment of the present application. The management device may be the management device 102 in a video composition system as shown in fig. 1. As shown in fig. 10, the management apparatus 1000 includes:
the first acquiring module 1001 is configured to acquire N frames of video images acquired at each of a plurality of acquisition moments by N cameras deployed in a target scene, where N is greater than or equal to 2.
The second obtaining module 1002 is configured to obtain a frame of target video image from N frames of video images corresponding to each capturing moment, where the target video image includes imaging of a target object.
The video composition module 1003 is configured to compose a video stream corresponding to the target object according to multiple frames of target video images corresponding to multiple acquisition moments, where the video stream is used to reflect the activity information of the target object in the target scene.
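As a hedged sketch of this module's core operation (OpenCV-based; the container format, frame rate, and function name are illustrative assumptions), the selected target video images can be arranged by acquisition time and written out as one stream:

    import cv2

    def compose_stream(target_frames, out_path="target.mp4", fps=25):
        # target_frames: dict mapping acquisition time -> selected BGR frame.
        times = sorted(target_frames)
        h, w = target_frames[times[0]].shape[:2]
        writer = cv2.VideoWriter(out_path,
                                 cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for t in times:
            writer.write(target_frames[t])  # time-ordered frames form the stream
        writer.release()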
Optionally, the second obtaining module 1002 is configured to: acquire, from the N frames of video images corresponding to each acquisition time, all to-be-selected video images that include the imaging of the target object; and acquire the target video image from all the to-be-selected video images.
Optionally, the N cameras include a first camera and a second camera that have a common view area. The second obtaining module 1002 is configured to: when the target object is located in the common view area of the first camera and the second camera at a first acquisition time, use a first video image captured by the first camera at the first acquisition time and a second video image captured by the second camera at the first acquisition time as the to-be-selected video images corresponding to the first acquisition time.
Optionally, the second obtaining module 1002 is configured to: acquire a first imaging of the target object in the first video image and a second imaging of the target object in the second video image; and, in response to the imaging effect of the first imaging being better than that of the second imaging, use the first video image as the target video image corresponding to the first acquisition time.
Optionally, the imaging effect of the first imaging is better than that of the second imaging when one or more of the following conditions are satisfied: the imaging area of the first imaging is larger than that of the second imaging; the first imaging includes more bone points than the second imaging; or the confidence of the bone data of the first imaging is greater than that of the second imaging.
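One possible way to turn these conditions into a comparator is sketched below; the embodiment only requires one or more conditions to hold, so the priority order used here is an assumption:

    from dataclasses import dataclass

    @dataclass
    class Imaging:
        area: float           # imaging area of the target in the frame
        num_bone_points: int  # number of detected bone points
        confidence: float     # confidence of the bone data

    def first_is_better(a: Imaging, b: Imaging) -> bool:
        # Compare by imaging area first, then bone-point count, then
        # bone-data confidence (an assumed tie-breaking order).
        if a.area != b.area:
            return a.area > b.area
        if a.num_bone_points != b.num_bone_points:
            return a.num_bone_points > b.num_bone_points
        return a.confidence > b.confidence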
Optionally, the second obtaining module 1002 is configured to: after acquiring the first imaging of the target object in the first video image, acquire a first imaging position of a first key point of the target object in the first video image; determine, based on the pixel coordinate mapping relationship between the first camera and the second camera, a second imaging position of the first key point in the second video image according to the first imaging position; and determine the second imaging of the target object in the second video image according to the second imaging position.
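A minimal sketch of this cross-camera association, assuming the pixel coordinate mapping relationship is available as a 3x3 homography H_1to2 and that candidate key-point detections in the second video image are given (the nearest-neighbour association is an illustrative choice, not stated in the embodiment):

    import numpy as np

    def locate_second_imaging(first_px, detections_px, H_1to2):
        # Map the first key point from camera 1 pixels to camera 2 pixels.
        p = H_1to2 @ np.array([first_px[0], first_px[1], 1.0])
        expected = p[:2] / p[2]
        # The detection closest to the expected position is taken as the
        # target's second imaging in the second video image.
        dists = [np.linalg.norm(np.asarray(d, dtype=float) - expected)
                 for d in detections_px]
        return detections_px[int(np.argmin(dists))]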
Optionally, M cameras are disposed in the target scene, any two adjacent cameras in the M cameras have a common view area, M is greater than or equal to N, the N cameras belong to the M cameras, a plurality of homography matrices are stored in the management device, and each homography matrix is used for reflecting a pixel coordinate mapping relationship between a group of two adjacent cameras in the M cameras.
Optionally, as shown in fig. 11, the management apparatus 1000 further includes: an image processing module 1004, configured to crop the target video image so that the imaging of the target object is located in the central area of the cropped video image. The video composition module 1003 is configured to arrange the multiple frames of cropped video images in time order based on the multiple acquisition times, so as to obtain the video stream.
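A minimal sketch of the cropping step, assuming the centre of the target's imaging (for example, its bounding-box centre) is known; clamping keeps the crop window inside the frame, so the target may sit off-centre near the borders:

    def center_crop(frame, target_center, out_w, out_h):
        # Place the crop window so the target's imaging centre coincides
        # with the centre of the output, clamped to the frame borders.
        h, w = frame.shape[:2]
        cx, cy = target_center
        x0 = min(max(int(cx - out_w // 2), 0), max(w - out_w, 0))
        y0 = min(max(int(cy - out_h // 2), 0), max(h - out_h, 0))
        return frame[y0:y0 + out_h, x0:x0 + out_w]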
Optionally, as shown in fig. 12, the management apparatus 1000 further includes: a determining module 1005, configured to determine the horizontal position of a second key point of the target object in the world coordinate system according to the imaging position of the second key point in the target video image; and a track generation module 1006, configured to generate the motion trajectory of the target object according to the horizontal positions of the second key point in the world coordinate system at the plurality of acquisition times.
Optionally, as shown in fig. 13, the management apparatus 1000 further includes: a third obtaining module 1007, configured to acquire the imaging positions of the bone points of the target object in the target video image; and a display module 1008, configured to display a playback picture of the video stream on a playback interface, where the bone points of the target object are displayed on the imaging of the target object in the playback picture.
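A short sketch of the overlay performed before display (OpenCV-based; the colour and marker size are illustrative choices):

    import cv2

    def draw_bone_points(frame, bone_points_px):
        # Mark each bone point on the target's imaging in the playback frame.
        for x, y in bone_points_px:
            cv2.circle(frame, (int(round(x)), int(round(y))), 4, (0, 255, 0), -1)
        return frame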
The specific manner in which the modules of the apparatus in the above embodiment perform their operations has been described in detail in the method embodiments and is not described in detail here.
The embodiments of the present application further provide a video synthesis system, including a management device and a plurality of cameras. The cameras are configured to capture video images, and the management device is configured to perform the method steps shown in fig. 4.
Fig. 14 is a block diagram of a management apparatus provided in an embodiment of the present application. As shown in fig. 14, the management apparatus 1400 includes: a processor 1401 and a memory 1402.
A memory 1402 for storing a computer program, the computer program comprising program instructions;
a processor 1401, configured to invoke the computer program to implement the method steps shown in fig. 4 in the foregoing method embodiment.
Optionally, the management device 1400 further comprises a communication bus 1403 and a communication interface 1404.
The processor 1401 includes one or more processing cores, and performs various functional applications and data processing by running the computer program.
The memory 1402 may be used to store a computer program. Optionally, the memory may store an operating system and at least one application unit required for the functions. The operating system may be a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS, OS X, or the like.
There may be multiple communication interfaces 1404, which are used to communicate with other devices. For example, in this embodiment of the present application, the communication interface of the management device 1400 may be used to transmit a video stream to a terminal.
The memory 1402 and the communication interface 1404 are connected to the processor 1401 via the communication bus 1403, respectively.
Embodiments of the present application also provide a computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the method steps as shown in fig. 4.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method steps as shown in fig. 4.
A person skilled in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
In the present embodiments, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The term "and/or" in this application is merely an association relation describing an associated object, and indicates that three relations may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
The foregoing descriptions are merely optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (24)

1. A video compositing method, applied to a management device, the method comprising:
N frames of video images acquired respectively at each acquisition moment of a plurality of acquisition moments of N cameras deployed in a target scene are acquired, wherein N is more than or equal to 2;
acquiring a frame of target video image from N frames of video images corresponding to each acquisition time, wherein the target video image comprises imaging of a target object;
and synthesizing a video stream corresponding to the target object according to the multi-frame target video images corresponding to the plurality of acquisition moments, wherein the video stream is used for reflecting the activity information of the target object in the target scene.
2. The method according to claim 1, wherein the acquiring a frame of the target video image from the N frames of video images corresponding to each acquisition time comprises:
acquiring all the video images to be selected, which comprise imaging of the target object, in the N frames of video images corresponding to each acquisition moment;
and acquiring the target video image from all the video images to be selected.
3. The method of claim 2, wherein the N cameras include a first camera and a second camera, the first camera and the second camera have a common view area, and the acquiring all candidate video images including the imaging of the target object in the N frames of video images corresponding to each acquisition time includes:
When the target object is located in a common view area of the first camera and the second camera at a first acquisition time, a first video image acquired by the first camera at the first acquisition time and a second video image acquired by the second camera at the first acquisition time are used as to-be-selected video images corresponding to the first acquisition time.
4. A method according to claim 3, wherein said obtaining said target video image from said all candidate video images comprises:
acquiring a first imaging of the target object in the first video image and a second imaging of the target object in the second video image;
and responding to the imaging effect of the first imaging being better than that of the second imaging, and taking the first video image as a target video image corresponding to the first acquisition moment.
5. The method of claim 4, wherein the imaging effect of the first imaging is better than that of the second imaging when one or more of the following conditions are satisfied:
the imaging area of the first imaging is larger than the imaging area of the second imaging;
The first imaging includes a greater number of bone points than the second imaging includes;
the confidence of the first imaged bone data is greater than the confidence of the second imaged bone data.
6. The method of claim 4 or 5, wherein acquiring the second imaging of the target object in the second video image comprises:
after acquiring a first imaging of the target object in the first video image, acquiring a first imaging position of a first key point of the target object in the first video image;
determining a second imaging position of the first key point in the second video image according to the first imaging position based on a pixel coordinate mapping relation between the first camera and the second camera;
determining the second imaging of the target object in the second video image according to the second imaging position.
7. The method of claim 6, wherein M cameras are deployed in the target scene, any two adjacent cameras in the M cameras have a common view area, M is greater than or equal to N, the N cameras belong to the M cameras, and a plurality of homography matrices are stored in the management device, and each homography matrix is used for reflecting a pixel coordinate mapping relationship between a group of two adjacent cameras in the M cameras.
8. The method of any one of claims 1 to 7, wherein after acquiring the target video image, the method further comprises:
clipping the target video image to enable the imaging of the target object to be located in the central area of the clipped video image;
the synthesizing the video stream corresponding to the target object according to the multi-frame target video image corresponding to the plurality of acquisition moments comprises the following steps:
based on the plurality of acquisition moments, arranging the video images which are respectively subjected to cutting processing on a plurality of frames according to time sequence, so as to obtain the video stream.
9. The method according to any one of claims 1 to 8, further comprising:
determining the horizontal position of a second key point of the target object under a world coordinate system according to the imaging position of the second key point in the target video image;
and generating the motion trail of the target object according to the horizontal positions of the second key points under the world coordinate system at the plurality of acquisition moments respectively.
10. The method of any one of claims 1 to 9, wherein after acquiring the target video image, the method further comprises:
Acquiring an imaging position of a bone point of the target object in the target video image;
and displaying a playing picture of the video stream on a playing interface, wherein skeleton points of the target object are displayed on imaging of the target object in the playing picture.
11. A management apparatus, characterized in that the management apparatus comprises:
the first acquisition module is used for acquiring N frames of video images acquired respectively at each acquisition moment of a plurality of acquisition moments of N cameras deployed in a target scene, wherein N is more than or equal to 2;
the second acquisition module is used for acquiring a frame of target video image from the N frames of video images corresponding to each acquisition moment, wherein the target video image comprises imaging of a target object;
and the video synthesis module is used for synthesizing a video stream corresponding to the target object according to the multi-frame target video images corresponding to the plurality of acquisition moments, and the video stream is used for reflecting the activity information of the target object in the target scene.
12. The management device of claim 11, wherein the second acquisition module is configured to:
acquiring all the video images to be selected, which comprise imaging of the target object, in the N frames of video images corresponding to each acquisition moment;
And acquiring the target video image from all the video images to be selected.
13. The management device of claim 12, wherein the N cameras comprise a first camera and a second camera, the first camera and the second camera having a common view area, the second acquisition module to:
when the target object is located in a common view area of the first camera and the second camera at a first acquisition time, a first video image acquired by the first camera at the first acquisition time and a second video image acquired by the second camera at the first acquisition time are used as to-be-selected video images corresponding to the first acquisition time.
14. The management device of claim 13, wherein the second acquisition module is configured to:
acquiring a first imaging of the target object in the first video image and a second imaging of the target object in the second video image;
and responding to the imaging effect of the first imaging being better than that of the second imaging, and taking the first video image as a target video image corresponding to the first acquisition moment.
15. The management device of claim 14, wherein the imaging effect of the first imaging is better than that of the second imaging when one or more of the following conditions are satisfied:
the imaging area of the first imaging is larger than the imaging area of the second imaging;
the first imaging includes a greater number of bone points than the second imaging includes;
the confidence of the first imaged bone data is greater than the confidence of the second imaged bone data.
16. The management device according to claim 14 or 15, wherein the second acquisition module is configured to:
after acquiring a first imaging of the target object in the first video image, acquiring a first imaging position of a first key point of the target object in the first video image;
determining a second imaging position of the first key point in the second video image according to the first imaging position based on a pixel coordinate mapping relation between the first camera and the second camera;
determining the second imaging of the target object in the second video image according to the second imaging position.
17. The management apparatus according to claim 16, wherein M cameras are disposed in the target scene, any two adjacent cameras among the M cameras have a common view area, M being greater than or equal to N, the N cameras belong to the M cameras, and a plurality of homography matrices are stored in the management apparatus, each homography matrix being used for reflecting a pixel coordinate mapping relationship between a group of two adjacent cameras among the M cameras.
18. A management device according to any one of claims 11 to 17, wherein the management device further comprises:
the image processing module is used for clipping the target video image so that the imaging of the target object is positioned in the central area of the clipped video image;
the video synthesis module is used for arranging the video images which are respectively subjected to cutting processing on the multiple frames according to the time sequence based on the multiple acquisition moments so as to obtain the video stream.
19. A management device according to any one of claims 11 to 18, wherein the management device further comprises:
the determining module is used for determining the horizontal position of the second key point under a world coordinate system according to the imaging position of the second key point of the target object in the target video image;
And the track generation module is used for generating the motion track of the target object according to the horizontal positions of the second key points under the world coordinate system at the plurality of acquisition moments.
20. A management device according to any one of claims 11 to 19, wherein the management device further comprises:
the third acquisition module is used for acquiring the imaging position of the skeleton point of the target object in the target video image;
and the display module is used for displaying a playing picture of the video stream on a playing interface, and skeleton points of the target object are displayed on imaging of the target object in the playing picture.
21. A video compositing system, comprising: a management device and a plurality of cameras, wherein the cameras are configured to capture video images, and the management device is configured to perform the video compositing method according to any one of claims 1 to 10.
22. A management apparatus, characterized by comprising: a processor and a memory;
the memory is used for storing a computer program, and the computer program comprises program instructions;
the processor is configured to invoke the computer program to implement the video compositing method according to any of claims 1-10.
23. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the video compositing method of any of claims 1-10.
24. A computer program product comprising a computer program which, when executed by a processor, implements the video compositing method of any of claims 1-10.
CN202210022166.8A 2022-01-10 2022-01-10 Video synthesis method, device and system Pending CN116456039A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210022166.8A CN116456039A (en) 2022-01-10 2022-01-10 Video synthesis method, device and system
PCT/CN2023/071293 WO2023131327A1 (en) 2022-01-10 2023-01-09 Video synthesis method, apparatus and system


Publications (1)

Publication Number Publication Date
CN116456039A (en) 2023-07-18

Family

ID=87073279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210022166.8A Pending CN116456039A (en) 2022-01-10 2022-01-10 Video synthesis method, device and system

Country Status (2)

Country Link
CN (1) CN116456039A (en)
WO (1) WO2023131327A1 (en)


Also Published As

Publication number Publication date
WO2023131327A1 (en) 2023-07-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination