WO2021036275A1 - Method, system and device for multi-channel video synchronization - Google Patents

Method, system and device for multi-channel video synchronization

Info

Publication number
WO2021036275A1
Authority: WIPO (PCT)
Prior art keywords: video, frame, channel, target, channels
Application number: PCT/CN2020/084356
Other languages: English (en), French (fr)
Inventors: 陈恺, 杨少鹏, 冷继南, 李宏波
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority claimed from: CN201911209316.0A (CN112449152B)
Application filed by: Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication: WO2021036275A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/04: Synchronising

Definitions

  • This application relates to the field of Artificial Intelligence (AI), and in particular to methods, systems and equipment for multi-channel video synchronization.
  • AI Artificial Intelligence
  • IPC IP Camera/Network Camera
  • the panoramic video is obtained by splicing videos sent by multiple IPCs with different shooting angles, so the received videos of the multiple IPCs at different angles also need to be time-synchronized multi-channel videos; otherwise the stitched panoramic video will have defects such as blurred images and motion ghosts. Therefore, in order to ensure the accuracy of subsequent video processing, it is necessary to ensure the time synchronization of the multiple channels of video.
  • This application provides a method, system, and equipment for synchronizing multiple channels of video, which can solve the problem of obstacles in video processing caused by time non-synchronization of multiple channels of video.
  • a method for multi-channel video synchronization includes the following steps:
  • N channels of video are obtained by shooting a geographic area by N cameras, and N is an integer not less than 2;
  • the synchronization frame information is used to synchronize the time of the videos shot by the N cameras, and the synchronization frame information includes the position information of the N time-synchronized video frames in their corresponding videos.
  • the synchronization frame information of the N channels of video is determined by calculating the similarity between the geographic coordinates of the targets in the video frames of each of the N channels of video, and the synchronization frame information can be used to generate multiple channels of synchronized video.
  • the N channels of videos are video streams
  • the above method further includes: obtaining N channels of time-synchronized video according to the position information of the N time-synchronized video frames in the corresponding videos, where the start video frame of each channel of time-synchronized video is the time-synchronized video frame in that channel of video.
  • the N channels of time-synchronized video streams can be sent to a display device for displaying the N channels of synchronized video, such as the display screen of a monitoring center or a studio, so that it can directly display the synchronized playback of real-time monitoring.
  • the above method further includes: sending synchronization frame information to other devices; or sending N channels of time-synchronized videos to other devices.
  • the above implementations enable this application to send the N channels of synchronized video or the synchronization frame information to the required processing system or processing device according to different application scenarios. It is therefore applicable not only to application scenarios that require the N channels of synchronized video to be displayed, such as monitoring centers and studios, but also to frame-level application scenarios that require processing of the N time-synchronized video frames, such as panoramic video production, video splicing, and target detection. Therefore, the application scenarios are very wide.
  • obtaining the geographic coordinates of the target in the video frame of each of the N channels of video includes: inputting the video frame of each channel of video into the target detection model to obtain the pixel coordinates of the target in the video frame of each channel of video; and determining the geographic coordinates of the target in the video frame of each channel of video according to the pixel coordinates of the target in the video frame of each channel of video and the calibration parameters of the camera corresponding to each channel of video, where the calibration parameters of the camera are used to indicate the mapping relationship between the video picture shot by the camera and the geographic area being shot.
  • the N cameras may be spatially calibrated first; then, each video frame of each channel of video is input into the target detection model to obtain the output result image corresponding to each video frame, where the output result image contains a bounding box used to indicate the position of the target in the image; then the pixel coordinates of the target in each video frame are obtained according to the output result image corresponding to each video frame; finally, the geographic coordinates of the target in each video frame are obtained according to the calibration parameters and the pixel coordinates of the target in each video frame.
  • determining the similarity between the video frames of different channels of video includes: calculating the distance between the geographic coordinates of the target in the video frame of each channel of video and the geographic coordinates of the target in the video frames of the other channels of video; and determining the similarity between the video frames of different channels of video according to the distance.
  • the specific process of calculating the similarity between the geographic coordinates in a video frame Pi of the first channel of video and a video frame Qi of the second channel of video may include: first, calculating the distances D11, D12, ..., D1w between the geographic coordinate A1 of a target in video frame Pi and the geographic coordinates B1, B2, ..., Bw of the targets in video frame Qi, calculating the distances D21, D22, ..., D2w between the geographic coordinate A2 of a target in video frame Pi and the geographic coordinates B1, B2, ..., Bw of the targets in video frame Qi, ..., and calculating the distances Dw1, Dw2, ..., Dww between the geographic coordinate Aw of a target in video frame Pi and the geographic coordinates B1, B2, ..., Bw of the targets in video frame Qi, where video frame Pi and video frame Qi are video frames in different channels of video; secondly, obtaining the minimum value D1 among the distances D11, D12, ..., D1w.
  • the pixel coordinates of the target in each video frame are obtained first, and then the geographic coordinates of the target in each video frame are obtained according to the calibration parameters; the similarity between the video frames of different channels of video is then determined, and finally the synchronization frame information of the N channels of video is determined. Therefore, the overall process of determining the synchronization frame information of the N channels of video does not require additional hardware devices such as acquisition equipment or video capture devices, and does not limit the type of IPC, the network environment, or the transmission protocol; the overall solution has good versatility and robustness, can be deployed entirely in the form of software, and can be applied to frame-level application scenarios.
  • the above method further includes: determining a common view area, where the common view area is an area jointly captured by the N cameras and is part or all of the geographic area; and the determining of the similarity between the video frames in different channels of video according to the geographic coordinates of the target in the video frame of each channel of video includes: determining the similarity between video frames in different channels of video according to the geographic coordinates of the targets in the common view area recorded in the video frame of each channel of video.
  • the above implementation performs secondary processing on the geographic coordinates of the targets in each video frame to retain only the geographic coordinates of the targets in the common view area of each video frame, which can greatly reduce the amount of calculation of geographic coordinate similarity and improve the processing efficiency of the multi-channel video synchronization method.
  • obtaining the synchronization frame information according to the similarity between video frames in different channels of video includes: calculating the synchronization frame information between every two channels of video in the N channels of video based on the similarity between the geographic coordinates of the targets in the video frames; determining the frame number relationship between the two time-synchronized frames of every two channels of video according to the synchronization frame information between every two channels of video; determining the frame number relationship between the N time-synchronized video frames of the N channels of video according to the frame number relationship between the two time-synchronized frames of every two channels of video; and determining the synchronization frame information of the N channels of video according to the frame number relationship between the N time-synchronized video frames of the N channels of video.
  • the above implementation determines the frame number relationship between every two channels of video by determining the synchronization frame information between every two channels of video, and then determines the frame number relationship between the N channels of video, so as to obtain the synchronization frame information of the N channels of video. Since the synchronization frame information between only 2 channels of video is calculated each time, the calculation pressure on the server is very small, so this method is very suitable for deployment on servers with low computing performance. For example, edge computing all-in-one machines deployed on both sides of the road can calculate the synchronization frame information of multiple IPCs at an intersection without occupying too many computing resources of the edge computing all-in-one machine.
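  • below is a minimal sketch of chaining the pairwise frame number relationships into N-channel synchronization frame information. It assumes all channels have the same frame rate, so a constant frame-number offset holds between any two channels, and that pairwise results are available for adjacent channel pairs; the function and input names are illustrative assumptions, not the patent's wording.

```python
def chain_sync_info(pairwise, n_channels):
    """Combine pairwise synchronization frame info into N-channel info.

    pairwise: dict mapping (k, k + 1) -> (u, v), meaning frame u of channel k is
        time-synchronized with frame v of channel k + 1 (adjacent pairs only,
        which is enough to chain the frame-number relationships).
    Returns a list of N frame numbers, one per channel, that are mutually
    time-synchronized. Assumes frame numbering starts at 1 and that every
    adjusted frame number still falls inside its video.
    """
    offsets = [0]                       # frame-number offset of each channel vs channel 0
    for k in range(n_channels - 1):
        u, v = pairwise[(k, k + 1)]
        offsets.append(offsets[-1] + (v - u))
    base = 1 - min(offsets)             # shift so every frame number is >= 1
    return [base + off for off in offsets]

# Example: if frame 2 of channel 0 matches frame 3 of channel 1, and frame 3 of
# channel 1 matches frame 4 of channel 2, the chained result is [1, 2, 3].
```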
  • obtaining the synchronization frame information includes: extracting one video frame from each channel of the N channels of video to form one video frame group, so as to obtain t^N video frame groups; determining the sum of the similarities between the geographic coordinates of the targets in every two video frames in each video frame group; and determining the synchronization frame information of the N channels of video according to the frame number of each frame in the video frame group with the smallest sum.
  • the above implementation determines the synchronization frame information of the N channels of video directly from the similarity between video frames in different channels of video, which is suitable for servers with higher computing performance, such as cloud servers, and can reduce the calculation time of the multi-channel video synchronization method and improve the efficiency of multi-channel video synchronization.
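  • below is a sketch of the frame-group search just described, assuming a precomputed per-frame list of target geographic coordinates and a pairwise similarity function in which a smaller value means the two frames are more similar; all names are illustrative assumptions.

```python
from itertools import product

def find_sync_frames(frame_coords, pairwise_similarity):
    """Brute-force search over frame groups, one frame per channel.

    frame_coords: list of N lists; frame_coords[c][i] holds the geographic
        coordinates of the targets in frame i of channel c (t frames per channel).
    pairwise_similarity: function(coords_a, coords_b) -> float, e.g. the
        average minimum distance between target coordinates (smaller = more similar).
    Returns the tuple of frame indices (one per channel) whose summed pairwise
    similarity is smallest, i.e. the synchronization frame information.
    """
    n_channels = len(frame_coords)
    t = len(frame_coords[0])
    best_group, best_score = None, float("inf")
    # Enumerate all t**N frame groups (one frame taken from each channel).
    for group in product(range(t), repeat=n_channels):
        score = 0.0
        for a in range(n_channels):
            for b in range(a + 1, n_channels):
                score += pairwise_similarity(frame_coords[a][group[a]],
                                             frame_coords[b][group[b]])
        if score < best_score:
            best_group, best_score = group, score
    return best_group
```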
  • multiple channels of synchronized video are obtained by calculating the synchronization frame information of the multiple channels of video, and the obtained multiple channels of synchronized video are synchronized at the video frame level. Therefore, the application scenarios are more extensive: the method is applicable not only to second-level application scenarios, such as video synchronization display in a monitoring center, but also to frame-level application scenarios, such as panoramic video production, video splicing, and target detection.
  • a multi-channel video synchronization system which is characterized in that the system includes an input unit and a calculation unit, wherein:
  • the input unit is used to obtain N channels of video, the N channels of video are obtained by shooting a geographic area by N cameras, and the N is an integer not less than 2;
  • the calculation unit is used to obtain the geographic coordinates of the target in the video frame of each channel of the N videos, and determine the video in the different channels of video according to the geographic coordinates of the target in the video frame of each channel of the video The similarity between frames;
  • the calculation unit is used to obtain synchronization frame information according to the similarity between the video frames in the different channels of video, wherein the synchronization frame information is used to synchronize the time of the video shot by the N cameras, so
  • the synchronization frame information includes position information of N time-synchronized video frames in the corresponding video.
  • the N channels of videos are video streams
  • the system further includes an output unit configured to obtain N channels of time-synchronized video according to the position information of the N time-synchronized video frames in the corresponding videos.
  • the start video frame of each channel of time-synchronized video is the time-synchronized video frame in each channel of video.
  • the output unit is further configured to send the synchronization frame information to other devices; or, the output unit is further configured to send the N channels of time-synchronized videos to other devices.
  • the calculation unit is configured to input the video frame of each channel of video into the target detection model to obtain the pixel coordinates of the target in the video frame of each channel;
  • the calculation unit is used to determine the geographic coordinates of the target in the video frame of each video according to the pixel coordinates of the target in the video frame of each video and the calibration parameters of the camera corresponding to each video, wherein ,
  • the calibration parameter of the camera is used to indicate the mapping relationship between the video screen shot by the camera and the geographic area being shot.
  • the calculation unit is configured to determine a common-view area, the common-view area is an area jointly captured by the N cameras, and the common-view area is the geographic area. Part or all of the area; the calculation unit is configured to determine the similarity between video frames in different channels of video according to the geographic coordinates of the target in the common view area recorded in the video frames of each channel of video.
  • the calculation unit is configured to calculate the difference between the geographic coordinates of the target in the video frame of each channel of video and the geographic coordinates of the target in the video frame of other channels of video. Distance; the calculation unit is used to determine the similarity between video frames of different videos according to the distance.
  • a computer program product including a computer program, and when the computer program is read and executed by a computing device, the method described in the first aspect is implemented.
  • a computer-readable storage medium including instructions, which, when the instructions run on a computing device, cause the computing device to implement the method described in the first aspect.
  • an electronic device including a processor and a memory, and the processor executes the code in the memory to implement the method described in the first aspect.
  • FIG. 1A is a schematic diagram of the deployment of a multi-channel video synchronization system provided by this application;
  • FIG. 1B is a schematic diagram of the deployment of another multi-channel video synchronization system provided by this application.
  • Figure 2 is a schematic structural diagram of a multi-channel video synchronization system provided by the present application.
  • Fig. 3 is a schematic flow chart of a method for synchronizing multiple channels of video provided by the present application.
  • FIG. 4 is a schematic flowchart of a method for obtaining geographic coordinates of targets in multiple video frames provided by the present application
  • FIG. 5 is a schematic diagram of a common view area of two videos in an application scenario provided by this application.
  • FIG. 6 is a schematic flowchart of a method for acquiring a shooting range provided by the present application.
  • FIG. 7 is a schematic flowchart of another method for acquiring a shooting range provided by the present application.
  • FIG. 8 is a schematic diagram of a process for obtaining the similarity between geographic coordinates of targets in two video frames provided by the present application
  • FIG. 9 is a schematic diagram of a process for obtaining synchronization frame information of two channels of video provided by the present application.
  • FIG. 10 is a schematic diagram of a process for obtaining synchronization frame information of N channels of video provided by the present application.
  • FIG. 11 is a schematic diagram of another process for obtaining synchronization frame information of N channels of videos provided by the present application.
  • 12A-12B are schematic flowcharts of obtaining N channels of synchronized videos according to synchronization frame information in an application scenario provided by this application;
  • FIG. 13 is a schematic diagram of the structure of an electronic device provided by the present application.
  • for example, the video frame at time T1 in the video obtained by IPC1 is a picture of a pedestrian's right foot just stepping onto the zebra crossing. If the video frame at time T1 in the video obtained by IPC2 is not the pedestrian's right foot just stepping onto the zebra crossing, but rather the pedestrian not yet having stepped onto the zebra crossing, or the pedestrian already having stepped onto the zebra crossing, then IPC1 and IPC2 are two channels of video that are out of sync. It should be understood that the above examples are only for illustration and cannot constitute a specific limitation.
  • for example, IPC1 and IPC2 provide two surveillance videos of the same intersection. Since IPC1 was capturing a snapshot of a vehicle running the red light at time T1, the video frame corresponding to time T1 in the real-time video stream transmitted by IPC1 was lost, while IPC2 did not lose any frame during capture. As a result, among the real-time video streams sent by IPC1 and IPC2 and received by the processing system, the stream from IPC2 is 1 frame faster than the stream from IPC1 after time T1, which further causes obstacles when the processing system performs video processing, such as target recognition and panoramic video production, based on the received multi-channel video.
  • this application provides a multi-channel video synchronization system.
  • the synchronization frame information of the multi-channel video can be calculated according to the content of the video frame of each video in the multi-channel video, so as to obtain the time-synchronized multi-channel video.
  • the multi-channel video synchronization system provided in this application is flexible in deployment and can be deployed in an edge environment, specifically, it may be an edge computing device in the edge environment or a software system running on one or more edge computing devices.
  • Edge environment refers to a cluster of edge computing devices that are geographically close to the IPC used to collect multiple channels of video and used to provide computing, storage, and communication resources, such as edge computing all-in-one machines located on both sides of the road.
  • the multi-channel video synchronization system may be an edge computing all-in-one machine located close to the intersection or a software system running on the edge computing all-in-one machine located close to the intersection.
  • Two network cameras, IPC1 and IPC2 are installed at the intersection to monitor the intersection.
  • each IPC can send the real-time video stream of the intersection to the multi-channel video synchronization system through the network, and the multi-channel video synchronization system can execute the multi-channel video synchronization method provided by this application to calculate the synchronization frame information of the multi-channel video.
  • the synchronization frame information can be used for multi-channel IPC correction, synchronized playback on monitoring and playback platforms, panoramic video production, multi-camera target detection, and the like; the multi-channel video synchronization system can send the synchronization frame information to the corresponding processing system according to the application scenario.
  • the multi-channel video synchronization system provided by this application can also be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode.
  • the cloud environment includes a cloud data center and a cloud service platform.
  • the cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider.
  • the computing resources included in the cloud data center can be a large number of computing devices (for example, servers).
  • the multi-channel video synchronization system can be a server in the cloud data center, a virtual machine created in the cloud data center, or a software system deployed on a server or a virtual machine in the cloud data center.
  • a multi-channel video synchronization system is deployed in a cloud environment.
  • Two network cameras, IPC1 and IPC2 are installed at the intersection to monitor the intersection.
  • each IPC can send the real-time video stream of the intersection to the multi-channel video synchronization system through the network.
  • the multi-channel video synchronization system can execute the multi-channel video synchronization method provided in this application, and calculate the synchronization frame information of the multi-channel video.
  • the synchronization frame information can be used for multi-channel video synchronization.
  • the multi-channel video synchronization system can send the synchronization frame information to the corresponding processing system according to the application scenario, and the processing system that receives the synchronization frame information can be deployed in a cloud environment, an edge environment, or on a terminal device.
  • the unit modules in the multi-channel video synchronization system can be divided in multiple ways. Each module can be a software module, a hardware module, or partly a software module and partly a hardware module, which is not limited in this application.
  • FIG. 2 is an exemplary division method. As shown in FIG. 2, the multi-channel video synchronization system 100 includes an input unit 110, a calculation unit 120 and an output unit 130. The function of each functional unit is introduced below.
  • the input unit 110 is configured to receive N channels of video and input the N channels of video to the computing unit 120.
  • the input unit 110 may be used to obtain N channels of video, the N channels of video being obtained by shooting a geographic area by N cameras, and the N is an integer not less than 2.
  • the N channels of video may be multiple videos obtained by multiple IPCs shooting the same geographic area from the same angle, or multiple videos obtained by multiple IPCs shooting the same geographic area from different angles.
  • the N-channel video may be multiple live videos input by the IPC at the monitoring site, or may be a local file or an offline video read from a cloud storage server, which is not specifically limited in this application.
  • the scene can be any scene in which the videos returned by multiple IPCs monitoring a target area need to be adjusted for synchronized playback, such as traffic intersections, banks, communities, hospitals, data centers, schools, examination rooms, studios, and other scenes; this application does not specifically limit this either.
  • the calculation unit 120 is used to process N channels of video to obtain synchronization frame information of the N channels of video.
  • the calculation unit 120 is configured to detect the target in the video frame of each channel of video, and obtain the geographic coordinates of the target in the video frame of each channel of video.
  • the geographic coordinates of the target indicate the location of the target in the geographic area;
  • the calculation unit 120 is configured to determine the similarity between video frames in different channels of video according to the geographic coordinates of the target in the video frame of each channel of the video;
  • the calculation unit 120 uses According to the similarity between the video frames in the different channels of video, the synchronization frame information of the N channels of video is obtained, wherein the synchronization frame information includes the positions of the N time-synchronized video frames in the corresponding video information.
  • N time-synchronized video frames describe scenes that occur at the same time, and the position information of the N time-synchronized video frames in the corresponding video may be the frame number of the N time-synchronized video frames in the corresponding video.
  • the output unit 130 may directly transmit the synchronization frame information to the processing systems of different application scenarios, or may process the N channels of video according to the synchronization frame information to obtain N channels of time-synchronized video, and then transmit the N channels of synchronized video to the corresponding processing system.
  • the output unit 130 is configured to perform time synchronization on the N channels of video according to the synchronization frame information to obtain time synchronized N channels of video.
  • the N channels of video are real-time video streams, and the output unit is configured to determine, according to the position information of the N time-synchronized video frames in the corresponding videos, the time-synchronized video frame in each channel of video as the starting video frame of that channel, thereby obtaining N channels of time-synchronized video.
  • the output unit 130 is used to send the synchronization frame information to other devices or systems; or, the output unit 130 is used to send the time synchronized N channels of videos to other devices or systems.
  • the output unit 130 may directly return the synchronization frame information to each IPC, so that each IPC adjusts its output video timing according to the synchronization frame information;
  • the output unit 130 can obtain multiple channels of synchronized video according to the synchronized frame information, and then send it to the display screen of the monitoring center, so that the monitoring center can directly display the synchronized playback of real-time monitoring;
  • the output unit 130 can directly send the synchronization frame information to the target detection server, so that it can determine N time-synchronized video frames according to the synchronization frame information, and perform target detection on the N time-synchronized video frames .
  • the multi-channel video synchronization system provided by this application performs multi-channel video synchronization according to the video content, without any additional hardware devices such as acquisition equipment or video capture devices; it does not limit the type of IPC, the network environment, or the transmission protocol, and the overall solution has good versatility and robustness.
  • the multi-channel video synchronization system provided by the present application obtains multi-channel synchronous videos by calculating the synchronization frame information of the multi-channel videos, and the obtained multi-channel synchronous videos are multi-channel videos synchronized at the video frame level. Therefore, the application scenarios are more extensive. It is suitable for second-level application scenarios, such as video synchronization display in a monitoring center, and frame-level application scenarios, such as panoramic video production, video splicing, and target detection.
  • the present application provides a method for synchronizing multiple channels of video.
  • the method includes the following steps:
  • S201 Acquire N channels of video, where the N channels of video are obtained by shooting a geographic area by N cameras, and the N is an integer not less than 2.
  • each of the N channels of video includes multiple video frames. It is understandable that if too many video frames are calculated at the same time, the amount of calculation will be too large and the processing efficiency of multi-channel video synchronization will be reduced. Therefore, each time the multiple channels of video are synchronized, the number of video frames of each channel of video participating in the calculation can be determined according to the historical synchronization records and the video frame rate. For example, suppose the N cameras have a frame rate of 12 frames per second (Frames Per Second, FPS), that is, each channel of video contains 12 video frames per second, and the historical synchronization records show that during synchronization each channel of video is at most 1 second faster than any other channel of video; then the video frames within about 1 second (about 12 video frames) of each channel of video can be taken to participate in each synchronization calculation.
  • FPS Frames Per Second
  • S202 Obtain the geographic coordinates of the target in the video frame of each channel of video in the N channels of video, and determine the similarity between the video frames in different channels of video according to the geographic coordinates of the target in the video frame of each channel of video.
  • the acquiring of the geographic coordinates of the target in the video frame of each of the N channels of video includes: inputting the video frame of each channel of video into the target detection model to obtain the pixel coordinates of the target in the video frame of each channel of video; and determining the geographic coordinates of the target in the video frame of each channel of video according to the pixel coordinates of the target in the video frame of each channel of video and the calibration parameters of the camera corresponding to each channel of video, where the calibration parameters of the camera are used to indicate the mapping relationship between the video picture shot by the camera and the geographic area being shot.
  • the geographic coordinates of the target in the video frame of each video can be directly obtained from other systems or devices through the network.
  • alternatively, the video frame of each channel of video is sent to other systems or devices, the other system or device performs target detection on the video frame of each channel of video to obtain the pixel coordinates of the target in the video frame of each channel of video, and then the geographic coordinates of the target in the video frame of each channel of video are determined according to the calibration parameters.
  • the determining of the similarity between the video frames of different channels of video according to the geographic coordinates of the target in the video frame of each channel of video includes: calculating the distance between the geographic coordinates of the target in the video frame of each channel of video and the geographic coordinates of the target in the video frames of the other channels of video; and determining the similarity between the video frames of the different channels of video according to the distance.
  • the specific content of this step will be described in detail in the following steps S2026-S2028.
  • S203 Obtain synchronization frame information of the N channels of video according to the similarity between the video frames in the different channels of video, where the synchronization frame information includes the number of time-synchronized video frames in the corresponding video. location information.
  • N time-synchronized video frames describe scenes that occur at the same time
  • the position information of the N time-synchronized video frames in the corresponding video may include the frame numbers of the N time-synchronized video frames in the corresponding video.
  • N time synchronized video frames belong to different video channels.
  • the frame number refers to the number of each video frame in the frame sequence after the multiple video frames in each video are arranged in time sequence into a frame sequence, that is, the frame number of each video frame.
  • for example, the frame number of the first video frame of channel A is 1, the frame number of the second video frame is 2, and so on; or the frame number of the first video frame is 0, the frame number of the second video frame is 1, and so on.
  • for example, the three time-synchronized video frames of the three channels of video A, B, and C can be the second frame of video A (frame number 2), the third frame of video B (frame number 3), and the fourth frame of video C (frame number 4); the synchronization frame information of the three channels of video A, B, and C can then be (2, 3, 4).
  • video A is 1 frame faster than video B
  • video C is 1 frame slower than video B.
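  • below is a sketch of how the synchronization frame information could be applied, following the description that the time-synchronized video frame becomes the start frame of each channel; the function and parameter names are illustrative assumptions.

```python
def align_channels(videos, sync_frame_numbers, first_frame_number=1):
    """Trim each channel so its time-synchronized frame becomes the start frame.

    videos: list of N frame sequences (lists of frames), one per channel.
    sync_frame_numbers: synchronization frame info, e.g. (2, 3, 4) for the
        three channels A, B, C in the example above.
    first_frame_number: 1 if frame numbering starts at 1, 0 if it starts at 0.
    """
    aligned = []
    for frames, sync_no in zip(videos, sync_frame_numbers):
        start_index = sync_no - first_frame_number  # convert frame number to list index
        aligned.append(frames[start_index:])
    return aligned

# With sync frame info (2, 3, 4), video A drops its first frame, video B its
# first two, and video C its first three, so the trimmed streams all start at
# three time-synchronized frames.
```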
  • the synchronization frame information can be used for the IPC to adjust its own output video timing, and can also be used to obtain multiple channels of synchronized videos.
  • for details, please refer to the description of the output unit 130 in the embodiment of FIG. 2, which will not be repeated here. It should be understood that the above examples are only for illustration and cannot constitute a specific limitation. The specific content of this step will be described in detail in the following steps S2031A-S2034A and S2031B-S2033B.
  • next, the specific process of obtaining the geographic coordinates of the target in the video frame of each of the N channels of video in the foregoing step S202 will be explained in detail in conjunction with step S2021 to step S2025.
  • the target in the video frame may be determined according to the content of the N-channel video.
  • people or things that move frequently in the video may be used as the target.
  • the target can be cars, pedestrians, non-motorized vehicles, and so on.
  • if the N channels of video are of an examination room, then the target can be students, invigilators, patrol examiners, and so on. It should be understood that the above examples are only for illustration and cannot constitute a specific limitation. It should also be understood that since the number of targets contained in each video frame may be one or more, the geographic coordinates of the target in each video frame may include one or more geographic coordinates, which is not specifically limited in this application.
  • the geographic coordinates of the target in the video frame may be the geographic coordinates of the target contained in the common view area of the video frame. It is understandable that although the multiple IPCs that obtain N-channel videos capture the same geographic area, the shooting angles of different IPCs may be different, so each IPC will have a common viewing area with other IPCs.
  • the common view area refers to the area that each IPC and other IPCs can shoot
  • the non-common view area refers to the area that some IPCs can shoot but other IPCs cannot.
  • this application determines the similarity between video frames in different channels of video based on the similarity between the geographic coordinates of the targets in the video frames, and a target in the non-common-view area of one channel cannot be captured by the other IPCs, so the similarity calculated between the geographic coordinates of a target in the non-common-view area and the geographic coordinates of the targets in the video frames of the other channels is not meaningful. Therefore, in this application, the geographic coordinates of the targets in the non-common-view area may not participate in the subsequent similarity calculation.
  • if the geographic coordinates of the multiple targets are processed a second time to retain only the geographic coordinates of the targets in the common view area of each video frame, the amount of calculation of the similarity of geographic coordinates can be greatly reduced, and the processing efficiency of the multi-channel video synchronization method can be improved.
  • specifically, the trained target detection model can be used to identify the target in each video frame and obtain its pixel coordinates, and the targets are then filtered: the geographic coordinates of the targets in the non-common-view area of each channel of video are discarded, so as to obtain the geographic coordinates of the targets in the common-view area of the video frame of each channel of video. Therefore, as shown in FIG. 4, the specific process of determining the geographic coordinates of the target in the video frame in step S202 may include the following steps:
  • step S2021: Perform spatial calibration on the N cameras to obtain calibration parameters, where the calibration parameters are used to obtain the geographic coordinates corresponding to given pixel coordinates, and the calibration parameters of each camera represent the mapping relationship between the video picture shot by that camera and the geographic area being shot. It should be noted that in the same application scenario, step S2021 only needs to be performed once. After the calibration parameters are obtained, as shown in Figure 4, they are stored in the memory to be used the next time the geographic coordinates of the targets in the video frames of the same scene are calculated. It is understandable that the N channels of video can be videos shot in real time by N cameras set at fixed positions, or videos recorded by N cameras set at fixed positions; if the position or shooting angle of a camera changes, the spatial calibration needs to be performed again.
  • the calibration parameters of the N cameras can be directly obtained from other systems or devices through the network, which is not specifically limited in this application.
  • spatial calibration refers to the process of calculating the calibration parameters of the N cameras.
  • the calibration parameter refers to the mapping relationship between the video screen shot by the camera and the geographic area being shot, and specifically refers to the correspondence between the pixel coordinates of a point in the image shot by the camera and the geographic coordinates corresponding to the point.
  • the pixel coordinates of any point in the image can be converted into geographic coordinates.
  • the pixel coordinates may be the coordinates of the pixel point where the target is located in the image, and the pixel coordinates are two-dimensional coordinates.
  • the geographic coordinates may be three-dimensional coordinate values of points in a geographic area. It should be understood that in the physical world, the corresponding coordinate values of the same point in different coordinate systems are different.
  • the geographic coordinates of the target in this application can be coordinate values in any coordinate system set according to actual conditions.
  • the geographic coordinates of the target in this application can be three-dimensional coordinates composed of the longitude, latitude, and altitude corresponding to the target, or three-dimensional coordinates composed of the X, Y, and Z coordinates of the target in a natural coordinate system, or coordinates in other forms; as long as the coordinates can uniquely determine the location of a point in the geographic area, this application does not limit their form.
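  • below is a minimal sketch of applying the calibration parameters, assuming they take the form of a 3x3 image-to-ground-plane homography H; the patent does not prescribe a particular parameterization, so both the homography form and the function names are illustrative assumptions.

```python
import numpy as np

def pixel_to_geo(pixel_xy, H):
    """Map a pixel coordinate to a ground-plane geographic coordinate.

    pixel_xy: (u, v) pixel coordinate of a target's representative point.
    H: 3x3 homography from the image plane to a ground-plane coordinate
       system (one common way to realize the pixel-to-geographic mapping
       described above).
    """
    u, v = pixel_xy
    p = H @ np.array([u, v, 1.0])       # homogeneous mapping
    return p[0] / p[2], p[1] / p[2]     # X, Y on the ground plane
```

In practice such an H could be estimated once per camera from a few image points with known geographic positions, which matches the idea that spatial calibration only needs to be performed once for fixed cameras.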
  • S2022 Input each video frame of each channel of video into the target detection model, and obtain the output result image corresponding to each video frame, where the output result image includes a bounding box, and the bounding box is used to indicate the position of the target in the image.
  • the bounding box may specifically be a rectangular box, a circular box, an elliptical box, etc., which is not specifically limited in this application, and a rectangular box will be used as an example for description in the following.
  • the target detection model in FIG. 4 is a model for detecting motor vehicles, so after target detection is performed on the video frame shown in FIG. 4, all motor vehicles are selected by rectangular boxes.
  • the target detection model can be obtained by training an AI model.
  • the AI model includes multiple types.
  • the neural network model is one type of AI model. In describing the embodiments of this application, the neural network model is taken as an example. It should be understood that other AI models can also be used to complete the functions of the neural network model described in the embodiments of the present application, which is not limited in this application.
  • the neural network model is a kind of mathematical calculation model that imitates the structure and function of a biological neural network (an animal's central nervous system).
  • a neural network model can also be composed of a combination of multiple existing neural network models.
  • Neural network models with different structures can be used in different scenarios (for example: classification, recognition) or provide different effects when used in the same scenario.
  • Different neural network model structures specifically include one or more of the following: the number of network layers in the neural network model is different, the order of each network layer is different, and the weights, parameters or calculation formulas in each network layer are different.
  • some neural network models can be trained with a specific training set so that they can complete a task alone, or complete a task in combination with other neural network models (or other functional modules); some neural network models can also be used directly to complete a task alone, or to complete a task in combination with other neural network models (or other functional modules).
  • the target detection model in the embodiment of the present application can use any neural network model that has been used for target detection with good results in the industry, for example: the You Only Look Once (Yolo) model for one-stage unified real-time target detection, the Single Shot multibox Detector (SSD) model, the Region-based Convolutional Neural Network (RCNN) model, or the Fast Region-based Convolutional Neural Network (Fast-RCNN) model, etc.
  • This application is not specifically limited.
  • the Yolo model is a deep neural network (Deep Neural Network, DNN) with a convolutional structure.
  • the Yolo model places N ⁇ N grids on the picture, and predicts the target position and classifies and recognizes the target for each grid. Compared with the sliding window for target position prediction and target classification and recognition, it can greatly reduce the amount of calculation. It can realize fast target detection and recognition with high accuracy.
  • the Yolo model can include several network layers, where the convolutional layer is used to extract the features of the target in the image, and the fully connected layer is used to predict the target location and target category probability value of the target features extracted by the convolutional layer.
  • the Yolo model needs to be trained, so that the Yolo model has the function of target detection.
  • the training set includes multiple sample images.
  • Each sample image is an image containing a target (such as a motor vehicle or pedestrian).
  • Each sample image is placed with an n ⁇ n grid.
  • Each grid containing the target is marked with the position information (x 0 , y 0 , w 0 , h 0 ) of the target's bounding box and the probability value P 0 of the target category, where x 0 , y 0 are the target's
  • the offset value of the center coordinate of the bounding box relative to the center coordinate of the current grid, w 0 and h 0 are the length and width of the bounding box.
  • during training, the convolutional layer in the Yolo model extracts the features of the target in each sample image, the fully connected layer predicts the position information and category probability value of the target from the features output by the convolutional layer, and the parameters of the model are adjusted according to the difference between the predictions and the labels until the Yolo model has been trained, that is, until it has the function of target detection and can be used to detect the target in a video frame; this trained Yolo model is the target detection model used in step S2022.
  • when the Yolo model is used to detect the target in a video frame captured by the camera that contains the target, the convolutional layer extracts the features of the target in the video frame, and the fully connected layer identifies the extracted features and predicts the position information (x', y', w', h') of the bounding box of the target in the video frame to be detected and the probability value P' of the target category; the predicted bounding box of the target can be generated according to the position information (x', y', w', h'), and the category information of the target is marked according to the probability value P' of the category to which the target belongs, so that the output result image corresponding to the video frame to be detected is obtained.
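  • the text above names Yolo, SSD, RCNN and Fast-RCNN as candidate detectors; the sketch below uses torchvision's pretrained Faster R-CNN purely as one illustration of obtaining bounding boxes for the targets in a frame (the use of torchvision and its weights argument, available in recent versions, is an assumption and not part of the patent).

```python
import torch
import torchvision

# A pretrained two-stage detector from the R-CNN family named above.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_targets(frame_rgb, score_threshold=0.5):
    """Return bounding boxes [x1, y1, x2, y2] of detected targets in one frame.

    frame_rgb: H x W x 3 uint8 numpy array (RGB video frame).
    """
    tensor = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        output = model([tensor])[0]          # dict with 'boxes', 'labels', 'scores'
    keep = output["scores"] >= score_threshold
    return output["boxes"][keep].tolist()
```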
  • for example, the representative point can be determined by an object centroid detection method: a weighted maximum likelihood estimation is used to find the unique point of the target (the mass point) whose position does not change due to the rigid motion of the target, and the position of this mass point represents the position of the target in the video frame.
  • the representative point can also be determined by 3D detection: the original 2D object detection is converted into 3D object detection by means of a point cloud map, the height or depth of the object, and so on, to obtain a 3D model of the target object, and a certain location determined according to the 3D model is used as the representative point, which represents the location of the target.
  • the representative point can also be directly determined from the rectangular frame on the 2D pixel picture in conjunction with the video content.
  • for example, when the target is a motor vehicle: for a vehicle driving straight ahead, whose orientation in the picture is basically horizontal or vertical, the midpoint of the lower edge of the rectangular frame is often selected as the representative point of the target; a close-range vehicle is large in size and strongly deformed by the front-rear perspective, so the lower right corner of the rectangular frame is often selected as the representative point of the target; a distant vehicle is small in size and its rectangular frame is also very small, so the center point of the rectangular frame is often selected as the representative point of the target.
  • the pixel coordinates of the target in the video frame can be obtained according to the pixel coordinates of the representative point in the video frame.
  • as shown in FIG. 4, the video frame passes through the target detection model to obtain the output result image (including multiple rectangular boxes that select the motor vehicles), and each rectangular box in the output result image is replaced with a representative point to obtain the pixel coordinates of the targets in the video frame as shown in FIG. 4.
  • the rectangular frame and pixel coordinates shown in FIG. 4 are only for illustration and cannot constitute a specific limitation.
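  • below is a sketch of the representative-point heuristics just described; the area-ratio thresholds used to distinguish close-range and distant vehicles are illustrative assumptions, since the text states only the qualitative rules.

```python
def representative_point(box, frame_area, near_ratio=0.05, far_ratio=0.005):
    """Pick a representative pixel for a detected vehicle from its bounding box.

    box: (x1, y1, x2, y2) rectangular frame in pixel coordinates.
    frame_area: total pixel area of the video frame, used to judge apparent size.
    """
    x1, y1, x2, y2 = box
    area_ratio = (x2 - x1) * (y2 - y1) / float(frame_area)
    if area_ratio >= near_ratio:                      # large box: close-range vehicle
        return (x2, y2)                               # lower right corner
    if area_ratio <= far_ratio:                       # tiny box: distant vehicle
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)     # center of the rectangular frame
    return ((x1 + x2) / 2.0, y2)                      # default: midpoint of the lower edge
```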
  • S2024 Obtain the geographic coordinates of the target in each video frame according to the calibration parameters and the pixel coordinates of the target in each video frame.
  • according to the calibration parameters, the geographic coordinates corresponding to the pixel coordinates of the target in each video frame can be obtained; for details, please refer to the foregoing embodiments, which will not be repeated here.
  • S2025 Filter the geographic coordinates of the target in each video frame to obtain the geographic coordinates of the target in the common view area of each video frame.
  • the method further includes: determining a common view area, the common view area being an area jointly captured by the N cameras, and the common view area being part or all of the geographic area;
  • the determining of the similarity between video frames in different channels of video according to the geographic coordinates of the target in the video frame of each channel of video includes: determining the similarity between video frames in different channels of video according to the geographic coordinates of the targets in the common view area recorded in the video frames of each channel of video.
  • specifically, the shooting ranges of the N cameras corresponding to the N channels of video can be obtained by calculation, and the common view area of the N channels of video can be obtained by taking the intersection of the shooting ranges of the N cameras.
  • the shooting range can specifically refer to the geographic coordinate range corresponding to the geographic area that the camera can capture
  • the common view area of two channels of video refers to a geographic area that can be captured by both cameras corresponding to the two channels of video.
  • the common view area corresponds to a range of geographic coordinates. Therefore, it is possible to sequentially determine whether the geographic coordinates of the target in each video frame obtained in step S2024 are within the geographic coordinate range of the common-view area, and to filter out the geographic coordinates of the targets that are not in the common-view area, so as to obtain the geographic coordinates of the targets in the common-view area of each video frame.
  • One video is captured by IPC1 and the other is captured by IPC2.
  • the shooting range of IPC1 is the sector CDE, and the shooting range of IPC2 is the sector FGH. The geographic coordinates of the targets in video frame P1 of the video taken by IPC1 are A1 and A2, and the geographic coordinates of the targets in video frame P2 of the video taken by IPC2 are B1 and B2. The common view area of the two channels of video obtained by IPC1 and IPC2 can be the shaded area in Fig. 5. The geographic coordinates of the target in the common view area of video frame P1 are A2, and the geographic coordinates of the target in the common view area of video frame P2 are B2.
  • FIG. 5 is only used for illustration, and does not constitute a specific limitation.
  • if the N IPCs from which the N channels of video are obtained are IPCs at fixed positions, such as surveillance cameras at a traffic intersection, then once the common view area of a certain channel of video is obtained, it can be stored in the memory so that it can be used the next time the geographic coordinates of the targets in the video frames transmitted by the same IPCs are calculated, thereby reducing unnecessary calculations and improving the efficiency of multi-channel video synchronization.
  • the shooting range of each channel of video is the range of the geographic area recorded in the video frames captured by the corresponding IPC. Therefore, the outermost edge position points that can be displayed in the video picture of each channel of video can be determined, the pixel coordinates of each edge position point can be calculated and converted into geographic coordinates, and the area determined according to these geographic coordinates is the shooting range of that channel of video.
  • as shown in FIG. 6, the edge position points C, D, and E may be selected on the edge of a video frame P1, the pixel coordinates of the edge position points C, D, and E are obtained, the geographic coordinates corresponding to the pixel coordinates of the points C, D, and E are then determined according to the calibration parameters, and the sector CDE formed by the geographic coordinates of the points C, D, and E is the shooting range of the video to which the video frame P1 belongs.
  • Fig. 6 only uses points C, D, and E as the edge position points for illustration.
  • during specific implementation, multiple edge position points can be selected along the edge of the video frame P1; the more edge position points are used, the more accurate the obtained shooting range is, and the number of edge position points can be determined according to the processing capability of the computing device.
  • FIG. 6 is only used for illustration, and this application does not specifically limit it.
  • in another implementation, as shown in FIG. 7, each channel of video contains targets such as pedestrians and motor vehicles, and the shaded area formed by the multiple geographic coordinates of the targets recorded in a channel of video constitutes the shooting range of the corresponding IPC, for example IPC1.
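  • below is a sketch of intersecting per-camera shooting ranges and keeping only the targets inside the common view area, assuming each shooting range is approximated as a polygon of geographic corner points (for example, the sector CDE of FIG. 6) and using the shapely library; the library choice and function names are assumptions, and any point-in-polygon test would serve the same purpose.

```python
from shapely.geometry import Point, Polygon

def common_view_area(shooting_ranges):
    """Intersect the shooting ranges of all cameras.

    shooting_ranges: list of polygons, each given as a list of (X, Y)
    geographic corner points of one camera's shooting range.
    """
    area = Polygon(shooting_ranges[0])
    for corners in shooting_ranges[1:]:
        area = area.intersection(Polygon(corners))
    return area

def filter_to_common_view(target_geo_coords, area):
    """Keep only the target geographic coordinates inside the common view area."""
    return [xy for xy in target_geo_coords if area.contains(Point(xy))]
```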
  • next, the specific process of determining the similarity between video frames in different channels of video in the foregoing step S202 will be explained in detail in conjunction with step S2026 to step S2028.
  • specifically, the similarity between video frames can be determined by calculating the distance values between the geographic coordinates of the targets in each video frame and the geographic coordinates of the targets in other video frames, where the smaller the distance value, the higher the similarity, and the greater the distance value, the lower the similarity. Moreover, since there can be multiple targets in each video frame, the average distance between the geographic coordinates of the multiple targets in the video frame and the geographic coordinates of the multiple targets in another video frame can be calculated to determine the similarity between this video frame and the other video frame.
  • the specific process of calculating the similarity between the geographic coordinates in a video frame Pi of the first channel of video and a video frame Qi of the second channel of video may be as follows:
  • the distance between the geographic coordinate A1 of a target in video frame Pi and the geographic coordinate B1 of a target in video frame Qi may be the Euclidean distance between the geographic coordinates, the absolute value of the difference between the geographic coordinates, or the length of the line segment between the geographic coordinates; the specific calculation formula is not specifically limited in this application.
  • video frame Pi and video frame Qi belong to different channels of video; as shown in Figure 8, video frame Pi is a video frame of the channel A video, and video frame Qi is a video frame of the channel B video.
  • the distance D11 is the distance value between geographic coordinate A1 and geographic coordinate B1, the distance D12 is the distance value between geographic coordinate A1 and geographic coordinate B2, ..., and the distance D1w is the distance value between geographic coordinate A1 and geographic coordinate Bw.
  • D1 is the minimum value among the distances D11, D12, ..., D1w. If D1 corresponds to D11, then the target at geographic coordinate A1 in video frame Pi (for example, a motor vehicle with license plate number A10000) and the target at geographic coordinate B1 in video frame Qi are most likely the same target (the motor vehicle with license plate number A10000); similarly, if D1 corresponds to D12, then the target at geographic coordinate A1 in video frame Pi and the target at geographic coordinate B2 in video frame Qi are most likely the same target.
  • Therefore, the distance between the geographic coordinates of the same target in the video frame Pi and in the video frame Qi is calculated: the closer the distance, the more similar the geographic coordinates of the target in the video frame Pi and the target in the video frame Qi; the farther the distance, the less similar they are.
  • S2028: Determine the average value D̄ = (D1 + D2 + ... + Dw)/w of the minimum values D1, D2, ..., Dw, and obtain the similarity Si of the geographic coordinates between the video frame Pi and the video frame Qi.
  • The calculation of the average value can be the calculation method shown in FIG. 8, or another method of calculating an average, such as taking the median of the minimum values D1, D2, ..., Dw as the average value, or removing the maximum and the minimum among the minimum values D1, D2, ..., Dw and taking the mean of the remaining values as the average value, which is not specifically limited in this application. The similarity Si can then be obtained from the average value through a decreasing function, for example Si = 1/D̄, so that a smaller average distance corresponds to a larger similarity Si.
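  • As an illustrative sketch (not a definitive implementation), the distance-and-average computation of steps S2026 to S2028 can be written as follows. It uses the Euclidean distance and the arithmetic mean of the per-target minimum distances, and returns the quantity D̄ itself, which is what the later selection steps minimize over candidate frame pairs; mapping it through a decreasing function such as 1/x to obtain Si would be a one-line addition.

```python
import math

def frame_similarity(coords_p, coords_q):
    """Score between two video frames based on the geographic coordinates
    of their targets (steps S2026-S2028, illustrative sketch).

    coords_p, coords_q: lists of (x, y) geographic coordinates of the
    targets detected in frame Pi and frame Qi respectively.
    Returns the average of the per-target minimum distances D_bar; a
    smaller value means the two frames are more likely time-synchronized.
    """
    if not coords_p or not coords_q:
        return float("inf")
    minima = []
    for ax, ay in coords_p:                      # target A_k in frame Pi
        dists = [math.hypot(ax - bx, ay - by)    # distance to every B_j in Qi
                 for bx, by in coords_q]
        minima.append(min(dists))                # D_k = min_j D_kj
    return sum(minima) / len(minima)             # average of D_1 ... D_w
```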
  • Based on the method of calculating the similarity between the geographic coordinates of the targets in two video frames in steps S2026 to S2028, this application provides two methods for implementing step S203. In the first method, the synchronization frame information between every two channels of video is first determined according to the similarity between the geographic coordinates of the targets in two video frames, and the synchronization frame information between the N channels of video is then determined; this is described in detail in steps S2031A to S2034A. In the second method, the synchronization frame information between the N channels of video is determined directly according to the similarity between the geographic coordinates of the targets in two video frames; this is described in steps S2031B to S2033B below. The two methods are introduced separately below.
  • First, the first implementation of step S203, that is, the method of first determining the synchronization frame information between every two channels of video and then determining the synchronization frame information between the N channels of video, is introduced.
  • S2031A Calculate synchronization frame information between every two videos in the N videos based on the similarity between the geographic coordinates of the targets in each video frame.
  • the synchronization frame information between every two videos can be determined according to the minimum value of the similarity between the geographic coordinates of the target in each frame of every two videos.
  • The specific steps of step S2031A can be as shown in FIG. 9. FIG. 9 takes as an example a multi-channel video synchronization process in which the A channel of video includes t video frames participating in the calculation and the B channel of video also includes t video frames participating in the calculation. When the numbers of video frames in the A channel of video and the B channel of video are different, the calculation can still be performed with reference to the steps in FIG. 9, and details are not repeated here.
  • As shown in FIG. 9, the similarity of the geographic coordinates of the targets is calculated between each of the t video frames of the A channel of video and each of the t video frames of the B channel of video, and the minimum value Suv among the resulting t×t similarities is obtained, which yields the synchronization frame information of the A channel of video and the B channel of video. The synchronization frame information includes the position information of the video frame Pu of the A channel of video in video A and the position information of the video frame Qv of the B channel of video in video B; that is, the u-th video frame Pu of the A channel of video (assuming its frame number is u) and the v-th video frame Qv of the B channel of video (assuming its frame number is v) are two time-synchronized video frames, so the synchronization frame information between the A channel of video and the B channel of video in FIG. 9 is (u, v).
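  • A minimal sketch of step S2031A for two channels follows; it reuses the `frame_similarity` sketch above, so the best-matching (time-synchronized) pair is the one with the smallest score. The data layout (lists of frame identifiers plus a dictionary of per-frame target coordinates) is hypothetical.

```python
def pairwise_sync_info(frames_a, frames_b, target_coords):
    """Return (u, v): 1-based frame numbers of the best-matching
    (time-synchronized) pair between channel A and channel B.

    frames_a, frames_b: lists of frame identifiers for the two channels.
    target_coords: dict mapping a frame identifier to the list of
    geographic coordinates of the targets in that frame.
    """
    best = None
    for i, fa in enumerate(frames_a):
        for j, fb in enumerate(frames_b):
            score = frame_similarity(target_coords[fa], target_coords[fb])
            if best is None or score < best[0]:
                best = (score, i + 1, j + 1)
    _, u, v = best
    return u, v
```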
  • For example, assume the two time-synchronized video frames of the A channel of video and the B channel of video are video frame P2 and video frame Q1, that is, the A channel of video is 1 video frame faster than the B channel of video; then the synchronization frame information of the A channel of video and the B channel of video is (2, 1).
  • The two time-synchronized video frames of the B channel of video and the C channel of video are Q5 and R7 respectively, that is, the B channel of video is 2 video frames slower than the C channel of video, so the synchronization frame information of the B channel of video and the C channel of video is (5, 7).
  • Therefore, with reference to the process of calculating the synchronization frame information of the A channel of video and the B channel of video shown in FIG. 9, the first synchronization frame information (u1, v1) between the first channel of video and the second channel of video, the second synchronization frame information (u2, v2) between the second channel of video and the third channel of video, ..., and the (N−1)-th synchronization frame information (uN−1, vN−1) between the (N−1)-th channel of video and the N-th channel of video can be obtained.
  • S2032A: According to the synchronization frame information between every two channels of video in the N channels of video, determine the frame number relationship between the two time-synchronized video frames of every two channels of video.
  • Specifically, as shown in FIG. 10, it can be determined from the first synchronization frame information (u1, v1) that the first channel of video is x1 frames faster than the second channel of video (where x1 = v1 − u1), so the frame number relationship between the first channel of video and the second channel of video is (0, x1); similarly, the frame number relationship between the (N−1)-th channel of video and the N-th channel of video is (0, xN−1).
  • Still taking the above example, the synchronization frame information of the A channel of video and the B channel of video is (2, 1), so the frame number relationship between the two time-synchronized video frames of the A channel of video and the B channel of video is recorded as (0, −1); the synchronization frame information of the B channel of video and the C channel of video is (5, 7), so the frame number relationship between the two time-synchronized video frames of the B channel of video and the C channel of video is recorded as (0, 2).
  • S2033A: According to x1, x2, ..., xN−1, determine the frame number relationship (0, x1, x1+x2, ..., x1+x2+...+xN−1) between the N time-synchronized video frames of the N channels of video. Still taking the above example, the frame number relationship between the two time-synchronized video frames of the A channel of video and the B channel of video is recorded as (0, −1), and that of the B channel of video and the C channel of video is recorded as (0, 2); therefore, the frame number relationship of the three time-synchronized video frames among the A channel of video, the B channel of video, and the C channel of video is (0, −1, 1).
  • S2034A: According to the frame number relationship (0, x1, x1+x2, ..., x1+x2+...+xN−1) between the N time-synchronized video frames of the N channels of video, determine the synchronization frame information of the N channels of video.
  • The synchronization frame information of the N channels of video includes the frame numbers of the N time-synchronized video frames in their corresponding videos. Many groups of frame numbers satisfy the frame number relationship (0, x1, x1+x2, ..., x1+x2+...+xN−1), for example (1, 1+x1, 1+x1+x2, ..., 1+x1+x2+...+xN−1), or (2, 2+x1, 2+x1+x2, ..., 2+x1+x2+...+xN−1), and so on. Therefore, a group of frame numbers in which all frame numbers are positive and the sum of the frame numbers is the smallest can be selected as the synchronization frame information of the N channels of video.
  • Still taking the above example, the frame number relationship of the three time-synchronized video frames among the A channel of video, the B channel of video, and the C channel of video is (0, −1, 1), so the synchronization frame information among the A channel of video, the B channel of video, and the C channel of video can be (2, 1, 3). In other words, the time-synchronized video frames among the A channel of video, the B channel of video, and the C channel of video may be P2, Q1, and R3. It should be understood that the above examples are only for illustration and do not constitute a specific limitation.
  • It can be understood that the first method for determining the synchronization frame information of the N channels of video (step S2031A to step S2034A), that is, first determining the synchronization frame information between every two channels of video and then determining the synchronization frame information between the N channels of video, calculates the synchronization frame information between only 2 channels of video at a time, so the computing pressure on the server is very small. This makes it well suited to deployment on servers with limited computing performance, such as the edge computing all-in-one machines deployed on both sides of the road in the embodiment of FIG. 1A: the synchronization frame information of the multiple IPCs at the intersection can be calculated without occupying too many computing resources of the edge computing all-in-one machine. It should be understood that the above examples are only for illustration and do not constitute a specific limitation.
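  • An illustrative sketch of steps S2032A to S2034A under the same assumptions: the pairwise results (uk, vk) are converted to offsets xk = vk − uk, accumulated into the frame number relationship, and shifted so that all frame numbers are positive with the smallest possible sum.

```python
def n_channel_sync_info(pairwise_infos):
    """pairwise_infos: list of (u_k, v_k) for channels (k, k+1), k = 1..N-1.
    Returns the synchronization frame information of the N channels."""
    offsets = [v - u for u, v in pairwise_infos]        # x_1 ... x_{N-1}
    rel = [0]
    for x in offsets:                                   # (0, x1, x1+x2, ...)
        rel.append(rel[-1] + x)
    shift = 1 - min(rel)                                # make all numbers >= 1
    return [r + shift for r in rel]                     # smallest all-positive set

# Example from the text: A/B sync info (2, 1), B/C sync info (5, 7)
print(n_channel_sync_info([(2, 1), (5, 7)]))            # -> [2, 1, 3]
```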
  • Next, the second implementation of step S203, that is, directly determining the synchronization frame information between the N channels of video, is introduced. The specific process can be as follows:
  • S2031B: Extract one video frame from each of the N channels of video to form one video frame group, and obtain t^N video frame groups (where t is the number of video frames per channel participating in the calculation).
  • For example, as shown in FIG. 11, with 3 channels of video in which the first channel includes video frames P1 and P2, the second channel includes video frames Q1 and Q2, and the third channel includes video frames R1 and R2, there are 8 video frame groups in total: the first video frame group in FIG. 11 includes P1, Q1, and R1, the second video frame group includes P1, Q1, and R2, the third video frame group includes P2, Q1, and R1, and so on; details are not repeated here.
  • S2032B Determine the sum of the similarities between the geographic coordinates of the targets in every two video frames in each video frame group.
  • For example, as shown in FIG. 11, the first video frame group includes video frames P1, Q1, and R1, so step S2032B may first calculate the similarity S11 of the geographic coordinates of the targets between video frames P1 and Q1, the similarity S'11 between video frames P1 and R1, and the similarity S''11 between video frames Q1 and R1, and then obtain the sum Y1 = S11 + S'11 + S''11 of the similarities of the geographic coordinates of the targets between every two video frames in the first video frame group.
  • In the same way, the sums Y1, Y2, ..., Y8 of the similarities of the geographic coordinates of the targets between every two video frames in the 8 video frame groups can be obtained.
  • S2033B: Determine the synchronization frame information of the N channels of video according to the frame number of each frame in the video frame group with the smallest sum.
  • For example, as shown in FIG. 11, assume the minimum value among Y1, Y2, ..., Y8 is Y3; that is, the third video frame group (the video frames P2, Q1, and R1 in the shaded area) has the smallest sum of similarities of the geographic coordinates of the targets between every two video frames. Therefore, the three time-synchronized video frames of the three channels of video shown in FIG. 11 are P2, Q1, and R1, and the synchronization frame information is (2, 1, 1).
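  • A sketch of the second method (steps S2031B to S2033B) follows: it enumerates all t^N frame groups and picks the group whose pairwise scores sum to the smallest value, again reusing the earlier `frame_similarity` sketch and the same hypothetical data layout.

```python
from itertools import combinations, product

def direct_sync_info(channels, target_coords):
    """channels: list of N lists of frame identifiers (one list per channel).
    Returns the 1-based frame numbers of the best video frame group."""
    best_sum, best_group = float("inf"), None
    # every combination of one frame per channel: t**N groups in total
    for group in product(*[list(enumerate(ch, start=1)) for ch in channels]):
        frame_ids = [fid for _, fid in group]
        total = sum(frame_similarity(target_coords[a], target_coords[b])
                    for a, b in combinations(frame_ids, 2))
        if total < best_sum:
            best_sum, best_group = total, [num for num, _ in group]
    return best_group
```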
  • It can be understood that for servers with high computing performance, such as the cloud server in the embodiment of FIG. 1B, computing pressure is not a concern. Therefore, the second method of directly determining the synchronization frame information between the N channels of video (step S2031B to step S2033B) can reduce the calculation time of the multi-channel video synchronization method and improve the efficiency of multi-channel video synchronization.
  • It should be noted that if the first method of determining the synchronization frame information of the N channels of video (step S2031A to step S2034A) is chosen, since only the synchronization frame information between two channels of video is calculated at a time, the geographic coordinates of all the targets can be obtained when step S202 calculates the geographic coordinates of the targets in each video frame.
  • Then, when the synchronization frame information of the A channel of video and the B channel of video is calculated, the geographic coordinates of the targets in the common-view area of the A channel of video and the B channel of video are selected for the calculation.
  • Similarly, when the synchronization frame information of the A channel of video and the C channel of video is calculated, the geographic coordinates of the targets in the common-view area of the A channel of video and the C channel of video are selected for the calculation; details are not repeated here.
  • If the second method of determining the synchronization frame information of the N channels of video (step S2031B to step S2033B) is chosen, since the synchronization frame information between the N channels of video needs to be calculated at the same time, step S202 can directly obtain the geographic coordinates within the common-view area of the N channels of video when calculating the geographic coordinates of each video frame, and use the filtered geographic coordinates in the common-view area as the geographic coordinates of the targets of each video frame to calculate the synchronization frame information of the N channels of video. This application does not limit the specific order of the steps for filtering the geographic coordinates within the common-view area.
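  • As an illustrative sketch, filtering target coordinates by the common-view area (whether done per pair of channels or once for all N channels) reduces to a point-in-polygon test. The self-contained ray-casting helper below assumes the common-view area is available as a polygon of geographic vertices, for example the intersection of the shooting ranges described above.

```python
def in_polygon(point, polygon):
    """Ray-casting point-in-polygon test for a geographic coordinate."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge crosses scan line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def filter_common_view(coords, common_view_polygon):
    """Keep only the target coordinates that fall inside the common-view area."""
    return [c for c in coords if in_polygon(c, common_view_polygon)]
```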
  • In this embodiment of this application, the method further includes: performing time synchronization on the N channels of video according to the synchronization frame information, to obtain N channels of time-synchronized video.
  • In a specific implementation, the N channels of video may be real-time video streams, or may be offline videos stored locally.
  • In the case where the N channels of video are real-time video streams, N channels of time-synchronized video are obtained according to the position information of the N time-synchronized video frames in their corresponding videos, and the starting video frame of each channel of time-synchronized video is the time-synchronized video frame in that channel of video.
  • Similarly, in the case where the N channels of video are offline videos, after the synchronization frame information of the N channels of video is obtained, the video frame corresponding to each frame number in the synchronization frame information can be used as the playback starting point of each channel of video, so as to obtain N channels of synchronized video.
  • For example, the synchronization frame information of the four channels of video shown in FIG. 12A, calculated through step S201 to step S203, is (3, 5, 1, 8); that is, the four time-synchronized video frames of the four channels of video are respectively the 3rd video frame of the first channel, the 5th video frame of the second channel, the 1st video frame of the third channel, and the 8th video frame of the fourth channel.
  • Therefore, as shown in FIG. 12B, if the four channels of video shown in FIG. 12A are real-time video streams, the starting video frame of the first channel of video can be set to the 3rd video frame, the starting video frame of the second channel of video to the 5th video frame, the starting video frame of the third channel of video to the 1st video frame, and the starting video frame of the fourth channel of video to the 8th video frame, thereby obtaining 4 channels of synchronized real-time video streams.
  • FIGS. 12A to 12B are only used for illustration, and cannot constitute a specific limitation.
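  • Applying synchronization frame information such as (3, 5, 1, 8) amounts to dropping the leading frames of each channel up to its synchronization frame; a sketch for offline videos or buffered streams, with hypothetical frame data and 1-based frame numbers as in the example above:

```python
def align_channels(channels, sync_info):
    """channels: list of N lists of frames; sync_info: e.g. (3, 5, 1, 8).
    Returns the N channels trimmed so that each starts at its sync frame."""
    return [frames[start - 1:] for frames, start in zip(channels, sync_info)]

# Four hypothetical channels of dummy frames, as in FIG. 12A/12B
channels = [[f"ch{c}_f{i}" for i in range(1, 11)] for c in range(1, 5)]
aligned = align_channels(channels, (3, 5, 1, 8))
print([ch[0] for ch in aligned])   # ['ch1_f3', 'ch2_f5', 'ch3_f1', 'ch4_f8']
```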
  • the method further includes: sending the synchronization frame information to other devices; or, sending the N channels of time-synchronized videos to other devices.
  • It can be understood that, with reference to the embodiment of FIG. 2, for scenarios such as panoramic video production and target detection that require processing based on multiple time-synchronized video frames, after the calculation unit 120 obtains the synchronization frame information of the N channels of video, the output unit 130 may directly send the synchronization frame information to the processing system or processing device according to the application scenario, so that the processing system or processing device can obtain multiple images of the same geographic area shot at the same moment according to the N time-synchronized video frames, and perform panoramic image production or image recognition processing based on the multiple images.
  • If the application scenario is real-time synchronized playback of surveillance video, the output unit 130 may obtain multiple channels of synchronized video according to the synchronization frame information and then send them to the display screen of the monitoring center, so that the monitoring center can directly display synchronized real-time monitoring.
  • In summary, because the multi-channel video synchronization method provided in this application performs multi-channel video synchronization according to the video content, it does not require any additional hardware such as acquisition equipment or video capture devices; it is applicable to IPCs of any model, manufacturer, parameters, and timestamp, and to any network delay condition and transmission protocol, so the overall versatility and robustness of the solution are better. In addition, because the multiple channels of synchronized video are obtained by calculating the synchronization frame information of the multiple channels of video, the obtained synchronized videos are synchronized at the video frame level, so the application scenarios are broader: the method is applicable not only to second-level application scenarios, such as synchronized video display in a monitoring center, but also to frame-level application scenarios, such as panoramic video production, video splicing, and target detection.
  • the present application also provides a multi-channel video synchronization system 100 as shown in FIG. 2, and the multi-channel video synchronization system 100 is used to perform the aforementioned multi-channel video synchronization method.
  • This application does not limit the division of functional units in the multi-channel video synchronization system, and each unit in the multi-channel video synchronization system can be added, reduced, or combined as needed.
  • Fig. 2 exemplarily provides a division of functional units: the multi-channel video synchronization system 100 includes an input unit 110, a calculation unit 120, and an output unit 130, among which,
  • the input unit 110 is configured to obtain N channels of video, where the N channels of video are obtained by shooting a geographic area by N cameras, and the N is an integer not less than 2.
  • the calculation unit 120 is configured to obtain the geographic coordinates of the targets in the video frames of each of the N channels of video, and determine the similarity between video frames in different channels of video according to the geographic coordinates of the targets in the video frames of each channel of video;
  • the calculation unit 120 is configured to obtain synchronization frame information according to the similarity between the video frames in the different channels of videos, where the synchronization frame information is used to synchronize the time of the videos taken by the N cameras,
  • the synchronization frame information includes position information of N time-synchronized video frames in the corresponding video.
  • Optionally, the N channels of video are video streams, and the system further includes an output unit 130 configured to obtain N channels of time-synchronized video according to the position information of the N time-synchronized video frames in their corresponding videos, where the starting video frame of each channel of time-synchronized video is the time-synchronized video frame in that channel of video.
  • the output unit 130 is further configured to send the synchronization frame information to other devices; or, the output unit is further configured to send the N channels of time-synchronized videos to other devices.
  • the calculation unit 120 is configured to input the video frames of each channel of video into the target detection model to obtain the pixel coordinates of the targets in the video frames of each channel of video; the calculation unit 120 is configured to determine the geographic coordinates of the targets in the video frames of each channel of video according to the pixel coordinates of the targets in the video frames of each channel of video and the calibration parameters of the camera corresponding to each channel of video, where the calibration parameters of a camera are used to indicate the mapping relationship between the video picture shot by the camera and the geographic area being shot.
  • the calculation unit 120 is configured to determine a common-view area, the common-view area is an area jointly captured by the N cameras, and the common-view area is part or all of the geographic area;
  • the calculation unit 120 is configured to determine the similarity between video frames in different channels of video according to the geographic coordinates of the target in the common view area recorded in the video frames of each channel of video.
  • the calculation unit 120 is configured to calculate the distance between the geographic coordinates of the targets in the video frames of each channel of video and the geographic coordinates of the targets in the video frames of the other channels of video, and to determine the similarity between video frames of different channels of video according to the distance.
  • the input unit 110 and the calculation unit 120 in the multi-channel video synchronization system 100 are used to execute steps S201 to S203 and optional steps of the foregoing method.
  • the calculation unit 120 is configured to execute the aforementioned method steps S2021-step S2028, step S2031A-step S2034A, step S2031B-step S2033B and optional steps thereof.
  • each unit included in the multi-channel video synchronization system 100 can be a software unit, a hardware unit, or a part of a software unit and a part of a hardware unit.
  • the electronic device 1300 may be the multi-channel video synchronization system 100 in the foregoing content.
  • the electronic device 1300 includes a processor 1310, a communication interface 1320, and a memory 1330.
  • the processor 1310, the communication interface 1320, and the memory 1330 are connected to each other through an internal bus 1340.
  • the electronic device 1300 may be an electronic device in the cloud environment shown in FIG. 1B, or an electronic device in the edge environment shown in FIG. 1A.
  • the processor 1310, the communication interface 1320, and the memory 1330 may be connected by a bus, or may be communicated by other means such as wireless transmission.
  • the embodiment of the present application takes the connection through a bus 1340 as an example, where the bus 1340 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus 1340 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 13, but it does not mean that there is only one bus or one type of bus.
  • the processor 1310 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (Programmable Logic Device, PLD), or a combination thereof.
  • the aforementioned PLD may be a complex programmable logic device (CPLD), a field programmable logic gate array (Field-Programmable Gate Array, FPGA), a general array logic (Generic Array Logic, GAL), or any combination thereof.
  • the processor 1310 executes various types of digital storage instructions, such as software or firmware programs stored in the memory 1330, which enables the computing device 1300 to provide a wide variety of services.
  • the processor 1310 may include a computing unit and an output unit.
  • The computing unit may implement processing functions by calling program code in the memory 1330, including the functions described for the calculation unit 120 in FIG. 2, for example, obtaining the geographic coordinates of the targets in the video frames of each of the N channels of video, or determining the similarity between video frames in different channels of video according to the geographic coordinates of the targets in the video frames of each channel of video; it can specifically be used to perform steps S2021 to S2028, steps S2031A to S2034A, steps S2031B to S2033B of the foregoing method and their optional steps, as well as the other steps described in the embodiments of FIG. 3 to FIG. 12B, which are not repeated here.
  • The output unit can also implement processing functions by calling program code in the memory 1330, including the functions described for the output unit 130 in FIG. 2, such as obtaining N channels of time-synchronized video according to the synchronization frame information of the N channels of video, sending the synchronization frame information to other devices, or sending the N channels of time-synchronized video to other devices; it can also be used to perform the other steps described in the embodiments of FIG. 3 to FIG. 12B, which are not repeated here.
  • The memory 1330 may include a volatile memory (Volatile Memory), such as a random access memory (Random Access Memory, RAM); the memory 1330 may also include a non-volatile memory (Non-Volatile Memory), such as a read-only memory (Read-Only Memory, ROM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD), or a solid-state drive (Solid-State Drive, SSD); the memory 1330 may also include a combination of the above types.
  • the memory 1330 may store application program codes and program data.
  • the program code can be the code to calculate the common view area of N-channel video, the code to calculate the geographic coordinates of the target in each frame, the code to calculate the synchronization frame information, etc.
  • The program data can be the calibration parameters, the geographic coordinate range of the common-view area, and so on. The memory can also be used in performing the other steps described in the embodiments of FIG. 3 to FIG. 12B, which are not repeated here.
  • The communication interface 1320 may be an internal interface (such as a high-speed serial computer expansion bus (Peripheral Component Interconnect express, PCIe) bus interface), a wired interface (such as an Ethernet interface), or a wireless interface (such as a cellular network interface or a wireless local area network interface), and is used to communicate with other devices or modules.
  • FIG. 13 is only a possible implementation of the embodiment of the present application.
  • the electronic device may also include more or fewer components, which is not limited here.
  • the content that is not shown or described in the embodiments of the present application please refer to the relevant descriptions in the embodiments described in FIG. 3 to FIG. 12B, and details are not repeated here.
  • the electronic device shown in FIG. 13 may also be a computer cluster composed of multiple computing nodes, which is not specifically limited in this application.
  • An embodiment of the present application also provides a computer-readable storage medium that stores instructions; when the instructions run on a processor, the method flows shown in FIG. 3 to FIG. 12B are implemented.
  • An embodiment of the present application also provides a computer program product; when the computer program product runs on a processor, the method flows shown in FIG. 3 to FIG. 12B are implemented.
  • the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware or any other combination.
  • the above-mentioned embodiments may be implemented in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a high-density digital video disc (Digital Video Disc, DVD)), or a semiconductor medium.
  • the semiconductor medium may be an SSD.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

本申请提供了一种多路视频同步的方法。所述方法包括:获取N路视频,所述N路视频由N个摄像机对一个地理区域进行拍摄获得,所述N为不小于2的整数;获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度;根据所述不同路视频中的视频帧之间的相似度,获得同步帧信息,其中,所述同步帧信息用于对所述N个摄像机拍摄的视频进行时间同步,所述同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。

Description

多路视频同步的方法、系统及设备 技术领域
本申请涉及人工智能(Artificial Intelligence,AI)领域,尤其涉及多路视频同步的方法、系统及设备。
背景技术
随着网络摄像机(IP Camera/Network Camera,IPC)的发展创新,IPC逐渐广泛地应用于多个领域,如教育、商业、医疗、公共事业等。对于某个需要被监控的场景,通常存在多个不同视角的IPC对该场景进行多路视频监控。对多路监控同一地理区域的视频进行运用时,常常有严格的时间同步性要求。举例来说,对于多目识别的场景中,如果使用由不同拍摄角度的多个IPC发送的多路视频,来识别各个目标(如车辆、非机动车和行人),那么需要采用上述不同角度的多个IPC拍摄并传输的多路视频中的同一时刻的多路视频的视频帧进行目标的识别,否则将影响目标识别结果的精度。再例如,对于全景视频制作的场景中,通过拼接不同拍摄角度的多个IPC发送的视频,获得全景视频,那么接收到的上述不同角度的多个IPC的视频也需要是时间同步的多路视频,否则拼接后的全景视频将产生图像模糊、运动鬼影等缺陷。因此,为了保证后续视频处理的准确度,需要保证多路视频的时间同步性。
但是,每个IPC由于型号、厂家难以统一,每个IPC的时间戳是不同的,加之网络传输存在时延等问题,往往会出现多个IPC发送的多路视频时间不同步的情况,进一步导致后续的目标识别、全景视频制作等以多路视频为输入源进行视频处理的过程存在障碍。
发明内容
本申请提供了一种多路视频同步的方法、系统及设备,该方法可以解决由于多路视频的时间不同步导致的视频处理存在障碍的问题。
第一方面,提供了一种多路视频同步的方法,该方法包括以下步骤:
获取N路视频,该N路视频由N个摄像机对一个地理区域进行拍摄获得,N为不小于2的整数;
获取该N路视频中的每路视频的视频帧中的目标的地理坐标,并根据每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度;
根据不同路视频中的视频帧之间的相似度,获得同步帧信息,其中,同步帧信息用于对所述N个摄像机拍摄的视频进行时间同步,同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。
上述方法中,通过计算N路视频中的每路视频的视频帧中的目标的地理坐标之间的相似度,确定N路视频的同步帧信息,该同步帧信息可以用于生成多路同步视频,也可以用于获取多个时间同步的视频帧,从而解决由于多路视频的时间不同步导致的视频处理存在障碍的问题。
在第一方面的一种可能的实现方式中,该N路视频为视频流,上述方法还包括:根据N个时间同步的视频帧在对应的视频中的位置信息,获得N路时间同步的视频,每路时间同步的视频的起始视频帧为该路视频中的时间同步的视频帧。
上述实现方式中,根据同步帧信息将N路视频流调整为N路时间同步的视频流之后,可以将N路时间同步的视频流发送至显示设备进行N路同步视频的显示,比如监控中心或者演播室的显示屏幕,使其可以直接显示同步播放的实时监控。
在第一方面的一种可能的实现方式中,上述方法还包括:发送同步帧信息至其他设备;或者,发送N路时间同步的视频至其他设备。
上述实现方式使得本申请可以根据不同的应用场景,分别将同步后的N路视频或同步帧信息发送至所需的处理系统或处理设备,不但可以适用于需要N路同步后的视频进行显示的监控中心、演播室等应用场景,而且可以适用于帧级别的应用场景,比如全景视频制作、视频拼接和目标检测。还可以适用于全景视频制作、视频拼接和目标检测等需要对N个时间同步的视频帧进行处理的应用场景,因此应用场景非常广泛。
在第一方面的一种可能的实现方式中,获取N路视频中的每路视频的视频帧中的目标的地理坐标,包括:将每路视频的视频帧输入目标检测模型,获得每路视频的视频帧中的目标的像素坐标;根据每路视频的视频帧中的目标的像素坐标和所述每路视频对应的摄像机的标定参数,确定每路视频的视频帧中的目标的地理坐标,其中,摄像机的标定参数用于指示该摄像机拍摄的视频画面和被拍摄的地理区域的映射关系。
具体地,可以先对所述N个摄像头进行空间标定;然后对每路视频的每个视频帧输入目标检测模型,获得所述每个视频帧对应的输出结果图像,其中,所述输出结果图像中包含边界框,所述边界框用于指示目标在图像中的位置;接着根据所述每个视频帧对应的输出结果图像,获得所述每个视频帧中的目标的像素坐标;最后根据所述标定参数以及所述每个视频帧中的目标的像素坐标,获得所述每个视频帧中的目标的地理坐标。
在第一方面的一种可能的实现方式中,根据每路视频的视频帧中的目标的地理坐标,确定不同路视频的视频帧之间的相似度,包括:计算每路视频的视频帧中的目标的地理坐标与其他路视频的视频帧中的目标的地理坐标之间的距离;根据距离确定不同路视频的视频帧之间的相似度。
具体地,计算第一路视频中的视频帧P i与第二路视频中的视频帧Q i之间地理坐标的相似度的具体流程可以包括:首先,确定视频帧P i中的目标A 1的地理坐标与视频帧Q i中的目标B 1,B 2,…,B W的地理坐标之间的距离D 11,D 12,…,D 1w,计算视频帧P中的目标A 2的地理坐标与视频帧Q i中的目标B 1,B 2,…,B W的地理坐标之间的距离D 21,D 22,…,D 2w,…,计算视频帧P i中的目标A w的地理坐标与视频帧Q i中的目标B 1,B 2,…,B W的地理坐标之间的距离D w1,D w2,…,D ww,其中,所述视频帧P i与视频帧Q i是不同路视频中的视频帧;其次,获取距离D 11,D 12,…,D 1w之间的最小值D 1,获取距离D 21,D 22,…,D 2w之间的最小值D 2,…,获取距离D w1,D w2,…,D ww之间的最小值D w;最后,确定所述最小值D 1,D 2,…,D w之间的平均值
D̄=(D 1+D 2+…+D w)/w，
获得视频帧P i与视频帧Q i之间的地理坐标的相似度S i
上述实现方式中,通过对视频帧进行目标检测,获得每个视频帧中的目标的像素坐标,再根据标定参数获得每个帧中的目标的地理坐标,从而根据视频帧中的目标的地理坐标确定不同路视频的视频帧之间的相似度,进而确定N路视频的同步帧信息。因此, 确定N路视频的同步帧信息的整体流程无需额外布置任何的采集设备、视频捕捉装置等硬件装置,不对IPC的类型、网络环境以及传输协议进行限制,方案整体通用性和鲁棒性更好,可以完全以软件的形式进行部署,且可以适用于帧级别的应用场景。
在第一方面的一种可能的实现方式中,上述方法还包括:确定共视区域,该共视区域为所述N个摄像机共同拍摄到的区域,该共视区域为所述地理区域的部分或全部;根据每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度,包括:根据所述每路视频的视频帧中记录的共视区域中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度。
上述实现方式通过对每个视频帧中的目标的地理坐标进行二次处理,筛选出每一个视频帧的共视区域中目标的地理坐标,可以大大减少地理坐标相似度的计算量,提高获取多路视频同步方法的处理效率。
在第一方面的一种可能的实现方式中,根据不同路视频中的视频帧之间的相似度,获得同步帧信息包括:基于多个视频帧中的目标的地理坐标之间的相似度,计算所述N路视频中每两路视频之间的同步帧信息;根据每两路视频之间的同步帧信息,确定每两路视频的时间同步的两个帧的帧号关系;根据每两路视频的时间同步的两个帧的帧号关系,确定N路视频的时间同步的N个视频帧之间的帧号关系;根据N路视频的时间同步的N个视频帧之间的帧号关系,确定N路视频的同步帧信息。
上述实现方式通过确定每两路视频之间的同步帧信息,确定每两路视频之间的帧号关系,进而确定N路视频之间的帧号关系,从而得到N路视频之间的同步帧信息。由于每次计算2路视频之间的同步帧信息,对服务器的计算压力很小,因此非常适合部署于于计算性能不高的服务器。比如,部署于道路两侧的边缘计算一体机,可以在不占用边缘计算一体机过多的计算资源的情况下,计算出路口多个IPC的同步帧信息。
在第一方面的另一种可能的实现方式中,根据不同路视频中的视频帧之间的相似度,获得同步帧信息包括:从所述N路视频中的每路视频中抽取一个视频帧组成1个视频帧组,获得t N个视频帧组;确定所述每一个视频帧组中每两个视频帧中的目标的地理坐标之间相似度的和;根据所述和最小的一个视频帧组中每个帧的帧号,确定所述N路视频的同帧信息。
上述实现方式根据不同路视频中的视频帧之间的相似度,确定N路视频的同步帧信息,适用于计算性能较高的服务器,比如云服务器,可以减少多路视频同步方法的计算时间,提高多路视频同步的效率。并且,通过计算多路视频的同步帧信息获得多路同步视频,获得的多路同步视频为视频帧级别同步的多路视频,因此应用场景更广泛,不仅可以适用于秒级别的应用场景,比如监控中心的视频同步显示,还可以适用于帧级别的应用场景,比如全景视频制作、视频拼接和目标检测。
第二方面,提供了一种多路视频同步系统,其特征在于,所述系统包括输入单元以及计算单元,其中,
所述输入单元用于获取N路视频,所述N路视频由N个摄像机对一个地理区域进行拍摄获得,所述N为不小于2的整数;
所述计算单元用于获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相 似度;
所述计算单元用于根据所述不同路视频中的视频帧之间的相似度,获得同步帧信息,其中,所述同步帧信息用于对所述N个摄像机拍摄的视频进行时间同步,所述同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。
在第二方面的一种可能的实现方式中,所述N路视频为视频流,所述系统还包括输出单元,所述输出单元用于根据所述N个时间同步的视频帧在对应的视频中的位置信息,获得N路时间同步的视频,每路时间同步的视频的起始视频帧为所述每路视频中的时间同步的视频帧。
在第二方面的一种可能的实现方式中,所述输出单元还用于发送所述同步帧信息至其他设备;或者,所述输出单元还用于发送所述N路时间同步的视频至其他设备。
在第二方面的一种可能的实现方式中,所述计算单元用于将所述每路视频的视频帧输入目标检测模型,获得所述每路视频的视频帧中的目标的像素坐标;所述计算单元用于根据所述每路视频的视频帧中的目标的像素坐标和所述每路视频对应的摄像机的标定参数,确定所述每路视频的视频帧中的目标的地理坐标,其中,摄像机的标定参数用于指示该摄像机拍摄的视频画面和被拍摄的所述地理区域的映射关系。
在第二方面的一种可能的实现方式中,所述计算单元用于确定共视区域,所述共视区域为所述N个摄像机共同拍摄到的区域,所述共视区域为所述地理区域的部分或全部;所述计算单元用于根据所述每路视频的视频帧中记录的所述共视区域中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度。
在第二方面的一种可能的实现方式中,所述计算单元用于计算所述每路视频的视频帧中的目标的地理坐标与其他路视频的视频帧中的目标的地理坐标之间的距离;所述计算单元用于根据所述距离确定不同路视频的视频帧之间的相似度。
第三方面,提供了一种计算机程序产品,包括计算机程序,当所述计算机程序被计算设备读取并执行时,实现如第一方面所描述的方法。
第四方面,提供了一种计算机可读存储介质,包括指令,当所述指令在计算设备上运行时,使得所述计算设备实现如第一方面描述的方法。
第五方面,提供了一种电子设备,包括处理器和存储器,所述处理器执行所述存储器中的代码实现如第一方面描述的方法。
附图说明
为了更清楚地说明本申请实施例的技术方法,下面将对实施例中所需使用的附图作以简单地介绍。
图1A为本申请提供的一种多路视频同步系统的部署示意图;
图1B是本申请提供的另一种多路视频同步系统的部署示意图;
图2是本申请提供的一种多路视频同步系统的结构示意图;
图3是本申请提供的一种多路视频同步方法流程示意图;
图4是本申请提供的一种获取多个视频帧中目标的地理坐标的方法流程示意图;
图5是本申请提供的一应用场景下的两路视频的共视区域的示意图;
图6是本申请提供的一种拍摄范围的获取方法流程示意图;
图7是本申请提供的另一种拍摄范围的获取方法的流程示意图;
图8是本申请提供的一种获取两个视频帧中目标的地理坐标之间的相似度的流程示意图;
图9是本申请提供的一种获取两路视频的同步帧信息的流程示意图;
图10是本申请提供的一种获取N路视频的同步帧信息的流程示意图;
图11是本申请提供的另一种多获取N路视频的同步帧信息的流程示意图;
图12A-12B是本申请提供的一应用场景下根据同步帧信息获得N路同步视频的流程示意图;
图13是本申请提供的一种电子设备的结构示意图。
具体实施方式
本申请的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。
随着城市智能、交通智能等浪潮的兴起,网络摄像机(IP Camera/Network Camera,IPC)的应用越来越广泛,IPC成为了重要的信息采集设备,通过IPC拍摄的视频可以更及时地获知某一场景发生的事件。对于某个需要被监控的场景,通常存在多个不同视角的IPC进行多路视频监控,获得多路视频。对多路监控同一地理区域的视频进行运用时,常常有严格的时间同步性要求。时间同步的多路视频中同一时刻的视频帧描述的是同一时刻的场景。举例来说,IPC1和IPC2在对同一路口进行拍摄时,IPC1获取的视频中T1时刻的视频帧为行人的右脚刚好踩在斑马线上的画面,如果IPC2获取的视频中T1时刻的视频帧不是该行人的右脚刚好踩在斑马线上的画面,而是行人还没有踩在斑马线上的画面,或者行人已经双脚踩在斑马线上的画面,那么IPC1和IPC2是两路时间不同步的视频。应理解,上述举例仅用于说明,并不能构成具体限定。
但是,由于获取多路视频的多个IPC的型号、厂家、时间戳、视频帧的帧率可能是不相同的,网络传输时延也可能会导致传输过程中出现个别视频丢帧,IPC本身的计算性也可能较差导致个别视频丢帧,使得多个IPC发送的多路视频难以保证时间的同步性。举例来说,IPC1和IPC2是同一个路口的两个监控视频,IPC1由于在T1时刻的对闯红灯的车辆进行了抓拍,使得IPC1传输的实时视频流中抓拍时刻T1对应的视频帧丢失,IPC2没有进行抓拍,也没有出现丢帧的情况,因此处理系统接收到的IPC1和IPC2发送的实时视频流在T1时刻起,IPC2比IPC1快1帧,进一步导致处理系统根据接收到的多路视频进行的目标识别、全景视频制作等视频处理的过程存在障碍。
为了解决上述由于多路视频时间不同步导致以多路视频为输入源进行视频处理的过程存在障碍的问题,本申请提供了一种多路视频同步系统。可以根据多路视频中每个视频的视频帧的内容,进行多路视频的同步帧信息的计算,从而获得时间同步的多路视频。
本申请提供的多路视频同步系统的部署灵活,可部署在边缘环境,具体可以是边缘环境中的一个边缘计算设备或运行在一个或者多个边缘计算设备上的软件系统。边缘环境指在地理位置上距离用于采集多路视频的IPC较近的,用于提供计算、存储、通信资源的边缘计算设备集群,比如位于道路两侧的边缘计算一体机。举例来说,如图1A所示, 多路视频同步系统可以是距离路口较近的位置的一台边缘计算一体机或者是运行在距离路口较近的位置的边缘计算一体机的软件系统。该路口中设置有IPC1和IPC2两个网络摄像头对该路口进行监控,每个IPC可以将路口的实时视频流通过网络发送至所述多路视频同步系统,所述多路视频同步系统可以执行本申请提供的多路视频同步的方法,计算出所述多路视频的同步帧信息,所述同步帧信息可以用于多路IPC的矫正,或者监控播放平台的同步播放,或者用于全景视频制作、多目检测等等,多路视频同步系统可以根据应用场景将同步帧信息发送至相应的处理系统中。
本申请提供的多路视频同步系统还可以部署在云环境,云环境是云计算模式下利用基础资源向用户提供云服务的实体。云环境包括云数据中心和云服务平台,所述云数据中心包括云服务提供商拥有的大量基础资源(包括计算资源、存储资源和网络资源),云数据中心包括的计算资源可以是大量的计算设备(例如服务器)。多路视频同步系统可以是云数据中心的服务器;检测装置也可以是创建在云数据中心中的虚拟机;还可以是部署在云数据中心中的服务器或者虚拟机上的软件系统,该软件系统可以分布式地部署在多个服务器上、或者分布式地部署在多个虚拟机上、或者分布式地部署在虚拟机和服务器上。例如:如图1B所示,多路视频同步系统部署在云环境中,该路口中设置有IPC1和IPC2两个网络摄像头对该路口进行监控,每个IPC可以将路口的实时视频流通过网络发送至所述多路视频同步系统,所述多路视频同步系统可以执行本申请提供的多路视频同步的方法,计算出所述多路视频的同步帧信息,所述同步帧信息可以用于多路IPC的矫正,或者监控播放平台的同步播放,或者全景视频制作、多目检测等等,多路视频同步系统可以根据应用场景将同步帧信息发送至相应的处理系统中,接收同步帧信息的处理系统也可以部署在云环境中、边缘环境中,或者部署在终端设备中。
多路视频同步系统内部的单元模块也可以有多种划分,各个模块可以是软件模块,也可以是硬件模块,也可以部分是软件模块部分是硬件模块,本申请不对其进行限制。图2为一种示例性的划分方式,如图2所示,多路视频同步系统100包括输入单元110、计算单元120和输出单元130。下面分别介绍每个功能单元的功能。
输入单元110用于接收N路视频,并将N路视频输入至计算单元120。具体地,所述输入单元110可以用于获取N路视频,所述N路视频由N个摄像机对一个地理区域进行拍摄获得,所述N为不小于2的整数。具体实现中,N路视频可以是同一个角度的多个IPC拍摄同一个地理区域获得的多个视频,也可以是不同角度的多个IPC拍摄同一个地理区域获得的多个视频。并且,N路视频可以是监控现场的IPC输入的多个直播视频,也可以是本地文件或者云存储服务器中读取的离线视频,本申请不作进行具体限定。场景可以是任意一个需要将监控目标区域的多个IPC回传的视频调整为同步播放的场景,比如交通路口、银行、小区、医院、数据中心、学校、考场、演播室等场景,本申请也不对此进行具体限定。
计算单元120用于对N路视频进行处理,获得N路视频的同步帧信息。具体地,所述计算单元120用于对每路视频的视频帧中的目标进行检测,获得所述每路视频的视频帧中的目标的地理坐标,所述目标的地理坐标指示所述目标在所述地理区域的位置;所述计算单元120用于根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度;所述计算单元120用于根据所述不同路视频中的视频帧之间 的相似度,获得所述N路视频的同步帧信息,其中,所述同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。这里,N个时间同步的视频帧描述的是同一时刻发生的场景,N个时间同步的视频帧在对应的视频中的位置信息可以是N个时间同步的视频帧在对应视频中的帧号。
输出单元130可以直接将所述同步帧信息组传输至不同应用场景的处理系统中,也可以根据所述同步帧信息对N路视频进行处理获得时间同步的N路同步视频后,再将所述N路同步视频传输至相应的处理系统中。具体地,所述输出单元130用于根据所述同步帧信息对所述N路视频进行时间同步,获得时间同步后的N路视频。所述N路视频为实时视频流,所述输出单元用于根据所述N个时间同步的视频帧在对应的视频中的位置信息,确定所述每路视频中的时间同步的视频帧为视频的起始视频帧,获得时间同步后的N路视频。所述输出单元130用于发送所述同步帧信息至其他设备或系统;或者,所述输出单元130用于发送所述时间同步后的N路视频至其他设备或系统。举例来说,如果应用场景为多路IPC的同步矫正,那么输出单元130可以直接将同步帧信息返回给每个IPC,使得每个IPC根据同步帧信息调整自身的输出视频时序;如果应用场景为监控视频的实时同步播放,那么输出单元130可以根据同步帧信息获得多路同步视频后,再将其发送至监控中心的显示屏幕,使得监控中心可以直接显示同步播放的实时监控;如果应用场景为目标检测的场景,那么输出单元130可以直接将同步帧信息发送至目标检测的服务器,使其根据同步帧信息确定N个时间同步的视频帧,对所述N个时间同步的视频帧进行目标检测。应理解,上述举例仅用于说明,并不能构成具体限定。
本申请提供的多路视频同步系统,根据视频内容进行多路视频同步,无需额外布置任何的采集设备、视频捕捉装置等硬件装置,不对IPC的类型、网络环境以及传输协议进行限制,方案整体通用性和鲁棒性更好。并且,本申请提供的多路视频同步系统通过计算多路视频的同步帧信息获得多路同步视频,获得的多路同步视频为视频帧级别同步的多路视频,因此应用场景更广泛,不仅可以适用于秒级别的应用场景,比如监控中心的视频同步显示,还可以适用于帧级别的应用场景,比如全景视频制作、视频拼接和目标检测。
下面结合附图,对本申请提供的上述多路视频同步系统如何对多路视频进行同步矫正,获得多路同步视频的过程,进行详细介绍。
如图3所示,本申请提供了一种多路视频同步的方法,所述方法包括以下步骤:
S201:获取N路视频,其中,所述N路视频由N个摄像机对一个地理区域进行拍摄获得,所述N为不小于2的整数。
在一种实施例中,所述N路视频中的每路视频包括多个视频帧。可以理解的,如果同时对过多的视频帧进行计算,将会导致计算量过大,降低多路视频同步的处理效率。因此,每次进行多路视频同步时,可以根据历史同步记录和视频帧率确定多路视频同步时每一路视频的视频帧数量。举例来说,假设N个帧率为12每秒传输视频帧数(Frames Per Second,FPS)的摄像头,即每秒传输12个视频帧,而在历史计算帧率为12FPS的N路视频的视频同步过程中,确定每路视频最多比另一路视频快1秒,为了确保N路12FPS的视频可以同步直播,那么可以每两秒执行一次多路视频同步方法,也就是说,每路视频 包括的视频帧数量为12×2=24。应理解,上述举例仅用于说明,本申请不对此进行限定。
S202:获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度。
在一实施例中,所述获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,包括:将所述每路视频的视频帧输入目标检测模型,获得所述每路视频的视频帧中的目标的像素坐标;根据所述每路视频的视频帧中的目标的像素坐标和所述每路视频对应的摄像机的标定参数,确定所述每路视频的视频帧中的目标的地理坐标,其中,摄像机的标定参数用于指示该摄像机拍摄的视频画面和被拍摄的所述地理区域的映射关系。该步骤的具体内容将在后文的步骤S2021-S2025进行具体描述。
在另一实施例中,可以直接通过网络从其他系统或者设备中获取所述每路视频的视频帧中的目标的地理坐标。换句话说,向其他系统或者设备发送所述每路视频的视频帧,由其他系统或这边对所述每路视频的视频帧进行目标检测,获得所述每路视频的视频帧中的目标的像素坐标后,根据标定参数确定每路视频的视频帧中的目标的地理坐标。
在一实施例中,所述根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频的视频帧之间的相似度,包括:计算所述每路视频的视频帧中的目标的地理坐标与其他路视频的视频帧中的目标的地理坐标之间的距离;根据所述距离确定不同路视频的视频帧之间的相似度。该步骤的具体内容将在后文的步骤S2026-S2028进行具体描述。
S203:根据所述不同路视频中的视频帧之间的相似度,获得所述N路视频的同步帧信息,其中,所述同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。应理解,N个时间同步的视频帧描述的是同一时刻发生的场景,N个时间同步的视频帧在对应的视频中的位置信息可以包括N个时间同步的视频帧在对应视频中的帧号,N个时间同步的视频帧分别属于不同路的视频。其中,帧号指的是将每路视频中的多个视频帧按时间顺序排列成一个帧序列后,依次对帧序列中的每个视频帧进行编号,即为每个视频帧的帧号,比如,A路视频的第一个视频帧的帧号为1,第二个视频帧的帧号为2,依次类推,或者,第一个视频帧的帧号为0,第二个视频帧的帧号为1,依次类推。上述举例仅用于说明,并不能构成具体限定。
举例来说,三路视频A、B以及C的3个时间同步的视频帧可以是视频A的第2个帧,该帧对应的帧号为2,视频B的第3个帧,该帧对应的对应的帧号为3,以及视频C的第4个帧,该帧对应的帧号为4,那么该三路视频A、B以及C的同步帧信息可以是(2,3,4)。换句话说,视频A比视频B快1帧,视频C比视频B慢1帧。可以理解的,同步帧信息可以用于供IPC调整自身的输出视频时序,也可以用于获得多路同步视频,具体可参考图2实施例中对输出单元130的描述,这里不再进行赘述。应理解,上述举例仅用于说明,并不能构成具体限定。该步骤的具体内容将在后文的步骤S2031A-S2034A以及步骤S2031B-S2033B进行具体描述。
下面结合步骤S2021-步骤S2025,对前述步骤S202中获取所述N路视频中的每路视频的视频帧中的目标的地理坐标的具体流程进行详细地解释说明。
本申请实施例中,视频帧中的目标可以是根据N路视频的内容决定的,通常情况下,可以将视频中经常移动的人物或事物作为目标。举例来说,如果N路视频是交通路口的 监控视频,那么目标可以是汽车、行人、非机动车等等。如果N路视频是考场,那么目标可以是学生、监考老师、巡考考官等等。应理解,上述举例仅用于说明,并不能构成具体限定。应理解,由于每个视频帧包含的目标的数量可以是1个或者多个,因此每个视频的视频帧中的目标的地理坐标可以包括1个或者多个地理坐标,本申请不对此进行具体限定。
进一步地,视频帧中的目标的地理坐标可以是该视频帧的共视区域中包含的目标的地理坐标。可以理解的,获取N路视频的多个IPC虽然拍摄的是同一地理区域,但是不同的IPC的拍摄角度可能不同,因此每个IPC与其他IPC将存在一个共视区域。其中,共视区域指的是每个IPC与其他IPC都可以拍摄的区域,非共视区域指的是一些IPC可以拍摄到但是另一些IPC无法拍摄到的区域。参考前述内容可知,本申请根据视频帧中的目标的地理坐标之间的相似度确定不同路视频中的视频帧之间的相似度,而非共视区域中的目标是其他IPC无法拍摄到的目标,导致非共视区域中的目标的地理坐标与接下来计算视频帧中的目标的地理坐标之间的相似度无关系,因此,对于本申请来讲,非共视区域中的目标的地理坐标可以不参与接下来的相似度计算。如果对多个目标的地理坐标进行二次处理,筛选出每一个视频帧的共视区域中目标的地理坐标,可以大大减少地理坐标相似度的计算量,提高获取多路视频同步方法的处理效率。
具体实现中,可以通过训练好的目标检测模型,分别识别出每个视频帧中的目标后,根据目标基于该视频帧的像素坐标,获得每个视频帧中的目标的地理坐标之后,再筛选掉每路视频的非共视区域内的目标的地理坐标,从而获得所述每路视频的视频帧中的共视区域中的目标的地理坐标。因此,如图4所示,步骤S202中确定视频帧中的目标的地理坐标的具体流程可以包括以下步骤:
S2021:对所述N个摄像头进行空间标定,获取标定参数,其中,所述标定参数用于根据像素坐标获得所述像素坐标对应的地理坐标,每个摄像机的标定参数表示每个摄像机拍摄的视频画面和被拍摄的地理区域的映射关系。需要说明的,同一个应用场景下,步骤S2021只需要执行一次空间标定的过程,在获取标定参数后,如图4所示,标定参数将存储于存储器中,以便下一次计算同场景下的视频帧中的目标的地理坐标时使用。可以理解的,N路视频可以是设置于固定位置N个摄像头拍摄得到视频,或者是通过设置于固定位置的N个摄像头记录的视频,因此,当拍摄的角度发生变化,需要重新进行空间标定,获得角度更改后的标定参数。或者,在另一种实施例中,可以直接通过网络从其他系统或者设备中获取该N个摄像头的标定参数,本申请不作具体限定。
其中,空间标定是指计算所述N个摄像头的标定参数的过程。标定参数是指摄像机拍摄的视频画面和被拍摄的地理区域的映射关系,具体指的是摄像头拍摄的图像中的点的像素坐标与该点对应的地理坐标之间的对应关系。而根据所述标定参数可以将图像中任一点的像素坐标转换为地理坐标。其中,像素坐标可以是图像中目标所在位置的像素点的坐标,像素坐标是二维坐标。地理坐标可以是一个地理区域中的点的三维坐标值。应理解,物理世界中,同一个点在不同的坐标系下其对应的坐标值是不同的。本申请中目标的地理坐标可以是根据实际情况设定的任意坐标系下的坐标值,例如,本申请中目标的地理坐标可以是目标对应的经度、纬度和海拔组成的三维坐标,也可以是目标对应的自然坐标系下的X坐标、Y坐标和Z坐标组成的三维坐标,还可以是其它形式的坐标, 只要该坐标可以唯一确定一个点在地理区域中的位置,本申请不作限定。
S2022:将每路视频的每个视频帧输入目标检测模型,获得所述每个视频帧对应的输出结果图像,其中,所述输出结果图像中包含边界框(Bounding Box),所述边界框用于指示目标在图像中的位置,边界框具体可以是矩形框、圆形框、椭圆框等,本申请不作具体限定,下文将统一以矩形框为例进行说明。
具体地,某一视频帧输入目标检测模型后,对应的输出结果图像可以如图4所示。可以理解的,图4中的目标检测模型是用于检测机动车的模型,因此图4中所示的视频帧在进行目标检测后,全部的机动车都被矩形框框选出来。需要说明的,目标检测模型可以是由一种AI模型被训练后得到的,AI模型包括多种多类,神经网络模型为AI模型中的一类,在描述本申请实施例时,以神经网络模型为例。应理解,还可以使用其他AI模型完成本申请实施例描述的神经网络模型的功能,本申请不对此作任何限定。其中,神经网络模型是一类模仿生物神经网络(动物的中枢神经系统)的结构和功能的数学计算模型。一个神经网络模型也可以由多个已有的神经网络模型组合构成。不同结构的神经网络模型可用于不同的场景(例如:分类、识别)或在用于同一场景时提供不同的效果。神经网络模型结构不同具体包括以下一项或多项:神经网络模型中网络层的层数不同、各个网络层的顺序不同、每个网络层中的权重、参数或计算公式不同。业界已存在多种不同的用于识别或分类等应用场景的具有较高准确率的神经网络模型,其中,一些神经网络模型可以被特定的训练集进行训练后单独完成一项任务或与其他神经网络模型(或其他功能模块)组合完成一项任务。一些神经网络模型也可以被直接用于单独完成一项任务或与其他神经网络模型(或其他功能模块)组合完成一项任务。
具体实现中,本申请实施例中的目标检测模型可采用业界已有的用于目标检测具有较优效果的神经网络模型中的任意一种,例如:一阶段统一实时目标检测(You Only Look Once:Unified,Yolo)模型、单镜头多盒检测器(Single Shot multi box Detector,SSD)模型、区域卷积神经网络(Region ConvolutioNal Neural Network,RCNN)模型或快速区域卷积神经网络(Fast Region Convolutional Neural Network,Fast-RCNN)模型等。本申请不作具体限定。
下面以Yolo模型为例,对步骤S2022进行阐述说明。
Yolo模型是一种带有卷积结构的深度神经网络(Deep Neural Network,DNN)。Yolo模型通过在图片上放置N×N的网格,并对每个网格进行目标位置的预测和目标分类识别,相比滑动窗口进行目标位置的预测和目标分类识别,可以大大减少计算量,能够实现高准确率的快速目标检测与识别。具体实现中,Yolo模型可以包含若干网络层,其中卷积层用于提取图像中目标的特征,全连接层用于对卷积层提取的目标特征预测目标位置和目标类别概率值。
首先需要对Yolo模型进行训练,以使得Yolo模型具备目标检测的功能。在进行训练时,首先获取多个训练集,训练集中包括多个样本图像,每个样本图像为包含目标(比如机动车或者行人)的图像,每个样本图像放置有n×n的网格,每个包含目标的网格中标注有目标的边界框的位置信息(x 0,y 0,w 0,h 0)和目标所属类别的概率值P 0,其中,x 0,y 0是目标的边界框的中心坐标相对于当前网格中心坐标的偏移值,w 0和,h 0为边界框的长宽大小。其次,将yolo模型的参数初始化,然后输入训练集的样本图像至Yolo模型,Yolo 模型中的卷积层对每一个样本中的目标进行特征提取,全连接层对卷积层输出的目标的特征进行识别,预测出该目标在图像中的边界框的位置信息(x,y,w,h)以及目标所述类别的概率值P;将预测得到边界框的位置信息(x,y,w,h)与样本标注的边界框的位置信息(x 0,y 0,w 0,h 0)进行比对,将预测得到的目标所属类别的概率值P于样本标注的目标所述类别的概率值P 0进行比对,并计算出损失函数,利用计算得到的损失函数调整Yolo模型中的参数。迭代执行上述计算过程,直到损失函数值收敛之后,且计算得到的损失函数值小于预设阈值时,则停止迭代,此时,Yolo模型已经训练完毕,即已经具备目标检测的功能,可以用来进行检测视频帧中的目标,该Yolo模型即为步骤S2022中使用的目标检测模型。
得到训练完成的Yolo模型之后,利用该Yolo模型对摄像机拍摄的包含目标的待检测的视频帧进行目标检测,卷积层提取视频帧中的目标的特征,全连接层对该目标的特征进行检测识别,预测出该目标在待测视频帧中的边界框的位置信息(x’,y’,w’,h’)以及目标所属类别的概率值P’,根据位置信息(x’,y’,w’,h’)可以生成预测后的目标的边界框,根据目标所属类别的概率值P’将该目标的类别信息也标注出来,即可获得待检测的视频帧对应的输出结果图像。
S2023:根据所述每个视频帧对应的输出结果图像,获得所述每个视频帧中的目标的像素坐标。
可以理解的,如图4所示,某一视频帧在输入目标检测模型后,获得的输出结果图像中的目标将被矩形框框选出,因此可以将每个矩形框框选出的目标,用一个表示点代替,从而获得所述目标的像素坐标。
具体实现中,表示点可以是通过物体质心检测方法确定的,根据已被矩形框框选出目标的视频帧以及其他无线传感器反馈的信息,通过最大似然估计的加权等方法,检测到该目标不因其刚性运动而改变位置的唯一点(质点),并以质点位置代表其在视频帧中的位置。
表示点还可以是通过3D检测确定的,通过点云图、新增物体高度或深度等方法,将原有的2D物体检测转换为3D物体检测,得到目标物体的3D建模,根据其3D模型确定其中某一位置作为表示点,以此表示点代表目标的位置。
表示点还可以是结合视频内容,直接通过2D像素画面上矩形框确定的。举例来说,当目标为机动车时,直行车辆由于水平或垂直方向基本保持一致,因此常选择矩形框下沿的中点作为该目标的表示点;近景车辆由于尺寸较大且前后透视形变,因此常选择矩形框右下角点作为该目标的表示点;远景车辆由于尺寸较小,矩形框也很小,因此常选择矩形框的中心点作为该目标的表示点。
应理解,上述列举的多个表示点的获取方法仅用于举例说明,还可以使用其他方法获得矩形框框的表示点,本申请并不对此作具体限定。
需要说明的,将每个矩形框用表示点代替后,根据该表示点在视频帧中的像素坐标,即可获得该视频帧中的目标的的像素坐标。例如图4所示,视频帧经过目标检测模型获得输出结果图像(包含多个框选出机动车的矩形框),将该输出结果图像中的每个矩形框用一个表示点代替,即可获得以如图4所示的,该视频帧中的目标的像素坐标。应理解,图4显示的矩形框和像素坐标仅用于举例说明,不能构成具体限定。
S2024:根据所述标定参数以及所述每个视频帧中的目标的像素坐标,获得所述每个视频帧中的目标的地理坐标。
参考前述内容可知,如图4所示,根据步骤S2021获得的标定参数A以及视频帧中的目标的像素坐标,即可获取视频帧中的目标的像素坐标对应的地理坐标,具体步骤可以参考前述实施例,这里不再进行赘述。
S2025:对所述每个视频帧中的目标的地理坐标进行筛选,得到每个视频帧的共视区域内的目标的地理坐标。
在本申请实施例中,所述方法还包括:确定共视区域,所述共视区域为所述N个摄像机共同拍摄到的区域,所述共视区域为所述地理区域的部分或全部;所述根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度,包括:根据所述每路视频的视频帧中记录的所述共视区域中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度。具体实现中,可以通过计算获得该N路视频的N个摄像头的拍摄范围,取N个摄像头的拍摄范围的交集获得N路视频的共视区域。其中,拍摄范围具体可以指的是摄像头能够拍摄到的地理区域对应的地理坐标范围,两路视频的共视区域指的是该两路视频对应的两个摄像头都能够拍摄到的一个地理区域对应的地理坐标范围。因此可以依次通过确定步骤S2024获得的每个视频帧中的目标的地理坐标是否处于共视区域的地理坐标范围内,将不处于共视区域内的目标的地理坐标筛选掉,从而得到每个视频帧的共视区域内的目标的地理坐标。
例如图5所示,假设N=2,共两路视频,一路视频由IPC1拍摄得到,另一路视频由IPC2得到,且IPC1的拍摄范围是扇形CDE,IPC2的拍摄范围是扇形FGH,由IPC1拍摄的一路视频中的视频帧P 1中的目标的地理坐标为A 1和A 2,由IPC2拍摄的另一路视频中的视频帧P 2中的目标的地理坐标为B 1和B 2,那么IPC1和IPC2获取的两路视频的共视区域可以是图5的阴影区域,经过步骤S2025后,视频帧P 1的共视区域内的目标的地理坐标为A 2,视频帧P 2的共视区域内的目标的地理坐标为B 2。应理解,图5仅用于举例说明,并不能构成具体限定。并且,由于获取N路视频的N个IPC是固定位置的IPC,比如交通路口的监控摄像头,因此当获取某路视频的共视区域后,可以将其存储在存储器中,以便下一次计算同一个IPC传输的视频的每一视频帧的目标的地理坐标时使用,从而减小不必要的计算量,提高多路视频同步的效率。
可以理解的,由于获取每路视频的IPC是固定的,因此每路视频的拍摄范围也是固定不变的,每路视频的拍摄范围是每个IPC拍摄的视频帧中记录的地理区域的范围。因此,可以通过确定每路视频的视频画面中能够显示的最边缘的边缘位置点,计算每个边缘位置点的像素坐标,并将其转换为地理坐标后,根据这些地理坐标组成的区域确定该路视频的拍摄范围。举例来说,如图6所示,可以在视频帧P 1上选取边缘位置点C、D和E后,获取每个边缘位置点C、D和E的像素坐标,再根据标定参数确定点C、D和E的像素坐标对应的地理坐标,由点C、D和E的地理坐标组成的扇形CDE即为视频帧P 1的拍摄范围。可以理解的,图6仅用点C、D以及E作为边缘位置点进行了举例说明,具体实现中,视频帧P1可以在视频边缘处选择多个边缘位置点,边缘位置点的数量越多,获得的拍摄范围越精确,因此可以根据计算设备的处理能力确定边缘位置点的数量。图6仅用于举例说明,本申请不作具体限定。
可以理解的,由于每路视频均包含目标,比如行人、机动车等等,因此,还可以通过统计每路视频的每一个视频帧中的目标的地理坐标集合,根据该地理坐标集合组成的空间轮廓,确定该路视频对应的拍摄范围。举例来说,如图7所示,可以在分别获取视频帧P 1,P 2,…,P m中目标的地理坐标后,多个地理坐标形成的阴影部分面积组成了IPC1的拍摄范围,也就是图7所示的扇形CDE。应理解,图7仅用于举例说明,本申请不作具体限定。
下面结合步骤S2026-步骤S2028,对前述步骤S202中确定不同路视频中的视频帧之间的相似度的具体流程进行详细地解释说明。
在本申请实施例中,可以通过计算每个视频帧中的目标的地理坐标与其他视频帧中的目标的地理坐标之间的距离值,来确定视频帧之间的相似度。其中,距离值越大,相似度越高,距离值越小,相似度越低。并且,由于每个视频帧中的目标可以是多个,因此可以通过计算视频帧中的多个目标的地理坐标,与其他视频帧中的多个目标的地理坐标之间的距离的平均值,确定该视频帧与其他视频帧之间的相似度。
如图8所示,计算两个视频帧中的目标的地理坐标的相似度的具体步骤可以如图8所示,其中,视频帧P i的目标包括A 1,A 2,…,A w,视频帧Q i的目标包括B 1,B 2,…,B w。需要说明的,一般来说,N个时间同步的视频帧中的目标的数量应该是相同的,但是,特殊情况下比如目标检测模型漏检一个或者多个目标,N个时间同步的视频帧中的目标的数量也可能不同。因此,计算两个视频帧中的目标的地理坐标的相似度时,两个视频帧中的目标的数量可能相同,也可能不同。图8以两个视频帧的目标数量相同(都是w个目标)为例进行了举例说明,两个视频帧中目标数量不同的情况下,也可以参考图8中的步骤S2026-步骤S2028进行计算,这里不再展开赘述。
计算第一路视频中的视频帧P i与第二路视频中的视频帧Q i之间地理坐标的相似度的具体步骤可以如下:
S2026:确定视频帧P i中的目标A 1的地理坐标与视频帧Q i中的目标B 1,B 2,…,B W的地理坐标之间的距离D 11,D 12,…,D 1w,计算视频帧P中的目标A 2的地理坐标与视频帧Q i中的目标B 1,B 2,…,B W的地理坐标之间的距离D 21,D 22,…,D 2w,…,计算视频帧P i中的目标A w的地理坐标与视频帧Q i中的目标B 1,B 2,…,B W的地理坐标之间的距离D w1,D w2,…,D ww,其中,所述视频帧P i与视频帧Q i是不同路视频中的视频帧。
具体实现中,视频帧P i的地理坐标A 1与视频帧Q i的地理坐标B 1之间的距离可以是地理坐标之间的欧氏矩阵,也可以是绝对值距离,还可以是地理坐标之间线段长度,具体的计算公式本申请不作具体限定。并且,视频帧P i和视频帧Q i属于不同路的视频,如图8所示,视频帧P i是A路视频中的某一视频帧,视频帧Q i是B路视频中的某一视频帧。
S2027:获取距离D 11,D 12,…,D 1w之间的最小值D 1,获取距离D 21,D 22,…,D 2w之间的最小值D 2,…,获取距离D w1,D w2,…,D ww之间的最小值D w
可以理解的,距离D 11是地理坐标A 1和地理坐标B 1之间的距离值,距离D 12是地理坐标A 1和地理坐标B 2之间的距离值,…,距离D 1w是地理坐标A1和地理坐标B w之间的距离值,因此,D 11,D 12,…,D 1w之间的最小值D 1如果是D 11,那么视频帧P i中的地理坐标A 1对应的目标(比如车牌号为A10000的机动车),与视频帧Q i中地理坐标B 1对应的目标(车 牌号为A10000的机动车)是相同的目标的可能性最大,同理,如果D 1是D 12,那么视频帧P i中的地理坐标A 1对应的目标,与视频帧Q i中地理坐标坐标B 2对应的目标是同一个目标的而可能性最大。因此计算同一个目标在视频帧P i和视频帧Q i中的地理坐标之间的距离,如果距离越近,则代表视频帧P i中的目标和视频帧中的目标Q i的地理坐标越相似,如果距离越远,则代表视频帧P i中的目标和视频帧中的目标Q i的地理坐标越不相似。
S2028:确定所述最小值D 1,D 2,…,D w之间的平均值
D̄=(D 1+D 2+…+D w)/w，
获得视频帧P i与视频帧Q i之间的地理坐标的相似度S i
其中,平均值的计算可以是如图8所示的计算方法,还可以是其他计算平均数的方法,比如将最小值D 1,D 2,…,D w之间的中值作为平均值,将最小值D 1,D 2,…,D w之间的最大值和最小值去除后,剩余值的均值作为平均值等等,本申请不作具体限定。可以理解的,相似度S i与平均值D̄之间的关系为S i=f(D̄),其中,y=f(x)为递减函数,即,平均值D̄越小,相似度S i越大,具体实现中,y=f(x)可以是y=f(x)=1/x,即S i=1/D̄,或者根据经验设定的其他递减函数,本申请不作具体限定。
下面对前述步骤S203进行详细地解释说明。
基于步骤S2026-步骤S2028计算两个视频帧的目标的地理坐标之间的相似度的方法,本申请提供了两种实现步骤S203的方法,第一种方法是先根据两个视频帧中的目标的地理坐标之间的相似度,确定两路视频之间的同步帧信息,再确定N路视频之间的同步帧信息,下文将在步骤S2031A-S2034A进行详细描述;另一种方法是直接根据两个视频帧中的目标的地理坐标之间的相似度,确定N路视频之间的同步帧信息,下文将在步骤S2031B-S2033B进行描述。下面分别对两种方法进行介绍。
首先对步骤S203的第一种实现方法,即先确定每两路视频之间的同步帧信息,再确定N路视频之间的同步帧信息的方法进行介绍。
S2031A:基于所述每个视频帧中的目标的地理坐标之间的相似度,计算所述N路视频中每两路视频之间的同步帧信息。
具体地,可以根据每两路视频的每个帧中的目标的地理坐标之间的相似度的最小值,确定每两路视频之间的同步帧信息。举例来说,步骤S2031A的具体步骤可以如图9所示,其中,图9以本次多路视频同步过程中,A路视频中包括t个视频帧参与计算、B路视频包括t个视频帧参与计算为例进行了举例说明,A路视频和B路视频中视频帧数量不同的情况下,也可以参考图9中的步骤进行计算,这里不再展开赘述。
如图9所示,首先,计算A路视频的第一个视频帧P 1分别与B路视频的t个视频帧Q 1,Q 2,…,Q t之间的目标的地理坐标的相似度S 11,S 12,…,S 1t,计算A路视频的第二个视频帧P 2分别与B路视频的t个视频帧Q 1,Q 2,…,Q t的目标的地理坐标之间的相似度S 21,S 22,…,S 2t,…,计算A路视频的第t个视频帧分别与B路视频的t个视频帧Q 1,Q 2,…,Q t的目标的地理坐标之间的相似度S t1,S t2,…,S tt,获得A路视频与B路视频的t×t个相似度。具体实现中,计算两个视频帧的目标的地理坐标之间的相似度的具体方法可以参考图8实施例中的步骤S2026-S2028,这里不再进行赘述。
最后,获取所述t×t个相似度中的最小值S uv,获得所述A路视频和B路视频的同步帧信息。其中,所述同步帧信息包括A路视频的视频帧P u在视频A中的位置信息和B路 视频的视频帧Q v在视频B中的位置信息,也就是说,A路视频中的第u个视频帧P u(假设帧号为u)与B路视频的第v个视频帧Q v(假设帧号为v)是时间同步的两个视频帧,因此图9中A路视频以及B路视频之间的同步帧信息为(u,v),举例来说,假设A路视频和B路视频的2个时间同步的视频帧为视频帧P 2和视频帧Q 1,也就是说,A路视频比B路视频快1视频帧,那么A路视频和B路视频的同步帧信息为(2,1)。B路视频和C路视频的2个时间同步的视频帧分别为Q 5和R 7,也就是说,B路视频比C路视频慢2视频帧,因此B路视频和C路视频的同步帧信息为(5,7)。
因此,参照图9所示的计算A路视频和B路视频的同步帧信息的过程,可以获得第1路视频和第2路视频之间的第1个同步帧信息(u 1,v 1),所述第2路视频和第3路视频之间的第2个同步帧信息(u 2,v 2),…,第N-1路视频和第N路视频之间的第N-1个同步帧信息(u N-1,v N-1)。
S2032A:根据所述N路视频中每两路视频之间的同步帧信息,确定所述N路视频中每两路视频的时间同步的两个帧的帧号关系。
具体地,如图10所示,可以根据第1个同步帧信息(u 1,v 1)确定第1路视频比第2路视频快x 1帧(其中,x 1=v 1-u 1),获得第1路视频和第二路视频之间的帧号关系为(0,x 2),根据第2个同步帧信息确定第2路视频比第3路视频快x 2帧(其中,x 2=v 2-u 2),获得第1路视频和第二路视频之间的帧号关系为(0,x 2),…,根据第2个同步帧信息确定第2路视频比第3路视频快x N-1帧(其中,x N-1=v N-1-u N-1),获得第N-1路视频和第N路视频之间的帧号关系为(0,x N-1)。仍以上述例子为例,A路视频和B路视频的同步帧信息为(2,1),那么A路视频和B路视频的2个时间同步的视频帧之间的帧号关系记为(0,-1)。B路视频和C路视频的同步帧信息为(5,7),那么B路视频和C路视频的2个时间同步的视频帧之间的帧号关系记为(0,2)。
S2033A:根据所述x 1,x 2,…,x N-1确定所述N路视频的时间同步的N个视频帧之间的帧号关系(0,x 1,x 1+x 2,…,x 1+x 2+…+x N-1)。仍以上述例子为例,A路视频和B路视频的2个时间同步的视频帧之间的帧号关系记为(0,-1),B路视频和C路视频的2个时间同步的视频帧之间的帧号关系记为(0,2),因此,A路视频、B路视频和C路视频之间的3个时间同步的视频帧的帧号关系为(0,-1,1)。
S2034A:根据所述N路视频的时间同步的N个视频帧之间的帧号关系(0,x 1,x 1+x 2,…,x 1+x 2+…+x N-1),确定所述N路视频的同步帧信息。
参考前述内容可知,N路视频的同步帧信息包括N个时间同步的视频帧在对应视频中的帧号,而满足帧号关系(0,x 1,x 1+x 2,…,x 1+x 2+…+x N-1)的帧号可以有很多,比如(1,1+x 1,1+x 1+x 2,…,1+x 1+x 2+…+x N-1),也可以是帧号(2,2+x 1,2+x 1+x 2,…,2+x 1+x 2+…+x N-1)等等,因此可以选择帧号全部都是正数、且帧号之和最小的一组帧号,作为N路视频的同步帧信息。仍以上述例子为例,A路视频、B路视频和C路视频之间的3个时间同步的视频帧的帧号关系为(0,-1,1),那么A路视频、B路视频和C路视频之间的同步帧信息可以为(2,1,3)。也就是说,A路视频、B路视频和C路视频之间的时间同步的视频帧可以是P 2、Q 1和R 3。应理解,上述举例仅用于说明,并不能构成具体限定。
可以理解的,上述第一种确定N路视频的同步帧信息的方法(步骤S2031A-步骤S2034A),即先确定每两路视频之间的同步帧信息,再确定N路视频之间的同步帧信息的 方法,该方法每次只计算2路视频之间的同步帧信息,对服务器的计算压力很小,因此非常适合部署于于计算性能不高的服务器,比如图1A实施例中部署于道路两侧的边缘计算一体机,该种同步帧信息的计算方法的计算压力很小,可以在不占用边缘计算一体机过多的计算资源的情况下,计算出路口多个IPC的同步帧信息。应理解,上述举例仅用于说明,并不能造成具体限定。
其次,对步骤S203的第二种实现方法,即直接确定N路视频之间的同步帧信息进行介绍。具体流程可以如下:
S2031B:从所述N路视频中的每路视频中抽取一个视频帧组成1个视频帧组,获得t N个视频帧组。
举例来说,如图11所示,如果是3路视频,第一路视频包括视频帧P 1和P 2,第二路视频包括视频帧Q 1和Q 2,第三路视频包括视频帧R 1和R 2,那么视频帧组共有8个,比如,图11中的第一个视频帧组包括P 1、Q 1和R 1,第二个视频帧组包括视频帧P 1、Q 1和R 2,第三个视频帧组包括视频帧P 2、Q 1和R 1等等,这里不展开赘述。
S2032B:确定所述每一个视频帧组中每两个视频帧中的目标的地理坐标之间相似度的和。
举例来说,如图11所示,第一个视频帧组中包括视频帧P 1、Q 1和R 1,因此步骤S1102可以先分别计算视频帧P 1和Q 1之间的目标的地理坐标的相似度S 11、视频帧P 1和R 1之间的地理坐标的相似度S' 11以及视频帧Q 1和R 1之间的目标的地理坐标的相似度S" 11,获得第一个视频帧组中每两个视频帧之间的目标的地理坐标的相似度的和Y 1=S 11+S' 11+S" 11。同理,可以获得8个视频帧组中每两个视频帧之间的目标的地理坐标的相似度的和Y 1,Y 2,…,Y 8
S2033B:根据所述和最小的一个视频帧组中每个帧的帧号,确定所述N路视频的同帧信息。
举例来说,如图11所示,假设Y 1,Y 2,…,Y 8中的最小值为Y 3,也就是说,第三个视频帧组(阴影区域中的视频帧P 2、Q 1和R 1)中每两个视频帧之间的目标的地理坐标的相似度的和最小,因此图11所示的三路视频的3个时间同步的视频帧为P 2、Q 1和R 1,同步帧信息为(2,1,1)。
可以理解的,对于一些计算性能较高的服务器,比如图1B实施例中的云服务器,由于云服务器的计算力很高,可以不考虑计算压力的问题。因此,第二种直接确定N路视频之间的同步帧信息的方法(步骤S2031B-步骤S2033B),可以减少多路视频同步方法的计算时间,提高多路视频同步的效率。
需要说明的,如果选择第一种确定N路视频的同步帧信息的方法(步骤S2031A-步骤S2034A),由于每次只计算两路视频之间的的同步帧信息,那么步骤S202计算每个视频帧中的目标的地理坐标时,可以获取全部目标的地理坐标,当计算A路视频和B路视频的的同步帧信息时,选择A路视频和B路视频的共视区域内的目标的地理坐标进行计算,当计算A路视频和C路视频的的同步帧信息时,选择A路视频和C路视频的共视区域内的目标的地理坐标进行计算,这里不在进行赘述。
如果选择第二种确定N路视频的同步帧信息的方法(步骤S2031B-步骤S2033B),由于需要同时计算N路视频之间的同步帧信息,那么步骤S202计算每个视频帧的地理坐标 时,可以直接获取N路视频的共视区域内的地理坐标,将筛选后的公式区域内的地理坐标系作为每一个视频帧的目标的地理坐标,进行N路视频的同步帧信息的计算,本申请不对共视区域筛选地理坐标的具体步骤顺序进行限定。
参考图2实施例可知,在不同的应用场景下,有的处理系统需要根据同步帧信息进行处理,有的需要根据时间同步的视频进行处理。因此,在本申请实施例中,所述方法还包括:根据所述同步帧信息对所述N路视频进行时间同步,获得时间同步后的N路视频。
具体实现中,由于N路视频可以是实时视频流,也可以是本地存储的离线视频。在所述N路视频为实时视频流的情况下,根据所述N个时间同步的视频帧在对应的视频中的位置信息,获得N路时间同步的视频,每路时间同步的视频的起始视频帧为所述每路视频中的时间同步的视频帧。同理,在所述N路视频为离线视频的情况下,在获得N路视频的同步帧信息后,可以将同步帧信息中每一个帧号对应的视频帧,作为每一路视频的播放起点,从而获得N路同步视频。例如,经过步骤S201-步骤S203计算得到图12A所示的四路视频的同步帧信息为(3,5,1,8),也就是说,四路视频的4个时间同步的视频帧分别为第一路的第3个视频帧、第二路的第5个视频帧、第三路的第1个视频帧以及第四路的第8个视频帧。因此,如图12B所示,如果图12A所示的四路视频为实时视频流,那么可以将第一路视频的起始视频帧确认为第3个视频帧,第二路视频的起始视频帧确认为第5个视频帧,第三路视频从的起始视频帧确认为第1个视频帧,第四路视频的起始视频帧确认为第8个视频帧,从而得到4路同步的实时视频流。同理,如果图12A所示的四路视频为离线视频,那么可以第一路视频的第3个视频帧,第二路视频的第5个视频帧,第三路视频的第1个视频帧,第四路视频的第8个视频帧,作为每一路视频的播放起点,从而获得N路同步视频。理解的,图12A-图12B仅用于举例说明,并不能构成具体限定。
在本申请实施例中,所述方法还包括:发送所述同步帧信息至其他设备;或者,发送所述N路时间同步的视频至其他设备。可以理解的,参考图2实施例可知,如果是全景视频制作以及目标检测等需要根据多个时间同步的视频帧进行处理的场景,在计算单元102获得N路视频的同步帧信息后,输出模块103可以根据应用场景,直接将同步帧信息发送至处理系统或者处理设备,使得所述处理系统或者处理设备可以根据所述N路时间同步的视频帧获得同一时刻拍摄同一地理区域的多个图像,并根据所述多个图像进行全景图像制作或者图像识别处理。如果应用场景为监控视频的实时同步播放,那么输出单元130可以根据同步帧信息获得多路同步视频后,再将其发送至监控中心的显示屏幕,使得监控中心可以直接显示同步播放的实时监控。
综上可知,本申请提供的多路视频同步的方法,由于根据视频内容进行多路视频同步,无需额外布置任何的采集设备、视频捕捉装置等硬件装置,适用于任何型号、厂家、参数、时间戳的IPC,适用于任何网络延迟状况、传输协议的通信环境,方案整体通用性和鲁棒性更好。并且,由于通过计算多路视频的同步帧信息获得多路同步视频,获得的多路同步视频为视频帧级别同步的多路视频,因此应用场景更广泛,不仅可以适用于秒级别的应用场景,比如监控中心的视频同步显示,还可以适用于帧级别的应用场景,比如全景视频制作、视频拼接和目标检测。
上述详细阐述了本申请实施例的方法,为了便于更好的实施本申请实施例上述方案,相应地,下面还提供用于配合实施上述方案的相关设备。
本申请还提供如图2所示的一种多路视频同步系统100,该多路视频同步系统100用于执行前述多路视频同步方法。本申请对该多路视频同步系统中的功能单元的划分不做限定,可以根据需要对该多路视频同步系统中的各个单元进行增加、减少或合并。图2示例性地提供了一种功能单元的划分:多路视频同步系统100包括输入单元110、计算单元120和输出单元130,其中,
所述输入单元110用于获取N路视频,其中,所述N路视频由N个摄像机对一个地理区域进行拍摄获得,所述N为不小于2的整数。
所述计算单元120用于获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度;
所述计算单元120用于根据所述不同路视频中的视频帧之间的相似度,获得同步帧信息,其中,所述同步帧信息用于对所述N个摄像机拍摄的视频进行时间同步,所述同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。
可选地,所述N路视频为视频流,所述系统还包括输出单元130,所述输出单元130用于根据所述N个时间同步的视频帧在对应的视频中的位置信息,获得N路时间同步的视频,每路时间同步的视频的起始视频帧为所述每路视频中的时间同步的视频帧。
可选地,所述输出单元130还用于发送所述同步帧信息至其他设备;或者,所述输出单元还用于发送所述N路时间同步的视频至其他设备。
可选地,所述计算单元120用于将所述每路视频的视频帧输入目标检测模型,获得所述每路视频的视频帧中的目标的像素坐标;所述计算单元120用于根据所述每路视频的视频帧中的目标的像素坐标和所述每路视频对应的摄像机的标定参数,确定所述每路视频的视频帧中的目标的地理坐标,其中,摄像机的标定参数用于指示该摄像机拍摄的视频画面和被拍摄的所述地理区域的映射关系。
可选地,所述计算单元120用于确定共视区域,所述共视区域为所述N个摄像机共同拍摄到的区域,所述共视区域为所述地理区域的部分或全部;所述计算单元120用于根据所述每路视频的视频帧中记录的所述共视区域中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度。
可选地,所述计算单元120用于计算所述每路视频的视频帧中的目标的地理坐标与其他路视频的视频帧中的目标的地理坐标之间的距离;所述计算单元120用于根据所述距离确定不同路视频的视频帧之间的相似度。
在一种实施例中,多路视频同步系统100中的输入单元110与计算单元120用于执行前述方法的步骤S201-步骤S203及其可选步骤。在另一种更具体的实施例中,计算单元120用于执行前述方法步骤S2021-步骤S2028、步骤S2031A-步骤S2034A、步骤S2031B-步骤S2033B及其可选步骤。
上述三个单元之间互相可通过通信通路进行数据传输,应理解,多路视频同步系统100包括的各单元可以为软件单元、也可以为硬件单元、或部分为软件单元部分为硬件单 元。
参见图13,图13为本申请实施例提供的一种电子设备的结构示意图。其中,所述电子设备1300可以是前述内容中的多路视频同步系统100。如图13所示,电子设备1300包括:处理器1310、通信接口1320以及存储器1330,所示处理器1310、通信接口1320以及存储器1330通过内部总线1340相互连接。应理解,该电子设备1300可以为图1B所示的云环境中的电子设备,或为图1A所示的边缘环境中的电子设备。
处理器1310、通信接口1320和存储器1330可通过总线方式连接,也可通过无线传输等其他手段实现通信。本申请实施例以通过总线1340连接为例,其中,总线1340可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。所述总线1340可以分为地址总线、数据总线、控制总线等。为便于表示,图13中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
所述处理器1310可以由一个或者多个通用处理器构成,例如中央处理器(Central Processing Unit,CPU),或者CPU和硬件芯片的组合。上述硬件芯片可以是专用集成电路(Application-Specific Inegrated Circuit,ASIC)、可编程逻辑器件(Programmable Logic Device,PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(Complex Programmable Logic Device,CPLD)、现场可编程逻辑门阵列(Field-Programmable Gate Array,FPGA)、通用阵列逻辑(Generic Array Logic,GAL)或其任意组合。处理器1310执行各种类型的数字存储指令,例如存储在存储器1330中的软件或者固件程序,它能使计算设备1300提供较宽的多种服务。
具体地,所述处理器1310可以包括计算单元和输出单元,计算单元可以通过调用存储器1330中的程序代码以实现处理功能,包括图2中的计算单元120所描述的功能,例如获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,或者根据所述每路视频的视频帧中的目标的地理坐标确定不同路视频中的视频帧之间的相似度等等,具体可用于执行前述方法的S2021-步骤S2028、步骤S2031A-步骤S2034A、步骤S2031B-步骤S2033B及其可选步骤,还可以用于执行图3-图12B实施例描述的其他步骤,这里不再进行赘述。输出单元也可以通过调用存储器1330中的程序代码以实现处理功能,包括图2中的输出单元130所描述的功能,例如根据N路视频的同步帧信息获得N路时间同步的视频,或者发送同步帧信息至其他设备,或者发送所述N路时间同步的视频至其他设备等等,还可以用于执行图3-图12B实施例描述的其他步骤,这里不再进行赘述。
存储器1330可以包括易失性存储器(Volatile Memory),例如随机存取存储器(Random Access Memory,RAM);存储器1330也可以包括非易失性存储器(Non-Volatile Memory),例如只读存储器(Read-Only Memory,ROM)、快闪存储器(Flash Memory)、硬盘(Hard Disk Drive,HDD)或固态硬盘(Solid-State Drive,SSD);存储器1330还可以包括上述种类的组合。其中,存储器1330可以存储有应用程序代码以及程序数据。程序代码可以是计算N路视频共视区域的代码、计算每个帧中的目标的地理坐标的代码、计算同步帧信息的代码等等,程序数据可以是标定参数、共视区域的地理坐标范围等等。还可以用于执行图3-图12B实施例描述的其他步骤,这里不再进行赘述。
通信接口1320可以为有线接口(例如以太网接口),可以为内部接口(例如高速串行计算机扩展总线(Peripheral Component Interconnect express,PCIe)总线接口)、有线接口(例如以太网接口)或无线接口(例如蜂窝网络接口或使用无线局域网接口),用于与与其他设备或模块进行通信。
需要说明的,图13仅仅是本申请实施例的一种可能的实现方式,实际应用中,所述电子设备还可以包括更多或更少的部件,这里不作限制。关于本申请实施例中未示出或未描述的内容,可参见前述图3-图12B所述实施例中的相关阐述,这里不再赘述。图13所示的电子设备还可以是多个计算节点构成的计算机集群,本申请不作具体限定。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在处理器上运行时,图3-图12B所示的方法流程得以实现。
本申请实施例还提供一种计算机程序产品,当所述计算机程序产品在处理器上运行时,图3-图12B所示的方法流程得以实现。
上述实施例,可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时,上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载或执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(Digital Subscriber Line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(Digital Video Disc,DVD)、或者半导体介质。半导体介质可以是SSD。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以权利要求的保护范围为准。

Claims (15)

  1. 一种多路视频同步的方法,其特征在于,所述方法包括:
    获取N路视频,所述N路视频由N个摄像机对一个地理区域进行拍摄获得,所述N为不小于2的整数;
    获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度;
    根据所述不同路视频中的视频帧之间的相似度,获得同步帧信息,其中,所述同步帧信息用于对所述N个摄像机拍摄的视频进行时间同步,所述同步帧信息包括N个时间同步的视频帧在对应的视频中的位置信息。
  2. 如权利要求1所述的方法,其特征在于,所述N路视频为视频流,所述方法还包括:
    根据所述N个时间同步的视频帧在对应的视频中的位置信息,获得N路时间同步的视频,每路时间同步的视频的起始视频帧为所述每路视频中的时间同步的视频帧。
  3. 如权利要求2所述的方法,其特征在于,所述方法还包括:
    发送所述同步帧信息至其他设备;
    或者,发送所述N路时间同步的视频至其他设备。
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述获取所述N路视频中的每路视频的视频帧中的目标的地理坐标,包括:
    将所述每路视频的视频帧输入目标检测模型,获得所述每路视频的视频帧中的目标的像素坐标;
    根据所述每路视频的视频帧中的目标的像素坐标和所述每路视频对应的摄像机的标定参数,确定所述每路视频的视频帧中的目标的地理坐标,其中,摄像机的标定参数用于指示该摄像机拍摄的视频画面和被拍摄的所述地理区域的映射关系。
  5. 如权利要求1-4任一项所述的方法,其特征在于,
    所述方法还包括:
    确定共视区域,所述共视区域为所述N个摄像机共同拍摄到的区域,所述共视区域为所述地理区域的部分或全部;
    所述根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度,包括:
    根据所述每路视频的视频帧中记录的所述共视区域中的目标的地理坐标,确定不同路视频中的视频帧之间的相似度。
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述根据所述每路视频的视频帧中的目标的地理坐标,确定不同路视频的视频帧之间的相似度,包括:
    计算所述每路视频的视频帧中的目标的地理坐标与其他路视频的视频帧中的目标的地理坐标之间的距离;
    根据所述距离确定不同路视频的视频帧之间的相似度。
  7. A multi-channel video synchronization system, wherein the system comprises an input unit and a computing unit, wherein
    the input unit is configured to obtain N channels of video, wherein the N channels of video are obtained by N cameras shooting a geographic area, and N is an integer not less than 2;
    the computing unit is configured to obtain geographic coordinates of targets in video frames of each channel of the N channels of video, and determine a similarity between video frames of different channels of video according to the geographic coordinates of the targets in the video frames of each channel of video;
    the computing unit is configured to obtain synchronization frame information according to the similarity between the video frames of the different channels of video, wherein the synchronization frame information is used to time-synchronize the videos shot by the N cameras, and the synchronization frame information comprises position information of N time-synchronized video frames in their corresponding videos.
  8. The system according to claim 7, wherein the N channels of video are video streams, the system further comprises an output unit, and the output unit is configured to obtain N channels of time-synchronized video according to the position information of the N time-synchronized video frames in their corresponding videos, wherein a start video frame of each channel of time-synchronized video is the time-synchronized video frame in that channel of video.
  9. The system according to claim 8, wherein the output unit is further configured to send the synchronization frame information to another device; or, the output unit is further configured to send the N channels of time-synchronized video to another device.
  10. The system according to any one of claims 7 to 9, wherein
    the computing unit is configured to input the video frames of each channel of video into a target detection model to obtain pixel coordinates of the targets in the video frames of each channel of video;
    the computing unit is configured to determine the geographic coordinates of the targets in the video frames of each channel of video according to the pixel coordinates of the targets in the video frames of each channel of video and calibration parameters of the camera corresponding to each channel of video, wherein the calibration parameters of a camera indicate a mapping relationship between the video images shot by the camera and the shot geographic area.
  11. The system according to any one of claims 7 to 10, wherein
    the computing unit is configured to determine a common-view area, wherein the common-view area is an area jointly captured by the N cameras, and the common-view area is part or all of the geographic area;
    the computing unit is configured to determine the similarity between the video frames of the different channels of video according to the geographic coordinates of the targets in the common-view area recorded in the video frames of each channel of video.
  12. The system according to any one of claims 7 to 11, wherein
    the computing unit is configured to calculate distances between the geographic coordinates of the targets in the video frames of each channel of video and the geographic coordinates of the targets in the video frames of the other channels of video;
    the computing unit is configured to determine the similarity between the video frames of the different channels of video according to the distances.
  13. A computer-readable storage medium, comprising instructions which, when run on a computing device, cause the computing device to perform the method according to any one of claims 1 to 6.
  14. An electronic device, comprising a processor and a memory, wherein the processor executes code in the memory to perform the method according to any one of claims 1 to 6.
  15. A computer program product, comprising a computer program which, when read and executed by a computing device, causes the computing device to perform the method according to any one of claims 1 to 6.
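Purely as a non-limiting illustration of how the synchronization frame information recited in claims 1 and 2 might be consumed downstream, the hypothetical Python sketch below trims each video stream so that its time-synchronized frame becomes the start frame; the function name and data layout are assumptions made for the example and are not part of the claimed subject matter.

    # Non-limiting sketch: apply the synchronization frame information by
    # trimming each channel to start at its time-synchronized frame.
    def align_streams(streams, sync_frame_info):
        # streams: list of N frame sequences (e.g. lists of decoded frames).
        # sync_frame_info: list of N frame indices, i.e. the position of the
        # time-synchronized video frame in each corresponding video.
        return [frames[idx:] for frames, idx in zip(streams, sync_frame_info)]

    # Example with three channels whose synchronized frames sit at indices 12, 7, 9:
    # synced = align_streams([chan0, chan1, chan2], [12, 7, 9])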
PCT/CN2020/084356 2019-08-29 2020-04-11 Method, system and device for multi-channel video synchronization WO2021036275A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910809382.5 2019-08-29
CN201910809382 2019-08-29
CN201911209316.0 2019-11-30
CN201911209316.0A CN112449152B (zh) 2019-08-29 2019-11-30 Method, system and device for multi-channel video synchronization

Publications (1)

Publication Number Publication Date
WO2021036275A1 true WO2021036275A1 (zh) 2021-03-04

Family

ID=74685448

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084356 WO2021036275A1 (zh) Method, system and device for multi-channel video synchronization

Country Status (1)

Country Link
WO (1) WO2021036275A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010038847A (ja) * 2008-08-08 2010-02-18 Shimadzu Corp Motion tracker system and coordinate system setting method thereof
US20140078395A1 (en) * 2012-09-19 2014-03-20 Tata Consultancy Services Limited Video synchronization
CN104063867A (zh) * 2014-06-27 2014-09-24 浙江宇视科技有限公司 Multi-camera video synchronization method and device
CN104268866A (zh) * 2014-09-19 2015-01-07 西安电子科技大学 Video sequence registration method based on combining motion information and background information
CN107623838A (zh) * 2016-07-15 2018-01-23 索尼公司 Information processing device, method and computer program product
CN108234819A (zh) * 2018-01-30 2018-06-29 西安电子科技大学 Video synchronization method based on homography transformation

Similar Documents

Publication Publication Date Title
CN112449152B (zh) Method, system and device for multi-channel video synchronization
CN109272530B (zh) Target tracking method and device for air-based surveillance scenarios
JP5138031B2 (ja) Method, apparatus and system for processing depth-related information
JP6261815B1 (ja) Crowd monitoring device and crowd monitoring system
US20140072170A1 (en) 3d human pose and shape modeling
TW200818916A (en) Wide-area site-based video surveillance system
CN113447923A (zh) Target detection method, apparatus, system, electronic device, and storage medium
Cho et al. Diml/cvl rgb-d dataset: 2m rgb-d images of natural indoor and outdoor scenes
CN111383204A (zh) Video image fusion method, fusion device, panoramic monitoring system, and storage medium
CN112967341A (zh) Indoor visual positioning method, system, device, and storage medium based on real-scene images
CN113711276A (zh) Scale-aware monocular localization and mapping
CN112950717A (zh) Spatial calibration method and system
CN116194951A (zh) Method and apparatus for stereo-vision-based 3D object detection and segmentation
CN115376109A (zh) Obstacle detection method, obstacle detection device, and storage medium
CN114299230A (zh) Data generation method and apparatus, electronic device, and storage medium
TW202247108A (zh) Visual positioning method, device, and computer-readable storage medium
WO2021036275A1 (zh) Method, system and device for multi-channel video synchronization
Lee et al. Vehicle counting based on a stereo vision depth maps for parking management
CN116665179A (zh) Data processing method and apparatus, domain controller, and storage medium
Knorr et al. A modular scheme for 2D/3D conversion of TV broadcast
CN115880643A (zh) Social distance monitoring method and device based on a target detection algorithm
Fang et al. Energy-efficient distributed target tracking in wireless video sensor networks
Zhang et al. Edge assisted real-time instance segmentation on mobile devices
CN114782496A (zh) Object tracking method and apparatus, storage medium, and electronic apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20858627

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20858627

Country of ref document: EP

Kind code of ref document: A1