CN109257559A

CN109257559A - Image display method and device for a panoramic video conference, and video conferencing system

Info

Publication number: CN109257559A
Application number: CN201811143282.5A
Authority: CN
Inventors: 韦国华; 胡小鹏; 万春雷; 顾振华
Original assignee: Suzhou Keda Technology Co Ltd
Current assignee: Suzhou Keda Technology Co Ltd
Priority date: 2018-09-28
Filing date: 2018-09-28
Publication date: 2019-01-22

Abstract

The invention discloses an image display method and device for a panoramic video conference, and a video conferencing system. The image generating method for a panoramic video conference includes the following steps: acquiring panoramic video data of the conference site with a panoramic camera; performing humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame; obtaining the action information of a moving human body in each video frame from the humanoid coordinate information; and cropping each video frame in the panoramic video data according to the action information to generate a video image of the moving human body. Acquiring the panoramic video data of the conference site with a panoramic camera such as a fisheye camera solves the problem that prior-art video conferencing systems are structurally complex. In addition, because humanoid detection yields the humanoid coordinate information of each video frame in the panoramic video data, close-up display within the panoramic video conference is achieved at low implementation cost.

Description

Image display method and device for a panoramic video conference, and video conferencing system

Technical field

The present invention relates to the technical field of video communication, and in particular to an image display method for a panoramic video conference, an image display device for a panoramic video conference, a video conferencing system, and a computer-readable storage medium.

Background art

With the development of cameras and network technology, remote real-time video conferencing has become possible. Existing video conferencing systems support synchronized transmission of audio and video and can send the audio and video signals of the conference site to audio/video players far from the site, helping remote participants join or listen in on the conference. Earlier video conferences mostly used a tiled display of a single image. Video shown this way has no sense of depth, and it is difficult to obtain a close-range frontal image of the person speaking, which greatly reduces the real-time interactivity of the conference. To display the conference scene as a panorama, several cameras are usually needed to shoot from different angles, with the captured images switched on one video display, or one or more cameras are used to focus on a particular participant for a close-up display. However, although switching among multiple cameras or using them for close-ups achieves the effect of a panoramic video conference, this approach requires multiple cameras and a dedicated operator, which greatly increases the cost and complexity of the video conference.

Summary of the invention

Therefore, the technical problem to be solved by the present invention is that prior-art video conferencing systems that implement panoramic video conferences are structurally complex and costly. To address this, the invention provides an image display method, a device, and a video conferencing system that can perform close-up display based on a panoramic camera.

To this end, according to a first aspect, the present invention provides an image generating method for a panoramic video conference, including the following steps: acquiring panoramic video data of the conference site with a panoramic camera; performing humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame; obtaining the action information of a moving human body in each video frame from the humanoid coordinate information; and cropping each video frame in the panoramic video data according to the action information to generate a video image of the moving human body.

Optionally, the image generating method for a panoramic video conference further includes the following steps: locating the sound source at the conference site with a microphone array to obtain sound source position information; obtaining the sound source coordinate information of the sound source in the panoramic video data from the sound source position information; using the sound source coordinate information as the speaker position information, or using the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information; and cropping each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

Optionally, the image generating method for a panoramic video conference further includes the following steps: performing face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame; and using the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information.

According to a second aspect, the present invention further provides an image generating method for a panoramic video conference, including the following steps: receiving panoramic video data of the conference site; receiving the action information of a moving human body in the panoramic video data, the action information including the humanoid coordinate information of the moving human body in different video frames of the panoramic video data; and cropping each video frame in the panoramic video data according to the action information to generate a video image of the moving human body.

Optionally, the image generating method for a panoramic video conference further includes the following steps: receiving speaker position information, where the speaker position information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment; and cropping each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

Optionally, the image generating method for a panoramic video conference further includes the following step: aligning the timestamps of the panoramic video data with the timestamps of the action information of the moving human body in the panoramic video data.

According to a third aspect, the present invention further provides an image generating device for a panoramic video conference, including: a video information acquisition module, configured to acquire panoramic video data of the conference site with a panoramic camera; a humanoid detection module, configured to perform humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame; an action information acquisition module, configured to obtain the action information of a moving human body in each video frame from the humanoid coordinate information; and a first image generation module, configured to crop each video frame in the panoramic video data according to the action information to generate a video image of the moving human body.

Optionally, the image generating device for a panoramic video conference further includes: a sound source localization module, configured to locate the sound source at the conference site with a microphone array to obtain sound source position information; a first speaker locating module, configured to obtain the sound source coordinate information of the sound source in the panoramic video data from the sound source position information, and to use the sound source coordinate information as the speaker position information, or to use the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information; and a second image generation module, configured to crop each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

Optionally, the image generating device for a panoramic video conference further includes: a second speaker locating module, configured to perform face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame, and to use the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information.

According to a fourth aspect, the present invention further provides an image generating device for a panoramic video conference, including: a video information receiving module, configured to receive panoramic video data of the conference site; an action information receiving module, configured to receive the action information of a moving human body in the panoramic video data, the action information including the humanoid coordinate information of the moving human body in different video frames of the panoramic video data; and a third image generation module, configured to crop each video frame in the panoramic video data according to the action information to generate a video image of the moving human body.

Optionally, the image generating device for a panoramic video conference further includes: a speaker information receiving module, configured to receive speaker position information, where the speaker position information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment; and a fourth image generation module, configured to crop each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

According to a fifth aspect, the present invention provides a video conferencing system, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor performs all or part of the method of the first aspect, or all or part of the method of the second aspect.

According to a sixth aspect, the present invention provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of all or part of the method of the first aspect, or the steps of all or part of the method of the second aspect.

The technical solutions provided by the embodiments of the present invention have the following advantages:

1. The image generating method for a panoramic video conference provided by the invention includes the following steps: acquiring panoramic video data of the conference site with a panoramic camera; performing humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame; obtaining the action information of a moving human body in each video frame from the humanoid coordinate information; and cropping each video frame in the panoramic video data according to the action information to generate a video image of the moving human body. Acquiring the panoramic video data with a panoramic camera such as a fisheye camera solves the prior-art problem that a video conferencing system which shoots from different angles with multiple cameras to obtain panoramic video data is structurally complex. Meanwhile, humanoid detection yields the humanoid coordinate information of each video frame, and the change in the humanoid coordinate information of a given human body across video frames shows whether that body is a moving human body. The humanoid coordinate information of the moving human body in each video frame is taken as its action information. When the image of the moving human body is then cropped from each video frame of the panoramic video data according to the action information, a focused video image of the moving human body at the conference site is generated, so that close-up display within the panoramic video conference is achieved at low implementation cost.

2. The image generating method for a panoramic video conference provided by the invention further includes the following steps: locating the sound source at the conference site with a microphone array to obtain sound source position information; obtaining the sound source coordinate information of the sound source in the panoramic video data from the sound source position information; using the sound source coordinate information as the speaker position information, or using the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information; and cropping each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker. The sound source at the conference site is located with a microphone array to obtain the sound source position information, and the sound source position at a conference site is generally the speaker's position. Cropping the video image at the sound source position from each video frame of the panoramic video data therefore generates a video image of the speaker, which realizes close-up display of the speaker within the panoramic video conference and enriches the close-up display functions of the method. In addition, when the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment is used as the speaker position information, the speaker position is confirmed by both the sound source coordinate information and the humanoid coordinate information. This prevents a non-human sound source at the conference site from being mistaken for the speaker, as can happen when the speaker is located by sound source coordinate information alone, and so improves the accuracy of speaker localization.

3. The image generating method for a panoramic video conference provided by the invention performs face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame, and uses the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information. The speaker position is thus confirmed by both the sound source coordinate information and the face coordinate information. This prevents a non-human sound source at the conference site from being mistaken for the speaker, as can happen when the speaker is located by sound source coordinate information alone. It also avoids the failure to determine the speaker position accurately that can occur when the humanoid coordinate information containing the sound source coordinate information of the corresponding moment is used as the speaker position information and the human body is blocked by an obstacle. The accuracy of speaker localization is thereby further improved.

Brief description of the drawings

To explain the specific embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the image generating method for a panoramic video conference provided in Embodiment 1;

Fig. 2 and Fig. 3 are specific examples of video image output modes;

Fig. 4 is a flowchart of the image generating method for a panoramic video conference provided in Embodiment 2;

Fig. 5 is a structural schematic diagram of the image generating device for a panoramic video conference provided in Embodiment 3;

Fig. 6 is a structural schematic diagram of the image generating device for a panoramic video conference provided in Embodiment 4;

Fig. 7 is a hardware structural diagram of the video conferencing system provided in Embodiment 5.

Detailed description of the embodiments

The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.

Embodiment 1

This embodiment provides an image generating method for a panoramic video conference, as shown in Fig. 1. It should be noted that the steps shown in the flowchart can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order. The process includes the following steps:

Step S100: acquire the panoramic video data of the conference site with a panoramic camera. In this embodiment, a fisheye camera is used to acquire the panoramic video data of the conference site, and the panoramic video data are the video data after correction processing.

Step S200: perform humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame. In this embodiment, tools and algorithms such as OpenCV and AdaBoost can be used for humanoid detection. In a specific embodiment, the humanoid coordinate information is the coordinate information of a rectangular frame containing the humanoid; specifically, it is the coordinates of the rectangular frame containing the human body in the coordinate system of the panoramic video data.

Step S300: obtain the action information of a moving human body in each video frame from the humanoid coordinate information. In this embodiment, different human bodies are distinguished by the head-to-shoulder ratio or the shape of the detected humanoid. Whether a human body is a moving human body is judged by comparing the humanoid coordinate information of the same human body in different video frames. Specifically, when the humanoid coordinate information of the same human body keeps changing across video frames, the human body is a moving human body, and its humanoid coordinate information in each video frame is the action information of that moving human body. When the humanoid coordinate information of the same human body is unchanged or essentially unchanged across video frames, the human body is not a moving human body.
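The judgment in step S300 can be sketched as follows. This is a minimal illustration, not the patented implementation: the box layout `(x, y, width, height)`, the function names, and the `min_shift` threshold that stands in for "essentially unchanged" are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical box layout: (x, y, width, height) in panorama coordinates.
Box = Tuple[int, int, int, int]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a humanoid rectangle."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def is_moving(track: List[Box], min_shift: float = 20.0) -> bool:
    """Treat a body as moving if its box centre ever drifts more than
    min_shift pixels away from where it was in the first frame.

    min_shift is an assumed tolerance for "essentially unchanged".
    """
    if len(track) < 2:
        return False
    x0, y0 = box_center(track[0])
    for box in track[1:]:
        x, y = box_center(box)
        if ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 > min_shift:
            return True
    return False


# A seated participant jitters by a few pixels; a newcomer walks across the room.
seated = [(400, 200, 60, 150), (402, 201, 60, 150), (401, 199, 60, 150)]
walking = [(100, 200, 60, 150), (160, 200, 60, 150), (230, 205, 60, 150)]
print(is_moving(seated), is_moving(walking))
```

When a track is classified as moving, the list of boxes itself plays the role of the action information for that body.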

Step S400: crop each video frame in the panoramic video data according to the action information to generate a video image of the moving human body. In this embodiment, the humanoid coordinate information of the moving human body in each video frame, as contained in the action information, is used to crop the image at the corresponding humanoid coordinates of that video frame, thereby generating the video image of the moving human body.
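The per-frame cropping in step S400 can be illustrated with a short sketch. It assumes a frame is a plain list of pixel rows and a box is `(x, y, width, height)`; both are stand-ins chosen for this example, and a real system would operate on decoded image buffers.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x, y, width, height)
Frame = List[List[int]]           # rows of pixel values, a stand-in for an image


def crop_frame(frame: Frame, box: Box) -> Frame:
    """Cut the rectangle described by box out of one frame,
    clamping the rectangle to the frame borders."""
    x, y, w, h = box
    height = len(frame)
    width = len(frame[0]) if height else 0
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in frame[y0:y1]]


def crop_sequence(frames: List[Frame], action_info: List[Box]) -> List[Frame]:
    """Apply the per-frame box from the action information to each frame."""
    return [crop_frame(f, b) for f, b in zip(frames, action_info)]


# A 4x5 test frame whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop_frame(frame, (1, 1, 2, 2)))  # [[11, 12], [21, 22]]
```

Clamping is a design choice made here so that a box touching the edge of the panorama still yields a valid, smaller image.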

In this embodiment, the user of a terminal can choose the locally output video picture according to his or her own needs. Specifically, the user can choose to display the video image in picture-in-picture mode (one large picture plus one small picture). As shown in Fig. 2, the large picture can show the panorama of the conference site while the small picture shows the focused picture of the moving human body, that is, a tracking picture of a person who has newly entered the meeting room. Alternatively, as shown in Fig. 3, the user can have the large picture show the focused picture of the moving human body and the small picture show the panorama of the conference site.

It should be noted that the above method is the data processing flow used when the data collection terminal of the video conferencing system also serves as the image output terminal. When the data collection terminal does not output images, it executes steps S100 to S300 and then sends the panoramic video data and the action information of the moving human body to the video playing terminal, which crops each video frame in the panoramic video data according to the action information to generate the video image of the moving human body. In this embodiment, the panoramic video data must be video-encoded before transmission. The encoding format can be any of various formats, for example H.263, H.264, H.265, MPEG-4, or VP8, and the action information is also encoded as necessary. The panoramic video data that are video-encoded and the panoramic video data that undergo humanoid detection can be the same copy of the data, in which case detection and encoding are processed serially, or two separate copies, in which case detection and encoding are processed in parallel. Which is used depends on the capability of the chip in the data collection terminal. If the chip has enough computing power that humanoid detection adds very little delay, the same copy of the data can be detected and encoded serially. If the chip would introduce a long delay in humanoid detection, two copies of the data are detected and encoded separately.

In this embodiment, the encoded panoramic video data and action information can be packed and sent to the video playing end on the same channel, or the encoded panoramic video data can be sent to the video playing end on one channel and the encoded action information on another. When the encoded panoramic video data and action information are sent to the video playing end on the same channel, the combination of the encoded panoramic video data and the action information must be aligned on the basis of the timestamp (time-stamp). In this embodiment, the panoramic video data are packed and transmitted with the packing and transport protocol of standard video conferencing, to maximize compatibility with traditional video conference terminals and entities. The data collection terminal can therefore hold a remote real-time video conference with a video playing end that supports close-up display and executes the method of this embodiment, so that close-up display is achieved, and it can also hold an ordinary remote real-time video conference with a traditional video playing end that does not support close-up display.
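Timestamp-based alignment of frames and action information could look like the sketch below. The millisecond timestamps, the dictionary keyed by timestamp, and the `tolerance_ms` window are assumptions made for illustration; the text only states that alignment uses the timestamp as the reference.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def align_by_timestamp(
    frame_ts: List[int],
    action_info: Dict[int, Box],
    tolerance_ms: int = 20,
) -> List[Optional[Box]]:
    """For each frame timestamp, pick the action-information entry whose
    timestamp is nearest, or None if nothing lies within tolerance_ms."""
    keys = sorted(action_info)
    aligned: List[Optional[Box]] = []
    for ts in frame_ts:
        i = bisect_left(keys, ts)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = keys[max(0, i - 1): i + 1]
        best = min(candidates, key=lambda k: abs(k - ts), default=None)
        if best is not None and abs(best - ts) <= tolerance_ms:
            aligned.append(action_info[best])
        else:
            aligned.append(None)
    return aligned


info = {0: (0, 0, 1, 1), 40: (1, 0, 1, 1), 80: (2, 0, 1, 1)}
print(align_by_timestamp([0, 41, 200], info))
```

A frame with no matching entry gets `None`, which a player could treat as "show the panorama only for this frame".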

In the image generating method for a panoramic video conference provided by this embodiment, the panoramic video data of the conference site are acquired with a panoramic camera such as a fisheye camera, which solves the prior-art problem that a video conferencing system which shoots from different angles with multiple cameras to obtain panoramic video data is structurally complex. Meanwhile, humanoid detection yields the humanoid coordinate information of each video frame in the panoramic video data, and the change in the humanoid coordinate information of a given human body across video frames shows whether that body is a moving human body. The humanoid coordinate information of the moving human body in each video frame is taken as its action information. When the image of the moving human body is then cropped from each video frame of the panoramic video data according to the action information, a focused video image of the moving human body at the conference site is generated, so that close-up display within the panoramic video conference is achieved at low implementation cost.

In an optional embodiment, the image generating method for a panoramic video conference further includes the following steps:

Step S500: locate the sound source at the conference site with a microphone array to obtain sound source position information. In this embodiment, the sound source position information is obtained from the arrival-time differences of the current speech at the multiple microphones in the array together with the position information of those microphones.
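As a simplified illustration of locating a source from a time difference, the sketch below handles a single pair of microphones under a far-field assumption, where the arrival angle follows from sin(θ) = c·Δt / d. The two-microphone geometry, the constant for the speed of sound, and the function name are assumptions; the text does not specify the localization algorithm, and a real array would combine several microphone pairs.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value at room temperature


def arrival_angle(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the direction of a far-field source from the arrival-time
    difference between two microphones.

    Returns the angle in degrees measured from the broadside direction
    (the perpendicular to the line joining the two microphones).
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Zero delay means the source is straight ahead of the microphone pair.
print(arrival_angle(0.0, 0.1))
```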

Step S600: obtain the sound source coordinate information of the sound source in the panoramic video data from the sound source position information. In this embodiment, the sound source coordinate information is used as the speaker position information, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment is used as the speaker position information. The sound source angle of the sound source position relative to the fisheye camera position is obtained from the sound source position information and the position information of the fisheye camera, which gives the sound source coordinate information in the coordinate system of the panoramic video data. The sound source coordinate information and the humanoid coordinate information obtained in step S200 are based on the same coordinate system.
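One way to picture the mapping from a sound source angle to a panorama coordinate is sketched below. It assumes the corrected panorama is an unwrapped 360-degree strip whose columns map linearly to azimuth around the camera; that projection, the floor-plan coordinates, and the function names are assumptions for illustration, since the text does not state the projection model.

```python
import math
from typing import Tuple


def source_azimuth(source_xy: Tuple[float, float], camera_xy: Tuple[float, float]) -> float:
    """Angle in degrees of the sound source as seen from the camera,
    measured counter-clockwise from the camera's x axis in the floor plane."""
    dx = source_xy[0] - camera_xy[0]
    dy = source_xy[1] - camera_xy[1]
    return math.degrees(math.atan2(dy, dx))


def azimuth_to_column(azimuth_deg: float, panorama_width: int) -> int:
    """Map an azimuth to a pixel column of an unwrapped 360-degree panorama,
    assuming column 0 corresponds to 0 degrees and columns grow with the angle."""
    wrapped = azimuth_deg % 360.0
    return int(wrapped / 360.0 * panorama_width) % panorama_width


# A source a quarter turn around the camera lands a quarter of the way across.
print(azimuth_to_column(90.0, 3600))  # 900
```

Only the horizontal coordinate is derived here; obtaining a vertical coordinate would need elevation information, which this sketch leaves out.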

In this embodiment, the sound source localization result is a single position point, so the sound source coordinate information is one coordinate point at that position in the coordinate system of the panoramic video data. The humanoid detection result is a rectangular frame containing the humanoid, so the humanoid coordinate information is the set of all coordinate points within that rectangular frame in the coordinate system of the panoramic video data. When a human body is the speaker, the sound source position should coincide with the position of the human body, and the resulting sound source coordinate should be one of the coordinates in the humanoid coordinate set. Therefore, when the humanoid coordinate information of a human body in a video frame contains the sound source coordinate information of the corresponding moment, that human body can be identified as the speaker, and its humanoid coordinate information is the speaker position information.
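The containment test described above amounts to a point-in-rectangle check. The following sketch assumes boxes are `(x, y, width, height)` tuples and the sound source is an `(x, y)` point in the same coordinate system; the names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x, y, width, height)
Point = Tuple[int, int]


def contains(box: Box, point: Point) -> bool:
    """True if the point lies inside the rectangle."""
    x, y, w, h = box
    px, py = point
    return x <= px < x + w and y <= py < y + h


def locate_speaker(human_boxes: List[Box], source: Point) -> Optional[Box]:
    """Return the humanoid box that contains the sound source point,
    or None if the sound comes from somewhere no person was detected."""
    for box in human_boxes:
        if contains(box, source):
            return box
    return None


people = [(100, 200, 60, 150), (400, 210, 60, 150)]
print(locate_speaker(people, (420, 260)))  # (400, 210, 60, 150)
print(locate_speaker(people, (900, 50)))   # None
```

Returning `None` for a source outside every box is what filters out non-human sound sources.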

Step S700: crop each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

In this embodiment, the user of a terminal can choose the locally output video picture according to his or her own needs. Specifically, the user can choose to display the video image in picture-in-picture mode (one large picture plus one small picture), can combine the panorama, the focused picture of the moving human body, and the focused picture of the speaker in various ways as needed, and can switch among them at any time during the conference.

It should be noted that the above method is likewise the data processing flow used when the data collection terminal of the video conferencing system also serves as the image output terminal. When the data collection terminal does not output images, it executes steps S500 and S600 and then sends the panoramic video data and the speaker position information to the video playing terminal, which crops each video frame in the panoramic video data according to the speaker position information to generate the video image of the speaker. Before transmission, the panoramic video data must also be video-encoded and the speaker position information encoded as necessary. The specific encoding and sending methods are the same as those described above for the action information and are not repeated here.

In the image generating method for a panoramic video conference provided by this embodiment, the sound source at the conference site is located with a microphone array to obtain the sound source position information, and the sound source position at a conference site is generally the speaker's position. Cropping the video image at the sound source position from each video frame of the panoramic video data therefore generates a video image of the speaker, which realizes close-up display of the speaker within the panoramic video conference and enriches the close-up display functions of the method. In addition, when the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment is used as the speaker position information, the speaker position is confirmed by both the sound source coordinate information and the humanoid coordinate information. This prevents a non-human sound source at the conference site from being mistaken for the speaker, as can happen when the speaker is located by sound source coordinate information alone, and so improves the accuracy of speaker localization.

Step S800: perform face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame. In this embodiment, the face coordinate information is the coordinate information of a rectangular frame containing the face; specifically, it is the coordinates of the rectangular frame containing the face in the coordinate system of the panoramic video data. The face coordinate information, the sound source coordinate information obtained in step S600, and the humanoid coordinate information obtained in step S200 are all based on the same coordinate system.

In this embodiment, the sound source localization result is a single position point, so the sound source coordinate information is one coordinate point at that position in the coordinate system of the panoramic video data. The face detection result is a rectangular frame containing the face, so the face coordinate information is the set of all coordinate points within that rectangular frame in the coordinate system of the panoramic video data. When a human body is the speaker, the sound source position should coincide with the position of that person's face, and the resulting sound source coordinate should be one of the coordinates in the face coordinate set. Therefore, when the face coordinate information of a human body in a video frame contains the sound source coordinate information of the corresponding moment, that human body can be identified as the speaker, and its face coordinate information is the speaker position information.
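One possible way to combine the face-based and humanoid-based checks is sketched below: prefer a face box that contains the sound source, otherwise fall back to a humanoid box, otherwise report no speaker. This priority order is an assumption made for illustration; the text lists the face-based and humanoid-based variants as alternatives and does not prescribe how to combine them. Box and point layouts are also assumed.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x, y, width, height)
Point = Tuple[int, int]


def contains(box: Box, point: Point) -> bool:
    """True if the point lies inside the rectangle."""
    x, y, w, h = box
    px, py = point
    return x <= px < x + w and y <= py < y + h


def speaker_position(
    source: Point, face_boxes: List[Box], human_boxes: List[Box]
) -> Optional[Box]:
    """Pick the speaker position information for one frame.

    Assumed policy: a face box containing the source wins, because a face
    may remain visible when the body is partly hidden by an obstacle;
    a humanoid box is the fallback; None means no person matches the source.
    """
    for box in face_boxes:
        if contains(box, source):
            return box
    for box in human_boxes:
        if contains(box, source):
            return box
    return None


faces = [(100, 50, 40, 40)]
humans = [(80, 40, 80, 200), (300, 40, 80, 200)]
print(speaker_position((110, 60), faces, humans))   # face box
print(speaker_position((320, 100), faces, humans))  # humanoid box
```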

In this embodiment, face detection can be performed using algorithms such as AdaBoost, Viola-Jones or a CNN.

In the image generating method of the panoramic video meeting provided in this embodiment, face detection is performed on each video frame in the panoramic video data, and the face coordinate information that includes the sound source coordinate information of the corresponding moment is used as the speaker position information, so that the speaker position is determined jointly by the sound source coordinate information and the face coordinate information. This prevents a non-human sound source at the meeting scene from being treated as the speaker, which can happen when the speaker position is located by the sound source coordinate information alone. It also avoids the case in which the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment is used as the speaker position information but the human body is blocked by an obstacle, so that the speaker position cannot be determined accurately. The positioning accuracy of the speaker is thereby further improved.

Embodiment 2

This embodiment provides an image generating method of a panoramic video meeting, as shown in Figure 4. It should be noted that this method is the implementation, at the video playing end, of the method in embodiment 1; what has already been explained is not repeated. In addition, the steps shown in the flow chart of the drawing can be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flow chart, in some cases the steps shown or described may be executed in an order different from the one given here. The process includes the following steps:

Step S10, the panoramic video data of the meeting scene is received. In this embodiment, the received panoramic video data is encoded panoramic video data, which needs to be decoded before the video image is generated.

Step S20, the action message of a certain mobile human body in the panoramic video data is received. In this embodiment, the action message includes the humanoid coordinate information of the mobile human body in the different video frames of the panoramic video data. The received action message is likewise an encoded action message, which also needs to be decoded before the video image is generated.

Step S30, image interception is performed on each video frame in the panoramic video data according to the action message, and the video image of the mobile human body is generated.
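The image interception of step S30 amounts to cutting a rectangle out of each decoded frame. A minimal sketch follows, assuming a frame is a list of pixel rows, a humanoid box is an `(x, y, width, height)` tuple, and the action message is a mapping from frame index to box; none of these representations are specified in the text, and the names `crop_frame` and `crop_moving_body` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]          # rows of pixel values; stand-in for a decoded frame
Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def crop_frame(frame: Frame, box: Box) -> Frame:
    """Cut the box out of the frame, clamping the box to the frame bounds."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    x, y, w, h = box
    left = min(max(0, x), width)
    top = min(max(0, y), height)
    # max(left, ...) keeps the slice empty when the box lies outside the frame.
    right = max(left, min(width, x + w))
    bottom = max(top, min(height, y + h))
    return [row[left:right] for row in frame[top:bottom]]


def crop_moving_body(frames: List[Frame],
                     action_message: Dict[int, Box]) -> List[Frame]:
    """Crop every frame that has a humanoid box; frames without one are skipped."""
    return [crop_frame(frame, action_message[i])
            for i, frame in enumerate(frames)
            if i in action_message]
```

Clamping keeps the crop valid when a humanoid box extends past the edge of the panoramic frame; skipping frames without a box is one possible policy, chosen here only for brevity.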

In this embodiment, the user can also independently select the locally output video picture as needed. The specific selectable output modes are the same as in embodiment 1 and are not repeated here.

Step S40, speaker position information is received. In this embodiment, the speaker position information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment, or the face coordinate information that includes the sound source coordinate information of the corresponding moment.

Step S50, image interception is performed on each video frame in the panoramic video data according to the speaker position information, and the video image of the speaker is generated.
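When the speaker position information is a single sound source coordinate point rather than a rectangle, some crop window has to be chosen around it. The text does not say how; the sketch below assumes a fixed-size window centered on the point and shifted to stay inside the frame. The function name and the window size are hypothetical.

```python
from typing import Tuple


def window_around_point(point: Tuple[float, float],
                        window_size: Tuple[int, int],
                        frame_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of a window centered on the point.

    The window is shrunk to the frame size if necessary and shifted so
    that it lies entirely inside the frame.
    """
    px, py = point
    frame_w, frame_h = frame_size
    win_w = min(window_size[0], frame_w)
    win_h = min(window_size[1], frame_h)
    x = min(max(0, int(px) - win_w // 2), frame_w - win_w)
    y = min(max(0, int(py) - win_h // 2), frame_h - win_h)
    return (x, y, win_w, win_h)
```

When the speaker position information is a humanoid or face rectangle instead, that rectangle (possibly enlarged by a margin) could be used as the crop region directly.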

Step S60, the timestamps in the panoramic video data are aligned with the timestamps in the action message of the certain mobile human body in the panoramic video data. In this embodiment, when the panoramic video data and the action message are received over two separate channels, the timestamps in the panoramic video data and in the action message need to be aligned before the video image is generated. Similarly, when the panoramic video data and the speaker position information are received over two separate channels, the timestamps in the panoramic video data and in the speaker position information also need to be aligned before the video image is generated.
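One way to align the two channels is to pair each video frame timestamp with the nearest action message timestamp within a tolerance. The text does not specify the alignment rule, so the nearest-neighbor policy, the millisecond units and the `tolerance_ms` parameter below are assumptions for illustration only.

```python
from bisect import bisect_left
from typing import Dict, List, Optional


def align_by_timestamp(frame_ts: List[int],
                       message_ts: List[int],
                       tolerance_ms: int = 20) -> Dict[int, Optional[int]]:
    """Map each frame timestamp to the nearest message timestamp.

    message_ts must be sorted in ascending order. A frame with no message
    timestamp within tolerance_ms is mapped to None.
    """
    aligned: Dict[int, Optional[int]] = {}
    for ts in frame_ts:
        i = bisect_left(message_ts, ts)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = message_ts[max(0, i - 1):i + 1]
        best = min(candidates, key=lambda m: abs(m - ts), default=None)
        if best is not None and abs(best - ts) > tolerance_ms:
            best = None
        aligned[ts] = best
    return aligned
```

The same function could be applied to the timestamps of the speaker position information; frames mapped to `None` have no usable coordinate information and would need a fallback such as reusing the previous crop region.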

Embodiment 3

This embodiment provides a video generation device of a panoramic video meeting. The device is used to implement embodiment 1 and its preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The video generation device of the panoramic video meeting provided in this embodiment, as shown in Figure 5, includes: a video information acquisition module 100, a humanoid detection module 200, an action message acquisition module 300 and a first image generation module 400.

The video information acquisition module 100 is used to obtain the panoramic video data of the meeting scene using a panoramic camera. The humanoid detection module 200 is used to perform humanoid detection on each video frame in the panoramic video data and obtain the humanoid coordinate information of each video frame in the panoramic video data. The action message acquisition module 300 is used to obtain the action message of a certain mobile human body in each video frame according to the humanoid coordinate information. The first image generation module 400 is used to perform image interception on each video frame in the panoramic video data according to the action message and generate the video image of the mobile human body.

In an optional embodiment, the video generation device of the panoramic video meeting further includes: a sound source localization module 500, a first speaker locating module 600 and a second image generation module 700.

The sound source localization module 500 is used to locate the sound source at the meeting scene using a microphone array and obtain sound source position information. The first speaker locating module 600 is used to obtain the sound source coordinate information of the sound source in the panoramic video data according to the sound source position information, and to use the sound source coordinate information as the speaker position information, or to use the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information. The second image generation module 700 is used to perform image interception on each video frame in the panoramic video data according to the speaker position information and generate the video image of the speaker.

In an optional embodiment, the video generation device of the panoramic video meeting further includes a second speaker locating module, used to perform face detection on each video frame in the panoramic video data, obtain the face coordinate information of each video frame in the panoramic video data, and use the face coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information.

Embodiment 4

This embodiment provides a video generation device of a panoramic video meeting. The device is used to implement embodiment 2 and its preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The video generation device of the panoramic video meeting provided in this embodiment, as shown in Figure 6, includes: a video information receiving module 10, an action message receiving module 20 and a third image generation module 30.

The video information receiving module 10 is used to receive the panoramic video data of the meeting scene. The action message receiving module 20 is used to receive the action message of a certain mobile human body in the panoramic video data; the action message includes the humanoid coordinate information of the mobile human body in the different video frames of the panoramic video data. The third image generation module 30 is used to perform image interception on each video frame in the panoramic video data according to the action message and generate the video image of the mobile human body.

In an optional embodiment, the video generation device of the panoramic video meeting further includes: a speaker information receiving module 40 and a fourth image generation module 50.

The speaker information receiving module 40 is used to receive speaker position information; the speaker position information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment, or the face coordinate information that includes the sound source coordinate information of the corresponding moment. The fourth image generation module 50 is used to perform image interception on each video frame in the panoramic video data according to the speaker position information and generate the video image of the speaker.

In an optional embodiment, the video generation device of the panoramic video meeting further includes a timestamp alignment module, used to align the timestamps in the panoramic video data with the timestamps in the action message of the certain mobile human body in the panoramic video data.

Embodiment 5

An embodiment of the invention provides a video conferencing system. As shown in Figure 7, the video conferencing system may include: at least one processor 701, such as a CPU (Central Processing Unit), at least one communication interface 703, a memory 704 and at least one communication bus 702. The communication bus 702 is used to realize connection and communication between these components. The communication interface 703 may include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 703 may also include a standard wired interface and a wireless interface. The memory 704 may be a high-speed RAM (Random Access Memory, a volatile random access memory), or a non-volatile memory, for example at least one magnetic disk memory. Optionally, the memory 704 may also be at least one storage device located remotely from the aforementioned processor 701. An application program is stored in the memory 704, and the processor 701 calls the program code stored in the memory 704 to execute the steps of any method in embodiment 1 or embodiment 2, that is, to perform the following operations:

Obtaining the panoramic video data of the meeting scene using a panoramic camera; performing humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame in the panoramic video data; obtaining the action message of a certain mobile human body in each video frame according to the humanoid coordinate information; and performing image interception on each video frame in the panoramic video data according to the action message to generate the video image of the mobile human body.

In the embodiment of the invention, the processor 701 calls the program code in the memory 704 and is also used to perform the following operations: locating the sound source at the meeting scene using a microphone array to obtain sound source position information; obtaining the sound source coordinate information of the sound source in the panoramic video data according to the sound source position information; using the sound source coordinate information as the speaker position information, or using the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information; and performing image interception on each video frame in the panoramic video data according to the speaker position information to generate the video image of the speaker.

In the embodiment of the invention, the processor 701 calls the program code in the memory 704 and is also used to perform the following operations: performing face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame in the panoramic video data; and using the face coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information.

In the embodiment of the invention, the processor 701 calls the program code in the memory 704 and is also used to perform the following operations: receiving the action message of a certain mobile human body in the panoramic video data, the action message including the humanoid coordinate information of the mobile human body in the different video frames of the panoramic video data; and performing image interception on each video frame in the panoramic video data according to the action message to generate the video image of the mobile human body.

In the embodiment of the invention, the processor 701 calls the program code in the memory 704 and is also used to perform the following operations: receiving speaker position information, the speaker position information being the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment, or the face coordinate information that includes the sound source coordinate information of the corresponding moment; and performing image interception on each video frame in the panoramic video data according to the speaker position information to generate the video image of the speaker.

In the embodiment of the invention, the processor 701 calls the program code in the memory 704 and is also used to perform the following operation: aligning the timestamps in the panoramic video data with the timestamps in the action message of the certain mobile human body in the panoramic video data.

The communication bus 702 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 702 may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one line is used in Figure 7, but this does not mean that there is only one bus or one type of bus.

The memory 704 may include a volatile memory, such as a random-access memory (RAM). The memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD). The memory 704 may also include a combination of the above kinds of memory.

The processor 701 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.

The processor 701 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

Embodiment 6

An embodiment of the invention also provides a non-transitory computer storage medium. The computer storage medium stores computer-executable instructions that can execute the steps of any method in embodiment 1 or embodiment 2. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above kinds of memory.

Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious variations or changes derived from this description remain within the protection scope of the invention.

Claims

1. An image generating method of a panoramic video meeting, characterized by comprising the following steps:

obtaining panoramic video data of a meeting scene using a panoramic camera;

performing humanoid detection on each video frame in the panoramic video data to obtain humanoid coordinate information of each video frame in the panoramic video data;

obtaining an action message of a certain mobile human body in each video frame according to the humanoid coordinate information;

performing image interception on each video frame in the panoramic video data according to the action message to generate a video image of the mobile human body.

2. The image generating method of a panoramic video meeting according to claim 1, characterized by further comprising the following steps:

locating a sound source at the meeting scene using a microphone array to obtain sound source position information;

obtaining sound source coordinate information of the sound source in the panoramic video data according to the sound source position information; using the sound source coordinate information as speaker position information, or using the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information;

performing image interception on each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

3. The image generating method of a panoramic video meeting according to claim 2, characterized by further comprising the following steps:

performing face detection on each video frame in the panoramic video data to obtain face coordinate information of each video frame in the panoramic video data; using the face coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information.

4. An image generating method of a panoramic video meeting, characterized by comprising the following steps:

receiving panoramic video data of a meeting scene;

receiving an action message of a certain mobile human body in the panoramic video data, the action message including humanoid coordinate information of the mobile human body in the different video frames of the panoramic video data;

5. The image generating method of a panoramic video meeting according to claim 4, characterized by further comprising the following steps:

receiving speaker position information, the speaker position information being the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment, or the face coordinate information that includes the sound source coordinate information of the corresponding moment;

6. A video generation device of a panoramic video meeting, characterized by comprising:

a video information acquisition module, configured to obtain panoramic video data of a meeting scene using a panoramic camera;

a humanoid detection module, configured to perform humanoid detection on each video frame in the panoramic video data to obtain humanoid coordinate information of each video frame in the panoramic video data;

an action message acquisition module, configured to obtain an action message of a certain mobile human body in each video frame according to the humanoid coordinate information;

a first image generation module, configured to perform image interception on each video frame in the panoramic video data according to the action message to generate a video image of the mobile human body.

7. The video generation device of a panoramic video meeting according to claim 6, characterized by further comprising:

a sound source localization module, configured to locate a sound source at the meeting scene using a microphone array to obtain sound source position information;

a first speaker locating module, configured to obtain sound source coordinate information of the sound source in the panoramic video data according to the sound source position information, and to use the sound source coordinate information as speaker position information, or to use the humanoid coordinate information that includes the sound source coordinate information of the corresponding moment as the speaker position information;

a second image generation module, configured to perform image interception on each video frame in the panoramic video data according to the speaker position information to generate a video image of the speaker.

8. A video generation device of a panoramic video meeting, characterized by comprising:

a video information receiving module, configured to receive panoramic video data of a meeting scene;

an action message receiving module, configured to receive an action message of a certain mobile human body in the panoramic video data, the action message including humanoid coordinate information of the mobile human body in the different video frames of the panoramic video data;

a third image generation module, configured to perform image interception on each video frame in the panoramic video data according to the action message to generate a video image of the mobile human body.

9. A video conferencing system, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor executes the method of any one of claims 1-5.

10. A computer readable storage medium storing computer instructions, characterized in that the instructions, when executed by a processor, implement the steps of the method of any one of claims 1-5.