WO2024070761A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number: WO2024070761A1
Authority: WO (WIPO, PCT)
Application number: PCT/JP2023/033687
Prior art keywords: image, camera, overhead, frustum, view
Other languages: French (fr), Japanese (ja)
Inventors: 滉太 今枝, 和平 岡田, 大資 田原, 慧 柿谷
Original Assignee: Sony Group Corporation (ソニーグループ株式会社)
Application filed by Sony Group Corporation

Description

  • This technology relates to an information processing device, an information processing method, and a program, and in particular relates to the display of images of a shooting target space together with virtual images.
  • Japanese Patent Application Laid-Open No. 2003-233693 discloses a technique for displaying the depth of field and the angle of view based on shooting information.
  • Japanese Patent Application Laid-Open No. 2003-233633 discloses expressing the shooting range in a captured image using a trapezoidal figure.
  • Japanese Patent Laid-Open No. 2003-233633 discloses generating and displaying a map image for indicating the depth position and focus position of an object to be imaged.
  • This disclosure therefore proposes technology that displays images that make it easier to understand the correspondence between camera images and positions in space.
  • An information processing device according to the present technology includes an image processing unit that generates image data for simultaneously displaying, on a single screen, an overhead image of a space to be photographed, a shooting range presentation image that presents a camera's shooting range within the overhead image, and the image photographed by that camera.
  • The shooting range presentation image is an image showing the shooting range determined by the shooting direction and zoom angle of view of the camera. When this image showing the camera's shooting range is displayed in the overhead image, the image captured by the camera is also displayed on the same screen.
  • FIG. 1 is an explanatory diagram of photography by a photography system according to an embodiment of the present technology.
  • FIG. 2 is an explanatory diagram of AR (Augmented Reality) superimposed images.
  • FIG. 3 is an explanatory diagram of a system configuration according to the embodiment.
  • FIG. 4 is an explanatory diagram of another example of a system configuration according to the embodiment.
  • FIG. 5 is an explanatory diagram of an environment map according to the embodiment.
  • FIG. 6 is an explanatory diagram of drift correction using the environment map according to the embodiment.
  • FIG. 7 is a block diagram of an information processing apparatus according to the embodiment.
  • FIG. 8 is an explanatory diagram of a view frustum according to the embodiment.
  • FIG. 9 is an explanatory diagram of a display example of a captured image on the focus plane of a view frustum according to the embodiment.
  • FIG. 10 is an explanatory diagram of a display example of a captured image within the depth of field of a view frustum according to the embodiment.
  • FIG. 11 is an explanatory diagram of a display example of a captured image at a position close to the starting point of a view frustum according to the embodiment.
  • FIG. 12 is an explanatory diagram of a display example of a captured image on the far end surface of a view frustum according to the embodiment.
  • FIG. 13 is an explanatory diagram of a case where a view frustum according to the embodiment is set at infinity.
  • FIG. 14 is an explanatory diagram of a change in the display state of a captured image on the far end side of a view frustum according to the embodiment.
  • FIG. 15 is an explanatory diagram of a display example of a captured image outside a view frustum according to the embodiment.
  • FIG. 16 is an explanatory diagram of a display example of captured images inside and outside a plurality of view frustums according to the embodiment.
  • FIG. 17 is an explanatory diagram of a display example of a captured image outside a view frustum according to the embodiment.
  • FIG. 18 is an explanatory diagram of a display example of a captured image outside a view frustum according to the embodiment.
  • FIG. 19 is a flowchart of a processing example of the information processing apparatus according to the embodiment.
  • FIG. 20 is a flowchart of an example of a process for setting the display position of a captured image according to the embodiment.
  • FIG. 21 is a flowchart of an example of a process for setting the display position of a captured image according to the embodiment.
  • FIG. 22 is a flowchart of an example of a process for setting the display position of a captured image according to the embodiment.
  • FIG. 23 is a flowchart of an example of a process for setting the display position of a captured image according to the embodiment.
  • FIG. 24 is a flowchart of an example of a process for setting the display position of a captured image according to the embodiment.
  • FIG. 25 is an explanatory diagram of a collision determination according to the embodiment.
  • FIG. 26 is an explanatory diagram of a collision determination according to the embodiment.
  • 11A and 11B are explanatory diagrams of changes in an overhead view image in the embodiment.
  • FIG. 13 is an explanatory diagram of an overhead view from the director's side in the embodiment.
  • 11A and 11B are diagrams illustrating a determination of an image to be highlighted according to an embodiment.
  • 11 is a flowchart of a processing example of the information processing apparatus according to the embodiment.
  • 11 is a flowchart of an example of a process for highlighting according to an embodiment.
  • 11 is a flowchart of an example of a process for highlighting according to an embodiment.
  • FIG. 11 is an explanatory diagram of a display example based on feedback according to the embodiment.
  • 11 is a flowchart of an example of a display process based on feedback according to an embodiment.
  • 11A and 11B are explanatory diagrams of a display example of overlapping view frustums according to an embodiment;
  • 11 is a flowchart of a processing example of displaying overlapped view frustum according to an embodiment.
  • FIG. 13 is an explanatory diagram of a preferred display of one view frustum according to an embodiment. 13 is a flowchart of a processing example when performing priority display according to the embodiment.
  • FIG. 13 is an explanatory diagram of an example of a display of the instruction frustum on the director side in the embodiment.
  • 11 is an explanatory diagram of an example of a display on the cameraman's side of an instruction frustum according to an embodiment.
  • FIG. 13 is a flowchart of a process for generating an overhead view video according to another embodiment.
  • 11 is an explanatory diagram of an example of a display on the cameraman's side of an instruction frustum according to an embodiment.
  • FIG. 11 is a flowchart of a process for generating an overhead video from a cameraman's side according to an embodiment.
  • 11 is an explanatory diagram of an example of instruction information displayed on the cameraman's side according to the embodiment;
  • FIG. 11 is a flowchart of a process for generating an overhead video from a cameraman's side according to an embodiment.
  • 11 is an explanatory diagram of a display example of a marker frustum according to an embodiment.
  • 11 is an explanatory diagram of a display example of a marker according to an embodiment.
  • 13 is a flowchart of a process example of displaying marker information according to an embodiment.
  • 11A and 11B are explanatory diagrams of a display example of a different overhead view image according to an embodiment.
  • 11A and 11B are explanatory diagrams of a display example of a different overhead view image according to an embodiment.
  • FIG. 13 is an explanatory diagram of a display example on the director side of the embodiment. 13 is a flowchart of a process for generating an overhead view video according to another embodiment.
  • The description will proceed in the following order:
    1. System configuration
    2. Configuration of information processing device
    3. Display of view frustum
    4. Example of cameraman and director screens
       4-1: Highlighted display
       4-2: Priority display
       4-3: Instruction display
       4-4: Marker display
       4-5: Examples of various displays
    5. Summary and modifications
  • In this disclosure, "video" or "image" includes both moving images and still images, but the embodiment will be described taking moving images as an example.
  • FIG. 1 is a schematic diagram showing how an image is captured by the image capturing system.
  • FIG. 1 shows an example in which three cameras 2 are arranged to capture images of a real target space 8.
  • the number of cameras 2 is just an example, and one or more cameras 2 may be used.
  • the subject space 8 may be any location, but one example is a stadium for soccer, rugby, or the like.
  • One camera 2 is a mobile camera 2M that is suspended by a wire 9 and can move above the target space 8. Images and metadata captured by this mobile camera 2M are sent to a render node 7. Also shown as a camera 2 is a fixed camera 2F that is fixedly disposed on, for example, a tripod 6. Images and metadata captured by this fixed camera 2F are sent to the render node 7 via a CCU (Camera Control Unit) 3. The captured images and metadata from the mobile camera 2M may also be sent to the render node 7 via the CCU 3.
  • The term "camera 2" collectively refers to the fixed camera 2F and the mobile camera 2M.
  • The render node 7 here is a CG engine or image processor that generates CG (Computer Graphics) and synthesizes it with live-action video; for example, it is a device that generates AR video.
  • FIGS. 2A and 2B show examples of AR images.
  • In one example, a line that does not actually exist is composited as a CG image 38 into live-action footage of a game being played in a stadium.
  • In the other example, an advertising logo that does not actually exist is composited as a CG image 38 into the live-action footage of the stadium.
  • These CG images 38 can be rendered to look like they exist in reality by appropriately setting the shape, size and synthesis position depending on the position of the camera 2 at the time of shooting, the shooting direction, the angle of view, the structural object photographed, etc.
  • the process of generating AR overlay images by combining CG with such live-action footage is already known.
  • The filming system of this embodiment also enables the cameraman and director involved in the video production to perform production tasks such as shooting and giving instructions while viewing the AR overlay image. This allows filming to be performed while checking the fusion state of the real scene and the virtual image, making it possible to produce videos that are in line with the creative intent.
  • In this embodiment, a shooting range presentation image that is suitable for the viewer of the monitor image, such as the cameraman or director, is displayed.
  • Two configuration examples of the imaging system are shown in FIG. 3 and FIG. 4.
  • Camera systems 1 and 1A, a control panel 10, a GUI (Graphical User Interface) device 11, a network hub 12, a switcher 13, and a master monitor 14 are shown.
  • the dashed arrows indicate the flow of various control signals CS, while the solid arrows indicate the flow of each of the image data of the shot image V1, the AR superimposed image V2, and the overhead image V3.
  • Camera system 1 is configured to perform AR linkage, while camera system 1A is configured not to perform AR linkage.
  • A mobile camera 2M may also be used in the camera systems 1 and 1A.
  • the camera system 1 includes a camera 2, a CCU 3, for example an AI (artificial intelligence) board 4 built into the CCU 3, and an AR system 5.
  • the camera 2 sends video data of the shot video V1 and metadata MT to the CCU 3.
  • the CCU 3 sends the video data of the shot video V1 to the switcher 13.
  • the CCU 3 also sends the video data of the shot video V1 and metadata MT to the AR system 5.
  • The metadata MT includes lens information such as the zoom angle of view and focal length when the captured image V1 was captured, and sensor information from, for example, the IMU (Inertial Measurement Unit) mounted on the camera 2. Specifically, this information includes the 3DoF (Degrees of Freedom) attitude information of the camera 2, acceleration information, lens focal length, aperture value, zoom angle of view, lens distortion, and the like.
  • This metadata MT is output from the camera 2, for example, as frame-synchronized or asynchronous information.
  • When the camera 2 is a fixed camera 2F, its position does not change, so the camera position information only needs to be stored as a known value by the CCU 3 and the AR system 5.
  • In the case of the mobile camera 2M, the position information is also included in the metadata MT transmitted successively from the camera 2M.
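  • As a non-authoritative sketch, the per-frame metadata MT described above could be modeled as a simple record like the following; the field names and types are assumptions made for illustration, not definitions from this publication.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CameraMetadata:
    """Illustrative shape of the metadata MT output by a camera 2 for each frame."""
    yaw_deg: float                                   # 3DoF attitude (shooting direction)
    pitch_deg: float
    roll_deg: float
    acceleration: Tuple[float, float, float]         # IMU acceleration information
    focal_length_mm: float
    aperture_f_number: float
    zoom_angle_of_view_deg: float
    lens_distortion: float
    position: Optional[Tuple[float, float, float]] = None  # sent successively only by the mobile camera 2M
```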
  • the AR system 5 is an information processing apparatus including a rendering engine that performs CG rendering.
  • The information processing apparatus as the AR system 5 is an example of the render node 7 shown in FIG. 1.
  • the AR system 5 generates video data of an AR superimposed video V2 by superimposing an image 38 generated by CG on a video V1 captured by the camera 2.
  • the AR system 5 sets the size and shape of the image 38 by referring to the metadata MT, and also sets the synthesis position within the captured video V1, thereby generating video data of an AR superimposed video V2 in which the image 38 is naturally synthesized with the actual scenery.
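  • The publication does not spell out the compositing step itself, but a generic way to superimpose a rendered CG image 38 onto a captured frame is a straightforward alpha blend, sketched below with assumed array layouts (RGB frame, RGBA CG layer).

```python
import numpy as np

def superimpose_cg(v1_frame: np.ndarray, cg_rgba: np.ndarray) -> np.ndarray:
    """Alpha-blend a rendered CG image 38 (H x W x 4, RGBA) onto a captured
    frame V1 (H x W x 3, RGB) to obtain one frame of the AR superimposed video V2."""
    alpha = cg_rgba[..., 3:4].astype(np.float32) / 255.0
    blended = (cg_rgba[..., :3].astype(np.float32) * alpha
               + v1_frame.astype(np.float32) * (1.0 - alpha))
    return blended.astype(np.uint8)
```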
  • the AR system 5 also generates video data of a CG overhead image V3, as described later.
  • This video data is the overhead image V3, which reproduces the target space 8 in CG.
  • the AR system 5 displays a view frustum 40 as shown in FIG. 8, which will be described later, in the overhead image V3 as a shooting range presentation image that visually presents the shooting range of the camera 2.
  • the AR system 5 calculates the shooting range in the shooting target space 8 from the metadata MT and position information of the camera 2.
  • the shooting range of the camera 2 can be obtained by acquiring the position information of the camera 2, the angle of view, and the attitude information (corresponding to the shooting direction) of the camera 2 in the three axial directions (yaw, pitch, roll) on the tripod 6.
  • the AR system 5 generates an image as a view frustum 40 in accordance with the calculation of the shooting range of the camera 2.
  • the AR system 5 generates image data of the overhead image V3 so that the view frustum 40 is presented from the position of the camera 2 in the overhead image V3 corresponding to the target space 8.
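  • As a rough illustration of how the shooting range can be derived from the metadata MT, the following sketch computes direction vectors for the four edges of the quadrangular pyramid from the attitude (yaw, pitch) and the zoom angle of view; roll and lens distortion are ignored, and the axis convention is an assumption.

```python
import numpy as np

def frustum_edge_directions(yaw_deg, pitch_deg, h_fov_deg, v_fov_deg):
    """Approximate unit direction vectors of the four edges of the view frustum 40,
    spreading from the frustum origin 46 (the camera position)."""
    edges = []
    for sy in (-0.5, 0.5):           # left/right half of the horizontal angle of view
        for sp in (-0.5, 0.5):       # bottom/top half of the vertical angle of view
            y = np.radians(yaw_deg + sy * h_fov_deg)
            p = np.radians(pitch_deg + sp * v_fov_deg)
            # x forward, y left, z up (convention assumed for this sketch)
            edges.append(np.array([np.cos(p) * np.cos(y),
                                   np.cos(p) * np.sin(y),
                                   np.sin(p)]))
    return edges
```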
  • the term "bird's-eye view image” refers to an image from a bird's-eye view of the target space 8, but does not necessarily have to display the entire target space 8 within the image.
  • An image that includes at least a portion of the view frustum 40 of the camera 2 and the surrounding space is referred to as a bird's-eye view image.
  • the overhead image V3 is generated as an image expressing the shooting target space 8 such as a stadium by CG, but the overhead image V3 may be generated by real-life images.
  • For example, a camera 2 serving as the viewpoint of the overhead image may be provided, and the image V1 shot by that camera 2 may be used to generate the overhead image V3.
  • the image V1 shot by a camera 2M moving in the sky on a wire 9 may be used as the overhead image V3. Furthermore, a 3D (three dimensions)-CG model of the shooting target space 8 may be generated using the images V1 shot by multiple cameras 2, and the viewpoint position may be set for the 3D-CG model and rendered to generate an overhead image V3 with a variable viewpoint position.
  • the video data of the AR superimposed image V2 and the overhead image V3 by the AR system 5 is supplied to a switcher 13. Furthermore, the image data of the AR superimposed image V2 and the overhead image V3 by the AR system 5 is supplied to the camera 2 via the CCU 3. This allows the cameraman of the camera 2 to visually recognize the AR superimposed image V2 and the overhead image V3 on a display unit such as a viewfinder.
  • the image data of the AR superimposed image V2 and the overhead image V3 by the AR system 5 may be supplied to the camera 2 without going through the CCU 3. Furthermore, there are also examples in which the CCU 3 is not used in the camera systems 1 and 1A.
  • the AI board 4 in the CCU 3 performs processing to calculate the amount of drift of the camera 2 from the captured image V1 and metadata MT.
  • the positional displacement of the camera 2 is obtained by integrating twice the acceleration information from the IMU mounted on the camera 2.
  • By accumulating the amount of displacement at each time point from a certain reference origin attitude (the reference attitude for each of the three axes of yaw, pitch, and roll), the yaw, pitch, and roll at each time point, that is, the attitude information corresponding to the shooting direction of the camera 2, is obtained.
  • However, repeated accumulation increases the deviation (accumulated error) between the actual attitude and the calculated attitude. This amount of deviation is called the drift amount.
  • the AI board 4 calculates the amount of drift using the captured image V1 and the metadata MT. Then, the calculated amount of drift is sent to the camera 2 side.
  • the camera 2 receives the drift amount from the CCU 3 (AI board 4) and corrects the attitude information of the camera 2. Then, the camera 2 outputs metadata MT including the corrected attitude information.
  • The above drift correction will be explained with reference to FIGS. 5 and 6. FIG. 5 shows the environment map 35.
  • the environment map 35 stores feature points and feature amounts in the coordinates of a virtual dome, and is generated for each camera 2.
  • the camera 2 is rotated 360 degrees, and an environment map 35 is generated in which feature points and feature quantities are registered in global position coordinates on the celestial sphere. This makes it possible to restore the orientation even if it is lost during feature point matching.
  • FIG. 6A shows a schematic diagram of a state in which a drift amount DA occurs between the imaging direction Pc in the correct attitude of the camera 2 and the imaging direction Pj calculated from the IMU data.
  • Information on the three-axis motion, angle, and field of view of the camera 2 is sent from the camera 2 to the AI board 4 as a guide for feature point matching.
  • the AI board 4 detects the accumulated drift amount DA by feature point matching of image recognition, as shown in FIG. 6B.
  • the "+" in the figure indicates a feature point of a certain feature amount registered in the environment map 35 and a feature point of the corresponding feature amount in the frame of the current captured image V1, and the arrow between them is the drift amount vector. In this way, by detecting a coordinate error by feature point matching and correcting the coordinate error, the drift amount can be corrected.
  • the AI board 4 determines the amount of drift by this type of feature point matching, and the camera 2 transmits corrected metadata MT based on this, thereby improving the accuracy of the attitude information of the camera 2 detected in the AR system 5 based on the metadata MT.
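  • The following is a minimal sketch of the idea behind this correction: the drift amount DA is estimated as the displacement between feature points registered in the environment map 35 and the matching feature points found in the current frame, and that amount is subtracted from the IMU-derived attitude. Representing feature points as (yaw, pitch) coordinates on the virtual celestial sphere is an assumption made for illustration.

```python
import numpy as np

def estimate_drift(map_points: np.ndarray, observed_points: np.ndarray) -> np.ndarray:
    """Drift amount DA as the mean displacement between matched feature points
    (both arrays are N x 2, in (yaw, pitch) degrees on the celestial sphere)."""
    return (observed_points - map_points).mean(axis=0)

def correct_attitude(imu_attitude: np.ndarray, drift: np.ndarray) -> np.ndarray:
    """Subtract the accumulated drift from the IMU-derived (yaw, pitch) attitude."""
    return imu_attitude - drift
```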
  • Camera system 1A in FIG. 3 is an example having a camera 2 and a CCU 3, but not an AR system 5.
  • Video data and metadata MT of the shot video V1 are transmitted from the camera 2 of camera system 1A to the CCU 3.
  • the CCU 3 transmits the video data of the shot video V1 to the switcher 13.
  • the video data of the captured image V1, AR superimposed image V2, and overhead image V3 output from the camera system 1, 1A is supplied to the GUI device 11 via the switcher 13 and network hub 12.
  • the switcher 13 selects the so-called main line video from among the images V1 captured by the multiple cameras 2, the AR superimposed video V2, and the overhead video V3.
  • the main line video is the video output for broadcasting or distribution.
  • the switcher 13 outputs the selected video data to a transmitting device or recording device (not shown) as the main line video for broadcasting or distribution.
  • the video data of the video selected as the main line video is sent to the master monitor 14 and displayed thereon, so that the video production staff can check the main line video.
  • the master monitor 14 may display an AR superimposed image V2, an overhead image V3, etc. in addition to the main line image.
  • the control panel 10 is a device that allows video production staff to operate the switcher 13 to give switching instructions, video processing instructions, and various other instructions.
  • the control panel 10 outputs a control signal CS in response to operations by the video production staff.
  • This control signal CS is sent via the network hub 12 to the switcher 13 and the camera systems 1 and 1A.
  • the GUI device 11 is, for example, a PC or a tablet device, and is a device that enables video production staff, such as a director, to check the video and give various instructions.
  • the captured image V1, the AR superimposed image V2, and the overhead image V3 are displayed on the display screen of the GUI device 11.
  • For example, the captured images V1 from the multiple cameras 2 are displayed as a list on a split screen, and the AR superimposed image V2 and the overhead image V3 are also displayed.
  • an image selected by the switcher 13 as a main line image is displayed.
  • the GUI device 11 is also provided with an interface for a director or the like to perform various instruction operations.
  • the GUI device 11 outputs a control signal CS in response to an operation by the director or the like.
  • This control signal CS is transmitted via a network hub 12 to a switcher 13 and the camera systems 1 and 1A.
  • a control signal CS corresponding to the instruction is transmitted to the AR system 5, and the AR system 5 generates video data of an overhead video V3 including a view frustum 40 in a display format corresponding to an instruction from a director or the like.
  • The configuration in FIG. 3 has camera systems 1 and 1A. Camera system 1 is a set of a camera 2, a CCU 3, and an AR system 5; in particular, because it has the AR system 5, video data of the AR superimposed video V2 and the overhead video V3 corresponding to the video V1 captured by its camera 2 is generated. The AR superimposed video V2 and the overhead video V3 are then displayed on a display unit such as the viewfinder of the camera 2, displayed on the GUI device 11, or selected as the main line video by the switcher 13. On the camera system 1A side, on the other hand, image data of the AR superimposed image V2 and the overhead image V3 corresponding to the captured image V1 of its camera 2 is not generated. Therefore, FIG. 3 shows a system in which cameras 2 that perform AR linkage and cameras 2 that perform normal shooting are mixed.
  • FIG. 4 is an example of a system in which one AR system 5 handles a plurality of cameras 2. In FIG. 4, a plurality of camera systems 1A are provided, and the AR system 5 is provided independently of the camera systems 1A.
  • the CCU 3 of each camera system 1A sends the video data and metadata MT of the shot video V1 from the camera 2 to the switcher 13.
  • the video data and metadata MT of the shot video V1 are then supplied from the switcher 13 to the AR system 5.
  • This allows the AR system 5 to acquire the video data and metadata MT of the captured video V1 for each camera system 1A, and generate video data of the AR superimposed video V2 corresponding to the captured video V1 of each camera system 1A, and video data of the overhead video V3 including the view frustum 40 corresponding to each camera system 1A.
  • the AR system 5 can generate video data of the overhead video V3 in which the view frustums 40 of the cameras 2 of the multiple camera systems 1A are collectively displayed.
  • the video data of the AR superimposed image V2 and the overhead image V3 generated by the AR system 5 is sent to the CCU 3 of the camera system 1A via the switcher 13, and then sent to the camera 2. This allows the cameraman to view the AR superimposed image V2 and the overhead image V3 on a display such as the viewfinder of the camera 2.
  • the video data of the AR overlay image V2 and the overhead image V3 generated by the AR system 5 is transmitted to the GUI device 11 via the switcher 13 and the network hub 12 and displayed. This allows the director and others to visually confirm the AR overlay image V2 and the overhead image V3.
  • In the drawings, the overhead image V3 is denoted as "V3-1" and "V3-2".
  • the video data of the overhead image V3-1 is the video data of the overhead image V3 to be displayed on the GUI device 11 or the master monitor 14, with a director or the like assumed as the viewer.
  • the video data of the overhead image V3-2 is the video data of the overhead image V3 to be displayed on the viewfinder of the camera 2, with a cameraman or the like assumed as the viewer.
  • the video data for these overhead images V3-1 and V3-2 may be video data that displays images of the same content. Both of these are video data that display an overhead image V3 of the target space 8 that includes at least the view frustum 40. However, in the embodiment, a case will also be described in which these are video data that include different display contents.
  • the AR system 5 may generate video data that will become an overhead image V3 with the same video content regardless of the transmission destination, or may generate, for example, video data of a first overhead image V3-1 to be transmitted to the GUI device 11 and video data of a second overhead image V3-2 to be transmitted to the camera 2 in parallel. Furthermore, in the case of the system of FIG. 4, it is also assumed that the AR system 5 generates multiple second overhead images V3-2 in parallel so that the content differs for each camera 2.
  • the information processing device 70 is a device capable of information processing, particularly video processing, such as a computer device.
  • Specific examples of the information processing device 70 include personal computers, workstations, mobile terminal devices such as smartphones and tablets, video editing devices, etc.
  • the information processing device 70 may also be a computer device configured as a server device or a computing device in cloud computing.
  • the CPU 71 of the information processing device 70 executes various processes according to programs stored in the ROM 72 or a non-volatile memory unit 74, such as an EEPROM (Electrically Erasable Programmable Read-Only Memory), or programs loaded from the storage unit 79 to the RAM 73.
  • the RAM 73 also stores data necessary for the CPU 71 to execute various processes, as appropriate.
  • the CPU 71 is configured as a processor that performs various types of processing.
  • the CPU 71 performs overall control processing and various types of calculation processing, but in this embodiment, it also has the functions of an image processing unit 71a and an image generation control unit 71b in order to execute image processing as the AR system 5 based on a program.
  • the video processing unit 71a has a processing function for performing various types of video processing. For example, it performs one or more of the following: 3D model generation processing, rendering, video processing including color and brightness adjustment processing, video editing processing, video analysis and detection processing, etc.
  • The video processing unit 71a also performs processing to generate video data that simultaneously displays, on a single screen, an overhead image V3 of the target space 8, a view frustum 40 that shows the shooting range of a camera 2 within the overhead image V3, and the captured image V1 of that camera 2.
  • The image generation control unit 71b in the CPU 71 variably sets the display position of the captured image V1 that is to be displayed simultaneously on one screen in the overhead image V3 including the view frustum 40 generated by the image processing unit 71a, and controls the generation of image data by the image processing unit 71a.
  • the image processing unit 71a generates the overhead image V3 including the view frustum 40 according to the settings of the image generation control unit 71b.
  • the image processing unit 71a may also perform in parallel a process of generating first image data that displays the view frustum 40 of the camera 2 within the target space 8, and a process of generating second image data that displays an image of the view frustum 40 within the target space 8, the image having a different display mode from the image generated by the first image data.
  • the first video data is, for example, video data of the overhead view V3-1
  • the second video data is, for example, video data of the overhead view V3-2.
  • the functions of the image processing unit 71a and the image generation control unit 71b may be realized by a CPU separate from the CPU 71, a GPU (Graphics Processing Unit), a GPGPU (General-purpose computing on graphics processing units), an AI (artificial intelligence) processor, etc.
  • The functions of the video processing unit 71a and the video generation control unit 71b may also be realized by a plurality of processors.
  • the CPU 71, ROM 72, RAM 73, and non-volatile memory unit 74 are interconnected via a bus 83.
  • the input/output interface 75 is also connected to this bus 83.
  • An input unit 76 consisting of operators and operation devices is connected to the input/output interface 75.
  • the input unit 76 may be various operators and operation devices such as a keyboard, a mouse, a key, a trackball, a dial, a touch panel, a touch pad, a remote controller, or the like.
  • An operation by the user is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
  • a microphone may also be used as the input unit 76. Voice uttered by the user may also be input as operation information.
  • the input/output interface 75 is connected, either integrally or separately, to a display unit 77 formed of an LCD (Liquid Crystal Display) or an organic EL (electro-luminescence) panel, or the like, and an audio output unit 78 formed of a speaker, or the like.
  • the display unit 77 is a display unit that performs various displays, and is configured, for example, by a display device provided in the housing of the information processing device 70, or a separate display device connected to the information processing device 70, or the like.
  • the display unit 77 displays various images, operation menus, icons, messages, etc., on the display screen based on instructions from the CPU 71, that is, displays them as a GUI (Graphical User Interface).
  • The input/output interface 75 may also be connected to a storage unit 79, which may be configured using a hard disk drive (HDD) or solid-state memory, and to a communication unit 80.
  • the storage unit 79 can store various data and programs.
  • a database can also be configured in the storage unit 79.
  • The communication unit 80 performs communication processing via a transmission path such as the Internet, and communication with various devices such as external databases, editing devices, and information processing devices via wired/wireless communication, bus communication, and the like. In the case of the information processing device 70 as the AR system 5, for example, communication with the CCU 3 and the switcher 13 is performed via the communication unit 80.
  • a drive 81 is also connected to the input/output interface 75 as required, and a removable recording medium 82 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is appropriately mounted thereon.
  • the drive 81 allows video data, various computer programs, and the like to be read from the removable recording medium 82.
  • the read data is stored in the storage unit 79, and the video and audio contained in the data are output on the display unit 77 and the audio output unit 78.
  • the computer programs, etc. read from the removable recording medium 82 are installed in the storage unit 79 as necessary.
  • software for the processing of this embodiment can be installed via network communication by the communication unit 80 or via a removable recording medium 82.
  • the software may be stored in advance in the ROM 72, the storage unit 79, etc.
  • the AR system 5 generates the overhead image V3 and can transmit it to the viewfinder of the camera 2, the GUI device 11, or the like for display.
  • the AR system 5 generates video data for the overhead image V3 so as to display the view frustum 40 of the camera 2 within the overhead image V3.
  • Fig. 8 shows an example of a view frustum 40 displayed in the overhead image V3.
  • Fig. 8 shows an example of a CG image of the subject space 8 in Fig. 1 as viewed from above, but for the sake of explanation, it is shown in a simplified form.
  • the overhead image V3 in Fig. 8 includes an image showing a background 31, such as a stadium, and a person 32, such as a player.
  • the overhead image V3 may or may not include an image of the camera 2 itself.
  • the view frustum 40 visually presents the shooting range of the camera 2 within the overhead image V3, and has a pyramid shape that spreads in the direction of the shooting optical axis with the position of the camera 2 within the overhead image V3 as the frustum origin 46.
  • it is a pyramid shape extending from the frustum origin 46 to the frustum far end surface 45.
  • The reason it is a quadrangular pyramid is that the image sensor of the camera 2 is quadrangular.
  • the extent of the spread of the pyramid changes depending on the angle of view of the camera 2 at that time. Therefore, the range of the pyramid indicated by the view frustum 40 is the shooting range of the camera 2.
  • the view frustum 40 may be represented as a pyramid with a semi-transparent colored image.
  • the view frustum 40 displays a focus plane 41 and a depth of field range 42 at that time inside a quadrangular pyramid.
  • As the depth of field range 42, for example, the range from a near depth end surface 43 to a far depth end surface 44 is expressed in a translucent color different from the rest.
  • the focus plane 41 is also expressed in a semi-transparent color that is different from the others.
  • the focus plane 41 indicates the depth position at which the camera 2 is focused at that point in time.
  • a subject at a depth (distance in the depth direction as seen from the camera 2) equivalent to the focus plane 41 is in focus.
  • the depth of field range 42 makes it possible to confirm the range in the depth direction in which the subject is not blurred.
  • the in-focus depth and the depth of field vary depending on the focus operation and aperture operation of the camera 2. Therefore, the focus plane 41 and the depth of field range 42 in the view frustum 40 vary each time.
  • the AR system 5 can set the pyramidal shape of the view frustum 40, the display position of the focus plane 41, the display position of the depth of field range 42, and the like, by acquiring metadata MT from the camera 2, which includes information such as focal length, aperture value, and angle of view. Furthermore, since the metadata MT includes attitude information of the camera 2, the AR system 5 can set the direction of the view frustum 40 from the camera position (frustum origin 46) in the overhead image V3.
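  • For reference, the near and far limits that would bound the depth of field range 42 around the focus plane 41 can be estimated from the focal length, aperture value, and focus distance with the standard thin-lens approximation shown below; the circle-of-confusion value is an assumption and is not taken from this publication.

```python
def depth_of_field_limits(focal_length_mm, f_number, focus_distance_mm, coc_mm=0.03):
    """Approximate distances of the near depth end surface 43 and the far depth
    end surface 44, given the focus plane 41 distance (all values in millimeters)."""
    hyperfocal = focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm
    s = focus_distance_mm
    near = hyperfocal * s / (hyperfocal + (s - focal_length_mm))
    far = float("inf") if s >= hyperfocal else hyperfocal * s / (hyperfocal - (s - focal_length_mm))
    return near, far
```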
  • the AR system 5 displays, together with the view frustum 40, the image V1 captured by the camera 2 in which the view frustum 40 is shown in the overhead image V3. That is, the AR system 5 generates an image of the CG space 30 to be used as the overhead image V3, synthesizes the image of the CG space 30 with the view frustum 40 generated based on the metadata MT supplied from the camera 2, and further synthesizes the image V1 captured by the camera 2. The image data of such a synthesized image is output as the overhead image V3.
  • With reference to FIG. 9 and subsequent figures, examples will be described in which a view frustum 40 in an image of the CG space 30 and a photographed image V1 are simultaneously displayed on one screen.
  • the AR system 5 generates video data of an overhead video V3 in which the captured video V1 is displayed within the view frustum 40.
  • In other words, this is an example of generating video data in which the captured video V1 is displayed arranged within the range of the view frustum 40.
  • Figure 9 shows an example in which the captured image V1 is displayed on the focus plane 41 in the view frustum 40. This makes it possible to view the image captured at the focus position.
  • the example in Figure 9 is also one example in which the captured image V1 is displayed within the depth of field range 42.
  • FIG. 10 shows an example in which a captured image V1 is displayed on a surface other than the focus surface 41 within the depth of field range 42 in the view frustum 40.
  • In this case, the captured image V1 is displayed on the far depth end surface 44 of the depth of field.
  • examples are also conceivable in which the captured image V1 is displayed on the near depth end surface 43, or at a depth position midway within the depth of field range 42.
  • FIG. 11 shows an example in which the captured image V1 is displayed within the view frustum 40 at a position (surface 47 near the frustum origin) closer to the frustum origin 46 than the near-depth end surface 43 of the depth-of-field range 42.
  • the size of the captured image V1 becomes smaller the closer it is to the frustum origin 46, but by displaying it on the surface 47 near the frustum origin in this way, the focus plane 41, depth-of-field range 42, etc. become easier to see.
  • FIG. 12 shows an example in which a captured image V1 is displayed on the far side of a far end surface 44 of a depth of field range 42 within a view frustum 40.
  • Here, "far" means far from the viewpoint of the camera 2 (the frustum starting point 46).
  • the captured image V1 is displayed on the frustum far end surface 45, which is located at the far side.
  • When the photographed image V1 is displayed on the far side of the depth of field range 42 within the view frustum 40, the area of the photographed image V1 can be made large. This is therefore suitable for checking the position of the focus plane 41 and the depth of field range 42 while carefully checking the content of the photographed image V1.
  • the distance of the rendered view frustum 40 may be finite or infinite.
  • the view frustum 40 may be rendered at a finite distance, such as the rendering distance d1 in Fig. 12.
  • For example, the rendering distance d1 may be twice the distance from the frustum starting point 46 to the focus plane 41. By doing so, the frustum far end surface 45 is determined, so that the photographed image V1 can be displayed in the widest area within the view frustum 40, as shown in FIG. 12.
  • the view frustum 40 may be rendered at infinity as shown in FIG. 13 without any particular rendering distance.
  • In this case, the frustum far end surface 45 is not fixed at a constant position.
  • the captured image V1 may be displayed at an indefinite position farther away than the depth of field range 42.
  • the far end of the rendering range is set as the frustum far end surface 45.
  • FIGS. 14A and 14B show that when the view frustum 40 is rendered up to the position of a wall W, the position at which it collides with the wall W becomes the frustum far end surface 45. In other words, the frustum far end surface 45 changes depending on the positional relationship with objects created by CG.
  • the far end of the range that can be drawn in the overhead image V3 is the frustum far end surface 45, and the captured image V1 is displayed on that frustum far end surface 45.
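  • One way to realize this behavior is to cast a ray along the frustum axis and clip the rendering distance at the nearest CG object it hits, as in the sketch below; the obstacle interface is an assumption made for illustration.

```python
def far_end_distance(origin, axis_dir, obstacles, default_distance=float("inf")):
    """Distance from the frustum origin 46 to the frustum far end surface 45:
    the nearest collision with a CG object (e.g. the wall W or the ground GR),
    or the default rendering distance when nothing is hit."""
    hits = [obstacle.intersect_ray(origin, axis_dir) for obstacle in obstacles]
    hits = [d for d in hits if d is not None and d > 0]
    return min(hits, default=default_distance)
```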
  • In the above examples, the photographed image V1 is displayed within the view frustum 40, but it may also be displayed at a position outside the view frustum 40 within the same screen as the overhead image V3.
  • FIG. 15 shows four examples (captured images V1w, V1x, V1y, and V1z) of display positions outside the view frustum 40. In particular, these four examples are ones in which the captured image V1 is displayed near the view frustum 40.
  • the captured image V1 may be displayed near the far end surface 45 of the frustum as captured image V1w.
  • the captured image V1 can also be displayed near the focus plane 41 (or depth of field range 42) as in the captured image V1y in FIG. 15. In this case, it becomes easier to view the captured image V1 together with the focus plane 41 or depth of field range 42, which are areas of the view frustum 40 that are likely to be noticed by the viewer.
  • the captured image V1 can also be displayed near the camera 2 (or the frustum starting point 46) as the captured image V1z. In this case, the relationship between the camera 2 and the captured image V1 by that camera 2 becomes easier to understand.
  • the color of the frame of the captured image V1 may be matched with the semi-transparent color or the color of the contour of the corresponding view frustum 40 to indicate the correspondence.
  • In FIG. 16, view frustums 40a, 40b, and 40c corresponding to three cameras 2 are displayed within the overhead image V3.
  • captured images V1a, V1b, and V1c corresponding to these view frustums 40a, 40b, and 40c are also displayed.
  • the photographed image V1a is displayed on a frustum far end surface 45 of the view frustum 40a.
  • the photographed image V1b is displayed in the vicinity of a frustum starting point 46 of the view frustum 40b (in the vicinity of the camera position).
  • the captured image V1c is displayed in a corner of the screen, but is displayed in the upper left corner, which is closest to the view frustum 40c, among the four corners of the overhead image V3.
  • the image V1 captured by the mobile camera 2 may be displayed fixedly in a corner of the screen, for example.
  • the above Figure 16 is an example of an overhead image V3 in which the target space 8 is viewed from diagonally above, but the AR system 5 may also display a planar overhead image V3 viewed from directly above, as shown in Figure 17.
  • cameras 2a, 2b, 2c, and 2d, their corresponding view frustums 40a, 40b, 40c, and 40d, and the captured images V1a, V1b, V1c, and V1d are displayed as an overhead image V3.
  • the captured images V1a, V1b, V1c, and V1d are displayed near the corresponding cameras 2a, 2b, 2c, and 2d, respectively.
  • the AR system 5 may be configured so that the viewpoint direction of the overhead image V3 shown in Figures 16 and 17 can be continuously changed by the viewer operating the GUI device 11, etc.
  • the view frustums 40a and 40b are displayed, and the images V1a and V1b captured by the cameras 2 of the view frustums 40a and 40b are displayed in the corners of the screen or near the camera positions.
  • the shooting conditions can be easily understood by displaying each of the view frustums 40 and shot images V1, as in the example shown in the figure.
  • the AR system 5 displays the view frustum 40 of the camera 2 in the CG space 30, and generates video data for an overhead image V3 so that the captured image V1 of the camera 2 is also displayed at the same time.
  • By viewing this overhead image V3 on the camera 2 or the GUI device 11, viewers such as the cameraman or director can easily understand the shooting situation.
  • the viewer can easily understand what each camera 2 is capturing, where the camera is focused, and so on.
  • the director can very easily grasp the relative positions of the cameras, the relationship between the shooting directions, the subject being shot, etc. This allows the director to give appropriate instructions. From the director's point of view, it is enough to know the general content of each shot image V1. Therefore, there is no problem even if the shot image V1 is relatively small in the overhead image V3.
  • the director can check and simulate the composition, standing position, and camera position while taking into consideration the overall situation of each camera 2.
  • the cameraman can perform the focusing operation by looking at the depth of field range 42 of the view frustum 40.
  • the user can easily check the location and direction being photographed within the overhead image V3 of the subject space 8 represented by CG.
  • the user can see the view frustum 40 and the captured image V1 of the other camera 2 and reflect them in the operation of his/her own camera.
  • the user can also grasp the relationship between the contents of the images captured by the other camera 2, the direction of the subject, etc., and therefore can perform preferable shooting in relation to the other camera 2.
  • the user can check the position and angle of view of the other camera 2 and shoot from a different position and angle of view with his/her own camera 2.
  • the overhead image V3 increases the amount of information (captured image V1, position, etc.), making it easier to grasp the situation on-site.
  • FIG. 19 shows an example of processing by the AR system 5 that generates video data for the overhead view video V3.
  • the video data for the overhead view video V3 is video data in which the view frustum 40 and the captured video V1 are synthesized into the CG space 30, which corresponds to the subject space 8.
  • it is video data for displaying the images shown in FIGS. 9 to 18.
  • the AR system 5 performs the processes from step S101 to step S107 in FIG. 19 for each frame of the video data of the overhead video V3, for example. These processes can be considered as control processes of the CPU 71 (video processing unit 71a, video generation control unit 71b) in the information processing device 70 in FIG. 7 as the AR system 5.
  • In step S101, the AR system 5 sets the CG space 30. For example, it sets the viewpoint position of the CG space 30 corresponding to the shooting target space 8, and renders an image of the CG space 30 from that viewpoint position. If there is no change in the viewpoint position or image content from the previous frame, the CG space image of the previous frame can be used for the current frame as well.
  • In step S102, the AR system 5 inputs the captured image V1 and metadata MT from the camera 2. That is, the captured image V1 of the current frame, and the attitude information, focal length, angle of view, aperture value, and the like of the camera 2 at that frame timing are acquired. When one AR system 5 displays the view frustums 40 and captured images V1 for a plurality of cameras 2 as shown in FIG. 4, the AR system 5 inputs the captured image V1 and metadata MT of each camera 2.
  • When there are multiple camera systems 1 in which the cameras 2 and AR systems 5 correspond 1:1 as shown in FIG. 3, and each AR system 5 generates an overhead image V3 including multiple view frustums 40 and captured images V1, it is preferable for these AR systems 5 to work together so as to share the metadata MT and captured images V1 of their corresponding cameras 2.
  • In step S103, the AR system 5 generates a view frustum 40 for the current frame.
  • the AR system 5 sets the direction of the view frustum 40 in the CG space 30 according to the attitude of the camera 2, the quadrangular pyramid shape according to the angle of view, the positions of the focus plane 41 and the depth of field range 42 based on the focal length and aperture value, and the like, and generates an image of the view frustum 40 according to the settings.
  • When displaying the view frustums 40 for a plurality of cameras 2, the AR system 5 generates an image of each view frustum 40 according to the metadata MT of the respective camera 2.
  • In step S104, the AR system 5 sets the display position for the captured image V1 acquired in step S102.
  • In step S105, the AR system 5 synthesizes the view frustums 40 corresponding to one or more cameras 2 and the captured images V1 into the CG space 30 that becomes the overhead image V3, generating one frame of image data of the overhead image V3.
  • In step S106, the AR system 5 outputs the one frame of video data of the overhead video V3.
  • The above process is repeated until the display of the view frustum 40 and the captured image V1 is finished.
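  • The per-frame flow of steps S101 to S106 can be summarized by the following sketch; every method name is an assumption introduced for illustration and does not come from this publication.

```python
def run_frustum_display_loop(ar_system, cameras):
    """Sketch of the per-frame processing of FIG. 19 for one or more cameras 2."""
    while ar_system.display_requested():                          # repeat until the display ends
        cg_space = ar_system.set_cg_space()                       # S101: set or reuse the CG space 30
        layers = []
        for camera in cameras:
            v1, metadata_mt = camera.read_frame()                 # S102: captured video V1 and metadata MT
            frustum = ar_system.build_view_frustum(metadata_mt)   # S103: direction, pyramid, focus plane, DoF
            placement = ar_system.set_display_position(frustum, v1)  # S104: where V1 is shown
            layers.append((frustum, v1, placement))
        frame_v3 = ar_system.composite(cg_space, layers)          # S105: one frame of the overhead image V3
        ar_system.output(frame_v3)                                # S106: output the frame
```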
  • In this way, the overhead image V3 as shown in FIGS. 9 to 18 is displayed.
  • FIGS. 23 and 24 show examples in which the display position of the photographed video V1 is set variably.
  • FIGS. 20, 21, 22, 23, and 24 are examples of display position setting processes for the captured image V1 corresponding to one camera 2.
  • The processes shown in FIGS. 20 to 24 may be performed for each camera 2.
  • the same display position setting process may be performed for each camera 2, or different display position setting processes may be performed.
  • FIG. 20 shows a display position setting process when the photographed image V1 is displayed on the focus plane 41 as in FIG. 9.
  • First, the AR system 5 determines the size and shape of the focus plane 41 in the view frustum 40 generated for the current frame in step S103 of FIG. 19.
  • The AR system 5 then sets the size and shape of the captured image V1 so as to match the focus plane 41.
  • the shape of the captured image V1 to be synthesized within the view frustum 40 may be the cross-sectional shape of that view frustum 40.
  • the shape of the focus plane 41 differs depending on the viewpoint of the overhead image V3 and the position and direction of the view frustum 40 to be displayed, but may be the shape of a cross section cut perpendicular to the optical axis of the camera 2 at the focus plane 41 of the view frustum 40 in that frame. Therefore, when the photographed image V1 is displayed within the view frustum 40, the photographed image V1 is transformed into a cross-sectional shape perpendicular to the optical axis and then synthesized.
  • As a result, in step S105 of FIG. 19, the size and shape of the captured image V1 are adjusted, and an overhead image V3 is generated by combining the captured image V1 with the focus plane 41 of the view frustum 40.
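  • A minimal sketch of fitting the captured image V1 to a frustum cross section such as the focus plane 41 is shown below, using a perspective warp; the corner ordering and output size handling are assumptions.

```python
import numpy as np
import cv2  # OpenCV is used here only for the perspective warp

def fit_v1_to_cross_section(v1_frame, dst_corners, out_size):
    """Warp the captured image V1 so that it fills the four screen-space corners
    of a frustum cross section (top-left, top-right, bottom-right, bottom-left).
    out_size is the (width, height) of the overhead image V3 being composited."""
    h, w = v1_frame.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(dst_corners)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(v1_frame, matrix, out_size)
```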
  • FIG. 21 shows a display position setting process in the case where the photographed image V1 is displayed on the far depth end surface 44 as in FIG. 10.
  • In this case, the AR system 5 determines the size and shape of the far depth end surface 44 in the view frustum 40 generated for the current frame in step S103.
  • the AR system 5 sets the size and shape of the captured image V1 so as to match the size of the depth far end surface 44.
  • Then, in step S105 of FIG. 19, the size and shape of the captured image V1 are adjusted, and an overhead image V3 is generated in which the captured image V1 is composited onto the far depth end surface 44 of the view frustum 40.
  • FIG. 22 shows a display position setting process in the case where the photographed image V1 is displayed in the vicinity of the frustum starting point 46 as in FIG. 11.
  • the AR system 5 sets the display position of the captured image V1 within the view frustum 40 generated in step S103 in the current frame. That is, a certain position is set on the frustum origin 46 side of the depth of field range 42. In this case, the position may be set as a fixed distance from the frustum origin 46, or may be set as a position where a minimum area is obtained as a cross section of a quadrangular pyramid shape according to the angle of view.
  • In step S141, the AR system 5 determines the cross section at the set display position, that is, the size and shape of the display area.
  • In step S142, the AR system 5 sets the size and shape of the captured image V1 so as to match the cross section at the determined display position.
  • When the process then proceeds to step S105 of FIG. 19, the size and shape of the captured image V1 are adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position near the frustum origin 46 of the view frustum 40.
  • FIG. 23 shows a display position setting process in which the display position of the captured image V1 is changed according to the operation of a user such as a cameraman or director.
  • In step S150, the AR system 5 checks whether or not a display position change operation has been performed on the captured image V1.
  • the GUI device 11 and the camera 2 are configured so that a director, cameraman, etc. can change the display position by performing a specified operation.
  • the AR system 5 checks the operation information for the display position change operation from the control signal CS that it receives.
  • an operation interface may be provided that allows each plane to be switched by a toggle operation, or an operation interface may be provided that allows each plane to be directly specified.
  • the display position setting may be switched not only to positions within the view frustum 40 but also to positions outside the view frustum 40 .
  • For example, it is possible to perform operations to change the display position to the focus plane 41, the frustum far end surface 45, a corner of the screen, the vicinity of the camera, and so on.
  • When the display position is switched to outside the view frustum 40, it is possible, for example, to change the position to "near the focus plane 41," "near the frustum far end surface 45," "a corner of the screen," or "near the camera 2."
  • If no operation to change the display position is confirmed at the time of processing the current frame, the AR system 5 proceeds to step S151, maintains the same display position setting as in the previous frame, and ends the processing of FIG. 23. As a result, when the process proceeds to step S105 in FIG. 19, a frame of the current overhead image V3 is generated in which the shot image V1 is displayed at the same position as in the previous frame.
  • When a display position change operation is detected, the AR system 5 proceeds from step S150 to step S152 in FIG. 23 and changes the display position setting in response to the operation. For example, the setting that had been the focus plane 41 until then may be switched to the frustum far end surface 45.
  • In step S153, the AR system 5 branches the process depending on whether or not the changed position setting is outside the view frustum 40. If the changed position setting is within the view frustum 40, the AR system 5 proceeds to step S154 and determines the size and shape of the display area as a cross section of the view frustum 40 at the set position. Then, in step S156, the AR system 5 sets the size and shape of the captured image V1 so as to match the cross section at the determined display position.
  • As a result, in step S105 of FIG. 19, the size of the captured image V1 is adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position within the view frustum 40 that is different from that of the previous frame.
  • If the changed position setting is outside the view frustum 40, the AR system 5 proceeds from step S153 to step S155 in FIG. 23 and sets the display size and shape of the captured image V1 at the new position.
  • the shape of the captured image V1 to be synthesized is not limited to the cross-sectional shape of the view frustum 40, and may be, for example, a rectangle, or if it is near the view frustum 40, a parallelogram according to the angle of the view frustum 40.
  • the size of the captured image V1 can also be set relatively freely, but it is desirable to set it appropriately according to other displays on the screen.
  • As a result, in step S105 in FIG. 19, the size and shape of the captured image V1 are adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position outside the view frustum 40 that is different from that in the previous frame.
  • the display position may be changed only outside the view frustum 40. In that case, steps S153 and S154 are unnecessary, and the process may proceed from step S152 to step S155.
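  • As an illustration only, the following Python sketch mirrors the flow of steps S150 to S156 described above; the position names, the cross_section_at helper, and the outside-frustum size are hypothetical placeholders and not part of this disclosure.

      from enum import Enum, auto

      class DisplayPosition(Enum):
          FOCUS_PLANE = auto()        # inside the view frustum 40
          FAR_END_PLANE = auto()      # inside the view frustum 40
          NEAR_ORIGIN_PLANE = auto()  # inside the view frustum 40
          SCREEN_CORNER = auto()      # outside the view frustum 40
          NEAR_CAMERA = auto()        # outside the view frustum 40

      INSIDE_FRUSTUM = {DisplayPosition.FOCUS_PLANE,
                        DisplayPosition.FAR_END_PLANE,
                        DisplayPosition.NEAR_ORIGIN_PLANE}

      def update_display_position(previous_setting, change_request, frustum):
          """Per-frame handling corresponding to steps S150-S156 of Fig. 23.

          previous_setting : (DisplayPosition, size, shape) used for the previous frame
          change_request   : DisplayPosition requested via the control signal CS, or None
          frustum          : object that can report its cross section at a given plane
          """
          if change_request is None:
              # S150 -> S151: no operation, keep the previous setting
              return previous_setting

          new_position = change_request  # S152: adopt the requested position

          if new_position in INSIDE_FRUSTUM:
              # S153 -> S154 -> S156: fit the captured image V1 to the frustum cross section
              size, shape = frustum.cross_section_at(new_position)
          else:
              # S153 -> S155: free size/shape outside the frustum (placeholder values)
              size, shape = (0.2, 0.2), "rectangle"

          return (new_position, size, shape)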
  • FIG. 24 shows an example of processing in which the AR system 5 automatically changes the display position of the captured image V1.
  • In step S160, the AR system 5 performs a display position change determination.
  • the display position change determination is a process of determining whether or not to change the display position setting of the photographed video V1 in the current frame from that in the previous frame. Examples of this determination process include the following processes (P1), (P2), and (P3).
  • (P1) Determination based on the positional relationship between the view frustum 40 and an object in the overhead image V3.
  • (P2) Determination based on the angle of the view frustum 40 in the overhead image V3.
  • (P3) Determination based on the viewpoint position of the overhead image V3.
  • As an example of (P1), Fig. 25 shows a state in which the frustum far end surface 45 of a view frustum 40 rendered with finite length collides with the ground GR and is partially embedded in it.
  • Fig. 26 shows a state in which the far end side of a view frustum 40, rendered with either finite or infinite length, collides with a structure CN, so that nothing beyond the collision point can be displayed.
  • When the pyramidal shape of the view frustum 40 widens or its direction changes due to a change in the angle of view or shooting direction of the camera 2, and the positional relationship between a specific position of the view frustum 40 (such as the frustum far end surface 45 or the focus plane 41) and other displayed objects indicates that the display position of the captured image V1 used up to that point is no longer appropriate, it may be determined that the display position needs to be changed.
  • Other view frustums 40 may also be treated as objects in the overhead image V3, and if the positional relationship with those other view frustums 40 makes the display position of the captured image V1 inappropriate, it may likewise be determined that the display position needs to be changed.
  • the example of (P2) takes into consideration the visibility of the captured image V1 that is adapted to the cross-sectional shape of the view frustum 40.
  • In some cases, the cross-sectional shape may not be appropriate as a display surface.
  • the shape and direction of the view frustum 40 change according to the angle of view and the shooting direction of the camera 2.
  • Accordingly, the angle of the view frustum 40 displayed in the overhead image V3 also changes; that is, the angle between the line-of-sight direction from the viewpoint of the entire overhead image V3 and the axial direction of the view frustum 40 changes.
  • This angle is the angle between the line-of-sight direction from the viewpoint set for the overhead image V3 at a given time (that is, the normal direction of the display screen) and the axial direction of the displayed view frustum 40.
  • the axial direction of the view frustum 40 is the direction of the perpendicular line drawn from the frustum starting point 46 to the frustum far end surface 45.
  • Fig. 27 shows the captured images V1a, V1b, and V1c corresponding to the view frustum 40a, 40b, and 40c.
  • In that case, the captured image V1a displayed according to the cross-sectional shape becomes a parallelogram with a large difference between its acute and obtuse angles because of the angle of the view frustum 40a in the overhead image V3. If left as it is, the visibility of the captured image V1a will be poor. In such a case, it is advisable to change the display position as shown by the dashed arrow and display the image at the position of the captured image V1a'. In this way, it is conceivable to determine that the display position needs to be changed when the difference between the acute and obtuse angles of the captured image V1 becomes equal to or greater than a predetermined value.
  • the example (P3) is based on the same idea as (P2).
  • the viewpoint position of the overhead view video V3 can be changed according to an operation by a director, etc.
  • the viewpoint position of the overhead view video V3 may be changed from the state shown in Fig. 16 to that shown in Fig. 27 by an operation.
  • the visibility of the captured image V1a is poor, as in the case described above.
  • the shape of the rendered view frustum 40 and the captured image V1 changes due to a change in the viewpoint of the overhead image V3, which may reduce visibility.
  • In this case as well, when the difference between the acute and obtuse angles of the captured image V1 becomes equal to or greater than a predetermined value, it is determined that the display position needs to be changed.
  • changing the viewpoint of the overhead image V3 may cause the size of the captured image V1 to become smaller. If the viewpoint position when rendering the overhead image V3 is changed to a distant position, causing the size of the captured image V1 to become equal to or smaller than a predetermined size, it may be determined that the display position needs to be changed.
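  • The determinations (P1) to (P3) above could, for example, be organized along the following lines. This is only a sketch: the FrustumState fields, the thresholds, and the way each condition is simplified (ground height for (P1), axis-to-line-of-sight angle for (P2), on-screen size for (P3)) are assumptions made for illustration.

      import math
      from dataclasses import dataclass

      AXIS_ANGLE_LIMIT_DEG = 70.0   # assumed (P2)/(P3) threshold: frustum axis too oblique
      MIN_IMAGE_SIZE_PX = 80.0      # assumed (P3) threshold: captured image too small

      @dataclass
      class FrustumState:
          axis: tuple             # unit vector from the frustum origin 46 to the far end surface 45
          far_end_height: float   # height of the frustum far end surface 45 in CG space
          ground_height: float = 0.0

      def _angle_deg(u, v):
          dot = sum(a * b for a, b in zip(u, v))
          nu = math.sqrt(sum(a * a for a in u))
          nv = math.sqrt(sum(b * b for b in v))
          return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))

      def needs_position_change(frustum: FrustumState, line_of_sight, image_size_px) -> bool:
          """Simplified stand-in for the display position change determination of step S160."""
          # (P1): the far end surface 45 is buried below the ground GR
          if frustum.far_end_height < frustum.ground_height:
              return True
          # (P2): the frustum axis is strongly inclined to the line of sight, so the
          #       cross-section-shaped captured image V1 becomes a heavily skewed parallelogram
          if _angle_deg(frustum.axis, line_of_sight) > AXIS_ANGLE_LIMIT_DEG:
              return True
          # (P3): after the overhead-image viewpoint moved away, the image is too small on screen
          if min(image_size_px) < MIN_IMAGE_SIZE_PX:
              return True
          return False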
  • In step S160 of FIG. 24, the AR system 5 performs a display position change determination as described above, and in step S161 the process branches depending on whether or not a change is required.
  • If it is determined that no change is necessary, the AR system 5 proceeds to step S162, maintains the same display position setting as in the previous frame, and ends the processing of FIG. 24. As a result, when the process proceeds to step S105 in FIG. 19, a frame of the current overhead image V3 is generated in which the captured image V1 is displayed at the same position as in the previous frame.
  • If it is determined that a change is necessary, the AR system 5 proceeds from step S161 to step S163 in FIG. 24 and selects a destination to which the display position setting is to be changed.
  • The destination of this change may be determined according to the reason why the change is required. For example, in the case of (P1) above, when a collision with an object in the overhead image V3 occurs, the position can be changed to one that is not affected by the collision point, such as the surface 47 near the frustum origin or a corner of the screen. When the visibility of the captured image V1 decreases as in (P2) and (P3) above, a position whose shape allows good visibility may be selected, such as a corner of the screen outside the view frustum 40 or a position near the focus plane 41.
  • the type information of the camera 2 can also be used to set the destination of the captured image V1.
  • For example, for the captured image V1 of the mobile camera 2M, the image can be displayed within the view frustum 40 while the camera is not moving and changed to a corner of the screen while the camera is moving. This is because the movement of the view frustum 40 within the overhead image V3 becomes larger while the camera is moving, which reduces the visibility of a captured image V1 displayed within the view frustum 40.
  • In step S164, the AR system 5 branches the process depending on whether or not the selected destination is outside the view frustum 40.
  • If the destination is within the view frustum 40, the AR system 5 proceeds to step S165 and determines the size and shape of the display area as a cross section of the view frustum 40 at the set position. Then, in step S167, the AR system 5 sets the size and shape of the captured image V1 so that it matches the cross section at the determined display position.
  • As a result, in step S105 in FIG. 19, the size of the captured image V1 is adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position within the view frustum 40 different from that in the previous frame.
  • If the destination is outside the view frustum 40, the AR system 5 proceeds to step S166 in FIG. 24 and sets the display size and shape of the captured image V1 at the new set position (similar to step S155 in FIG. 23).
  • As a result, in step S105 in FIG. 19, the size and shape of the captured image V1 are adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position outside the view frustum 40 that is different from that in the previous frame.
  • Note that if the display position is changed only within the view frustum 40, steps S164 and S166 are unnecessary, and the process may proceed from step S163 to step S165. Conversely, if the display position is changed only to positions outside the view frustum 40, steps S164 and S165 are unnecessary, and the process may proceed from step S163 to step S166.
  • the view frustum 40 and the captured image V1 may be displayed together all the time, or may be displayed only temporarily.
  • the cameraman or director may perform an operation to select the view frustum 40, so that the shot image V1 corresponding to the selected view frustum 40 is displayed.
  • a cameraman or director may be able to switch between a mode in which only the view frustum 40 is displayed and a mode in which the view frustum 40 and the shot video V1 are displayed simultaneously.
  • <Example of cameraman and director screens> In the system of this embodiment, an overhead image V3-1 is displayed on the GUI device 11 for the director, and an overhead image V3-2 is displayed on a display unit such as the viewfinder of the camera 2 for the cameraman.
  • the overhead images V3-1 and V3-2 are both images showing the view frustum 40 in the CG space 30 simulating the shooting target space 8, but are images with different display modes. This makes it possible to provide information appropriate for the role of the director or cameraman.
  • FIG. 28 shows an example in which an overhead view image V3-1 is displayed as the device display image 51 on the GUI device 11.
  • This overhead image V3-1 is an image that includes a CG space 30 overlooking the target shooting space 8, for example a stadium, and displays the view frustums 40 of a plurality of cameras 2 taking pictures at the stadium. View frustums 40a, 40b, and 40c for the three cameras 2 are displayed.
  • the view frustum 40a is displayed in a different manner from the other view frustums 40b and 40c.
  • the view frustum 40a is highlighted and made to stand out more than the other view frustums 40b and 40c.
  • the shape and direction of the view frustum 40 and the display positions of the focus plane 41 and depth of field range 42 are determined by the angle of view, shooting direction, focal length, depth of field, etc. of the camera 2 at that time, and therefore these differences are not included in the difference in display mode referred to here.
  • Different display modes of the view frustum 40 do not refer to differences determined by the state of the angle of view or shooting direction of the camera 2, but to differences in the display of the view frustum 40 itself. For example, differences in color, brightness, darkness, type and thickness of the outline, differences in the display of the pyramid faces, differences between normal display and flashing display, differences in the flashing cycle, etc.
  • For example, when the view frustum 40 is normally displayed in semi-transparent white, the view frustum 40a is highlighted in semi-transparent red. This allows the view frustum 40a to be emphasized and shown to the director or other viewers.
  • The AR system 5 configured as shown in FIG. 4 determines whether or not a particular subject of interest, such as a specific player, is being photographed by performing image recognition processing on the captured image V1 of each camera 2. For example, it determines whether or not the video V1 captured by each camera 2 shows the target subject, as shown in Fig. 29. Then, the AR system 5 generates the overhead image V3-1 so that the view frustum 40 of the camera 2 capturing the target subject is highlighted.
  • the video data for the overhead images V3-1 and V3-2 in this case refers to video data in which a view frustum 40 is synthesized with a CG space 30 that corresponds to the subject space 8.
  • the overhead views V3-1 and V3-2 may be further synthesized with the shot image V1.
  • the AR system 5 performs the processes from step S101 to step S107 in FIG. 30 for each frame of the video data of the overhead images V3-1 and V3-2, for example. These processes can be considered as control processes of the CPU 71 (video processing unit 71a) in the information processing device 70 in FIG. 7 as the AR system 5.
  • In step S101, the AR system 5 sets the CG space 30. For example, it sets the viewpoint position for the CG space 30 corresponding to the shooting target space 8 and renders an image of the CG space 30 from that viewpoint position. If there is no change in the viewpoint position or image content from the previous frame, the CG space image of the previous frame can be used for the current frame as well.
  • In step S102, the AR system 5 inputs the captured image V1 and metadata MT from the camera 2. That is, the captured image V1 of the current frame, and the attitude information, focal length, angle of view, aperture value, and the like of the camera 2 at that frame timing are acquired.
  • When there are a plurality of cameras 2, the AR system 5 inputs the captured image V1 and metadata MT of each camera 2.
  • In step S201, the AR system 5 generates the view frustum 40 for the cameraman for the current frame.
  • the view frustum 40 for the cameraman is a view frustum 40 to be synthesized with the overhead image V3-2 to be transmitted to the camera 2 and displayed.
  • a view frustum 40 for the cameraman is generated separately in correspondence with each of the cameras 2.
  • the AR system 5 in the camera system 1 generates a view frustum 40 to be displayed on the camera 2 of the camera system 1 .
  • the AR system 5 sets the direction of the view frustum 40 within the CG space 30 according to the attitude of the camera 2, the pyramid shape according to the angle of view, the position of the focus plane 41 and depth of field range 42 based on the focal length and aperture value, and so on, and generates an image of the view frustum 40 according to the settings.
  • When displaying the view frustums 40 for a plurality of cameras 2, the AR system 5 generates an image of each view frustum 40 according to the metadata MT of the corresponding camera 2.
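  • As a rough illustration of how these quantities can be derived from the metadata MT, the following sketch uses the standard thin-lens depth-of-field approximation; the CameraMetadata fields and the circle-of-confusion value are assumptions for illustration and are not specified in this disclosure.

      import math
      from dataclasses import dataclass

      @dataclass
      class CameraMetadata:
          yaw_deg: float            # attitude (shooting direction)
          pitch_deg: float
          focal_length_mm: float
          sensor_width_mm: float
          f_number: float           # aperture value
          focus_distance_m: float   # distance to the focus plane 41

      def frustum_parameters(mt: CameraMetadata, coc_mm: float = 0.03):
          """Derive the quantities used to draw the view frustum 40 in the CG space 30:
          shooting direction, horizontal angle of view, focus plane 41 distance, and
          depth of field range 42 (near/far limits)."""
          yaw, pitch = math.radians(mt.yaw_deg), math.radians(mt.pitch_deg)
          direction = (math.cos(pitch) * math.cos(yaw),
                       math.cos(pitch) * math.sin(yaw),
                       math.sin(pitch))

          # horizontal angle of view from focal length and sensor width
          h_fov_deg = math.degrees(2.0 * math.atan(mt.sensor_width_mm / (2.0 * mt.focal_length_mm)))

          # thin-lens depth of field around the focus plane 41
          f = mt.focal_length_mm / 1000.0
          coc = coc_mm / 1000.0
          s = mt.focus_distance_m
          hyperfocal = f * f / (mt.f_number * coc) + f
          near = hyperfocal * s / (hyperfocal + (s - f))
          far = hyperfocal * s / (hyperfocal - (s - f)) if s < hyperfocal else math.inf

          return direction, h_fov_deg, s, (near, far)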
  • In step S202, the AR system 5 generates the director's view frustum 40 for the current frame.
  • the director's view frustum 40 is a view frustum 40 to be transmitted to the GUI device 11 and synthesized with the overhead view video V3-1 to be displayed.
  • an image of the view frustum 40 is generated based on the attitude (shooting direction), angle of view, focal length, and aperture value of each camera 2.
  • the view frustum 40 for the cameraman generated in step S201 and the view frustum 40 for the director generated in step S202 may be displayed in different ways. A specific example will be described later.
  • In step S203, the AR system 5 synthesizes the view frustum 40 generated for the cameraman into the CG space 30 that will become the overhead image V3-2, to generate one frame of video data for the overhead image V3-2.
  • the captured image V1 may also be synthesized in correspondence with each view frustum 40.
  • In step S204, the AR system 5 synthesizes the view frustum 40 generated for the director into the CG space 30 that will become the overhead image V3-1, to generate one frame of video data for the overhead image V3-1.
  • the shot image V1 may also be synthesized in correspondence with each view frustum 40.
  • In step S205, the AR system 5 outputs one frame of video data for each of the overhead images V3-1 and V3-2. The above process is repeated until the display of the view frustum 40 is finished.
  • A process of highlighting one view frustum 40, for example the view frustum 40a as shown in FIG. 28, using the process of FIG. 30 will now be described.
  • FIG. 28 is an example of the overhead image V3-1 viewed by the director.
  • In the overhead image V3-2 on the cameraman side, this highlighting is not performed; the view frustums 40a, 40b, and 40c are all displayed in the same display mode, that is, semi-transparent white.
  • FIG. 31 shows a specific example of the processes in steps S201 and S202 in FIG. 30.
  • In step S201 of FIG. 30, the AR system 5 generates a view frustum 40 for each camera 2 as step S210 of FIG. 31. That is, the view frustums 40a, 40b, and 40c for the cameraman are generated, for example, as the same semi-transparent white image.
  • Next, as the process of step S202 in FIG. 30, the AR system 5 acquires the value of the screen occupancy rate of the target subject in the captured image V1 of each camera 2.
  • The AR system 5 constantly performs image recognition processing on the captured image V1 of each camera 2, determines whether or not the set target subject is captured, and calculates the screen occupancy rate in each frame.
  • The screen occupancy rate is calculated from whether the target subject is captured and from the area that the target subject occupies in the screen.
  • The AR system 5 obtains the screen occupancy rate of the target subject in each captured image V1 at the current time, calculated in this manner.
  • In step S211, the AR system 5 determines the optimal captured image V1. For example, the captured image V1 with the highest screen occupancy rate is determined to be optimal.
  • In step S212, the AR system 5 generates the images of the view frustums 40 for the director, including highlighting of the view frustum 40 corresponding to the camera 2 of the optimal captured image V1.
  • For example, the view frustum 40a is generated as a highlighted, semi-transparent red image, while the view frustums 40b and 40c are generated as normal semi-transparent white images.
  • After performing the processes of steps S201 and S202 in FIG. 30 as shown in FIG. 31, the AR system 5 performs the processes of steps S203, S204, and S205. As a result, the overhead image V3-1 displayed on the GUI device 11 becomes as shown in FIG. 28. On the other hand, in the overhead image V3-2 displayed by each camera 2, no view frustum 40 is highlighted. A sketch of this selection is shown below.
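  • A minimal sketch of the screen-occupancy-based selection; the detection format and the camera identifiers are hypothetical, and the detections are assumed to come from the image recognition processing already described.

      def screen_occupancy(detections, frame_width, frame_height, target_id):
          """Occupancy rate of the target subject: bounding-box area / frame area.
          `detections` is assumed to be a list of (subject_id, x, y, w, h) tuples
          produced by the image recognition processing on a captured image V1."""
          frame_area = float(frame_width * frame_height)
          area = sum(w * h for sid, x, y, w, h in detections if sid == target_id)
          return area / frame_area

      def choose_highlighted_frustum(occupancy_by_camera):
          """Steps S211/S212: pick the camera whose captured image V1 has the highest
          screen occupancy rate; its view frustum 40 is highlighted (for example in
          semi-transparent red) in the director's overhead image V3-1."""
          if not occupancy_by_camera or max(occupancy_by_camera.values()) <= 0.0:
              return None   # the target subject is not visible on any camera
          return max(occupancy_by_camera, key=occupancy_by_camera.get)

      # Usage example: camera "2b" would be highlighted
      # choose_highlighted_frustum({"2a": 0.04, "2b": 0.12, "2c": 0.0})  ->  "2b"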
  • the view frustum 40 for highlighting is selected based on the screen occupancy rate of the target subject, but the selection may be based on the continuous shooting time instead of the screen occupancy rate.
  • A specific example of the process of step S202 in that case is shown in Fig. 32.
  • The process of step S201 is the same as in Fig. 31.
  • As step S202 of FIG. 30, the AR system 5 acquires the value of the continuous shooting time of the target subject for the captured image V1 of each camera 2 in step S215 of FIG. 32.
  • the AR system 5 constantly performs image recognition processing on the captured images V1 of each camera 2, and determines whether or not the set target subject is captured. In this case, the AR system 5 calculates the duration (number of continuous frames) during which the target subject is recognized for each captured image V1. Then, in step S215, the AR system 5 obtains the continuous shooting time calculated in this manner.
  • In step S211, the AR system 5 determines the optimal captured image V1.
  • In this case, the captured image V1 with the longest continuous shooting time is determined to be optimal.
  • In step S212, the AR system 5 generates the images of the view frustums 40 for the director, including highlighting of the view frustum 40 corresponding to the camera 2 of the optimal captured image V1.
  • After that, the AR system 5 performs the processes of steps S203, S204, and S205 in Fig. 30.
  • As a result, the overhead image V3-1 displayed on the GUI device 11 becomes as shown in Fig. 28. This allows the director to recognize the camera 2 that has been capturing the subject of interest continuously for a long time. A simple way to track such durations is sketched below.
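  • A sketch of per-camera tracking of the continuous shooting time; the class and method names are illustrative only.

      from collections import defaultdict

      class ContinuousCaptureCounter:
          """Counts, per camera, how many consecutive frames the target subject has been
          recognized in the captured image V1 (the continuous shooting time used in
          step S215 of Fig. 32)."""

          def __init__(self):
              self.consecutive_frames = defaultdict(int)

          def update(self, camera_id: str, subject_detected: bool) -> int:
              # call once per frame with the image recognition result for that camera
              if subject_detected:
                  self.consecutive_frames[camera_id] += 1
              else:
                  self.consecutive_frames[camera_id] = 0
              return self.consecutive_frames[camera_id]

          def longest(self):
              """Camera with the longest continuous shooting time (step S211)."""
              if not self.consecutive_frames:
                  return None
              return max(self.consecutive_frames, key=self.consecutive_frames.get)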
  • FIG. 33A shows an overhead image V3-1 as the device display image 51 of the GUI device 11.
  • the view frustums 40a, 40b, and 40c are each displayed in the same display mode, for example, semi-transparent white.
  • When a specific operation is performed by the cameraman of the camera 2 corresponding to the view frustum 40a, the overhead image V3-1 becomes as shown in Fig. 33B. That is, the view frustum 40a is highlighted in a manner different from the view frustums 40b and 40c, so that this is clearly indicated to the director.
  • a specific operation by the cameraman is an operation in which the cameraman notifies the director that "good footage is now being taken.” If such an operation is made possible on the camera 2 side, when the operation is performed, the AR system 5 makes the display mode of the view frustum 40 of the camera 2 on which the operation was performed in the overhead image V3-1 different from the others.
  • Fig. 34 shows a concrete example of steps S201 and S202 in Fig. 30.
  • In step S201 of FIG. 30, the AR system 5 generates the images of the view frustums 40 for the cameraman as step S210 of Fig. 34.
  • For example, the view frustums 40a, 40b, and 40c are all generated as the same semi-transparent white image.
  • In step S202 in FIG. 30, the AR system 5 first checks, in step S220 in FIG. 34, whether or not there has been feedback from each camera 2, that is, whether or not a specific operation has been performed by a cameraman, and branches the process in step S221. If no specific operation has been performed, the AR system 5 proceeds from step S221 to step S223 and generates the images of the director's view frustums 40. For example, the view frustums 40a, 40b, and 40c are all generated as the same semi-transparent white image.
  • If a specific operation has been performed, the AR system 5 proceeds to step S222 and generates the images of the view frustums 40 for the director, including highlighting.
  • For example, the view frustum 40a is generated as a semi-transparent red image, and the view frustums 40b and 40c are generated as semi-transparent white images.
  • After that, the AR system 5 performs the processes of steps S203, S204, and S205 in Fig. 30.
  • As a result, the overhead image V3-1 displayed on the GUI device 11 becomes as shown in Fig. 33A or Fig. 33B.
  • That is, when no specific operation has been performed, the image becomes as shown in Fig. 33A, and the view frustums 40a, 40b, and 40c are displayed in the same display mode; when a specific operation has been performed, the image becomes as shown in Fig. 33B.
  • FIG. 35A shows an overhead image V3-1 as the device display image 51 of the GUI device 11.
  • In FIG. 35A, the view frustums 40a, 40b, and 40c are displayed in the same display mode.
  • In some situations, however, the view frustums 40a and 40b overlap on the image as shown in FIG. 35B.
  • In that case, the overlapping view frustums 40a and 40b are highlighted in a manner different from usual so that the director can easily recognize the overlap.
  • Fig. 36 shows a concrete example of steps S201 and S202 in Fig. 30.
  • In step S201 of Fig. 30, the AR system 5 generates the images of the view frustums 40 for the cameraman as step S210 of Fig. 36.
  • For example, the view frustums 40a, 40b, and 40c are all generated as the same semi-transparent white image.
  • In step S202 of FIG. 30, the AR system 5 first sets the size, shape, and orientation of the view frustum 40 of each camera 2 based on the metadata MT of each camera 2 in step S230 of FIG. 36.
  • In step S231, the AR system 5 checks the arrangement of each view frustum 40 within the three-dimensional coordinates of the CG space 30 for the current frame. This makes it possible to check whether the view frustums 40 overlap.
  • In step S232, the AR system 5 branches the process depending on whether or not there is an overlap. If there is no overlap of view frustums 40, the AR system 5 proceeds to step S234 and generates the images of the view frustums 40 for the director. For example, the view frustums 40a, 40b, and 40c are all generated as the same semi-transparent white image.
  • If there is an overlap, the AR system 5 proceeds to step S233 and generates the images of the view frustums 40 for the director, including highlighting.
  • For example, the overlapping view frustums 40 (here, the view frustums 40a and 40b) are generated as semi-transparent red images, and the non-overlapping view frustum 40c is generated as a semi-transparent white image.
  • the AR system 5 performs the processes of steps S203, S204, and S205 in Fig. 30.
  • the overhead image V3-1 displayed on the GUI device 11 becomes as shown in Fig. 35A or Fig. 35B.
  • That is, if there is no overlap of view frustums 40, the image becomes as shown in Fig. 35A, and if there is an overlap, the image becomes as shown in Fig. 35B.
  • This allows the director, etc. to easily recognize the situation in which the same subject is being shot from different viewpoints by multiple cameras 2. This makes it possible to clarify instructions to each cameraman. It is also convenient for switching main line images when it is desired to switch images of the same subject.
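  • An overlap between view frustums could be checked, for example, with the simple geometric test sketched below. Representing each frustum by six inward-facing planes and eight corners is an implementation assumption, and the corner-containment test is only approximate (an exact test would also need edge-face intersection, e.g. a separating-axis test).

      from dataclasses import dataclass
      from typing import List, Tuple

      Vec3 = Tuple[float, float, float]

      @dataclass
      class Frustum:
          # six planes (inward normal n, offset d): a point p is inside when dot(n, p) + d >= 0
          planes: List[Tuple[Vec3, float]]
          corners: List[Vec3]   # the eight corner points of the frustum

      def _inside(frustum: Frustum, p: Vec3) -> bool:
          return all(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d >= 0.0
                     for n, d in frustum.planes)

      def frustums_overlap(a: Frustum, b: Frustum) -> bool:
          """Approximate version of the overlap check in step S231: true if any corner
          of one frustum lies inside the other."""
          return any(_inside(a, p) for p in b.corners) or \
                 any(_inside(b, p) for p in a.corners)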
  • Note that in the overhead image V3-2 on the cameraman side, the view frustums 40a, 40b, and 40c are displayed in the same display mode.
  • Next, an example will be described in which, when view frustums 40 overlap, one view frustum 40 is displayed preferentially.
  • FIG. 37 shows an overhead image V3-1 as the device display image 51 of the GUI device 11.
  • In this example, the view frustums 40a, 40b, 40c, and 40d overlap, but the view frustum 40a is set as the priority, and the focus plane 41 and depth of field range 42 of the view frustum 40a are displayed in the overlapping portion.
  • FIG. 38 shows a concrete example of steps S201 and S202 in Fig. 30.
  • In step S201 of FIG. 30, the AR system 5 generates the images of the view frustums 40 for the cameraman as step S210 of Fig. 38.
  • For example, images are generated as the view frustums 40a, 40b, 40c, and 40d. No particular priority setting is applied to the images of the view frustums 40 for the cameraman.
  • In step S202 of FIG. 30, the AR system 5 first sets the size, shape, and orientation of the view frustum 40 of each camera 2 based on the metadata MT of each camera 2 in step S240 of FIG. 38.
  • In step S241, the AR system 5 checks the arrangement of each view frustum 40 within the three-dimensional coordinates of the CG space 30 for the current frame. This makes it possible to check whether the view frustums 40 overlap.
  • In step S242, the AR system 5 branches the process depending on whether or not there is an overlap. If there is no overlap of view frustums 40, the AR system 5 proceeds to step S244 and generates the images of the view frustums 40 for the director. For example, images are generated as the view frustums 40a, 40b, 40c, and 40d.
  • If there is an overlap, the AR system 5 proceeds to step S245 and determines the view frustum 40 that has priority among the overlapping view frustums 40.
  • the AR system 5 may determine a view frustum 40 that has priority among all view frustum 40, including those that do not overlap.
  • the view frustum 40 selected as the one to be highlighted by photographing a subject of interest or by a specific operation of the cameraman may be set as a priority.
  • In step S246, the AR system 5 generates the images of the view frustums 40 for the director.
  • For the prioritized view frustum 40, an image is generated in which the focus plane 41 and depth of field range 42 are displayed as usual.
  • For the other view frustums 40, images are generated in which the focus plane 41 and depth of field range 42 are not displayed in the areas that overlap with the prioritized view frustum 40.
  • Alternatively, for the non-prioritized view frustums 40, images may be generated in which the focus plane 41 and depth of field range 42 are not displayed at all.
  • the AR system 5 performs the processes of steps S203, S204, and S205 in Fig. 30.
  • the overhead image V3-1 displayed on the GUI device 11 becomes an image in which the focus plane 41 and the depth of field range 42 can be clearly recognized for the view frustum 40 that has been prioritized, even if the view frustum 40 overlaps, as shown in Fig. 37.
  • the view frustums 40a, 40b, 40c, and 40d are displayed as shown in FIG.
  • In the above example, the priority is set in the director's overhead image V3-1, but a priority may also be set in the cameraman's overhead image V3-2.
  • For the cameraman, it is desirable that the view frustum 40 of the camera 2 he is operating be set as the priority. Therefore, in step S201 in Fig. 30, where the view frustums for the cameraman are generated, the same processing as steps S240 to S246 in Fig. 38 may be performed.
  • In that case, the view frustum given priority in step S245 is the view frustum 40 of the cameraman's own camera 2. This allows the cameraman to clearly view the focus plane 41 and depth of field range 42 of the camera 2 he is operating even if its view frustum 40 overlaps with the view frustum 40 of another camera 2.
  • When a priority is set in the overhead image V3-2 in this way, a priority may also be set in the overhead image V3-1 viewed by the director as described above, or no priority may be set there. Even if priorities are set for both the overhead images V3-1 and V3-2, the conditions for determining the prioritized view frustum 40 differ, so the overhead image V3-1 and the overhead images V3-2 displayed by the respective cameras 2 will not all be displayed in the same manner. A sketch of the per-frustum display decision is shown below.
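  • The decision of which frustums keep their focus plane 41 and depth of field range 42 could be expressed as in the following sketch; the identifiers are hypothetical, and the variation implemented here is the one in which a non-prioritized frustum that overlaps the prioritized one suppresses those planes entirely.

      def plane_display_flags(frustum_ids, overlapping_pairs, priority_id):
          """Decide, per view frustum 40, whether its focus plane 41 and depth of field
          range 42 should be drawn (behaviour around steps S245/S246): the prioritized
          frustum keeps them; a frustum overlapping the prioritized one suppresses them."""
          overlaps_priority = set()
          for a, b in overlapping_pairs:
              if a == priority_id:
                  overlaps_priority.add(b)
              elif b == priority_id:
                  overlaps_priority.add(a)
          return {fid: (fid == priority_id or fid not in overlaps_priority)
                  for fid in frustum_ids}

      # Usage example: 40a prioritized, 40b and 40d overlap it, 40c does not
      # plane_display_flags(["40a", "40b", "40c", "40d"],
      #                     [("40a", "40b"), ("40a", "40d")], "40a")
      # -> {"40a": True, "40b": False, "40c": True, "40d": False}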
  • FIG. 40A shows an overhead view image V3-1 as the device display image 51 of the GUI device 11.
  • view frustums 40a, 40b, and 40c are displayed.
  • Fig. 40A also shows an overhead image V3-2 as the viewfinder display image 50 of camera 2.
  • the overhead image V3-2 is synthesized in a corner of the screen of the shot image V1.
  • Fig. 40B shows an enlarged view of the overhead image V3-2.
  • FIG. 39A shows an example of a case where the director has performed an instruction operation on the camera 2 of the view frustum 40b.
  • the director may perform an operation such as dragging view frustum 40b on the GUI device 11 to cause instruction frustum 40DR to be displayed.
  • This is an instruction from the director to the cameraman of camera 2 of view frustum 40b to change the shooting direction to the direction of instruction frustum 40DR.
  • the AR system 5 displays the instruction frustum 40DR for the view frustum 40b also in the overhead image V3-2 viewed by the cameraman, as shown in Figures 40A and 40B.
  • the cameraman operating the camera 2 of the view frustum 40b can comply with the director's instructions by changing the shooting direction so that the view frustum 40b coincides with the instruction frustum 40DR.
  • the instruction frustum 40DR may be configured to be able to specify not only the shooting direction but also the angle of view and the focus plane 41.
  • the director may be able to move the focus plane 41 forward or backward, widen the angle of view (change the inclination of the pyramid), etc., by operating the instruction frustum 40DR.
  • In response, the cameraman can adjust the focus so that the focus plane 41 of the view frustum 40b coincides with that of the instruction frustum 40DR, and can adjust the angle of view so that the inclinations of the pyramids coincide with each other.
  • overhead image V3-1 in FIG. 39A and overhead image V3-2 in FIG. 40A and FIG. 40B show examples in which the viewpoint position relative to CG space 30 is different.
  • The director or cameraman can change the viewpoint position of the overhead images V3-1 and V3-2 by performing an operation.
  • the example in the figure shows that the CG space 30 is not necessarily displayed as seen from the same viewpoint position in overhead image V3-1 and overhead image V3-2.
  • FIG. 39B shows the state in which the director has also given instructions to the view frustum 40a, causing the instruction frustum 40DR to be displayed. In this way, in the overhead image V3-1, instructions can be given to each view frustum 40.
  • In this case, the instruction frustum 40DR of the previous instruction (the instruction for the view frustum 40b) may remain displayed as it is, so that the director can confirm the currently valid instructions. It is conceivable that the instruction frustum 40DR is erased from the overhead images V3-1 and V3-2 when the view frustum 40 of the designated camera 2 substantially coincides with the instruction frustum 40DR. Alternatively, the instruction frustum 40DR may be erased from the overhead images V3-1 and V3-2 by a cancellation operation by the director, for example, to accommodate cancellation or change of instructions.
  • In the overhead image V3-2 viewed by each cameraman, the instruction frustums 40DR for all the cameras 2 may be displayed, or only the instruction frustum 40DR for that cameraman's own camera 2 may be displayed.
  • In the former case, each cameraman can grasp the overall instructions being issued. In the latter case, the cameraman can easily recognize the instructions directed to him from the director.
  • FIG. 41 shows specific examples of steps S201, S202, S203, and S204 in FIG. 30.
  • In step S201 in FIG. 30, the AR system 5 performs the processes of steps S250 to S254 in FIG. 41.
  • In step S250, the AR system 5 generates the images of the view frustums 40 for the cameraman.
  • For example, images are generated as the view frustums 40a, 40b, and 40c.
  • In step S251, the AR system 5 checks whether or not an instruction operation has been performed by the director. If there is no instruction operation, the process proceeds to step S202 in FIG. 30. If an instruction operation has been performed, the AR system 5 proceeds from step S251 to step S252 in FIG. 41 and branches the process depending on the display mode of the instruction frustum 40DR.
  • The display mode in this case can be selected by the cameraman from a mode in which only the instruction frustum 40DR directed to the cameraman himself is displayed and a mode in which all instruction frustums 40DR are displayed.
  • If the mode is one in which only the instruction frustum 40DR directed to the cameraman is displayed, the AR system 5 proceeds to step S253 and generates an image of that instruction frustum 40DR. However, if the instruction from the director is not directed to the camera 2 that is the subject of the overhead image V3-2 generation process, it is not necessary to generate an image of the instruction frustum 40DR in step S253.
  • In this case, the video data transmitted to each camera 2 as the overhead image V3-2 will have different display contents.
  • If the mode is one in which all instruction frustums 40DR are displayed, the AR system 5 proceeds to step S254 and generates images of the instruction frustums 40DR that are valid at that time.
  • the AR system 5 performs the processing of step S202 in FIG. 30 as shown in steps S260 to S262 in FIG. 41.
  • In step S260, the AR system 5 generates the images of the view frustums 40 for the director.
  • For example, images are generated as the view frustums 40a, 40b, and 40c.
  • In step S261, the AR system 5 checks whether or not an instruction operation has been performed by the director. If there is no instruction operation, the process proceeds to step S203 in FIG. 30. If an instruction operation has been performed, the AR system 5 proceeds from step S261 to step S262 in FIG. 41 and generates images of the instruction frustums 40DR that are valid at that point.
  • In step S203 in FIG. 30, the AR system 5 performs the processes of steps S255 and S256 in FIG. 41.
  • In step S255, the AR system 5 synthesizes the view frustum 40 and the instruction frustum 40DR into the overhead image V3-2, thereby generating image data of the overhead image V3-2 as shown in FIG. 40B.
  • In step S256, the AR system 5 synthesizes the overhead image V3-2 and the captured image V1 to generate image data of a composite image as shown in FIG. 40A.
  • the overhead view image V3-2 and the photographed image V1 may be combined on the camera 2 side.
  • In step S204 in FIG. 30, the AR system 5 performs the process of step S265 in FIG. 41.
  • In step S265, the AR system 5 synthesizes the view frustum 40 and the instruction frustum 40DR into the overhead image V3-1, thereby generating image data of the overhead image V3-1 as shown in Figures 39A and 39B.
  • Then, in step S205 of FIG. 30, the overhead image V3-1 is transmitted to the GUI device 11, and the overhead image V3-2 corresponding to each camera 2 is transmitted to that camera 2.
  • This allows the director to check his/her own instructions on the instruction frustum 40DR in the overhead view V3-1, and each cameraman can visually check the instructions from the director through the instruction frustum 40DR.
  • The instruction frustum 40DR visible to the cameraman is displayed in the overhead image V3-2, and it is advisable to control the viewpoint position of the overhead image V3-2 so that the instructions are easier for the cameraman to understand.
  • Figures 42A and 42B show overhead images V3-2 as the viewfinder display image 50 of camera 2. These are overhead images V3-2 with the position of camera 2 on the view frustum 40c as the viewpoint position, and are images viewed by the cameraman of camera 2.
  • In FIG. 42A, an instruction frustum 40DR for the view frustum 40c is displayed, and an instruction frustum 40DR for the view frustum 40a of another camera 2 is also displayed.
  • In FIG. 42B, an instruction frustum 40DR for the view frustum 40c is displayed, but an instruction frustum 40DR for the view frustum 40a of another camera 2 is not displayed.
  • the AR system 5 performs steps S201 and S202 in FIG. 30 as shown in FIG. 41. Then, it performs step S203 in FIG. 30 as shown in FIG. 43.
  • In step S280, the AR system 5 branches the process depending on whether or not the instruction frustum 40DR is to be displayed in the current frame. If the instruction frustum 40DR is not to be displayed in the overhead image V3-2 for the camera 2 being processed, the AR system 5 proceeds to step S281 and generates video data in which the image of the view frustum 40 is synthesized with the overhead image V3-2.
  • If the instruction frustum 40DR is to be displayed, the AR system 5 proceeds to step S282 and sets the arrangement of the view frustum 40 and the instruction frustum 40DR within the 3D space coordinates for generating the overhead image V3-2. Then, in step S283, the AR system 5 sets the viewpoint position within the 3D space coordinates. That is, the coordinates of the position of the specific camera 2 to which this overhead image V3-2 is to be transmitted, among the multiple cameras, are set as the viewpoint position.
  • In step S284, the AR system 5 generates video data for the overhead image V3-2 as CG in which the view frustum 40 and the instruction frustum 40DR are combined, rendered from the set viewpoint position. One way to place the rendering viewpoint at the camera position is sketched below.
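  • Placing the rendering viewpoint at the selected camera's position (step S283) can be done with the standard look-at construction sketched below; the z-up, right-handed conventions are assumptions and the function is not taken from this disclosure.

      import numpy as np

      def look_at(eye, target, up=(0.0, 0.0, 1.0)):
          """Build a view matrix whose eye point is the position of the selected camera 2,
          so that the overhead image V3-2 is rendered from that camera's viewpoint."""
          eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
          forward = target - eye
          forward /= np.linalg.norm(forward)
          right = np.cross(forward, up)
          right /= np.linalg.norm(right)
          true_up = np.cross(right, forward)

          view = np.eye(4)
          view[0, :3], view[1, :3], view[2, :3] = right, true_up, -forward
          view[:3, 3] = -view[:3, :3] @ eye
          return view

      # e.g. render the CG space 30 (with the view frustums 40 and the instruction
      # frustum 40DR) using look_at(camera_position, point_of_interest) as the view transform.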
  • The viewfinder display image 50 can be switched by the cameraman between the overhead image V3-2 as shown in FIG. 42A and the captured image V1 as shown in FIG. 44.
  • Since the cameraman needs to constantly check the captured image V1 (that is, the live view) of the camera 2 he is operating while shooting, the captured image V1 must be displayable in the viewfinder. It is conceivable to composite the overhead image V3-2 onto the captured image V1 and display them together as previously shown in FIG. 40A, but the overhead image V3-2 may then be small and the instruction frustum 40DR difficult to see. Therefore, it is advisable to allow switching between the overhead image V3-2 as shown in FIG. 42A and the captured image V1 as shown in FIG. 44 at any time, displaying each in full screen.
  • In FIG. 44, an instruction direction 54 and a coincidence rate 53 are displayed as instruction information on the captured image V1.
  • The instruction direction 54 indicates the shooting direction designated by the instruction frustum 40DR.
  • The coincidence rate 53 indicates the degree of coincidence between the current view frustum 40 and the instruction frustum 40DR. When the coincidence rate reaches 100%, the current view frustum 40 matches the instruction frustum 40DR. One way such a rate could be computed is sketched below.
  • the cameraman can confirm that the director has given instructions even when he is normally viewing the shot image V1, and can follow the instructions by relying on the instruction direction 54 and the coincidence rate 53. If necessary, the screen can also be switched to the overhead image V3-2 to check the instruction frustum 40DR.
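  • As an illustration, the coincidence rate 53 could be computed from the differences in shooting direction, angle of view, and focus distance, as in the sketch below; the tolerances and the equal weighting are assumptions, not values given in this disclosure.

      import math

      def coincidence_rate(current, instructed,
                           direction_tolerance_deg=30.0,
                           fov_tolerance_deg=10.0,
                           focus_tolerance_m=5.0):
          """Rough 0-100% score of how well the current view frustum 40 matches the
          instruction frustum 40DR. Each argument is a dict with a unit 'direction'
          vector, 'fov_deg', and 'focus_m'."""
          def clamp01(x):
              return max(0.0, min(1.0, x))

          dot = sum(a * b for a, b in zip(current["direction"], instructed["direction"]))
          angle_err = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
          direction_score = clamp01(1.0 - angle_err / direction_tolerance_deg)
          fov_score = clamp01(1.0 - abs(current["fov_deg"] - instructed["fov_deg"]) / fov_tolerance_deg)
          focus_score = clamp01(1.0 - abs(current["focus_m"] - instructed["focus_m"]) / focus_tolerance_m)

          return 100.0 * (direction_score + fov_score + focus_score) / 3.0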
  • An example of this processing is shown in FIG. 45.
  • The AR system 5 performs the processes of steps S270 to S273 in FIG. 45 in step S201 in FIG. 30. Furthermore, the AR system 5 performs the processes of steps S275 to S278 in FIG. 45 in step S203 in FIG. 30.
  • In step S270, the AR system 5 checks whether the display of the view frustum 40 is OFF in the current frame. In other words, it checks whether the current frame is displaying the captured image V1 instead of the overhead image V3-2.
  • If the display of the view frustum 40 is OFF, the AR system 5 ends the processing of step S201. In other words, there is no need to generate images of the view frustum 40 and the instruction frustum 40DR.
  • If the overhead image V3-2 is selected as the viewfinder display image 50, the AR system 5 generates image data for the view frustum 40 based on the metadata MT in step S271.
  • In step S272, the AR system 5 determines whether or not to display the instruction frustum 40DR.
  • The instruction frustum 40DR is displayed when an instruction has been given by the director.
  • The selection between the mode in which all instruction frustums 40DR are displayed and the mode in which only the instruction frustum 40DR for the cameraman's own camera 2 is displayed is also confirmed here.
  • If the instruction frustum 40DR is not to be displayed, the process of step S201 ends. If the instruction frustum 40DR is to be displayed in the overhead image V3-2, the AR system 5 proceeds to step S273 and generates image data of the instruction frustum 40DR.
  • In step S203 in FIG. 30, the AR system 5 checks in step S275 in FIG. 45 whether the display of the view frustum 40 is OFF. This is to check whether the captured image V1 is currently being displayed.
  • If the camera 2 being processed is currently displaying the overhead image V3-2, the AR system 5 proceeds to step S278, where it synthesizes the image data of the view frustum 40 with the image data of the overhead image V3-2, and, if image data of the instruction frustum 40DR has been generated, also synthesizes the instruction frustum 40DR into that image data.
  • If the camera 2 being processed is currently displaying the captured image V1, the process branches in step S276 depending on whether or not there is an instruction from the director. If there is no instruction, the process of step S203 ends. If there is an instruction from the director, the captured image V1 is set in step S277 so that the instruction direction 54 and the coincidence rate 53 are displayed.
  • Then, in step S205 of FIG. 30, video data is output to the camera 2. That is, video data of the captured image V1 as shown in FIG. 44 or video data of the overhead image V3-2 as shown in FIG. 42A is output to the camera 2.
  • the viewfinder display image 50 may be switched between the shot image V1, the overhead image V3-2, and a composite image as shown in FIG. 40A by operation of the cameraman.
  • Fig. 46A shows a state in which a photographed image V1 and an overhead image V3-2 are displayed as a viewfinder display image 50 of camera 2.
  • the overhead image V3-2 is composited into a corner of the screen of the photographed image V1.
  • Fig. 46B shows an enlarged view of the overhead image V3-2.
  • In this example, only the view frustum 40 of that camera 2 itself is displayed in the overhead image V3-2, whereas in the overhead image V3-1 displayed on the GUI device 11 on the director's side, the view frustums 40 of all the cameras 2 are displayed as described with reference to FIG. 28 and the like.
  • The marker frustums 40M1 and 40M2 are displayed in response to the cameraman registering subject positions and directions to be photographed, that is, marking directions in which the cameraman frequently wishes to shoot.
  • The marker frustums 40M1 and 40M2 may be displayed in a manner different from the view frustum 40.
  • The marker frustum 40M1 and the marker frustum 40M2 may also be displayed in manners different from each other. For example, when the view frustum 40 is semi-transparent white, the marker frustum 40M1 may be semi-transparent yellow and the marker frustum 40M2 semi-transparent light blue.
  • the positions of the marker frustums 40M1 and 40M2 may be indicated by markers 55M1 and 55M2 on the captured image V1.
  • the correspondence may be clearly indicated by making the marker 55M1 yellow like the marker frustum 40M1 and making the marker 55M2 light blue like the marker frustum 40M2.
  • FIG. 48 shows a specific example of steps S201, S202, S203, and S204 in FIG. 30.
  • In step S201 of FIG. 30, the AR system 5 performs the processes of steps S300 to S303 in FIG. 48.
  • In step S300, the AR system 5 generates image data of the view frustum 40 based on the metadata MT.
  • For example, the view frustum 40 corresponding to the camera 2 being processed is generated; alternatively, the view frustums 40 corresponding to all of the cameras 2 may be generated.
  • In step S301, the AR system 5 determines whether or not a marking operation has been performed on the camera 2 being processed.
  • A marking operation is an operation for adding or deleting a marking. If no marking operation has been performed, the process of step S201 ends.
  • If a marking operation has been performed, the AR system 5 adds a marking point to the registered markings or deletes a marking from the registered markings for the camera 2 being processed in step S302. Then, in step S303, the AR system 5 generates image data of the marker frustums 40M as necessary. That is, if there are markings registered at that time, image data of the marker frustums 40M is generated.
  • In step S202 of FIG. 30, the AR system 5 generates the view frustums 40 for the director in step S310 of FIG. 48.
  • In this case, image data of the view frustums 40 corresponding to all cameras 2 is generated.
  • In step S203 in FIG. 30, the AR system 5 performs the processes of steps S320 and S321 in FIG. 48.
  • In step S320, the AR system 5 synthesizes the view frustum 40 with the CG data as the overhead image V3-2. If there are registered markings, the AR system 5 also synthesizes the image data of the marker frustums 40M.
  • In step S321, the AR system 5 combines the markers 55M with the captured image V1 in accordance with the marking registration. In this way, the video data of the overhead image V3-2 and the captured image V1 to be transmitted to the camera 2 is generated.
  • In step S204 in FIG. 30, the AR system 5 performs the process of step S330 in FIG. 48.
  • In step S330, the AR system 5 synthesizes the view frustums 40 with the CG data as the overhead image V3-1. As a result, video data for the overhead image V3-1 is generated.
  • Then, in step S205 of FIG. 30, the video data of the overhead image V3-2 and the captured image V1 is transmitted to the camera 2, and the video data of the overhead image V3-1 is transmitted to the GUI device 11.
  • This allows the cameraman to visually recognize the marker frustum 40M and the marker 55M in accordance with the marking registration operation. From the director's perspective, by not displaying the marker frustum 40M and marker 55M, the overhead image V3-1 does not become unnecessarily cluttered.
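  • The marking registration handled in steps S301 to S303 could be held in a per-camera store such as the following sketch; the structure and field names are illustrative only.

      from dataclasses import dataclass, field
      from typing import Dict, List, Tuple

      Vec3 = Tuple[float, float, float]

      @dataclass
      class Marking:
          label: str        # e.g. "40M1", "40M2"
          direction: Vec3   # registered shooting direction in CG-space coordinates
          color: str        # display mode, e.g. "yellow" or "light blue"

      @dataclass
      class MarkingRegistry:
          """Per-camera store of markings used to draw the marker frustums 40M in the
          overhead image V3-2 and the markers 55M on the captured image V1."""
          markings: Dict[str, List[Marking]] = field(default_factory=dict)

          def add(self, camera_id: str, marking: Marking) -> None:
              self.markings.setdefault(camera_id, []).append(marking)

          def delete(self, camera_id: str, label: str) -> None:
              self.markings[camera_id] = [m for m in self.markings.get(camera_id, [])
                                          if m.label != label]

          def for_camera(self, camera_id: str) -> List[Marking]:
              # only the camera's own markings appear in its overhead image V3-2;
              # the director's overhead image V3-1 simply does not query this registry
              return self.markings.get(camera_id, [])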
  • FIG. 49A shows an example in which an overhead image V3-1 is displayed as the device display image 51 of the GUI device 11
  • FIG. 49B shows an example in which an overhead image V3-2 is simultaneously displayed as the viewfinder display image 50 of the camera 2.
  • In the overhead image V3-1 of FIG. 49A, the view frustums 40a, 40b, and 40c of the cameras 2 are displayed in the same manner, for example, in semi-transparent white.
  • On the other hand, in the overhead image V3-2 of FIG. 49B displayed by the camera 2 corresponding to the view frustum 40b, that view frustum 40b is highlighted in, for example, semi-transparent red, while the view frustums 40a and 40c of the other cameras 2 are each displayed in the usual semi-transparent white.
  • Similarly, in the overhead image V3-2 displayed by the camera 2 corresponding to the view frustum 40a, that view frustum 40a is highlighted in, for example, semi-transparent red, and the view frustums 40b and 40c of the other cameras 2 are each displayed in the normal semi-transparent white.
  • Likewise, in the overhead image V3-2 displayed by the camera 2 corresponding to the view frustum 40c, that view frustum 40c is highlighted in, for example, semi-transparent red, and the view frustums 40a and 40b of the other cameras 2 are each displayed in the normal semi-transparent white.
  • the director can check the view frustum 40 of each camera 2 evenly, and the cameraman can easily check the view frustum 40 of the camera 2 he is operating.
  • FIG. 50A shows an example in which an overhead image V3-1 is displayed as the device display image 51 of the GUI device 11
  • FIG. 50B shows an example in which an overhead image V3-2 is simultaneously displayed as the viewfinder display image 50 of the camera 2.
  • In the overhead image V3-1 of FIG. 50A, the view frustums 40a, 40b, and 40c of the cameras 2 are displayed in the same manner, for example, in semi-transparent white.
  • In the overhead image V3-2 of FIG. 50B, the viewpoint position is set to the position of the camera 2 corresponding to the view frustum 40b.
  • In the overhead image V3-2 displayed by the camera 2 corresponding to the view frustum 40a, that view frustum 40a is highlighted in, for example, semi-transparent red, the view frustums 40b and 40c of the other cameras 2 are each displayed in the normal semi-transparent white, and the viewpoint position is set to the position of the camera 2 of the view frustum 40a.
  • the overhead view image V3-2 of the camera 2 corresponding to the view frustum 40c also has its own view frustum 40 highlighted, and the viewpoint position is the position of the camera 2 of the view frustum 40c.
  • the director can check the view frustum 40 of each camera 2 evenly, and the cameraman can check the view frustum 40 of the camera 2 he is operating from a viewpoint similar to his own.
  • FIG. 51 shows an example in which an overhead image V3-1 is displayed as the device display image 51 of the GUI device 11.
  • an example is shown in which two overhead images are synthesized and displayed as overhead images V3-1a and V3-1b.
  • For example, the overhead image V3-1a is an image from a viewpoint diagonally above the stadium, and the overhead image V3-1b is an image from a viewpoint directly above.
  • The director needs to grasp the overall situation of the cameras, so it is desirable to display multiple overhead images V3-1 from different viewpoints in this way.
  • FIG. 52 shows a specific example of steps S201, S202, S203, and S204 in FIG. 30.
  • In step S410, the AR system 5 generates image data of the view frustums 40 for the cameraman based on the metadata MT.
  • In this case, the image data is generated in a state in which the view frustum 40 corresponding to the camera 2 being processed is highlighted.
  • In step S202 of FIG. 30, the AR system 5 generates the view frustums 40 for the director in step S420 of FIG. 52.
  • In this case, image data with the same display mode is generated for the view frustums 40 corresponding to all cameras 2.
  • In step S203 in FIG. 30, the AR system 5 performs the processes of steps S430 and S431 in FIG. 52.
  • In step S430, the AR system 5 sets the arrangement of the image data of the view frustums 40 within the 3D coordinate space for the overhead image V3-2.
  • In step S431, the AR system 5 generates video data for the overhead image V3-2 with the position of the target camera 2 in the 3D coordinate space set as the viewpoint position. In this manner, the video data of the overhead image V3-2 to be transmitted to the camera 2 is generated.
  • In step S204 in FIG. 30, the AR system 5 performs the processes of steps S440, S441, and S442 in FIG. 52.
  • In step S440, the AR system 5 synthesizes the view frustums 40 with the CG data as the overhead image V3-1a.
  • In step S441, the AR system 5 synthesizes the view frustums 40 with the CG data as the overhead image V3-1b.
  • In step S442, the AR system 5 generates video data that combines the overhead image V3-1a and the overhead image V3-1b on one screen. This produces the video data of the overhead image V3-1 to be sent to the GUI device 11. A sketch of such a combination is shown below.
  • Then, in step S205 of FIG. 30, the video data of the overhead image V3-2 is transmitted to the camera 2, and the video data of the overhead image V3-1 is transmitted to the GUI device 11.
  • This allows the cameraman to view, for example, the overhead image V3-2 as shown in FIG. 50B, and the director to view, for example, the overhead images V3-1a and V3-1b as shown in FIG. 51.
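  • The combination of the two overhead images on one screen (step S442) could be done, for example, as in the sketch below; the side-by-side layout, output resolution, and nearest-neighbour resize are assumptions made only to keep the example self-contained.

      import numpy as np

      def combine_overhead_views(v3_1a: np.ndarray, v3_1b: np.ndarray,
                                 out_height: int = 1080, out_width: int = 1920) -> np.ndarray:
          """Place the rendered overhead images V3-1a and V3-1b side by side on one frame.
          Both inputs are H x W x 3 uint8 arrays."""
          def fit(img, h, w):
              ys = np.arange(h) * img.shape[0] // h   # nearest-neighbour row indices
              xs = np.arange(w) * img.shape[1] // w   # nearest-neighbour column indices
              return img[ys][:, xs]

          half_w = out_width // 2
          canvas = np.zeros((out_height, out_width, 3), dtype=np.uint8)
          canvas[:, :half_w] = fit(v3_1a, out_height, half_w)
          canvas[:, half_w:] = fit(v3_1b, out_height, out_width - half_w)
          return canvas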
  • the captured image V1 may be displayed together with the view frustum 40 as described in Fig. 9 to Fig. 27.
  • The examples described in the embodiments can also be implemented in combination with one another.
  • <5. Summary and Modifications> According to the above embodiment, the following effects can be obtained.
  • an information processing device 70 as an AR system 5 is equipped with an image processing unit 71a that generates image data for simultaneously displaying an overhead image V3 of the target space 8, a view frustum 40 (shooting range presentation image) that presents the shooting range of the camera 2 within the overhead image V3, and the captured image V1 of the camera 2 on one screen (see Figures 7 and 19).
  • This allows the viewer to easily grasp the correspondence between the image captured by the camera 2 and the position in space.
  • the video processor 71a generates video data that causes the captured video V1 to be displayed within the view frustum 40 (see FIGS. 9 to 14).
  • That is, the video processing unit 71a generates video data in which the captured image V1 is displayed arranged within the range of the shooting range presentation image (the view frustum 40).
  • the image processing unit 71a generates image data in which the captured image V1 is displayed at a position within the depth of field range shown on the view frustum 40 (see Figures 9 and 10).
  • the depth of field range 42 is displayed within the view frustum 40, and the captured image V1 is displayed inside the display of the depth of field range 42.
  • This causes the captured image V1 to be displayed at a position close to the actual position of the subject within the overhead image V3. Therefore, the viewer can easily grasp the relationship between the shooting range of the view frustum 40, the actual captured image V1, and the position of the captured subject.
  • the video processor 71a generates video data in which the captured video V1 is displayed on the focus plane 41 shown on the view frustum 40 (see FIG. 9).
  • a focus plane 41 is displayed within the view frustum 40, and the captured image V1 is displayed on the focus plane 41. This allows the viewer to easily confirm the focus position of the camera 2 and the image of the subject at that position.
  • the image processing unit 71a generates image data in which the captured image V1 is displayed farther away than the depth of field range 42 when viewed from the frustum starting point 46 (see Figures 12 to 14).
  • the view frustum 40 is an image that spreads in a quadrangular pyramid shape, and the area of the cross section increases as it goes farther. Therefore, by displaying the captured image V1 on or near the frustum far end surface 45, it is possible to display the captured image V1 relatively large within the view frustum 40. This is suitable, for example, when the contents of the captured image V1 are to be confirmed.
  • the image processing unit 71a generates image data in which the captured image V1 is displayed at a position closer to the frustum origin 46 (a surface 47 near the frustum origin) than the depth of field range 42 shown on the view frustum 40 (see Figure 11).
  • For example, when it is desired to check the depth of field range 42 or the focus plane 41 within the view frustum 40, or when it is difficult to display the image on the frustum far end surface 45, it is preferable to display the captured image V1 at a position close to the frustum starting point 46.
  • an image generation control unit 71b controls the generation of image data by variably setting the display position of the captured image V1, which is simultaneously displayed on one screen together with the overhead image V3 and the view frustum 40 (see Figures 7, 23, and 24).
  • the display position of the captured image V1 is set as any position inside the view frustum 40 or any position outside the view frustum 40. By setting an appropriate position, it is possible to make it easier for the viewer to grasp the captured image V1, and to prevent the view frustum 40 and the captured image V1 from interfering with each other.
  • In the embodiment, the image generation control unit 71b determines whether to change the display position of the captured image V1, and changes the setting of the display position of the captured image V1 in accordance with the determination result (see FIG. 24). For example, a change determination is performed so that the display position of the captured image V1 is automatically changed to an appropriate position, whereby the view frustum 40 and the captured image V1 are displayed in a positional relationship that is appropriate for the viewer, for example one that provides good visibility or that makes the correspondence easy to understand.
  • the image generation control unit 71b determines whether or not it is necessary to change the display position of the captured image V1 based on the positional relationship between the view frustum 40 and the object represented in the overhead image V3 (see steps S160 and P1 in Figure 24). For example, when the far end side of the view frustum 40 is embedded in the ground GR or a structure CN in the overhead image V3, the image may become unnatural or may not be displayed at all when displayed on the frustum far end surface 45. In such a case, the image generation control unit 71b determines that the position setting needs to be changed and changes the position setting of the captured image V1. This makes it possible to automatically provide an easily viewable captured image V1.
  • the image generation control unit 71b judges whether or not the display position of the captured image V1 needs to be changed based on the angle formed between the viewing direction of the overhead image V3 as a whole and the axial direction of the view frustum 40 (see steps S160 and P2 in FIG. 24). That is, this is the angle between the line-of-sight direction from the viewpoint set for the overhead image V3 at a given point in time and the axial direction of the displayed view frustum 40 (a minimal code sketch of this determination, together with the positional-relationship check above, is given after this list).
  • the axial direction of the view frustum 40 is the direction of a perpendicular line drawn from the frustum starting point 46 to the frustum far end surface 45.
  • the size and direction of the rendered view frustum 40 change according to the angle of view and shooting direction of the camera 2.
  • the image generation control unit 71b determines that the position setting needs to be changed according to the angle of the view frustum 40, and changes the position setting of the captured image V1. This makes it possible to automatically provide the captured image V1 in an easy-to-view state.
  • the image generation control unit 71b determines whether or not the display position of the captured image V1 needs to be changed based on a change in the viewpoint of the overhead image V3 (see steps S160 and P3 in Figure 24). For example, changing the viewpoint of the overhead image V3 changes the direction, size, angle, etc. of the view frustum 40.
  • the image generation control unit 71b judges whether the display of the captured image V1 up to that point is appropriate, and changes the settings if necessary. This makes it possible to provide the captured image V1 in a state that is always easy to view, even if the viewer arbitrarily changes the overhead image V3.
  • the image generation control unit 71b uses the type information of the camera 2 capturing the image V1 to set the destination to which the display position of the captured image V1 is changed (see step S163 in FIG. 24).
  • the change destination of the display position of the captured image V1 is set depending on whether the camera 2 is a fixed type using a tripod 6 or a mobile type. This makes it possible to set a position according to the fixed type camera 2F and the mobile type camera 2M.
  • for a mobile camera 2M, for example, the view frustum 40 changes frequently, so an easy-to-view display can be provided by displaying the captured image V1 at a position that is less affected by changes in the view frustum 40.
  • the image generation control unit 71b changes the setting of the display position of the captured image V1 in response to a user operation (see FIG. 23).
  • the user who is the viewer, can arbitrarily switch the display position of the captured image V1, thereby allowing the captured image V1 to be displayed at a position that suits the viewer's ease of viewing and purpose.
  • the image generation control unit 71b changes the display position of the captured image V1 within the view frustum 40 (see Figs. 23 and 24). For example, within the view frustum 40, switching is performed among the focus plane 41, the frustum far end plane 45, the plane on the frustum starting point 46 side, the plane within the depth of field, etc. This allows the captured image V1 to be displayed at an appropriate position while clarifying the correspondence between the view frustum 40 and the captured image V1.
  • the image generation control unit 71b changes the display position of the captured image V1 between inside and outside the view frustum 40 (see Figs. 23 and 24).
  • the display position of the captured image V1 is changed within the view frustum 40, such as the focus plane 41, the frustum far end plane 45, the plane on the frustum starting point 46 side, and the plane within the depth of field range, or further, at a position outside the view frustum 40, such as near the camera, in the corner of the screen, or near the focus plane 41.
  • the video processing unit 71a generates video data that simultaneously displays an overhead image V3, each view frustum 40 for each of the multiple cameras 2, and each captured image V1 for each of the multiple cameras 2 on a single screen (see Figures 16, 17, and 27).
  • the view frustum 40 and the captured images V1 of the multiple cameras 2 are displayed in the CG space 30 represented by the overhead image V3. This allows the viewer to easily understand the relationship between the shooting ranges of the cameras 2. This is convenient for a director, for example, to check the contents of the images captured by each camera 2.
  • the view frustum 40 is given as an example of a shooting range presentation image, and its shape is a quadrangular pyramid, but it is not limited to this.
  • it may be an image in which multiple rectangular outlines of a quadrangular pyramid cross section are arranged, or an image in which the outline of a quadrangular pyramid is expressed by a dashed line.
  • the shooting range presentation image may display only the focus plane 41 or only the depth of field range 42 .
  • the information processing device 70 as, for example, the AR system 5 in the embodiment is equipped with an image processing unit 71a that performs in parallel a process of generating first image data that displays the view frustum 40 (image presenting the shooting range) of the camera 2 within the target shooting space 8, and a process of generating second image data that displays an image that displays the view frustum 40 within the target shooting space 8 and has a display mode different from that of the first image data.
  • the first video data and the second video data are the video data of the overhead video V3-1 transmitted to the GUI device 11 and the video data of the overhead video V3-2 transmitted to the camera 2 in the embodiment.
  • the viewer can easily grasp the correspondence between the image of the camera 2 and the position in the space.
  • by generating video data with different display modes according to the role of each viewer of the overhead image V3 including the view frustum 40, it is possible to present information suited to each viewer through the video display.
  • for example, one of the video data of the overhead images V3-1 and V3-2 is video data of an image viewed by a video production director, and the other is video data of an image viewed by a camera operator of a camera 2 shooting the target space 8.
  • for example, the overhead image V3-1 has content intended for viewing by a video production director, such as a director, on the GUI device 11.
  • the overhead image V3-2 has content intended for viewing by a shooting operator such as a cameraman.
  • the video production director refers to staff involved in video production, such as a director and a switcher engineer, other than the camera operator.
  • the camera operator refers to a cameraman who directly operates the camera 2 and a staff member who remotely operates the camera 2.
  • At least one of the video data of the overhead images V3-1 and V3-2 is video data that displays an image including a plurality of view frustums 40 corresponding to a plurality of cameras 2, respectively.
  • one or both of the overhead images V3-1, V3-2 display view frustums 40 for multiple cameras 2.
  • the director, cameraman, etc. can easily grasp the positional relationship of each camera 2 and the subject.
  • a view frustum 40 is displayed for multiple cameras 2, allowing the director or the like to give various instructions and select main line images while recognizing the position and direction of the subject of each camera 2.
  • the view frustum 40 is displayed for the plurality of cameras 2, so that the cameraman can perform shooting operations while taking into consideration the relationship with the other cameras 2.
  • in the overhead image V3-2 viewed by a cameraman, the view frustum 40 may be displayed only for his or her own camera 2. In this way, the cameraman can easily grasp, within the whole image, the position of the subject in the image V1 captured by his or her own camera operation. Alternatively, in the overhead image V3-2 viewed by the cameraman, only the view frustums 40 of the cameras 2 of the other cameramen may be displayed. In this way, the cameraman can operate his or her own camera while recognizing the shooting locations and subjects of the other cameras 2.
  • the video processing unit 71a generates video data as at least one of the video data for the overhead images V3-1, V3-2, which displays an image in which a portion of a plurality of view frustums 40 corresponding to a plurality of cameras 2 is displayed in a different manner from the other view frustums 40. That is, when a plurality of view frustums 40 are displayed, some of them are displayed in a different manner from the other view frustums 40. This makes it possible to realize a display in which a specific view frustum 40 has meaning when displaying a plurality of view frustums 40.
  • the video processing unit 71a generates video data that displays an image in which a portion of a plurality of view frustums 40 corresponding to a plurality of cameras 2 is highlighted as at least one of the video data for the overhead images V3-1, V3-2.
  • a particular view frustum 40 can be clearly identified by displaying some of the view frustums 40 in a more emphasized manner than the other view frustums 40 .
  • Examples of highlighting include a display with increased brightness, a display using a conspicuous color, a display with emphasized contours, a blinking display, and the like.
  • the video processing unit 71a generates video data that displays, as an overhead image V3-1, an image in which the view frustum 40 of a specific camera, which is a camera 2 among multiple cameras 2 that contains a subject of interest in the captured image V1, is displayed in a different manner from the other view frustums 40 (see Figures 28 to 32).
  • by highlighting the view frustum 40 of a camera 2 selected from among the cameras 2 capturing the target subject, it becomes easy for the director to know which camera is appropriate when he or she wants to use the image of the target subject as the main line image. It also becomes easy for the director to understand the positional relationship between the camera 2 capturing the target subject and the shooting directions of the other cameras 2.
  • the specific camera whose view frustum 40 is highlighted is the camera 2 in which the screen occupancy rate of the target subject in the captured image V1 is highest (see FIGS. 29, 30, and 31); a simple sketch of this selection, together with the duration-based selection below, is given after this list.
  • the director can give instructions while grasping the status of the camera 2 mainly showing the target subject and the other cameras 2.
  • the specific camera for highlighting the view frustum 40 is the camera 2 that has the longest continuous shooting time of the target subject in the shot video V1 (see FIG. 32).
  • the director can grasp the status of the camera 2 that mainly films the subject of interest and other cameras 2 and give instructions accordingly.
  • the video processing unit 71a generates, as the video data for the overhead image V3-1, video data in which the view frustum 40 of a camera 2, among the multiple cameras 2, for which a specific operation by the shooting operator has been detected is displayed in a different manner from the other view frustums 40 (see Figures 33 and 34).
  • the video processing unit 71a generates video data as video data for the overhead video V3-1 in which, when the view frustums 40 of multiple cameras 2 overlap within the displayed image, the overlapping view frustums 40 are displayed in a different manner from the non-overlapping view frustums 40 (see Figures 35 and 36).
  • when the view frustums 40 of multiple cameras 2 overlap, it means that the multiple cameras 2 are shooting in the direction of a common subject (a simplified overlap check is sketched after this list).
  • the video processing unit 71a generates video data that preferentially displays one of the overlapping view frustums 40 as at least one of the overhead images V3-1, V3-2 when the view frustums 40 of multiple cameras 2 overlap on the displayed image (see Figures 37 and 38).
  • that is, when the view frustums 40 of multiple cameras 2 overlap, the video processing unit 71a preferentially displays one view frustum 40 in the overlapping portion.
  • the focus plane 41 and depth of field range 42 of only one view frustum 40 that has been set as the priority are displayed.
  • in the overlapping portion, it is possible to increase the brightness of only the view frustum 40 that has been set as the priority, or to give it a conspicuous color. The highlighted display described above may also be used. Alternatively, only the view frustum 40 set as the priority may be displayed in the overlapping portion. These measures also make it easier to view an overhead image V3 that includes multiple view frustums 40.
  • for example, in the overhead image V3-1 viewed by the director, the view frustum 40 of the camera 2 providing the main line image is displayed with priority, while no particular priority is set in the overhead image V3-2 viewed by the cameraman.
  • alternatively, in the overhead image V3-2, the view frustum 40 of the camera 2 that the cameraman himself or herself operates may be displayed with priority.
  • video processing unit 71a generates video data for displaying, as overhead images V3-1 and V3-2, images including instruction images in different display modes (see FIGS. 39 to 45).
  • on the cameraman side, the instruction frustum 40DR is displayed on the screen, so that the cameraman can visually understand the contents of the instruction.
  • the overhead images V3-1 and V3-2 are displayed in a way that is appropriate for each role, so that the shooting can proceed smoothly.
  • the video processing unit 71a sets the video data of the overhead image V3-1 as video data that displays instruction images for multiple cameras 2, and sets the video data of the overhead image V3-2 as video data that displays instruction images for a specific camera 2 among the multiple cameras (see Figures 39, 41, and 42). This allows the director to understand the instructions for each camera, while camera operators can easily understand the instructions by only seeing the instructions that are directed to them.
  • the video processing unit 71a sets the video data of the overhead image V3-2 as video data that displays an instruction image within an image from a viewpoint corresponding to the position of a specific camera 2 among the multiple cameras (see Figures 42 and 43).
  • for the cameraman, the instruction frustum 40DR is displayed in the overhead image V3-2 from his or her own viewpoint, so that the indicated direction can easily be grasped from that viewpoint.
  • the video processing unit 71a generates video data for the overhead video V3-2 that displays the current view frustum 40 and a marker image in the shooting direction based on the marking operation (see Figures 46 to 48).
  • the bird's-eye view image V3-2 including the marker images of the marker frustum 40M, the marker 55M, etc. is displayed. This allows the cameraman to mark the shooting position or subject that he or she has set, which is convenient for taking pictures of that position at the appropriate time.
  • the video processing unit 71a generates video data as the video data for the overhead video V3-2, which displays an overhead video from a viewpoint corresponding to the position of a specific camera 2 among multiple cameras, and generates video data as the video data for the overhead video V3-1, which displays an overhead video from a different viewpoint (see Figures 49 to 52).
  • for the cameraman, the overhead image V3-2 is displayed from a viewpoint corresponding to his or her own, making it easy to recognize the overall situation and his or her own shooting direction.
  • the bird's-eye view V3-1 is displayed from a viewpoint that makes it easy to grasp the whole picture, rather than from the viewpoint of a specific cameraman, making it ideal for directing the entire shoot.
  • the video processing unit 71a generates video data for displaying a plurality of overhead views V3-1a, V3-1b from a plurality of viewpoints as the video data for the overhead view V3-1 (see FIGS. 51 and 52). Since the director needs to understand the shooting conditions of each camera 2, an overhead image V3-1 that provides an overall bird's-eye view from a plurality of viewpoints as shown in FIG. 51 is extremely useful.
  • the video processor 71a generates the overhead view video V3 as a virtual video using CG. This makes it possible to generate an overhead image V3 from any viewpoint, and to display the view frustum 40 and the captured image V1 from a variety of viewpoints.
  • the view frustum 40 is configured to display the shooting direction and angle of view at the time of shooting in real time, but it may also be configured to display a past view frustum 40, for example, during a prior simulation of camera work.
  • the current view frustum 40 at the time of shooting and the past view frustum 40 may be displayed at the same time for comparison. In such a case, it is advisable to make the past view frustum 40 different from the current view frustum 40 by increasing its transparency, for example, so that the cameraman or the like can distinguish between them.
  • the program of the embodiment is a program that causes a processor such as a CPU or DSP, or a device including these, to execute the processes shown in Figures 20, 21, 22, 23, and 24 described above. That is, the program of the embodiment is a program that causes the information processing device 70 to execute a process of generating video data that simultaneously displays, on one screen, an overhead image V3 of the space to be photographed, a view frustum 40 (shooting range presentation image) that presents the shooting range of the camera 2 within the overhead image V3, and the captured image V1 of the camera 2.
  • the program of the embodiment is a program that causes a processor such as a CPU or DSP, or a device including these, to execute the processes shown in Figures 30, 31, 32, 34, 36, 38, 41, 43, 45, 48, and 52 described above. That is, the program of the embodiment is a program that causes the information processing device 70 to execute in parallel a process of generating first video data that displays a view frustum 40 (shooting range display image) that presents the shooting range of the camera 2 within the shooting target space, and a process of generating second video data that displays an image that displays the view frustum 40 within the shooting target space and has a display mode different from that of the image generated by the first video data.
  • Such a program can be pre-recorded on an HDD serving as a recording medium built into a device such as a computer device, or in a ROM in a microcomputer having a CPU. Such a program can also be temporarily or permanently stored (recorded) in a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a Blu-ray Disc (registered trademark), a magnetic disk, a semiconductor memory, or a memory card.
  • such a removable recording medium can be provided as so-called package software.
  • Such a program can be installed in a personal computer or the like from a removable recording medium, or can be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
  • Such a program is suitable for the widespread provision of the information processing device 70 of the embodiment. For example, by downloading the program to personal computers, communication devices, mobile terminal devices such as smartphones and tablets, mobile phones, game devices, video devices, PDAs (Personal Digital Assistants), and the like, these devices can function as the information processing device 70 of the present disclosure.
  • An information processing device comprising: an image processing unit that generates image data for simultaneously displaying an overhead image of a space to be photographed, a shooting range presentation image that presents the shooting range of a camera within the overhead image, and the image photographed by the camera on a single screen.
  • the image processing unit generates image data in which the captured image is displayed within the shooting range presentation image.
  • the image processing unit generates image data in which the captured image is displayed at a position within a depth of field range shown in the shooting range presentation image.
  • the information processing device further comprising an image generation control unit that controls generation of image data by variably setting a display position of the captured image that is simultaneously displayed on one screen together with the overhead image and the shooting range presentation image.
  • the image generation control unit determines whether to change a display position of the shot image, and changes a setting of the display position of the shot image according to a result of the determination.
  • the image generation control unit determines whether or not it is necessary to change the display position of the captured image based on a positional relationship between the shooting range presentation image and an object represented in the overhead image.
  • a program that causes an information processing device to execute a process of generating video data that simultaneously displays an overhead image of a space to be photographed, a shooting range presentation image that presents the camera's shooting range within the overhead image, and the image captured by the camera on a single screen.
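The display-position change determination mentioned above (the positional-relationship check P1 and the angle check P2 of Fig. 24) can be illustrated with a minimal sketch. The function names, the simple ground-plane test, the angle threshold, and the assumption of unit direction vectors below are illustrative choices only and are not taken from the disclosure.

```python
import math

def frustum_far_corners(cam_pos, forward, up, right, far_dist, h_fov, v_fov):
    """Return the four corner points of the frustum far end surface (45).
    forward/up/right are assumed to be unit vectors of the camera frame."""
    cx = [cam_pos[i] + forward[i] * far_dist for i in range(3)]
    half_w = far_dist * math.tan(h_fov / 2)
    half_h = far_dist * math.tan(v_fov / 2)
    corners = []
    for sx, sy in ((-1, -1), (-1, 1), (1, -1), (1, 1)):
        corners.append([cx[i] + sx * half_w * right[i] + sy * half_h * up[i] for i in range(3)])
    return corners

def far_end_buried(corners, ground_z=0.0):
    """P1-style check: is the far end surface embedded in the ground plane (z-up assumed)?"""
    return any(c[2] < ground_z for c in corners)

def view_angle_too_shallow(view_dir, frustum_axis, limit_deg=75.0):
    """P2-style check: angle between the overhead-view line of sight and the frustum axis."""
    dot = sum(a * b for a, b in zip(view_dir, frustum_axis))
    norm = math.dist((0, 0, 0), view_dir) * math.dist((0, 0, 0), frustum_axis)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angle > limit_deg

def decide_display_position(corners, view_dir, frustum_axis):
    """Choose where to place the captured image V1 within the view frustum 40."""
    if far_end_buried(corners) or view_angle_too_shallow(view_dir, frustum_axis):
        return "near_origin_plane"   # surface 47 close to the frustum starting point 46
    return "far_end_plane"           # frustum far end surface 45

# Example: a level camera 10 m above the ground whose far plane dips below the ground.
corners = frustum_far_corners((0, 0, 10), (1, 0, 0), (0, 0, 1), (0, 1, 0),
                              60.0, math.radians(40), math.radians(25))
print(decide_display_position(corners, view_dir=(0.9, 0.3, -0.3), frustum_axis=(1, 0, 0)))
```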
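The selection of the specific camera whose view frustum 40 is highlighted, either by the screen occupancy rate of the target subject (Figs. 29 to 31) or by its continuous shooting time (Fig. 32), can be sketched as follows. The data structure and field names are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class CameraStatus:
    camera_id: str
    occupancy: float        # screen occupancy rate of the target subject in V1 (0.0 - 1.0)
    continuous_secs: float  # time the subject has stayed continuously in this camera's V1

def camera_by_occupancy(statuses):
    """Pick the camera whose captured image has the largest subject occupancy (Figs. 29-31)."""
    visible = [s for s in statuses if s.occupancy > 0.0]
    return max(visible, key=lambda s: s.occupancy).camera_id if visible else None

def camera_by_duration(statuses):
    """Pick the camera that has kept the subject in frame the longest (Fig. 32)."""
    visible = [s for s in statuses if s.continuous_secs > 0.0]
    return max(visible, key=lambda s: s.continuous_secs).camera_id if visible else None

# Example: camera "2b" would be highlighted by occupancy, "2c" by duration.
statuses = [CameraStatus("2a", 0.10, 12.0),
            CameraStatus("2b", 0.35, 4.0),
            CameraStatus("2c", 0.20, 30.0)]
print(camera_by_occupancy(statuses), camera_by_duration(statuses))
```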
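As a rough illustration of detecting that the view frustums 40 of multiple cameras 2 overlap (Figs. 35 to 38), the sketch below approximates each frustum by a cone and samples points along one axis against the other. This cone approximation and the sampling scheme are simplifications assumed for illustration, not the method of the disclosure.

```python
import math

def point_in_frustum(point, apex, axis, half_angle_rad, far_dist):
    """Rough point-inside-cone test used as a stand-in for a frustum containment check.
    axis is assumed to be a unit vector."""
    v = [point[i] - apex[i] for i in range(3)]
    dist_along = sum(v[i] * axis[i] for i in range(3))
    if dist_along < 0 or dist_along > far_dist:
        return False
    radial = math.sqrt(max(0.0, sum(c * c for c in v) - dist_along ** 2))
    return radial <= dist_along * math.tan(half_angle_rad)

def frustums_overlap(f_a, f_b, samples=20):
    """Sample points along frustum A's axis and test them against frustum B, and vice versa."""
    def any_axis_point_inside(src, dst):
        for k in range(1, samples + 1):
            t = src["far_dist"] * k / samples
            p = [src["apex"][i] + src["axis"][i] * t for i in range(3)]
            if point_in_frustum(p, dst["apex"], dst["axis"], dst["half_angle"], dst["far_dist"]):
                return True
        return False
    return any_axis_point_inside(f_a, f_b) or any_axis_point_inside(f_b, f_a)

# Two cameras aimed at the same area around (25, 0, 0):
f1 = {"apex": (0, 0, 0),   "axis": (1.0, 0.0, 0.0), "half_angle": math.radians(20), "far_dist": 50.0}
f2 = {"apex": (25, -20, 0), "axis": (0.0, 1.0, 0.0), "half_angle": math.radians(20), "far_dist": 50.0}
print(frustums_overlap(f1, f2))
```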

Abstract

This information processing device includes a video processing unit generating video data causing simultaneous display, within the same screen, of a bird's-eye view of a space for image capturing, an image capturing range presentation video presenting the image capturing range of a camera within the bird's eye view, and a video captured by the camera.

Description

Information processing device, information processing method, and program
This technology relates to an information processing device, an information processing method, and a program, and relates to the display of images of the shooting target space and of virtual images.
There is known a technique for displaying the shooting direction and depth of field of a camera.
Patent Document 1 below discloses a technique for displaying the depth of field and the angle of view based on shooting information.
Patent Document 2 below discloses expressing the shooting range in a captured image using a trapezoidal figure.
Patent Document 3 below discloses generating and displaying a map image for indicating the depth position and focus position of an object to be imaged.
Patent Document 1: JP 2013-183217 A; Patent Document 2: JP 2009-60337 A; Patent Document 3: JP 2010-177741 A
For example, in a system that shoots video for broadcast or distribution, it is useful for cameramen, directors, and others to be able to grasp the shooting direction and angle of view of one or more cameras, the position of the subject being focused on, and so on. One way to achieve this is to display on the screen the shooting range, which changes depending on the shooting direction and angle of view. However, simply displaying such a shooting range does not allow users to simultaneously check what image is currently being shot by the camera.
This disclosure therefore proposes technology that displays images that make it easier to understand the correspondence between camera images and positions in space.
An information processing device according to the present technology includes an image processing unit that generates image data that simultaneously displays an overhead image of a space to be photographed, a shooting range presentation image that presents the camera's shooting range within the overhead image, and the image photographed by the camera on a single screen.
The shooting range presentation image is an image showing the shooting range determined by the shooting direction and zoom angle of view of the camera. When the image showing the shooting range of the camera is displayed in the overhead image, the camera's captured image is also displayed on the same screen.
Fig. 1 is an explanatory diagram of shooting by the shooting system according to an embodiment of the present technology.
Fig. 2 is an explanatory diagram of AR (Augmented Reality) superimposed images.
Fig. 3 is an explanatory diagram of a system configuration according to the embodiment.
Fig. 4 is an explanatory diagram of another example of the system configuration according to the embodiment.
Fig. 5 is an explanatory diagram of an environment map according to the embodiment.
Fig. 6 is an explanatory diagram of drift correction of the environment map according to the embodiment.
Fig. 7 is a block diagram of an information processing device according to the embodiment.
Fig. 8 is an explanatory diagram of a view frustum according to the embodiment.
Fig. 9 is an explanatory diagram of a display example of a captured image on the focus plane of the view frustum according to the embodiment.
Fig. 10 is an explanatory diagram of a display example of a captured image within the depth of field of the view frustum according to the embodiment.
Fig. 11 is an explanatory diagram of a display example of a captured image at a position close to the starting point of the view frustum according to the embodiment.
Fig. 12 is an explanatory diagram of a display example of a captured image on the far end surface of the view frustum according to the embodiment.
Fig. 13 is an explanatory diagram of a case where the view frustum according to the embodiment is set to infinity.
Fig. 14 is an explanatory diagram of changes in the display state of a captured image on the far end side of the view frustum according to the embodiment.
Fig. 15 is an explanatory diagram of a display example of a captured image outside the view frustum according to the embodiment.
Fig. 16 is an explanatory diagram of a display example of captured images inside and outside a plurality of view frustums according to the embodiment.
Figs. 17 and 18 are explanatory diagrams of display examples of captured images outside the view frustum according to the embodiment.
Fig. 19 is a flowchart of a processing example of the information processing device according to the embodiment.
Figs. 20 to 24 are flowcharts of examples of display position setting processing for the captured image according to the embodiment.
Figs. 25 and 26 are explanatory diagrams of collision determination according to the embodiment.
Fig. 27 is an explanatory diagram of changes in the overhead image according to the embodiment.
Fig. 28 is an explanatory diagram of the overhead image on the director side according to the embodiment.
Fig. 29 is an explanatory diagram of determination of the image to be highlighted according to the embodiment.
Fig. 30 is a flowchart of a processing example of the information processing device according to the embodiment.
Figs. 31 and 32 are flowcharts of processing examples for highlighting according to the embodiment.
Fig. 33 is an explanatory diagram of a display example based on feedback according to the embodiment.
Fig. 34 is a flowchart of a processing example of display based on feedback according to the embodiment.
Fig. 35 is an explanatory diagram of a display example of overlapping view frustums according to the embodiment.
Fig. 36 is a flowchart of a processing example of displaying overlapping view frustums according to the embodiment.
Fig. 37 is an explanatory diagram of priority display of one view frustum according to the embodiment.
Fig. 38 is a flowchart of a processing example when priority display is performed according to the embodiment.
Fig. 39 is an explanatory diagram of a display example of an instruction frustum on the director side according to the embodiment.
Fig. 40 is an explanatory diagram of a display example of the instruction frustum on the cameraman side according to the embodiment.
Fig. 41 is a flowchart of processing for generating different overhead images according to the embodiment.
Fig. 42 is an explanatory diagram of a display example of the instruction frustum on the cameraman side according to the embodiment.
Fig. 43 is a flowchart of processing for generating the overhead image on the cameraman side according to the embodiment.
Fig. 44 is an explanatory diagram of a display example of instruction information on the cameraman side according to the embodiment.
Fig. 45 is a flowchart of processing for generating the overhead image on the cameraman side according to the embodiment.
Fig. 46 is an explanatory diagram of a display example of a marker frustum according to the embodiment.
Fig. 47 is an explanatory diagram of a display example of a marker according to the embodiment.
Fig. 48 is a flowchart of a processing example of displaying marker information according to the embodiment.
Figs. 49 and 50 are explanatory diagrams of display examples of different overhead images according to the embodiment.
Fig. 51 is an explanatory diagram of a display example on the director side according to the embodiment.
Fig. 52 is a flowchart of processing for generating different overhead images according to the embodiment.
The embodiments will be described below in the following order.
1. System configuration
2. Configuration of information processing device
3. Display of view frustum
4. Example of cameraman and director screens
[4-1: Highlighted display]
[4-2: Priority display]
[4-3: Instruction display]
[4-4: Marker display]
[4-5: Examples of various displays]
5. Summary and modifications
In this disclosure, the term "video" or "image" includes both moving images and still images; however, the embodiment will be described using moving image shooting as an example.
1. System configuration
In the embodiment, a shooting system capable of generating so-called AR images, in which a virtual image is composited with a real image, will be taken as an example. Fig. 1 schematically shows how images are captured by the shooting system.
Fig. 1 shows an example in which three cameras 2 are arranged to capture images of the real shooting target space 8. Three cameras is merely an example; one or more cameras 2 may be used.
The shooting target space 8 may be any location; one example is a stadium for soccer, rugby, or the like.
In the example of Fig. 1, the camera 2 includes a mobile camera 2M that is suspended by a wire 9 so that it can move above the shooting target space 8. Images and metadata captured by this mobile camera 2M are sent to a render node 7.
Also shown as the camera 2 is a fixed camera 2F that is fixedly placed on, for example, a tripod 6. Images and metadata captured by this fixed camera 2F are sent to the render node 7 via a CCU (Camera Control Unit) 3.
The captured images and metadata from the mobile camera 2M may also be sent to the render node 7 via the CCU 3.
Hereinafter, "camera 2" collectively refers to the cameras 2F and 2M.
The render node 7 referred to here is a CG (Computer Graphics) engine or video processor that generates CG and composites it with live-action video, and is, for example, a device that generates AR video.
Figs. 2A and 2B show examples of AR images. In Fig. 2A, a line that does not actually exist is composited, as a CG image 38, into live-action footage of a game being played in a stadium. In Fig. 2B, an advertising logo that does not actually exist is composited, as an image 38, into live-action footage of the stadium.
These CG images 38 can be made to look as if they actually exist by rendering them with their shape, size, and compositing position appropriately set according to the position of the camera 2 at the time of shooting, the shooting direction, the angle of view, the photographed structures, and so on.
Generating an AR superimposed image by compositing CG with such live-action footage is itself known. The shooting system of the embodiment further enables the cameraman and director involved in video production to perform production work such as shooting and giving instructions while viewing the AR superimposed image. This makes it possible to shoot while checking how the real scene and the virtual image blend together, enabling video production in line with the creative intent.
In particular, in this embodiment, in a shooting system in which a cameraman or the like can check such AR superimposed images, a shooting range presentation image suitable for the viewer of the monitor image, such as the cameraman or the director, is displayed.
Two configuration examples of the shooting system are shown in Fig. 3 and Fig. 4.
The configuration example of Fig. 3 includes camera systems 1 and 1A, a control panel 10, a GUI (Graphical User Interface) device 11, a network hub 12, a switcher 13, and a master monitor 14.
The dashed arrows indicate the flow of various control signals CS, while the solid arrows indicate the flow of the video data of the captured image V1, the AR superimposed image V2, and the overhead image V3.
The camera system 1 is configured to perform AR linkage, while the camera system 1A is configured not to perform AR linkage.
Although Figs. 3 and 4 show an example of a fixed camera 2F on a tripod 6, a mobile camera 2M may also be used as the camera system 1 or 1A.
The camera system 1 includes a camera 2, a CCU 3, an AI (artificial intelligence) board 4 built into the CCU 3, for example, and an AR system 5.
The camera 2 sends video data of the captured image V1 and metadata MT to the CCU 3. The CCU 3 sends the video data of the captured image V1 to the switcher 13. The CCU 3 also sends the video data of the captured image V1 and the metadata MT to the AR system 5.
The metadata MT includes lens information such as the zoom angle of view and focal length at the time the image V1 was captured, and sensor information from an IMU (Inertial Measurement Unit) or the like mounted on the camera 2. Specifically, this includes the 3-DoF (Degree of Freedom) attitude information of the camera 2, acceleration information, the lens focal length, aperture value, zoom angle of view, lens distortion, and so on. The metadata MT is output from the camera 2 as, for example, frame-synchronized or asynchronous information.
In the case of Fig. 3, the camera 2 is a fixed camera 2F and its position does not change, so the camera position information only needs to be stored as a known value by the CCU 3 and the AR system 5. When a mobile camera 2M is used, the position information is also included in the metadata MT transmitted successively from the camera 2M.
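As a rough illustration only (not part of the disclosure), the per-frame metadata MT described above could be carried in a structure like the following; the field names, types, and units are assumptions.

```python
from dataclasses import dataclass

@dataclass
class CameraMetadata:
    """Illustrative container for the metadata MT sent per frame (field names are assumptions)."""
    yaw: float              # 3-DoF attitude of the camera 2 (degrees)
    pitch: float
    roll: float
    acceleration: tuple     # IMU acceleration (x, y, z)
    focal_length_mm: float
    aperture_f: float
    zoom_fov_deg: float     # zoom angle of view
    lens_distortion: float
    position: tuple = None  # carried only for a mobile camera 2M; a fixed camera uses a known value

# Example frame of metadata for a fixed camera 2F (no position field needed):
mt = CameraMetadata(yaw=30.0, pitch=-5.0, roll=0.0, acceleration=(0.0, 0.0, 9.8),
                    focal_length_mm=35.0, aperture_f=2.8, zoom_fov_deg=40.0, lens_distortion=0.02)
```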
The AR system 5 is an information processing device including a rendering engine that performs CG rendering. The information processing device serving as the AR system 5 is an example of the render node 7 shown in Fig. 1.
The AR system 5 generates video data of an AR superimposed image V2 in which an image 38 generated by CG is superimposed on the image V1 captured by the camera 2. In this case, the AR system 5 sets the size and shape of the image 38 with reference to the metadata MT, and also sets the compositing position within the captured image V1, thereby generating video data of an AR superimposed image V2 in which the image 38 is naturally composited with the real scenery.
The AR system 5 also generates video data of a CG overhead image V3, as described later, for example video data of an overhead image V3 that reproduces the shooting target space 8 by CG. Furthermore, the AR system 5 displays, within the overhead image V3, a view frustum 40 as shown in Fig. 8 (described later) as a shooting range presentation image that visually presents the shooting range of the camera 2.
For example, the AR system 5 calculates the shooting range within the shooting target space 8 from the metadata MT and position information of the camera 2. The shooting range of the camera 2 can be obtained by acquiring the position information of the camera 2, the angle of view, and the attitude information of the camera 2 in the three axial directions (yaw, pitch, roll) on the tripod 6 (corresponding to the shooting direction).
The AR system 5 generates an image as the view frustum 40 in accordance with the calculated shooting range of the camera 2. The AR system 5 generates the video data of the overhead image V3 so that the view frustum 40 is presented from the position of the camera 2 within the overhead image V3 corresponding to the shooting target space 8.
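A minimal sketch of deriving the shooting range (the axis and far end surface of the view frustum 40) from the camera position, the yaw/pitch attitude, and the angle of view might look like the following. The coordinate convention, function names, and the choice of a fixed far distance are assumptions for illustration only.

```python
import math

def direction_from_yaw_pitch(yaw_deg, pitch_deg):
    """Unit vector of the shooting direction from yaw/pitch (roll does not move the axis)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))

def shooting_range(camera_pos, yaw_deg, pitch_deg, h_fov_deg, v_fov_deg, far_dist):
    """Apex, axis, and far end surface extents of the view frustum 40."""
    d = direction_from_yaw_pitch(yaw_deg, pitch_deg)
    far_centre = tuple(camera_pos[i] + d[i] * far_dist for i in range(3))
    half_w = far_dist * math.tan(math.radians(h_fov_deg) / 2)
    half_h = far_dist * math.tan(math.radians(v_fov_deg) / 2)
    return {"apex": camera_pos, "axis": d, "far_centre": far_centre,
            "far_half_width": half_w, "far_half_height": half_h}

# Example: a camera 10 m up, yawed 30 degrees and pitched slightly downward.
print(shooting_range((0.0, 0.0, 10.0), yaw_deg=30.0, pitch_deg=-10.0,
                     h_fov_deg=40.0, v_fov_deg=25.0, far_dist=50.0))
```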
In this disclosure, an "overhead image" (bird's-eye view image) is an image from a viewpoint that looks down on the shooting target space 8, but it does not necessarily have to display the entire shooting target space 8 within the image. An image that includes at least the view frustum 40 of some camera 2 and the space around it is referred to as an overhead image.
In the embodiment, the overhead image V3 is generated as a CG image expressing the shooting target space 8, such as a stadium, but the overhead image V3 may also be generated from live-action images. For example, a camera 2 may be provided at a viewpoint for the overhead image, and its captured image V1 may be used as the overhead image V3. The image V1 captured by the camera 2M moving overhead on the wire 9 may also be used as the overhead image V3. Furthermore, a 3D (three-dimensional) CG model of the shooting target space 8 may be generated using the images V1 captured by multiple cameras 2, and an overhead image V3 with a variable viewpoint position can be generated by setting a viewpoint position for the 3D CG model and rendering it.
The video data of the AR superimposed image V2 and the overhead image V3 generated by the AR system 5 is supplied to the switcher 13.
The video data of the AR superimposed image V2 and the overhead image V3 generated by the AR system 5 is also supplied to the camera 2 via the CCU 3. This allows the cameraman to view the AR superimposed image V2 and the overhead image V3 on a display unit such as the viewfinder of the camera 2.
Note that the video data of the AR superimposed image V2 and the overhead image V3 generated by the AR system 5 may be supplied to the camera 2 without going through the CCU 3. Furthermore, there are also examples in which the CCU 3 is not used in the camera systems 1 and 1A.
The AI board 4 in the CCU 3 performs processing to calculate the amount of drift of the camera 2 from the captured image V1 and the metadata MT.
At each time point, the positional displacement of the camera 2 is obtained by integrating the acceleration information from the IMU mounted on the camera 2 twice. By accumulating the displacement at each time point from a reference origin attitude (the reference attitude position on each of the three axes of yaw, pitch, and roll), the position on the three axes of yaw, pitch, and roll at each time point, that is, attitude information corresponding to the shooting direction of the camera 2, is obtained. However, as the accumulation is repeated, the deviation (accumulated error) between the actual attitude position and the calculated attitude position grows. The amount of this deviation is called the drift amount.
To eliminate such drift, the AI board 4 calculates the drift amount using the captured image V1 and the metadata MT, and sends the calculated drift amount to the camera 2.
The camera 2 receives the drift amount from the CCU 3 (AI board 4), corrects its attitude information, and then outputs metadata MT including the corrected attitude information.
The above drift correction will be explained with reference to Figs. 5 and 6.
Fig. 5 shows the environment map 35. The environment map 35 stores feature points and feature amounts at the coordinates of a virtual dome, and is generated for each camera 2.
The camera 2 is rotated 360 degrees, and an environment map 35 is generated in which feature points and feature amounts are registered at global position coordinates on the celestial sphere. This makes it possible to recover the attitude by feature point matching even if it is lost.
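As an illustrative sketch (the binning scheme and data layout are assumptions, not the disclosed implementation), an environment map 35 that registers feature descriptors at directions on the celestial sphere could be organized as follows.

```python
class EnvironmentMap:
    """Sketch of an environment map 35: features registered on a virtual dome, one map per camera."""
    def __init__(self, bin_deg=1.0):
        self.bin_deg = bin_deg
        self.features = {}   # (yaw_bin, pitch_bin) -> list of feature descriptors

    def register(self, yaw_deg, pitch_deg, descriptor):
        """Register a feature observed in a given direction on the celestial sphere."""
        key = (round(yaw_deg / self.bin_deg), round(pitch_deg / self.bin_deg))
        self.features.setdefault(key, []).append(descriptor)

    def lookup(self, yaw_deg, pitch_deg):
        """Return descriptors registered near a given direction, for feature point matching."""
        key = (round(yaw_deg / self.bin_deg), round(pitch_deg / self.bin_deg))
        return self.features.get(key, [])

# Built by rotating the camera 360 degrees and registering features along the way.
env = EnvironmentMap()
env.register(10.0, 5.0, "descriptor_A")
print(env.lookup(10.0, 5.0))
```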
Fig. 6A schematically shows a state in which a drift amount DA has arisen between the shooting direction Pc of the correct attitude of the camera 2 and the shooting direction Pj calculated from the IMU data.
Information on the three-axis motion, angles, and angle of view of the camera 2 is sent from the camera 2 to the AI board 4 as a guide for feature point matching. The AI board 4 detects the accumulated drift amount DA by feature point matching in image recognition, as shown in Fig. 6B. Each "+" in the figure indicates a feature point of a certain feature amount registered in the environment map 35 and the feature point of the corresponding feature amount in the current frame of the captured image V1, and the arrow between them is the drift amount vector. By detecting the coordinate error by feature point matching and correcting it in this way, the drift amount can be corrected.
When the AI board 4 obtains the drift amount by such feature point matching and the camera 2 outputs corrected metadata MT based on it, the accuracy of the attitude information of the camera 2 detected by the AR system 5 on the basis of the metadata MT can be improved.
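A minimal sketch of this drift correction, assuming that feature point matching yields pairs of registered and observed feature directions, is shown below. Averaging the offsets as the drift amount DA is a simplification chosen for illustration, not the disclosed method.

```python
def estimate_drift(matches):
    """matches: list of ((map_yaw, map_pitch), (observed_yaw, observed_pitch)) pairs
    obtained by feature point matching against the environment map 35.
    Returns the mean (yaw, pitch) offset, taken here as the drift amount DA."""
    if not matches:
        return (0.0, 0.0)
    dy = sum(obs[0] - ref[0] for ref, obs in matches) / len(matches)
    dp = sum(obs[1] - ref[1] for ref, obs in matches) / len(matches)
    return (dy, dp)

def correct_attitude(yaw, pitch, drift):
    """Subtract the accumulated drift from the IMU-integrated attitude."""
    return (yaw - drift[0], pitch - drift[1])

# Example: two matched feature points suggest roughly +0.55/+0.15 degrees of drift.
matches = [((10.0, 5.0), (10.6, 5.2)), ((40.0, -3.0), (40.5, -2.9))]
drift = estimate_drift(matches)
print(correct_attitude(30.0, 0.0, drift))
```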
The camera system 1A in Fig. 3 is an example that has a camera 2 and a CCU 3 but no AR system 5. The camera 2 of the camera system 1A transmits the video data and metadata MT of the captured image V1 to the CCU 3, and the CCU 3 transmits the video data of the captured image V1 to the switcher 13.
The video data of the captured image V1, the AR superimposed image V2, and the overhead image V3 output from the camera systems 1 and 1A described above is supplied to the GUI device 11 via the switcher 13 and the network hub 12.
The switcher 13 selects the so-called main line video from among the images V1 captured by the multiple cameras 2, the AR superimposed image V2, and the overhead image V3. The main line video is the video output for broadcasting or distribution. The switcher 13 outputs the selected video data as the main line video for broadcasting or distribution to a transmitting device, recording device, or the like (not shown).
The video data of the video selected as the main line video is also sent to the master monitor 14 and displayed there, so that the video production staff can check the main line video.
The master monitor 14 may also display the AR superimposed image V2, the overhead image V3, and the like in addition to the main line video.
The control panel 10 is a device with which the video production staff perform operations for switching instructions to the switcher 13, instructions relating to video processing, and various other instructions. The control panel 10 outputs a control signal CS in response to operations by the video production staff. This control signal CS is transmitted to the switcher 13 and the camera systems 1 and 1A via the network hub 12.
The GUI device 11 is, for example, a PC or a tablet device, and is a device with which video production staff, such as a director, can check the video and perform various instruction operations.
The captured image V1, the AR superimposed image V2, and the overhead image V3 are displayed on the display screen of the GUI device 11. For example, on the GUI device 11, the captured images V1 of the multiple cameras 2 are displayed as a list on a split screen, the AR superimposed image V2 is displayed, or the overhead image V3 is displayed. Alternatively, the image selected by the switcher 13 as the main line image may be displayed on the GUI device 11.
The GUI device 11 is also provided with an interface for the director or the like to perform various instruction operations. The GUI device 11 outputs a control signal CS in response to an operation by the director or the like. This control signal CS is transmitted via the network hub 12 to the switcher 13 and the camera systems 1 and 1A.
Depending on the GUI device 11, it is also possible to give instructions regarding, for example, the display mode of the view frustum 40 in the overhead image V3.
A control signal CS corresponding to such an instruction is transmitted to the AR system 5, and the AR system 5 generates video data of an overhead image V3 including a view frustum 40 in the display mode corresponding to the instruction from the director or the like.
The example of Fig. 3 described above includes the camera systems 1 and 1A. In this case, the camera system 1 is a set consisting of the camera 2, the CCU 3, and the AR system 5; in particular, by having the AR system 5, video data of the AR superimposed image V2 and the overhead image V3 corresponding to the image V1 captured by the camera 2 is generated. The AR superimposed image V2 and the overhead image V3 are then displayed on a display unit such as the viewfinder of the camera 2, displayed on the GUI device 11, or selected as the main line video by the switcher 13.
On the camera system 1A side, on the other hand, no video data of an AR superimposed image V2 or an overhead image V3 corresponding to the image V1 captured by its camera 2 is generated.
Fig. 3 is therefore a system in which a camera 2 that performs AR linkage and a camera 2 that performs normal shooting coexist.
The example of Fig. 4 is an example of a system in which one AR system 5 handles each camera 2.
In the case of Fig. 4, a plurality of camera systems 1A are provided. The AR system 5 is provided independently of each camera system 1A.
The CCU 3 of each camera system 1A sends the video data and metadata MT of the image V1 captured by its camera 2 to the switcher 13. The video data and metadata MT of the captured image V1 are then supplied from the switcher 13 to the AR system 5.
This allows the AR system 5 to acquire the video data and metadata MT of the captured image V1 for each camera system 1A, and to generate video data of an AR superimposed image V2 corresponding to the captured image V1 of each camera system 1A and video data of an overhead image V3 including the view frustum 40 corresponding to each camera system 1A. Alternatively, the AR system 5 can generate video data of an overhead image V3 in which the view frustums 40 of the cameras 2 of the multiple camera systems 1A are displayed together.
The video data of the AR superimposed image V2 and the overhead image V3 generated by the AR system 5 is transmitted via the switcher 13 to the CCU 3 of each camera system 1A, and further to the camera 2. This allows the cameraman to view the AR superimposed image V2 and the overhead image V3 on a display unit such as the viewfinder of the camera 2.
 またARシステム5によって生成されるAR重畳映像V2や俯瞰映像V3の映像データは、スイッチャー13、ネットワークハブ12を介してGUIデバイス11に送信され、表示される。これによりディレクター等がAR重畳映像V2や俯瞰映像V3を視認できるようになる。 In addition, the video data of the AR overlay image V2 and the overhead image V3 generated by the AR system 5 is transmitted to the GUI device 11 via the switcher 13 and the network hub 12 and displayed. This allows the director and others to visually confirm the AR overlay image V2 and the overhead image V3.
 このような図4の構成では、それぞれのカメラシステム1AにARシステム5を設けなくとも、各カメラ2のAR重畳映像V2や、俯瞰映像V3を生成し、表示させることができる。 In the configuration shown in FIG. 4, it is possible to generate and display the AR superimposed image V2 and the overhead image V3 of each camera 2 without providing an AR system 5 to each camera system 1A.
 ところで図3,図4では、俯瞰映像V3について、「V3-1」「V3-2」と付記した。
 俯瞰映像V3-1の映像データは、ディレクター等を視認者と想定してGUIデバイス11やマスターモニタ14に表示させる俯瞰映像V3の映像データである。また俯瞰映像V3-2の映像データは、カメラマンを視認者と想定してカメラ2のビューファインダー等に表示させる俯瞰映像V3の映像データである。
Incidentally, in FIG. 3 and FIG. 4, the overhead view image V3 is denoted as "V3-1" and "V3-2".
The video data of the overhead image V3-1 is the video data of the overhead image V3 to be displayed on the GUI device 11 or the master monitor 14, with a director or the like assumed as the viewer. The video data of the overhead image V3-2 is the video data of the overhead image V3 to be displayed on the viewfinder of the camera 2, with a cameraman or the like assumed as the viewer.
 これら俯瞰映像V3-1、V3-2の映像データは同一内容の映像を表示させる映像データであるとしてもよい。これらはいずれも、少なくともビューフラスタム40を含む撮影対象空間8の俯瞰映像V3を表示させる映像データである。但し実施の形態では、これらが異なる表示内容を含む映像データとする場合についても説明する。 The video data for these overhead images V3-1 and V3-2 may be video data that displays images of the same content. Both of these are video data that display an overhead image V3 of the target space 8 that includes at least the view frustum 40. However, in the embodiment, a case will also be described in which these are video data that include different display contents.
In other words, the AR system 5 may generate video data resulting in an overhead video V3 with the same content regardless of the transmission destination, or it may generate, for example, video data of a first overhead video V3-1 to be transmitted to the GUI device 11 and video data of a second overhead video V3-2 to be transmitted to the camera 2 in parallel.
Furthermore, in the case of the system of FIG. 4, it is also assumed that the AR system 5 generates a plurality of second overhead videos V3-2 in parallel so that the content differs for each camera 2.

<2. Configuration of information processing device>
In the above imaging system, a configuration example of the information processing device 70 serving, for example, as the AR system 5 will be described with reference to FIG. 7.
The information processing device 70 is a device capable of information processing, in particular video processing, such as a computer device. Specific examples of the information processing device 70 include a personal computer, a workstation, a mobile terminal device such as a smartphone or a tablet, and a video editing device. The information processing device 70 may also be a computer device configured as a server device or a computing device in cloud computing.
The CPU 71 of the information processing device 70 executes various processes in accordance with programs stored in the ROM 72 or in a non-volatile memory unit 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or programs loaded from the storage unit 79 into the RAM 73. The RAM 73 also stores, as appropriate, data necessary for the CPU 71 to execute the various processes.
The CPU 71 is configured as a processor that performs various kinds of processing. The CPU 71 performs overall control processing and various kinds of arithmetic processing, and in the present embodiment it has the functions of a video processing unit 71a and a video generation control unit 71b for executing video processing as the AR system 5 based on a program.
The video processing unit 71a represents a processing function for performing various kinds of video processing. For example, it performs one or more of 3D model generation processing, rendering, video processing including color and brightness adjustment processing, video editing processing, and video analysis and detection processing.
The video processing unit 71a also performs processing for generating the overhead video V3 as video data that simultaneously displays, on a single screen, an overhead view of the shooting target space 8, a view frustum 40 presenting the shooting range of a camera 2 within the overhead video V3, and the video V1 captured by that camera 2.
The video generation control unit 71b in the CPU 71 performs processing for variably setting the display position of the captured video V1 to be displayed simultaneously on one screen in the overhead video V3 including the view frustum 40 generated by the video processing unit 71a, thereby controlling the generation of the video data by the video processing unit 71a. The video processing unit 71a generates the overhead video V3 including the view frustum 40 in accordance with the settings made by the video generation control unit 71b.
The video processing unit 71a may also perform, in parallel, a process of generating first video data that displays the view frustum 40 of a camera 2 within the shooting target space 8, and a process of generating second video data that displays the view frustum 40 within the shooting target space 8 in a display mode different from that of the video based on the first video data.
In this case, the first video data is, for example, the video data of the overhead video V3-1, and the second video data is, for example, the video data of the overhead video V3-2.
The functions of the video processing unit 71a and the video generation control unit 71b may be realized by a CPU separate from the CPU 71, a GPU (Graphics Processing Unit), a GPGPU (General-purpose computing on graphics processing units), an AI (artificial intelligence) processor, or the like.
The functions of the video processing unit 71a and the video generation control unit 71b may also be realized by a plurality of processors.
The CPU 71, the ROM 72, the RAM 73, and the non-volatile memory unit 74 are interconnected via a bus 83. An input/output interface 75 is also connected to this bus 83.
An input unit 76 consisting of operators and operation devices is connected to the input/output interface 75. For example, various operators and operation devices such as a keyboard, a mouse, keys, a trackball, a dial, a touch panel, a touch pad, and a remote controller are assumed as the input unit 76.
A user operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
A microphone is also assumed as the input unit 76. Voice uttered by the user can also be input as operation information.
The input/output interface 75 is also connected, either integrally or separately, to a display unit 77 formed of an LCD (Liquid Crystal Display), an organic EL (electro-luminescence) panel, or the like, and to an audio output unit 78 formed of a speaker or the like.
The display unit 77 is a display unit that performs various kinds of display, and is configured, for example, by a display device provided in the housing of the information processing device 70, or by a separate display device connected to the information processing device 70.
The display unit 77 displays various images, operation menus, icons, messages, and the like on the display screen based on instructions from the CPU 71, that is, it performs display as a GUI (Graphical User Interface).
The input/output interface 75 may also be connected to a storage unit 79 configured of an HDD (Hard Disk Drive), a solid-state memory, or the like, and to a communication unit 80.
The storage unit 79 can store various kinds of data and programs. A database can also be configured in the storage unit 79.
The communication unit 80 performs communication processing via a transmission path such as the Internet, and communication with various devices such as an external database, an editing device, or an information processing device by wired/wireless communication, bus communication, or the like.
For example, assuming the information processing device 70 serves as the AR system 5, the communication unit 80 performs communication with the CCU 3 and the switcher 13.
A drive 81 is also connected to the input/output interface 75 as required, and a removable recording medium 82 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory is mounted thereon as appropriate.
The drive 81 allows video data, various computer programs, and the like to be read from the removable recording medium 82. The read data is stored in the storage unit 79, and video and audio contained in the data are output by the display unit 77 and the audio output unit 78. Computer programs and the like read from the removable recording medium 82 are installed in the storage unit 79 as necessary.
In this information processing device 70, for example, software for the processing of the present embodiment can be installed via network communication by the communication unit 80 or via the removable recording medium 82. Alternatively, the software may be stored in advance in the ROM 72, the storage unit 79, or the like.
<3. Display of view frustum>
The display of the view frustum 40 will now be described. As described above, the AR system 5 generates the overhead video V3 and can transmit it to the viewfinder of the camera 2, the GUI device 11, or the like for display. The AR system 5 generates the video data of the overhead video V3 so that the view frustum 40 of the camera 2 is displayed within the overhead video V3.
FIG. 8 shows an example of the view frustum 40 displayed in the overhead video V3. FIG. 8 is an example of a CG image of the shooting target space 8 of FIG. 1 viewed from above, shown in simplified form for the purpose of explanation. An example of an overhead video V3 of a stadium is shown in FIG. 16, which will be described later.
The overhead video V3 of FIG. 8 includes images representing a background 31, such as a stadium, and persons 32, such as players. Although the camera 2 is shown in FIG. 8, this is for the purpose of explanation; the overhead video V3 may or may not include an image of the camera 2 itself.
The view frustum 40 visually presents the shooting range of the camera 2 within the overhead video V3, and has the shape of a quadrangular pyramid that spreads in the direction of the shooting optical axis from the position of the camera 2 within the overhead video V3, which serves as the frustum origin 46. For example, it is a quadrangular pyramid extending from the frustum origin 46 to the frustum far end surface 45.
The shape is a quadrangular pyramid because the image sensor of the camera 2 is quadrangular.
The spread of the quadrangular pyramid changes with the angle of view of the camera 2 at that point in time. The range within the quadrangular pyramid indicated by the view frustum 40 is therefore the shooting range of the camera 2.
In practice, the view frustum 40 may be represented, for example, as a quadrangular pyramid rendered in a certain semi-transparent color.
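As a minimal sketch of this geometry, the following assumes that the camera attitude from the metadata MT has already been converted to a rotation matrix and that the angle of view is given as a horizontal value; the function name and parameters are illustrative, not part of the embodiment.

import numpy as np

def frustum_far_corners(cam_pos, cam_rot, h_fov_deg, aspect, far_dist):
    # cam_pos: (3,) frustum origin 46; cam_rot: (3,3) rotation from the camera attitude
    # h_fov_deg: horizontal angle of view; aspect: sensor width/height; far_dist: drawing distance
    half_w = np.tan(np.radians(h_fov_deg) / 2.0)   # half-width per unit depth
    half_h = half_w / aspect                        # half-height per unit depth
    corners_cam = np.array([[sx * half_w, sy * half_h, 1.0]
                            for sx in (-1, 1) for sy in (-1, 1)])  # optical axis = +z
    # rotate into the CG-space frame and push out to the far plane
    return np.asarray(cam_pos) + far_dist * corners_cam @ np.asarray(cam_rot).T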
Inside the quadrangular pyramid, the view frustum 40 displays the focus plane 41 and the depth of field range 42 at that point in time. As the depth of field range 42, for example, the range from the depth near end surface 43 to the depth far end surface 44 is expressed in a semi-transparent color different from the rest.
The focus plane 41 is also expressed in a semi-transparent color different from the others.
The focus plane 41 indicates the depth position at which the camera 2 is focused at that point in time. That is, by displaying the focus plane 41, it can be confirmed that a subject at the same depth as the focus plane 41 (the distance in the depth direction as seen from the camera 2) is in focus.
The depth of field range 42 also makes it possible to confirm the range in the depth direction in which the subject is not blurred.
The in-focus depth and the depth of field vary with the focus operation and aperture operation of the camera 2. The focus plane 41 and the depth of field range 42 in the view frustum 40 therefore change from moment to moment.
By acquiring from the camera 2 the metadata MT including information such as the focal length, the aperture value, and the angle of view, the AR system 5 can set the spread of the quadrangular pyramid of the view frustum 40, the display position of the focus plane 41, the display position of the depth of field range 42, and the like. Furthermore, because the metadata MT includes the attitude information of the camera 2, the AR system 5 can set the direction of the view frustum 40 from the camera position (frustum origin 46) within the overhead video V3.
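The description does not fix a particular formula for placing the depth near end surface 43 and depth far end surface 44; as one possible sketch, the standard thin-lens depth-of-field limits could be computed from the metadata MT as follows (the circle-of-confusion value is an assumed constant).

def depth_of_field_limits(focal_len_mm, f_number, focus_dist_mm, coc_mm=0.03):
    # hyperfocal distance for the given focal length, aperture value and circle of confusion
    hyperfocal = focal_len_mm ** 2 / (f_number * coc_mm) + focal_len_mm
    near = focus_dist_mm * (hyperfocal - focal_len_mm) / (hyperfocal + focus_dist_mm - 2 * focal_len_mm)
    if focus_dist_mm >= hyperfocal:
        far = float('inf')               # everything beyond the near limit is acceptably sharp
    else:
        far = focus_dist_mm * (hyperfocal - focal_len_mm) / (hyperfocal - focus_dist_mm)
    return near, far                      # candidate depths for surfaces 43 and 44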
The AR system 5 then causes the video V1 captured by the camera 2 whose view frustum 40 is shown to be displayed in the overhead video V3 together with that view frustum 40.
That is, the AR system 5 generates the video of the CG space 30 to be used as the overhead video V3, composites into that video of the CG space 30 the view frustum 40 generated based on the metadata MT supplied from the camera 2, and further composites the video V1 captured by the camera 2. The video data of such a composite video is output as the overhead video V3.
Examples will now be described in which the view frustum 40 in the video of the CG space 30 and the captured video V1 are displayed simultaneously on one screen.
First, an example will be described in which the AR system 5 generates video data of an overhead video V3 in which the captured video V1 is displayed within the view frustum 40.
In other words, this is an example of generating video data in which the captured video V1 is arranged within the range of the view frustum 40; that is, video data in which the captured video V1 is displayed in a state of being arranged within the range of the view frustum 40.
FIG. 9 is an example in which the captured video V1 is displayed on the focus plane 41 within the view frustum 40. This makes it possible to view the video being captured at the focus position. The example of FIG. 9 is also one example of displaying the captured video V1 within the depth of field range 42.
FIG. 10 is an example in which the captured video V1 is displayed within the depth of field range 42 of the view frustum 40 but not on the focus plane 41. In the illustrated example, the captured video V1 is displayed on the depth far end surface 44.
Besides this, examples are also conceivable in which the captured video V1 is displayed on the depth near end surface 43, or at an intermediate depth position within the depth of field range 42.
FIG. 11 is an example in which the captured video V1 is displayed within the view frustum 40 at a position closer to the frustum origin 46 than the depth near end surface 43 of the depth of field range 42 (a surface 47 near the frustum origin). When displaying within the view frustum 40, the size of the captured video V1 becomes smaller the closer it is to the frustum origin 46, but displaying it on the surface 47 near the frustum origin in this way makes the focus plane 41, the depth of field range 42, and the like easier to see.
FIG. 12 is an example in which the captured video V1 is displayed within the view frustum 40 on the far side of the depth far end surface 44 of the depth of field range 42. Here, "far" means far as seen from the camera 2 (frustum origin 46).
In the illustrated example, the captured video V1 is displayed on the frustum far end surface 45, which is a position on the far side.
When the captured video V1 is displayed on the far side of the depth of field range 42 within the view frustum 40 in this way, the area of the captured video V1 can be made large. This is therefore suitable, for example, when it is desired to check the positions of the focus plane 41 and the depth of field range 42 while closely checking the content of the captured video V1.
The distance over which the view frustum 40 is drawn may be finite or infinite. As one example, the view frustum 40 may be drawn over a finite distance, such as the drawing distance d1 in FIG. 12. For example, the drawing distance d1 may be set to twice the distance from the frustum origin 46 to the focus plane 41.
In this way, the frustum far end surface 45 is fixed, so that the captured video V1 can be displayed in the widest area within the view frustum 40, as shown in FIG. 12.
On the other hand, the view frustum 40 may be drawn to infinity, as shown in FIG. 13, without setting a particular drawing distance. In that case, the frustum far end surface 45 is not always fixed, and the captured video V1 may be displayed at an indefinite position on the far side of the depth of field range 42.
Even when the view frustum 40 is drawn to infinity, its far side may be drawn up to the portion where it hits a wall or the like represented in the CG. In that case, the far end of the drawn range may be treated as the frustum far end surface 45.
FIGS. 14A and 14B show that, when the view frustum 40 is drawn up to the position of a wall W, the position at which it collides with the wall W becomes the frustum far end surface 45. In other words, the frustum far end surface 45 changes depending on the positional relationship with objects in the CG.
When the view frustum 40 is drawn to infinity in this way, it is conceivable that the far end of the range that can be drawn within the overhead video V3 is treated as the frustum far end surface 45 and the captured video V1 is displayed on that frustum far end surface 45.
Even when the view frustum 40 is drawn over a finite distance as in FIG. 12, it may hit the wall W before the drawing distance d1. In that case, the position of the collision with the wall W may be treated as the frustum far end surface 45.
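As a sketch of how the drawing distance of the frustum far end surface 45 could be decided under these rules, the following assumes the CG structures are approximated by axis-aligned boxes and that the finite drawing distance is twice the focus distance; the function names and the box representation are assumptions for illustration.

import numpy as np

def frustum_far_distance(origin, axis, focus_dist, boxes, infinite=False):
    # origin: frustum origin 46, axis: unit optical-axis direction, boxes: [(min_xyz, max_xyz), ...]
    d_limit = np.inf if infinite else 2.0 * focus_dist          # drawing distance d1 when finite
    hit = min((_ray_box_distance(origin, axis, b) for b in boxes), default=np.inf)
    return min(d_limit, hit)                                     # cut at the first collision (wall W, etc.)

def _ray_box_distance(origin, direction, box):
    # slab test: distance along the ray to an axis-aligned box, or inf if the ray misses it
    lo = np.asarray(box[0], float) - np.asarray(origin, float)
    hi = np.asarray(box[1], float) - np.asarray(origin, float)
    with np.errstate(divide='ignore', invalid='ignore'):
        t1, t2 = lo / np.asarray(direction, float), hi / np.asarray(direction, float)
    t_near = np.nanmax(np.minimum(t1, t2))
    t_far = np.nanmin(np.maximum(t1, t2))
    return t_near if (t_far >= max(t_near, 0.0) and t_near >= 0.0) else np.inf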
The examples so far display the captured video V1 within the view frustum 40, but the captured video V1 may also be displayed at a position outside the view frustum 40 on the same screen as the overhead video V3.
FIG. 15 collects four examples (captured videos V1w, V1x, V1y, and V1z) of display positions outside the view frustum 40. In particular, these four examples display the captured video V1 in the vicinity of the view frustum 40.
The captured video V1 may be displayed in the vicinity of the frustum far end surface 45, as with the captured video V1w.
The captured video V1 may also be displayed farther away than the frustum far end surface 45, as with the captured video V1x. When the view frustum 40 is drawn over a finite distance, this means a position beyond the drawing distance d1 (see FIG. 12).
The captured video V1 may also be displayed in the vicinity of the focus plane 41 (or of the depth of field range 42), as with the captured video V1y in FIG. 15. In this case, the focus plane 41 or the depth of field range 42, which are the parts of the view frustum 40 that a viewer tends to pay attention to, and the captured video V1 can easily be viewed together.
The captured video V1 may also be displayed in the vicinity of the camera 2 (or of the frustum origin 46), as with the captured video V1z. In this case, the relationship between the camera 2 and the video V1 captured by that camera 2 becomes easy to understand.
It is desirable for the viewer to be able to easily grasp the correspondence between the view frustum 40 of a camera 2 (or the camera 2 itself) and the video V1 captured by that camera 2. Displaying the captured video V1 in the vicinity of the view frustum 40 makes this relationship easy to grasp.
Particularly in sports video production and the like, it is assumed that the view frustums 40 of a plurality of cameras 2 are displayed within the overhead video V3 as shown in FIG. 16. In such a case, if the relationship between each view frustum 40 and each captured video V1 is not clear, the viewer may become confused. It is therefore advisable to display the captured video V1 of a given camera 2 in the vicinity of the view frustum 40 of that camera 2.
However, depending on structures and the like in the overhead video V3, the direction or angle of a view frustum 40, or the positional relationship between view frustums 40, there are cases where the captured video V1 cannot be displayed in the vicinity of its view frustum 40, or where the correspondence does not become clear.
In such cases, for example, the color of the frame of the captured video V1 may be matched with the semi-transparent color or the outline color of the corresponding view frustum 40 to indicate the correspondence.
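One way to keep that pairing consistent, sketched below with an assumed per-camera color table, is to derive both the frustum tint and the border of the captured video V1 from a single color entry.

# hypothetical per-camera colors (RGBA); alpha < 1.0 keeps the frustum semi-transparent
CAMERA_COLORS = {
    "camera_2a": (1.0, 0.3, 0.3, 0.35),
    "camera_2b": (0.3, 1.0, 0.3, 0.35),
    "camera_2c": (0.3, 0.5, 1.0, 0.35),
}

def style_for(camera_id):
    # returns (frustum_fill_rgba, image_border_rgba) sharing one hue
    r, g, b, a = CAMERA_COLORS[camera_id]
    return (r, g, b, a), (r, g, b, 1.0)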
In the example of FIG. 16, view frustums 40a, 40b, and 40c corresponding to three cameras 2 are displayed within the overhead video V3. Captured videos V1a, V1b, and V1c corresponding to these view frustums 40a, 40b, and 40c are also displayed.
The captured video V1a is displayed on the frustum far end surface 45 of the view frustum 40a. The captured video V1b is displayed in the vicinity of the frustum origin 46 of the view frustum 40b (in the vicinity of the camera position).
The captured video V1c is displayed in a corner of the screen; of the four corners of the overhead video V3, it is displayed in the upper left corner, which is the one closest to the view frustum 40c.
In the case of a mobile camera 2M, for example, the view frustum 40 varies more strongly than the view frustum 40 of a fixed camera 2. The captured video V1 of such a mobile camera may therefore be displayed at a fixed position such as a corner of the screen.
FIG. 16 above is an example of an overhead video V3 in which the shooting target space 8 is viewed obliquely from above, but the AR system 5 may also display a planar overhead video V3 viewed from directly above, as in FIG. 17.
In this example, the overhead video V3 displays cameras 2a, 2b, 2c, and 2d, their corresponding view frustums 40a, 40b, 40c, and 40d, and the respective captured videos V1a, V1b, V1c, and V1d.
The captured videos V1a, V1b, V1c, and V1d are displayed near the corresponding cameras 2a, 2b, 2c, and 2d, respectively.
The AR system 5 may also allow the viewpoint direction of the overhead video V3 shown in FIG. 16 or FIG. 17 to be changed continuously by the viewer operating the GUI device 11 or the like.
FIG. 18 is another example of the overhead video V3. In an overhead video V3 representing a motor racing circuit in CG, view frustums 40a and 40b are displayed, and the videos V1a and V1b captured by the cameras 2 of the view frustums 40a and 40b are displayed in a corner of the screen or near the camera positions.
For example, when shooting a race, it is difficult to tell from the captured video V1 alone which part of the course is being shot, but when the overhead video V3, the view frustum 40, and the captured video V1 are displayed simultaneously, the relationship becomes easy to understand.
Particularly when a plurality of cameras 2 are arranged along the course, displaying the respective view frustums 40 and captured videos V1 as in the illustrated example makes the shooting situation easy to understand.
As illustrated in FIGS. 9 to 18 above, the AR system 5 displays the view frustum 40 of the camera 2 in the CG space 30 and generates the video data of the overhead video V3 so that the video V1 captured by the camera 2 is displayed at the same time. When this overhead video V3 is displayed on the camera 2 or the GUI device 11, viewers such as the cameraman and the director can easily grasp the shooting situation.
This will be described more specifically.
Displaying the view frustum 40 and the captured video V1 in the CG space 30 makes the correspondence between the video V1 captured by the camera 2 and the spatial position clear, so the viewer can easily grasp the correspondence between the captured video V1 of the camera 2 and positions in the shooting target space 8.
The viewer can also easily grasp what each camera 2 is showing and where it is focused.
In particular, a viewer with little experience in shooting with a camera 2 or in video production may find it difficult to relate the position of a camera 2 to its captured video V1, and may go back and forth between the screen of the captured video V1 and the screen of the overhead video V3. Displaying the captured video V1 within the CG space 30 as a single screen eliminates such switching between screens.
Furthermore, from the positions of the cameras 2 and the captured video V1, it is possible to predict which camera 2 will show the intended subject next.
For example, if a player runs to the right in the video V1a captured by the camera 2a, it can be predicted that the player will next appear in the camera 2b. Such a prediction is difficult from the captured video V1a alone.
From the viewpoint of a director or the like using the GUI device 11, for example, viewing an overhead video V3 that displays the view frustums 40 and captured videos V1 of a plurality of cameras 2 makes it extremely easy to grasp the positional relationship of the cameras, the relationship of their shooting directions, the subjects being shot, and so on. This makes it possible to give appropriate instructions.
For the director, it is sufficient to understand the rough content of each captured video V1, so there is no problem even if each captured video V1 is relatively small within the overhead video V3. Conversely, because the view frustum 40 of each camera 2 is displayed in the CG space 30, the director can check and simulate compositions, standing positions, and camera positions while comprehensively considering the situation of each camera 2.
For the cameraman, the depth of field range 42 of the view frustum 40 can be referred to when performing the focus operation.
Also, by checking the view frustum 40 of the camera 2 that he or she is operating, the cameraman can easily confirm the location and direction being shot within the overhead video V3 of the shooting target space 8 represented in CG.
The cameraman can also look at the view frustums 40 and captured videos V1 of other cameras 2 and reflect them in the operation of his or her own camera. Because the relationship with what the other cameras 2 are shooting, their subject directions, and so on can also be grasped, shooting can be performed that is preferable in relation to the other cameras 2; for example, checking the position and angle of view being shot by another camera 2 and shooting from a different position or angle of view with one's own camera 2.
From the viewpoint of operating staff who remotely operate a camera 2, for example performing the focus operation of a mobile camera 2 from a remote location, this is convenient when the on-site situation is difficult to see because of the remote operation. That is, the overhead video V3 increases the amount of available information (captured videos V1, positions, and the like), making it easier to grasp the on-site situation.
FIGS. 9 to 18 showed various display positions of the captured video V1 as examples of displaying the captured video V1 together with the view frustum 40, and it is preferable that this display position be changed appropriately according to the user's intention or an automatic determination.
In the following, processing examples of the AR system 5, including changing the display settings of the captured video V1, will be described.
FIG. 19 is a processing example of the AR system 5 that generates the video data of the overhead video V3. The video data of the overhead video V3 in this case is video data in which the view frustum 40 and the captured video V1 are composited into the CG space 30 corresponding to the shooting target space 8, that is, video data for performing display as in FIGS. 9 to 18.
The AR system 5 performs the processing of steps S101 to S107 in FIG. 19, for example, for every frame of the video data of the overhead video V3. This processing can be considered as control processing of the CPU 71 (the video processing unit 71a and the video generation control unit 71b) of the information processing device 70 of FIG. 7 serving as the AR system 5.
In step S101, the AR system 5 sets up the CG space 30. For example, it sets the viewpoint position for the CG space 30 corresponding to the shooting target space 8 and renders the video of the CG space 30 from that viewpoint position. If there is no change from the previous frame in the viewpoint position for the CG space 30 or in the video content, the CG space video of the previous frame may be used for the current frame as well.
In step S102, the AR system 5 inputs the captured video V1 and the metadata MT from the camera 2. That is, it acquires the captured video V1 of the current frame and the attitude information, focal length, angle of view, aperture value, and the like of the camera 2 at that frame timing.
When one AR system 5 displays the view frustums 40 and captured videos V1 of a plurality of cameras 2 as in FIG. 4, the AR system 5 inputs the captured video V1 and metadata MT of each camera 2.
When, as in FIG. 3, there are a plurality of camera systems 1 in which a camera 2 and an AR system 5 correspond one to one, and each generates an overhead video V3 including a plurality of view frustums 40 and captured videos V1, these AR systems 5 may cooperate so as to share the metadata MT and captured videos V1 of their respective cameras 2.
In step S103, the AR system 5 generates the view frustum 40 for the current frame. From the metadata MT acquired in step S102, the AR system 5 sets the direction of the view frustum 40 within the CG space 30 according to the attitude of the camera 2, the quadrangular pyramid shape according to the angle of view, the positions of the focus plane 41 and the depth of field range 42 based on the focal length and aperture value, and the like, and generates the image of the view frustum 40 according to these settings.
When displaying the view frustums 40 of a plurality of cameras 2, the AR system 5 generates the image of the view frustum 40 according to the metadata MT of each camera 2.
In step S104, the AR system 5 sets the display position of the captured video V1 acquired in step S102. Various examples of this processing will be described later.
In step S105, the AR system 5 composites the view frustums 40 and captured videos V1 corresponding to one or more cameras 2 into the CG space 30 that forms the overhead video V3, and generates one frame of video data of the overhead video V3.
In step S106, the AR system 5 then outputs the one frame of video data of the overhead video V3.
The above processing is repeated until the display of the view frustum 40 and the captured video V1 ends. As a result, an overhead video V3 such as those of FIGS. 9 to 18 is displayed on the GUI device 11 or the camera 2.
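The per-frame flow of FIG. 19 can be sketched roughly as follows; the objects and method names (ar_system, cam, and so on) are placeholders for illustration, not an API defined by the embodiment.

def generate_overhead_frame(ar_system, cameras):
    cg = ar_system.setup_cg_space()                        # S101: viewpoint and CG space 30
    for cam in cameras:
        v1, mt = cam.get_frame_and_metadata()              # S102: captured video V1 and metadata MT
        frustum = ar_system.build_frustum(mt)              # S103: view frustum 40 from attitude, angle of view, focus, aperture
        pos = ar_system.set_display_position(v1, frustum)  # S104: display position of V1
        cg.composite(frustum, v1, pos)                     # S105: composite into the CG space 30
    return cg.render()                                     # S106: one frame of the overhead video V3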
Examples of the display position setting of the captured video V1 in step S104 will now be described.
FIGS. 20, 21, and 22 are examples in which the display position of the captured video V1 is set in a fixed manner, while FIGS. 23 and 24 are examples in which the display position of the captured video V1 is set variably.
FIGS. 20 to 24 below are examples of the display position setting of the captured video V1 corresponding to one camera 2. When the view frustums 40 and captured videos V1 of a plurality of cameras 2 are displayed, processing such as that of FIGS. 20 to 24 may be performed for each camera 2. The same display position setting processing may be performed for every camera 2, or different display position setting processing may be performed for each.
First, FIG. 20 shows display position setting processing for the case where the captured video V1 is displayed on the focus plane 41 as in FIG. 9.
In step S120, the AR system 5 determines the size and shape of the focus plane 41 of the view frustum 40 generated in step S103 of FIG. 19 for the current frame. In step S121 of FIG. 20, the AR system 5 sets the size and shape of the captured video V1 so as to match the focus plane 41.
The shape of the captured video V1 composited within a view frustum 40 may be the cross-sectional shape of that view frustum 40. For example, the shape of the focus plane 41 differs depending on the viewpoint of the overhead video V3 and on the position and direction of the displayed view frustum 40, but it may be taken as the shape of a cross section cut at the focus plane 41 of the view frustum 40 perpendicular to the optical axis of the camera 2 in that frame.
Accordingly, when the captured video V1 is displayed within the view frustum 40, the captured video V1 is deformed into the cross-sectional shape perpendicular to the optical axis and then composited.
However, it does not necessarily have to be displayed as a cross section perpendicular to the optical axis; it may be displayed within the view frustum 40 as a cross section that is not perpendicular to the optical axis of the camera 2.
After the above processing, when the process proceeds to step S105 of FIG. 19, the size and shape of the captured video V1 are adjusted, and an overhead video V3 is generated in which the captured video V1 is composited onto the focus plane 41 of the view frustum 40.
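The size of that cross section follows directly from the angle of view and the depth; a small sketch (the parameter names and values are illustrative) is:

import numpy as np

def cross_section_size(h_fov_deg, aspect, depth):
    # width and height of the frustum cross section at a given depth along the optical axis,
    # e.g. the depth of the focus plane 41, used to scale the captured video V1
    width = 2.0 * depth * np.tan(np.radians(h_fov_deg) / 2.0)
    return width, width / aspect

# e.g. a camera with a 40-degree horizontal angle of view and a 16:9 sensor, focused at 25 m
w, h = cross_section_size(h_fov_deg=40.0, aspect=16 / 9, depth=25.0)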
FIG. 21 shows display position setting processing for the case where the captured video V1 is displayed on the depth far end surface 44 as in FIG. 10.
In step S130, the AR system 5 determines the size and shape of the depth far end surface 44 of the view frustum 40 generated in step S103 for the current frame.
In step S131, the AR system 5 sets the size and shape of the captured video V1 so as to match the size of the depth far end surface 44.
As a result, when the process proceeds to step S105 of FIG. 19, the size and shape of the captured video V1 are adjusted, and an overhead video V3 is generated in which the captured video V1 is composited onto the depth far end surface 44 of the view frustum 40.
FIG. 22 shows display position setting processing for the case where the captured video V1 is displayed in the vicinity of the frustum origin 46 as in FIG. 11.
In step S140, the AR system 5 sets the display position of the captured video V1 within the view frustum 40 generated in step S103 for the current frame; that is, it sets a position on the frustum origin 46 side of the depth of field range 42. The position in this case may be set as a fixed distance from the frustum origin 46, or may be set, for example, as a position at which at least a minimum required area is obtained for the cross section of the quadrangular pyramid shape corresponding to the angle of view.
In step S141, the AR system 5 determines the cross section at the set display position, that is, the size and shape of the display area.
In step S142, the AR system 5 sets the size and shape of the captured video V1 so as to match the cross section at the determined display position.
As a result, when the process proceeds to step S105, the size and shape of the captured video V1 are adjusted, and an overhead video V3 is generated in which the captured video V1 is composited at a position near the frustum origin 46 of the view frustum 40.
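If the "minimum required area" criterion is interpreted as a minimum cross-section width, one conceivable way to pick the depth of the surface 47 near the frustum origin is the following sketch (the minimum width and the cap at the depth near end surface 43 are assumptions).

import numpy as np

def near_origin_depth(h_fov_deg, min_width, depth_near_end):
    # smallest depth at which the frustum cross section reaches min_width,
    # capped so the surface stays on the frustum-origin side of the depth near end surface 43
    d = min_width / (2.0 * np.tan(np.radians(h_fov_deg) / 2.0))
    return min(d, depth_near_end)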
Next, FIG. 23 shows display position setting processing in which the display position of the captured video V1 is changed according to an operation by a user such as a cameraman or a director.
In step S150, the AR system 5 checks whether or not a display position change operation has been performed for the captured video V1. For example, the GUI device 11 and the camera 2 are configured so that the director, the cameraman, or the like can perform a display position change operation through a predetermined operation. The AR system 5 checks the received control signals CS for the operation information of such a display position change operation.
For example, an operation may be enabled that switches the display position setting within the view frustum 40, among the "focus plane 41", the "depth far end surface 44", the "surface 47 near the frustum origin", and the "frustum far end surface 45". An operation interface may be provided in which the surfaces are switched by a toggle operation, or an operation interface may be provided in which each surface can be designated directly.
The switching of the display position setting may also include positions outside the view frustum 40, not only positions within it.
For example, an operation may be enabled that switches among the "focus plane 41", the "frustum far end surface 45", a "screen corner", and "near the camera".
Furthermore, the switching of the display position setting may be performed only among positions outside the view frustum 40; for example, an operation may be enabled that switches among "near the focus plane 41", "near the frustum far end surface 45", a "screen corner", and "near the camera 2".
In FIGS. 9 to 18 above, various examples of display positions of the captured video V1 were given. Within the view frustum 40, the "focus plane 41", the "depth near end surface 43", the "depth far end surface 44", the "surface 47 near the frustum origin", and the "frustum far end surface 45" were given as examples. Outside the view frustum 40, a "screen corner", "near the camera", "near the focus plane 41", "farther than the frustum far end surface 45", and the like were given as examples.
Among these, the positions that the user can select by a switching operation may be made settable.
In addition, for example, the display position within the depth of field range 42 or the display position near the focus plane 41 may be made adjustable by the user.
If no display position change operation is confirmed at the time of processing the current frame, the AR system 5 proceeds to step S151, maintains the same display position setting as for the previous frame, and ends the processing of FIG. 23.
As a result, when the process proceeds to step S105 of FIG. 19, the frame of the current overhead video V3 is generated with the captured video V1 displayed at the same position as in the previous frame.
If a display position change operation is confirmed at the time of processing the current frame, the AR system 5 proceeds from step S150 to step S152 of FIG. 23 and changes the display position setting in accordance with the operation; for example, a setting that had been the focus plane 41 up to that point is switched to the frustum far end surface 45.
In step S153, the AR system 5 branches the processing depending on whether or not the changed position setting is outside the view frustum 40.
If the changed position setting is a position within the view frustum 40, the AR system 5 proceeds to step S154 and determines the size and shape of the display area as the cross section of the view frustum 40 at the set position.
Then, in step S156, the AR system 5 sets the size and shape of the captured video V1 so as to match the cross section at the determined display position.
As a result, when the process proceeds to step S105 of FIG. 19, the size of the captured video V1 is adjusted, and an overhead video V3 is generated in which the captured video V1 is composited at a position within the view frustum 40 different from that of the previous frame.
If the position setting changed in response to the operation is outside the view frustum 40, the AR system 5 proceeds from step S153 to step S155 of FIG. 23 and sets the display size and shape of the captured video V1 at the newly set position. Outside the view frustum 40, the shape of the composited captured video V1 is not limited to the cross-sectional shape of the view frustum 40; it may be, for example, a rectangle, or, if near the view frustum 40, a parallelogram corresponding to the angle of the view frustum 40. The size of the captured video V1 can also be set relatively freely, but it is desirable to set it appropriately in view of the other contents displayed on the screen.
As a result, when the process proceeds to step S105 of FIG. 19, the size and shape of the captured video V1 are adjusted, and an overhead video V3 is generated in which the captured video V1 is composited at a position outside the view frustum 40 different from that of the previous frame.
In the above processing example of FIG. 23, the display position can also be changed to positions outside the view frustum 40, but the display position change may be allowed only within the view frustum 40. In that case, steps S153 and S155 are unnecessary.
The display position change may also be allowed only outside the view frustum 40. In that case, steps S153 and S154 are unnecessary, and the process may proceed from step S152 to step S155.
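A rough sketch of this operation-driven switching (steps S150 to S152) is shown below; the position names and the field used to carry the operation information in the control signal CS are illustrative assumptions.

POSITIONS = ["focus_plane", "depth_far_end", "near_frustum_origin",
             "frustum_far_end", "screen_corner", "near_camera"]

def update_display_position(current, control_signal):
    # control_signal: dict carrying operation information from the control signal CS (assumed layout)
    op = control_signal.get("display_position_change")
    if op is None:
        return current                                           # S151: keep the previous setting
    if op == "toggle":                                           # toggle-style operation interface
        return POSITIONS[(POSITIONS.index(current) + 1) % len(POSITIONS)]
    return op if op in POSITIONS else current                    # direct designation of a position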
Next, FIG. 24 shows a processing example in which the AR system 5 automatically changes the display position of the captured video V1.
In step S160, the AR system 5 performs a display position change determination.
The display position change determination is a process of determining whether or not to change, in the current frame, the display position setting of the captured video V1 used in the previous frame.
Examples of this determination process include the following (P1), (P2), and (P3).
(P1) Determination based on the positional relationship between the view frustum 40 and objects in the overhead video V3
(P2) Determination based on the angle of the view frustum 40 within the overhead video V3
(P3) Determination based on the viewpoint position of the overhead video V3
First, consider an example of (P1).
For example, a collision between the view frustum 40 and the ground, a wall, or the like in the overhead image V3 is determined. Fig. 25 shows a state in which the frustum far end surface 45 of a finite-distance view frustum 40 has collided with the ground GR and is partially embedded in it. Fig. 26 shows a state in which the far end side of a finite- or infinite-distance view frustum 40 has collided with a structure CN, so that the portion beyond it can no longer be displayed.
Suppose, for example, that up to the previous frame the captured image V1 was displayed within the view frustum 40 on or near the frustum far end surface 45, and in the current frame the far end side of the view frustum 40 has collided with and become embedded in an object, as in Figs. 25 and 26. In such a case, displaying the captured image V1 with the same setting as before is no longer appropriate; part of the captured image V1 may be cut off, or the whole image may become invisible. The AR system therefore determines that the display position needs to be changed.
It may also be determined that the display position needs to be changed when the pyramidal shape of the view frustum 40 widens or its direction changes due to a change in the angle of view or shooting direction of the camera 2, and the display position of the captured image V1 used so far is judged to be no longer appropriate from the positional relationship between a specific part of the view frustum 40 (such as the frustum far end surface 45 or the focus plane 41) and other displayed objects.
Other view frustums 40 may also be treated as objects in the overhead image V3, and a change of display position may be determined to be necessary when the display position of the captured image V1 is judged to be inappropriate because of its positional relationship with another view frustum 40.
When the positional relationship with other view frustums 40 is considered, it may also be determined that the display position needs to be changed when, as in Fig. 17, multiple view frustums 40 overlap and the relationship between each view frustum 40 and its captured image V1 becomes difficult to understand.
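One way to realize the (P1) determination, sketched below under stated assumptions, is a coarse intersection test between the frustum geometry and the scene geometry of the overhead image V3. All names here (Frustum, p1_needs_reposition, the y-up ground plane) are illustrative placeholders rather than the embodiment's actual implementation.

```
# Minimal sketch of the (P1) collision-based determination (assumed helper names).
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Frustum:
    corners: Sequence[Vec3]   # 8 corners of the (truncated) view frustum 40 in CG-space coordinates

def below_ground(p: Vec3, ground_height: float = 0.0) -> bool:
    # Assumption: the ground GR is the plane y = ground_height (y-up coordinates).
    return p[1] < ground_height

def intersects_structure(frustum: Frustum, structure_aabbs: List[Tuple[Vec3, Vec3]]) -> bool:
    # Very coarse test: any frustum corner inside an axis-aligned bounding box of a structure CN.
    for lo, hi in structure_aabbs:
        for p in frustum.corners:
            if all(lo[i] <= p[i] <= hi[i] for i in range(3)):
                return True
    return False

def p1_needs_reposition(frustum: Frustum, structure_aabbs: List[Tuple[Vec3, Vec3]]) -> bool:
    """Return True when the far end of the frustum is embedded in the ground or a structure."""
    far_corners = frustum.corners[4:]  # assume the last four corners form the far end surface 45
    if any(below_ground(p) for p in far_corners):
        return True
    return intersects_structure(frustum, structure_aabbs)
```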
Next, the example of (P2) takes into account the visibility of the captured image V1 when it is fitted to the cross-sectional shape of the view frustum 40.
Depending on the direction of the view frustum 40 in the overhead image V3, its cross-sectional shape may no longer be suitable as a display surface. The shape and direction of the view frustum 40 change according to the angle of view and shooting direction of the camera 2, and the angle at which the view frustum 40 is displayed in the overhead image V3 changes accordingly. In other words, the angle between the viewing direction of the overhead image V3 as a whole and the axial direction of the view frustum 40 changes. This angle is the angle between the normal direction of the display screen, as seen along the line of sight from the viewpoint set for the overhead image V3 at a given time, and the axial direction of the displayed view frustum 40. The axial direction of the view frustum 40 is the direction of the perpendicular drawn from the frustum origin 46 to the frustum far end surface 45.
For example, Fig. 27 shows captured images V1a, V1b, and V1c corresponding to view frustums 40a, 40b, and 40c. In this case, because of the angle of the view frustum 40a in the overhead image V3, the captured image V1a displayed to match its cross-sectional shape becomes a parallelogram with a large difference between its acute and obtuse angles, and its visibility is poor if left as it is. In such a case, the display position may be changed as indicated by the dashed arrow so that the image is displayed at the position of the captured image V1a'.
In this way, it is conceivable to determine that the display position needs to be changed when the skew of the captured image V1, that is, the difference between its acute and obtuse angles, becomes equal to or greater than a predetermined value.
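A possible realization of the (P2) criterion is to measure the skew of the on-screen quadrilateral onto which the captured image V1 would be mapped and compare it against a threshold. The following is a minimal sketch; the 50-degree threshold and the helper names are assumptions.

```
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]

def interior_angle(prev_pt: Vec2, pt: Vec2, next_pt: Vec2) -> float:
    """Interior angle (degrees) at pt of the polygon prev_pt -> pt -> next_pt."""
    ax, ay = prev_pt[0] - pt[0], prev_pt[1] - pt[1]
    bx, by = next_pt[0] - pt[0], next_pt[1] - pt[1]
    cos_a = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def p2_needs_reposition(quad: Sequence[Vec2], max_skew_deg: float = 50.0) -> bool:
    """quad: 4 screen-space corners of the cross section onto which V1 would be mapped."""
    angles = [interior_angle(quad[i - 1], quad[i], quad[(i + 1) % 4]) for i in range(4)]
    skew = max(angles) - min(angles)   # 0 for a rectangle, large for a heavily sheared parallelogram
    return skew >= max_skew_deg
```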
The example of (P3) is based on the same idea as (P2).
The viewpoint position of the overhead image V3 can be changed in response to operations by the director or others. For example, the viewpoint position of the overhead image V3 may be changed by an operation from the state shown in Fig. 16 to that shown in Fig. 27.
In the case of Fig. 27, the visibility of the captured image V1a is poor, as described above. That is, even if the angle of view and shooting direction of the camera 2 do not change, a change of viewpoint of the overhead image V3 changes the shapes of the rendered view frustum 40 and captured image V1, which may reduce visibility. In such a case as well, it is determined that the display position needs to be changed when, for example, the resulting difference between the acute and obtuse angles of the captured image V1 becomes equal to or greater than a predetermined value.
A change of viewpoint of the overhead image V3 may also make the captured image V1 smaller. It may be determined that the display position needs to be changed when moving the viewpoint used for rendering the overhead image V3 farther away makes the size of the captured image V1 equal to or smaller than a predetermined size.
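The size condition can be checked in a similar way, for example by comparing the projected screen area of the composited image against a minimum area. A small sketch, with an assumed pixel-area threshold:

```
def polygon_area(quad) -> float:
    """Shoelace area of the screen-space quadrilateral onto which V1 is mapped."""
    area = 0.0
    for i in range(len(quad)):
        x0, y0 = quad[i]
        x1, y1 = quad[(i + 1) % len(quad)]
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0

def p3_needs_reposition(quad, min_area_px: float = 40_000.0) -> bool:
    # Assumed threshold: roughly a 200 x 200 pixel region of the overhead image V3.
    return polygon_area(quad) <= min_area_px
```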
In step S160 of Fig. 24, the AR system 5 performs a display position change determination such as the ones described above, and in step S161 the process branches depending on whether a change is required.
If it is determined that no change is necessary, the AR system 5 proceeds to step S162, maintains the same display position setting as in the previous frame, and ends the processing of Fig. 24.
As a result, when the process proceeds to step S105 in Fig. 19, the current frame of the overhead image V3 is generated with the captured image V1 displayed at the same position as in the previous frame.
If the display position change determination finds that a change is required, the AR system 5 proceeds from step S161 to step S163 in Fig. 24 and selects the destination of the display position setting.
This destination may be decided according to the reason why the display position change determination found a change to be necessary.
For example, in the case of (P1) above, where the cause is a collision with an object in the overhead image V3, the position may be changed to one unaffected by the collision, such as the near-origin surface 47 of the frustum or a corner of the screen.
In the cases of (P2) and (P3) above, where the visibility of the captured image V1 deteriorates, a position outside the view frustum 40 that allows a geometrically well-visible display, such as a corner of the screen or the vicinity of the focus plane 41, may be selected.
The type information of the camera 2 can also be used to set the destination of the captured image V1. For example, if the camera requiring the change is a mobile camera 2M, the destination can be a corner of the screen. The captured image V1 of the mobile camera 2M may, for example, be displayed within the view frustum 40 while the camera is not moving and moved to a corner of the screen while it is moving, because while the camera is moving the view frustum 40 also moves substantially within the overhead image V3 and the visibility of the captured image V1 inside it decreases.
In step S164, the AR system 5 branches the process depending on whether the selected destination is outside the view frustum 40.
If the destination is a position within the view frustum 40, the AR system 5 proceeds to step S165 and determines the size and shape of the display area as the cross section of the view frustum 40 at the set position. Then, in step S167, the AR system 5 sets the size and shape of the captured image V1 so that it matches the cross section at the determined display position.
As a result, when the process proceeds to step S105 in Fig. 19, the size of the captured image V1 is adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position within the view frustum 40 that differs from the previous frame.
If the position selected as the destination is outside the view frustum 40, the AR system 5 proceeds to step S166 in Fig. 24 and sets the display size and shape of the captured image V1 at the newly set position (similarly to step S155 in Fig. 23).
As a result, when the process proceeds to step S105 in Fig. 19, the size and shape of the captured image V1 are adjusted, and an overhead image V3 is generated in which the captured image V1 is composited at a position outside the view frustum 40 that differs from the previous frame.
In the processing example of Fig. 24 above, an example in which the destination of the display position change is limited to positions within the view frustum 40 is also conceivable; in that case, steps S164 and S166 are unnecessary.
The destination of the display position change may also be limited to positions outside the view frustum 40. In that case, steps S164 and S165 are unnecessary, and the process may proceed from step S163 to step S166.
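Taken together, the automatic repositioning of Fig. 24 amounts to a per-frame decision that maps the outcomes of the (P1) to (P3) checks, plus the camera type, to a destination. The sketch below only illustrates that mapping; the destination names and their assignment to the individual causes are assumptions based on the examples above.

```
from enum import Enum, auto

class Destination(Enum):
    KEEP = auto()            # step S162: keep the previous display position setting
    NEAR_ORIGIN = auto()     # near-origin surface 47 (inside the view frustum 40)
    SCREEN_CORNER = auto()   # a corner of the screen (outside the view frustum 40)

def decide_display_position(collided: bool, poor_visibility: bool, camera_is_moving: bool) -> Destination:
    """One pass of Fig. 24: steps S160/S161 (determination) and S163 (destination selection)."""
    if collided:               # (P1): frustum far end embedded in the ground or a structure
        return Destination.NEAR_ORIGIN
    if poor_visibility:        # (P2)/(P3): skewed or too-small on-screen shape of V1
        return Destination.SCREEN_CORNER
    if camera_is_moving:       # mobile camera 2M: park V1 in a screen corner while moving
        return Destination.SCREEN_CORNER
    return Destination.KEEP    # step S162

# Example: a collision forces the image back toward the frustum origin.
assert decide_display_position(True, False, False) is Destination.NEAR_ORIGIN
```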
Examples of displaying the captured image V1 together with the view frustum 40 have been described above with reference to Figs. 8 to 24. The view frustum 40 and the captured image V1 may be displayed together at all times, or only temporarily.
For example, it is also conceivable to normally display the view frustum 40 but not the captured image V1. In that case, when the cameraman or director performs an operation to select a view frustum 40, the captured image V1 corresponding to the selected view frustum 40 may be displayed.
Alternatively, the cameraman or director may be able to switch between a mode in which only the view frustum 40 is displayed and a mode in which the view frustum 40 and the captured image V1 are displayed simultaneously.
<4. Example screens for the cameraman and the director>
In the system of this embodiment, an overhead image V3-1 is displayed for the director on the GUI device 11, and an overhead image V3-2 is displayed for the cameraman on a display unit such as the viewfinder of the camera 2.
In this case, the overhead images V3-1 and V3-2 are both images showing the view frustums 40 in the CG space 30 that simulates the shooting target space 8, but they are rendered in different display modes. This makes it possible to provide information suited to each role, such as director or cameraman.
[4-1: Highlighted display]
Various examples are conceivable in which the overhead images V3-1 and V3-2 are rendered in different modes.
First, referring to Figs. 28 to 32, an example will be described in which the AR system 5 displays, in the director's overhead image V3-1, the view frustum 40 of a specific camera whose captured image V1 contains the subject of interest in a display mode different from that of the other view frustums 40. In particular, an example in which a certain view frustum 40 is highlighted will be described. In the cameraman's overhead image V3-2, on the other hand, no such highlighting is performed.
Fig. 28 shows an example in which the overhead image V3-1 is displayed as the device display image 51 on the GUI device 11.
This overhead image V3-1 includes the CG space 30 overlooking the shooting target space 8, for example a stadium, and displays the view frustums 40 of the plurality of cameras 2 shooting in the stadium. View frustums 40a, 40b, and 40c for three cameras 2 are displayed.
In this example, the view frustum 40a is displayed in a mode different from that of the other view frustums 40b and 40c. In this particular case, the view frustum 40a is highlighted so that it stands out more than the other view frustums 40b and 40c.
As described above, the shape and direction of the view frustum 40 and the display positions of the focus plane 41, the depth of field range 42, and so on are determined by the angle of view, shooting direction, focal length, depth of field, and other conditions of the camera 2 at that time, so differences in these are not included in the differences in display mode referred to here. A difference in the display mode of the view frustum 40 does not mean a difference determined by the state of the camera 2, such as its angle of view or shooting direction, but a difference in how the view frustum 40 itself is drawn: for example, differences in color, luminance, density, type or thickness of the outline, rendering of the faces of the pyramid, normal versus blinking display, or blinking period.
In the example of Fig. 28, when the view frustums 40 are normally displayed as semi-transparent white, the view frustum 40a is highlighted, for example, as semi-transparent red. The view frustum 40a is thereby emphasized for the director and others.
One condition for such highlighting is that the subject of interest is currently being shot.
Various settings are possible for the subject of interest; in a sports broadcast, examples include a specific player, a player involved with a game object such as the ball, or the game object itself.
For example, the AR system 5 with the configuration of Fig. 4 determines, by image recognition processing on the captured image V1 of each camera 2, whether a subject of interest such as a specific player is being shot.
For example, it determines whether the captured image V1 of a camera 2 shows the subject of interest as in Fig. 29. The AR system 5 then generates the overhead image V3-1 so that the view frustum 40 of the camera 2 showing the subject of interest is displayed in a highlighted mode.
However, if highlighting is performed simply on the condition that the subject of interest is shown, many view frustums 40 may end up highlighted, which reduces the value of the highlighting. A processing example is therefore described below in which the camera 2 whose captured image V1 is most suitable as an image of the subject of interest is selected.
The following processing examples in Figs. 30, 31, 32, 34, 36, 38, 41, 43, 45, 48, and 52 are easiest to understand for a system in which the AR system 5 handles all the cameras 2 in an integrated manner, as in Fig. 4. They can, however, also be implemented with the configuration of Fig. 3 by providing a plurality of camera systems 1 and having the AR systems 5 of the camera systems 1 cooperate.
Fig. 30 shows a processing example of the AR system 5 that generates the video data of the overhead images V3-1 and V3-2. The video data of the overhead images V3-1 and V3-2 here means video data in which the view frustums 40 are composited into the CG space 30 corresponding to the shooting target space 8.
As described above, the overhead images V3-1 and V3-2 may further have the captured images V1 composited into them.
The AR system 5 performs the processing of steps S101 to S107 in Fig. 30 for each frame of the video data of the overhead images V3-1 and V3-2, for example. This processing can be regarded as control processing by the CPU 71 (video processing unit 71a) of the information processing device 70 of Fig. 7 serving as the AR system 5.
In step S101, the AR system 5 sets up the CG space 30. For example, it sets the viewpoint position for the CG space 30 corresponding to the shooting target space 8 and renders the image of the CG space 30 from that viewpoint position. If there is no change in the viewpoint position or image content of the CG space 30 from the previous frame, the CG space image of the previous frame may be used for the current frame as well.
In step S102, the AR system 5 inputs the captured image V1 and the metadata MT from the camera 2. That is, it acquires the captured image V1 of the current frame and the attitude information, focal length, angle of view, aperture value, and so on of the camera 2 at that frame timing.
When the view frustums 40 and captured images V1 are displayed for a plurality of cameras 2, the AR system 5 inputs the captured image V1 and metadata MT of each camera 2.
In step S201, the AR system 5 generates the cameraman's view frustums 40 for the current frame. The cameraman's view frustum 40 is the view frustum 40 to be composited into the overhead image V3-2 that is transmitted to and displayed on the camera 2.
In the case of the AR system 5 with the configuration of Fig. 4, a cameraman's view frustum 40 is generated separately for each camera 2.
In the case of the AR system 5 with the configuration of Fig. 3, the AR system 5 in a camera system 1 generates the view frustum 40 to be displayed on the camera 2 of that camera system 1.
From the metadata MT acquired in step S102, the AR system 5 sets the direction of the view frustum 40 in the CG space 30 according to the attitude of the camera 2, the pyramid shape according to the angle of view, the positions of the focus plane 41 and the depth of field range 42 based on the focal length and aperture value, and so on, and generates the image of the view frustum 40 according to these settings.
When the view frustums 40 are displayed for a plurality of cameras 2, the AR system 5 generates the image of each view frustum 40 according to the metadata MT of the corresponding camera 2.
In step S202, the AR system 5 generates the director's view frustums 40 for the current frame. The director's view frustum 40 is the view frustum 40 to be composited into the overhead image V3-1 that is transmitted to and displayed on the GUI device 11.
Basically, as in step S201, the image of each view frustum 40 is generated based on the attitude (shooting direction), angle of view, focal length, and aperture value of each camera 2.
However, the cameraman's view frustums 40 generated in step S201 and the director's view frustums 40 generated in step S202 may be rendered in different display modes; specific examples are described later.
In step S203, the AR system 5 composites the view frustums 40 generated for the cameraman into the CG space 30 serving as the overhead image V3-2, and generates one frame of video data of the overhead image V3-2. The captured image V1 may also be composited in correspondence with each view frustum 40.
In step S204, the AR system 5 composites the view frustums 40 generated for the director into the CG space 30 serving as the overhead image V3-1, and generates one frame of video data of the overhead image V3-1. The captured image V1 may also be composited in correspondence with each view frustum 40.
Then, in step S205, the AR system 5 outputs the one frame of video data of each of the overhead images V3-1 and V3-2.
The above processing is repeated until the display of the view frustums 40 ends.
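Structurally, the per-frame flow of Fig. 30 builds two renderings of the same CG space from the same camera metadata, differing only in how each view frustum is styled. The following sketch models just that structure; all class and function names are placeholders, not the actual implementation.

```
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CameraMeta:
    camera_id: str
    attitude: tuple          # shooting direction (e.g. yaw, pitch, roll)
    angle_of_view: float
    focal_length: float
    aperture: float

@dataclass
class FrustumDrawable:
    camera_id: str
    style: str               # e.g. "white_translucent" or "red_translucent"
    meta: CameraMeta

def build_frustum(meta: CameraMeta, style: str) -> FrustumDrawable:
    # Steps S201/S202: the geometry (direction, pyramid shape, focus plane, depth of field)
    # would be derived from meta; only the style distinction is modeled here.
    return FrustumDrawable(meta.camera_id, style, meta)

def generate_overhead_frame(metas: List[CameraMeta],
                            director_style: Callable[[CameraMeta], str],
                            cameraman_style: Callable[[CameraMeta], str]) -> Dict[str, List[FrustumDrawable]]:
    """One frame of Fig. 30: build the frustum lists for V3-2 (cameramen) and V3-1 (director)."""
    v3_2 = [build_frustum(m, cameraman_style(m)) for m in metas]   # steps S201/S203
    v3_1 = [build_frustum(m, director_style(m)) for m in metas]    # steps S202/S204
    return {"V3-1": v3_1, "V3-2": v3_2}                            # step S205 (output)
```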
Using the processing of Fig. 30, a case in which one view frustum 40, for example the view frustum 40a, is highlighted as in Fig. 28 will now be described.
Fig. 28 is an example of the overhead image V3-1 viewed by the director. In the overhead image V3-2 viewed by the cameramen at this time, no highlighting is performed; that is, in the overhead image V3-2 the view frustums 40a, 40b, and 40c are all displayed in the same mode, semi-transparent white.
Fig. 31 shows a specific example of the processing of steps S201 and S202 of Fig. 30.
In step S201, the AR system 5, as step S210, generates the view frustums 40 for each camera 2. That is, for the cameramen, the view frustums 40a, 40b, and 40c are generated, for example, as identical semi-transparent white images.
In the subsequent step S202, the AR system 5, as step S210, acquires the value of the screen occupancy of the subject of interest for the captured image V1 of each camera 2.
For example, the AR system 5 continuously executes image recognition processing on the captured image V1 of each camera 2, determines whether the set subject of interest is being shot, and determines the screen occupancy in each frame. For example, it determines that the subject of interest appears, as in Fig. 29, and the area it occupies within the screen, and obtains the screen occupancy from these. In step S210, the AR system 5 acquires the current screen occupancy of the subject of interest in each captured image V1 calculated in this way.
In step S211, the AR system 5 determines the most suitable captured image V1; for example, the captured image V1 with the highest screen occupancy is taken as the most suitable.
In step S212, the AR system 5 generates, as the director's view frustums 40, the images of the view frustums 40 including the highlighting of the view frustum 40 corresponding to the camera 2 of the most suitable captured image V1. For example, the view frustum 40a is rendered as a semi-transparent red image as the highlighted mode, and the view frustums 40b and 40c as semi-transparent white images.
After performing the processing of steps S201 and S202 of Fig. 30 as in Fig. 31, the AR system 5 performs the processing of steps S203, S204, and S205. The overhead image V3-1 displayed on the GUI device 11 thereby becomes as shown in Fig. 28, while in the overhead image V3-2 displayed on each camera 2 no view frustum 40 is highlighted.
This allows the director to recognize the camera 2 that is currently showing the subject of interest at the largest size.
In the above, the view frustum 40 to be highlighted is selected based on the screen occupancy of the subject of interest, but it may instead be selected based on the continuous shooting time.
Fig. 32 shows another example of step S202; step S201 is the same as in Fig. 31.
In step S202 of Fig. 30, the AR system 5, as step S215 of Fig. 32, acquires the value of the continuous shooting time of the subject of interest for the captured image V1 of each camera 2.
As described above, the AR system 5 continuously executes image recognition processing on the captured image V1 of each camera 2 and determines whether the set subject of interest is being shot. In this case, it obtains, for each captured image V1, the duration (number of consecutive frames) over which the subject of interest has been recognized. In step S215, the AR system 5 then acquires the continuous shooting time calculated in this way.
In step S211, the AR system 5 determines the most suitable captured image V1; in this case, the captured image V1 with the longest continuous shooting time is taken as the most suitable.
In step S212, the AR system 5 generates, as the director's view frustums 40, the images of the view frustums 40 including the highlighting of the view frustum 40 corresponding to the camera 2 of the most suitable captured image V1.
Thereafter, the AR system 5 performs the processing of steps S203, S204, and S205 of Fig. 30, and the overhead image V3-1 displayed on the GUI device 11 becomes as shown in Fig. 28.
This allows the director to recognize the camera 2 that has been showing the subject of interest continuously for a long time.
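The selection of the most suitable captured image in Figs. 31 and 32 reduces to ranking the cameras by a per-camera score from image recognition, either the screen occupancy or the continuous recognition time of the subject of interest. A minimal sketch under that reading follows; the field names and tie handling are assumptions.

```
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SubjectStats:
    camera_id: str
    occupancy: float        # fraction of the frame occupied by the subject of interest (0.0-1.0)
    duration_frames: int    # number of consecutive frames in which the subject has been recognized

def select_highlight_camera(stats: List[SubjectStats], by: str = "occupancy") -> Optional[str]:
    """Return the camera whose frustum should be highlighted, or None if the subject is not seen."""
    visible = [s for s in stats if s.occupancy > 0.0]
    if not visible:
        return None
    if by == "occupancy":          # Fig. 31: highest screen occupancy
        best = max(visible, key=lambda s: s.occupancy)
    else:                          # Fig. 32: longest continuous shooting time
        best = max(visible, key=lambda s: s.duration_frames)
    return best.camera_id

# Example: camera "2b" wins on occupancy, camera "2a" on duration.
stats = [SubjectStats("2a", 0.10, 300), SubjectStats("2b", 0.25, 40), SubjectStats("2c", 0.0, 0)]
assert select_highlight_camera(stats, by="occupancy") == "2b"
assert select_highlight_camera(stats, by="duration") == "2a"
```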
When the view frustum 40 is highlighted in the overhead image V3-1 according to the screen occupancy or continuous shooting time of the subject of interest as described above, a process of displaying the captured image V1 only for the highlighted view frustum 40 is also conceivable. This allows the director to also check how the subject of interest is being shot.
Next, an example will be described in which the display mode of the overhead image V3-1 viewed by the director is changed by feedback from a cameraman.
Fig. 33A shows the overhead image V3-1 as the device display image 51 of the GUI device 11. In this example, the view frustums 40a, 40b, and 40c are displayed in the same mode, for example semi-transparent white.
Suppose now that, among the plurality of cameras 2, a specific operation is performed by the cameraman (or a remote operator) of the camera 2 corresponding to the view frustum 40a.
In that case, the overhead image V3-1 becomes as shown in Fig. 33B. That is, the view frustum 40a is highlighted in a mode different from that of the view frustums 40b and 40c so that it is clearly indicated to the director.
The specific operation by the cameraman is, for example, an operation by which the cameraman notifies the director that good footage is currently being captured. Such an operation is made available on the camera 2 side, and when it is performed, the AR system 5 makes the display mode of the view frustum 40 of the camera 2 on which the operation was performed different from the others in the overhead image V3-1.
A processing example is shown in Fig. 34, which shows a specific example of steps S201 and S202 of Fig. 30.
In step S201 of Fig. 30, the AR system 5, as step S210 of Fig. 34, generates the images of the cameraman's view frustums 40, for example generating identical semi-transparent white images as the view frustums 40a, 40b, and 40c.
In step S202 of Fig. 30, the AR system 5 first checks, as step S220 of Fig. 34, whether there has been feedback from any camera, that is, whether a specific operation by a cameraman has been performed, and branches the process in step S221.
If there has been no specific operation, the AR system 5 proceeds from step S221 to step S223 and generates the images of the director's view frustums 40, for example generating identical semi-transparent white images as the view frustums 40a, 40b, and 40c.
On the other hand, if a specific operation has been detected, the AR system 5 proceeds to step S222 and generates the images of the director's view frustums 40 including highlighting, for example generating the view frustum 40a as a semi-transparent red image and the view frustums 40b and 40c as semi-transparent white images.
Thereafter, the AR system 5 performs the processing of steps S203, S204, and S205 of Fig. 30. The overhead image V3-1 displayed on the GUI device 11 thereby becomes as shown in Fig. 33A or Fig. 33B: when there is no specific operation from a cameraman, the image is as in Fig. 33A, and from the moment a cameraman performs the specific operation, it becomes as in Fig. 33B. This allows the director to recognize the cameraman's signal that good footage is currently being captured.
In the overhead image V3-2 displayed on each camera 2, on the other hand, the view frustums 40a, 40b, and 40c are displayed in the same mode.
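The feedback-driven highlighting of Fig. 34 only changes the style chosen when the director's frustums are generated. A small self-contained sketch, with assumed style names and camera IDs:

```
from typing import Set

def director_style(camera_id: str, flagged_cameras: Set[str]) -> str:
    """Style for the director's overhead image V3-1 (Fig. 34, steps S220-S223)."""
    # Cameras whose operator performed the "good footage" operation are highlighted.
    return "red_translucent" if camera_id in flagged_cameras else "white_translucent"

def cameraman_style(camera_id: str) -> str:
    """Style for the cameramen's overhead image V3-2: never highlighted (step S210)."""
    return "white_translucent"

# Example: the operator of camera "2a" has signalled good footage.
flagged = {"2a"}
assert director_style("2a", flagged) == "red_translucent"
assert director_style("2b", flagged) == "white_translucent"
assert cameraman_style("2a") == "white_translucent"
```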
Next, an example will be described in which the display mode is changed when view frustums 40 overlap in the image.
Fig. 35A shows the overhead image V3-1 as the device display image 51 of the GUI device 11. In this example, the view frustums 40a, 40b, and 40c are displayed in the same mode.
Suppose now that the view frustums 40a and 40b overlap in the image as in Fig. 35B. In that case, the view frustums 40a and 40b are highlighted in a mode different from the normal one so that the director can recognize them easily.
A processing example is shown in Fig. 36, which shows a specific example of steps S201 and S202 of Fig. 30.
In step S201 of Fig. 30, the AR system 5, as step S210 of Fig. 36, generates the images of the cameraman's view frustums 40, for example generating identical semi-transparent white images as the view frustums 40a, 40b, and 40c.
In step S202 of Fig. 30, the AR system 5 first sets, in step S230 of Fig. 36, the size, shape, and direction of the view frustum 40 of each camera 2 based on the metadata MT of each camera 2.
In step S231, the AR system 5 checks the arrangement of each view frustum 40 within the three-dimensional coordinates of the CG space 30 of the current frame, which makes it possible to check whether any view frustums 40 overlap.
In step S232, the AR system 5 branches the process depending on whether there is an overlap.
If there is no overlap of view frustums 40, the AR system 5 proceeds to step S234 and generates the images of the director's view frustums 40, for example generating identical semi-transparent white images as the view frustums 40a, 40b, and 40c.
On the other hand, if there is an overlap, the AR system 5 proceeds to step S233 and generates the images of the director's view frustums 40 including highlighting. In this case, the overlapping view frustums 40, for example the view frustums 40a and 40b, are generated as semi-transparent red images, and the non-overlapping view frustum 40c as a semi-transparent white image.
Thereafter, the AR system 5 performs the processing of steps S203, S204, and S205 of Fig. 30. The overhead image V3-1 displayed on the GUI device 11 thereby becomes as shown in Fig. 35A or Fig. 35B: when there is no overlap of view frustums 40, the image is as in Fig. 35A, and when there is an overlap, it is as in Fig. 35B. This allows the director and others to easily recognize a situation in which the same subject is being shot from different viewpoints by a plurality of cameras 2, which makes it easier to give clear instructions to each cameraman and is also convenient for switching the main line video, for example when it is desired to switch between images of the same subject.
In the overhead image V3-2 displayed on each camera 2, on the other hand, the view frustums 40a, 40b, and 40c are displayed in the same mode.
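The overlap check of step S231 can be implemented with varying precision; an exact test would intersect the convex frustum volumes, while a coarse, conservative variant simply intersects their axis-aligned bounding boxes in the CG space. The sketch below shows only the coarse variant, as an illustration of the branching in steps S231 and S232.

```
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]

def aabb(corners: Sequence[Vec3]) -> Tuple[Vec3, Vec3]:
    """Axis-aligned bounding box of a set of frustum corner points."""
    lo = tuple(min(c[i] for c in corners) for i in range(3))
    hi = tuple(max(c[i] for c in corners) for i in range(3))
    return lo, hi

def aabbs_intersect(a: Tuple[Vec3, Vec3], b: Tuple[Vec3, Vec3]) -> bool:
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def overlapping_frustums(frustum_corners: Dict[str, Sequence[Vec3]]) -> Set[str]:
    """Steps S231/S232 (coarse): return the IDs of frustums whose bounding boxes intersect."""
    boxes = {cid: aabb(pts) for cid, pts in frustum_corners.items()}
    overlapping: Set[str] = set()
    for (id_a, box_a), (id_b, box_b) in combinations(boxes.items(), 2):
        if aabbs_intersect(box_a, box_b):
            overlapping.update((id_a, id_b))
    return overlapping
```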
[4-2: Priority display]
Next, an example will be described in which, when view frustums 40 overlap in the image, a certain view frustum 40 is displayed preferentially.
Consider the case, shown earlier in Fig. 17, in which the view frustums 40a, 40b, 40c, and 40d overlap; the overlap can reduce visibility. In particular, overlapping semi-transparent view frustums 40 makes it difficult to see the focus plane 41, the depth of field range 42, and so on of each of them.
Therefore, one view frustum 40 is displayed preferentially as in Fig. 37.
Fig. 37 shows the overhead image V3-1 as the device display image 51 of the GUI device 11. In this example, the view frustums 40a, 40b, 40c, and 40d overlap, but the view frustum 40a is given priority, and in the overlapping portions the focus plane 41 and depth of field range 42 of the view frustum 40a are displayed.
A processing example is shown in Fig. 38, which shows a specific example of steps S201 and S202 of Fig. 30.
In step S201 of Fig. 30, the AR system 5, as step S210 of Fig. 38, generates the images of the cameraman's view frustums 40, for example the images of the view frustums 40a, 40b, 40c, and 40d. No particular priority setting is applied to the images of the cameraman's view frustums 40.
In step S202 of Fig. 30, the AR system 5 first sets, in step S240 of Fig. 38, the size, shape, and direction of the view frustum 40 of each camera 2 based on the metadata MT of each camera 2.
In step S241, the AR system 5 checks the arrangement of each view frustum 40 within the three-dimensional coordinates of the CG space 30 of the current frame, which makes it possible to check whether any view frustums 40 overlap.
In step S242, the AR system 5 branches the process depending on whether there is an overlap.
If there is no overlap of view frustums 40, the AR system 5 proceeds to step S244 and generates the images of the director's view frustums 40, for example the images of the view frustums 40a, 40b, 40c, and 40d.
On the other hand, if there is an overlap, the AR system 5 proceeds to step S245 and determines which of the overlapping view frustums 40 is to be given priority. Alternatively, the prioritized view frustum 40 may be determined from among all the view frustums 40, including those that do not overlap.
Several methods of determination are conceivable.
For example, priority may be given to the view frustum 40 of the camera 2 currently providing the main line video.
Alternatively, the director or others may be allowed to arbitrarily select the view frustum 40 to be prioritized.
Furthermore, the view frustum 40 selected for highlighting because it is shooting the subject of interest or because of a cameraman's specific operation, as described above, may be given priority.
In step S246, the AR system 5 generates the images of the director's view frustums 40. In this case, the prioritized view frustum 40 is rendered as an image in which the focus plane 41 and depth of field range 42 are displayed as usual. The other view frustums 40 are rendered as images in which the focus plane 41 and depth of field range 42 are not displayed in the portions that overlap the prioritized view frustum 40; alternatively, the other view frustums 40 may all be rendered without the focus plane 41 and depth of field range 42.
Thereafter, the AR system 5 performs the processing of steps S203, S204, and S205 of Fig. 30. As a result, even when view frustums 40 overlap, the overhead image V3-1 displayed on the GUI device 11 becomes an image in which the focus plane 41 and depth of field range 42 of the prioritized view frustum 40 can be clearly recognized, as in Fig. 37.
In the overhead image V3-2 displayed on each camera 2, on the other hand, the view frustums 40a, 40b, 40c, and 40d are displayed as in Fig. 17.
In Figs. 37 and 38, the priority setting is applied to the director's overhead image V3-1, but the priority setting may instead be applied to the cameraman's overhead image V3-2. Considering that it is viewed by a cameraman, it is preferable that the view frustum 40 of the camera 2 that the cameraman is operating be given priority.
Therefore, in step S201 of Fig. 30, where the cameraman's view frustums are generated, processing similar to steps S240 to S246 of Fig. 38 may be performed, except that the view frustum given priority in step S245 is the view frustum 40 of the cameraman's own camera 2.
This allows the cameraman to clearly see the focus plane 41 and depth of field range 42 of the camera 2 he or she is operating even if its view frustum 40 overlaps the view frustums 40 of other cameras 2.
When the priority setting is applied to the overhead image V3-2 in this way, the overhead image V3-1 viewed by the director may be given a priority setting as described above, or may be given none.
Even when priority settings are applied to both the overhead images V3-1 and V3-2, the conditions for determining the prioritized view frustum 40 differ, so the overhead image V3-1 and all of the overhead images V3-2 displayed on the cameras 2 do not end up in the same display mode.
It is also conceivable that the overhead image V3-2 viewed by a cameraman displays only the view frustum 40 of the cameraman's own camera 2 and does not display the view frustums 40 of the other cameras 2.
[4-3: Instruction display]
Next, an example will be described in which instructions from the director can be conveyed visually to the cameramen.
Figs. 39A and 39B show the overhead image V3-1 as the device display image 51 of the GUI device 11. In this example, the view frustums 40a, 40b, and 40c are displayed.
Fig. 40A shows the overhead image V3-2 as the viewfinder display image 50 of a camera 2. In this example, the overhead image V3-2 is composited into a corner of the screen of the captured image V1. Fig. 40B shows the overhead image V3-2 enlarged.
Fig. 39A is an example of a case in which the director performs an instruction operation on the camera 2 of the view frustum 40b. For example, in response to an operation such as the director dragging the view frustum 40b on the GUI device 11, an instruction frustum 40DR is displayed. This serves as an instruction from the director to the cameraman of the camera 2 of the view frustum 40b to change the shooting direction to the direction of the instruction frustum 40DR.
Therefore, in this case, the AR system 5 also displays the instruction frustum 40DR for the view frustum 40b in the overhead image V3-2 viewed by the cameraman, as shown in Figs. 40A and 40B.
The cameraman operating the camera 2 of the view frustum 40b can follow the director's instruction by changing the shooting direction so that the view frustum 40b coincides with the instruction frustum 40DR.
The instruction frustum 40DR may be able to indicate not only the shooting direction but also the angle of view, the focus plane 41, and so on. For example, the director may be able to move the focus plane 41 forward or backward, widen the angle of view (change the slope of the pyramid), and so on by operating the instruction frustum 40DR.
The cameraman can then also adjust the focus so that the focus plane 41 of the view frustum 40b coincides with that of the instruction frustum 40DR, or adjust the angle of view so that the slopes of the pyramids coincide.
Note that the overhead image V3-1 of Fig. 39A and the overhead image V3-2 of Figs. 40A and 40B show an example in which the viewpoint positions with respect to the CG space 30 differ. The director and the cameramen can each change the viewpoint position of the overhead image V3-1 or V3-2 by their own operations. The illustrated example shows that the overhead image V3-1 and the overhead image V3-2 do not necessarily display the CG space 30 as seen from the same viewpoint position.
Fig. 39B shows a state in which the director has further performed an instruction operation also on the view frustum 40a to display an instruction frustum 40DR. In this way, in the overhead image V3-1, instructions can be given to each view frustum 40 individually.
As illustrated, even when a new instruction is given, it is desirable to keep displaying the instruction frustum 40DR of the previous instruction (the instruction to the view frustum 40b) so that the director can confirm the currently valid instructions.
An instruction frustum 40DR may be erased from the overhead images V3-1 and V3-2 when the view frustum 40 of the instructed camera 2 substantially coincides with that instruction frustum 40DR.
Alternatively, an instruction frustum 40DR may also be erased from the overhead images V3-1 and V3-2 by a cancellation operation by the director, so that, for example, cancellation or modification of an instruction can also be handled.
In the overhead image V3-2, the instruction frustums 40DR for all the cameras 2 may be displayed, or only the instruction frustum 40DR for the cameraman's own camera 2 may be displayed, and the cameraman may be allowed to choose between these.
By displaying the instruction frustums 40DR for all the cameras 2 on each camera 2, each cameraman can grasp what instructions are being issued overall.
On the other hand, by displaying only the instruction frustum 40DR for the cameraman's own camera 2, the cameraman can easily recognize the instructions directed at him or her from the director.
A processing example is shown in Fig. 41, which shows a specific example of steps S201, S202, S203, and S204 of Fig. 30.
In step S201 of Fig. 30, the AR system 5 performs the processing of steps S250 to S254 of Fig. 41.
First, as step S250, the AR system 5 generates the images of the cameraman's view frustums 40, for example the images of the view frustums 40a, 40b, and 40c.
In step S251, the AR system 5 checks whether an instruction operation by the director has been performed. If there is no instruction operation, the process proceeds to step S202 of Fig. 30.
If an instruction operation has been performed, the AR system 5 proceeds from step S251 to step S252 of Fig. 41 and branches the process depending on the display mode of the instruction frustums 40DR.
The display mode here is assumed to be selectable by the cameraman between a mode in which only the instruction frustum 40DR directed at the cameraman's own camera is displayed and a mode in which all instruction frustums 40DR are displayed.
Note that such mode selection may not be provided; instead, only the instruction frustum 40DR for the cameraman's own camera may always be displayed, or all the instruction frustums 40DR may always be displayed.
In the mode in which the instruction frustum 40DR directed at the cameraman's own camera is displayed, the AR system 5 proceeds to step S253 and generates an image of the instruction frustum 40DR. However, if the instruction from the director is not directed at the camera 2 that is the subject of the overhead image V3-2 generation process, the image of the instruction frustum 40DR need not be generated in step S253.
In this case, the video data transmitted to each camera 2 as the overhead image V3-2 has different display contents. In other words, for each camera 2 there can be video data that contains the instruction frustum 40DR and video data that does not contain it.
In the mode in which all the instruction frustums 40DR are displayed, the AR system 5 proceeds to step S254 and generates images of the instruction frustums 40DR that are valid at that point in time.
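As a reference only, the branching of steps S251 to S254 can be sketched as follows. This is a minimal illustration assuming a per-camera generation loop; the names such as InstructionFrustum and display_mode are hypothetical and do not appear in the embodiment.

from dataclasses import dataclass
from typing import List

@dataclass
class InstructionFrustum:
    target_camera_id: int   # camera 2 that the director's instruction is aimed at
    pan: float              # instructed shooting direction (degrees)
    tilt: float
    zoom_angle: float       # instructed angle of view (degrees)

def select_instruction_frustums(target_camera_id: int,
                                display_mode: str,
                                active_instructions: List[InstructionFrustum]) -> List[InstructionFrustum]:
    # Decide which instruction frustums 40DR are drawn in the overhead image V3-2
    # sent to one specific camera 2.
    if not active_instructions:        # step S251: no instruction operation by the director
        return []
    if display_mode == "own_only":     # step S253: only the frustum aimed at this camera
        return [f for f in active_instructions if f.target_camera_id == target_camera_id]
    return list(active_instructions)   # step S254: all currently valid instruction frustums

# Example: camera 1 in "own_only" mode sees only the instruction aimed at it.
instructions = [InstructionFrustum(1, pan=-30.0, tilt=0.0, zoom_angle=20.0),
                InstructionFrustum(2, pan=10.0, tilt=5.0, zoom_angle=30.0)]
print(select_instruction_frustums(1, "own_only", instructions))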
Following the above processing of steps S250 to S254, the AR system 5 performs the processing of step S202 in FIG. 30 as shown in steps S260 to S262 in FIG. 41.
In step S260, the AR system 5 generates an image of the view frustum 40 for the director. For example, images are generated for the view frustums 40a, 40b, and 40c.
In step S261, the AR system 5 checks whether or not an instruction operation has been performed by the director. If there is no instruction operation, the process proceeds to step S203 in FIG. 30.
If an instruction operation has been performed, the AR system 5 proceeds from step S261 to step S262 in FIG. 41 and generates an image of the instruction frustum 40DR that is valid at that point in time.
As step S203 in FIG. 30, the AR system 5 performs the processes of steps S255 and S256 in FIG. 41.
In step S255, the AR system 5 composites the view frustum 40 and the instruction frustum 40DR into the overhead image V3-2, thereby generating video data of the overhead image V3-2 as shown in FIG. 40B.
In step S256, the AR system 5 composites the overhead image V3-2 and the captured image V1 to generate video data of a composite image as shown in FIG. 40A.
The overhead view image V3-2 and the photographed image V1 may be combined on the camera 2 side.
As step S204 in FIG. 30, the AR system 5 performs the process of step S265 in FIG. 41.
In step S265, the AR system 5 composites the view frustum 40 and the instruction frustum 40DR into the overhead image V3-1, thereby generating video data of the overhead image V3-1 as shown in FIGS. 39A and 39B.
Thereafter, in step S205 of FIG. 30, the overhead view V3-1 is transmitted to the GUI device 11, and the overhead view V3-2 corresponding to each camera 2 is transmitted to each camera 2.
This allows the director to check his/her own instructions on the instruction frustum 40DR in the overhead view V3-1, and each cameraman can visually check the instructions from the director through the instruction frustum 40DR.
Incidentally, the instruction frustum 40DR visible to the cameraman appears in the overhead image V3-2, and it is preferable to control the viewpoint position of the overhead image V3-2 so that the instruction becomes easier for the cameraman to understand.
For example, FIGS. 42A and 42B show overhead images V3-2 as the viewfinder display image 50 of a camera 2. These are overhead images V3-2 whose viewpoint position is the position of the camera 2 corresponding to the view frustum 40c, and they are the images viewed by the cameraman of that camera 2.
In addition, in the overhead view image V3-2 of FIG. 42A, an instruction frustum 40DR for the view frustum 40c is displayed, and an instruction frustum 40DR for the view frustum 40a of the other camera 2 is also displayed.
Also, in the overhead view image V3-2 of FIG. 42B, an instruction frustum 40DR for the view frustum 40c is displayed, but an instruction frustum 40DR for the view frustum 40a of the other camera 2 is not displayed.
As shown in FIG. 42A or 42B, if the cameraman viewing the overhead image V3-2 can see it in a state close to his own viewpoint, the direction of the instruction by the instruction frustum 40DR becomes easy to understand.
That is, in FIGS. 42A and 42B, it can be intuitively understood that the instruction frustum 40DR directed at the cameraman's own camera is an instruction to turn the shooting direction to the left.
Therefore, when the instruction frustum 40DR is displayed in the overhead image V3-2, the overhead image V3-2 is rendered as a 3D image whose viewpoint position is set to the camera position, and the view frustum 40 and the instruction frustum 40DR are displayed therein.
An example of the process will be described. First, the AR system 5 performs steps S201 and S202 in FIG. 30 as shown in FIG. 41. Then, it performs step S203 in FIG. 30 as shown in FIG. 43.
In step S280, the AR system 5 branches the process depending on whether or not the instruction frustum 40DR is to be displayed in the current frame.
If the instruction frustum 40DR is not to be displayed in the overhead image V3-2 for the camera 2 to be processed, the AR system 5 proceeds to step S281 and generates video data in which the image of the view frustum 40 is synthesized with the overhead image V3-2.
When the instruction frustum 40DR is to be displayed in the current frame, the AR system 5 proceeds to step S282, and sets the arrangement of the view frustum 40 and the instruction frustum 40DR within the 3D space coordinates for generating the overhead image V3-2.
Then, in step S283, the AR system 5 sets the viewpoint position within the 3D space coordinates. That is, the coordinates of the position of a specific camera 2 among the multiple cameras to which the overhead video V3-2 is to be transmitted are set as the viewpoint position.
In step S284, the AR system 5 generates video data of the overhead image V3-2 as a CG image rendered from the set viewpoint position, with the view frustum 40 and the instruction frustum 40DR composited into it.
When this processing is performed and an overhead image V3-2 including the instruction frustum 40DR is displayed as the viewfinder display image 50, the cameraman can see an image such as that shown in FIG. 42A or FIG. 42B from the viewpoint of the camera 2. This makes it easier to understand the director's instructions.
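Step S283 essentially replaces the viewpoint of the 3D rendering with the position of the destination camera before the frustums are drawn. The following is a minimal sketch of that idea, assuming a standard look-at view matrix; the helper name look_at and the use of numpy are illustrative and not part of the embodiment.

import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    # Right-handed look-at view matrix; eye is the position of the destination camera 2
    # in the CG space 30 of the overhead image V3-2 (step S283).
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    f = target - eye
    f = f / np.linalg.norm(f)                       # forward
    s = np.cross(f, up); s = s / np.linalg.norm(s)  # right
    u = np.cross(s, f)                              # corrected up
    view = np.identity(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

# The CG scene containing the view frustum 40 and the instruction frustum 40DR would
# then be rendered with this matrix to obtain the overhead image V3-2 (step S284).
print(look_at(eye=(5.0, 2.0, -8.0), target=(0.0, 0.0, 0.0)))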
Incidentally, it would be convenient if the cameraman could arbitrarily switch the viewfinder display image 50 between the overhead image V3-2 and the shot image V1.
For example, the viewfinder display image 50 can be switched by the cameraman between an overhead image V3-2 as shown in FIG. 42A and a shot image V1 as shown in FIG. 44.
In particular, since the cameraman needs to constantly check the captured image V1 (i.e., live view) of the camera 2 that he is operating while shooting, it is necessary for the captured image V1 to be displayed in the viewfinder.
For this reason, it is conceivable to composite the overhead image V3-2 with the shot image V1 and display it as previously shown in FIG. 40A, but the overhead image V3-2 may be small and the instruction frustum 40DR may be difficult to see.
Therefore, it is advisable to switch between the overhead view image V3-2 as shown in FIG. 42A and the photographed image V1 as shown in FIG. 44 at any timing and display each image in full screen.
However, the cameraman also needs to know that an instruction has been issued while the captured image V1 is being displayed. To this end, as shown in FIG. 44, an instruction direction 54 and a match rate 53 are displayed as instruction information on the captured image V1.
The instruction direction 54 is the shooting direction indicated by the instruction frustum 40DR. The match rate 53 indicates the degree of coincidence between the current view frustum 40 and the instruction frustum 40DR. When the match rate reaches 100%, the current view frustum 40 matches the instruction frustum 40DR.
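The embodiment does not specify how the match rate 53 is calculated. As one plausible illustration only, it could be derived from how closely the current pan, tilt, and zoom angle approach the instructed values; the weighting and tolerances in the sketch below are assumptions.

def match_rate(current, instructed, pan_tol=45.0, tilt_tol=30.0, zoom_tol=20.0):
    # current / instructed: dicts with "pan", "tilt" and "zoom_angle" in degrees.
    # Returns 0..100; 100 means the current view frustum 40 coincides with the
    # instruction frustum 40DR.  The tolerances are illustrative assumptions.
    def closeness(cur, ins, tol):
        return max(0.0, 1.0 - abs(cur - ins) / tol)
    score = (closeness(current["pan"], instructed["pan"], pan_tol)
             + closeness(current["tilt"], instructed["tilt"], tilt_tol)
             + closeness(current["zoom_angle"], instructed["zoom_angle"], zoom_tol)) / 3.0
    return round(score * 100.0)

print(match_rate({"pan": -20.0, "tilt": 0.0, "zoom_angle": 25.0},
                 {"pan": -30.0, "tilt": 0.0, "zoom_angle": 20.0}))   # about 84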
By displaying the image in this way, the cameraman can confirm that the director has given an instruction even while normally viewing the shot image V1, and can respond to the instruction by relying on the instruction direction 54 and the match rate 53. If necessary, the screen can also be switched to the overhead image V3-2 to check the instruction frustum 40DR.
An example of the processing is shown in FIG. 45.
The AR system 5 performs the processes of steps S270 to S273 in FIG. 45 in step S201 in FIG. 30.
Furthermore, the AR system 5 performs the processes of steps S275 to S278 in FIG. 45 in step S203 in FIG. 30.
In step S270, the AR system 5 checks whether the display of the view frustum 40 is OFF in the current frame. In other words, it checks whether the current frame is displaying the captured image V1 instead of the overhead image V3-2.
If the captured image V1 has been selected as the viewfinder display image 50, the AR system 5 ends the processing of step S201. In other words, there is no need to generate images of the view frustum 40 and the instruction frustum 40DR.
If the overhead image V3-2 is selected as the viewfinder display image 50, the AR system 5 generates image data for the view frustum 40 based on the metadata MT in step S271.
In step S272, the AR system 5 determines whether or not to display the instruction frustum 40DR.
The instruction frustum 40DR is to be displayed when the director has performed an instruction operation. The selection between the above-described mode for displaying all the instruction frustums 40DR and the mode for displaying only the instruction frustum 40DR for the cameraman's own camera is also checked here.
If the instruction frustum 40DR is not to be displayed, the process of step S201 is ended.
If the instruction frustum 40DR is to be displayed in the overhead image V3-2, the AR system 5 proceeds to step S273 and generates image data of the instruction frustum 40DR.
In step S203 of FIG. 30, the AR system 5 likewise checks in step S275 of FIG. 45 whether the display of the view frustum 40 is OFF, that is, whether the captured image V1 is currently being displayed.
If the camera 2 being processed is currently displaying the overhead image V3-2, the AR system 5 proceeds to step S278, where it composites the image data of the view frustum 40 into the video data of the overhead image V3-2 and, if image data of the instruction frustum 40DR has been generated, also composites the instruction frustum 40DR into that video data.
If the camera 2 being processed is currently displaying the captured image V1, the AR system 5 proceeds to step S276, where the process branches depending on whether or not there is an instruction from the director. If there is no instruction, the process of step S203 ends. If there is an instruction from the director, in step S277 the captured image V1 is set to display the instruction direction 54 and the match rate 53.
Then, in step S205 of FIG. 30, video data is output to camera 2. That is, video data of the shot video V1 as shown in FIG. 44 or video data of the overhead video V3-2 as shown in FIG. 42A is output to camera 2.
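The branching of steps S270 to S277 can be summarized in a short sketch, assuming one viewfinder frame is assembled per camera per frame; the function and field names are hypothetical.

from typing import Optional

def build_viewfinder_frame(show_overhead: bool,
                           instruction: Optional[dict],
                           match_rate_53: Optional[int]) -> dict:
    # Decide what the viewfinder display image 50 contains for one frame.
    if not show_overhead:
        # Steps S270/S275: the cameraman is viewing the captured image V1 (live view).
        frame = {"base": "captured image V1 (live view)"}
        if instruction is not None:
            # Step S277: overlay the instruction direction 54 and the match rate 53 on V1.
            frame["instruction_direction_54"] = instruction["direction"]
            frame["match_rate_53"] = match_rate_53
        return frame
    # Steps S271/S273/S278: overhead image V3-2 with the view frustum 40 (and 40DR if any).
    frame = {"base": "overhead image V3-2", "view_frustum_40": True}
    if instruction is not None:
        frame["instruction_frustum_40DR"] = True
    return frame

print(build_viewfinder_frame(False, {"direction": "left"}, 62))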
For example, the viewfinder display image 50 may be switched between the shot image V1, the overhead image V3-2, and a composite image as shown in FIG. 40A by operation of the cameraman.
[4-4: Marker display]
Next, an example of executing marker display in the overhead view image V3-2 as the viewfinder display image 50 visually recognized by the cameraman will be described.
FIG. 46A shows a state in which a photographed image V1 and an overhead image V3-2 are displayed as the viewfinder display image 50 of a camera 2. In this example, the overhead image V3-2 is composited into a corner of the screen of the photographed image V1. FIG. 46B shows an enlarged view of the overhead image V3-2.
Also, as shown in FIG. 46B, in the overhead view image V3-2 displayed by camera 2, only the view frustum 40 of that camera itself is displayed.
In the overhead view video V3-2 displayed on the GUI device 11 on the director's side, it is assumed that the view frustums 40 of all the cameras 2 are displayed as described with reference to FIG. 28 and the like.
In the overhead image V3-2 shown in FIGS. 46A and 46B, marker frustums 40M1 and 40M2 are displayed in addition to the view frustum 40.
The marker frustums 40M1 and 40M2 are displayed in response to the cameraman registering subject positions and directions to be photographed. That is, the cameraman marks in advance the directions in which he or she frequently wants to shoot.
The marker frustums 40M1 and 40M2 are displayed, for example, in a manner different from that of the view frustum 40. The marker frustum 40M1 and the marker frustum 40M2 may also be displayed in manners different from each other.
For example, when the view frustum 40 is white and semi-transparent, the marker frustum 40M1 is yellow and semi-transparent, and the marker frustum 40M2 is light blue and semi-transparent.
Also, as shown in FIG. 47, the positions of the marker frustums 40M1 and 40M2 may be indicated by markers 55M1 and 55M2 on the captured image V1.
In this case, the correspondence may be clearly indicated by making the marker 55M1 yellow like the marker frustum 40M1 and making the marker 55M2 light blue like the marker frustum 40M2.
A processing example will be described. For the sake of explanation, the marker frustums 40M1, 40M2, etc. will be collectively referred to as "marker frustum 40M." Also, the markers 55M1, 55M2, etc. will be collectively referred to as "marker 55M."
FIG. 48 shows a specific example of steps S201, S202, S203, and S204 in FIG. 30.
As step S201 in FIG. 30, the AR system 5 performs the processes of steps S300 to S303 in FIG. 48.
First, in step S300, the AR system 5 generates image data of the view frustum 40 based on the metadata MT. For example, the view frustum 40 corresponding to the camera 2 to be processed is generated. In some cases, the view frustums 40 corresponding to all of the cameras 2 are generated.
In step S301, the AR system 5 determines whether or not a marking operation has been performed on the camera 2 to be processed. A marking operation is an operation for adding or deleting a marking. If no marking operation has been performed, the process of step S201 ends.
When a marking operation has been performed, in step S302 the AR system 5 performs a process of adding a registered marking point or deleting a registered marking for the camera 2 to be processed.
Then, in step S303, the AR system 5 generates image data of the marker frustums 40M as necessary. That is, if markings are registered at that time, image data of the corresponding marker frustums 40M is generated.
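Steps S301 to S303 amount to maintaining a per-camera list of registered marking points and regenerating the marker frustum images from it. The sketch below is a minimal illustration under that assumption; all names are hypothetical and a marking point is reduced to a pan/tilt/zoom tuple.

from typing import Dict, List, Tuple

MarkingPoint = Tuple[float, float, float]        # (pan, tilt, zoom angle) in degrees
markings: Dict[int, List[MarkingPoint]] = {}     # registered marking points per camera 2

def handle_marking_operation(camera_id: int, op: str, point: MarkingPoint) -> None:
    # Step S302: add or delete a registered marking point for the camera being processed.
    points = markings.setdefault(camera_id, [])
    if op == "add":
        points.append(point)
    elif op == "delete" and point in points:
        points.remove(point)

def marker_frustums_for(camera_id: int) -> List[MarkingPoint]:
    # Step S303: marker frustums 40M are (re)generated from the currently registered points.
    return list(markings.get(camera_id, []))

handle_marking_operation(3, "add", (-25.0, 5.0, 30.0))   # e.g. register marker frustum 40M1
print(marker_frustums_for(3))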
In step S202 of FIG. 30, the AR system 5 generates the view frustums 40 for the director in step S310 of FIG. 48. In this case, image data of the view frustums 40 corresponding to all the cameras 2 is generated.
In step S203 in FIG. 30, the AR system 5 performs the processes of steps S320 and S321 in FIG. 48.
In step S320, the AR system 5 synthesizes the view frustum 40 with the CG data as the overhead view image V3-2. If there is a marking registration, the AR system 5 also synthesizes image data of the marker frustum 40M.
In step S321, the AR system 5 combines the marker 55M with the captured image V1 in accordance with the marking registration.
As described above, the video data of the overhead video V3-2 and the captured video V1 to be transmitted to the camera 2 is generated.
In step S204 in FIG. 30, the AR system 5 performs the process of step S330 in FIG. 48.
In step S330, the AR system 5 synthesizes the view frustum 40 with the CG data as the overhead image V3-1.
As a result, video data for the overhead view video V3-1 is generated.
Then, in step S205 of FIG. 30, the video data of the overhead view V3-2 and the shot video V1 are transmitted to the camera 2, and the video data of the overhead view V3-1 is transmitted to the GUI device 11.
This allows the cameraman to visually recognize the marker frustum 40M and the marker 55M in accordance with the marking registration operation.
From the director's perspective, by not displaying the marker frustum 40M and marker 55M, the overhead image V3-1 does not become unnecessarily cluttered.
[4-5: Examples of various displays]
As yet another example, a display example of appropriate overhead views V3-1 and V3-2 on the director's side and cameraman's side, respectively, will be described.
FIG. 49A shows an example in which an overhead image V3-1 is displayed as the device display image 51 of the GUI device 11, and FIG. 49B shows an example in which an overhead image V3-2 is simultaneously displayed as the viewfinder display image 50 of the camera 2.
In the overhead image V3-1 of FIG. 49A, the view frustums 40a, 40b, and 40c of the cameras 2 are displayed in a similar manner, for example, in semi-transparent white.
In the overhead image V3-2 of FIG. 49B, on the camera 2 corresponding to the view frustum 40b, that view frustum 40b is highlighted in, for example, semi-transparent red, while the view frustums 40a and 40c of the other cameras 2 are each displayed in the usual semi-transparent white.
Although not shown, in the camera 2 corresponding to the view frustum 40a, that view frustum 40a is highlighted in, for example, a semi-transparent red, and the view frustums 40b and 40c of the other cameras 2 are each displayed in the normal semi-transparent white.
In addition, in the camera 2 corresponding to the view frustum 40c, that view frustum 40c is highlighted in, for example, a semi-transparent red, and the view frustums 40a, 40b of the other cameras 2 are each displayed in a normal semi-transparent white.
By doing this, the director can check the view frustum 40 of each camera 2 evenly, and the cameraman can easily check the view frustum 40 of the camera 2 he is operating.
FIG. 50A shows an example in which an overhead image V3-1 is displayed as the device display image 51 of the GUI device 11, and FIG. 50B shows an example in which an overhead image V3-2 is simultaneously displayed as the viewfinder display image 50 of the camera 2.
In the overhead image V3-1 of FIG. 50A, the view frustums 40a, 40b, and 40c of the cameras 2 are displayed in the same manner, for example, in semi-transparent white. By setting the viewpoint position at a relatively high position in the CG space 30 corresponding to the target space 8, the image is easy to see overall.
In the overhead image V3-2 of FIG. 50B, on the camera 2 corresponding to the view frustum 40b, the view frustum 40b is highlighted in, for example, semi-transparent red, and the view frustums 40a and 40c of the other cameras 2 are each displayed in the normal semi-transparent white. Furthermore, the viewpoint position is set to the position of the camera 2 corresponding to the view frustum 40b.
Although not shown, in the overhead image V3-2 displayed by the camera 2 corresponding to the view frustum 40a, that view frustum 40a is highlighted, for example, in a semi-transparent red color, and the view frustums 40b, 40c of the other cameras 2 are each displayed in a normal semi-transparent white color, and the viewpoint position is set to the position of the camera 2 of the view frustum 40a.
Similarly, the overhead view image V3-2 of the camera 2 corresponding to the view frustum 40c also has its own view frustum 40 highlighted, and the viewpoint position is the position of the camera 2 of the view frustum 40c.
In this way, the director can check the view frustum 40 of each camera 2 evenly, and the cameraman can check the view frustum 40 of the camera 2 he is operating from a viewpoint similar to his own.
FIG. 51 shows an example in which an overhead image V3-1 is displayed as the device display image 51 of the GUI device 11. In this case, two overhead images are composited and displayed as overhead images V3-1a and V3-1b. The overhead image V3-1a is an image from a viewpoint diagonally above the match venue, and the overhead image V3-1b is an image from a viewpoint directly above.
The director needs to grasp the situation of all the cameras as a whole, so it is suitable to display multiple overhead images V3-1 from different viewpoints.
An example of processing for displaying each of the above examples will be described.
FIG. 52 shows a specific example of steps S201, S202, S203, and S204 in FIG. 30.
As step S201 in FIG. 30, the AR system 5 performs the process of step S410 in FIG. 52. In step S410, the AR system 5 generates image data of the view frustum 40 for the cameraman based on the metadata MT. In this case, the image data is generated in a state in which the view frustum 40 corresponding to the camera 2 to be processed is highlighted.
In step S202 of FIG. 30, the AR system 5 generates the view frustums 40 for the director in step S420 of FIG. 52. In this case, image data of the view frustums 40 corresponding to all the cameras 2 is generated in the same display manner.
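Steps S410 and S420 differ only in how each view frustum is styled for the intended viewer. A minimal sketch of that selection is shown below; the color strings are illustrative assumptions.

def frustum_styles(all_camera_ids, viewer):
    # viewer: "director" or the id of the camera 2 whose cameraman will view V3-2.
    styles = {}
    for cam_id in all_camera_ids:
        if viewer != "director" and cam_id == viewer:
            styles[cam_id] = "red, semi-transparent (own frustum highlighted)"
        else:
            styles[cam_id] = "white, semi-transparent (normal)"
    return styles

print(frustum_styles([1, 2, 3], viewer=2))            # cameraman of camera 2 (step S410)
print(frustum_styles([1, 2, 3], viewer="director"))   # all frustums uniform (step S420)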
In step S203 in FIG. 30, the AR system 5 performs the processes of steps S430 and S431 in FIG. 52.
In step S430, the AR system 5 sets the arrangement of the image data of the view frustum 40 within the 3D coordinate space of the overhead image V3-2.
In step S431, the AR system 5 generates video data as an overhead image V3-2, with the position of the target camera 2 in the 3D coordinate space set as the viewpoint position.
In this manner, the video data of the overhead video V3-2 to be transmitted to the camera 2 is generated.
In step S204 in FIG. 30, the AR system 5 performs the processes of steps S440, S441, and S442 in FIG. 52.
In step S440, the AR system 5 synthesizes the view frustum 40 with the CG data as the overhead view image V3-1a.
In step S441, the AR system 5 synthesizes the view frustum 40 with the CG data as the overhead view image V3-1b.
In step S442, the AR system 5 generates video data that combines the overhead view image V3-1a and the overhead view image V3-1b on one screen. This generates the video data of the overhead view image V3-1 to be sent to the GUI device 11.
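Steps S440 to S442 simply render the same CG space from two viewpoints and place the results in one frame for the GUI device 11. The following minimal sketch uses numpy arrays as stand-ins for the rendered images; the side-by-side layout is an assumption.

import numpy as np

def compose_director_view(v3_1a: np.ndarray, v3_1b: np.ndarray) -> np.ndarray:
    # Step S442: combine the diagonal-viewpoint and top-down-viewpoint overhead images
    # into one frame for the GUI device 11 (simple side-by-side layout, assumed here).
    h = min(v3_1a.shape[0], v3_1b.shape[0])
    return np.hstack([v3_1a[:h], v3_1b[:h]])

frame_a = np.zeros((720, 640, 3), dtype=np.uint8)     # overhead image V3-1a (step S440)
frame_b = np.zeros((720, 640, 3), dtype=np.uint8)     # overhead image V3-1b (step S441)
print(compose_director_view(frame_a, frame_b).shape)  # (720, 1280, 3)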
Thereafter, in step S205 of FIG. 30, the video data of the overhead view V3-2 is transmitted to the camera 2, and the video data of the overhead view V3-1 is transmitted to the GUI device 11.
This allows the cameraman to view, for example, the overhead image V3-2 as shown in FIG. 50B, and the director to view, for example, the overhead images V3-1a and V3-1b as shown in FIG. 51.
In each of the examples described above in Fig. 28 to Fig. 52, the captured image V1 may be displayed together with the view frustum 40 as described in Fig. 9 to Fig. 27. In other words, the examples described in the embodiments can be implemented in a composite manner.

<5. Summary and Modifications>
According to the above embodiment, the following effects can be obtained.
In one embodiment, for example, an information processing device 70 as an AR system 5 is equipped with an image processing unit 71a that generates image data for simultaneously displaying an overhead image V3 of the target space 8, a view frustum 40 (shooting range presentation image) that presents the shooting range of the camera 2 within the overhead image V3, and the captured image V1 of the camera 2 on one screen (see Figures 7 and 19).
By displaying the view frustum 40 of the camera 2 in the overhead image V3 as the CG space 30 and simultaneously displaying the captured image V1, the viewer can easily grasp the correspondence between the image of the camera 2 and the position in space.
In the embodiment, an example has been given in which the video processing unit 71a generates video data that causes the captured video V1 to be displayed within the view frustum 40 (see FIGS. 9 to 14).
In other words, the video processing unit 71a generates video data in which the captured image V1 is arranged within the range of the shooting range presentation image (the view frustum 40). To put it still another way, it generates video data in which the captured image V1 is displayed in a state of being arranged within the range of the shooting range presentation image (the view frustum 40).
By displaying the captured image V1 within the view frustum 40, the relationship between the view frustum 40 and the image captured by the camera 2 corresponding to the view frustum 40 becomes extremely easy for the viewer to understand.
In the embodiment, an example has been given in which the image processing unit 71a generates image data in which the captured image V1 is displayed at a position within the depth of field range shown on the view frustum 40 (see Figures 9 and 10).
The depth of field range 42 is displayed within the view frustum 40, and the captured image V1 is displayed inside the display of the depth of field range 42. This causes the captured image V1 to be displayed at a position close to the actual position of the subject within the overhead image V3. Therefore, the viewer can easily grasp the relationship between the shooting range of the view frustum 40, the actual captured image V1, and the position of the captured subject.
In the embodiment, an example has been given in which the video processing unit 71a generates video data in which the captured video V1 is displayed on the focus plane 41 shown on the view frustum 40 (see FIG. 9).
A focus plane 41 is displayed within the view frustum 40, and the captured image V1 is displayed on the focus plane 41. This allows the viewer to easily confirm the focus position of the camera 2 and the image of the subject at that position.
In addition, in the embodiment, an example was given in which the image processing unit 71a generates image data in which the captured image V1 is displayed farther away than the depth of field range 42 when viewed from the frustum starting point 46 (see Figures 12 to 14).
The view frustum 40 is an image that spreads in a quadrangular pyramid shape, and the area of the cross section increases as it goes farther. Therefore, by displaying the captured image V1 on or near the frustum far end surface 45, it is possible to display the captured image V1 relatively large within the view frustum 40. This is suitable, for example, when the contents of the captured image V1 are to be confirmed.
In addition, in the embodiment, an example is given in which the image processing unit 71a generates image data in which the captured image V1 is displayed at a position closer to the frustum starting point 46 (the surface 47 near the frustum starting point) than the depth of field range 42 shown on the view frustum 40 (see FIG. 11).
For example, when it is desired to check the depth of field range 42 or the focus plane 41 in the view frustum 40, or when it is difficult to display the image on the far end surface 45 of the frustum, it is preferable to display the captured image V1 at a position close to the frustum starting point 46.
In the embodiment, an example has been given in which an image generation control unit 71b is provided that controls the generation of image data by variably setting the display position of the captured image V1, which is simultaneously displayed on one screen together with the overhead image V3 and the view frustum 40 (see Figures 7, 23, and 24).
For example, the display position of the captured image V1 is set as any position inside the view frustum 40 or any position outside the view frustum 40. By setting an appropriate position, it is possible to make it easier for the viewer to grasp the captured image V1, and to prevent the view frustum 40 and the captured image V1 from interfering with each other.
In the embodiment, an example has been given in which the image generation control unit 71b determines whether to change the display position of the captured video V1, and changes the setting of the display position of the captured video V1 in accordance with the determination result (see FIG. 24).
For example, a change determination is performed so that the display position of the captured image V1 is automatically changed to an appropriate position, whereby the view frustum 40 and the captured image V1 are displayed in an appropriate positional relationship for the viewer, for example, a positional relationship that provides good visibility or a positional relationship that makes it easy to understand the correspondence relationship.
In the embodiment, an example has been given in which the image generation control unit 71b determines whether or not it is necessary to change the display position of the captured image V1 based on the positional relationship between the view frustum 40 and the object represented in the overhead image V3 (see steps S160 and P1 in Figure 24).
For example, when the far end side of the view frustum 40 is embedded in the ground GR or a structure CN in the overhead image V3, the image may become unnatural or may not be displayed at all when displayed on the frustum far end surface 45. In such a case, the image generation control unit 71b determines that the position setting needs to be changed and changes the position setting of the captured image V1. This makes it possible to automatically provide an easily viewable captured image V1.
In the embodiment, the image generation control unit 71b judges whether or not the display position of the captured image V1 needs to be changed based on the angle determined by the direction from the viewpoint of the entire overhead image V3 and the axial direction of the view frustum 40 (see steps S160 and P2 in FIG. 24). That is, it is the angle between the normal direction on the display screen when viewed from the line of sight direction from the viewpoint set for the overhead image V3 at a certain point in time, and the axial direction of the displayed view frustum 40. As described above, the axial direction of the view frustum 40 is the direction of a perpendicular line drawn from the frustum starting point 46 to the frustum far end surface 45.
The size and direction of the rendered view frustum 40 change according to the angle of view and shooting direction of the camera 2. Depending on the angle of the view frustum 40 in the overhead image V3, it may not be possible to secure a sufficient surface area within the view frustum 40 for displaying the captured image V1. In that case, even if the captured image V1 is displayed, it is difficult for the viewer to confirm the content. Therefore, the image generation control unit 71b determines that the position setting needs to be changed according to the angle of the view frustum 40, and changes the position setting of the captured image V1. This makes it possible to automatically provide the captured image V1 in an easy-to-view state.
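As one possible realization of the angle-based determination described above, the angle between the viewing direction of the overhead image V3 and the axial direction of the view frustum 40 can be computed, and the captured image V1 relocated when the internal planes of the frustum would be seen almost edge-on. The threshold in the sketch below is an assumption.

import math

def needs_reposition(view_dir, frustum_axis, threshold_deg=70.0):
    # view_dir: viewing direction of the overhead image V3; frustum_axis: direction from
    # the frustum starting point 46 toward the frustum far end surface 45.
    dot = sum(a * b for a, b in zip(view_dir, frustum_axis))
    norm = math.sqrt(sum(a * a for a in view_dir)) * math.sqrt(sum(b * b for b in frustum_axis))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    angle = min(angle, 180.0 - angle)      # treat opposite directions the same
    return angle > threshold_deg           # True: move the captured image V1 elsewhere

print(needs_reposition((0.0, -1.0, 0.0), (1.0, 0.0, 0.0)))   # frustum seen almost side-on -> True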
In the embodiment, an example has been given in which the image generation control unit 71b determines whether or not the display position of the captured video V1 needs to be changed based on a change of viewpoint within the overhead video V3 (see steps S160 and P3 in FIG. 24).
For example, changing the viewpoint of the overhead image V3 changes the direction, size, angle, etc. of the view frustum 40. When the viewpoint of the overhead image V3 is changed, the image generation control unit 71b judges whether the display of the captured image V1 up to that point is appropriate, and changes the settings if necessary. This makes it possible to provide the captured image V1 in a state that is always easy to view, even if the viewer arbitrarily changes the overhead image V3.
In the embodiment, an example has been given in which the image generation control unit 71b uses type information of the camera 2 capturing the captured video V1 to set the change destination of the display position of the captured video (see step S163 in FIG. 24).
For example, the change destination of the display position of the captured image V1 is set depending on whether the camera 2 is a fixed type using a tripod 6 or a mobile type. This makes it possible to set a position according to the fixed type camera 2F and the mobile type camera 2M. In particular, in the case of the mobile camera 2M, the view frustum 40 frequently changes, so that an easy-to-view display can be provided by displaying the captured image V1 at a position that is less affected by the change in the view frustum 40.
In the embodiment, an example has been given in which the image generation control unit 71b changes the setting of the display position of the captured video V1 in response to a user operation (see FIG. 23).
The user, who is the viewer, can arbitrarily switch the display position of the captured image V1, thereby allowing the captured image V1 to be displayed at a position that suits the viewer's ease of viewing and purpose.
In the embodiment, an example has been given in which the image generation control unit 71b changes the display position of the captured image V1 within the view frustum 40 (see FIGS. 23 and 24).
For example, within the view frustum 40, switching is performed among the focus plane 41, the frustum far end plane 45, the plane on the frustum starting point 46 side, the plane within the depth of field, etc. This allows the captured image V1 to be displayed at an appropriate position while clarifying the correspondence between the view frustum 40 and the captured image V1.
In the embodiment, an example has been given in which the image generation control unit 71b changes the display position of the captured image V1 between positions inside and outside the view frustum 40 (see FIGS. 23 and 24).
For example, the display position of the captured image V1 is changed within the view frustum 40, such as the focus plane 41, the frustum far end plane 45, the plane on the frustum starting point 46 side, and the plane within the depth of field range, or further, at a position outside the view frustum 40, such as near the camera, in the corner of the screen, or near the focus plane 41. This makes it possible to widely select the display position of the captured image V1 according to the state of the overhead image V3 and the view frustum 40.
In the embodiment, an example is given in which the video processing unit 71a generates video data that simultaneously displays an overhead image V3, each view frustum 40 for each of the multiple cameras 2, and each captured image V1 for each of the multiple cameras 2 on a single screen (see Figures 16, 17, and 27).
The view frustum 40 and the captured images V1 of the multiple cameras 2 are displayed in the CG space 30 represented by the overhead image V3. This allows the viewer to easily understand the relationship between the shooting ranges of the cameras 2. This is convenient for a director, for example, to check the contents of the images captured by each camera 2.
The view frustum 40 is given as an example of a shooting range presentation image, and its shape is a quadrangular pyramid, but it is not limited to this. For example, it may be an image in which multiple rectangular outlines of a quadrangular pyramid cross section are arranged, or an image in which the outline of a quadrangular pyramid is expressed by a dashed line. It is also not necessarily limited to a quadrangular pyramid, and it may be a cone shape, etc.
Alternatively, the shooting range presentation image may display only the focus plane 41 or only the depth of field range 42 .
In addition, the information processing device 70 as, for example, the AR system 5 in the embodiment is equipped with a video processing unit 71a that performs in parallel a process of generating first video data that displays the view frustum 40 (shooting range presentation image) of the camera 2 within the shooting target space 8, and a process of generating second video data that displays an image showing the view frustum 40 within the shooting target space 8 in a display mode different from that of the first video data.
In particular, the first video data and the second video data are the video data of the overhead video V3-1 transmitted to the GUI device 11 and the video data of the overhead video V3-2 transmitted to the camera 2 in the embodiment.
By displaying the view frustum 40 of the camera 2 within the overhead image V3 as the CG space 30, the viewer can easily grasp the correspondence between the image of the camera 2 and the position in the space. By generating video data with different display modes according to the role of each viewer for the overhead image V3 including the view frustum 40, it is possible to present information suited to each viewer through the video display.
In the embodiment, of the video data of the overhead images V3-1 and V3-2, one is video data of an image viewed by a video production instructor, and the other is video data of an image viewed by an operator who performs the shooting operation of the camera 2 with respect to the target space 8.
For example, the overhead image V3-1 has content intended for viewing by a video production instructor such as a director on the GUI device 11, and the overhead image V3-2 has video content intended for viewing by a shooting operator such as a cameraman. By displaying the overhead images V3-1 and V3-2 with different video content for the director and the cameraman in this way, it becomes possible to present information suitable for video production instructions and shooting operations, respectively.
In this case, the video production instructor refers to staff involved in video production, such as a director or a switcher engineer, other than the shooting operator. The shooting operator refers to a cameraman who directly operates the camera 2 or a staff member who remotely operates the camera 2.
In the embodiment, at least one of the video data of the overhead images V3-1 and V3-2 is video data that displays an image including a plurality of view frustums 40 corresponding to a plurality of cameras 2, respectively.
For example, one or both of the overhead images V3-1, V3-2 display view frustums 40 for multiple cameras 2. By displaying multiple view frustums 40, the director, cameraman, etc. can easily grasp the positional relationship of each camera 2 and the subject.
For the overhead image V3-1 viewed by a director or the like, a view frustum 40 is displayed for multiple cameras 2, allowing the director or the like to give various instructions and select main line images while recognizing the position and direction of the subject of each camera 2.
Regarding the overhead view image V3-2 viewed by the cameraman, the view frustum 40 is displayed for the plurality of cameras 2, so that the cameraman can perform shooting operations while taking into consideration the relationship with the other cameras 2.
For the overhead view image V3-2 viewed by the cameraman, only the view frustum 40 may be displayed for his/her own camera 2. In this way, the cameraman can easily grasp the position of the subject in the image V1 captured by his/her own camera operation within the whole image.
Furthermore, in the overhead image V3-2 viewed by the cameraman, only the view frustum 40 of the camera 2 of the other cameraman may be displayed. In this way, the cameraman can operate his own camera while recognizing the shooting locations and subjects of the other camera 2.
In the embodiment, an example is given in which the video processing unit 71a generates video data as at least one of the video data for the overhead images V3-1, V3-2, which displays an image in which a portion of a plurality of view frustums 40 corresponding to a plurality of cameras 2 is displayed in a different manner from the other view frustums 40.
That is, when a plurality of view frustums 40 are displayed, some of them are displayed in a different manner from the other view frustums 40. This makes it possible to realize a display in which a specific view frustum 40 has meaning when displaying a plurality of view frustums 40.
In the embodiment, an example is given in which the video processing unit 71a generates video data that displays an image in which a portion of a plurality of view frustums 40 corresponding to a plurality of cameras 2 is highlighted as at least one of the video data for the overhead images V3-1, V3-2.
When a plurality of view frustums 40 are displayed, a particular view frustum 40 can be clearly identified by displaying some of the view frustums 40 in a more emphasized manner than the other view frustums 40 .
Examples of highlighting include a display with increased brightness, a display using a conspicuous color, a display with emphasized contours, a blinking display, and the like.
In the embodiment, an example is given in which the video processing unit 71a generates video data that displays, as an overhead image V3-1, an image in which the view frustum 40 of a specific camera, which is a camera 2 among multiple cameras 2 that contains a subject of interest in the captured image V1, is displayed in a different manner from the other view frustums 40 (see Figures 28 to 32).
By clearly indicating the view frustum 40 of the camera 2 selected from among the cameras 2 capturing the target subject, it is easy for the director to know which camera is appropriate when he wants to use the image of the target subject as the main line image. It is also easy for the director to understand the positional relationship between the camera 2 capturing the target subject and the shooting direction of the other cameras 2.
An example has also been given in which the specific camera whose view frustum 40 is highlighted is the camera 2 for which the screen occupancy rate of the target subject in the captured image V1 is the highest (see FIGS. 29, 30, and 31).
By clearly indicating the camera 2 showing the target subject most largely within the screen, the director can give instructions while grasping the status of the camera 2 mainly showing the target subject and the other cameras 2.
An example was also given in which the specific camera whose view frustum 40 is highlighted is the camera 2 with the longest continuous shooting time of the subject of interest in the captured image V1 (see FIG. 32).
By clearly indicating the camera 2 that is continuously filming the subject of interest, the director can grasp the status of the camera 2 that mainly films the subject of interest and other cameras 2 and give instructions accordingly.
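For the continuous-shooting-time criterion, one possible bookkeeping sketch is the following; the class and method names are assumptions for illustration:

```python
import time

class ContinuityTracker:
    """Tracks how long each camera 2 has continuously kept the subject of
    interest in frame; the longest run decides which frustum is highlighted."""

    def __init__(self):
        self._since = {}   # camera_id -> time the current run started

    def update(self, camera_id, subject_visible, now=None):
        now = time.monotonic() if now is None else now
        if subject_visible:
            self._since.setdefault(camera_id, now)   # start or keep the run
        else:
            self._since.pop(camera_id, None)         # the run is broken
        return self.longest(now)

    def longest(self, now):
        if not self._since:
            return None
        return max(self._since, key=lambda cid: now - self._since[cid])
```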
In the embodiment, an example was given in which the video processing unit 71a generates, as the video data for the overhead image V3-1, video data in which the view frustum 40 of the camera 2 among the multiple cameras 2 for which a specific operation by the camera operator has been detected is displayed in a mode different from the other view frustums 40 (see Figures 33 and 34).
By allowing the camera operator to send feedback to the director when a good shot is being captured, the director can more easily pick up on the camera operator's situation. In particular, it becomes easier to notice when a good scene is unexpectedly being captured.
In the embodiment, an example is given in which the video processing unit 71a generates video data as video data for the overhead video V3-1 in which, when the view frustums 40 of multiple cameras 2 overlap within the displayed image, the overlapping view frustums 40 are displayed in a different manner from the non-overlapping view frustums 40 (see Figures 35 and 36).
When multiple view frustums 40 overlap, multiple cameras 2 are pointed toward a common subject. Clearly indicating this to the director makes it easier to give instructions regarding the common subject. For example, it is suitable for instructing the cameras 2 to change their focus positions or angles of view, and it also presents information useful for switching the main line image.
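One rough way to decide that two view frustums 40 overlap for display purposes is a corner-containment test against the other frustum's bounding planes. This is only an approximate sketch (it misses edge-through-edge intersections, for which a full convex-polytope test would be needed), and the data representation is an assumption:

```python
import numpy as np

def point_in_frustum(point, planes):
    """planes: iterable of (normal, offset) with inward-pointing normals,
    so a point p is inside when dot(n, p) + offset >= 0 for every plane."""
    return all(np.dot(n, point) + d >= 0.0 for n, d in planes)

def frustums_overlap(corners_a, planes_a, corners_b, planes_b):
    """Approximate overlap test between two frustums given their eight
    corner points and six bounding planes each."""
    return (any(point_in_frustum(c, planes_b) for c in corners_a)
            or any(point_in_frustum(c, planes_a) for c in corners_b))
```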
In the embodiment, an example is given in which the video processing unit 71a generates video data that preferentially displays one of the overlapping view frustums 40 as at least one of the overhead images V3-1, V3-2 when the view frustums 40 of multiple cameras 2 overlap on the displayed image (see Figures 37 and 38).
When multiple view frustums 40 overlap, one view frustum 40 is preferentially displayed in the overlapping portion. For example, in the overlapping portion, the focus plane 41 and depth of field range 42 of only one view frustum 40 that has been set as the priority are displayed. By preventing the display of the focus plane 41 and depth of field range 42 from overlapping, the overhead view video V3 can be made easy to view without being cluttered.
In addition, in the overlapping portion, it is possible to increase the brightness of only one view frustum 40 that has been set as a priority, or to give it a conspicuous color. Furthermore, the above-mentioned highlighted display may be performed. In the overlapping portion, only the view frustum 40 that has been set as a priority may be displayed. These also make it easier to view the overhead image V3 including multiple view frustum 40.
As a specific example, in the overhead image V3-1 viewed by the director, the view frustum 40 of the camera 2 whose image is used as the main line image is displayed with priority, while in the overhead image V3-2 viewed by the camera operator, no particular priority is set.
In addition, there is an example in which no particular priority setting is made in the overhead image V3-1 viewed by the director, but in the overhead image V3-2 viewed by the cameraman, the view frustum 40 of the camera 2 that he operates is displayed with priority.
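A sketch of how such role-dependent priority could be resolved when frustums overlap; the role names and arguments are assumptions for illustration:

```python
def priority_frustum(view_role, own_camera_id, main_line_camera_id,
                     overlapping_camera_ids):
    """Decide which overlapping view frustum 40 keeps its focus plane 41 and
    depth of field range 42 in the overlap region.
    view_role is "director" for V3-1 and "operator" for V3-2."""
    if view_role == "director" and main_line_camera_id in overlapping_camera_ids:
        return main_line_camera_id   # director's view: main line camera wins
    if view_role == "operator" and own_camera_id in overlapping_camera_ids:
        return own_camera_id         # operator's view: own camera wins
    return None                      # no particular priority is applied
```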
In the embodiment, an example has been given in which video processing unit 71a generates video data for displaying, as overhead images V3-1 and V3-2, images including instruction images in different display modes (see FIGS. 39 to 45).
For example, when a director gives instructions by operating the view frustum 40 on the screen, the instruction contents can be confirmed by the instruction frustum 40DR. On the cameraman side, the instruction frustum 40DR is displayed on the screen, so that the cameraman can visually understand the instruction contents. In this case, the overhead images V3-1 and V3-2 are displayed in a way that is appropriate for each role, so that the shooting can proceed smoothly.
In the embodiment, an example is given in which the video processing unit 71a sets the video data of the overhead image V3-1 as video data that displays instruction images for multiple cameras 2, and sets the video data of the overhead image V3-2 as video data that displays instruction images for a specific camera 2 among the multiple cameras (see Figures 39, 41, and 42).
This allows the director to grasp the instructions for each camera, while each camera operator can easily recognize the instructions because only those directed to them are displayed.
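A minimal sketch of this per-viewer filtering, assuming each instruction records its target camera (the dictionary keys are illustrative, not from the embodiment):

```python
def instructions_for_viewer(all_instructions, view_role, own_camera_id=None):
    """all_instructions: list of dicts such as
    {"target_camera": 3, "frustum": ...} issued by the director.
    The director's overhead image V3-1 shows every instruction, while a
    camera operator's V3-2 shows only the instructions addressed to them."""
    if view_role == "director":
        return list(all_instructions)
    return [ins for ins in all_instructions
            if ins["target_camera"] == own_camera_id]
```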
In the embodiment, an example was given in which the video processing unit 71a sets the video data of the overhead image V3-2 as video data that displays the instruction image within an image from a viewpoint corresponding to the position of a specific camera 2 among the multiple cameras (see Figures 42 and 43).
For the camera operator, the instruction frustum 40DR is displayed within the overhead image V3-2 rendered from their own viewpoint, so the indicated direction is easy to understand from what they are already seeing.
In the embodiment, an example has been given in which the video processing unit 71a generates video data for the overhead video V3-2 that displays the current view frustum 40 and a marker image in the shooting direction based on the marking operation (see Figures 46 to 48).
In response to the cameraman performing the marking operation, the bird's-eye view image V3-2 including the marker images of the marker frustum 40M, the marker 55M, etc. is displayed. This allows the cameraman to mark the shooting position or subject that he or she has set, which is convenient for taking pictures of that position at the appropriate time.
Furthermore, by not displaying such a marker image on the director's overhead image V3-1, it is possible to prevent the overhead image V3-1 from becoming unnecessarily cluttered.
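As an illustrative sketch of this marking behavior (the classes and fields are assumptions), the marker could simply store the camera pose and angle of view at the moment of the marking operation and be returned only for that operator's overhead image:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class MarkerFrustum:
    camera_id: int
    position: Tuple[float, float, float]   # camera position when marked
    direction: Tuple[float, float, float]  # shooting direction when marked
    zoom_angle_deg: float                  # angle of view when marked

@dataclass
class MarkerStore:
    markers: List[MarkerFrustum] = field(default_factory=list)

    def mark(self, camera_id, position, direction, zoom_angle_deg):
        """Called when the operator performs the marking operation."""
        self.markers.append(
            MarkerFrustum(camera_id, position, direction, zoom_angle_deg))

    def markers_for_view(self, view_role: str,
                         own_camera_id: Optional[int] = None):
        """Marker images appear only in the operator's own V3-2; the
        director's V3-1 is kept uncluttered."""
        if view_role == "director":
            return []
        return [m for m in self.markers if m.camera_id == own_camera_id]
```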
In the embodiment, an example is given in which the video processing unit 71a generates video data as the video data for the overhead video V3-2, which displays an overhead video from a viewpoint corresponding to the position of a specific camera 2 among multiple cameras, and generates video data as the video data for the overhead video V3-1, which displays an overhead video from a different viewpoint (see Figures 49 to 52).
For the cameraman, the bird's-eye view V3-2 is displayed from the same viewpoint as his/her own viewpoint, making it easy to recognize the overall situation and his/her own shooting direction. For the director, the bird's-eye view V3-1 is displayed from a viewpoint that makes it easy to grasp the whole picture, rather than from the viewpoint of a specific cameraman, making it ideal for directing the entire shoot.
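A sketch of how the rendering viewpoint of the CG overhead image could be chosen per viewer; the pose representation and default viewpoint are assumptions for illustration:

```python
def overhead_viewpoint(view_role, camera_poses, own_camera_id,
                       global_overview_pose):
    """camera_poses: {camera_id: (position, orientation)} of the real cameras.
    The operator's V3-2 is rendered from a viewpoint equivalent to their own
    camera, while the director's V3-1 uses a viewpoint overlooking the whole
    shooting target space."""
    if view_role == "operator" and own_camera_id in camera_poses:
        return camera_poses[own_camera_id]
    return global_overview_pose
```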
In the embodiment, an example has been given in which the video processing unit 71a generates video data for displaying a plurality of overhead views V3-1a, V3-1b from a plurality of viewpoints as the video data for the overhead view V3-1 (see FIGS. 51 and 52).
Since the director needs to understand the shooting conditions of each camera 2, an overhead image V3-1 that provides an overall bird's-eye view from a plurality of viewpoints as shown in FIG. 51 is extremely useful.
In the embodiment, an example has been given in which the video processing unit 71a generates the overhead view video V3 as a virtual video using CG.
This makes it possible to generate an overhead image V3 from any viewpoint, and to display the view frustum 40 and the captured image V1 from a variety of viewpoints.
In the embodiment, the view frustum 40 presents the shooting direction and angle of view at the time of shooting in real time, but a past view frustum 40, for example one from a prior simulation of camera work, may also be displayed.
For example, the current view frustum 40 at the time of shooting and the past view frustum 40 may be displayed at the same time for comparison.
In such a case, it is advisable to make the past view frustum 40 different from the current view frustum 40 by increasing its transparency, for example, so that the cameraman or the like can distinguish between them.
The program of the embodiment is a program that causes a processor such as a CPU or DSP, or a device including these, to execute the processes shown in Figures 20, 21, 22, 23, and 24 described above. That is, the program of the embodiment is a program that causes the information processing device 70 to execute a process of generating video data that simultaneously displays, on one screen, an overhead image V3 of the space to be photographed, a view frustum 40 (shooting range presentation image) that presents the shooting range of the camera 2 within the overhead image V3, and the captured image V1 of the camera 2.
The program of the embodiment is also a program that causes a processor such as a CPU or DSP, or a device including these, to execute the processes shown in Figures 30, 31, 32, 34, 36, 38, 41, 43, 45, 48, and 52 described above. That is, the program of the embodiment is a program that causes the information processing device 70 to execute in parallel a process of generating first video data that displays a view frustum 40 (shooting range presentation image) that presents the shooting range of the camera 2 within the shooting target space, and a process of generating second video data that displays the view frustum 40 within the shooting target space in a display mode different from that of the image generated by the first video data.
These programs allow an information processing device 70 that operates like the AR system 5 described above to be realized using various computer devices.
Such a program can be recorded in advance in an HDD as a recording medium built into a device such as a computer, or in a ROM in a microcomputer having a CPU. Alternatively, such a program can be temporarily or permanently stored (recorded) in a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a Blu-ray Disc (registered trademark), a magnetic disk, a semiconductor memory, or a memory card. Such a removable recording medium can be provided as so-called packaged software.
Such a program can be installed in a personal computer or the like from a removable recording medium, or can be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
Furthermore, such a program is suitable for the widespread provision of the information processing device 70 of the embodiment. For example, by downloading the program to personal computers, communication devices, mobile terminal devices such as smartphones and tablets, mobile phones, game devices, video devices, PDAs (Personal Digital Assistants), etc., these devices can function as the information processing device 70 of the present disclosure.
Note that the effects described in this specification are merely examples and are not limiting, and other effects may also be present.
The present technology can also be configured as follows.
(1)
An information processing device comprising: an image processing unit that generates image data for simultaneously displaying an overhead image of a space to be photographed, a shooting range presentation image that presents the shooting range of a camera within the overhead image, and the image photographed by the camera on a single screen.
(2)
The information processing device according to (1), wherein the image processing unit generates image data in which the captured image is displayed within the shooting range presentation image.
(3)
The information processing device according to (1) or (2), wherein the image processing unit generates image data in which the captured image is displayed at a position within a depth of field range shown in the shooting range presentation image.
(4)
The information processing device according to any one of (1) to (3) above, wherein the image processing unit generates image data in which the captured image is displayed on a focus plane shown in the shooting range presentation image.
(5)
The information processing device according to (2) above, wherein the image processing unit generates image data in which the captured image is displayed farther away than a depth of field range as viewed from a starting point of the shooting range presentation image.
(6)
The information processing device described in (2) above, wherein the image processing unit generates image data in which the captured image is displayed at a position closer to an origin of the shooting range presentation image than a depth of field range shown in the shooting range presentation image.
(7)
The information processing device according to any one of (1) to (6) above, further comprising an image generation control unit that controls generation of image data by variably setting a display position of the captured image that is simultaneously displayed on one screen together with the overhead image and the shooting range presentation image.
(8)
The information processing device according to (7) above, wherein the image generation control unit determines whether to change a display position of the shot image, and changes a setting of the display position of the shot image according to a result of the determination.
(9)
The information processing device according to (8) above, wherein the image generation control unit, in the change determination, determines whether or not it is necessary to change the display position of the captured image based on a positional relationship between the shooting range presentation image and an object represented in the overhead image.
(10)
The information processing device described in (8) or (9) above, wherein, in the change determination, the image generation control unit determines whether or not it is necessary to change the display position of the captured image based on the angle between the direction from the viewpoint of the entire overhead image and the axial direction of the shooting range presentation image.
(11)
The information processing device according to (8) or (9), wherein the image generation control unit, in the change determination, determines whether or not a change is required for a display position of the captured image in accordance with a change in a viewpoint within the overhead image.
(12)
The information processing device according to any one of (7) to (10) above, wherein the image generation control unit uses type information of a camera that captures the captured image to set a destination of the captured image.
(13)
The information processing device according to any one of (7) to (12) above, wherein the image generation control unit changes a setting of a display position of the captured image in response to a user operation.
(14)
The information processing device according to any one of (7) to (13) above, wherein the image generation control unit changes a display position of the captured image within the shooting range presentation image.
(15)
The information processing device according to any one of (7) to (13), wherein the image generation control unit changes a display position of the captured image within the shooting range presentation image and outside the shooting range presentation image.
(16)
The information processing device described in any one of (1) to (15) above, wherein the image processing unit generates image data that simultaneously displays the overhead image, each of the shooting range presentation images for the multiple cameras, and each of the shot images for the multiple cameras on one screen.
(17)
The information processing device according to any one of (1) to (16) above, wherein the overhead image is generated by a virtual image.
(18)
An information processing method in which an information processing device executes a process of generating video data that simultaneously displays an overhead image of a space to be photographed, a shooting range presentation image that presents the camera's shooting range within the overhead image, and the image captured by the camera on a single screen.
(19)
A program that causes an information processing device to execute a process of generating video data that simultaneously displays an overhead image of a space to be photographed, a shooting range presentation image that presents the camera's shooting range within the overhead image, and the image captured by the camera on a single screen.
1, 1A Camera system
2 Camera
3 CCU
4 AI board
5 AR system
6 Tripod
8 Space to be photographed
10 Control panel
11 GUI device
12 Network hub
13 Switcher
14 Master monitor
30 CG space
35 Environment map
40, 40a, 40b, 40c View frustum
40DR Instruction frustum
40M1, 40M2, 40M Marker frustum
41 Focus plane
42 Depth of field range
43 Depth near end plane
44 Depth far end plane
45 Frustum far end plane
46 Frustum origin
47 Frustum origin vicinity plane
V1 Captured image
V2 AR superimposed image
V3 Overhead image
70 Information processing device
71 CPU
71a Image processing unit
71b Image generation control unit

Claims (19)

1. An information processing device comprising: an image processing unit that generates image data for simultaneously displaying an overhead image of a space to be photographed, a shooting range presentation image that presents the shooting range of a camera within the overhead image, and the image photographed by the camera on a single screen.

2. The information processing device according to claim 1, wherein the image processing unit generates image data in which the captured image is displayed within the shooting range presentation image.

3. The information processing device according to claim 1, wherein the image processing unit generates image data in which the captured image is displayed at a position within a depth of field range shown in the shooting range presentation image.

4. The information processing device according to claim 1, wherein the image processing unit generates image data in which the captured image is displayed on a focus plane indicated in the shooting range presentation image.

5. The information processing device according to claim 2, wherein the image processing unit generates image data in which the captured image is displayed farther away than a depth of field range when viewed from a starting point of the shooting range presentation image.

6. The information processing device according to claim 2, wherein the image processing unit generates image data in which the captured image is displayed at a position closer to an origin of the shooting range presentation image than a depth of field range shown in the shooting range presentation image.

7. The information processing device according to claim 1, further comprising an image generation control unit that controls generation of image data by variably setting a display position of the captured image that is simultaneously displayed on one screen together with the overhead image and the shooting range presentation image.

8. The information processing device according to claim 7, wherein the image generation control unit determines whether to change a display position of the captured image, and changes a setting of the display position of the captured image in accordance with a result of the determination.

9. The information processing device according to claim 8, wherein the image generation control unit, in the change determination, determines whether or not it is necessary to change the display position of the captured image based on a positional relationship between the shooting range presentation image and an object represented in the overhead image.

10. The information processing device according to claim 8, wherein the image generation control unit, in the change determination, determines whether or not it is necessary to change the display position of the captured image based on an angle between a direction from a viewpoint of the entire overhead image and an axial direction of the shooting range presentation image.

11. The information processing device according to claim 8, wherein the image generation control unit, in the change determination, determines whether or not it is necessary to change the display position of the captured image in response to a change in a viewpoint within the overhead image.

12. The information processing device according to claim 7, wherein the image generation control unit uses type information of a camera that captures the captured image to set a destination of the captured image.

13. The information processing device according to claim 7, wherein the image generation control unit changes a setting of a display position of the captured image in response to a user operation.

14. The information processing device according to claim 7, wherein the image generation control unit changes a display position of the captured image within the shooting range presentation image.

15. The information processing device according to claim 7, wherein the image generation control unit changes a display position of the captured image within the shooting range presentation image and outside the shooting range presentation image.

16. The information processing device according to claim 1, wherein the image processing unit generates image data for simultaneously displaying the overhead image, each of the shooting range presentation images for a plurality of cameras, and each of the shot images for the plurality of cameras on one screen.

17. The information processing device according to claim 1, wherein the overhead image is generated from a virtual image.

18. An information processing method in which an information processing device executes a process of generating video data that simultaneously displays, on a single screen, an overhead image of a space to be photographed, a shooting range presentation image that presents the camera's shooting range within the overhead image, and the image captured by the camera.

19. A program that causes an information processing device to execute a process of generating video data that simultaneously displays an overhead image of a space to be photographed, a shooting range presentation image that presents the camera's shooting range within the overhead image, and the image captured by the camera on a single screen.
PCT/JP2023/033687 2022-09-29 2023-09-15 Information processing device, information processing method, and program WO2024070761A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022157054 2022-09-29
JP2022-157054 2022-09-29

Publications (1)

Publication Number Publication Date
WO2024070761A1 true WO2024070761A1 (en) 2024-04-04

Family

ID=90477483

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/033687 WO2024070761A1 (en) 2022-09-29 2023-09-15 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2024070761A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0548964A (en) * 1991-08-19 1993-02-26 Nippon Telegr & Teleph Corp <Ntt> Display method of video and its photographing information
JPH08251467A (en) * 1995-03-09 1996-09-27 Canon Inc Display device for camera information
JP2008005450A (en) * 2006-06-20 2008-01-10 Kubo Tex Corp Method of grasping and controlling real-time status of video camera utilizing three-dimensional virtual space
JP2008011433A (en) * 2006-06-30 2008-01-17 Canon Marketing Japan Inc Imaging system, imaging method thereof, image server, and image processing method thereof
JP2013030924A (en) * 2011-07-27 2013-02-07 Jvc Kenwood Corp Camera control device, camera control method, and camera control program


Similar Documents

Publication Publication Date Title
US9858643B2 (en) Image generating device, image generating method, and program
JP7042644B2 (en) Information processing equipment, image generation method and computer program
KR100990416B1 (en) Display apparatus, image processing apparatus and image processing method, imaging apparatus, and recording medium
JP7017175B2 (en) Information processing equipment, information processing method, program
US20110085017A1 (en) Video Conference
JP5861499B2 (en) Movie presentation device
US10681276B2 (en) Virtual reality video processing to compensate for movement of a camera during capture
US11627251B2 (en) Image processing apparatus and control method thereof, computer-readable storage medium
JP2019083402A (en) Image processing apparatus, image processing system, image processing method, and program
JP7378243B2 (en) Image generation device, image display device, and image processing method
JP2019121224A (en) Program, information processing device, and information processing method
US11847735B2 (en) Information processing apparatus, information processing method, and recording medium
WO2020166376A1 (en) Image processing device, image processing method, and program
US20230353717A1 (en) Image processing system, image processing method, and storage medium
KR102200115B1 (en) System for providing multi-view 360 angle vr contents
WO2020017600A1 (en) Display control device, display control method and program
WO2024070761A1 (en) Information processing device, information processing method, and program
WO2024070762A1 (en) Information processing device, information processing method, and program
CN111466113A (en) Apparatus and method for image capture
KR101263881B1 (en) System for controlling unmanned broadcasting
WO2024070763A1 (en) Information processing device, imaging system, information processing method, program
WO2023248832A1 (en) Remote viewing system and on-site imaging system
US20220086413A1 (en) Processing system, processing method and non-transitory computer-readable storage medium
JPH0937140A (en) Photographing simulator and photographing device
JP2023003765A (en) Image generation device and control method thereof, image generation system, and program