WO2024116733A1

WO2024116733A1 - Information processing device, information processing method, and recording medium

Info

Publication number: WO2024116733A1
Application number: PCT/JP2023/039870
Authority: WO
Inventors: 和俊河村; 佑輝中居
Original assignee: ソニーグループ株式会社; 株式会社ソニー・ミュージックエンタテインメント
Priority date: 2022-12-02
Filing date: 2023-11-06
Publication date: 2024-06-06

Abstract

The information processing device according to the present technology comprises a designation reception processing unit that performs processing for having a user designate the subject to be tracked by a tracking camera capable of tracking a subject, from among subjects captured in a sensing image obtained by a sensor device that is separate from the tracking camera and has a light receiving sensor.

Description

Information processing device, information processing method, and recording medium

This technology relates to an information processing device and method, and a recording medium, and in particular to imaging control technology.

Examples of image capture control include controlling the camera's orientation to track a detected subject (i.e., subject tracking control) and controlling the camera's zoom to match a preset angle of view.

The following Patent Document 1 discloses a technique for controlling the orientation of a camera so that an object is captured. Specifically, Patent Document 1 discloses a technique for detecting the subject position based on a captured image and detecting the subject position using a wireless tag carried by the subject, and controlling the orientation of the camera based on these two detection results.
Furthermore, Japanese Patent Laid-Open No. 2003-233663 discloses a technique for controlling the imaging direction and imaging operation of a camera based on position information and audio information transmitted from an individual information transmitter attached to a subject.

JP 2008-288745 A
JP 2005-277845 A

When tracking a subject, the user may be asked to specify the subject to be tracked in advance. However, if that subject has to be chosen from among the subjects appearing in the image captured by the tracking camera that performs the tracking, the user may be unable to specify it properly. For example, this can happen when the subject that the user wants to track lies outside the imaging range of the tracking camera.

This technology was developed in consideration of the above circumstances, and aims to enable the subject to be tracked by a tracking camera to be specified properly.

The information processing device related to the present technology is equipped with a designation reception processing unit that performs processing to allow a user to designate a subject to be tracked by a tracking camera capable of tracking a subject from among subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
According to the above configuration, even if the subject that the user wants to track is located outside the imaging range of the tracking camera, a sensor device other than the tracking camera can capture the subject in a sensing image, and the user can specify the subject that he or she wants to track from the sensing image of the sensor device. In addition, if a sensor device having a light receiving sensor capable of receiving infrared light is used, even if the user cannot identify the subject from the image captured by the tracking camera in a dark environment, the subject can be identified in the sensing image by the sensor device, and it is possible to increase the ease of specifying the subject to be tracked.

In addition, the information processing method related to the present technology is an information processing method in which an information processing device performs processing to allow a user to specify a subject to be tracked by a tracking camera capable of tracking a subject from among subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
In addition, the recording medium related to the present technology is a recording medium on which a program readable by a computer device is recorded, the program causing the computer device to realize a designation reception processing function that performs processing to allow a user to designate a subject to be tracked by a tracking camera capable of tracking a subject from among subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
The information processing device according to the present technology described above is realized by these information processing methods and recording media.

FIG. 1 is a diagram illustrating an example of the configuration of an image processing system including an information processing device according to an embodiment of the present technology.
FIG. 2 is an image diagram of a live music venue envisioned in the embodiment.
FIG. 3 is a block diagram illustrating an example of the hardware configuration of the information processing device according to the embodiment.
FIG. 4 is a diagram showing an example of an operation screen as an embodiment used for composition control for each camera.
FIG. 5 is a diagram for explaining an example of a procedure for designating a tracking target subject.
FIG. 6 is a diagram showing an example of a setting screen (in a common setting mode) in the embodiment.
FIG. 7 is a diagram showing an example of a setting screen (in a common setting mode) on which various information has been input.
FIG. 8 is a diagram showing an example of a setting screen (in a control setting mode) in the embodiment.
FIG. 9 is a diagram showing an example of an operation screen after controller setting.
FIG. 10 is a diagram showing an example of an operation screen when a tracking target subject is designated.
FIG. 11 is likewise a diagram showing an example of an operation screen when a tracking target subject is designated.
FIG. 12 is a diagram showing an example of an operation screen on which a distance image captured by a sensor device is superimposed.
FIG. 13 is a diagram showing an example of a screen display for accepting designation of a tracking composition.
FIG. 14 is a diagram for explaining an example of a method for calculating information for enabling reproduction of a specified composition.
FIG. 15 is an explanatory diagram of an example of a method for calculating parameter information for enabling reproduction of a specified composition.
FIG. 16 is an explanatory diagram of various cases relating to calculation of a target image frame.
FIG. 17 is an explanatory diagram of composition types that can be switched in semi-automatic control.
FIG. 18 is an explanatory diagram of a composition selection table in the embodiment.
FIG. 19 is a diagram for explaining an example of composition selection based on prohibited transition composition information.
FIG. 20 is a flowchart illustrating an example of a processing procedure for implementing composition control that selects a composition based on the composition selection table and switches to the selected composition.
FIG. 21 is a flowchart of a process relating to updating weight information in the composition selection table.
FIG. 22 is a diagram showing an example of a setting procedure for semi-automatic control.
FIG. 23 is likewise a diagram showing an example of a setting procedure for semi-automatic control.
FIG. 24 is an explanatory diagram of functions relating to composition control according to the embodiment.
FIG. 25 is a flowchart showing an example of a processing procedure relating to a screen display corresponding to a tracking target subject designated by a user.
FIG. 26 is a flowchart of a process relating to display switching according to the brightness of a captured image displayed on the operation screen.
FIG. 27 is a flowchart of a process to be executed in response to a tracking composition designated by a user.
FIG. 28 is a flowchart of a process relating to calculation of a target image frame for realizing a specified tracking composition.
FIG. 29 is a diagram illustrating an outline of a calibration process according to the embodiment.
FIG. 30 is an explanatory diagram of a calibration processing technique as a first technique.
FIG. 31 is a flowchart showing an example of a specific procedure for performing calibration by the first technique.
FIG. 32 is a flowchart of a process executed by the information processing device when the first technique is adopted.
FIG. 33 is an explanatory diagram of a calibration processing technique as a second technique.
FIG. 34 is a flowchart showing an example of a specific procedure for performing calibration by the second technique.
FIG. 35 is an explanatory diagram of an example in which a user is prompted to specify positions on an image that are identical points on an object.
FIG. 36 is a diagram showing an example of a group shot setting acceptance screen.
FIG. 37 is a diagram showing an example of an execution instruction button for instructing a tracking composition as a group shot.
FIG. 38 is an explanatory diagram of a modified example in which the tracking sensitivity can be variably set.
FIG. 39 is a diagram for explaining examples of the arrangement of manual composition adjustment buttons.
FIG. 40 is a diagram illustrating an example of a composition control technique in response to a manual operation input.
FIG. 41 is an explanatory diagram of a modified example in which a selection ratio by a switcher is displayed.

Hereinafter, with reference to the accompanying drawings, embodiments of the present technology will be described in the following order.
1. Image Processing System as an Embodiment
2. Hardware configuration of information processing device
3. Composition Control as an Embodiment
(3-1. Specifying the subject to be tracked)
(3-2. Specifying the tracking composition)
(3-3. Semi-automatic control)
(3-4. Functional configuration related to composition control as an embodiment)
4. Processing Procedure
5. Calibration
6. Modifications
7. Recording medium
8. Summary of the embodiment
9. Present technology

1. Image Processing System as an Embodiment
FIG. 1 shows an example of the configuration of an image processing system 100 including an information processing device 1 according to an embodiment of the present technology.
As shown in the figure, the image processing system 100 includes an information processing device 1, a parent camera 2, a tracking camera 3, a pan head 4, a switcher 5, and a distance measuring/imaging device 6.

In this example, there is a single parent camera 2 and a plurality of tracking cameras 3; specifically, three tracking cameras 3 are used. When the three tracking cameras 3 need to be distinguished, their reference numerals are written as "3-1", "3-2", and "3-3", with a "-" (hyphen) and a number added to the end of the numeral as shown in the figure.

The parent camera 2 and the tracking camera 3 are configured as imaging devices that have imaging elements such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor to capture images.

The pan head 4 is configured as an electronic pan head, and supports the tracking camera 3 while being capable of changing the orientation of the tracking camera 3 in both the pan direction and the tilt direction based on an external control signal.
In this example, a pan head 4 is provided for each tracking camera 3. In the following, when the three pan heads 4 need to be distinguished, their reference numerals are written as "4-1", "4-2", and "4-3", with a "-" (hyphen) and a number added to the end of the numeral as shown in the figure.

In the image processing system 100, a subject is imaged from a plurality of viewpoints by a plurality of cameras such as the parent camera 2 and the tracking cameras 3, and the plurality of image systems obtained from this imaging are input to the switcher 5.
The switcher 5 selects and outputs one of the images from the multiple images input based on an operation. In this example, the captured image content for the target event is generated from the images selected and output by the switcher 5.
The captured image content generated based on the output image of the switcher 5 can be distributed via a network such as the Internet, or can be transmitted by broadcast waves. Alternatively, the captured image content can be recorded (stored) on a predetermined recording medium.

In this example, the event to be imaged by the cameras is a live music event, and the parent camera 2 and tracking camera 3 are installed at the live music venue.

FIG. 2 is an image diagram of a live music venue assumed in this embodiment.
As shown in the figure, the live music venue is equipped with a stage, an audience seating area, and a Front of House (FOH). Performers such as musicians and singers perform on the stage.
The seating area is located behind the stage and is a space capable of accommodating spectators.
The FOH is located behind the seating area and is a space where various devices for controlling the sound of the venue and elements related to the live performance such as lighting are located. People from the concert organizers, such as directors and staff, can enter the FOH.

In this example, the parent camera 2 is a camera for capturing the entire stage within its angle of view, and is disposed at the FOH. In this example, the resolution of the image captured by the parent camera 2 is 4K (3840×2160), whereas the resolution of the output image by the switcher 5 is FHD (Full High Definition: 1920×1080). In this example, a camera without an optical zoom is used as the parent camera 2.
Two of the tracking cameras 3 are placed in the space between the stage and the audience area (the space in front of the fence), making it possible to capture the performers on stage within the field of view from a position closer than the FOH. As shown in the figure, these two tracking cameras 3 are placed at both ends in the left-right direction (the direction perpendicular to the front-rear direction).
The remaining one of the tracking cameras 3 is disposed at the FOH and is used as a camera for capturing a telephoto image of the performers on stage.
In this example, a camera equipped with an optical zoom is used as each tracking camera 3. Also, in this example, each tracking camera 3 is configured to be able to change the output resolution of the captured image. Specifically, the output resolution can be switched between at least 4K and FHD.

As described below, in this example of image processing system 100, the orientation of tracking camera 3 is controlled so that it tracks a subject, such as a performer on stage. In other words, each tracking camera is a camera capable of tracking a subject.

As shown in the figure, of the two tracking cameras 3 placed in front of the fence at the front of the stage, the one placed on the stage-right side (shimote, the left side as seen when facing the stage) is tracking camera 3-1, and the one placed on the stage-left side (kamite, the right side as seen when facing the stage) is tracking camera 3-2. The tracking camera 3 placed at the FOH is tracking camera 3-3.

To enable the tracking camera 3 to track a subject, the image processing system 100 is provided with a distance measuring/imaging device 6 for detecting the three-dimensional position (position in three-dimensional space) of the subject on the stage.

The distance measuring/imaging device 6 is a type of sensor device that is separate from the tracking camera 3 and has a light receiving sensor. In this example, it is configured as a device having both a distance measuring function and a visible-light imaging function, such as a Kinect. Specifically, the distance measuring/imaging device 6 in this example has, as light receiving sensors, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light. It provides an imaging function that obtains a visible-light captured image from the light reception signal of the visible light sensor, and a distance measuring function that obtains a distance image from the light reception signal of the infrared light sensor.
The distance measuring/imaging device 6 in this example has a function of outputting a captured image and a distance image, and is also capable of generating a bone estimation model of the subject by image recognition processing based on the distance image.
The bone estimation model referred to here means a simplified model of a subject that represents the structure of the subject using main parts such as the face, chest, pelvis, shoulders, elbows, hands, knees, and ankles. Hereinafter, the bone estimation model will be referred to as a "simplified model."
The distance measuring/imaging device 6 also has a function of outputting information indicating the three-dimensional position of each part of the simplified model of the subject.

In this example, a plurality of distance measuring/imaging devices 6 are used; specifically, two are arranged at each of the left and right ends of the stage, for a total of four. These distance measuring/imaging devices 6 are arranged so that, even if the subject moves on the stage, the subject remains within the distance measuring range of at least one of them.
Here, the two distance measuring/imaging devices 6 placed on the near side at the left and right ends of the stage are referred to as distance measuring/imaging devices 6-1 and 6-2, and the two placed on the far side at the left and right ends of the stage are referred to as distance measuring/imaging devices 6-3 and 6-4.
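How position reports from several distance measuring/imaging devices 6 might be used together can be sketched as follows. This is only a minimal illustration: the report format, the function name `locate_part`, and the rule of taking the first device that sees the subject are assumptions, not details given in the embodiment.

```python
def locate_part(reports, part="face"):
    """Return (device_id, position) from the first device that detected `part`.

    `reports` maps a device id such as "6-1" to a dict of part name ->
    (x, y, z) position, or to None when that device does not see the subject.
    Returns None when no device has detected the part.
    (The report format is hypothetical, chosen only for this sketch.)
    """
    for device_id, model in reports.items():
        if model and part in model:
            return device_id, model[part]
    return None
```

For example, when device "6-1" reports nothing and device "6-2" reports a face position, the function returns the report from "6-2", which reflects the idea that the subject only needs to be inside the range of at least one device.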

As shown in FIG. 1, in the image processing system 100 of this embodiment, images from a total of six systems, CAM1 to CAM6, are input to the switcher 5.
The images CAM1 to CAM3 are the images captured by the three tracking cameras 3. The images CAM4 to CAM6 are generated by the information processing device 1, which serves as a computer device, based on the image captured by the parent camera 2.

The images in CAM4, CAM5, and CAM6 are cut out from the image captured by the parent camera 2. For these images in CAM4, CAM5, and CAM6, it is possible to adjust not only the cut-out size but also the cut-out position.

Here, changing the cut-out position for a captured image is equivalent to virtually changing the positional relationship between the camera and the subject, and changing the cut-out size for a captured image is equivalent to virtually changing the optical zoom magnification. In other words, obtaining an image by changing the cut-out position or cut-out size for a captured image can be said to be equivalent to changing the composition by moving or operating a virtual camera.
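The equivalence between a cut-out and a virtual camera can be illustrated with a minimal Python sketch. It assumes the 4K (3840×2160) parent-camera frame mentioned earlier; the function name `cutout_rect` and its parameters are hypothetical and are not taken from the embodiment.

```python
def cutout_rect(center_x, center_y, zoom, src_w=3840, src_h=2160):
    """Return (left, top, width, height) of a cut-out inside the source frame.

    zoom = 1.0 cuts out the whole frame; zoom = 2.0 cuts out half the width
    and half the height, which corresponds to a 2x virtual zoom.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0 (cannot cut out more than the frame)")
    width = round(src_w / zoom)
    height = round(src_h / zoom)
    # Clamp the rectangle so it never extends beyond the source frame.
    left = min(max(round(center_x - width / 2), 0), src_w - width)
    top = min(max(round(center_y - height / 2), 0), src_h - height)
    return left, top, width, height
```

Moving `center_x` and `center_y` plays the role of moving the virtual camera, and changing `zoom` plays the role of changing the optical zoom magnification. With `zoom = 2.0` the cut-out is exactly 1920×1080, so it matches the FHD output resolution without scaling.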

In this specification, the term "camera" is used as a concept that includes both such virtual cameras and real cameras such as the parent camera 2 and tracking camera 3. In other words, when "camera" is used in this specification, it refers to a concept that includes both real cameras and virtual cameras that virtually change the composition by cutting out part of an image obtained by the light receiving operation of the real camera.

In addition, the term "imaging" is used in this specification, but this "imaging" refers to the action of obtaining an image using a camera when both real cameras and virtual cameras are defined as "cameras" as described above.

In the following explanation, image cropping may also be referred to as "cutout."

In addition, in this example, the image of CAM6 is generated and output by cutting out, for example, an image with a basic angle of view that is narrower than the angle of view of the image captured by the parent camera 2.

The information processing device 1 is configured as a computer device having, for example, a CPU (Central Processing Unit). It generates the images CAM4 to CAM6 as described above based on the image captured by the parent camera 2, and performs composition control for the images captured by the tracking cameras 3 as real cameras. Specifically, based on an operation input, it controls the composition of an image captured by a tracking camera 3 by controlling pan and tilt through the pan head 4 and by controlling the zoom of the tracking camera 3.
The composition control according to the embodiment will be described in detail later.
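As a rough illustration of the pan and tilt control mentioned above, the following Python sketch computes the angles that point a camera at a three-dimensional position. The coordinate system, the angle conventions, and the function name `pan_tilt_to_target` are assumptions made for this sketch; the embodiment's actual calculation is described later and may differ.

```python
import math

def pan_tilt_to_target(camera_pos, target_pos):
    """Return (pan_deg, tilt_deg) that point the camera at target_pos.

    Assumed coordinate system (not from the source): x to the right,
    y forward, z up. Pan 0 deg looks along +y, positive pan turns
    towards +x, and positive tilt looks upward.
    """
    dx = target_pos[0] - camera_pos[0]
    dy = target_pos[1] - camera_pos[1]
    dz = target_pos[2] - camera_pos[2]
    pan = math.degrees(math.atan2(dx, dy))
    # Tilt is the elevation above the horizontal plane.
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan, tilt
```

A target straight ahead at the same height gives pan and tilt of 0 degrees, while a target equally far forward and to the right gives a pan of 45 degrees.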

2. Hardware configuration of information processing device
FIG. 3 is a block diagram showing an example of the hardware configuration of the information processing device 1.
The information processing device 1 may take the form of a personal computer, for example.
As shown in FIG. 3, the CPU 11 of the information processing device 1 executes various processes according to programs stored in the ROM (Read Only Memory) 12 or in a non-volatile memory unit 14 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or according to programs loaded from a storage unit 19 into the RAM (Random Access Memory) 13. The RAM 13 also stores, as appropriate, data necessary for the CPU 11 to execute various processes.
The CPU 11, the ROM 12, the RAM 13, and the non-volatile memory unit 14 are interconnected via a bus 23. The input/output interface 15 is also connected to this bus 23.

The input/output interface 15 is connected to an input unit 16 made up of operators and operating devices.
For example, the input unit 16 may include various operators and operating devices such as a keyboard, a mouse, keys, a dial, a touch panel, a touch pad, or a remote controller.
A user operation is detected by the input unit 16, and a signal corresponding to the input operation is interpreted by the CPU 11.

In addition, the input/output interface 15 is connected, either integrally or separately, to a display unit 17 formed of an LCD (Liquid Crystal Display) or an organic EL (Electro-Luminescence) panel, or the like, and an audio output unit 18 formed of a speaker, or the like.
The display unit 17 performs various displays and is configured, for example, by a display device provided in the housing of the information processing device 1 or by a separate display device connected to the information processing device 1.
The display unit 17 executes display of images for various image processing, moving images to be processed, etc. on the display screen based on instructions from the CPU 11. Furthermore, the display unit 17 displays various operation menus, icons, messages, etc., that is, GUI (Graphical User Interface), based on instructions from the CPU 11.

The input/output interface 15 may be connected to a storage unit 19 configured with a hard disk or solid-state memory, or a communication unit 20 configured with a modem or the like.
The communication unit 20 performs communication processing via a transmission path such as the Internet, and communication with various devices via wired/wireless communication, bus communication, and the like.

A drive 21 is also connected to the input/output interface 15 as required, and a removable recording medium 22 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is appropriately mounted thereon.
The drive 21 allows data files such as image files and various computer programs to be read from the removable recording medium 22. The read data files are stored in the storage unit 19, and images and sounds contained in the data files are output by the display unit 17 and the audio output unit 18. In addition, the computer programs and the like read from the removable recording medium 22 are installed in the storage unit 19 as necessary.

In this information processing device 1, software can be installed via network communication by the communication unit 20 or via the removable recording medium 22. Alternatively, the software may be stored in advance in the ROM 12, the storage unit 19, etc.

3. Composition Control as an Embodiment
FIG. 4 is a diagram showing an example of an operation screen Gm as an embodiment used for composition control for each camera.
In this example, the operation screen Gm is a screen displayed on the display unit 17 of the information processing device 1, and as shown in the figure, an execution instruction button Bm for issuing an instruction to execute composition control is arranged for each of the cameras CAM1 to CAM6. Specifically, in the operation screen Gm of this example, a camera name display area Arc for displaying name information of each of the cameras CAM1 to CAM6 is provided, and an execution instruction button Bm is arranged for each camera.
A user, such as a director, can operate an execution instruction button Bm displayed for a certain camera to instruct the information processing device 1 to execute composition control for that camera in a manner determined corresponding to the operated execution instruction button Bm.

In this example, two items related to composition control are defined: "Production" and "Target". The operation screen Gm is provided with a production area Are in which the execution instruction buttons Bm related to "Production" are arranged for each camera, and a target area Art in which the execution instruction buttons Bm related to "Target" are arranged for each camera.

The "Production" composition control is composition control based on imaging production information. The imaging production information can be described as information indicating how the captured image is made to change; specifically, it is information indicating a visual production mode for the captured image.
Examples of composition control related to "Production" include control of "gentle zoom" (a slow zoom technique often used when filming live music performances) and "handheld feel" (the effect of filming with a handheld camera).
There are four types of gentle zoom: "pull-to-close" (i.e. zooming in), "close-to-pull" (zooming out), "alternate," and "random" (a random combination of pull-to-close and close-to-pull).
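The four gentle-zoom types can be sketched as a small Python generator that yields the zoom direction for each successive zoom movement. The function name `zoom_directions` and the labels "in" and "out" are hypothetical, and the assumption that "alternate" starts by zooming in is made only for this sketch.

```python
import itertools
import random

def zoom_directions(mode, rng=None):
    """Yield "in" or "out" endlessly for the given gentle-zoom type."""
    rng = rng or random.Random()
    if mode == "pull-to-close":
        # Always zoom in (wide shot to close shot).
        yield from itertools.repeat("in")
    elif mode == "close-to-pull":
        # Always zoom out (close shot to wide shot).
        yield from itertools.repeat("out")
    elif mode == "alternate":
        # Assumption: alternation starts with a zoom in.
        yield from itertools.cycle(["in", "out"])
    elif mode == "random":
        while True:
            yield rng.choice(["in", "out"])
    else:
        raise ValueError(f"unknown gentle-zoom type: {mode}")
```

Because the function is a generator, a caller would take as many directions as it needs, for example with `itertools.islice(zoom_directions("alternate"), 4)`.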

The user can specify in advance, in the "Control Settings" on the setting screen Gs described later (see FIG. 8), which types of "Production" execution instruction buttons Bm are placed for each camera in the production area Are.

"Target" is an item related to specifying the subject to be tracked. By operating the execution instruction button Bm belonging to this "Target" item, composition control is performed to realize a composition that includes the subject corresponding to the execution instruction button Bm in the captured image.

In this example, the composition control of the "target" may be either control that uses the subject's position detection results or control that does not. An example of composition control that uses the subject's position detection results is subject tracking control. In this case, the execution instruction button Bm is associated with the subject to be tracked, and by operating the execution instruction button Bm, control is performed to realize a composition that tracks the associated subject. For example, if the performers to be imaged are members of a rock band, the "target" execution instruction button Bm can be associated with the performer as "vocalist" or "guitarist," and the user can operate the execution instruction button Bm to give an instruction to execute control that realizes a composition that tracks the performer as "vocalist" or "guitarist."

An example of composition control that does not use the subject position detection result is control according to a "fixed position," described later. The "fixed position" is a function that allows a specific position in the space to be imaged to be preset as the position of the imaging target. By operating a "fixed position" execution instruction button Bm, the user can instruct execution of control that realizes a composition in which the preset specific position fits within the captured image (for example, a composition in which the specific position coincides with a predetermined position such as the center of the captured image). For example, in the figure, the execution instruction buttons Bm for "drum left", "drum right", "drum center", and "pull back" are "fixed position" execution instruction buttons Bm.
As can be understood from the above description, the subject (target) of imaging in the composition control of this embodiment is not limited to a specific person, but also includes a specific position.

Here, in this embodiment, the user can arbitrarily specify the composition for tracking a subject with the camera. Specifically, the position and size (zoom amount) of the target to be tracked within the image frame can be arbitrarily specified. Therefore, when the execution instruction button Bm, which is arranged in the target area Art in the figure and instructs control to realize a composition for tracking a target subject such as "vocals," "guitar," or "bass," is operated, the pan, tilt, and zoom of the target camera are controlled so that the tracking composition previously specified for the target subject is realized.

On the operation screen Gm, the information processing device 1 (CPU 11) displays the execution instruction buttons Bm for which composition control is being executed in a display mode different from that of the execution instruction buttons Bm for which it is not, for example by highlighting a button that has been operated and whose corresponding composition control is in progress. This applies both to the execution instruction buttons Bm arranged in the production area Are and to those arranged in the target area Art.

In this example, a tracking indicator It is provided for an execution instruction button Bm for issuing an instruction to execute composition control for tracking a target subject in a target area Art of the operation screen Gm. For the execution instruction button Bm provided with the tracking indicator It, the information processing device 1 performs a process of changing the display mode of the tracking indicator It depending on whether the distance measuring/imaging device 6 has detected a tracking target subject corresponding to the execution instruction button Bm.
For example, if the subject to be tracked is detected by any of the distance measuring/imaging devices 6, the display color of the tracking indicator It is a specific color such as blue, and if the subject to be tracked cannot be detected by any of the distance measuring/imaging devices 6, the display color of the tracking indicator It is a color other than the specific color such as gray.
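The indicator rule described above can be written as a one-line check. The following minimal Python sketch uses the example colors from the text; the function name `tracking_indicator_color` and the list-of-booleans input are hypothetical.

```python
def tracking_indicator_color(detected_by_device):
    """Return the indicator color for one tracking target subject.

    `detected_by_device` holds one boolean per distance measuring/imaging
    device: True if that device currently detects the subject.
    """
    # Blue if any device detects the subject, gray if none does.
    return "blue" if any(detected_by_device) else "gray"
```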

Furthermore, for an execution instruction button Bm located in the production area Are, the level of the production-related composition control executed by operating that button can be displayed (see the numbers in squares in the figure). For example, the zoom distance of the gentle zoom can be set in, for example, 10 steps on the setting screen Gs described later, so a numerical value indicating the zoom-distance step may be displayed. Likewise, the level of the handheld feel can be set in, for example, 10 steps on the setting screen Gs, so a numerical value indicating that level may be displayed.

On the operation screen Gm, for example, at the top of the screen, a controller name display area Acn, a pull-down button Bpd, a tracking target display area At, and a menu button B0 are provided.
The controller name display area Acn is an area for displaying name information of the controller.
It is desirable that the execution instruction buttons Bm arranged for each camera can be switched for each imaging subject, for example for each artist to be imaged. In this example, multiple combinations of the arrangement of the execution instruction buttons Bm for each camera can be set as "controllers" on the setting screen Gs described later. When multiple controllers are set, the user can specify the controller to be called up on the operation screen Gm from a list of controllers by operating the pull-down button Bpd on the operation screen Gm.

The tracking target display area At is an area that displays identification information for each of the subjects that can be designated as tracking targets. In this example, the number of subjects that can be designated as tracking targets is set to "6," and therefore six individual display areas, individual display area t1 to individual display area t6, are provided in the tracking target display area At for displaying the identification information of each tracking target subject. As shown in the figure, each individual display area has an image display area ti and a name display area tn.
As described later, in this embodiment, the designation of the tracking target subject is performed from among the subjects appearing in the captured image of the distance measuring/image capturing device 6, and Fig. 4 shows a state after such designation of tracking target subjects has been performed for up to five of the six subjects that can be designated as tracking targets. In the image display area ti for the subject for which the tracking target has been designated, an image captured by the distance measuring/image capturing device 6 for the designated subject is displayed.
Here, the composition of the image of the corresponding tracking target subject displayed in the image display area ti is a composition in which the upper half of the subject is visible.
In addition, if the ranging/imaging device 6 loses the subject to be tracked, the image display area ti will be switched to display a specific image other than the image captured by the ranging/imaging device 6 (for example, an icon image resembling the upper body of a person, displayed in the image display area ti of the sixth tracking target subject in the figure; hereinafter referred to as the "lost notification image").

In each individual display area in the tracking target display area At, the name display area tn displays the name information set in the setting screen Gs described below. If the name information has not been set, it will be displayed with a predetermined initial character string, such as "No Name" in the figure.

Also, on the operation screen Gm, the semi-auto ON/OFF button B1 is a button for instructing ON/OFF of the semi-auto composition control described below.

The menu button B0 is a button for calling up various menus.

Here, on the operation screen Gm, in the camera name display area Arc, it is also possible to display the camera being selected by the switcher 5 (the camera currently outputting PGM) and the so-called NEXT (Preview) camera (the camera that will be outputted as PGM next after the camera currently outputting PGM: a candidate camera for PGM output) (see the shaded and matte areas in the figure).

3. Composition Control as an Embodiment
(3-1. Specifying the subject to be tracked)
The procedure for designating a tracking target subject will be described with reference to FIGS.
To start designating a tracking target subject, as shown in FIG. 5, the menu button B0 on the operation screen Gm is operated, and the "Settings" item is selected from the called menu M.

FIG. 6 shows an example of a setting screen Gs that is displayed on the display unit 17 in response to a selection operation of the "Settings" item.
As shown in the figure, the setting screen Gs is provided with a common setting button B2, a control setting button B3, and a start button Bs, as well as a camera switching operation area As1, a subject name setting area As2, and a controller setting area As3.
FIG. 6 shows an example of the setting screen Gs that is displayed when the common setting button B2 is selected out of the common setting button B2 and the control setting button B3.

The camera switching operation area As1 is provided with a camera switching button Bc and a camera name input box bcm. The camera switching button Bc allows the camera for which the camera name is to be set to be switched between CAM1 to CAM6.
In the camera switching operation area As1, the user can set name information for the selected camera by selecting the camera for which the user wants to set a name using the camera switching button Bc, and then inputting name information into the camera name input box bcm.

In the subject name setting area As2, subject name input boxes (bt1 to bt6) are provided for the number of possible tracking target subjects. As described above, in this example, the number of possible tracking target subjects is 6, and subject name input boxes bt1 to bt6 are provided.
The name information input in this subject name input box is reflected in the corresponding name display area tn in the tracking target display area At of the operation screen Gm.

In the controller setting area As3, controller slots are displayed, and within each slot, a controller name input box bcn and a controller delete button Be1 are provided. The controller name input box bcn allows the user to set name information for the controller corresponding to the slot. In addition, by operating the delete button Be1, the user can instruct deletion of the controller corresponding to the slot.
In addition, an add controller button Bp1 is provided in the controller setting area As3, and the addition of a new controller slot can be instructed by operating the add button Bp1.

Figure 7 shows an example of the setting screen Gs after the camera name, tracking target subject name, and controller name have been set from the state shown in Figure 6.

When the start button Bs is operated on the setting screen Gs, the display transitions to an operation screen Gm that reflects the combination of execution instruction buttons Bm for each camera that has been set for the controller. At this time, the operation screen Gm displays the controller name that has been set for the selected controller in the controller name display area Acn.

Here, on the setting screen Gs, the control setting button B3 can be operated to transition to the control setting mode, whereby the layout and combination of the execution instruction buttons Bm for each camera can be set, i.e., the controller can be set.

FIG. 8 shows an example of the setting screen Gs in the control setting mode.
The setting screen Gs in the control setting mode is provided with a common setting button B2, a control setting button B3, and a start button Bs, similar to the setting screen Gs in the common setting mode shown in Figure 6, and is also provided with a camera switching button Bc, a controller name display area Acn, a pull-down button Bpd, a performance setting area As4, and a preset setting area As5.

The performance setting area As4 is provided with check boxes cb that allow composition control related to each effect to be specified individually; specifically, in this example, "pull back → close up", "random", "close up → pull back", and "alternate", which relate to the "gentle zoom" effect, and "handheld feel". The performance setting area As4 is also provided with a setting operation section, such as a slide bar, for setting the zoom distance (zoom amount) for at least the "gentle zoom".
The strength of the "handheld feel" is expressed as a change in at least one of the magnitude of blurring of the composition and the speed of the blurring.

In the preset setting area As5, preset slots are displayed. Here, setting a preset corresponds to setting the execution instruction button Bm to be placed in the target area Art on the operation screen Gm. Specifically, it is setting the composition control content to be associated with the execution instruction button Bm.

Within the preset slot, there are provided a preset name input box bpn for setting the preset name (i.e., the name information to be displayed on the execution instruction button Bm), a check box cbs for specifying composition control that captures the aforementioned "fixed point position" as a composition control mode, and a check box cbt for specifying composition control that tracks the subject.
When the latter check box cbt is checked, check boxes cb for selecting which subject is to be tracked are displayed in the preset slot.
As the check boxes cb here, the check boxes cb for the tracking target subjects for which name information has been set in the previously described subject name setting area As2 (see FIG. 6) are displayed. In the example of FIG. 7, name information has been set for five people, a vocalist, a guitarist, a bassist, a drummer, and a keyboardist, and accordingly, here, an example is shown in which check boxes cb (cb1 to cb5) for these five people are displayed.

In the preset setting area As5, a delete button Be2 for deleting the preset corresponding to the slot is disposed in the preset slot.
In addition, an add button Bp2 for instructing the addition of a preset slot is also arranged in the preset setting area As5.

On the setting screen Gs in the control setting mode described above, the user can set the execution instruction buttons Bm to be displayed on the operation screen Gm for the desired camera by operating the performance setting area As4 and the preset setting area As5 while switching the camera to be set using the camera switching button Bc.

The above description has taken the example of setting the execution instruction buttons Bm to be displayed on the operation screen Gm for the controller set with the name "Artist ○△". However, if multiple controller slots have been created in the controller setting area As3 (see FIG. 6) described above, the user can operate the pull-down button Bpd on the setting screen Gs shown in FIG. 8 to call up another controller on the setting screen Gs and set its execution instruction buttons Bm in the same way.

FIG. 9 shows an example of the operation screen Gm after the controller has been set.
Specifically, an operation screen Gm is shown as an example that is displayed in response to the start button Bs being operated after the setting operation for the controller of "artist ○△" on the setting screen Gs shown in FIG.

In this state, the name information of the subject to be tracked is set, but it is not yet set which subject should be specifically tracked.
In this embodiment, the tracking target subject, that is, which specific subject is to be tracked, is specified as follows: the user specifies the tracking target subject from among the subjects captured in the sensing image obtained by the distance measuring/imaging device 6 (that is, a sensor device that has a light receiving sensor and is separate from the tracking camera 3).

In this case, the user first performs an operation (e.g., a tap operation) to select the individual display area of the subject to be specified in the tracking target display area At of the operation screen Gm. Here, it is assumed that an operation to select the individual display area t1 of the subject as the "vocalist" has been performed (see <1> in the figure).

When an operation for selecting any one of the individual display areas is performed in the tracking target display area At of the operation screen Gm, the information processing device 1 performs a process for superimposing and displaying the image Ims captured by the distance measuring/imaging device 6 on the operation screen Gm, as shown in Fig. 10. Specifically, in this example, the captured image Ims is superimposed and displayed on the performance area Are.
Here, there are four distance measuring and imaging devices 6, and only the captured images Ims from three of the devices are displayed as an example, but it is of course possible to display the captured images Ims from all four devices.

At this time, the information processing device 1 performs subject recognition processing on the captured images Ims input from each distance measurement/imaging device 6, and displays a detection frame Fd for the subject appearing in each captured image Ims in a superimposed manner.

The user performs an operation to specify the subject (vocalist in this example) selected in the previous operation in Fig. 9 from the subjects appearing in the captured image Ims displayed on the operation screen Gm in this way (see <2> in the figure). Specifically, this specification operation is an operation to select the detection frame Fd of the corresponding subject, specifically, for example, an operation to tap inside the detection frame Fd.
This operation of specifying the detection frame Fd needs to be performed for at least one captured image Ims.
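
As a rough illustration of how a tap operation could be resolved to a detection frame Fd, the following sketch tests which frame contains the tapped point. The class and field names, the rectangle representation, and the rule of preferring the smallest frame when frames overlap are all assumptions made for this sketch; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DetectionFrame:
    subject_id: int  # hypothetical identifier assigned by subject recognition
    left: float      # frame position and size in captured-image pixels
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def select_detection_frame(frames: Sequence[DetectionFrame],
                           tap_x: float,
                           tap_y: float) -> Optional[DetectionFrame]:
    """Return the detection frame Fd selected by a tap, or None."""
    hits = [f for f in frames if f.contains(tap_x, tap_y)]
    if not hits:
        return None
    # Assumption of this sketch: when frames overlap, the smallest frame
    # containing the tap is taken as the intended one.
    return min(hits, key=lambda f: f.width * f.height)
```

A tap outside every frame returns None, which a caller could treat as "no subject specified".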

This selection operation of the detection frame Fd specifies which specific subject is to be tracked as the subject of the individual display area selected in the preceding operation <1>.

In response to a selection operation of the detection frame Fd, the information processing device 1 performs a process of displaying name information of the subject specified by the selection operation of the detection frame Fd, i.e., the subject specified as the tracking target, at a position corresponding to the detection frame Fd, as shown in Figure 11.
In addition, in response to the selection operation of the detection frame Fd, the information processing device 1 performs a process of displaying an image of the subject to be tracked, extracted from the captured image Ims, in the image display area ti of the individual display area selected by the previous operation <1>, as illustrated in Figure 11 (see <3> in the figure).
As will be understood from the above description, the image displayed in the image display area ti is an image of the upper body of the tracking target subject extracted from the captured image Ims.

Here, the name information of the tracking target subject for the detection frame Fd can be displayed not only for the image in which the detection frame Fd is selected, but also for all captured images Ims in which the tracking target subject appears.

By displaying name information in the detection frame Fd and displaying an image in the image display area ti as described above, the user can intuitively understand whether or not the subject to be tracked has been specified as intended.

Here, the detection frame Fd for a subject designated as a tracking target subject and the detection frame Fd for a subject not designated as a tracking target subject can be displayed in different display modes. For example, designated can be a specific color such as blue, and undesignated can be a non-specific color such as gray. Alternatively, designated can be a solid frame and undesignated can be a dotted frame, or designated can be a thick frame and undesignated can be a thin frame.

The user can similarly specify other subjects for which name information has been set as the tracking target subject from the captured image Ims by selecting the individual display area of the tracking target display area At and selecting the detection frame Fd in the captured image Ims.

Here, in this embodiment, the information processing device 1 determines whether or not the brightness of the captured image Ims displayed on the operation screen Gm for specifying the subject to be tracked is equal to or lower than a predetermined brightness, and if the brightness becomes equal to or lower than the predetermined brightness, performs a process of displaying an image based on the distance image captured by the distance measurement/imaging device 6 as an image for accepting the specification of the subject to be tracked.
Specifically, in this example, when the brightness of the captured image Ims becomes equal to or lower than a predetermined brightness, the information processing device 1 performs a process of displaying a distance image captured by the distance measuring/capturing device 6 by superimposing it on the captured image Ims, as illustrated in FIG. 12.

The brightness of the captured image Ims may be determined based on, for example, an average luminance value within the image.
Moreover, instead of the distance image, a simplified model generated based on the distance image may be displayed.
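
The brightness determination described above can be sketched as follows, using the average luminance value within the image. The function names, the representation of the image as rows of luminance values, and the threshold value of 40 are assumptions made only for illustration; the text states only that a predetermined brightness is used.

```python
from typing import Sequence


def average_luminance(pixels: Sequence[Sequence[float]]) -> float:
    """Average luminance of an image given as rows of luminance values."""
    total = 0.0
    count = 0
    for row in pixels:
        total += sum(row)
        count += len(row)
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count


def should_show_distance_image(pixels: Sequence[Sequence[float]],
                               threshold: float = 40.0) -> bool:
    """True when the captured image Ims is at or below the threshold.

    The default threshold is a hypothetical value on a 0-255 scale.
    """
    return average_luminance(pixels) <= threshold
```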

Displaying an image based on the distance image as described above makes it easier for the user to specify the subject to be tracked, even in a dark environment. In particular, it is expected that the operation of specifying the subject to be tracked will be performed when the artist to be imaged comes onstage and before they begin performing. At this time, the stage lights are often turned off, making it a dark environment, so it is effective to display an image based on the distance image as described above.

(3-2. Specifying the tracking composition)
Next, the designation of the tracking composition will be described with reference to FIGS.
In this embodiment, it is possible to specify a tracking composition for each tracking target subject.
When specifying a tracking composition for a certain tracking target subject, the user performs an operation to select the subject for which he or she wishes to specify a tracking composition in the subject name setting area As2 of the setting screen Gs in the common setting mode shown in FIG.

In response to this selection operation, the information processing device 1 performs processing to display a composition designation screen as shown in FIG. 13A on the display unit 17.
On this composition designation screen, a virtual simple model Mbv, an image frame Fs, a reset button B5, and an end button B6 are displayed. Here, the virtual simple model Mbv is a virtual simple model of the subject.

In this embodiment, the information processing device 1 accepts a designation of a tracking composition as a designation of a tracking image frame for a display image of the virtual simple model Mbv.
Specifically, in this example, the image frame Fs side is fixed, and an adjustment operation for the size and position of the virtual simple model Mbv within the image frame Fs is accepted as an adjustment operation for the tracking composition.

In this example, the composition adjustment is performed by operating the keys on the keyboard. Specifically, the + and - key operations are performed to enlarge and reduce the virtual simple model Mbv within the image frame Fs, and the up, down, left, and right arrow key operations are performed to adjust the position of the virtual simple model Mbv within the image frame Fs.

In Figure 13, the transition from Figure 13A to Figure 13B shows an enlargement operation performed with the + key, and the transition from Figure 13B to Figure 13C shows the virtual simple model Mbv being shifted to the right within the image frame Fs with the right arrow key.

If the user wishes to specify the adjusted composition, he or she operates the end button B6. In addition, the composition can be reset to the initial state shown in FIG. 13A by operating the reset button B5.
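
The key-based adjustment described above can be sketched as a small state update. The state fields, the step sizes, and the key names used here are hypothetical; screen coordinates with the vertical axis pointing downward are also assumed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CompositionState:
    scale: float = 1.0     # display scale of the virtual simple model Mbv
    offset_x: float = 0.0  # model position within the image frame Fs (pixels)
    offset_y: float = 0.0


INITIAL_STATE = CompositionState()
SCALE_STEP = 1.1   # hypothetical enlargement factor per key press
MOVE_STEP = 10.0   # hypothetical movement in pixels per key press


def apply_key(state: CompositionState, key: str) -> CompositionState:
    """Return the composition state after one key or button operation."""
    if key == "+":
        return replace(state, scale=state.scale * SCALE_STEP)
    if key == "-":
        return replace(state, scale=state.scale / SCALE_STEP)
    if key == "left":
        return replace(state, offset_x=state.offset_x - MOVE_STEP)
    if key == "right":
        return replace(state, offset_x=state.offset_x + MOVE_STEP)
    if key == "up":  # screen y axis assumed to point downward
        return replace(state, offset_y=state.offset_y - MOVE_STEP)
    if key == "down":
        return replace(state, offset_y=state.offset_y + MOVE_STEP)
    if key == "reset":  # corresponds to the reset button B5
        return INITIAL_STATE
    return state  # other keys leave the composition unchanged
```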

Note that the tracking composition can also be specified before the subject to be tracked is specified from the captured image Ims as described in Figures 9 to 11.

In addition, on the composition specification screen for the tracking composition, if zooming based on the center point is not possible when the zoom is changed at the boundary value, it is possible to zoom by moving the zoom center point to a range that includes the virtual simple model Mbv.

When the end button B6 is operated to specify a tracking composition, the information processing device 1 performs a process of calculating and storing parameter information for reproducing the specified composition.

FIG. 14A shows a schematic representation of the designated tracking composition.
When the tracking composition is specified, the information processing device 1 obtains an intersection J as shown in Fig. 14B. This intersection J is an intersection between a vertical center line Vc (a line vertically bisecting the image frame Fs), which is a center line in the vertical direction of the image frame Fs, and a horizontal center line (a line horizontally bisecting the virtual simple model Mbv) of the virtual simple model Mbv.
FIG. 14B shows the center Fc of the image frame Fs and a horizontal center line Hc (a line that bisects the image frame Fs in the horizontal direction).

As parameter information for making it possible to reproduce the designated composition, the information processing apparatus 1 obtains the vertical positional deviation amount Vrc shown in FIG. 15A and the horizontal offset amount Hof shown in FIG. 15B.
The vertical position deviation amount Vrc is a value obtained by scaling the vertical position of the intersection point J by a vertical scaler (hereinafter referred to as a "first vertical scaler") that defines the vertical position of the head of the virtual simple model Mbv as 1.0 and the vertical position of the ankle as 0.0. In other words, the vertical position deviation amount Vrc is a value that represents the vertical position of the intersection point J in a relative value, with the length from the ankle to the head of the virtual simple model Mbv being 1.0.

In this example, the tracking composition can be specified within a range that satisfies the condition that at least a portion of the virtual simple model Mbv fits within the image frame Fs, so the vertical position deviation amount Vrc may be a value exceeding 1.0 or a value less than 0.0 (i.e., a negative value).
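
As an illustrative sketch of the relative value described above, the vertical positional deviation amount Vrc can be computed from the vertical positions of the intersection J, the head, and the ankle of the virtual simple model Mbv. The function and parameter names are assumptions of this sketch. Because only a ratio is taken, the same expression works whether the vertical axis points up or down.

```python
def vertical_position_deviation(j_y: float, head_y: float, ankle_y: float) -> float:
    """Vrc: vertical position of intersection J on the first vertical scaler.

    The scaler maps the ankle of the virtual simple model Mbv to 0.0 and
    its head to 1.0. Values above 1.0 or below 0.0 are possible when J
    lies above the head or below the ankle.
    """
    span = head_y - ankle_y
    if span == 0:
        raise ValueError("head and ankle must be at different heights")
    return (j_y - ankle_y) / span
```

For example, with screen coordinates in which the head is at y = 100 and the ankle at y = 500, an intersection J at y = 180 corresponds to Vrc = 0.8.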

The horizontal offset amount Hof is calculated as a value scaled by a horizontal scaler in which the left end position of the image frame Fs is 0.0 and the right end position is 1.0, as shown in FIG. 15B.

The information processing device 1 also stores information on the zoom amount Zr as parameter information for enabling reproduction of the specified composition.
The information on the zoom amount Zr is stored as the vertical length of the image frame Fs on the first vertical scaler, as determined by the enlargement/reduction operation performed on the virtual simple model Mbv on the composition specification screen described above, and is expressed as a value between 0.0 (the minimum allowable zoom amount) and 1.0 (the maximum allowable zoom amount).

The maximum allowable zoom amount may be considered to be the amount at which the vertical length of the image frame Fs is twice the vertical length of the virtual simple model Mbv, and the minimum allowable zoom amount may be considered to be the amount at which the vertical length of the image frame Fs is 1/5 the vertical length of the virtual simple model Mbv.
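
One possible way to express the zoom amount Zr as a value between 0.0 and 1.0 is sketched below. The text gives only the two end points (frame height of 1/5 and of twice the model height); the linear mapping between them, the clamping, and all names are assumptions of this sketch.

```python
# End points taken from the text, in units of the model's vertical length.
MIN_FRAME_HEIGHT = 0.2  # frame is 1/5 of the model height -> Zr = 0.0
MAX_FRAME_HEIGHT = 2.0  # frame is twice the model height  -> Zr = 1.0


def zoom_amount(frame_height: float, model_height: float) -> float:
    """Zr for a given image frame height and model vertical length.

    Linear interpolation between the end points is an assumption.
    """
    ratio = frame_height / model_height
    zr = (ratio - MIN_FRAME_HEIGHT) / (MAX_FRAME_HEIGHT - MIN_FRAME_HEIGHT)
    return min(1.0, max(0.0, zr))  # clamp to the allowable range


def frame_height_from_zoom(zr: float, model_height: float) -> float:
    """Inverse mapping: frame height for a stored Zr and a model height."""
    ratio = MIN_FRAME_HEIGHT + zr * (MAX_FRAME_HEIGHT - MIN_FRAME_HEIGHT)
    return ratio * model_height
```

Storing Zr as a relative value in this way lets the same setting be applied to a model of a different vertical length.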

Next, a method for calculating a target image frame for reproducing a specified tracking composition will be described.
In this example, the target image frame is calculated not in the camera coordinate system of the tracking camera 3 but in a world coordinate system (three-dimensional spatial coordinate system) based on the distance measuring/imaging device 6. Specifically, the vertical and horizontal center coordinates of the target image frame are calculated.

The method of calculating the target image frame differs depending on the following cases 1 to 4.
FIG. 16 is an explanatory diagram of cases 1 to 4.
Case 1 is a case where the value of the vertical positional deviation amount Vrc is larger than the value (1.0) of the head position of the virtual simple model Mbv on the first vertical scaler.
Case 2 is a case where the value of the vertical positional deviation amount Vrc is equal to or smaller than the value of the head position of the virtual simple model Mbv on the first vertical scaler and is larger than the value of the pelvis position.
Case 3 is a case where the value of the vertical positional deviation amount Vrc is equal to or smaller than the value of the pelvis position of the virtual simple model Mbv on the first vertical scaler and equal to or larger than the value of the ankle position.
Case 4 is a case where the value of the vertical positional deviation amount Vrc is smaller than the value of the ankle position of the virtual simple model Mbv on the first vertical scaler.
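
The four cases above can be sketched as a simple classification of Vrc. The function name is an assumption, and the pelvis position on the first vertical scaler is passed in as a parameter because the text does not give its value.

```python
def classify_case(vrc: float, pelvis: float,
                  head: float = 1.0, ankle: float = 0.0) -> int:
    """Return the case number (1 to 4) for a vertical deviation amount Vrc.

    head and ankle are the scaler values of the head and ankle positions;
    pelvis is the scaler value of the pelvis position (model dependent).
    """
    if vrc > head:
        return 1  # above the head
    if vrc > pelvis:
        return 2  # at or below the head, above the pelvis
    if vrc >= ankle:
        return 3  # at or below the pelvis, at or above the ankle
    return 4      # below the ankle
```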

In all of cases 1 to 4, the vertical center coordinate of the target image frame is calculated using the same method, based on the vertical positional deviation amount Vrc and the three-dimensional position information of the detected simple model Mbd, which is a simple model of the subject to be tracked generated based on the distance image from the distance measuring/imaging device 6.
Specifically, to determine the vertical center coordinate of the target image frame, a vertical scaler (called the "second vertical scaler") is used that sets the vertical position of the head of the detected simple model Mbd to 1.0 and the vertical position of the ankle to 0.0.
At this time, the vertical position of the ankle used as the reference by the second vertical scaler is the vertical position of the lower of the left and right ankles, taking into consideration the possibility that the subject to be tracked may have one leg raised.

The vertical center coordinates of the target image frame are determined in the second vertical scaler as the coordinates of the vertical position specified by the vertical position deviation amount Vrc.
That is, for example, if the value of the vertical positional deviation amount Vrc is 0.8, the vertical coordinate of the position where this becomes 0.8 in the second vertical scaler is determined as the vertical center coordinate of the target image frame.
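
The calculation of the vertical center coordinate on the second vertical scaler can be sketched as follows. The function and parameter names are assumptions, and a world coordinate system whose vertical axis points upward is assumed so that the lower ankle is the one with the smaller coordinate.

```python
def target_vertical_center(vrc: float, head_y: float,
                           left_ankle_y: float, right_ankle_y: float) -> float:
    """Vertical center coordinate of the target image frame.

    Coordinates are those of the detected simple model Mbd in the world
    coordinate system (vertical axis assumed to point upward).
    """
    # Use the lower ankle as the 0.0 reference in case one leg is raised.
    ankle_y = min(left_ankle_y, right_ankle_y)
    # Position on the second vertical scaler specified by Vrc.
    return ankle_y + vrc * (head_y - ankle_y)
```

For example, with the head at a height of 1.7, one ankle at 0.0 and the other raised to 0.3, Vrc = 0.8 gives a vertical center of 1.36.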

The horizontal center coordinates of the target image frame are determined as follows in each of cases 1 to 4.
That is, in case 1, the horizontal center coordinate of the target image frame is determined so that the horizontal offset amount from the horizontal coordinate of the head position in the detected simple model Mbd of the tracking target subject matches the horizontal offset amount Hof.

In case 2, the horizontal center coordinate of the target image frame is determined so that the horizontal offset amount from the horizontal center line of the detected simple model Mbd of the tracking target subject matches the horizontal offset amount Hof.

In case 3, the horizontal center coordinate of the target image frame is determined so that the horizontal offset amount from the horizontal coordinate of the pelvis center position in the detected simple model Mbd of the tracking target subject matches the horizontal offset amount Hof.

In case 4, as in case 3 above, the horizontal center coordinate of the target image frame is determined so that the horizontal offset amount from the horizontal coordinate of the pelvis center position in the detected simple model Mbd of the tracking target subject matches the horizontal offset amount Hof.
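
The per-case choice of horizontal reference can be sketched as follows. This sketch makes assumptions that the text does not state: the horizontal offset amount Hof is treated as a signed offset of the frame center from the reference, expressed in units of the frame width and positive to the right, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedModelX:
    """Horizontal coordinates taken from the detected simple model Mbd."""
    head_x: float         # horizontal coordinate of the head position
    center_line_x: float  # horizontal center line of the model
    pelvis_x: float       # horizontal coordinate of the pelvis center


def reference_x(case: int, model: DetectedModelX) -> float:
    """Horizontal reference used for the given case (1 to 4)."""
    if case == 1:
        return model.head_x
    if case == 2:
        return model.center_line_x
    if case in (3, 4):
        return model.pelvis_x
    raise ValueError("case must be 1, 2, 3 or 4")


def target_horizontal_center(case: int, model: DetectedModelX,
                             hof: float, frame_width: float) -> float:
    """Horizontal center of the target image frame.

    Assumed convention: the center lies hof frame widths to the right
    of the reference.
    """
    return reference_x(case, model) + hof * frame_width
```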

Here, in all cases from Case 1 to Case 4, the size of the target image frame is set to a size determined by the zoom amount Zr in the second vertical scaler described above.

By adopting a composition reproduction method based on relative values using the virtual simple model Mbv and the detected simple model Mbd as described above, it is possible to ensure that the specified tracking composition is properly reproduced even if the size of the virtual simple model Mbv differs from the size of the actual detected simple model Mbd (the size of the subject to be tracked). In other words, it is possible to ensure that the specified tracking composition is properly reproduced regardless of the size of the subject to be tracked.

Here, as described above, the center coordinates of the target image frame are set as coordinate information in the world coordinate system. In order to make it possible to convert this into coordinate information in the camera coordinate system of the tracking camera 3 to be controlled, the positional relationship between at least each distance measuring/image capturing device 6 and each tracking camera 3 is grasped by a calibration process described later.
By grasping the positional relationship between each distance measuring/imaging device 6 and each tracking camera 3, it is possible to obtain a coordinate conversion table for converting coordinates in three-dimensional space into coordinates in the camera coordinate system of each tracking camera 3. If the coordinates in the camera coordinate system are obtained for the center position of the target image frame, it is possible to calculate pan and tilt angle information for matching the optical axis center with the coordinates, and it is possible to control the tracking camera 3 so as to realize a specified tracking composition.
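
As a sketch of the last step, pan and tilt angles that point the optical axis at a position given in the camera coordinate system can be computed as below. The axis convention (x to the right, y upward, z forward along the optical axis when pan and tilt are both zero) and the function name are assumptions of this sketch; an actual tracking camera 3 may use different conventions.

```python
import math
from typing import Tuple


def pan_tilt_degrees(x: float, y: float, z: float) -> Tuple[float, float]:
    """Pan and tilt angles (degrees) that aim the optical axis at (x, y, z).

    Assumed camera coordinate system: x right, y up, z forward along the
    optical axis at pan = tilt = 0.
    """
    if x == 0 and y == 0 and z == 0:
        raise ValueError("target coincides with the camera position")
    pan = math.degrees(math.atan2(x, z))                  # rotation about the vertical axis
    tilt = math.degrees(math.atan2(y, math.hypot(x, z)))  # elevation above the horizontal plane
    return pan, tilt
```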

In the above example, the first vertical scaler is a scaler that sets the vertical position of the head of the virtual simple model Mbv to 1.0 and the vertical position of the ankle to 0.0. However, any other scaler can be used as the first vertical scaler, which uses the vertical position of a specified part of the virtual simple model Mbv as a reference value and uses the length between specified parts spaced apart vertically on the virtual simple model Mbv as a reference unit.
Moreover, as the second vertical scaler, a scaler may be used that uses the vertical position of the specified part of the detected simple model Mbd as a reference value and the length between the specified parts in the detected simple model Mbd as a reference unit.

Note that even while a tracking target subject is being tracked with a specified tracking composition in response to selection of the corresponding execution instruction button Bm, if an execution instruction button Bm that instructs execution of an effect such as the gentle zoom or the handheld feel is selected, composition control related to that effect is also executed.

(3-3. Semi-automatic control)
The information processing device 1 of this embodiment is capable of performing composition control as semi-automatic control, in which the tracking target subject and tracking composition of each camera are automatically switched without the user selecting the execution instruction button Bm on the operation screen Gm.

FIG. 17 is an explanatory diagram of composition types that can be switched in semi-automatic control.
Composition types that can be switched in semi-automatic control include "UP (close-up shot)", "BS (bust shot)", "WS (waist shot)", and "FF (full figure)", as shown in Figures 17A to 17D.
"UP" is a composition that fits the face of the person as the subject to the full extent of the image frame, "BS" is a composition that fits only the part of the person as the subject from the chest to the top of the head within the image frame, "WS" is a composition that fits only the part of the person as the subject from the waist to the top of the head within the image frame, and "FF" is a composition that fits the whole of the person as the subject from the head to the feet within the image frame.

In semi-automatic composition control, the "composition" is specified by the "composition type" as shown in FIG. 17 and the type of subject to be imaged (in this example, the performer). For example, the composition is specified as capturing the subject as a guitar player with the composition type = "WS", or the composition is specified as capturing the subject as a vocalist with the composition type = "UP".

In semi-automatic control, the composition of each camera is selected based on a composition selection table.

FIG. 18 is an explanatory diagram of the composition selection table.
The composition selection table is information in which weight information is associated with each combination of an object to be captured by the camera and a composition type of the camera.
Here, an example of a composition selection table is shown for a case where the event to be imaged is a live music event, and the selectable subjects to be imaged are the vocalist, guitar player, or bass player of a music band.
The weighting information associated with each combination of an imaging subject and a composition type, in other words, the weighting information associated with each composition identified by the combination of an imaging subject and a composition type, can be said to be information indicating the selection rate of that composition.
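
As an illustration only, a composition selection table of this kind and a weight-proportional selection over it can be sketched in Python as follows. The table layout (a dictionary keyed by imaging subject and composition type), the function names, and the numeric weights are assumptions made for this sketch and are not taken from FIG. 18.

```python
import random

# Hypothetical table for one camera: (imaging subject, composition type) -> weight.
# Subjects and composition types follow the text; the weights are made up.
composition_table = {
    ("vocalist", "UP"): 30,
    ("vocalist", "BS"): 20,
    ("guitar", "WS"): 25,
    ("bass", "WS"): 15,
    ("bass", "FF"): 10,
}

def select_composition(table, rng=random):
    """Pick one composition with probability proportional to its weight."""
    compositions = list(table)
    weights = [table[c] for c in compositions]
    return rng.choices(compositions, weights=weights, k=1)[0]

def selection_rates(table):
    """Normalize the weights so they read as selection rates summing to 1."""
    total = sum(table.values())
    return {c: w / total for c, w in table.items()}
```

In this sketch, a composition whose weight is twice that of another is selected about twice as often, which is one way of reading the weight information as a selection rate.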

In this embodiment, a composition selection table is prepared for each camera, and the information processing device 1 selects a composition for each camera 3 using the corresponding composition selection table.
In this example, the composition selection table is stored in a predetermined memory device such as the storage unit 19, and the information processing device 1 (CPU 11) selects a composition for each camera based on the composition selection table thus stored.

Multiple types of composition selection tables, differing at least in their combination settings of composition type and weight information, may be prepared for each camera that is the target of composition control. For example, tables may be prepared for music producers A, B, and C, each with combination settings of composition type and weight information that make the compositions frequently used by that producer easier to select, and the user may choose which of these tables to use for composition selection. This allows the user to select which music producer's style the captured image content will be finished in.

Here, in the semi-automatic control, composition selection for each camera is performed based on prohibited transition composition information so as to prevent a composition transition that is a predetermined prohibited transition from occurring.
The prohibited transition here refers to a composition transition that is prohibited from occurring between images that are sequentially selected by the switcher 5.

Based on the above-mentioned prohibited transition composition information and the camera selection history information by the switcher 5 (corresponding to composition history information), the information processing device 1 selects a composition based on a composition selection table so that the composition transition between images selected sequentially by the switcher 5 does not become a prohibited transition defined in the prohibited transition composition information.
Specifically, based on the prohibited transition composition information and the composition of the camera currently selected by the switcher 5, the information processing device 1 identifies, from among the compositions in the composition selection table of each camera, compositions that would result in a prohibited transition if next selected by the switcher 5, as "prohibited transition compositions," and performs composition selection based on the weight information of the composition selection table, targeting only compositions excluding these prohibited transition compositions.

An example of composition selection based on such prohibited transition composition information will be described with reference to FIG. 19.
Specifically, FIG. 19 describes an example of composition selection for a certain camera when the composition of the camera selected by the switcher 5 is "Bass WS," and a transition from the "Bass WS" composition to a "Bass UP" composition, and a transition to a composition in which the subject to be imaged is "Bass," are defined as prohibited transitions.
In this case, in the composition selection table, the "Bass UP" composition and the composition with the "Bass" as the imaging subject are prohibited transition compositions, so the information processing device 1 selects compositions excluding these compositions (in the illustrated example, the composition with the vocalist as the imaging subject, the "Bass WS" composition, and the "Bass FF" composition).
This makes it possible to prevent the composition transition of the output image by the switcher 5 from becoming a prohibited transition even if the target camera is switched to a selected composition and the image captured by the camera is selected by the switcher 5.
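
The exclusion of prohibited transition compositions before weighted selection can be sketched as follows. This is a minimal sketch rather than the implementation of the embodiment: representing the prohibited transition composition information as a set of (current composition, next composition) pairs, the function names, and the weights are all assumptions made for illustration.

```python
import random

def allowed_compositions(table, current, prohibited):
    """Drop compositions that would form a prohibited transition from `current`.

    `prohibited` is a set of (from_composition, to_composition) pairs; this
    pair-based encoding is an assumption of the sketch.
    """
    return {c: w for c, w in table.items() if (current, c) not in prohibited}

def select_allowed(table, current, prohibited, rng=random):
    """Weighted selection restricted to compositions that are not prohibited."""
    candidates = allowed_compositions(table, current, prohibited)
    if not candidates:
        return None  # every composition is prohibited; the caller decides what to do
    comps = list(candidates)
    return rng.choices(comps, weights=[candidates[c] for c in comps], k=1)[0]

# Hypothetical data: the switcher currently outputs "Bass WS".
table = {("vocalist", "UP"): 30, ("bass", "UP"): 20, ("bass", "FF"): 10}
current = ("bass", "WS")
prohibited = {(("bass", "WS"), ("bass", "UP"))}
```

With this data, `select_allowed(table, current, prohibited)` can only return the vocalist "UP" or the bass "FF" composition, because the bass "UP" composition is removed before the weights are consulted.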

Furthermore, in the semi-automatic control, the information processing device 1 executes composition selection based on the composition selection table for each camera described above, on the condition that image selection has been performed by the switcher 5.
That is, when any of the captured images from each camera is newly selected by the switcher 5, a composition is selected for each camera based on the composition selection table accordingly, and the composition of each camera is controlled to the selected composition.

Furthermore, in semi-automatic control, the information processing device 1 performs composition selection based on the composition selection table, excluding the camera for which an image is being selected by the switcher 5. As described above, in this example, composition selection based on the composition selection table is performed on the condition that image selection has been performed by the switcher 5; in this case, the camera whose image has been selected by that image selection is excluded from the composition selection.
By excluding the camera whose image is being selected by the switcher 5 from the targets of composition selection in this way, it is possible to prevent the composition of that camera from being switched while its captured image is selected.

Furthermore, in the semi-automatic control, the information processing device 1 performs composition selection based on the composition selection table only for the musical accompaniment section identified from the audio analysis result of the event to be imaged.
As an example, each camera is equipped with a microphone, and audio data picked up by the microphone is added to the captured image data of each camera. The information processing device 1 then performs audio analysis on the added audio data to determine whether or not the section is a musical accompaniment section. Based on the result of this determination, the information processing device 1 selects a composition based on a composition selection table, targeting only the musical accompaniment section.
The method of acquiring the audio data is not limited to the above-mentioned method using the microphone of the camera. For example, it is also possible to use audio data collected by a microphone provided in a device other than a camera, such as a PA (Public Address) device, for audio analysis. In this case, the audio data is not attached to the image captured by the camera, but is treated as data independent of the captured image.
The determination as to whether or not a section is a musical accompaniment section can also be made based on an operational input from the user.

In the case of captured image content of a music event, the need for composition change is low in inter-song portions, such as MC (Master of Ceremonies) portions, compared to during the middle of a song.
By selecting the composition only for the musical accompaniment section as described above, it is possible to prevent unnecessary composition switching between songs, thereby reducing the processing burden associated with composition switching.

Furthermore, in the semi-automatic control, the information processing device 1 updates the weight information in the composition selection table based on preset conditions.
As an example, the information processing device 1 updates the weight information based on the selection history information of the camera by the switcher 5. In this case, every time the switcher 5 selects a camera, in other words, a captured image is selected, the information processing device 1 performs a process of storing the captured image and information on the composition as selection history information in a predetermined storage device such as the storage unit 19, and updates the weight information in the composition selection table based on the selection history information.

As a specific method of updating weights based on the selection history information, for example, among the weight information for each composition in each composition selection table, the weight of compositions that are frequently selected by the switcher 5 (selection frequency equal to or greater than a certain frequency) may be reduced.
As a result, the weights are updated so that the switcher 5 is less likely to select the same composition.
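
A history-based weight reduction of this kind can be sketched as follows. The count threshold, the reduction factor, and the lower bound on the weight are hypothetical parameters introduced for this sketch; the text only states that the weight of frequently selected compositions is reduced.

```python
from collections import Counter

def reduce_frequent_weights(table, history, min_count=3, factor=0.5, floor=1.0):
    """Return a new table with lower weights for frequently selected compositions.

    table:   {composition: weight}
    history: list of compositions selected by the switcher (selection history)
    min_count, factor, floor: hypothetical tuning parameters
    """
    counts = Counter(history)
    return {
        comp: max(floor, weight * factor) if counts[comp] >= min_count else weight
        for comp, weight in table.items()
    }
```

The function returns a new table instead of modifying the given one, so the original weights remain available, for example when the weights are to be reset between songs.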

The weight information may also be updated based on the content of the event to be imaged. Specifically, when a solo part of an instrument such as a guitar solo is detected as the content of the event to be imaged, the weight information may be updated to increase the weight of the composition in which the player of the instrument is the image target.
In this example, the detection of the solo part of the instrument is performed based on the results of audio analysis of the event to be imaged, for example, based on the results of audio analysis of the audio data attached to the images captured by each camera as described above, or the audio data obtained by PA equipment, etc.
The detection of the solo part of the instrument can also be performed based on an operational input by the user.

As described above, by increasing the weight of a composition that captures the player of an instrument in response to the detection of the solo part of that instrument, it becomes easier to select an appropriate composition according to the content of the event, thereby improving the quality of the captured image content.
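
The weight increase on detection of a solo part can be sketched in the same way. The multiplication factor and the assumption that the detection result arrives as a subject name (for example "guitar") are illustrative choices, not details given in the text.

```python
def boost_solo_player(table, solo_subject, factor=2.0):
    """Return a new table with higher weights for compositions of the soloist.

    table:        {(imaging subject, composition type): weight}
    solo_subject: subject whose solo part was detected (hypothetical input)
    factor:       hypothetical multiplier applied to that subject's weights
    """
    return {
        (subject, ctype): weight * factor if subject == solo_subject else weight
        for (subject, ctype), weight in table.items()
    }
```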

Here, there are at least two possible examples of when to update the weight information:
In the first example, the weighting information is updated on the condition that an image has been selected by the switcher 5.
This makes it possible, in response to a certain camera (certain composition) being selected by the switcher 5, to adjust which composition is made easier (or harder) to select for the other cameras that were not selected.
Therefore, for the camera that was not selected, it is possible to make it easier to select a composition suitable for the next image selection by the switcher 5, and it is possible to increase the possibility that the captured images selected by the switcher 5 will include captured images with an appropriate composition, which leads to an improvement in the quality of the captured image content.

In the second example, the weighting information is updated when a predetermined sound change is detected from the audio analysis result of the imaged event. For example, the weighting information may be updated when a predetermined melody change, such as a change from a singing part to an instrument solo part, is detected.
This makes it possible to update the weights so that, when the sound content of the event to be imaged transitions to specific content, an appropriate composition corresponding to that content is more likely to be selected. For example, when a sound change estimated to be a transition to a guitar solo part is detected, the weights can be updated so that a composition in which the guitar player is the imaging subject is more likely to be selected.
Therefore, the quality of the captured image content can be improved.

For confirmation, an example of a processing procedure to be executed by the CPU 11 to realize the semi-automatic control of the embodiment described above will be described with reference to the flowcharts of FIGS. 20 and 21.
FIG. 20 shows an example of a processing procedure for implementing composition control for selecting a composition based on a composition selection table and switching to the selected composition.
First, in step S11, the CPU 11 waits until a composition switching trigger occurs. As will be understood from the above description, in this example, the condition for generating the composition switching trigger is that an image is selected by the switcher 5 and that the image is in a musical accompaniment section. In step S11, the CPU 11 performs a process of waiting for the establishment of the condition.

When a composition switching trigger occurs, the CPU 11 proceeds to step S12 and performs a process of identifying, for each target camera, compositions that would result in a prohibited transition. That is, based on the above-mentioned prohibited transition composition information and information on the composition of the camera currently selected by the switcher 5 (obtained from the above-mentioned selection history information of the camera), the CPU 11 identifies, from among the compositions in the composition selection table of each camera, compositions that would result in a prohibited transition if next selected by the switcher 5, as "prohibited transition compositions."
For confirmation, the identification process of step S12 may yield the result that at least one of the composition selection tables contains no composition corresponding to a prohibited transition composition.

In step S13 following step S12, the CPU 11 performs a process of selecting a composition based on the composition selection table for compositions other than those that are prohibited transitions. That is, for a composition selection table that has a composition that corresponds to a prohibited transition composition, composition selection is performed based on weight information for compositions other than the prohibited transition composition. For a composition selection table that does not have a composition that corresponds to a prohibited transition composition, composition selection is performed based on weight information for each composition in the table.
When all compositions are prohibited transition compositions, the composition is determined in anticipation of the next composition selection with respect to the selection of the other cameras.

Here, as described above, when composition selection is performed based on the composition selection table, excluding a camera for which an image is being selected by the switcher 5, composition selection based on the composition selection table is not performed for the relevant camera in step S13. This makes it possible to prevent composition switching from being performed for a camera for which an image is being selected by the switcher 5.

In step S14 following step S13, the CPU 11 controls the composition of the target camera so that the composition is the selected one. In other words, the camera for which composition selection was performed based on the composition selection table in step S13 is treated as the target camera, and control is performed so that the composition of the target camera is the selected composition.

In step S15 following step S14, the CPU 11 determines whether a processing end condition has been met. The processing end condition here is a predetermined condition that has been set in advance as a condition for ending the series of processes shown in FIG. 20, such as a state in which generation of captured image content should be ended.

If it is determined that the processing end condition is not satisfied, the CPU 11 returns to step S11. As a result, in response to the occurrence of another composition switching trigger, composition selection based on the composition selection table and switching to the selected composition are executed again for the corresponding camera.

On the other hand, if it is determined that the processing termination condition is met, the CPU 11 ends the series of processing steps shown in FIG. 20.
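
For reference, the flow of steps S11 to S15 can be sketched as a loop over switcher events. This is a simplified sketch under several assumptions: the events arrive as dictionaries with the hypothetical keys shown below, prohibited transitions are encoded as (current, next) pairs, a camera whose compositions are all prohibited is simply skipped, and the end condition of step S15 is modeled as the end of the event stream.

```python
import random

def run_composition_control(events, tables, prohibited, apply_composition, rng=random):
    """Process switcher events in order, loosely mirroring steps S11 to S15.

    events: iterable of dicts such as
        {"on_air": "CAM1", "composition": ("bass", "WS"), "in_music": True}
    tables: camera name -> {composition: weight}
    prohibited: set of (current_composition, next_composition) pairs
    apply_composition: callback(camera, composition) that controls the camera
    """
    for event in events:                        # S11: wait for a switching trigger
        if not event["in_music"]:
            continue                            # not a musical accompaniment section
        current = event["composition"]
        for cam, table in tables.items():
            if cam == event["on_air"]:
                continue                        # camera selected by the switcher is excluded
            # S12: identify prohibited transition compositions and remove them
            allowed = {c: w for c, w in table.items()
                       if (current, c) not in prohibited}
            if not allowed:
                continue                        # simplification: skip this camera
            # S13: weighted selection among the remaining compositions
            comps = list(allowed)
            chosen = rng.choices(comps, weights=[allowed[c] for c in comps], k=1)[0]
            apply_composition(cam, chosen)      # S14: switch to the selected composition
    # S15: in this sketch, processing ends when the event stream ends
```

Passing the camera control as a callback keeps the selection logic independent of how the pan head and zoom are actually driven.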

FIG. 21 is a flowchart of a process related to updating weight information.
In step S21, the CPU 11 waits for the occurrence of a weight update trigger. As will be understood from the above description, the weight update trigger may be that of the first or second example described above. In the first example, in step S21, the CPU 11 waits for an image selection by the switcher 5. In the second example, in step S21, the CPU 11 waits for a predetermined sound change (for example, a change from a singing part to an instrument solo part) to be detected from the audio analysis result of the event to be imaged.

If a weight update trigger occurs, the CPU 11 proceeds to step S22 and performs processing for determining a weight for each composition in the composition selection table of each camera.
For example, when updating the weights so that compositions that are frequently selected by the switcher 5 become less likely to be selected, as described above, a lower value is determined for the weight information of compositions that are selected by the switcher 5 at a certain frequency or higher.
In addition, when updating the weighting based on the content of the event to be imaged, for example, in response to a case where a solo part of an instrument is detected, a higher value is determined for the weighting information of a composition in which the player of that instrument is the image subject.

In step S23 following step S22, the CPU 11 performs a process of updating the weights to the determined weights. That is, the CPU 11 performs a process of updating the numerical values of the corresponding weight information among the weight information in the composition selection table for each camera to the numerical values determined in step S22.

In step S24 following step S23, the CPU 11 determines whether a processing end condition has been met. The processing end condition here is a predetermined condition that has been set in advance as a condition for ending the series of processes shown in FIG. 21, such as a state in which generation of captured image content should be ended.

If it is determined that the processing termination condition is not met, the CPU 11 returns to step S21. As a result, an update is performed on the corresponding weight information in response to the occurrence of another weight update trigger.

On the other hand, if it is determined that the processing termination condition is met, the CPU 11 ends the series of processing steps shown in FIG. 21.

In the above, an example was given in which composition selection based on the composition selection table is performed on the condition that image selection has been performed by the switcher 5, but composition selection based on the composition selection table can also be performed in response to changes in the content of the event to be imaged.
For example, based on the results of audio analysis of the event being imaged, a composition may be selected based on a composition selection table on the condition that a predetermined change in melody (change in event content), such as a change from a singing part to an instrument solo part, is detected.
In this way, by selecting a composition based on the composition selection table in response to a change in the content of the event to be imaged, it is possible to achieve appropriate composition switching in response to the change in the content of the event to be imaged.

In addition, composition selection based on the composition selection table may be performed according to the amount of time that has elapsed since the composition was switched based on the previous composition selection. For example, composition selection may be performed on the condition that a certain amount of time has elapsed since the composition was switched based on the previous composition selection.
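
The elapsed-time condition can be sketched as a small helper that records when the composition was last switched. The class name, the interval value, and the injectable clock are assumptions of this sketch.

```python
import time

class ElapsedTimeCondition:
    """Tracks whether enough time has passed since the last composition switch."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s  # hypothetical minimum interval in seconds
        self._clock = clock                   # injectable so the logic can be tested
        self._last_switch = None              # no switch recorded yet

    def mark_switched(self):
        """Call when the composition has been switched."""
        self._last_switch = self._clock()

    def is_satisfied(self):
        """True if no switch has occurred yet or the interval has elapsed."""
        if self._last_switch is None:
            return True
        return self._clock() - self._last_switch >= self.min_interval_s
```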

It is also conceivable that composition selection based on the composition selection table may be performed in response to a specified operational input by the user. In this case, for example, by operating a specified button, composition selection based on the composition selection table and switching to the selected composition are performed for cameras other than the camera for which an image is being selected by the switcher 5.

In addition, in updating the weight information, the weight of a composition that has been selected by the switcher 5 can be reduced based on the selection history information of the camera by the switcher 5.
As a result, the weights are updated so that a composition that has been selected by the switcher 5 becomes less likely to be selected, and it is possible to prevent the same composition from frequently appearing in the captured image content, thereby preventing deterioration in the quality of the content.

An example of a procedure for setting semi-automatic control will be described with reference to FIGS. 22 and 23.
To enable semi-automatic control, as shown in FIG. 22, the controller add button Bp1 provided in the controller setting area As3 of the setting screen Gs (in the common setting mode) is operated.
In response to the operation of the controller add button Bp1, a dialog box asking whether or not to perform semi-automatic setting is displayed, as shown in the figure. By operating the "Yes" button B7 provided in the dialog box, the user can move to the semi-automatic pattern setting screen shown in FIG. 23.
When the "No" button B8 in the dialog box is operated, a normal controller slot for manually setting the execution instruction button Bm for each camera, as described above with reference to FIG. 8, is added to the controller setting area As3.

In FIG. 23, the semi-automatic pattern setting screen has check boxes cba for selecting the number of members of the artist to be imaged. Specifically, in this example, four check boxes cba are provided to allow selection of 3-piece, 4-piece, 5-piece, and others. These check boxes cba are provided to instruct filtering so that the artist personnel composition pattern information displayed on the semi-automatic pattern setting screen is only information on the corresponding number of members among the artist personnel composition patterns that can be handled by semi-automatic control.

On the semi-automatic pattern setting screen, for the displayed personnel composition pattern information, a check box cbb is provided for selecting the corresponding personnel composition pattern.
The user checks a check box cbb for selecting the personnel composition pattern of the artist to be imaged from among the personnel composition patterns displayed as shown in the figure, and operates the OK button B9.
In response to this, the information processing device 1 generates a controller to which a composition selection table determined in accordance with the selected personnel composition pattern is linked.

Although not shown in the drawings, the information processing device 1 displays an operation screen Gm reflecting the controller in response to the operation of the OK button B9. Specifically, the information processing device 1 displays an operation screen Gm on which execution instruction buttons Bm representing compositions that can be selected from the composition selection table, such as "vocal UP" and "guitar WS", are arranged.
During execution of semi-automatic control, when a certain composition is being selected based on a composition selection table, the information processing device 1 performs processing to display, on the operation screen Gm, an execution instruction button Bm indicating the selected composition in a display mode different from that of the execution instruction button Bm indicating a composition not being selected.

Here, the information processing device 1 switches semi-automatic control ON/OFF in response to the operation of the semi-automatic ON/OFF button B1 on the operation screen Gm.

Semi-automatic control may also be turned ON/OFF automatically based on preset conditions, rather than in response to a user operation.
For example, since frequent composition changes are undesirable during an MC portion, semi-automatic control may be automatically turned OFF during the MC portion and automatically turned ON during a performance.
Alternatively, it is also possible to automatically turn semi-automatic control ON/OFF in accordance with the frequency of camera selection by the switcher 5. For example, it is possible to turn semi-automatic control ON when the frequency of camera selection by the switcher 5 is high, and OFF when the frequency of camera selection is low.
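
Automatic ON/OFF switching according to the frequency of camera selection can be sketched with a sliding time window. The window length, the threshold, and the class name are hypothetical; the text does not specify how the frequency is measured.

```python
from collections import deque

class SelectionFrequencyGate:
    """Turns semi-automatic control ON when the switcher selects cameras often."""

    def __init__(self, window_s=60.0, on_threshold=4):
        self.window_s = window_s          # hypothetical window length in seconds
        self.on_threshold = on_threshold  # hypothetical number of selections for ON
        self._times = deque()

    def record_selection(self, t):
        """Record the time (in seconds) of a camera selection by the switcher."""
        self._times.append(t)

    def is_on(self, now):
        """Drop selections older than the window, then compare with the threshold."""
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.on_threshold
```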

In addition, if the semi-automatic control is turned off, it is also possible to turn off composition controls such as "gentle zoom" in conjunction with this.

(3-4. Functional configuration related to composition control according to the embodiment)
Figure 24 is an explanatory diagram of the functions related to composition control as an embodiment of the invention possessed by the CPU 11 of the information processing device 1, and shows the parent camera 2, tracking camera 3, pan head 4, switcher 5, and distance measurement/imaging device 6 shown in Figure 1, along with functional blocks of various functions possessed by the CPU 11.

As shown in the figure, the CPU 11 has the functions of a calibration unit F1, a designation reception unit F2, an operation reception unit F3, a composition selection unit F4, a composition switching control unit F5, a weight update unit F6, an image recognition processing unit F7, an image frame calculation unit F8, a coordinate calculation unit F9, a pan head/camera control unit F10, and a cutout image generation unit F11.

The calibration unit F1 performs a calibration process to identify the same points that appear in both the distance image obtained by the distance measurement/imaging device 6 and the captured image obtained by the tracking camera 3.
An example of a specific calibration method performed by the calibration unit F1 in this embodiment will be described later.

The designation reception unit F2 performs processing to receive various designations related to composition control in the embodiment.

Specifically, the designation reception unit F2 performs a process of accepting the user's designation of the subject to be tracked from among the subjects appearing in the captured image Ims obtained by the distance measurement/imaging device 6. Note that the specific method of accepting the designation of the subject to be tracked has already been described with reference to FIGS. 9 and 10, so a duplicate description will be avoided.

In addition, as explained above with reference to FIG. 12, when the brightness of the captured image Ims becomes equal to or lower than a predetermined brightness, the designation reception unit F2 performs processing to display an image based on the distance image obtained by the distance measurement/imaging device 6 as the image for accepting the designation of the subject to be tracked.

In addition, as described above with reference to FIG. 11, the designation reception unit F2 performs processing to display the name information of the tracking target subject in a position corresponding to the detection frame Fd of the tracking target subject in response to the designation of the tracking target subject.

Furthermore, as described above with reference to FIG. 11, in response to the designation of a tracking target subject, the designation reception unit F2 performs processing to display an image of the designated tracking target subject in the image display area ti of the corresponding subject in the tracking target display area At.

The designation reception unit F2 also receives a designation of the tracking composition, as described above with reference to FIGS. 13 to 15, and performs processing to calculate the vertical position deviation amount Vrc and the horizontal offset amount Hof.

In addition, the designation reception unit F2 accepts an instruction to execute specified tracking composition control, which controls the composition of the tracking camera 3 so that the subject to be tracked is captured with the tracking composition specified by the user, and accepts an instruction to execute semi-automatic control.
In this example, the acceptance of an execution instruction operation of the specified tracking composition control corresponds to the acceptance of an operation of the corresponding execution instruction button Bm arranged on the operation screen Gm. Also, in this example, the acceptance of an execution instruction operation of the semi-automatic control corresponds to the acceptance of an operation of the semi-automatic ON/OFF button B1 described above.

The operation reception unit F3 receives operations on the operation screen Gm. This operation reception unit F3 recognizes which execution instruction button Bm has been operated for which camera from CAM1 to CAM6.

The composition selection unit F4, the composition switching control unit F5, and the weight update unit F6 are functional units for realizing semi-automatic control.
The composition selection unit F4 selects a composition for each camera based on the composition selection table described above.
The composition switching control unit F5 performs control for switching the composition of each camera to the composition selected by the composition selection unit F4. Specifically, the composition switching control unit F5 performs processing for instructing the image frame calculation unit F8 on the composition selected for each camera by the composition selection unit F4.

The weight update unit F6 updates the weight information in the composition selection table. The specific method for updating the weight information has already been explained, so a duplicate explanation will be avoided.

While the specified tracking composition control is being executed, the image frame calculation unit F8 calculates the target image frame of the tracking camera 3 for tracking the tracking target subject based on the three-dimensional position information of the detection simple model Mbd of the tracking target subject, which is generated based on the distance image obtained by the distance measurement/imaging device 6.
Specifically, the vertical and horizontal center coordinates of the target image frame are calculated based on the three-dimensional position information of the detection simple model Mbd, the vertical position deviation amount Vrc, and the horizontal offset amount Hof.
Further, the image frame calculation unit F8 calculates a target image frame so as to obtain a composition according to the operation content on the operation screen Gm recognized by the operation reception unit F3. For example, the target image frame is calculated so as to realize the composition control of the effect corresponding to the execution instruction button Bm selected from among the execution instruction buttons Bm related to effects.

In addition, during semi-automatic control, the image frame calculation unit F8 calculates a target image frame so that the composition instructed by the composition switching control unit F5 is realized.

The coordinate calculation unit F9 converts the information on the target image frame of the tracking camera 3 calculated by the image frame calculation unit F8 (coordinate information in the world coordinate system) into coordinate information in the coordinate system of the tracking camera 3. For this coordinate conversion, a coordinate conversion matrix obtained from the calibration process results by the calibration unit F1 is used.
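
A coordinate conversion of this kind can be sketched as follows. The text states only that a coordinate conversion matrix obtained from calibration is used; the 4x4 homogeneous form of the matrix, the representation of the target image frame as a list of three-dimensional corner points, and the function names are assumptions made for this sketch.

```python
def world_to_camera(matrix, point):
    """Apply a 4x4 homogeneous transform (nested lists) to a 3D world point."""
    x, y, z = point
    vec = (x, y, z, 1.0)
    out = [sum(row[i] * vec[i] for i in range(4)) for row in matrix]
    # Divide by the homogeneous coordinate (1.0 for a rigid transform).
    return (out[0] / out[3], out[1] / out[3], out[2] / out[3])

def convert_frame(matrix, corners_world):
    """Convert the corner points of a target image frame to camera coordinates."""
    return [world_to_camera(matrix, p) for p in corners_world]

# Hypothetical calibration result: a pure translation by (1, 2, 3).
example_matrix = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
```

For a real calibration result, the matrix would also contain a rotation between the world coordinate system and the coordinate system of the tracking camera 3.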

The tripod head/camera control unit F10 controls the pan and tilt of the tripod head 4 and the zoom of the tracking camera 3 as necessary for each tracking camera 3, so as to obtain a composition that captures the range indicated by the image frame information that has been coordinate-converted by the coordinate calculation unit F9.
This makes it possible to control the composition of each tracking camera 3 to a target composition.

The cutout image generating unit F11 performs image cropping as necessary on the image captured by the parent camera 2 according to the image frame information for the virtual camera calculated by the image frame calculating unit F8, and generates captured images for CAM4, CAM5, and CAM6.

The image recognition processing unit F7 can receive images captured by the parent camera 2 and images captured by each tracking camera 3, and performs image recognition processing on the received images. The image recognition processing here includes at least a bone estimation process for a subject as a person.
In this embodiment, the image recognition results of the image recognition processing unit F7 for the images captured by each tracking camera 3, specifically, information on the simple model of the subject generated by the bone estimation processing, are used for the calibration processing by the calibration unit F1.

<4. Processing Procedure>
A specific example of a processing procedure to be executed by the CPU 11 in order to realize the function relating to the composition control of the embodiment described above will be described with reference to the flowcharts of FIGS.

FIG. 25 shows an example of a processing procedure relating to a screen display corresponding to a tracking target subject designated by the user.
It is assumed that the operation screen Gm is displayed on the display unit 17 when the process shown in this figure is executed.

First, in step S101, the CPU 11 waits until an individual display area of any target in the tracking target display area At on the operation screen Gm is operated.
If any of the individual display areas has been operated, the CPU 11 proceeds to step S102 and determines whether or not the image Ims captured by the distance measuring and imaging device 6 has not yet been displayed on the operation screen Gm.
If the captured image Ims has not been displayed, the CPU 11 proceeds to step S103 and executes a process of displaying the captured image Ims of the distance measuring/imaging device 6. For example, the CPU 11 executes a process of superimposing and displaying the captured image Ims of each distance measuring/imaging device 6 on the performance area Are of the operation screen Gm.
After executing the display process of step S103, the CPU 11 advances the process to step S104.

On the other hand, if it is determined in step S102 that the captured image Ims is already displayed, the CPU 11 skips the display process of step S103 and proceeds to step S104. In other words, if the captured image Ims is already displayed on the operation screen Gm, for example in response to an earlier operation to designate a subject to be tracked, the display process of step S103 is not executed.

In step S104, the CPU 11 waits for an operation on the detection frame Fd (a selection operation on the detection frame Fd), and if an operation on the detection frame Fd has been performed, the process proceeds to step S105, where the CPU 11 performs a process of matching the subject of the operated detection frame Fd with the operated target. In other words, the CPU 11 performs a process of matching the subject appearing within the operated detection frame Fd with the subject whose individual display area was operated in step S101.

In step S106 following step S105, the CPU 11 performs a process of displaying the target name information for the detection frame Fd. That is, the CPU 11 performs a process of displaying the name information set for the subject whose individual display area was operated in step S101 at a position corresponding to the detection frame Fd in which the subject whose detection frame Fd was operated in step S104 appears.

In step S107 following step S106, the CPU 11 performs processing to display the captured image of the designated subject in the corresponding image display area ti of the tracking target display area At. That is, the CPU 11 performs processing to display the captured image of the subject designated as the tracking target subject (image captured by the distance measurement/imaging device 6), specifically, an image extracted from the captured image Ims, in the image display area ti in the individual display area operated in step S101.

In step S108 following step S107, the CPU 11 determines whether or not the process has ended, that is, whether or not a predetermined condition has been established under which the series of processes shown in FIG. 25 should be ended.
If it is determined in step S108 that the process is not complete, the CPU 11 returns to step S101. This makes it possible to accept the designation of another tracking target subject, display the name information of the designated subject in the detection frame Fd when a tracking target subject is designated, and so on.

If the CPU 11 determines in step S108 that the processing has ended, it ends the series of processing steps shown in FIG. 25.

FIG. 26 is a flowchart of a process related to display switching according to the brightness of the captured image Ims displayed on the operation screen Gm.
In step S201, the CPU 11 executes a brightness detection process for the captured image Ims of the distance measurement and image capture device 6. For example, a process for detecting an average luminance value of the captured image Ims is performed.

In step S202 following step S201, the CPU 11 determines whether or not the brightness is equal to or lower than a predetermined value, specifically, for example, whether or not the above-mentioned average luminance value is equal to or lower than a predetermined value.
If it is determined that the brightness is equal to or lower than the predetermined value, for example because the average luminance value is equal to or lower than the predetermined value, the CPU 11 proceeds to step S203 to determine whether or not a distance image is being superimposed, i.e., whether or not a distance image captured by the distance measuring/imaging device 6 is being superimposed on the captured image Ims.
If the distance image is not being superimposed, the CPU 11 proceeds to step S204, where it performs a distance image superimposing process, i.e., a process of superimposing the distance image captured by the distance measuring/imaging device 6 on the captured image Ims being displayed, and then proceeds to step S207.

On the other hand, if the CPU 11 determines in step S203 that the distance image is already being superimposed, it skips the superimposition process in step S204 and proceeds to step S207.

Furthermore, if the CPU 11 determines in the previous step S202 that the brightness is not below the predetermined value, the process proceeds to step S205 to determine whether or not the distance image is being superimposed, and if the distance image is being superimposed, the process proceeds to step S206, which is a process to stop the superimposition display of the distance image, and then the process proceeds to step S207.
On the other hand, if it is determined in step S205 that the distance image is not being superimposed, the CPU 11 skips the superimposition stop process in step S206 and advances the process to step S207.

In step S207, the CPU 11 determines whether or not processing has ended. If it determines that processing has not ended, the process returns to step S201, and if it determines that processing has ended, the process ends the series of steps shown in FIG. 26.
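One pass through steps S201 to S206 of FIG. 26 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the captured image Ims is assumed to be available as rows of grayscale pixel values, and the names `average_luminance` and `update_overlay` as well as the threshold value are hypothetical.

```python
from typing import Sequence


def average_luminance(pixels: Sequence[Sequence[float]]) -> float:
    """Mean luminance of a grayscale image given as rows of pixel values."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    if count == 0:
        raise ValueError("empty image")
    return total / count


def update_overlay(
    pixels: Sequence[Sequence[float]], threshold: float, overlay_on: bool
) -> bool:
    """Return the new superimposition state after one pass of S201 to S206."""
    dark = average_luminance(pixels) <= threshold  # S201 and S202
    if dark and not overlay_on:
        return True  # S203 -> S204: start superimposing the distance image
    if not dark and overlay_on:
        return False  # S205 -> S206: stop superimposing the distance image
    return overlay_on  # state unchanged; proceed to S207


# Hypothetical usage: a dark frame switches the overlay on.
state = update_overlay([[10, 20], [30, 40]], threshold=50, overlay_on=False)
```

Calling `update_overlay` repeatedly with the latest frame corresponds to the loop back from step S207 to step S201.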

FIG. 27 is a flowchart of a process to be executed in response to a tracking composition designated by a user.
In step S301, the CPU 11 waits until the designation of the tracking composition is completed. Specifically, in this example, the CPU 11 performs a process of waiting until the end button B6 on the composition designation screen shown in FIG.

If it is determined that the end button B6 has been operated and the specification of the tracking composition has been completed, the CPU 11 proceeds to step S302 and performs processing to calculate the intersection between the vertical center line of the image frame Fs and the center line of the virtual simple model. In other words, the CPU 11 performs processing to calculate the intersection between the vertical center line Vc of the image frame Fs and the horizontal center line of the virtual simple model Mbv, which is the intersection J described with reference to FIG. 14B.

In step S303 following step S302, the CPU 11 calculates the vertical position deviation amount Vrc based on the height of the virtual simple model Mbv. In other words, the vertical position of the intersection point J is scaled by the first vertical scaler described above to calculate the vertical position deviation amount Vrc.

In step S304 following step S303, the CPU 11 calculates the horizontal offset amount Hof of the intersection point J based on the width of the image frame Fs. That is, the horizontal offset amount Hof is calculated by scaling the horizontal position of the intersection point J (the horizontal position of the horizontal center line of the virtual simple model Mbv) using the horizontal scaler described above.

In step S305 following step S304, the CPU 11 performs a process of storing information on the zoom amount Zr obtained from the zoom operation when the composition was specified, the vertical position deviation amount Vrc, and the horizontal offset amount Hof, for example, a process of storing them in a predetermined storage device such as the storage unit 19.

The CPU 11 ends the series of processes shown in FIG. 27 after executing the process of step S305.

FIG. 28 is a flowchart of a process relating to calculation of a target image frame for realizing a specified tracking composition.
In step S401, the CPU 11 performs a process of acquiring three-dimensional coordinate information of the detection simple model Mbd of the tracking target. Specifically, in this example, the CPU 11 performs a process of acquiring the three-dimensional coordinate information of the detection simple model Mbd of the tracking target subject from the distance measuring/imaging device 6.

In step S402 following step S401, the CPU 11 executes a case determination process. That is, based on the stored value of the vertical position deviation amount Vrc, it determines which of the above-mentioned cases 1 to 4 applies.

If the case is case 1, the CPU 11 proceeds to step S403 to execute a center coordinate calculation process of the target image frame according to case 1, and proceeds to step S407. If the case is case 2, the CPU 11 proceeds to step S404 to execute a center coordinate calculation process of the target image frame according to case 2, and proceeds to step S407. If the case is case 3, the CPU 11 proceeds to step S405 to execute a center coordinate calculation process of the target image frame according to case 3, and proceeds to step S407. If the case is case 4, the CPU 11 proceeds to step S406 to execute a center coordinate calculation process of the target image frame according to case 4, and proceeds to step S407.
Note that specific examples of the process of calculating the center coordinates of the target image frame for each of Cases 1 to 4 have already been described, so a duplicate description will be avoided.

In step S407, the CPU 11 performs processing to control the pan, tilt, and zoom of the target camera based on the calculated center coordinates of the target image frame and the stored zoom amount Zr information.

After executing the process of step S407, the CPU 11 ends the series of processes shown in FIG. 28.

<5. About calibration>
The calibration process performed by the calibration unit F1 will now be described.
As mentioned earlier, the calibration process in this embodiment is a process for understanding the positional relationship between each tracking camera 3 and each ranging/imaging device 6, and more specifically, a process for identifying the same points that appear in both the distance image obtained by the ranging/imaging device 6 and the image obtained by the tracking camera 3 between each ranging/imaging device 6 and each tracking camera 3.

If the subject of imaging is, for example, a live music event, the calibration process must often be performed while various preparations for the start of the event are being carried out on stage. In such cases, if calibration is performed using a conventional test board, the stage must be occupied for calibration, which causes a delay in the preparation work, and is undesirable.

In light of these circumstances, in this embodiment, a calibration process is performed to identify the positions of identical parts between a sensor device side simple model, which is a simple model of the subject generated from a distance image obtained by the distance measurement/imaging device 6, and a camera side simple model, which is a simple model of the subject generated from an image obtained by the tracking camera 3.

Specifically, as shown in FIG. 29, a person 50 is placed on a stage as a subject for calibration, and a calibration process is performed to identify the positions of identical parts between a simple model on the sensor device side and a simple model on the camera side of this person 50.
As shown in FIG. 29, the coordinates of a certain part of the sensor device side simple model of person 50 are represented by XYZ three-dimensional spatial coordinates P_N (x_N, y_N, z_N), while the coordinates of the same part of the camera side simple model of person 50 are represented by coordinates in the camera coordinate system (uv coordinate system) Q_N (u_N, v_N).
In the calibration process in this case, it is sufficient to identify the three-dimensional position detected on the sensor device side simple model and the uv coordinate system position detected on the camera side simple model for a certain part of the person 50 in this manner.

In this case, because the person 50 is tracked during calibration, the person 50 is free to move around on the stage, so that performing calibration does not interfere with preparations for the start of the event.

In this embodiment, a first method and a second method are proposed as calibration processing methods.
FIG. 30 is an explanatory diagram of a calibration processing technique as the first technique.
In the first method, data sampling for calibration is performed while switching the target distance measuring/image capturing device 6. In this first method, data sampling for calibration is performed the same number of times as the number of distance measuring/image capturing devices 6.

A specific example of the procedure for performing calibration is shown in the flow chart of FIG.
First, in step S51, the user selects one distance measuring and image capturing device 6 to be calibrated.
In the next step S52, the user adjusts the pan, tilt and zoom of all the cameras (all the tracking cameras 3) so that the distance measurement range of the selected distance measurement/image capture device 6 is captured.

Then, in step S53, the user instructs the information processing device 1 to start a sampling process for calibration. The sampling process for calibration here refers to a process of acquiring coordinate information (three-dimensional coordinates) of a specified part of the simple model on the sensor device side generated for the person 50 on the stage, and coordinate information (uv coordinates) of a specified part of the simple model on the camera side generated for the person 50, as sample data for calibration.

In step S54 following step S53, person 50, who is the subject of the calibration, moves through the selected distance measurement range of distance measurement/image capture device 6 while raising both hands. By having person 50 raise both hands, person 50 can be recognized by information processing device 1 even if there is a person other than person 50 on the stage.
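The idea of recognizing the person 50 by the raised hands can be illustrated with a minimal sketch. It assumes that each simple model is available as a mapping from part name to three-dimensional coordinates with the second coordinate pointing upward, and that "both hands raised" is judged by both wrists being higher than the head. The part names, the axis convention, the criterion, and the function names are all hypothetical and are not taken from the description.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
SimpleModel = Dict[str, Point3D]


def has_both_hands_raised(model: SimpleModel) -> bool:
    """True if both wrists are higher than the head (assumed criterion)."""
    required = ("head", "left_wrist", "right_wrist")
    if any(part not in model for part in required):
        return False
    head_height = model["head"][1]  # assumption: index 1 is the vertical axis
    return (
        model["left_wrist"][1] > head_height
        and model["right_wrist"][1] > head_height
    )


def select_calibration_subjects(models: List[SimpleModel]) -> List[SimpleModel]:
    """Keep only the simple models whose hands are both raised."""
    return [model for model in models if has_both_hands_raised(model)]


# Hypothetical usage: one person with raised hands, one stagehand without.
person_50 = {
    "head": (0.0, 1.7, 2.0),
    "left_wrist": (-0.3, 2.0, 2.0),
    "right_wrist": (0.3, 2.0, 2.0),
}
stagehand = {
    "head": (3.0, 1.7, 1.0),
    "left_wrist": (2.8, 1.0, 1.0),
    "right_wrist": (3.2, 1.0, 1.0),
}
subjects = select_calibration_subjects([person_50, stagehand])
```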

Then, in step S55 following step S54, it is determined whether or not sufficient samples have been obtained. If sufficient samples have not been obtained, the movement of person 50 in step S54 continues. If sufficient samples have been obtained, in step S56, the user instructs information processing device 1 to end the calibration sampling process.

In the first method, the procedure shown in FIG. 31 is performed for each distance measurement/imaging device 6 while switching between the target distance measurement/imaging devices 6.

FIG. 32 is a flowchart of a process executed by the CPU 11 of the information processing device 1 when the first method is adopted.
In step S501, the CPU 11 waits for the start of sampling processing. Specifically, the CPU 11 waits for an instruction to start the sampling processing for calibration performed in step S53 in FIG.

If it is determined that the sampling process has started, the CPU 11 proceeds to step S502 and acquires the simple model information obtained by the distance measurement/imaging device 6. That is, information on the sensor device side simple model obtained by the distance measurement/imaging device 6 selected as the target (for example, coordinate information of a specific part of the sensor device side simple model) is acquired.

In step S503 following step S502, the CPU 11 extracts and stores simple model information of the subject for calibration. That is, from the acquired information on the sensor device side simple model, only information on the person 50 with both hands raised is extracted, and the information is stored in a predetermined storage device such as the storage unit 19.

In step S504 following step S503, the CPU 11 executes image recognition processing on the images captured by each camera (each tracking camera 3), and in the subsequent step S505, extracts and stores simple model information of the calibration subject (i.e., person 50) from the image recognition processing results of each camera.

In step S506 following step S505, the CPU 11 determines whether or not the sampling process has ended, that is, whether or not an instruction to end the sampling process for calibration performed in step S56 in FIG. 31 has been issued.
If it is determined that the sampling process has not ended, the CPU 11 returns to step S502, so that the calibration sampling process continues.

On the other hand, if it is determined that the sampling process has ended, the CPU 11 proceeds to step S507, and performs a process of identifying identical points between the distance measurement/image capture device 6 and each camera based on the stored simple model information, and then ends the series of processes shown in FIG. 32.
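The pairing of stored samples into identical points can be sketched as follows. This is an illustration only: it assumes that the sensor device side samples and the camera side samples are stored as time-aligned lists of mappings from part name to coordinates, which the description does not specify, and the function name `pair_same_points` is hypothetical.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
PointUV = Tuple[float, float]


def pair_same_points(
    sensor_samples: List[Dict[str, Point3D]],
    camera_samples: List[Dict[str, PointUV]],
) -> List[Tuple[str, Point3D, PointUV]]:
    """Pair 3D positions with uv positions of the same part in the same sample.

    Assumption: the i-th entries of both lists were sampled at the same time.
    """
    pairs = []
    for sensor_model, camera_model in zip(sensor_samples, camera_samples):
        # Only parts detected on both sides can be treated as identical points.
        for part in sorted(sensor_model.keys() & camera_model.keys()):
            pairs.append((part, sensor_model[part], camera_model[part]))
    return pairs


# Hypothetical usage with a single sample.
sensor = [{"head": (0.0, 1.7, 2.0), "left_wrist": (-0.3, 2.0, 2.0)}]
camera = [{"head": (960.0, 300.0)}]
pairs = pair_same_points(sensor, camera)
```

Each resulting pair corresponds to one combination of P_N and Q_N for a given part, obtained for one tracking camera 3 and the targeted distance measurement/image capture device 6.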

Here, the process of the CPU 11 corresponding to the first technique described with reference to FIG. 32 can be expressed as follows.
In other words, a unit sampling process is performed for each ranging/imaging device 6 while switching the target ranging/imaging device 6. In the unit sampling process, the positions of each part of the simple model on the sensor device side and the simple model on the camera side are sampled in a state where the composition has been adjusted so that each tracking camera 3 captures the ranging range of the one targeted ranging/imaging device 6.

By performing the above-mentioned unit sampling process for each distance measuring/image capturing device 6, it is possible to obtain sample data necessary for calibration processing (identification of the same points) between each tracking camera 3 and each distance measuring/image capturing device 6.
In each unit sampling process, the imaging range of each tracking camera 3 can be made to sufficiently overlap with the distance measurement range of the target distance measurement/imaging device 6, so that the calibration process can be directly performed between the target distance measurement/imaging device 6 and each tracking camera 3. Therefore, by performing the unit sampling process for each distance measurement/imaging device 6 while switching the target distance measurement/imaging device 6 as described above, the calibration process between each tracking camera 3 and each distance measurement/imaging device 6 can be performed with high accuracy.

FIG. 33 is an explanatory diagram of a calibration processing technique as the second technique.
The second technique is a technique that enables the necessary number of calibration sampling processes to be reduced to one.
If the imaging ranges of all the tracking cameras 3 can be sufficiently overlapped with the distance measurement ranges of all the distance measurement/imaging devices 6, the number of sampling processes for calibration can be reduced to one. However, according to the arrangement of the tracking cameras 3 and the distance measurement/imaging devices 6 in this example, when the tracking camera 3-1 tries to sufficiently overlap its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-2 and 6-4 arranged on the opposite side (upper side), it cannot sufficiently overlap its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-1 and 6-3 arranged on the same side (lower side). Similarly, when the tracking camera 3-2 tries to sufficiently overlap its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-1 and 6-3 arranged on the opposite side (lower side), it cannot sufficiently overlap its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-2 and 6-4 arranged on the same side (upper side).
It is not possible to directly identify the same points between a tracking camera 3 and a distance measurement/imaging device 6 that are in a relationship in which the imaging range cannot be sufficiently overlapped with the distance measurement range. Therefore, in this case, it is conceivable to execute the calibration sampling process separately for each of the following two states: a state in which the tracking camera 3-1 sufficiently overlaps its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-2 and 6-4 and the tracking camera 3-2 sufficiently overlaps its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-1 and 6-3, and a state in which the tracking camera 3-1 sufficiently overlaps its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-1 and 6-3 and the tracking camera 3-2 sufficiently overlaps its imaging range with the distance measurement ranges of the distance measurement/imaging devices 6-2 and 6-4.

However, in this example, the tracking camera 3-3 placed at the FOH is capable of capturing the entire stage, and therefore can fully overlap its own imaging range with the ranging ranges of all ranging/imaging devices 6 and the imaging ranges of all other tracking cameras 3 (i.e. 3-1 and 3-2).

In the second method, therefore, the calibration process between a tracking camera 3 and a distance measurement/imaging device 6 that are in a relationship in which the imaging range cannot be sufficiently overlapped with the ranging range is performed via the tracking camera 3-3 arranged at the FOH, thereby reducing the number of required sampling processes for calibration to just one.

FIG. 34 is a flowchart showing a specific example of a procedure for performing calibration in the second method.
In this case, in step S61, the user adjusts the pan, tilt, and zoom of each camera (each tracking camera 3) as follows.
That is, FOH (tracking camera 3-3): Captures the entire stage (sufficiently overlapping the distance measuring ranges of all distance measuring/imaging devices 6, the imaging ranges of tracking cameras 3-1, and the imaging ranges of tracking cameras 3-2).
Downstream (tracking camera 3-1): Captures the distance measurement range of the distance measurement/imaging device 6 (6-2 and 6-4) installed on the upstream side.
Upstream (tracking camera 3-2): Captures the distance measurement range of the distance measurement/imaging device 6 (6-1 and 6-3) installed downstream.
Next, in step S62, the user instructs the information processing device 1 to start the calibration sampling process.

In step S63 following step S62, the person 50 who is the subject of the calibration moves through the distance measurement ranges of all distance measurement and image capture devices 6 while raising both hands.

Then, in step S64 following step S63, it is determined whether or not sufficient samples have been obtained. If sufficient samples have not been obtained, the movement of person 50 in step S63 continues. If sufficient samples have been obtained, in step S65, the user instructs information processing device 1 to end the calibration sampling process.

In this case, the CPU 11 performs the same process as shown in FIG. 32 as the process related to the calibration. However, in this case, for a tracking camera 3 and a distance measurement/imaging device 6 that are in a relationship in which the imaging range cannot be sufficiently overlapped with the distance measurement range, the CPU 11 performs the process of identifying the same points in step S507 via the tracking camera 3-3.
Specifically, the process of identifying the same points between the tracking camera 3-1 and the distance measurement/image capture devices 6-1 and 6-3 is performed based on the result of identifying the same points between the tracking camera 3-1 and the tracking camera 3-3 and the result of the calibration process between the tracking camera 3-3 and the distance measurement/image capture devices 6-1 and 6-3. The process of identifying the same points between the tracking camera 3-2 and the distance measurement/image capture devices 6-2 and 6-4 is performed based on the result of identifying the same points between the tracking camera 3-2 and the tracking camera 3-3 and the result of the calibration process between the tracking camera 3-3 and the distance measurement/image capture devices 6-2 and 6-4.

By adopting the second method as described above, it is possible to eliminate the need to separately execute calibration sampling processes for each of the following states: the state in which the tracking camera 3-1 sufficiently overlaps its own imaging range with the ranging ranges of the ranging/imaging devices 6-2 and 6-4 and the tracking camera 3-2 sufficiently overlaps its own imaging range with the ranging ranges of the ranging/imaging devices 6-1 and 6-3, and the state in which the tracking camera 3-1 sufficiently overlaps its own imaging range with the ranging ranges of the ranging/imaging devices 6-1 and 6-3 and the tracking camera 3-2 sufficiently overlaps its own imaging range with the ranging ranges of the ranging/imaging devices 6-2 and 6-4. This makes it possible to reduce the number of times that calibration sampling processes are required to just one.

Here, the calibration processing method as the second method described above can be expressed as follows.
That is, there are multiple distance measuring/imaging devices 6 and multiple tracking cameras 3. The tracking cameras 3 include a first tracking camera (tracking camera 3-3) arranged so as to be able to image the imaging ranges of all other tracking cameras 3 and the distance measuring ranges of all distance measuring/imaging devices 6, and a second tracking camera (tracking camera 3-1 or 3-2) arranged on either the left or right side as viewed from the first tracking camera and arranged so as to be able to image the imaging range of the first tracking camera. The distance measuring/imaging devices 6 include a same-side sensor device arranged at a position on the same side as the second tracking camera, and an opposite-side sensor device arranged at a position on the opposite side from the second tracking camera, as viewed from the first tracking camera. The calibration unit F1 performs the calibration process between the second tracking camera and the same-side sensor device based on the identification result of the same points between the second tracking camera and the first tracking camera and the result of the calibration process between the first tracking camera and the same-side sensor device.

If calibration processing between each tracking camera 3 and each ranging/imaging device 6 is to be achieved with a single sampling process, the imaging range of each tracking camera 3 needs to be sufficiently overlapped with the ranging ranges of all ranging/imaging devices 6. However, for the second tracking camera, if an attempt is made to make its imaging range sufficiently overlap with the ranging range of the opposite-side sensor device, it may not be possible to sufficiently image the ranging range of the same-side sensor device, in which case calibration processing between the second tracking camera and the same-side sensor device cannot be performed directly. According to the processing of the calibration unit F1 described above, even if calibration processing between the second tracking camera and the same-side sensor device cannot be performed directly, it can be performed via the first tracking camera. In other words, when performing calibration processing between the second tracking camera and the opposite-side sensor device, and between the second tracking camera and the same-side sensor device, there is no need to perform separate sampling processes for calibration in a state where the imaging range of the second tracking camera is sufficiently overlapped with the ranging range of the opposite-side sensor device and in a state where it is sufficiently overlapped with the ranging range of the same-side sensor device. As a result, the calibration processing between each tracking camera and each sensor device can be performed with a single sampling process.

In addition, in the calibration process, as shown in FIG. 35, the user can specify on the image the position that is to be the same point on the object on the stage.
At this time, when the user specifies a position that is the same point on the image captured by the distance measuring/imaging device 6, it is also possible to detect the brightness of the captured image and display a distance image in a superimposed manner if the brightness is equal to or lower than a predetermined value. This allows the user to properly specify the same point even when the stage is dark.

<6. Modifications>
Here, the embodiment is not limited to the specific examples described so far, and various modified configurations may be adopted.
For example, in the above description, the tracking composition is a composition for tracking a single subject, but it is also possible to specify a tracking composition that includes a plurality of subjects, that is, a composition for a group shot.

FIG. 36 shows an example of a group shot setting reception screen for receiving a designation of a tracking composition for such a group shot.
As shown in FIG. 36A, the reception screen for group shot settings may first be a screen that displays, for each subject whose name information has been set in the subject name setting area As2 described above (see FIGS. 6 and 7), a check box cbg for specifying that subject as a tracking target for the group shot.
The user can select a subject by checking the check box cbg as a tracking subject to be included in the group shot, and can confirm the selection of the tracking target subject by operating the OK button B11 in the figure.

When the OK button B11 is operated, a screen appears on which virtual simple models Mbv for a plurality of subjects designated as tracking targets are arranged, as shown in FIG. 36B.
On this screen, the user performs an operation to select a part that the user wishes to include in the composition from among the parts displayed on the virtual simple model Mbv, and operates the end button B12.

In this case, the target image frame can be calculated based on information about the detection simple models Mbd for the subjects designated as the subjects to be tracked in the group shot, obtained by the ranging/imaging device 6, so that all of the parts specified in the group shot settings in the detection simple models Mbd are included within the image frame.
As one example, the midpoint between the minimum and maximum values on each axis (e.g., X, Y, and Z axes) of the specified parts across the detection simple models Mbd can be set as the center of the image frame, and an image frame of a size such that all of the specified parts fit within it can be calculated.

The center of the target image frame may be set at a position offset by a specified amount in the horizontal and vertical directions from the position that corresponds to the average value.
As for the image frame size, it is also possible to use, as a reference size, the size that encloses all the specified parts as tightly as possible, and to set a size that is offset from that reference size by a specified amount.
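The image frame calculation described above can be sketched as follows. This is only a minimal illustration under stated assumptions: each detection simple model Mbd is represented as a dictionary from part name to a three-dimensional position, X is taken as the horizontal axis and Y as the vertical axis, and the names group_shot_frame, ImageFrame, offset, margin, and aspect are hypothetical and do not appear in the embodiment. The frame is enlarged by the offset amount so that the specified parts remain inside it, which is one possible design choice.

```python
from dataclasses import dataclass

Point3D = tuple[float, float, float]


@dataclass
class ImageFrame:
    center: Point3D  # frame center in the sensor's 3D coordinate system
    width: float     # extent along X (assumed horizontal axis)
    height: float    # extent along Y (assumed vertical axis)


def group_shot_frame(models, parts, offset=(0.0, 0.0), margin=0.0, aspect=16 / 9):
    """Compute a frame that contains every specified part of every model.

    models: list of dicts {part_name: (x, y, z)} (hypothetical representation)
    parts:  part names selected on the group shot setting screen
    offset: (horizontal, vertical) shift of the frame center
    margin: extra space added around the tight bounding size
    aspect: assumed width/height ratio of the camera image
    """
    points = [model[p] for model in models for p in parts if p in model]
    if not points:
        raise ValueError("none of the specified parts were detected")

    # Minimum and maximum on each axis over all specified parts.
    lo = [min(pt[i] for pt in points) for i in range(3)]
    hi = [max(pt[i] for pt in points) for i in range(3)]

    # Frame center: midpoint of min and max, then the optional offset.
    mid = [(lo[i] + hi[i]) / 2 for i in range(3)]
    center = (mid[0] + offset[0], mid[1] + offset[1], mid[2])

    # Tight size, enlarged so the parts still fit after the center is shifted.
    width = (hi[0] - lo[0]) + 2 * (abs(offset[0]) + margin)
    height = (hi[1] - lo[1]) + 2 * (abs(offset[1]) + margin)

    # Grow one side so the frame matches the assumed aspect ratio.
    if width < height * aspect:
        width = height * aspect
    else:
        height = width / aspect
    return ImageFrame(center, width, height)
```

For example, calling group_shot_frame with the models of the vocalist and the guitarist and the parts "head" and "waist" yields a frame centered between the two performers and wide enough to contain both upper bodies.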

In addition, with regard to composition control for group shots, it is possible to designate a main person and to exclude from tracking any person who is farther than a designated distance from the main person.
It is also possible to configure the system so that it tracks only people who enter a pre-specified range in three-dimensional space.
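The two selection rules above can be sketched as follows. This is a minimal illustration, assuming that each person's position is available as a three-dimensional point and that the pre-specified range is an axis-aligned box; the function names and the box representation are hypothetical.

```python
import math


def select_group_members(positions, main, max_distance):
    """Keep only people within max_distance of the designated main person.

    positions: dict {name: (x, y, z)} (hypothetical representation)
    """
    main_pos = positions[main]
    return [
        name
        for name, pos in positions.items()
        if math.dist(pos, main_pos) <= max_distance
    ]


def in_tracking_range(pos, range_min, range_max):
    """Return True if pos lies inside the pre-specified 3D range.

    The range is assumed to be an axis-aligned box given by two corners.
    """
    return all(lo <= c <= hi for c, lo, hi in zip(pos, range_min, range_max))
```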

FIG. 37 shows an example of an execution instruction button Bm ("vocals, guitar" in the figure) for instructing a tracking composition as a group shot.
A tracking indicator It may also be provided for the execution instruction button Bm for instructing a tracking composition as a group shot. In this case, the tracking indicator It may be displayed in different ways in each of the following states: when all subjects to be tracked are detected, when only some of the subjects are detected, and when none of the subjects are detected. For example, the tracking indicator It may be displayed in blue when all of the subjects are detected, in yellow when only some of the subjects are detected, and in gray when none of the subjects are detected.
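The three display states of the tracking indicator It can be sketched as follows. The colors follow the example given above; the function name and the representation of subjects as name strings are assumptions for illustration.

```python
def indicator_color(targets, detected):
    """Return the indicator color for a group shot button.

    targets:  names of the subjects to be tracked in the group shot
    detected: names of the subjects currently detected
    """
    found = sum(1 for name in targets if name in detected)
    if targets and found == len(targets):
        return "blue"    # all tracking target subjects are detected
    if found > 0:
        return "yellow"  # only some of the subjects are detected
    return "gray"        # none of the subjects are detected
```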

In addition, regarding tracking of a subject, it is also possible to variably set the tracking sensitivity.
FIG. 38 shows an example in which a setting operation section for tracking sensitivity is provided in, for example, the performance area Are on the operation screen Gm, as indicated by "K" in the figure.
In this case, the user may issue an instruction to switch the sensitivity even while the tracking target subject is being tracked, so the CPU 11 performs tracking sensitivity switching control while the tracking camera 3 is tracking the tracking target subject.
The tracking sensitivity may be switchable between, for example, three levels: low, medium, and high.

For example, it is possible to appropriately adjust the tracking sensitivity according to the situation, such as lowering the sensitivity during MCs, increasing the sensitivity for intense songs, and lowering the sensitivity for calm songs.
It is also possible to automate the switching of sensitivity.
As one example of such automation, when the semi-automatic control is turned OFF, the sensitivity can be switched to a lower level, because it can be assumed that an MC is in progress.
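The switching behavior described above can be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical, and the choice to restore the user-selected level when semi-automatic control is turned back ON is an assumption, not something stated in the embodiment.

```python
from enum import Enum


class Sensitivity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class SensitivityController:
    """Holds the user-selected level and applies the automatic rule."""

    def __init__(self, level=Sensitivity.MEDIUM):
        self.user_level = level
        self.semi_auto_on = True

    def set_level(self, level):
        # May be called at any time, including while tracking is in progress.
        self.user_level = level

    def set_semi_auto(self, on):
        self.semi_auto_on = on

    def effective_level(self):
        # Semi-automatic control OFF is taken to mean an MC is in progress,
        # so the sensitivity is lowered automatically.
        if not self.semi_auto_on:
            return Sensitivity.LOW
        return self.user_level
```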

Also, even when the tracking target subject is being tracked, it is possible to adjust the composition in response to a manual operation input using, for example, a joystick 30 or the like provided as part of the input unit 16.
FIG. 39 shows an example in which a manual adjustment button Ba for instructing switching to such manual adjustment is arranged, for example, in a performance area Are on an operation screen Gm.
For example, by turning on the manual adjustment button, the composition controlled in accordance with the selected execution instruction button Bm can be adjusted by manual operation input using the joystick 30 or the like.

At this time, manual operation input using the joystick 30 or the like is accepted by the camera control protocol for the virtual camera of the information processing device 1 (CPU 11) as shown in FIG. 40, and a control signal based on the manual operation input is input via the switching unit in the figure to the camera control protocol for the relevant camera among the camera control protocols for CAM1, CAM2, and CAM3, so that the composition of the relevant camera is controlled to the composition based on the manual operation input.

Also, as shown in FIG. 41, information indicating, for each tracking target subject, the ratio at which that subject has been selected as PGM out can be displayed on the operation screen Gm (see "R" in the figure).
In this case, the information showing the ratio may be displayed as statistical information for the entire live performance or for each song. It is also preferable that the information showing the ratio can be reset in response to a predetermined operation such as the operation of a reset button.
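The ratio display and reset described above can be sketched as follows. This is a minimal illustration under the assumption that the ratio is measured by the accumulated time each subject has been selected as PGM out; the embodiment does not state how the ratio is measured, and the class and method names are hypothetical.

```python
from collections import Counter


class PgmOutStats:
    """Accumulates, per subject, the time selected as PGM out."""

    def __init__(self):
        self.seconds = Counter()

    def add(self, subject, seconds):
        # Record that `subject` was on PGM out for `seconds` seconds.
        self.seconds[subject] += seconds

    def ratios(self):
        # Share of the total PGM out time for each subject.
        total = sum(self.seconds.values())
        if total == 0:
            return {}
        return {name: value / total for name, value in self.seconds.items()}

    def reset(self):
        # Corresponds to a predetermined operation such as a reset button,
        # for example at the start of each song.
        self.seconds.clear()
```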

In addition, in the embodiment, it has been mentioned that when a subject to be tracked is specified, the captured images Ims are displayed on the operation screen Gm (see FIG. 10, etc.), but it is also conceivable to make it possible to change the order in which the captured images Ims are arranged on the screen.
This makes it possible, for example, when the arrangement of the ranging/imaging devices 6 is changed, to change the arrangement order of the captured images Ims on the screen in accordance with the change in the arrangement.

In addition, in the above explanation, an example has been given in which the switcher 5 is configured as a hardware device, but the switcher 5 may also be realized by a software program executed by the information processing device 1.

In addition, while the explanation so far has dealt with an example in which the event to be imaged is a live music event, this technology can also be suitably applied when other events are to be imaged, such as musicals and other performances held on a stage (either indoors or outdoors), program recordings in studios, and sporting events such as baseball, soccer, basketball, and volleyball.

7. Recording medium
The information processing device 1 has been described above as an embodiment. The recording medium of the embodiment is a recording medium on which a program for causing a computer device such as a CPU to execute the processing of the information processing device 1 is recorded.

The recording medium of the embodiment is a recording medium having a program recorded thereon that can be read by a computer device, and is a recording medium having a program recorded thereon that causes the computer device to realize a designation reception processing function that allows a user to designate a subject to be tracked by a tracking camera capable of tracking a subject from among subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
That is, this recording medium corresponds to a recording medium on which a program for causing a computer device to execute the processes described with reference to FIG. 25 and the like is recorded.

In this case, the program can be stored in advance in a computer-readable recording medium, such as a ROM, a hard disk drive (HDD), or a solid state drive (SSD). Alternatively, the program can be temporarily or permanently stored in a removable recording medium, such as a semiconductor memory, a memory card, an optical disk, a magneto-optical disk, or a magnetic disk. Such a removable recording medium can be provided as so-called packaged software.
In addition, the program in this case can be installed from a removable recording medium onto a personal computer or the like, or can be downloaded from a download site to a required information processing device such as a smartphone via a network such as a LAN or the Internet.

8. Summary of the embodiment
As described above, the information processing device (information processing device 1) of the embodiment includes a designation reception processing unit (designation reception unit F2) that performs processing to allow a user to designate a subject to be tracked by a tracking camera capable of tracking a subject from among subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
According to the above configuration, even if the subject that the user wants to track is located outside the imaging range of the tracking camera, a sensor device other than the tracking camera can capture the subject in a sensing image, and the user can specify the subject that he or she wants to track from the sensing image of the sensor device. In addition, if a sensor device having a light receiving sensor capable of receiving infrared light is used, even if the user cannot identify the subject from the image captured by the tracking camera in a dark environment, the subject can be identified in the sensing image by the sensor device, and it is possible to increase the ease of specifying the subject to be tracked.
In this manner, according to the information processing device of the embodiment, it is possible to ensure that the subject to be tracked by the tracking camera is appropriately specified.

In addition, in an information processing device as an embodiment, the sensor device has, as the light receiving sensor, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light, and has an imaging function for obtaining a captured image based on the light receiving signal of the visible light sensor and a distance measurement function for obtaining a distance image based on the light receiving signal of the infrared light sensor. While an image captured by the sensor device is being displayed as the image for accepting designation of a subject to be tracked, the designation reception processing unit performs processing for displaying an image based on the distance image as the image for accepting the designation in response to the brightness of the captured image becoming equal to or lower than a predetermined brightness.
As a result, when the ambient light of the subject becomes dark and it becomes difficult for the user to identify the subject from the image captured by the sensor device, an image based on the distance image captured by the sensor device, specifically the distance image itself or an image of a simple model of the subject generated based on the distance image, is displayed as an image for accepting the designation of the subject to be tracked, making it easier to designate the subject to be tracked even in a dark environment.
Therefore, it is possible to appropriately specify the tracking target in a dark environment.
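The display switching described above can be sketched as follows. This is a minimal illustration, assuming that brightness is measured as the mean value of an 8-bit grayscale version of the captured image; the embodiment does not specify the brightness measure, and the function names and the threshold value of 40 are hypothetical.

```python
def mean_brightness(gray_image):
    """Mean pixel value of a grayscale image given as rows of 0-255 values."""
    pixels = [value for row in gray_image for value in row]
    return sum(pixels) / len(pixels)


def choose_designation_image(gray_image, threshold=40.0):
    """Decide which image to show for accepting the tracking target designation.

    Returns "distance" when the captured image is at or below the threshold,
    meaning an image based on the distance image should be displayed,
    and "captured" otherwise.
    """
    if mean_brightness(gray_image) <= threshold:
        return "distance"
    return "captured"
```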

Furthermore, in an information processing device as an embodiment, in response to a subject being designated as a tracking target subject from a subject appearing in a sensing image captured by a sensor device, the designation reception processing unit performs processing to display name information of the tracking target subject at a position corresponding to the detection frame of the tracking target subject superimposed on the display image of the sensing image.
This allows the user to intuitively understand whether or not the tracking target subject has been designated as intended based on whether or not name information of the tracking target subject is displayed in the detection frame of the tracking target subject.

Furthermore, in the information processing device as an embodiment, the designation reception processing unit displays a sensing image captured by the sensor device on an operation screen where an instruction operation can be performed as to whether or not to track the tracked subject using at least the tracking camera, and in response to a tracked subject being designated from a subject appearing in the sensing image, displays an image of the designated tracked subject extracted from the sensing image in a tracking target display area provided on the operation screen that displays name information of the tracked subject.
This allows the user to intuitively understand whether or not the tracking target subject has been designated as intended based on whether or not a sensing image of the tracking target subject is displayed in the tracking target display area provided on the operation screen.

In addition, in the information processing device as the embodiment, the designation reception processing unit receives a designation of a tracking composition of a subject by a tracking camera as a designation of a tracking image frame for a display image of a virtual simple model, that is, a simplified virtual model of the subject.
This allows the specification of a highly flexible composition that allows the tracking target subject to be positioned at any position within the image frame as the tracking composition. Also, according to the above configuration, the specification of the tracking composition is accepted as the specification of the tracking image frame for the virtual simple model of the subject that is visually displayed, so that it is possible to make it easier for the user to imagine the composition of the captured image obtained by tracking, and the ease of specification of the tracking composition can be improved.

Furthermore, in the information processing device as the embodiment, the designation reception processing unit receives a designation of a subject tracking composition by the tracking camera as a designation of the position of the tracking image frame and the size of the tracking image frame relative to the display image of the virtual simple model.
As a result, when specifying a subject tracking composition, the user needs to perform only two operations: an operation of specifying the position of the tracking image frame relative to the virtual simple model, and an operation of specifying the size of the tracking image frame.
Therefore, the operational burden on the user when specifying a subject tracking composition can be reduced.

Furthermore, in an information processing device as an embodiment, the sensor device has an infrared light sensor capable of receiving infrared light as a light receiving sensor, and has a distance measurement function for obtaining a distance image based on the light receiving signal of the infrared light sensor. The information processing device is equipped with an image frame calculation unit (F8) that calculates a target image frame of the tracking camera for tracking the tracking target subject based on three-dimensional position information of a detection simple model, which is a simple model of the tracking target subject generated based on the distance image obtained by the sensor device. The designation reception processing unit performs a process of storing information indicating the positional deviation of the image frame center of the tracking image frame relative to the virtual simple model as model image frame center position deviation information, and the image frame calculation unit calculates the target image frame so that the positional deviation indicated by the model image frame center position deviation information is reproduced as the positional deviation of the image frame center of the target image frame relative to the detection simple model.
In this way, by focusing on the positional deviation of the image frame center of the specified tracking image frame relative to the virtual simple model and reproducing it as the positional deviation of the image frame center of the target image frame relative to the detection simple model, the tracking camera can be appropriately controlled so that the tracking composition matches the specified composition.

In addition, in an information processing device as an embodiment, the designation reception processing unit calculates the vertical position deviation amount included in the model image frame center position deviation information as a relative value on a first scaler that uses the vertical position of a specified part of the virtual simple model as a reference value and the length between specified parts spaced apart vertically in the virtual simple model as a reference unit, and the image frame calculation unit sets the vertical center position of the target image frame to the vertical position that the vertical position deviation amount specifies on a second scaler that uses the vertical position of the specified part of the detection simple model as a reference value and the length between the specified parts spaced apart vertically in the detection simple model as a reference unit.
This allows the designated tracking composition to be properly reproduced even if the size of the virtual simple model used when designating the tracking composition differs from the size of the actual detection simple model (the size of the tracking target subject). In other words, the designated tracking composition can be properly reproduced regardless of the size of the tracking target subject.
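The two scalers described above can be sketched as follows. This is a minimal illustration, assuming the neck as the reference part and the neck-to-waist length as the reference unit; the embodiment only refers to "specified parts", so these part choices and the function names are hypothetical. The reference unit is kept signed so that the sketch works even when the vertical axis of the screen and that of the sensor point in opposite directions.

```python
def to_relative(center_y, ref_y, unit_end_y):
    """First scaler: express the frame center as a relative value.

    ref_y:      vertical position of the reference part (assumed: neck)
    unit_end_y: vertical position of the second part (assumed: waist)
    """
    unit = unit_end_y - ref_y  # signed reference unit
    return (center_y - ref_y) / unit


def from_relative(relative, ref_y, unit_end_y):
    """Second scaler: turn the relative value back into a vertical position."""
    unit = unit_end_y - ref_y
    return ref_y + relative * unit


# Designation on the virtual simple model (screen pixels, Y grows downward).
rel = to_relative(center_y=150.0, ref_y=100.0, unit_end_y=200.0)

# Reproduction on the detection simple model (meters, Y grows upward).
target_center_y = from_relative(rel, ref_y=1.5, unit_end_y=1.0)
```

In this usage, the frame center designated halfway between the neck and the waist of the virtual simple model is reproduced halfway between the neck and the waist of the detection simple model, regardless of the size of the actual subject.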

Furthermore, in the information processing device as an embodiment, the sensor device has, as a light receiving sensor, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light, and has an imaging function of obtaining a captured image based on a light receiving signal of the visible light sensor, and a distance measuring function of obtaining a distance image based on the light receiving signal of the infrared light sensor. The information processing device is provided with an image frame calculation unit that calculates a target image frame of the tracking camera for tracking the tracking target subject based on three-dimensional position information of a detection simple model that is a simple model of the tracking target subject generated based on the distance image obtained by the sensor device. The designation reception processing unit performs processing to display a designation reception screen on which a plurality of virtual simple models, which are simplified virtual models of the subjects, are arranged as a designation reception screen for the tracking composition during group tracking in which a plurality of subjects are tracking target subjects of the tracking camera, and accepts designation of parts from each of the arranged virtual simple models. When a detection simple model of each subject that is a tracking target of the group tracking has been obtained, the image frame calculation unit calculates the target image frame so that, among the detected parts of each tracking target subject, the parts designated on the designation reception screen are included within the target image frame.
As a result, when specifying a tracking composition during group tracking, the user needs to perform at least only an operation of specifying, on the plurality of arranged virtual simple models, the parts to be included in the tracking composition.
Therefore, it is possible to facilitate the specification of a tracking composition during group tracking.
Also, it is possible to facilitate the specification of a tracking composition that includes important portions of each of a plurality of subjects that are set as tracking targets.

Furthermore, the information processing device according to the embodiment is configured to be capable of switching control of the tracking sensitivity of the tracking camera while the tracking camera is tracking the tracking target subject.
During the progress of the event to be imaged, the state of the subject may change. For example, in a live music concert, the state of the subject may change in various ways, such as when the subject is giving an MC, singing or playing an intense song, or singing or playing a calm song. In this case, it may be necessary to switch the tracking sensitivity depending on the state of the subject. For example, it is desirable to avoid sudden changes in the tracking composition during the MC, whereas it is desirable to let the tracking composition change quickly when the subject is singing or playing an up-tempo song.
According to the above configuration, it is possible to switch the tracking sensitivity when the event to be imaged is in progress and the subject is being tracked by the tracking camera, thereby realizing appropriate tracking sensitivity control in response to changes in the state of the subject while the event to be imaged is in progress.

In addition, the information processing device as an embodiment is configured to be able to execute specified tracking composition control that controls the composition of the tracking camera so that it becomes a tracking composition specified by the user for the subject to be tracked, and semi-automatic composition control that selects a composition of the tracking camera based on a composition selection table in which weighting information is associated with each combination of a subject to be imaged by the tracking camera and a composition type of the tracking camera, and controls the composition of the tracking camera so that it becomes the selected composition, and the specification reception processing unit accepts an instruction operation to execute the specified tracking composition control, and an instruction operation to execute the semi-automatic composition control.
This makes it possible to execute semi-automatic composition control in response to a user's instruction as the composition control of the tracking camera, in addition to the designated tracking composition control.
For example, for captured image content for events such as live music concerts, it is desirable to create high-quality content that does not tire the viewer, for example by appropriately switching the camera's composition. However, it may be difficult for an inexperienced user to select an appropriate composition, and this may require the work of an expert. In addition, manually selecting the composition in the first place increases the cost of creating captured image content.
According to the above-described semi-automatic composition control, the composition switching of the tracking camera is performed automatically based on the composition selection table, and by setting the weight information in the composition selection table, the composition switching manner of the tracking camera can be appropriately set.
Therefore, for captured image content involving composition switching, it is possible to achieve both an improvement in content quality and a reduction in the operational costs involved in content creation.
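One way to select a composition from the composition selection table can be sketched as follows. This is a minimal illustration that assumes the weight information is used as the relative probability of a random selection; the embodiment only states that weight information is associated with each combination of subject and composition type, so the selection rule, the table representation, and the names used here are assumptions.

```python
import random


def select_composition(table, rng=None):
    """Pick one (subject, composition type) pair according to its weight.

    table: dict {(subject, composition_type): weight} (hypothetical form)
    rng:   optional random.Random instance, useful for reproducible runs
    """
    rng = rng or random.Random()
    # Combinations with zero weight are never selected.
    entries = [(key, weight) for key, weight in table.items() if weight > 0]
    if not entries:
        raise ValueError("the table has no entry with a positive weight")
    keys = [key for key, _ in entries]
    weights = [weight for _, weight in entries]
    return rng.choices(keys, weights=weights, k=1)[0]


# Example table: vocals are favored, and the drums are rarely shown full-body.
composition_table = {
    ("vocals", "bust shot"): 5,
    ("vocals", "full shot"): 2,
    ("guitar", "bust shot"): 2,
    ("drums", "full shot"): 1,
}
```

Raising or lowering a weight changes how often that combination is chosen, which corresponds to adjusting the composition switching manner through the table.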

Furthermore, in the information processing device as an embodiment, the sensor device has an infrared light sensor capable of receiving infrared light as a light receiving sensor, and has a distance measurement function of obtaining a distance image based on the light receiving signal of the infrared light sensor. The information processing device is equipped with a calibration processing unit (calibration unit F1) that performs, as a calibration process for identifying the same point appearing in both the distance image obtained by the sensor device and the captured image obtained by the tracking camera, a process of identifying the positions of the same part in a sensor device side simple model, which is a simple model of the subject generated from the distance image obtained by the sensor device, and in a camera side simple model, which is a simple model of the subject generated from the captured image obtained by the tracking camera.
By performing the above-described calibration process, it becomes possible to appropriately convert a target tracking image frame determined in a three-dimensional coordinate system by the sensor device into an image frame in the coordinate system of the tracking camera.
Furthermore, since the calibration method uses a simple model of the subject, it is possible to realize calibration based on detected parts of a simple model of the subject for calibration, such as a subject with both hands raised.

Furthermore, in an information processing device as an embodiment, there are a plurality of sensor devices and a plurality of tracking cameras, and the calibration processing unit performs a unit sampling process for each sensor device while switching the target sensor device, the unit sampling process being a process of sampling the positions of parts of the sensor device side simple model and the camera side simple model with the composition adjusted so that each tracking camera captures the ranging range of a single target sensor device.
By performing the above-mentioned unit sampling process for each sensor device, it is possible to obtain sample data necessary for calibration processing (identification of the same point) between each tracking camera and each sensor device.
In each unit sampling process, the imaging range of each tracking camera can be made to overlap sufficiently with the distance measurement range of the target sensor device, so that the calibration process between the target sensor device and each tracking camera can be directly performed. Therefore, by performing the unit sampling process for each sensor device while switching the target sensor device as described above, the calibration process between each tracking camera and each sensor device can be performed with high accuracy.

In addition, in an information processing device as an embodiment, there are a plurality of sensor devices and a plurality of tracking cameras. The tracking cameras include a first tracking camera arranged so as to be able to capture the imaging ranges of all other tracking cameras and the ranging ranges of all sensor devices, and a second tracking camera arranged on either the left or right side as viewed from the first tracking camera and arranged so as to be able to capture the imaging range of the first tracking camera. The sensor devices include a same-side sensor device arranged on the same side as the second tracking camera as viewed from the first tracking camera, and an opposite-side sensor device arranged on the opposite side. The calibration processing unit performs calibration processing between the second tracking camera and the same-side sensor device based on the identification result of the same points between the second tracking camera and the first tracking camera and the result of the calibration processing between the first tracking camera and the same-side sensor device.
When the calibration process between each tracking camera and each sensor device is to be realized by a single sampling process, it is necessary to make the imaging range of each tracking camera sufficiently overlap the distance measurement ranges of all sensor devices. However, when the imaging range of the second tracking camera is made to sufficiently overlap the distance measurement range of the opposite-side sensor device, the second tracking camera may not be able to sufficiently image the distance measurement range of the same-side sensor device. In that case, the calibration process between the second tracking camera and the same-side sensor device cannot be performed directly. According to the above configuration, even in that case, the calibration process can be performed via the first tracking camera. That is, there is no need to perform sampling processes for calibration twice, once with the imaging range of the second tracking camera sufficiently overlapping the distance measurement range of the opposite-side sensor device and once with it sufficiently overlapping the distance measurement range of the same-side sensor device, and the calibration process between each tracking camera and each sensor device can be realized by a single sampling process.
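The idea of calibrating via the first tracking camera can be sketched as the composition of two mappings. This is a deliberately simplified illustration: it assumes that each calibration result can be represented as a 3x3 matrix acting on two-dimensional homogeneous coordinates, which the embodiment does not state, and all names are hypothetical. The point of the sketch is only that the mapping from the same-side sensor device to the second tracking camera is obtained from the sensor-to-first-camera result and the first-camera-to-second-camera result, without a direct sampling between the two.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def apply(mapping, x, y):
    """Apply a 3x3 mapping to the point (x, y) in homogeneous coordinates."""
    u = mapping[0][0] * x + mapping[0][1] * y + mapping[0][2]
    v = mapping[1][0] * x + mapping[1][1] * y + mapping[1][2]
    w = mapping[2][0] * x + mapping[2][1] * y + mapping[2][2]
    return (u / w, v / w)


def chain(sensor_to_cam1, cam1_to_cam2):
    """Mapping from the same-side sensor to the second camera via the first."""
    # The sensor-to-first-camera mapping is applied first, then first-to-second.
    return matmul(cam1_to_cam2, sensor_to_cam1)


# Assumed example results: a pure scaling and a pure translation.
sensor_to_cam1 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
cam1_to_cam2 = [[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]]
sensor_to_cam2 = chain(sensor_to_cam1, cam1_to_cam2)
```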

Furthermore, in the information processing device according to the embodiment, the event to be imaged by the tracking camera is a live music event.
This makes it possible to appropriately specify the subject to be tracked by the tracking camera when the event to be imaged is a live music event.

In addition, an information processing method as an embodiment is an information processing method in which an information processing device performs processing to allow a user to select a subject to be tracked by a tracking camera capable of tracking a subject from among subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
With such an information processing method, it is possible to obtain the same functions and effects as those of the information processing device according to the above embodiment.

Note that the effects described in this specification are merely examples and are not limiting, and other effects may also be present.

9. Present technology
The present technology can also be configured as follows.
(1)
An information processing device comprising: a designation reception processing unit that performs processing to allow a user to designate a subject to be tracked by a tracking camera capable of tracking a subject from subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
(2)
The information processing device according to (1), wherein
the sensor device has, as the light receiving sensor, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light, and has an imaging function of obtaining a captured image based on a light receiving signal of the visible light sensor, and a distance measuring function of obtaining a distance image based on the light receiving signal of the infrared light sensor, and
the designation reception processing unit performs a process of displaying an image based on the distance image as an image for accepting designation of the tracking target subject when the brightness of the captured image becomes equal to or lower than a predetermined brightness while an image captured by the sensor device is being displayed as the image for accepting designation of the tracking target subject.
(3)
The information processing device according to (1) or (2), wherein
the designation reception processing unit performs a process of displaying name information of the tracking target subject at a position corresponding to a detection frame of the tracking target subject superimposed on a display image of the sensing image in response to the tracking target subject being designated from among subjects appearing in a sensing image captured by the sensor device.
(4)
The information processing device according to any one of (1) to (3), wherein
the designation reception processing unit displays a sensing image captured by the sensor device on an operation screen on which an instruction operation as to whether or not to track the tracking target subject by at least the tracking camera can be performed, and,
in response to the tracking target subject being designated from among the subjects appearing in the sensing image, performs a process of displaying an image of the designated tracking target subject extracted from the sensing image in a tracking target display area that is provided on the operation screen and displays name information of the tracking target subject.
(5)
The information processing device according to any one of (1) to (4), wherein
the designation reception processing unit accepts a designation of a tracking composition of the subject by the tracking camera as a designation of a tracking image frame on a display image of a virtual simple model, which is a simple model that virtually represents the subject.
(6)
The information processing device according to (5), wherein
the designation reception processing unit accepts the designation of the tracking composition of the subject by the tracking camera as a designation of the position and the size of the tracking image frame with respect to the display image of the virtual simple model.
(7)
The information processing device according to (6), wherein
the sensor device has an infrared light sensor capable of receiving infrared light as the light receiving sensor, and has a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor,
the information processing device further comprises an image frame calculation unit that calculates a target image frame of the tracking camera for tracking the tracking target subject based on three-dimensional position information of a detection simple model, which is a simple model of the tracking target subject generated based on the distance image obtained by the sensor device,
the designation reception processing unit performs a process of storing information indicating a positional deviation of the image frame center of the tracking image frame with respect to the virtual simple model as model-relative image frame center positional deviation information, and
the image frame calculation unit calculates the target image frame so that the positional deviation indicated by the model-relative image frame center positional deviation information is reproduced as a positional deviation of the image frame center of the target image frame with respect to the detection simple model.
(8)
The information processing device according to (7), wherein
the designation reception processing unit calculates a vertical positional deviation amount, which indicates the amount of positional deviation in the vertical direction and is included in the model-relative image frame center positional deviation information, as a relative value on a first scaler that uses the vertical position of a predetermined part of the virtual simple model as a reference value and the length between predetermined parts spaced apart in the vertical direction in the virtual simple model as a reference unit, and
the image frame calculation unit sets the vertical center position of the target image frame to the vertical position specified by the vertical positional deviation amount on a second scaler that uses the vertical position of the predetermined part of the detection simple model as a reference value and the length between the predetermined parts spaced apart in the vertical direction in the detection simple model as a reference unit.
(9)
The information processing device according to any one of (1) to (8), wherein
the sensor device has, as the light receiving sensor, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light, and has an imaging function of obtaining a captured image of visible light based on a light receiving signal of the visible light sensor and a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor,
the information processing device further comprises an image frame calculation unit that calculates a target image frame of the tracking camera for tracking the tracking target subject based on three-dimensional position information of a detection simple model, which is a simple model of the tracking target subject generated based on the distance image obtained by the sensor device,
the designation reception processing unit performs a process of displaying, as a designation acceptance screen for the tracking target subjects in group tracking in which a plurality of subjects are the tracking target subjects of the tracking camera, a designation acceptance screen on which a plurality of virtual simple models, each being a simple model that virtually represents a subject, are arranged, and accepts designation of a part for each of the arranged virtual simple models, and
the image frame calculation unit, when the detection simple model of each subject that is a tracking target of the group tracking is obtained, calculates the target image frame so that, among the detected parts of each tracking target subject, the part designated on the designation acceptance screen is included within the target image frame.
(10)
The information processing device according to (1), further comprising a tracking sensitivity control unit that performs switching control of the tracking sensitivity of the tracking camera while the tracking camera is tracking the tracking target subject.
(11)
The information processing device according to any one of (1) to (10), further comprising a control unit capable of executing
a designated tracking composition control that controls the composition of the tracking camera so that the tracking composition designated by the user for the tracking target subject is obtained, and
a semi-automatic composition control that selects a composition of the tracking camera based on a composition selection table in which weight information is associated with each combination of an imaging target of the tracking camera and a composition type of the tracking camera, and controls the composition of the tracking camera so that the selected composition is obtained,
wherein the designation reception processing unit receives an instruction to execute the designated tracking composition control and an instruction to execute the semi-automatic composition control.
(12)
The information processing device according to any one of (1) to (11), wherein
the sensor device has an infrared light sensor capable of receiving infrared light as the light receiving sensor, and has a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor, and
the information processing device further comprises a calibration processing unit that performs, as a calibration process for identifying the same points appearing in both the distance image obtained by the sensor device and the captured image obtained by the tracking camera, a process of identifying the positions of identical parts between a sensor device side simple model, which is a simple model of the subject generated from the distance image obtained by the sensor device, and a camera side simple model, which is a simple model of the subject generated from the captured image obtained by the tracking camera.
(13)
The information processing device according to (12), wherein
a plurality of the sensor devices and a plurality of the tracking cameras are provided, and
the calibration processing unit executes, for each sensor device while switching the target sensor device, a unit sampling process of sampling the positions of the parts of the sensor device side simple model and the camera side simple model in a state where the composition of each tracking camera is adjusted so as to capture the distance measurement range of the one target sensor device.
(14)
The information processing device according to (12), wherein
a plurality of the sensor devices and a plurality of the tracking cameras are provided,
the tracking cameras include a first tracking camera arranged so as to be able to capture the imaging ranges of all the other tracking cameras and the distance measurement ranges of all the sensor devices, and a second tracking camera arranged on either the left or the right side as viewed from the first tracking camera and arranged so as to be able to capture the imaging range of the first tracking camera,
the sensor devices include a same-side sensor device arranged on the same side as the second tracking camera as viewed from the first tracking camera, and an opposite-side sensor device arranged on the opposite side to the second tracking camera as viewed from the first tracking camera, and
the calibration processing unit performs the calibration process between the second tracking camera and the same-side sensor device based on the result of identifying the same points between the second tracking camera and the first tracking camera and the result of the calibration process between the first tracking camera and the same-side sensor device.
(15)
The information processing device according to any one of (1) to (14), wherein the event to be imaged by the tracking camera is a live music event.
(16)
An information processing method in which an information processing device performs a process of allowing a user to designate a tracking target subject of a tracking camera capable of tracking a subject from among subjects appearing in a sensing image obtained by a sensor device that has a light receiving sensor and is separate from the tracking camera.
(17)
A recording medium on which a program readable by a computer device is recorded, the program causing the computer device to realize a designation reception processing function that allows a user to designate a tracking target subject of a tracking camera capable of tracking a subject from among subjects appearing in a sensing image obtained by a sensor device that has a light receiving sensor and is separate from the tracking camera.
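
The relative scaling described in (7) and (8) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the implementation of the present technology: the simple model is reduced to two vertically spaced parts, and the names SimpleModel, to_relative_offset and to_frame_center_y, as well as all numeric values, are hypothetical. The reference unit is kept signed so that "toward the second part" means the same direction in both coordinate systems, even when one has its vertical axis pointing down (screen pixels) and the other pointing up (meters).

```python
from dataclasses import dataclass


@dataclass
class SimpleModel:
    """Hypothetical simple model reduced to two vertically spaced parts."""
    ref_y: float    # vertical position of the reference part (reference value)
    other_y: float  # vertical position of the second part

    @property
    def unit(self) -> float:
        # Signed length between the two parts, used as the reference unit.
        length = self.other_y - self.ref_y
        if length == 0:
            raise ValueError("the two parts must be vertically separated")
        return length


def to_relative_offset(virtual: SimpleModel, frame_center_y: float) -> float:
    """First scaler: frame-center offset in units of the virtual model."""
    return (frame_center_y - virtual.ref_y) / virtual.unit


def to_frame_center_y(detected: SimpleModel, relative_offset: float) -> float:
    """Second scaler: reproduce the same relative offset on the detected model."""
    return detected.ref_y + relative_offset * detected.unit


# Virtual model on the setting screen (assumed pixel coordinates, y pointing down).
virtual = SimpleModel(ref_y=100.0, other_y=300.0)
offset = to_relative_offset(virtual, frame_center_y=150.0)

# Detected model in real space (assumed meters, y pointing up).
detected = SimpleModel(ref_y=1.5, other_y=0.9)
center_y = to_frame_center_y(detected, offset)
```

Because the stored amount is a ratio rather than an absolute distance, the same designation applies to subjects of different heights and at different distances from the sensor device.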

100 Image processing system
1 Information processing device
2 Parent camera
3, 3-1, 3-2, 3-3 Tracking camera
4, 4-1, 4-2, 4-3 Pan head
5 Switcher
6, 6-1, 6-2, 6-3, 6-4 Distance measurement/imaging device
11 CPU
19 Memory unit
20 Communication unit
Gm Operation screen
Gs Setting screen
Arc Camera name display area
Are Performance area
Art Target area
At Tracking target display area
t1, t2, t3, t4, t5, t6 Individual display area
ti Image display area
tn Name display area
Bm Execution instruction button
B0 Menu button
B1 Semi-auto ON/OFF button
It Tracking indicator
Acn Controller name display area
Bpd Pull-down button
B2 Common setting button
B3 Control setting button
Bs Start button
As1 Camera switching operation area
As2 Subject name setting area
As3 Controller setting area
bcm Camera name input box
bcn Controller name input box
bt1, bt2, b3, bt4, bt5, bt6 Subject name input box
Bp1, Bp2 Add button
Be1, Be2 Delete button
Bc Camera switching button
As4 Performance setting area
As5 Preset setting area
cb, cb1, cb2, cb3, cb4, cb5, cbs, cbt Check box
bpn Preset name input box
Fd Detection frame
Ims Captured image
Mbv Virtual simple model
Fs Image frame
Vc Vertical center line
J Intersection
Vrc Vertical position deviation amount
Hof Horizontal offset amount
Mbd Detection simple model
cba, cbb Check box
B9 OK button
B10 Cancel button
F1 Calibration section
F2 Designation acceptance section
F3 Operation acceptance section
F4 Composition selection section
F5 Composition switching control section
F6 Weight update section
F7 Image recognition processing section
F8 Image frame calculation section
F9 Coordinate calculation section
F10 Tripod head/camera control section
F11 Cutout image generation section
cbg Check box
30 Joystick

Claims

An information processing device comprising: a designation reception processing unit that performs processing to allow a user to designate a subject to be tracked by a tracking camera capable of tracking a subject from subjects appearing in a sensing image captured by a sensor device having a light receiving sensor separate from the tracking camera.
The information processing device according to claim 1, wherein
the sensor device has, as the light receiving sensor, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light, and has an imaging function of obtaining a captured image of visible light based on a light receiving signal of the visible light sensor and a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor, and
the designation reception processing unit performs a process of displaying an image based on the distance image as the image for accepting designation of the tracking target subject when, while the captured image obtained by the sensor device is being displayed as the image for accepting designation of the tracking target subject, the brightness of the captured image becomes equal to or lower than a predetermined brightness.
The information processing device according to claim 1, wherein
the designation reception processing unit performs a process of, in response to the tracking target subject being designated from among the subjects appearing in the sensing image obtained by the sensor device, displaying name information of the tracking target subject at a position corresponding to a detection frame of the tracking target subject that is superimposed on a display image of the sensing image.
The information processing device according to claim 1, wherein
the designation reception processing unit performs a process of displaying the sensing image obtained by the sensor device on an operation screen on which at least an instruction operation as to whether or not to execute tracking of the tracking target subject by the tracking camera can be performed, and
a process of, in response to the tracking target subject being designated from among the subjects appearing in the sensing image, displaying an image of the designated tracking target subject extracted from the sensing image in a tracking target display area that is provided on the operation screen and displays name information of the tracking target subject.
The information processing device according to claim 1, wherein
the designation reception processing unit accepts a designation of a tracking composition of the subject by the tracking camera as a designation of a tracking image frame on a display image of a virtual simple model, which is a simple model that virtually represents the subject.
The information processing device according to claim 5, wherein
the designation reception processing unit accepts the designation of the tracking composition of the subject by the tracking camera as a designation of the position and the size of the tracking image frame with respect to the display image of the virtual simple model.
The information processing device according to claim 6, wherein
the sensor device has an infrared light sensor capable of receiving infrared light as the light receiving sensor, and has a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor,
the information processing device further comprises an image frame calculation unit that calculates a target image frame of the tracking camera for tracking the tracking target subject based on three-dimensional position information of a detection simple model, which is a simple model of the tracking target subject generated based on the distance image obtained by the sensor device,
the designation reception processing unit performs a process of storing information indicating a positional deviation of the image frame center of the tracking image frame with respect to the virtual simple model as model-relative image frame center positional deviation information, and
the image frame calculation unit calculates the target image frame so that the positional deviation indicated by the model-relative image frame center positional deviation information is reproduced as a positional deviation of the image frame center of the target image frame with respect to the detection simple model.
The information processing device according to claim 7, wherein
the designation reception processing unit calculates a vertical positional deviation amount, which indicates the amount of positional deviation in the vertical direction and is included in the model-relative image frame center positional deviation information, as a relative value on a first scaler that uses the vertical position of a predetermined part of the virtual simple model as a reference value and the length between predetermined parts spaced apart in the vertical direction in the virtual simple model as a reference unit, and
the image frame calculation unit sets the vertical center position of the target image frame to the vertical position specified by the vertical positional deviation amount on a second scaler that uses the vertical position of the predetermined part of the detection simple model as a reference value and the length between the predetermined parts spaced apart in the vertical direction in the detection simple model as a reference unit.
The information processing device according to claim 1, wherein
the sensor device has, as the light receiving sensor, a visible light sensor capable of receiving visible light and an infrared light sensor capable of receiving infrared light, and has an imaging function of obtaining a captured image of visible light based on a light receiving signal of the visible light sensor and a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor,
the information processing device further comprises an image frame calculation unit that calculates a target image frame of the tracking camera for tracking the tracking target subject based on three-dimensional position information of a detection simple model, which is a simple model of the tracking target subject generated based on the distance image obtained by the sensor device,
the designation reception processing unit performs a process of displaying, as a designation acceptance screen for the tracking target subjects in group tracking in which a plurality of subjects are the tracking target subjects of the tracking camera, a designation acceptance screen on which a plurality of virtual simple models, each being a simple model that virtually represents a subject, are arranged, and accepts designation of a part for each of the arranged virtual simple models, and
the image frame calculation unit, when the detection simple model of each subject that is a tracking target of the group tracking is obtained, calculates the target image frame so that, among the detected parts of each tracking target subject, the part designated on the designation acceptance screen is included within the target image frame.
The information processing device according to claim 1, further comprising a tracking sensitivity control unit that controls switching of the tracking sensitivity of the tracking camera while the tracking camera is tracking the tracking target subject.
The information processing device according to claim 1, further comprising a control unit capable of executing
a designated tracking composition control that controls the composition of the tracking camera so that the tracking composition designated by the user for the tracking target subject is obtained, and
a semi-automatic composition control that selects a composition of the tracking camera based on a composition selection table in which weight information is associated with each combination of an imaging target of the tracking camera and a composition type of the tracking camera, and controls the composition of the tracking camera so that the selected composition is obtained,
wherein the designation reception processing unit receives an instruction to execute the designated tracking composition control and an instruction to execute the semi-automatic composition control.
The information processing device according to claim 1, wherein
the sensor device has an infrared light sensor capable of receiving infrared light as the light receiving sensor, and has a distance measuring function of obtaining a distance image based on a light receiving signal of the infrared light sensor, and
the information processing device further comprises a calibration processing unit that performs, as a calibration process for identifying the same points appearing in both the distance image obtained by the sensor device and the captured image obtained by the tracking camera, a process of identifying the positions of identical parts between a sensor device side simple model, which is a simple model of the subject generated from the distance image obtained by the sensor device, and a camera side simple model, which is a simple model of the subject generated from the captured image obtained by the tracking camera.
The information processing device according to claim 12, wherein
a plurality of the sensor devices and a plurality of the tracking cameras are provided, and
the calibration processing unit executes, for each sensor device while switching the target sensor device, a unit sampling process of sampling the positions of the parts of the sensor device side simple model and the camera side simple model in a state where the composition of each tracking camera is adjusted so as to capture the distance measurement range of the one target sensor device.
The information processing device according to claim 12, wherein
a plurality of the sensor devices and a plurality of the tracking cameras are provided,
the tracking cameras include a first tracking camera arranged so as to be able to capture the imaging ranges of all the other tracking cameras and the distance measurement ranges of all the sensor devices, and a second tracking camera arranged on either the left or the right side as viewed from the first tracking camera and arranged so as to be able to capture the imaging range of the first tracking camera,
the sensor devices include a same-side sensor device arranged on the same side as the second tracking camera as viewed from the first tracking camera, and an opposite-side sensor device arranged on the opposite side to the second tracking camera as viewed from the first tracking camera, and
the calibration processing unit performs the calibration process between the second tracking camera and the same-side sensor device based on the result of identifying the same points between the second tracking camera and the first tracking camera and the result of the calibration process between the first tracking camera and the same-side sensor device.
The information processing device according to claim 1, wherein the event to be imaged by the tracking camera is a live music event.
An information processing method in which an information processing device performs a process of allowing a user to designate a tracking target subject of a tracking camera capable of tracking a subject from among subjects appearing in a sensing image obtained by a sensor device that has a light receiving sensor and is separate from the tracking camera.
A recording medium on which a program readable by a computer device is recorded, the program causing the computer device to realize a designation reception processing function that allows a user to designate a tracking target subject of a tracking camera capable of tracking a subject from among subjects appearing in a sensing image obtained by a sensor device that has a light receiving sensor and is separate from the tracking camera.
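
The composition selection table recited for the semi-automatic composition control can be pictured with the following sketch. The claims state only that a composition is selected based on a table in which weight information is associated with each combination of imaging target and composition type. The weighted random draw used here is an assumption made for illustration, and the table contents and the function name select_composition are hypothetical.

```python
import random

# Hypothetical composition selection table:
# (imaging target, composition type) -> weight information.
COMPOSITION_TABLE = {
    ("subject A", "bust shot"): 5.0,
    ("subject A", "full shot"): 2.0,
    ("subject B", "waist shot"): 3.0,
}


def select_composition(table, rng):
    """Pick one (imaging target, composition type) pair by weighted random draw.

    The weighted draw is an assumed selection rule; combinations with a larger
    weight are selected more often, and a weight of 0 is never selected.
    """
    combos = list(table.keys())
    weights = [table[c] for c in combos]
    if not combos:
        raise ValueError("the composition selection table is empty")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    return rng.choices(combos, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch repeatable
target, composition_type = select_composition(COMPOSITION_TABLE, rng)
```

Under this assumption, updating the weight of a combination changes how often that combination is chosen without changing the selection logic itself.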