WO2007145331A1 - Camera control apparatus, camera control method, camera control program, and recording medium - Google Patents

Camera control apparatus, camera control method, camera control program, and recording medium

Info

Publication number
WO2007145331A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
candidate
passenger
information
input
Prior art date
Application number
PCT/JP2007/062151
Other languages
French (fr)
Japanese (ja)
Inventor
Hiroaki Shibasaki
Original Assignee
Pioneer Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pioneer Corporation filed Critical Pioneer Corporation
Priority to JP2008521277A (patent JP4660592B2)
Publication of WO2007145331A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • CAMERA CONTROL DEVICE, CAMERA CONTROL METHOD, CAMERA CONTROL PROGRAM, AND RECORDING MEDIUM
  • The present invention relates to a camera control device, a camera control method, a camera control program, and a recording medium that control the shooting direction of a camera.
  • The use of the present invention is not limited to the above-described camera control device, camera control method, camera control program, and recording medium.
  • Background art: Patent Document 1, Japanese Patent Application Laid-Open No. 2000-244885.
  • A camera control device according to the invention of claim 1 includes: acquisition means for acquiring audio around a camera; discrimination means for discriminating, from the audio acquired by the acquisition means, a word or phrase that identifies an object to be photographed by the camera (hereinafter referred to as a "specific phrase"); and control means for controlling the shooting direction of the camera based on the specific phrase discriminated by the discrimination means.
  • A camera control method according to the invention of claim 7 includes: an acquisition step of acquiring audio around the camera; a discrimination step of discriminating, from the audio acquired in the acquisition step, a specific phrase that identifies the subject to be photographed by the camera; and a control step of controlling the shooting direction of the camera based on the specific phrase discriminated in the discrimination step.
  • A camera control program according to the invention of claim 8 causes a computer to execute the camera control method according to claim 7.
  • A recording medium according to the invention of claim 9 is a computer-readable recording medium on which the camera control program of claim 8 is recorded.
  • FIG. 1 is a block diagram showing a functional configuration of a camera control device.
  • FIG. 2 is a flowchart showing a procedure of camera control processing by the camera control device.
  • FIG. 3 is an explanatory view showing the vicinity of a dashboard of a vehicle in which a navigation device is installed.
  • FIG. 4 is a block diagram showing a hardware configuration of the navigation device.
  • FIG. 5 is an explanatory diagram schematically showing the contents of a database created by the navigation device.
  • FIG. 6 is a flowchart showing a procedure of in-vehicle shooting processing of the navigation device.
  • FIG. 7 is a flowchart showing a procedure of in-vehicle shooting processing of the navigation device.
  • FIG. 8 is an explanatory diagram showing an example of an input screen for planned passenger information.
  • First, the functional configuration of the camera control device 100 according to the embodiment is described. FIG. 1 is a block diagram showing the functional configuration of the camera control device.
  • The camera control device 100 is configured by an acquisition unit 101, a determination unit 102, a control unit 103, an input unit 104, and a detection unit 105.
  • The acquisition unit 101 acquires audio around the camera 110.
  • The sound around the camera 110 is, for example, an utterance made by a person located around the camera 110.
  • The acquisition unit 101 acquires the sound around the camera 110 using, for example, a microphone.
  • The determination unit 102 determines, from the sound acquired by the acquisition unit 101, a phrase that identifies the subject to be photographed by the camera 110 (hereinafter referred to as a specific phrase).
  • The determination unit 102 determines as the specific phrase, for example, information on a shooting target candidate that is input to the input unit 104 described later.
  • The control unit 103 controls the shooting direction of the camera 110 based on the specific phrase determined by the determination unit 102. For example, when the determination unit 102 determines a word or phrase that substantially matches information on a shooting target candidate, the control unit 103 turns the shooting direction of the camera 110 toward the position of that candidate as detected by the detection unit 105 described later. A substantially matching phrase is a phrase that is the same as, or similar to, a phrase input as information about a candidate to be photographed.
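  • As one concrete illustration of this matching-and-control flow, the following is a minimal Python sketch (not part of the patent text); the Candidate type, the substring matching, and the pan_to() command are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str            # name information registered via the input unit 104
    nickname: str        # nickname information
    position_deg: float  # position detected by the detection unit 105

def pan_to(angle_deg: float) -> None:
    """Hypothetical actuator command that turns the camera 110."""
    print(f"panning camera to {angle_deg:+.0f} degrees")

def find_target(phrase: str, candidates: list) -> Optional[Candidate]:
    """Return the candidate whose registered name or nickname appears in the phrase."""
    lowered = phrase.lower()
    for c in candidates:
        if c.name.lower() in lowered or c.nickname.lower() in lowered:
            return c
    return None

def control_camera(phrase: str, candidates: list) -> None:
    """If the phrase substantially matches a candidate, turn the camera toward it."""
    target = find_target(phrase, candidates)
    if target is not None:
        pan_to(target.position_deg)

people = [Candidate("Taro", "Ta-chan", -45.0), Candidate("Hanako", "Hana", 45.0)]
control_camera("Look at Taro", people)  # pans to -45 degrees
```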
  • The input unit 104 receives input of information on shooting target candidates.
  • The information on a shooting target candidate includes, for example, name information (name, nickname, etc.) and attribute information of the candidate.
  • The input unit 104 may also receive an image or voice of a shooting target candidate.
  • The detection unit 105 detects the position of each shooting target candidate.
  • The detection unit 105, for example, collates the image of a candidate input to the input unit 104 with the video shot by the camera 110 to detect the position of the candidate.
  • Here, when the camera 110 is installed in a vehicle, for example, the position of the shooting target is the position of the seat in which the target is seated, and the detection unit 105 detects which seat each passenger is seated in.
  • The position of the shooting target may also be a relative direction or bearing from the camera 110, or the like. Specifically, for example, when an object whose similarity to a candidate's image is equal to or greater than a predetermined value appears in the video shot by the camera 110, the detection unit 105 sets the position of that object as the position of the candidate.
  • The detection unit 105 may also detect the position of a shooting target candidate by, for example, collating the voice of the candidate input to the input unit 104 with the sound acquired by the acquisition unit 101. Specifically, for example, when the acquired sound includes a voice whose similarity to the candidate's voice is equal to or greater than a predetermined value, the position from which that voice was uttered is set as the position of the candidate.
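  • A rough sketch of this threshold test follows, under assumed interfaces: similarity() is a placeholder (cosine similarity over feature vectors) standing in for a real face or voiceprint comparison, and 0.8 stands in for the "predetermined value".

```python
SIMILARITY_THRESHOLD = 0.8  # stands in for the "predetermined value"

def similarity(a: list, b: list) -> float:
    """Placeholder metric: cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_position(registered_features: list, per_seat_features: dict):
    """Return the seat whose captured features best match the candidate,
    or None if no seat reaches the threshold."""
    best_seat, best_score = None, SIMILARITY_THRESHOLD
    for seat, features in per_seat_features.items():
        score = similarity(registered_features, features)
        if score >= best_score:
            best_seat, best_score = seat, score
    return best_seat

seats = {"driver_seat": [0.9, 0.1, 0.2], "passenger_seat": [0.1, 0.8, 0.5]}
print(detect_position([0.88, 0.12, 0.2], seats))  # -> driver_seat
```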
  • The camera 110 may be installed inside a vehicle.
  • In this case, for example, the acquisition unit 101 acquires the utterances of the vehicle's passengers, the determination unit 102 determines information on the passenger to be photographed as the specific phrase, and the control unit 103 turns the shooting direction of the camera 110 toward the boarding position of the passenger specified by that phrase. Information on the passengers is input to the input unit 104, and the detection unit 105 detects which seat each passenger is seated in.
  • Next, the control processing of the camera 110 by the camera control device 100 is described. FIG. 2 is a flowchart showing a procedure of camera control processing by the camera control device.
  • In the flowchart of FIG. 2, first, information on shooting target candidates is input to the input unit 104 (step S201). The detection unit 105 then detects the position of each candidate (step S202).
  • Next, the acquisition unit 101 acquires the sound around the camera 110 (step S203).
  • Subsequently, the determination unit 102 determines a specific phrase from the sound acquired in step S203 (step S204).
  • The specific phrase at this point is information on a shooting target candidate.
  • Thereby, the shooting target is specified from among the candidates.
  • Then, the control unit 103 controls the shooting direction of the camera 110 toward the position of the shooting target detected in step S202 (step S205), and the processing of this flowchart ends.
  • As described above, according to the camera control device 100, the shooting target is specified from the sound around the camera, and the shooting direction of the camera is controlled toward that target. This makes it possible to change the shooting direction of the camera 110 in consideration of the content of the audio around the camera.
  • In addition, since the camera control device 100 specifies the shooting target using information on candidates input in advance as the specific phrases, the target can be specified with higher accuracy. Furthermore, since the position of the target is detected from the candidate's image and voice, the shooting direction of the camera 110 can be controlled with higher accuracy.
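  • Putting steps S201 to S205 together, a compact sketch of the overall control sequence follows; every function body here is a stub invented for illustration, not the patent's implementation.

```python
def input_candidate_info() -> list:
    # S201: information on shooting-target candidates is registered
    return ["Taro", "Hanako"]

def detect_positions(names: list) -> dict:
    # S202: each candidate's position is detected (angles are made up)
    return dict(zip(names, (-45.0, 45.0)))

def acquire_audio() -> str:
    # S203: sound around the camera is acquired (stubbed as recognized text)
    return "look at Hanako"

def discriminate_phrase(audio_text: str, names: list):
    # S204: a specific phrase (candidate information) is discriminated
    return next((n for n in names if n.lower() in audio_text.lower()), None)

def camera_control_process() -> None:
    candidates = input_candidate_info()
    positions = detect_positions(candidates)
    phrase = discriminate_phrase(acquire_audio(), candidates)
    if phrase is not None:
        # S205: the shooting direction is controlled toward the target
        print(f"pan camera to {positions[phrase]:+.0f} degrees for {phrase}")

camera_control_process()  # -> pan camera to +45 degrees for Hanako
```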
  • The following example applies the camera control device 100 to a navigation device 300 mounted on a vehicle. FIG. 3 is an explanatory diagram showing the vicinity of the dashboard of the vehicle in which the navigation device is installed.
  • The navigation device 300 is installed on the dashboard of the vehicle.
  • The navigation device 300 includes a main body M and a display unit (display) D.
  • The display unit D displays the current position of the vehicle, map information, the current time, and the like.
  • The navigation device 300 is connected to an in-vehicle camera 311 installed near the rearview mirror and an in-vehicle microphone 312 installed on the sun visor.
  • The in-vehicle camera 311 can change its shooting direction and photographs the area in front of the vehicle and the parts of the interior (passengers, etc.).
  • Hereinafter, the in-vehicle camera 311 is assumed to be capable of capturing both moving images and still images, but it may be a camera capable of capturing only still images.
  • The in-vehicle microphone 312 receives sound inside the vehicle and is used for operating the navigation device 300 by voice input and for recording the state inside the vehicle. The position of the in-vehicle microphone 312 is not limited to the sun visor, as long as it is a position where in-vehicle sound can be input efficiently. A plurality of in-vehicle cameras 311 and in-vehicle microphones 312 may be installed in the vehicle, and they may be movable rather than fixed. In this example, an in-vehicle microphone 312 is assumed to be provided at each passenger's seat.
  • In addition to searching for routes to a destination and recording information, the navigation device 300 has an in-vehicle shooting function for recording the state inside the vehicle during a drive.
  • The in-vehicle shooting function records in-vehicle video and audio with the in-vehicle camera 311 and the in-vehicle microphone 312.
  • The video and audio recorded by the in-vehicle shooting function are recorded on a recording medium of the navigation device 300 (the magnetic disk 405 and optical disk 407 described later). The recorded video and audio may also be recorded on an external recording medium so that they can be enjoyed on a home TV or the like.
  • FIG. 4 is a block diagram showing the hardware configuration of the navigation device.
  • In FIG. 4, the navigation device 300 includes a CPU 401, a ROM 402, a RAM (memory) 403, a magnetic disk drive 404, a magnetic disk 405, an optical disk drive 406, an optical disk 407, an audio I/F (interface) 408, a microphone 409, a speaker 410, an input device 411, a video I/F 412, a camera 413, a display 414, a communication I/F 415, a GPS unit 416, various sensors 417, and an external connection I/F 418.
  • The components 401-418 are connected to one another by a bus 420.
  • The CPU 401 governs overall control of the navigation device 300.
  • The ROM 402 records programs such as a boot program, a communication program, a database creation program, and a data analysis program.
  • The RAM 403 is used as a work area of the CPU 401.
  • The magnetic disk drive 404 controls reading and writing of data to the magnetic disk 405 under the control of the CPU 401.
  • The magnetic disk 405 records data written under the control of the magnetic disk drive 404.
  • As the magnetic disk 405, for example, an HD (hard disk) or an FD (flexible disk) can be used.
  • The optical disk drive 406 controls reading and writing of data to the optical disk 407 under the control of the CPU 401.
  • The optical disk 407 is a removable recording medium from which data is read under the control of the optical disk drive 406.
  • A writable recording medium can also be used as the optical disk 407.
  • Besides the optical disk 407, the removable recording medium may be an MO, a memory card, or the like.
  • An example of the information recorded on the magnetic disk 405 or the optical disk 407 is map data used for route search and route guidance.
  • The map data includes background data representing features such as buildings, rivers, and the ground surface, and road shape data representing the shape of roads, and is drawn in two or three dimensions on the display screen of the display 414.
  • When the navigation device 300 is guiding a route, the map data and the current position of the host vehicle acquired by the GPS unit 416 described later are displayed overlaid on each other.
  • The audio I/F 408 is connected to a microphone 409 for audio input (for example, the in-vehicle microphone 312 in FIG. 3) and a speaker 410 for audio output.
  • The sound received by the microphone 409 is A/D converted in the audio I/F 408.
  • Sound is output from the speaker 410. The sound input from the microphone 409 can be recorded on the magnetic disk 405 or the optical disk 407 as audio data.
  • Examples of the input device 411 include a remote controller, a keyboard, a mouse, and a touch panel provided with a plurality of keys for inputting characters, numerical values, and various instructions. Furthermore, the input device 411 can be connected to other information processing terminals, such as a digital camera or a mobile phone, to input and output data.
  • The video I/F 412 is connected to a camera 413 for video input (for example, the in-vehicle camera 311 in FIG. 3) and a display 414 for video output.
  • Specifically, the video I/F 412 includes, for example, a graphics controller that controls the entire display 414, a buffer memory such as a VRAM (Video RAM) that temporarily records image information that can be displayed immediately, and a control IC that controls the display 414 based on image data output from the graphics controller.
  • The camera 413 captures images (including moving images) inside and outside the vehicle and outputs them as image data.
  • Images taken by the camera 413 can be recorded on the magnetic disk 405 or the optical disk 407 as image data.
  • In addition to being output on the display 414, this image data can be recorded on a recording medium or transmitted via a network for use by other information processing terminals.
  • The display 414 displays icons, cursors, menus, windows, and various data such as characters and images.
  • As the display 414, for example, a CRT, a TFT liquid crystal display, or a plasma display can be adopted.
  • The communication I/F 415 is wirelessly connected to a communication network such as the Internet and functions as an interface between the communication network and the CPU 401.
  • Communication networks include LANs, WANs, public line networks, and mobile phone networks.
  • The GPS unit 416 receives radio waves from GPS satellites and outputs information indicating the current position of the vehicle (the current position of the navigation device 300).
  • The output information of the GPS unit 416 is used together with the output values of the various sensors 417 described later when the CPU 401 calculates the current position of the vehicle.
  • The information indicating the current position is information that specifies one point on the map data, for example, latitude, longitude, and altitude.
  • The various sensors 417, such as a vehicle speed sensor, an acceleration sensor, and an angular velocity sensor, output information from which the position and behavior of the vehicle can be determined.
  • The output values of the various sensors 417 are used by the CPU 401 to calculate the current position and to measure changes in speed and direction.
  • The external connection I/F 418 is an interface for connecting to external devices such as an audio device or the vehicle's air conditioner.
  • The external connection I/F 418 includes, for example, a dedicated connection cable port and an infrared communication port.
  • The functions of the units shown in FIG. 1 are realized by this hardware: the acquisition unit 101 by the audio I/F 408 and the microphone 409; the determination unit 102, the detection unit 105, and the control unit 103 by the CPU 401; and the input unit 104 by the input device 411 and the video I/F 412.
  • The in-vehicle camera 311 and the in-vehicle microphone 312 are provided in the vehicle in which the navigation device 300 is installed, and can record the state inside the vehicle while driving.
  • However, simply shooting the interior of the vehicle produces video that is unattractive as content, and opportunities to view the recorded video are reduced.
  • For this reason, the navigation device 300 identifies the shooting target based on the content of passengers' utterances and changes the shooting direction of the in-vehicle camera 311 accordingly. Specifically, when a phrase corresponding to a specific passenger is included in a speaker's utterance, the in-vehicle camera 311 is pointed at that passenger for shooting. This makes it possible to shoot the state inside the vehicle while focusing on the passenger at the center of the topic, improving the appeal of the recorded video as content.
  • In the navigation device 300, in order to identify the passenger to be photographed, information on the passengers is registered in advance, and the in-vehicle camera 311 is controlled based on the registered information.
  • Specifically, the navigation device 300 creates a storage-type database (planned passenger database) in which information on persons who may board or plan to board the vehicle (hereinafter referred to as "planned passengers") is accumulated, and an update-type database (current passenger database) that is updated each time a passenger gets on or off the vehicle.
  • FIG. 5 is an explanatory diagram schematically showing the contents of a database created by the navigation device.
  • In the planned passenger database 510, text information 511, such as name information 521, nickname information 522, attribute information 523, and keywords 524 corresponding to each planned passenger, is registered together with face image data 512 and voiceprint data 513 of the planned passenger.
  • When a passenger boards, the navigation device 300 identifies the passenger from the passenger's image or voiceprint, and stores the information corresponding to that passenger in the planned passenger database 510 into the current passenger database 530 as text information 541.
  • The text information 541 includes name information 551, nickname information 552, attribute information 553, keywords 554 corresponding to the passenger, and the like.
  • The boarding position information 542 is information on the boarding position of each passenger who has boarded this time, such as "passenger seat" or "right rear seat".
  • In addition, a control table 560 for the in-vehicle camera 311 corresponding to the position of each seat is recorded in the ROM 402 of the navigation device 300.
  • The control table 560 defines the amount by which the shooting direction of the in-vehicle camera 311 is changed when shooting each seat. Specifically, for example, with the reference shooting direction (the 0° direction) perpendicular to the rear window, the driver's seat is 45° to the left of the reference shooting direction, the passenger seat is 45° to the right, the right rear seat is 30° to the left, and the left rear seat is 30° to the right.
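  • The control table lends itself to a simple lookup structure. The sketch below uses the angles from the example above; the sign convention (negative = left of the reference direction) is an assumption made for this illustration.

```python
# Change of shooting direction, in degrees, from the reference direction
# (perpendicular to the rear window); negative = left, positive = right.
CONTROL_TABLE = {
    "driver_seat":     -45.0,
    "passenger_seat":  +45.0,
    "right_rear_seat": -30.0,
    "left_rear_seat":  +30.0,
}

def shooting_direction(seat: str) -> float:
    """Look up how far to turn the in-vehicle camera 311 for a given seat."""
    return CONTROL_TABLE[seat]

print(shooting_direction("right_rear_seat"))  # -30.0
```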
  • Using these databases and the control table, the navigation device 300 controls the shooting direction of the in-vehicle camera 311 as described below.
  • FIG. 6 and FIG. 7 are flowcharts showing the procedure of the in-vehicle shooting process of the navigation device.
  • In the flowchart of FIG. 6, the navigation device 300 first has the persons scheduled to board the vehicle in which the device is installed input planned passenger information (step S601).
  • The planned passenger information corresponds to the text information 511 in FIG. 5 and includes the planned passenger's name, nickname, attributes, and keywords.
  • FIG. 8 is an explanatory diagram showing an example of an input screen for planned passenger information.
  • FIG. 8 shows an example in which a planned passenger is registered using the input device 411 of the navigation device 300.
  • A new registration screen 800 for a planned passenger is displayed. On the new registration screen 800, a name input section 811 for entering the planned passenger's name, a nickname input section 812 for entering the nickname, an attribute input section 813 for entering attributes, and a keyword input section 814 for entering keywords are displayed.
  • A planned passenger registering for the first time inputs his or her information in these input sections, then presses the shooting button 821 to shoot his or her face image with the in-vehicle camera 311, and presses the sound collection button 822 to pick up his or her voice with the in-vehicle microphone 312. Only one of face image shooting and sound collection may be performed.
  • The registered planned passenger is not limited to a human and may be an animal such as a dog or a cat; in that case, a person performs the registration in the planned passenger database 510 on its behalf.
  • Here, the new registration screen is displayed and the planned passenger information is input as text, but the planned passenger information may instead be input by voice through spoken dialogue.
  • In that case, voice data can be acquired at the same time.
  • The face image of a planned passenger may also be taken automatically, triggered by the opening and closing of a vehicle door.
  • Next, the navigation device 300 stores the planned passenger information input in step S601 in the planned passenger database 510 (step S602). Specifically, text information 511 such as the planned passenger's name, nickname, and attributes is associated with the planned passenger's face image data 512 and voiceprint data 513 extracted from the voice (either one may be used) and stored in the planned passenger database 510.
  • The navigation device 300 then waits until a passenger boards the vehicle (step S603: No loop). Whether a passenger has boarded is determined based on, for example, whether the engine has been started or whether a door has been opened and closed.
  • When a passenger boards, the in-vehicle camera 311 captures an image of the entire interior of the vehicle, and the in-vehicle microphone 312 collects the voice of the passenger seated in each seat (step S604).
  • At this time, the shooting direction of the in-vehicle camera 311 is controlled so as to cover the range where the faces of the passengers in all seats are located.
  • The passengers are instructed to speak into the in-vehicle microphones 312 provided at their seats. Only one of image shooting and sound collection may be performed.
  • Next, the navigation device 300 extracts the face image and voiceprint of each current passenger from the images and sounds captured and collected in step S604 (step S605). Then, one of the face images and voiceprints extracted in step S605 is collated with the planned passenger database 510 (step S606) to check for similar face image data 512 or voiceprint data 513.
  • In step S607, the feature points of the current passenger's face image and voiceprint are compared with the feature points of the face image data 512 and voiceprint data 513 stored in the planned passenger database 510, and it is determined whether face image data 512 or voiceprint data 513 with a similarity equal to or higher than a predetermined value is stored.
  • If similar face image data 512 or voiceprint data 513 is stored (step S607: Yes), the text information 511 of the corresponding planned passenger is stored in the current passenger database 530 as text information 541, together with the passenger's boarding position information 542 (step S608).
  • The text information 511 stored in the planned passenger database 510 is retained as it is.
  • On the other hand, if no similar face image data 512 or voiceprint data 513 is stored (step S607: No), the passenger is asked to input passenger information (step S609). The input information is stored in the planned passenger database 510 as planned passenger information (step S610). The passenger information input in step S609 is also stored as text information 541 in the current passenger database 530, together with the passenger's boarding position information 542 (step S611).
  • Until all face images and voiceprints have been collated (step S612: No), the process returns to step S606 and the subsequent steps are repeated. When all face images and voiceprints have been collated (step S612: Yes), the process proceeds to step S613 in FIG. 7. Through the processing up to this point, the planned passenger database 510 and the current passenger database 530 are created.
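  • The database flow of steps S606 to S611 can be pictured as follows; the similarity() placeholder and the 0.8 threshold are assumptions, and real face-image or voiceprint matching would replace them.

```python
PLANNED_DB = [  # planned passenger database 510 (features are toy vectors)
    {"text_info": {"name": "Taro", "nickname": "Ta-chan"},
     "features": [1.0, 0.0, 0.0]},
]

def similarity(a: list, b: list) -> float:
    """Placeholder: fraction of near-matching components."""
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) < 0.1)
    return hits / max(len(a), len(b), 1)

THRESHOLD = 0.8  # the "predetermined value" of step S607; arbitrary here

def build_current_db(extracted: dict) -> list:
    """extracted maps a boarding position to the features captured there."""
    current_db = []  # current passenger database 530
    for seat, features in extracted.items():
        match = next((p for p in PLANNED_DB
                      if similarity(features, p["features"]) >= THRESHOLD), None)
        if match is not None:                      # S607: Yes
            entry = dict(match["text_info"])       # S608: copy as text info 541
        else:                                      # S607: No -> S609/S610 would
            entry = {"name": "unregistered"}       # prompt the passenger instead
        entry["boarding_position"] = seat          # boarding position info 542
        current_db.append(entry)
    return current_db

print(build_current_db({"passenger_seat": [1.0, 0.05, 0.0]}))
```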
  • Since the seating arrangement of the passengers may change during a trip, the current passenger database 530 may be updated at predetermined timings during travel, for example, when a door is opened or closed after the vehicle stops.
  • In this case, the navigation device 300 performs the processing from step S604 onward again to update the current passenger database 530.
  • Next, the navigation device 300 monitors the voice inside the vehicle (step S613) and determines whether a passenger's utterance includes a word or phrase (specific phrase) contained in the text information 541 stored in the current passenger database 530 (step S614).
  • Voice monitoring is performed using commonly used speech recognition technology. Specifically, the voice picked up by the in-vehicle microphone 312 is converted into text, and it is determined whether a phrase contained in the text information 541 is included in the passenger's utterance.
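  • A minimal sketch of this monitoring step follows: transcribe() is a placeholder for whatever off-the-shelf speech recognizer converts the microphone input to text, and the phrase check is a plain substring search.

```python
def transcribe(audio_chunk: bytes) -> str:
    """Placeholder for a real speech-to-text call on in-vehicle microphone input."""
    return "hey, look at Ta-chan in the back"

def find_specific_phrase(text: str, registered_phrases: list):
    """Return the first registered phrase (from text information 541) found in
    the utterance, or None (step S614)."""
    lowered = text.lower()
    return next((p for p in registered_phrases if p.lower() in lowered), None)

registered = ["Taro", "Ta-chan", "Hanako"]    # words from text information 541
utterance = transcribe(b"\x00\x01")           # step S613: monitor the voice
hit = find_specific_phrase(utterance, registered)
if hit is not None:
    print(f"specific phrase detected: {hit}")  # would proceed to step S616
```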
  • When a phrase contained in the text information 541 is uttered (step S614: Yes), the navigation device 300 determines whether there is a priority camera operation (step S616).
  • A priority camera operation is a process that should take priority over passenger shooting among the processes that use the in-vehicle camera 311, such as a drive recorder function or passenger recognition processing.
  • For example, when the possibility of an accident is higher than normal, the drive recorder function is given priority.
  • The conditions for priority camera operation may be determined in advance or set by the user. Until this point, the in-vehicle camera 311 may be in a stopped (powered-off) state.
  • If there is a priority camera operation (step S616: Yes), the process waits until that camera operation ends. If there is no priority camera operation (step S616: No), the navigation device 300 refers to the control table 560 recorded in the ROM 402, turns the shooting direction of the in-vehicle camera 311 toward the boarding position of the passenger corresponding to the uttered phrase (step S617), and photographs that passenger (step S618). On the other hand, when no phrase contained in the text information 541 is uttered (step S614: No), the process returns to step S613 and voice monitoring continues.
  • The passenger need not always be photographed whenever a phrase contained in the text information 541 is uttered; shooting may be performed only when a specific keyword (shooting instruction keyword) is uttered in addition to such a phrase.
  • Shooting instruction keywords include words that directly instruct shooting, such as "shoot" and "take a picture", as well as words that instruct it indirectly, such as "Look at Mr. OO" and "Turn toward Mr. OO".
  • Shooting instruction keywords may be determined in advance or registered by the user.
  • Shooting may also be started whenever an utterance including a shooting instruction keyword is made.
  • In that case, after a predetermined time, the shooting direction of the in-vehicle camera 311 may be returned to its initial state as a timeout.
  • For example, shooting starts when an utterance including a shooting instruction keyword is made, and if a phrase contained in the text information 541 is uttered during shooting, the shooting direction may be turned toward the passenger corresponding to the uttered phrase. Furthermore, when an utterance including a shooting instruction keyword is made, the shooting direction may first be turned toward the speaker, and then, when a phrase contained in the text information 541 is uttered, turned toward the corresponding passenger to start shooting.
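  • One possible reading of this keyword-triggered variant, sketched as a small state machine; the keyword list and the 30-second timeout are illustrative values, not from the patent.

```python
import time

INSTRUCTION_KEYWORDS = ["take a picture", "shoot", "look at"]  # illustrative
TIMEOUT_SECONDS = 30.0                                         # illustrative

class ShootingController:
    def __init__(self) -> None:
        self.shooting = False
        self.started_at = 0.0

    def on_utterance(self, text: str, registered_names: list) -> None:
        lowered = text.lower()
        if any(k in lowered for k in INSTRUCTION_KEYWORDS):
            self.shooting = True              # instruction keyword starts shooting
            self.started_at = time.monotonic()
        if self.shooting:
            for name in registered_names:
                if name.lower() in lowered:   # a registered phrase retargets
                    print(f"turn camera toward {name}")

    def tick(self) -> None:
        if self.shooting and time.monotonic() - self.started_at > TIMEOUT_SECONDS:
            self.shooting = False             # timeout: return to initial state
            print("timeout: camera returned to initial direction")

ctrl = ShootingController()
ctrl.on_utterance("take a picture of Hanako", ["Taro", "Hanako"])
```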
  • The number of passengers to be photographed is not limited to one.
  • When a phrase corresponding to more than one passenger is uttered, the corresponding passengers are photographed sequentially.
  • Likewise, when phrases corresponding to different passengers are uttered, the passengers corresponding to each phrase are photographed sequentially.
  • In addition, a shooting priority order may be determined for the passengers.
  • When instructed to end shooting (step S619: Yes), or when a predetermined time has elapsed since shooting started (step S620: Yes), the navigation device 300 ends the shooting (step S621).
  • The instruction to end shooting is given, for example, by uttering words that directly indicate the end of shooting, such as "stop" and "end", by uttering other specific keywords, or by button operation. If there is no instruction to end shooting (step S619: No) and the predetermined time has not elapsed since shooting started (step S620: No), the process returns to step S618 and shooting of the passenger continues.
  • After shooting ends in step S621, the shooting direction of the in-vehicle camera 311 is returned to its original position (step S622). Until the vehicle finishes traveling (step S623: No), the process returns to step S613 and the subsequent processing continues. When the vehicle finishes traveling (step S623: Yes), the current passenger database is deleted (step S624), and the processing of this flowchart ends.
  • In the above description, the planned passenger information is stored in the planned passenger database 510.
  • However, it is also possible not to create the planned passenger database 510 and instead to generate only the current passenger database 530 from information input at boarding.
  • In this case, a registration screen like the one shown in FIG. 8 is displayed for each trip, and all passengers on board at that time input their passenger information, which is stored in the current passenger database 530 together with their boarding position information. Since the current passenger database 530 is deleted each time a trip ends, passengers must input their information every time they board, but the navigation device 300 does not need to retain passenger information.
  • Although this example photographs the state inside a vehicle, the same approach can be applied to, for example, a video conference system.
  • In that case, the passenger in this example corresponds to a meeting attendee, and the boarding position corresponds to the seating position in the meeting room.
  • As described above, according to the navigation device 300, the shooting target is specified from the sound inside the vehicle, and the shooting direction of the in-vehicle camera 311 is controlled toward that target. Accordingly, the shooting direction of the in-vehicle camera 311 can be changed in consideration of the content of the passengers' utterances. For example, by calling out to a child sitting in the back seat, the shooting direction of the in-vehicle camera 311 can be turned toward the child and the video output on the display 414, making it possible to check on the child in the rear seat without looking back while driving.
  • In addition, the navigation device 300 extracts the position of each passenger from the video captured by the in-vehicle camera 311 and the sound collected by the in-vehicle microphone 312.
  • Thereby, the boarding position of each passenger can be specified without requiring the passenger to input it.
  • Furthermore, since the passenger information is stored in the planned passenger database, shooting can be performed without complicated operations and without passengers having to input their information each time they board.
  • The camera control method described in this example can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation.
  • This program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer.
  • The program may also be a transmission medium that can be distributed through a network such as the Internet.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)
  • Navigation (AREA)
  • Accessories Of Cameras (AREA)

Abstract

A camera control apparatus (100) comprises an acquiring section (101), a judging section (102), a control section (103), an input section (104), and a detecting section (105). The acquiring section (101) acquires sound around a camera (110). The judging section (102) judges, from the sound acquired by the acquiring section (101), a word or phrase (hereinafter referred to as a specific word or phrase) for specifying a subject to be captured by the camera (110). The control section (103) controls the direction in which the camera (110) captures the subject according to the specific word or phrase judged by the judging section (102). The input section (104) receives information on candidates for the subject. The detecting section (105) detects the positions of the candidates for the subject.

Description

Specification

CAMERA CONTROL DEVICE, CAMERA CONTROL METHOD, CAMERA CONTROL PROGRAM, AND RECORDING MEDIUM

Technical Field

[0001] The present invention relates to a camera control device, a camera control method, a camera control program, and a recording medium that control the shooting direction of a camera. However, the use of the present invention is not limited to the above-described camera control device, camera control method, camera control program, and recording medium.

Background Art

[0002] Conventionally, some video conference systems using the Internet or telephone lines point a camera in the direction from which speech is emitted and photograph the speaker, so that the conference can proceed without a sense of incongruity. In such a video conference system, for example, when the direction of sound generation is not within the current angle of view of the camera, it is determined whether that direction can be brought into the angle of view by changing the camera direction with a pan head. If it can, the pan head is driven so that the direction enters the angle of view and the image is displayed; if it cannot, the angle of view is widened and the pan head is driven so that the direction enters the angle of view, and the image is displayed (for example, see Patent Document 1 below).

[0003] Patent Document 1: Japanese Patent Application Laid-Open No. 2000-244885

Disclosure of the Invention

Problems to be Solved by the Invention

[0004] However, according to the conventional technology described above, one problem is that shooting cannot be performed in consideration of the content of a speaker's utterance. In the conventional technology, the camera photographs the speaker who is emitting the voice, but depending on the content of the utterance, it is not always appropriate to make the speaker the shooting target. For example, when a person related to the content of the utterance is within the shootable range of the camera, it may be preferable to photograph that person.

[0005] Further, according to the conventional technology described above, another problem is that when someone other than the speaker is to be photographed, the shooting direction of the camera must be changed by manual operation by an operator. In this case, an operator to operate the camera becomes necessary, and shooting becomes cumbersome.

Means for Solving the Problems

[0006] In order to solve the above problems and achieve the object, the camera control device according to the invention of claim 1 includes: acquisition means for acquiring audio around a camera; discrimination means for discriminating, from the audio acquired by the acquisition means, a word or phrase that identifies an object to be photographed by the camera (hereinafter referred to as a "specific phrase"); and control means for controlling the shooting direction of the camera based on the specific phrase discriminated by the discrimination means.

[0007] The camera control method according to the invention of claim 7 includes: an acquisition step of acquiring audio around the camera; a discrimination step of discriminating, from the audio acquired in the acquisition step, a specific phrase that identifies the subject to be photographed by the camera; and a control step of controlling the shooting direction of the camera based on the specific phrase discriminated in the discrimination step.

[0008] The camera control program according to the invention of claim 8 causes a computer to execute the camera control method according to claim 7.

[0009] The recording medium according to the invention of claim 9 is a computer-readable recording medium on which the camera control program according to claim 8 is recorded.

Brief Description of the Drawings

[0010] FIG. 1 is a block diagram showing a functional configuration of a camera control device. FIG. 2 is a flowchart showing a procedure of camera control processing by the camera control device. FIG. 3 is an explanatory diagram showing the vicinity of the dashboard of a vehicle in which a navigation device is installed. FIG. 4 is a block diagram showing a hardware configuration of the navigation device. FIG. 5 is an explanatory diagram schematically showing the contents of databases created by the navigation device. FIG. 6 is a flowchart showing a procedure of in-vehicle shooting processing of the navigation device. FIG. 7 is a flowchart showing a procedure of in-vehicle shooting processing of the navigation device. FIG. 8 is an explanatory diagram showing an example of an input screen for planned passenger information.

Explanation of Reference Numerals

[0011] 100 camera control device; 101 acquisition unit; 102 determination unit; 103 control unit; 104 input unit; 105 detection unit; 110 camera
発明を実施するための最良の形態  BEST MODE FOR CARRYING OUT THE INVENTION
[0012] 以下に添付図面を参照して、この発明にかかるカメラ制御装置、カメラ制御方法、 カメラ制御プログラムおよび記録媒体の好適な実施の形態を詳細に説明する。 Exemplary embodiments of a camera control device, a camera control method, a camera control program, and a recording medium according to the present invention will be described below in detail with reference to the accompanying drawings.
[0013] (実施の形態) [0013] (Embodiment)
はじめに、実施の形態に力かるカメラ制御装置 100の機能的構成について説明す る。図 1は、カメラ制御装置の機能的構成を示すブロック図である。カメラ制御装置 10 First, a functional configuration of the camera control device 100 that is useful for the embodiment will be described. FIG. 1 is a block diagram showing a functional configuration of the camera control apparatus. Camera control device 10
0は、取得部 101、判別部 102、制御部 103、入力部 104、検出部 105によって構成 される。 0 is configured by an acquisition unit 101, a determination unit 102, a control unit 103, an input unit 104, and a detection unit 105.
[0014] 取得部 101は、カメラ 110周辺の音声を取得する。カメラ 110周辺の音声とは、たと えば、カメラ 110周辺に位置する人物による発話である。取得部 101は、たとえば、マ イクなどによってカメラ 110周辺の音声を取得する。  The acquisition unit 101 acquires audio around the camera 110. The sound around the camera 110 is, for example, an utterance made by a person located around the camera 110. The acquisition unit 101 acquires the sound around the camera 110 using, for example, a microphone.
[0015] 判別部 102は、取得部 101によって取得された音声から、カメラ 110による撮影対 象を特定する語句(以下、特定語句という)を判別する。判別部 102は、たとえば、後 述する入力部 104に入力される撮影対象の候補に関する情報を、特定語句として判 別する。  The discriminating unit 102 discriminates a phrase (hereinafter referred to as a specific phrase) that identifies a subject to be photographed by the camera 110 from the sound acquired by the acquiring unit 101. The discriminating unit 102 discriminates, for example, information relating to a candidate for photographing input to the input unit 104 described later as a specific phrase.
[0016] 制御部 103は、判別部 102によって判別された特定語句に基づいて、カメラ 110の 撮影方向を制御する。制御部 103は、たとえば、判別部 102によって撮影対象の候 補に関する情報と略一致する語句が判別された場合、後述する検出部 105によって 検出された撮影対象の候補の位置にカメラ 110の撮影方向を向ける。略一致する語 句とは、撮影対象の候補に関する情報として入力された語句と同一または類似する 語句である。 The control unit 103 controls the shooting direction of the camera 110 based on the specific phrase determined by the determination unit 102. For example, when the determination unit 102 determines a word or phrase that substantially matches information related to the candidate for the shooting target, the control unit 103 determines the shooting direction of the camera 110 at the position of the candidate for the shooting target detected by the detection unit 105 described later. Turn. Words that match A phrase is a phrase that is the same as or similar to a phrase that is input as information about a candidate to be photographed.
[0017] 入力部 104は、撮影対象の候補に関する情報の入力を受け付ける。撮影対象の候 補に関する情報とは、たとえば、撮影対象の候補の名称情報 (氏名や愛称など)ゃ属 性情報などである。また、入力部 104には、撮影対象の候補の画像や音声が入力さ れてもよい。  The input unit 104 receives input of information related to a candidate for photographing. The information regarding the candidates for the imaging target includes, for example, name information (name, nickname, etc.) of the candidate for the imaging target and attribute information. In addition, the input unit 104 may receive a candidate image or sound to be captured.
[0018] 検出部 105は、撮影対象の候補の位置を検出する。検出部 105は、たとえば、入 力部 104に入力された撮影対象の候補の画像と、カメラ 110によって撮影された映 像とを照合して、撮影対象の候補の位置を検出する。ここで、撮影対象の位置とは、 たとえば、カメラ 110が車両に設置されている場合には、撮影対象の着席した座席の 位置であり、検出部 105は、搭乗者がどの座席に着席したかを検出する。また、撮影 対象の位置とは、カメラ 110からの相対方向、相対方位などであってもよい。検出部 1 05は、具体的には、たとえば、カメラ 110によって撮影された映像中に、撮影対象の 候補の画像との類似度が所定値以上の物体が映っていた場合、その物体の位置を 撮影対象の候補の位置とする。  [0018] The detection unit 105 detects the position of a candidate for imaging. The detection unit 105, for example, collates the image of the photographing target candidate input to the input unit 104 with the image photographed by the camera 110, and detects the position of the photographing target candidate. Here, for example, when the camera 110 is installed in a vehicle, the shooting target position is the position of the seat where the shooting target is seated, and the detection unit 105 determines which seat the passenger is seated on. Is detected. Further, the position to be imaged may be a relative direction from the camera 110, a relative direction, or the like. Specifically, for example, when an object whose similarity with a candidate image to be captured is reflected in a video captured by the camera 110, the detection unit 105 detects the position of the object. It is set as the position of a candidate for photographing.
[0019] また、検出部 105は、たとえば、入力部 104に入力された撮影対象の候補の音声と 、取得部 101によって取得された音声とを照合して撮影対象の候補の位置を検出す る。具体的には、たとえば、取得部 101によって取得された音声中に、撮影対象の候 補の音声との類似度が所定値以上の音声が含まれて!/、た場合、その音声が発音さ れて 、る位置を撮影対象の候補の位置とする。  In addition, the detection unit 105 detects the position of the candidate for the photographing target by collating the voice of the candidate for the photographing target input to the input unit 104 with the voice acquired by the acquiring unit 101, for example. . Specifically, for example, if the voice acquired by the acquisition unit 101 includes a voice whose similarity with the candidate voice to be shot is a predetermined value or more! /, The voice is pronounced. Therefore, this position is set as the position of the candidate for the photographing target.
[0020] また、カメラ 110は、車両内部に設置されていてもよい。この場合、たとえば、取得 部 101は、車両の搭乗者の発話を取得し、判別部 102は、撮影対象となる搭乗者に 関する情報を特定語句として判別し、制御部 103は、特定語句によって特定される 搭乗者の乗車位置にカメラ 110の撮影方向を向ける。また、入力部 104には、たとえ ば、搭乗者に関する情報が入力され、検出部 105は、各搭乗者がどの座席に着席し たかを検出する。  [0020] The camera 110 may be installed inside the vehicle. In this case, for example, the acquisition unit 101 acquires the utterance of the passenger of the vehicle, the determination unit 102 determines information about the passenger to be photographed as a specific phrase, and the control unit 103 specifies the specific phrase by the specific phrase. Aim the shooting direction of the camera 110 at the boarding position of the passenger. Further, for example, information related to the passenger is input to the input unit 104, and the detection unit 105 detects which seat each passenger is seated in.
[0021] つぎに、カメラ制御装置 100によるカメラ 110の制御処理について説明する。図 2は 、カメラ制御装置によるカメラ制御処理の手順を示すフローチャートである。図 2のフ ローチャートにおいて、まず、入力部 104に対して、撮影対象の候補に関する情報の 入力がおこなわれる (ステップ S201)。また、検出部 105によって、撮影対象の候補 の位置を検出する (ステップ S 202)。 Next, control processing of the camera 110 by the camera control device 100 will be described. FIG. 2 is a flowchart showing a procedure of camera control processing by the camera control apparatus. Figure 2 In the flowchart, first, information relating to a candidate for an imaging target is input to the input unit 104 (step S201). Further, the position of the candidate for photographing is detected by the detection unit 105 (step S 202).
[0022] つぎに、取得部 101によって、カメラ 110周辺の音声を取得する(ステップ S 203)。  Next, the sound around the camera 110 is acquired by the acquisition unit 101 (step S 203).
つづいて、判別部 102によって、ステップ S203で取得された音声から、特定語句を 判別する (ステップ S 204)。このときの特定語句とは、撮影対象の候補に関する情報 である。これにより、撮影対象の候補の中から撮影すべき撮影対象が特定される。そ して、制御部 103によって、ステップ S202で検出した撮影対象の位置にカメラ 110 の撮影方向を制御して (ステップ S205)、本フローチャートによる処理を終了する。  Subsequently, the determination unit 102 determines a specific phrase from the voice acquired in step S203 (step S204). The specific word / phrase at this time is information on the candidate to be photographed. Thereby, a photographing target to be photographed is specified from among photographing target candidates. Then, the control unit 103 controls the shooting direction of the camera 110 to the position of the shooting target detected in step S202 (step S205), and the process according to this flowchart ends.
[0023] 以上説明したように、カメラ制御装置 100によれば、カメラ周辺の音声力も撮影対象 を特定し、撮影対象に向けてカメラの撮影方向を制御する。これにより、カメラ周辺の 音声の内容を考慮して、カメラ 110の撮影方向を変更することができる。  [0023] As described above, according to the camera control device 100, the sound force around the camera also specifies the shooting target and controls the shooting direction of the camera toward the shooting target. Thereby, it is possible to change the shooting direction of the camera 110 in consideration of the audio content around the camera.
[0024] また、カメラ制御装置 100は、あらかじめ入力された撮影対象の候補に関する情報 を特定語句として撮影対象を特定するので、より精度良く撮影対象を特定することが できる。さらに、撮影対象の候補の画像や音声から撮影対象の位置を検出するので 、より精度良くカメラ 110の撮影方向を制御することができる。  [0024] In addition, since the camera control device 100 specifies the shooting target using information relating to the shooting target candidates input in advance as specific words, the shooting target can be specified with higher accuracy. Furthermore, since the position of the shooting target is detected from the image and sound of the shooting target candidate, the shooting direction of the camera 110 can be controlled with higher accuracy.
実施例  Example
[0025] つぎに、上述した実施の形態に力かるカメラ制御装置 100の実施例について説明 する。以下の実施例においては、カメラ制御装置 100を、車両に搭載されたナビゲー シヨン装置 300に適用した場合について説明する。  Next, an example of the camera control device 100 that works on the above-described embodiment will be described. In the following embodiment, a case where the camera control device 100 is applied to a navigation device 300 mounted on a vehicle will be described.
[0026] (ナビゲーシヨン装置 300の周辺機器構成)  [0026] (Peripheral device configuration of the navigation device 300)
はじめに、ナビゲーシヨン装置 300の周辺機器構成について説明する。図 3は、ナ ピゲーシヨン装置が設置された車両のダッシュボード付近を示す説明図である。ナビ ゲーシヨン装置 300は、車両のダッシュボードに設置されている。ナビゲーシヨン装置 300は、本体部 Mおよび表示部(ディスプレイ) Dによって構成され、表示部 Dには車 両の現在地点や地図情報、現在時刻などが表示される。  First, the peripheral device configuration of the navigation device 300 will be described. FIG. 3 is an explanatory diagram showing the vicinity of the dashboard of the vehicle in which the navigation device is installed. The navigation device 300 is installed on the dashboard of the vehicle. The navigation device 300 includes a main body M and a display unit (display) D. The display unit D displays the current location of the vehicle, map information, current time, and the like.
[0027] また、ナビゲーシヨン装置 300には、ノ ックミラー周辺に設置された車載用カメラ 31 1、サンノ ィザ一に設置された車載用マイク 312が接続されている。車載用カメラ 311 は、撮影方向を変更可能であり、車外前方および車両内の各部 (搭乗者など)を撮影 する。以下、車載用カメラ 311は、動画および静止画を撮影可能なカメラであるものと する力 静止画のみを撮影可能なカメラであってもよい。 In addition, the navigation apparatus 300 is connected to an in-vehicle camera 311 installed in the vicinity of the knock mirror and an in-vehicle microphone 312 installed in the sun-noiser. Car camera 311 The shooting direction can be changed, and images are taken in front of the vehicle and in the vehicle (passengers, etc.). Hereinafter, the in-vehicle camera 311 may be a camera capable of capturing only still images, assuming that the camera 311 is capable of capturing moving images and still images.
[0028] 車載用マイク 312は、車両内の音声が入力され、ナビゲーシヨン装置 300の音声入 力による操作や車両内の様子を記録する際などに用いられる。なお、車載用マイク 3 12の位置は、サンバイザーに限ることなぐ車両内の音声を効率的に入力できる位 置にあればよい。また、車載用カメラ 311および車載用マイク 312は、車両に複数設 置されていてもよいし、さらに、固定式ではなく可動式であってもよい。本実施例では 、車載用マイク 312は、各搭乗者の座席ごとに設けられているものとする。  [0028] The in-vehicle microphone 312 is used when an in-vehicle sound is input and an operation by the sound input of the navigation device 300 or a state in the vehicle is recorded. It should be noted that the position of the in-vehicle microphone 312 is not limited to the sun visor as long as it can efficiently input voice in the vehicle. Further, a plurality of in-vehicle cameras 311 and in-vehicle microphones 312 may be installed in the vehicle, and may be movable rather than fixed. In the present embodiment, it is assumed that the in-vehicle microphone 312 is provided for each passenger's seat.
[0029] ナビゲーシヨン装置 300は、 目的地点までの経路探索および情報記録をおこなう他 、ドライブ中の車両内の様子を記録する車内撮影機能を有している。車内撮影機能 は、車載用カメラ 311や車載用マイク 312で車両内の映像および音声を記録する。 車内撮影機能によって記録された映像および音声はナビゲーシヨン装置 300の記録 媒体 (後述する磁気ディスク 405、光ディスク 407)に記録される。また、記録された映 像および音声を外部記録媒体に記録して、自宅のテレビなどで楽しめるようにしても よい。  [0029] The navigation device 300 has an in-vehicle shooting function for recording a state in the vehicle being driven, in addition to searching for a route to the destination and recording information. The in-vehicle shooting function records in-vehicle video and audio with the in-vehicle camera 311 and the in-vehicle microphone 312. The video and audio recorded by the in-vehicle shooting function are recorded on a recording medium (magnetic disk 405 and optical disk 407 described later) of the navigation device 300. Further, the recorded video and sound may be recorded on an external recording medium so that they can be enjoyed on a home TV or the like.
[0030] (ナビゲーシヨン装置 300のハードウェア構成)  [0030] (Hardware configuration of navigation device 300)
つぎに、ナビゲーシヨン装置 300のハードウェア構成について説明する。図 4は、ナ ピゲーシヨン装置のハードウェア構成を示すブロック図である。図 4において、ナビゲ ーシヨン装置 300は、 CPU401と、 ROM402と、 RAM (メモリ) 403と、磁気ディスク ドライブ 404と、磁気ディスク 405と、光ディスクドライブ 406と、光ディスク 407と、音 声 IZF (インターフェース) 408と、マイク 409と、スピーカ 410と、入力デバイス 411と 、映像 IZF412と、カメラ 413と、ディスプレイ 414と、通信 IZF415と、 GPSユニット 416と、各種センサ 417と、外部接続用 IZF418とを備えている。また、各構成部 40 1-418はバス 420によってそれぞれ接続されて 、る。  Next, the hardware configuration of the navigation device 300 will be described. FIG. 4 is a block diagram showing the hardware configuration of the navigation device. In FIG. 4, the navigation device 300 includes a CPU 401, a ROM 402, a RAM (memory) 403, a magnetic disk drive 404, a magnetic disk 405, an optical disk drive 406, an optical disk 407, and an audio IZF (interface) 408. A microphone 409, a speaker 410, an input device 411, a video IZF 412, a camera 413, a display 414, a communication IZF 415, a GPS unit 416, various sensors 417, and an external connection IZF 418. . Each component 40 1-418 is connected by a bus 420.
[0031] まず、 CPU401は、ナビゲーシヨン装置 300の全体の制御を司る。 ROM402は、 ブートプログラム、通信プログラム、データベース作成プログラム、データ解析プログ ラムなどのプログラムを記録している。 RAM403は、 CPU401のワークエリアとして 使用される。 First, the CPU 401 governs overall control of the navigation device 300. The ROM 402 records programs such as a boot program, a communication program, a database creation program, and a data analysis program. RAM403 is the work area of CPU401 used.
[0032] 磁気ディスクドライブ 404は、 CPU401の制御に従って磁気ディスク 405に対する データの読み取り Z書き込みを制御する。磁気ディスク 405は、磁気ディスクドライブ 404の制御で書き込まれたデータを記録する。磁気ディスク 405としては、たとえば、 HD (ノヽードディスク)や FD (フレキシブルディスク)を用いることができる。  The magnetic disk drive 404 controls reading and writing of data to the magnetic disk 405 according to the control of the CPU 401. The magnetic disk 405 records data written under the control of the magnetic disk drive 404. As the magnetic disk 405, for example, HD (node disk) or FD (flexible disk) can be used.
[0033] 光ディスクドライブ 406は、 CPU401の制御に従って光ディスク 407に対するデー タの読み取り Z書き込みを制御する。光ディスク 407は、光ディスクドライブ 406の制 御に従ってデータが読み出される着脱自在な記録媒体である。光ディスク 407は、書 き込み可能な記録媒体を利用することもできる。また、この着脱可能な記録媒体とし て、光ディスク 407のほ力 MO、メモリカードなどであってもよい。  The optical disk drive 406 controls data reading / writing to the optical disk 407 according to the control of the CPU 401. The optical disk 407 is a detachable recording medium from which data is read according to the control of the optical disk drive 406. The optical disc 407 can also use a writable recording medium. Further, the removable recording medium may be a power MO of the optical disc 407, a memory card, or the like.
[0034] 磁気ディスク 405または光ディスク 407に記録される情報の一例として、経路探索' 経路誘導などに用いる地図データが挙げられる。地図データは、建物、河川、地表 面などの地物 (フィーチャ)を表す背景データと、道路の形状を表す道路形状データ とを有しており、ディスプレイ 414の表示画面において 2次元または 3次元に描画され る。ナビゲーシヨン装置 300が経路誘導中の場合は、地図データと後述する GPSュ ニット 416によって取得された自車の現在地点とが重ねて表示されることとなる。  [0034] As an example of information recorded on the magnetic disk 405 or the optical disk 407, there is map data used for route search and route guidance. The map data includes background data representing features (features) such as buildings, rivers, and the ground surface, and road shape data representing the shape of the road, and is displayed in two or three dimensions on the display screen of the display 414. It is drawn. When the navigation device 300 is guiding a route, the map data and the current location of the host vehicle acquired by the GPS unit 416 described later are displayed in an overlapping manner.
[0035] The audio I/F 408 is connected to the microphone 409 for audio input (for example, the in-vehicle microphone 312 in FIG. 3) and the speaker 410 for audio output. Sound received by the microphone 409 is A/D-converted in the audio I/F 408, and sound is output from the speaker 410. The sound input from the microphone 409 can be recorded on the magnetic disk 405 or the optical disc 407 as audio data.
[0036] Examples of the input device 411 include a remote controller provided with a plurality of keys for inputting characters, numerical values, and various instructions, a keyboard, a mouse, and a touch panel. Furthermore, other information processing terminals, such as a digital camera or a mobile phone terminal, can be connected to the input device 411 to input and output data.
[0037] The video I/F 412 is connected to the camera 413 for video input (for example, the in-vehicle camera 311 in FIG. 3) and the display 414 for video output. Specifically, the video I/F 412 is composed of, for example, a graphics controller that controls the display 414 as a whole, a buffer memory such as a VRAM (Video RAM) that temporarily stores image information ready for immediate display, and a control IC that controls the display 414 based on the image data output from the graphics controller.
[0038] The camera 413 captures images (including moving images) inside and outside the vehicle and outputs them as image data. Images captured by the camera 413 can be recorded on the magnetic disk 405 or the optical disc 407 as image data. Besides being output on the display 414, this image data can be used by other information processing terminals, for example by recording it on a recording medium or transmitting it over a network.
[0039] The display 414 displays icons, cursors, menus, windows, and various data such as characters and images. As the display 414, for example, a CRT, a TFT liquid crystal display, or a plasma display can be adopted.
[0040] The communication I/F 415 is wirelessly connected to a communication network such as the Internet and functions as an interface between the communication network and the CPU 401. Communication networks include LANs, WANs, public line networks, and mobile phone networks.
[0041] The GPS unit 416 receives radio waves from GPS satellites and outputs information indicating the current position of the vehicle (the current position of the navigation device 300). The output information of the GPS unit 416 is used, together with the output values of the various sensors 417 described later, when the CPU 401 calculates the current position of the vehicle. The information indicating the current position is information that specifies one point on the map data, for example, latitude/longitude and altitude.
[0042] The various sensors 417, such as a vehicle speed sensor, an acceleration sensor, and an angular velocity sensor, output information from which the position and behavior of the vehicle can be determined. The output values of the various sensors 417 are used by the CPU 401 to calculate the current position and to measure changes in speed and heading.
[0043] The external-connection I/F 418 comprises interfaces for connecting to external devices such as an audio system or the vehicle's air conditioner. The external-connection I/F 418 is configured by, for example, a port for a dedicated connection cable or an infrared communication port.
[0044] In the configuration of the camera control apparatus 100 according to the embodiment, the acquisition unit 101 realizes its function through the audio I/F 408 and the microphone 409; the discrimination unit 102 and the detection unit through the CPU 401; the control unit 103 through the CPU 401 and the video I/F 412; and the input unit 104 through the input device 411.
[0045] (In-vehicle shooting processing by the navigation device 300)
Next, the in-vehicle shooting processing by the navigation device 300 will be described. As described above, the vehicle in which the navigation device 300 is installed is provided with the in-vehicle camera 311 and the in-vehicle microphone 312, so that the interior of the vehicle during a drive can be recorded. Here, if the interior is simply filmed aimlessly, the recording has little appeal as content, and the chance that the recorded video will be watched decreases.
[0046] For this reason, the navigation device 300 identifies the shooting target based on the content of the passengers' utterances and changes the shooting direction of the in-vehicle camera 311. Specifically, when a speaker's utterance contains a word corresponding to a specific passenger, the in-vehicle camera 311 is pointed at that passenger for shooting. This makes it possible to film the interior of the vehicle with the focus on the passenger at the center of the conversation, improving the appeal of the recorded video as content.
[0047] Here, in order to identify the passenger to be filmed, the navigation device 300 registers information about passengers in advance and controls the in-vehicle camera 311 based on the registered information. Specifically, the navigation device 300 creates a cumulative database (the prospective-passenger database), in which information about persons who may board or are scheduled to board the vehicle (hereinafter, "prospective passengers") is accumulated, and an update-type database (the current-passenger database), which is updated each time passengers get in or out of the vehicle.
[0048] FIG. 5 is an explanatory diagram schematically showing the contents of the databases created by the navigation device. In the prospective-passenger database 510, text information 511, such as each prospective passenger's name information 521, nickname information 522, attribute information 523, and keywords 524 corresponding to that prospective passenger, is registered together with that passenger's face image data 512 and voiceprint data 513.
[0049] In the current-passenger database 530, text information 541 corresponding to the passengers who have boarded this time and their boarding position information 542 are registered. As described later, the navigation device 300 identifies the passengers who have boarded this time from their images or voiceprints, and stores the information corresponding to each such passenger (text information 541), held in the prospective-passenger database 510, into the current-passenger database 530.
[0050] Like the text information in the prospective-passenger database 510, the text information 541 consists of name information 551, nickname information 552, attribute information 553, keywords 554 corresponding to the prospective passenger, and so on. The boarding position information 542 indicates the seat of each passenger who has boarded this time, for example, "passenger seat" or "right rear seat".
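By way of illustration (the publication itself contains no code), the two databases of FIG. 5 can be pictured as simple record structures. The following Python sketch is a non-authoritative reading; all class and field names are assumptions chosen to mirror the reference numerals.

```python
from dataclasses import dataclass

@dataclass
class ProspectivePassenger:
    """One entry in the prospective-passenger database 510."""
    name: str                # name information 521 / 551
    nickname: str            # nickname information 522 / 552
    attributes: list[str]    # attribute information 523 / 553, e.g. ["male", "child"]
    keywords: list[str]      # keywords 524 / 554 associated with this person
    face_image: bytes = b""  # face image data 512
    voiceprint: bytes = b""  # voiceprint data 513

@dataclass
class CurrentPassenger:
    """One entry in the current-passenger database 530."""
    text_info: ProspectivePassenger  # text information 541
    seat: str                        # boarding position information 542, e.g. "passenger seat"

prospective_db: list[ProspectivePassenger] = []  # cumulative (510)
current_db: list[CurrentPassenger] = []          # rebuilt for each trip (530)
```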
[0051] In relation to the boarding position information 542, a control table 560 for the in-vehicle camera 311, corresponding to the position of each seat, is stored in the ROM 402 of the navigation device 300. The control table 560 defines the amount by which the shooting direction of the in-vehicle camera 311 is changed when shooting each seat. Specifically, for example, if the reference shooting direction (the 0° direction) is taken to be perpendicular to the rear window, the driver's seat is 45° to the left of the reference shooting direction, the passenger seat is 45° to the right, the right rear seat is 30° to the left, and the left rear seat is 30° to the right.
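The control table 560 is, in effect, a map from seat to pan offset. A minimal sketch using the angles given in this paragraph; the sign convention and the camera API are assumptions:

```python
# Pan offsets in degrees from the reference direction (0° = perpendicular
# to the rear window); negative values pan left, positive values pan right.
CONTROL_TABLE_560 = {
    "driver seat":     -45,
    "passenger seat":  +45,
    "right rear seat": -30,
    "left rear seat":  +30,
}

def pan_to_seat(camera, seat: str) -> None:
    """Turn the in-vehicle camera toward the given seat (hypothetical camera API)."""
    camera.set_pan(CONTROL_TABLE_560[seat])
```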
[0052] Using the databases described above, the navigation device 300 controls the shooting direction of the in-vehicle camera 311.
[0053] FIGS. 6 and 7 are flowcharts showing the procedure of the in-vehicle shooting processing of the navigation device. In the flowchart of FIG. 6, the navigation device 300 first has the prospective passengers of the vehicle in which the device is installed input their prospective-passenger information (step S601). The prospective-passenger information corresponds to the text information 511 in FIG. 5 and comprises information such as each passenger's name, nickname, attributes, and keywords. The prospective-passenger information may be input with the input device 411 of the navigation device 300, or over a network from each prospective passenger's home PC (personal computer), mobile phone terminal, or the like.
[0054] FIG. 8 is an explanatory diagram showing an example of the input screen for prospective-passenger information. FIG. 8 illustrates, as an example, the case where a prospective passenger is registered using the input device 411 of the navigation device 300. In FIG. 8, a new registration screen 800 for a prospective passenger is displayed on the display 414 of the navigation device 300. The new registration screen 800 shows a name input field 811 for entering the prospective passenger's name, a nickname input field 812 for entering the prospective passenger's nickname, an attribute input field 813 for entering the prospective passenger's attributes, and a keyword input field 814 for entering keywords associated with the prospective passenger.
[0055] A prospective passenger to be newly registered inputs his or her own information into these fields. The prospective passenger then presses the shoot button 821 to photograph his or her own face with the in-vehicle camera 311, and presses the sound-collection button 822 to record his or her own voice with the in-vehicle microphone 312. Only one of the face-image shooting and the voice recording may be performed. A registered prospective passenger is not limited to a human and may be an animal such as a dog or a cat; in this case, a human performs the registration in the prospective-passenger database 510 on its behalf.
[0056] As described above, besides displaying the new registration screen on the input screen and having the prospective-passenger information entered as text, the information may, for example, be input by voice, with the prospective passenger speaking it in a voice dialogue. With this method, voice data can be acquired at the same time. The prospective passenger's face image may also be captured automatically, triggered by, for example, the opening and closing of a vehicle door.
[0057] Returning to the description of FIG. 6, the navigation device 300 stores the passenger information input in step S601 in the prospective-passenger database 510 (step S602). Specifically, text information 511 such as the prospective passenger's name, nickname, and attributes is associated with the prospective passenger's face image data 512 and with voiceprint data 513 extracted from the voice (either one alone may be used), and stored in the prospective-passenger database.
[0058] Next, the navigation device 300 waits until passengers get into the vehicle (step S603: No loop). Whether passengers have boarded is determined from, for example, whether the vehicle's engine has been started or whether a vehicle door has been opened and closed. When passengers board (step S603: Yes), the in-vehicle camera 311 captures an image of the entire vehicle interior, and the in-vehicle microphone 312 picks up the voice of the passenger seated in each seat (step S604). Specifically, the shooting direction of the in-vehicle camera 311 is controlled so as to cover the range in which the faces of the passengers in all seats are located, and the passengers are instructed to speak into the in-vehicle microphone 312 provided at each seat. Only one of the image capture and the sound pickup may be performed.
[0059] Next, the navigation device 300 extracts the face images and voiceprints of the passengers who have boarded this time from the image and sound captured in step S604 (step S605). Any one of the face images and voiceprints extracted in step S605 is then checked against the prospective-passenger database 510 (step S606), and it is determined whether face image data 512 or voiceprint data 513 similar to that face image or voiceprint has been stored (step S607). Specifically, the feature points of the boarding passenger's face image or voiceprint are compared with the feature points of the face image data 512 or voiceprint data 513 stored in the prospective-passenger database, and it is determined whether face image data 512 or voiceprint data 513 whose similarity is equal to or greater than a predetermined value is stored.
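Step S607 amounts to a best-match test against a threshold. A minimal sketch, assuming feature vectors have already been extracted and using cosine similarity as a stand-in for the unspecified feature-point comparison:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

SIMILARITY_THRESHOLD = 0.85  # the "predetermined value"; the figure is an assumption

def match_prospective(features: list[float], db_features: list[list[float]]):
    """Return the index of the most similar stored entry at or above the
    threshold (step S607: Yes), or None if nothing similar is stored (No)."""
    best_idx, best_sim = None, SIMILARITY_THRESHOLD
    for i, stored in enumerate(db_features):
        sim = cosine_similarity(features, stored)
        if sim >= best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```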
[0060] If similar face image data 512 or voiceprint data 513 has been stored (step S607: Yes), the text information 511 of the prospective passenger corresponding to that face image data 512 or voiceprint data 513 is stored as text information 541 in the current-passenger database 530, together with that passenger's boarding position information 542 (step S608). The text information 511 stored in the prospective-passenger database 510 is retained as it is.
[0061] On the other hand, if no similar face image data 512 or voiceprint data 513 has been stored (step S607: No), that passenger is prompted to input prospective-passenger information (step S609), and the input information is stored in the prospective-passenger database 510 as prospective-passenger information (step S610). The passenger information input in step S609 is then stored as text information 541 in the current-passenger database 530, together with that passenger's boarding position information 542 (step S611).
[0062] Until all the face images and voiceprints have been checked (step S612: No), the process returns to step S606 and the subsequent processing is repeated. When all the face images and voiceprints have been checked (step S612: Yes), the process proceeds to step S613 in FIG. 7. Through the processing up to this point, the prospective-passenger database 510 and the current-passenger database 530 are created.
[0063] Since the seating arrangement may change during travel, the current-passenger database 530 may also be updated when the boarding or alighting of passengers is detected at a predetermined timing during travel, for example, when a door is opened and closed after the vehicle stops. In this case, the navigation device 300 updates the current-passenger database 530 by performing the processing from step S604 onward.
[0064] Turning to the description of FIG. 7, the navigation device 300 monitors the voices in the vehicle (step S613) and determines whether a passenger's utterance contains a word or phrase (a specific phrase) included in the text information 541 stored in the current-passenger database 530 (step S614). The voice monitoring is performed using commonly used speech recognition technology. Specifically, the voice picked up by the in-vehicle microphone 312 is converted into text, and it is determined whether a phrase included in the text information 541 appears in the passenger's utterance.
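The monitoring of steps S613 and S614 reduces to spotting registered words in the recognizer's transcript. A sketch building on the record structures above, and assuming an external speech recognizer supplies the text:

```python
def find_specific_phrases(transcript: str, current_db: list) -> list:
    """Return the current-passenger entries whose registered words (name,
    nickname, attributes, keywords) appear in the recognized utterance."""
    hits = []
    for entry in current_db:
        info = entry.text_info
        words = [info.name, info.nickname, *info.attributes, *info.keywords]
        if any(w and w in transcript for w in words):
            hits.append(entry)
    return hits
```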
[0065] When a phrase included in the text information 541 is uttered (step S614: Yes), the current-passenger database 530 is consulted and the boarding position information 542 of the passenger corresponding to the uttered phrase is acquired (step S615). Subsequently, the navigation device 300 determines whether there is a camera operation to be prioritized (step S616). A prioritized camera operation is a process that, among the processes using the in-vehicle camera 311, such as a drive recorder function or the passenger recognition processing, should take precedence over the passenger shooting processing. For example, when the vehicle's travel speed is at or above a predetermined speed, or when the vehicle's current position is within a predetermined distance of an intersection, the likelihood of an accident is judged to be higher than usual and the drive recorder function is operated with priority. The conditions for prioritized camera operations may be predetermined or may be settable by the user. Until this point, the in-vehicle camera 311 may be in a stopped (powered-off) state.
[0066] If there is a prioritized camera operation (step S616: Yes), the device waits until that camera operation ends. If there is no prioritized camera operation (step S616: No), the control table 560 stored in the ROM 402 is referenced, the shooting direction of the in-vehicle camera 311 is changed toward the boarding position of the passenger corresponding to the uttered phrase (step S617), and the passenger is filmed (step S618). On the other hand, if no phrase included in the text information 541 is uttered (step S614: No), the process returns to step S613 and the voice monitoring continues.
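Steps S615 through S618 then combine the lookup with the priority check. A hedged sketch of the decision flow, reusing `pan_to_seat` from the control-table sketch above; the predicate below and its thresholds are assumptions illustrating the speed and intersection examples of paragraph [0065]:

```python
def handle_specific_phrase(camera, entry, vehicle) -> bool:
    """Steps S615-S618: pan to the named passenger's seat and film,
    unless a prioritized camera operation is running."""
    if priority_operation_active(vehicle):  # step S616: Yes -> wait
        return False                        # caller retries after the operation ends
    pan_to_seat(camera, entry.seat)         # step S617, via control table 560
    camera.start_recording()                # step S618 (hypothetical camera API)
    return True

def priority_operation_active(vehicle) -> bool:
    # Illustrative conditions from paragraph [0065]; the numbers are assumptions.
    return vehicle.speed_kmh >= 80 or vehicle.distance_to_intersection_m <= 30
```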
[0067] Rather than always filming the passenger whenever a phrase included in the text information 541 is uttered (see step S614), shooting may be performed only when, in addition to the phrase included in the text information 541, a specific keyword (a shooting instruction keyword) is uttered. Shooting instruction keywords include words that directly instruct shooting, such as "shoot" or "take a picture," as well as words that indirectly instruct shooting, such as "look at ○○" or "turn to ○○." The shooting instruction keywords may be predetermined or may be registrable by the user.
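The keyword test itself can be a simple containment check. In practice the registered keywords would be in the passengers' own language; the English strings below are translations of the examples just given and are purely illustrative:

```python
DIRECT_KEYWORDS = ["shoot", "take a picture"]   # direct instructions (examples from [0067])
INDIRECT_PATTERNS = ["look at", "turn to"]      # indirect instructions, e.g. "look at Mr. XX"

def is_instruction_keyword(transcript: str) -> bool:
    """True if the utterance contains a direct or indirect shooting instruction."""
    return any(k in transcript for k in DIRECT_KEYWORDS) or \
           any(p in transcript for p in INDIRECT_PATTERNS)
```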
[0068] Alternatively, for example, after the shooting direction has been turned toward the passenger in step S617, shooting may be started at the moment an utterance containing a shooting instruction keyword is made. In this case, if no shooting instruction keyword is uttered within a predetermined time after the keyword identifying the subject was uttered, the shooting direction of the in-vehicle camera 311 may be returned to its initial state as a timeout.
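The variant just described is a two-stage trigger with a timeout: pan on the subject-identifying phrase, record only if a shooting instruction keyword follows in time. A sketch with a simple timestamp check, again reusing `pan_to_seat` from above; the timeout length and the camera API are assumptions:

```python
import time

ARM_TIMEOUT_S = 10.0  # the "predetermined time"; the value is an assumption

class TwoStageTrigger:
    """Pan first on the subject-identifying phrase; start recording only if
    a shooting instruction keyword follows within the timeout."""

    def __init__(self) -> None:
        self.armed_at = None

    def on_subject_phrase(self, camera, entry) -> None:
        pan_to_seat(camera, entry.seat)  # step S617
        self.armed_at = time.monotonic()

    def on_instruction_keyword(self, camera) -> None:
        if self.armed_at is not None:
            camera.start_recording()
            self.armed_at = None

    def tick(self, camera) -> None:
        """Poll periodically; on timeout, return the camera to its initial state."""
        if self.armed_at is not None and time.monotonic() - self.armed_at > ARM_TIMEOUT_S:
            camera.reset_pan()           # hypothetical camera API
            self.armed_at = None
```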
[0069] Conversely, shooting may be started at the moment an utterance containing a shooting instruction keyword is made, and if a phrase included in the text information 541 is uttered during shooting, the camera may be turned toward the passenger corresponding to the uttered phrase. Furthermore, the shooting direction may be turned toward the speaker at the moment an utterance containing a shooting instruction keyword is made, and then, when a phrase included in the text information 541 is uttered, the shooting direction may be turned toward the passenger corresponding to the uttered phrase and shooting started.
[0070] The passengers to be filmed are not limited to one. For example, when an attribute common to multiple passengers, such as "everyone" or "the men," is uttered, the corresponding passengers are filmed in sequence. Likewise, when multiple phrases included in the text information 541 are uttered within an utterance, the passengers corresponding to each phrase are filmed in sequence. A shooting priority order may also be determined for each passenger.
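When one utterance matches several passengers, whether through a shared attribute or several distinct phrases, the hit list simply contains several entries that are filmed in turn. A sketch reusing the helpers above; the dwell time and the optional priority attribute are assumptions:

```python
import time

def shoot_in_sequence(camera, hits: list, seconds_each: float = 5.0) -> None:
    """Film each matched passenger in turn (paragraph [0070]); a per-passenger
    priority, if defined, determines the order."""
    for entry in sorted(hits, key=lambda e: getattr(e, "priority", 0)):
        pan_to_seat(camera, entry.seat)
        camera.start_recording()
        time.sleep(seconds_each)  # assumed fixed dwell per passenger
        camera.stop_recording()
```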
[0071] When an instruction to end shooting is given (step S619: Yes), or when a predetermined time has elapsed since shooting started (step S620: Yes), the navigation device 300 ends the shooting of the passenger (step S621). Here, the instruction to end shooting is, for example, a word that directly instructs the end of shooting, such as "stop" or "end," the utterance of some other specific keyword, a button operation, or the like. While there is no instruction to end shooting (step S619: No) and the predetermined time has not elapsed since shooting started (step S620: No), the process returns to step S618 and the shooting of the passenger continues.
[0072] After shooting ends in step S621, the shooting direction of the in-vehicle camera 311 is returned to its original position (step S622). Until the vehicle finishes traveling (step S623: No), the process returns to step S613 and the subsequent processing continues. When the vehicle finishes traveling (step S623: Yes), the current-passenger database is deleted (step S624), and the processing of this flowchart ends.
[0073] In the above description, prospective-passenger information is accumulated in the prospective-passenger database 510; however, for example, the prospective-passenger database 510 need not be created, and passenger information may instead be input for each trip to generate only the current-passenger database 530. That is, a registration screen like that shown in FIG. 8 is displayed for each trip, all the passengers on board at that time input their passenger information, and the information is stored in the current-passenger database 530 together with the boarding position information. Since the current-passenger database 530 is deleted each time a trip ends, the passengers must input their information every time they board, but the navigation device 300 need not retain passenger information.
[0074] Further, although the present embodiment films the interior of a vehicle, the same approach can equally be applied to, for example, a video conference system. In that case, the prospective passenger of the present embodiment corresponds to a person scheduled to attend the conference, and the boarding position corresponds to the seating position in the conference room.
[0075] As described above, according to the navigation device 300, the shooting target is identified from the voices in the vehicle, and the shooting direction of the in-vehicle camera 311 is controlled toward the shooting target. The shooting direction of the in-vehicle camera 311 can thus be changed in consideration of the content of the passengers' utterances. For example, by calling out to a child sitting in the back seat, the shooting direction of the in-vehicle camera 311 can be turned toward the child and the video output on the display 414. This allows the driver to check on the child in the rear seat without looking back, even while driving.
[0076] The navigation device 300 also extracts the passengers' positions from the video captured by the in-vehicle camera 311 and the sound picked up by the in-vehicle microphone 312. This makes it possible to identify each passenger's boarding position without having the passengers input it. Furthermore, since prospective-passenger information is accumulated in the prospective-passenger database, shooting can be performed without having the passengers input information about themselves each time and without burdening them with complicated operations.
[0077] The camera control method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read out from the recording medium by the computer. The program may also be a transmission medium that can be distributed via a network such as the Internet.

Claims

[1] A camera control apparatus comprising:
acquisition means for acquiring audio around a camera;
discrimination means for discriminating, from the audio acquired by the acquisition means, a word or phrase that identifies a subject to be photographed by the camera (hereinafter, "specific phrase"); and
control means for controlling a shooting direction of the camera based on the specific phrase discriminated by the discrimination means.

[2] The camera control apparatus according to claim 1, further comprising:
input means for receiving input of information on a candidate for the subject; and
detection means for detecting a position of the candidate,
wherein the discrimination means discriminates, as the specific phrase, a word or phrase that substantially matches the information on the candidate input to the input means, and
when the discrimination means discriminates a word or phrase that substantially matches the information on the candidate, the control means directs the shooting direction of the camera toward the position of the candidate detected by the detection means.

[3] The camera control apparatus according to claim 2, wherein at least one of name information of the candidate and attribute information of the candidate is input to the input means as the information on the candidate.

[4] The camera control apparatus according to claim 2, wherein an image of the candidate is input to the input means, and the detection means detects the position of the candidate by matching the image against video captured by the camera.

[5] The camera control apparatus according to claim 2, wherein a voice of the candidate is input to the input means, and the detection means detects the position of the candidate by matching the voice against the audio acquired by the acquisition means.

[6] The camera control apparatus according to any one of claims 1 to 5, wherein the camera is installed inside a vehicle, the acquisition means acquires utterances of passengers of the vehicle, the discrimination means discriminates information on a passenger to be photographed as the specific phrase, and the control means directs the shooting direction of the camera toward the boarding position of the passenger specified by the specific phrase.

[7] A camera control method comprising:
an acquisition step of acquiring audio around a camera;
a discrimination step of discriminating, from the audio acquired in the acquisition step, a word or phrase that identifies a subject to be photographed by the camera (hereinafter, "specific phrase"); and
a control step of controlling a shooting direction of the camera based on the specific phrase discriminated in the discrimination step.

[8] A camera control program causing a computer to execute the camera control method according to claim 7.

[9] A computer-readable recording medium on which the camera control program according to claim 8 is recorded.
PCT/JP2007/062151 2006-06-16 2007-06-15 Camera control apparatus, camera control method, camera control program, and recording medium WO2007145331A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008521277A JP4660592B2 (en) 2006-06-16 2007-06-15 Camera control apparatus, camera control method, camera control program, and recording medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006167403 2006-06-16
JP2006-167403 2006-06-16

Publications (1)

Publication Number Publication Date
WO2007145331A1 (en)

Family

ID=38831844

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/062151 WO2007145331A1 (en) 2006-06-16 2007-06-15 Camera control apparatus, camera control method, camera control program, and recording medium

Country Status (2)

Country Link
JP (1) JP4660592B2 (en)
WO (1) WO2007145331A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3733258B2 (en) * 1999-02-23 2006-01-11 キヤノン株式会社 Image photographing device, image photographing method, storage medium, video conference system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0983856A (en) * 1995-09-07 1997-03-28 Nippon Telegr & Teleph Corp <Ntt> Intelligent camera equipment
JP2004023676A (en) * 2002-06-20 2004-01-22 Hitachi Kokusai Electric Inc Cctv set and state output method for cctv set
JP2004297271A (en) * 2003-03-26 2004-10-21 Kenwood Corp In-vehicle imaging system
JP2005184386A (en) * 2003-12-18 2005-07-07 Sony Corp Sound collecting/video recording device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011013605A1 (en) * 2009-07-27 2011-02-03 三洋電機株式会社 Presentation system
JP2013131919A (en) * 2011-12-21 2013-07-04 Canon Inc Imaging apparatus, control method thereof, and program
JP2017188715A (en) * 2016-04-01 2017-10-12 富士通フロンテック株式会社 Video display system and video display method
WO2019073558A1 (en) * 2017-10-11 2019-04-18 三菱電機株式会社 Air conditioning controller
JPWO2019073558A1 (en) * 2017-10-11 2020-01-23 三菱電機株式会社 Air conditioning controller
CN111183322A (en) * 2017-10-11 2020-05-19 三菱电机株式会社 Controller for air conditioning
KR20200120977A (en) * 2019-04-05 2020-10-23 현대자동차주식회사 Vehicle status management apparatus and method
KR102645054B1 (en) * 2019-04-05 2024-03-08 현대자동차주식회사 Vehicle status management apparatus and method
WO2023120244A1 (en) * 2021-12-24 2023-06-29 ソニーグループ株式会社 Transmission device, transmission method, and program

Also Published As

Publication number Publication date
JPWO2007145331A1 (en) 2009-11-12
JP4660592B2 (en) 2011-03-30

Similar Documents

Publication Publication Date Title
US10853629B2 (en) Method for identifying a user entering an autonomous vehicle
JP6604151B2 (en) Speech recognition control system
JP4660592B2 (en) Camera control apparatus, camera control method, camera control program, and recording medium
JP6466385B2 (en) Service providing apparatus, service providing method, and service providing program
US11450316B2 (en) Agent device, agent presenting method, and storage medium
US20200152203A1 (en) Agent device, agent presentation method, and storage medium
JP4556586B2 (en) Driving assistance device
KR20130046759A (en) 2013-05-08 Apparatus and method for recognizing driver command in a vehicle
JP2004354930A (en) Speech recognition system
JP6785889B2 (en) Service provider
JP2009113725A (en) Device, method and program for controlling instrument, and recording medium
US11841980B2 (en) Information output device and method of controlling information output based on vehicle occupancy
JP2009098217A (en) Speech recognition device, navigation device with speech recognition device, speech recognition method, speech recognition program and recording medium
US20220020374A1 (en) Method, device, and program for customizing and activating a personal virtual assistant system for motor vehicles
JP2005010035A (en) Navigation apparatus for vehicle
US20210061102A1 (en) Operation restriction control device and operation restriction control method
JP2018180424A (en) Speech recognition apparatus and speech recognition method
WO2022172393A1 (en) Voice recognition device and voice recognition method
JP7386076B2 (en) On-vehicle device and response output control method
JP2009086132A (en) Speech recognition device, navigation device provided with speech recognition device, electronic equipment provided with speech recognition device, speech recognition method, speech recognition program and recording medium
US20210302179A1 (en) Server device, control device, program, vehicle, and operation method of information processing system
WO2024029192A1 (en) Speech command reception device and speech command reception method
WO2023047654A1 (en) Vehicular recording control device and recording method
JP2023095046A (en) Information processing system
JP2006030908A (en) Voice recognition device for vehicle and moving body

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07745405

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2008521277

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07745405

Country of ref document: EP

Kind code of ref document: A1