GB2589950A - Information processing system, information processing apparatus, and program - Google Patents

Information processing system, information processing apparatus, and program

Info

Publication number
GB2589950A
GB2589950A GB2013486.2A GB202013486A GB2589950A GB 2589950 A GB2589950 A GB 2589950A GB 202013486 A GB202013486 A GB 202013486A GB 2589950 A GB2589950 A GB 2589950A
Authority
GB
United Kingdom
Prior art keywords
speaker
rotation angle
control unit
target person
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2013486.2A
Other versions
GB202013486D0 (en
Inventor
Taniyama Kazutoshi
Kato Kei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Client Computing Ltd
Original Assignee
Fujitsu Client Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Client Computing Ltd filed Critical Fujitsu Client Computing Ltd
Publication of GB202013486D0 publication Critical patent/GB202013486D0/en
Publication of GB2589950A publication Critical patent/GB2589950A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/302Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303Tracking of listener position or orientation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/02Casings; Cabinets ; Supports therefor; Mountings therein
    • H04R1/026Supports for loudspeaker casings
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/20Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/188Capturing isolated or intermittent images triggered by the occurrence of a predetermined event, e.g. an object reaching a predetermined position
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/20Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/323Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for loudspeakers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/13Aspects of volume control, not necessarily automatic, in stereophonic sound systems

Abstract

An information processing system includes an information processing apparatus, a camera, and a speaker. The camera captures an image of a person located in space. The speaker is not integrated with the camera, has audio directivity, and rotates based on an instruction from a control unit. The control unit in the information processing apparatus determines a target person from an image captured by the camera, detects the head location of the target person, and calculates the rotation angle of the speaker for outputting audio to the head location. In addition, the control unit selects an audio pattern to be outputted to the target person and causes the speaker to rotate at the rotation angle and output the selected audio pattern. Preferably, the control unit associates a two-dimensional image of the captured image with a three-dimensional (3D) space, detects the coordinates of the feet and the top of the head of the target person, and maps these coordinates to the 3D space. An ear location of the target person is then determined and used as the head location.

Description

INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING APPARATUS, AND PROGRAM Aspects of embodiments relate to information processing systems, information processing apparatus, and programs.
Recent years have seen advancement in information processing technology and in the resolution of monitoring cameras. Accordingly, systems have been developed for detecting a person from an image captured by a monitoring camera and outputting audio from a speaker. Installing such a system in a store or the like could serve as a countermeasure against suspicious persons entering the store and could enable store clerks to exchange messages, for example.
Japanese Laid-open Patent Publication No. 2017-215806 discusses a conventional system.
In the conventional system described above, a monitoring camera and a speaker are integrated and oriented in the same direction. However, to output audio to a person located in space, this system needs a plurality of sets of an integrated monitoring camera and speaker. The system scale is consequently increased, which is inefficient.
SUMMARY
It is desirable to provide an information processing system, an information processing apparatus, and a program that efficiently output an audio notification to a person located in predetermined space without increasing the system scale.
There is provided an information processing system including: a camera; a speaker which is not integrated with the camera, has directivity, and rotates; and a control unit which determines a target person from an image captured by the camera, detects a head location of the target person, calculates a rotation angle of the speaker for outputting audio to the head location, selects an audio pattern to be outputted to the target person, and causes the speaker to rotate at the rotation angle and output the audio pattern.
The disclosure is described, by way of example only, with reference to the figures, in which: FIG. 1 illustrates an example of an information processing system according to a first embodiment; FIG. 2 illustrates an example of a configuration of an information processing system according to a second embodiment; FIG. 3 illustrates an example of a configuration of a speaker; FIG. 4 illustrates an example of a hardware configuration of an information processing apparatus; FIG. 5 illustrates an example of an operation sequence performed from capturing of an image of a person to outputting of a verbal notification; FIG. 6 illustrates an example of an operation sequence performed from capturing of an image of a person to outputting of a verbal notification; FIG. 7 illustrates an example of an operation sequence performed from capturing of an image of a person to outputting of a verbal notification; FIG. 8 illustrates an example of an audio pattern table; FIG. 9 illustrates the locations of a camera and a person in three-dimensional (3D) space; FIG. 10 illustrates the locations of the camera and the person in the 3D space; FIG. 11 is a flowchart illustrating an example of an overall operation performed from detection of a person to outputting a verbal notification; FIG. 12 is a flowchart illustrating an example of head location detection processing; FIG. 13 is a flowchart illustrating an example of processing for estimating the moving velocity of a target person and updating a head location; FIG. 14 is a flowchart illustrating an example of processing for calculating the rotation angle of the speaker; FIG. 15 is a flowchart illustrating an example of processing for calculating a rotation angle associated with movement of the target person; FIG. 16 is a flowchart illustrating an example of an operation of rotating the speaker and outputting a verbal notification; and FIG. 17 is a flowchart illustrating an example of an operation of rotating the speaker and outputting a verbal notification.
DESCRIPTION OF EMBODIMENTS
Hereinafter, embodiments will be described with reference to drawings.
[First Embodiment] FIG. 1 illustrates an example of an information processing system according to a first embodiment. This information processing system 1-1 includes an information processing apparatus 1, a camera 2, and a speaker 3. The information processing apparatus 1 includes a control unit 1a and a storage unit 1b. The camera 2 monitors a person located in predetermined space and captures an image of the person. The speaker 3 is not integrated with the camera 2 and has audio directivity. In addition, the speaker 3 rotates and outputs audio based on an instruction from the control unit 1a.
The control unit 1a performs image analysis based on artificial intelligence (AI) processing on the image captured by the camera 2. In addition, based on the result of the image analysis, the control unit 1a performs rotation control processing and audio output processing on the speaker 3. The storage unit 1b holds various kinds of data needed for the processing performed by the control unit 1a.
For example, a processor (not illustrated in FIG. 1) included in the information processing apparatus 1 executes a predetermined program to perform the processing of the control unit 1a and the storage unit 1b. An operation of the control unit 1a will be described.
[Step S1] The control unit 1a determines a target person from an image captured by the camera 2.
[Step S2] The control unit 1a detects the head location of the target person.
[Step S3] The control unit 1a calculates the rotation angle of the speaker 3 in order to output audio to the head location.
[Step S4] The control unit 1a selects an audio pattern suitable for the target person.
[Step S5] The control unit 1a rotates the speaker 3 at the calculated rotation angle and causes the speaker 3 to output the selected audio pattern.
As described above, the information processing system 1-1 uses the speaker 3 that is not integrated with the camera 2, has directivity, and rotates. In addition, the information processing system 1-1 rotates the speaker 3 to the head location of the target person calculated from an image captured by the camera 2 and causes the speaker 3 to output audio to the target person. Since the number of speakers installed is consequently reduced, an audio notification is efficiently outputted to a person located in predetermined space without increasing the system scale.
[Second Embodiment] Next, a second embodiment will be described. In the following description, outputting an audio notification to a target person will be referred to as a verbal notification, as needed.
FIG. 2 illustrates an example of a configuration of an information processing system according to a second embodiment. This information processing system 1-2 includes an information processing apparatus 10, cameras 20-1 to 20-n (when these cameras 20- 1 to 20-n do not need to be distinguished from one another, any one of these cameras will simply be referred to as a camera 20), a speaker 30, a terminal 41 (for maintenance and management), a terminal 42 (for notification), an access point (AP) 50, a hub 61, and a power-over-Ethernet (PoE) hub 62 (Ethernet is a registered trademark).
The information processing apparatus 10 includes a control unit 11 and a storage unit 12. The control unit 11 has the functions of the control unit la in FIG. 1, and the storage unit 12 has the functions of the storage unit lb in FIG. 1. The speaker 30 has the functions of the speaker 3 in FIG. 1.
The hub 61 includes ports p1 to p4, and the PoE hub 62 includes a port p11 and ports p12-1 to p12-n. For example, the ports p1 to p4 and the port p11 are each connectable to a communication line of 1 gigabit per second (Gbit/s). For example, the ports p12-1 to p12-n are each connectable to a communication line of 100 megabits per second (Mbit/s).
The port p1 of the hub 61 and the port p11 of the PoE hub 62 are connected to each other via a local area network (LAN) cable L1. The PoE hub 62 supplies power via the LAN cable L1 of category 5e or higher used in Ethernet communication.
Thus, by connecting the cameras 20 to the PoE hub 62, no external power supply such as an alternating current (AC) adapter is needed. Namely, only the LAN cable L1 that performs data communication is needed to supply power. This configuration allows the cameras 20 to be installed even in locations where power is not supplied easily, such as outdoors and on ceilings.
Meanwhile, the port p2 of the hub 61 is connected to the terminal 41, and the port p3 of the hub 61 is connected to the information processing apparatus 10. In addition, the port p4 of the hub 61 is connected to the AP 50. The ports p12-1 to p12-n of the PoE hub 62 are connected to the cameras 20-1 to 20-n, respectively. The AP 50 is wirelessly connected to the terminal 42 and the speaker 30.
<Configuration of Speaker> FIG. 3 illustrates an example of a configuration of the speaker. The speaker 30 includes an audio output unit 31 and a rotation mechanism unit 32. The audio output unit 31 has an audio propagation function using ultrasound and outputs audio with directivity.
The rotation mechanism unit 32 has a biaxial rotation mechanism in the horizontal direction and the vertical direction. A horizontal-direction motor rotation mechanism of the rotation mechanism unit 32 rotates the audio output unit 31 in the horizontal direction, which is in a plus direction (an arrow h1) or a minus direction (an arrow h2) from 0 degrees on a horizontal axis h. In addition, a vertical-direction motor rotation mechanism of the rotation mechanism unit 32 rotates the audio output unit 31 in the vertical direction, which is in a plus direction (an arrow v1) or a minus direction (an arrow v2) from 0 degrees on a vertical axis v. The top surface of the rotation mechanism unit 32 has an attachment part 33 for attaching the speaker 30 to a wall or the like. The speaker 30 also includes a wireless LAN communication function not illustrated.
<Hardware Configuration> FIG. 4 illustrates an example of a hardware configuration of the information processing apparatus. The information processing apparatus 10 is comprehensively controlled by a processor (a computer) 100. The processor 100 realizes the functions of the control unit 11.
The processor 100 is connected to a memory 101, an input-output interface 102, and a network interface 104 via a bus 103. The processor 100 may be a multiprocessor. The processor 100 is, for example, a central processing unit (CPU), a micro processing unit (MPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a programmable logic device (PLD). Alternatively, the processor 100 may be a combination of at least two of a CPU, an MPU, a DSP, an ASIC, and a PLD.
The memory 101 includes the functions of the storage unit 12 and is used as a main storage device of the information processing apparatus 10. At least a part of an operating system (OS) program and an application program executed by the processor 100 is temporarily stored in the memory 101. In addition, various kinds of data needed for the processing performed by the processor 100 are stored in the memory 101.
The memory 101 is also used as an auxiliary storage device of the information processing apparatus 10, and the OS program, the application program, and various kinds of data are stored in the memory 101. Examples of the memory 101 as an auxiliary storage device include a semiconductor storage device such as a flash memory or a solid state drive (SSD) and a magnetic storage medium such as a hard disk drive (HDD).
Examples of the peripheral devices connected to the bus 103 include the input-output interface 102 and the network interface 104. The input-output interface 102 is connectable to a monitor (for example, a light emitting diode (LED) or a liquid crystal display (LCD)) functioning as a display device that displays a state of the information processing apparatus 10 in accordance with a command from the processor 100.
The input-output interface 102 is also connectable to an information input device such as a keyboard or a mouse and transmits signals sent from the information input device to the processor 100.
The input-output interface 102 also functions as a communication interface for connection of peripheral devices. For example, the input-output interface 102 is connectable to an optical drive that reads data stored in an optical disc by using laser light or the like. Examples of the optical disc include a digital versatile disc (DVD), a Blu-ray disc (registered trademark), a compact disc read-only memory (CD-ROM), and a CD-R (Recordable)/RW (Rewritable).
The input-output interface 102 is also connectable to a memory device and a memory reader and writer. The memory device is a recording medium having a function of communicating with the input-output interface 102. The memory reader and writer is a device that performs reading and writing of data on a memory card. The memory card is a card-type recording medium.
The network interface 104 is connected to a network and performs network interface control processing. For example, a network interface card (NIC), a wireless LAN card, or the like may be used. Data received by the network interface 104 is outputted to the memory 101 or the processor 100.
The above hardware configuration realizes the processing functions of the information processing apparatus 10. For example, by causing the processor 100 to execute an individual predetermined program, the processing according to the embodiment is executed by the information processing apparatus 10.
The information processing apparatus 10 performs the processing functions according to the embodiment, for example, by executing a program stored in a computer-readable storage medium. The program in which the processing contents executed by the information processing apparatus 10 are written may be stored in various kinds of storage media.
For example, the program executed by the information processing apparatus 10 may previously be stored in an auxiliary storage device. The processor 100 loads at least a part of the program in the auxiliary storage device to a main storage device and executes the program.
In addition, the program may previously be stored in a portable storage medium such as an optical disc, a memory device, or a memory card. For example, after the program stored in a portable storage medium is installed in an auxiliary storage device by the processor 100, the program becomes ready to be executed. The processor 100 may directly read the program from a portable storage medium and execute the program.
<Verbal Notification Operation Sequences> FIG. 5 illustrates an example of an operation sequence performed from capturing of an image of a person to outputting of a verbal notification. FIG. 5 illustrates an operation sequence performed when a verbal notification is outputted to a suspicious person.
[Step S11] A person enters a store.
[Step S11a] The camera 20 captures an image of the person who has entered the store.
[Step S11b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S11c] The control unit 11 analyzes the captured image by performing AI processing and detects and tracks the person.
[Step S12] The person exhibits some suspicious behavior.
[Step S12a] The camera 20 captures an image of the suspicious behavior of the person.
[Step S12b] The camera 20 transmits the captured image of the suspicious behavior to the control unit 11.
[Step S12c] The control unit 11 has previously recognized normal behavior (or suspicious behavior) patterns as behavioral patterns of people. The control unit 11 examines the behavioral pattern of the person based on the captured image received. If the control unit 11 detects a behavioral pattern that is not a normal behavior (or detects a suspicious behavior pattern), the control unit 11 determines that the person is a suspicious person.
[Step S13] The control unit 11 notifies the notification terminal 42 of the detection of the suspicious person.
[Step S14] The terminal 42 displays the entering of the suspicious person on a screen.
[Step S15] The control unit 11 performs three-dimensional (3D) space mapping processing, head location detection processing, rotation angle calculation processing, and audio pattern selection processing, to cause the speaker 30 to output a verbal notification to the suspicious person.
In the 3D space mapping processing, the location of the person is mapped to 3D space. In the head location detection processing, the coordinates of the head location of the person in the 3D space are detected. In the rotation angle calculation processing, the rotation angle of the speaker 30 is calculated so that the speaker 30 is oriented to the detected head location of the person.
In the audio pattern selection processing, an audio pattern outputted when the verbal notification is performed is selected from a plurality of audio sources (specific examples of the audio patterns will be described below with reference to FIG. 8).
[Step S16] The control unit 11 transmits a verbal notification command (the calculated rotation angle and the selected audio pattern) to the speaker 30.
[Step S17] The control unit 11 causes the speaker 30 to rotate at the instructed rotation angle based on the received verbal notification command.
[Step S18] The control unit 11 causes the speaker 30 to output the audio of the instructed audio pattern to the suspicious person based on the received verbal notification command. The suspicious person consequently notices the verbal notification.
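As a non-limiting illustration, the verbal notification command of step S16 can be modeled as a small data structure carrying the calculated rotation angle and the selected audio pattern. The Python sketch below is an assumption about the payload layout; the field names are placeholders, and only the contents (rotation angle and audio pattern) come from the description.

```python
from dataclasses import dataclass

@dataclass
class VerbalNotificationCommand:
    """Hypothetical payload sent from the control unit 11 to the speaker 30 (step S16)."""
    horizontal_angle: float  # rotation angle of the speaker in the horizontal direction, in degrees
    vertical_angle: float    # rotation angle of the speaker in the vertical direction, in degrees
    audio_file: str          # selected audio pattern, e.g. "audio file 1.wav" from the audio pattern table 12a

# Example: instruct the speaker to rotate and play the first registered audio pattern.
command = VerbalNotificationCommand(horizontal_angle=15.0, vertical_angle=-30.0,
                                    audio_file="audio file 1.wav")
```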
FIG. 6 illustrates an example of an operation sequence performed from capturing of an image of a person to outputting of a verbal notification. FIG. 6 illustrates an operation sequence performed when a verbal notification is outputted to a certain person. The certain person is a person other than a suspicious person and is, for example, a general customer who enters a store.
[Step S21] A person enters a store.
[Step S21a] The camera 20 captures an image of the person who has entered the store.
[Step S21b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S21c] The control unit 11 analyzes the captured image by performing AI processing and detects and tracks the certain person. The control unit 11 has previously recognized behavioral patterns of people. The control unit 11 examines the behavioral pattern of the certain person based on the captured image received. For example, if the control unit 11 detects a normal behavioral pattern, the control unit 11 determines that this person is the certain person.
[Step S22] The control unit 11 notifies the notification terminal 42 of the detection of the certain person.
[Step S23] The terminal 42 displays the entering of the certain person on a screen.
[Step S24] The control unit 11 performs 3D space mapping processing, head location detection processing, rotation angle calculation processing, and audio pattern selection processing, to cause the speaker 30 to output a verbal notification to the certain person.
[Step S25] The control unit 11 transmits a verbal notification command (the calculated rotation angle and the selected audio pattern) to the speaker 30.
[Step S26] The control unit 11 causes the speaker 30 to rotate at the instructed rotation angle based on the received verbal notification command.
[Step S27] The control unit 11 causes the speaker 30 to output the audio of the instructed audio pattern to the certain person based on the received verbal notification command. The certain person consequently notices the verbal notification.
FIG. 7 illustrates an example of an operation sequence performed from capturing of an image of a person to outputting of a verbal notification. FIG. 7 illustrates an operation sequence performed when a verbal notification is outputted to a certain person in a certain area. The certain person in the certain area is, for example, a store clerk on a sales floor in a store.
[Step S31] A person enters a store.
[Step S31a] The camera 20 captures an image of the person who has entered the store.
[Step S31b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S31c] The control unit 11 analyzes the captured image by performing AI processing, detects the certain person, and maps the certain person to 3D space. In addition, the control unit 11 tracks the certain person in the 3D space.
[Step S32] The person enters a certain area. [Step S32a] The camera 20 captures an image of the person in the certain area.
[Step S32b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S32c] The control unit 11 determines that the certain person is in the certain area.
[Step S33] The control unit 11 notifies the notification terminal 42 of the detection of the certain person in the certain area.
[Step S34] The terminal 42 displays the certain person in the certain area on a screen.
[Step S35] The control unit 11 performs head location detection processing, rotation angle calculation processing, and audio pattern selection processing, to cause the speaker 30 to output a verbal notification to the certain person.
[Step S36] The control unit 11 transmits a verbal notification command (the calculated rotation angle and the selected audio pattern) to the speaker 30.
[Step S37] The control unit 11 causes the speaker 30 to rotate at the instructed rotation angle based on the received verbal notification command.
[Step S38] The control unit 11 causes the speaker 30 to output the audio of the instructed audio pattern to the certain person in the certain area based on the received verbal notification command. The certain person in the certain area consequently notices the verbal notification.
<Audio Pattern> FIG. 8 illustrates an example of an audio pattern table. This audio pattern table 12a includes columns "persons", "audio files", and "audio patterns" (audio contents), and the data structure of this table is stored in the storage unit 12.
The table includes the following contents, for example. If the person is a suspicious person, audio file 1.wav, audio file 2.wav, and audio file 3.wav are registered as the audio files. The audio pattern of audio file 1.wav is "May I help you?", and the audio pattern of audio file 2.wav is "A customer is waiting in the HH area". The audio pattern of audio file 3.wav is "Thank you for your purchase".
In addition, if the person is a certain person (for example, a male in his 30's), audio file 4.wav is registered as the audio file. The audio pattern of audio file 4.wav is "We recommend product AA".
In addition, if the person is a certain person (for example, a store clerk) in a certain area, audio file 5.wav is registered as the audio file. The audio pattern of audio file 5.wav is "Please come to xx".
In this way, audio that is suitably used when a verbal notification is outputted to the target person is registered in the audio pattern table 12a.
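For illustration, the audio pattern table 12a can be held as a simple mapping from person categories to registered audio files and their contents. The entries below are taken from the example above; the Python dictionary layout itself is an assumption, not a structure given in the source.

```python
# Hypothetical in-memory form of the audio pattern table 12a (storage unit 12).
AUDIO_PATTERN_TABLE = {
    "suspicious person": [
        ("audio file 1.wav", "May I help you?"),
        ("audio file 2.wav", "A customer is waiting in the HH area"),
        ("audio file 3.wav", "Thank you for your purchase"),
    ],
    "certain person (male in his 30's)": [
        ("audio file 4.wav", "We recommend product AA"),
    ],
    "certain person in a certain area (store clerk)": [
        ("audio file 5.wav", "Please come to xx"),
    ],
}

# Example: select the first audio pattern registered for a suspicious person.
audio_file, audio_contents = AUDIO_PATTERN_TABLE["suspicious person"][0]
```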
<Locations of Camera and Person in 3D Space> FIGS. 9 and 10 illustrate the locations of a camera and a person in 3D space. FIG. 10 illustrates the image in FIG. 9 in an x-z plane. In FIG. 9, the feet of the target person are located at coordinates A (x1, y1, z1 = 0) in the 3D xyz space. In addition, the camera 20 is located at coordinates (x2, y2, z2).
In FIG. 10, the target person is located at coordinates (x1, z1), and the camera 20 is located at (x2, z2). In addition, the top of the head of the target person is located at coordinates (x1, H), and a point where an extension of the line from the camera 20 to the top of the head of the target person crosses the x axis is located at coordinates B (x3, z3 = 0).
<Flowcharts> Next, detailed operations will be described with reference to flowcharts in FIGS. 11 to 17. FIG. 11 is a flowchart illustrating an example of an overall operation performed from detection of a person to outputting a verbal notification.
[Step S41] The control unit 11 starts image analysis processing based on AI processing.
[Step S42] The control unit 11 detects a person from an image captured by the camera 20 and determines whether the detected person is a target person to whom a verbal notification needs to be outputted. If the detected person is a target person to whom a verbal notification needs to be outputted, the processing proceeds to step S43. If not, the control unit 11 repeats the person detection and determination processing.
[Step S43] The control unit 11 detects the head location of the target person in the 3D space.
[Step S44] The control unit 11 determines whether to predict the destination of the target person. If this prediction is performed, the processing proceeds to step S45. If not, the processing proceeds to step S46. [Step S45] The control unit 11 estimates the moving velocity of the target person and updates the head location.
[Step S46] The control unit 11 calculates the rotation angle of the speaker 30.
[Step S47] The control unit 11 selects an audio pattern suitable for the target person by using the audio pattern table 12a.
[Step S48] The control unit 11 determines whether to output a verbal notification while tracking the target person. If the control unit 11 needs to output a verbal notification while tracking the target person, the processing proceeds to step S49. If not, the processing proceeds to step S50a.
[Step S49] The control unit 11 calculates the rotation angle of the speaker 30 associated with movement of the target person. The processing proceeds to step S50b.
[Step S50a] The control unit 11 causes the speaker 30 to rotate at the rotation angle instructed by the control unit 11 and output a verbal notification to the target person based on the audio pattern instructed by the control unit 11.
[Step S50b] The control unit 11 causes the speaker 30 to rotate at the rotation angle that is associated with the movement of the target person and that is instructed by the control unit 11 and output a verbal notification to the target person based on the audio pattern instructed by the control unit 11.
FIG. 12 is a flowchart illustrating an example of the head location detection processing. FIG. 12 specifically illustrates step S43 in FIG. 11.
[Step S43a] The control unit 11 associates the camera screen corrected by calibration of the camera 20 with the 3D space.
[Step S43b] The control unit 11 detects the target person from the image captured by the camera 20 and acquires the coordinates of the target person in the captured image. If a person is detected, for example, the location of the person is indicated by a rectangle (rectangular information).
[Step S43c] The control unit 11 detects the coordinates of the feet of the target person from the rectangular information of the target person. For example, the coordinates of an intermediate point on the lower side of the rectangle indicating the location of the person are calculated as the coordinates of the feet.
[Step S43d] The control unit 11 maps the detected coordinates of the feet to the corresponding coordinates in the 3D space (corresponding to coordinates A in FIG. 9).
[Step S43e] The control unit 11 calculates the coordinates of the top of the head of the target person from the rectangular information of the target person. For example, the coordinates of an intermediate point on the upper side of the rectangle indicating the location of the person are calculated as the coordinates of the top of the head.
[Step S43f] The control unit 11 maps the coordinates of the top of the head in the 2D image (the captured image) to the 3D space by assuming the coordinates of the top of the head to be on the floor level in the 3D space (corresponding to the coordinates B in FIG. 10).
[Step S43g] The control unit 11 extracts the z component, which corresponds to the height H, at the x component of the coordinates A on the line between the coordinates B and the coordinates of the camera 20 (the y component may be used instead of the x component).
[Step S43h] The control unit 11 subtracts a predetermined length (for example, 20 centimeters) from the location of the extracted z component and sets this height to be a height H of the ears of the target person. [Step S43i] The control unit 11 sets the coordinates A whose z component has been changed to represent the height of the ears to be the coordinates of the head location and determines this head location as the coordinates to which the speaker 30 is oriented (as coordinates C).
As described above, the control unit 11 maps the captured image to the 3D space and calculates the head location of the target person in the 3D space. Next, the control unit 11 calculates the location of the ears from the location of the top of the head and uses this ear location as the head location. Since the control unit 11 consequently rotates the speaker 30 to this head location, the target person clearly hears the audio from the speaker 30.
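As a rough illustration of steps S43b to S43i, the sketch below computes the feet and top-of-head points from the detection rectangle, maps them to the floor plane through an assumed calibration function, recovers the top-of-head height H by the similar-triangle relation of FIG. 10, and lowers it by 20 centimeters to obtain the ear location. The function and argument names (head_location_from_rectangle, to_3d, and so on) are placeholders, not identifiers from the source.

```python
import numpy as np

def head_location_from_rectangle(rect, to_3d, camera_pos, ear_offset=0.2):
    """Estimate the coordinates C (ear location) of the target person.

    rect:       (left, top, right, bottom) of the detection rectangle in the 2D image
    to_3d:      assumed calibration function mapping a 2D image point to floor-level
                coordinates (x, y, 0) in the 3D space
    camera_pos: coordinates (x2, y2, z2) of the camera 20 in the 3D space
    ear_offset: assumed distance from the top of the head to the ears (about 20 cm)
    """
    left, top, right, bottom = rect
    feet_2d = ((left + right) / 2.0, bottom)      # intermediate point on the lower side (step S43c)
    head_2d = ((left + right) / 2.0, top)         # intermediate point on the upper side (step S43e)
    a = np.asarray(to_3d(feet_2d), dtype=float)   # coordinates A: feet on the floor (step S43d)
    b = np.asarray(to_3d(head_2d), dtype=float)   # coordinates B: top of head mapped to the floor (step S43f)
    cam = np.asarray(camera_pos, dtype=float)
    # Step S43g: on the line from B to the camera, take the point whose x equals A's x;
    # its z component is the top-of-head height H (the y component may be used instead of x).
    t = (a[0] - b[0]) / (cam[0] - b[0])
    height_h = t * cam[2]
    ear_height = height_h - ear_offset            # step S43h: ears about 20 cm below the top of the head
    return np.array([a[0], a[1], ear_height])     # step S43i: coordinates C toward which the speaker is oriented
```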
FIG. 13 is a flowchart illustrating an example of processing for estimating the moving velocity of the target person and updating the head location. FIG. 13 specifically illustrates step S45 in FIG. 11.
[Step S45a] The control unit 11 detects a plurality of sets of coordinates of the feet in the past few seconds of 2D images of the target person.
[Step S45b] The control unit 11 converts the detected past coordinates of the feet into coordinates in the 3D space. As a result, the control unit 11 obtains time-series coordinate data including the coordinates A. [Step S45c] The control unit 11 estimates the moving amounts of the target person in the 3D space in t seconds based on the time-series coordinate data. For example, the control unit 11 calculates the inter-coordinate moving velocities from the time-series coordinate data as vectors in the three directions of xyz and calculates an average value per component (moving velocity Va). By multiplying the moving velocity Va by time t, the control unit 11 estimates moving amounts dL in t seconds.
The length of t seconds corresponds to the delay time from the detection of the target person to outputting of audio. The embodiment assumes that the length of t seconds is calculated by a previous system test or the like and is previously held as a set value.
[Step S45d] The control unit 11 adds the moving amounts dL in t seconds in the xyz directions to the coordinates A. As a result, coordinates (A + dL) are obtained. The coordinates (A + dL) represent the feet of the target person to whom a verbal notification is to be outputted (the coordinates will be referred to as coordinates A2). In addition, by setting the z component of the coordinates A2 to the height H of the ears and using the resultant coordinates A2 as the post-movement head location, the control unit 11 is able to orient the speaker 30 to the target person (updates the coordinates C of the head location).
When a verbal notification is outputted to a person, a delay occurs between detection of the person and outputting of audio from the speaker 30, and this delay time needs to be taken into account. If the person is moving, the person may no longer be at the detected location when the audio is outputted.
As described above, the control unit 11 acquires time-series coordinate data by detecting a plurality of sets of coordinates of the feet of the target person from 2D images at certain time intervals and updates the head location based on the moving amounts calculated from the coordinate data. In this way, the post-movement location of the target person is accurately detected.
In addition, the control unit 11 calculates the moving amounts including the delay time from the detection of the target person to outputting of an audio pattern from the speaker 30. In this way, even if the person is moving, audio is outputted from the speaker 30 to the post-movement head location of the person. Namely, the accuracy of the verbal notification is improved.
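A minimal sketch of the prediction in step S45, assuming the past feet coordinates have already been converted to the 3D space: the inter-coordinate velocities are averaged per component to obtain Va, and the moving amounts dL for the delay time t are added to the latest coordinates A to obtain A2. The function and argument names are placeholders.

```python
import numpy as np

def predict_post_delay_feet(feet_history, timestamps, delay_t):
    """Estimate the moving velocity Va and the post-delay feet coordinates A2 (step S45).

    feet_history: past feet coordinates in the 3D space, oldest first, shape (n, 3)
    timestamps:   capture times of those coordinates, in seconds, shape (n,)
    delay_t:      delay time t from detection to audio output, held as a set value
    """
    coords = np.asarray(feet_history, dtype=float)
    times = np.asarray(timestamps, dtype=float)
    # inter-coordinate moving velocities as vectors in the x, y, z directions
    velocities = np.diff(coords, axis=0) / np.diff(times)[:, None]
    va = velocities.mean(axis=0)        # average per component: moving velocity Va
    dl = va * delay_t                   # moving amounts dL in t seconds
    a2 = coords[-1] + dl                # coordinates A2 = A + dL
    return a2, va
```

Setting the z component of A2 to the ear height H then gives the updated head location (coordinates C).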
FIG. 14 is a flowchart illustrating an example of processing for calculating the rotation angle of the speaker. FIG. 14 specifically illustrates step S46 in FIG. 11.
[Step S46a] The control unit 11 subtracts the coordinates of the installation location of the speaker 30 from the coordinates C (the head location) in the 3D space. This subtraction processing corresponds to performing vector conversion of the coordinates C in which the speaker 30 is used as the center. The subtraction result is referred to as a vector S. [Step S46b] The control unit 11 calculates the rotation angle in the horizontal direction (the horizontal rotation angle) from the horizontal direction components (the x and y components) of the vector S. The horizontal rotation angle is calculated by the following equation (1).
(HORIZONTAL ROTATION ANGLE) = arctan((x COMPONENT) / (y COMPONENT)) ... (1)
[Step S46c] The control unit 11 calculates an r component, which represents the rotation direction component when the speaker 30 is rotated at the horizontal rotation angle calculated by equation (1), from the x and y components. The r component is calculated by the following equation (2).
(r COMPONENT) = sqrt((x COMPONENT)^2 + (y COMPONENT)^2) ... (2)
[Step S46d] The control unit 11 calculates the rotation angle in the vertical direction (the vertical rotation angle) from the above r component and the z component, which represents the vertical direction component of the vector S. The vertical rotation angle is calculated by the following equation (3).
(VERTICAL ROTATION ANGLE) = arctan((r COMPONENT) / (z COMPONENT)) ... (3)
The control unit 11 calculates the horizontal and vertical rotation angles by using the calculation equations as described above. As a result, the rotation angle of the speaker 30 is calculated easily and accurately.
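A minimal sketch of the rotation angle calculation in step S46 under the equations above, assuming that the vertical rotation angle is measured as arctan(r / z), i.e., relative to the vertical direction; the function name and the use of degrees are assumptions.

```python
import math

def speaker_rotation_angles(head_c, speaker_pos):
    """Horizontal and vertical rotation angles of the speaker 30 toward the head location C."""
    # Vector S: coordinates C with the installation location of the speaker used as the center.
    sx, sy, sz = (head_c[i] - speaker_pos[i] for i in range(3))
    horizontal = math.degrees(math.atan2(sx, sy))  # equation (1): arctan(x component / y component)
    r = math.hypot(sx, sy)                         # equation (2): r = sqrt(x^2 + y^2)
    vertical = math.degrees(math.atan2(r, sz))     # equation (3): arctan(r component / z component)
    return horizontal, vertical
```

atan2 is used instead of a bare arctan so that the sign of each component selects the plus or minus rotation direction described for the rotation mechanism unit 32.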
FIG. 15 is a flowchart illustrating an example of processing for calculating a rotation angle associated with movement of the target person. FIG. 15 specifically illustrates step S49 in FIG. 11.
[Step S49a] The control unit 11 determines a reproduction time t2 of the selected audio pattern outputted when the verbal notification is performed.
[Step S49b] The control unit 11 multiplies the time t2 by the moving velocity Va to calculate a multiplication result as the moving amounts.
[Step S49c] The control unit 11 adds the calculated moving amounts to the coordinates A2 (the coordinates of the feet after the movement) and calculates the head location (coordinates Ca) by using the z component as the height H of the ears. The coordinates Ca represent the head location of the target person after the verbal notification is outputted.
[Step S49d] The control unit 11 subtracts the coordinates of the installation location of the speaker 30 from the coordinates Ca. This corresponds to performing vector conversion of the coordinates Ca in which the speaker 30 is used as the center. The subtraction result is referred to as a vector Sa.
[Step S49e] The control unit 11 calculates the rotation angle in the horizontal direction (the horizontal rotation angle) from the x and y components of the vector Sa by using equation (1).
[Step S49f] The control unit 11 calculates the r component, which represents the rotation angle direction in the horizontal direction, from the x and y components by using equation (2).
[Step S49g] The control unit 11 calculates the rotation angle in the vertical direction (the vertical rotation angle) from the r and z components of the vector Sa by using equation (3).
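A short sketch of step S49, reusing the speaker_rotation_angles sketch above: the feet coordinates A2 are advanced by Va for the reproduction time t2, the z component is replaced by the ear height H to obtain the coordinates Ca, and the second rotation angle is computed with the same equations. The names are placeholders.

```python
def rotation_angle_after_movement(feet_a2, va, t2, ear_height, speaker_pos):
    """Second rotation angle toward the head location Ca expected after reproduction time t2."""
    moved = [feet_a2[i] + va[i] * t2 for i in range(3)]   # feet after the verbal notification is outputted
    head_ca = (moved[0], moved[1], ear_height)            # coordinates Ca: z component set to the ear height H
    return speaker_rotation_angles(head_ca, speaker_pos)  # equations (1) to (3), as in step S46
```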
FIG. 16 is a flowchart illustrating an example of an operation of rotating the speaker and outputting a verbal notification. FIG. 16 specifically illustrates step S50a in FIG. 11. The control unit 11 uses the horizontal rotation angle of the speaker 30 calculated based on the coordinates C (the initial head location) described in FIG. 14 as a horizontal rotation angle a1 and uses the vertical rotation angle of the speaker 30 calculated based on the coordinates C as a vertical rotation angle b1.
[Step S50a1] The control unit 11 transmits the calculated horizontal rotation angle a1 and vertical rotation angle b1 (a first rotation angle) and the selected audio pattern to the speaker 30.
[Step S50a2] The control unit 11 causes the speaker 30 to rotate at the horizontal rotation angle a1 and the vertical rotation angle b1.
[Step S50a3] After rotating the speaker 30, the control unit 11 causes the speaker 30 to output a verbal notification to the target person based on the instructed audio pattern.
FIG. 17 is a flowchart illustrating an example of an operation of rotating the speaker and outputting a verbal notification. FIG. 17 specifically illustrates step S50b in FIG. 11. The control unit 11 uses the horizontal rotation angle of the speaker 30 calculated based on the coordinates Ca (the post-movement head location) described in FIG. 15 as a horizontal rotation angle a2 and uses the vertical rotation angle of the speaker 30 calculated based on the coordinates Ca as a vertical rotation angle b2.
[Step S50b1] The control unit 11 transmits the calculated horizontal rotation angle a1 and vertical rotation angle b1 (the first rotation angle) and the selected audio pattern to the speaker 30.
[Step S50b2] The control unit 11 transmits the calculated horizontal rotation angle a2 and vertical rotation angle b2 (a second rotation angle) and information about the time t2 to the speaker 30. The time t2 is the reproduction time of the audio pattern that takes the delay time into consideration as described above. [Step S50b3] The control unit 11 causes the speaker 30 to rotate at the horizontal rotation angle a1 and the vertical rotation angle b1 (the first rotation angle).
[Step S50b4] After the control unit 11 rotates the speaker 30 at the horizontal rotation angle a1 and the vertical rotation angle b1, the control unit 11 causes the speaker 30 to output a verbal notification based on the instructed audio pattern for the transmitted audio pattern reproduction time (time t2). In addition, the control unit 11 causes the speaker 30 to rotate at the horizontal rotation angle a2 and the vertical rotation angle b2 (the second rotation angle) while causing the speaker 30 to output the verbal notification.
[Step S50b5] Simultaneously with or after the completion of the verbal notification, the control unit 11 causes the speaker 30 to stop its rotation.
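The two-stage operation of steps S50b1 to S50b5 can be sketched as follows, assuming a hypothetical speaker client whose rotate() call starts a non-blocking rotation and whose play() call starts audio output; none of these method names come from the source.

```python
import time

def output_verbal_notification(speaker, first_angle, second_angle, audio_file, t2):
    """Rotate to the first rotation angle, then track the moving target person while the
    audio pattern of reproduction time t2 is outputted (steps S50b3 to S50b5)."""
    speaker.rotate(*first_angle)    # rotate at the first rotation angle (a1, b1)
    speaker.play(audio_file)        # start outputting the instructed audio pattern
    speaker.rotate(*second_angle)   # rotate toward the second rotation angle (a2, b2) during output
    time.sleep(t2)                  # t2: reproduction time that accounts for the movement of the person
    speaker.stop_rotation()         # stop rotation upon completion of the verbal notification
```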
As described above, the control unit 11 subtracts the coordinates of the installation location of the speaker 30 in the 3D space from the coordinates of the head location, to calculate a vector of the coordinates of the head location in which the speaker 30 is used as the center. The control unit 11 calculates the horizontal rotation angle of the speaker 30 based on the horizontal direction component of the vector.
Next, the control unit 11 calculates the vertical rotation angle of the speaker 30 based on the rotation direction component when the speaker 30 rotates at the horizontal rotation angle and the vertical direction component of the vector and uses the horizontal rotation angle and the vertical rotation angle as the rotation angle of the speaker 30. In this way, the control unit 11 accurately calculates the rotation angle of the speaker 30 having a biaxial rotation mechanism in the horizontal direction and the vertical direction.
In addition, the control unit 11 calculates the first rotation angle (the horizontal rotation angle a1 and the vertical rotation angle b1) for orienting the speaker 30 to the detected head location. When the destination of the target person is not predicted, the control unit 11 causes the speaker 30 to rotate at the first rotation angle and output an audio pattern.
When the destination of the target person is predicted, the control unit 11 calculates the second rotation angle (the horizontal rotation angle a2 and the vertical rotation angle b2) for orienting the speaker 30 to the updated head location. After causing the speaker 30 to rotate at the first rotation angle, the control unit 11 causes the speaker 30 to rotate at the second rotation angle while causing the speaker 30 to output the audio pattern.
In this way, since the speaker 30 is controlled along with the movement of the target person, even when the target person moves, the audio from the speaker 30 reaches the target person. Namely, the target person hears the verbal notification without fail.
As described above, the information processing system 1-2 according to the second embodiment uses the speaker 30 that is not integrated with the camera 20, has directivity, and rotates. In addition, the information processing system 1-2 rotates the speaker 30 toward the head location of the target person calculated from an image captured by the camera 20 and causes the speaker 30 to output audio to the target person. Since the number of speakers installed is consequently reduced, an audio notification is efficiently outputted to a person located in predetermined space without increasing the system scale.
The processing functions of the information processing systems 1-1 and 1-2 according to the above embodiments may be realized by a computer. In this case, a program is provided in which the processing contents of the functions of the information processing systems 1-1 and 1-2 are written. When the program is executed by a computer, the above processing functions are realized on the computer.
The program in which the processing contents are written may be stored in a computer-readable storage medium. Examples of the computer-readable storage medium include a magnetic storage unit, an optical disc, a magneto-optical storage medium, and a semiconductor memory.
For example, the magnetic storage unit is a hard disk device (HDD), a flexible disk (FD), or a magnetic tape. For example, the optical disc is a CD-ROM/RW. For example, the magneto-optical storage medium is a magneto optical (MO) disk.
One way to distribute the program is to make portable recording media such as CD-ROMs holding the program available. The program may be stored in a storage unit of a server computer and forwarded to other computers from the server computer via a network.
For example, a computer, which executes the program, stores the program recorded in a portable recording medium or forwarded from the server computer in its storage unit. Next, the computer reads the program from its storage unit and executes processing in accordance with the program. The computer may directly read the program from the portable recording medium and perform processing in accordance with the program.
In addition, each time the computer receives a program from a server computer connected to a network, the computer may execute processing in accordance with the received program. At least a part of the above processing functions may be realized by an electric circuit such as a DSP, an ASIC, or a PLD.
In one aspect, the embodiments enable efficiently outputting an audio notification to a person located in predetermined space without increasing the system scale.

Claims (8)

  1. 1. An information processing system comprising: a camera; a speaker which is not integrated with the camera, has directivity, and rotates; and a control unit which determines a target person from an image captured by the camera, detects a head location of the target person, calculates a rotation angle of the speaker for outputting audio to the head location, selects an audio pattern to be outputted to the target person, and causes the speaker to rotate at the rotation angle and output the audio pattern.
  2. 2. The information processing system according to claim 1, wherein the control unit associates a two-dimensional image of the captured image with three-dimensional space, detects coordinates of feet and coordinates of a top of a head of the target person from the two-dimensional image, maps the coordinates of the feet and the coordinates of the top of the head to the three-dimensional space, detects an ear location of the target person by subtracting a predetermined value from a height of the top of the head of the target person based on the coordinates of the top of the head mapped to the three-dimensional space, and uses the ear location as the head location of the target person.
  3. 3. The information processing system according to claim 2, wherein the control unit acquires time-series coordinate data by detecting a plurality of sets of coordinates of the feet of the target person from a plurality of the two-dimensional images at certain time intervals, calculates moving amounts of the target person in a predetermined time from the coordinate data, and updates the head location based on the moving amounts.
  4. 4. The information processing system according to claim 3, wherein the control unit previously holds a delay time from detecting of the target person to outputting of the audio pattern from the speaker and calculates the moving amounts by including the delay time in the predetermined time.
  5. 5. The information processing system according to claim 2, wherein the control unit subtracts coordinates of an installation location of the speaker in the three-dimensional space from coordinates of the head location to calculate a vector of the coordinates of the head location in which the speaker is used as a center, calculates a horizontal rotation angle of the speaker based on a horizontal direction component of the vector, calculates a vertical rotation angle of the speaker based on a rotation direction component when the speaker rotates at the horizontal rotation angle and a vertical direction component of the vector, and uses the horizontal rotation angle and the vertical rotation angle as the rotation angle of the speaker.
  6. 6. The information processing system according to claim 3, wherein the control unit calculates a first rotation angle for orienting the speaker to the head location detected, causes the speaker to rotate at the first rotation angle and output the audio pattern when a destination of the target person is not predicted, and calculates, when a destination of the target person is predicted, a second rotation angle for orienting the speaker to the head location updated, causes the speaker to rotate at the first rotation angle, and causes, after causing the speaker to rotate at the first rotation angle, the speaker to rotate at the second rotation angle while causing the speaker to output the audio pattern.
  7. 7. An information processing apparatus comprising: a control unit which determines a target person from an image captured by a camera, detects a head location of the target person, calculates a rotation angle of a speaker that is not integrated with the camera, has directivity, rotates, and is used for outputting audio to the head location, selects an audio pattern to be outputted to the target person, and causes the speaker to rotate at the rotation angle and output the audio pattern; and a storage unit that holds the audio pattern.
  8. 8. A computer program that causes a computer to execute a process comprising: determining a target person from an image captured by a camera; detecting a head location of the target person; calculating a rotation angle of a speaker that is not integrated with the camera, has directivity, rotates, and is used for outputting audio to the head location; selecting an audio pattern to be outputted to the target person; and causing the speaker to rotate at the rotation angle and output the audio pattern.
GB2013486.2A 2019-10-28 2020-08-27 Information processing system, information processing apparatus, and program Pending GB2589950A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2019195113A JP6767664B1 (en) 2019-10-28 2019-10-28 Information processing systems, information processing equipment and programs

Publications (2)

Publication Number Publication Date
GB202013486D0 GB202013486D0 (en) 2020-10-14
GB2589950A true GB2589950A (en) 2021-06-16

Family

ID=72745067

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2013486.2A Pending GB2589950A (en) 2019-10-28 2020-08-27 Information processing system, information processing apparatus, and program

Country Status (2)

Country Link
JP (1) JP6767664B1 (en)
GB (1) GB2589950A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2043390A1 (en) * 2006-07-14 2009-04-01 Panasonic Corporation Loudspeaker system
US20100027832A1 (en) * 2008-08-04 2010-02-04 Seiko Epson Corporation Audio output control device, audio output control method, and program
EP2574080A1 (en) * 2011-09-22 2013-03-27 Panasonic Corporation Sound reproducing device
EP3032847A2 (en) * 2014-12-08 2016-06-15 Harman International Industries, Incorporated Adjusting speakers using facial recognition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012205240A (en) * 2011-03-28 2012-10-22 Nikon Corp Electronic device and information transfer system
JP2013024534A (en) * 2011-07-26 2013-02-04 Panasonic Corp Situation recognition device
TW201707471A (en) * 2015-08-14 2017-02-16 Unity Opto Technology Co Ltd Automatically controlled directional speaker and lamp thereof enabling mobile users to stay in the best listening condition, preventing the sound from affecting others when broadcasting, and improving the convenience of use in life
JP6424341B2 (en) * 2016-07-21 2018-11-21 パナソニックIpマネジメント株式会社 Sound reproduction apparatus and sound reproduction system
JP2019041261A (en) * 2017-08-25 2019-03-14 株式会社 日立産業制御ソリューションズ Image processing system and setting method of image processing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2043390A1 (en) * 2006-07-14 2009-04-01 Panasonic Corporation Loudspeaker system
US20100027832A1 (en) * 2008-08-04 2010-02-04 Seiko Epson Corporation Audio output control device, audio output control method, and program
EP2574080A1 (en) * 2011-09-22 2013-03-27 Panasonic Corporation Sound reproducing device
EP3032847A2 (en) * 2014-12-08 2016-06-15 Harman International Industries, Incorporated Adjusting speakers using facial recognition

Also Published As

Publication number Publication date
JP6767664B1 (en) 2020-10-14
JP2021069079A (en) 2021-04-30
GB202013486D0 (en) 2020-10-14

Similar Documents

Publication Publication Date Title
US11106920B2 (en) People flow estimation device, display control device, people flow estimation method, and recording medium
US7667855B2 (en) Providing position information to computing equipment installed in racks of a datacenter
US10334965B2 (en) Monitoring device, monitoring system, and monitoring method
US8953036B2 (en) Information processing apparatus, information processing method, program, and information processing system
US20150103098A1 (en) Camera and Sensor Augmented Reality Techniques
JP6879379B2 (en) Customer service support equipment, customer service support methods, and programs
CN105222774A (en) A kind of indoor orientation method and user terminal
US10861169B2 (en) Method, storage medium and electronic device for generating environment model
US10964045B2 (en) Information processing device, information processing method, and individual imaging device for measurement of a size of a subject
US20220408014A1 (en) Machine learning operations on different location targets using camera orientation
US11315074B2 (en) Smart shelf system
GB2589950A (en) Information processing system, information processing apparatus, and program
JP2020126688A (en) Customer service support device, customer service support method, and program
CN113392676A (en) Multi-target tracking behavior identification method and device
JP2019061704A (en) Information processing device and information processing program
JP2021026724A (en) Information processing system, information processing device, information processing method and information processing program
CN111063011A (en) Face image processing method, device, equipment and medium
WO2023286292A1 (en) Information processing device, information processing method, and computer-readable medium
JP2008014825A (en) Method and program for measurement
JP2018196060A (en) Information processing device, information processing system, information processing method, and program
KR20230163043A (en) Smart merchandise display stand and operation method thereof
JP2021128581A (en) Information processor, information processing method, and program
JP2024017843A (en) Monitoring systems, methods and programs
JP4862577B2 (en) IC tag, product management system and product management method
JP2020086741A (en) Content selection device, content selection method, content selection system, and program