WO2017086108A1 - Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system - Google Patents

Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system Download PDF

Info

Publication number
WO2017086108A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
user
color
information
guidance
Prior art date
Application number
PCT/JP2016/081751
Other languages
French (fr)
Japanese (ja)
Inventor
伊藤 直之
靖寿 松葉
松本 征二
Original Assignee
大日本印刷株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2015224262A external-priority patent/JP6657827B2/en
Priority claimed from JP2016045573A external-priority patent/JP6607092B2/en
Priority claimed from JP2016046761A external-priority patent/JP6610349B2/en
Priority claimed from JP2016065426A external-priority patent/JP2017182261A/en
Application filed by 大日本印刷株式会社 filed Critical 大日本印刷株式会社
Publication of WO2017086108A1 publication Critical patent/WO2017086108A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Definitions

  • the present invention relates to an information presentation apparatus that performs a dialog with a user (human).
  • the present invention also relates to an information processing apparatus that performs processing according to a person's attribute (classification).
  • the present invention relates to a guidance robot control system and the like.
  • Patent Document 1 discloses an information presentation apparatus that receives voice input from a user, evaluates the user's reaction to its response processing based on the voice mode of the voice input, and changes the response content based on the evaluation result.
  • content to be uttered is determined in consideration of human utterance content, changes in facial expressions, and the like.
  • Patent Document 2 discloses an information processing apparatus that presents effective information by recognizing a person's age and gender and performing a presentation effect by voice or movement.
  • Patent Document 3 discloses an output control apparatus that controls the information to be output by referring to individual information determined based on the individual's image data. According to these prior arts, effective automated responses reflecting each person's information are realized.
  • A robot that provides a guidance service to visitors has also been disclosed (for example, Patent Document 4).
  • However, Patent Document 1 cannot realize a natural, human-like dialogue when starting a dialogue with a user (especially a first-time user).
  • In Patent Document 2, the age and sex of a person are discriminated only from an image.
  • Patent Document 3 also determines an attribute based only on the image data captured by the imaging unit. As described above, since the methods of Patent Documents 2 and 3 determine the attribute (classification) of a person based on a single type of data consisting only of images, there are few indexes for determining the attribute (classification), and in some cases the attribute (classification) of a person cannot be specified accurately. Further, the discrimination target is limited to attributes (classifications) that can be discriminated from an image alone.
  • Since the robot described in Patent Document 4 provides a guidance service according to a pattern related to the movement of a user such as a visitor, there was a possibility that the guidance service would be forcibly provided regardless of the user's intention.
  • The present invention has been made in view of the above-described problems, and an object of the present invention is to provide an information presentation apparatus and the like capable of a natural, human-like conversation when starting a conversation with a user.
  • The present invention has also been made from such a viewpoint, and another object is to provide an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the identified classification.
  • A further object of the present invention is to provide a guidance robot control system and the like that can call out to a user by voice in consideration of the user's intention.
  • A first invention for achieving the above object is an information presentation apparatus that interacts with a user, comprising: storage means that stores a color language table in which color information and language information are associated with each other; image input means that acquires an input image; area determination means that determines one or more predetermined areas from the image; color information extraction means that extracts the color information from the areas; and collation means that collates the color language table based on the color information extracted by the color information extraction means and acquires the language information. According to the first invention, a natural, human-like dialogue is possible when a dialogue with a user is started.
  • the first invention may further include speech content determining means for determining speech content based on the language information. This enables a dialogue based on the user's impression obtained from the color information.
  • the color information extracting means in the first invention may perform a color reduction process on each of the areas to extract one or a plurality of the color information representing the area. This makes it possible to accurately extract color information.
  • the language information in the first invention may be an impression word expressing an impression.
  • an accurate language that expresses the user's impression can be obtained.
  • the first invention may further include related word acquisition means for acquiring related words related to the language information. This makes it possible to obtain a wide range of languages that express user impressions.
  • The storage means in the first invention may further store a dialogue rule in which the area and the language information are associated with utterance content, and the utterance content determination means may determine the utterance content based on the dialogue rule.
  • The storage means in the first invention may further store a recommendation table in which the area and the language information are associated with recommendation information to be recommended to the user, and the utterance content determination means may further determine the recommendation information based on the recommendation table.
  • accurate recommendation information based on the user's impression can be presented.
  • A second invention is an information presentation method in which a computer storing a color language table, in which color information and language information are associated with each other, interacts with a user, the method comprising: an image input step of inputting the user's image; an area determination step of determining a predetermined area from the image; a color information extraction step of extracting the color information from the area; and a collation step of collating the color language table based on the color information extracted in the color information extraction step and acquiring the language information.
  • the third invention is a program that causes a computer to function as the information presentation device according to claim 1.
  • the information presentation apparatus of the first invention can be obtained by installing the program of the third invention on a general-purpose computer.
  • The present invention also provides an information processing apparatus comprising: storage means that stores features for each classification of persons; extraction means that extracts features of a person; specifying means that specifies the classification of the person based on the extracted features; and processing means that performs processing according to the specified classification.
  • Thereby, an information processing apparatus capable of suitably specifying a person's classification (person attributes such as preference, gender, age, language, and nationality) and executing processing according to the specified classification is provided.
  • the extraction unit extracts features using a plurality of different types of data.
  • the classification of a person can be suitably specified using a plurality of different types of data.
  • Desirably, the apparatus further includes video acquisition means for acquiring video data including the person, the extraction means extracts at least the features of the person from the video data, and the specifying means specifies the classification of the person based on at least those features. Accordingly, it is possible to suitably specify the classification of the person based on at least the features of the person extracted from the video data.
  • the extraction means may be configured to extract features related to the movement of the person.
  • the classification of the person can be suitably specified based on the characteristics of the movement of the person extracted from the video data.
  • an information presenting means for presenting information to a person may be further provided, and the extracting means may be configured to extract features related to the person's reaction to the information. Thereby, based on the characteristic of the reaction of the person at the time of presenting information, the person classification can be suitably specified.
  • Desirably, the apparatus further comprises sound acquisition means for acquiring sound data of the person, the extraction means extracts at least the features of the person from the sound data, and the specifying means identifies the classification of the person based on at least those features. Accordingly, it is possible to suitably specify the classification of the person based on at least the features of the person extracted from the audio data.
  • The extraction means may be configured to extract, as features, at least one of voice strength, voice pitch, frequency characteristics, voice quality, utterance tempo, harmonic characteristics, intonation, response sound characteristics, utterance language, and utterance content. Thereby, the features of a person are suitably extracted from the audio data.
  • Desirably, the apparatus further comprises odor acquisition means for acquiring odor data of the person, the extraction means extracts at least the features of the person from the odor data, and the specifying means identifies the classification of the person based on at least those features. Accordingly, it is possible to suitably specify the classification of the person based on at least the features of the person extracted from the odor data.
  • the classification of the person can be suitably specified based on the extracted feature.
  • A fifth invention is a guidance robot control system comprising: a guidance data storage unit that stores guidance data for a user; image acquisition means that acquires a plurality of frame images continuous in time series; direction determination means that analyzes the plurality of frame images acquired by the image acquisition means and determines, for each frame image, the direction the user is viewing; direction change amount calculation means that calculates, based on the directions determined by the direction determination means, a direction change amount representing the magnitude of the change in direction over time; guidance determination means that determines whether or not to provide guidance based on the direction change amount calculated by the direction change amount calculation means; and guidance means that, when the guidance determination means determines to provide guidance, extracts guidance data from the guidance data storage unit and performs guidance by a robot.
  • A sixth invention is the guidance robot control system of the fifth invention, further comprising distance calculation means that calculates the distance between the user and the robot based on the position information of the user in the frame images acquired by the image acquisition means, wherein the guidance determination means determines to provide guidance when the distance calculated by the distance calculation means is within a specified range.
  • A seventh invention is the guidance robot control system of the fifth or sixth invention, further comprising movement speed calculation means that calculates the moving speed of the user based on the position information of the user in the frame images acquired by the image acquisition means, wherein the guidance determination means determines to provide guidance when the moving speed calculated by the movement speed calculation means is within a specified range.
  • An eighth invention is the guidance robot control system of the seventh invention, wherein the robot is movable, the system further comprising movement direction calculation means that calculates the moving direction of the user based on the position information of the user in the frame images acquired by the image acquisition means, and movement control means that moves the robot based on the moving direction of the user calculated by the movement direction calculation means and the moving speed calculated by the movement speed calculation means.
  • A ninth invention is the guidance robot control system according to any one of the fifth to seventh inventions, wherein the direction determination means acquires the direction the user is viewing based on the relative position between the user's eyeball and pupil.
  • A tenth invention is a guidance robot control system further comprising face acquisition means for acquiring face data of the user, wherein the direction determination means detects facial organs based on the face data acquired by the face acquisition means, determines the direction of the face from their positional relationship, and thereby acquires the direction the user is looking at.
  • An eleventh invention is the guidance robot control system according to any one of the fifth to tenth inventions, further comprising facial expression determination means that analyzes at least one of the frame images acquired by the image acquisition means and determines the facial expression of the user, wherein the guidance data storage unit stores guidance data corresponding to facial expressions, and the guidance means refers to the guidance data storage unit, extracts guidance data based on the facial expression of the user determined by the facial expression determination means, and performs guidance.
  • A twelfth invention is the guidance robot control system of the eleventh invention, further comprising facial expression comparison means that, after guidance based on guidance data, executes at least the processing by the image acquisition means and the processing by the facial expression determination means and compares the facial expression of the user before guidance with the facial expression of the user after guidance, wherein the guidance means refers to the guidance data storage unit, extracts guidance data based on the comparison result by the facial expression comparison means, and performs guidance.
  • Another invention is a guidance robot control system further comprising attribute determination means that analyzes at least one of the frame images acquired by the image acquisition means and determines an attribute of the user, wherein the guidance data storage unit stores guidance data corresponding to user attributes, and the guidance means refers to the guidance data storage unit, extracts guidance data corresponding to the attribute of the user determined by the attribute determination means, and performs guidance.
  • A fourteenth invention is a guidance robot control system comprising: a guidance data storage unit that stores guidance data for a user; image acquisition means that acquires a plurality of time-series frame images; user direction acquisition means that analyzes the frame images acquired by the image acquisition means, identifies the user, and acquires the direction the user is viewing; movement direction calculation means that analyzes the frame images acquired by the image acquisition means and calculates the moving direction of the user; guidance determination means that determines whether or not to provide guidance based on the viewing direction of the user acquired by the user direction acquisition means and the moving direction of the user calculated by the movement direction calculation means; and guidance means that, when the guidance determination means determines to provide guidance, extracts guidance data from the guidance data storage unit and performs guidance by a robot.
  • a fifteenth aspect of the present invention is the guide robot control system according to the fourteenth aspect of the present invention, wherein the guide data storage unit stores guide data corresponding to a moving direction.
  • A sixteenth invention is the guidance robot control system of the fourteenth or fifteenth invention, further comprising distance calculation means for calculating the distance between the user and the robot, wherein the guidance determination means determines to provide guidance when the distance calculated by the distance calculation means is within a specified range.
  • A seventeenth invention is the guidance robot control system according to any one of the fourteenth to sixteenth inventions, further comprising movement speed calculation means for calculating the moving speed of the user, wherein the guidance determination means determines to provide guidance when the moving speed calculated by the movement speed calculation means is within a specified range.
  • An eighteenth invention is the guidance robot control system of the seventeenth invention, wherein the robot is movable, the system further comprising movement control means that moves the robot based on the moving direction calculated by the movement direction calculation means and the moving speed calculated by the movement speed calculation means.
  • A nineteenth invention is the guidance robot control system according to any one of the fourteenth to eighteenth inventions, further comprising face acquisition means that analyzes the frame images acquired by the image acquisition means and acquires face data of the user, wherein the user direction acquisition means acquires the direction the user is viewing based on the face data of the user acquired by the face acquisition means.
  • A further invention is a guidance robot control system wherein the user direction acquisition means detects facial organs based on the face data acquired by the face acquisition means and determines the direction of the face from their positional relationship to acquire the direction the user is looking at, or acquires the direction the user is looking at based on the relative position between the user's eyeball and pupil.
  • Another invention is a guidance robot control system further comprising: a face data storage unit that stores the face data of the user acquired by the face acquisition means when guidance data is output by the guidance means; and face collation means that collates the face data of the user acquired by the face acquisition means with the face data stored in the face data storage unit, wherein the guidance means extracts guidance data different from the previous guidance data from the guidance data storage unit and performs guidance when the face collation means successfully matches the acquired face data with face data stored in the face data storage unit.
  • A twenty-second invention is the guidance robot control system according to any one of the nineteenth to twenty-first inventions, further comprising facial expression determination means that determines the facial expression of the user based on the face data of the user acquired by the face acquisition means, wherein the guidance data storage unit stores guidance data corresponding to facial expressions, and the guidance means refers to the guidance data storage unit, extracts guidance data based on the facial expression of the user determined by the facial expression determination means, and performs guidance.
  • Another invention is a guidance robot control system further comprising facial expression comparison means that, after guidance based on guidance data, executes at least the processing by the image acquisition means, the processing by the face acquisition means, and the processing by the facial expression determination means, and compares the facial expression of the user before guidance with the facial expression of the user after guidance, wherein the guidance means refers to the guidance data storage unit, extracts guidance data based on the comparison result by the facial expression comparison means, and performs guidance.
  • Another invention is a guidance robot control system further comprising attribute determination means that analyzes at least one of the frame images acquired by the image acquisition means and determines an attribute of the user, wherein the guidance data storage unit stores guidance data corresponding to user attributes, and the guidance means refers to the guidance data storage unit, extracts guidance data corresponding to the attribute of the user determined by the attribute determination means, and performs guidance.
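  • As a hedged illustration of the guidance determination described in the claims above (not part of the patent; every threshold value below is an assumption), the following sketch guides the user only when the change in viewing direction, the distance to the robot, and the moving speed all fall within specified ranges.

```python
# Hedged illustration of the guidance determination: guide only when the change
# in the user's viewing direction over recent frames, the distance to the robot,
# and the moving speed all fall within specified ranges. All thresholds are
# illustrative assumptions, not values taken from the patent.
import math

def direction_change_amount(directions):
    """Sum of frame-to-frame changes (radians) in the viewing direction
    (angle wrap-around ignored for brevity)."""
    return sum(abs(b - a) for a, b in zip(directions, directions[1:]))

def should_guide(directions, user_pos, robot_pos, speed,
                 min_change=0.8, max_distance=3.0, max_speed=1.0):
    distance = math.dist(user_pos, robot_pos)
    return (direction_change_amount(directions) >= min_change   # user is looking around
            and distance <= max_distance                        # user is near the robot
            and speed <= max_speed)                             # user is not hurrying past
```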
  • According to the present invention, it is possible to provide an information presentation apparatus and the like capable of a natural, human-like dialogue when a dialogue with a user is started.
  • an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the identified classification.
  • a guidance robot control system capable of calling out in consideration of the user's intention.
  • A diagram showing an example of a color image scale in which three-color combinations and languages are associated, according to the first embodiment, and a diagram showing another example of such a color image scale.
  • A diagram showing a specific example of the color language table 14 of the first embodiment, and a diagram showing a specific example of the single color table 100 of the first embodiment.
  • A diagram showing a specific example of the recommendation table 120 of the first embodiment, and a diagram showing a specific example of a color image scale for forming the color language table 14 of the first embodiment.
  • A flowchart showing the operation of the information presentation apparatus 10 of the first embodiment.
  • A diagram showing an example of the system configuration of the information processing system 2001 of the second embodiment.
  • An internal block diagram of the robot 2002 of the second embodiment, an internal block diagram of the computer 2003 of the second embodiment, and a functional configuration diagram of the robot 2002 and the information processing system 2001 of the second embodiment.
  • A diagram illustrating the first feature 2030A of the second embodiment, and a diagram illustrating the second feature 2030B of the second embodiment.
  • A diagram showing the extracted person features (person feature 2040) of the second embodiment, and a diagram showing the classification table 2050 of the second embodiment.
  • FIG. 1 is a block diagram showing the configuration of an embodiment of the information presentation apparatus of the present invention.
  • the information presentation device 10 is a device that interacts with a human user, and is a robot, a computer, or the like.
  • The information presentation apparatus 10 includes: an image input unit 11 that inputs an image including the user's face; a region determination unit 12 that determines each region of the facial organs from the image; a color information extraction unit 13 that extracts color information from the regions; a storage unit 15 that stores a color language table 14 in which color information and language information are associated with each other; and a collation unit 16 that collates the color language table 14 based on the color information extracted by the color information extraction unit 13 and acquires language information.
  • The information presentation apparatus 10 may further include an utterance content determination unit 17 that determines utterance content based on the language information, and an output unit 18 that outputs the utterance content determined by the utterance content determination unit 17.
  • the information presentation apparatus 10 may include a microphone that inputs a user's voice.
  • the image input unit 11 includes a camera, a scanner, a storage media drive, a communication control device, and the like.
  • The image input unit 11 captures an image of the target user, reads a printed matter on which the target user is printed, reads a storage medium on which an image of the target user is stored, or receives an image of the target user transmitted from another computer.
  • the input image is temporarily stored in a RAM (Random Access Memory) of the information presentation device 10.
  • the area determination unit 12, the color information extraction unit 13, the collation unit 16, and the utterance content determination unit 17 are configured by a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
  • The CPU calls a program stored in the storage unit 15, ROM, recording medium, or the like into a work memory area on the RAM, executes it, and drives and controls each device connected via the bus, thereby realizing the processing performed by the information presentation apparatus 10, which is described later.
  • the ROM is a non-volatile memory, and permanently stores a program such as a boot program and BIOS of the information presentation apparatus 10, data, and the like.
  • the RAM is a volatile memory, and temporarily holds a program, data, and the like loaded from a storage unit, a ROM, a recording medium, and the like, and includes a work area used by the CPU for performing various processes.
  • the storage unit 15 is an HDD (Hard Disk Drive), a flash memory, or the like, and stores a program executed by the CPU, data necessary for program execution, an OS (Operating System), and the like.
  • a control program corresponding to the OS and an application program for causing the CPU to execute processing described later are stored.
  • Each program code is read by the CPU as necessary, transferred to the RAM, read by the CPU, and executed as various means.
  • the output unit 18 includes a speaker, a display device, a printer, a communication control device, and the like.
  • The output unit 18 outputs voice indicating the utterance content, displays characters indicating the utterance content, prints characters indicating the utterance content, or transmits data indicating the utterance content to another computer.
  • FIG. 2 is a block diagram showing a configuration of another embodiment of the information presentation apparatus of the present invention.
  • the information presentation device 20 illustrated in FIG. 2 includes a terminal device 30 used by a user, and a server 40 connected to the terminal device 30 via a network.
  • The terminal device 30 is a mobile terminal such as a smartphone or tablet, a PC (Personal Computer), or the like, and includes an image input unit 31 that inputs an image including the user's face and a transmission unit 39a that transmits data such as images to the server 40.
  • the terminal device 30 may include a receiving unit 39b that receives data indicating the utterance content from the server 40 and an output unit 32 that outputs the utterance content. Further, the terminal device 30 may include a microphone for inputting a user's voice.
  • The server 40 includes: a receiving unit 49b that receives data from the terminal device 30; a region determination unit 42 that determines each region of the facial organs from the image; a color information extraction unit 43 that extracts color information from the regions; a storage unit 45 that stores a color language table 44 in which color information and language information are associated with each other; and a collation unit 46 that collates the color language table 44 based on the color information extracted by the color information extraction unit 43 and acquires language information.
  • The server 40 may further include an utterance content determination unit 47 that determines utterance content based on the language information, and a transmission unit 49a that transmits the utterance content determined by the utterance content determination unit 47 to the terminal device 30.
  • The image input unit 31, output unit 32, region determination unit 42, color information extraction unit 43, color language table 44, storage unit 45, collation unit 46, and utterance content determination unit 47 have the same functions as the image input unit 11, output unit 18, region determination unit 12, color information extraction unit 13, color language table 14, storage unit 15, collation unit 16, and utterance content determination unit 17 shown in FIG. 1, respectively.
  • the transmission unit 39a transmits data to the server 40 via the communication line 21a.
  • the receiving unit 39b receives data from the server 40 via the communication line 21b.
  • the transmission unit 49a transmits data to the terminal device 30 via the communication line 21b.
  • the receiving unit 49b receives data from the terminal device 30 via the communication line 21a.
  • The information presentation apparatus 20 shown in FIG. 2 has the same configuration and operation as the information presentation apparatus 10 except for the transmission and reception of data, so a detailed description of the information presentation apparatus 20 is omitted.
  • FIG. 3 is a block diagram illustrating a configuration of the area determination unit 12.
  • The region determination unit 12 includes: a face detection unit 12a that detects the user's face from the image; a facial organ detection unit 12b that detects facial organs (eyes, mouth, nose, chin, and the like) from the face region detected by the face detection unit 12a; and a clothing region specifying unit 12c that specifies the region of the user's clothes from the image.
  • FIG. 4 is an explanatory diagram of face detection processing by the face detection unit 12a.
  • 50 is an image and 51 is a face area.
  • the face detection unit 12a detects a rectangular area as the face area 51 from the image 50 by using a face detection algorithm of the Viola-Jones method, for example.
  • Specifically, while sliding a search window (a determination area of, for example, 8 pixels × 8 pixels) over the image, the face detection unit 12a determines, for each position of the search window, whether the area is a face image by using a classifier constructed in advance by machine learning (AdaBoost) with Haar-like feature values.
  • the face detection algorithm is not limited to the Viola-Jones method, and ordinary techniques in this field can be used.
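  • As an illustrative sketch (not part of the patent text), a Viola-Jones style face detection of the kind described above can be reproduced with OpenCV's pretrained Haar cascade, which is an AdaBoost classifier over Haar-like feature values; the image path and detection parameters below are assumptions.

```python
# Hedged sketch of Viola-Jones face detection using OpenCV's pretrained Haar
# cascade (an AdaBoost classifier over Haar-like feature values). The image
# path and the detection parameters are illustrative assumptions.
import cv2

def detect_face_areas(image_path):
    image = cv2.imread(image_path)                      # image 50
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # the cascade expects grayscale
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # scaleFactor and minNeighbors control the sliding-window search granularity.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Each detection is a rectangular face area 51 given as (x, y, width, height).
    return list(faces)

if __name__ == "__main__":
    print(detect_face_areas("user.jpg"))
```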
  • FIG. 5 is an explanatory diagram of face organ detection processing by the face organ detection unit 12b.
  • In FIG. 5, 51 is a face area, 52 is an eye area, and 53 is a mouth area.
  • the face organ detection unit 12b specifies the positions (x, y) of feature points of each face organ such as eyebrows, eyes, nose, mouth, and jaw.
  • a normal technique in this field can be used as an algorithm for specifying the position of the feature point of each facial organ.
  • the facial organ detection unit 12b specifies the region of each facial organ based on the position (x, y) of the feature point. For example, the facial organ detection unit 12 b specifies a rectangular area including all the eye feature points as the eye area 52.
  • the facial organ detection unit 12 b specifies a rectangular area including all mouth feature points as the mouth area 53. Note that the facial organ detection unit 12b may specify not only the eye region 52 and the mouth region 53 but also the nose region, the hair region, the cheek region, and the like.
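  • The following hedged sketch shows one way to derive the eye area 52 and the mouth area 53 as rectangles enclosing all corresponding feature points, using dlib's 68-point landmark model as a stand-in for the facial organ detection unit 12b; the model file path is an assumption.

```python
# Hedged sketch: derive the eye area 52 and mouth area 53 as rectangles that
# enclose all corresponding feature points. Uses dlib's 68-point landmark model
# as a stand-in for the facial organ detection; the model file path is assumed.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def facial_organ_regions(image):
    """image: RGB or grayscale numpy array. Returns {organ: (left, top, right, bottom)}."""
    regions = {}
    for face in detector(image, 1):                      # face area 51
        shape = predictor(image, face)
        pts = np.array([(p.x, p.y) for p in shape.parts()])
        # 68-point convention: eyes are points 36-47, mouth is points 48-67.
        for name, idx in (("eye", slice(36, 48)), ("mouth", slice(48, 68))):
            xs, ys = pts[idx, 0], pts[idx, 1]
            regions[name] = (xs.min(), ys.min(), xs.max(), ys.max())
    return regions
```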
  • FIG. 6 is an explanatory diagram of the clothing area specifying process by the clothing area specifying unit 12c.
  • In FIG. 6, 50 is an image, 51 is a face area, and 54 is a clothes area.
  • the clothing region specifying unit 12c specifies the clothing region 54 based on the size and position (x, y) of the rectangular region of the face region 51 detected by the face detection unit 12a. For example, the clothing region specifying unit 12 c sets the lower end of the rectangular region of the face region 51 as the upper end of the clothing region 54 and sets the lower end of the image 50 as the lower end of the clothing region 54.
  • The clothing region specifying unit 12c sets, as the horizontal width of the clothing region 54, a value obtained by multiplying the horizontal width of the rectangular region of the face region 51 by a predetermined magnification, and determines the left and right edges of the clothing region 54 using the center position of the face region 51 as a reference.
  • Alternatively, the clothing region specifying unit 12c may set the boundaries between the background and the clothing in the left-right direction as the left end and the right end of the clothing region 54.
  • the algorithm for specifying the clothing region 54 is not limited to this, and ordinary techniques in this field can be used.
  • the clothing region specifying unit 12c may specify not only a jacket, but also a hat, glasses, pants, a bag, and the like.
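  • A minimal sketch of the geometric rule described above for the clothing region 54 follows, assuming an illustrative magnification of 3.0 (the patent text does not specify the value).

```python
# Minimal sketch of the clothing area 54 rule: its top is the bottom edge of the
# face rectangle, its bottom is the bottom of the image, and its width is the
# face width times a magnification, centered on the face. The magnification
# value 3.0 is an illustrative assumption.
def clothing_region(face, image_width, image_height, magnification=3.0):
    fx, fy, fw, fh = face                       # face area 51: (x, y, width, height)
    top = fy + fh                               # lower end of the face rectangle
    bottom = image_height                       # lower end of the image 50
    width = fw * magnification
    center_x = fx + fw / 2.0
    left = max(0, int(center_x - width / 2.0))
    right = min(image_width, int(center_x + width / 2.0))
    return (left, top, right, bottom)           # clothes area 54
```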
  • FIG. 7 is an explanatory diagram of color information extraction processing by the color information extraction unit 13.
  • In FIG. 7, 52 is an eye region, 53 is a mouth region, 54 is a clothing region, 55, 56, and 57 are color information, and 55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, and 57c are the single colors constituting the color information.
  • the color information extraction unit 13 performs a color reduction process on each of the eye area 52, the mouth area 53, and the clothes area 54, and extracts one or a plurality of color information 55 to 57 representing each area. In the example shown in FIG. 7, the color information extraction unit 13 extracts three colors as the color information 55 to 57.
  • The color information extraction unit 13 extracts one color (single color) or a plurality of colors from each area subjected to the color reduction processing (for example, an area reduced to five colors); for instance, one color or three colors may be extracted from each area.
  • The color reduction processing by the color information extraction unit 13 is performed by, for example, the k-means method. Specifically, to reduce a region to k colors (that is, to extract k colors representing the region), k representative colors are first selected at random from the region, and each pixel is assigned to the closest representative color so that all pixels are divided into k groups. The average color of the pixels in each group is then taken as a new representative color, and all pixels are divided into k groups again based on these representative colors. By repeating this reassignment and averaging, the region is reduced to k colors.
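  • A minimal sketch of this k-means color reduction is shown below, assuming the pixels of a region are given as an (N, 3) array of color values; the iteration count and initialization are illustrative choices, not the patent's implementation.

```python
# Minimal sketch of the k-means colour reduction described above. The iteration
# count and the random initialisation are illustrative assumptions.
import numpy as np

def reduce_colors(pixels, k=3, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly select k representative colours from the region's pixels.
    reps = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # Assign every pixel to the closest representative colour (k groups).
        dists = np.linalg.norm(pixels[:, None, :] - reps[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Replace each representative with the average colour of its group.
        for i in range(k):
            if np.any(labels == i):
                reps[i] = pixels[labels == i].mean(axis=0)
    return reps  # the k colours representing the region (e.g. color information 55)
```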
  • FIG. 8 is a diagram showing a specific example of a color image scale for forming the color language table 14.
  • In FIG. 8, 60 is a color image scale, and 61 to 63 are single colors arranged on the color image scale 60.
  • The color image scale 60 arranges single colors and languages corresponding to images on coordinate axes of warm (WARM)-cool (COOL) and soft (SOFT)-hard (HARD), which are criteria for judging images. For example, in the upper left (second quadrant) of the color image scale 60, single colors 61, 62, 63 and others are arranged; the language corresponding to the single color 61 is "sweet", the language corresponding to the single color 62 is "pretty", and the language corresponding to the single color 63 is "congratulations". In the color image scale 60, the languages associated with single colors include many adjectives such as "sweet" and "cute", because adjectives are most suitable as languages for expressing the impression given by a color.
  • It is desirable that the single colors arranged on the color image scale 60 differ from one another in hue and in one or both of lightness and saturation. That is, it is desirable to construct the single colors of the color image scale 60 from colors obtained by dividing the hue into, for example, red, yellow-red, yellow, yellow-green, green, blue-green, blue, purple-blue, purple, red-purple, and achromatic colors, and varying the lightness and/or saturation within each hue. This is because the impression that a color gives depends on the hue, lightness, and saturation that the color has. Note that the single colors and languages of the color image scale 60 shown in FIG. 8 are only representative; in practice, single colors and languages are arranged more finely.
  • FIGS. 9 and 10 are diagrams showing examples of color image scales in which three combination colors and languages are associated with each other.
  • In FIGS. 9 and 10, 80 and 90 are color image scales, and 81 and 91 are three-color combinations (color schemes) arranged on the color image scales 80 and 90, respectively.
  • the color image scale 80 shown in FIG. 9 has three combination colors (color schemes) and languages corresponding to the image arranged on a public-uplifting, orthodox-individual coordinate axis. For example, in the upper left (second quadrant) of the color image scale 80, the combination color 81 and the like are arranged, the language corresponding to the combination color 81 is “settled”, and the combination color 81 is a single color 81a, A single color 81b and a single color 81c are combined.
  • the color image scale 80 is applied, for example, to the color and language of a car.
  • the color image scale 90 shown in FIG. 10 has three combination colors (color schemes) and languages corresponding to the image arranged on the formal-casual and static-motion coordinate axes.
  • the combination color 91 and the like are arranged, the language corresponding to the combination color 91 is “elegant”, and the combination color 91 includes the single color 91a and the single color 91b. And a single color 91c.
  • This color image scale 90 is applied to, for example, the color and language of fashion (clothes).
  • FIG. 11 is a diagram showing a specific example of the color language table 14. As shown in FIG. 11, in the color language table 14, a combination color ID 14a, a single color ID 14b, a language 14c, and a related word 14d are associated with each other.
  • the combination color ID 14a is an ID for identifying a combination color arranged on a color image scale (see FIGS. 9 and 10) in which three combination colors and languages are associated with each other.
  • the single color ID 14b is composed of single color IDs of color scheme 1, color scheme 2 and color scheme 3 which are single colors.
  • the data of which the combination color ID 14a is “A00100” corresponds to the combination color in which the single color ID 14b of the color scheme 1 is “c3255”, the single color ID 14b of the color scheme 2 is “c4529”, and the single color ID 14b of the color scheme 3 is “c5043”.
  • the language 14c is “innocent”.
  • the related word 14d is a language related to the language 14c.
  • For example, the related words 14d of the language "innocent" include "innocence".
  • When the color language table 14 is formed from a color image scale in which single colors are associated with languages (see FIG. 8), a color language table 14 in which the single colors arranged on the color image scale are directly associated with languages is formed. Note that related words can be stored in advance in the storage means in association with the languages, but they can also be acquired, for example, by crawling Web pages on the Internet.
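  • As an illustration (not from the patent), the color language table 14 of FIG. 11 could be held in memory as rows linking a combination color ID 14a, the three single color IDs 14b of its color scheme, a language 14c, and related words 14d; the sketch below follows the example row given in the text.

```python
# Illustrative sketch of the color language table 14 of FIG. 11 as an in-memory
# structure. The row values follow the example given in the text; everything
# else is an assumption, not the patent's data format.
from dataclasses import dataclass, field

@dataclass
class ColorLanguageRow:
    combination_color_id: str            # combination color ID 14a
    single_color_ids: tuple              # single color IDs 14b (color schemes 1-3)
    language: str                        # language 14c (impression word)
    related_words: list = field(default_factory=list)   # related words 14d

color_language_table = [
    ColorLanguageRow("A00100", ("c3255", "c4529", "c5043"),
                     "innocent", ["innocence"]),
    # ... further rows for the other combination colours on the colour image scale
]
```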
  • FIG. 12 is a diagram showing a specific example of the single color table 100.
  • The single color table 100 indicates which single colors compose the color schemes 1 to 3 specified by the single color IDs 14b.
  • In FIG. 12, 100 is the single color table, 100a is a single color ID, and 100b is a color value.
  • The single color ID 100a is an ID for identifying a single color forming the color schemes 1 to 3 specified by the combination color ID 14a.
  • The color value 100b represents the single color specified by the single color ID as coordinate values in a color space; in FIG. 12, the values are x and y in the CIE xy chromaticity diagram of the XYZ color system (xyY color system).
  • The color space representing the color value 100b is not limited to the XYZ color system (xyY color system) of FIG. 12; various color systems such as the RGB color system, the L*u*v* color system, and the L*a*b* color system may be used as appropriate.
  • The collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted by the color information extraction unit 13 with the single colors or combination colors of the color language table 14 stored in the storage unit 15, and specifies a single color or combination color of the color language table 14 whose color value difference (distance in the color space) is equal to or less than a certain threshold (in the case of a combination color, whose color value differences between all the single colors are equal to or less than a certain threshold).
  • For example, suppose the combination color extracted by the color information extraction unit 13 is the color information 55 consisting of the single color 55a, the single color 55b, and the single color 55c shown in FIG. 7, and that in the color language table 14 the combination color whose combination color ID is N (combination color N) has single color ID NA for color scheme 1 (single color NA), single color ID NB for color scheme 2 (single color NB), and single color ID NC for color scheme 3 (single color NC).
  • To determine whether the combination color N is the same as or similar to the color information 55, the single colors of the color information 55 and the single colors of the combination color N are compared in the following six combinations, that is, the six possible pairings of the single colors 55a, 55b, and 55c with the single colors NA, NB, and NC.
  • Given the color value (xA, yA) of a single color A and the color value (xB, yB) of a single color B, the difference ΔC between the color values of the single color A and the single color B is obtained as the distance in the color space, for example ΔC = √((xA − xB)² + (yA − yB)²).
  • In this way, the collation unit 16 specifies, from among the combination colors in the color language table 14, a combination color that is the same as or close to the combination color extracted by the color information extraction unit 13.
  • The collation unit 16 then acquires the language (language information) associated with the combination color specified as being the same as or similar to the color information 55, together with its related words. For example, if the combination color 81 arranged on the color image scale 80 (see FIG. 9) is specified as the combination color that is the same as or similar to the color information 55, the collation unit 16 acquires the language "calm" associated with the combination color 81 and its related word.
  • The collation unit 16 may acquire a plurality of pieces of language information. Further, when one color (single color) and a language are associated in the color language table 14, the collation unit 16 compares the one color (single color) extracted by the color information extraction unit 13 with the single colors of the color language table 14, specifies a single color whose color value difference is equal to or less than a certain threshold, and acquires the language information associated with the specified single color. The collation unit 16 may also acquire related words not only from the color language table 14 but also from a dictionary database stored in the storage unit 15 or in another computer.
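  • The collation step can be sketched as follows (an assumption-laden illustration, not the patent's code): an extracted three-color combination matches a table row if some pairing of the three colors keeps every color value difference ΔC below a threshold; the threshold value and the Euclidean ΔC are assumptions, and each row is assumed to also carry the (x, y) values of its three single colors, looked up from the single color table 100.

```python
# Assumption-laden sketch of the collation step. The threshold value and the
# Euclidean ΔC are assumptions; each row is assumed to carry single_color_values,
# i.e. the (x, y) values of its three single colours from the single color table 100.
from itertools import permutations
import math

def delta_c(a, b):
    """Colour value difference between two (x, y) colour values."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_same_or_similar(extracted, candidate, threshold=0.05):
    """True if some pairing of the three single colours keeps every ΔC under the threshold."""
    return any(all(delta_c(e, c) <= threshold for e, c in zip(extracted, perm))
               for perm in permutations(candidate, 3))

def collate(extracted, table, threshold=0.05):
    """Return (language, related words) of the first matching combination colour."""
    for row in table:
        if is_same_or_similar(extracted, row.single_color_values, threshold):
            return row.language, row.related_words
    return None
```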
  • The utterance content determination unit 17 determines the utterance content from the various text templates stored in the storage unit 15, based on the language information and related words acquired by the collation unit 16. Specifically, the utterance content determination unit 17 determines the utterance content based on the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15.
  • FIG. 13 is a diagram showing a specific example of the dialogue rule 110.
  • In FIG. 13, 110 is a dialogue rule, 111 is a region, 112 is language information, and 113 is utterance content.
  • In the dialogue rule 110, utterance content 113 is stored in association with each combination of region 111 and language information 112.
  • For example, utterance content is associated with the case where the region 111 is "clothes" and the language information 112 is "high sense" (stylish).
  • The utterance content determination unit 17 may determine the utterance content by inserting the language information into a sentence template. For example, when the language information is "elegant", the utterance content determination unit 17 may insert the language information into a template such as "Those are very ... clothes!" to produce the utterance content "Those are very elegant clothes!".
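  • A hedged sketch of this template-based utterance determination follows; the table entries are assumptions modeled on the examples in the text.

```python
# Hedged sketch of template-based utterance determination: look up the utterance
# template by (region, language information) and insert the language information
# into it. The table entries below are illustrative assumptions.
dialogue_rule = {
    ("clothes", "elegant"): "Those are very {word} clothes!",
    ("eye", "cute"): "You have very {word} eyes!",
}

def determine_utterance(region, language_info):
    template = dialogue_rule.get((region, language_info))
    return template.format(word=language_info) if template else None

print(determine_utterance("clothes", "elegant"))  # -> Those are very elegant clothes!
```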
  • FIG. 14 is a diagram showing a specific example of the recommendation table 120.
  • In FIG. 14, 120 is a recommendation table, 121 is a region, 122 is language information, 123 is utterance content, and 124 is recommendation information.
  • the recommendation table 120 stores utterance contents 123 and recommendation information 124 in association with each other for each area 121 and language information 122.
  • For example, when the region 121 is "mouth" and the language information 122 is "chic", the utterance content 123 "If you like calm colors, how about this lipstick?" and the recommendation information 124 "lipstick A" are associated.
  • the recommendation information 124 may be associated with not only a character string but also an image, a moving image, a web page, and the like. For example, not only the character string “lipstick A” but also the product image, CM video, product introduction web page, and the like of the lipstick may be associated.
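  • Similarly, a hedged sketch of a recommendation table lookup returning both the utterance content and the recommendation information, with rich media attached to the recommendation, is shown below; all entries are illustrative assumptions.

```python
# Hedged sketch of the recommendation table 120: (region, language information)
# maps to utterance content 123 and recommendation information 124, where the
# recommendation may also carry an image or a product web page. All entries are
# illustrative assumptions modelled on the "mouth"/"chic" example in the text.
recommendation_table = {
    ("mouth", "chic"): {
        "utterance": "If you like calm colors, how about this lipstick?",
        "recommendation": {"name": "lipstick A",
                           "image": "lipstick_a.jpg",                  # product image
                           "page": "https://example.com/lipstick-a"},  # product web page
    },
}

def recommend(region, language_info):
    entry = recommendation_table.get((region, language_info))
    return (entry["utterance"], entry["recommendation"]) if entry else (None, None)
```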
  • FIG. 15 is a flowchart showing the operation of the information presentation apparatus 10.
  • the image input unit 11 inputs an image including the user's face (step S1).
  • the image input unit 11 is a camera, for example, and inputs an image including a user's face photographed by the camera.
  • the region determination unit 12 determines each region of the facial organ from the image input in step S1 (step S2). Specifically, the face detection unit 12a detects the face of the user from the image, and the face organ detection unit 12b detects the face organs (eyes, mouth, nose, chin, etc.) from the face area detected by the face detection unit 12a. Then, the clothes area specifying unit 12c specifies the area of the user's clothes from the image.
  • the color information extraction unit 13 extracts color information for each area specified in step S2 (step S3). Specifically, the color information extraction unit 13 performs color reduction processing on each of the facial organ area such as the eye area and the mouth area and the clothing area, and extracts one or more color information representing each area.
  • The collation unit 16 collates the color language table 14 based on the color information extracted in step S3, and acquires language information and related words (step S4). Specifically, the collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted in step S3 with the single colors or combination colors of the color language table 14, and specifies a single color or combination color of the color language table 14 whose color value difference is equal to or less than a certain threshold (in the case of a combination color, whose color value differences between all the single colors are equal to or less than a certain threshold). The collation unit 16 then acquires from the color language table 14 the language information associated with the specified same or similar single color or combination color.
  • The utterance content determination unit 17 determines the utterance content based on the language information acquired in step S4 (step S5). Specifically, the utterance content determination unit 17 determines the utterance content and recommendation information based on the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15. When the dialogue rule 110 and the recommendation table 120 yield a plurality of search results, the utterance content determination unit 17 selects one at random or according to a predetermined priority order. As the priority order, for example, ranking by region is conceivable: since eyes give a strong impression even between humans, the language information associated with the color information of the eye region may be given the highest priority. Alternatively, when the information to be recommended by the information presentation apparatus 10 relates to a specific region, for example when the information to be recommended is a lipstick, the language information associated with the color information of the mouth region may be given the highest priority.
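  • The priority-based selection can be sketched as follows; the region names and the default priority order are assumptions for illustration.

```python
# Hedged sketch of priority-based selection: eye-region matches rank highest by
# default, and when the information to recommend targets a specific region
# (e.g. lipstick -> mouth), that region is promoted. Region names and the
# default priority order are illustrative assumptions.
DEFAULT_PRIORITY = ["eye", "mouth", "clothes"]

def select_utterance(candidates, recommend_target_region=None):
    """candidates: list of (region, utterance) pairs matched from the tables."""
    priority = list(DEFAULT_PRIORITY)
    if recommend_target_region in priority:
        priority.remove(recommend_target_region)
        priority.insert(0, recommend_target_region)       # promote the target region
    ordered = sorted(candidates,
                     key=lambda c: priority.index(c[0]) if c[0] in priority
                     else len(priority))
    return ordered[0][1] if ordered else None
```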
  • the output unit 18 outputs the utterance content and recommendation information determined in step S5 (step S6). Specifically, the output unit 18 outputs a sound indicating the utterance content from a speaker and displays characters indicating the utterance content and recommendation information on the display device.
  • When the information presentation apparatus 10 includes a microphone for inputting the user's voice, the user's voice may be input and the processing may be repeated from step S1 in consideration of the user's voice (continuing the dialogue).
  • As described above, the information presentation apparatus 10 (20) enables a natural, human-like dialogue when starting a dialogue with a user.
  • The information presentation apparatus 10 (20) extracts color information for each facial organ and for the clothes, and determines the utterance content based on the language information associated with that color information. Therefore, even with a first-time user, it is possible to start a conversation based on the first impression received from the other party, just as humans do when meeting for the first time.
  • FIG. 16 is a diagram illustrating an example of a system configuration of the information processing system 2001.
  • The information processing system 2001 includes a robot 2002, which is an embodiment of the information processing apparatus according to the present invention, as well as a computer 2003, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007.
  • a robot 2002, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007 are communicatively connected to a computer 2003, and controlled by the computer 2003 so that the operations of the respective devices cooperate.
  • the form of communication connection between the robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, the display 2007, and the computer 2003 may be wired or wireless.
  • the camera 2004 acquires surrounding video data.
  • the camera 2004 is, for example, a single camera, a stereo camera, a distance sensor, or a composite sensor that combines these.
  • video data can be acquired using a camera 2210 built in the robot 2002 instead of the camera 2004. In this case, the camera 2004 may be omitted.
  • the microphone 2005 acquires surrounding audio data.
  • the microphone 2005 is, for example, a microphone array including a single microphone or a plurality of microphones.
  • the microphone 2005 can also acquire voice data using the microphone 2211 built in the robot 2002. In this case, the microphone 2005 may be omitted.
  • the odor sensor 2006 acquires ambient odor data.
  • the odor sensor 2006 is, for example, a sensor composed of a single crystal resonator type or semiconductor type sensor, or a sensor array composed of a plurality of sensors.
  • the odor data can be acquired using the odor sensor 2212 incorporated in the robot 2002. In this case, the odor sensor 2006 may be omitted.
  • The robot 2002 extracts the features of a person using a plurality of different types of data, namely the video data acquired from the camera 2004 (or the camera 2210), the audio data acquired from the microphone 2005 (or the microphone 2211), and the odor data acquired from the odor sensor 2006 (or the odor sensor 2212), and identifies the classification of the person (person attributes such as preference, gender, age, language, and nationality) based on the extracted features. The robot 2002 then performs processing according to the identified classification (processing such as dialogue, information guidance, and advertisement).
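  • As an assumed illustration of this multimodal identification (not the patent's algorithm), features extracted from the different data types can be matched against a classification table such as the classification table 2050, with the best-matching classification selected:

```python
# Assumed illustration of identifying a person's classification from multimodal
# features: features extracted from video, audio, and odor data are matched
# against stored features per classification, and the best match is selected.
# The feature values and table layout are illustrative assumptions.
def identify_classification(extracted, classification_table):
    """extracted: e.g. {"video": "wears glasses", "audio": "Japanese", "odor": "perfume"}
    classification_table: {classification: {data type: stored feature, ...}, ...}"""
    def score(stored):
        return sum(1 for kind, feature in stored.items()
                   if extracted.get(kind) == feature)
    best = max(classification_table, key=lambda c: score(classification_table[c]))
    return best if score(classification_table[best]) > 0 else None
```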
  • the display 2007 is controlled by the computer 2003 and displays an image or the like.
  • the display 2007 may be omitted when display control is performed using the display unit 2204 mounted on the robot 2002, or when the robot 2002 executes processing such as dialogue, information guidance, and advertisement only by voice.
  • The computer 2003 controls the acquisition of video data by the camera 2004, the acquisition of audio data by the microphone 2005, the acquisition of odor data by the odor sensor 2006, and the display control of the display 2007, and also controls the operation of the robot 2002.
  • the robot 2002 (information processing apparatus) is, for example, an interactive robot. As shown in FIG. 17, the robot 2002 includes a control unit 2201, a storage unit 2202, an input unit 2203, a display unit 2204, a media input / output unit 2205, a communication I / F 2206, a movable unit drive mechanism 2207, and a peripheral device I / F unit. 2208, an audio processing unit 2209, a camera 2210, a microphone 2211, an odor sensor 2212, a speaker 2213, and various sensors 2214 are connected via a bus 2220.
  • the control unit 2201 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
  • the CPU calls a program stored in the storage unit 2202, ROM, recording medium, or the like to a work memory area on the RAM and executes it, and drives and controls each unit connected via the bus 2220.
  • the CPU of the control unit 2201 executes all processes by the robot 2002.
  • the ROM permanently stores programs such as a boot program for the robot 2002, a BIOS, data, and the like.
  • the RAM temporarily stores the loaded program and data, and includes a work area used by the control unit 2201 to perform various processes.
  • the storage unit 2202 stores a program executed by the control unit 2201, data necessary for program execution, an operating system, and the like. These program codes are read by the control unit 2201 as necessary, transferred to the RAM, and read and executed by the CPU.
  • the storage unit 2202 also includes a classification table 2050 (FIG. 24) that is referred to when specifying a person's classification, a processing database 2060 (FIG. 25) that is referred to when executing a process according to the classification, and the like.
  • the storage unit 2202 also has an action DB (not shown) in which a control sequence table describing the control sequences of the robot 2002 is stored.
  • the control sequence table stores data in which all actions performed by the robot 2002 are associated with the movable part control sequence of the robot 2002.
  • the movable part control sequence is a list of driving positions and values of the movable parts to be controlled when realizing each action (for example, head tilt angle 20 degrees, right arm tilt angle 40 degrees, etc.).
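  • As an illustration of the control sequence table described above, the sketch below shows one possible in-memory representation: a mapping from action names to lists of movable part drive targets. The class names, action names, and values are hypothetical and are not part of the embodiment.

```python
# Minimal sketch of an action DB entry; all names are hypothetical illustrations.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MovablePartTarget:
    part: str          # movable part to control, e.g. "head" or "right_arm"
    angle_deg: float   # drive target value, e.g. a tilt angle in degrees

# Control sequence table: each action name maps to a list of drive targets.
control_sequence_table: Dict[str, List[MovablePartTarget]] = {
    "bowing": [
        MovablePartTarget(part="head", angle_deg=20.0),
        MovablePartTarget(part="waist", angle_deg=15.0),
    ],
    "waving_hand": [
        MovablePartTarget(part="right_arm", angle_deg=40.0),
    ],
}

def execute_action(action: str) -> None:
    """Read the movable part control sequence for an action and send each
    target to the (here simulated) movable part drive mechanism."""
    for target in control_sequence_table[action]:
        print(f"drive {target.part} to {target.angle_deg} degrees")

execute_action("bowing")
```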
  • the input unit 2203 includes an input device such as a touch panel and various operation buttons, for example, and outputs input data to the control unit 2201.
  • the display unit 2204 includes, for example, a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays the display information input under the control of the control unit 2201 on the display device. Note that the touch panel of the input unit 2203 is configured integrally with the display of the display unit 2204.
  • the media input / output unit 2205 is a drive device for recording media such as a magnetic disk, an optical disk, or a semiconductor memory, and performs data input / output.
  • the communication I / F 2206 is an interface that mediates communication with the network 2008.
  • the network 2008 includes a LAN (Local Area Network), a WAN (Wide Area Network) connected to a wider area, or a public communication line such as the Internet, a base station, and the like.
  • the communication connection with the network 2008 may be wired or wireless.
  • the robot 2002 can access a Web server via the network 2008 and can transmit and receive various programs and data.
  • the movable part drive mechanism 2207 includes a mechanism and a drive device for driving a movable part provided on the head, neck, shoulder, arm, waist, leg, and the like of the robot 2002.
  • the operation of the movable unit driving mechanism 2207 is controlled by the control unit 2201.
  • the control unit 2201 reads the movable part control sequence from the action DB of the storage unit 2202, sends a control signal to each movable part drive mechanism 2207, and operates each drive part.
  • Peripheral device I / F (interface) unit 2208 is a port for connecting a peripheral device, and mediates transmission / reception of data with the peripheral device.
  • the connection form with the peripheral device may be wired or wireless.
  • the robot 2002 can be connected to a computer 2003 or the like via the peripheral device I / F unit 2208.
  • the speech processing unit 2209 includes a speech synthesis unit, a language discrimination unit, a speech recognition unit, and the like.
  • the speech synthesizer generates synthesized speech data corresponding to character information (text data) based on a command from the control unit 2201 and outputs synthesized speech from the speaker 2213.
  • the synthesized speech is generated based on the correspondence between character information and speech data stored in advance in the utterance database of the storage unit 2202.
  • the language discrimination unit executes spoken-language discrimination processing on a person's speech data using, for example, a language discrimination model that statistically models differences in phoneme types, phoneme arrangement patterns within words, prosodic patterns, vocabulary, and the like, and discriminates the language the person is speaking.
  • the speech recognition unit performs speech recognition processing on the speech data whose language has been discriminated, using statistical models (an acoustic model and a language model) for speech recognition prepared for each language, and recognizes the utterance content.
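  • The two-stage flow described above (language discrimination first, then speech recognition with models for the discriminated language) can be pictured with the following minimal sketch. The discriminator and recognizer functions are placeholders, since the embodiment does not specify concrete models.

```python
# Sketch of the two-stage speech processing flow:
# 1) discriminate the spoken language, 2) recognize with that language's models.
# The discriminator/recognizer functions are placeholders, not a real API.

def discriminate_language(speech_data: bytes) -> str:
    """Placeholder for the statistical language discrimination model
    (phoneme types, phoneme arrangement, prosody, vocabulary)."""
    return "ja"  # e.g. Japanese

def recognize_speech(speech_data: bytes, language: str) -> str:
    """Placeholder for per-language acoustic + language models."""
    recognizers = {"ja": lambda d: "konnichiwa", "en": lambda d: "hello"}
    return recognizers[language](speech_data)

def process_utterance(speech_data: bytes) -> str:
    language = discriminate_language(speech_data)   # language discrimination unit
    return recognize_speech(speech_data, language)  # speech recognition unit

print(process_utterance(b"..."))
```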
  • the camera 2210 is built into the robot 2002 and is, for example, a single camera such as a CCD (Charge-Coupled Device) camera, a stereo camera, a distance sensor, or a combination of these, and acquires surrounding video data.
  • the acquired video data is sent to the control unit 2201.
  • the microphone 2211 is built into the robot 2002 and is, for example, a single microphone or a microphone array including a plurality of microphones, and acquires ambient audio data.
  • the acquired audio data is sent to the control unit 2201.
  • the odor sensor 2212 is built into the robot 2002 and is, for example, a single crystal oscillator type or semiconductor type sensor, or a sensor array including a plurality of such sensors, and acquires ambient odor data.
  • the acquired odor data is sent to the control unit 2201.
  • the speaker 2213 outputs the synthesized speech generated by the speech synthesis unit of the speech processing unit 2209.
  • the various sensors 2214 include sensors such as a 3D sensor, a touch sensor, a gyro sensor, a sonar sensor, a laser sensor, a bumper sensor, and an infrared sensor. Each sensor inputs the detected signal to the control unit 2201.
  • the bus 2220 is a path that mediates transmission / reception of control signals, data signals, and the like.
  • a computer 2003 is a control device that controls the overall operation of the information processing system 2001.
  • the computer 2003 controls the acquisition of video data by the camera 2004, audio data by the microphone 2005, and odor data by the odor sensor 2006, and controls the display of the display 2007.
  • the computer 2003 also controls the operation of the robot 2002.
  • the computer 2003 includes a control unit 2301, a storage unit 2302, a media input / output unit 2303, a peripheral device I / F unit 2304, an input unit 2306, a display unit 2307, and a communication I / F 2308, which are connected via a bus 2309.
  • the control unit 2301 includes a CPU, a ROM, a RAM, and the like.
  • the CPU calls a program stored in the storage unit 2302, ROM, recording medium, or the like to a work memory area on the RAM and executes it, and drives and controls each unit connected via the bus 2309.
  • the CPU of the control unit 2301 performs shooting control of the camera 2004, voice acquisition control of the microphone 2005, odor acquisition control of the odor sensor 2006, and display control of the display 2007 in the processing executed by the robot 2002 (FIG. 26).
  • the ROM permanently stores a program such as a boot program for the computer 2003, a BIOS, data, and the like.
  • the RAM temporarily holds the loaded program and data, and includes a work area used by the control unit 2301 to perform various processes.
  • the storage unit 2302 stores a program executed by the control unit 2301, data necessary for program execution, an operating system, and the like. These program codes are read by the control unit 2301 as necessary, transferred to the RAM, and read and executed by the CPU.
  • the media input / output unit 2303 is a drive device for recording media such as a magnetic disk, an optical disk, or a semiconductor memory, and performs data input / output.
  • Peripheral device I / F (interface) unit 2304 is a port for connecting a peripheral device, and transmits / receives data to / from the peripheral device via peripheral device I / F unit 2304.
  • the connection form with the peripheral device may be wired or wireless.
  • a robot 2002, a camera 2004, a microphone 2005, an odor sensor 2006, a display 2007, and the like can be connected to a computer 2003 via a peripheral device I / F unit 2304.
  • the input unit 2306 is an input device such as a keyboard, a mouse, or a touch panel, and outputs input data to the control unit 2301.
  • the display unit 2307 includes a display device such as a liquid crystal panel and a logic circuit for executing display processing in cooperation with the display device, and displays display information input by the control of the control unit 2301 on the display device.
  • the touch panel of the input unit 2306 is configured integrally with the display of the display unit 2307.
  • the display unit 2307 may be an external display 2007.
  • the control unit 2301 sends display information to the display unit 2307 or the display 2007 to display it.
  • the communication I / F 2308 is an interface that mediates communication with the network 2008.
  • a computer 2003 can transmit and receive various programs and data from a Web server on the network 2008.
  • the bus 2309 is a path that mediates transmission / reception of control signals, data signals, and the like.
  • the robot 2002 is provided with the functions of a person identification unit 2020, a video acquisition unit 2021, an audio acquisition unit 2022, an odor acquisition unit 2023, a feature extraction unit 2024, a classification identification unit 2025, a processing unit 2026, and an action unit 2029. The processing of each function is executed by a program. This program may be stored in advance in the storage unit 2202 of the robot 2002 or the storage unit 2302 of the computer 2003, or may be downloaded and installed via the network 2008.
  • the person specifying unit 2020 specifies a target person for which the robot 2002 performs processing such as dialogue, information guidance, and advertisement.
  • the target person is preferably a person who is interested in the robot 2002 or a person who wants to interact with the robot 2002.
  • the target person is a person who talks to the robot 2002 or touches the robot 2002.
  • a person who talks to the robot 2002 is specified, for example, by detecting that the arrival direction of the person's utterance voice, estimated using the microphone 2211 built into the robot 2002 or the external microphone 2005, coincides with the face direction of the person in the video data acquired by the camera 2210 built into the robot 2002 or the external camera 2004.
  • a person who has touched the robot 2002 is specified, for example, by detecting a person close to the robot 2002 from the video data acquired by the camera 2210 built into the robot 2002 or the external camera 2004, and detecting the person's contact action with the touch sensor of the robot 2002.
  • the target person may be a person who is approaching the robot 2002 or a person who is simply reflected in the video data, in addition to a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
  • the target person may be specified as one group including a plurality of persons.
  • a target person (one group) is specified by determining which persons belong to the same group based on the relationship between the positions and movements of the plurality of persons in the video data.
  • the video acquisition unit 2021 acquires video data including the person specified by the person specifying unit 2020 by the camera 2210 built in the robot 2002 or the external camera 2004.
  • the voice acquisition unit 2022 acquires the voice data of the person specified by the person specifying unit 2020 using the microphone 2211 built in the robot 2002 or the external microphone 2005.
  • when the microphone 2211 or the microphone 2005 is a microphone array whose directivity can be controlled, it is desirable to acquire the person's voice data with the directivity of the microphone 2211 or the microphone 2005 directed toward the person's face.
  • the odor acquisition unit 2023 acquires the odor data of the person specified by the person specifying unit 2020 by the odor sensor 2212 built in the robot 2002 or the external odor sensor 2006.
  • when the odor sensor 2212 or the odor sensor 2006 is a sensor array whose directivity can be controlled, it is desirable to acquire the person's odor data with the directivity of the odor sensor 2212 or the odor sensor 2006 directed toward the person.
  • the feature extraction unit 2024 extracts the features of the person specified by the person specifying unit 2020 using a plurality of different types of data. Specifically, it extracts the features of the person using the video data acquired by the video acquisition unit 2021, the audio data acquired by the audio acquisition unit 2022, and the odor data acquired by the odor acquisition unit 2023.
  • in the following, the feature of the person extracted based on the video data is referred to as the "first feature", the feature extracted based on the audio data as the "second feature", and the feature extracted based on the odor data as the "third feature". Specific examples of the first, second, and third features are described below.
  • the first feature is a feature of a person extracted from video data, and includes, as shown in FIG. 20, for example, "motion features" (motion 1, motion 2, ...), "appearance features" (appearance 1, appearance 2, ...), and "reaction features" (reaction 1, reaction 2, ...).
  • "motion features" are features related to the movement of a person, for example, how the person walks (walking speed, stride, arm swing, posture, etc.), obtained by applying motion analysis to the video data. When the target is a group, they are the average walking characteristics of the persons belonging to the group and the positional relationship between the persons (one row, two rows, single file, etc.).
  • "appearance features" are features related to the appearance of a person, for example, the gender, age, eye color, hair color, and the like of the person, obtained by applying face image analysis to the video data.
  • for the face image analysis, a known method (Japanese Patent Application Laid-Open No. 2012-053813, etc.) may be used.
  • facial expressions and emotions may be extracted as appearance features. The facial expressions are classified into several patterns such as joy, surprise, anger, sadness, and no facial expression. These facial expressions can be read from the rising angle of the mouth corners, the positional relationship of the cheek, eyes, eyebrows, and the like. The emotion can be estimated from the read facial expression.
  • a method disclosed in Japanese Patent Laid-Open Nos. 10-255043 and 2014-206903 may be used.
  • a person's clothing type (suit, dress, folk costume, ...), obtained by applying a known clothing identification analysis to the video data, may also be extracted as an appearance feature.
  • when the target is a group, the feature extraction unit 2024 extracts the average appearance features of the persons belonging to the group.
  • the “reaction feature” is a feature related to the reaction of the person to the presentation of information performed by the robot 2002 to the person.
  • the presentation of information includes, for example, information such as weather information and news displayed on the display unit 2204 and the display 2007, and also a greeting for a person.
  • for a greeting, synthesized speech corresponding to a predetermined greeting phrase ("Good morning", "Hello", "How are you?", etc.) is generated by the audio processing unit 2209 and output (read aloud) from the speaker 2213.
  • alternatively, a movable part control sequence of an action corresponding to a predetermined greeting action ("waving a hand", "bowing", ...) is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive part is operated.
  • the feature extraction unit 2024 applies motion analysis, facial expression analysis, and the like to the video data acquired by the video acquisition unit 2021 after the information is presented, and extracts the person's reaction to the information presentation ("smiling", "becoming serious", "coming closer", "turning away", "no response", etc.) as a feature.
  • when the target is a group, the feature extraction unit 2024 extracts the average reaction features of the persons belonging to the group.
  • the second feature is a feature of the person extracted from the voice data.
  • for example, as shown in FIG. 21, the second feature includes "voice strength", "voice pitch", "frequency characteristics", "voice quality" (bright voice, dark voice, thick voice, faint voice, etc.), "speech tempo" (speaking slowly, speaking hurriedly, etc.), "overtone characteristics" (characteristics related to the softness of the sound), "intonation", and "reaction sound characteristics" (characteristics of utterance sounds such as "um", "mm", and "eh" that show some response to the other party's utterances).
  • speech linguistic information such as “utterance language” and “utterance content” may be extracted as features.
  • the "utterance language" is extracted by the speech processing unit 2209 of the robot 2002 by analyzing the person's speech data using a language discrimination model that statistically models differences in phoneme types, phoneme arrangement patterns within words, prosodic patterns, vocabulary, and the like. Further, the "utterance content" is extracted by the speech processing unit 2209 of the robot 2002 by applying speech recognition processing to the voice data input from the microphone 2211.
  • the third feature is a feature of a person extracted from odor data, and includes, as shown in FIG. 22, for example, "odor type" (type 1, type 2, ...) and "odor intensity" (intensity 1, intensity 2, ...).
  • the odor type is specified by analyzing using an odor discrimination model for discriminating the odor type, which is acquired by learning a large amount of odor data in which the odor type is known in advance.
  • the intensity of odor can be calculated from the absolute value of odor data (input value) acquired by the odor sensor 2006 (or odor sensor 2212).
  • the classification specifying unit 2025 specifies the person classification 2051 based on the above-described features (first feature, second feature, and third feature) extracted by the feature extraction unit 2024.
  • the classification 2051 is a grouping of persons by arbitrary attributes (preference, gender, age, language, nationality, etc.). As shown in the classification table 2050 in FIG. 24, each classification 2051 (classification 1, classification 2, ...) is defined by the first feature, the second feature, and the third feature.
  • the features that define each classification 2051 (classification 1, classification 2, ...) are determined by extracting the features (first feature, second feature, and third feature) from a large number of persons whose classification 2051 is known in advance and learning them statistically. This makes it possible to grasp statistically what kind of features (first feature, second feature, third feature) a person belonging to each classification 2051 tends to have.
  • for example, it can be seen that a person belonging to "classification 1" statistically tends to have "motion 3", "appearance 10", "reaction 6", and so on as the first feature, "(voice) strength 2", "(voice) pitch 9", "frequency characteristic 1", and so on as the second feature, and "(odor) type 5" and "(odor) intensity 4" as the third feature.
  • similarly, a person belonging to "classification 2" statistically tends to have "motion 5", "appearance 2", "reaction 1", and so on as the first feature, "(voice) strength 4", "(voice) pitch 7", "frequency characteristic 3", and so on as the second feature, and "(odor) type 3" and "(odor) intensity 1" as the third feature.
  • a person belonging to "classification 3" statistically tends to have "motion 2", "appearance 7", "reaction 3", and so on as the first feature, "(voice) strength 6", "(voice) pitch 1", "frequency characteristic 4", and so on as the second feature, and "(odor) type 2" and "(odor) intensity 8" as the third feature.
  • the classification specifying unit 2025 refers to the classification table 2050 and specifies the person's classification based on the features extracted by the feature extraction unit 2024. For example, for all the features extracted by the feature extraction unit 2024, the classification 2051 (classification 1, classification 2, ...) to which each feature belongs is counted, and the classification 2051 with the largest count is identified as the person's classification.
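  • A minimal sketch of the majority-vote style identification described above follows. The feature-to-classification mapping is a toy stand-in for the classification table 2050, with invented feature names.

```python
from collections import Counter
from typing import Dict, List

# Toy stand-in for the classification table 2050: each observed feature is
# mapped to the classification it is (statistically) associated with.
classification_table: Dict[str, str] = {
    "motion 3": "classification 1", "appearance 10": "classification 1",
    "(voice) strength 2": "classification 1",
    "motion 5": "classification 2", "appearance 2": "classification 2",
    "(odor) type 3": "classification 2",
}

def identify_classification(extracted_features: List[str]) -> str:
    """Count, for every extracted feature, the classification it belongs to,
    and return the classification with the largest count."""
    votes = Counter(classification_table[f]
                    for f in extracted_features if f in classification_table)
    if not votes:                 # assume at least one feature usually matches
        return "unknown"
    classification, _count = votes.most_common(1)[0]
    return classification

features = ["motion 3", "appearance 10", "(odor) type 3"]
print(identify_classification(features))  # -> "classification 1"
```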
  • the processing unit 2026 performs processing based on the processing data 2061 of the classification 2051 specified by the classification specifying unit 2025.
  • "processing" includes processing in which the robot 2002 interacts with the person, listening and answering (dialogue and the like), and processing in which the robot 2002 unilaterally provides information to the person (advertisement and the like).
  • FIG. 25 is a diagram showing the contents of the processing database 2060 that holds the processing data 2061. As shown in the figure, the processing database 2060 holds processing data 2061 (processing data 1, processing data 2, ...) for each classification 2051 (classification 1, classification 2, ...).
  • the processing data 2061 is, for example, video data, audio data, text data, operation data, or any combination of these data.
  • when the processing data 2061 is video data, the video is displayed on the display unit 2204 of the robot 2002 or the display 2007.
  • when it is audio data, the audio is output from the speaker 2213 of the robot 2002.
  • when it is text data, the voice processing unit 2209 generates synthesized speech of the text data and outputs it from the speaker 2213 (or the text data is displayed on the display unit 2204 of the robot 2002 or the display 2007).
  • when it is operation data, a movable part control sequence corresponding to each action included in the operation data is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive part is operated.
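  • The dispatch by data type described above might look like the following sketch. The output helpers only print; in the embodiment they would correspond to the display unit 2204 / display 2007, the speaker 2213, the voice processing unit 2209, and the movable part drive mechanism 2207. The data layout is an assumption for illustration.

```python
# Sketch of dispatching processing data 2061 by type.
def execute_processing_data(item: dict) -> None:
    kind, payload = item["type"], item["payload"]
    if kind == "video":
        print(f"display video: {payload}")             # display unit / display
    elif kind == "audio":
        print(f"play audio: {payload}")                # speaker 2213
    elif kind == "text":
        print(f"synthesize and speak: {payload}")      # voice processing unit 2209
    elif kind == "motion":
        print(f"run control sequence for: {payload}")  # action DB + drive mechanism
    else:
        raise ValueError(f"unknown processing data type: {kind}")

processing_data_1 = [
    {"type": "text", "payload": "Welcome! May I help you?"},
    {"type": "motion", "payload": "bowing"},
]
for item in processing_data_1:
    execute_processing_data(item)
```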
  • the robot 2002 identifies a target person for processing such as dialogue, information guidance, and advertisement (step S2000).
  • in steps S2001 to S2003, the robot 2002 extracts the features of the person identified in step S2000 using a plurality of different types of data.
  • the robot 2002 acquires video data and extracts a first feature of a person (step S2001). For example, as described above, the “motion characteristics”, “appearance characteristics”, “reaction characteristics”, etc. of the person (see FIG. 20) are extracted from the video data.
  • for reaction features, predetermined information is presented to the person before step S2001, and in step S2001 the person's reaction to the information presentation is extracted as a feature.
  • the robot 2002 acquires voice data and extracts the second feature of the person (step S2002). For example, as described above, “voice strength”, “voice pitch”, “frequency characteristics”, “voice quality”, “speech tempo”, “overtone characteristics”, “inflection”, “reaction sound”, “speech language” of a person from voice data “Speech contents” and the like (see FIG. 21) are extracted.
  • the robot 2002 acquires odor data and extracts a third feature of the person (step S2003). For example, as described above, a person's “scent type”, “scent intensity”, and the like (see FIG. 22) are extracted from the scent data.
  • the robot 2002 refers to the classification table 2050 and identifies the person's classification 2051 based on the features of the person (first feature, second feature, and third feature) extracted in steps S2001 to S2003 (step S2004). For example, for all the features extracted in steps S2001 to S2003, the classification 2051 (classification 1, classification 2, ...) to which each feature belongs is counted, and the classification 2051 with the largest count is specified as the person's classification.
  • the robot 2002 refers to the processing database 2060, acquires the processing data 2061 of the classification 2051 specified in step S2004, and executes predetermined processing based on the processing data 2061 (step S2005). For example, when the specified classification 2051 is "classification 1", the robot 2002 executes processing based on "processing data 1" (see FIG. 25) corresponding to "classification 1".
  • although the information processing apparatus has been described in the form of a robot, the application is not limited to robots; it can also be applied to vending machines, digital signage, and the like.
  • various functions and various data included in the information processing system 2001 are included in the control unit 2201 and the storage unit 2202 in the robot 2002.
  • in steps S2001 to S2003 in FIG. 26, it is not always possible to extract the features of a person from all types of data, that is, video data, audio data, and odor data.
  • the person classification may be specified based on the characteristics of the person extracted from the video data and the odor data.
  • the person classification may be specified based on the characteristics of the person extracted from the video data and the audio data.
  • the classification table 2050 may be relearned. Specifically, after step S2004 in FIG. 26, processing for confirming the correctness of the classification 2051 specified by the robot 2002 for the person is added. For example, it is assumed that the classifications 2051 (classification 1, classification 2, ...) are grouped by "nationality" and the specified classification 2051 of the person is "France". In this case, the robot 2002 asks the person a question confirming the correctness of the specified classification 2051, such as "Where are you from?".
  • when the person's answer indicates otherwise, the robot 2002 judges that the specified classification 2051 ("France") was an error, and relearns the classification table 2050 using the correct classification 2051 ("Italy") and the features (first feature, second feature, and third feature) extracted in steps S2001 to S2003.
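  • One way to picture the confirmation-and-relearning step is the sketch below. The answer handling and the table update rule are illustrative assumptions; the embodiment only states that the classification table 2050 is relearned from the correct classification and the extracted features.

```python
from typing import Dict, List

# Toy classification table: classification -> features learned so far.
classification_table: Dict[str, List[str]] = {
    "France": ["motion 3", "(voice) strength 2"],
    "Italy": ["motion 5"],
}

def confirm_and_relearn(predicted: str, confirmed: str,
                        extracted_features: List[str]) -> None:
    """If the person's answer contradicts the predicted classification,
    associate the extracted features with the confirmed classification
    instead (a crude stand-in for relearning the classification table)."""
    if confirmed != predicted:
        classification_table.setdefault(confirmed, []).extend(extracted_features)

# The robot predicted "France"; the person answered that they are from Italy.
confirm_and_relearn("France", "Italy", ["motion 2", "(odor) type 2"])
print(classification_table["Italy"])
```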
  • FIG. 27 is a diagram illustrating an exemplary scene suitable for operation of the guidance robot control system 3100 according to the third embodiment.
  • the guide robot control system 3100 is a system using a communication robot 3001 (guide robot).
  • the communication robot 3001 is disposed in a space 3000S such as in an exhibition hall or a store, for example. Note that the communication robot 3001 is desirably arranged at a position where the space 3000S can be looked over.
  • the communication robot 3001 is, for example, a humanoid robot having a height of about 1 m. The communication robot 3001 acquires and analyzes frame images including the users P in the space 3000S (users P1 to P3 in the example of FIG. 27), determines the user P to be called out to, and makes a voice call (guidance) to that user P. In the example described below, the communication robot 3001 estimates that a user P who is looking around restlessly is in trouble, and determines that user P as the voice call target.
  • FIG. 28 is a functional block diagram of a communication robot 3001 according to the third embodiment.
  • FIG. 29 is a diagram illustrating an example of the guidance data storage unit 3022 of the communication robot 3001 according to the third embodiment.
  • the communication robot 3001 includes a control unit 3010, a storage unit 3020, a camera 3031 (imaging unit), a speaker 3032, and a motor unit 3033.
  • the control unit 3010 is a central processing unit (CPU) that controls the entire communication robot 3001.
  • the control unit 3010 executes various functions in cooperation with the hardware described above by appropriately reading and executing an operating system (OS) and application programs stored in the storage unit 3020.
  • OS operating system
  • the control unit 3010 includes an image acquisition unit 3011 (image acquisition means), an image analysis unit 3012, a user determination processing unit 3013, a guidance data extraction unit 3015, a movement control unit 3016 (movement control means), a guidance unit 3017 (guidance means), and a facial expression comparison unit 3018 (facial expression comparison means).
  • the image acquisition unit 3011 acquires a plurality of frame images via the camera 3031. The acquired plurality of frame images are continuous in time series.
  • the image analysis unit 3012 analyzes the frame image acquired by the image acquisition unit 3011.
  • the image analysis unit 3012 includes a direction determination unit 3012a (direction determination unit), a distance calculation unit 3012b (distance calculation unit), an expression determination unit 3012c (expression determination unit), and an attribute determination unit 3012d (attribute determination unit).
  • the direction determination unit 3012a determines the direction that the user P is viewing.
  • the distance calculation unit 3012b calculates the distance between the communication robot 3001 and the user P for each user P included in the frame image.
  • the facial expression determination unit 3012c determines the facial expression of the user P for each user P included in the frame image.
  • the attribute determination unit 3012d determines the attribute of the user P for each user P included in the frame image.
  • the attribute of the user P refers to, for example, age, sex, and the like.
  • the user determination processing unit 3013 determines a user P to call out based on the frame image analyzed by the image analysis unit 3012.
  • the user determination processing unit 3013 includes a direction change amount calculation unit 3013a (direction change amount calculation means), a movement speed calculation unit 3013b (movement speed calculation means), a movement direction calculation unit 3013c (movement direction calculation means), and a guidance determination unit 3013d (guidance determination means).
  • the direction change amount calculation unit 3013a calculates a direction change amount that is the magnitude of a change with time in the direction that the user P is viewing, as determined by the direction determination unit 3012a.
  • the movement speed calculation unit 3013b calculates the movement speed based on the change of the distance of the user P calculated by the distance calculation unit 3012b with time.
  • the movement direction calculation unit 3013c calculates the movement direction based on the change of the distance of the user P calculated by the distance calculation unit 3012b over time.
  • the guidance determination unit 3013d determines the user P to call out based on the direction change amount, the position of the user P, the moving speed, and the facial expression.
  • the guidance data extraction unit 3015 extracts guidance data for guiding the determined user P from the guidance data storage unit 3022.
  • the movement control unit 3016 moves the communication robot 3001 so as to approach the determined user P based on the movement direction calculated by the movement direction calculation unit 3013c.
  • the guide unit 3017 outputs the guide data extracted by the guide data extraction unit 3015 via the speaker 3032.
  • the expression comparison unit 3018 compares the facial expressions determined by the expression determination unit 3012c before and after guidance. Details of each process will be described later.
  • the storage unit 3020 is a storage area such as a semiconductor memory element for storing programs, data, and the like necessary for the control unit 3010 to execute various processes.
  • the storage unit 3020 includes a program storage unit 3021, a guidance data storage unit 3022, and an image data storage unit 3023.
  • the program storage unit 3021 is a storage area for storing a program.
  • the program storage unit 3021 stores a control program 3021a (program) for executing the various functions of the control unit 3010 described above.
  • the guide data storage unit 3022 is a storage area for storing guide data issued by the communication robot 3001. As shown in FIG. 29, the guidance data storage unit 3022 stores guidance data for calling out in association with facial expressions and attributes of the user P. Further, the next guidance data is stored in association with the facial expression of the user P after calling.
  • the image data storage unit 3023 is a storage area that stores the frame image acquired by the image acquisition unit 3011.
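  • The guidance data storage unit 3022 described above pairs a facial expression and an attribute of the user P with calling guidance, and pairs the post-call facial expression with the next guidance. A possible in-memory shape is sketched below; the phrases and keys are invented examples, not the contents of FIG. 29.

```python
# Sketch of the guidance data storage unit 3022: calling guidance keyed by
# (facial expression, attribute); next guidance keyed by the facial expression
# observed after the call. All phrases are invented examples.
guidance_data = {
    ("negative", "child"): "Are you lost? Shall I call a staff member?",
    ("negative", "adult"): "May I help you find something?",
    ("positive", "adult"): "Welcome! Today's highlights are this way.",
}

next_guidance_data = {
    "negative": "Please wait here, a staff member is coming.",
    "positive": "Great! Enjoy your visit.",
}

def extract_guidance(expression: str, attribute: str) -> str:
    return guidance_data[(expression, attribute)]

print(extract_guidance("negative", "child"))
```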
  • the camera 3031 is a photographing device.
  • the camera 3031 is provided, for example, at a position corresponding to the eye portion of the communication robot 3001.
  • the camera 3031 shoots the subject in the direction that the communication robot 3001 faces.
  • the speaker 3032 is an audio output device.
  • the speaker 3032 is provided, for example, at a position corresponding to the mouth portion of the communication robot 3001.
  • the speaker 3032 outputs sound based on an instruction from the control unit 3010.
  • the motor unit 3033 is a motor for causing the communication robot 3001 to operate.
  • the motor unit 3033 includes a head motor 3033a and a foot motor 3033b.
  • the head motor 3033 a is a motor that moves the head of the communication robot 3001.
  • the head motor 3033a is controlled by the control unit 3010, and moves the head up, down, left, and right in order to photograph various users P in the space 3000S as subjects.
  • the foot motor 3033b is a motor for moving the communication robot 3001 on the foot of the communication robot 3001.
  • FIG. 30 is a flowchart showing robot control processing in the communication robot 3001 according to the third embodiment.
  • FIG. 31 is a flowchart showing image analysis processing in the communication robot 3001 according to the third embodiment.
  • FIG. 32 is a flowchart showing user determination processing in the communication robot 3001 according to the third embodiment.
  • FIG. 33 is a flowchart showing post-guidance processing in the communication robot 3001 according to the third embodiment.
  • in step S3010, the control unit 3010 (image acquisition unit 3011) of the communication robot 3001 performs image acquisition processing.
  • in the image acquisition processing, the control unit 3010 continuously acquires frame images via the camera 3031 and determines whether or not one or more users P are included in each acquired frame image.
  • the control unit 3010 stores the frame image acquired at an appropriate timing (for example, every 0.1 second) in the image data storage unit 3023.
  • in step S3011, the control unit 3010 determines whether a processing condition has been met.
  • the processing condition refers to, for example, a case where a plurality of frame images (for example, 30 images) are stored in the image data storage unit 3023, a case where a predetermined time (for example, 3 seconds) has elapsed, or the like. If it is determined that the processing condition has been met (S3011: YES), the control unit 3010 moves the process to S3012. On the other hand, if it is determined that the processing condition is not satisfied (S3011: NO), the control unit 3010 moves the process to S3010.
  • in step S3012, the control unit 3010 (image analysis unit 3012) performs image analysis processing for analyzing the frame images stored in the image data storage unit 3023.
  • the control unit 3010 extracts one frame image from the image data storage unit 3023.
  • the control unit 3010 determines the direction in which the user P is looking for each user P included in the frame image. For example, when two users P1 and P2 are included in the frame image, the control unit 3010 determines the direction that the user P1 is viewing and the direction that the user P2 is viewing.
  • the control unit 3010 can detect a face image from a frame image and determine the orientation of the face based on the arrangement positions of the parts (organs) of the face. The control unit 3010 can then treat the determined face orientation as the direction in which the user P is looking.
  • a facial organ detection method using a regression model can be used.
  • a facial organ detection method using a regression model is described in, for example, US Patent Application Publication No. 2014/0185924.
  • in step S3032, the control unit 3010 (distance calculation unit 3012b) calculates the distance between the user P and the communication robot 3001 for each user P included in the frame image. For example, the control unit 3010 can calculate an approximate distance between the user P and the communication robot 3001 based on the size of the user P in the frame image, the background of the frame image, and the like.
  • the control unit 3010 (expression determination unit 3012c) determines the facial expression for each user P included in the frame image. The control unit 3010 can determine, for example, whether it is a smile (positive expression) or a sad face (negative expression) from the shape of the eyes, nose, mouth, etc. as facial expressions.
  • in step S3034, the control unit 3010 determines whether all of the frame images to be processed in the image data storage unit 3023 have been processed. When all the frame images to be processed have been processed (S3034: YES), the control unit 3010 ends this processing and returns the processing to FIG. 30. On the other hand, when not all of the frame images to be processed have been processed (S3034: NO), the control unit 3010 moves the process to S3035. In step S3035, the control unit 3010 extracts one unprocessed frame image from the frame images to be processed in the image data storage unit 3023. Thereafter, the control unit 3010 moves the process to S3031.
  • in step S3013, the control unit 3010 (user determination processing unit 3013) performs user determination processing.
  • the control unit 3010 arranges the frame images subjected to the image analysis process in time series.
  • in step S3041, the control unit 3010 (direction change amount calculation unit 3013a) calculates the direction change amount for each user P based on the plurality of rearranged frame images.
  • the direction change amount refers to the amount of change over time in the direction in which the user P is looking. Specifically, when the user P is looking around restlessly, the degree of looking around is high, and the control unit 3010 calculates a large direction change amount.
  • the control unit 3010 calculates the movement speed for each user P based on the rearranged frame images.
  • the control unit 3010 can calculate the moving speed based on the change in the position of the user P among the plurality of frame images. Note that the control unit 3010 may calculate whether the moving speed is faster than a specified range.
  • the control unit 3010 (movement direction calculation unit 3013c) calculates the movement direction for each user P based on the rearranged frame images.
  • the control unit 3010 can calculate the moving direction based on the change in the position of the user P in the plurality of frame images.
  • in step S3044, the control unit 3010 (user determination processing unit 3013) determines the facial expression that appears most frequently for each user P based on the plurality of rearranged frame images.
  • in step S3045, the control unit 3010 (guidance determination unit 3013d) calculates the calling priority based on each index.
  • the voice call priority is a degree calculated by weighting each of the direction change amount, the position of the user P, the moving speed, and the facial expression.
  • the degree of weighting is set highest for the direction change amount.
  • the control unit 3010 performs the weighting so that the calling priority becomes highest for a user P with a large direction change amount.
  • when the calling priority is equal to or higher than a predetermined value, the control unit 3010 sets the user P as a calling target.
  • the control unit 3010 (guidance determination unit 3013d) determines the user P to call out based on the calling priority. Note that when the direction change amount of the user P is small, the priority becomes a small value even if the moving speed is slow and the facial expression is negative. In such a case, that is, if the priority is less than the predetermined value, the control unit 3010 may determine that there is no user P to call out. Thereafter, the control unit 3010 ends this processing and returns the processing to FIG. 30.
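  • A weighted score of the kind described above, with the direction change amount weighted most heavily, might be computed as in the following sketch. The weights, the normalization of each index to the range 0..1, and the threshold are arbitrary illustrative values, not those of the embodiment.

```python
# Sketch of the calling priority: a weighted sum of the indices described above.
WEIGHTS = {
    "direction_change": 0.5,    # heaviest weight
    "closeness": 0.2,           # higher when the user is near the robot
    "slowness": 0.2,            # higher when the user walks slowly or stops
    "negative_expression": 0.1,
}
PRIORITY_THRESHOLD = 0.5

def calling_priority(direction_change: float, closeness: float,
                     slowness: float, negative_expression: float) -> float:
    """Each index is assumed to be normalized to the range 0..1."""
    return (WEIGHTS["direction_change"] * direction_change
            + WEIGHTS["closeness"] * closeness
            + WEIGHTS["slowness"] * slowness
            + WEIGHTS["negative_expression"] * negative_expression)

score = calling_priority(direction_change=0.9, closeness=0.7,
                         slowness=0.8, negative_expression=1.0)
print(score, score >= PRIORITY_THRESHOLD)  # call out only above the threshold
```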
  • in step S3014, the control unit 3010 (user determination processing unit 3013) determines whether or not a user P has been determined by the user determination processing.
  • if a user P has been determined (S3014: YES), the control unit 3010 moves the process to S3015.
  • if no user P has been determined (S3014: NO), the control unit 3010 ends this process.
  • in step S3015, the control unit 3010 (attribute determination unit 3012d) determines the attribute of the determined user P. Specifically, the control unit 3010 determines an approximate age or the like from the image of the user P extracted from the frame image. The control unit 3010 may simply determine whether the user P is an adult or a child without determining the age. In that case, the control unit 3010 may estimate the height from the image of the user P and determine whether or not the user P is a child.
  • in step S3016, the control unit 3010 (guidance data extraction unit 3015) refers to the guidance data storage unit 3022 (FIG. 29) and extracts guidance data based on the facial expression of the user P determined in S3044 of FIG. 32 and the attribute determined in S3015.
  • in step S3017, the control unit 3010 (movement control unit 3016) performs movement processing. As the movement processing, the control unit 3010 controls the motor unit 3033 (foot motor 3033b) based on the moving speed and moving direction of the user P calculated in S3042 and S3043 of FIG. 32 to move the communication robot 3001 so as to approach the user P.
  • in step S3018, the control unit 3010 (guidance unit 3017) performs guidance processing. As the guidance processing, after moving, the control unit 3010 outputs the guidance data extracted in S3016 via the speaker 3032 so as to speak to the user P.
  • in step S3019, the control unit 3010 performs post-guidance processing.
  • in step S3051, the control unit 3010 (image acquisition unit 3011) acquires a frame image of the user P after the guidance, and the control unit 3010 (expression determination unit 3012c) determines the facial expression of the user P from the acquired frame image.
  • in step S3052, the control unit 3010 (expression comparison unit 3018) performs facial expression comparison processing. As the facial expression comparison processing, the control unit 3010 compares the facial expression before the guidance data was output (determined in S3044 of FIG. 32) with the facial expression acquired in S3051.
  • in step S3053, the control unit 3010 (guidance data extraction unit 3015) extracts guidance data corresponding to the facial expression comparison result from the guidance data storage unit 3022 (FIG. 29).
  • in step S3054, the control unit 3010 (guidance unit 3017) outputs the guidance data extracted in S3053 via the speaker 3032 as guidance processing. Thereafter, the control unit 3010 returns the processing to FIG. 30, and the control unit 3010 ends this processing.
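  • The post-guidance flow (compare the facial expression before and after the call, then pick the next guidance) could be sketched as follows; the expression labels and the next-guidance table are illustrative assumptions.

```python
# Sketch of the post-guidance flow: compare the expression before and after
# the call and pick the next guidance accordingly. Labels are illustrative.
next_guidance_data = {
    "negative": "Please wait here, a staff member is coming.",
    "positive": "Great! Enjoy your visit.",
}

def post_guidance(expression_before: str, expression_after: str) -> str:
    """Return the next guidance based on the expression after the call; the
    before/after comparison result is printed here, but could equally be
    logged or used in other ways (not specified in the embodiment)."""
    improved = (expression_before == "negative" and expression_after == "positive")
    print("expression improved:", improved)
    return next_guidance_data[expression_after]

print(post_guidance("negative", "negative"))
```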
  • FIGS. 34A to 34C are diagrams showing a specific example of the guidance robot control system 3100 according to the third embodiment.
  • the control unit 3010 of the communication robot 3001 acquires frame images of the child C1, who is lost and looking around as shown in FIG. 34A (S3010 in FIG. 30), and determines that the child C1 is a calling target (S3014 in FIG. 30: YES).
  • the control unit 3010 extracts guidance data at the time of calling when the attribute is a child and the expression is negative from the guidance data storage unit 3022 (see FIG. 29) (S3016 in FIG. 30).
  • the control unit 3010 moves the communication robot 3001 toward the position of the child C1 (S3017 in FIG. 30) and outputs the extracted guidance data via the speaker 3032 (S3018 in FIG. 30).
  • the communication robot 3001 can thus guess the state of the child C1 based on the movement and facial expression of the child C1, and can make a call suited to the intention of the child C1. Thereafter, the control unit 3010 acquires frame images of the child C1, and if the facial expression of the child C1 remains negative as shown in FIG. 34C, the control unit 3010 extracts the next guidance data corresponding to that facial expression from the guidance data storage unit 3022 and outputs it via the speaker 3032 (S3019 in FIG. 30).
  • (1) the amount of change in the direction in which the user P is looking is calculated from a plurality of frame images. Since the direction change amount becomes large when the user P is looking around restlessly, the communication robot 3001 determines that such a user P is in trouble and calls out. The state of the user P, that is, whether or not the user P is in trouble, is thus estimated from the amount of change in the direction in which the user P is looking, so the communication robot 3001 can preferentially make a call suited to the intention of a user P who appears to be in trouble.
  • (2) when the position of the user P and the position of the communication robot 3001 are within a specified range, the communication robot 3001 calls out to the user P. Therefore, the communication robot 3001 can preferentially talk to a nearby user P. (3) when the moving speed of the user P is within a specified range, the communication robot 3001 calls out to the user P. Therefore, when the user P is walking slowly or has stopped, the communication robot 3001 can preferentially talk to the user P. (4) since the movement position of the user P is estimated and the communication robot 3001 approaches it, the communication robot 3001 can talk to the user P more naturally.
  • (5) guidance data is selected based on the facial expression of the user P, and the communication robot 3001 speaks to the user P. Therefore, the communication robot 3001 can provide guidance with content that matches the facial expression of the user P. (6) based on the change in the facial expression of the user P before and after calling, the communication robot 3001 provides the next guidance. Therefore, the communication robot 3001 can talk with content that communicates better.
  • (7) an attribute such as the age of the user P is determined from the frame image, guidance data corresponding to the attribute of the user P is selected, and the communication robot 3001 calls out to the user P. Therefore, the communication robot 3001 can provide guidance with content that matches the attribute of the user P.
  • (8) the communication robot 3001 alone can perform all processing from acquisition of frame images to output of guidance data. Therefore, only the communication robot 3001 needs to be prepared, and installation is easy.
  • FIG. 35 is a diagram illustrating an example of a scene suitable for operation of the guidance robot control system 3200 according to the fourth embodiment.
  • FIG. 36 is a functional block diagram of a guidance robot control system 3200 according to the fourth embodiment.
  • FIG. 37 is a diagram illustrating an example of the guidance data storage unit 3222 of the communication robot 3201 according to the fourth embodiment.
  • the guidance robot control system 3200 places the communication robot 3201 at the entrance of a space 3200S such as an exhibition booth or a store entrance.
  • the guidance robot control system 3200 is a system used by the communication robot 3201 to call the user P (in this example, the user P4) to enter the space 3200S.
  • the guidance robot control system 3200 includes a communication robot 3201 and a camera 3205. Communication robot 3201 and camera 3205 are installed near the entrance of space 3200S.
  • the camera 3205 is a surveillance camera.
  • the camera 3205 acquires the frame image of the user P passing through the space 3200S, and transmits the frame image to the communication robot 3201.
  • the communication robot 3201 analyzes the frame image of the user P received from the camera 3205, determines the user P to be called, and provides guidance to the user P.
  • the communication robot 3201 calls out the user P who passes in front of the camera 3205 by determining whether he / she is interested in the space 3200S.
  • the communication robot 3201 includes a control unit 3210, a storage unit 3220, a speaker 3032, and a motor unit 3033.
  • the control unit 3210 includes an image reception unit 3211 (image acquisition means), an image analysis unit 3212, a user determination processing unit 3213, a face data collation unit 3214 (face collation means), a guidance data extraction unit 3215, a movement control unit 3016, a guidance unit 3017, and a facial expression comparison unit 3018.
  • the image reception unit 3211 acquires a frame image by receiving the frame image transmitted by the camera 3205.
  • the image analysis unit 3212 analyzes the frame image received by the image reception unit 3211.
  • the image analysis unit 3212 includes a direction acquisition unit 3212a (user direction acquisition unit), a distance calculation unit 3012b, a facial expression determination unit 3212c, and a face acquisition unit 3212e (face acquisition unit).
  • the direction acquisition unit 3212a analyzes the frame image to identify the user P, determines the direction that the user P is viewing for each user P, and acquires the direction that the user P is viewing.
  • the facial expression determination unit 3212c determines the facial expression of the user P based on the face data of the user P acquired by the face acquisition unit 3212e.
  • the face acquisition unit 3212e acquires the face data of the user P for each user P included in the frame image.
  • the user determination processing unit 3213 determines the user P to talk to based on changes over time in the frame images analyzed by the image analysis unit 3212.
  • the user determination processing unit 3213 includes a movement speed calculation unit 3013b, a movement direction calculation unit 3013c, and a guidance determination unit 3213d.
  • the guidance determination unit 3213d determines the user P to call on, based on the viewing direction of the user P acquired by the direction acquisition unit 3212a, the movement direction calculated by the movement direction calculation unit 3013c, and the like.
  • the face data collation unit 3214 collates the face data stored in the face data storage unit 3224 and the face data acquired by the face acquisition unit 3212e.
  • the guidance data extraction unit 3215 extracts the guidance data for guiding the user P determined by the user determination processing unit 3213 from the guidance data storage unit 3222. At that time, when the face data collation unit 3214 finds a match, the guidance data extraction unit 3215 extracts guidance data different from the previously output guidance data.
  • the storage unit 3220 includes a program storage unit 3021, a guidance data storage unit 3222, an image data storage unit 3023, and a face data storage unit 3224.
  • the program storage unit 3021 stores a control program 3221a for executing the various functions of the control unit 3210 described above.
  • the guidance data storage unit 3222 stores guidance data for calling in association with facial expressions of the user P. Further, the next guidance data is stored in association with the facial expression of the user P after calling.
  • the face data storage unit 3224 is a storage area for storing face data. When the guide data is output, the face data storage unit 3224 stores the face data acquired by the face acquisition unit 3212e in association with the output guide data.
  • the camera 3205 is a live camera represented by a web camera, for example.
  • the camera 3205 may be a stereo camera.
  • a stereo camera refers to a camera that can capture information about the depth of an object by simultaneously capturing images from slightly shifted positions with a plurality of lenses and generating parallax.
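  • As a well-known aside on stereo cameras, for a calibrated stereo pair the depth follows from the disparity between the two images via the standard relation Z = f * B / d, where f is the focal length in pixels, B the baseline between the lenses, and d the disparity in pixels. The sketch below only illustrates this relation with made-up numbers.

```python
# Standard stereo relation: depth Z = f * B / d. Numbers are illustrative only.
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(focal_px=700.0, baseline_m=0.06, disparity_px=14.0))  # ~3.0 m
```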
  • the camera 3205 includes an image acquisition unit 3251 and an image transmission unit 3252.
  • the image acquisition unit 3251 is a control unit that acquires a frame image.
  • the image transmission unit 3252 is a control unit that transmits the acquired frame image via the communication network 3000N.
  • Communication robot 3201 and camera 3205 are communicably connected via communication network 3000N.
  • the communication network 3000N is, for example, an Internet line or a mobile terminal communication network.
  • the communication network 3000N may use, for example, short-range wireless communication based on the Bluetooth (registered trademark) standard.
  • FIG. 38 is a flowchart showing robot control processing in the communication robot 3201 according to the fourth embodiment.
  • FIG. 39 is a flowchart showing image analysis processing in the communication robot 3201 according to the fourth embodiment.
  • FIG. 40 is a flowchart showing user determination processing in the communication robot 3201 according to the fourth embodiment.
  • the control unit 3210 (image reception unit 3211) of the communication robot 3201 receives the frame image transmitted by the camera 3205 and stores the received frame image in the image data storage unit 3023. In step S3211, the control unit 3210 (image analysis unit 3212) performs image analysis processing for analyzing the frame images stored in the image data storage unit 3023.
  • the image analysis processing will be described with reference to FIG. 39.
  • the control unit 3210 detects the user P included in the frame image and acquires the direction in which the user P is viewing. For example, when two users P1 and P2 are included in the frame image, the control unit 3210 detects the users P1 and P2, and acquires the direction in which each user P is looking.
  • a moving object detection method can be used. In the moving object detection method, a frame image in which no user P is captured is stored in advance as background image data, and the difference between the background image data and the frame image to be processed is obtained, so that the user P, who is a moving object, is detected.
  • human detection using HOG features can also be used. Human detection using HOG features is described in N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
  • the control unit 3210 acquires, for example, the orientation of the face of the user P.
  • the face direction of the user P is assumed to be the front direction with respect to the camera 3205 when the face can be detected by the control unit 3210 (face acquisition unit 3212e) using the face detection method.
  • face detection using Haar-like features can be used.
  • for Haar-like features, see P. Viola and M. J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 511-518, 2001.
  • the control unit 3210 may detect a face image from the frame image, extract eyes from the face image, and acquire the direction in which the user P is looking based on the relative position between the eyeball and the pupil.
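  • The HOG-based person detection and Haar-like face detection mentioned above are available in common libraries; the following OpenCV-based sketch is one possible realization, not the implementation of the embodiment. The cascade file path relies on OpenCV's bundled data files.

```python
# One possible realization, using OpenCV, of the detectors mentioned above:
# HOG-based person detection and Haar-cascade face detection.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def analyze_frame(frame):
    """Detect persons (HOG) and frontal faces (Haar-like features).
    A detected frontal face is treated as 'facing the camera'."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    persons, _weights = hog.detectMultiScale(frame)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return persons, faces

# Example usage with a single frame from a camera (device 0):
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     persons, faces = analyze_frame(frame)
#     print(len(persons), "person(s),", len(faces), "frontal face(s)")
```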
  • in step S3232, the control unit 3210 (face acquisition unit 3212e) acquires face data, and the control unit 3210 (expression determination unit 3212c) determines the facial expression based on the acquired face data. Thereafter, the control unit 3210 returns the processing to FIG. 38.
  • the control unit 3210 (user determination processing unit 3213) performs user determination processing.
  • the control unit 3210 arranges the frame images subjected to the image analysis process in time series.
  • the control unit 3210 (direction acquisition unit 3212a) acquires the viewing direction for each user P based on the plurality of rearranged frame images.
  • when the face of the user P can be detected, the control unit 3210 acquires that direction as the viewing direction. Therefore, for example, when the user P is looking away from the camera 3205, the control unit 3210 does not acquire a viewing direction.
  • the processing from S3242 to S3244 is the same as the processing from S3042 to S3044 of the third embodiment (FIG. 32).
  • human body tracking using the Mean-shift feature can be used.
  • For human body tracking using the Mean-shift method, see R. Collins, "Mean-Shift Blob Tracking through Scale Space," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
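  • As an illustration of tracking a detected person across frames, the sketch below uses OpenCV's built-in mean-shift tracker on a hue histogram; the initial bounding box is assumed to come from the person-detection step, and the parameter values are illustrative.

```python
import cv2

def track_person(frames, init_box):
    """Track a person region across frames with mean shift on a hue histogram.

    frames:   list of BGR frame images (the first frame contains the person)
    init_box: (x, y, w, h) of the person in the first frame
    """
    x, y, w, h = init_box
    roi = frames[0][y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    track_window = init_box
    positions = [init_box]
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
        positions.append(track_window)
    return positions  # per-frame (x, y, w, h), usable for moving direction/speed
```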
  • In step S3245, the control unit 3210 (guidance determination unit 3213d) calculates the calling priority based on each index.
  • The calling priority is a degree calculated by weighting each of the direction in which the user P is looking, the moving direction, the position of the user P, and the moving speed.
  • Here, a user P who is interested in the space 3200S but is about to pass by is preferentially spoken to, so the degree of weighting differs between the direction in which the user P is looking and the moving direction, and the weight is set highest when the user P is facing the front direction (the direction of the camera 3205). The control unit 3210 then performs the weighting so that the calling priority becomes highest when the above condition is satisfied and the moving speed is low.
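  • The calling-priority calculation described above could be sketched as a simple weighted score. The weights, normalization constants, and score rules below are illustrative assumptions and not values defined in the specification.

```python
def calling_priority(facing_front, moving_toward_robot, distance_m, speed_m_s,
                     w_view=0.4, w_move=0.2, w_pos=0.2, w_speed=0.2):
    """Weighted calling priority in [0, 1]; a higher value means call out first.

    facing_front:        True if the user's face is toward the camera/robot
    moving_toward_robot: True if the movement direction is toward the robot
    distance_m:          distance between the user and the robot in meters
    speed_m_s:           walking speed in meters per second
    """
    view_score = 1.0 if facing_front else 0.0            # highest when facing front
    move_score = 0.5 if moving_toward_robot else 1.0     # passers-by weighted higher
    pos_score = max(0.0, 1.0 - distance_m / 5.0)         # closer users score higher
    speed_score = max(0.0, 1.0 - speed_m_s / 2.0)        # slower users score higher

    return (w_view * view_score + w_move * move_score +
            w_pos * pos_score + w_speed * speed_score)
```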
  • The control unit 3210 sets such a user P as a calling target.
  • The control unit 3210 (guidance determination unit 3213d) determines the user P to call based on the calling priority.
  • The control unit 3210 may also determine that there is no user P to call. Thereafter, the control unit 3210 ends this processing and returns the processing to FIG. 38.
  • In step S3213, the control unit 3210 (user determination processing unit 3213) determines whether or not a user P to call has been determined by the user determination processing.
  • If a user P has been determined (S3213: YES), the control unit 3210 moves the process to S3214.
  • If no user P has been determined (S3213: NO), the control unit 3210 ends this process.
  • In step S3214, the control unit 3210 (face data collation unit 3214) collates the face data acquired in step S3232 of FIG. 39 with the face data in the face data storage unit 3224.
  • In step S3215, the control unit 3210 (face data collation unit 3214) determines whether or not the collation has succeeded.
  • If the collation has succeeded (S3215: YES), the control unit 3210 advances the process to S3216. On the other hand, if the collation has failed (S3215: NO), the control unit 3210 moves the process to S3216a. The case where the face data can be collated is the case where the user P has been called in the past.
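  • The face collation of steps S3214 and S3215 could be sketched as follows, under the assumption that each stored face is reduced to a feature vector; the grayscale-histogram descriptor and the similarity threshold are simplifications for illustration only.

```python
import cv2
import numpy as np

def face_feature(face_img):
    """Very simple face descriptor: a normalized grayscale histogram."""
    gray = cv2.cvtColor(cv2.resize(face_img, (64, 64)), cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [64], [0, 256]).flatten()
    return hist / (np.linalg.norm(hist) + 1e-9)

def collate_face(face_img, stored_features, threshold=0.9):
    """Return the index of a matching stored face, or None if there is no match.

    A match means the user has already been called in the past (S3215: YES)."""
    query = face_feature(face_img)
    for i, feat in enumerate(stored_features):
        if float(np.dot(query, feat)) >= threshold:  # cosine similarity
            return i
    return None
```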
  • In step S3216, the control unit 3210 refers to the face data storage unit 3224 and the guidance data storage unit 3222, and extracts from the guidance data storage unit 3222 guidance data that corresponds to the facial expression of the user P and that differs from the guidance data already output.
  • In step S3216a, the control unit 3210 refers to the guidance data storage unit 3222 and extracts guidance data corresponding to the facial expression of the user P.
  • The processing of S3217 and S3218 is the same as the processing of S3017 and S3018 of the third embodiment (FIG. 30).
  • In step S3219, the control unit 3210 associates the face data of the user P with the output guidance data and stores them in the face data storage unit 3224.
  • In step S3220, the control unit 3210 performs post-guidance processing.
  • The post-guidance processing is the same as in the third embodiment (FIG. 33). Thereafter, the control unit 3210 ends this process.
  • FIGS. 41A and 41B are diagrams showing a specific example of the guidance robot control system 3200 according to the fourth embodiment.
  • The communication robot 3201 and the camera 3205 are installed near the entrance of the space 3200S.
  • The image acquisition unit 3251 of the camera 3205 acquires frame images as needed, and the image transmission unit 3252 transmits the acquired frame images to the communication robot 3201 as needed.
  • As shown in FIG. 41A, the control unit 3210 sets as a calling target the user P4 who is walking in front of the space 3200S and whose face is facing the communication robot 3201 (S3213 in FIG. 38: YES).
  • Next, the control unit 3210 determines whether or not the acquired face data is stored in the face data storage unit 3224. If the control unit 3210 determines that the acquired face data is not stored, it extracts from the guidance data storage unit 3222 (see FIG. 37) calling guidance data corresponding to the facial expression (S3216a in FIG. 38). Then, the communication robot 3201 outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 calls out to the user P4, "Please drop in at the store." Then, when the user P4 stops and looks at the communication robot 3201, a further call is made (S3220 in FIG. 38).
  • After a while, when the same user P4 is again walking in front of the space 3200S with the face facing the communication robot 3201, the control unit 3210 sets the user P4 as a calling target, as shown in FIG. 41B (S3213 in FIG. 38: YES). Next, the control unit 3210 determines whether or not the acquired face data is stored in the face data storage unit 3224. If the control unit 3210 determines that guidance has already been provided to the user P4, it extracts from the guidance data storage unit 3222 (see FIG. 37) calling guidance data that corresponds to the facial expression and differs from the previous one (S3216 in FIG. 38). Then, the communication robot 3201 outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 calls out to the user P4, "You passed by earlier, didn't you?"
  • According to the fourth embodiment, the guidance robot control system 3200 has the following effects.
  • (1) The communication robot 3201 determines whether or not to speak to the user P based on the relationship between the direction in which the user P is looking and the moving direction of the user P. Therefore, for example, a user P who is looking in the direction of the space 3200S while passing in front of it can be presumed to be interested in the space 3200S, and the communication robot 3201 can speak to that user.
  • (2) Since the communication robot 3201 acquires the direction in which the user P is looking based on whether or not the face of the user P can be detected, the processing can be performed easily.
  • (3) Since the communication robot 3201 stores the face data of the user P to whom guidance has been provided, it can output guidance data different from the previous one when it next provides guidance to the same user P. Therefore, because the guidance can be varied by using the face authentication function, different calls can be made. (4) Since the communication robot 3201 performs its processing using the frame images acquired by the camera 3205, the position from which the frame images are acquired does not change even if the communication robot 3201 moves, because the camera 3205 is fixed. Therefore, for example, a plurality of frame images arranged in time series can be processed easily.
  • In the third embodiment, the example in which the communication robot includes a camera has been described, and in the fourth embodiment, the example in which the communication robot receives frame images from a separate camera has been described; however, the present invention is not limited to this.
  • The position of the camera is not limited to the vicinity of the communication robot; the camera may be installed, for example, on the ceiling of the space, and frame images may be acquired from a plurality of cameras.
  • A communication robot that includes a camera may also be used. In that case, in order to perform the image processing easily, it is desirable that the communication robot does not move so that the acquisition position of the frame images does not change.
  • In the embodiments described above, the example in which the communication robot performs the various processes has been described, but the present invention is not limited to this.
  • A server that is communicably connected to the communication robot may be provided, and the server may perform the various processes.
  • In that case, the communication robot only needs to include at least a speaker that outputs the guidance data transmitted from the server and a motor unit for operating the communication robot.
  • The various processes may also be shared between the communication robot and the server.
  • In the embodiments described above, the example using frame images acquired at appropriate timings has been described, but the present invention is not limited to this. Moving image data may be used; in that case, a plurality of frame images may be acquired from the moving image data, and the image analysis processing may be performed for each frame image.
  • In the embodiments described above, the example in which the image analysis processing is performed on all users included in the acquired frame image has been described, but the present invention is not limited to this. One user may be selected from the acquired frame image, and the image of the selected user may be analyzed to determine whether or not that user is to be a calling target.
  • In the embodiments described above, the case has been described in which the direction in which the user is looking is taken to be the face direction, and the direction change amount becomes larger the more the user moves his or her neck and looks around; however, the present invention is not limited to this.
  • The direction in which the user is looking may instead be the direction of the line of sight.
  • In that case, the control unit can detect a face image from the frame image and extract the eyes from the face image, thereby acquiring the gaze direction based on the relative position between the eyeball and the pupil. The user can then be set as a calling target when the direction in which the user is looking (the gaze direction) differs from the user's moving direction.
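  • A rough sketch of estimating the gaze direction from the relative position of the pupil within a detected eye region (using the eye cascade bundled with OpenCV; the thresholds and the left/right decision rule are illustrative assumptions) is shown below.

```python
import cv2

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def gaze_direction(face_img):
    """Return 'left', 'right' or 'center' from the pupil offset in the first eye found."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(eyes) == 0:
        return None

    x, y, w, h = eyes[0]
    eye = gray[y:y + h, x:x + w]
    # The pupil is the darkest blob; threshold and take the largest contour centroid.
    _, mask = cv2.threshold(eye, 50, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    cx = m["m10"] / m["m00"]

    offset = (cx - w / 2.0) / (w / 2.0)  # -1 (left edge) .. +1 (right edge)
    if offset < -0.2:
        return "left"
    if offset > 0.2:
        return "right"
    return "center"
```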
  • The camera is not limited to an ordinary camera. A camera having a sensor that acquires depth information, typified by Kinect (registered trademark), may be used. With such a camera, the process of detecting the user's face from the frame image and calculating the moving direction of the user can be performed more easily.
  • In the embodiment described above, processing related to the user attribute is not performed. However, the user attribute may be determined, and guidance data corresponding to the determined user attribute may be extracted.
  • FIGS. 42A and 42B are diagrams showing a specific example of a guidance robot control system 3300 according to a modified embodiment.
  • FIG. 42A shows a case where the user P5, who is walking toward the space 3300S and whose face is facing the communication robot 3301 (camera 3305), is set as a calling target.
  • In this case, the communication robot 3301 outputs the guidance data corresponding to the moving direction, "Welcome. Would you like some guidance?", and speaks to the user P5.
  • FIG. 42B shows a case in which the user P6, who is coming out of the space 3300S and whose face is facing the communication robot 3301, is set as a calling target.
  • In this case, the communication robot 3301 outputs "Thank you," which is the guidance data corresponding to the moving direction, and speaks to the user P6.
  • In this manner, the communication robot 3301 extracts and outputs guidance data corresponding to the moving direction of the user P. Therefore, the communication robot 3301 can call out differently to the user P5 entering the space 3300S and the user P6 leaving the space 3300S, and can thus make an appropriate call according to the situation of the user P.

Abstract

Provided is an information presentation apparatus or the like that is capable of natural conversation just like with a human, when starting a conversation with a user. An information presentation apparatus 10 is provided with: an image input unit 11 to which an image including a user's face is inputted; an area determination unit 12 that determines areas of facial organs from the image; a color information extraction unit 13 that extracts color information from the areas; a storage unit 15 that stores a color language table 14 in which color information and language information are associated; and a checking unit 16 that obtains the language information by checking, in the color language table 14, the color information extracted by the color information extraction unit 13.

Description

情報提示装置、情報提示方法、プログラム、情報処理装置及び案内ロボット制御システムInformation presentation apparatus, information presentation method, program, information processing apparatus, and guidance robot control system
 本発明は、ユーザ(人間)と対話を行う情報提示装置等に関する。
 また、本発明は、人物の属性(分類)に応じた処理を行う情報処理装置等に関する。
 さらに、本発明は、案内ロボット制御システム等に関する。
The present invention relates to an information presentation apparatus that performs a dialog with a user (human).
The present invention also relates to an information processing apparatus that performs processing according to a person's attribute (classification).
Furthermore, the present invention relates to a guidance robot control system and the like.
 近年、ロボットやコンピュータが、人間であるユーザと対話を行うための研究や開発が進められている。例えば、特許文献1には、ユーザから音声入力を受信し、音声入力の音声態様に基づいてユーザの応答処理に対する反応を評価し、評価結果に基づいて応答内容を変更する情報提示装置が開示されている。特許文献1のように、従来の情報提示装置では、人間の発話内容や表情の変化等を考慮して発話する内容を決める。 In recent years, research and development have been promoted so that robots and computers can interact with human users. For example, Patent Document 1 discloses an information presentation device that receives voice input from a user, evaluates a response to the user's response processing based on the voice mode of the voice input, and changes the response content based on the evaluation result. ing. As in Patent Document 1, in the conventional information presentation apparatus, content to be uttered is determined in consideration of human utterance content, changes in facial expressions, and the like.
 また、従来から、人物に対して、情報案内、対話、宣伝等の処理を自動で行うシステムが存在する。これらのシステムは、人物の属性(分類)に関わらず画一的な処理を行うものが多く、人物に対して効果的な情報提供が必ずしも実現されていない。これに対して、人物の属性(分類)に応じた処理を自動で行うシステムも存在する。 Also, conventionally, there are systems that automatically process information guidance, dialogue, advertisement, etc. for a person. Many of these systems perform uniform processing regardless of a person's attribute (classification), and effective provision of information to a person is not necessarily realized. On the other hand, there is a system that automatically performs processing according to a person's attribute (classification).
 特許文献2には、人物の年齢および性別を認識して音声や動きによる演出効果を行うことで、効果的な情報提示を行う、情報処理装置が開示されている。また特許文献3には、個体の画像データに基づき判別される個体情報を参照することにより、出力する情報を制御する出力制御装置が開示されている。これら先行技術によれば、各人物の情報を反映した効果的な自動対応を実現している。 Patent Document 2 discloses an information processing apparatus that presents effective information by recognizing a person's age and gender and performing a presentation effect by voice or movement. Patent Document 3 discloses an output control apparatus that controls information to be output by referring to individual information determined based on individual image data. According to these prior arts, effective automatic correspondence reflecting each person's information is realized.
 さらに、従来、来訪者に対して案内サービスを提供するロボットが開示されている(例えば、特許文献4)。 Furthermore, a robot that provides a guidance service to visitors has been disclosed (for example, Patent Document 4).
特開2015-138147号公報JP2015-138147A 特開2012-185303号公報JP 2012-185303 A 特開2015-021813号公報Japanese Patent Laying-Open No. 2015-021813 特開2008-260107号公報JP 2008-260107 A
 ところで、人間同士(特に、初対面の人同士)が会話する際、相手から受ける印象(特に、第1印象)を基に、会話の導入部分の話題を決めることが多い。しかしながら、特許文献1を含む従来の情報提示装置では、ユーザ(特に、初対面のユーザ)との対話の導入時において、人間同士のような自然な対話が実現できていない。 By the way, when humans (especially those who meet for the first time) have a conversation, the topic of the introduction part of the conversation is often decided based on the impression (especially the first impression) received from the other party. However, the conventional information presentation apparatus including Patent Document 1 cannot realize a natural dialogue between humans when introducing a dialogue with a user (especially a first-time user).
 また、特許文献2では、画像のみから人物の年齢及び性別を判別するものである。また、特許文献3も、撮影部で撮像された画像データのみに基づき、属性を判別するものである。このように、特許文献2、3の方法は、画像のみの単一種別のデータに基づいて人物の属性(分類)を判別するため、人物の属性(分類)を判別するための判断指標が少なく、人物の属性(分類)を的確に特定できない場合があった。また、画像のみから判別可能な人物の属性(分類)だけに判別対象が限定されていた。 Also, in Patent Document 2, the age and sex of a person are discriminated only from an image. Patent Document 3 also determines an attribute based only on image data captured by the imaging unit. As described above, since the methods of Patent Documents 2 and 3 determine the attribute (classification) of a person based on a single type of data including only images, there are few determination indexes for determining the attribute (classification) of a person. In some cases, the attribute (classification) of a person could not be specified accurately. Further, the discrimination target is limited only to the attribute (classification) of the person that can be discriminated only from the image.
 さらに、特許文献4に記載のロボットは、来訪者等のユーザの移動に関するパターンに応じた案内サービスを提供するものであったので、ユーザの意図によらずに、案内サービスを強制的に提供する可能性があった。 Furthermore, since the robot described in Patent Document 4 provides a guidance service according to a pattern related to the movement of a user such as a visitor, the guidance service is forcibly provided regardless of the user's intention. There was a possibility.
 本発明は、前述した問題点に鑑みてなされたものであり、その目的とすることは、ユーザとの対話の導入時において、人間同士のような自然な対話が可能な情報提示装置等を提供することである。
 また、本発明は、このような観点からなされたものであり、その目的とすることは、人物の分類を好適に特定し、特定した分類に応じた処理を実行することが可能な、情報処理装置等を提供することである。
 さらに、本発明は、ユーザの意図を考慮して声掛けを行うことができる案内ロボット制御システム等を提供することを目的とする。
The present invention has been made in view of the problems described above, and an object of the present invention is to provide an information presentation apparatus and the like capable of a natural, human-like dialogue when a dialogue with a user is introduced.
The present invention has also been made from such a viewpoint, and another object of the present invention is to provide an information processing apparatus and the like capable of suitably specifying a person's classification and executing processing according to the specified classification.
Furthermore, a further object of the present invention is to provide a guidance robot control system and the like that can call out to a user in consideration of the user's intention.
 前述した目的を達成するための第1の発明は、ユーザと対話を行う情報提示装置であって、色情報と言語情報とが対応付けられている色言語テーブルを記憶する記憶手段と、入力される画像を取得する画像入力手段と、前記画像から1以上の所定の領域を判定する領域判定手段と、前記領域から前記色情報を抽出する色情報抽出手段と、前記色情報抽出手段によって抽出される前記色情報に基づいて前記色言語テーブルを照合し、前記言語情報を取得する照合手段と、を備えることを特徴とする情報提示装置である。第1の発明によって、ユーザとの対話の導入時において、人間同士のような自然な対話が可能となる。 A first invention for achieving the above-described object is an information presentation apparatus for interacting with a user, and is input with storage means for storing a color language table in which color information and language information are associated with each other. Extracted by an image input means for acquiring an image, an area determination means for determining one or more predetermined areas from the image, a color information extraction means for extracting the color information from the area, and the color information extraction means. And a collating unit that collates the color language table based on the color information and obtains the language information. According to the first invention, when a dialogue with a user is introduced, a natural dialogue like humans is possible.
 第1の発明は、前記言語情報に基づいて発話内容を決定する発話内容決定手段、を更に備えるようにしても良い。これによって、色情報から得られるユーザの印象に基づく対話が可能となる。 The first invention may further include speech content determining means for determining speech content based on the language information. This enables a dialogue based on the user's impression obtained from the color information.
 また、第1の発明における前記色情報抽出手段は、前記領域それぞれに対して減色処理を行い、前記領域を代表する1又は複数の前記色情報を抽出するようにしても良い。これによって、色情報を的確に抽出することができる。 Further, the color information extracting means in the first invention may perform a color reduction process on each of the areas to extract one or a plurality of the color information representing the area. This makes it possible to accurately extract color information.
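As one possible illustration of such a color reduction step, the following Python sketch quantizes a region of the image to a few representative colors with k-means; the use of OpenCV's k-means and the choice of k are assumptions made for illustration, not the method defined by the invention.

```python
import cv2
import numpy as np

def dominant_colors(region_bgr, k=3):
    """Reduce a region (e.g. a hair, skin or clothing area) to k representative colors.

    Returns a list of (B, G, R) tuples sorted by how much of the region they cover."""
    pixels = region_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    counts = np.bincount(labels.flatten(), minlength=k)
    order = np.argsort(counts)[::-1]
    return [tuple(int(c) for c in centers[i]) for i in order]
```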
 また、第1の発明における前記言語情報は、印象を表現する印象語であるようにしても良い。これによって、ユーザの印象を表現する的確な言語を得ることができる。 Further, the language information in the first invention may be an impression word expressing an impression. Thus, an accurate language that expresses the user's impression can be obtained.
 また、第1の発明は、前記言語情報に関連する関連語を取得する関連語取得手段、を更に備えるようにしても良い。これによって、ユーザの印象を表現する言語を幅広く得ることができる。 Further, the first invention may further include related word acquisition means for acquiring related words related to the language information. This makes it possible to obtain a wide range of languages that express user impressions.
 また、第1の発明における前記記憶手段は、更に、前記領域及び前記言語情報と、前記発話内容とが対応付けられている対話ルールを記憶し、前記発話内容決定手段は、前記対話ルールに基づいて前記発話内容を決定するようにしても良い。これによって、ユーザの印象に基づく的確な対話を行うことができる。 Further, the storage means in the first invention further stores a dialogue rule in which the area and the language information are associated with the utterance content, and the utterance content determination means is based on the dialogue rule. The utterance content may be determined. As a result, an accurate dialogue based on the impression of the user can be performed.
 また、第1の発明における前記記憶手段は、更に、前記領域及び前記言語情報と、前記ユーザに推薦する推薦情報とが対応付けられている推薦テーブルを記憶し、前記発話内容決定手段は、更に、前記推薦テーブルに基づいて前記推薦情報を決定するようにしても良い。これによって、ユーザの印象に基づく的確な推薦情報を提示することができる。 Further, the storage means in the first invention further stores a recommendation table in which the area and the language information are associated with recommendation information recommended to the user, and the utterance content determination means further includes The recommendation information may be determined based on the recommendation table. Thus, accurate recommendation information based on the user's impression can be presented.
 第2の発明は、色情報と言語情報とが対応付けられている色言語テーブルを記憶するコンピュータが、ユーザと対話を行う情報提示方法であって、前記ユーザの画像を入力する画像入力ステップと、前記画像から所定の領域を判定する領域判定ステップと、前記領域から前記色情報を抽出する色情報抽出ステップと、前記色情報抽出ステップによって抽出される前記色情報に基づいて前記色言語テーブルを照合し、前記言語情報を取得する照合ステップと、を備えることを特徴とする情報提示方法である。これによって、第2の発明によって、ユーザとの対話の導入時において、人間同士のような自然な対話が可能となる。 A second invention is an information presentation method in which a computer storing a color language table in which color information and language information are associated with each other interacts with a user, and the image input step inputs the user's image. An area determination step for determining a predetermined area from the image; a color information extraction step for extracting the color information from the area; and the color language table based on the color information extracted by the color information extraction step. And a collation step of collating and obtaining the language information. Thus, according to the second aspect of the present invention, a natural dialogue like humans is possible when a dialogue with a user is introduced.
 第3の発明は、コンピュータを、請求項1に記載の情報提示装置として機能させることを特徴とするプログラムである。第3の発明のプログラムを汎用のコンピュータにインストールすることによって、第1の発明の情報提示装置を得ることができる。 The third invention is a program that causes a computer to function as the information presentation device according to claim 1. The information presentation apparatus of the first invention can be obtained by installing the program of the third invention on a general-purpose computer.
 第4の発明は、人物の分類ごとに特徴を記憶する記憶手段と、人物の特徴を抽出する抽出手段と、抽出した前記特徴に基づいて前記人物の分類を特定する特定手段と、特定した前記分類に応じた処理を行う処理手段と、を備えることを特徴とする情報処理装置である。
 第4の発明によって、人物の分類(嗜好、性別、年代、言語、国籍等の人物の属性)を好適に特定し、特定した分類に応じた処理を実行することが可能な、情報処理装置が提供される。
A fourth invention is an information processing apparatus comprising: storage means that stores features for each person classification; extraction means that extracts features of a person; specifying means that specifies the classification of the person based on the extracted features; and processing means that performs processing according to the specified classification.
According to the fourth invention, an information processing apparatus is provided that can suitably specify a person's classification (attributes of the person such as preference, gender, age, language, and nationality) and execute processing according to the specified classification.
 また第4の発明において、前記抽出手段は、複数の異なる種別のデータを用いて特徴を抽出することが望ましい。これにより、複数の異なる種別のデータを用いて人物の分類を好適に特定することができる。 In the fourth aspect of the invention, it is desirable that the extraction unit extracts features using a plurality of different types of data. Thereby, the classification of a person can be suitably specified using a plurality of different types of data.
 また第4の発明において、前記人物を含む映像データを取得する映像取得手段、を更に備え、前記抽出手段は、少なくとも、前記映像データから前記人物の特徴を抽出し、前記特定手段は、少なくとも、前記特徴に基づいて前記人物の分類を特定することが望ましい。これにより、少なくとも、映像データから抽出される人物の特徴に基づいて、人物の分類を好適に特定することができる。 According to a fourth aspect of the present invention, the image acquisition device further includes image acquisition means for acquiring image data including the person, the extraction means extracts at least the characteristics of the person from the image data, and the identification means includes at least: It is desirable to specify the classification of the person based on the feature. Accordingly, it is possible to suitably specify the classification of the person based on at least the characteristics of the person extracted from the video data.
 また、前記抽出手段は、前記人物の動作に関する特徴を抽出するように構成してもよい。これにより、映像データから抽出される人物の動作の特徴に基づいて、人物の分類を好適に特定することができる。 Further, the extraction means may be configured to extract features related to the movement of the person. Thereby, the classification of the person can be suitably specified based on the characteristics of the movement of the person extracted from the video data.
 また、人物に対して情報を提示する情報提示手段、を更に備え、前記抽出手段は、前記情報に対する、前記人物の反応に関する特徴を抽出するように構成してもよい。これにより、情報を提示した際の人物の反応の特徴に基づいて、人物の分類を好適に特定することができる。 Further, an information presenting means for presenting information to a person may be further provided, and the extracting means may be configured to extract features related to the person's reaction to the information. Thereby, based on the characteristic of the reaction of the person at the time of presenting information, the person classification can be suitably specified.
 また第4の発明において、前記人物の音声データを取得する音声取得手段、を更に備え、前記抽出手段は、少なくとも、前記音声データから前記人物の特徴を抽出し、前記特定手段は、少なくとも、前記特徴に基づいて前記人物の前記分類を特定することが望ましい。これにより、少なくとも、音声データから抽出される人物の特徴に基づいて、人物の分類を好適に特定することができる。 According to a fourth aspect of the present invention, the apparatus further comprises sound acquisition means for acquiring sound data of the person, wherein the extraction means extracts at least the characteristics of the person from the sound data, and the specifying means includes at least the It is desirable to identify the classification of the person based on characteristics. Accordingly, it is possible to preferably specify the classification of the person based on at least the characteristics of the person extracted from the audio data.
 また、前記抽出手段は、声の強さ、声の高さ、周波数特性、声質、発話テンポ、倍音特性、抑揚、反応音の特徴、発話言語、発話内容の少なくともいずれかを抽出するように構成してもよい。これにより、音声データから人物の特徴が好適に抽出される。 The extraction means is configured to extract at least one of voice strength, voice pitch, frequency characteristics, voice quality, utterance tempo, harmonic characteristics, intonation, response sound characteristics, utterance language, and utterance content. May be. Thereby, the characteristics of a person are preferably extracted from the audio data.
 また第4の発明において、前記人物の匂いデータを取得する匂い取得手段、を更に備え、前記抽出手段は、少なくとも、前記匂いデータから前記人物の特徴を抽出し、前記特定手段は、少なくとも、前記特徴に基づいて前記分類を特定することが望ましい。これにより、少なくとも、匂いデータから抽出される人物の特徴に基づいて、人物の分類を好適に特定することができる。 In the fourth invention, the apparatus further comprises odor acquisition means for acquiring odor data of the person, wherein the extraction means extracts at least the characteristics of the person from the odor data, and the specifying means includes at least the odor data. It is desirable to identify the classification based on features. Accordingly, it is possible to suitably specify the classification of the person based on at least the characteristics of the person extracted from the odor data.
 また第4の発明において、抽出した全ての特徴について、各特徴がどの分類に属するかを集計し、最も集計数が多い分類を人物の分類として特定することが望ましい。これにより、抽出された特徴に基づいて人物の分類を好適に特定することができる。 In the fourth aspect of the invention, it is desirable to totalize to which category each of the extracted features belongs, and to specify the category with the largest total number as the category of the person. Thereby, the classification of the person can be suitably specified based on the extracted feature.
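The tallying described here, counting for every extracted feature which classification it belongs to and selecting the classification with the largest count, could be sketched as follows; the feature names and the classification table are hypothetical placeholders.

```python
from collections import Counter

def classify_person(extracted_features, classification_table):
    """Majority vote over per-feature classifications.

    extracted_features:   e.g. ["speaks_japanese", "wears_suit", "low_voice"]
    classification_table: maps a feature to the classification it belongs to,
                          e.g. {"speaks_japanese": "japanese_adult", ...}
    """
    votes = Counter(classification_table[f]
                    for f in extracted_features if f in classification_table)
    if not votes:
        return None
    classification, _count = votes.most_common(1)[0]
    return classification
```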
 第5の発明は、ユーザに対する案内データを記憶する案内データ記憶部と、時系列に連続した複数のフレーム画像を取得する画像取得手段と、前記画像取得手段により取得した前記複数のフレーム画像を分析して、前記フレーム画像ごとにユーザが見ている方向を判別する方向判別手段と、前記方向判別手段により判別した方向に基づいて、時間経過に伴う前記方向の変化の大きさを表す方向変化量を算出する方向変化量算出手段と、前記方向変化量算出手段によって算出した方向の変化に基づいて、案内をするか否かを判断する案内判断手段と、前記案内判断手段により案内をすると判断された場合に、前記案内データ記憶部から案内データを抽出してロボットによる案内を行う案内手段と、を備える案内ロボット制御システムである。
 第6の発明は、第5の発明の案内ロボット制御システムにおいて、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザと、ロボットとの距離を算出する距離算出手段を備え、前記案内判断手段は、前記距離算出手段が算出した前記距離が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第7の発明は、第5の発明又は第6の発明の案内ロボット制御システムにおいて、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザの移動速度を算出する移動速度算出手段を備え、前記案内判断手段は、前記移動速度算出手段が算出した前記移動速度が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第8の発明は、第7の発明の案内ロボット制御システムにおいて、ロボットは、移動が可能であり、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザの移動方向を算出する移動方向算出手段と、前記移動方向算出手段により算出した前記ユーザの移動方向及び前記移動速度算出手段が算出した前記移動速度に基づいて、前記ロボットを移動させる移動制御手段と、備えること、を特徴とする案内ロボット制御システムである。
 第9の発明は、第5の発明から第7の発明までのいずれかの案内ロボット制御システムにおいて、前記方向判別手段は、前記ユーザの眼球と瞳孔との相対位置に基づき、前記ユーザが見ている方向を判別し、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザの移動方向を算出する移動方向算出手段と、前記画像取得手段により取得した前記複数のフレーム画像を分析して、前記ユーザの顔データを取得する顔取得手段と、前記フレーム画像ごとに前記ユーザの顔の向きを判別する顔方向判別手段と、を備え、前記案内判断手段は、前記移動方向算出手段が算出した前記ユーザの移動方向と、前記顔方向判別手段で判別した複数のフレーム画像に共通の顔の向きとが異なる方向である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第10の発明は、第9の発明の案内ロボット制御システムにおいて、前記方向判別手段は、前記顔取得手段により取得した顔データに基づいて、顔の器官を検出し、それらの位置関係から顔の向きを判定し、前記ユーザの見ている方向を取得すること、を特徴とする案内ロボット制御システムである。
 第11の発明は、第5の発明から第10の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した少なくとも1つ以上の前記フレーム画像を分析して、前記ユーザの表情を判別する表情判別手段を備え、前記案内データ記憶部は、表情に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記表情判別手段により判別された前記ユーザの表情に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第12の発明は、第11の発明の案内ロボット制御システムにおいて、案内データによる案内をした後に、少なくとも前記画像取得手段による処理と、前記表情判別手段による処理とを実行し、案内前の前記ユーザの表情と、案内後の前記ユーザの表情とを比較する表情比較手段を備え、前記案内手段は、前記案内データ記憶部を参照して、前記表情比較手段による比較結果に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第13の発明は、第5の発明から第12の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した少なくとも1つ以上の前記フレーム画像を分析して、前記ユーザの属性を判別する属性判別手段を備え、前記案内データ記憶部は、ユーザの属性に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記属性判別手段により判別した前記ユーザの属性に対応する案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
According to a fifth aspect of the present invention, there is provided a guide data storage unit that stores guide data for a user, an image acquisition unit that acquires a plurality of frame images continuous in time series, and an analysis of the plurality of frame images acquired by the image acquisition unit. And a direction determining unit that determines a direction that the user is viewing for each frame image, and a direction change amount that represents a magnitude of a change in the direction over time based on the direction determined by the direction determining unit. Based on the direction change calculated by the direction change amount calculating means, guidance determining means for determining whether or not to guide based on the direction change calculated by the direction change amount calculating means, and the guidance determining means A guidance robot control system comprising: guidance means for extracting guidance data from the guidance data storage unit and performing guidance by a robot.
According to a sixth aspect of the present invention, in the guidance robot control system according to the fifth aspect of the present invention, distance calculation means for calculating a distance between the user and the robot based on the position information of the user in the frame image acquired by the image acquisition means. The guidance determination unit is configured to determine that guidance is performed when the distance calculated by the distance calculation unit is within a specified range.
7th invention is the guidance robot control system of 5th invention or 6th invention, The movement which calculates the moving speed of the said user based on the said user's positional information in the frame image which the said image acquisition means acquired The guidance robot control system includes a speed calculation unit, and the guidance determination unit determines to perform guidance when the movement speed calculated by the movement speed calculation unit is within a predetermined range.
According to an eighth aspect of the present invention, in the guided robot control system of the seventh aspect, the robot can move, and based on the position information of the user in the frame image acquired by the image acquisition means, the moving direction of the user And a movement control means for moving the robot based on the movement direction of the user calculated by the movement direction calculation means and the movement speed calculated by the movement speed calculation means. The guide robot control system characterized by the above.
A ninth aspect of the present invention is the guidance robot control system according to any one of the fifth to seventh aspects, wherein the direction discriminating means is configured to be viewed by the user based on a relative position between the user's eyeball and pupil. A plurality of frames acquired by the image acquisition means, and a movement direction calculation means for calculating a movement direction of the user based on position information of the user in the frame image acquired by the image acquisition means. Analyzing the image and acquiring face data of the user; and face direction determining means for determining the orientation of the user's face for each of the frame images, wherein the guidance determining means includes the movement When the moving direction of the user calculated by the direction calculating unit is different from the direction of the face common to the plurality of frame images determined by the face direction determining unit, They are determined that the to a guidance robot control system according to claim.
According to a tenth aspect of the present invention, in the guidance robot control system according to the ninth aspect of the invention, the direction determining means detects a facial organ based on the face data acquired by the face acquiring means, A guidance robot control system characterized by determining a direction and acquiring a direction that the user is looking at.
An eleventh aspect of the present invention is the guide robot control system according to any one of the fifth aspect to the tenth aspect, wherein at least one of the frame images acquired by the image acquisition unit is analyzed, and the facial expression of the user is analyzed. The guidance data storage unit stores guidance data corresponding to a facial expression, and the guidance unit refers to the guidance data storage unit and is determined by the facial expression determination unit. A guidance robot control system characterized by extracting guidance data based on a user's facial expression and performing guidance.
According to a twelfth aspect of the present invention, in the guidance robot control system of the eleventh aspect of the invention, after guiding by guidance data, at least the processing by the image acquisition means and the processing by the facial expression discrimination means are executed, and the user before guidance Facial expression comparing means for comparing the facial expression of the user and the facial expression of the user after guidance, the guidance means refers to the guidance data storage unit and extracts guidance data based on the comparison result by the facial expression comparison means This is a guidance robot control system characterized by performing guidance.
In a thirteenth aspect of the present invention, in any one of the guided robot control systems according to the fifth to twelfth aspects of the present invention, the attribute of the user is analyzed by analyzing at least one of the frame images acquired by the image acquisition unit. Attribute determining means for determining the guide data, the guide data storage unit stores guide data corresponding to a user attribute, and the guide means is determined by the attribute determination unit with reference to the guide data storage unit A guidance robot control system characterized in that guidance is performed by extracting guidance data corresponding to the attribute of the user.
 第14の発明は、ユーザに対する案内データを記憶する案内データ記憶部と、時系列に連続した複数のフレーム画像を取得する画像取得手段と、前記画像取得手段により取得した前記フレーム画像を分析してユーザを特定し、見ている方向を取得するユーザ方向取得手段と、前記画像取得手段により取得した前記フレーム画像を分析して、前記ユーザの移動方向を算出する移動方向算出手段と、前記ユーザ方向取得手段により取得したユーザの見ている向きと、前記移動方向算出手段が算出したユーザの移動方向とに基づいて、案内をするか否かを判断する案内判断手段と、前記案内判断手段により案内をすると判断された場合に、前記案内データ記憶部から案内データを抽出してロボットによる案内を行う案内手段と、を備える案内ロボット制御システムである。
 第15の発明は、第14の発明の案内ロボット制御システムにおいて、前記案内データ記憶部は、移動方向に対応する案内データを記憶していること、を特徴とする案内ロボット制御システムである。
 第16の発明は、第14の発明又は第15の発明の案内ロボット制御システムにおいて、前記ユーザと、ロボットとの距離を算出する距離算出手段を備え、前記案内判断手段は、前記距離算出手段が算出した前記距離が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第17の発明は、第14の発明から第16の発明までのいずれかの案内ロボット制御システムにおいて、前記ユーザの移動速度を算出する移動速度算出手段を備え、前記案内判断手段は、前記移動速度算出手段が算出した前記移動速度が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第18の発明は、第17の発明の案内ロボット制御システムにおいて、ロボットは、移動が可能であり、前記移動方向算出手段により算出した前記ユーザの移動方向及び前記移動速度算出手段が算出した前記移動速度に基づいて、前記ロボットを移動させる移動制御手段を備えること、を特徴とする案内ロボット制御システムである。
 第19の発明は、第14の発明から第18の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した前記フレーム画像を分析して、前記ユーザの顔データを取得する顔取得手段を備え、前記ユーザ方向取得手段は、前記顔取得手段が取得した前記ユーザの顔データに基づいて前記ユーザの見ている方向を取得すること、を特徴とする案内ロボット制御システムである。
 第20の発明は、第19の発明の案内ロボット制御システムにおいて、前記ユーザ方向取得手段は、前記顔取得手段により取得した顔データに基づいて、顔の器官を検出し、それらの位置関係から顔の向きを判定し前記ユーザの見ている方向を取得するか、又は、前記ユーザの眼球と瞳孔との相対位置に基づき前記ユーザが見ている方向を取得すること、を特徴とする案内ロボット制御システムである。
 第21の発明は、第19の発明又は第20の発明の案内ロボット制御システムにおいて、前記案内手段により前記案内データが出力された場合に、前記顔取得手段により取得した前記ユーザの顔データを記憶する顔データ記憶部と、前記顔取得手段により取得した前記ユーザの顔データと、前記顔データ記憶部に記憶された顔データとを照合する顔照合手段と、を備え、前記案内手段は、前記顔照合手段により前記顔データ記憶部に記憶されている顔データと照合できた場合には、前回とは異なる案内データを前記案内データ記憶部から抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第22の発明は、第19の発明から第21の発明までのいずれかの案内ロボット制御システムにおいて、前記顔取得手段により取得した前記ユーザの顔データに基づいて、前記ユーザの表情を判別する表情判別手段を備え、前記案内データ記憶部は、表情に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記表情判別手段により判別された前記ユーザの表情に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第23の発明は、第22の発明の案内ロボット制御システムにおいて、案内データによる案内をした後に、少なくとも前記画像取得手段による処理と、前記顔取得手段による処理と、前記表情判別手段による処理とを実行し、案内前の前記ユーザの表情と、案内後の前記ユーザの表情とを比較する表情比較手段を備え、前記案内手段は、前記案内データ記憶部を参照して、前記表情比較手段による比較結果に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第24の発明は、第14の発明から第23の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した少なくとも1つ以上の前記フレーム画像を分析して、前記ユーザの属性を判別する属性判別手段を備え、前記案内データ記憶部は、ユーザの属性に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記属性判別手段により判別した前記ユーザの属性に対応する案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
In a fourteenth aspect of the present invention, a guide data storage unit that stores guide data for a user, an image acquisition unit that acquires a plurality of time-series frame images, and the frame image acquired by the image acquisition unit are analyzed. User direction acquisition means for specifying a user and acquiring a viewing direction; movement direction calculation means for analyzing the frame image acquired by the image acquisition means to calculate the movement direction of the user; and the user direction Guidance determination means for determining whether or not to guide based on the user's viewing direction acquired by the acquisition means and the user's movement direction calculated by the movement direction calculation means; and guidance by the guidance determination means A guidance robot that extracts guidance data from the guidance data storage unit and performs guidance by a robot when it is determined to perform guidance. It is a control system.
A fifteenth aspect of the present invention is the guide robot control system according to the fourteenth aspect of the present invention, wherein the guide data storage unit stores guide data corresponding to a moving direction.
A sixteenth aspect of the invention is the guidance robot control system according to the fourteenth aspect of the invention or the fifteenth aspect of the invention, further comprising distance calculation means for calculating a distance between the user and the robot, and the guidance determination means includes the distance calculation means. A guidance robot control system, characterized in that, when the calculated distance is within a specified range, it is determined that guidance is to be provided.
A seventeenth aspect of the invention is the guidance robot control system according to any one of the fourteenth aspect to the sixteenth aspect of the invention, further comprising movement speed calculation means for calculating the movement speed of the user, wherein the guidance determination means A guidance robot control system characterized in that guidance is determined when the movement speed calculated by the calculation means is within a specified range.
According to an eighteenth aspect of the present invention, in the guided robot control system of the seventeenth aspect, the robot can move, and the movement direction calculated by the movement direction calculation means and the movement speed calculated by the movement speed calculation means. A guidance robot control system comprising a movement control means for moving the robot based on speed.
A nineteenth aspect of the present invention is the guide robot control system according to any one of the fourteenth aspect to the eighteenth aspect, wherein the face image obtained by analyzing the frame image obtained by the image obtaining means is obtained. The guidance robot control system includes an acquisition unit, and the user direction acquisition unit acquires a direction that the user is viewing based on the user face data acquired by the face acquisition unit.
According to a twentieth aspect, in the guidance robot control system according to the nineteenth aspect, the user direction acquisition means detects a facial organ based on the face data acquired by the face acquisition means, and determines the face from the positional relationship. The direction of the user is determined and the direction the user is looking at is acquired, or the direction the user is looking at is acquired based on the relative position between the user's eyeball and pupil. System.
According to a twenty-first aspect, in the guidance robot control system of the nineteenth aspect or the twentieth aspect, the face data of the user acquired by the face acquisition means is stored when the guidance data is output by the guidance means. A face data storage unit, a face collation unit that collates the face data of the user acquired by the face acquisition unit and the face data stored in the face data storage unit, and the guide unit includes Guidance characterized by extracting guidance data different from the previous one from the guidance data storage unit and providing guidance when face matching means can collate with the face data stored in the face data storage unit. It is a robot control system.
A twenty-second aspect of the present invention is the guided robot control system according to any one of the nineteenth to twenty-first aspects, wherein the facial expression for determining the user's facial expression is based on the user's facial data acquired by the facial acquisition means. Determining means, wherein the guide data storage unit stores guide data corresponding to a facial expression, and the guide unit refers to the guide data storage unit to the facial expression of the user determined by the facial expression determination unit. A guidance robot control system characterized by extracting guidance data based on the guidance and performing guidance.
According to a twenty-third aspect, in the guidance robot control system according to the twenty-second aspect, after guiding by guidance data, at least the processing by the image acquisition unit, the processing by the face acquisition unit, and the processing by the facial expression determination unit A facial expression comparison unit that executes and compares the facial expression of the user before guidance and the facial expression of the user after guidance, and the guidance unit compares the facial expression with the facial expression comparison unit with reference to the guidance data storage unit A guidance robot control system characterized by extracting guidance data based on a result and performing guidance.
According to a twenty-fourth aspect, in the guided robot control system according to any one of the fourteenth aspect to the twenty-third aspect, the at least one frame image obtained by the image obtaining unit is analyzed, and the attribute of the user is analyzed. Attribute determining means for determining the guide data, the guide data storage unit stores guide data corresponding to a user attribute, and the guide means is determined by the attribute determination unit with reference to the guide data storage unit A guidance robot control system characterized in that guidance is performed by extracting guidance data corresponding to the attribute of the user.
 本発明により、ユーザとの対話の導入時において、人間同士のような自然な対話が可能な情報提示装置等を提供することができる。
 また、本発明により、人物の分類を好適に特定し、特定した分類に応じた処理を実行することが可能な、情報処理装置等が提供される。
 さらに、本発明によれば、ユーザの意図を考慮して声掛けを行うことができる案内ロボット制御システムを提供することができる。
According to the present invention, it is possible to provide an information presenting apparatus and the like capable of a natural dialogue between humans when a dialogue with a user is introduced.
In addition, according to the present invention, there is provided an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the identified classification.
Furthermore, according to the present invention, it is possible to provide a guidance robot control system capable of calling out in consideration of the user's intention.
第1実施形態の情報提示装置の一実施形態の構成を示すブロック図The block diagram which shows the structure of one Embodiment of the information presentation apparatus of 1st Embodiment. 第1実施形態の情報提示装置の他の実施形態の構成を示すブロック図The block diagram which shows the structure of other embodiment of the information presentation apparatus of 1st Embodiment. 第1実施形態の領域判定部12の構成を示すブロック図The block diagram which shows the structure of the area | region determination part 12 of 1st Embodiment. 第1実施形態の顔検出部12aによる顔検出処理の説明図Explanatory drawing of the face detection process by the face detection part 12a of 1st Embodiment. 第1実施形態の顔器官検出部12bによる顔器官検出処理の説明図Explanatory drawing of the face organ detection process by the face organ detection part 12b of 1st Embodiment. 第1実施形態の服領域特定部12cによる服領域特定処理の説明図Explanatory drawing of the clothing area | region identification process by the clothing area | region identification part 12c of 1st Embodiment. 第1実施形態の色情報抽出部13による色情報抽出処理の説明図Explanatory drawing of the color information extraction process by the color information extraction part 13 of 1st Embodiment. 第1実施形態の色言語テーブル14を形成するためのカラーイメージスケールの具体例を示す図The figure which shows the specific example of the color image scale for forming the color language table 14 of 1st Embodiment. 第1実施形態の3色の組合せ色と言語を対応付けたカラーイメージスケールの例を示す図The figure which shows the example of the color image scale which matched the three combination colors and language of 1st Embodiment 第1実施形態の3色の組合せ色と言語を対応付けたカラーイメージスケールの他の例を示す図The figure which shows the other example of the color image scale which matched the three combination colors and language of 1st Embodiment. 第1実施形態の色言語テーブル14の具体例を示す図The figure which shows the specific example of the color language table 14 of 1st Embodiment. 第1実施形態の単色テーブル100の具体例を示す図The figure which shows the specific example of the monochrome table 100 of 1st Embodiment. 第1実施形態の対話ルール110の具体例を示す図The figure which shows the specific example of the dialogue rule 110 of 1st Embodiment. 第1実施形態の推薦テーブル120の具体例を示す図The figure which shows the specific example of the recommendation table 120 of 1st Embodiment. 第1実施形態の情報提示装置10の動作を示すフローチャートThe flowchart which shows operation | movement of the information presentation apparatus 10 of 1st Embodiment. 第2実施形態の情報処理システム2001のシステム構成の一例を示す図The figure which shows an example of the system configuration | structure of the information processing system 2001 of 2nd Embodiment. 第2実施形態のロボット2002の内部構成図The internal block diagram of the robot 2002 of 2nd Embodiment 第2実施形態のコンピュータ2003の内部構成図The internal block diagram of the computer 2003 of 2nd Embodiment 第2実施形態のロボット2002、情報処理システム2001の機能構成図Functional configuration diagram of the robot 2002 and the information processing system 2001 of the second embodiment 第2実施形態の第1の特徴2030Aを例示する図The figure which illustrates the 1st characteristic 2030A of a 2nd embodiment 第2実施形態の第2の特徴2030Bを例示する図The figure which illustrates the 2nd characteristic 2030B of a 2nd embodiment. 第2実施形態の第3の特徴2030Cを例示する図The figure which illustrates the 3rd characteristic 2030C of 2nd Embodiment 第2実施形態の抽出した人物の特徴(人物特徴2040)を示す図The figure which shows the characteristic (person characteristic 2040) of the extracted person of 2nd Embodiment 第2実施形態の分類テーブル2050を示す図The figure which shows the classification table 2050 of 2nd Embodiment. 第2実施形態の処理データベース2060を示す図The figure which shows the process database 2060 of 2nd Embodiment. 第2実施形態のロボット2002の動作を示すフローチャートA flowchart showing an operation of the robot 2002 according to the second embodiment. 
第3実施形態に係る案内ロボット制御システムの運用に適する場面例を示す図である。It is a figure which shows the example of a scene suitable for operation | use of the guidance robot control system which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットの機能ブロック図である。It is a functional block diagram of the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットの案内データ記憶部の例を示す図である。It is a figure which shows the example of the guidance data storage part of the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでのロボット制御処理を示すフローチャートである。It is a flowchart which shows the robot control process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでの画像分析処理を示すフローチャートである。It is a flowchart which shows the image analysis process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでのユーザ決定処理を示すフローチャートである。It is a flowchart which shows the user determination process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでの案内後処理を示すフローチャートである。It is a flowchart which shows the guidance post-process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 3rd Embodiment. 第3実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 3rd Embodiment. 第3実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 3rd Embodiment. 第4実施形態に係る案内ロボット制御システムの運用に適する場面例を示す図である。It is a figure which shows the example of a scene suitable for operation | use of the guidance robot control system which concerns on 4th Embodiment. 第4実施形態に係る案内ロボット制御システムの機能ブロック図である。It is a functional block diagram of the guidance robot control system which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットの案内データ記憶部の例を示す図である。It is a figure which shows the example of the guidance data storage part of the communication robot which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットでのロボット制御処理を示すフローチャートである。It is a flowchart which shows the robot control process in the communication robot which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットでの画像分析処理を示すフローチャートである。It is a flowchart which shows the image analysis process in the communication robot which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットでのユーザ決定処理を示すフローチャートである。It is a flowchart which shows the user determination process in the communication robot which concerns on 4th Embodiment. 第4実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 4th Embodiment. 第4実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 4th Embodiment. 変形形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on a deformation | transformation form. 変形形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on a deformation | transformation form.
(第1実施形態)
 以下図面に基づいて、本発明の実施形態を詳細に説明する。図1は、本発明の情報提示装置の一実施形態の構成を示すブロック図である。情報提示装置10は、人間であるユーザと対話を行う装置であって、ロボットやコンピュータ等である。
(First embodiment)
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of the information presentation apparatus of the present invention. The information presentation device 10 is a device that interacts with a human user, and is a robot, a computer, or the like.
 情報提示装置10は、ユーザの顔を含む画像を入力する画像入力部11と、画像から顔器官の各領域を判定する領域判定部12と、領域から色情報を抽出する色情報抽出部13と、色情報と言語情報とが対応付けられている色言語テーブル14を記憶する記憶部15と、色情報抽出部13によって抽出される色情報に基づいて色言語テーブル14を照合し、言語情報を取得する照合部16と、を備える。 The information presentation apparatus 10 includes: an image input unit 11 to which an image including the user's face is input; a region determination unit 12 that determines each region of the facial organs from the image; a color information extraction unit 13 that extracts color information from the regions; a storage unit 15 that stores a color language table 14 in which color information and language information are associated with each other; and a matching unit 16 that obtains the language information by checking the color information extracted by the color information extraction unit 13 against the color language table 14.
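As a rough sketch of how the units 11 to 16 could fit together, the following Python example looks up an impression word for each region's representative color; the table entries and the nearest-color matching in RGB space are illustrative assumptions rather than the matching method defined in the specification.

```python
import numpy as np

# Hypothetical color-language table: representative RGB color -> impression word.
COLOR_LANGUAGE_TABLE = {
    (230, 200, 180): "soft",
    (40, 40, 120): "calm",
    (200, 30, 30): "vivid",
}

def lookup_language(color_rgb):
    """Matching unit 16: return the impression word of the nearest table color."""
    keys = list(COLOR_LANGUAGE_TABLE.keys())
    dists = [np.linalg.norm(np.array(k, float) - np.array(color_rgb, float)) for k in keys]
    return COLOR_LANGUAGE_TABLE[keys[int(np.argmin(dists))]]

def impressions_for_regions(region_colors):
    """Map each region's representative color (from extraction unit 13) to a word,
    which the utterance content determination unit 17 could then use."""
    return {region: lookup_language(color) for region, color in region_colors.items()}

# Example: representative colors already extracted for two regions.
print(impressions_for_regions({"clothes": (210, 35, 40), "hair": (45, 42, 118)}))
```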
The information presentation device 10 may further include an utterance content determination unit 17 that determines utterance content on the basis of the language information, and an output unit 18 that outputs the utterance content determined by the utterance content determination unit 17. The information presentation device 10 may also include a microphone for inputting the user's voice.
The image input unit 11 includes a camera, a scanner, a storage media drive, a communication control device, and the like, and inputs an image by photographing the target user, reading a printed matter on which the target user is printed, reading a storage medium in which an image of the target user is stored, or receiving an image of the target user transmitted from another computer. The input image is temporarily stored in a RAM (Random Access Memory) of the information presentation device 10.
The area determination unit 12, the color information extraction unit 13, the collation unit 16, and the utterance content determination unit 17 are configured by a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU loads programs stored in the storage unit 15, the ROM, a recording medium, or the like into a work memory area on the RAM, executes them, drives and controls the devices connected via the bus, and thereby realizes the processing of the information presentation device 10 described later. The ROM is a non-volatile memory that permanently holds programs such as the boot program and BIOS of the information presentation device 10, data, and the like. The RAM is a volatile memory that temporarily holds programs and data loaded from the storage unit, the ROM, a recording medium, and the like, and provides a work area used by the CPU for various kinds of processing.
The storage unit 15 is an HDD (Hard Disk Drive), a flash memory, or the like, and stores the programs executed by the CPU, the data necessary for program execution, an OS (Operating System), and the like. The stored programs include a control program corresponding to the OS and application programs for causing the CPU to execute the processing described later. Each program code is read out by the CPU as necessary, transferred to the RAM, and executed as the various means.
The output unit 18 includes a speaker, a display device, a printer, a communication control device, and the like, and outputs voice indicating the utterance content, displays characters indicating the utterance content, prints characters indicating the utterance content, or transmits data indicating the utterance content to another computer.
FIG. 2 is a block diagram showing the configuration of another embodiment of the information presentation device of the present invention. The information presentation device 20 shown in FIG. 2 includes a terminal device 30 used by the user and a server 40 connected to the terminal device 30 via a network.
The terminal device 30 is a mobile terminal such as a smartphone or a tablet, a PC (Personal Computer), or the like, and includes an image input unit 31 that inputs an image including the user's face and a transmission unit 39a that transmits data such as the image to the server 40.
The terminal device 30 may further include a receiving unit 39b that receives data such as the utterance content from the server 40 and an output unit 32 that outputs the utterance content. The terminal device 30 may also include a microphone for inputting the user's voice.
The server 40 includes a receiving unit 49b that receives data from the terminal device 30, an area determination unit 42 that determines each area of the facial organs from the image, a color information extraction unit 43 that extracts color information from the areas, a storage unit 45 that stores a color language table 44 in which color information and language information are associated with each other, and a collation unit 46 that collates the color language table 44 on the basis of the color information extracted by the color information extraction unit 43 and acquires language information.
The server 40 may further include an utterance content determination unit 47 that determines utterance content on the basis of the language information, and a transmission unit 49a that transmits the utterance content determined by the utterance content determination unit 47 to the terminal device 30.
The image input unit 31, the output unit 32, the area determination unit 42, the color information extraction unit 43, the color language table 44, the storage unit 45, the collation unit 46, and the utterance content determination unit 47 have the same configurations and the same functions as the image input unit 11, the output unit 18, the area determination unit 12, the color information extraction unit 13, the color language table 14, the storage unit 15, the collation unit 16, and the utterance content determination unit 17 shown in FIG. 1, respectively.
The transmission unit 39a transmits data to the server 40 via a communication line 21a. The receiving unit 39b receives data from the server 40 via a communication line 21b. The transmission unit 49a transmits data to the terminal device 30 via the communication line 21b. The receiving unit 49b receives data from the terminal device 30 via the communication line 21a.
The configuration and operation of the information presentation device 10 shown in FIG. 1 are described in detail below. Since the information presentation device 20 shown in FIG. 2 has the same configuration and operates in the same manner except for the transmission and reception of data, its description is omitted.
FIG. 3 is a block diagram showing the configuration of the area determination unit 12. As shown in FIG. 3, the area determination unit 12 includes a face detection unit 12a that detects the user's face from the image, a facial organ detection unit 12b that detects facial organs (eyes, mouth, nose, chin, and the like) from the face area detected by the face detection unit 12a, and a clothing area specification unit 12c that specifies the area of the user's clothes from the image.
FIG. 4 is an explanatory diagram of the face detection processing by the face detection unit 12a. In the figure, 50 denotes an image and 51 denotes a face area. The face detection unit 12a detects a rectangular area as the face area 51 from the image 50 by, for example, the Viola-Jones face detection algorithm. In the Viola-Jones method, a search window (a determination area of, for example, 8 x 8 pixels) is scanned over the target image from the upper left and moved in order. For each position of the search window, the face detection unit 12a determines whether the window contains a face image by using a classifier constructed in advance by machine learning (AdaBoost) using Haar-like features. The face detection algorithm is not limited to the Viola-Jones method, and any ordinary technique in this field can be used.
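By way of illustration only, and not as part of the disclosed embodiment, the face detection step can be sketched with OpenCV's Haar-cascade detector, which realizes the Viola-Jones approach (a cascade of AdaBoost-trained classifiers over Haar-like features). The file name and detector parameters below are assumptions chosen for the example.

# Illustrative sketch of Viola-Jones-style face detection using OpenCV's
# bundled Haar cascade. Image path and parameters are assumptions.
import cv2

def detect_face_regions(image_path: str):
    image = cv2.imread(image_path)                      # image 50
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # The detector slides a search window over the image at multiple scales
    # and applies the cascaded classifier to each window position.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Each entry is a rectangular face area 51 given as (x, y, width, height).
    return [tuple(face) for face in faces]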
FIG. 5 is an explanatory diagram of the facial organ detection processing by the facial organ detection unit 12b. In the figure, 51 denotes the face area, 52 denotes an eye area, and 53 denotes a mouth area. The facial organ detection unit 12b specifies the positions (x, y) of the feature points of each facial organ such as the eyebrows, eyes, nose, mouth, and chin. Any ordinary technique in this field can be used as the algorithm for specifying the positions of the feature points of each facial organ. The facial organ detection unit 12b then specifies the area of each facial organ on the basis of the positions (x, y) of the feature points. For example, the facial organ detection unit 12b specifies a rectangular area containing all the eye feature points as the eye area 52. Similarly, the facial organ detection unit 12b specifies a rectangular area containing all the mouth feature points as the mouth area 53. The facial organ detection unit 12b may specify not only the eye area 52 and the mouth area 53 but also a nose area, a hair area, a cheek area, and the like.
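As a hedged illustration of how organ areas can be derived from feature points, the sketch below takes the bounding rectangle of the landmarks belonging to each organ. It assumes a landmark detector that returns points in the common 68-point convention; any detector returning per-organ (x, y) points would serve equally well.

# Sketch: derive rectangular organ areas from facial landmark points.
# The 68-point index ranges are an assumption about the landmark detector used.
def bounding_box(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

def organ_regions(landmarks):
    # landmarks: list of 68 (x, y) tuples for one detected face
    eye_region = bounding_box(landmarks[36:48])    # both eyes -> eye area 52
    mouth_region = bounding_box(landmarks[48:68])  # mouth -> mouth area 53
    return {"eye": eye_region, "mouth": mouth_region}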
FIG. 6 is an explanatory diagram of the clothing area specification processing by the clothing area specification unit 12c. In the figure, 50 denotes the image, 51 denotes the face area, and 54 denotes a clothing area. The clothing area specification unit 12c specifies the clothing area 54 on the basis of the size and position (x, y) of the rectangular face area 51 detected by the face detection unit 12a. For example, the clothing area specification unit 12c sets the lower end of the rectangular face area 51 as the upper end of the clothing area 54 and the lower end of the image 50 as the lower end of the clothing area 54. Also, for example, the clothing area specification unit 12c sets a value obtained by multiplying the width of the rectangular face area 51 by a predetermined magnification as the width of the clothing area 54, and determines the left end and the right end of the clothing area 54 with reference to the center position of the face area 51. Alternatively, the clothing area specification unit 12c sets the boundaries between the background and the clothes in the horizontal direction as the left end and the right end of the clothing area 54. The algorithm for specifying the clothing area 54 is not limited to these examples, and any ordinary technique in this field can be used. In addition, the clothing area specification unit 12c may specify not only outerwear but also a hat, glasses, trousers, a bag, and the like.
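The geometric rule described above can be sketched as follows; the width magnification used here is an assumed parameter, since the embodiment only states that a predetermined magnification is applied to the face-rectangle width.

# Sketch: estimate the clothing area 54 from the detected face area 51.
# width_scale is an assumed magnification applied to the face-rectangle width.
def clothing_region(face_rect, image_height, image_width, width_scale=2.5):
    fx, fy, fw, fh = face_rect                 # face area 51 as (x, y, w, h)
    top = fy + fh                              # top of clothes = bottom of face
    bottom = image_height                      # bottom of clothes = bottom of image
    center_x = fx + fw / 2.0                   # reference: face-centre x position
    half_width = (fw * width_scale) / 2.0
    left = max(0, int(center_x - half_width))
    right = min(image_width, int(center_x + half_width))
    return (left, top, right - left, bottom - top)   # clothing area 54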
FIG. 7 is an explanatory diagram of the color information extraction processing by the color information extraction unit 13. In the figure, 52 denotes the eye area, 53 denotes the mouth area, 54 denotes the clothing area, 55, 56, and 57 denote color information, and 55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, and 57c denote the colors (single colors) constituting the color information. The color information extraction unit 13 performs color reduction processing on each of the eye area 52, the mouth area 53, and the clothing area 54, and extracts one or more pieces of color information 55 to 57 representing each area. In the example shown in FIG. 7, the color information extraction unit 13 extracts three colors as each of the color information 55 to 57. The color information extraction unit 13 may specify the one color (single color) or plurality of colors (single colors) by extracting one color or a plurality of colors from each color-reduced area (for example, extracting one color or three colors from each area that has been reduced to five colors).
The color reduction processing by the color information extraction unit 13 is performed by, for example, the k-means method. Specifically, to reduce an area to k colors (to extract k colors representing the area), k representative colors are first selected at random from the area; the closest representative color is chosen for each pixel so that all the pixels are divided into k groups; the average color of the pixels in each group is then taken as a new representative color and all the pixels are divided into k groups again; and this division into groups based on the representative colors and replacement of each group's average color with the representative color is repeated until the area is reduced to k colors.
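The k-means color reduction can be sketched as below. Using scikit-learn's KMeans is an implementation choice made only for this illustration, not something specified by the embodiment; ordering the representative colors by pixel count is likewise an assumption.

# Sketch: reduce an image region to k representative colours with k-means.
# The use of scikit-learn is an assumption made for brevity.
import numpy as np
from sklearn.cluster import KMeans

def extract_color_information(region_pixels: np.ndarray, k: int = 3):
    # region_pixels: array of shape (height, width, 3), e.g. an eye/mouth/clothes crop
    samples = region_pixels.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(samples)
    # The cluster centres are the representative colours; order them by how many
    # pixels each one represents so the dominant colour comes first.
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]
    return [tuple(km.cluster_centers_[i]) for i in order]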
FIG. 8 is a diagram showing a specific example of a color image scale for forming the color language table 14. In the figure, 60 denotes the color image scale, and 61 to 63 denote single colors arranged on the color image scale 60.
The color image scale 60 arranges single colors (individual colors) and the languages corresponding to their images on coordinate axes of warm (WARM)-cool (COOL) and soft (SOFT)-hard (HARD), which are the criteria for judging an image. For example, in the upper left (second quadrant) of the color image scale 60, single colors 61, 62, 63, and others are arranged; the language corresponding to the single color 61 is "sweet", the language corresponding to the single color 62 is "pretty", and the language corresponding to the single color 63 is "festive". In the color image scale 60, the languages associated with single colors include many adjectives such as "sweet" and "cute", because adjectives are the most suitable kind of language for expressing the impression given by a color.
The single colors arranged on the color image scale 60 are desirably colors in which the lightness and/or the saturation is varied for each hue. That is, it is desirable to divide the hues into, for example, red, yellow-red, yellow, yellow-green, green, blue-green, blue, purple-blue, purple, red-purple, and achromatic colors, and to form the single colors of the color image scale 60 from colors in which the lightness and/or the saturation of each hue is varied. This is because the impression given by a color differs depending on its hue, lightness, and saturation. The single colors and languages of the color image scale 60 shown in FIG. 8 are only representative ones; in practice, single colors and languages are arranged more finely.
FIGS. 9 and 10 are diagrams showing examples of color image scales in which three-color combination colors and languages are associated with each other. In the figures, 80 and 90 denote color image scales, and 81 and 91 denote three-color combination colors (color schemes) arranged on the color image scales 80 and 90.
The color image scale 80 shown in FIG. 9 arranges three-color combination colors (color schemes) and the languages corresponding to their images on coordinate axes of popular-uplifting and orthodox-individualistic. For example, in the upper left (second quadrant) of the color image scale 80, the combination color 81 and others are arranged; the language corresponding to the combination color 81 is "calm", and the combination color 81 is a combination of the single colors 81a, 81b, and 81c. This color image scale 80 is applied, for example, to the colors and languages of automobiles.
The color image scale 90 shown in FIG. 10 arranges three-color combination colors (color schemes) and the languages corresponding to their images on coordinate axes of formal-casual and static-dynamic. For example, in the upper left (second quadrant) of the color image scale 90, the combination color 91 and others are arranged; the language corresponding to the combination color 91 is "elegant", and the combination color 91 is a combination of the single colors 91a, 91b, and 91c. This color image scale 90 is applied, for example, to the colors and languages of fashion (clothes).
FIG. 11 is a diagram showing a specific example of the color language table 14. As shown in FIG. 11, the color language table 14 associates a combination color ID 14a, single color IDs 14b, a language 14c, and related words 14d with one another.
The combination color ID 14a is an ID for identifying a combination color arranged on a color image scale in which three-color combination colors and languages are associated with each other (see FIGS. 9 and 10). The single color IDs 14b consist of the single color IDs of color scheme 1, color scheme 2, and color scheme 3, each of which is a single color. For example, in the data whose combination color ID 14a is "A00100", the single color ID 14b of color scheme 1 is "c3255", the single color ID 14b of color scheme 2 is "c4529", the single color ID 14b of color scheme 3 is "c5043", and the language 14c corresponding to the combination color is "innocent".
The related words 14d are words related to the language 14c; for example, a related word of the language "innocent" is "innocence". When the color language table 14 is formed from a color image scale in which single colors and languages are associated with each other (see FIG. 8), the color language table 14 directly associates the single colors arranged on the color image scale with the languages. The related words may be stored in advance in association with the languages in the storage means, or may be acquired, for example, by crawling Web pages on the Internet.
FIG. 12 is a diagram showing a specific example of the single color table 100. The single color table 100 indicates which single colors make up the color schemes 1 to 3 specified by the single color IDs 14b. In the figure, 100 denotes the single color table, 100a denotes a single color ID, and 100b denotes a color value.
The single color ID 100a is an ID for identifying a single color forming the color schemes 1 to 3 of the combination color specified by the combination color ID 14a. The color value 100b expresses the single color specified by the single color ID as coordinate values in a color space; in the single color table 100 of FIG. 12, the values are the x and y values in the CIE xy chromaticity diagram of the XYZ color system (xyY color system).
For example, the three single color IDs 14b constituting the combination color whose combination color ID 14a is "A00100" in FIG. 11 are "c3255", "c4529", and "c5043", and their color values (x, y) are (0.3219, 0.5523), (0.4501, 0.2956), and (0.5027, 0.4315), respectively. The color space expressing the color value 100b is not limited to the XYZ color system (xyY color system) of FIG. 12; various color systems such as the RGB color system, the L*u*v* color system, and the L*a*b* color system may be used as appropriate.
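To make the relationship between the two tables concrete, they could be represented in memory as in the following sketch. The field values mirror the examples of FIGS. 11 and 12; the class and variable names are assumptions made for illustration.

# Sketch: in-memory representation of the colour-language table 14 and the
# single-colour table 100. IDs and values follow the examples of FIGS. 11-12.
from dataclasses import dataclass

@dataclass
class ColorLanguageEntry:
    combination_id: str       # combination color ID 14a, e.g. "A00100"
    single_color_ids: tuple   # colour schemes 1-3, e.g. ("c3255", "c4529", "c5043")
    language: str             # language 14c, e.g. "innocent"
    related_words: tuple      # related words 14d, e.g. ("innocence",)

# single-colour table 100: single colour ID 100a -> (x, y) in the xyY colour system
single_color_table = {
    "c3255": (0.3219, 0.5523),
    "c4529": (0.4501, 0.2956),
    "c5043": (0.5027, 0.4315),
}

color_language_table = [
    ColorLanguageEntry("A00100", ("c3255", "c4529", "c5043"),
                       "innocent", ("innocence",)),
]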
The collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted by the color information extraction unit 13 with the single color or the single colors of each combination color in the color language table 14 stored in the storage unit 15, and specifies a single color or combination color in the color language table 14 for which the difference between the color values (the distance in the color space) is equal to or less than a fixed threshold (in the case of a combination color, for which the differences between the color values of all the single colors are equal to or less than the fixed threshold).
For example, assume that the combination color extracted by the color information extraction unit 13 is the color information 55 consisting of the single colors 55a, 55b, and 55c shown in FIG. 7, and that in the color language table 14 the combination color whose combination color ID is N (combination color N) consists of color scheme 1 with single color ID NA (single color NA), color scheme 2 with single color ID NB (single color NB), and color scheme 3 with single color ID NC (single color NC). In this case, when the single colors of the color information 55 are compared with the single colors of the combination color N, there are the following six pairings to compare.
(1) single color 55a-single color NA, single color 55b-single color NB, single color 55c-single color NC
(2) single color 55a-single color NA, single color 55b-single color NC, single color 55c-single color NB
(3) single color 55a-single color NB, single color 55b-single color NA, single color 55c-single color NC
(4) single color 55a-single color NB, single color 55b-single color NC, single color 55c-single color NA
(5) single color 55a-single color NC, single color 55b-single color NA, single color 55c-single color NB
(6) single color 55a-single color NC, single color 55b-single color NB, single color 55c-single color NA
If, in any of the pairings (1) to (6), the differences between the color values of all the compared single colors are equal to or less than the fixed threshold, the combination color N is regarded as a combination color identical or similar to the color information 55. Here, when the color value of a single color A is (xA, yA) and the color value of a single color B is (xB, yB), the difference ΔC between the color values of the single colors A and B is given by
ΔC = (xA - xB)² + (yA - yB)²
In this way, the collation unit 16 specifies, from among the combination colors in the color language table 14, a combination color that is identical or similar to the combination color extracted by the color information extraction unit 13.
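The permutation test described above (six pairings for three colors, each pairing accepted only when the squared chromaticity difference ΔC stays below the threshold) can be sketched as follows; the threshold value is an assumption, since the embodiment only speaks of a fixed threshold.

# Sketch: decide whether an extracted 3-colour combination matches a table
# combination by testing all six single-colour pairings; a pairing matches
# when the squared xy-difference dC is within a threshold (assumed value).
from itertools import permutations

def color_difference(a, b):
    # dC = (xA - xB)^2 + (yA - yB)^2
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def combinations_match(extracted, table_colors, threshold=0.005):
    # extracted, table_colors: three (x, y) chromaticity values each
    for permuted in permutations(table_colors):
        if all(color_difference(e, t) <= threshold
               for e, t in zip(extracted, permuted)):
            return True
    return False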
Thereafter, the collation unit 16 acquires the language (language information) associated with the combination color specified as identical or similar to the color information 55, together with its related words. For example, if the combination color specified as identical or similar to the color information 55 is the combination color 81 of the color image scale 80 (see FIG. 9), the collation unit 16 acquires the language "calm" associated with the combination color 81 and its related word "calmness".
When a plurality of combination colors satisfying the condition exist in the color language table 14, the collation unit 16 may acquire a plurality of pieces of language information. When the color language table 14 associates single colors with languages, the collation unit 16 compares the single color extracted by the color information extraction unit 13 with the single colors of the color language table 14, specifies a single color in the color language table 14 whose color value difference is equal to or less than the fixed threshold, and acquires the language information associated with the specified single color. The collation unit 16 may acquire related words not only from the color language table 14 but also from a dictionary database or the like stored in the storage unit 15 or in another computer.
The utterance content determination unit 17 determines the utterance content from the templates of various sentences stored in the storage unit 15, on the basis of the language information and related words acquired by the collation unit 16. Specifically, the utterance content determination unit 17 determines the utterance content on the basis of a dialogue rule 110 (see FIG. 13) and a recommendation table 120 (see FIG. 14) stored in the storage unit 15.
FIG. 13 is a diagram showing a specific example of the dialogue rule 110. In the figure, 110 denotes the dialogue rule, 111 denotes an area, 112 denotes language information, and 113 denotes utterance content. In the dialogue rule 110, utterance content 113 is stored in association with each combination of area 111 and language information 112. For example, in the data of No. 1, the area 111 "clothes" and the language information 112 "stylish" are associated with the utterance content 113 "Those are really fashionable clothes!".
Alternatively, only sentences into which a character string can be inserted, such as "Those are really ○○ clothes!" (where ○○ indicates a position into which a character string can be inserted), may be stored as the utterance content 113 of the dialogue rule 110, and the utterance content determination unit 17 may determine the utterance content by inserting the language information into such a sentence. For example, when the language information is "elegant", the utterance content determination unit 17 may insert the language information into "Those are really ○○ clothes!" and use "Those are really elegant clothes!" as the utterance content.
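The template-based variant (inserting the acquired language information into a stored sentence with a placeholder) could look like the sketch below. The "{}" marker and the template text are assumptions standing in for the ○○ placeholder in the stored sentences.

# Sketch: build utterance content by inserting the language information into
# a template sentence. The "{}" placeholder is an assumption standing in for
# the marked insertion point of the stored sentence.
dialogue_templates = {
    "clothes": "Those are really {} clothes!",
}

def build_utterance(region: str, language_info: str) -> str:
    template = dialogue_templates.get(region, "That looks very {}.")
    return template.format(language_info)

# e.g. build_utterance("clothes", "elegant") -> "Those are really elegant clothes!"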
FIG. 14 is a diagram showing a specific example of the recommendation table 120. In the figure, 120 denotes the recommendation table, 121 denotes an area, 122 denotes language information, 123 denotes utterance content, and 124 denotes recommendation information. In the recommendation table 120, utterance content 123 and recommendation information 124 are stored in association with each combination of area 121 and language information 122. For example, in the data of No. 1, the area 121 "mouth" and the language information 122 "chic" are associated with the utterance content 123 "If you like calm colors, how about this lipstick?" and the recommendation information 124 "lipstick A". The recommendation information 124 may be associated not only with character strings but also with images, videos, web pages, and the like; for example, not only the character string "lipstick A" but also a product image, a commercial video, and a product introduction web page of that lipstick may be associated.
FIG. 15 is a flowchart showing the operation of the information presentation device 10. As shown in FIG. 15, the image input unit 11 inputs an image including the user's face (step S1). Specifically, the image input unit 11 is, for example, a camera and inputs an image including the user's face photographed by the camera.
Next, the area determination unit 12 determines each area of the facial organs from the image input in step S1 (step S2). Specifically, the face detection unit 12a detects the user's face from the image, the facial organ detection unit 12b detects the facial organs (eyes, mouth, nose, chin, and the like) from the face area detected by the face detection unit 12a, and the clothing area specification unit 12c specifies the area of the user's clothes from the image.
Next, the color information extraction unit 13 extracts color information for each area specified in step S2 (step S3). Specifically, the color information extraction unit 13 performs color reduction processing on each of the facial organ areas, such as the eye area and the mouth area, and on the clothing area, and extracts one or more pieces of color information representing each area.
Next, the collation unit 16 collates the color language table 14 on the basis of the color information extracted in step S3 and acquires language information and its related words (step S4). Specifically, the collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted in step S3 with the single colors or combination colors of the color language table 14, and specifies a single color or combination color in the color language table 14 for which the difference between the color values is equal to or less than the fixed threshold (in the case of a combination color, for which the differences between the color values of all the single colors are equal to or less than the fixed threshold). The collation unit 16 then acquires from the color language table 14 the language information associated with the combination color specified as identical or similar to the extracted combination color.
Next, the utterance content determination unit 17 determines the utterance content on the basis of the language information acquired in step S4 (step S5). Specifically, the utterance content determination unit 17 determines the utterance content and recommendation information on the basis of the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15. When there are a plurality of search results from the dialogue rule 110 or the recommendation table 120, the utterance content determination unit 17 selects one at random or according to a predetermined priority order. The priority order may, for example, rank the areas. For example, because the eyes give a strong impression even between humans, the language information associated with the color information of the eye area may be given the highest priority. Also, when the information that the information presentation device 10 wants to recommend relates to a specific area, for example when the information to be recommended is a lipstick, the language information associated with the color information of the mouth area may be given the highest priority.
Next, the output unit 18 outputs the utterance content and the recommendation information determined in step S5 (step S6). Specifically, the output unit 18 outputs a sound indicating the utterance content from the speaker and displays characters indicating the utterance content and the recommendation information on the display device. When the information presentation device 10 includes a microphone for inputting the user's voice, the user's voice may be input and taken into account, and the processing may be repeated from step S1 (continuing the dialogue).
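Putting steps S1 to S6 together, the overall flow of FIG. 15 can be sketched as a simple pipeline. Every function passed in below merely stands in for the corresponding unit (image input 11, area determination 12, color extraction 13, collation 16, utterance determination 17, output 18) and is assumed to be implemented along the lines of the earlier sketches.

# Sketch of the overall flow of FIG. 15 (steps S1-S6). The callables stand in
# for the units of the device and are assumed to exist.
def present_information(capture_image, determine_regions, extract_colors,
                        look_up_language, decide_utterance, output):
    image = capture_image()                                  # S1: input image of the user
    regions = determine_regions(image)                       # S2: face organs and clothes
    colors = {name: extract_colors(crop)                     # S3: colour info per area
              for name, crop in regions.items()}
    language = look_up_language(colors)                      # S4: collate colour-language table
    utterance, recommendation = decide_utterance(language)   # S5: utterance and recommendation
    output(utterance, recommendation)                        # S6: speak and/or display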
As described above, the information presentation device 10 (20) according to the present invention enables a natural dialogue, as between humans, at the introduction of a dialogue with the user. In particular, the information presentation device 10 (20) extracts color information for each facial organ and for the clothes, and determines the utterance content on the basis of the language information associated with that color information. It is therefore possible to start a conversation with a first-time user on the basis of the first impression received from that user, just as people meeting for the first time do.
Preferred embodiments of the information presentation device and the like according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is obvious that a person skilled in the art could conceive of various changes and modifications within the scope of the technical ideas disclosed in the present application, and these naturally belong to the technical scope of the present invention.
(Second Embodiment)
FIG. 16 is a diagram showing an example of the system configuration of an information processing system 2001. The information processing system 2001 includes a robot 2002, which is an embodiment of the information processing apparatus according to the present invention, a computer 2003, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007. The robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, and the display 2007 are communicatively connected to the computer 2003, and the computer 2003 controls these devices so that their operations cooperate. The communication connections between the computer 2003 and the robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, and the display 2007 may be either wired or wireless.
The camera 2004 acquires video data of the surroundings. The camera 2004 is, for example, a single camera, a stereo camera, a distance sensor, or a composite sensor combining these. Video data can also be acquired with a camera 2210 built into the robot 2002 instead of the camera 2004; in that case the camera 2004 may be omitted.
The microphone 2005 acquires audio data of the surroundings. The microphone 2005 is, for example, a single microphone or a microphone array composed of a plurality of microphones. Audio data can also be acquired with a microphone 2211 built into the robot 2002; in that case the microphone 2005 may be omitted.
The odor sensor 2006 acquires odor data of the surroundings. The odor sensor 2006 is, for example, a single sensor of a quartz-oscillator type or a semiconductor type, or a sensor array composed of a plurality of sensors. Odor data can also be acquired with an odor sensor 2212 built into the robot 2002; in that case the odor sensor 2006 may be omitted.
In the present embodiment, the robot 2002 extracts the features of a person using a plurality of different types of data, namely the video data acquired from the camera 2004 (or the camera 2210), the audio data acquired from the microphone 2005 (or the microphone 2211), and the odor data acquired from the odor sensor 2006 (or the odor sensor 2212), and specifies the classification of the person (attributes of the person such as preferences, gender, age, language, and nationality) on the basis of the extracted features. The robot 2002 then performs processing according to the specified classification (processing such as dialogue, information guidance, and advertisement).
The display 2007 is controlled by the computer 2003 and displays video and the like. The display 2007 may be omitted when display control is performed using a display unit 2204 mounted on the robot 2002, or when the robot 2002 executes processing such as dialogue, information guidance, and advertisement by voice alone.
The computer 2003 acquires video data with the camera 2004, acquires audio data with the microphone 2005, acquires odor data with the odor sensor 2006, controls the display of the display 2007, and also controls the operation of the robot 2002.
The robot 2002 (information processing apparatus) is, for example, an interactive robot. As shown in FIG. 17, the robot 2002 is configured by connecting a control unit 2201, a storage unit 2202, an input unit 2203, a display unit 2204, a media input/output unit 2205, a communication I/F 2206, a movable part drive mechanism 2207, a peripheral device I/F unit 2208, an audio processing unit 2209, a camera 2210, a microphone 2211, an odor sensor 2212, a speaker 2213, and various sensors 2214 via a bus 2220.
The control unit 2201 is composed of a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU loads programs stored in the storage unit 2202, the ROM, a recording medium, or the like into a work memory area on the RAM, executes them, and drives and controls the units connected via the bus 2220.
The CPU of the control unit 2201 executes all the processing performed by the robot 2002. The ROM permanently holds programs such as the boot program and BIOS of the robot 2002, data, and the like. The RAM temporarily holds the loaded programs and data and provides a work area used by the control unit 2201 for various kinds of processing.
The storage unit 2202 stores the programs executed by the control unit 2201, the data necessary for program execution, an operating system, and the like. These program codes are read out by the control unit 2201 as necessary, transferred to the RAM, and read and executed by the CPU.
The storage unit 2202 also holds a classification table 2050 (FIG. 24) that is referred to when specifying the classification of a person, a processing database 2060 (FIG. 25) that is referred to when executing processing according to the classification, and the like.
The storage unit 2202 further has an action DB (not shown) that stores a control sequence table describing the control sequences of the robot 2002. The control sequence table stores data linking every action performed by the robot 2002 with a movable part control sequence of the robot 2002. A movable part control sequence is a list of the drive locations and values of the movable parts to be controlled when realizing each action (for example, a head tilt angle of 20 degrees, a right arm tilt angle of 40 degrees, and so on).
The input unit 2203 includes, for example, an input device such as a touch panel and various operation buttons, and outputs the input data to the control unit 2201.
The display unit 2204 is composed of a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays the display information input under the control of the control unit 2201 on the display device. The touch panel of the input unit 2203 is configured integrally with the display of the display unit 2204.
The media input/output unit 2205 is a drive device for recording media such as magnetic disks, optical disks, and semiconductor memories, and inputs and outputs data.
The communication I/F 2206 is an interface that mediates communication with a network 2008. The network 2008 includes a LAN (Local Area Network), a WAN (Wide Area Network) connected over a wider area, public communication lines such as the Internet, base stations, and the like. The communication connection with the network 2008 may be either wired or wireless. The robot 2002 can access Web servers via the network 2008 and transmit and receive various programs and data.
The movable part drive mechanism 2207 includes mechanisms and drive devices for driving the movable parts provided on the head, neck, shoulders, arms, waist, legs, and the like of the robot 2002. The operation of the movable part drive mechanism 2207 is controlled by the control unit 2201. The control unit 2201 reads a movable part control sequence from the action DB of the storage unit 2202, sends control signals to the movable part drive mechanisms 2207, and operates the drive units.
The peripheral device I/F (interface) unit 2208 is a port for connecting peripheral devices and mediates the transmission and reception of data with the peripheral devices. The connection with the peripheral devices may be either wired or wireless. The robot 2002 can be communicatively connected to the computer 2003 and the like via the peripheral device I/F unit 2208.
The audio processing unit 2209 includes a speech synthesis unit, a language discrimination unit, a speech recognition unit, and the like. The speech synthesis unit generates synthesized speech data corresponding to character information (text data) on the basis of a command from the control unit 2201 and outputs the synthesized speech from the speaker 2213. The synthesized speech is generated on the basis of the correspondence between character information and speech data stored in advance in an utterance database of the storage unit 2202.
The language discrimination unit discriminates the language spoken by a person by executing spoken-language discrimination processing on the input speech data of the person using, for example, a language discrimination model that statistically models the differences between languages in the types of phonemes, the phoneme arrangement patterns of words, prosodic patterns, vocabulary, and the like.
The speech recognition unit recognizes the utterance content of the person by executing speech recognition processing on the language-discriminated speech data using statistical models for speech recognition (an acoustic model and a language model) prepared for each language.
The camera 2210 is built into the robot 2002 and is, for example, a single camera such as a CCD (Charge-Coupled Device) camera, a stereo camera, a distance sensor, or a composite sensor combining these, and acquires video data of the surroundings. The acquired video data is sent to the control unit 2201.
The microphone 2211 is built into the robot 2002 and is, for example, a single microphone or a microphone array composed of a plurality of microphones, and acquires audio data of the surroundings. The acquired audio data is sent to the control unit 2201.
The odor sensor 2212 is built into the robot 2002 and is, for example, a single sensor of a quartz-oscillator type or a semiconductor type, or a sensor array composed of a plurality of sensors, and acquires odor data of the surroundings. The acquired odor data is sent to the control unit 2201.
The speaker 2213 outputs the synthesized speech generated by the speech synthesis unit of the audio processing unit 2209.
The various sensors 2214 include sensors such as a 3D sensor, a touch sensor, a gyro sensor, a sonar sensor, a laser sensor, a bumper sensor, and an infrared sensor. Each sensor inputs its detected signal to the control unit 2201.
The bus 2220 is a path that mediates the exchange of control signals, data signals, and the like.
The computer 2003 is a control device that controls the operation of the entire information processing system 2001; it acquires video data with the camera 2004, acquires audio data with the microphone 2005, acquires odor data with the odor sensor 2006, controls the display of the display 2007, and also controls the operation of the robot 2002.
As shown in FIG. 18, the computer 2003 is configured by connecting a control unit 2301, a storage unit 2302, a media input/output unit 2303, a peripheral device I/F unit 2304, an input unit 2306, a display unit 2307, and a communication I/F 2308 via a bus 2309.
The control unit 2301 is composed of a CPU, a ROM, a RAM, and the like. The CPU loads programs stored in the storage unit 2302, the ROM, a recording medium, or the like into a work memory area on the RAM, executes them, and drives and controls the units connected via the bus 2309.
In the processing executed by the robot 2002 (FIG. 26), the CPU of the control unit 2301 performs photographing control of the camera 2004, audio acquisition control of the microphone 2005, odor acquisition control of the odor sensor 2006, and display control of the display 2007.
The ROM permanently holds programs such as the boot program and BIOS of the computer 2003, data, and the like. The RAM temporarily holds the loaded programs and data and provides a work area used by the control unit 2301 for various kinds of processing.
The storage unit 2302 stores the programs executed by the control unit 2301, the data necessary for program execution, an operating system, and the like. These program codes are read out by the control unit 2301 as necessary, transferred to the RAM, and read and executed by the CPU.
The media input/output unit 2303 is a drive device for recording media such as magnetic disks, optical disks, and semiconductor memories, and inputs and outputs data.
The peripheral device I/F (interface) unit 2304 is a port for connecting peripheral devices, and data is transmitted to and received from the peripheral devices via the peripheral device I/F unit 2304. The connection with the peripheral devices may be either wired or wireless. The robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, the display 2007, and the like can be communicatively connected to the computer 2003 via the peripheral device I/F unit 2304.
The input unit 2306 is an input device such as a keyboard, a mouse, or a touch panel, and outputs the input data to the control unit 2301.
The display unit 2307 is composed of a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays the display information input under the control of the control unit 2301 on the display device. The touch panel of the input unit 2306 is configured integrally with the display of the display unit 2307. The display unit 2307 may also be the external display 2007. The control unit 2301 sends display information to the display unit 2307 or the display 2007 for display.
The communication I/F 2308 is an interface that mediates communication with the network 2008. The computer 2003 can transmit and receive various programs and data to and from Web servers on the network 2008.
The bus 2309 is a path that mediates the exchange of control signals, data signals, and the like.
Next, the functional configuration of the robot 2002 (information processing system 2001) will be described with reference to FIG. 19.
The robot 2002 (information processing system 2001) has the functions of a person specification unit 2020, a video acquisition unit 2021, an audio acquisition unit 2022, an odor acquisition unit 2023, a feature extraction unit 2024, a classification specification unit 2025, a processing unit 2026, and an action unit 2029. The processing of each function is executed by a program. This program may be stored in advance in the storage unit 2202 of the robot 2002 or in the storage unit 2302 of the computer 2003, or may be downloadable and installable via the network 2008.
The person specification unit 2020 specifies the target person for whom the robot 2002 performs processing such as dialogue, information guidance, and advertisement. The target person is preferably a person who has shown interest in the robot 2002 or who wants to interact with the robot 2002, for example a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
A person who has spoken to the robot 2002 is specified, for example, by detecting agreement between the direction of arrival of the person's speech estimated with the microphone 2211 built into the robot 2002 or the external microphone 2005 and the direction of the person's face in the video data acquired with the camera 2210 built into the robot 2002 or the external camera 2004.
A person who has touched the robot 2002 is specified, for example, by detecting a person close to the robot 2002 from the video data acquired with the camera 2210 built into the robot 2002 or the external camera 2004 and detecting the person's contact motion with a touch sensor provided on the robot 2002.
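As one hedged illustration of the first check, the estimated direction of arrival of the speech can be compared against the horizontal direction of each detected face; the angular tolerance below is an assumption, and the direction-estimation itself is assumed to be provided by the microphone array and the camera processing.

# Sketch: identify the person who spoke to the robot by matching the estimated
# direction of arrival (DOA) of the speech with the horizontal direction of
# each detected face. The 15-degree tolerance is an assumption.
def find_speaker(doa_degrees, face_directions, tolerance=15.0):
    # face_directions: {person_id: horizontal angle of the face in degrees}
    best_id, best_diff = None, tolerance
    for person_id, angle in face_directions.items():
        diff = abs((doa_degrees - angle + 180.0) % 360.0 - 180.0)  # wrap to [0, 180]
        if diff <= best_diff:
            best_id, best_diff = person_id, diff
    return best_id  # None when no face direction agrees with the speech direction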
 なお対象人物は、ロボット2002に話しかけてきた人物や、ロボット2002に触れてきた人物以外にも、単にロボット2002に接近してきた人物や単に映像データ中に映っている人物でもよい。
 また、映像データ中に複数の人物が含まれている場合には、対象人物を複数の人物を含む1つの集団として特定してもよい。この場合、映像データ中の複数の人物間の位置や動きの関係性に基づいて、どの人物とどの人物が同一の集団に属するかを判別することで、対象人物(1つの集団)を特定する。
Note that the target person may be a person who is approaching the robot 2002 or a person who is simply reflected in the video data, in addition to a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
Further, when a plurality of persons are included in the video data, the target person may be specified as one group including a plurality of persons. In this case, a target person (one group) is specified by determining which person and which person belong to the same group based on the relationship between positions and movements among a plurality of persons in the video data. .
The video acquisition unit 2021 acquires video data containing the person identified by the person identification unit 2020, using the camera 2210 built into the robot 2002 or the external camera 2004.
The voice acquisition unit 2022 acquires voice data of the person identified by the person identification unit 2020, using the microphone 2211 built into the robot 2002 or the external microphone 2005. When the microphone 2211 or the microphone 2005 is a microphone array whose directivity can be controlled, it is desirable to point the directivity of the microphone toward the person's face and acquire the person's voice data.
The odor acquisition unit 2023 acquires odor data of the person identified by the person identification unit 2020, using the odor sensor 2212 built into the robot 2002 or the external odor sensor 2006. When the odor sensor 2212 or the odor sensor 2006 is a sensor array whose directivity can be controlled, it is desirable to point the directivity of the sensor toward the person and acquire the person's odor data.
The feature extraction unit 2024 extracts features of the person identified by the person identification unit 2020, using a plurality of different types of data. Specifically, it extracts the person's features for each type of data, using the video data acquired by the video acquisition unit 2021, the voice data acquired by the voice acquisition unit 2022, and the odor data acquired by the odor acquisition unit 2023.
Here, the feature extracted from the video data is called the "first feature", the feature extracted from the voice data the "second feature", and the feature extracted from the odor data the "third feature". Specific examples of the first, second, and third features are given below.
(First feature)
The first feature is a feature of the person extracted from the video data and includes, as shown in FIG. 20, for example "motion features" (motion 1, motion 2, ...), "appearance features" (appearance 1, appearance 2, ...), and "reaction features" (reaction 1, reaction 2, ...).
A "motion feature" is a feature related to the person's movement, for example the way the person walks (walking speed, stride, arm swing, posture, and so on) obtained by applying motion analysis to the video data. When the target is a group, it is the average walking characteristics of the persons belonging to the group and the characteristics of the positional relationship among them (one rank abreast, two ranks abreast, single file, and so on).
An "appearance feature" is a feature related to the person's appearance, for example the person's gender, age (age group), eye color, hair color, and so on obtained by applying face image analysis to the video data. A known technique (for example, JP 2012-053813 A) may be used to recognize a person's gender, age (age group), and the like from a face image.
Facial expressions and emotions may also be extracted as appearance features. Expressions are classified into several patterns such as joy, surprise, anger, sadness, and no expression, and these patterns can be read from how far the corners of the mouth are raised, the positional relationships of the cheeks, eyes, eyebrows, and so on.
Emotion can be estimated from the read expression, using, for example, the methods disclosed in JP H10-255043 A and JP 2014-206903 A.
The type of clothing of the person (suit, dress, folk costume, ...), obtained by applying known clothing identification analysis to the video data, may also be extracted as an appearance feature.
When the target is a group, the feature extraction unit 2024 extracts the average appearance feature of the persons belonging to the group.
A "reaction feature" is a feature related to the person's reaction to information presented to the person by the robot 2002. The presentation of information includes, for example, displaying information such as weather information and news on the display unit 2204 or the display 2007, and also greetings made to the person. For example, synthesized speech corresponding to a predetermined greeting ("Good morning", "Hello", "How are you?", ...) is generated by the voice processing unit 2209 and output (read aloud) from the speaker 2213. Alternatively, the movable part control sequence of the action corresponding to a predetermined greeting motion ("wave a hand", "bow", ...) is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive unit is operated.
The feature extraction unit 2024 then performs motion analysis, expression analysis, and the like on the video data acquired by the video acquisition unit 2021 after the information is presented, and extracts the person's reaction to the presentation ("smiles", "looks serious", "approaches", "waves back", "no reaction", and so on) as a feature.
When the target is a group, the feature extraction unit 2024 extracts the average reaction feature of the persons belonging to the group.
(Second feature)
The second feature is a feature of the person extracted from the voice data, for example, as shown in FIG. 21, "voice strength", "voice pitch", "frequency characteristics", "voice quality" (bright voice, dark voice, deep voice, thin voice, and so on), "speech tempo" (speaking slowly, speaking hurriedly, and so on), "overtone characteristics" (characteristics related to the softness of the sound), "intonation", and "reaction sound features" (features of utterances such as "hmm", "uh-huh", and "er" that show some reaction to what the other party says). These are non-verbal aspects of speech: voice features that stem from the person's individual and physical characteristics, and voice features in which the person's intentions and emotions appear.
Linguistic information in the speech, such as the "spoken language" and the "utterance content", may also be extracted as features. For example, the "spoken language" is extracted by the voice processing unit 2209 of the robot 2002 by analyzing the person's voice data with a language discrimination model that statistically models differences among languages in phoneme types, phoneme arrangement patterns of words, prosodic patterns, vocabulary, and so on. The "utterance content" is extracted by the voice processing unit 2209 of the robot 2002 by applying speech recognition processing to the voice data input from the microphone 2211.
(Third feature)
The third feature is a feature of the person extracted from the odor data and includes, as shown in FIG. 22, for example the "odor type" (type 1, type 2, ...) and the "odor intensity" (intensity 1, intensity 2, ...).
The odor type is identified by analysis with an odor discrimination model that has been trained in advance on a large amount of odor data whose types are known. The odor intensity can be calculated from, for example, the absolute value of the odor data (input values) acquired by the odor sensor 2006 (or the odor sensor 2212).
The classification identification unit 2025 identifies the person's classification 2051 based on the features described above (the first, second, and third features) extracted by the feature extraction unit 2024.
A classification 2051 is a grouping of persons by an arbitrary attribute (preference, gender, age group, language, nationality, etc.). As shown in the classification table 2050 of FIG. 24, each classification 2051 (classification 1, classification 2, ...) is defined by first, second, and third features. Which features define each classification 2051 is determined statistically by extracting the features (first, second, and third) from a large number of persons whose classification 2051 is known in advance and learning from them. This makes it possible to grasp statistically what features (first, second, and third) persons belonging to each classification 2051 (classification 1, classification 2, ...) tend to have.
For example, in the example of FIG. 24, it can be seen that persons belonging to "classification 1" statistically tend to have the features "motion 3", "appearance 10", "reaction 6", ... as the first feature, "(voice) strength 2", "(voice) pitch 9", "frequency characteristic 1", ... as the second feature, and "(odor) type 5", "(odor) intensity 4", ... as the third feature.
Likewise, persons belonging to "classification 2" statistically tend to have the features "motion 5", "appearance 2", "reaction 1", ... as the first feature, "(voice) strength 4", "(voice) pitch 7", "frequency characteristic 3", ... as the second feature, and "(odor) type 3", "(odor) intensity 1", ... as the third feature.
Persons belonging to "classification 3" statistically tend to have the features "motion 2", "appearance 7", "reaction 3", ... as the first feature, "(voice) strength 6", "(voice) pitch 1", "frequency characteristic 4", ... as the second feature, and "(odor) type 2", "(odor) intensity 8", ... as the third feature.
The classification identification unit 2025 refers to the classification table 2050 and identifies the person's classification based on the features extracted by the feature extraction unit 2024. For example, for all the features extracted by the feature extraction unit 2024, it tallies which classification 2051 (classification 1, classification 2, ...) each feature belongs to, and identifies the classification 2051 with the largest count as the person's classification.
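To make the tallying concrete, the following is a minimal Python sketch under the assumption that the learned classification table 2050 can be represented as sets of feature labels; the labels reuse the example values of FIG. 24, and the table contents and function name are hypothetical illustrations, not part of the embodiment.

```python
# Hypothetical stand-in for the learned classification table 2050:
# each classification is defined by the feature labels that persons of
# that classification statistically tend to show (values from FIG. 24).
CLASSIFICATION_TABLE = {
    "classification 1": {"motion 3", "appearance 10", "reaction 6",
                         "voice strength 2", "voice pitch 9", "frequency characteristic 1",
                         "odor type 5", "odor intensity 4"},
    "classification 2": {"motion 5", "appearance 2", "reaction 1",
                         "voice strength 4", "voice pitch 7", "frequency characteristic 3",
                         "odor type 3", "odor intensity 1"},
    "classification 3": {"motion 2", "appearance 7", "reaction 3",
                         "voice strength 6", "voice pitch 1", "frequency characteristic 4",
                         "odor type 2", "odor intensity 8"},
}

def identify_classification(extracted_features):
    """Tally, for every extracted feature, which classification it belongs to
    and return the classification with the largest count."""
    counts = {name: 0 for name in CLASSIFICATION_TABLE}
    for feature in extracted_features:
        for name, defining_features in CLASSIFICATION_TABLE.items():
            if feature in defining_features:
                counts[name] += 1
    return max(counts, key=counts.get)
```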
The processing unit 2026 performs processing based on the processing data 2061 of the classification 2051 identified by the classification identification unit 2025. "Processing" includes processing in which the robot 2002 acts as the person's conversation partner and responds to them (dialogue and the like) and processing in which the robot 2002 unilaterally provides information to the person (advertisement and the like).
FIG. 25 shows the contents of the processing database 2060 that holds the processing data 2061. As shown in the figure, the processing database 2060 holds processing data 2061 (processing data 1, processing data 2, ...) for each classification 2051 (classification 1, classification 2, ...).
The processing data 2061 is, for example, video data, audio data, text data, motion data, or any combination of these.
When the processing data 2061 is video data, the video is displayed on the display unit 2204 of the robot 2002 or on the display 2007. When it is audio data, the audio is output from the speaker 2213 of the robot 2002. When it is text data, the voice processing unit 2209 generates synthesized speech for the text and outputs it from the speaker 2213 (or the text is displayed on the display unit 2204 of the robot 2002 or on the display 2007). When the processing data 2061 is motion data, the movable part control sequence corresponding to each action contained in the motion data is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive unit is operated.
The processing data 2061 is set arbitrarily in advance by the system designer or the like. For example, if the classifications 2051 (classification 1, classification 2, ...) are grouped by "language", such as classification 1 = "English" and classification 2 = "French", dialogue data (audio data) for each language can be set as the processing data 2061. If the classifications 2051 are grouped by "age group", such as classification 1 = "teens" and classification 2 = "twenties", information aimed at each age group (for example, information on products that are currently popular with that age group) can be set.
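As a rough illustration only, the processing database 2060 could be organized as a lookup from classification to processing data; the language grouping and the utterance strings below are hypothetical examples, not contents defined by the embodiment.

```python
# Hypothetical processing database 2060: one piece of processing data per
# classification, here assuming the classifications are grouped by language.
PROCESSING_DATABASE = {
    "classification 1": {"type": "text", "content": "Hello, may I help you?"},        # English
    "classification 2": {"type": "text", "content": "Bonjour, puis-je vous aider ?"},  # French
}

def execute_processing(classification):
    """Look up the processing data for the identified classification and act on it."""
    data = PROCESSING_DATABASE[classification]
    if data["type"] == "text":
        # In the embodiment, the text would be turned into synthesized speech by the
        # voice processing unit 2209 and output from the speaker 2213; here we just print.
        print(data["content"])
```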
Next, an example of the flow of processing executed by the robot 2002 will be described with reference to FIG. 26.
The robot 2002 first identifies the target person for processing such as dialogue, information guidance, or advertisement (step S2000).
Next, the robot 2002 extracts the features of the person identified in step S2000, using a plurality of different types of data.
First, the robot 2002 acquires video data and extracts the person's first feature (step S2001). For example, as described above, the person's "motion features", "appearance features", "reaction features", and so on (see FIG. 20) are extracted from the video data.
When a "reaction feature" is to be extracted, predetermined information is presented to the person before step S2001, and in step S2001 the person's reaction to that presentation is extracted as a feature.
Next, the robot 2002 acquires voice data and extracts the person's second feature (step S2002). For example, as described above, the person's "voice strength", "voice pitch", "frequency characteristics", "voice quality", "speech tempo", "overtone characteristics", "intonation", "reaction sounds", "spoken language", "utterance content", and so on (see FIG. 21) are extracted from the voice data.
Further, the robot 2002 acquires odor data and extracts the person's third feature (step S2003). For example, as described above, the person's "odor type", "odor intensity", and so on (see FIG. 22) are extracted from the odor data.
Next, the robot 2002 refers to the classification table 2050 and identifies the person's classification 2051 based on the person's features (first, second, and third) extracted in steps S2001 to S2003 (step S2004). For example, for all the features extracted in steps S2001 to S2003, it tallies which classification 2051 (classification 1, classification 2, ...) each feature belongs to, and identifies the classification 2051 with the largest count as the person's classification.
Suppose, for example, that "motion 3", "appearance 2", and "reaction 6" are extracted as the first feature, "(voice) strength 2", "(voice) pitch 7", and "frequency characteristic 1" are extracted as the second feature, and "(odor) type 5" and "(odor) intensity 8" are extracted as the third feature, as shown in FIG. 23.
In this case, referring to the classification table 2050 of FIG. 24, five of the extracted features ("motion 3", "reaction 6", "(voice) strength 2", "frequency characteristic 1", and "(odor) type 5") belong to "classification 1", two ("appearance 2" and "(voice) pitch 7") belong to "classification 2", and one ("(odor) intensity 8") belongs to "classification 3". Since the largest number of features belong to "classification 1", the robot 2002 identifies "classification 1" as the person's classification.
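Under the same assumptions as the earlier sketch, this worked example can be reproduced as follows.

```python
# Features extracted in steps S2001 to S2003 in the example of FIG. 23
# (same hypothetical labels as in the sketch above).
features = {"motion 3", "appearance 2", "reaction 6",
            "voice strength 2", "voice pitch 7", "frequency characteristic 1",
            "odor type 5", "odor intensity 8"}

# Tally: classification 1 gets 5 votes, classification 2 gets 2, classification 3 gets 1,
# so "classification 1" is identified as the person's classification.
print(identify_classification(features))  # -> "classification 1"
```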
The robot 2002 then refers to the processing database 2060, acquires the processing data 2061 of the classification 2051 identified in step S2004, and executes predetermined processing based on that processing data 2061 (step S2005). For example, when the identified classification 2051 is "classification 1", the robot 2002 executes processing based on "processing data 1" (see FIG. 25), which corresponds to "classification 1".
Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples.
For example, although the information processing apparatus according to the present invention has been described in the form of a robot, the applicable form is not limited to a robot; it can also be applied to vending machines, digital signage, and the like.
In FIG. 19, the various functions and data of the information processing system 2001 (the person identification unit 2020, video acquisition unit 2021, voice acquisition unit 2022, odor acquisition unit 2023, feature extraction unit 2024, classification identification unit 2025, processing unit 2026, classification table 2050, processing database 2060, and so on) are provided in the control unit 2201 and the storage unit 2202 in the robot 2002, but these units and data may instead be provided in the computer 2003, which is provided separately from the robot 2002.
In steps S2001 to S2003 of FIG. 26, it is not always possible to extract the person's features from all three types of data (video, voice, and odor). For example, when the person does not speak, no features can be extracted from the voice data; in that case, the person's classification may be identified from the features extracted from the video data and the odor data. When the person is far away from the robot 2002, no features can be extracted from the odor data; in that case, the person's classification may be identified from the features extracted from the video data and the voice data.
In the processing of FIG. 26, if the classification 2051 identified by the robot 2002 for a person turns out to be wrong, the classification table 2050 may be retrained. Specifically, after step S2004 of FIG. 26, a step is added in which the robot 2002 asks the person to confirm whether the identified classification 2051 is correct. For example, suppose the classifications 2051 (classification 1, classification 2, ...) are grouped by "nationality" and the classification 2051 identified for the person is "France". In this case, the robot 2002 asks the person a question that confirms the identified classification 2051, such as "Which country are you from?". If the person answers with a classification 2051 different from the identified one ("France"), for example "I came from Italy", the robot 2002 judges the identified classification 2051 ("France") to be wrong, and retrains the classification table 2050 using the correct classification 2051 ("Italy") and the features (first, second, and third) extracted in steps S2001 to S2003 as correct training data.
It is obvious that those skilled in the art can conceive of various changes and modifications within the scope of the technical idea disclosed in the present application, and these naturally belong to the technical scope of the present invention.
(Third embodiment)
<Guidance robot control system 3100>
FIG. 27 is a diagram showing an example of a scene suited to operation of the guidance robot control system 3100 according to the third embodiment.
The guidance robot control system 3100 is a system that uses a communication robot 3001 (guidance robot). The communication robot 3001 is placed in a space 3000S, for example inside an exhibition hall or a store. The communication robot 3001 is desirably placed at a position from which it can overlook the space 3000S.
The communication robot 3001 is, for example, a humanoid robot about 1 m tall.
The communication robot 3001 acquires and analyzes frame images containing the users P in the space 3000S (users P1 to P3 in the example of FIG. 27), decides which user P to call out to, and calls out to (guides) that user P. In the example described below, the communication robot 3001 infers that a user P who keeps looking around is having some kind of trouble, and decides that this user is the one to call out to.
<Communication robot 3001>
FIG. 28 is a functional block diagram of the communication robot 3001 according to the third embodiment.
FIG. 29 is a diagram showing an example of the guidance data storage unit 3022 of the communication robot 3001 according to the third embodiment.
As shown in FIG. 28, the communication robot 3001 includes a control unit 3010, a storage unit 3020, a camera 3031 (imaging unit), a speaker 3032, and a motor unit 3033.
The control unit 3010 is a central processing unit (CPU) that controls the whole communication robot 3001. The control unit 3010 executes various functions in cooperation with the hardware described above by reading and executing, as appropriate, the operating system (OS) and application programs stored in the storage unit 3020.
The control unit 3010 includes an image acquisition unit 3011 (image acquisition means), an image analysis unit 3012, a user determination processing unit 3013, a guidance data extraction unit 3015, a movement control unit 3016 (movement control means), a guidance unit 3017 (guidance means), and an expression comparison unit 3018 (expression comparison means).
The image acquisition unit 3011 acquires a plurality of frame images via the camera 3031. The acquired frame images are consecutive in time series.
The image analysis unit 3012 analyzes the frame images acquired by the image acquisition unit 3011. The image analysis unit 3012 includes a direction determination unit 3012a (direction determination means), a distance calculation unit 3012b (distance calculation means), an expression determination unit 3012c (expression determination means), and an attribute determination unit 3012d (attribute determination means).
For each user P contained in a frame image, the direction determination unit 3012a determines the direction in which that user P is looking.
For each user P contained in a frame image, the distance calculation unit 3012b calculates the distance between the communication robot 3001 and that user P.
For each user P contained in a frame image, the expression determination unit 3012c determines the facial expression of that user P.
For each user P contained in a frame image, the attribute determination unit 3012d determines the attributes of that user P. Here, the attributes of a user P are, for example, age, gender, and the like.
The user determination processing unit 3013 determines the user P to call out to, based on the frame images analyzed by the image analysis unit 3012. The user determination processing unit 3013 includes a direction change amount calculation unit 3013a (direction change amount calculation means), a movement speed calculation unit 3013b (movement speed calculation means), a movement direction calculation unit 3013c (movement direction calculation means), and a guidance judgment unit 3013d (guidance judgment means).
The direction change amount calculation unit 3013a calculates the direction change amount, which is the magnitude of the change over time in the direction in which the user P is looking, as determined by the direction determination unit 3012a.
The movement speed calculation unit 3013b calculates the movement speed based on the change over time in the distance of the user P calculated by the distance calculation unit 3012b.
The movement direction calculation unit 3013c calculates the movement direction based on the change over time in the distance of the user P calculated by the distance calculation unit 3012b.
The guidance judgment unit 3013d decides which user P to call out to, based on the direction change amount and the like.
The guidance data extraction unit 3015 extracts the guidance data to be presented to the determined user P from the guidance data storage unit 3022.
The movement control unit 3016 moves the communication robot 3001 so that it approaches the determined user P, based on the movement direction calculated by the movement direction calculation unit 3013c.
The guidance unit 3017 outputs the guidance data extracted by the guidance data extraction unit 3015 via the speaker 3032.
The expression comparison unit 3018 compares the facial expressions determined by the expression determination unit 3012c before and after the guidance.
Details of each process will be described later.
The storage unit 3020 is a storage area, such as a semiconductor memory element, for storing the programs, data, and the like that the control unit 3010 needs in order to execute the various processes.
The storage unit 3020 includes a program storage unit 3021, a guidance data storage unit 3022, and an image data storage unit 3023.
The program storage unit 3021 is a storage area that stores programs. The program storage unit 3021 stores a control program 3021a (program) for executing the various functions of the control unit 3010 described above.
The guidance data storage unit 3022 is a storage area that stores the guidance data uttered by the communication robot 3001. As shown in FIG. 29, the guidance data storage unit 3022 stores the guidance data for calling out in association with the facial expression and attributes of the user P, and stores the next guidance data in association with the facial expression of the user P after the call-out (a sketch of this layout follows below).
Returning to FIG. 28, the image data storage unit 3023 is a storage area that stores the frame images acquired by the image acquisition unit 3011.
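As noted above, a hypothetical sketch of how the guidance data storage unit 3022 might be laid out; the expressions, attributes, and phrases are illustrative stand-ins for the contents of FIG. 29, not the actual stored data.

```python
# Hypothetical layout of the guidance data storage unit 3022.
GUIDANCE_DATA = {
    # (expression, attribute) -> first call-out utterance
    ("negative", "child"): "Are you lost? I can help you find your family.",
    ("negative", "adult"): "Is there something I can help you find?",
    ("positive", "adult"): "Welcome! Would you like some information?",
}

NEXT_GUIDANCE = {
    # expression observed after the first call-out -> follow-up utterance
    "negative": "Please don't worry. Let's look for a staff member together.",
    "positive": "I'm glad I could help. Enjoy your visit!",
}
```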
The camera 3031 is an imaging device. The camera 3031 is provided, for example, at a position corresponding to the eyes of the communication robot 3001, and photographs subjects in the direction the communication robot 3001 is facing.
The speaker 3032 is an audio output device. The speaker 3032 is provided, for example, at a position corresponding to the mouth of the communication robot 3001, and outputs sound based on instructions from the control unit 3010.
The motor unit 3033 is a set of motors that make the communication robot 3001 move. The motor unit 3033 includes a head motor 3033a and a foot motor 3033b.
The head motor 3033a is a motor that moves the head of the communication robot 3001. The head motor 3033a is controlled by the control unit 3010 and moves the head up, down, left, and right so that the various users P in the space 3000S can be photographed as subjects.
The foot motor 3033b is located in the feet of the communication robot 3001 and is a motor for moving the communication robot 3001.
<Processing of the communication robot 3001>
Next, the processing of the communication robot 3001 will be described.
FIG. 30 is a flowchart showing the robot control processing in the communication robot 3001 according to the third embodiment.
FIG. 31 is a flowchart showing the image analysis processing in the communication robot 3001 according to the third embodiment.
FIG. 32 is a flowchart showing the user determination processing in the communication robot 3001 according to the third embodiment.
FIG. 33 is a flowchart showing the post-guidance processing in the communication robot 3001 according to the third embodiment.
In step S (hereinafter "S") 3010 of FIG. 30, the control unit 3010 (image acquisition unit 3011) of the communication robot 3001 performs image acquisition processing. In the image acquisition processing, the control unit 3010 constantly acquires frame images via the camera 3031 and determines whether an acquired frame image contains one or more users P. When it determines that a frame image contains one or more users P, the control unit 3010 stores the frame images acquired at appropriate intervals (for example, every 0.1 seconds) in the image data storage unit 3023.
In S3011, the control unit 3010 determines whether the processing condition has been met. The processing condition is, for example, that a certain number of frame images (for example, 30) have been stored in the image data storage unit 3023, or that a prescribed time (for example, 3 seconds) has elapsed. When it determines that the processing condition has been met (S3011: YES), the control unit 3010 moves the processing to S3012. Otherwise (S3011: NO), the control unit 3010 moves the processing to S3010.
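A minimal sketch of this buffering and trigger check, assuming hypothetical capture_frame() and contains_person() helpers and the example thresholds of 30 frames or 3 seconds.

```python
import time

FRAME_LIMIT = 30       # example: process once 30 frames are buffered
TIME_LIMIT = 3.0       # example: or once 3 seconds have elapsed
SAMPLE_INTERVAL = 0.1  # example: store a frame roughly every 0.1 s

def collect_frames(capture_frame, contains_person):
    """Buffer frames containing at least one user until the processing
    condition (frame count or elapsed time) is met, then return them."""
    frames, start = [], time.time()
    while len(frames) < FRAME_LIMIT and time.time() - start < TIME_LIMIT:
        frame = capture_frame()        # hypothetical camera access
        if contains_person(frame):     # hypothetical person detector
            frames.append(frame)
        time.sleep(SAMPLE_INTERVAL)
    return frames
```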
In S3012, the control unit 3010 (image analysis unit 3012) performs image analysis processing to analyze the frame images stored in the image data storage unit 3023.
The image analysis processing is described here with reference to FIG. 31.
In S3030 of FIG. 31, the control unit 3010 extracts one frame image from the image data storage unit 3023.
In S3031, the control unit 3010 (direction determination unit 3012a) determines, for each user P contained in the frame image, the direction in which that user P is looking. For example, when the frame image contains two users P1 and P2, the control unit 3010 determines the direction user P1 is looking and the direction user P2 is looking. The control unit 3010 can, for example, detect a face image in the frame image and determine the orientation of the face based on the arrangement positions of the parts (organs) of the face; the determined face orientation can then be taken as the direction in which the user P is looking.
There are various known techniques for finding the positions of the facial organs (eyes, nose, mouth, etc.); for example, a facial organ detection technique using a regression model can be used. A facial organ detection technique using a regression model is described, for example, in US Patent Application Publication No. 2014/0185924.
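The following is a rough geometric sketch of turning detected landmark positions into a horizontal face direction; it assumes an external landmark detector (such as the regression-model organ detection cited above) already supplies eye and nose coordinates, and it is not the method of the cited publication.

```python
def estimate_yaw(left_eye, right_eye, nose_tip):
    """Very rough horizontal face direction from landmark layout:
    if the nose tip is offset toward one eye, the face is turned that way.
    Inputs are (x, y) pixel coordinates from a facial landmark detector."""
    eye_center_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_span = abs(right_eye[0] - left_eye[0]) or 1.0
    # Normalized offset in roughly [-1, 1]: 0 means facing the camera,
    # negative/positive means turned one way or the other (sign convention is arbitrary).
    return (nose_tip[0] - eye_center_x) / (eye_span / 2.0)
```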
In S3032, the control unit 3010 (distance calculation unit 3012b) calculates, for each user P contained in the frame image, the distance between that user P and the communication robot 3001. The control unit 3010 can calculate a rough distance between the user P and the communication robot 3001 based on, for example, the size of the user P in the frame image, the background of the frame image, and so on.
In S3033, the control unit 3010 (expression determination unit 3012c) determines the facial expression of each user P contained in the frame image. From the shapes of the eyes, nose, mouth, and so on, the control unit 3010 can determine, for example, whether the expression is a smile (a positive expression) or a sad face (a negative expression).
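A rough sketch of the distance estimate from the apparent size of the person in the frame, using a pinhole-camera relation; the assumed average body height and focal length are placeholders, not parameters of the embodiment.

```python
ASSUMED_PERSON_HEIGHT_M = 1.6   # placeholder average body height
FOCAL_LENGTH_PX = 800.0         # placeholder camera focal length in pixels

def estimate_distance(person_bbox_height_px):
    """Approximate distance (m) from the pixel height of the person's
    bounding box, using the pinhole relation h_px = f * H / Z."""
    if person_bbox_height_px <= 0:
        return float("inf")
    return FOCAL_LENGTH_PX * ASSUMED_PERSON_HEIGHT_M / person_bbox_height_px
```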
In S3034, the control unit 3010 determines whether all of the frame images to be processed in the image data storage unit 3023 have been processed. If all of them have been processed (S3034: YES), the control unit 3010 ends this processing and returns to the processing of FIG. 30. Otherwise (S3034: NO), the control unit 3010 moves the processing to S3035.
In S3035, the control unit 3010 extracts one unprocessed frame image from the frame images to be processed in the image data storage unit 3023, and then moves the processing to S3031.
Returning to FIG. 30, in S3013 the control unit 3010 (user determination processing unit 3013) performs the user determination processing.
The user determination processing is described here with reference to FIG. 32.
In S3040 of FIG. 32, the control unit 3010 arranges the frame images that have undergone the image analysis processing in time series.
In S3041, the control unit 3010 (direction change amount calculation unit 3013a) calculates the direction change amount for each user P based on the rearranged frame images. Here, the direction change amount expresses the magnitude of the change over time in the direction in which the user P is looking. Specifically, when the user P is looking around restlessly, the degree of looking around is high and the control unit 3010 calculates a large direction change amount.
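One way to sketch the direction change amount is as the accumulated frame-to-frame change of the estimated face direction over the time-ordered frames; estimate_yaw() here refers to the hypothetical helper sketched earlier.

```python
def direction_change_amount(yaw_per_frame):
    """Sum of absolute frame-to-frame changes in the estimated face direction
    (e.g. estimate_yaw() per frame). A user who keeps looking around yields
    a large value; a user looking steadily in one direction yields a small one."""
    return sum(abs(b - a) for a, b in zip(yaw_per_frame, yaw_per_frame[1:]))
```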
In S3042, the control unit 3010 (movement speed calculation unit 3013b) calculates the movement speed of each user P based on the rearranged frame images. The control unit 3010 can calculate the movement speed from the changes in the position of the user P across the frame images. The control unit 3010 may instead simply determine whether the movement speed is faster than a prescribed range.
In S3043, the control unit 3010 (movement direction calculation unit 3013c) calculates the movement direction of each user P based on the rearranged frame images. The control unit 3010 can calculate the movement direction from the changes in the position of the user P across the frame images.
In S3044, the control unit 3010 (user determination processing unit 3013) determines, for each user P, the facial expression that appeared most often in the rearranged frame images.
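Given per-frame positions of a user (for example, derived from the estimated distance and the horizontal position in the frame), the movement speed and direction could be computed as sketched below; the coordinate convention and frame interval are assumptions.

```python
import math

def movement_speed_and_direction(positions, frame_interval_s=0.1):
    """positions: time-ordered list of (x, y) coordinates of the user on the
    floor plane, in meters, in the robot's coordinate frame (assumed).
    Returns (speed in m/s, direction in radians) over the observed span."""
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    dx, dy = x1 - x0, y1 - y0
    elapsed = frame_interval_s * (len(positions) - 1)
    speed = math.hypot(dx, dy) / elapsed if elapsed > 0 else 0.0
    direction = math.atan2(dy, dx)
    return speed, direction
```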
In S3045, the control unit 3010 (guidance judgment unit 3013d) calculates a call-out priority based on the indices above.
The call-out priority is a degree calculated by weighting the direction change amount, the position of the user P, the movement speed, and the facial expression. In this example, users P who appear to be in trouble are called out to preferentially, so the weight on the direction change amount is made the largest. The control unit 3010 weights the indices so that the call-out priority becomes highest when the direction change amount is large, the movement speed is low, and the expression is negative. The control unit 3010 also treats a user P as a call-out candidate only when the distance between that user P and the communication robot 3001 is within a prescribed range (for example, 10 m).
In S3046, the control unit 3010 (guidance judgment unit 3013d) decides which user P to call out to, based on the call-out priority.
Note that when a user P's direction change amount is small, the priority takes a small value even if the movement speed is low and the expression is negative. In such a case, that is, when the priority is below a predetermined value, the control unit 3010 may judge that there is no user P to call out to.
The control unit 3010 then ends this processing and returns to the processing of FIG. 30.
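A minimal sketch of the weighted call-out priority and the final decision described in S3045 and S3046; the weights, score normalizations, and threshold are illustrative choices, with the direction change amount weighted most heavily as stated above.

```python
# Illustrative weights: the direction change dominates, a negative expression and a
# low moving speed raise the priority, and users beyond the allowed distance are excluded.
W_DIRECTION, W_SPEED, W_EXPRESSION = 0.6, 0.2, 0.2
MAX_DISTANCE_M = 10.0      # example prescribed range from the description
PRIORITY_THRESHOLD = 0.5   # illustrative cutoff for "nobody needs help"

def callout_priority(direction_change, speed, expression, distance):
    if distance > MAX_DISTANCE_M:
        return 0.0
    speed_score = 1.0 / (1.0 + speed)               # slower movement -> higher score
    expression_score = 1.0 if expression == "negative" else 0.0
    direction_score = min(direction_change, 1.0)    # clip to [0, 1]
    return (W_DIRECTION * direction_score
            + W_SPEED * speed_score
            + W_EXPRESSION * expression_score)

def decide_user(users):
    """users: list of dicts holding the per-user indices computed above.
    Returns the user with the highest priority, or None if nobody exceeds
    the threshold (i.e. no one seems to need help)."""
    best = max(users, key=lambda u: callout_priority(
        u["direction_change"], u["speed"], u["expression"], u["distance"]),
        default=None)
    if best is None:
        return None
    p = callout_priority(best["direction_change"], best["speed"],
                         best["expression"], best["distance"])
    return best if p >= PRIORITY_THRESHOLD else None
```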
Returning to FIG. 30, in S3014 the control unit 3010 (user determination processing unit 3013) determines whether a user P has been determined by the user determination processing. If a user P has been determined (S3014: YES), the control unit 3010 moves the processing to S3015. Otherwise (S3014: NO), the control unit 3010 ends this processing.
In S3015, the control unit 3010 (attribute determination unit 3012d) determines the attributes of the determined user P. Specifically, the control unit 3010 determines the approximate age and the like from the image of the user P extracted from the frame image. The control unit 3010 may simply determine whether the user is a child or an adult without determining the age; in that case, it may estimate the user's height from the image of the user P and determine whether or not the user is a child.
In S3016, the control unit 3010 (guidance data extraction unit 3015) refers to the guidance data storage unit 3022 (FIG. 29) and extracts guidance data based on the facial expression of the user P determined in S3044 of FIG. 32 and the attributes determined in S3015.
In S3017, the control unit 3010 (movement control unit 3016) performs movement processing. As the movement processing, the control unit 3010 controls the motor unit 3033 (foot motor 3033b) based on the movement speed and movement direction of the user P calculated in S3042 and S3043 of FIG. 32, and moves the communication robot 3001 so that it approaches the user P.
In S3018, the control unit 3010 (guidance unit 3017) performs guidance processing. As the guidance processing, the control unit 3010 outputs the guidance data extracted in S3016 via the speaker 3032 so as to speak to (call out to) the user P after moving.
In S3019, the control unit 3010 performs post-guidance processing.
The post-guidance processing is described here with reference to FIG. 33.
In S3050 of FIG. 33, the control unit 3010 (image acquisition unit 3011) acquires, via the camera 3031, a frame image containing the user P after the guidance data has been output.
In S3051, the control unit 3010 (expression determination unit 3012c) analyzes the acquired frame image and determines the facial expression.
In S3052, the control unit 3010 (expression comparison unit 3018) performs expression comparison processing. As the expression comparison processing, the control unit 3010 compares the facial expression before the guidance data was output (the result of S3044 in FIG. 32) with the facial expression obtained in S3051.
In S3053, the control unit 3010 (guidance data extraction unit 3015) extracts guidance data corresponding to the result of the expression comparison from the guidance data storage unit 3022 (FIG. 29).
In S3054, the control unit 3010 (guidance unit 3017) outputs, as guidance processing, the guidance data extracted in S3053 via the speaker 3032. The control unit 3010 then returns to the processing of FIG. 30.
Returning to FIG. 30, the control unit 3010 ends this processing.
<Description of a specific example>
A specific example using this guidance robot control system 3100 will now be described.
FIGS. 34A and 34B are diagrams showing a specific example of the guidance robot control system 3100 according to the third embodiment.
First, the control unit 3010 of the communication robot 3001 acquires frame images of a child C1 who is lost and looking around, as shown in FIG. 34A (S3010 in FIG. 30), and thereby judges the child to be a call-out target (YES in S3014 of FIG. 30).
Next, the control unit 3010 extracts, from the guidance data storage unit 3022 (see FIG. 29), the call-out guidance data for the case where the attribute is "child" and the expression is negative (S3016 in FIG. 30).
Then, as shown in FIGS. 34A and 34B, the control unit 3010 moves the robot toward the position of the child C1 (S3017 in FIG. 30) and outputs the extracted guidance data via the speaker 3032 (S3018 in FIG. 30).
In this way, the communication robot 3001 can infer the state of the child C1 from the child's movements and facial expression and call out in a way that matches the child's intent.
After that, the control unit 3010 acquires a frame image of the child C1, and if the child C1's expression is still negative as shown in FIG. 34C, the control unit 3010 extracts from the guidance data storage unit 3022 the post-call-out guidance data for that expression and outputs it via the speaker 3032 (S3019 in FIG. 30).
As described above, the guidance robot control system 3100 of the third embodiment has the following effects.
(1) The amount of change in the direction the user P is looking is calculated from a plurality of frame images. Since the direction change amount becomes large when the user P keeps looking around, the communication robot 3001 judges that this user P is in trouble and calls out to them. Thus, from the magnitude of the change in the looking direction, the system infers whether or not the user P is in trouble, and the communication robot 3001 can preferentially make a call-out that matches the intent of a user P inferred to be in trouble.
(2) When the position of the user P and the position of the communication robot 3001 are within a prescribed range, the communication robot 3001 calls out to the user P. The communication robot 3001 can therefore preferentially speak to users P who are nearby.
(3) When the movement speed of the user P is within a prescribed range, the communication robot 3001 calls out to the user P. The communication robot 3001 can therefore preferentially speak to a user P who is walking slowly or standing still.
(4) By estimating where the user P is moving to and approaching them, the communication robot 3001 can speak to the user P in a more natural way.
(5) The guidance data is selected based on the facial expression of the user P, and the communication robot 3001 calls out to the user P. The communication robot 3001 can therefore speak with guidance content that matches the user P's expression.
(6) Based on the change in the user P's expression before and after the call-out, the communication robot 3001 gives further guidance. The communication robot 3001 can therefore carry on a more communicative conversation.
(7) Attributes such as the age of the user P are determined from the frame images, guidance data corresponding to those attributes is selected, and the communication robot 3001 calls out to the user P. It can therefore speak with guidance content that matches the user P's attributes.
(8) The communication robot 3001 can perform all the processing from acquiring the frame images to outputting the guidance data. Only the communication robot 3001 therefore needs to be provided, and installation is easy.
(Fourth embodiment)
In the fourth embodiment, a communication robot that calls passers-by into a space is described. In the following description, parts that perform the same functions as in the third embodiment described above are given the same reference numerals, or reference numerals with the same suffix, and redundant descriptions are omitted as appropriate.
<Guidance robot control system 3200>
FIG. 35 is a diagram showing an example of a scene suitable for operation of the guidance robot control system 3200 according to the fourth embodiment.
FIG. 36 is a functional block diagram of the guidance robot control system 3200 according to the fourth embodiment.
FIG. 37 is a diagram showing an example of the guidance data storage unit 3222 of the communication robot 3201 according to the fourth embodiment.
The guidance robot control system 3200 places the communication robot 3201 at the entrance of a space 3200S such as an exhibition booth or the doorway of a store. The guidance robot control system 3200 is a system used by the communication robot 3201 to call a user P (in this example, user P4) to enter the space 3200S.
The guidance robot control system 3200 includes the communication robot 3201 and a camera 3205. The communication robot 3201 and the camera 3205 are installed near the entrance of the space 3200S.
The camera 3205 is, in this example, a surveillance camera. The camera 3205 acquires frame images of users P passing by the space 3200S and transmits the frame images to the communication robot 3201.
The communication robot 3201 analyzes the frame images of the users P received from the camera 3205, determines which user P to speak to, and provides guidance to that user P. For each user P passing in front of the camera 3205, the communication robot 3201 judges whether the user P seems interested in the space 3200S and speaks to him or her accordingly.
<Communication robot 3201>
As shown in FIG. 36, the communication robot 3201 includes a control unit 3210, a storage unit 3220, a speaker 3032, and a motor unit 3033.
The control unit 3210 includes an image reception unit 3211 (image acquisition means), an image analysis unit 3212, a user determination processing unit 3213, a face data collation unit 3214 (face collation means), a guidance data extraction unit 3215, a movement control unit 3016, a guidance unit 3017, and a facial expression comparison unit 3018.
The image reception unit 3211 accepts a frame image by receiving the frame image transmitted by the camera 3205.
The image analysis unit 3212 analyzes the frame image accepted by the image reception unit 3211. The image analysis unit 3212 includes a direction acquisition unit 3212a (user direction acquisition means), a distance calculation unit 3012b, a facial expression determination unit 3212c, and a face acquisition unit 3212e (face acquisition means).
The direction acquisition unit 3212a analyzes the frame image to identify each user P, determines the direction in which each user P is looking, and acquires that viewing direction.
The facial expression determination unit 3212c determines the facial expression of each user P based on the face data of that user P acquired by the face acquisition unit 3212e.
The face acquisition unit 3212e acquires, for each user P included in the frame image, the face data of that user P.
The user determination processing unit 3213 determines the user P to speak to based on changes over time in the frame images analyzed by the image analysis unit 3212. The user determination processing unit 3213 includes a movement speed calculation unit 3013b, a movement direction calculation unit 3013c, and a guidance determination unit 3213d.
The guidance determination unit 3213d determines the user P to speak to based on the viewing direction of the user P acquired by the direction acquisition unit 3212a, the movement direction calculated by the movement direction calculation unit 3013c, and the like.
The face data collation unit 3214 collates the face data stored in the face data storage unit 3224 with the face data acquired by the face acquisition unit 3212e.
The guidance data extraction unit 3215 extracts, from the guidance data storage unit 3222, the guidance data to be presented to the user P determined by the user determination processing unit 3213. When the face data collation unit 3214 finds a match, the guidance data extraction unit 3215 extracts guidance data different from the data output previously.
The storage unit 3220 includes a program storage unit 3021, a guidance data storage unit 3222, an image data storage unit 3023, and a face data storage unit 3224.
The program storage unit 3021 stores a control program 3221a for executing the various functions of the control unit 3210 described above.
As shown in FIG. 37, the guidance data storage unit 3222 stores guidance data for speaking to a user in association with facial expressions of the user P. It also stores the next guidance data in association with the facial expression of the user P after being spoken to.
The face data storage unit 3224 is a storage area for storing face data. When guidance data is output, the face data storage unit 3224 stores the face data acquired by the face acquisition unit 3212e in association with the output guidance data.
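For illustration only, the two storage units can be pictured as small look-up tables. The Python sketch below is a minimal assumption of such a layout; the expression labels and phrases are hypothetical examples and are not taken from FIG. 37.

```python
# Minimal sketch, assuming in-memory dictionaries stand in for the storage units.
# Expression labels and phrases are hypothetical, not the actual contents of FIG. 37.
from typing import Dict, List

# Guidance data storage unit 3222: call-out phrases keyed by the user's facial expression,
# and follow-up phrases keyed by the expression observed after the first call-out.
CALL_DATA: Dict[str, List[str]] = {
    "positive": ["Welcome! Today's exhibit starts here."],
    "neutral":  ["Please drop in at the store."],
    "negative": ["May I help you find something?"],
}
FOLLOWUP_DATA: Dict[str, List[str]] = {
    "positive": ["Great, let me show you around!"],
    "negative": ["No problem, have a nice day."],
}

# Face data storage unit 3224: face identifier -> guidance phrases already output to that person.
FACE_HISTORY: Dict[str, List[str]] = {}

def remember(face_id: str, phrase: str) -> None:
    """Store the association created when guidance data is output (cf. S3219)."""
    FACE_HISTORY.setdefault(face_id, []).append(phrase)
```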
<Camera 3205>
The camera 3205 is, for example, a live camera such as a web camera. The camera 3205 may also be a stereo camera. A stereo camera is a camera that captures images simultaneously through a plurality of lenses at slightly shifted positions and, by producing parallax, can acquire information about the depth of an object.
The camera 3205 includes an image acquisition unit 3251 and an image transmission unit 3252.
The image acquisition unit 3251 is a control unit that acquires frame images.
The image transmission unit 3252 is a control unit that transmits the acquired frame images via the communication network 3000N.
The communication robot 3201 and the camera 3205 are communicably connected via the communication network 3000N. The communication network 3000N is, for example, an Internet line or a mobile communication network. The communication network 3000N may also use, for example, short-range wireless communication based on the Bluetooth (registered trademark) standard.
<Processing of the communication robot 3201>
Next, the processing of the communication robot 3201 will be described.
FIG. 38 is a flowchart showing the robot control processing in the communication robot 3201 according to the fourth embodiment.
FIG. 39 is a flowchart showing the image analysis processing in the communication robot 3201 according to the fourth embodiment.
FIG. 40 is a flowchart showing the user determination processing in the communication robot 3201 according to the fourth embodiment.
In S3210 of FIG. 38, the control unit 3210 (image reception unit 3211) of the communication robot 3201 receives the frame image transmitted by the camera 3205. The control unit 3210 then stores the received frame image in the image data storage unit 3023.
In S3211, the control unit 3210 (image analysis unit 3212) performs image analysis processing for analyzing the frame image stored in the image data storage unit 3023.
The image analysis processing is described here with reference to FIG. 39.
In S3230 of FIG. 39, the control unit 3210 (direction acquisition unit 3212a) detects the users P included in the frame image and acquires the direction in which each user P is looking. For example, when two users P1 and P2 are included in the frame image, the control unit 3210 detects users P1 and P2 and acquires the viewing direction of each user.
There are various known methods for detecting the user P; for example, a moving-object detection method can be used. In this method, a frame image in which no user P appears is stored in advance as background image data, and the user P, being a moving object, is detected by taking the difference between the background image data and the frame image to be processed. Human detection using HOG features can also be used; it is described in "N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In CVPR, pages 886-893, 2005".
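As a concrete illustration of the background-difference and HOG-based approaches mentioned above, the following Python sketch uses OpenCV. It is an assumption for explanatory purposes, not the detector actually used in the embodiment; the input file name and the foreground-overlap heuristic are hypothetical.

```python
# Sketch of user detection in a frame image, assuming OpenCV (cv2) is available.
import cv2

# Moving-object detection: a learned background model stands in for the stored background image.
# In practice the subtractor would be fed successive frames so the background is learned first.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

# HOG-based person detector with OpenCV's default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_users(frame):
    """Return bounding boxes of people detected in one frame (hypothetical helper)."""
    fg_mask = bg_subtractor.apply(frame)              # difference from the background model
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    # Keep only HOG detections that overlap foreground pixels (simple illustrative heuristic).
    kept = []
    for (x, y, w, h) in boxes:
        if cv2.countNonZero(fg_mask[y:y + h, x:x + w]) > 0.1 * w * h:
            kept.append((x, y, w, h))
    return kept

frame = cv2.imread("frame.jpg")                       # hypothetical input frame
if frame is not None:
    print(detect_users(frame))
```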
As the direction in which the user P is looking, the control unit 3210 acquires, for example, the orientation of the user P's face. The user P's face is regarded as facing the front of the camera 3205 when the control unit 3210 (face acquisition unit 3212e) can detect the face using a face detection method. There are various known face detection methods; for example, face detection using Haar-like features can be used, as described in "P. Viola and M. J. Jones: 'Rapid Object Detection Using a Boosted Cascade of Simple Features', Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 511-518, (2001)".
The control unit 3210 may instead detect a face image in the frame image, extract the eyes from it, and acquire the direction in which the user P is looking based on the relative positions of the eyeball and the pupil.
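A minimal sketch of the rule "the user is facing the front if a frontal face is detected", assuming OpenCV's bundled Haar cascade; the cascade choice and parameters are illustrative and not those of the embodiment.

```python
# Sketch: treat the user as facing the camera when a frontal face is detected.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def is_facing_camera(frame) -> bool:
    """True if at least one frontal face is found, i.e. the user looks toward the camera."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```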
The processing of S3231 and S3232 is the same as the processing of S3032 and S3033 in the third embodiment (FIG. 31). In S3232, the control unit 3210 (face acquisition unit 3212e) acquires the face data, and the control unit 3210 (facial expression determination unit 3212c) determines the facial expression based on the acquired face data. The control unit 3210 then returns to the processing of FIG. 38.
Returning to FIG. 38, in S3212 the control unit 3210 (user determination processing unit 3213) performs the user determination processing.
The user determination processing is described here with reference to FIG. 40.
In S3240 of FIG. 40, the control unit 3210 arranges the frame images that have undergone the image analysis processing in time series.
In S3241, the control unit 3210 (direction acquisition unit 3212a) acquires the viewing direction of each user P based on the rearranged frame images. Here, when the direction in which a user P is looking is the same across the frame images, the control unit 3210 acquires that direction as the viewing direction. Therefore, for example, when the user P is looking around restlessly, the control unit 3210 does not acquire a viewing direction.
The processing from S3242 to S3244 is the same as the processing from S3042 to S3044 in the third embodiment (FIG. 32). For calculating the movement direction of each user P, for example, human body tracking using the mean-shift feature can be used, as described in "Collins R.: 'Mean-Shift Blob Tracking through Scale Space', Proc. IEEE Conf. Computer Vision and Pattern Recognition, Wisconsin, USA, 2003, pp. 234-240".
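The following Python sketch illustrates mean-shift tracking of a detected person between two frames with OpenCV, from which a movement direction could then be derived; the hue-histogram choice and termination criteria are assumptions, not values given in the patent.

```python
# Sketch of mean-shift tracking of a detected person between frames, assuming OpenCV.
# The region of interest (x, y, w, h) would come from the person-detection step.
import cv2

def track_person(prev_frame, next_frame, roi):
    """Return the ROI shifted to the person's position in next_frame via mean shift."""
    x, y, w, h = roi
    hsv_roi = cv2.cvtColor(prev_frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])   # hue histogram of the person
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    hsv_next = cv2.cvtColor(next_frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv_next], [0], hist, [0, 180], 1)

    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    _ret, new_roi = cv2.meanShift(back_proj, (x, y, w, h), criteria)
    # The movement direction can be taken as the vector from the old ROI centre to the new one.
    return new_roi
```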
In S3245, the control unit 3210 (guidance determination unit 3213d) calculates a calling priority based on each indicator.
Here, the calling priority is a degree calculated by weighting each of the direction in which the user P is looking, the movement direction, the position of the user P, and the movement speed. In this example, the system preferentially speaks to a user P who is interested in the space 3200S but is about to pass by, so the weighting is highest when the direction in which the user P is looking differs from the movement direction and the user P is facing the front (the direction of the camera 3205). The control unit 3210 further weights the priority so that, when this condition is satisfied, a slower movement speed gives the highest calling priority. In addition, the control unit 3210 treats a user P as a candidate for being spoken to only when the distance between the user P and the communication robot 3201 is within a specified range (for example, 5 m).
In S3246, the control unit 3210 (guidance determination unit 3213d) determines the user P to speak to based on the calling priority.
Even if the direction in which the user P is looking differs from the movement direction and the distance between the user P and the communication robot 3201 is within the specified range, the control unit 3210 may determine that there is no user P to speak to when the user P is not facing the front.
Thereafter, the control unit 3210 ends this processing and returns to the processing of FIG. 38.
Returning to FIG. 38, in S3213 the control unit 3210 (user determination processing unit 3213) determines whether a user P has been determined by the user determination processing. If a user P has been determined (S3213: YES), the control unit 3210 advances the processing to S3214. Otherwise (S3213: NO), the control unit 3210 ends this processing.
In S3214, the control unit 3210 (face data collation unit 3214) collates the face data acquired in S3232 of FIG. 39 with the face data in the face data storage unit 3224.
In S3215, the control unit 3210 (face data collation unit 3214) determines whether a match was found. If a match was found (S3215: YES), the control unit 3210 advances the processing to S3216; otherwise (S3215: NO), it advances the processing to S3216a. A match is found when the user P has been spoken to in the past.
In S3216, the control unit 3210 refers to the face data storage unit 3224 and the guidance data storage unit 3222 and extracts, from the guidance data storage unit 3222, guidance data that corresponds to the facial expression of the user P and differs from the guidance data already output.
In S3216a, on the other hand, the control unit 3210 refers to the guidance data storage unit 3222 and extracts guidance data corresponding to the facial expression of the user P.
The processing of S3217 and S3218 is the same as the processing of S3017 and S3018 in the third embodiment (FIG. 30).
In S3219, the control unit 3210 associates the face data of the user P with the output guidance data and stores them in the face data storage unit 3224.
In S3220, the control unit 3210 performs post-guidance processing. The post-guidance processing is the same as in the third embodiment (FIG. 33). Thereafter, the control unit 3210 ends this processing.
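A minimal sketch of the branch between S3216 and S3216a: guidance data is chosen by expression, and phrases already spoken to a recognized face are skipped. The table contents and the selection-by-first-match rule are hypothetical.

```python
# Sketch of S3216 / S3216a: choose guidance data by expression, skipping phrases already
# output to a recognized face. Table contents are hypothetical examples.
from typing import Dict, List, Optional

GUIDANCE_DATA: Dict[str, List[str]] = {
    "neutral":  ["Please drop in at the store.", "You passed by a moment ago, didn't you?"],
    "positive": ["Welcome back!", "Shall I show you today's highlights?"],
}
ALREADY_OUTPUT: Dict[str, List[str]] = {}   # face data storage 3224: face_id -> output phrases

def select_guidance(expression: str, face_id: Optional[str] = None) -> Optional[str]:
    candidates = GUIDANCE_DATA.get(expression, [])
    if face_id is not None:                        # S3216: known face, avoid repeating phrases
        used = ALREADY_OUTPUT.get(face_id, [])
        candidates = [p for p in candidates if p not in used]
    return candidates[0] if candidates else None   # S3216a is the face_id=None path

def record_output(face_id: str, phrase: str) -> None:
    """S3219: remember what was said to this face."""
    ALREADY_OUTPUT.setdefault(face_id, []).append(phrase)
```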
<Description of a specific example>
Next, a specific example using the guidance robot control system 3200 will be described.
FIGS. 41A and 41B are diagrams showing a specific example of the guidance robot control system 3200 according to the fourth embodiment.
The communication robot 3201 and the camera 3205 are installed near the entrance of the space 3200S. The image acquisition unit 3251 of the camera 3205 acquires frame images as needed, and the image transmission unit 3252 transmits the acquired frame images to the communication robot 3201 as needed.
As shown in FIG. 41A, the control unit 3210 selects, as the target to speak to, the user P4 who is walking in front of the space 3200S with his or her face turned toward the communication robot 3201 (YES in S3213 of FIG. 38).
Next, the control unit 3210 determines whether the acquired face data is stored in the face data storage unit 3224. If it determines that the data is not stored, it extracts from the guidance data storage unit 3222 (see FIG. 37) the calling guidance data corresponding to the facial expression (S3216a in FIG. 38). The communication robot 3201 then outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 says to the user P4, "Please drop in at the store."
If the user P4 then stops and looks at the communication robot 3201, the robot speaks further (S3220 in FIG. 38).
After a while, when the same user P4 is again walking in front of the space 3200S with his or her face turned toward the communication robot 3201 as shown in FIG. 41B, the control unit 3210 selects the user P4 as the target to speak to (YES in S3213 of FIG. 38).
Next, the control unit 3210 determines whether the acquired face data is stored in the face data storage unit 3224. If it determines that this is the user P4 who was guided earlier, it extracts from the guidance data storage unit 3222 (see FIG. 37) calling guidance data that corresponds to the facial expression but differs from the previous data (S3216 in FIG. 38). The communication robot 3201 then outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 says to the user P4, "You passed by a moment ago, didn't you?"
As described above, the guidance robot control system 3200 of the fourth embodiment provides the following effects.
(1) The communication robot 3201 decides whether to speak to a user P based on the relationship between the direction in which the user P is looking and the movement direction of the user P. For example, a user P who is looking toward the space 3200S while passing in front of it can be presumed to be interested in the space 3200S, and the robot can speak to that user P.
(2) Since the communication robot 3201 can acquire the direction in which the user P is looking from whether the face of the user P can be detected, the processing can be performed easily.
(3) Since the communication robot 3201 stores the face data of the user P it has guided, it can output guidance data different from the previous data when it next guides the same user P. Because the guidance can be changed by using the face recognition function, the robot can speak differently to the same user.
(4) Since the communication robot 3201 performs the processing using frame images acquired by the camera 3205, the camera 3205 remains fixed and the position from which the frame images are acquired does not change even if the communication robot 3201 moves. This makes it easy, for example, to process a plurality of frame images arranged in time series.
Although the embodiments of the present invention have been described above, the present invention is not limited to the embodiments described above. The effects described in the embodiments are merely a list of the most preferable effects resulting from the present invention, and the effects of the present invention are not limited to those described in the embodiments. The embodiments described above and the modifications described below may be combined as appropriate, but detailed description of such combinations is omitted.
(Modifications)
(1) In the third embodiment the communication robot includes the camera, and in the fourth embodiment the communication robot receives frame images from the camera, but the invention is not limited to this.
For example, in the third embodiment the communication robot may receive frame images from a camera. In that case, the position of the camera is not limited to the vicinity of the communication robot, and the camera may be installed, for example, on the ceiling of the space. Frame images may also be acquired from a plurality of cameras.
In the fourth embodiment, a communication robot equipped with a camera may be used. In that case, to make the image processing easier, it is desirable that the communication robot does not move and does not change the position from which the frame images are acquired.
(2) In the third and fourth embodiments, the communication robot performs the various processes, but the invention is not limited to this. A server communicably connected to the communication robot may be provided, and the server may perform the various processes. In that case, the communication robot need only include at least a speaker that outputs the guidance data transmitted from the server and a motor unit for operating the communication robot. The various processes may also be shared between the communication robot and the server.
(3) In the third and fourth embodiments, frame images acquired at appropriate timings are used, but the invention is not limited to this. For example, video data may be used. In that case, a plurality of frame images may be acquired from the video data and the image analysis processing may be performed for each frame image.
(4) In the third and fourth embodiments, the image analysis processing is performed on all users included in the acquired frame image, but the invention is not limited to this. One user may be selected from the acquired frame image, and the image of the selected user may be analyzed to determine whether that user should be spoken to.
(5) In the third and fourth embodiments, after the guidance data is output, the user's facial expressions before and after the output are compared and further guidance data is output, but the invention is not limited to this. For example, the guidance data may continue to be output until the user leaves the communication robot.
(6) In the third embodiment, the direction in which the user is looking is taken to be the direction of the face, and the direction change amount is increased when the user is moving his or her head and looking around restlessly, but the invention is not limited to this. For example, the direction in which the user is looking may be the direction of the line of sight. For example, the control unit can detect a face image in the frame image and extract the eyes from it, thereby acquiring the gaze direction based on the relative positions of the eyeball and the pupil. In that case, a user whose movement direction differs from the face orientation and whose gaze is wandering can be selected as a target to speak to.
(7) In the fourth embodiment, a web camera or a stereo camera is used, but the invention is not limited to this. For example, a camera having a sensor that acquires depth information, typified by Kinect (registered trademark), may be used. With Kinect, the processing of detecting the user's face from the frame image and calculating the user's movement direction can be performed more easily.
(8) In the fourth embodiment, no processing related to the user's attributes is performed, but as in the third embodiment, the user's attributes may be determined and guidance data corresponding to the determined attributes may be extracted.
(9) In the fourth embodiment, when the communication robot determines that the user is the same as a user it has spoken to before, it outputs guidance data different from the previous data, but the invention is not limited to this. For example, as shown in FIG. 42, different guidance data may be output according to the movement direction of the user.
FIGS. 42A and 42B are diagrams showing a specific example of a guidance robot control system 3300 according to a modification.
FIG. 42A shows the case where the user P5, who is walking toward the space 3300S with his or her face turned toward the communication robot 3301 (camera 3305), is selected as the target to speak to. In this case, the communication robot 3301 outputs the guidance data corresponding to the movement direction, "Welcome. May I show you around?", and speaks to the user P5.
Next, FIG. 42B shows the case where the user P6, who has come out of the space 3300S with his or her face turned toward the communication robot 3301, is selected as the target to speak to. In this case, the communication robot 3301 outputs the guidance data corresponding to the movement direction, "Thank you very much", and speaks to the user P6.
In this way, the communication robot 3301 extracts and outputs guidance data corresponding to the movement direction of the user P. The communication robot 3301 can therefore speak differently to the user P5 entering the space 3300S and to the user P6 coming out of the space 3300S, and can give different, appropriate utterances according to the situation of the user P.
DESCRIPTION OF SYMBOLS
10, 20 Information presentation apparatus
11, 31 Image input unit
12, 42 Area determination unit
12a Face detection unit
12b Facial organ detection unit
12c Clothing area identification unit
13, 43 Color information extraction unit
14, 44 Color language table
14a Combination color ID
14b Single color IDs of color schemes 1 to 3
14c Language (language information)
14d Related terms
15, 45 Storage unit
16, 46 Collation unit
17, 47 Utterance content determination unit
18, 32 Output unit
21a, 21b Communication line
39a, 49a Transmission unit
39b, 49b Reception unit
30 Terminal device
40 Server
50 Image
51 Face area
52 Eye area
53 Mouth area
54 Clothing area
55, 56, 57, 81, 91 Combination color
55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, 57c, 61, 62, 63, 81a, 81b, 81c, 91a, 91b, 91c Single color
60, 80, 90 Color image scale
100 Single color table
100a Single color ID
100b Color value
110 Dialogue rule
111 Area
112 Language information
113 Utterance content
120 Recommendation table
121 Area
122 Language information
123 Utterance content
124 Recommendation information
2001 Information processing system
2002 Robot
2003 Computer
2004 Camera
2005 Microphone
2006 Odor sensor
2007 Display
2020 Person identification unit
2021 Video acquisition unit
2022 Voice acquisition unit
2023 Odor acquisition unit
2024 Feature extraction unit
2025 Classification identification unit
2026 Processing unit
2040 Person feature
2050 Classification table
2060 Processing database
3001, 3201, 3301 Communication robot
3010, 3210 Control unit
3011 Image acquisition unit
3012, 3212 Image analysis unit
3012a Direction determination unit
3012b Distance calculation unit
3012c, 3212c Facial expression determination unit
3012d Attribute determination unit
3013, 3213 User determination processing unit
3013a Direction change amount calculation unit
3013b Movement speed calculation unit
3013c Movement direction calculation unit
3013d, 3213d Guidance determination unit
3016 Movement control unit
3017 Guidance unit
3018 Facial expression comparison unit
3020, 3220 Storage unit
3021a, 3221a Control program
3022, 3222 Guidance data storage unit
3031, 3205, 3305 Camera
3032 Speaker
3033 Motor unit
3100, 3200, 3300 Guidance robot control system
3211 Image reception unit
3212a Direction acquisition unit
3212e Face acquisition unit
3214 Face data collation unit
3224 Face data storage unit
3251 Image acquisition unit
3252 Image transmission unit
C1 Child
P, P1 to P6 User
3000S, 3200S, 3300S Space

Claims (13)

1. An information presentation apparatus comprising:
storage means for storing a color language table in which color information and language information are associated with each other;
image input means for acquiring an input image;
area determination means for determining one or more predetermined areas from the image;
color information extraction means for extracting the color information from the areas; and
collation means for collating the color language table based on the color information extracted by the color information extraction means and acquiring the language information.
2. The information presentation apparatus according to claim 1, further comprising utterance content determination means for determining utterance content based on the language information.
3. The information presentation apparatus according to claim 1 or claim 2, wherein the color information extraction means performs color reduction processing on each of the areas and extracts one or more pieces of the color information representing each area.
4. The information presentation apparatus according to any one of claims 1 to 3, wherein the language information is an impression word expressing an impression.
5. The information presentation apparatus according to any one of claims 1 to 4, wherein the storage means stores related words relating to the language information in association with the language information, the apparatus further comprising related word acquisition means for acquiring the related words relating to the language information from the storage means.
6. The information presentation apparatus according to claim 5, wherein the related word acquisition means acquires related words from the Web via a network.
7. The information presentation apparatus according to claim 2, wherein the storage means further stores dialogue rules in which the language information and the utterance content are associated with each other, and the utterance content determination means determines the utterance content based on the dialogue rules.
8. The information presentation apparatus according to claim 2 or claim 7, wherein the storage means further stores a recommendation table in which the areas and the language information are associated with recommendation information to be recommended to a user, and the utterance content determination means further determines the recommendation information based on the recommendation table.
9. An information presentation method in which a computer that stores a color language table in which color information and language information are associated with each other interacts with a user, the method comprising:
an image input step of acquiring an input image;
an area determination step of determining a predetermined area from the image;
a color information extraction step of extracting the color information from the area; and
a collation step of collating the color language table based on the color information extracted in the color information extraction step and acquiring the language information.
10. A program causing a computer to function as the information presentation apparatus according to claim 1.
11. An information processing apparatus comprising:
storage means for storing features for each classification of persons;
extraction means for extracting features of a person;
identification means for identifying the classification of the person based on the extracted features; and
processing means for performing processing according to the identified classification.
12. A guidance robot control system comprising:
a guidance data storage unit that stores guidance data for users;
image acquisition means for acquiring a plurality of frame images continuous in time series;
direction determination means for analyzing the plurality of frame images acquired by the image acquisition means and determining, for each frame image, the direction in which a user is looking;
direction change amount calculation means for calculating, based on the directions determined by the direction determination means, a direction change amount representing the magnitude of the change in the direction over time;
guidance determination means for determining whether to provide guidance based on the change in direction calculated by the direction change amount calculation means; and
guidance means for extracting guidance data from the guidance data storage unit and providing guidance by a robot when the guidance determination means determines to provide guidance.
13. A guidance robot control system comprising:
a guidance data storage unit that stores guidance data for users;
image acquisition means for acquiring a plurality of frame images continuous in time series;
user direction acquisition means for analyzing the frame images acquired by the image acquisition means to identify a user and acquiring the direction in which the user is looking;
movement direction calculation means for analyzing the frame images acquired by the image acquisition means and calculating the movement direction of the user;
guidance determination means for determining whether to provide guidance based on the viewing direction of the user acquired by the user direction acquisition means and the movement direction of the user calculated by the movement direction calculation means; and
guidance means for extracting guidance data from the guidance data storage unit and providing guidance by a robot when the guidance determination means determines to provide guidance.
PCT/JP2016/081751 2015-11-16 2016-10-26 Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system WO2017086108A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
JP2015-224262 2015-11-16
JP2015224262A JP6657827B2 (en) 2015-11-16 2015-11-16 Information presentation device, information presentation method, and program
JP2016-045573 2016-03-09
JP2016045573A JP6607092B2 (en) 2016-03-09 2016-03-09 Guide robot control system, program, and guide robot
JP2016046761A JP6610349B2 (en) 2016-03-10 2016-03-10 Guide robot control system, program, and guide robot
JP2016-046761 2016-03-10
JP2016-065426 2016-03-29
JP2016065426A JP2017182261A (en) 2016-03-29 2016-03-29 Information processing apparatus, information processing method, and program

Publications (1)

Publication Number Publication Date
WO2017086108A1 true WO2017086108A1 (en) 2017-05-26

Family

ID=58718847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/081751 WO2017086108A1 (en) 2015-11-16 2016-10-26 Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system

Country Status (1)

Country Link
WO (1) WO2017086108A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007293399A (en) * 2006-04-21 2007-11-08 Seiko Epson Corp Image exchange device, image exchange method, and image exchange program
JP2009301140A (en) * 2008-06-10 2009-12-24 Nippon Telegr & Teleph Corp <Ntt> Text segmentation device, method, program, and computer-readable recording medium
JP2011221606A (en) * 2010-04-05 2011-11-04 Sony Corp Information processing method and graphical user interface
JP2011238019A (en) * 2010-05-11 2011-11-24 Yahoo Japan Corp Category processing device and method
JP2012161851A (en) * 2011-02-03 2012-08-30 Advanced Telecommunication Research Institute International Robot system and space formation recognizing device used in the same

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018229868A1 (en) * 2017-06-13 2018-12-20 富士通株式会社 Robot, control device, robot control method and robot control program
JP2019074865A (en) * 2017-10-13 2019-05-16 ロボットスタート株式会社 Conversation collection device, conversation collection system, and conversation collection method
JP7055327B2 (en) 2017-10-13 2022-04-18 ロボットスタート株式会社 Conversation collection device, conversation collection system and conversation collection method
JP2020071852A (en) * 2018-11-02 2020-05-07 深和パテントサービス株式会社 Information processing device, information processing system, learning device, learned estimation model, and method for collecting data for learning
JP7450241B2 (en) 2018-11-02 2024-03-15 深和パテントサービス株式会社 Information processing devices, information processing systems, and learning devices
JP2022526772A (en) * 2019-08-28 2022-05-26 北京市商▲湯▼科技▲開▼▲發▼有限公司 Interactive methods, devices, devices, and storage media
JP7224488B2 (en) 2019-08-28 2023-02-17 北京市商▲湯▼科技▲開▼▲發▼有限公司 Interactive method, apparatus, device and storage medium
CN112947402A (en) * 2019-11-25 2021-06-11 丰田自动车株式会社 Control system, control method, and program

Similar Documents

Publication Publication Date Title
US10810409B2 (en) Identifying facial expressions in acquired digital images
WO2017086108A1 (en) Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system
Wu et al. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies
Bettadapura Face expression recognition and analysis: the state of the art
Littlewort et al. Dynamics of facial expression extracted automatically from video
Lin et al. Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition
Varghese et al. Overview on emotion recognition system
KR102167760B1 (en) Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model
Hassanat Visual speech recognition
KR102351008B1 (en) Apparatus and method for recognizing emotions
CN113658254B (en) Method and device for processing multi-modal data and robot
Paleari et al. Features for multimodal emotion recognition: An extensive study
WO2010133661A1 (en) Identifying facial expressions in acquired digital images
JP2017182261A (en) Information processing apparatus, information processing method, and program
JP6972526B2 (en) Content providing device, content providing method, and program
Kumar et al. Facial emotion recognition and detection using cnn
Adelhardt et al. Multimodal user state recognition in a modern dialogue system
JP7322374B2 (en) ROBOT CONTROL DEVICE, ROBOT, ROBOT CONTROL METHOD AND PROGRAM
Lee Virtual representation of facial avatar through weighted emotional recognition
Rivera et al. Development of an automatic expression recognition system based on facial action coding system
KR102630872B1 (en) Apparatus and method for learning facial expression recognition
JP7195545B2 (en) Equipment for providing products and services with interactive functions
Naidoo South African Sign Language recognition using feature vectors and hidden markov models
Namuduri Human-Machine Interface Using Facial Gesture Recognition
Koch et al. One-shot lip-based biometric authentication: extending behavioral features with authentication phrase information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16866110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16866110

Country of ref document: EP

Kind code of ref document: A1