WO2017086108A1 - Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system - Google Patents

Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system Download PDF

Info

Publication number
WO2017086108A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
user
color
information
guidance
Prior art date
Application number
PCT/JP2016/081751
Other languages
French (fr)
Japanese (ja)
Inventor
伊藤 直之
靖寿 松葉
松本 征二
Original Assignee
大日本印刷株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2015224262A external-priority patent/JP6657827B2/en
Priority claimed from JP2016045573A external-priority patent/JP6607092B2/en
Priority claimed from JP2016046761A external-priority patent/JP6610349B2/en
Priority claimed from JP2016065426A external-priority patent/JP2017182261A/en
Application filed by 大日本印刷株式会社 filed Critical 大日本印刷株式会社
Publication of WO2017086108A1 publication Critical patent/WO2017086108A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Definitions

  • the present invention relates to an information presentation apparatus that performs a dialog with a user (human).
  • the present invention also relates to an information processing apparatus that performs processing according to a person's attribute (classification).
  • the present invention relates to a guidance robot control system and the like.
  • Patent Document 1 discloses an information presentation apparatus that receives voice input from a user, evaluates the user's reaction to its response processing based on the voice mode of the voice input, and changes the response content based on the evaluation result.
  • content to be uttered is determined in consideration of human utterance content, changes in facial expressions, and the like.
  • Patent Document 2 discloses an information processing apparatus that presents effective information by recognizing a person's age and gender and performing a presentation effect by voice or movement.
  • Patent Document 3 discloses an output control apparatus that controls the information to be output by referring to individual information determined based on the individual's image data. According to these prior arts, effective automated responses reflecting each person's information are realized.
  • A robot that provides a guidance service to visitors has also been disclosed (for example, Patent Document 4).
  • However, Patent Document 1 cannot realize a natural, human-like dialogue when starting a dialogue with a user (especially a first-time user).
  • In Patent Document 2, the age and sex of a person are discriminated only from an image.
  • Patent Document 3 also determines an attribute based only on the image data captured by the imaging unit. As described above, since the methods of Patent Documents 2 and 3 determine the attribute (classification) of a person based on a single type of data consisting only of images, there are few indexes for determining the attribute (classification), and in some cases the attribute (classification) of a person cannot be specified accurately. Further, the discrimination target is limited to attributes (classifications) that can be discriminated from an image alone.
  • Since the robot described in Patent Document 4 provides a guidance service according to a pattern related to the movement of a user such as a visitor, there was a possibility that the guidance service would be forcibly provided regardless of the user's intention.
  • The present invention has been made in view of the above-described problems, and an object of the present invention is to provide an information presentation apparatus and the like capable of a natural, human-like conversation when starting a conversation with a user.
  • The present invention has also been made from such a viewpoint, and another object is to provide an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the identified classification.
  • A further object of the present invention is to provide a guidance robot control system and the like that can call out to a user by voice in consideration of the user's intention.
  • A first invention for achieving the above object is an information presentation apparatus that interacts with a user, comprising: storage means that stores a color language table in which color information and language information are associated with each other; image input means that acquires an input image; area determination means that determines one or more predetermined areas from the image; color information extraction means that extracts the color information from the areas; and collation means that collates the color language table based on the color information extracted by the color information extraction means and acquires the language information. According to the first invention, a natural, human-like dialogue is possible when a dialogue with a user is started.
  • the first invention may further include speech content determining means for determining speech content based on the language information. This enables a dialogue based on the user's impression obtained from the color information.
  • the color information extracting means in the first invention may perform a color reduction process on each of the areas to extract one or a plurality of the color information representing the area. This makes it possible to accurately extract color information.
  • the language information in the first invention may be an impression word expressing an impression.
  • an accurate language that expresses the user's impression can be obtained.
  • the first invention may further include related word acquisition means for acquiring related words related to the language information. This makes it possible to obtain a wide range of languages that express user impressions.
  • The storage means in the first invention may further store a dialogue rule in which the area and the language information are associated with utterance content, and the utterance content determination means may determine the utterance content based on the dialogue rule.
  • The storage means in the first invention may further store a recommendation table in which the area and the language information are associated with recommendation information to be recommended to the user, and the utterance content determination means may further determine the recommendation information based on the recommendation table.
  • accurate recommendation information based on the user's impression can be presented.
  • A second invention is an information presentation method in which a computer storing a color language table, in which color information and language information are associated with each other, interacts with a user, the method comprising: an image input step of inputting the user's image; an area determination step of determining a predetermined area from the image; a color information extraction step of extracting the color information from the area; and a collation step of collating the color language table based on the color information extracted in the color information extraction step and acquiring the language information.
  • the third invention is a program that causes a computer to function as the information presentation device according to claim 1.
  • the information presentation apparatus of the first invention can be obtained by installing the program of the third invention on a general-purpose computer.
  • The present invention also provides an information processing apparatus comprising: storage means that stores features for each classification of persons; extraction means that extracts features of a person; specifying means that specifies the classification of the person based on the extracted features; and processing means that performs processing according to the specified classification.
  • Thereby, an information processing apparatus capable of suitably specifying a person's classification (person attributes such as preference, gender, age, language, and nationality) and executing processing according to the specified classification is provided.
  • the extraction unit extracts features using a plurality of different types of data.
  • the classification of a person can be suitably specified using a plurality of different types of data.
  • Desirably, the apparatus further includes video acquisition means for acquiring video data including the person, the extraction means extracts at least the features of the person from the video data, and the specifying means specifies the classification of the person based on at least those features. Accordingly, it is possible to suitably specify the classification of the person based on at least the features of the person extracted from the video data.
  • the extraction means may be configured to extract features related to the movement of the person.
  • the classification of the person can be suitably specified based on the characteristics of the movement of the person extracted from the video data.
  • an information presenting means for presenting information to a person may be further provided, and the extracting means may be configured to extract features related to the person's reaction to the information. Thereby, based on the characteristic of the reaction of the person at the time of presenting information, the person classification can be suitably specified.
  • Desirably, the apparatus further comprises sound acquisition means for acquiring sound data of the person, the extraction means extracts at least the features of the person from the sound data, and the specifying means identifies the classification of the person based on at least those features. Accordingly, it is possible to suitably specify the classification of the person based on at least the features of the person extracted from the audio data.
  • The extraction means may be configured to extract, as features, at least one of voice strength, voice pitch, frequency characteristics, voice quality, utterance tempo, harmonic characteristics, intonation, response sound characteristics, utterance language, and utterance content. Thereby, the features of a person are suitably extracted from the audio data.
  • Desirably, the apparatus further comprises odor acquisition means for acquiring odor data of the person, the extraction means extracts at least the features of the person from the odor data, and the specifying means identifies the classification of the person based on at least those features. Accordingly, it is possible to suitably specify the classification of the person based on at least the features of the person extracted from the odor data.
  • the classification of the person can be suitably specified based on the extracted feature.
  • A fifth invention is a guidance robot control system comprising: a guidance data storage unit that stores guidance data for a user; image acquisition means that acquires a plurality of frame images continuous in time series; direction determination means that analyzes the plurality of frame images acquired by the image acquisition means and determines, for each frame image, the direction the user is viewing; direction change amount calculation means that calculates, based on the directions determined by the direction determination means, a direction change amount representing the magnitude of the change in direction over time; guidance determination means that determines whether or not to provide guidance based on the direction change amount calculated by the direction change amount calculation means; and guidance means that, when the guidance determination means determines to provide guidance, extracts guidance data from the guidance data storage unit and performs guidance by a robot.
  • A sixth invention is the guidance robot control system of the fifth invention, further comprising distance calculation means that calculates the distance between the user and the robot based on the position information of the user in the frame images acquired by the image acquisition means, wherein the guidance determination means determines to provide guidance when the distance calculated by the distance calculation means is within a specified range.
  • A seventh invention is the guidance robot control system of the fifth or sixth invention, further comprising movement speed calculation means that calculates the moving speed of the user based on the position information of the user in the frame images acquired by the image acquisition means, wherein the guidance determination means determines to provide guidance when the moving speed calculated by the movement speed calculation means is within a specified range.
  • An eighth invention is the guidance robot control system of the seventh invention, wherein the robot is movable, the system further comprising movement direction calculation means that calculates the moving direction of the user based on the position information of the user in the frame images acquired by the image acquisition means, and movement control means that moves the robot based on the moving direction of the user calculated by the movement direction calculation means and the moving speed calculated by the movement speed calculation means.
  • A ninth invention is the guidance robot control system according to any one of the fifth to seventh inventions, wherein the direction determination means acquires the direction the user is viewing based on the relative position between the user's eyeball and pupil.
  • A tenth invention is a guidance robot control system further comprising face acquisition means for acquiring face data of the user, wherein the direction determination means detects facial organs based on the face data acquired by the face acquisition means, determines the direction of the face from their positional relationship, and thereby acquires the direction the user is looking at.
  • An eleventh invention is the guidance robot control system according to any one of the fifth to tenth inventions, further comprising facial expression determination means that analyzes at least one of the frame images acquired by the image acquisition means and determines the facial expression of the user, wherein the guidance data storage unit stores guidance data corresponding to facial expressions, and the guidance means refers to the guidance data storage unit, extracts guidance data based on the facial expression of the user determined by the facial expression determination means, and performs guidance.
  • A twelfth invention is the guidance robot control system of the eleventh invention, further comprising facial expression comparison means that, after guidance based on guidance data, executes at least the processing by the image acquisition means and the processing by the facial expression determination means and compares the facial expression of the user before guidance with the facial expression of the user after guidance, wherein the guidance means refers to the guidance data storage unit, extracts guidance data based on the comparison result by the facial expression comparison means, and performs guidance.
  • Another invention is a guidance robot control system further comprising attribute determination means that analyzes at least one of the frame images acquired by the image acquisition means and determines an attribute of the user, wherein the guidance data storage unit stores guidance data corresponding to user attributes, and the guidance means refers to the guidance data storage unit, extracts guidance data corresponding to the attribute of the user determined by the attribute determination means, and performs guidance.
  • A fourteenth invention is a guidance robot control system comprising: a guidance data storage unit that stores guidance data for a user; image acquisition means that acquires a plurality of time-series frame images; user direction acquisition means that analyzes the frame images acquired by the image acquisition means, identifies the user, and acquires the direction the user is viewing; movement direction calculation means that analyzes the frame images acquired by the image acquisition means and calculates the moving direction of the user; guidance determination means that determines whether or not to provide guidance based on the viewing direction of the user acquired by the user direction acquisition means and the moving direction of the user calculated by the movement direction calculation means; and guidance means that, when the guidance determination means determines to provide guidance, extracts guidance data from the guidance data storage unit and performs guidance by a robot.
  • a fifteenth aspect of the present invention is the guide robot control system according to the fourteenth aspect of the present invention, wherein the guide data storage unit stores guide data corresponding to a moving direction.
  • A sixteenth invention is the guidance robot control system of the fourteenth or fifteenth invention, further comprising distance calculation means for calculating the distance between the user and the robot, wherein the guidance determination means determines to provide guidance when the distance calculated by the distance calculation means is within a specified range.
  • A seventeenth invention is the guidance robot control system according to any one of the fourteenth to sixteenth inventions, further comprising movement speed calculation means for calculating the moving speed of the user, wherein the guidance determination means determines to provide guidance when the moving speed calculated by the movement speed calculation means is within a specified range.
  • An eighteenth invention is the guidance robot control system of the seventeenth invention, wherein the robot is movable, the system further comprising movement control means that moves the robot based on the moving direction calculated by the movement direction calculation means and the moving speed calculated by the movement speed calculation means.
  • A nineteenth invention is the guidance robot control system according to any one of the fourteenth to eighteenth inventions, further comprising face acquisition means that analyzes the frame images acquired by the image acquisition means and acquires face data of the user, wherein the user direction acquisition means acquires the direction the user is viewing based on the face data of the user acquired by the face acquisition means.
  • A further invention is a guidance robot control system wherein the user direction acquisition means detects facial organs based on the face data acquired by the face acquisition means and determines the direction of the face from their positional relationship to acquire the direction the user is looking at, or acquires the direction the user is looking at based on the relative position between the user's eyeball and pupil.
  • Another invention is a guidance robot control system further comprising: a face data storage unit that stores the face data of the user acquired by the face acquisition means when guidance data is output by the guidance means; and face collation means that collates the face data of the user acquired by the face acquisition means with the face data stored in the face data storage unit, wherein the guidance means extracts guidance data different from the previous guidance data from the guidance data storage unit and performs guidance when the face collation means successfully matches the acquired face data with face data stored in the face data storage unit.
  • A twenty-second invention is the guidance robot control system according to any one of the nineteenth to twenty-first inventions, further comprising facial expression determination means that determines the facial expression of the user based on the face data of the user acquired by the face acquisition means, wherein the guidance data storage unit stores guidance data corresponding to facial expressions, and the guidance means refers to the guidance data storage unit, extracts guidance data based on the facial expression of the user determined by the facial expression determination means, and performs guidance.
  • Another invention is a guidance robot control system further comprising facial expression comparison means that, after guidance based on guidance data, executes at least the processing by the image acquisition means, the processing by the face acquisition means, and the processing by the facial expression determination means, and compares the facial expression of the user before guidance with the facial expression of the user after guidance, wherein the guidance means refers to the guidance data storage unit, extracts guidance data based on the comparison result by the facial expression comparison means, and performs guidance.
  • Another invention is a guidance robot control system further comprising attribute determination means that analyzes at least one of the frame images acquired by the image acquisition means and determines an attribute of the user, wherein the guidance data storage unit stores guidance data corresponding to user attributes, and the guidance means refers to the guidance data storage unit, extracts guidance data corresponding to the attribute of the user determined by the attribute determination means, and performs guidance.
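  • As a hedged illustration of the guidance determination described in the claims above (not part of the patent; every threshold value below is an assumption), the following sketch guides the user only when the change in viewing direction, the distance to the robot, and the moving speed all fall within specified ranges.

```python
# Hedged illustration of the guidance determination: guide only when the change
# in the user's viewing direction over recent frames, the distance to the robot,
# and the moving speed all fall within specified ranges. All thresholds are
# illustrative assumptions, not values taken from the patent.
import math

def direction_change_amount(directions):
    """Sum of frame-to-frame changes (radians) in the viewing direction
    (angle wrap-around ignored for brevity)."""
    return sum(abs(b - a) for a, b in zip(directions, directions[1:]))

def should_guide(directions, user_pos, robot_pos, speed,
                 min_change=0.8, max_distance=3.0, max_speed=1.0):
    distance = math.dist(user_pos, robot_pos)
    return (direction_change_amount(directions) >= min_change   # user is looking around
            and distance <= max_distance                        # user is near the robot
            and speed <= max_speed)                             # user is not hurrying past
```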
  • According to the present invention, it is possible to provide an information presentation apparatus and the like capable of a natural, human-like dialogue when a dialogue with a user is started.
  • an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the identified classification.
  • a guidance robot control system capable of calling out in consideration of the user's intention.
  • A diagram showing an example of a color image scale in which three-color combinations and languages are associated, according to the first embodiment, and a diagram showing another example of such a color image scale.
  • A diagram showing a specific example of the color language table 14 of the first embodiment, and a diagram showing a specific example of the single color table 100 of the first embodiment.
  • A diagram showing a specific example of the recommendation table 120 of the first embodiment, and a diagram showing a specific example of a color image scale for forming the color language table 14 of the first embodiment.
  • A flowchart showing the operation of the information presentation apparatus 10 of the first embodiment.
  • A diagram showing an example of the system configuration of the information processing system 2001 of the second embodiment.
  • An internal block diagram of the robot 2002 of the second embodiment, an internal block diagram of the computer 2003 of the second embodiment, and a functional configuration diagram of the robot 2002 and the information processing system 2001 of the second embodiment.
  • A diagram illustrating the first feature 2030A of the second embodiment, and a diagram illustrating the second feature 2030B of the second embodiment.
  • A diagram showing the extracted person features (person feature 2040) of the second embodiment, and a diagram showing the classification table 2050 of the second embodiment.
  • FIG. 1 is a block diagram showing the configuration of an embodiment of the information presentation apparatus of the present invention.
  • the information presentation device 10 is a device that interacts with a human user, and is a robot, a computer, or the like.
  • The information presentation apparatus 10 includes: an image input unit 11 that inputs an image including the user's face; a region determination unit 12 that determines each region of the facial organs from the image; a color information extraction unit 13 that extracts color information from the regions; a storage unit 15 that stores a color language table 14 in which color information and language information are associated with each other; and a collation unit 16 that collates the color language table 14 based on the color information extracted by the color information extraction unit 13 and acquires language information.
  • The information presentation apparatus 10 may further include an utterance content determination unit 17 that determines utterance content based on the language information, and an output unit 18 that outputs the utterance content determined by the utterance content determination unit 17.
  • the information presentation apparatus 10 may include a microphone that inputs a user's voice.
  • the image input unit 11 includes a camera, a scanner, a storage media drive, a communication control device, and the like.
  • The image input unit 11 captures an image of the target user, reads a printed matter on which the target user is printed, reads a storage medium on which an image of the target user is stored, or receives an image of the target user transmitted from another computer.
  • the input image is temporarily stored in a RAM (Random Access Memory) of the information presentation device 10.
  • the area determination unit 12, the color information extraction unit 13, the collation unit 16, and the utterance content determination unit 17 are configured by a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
  • The CPU calls a program stored in the storage unit 15, ROM, recording medium, or the like into a work memory area on the RAM, executes it, and drives and controls each device connected via the bus, thereby realizing the processing performed by the information presentation apparatus 10, which is described later.
  • the ROM is a non-volatile memory, and permanently stores a program such as a boot program and BIOS of the information presentation apparatus 10, data, and the like.
  • the RAM is a volatile memory, and temporarily holds a program, data, and the like loaded from a storage unit, a ROM, a recording medium, and the like, and includes a work area used by the CPU for performing various processes.
  • the storage unit 15 is an HDD (Hard Disk Drive), a flash memory, or the like, and stores a program executed by the CPU, data necessary for program execution, an OS (Operating System), and the like.
  • a control program corresponding to the OS and an application program for causing the CPU to execute processing described later are stored.
  • Each program code is read by the CPU as necessary, transferred to the RAM, read by the CPU, and executed as various means.
  • the output unit 18 includes a speaker, a display device, a printer, a communication control device, and the like.
  • The output unit 18 outputs voice indicating the utterance content, displays characters indicating the utterance content, prints characters indicating the utterance content, or transmits data indicating the utterance content to another computer.
  • FIG. 2 is a block diagram showing a configuration of another embodiment of the information presentation apparatus of the present invention.
  • the information presentation device 20 illustrated in FIG. 2 includes a terminal device 30 used by a user, and a server 40 connected to the terminal device 30 via a network.
  • The terminal device 30 is a mobile terminal such as a smartphone or tablet, a PC (Personal Computer), or the like, and includes an image input unit 31 that inputs an image including the user's face and a transmission unit 39a that transmits data such as images to the server 40.
  • the terminal device 30 may include a receiving unit 39b that receives data indicating the utterance content from the server 40 and an output unit 32 that outputs the utterance content. Further, the terminal device 30 may include a microphone for inputting a user's voice.
  • The server 40 includes: a receiving unit 49b that receives data from the terminal device 30; a region determination unit 42 that determines each region of the facial organs from the image; a color information extraction unit 43 that extracts color information from the regions; a storage unit 45 that stores a color language table 44 in which color information and language information are associated with each other; and a collation unit 46 that collates the color language table 44 based on the color information extracted by the color information extraction unit 43 and acquires language information.
  • The server 40 may further include an utterance content determination unit 47 that determines utterance content based on the language information, and a transmission unit 49a that transmits the utterance content determined by the utterance content determination unit 47 to the terminal device 30.
  • The image input unit 31, output unit 32, region determination unit 42, color information extraction unit 43, color language table 44, storage unit 45, collation unit 46, and utterance content determination unit 47 have the same functions as the image input unit 11, output unit 18, region determination unit 12, color information extraction unit 13, color language table 14, storage unit 15, collation unit 16, and utterance content determination unit 17 shown in FIG. 1, respectively.
  • the transmission unit 39a transmits data to the server 40 via the communication line 21a.
  • the receiving unit 39b receives data from the server 40 via the communication line 21b.
  • the transmission unit 49a transmits data to the terminal device 30 via the communication line 21b.
  • the receiving unit 49b receives data from the terminal device 30 via the communication line 21a.
  • The information presentation apparatus 20 shown in FIG. 2 has the same configuration and operation as the information presentation apparatus 10 except for the transmission and reception of data, so a detailed description of the information presentation apparatus 20 is omitted.
  • FIG. 3 is a block diagram illustrating a configuration of the area determination unit 12.
  • The region determination unit 12 includes: a face detection unit 12a that detects the user's face from the image; a facial organ detection unit 12b that detects facial organs (eyes, mouth, nose, chin, and the like) from the face region detected by the face detection unit 12a; and a clothing region specifying unit 12c that specifies the region of the user's clothes from the image.
  • FIG. 4 is an explanatory diagram of face detection processing by the face detection unit 12a.
  • 50 is an image and 51 is a face area.
  • the face detection unit 12a detects a rectangular area as the face area 51 from the image 50 by using a face detection algorithm of the Viola-Jones method, for example.
  • Specifically, while sliding a search window (a determination area of, for example, 8 pixels × 8 pixels) over the image, the face detection unit 12a determines, for each position of the search window, whether the area is a face image by using a classifier constructed in advance by machine learning (AdaBoost) with Haar-like feature values.
  • the face detection algorithm is not limited to the Viola-Jones method, and ordinary techniques in this field can be used.
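  • As an illustrative sketch (not part of the patent text), a Viola-Jones style face detection of the kind described above can be reproduced with OpenCV's pretrained Haar cascade, which is an AdaBoost classifier over Haar-like feature values; the image path and detection parameters below are assumptions.

```python
# Hedged sketch of Viola-Jones face detection using OpenCV's pretrained Haar
# cascade (an AdaBoost classifier over Haar-like feature values). The image
# path and the detection parameters are illustrative assumptions.
import cv2

def detect_face_areas(image_path):
    image = cv2.imread(image_path)                      # image 50
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # the cascade expects grayscale
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # scaleFactor and minNeighbors control the sliding-window search granularity.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Each detection is a rectangular face area 51 given as (x, y, width, height).
    return list(faces)

if __name__ == "__main__":
    print(detect_face_areas("user.jpg"))
```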
  • FIG. 5 is an explanatory diagram of face organ detection processing by the face organ detection unit 12b.
  • In FIG. 5, 51 is a face area, 52 is an eye area, and 53 is a mouth area.
  • the face organ detection unit 12b specifies the positions (x, y) of feature points of each face organ such as eyebrows, eyes, nose, mouth, and jaw.
  • a normal technique in this field can be used as an algorithm for specifying the position of the feature point of each facial organ.
  • the facial organ detection unit 12b specifies the region of each facial organ based on the position (x, y) of the feature point. For example, the facial organ detection unit 12 b specifies a rectangular area including all the eye feature points as the eye area 52.
  • the facial organ detection unit 12 b specifies a rectangular area including all mouth feature points as the mouth area 53. Note that the facial organ detection unit 12b may specify not only the eye region 52 and the mouth region 53 but also the nose region, the hair region, the cheek region, and the like.
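  • The following hedged sketch shows one way to derive the eye area 52 and the mouth area 53 as rectangles enclosing all corresponding feature points, using dlib's 68-point landmark model as a stand-in for the facial organ detection unit 12b; the model file path is an assumption.

```python
# Hedged sketch: derive the eye area 52 and mouth area 53 as rectangles that
# enclose all corresponding feature points. Uses dlib's 68-point landmark model
# as a stand-in for the facial organ detection; the model file path is assumed.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def facial_organ_regions(image):
    """image: RGB or grayscale numpy array. Returns {organ: (left, top, right, bottom)}."""
    regions = {}
    for face in detector(image, 1):                      # face area 51
        shape = predictor(image, face)
        pts = np.array([(p.x, p.y) for p in shape.parts()])
        # 68-point convention: eyes are points 36-47, mouth is points 48-67.
        for name, idx in (("eye", slice(36, 48)), ("mouth", slice(48, 68))):
            xs, ys = pts[idx, 0], pts[idx, 1]
            regions[name] = (xs.min(), ys.min(), xs.max(), ys.max())
    return regions
```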
  • FIG. 6 is an explanatory diagram of the clothing area specifying process by the clothing area specifying unit 12c.
  • In FIG. 6, 50 is an image, 51 is a face area, and 54 is a clothes area.
  • the clothing region specifying unit 12c specifies the clothing region 54 based on the size and position (x, y) of the rectangular region of the face region 51 detected by the face detection unit 12a. For example, the clothing region specifying unit 12 c sets the lower end of the rectangular region of the face region 51 as the upper end of the clothing region 54 and sets the lower end of the image 50 as the lower end of the clothing region 54.
  • The clothing region specifying unit 12c sets, as the horizontal width of the clothing region 54, a value obtained by multiplying the horizontal width of the rectangular region of the face region 51 by a predetermined magnification, and determines the left and right edges of the clothing region 54 using the center position of the face region 51 as a reference.
  • Alternatively, the clothing region specifying unit 12c may set the boundaries between the background and the clothing in the left-right direction as the left end and the right end of the clothing region 54.
  • the algorithm for specifying the clothing region 54 is not limited to this, and ordinary techniques in this field can be used.
  • the clothing region specifying unit 12c may specify not only a jacket, but also a hat, glasses, pants, a bag, and the like.
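  • A minimal sketch of the geometric rule described above for the clothing region 54 follows, assuming an illustrative magnification of 3.0 (the patent text does not specify the value).

```python
# Minimal sketch of the clothing area 54 rule: its top is the bottom edge of the
# face rectangle, its bottom is the bottom of the image, and its width is the
# face width times a magnification, centered on the face. The magnification
# value 3.0 is an illustrative assumption.
def clothing_region(face, image_width, image_height, magnification=3.0):
    fx, fy, fw, fh = face                       # face area 51: (x, y, width, height)
    top = fy + fh                               # lower end of the face rectangle
    bottom = image_height                       # lower end of the image 50
    width = fw * magnification
    center_x = fx + fw / 2.0
    left = max(0, int(center_x - width / 2.0))
    right = min(image_width, int(center_x + width / 2.0))
    return (left, top, right, bottom)           # clothes area 54
```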
  • FIG. 7 is an explanatory diagram of color information extraction processing by the color information extraction unit 13.
  • In FIG. 7, 52 is an eye region, 53 is a mouth region, 54 is a clothing region, 55, 56, and 57 are color information, and 55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, and 57c are the single colors constituting the color information.
  • the color information extraction unit 13 performs a color reduction process on each of the eye area 52, the mouth area 53, and the clothes area 54, and extracts one or a plurality of color information 55 to 57 representing each area. In the example shown in FIG. 7, the color information extraction unit 13 extracts three colors as the color information 55 to 57.
  • The color information extraction unit 13 extracts one color (single color) or a plurality of colors from each area subjected to the color reduction processing (for example, an area reduced to five colors); for instance, one color or three colors may be extracted from each area.
  • The color reduction processing by the color information extraction unit 13 is performed by, for example, the k-means method. Specifically, to reduce a region to k colors (that is, to extract k colors representing the region), k representative colors are first selected at random from the region, and each pixel is assigned to the closest representative color so that all pixels are divided into k groups. The average color of the pixels in each group is then taken as a new representative color, and all pixels are divided into k groups again based on these representative colors. By repeating this reassignment and averaging, the region is reduced to k colors.
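  • A minimal sketch of this k-means color reduction is shown below, assuming the pixels of a region are given as an (N, 3) array of color values; the iteration count and initialization are illustrative choices, not the patent's implementation.

```python
# Minimal sketch of the k-means colour reduction described above. The iteration
# count and the random initialisation are illustrative assumptions.
import numpy as np

def reduce_colors(pixels, k=3, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly select k representative colours from the region's pixels.
    reps = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # Assign every pixel to the closest representative colour (k groups).
        dists = np.linalg.norm(pixels[:, None, :] - reps[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Replace each representative with the average colour of its group.
        for i in range(k):
            if np.any(labels == i):
                reps[i] = pixels[labels == i].mean(axis=0)
    return reps  # the k colours representing the region (e.g. color information 55)
```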
  • FIG. 8 is a diagram showing a specific example of a color image scale for forming the color language table 14.
  • In FIG. 8, 60 is a color image scale, and 61 to 63 are single colors arranged on the color image scale 60.
  • The color image scale 60 arranges single colors and languages corresponding to images on coordinate axes of warm (WARM)-cool (COOL) and soft (SOFT)-hard (HARD), which are criteria for judging images. For example, in the upper left (second quadrant) of the color image scale 60, single colors 61, 62, 63 and others are arranged; the language corresponding to the single color 61 is "sweet", the language corresponding to the single color 62 is "pretty", and the language corresponding to the single color 63 is "congratulations". In the color image scale 60, the languages associated with single colors include many adjectives such as "sweet" and "cute", because adjectives are most suitable as languages for expressing the impression given by a color.
  • It is desirable that the single colors arranged on the color image scale 60 differ from one another in hue and in one or both of lightness and saturation. That is, it is desirable to construct the single colors of the color image scale 60 from colors obtained by dividing the hue into, for example, red, yellow-red, yellow, yellow-green, green, blue-green, blue, purple-blue, purple, red-purple, and achromatic colors, and varying the lightness and/or saturation within each hue. This is because the impression that a color gives depends on the hue, lightness, and saturation that the color has. Note that the single colors and languages of the color image scale 60 shown in FIG. 8 are only representative; in practice, single colors and languages are arranged more finely.
  • FIGS. 9 and 10 are diagrams showing examples of color image scales in which three combination colors and languages are associated with each other.
  • In FIGS. 9 and 10, 80 and 90 are color image scales, and 81 and 91 are three-color combinations (color schemes) arranged on the color image scales 80 and 90, respectively.
  • the color image scale 80 shown in FIG. 9 has three combination colors (color schemes) and languages corresponding to the image arranged on a public-uplifting, orthodox-individual coordinate axis. For example, in the upper left (second quadrant) of the color image scale 80, the combination color 81 and the like are arranged, the language corresponding to the combination color 81 is “settled”, and the combination color 81 is a single color 81a, A single color 81b and a single color 81c are combined.
  • the color image scale 80 is applied, for example, to the color and language of a car.
  • the color image scale 90 shown in FIG. 10 has three combination colors (color schemes) and languages corresponding to the image arranged on the formal-casual and static-motion coordinate axes.
  • the combination color 91 and the like are arranged, the language corresponding to the combination color 91 is “elegant”, and the combination color 91 includes the single color 91a and the single color 91b. And a single color 91c.
  • This color image scale 90 is applied to, for example, the color and language of fashion (clothes).
  • FIG. 11 is a diagram showing a specific example of the color language table 14. As shown in FIG. 11, in the color language table 14, a combination color ID 14a, a single color ID 14b, a language 14c, and a related word 14d are associated with each other.
  • the combination color ID 14a is an ID for identifying a combination color arranged on a color image scale (see FIGS. 9 and 10) in which three combination colors and languages are associated with each other.
  • the single color ID 14b is composed of single color IDs of color scheme 1, color scheme 2 and color scheme 3 which are single colors.
  • the data of which the combination color ID 14a is “A00100” corresponds to the combination color in which the single color ID 14b of the color scheme 1 is “c3255”, the single color ID 14b of the color scheme 2 is “c4529”, and the single color ID 14b of the color scheme 3 is “c5043”.
  • the language 14c is “innocent”.
  • the related word 14d is a language related to the language 14c.
  • For example, the related words 14d of the language "innocent" include "innocence".
  • When the color language table 14 is formed from a color image scale in which single colors are associated with languages (see FIG. 8), a color language table 14 in which the single colors arranged on the color image scale are directly associated with languages is formed. Note that related words can be stored in advance in the storage means in association with the languages, but they can also be acquired, for example, by crawling Web pages on the Internet.
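  • As an illustration (not from the patent), the color language table 14 of FIG. 11 could be held in memory as rows linking a combination color ID 14a, the three single color IDs 14b of its color scheme, a language 14c, and related words 14d; the sketch below follows the example row given in the text.

```python
# Illustrative sketch of the color language table 14 of FIG. 11 as an in-memory
# structure. The row values follow the example given in the text; everything
# else is an assumption, not the patent's data format.
from dataclasses import dataclass, field

@dataclass
class ColorLanguageRow:
    combination_color_id: str            # combination color ID 14a
    single_color_ids: tuple              # single color IDs 14b (color schemes 1-3)
    language: str                        # language 14c (impression word)
    related_words: list = field(default_factory=list)   # related words 14d

color_language_table = [
    ColorLanguageRow("A00100", ("c3255", "c4529", "c5043"),
                     "innocent", ["innocence"]),
    # ... further rows for the other combination colours on the colour image scale
]
```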
  • FIG. 12 is a diagram showing a specific example of the single color table 100.
  • The single color table 100 indicates which single colors compose the color schemes 1 to 3 specified by the single color IDs 14b.
  • In FIG. 12, 100 is the single color table, 100a is a single color ID, and 100b is a color value.
  • The single color ID 100a is an ID for identifying a single color forming the color schemes 1 to 3 specified by the combination color ID 14a.
  • The color value 100b represents the single color specified by the single color ID as coordinate values in a color space; in FIG. 12, the values are x and y in the CIE xy chromaticity diagram of the XYZ color system (xyY color system).
  • The color space representing the color value 100b is not limited to the XYZ color system (xyY color system) of FIG. 12; various color systems such as the RGB color system, the L*u*v* color system, and the L*a*b* color system may be used as appropriate.
  • The collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted by the color information extraction unit 13 with the single colors or combination colors of the color language table 14 stored in the storage unit 15, and specifies a single color or combination color of the color language table 14 whose color value difference (distance in the color space) is equal to or less than a certain threshold (in the case of a combination color, whose color value differences between all the single colors are equal to or less than a certain threshold).
  • For example, suppose the combination color extracted by the color information extraction unit 13 is the color information 55 consisting of the single color 55a, the single color 55b, and the single color 55c shown in FIG. 7, and that in the color language table 14 the combination color whose combination color ID is N (combination color N) has single color ID NA for color scheme 1 (single color NA), single color ID NB for color scheme 2 (single color NB), and single color ID NC for color scheme 3 (single color NC).
  • To determine whether the combination color N is the same as or similar to the color information 55, the single colors of the color information 55 and the single colors of the combination color N are compared in the following six combinations, that is, the six possible pairings of the single colors 55a, 55b, and 55c with the single colors NA, NB, and NC.
  • Given the color value (xA, yA) of a single color A and the color value (xB, yB) of a single color B, the difference ΔC between the color values of the single color A and the single color B is obtained as the distance in the color space, for example ΔC = √((xA − xB)² + (yA − yB)²).
  • In this way, the collation unit 16 specifies, from among the combination colors in the color language table 14, a combination color that is the same as or close to the combination color extracted by the color information extraction unit 13.
  • The collation unit 16 then acquires the language (language information) associated with the combination color specified as being the same as or similar to the color information 55, together with its related words. For example, if the combination color 81 arranged on the color image scale 80 (see FIG. 9) is specified as the combination color that is the same as or similar to the color information 55, the collation unit 16 acquires the language "calm" associated with the combination color 81 and its related word.
  • The collation unit 16 may acquire a plurality of pieces of language information. Further, when one color (single color) and a language are associated in the color language table 14, the collation unit 16 compares the one color (single color) extracted by the color information extraction unit 13 with the single colors of the color language table 14, specifies a single color whose color value difference is equal to or less than a certain threshold, and acquires the language information associated with the specified single color. The collation unit 16 may also acquire related words not only from the color language table 14 but also from a dictionary database stored in the storage unit 15 or in another computer.
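  • The collation step can be sketched as follows (an assumption-laden illustration, not the patent's code): an extracted three-color combination matches a table row if some pairing of the three colors keeps every color value difference ΔC below a threshold; the threshold value and the Euclidean ΔC are assumptions, and each row is assumed to also carry the (x, y) values of its three single colors, looked up from the single color table 100.

```python
# Assumption-laden sketch of the collation step. The threshold value and the
# Euclidean ΔC are assumptions; each row is assumed to carry single_color_values,
# i.e. the (x, y) values of its three single colours from the single color table 100.
from itertools import permutations
import math

def delta_c(a, b):
    """Colour value difference between two (x, y) colour values."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_same_or_similar(extracted, candidate, threshold=0.05):
    """True if some pairing of the three single colours keeps every ΔC under the threshold."""
    return any(all(delta_c(e, c) <= threshold for e, c in zip(extracted, perm))
               for perm in permutations(candidate, 3))

def collate(extracted, table, threshold=0.05):
    """Return (language, related words) of the first matching combination colour."""
    for row in table:
        if is_same_or_similar(extracted, row.single_color_values, threshold):
            return row.language, row.related_words
    return None
```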
  • The utterance content determination unit 17 determines the utterance content from the various text templates stored in the storage unit 15, based on the language information and related words acquired by the collation unit 16. Specifically, the utterance content determination unit 17 determines the utterance content based on the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15.
  • FIG. 13 is a diagram showing a specific example of the dialogue rule 110.
  • In FIG. 13, 110 is a dialogue rule, 111 is a region, 112 is language information, and 113 is utterance content.
  • In the dialogue rule 110, utterance content 113 is stored in association with each combination of region 111 and language information 112.
  • For example, utterance content is associated with the case where the region 111 is "clothes" and the language information 112 is "high sense" (stylish).
  • The utterance content determination unit 17 may determine the utterance content by inserting the language information into a sentence template. For example, when the language information is "elegant", the utterance content determination unit 17 may insert the language information into a template such as "Those are very ... clothes!" to produce the utterance content "Those are very elegant clothes!".
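  • A hedged sketch of this template-based utterance determination follows; the table entries are assumptions modeled on the examples in the text.

```python
# Hedged sketch of template-based utterance determination: look up the utterance
# template by (region, language information) and insert the language information
# into it. The table entries below are illustrative assumptions.
dialogue_rule = {
    ("clothes", "elegant"): "Those are very {word} clothes!",
    ("eye", "cute"): "You have very {word} eyes!",
}

def determine_utterance(region, language_info):
    template = dialogue_rule.get((region, language_info))
    return template.format(word=language_info) if template else None

print(determine_utterance("clothes", "elegant"))  # -> Those are very elegant clothes!
```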
  • FIG. 14 is a diagram showing a specific example of the recommendation table 120.
  • In FIG. 14, 120 is a recommendation table, 121 is a region, 122 is language information, 123 is utterance content, and 124 is recommendation information.
  • the recommendation table 120 stores utterance contents 123 and recommendation information 124 in association with each other for each area 121 and language information 122.
  • For example, when the region 121 is "mouth" and the language information 122 is "chic", the utterance content 123 "If you like calm colors, how about this lipstick?" and the recommendation information 124 "lipstick A" are associated.
  • the recommendation information 124 may be associated with not only a character string but also an image, a moving image, a web page, and the like. For example, not only the character string “lipstick A” but also the product image, CM video, product introduction web page, and the like of the lipstick may be associated.
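  • Similarly, a hedged sketch of a recommendation table lookup returning both the utterance content and the recommendation information, with rich media attached to the recommendation, is shown below; all entries are illustrative assumptions.

```python
# Hedged sketch of the recommendation table 120: (region, language information)
# maps to utterance content 123 and recommendation information 124, where the
# recommendation may also carry an image or a product web page. All entries are
# illustrative assumptions modelled on the "mouth"/"chic" example in the text.
recommendation_table = {
    ("mouth", "chic"): {
        "utterance": "If you like calm colors, how about this lipstick?",
        "recommendation": {"name": "lipstick A",
                           "image": "lipstick_a.jpg",                  # product image
                           "page": "https://example.com/lipstick-a"},  # product web page
    },
}

def recommend(region, language_info):
    entry = recommendation_table.get((region, language_info))
    return (entry["utterance"], entry["recommendation"]) if entry else (None, None)
```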
  • FIG. 15 is a flowchart showing the operation of the information presentation apparatus 10.
  • the image input unit 11 inputs an image including the user's face (step S1).
  • the image input unit 11 is a camera, for example, and inputs an image including a user's face photographed by the camera.
  • the region determination unit 12 determines each region of the facial organ from the image input in step S1 (step S2). Specifically, the face detection unit 12a detects the face of the user from the image, and the face organ detection unit 12b detects the face organs (eyes, mouth, nose, chin, etc.) from the face area detected by the face detection unit 12a. Then, the clothes area specifying unit 12c specifies the area of the user's clothes from the image.
  • the color information extraction unit 13 extracts color information for each area specified in step S2 (step S3). Specifically, the color information extraction unit 13 performs color reduction processing on each of the facial organ area such as the eye area and the mouth area and the clothing area, and extracts one or more color information representing each area.
  • The collation unit 16 collates the color language table 14 based on the color information extracted in step S3, and acquires language information and related words (step S4). Specifically, the collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted in step S3 with the single colors or combination colors of the color language table 14, and specifies a single color or combination color of the color language table 14 whose color value difference is equal to or less than a certain threshold (in the case of a combination color, whose color value differences between all the single colors are equal to or less than a certain threshold). The collation unit 16 then acquires from the color language table 14 the language information associated with the specified same or similar single color or combination color.
  • The utterance content determination unit 17 determines the utterance content based on the language information acquired in step S4 (step S5). Specifically, the utterance content determination unit 17 determines the utterance content and recommendation information based on the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15. When the dialogue rule 110 and the recommendation table 120 yield a plurality of search results, the utterance content determination unit 17 selects one at random or according to a predetermined priority order. As the priority order, for example, ranking by region is conceivable: since eyes give a strong impression even between humans, the language information associated with the color information of the eye region may be given the highest priority. Alternatively, when the information to be recommended by the information presentation apparatus 10 relates to a specific region, for example when the information to be recommended is a lipstick, the language information associated with the color information of the mouth region may be given the highest priority.
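  • The priority-based selection can be sketched as follows; the region names and the default priority order are assumptions for illustration.

```python
# Hedged sketch of priority-based selection: eye-region matches rank highest by
# default, and when the information to recommend targets a specific region
# (e.g. lipstick -> mouth), that region is promoted. Region names and the
# default priority order are illustrative assumptions.
DEFAULT_PRIORITY = ["eye", "mouth", "clothes"]

def select_utterance(candidates, recommend_target_region=None):
    """candidates: list of (region, utterance) pairs matched from the tables."""
    priority = list(DEFAULT_PRIORITY)
    if recommend_target_region in priority:
        priority.remove(recommend_target_region)
        priority.insert(0, recommend_target_region)       # promote the target region
    ordered = sorted(candidates,
                     key=lambda c: priority.index(c[0]) if c[0] in priority
                     else len(priority))
    return ordered[0][1] if ordered else None
```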
  • the output unit 18 outputs the utterance content and recommendation information determined in step S5 (step S6). Specifically, the output unit 18 outputs a sound indicating the utterance content from a speaker and displays characters indicating the utterance content and recommendation information on the display device.
  • When the information presentation apparatus 10 includes a microphone for inputting the user's voice, the user's voice may be input and the processing may be repeated from step S1 in consideration of the user's voice (continuing the dialogue).
  • As described above, the information presentation apparatus 10 (20) enables a natural, human-like dialogue when starting a dialogue with a user.
  • The information presentation apparatus 10 (20) extracts color information for each facial organ and for the clothes, and determines the utterance content based on the language information associated with that color information. Therefore, even with a first-time user, it is possible to start a conversation based on the first impression received from the other party, just as humans do when meeting for the first time.
  • FIG. 16 is a diagram illustrating an example of a system configuration of the information processing system 2001.
  • The information processing system 2001 includes a robot 2002, which is an embodiment of the information processing apparatus according to the present invention, as well as a computer 2003, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007.
  • a robot 2002, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007 are communicatively connected to a computer 2003, and controlled by the computer 2003 so that the operations of the respective devices cooperate.
  • the form of communication connection between the robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, the display 2007, and the computer 2003 may be wired or wireless.
  • the camera 2004 acquires surrounding video data.
  • the camera 2004 is, for example, a single camera, a stereo camera, a distance sensor, or a composite sensor that combines these.
  • video data can be acquired using a camera 2210 built in the robot 2002 instead of the camera 2004. In this case, the camera 2004 may be omitted.
  • the microphone 2005 acquires surrounding audio data.
  • the microphone 2005 is, for example, a microphone array including a single microphone or a plurality of microphones.
  • the microphone 2005 can also acquire voice data using the microphone 2211 built in the robot 2002. In this case, the microphone 2005 may be omitted.
  • the odor sensor 2006 acquires ambient odor data.
  • the odor sensor 2006 is, for example, a sensor composed of a single crystal resonator type or semiconductor type sensor, or a sensor array composed of a plurality of sensors.
  • the odor data can be acquired using the odor sensor 2212 incorporated in the robot 2002. In this case, the odor sensor 2006 may be omitted.
  • The robot 2002 extracts the features of a person using a plurality of different types of data, namely the video data acquired from the camera 2004 (or the camera 2210), the audio data acquired from the microphone 2005 (or the microphone 2211), and the odor data acquired from the odor sensor 2006 (or the odor sensor 2212), and identifies the classification of the person (person attributes such as preference, gender, age, language, and nationality) based on the extracted features. The robot 2002 then performs processing according to the identified classification (processing such as dialogue, information guidance, and advertisement).
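  • As an assumed illustration of this multimodal identification (not the patent's algorithm), features extracted from the different data types can be matched against a classification table such as the classification table 2050, with the best-matching classification selected:

```python
# Assumed illustration of identifying a person's classification from multimodal
# features: features extracted from video, audio, and odor data are matched
# against stored features per classification, and the best match is selected.
# The feature values and table layout are illustrative assumptions.
def identify_classification(extracted, classification_table):
    """extracted: e.g. {"video": "wears glasses", "audio": "Japanese", "odor": "perfume"}
    classification_table: {classification: {data type: stored feature, ...}, ...}"""
    def score(stored):
        return sum(1 for kind, feature in stored.items()
                   if extracted.get(kind) == feature)
    best = max(classification_table, key=lambda c: score(classification_table[c]))
    return best if score(classification_table[best]) > 0 else None
```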
  • the display 2007 is controlled by the computer 2003 and displays an image or the like.
  • the display 2007 may be omitted when display control is performed using the display unit 2204 mounted on the robot 2002, or when the robot 2002 executes processing such as dialogue, information guidance, and advertisement only by voice.
  • The computer 2003 controls the acquisition of video data by the camera 2004, the acquisition of audio data by the microphone 2005, the acquisition of odor data by the odor sensor 2006, and the display control of the display 2007, and also controls the operation of the robot 2002.
  • the robot 2002 (information processing apparatus) is, for example, an interactive robot. As shown in FIG. 17, the robot 2002 includes a control unit 2201, a storage unit 2202, an input unit 2203, a display unit 2204, a media input / output unit 2205, a communication I / F 2206, a movable unit drive mechanism 2207, and a peripheral device I / F unit. 2208, an audio processing unit 2209, a camera 2210, a microphone 2211, an odor sensor 2212, a speaker 2213, and various sensors 2214 are connected via a bus 2220.
  • the control unit 2201 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
  • the CPU calls a program stored in the storage unit 2202, ROM, recording medium, or the like to a work memory area on the RAM and executes it, and drives and controls each unit connected via the bus 2220.
  • the CPU of the control unit 2201 executes all processes by the robot 2002.
  • the ROM permanently stores programs such as a boot program for the robot 2002, a BIOS, data, and the like.
  • the RAM temporarily stores the loaded program and data, and includes a work area used by the control unit 2201 to perform various processes.
  • the storage unit 2202 stores a program executed by the control unit 2201, data necessary for program execution, an operating system, and the like. These program codes are read by the control unit 2201 as necessary, transferred to the RAM, and read and executed by the CPU.
  • the storage unit 2202 also includes a classification table 2050 (FIG. 24) that is referred to when specifying a person's classification, a processing database 2060 (FIG. 25) that is referred to when executing a process according to the classification, and the like.
  • the storage unit 2202 also has an action DB (not shown) in which a control sequence table describing the control sequences of the robot 2002 is stored.
  • the control sequence table stores data in which all actions performed by the robot 2002 are associated with the movable part control sequence of the robot 2002.
  • the movable part control sequence is a list of driving positions and values of the movable parts to be controlled when realizing each action (for example, head tilt angle 20 degrees, right arm tilt angle 40 degrees, etc.).
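  • As an illustration of the control sequence table described above, the sketch below shows one possible in-memory representation: a mapping from action names to lists of movable part drive targets. The class names, action names, and values are hypothetical and are not part of the embodiment.

```python
# Minimal sketch of an action DB entry; all names are hypothetical illustrations.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MovablePartTarget:
    part: str          # movable part to control, e.g. "head" or "right_arm"
    angle_deg: float   # drive target value, e.g. a tilt angle in degrees

# Control sequence table: each action name maps to a list of drive targets.
control_sequence_table: Dict[str, List[MovablePartTarget]] = {
    "bowing": [
        MovablePartTarget(part="head", angle_deg=20.0),
        MovablePartTarget(part="waist", angle_deg=15.0),
    ],
    "waving_hand": [
        MovablePartTarget(part="right_arm", angle_deg=40.0),
    ],
}

def execute_action(action: str) -> None:
    """Read the movable part control sequence for an action and send each
    target to the (here simulated) movable part drive mechanism."""
    for target in control_sequence_table[action]:
        print(f"drive {target.part} to {target.angle_deg} degrees")

execute_action("bowing")
```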
  • the input unit 2203 includes an input device such as a touch panel and various operation buttons, for example, and outputs input data to the control unit 2201.
  • the display unit 2204 includes, for example, a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays the display information input under the control of the control unit 2201 on the display device. Note that the touch panel of the input unit 2203 is configured integrally with the display of the display unit 2204.
  • the media input / output unit 2205 is a drive device for recording media such as a magnetic disk, an optical disk, or a semiconductor memory, and performs data input / output.
  • the communication I / F 2206 is an interface that mediates communication with the network 2008.
  • the network 2008 includes a LAN (Local Area Network), a WAN (Wide Area Network) connected to a wider area, or a public communication line such as the Internet, a base station, and the like.
  • the communication connection with the network 2008 may be wired or wireless.
  • the robot 2002 can access a Web server via the network 2008 and can transmit and receive various programs and data.
  • the movable part drive mechanism 2207 includes a mechanism and a drive device for driving a movable part provided on the head, neck, shoulder, arm, waist, leg, and the like of the robot 2002.
  • the operation of the movable unit driving mechanism 2207 is controlled by the control unit 2201.
  • the control unit 2201 reads the movable part control sequence from the action DB of the storage unit 2202, sends a control signal to each movable part drive mechanism 2207, and operates each drive part.
  • Peripheral device I / F (interface) unit 2208 is a port for connecting a peripheral device, and mediates transmission / reception of data with the peripheral device.
  • the connection form with the peripheral device may be wired or wireless.
  • the robot 2002 can be connected to a computer 2003 or the like via the peripheral device I / F unit 2208.
  • the speech processing unit 2209 includes a speech synthesis unit, a language discrimination unit, a speech recognition unit, and the like.
  • the speech synthesizer generates synthesized speech data corresponding to character information (text data) based on a command from the control unit 2201 and outputs synthesized speech from the speaker 2213.
  • the synthesized speech is generated based on the correspondence between character information and speech data stored in advance in the utterance database of the storage unit 2202.
  • the language discrimination unit executes spoken-language discrimination processing on a person's speech data using, for example, a language discrimination model that statistically models differences in phoneme types, phoneme arrangement patterns within words, prosodic patterns, vocabulary, and the like, and discriminates the language the person is speaking.
  • the speech recognition unit performs speech recognition processing on the speech data whose language has been discriminated, using statistical models (an acoustic model and a language model) for speech recognition prepared for each language, and recognizes the utterance content.
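  • The two-stage flow described above (language discrimination first, then speech recognition with models for the discriminated language) can be pictured with the following minimal sketch. The discriminator and recognizer functions are placeholders, since the embodiment does not specify concrete models.

```python
# Sketch of the two-stage speech processing flow:
# 1) discriminate the spoken language, 2) recognize with that language's models.
# The discriminator/recognizer functions are placeholders, not a real API.

def discriminate_language(speech_data: bytes) -> str:
    """Placeholder for the statistical language discrimination model
    (phoneme types, phoneme arrangement, prosody, vocabulary)."""
    return "ja"  # e.g. Japanese

def recognize_speech(speech_data: bytes, language: str) -> str:
    """Placeholder for per-language acoustic + language models."""
    recognizers = {"ja": lambda d: "konnichiwa", "en": lambda d: "hello"}
    return recognizers[language](speech_data)

def process_utterance(speech_data: bytes) -> str:
    language = discriminate_language(speech_data)   # language discrimination unit
    return recognize_speech(speech_data, language)  # speech recognition unit

print(process_utterance(b"..."))
```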
  • the camera 2210 is built into the robot 2002 and is, for example, a single camera such as a CCD (Charge-Coupled Device) camera, a stereo camera, a distance sensor, or a combination of these, and acquires surrounding video data.
  • the acquired video data is sent to the control unit 2201.
  • the microphone 2211 is built into the robot 2002 and is, for example, a single microphone or a microphone array including a plurality of microphones, and acquires ambient audio data.
  • the acquired audio data is sent to the control unit 2201.
  • the odor sensor 2212 is built into the robot 2002 and is, for example, a single crystal oscillator type or semiconductor type sensor, or a sensor array including a plurality of such sensors, and acquires ambient odor data.
  • the acquired odor data is sent to the control unit 2201.
  • the speaker 2213 outputs the synthesized speech generated by the speech synthesis unit of the speech processing unit 2209.
  • the various sensors 2214 include sensors such as a 3D sensor, a touch sensor, a gyro sensor, a sonar sensor, a laser sensor, a bumper sensor, and an infrared sensor. Each sensor inputs the detected signal to the control unit 2201.
  • the bus 2220 is a path that mediates transmission / reception of control signals, data signals, and the like.
  • a computer 2003 is a control device that controls the overall operation of the information processing system 2001.
  • the computer 2003 controls the acquisition of video data by the camera 2004, audio data by the microphone 2005, and odor data by the odor sensor 2006, and controls the display of the display 2007.
  • the computer 2003 also controls the operation of the robot 2002.
  • the computer 2003 includes a control unit 2301, a storage unit 2302, a media input / output unit 2303, a peripheral device I / F unit 2304, an input unit 2306, a display unit 2307, and a communication I / F 2308, which are connected via a bus 2309.
  • the control unit 2301 includes a CPU, a ROM, a RAM, and the like.
  • the CPU calls a program stored in the storage unit 2302, ROM, recording medium, or the like to a work memory area on the RAM and executes it, and drives and controls each unit connected via the bus 2309.
  • the CPU of the control unit 2301 performs shooting control of the camera 2004, voice acquisition control of the microphone 2005, odor acquisition control of the odor sensor 2006, and display control of the display 2007 in the processing executed by the robot 2002 (FIG. 26).
  • the ROM permanently stores a program such as a boot program for the computer 2003, a BIOS, data, and the like.
  • the RAM temporarily holds the loaded program and data, and includes a work area used by the control unit 2301 to perform various processes.
  • the storage unit 2302 stores a program executed by the control unit 2301, data necessary for program execution, an operating system, and the like. These program codes are read by the control unit 2301 as necessary, transferred to the RAM, and read and executed by the CPU.
  • the media input / output unit 2303 is a drive device for recording media such as a magnetic disk, an optical disk, or a semiconductor memory, and performs data input / output.
  • Peripheral device I / F (interface) unit 2304 is a port for connecting a peripheral device, and transmits / receives data to / from the peripheral device via peripheral device I / F unit 2304.
  • the connection form with the peripheral device may be wired or wireless.
  • a robot 2002, a camera 2004, a microphone 2005, an odor sensor 2006, a display 2007, and the like can be connected to a computer 2003 via a peripheral device I / F unit 2304.
  • the input unit 2306 is an input device such as a keyboard, a mouse, or a touch panel, and outputs input data to the control unit 2301.
  • the display unit 2307 includes a display device such as a liquid crystal panel and a logic circuit for executing display processing in cooperation with the display device, and displays display information input by the control of the control unit 2301 on the display device.
  • the touch panel of the input unit 2306 is configured integrally with the display of the display unit 2307.
  • the display unit 2307 may be an external display 2007.
  • the control unit 2301 sends display information to the display unit 2307 or the display 2007 to display it.
  • the communication I / F 2308 is an interface that mediates communication with the network 2008.
  • a computer 2003 can transmit and receive various programs and data from a Web server on the network 2008.
  • the bus 2309 is a path that mediates transmission / reception of control signals, data signals, and the like.
  • the robot 2002 is provided with the functions of a person identification unit 2020, a video acquisition unit 2021, an audio acquisition unit 2022, an odor acquisition unit 2023, a feature extraction unit 2024, a classification identification unit 2025, a processing unit 2026, and an action unit 2029. The processing of each function is executed by a program. This program may be stored in advance in the storage unit 2202 of the robot 2002 or the storage unit 2302 of the computer 2003, or may be downloaded and installed via the network 2008.
  • the person specifying unit 2020 specifies a target person for which the robot 2002 performs processing such as dialogue, information guidance, and advertisement.
  • the target person is preferably a person who is interested in the robot 2002 or a person who wants to interact with the robot 2002.
  • the target person is a person who talks to the robot 2002 or touches the robot 2002.
  • a person who talks to the robot 2002 is specified, for example, by detecting that the arrival direction of the person's utterance voice, estimated using the microphone 2211 built into the robot 2002 or the external microphone 2005, coincides with the face direction of the person in the video data acquired by the camera 2210 built into the robot 2002 or the external camera 2004.
  • a person who has touched the robot 2002 is specified, for example, by detecting a person close to the robot 2002 from the video data acquired by the camera 2210 built into the robot 2002 or the external camera 2004, and detecting the person's contact action with the touch sensor of the robot 2002.
  • the target person may be a person who is approaching the robot 2002 or a person who is simply reflected in the video data, in addition to a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
  • the target person may be specified as one group including a plurality of persons.
  • a target person (one group) is specified by determining which persons belong to the same group based on the relationship between the positions and movements of the plurality of persons in the video data.
  • the video acquisition unit 2021 acquires video data including the person specified by the person specifying unit 2020 by the camera 2210 built in the robot 2002 or the external camera 2004.
  • the voice acquisition unit 2022 acquires the voice data of the person specified by the person specifying unit 2020 using the microphone 2211 built in the robot 2002 or the external microphone 2005.
  • when the microphone 2211 or the microphone 2005 is a microphone array whose directivity can be controlled, it is desirable to acquire the person's voice data with the directivity of the microphone 2211 or the microphone 2005 directed toward the person's face.
  • the odor acquisition unit 2023 acquires the odor data of the person specified by the person specifying unit 2020 by the odor sensor 2212 built in the robot 2002 or the external odor sensor 2006.
  • when the odor sensor 2212 or the odor sensor 2006 is a sensor array whose directivity can be controlled, it is desirable to acquire the person's odor data with the directivity of the odor sensor 2212 or the odor sensor 2006 directed toward the person.
  • the feature extraction unit 2024 extracts the features of the person specified by the person specifying unit 2020 using a plurality of different types of data. Specifically, it extracts the features of the person using the video data acquired by the video acquisition unit 2021, the audio data acquired by the audio acquisition unit 2022, and the odor data acquired by the odor acquisition unit 2023.
  • in the following, the feature of the person extracted based on the video data is referred to as the "first feature", the feature extracted based on the audio data as the "second feature", and the feature extracted based on the odor data as the "third feature". Specific examples of the first, second, and third features are described below.
  • the first feature is a feature of a person extracted from video data, and includes, as shown in FIG. 20, for example, "motion features" (motion 1, motion 2, ...), "appearance features" (appearance 1, appearance 2, ...), and "reaction features" (reaction 1, reaction 2, ...).
  • "motion features" are features related to the movement of a person, for example, how the person walks (walking speed, stride, arm swing, posture, etc.), obtained by applying motion analysis to the video data. When the target is a group, they are the average walking characteristics of the persons belonging to the group and the positional relationship between the persons (one row, two rows, single file, etc.).
  • "appearance features" are features related to the appearance of a person, for example, the gender, age, eye color, hair color, and the like of the person, obtained by applying face image analysis to the video data.
  • for the face image analysis, a known method (Japanese Patent Application Laid-Open No. 2012-053813, etc.) may be used.
  • facial expressions and emotions may be extracted as appearance features. The facial expressions are classified into several patterns such as joy, surprise, anger, sadness, and no facial expression. These facial expressions can be read from the rising angle of the mouth corners, the positional relationship of the cheek, eyes, eyebrows, and the like. The emotion can be estimated from the read facial expression.
  • a method disclosed in Japanese Patent Laid-Open Nos. 10-255043 and 2014-206903 may be used.
  • a person's clothing type (suit, dress, folk costume, ...), obtained by applying a known clothing identification analysis to the video data, may also be extracted as an appearance feature.
  • when the target is a group, the feature extraction unit 2024 extracts the average appearance features of the persons belonging to the group.
  • the “reaction feature” is a feature related to the reaction of the person to the presentation of information performed by the robot 2002 to the person.
  • the presentation of information includes, for example, information such as weather information and news displayed on the display unit 2204 and the display 2007, and also a greeting for a person.
  • for a greeting, synthesized speech corresponding to a predetermined greeting phrase ("Good morning", "Hello", "How are you?", etc.) is generated by the audio processing unit 2209 and output (read aloud) from the speaker 2213.
  • alternatively, a movable part control sequence of an action corresponding to a predetermined greeting action ("waving a hand", "bowing", ...) is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive part is operated.
  • the feature extraction unit 2024 applies motion analysis, facial expression analysis, and the like to the video data acquired by the video acquisition unit 2021 after the information is presented, and extracts the person's reaction to the information presentation ("smiling", "becoming serious", "coming closer", "turning away", "no response", etc.) as a feature.
  • when the target is a group, the feature extraction unit 2024 extracts the average reaction features of the persons belonging to the group.
  • the second feature is a feature of the person extracted from the voice data.
  • for example, as shown in FIG. 21, the second feature includes "voice strength", "voice pitch", "frequency characteristics", "voice quality" (bright voice, dark voice, thick voice, faint voice, etc.), "speech tempo" (speaking slowly, speaking hurriedly, etc.), "overtone characteristics" (characteristics related to the softness of the sound), "intonation", and "reaction sound characteristics" (characteristics of utterance sounds such as "um", "mm", and "eh" that show some response to the other party's utterances).
  • speech linguistic information such as “utterance language” and “utterance content” may be extracted as features.
  • the "utterance language" is extracted by the speech processing unit 2209 of the robot 2002 by analyzing the person's speech data using a language discrimination model that statistically models differences in phoneme types, phoneme arrangement patterns within words, prosodic patterns, vocabulary, and the like. Further, the "utterance content" is extracted by the speech processing unit 2209 of the robot 2002 by applying speech recognition processing to the voice data input from the microphone 2211.
  • the third feature is a feature of a person extracted from odor data, and includes, as shown in FIG. 22, for example, "odor type" (type 1, type 2, ...) and "odor intensity" (intensity 1, intensity 2, ...).
  • the odor type is specified by analyzing using an odor discrimination model for discriminating the odor type, which is acquired by learning a large amount of odor data in which the odor type is known in advance.
  • the intensity of odor can be calculated from the absolute value of odor data (input value) acquired by the odor sensor 2006 (or odor sensor 2212).
  • the classification specifying unit 2025 specifies the person classification 2051 based on the above-described features (first feature, second feature, and third feature) extracted by the feature extraction unit 2024.
  • the classification 2051 is a grouping of persons by arbitrary attributes (preference, gender, age, language, nationality, etc.). As shown in the classification table 2050 in FIG. 24, each classification 2051 (classification 1, classification 2, ...) is defined by the first feature, the second feature, and the third feature.
  • the features that define each classification 2051 (classification 1, classification 2, ...) are determined by extracting the features (first feature, second feature, and third feature) from a large number of persons whose classification 2051 is known in advance and learning them statistically. This makes it possible to grasp statistically what kind of features (first feature, second feature, third feature) a person belonging to each classification 2051 tends to have.
  • for example, it can be seen that a person belonging to "classification 1" statistically tends to have "motion 3", "appearance 10", "reaction 6", and so on as the first feature, "(voice) strength 2", "(voice) pitch 9", "frequency characteristic 1", and so on as the second feature, and "(odor) type 5" and "(odor) intensity 4" as the third feature.
  • similarly, a person belonging to "classification 2" statistically tends to have "motion 5", "appearance 2", "reaction 1", and so on as the first feature, "(voice) strength 4", "(voice) pitch 7", "frequency characteristic 3", and so on as the second feature, and "(odor) type 3" and "(odor) intensity 1" as the third feature.
  • a person belonging to "classification 3" statistically tends to have "motion 2", "appearance 7", "reaction 3", and so on as the first feature, "(voice) strength 6", "(voice) pitch 1", "frequency characteristic 4", and so on as the second feature, and "(odor) type 2" and "(odor) intensity 8" as the third feature.
  • the classification specifying unit 2025 refers to the classification table 2050 and specifies the person's classification based on the features extracted by the feature extraction unit 2024. For example, for all the features extracted by the feature extraction unit 2024, the classification 2051 (classification 1, classification 2, ...) to which each feature belongs is counted, and the classification 2051 with the largest count is identified as the person's classification.
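  • A minimal sketch of the majority-vote style identification described above follows. The feature-to-classification mapping is a toy stand-in for the classification table 2050, with invented feature names.

```python
from collections import Counter
from typing import Dict, List

# Toy stand-in for the classification table 2050: each observed feature is
# mapped to the classification it is (statistically) associated with.
classification_table: Dict[str, str] = {
    "motion 3": "classification 1", "appearance 10": "classification 1",
    "(voice) strength 2": "classification 1",
    "motion 5": "classification 2", "appearance 2": "classification 2",
    "(odor) type 3": "classification 2",
}

def identify_classification(extracted_features: List[str]) -> str:
    """Count, for every extracted feature, the classification it belongs to,
    and return the classification with the largest count."""
    votes = Counter(classification_table[f]
                    for f in extracted_features if f in classification_table)
    if not votes:                 # assume at least one feature usually matches
        return "unknown"
    classification, _count = votes.most_common(1)[0]
    return classification

features = ["motion 3", "appearance 10", "(odor) type 3"]
print(identify_classification(features))  # -> "classification 1"
```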
  • the processing unit 2026 performs processing based on the processing data 2061 of the classification 2051 specified by the classification specifying unit 2025.
  • "processing" includes processing in which the robot 2002 interacts with the person, listening and answering (dialogue and the like), and processing in which the robot 2002 unilaterally provides information to the person (advertisement and the like).
  • FIG. 25 is a diagram showing the contents of the processing database 2060 that holds the processing data 2061. As shown in the figure, the processing database 2060 holds processing data 2061 (processing data 1, processing data 2, ...) for each classification 2051 (classification 1, classification 2, ...).
  • the processing data 2061 is, for example, video data, audio data, text data, operation data, or any combination of these data.
  • when the processing data 2061 is video data, the video is displayed on the display unit 2204 of the robot 2002 or the display 2007.
  • when it is audio data, the audio is output from the speaker 2213 of the robot 2002.
  • when it is text data, the voice processing unit 2209 generates synthesized speech of the text data and outputs it from the speaker 2213 (or the text data is displayed on the display unit 2204 of the robot 2002 or the display 2007).
  • when it is operation data, a movable part control sequence corresponding to each action included in the operation data is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive part is operated.
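  • The dispatch by data type described above might look like the following sketch. The output helpers only print; in the embodiment they would correspond to the display unit 2204 / display 2007, the speaker 2213, the voice processing unit 2209, and the movable part drive mechanism 2207. The data layout is an assumption for illustration.

```python
# Sketch of dispatching processing data 2061 by type.
def execute_processing_data(item: dict) -> None:
    kind, payload = item["type"], item["payload"]
    if kind == "video":
        print(f"display video: {payload}")             # display unit / display
    elif kind == "audio":
        print(f"play audio: {payload}")                # speaker 2213
    elif kind == "text":
        print(f"synthesize and speak: {payload}")      # voice processing unit 2209
    elif kind == "motion":
        print(f"run control sequence for: {payload}")  # action DB + drive mechanism
    else:
        raise ValueError(f"unknown processing data type: {kind}")

processing_data_1 = [
    {"type": "text", "payload": "Welcome! May I help you?"},
    {"type": "motion", "payload": "bowing"},
]
for item in processing_data_1:
    execute_processing_data(item)
```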
  • the robot 2002 identifies a target person for processing such as dialogue, information guidance, and advertisement (step S2000).
  • in steps S2001 to S2003, the robot 2002 extracts the features of the person identified in step S2000 using a plurality of different types of data.
  • the robot 2002 acquires video data and extracts a first feature of a person (step S2001). For example, as described above, the “motion characteristics”, “appearance characteristics”, “reaction characteristics”, etc. of the person (see FIG. 20) are extracted from the video data.
  • for reaction features, predetermined information is presented to the person before step S2001, and in step S2001 the person's reaction to the information presentation is extracted as a feature.
  • the robot 2002 acquires voice data and extracts the second feature of the person (step S2002). For example, as described above, “voice strength”, “voice pitch”, “frequency characteristics”, “voice quality”, “speech tempo”, “overtone characteristics”, “inflection”, “reaction sound”, “speech language” of a person from voice data “Speech contents” and the like (see FIG. 21) are extracted.
  • the robot 2002 acquires odor data and extracts a third feature of the person (step S2003). For example, as described above, a person's “scent type”, “scent intensity”, and the like (see FIG. 22) are extracted from the scent data.
  • the robot 2002 refers to the classification table 2050 and identifies the person's classification 2051 based on the features of the person (first feature, second feature, and third feature) extracted in steps S2001 to S2003 (step S2004). For example, for all the features extracted in steps S2001 to S2003, the classification 2051 (classification 1, classification 2, ...) to which each feature belongs is counted, and the classification 2051 with the largest count is specified as the person's classification.
  • the robot 2002 refers to the processing database 2060, acquires the processing data 2061 of the classification 2051 specified in step S2004, and executes predetermined processing based on the processing data 2061 (step S2005). For example, when the specified classification 2051 is "classification 1", the robot 2002 executes processing based on "processing data 1" (see FIG. 25) corresponding to "classification 1".
  • although the information processing apparatus has been described in the form of a robot, the application is not limited to robots; it can also be applied to vending machines, digital signage, and the like.
  • various functions and various data included in the information processing system 2001 are included in the control unit 2201 and the storage unit 2202 in the robot 2002.
  • in steps S2001 to S2003 in FIG. 26, it is not always possible to extract the features of a person from all types of data, that is, video data, audio data, and odor data.
  • the person classification may be specified based on the characteristics of the person extracted from the video data and the odor data.
  • the person classification may be specified based on the characteristics of the person extracted from the video data and the audio data.
  • the classification table 2050 may be relearned. Specifically, after step S2004 in FIG. 26, processing for confirming the correctness of the classification 2051 specified by the robot 2002 for the person is added. For example, it is assumed that the classifications 2051 (classification 1, classification 2, ...) are grouped by "nationality" and the specified classification 2051 of the person is "France". In this case, the robot 2002 asks the person a question confirming the correctness of the specified classification 2051, such as "Where are you from?".
  • when the person's answer indicates otherwise, the robot 2002 judges that the specified classification 2051 ("France") was an error, and relearns the classification table 2050 using the correct classification 2051 ("Italy") and the features (first feature, second feature, and third feature) extracted in steps S2001 to S2003.
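  • One way to picture the confirmation-and-relearning step is the sketch below. The answer handling and the table update rule are illustrative assumptions; the embodiment only states that the classification table 2050 is relearned from the correct classification and the extracted features.

```python
from typing import Dict, List

# Toy classification table: classification -> features learned so far.
classification_table: Dict[str, List[str]] = {
    "France": ["motion 3", "(voice) strength 2"],
    "Italy": ["motion 5"],
}

def confirm_and_relearn(predicted: str, confirmed: str,
                        extracted_features: List[str]) -> None:
    """If the person's answer contradicts the predicted classification,
    associate the extracted features with the confirmed classification
    instead (a crude stand-in for relearning the classification table)."""
    if confirmed != predicted:
        classification_table.setdefault(confirmed, []).extend(extracted_features)

# The robot predicted "France"; the person answered that they are from Italy.
confirm_and_relearn("France", "Italy", ["motion 2", "(odor) type 2"])
print(classification_table["Italy"])
```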
  • FIG. 27 is a diagram illustrating an exemplary scene suitable for operation of the guidance robot control system 3100 according to the third embodiment.
  • the guide robot control system 3100 is a system using a communication robot 3001 (guide robot).
  • the communication robot 3001 is disposed in a space 3000S such as in an exhibition hall or a store, for example. Note that the communication robot 3001 is desirably arranged at a position where the space 3000S can be looked over.
  • the communication robot 3001 is, for example, a humanoid robot having a height of about 1 m. The communication robot 3001 acquires and analyzes frame images including the users P in the space 3000S (users P1 to P3 in the example of FIG. 27), determines the user P to be called out to, and makes a voice call (guidance) to that user P. In the example described below, the communication robot 3001 estimates that a user P who is looking around restlessly is in trouble, and determines that user P as the voice call target.
  • FIG. 28 is a functional block diagram of a communication robot 3001 according to the third embodiment.
  • FIG. 29 is a diagram illustrating an example of the guidance data storage unit 3022 of the communication robot 3001 according to the third embodiment.
  • the communication robot 3001 includes a control unit 3010, a storage unit 3020, a camera 3031 (imaging unit), a speaker 3032, and a motor unit 3033.
  • the control unit 3010 is a central processing unit (CPU) that controls the entire communication robot 3001.
  • the control unit 3010 executes various functions in cooperation with the hardware described above by appropriately reading and executing an operating system (OS) and application programs stored in the storage unit 3020.
  • OS operating system
  • the control unit 3010 includes an image acquisition unit 3011 (image acquisition means), an image analysis unit 3012, a user determination processing unit 3013, a guidance data extraction unit 3015, a movement control unit 3016 (movement control means), a guidance unit 3017 (guidance means), and a facial expression comparison unit 3018 (facial expression comparison means).
  • the image acquisition unit 3011 acquires a plurality of frame images via the camera 3031. The acquired plurality of frame images are continuous in time series.
  • the image analysis unit 3012 analyzes the frame image acquired by the image acquisition unit 3011.
  • the image analysis unit 3012 includes a direction determination unit 3012a (direction determination unit), a distance calculation unit 3012b (distance calculation unit), an expression determination unit 3012c (expression determination unit), and an attribute determination unit 3012d (attribute determination unit).
  • the direction determination unit 3012a determines the direction that the user P is viewing.
  • the distance calculation unit 3012b calculates the distance between the communication robot 3001 and the user P for each user P included in the frame image.
  • the facial expression determination unit 3012c determines the facial expression of the user P for each user P included in the frame image.
  • the attribute determination unit 3012d determines the attribute of the user P for each user P included in the frame image.
  • the attribute of the user P refers to, for example, age, sex, and the like.
  • the user determination processing unit 3013 determines a user P to call out based on the frame image analyzed by the image analysis unit 3012.
  • the user determination processing unit 3013 includes a direction change amount calculation unit 3013a (direction change amount calculation means), a movement speed calculation unit 3013b (movement speed calculation means), a movement direction calculation unit 3013c (movement direction calculation means), and a guidance determination unit 3013d (guidance determination means).
  • the direction change amount calculation unit 3013a calculates a direction change amount that is the magnitude of a change with time in the direction that the user P is viewing, as determined by the direction determination unit 3012a.
  • the movement speed calculation unit 3013b calculates the movement speed based on the change of the distance of the user P calculated by the distance calculation unit 3012b with time.
  • the movement direction calculation unit 3013c calculates the movement direction based on the change of the distance of the user P calculated by the distance calculation unit 3012b over time.
  • the guidance determination unit 3013d determines the user P to call out based on the direction change amount, the position of the user P, the moving speed, and the facial expression.
  • the guidance data extraction unit 3015 extracts guidance data for guiding the determined user P from the guidance data storage unit 3022.
  • the movement control unit 3016 moves the communication robot 3001 so as to approach the determined user P based on the movement direction calculated by the movement direction calculation unit 3013c.
  • the guide unit 3017 outputs the guide data extracted by the guide data extraction unit 3015 via the speaker 3032.
  • the expression comparison unit 3018 compares the facial expressions determined by the expression determination unit 3012c before and after guidance. Details of each process will be described later.
  • the storage unit 3020 is a storage area such as a semiconductor memory element for storing programs, data, and the like necessary for the control unit 3010 to execute various processes.
  • the storage unit 3020 includes a program storage unit 3021, a guidance data storage unit 3022, and an image data storage unit 3023.
  • the program storage unit 3021 is a storage area for storing a program.
  • the program storage unit 3021 stores a control program 3021a (program) for executing the various functions of the control unit 3010 described above.
  • the guide data storage unit 3022 is a storage area for storing guide data issued by the communication robot 3001. As shown in FIG. 29, the guidance data storage unit 3022 stores guidance data for calling out in association with facial expressions and attributes of the user P. Further, the next guidance data is stored in association with the facial expression of the user P after calling.
  • the image data storage unit 3023 is a storage area that stores the frame image acquired by the image acquisition unit 3011.
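  • The guidance data storage unit 3022 described above pairs a facial expression and an attribute of the user P with calling guidance, and pairs the post-call facial expression with the next guidance. A possible in-memory shape is sketched below; the phrases and keys are invented examples, not the contents of FIG. 29.

```python
# Sketch of the guidance data storage unit 3022: calling guidance keyed by
# (facial expression, attribute); next guidance keyed by the facial expression
# observed after the call. All phrases are invented examples.
guidance_data = {
    ("negative", "child"): "Are you lost? Shall I call a staff member?",
    ("negative", "adult"): "May I help you find something?",
    ("positive", "adult"): "Welcome! Today's highlights are this way.",
}

next_guidance_data = {
    "negative": "Please wait here, a staff member is coming.",
    "positive": "Great! Enjoy your visit.",
}

def extract_guidance(expression: str, attribute: str) -> str:
    return guidance_data[(expression, attribute)]

print(extract_guidance("negative", "child"))
```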
  • the camera 3031 is a photographing device.
  • the camera 3031 is provided, for example, at a position corresponding to the eye portion of the communication robot 3001.
  • the camera 3031 shoots the subject in the direction that the communication robot 3001 faces.
  • the speaker 3032 is an audio output device.
  • the speaker 3032 is provided, for example, at a position corresponding to the mouth portion of the communication robot 3001.
  • the speaker 3032 outputs sound based on an instruction from the control unit 3010.
  • the motor unit 3033 is a motor for causing the communication robot 3001 to operate.
  • the motor unit 3033 includes a head motor 3033a and a foot motor 3033b.
  • the head motor 3033 a is a motor that moves the head of the communication robot 3001.
  • the head motor 3033a is controlled by the control unit 3010, and moves the head up, down, left, and right in order to photograph various users P in the space 3000S as subjects.
  • the foot motor 3033b is a motor for moving the communication robot 3001 on the foot of the communication robot 3001.
  • FIG. 30 is a flowchart showing robot control processing in the communication robot 3001 according to the third embodiment.
  • FIG. 31 is a flowchart showing image analysis processing in the communication robot 3001 according to the third embodiment.
  • FIG. 32 is a flowchart showing user determination processing in the communication robot 3001 according to the third embodiment.
  • FIG. 33 is a flowchart showing post-guidance processing in the communication robot 3001 according to the third embodiment.
  • in step S3010, the control unit 3010 (image acquisition unit 3011) of the communication robot 3001 performs image acquisition processing.
  • in the image acquisition processing, the control unit 3010 continuously acquires frame images via the camera 3031 and determines whether or not one or more users P are included in each acquired frame image.
  • the control unit 3010 stores the frame image acquired at an appropriate timing (for example, every 0.1 second) in the image data storage unit 3023.
  • in step S3011, the control unit 3010 determines whether a processing condition has been met.
  • the processing condition refers to, for example, a case where a plurality of frame images (for example, 30 images) are stored in the image data storage unit 3023, a case where a predetermined time (for example, 3 seconds) has elapsed, or the like. If it is determined that the processing condition has been met (S3011: YES), the control unit 3010 moves the process to S3012. On the other hand, if it is determined that the processing condition is not satisfied (S3011: NO), the control unit 3010 moves the process to S3010.
  • in step S3012, the control unit 3010 (image analysis unit 3012) performs image analysis processing for analyzing the frame images stored in the image data storage unit 3023.
  • the control unit 3010 extracts one frame image from the image data storage unit 3023.
  • the control unit 3010 determines the direction in which the user P is looking for each user P included in the frame image. For example, when two users P1 and P2 are included in the frame image, the control unit 3010 determines the direction that the user P1 is viewing and the direction that the user P2 is viewing.
  • the control unit 3010 can detect a face image from a frame image and determine the orientation of the face based on the arrangement positions of the parts (organs) of the face. The control unit 3010 can then treat the determined face orientation as the direction in which the user P is looking.
  • a facial organ detection method using a regression model can be used.
  • a facial organ detection method using a regression model is described in, for example, US Patent Application Publication No. 2014/0185924.
  • in step S3032, the control unit 3010 (distance calculation unit 3012b) calculates the distance between the user P and the communication robot 3001 for each user P included in the frame image. For example, the control unit 3010 can calculate an approximate distance between the user P and the communication robot 3001 based on the size of the user P in the frame image, the background of the frame image, and the like.
  • the control unit 3010 (expression determination unit 3012c) determines the facial expression for each user P included in the frame image. The control unit 3010 can determine, for example, whether it is a smile (positive expression) or a sad face (negative expression) from the shape of the eyes, nose, mouth, etc. as facial expressions.
  • in step S3034, the control unit 3010 determines whether all of the frame images to be processed in the image data storage unit 3023 have been processed. When all the frame images to be processed have been processed (S3034: YES), the control unit 3010 ends this processing and returns the processing to FIG. 30. On the other hand, when not all of the frame images to be processed have been processed (S3034: NO), the control unit 3010 moves the process to S3035. In step S3035, the control unit 3010 extracts one unprocessed frame image from the frame images to be processed in the image data storage unit 3023. Thereafter, the control unit 3010 moves the process to S3031.
  • in step S3013, the control unit 3010 (user determination processing unit 3013) performs user determination processing.
  • the control unit 3010 arranges the frame images subjected to the image analysis process in time series.
  • in step S3041, the control unit 3010 (direction change amount calculation unit 3013a) calculates the direction change amount for each user P based on the plurality of rearranged frame images.
  • the direction change amount refers to the amount of change over time in the direction in which the user P is looking. Specifically, when the user P is looking around restlessly, the degree of looking around is high, and the control unit 3010 calculates a large direction change amount.
  • the control unit 3010 calculates the movement speed for each user P based on the rearranged frame images.
  • the control unit 3010 can calculate the moving speed based on the change in the position of the user P among the plurality of frame images. Note that the control unit 3010 may calculate whether the moving speed is faster than a specified range.
  • the control unit 3010 (movement direction calculation unit 3013c) calculates the movement direction for each user P based on the rearranged frame images.
  • the control unit 3010 can calculate the moving direction based on the change in the position of the user P in the plurality of frame images.
  • in step S3044, the control unit 3010 (user determination processing unit 3013) determines the facial expression that appears most frequently for each user P based on the plurality of rearranged frame images.
  • in step S3045, the control unit 3010 (guidance determination unit 3013d) calculates the calling priority based on each index.
  • the voice call priority is a degree calculated by weighting each of the direction change amount, the position of the user P, the moving speed, and the facial expression.
  • the degree of weighting is set highest for the direction change amount.
  • the control unit 3010 performs the weighting so that the calling priority becomes highest for a user P with a large direction change amount.
  • when the calling priority is equal to or higher than a predetermined value, the control unit 3010 sets the user P as a calling target.
  • the control unit 3010 (guidance determination unit 3013d) determines the user P to call out based on the calling priority. Note that when the direction change amount of the user P is small, the priority becomes a small value even if the moving speed is slow and the facial expression is negative. In such a case, that is, if the priority is less than the predetermined value, the control unit 3010 may determine that there is no user P to call out. Thereafter, the control unit 3010 ends this processing and returns the processing to FIG. 30.
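  • A weighted score of the kind described above, with the direction change amount weighted most heavily, might be computed as in the following sketch. The weights, the normalization of each index to the range 0..1, and the threshold are arbitrary illustrative values, not those of the embodiment.

```python
# Sketch of the calling priority: a weighted sum of the indices described above.
WEIGHTS = {
    "direction_change": 0.5,    # heaviest weight
    "closeness": 0.2,           # higher when the user is near the robot
    "slowness": 0.2,            # higher when the user walks slowly or stops
    "negative_expression": 0.1,
}
PRIORITY_THRESHOLD = 0.5

def calling_priority(direction_change: float, closeness: float,
                     slowness: float, negative_expression: float) -> float:
    """Each index is assumed to be normalized to the range 0..1."""
    return (WEIGHTS["direction_change"] * direction_change
            + WEIGHTS["closeness"] * closeness
            + WEIGHTS["slowness"] * slowness
            + WEIGHTS["negative_expression"] * negative_expression)

score = calling_priority(direction_change=0.9, closeness=0.7,
                         slowness=0.8, negative_expression=1.0)
print(score, score >= PRIORITY_THRESHOLD)  # call out only above the threshold
```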
  • in step S3014, the control unit 3010 (user determination processing unit 3013) determines whether or not a user P has been determined by the user determination processing.
  • if a user P has been determined (S3014: YES), the control unit 3010 moves the process to S3015.
  • if no user P has been determined (S3014: NO), the control unit 3010 ends this process.
  • in step S3015, the control unit 3010 (attribute determination unit 3012d) determines the attribute of the determined user P. Specifically, the control unit 3010 determines an approximate age or the like from the image of the user P extracted from the frame image. The control unit 3010 may simply determine whether the user P is an adult or a child without determining the age. In that case, the control unit 3010 may estimate the height from the image of the user P and determine whether or not the user P is a child.
  • in step S3016, the control unit 3010 (guidance data extraction unit 3015) refers to the guidance data storage unit 3022 (FIG. 29) and extracts guidance data based on the facial expression of the user P determined in S3044 of FIG. 32 and the attribute determined in S3015.
  • in step S3017, the control unit 3010 (movement control unit 3016) performs movement processing. As the movement processing, the control unit 3010 controls the motor unit 3033 (foot motor 3033b) based on the moving speed and moving direction of the user P calculated in S3042 and S3043 of FIG. 32 to move the communication robot 3001 so as to approach the user P.
  • in step S3018, the control unit 3010 (guidance unit 3017) performs guidance processing. As the guidance processing, after moving, the control unit 3010 outputs the guidance data extracted in S3016 via the speaker 3032 so as to speak to the user P.
  • in step S3019, the control unit 3010 performs post-guidance processing.
  • in step S3051, the control unit 3010 (image acquisition unit 3011) acquires a frame image of the user P after the guidance, and the control unit 3010 (expression determination unit 3012c) determines the facial expression of the user P from the acquired frame image.
  • in step S3052, the control unit 3010 (expression comparison unit 3018) performs facial expression comparison processing. As the facial expression comparison processing, the control unit 3010 compares the facial expression before the guidance data was output (determined in S3044 of FIG. 32) with the facial expression acquired in S3051.
  • in step S3053, the control unit 3010 (guidance data extraction unit 3015) extracts guidance data corresponding to the facial expression comparison result from the guidance data storage unit 3022 (FIG. 29).
  • in step S3054, the control unit 3010 (guidance unit 3017) outputs the guidance data extracted in S3053 via the speaker 3032 as guidance processing. Thereafter, the control unit 3010 returns the processing to FIG. 30, and the control unit 3010 ends this processing.
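  • The post-guidance flow (compare the facial expression before and after the call, then pick the next guidance) could be sketched as follows; the expression labels and the next-guidance table are illustrative assumptions.

```python
# Sketch of the post-guidance flow: compare the expression before and after
# the call and pick the next guidance accordingly. Labels are illustrative.
next_guidance_data = {
    "negative": "Please wait here, a staff member is coming.",
    "positive": "Great! Enjoy your visit.",
}

def post_guidance(expression_before: str, expression_after: str) -> str:
    """Return the next guidance based on the expression after the call; the
    before/after comparison result is printed here, but could equally be
    logged or used in other ways (not specified in the embodiment)."""
    improved = (expression_before == "negative" and expression_after == "positive")
    print("expression improved:", improved)
    return next_guidance_data[expression_after]

print(post_guidance("negative", "negative"))
```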
  • FIGS. 34A to 34C are diagrams showing a specific example of the guidance robot control system 3100 according to the third embodiment.
  • the control unit 3010 of the communication robot 3001 acquires frame images of the child C1, who is lost and looking around as shown in FIG. 34A (S3010 in FIG. 30), and determines that the child C1 is a calling target (S3014 in FIG. 30: YES).
  • the control unit 3010 extracts guidance data at the time of calling when the attribute is a child and the expression is negative from the guidance data storage unit 3022 (see FIG. 29) (S3016 in FIG. 30).
  • the control unit 3010 moves the communication robot 3001 toward the position of the child C1 (S3017 in FIG. 30) and outputs the extracted guidance data via the speaker 3032 (S3018 in FIG. 30).
  • the communication robot 3001 can thus guess the state of the child C1 based on the movement and facial expression of the child C1, and can make a call suited to the intention of the child C1. Thereafter, the control unit 3010 acquires frame images of the child C1, and if the facial expression of the child C1 remains negative as shown in FIG. 34C, the control unit 3010 extracts the next guidance data corresponding to that facial expression from the guidance data storage unit 3022 and outputs it via the speaker 3032 (S3019 in FIG. 30).
  • (1) the amount of change in the direction in which the user P is looking is calculated from a plurality of frame images. Since the direction change amount becomes large when the user P is looking around restlessly, the communication robot 3001 determines that such a user P is in trouble and calls out. The state of the user P, that is, whether or not the user P is in trouble, is thus estimated from the amount of change in the direction in which the user P is looking, so the communication robot 3001 can preferentially make a call suited to the intention of a user P who appears to be in trouble.
  • (2) when the position of the user P and the position of the communication robot 3001 are within a specified range, the communication robot 3001 calls out to the user P. Therefore, the communication robot 3001 can preferentially talk to a nearby user P. (3) when the moving speed of the user P is within a specified range, the communication robot 3001 calls out to the user P. Therefore, when the user P is walking slowly or has stopped, the communication robot 3001 can preferentially talk to the user P. (4) since the movement position of the user P is estimated and the communication robot 3001 approaches it, the communication robot 3001 can talk to the user P more naturally.
  • (5) guidance data is selected based on the facial expression of the user P, and the communication robot 3001 speaks to the user P. Therefore, the communication robot 3001 can provide guidance with content that matches the facial expression of the user P. (6) based on the change in the facial expression of the user P before and after calling, the communication robot 3001 provides the next guidance. Therefore, the communication robot 3001 can talk with content that communicates better.
  • (7) an attribute such as the age of the user P is determined from the frame image, guidance data corresponding to the attribute of the user P is selected, and the communication robot 3001 calls out to the user P. Therefore, the communication robot 3001 can provide guidance with content that matches the attribute of the user P.
  • (8) the communication robot 3001 alone can perform all processing from acquisition of frame images to output of guidance data. Therefore, only the communication robot 3001 needs to be prepared, and installation is easy.
  • FIG. 35 is a diagram illustrating an example of a scene suitable for operation of the guidance robot control system 3200 according to the fourth embodiment.
  • FIG. 36 is a functional block diagram of a guidance robot control system 3200 according to the fourth embodiment.
  • FIG. 37 is a diagram illustrating an example of the guidance data storage unit 3222 of the communication robot 3201 according to the fourth embodiment.
  • the guidance robot control system 3200 places the communication robot 3201 at the entrance of a space 3200S such as an exhibition booth or a store entrance.
  • the guidance robot control system 3200 is a system used by the communication robot 3201 to call the user P (in this example, the user P4) to enter the space 3200S.
  • the guidance robot control system 3200 includes a communication robot 3201 and a camera 3205. Communication robot 3201 and camera 3205 are installed near the entrance of space 3200S.
  • the camera 3205 is a surveillance camera.
  • the camera 3205 acquires the frame image of the user P passing through the space 3200S, and transmits the frame image to the communication robot 3201.
  • the communication robot 3201 analyzes the frame image of the user P received from the camera 3205, determines the user P to be called, and provides guidance to the user P.
  • the communication robot 3201 calls out the user P who passes in front of the camera 3205 by determining whether he / she is interested in the space 3200S.
  • the communication robot 3201 includes a control unit 3210, a storage unit 3220, a speaker 3032, and a motor unit 3033.
  • the control unit 3210 includes an image reception unit 3211 (image acquisition means), an image analysis unit 3212, a user determination processing unit 3213, a face data collation unit 3214 (face collation means), a guidance data extraction unit 3215, a movement control unit 3016, a guidance unit 3017, and a facial expression comparison unit 3018.
  • the image reception unit 3211 acquires a frame image by receiving the frame image transmitted by the camera 3205.
  • the image analysis unit 3212 analyzes the frame image received by the image reception unit 3211.
  • the image analysis unit 3212 includes a direction acquisition unit 3212a (user direction acquisition unit), a distance calculation unit 3012b, a facial expression determination unit 3212c, and a face acquisition unit 3212e (face acquisition unit).
  • the direction acquisition unit 3212a analyzes the frame image to identify the user P, determines the direction that the user P is viewing for each user P, and acquires the direction that the user P is viewing.
  • the facial expression determination unit 3212c determines the facial expression of the user P based on the face data of the user P acquired by the face acquisition unit 3212e.
  • the face acquisition unit 3212e acquires the face data of the user P for each user P included in the frame image.
  • the user determination processing unit 3213 determines the user P to talk to based on changes over time in the frame images analyzed by the image analysis unit 3212.
  • the user determination processing unit 3213 includes a movement speed calculation unit 3013b, a movement direction calculation unit 3013c, and a guidance determination unit 3213d.
  • the guidance determination unit 3213d determines the user P to call on, based on the viewing direction of the user P acquired by the direction acquisition unit 3212a, the movement direction calculated by the movement direction calculation unit 3013c, and the like.
  • the face data collation unit 3214 collates the face data stored in the face data storage unit 3224 and the face data acquired by the face acquisition unit 3212e.
  • the guidance data extraction unit 3215 extracts the guidance data for guiding the user P determined by the user determination processing unit 3213 from the guidance data storage unit 3222. At that time, when the face data collation unit 3214 finds a match, the guidance data extraction unit 3215 extracts guidance data different from the previously output guidance data.
  • the storage unit 3220 includes a program storage unit 3021, a guidance data storage unit 3222, an image data storage unit 3023, and a face data storage unit 3224.
  • the program storage unit 3021 stores a control program 3221a for executing the various functions of the control unit 3210 described above.
  • the guidance data storage unit 3222 stores guidance data for calling in association with facial expressions of the user P. Further, the next guidance data is stored in association with the facial expression of the user P after calling.
  • the face data storage unit 3224 is a storage area for storing face data. When the guide data is output, the face data storage unit 3224 stores the face data acquired by the face acquisition unit 3212e in association with the output guide data.
  • the camera 3205 is a live camera represented by a web camera, for example.
  • the camera 3205 may be a stereo camera.
  • a stereo camera refers to a camera that can capture information about the depth of an object by simultaneously capturing images from slightly shifted positions with a plurality of lenses and generating parallax.
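  • As a well-known aside on stereo cameras, for a calibrated stereo pair the depth follows from the disparity between the two images via the standard relation Z = f * B / d, where f is the focal length in pixels, B the baseline between the lenses, and d the disparity in pixels. The sketch below only illustrates this relation with made-up numbers.

```python
# Standard stereo relation: depth Z = f * B / d. Numbers are illustrative only.
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(focal_px=700.0, baseline_m=0.06, disparity_px=14.0))  # ~3.0 m
```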
  • the camera 3205 includes an image acquisition unit 3251 and an image transmission unit 3252.
  • the image acquisition unit 3251 is a control unit that acquires a frame image.
  • the image transmission unit 3252 is a control unit that transmits the acquired frame image via the communication network 3000N.
  • Communication robot 3201 and camera 3205 are communicably connected via communication network 3000N.
  • the communication network 3000N is, for example, an Internet line or a mobile terminal communication network.
  • the communication network 3000N may use, for example, short-range wireless communication based on the Bluetooth (registered trademark) standard.
  • FIG. 38 is a flowchart showing robot control processing in the communication robot 3201 according to the fourth embodiment.
  • FIG. 39 is a flowchart showing image analysis processing in the communication robot 3201 according to the fourth embodiment.
  • FIG. 40 is a flowchart showing user determination processing in the communication robot 3201 according to the fourth embodiment.
  • the control unit 3210 (image reception unit 3211) of the communication robot 3201 receives the frame image transmitted by the camera 3205 and stores the received frame image in the image data storage unit 3023. In step S3211, the control unit 3210 (image analysis unit 3212) performs image analysis processing for analyzing the frame images stored in the image data storage unit 3023.
  • the image analysis processing will be described with reference to FIG. 39.
  • the control unit 3210 detects the user P included in the frame image and acquires the direction in which the user P is viewing. For example, when two users P1 and P2 are included in the frame image, the control unit 3210 detects the users P1 and P2, and acquires the direction in which each user P is looking.
  • a moving object detection method can be used. In the moving object detection method, a frame image in which no user P is captured is stored in advance as background image data, and the difference between the background image data and the frame image to be processed is obtained, so that the user P, who is a moving object, is detected.
  • human detection using HOG features can also be used. Human detection using HOG features is described in N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
  • the control unit 3210 acquires, for example, the orientation of the face of the user P.
  • the face direction of the user P is assumed to be the front direction with respect to the camera 3205 when the face can be detected by the control unit 3210 (face acquisition unit 3212e) using the face detection method.
  • face detection using Haar-like features can be used.
  • for Haar-like features, see P. Viola and M. J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 511-518, 2001.
  • the control unit 3210 may detect a face image from the frame image, extract eyes from the face image, and acquire the direction in which the user P is looking based on the relative position between the eyeball and the pupil.
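  • The HOG-based person detection and Haar-like face detection mentioned above are available in common libraries; the following OpenCV-based sketch is one possible realization, not the implementation of the embodiment. The cascade file path relies on OpenCV's bundled data files.

```python
# One possible realization, using OpenCV, of the detectors mentioned above:
# HOG-based person detection and Haar-cascade face detection.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def analyze_frame(frame):
    """Detect persons (HOG) and frontal faces (Haar-like features).
    A detected frontal face is treated as 'facing the camera'."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    persons, _weights = hog.detectMultiScale(frame)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return persons, faces

# Example usage with a single frame from a camera (device 0):
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     persons, faces = analyze_frame(frame)
#     print(len(persons), "person(s),", len(faces), "frontal face(s)")
```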
  • in step S3232, the control unit 3210 (face acquisition unit 3212e) acquires face data, and the control unit 3210 (expression determination unit 3212c) determines the facial expression based on the acquired face data. Thereafter, the control unit 3210 returns the processing to FIG. 38.
  • the control unit 3210 (user determination processing unit 3213) performs user determination processing.
  • the control unit 3210 arranges the frame images subjected to the image analysis process in time series.
  • the control unit 3210 (direction acquisition unit 3212a) acquires the viewing direction for each user P based on the plurality of rearranged frame images.
  • when the face of the user P can be detected, the control unit 3210 acquires that direction as the viewing direction. Therefore, for example, when the user P is looking away from the camera 3205, the control unit 3210 does not acquire a viewing direction.
  • the processing from S3242 to S3244 is the same as the processing from S3042 to S3044 of the third embodiment (FIG. 32).
  • human body tracking using the Mean-shift feature can be used.
  • For human body tracking using the Mean-shift method, see R. Collins, "Mean-Shift Blob Tracking through Scale Space," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
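  • As an illustration of tracking a detected person across frames, the sketch below uses OpenCV's built-in mean-shift tracker on a hue histogram; the initial bounding box is assumed to come from the person-detection step, and the parameter values are illustrative.

```python
import cv2

def track_person(frames, init_box):
    """Track a person region across frames with mean shift on a hue histogram.

    frames:   list of BGR frame images (the first frame contains the person)
    init_box: (x, y, w, h) of the person in the first frame
    """
    x, y, w, h = init_box
    roi = frames[0][y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    track_window = init_box
    positions = [init_box]
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
        positions.append(track_window)
    return positions  # per-frame (x, y, w, h), usable for moving direction/speed
```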
  • In step S3245, the control unit 3210 (guidance determination unit 3213d) calculates the calling priority based on each index.
  • The calling priority is a degree calculated by weighting each of the direction in which the user P is looking, the moving direction, the position of the user P, and the moving speed.
  • Here, a user P who is interested in the space 3200S but is about to pass by is preferentially spoken to, so the degree of weighting differs between the direction in which the user P is looking and the moving direction, and the weight is set highest when the user P is facing the front direction (the direction of the camera 3205). The control unit 3210 then performs the weighting so that the calling priority becomes highest when the above condition is satisfied and the moving speed is low.
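  • The calling-priority calculation described above could be sketched as a simple weighted score. The weights, normalization constants, and score rules below are illustrative assumptions and not values defined in the specification.

```python
def calling_priority(facing_front, moving_toward_robot, distance_m, speed_m_s,
                     w_view=0.4, w_move=0.2, w_pos=0.2, w_speed=0.2):
    """Weighted calling priority in [0, 1]; a higher value means call out first.

    facing_front:        True if the user's face is toward the camera/robot
    moving_toward_robot: True if the movement direction is toward the robot
    distance_m:          distance between the user and the robot in meters
    speed_m_s:           walking speed in meters per second
    """
    view_score = 1.0 if facing_front else 0.0            # highest when facing front
    move_score = 0.5 if moving_toward_robot else 1.0     # passers-by weighted higher
    pos_score = max(0.0, 1.0 - distance_m / 5.0)         # closer users score higher
    speed_score = max(0.0, 1.0 - speed_m_s / 2.0)        # slower users score higher

    return (w_view * view_score + w_move * move_score +
            w_pos * pos_score + w_speed * speed_score)
```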
  • The control unit 3210 sets such a user P as a calling target.
  • The control unit 3210 (guidance determination unit 3213d) determines the user P to call based on the calling priority.
  • The control unit 3210 may also determine that there is no user P to call. Thereafter, the control unit 3210 ends this processing and returns the processing to FIG. 38.
  • In step S3213, the control unit 3210 (user determination processing unit 3213) determines whether or not a user P to call has been determined by the user determination processing.
  • If a user P has been determined (S3213: YES), the control unit 3210 moves the process to S3214.
  • If no user P has been determined (S3213: NO), the control unit 3210 ends this process.
  • In step S3214, the control unit 3210 (face data collation unit 3214) collates the face data acquired in step S3232 of FIG. 39 with the face data in the face data storage unit 3224.
  • In step S3215, the control unit 3210 (face data collation unit 3214) determines whether or not the collation has succeeded.
  • If the collation has succeeded (S3215: YES), the control unit 3210 advances the process to S3216. On the other hand, if the collation has failed (S3215: NO), the control unit 3210 moves the process to S3216a. The case where the face data can be collated is the case where the user P has been called in the past.
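  • The face collation of steps S3214 and S3215 could be sketched as follows, under the assumption that each stored face is reduced to a feature vector; the grayscale-histogram descriptor and the similarity threshold are simplifications for illustration only.

```python
import cv2
import numpy as np

def face_feature(face_img):
    """Very simple face descriptor: a normalized grayscale histogram."""
    gray = cv2.cvtColor(cv2.resize(face_img, (64, 64)), cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [64], [0, 256]).flatten()
    return hist / (np.linalg.norm(hist) + 1e-9)

def collate_face(face_img, stored_features, threshold=0.9):
    """Return the index of a matching stored face, or None if there is no match.

    A match means the user has already been called in the past (S3215: YES)."""
    query = face_feature(face_img)
    for i, feat in enumerate(stored_features):
        if float(np.dot(query, feat)) >= threshold:  # cosine similarity
            return i
    return None
```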
  • In step S3216, the control unit 3210 refers to the face data storage unit 3224 and the guidance data storage unit 3222, and extracts from the guidance data storage unit 3222 guidance data that corresponds to the facial expression of the user P and that differs from the guidance data already output.
  • In step S3216a, the control unit 3210 refers to the guidance data storage unit 3222 and extracts guidance data corresponding to the facial expression of the user P.
  • The processing of S3217 and S3218 is the same as the processing of S3017 and S3018 of the third embodiment (FIG. 30).
  • In step S3219, the control unit 3210 associates the face data of the user P with the output guidance data and stores them in the face data storage unit 3224.
  • In step S3220, the control unit 3210 performs post-guidance processing.
  • The post-guidance processing is the same as in the third embodiment (FIG. 33). Thereafter, the control unit 3210 ends this process.
  • FIGS. 41A and 41B are diagrams showing a specific example of the guidance robot control system 3200 according to the fourth embodiment.
  • The communication robot 3201 and the camera 3205 are installed near the entrance of the space 3200S.
  • The image acquisition unit 3251 of the camera 3205 acquires frame images as needed, and the image transmission unit 3252 transmits the acquired frame images to the communication robot 3201 as needed.
  • As shown in FIG. 41A, the control unit 3210 sets as a calling target the user P4 who is walking in front of the space 3200S and whose face is facing the communication robot 3201 (S3213 in FIG. 38: YES).
  • Next, the control unit 3210 determines whether or not the acquired face data is stored in the face data storage unit 3224. If the control unit 3210 determines that the acquired face data is not stored, it extracts from the guidance data storage unit 3222 (see FIG. 37) calling guidance data corresponding to the facial expression (S3216a in FIG. 38). Then, the communication robot 3201 outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 calls out to the user P4, "Please drop in at the store." Then, when the user P4 stops and looks at the communication robot 3201, a further call is made (S3220 in FIG. 38).
  • After a while, when the same user P4 is again walking in front of the space 3200S with the face facing the communication robot 3201, the control unit 3210 sets the user P4 as a calling target, as shown in FIG. 41B (S3213 in FIG. 38: YES). Next, the control unit 3210 determines whether or not the acquired face data is stored in the face data storage unit 3224. If the control unit 3210 determines that guidance has already been provided to the user P4, it extracts from the guidance data storage unit 3222 (see FIG. 37) calling guidance data that corresponds to the facial expression and differs from the previous one (S3216 in FIG. 38). Then, the communication robot 3201 outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 calls out to the user P4, "You passed by earlier, didn't you?"
  • According to the fourth embodiment, the guidance robot control system 3200 has the following effects.
  • (1) The communication robot 3201 determines whether or not to speak to the user P based on the relationship between the direction in which the user P is looking and the moving direction of the user P. Therefore, for example, a user P who is looking in the direction of the space 3200S while passing in front of it can be presumed to be interested in the space 3200S, and the communication robot 3201 can speak to that user.
  • (2) Since the communication robot 3201 acquires the direction in which the user P is looking based on whether or not the face of the user P can be detected, the processing can be performed easily.
  • (3) Since the communication robot 3201 stores the face data of the user P to whom guidance has been provided, it can output guidance data different from the previous one when it next provides guidance to the same user P. Therefore, because the guidance can be varied by using the face authentication function, different calls can be made. (4) Since the communication robot 3201 performs its processing using the frame images acquired by the camera 3205, the position from which the frame images are acquired does not change even if the communication robot 3201 moves, because the camera 3205 is fixed. Therefore, for example, a plurality of frame images arranged in time series can be processed easily.
  • In the third embodiment, the example in which the communication robot includes a camera has been described, and in the fourth embodiment, the example in which the communication robot receives frame images from a separate camera has been described; however, the present invention is not limited to this.
  • The position of the camera is not limited to the vicinity of the communication robot; the camera may be installed, for example, on the ceiling of the space, and frame images may be acquired from a plurality of cameras.
  • A communication robot that includes a camera may also be used. In that case, in order to perform the image processing easily, it is desirable that the communication robot does not move so that the acquisition position of the frame images does not change.
  • In the embodiments described above, the example in which the communication robot performs the various processes has been described, but the present invention is not limited to this.
  • A server that is communicably connected to the communication robot may be provided, and the server may perform the various processes.
  • In that case, the communication robot only needs to include at least a speaker that outputs the guidance data transmitted from the server and a motor unit for operating the communication robot.
  • The various processes may also be shared between the communication robot and the server.
  • In the embodiments described above, the example using frame images acquired at appropriate timings has been described, but the present invention is not limited to this. Moving image data may be used; in that case, a plurality of frame images may be acquired from the moving image data, and the image analysis processing may be performed for each frame image.
  • In the embodiments described above, the example in which the image analysis processing is performed on all users included in the acquired frame image has been described, but the present invention is not limited to this. One user may be selected from the acquired frame image, and the image of the selected user may be analyzed to determine whether or not that user is to be a calling target.
  • In the embodiments described above, the case has been described in which the direction in which the user is looking is taken to be the face direction, and the direction change amount becomes larger the more the user moves his or her neck and looks around; however, the present invention is not limited to this.
  • The direction in which the user is looking may instead be the direction of the line of sight.
  • In that case, the control unit can detect a face image from the frame image and extract the eyes from the face image, thereby acquiring the gaze direction based on the relative position between the eyeball and the pupil. The user can then be set as a calling target when the direction in which the user is looking (the gaze direction) differs from the user's moving direction.
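  • A rough sketch of estimating the gaze direction from the relative position of the pupil within a detected eye region (using the eye cascade bundled with OpenCV; the thresholds and the left/right decision rule are illustrative assumptions) is shown below.

```python
import cv2

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def gaze_direction(face_img):
    """Return 'left', 'right' or 'center' from the pupil offset in the first eye found."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(eyes) == 0:
        return None

    x, y, w, h = eyes[0]
    eye = gray[y:y + h, x:x + w]
    # The pupil is the darkest blob; threshold and take the largest contour centroid.
    _, mask = cv2.threshold(eye, 50, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    cx = m["m10"] / m["m00"]

    offset = (cx - w / 2.0) / (w / 2.0)  # -1 (left edge) .. +1 (right edge)
    if offset < -0.2:
        return "left"
    if offset > 0.2:
        return "right"
    return "center"
```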
  • The camera is not limited to an ordinary camera. A camera having a sensor that acquires depth information, typified by Kinect (registered trademark), may be used. With such a camera, the process of detecting the user's face from the frame image and calculating the moving direction of the user can be performed more easily.
  • In the embodiment described above, processing related to the user attribute is not performed. However, the user attribute may be determined, and guidance data corresponding to the determined user attribute may be extracted.
  • FIGS. 42A and 42B are diagrams showing a specific example of a guidance robot control system 3300 according to a modified embodiment.
  • FIG. 42A shows a case where the user P5, who is walking toward the space 3300S and whose face is facing the communication robot 3301 (camera 3305), is set as a calling target.
  • In this case, the communication robot 3301 outputs the guidance data corresponding to the moving direction, "Welcome. Would you like some guidance?", and speaks to the user P5.
  • FIG. 42B shows a case in which the user P6, who is coming out of the space 3300S and whose face is facing the communication robot 3301, is set as a calling target.
  • In this case, the communication robot 3301 outputs "Thank you," which is the guidance data corresponding to the moving direction, and speaks to the user P6.
  • In this manner, the communication robot 3301 extracts and outputs guidance data corresponding to the moving direction of the user P. Therefore, the communication robot 3301 can call out differently to the user P5 entering the space 3300S and the user P6 leaving the space 3300S, and can thus make an appropriate call according to the situation of the user P.

Abstract

Provided is an information presentation apparatus or the like that is capable of natural conversation just like with a human, when starting a conversation with a user. An information presentation apparatus 10 is provided with: an image input unit 11 to which an image including a user's face is inputted; an area determination unit 12 that determines areas of facial organs from the image; a color information extraction unit 13 that extracts color information from the areas; a storage unit 15 that stores a color language table 14 in which color information and language information are associated; and a checking unit 16 that obtains the language information by checking, in the color language table 14, the color information extracted by the color information extraction unit 13.

Description

情報提示装置、情報提示方法、プログラム、情報処理装置及び案内ロボット制御システムInformation presentation apparatus, information presentation method, program, information processing apparatus, and guidance robot control system
 本発明は、ユーザ(人間)と対話を行う情報提示装置等に関する。
 また、本発明は、人物の属性(分類)に応じた処理を行う情報処理装置等に関する。
 さらに、本発明は、案内ロボット制御システム等に関する。
The present invention relates to an information presentation apparatus that performs a dialog with a user (human).
The present invention also relates to an information processing apparatus that performs processing according to a person's attribute (classification).
Furthermore, the present invention relates to a guidance robot control system and the like.
 近年、ロボットやコンピュータが、人間であるユーザと対話を行うための研究や開発が進められている。例えば、特許文献1には、ユーザから音声入力を受信し、音声入力の音声態様に基づいてユーザの応答処理に対する反応を評価し、評価結果に基づいて応答内容を変更する情報提示装置が開示されている。特許文献1のように、従来の情報提示装置では、人間の発話内容や表情の変化等を考慮して発話する内容を決める。 In recent years, research and development have been promoted so that robots and computers can interact with human users. For example, Patent Document 1 discloses an information presentation device that receives voice input from a user, evaluates a response to the user's response processing based on the voice mode of the voice input, and changes the response content based on the evaluation result. ing. As in Patent Document 1, in the conventional information presentation apparatus, content to be uttered is determined in consideration of human utterance content, changes in facial expressions, and the like.
 また、従来から、人物に対して、情報案内、対話、宣伝等の処理を自動で行うシステムが存在する。これらのシステムは、人物の属性(分類)に関わらず画一的な処理を行うものが多く、人物に対して効果的な情報提供が必ずしも実現されていない。これに対して、人物の属性(分類)に応じた処理を自動で行うシステムも存在する。 Also, conventionally, there are systems that automatically process information guidance, dialogue, advertisement, etc. for a person. Many of these systems perform uniform processing regardless of a person's attribute (classification), and effective provision of information to a person is not necessarily realized. On the other hand, there is a system that automatically performs processing according to a person's attribute (classification).
 特許文献2には、人物の年齢および性別を認識して音声や動きによる演出効果を行うことで、効果的な情報提示を行う、情報処理装置が開示されている。また特許文献3には、個体の画像データに基づき判別される個体情報を参照することにより、出力する情報を制御する出力制御装置が開示されている。これら先行技術によれば、各人物の情報を反映した効果的な自動対応を実現している。 Patent Document 2 discloses an information processing apparatus that presents effective information by recognizing a person's age and gender and performing a presentation effect by voice or movement. Patent Document 3 discloses an output control apparatus that controls information to be output by referring to individual information determined based on individual image data. According to these prior arts, effective automatic correspondence reflecting each person's information is realized.
 さらに、従来、来訪者に対して案内サービスを提供するロボットが開示されている(例えば、特許文献4)。 Furthermore, a robot that provides a guidance service to visitors has been disclosed (for example, Patent Document 4).
特開2015-138147号公報JP2015-138147A 特開2012-185303号公報JP 2012-185303 A 特開2015-021813号公報Japanese Patent Laying-Open No. 2015-021813 特開2008-260107号公報JP 2008-260107 A
 ところで、人間同士(特に、初対面の人同士)が会話する際、相手から受ける印象(特に、第1印象)を基に、会話の導入部分の話題を決めることが多い。しかしながら、特許文献1を含む従来の情報提示装置では、ユーザ(特に、初対面のユーザ)との対話の導入時において、人間同士のような自然な対話が実現できていない。 By the way, when humans (especially those who meet for the first time) have a conversation, the topic of the introduction part of the conversation is often decided based on the impression (especially the first impression) received from the other party. However, the conventional information presentation apparatus including Patent Document 1 cannot realize a natural dialogue between humans when introducing a dialogue with a user (especially a first-time user).
 また、特許文献2では、画像のみから人物の年齢及び性別を判別するものである。また、特許文献3も、撮影部で撮像された画像データのみに基づき、属性を判別するものである。このように、特許文献2、3の方法は、画像のみの単一種別のデータに基づいて人物の属性(分類)を判別するため、人物の属性(分類)を判別するための判断指標が少なく、人物の属性(分類)を的確に特定できない場合があった。また、画像のみから判別可能な人物の属性(分類)だけに判別対象が限定されていた。 Also, in Patent Document 2, the age and sex of a person are discriminated only from an image. Patent Document 3 also determines an attribute based only on image data captured by the imaging unit. As described above, since the methods of Patent Documents 2 and 3 determine the attribute (classification) of a person based on a single type of data including only images, there are few determination indexes for determining the attribute (classification) of a person. In some cases, the attribute (classification) of a person could not be specified accurately. Further, the discrimination target is limited only to the attribute (classification) of the person that can be discriminated only from the image.
 さらに、特許文献4に記載のロボットは、来訪者等のユーザの移動に関するパターンに応じた案内サービスを提供するものであったので、ユーザの意図によらずに、案内サービスを強制的に提供する可能性があった。 Furthermore, since the robot described in Patent Document 4 provides a guidance service according to a pattern related to the movement of a user such as a visitor, the guidance service is forcibly provided regardless of the user's intention. There was a possibility.
 本発明は、前述した問題点に鑑みてなされたものであり、その目的とすることは、ユーザとの対話の導入時において、人間同士のような自然な対話が可能な情報提示装置等を提供することである。
 また、本発明は、このような観点からなされたものであり、その目的とすることは、人物の分類を好適に特定し、特定した分類に応じた処理を実行することが可能な、情報処理装置等を提供することである。
 さらに、本発明は、ユーザの意図を考慮して声掛けを行うことができる案内ロボット制御システム等を提供することを目的とする。
The present invention has been made in view of the problems described above, and an object of the present invention is to provide an information presentation apparatus and the like capable of a natural, human-like dialogue when a dialogue with a user is introduced.
The present invention has also been made from such a viewpoint, and another object of the present invention is to provide an information processing apparatus and the like capable of suitably specifying a person's classification and executing processing according to the specified classification.
Furthermore, a further object of the present invention is to provide a guidance robot control system and the like that can call out to a user in consideration of the user's intention.
 前述した目的を達成するための第1の発明は、ユーザと対話を行う情報提示装置であって、色情報と言語情報とが対応付けられている色言語テーブルを記憶する記憶手段と、入力される画像を取得する画像入力手段と、前記画像から1以上の所定の領域を判定する領域判定手段と、前記領域から前記色情報を抽出する色情報抽出手段と、前記色情報抽出手段によって抽出される前記色情報に基づいて前記色言語テーブルを照合し、前記言語情報を取得する照合手段と、を備えることを特徴とする情報提示装置である。第1の発明によって、ユーザとの対話の導入時において、人間同士のような自然な対話が可能となる。 A first invention for achieving the above-described object is an information presentation apparatus for interacting with a user, and is input with storage means for storing a color language table in which color information and language information are associated with each other. Extracted by an image input means for acquiring an image, an area determination means for determining one or more predetermined areas from the image, a color information extraction means for extracting the color information from the area, and the color information extraction means. And a collating unit that collates the color language table based on the color information and obtains the language information. According to the first invention, when a dialogue with a user is introduced, a natural dialogue like humans is possible.
 第1の発明は、前記言語情報に基づいて発話内容を決定する発話内容決定手段、を更に備えるようにしても良い。これによって、色情報から得られるユーザの印象に基づく対話が可能となる。 The first invention may further include speech content determining means for determining speech content based on the language information. This enables a dialogue based on the user's impression obtained from the color information.
 また、第1の発明における前記色情報抽出手段は、前記領域それぞれに対して減色処理を行い、前記領域を代表する1又は複数の前記色情報を抽出するようにしても良い。これによって、色情報を的確に抽出することができる。 Further, the color information extracting means in the first invention may perform a color reduction process on each of the areas to extract one or a plurality of the color information representing the area. This makes it possible to accurately extract color information.
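As one possible illustration of such a color reduction step, the following Python sketch quantizes a region of the image to a few representative colors with k-means; the use of OpenCV's k-means and the choice of k are assumptions made for illustration, not the method defined by the invention.

```python
import cv2
import numpy as np

def dominant_colors(region_bgr, k=3):
    """Reduce a region (e.g. a hair, skin or clothing area) to k representative colors.

    Returns a list of (B, G, R) tuples sorted by how much of the region they cover."""
    pixels = region_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    counts = np.bincount(labels.flatten(), minlength=k)
    order = np.argsort(counts)[::-1]
    return [tuple(int(c) for c in centers[i]) for i in order]
```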
 また、第1の発明における前記言語情報は、印象を表現する印象語であるようにしても良い。これによって、ユーザの印象を表現する的確な言語を得ることができる。 Further, the language information in the first invention may be an impression word expressing an impression. Thus, an accurate language that expresses the user's impression can be obtained.
 また、第1の発明は、前記言語情報に関連する関連語を取得する関連語取得手段、を更に備えるようにしても良い。これによって、ユーザの印象を表現する言語を幅広く得ることができる。 Further, the first invention may further include related word acquisition means for acquiring related words related to the language information. This makes it possible to obtain a wide range of languages that express user impressions.
 また、第1の発明における前記記憶手段は、更に、前記領域及び前記言語情報と、前記発話内容とが対応付けられている対話ルールを記憶し、前記発話内容決定手段は、前記対話ルールに基づいて前記発話内容を決定するようにしても良い。これによって、ユーザの印象に基づく的確な対話を行うことができる。 Further, the storage means in the first invention further stores a dialogue rule in which the area and the language information are associated with the utterance content, and the utterance content determination means is based on the dialogue rule. The utterance content may be determined. As a result, an accurate dialogue based on the impression of the user can be performed.
 また、第1の発明における前記記憶手段は、更に、前記領域及び前記言語情報と、前記ユーザに推薦する推薦情報とが対応付けられている推薦テーブルを記憶し、前記発話内容決定手段は、更に、前記推薦テーブルに基づいて前記推薦情報を決定するようにしても良い。これによって、ユーザの印象に基づく的確な推薦情報を提示することができる。 Further, the storage means in the first invention further stores a recommendation table in which the area and the language information are associated with recommendation information recommended to the user, and the utterance content determination means further includes The recommendation information may be determined based on the recommendation table. Thus, accurate recommendation information based on the user's impression can be presented.
 第2の発明は、色情報と言語情報とが対応付けられている色言語テーブルを記憶するコンピュータが、ユーザと対話を行う情報提示方法であって、前記ユーザの画像を入力する画像入力ステップと、前記画像から所定の領域を判定する領域判定ステップと、前記領域から前記色情報を抽出する色情報抽出ステップと、前記色情報抽出ステップによって抽出される前記色情報に基づいて前記色言語テーブルを照合し、前記言語情報を取得する照合ステップと、を備えることを特徴とする情報提示方法である。これによって、第2の発明によって、ユーザとの対話の導入時において、人間同士のような自然な対話が可能となる。 A second invention is an information presentation method in which a computer storing a color language table in which color information and language information are associated with each other interacts with a user, and the image input step inputs the user's image. An area determination step for determining a predetermined area from the image; a color information extraction step for extracting the color information from the area; and the color language table based on the color information extracted by the color information extraction step. And a collation step of collating and obtaining the language information. Thus, according to the second aspect of the present invention, a natural dialogue like humans is possible when a dialogue with a user is introduced.
 第3の発明は、コンピュータを、請求項1に記載の情報提示装置として機能させることを特徴とするプログラムである。第3の発明のプログラムを汎用のコンピュータにインストールすることによって、第1の発明の情報提示装置を得ることができる。 The third invention is a program that causes a computer to function as the information presentation device according to claim 1. The information presentation apparatus of the first invention can be obtained by installing the program of the third invention on a general-purpose computer.
 第4の発明は、人物の分類ごとに特徴を記憶する記憶手段と、人物の特徴を抽出する抽出手段と、抽出した前記特徴に基づいて前記人物の分類を特定する特定手段と、特定した前記分類に応じた処理を行う処理手段と、を備えることを特徴とする情報処理装置である。
 第4の発明によって、人物の分類(嗜好、性別、年代、言語、国籍等の人物の属性)を好適に特定し、特定した分類に応じた処理を実行することが可能な、情報処理装置が提供される。
A fourth invention is an information processing apparatus comprising: storage means that stores features for each person classification; extraction means that extracts features of a person; specifying means that specifies the classification of the person based on the extracted features; and processing means that performs processing according to the specified classification.
According to the fourth invention, an information processing apparatus is provided that can suitably specify a person's classification (attributes of the person such as preference, gender, age, language, and nationality) and execute processing according to the specified classification.
 また第4の発明において、前記抽出手段は、複数の異なる種別のデータを用いて特徴を抽出することが望ましい。これにより、複数の異なる種別のデータを用いて人物の分類を好適に特定することができる。 In the fourth aspect of the invention, it is desirable that the extraction unit extracts features using a plurality of different types of data. Thereby, the classification of a person can be suitably specified using a plurality of different types of data.
 また第4の発明において、前記人物を含む映像データを取得する映像取得手段、を更に備え、前記抽出手段は、少なくとも、前記映像データから前記人物の特徴を抽出し、前記特定手段は、少なくとも、前記特徴に基づいて前記人物の分類を特定することが望ましい。これにより、少なくとも、映像データから抽出される人物の特徴に基づいて、人物の分類を好適に特定することができる。 According to a fourth aspect of the present invention, the image acquisition device further includes image acquisition means for acquiring image data including the person, the extraction means extracts at least the characteristics of the person from the image data, and the identification means includes at least: It is desirable to specify the classification of the person based on the feature. Accordingly, it is possible to suitably specify the classification of the person based on at least the characteristics of the person extracted from the video data.
 また、前記抽出手段は、前記人物の動作に関する特徴を抽出するように構成してもよい。これにより、映像データから抽出される人物の動作の特徴に基づいて、人物の分類を好適に特定することができる。 Further, the extraction means may be configured to extract features related to the movement of the person. Thereby, the classification of the person can be suitably specified based on the characteristics of the movement of the person extracted from the video data.
 また、人物に対して情報を提示する情報提示手段、を更に備え、前記抽出手段は、前記情報に対する、前記人物の反応に関する特徴を抽出するように構成してもよい。これにより、情報を提示した際の人物の反応の特徴に基づいて、人物の分類を好適に特定することができる。 Further, an information presenting means for presenting information to a person may be further provided, and the extracting means may be configured to extract features related to the person's reaction to the information. Thereby, based on the characteristic of the reaction of the person at the time of presenting information, the person classification can be suitably specified.
 また第4の発明において、前記人物の音声データを取得する音声取得手段、を更に備え、前記抽出手段は、少なくとも、前記音声データから前記人物の特徴を抽出し、前記特定手段は、少なくとも、前記特徴に基づいて前記人物の前記分類を特定することが望ましい。これにより、少なくとも、音声データから抽出される人物の特徴に基づいて、人物の分類を好適に特定することができる。 According to a fourth aspect of the present invention, the apparatus further comprises sound acquisition means for acquiring sound data of the person, wherein the extraction means extracts at least the characteristics of the person from the sound data, and the specifying means includes at least the It is desirable to identify the classification of the person based on characteristics. Accordingly, it is possible to preferably specify the classification of the person based on at least the characteristics of the person extracted from the audio data.
 また、前記抽出手段は、声の強さ、声の高さ、周波数特性、声質、発話テンポ、倍音特性、抑揚、反応音の特徴、発話言語、発話内容の少なくともいずれかを抽出するように構成してもよい。これにより、音声データから人物の特徴が好適に抽出される。 The extraction means is configured to extract at least one of voice strength, voice pitch, frequency characteristics, voice quality, utterance tempo, harmonic characteristics, intonation, response sound characteristics, utterance language, and utterance content. May be. Thereby, the characteristics of a person are preferably extracted from the audio data.
 また第4の発明において、前記人物の匂いデータを取得する匂い取得手段、を更に備え、前記抽出手段は、少なくとも、前記匂いデータから前記人物の特徴を抽出し、前記特定手段は、少なくとも、前記特徴に基づいて前記分類を特定することが望ましい。これにより、少なくとも、匂いデータから抽出される人物の特徴に基づいて、人物の分類を好適に特定することができる。 In the fourth invention, the apparatus further comprises odor acquisition means for acquiring odor data of the person, wherein the extraction means extracts at least the characteristics of the person from the odor data, and the specifying means includes at least the odor data. It is desirable to identify the classification based on features. Accordingly, it is possible to suitably specify the classification of the person based on at least the characteristics of the person extracted from the odor data.
 また第4の発明において、抽出した全ての特徴について、各特徴がどの分類に属するかを集計し、最も集計数が多い分類を人物の分類として特定することが望ましい。これにより、抽出された特徴に基づいて人物の分類を好適に特定することができる。 In the fourth aspect of the invention, it is desirable to totalize to which category each of the extracted features belongs, and to specify the category with the largest total number as the category of the person. Thereby, the classification of the person can be suitably specified based on the extracted feature.
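The tallying described here, counting for every extracted feature which classification it belongs to and selecting the classification with the largest count, could be sketched as follows; the feature names and the classification table are hypothetical placeholders.

```python
from collections import Counter

def classify_person(extracted_features, classification_table):
    """Majority vote over per-feature classifications.

    extracted_features:   e.g. ["speaks_japanese", "wears_suit", "low_voice"]
    classification_table: maps a feature to the classification it belongs to,
                          e.g. {"speaks_japanese": "japanese_adult", ...}
    """
    votes = Counter(classification_table[f]
                    for f in extracted_features if f in classification_table)
    if not votes:
        return None
    classification, _count = votes.most_common(1)[0]
    return classification
```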
 第5の発明は、ユーザに対する案内データを記憶する案内データ記憶部と、時系列に連続した複数のフレーム画像を取得する画像取得手段と、前記画像取得手段により取得した前記複数のフレーム画像を分析して、前記フレーム画像ごとにユーザが見ている方向を判別する方向判別手段と、前記方向判別手段により判別した方向に基づいて、時間経過に伴う前記方向の変化の大きさを表す方向変化量を算出する方向変化量算出手段と、前記方向変化量算出手段によって算出した方向の変化に基づいて、案内をするか否かを判断する案内判断手段と、前記案内判断手段により案内をすると判断された場合に、前記案内データ記憶部から案内データを抽出してロボットによる案内を行う案内手段と、を備える案内ロボット制御システムである。
 第6の発明は、第5の発明の案内ロボット制御システムにおいて、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザと、ロボットとの距離を算出する距離算出手段を備え、前記案内判断手段は、前記距離算出手段が算出した前記距離が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第7の発明は、第5の発明又は第6の発明の案内ロボット制御システムにおいて、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザの移動速度を算出する移動速度算出手段を備え、前記案内判断手段は、前記移動速度算出手段が算出した前記移動速度が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第8の発明は、第7の発明の案内ロボット制御システムにおいて、ロボットは、移動が可能であり、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザの移動方向を算出する移動方向算出手段と、前記移動方向算出手段により算出した前記ユーザの移動方向及び前記移動速度算出手段が算出した前記移動速度に基づいて、前記ロボットを移動させる移動制御手段と、備えること、を特徴とする案内ロボット制御システムである。
 第9の発明は、第5の発明から第7の発明までのいずれかの案内ロボット制御システムにおいて、前記方向判別手段は、前記ユーザの眼球と瞳孔との相対位置に基づき、前記ユーザが見ている方向を判別し、前記画像取得手段が取得したフレーム画像における前記ユーザの位置情報に基づいて、前記ユーザの移動方向を算出する移動方向算出手段と、前記画像取得手段により取得した前記複数のフレーム画像を分析して、前記ユーザの顔データを取得する顔取得手段と、前記フレーム画像ごとに前記ユーザの顔の向きを判別する顔方向判別手段と、を備え、前記案内判断手段は、前記移動方向算出手段が算出した前記ユーザの移動方向と、前記顔方向判別手段で判別した複数のフレーム画像に共通の顔の向きとが異なる方向である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第10の発明は、第9の発明の案内ロボット制御システムにおいて、前記方向判別手段は、前記顔取得手段により取得した顔データに基づいて、顔の器官を検出し、それらの位置関係から顔の向きを判定し、前記ユーザの見ている方向を取得すること、を特徴とする案内ロボット制御システムである。
 第11の発明は、第5の発明から第10の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した少なくとも1つ以上の前記フレーム画像を分析して、前記ユーザの表情を判別する表情判別手段を備え、前記案内データ記憶部は、表情に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記表情判別手段により判別された前記ユーザの表情に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第12の発明は、第11の発明の案内ロボット制御システムにおいて、案内データによる案内をした後に、少なくとも前記画像取得手段による処理と、前記表情判別手段による処理とを実行し、案内前の前記ユーザの表情と、案内後の前記ユーザの表情とを比較する表情比較手段を備え、前記案内手段は、前記案内データ記憶部を参照して、前記表情比較手段による比較結果に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第13の発明は、第5の発明から第12の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した少なくとも1つ以上の前記フレーム画像を分析して、前記ユーザの属性を判別する属性判別手段を備え、前記案内データ記憶部は、ユーザの属性に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記属性判別手段により判別した前記ユーザの属性に対応する案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
According to a fifth aspect of the present invention, there is provided a guide data storage unit that stores guide data for a user, an image acquisition unit that acquires a plurality of frame images continuous in time series, and an analysis of the plurality of frame images acquired by the image acquisition unit. And a direction determining unit that determines a direction that the user is viewing for each frame image, and a direction change amount that represents a magnitude of a change in the direction over time based on the direction determined by the direction determining unit. Based on the direction change calculated by the direction change amount calculating means, guidance determining means for determining whether or not to guide based on the direction change calculated by the direction change amount calculating means, and the guidance determining means A guidance robot control system comprising: guidance means for extracting guidance data from the guidance data storage unit and performing guidance by a robot.
According to a sixth aspect of the present invention, in the guidance robot control system according to the fifth aspect of the present invention, distance calculation means for calculating a distance between the user and the robot based on the position information of the user in the frame image acquired by the image acquisition means. The guidance determination unit is configured to determine that guidance is performed when the distance calculated by the distance calculation unit is within a specified range.
7th invention is the guidance robot control system of 5th invention or 6th invention, The movement which calculates the moving speed of the said user based on the said user's positional information in the frame image which the said image acquisition means acquired The guidance robot control system includes a speed calculation unit, and the guidance determination unit determines to perform guidance when the movement speed calculated by the movement speed calculation unit is within a predetermined range.
According to an eighth aspect of the present invention, in the guided robot control system of the seventh aspect, the robot can move, and based on the position information of the user in the frame image acquired by the image acquisition means, the moving direction of the user And a movement control means for moving the robot based on the movement direction of the user calculated by the movement direction calculation means and the movement speed calculated by the movement speed calculation means. The guide robot control system characterized by the above.
A ninth aspect of the present invention is the guidance robot control system according to any one of the fifth to seventh aspects, wherein the direction discriminating means is configured to be viewed by the user based on a relative position between the user's eyeball and pupil. A plurality of frames acquired by the image acquisition means, and a movement direction calculation means for calculating a movement direction of the user based on position information of the user in the frame image acquired by the image acquisition means. Analyzing the image and acquiring face data of the user; and face direction determining means for determining the orientation of the user's face for each of the frame images, wherein the guidance determining means includes the movement When the moving direction of the user calculated by the direction calculating unit is different from the direction of the face common to the plurality of frame images determined by the face direction determining unit, They are determined that the to a guidance robot control system according to claim.
According to a tenth aspect of the present invention, in the guidance robot control system according to the ninth aspect of the invention, the direction determining means detects a facial organ based on the face data acquired by the face acquiring means, A guidance robot control system characterized by determining a direction and acquiring a direction that the user is looking at.
An eleventh aspect of the present invention is the guide robot control system according to any one of the fifth aspect to the tenth aspect, wherein at least one of the frame images acquired by the image acquisition unit is analyzed, and the facial expression of the user is analyzed. The guidance data storage unit stores guidance data corresponding to a facial expression, and the guidance unit refers to the guidance data storage unit and is determined by the facial expression determination unit. A guidance robot control system characterized by extracting guidance data based on a user's facial expression and performing guidance.
According to a twelfth aspect of the present invention, in the guidance robot control system of the eleventh aspect of the invention, after guiding by guidance data, at least the processing by the image acquisition means and the processing by the facial expression discrimination means are executed, and the user before guidance Facial expression comparing means for comparing the facial expression of the user and the facial expression of the user after guidance, the guidance means refers to the guidance data storage unit and extracts guidance data based on the comparison result by the facial expression comparison means This is a guidance robot control system characterized by performing guidance.
In a thirteenth aspect of the present invention, in any one of the guided robot control systems according to the fifth to twelfth aspects of the present invention, the attribute of the user is analyzed by analyzing at least one of the frame images acquired by the image acquisition unit. Attribute determining means for determining the guide data, the guide data storage unit stores guide data corresponding to a user attribute, and the guide means is determined by the attribute determination unit with reference to the guide data storage unit A guidance robot control system characterized in that guidance is performed by extracting guidance data corresponding to the attribute of the user.
 第14の発明は、ユーザに対する案内データを記憶する案内データ記憶部と、時系列に連続した複数のフレーム画像を取得する画像取得手段と、前記画像取得手段により取得した前記フレーム画像を分析してユーザを特定し、見ている方向を取得するユーザ方向取得手段と、前記画像取得手段により取得した前記フレーム画像を分析して、前記ユーザの移動方向を算出する移動方向算出手段と、前記ユーザ方向取得手段により取得したユーザの見ている向きと、前記移動方向算出手段が算出したユーザの移動方向とに基づいて、案内をするか否かを判断する案内判断手段と、前記案内判断手段により案内をすると判断された場合に、前記案内データ記憶部から案内データを抽出してロボットによる案内を行う案内手段と、を備える案内ロボット制御システムである。
 第15の発明は、第14の発明の案内ロボット制御システムにおいて、前記案内データ記憶部は、移動方向に対応する案内データを記憶していること、を特徴とする案内ロボット制御システムである。
 第16の発明は、第14の発明又は第15の発明の案内ロボット制御システムにおいて、前記ユーザと、ロボットとの距離を算出する距離算出手段を備え、前記案内判断手段は、前記距離算出手段が算出した前記距離が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第17の発明は、第14の発明から第16の発明までのいずれかの案内ロボット制御システムにおいて、前記ユーザの移動速度を算出する移動速度算出手段を備え、前記案内判断手段は、前記移動速度算出手段が算出した前記移動速度が規定範囲内である場合に、案内をすると判断すること、を特徴とする案内ロボット制御システムである。
 第18の発明は、第17の発明の案内ロボット制御システムにおいて、ロボットは、移動が可能であり、前記移動方向算出手段により算出した前記ユーザの移動方向及び前記移動速度算出手段が算出した前記移動速度に基づいて、前記ロボットを移動させる移動制御手段を備えること、を特徴とする案内ロボット制御システムである。
 第19の発明は、第14の発明から第18の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した前記フレーム画像を分析して、前記ユーザの顔データを取得する顔取得手段を備え、前記ユーザ方向取得手段は、前記顔取得手段が取得した前記ユーザの顔データに基づいて前記ユーザの見ている方向を取得すること、を特徴とする案内ロボット制御システムである。
 第20の発明は、第19の発明の案内ロボット制御システムにおいて、前記ユーザ方向取得手段は、前記顔取得手段により取得した顔データに基づいて、顔の器官を検出し、それらの位置関係から顔の向きを判定し前記ユーザの見ている方向を取得するか、又は、前記ユーザの眼球と瞳孔との相対位置に基づき前記ユーザが見ている方向を取得すること、を特徴とする案内ロボット制御システムである。
 第21の発明は、第19の発明又は第20の発明の案内ロボット制御システムにおいて、前記案内手段により前記案内データが出力された場合に、前記顔取得手段により取得した前記ユーザの顔データを記憶する顔データ記憶部と、前記顔取得手段により取得した前記ユーザの顔データと、前記顔データ記憶部に記憶された顔データとを照合する顔照合手段と、を備え、前記案内手段は、前記顔照合手段により前記顔データ記憶部に記憶されている顔データと照合できた場合には、前回とは異なる案内データを前記案内データ記憶部から抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第22の発明は、第19の発明から第21の発明までのいずれかの案内ロボット制御システムにおいて、前記顔取得手段により取得した前記ユーザの顔データに基づいて、前記ユーザの表情を判別する表情判別手段を備え、前記案内データ記憶部は、表情に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記表情判別手段により判別された前記ユーザの表情に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第23の発明は、第22の発明の案内ロボット制御システムにおいて、案内データによる案内をした後に、少なくとも前記画像取得手段による処理と、前記顔取得手段による処理と、前記表情判別手段による処理とを実行し、案内前の前記ユーザの表情と、案内後の前記ユーザの表情とを比較する表情比較手段を備え、前記案内手段は、前記案内データ記憶部を参照して、前記表情比較手段による比較結果に基づく案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
 第24の発明は、第14の発明から第23の発明までのいずれかの案内ロボット制御システムにおいて、前記画像取得手段により取得した少なくとも1つ以上の前記フレーム画像を分析して、前記ユーザの属性を判別する属性判別手段を備え、前記案内データ記憶部は、ユーザの属性に対応する案内データを記憶し、前記案内手段は、前記案内データ記憶部を参照して、前記属性判別手段により判別した前記ユーザの属性に対応する案内データを抽出して案内を行うこと、を特徴とする案内ロボット制御システムである。
In a fourteenth aspect of the present invention, a guide data storage unit that stores guide data for a user, an image acquisition unit that acquires a plurality of time-series frame images, and the frame image acquired by the image acquisition unit are analyzed. User direction acquisition means for specifying a user and acquiring a viewing direction; movement direction calculation means for analyzing the frame image acquired by the image acquisition means to calculate the movement direction of the user; and the user direction Guidance determination means for determining whether or not to guide based on the user's viewing direction acquired by the acquisition means and the user's movement direction calculated by the movement direction calculation means; and guidance by the guidance determination means A guidance robot that extracts guidance data from the guidance data storage unit and performs guidance by a robot when it is determined to perform guidance. It is a control system.
A fifteenth aspect of the present invention is the guide robot control system according to the fourteenth aspect of the present invention, wherein the guide data storage unit stores guide data corresponding to a moving direction.
A sixteenth aspect of the invention is the guidance robot control system according to the fourteenth aspect of the invention or the fifteenth aspect of the invention, further comprising distance calculation means for calculating a distance between the user and the robot, and the guidance determination means includes the distance calculation means. A guidance robot control system, characterized in that, when the calculated distance is within a specified range, it is determined that guidance is to be provided.
A seventeenth aspect of the invention is the guidance robot control system according to any one of the fourteenth aspect to the sixteenth aspect of the invention, further comprising movement speed calculation means for calculating the movement speed of the user, wherein the guidance determination means A guidance robot control system characterized in that guidance is determined when the movement speed calculated by the calculation means is within a specified range.
According to an eighteenth aspect of the present invention, in the guided robot control system of the seventeenth aspect, the robot can move, and the movement direction calculated by the movement direction calculation means and the movement speed calculated by the movement speed calculation means. A guidance robot control system comprising a movement control means for moving the robot based on speed.
A nineteenth aspect of the present invention is the guide robot control system according to any one of the fourteenth aspect to the eighteenth aspect, wherein the face image obtained by analyzing the frame image obtained by the image obtaining means is obtained. The guidance robot control system includes an acquisition unit, and the user direction acquisition unit acquires a direction that the user is viewing based on the user face data acquired by the face acquisition unit.
According to a twentieth aspect, in the guidance robot control system according to the nineteenth aspect, the user direction acquisition means detects a facial organ based on the face data acquired by the face acquisition means, and determines the face from the positional relationship. The direction of the user is determined and the direction the user is looking at is acquired, or the direction the user is looking at is acquired based on the relative position between the user's eyeball and pupil. System.
According to a twenty-first aspect, in the guidance robot control system of the nineteenth aspect or the twentieth aspect, the face data of the user acquired by the face acquisition means is stored when the guidance data is output by the guidance means. A face data storage unit, a face collation unit that collates the face data of the user acquired by the face acquisition unit and the face data stored in the face data storage unit, and the guide unit includes Guidance characterized by extracting guidance data different from the previous one from the guidance data storage unit and providing guidance when face matching means can collate with the face data stored in the face data storage unit. It is a robot control system.
A twenty-second aspect of the present invention is the guided robot control system according to any one of the nineteenth to twenty-first aspects, wherein the facial expression for determining the user's facial expression is based on the user's facial data acquired by the facial acquisition means. Determining means, wherein the guide data storage unit stores guide data corresponding to a facial expression, and the guide unit refers to the guide data storage unit to the facial expression of the user determined by the facial expression determination unit. A guidance robot control system characterized by extracting guidance data based on the guidance and performing guidance.
According to a twenty-third aspect, in the guidance robot control system according to the twenty-second aspect, after guiding by guidance data, at least the processing by the image acquisition unit, the processing by the face acquisition unit, and the processing by the facial expression determination unit A facial expression comparison unit that executes and compares the facial expression of the user before guidance and the facial expression of the user after guidance, and the guidance unit compares the facial expression with the facial expression comparison unit with reference to the guidance data storage unit A guidance robot control system characterized by extracting guidance data based on a result and performing guidance.
According to a twenty-fourth aspect, in the guided robot control system according to any one of the fourteenth aspect to the twenty-third aspect, the at least one frame image obtained by the image obtaining unit is analyzed, and the attribute of the user is analyzed. Attribute determining means for determining the guide data, the guide data storage unit stores guide data corresponding to a user attribute, and the guide means is determined by the attribute determination unit with reference to the guide data storage unit A guidance robot control system characterized in that guidance is performed by extracting guidance data corresponding to the attribute of the user.
 本発明により、ユーザとの対話の導入時において、人間同士のような自然な対話が可能な情報提示装置等を提供することができる。
 また、本発明により、人物の分類を好適に特定し、特定した分類に応じた処理を実行することが可能な、情報処理装置等が提供される。
 さらに、本発明によれば、ユーザの意図を考慮して声掛けを行うことができる案内ロボット制御システムを提供することができる。
According to the present invention, it is possible to provide an information presenting apparatus and the like capable of a natural dialogue between humans when a dialogue with a user is introduced.
In addition, according to the present invention, there is provided an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the identified classification.
Furthermore, according to the present invention, it is possible to provide a guidance robot control system capable of calling out in consideration of the user's intention.
第1実施形態の情報提示装置の一実施形態の構成を示すブロック図The block diagram which shows the structure of one Embodiment of the information presentation apparatus of 1st Embodiment. 第1実施形態の情報提示装置の他の実施形態の構成を示すブロック図The block diagram which shows the structure of other embodiment of the information presentation apparatus of 1st Embodiment. 第1実施形態の領域判定部12の構成を示すブロック図The block diagram which shows the structure of the area | region determination part 12 of 1st Embodiment. 第1実施形態の顔検出部12aによる顔検出処理の説明図Explanatory drawing of the face detection process by the face detection part 12a of 1st Embodiment. 第1実施形態の顔器官検出部12bによる顔器官検出処理の説明図Explanatory drawing of the face organ detection process by the face organ detection part 12b of 1st Embodiment. 第1実施形態の服領域特定部12cによる服領域特定処理の説明図Explanatory drawing of the clothing area | region identification process by the clothing area | region identification part 12c of 1st Embodiment. 第1実施形態の色情報抽出部13による色情報抽出処理の説明図Explanatory drawing of the color information extraction process by the color information extraction part 13 of 1st Embodiment. 第1実施形態の色言語テーブル14を形成するためのカラーイメージスケールの具体例を示す図The figure which shows the specific example of the color image scale for forming the color language table 14 of 1st Embodiment. 第1実施形態の3色の組合せ色と言語を対応付けたカラーイメージスケールの例を示す図The figure which shows the example of the color image scale which matched the three combination colors and language of 1st Embodiment 第1実施形態の3色の組合せ色と言語を対応付けたカラーイメージスケールの他の例を示す図The figure which shows the other example of the color image scale which matched the three combination colors and language of 1st Embodiment. 第1実施形態の色言語テーブル14の具体例を示す図The figure which shows the specific example of the color language table 14 of 1st Embodiment. 第1実施形態の単色テーブル100の具体例を示す図The figure which shows the specific example of the monochrome table 100 of 1st Embodiment. 第1実施形態の対話ルール110の具体例を示す図The figure which shows the specific example of the dialogue rule 110 of 1st Embodiment. 第1実施形態の推薦テーブル120の具体例を示す図The figure which shows the specific example of the recommendation table 120 of 1st Embodiment. 第1実施形態の情報提示装置10の動作を示すフローチャートThe flowchart which shows operation | movement of the information presentation apparatus 10 of 1st Embodiment. 第2実施形態の情報処理システム2001のシステム構成の一例を示す図The figure which shows an example of the system configuration | structure of the information processing system 2001 of 2nd Embodiment. 第2実施形態のロボット2002の内部構成図The internal block diagram of the robot 2002 of 2nd Embodiment 第2実施形態のコンピュータ2003の内部構成図The internal block diagram of the computer 2003 of 2nd Embodiment 第2実施形態のロボット2002、情報処理システム2001の機能構成図Functional configuration diagram of the robot 2002 and the information processing system 2001 of the second embodiment 第2実施形態の第1の特徴2030Aを例示する図The figure which illustrates the 1st characteristic 2030A of a 2nd embodiment 第2実施形態の第2の特徴2030Bを例示する図The figure which illustrates the 2nd characteristic 2030B of a 2nd embodiment. 第2実施形態の第3の特徴2030Cを例示する図The figure which illustrates the 3rd characteristic 2030C of 2nd Embodiment 第2実施形態の抽出した人物の特徴(人物特徴2040)を示す図The figure which shows the characteristic (person characteristic 2040) of the extracted person of 2nd Embodiment 第2実施形態の分類テーブル2050を示す図The figure which shows the classification table 2050 of 2nd Embodiment. 第2実施形態の処理データベース2060を示す図The figure which shows the process database 2060 of 2nd Embodiment. 第2実施形態のロボット2002の動作を示すフローチャートA flowchart showing an operation of the robot 2002 according to the second embodiment. 
第3実施形態に係る案内ロボット制御システムの運用に適する場面例を示す図である。It is a figure which shows the example of a scene suitable for operation | use of the guidance robot control system which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットの機能ブロック図である。It is a functional block diagram of the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットの案内データ記憶部の例を示す図である。It is a figure which shows the example of the guidance data storage part of the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでのロボット制御処理を示すフローチャートである。It is a flowchart which shows the robot control process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでの画像分析処理を示すフローチャートである。It is a flowchart which shows the image analysis process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでのユーザ決定処理を示すフローチャートである。It is a flowchart which shows the user determination process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係るコミュニケーションロボットでの案内後処理を示すフローチャートである。It is a flowchart which shows the guidance post-process in the communication robot which concerns on 3rd Embodiment. 第3実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 3rd Embodiment. 第3実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 3rd Embodiment. 第3実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 3rd Embodiment. 第4実施形態に係る案内ロボット制御システムの運用に適する場面例を示す図である。It is a figure which shows the example of a scene suitable for operation | use of the guidance robot control system which concerns on 4th Embodiment. 第4実施形態に係る案内ロボット制御システムの機能ブロック図である。It is a functional block diagram of the guidance robot control system which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットの案内データ記憶部の例を示す図である。It is a figure which shows the example of the guidance data storage part of the communication robot which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットでのロボット制御処理を示すフローチャートである。It is a flowchart which shows the robot control process in the communication robot which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットでの画像分析処理を示すフローチャートである。It is a flowchart which shows the image analysis process in the communication robot which concerns on 4th Embodiment. 第4実施形態に係るコミュニケーションロボットでのユーザ決定処理を示すフローチャートである。It is a flowchart which shows the user determination process in the communication robot which concerns on 4th Embodiment. 第4実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 4th Embodiment. 第4実施形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on 4th Embodiment. 変形形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on a deformation | transformation form. 変形形態に係る案内ロボット制御システムの具体例を示す図である。It is a figure which shows the specific example of the guidance robot control system which concerns on a deformation | transformation form.
(第1実施形態)
 以下図面に基づいて、本発明の実施形態を詳細に説明する。図1は、本発明の情報提示装置の一実施形態の構成を示すブロック図である。情報提示装置10は、人間であるユーザと対話を行う装置であって、ロボットやコンピュータ等である。
(First embodiment)
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of the information presentation apparatus of the present invention. The information presentation device 10 is a device that interacts with a human user, and is a robot, a computer, or the like.
 情報提示装置10は、ユーザの顔を含む画像を入力する画像入力部11と、画像から顔器官の各領域を判定する領域判定部12と、領域から色情報を抽出する色情報抽出部13と、色情報と言語情報とが対応付けられている色言語テーブル14を記憶する記憶部15と、色情報抽出部13によって抽出される色情報に基づいて色言語テーブル14を照合し、言語情報を取得する照合部16と、を備える。 The information presentation apparatus 10 includes: an image input unit 11 to which an image including the user's face is input; a region determination unit 12 that determines each region of the facial organs from the image; a color information extraction unit 13 that extracts color information from the regions; a storage unit 15 that stores a color language table 14 in which color information and language information are associated with each other; and a matching unit 16 that obtains the language information by checking the color information extracted by the color information extraction unit 13 against the color language table 14.
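As a rough sketch of how the units 11 to 16 could fit together, the following Python example looks up an impression word for each region's representative color; the table entries and the nearest-color matching in RGB space are illustrative assumptions rather than the matching method defined in the specification.

```python
import numpy as np

# Hypothetical color-language table: representative RGB color -> impression word.
COLOR_LANGUAGE_TABLE = {
    (230, 200, 180): "soft",
    (40, 40, 120): "calm",
    (200, 30, 30): "vivid",
}

def lookup_language(color_rgb):
    """Matching unit 16: return the impression word of the nearest table color."""
    keys = list(COLOR_LANGUAGE_TABLE.keys())
    dists = [np.linalg.norm(np.array(k, float) - np.array(color_rgb, float)) for k in keys]
    return COLOR_LANGUAGE_TABLE[keys[int(np.argmin(dists))]]

def impressions_for_regions(region_colors):
    """Map each region's representative color (from extraction unit 13) to a word,
    which the utterance content determination unit 17 could then use."""
    return {region: lookup_language(color) for region, color in region_colors.items()}

# Example: representative colors already extracted for two regions.
print(impressions_for_regions({"clothes": (210, 35, 40), "hair": (45, 42, 118)}))
```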
The information presentation device 10 may further include an utterance content determination unit 17 that determines utterance content on the basis of the language information, and an output unit 18 that outputs the utterance content determined by the utterance content determination unit 17. The information presentation device 10 may also include a microphone for inputting the user's voice.
The image input unit 11 includes a camera, a scanner, a storage media drive, a communication control device, and the like, and inputs an image by photographing the target user, reading a printed matter on which the target user is printed, reading a storage medium in which an image of the target user is stored, or receiving an image of the target user transmitted from another computer. The input image is temporarily stored in a RAM (Random Access Memory) of the information presentation device 10.
The area determination unit 12, the color information extraction unit 13, the collation unit 16, and the utterance content determination unit 17 are configured by a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU loads programs stored in the storage unit 15, the ROM, a recording medium, or the like into a work memory area on the RAM, executes them, drives and controls the devices connected via the bus, and thereby realizes the processing of the information presentation device 10 described later. The ROM is a non-volatile memory that permanently holds programs such as the boot program and BIOS of the information presentation device 10, data, and the like. The RAM is a volatile memory that temporarily holds programs and data loaded from the storage unit, the ROM, a recording medium, and the like, and provides a work area used by the CPU for various kinds of processing.
The storage unit 15 is an HDD (Hard Disk Drive), a flash memory, or the like, and stores the programs executed by the CPU, the data necessary for program execution, an OS (Operating System), and the like. The stored programs include a control program corresponding to the OS and application programs for causing the CPU to execute the processing described later. Each program code is read out by the CPU as necessary, transferred to the RAM, and executed as the various means.
The output unit 18 includes a speaker, a display device, a printer, a communication control device, and the like, and outputs voice indicating the utterance content, displays characters indicating the utterance content, prints characters indicating the utterance content, or transmits data indicating the utterance content to another computer.
FIG. 2 is a block diagram showing the configuration of another embodiment of the information presentation device of the present invention. The information presentation device 20 shown in FIG. 2 includes a terminal device 30 used by the user and a server 40 connected to the terminal device 30 via a network.
The terminal device 30 is a mobile terminal such as a smartphone or a tablet, a PC (Personal Computer), or the like, and includes an image input unit 31 that inputs an image including the user's face and a transmission unit 39a that transmits data such as the image to the server 40.
The terminal device 30 may further include a receiving unit 39b that receives data such as the utterance content from the server 40 and an output unit 32 that outputs the utterance content. The terminal device 30 may also include a microphone for inputting the user's voice.
The server 40 includes a receiving unit 49b that receives data from the terminal device 30, an area determination unit 42 that determines each area of the facial organs from the image, a color information extraction unit 43 that extracts color information from the areas, a storage unit 45 that stores a color language table 44 in which color information and language information are associated with each other, and a collation unit 46 that collates the color language table 44 on the basis of the color information extracted by the color information extraction unit 43 and acquires language information.
The server 40 may further include an utterance content determination unit 47 that determines utterance content on the basis of the language information, and a transmission unit 49a that transmits the utterance content determined by the utterance content determination unit 47 to the terminal device 30.
The image input unit 31, the output unit 32, the area determination unit 42, the color information extraction unit 43, the color language table 44, the storage unit 45, the collation unit 46, and the utterance content determination unit 47 have the same configurations and the same functions as the image input unit 11, the output unit 18, the area determination unit 12, the color information extraction unit 13, the color language table 14, the storage unit 15, the collation unit 16, and the utterance content determination unit 17 shown in FIG. 1, respectively.
The transmission unit 39a transmits data to the server 40 via a communication line 21a. The receiving unit 39b receives data from the server 40 via a communication line 21b. The transmission unit 49a transmits data to the terminal device 30 via the communication line 21b. The receiving unit 49b receives data from the terminal device 30 via the communication line 21a.
The configuration and operation of the information presentation device 10 shown in FIG. 1 are described in detail below. Since the information presentation device 20 shown in FIG. 2 has the same configuration and operates in the same manner except for the transmission and reception of data, its description is omitted.
FIG. 3 is a block diagram showing the configuration of the area determination unit 12. As shown in FIG. 3, the area determination unit 12 includes a face detection unit 12a that detects the user's face from the image, a facial organ detection unit 12b that detects facial organs (eyes, mouth, nose, chin, and the like) from the face area detected by the face detection unit 12a, and a clothing area specification unit 12c that specifies the area of the user's clothes from the image.
FIG. 4 is an explanatory diagram of the face detection processing by the face detection unit 12a. In the figure, 50 denotes an image and 51 denotes a face area. The face detection unit 12a detects a rectangular area as the face area 51 from the image 50 by, for example, the Viola-Jones face detection algorithm. In the Viola-Jones method, a search window (a determination area of, for example, 8 x 8 pixels) is scanned over the target image from the upper left and moved in order. For each position of the search window, the face detection unit 12a determines whether the window contains a face image by using a classifier constructed in advance by machine learning (AdaBoost) using Haar-like features. The face detection algorithm is not limited to the Viola-Jones method, and any ordinary technique in this field can be used.
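By way of illustration only, and not as part of the disclosed embodiment, the face detection step can be sketched with OpenCV's Haar-cascade detector, which realizes the Viola-Jones approach (a cascade of AdaBoost-trained classifiers over Haar-like features). The file name and detector parameters below are assumptions chosen for the example.

# Illustrative sketch of Viola-Jones-style face detection using OpenCV's
# bundled Haar cascade. Image path and parameters are assumptions.
import cv2

def detect_face_regions(image_path: str):
    image = cv2.imread(image_path)                      # image 50
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # The detector slides a search window over the image at multiple scales
    # and applies the cascaded classifier to each window position.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Each entry is a rectangular face area 51 given as (x, y, width, height).
    return [tuple(face) for face in faces]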
FIG. 5 is an explanatory diagram of the facial organ detection processing by the facial organ detection unit 12b. In the figure, 51 denotes the face area, 52 denotes an eye area, and 53 denotes a mouth area. The facial organ detection unit 12b specifies the positions (x, y) of the feature points of each facial organ such as the eyebrows, eyes, nose, mouth, and chin. Any ordinary technique in this field can be used as the algorithm for specifying the positions of the feature points of each facial organ. The facial organ detection unit 12b then specifies the area of each facial organ on the basis of the positions (x, y) of the feature points. For example, the facial organ detection unit 12b specifies a rectangular area containing all the eye feature points as the eye area 52. Similarly, the facial organ detection unit 12b specifies a rectangular area containing all the mouth feature points as the mouth area 53. The facial organ detection unit 12b may specify not only the eye area 52 and the mouth area 53 but also a nose area, a hair area, a cheek area, and the like.
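As a hedged illustration of how organ areas can be derived from feature points, the sketch below takes the bounding rectangle of the landmarks belonging to each organ. It assumes a landmark detector that returns points in the common 68-point convention; any detector returning per-organ (x, y) points would serve equally well.

# Sketch: derive rectangular organ areas from facial landmark points.
# The 68-point index ranges are an assumption about the landmark detector used.
def bounding_box(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

def organ_regions(landmarks):
    # landmarks: list of 68 (x, y) tuples for one detected face
    eye_region = bounding_box(landmarks[36:48])    # both eyes -> eye area 52
    mouth_region = bounding_box(landmarks[48:68])  # mouth -> mouth area 53
    return {"eye": eye_region, "mouth": mouth_region}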
FIG. 6 is an explanatory diagram of the clothing area specification processing by the clothing area specification unit 12c. In the figure, 50 denotes the image, 51 denotes the face area, and 54 denotes a clothing area. The clothing area specification unit 12c specifies the clothing area 54 on the basis of the size and position (x, y) of the rectangular face area 51 detected by the face detection unit 12a. For example, the clothing area specification unit 12c sets the lower end of the rectangular face area 51 as the upper end of the clothing area 54 and the lower end of the image 50 as the lower end of the clothing area 54. Also, for example, the clothing area specification unit 12c sets a value obtained by multiplying the width of the rectangular face area 51 by a predetermined magnification as the width of the clothing area 54, and determines the left end and the right end of the clothing area 54 with reference to the center position of the face area 51. Alternatively, the clothing area specification unit 12c sets the boundaries between the background and the clothes in the horizontal direction as the left end and the right end of the clothing area 54. The algorithm for specifying the clothing area 54 is not limited to these examples, and any ordinary technique in this field can be used. In addition, the clothing area specification unit 12c may specify not only outerwear but also a hat, glasses, trousers, a bag, and the like.
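The geometric rule described above can be sketched as follows; the width magnification used here is an assumed parameter, since the embodiment only states that a predetermined magnification is applied to the face-rectangle width.

# Sketch: estimate the clothing area 54 from the detected face area 51.
# width_scale is an assumed magnification applied to the face-rectangle width.
def clothing_region(face_rect, image_height, image_width, width_scale=2.5):
    fx, fy, fw, fh = face_rect                 # face area 51 as (x, y, w, h)
    top = fy + fh                              # top of clothes = bottom of face
    bottom = image_height                      # bottom of clothes = bottom of image
    center_x = fx + fw / 2.0                   # reference: face-centre x position
    half_width = (fw * width_scale) / 2.0
    left = max(0, int(center_x - half_width))
    right = min(image_width, int(center_x + half_width))
    return (left, top, right - left, bottom - top)   # clothing area 54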
FIG. 7 is an explanatory diagram of the color information extraction processing by the color information extraction unit 13. In the figure, 52 denotes the eye area, 53 denotes the mouth area, 54 denotes the clothing area, 55, 56, and 57 denote color information, and 55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, and 57c denote the colors (single colors) constituting the color information. The color information extraction unit 13 performs color reduction processing on each of the eye area 52, the mouth area 53, and the clothing area 54, and extracts one or more pieces of color information 55 to 57 representing each area. In the example shown in FIG. 7, the color information extraction unit 13 extracts three colors as each of the color information 55 to 57. The color information extraction unit 13 may specify the one color (single color) or plurality of colors (single colors) by extracting one color or a plurality of colors from each color-reduced area (for example, extracting one color or three colors from each area that has been reduced to five colors).
The color reduction processing by the color information extraction unit 13 is performed by, for example, the k-means method. Specifically, to reduce an area to k colors (to extract k colors representing the area), k representative colors are first selected at random from the area; the closest representative color is chosen for each pixel so that all the pixels are divided into k groups; the average color of the pixels in each group is then taken as a new representative color and all the pixels are divided into k groups again; and this division into groups based on the representative colors and replacement of each group's average color with the representative color is repeated until the area is reduced to k colors.
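The k-means color reduction can be sketched as below. Using scikit-learn's KMeans is an implementation choice made only for this illustration, not something specified by the embodiment; ordering the representative colors by pixel count is likewise an assumption.

# Sketch: reduce an image region to k representative colours with k-means.
# The use of scikit-learn is an assumption made for brevity.
import numpy as np
from sklearn.cluster import KMeans

def extract_color_information(region_pixels: np.ndarray, k: int = 3):
    # region_pixels: array of shape (height, width, 3), e.g. an eye/mouth/clothes crop
    samples = region_pixels.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(samples)
    # The cluster centres are the representative colours; order them by how many
    # pixels each one represents so the dominant colour comes first.
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]
    return [tuple(km.cluster_centers_[i]) for i in order]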
FIG. 8 is a diagram showing a specific example of a color image scale for forming the color language table 14. In the figure, 60 denotes the color image scale, and 61 to 63 denote single colors arranged on the color image scale 60.
The color image scale 60 arranges single colors (individual colors) and the languages corresponding to their images on coordinate axes of warm (WARM)-cool (COOL) and soft (SOFT)-hard (HARD), which are the criteria for judging an image. For example, in the upper left (second quadrant) of the color image scale 60, single colors 61, 62, 63, and others are arranged; the language corresponding to the single color 61 is "sweet", the language corresponding to the single color 62 is "pretty", and the language corresponding to the single color 63 is "festive". In the color image scale 60, the languages associated with single colors include many adjectives such as "sweet" and "cute", because adjectives are the most suitable kind of language for expressing the impression given by a color.
The single colors arranged on the color image scale 60 are desirably colors in which the lightness and/or the saturation is varied for each hue. That is, it is desirable to divide the hues into, for example, red, yellow-red, yellow, yellow-green, green, blue-green, blue, purple-blue, purple, red-purple, and achromatic colors, and to form the single colors of the color image scale 60 from colors in which the lightness and/or the saturation of each hue is varied. This is because the impression given by a color differs depending on its hue, lightness, and saturation. The single colors and languages of the color image scale 60 shown in FIG. 8 are only representative ones; in practice, single colors and languages are arranged more finely.
FIGS. 9 and 10 are diagrams showing examples of color image scales in which three-color combination colors and languages are associated with each other. In the figures, 80 and 90 denote color image scales, and 81 and 91 denote three-color combination colors (color schemes) arranged on the color image scales 80 and 90.
The color image scale 80 shown in FIG. 9 arranges three-color combination colors (color schemes) and the languages corresponding to their images on coordinate axes of popular-uplifting and orthodox-individualistic. For example, in the upper left (second quadrant) of the color image scale 80, the combination color 81 and others are arranged; the language corresponding to the combination color 81 is "calm", and the combination color 81 is a combination of the single colors 81a, 81b, and 81c. This color image scale 80 is applied, for example, to the colors and languages of automobiles.
The color image scale 90 shown in FIG. 10 arranges three-color combination colors (color schemes) and the languages corresponding to their images on coordinate axes of formal-casual and static-dynamic. For example, in the upper left (second quadrant) of the color image scale 90, the combination color 91 and others are arranged; the language corresponding to the combination color 91 is "elegant", and the combination color 91 is a combination of the single colors 91a, 91b, and 91c. This color image scale 90 is applied, for example, to the colors and languages of fashion (clothes).
FIG. 11 is a diagram showing a specific example of the color language table 14. As shown in FIG. 11, the color language table 14 associates a combination color ID 14a, single color IDs 14b, a language 14c, and related words 14d with one another.
The combination color ID 14a is an ID for identifying a combination color arranged on a color image scale in which three-color combination colors and languages are associated with each other (see FIGS. 9 and 10). The single color IDs 14b consist of the single color IDs of color scheme 1, color scheme 2, and color scheme 3, each of which is a single color. For example, in the data whose combination color ID 14a is "A00100", the single color ID 14b of color scheme 1 is "c3255", the single color ID 14b of color scheme 2 is "c4529", the single color ID 14b of color scheme 3 is "c5043", and the language 14c corresponding to the combination color is "innocent".
The related words 14d are words related to the language 14c; for example, a related word of the language "innocent" is "innocence". When the color language table 14 is formed from a color image scale in which single colors and languages are associated with each other (see FIG. 8), the color language table 14 directly associates the single colors arranged on the color image scale with the languages. The related words may be stored in advance in association with the languages in the storage means, or may be acquired, for example, by crawling Web pages on the Internet.
FIG. 12 is a diagram showing a specific example of the single color table 100. The single color table 100 indicates which single colors make up the color schemes 1 to 3 specified by the single color IDs 14b. In the figure, 100 denotes the single color table, 100a denotes a single color ID, and 100b denotes a color value.
The single color ID 100a is an ID for identifying a single color forming the color schemes 1 to 3 of the combination color specified by the combination color ID 14a. The color value 100b expresses the single color specified by the single color ID as coordinate values in a color space; in the single color table 100 of FIG. 12, the values are the x and y values in the CIE xy chromaticity diagram of the XYZ color system (xyY color system).
For example, the three single color IDs 14b constituting the combination color whose combination color ID 14a is "A00100" in FIG. 11 are "c3255", "c4529", and "c5043", and their color values (x, y) are (0.3219, 0.5523), (0.4501, 0.2956), and (0.5027, 0.4315), respectively. The color space expressing the color value 100b is not limited to the XYZ color system (xyY color system) of FIG. 12; various color systems such as the RGB color system, the L*u*v* color system, and the L*a*b* color system may be used as appropriate.
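To make the relationship between the two tables concrete, they could be represented in memory as in the following sketch. The field values mirror the examples of FIGS. 11 and 12; the class and variable names are assumptions made for illustration.

# Sketch: in-memory representation of the colour-language table 14 and the
# single-colour table 100. IDs and values follow the examples of FIGS. 11-12.
from dataclasses import dataclass

@dataclass
class ColorLanguageEntry:
    combination_id: str       # combination color ID 14a, e.g. "A00100"
    single_color_ids: tuple   # colour schemes 1-3, e.g. ("c3255", "c4529", "c5043")
    language: str             # language 14c, e.g. "innocent"
    related_words: tuple      # related words 14d, e.g. ("innocence",)

# single-colour table 100: single colour ID 100a -> (x, y) in the xyY colour system
single_color_table = {
    "c3255": (0.3219, 0.5523),
    "c4529": (0.4501, 0.2956),
    "c5043": (0.5027, 0.4315),
}

color_language_table = [
    ColorLanguageEntry("A00100", ("c3255", "c4529", "c5043"),
                       "innocent", ("innocence",)),
]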
The collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted by the color information extraction unit 13 with the single color or the single colors of each combination color in the color language table 14 stored in the storage unit 15, and specifies a single color or combination color in the color language table 14 for which the difference between the color values (the distance in the color space) is equal to or less than a fixed threshold (in the case of a combination color, for which the differences between the color values of all the single colors are equal to or less than the fixed threshold).
For example, assume that the combination color extracted by the color information extraction unit 13 is the color information 55 consisting of the single colors 55a, 55b, and 55c shown in FIG. 7, and that in the color language table 14 the combination color whose combination color ID is N (combination color N) consists of color scheme 1 with single color ID NA (single color NA), color scheme 2 with single color ID NB (single color NB), and color scheme 3 with single color ID NC (single color NC). In this case, when the single colors of the color information 55 are compared with the single colors of the combination color N, there are the following six pairings to compare.
(1) single color 55a-single color NA, single color 55b-single color NB, single color 55c-single color NC
(2) single color 55a-single color NA, single color 55b-single color NC, single color 55c-single color NB
(3) single color 55a-single color NB, single color 55b-single color NA, single color 55c-single color NC
(4) single color 55a-single color NB, single color 55b-single color NC, single color 55c-single color NA
(5) single color 55a-single color NC, single color 55b-single color NA, single color 55c-single color NB
(6) single color 55a-single color NC, single color 55b-single color NB, single color 55c-single color NA
If, in any of the pairings (1) to (6), the differences between the color values of all the compared single colors are equal to or less than the fixed threshold, the combination color N is regarded as a combination color identical or similar to the color information 55. Here, when the color value of a single color A is (xA, yA) and the color value of a single color B is (xB, yB), the difference ΔC between the color values of the single colors A and B is given by
ΔC = (xA - xB)² + (yA - yB)²
In this way, the collation unit 16 specifies, from among the combination colors in the color language table 14, a combination color that is identical or similar to the combination color extracted by the color information extraction unit 13.
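The permutation test described above (six pairings for three colors, each pairing accepted only when the squared chromaticity difference ΔC stays below the threshold) can be sketched as follows; the threshold value is an assumption, since the embodiment only speaks of a fixed threshold.

# Sketch: decide whether an extracted 3-colour combination matches a table
# combination by testing all six single-colour pairings; a pairing matches
# when the squared xy-difference dC is within a threshold (assumed value).
from itertools import permutations

def color_difference(a, b):
    # dC = (xA - xB)^2 + (yA - yB)^2
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def combinations_match(extracted, table_colors, threshold=0.005):
    # extracted, table_colors: three (x, y) chromaticity values each
    for permuted in permutations(table_colors):
        if all(color_difference(e, t) <= threshold
               for e, t in zip(extracted, permuted)):
            return True
    return False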
Thereafter, the collation unit 16 acquires the language (language information) associated with the combination color specified as identical or similar to the color information 55, together with its related words. For example, if the combination color specified as identical or similar to the color information 55 is the combination color 81 of the color image scale 80 (see FIG. 9), the collation unit 16 acquires the language "calm" associated with the combination color 81 and its related word "calmness".
When a plurality of combination colors satisfying the condition exist in the color language table 14, the collation unit 16 may acquire a plurality of pieces of language information. When the color language table 14 associates single colors with languages, the collation unit 16 compares the single color extracted by the color information extraction unit 13 with the single colors of the color language table 14, specifies a single color in the color language table 14 whose color value difference is equal to or less than the fixed threshold, and acquires the language information associated with the specified single color. The collation unit 16 may acquire related words not only from the color language table 14 but also from a dictionary database or the like stored in the storage unit 15 or in another computer.
The utterance content determination unit 17 determines the utterance content from the templates of various sentences stored in the storage unit 15, on the basis of the language information and related words acquired by the collation unit 16. Specifically, the utterance content determination unit 17 determines the utterance content on the basis of a dialogue rule 110 (see FIG. 13) and a recommendation table 120 (see FIG. 14) stored in the storage unit 15.
FIG. 13 is a diagram showing a specific example of the dialogue rule 110. In the figure, 110 denotes the dialogue rule, 111 denotes an area, 112 denotes language information, and 113 denotes utterance content. In the dialogue rule 110, utterance content 113 is stored in association with each combination of area 111 and language information 112. For example, in the data of No. 1, the area 111 "clothes" and the language information 112 "stylish" are associated with the utterance content 113 "Those are really fashionable clothes!".
Alternatively, only sentences into which a character string can be inserted, such as "Those are really ○○ clothes!" (where ○○ indicates a position into which a character string can be inserted), may be stored as the utterance content 113 of the dialogue rule 110, and the utterance content determination unit 17 may determine the utterance content by inserting the language information into such a sentence. For example, when the language information is "elegant", the utterance content determination unit 17 may insert the language information into "Those are really ○○ clothes!" and use "Those are really elegant clothes!" as the utterance content.
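The template-based variant (inserting the acquired language information into a stored sentence with a placeholder) could look like the sketch below. The "{}" marker and the template text are assumptions standing in for the ○○ placeholder in the stored sentences.

# Sketch: build utterance content by inserting the language information into
# a template sentence. The "{}" placeholder is an assumption standing in for
# the marked insertion point of the stored sentence.
dialogue_templates = {
    "clothes": "Those are really {} clothes!",
}

def build_utterance(region: str, language_info: str) -> str:
    template = dialogue_templates.get(region, "That looks very {}.")
    return template.format(language_info)

# e.g. build_utterance("clothes", "elegant") -> "Those are really elegant clothes!"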
FIG. 14 is a diagram showing a specific example of the recommendation table 120. In the figure, 120 denotes the recommendation table, 121 denotes an area, 122 denotes language information, 123 denotes utterance content, and 124 denotes recommendation information. In the recommendation table 120, utterance content 123 and recommendation information 124 are stored in association with each combination of area 121 and language information 122. For example, in the data of No. 1, the area 121 "mouth" and the language information 122 "chic" are associated with the utterance content 123 "If you like calm colors, how about this lipstick?" and the recommendation information 124 "lipstick A". The recommendation information 124 may be associated not only with character strings but also with images, videos, web pages, and the like; for example, not only the character string "lipstick A" but also a product image, a commercial video, and a product introduction web page of that lipstick may be associated.
FIG. 15 is a flowchart showing the operation of the information presentation device 10. As shown in FIG. 15, the image input unit 11 inputs an image including the user's face (step S1). Specifically, the image input unit 11 is, for example, a camera and inputs an image including the user's face photographed by the camera.
Next, the area determination unit 12 determines each area of the facial organs from the image input in step S1 (step S2). Specifically, the face detection unit 12a detects the user's face from the image, the facial organ detection unit 12b detects the facial organs (eyes, mouth, nose, chin, and the like) from the face area detected by the face detection unit 12a, and the clothing area specification unit 12c specifies the area of the user's clothes from the image.
Next, the color information extraction unit 13 extracts color information for each area specified in step S2 (step S3). Specifically, the color information extraction unit 13 performs color reduction processing on each of the facial organ areas, such as the eye area and the mouth area, and on the clothing area, and extracts one or more pieces of color information representing each area.
Next, the collation unit 16 collates the color language table 14 on the basis of the color information extracted in step S3 and acquires language information and its related words (step S4). Specifically, the collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted in step S3 with the single colors or combination colors of the color language table 14, and specifies a single color or combination color in the color language table 14 for which the difference between the color values is equal to or less than the fixed threshold (in the case of a combination color, for which the differences between the color values of all the single colors are equal to or less than the fixed threshold). The collation unit 16 then acquires from the color language table 14 the language information associated with the combination color specified as identical or similar to the extracted combination color.
Next, the utterance content determination unit 17 determines the utterance content on the basis of the language information acquired in step S4 (step S5). Specifically, the utterance content determination unit 17 determines the utterance content and recommendation information on the basis of the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15. When there are a plurality of search results from the dialogue rule 110 or the recommendation table 120, the utterance content determination unit 17 selects one at random or according to a predetermined priority order. The priority order may, for example, rank the areas. For example, because the eyes give a strong impression even between humans, the language information associated with the color information of the eye area may be given the highest priority. Also, when the information that the information presentation device 10 wants to recommend relates to a specific area, for example when the information to be recommended is a lipstick, the language information associated with the color information of the mouth area may be given the highest priority.
Next, the output unit 18 outputs the utterance content and the recommendation information determined in step S5 (step S6). Specifically, the output unit 18 outputs a sound indicating the utterance content from the speaker and displays characters indicating the utterance content and the recommendation information on the display device. When the information presentation device 10 includes a microphone for inputting the user's voice, the user's voice may be input and taken into account, and the processing may be repeated from step S1 (continuing the dialogue).
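Putting steps S1 to S6 together, the overall flow of FIG. 15 can be sketched as a simple pipeline. Every function passed in below merely stands in for the corresponding unit (image input 11, area determination 12, color extraction 13, collation 16, utterance determination 17, output 18) and is assumed to be implemented along the lines of the earlier sketches.

# Sketch of the overall flow of FIG. 15 (steps S1-S6). The callables stand in
# for the units of the device and are assumed to exist.
def present_information(capture_image, determine_regions, extract_colors,
                        look_up_language, decide_utterance, output):
    image = capture_image()                                  # S1: input image of the user
    regions = determine_regions(image)                       # S2: face organs and clothes
    colors = {name: extract_colors(crop)                     # S3: colour info per area
              for name, crop in regions.items()}
    language = look_up_language(colors)                      # S4: collate colour-language table
    utterance, recommendation = decide_utterance(language)   # S5: utterance and recommendation
    output(utterance, recommendation)                        # S6: speak and/or display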
As described above, the information presentation device 10 (20) according to the present invention enables a natural dialogue, as between humans, at the introduction of a dialogue with the user. In particular, the information presentation device 10 (20) extracts color information for each facial organ and for the clothes, and determines the utterance content on the basis of the language information associated with that color information. It is therefore possible to start a conversation with a first-time user on the basis of the first impression received from that user, just as people meeting for the first time do.
Preferred embodiments of the information presentation device and the like according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is obvious that a person skilled in the art could conceive of various changes and modifications within the scope of the technical ideas disclosed in the present application, and these naturally belong to the technical scope of the present invention.
(Second Embodiment)
FIG. 16 is a diagram showing an example of the system configuration of an information processing system 2001. The information processing system 2001 includes a robot 2002, which is an embodiment of the information processing apparatus according to the present invention, a computer 2003, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007. The robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, and the display 2007 are communicatively connected to the computer 2003, and the computer 2003 controls these devices so that their operations cooperate. The communication connections between the computer 2003 and the robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, and the display 2007 may be either wired or wireless.
The camera 2004 acquires video data of the surroundings. The camera 2004 is, for example, a single camera, a stereo camera, a distance sensor, or a composite sensor combining these. Video data can also be acquired with a camera 2210 built into the robot 2002 instead of the camera 2004; in that case the camera 2004 may be omitted.
The microphone 2005 acquires audio data of the surroundings. The microphone 2005 is, for example, a single microphone or a microphone array composed of a plurality of microphones. Audio data can also be acquired with a microphone 2211 built into the robot 2002; in that case the microphone 2005 may be omitted.
The odor sensor 2006 acquires odor data of the surroundings. The odor sensor 2006 is, for example, a single sensor of a quartz-oscillator type or a semiconductor type, or a sensor array composed of a plurality of sensors. Odor data can also be acquired with an odor sensor 2212 built into the robot 2002; in that case the odor sensor 2006 may be omitted.
In the present embodiment, the robot 2002 extracts the features of a person using a plurality of different types of data, namely the video data acquired from the camera 2004 (or the camera 2210), the audio data acquired from the microphone 2005 (or the microphone 2211), and the odor data acquired from the odor sensor 2006 (or the odor sensor 2212), and specifies the classification of the person (attributes of the person such as preferences, gender, age, language, and nationality) on the basis of the extracted features. The robot 2002 then performs processing according to the specified classification (processing such as dialogue, information guidance, and advertisement).
The display 2007 is controlled by the computer 2003 and displays video and the like. The display 2007 may be omitted when display control is performed using a display unit 2204 mounted on the robot 2002, or when the robot 2002 executes processing such as dialogue, information guidance, and advertisement by voice alone.
The computer 2003 acquires video data with the camera 2004, acquires audio data with the microphone 2005, acquires odor data with the odor sensor 2006, controls the display of the display 2007, and also controls the operation of the robot 2002.
The robot 2002 (information processing apparatus) is, for example, an interactive robot. As shown in FIG. 17, the robot 2002 is configured by connecting a control unit 2201, a storage unit 2202, an input unit 2203, a display unit 2204, a media input/output unit 2205, a communication I/F 2206, a movable part drive mechanism 2207, a peripheral device I/F unit 2208, an audio processing unit 2209, a camera 2210, a microphone 2211, an odor sensor 2212, a speaker 2213, and various sensors 2214 via a bus 2220.
The control unit 2201 is composed of a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU loads programs stored in the storage unit 2202, the ROM, a recording medium, or the like into a work memory area on the RAM, executes them, and drives and controls the units connected via the bus 2220.
The CPU of the control unit 2201 executes all the processing performed by the robot 2002. The ROM permanently holds programs such as the boot program and BIOS of the robot 2002, data, and the like. The RAM temporarily holds the loaded programs and data and provides a work area used by the control unit 2201 for various kinds of processing.
The storage unit 2202 stores the programs executed by the control unit 2201, the data necessary for program execution, an operating system, and the like. These program codes are read out by the control unit 2201 as necessary, transferred to the RAM, and read and executed by the CPU.
The storage unit 2202 also holds a classification table 2050 (FIG. 24) that is referred to when specifying the classification of a person, a processing database 2060 (FIG. 25) that is referred to when executing processing according to the classification, and the like.
The storage unit 2202 further has an action DB (not shown) that stores a control sequence table describing the control sequences of the robot 2002. The control sequence table stores data linking every action performed by the robot 2002 with a movable part control sequence of the robot 2002. A movable part control sequence is a list of the drive locations and values of the movable parts to be controlled when realizing each action (for example, a head tilt angle of 20 degrees, a right arm tilt angle of 40 degrees, and so on).
The input unit 2203 includes, for example, an input device such as a touch panel and various operation buttons, and outputs the input data to the control unit 2201.
The display unit 2204 is composed of a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays the display information input under the control of the control unit 2201 on the display device. The touch panel of the input unit 2203 is configured integrally with the display of the display unit 2204.
The media input/output unit 2205 is a drive device for recording media such as magnetic disks, optical disks, and semiconductor memories, and inputs and outputs data.
The communication I/F 2206 is an interface that mediates communication with a network 2008. The network 2008 includes a LAN (Local Area Network), a WAN (Wide Area Network) connected over a wider area, public communication lines such as the Internet, base stations, and the like. The communication connection with the network 2008 may be either wired or wireless. The robot 2002 can access Web servers via the network 2008 and transmit and receive various programs and data.
The movable part drive mechanism 2207 includes mechanisms and drive devices for driving the movable parts provided on the head, neck, shoulders, arms, waist, legs, and the like of the robot 2002. The operation of the movable part drive mechanism 2207 is controlled by the control unit 2201. The control unit 2201 reads a movable part control sequence from the action DB of the storage unit 2202, sends control signals to the movable part drive mechanisms 2207, and operates the drive units.
The peripheral device I/F (interface) unit 2208 is a port for connecting peripheral devices and mediates the transmission and reception of data with the peripheral devices. The connection with the peripheral devices may be either wired or wireless. The robot 2002 can be communicatively connected to the computer 2003 and the like via the peripheral device I/F unit 2208.
The audio processing unit 2209 includes a speech synthesis unit, a language discrimination unit, a speech recognition unit, and the like. The speech synthesis unit generates synthesized speech data corresponding to character information (text data) on the basis of a command from the control unit 2201 and outputs the synthesized speech from the speaker 2213. The synthesized speech is generated on the basis of the correspondence between character information and speech data stored in advance in an utterance database of the storage unit 2202.
The language discrimination unit discriminates the language spoken by a person by executing spoken-language discrimination processing on the input speech data of the person using, for example, a language discrimination model that statistically models the differences between languages in the types of phonemes, the phoneme arrangement patterns of words, prosodic patterns, vocabulary, and the like.
The speech recognition unit recognizes the utterance content of the person by executing speech recognition processing on the language-discriminated speech data using statistical models for speech recognition (an acoustic model and a language model) prepared for each language.
The camera 2210 is built into the robot 2002 and is, for example, a single camera such as a CCD (Charge-Coupled Device) camera, a stereo camera, a distance sensor, or a composite sensor combining these, and acquires video data of the surroundings. The acquired video data is sent to the control unit 2201.
The microphone 2211 is built into the robot 2002 and is, for example, a single microphone or a microphone array composed of a plurality of microphones, and acquires audio data of the surroundings. The acquired audio data is sent to the control unit 2201.
The odor sensor 2212 is built into the robot 2002 and is, for example, a single sensor of a quartz-oscillator type or a semiconductor type, or a sensor array composed of a plurality of sensors, and acquires odor data of the surroundings. The acquired odor data is sent to the control unit 2201.
The speaker 2213 outputs the synthesized speech generated by the speech synthesis unit of the audio processing unit 2209.
The various sensors 2214 include sensors such as a 3D sensor, a touch sensor, a gyro sensor, a sonar sensor, a laser sensor, a bumper sensor, and an infrared sensor. Each sensor inputs its detected signal to the control unit 2201.
The bus 2220 is a path that mediates the exchange of control signals, data signals, and the like.
The computer 2003 is a control device that controls the operation of the entire information processing system 2001; it acquires video data with the camera 2004, acquires audio data with the microphone 2005, acquires odor data with the odor sensor 2006, controls the display of the display 2007, and also controls the operation of the robot 2002.
As shown in FIG. 18, the computer 2003 is configured by connecting a control unit 2301, a storage unit 2302, a media input/output unit 2303, a peripheral device I/F unit 2304, an input unit 2306, a display unit 2307, and a communication I/F 2308 via a bus 2309.
The control unit 2301 is composed of a CPU, a ROM, a RAM, and the like. The CPU loads programs stored in the storage unit 2302, the ROM, a recording medium, or the like into a work memory area on the RAM, executes them, and drives and controls the units connected via the bus 2309.
In the processing executed by the robot 2002 (FIG. 26), the CPU of the control unit 2301 performs photographing control of the camera 2004, audio acquisition control of the microphone 2005, odor acquisition control of the odor sensor 2006, and display control of the display 2007.
The ROM permanently holds programs such as the boot program and BIOS of the computer 2003, data, and the like. The RAM temporarily holds the loaded programs and data and provides a work area used by the control unit 2301 for various kinds of processing.
The storage unit 2302 stores the programs executed by the control unit 2301, the data necessary for program execution, an operating system, and the like. These program codes are read out by the control unit 2301 as necessary, transferred to the RAM, and read and executed by the CPU.
The media input/output unit 2303 is a drive device for recording media such as magnetic disks, optical disks, and semiconductor memories, and inputs and outputs data.
The peripheral device I/F (interface) unit 2304 is a port for connecting peripheral devices, and data is transmitted to and received from the peripheral devices via the peripheral device I/F unit 2304. The connection with the peripheral devices may be either wired or wireless. The robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, the display 2007, and the like can be communicatively connected to the computer 2003 via the peripheral device I/F unit 2304.
The input unit 2306 is an input device such as a keyboard, a mouse, or a touch panel, and outputs the input data to the control unit 2301.
The display unit 2307 is composed of a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays the display information input under the control of the control unit 2301 on the display device. The touch panel of the input unit 2306 is configured integrally with the display of the display unit 2307. The display unit 2307 may also be the external display 2007. The control unit 2301 sends display information to the display unit 2307 or the display 2007 for display.
The communication I/F 2308 is an interface that mediates communication with the network 2008. The computer 2003 can transmit and receive various programs and data to and from Web servers on the network 2008.
The bus 2309 is a path that mediates the exchange of control signals, data signals, and the like.
Next, the functional configuration of the robot 2002 (information processing system 2001) will be described with reference to FIG. 19.
The robot 2002 (information processing system 2001) has the functions of a person specification unit 2020, a video acquisition unit 2021, an audio acquisition unit 2022, an odor acquisition unit 2023, a feature extraction unit 2024, a classification specification unit 2025, a processing unit 2026, and an action unit 2029. The processing of each function is executed by a program. This program may be stored in advance in the storage unit 2202 of the robot 2002 or in the storage unit 2302 of the computer 2003, or may be downloadable and installable via the network 2008.
The person specification unit 2020 specifies the target person for whom the robot 2002 performs processing such as dialogue, information guidance, and advertisement. The target person is preferably a person who has shown interest in the robot 2002 or who wants to interact with the robot 2002, for example a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
A person who has spoken to the robot 2002 is specified, for example, by detecting agreement between the direction of arrival of the person's speech estimated with the microphone 2211 built into the robot 2002 or the external microphone 2005 and the direction of the person's face in the video data acquired with the camera 2210 built into the robot 2002 or the external camera 2004.
A person who has touched the robot 2002 is specified, for example, by detecting a person close to the robot 2002 from the video data acquired with the camera 2210 built into the robot 2002 or the external camera 2004 and detecting the person's contact motion with a touch sensor provided on the robot 2002.
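As one hedged illustration of the first check, the estimated direction of arrival of the speech can be compared against the horizontal direction of each detected face; the angular tolerance below is an assumption, and the direction-estimation itself is assumed to be provided by the microphone array and the camera processing.

# Sketch: identify the person who spoke to the robot by matching the estimated
# direction of arrival (DOA) of the speech with the horizontal direction of
# each detected face. The 15-degree tolerance is an assumption.
def find_speaker(doa_degrees, face_directions, tolerance=15.0):
    # face_directions: {person_id: horizontal angle of the face in degrees}
    best_id, best_diff = None, tolerance
    for person_id, angle in face_directions.items():
        diff = abs((doa_degrees - angle + 180.0) % 360.0 - 180.0)  # wrap to [0, 180]
        if diff <= best_diff:
            best_id, best_diff = person_id, diff
    return best_id  # None when no face direction agrees with the speech direction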
 なお対象人物は、ロボット2002に話しかけてきた人物や、ロボット2002に触れてきた人物以外にも、単にロボット2002に接近してきた人物や単に映像データ中に映っている人物でもよい。
 また、映像データ中に複数の人物が含まれている場合には、対象人物を複数の人物を含む1つの集団として特定してもよい。この場合、映像データ中の複数の人物間の位置や動きの関係性に基づいて、どの人物とどの人物が同一の集団に属するかを判別することで、対象人物(1つの集団)を特定する。
Note that the target person may be a person who is approaching the robot 2002 or a person who is simply reflected in the video data, in addition to a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
Further, when a plurality of persons are included in the video data, the target person may be specified as one group including a plurality of persons. In this case, a target person (one group) is specified by determining which person and which person belong to the same group based on the relationship between positions and movements among a plurality of persons in the video data. .
The video acquisition unit 2021 acquires video data containing the person identified by the person identification unit 2020, using the camera 2210 built into the robot 2002 or the external camera 2004.
The voice acquisition unit 2022 acquires voice data of the person identified by the person identification unit 2020, using the microphone 2211 built into the robot 2002 or the external microphone 2005. When the microphone 2211 or the microphone 2005 is a microphone array whose directivity can be controlled, it is desirable to point the directivity of the microphone toward the person's face and acquire the person's voice data.
The odor acquisition unit 2023 acquires odor data of the person identified by the person identification unit 2020, using the odor sensor 2212 built into the robot 2002 or the external odor sensor 2006. When the odor sensor 2212 or the odor sensor 2006 is a sensor array whose directivity can be controlled, it is desirable to point the directivity of the sensor toward the person and acquire the person's odor data.
The feature extraction unit 2024 extracts features of the person identified by the person identification unit 2020, using a plurality of different types of data. Specifically, it extracts the person's features for each type of data, using the video data acquired by the video acquisition unit 2021, the voice data acquired by the voice acquisition unit 2022, and the odor data acquired by the odor acquisition unit 2023.
Here, the feature extracted from the video data is called the "first feature", the feature extracted from the voice data the "second feature", and the feature extracted from the odor data the "third feature". Specific examples of the first, second, and third features are given below.
(First feature)
The first feature is a feature of the person extracted from the video data and includes, as shown in FIG. 20, for example "motion features" (motion 1, motion 2, ...), "appearance features" (appearance 1, appearance 2, ...), and "reaction features" (reaction 1, reaction 2, ...).
A "motion feature" is a feature related to the person's movement, for example the way the person walks (walking speed, stride, arm swing, posture, and so on) obtained by applying motion analysis to the video data. When the target is a group, it is the average walking characteristics of the persons belonging to the group and the characteristics of the positional relationship among them (one rank abreast, two ranks abreast, single file, and so on).
An "appearance feature" is a feature related to the person's appearance, for example the person's gender, age (age group), eye color, hair color, and so on obtained by applying face image analysis to the video data. A known technique (for example, JP 2012-053813 A) may be used to recognize a person's gender, age (age group), and the like from a face image.
Facial expressions and emotions may also be extracted as appearance features. Expressions are classified into several patterns such as joy, surprise, anger, sadness, and no expression, and these patterns can be read from how far the corners of the mouth are raised, the positional relationships of the cheeks, eyes, eyebrows, and so on.
Emotion can be estimated from the read expression, using, for example, the methods disclosed in JP H10-255043 A and JP 2014-206903 A.
The type of clothing of the person (suit, dress, folk costume, ...), obtained by applying known clothing identification analysis to the video data, may also be extracted as an appearance feature.
When the target is a group, the feature extraction unit 2024 extracts the average appearance feature of the persons belonging to the group.
A "reaction feature" is a feature related to the person's reaction to information presented to the person by the robot 2002. The presentation of information includes, for example, displaying information such as weather information and news on the display unit 2204 or the display 2007, and also greetings made to the person. For example, synthesized speech corresponding to a predetermined greeting ("Good morning", "Hello", "How are you?", ...) is generated by the voice processing unit 2209 and output (read aloud) from the speaker 2213. Alternatively, the movable part control sequence of the action corresponding to a predetermined greeting motion ("wave a hand", "bow", ...) is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive unit is operated.
The feature extraction unit 2024 then performs motion analysis, expression analysis, and the like on the video data acquired by the video acquisition unit 2021 after the information is presented, and extracts the person's reaction to the presentation ("smiles", "looks serious", "approaches", "waves back", "no reaction", and so on) as a feature.
When the target is a group, the feature extraction unit 2024 extracts the average reaction feature of the persons belonging to the group.
(Second feature)
The second feature is a feature of the person extracted from the voice data, for example, as shown in FIG. 21, "voice strength", "voice pitch", "frequency characteristics", "voice quality" (bright voice, dark voice, deep voice, thin voice, and so on), "speech tempo" (speaking slowly, speaking hurriedly, and so on), "overtone characteristics" (characteristics related to the softness of the sound), "intonation", and "reaction sound features" (features of utterances such as "hmm", "uh-huh", and "er" that show some reaction to what the other party says). These are non-verbal aspects of speech: voice features that stem from the person's individual and physical characteristics, and voice features in which the person's intentions and emotions appear.
Linguistic information in the speech, such as the "spoken language" and the "utterance content", may also be extracted as features. For example, the "spoken language" is extracted by the voice processing unit 2209 of the robot 2002 by analyzing the person's voice data with a language discrimination model that statistically models differences among languages in phoneme types, phoneme arrangement patterns of words, prosodic patterns, vocabulary, and so on. The "utterance content" is extracted by the voice processing unit 2209 of the robot 2002 by applying speech recognition processing to the voice data input from the microphone 2211.
(Third feature)
The third feature is a feature of the person extracted from the odor data and includes, as shown in FIG. 22, for example the "odor type" (type 1, type 2, ...) and the "odor intensity" (intensity 1, intensity 2, ...).
The odor type is identified by analysis with an odor discrimination model that has been trained in advance on a large amount of odor data whose types are known. The odor intensity can be calculated from, for example, the absolute value of the odor data (input values) acquired by the odor sensor 2006 (or the odor sensor 2212).
The classification identification unit 2025 identifies the person's classification 2051 based on the features described above (the first, second, and third features) extracted by the feature extraction unit 2024.
A classification 2051 is a grouping of persons by an arbitrary attribute (preference, gender, age group, language, nationality, etc.). As shown in the classification table 2050 of FIG. 24, each classification 2051 (classification 1, classification 2, ...) is defined by first, second, and third features. Which features define each classification 2051 is determined statistically by extracting the features (first, second, and third) from a large number of persons whose classification 2051 is known in advance and learning from them. This makes it possible to grasp statistically what features (first, second, and third) persons belonging to each classification 2051 (classification 1, classification 2, ...) tend to have.
For example, in the example of FIG. 24, it can be seen that persons belonging to "classification 1" statistically tend to have the features "motion 3", "appearance 10", "reaction 6", ... as the first feature, "(voice) strength 2", "(voice) pitch 9", "frequency characteristic 1", ... as the second feature, and "(odor) type 5", "(odor) intensity 4", ... as the third feature.
Likewise, persons belonging to "classification 2" statistically tend to have the features "motion 5", "appearance 2", "reaction 1", ... as the first feature, "(voice) strength 4", "(voice) pitch 7", "frequency characteristic 3", ... as the second feature, and "(odor) type 3", "(odor) intensity 1", ... as the third feature.
Persons belonging to "classification 3" statistically tend to have the features "motion 2", "appearance 7", "reaction 3", ... as the first feature, "(voice) strength 6", "(voice) pitch 1", "frequency characteristic 4", ... as the second feature, and "(odor) type 2", "(odor) intensity 8", ... as the third feature.
The classification identification unit 2025 refers to the classification table 2050 and identifies the person's classification based on the features extracted by the feature extraction unit 2024. For example, for all the features extracted by the feature extraction unit 2024, it tallies which classification 2051 (classification 1, classification 2, ...) each feature belongs to, and identifies the classification 2051 with the largest count as the person's classification.
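To make the tallying concrete, the following is a minimal Python sketch under the assumption that the learned classification table 2050 can be represented as sets of feature labels; the labels reuse the example values of FIG. 24, and the table contents and function name are hypothetical illustrations, not part of the embodiment.

```python
# Hypothetical stand-in for the learned classification table 2050:
# each classification is defined by the feature labels that persons of
# that classification statistically tend to show (values from FIG. 24).
CLASSIFICATION_TABLE = {
    "classification 1": {"motion 3", "appearance 10", "reaction 6",
                         "voice strength 2", "voice pitch 9", "frequency characteristic 1",
                         "odor type 5", "odor intensity 4"},
    "classification 2": {"motion 5", "appearance 2", "reaction 1",
                         "voice strength 4", "voice pitch 7", "frequency characteristic 3",
                         "odor type 3", "odor intensity 1"},
    "classification 3": {"motion 2", "appearance 7", "reaction 3",
                         "voice strength 6", "voice pitch 1", "frequency characteristic 4",
                         "odor type 2", "odor intensity 8"},
}

def identify_classification(extracted_features):
    """Tally, for every extracted feature, which classification it belongs to
    and return the classification with the largest count."""
    counts = {name: 0 for name in CLASSIFICATION_TABLE}
    for feature in extracted_features:
        for name, defining_features in CLASSIFICATION_TABLE.items():
            if feature in defining_features:
                counts[name] += 1
    return max(counts, key=counts.get)
```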
The processing unit 2026 performs processing based on the processing data 2061 of the classification 2051 identified by the classification identification unit 2025. "Processing" includes processing in which the robot 2002 acts as the person's conversation partner and responds to them (dialogue and the like) and processing in which the robot 2002 unilaterally provides information to the person (advertisement and the like).
FIG. 25 shows the contents of the processing database 2060 that holds the processing data 2061. As shown in the figure, the processing database 2060 holds processing data 2061 (processing data 1, processing data 2, ...) for each classification 2051 (classification 1, classification 2, ...).
The processing data 2061 is, for example, video data, audio data, text data, motion data, or any combination of these.
When the processing data 2061 is video data, the video is displayed on the display unit 2204 of the robot 2002 or on the display 2007. When it is audio data, the audio is output from the speaker 2213 of the robot 2002. When it is text data, the voice processing unit 2209 generates synthesized speech for the text and outputs it from the speaker 2213 (or the text is displayed on the display unit 2204 of the robot 2002 or on the display 2007). When the processing data 2061 is motion data, the movable part control sequence corresponding to each action contained in the motion data is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive unit is operated.
The processing data 2061 is set arbitrarily in advance by the system designer or the like. For example, if the classifications 2051 (classification 1, classification 2, ...) are grouped by "language", such as classification 1 = "English" and classification 2 = "French", dialogue data (audio data) for each language can be set as the processing data 2061. If the classifications 2051 are grouped by "age group", such as classification 1 = "teens" and classification 2 = "twenties", information aimed at each age group (for example, information on products that are currently popular with that age group) can be set.
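As a rough illustration only, the processing database 2060 could be organized as a lookup from classification to processing data; the language grouping and the utterance strings below are hypothetical examples, not contents defined by the embodiment.

```python
# Hypothetical processing database 2060: one piece of processing data per
# classification, here assuming the classifications are grouped by language.
PROCESSING_DATABASE = {
    "classification 1": {"type": "text", "content": "Hello, may I help you?"},        # English
    "classification 2": {"type": "text", "content": "Bonjour, puis-je vous aider ?"},  # French
}

def execute_processing(classification):
    """Look up the processing data for the identified classification and act on it."""
    data = PROCESSING_DATABASE[classification]
    if data["type"] == "text":
        # In the embodiment, the text would be turned into synthesized speech by the
        # voice processing unit 2209 and output from the speaker 2213; here we just print.
        print(data["content"])
```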
Next, an example of the flow of processing executed by the robot 2002 will be described with reference to FIG. 26.
The robot 2002 first identifies the target person for processing such as dialogue, information guidance, or advertisement (step S2000).
Next, the robot 2002 extracts the features of the person identified in step S2000, using a plurality of different types of data.
First, the robot 2002 acquires video data and extracts the person's first feature (step S2001). For example, as described above, the person's "motion features", "appearance features", "reaction features", and so on (see FIG. 20) are extracted from the video data.
When a "reaction feature" is to be extracted, predetermined information is presented to the person before step S2001, and in step S2001 the person's reaction to that presentation is extracted as a feature.
Next, the robot 2002 acquires voice data and extracts the person's second feature (step S2002). For example, as described above, the person's "voice strength", "voice pitch", "frequency characteristics", "voice quality", "speech tempo", "overtone characteristics", "intonation", "reaction sounds", "spoken language", "utterance content", and so on (see FIG. 21) are extracted from the voice data.
Further, the robot 2002 acquires odor data and extracts the person's third feature (step S2003). For example, as described above, the person's "odor type", "odor intensity", and so on (see FIG. 22) are extracted from the odor data.
Next, the robot 2002 refers to the classification table 2050 and identifies the person's classification 2051 based on the person's features (first, second, and third) extracted in steps S2001 to S2003 (step S2004). For example, for all the features extracted in steps S2001 to S2003, it tallies which classification 2051 (classification 1, classification 2, ...) each feature belongs to, and identifies the classification 2051 with the largest count as the person's classification.
Suppose, for example, that "motion 3", "appearance 2", and "reaction 6" are extracted as the first feature, "(voice) strength 2", "(voice) pitch 7", and "frequency characteristic 1" are extracted as the second feature, and "(odor) type 5" and "(odor) intensity 8" are extracted as the third feature, as shown in FIG. 23.
In this case, referring to the classification table 2050 of FIG. 24, five of the extracted features ("motion 3", "reaction 6", "(voice) strength 2", "frequency characteristic 1", and "(odor) type 5") belong to "classification 1", two ("appearance 2" and "(voice) pitch 7") belong to "classification 2", and one ("(odor) intensity 8") belongs to "classification 3". Since the largest number of features belong to "classification 1", the robot 2002 identifies "classification 1" as the person's classification.
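Under the same assumptions as the earlier sketch, this worked example can be reproduced as follows.

```python
# Features extracted in steps S2001 to S2003 in the example of FIG. 23
# (same hypothetical labels as in the sketch above).
features = {"motion 3", "appearance 2", "reaction 6",
            "voice strength 2", "voice pitch 7", "frequency characteristic 1",
            "odor type 5", "odor intensity 8"}

# Tally: classification 1 gets 5 votes, classification 2 gets 2, classification 3 gets 1,
# so "classification 1" is identified as the person's classification.
print(identify_classification(features))  # -> "classification 1"
```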
The robot 2002 then refers to the processing database 2060, acquires the processing data 2061 of the classification 2051 identified in step S2004, and executes predetermined processing based on that processing data 2061 (step S2005). For example, when the identified classification 2051 is "classification 1", the robot 2002 executes processing based on "processing data 1" (see FIG. 25), which corresponds to "classification 1".
Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples.
For example, although the information processing apparatus according to the present invention has been described in the form of a robot, the applicable form is not limited to a robot; it can also be applied to vending machines, digital signage, and the like.
In FIG. 19, the various functions and data of the information processing system 2001 (the person identification unit 2020, video acquisition unit 2021, voice acquisition unit 2022, odor acquisition unit 2023, feature extraction unit 2024, classification identification unit 2025, processing unit 2026, classification table 2050, processing database 2060, and so on) are provided in the control unit 2201 and the storage unit 2202 in the robot 2002, but these units and data may instead be provided in the computer 2003, which is provided separately from the robot 2002.
In steps S2001 to S2003 of FIG. 26, it is not always possible to extract the person's features from all three types of data (video, voice, and odor). For example, when the person does not speak, no features can be extracted from the voice data; in that case, the person's classification may be identified from the features extracted from the video data and the odor data. When the person is far away from the robot 2002, no features can be extracted from the odor data; in that case, the person's classification may be identified from the features extracted from the video data and the voice data.
In the processing of FIG. 26, if the classification 2051 identified by the robot 2002 for a person turns out to be wrong, the classification table 2050 may be retrained. Specifically, after step S2004 of FIG. 26, a step is added in which the robot 2002 asks the person to confirm whether the identified classification 2051 is correct. For example, suppose the classifications 2051 (classification 1, classification 2, ...) are grouped by "nationality" and the classification 2051 identified for the person is "France". In this case, the robot 2002 asks the person a question that confirms the identified classification 2051, such as "Which country are you from?". If the person answers with a classification 2051 different from the identified one ("France"), for example "I came from Italy", the robot 2002 judges the identified classification 2051 ("France") to be wrong, and retrains the classification table 2050 using the correct classification 2051 ("Italy") and the features (first, second, and third) extracted in steps S2001 to S2003 as correct training data.
It is obvious that those skilled in the art can conceive of various changes and modifications within the scope of the technical idea disclosed in the present application, and these naturally belong to the technical scope of the present invention.
(Third embodiment)
<Guidance robot control system 3100>
FIG. 27 is a diagram showing an example of a scene suited to operation of the guidance robot control system 3100 according to the third embodiment.
The guidance robot control system 3100 is a system that uses a communication robot 3001 (guidance robot). The communication robot 3001 is placed in a space 3000S, for example inside an exhibition hall or a store. The communication robot 3001 is desirably placed at a position from which it can overlook the space 3000S.
The communication robot 3001 is, for example, a humanoid robot about 1 m tall.
The communication robot 3001 acquires and analyzes frame images containing the users P in the space 3000S (users P1 to P3 in the example of FIG. 27), decides which user P to call out to, and calls out to (guides) that user P. In the example described below, the communication robot 3001 infers that a user P who keeps looking around is having some kind of trouble, and decides that this user is the one to call out to.
<Communication robot 3001>
FIG. 28 is a functional block diagram of the communication robot 3001 according to the third embodiment.
FIG. 29 is a diagram showing an example of the guidance data storage unit 3022 of the communication robot 3001 according to the third embodiment.
As shown in FIG. 28, the communication robot 3001 includes a control unit 3010, a storage unit 3020, a camera 3031 (imaging unit), a speaker 3032, and a motor unit 3033.
The control unit 3010 is a central processing unit (CPU) that controls the whole communication robot 3001. The control unit 3010 executes various functions in cooperation with the hardware described above by reading and executing, as appropriate, the operating system (OS) and application programs stored in the storage unit 3020.
The control unit 3010 includes an image acquisition unit 3011 (image acquisition means), an image analysis unit 3012, a user determination processing unit 3013, a guidance data extraction unit 3015, a movement control unit 3016 (movement control means), a guidance unit 3017 (guidance means), and an expression comparison unit 3018 (expression comparison means).
The image acquisition unit 3011 acquires a plurality of frame images via the camera 3031. The acquired frame images are consecutive in time series.
The image analysis unit 3012 analyzes the frame images acquired by the image acquisition unit 3011. The image analysis unit 3012 includes a direction determination unit 3012a (direction determination means), a distance calculation unit 3012b (distance calculation means), an expression determination unit 3012c (expression determination means), and an attribute determination unit 3012d (attribute determination means).
For each user P contained in a frame image, the direction determination unit 3012a determines the direction in which that user P is looking.
For each user P contained in a frame image, the distance calculation unit 3012b calculates the distance between the communication robot 3001 and that user P.
For each user P contained in a frame image, the expression determination unit 3012c determines the facial expression of that user P.
For each user P contained in a frame image, the attribute determination unit 3012d determines the attributes of that user P. Here, the attributes of a user P are, for example, age, gender, and the like.
The user determination processing unit 3013 determines the user P to call out to, based on the frame images analyzed by the image analysis unit 3012. The user determination processing unit 3013 includes a direction change amount calculation unit 3013a (direction change amount calculation means), a movement speed calculation unit 3013b (movement speed calculation means), a movement direction calculation unit 3013c (movement direction calculation means), and a guidance judgment unit 3013d (guidance judgment means).
The direction change amount calculation unit 3013a calculates the direction change amount, which is the magnitude of the change over time in the direction in which the user P is looking, as determined by the direction determination unit 3012a.
The movement speed calculation unit 3013b calculates the movement speed based on the change over time in the distance of the user P calculated by the distance calculation unit 3012b.
The movement direction calculation unit 3013c calculates the movement direction based on the change over time in the distance of the user P calculated by the distance calculation unit 3012b.
The guidance judgment unit 3013d decides which user P to call out to, based on the direction change amount and the like.
The guidance data extraction unit 3015 extracts the guidance data to be presented to the determined user P from the guidance data storage unit 3022.
The movement control unit 3016 moves the communication robot 3001 so that it approaches the determined user P, based on the movement direction calculated by the movement direction calculation unit 3013c.
The guidance unit 3017 outputs the guidance data extracted by the guidance data extraction unit 3015 via the speaker 3032.
The expression comparison unit 3018 compares the facial expressions determined by the expression determination unit 3012c before and after the guidance.
Details of each process will be described later.
The storage unit 3020 is a storage area, such as a semiconductor memory element, for storing the programs, data, and the like that the control unit 3010 needs in order to execute the various processes.
The storage unit 3020 includes a program storage unit 3021, a guidance data storage unit 3022, and an image data storage unit 3023.
The program storage unit 3021 is a storage area that stores programs. The program storage unit 3021 stores a control program 3021a (program) for executing the various functions of the control unit 3010 described above.
The guidance data storage unit 3022 is a storage area that stores the guidance data uttered by the communication robot 3001. As shown in FIG. 29, the guidance data storage unit 3022 stores the guidance data for calling out in association with the facial expression and attributes of the user P, and stores the next guidance data in association with the facial expression of the user P after the call-out (a sketch of this layout follows below).
Returning to FIG. 28, the image data storage unit 3023 is a storage area that stores the frame images acquired by the image acquisition unit 3011.
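As noted above, a hypothetical sketch of how the guidance data storage unit 3022 might be laid out; the expressions, attributes, and phrases are illustrative stand-ins for the contents of FIG. 29, not the actual stored data.

```python
# Hypothetical layout of the guidance data storage unit 3022.
GUIDANCE_DATA = {
    # (expression, attribute) -> first call-out utterance
    ("negative", "child"): "Are you lost? I can help you find your family.",
    ("negative", "adult"): "Is there something I can help you find?",
    ("positive", "adult"): "Welcome! Would you like some information?",
}

NEXT_GUIDANCE = {
    # expression observed after the first call-out -> follow-up utterance
    "negative": "Please don't worry. Let's look for a staff member together.",
    "positive": "I'm glad I could help. Enjoy your visit!",
}
```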
The camera 3031 is an imaging device. The camera 3031 is provided, for example, at a position corresponding to the eyes of the communication robot 3001, and photographs subjects in the direction the communication robot 3001 is facing.
The speaker 3032 is an audio output device. The speaker 3032 is provided, for example, at a position corresponding to the mouth of the communication robot 3001, and outputs sound based on instructions from the control unit 3010.
The motor unit 3033 is a set of motors that make the communication robot 3001 move. The motor unit 3033 includes a head motor 3033a and a foot motor 3033b.
The head motor 3033a is a motor that moves the head of the communication robot 3001. The head motor 3033a is controlled by the control unit 3010 and moves the head up, down, left, and right so that the various users P in the space 3000S can be photographed as subjects.
The foot motor 3033b is located in the feet of the communication robot 3001 and is a motor for moving the communication robot 3001.
<Processing of the communication robot 3001>
Next, the processing of the communication robot 3001 will be described.
FIG. 30 is a flowchart showing the robot control processing in the communication robot 3001 according to the third embodiment.
FIG. 31 is a flowchart showing the image analysis processing in the communication robot 3001 according to the third embodiment.
FIG. 32 is a flowchart showing the user determination processing in the communication robot 3001 according to the third embodiment.
FIG. 33 is a flowchart showing the post-guidance processing in the communication robot 3001 according to the third embodiment.
In step S (hereinafter "S") 3010 of FIG. 30, the control unit 3010 (image acquisition unit 3011) of the communication robot 3001 performs image acquisition processing. In the image acquisition processing, the control unit 3010 constantly acquires frame images via the camera 3031 and determines whether an acquired frame image contains one or more users P. When it determines that a frame image contains one or more users P, the control unit 3010 stores the frame images acquired at appropriate intervals (for example, every 0.1 seconds) in the image data storage unit 3023.
In S3011, the control unit 3010 determines whether the processing condition has been met. The processing condition is, for example, that a certain number of frame images (for example, 30) have been stored in the image data storage unit 3023, or that a prescribed time (for example, 3 seconds) has elapsed. When it determines that the processing condition has been met (S3011: YES), the control unit 3010 moves the processing to S3012. Otherwise (S3011: NO), the control unit 3010 moves the processing to S3010.
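A minimal sketch of this buffering and trigger check, assuming hypothetical capture_frame() and contains_person() helpers and the example thresholds of 30 frames or 3 seconds.

```python
import time

FRAME_LIMIT = 30       # example: process once 30 frames are buffered
TIME_LIMIT = 3.0       # example: or once 3 seconds have elapsed
SAMPLE_INTERVAL = 0.1  # example: store a frame roughly every 0.1 s

def collect_frames(capture_frame, contains_person):
    """Buffer frames containing at least one user until the processing
    condition (frame count or elapsed time) is met, then return them."""
    frames, start = [], time.time()
    while len(frames) < FRAME_LIMIT and time.time() - start < TIME_LIMIT:
        frame = capture_frame()        # hypothetical camera access
        if contains_person(frame):     # hypothetical person detector
            frames.append(frame)
        time.sleep(SAMPLE_INTERVAL)
    return frames
```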
In S3012, the control unit 3010 (image analysis unit 3012) performs image analysis processing to analyze the frame images stored in the image data storage unit 3023.
The image analysis processing is described here with reference to FIG. 31.
In S3030 of FIG. 31, the control unit 3010 extracts one frame image from the image data storage unit 3023.
In S3031, the control unit 3010 (direction determination unit 3012a) determines, for each user P contained in the frame image, the direction in which that user P is looking. For example, when the frame image contains two users P1 and P2, the control unit 3010 determines the direction user P1 is looking and the direction user P2 is looking. The control unit 3010 can, for example, detect a face image in the frame image and determine the orientation of the face based on the arrangement positions of the parts (organs) of the face; the determined face orientation can then be taken as the direction in which the user P is looking.
There are various known techniques for finding the positions of the facial organs (eyes, nose, mouth, etc.); for example, a facial organ detection technique using a regression model can be used. A facial organ detection technique using a regression model is described, for example, in US Patent Application Publication No. 2014/0185924.
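The following is a rough geometric sketch of turning detected landmark positions into a horizontal face direction; it assumes an external landmark detector (such as the regression-model organ detection cited above) already supplies eye and nose coordinates, and it is not the method of the cited publication.

```python
def estimate_yaw(left_eye, right_eye, nose_tip):
    """Very rough horizontal face direction from landmark layout:
    if the nose tip is offset toward one eye, the face is turned that way.
    Inputs are (x, y) pixel coordinates from a facial landmark detector."""
    eye_center_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_span = abs(right_eye[0] - left_eye[0]) or 1.0
    # Normalized offset in roughly [-1, 1]: 0 means facing the camera,
    # negative/positive means turned one way or the other (sign convention is arbitrary).
    return (nose_tip[0] - eye_center_x) / (eye_span / 2.0)
```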
In S3032, the control unit 3010 (distance calculation unit 3012b) calculates, for each user P contained in the frame image, the distance between that user P and the communication robot 3001. The control unit 3010 can calculate a rough distance between the user P and the communication robot 3001 based on, for example, the size of the user P in the frame image, the background of the frame image, and so on.
In S3033, the control unit 3010 (expression determination unit 3012c) determines the facial expression of each user P contained in the frame image. From the shapes of the eyes, nose, mouth, and so on, the control unit 3010 can determine, for example, whether the expression is a smile (a positive expression) or a sad face (a negative expression).
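A rough sketch of the distance estimate from the apparent size of the person in the frame, using a pinhole-camera relation; the assumed average body height and focal length are placeholders, not parameters of the embodiment.

```python
ASSUMED_PERSON_HEIGHT_M = 1.6   # placeholder average body height
FOCAL_LENGTH_PX = 800.0         # placeholder camera focal length in pixels

def estimate_distance(person_bbox_height_px):
    """Approximate distance (m) from the pixel height of the person's
    bounding box, using the pinhole relation h_px = f * H / Z."""
    if person_bbox_height_px <= 0:
        return float("inf")
    return FOCAL_LENGTH_PX * ASSUMED_PERSON_HEIGHT_M / person_bbox_height_px
```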
In S3034, the control unit 3010 determines whether all of the frame images to be processed in the image data storage unit 3023 have been processed. If all of them have been processed (S3034: YES), the control unit 3010 ends this processing and returns to the processing of FIG. 30. Otherwise (S3034: NO), the control unit 3010 moves the processing to S3035.
In S3035, the control unit 3010 extracts one unprocessed frame image from the frame images to be processed in the image data storage unit 3023, and then moves the processing to S3031.
Returning to FIG. 30, in S3013 the control unit 3010 (user determination processing unit 3013) performs the user determination processing.
The user determination processing is described here with reference to FIG. 32.
In S3040 of FIG. 32, the control unit 3010 arranges the frame images that have undergone the image analysis processing in time series.
In S3041, the control unit 3010 (direction change amount calculation unit 3013a) calculates the direction change amount for each user P based on the rearranged frame images. Here, the direction change amount expresses the magnitude of the change over time in the direction in which the user P is looking. Specifically, when the user P is looking around restlessly, the degree of looking around is high and the control unit 3010 calculates a large direction change amount.
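One way to sketch the direction change amount is as the accumulated frame-to-frame change of the estimated face direction over the time-ordered frames; estimate_yaw() here refers to the hypothetical helper sketched earlier.

```python
def direction_change_amount(yaw_per_frame):
    """Sum of absolute frame-to-frame changes in the estimated face direction
    (e.g. estimate_yaw() per frame). A user who keeps looking around yields
    a large value; a user looking steadily in one direction yields a small one."""
    return sum(abs(b - a) for a, b in zip(yaw_per_frame, yaw_per_frame[1:]))
```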
In S3042, the control unit 3010 (movement speed calculation unit 3013b) calculates the movement speed of each user P based on the rearranged frame images. The control unit 3010 can calculate the movement speed from the changes in the position of the user P across the frame images. The control unit 3010 may instead simply determine whether the movement speed is faster than a prescribed range.
In S3043, the control unit 3010 (movement direction calculation unit 3013c) calculates the movement direction of each user P based on the rearranged frame images. The control unit 3010 can calculate the movement direction from the changes in the position of the user P across the frame images.
In S3044, the control unit 3010 (user determination processing unit 3013) determines, for each user P, the facial expression that appeared most often in the rearranged frame images.
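Given per-frame positions of a user (for example, derived from the estimated distance and the horizontal position in the frame), the movement speed and direction could be computed as sketched below; the coordinate convention and frame interval are assumptions.

```python
import math

def movement_speed_and_direction(positions, frame_interval_s=0.1):
    """positions: time-ordered list of (x, y) coordinates of the user on the
    floor plane, in meters, in the robot's coordinate frame (assumed).
    Returns (speed in m/s, direction in radians) over the observed span."""
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    dx, dy = x1 - x0, y1 - y0
    elapsed = frame_interval_s * (len(positions) - 1)
    speed = math.hypot(dx, dy) / elapsed if elapsed > 0 else 0.0
    direction = math.atan2(dy, dx)
    return speed, direction
```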
In S3045, the control unit 3010 (guidance judgment unit 3013d) calculates a call-out priority based on the indices above.
The call-out priority is a degree calculated by weighting the direction change amount, the position of the user P, the movement speed, and the facial expression. In this example, users P who appear to be in trouble are called out to preferentially, so the weight on the direction change amount is made the largest. The control unit 3010 weights the indices so that the call-out priority becomes highest when the direction change amount is large, the movement speed is low, and the expression is negative. The control unit 3010 also treats a user P as a call-out candidate only when the distance between that user P and the communication robot 3001 is within a prescribed range (for example, 10 m).
In S3046, the control unit 3010 (guidance judgment unit 3013d) decides which user P to call out to, based on the call-out priority.
Note that when a user P's direction change amount is small, the priority takes a small value even if the movement speed is low and the expression is negative. In such a case, that is, when the priority is below a predetermined value, the control unit 3010 may judge that there is no user P to call out to.
The control unit 3010 then ends this processing and returns to the processing of FIG. 30.
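A minimal sketch of the weighted call-out priority and the final decision described in S3045 and S3046; the weights, score normalizations, and threshold are illustrative choices, with the direction change amount weighted most heavily as stated above.

```python
# Illustrative weights: the direction change dominates, a negative expression and a
# low moving speed raise the priority, and users beyond the allowed distance are excluded.
W_DIRECTION, W_SPEED, W_EXPRESSION = 0.6, 0.2, 0.2
MAX_DISTANCE_M = 10.0      # example prescribed range from the description
PRIORITY_THRESHOLD = 0.5   # illustrative cutoff for "nobody needs help"

def callout_priority(direction_change, speed, expression, distance):
    if distance > MAX_DISTANCE_M:
        return 0.0
    speed_score = 1.0 / (1.0 + speed)               # slower movement -> higher score
    expression_score = 1.0 if expression == "negative" else 0.0
    direction_score = min(direction_change, 1.0)    # clip to [0, 1]
    return (W_DIRECTION * direction_score
            + W_SPEED * speed_score
            + W_EXPRESSION * expression_score)

def decide_user(users):
    """users: list of dicts holding the per-user indices computed above.
    Returns the user with the highest priority, or None if nobody exceeds
    the threshold (i.e. no one seems to need help)."""
    best = max(users, key=lambda u: callout_priority(
        u["direction_change"], u["speed"], u["expression"], u["distance"]),
        default=None)
    if best is None:
        return None
    p = callout_priority(best["direction_change"], best["speed"],
                         best["expression"], best["distance"])
    return best if p >= PRIORITY_THRESHOLD else None
```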
Returning to FIG. 30, in S3014 the control unit 3010 (user determination processing unit 3013) determines whether a user P has been determined by the user determination processing. If a user P has been determined (S3014: YES), the control unit 3010 moves the processing to S3015. Otherwise (S3014: NO), the control unit 3010 ends this processing.
In S3015, the control unit 3010 (attribute determination unit 3012d) determines the attributes of the determined user P. Specifically, the control unit 3010 determines the approximate age and the like from the image of the user P extracted from the frame image. The control unit 3010 may simply determine whether the user is a child or an adult without determining the age; in that case, it may estimate the user's height from the image of the user P and determine whether or not the user is a child.
In S3016, the control unit 3010 (guidance data extraction unit 3015) refers to the guidance data storage unit 3022 (FIG. 29) and extracts guidance data based on the facial expression of the user P determined in S3044 of FIG. 32 and the attributes determined in S3015.
In S3017, the control unit 3010 (movement control unit 3016) performs movement processing. As the movement processing, the control unit 3010 controls the motor unit 3033 (foot motor 3033b) based on the movement speed and movement direction of the user P calculated in S3042 and S3043 of FIG. 32, and moves the communication robot 3001 so that it approaches the user P.
In S3018, the control unit 3010 (guidance unit 3017) performs guidance processing. As the guidance processing, the control unit 3010 outputs the guidance data extracted in S3016 via the speaker 3032 so as to speak to (call out to) the user P after moving.
In S3019, the control unit 3010 performs post-guidance processing.
The post-guidance processing is described here with reference to FIG. 33.
In S3050 of FIG. 33, the control unit 3010 (image acquisition unit 3011) acquires, via the camera 3031, a frame image containing the user P after the guidance data has been output.
In S3051, the control unit 3010 (expression determination unit 3012c) analyzes the acquired frame image and determines the facial expression.
In S3052, the control unit 3010 (expression comparison unit 3018) performs expression comparison processing. As the expression comparison processing, the control unit 3010 compares the facial expression before the guidance data was output (the result of S3044 in FIG. 32) with the facial expression obtained in S3051.
In S3053, the control unit 3010 (guidance data extraction unit 3015) extracts guidance data corresponding to the result of the expression comparison from the guidance data storage unit 3022 (FIG. 29).
In S3054, the control unit 3010 (guidance unit 3017) outputs, as guidance processing, the guidance data extracted in S3053 via the speaker 3032. The control unit 3010 then returns to the processing of FIG. 30.
Returning to FIG. 30, the control unit 3010 ends this processing.
<Description of a specific example>
A specific example using this guidance robot control system 3100 will now be described.
FIGS. 34A and 34B are diagrams showing a specific example of the guidance robot control system 3100 according to the third embodiment.
First, the control unit 3010 of the communication robot 3001 acquires frame images of a child C1 who is lost and looking around, as shown in FIG. 34A (S3010 in FIG. 30), and thereby judges the child to be a call-out target (YES in S3014 of FIG. 30).
Next, the control unit 3010 extracts, from the guidance data storage unit 3022 (see FIG. 29), the call-out guidance data for the case where the attribute is "child" and the expression is negative (S3016 in FIG. 30).
Then, as shown in FIGS. 34A and 34B, the control unit 3010 moves the robot toward the position of the child C1 (S3017 in FIG. 30) and outputs the extracted guidance data via the speaker 3032 (S3018 in FIG. 30).
In this way, the communication robot 3001 can infer the state of the child C1 from the child's movements and facial expression and call out in a way that matches the child's intent.
After that, the control unit 3010 acquires a frame image of the child C1, and if the child C1's expression is still negative as shown in FIG. 34C, the control unit 3010 extracts from the guidance data storage unit 3022 the post-call-out guidance data for that expression and outputs it via the speaker 3032 (S3019 in FIG. 30).
As described above, the guidance robot control system 3100 of the third embodiment has the following effects.
(1) The amount of change in the direction the user P is looking is calculated from a plurality of frame images. Since the direction change amount becomes large when the user P keeps looking around, the communication robot 3001 judges that this user P is in trouble and calls out to them. Thus, from the magnitude of the change in the looking direction, the system infers whether or not the user P is in trouble, and the communication robot 3001 can preferentially make a call-out that matches the intent of a user P inferred to be in trouble.
(2) When the position of the user P and the position of the communication robot 3001 are within a prescribed range, the communication robot 3001 calls out to the user P. The communication robot 3001 can therefore preferentially speak to users P who are nearby.
(3) When the movement speed of the user P is within a prescribed range, the communication robot 3001 calls out to the user P. The communication robot 3001 can therefore preferentially speak to a user P who is walking slowly or standing still.
(4) By estimating where the user P is moving to and approaching them, the communication robot 3001 can speak to the user P in a more natural way.
(5) The guidance data is selected based on the facial expression of the user P, and the communication robot 3001 calls out to the user P. The communication robot 3001 can therefore speak with guidance content that matches the user P's expression.
(6) Based on the change in the user P's expression before and after the call-out, the communication robot 3001 gives further guidance. The communication robot 3001 can therefore carry on a more communicative conversation.
(7) Attributes such as the age of the user P are determined from the frame images, guidance data corresponding to those attributes is selected, and the communication robot 3001 calls out to the user P. It can therefore speak with guidance content that matches the user P's attributes.
(8) The communication robot 3001 can perform all the processing from acquiring the frame images to outputting the guidance data. Only the communication robot 3001 therefore needs to be provided, and installation is easy.
(Fourth embodiment)
In the fourth embodiment, a communication robot that calls passers-by into a space is described. In the following description, parts that perform the same functions as in the third embodiment described above are given the same reference numerals, or reference numerals with the same suffix, and redundant descriptions are omitted as appropriate.
<Guidance robot control system 3200>
FIG. 35 is a diagram showing an example of a scene suitable for operation of the guidance robot control system 3200 according to the fourth embodiment.
FIG. 36 is a functional block diagram of the guidance robot control system 3200 according to the fourth embodiment.
FIG. 37 is a diagram showing an example of the guidance data storage unit 3222 of the communication robot 3201 according to the fourth embodiment.
The guidance robot control system 3200 places the communication robot 3201 at the entrance of a space 3200S such as an exhibition booth or the doorway of a store. The guidance robot control system 3200 is a system used by the communication robot 3201 to call a user P (in this example, user P4) to enter the space 3200S.
The guidance robot control system 3200 includes the communication robot 3201 and a camera 3205. The communication robot 3201 and the camera 3205 are installed near the entrance of the space 3200S.
The camera 3205 is, in this example, a surveillance camera. The camera 3205 acquires frame images of users P passing by the space 3200S and transmits the frame images to the communication robot 3201.
The communication robot 3201 analyzes the frame images of the users P received from the camera 3205, determines which user P to speak to, and provides guidance to that user P. For each user P passing in front of the camera 3205, the communication robot 3201 judges whether the user P seems interested in the space 3200S and speaks to him or her accordingly.
<Communication robot 3201>
As shown in FIG. 36, the communication robot 3201 includes a control unit 3210, a storage unit 3220, a speaker 3032, and a motor unit 3033.
The control unit 3210 includes an image reception unit 3211 (image acquisition means), an image analysis unit 3212, a user determination processing unit 3213, a face data collation unit 3214 (face collation means), a guidance data extraction unit 3215, a movement control unit 3016, a guidance unit 3017, and a facial expression comparison unit 3018.
The image reception unit 3211 accepts a frame image by receiving the frame image transmitted by the camera 3205.
The image analysis unit 3212 analyzes the frame image accepted by the image reception unit 3211. The image analysis unit 3212 includes a direction acquisition unit 3212a (user direction acquisition means), a distance calculation unit 3012b, a facial expression determination unit 3212c, and a face acquisition unit 3212e (face acquisition means).
The direction acquisition unit 3212a analyzes the frame image to identify each user P, determines the direction in which each user P is looking, and acquires that viewing direction.
The facial expression determination unit 3212c determines the facial expression of each user P based on the face data of that user P acquired by the face acquisition unit 3212e.
The face acquisition unit 3212e acquires, for each user P included in the frame image, the face data of that user P.
The user determination processing unit 3213 determines the user P to speak to based on changes over time in the frame images analyzed by the image analysis unit 3212. The user determination processing unit 3213 includes a movement speed calculation unit 3013b, a movement direction calculation unit 3013c, and a guidance determination unit 3213d.
The guidance determination unit 3213d determines the user P to speak to based on the viewing direction of the user P acquired by the direction acquisition unit 3212a, the movement direction calculated by the movement direction calculation unit 3013c, and the like.
The face data collation unit 3214 collates the face data stored in the face data storage unit 3224 with the face data acquired by the face acquisition unit 3212e.
The guidance data extraction unit 3215 extracts, from the guidance data storage unit 3222, the guidance data to be presented to the user P determined by the user determination processing unit 3213. When the face data collation unit 3214 finds a match, the guidance data extraction unit 3215 extracts guidance data different from the data output previously.
The storage unit 3220 includes a program storage unit 3021, a guidance data storage unit 3222, an image data storage unit 3023, and a face data storage unit 3224.
The program storage unit 3021 stores a control program 3221a for executing the various functions of the control unit 3210 described above.
As shown in FIG. 37, the guidance data storage unit 3222 stores guidance data for speaking to a user in association with facial expressions of the user P. It also stores the next guidance data in association with the facial expression of the user P after being spoken to.
The face data storage unit 3224 is a storage area for storing face data. When guidance data is output, the face data storage unit 3224 stores the face data acquired by the face acquisition unit 3212e in association with the output guidance data.
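For illustration only, the two storage units can be pictured as small look-up tables. The Python sketch below is a minimal assumption of such a layout; the expression labels and phrases are hypothetical examples and are not taken from FIG. 37.

```python
# Minimal sketch, assuming in-memory dictionaries stand in for the storage units.
# Expression labels and phrases are hypothetical, not the actual contents of FIG. 37.
from typing import Dict, List

# Guidance data storage unit 3222: call-out phrases keyed by the user's facial expression,
# and follow-up phrases keyed by the expression observed after the first call-out.
CALL_DATA: Dict[str, List[str]] = {
    "positive": ["Welcome! Today's exhibit starts here."],
    "neutral":  ["Please drop in at the store."],
    "negative": ["May I help you find something?"],
}
FOLLOWUP_DATA: Dict[str, List[str]] = {
    "positive": ["Great, let me show you around!"],
    "negative": ["No problem, have a nice day."],
}

# Face data storage unit 3224: face identifier -> guidance phrases already output to that person.
FACE_HISTORY: Dict[str, List[str]] = {}

def remember(face_id: str, phrase: str) -> None:
    """Store the association created when guidance data is output (cf. S3219)."""
    FACE_HISTORY.setdefault(face_id, []).append(phrase)
```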
<Camera 3205>
The camera 3205 is, for example, a live camera such as a web camera. The camera 3205 may also be a stereo camera. A stereo camera is a camera that captures images simultaneously through a plurality of lenses at slightly shifted positions and, by producing parallax, can acquire information about the depth of an object.
The camera 3205 includes an image acquisition unit 3251 and an image transmission unit 3252.
The image acquisition unit 3251 is a control unit that acquires frame images.
The image transmission unit 3252 is a control unit that transmits the acquired frame images via the communication network 3000N.
The communication robot 3201 and the camera 3205 are communicably connected via the communication network 3000N. The communication network 3000N is, for example, an Internet line or a mobile communication network. The communication network 3000N may also use, for example, short-range wireless communication based on the Bluetooth (registered trademark) standard.
<Processing of the communication robot 3201>
Next, the processing of the communication robot 3201 will be described.
FIG. 38 is a flowchart showing the robot control processing in the communication robot 3201 according to the fourth embodiment.
FIG. 39 is a flowchart showing the image analysis processing in the communication robot 3201 according to the fourth embodiment.
FIG. 40 is a flowchart showing the user determination processing in the communication robot 3201 according to the fourth embodiment.
In S3210 of FIG. 38, the control unit 3210 (image reception unit 3211) of the communication robot 3201 receives the frame image transmitted by the camera 3205. The control unit 3210 then stores the received frame image in the image data storage unit 3023.
In S3211, the control unit 3210 (image analysis unit 3212) performs image analysis processing for analyzing the frame image stored in the image data storage unit 3023.
The image analysis processing is described here with reference to FIG. 39.
In S3230 of FIG. 39, the control unit 3210 (direction acquisition unit 3212a) detects the users P included in the frame image and acquires the direction in which each user P is looking. For example, when two users P1 and P2 are included in the frame image, the control unit 3210 detects users P1 and P2 and acquires the viewing direction of each user.
There are various known methods for detecting the user P; for example, a moving-object detection method can be used. In this method, a frame image in which no user P appears is stored in advance as background image data, and the user P, being a moving object, is detected by taking the difference between the background image data and the frame image to be processed. Human detection using HOG features can also be used; it is described in "N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In CVPR, pages 886-893, 2005".
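As a concrete illustration of the background-difference and HOG-based approaches mentioned above, the following Python sketch uses OpenCV. It is an assumption for explanatory purposes, not the detector actually used in the embodiment; the input file name and the foreground-overlap heuristic are hypothetical.

```python
# Sketch of user detection in a frame image, assuming OpenCV (cv2) is available.
import cv2

# Moving-object detection: a learned background model stands in for the stored background image.
# In practice the subtractor would be fed successive frames so the background is learned first.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

# HOG-based person detector with OpenCV's default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_users(frame):
    """Return bounding boxes of people detected in one frame (hypothetical helper)."""
    fg_mask = bg_subtractor.apply(frame)              # difference from the background model
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    # Keep only HOG detections that overlap foreground pixels (simple illustrative heuristic).
    kept = []
    for (x, y, w, h) in boxes:
        if cv2.countNonZero(fg_mask[y:y + h, x:x + w]) > 0.1 * w * h:
            kept.append((x, y, w, h))
    return kept

frame = cv2.imread("frame.jpg")                       # hypothetical input frame
if frame is not None:
    print(detect_users(frame))
```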
As the direction in which the user P is looking, the control unit 3210 acquires, for example, the orientation of the user P's face. The user P's face is regarded as facing the front of the camera 3205 when the control unit 3210 (face acquisition unit 3212e) can detect the face using a face detection method. There are various known face detection methods; for example, face detection using Haar-like features can be used, as described in "P. Viola and M. J. Jones: 'Rapid Object Detection Using a Boosted Cascade of Simple Features', Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 511-518, (2001)".
The control unit 3210 may instead detect a face image in the frame image, extract the eyes from it, and acquire the direction in which the user P is looking based on the relative positions of the eyeball and the pupil.
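A minimal sketch of the rule "the user is facing the front if a frontal face is detected", assuming OpenCV's bundled Haar cascade; the cascade choice and parameters are illustrative and not those of the embodiment.

```python
# Sketch: treat the user as facing the camera when a frontal face is detected.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def is_facing_camera(frame) -> bool:
    """True if at least one frontal face is found, i.e. the user looks toward the camera."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```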
The processing of S3231 and S3232 is the same as the processing of S3032 and S3033 in the third embodiment (FIG. 31). In S3232, the control unit 3210 (face acquisition unit 3212e) acquires the face data, and the control unit 3210 (facial expression determination unit 3212c) determines the facial expression based on the acquired face data. The control unit 3210 then returns to the processing of FIG. 38.
Returning to FIG. 38, in S3212 the control unit 3210 (user determination processing unit 3213) performs the user determination processing.
The user determination processing is described here with reference to FIG. 40.
In S3240 of FIG. 40, the control unit 3210 arranges the frame images that have undergone the image analysis processing in time series.
In S3241, the control unit 3210 (direction acquisition unit 3212a) acquires the viewing direction of each user P based on the rearranged frame images. Here, when the direction in which a user P is looking is the same across the frame images, the control unit 3210 acquires that direction as the viewing direction. Therefore, for example, when the user P is looking around restlessly, the control unit 3210 does not acquire a viewing direction.
The processing from S3242 to S3244 is the same as the processing from S3042 to S3044 in the third embodiment (FIG. 32). For calculating the movement direction of each user P, for example, human body tracking using the mean-shift feature can be used, as described in "Collins R.: 'Mean-Shift Blob Tracking through Scale Space', Proc. IEEE Conf. Computer Vision and Pattern Recognition, Wisconsin, USA, 2003, pp. 234-240".
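The following Python sketch illustrates mean-shift tracking of a detected person between two frames with OpenCV, from which a movement direction could then be derived; the hue-histogram choice and termination criteria are assumptions, not values given in the patent.

```python
# Sketch of mean-shift tracking of a detected person between frames, assuming OpenCV.
# The region of interest (x, y, w, h) would come from the person-detection step.
import cv2

def track_person(prev_frame, next_frame, roi):
    """Return the ROI shifted to the person's position in next_frame via mean shift."""
    x, y, w, h = roi
    hsv_roi = cv2.cvtColor(prev_frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])   # hue histogram of the person
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    hsv_next = cv2.cvtColor(next_frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv_next], [0], hist, [0, 180], 1)

    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    _ret, new_roi = cv2.meanShift(back_proj, (x, y, w, h), criteria)
    # The movement direction can be taken as the vector from the old ROI centre to the new one.
    return new_roi
```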
In S3245, the control unit 3210 (guidance determination unit 3213d) calculates a calling priority based on each indicator.
Here, the calling priority is a degree calculated by weighting each of the direction in which the user P is looking, the movement direction, the position of the user P, and the movement speed. In this example, the system preferentially speaks to a user P who is interested in the space 3200S but is about to pass by, so the weighting is highest when the direction in which the user P is looking differs from the movement direction and the user P is facing the front (the direction of the camera 3205). The control unit 3210 further weights the priority so that, when this condition is satisfied, a slower movement speed gives the highest calling priority. In addition, the control unit 3210 treats a user P as a candidate for being spoken to only when the distance between the user P and the communication robot 3201 is within a specified range (for example, 5 m).
In S3246, the control unit 3210 (guidance determination unit 3213d) determines the user P to speak to based on the calling priority.
Even if the direction in which the user P is looking differs from the movement direction and the distance between the user P and the communication robot 3201 is within the specified range, the control unit 3210 may determine that there is no user P to speak to when the user P is not facing the front.
Thereafter, the control unit 3210 ends this processing and returns to the processing of FIG. 38.
Returning to FIG. 38, in S3213 the control unit 3210 (user determination processing unit 3213) determines whether a user P has been determined by the user determination processing. If a user P has been determined (S3213: YES), the control unit 3210 advances the processing to S3214. Otherwise (S3213: NO), the control unit 3210 ends this processing.
In S3214, the control unit 3210 (face data collation unit 3214) collates the face data acquired in S3232 of FIG. 39 with the face data in the face data storage unit 3224.
In S3215, the control unit 3210 (face data collation unit 3214) determines whether a match was found. If a match was found (S3215: YES), the control unit 3210 advances the processing to S3216; otherwise (S3215: NO), it advances the processing to S3216a. A match is found when the user P has been spoken to in the past.
In S3216, the control unit 3210 refers to the face data storage unit 3224 and the guidance data storage unit 3222 and extracts, from the guidance data storage unit 3222, guidance data that corresponds to the facial expression of the user P and differs from the guidance data already output.
In S3216a, on the other hand, the control unit 3210 refers to the guidance data storage unit 3222 and extracts guidance data corresponding to the facial expression of the user P.
The processing of S3217 and S3218 is the same as the processing of S3017 and S3018 in the third embodiment (FIG. 30).
In S3219, the control unit 3210 associates the face data of the user P with the output guidance data and stores them in the face data storage unit 3224.
In S3220, the control unit 3210 performs post-guidance processing. The post-guidance processing is the same as in the third embodiment (FIG. 33). Thereafter, the control unit 3210 ends this processing.
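A minimal sketch of the branch between S3216 and S3216a: guidance data is chosen by expression, and phrases already spoken to a recognized face are skipped. The table contents and the selection-by-first-match rule are hypothetical.

```python
# Sketch of S3216 / S3216a: choose guidance data by expression, skipping phrases already
# output to a recognized face. Table contents are hypothetical examples.
from typing import Dict, List, Optional

GUIDANCE_DATA: Dict[str, List[str]] = {
    "neutral":  ["Please drop in at the store.", "You passed by a moment ago, didn't you?"],
    "positive": ["Welcome back!", "Shall I show you today's highlights?"],
}
ALREADY_OUTPUT: Dict[str, List[str]] = {}   # face data storage 3224: face_id -> output phrases

def select_guidance(expression: str, face_id: Optional[str] = None) -> Optional[str]:
    candidates = GUIDANCE_DATA.get(expression, [])
    if face_id is not None:                        # S3216: known face, avoid repeating phrases
        used = ALREADY_OUTPUT.get(face_id, [])
        candidates = [p for p in candidates if p not in used]
    return candidates[0] if candidates else None   # S3216a is the face_id=None path

def record_output(face_id: str, phrase: str) -> None:
    """S3219: remember what was said to this face."""
    ALREADY_OUTPUT.setdefault(face_id, []).append(phrase)
```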
<Description of a specific example>
Next, a specific example using the guidance robot control system 3200 will be described.
FIGS. 41A and 41B are diagrams showing a specific example of the guidance robot control system 3200 according to the fourth embodiment.
The communication robot 3201 and the camera 3205 are installed near the entrance of the space 3200S. The image acquisition unit 3251 of the camera 3205 acquires frame images as needed, and the image transmission unit 3252 transmits the acquired frame images to the communication robot 3201 as needed.
As shown in FIG. 41A, the control unit 3210 selects, as the target to speak to, the user P4 who is walking in front of the space 3200S with his or her face turned toward the communication robot 3201 (YES in S3213 of FIG. 38).
Next, the control unit 3210 determines whether the acquired face data is stored in the face data storage unit 3224. If it determines that the data is not stored, it extracts from the guidance data storage unit 3222 (see FIG. 37) the calling guidance data corresponding to the facial expression (S3216a in FIG. 38). The communication robot 3201 then outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 says to the user P4, "Please drop in at the store."
If the user P4 then stops and looks at the communication robot 3201, the robot speaks further (S3220 in FIG. 38).
After a while, when the same user P4 is again walking in front of the space 3200S with his or her face turned toward the communication robot 3201 as shown in FIG. 41B, the control unit 3210 selects the user P4 as the target to speak to (YES in S3213 of FIG. 38).
Next, the control unit 3210 determines whether the acquired face data is stored in the face data storage unit 3224. If it determines that this is the user P4 who was guided earlier, it extracts from the guidance data storage unit 3222 (see FIG. 37) calling guidance data that corresponds to the facial expression but differs from the previous data (S3216 in FIG. 38). The communication robot 3201 then outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 says to the user P4, "You passed by a moment ago, didn't you?"
As described above, the guidance robot control system 3200 of the fourth embodiment provides the following effects.
(1) The communication robot 3201 decides whether to speak to a user P based on the relationship between the direction in which the user P is looking and the movement direction of the user P. For example, a user P who is looking toward the space 3200S while passing in front of it can be presumed to be interested in the space 3200S, and the robot can speak to that user P.
(2) Since the communication robot 3201 can acquire the direction in which the user P is looking from whether the face of the user P can be detected, the processing can be performed easily.
(3) Since the communication robot 3201 stores the face data of the user P it has guided, it can output guidance data different from the previous data when it next guides the same user P. Because the guidance can be changed by using the face recognition function, the robot can speak differently to the same user.
(4) Since the communication robot 3201 performs the processing using frame images acquired by the camera 3205, the camera 3205 remains fixed and the position from which the frame images are acquired does not change even if the communication robot 3201 moves. This makes it easy, for example, to process a plurality of frame images arranged in time series.
Although the embodiments of the present invention have been described above, the present invention is not limited to the embodiments described above. The effects described in the embodiments are merely a list of the most preferable effects resulting from the present invention, and the effects of the present invention are not limited to those described in the embodiments. The embodiments described above and the modifications described below may be combined as appropriate, but detailed description of such combinations is omitted.
(Modifications)
(1) In the third embodiment the communication robot includes the camera, and in the fourth embodiment the communication robot receives frame images from the camera, but the invention is not limited to this.
For example, in the third embodiment the communication robot may receive frame images from a camera. In that case, the position of the camera is not limited to the vicinity of the communication robot, and the camera may be installed, for example, on the ceiling of the space. Frame images may also be acquired from a plurality of cameras.
In the fourth embodiment, a communication robot equipped with a camera may be used. In that case, to make the image processing easier, it is desirable that the communication robot does not move and does not change the position from which the frame images are acquired.
(2) In the third and fourth embodiments, the communication robot performs the various processes, but the invention is not limited to this. A server communicably connected to the communication robot may be provided, and the server may perform the various processes. In that case, the communication robot need only include at least a speaker that outputs the guidance data transmitted from the server and a motor unit for operating the communication robot. The various processes may also be shared between the communication robot and the server.
(3) In the third and fourth embodiments, frame images acquired at appropriate timings are used, but the invention is not limited to this. For example, video data may be used. In that case, a plurality of frame images may be acquired from the video data and the image analysis processing may be performed for each frame image.
(4) In the third and fourth embodiments, the image analysis processing is performed on all users included in the acquired frame image, but the invention is not limited to this. One user may be selected from the acquired frame image, and the image of the selected user may be analyzed to determine whether that user should be spoken to.
(5) In the third and fourth embodiments, after the guidance data is output, the user's facial expressions before and after the output are compared and further guidance data is output, but the invention is not limited to this. For example, the guidance data may continue to be output until the user leaves the communication robot.
(6) In the third embodiment, the direction in which the user is looking is taken to be the direction of the face, and the direction change amount is increased when the user is moving his or her head and looking around restlessly, but the invention is not limited to this. For example, the direction in which the user is looking may be the direction of the line of sight. For example, the control unit can detect a face image in the frame image and extract the eyes from it, thereby acquiring the gaze direction based on the relative positions of the eyeball and the pupil. In that case, a user whose movement direction differs from the face orientation and whose gaze is wandering can be selected as a target to speak to.
(7) In the fourth embodiment, a web camera or a stereo camera is used, but the invention is not limited to this. For example, a camera having a sensor that acquires depth information, typified by Kinect (registered trademark), may be used. With Kinect, the processing of detecting the user's face from the frame image and calculating the user's movement direction can be performed more easily.
(8) In the fourth embodiment, no processing related to the user's attributes is performed, but as in the third embodiment, the user's attributes may be determined and guidance data corresponding to the determined attributes may be extracted.
(9) In the fourth embodiment, when the communication robot determines that the user is the same as a user it has spoken to before, it outputs guidance data different from the previous data, but the invention is not limited to this. For example, as shown in FIG. 42, different guidance data may be output according to the movement direction of the user.
FIGS. 42A and 42B are diagrams showing a specific example of a guidance robot control system 3300 according to a modification.
FIG. 42A shows the case where the user P5, who is walking toward the space 3300S with his or her face turned toward the communication robot 3301 (camera 3305), is selected as the target to speak to. In this case, the communication robot 3301 outputs the guidance data corresponding to the movement direction, "Welcome. May I show you around?", and speaks to the user P5.
Next, FIG. 42B shows the case where the user P6, who has come out of the space 3300S with his or her face turned toward the communication robot 3301, is selected as the target to speak to. In this case, the communication robot 3301 outputs the guidance data corresponding to the movement direction, "Thank you very much", and speaks to the user P6.
In this way, the communication robot 3301 extracts and outputs guidance data corresponding to the movement direction of the user P. The communication robot 3301 can therefore speak differently to the user P5 entering the space 3300S and to the user P6 coming out of the space 3300S, and can give different, appropriate utterances according to the situation of the user P.
DESCRIPTION OF SYMBOLS
10, 20 Information presentation apparatus
11, 31 Image input unit
12, 42 Area determination unit
12a Face detection unit
12b Facial organ detection unit
12c Clothing area identification unit
13, 43 Color information extraction unit
14, 44 Color language table
14a Combination color ID
14b Single color IDs of color schemes 1 to 3
14c Language (language information)
14d Related terms
15, 45 Storage unit
16, 46 Collation unit
17, 47 Utterance content determination unit
18, 32 Output unit
21a, 21b Communication line
39a, 49a Transmission unit
39b, 49b Reception unit
30 Terminal device
40 Server
50 Image
51 Face area
52 Eye area
53 Mouth area
54 Clothing area
55, 56, 57, 81, 91 Combination color
55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, 57c, 61, 62, 63, 81a, 81b, 81c, 91a, 91b, 91c Single color
60, 80, 90 Color image scale
100 Single color table
100a Single color ID
100b Color value
110 Dialogue rule
111 Area
112 Language information
113 Utterance content
120 Recommendation table
121 Area
122 Language information
123 Utterance content
124 Recommendation information
2001 Information processing system
2002 Robot
2003 Computer
2004 Camera
2005 Microphone
2006 Odor sensor
2007 Display
2020 Person identification unit
2021 Video acquisition unit
2022 Voice acquisition unit
2023 Odor acquisition unit
2024 Feature extraction unit
2025 Classification identification unit
2026 Processing unit
2040 Person feature
2050 Classification table
2060 Processing database
3001, 3201, 3301 Communication robot
3010, 3210 Control unit
3011 Image acquisition unit
3012, 3212 Image analysis unit
3012a Direction determination unit
3012b Distance calculation unit
3012c, 3212c Facial expression determination unit
3012d Attribute determination unit
3013, 3213 User determination processing unit
3013a Direction change amount calculation unit
3013b Movement speed calculation unit
3013c Movement direction calculation unit
3013d, 3213d Guidance determination unit
3016 Movement control unit
3017 Guidance unit
3018 Facial expression comparison unit
3020, 3220 Storage unit
3021a, 3221a Control program
3022, 3222 Guidance data storage unit
3031, 3205, 3305 Camera
3032 Speaker
3033 Motor unit
3100, 3200, 3300 Guidance robot control system
3211 Image reception unit
3212a Direction acquisition unit
3212e Face acquisition unit
3214 Face data collation unit
3224 Face data storage unit
3251 Image acquisition unit
3252 Image transmission unit
C1 Child
P, P1 to P6 User
3000S, 3200S, 3300S Space

Claims (13)

1. An information presentation apparatus comprising:
storage means for storing a color language table in which color information and language information are associated with each other;
image input means for acquiring an input image;
area determination means for determining one or more predetermined areas from the image;
color information extraction means for extracting the color information from the areas; and
collation means for collating the color language table based on the color information extracted by the color information extraction means and acquiring the language information.
2. The information presentation apparatus according to claim 1, further comprising utterance content determination means for determining utterance content based on the language information.
3. The information presentation apparatus according to claim 1 or claim 2, wherein the color information extraction means performs color reduction processing on each of the areas and extracts one or more pieces of the color information representing each area.
4. The information presentation apparatus according to any one of claims 1 to 3, wherein the language information is an impression word expressing an impression.
5. The information presentation apparatus according to any one of claims 1 to 4, wherein the storage means stores related words relating to the language information in association with the language information, the apparatus further comprising related word acquisition means for acquiring the related words relating to the language information from the storage means.
6. The information presentation apparatus according to claim 5, wherein the related word acquisition means acquires related words from the Web via a network.
7. The information presentation apparatus according to claim 2, wherein the storage means further stores dialogue rules in which the language information and the utterance content are associated with each other, and the utterance content determination means determines the utterance content based on the dialogue rules.
8. The information presentation apparatus according to claim 2 or claim 7, wherein the storage means further stores a recommendation table in which the areas and the language information are associated with recommendation information to be recommended to a user, and the utterance content determination means further determines the recommendation information based on the recommendation table.
9. An information presentation method in which a computer that stores a color language table in which color information and language information are associated with each other interacts with a user, the method comprising:
an image input step of acquiring an input image;
an area determination step of determining a predetermined area from the image;
a color information extraction step of extracting the color information from the area; and
a collation step of collating the color language table based on the color information extracted in the color information extraction step and acquiring the language information.
10. A program causing a computer to function as the information presentation apparatus according to claim 1.
11. An information processing apparatus comprising:
storage means for storing features for each classification of persons;
extraction means for extracting features of a person;
identification means for identifying the classification of the person based on the extracted features; and
processing means for performing processing according to the identified classification.
12. A guidance robot control system comprising:
a guidance data storage unit that stores guidance data for users;
image acquisition means for acquiring a plurality of frame images continuous in time series;
direction determination means for analyzing the plurality of frame images acquired by the image acquisition means and determining, for each frame image, the direction in which a user is looking;
direction change amount calculation means for calculating, based on the directions determined by the direction determination means, a direction change amount representing the magnitude of the change in the direction over time;
guidance determination means for determining whether to provide guidance based on the change in direction calculated by the direction change amount calculation means; and
guidance means for extracting guidance data from the guidance data storage unit and providing guidance by a robot when the guidance determination means determines to provide guidance.
13. A guidance robot control system comprising:
a guidance data storage unit that stores guidance data for users;
image acquisition means for acquiring a plurality of frame images continuous in time series;
user direction acquisition means for analyzing the frame images acquired by the image acquisition means to identify a user and acquiring the direction in which the user is looking;
movement direction calculation means for analyzing the frame images acquired by the image acquisition means and calculating the movement direction of the user;
guidance determination means for determining whether to provide guidance based on the viewing direction of the user acquired by the user direction acquisition means and the movement direction of the user calculated by the movement direction calculation means; and
guidance means for extracting guidance data from the guidance data storage unit and providing guidance by a robot when the guidance determination means determines to provide guidance.
PCT/JP2016/081751 2015-11-16 2016-10-26 Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system WO2017086108A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
JP2015-224262 2015-11-16
JP2015224262A JP6657827B2 (en) 2015-11-16 2015-11-16 Information presentation device, information presentation method, and program
JP2016-045573 2016-03-09
JP2016045573A JP6607092B2 (en) 2016-03-09 2016-03-09 Guide robot control system, program, and guide robot
JP2016046761A JP6610349B2 (en) 2016-03-10 2016-03-10 Guide robot control system, program, and guide robot
JP2016-046761 2016-03-10
JP2016-065426 2016-03-29
JP2016065426A JP2017182261A (en) 2016-03-29 2016-03-29 Information processing apparatus, information processing method, and program

Publications (1)

Publication Number Publication Date
WO2017086108A1 true WO2017086108A1 (en) 2017-05-26

Family

ID=58718847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/081751 WO2017086108A1 (en) 2015-11-16 2016-10-26 Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system

Country Status (1)

Country Link
WO (1) WO2017086108A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007293399A (en) * 2006-04-21 2007-11-08 Seiko Epson Corp Image exchange device, image exchange method, and image exchange program
JP2009301140A (en) * 2008-06-10 2009-12-24 Nippon Telegr & Teleph Corp <Ntt> Text segmentation device, method, program, and computer-readable recording medium
JP2011221606A (en) * 2010-04-05 2011-11-04 Sony Corp Information processing method and graphical user interface
JP2011238019A (en) * 2010-05-11 2011-11-24 Yahoo Japan Corp Category processing device and method
JP2012161851A (en) * 2011-02-03 2012-08-30 Advanced Telecommunication Research Institute International Robot system and space formation recognizing device used in the same

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018229868A1 (en) * 2017-06-13 2018-12-20 富士通株式会社 Robot, control device, robot control method and robot control program
JP2019074865A (en) * 2017-10-13 2019-05-16 ロボットスタート株式会社 Conversation collection device, conversation collection system, and conversation collection method
JP7055327B2 (en) 2017-10-13 2022-04-18 ロボットスタート株式会社 Conversation collection device, conversation collection system and conversation collection method
JP2020071852A (en) * 2018-11-02 2020-05-07 深和パテントサービス株式会社 Information processing device, information processing system, learning device, learned estimation model, and method for collecting data for learning
JP7450241B2 (en) 2018-11-02 2024-03-15 深和パテントサービス株式会社 Information processing devices, information processing systems, and learning devices
JP2022526772A (en) * 2019-08-28 2022-05-26 北京市商▲湯▼科技▲開▼▲發▼有限公司 Interactive methods, devices, devices, and storage media
JP7224488B2 (en) 2019-08-28 2023-02-17 北京市商▲湯▼科技▲開▼▲發▼有限公司 Interactive method, apparatus, device and storage medium
CN112947402A (en) * 2019-11-25 2021-06-11 丰田自动车株式会社 Control system, control method, and program

Similar Documents

Publication Publication Date Title
US10810409B2 (en) Identifying facial expressions in acquired digital images
WO2017086108A1 (en) Information presentation apparatus, information presentation method, program, information processing apparatus, and guide robot control system
Wu et al. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies
Bettadapura Face expression recognition and analysis: the state of the art
Littlewort et al. Dynamics of facial expression extracted automatically from video
Lin et al. Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition
Varghese et al. Overview on emotion recognition system
KR102167760B1 (en) Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model
Hassanat Visual speech recognition
KR102351008B1 (en) Apparatus and method for recognizing emotions
CN113658254B (en) Method and device for processing multi-modal data and robot
Paleari et al. Features for multimodal emotion recognition: An extensive study
WO2010133661A1 (en) Identifying facial expressions in acquired digital images
JP2017182261A (en) Information processing apparatus, information processing method, and program
JP6972526B2 (en) Content providing device, content providing method, and program
Kumar et al. Facial emotion recognition and detection using cnn
Adelhardt et al. Multimodal user state recognition in a modern dialogue system
JP7322374B2 (en) ROBOT CONTROL DEVICE, ROBOT, ROBOT CONTROL METHOD AND PROGRAM
Lee Virtual representation of facial avatar through weighted emotional recognition
Rivera et al. Development of an automatic expression recognition system based on facial action coding system
KR102630872B1 (en) Apparatus and method for learning facial expression recognition
JP7195545B2 (en) Equipment for providing products and services with interactive functions
Naidoo South African Sign Language recognition using feature vectors and hidden markov models
Namuduri Human-Machine Interface Using Facial Gesture Recognition
Koch et al. One-shot lip-based biometric authentication: extending behavioral features with authentication phrase information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16866110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16866110

Country of ref document: EP

Kind code of ref document: A1