WO2017086108A1 - Information presentation apparatus, information presentation method, program, information processing apparatus, and guidance robot control system - Google Patents

Information presentation apparatus, information presentation method, program, information processing apparatus, and guidance robot control system

Info

Publication number
WO2017086108A1
Authority
WO
WIPO (PCT)
Prior art keywords: unit, user, color, information, guidance
Prior art date
Application number
PCT/JP2016/081751
Other languages
English (en)
Japanese (ja)
Inventor
伊藤 直之
靖寿 松葉
松本 征二
Original Assignee
大日本印刷株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2015224262A external-priority patent/JP6657827B2/ja
Priority claimed from JP2016045573A external-priority patent/JP6607092B2/ja
Priority claimed from JP2016046761A external-priority patent/JP6610349B2/ja
Priority claimed from JP2016065426A external-priority patent/JP2017182261A/ja
Application filed by 大日本印刷株式会社 filed Critical 大日本印刷株式会社
Publication of WO2017086108A1 publication Critical patent/WO2017086108A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output

Definitions

  • The present invention relates to an information presentation apparatus that carries out a dialogue with a user (a human).
  • The present invention also relates to an information processing apparatus that performs processing according to a person's attribute (classification).
  • The present invention further relates to a guidance robot control system and the like.
  • Patent Document 1 discloses an information presentation device that receives voice input from a user, evaluates the user's response based on the manner of the voice input, and changes the content of its responses based on the evaluation result.
  • In that device, the content to be uttered is determined in consideration of the person's utterance content, changes in facial expression, and the like.
  • Patent Document 2 discloses an information processing apparatus that presents effective information by recognizing a person's age and gender and performing a presentation effect by voice or movement.
  • Patent Document 3 discloses an output control apparatus that controls the information to be output by referring to individual information determined based on image data of the individual. According to these prior arts, an effective automated response reflecting each person's information is realized.
  • A robot that provides a guidance service to visitors has also been disclosed (for example, Patent Document 4).
  • However, the technique of Patent Document 1 cannot realize a natural, human-like dialogue when initiating a dialogue with a user (especially a first-time user).
  • In Patent Document 2, a person's age and sex are determined only from an image.
  • Patent Document 3 likewise determines an attribute based only on image data captured by an imaging unit. Since the methods of Patent Documents 2 and 3 determine a person's attribute (classification) from a single type of data consisting only of images, the determination indexes are few, and in some cases the attribute (classification) of the person cannot be specified accurately. Furthermore, the targets of determination are limited to attributes (classifications) that can be discriminated from images alone.
  • Since the robot described in Patent Document 4 provides the guidance service according to a pattern related to the movement of a user such as a visitor, the guidance service may be provided forcibly, regardless of the user's intention.
  • The present invention has been made in view of the above problems, and an object of the present invention is to provide an information presentation apparatus and the like capable of a natural, human-like dialogue when initiating a dialogue with a user.
  • The present invention has also been made from such a viewpoint, and another object is to provide an information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the specified classification.
  • A further object of the present invention is to provide a guidance robot control system and the like that can call out to a user by voice in consideration of the user's intention.
  • A first invention for achieving the above object is an information presentation apparatus that interacts with a user, comprising: storage means for storing a color language table in which color information and language information are associated with each other; image input means for acquiring an input image; area determination means for determining one or more predetermined areas from the image; color information extraction means for extracting the color information from the areas; and collation means for collating the color language table based on the color information extracted by the color information extraction means and acquiring the language information. According to the first invention, a natural, human-like dialogue is possible when a dialogue with a user is initiated.
  • The first invention may further include utterance content determination means for determining utterance content based on the language information. This enables a dialogue based on the impression of the user obtained from the color information.
  • The color information extraction means in the first invention may perform color reduction processing on each of the areas and extract one or a plurality of pieces of color information representing the area. This makes it possible to extract the color information accurately.
  • The language information in the first invention may be an impression word expressing an impression.
  • In this case, accurate language expressing the user's impression can be obtained.
  • The first invention may further include related word acquisition means for acquiring related words related to the language information. This makes it possible to obtain a wide range of language expressing the user's impression.
  • The storage means in the first invention may further store a dialogue rule in which the area and the language information are associated with utterance content, and the utterance content determination means may determine the utterance content based on the dialogue rule.
  • The storage means in the first invention may further store a recommendation table in which the area and the language information are associated with recommendation information to be recommended to the user, and the utterance content determination means may further determine the recommendation information based on the recommendation table.
  • In this way, accurate recommendation information based on the user's impression can be presented.
  • A second invention is an information presentation method in which a computer storing a color language table, in which color information and language information are associated with each other, interacts with a user, the method comprising: an image input step of inputting the user's image; an area determination step of determining a predetermined area from the image; a color information extraction step of extracting the color information from the area; and a collation step of collating the color language table based on the color information extracted in the color information extraction step and acquiring the language information.
  • A third invention is a program that causes a computer to function as the information presentation apparatus according to claim 1.
  • The information presentation apparatus of the first invention can be obtained by installing the program of the third invention on a general-purpose computer.
  • Another aspect of the present invention is an information processing apparatus comprising: storage means for storing features for each classification of persons; extraction means for extracting features of a person; specifying means for specifying the classification of the person based on the extracted features; and processing means for performing processing according to the specified classification.
  • Thereby, an information processing apparatus capable of suitably specifying a person's classification (attributes of the person such as preference, gender, age, language, and nationality) and executing processing according to the specified classification is provided.
  • The extraction means may extract the features using a plurality of different types of data.
  • Thereby, the classification of a person can be suitably specified using a plurality of different types of data.
  • It is desirable that the apparatus further includes video acquisition means for acquiring video data including the person, that the extraction means extracts at least features of the person from the video data, and that the specifying means specifies the classification of the person based on at least those features. Accordingly, the classification of the person can be suitably specified based on at least the features of the person extracted from the video data.
  • the extraction means may be configured to extract features related to the movement of the person.
  • the classification of the person can be suitably specified based on the characteristics of the movement of the person extracted from the video data.
  • an information presenting means for presenting information to a person may be further provided, and the extracting means may be configured to extract features related to the person's reaction to the information. Thereby, based on the characteristic of the reaction of the person at the time of presenting information, the person classification can be suitably specified.
  • It is desirable that the apparatus further comprises voice acquisition means for acquiring voice data of the person, that the extraction means extracts at least features of the person from the voice data, and that the specifying means specifies the classification of the person based on at least those features. Accordingly, the classification of the person can be suitably specified based on at least the features of the person extracted from the voice data.
  • The extraction means may be configured to extract at least one of voice strength, voice pitch, frequency characteristics, voice quality, utterance tempo, harmonic characteristics, intonation, response sound characteristics, utterance language, and utterance content. Thereby, the features of a person are suitably extracted from the voice data.
  • It is desirable that the apparatus further comprises odor acquisition means for acquiring odor data of the person, that the extraction means extracts at least features of the person from the odor data, and that the specifying means specifies the classification of the person based on at least those features. Accordingly, the classification of the person can be suitably specified based on at least the features of the person extracted from the odor data.
  • the classification of the person can be suitably specified based on the extracted feature.
  • A fifth invention is a guidance robot control system comprising: a guidance data storage unit that stores guidance data for a user; image acquisition means for acquiring a plurality of frame images continuous in time series; direction determination means for analyzing the plurality of frame images acquired by the image acquisition means and determining, for each frame image, the direction the user is viewing; direction change amount calculation means for calculating a direction change amount representing the magnitude of the change in direction over time, based on the directions determined by the direction determination means; guidance determination means for determining whether or not to perform guidance based on the direction change amount calculated by the direction change amount calculation means; and guidance means for extracting guidance data from the guidance data storage unit and performing guidance by a robot when the guidance determination means determines that guidance is to be performed.
  • A sixth invention is the guidance robot control system of the fifth invention, further comprising distance calculation means for calculating the distance between the user and the robot based on the position information of the user in the frame images acquired by the image acquisition means, wherein the guidance determination means determines that guidance is to be performed when the distance calculated by the distance calculation means is within a specified range.
  • A seventh invention is the guidance robot control system of the fifth or sixth invention, further comprising movement speed calculation means for calculating the moving speed of the user based on the position information of the user in the frame images acquired by the image acquisition means, wherein the guidance determination means determines that guidance is to be performed when the moving speed calculated by the movement speed calculation means is within a specified range.
  • An eighth invention is the guidance robot control system of the seventh invention, wherein the robot is movable, the system further comprising: movement direction calculation means for calculating the moving direction of the user based on the position information of the user in the frame images acquired by the image acquisition means; and movement control means for moving the robot based on the moving direction of the user calculated by the movement direction calculation means and the moving speed calculated by the movement speed calculation means.
  • A ninth aspect of the present invention is the guidance robot control system according to any one of the fifth to seventh aspects, wherein the direction determination means acquires the direction the user is viewing based on the relative position between the user's eyeball and pupil.
  • A tenth aspect is a guidance robot control system further comprising face acquisition means for acquiring the user's face data, wherein the direction determination means detects facial organs based on the face data acquired by the face acquisition means, determines the direction of the face from their positional relationship, and thereby acquires the direction the user is viewing.
  • An eleventh aspect of the present invention is the guidance robot control system according to any one of the fifth to tenth aspects, further comprising facial expression determination means for analyzing at least one of the frame images acquired by the image acquisition means and determining the facial expression of the user, wherein the guidance data storage unit stores guidance data corresponding to facial expressions, and the guidance means refers to the guidance data storage unit, extracts guidance data based on the facial expression of the user determined by the facial expression determination means, and performs guidance.
  • A further aspect is the guidance robot control system of the eleventh aspect, further comprising facial expression comparison means for executing at least the processing by the image acquisition means and the processing by the facial expression determination means after guidance by the guidance data and comparing the facial expression of the user before guidance with the facial expression of the user after guidance, wherein the guidance means refers to the guidance data storage unit, extracts guidance data based on the comparison result by the facial expression comparison means, and performs guidance.
  • A further aspect is a guidance robot control system further comprising attribute determination means for analyzing at least one of the frame images acquired by the image acquisition means and determining an attribute of the user, wherein the guidance data storage unit stores guidance data corresponding to user attributes, and the guidance means refers to the guidance data storage unit, extracts guidance data corresponding to the attribute of the user determined by the attribute determination means, and performs guidance.
  • A fourteenth invention is a guidance robot control system comprising: a guidance data storage unit that stores guidance data for a user; image acquisition means for acquiring a plurality of time-series frame images; user direction acquisition means for analyzing the frame images acquired by the image acquisition means, specifying a user, and acquiring the direction the user is viewing; movement direction calculation means for analyzing the frame images acquired by the image acquisition means and calculating the moving direction of the user; guidance determination means for determining whether or not to perform guidance based on the viewing direction of the user acquired by the user direction acquisition means and the moving direction of the user calculated by the movement direction calculation means; and guidance means for extracting guidance data from the guidance data storage unit and performing guidance by a robot when the guidance determination means determines that guidance is to be performed.
  • a fifteenth aspect of the present invention is the guide robot control system according to the fourteenth aspect of the present invention, wherein the guide data storage unit stores guide data corresponding to a moving direction.
  • A sixteenth aspect of the invention is the guidance robot control system of the fourteenth or fifteenth aspect, further comprising distance calculation means for calculating the distance between the user and the robot, wherein the guidance determination means determines that guidance is to be performed when the distance calculated by the distance calculation means is within a specified range.
  • A seventeenth aspect of the invention is the guidance robot control system according to any one of the fourteenth to sixteenth aspects, further comprising movement speed calculation means for calculating the moving speed of the user, wherein the guidance determination means determines that guidance is to be performed when the moving speed calculated by the movement speed calculation means is within a specified range.
  • An eighteenth aspect is the guidance robot control system of the seventeenth aspect, wherein the robot is movable, the system further comprising movement control means for moving the robot based on the moving direction calculated by the movement direction calculation means and the moving speed calculated by the movement speed calculation means.
  • A nineteenth aspect of the present invention is the guidance robot control system according to any one of the fourteenth to eighteenth aspects, further comprising face acquisition means for acquiring the user's face data by analyzing the frame images acquired by the image acquisition means, wherein the user direction acquisition means acquires the direction the user is viewing based on the face data of the user acquired by the face acquisition means.
  • A twentieth aspect is the guidance robot control system in which the user direction acquisition means detects facial organs based on the face data acquired by the face acquisition means and determines the direction of the face from their positional relationship to acquire the direction the user is viewing, or acquires the direction the user is viewing based on the relative position between the user's eyeball and pupil.
  • A twenty-first aspect is a guidance robot control system comprising: a face data storage unit that stores the face data of the user acquired by the face acquisition means when guidance data is output by the guidance means; and face collation means for collating the face data of the user acquired by the face acquisition means with the face data stored in the face data storage unit, wherein the guidance means extracts guidance data different from the previous guidance data from the guidance data storage unit and performs guidance when the face collation means succeeds in collation with the face data stored in the face data storage unit.
  • A twenty-second aspect of the present invention is the guidance robot control system according to any one of the nineteenth to twenty-first aspects, further comprising facial expression determination means for determining the facial expression of the user based on the face data of the user acquired by the face acquisition means, wherein the guidance data storage unit stores guidance data corresponding to facial expressions, and the guidance means refers to the guidance data storage unit, extracts guidance data based on the facial expression of the user determined by the facial expression determination means, and performs guidance.
  • A further aspect is the guidance robot control system further comprising facial expression comparison means for executing at least the processing by the image acquisition means, the processing by the face acquisition means, and the processing by the facial expression determination means after guidance by the guidance data and comparing the facial expression of the user before guidance with the facial expression of the user after guidance, wherein the guidance means refers to the guidance data storage unit, extracts guidance data based on the comparison result by the facial expression comparison means, and performs guidance.
  • A further aspect is a guidance robot control system further comprising attribute determination means for analyzing at least one of the frame images acquired by the image acquisition means and determining an attribute of the user, wherein the guidance data storage unit stores guidance data corresponding to user attributes, and the guidance means refers to the guidance data storage unit, extracts guidance data corresponding to the attribute of the user determined by the attribute determination means, and performs guidance.
  • According to the present invention, an information presentation apparatus and the like capable of a natural, human-like dialogue when initiating a dialogue with a user are provided.
  • An information processing apparatus and the like that can appropriately specify a person's classification and execute processing according to the specified classification are also provided.
  • Further, a guidance robot control system and the like capable of calling out to a user in consideration of the user's intention are provided.
  • A diagram showing an example of a color image scale in which three-color combinations and languages are associated, according to the first embodiment. A diagram showing another example of such a color image scale according to the first embodiment.
  • A diagram showing a specific example of the color language table 14 of the first embodiment. A diagram showing a specific example of the single color table 100 of the first embodiment.
  • A diagram showing a specific example of the recommendation table 120 of the first embodiment. A diagram showing a specific example of a color image scale for forming the color language table 14 of the first embodiment.
  • A flowchart showing the operation of the information presentation apparatus 10 of the first embodiment.
  • A diagram showing an example of the system configuration of the information processing system 2001 of the second embodiment.
  • An internal configuration diagram of the robot 2002 of the second embodiment. An internal configuration diagram of the computer 2003 of the second embodiment. A functional configuration diagram of the robot 2002 and the information processing system 2001 of the second embodiment.
  • A diagram illustrating the first feature 2030A of the second embodiment. A diagram illustrating the second feature 2030B of the second embodiment.
  • A diagram showing the extracted features of a person (person features 2040) of the second embodiment. A diagram showing the classification table 2050 of the second embodiment.
  • FIG. 1 is a block diagram showing the configuration of an embodiment of the information presentation apparatus of the present invention.
  • The information presentation device 10 is a device that interacts with a human user, such as a robot or a computer.
  • The information presentation apparatus 10 includes: an image input unit 11 that inputs an image including the user's face; a region determination unit 12 that determines each region of the facial organs from the image; a color information extraction unit 13 that extracts color information from the regions; a storage unit 15 that stores a color language table 14 in which color information and language information are associated with each other; and a collation unit 16 that collates the color language table 14 based on the color information extracted by the color information extraction unit 13 and acquires language information.
  • The information presentation device 10 may further include an utterance content determination unit 17 that determines utterance content based on the language information, and an output unit 18 that outputs the utterance content determined by the utterance content determination unit 17.
  • the information presentation apparatus 10 may include a microphone that inputs a user's voice.
  • the image input unit 11 includes a camera, a scanner, a storage media drive, a communication control device, and the like.
  • The image input unit 11 photographs the target user, reads a printed matter on which the target user is printed, reads a storage medium in which an image of the target user is stored, or receives an image of the target user transmitted from another computer.
  • the input image is temporarily stored in a RAM (Random Access Memory) of the information presentation device 10.
  • the area determination unit 12, the color information extraction unit 13, the collation unit 16, and the utterance content determination unit 17 are configured by a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
  • The CPU calls a program stored in the storage unit 15, the ROM, a recording medium, or the like into a work memory area on the RAM, executes it, and drives and controls each device connected via the bus, thereby realizing the processing performed by the information presentation apparatus 10 described later.
  • the ROM is a non-volatile memory, and permanently stores a program such as a boot program and BIOS of the information presentation apparatus 10, data, and the like.
  • the RAM is a volatile memory, and temporarily holds a program, data, and the like loaded from a storage unit, a ROM, a recording medium, and the like, and includes a work area used by the CPU for performing various processes.
  • the storage unit 15 is an HDD (Hard Disk Drive), a flash memory, or the like, and stores a program executed by the CPU, data necessary for program execution, an OS (Operating System), and the like.
  • a control program corresponding to the OS and an application program for causing the CPU to execute processing described later are stored.
  • Each program code is read by the CPU as necessary, transferred to the RAM, read by the CPU, and executed as various means.
  • the output unit 18 includes a speaker, a display device, a printer, a communication control device, and the like.
  • The output unit 18 outputs voice representing the utterance content, displays or prints characters representing the utterance content, or transmits data representing the utterance content to another computer.
  • FIG. 2 is a block diagram showing a configuration of another embodiment of the information presentation apparatus of the present invention.
  • the information presentation device 20 illustrated in FIG. 2 includes a terminal device 30 used by a user, and a server 40 connected to the terminal device 30 via a network.
  • the terminal device 30 is a mobile terminal such as a smartphone or a tablet, a PC (Personal Computer), or the like, and includes an image input unit 31 that inputs an image including a user's face, and a transmission unit 39a that transmits data such as an image to the server 40. And comprising.
  • the terminal device 30 may include a receiving unit 39b that receives data indicating the utterance content from the server 40 and an output unit 32 that outputs the utterance content. Further, the terminal device 30 may include a microphone for inputting a user's voice.
  • The server 40 includes: a receiving unit 49b that receives data from the terminal device 30; an area determination unit 42 that determines each area of the facial organs from the image; a color information extraction unit 43 that extracts color information from the areas; a storage unit 45 that stores a color language table 44 in which color information and language information are associated with each other; and a collation unit 46 that collates the color language table 44 based on the color information extracted by the color information extraction unit 43 and acquires language information.
  • The server 40 may further include an utterance content determination unit 47 that determines utterance content based on the language information, and a transmission unit 49a that transmits the utterance content determined by the utterance content determination unit 47 to the terminal device 30.
  • The image input unit 31, the output unit 32, the region determination unit 42, the color information extraction unit 43, the color language table 44, the storage unit 45, the collation unit 46, and the utterance content determination unit 47 have the same functions as the image input unit 11, the output unit 18, the region determination unit 12, the color information extraction unit 13, the color language table 14, the storage unit 15, the collation unit 16, and the utterance content determination unit 17 shown in FIG. 1, respectively.
  • the transmission unit 39a transmits data to the server 40 via the communication line 21a.
  • the receiving unit 39b receives data from the server 40 via the communication line 21b.
  • the transmission unit 49a transmits data to the terminal device 30 via the communication line 21b.
  • the receiving unit 49b receives data from the terminal device 30 via the communication line 21a.
  • Since the information presentation apparatus 20 shown in FIG. 2 has the same configuration and operation as the information presentation apparatus 10 except for the transmission and reception of data, further description of the information presentation apparatus 20 is omitted.
  • FIG. 3 is a block diagram illustrating a configuration of the area determination unit 12.
  • The region determination unit 12 includes: a face detection unit 12a that detects the user's face from the image; a facial organ detection unit 12b that detects facial organs (eyes, mouth, nose, chin, and the like) from the face region detected by the face detection unit 12a; and a clothing region specifying unit 12c that specifies the region of the user's clothes from the image.
  • FIG. 4 is an explanatory diagram of face detection processing by the face detection unit 12a.
  • In FIG. 4, 50 denotes an image and 51 denotes a face area.
  • The face detection unit 12a detects a rectangular area as the face area 51 from the image 50 by using, for example, the Viola-Jones face detection algorithm. Specifically, the face detection unit 12a scans the image with a search window (for example, a determination area of 8 pixels x 8 pixels) and determines, for each area of the search window, whether or not that area is a face image by using a classifier constructed in advance by machine learning (AdaBoost) using Haar-like feature values.
  • The face detection algorithm is not limited to the Viola-Jones method, and ordinary techniques in this field can be used.
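  • As an illustration of the face detection described above, the following is a minimal sketch using OpenCV's bundled Haar cascade; the specific library, model file, and parameters are assumptions for illustration and are not specified by this disclosure.

```python
# Sketch of Viola-Jones face detection with a Haar-cascade classifier.
import cv2

def detect_face_regions(image_path: str):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # Each detection is a rectangular face area (x, y, width, height),
    # corresponding to the face area 51 in FIG. 4.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return list(faces)
```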
  • FIG. 5 is an explanatory diagram of face organ detection processing by the face organ detection unit 12b.
  • In FIG. 5, 51 denotes a face area, 52 an eye area, and 53 a mouth area.
  • the face organ detection unit 12b specifies the positions (x, y) of feature points of each face organ such as eyebrows, eyes, nose, mouth, and jaw.
  • a normal technique in this field can be used as an algorithm for specifying the position of the feature point of each facial organ.
  • the facial organ detection unit 12b specifies the region of each facial organ based on the position (x, y) of the feature point. For example, the facial organ detection unit 12 b specifies a rectangular area including all the eye feature points as the eye area 52.
  • the facial organ detection unit 12 b specifies a rectangular area including all mouth feature points as the mouth area 53. Note that the facial organ detection unit 12b may specify not only the eye region 52 and the mouth region 53 but also the nose region, the hair region, the cheek region, and the like.
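  • The following sketch illustrates how facial-organ regions might be obtained from detected feature points, assuming the dlib 68-point landmark model; the library, model file, and landmark indices are assumptions for illustration, since the disclosure only requires that the feature-point positions (x, y) of each facial organ be located by an ordinary technique.

```python
# Sketch of facial-organ region extraction from landmark feature points.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def organ_regions(gray_image):
    regions = {}
    for face in detector(gray_image):
        shape = predictor(gray_image, face)
        pts = np.array([(p.x, p.y) for p in shape.parts()])
        # Rectangles enclosing all feature points of each organ,
        # i.e. the eye area 52 and the mouth area 53 in FIG. 5.
        for name, idx in {"eyes": slice(36, 48), "mouth": slice(48, 68)}.items():
            xs, ys = pts[idx, 0], pts[idx, 1]
            regions[name] = (xs.min(), ys.min(), xs.max(), ys.max())
    return regions
```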
  • FIG. 6 is an explanatory diagram of the clothing area specifying process by the clothing area specifying unit 12c.
  • In FIG. 6, 50 denotes an image, 51 a face area, and 54 a clothes area.
  • The clothing region specifying unit 12c specifies the clothing region 54 based on the size and position (x, y) of the rectangular region of the face region 51 detected by the face detection unit 12a. For example, the clothing region specifying unit 12c sets the lower end of the rectangular region of the face region 51 as the upper end of the clothing region 54 and sets the lower end of the image 50 as the lower end of the clothing region 54.
  • The clothing region specifying unit 12c sets a value obtained by multiplying the horizontal width of the rectangular region of the face region 51 by a predetermined magnification as the horizontal width of the clothing region 54, and determines the left and right ends with the center position of the face region 51 as a reference.
  • Alternatively, the clothing region specifying unit 12c may set the boundaries between the background and the clothing in the left-right direction as the left end and the right end of the clothing region 54.
  • the algorithm for specifying the clothing region 54 is not limited to this, and ordinary techniques in this field can be used.
  • the clothing region specifying unit 12c may specify not only a jacket, but also a hat, glasses, pants, a bag, and the like.
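  • A minimal sketch of the clothing-region heuristic described above follows; the magnification factor is an assumption for illustration, since the disclosure only states that a predetermined magnification is used.

```python
# Sketch of estimating the clothing area 54 from the face rectangle 51.
def clothing_region(face, image_width, image_height, magnification=3.0):
    x, y, w, h = face                       # face area 51 (x, y, width, height)
    top = y + h                             # lower end of the face = upper end of the clothes
    bottom = image_height                   # lower end of the image 50
    half_width = (w * magnification) / 2.0  # clothes width = face width x magnification
    center_x = x + w / 2.0                  # centered on the face area 51
    left = max(0, int(center_x - half_width))
    right = min(image_width, int(center_x + half_width))
    return left, top, right, bottom         # clothing area 54
```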
  • FIG. 7 is an explanatory diagram of color information extraction processing by the color information extraction unit 13.
  • In FIG. 7, 52 denotes an eye region, 53 a mouth region, 54 a clothing region, 55, 56, and 57 pieces of color information, and 55a, 55b, 55c, 56a, 56b, 56c, 57a, 57b, and 57c single colors constituting the color information.
  • the color information extraction unit 13 performs a color reduction process on each of the eye area 52, the mouth area 53, and the clothes area 54, and extracts one or a plurality of color information 55 to 57 representing each area. In the example shown in FIG. 7, the color information extraction unit 13 extracts three colors as the color information 55 to 57.
  • The color information extraction unit 13 specifies one color (a single color) or a plurality of colors (single colors) by extracting them from each area subjected to the color reduction processing; for example, five colors, or alternatively one or three colors, may be extracted from each area subjected to the color reduction processing.
  • The color reduction processing by the color information extraction unit 13 is performed by, for example, the k-means method. Specifically, when a region is reduced to k colors (when k representative colors of the region are extracted), k representative colors are first selected at random from the region, each pixel is assigned to the closest representative color so that all pixels are divided into k groups, the average color of the pixels in each group is then taken as a new representative color, and all pixels are divided into k groups again based on the new representative colors. By repeatedly replacing the representative colors with the group average colors and regrouping, the region is reduced to k colors.
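  • As a concrete illustration of this color reduction, the following sketch clusters the pixels of one region into k groups and takes the cluster centers as the k representative colors; the use of scikit-learn is an assumption for illustration.

```python
# Sketch of k-means color reduction of a region into k representative colors.
import numpy as np
from sklearn.cluster import KMeans

def representative_colors(region_bgr: np.ndarray, k: int = 3) -> np.ndarray:
    pixels = region_bgr.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Each cluster center is the average color of its group of pixels,
    # i.e. a representative color of the region (cf. color information 55-57).
    return km.cluster_centers_
```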
  • FIG. 8 is a diagram showing a specific example of a color image scale for forming the color language table 14.
  • In FIG. 8, 60 denotes a color image scale, and 61 to 63 denote single colors arranged on the color image scale 60.
  • On the color image scale 60, single colors and languages corresponding to images are arranged on the coordinate axes of warm (WARM)-cool (COOL) and soft (SOFT)-hard (HARD), which are the image determination criteria. For example, in the upper left (second quadrant) of the color image scale 60, the single colors 61, 62, 63, and others are arranged; the language corresponding to the single color 61 is "sweet", the language corresponding to the single color 62 is "pretty", and the language corresponding to the single color 63 is "congratulations". On the color image scale 60, the languages associated with single colors include many adjectives such as "sweet" and "cute"; this is because adjectives are most suitable as language for expressing the impression given by a color.
  • It is desirable that the single colors arranged on the color image scale 60 differ from one another in hue and in at least one of lightness and saturation. That is, it is desirable to construct the single colors of the color image scale 60 from colors whose hue is divided into, for example, red, yellow-red, yellow, yellow-green, green, blue-green, blue, purple-blue, purple, red-purple, and achromatic colors, and which differ in at least one of lightness and saturation for each hue. This is because the impression a color gives depends on the hue, lightness, and saturation that the color has. Note that the single colors and languages of the color image scale 60 shown in FIG. 8 are only representative; in practice, single colors and languages are arranged more finely.
  • FIGS. 9 and 10 are diagrams showing examples of color image scales in which three combination colors and languages are associated with each other.
  • In FIGS. 9 and 10, 80 and 90 denote color image scales, and 81 and 91 denote three-color combinations (color schemes) arranged on the color image scales 80 and 90.
  • On the color image scale 80 shown in FIG. 9, three-color combinations (color schemes) and languages corresponding to images are arranged on public-uplifting and orthodox-individual coordinate axes. For example, in the upper left (second quadrant) of the color image scale 80, the combination color 81 and others are arranged; the language corresponding to the combination color 81 is "settled", and the combination color 81 is a combination of the single colors 81a, 81b, and 81c.
  • the color image scale 80 is applied, for example, to the color and language of a car.
  • the color image scale 90 shown in FIG. 10 has three combination colors (color schemes) and languages corresponding to the image arranged on the formal-casual and static-motion coordinate axes.
  • For example, the combination color 91 and others are arranged on the scale; the language corresponding to the combination color 91 is "elegant", and the combination color 91 is a combination of the single colors 91a, 91b, and 91c.
  • This color image scale 90 is applied to, for example, the color and language of fashion (clothes).
  • FIG. 11 is a diagram showing a specific example of the color language table 14. As shown in FIG. 11, in the color language table 14, a combination color ID 14a, a single color ID 14b, a language 14c, and a related word 14d are associated with each other.
  • The combination color ID 14a is an ID for identifying a combination color arranged on a color image scale in which three-color combinations and languages are associated with each other (see FIGS. 9 and 10).
  • The single color IDs 14b consist of the single color IDs of color scheme 1, color scheme 2, and color scheme 3, which are the single colors constituting the combination color.
  • For example, the data whose combination color ID 14a is "A00100" corresponds to the combination color in which the single color ID 14b of color scheme 1 is "c3255", that of color scheme 2 is "c4529", and that of color scheme 3 is "c5043", and its language 14c is "innocent".
  • The related words 14d are languages related to the language 14c; for example, the related words of the language "innocent" include "innocence".
  • When the color language table 14 is formed from a color image scale in which single colors are associated with languages (see FIG. 8), the color language table 14 is formed by directly associating the single colors arranged on the color image scale with the languages. Note that related words can be stored in the storage means in advance in association with the languages, but they can also be acquired, for example, by crawling Web pages on the Internet.
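  • The following sketch shows one way the color language table 14 of FIG. 11 might be represented in memory; the concrete values are taken from the example above, and the data structure itself is an assumption for illustration.

```python
# Sketch of the color language table 14 (combination color ID 14a,
# single color IDs 14b, language 14c, related words 14d).
from dataclasses import dataclass, field

@dataclass
class ColorLanguageEntry:
    combination_color_id: str           # e.g. "A00100"
    single_color_ids: tuple             # color schemes 1 to 3, e.g. ("c3255", "c4529", "c5043")
    language: str                       # impression word, e.g. "innocent"
    related_words: list = field(default_factory=list)  # e.g. ["innocence"]

color_language_table = [
    ColorLanguageEntry("A00100", ("c3255", "c4529", "c5043"),
                       "innocent", ["innocence"]),
]
```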
  • FIG. 12 is a diagram showing a specific example of the single color table 100.
  • The single color table 100 indicates which single colors the color schemes 1 to 3 specified by the single color IDs 14b are composed of.
  • In FIG. 12, 100 denotes the single color table, 100a a single color ID, and 100b a color value.
  • The single color ID 100a is an ID for identifying a single color forming the color schemes 1 to 3 specified by the combination color ID 14a.
  • the color value 100b represents a single color specified by a single color ID as a coordinate value in a color space.
  • In FIG. 12, the color values are the x and y coordinates in the CIE xy chromaticity diagram of the XYZ (xyY) color system.
  • The color space representing the color value 100b is not limited to the XYZ (xyY) color system shown in FIG. 12; various color systems such as the RGB color system, the L*u*v* color system, and the L*a*b* color system may be used as appropriate.
  • The collation unit 16 compares the one color (single color) or plurality of colors (combination color) extracted by the color information extraction unit 13 with the single colors or combination colors of the color language table 14 stored in the storage unit 15, and specifies a single color or combination color of the color language table 14 whose color value difference (distance in the color space) from the extracted color is equal to or smaller than a certain threshold (in the case of a combination color, the differences between the color values of all single colors are equal to or smaller than a certain threshold) as being the same as or similar to the extracted one color (single color) or combination color.
  • For example, suppose the combination color extracted by the color information extraction unit 13 is the color information 55 consisting of the single colors 55a, 55b, and 55c shown in FIG. 7, and that the combination color whose combination color ID is N in the color language table 14 (combination color N) is composed of the single color NA (the single color ID of color scheme 1), the single color NB (the single color ID of color scheme 2), and the single color NC (the single color ID of color scheme 3).
  • To determine whether the combination color N is the same as or similar to the color information 55, the single colors of the color information 55 are compared with the single colors of the combination color N in the following six combinations of single colors.
  • When the color value of a single color A is (xA, yA) and the color value of a single color B is (xB, yB), the difference ΔC between the color values of the single color A and the single color B is the distance between the two points in the color space, for example the Euclidean distance ΔC = sqrt((xA - xB)^2 + (yA - yB)^2).
  • the collation unit 16 specifies a combination color that is the same as or close to the combination color extracted by the color information extraction unit 13 from the combination colors in the color language table 14.
  • Then, the collation unit 16 acquires the language (language information) associated with the combination color specified as being the same as or similar to the color information 55, together with its related words. For example, if the combination color 81 on the color image scale 80 (see FIG. 9) is specified as being the same as or similar to the color information 55, the collation unit 16 acquires the language "settled" associated with the combination color 81 and its related word "calm".
  • Note that the collation unit 16 may acquire a plurality of pieces of language information. Further, when one color (a single color) and a language are associated with each other in the color language table 14, the collation unit 16 compares the one color (single color) extracted by the color information extraction unit 13 with the single colors of the color language table 14, specifies a single color whose color value difference is equal to or smaller than a certain threshold, and acquires the language information associated with the specified single color. The collation unit 16 may acquire related words not only from the color language table 14 but also from the storage unit 15 or from a dictionary database stored in another computer.
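  • The collation step can be sketched as follows, with the color value difference taken as the Euclidean distance between xy chromaticity coordinates; the pairing of the single colors, the threshold value, and the single_color_values lookup (standing in for the single color table 100) are assumptions for illustration.

```python
# Sketch of collating extracted colors against the color language table 14.
import math

def delta_c(color_a, color_b):
    (xa, ya), (xb, yb) = color_a, color_b
    return math.hypot(xa - xb, ya - yb)   # distance in the xy chromaticity plane

def matches(extracted_colors, scheme_colors, threshold=0.05):
    # A combination color matches when the difference of every single color
    # is equal to or smaller than the threshold.
    return all(delta_c(e, s) <= threshold
               for e, s in zip(extracted_colors, scheme_colors))

def collate(extracted_colors, table, single_color_values, threshold=0.05):
    # single_color_values maps a single color ID to its (x, y) color value.
    for entry in table:
        scheme = [single_color_values[i] for i in entry.single_color_ids]
        if matches(extracted_colors, scheme, threshold):
            return entry.language, entry.related_words
    return None, []
```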
  • the utterance content determination unit 17 determines the utterance content based on language information and related words acquired by the collation unit 16 from various text templates stored in the storage unit 15. Specifically, the utterance content determination unit 17 determines the utterance content based on the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15.
  • FIG. 13 is a diagram showing a specific example of the dialogue rule 110.
  • In FIG. 13, 110 denotes the dialogue rule, 111 a region, 112 language information, and 113 utterance content.
  • In the dialogue rule 110, the utterance content 113 is stored in association with each pair of region 111 and language information 112.
  • For example, utterance content is stored in association with the region 111 "clothes" and the language information 112 "high-sense" (stylish).
  • The utterance content determination unit 17 may also determine the utterance content by inserting the language information into a sentence template. For example, when the language information is "elegant", the utterance content determination unit 17 may insert it into a template to produce utterance content such as "It's very elegant clothes!".
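  • A sketch of this template-based determination of utterance content follows; the template strings are illustrative assumptions.

```python
# Sketch of inserting language information into a sentence template.
def determine_utterance(language_info: str, region: str) -> str:
    templates = {
        "clothes": "It's very {word} clothes!",
        "eyes":    "Your eyes give a {word} impression.",
        "mouth":   "That lip color looks {word}.",
    }
    template = templates.get(region, "You give a {word} impression.")
    return template.format(word=language_info)

# e.g. determine_utterance("elegant", "clothes") -> "It's very elegant clothes!"
```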
  • FIG. 14 is a diagram showing a specific example of the recommendation table 120.
  • In FIG. 14, 120 denotes the recommendation table, 121 a region, 122 language information, 123 utterance content, and 124 recommendation information.
  • the recommendation table 120 stores utterance contents 123 and recommendation information 124 in association with each other for each area 121 and language information 122.
  • For example, when the region 121 is "mouth" and the language information 122 is "chic", the utterance content 123 "If you like a calm color, how about such a lipstick?" and the recommendation information 124 "lipstick A" are associated with them.
  • the recommendation information 124 may be associated with not only a character string but also an image, a moving image, a web page, and the like. For example, not only the character string “lipstick A” but also the product image, CM video, product introduction web page, and the like of the lipstick may be associated.
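  • The recommendation table 120 lookup can be sketched as follows, using the example entry of FIG. 14; the in-memory representation is an assumption for illustration.

```python
# Sketch of the recommendation table 120: (region 121, language information 122)
# keys both the utterance content 123 and the recommendation information 124.
recommendation_table = {
    ("mouth", "chic"): {
        "utterance": "If you like a calm color, how about such a lipstick?",
        "recommendation": "lipstick A",  # may also carry images, videos, web pages
    },
}

def recommend(region: str, language_info: str):
    return recommendation_table.get((region, language_info))
```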
  • FIG. 15 is a flowchart showing the operation of the information presentation apparatus 10.
  • the image input unit 11 inputs an image including the user's face (step S1).
  • the image input unit 11 is a camera, for example, and inputs an image including a user's face photographed by the camera.
  • the region determination unit 12 determines each region of the facial organ from the image input in step S1 (step S2). Specifically, the face detection unit 12a detects the face of the user from the image, and the face organ detection unit 12b detects the face organs (eyes, mouth, nose, chin, etc.) from the face area detected by the face detection unit 12a. Then, the clothes area specifying unit 12c specifies the area of the user's clothes from the image.
  • the color information extraction unit 13 extracts color information for each area specified in step S2 (step S3). Specifically, the color information extraction unit 13 performs color reduction processing on each of the facial organ area such as the eye area and the mouth area and the clothing area, and extracts one or more color information representing each area.
  • the collation unit 16 collates the color language table 14 based on the color information extracted in step S3, and acquires language information and related words (step S4). Specifically, the collation unit 16 compares one color (single color) or a plurality of colors (combination colors) extracted in step S3 with one color (single color) or a combination color of the color language table 14. In contrast, one color of the color language table 14 in which a color value difference is equal to or smaller than a certain threshold value (in the case of a combination color, a color value difference between all single colors is equal to or smaller than a certain threshold value) exists. (Single color) or combination color is specified. Then, the collation unit 16 acquires language information associated with the combination color specified as the combination color that is the same as or close to the combination color from the color language table 14.
  • The utterance content determination unit 17 determines the utterance content based on the language information acquired in step S4 (step S5). Specifically, the utterance content determination unit 17 determines the utterance content and the recommendation information based on the dialogue rule 110 (see FIG. 13) and the recommendation table 120 (see FIG. 14) stored in the storage unit 15. When there are a plurality of search results from the dialogue rule 110 and the recommendation table 120, the utterance content determination unit 17 selects one at random or according to a predetermined priority order. As the priority order, for example, ranking by region is conceivable. Since eyes give a strong impression even between humans, the language information associated with the color information of the eye region may be given the highest priority. In addition, when the information to be recommended by the information presentation device 10 is related to a specific region, for example when the information to be recommended is a lipstick, it is possible to give the highest priority to the language information associated with the color information of the mouth region.
  • the output unit 18 outputs the utterance content and recommendation information determined in step S5 (step S6). Specifically, the output unit 18 outputs a sound indicating the utterance content from a speaker and displays characters indicating the utterance content and recommendation information on the display device.
  • When the information presentation apparatus 10 includes a microphone for inputting the user's voice, the user's voice may be input and the process may be repeated from step S1 (continuing the dialogue) in consideration of the user's voice.
  • As described above, the information presentation apparatus 10 (20) enables a natural, human-like dialogue when initiating a dialogue with a user.
  • That is, the information presentation apparatus 10 (20) extracts color information for each facial organ and for the clothes, and determines the utterance content based on the language information associated with that color information. Therefore, even with a first-time user, it is possible to start a conversation based on the first impression received from the other party, just as humans meeting for the first time do.
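  • The overall flow of FIG. 15 (steps S1 to S6) can be condensed into the following sketch, which chains the helper functions sketched above; steps S4 to S6 are indicated only as comments, and all names and values are assumptions for illustration.

```python
# Sketch of the overall operation of the information presentation apparatus 10.
import cv2

def present_information(image_path: str):
    image = cv2.imread(image_path)                       # S1: image input
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    height, width = gray.shape
    faces = detect_face_regions(image_path)              # S2: face detection
    regions = {}
    if len(faces) > 0:
        regions.update(organ_regions(gray))              # S2: eye / mouth areas
        regions["clothes"] = clothing_region(faces[0], width, height)
    utterances = []
    for name, (x0, y0, x1, y1) in regions.items():
        patch = image[int(y0):int(y1), int(x0):int(x1)]
        colors = representative_colors(patch)            # S3: color reduction
        # S4-S6: convert the representative colors to xy color values, collate
        # the color language table 14, determine the utterance content and the
        # recommendation information, and output them by voice or on a display.
    return utterances
```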
  • FIG. 16 is a diagram illustrating an example of a system configuration of the information processing system 2001.
  • The information processing system 2001 includes a robot 2002, which is an embodiment of the information processing apparatus according to the present invention, as well as a computer 2003, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007.
  • a robot 2002, a camera 2004, a microphone 2005, an odor sensor 2006, and a display 2007 are communicatively connected to a computer 2003, and controlled by the computer 2003 so that the operations of the respective devices cooperate.
  • the form of communication connection between the robot 2002, the camera 2004, the microphone 2005, the odor sensor 2006, the display 2007, and the computer 2003 may be wired or wireless.
  • the camera 2004 acquires surrounding video data.
  • the camera 2004 is, for example, a single camera, a stereo camera, a distance sensor, or a composite sensor that combines these.
  • video data can be acquired using a camera 2210 built in the robot 2002 instead of the camera 2004. In this case, the camera 2004 may be omitted.
  • the microphone 2005 acquires surrounding audio data.
  • the microphone 2005 is, for example, a microphone array including a single microphone or a plurality of microphones.
  • Voice data can also be acquired using the microphone 2211 built into the robot 2002 instead of the microphone 2005. In this case, the microphone 2005 may be omitted.
  • the odor sensor 2006 acquires ambient odor data.
  • the odor sensor 2006 is, for example, a sensor composed of a single crystal resonator type or semiconductor type sensor, or a sensor array composed of a plurality of sensors.
  • the odor data can be acquired using the odor sensor 2212 incorporated in the robot 2002. In this case, the odor sensor 2006 may be omitted.
  • The robot 2002 extracts features of a person using a plurality of different types of data, namely the video data acquired from the camera 2004 (or the camera 2210), the voice data acquired from the microphone 2005 (or the microphone 2211), and the odor data acquired from the odor sensor 2006 (or the odor sensor 2212), and identifies the classification of the person (attributes of the person such as preference, gender, age, language, and nationality) based on the extracted features. The robot then performs processing according to the identified classification (processing such as dialogue, information guidance, and advertisement).
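  • One way such classification might be organized is sketched below: features extracted from the video, voice, and odor data are matched against the stored feature set of each classification, and the best-matching classification is selected. The table contents and the scoring rule are illustrative assumptions and do not reproduce the actual classification table 2050.

```python
# Sketch of selecting a person's classification from extracted features.
CLASSIFICATION_TABLE = {
    "child_ja": {"height": "low",  "voice_pitch": "high", "language": "ja"},
    "adult_en": {"height": "high", "voice_pitch": "low",  "language": "en"},
}

def identify_classification(person_features: dict) -> str:
    def score(stored: dict) -> int:
        # Count how many stored features agree with the extracted features.
        return sum(1 for key, value in stored.items()
                   if person_features.get(key) == value)
    return max(CLASSIFICATION_TABLE, key=lambda c: score(CLASSIFICATION_TABLE[c]))

# e.g. identify_classification({"height": "low", "voice_pitch": "high",
#                               "language": "ja"}) -> "child_ja"
```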
  • the display 2007 is controlled by the computer 2003 and displays an image or the like.
  • the display 2007 may be omitted when display control is performed using the display unit 2204 mounted on the robot 2002, or when the robot 2002 executes processing such as dialogue, information guidance, and advertisement only by voice.
  • The computer 2003 controls the acquisition of video data by the camera 2004, the acquisition of audio data by the microphone 2005, the acquisition of odor data by the odor sensor 2006, and the display on the display 2007, and controls the operation of the robot 2002.
  • The robot 2002 (information processing apparatus) is, for example, an interactive robot. As shown in FIG. 17, the robot 2002 includes a control unit 2201, a storage unit 2202, an input unit 2203, a display unit 2204, a media input / output unit 2205, a communication I/F 2206, a movable unit drive mechanism 2207, a peripheral device I/F unit 2208, an audio processing unit 2209, a camera 2210, a microphone 2211, an odor sensor 2212, a speaker 2213, and various sensors 2214, which are connected via a bus 2220.
  • the control unit 2201 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
  • the CPU calls a program stored in the storage unit 2202, ROM, recording medium, or the like to a work memory area on the RAM and executes it, and drives and controls each unit connected via the bus 2220.
  • the CPU of the control unit 2201 executes all processes by the robot 2002.
  • the ROM permanently stores programs such as a boot program for the robot 2002, a BIOS, data, and the like.
  • the RAM temporarily stores the loaded program and data, and includes a work area used by the control unit 2201 to perform various processes.
  • the storage unit 2202 stores a program executed by the control unit 2201, data necessary for program execution, an operating system, and the like. These program codes are read by the control unit 2201 as necessary, transferred to the RAM, and read and executed by the CPU.
  • the storage unit 2202 also stores a classification table 2050 (FIG. 24) that is referred to when identifying a person's classification, a processing database 2060 (FIG. 25) that is referred to when executing processing according to the classification, and the like.
  • in addition, the storage unit 2202 has an action DB (not shown) in which a control sequence table describing the control sequences of the robot 2002 is stored.
  • the control sequence table stores data in which all actions performed by the robot 2002 are associated with the movable part control sequence of the robot 2002.
  • the movable part control sequence is a list of the movable parts to be controlled and their drive values for realizing each action (for example, head tilt angle 20 degrees, right arm tilt angle 40 degrees, etc.).
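  • A minimal sketch of what such an action DB entry might look like, assuming hypothetical part names, drive values, and a placeholder drive function:

```python
# Each action name maps to a movable-part control sequence, i.e. an ordered
# list of (movable part, drive value) steps. Part names and angles are
# illustrative only.
ACTION_DB = {
    "bow": [
        {"part": "head",  "tilt_deg": 20},
        {"part": "waist", "tilt_deg": 15},
    ],
    "wave_hand": [
        {"part": "right_arm", "tilt_deg": 40},
        {"part": "right_arm", "tilt_deg": 10},   # swing back to make a wave
        {"part": "right_arm", "tilt_deg": 40},
    ],
}

def send_control_signal(part: str, value: float) -> None:
    # Placeholder for the movable part drive mechanism (2207); a real robot
    # would send a motor command here.
    print(f"drive {part} -> {value} deg")

def perform_action(action_name: str) -> None:
    # Read the control sequence from the action DB and drive each part in order,
    # as the control unit (2201) is described doing.
    for step in ACTION_DB[action_name]:
        send_control_signal(step["part"], step["tilt_deg"])

perform_action("bow")
```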
  • the input unit 2203 includes an input device such as a touch panel and various operation buttons, for example, and outputs input data to the control unit 2201.
  • the display unit 2204 includes, for example, a display device such as a liquid crystal panel and a logic circuit that executes display processing in cooperation with the display device, and displays on the display device the display information input under the control of the control unit 2201. Note that the touch panel of the input unit 2203 is configured integrally with the display of the display unit 2204.
  • the media input/output unit 2205 is a drive device for a recording medium (medium) such as a magnetic disk, an optical disk, or a semiconductor memory, and inputs and outputs data.
  • the communication I / F 2206 is an interface that mediates communication with the network 2008.
  • the network 2008 includes a LAN (Local Area Network), a WAN (Wide Area Network) connected to a wider area, or a public communication line such as the Internet, a base station, and the like.
  • the communication connection with the network 2008 may be wired or wireless.
  • the robot 2002 can access a Web server via the network 2008 and can transmit and receive various programs and data.
  • the movable part drive mechanism 2207 includes a mechanism and a drive device for driving a movable part provided on the head, neck, shoulder, arm, waist, leg, and the like of the robot 2002.
  • the operation of the movable unit driving mechanism 2207 is controlled by the control unit 2201.
  • the control unit 2201 reads the movable part control sequence from the action DB of the storage unit 2202, sends a control signal to each movable part drive mechanism 2207, and operates each drive part.
  • Peripheral device I / F (interface) unit 2208 is a port for connecting a peripheral device, and mediates transmission / reception of data with the peripheral device.
  • the connection form with the peripheral device may be wired or wireless.
  • the robot 2002 can be connected to a computer 2003 or the like via the peripheral device I / F unit 2208.
  • the speech processing unit 2209 includes a speech synthesis unit, a language discrimination unit, a speech recognition unit, and the like.
  • the speech synthesizer generates synthesized speech data corresponding to character information (text data) based on a command from the control unit 2201 and outputs synthesized speech from the speaker 2213.
  • the synthesized speech is generated based on the correspondence between character information and speech data stored in advance in the utterance database of the storage unit 2202.
  • the language discrimination unit executes speech language discrimination processing on a person's speech data using, for example, a language discrimination model that statistically models differences in phoneme types, word phoneme arrangement patterns, prosodic patterns, vocabulary, and the like, and thereby discriminates the language the person is speaking.
  • the speech recognition unit performs speech recognition processing on the speech data whose language has been discriminated, using a statistical model (acoustic model and language model) for speech recognition prepared for each language, and recognizes the utterance content.
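  • The two-stage flow (language discrimination first, then a recognizer prepared for that language) could be sketched as follows; the discrimination model and the per-language recognizers are placeholder stubs, not real speech models:

```python
def discriminate_language(speech_data: bytes) -> str:
    # Placeholder for the statistical language discrimination model
    # (phoneme types, phoneme arrangement patterns, prosody, vocabulary, ...).
    return "ja"

RECOGNIZERS = {
    # One statistical model (acoustic model + language model) per language.
    "ja": lambda audio: "こんにちは",
    "en": lambda audio: "hello",
}

def recognize_speech(speech_data: bytes):
    # Discriminate the language, then run the recognizer for that language.
    language = discriminate_language(speech_data)
    text = RECOGNIZERS[language](speech_data)
    return language, text

print(recognize_speech(b"...raw audio..."))  # -> ('ja', 'こんにちは')
```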
  • the camera 2210 is built into the robot 2002 and is, for example, a single camera such as a CCD (Charge-Coupled Device) camera, a stereo camera, a distance sensor, or a composite sensor combining these, and acquires surrounding video data.
  • the acquired video data is sent to the control unit 2201.
  • the microphone 2211 is built into the robot 2002 and is, for example, a single microphone or a microphone array including a plurality of microphones, and acquires ambient audio data.
  • the acquired audio data is sent to the control unit 2201.
  • the odor sensor 2212 is built into the robot 2002 and is, for example, a single crystal-oscillator-type or semiconductor-type sensor, or a sensor array including a plurality of such sensors, and acquires ambient odor data.
  • the acquired odor data is sent to the control unit 2201.
  • the speaker 2213 outputs the synthesized speech generated by the speech synthesis unit of the speech processing unit 2209.
  • the various sensors 2214 include sensors such as a 3D sensor, a touch sensor, a gyro sensor, a sonar sensor, a laser sensor, a bumper sensor, and an infrared sensor. Each sensor inputs the detected signal to the control unit 2201.
  • the bus 2220 is a path that mediates transmission / reception of control signals, data signals, and the like.
  • a computer 2003 is a control device that controls the overall operation of the information processing system 2001.
  • the computer 2003 controls the acquisition of video data by the camera 2004, audio data by the microphone 2005, and odor data by the odor sensor 2006, and controls the display of the display 2007. It also controls the operation of the robot 2002.
  • the computer 2003 is configured with a control unit 2301, a storage unit 2302, a media input/output unit 2303, a peripheral device I/F unit 2304, an input unit 2306, a display unit 2307, and a communication I/F 2308 connected via a bus 2309.
  • the control unit 2301 includes a CPU, a ROM, a RAM, and the like.
  • the CPU calls a program stored in the storage unit 2302, ROM, recording medium, or the like to a work memory area on the RAM and executes it, and drives and controls each unit connected via the bus 2309.
  • the CPU of the control unit 2301 performs shooting control of the camera 2004, voice acquisition control of the microphone 2005, odor acquisition control of the odor sensor 2006, and display control of the display 2007 in the processing executed by the robot 2002 (FIG. 26).
  • the ROM permanently stores a program such as a boot program for the computer 2003, a BIOS, data, and the like.
  • the RAM temporarily holds the loaded program and data, and includes a work area used by the control unit 2301 to perform various processes.
  • the storage unit 2302 stores a program executed by the control unit 2301, data necessary for program execution, an operating system, and the like. These program codes are read by the control unit 2301 as necessary, transferred to the RAM, and read and executed by the CPU.
  • the media input/output unit 2303 is a drive device for a recording medium (medium) such as a magnetic disk, an optical disk, or a semiconductor memory, and inputs and outputs data.
  • Peripheral device I / F (interface) unit 2304 is a port for connecting a peripheral device, and transmits / receives data to / from the peripheral device via peripheral device I / F unit 2304.
  • the connection form with the peripheral device may be wired or wireless.
  • a robot 2002, a camera 2004, a microphone 2005, an odor sensor 2006, a display 2007, and the like can be connected to a computer 2003 via a peripheral device I / F unit 2304.
  • the input unit 2306 is an input device such as a keyboard, a mouse, or a touch panel, and outputs input data to the control unit 2301.
  • the display unit 2307 includes a display device such as a liquid crystal panel and a logic circuit for executing display processing in cooperation with the display device, and displays display information input by the control of the control unit 2301 on the display device.
  • the touch panel of the input unit 2306 is configured integrally with the display of the display unit 2307.
  • the display unit 2307 may be an external display 2007.
  • the control unit 2301 sends display information to the display unit 2307 or the display 2007 to display it.
  • the communication I / F 2308 is an interface that mediates communication with the network 2008.
  • a computer 2003 can transmit and receive various programs and data from a Web server on the network 2008.
  • the bus 2309 is a path that mediates transmission / reception of control signals, data signals, and the like.
  • the robot 2002 includes the functions of a person specifying unit 2020, a video acquisition unit 2021, an audio acquisition unit 2022, an odor acquisition unit 2023, a feature extraction unit 2024, a classification specifying unit 2025, a processing unit 2026, and an action unit 2029. The processing of each function is executed by a program. This program may be stored in advance in the storage unit 2202 of the robot 2002 or the storage unit 2302 of the computer 2003, or may be downloaded and installed via the network 2008.
  • the person specifying unit 2020 specifies a target person for which the robot 2002 performs processing such as dialogue, information guidance, and advertisement.
  • the target person is preferably a person who is interested in the robot 2002 or a person who wants to interact with the robot 2002.
  • the target person is a person who talks to the robot 2002 or touches the robot 2002.
  • a person who talks to the robot 2002 is identified, for example, by detecting that the arrival direction of the person's utterance voice estimated by the microphone 2211 built into the robot 2002 or the external microphone 2005 coincides with the face direction of that person in the video data acquired by the camera 2210 built into the robot 2002 or the external camera 2004.
  • a person who has touched the robot 2002 is identified, for example, by detecting a person close to the robot 2002 from the video data acquired by the camera 2210 built into the robot 2002 or the external camera 2004, and by detecting the person's contact action with the touch sensor of the robot 2002.
  • the target person may be a person who is approaching the robot 2002 or a person who is simply reflected in the video data, in addition to a person who has spoken to the robot 2002 or a person who has touched the robot 2002.
  • the target person may be specified as one group including a plurality of persons.
  • a target person (one group) is specified by determining which persons belong to the same group based on the relationship between the positions and movements of the plurality of persons in the video data.
  • the video acquisition unit 2021 acquires video data including the person specified by the person specifying unit 2020 by the camera 2210 built in the robot 2002 or the external camera 2004.
  • the voice acquisition unit 2022 acquires the voice data of the person specified by the person specifying unit 2020 using the microphone 2211 built in the robot 2002 or the external microphone 2005.
  • when the microphone 2211 or the microphone 2005 is a microphone array whose directivity can be controlled, it is desirable to acquire the person's voice data with the directivity of the microphone 2211 or the microphone 2005 directed toward the person's face.
  • the odor acquisition unit 2023 acquires the odor data of the person specified by the person specifying unit 2020 by the odor sensor 2212 built in the robot 2002 or the external odor sensor 2006.
  • when the odor sensor 2212 or the odor sensor 2006 is a sensor array whose directivity can be controlled, it is desirable to acquire the person's odor data with the directivity of the odor sensor 2212 or the odor sensor 2006 directed toward the person.
  • the feature extraction unit 2024 extracts the features of the person specified by the person specifying unit 2020 using a plurality of different types of data. Specifically, it extracts the features of the person using the video data acquired by the video acquisition unit 2021, the audio data acquired by the audio acquisition unit 2022, and the odor data acquired by the odor acquisition unit 2023.
  • the feature of the person extracted based on the video data is referred to as the “first feature”, the feature extracted based on the audio data as the “second feature”, and the feature extracted based on the odor data as the “third feature”. Specific examples of the first, second, and third features are described below.
  • the first feature is a feature of the person extracted from the video data and includes, for example, “motion features” (motion 1, motion 2, ...), “appearance features” (appearance 1, appearance 2, ...), and “reaction features” (reaction 1, reaction 2, ...), as shown in FIG. 20.
  • “motion features” are features related to the movement of the person, for example, features of how the person walks (walking speed, stride, arm swing, posture, etc.) obtained by applying motion analysis to the video data. When the target is a group, they are the average walking features of the persons belonging to the group and features of the positional relationship between the persons (walking in a single row, in two rows, in a column, etc.).
  • “appearance features” are features related to the appearance of the person, for example, the gender, age, eye color, hair color, and the like of the person obtained by applying face image analysis to the video data.
  • these can be extracted using a known method (Japanese Patent Application Laid-Open No. 2012-053813, etc.).
  • facial expressions and emotions may also be extracted as appearance features. Facial expressions are classified into several patterns such as joy, surprise, anger, sadness, and no expression, and can be read from the rising angle of the corners of the mouth, the positional relationship of the cheeks, eyes, and eyebrows, and the like. The emotion can be estimated from the read facial expression.
  • a method disclosed in Japanese Patent Laid-Open Nos. 10-255043 and 2014-206903 may be used.
  • the type of the person's clothes (suit, dress, folk costume, ...) obtained by applying a known clothes identification analysis to the video data may also be extracted as an appearance feature.
  • when the target is a group, the feature extraction unit 2024 extracts the average appearance features of the persons belonging to the group.
  • the “reaction feature” is a feature related to the reaction of the person to the presentation of information performed by the robot 2002 to the person.
  • the presentation of information includes, for example, displaying information such as weather information and news on the display unit 2204 or the display 2007, and also includes greeting the person.
  • for example, synthesized speech corresponding to a predetermined greeting (“Good morning”, “Hello”, “How are you?”, ...) is generated by the audio processing unit 2209 and output (read aloud) from the speaker 2213.
  • alternatively, the movable part control sequence of an action corresponding to a predetermined greeting action (“waving a hand”, “bowing”, ...) is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive part is operated.
  • the feature extraction unit 2024 performs motion analysis, facial expression analysis, and the like on the video data acquired by the video acquisition unit 2021 after the information is presented, and thereby extracts the person's reaction to the information presentation (“smiles”, “becomes serious”, “comes closer”, “turns away”, “no reaction”, etc.) as a feature.
  • when the target is a group, the feature extraction unit 2024 extracts the average reaction features of the persons belonging to the group.
  • the second feature is a feature of the person extracted from the voice data.
  • for example, as shown in FIG. 21, the second feature includes “voice strength”, “voice pitch”, “frequency characteristics”, “voice quality” (bright voice, dark voice, thick voice, faint voice, etc.), “speech tempo” (speaking slowly, speaking busily, etc.), “overtone characteristics” (characteristics related to the softness of the sound), “intonation”, and “reaction sound characteristics” (characteristics of utterance sounds that show some response to the other party's utterances, such as “um”, “hmm”, “eh”).
  • speech linguistic information such as “utterance language” and “utterance content” may be extracted as features.
  • the “utterance language” is extracted by the speech processing unit 2209 of the robot 2002 by analyzing the person's speech data using a language discrimination model that statistically models differences in phoneme types, word phoneme arrangement patterns, prosodic patterns, vocabulary, and the like. The “utterance content” is extracted by the speech processing unit 2209 of the robot 2002 applying speech recognition processing to the voice data input from the microphone 2211.
  • the third feature is a feature of the person extracted from the odor data and includes, for example, “odor type” (type 1, type 2, ...) and “odor intensity” (intensity 1, intensity 2, ...), as shown in FIG. 22.
  • the odor type is identified by analysis using an odor discrimination model for discriminating odor types, which is obtained by learning a large amount of odor data whose odor types are known in advance.
  • the odor intensity can be calculated from the absolute value of the odor data (input value) acquired by the odor sensor 2006 (or the odor sensor 2212).
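  • A toy sketch of this third-feature extraction, using a hypothetical nearest-centroid stand-in for the odor discrimination model and a mean absolute value for the intensity; the centroid values and type labels are invented for illustration:

```python
ODOR_CENTROIDS = {
    # Hypothetical learned centroids of sensor-array responses per odor type.
    "type 1": [0.1, 0.2, 0.1],
    "type 5": [0.8, 0.7, 0.9],
}

def classify_odor(reading: list[float]) -> str:
    # Nearest-centroid stand-in for the odor discrimination model.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(ODOR_CENTROIDS, key=lambda t: dist(reading, ODOR_CENTROIDS[t]))

def odor_intensity(reading: list[float]) -> float:
    # Intensity derived from the absolute value of the sensor input.
    return sum(abs(x) for x in reading) / len(reading)

reading = [0.9, 0.6, 0.8]
print(classify_odor(reading), odor_intensity(reading))  # -> type 5 0.766...
```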
  • the classification specifying unit 2025 specifies the person classification 2051 based on the above-described features (first feature, second feature, and third feature) extracted by the feature extraction unit 2024.
  • a classification 2051 is a grouping of persons by arbitrary attributes (preference, gender, age, language, nationality, etc.). As shown in the classification table 2050 in FIG. 24, each classification 2051 (classification 1, classification 2, ...) is defined by the first feature, the second feature, and the third feature.
  • the features that define each classification 2051 (classification 1, classification 2, ...) are determined by extracting the features (first feature, second feature, and third feature) from a large number of persons whose classification 2051 is known in advance and learning them statistically. This makes it possible to grasp statistically what kinds of features (first, second, and third features) persons belonging to each classification 2051 (classification 1, classification 2, ...) tend to have.
  • for example, it can be understood that a person belonging to “classification 1” statistically tends to have “motion 3”, “appearance 10”, “reaction 6”, and the like as the first feature, “(voice) strength 2”, “(voice) pitch 9”, “frequency characteristic 1”, and the like as the second feature, and “(odor) type 5” and “(odor) intensity 4” as the third feature.
  • likewise, a person belonging to “classification 2” statistically tends to have “motion 5”, “appearance 2”, “reaction 1”, and the like as the first feature, “(voice) strength 4”, “(voice) pitch 7”, “frequency characteristic 3”, and the like as the second feature, and “(odor) type 3” and “(odor) intensity 1” as the third feature.
  • a person belonging to “classification 3” statistically tends to have “motion 2”, “appearance 7”, “reaction 3”, and the like as the first feature, “(voice) strength 6”, “(voice) pitch 1”, “frequency characteristic 4”, and the like as the second feature, and “(odor) type 2” and “(odor) intensity 8” as the third feature.
  • the classification specifying unit 2025 refers to the classification table 2050 and specifies the person's classification based on the features extracted by the feature extraction unit 2024. For example, for all the features extracted by the feature extraction unit 2024, the classification 2051 (classification 1, classification 2, ...) to which each feature belongs is tallied, and the classification 2051 with the largest count is specified as the person's classification.
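  • This tallying rule can be sketched as a simple vote; the feature-to-classification mapping below is an illustrative inverse view of a FIG. 24-style table, not the actual table:

```python
from collections import Counter

# Which classification each feature value is characteristic of (illustrative).
FEATURE_TO_CLASSIFICATION = {
    "motion 3": "classification 1", "appearance 10": "classification 1",
    "strength 2": "classification 1", "type 5": "classification 1",
    "motion 5": "classification 2", "appearance 2": "classification 2",
    "strength 4": "classification 2", "type 3": "classification 2",
}

def identify_classification(extracted_features: list[str]) -> str:
    # Tally the classification each extracted feature belongs to and return
    # the classification with the largest count.
    tally = Counter(FEATURE_TO_CLASSIFICATION[f]
                    for f in extracted_features
                    if f in FEATURE_TO_CLASSIFICATION)
    classification, _count = tally.most_common(1)[0]
    return classification

features = ["motion 3", "appearance 2", "strength 2", "type 5"]
print(identify_classification(features))  # -> classification 1 (3 votes vs. 1)
```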
  • the processing unit 2026 performs processing based on the processing data 2061 of the classification 2051 specified by the classification specifying unit 2025.
  • “processing” includes processing in which the robot 2002 acts as the person's conversation partner, receiving and answering questions (dialogue, etc.), and processing in which the robot 2002 unilaterally provides information to the person (advertisement, etc.).
  • FIG. 25 is a diagram showing the contents of the processing database 2060 that holds the processing data 2061. As shown in the figure, the processing database 2060 holds processing data 2061 (processing data 1, processing data 2, ...) for each classification 2051 (classification 1, classification 2, ...).
  • the processing data 2061 is, for example, video data, audio data, text data, operation data, or any combination of these data.
  • when the processing data 2061 is video data, the video is displayed on the display unit 2204 or the display 2007 of the robot 2002; when it is audio data, the audio is output from the speaker 2213 of the robot 2002; when it is text data, the voice processing unit 2209 generates synthesized speech of the text data and outputs it from the speaker 2213 (or the text data is displayed on the display unit 2204 or the display 2007 of the robot 2002); and when it is operation data, the movable part control sequence corresponding to each action included in the operation data is read from the action DB, a control signal is sent to the movable part drive mechanism 2207, and each drive part is operated.
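  • A minimal sketch of such a dispatch over the processing data types; the output functions are placeholders for the robot's display, speaker, speech synthesizer, and movable part drive mechanism:

```python
def show_on_display(video):      print("display:", video)
def play_on_speaker(audio):      print("speaker:", audio)
def synthesize_and_speak(text):  print("synthesized speech:", text)
def perform_action(action):      print("movable parts:", action)

HANDLERS = {
    "video":  show_on_display,
    "audio":  play_on_speaker,
    "text":   synthesize_and_speak,
    "motion": perform_action,
}

def execute_processing_data(processing_data: list[dict]) -> None:
    # processing_data may be any combination of video / audio / text / motion items.
    for item in processing_data:
        HANDLERS[item["type"]](item["payload"])

execute_processing_data([
    {"type": "text",   "payload": "Welcome! May I help you?"},
    {"type": "motion", "payload": "bow"},
])
```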
  • the robot 2002 identifies a target person for processing such as dialogue, information guidance, and advertisement (step S2000).
  • next, in steps S2001 to S2003, the robot 2002 extracts the features of the person identified in step S2000 using a plurality of different types of data.
  • the robot 2002 acquires video data and extracts a first feature of a person (step S2001). For example, as described above, the “motion characteristics”, “appearance characteristics”, “reaction characteristics”, etc. of the person (see FIG. 20) are extracted from the video data.
  • for the reaction features, predetermined information is presented to the person before step S2001, and in step S2001 the person's reaction to that information presentation is extracted as a feature.
  • the robot 2002 acquires voice data and extracts the second feature of the person (step S2002). For example, as described above, “voice strength”, “voice pitch”, “frequency characteristics”, “voice quality”, “speech tempo”, “overtone characteristics”, “inflection”, “reaction sound”, “speech language” of a person from voice data “Speech contents” and the like (see FIG. 21) are extracted.
  • the robot 2002 acquires odor data and extracts a third feature of the person (step S2003). For example, as described above, a person's “scent type”, “scent intensity”, and the like (see FIG. 22) are extracted from the scent data.
  • next, the robot 2002 refers to the classification table 2050 and identifies the person's classification 2051 based on the features of the person (first feature, second feature, and third feature) extracted in steps S2001 to S2003 (step S2004). For example, for all the features extracted in steps S2001 to S2003, the classification 2051 (classification 1, classification 2, ...) to which each feature belongs is tallied, and the classification 2051 with the largest count is specified as the person's classification.
  • the robot 2002 refers to the processing database 2060, acquires the processing data 2061 of the classification 2051 specified in step S2004, and executes predetermined processing based on the processing data 2061 (step S2005). For example, when the specified category 2051 is “category 1”, the robot 2002 executes processing based on “processing data 1” (see FIG. 25) corresponding to “classification 1”.
  • although the information processing apparatus has been described in the form of a robot, the application form is not limited to a robot, and the apparatus can also be applied to vending machines, digital signage, and the like.
  • various functions and various data included in the information processing system 2001 are included in the control unit 2201 and the storage unit 2202 in the robot 2002.
  • in steps S2001 to S2003 in FIG. 26, it is not always possible to extract the features of a person from all types of data, that is, video data, audio data, and odor data.
  • in such a case, the person's classification may be specified based on, for example, the features of the person extracted from the video data and the odor data.
  • the person classification may be specified based on the characteristics of the person extracted from the video data and the audio data.
  • the classification table 2050 may also be relearned. Specifically, after step S2004 in FIG. 26, processing for confirming whether the classification 2051 specified for the person is correct is added. For example, assume that the classifications 2051 (classification 1, classification 2, ...) are grouped by “nationality” and the specified classification 2051 of the person is “France”. In this case, the robot 2002 asks the person a question that confirms the correctness of the specified classification 2051, such as “Where are you from?”. If the person's answer indicates a different nationality (for example, “Italy”), the robot 2002 judges that the specified classification 2051 (“France”) was an error, and relearns the classification table 2050 using the correct classification 2051 (“Italy”) and the features (first feature, second feature, and third feature) extracted in steps S2001 to S2003.
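  • A minimal sketch of this correction step, with a placeholder standing in for whatever statistical learning actually rebuilds the classification table:

```python
# Accumulated (features, correct classification) pairs used for relearning.
training_samples: list[tuple[list[str], str]] = []

def retrain_classification_table(samples):
    # Placeholder: a real system would re-estimate which features are
    # characteristic of each classification from the accumulated samples.
    print(f"relearning classification table from {len(samples)} samples")

def confirm_and_relearn(identified: str, answered: str, features: list[str]) -> None:
    # If the person's answer differs from the identified classification,
    # treat the identification as an error and learn from the correction.
    if answered != identified:
        training_samples.append((features, answered))
        retrain_classification_table(training_samples)

# The robot identified "France", but the person answered "Italy".
confirm_and_relearn("France", "Italy", ["motion 2", "strength 6", "type 2"])
```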
  • FIG. 27 is a diagram illustrating an exemplary scene suitable for operation of the guidance robot control system 3100 according to the third embodiment.
  • the guide robot control system 3100 is a system using a communication robot 3001 (guide robot).
  • the communication robot 3001 is disposed in a space 3000S such as in an exhibition hall or a store, for example. Note that the communication robot 3001 is desirably arranged at a position where the space 3000S can be looked over.
  • the communication robot 3001 is, for example, a humanoid robot having a height of about 1 m. The communication robot 3001 acquires and analyzes frame images including the users P in the space 3000S (users P1 to P3 in the example of FIG. 27), determines the user P to call out to, and calls out to (guides) that user. In the example described below, the communication robot 3001 estimates that a user P who is looking around restlessly is in trouble, and determines that the user P is the target of the call.
  • FIG. 28 is a functional block diagram of a communication robot 3001 according to the third embodiment.
  • FIG. 29 is a diagram illustrating an example of the guidance data storage unit 3022 of the communication robot 3001 according to the third embodiment.
  • the communication robot 3001 includes a control unit 3010, a storage unit 3020, a camera 3031 (imaging unit), a speaker 3032, and a motor unit 3033.
  • the control unit 3010 is a central processing unit (CPU) that controls the entire communication robot 3001.
  • the control unit 3010 executes various functions in cooperation with the hardware described above by appropriately reading and executing an operating system (OS) and application programs stored in the storage unit 3020.
  • the control unit 3010 includes an image acquisition unit 3011 (image acquisition unit), an image analysis unit 3012, a user determination processing unit 3013, a guidance data extraction unit 3015, a movement control unit 3016 (movement control unit), a guidance unit 3017 (guidance means), and a facial expression comparison unit 3018 (facial expression comparison means).
  • the image acquisition unit 3011 acquires a plurality of frame images via the camera 3031. The acquired plurality of frame images are continuous in time series.
  • the image analysis unit 3012 analyzes the frame image acquired by the image acquisition unit 3011.
  • the image analysis unit 3012 includes a direction determination unit 3012a (direction determination unit), a distance calculation unit 3012b (distance calculation unit), an expression determination unit 3012c (expression determination unit), and an attribute determination unit 3012d (attribute determination unit).
  • the direction determination unit 3012a determines the direction that the user P is viewing.
  • the distance calculation unit 3012b calculates the distance between the communication robot 3001 and the user P for each user P included in the frame image.
  • the facial expression determination unit 3012c determines the facial expression of the user P for each user P included in the frame image.
  • the attribute determination unit 3012d determines the attribute of the user P for each user P included in the frame image.
  • the attribute of the user P refers to, for example, age, sex, and the like.
  • the user determination processing unit 3013 determines a user P to call out based on the frame image analyzed by the image analysis unit 3012.
  • the user determination processing unit 3013 includes a direction change amount calculation unit 3013a (direction change amount calculation means), a movement speed calculation unit 3013b (movement speed calculation means), a movement direction calculation unit 3013c (movement direction calculation means), and a guidance determination unit 3013d (guidance determination means).
  • the direction change amount calculation unit 3013a calculates a direction change amount that is the magnitude of a change with time in the direction that the user P is viewing, as determined by the direction determination unit 3012a.
  • the movement speed calculation unit 3013b calculates the movement speed based on the change of the distance of the user P calculated by the distance calculation unit 3012b with time.
  • the movement direction calculation unit 3013c calculates the movement direction based on the change of the distance of the user P calculated by the distance calculation unit 3012b over time.
  • the guidance determination unit 3013d determines the user P to call out to based on the direction change amount, the position of the user P, the moving speed, the facial expression, and the like.
  • the guidance data extraction unit 3015 extracts guidance data for guiding the determined user P from the guidance data storage unit 3022.
  • the movement control unit 3016 moves the communication robot 3001 so as to approach the determined user P based on the movement direction calculated by the movement direction calculation unit 3013c.
  • the guide unit 3017 outputs the guide data extracted by the guide data extraction unit 3015 via the speaker 3032.
  • the expression comparison unit 3018 compares the facial expressions determined by the expression determination unit 3012c before and after guidance. Details of each process will be described later.
  • the storage unit 3020 is a storage area such as a semiconductor memory element for storing programs, data, and the like necessary for the control unit 3010 to execute various processes.
  • the storage unit 3020 includes a program storage unit 3021, a guidance data storage unit 3022, and an image data storage unit 3023.
  • the program storage unit 3021 is a storage area for storing a program.
  • the program storage unit 3021 stores a control program 3021a (program) for executing the various functions of the control unit 3010 described above.
  • the guidance data storage unit 3022 is a storage area for storing the guidance data uttered by the communication robot 3001. As shown in FIG. 29, the guidance data storage unit 3022 stores guidance data for calling out in association with the facial expression and attribute of the user P. It also stores the next guidance data in association with the facial expression of the user P after the call.
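  • A sketch of a FIG. 29-style guidance data store: the keys follow the description (expression and attribute for the first call, post-call expression for the next guidance), while the wording of the guidance lines is invented for illustration:

```python
GUIDANCE_DATA = {
    # (expression, attribute) -> guidance used when first calling out
    ("negative", "child"): "Are you lost? I can help you find your family.",
    ("negative", "adult"): "Is something wrong? May I help you?",
    ("positive", "adult"): "Hello! Would you like some information?",
}

NEXT_GUIDANCE = {
    # expression observed after the call-out -> follow-up guidance
    "negative": "Don't worry, I will call a staff member.",
    "positive": "Great! Please follow me.",
}

def first_guidance(expression: str, attribute: str) -> str:
    return GUIDANCE_DATA[(expression, attribute)]

def follow_up_guidance(expression_after: str) -> str:
    return NEXT_GUIDANCE[expression_after]

print(first_guidance("negative", "child"))
print(follow_up_guidance("negative"))
```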
  • the image data storage unit 3023 is a storage area that stores the frame image acquired by the image acquisition unit 3011.
  • the camera 3031 is a photographing device.
  • the camera 3031 is provided, for example, at a position corresponding to the eye portion of the communication robot 3001.
  • the camera 3031 shoots the subject in the direction that the communication robot 3001 faces.
  • the speaker 3032 is an audio output device.
  • the speaker 3032 is provided, for example, at a position corresponding to the mouth portion of the communication robot 3001.
  • the speaker 3032 outputs sound based on an instruction from the control unit 3010.
  • the motor unit 3033 is a motor for causing the communication robot 3001 to operate.
  • the motor unit 3033 includes a head motor 3033a and a foot motor 3033b.
  • the head motor 3033 a is a motor that moves the head of the communication robot 3001.
  • the head motor 3033a is controlled by the control unit 3010, and moves the head up, down, left, and right in order to photograph various users P in the space 3000S as subjects.
  • the foot motor 3033b is a motor for moving the communication robot 3001 on the foot of the communication robot 3001.
  • FIG. 30 is a flowchart showing robot control processing in the communication robot 3001 according to the third embodiment.
  • FIG. 31 is a flowchart showing image analysis processing in the communication robot 3001 according to the third embodiment.
  • FIG. 32 is a flowchart showing user determination processing in the communication robot 3001 according to the third embodiment.
  • FIG. 33 is a flowchart showing post-guidance processing in the communication robot 3001 according to the third embodiment.
  • in step S3010, the control unit 3010 (image acquisition unit 3011) of the communication robot 3001 performs image acquisition processing. In the image acquisition processing, the control unit 3010 constantly acquires frame images via the camera 3031 and determines whether or not one or more users P are included in the acquired frame images.
  • the control unit 3010 stores the frame image acquired at an appropriate timing (for example, every 0.1 second) in the image data storage unit 3023.
  • in step S3011, the control unit 3010 determines whether a processing condition has been met.
  • the processing condition refers to, for example, a case where a plurality of frame images (for example, 30 images) are stored in the image data storage unit 3023, a case where a predetermined time (for example, 3 seconds) has elapsed, or the like. If it is determined that the processing condition has been met (S3011: YES), the control unit 3010 moves the process to S3012. On the other hand, if it is determined that the processing condition is not satisfied (S3011: NO), the control unit 3010 moves the process to S3010.
  • in step S3012, the control unit 3010 (image analysis unit 3012) performs image analysis processing for analyzing the frame images stored in the image data storage unit 3023.
  • the control unit 3010 extracts one frame image from the image data storage unit 3023.
  • in step S3031, the control unit 3010 (direction determination unit 3012a) determines the direction in which each user P included in the frame image is looking. For example, when two users P1 and P2 are included in the frame image, the control unit 3010 determines the direction that the user P1 is viewing and the direction that the user P2 is viewing.
  • the control unit 3010 can detect a face image from the frame image and determine the orientation of the face based on the arrangement positions of the parts (organs) of the face. The control unit 3010 can therefore regard the determined face orientation as the direction in which the user P is looking.
  • a facial organ detection method using a regression model can be used.
  • a facial organ detection method using a regression model is described in, for example, US Patent Application Publication No. 2014/0185924.
  • in step S3032, the control unit 3010 (distance calculation unit 3012b) calculates the distance between each user P included in the frame image and the communication robot 3001. For example, the control unit 3010 can calculate a rough distance between the user P and the communication robot 3001 based on the size of the user P in the frame image, the background of the frame image, and the like.
  • in step S3033, the control unit 3010 (expression determination unit 3012c) determines the facial expression of each user P included in the frame image. The control unit 3010 can determine, for example, whether the expression is a smile (positive expression) or a sad face (negative expression) from the shapes of the eyes, nose, mouth, and the like.
  • in step S3034, the control unit 3010 determines whether all of the processing target frame images in the image data storage unit 3023 have been processed. When all the processing target frame images have been processed (S3034: YES), the control unit 3010 ends this processing and returns the processing to FIG. 30. On the other hand, when not all the processing target frame images have been processed (S3034: NO), the control unit 3010 moves the process to S3035. In step S3035, the control unit 3010 extracts one unprocessed frame image from the processing target frame images in the image data storage unit 3023. Thereafter, the control unit 3010 moves the process to S3031.
  • step S3013 the control unit 3010 (user determination processing unit 3013) performs user determination processing.
  • the control unit 3010 arranges the frame images subjected to the image analysis process in time series.
  • step S3041 the control unit 3010 (direction change amount calculation unit 3013a) calculates the direction change amount for each user P based on the plurality of rearranged frame images.
  • the direction change amount refers to the amount of change over time in the direction in which the user P is looking. Specifically, when the user P is looking around restlessly, the control unit 3010 calculates a large direction change amount.
  • the control unit 3010 calculates the movement speed for each user P based on the rearranged frame images.
  • the control unit 3010 can calculate the moving speed based on the change in the position of the user P among the plurality of frame images. Note that the control unit 3010 may simply determine whether or not the moving speed exceeds a specified range.
  • the control unit 3010 (movement direction calculation unit 3013c) calculates the movement direction for each user P based on the rearranged frame images.
  • the control unit 3010 can calculate the moving direction based on the change in the position of the user P in the plurality of frame images.
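  • The indices computed in S3041 to S3043 could be sketched as follows, assuming the image analysis has already produced a viewing-direction angle and a user position for each frame; the numbers are illustrative only:

```python
import math

def direction_change_amount(view_angles_deg: list[float]) -> float:
    # Sum of frame-to-frame changes in the viewing direction; this becomes
    # large when the user keeps looking around.
    return sum(abs(b - a) for a, b in zip(view_angles_deg, view_angles_deg[1:]))

def moving_speed_and_direction(positions, frame_interval_s: float):
    # Speed and direction from the change in position between the first and
    # last frame of the analysed sequence.
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    dx, dy = x1 - x0, y1 - y0
    elapsed = frame_interval_s * (len(positions) - 1)
    speed = math.hypot(dx, dy) / elapsed
    direction_deg = math.degrees(math.atan2(dy, dx))
    return speed, direction_deg

angles = [0, 30, -20, 25, -15]                    # viewing direction per frame
positions = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]  # user position per frame (m)
print(direction_change_amount(angles))             # -> 165
print(moving_speed_and_direction(positions, 0.1))  # -> (~1.1 m/s, ~27 degrees)
```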
  • in step S3044, the control unit 3010 (user determination processing unit 3013) determines the facial expression that appears most frequently for each user P based on the plurality of rearranged frame images.
  • step S3045 the control unit 3010 (guidance determination unit 3013d) calculates the calling priority based on each index.
  • the voice call priority is a degree calculated by weighting each of the direction change amount, the position of the user P, the moving speed, and the facial expression.
  • the degree of weighting is set highest for the direction change amount. The control unit 3010 performs the weighting so that the calling priority becomes highest when, for example, the direction change amount is large, and sets that user P as a calling target.
  • the control unit 3010 determines the user P to call out to on the basis of the calling priority. Note that when the direction change amount of the user P is small, the priority becomes a small value even if the moving speed is slow and the facial expression is negative. In such a case, that is, when the priority is less than a predetermined value, the control unit 3010 may determine that there is no user P to call out to. Thereafter, the control unit 3010 ends this processing and returns the processing to FIG. 30.
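  • A sketch of such a weighted calling priority with a cut-off threshold; the weights, normalization ranges, and threshold values are illustrative assumptions, with the direction change amount weighted highest as described:

```python
WEIGHTS = {"direction_change": 0.5, "distance": 0.2, "speed": 0.2, "expression": 0.1}
PRIORITY_THRESHOLD = 0.5  # below this, no user is called out to

def calling_priority(direction_change_deg: float, distance_m: float,
                     speed_m_s: float, expression: str) -> float:
    # Normalize each index to [0, 1] so that "looking around a lot", "nearby",
    # "moving slowly", and "negative expression" all raise the priority.
    direction_score = min(direction_change_deg / 180.0, 1.0)
    distance_score = max(0.0, 1.0 - distance_m / 10.0)
    speed_score = max(0.0, 1.0 - speed_m_s / 2.0)
    expression_score = 1.0 if expression == "negative" else 0.0
    return (WEIGHTS["direction_change"] * direction_score
            + WEIGHTS["distance"] * distance_score
            + WEIGHTS["speed"] * speed_score
            + WEIGHTS["expression"] * expression_score)

def choose_user(users: dict[str, dict]):
    scored = {name: calling_priority(**idx) for name, idx in users.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] >= PRIORITY_THRESHOLD else None

users = {
    "P1": {"direction_change_deg": 160, "distance_m": 3, "speed_m_s": 0.3, "expression": "negative"},
    "P2": {"direction_change_deg": 10,  "distance_m": 8, "speed_m_s": 1.5, "expression": "positive"},
}
print(choose_user(users))  # -> P1
```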
  • step S3014 the control unit 3010 (user determination processing unit 3013) determines whether or not the user P is determined by the user determination processing.
  • if a user P to call out to has been determined (S3014: YES), the control unit 3010 moves the process to S3015. On the other hand, if no such user P has been determined (S3014: NO), the control unit 3010 ends this process.
  • in step S3015, the control unit 3010 (attribute determination unit 3012d) determines the attribute of the determined user P. Specifically, the control unit 3010 determines an approximate age or the like from the image of the user P extracted from the frame image. The control unit 3010 may simply determine whether the user P is an adult or a child without determining the age; in that case, the control unit 3010 may estimate the height from the image of the user P and determine whether or not the user P is a child.
  • in step S3016, the control unit 3010 (guidance data extraction unit 3015) extracts guidance data from the guidance data storage unit 3022 (FIG. 29) based on the facial expression of the user P determined in S3044 of FIG. 32 and the attribute determined in S3015.
  • in step S3017, the control unit 3010 (movement control unit 3016) performs movement processing. As the movement processing, the control unit 3010 controls the motor unit 3033 (foot motor 3033b) based on the moving speed and moving direction of the user P calculated in S3042 and S3043 of FIG. 32, and moves the communication robot 3001 so as to approach the user P.
  • step S3018 the control unit 3010 (guide unit 3017) performs guidance processing. As the guidance process, the control unit 3010 outputs the guidance data extracted in S3016 via the speaker 3032 so as to speak (speak) to the user P after moving.
  • step S3019 the control unit 3010 performs post-guidance processing.
  • in step S3051, the control unit 3010 (image acquisition unit 3011) acquires a frame image of the user P, and the control unit 3010 (expression determination unit 3012c) determines the facial expression of the user P. In step S3052, the control unit 3010 (expression comparison unit 3018) performs facial expression comparison processing. As the facial expression comparison processing, the control unit 3010 compares the facial expression before the guidance data was output (the processing in S3044 of FIG. 32) with the facial expression acquired in S3051.
  • step S3053 the control unit 3010 (guidance data extraction unit 3015) extracts guidance data corresponding to the facial expression comparison result from the guidance data storage unit 3022 (FIG. 29).
  • in step S3054, the control unit 3010 (guidance unit 3017) outputs the guidance data extracted in step S3053 via the speaker 3032 as guidance processing. Thereafter, the control unit 3010 returns the processing to FIG. 30, and returning to FIG. 30, the control unit 3010 ends this processing.
  • FIGS. 34A to 34C are diagrams showing a specific example of the guidance robot control system 3100 according to the third embodiment.
  • for example, the control unit 3010 of the communication robot 3001 acquires a frame image of a child C1 who is lost and looking around, as shown in FIG. 34A (S3010 in FIG. 30), and determines that the child C1 is a calling target (S3014 in FIG. 30: YES).
  • the control unit 3010 extracts guidance data at the time of calling when the attribute is a child and the expression is negative from the guidance data storage unit 3022 (see FIG. 29) (S3016 in FIG. 30).
  • the control unit 3010 moves the communication robot 3001 toward the position of the child C1 (S3017 in FIG. 30) and outputs the extracted guidance data via the speaker 3032 (S3018 in FIG. 30).
  • in this way, the communication robot 3001 can guess the state of the child C1 based on the movement and facial expression of the child C1 and can call out in a way suited to the child C1's intention. Thereafter, the control unit 3010 acquires a frame image of the child C1, and if the facial expression of the child C1 remains negative as shown in FIG. 34C, the control unit 3010 extracts from the guidance data storage unit 3022 the guidance data corresponding to that facial expression and outputs it via the speaker 3032 (S3019 in FIG. 30).
  • (1) The amount of change in the direction in which the user P is looking is calculated from a plurality of frame images. Since the direction change amount becomes large when the user P is looking around restlessly, the communication robot 3001 determines that such a user P is in trouble and calls out. Therefore, whether or not the user P is in trouble is estimated based on the amount of change in the direction in which the user P is looking, and the communication robot 3001 can preferentially call out, in a way suited to the user's intention, to a user P who is determined to be in trouble.
  • (2) When the position of the user P and the position of the communication robot 3001 are within a specified range, the communication robot 3001 calls out to the user P. Therefore, the communication robot 3001 can preferentially talk to a user P nearby. (3) When the moving speed of the user P is within a specified range, the communication robot 3001 calls out to the user P. Therefore, when the user P is walking slowly or has stopped, the communication robot 3001 can preferentially talk to the user P. (4) Since the movement position of the user P is estimated and the communication robot 3001 approaches the user P, the communication robot 3001 can talk to the user P more naturally.
  • (5) The guidance data is selected based on the facial expression of the user P, and the communication robot 3001 speaks to the user P. Therefore, the communication robot 3001 can provide guidance with content that matches the facial expression of the user P. (6) The communication robot 3001 provides the next guidance based on the change in the facial expression of the user P before and after the call. Therefore, the communication robot 3001 can talk with content that communicates better.
  • (7) An attribute such as the age of the user P is determined from the frame image, guidance data corresponding to the attribute of the user P is selected, and the communication robot 3001 calls out to the user P. Therefore, the communication robot 3001 can provide guidance with content that matches the attribute of the user P.
  • (8) The communication robot 3001 can perform all processing from the acquisition of frame images to the output of guidance data by itself. Therefore, only the communication robot 3001 needs to be prepared, and installation is easy.
  • FIG. 35 is a diagram illustrating an example of a scene suitable for operation of the guidance robot control system 3200 according to the fourth embodiment.
  • FIG. 36 is a functional block diagram of a guidance robot control system 3200 according to the fourth embodiment.
  • FIG. 37 is a diagram illustrating an example of the guidance data storage unit 3222 of the communication robot 3201 according to the fourth embodiment.
  • the guidance robot control system 3200 places the communication robot 3201 at the entrance of a space 3200S such as an exhibition booth or a store entrance.
  • the guidance robot control system 3200 is a system used by the communication robot 3201 to call the user P (in this example, the user P4) to enter the space 3200S.
  • the guidance robot control system 3200 includes a communication robot 3201 and a camera 3205. Communication robot 3201 and camera 3205 are installed near the entrance of space 3200S.
  • the camera 3205 is a surveillance camera.
  • the camera 3205 acquires the frame image of the user P passing through the space 3200S, and transmits the frame image to the communication robot 3201.
  • the communication robot 3201 analyzes the frame image of the user P received from the camera 3205, determines the user P to be called, and provides guidance to the user P.
  • that is, the communication robot 3201 determines whether a user P passing in front of the camera 3205 is interested in the space 3200S, and calls out to that user P accordingly.
  • the communication robot 3201 includes a control unit 3210, a storage unit 3220, a speaker 3032, and a motor unit 3033.
  • the control unit 3210 includes an image reception unit 3211 (image acquisition unit), an image analysis unit 3212, a user determination processing unit 3213, a face data collation unit (face collation unit) 3214, a guidance data extraction unit 3215, a movement control unit 3016, a guidance unit 3017, and a facial expression comparison unit 3018.
  • the image reception unit 3211 acquires frame images by receiving the frame images transmitted by the camera 3205.
  • the image analysis unit 3212 analyzes the frame image received by the image reception unit 3211.
  • the image analysis unit 3212 includes a direction acquisition unit 3212a (user direction acquisition unit), a distance calculation unit 3012b, a facial expression determination unit 3212c, and a face acquisition unit 3212e (face acquisition unit).
  • the direction acquisition unit 3212a analyzes the frame image to identify the user P, determines the direction that the user P is viewing for each user P, and acquires the direction that the user P is viewing.
  • the facial expression determination unit 3212c determines the facial expression of the user P based on the face data of the user P acquired by the face acquisition unit 3212e.
  • the face acquisition unit 3212e acquires the face data of the user P for each user P included in the frame image.
  • the user determination processing unit 3213 determines the user P to call out to based on changes over time in the frame images analyzed by the image analysis unit 3212.
  • the user determination processing unit 3213 includes a movement speed calculation unit 3013b, a movement direction calculation unit 3013c, and a guidance determination unit 3213d.
  • the guidance determination unit 3213d determines the user P to call on, based on the viewing direction of the user P acquired by the direction acquisition unit 3212a, the movement direction calculated by the movement direction calculation unit 3013c, and the like.
  • the face data collation unit 3214 collates the face data stored in the face data storage unit 3224 and the face data acquired by the face acquisition unit 3212e.
  • the guidance data extraction unit 3215 extracts guidance data for guiding the user P determined by the user determination processing unit 3213 from the guidance data storage unit 3222. At that time, if the face data collation unit 3214 has been able to collate the face data, the guidance data extraction unit 3215 extracts guidance data different from the previous one.
  • the storage unit 3220 includes a program storage unit 3021, a guidance data storage unit 3222, an image data storage unit 3023, and a face data storage unit 3224.
  • the program storage unit 3021 stores a control program 3221a for executing the various functions of the control unit 3210 described above.
  • the guidance data storage unit 3222 stores guidance data for calling in association with facial expressions of the user P. Further, the next guidance data is stored in association with the facial expression of the user P after calling.
  • the face data storage unit 3224 is a storage area for storing face data. When the guide data is output, the face data storage unit 3224 stores the face data acquired by the face acquisition unit 3212e in association with the output guide data.
  • the camera 3205 is a live camera represented by a web camera, for example.
  • the camera 3205 may be a stereo camera.
  • a stereo camera refers to a camera that can capture information about the depth of an object by simultaneously capturing images from slightly shifted positions with a plurality of lenses and generating parallax.
  • the camera 3205 includes an image acquisition unit 3251 and an image transmission unit 3252.
  • the image acquisition unit 3251 is a control unit that acquires a frame image.
  • the image transmission unit 3252 is a control unit that transmits the acquired frame image via the communication network 3000N.
  • Communication robot 3201 and camera 3205 are communicably connected via communication network 3000N.
  • the communication network 3000N is, for example, an Internet line or a mobile terminal communication network.
  • the communication network 3000N may use, for example, short-range wireless communication based on the Bluetooth (registered trademark) standard.
  • FIG. 38 is a flowchart showing robot control processing in the communication robot 3201 according to the fourth embodiment.
  • FIG. 39 is a flowchart showing image analysis processing in the communication robot 3201 according to the fourth embodiment.
  • FIG. 40 is a flowchart showing user determination processing in the communication robot 3201 according to the fourth embodiment.
  • the control unit 3210 (image reception unit 3211) of the communication robot 3201 receives the frame image transmitted by the camera 3205 and stores the received frame image in the image data storage unit 3023. In step S3211, the control unit 3210 (image analysis unit 3212) performs image analysis processing for analyzing the frame image stored in the image data storage unit 3023.
  • image analysis processing will be described with reference to FIG.
  • the control unit 3210 detects the user P included in the frame image and acquires the direction in which the user P is viewing. For example, when two users P1 and P2 are included in the frame image, the control unit 3210 detects the users P1 and P2, and acquires the direction in which each user P is looking.
  • for example, a moving object detection method can be used. In the moving object detection method, a frame image in which no user P is captured is stored in advance as background image data, and the difference between the background image data and the frame image to be processed is obtained to detect the user P, who is a moving object.
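  • A minimal OpenCV sketch of this background-difference detection, assuming OpenCV 4.x (opencv-python) and using synthetic frames in place of real camera images:

```python
import cv2
import numpy as np

background = np.zeros((240, 320, 3), dtype=np.uint8)               # frame with no user
frame = background.copy()
cv2.rectangle(frame, (100, 60), (150, 200), (200, 200, 200), -1)   # stand-in for a user

def detect_moving_objects(background, frame, min_area=500):
    # Difference with the stored background, then threshold and find the
    # regions large enough to be treated as moving objects (users).
    diff = cv2.absdiff(background, frame)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

print(detect_moving_objects(background, frame))  # e.g. [(100, 60, 51, 141)]
```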
  • human detection using HOG features can also be used. Human detection using HOG features is described in N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
  • the control unit 3210 acquires, for example, the orientation of the face of the user P.
  • the face direction of the user P is assumed to be the front direction with respect to the camera 3205 when the face can be detected by the control unit 3210 (face acquisition unit 3212e) using the face detection method.
  • face detection using a Haar-like feature can be used.
  • face detection using Haar-like features is described in P. Viola and M. J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 511-518, 2001.
  • the control unit 3210 may detect a face image from the frame image, extract eyes from the face image, and acquire the direction in which the user P is looking based on the relative position between the eyeball and the pupil.
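  • A sketch of the face-detection-based front-direction check, using OpenCV's bundled Haar-cascade frontal-face detector; the frame here is synthetic, so no face is found, and the decision rule (face detected means facing the camera) follows the description above:

```python
import cv2
import numpy as np

# Frontal-face Haar cascade shipped with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def is_facing_camera(frame_bgr) -> bool:
    # If a frontal face is detected, treat the viewing direction as the
    # front (camera) direction.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

frame = np.zeros((240, 320, 3), dtype=np.uint8)
print(is_facing_camera(frame))  # -> False (no face in the synthetic frame)
```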
  • in step S3232, the control unit 3210 (face acquisition unit 3212e) acquires face data, and the control unit 3210 (expression determination unit 3212c) determines the facial expression based on the acquired face data. Thereafter, the control unit 3210 returns the processing to FIG. 38.
  • in step S3212, the control unit 3210 (user determination processing unit 3213) performs user determination processing.
  • the control unit 3210 arranges the frame images subjected to the image analysis process in time series.
  • in step S3241, the control unit 3210 (direction acquisition unit 3212a) acquires the viewing direction for each user P based on the plurality of rearranged frame images.
  • when the face of the user P is detected facing the camera 3205, the control unit 3210 acquires that front direction as the viewing direction; therefore, for example, when the user P is looking away from the camera 3205, no viewing direction is acquired.
  • the processing from S3242 to S3244 is the same as the processing from S3042 to S3044 of the third embodiment (FIG. 32).
  • human body tracking using the Mean-shift feature can be used.
  • Human body tracking using the Mean-shift feature is described in R. Collins, "Mean-Shift Blob Tracking through Scale Space", Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2003).
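For reference, the classic color-histogram mean-shift tracker (a simpler relative of the scale-space variant cited above) is available in OpenCV. In the sketch below, the video file name and the initial person box are hypothetical.

```python
import cv2

cap = cv2.VideoCapture("corridor.mp4")       # hypothetical video source
ok, frame = cap.read()

x, y, w, h = 200, 120, 80, 200               # initial person box from a detector
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Shift the window toward the mode of the back-projected histogram.
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    print("tracked window:", track_window)
```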
  • In step S3245, the control unit 3210 (guidance determination unit 3213d) calculates the calling priority based on each index.
  • The calling priority is a score calculated by weighting each of the direction in which the user P is looking, the moving direction, the position of the user P, and the moving speed.
  • A user P who is interested in the space 3200S but is about to pass by is preferentially spoken to, so the weighting differs between the direction in which the user P is looking and the moving direction; the highest weight is given when the user P is facing the front direction (the direction of the camera 3205). The control unit 3210 then performs the weighting so that the calling priority becomes highest when the above condition is satisfied and the moving speed is low.
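A toy sketch of such a weighted calling-priority score follows; the patent gives no concrete weights or formula, so every value and term here is illustrative only.

```python
def calling_priority(facing_front: bool,
                     looking_away_from_motion: bool,
                     distance_m: float,
                     speed_mps: float) -> float:
    """Illustrative weighted score: higher means "call out to this user first"."""
    score = 0.0
    score += 3.0 if facing_front else 0.0              # facing the camera weighs most
    score += 2.0 if looking_away_from_motion else 0.0  # looking at the space while passing
    score += 1.0 / (1.0 + distance_m)                  # nearer users score higher
    score += 1.0 / (1.0 + speed_mps)                   # slower users score higher
    return score

# A slow user facing the robot while walking past scores highest.
print(calling_priority(True, True, distance_m=2.0, speed_mps=0.5))    # about 6.0
print(calling_priority(False, False, distance_m=2.0, speed_mps=2.0))  # about 0.7
```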
  • The control unit 3210 sets such a user P as a calling target.
  • The control unit 3210 (guidance determination unit 3213d) determines the user P to call based on the calling priority.
  • The control unit 3210 may also determine that there is no user P to call out to. Thereafter, the control unit 3210 ends this processing and returns the processing to FIG. 38.
  • In step S3213, the control unit 3210 (user determination processing unit 3213) determines whether or not a user P to call has been determined by the user determination processing.
  • If a user P has been determined (S3213: YES), the control unit 3210 moves the process to S3214.
  • If not (S3213: NO), the control unit 3210 ends this process.
  • In step S3214, the control unit 3210 (face data collation unit 3214) collates the face data acquired in step S3232 of FIG. 39 with the face data in the face data storage unit 3224.
  • In step S3215, the control unit 3210 (face data collation unit 3214) determines whether or not the collation has succeeded.
  • If the collation has succeeded (S3215: YES), the control unit 3210 advances the process to S3216. On the other hand, if the collation has failed (S3215: NO), the control unit 3210 moves the process to S3216a. A successful collation means that the user P has been called out to in the past.
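The patent does not specify how face data are collated; a common approach is to compare a face feature vector against the stored vectors and accept the best match above a threshold, as in this hypothetical sketch (feature values and threshold are illustrative).

```python
from typing import Dict, Optional

import numpy as np

def match_face(query: np.ndarray,
               stored: Dict[str, np.ndarray],
               threshold: float = 0.8) -> Optional[str]:
    """Return the id of the most similar stored face, or None if no match."""
    best_id, best_sim = None, threshold
    for face_id, feat in stored.items():
        sim = float(np.dot(query, feat) /
                    (np.linalg.norm(query) * np.linalg.norm(feat)))
        if sim > best_sim:
            best_id, best_sim = face_id, sim
    return best_id

# Hypothetical 4-dimensional face features purely for illustration.
stored = {"P4": np.array([0.9, 0.1, 0.3, 0.2])}
print(match_face(np.array([0.88, 0.12, 0.31, 0.19]), stored))  # "P4" (S3215: YES)
print(match_face(np.array([0.1, 0.9, 0.1, 0.9]), stored))      # None (S3215: NO)
```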
  • In step S3216, the control unit 3210 refers to the face data storage unit 3224 and the guidance data storage unit 3222, and extracts, from the guidance data storage unit 3222, guidance data that corresponds to the facial expression of the user P and differs from the guidance data output previously.
  • In step S3216a, the control unit 3210 refers to the guidance data storage unit 3222 and extracts guidance data corresponding to the facial expression of the user P.
  • the processing of S3217 and S3218 is the same as the processing of S3017 and S3018 of the third embodiment (FIG. 30).
  • In step S3219, the control unit 3210 associates the face data of the user P with the output guidance data and stores the data in the face data storage unit 3224.
  • In step S3220, the control unit 3210 performs post-guidance processing.
  • the post-guidance process is the same as in the third embodiment (FIG. 33). Thereafter, the control unit 3210 ends this process.
  • FIGS. 41A and 41B are diagrams showing a specific example of the guidance robot control system 3200 according to the fourth embodiment.
  • Communication robot 3201 and camera 3205 are installed near the entrance of space 3200S.
  • the image acquisition unit 3251 of the camera 3205 acquires a frame image as needed, and the image transmission unit 3252 transmits the acquired frame image to the communication robot 3201 as needed.
  • As shown in FIG. 41A, the control unit 3210 targets the user P4, who is walking in front of the space 3200S with his or her face turned toward the communication robot 3201, as a calling target (YES in S3213 in FIG. 38).
  • Next, the control unit 3210 determines whether or not the acquired face data is stored in the face data storage unit 3224. If the control unit 3210 determines that the acquired face data is not stored, it extracts, from the guidance data storage unit 3222 (see FIG. 37), the calling guidance data corresponding to the facial expression (S3216a in FIG. 38). Then, the communication robot 3201 outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 calls out to the user P4, "Please drop in at the store." Then, when the user P4 stops and looks at the communication robot 3201, a further call is made (S3220 in FIG. 38).
  • After a while, when the same user P4 is again walking in front of the space 3200S with his or her face turned toward the communication robot 3201, the control unit 3210 targets the user P4 for calling, as shown in FIG. 41B (YES in S3213 in FIG. 38). Next, the control unit 3210 determines whether or not the acquired face data is stored in the face data storage unit 3224. If the control unit 3210 determines that the user P4 has been provided with guidance before, it extracts, from the guidance data storage unit 3222 (see FIG. 37), calling guidance data that corresponds to the facial expression and differs from the previous one (S3216 in FIG. 38). Then, the communication robot 3201 outputs the extracted guidance data via the speaker 3032 (S3218 in FIG. 38). In this example, the communication robot 3201 calls out to the user P4, "You have already passed?".
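The branch between first-time calling (S3216a) and repeat calling (S3216) can be pictured as follows; the guidance table and its phrases (taken from the example above) are illustrative only.

```python
from typing import Dict, List

# Hypothetical guidance data keyed by facial expression.
GUIDANCE: Dict[str, List[str]] = {
    "neutral": ["Please drop in at the store.", "You have already passed?"],
    "smile": ["Welcome!", "Nice to see you again."],
}

def pick_guidance(expression: str, already_output: List[str]) -> str:
    """First-time users get the first phrase (S3216a); users guided before get
    a phrase that differs from what was already output (S3216)."""
    phrases = GUIDANCE.get(expression, GUIDANCE["neutral"])
    for phrase in phrases:
        if phrase not in already_output:
            return phrase
    return phrases[-1]  # fall back if every phrase has been used

history: List[str] = []
first = pick_guidance("neutral", history)    # "Please drop in at the store."
history.append(first)
second = pick_guidance("neutral", history)   # "You have already passed?"
print(first, "/", second)
```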
  • According to the fourth embodiment, the guidance robot control system 3200 has the following effects.
  • (1) The communication robot 3201 decides whether or not to speak to the user P based on the relationship between the direction in which the user P is looking and the moving direction of the user P. Therefore, for example, the user P who is looking in the direction of the space 3200S while passing in front of the space 3200S can be presumed to be interested in the space 3200S, and the communication robot 3201 can speak to that user P.
  • (2) Since the communication robot 3201 acquires the direction in which the user P is looking based on whether or not the face of the user P has been detected, the processing can be performed easily.
  • (3) Since the communication robot 3201 stores the face data of the user P to whom guidance has been given, when guidance is next given to the same user P, guidance data different from the previous one can be output. Therefore, the guidance can be varied by using the face authentication function, and different calls can be made.
  • (4) Since the communication robot 3201 performs processing using the frame images acquired by the camera 3205, even if the communication robot 3201 moves, the camera 3205 remains fixed and the position from which the frame images are acquired does not change. For example, it is therefore easy to process a plurality of frame images arranged in time series.
  • In the embodiments described above, the communication robot either includes a camera or receives frame images from a separately installed camera, but the present invention is not limited to this.
  • A communication robot may receive frame images from an external camera; the position of such a camera is not limited to the vicinity of the communication robot, and it may be installed, for example, on the ceiling of the space. Frame images may also be acquired from a plurality of cameras.
  • Conversely, a communication robot that includes a camera may be used. In that case, in order to perform the image processing easily, it is desirable that the communication robot does not move, so that the acquisition position of the frame images does not change.
  • In the embodiments described above, the example in which the communication robot performs the various processes has been described, but the present invention is not limited to this.
  • a server that is communicably connected to the communication robot may be provided, and the server may perform various processes.
  • the communication robot only needs to include at least a speaker that outputs guidance data transmitted from the server and a motor unit for operating the communication robot.
  • Various processes may be performed by both the communication robot and the server.
  • In the embodiments described above, the example using frame images acquired at appropriate timings has been described, but the present invention is not limited to this.
  • moving image data may be used.
  • a plurality of frame images may be acquired from the moving image data, and image analysis processing may be performed for each frame image.
  • In the embodiments described above, the example in which the image analysis process is performed on all users included in the acquired frame image has been described.
  • the present invention is not limited to this.
  • One user may be selected from the acquired frame images, and an image of the selected user may be analyzed to determine whether or not to make a call target.
  • In the embodiments described above, the direction in which the user is looking is taken to be the face direction, and the case has been described in which the user moves his or her head and the amount of direction change becomes larger the more the user looks around.
  • the present invention is not limited to this.
  • the direction that the user is viewing may be the direction of the line of sight.
  • The control unit can detect a face image from the frame image and extract the eyes from the face image, thereby acquiring the line-of-sight direction based on the relative position between the eyeball and the pupil. In that case, the user can be made a calling target when the direction of the user's line of sight differs from the moving direction, even if the face direction does not.
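One simple way to approximate this is to detect the eye region, locate the pupil as the darkest blob, and compare its horizontal position with the center of the eye box. The OpenCV sketch below does this; the threshold values and file name are illustrative only.

```python
import cv2
import numpy as np

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def gaze_offsets(face_bgr: np.ndarray):
    """Return a horizontal pupil offset in [-1, 1] for each detected eye
    (negative roughly means looking left, positive looking right)."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    offsets = []
    for (x, y, w, h) in eye_cascade.detectMultiScale(gray, 1.1, 5):
        eye = gray[y:y + h, x:x + w]
        _, dark = cv2.threshold(eye, 50, 255, cv2.THRESH_BINARY_INV)  # pupil is dark
        m = cv2.moments(dark)
        if m["m00"] > 0:
            pupil_x = m["m10"] / m["m00"]
            offsets.append((pupil_x - w / 2) / (w / 2))
    return offsets

face = cv2.imread("face.png")  # hypothetical cropped face image
print(gaze_offsets(face))
```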
  • The camera is not limited to an ordinary camera; a camera having a sensor that acquires depth information, typified by Kinect (registered trademark), may also be used.
  • In that case, the process of detecting the user's face from the frame image and calculating the moving direction of the user can be performed more easily.
  • In the fourth embodiment, processing related to user attributes is not performed, but the user attribute may be determined, and guidance data corresponding to the determined user attribute may be extracted.
  • FIGS. 42A and 42B are diagrams showing a specific example of a guidance robot control system 3300 according to a modified embodiment.
  • FIG. 42A shows a case where the user P5 who is walking toward the space 3300S and whose face is facing the communication robot 3301 (camera 3305) is a voice call target.
  • In this case, the communication robot 3301 outputs the guidance data corresponding to the moving direction, "Welcome. Would you like me to guide you?", and speaks to the user P5.
  • FIG. 42B shows a case in which the user P6 who comes out of the space 3300S and whose face is facing the communication robot 3301 is targeted for calling.
  • the communication robot 3301 outputs “thank you”, which is guidance data corresponding to the moving direction, and speaks to the user P6.
  • According to this modified embodiment, the communication robot 3301 extracts and outputs guidance data corresponding to the moving direction of the user P. Therefore, the communication robot 3301 can call out differently to the user P5 entering the space 3300S and to the user P6 coming out of the space 3300S, making appropriate calls that match the situation of the user P.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an information presentation apparatus and the like capable of carrying on a conversation naturally, just as with a human, when starting a conversation with a user. An information presentation apparatus (10) comprises: an image input unit (11) to which an image including a user's face is input; an area determination unit (12) that determines areas of facial organs from the image; a color information extraction unit (13) that extracts color information from the areas; a storage unit (15) that stores a color-language table (14) in which color information and language information are associated; and a collation unit (16) that obtains the language information by collating, against the color-language table (14), the color information extracted by the color information extraction unit (13).
PCT/JP2016/081751 2015-11-16 2016-10-26 Appareil de présentation d'informations, procédé de présentation d'informations, programme, appareil de traitement d'informations, et système de commande de robot de guidage WO2017086108A1 (fr)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
JP2015-224262 2015-11-16
JP2015224262A JP6657827B2 (ja) 2015-11-16 2015-11-16 情報提示装置、情報提示方法、およびプログラム
JP2016-045573 2016-03-09
JP2016045573A JP6607092B2 (ja) 2016-03-09 2016-03-09 案内ロボット制御システム、プログラム及び案内ロボット
JP2016046761A JP6610349B2 (ja) 2016-03-10 2016-03-10 案内ロボット制御システム、プログラム及び案内ロボット
JP2016-046761 2016-03-10
JP2016065426A JP2017182261A (ja) 2016-03-29 2016-03-29 情報処理装置、情報処理方法、およびプログラム
JP2016-065426 2016-03-29

Publications (1)

Publication Number Publication Date
WO2017086108A1 true WO2017086108A1 (fr) 2017-05-26

Family

ID=58718847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/081751 WO2017086108A1 (fr) 2015-11-16 2016-10-26 Appareil de présentation d'informations, procédé de présentation d'informations, programme, appareil de traitement d'informations, et système de commande de robot de guidage

Country Status (1)

Country Link
WO (1) WO2017086108A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007293399A (ja) * 2006-04-21 2007-11-08 Seiko Epson Corp 画像交換装置および画像交換方法ならびに画像交換処理プログラム
JP2009301140A (ja) * 2008-06-10 2009-12-24 Nippon Telegr & Teleph Corp <Ntt> テキストセグメンテーション装置及び方法及びプログラム及びコンピュータ読取可能な記録媒体
JP2011221606A (ja) * 2010-04-05 2011-11-04 Sony Corp 情報処理方法及びグラフィカルユーザインタフェース
JP2011238019A (ja) * 2010-05-11 2011-11-24 Yahoo Japan Corp カテゴリ処理装置及び方法
JP2012161851A (ja) * 2011-02-03 2012-08-30 Advanced Telecommunication Research Institute International ロボットシステムおよびそれに用いる空間陣形認識装置

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018229868A1 (fr) * 2017-06-13 2018-12-20 富士通株式会社 Robot, dispositif de commande, procédé et programme de commande de robot
JP2019074865A (ja) * 2017-10-13 2019-05-16 ロボットスタート株式会社 会話収集装置、会話収集システム及び会話収集方法
JP7055327B2 (ja) 2017-10-13 2022-04-18 ロボットスタート株式会社 会話収集装置、会話収集システム及び会話収集方法
JP2020071852A (ja) * 2018-11-02 2020-05-07 深和パテントサービス株式会社 情報処理装置、情報処理システム、学習装置、学習済の推定モデル、および学習用データの収集方法
JP7450241B2 (ja) 2018-11-02 2024-03-15 深和パテントサービス株式会社 情報処理装置、情報処理システム、および学習装置
JP2022526772A (ja) * 2019-08-28 2022-05-26 北京市商▲湯▼科技▲開▼▲發▼有限公司 インタラクティブ方法、装置、デバイス、及び記憶媒体
JP7224488B2 (ja) 2019-08-28 2023-02-17 北京市商▲湯▼科技▲開▼▲發▼有限公司 インタラクティブ方法、装置、デバイス、及び記憶媒体
CN112947402A (zh) * 2019-11-25 2021-06-11 丰田自动车株式会社 控制系统、控制方法及程序
CN112947402B (zh) * 2019-11-25 2024-05-17 丰田自动车株式会社 控制系统、控制方法及程序

Similar Documents

Publication Publication Date Title
US10810409B2 (en) Identifying facial expressions in acquired digital images
WO2017086108A1 (fr) Appareil de présentation d'informations, procédé de présentation d'informations, programme, appareil de traitement d'informations, et système de commande de robot de guidage
Wu et al. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies
Busso et al. Analysis of emotion recognition using facial expressions, speech and multimodal information
Bettadapura Face expression recognition and analysis: the state of the art
Littlewort et al. Dynamics of facial expression extracted automatically from video
Lin et al. Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition
Varghese et al. Overview on emotion recognition system
KR102167760B1 (ko) 수어동작 인식 처리절차 및 움직임 추적 Pre-trained 모델을 이용한 수어동작 분석 알고리즘 시스템
Hassanat Visual speech recognition
KR102351008B1 (ko) 감정 인식 장치 및 감정 인식 방법
CN113658254B (zh) 一种处理多模态数据的方法及装置、机器人
Paleari et al. Features for multimodal emotion recognition: An extensive study
WO2010133661A1 (fr) Identification d'expressions faciales dans des images numériques acquises
JP2017182261A (ja) 情報処理装置、情報処理方法、およびプログラム
Kumar et al. Facial emotion recognition and detection using cnn
JP6972526B2 (ja) コンテンツ提供装置、コンテンツ提供方法、及びプログラム
Adelhardt et al. Multimodal user state recognition in a modern dialogue system
JP7322374B2 (ja) ロボットの制御装置、ロボット、ロボットの制御方法およびプログラム
Garg Converting American sign language to voice using RBFNN
Koch et al. One-shot lip-based biometric authentication: extending behavioral features with authentication phrase information
Lee Virtual representation of facial avatar through weighted emotional recognition
Rivera et al. Development of an automatic expression recognition system based on facial action coding system
KR102630872B1 (ko) 표정 인식 학습 장치 및 방법
Naidoo South African Sign Language recognition using feature vectors and hidden markov models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16866110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16866110

Country of ref document: EP

Kind code of ref document: A1