WO2021053949A1 - Information processing device, information processing method, and program - Google Patents
Information processing device, information processing method, and program
- Publication number
- WO2021053949A1 (PCT/JP2020/027500, JP2020027500W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- event
- information
- feature data
- information processing
- unit
- Prior art date
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
- B25J11/00—Manipulators not otherwise provided for
- B25J11/0005—Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
- B25J13/00—Controls for manipulators
- B25J13/003—Controls for manipulators by means of an audio-responsive input
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
- G06N20/00—Machine learning
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Definitions
- This disclosure relates to information processing devices, information processing methods, and programs.
- For mobile robots that operate in a predetermined environment, visual information not only differs greatly between environments but also changes from moment to moment. Further, the spatial area that a mobile robot can observe is partially occluded by objects present in the environment. A recognition system that is unaffected by environmental differences, environmental changes, and visual occlusion in a predetermined environment such as a house has therefore been desired.
- This disclosure proposes an information processing device, an information processing method, and a program that can recognize a specific event from partial information in a predetermined environment.
- An information processing device according to one aspect of the present disclosure includes: a sensor unit that senses environmental information of a predetermined area; a storage unit that stores event information including event feature data related to a predetermined event and meta information containing spatial information of the predetermined event associated with the event feature data; and a control unit that retrieves the event information from the storage unit based on a sensing result from the sensor unit and acquires the spatial information included in the event information.
- According to the present disclosure, a specific event can be recognized from partial information in a predetermined environment.
- The effects described here are not necessarily limiting and may be any of the effects described in the present disclosure. Furthermore, the effects described herein are merely illustrative; other effects may also be obtained.
- Visual information differs greatly from one environment to another and changes from moment to moment. That is, recognition of daily events in a predetermined environment such as a house differs greatly per environment, and the characteristics of an event also differ per environment; the mobile robot therefore needs to learn events for each environment. Furthermore, the space observable by the mobile robot is only partial at any given time. Since the space observed by the sensors differs from moment to moment, the robot needs to integrate features from the inputs of multiple sensors and, even when an observation is incomplete, complement the missing information and still recognize the event. A recognition system unaffected by differences or changes in the environment of the house and by visual occlusion has therefore been desired.
- In the present embodiment, daily events are defined by event feature data obtained from the inputs of a plurality of sensors together with event meta information, and are mapped three-dimensionally.
- The event feature data is the visual and auditory information that characterizes the event itself.
- The event feature data includes object feature data indicating a feature amount of an object and voice feature data indicating a feature amount of a sound.
- The event meta information includes, as spatial information, position information indicating a predetermined position.
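To make this data model concrete, the following is a minimal Python sketch of how such event information could be represented; the class and field names (EventInfo, object_features, and so on) are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class EventFeature:
    # Visual/auditory features that characterize the event itself
    event_feature_id: str                  # e.g. "E001"
    object_features: list[np.ndarray]      # e.g. 256-dim vectors such as "EB003"
    voice_feature: np.ndarray | None       # e.g. "EA0015"

@dataclass
class EventMeta:
    # Information needed to search events and to plan robot actions
    position: tuple[float, float, float]   # XYZ in the map/world frame
    timestamp: float | None = None         # occurrence time, optional
    extras: dict = field(default_factory=dict)  # category, frequency, etc.

@dataclass
class EventInfo:
    event_id: str                          # e.g. "001"
    feature: EventFeature
    meta: EventMeta
```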
- FIG. 1 is a diagram showing an overview of the learning phase of the information processing device according to the present embodiment.
- The information processing device 1 according to the present embodiment exists in the environment of a predetermined area, such as a house H.
- The information processing device 1 is, for example, a pet-type robot provided with a pet-shaped housing, and can move within the house H; it may instead be a humanoid robot, a drone, or the like.
- As a pet-type robot, the information processing device 1 learns events in the house H based on the detection results of sensors that sense environmental information in the house H.
- Here, a case of learning a person's arrival, specifically a man's return home, is described as the event; however, the event is not limited to a person's arrival at the house H, and various other events can be detected.
- the information processing device 1 includes a sensor unit 2 and can sense environmental information in a predetermined area.
- the information processing device 1 includes a microphone sensor 21 and a camera sensor 22.
- The information processing device 1 senses events in the house H based on audio data collected by the microphone sensor 21 and video data captured by the camera sensor 22, each being part of the sensing result.
- The microphone sensor 21 acquires the audio data, and the camera sensor 22 acquires the video data. Since video data consists of a plurality of image data, video data subsumes the concept of image data.
- The information processing device 1 learns the correlation between the audio data and the video data and converts it into event feature data. At the same time, the information processing device 1 maps the place where the learned event feature data was obtained and learns it as event meta information including three-dimensional position information as spatial information. In the example shown in FIG. 1, the position of the entrance door D in the house H is mapped as the position where the learned event feature data occurred and is used as the event meta information.
- FIG. 2 is a diagram showing an overview of the recall phase of the information processing device according to the present embodiment.
- When the information processing device 1 acquires voice data via the microphone sensor 21, it searches for event feature data based on the acquired voice data and retrieves the position information included in the associated event meta information.
- For example, the voice feature data "EA0011" is retrieved based on voice data, acquired by the microphone sensor 21, of a man's footsteps and the sound of the door D opening.
- Once the voice feature data "EA0011" is retrieved, the associated event feature ID "E001" and event meta information "EM001" can also be found.
- The information processing device 1 moves, by means of a drive mechanism described later, to the location given by the position information included in the retrieved event meta information "EM001". As a result, the information processing device 1, a pet-type robot, can automatically move to the place where an event similar to one that occurred in the past is now occurring.
- By sequentially updating the event DB whenever audio data and video data are acquired at the same time, the information processing device 1 can gradually refine the event information. The user therefore does not need to set detailed event information in advance; the event information is optimized through simple operation.
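The recall behavior sketched in FIGS. 1 and 2 amounts to a nearest-neighbor search keyed on the voice feature. A hedged sketch, reusing the EventInfo classes above; recall_event and the cosine similarity measure are illustrative choices, since the patent does not fix the similarity function:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recall_event(voice_feature: np.ndarray, event_db: list[EventInfo],
                 recall_threshold: float = 0.8) -> EventMeta | None:
    """Return the meta information of the most similar stored event,
    or None if no stored event exceeds the recall threshold."""
    best, best_sim = None, recall_threshold
    for event in event_db:
        if event.feature.voice_feature is None:
            continue
        sim = cosine_similarity(voice_feature, event.feature.voice_feature)
        if sim > best_sim:
            best, best_sim = event, sim
    return best.meta if best is not None else None

# Usage: on hearing footsteps and the door sound, move to the recalled place.
# meta = recall_event(extract_voice_feature(audio), event_db)
# if meta is not None: robot.move_to(meta.position)   # hypothetical robot API
```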
- FIG. 3 is a block diagram showing a configuration example of the information processing device 1 according to the present embodiment.
- the information processing device 1 includes a sensor unit 2, a communication unit 3, a storage unit 4, and a control unit 5.
- the sensor unit 2 has a sensor that senses environmental information in a predetermined area (inside the house H).
- the sensor unit 2 includes a microphone sensor 21, a camera sensor 22, and a depth sensor 23.
- The microphone sensor 21 is a device that collects ambient sound and outputs audio data converted into a digital signal via an amplifier and an ADC (Analog Digital Converter). That is, the microphone sensor 21 is assumed to be any sensor capable of capturing sound, such as a microphone.
- The camera sensor 22 is an imaging device, such as an RGB camera, that has a lens system and an image sensor and captures images (still or moving). The input acquired by the camera sensor 22 is assumed to be an image with one or more color channels.
- the depth sensor 23 is a device that acquires depth information such as an infrared range finder, an ultrasonic range finder, a LiDAR (Laser Imaging Detection and Ranging), or a stereo camera. That is, the depth sensor 23 is a so-called 3D sensor that measures the distance to the subject.
- the information processing device 1 may acquire the sensing result of a predetermined area from the sensor unit 2 provided separately from the information processing device 1.
- the communication unit 3 is a communication module that transmits / receives data to / from another communicable device via a predetermined network.
- the communication unit 3 includes a reception unit 31 and a transmission unit 32.
- the receiving unit 31 receives predetermined information from another device and outputs it to the control unit 5.
- the transmission unit 32 transmits predetermined information to another device via the network.
- the storage unit 4 is a storage device for recording at least event information.
- the storage unit 4 stores the voice feature database (DB) 41, the object mask DB 42, the object feature DB 43, the event meta information DB 44, the event feature DB 45, the event DB 46, the threshold value DB 47, and the recall event meta information 48.
- the voice feature data stored in the voice feature DB is information related to the feature amount of the voice data acquired by the information processing device 1.
- the voice feature data is, for example, a feature amount extracted by the control unit 5 described later based on the voice data acquired by the microphone sensor 21.
- FIG. 4 is a diagram showing a specific example of voice feature data according to the present embodiment. As shown in FIG. 4, the storage unit 4 stores the voice feature data 210 in which a predetermined feature amount is extracted from the voice data 200 acquired by the microphone sensor 21 in the voice feature DB 41.
- In the following description, voice feature data is abbreviated as "EA0015" and the like; each such label stands for concrete voice feature data as shown in FIG. 4.
- the object mask information stored in the object mask DB 42 is the information of the object mask for estimating the area of the object with respect to the video data acquired by the camera sensor 22.
- the object mask information is information that serves as a reference for detecting an object.
- FIG. 5A is a diagram showing a specific example of an image sequence according to the present embodiment, and FIGS. 5B and 5C show specific examples of object mask information obtained by detecting an object 101 included in the image sequence of image data 100 in FIG. 5A and estimating the region in which the object exists.
- the storage unit 4 shown in FIG. 3 stores various object mask information shown in FIG. 5C.
- FIG. 5D is a diagram showing object feature data which is a feature amount of each object extracted based on each object mask information stored in the object mask DB 42 of the storage unit 4.
- In the following description, each piece of object feature data obtained from the object mask information shown in FIG. 5C is abbreviated as "EB001" and the like; each such label stands for the object feature data shown in FIG. 5D. Concrete object feature data is, for example, 256-dimensional vector data. These object feature data are stored in the object feature DB 43 of the storage unit 4 shown in FIG. 3.
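As one hedged illustration of how an object mask could be turned into such a fixed-length vector: mask-and-crop followed by an embedding model. The embed callable stands in for any pretrained image encoder; the whole pipeline is an assumption, since the patent only says existing technology may be used.

```python
import numpy as np

def extract_object_feature(image: np.ndarray, mask: np.ndarray, embed) -> np.ndarray:
    """image: HxWx3 array; mask: HxW boolean object mask (assumed non-empty).
    embed: any callable mapping an image crop to a 256-dim vector,
    e.g. a pretrained CNN backbone."""
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1].copy()
    # Zero out pixels outside the mask so the feature describes only the object
    crop[~mask[y0:y1, x0:x1]] = 0
    feature = embed(crop)                       # -> shape (256,)
    return feature / (np.linalg.norm(feature) + 1e-12)
```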
- FIG. 6 is a diagram showing an example of event feature data.
- The event feature data includes object feature data and voice feature data associated with an event feature ID.
- The event feature data is stored in the event feature DB 45 of the storage unit 4 shown in FIG. 3.
- the event meta information includes at least two-dimensional or three-dimensional position information.
- the event meta information may include time information.
- the event meta information is meta information including position information and time information related to a predetermined event.
- the event meta information may further include information necessary for the behavior of the mobile robot.
- the information required for the action of the mobile robot is, for example, category information, occurrence frequency information, occurrence date / time information, and the like related to the event.
- the event feature DB 45 is a database in which the above-mentioned voice feature data and object feature data are associated with each other and stored as event feature data.
- FIG. 6 is a diagram showing a specific example of event feature data.
- the event feature data is data in which the object feature data and the voice feature data associated with each other are associated with the event feature ID.
- For example, the object feature data "EB003" and "EB005" (see FIG. 5D) and the voice feature data "EA0015" (see FIG. 4) are associated with the event feature ID "E001".
- the event DB 46 is a database in which the above-mentioned event feature data and event meta information are associated with each other and stored as event information.
- FIG. 7 is a diagram showing a specific example of event information.
- the event information is data in which the event feature ID and the event meta information associated with each other are associated with the event ID.
- For example, the event feature ID "E001" and the related event meta information "EM001" are assigned the event ID "001" to form one piece of event information.
- Similarly, the event feature ID "E002" and the related event meta information "EM002" are assigned the event ID "002" to form another piece of event information.
- the threshold value DB 47 includes threshold value information for determining the degree of agreement between the audio data acquired by the microphone sensor 21 and the video data acquired by the camera sensor 22.
- the threshold value of the degree of matching is referred to as a matching threshold value in the present specification, and is a threshold value relating to the degree of matching between the audio feature data obtained from the audio data and the object feature data obtained from the video data.
- the match threshold value is threshold information for determining whether or not to enter the learning phase, in other words, is a threshold value for determining whether or not the event should be registered.
- the learning phase is a process in which the event DB 46 is changed by the registration process or the update process performed by the control unit 5.
- the recall phase is a process in which the control unit 5 outputs the event meta information included in the predetermined event information from the event DB 46 under a predetermined condition.
- The threshold DB 47 also includes threshold information for determining the degree of similarity between the voice feature data registered in the voice feature DB 41 and voice data acquired by the microphone sensor 21, and threshold information for determining the degree of similarity between the object feature data registered in the object feature DB 43 and video data acquired by the camera sensor 22. These thresholds are referred to herein as recall thresholds. In other words, a recall threshold is a threshold for determining whether the event information stored in the event DB 46 contains event feature data whose voice feature data or object feature data is similar to the feature amounts of the input voice data and video data.
- the recall event meta information 48 is the event meta information included in the event information retrieved from the event DB 46.
- the information processing device 1 makes an action plan based on the recall event meta information 48.
- the control unit 5 has a function of controlling each configuration included in the information processing device 1.
- The control unit 5 includes a voice feature extraction unit 51, an object area estimation unit 52, an object feature extraction unit 53, a sound source object estimation unit 54, a spatial position information acquisition unit 55, a time information acquisition unit 56, a learning recall unit 57, and an action plan control unit 58.
- The voice feature extraction unit 51 extracts a highly abstract feature amount from the voice data input from the microphone sensor 21 and converts it into voice feature data.
- the conversion process from the voice data to the voice feature data can be realized by a technique such as a Fourier transform process.
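As a hedged illustration of such a conversion, here is a simple short-time Fourier transform feature: a time-averaged log magnitude spectrum. The frame length, hop size, and pooling are illustrative choices, not the patent's specific method.

```python
import numpy as np

def extract_voice_feature(audio: np.ndarray, frame: int = 512,
                          hop: int = 256) -> np.ndarray:
    """Convert a mono waveform into an abstract fixed-length voice feature:
    an STFT magnitude spectrogram, log-compressed and averaged over time."""
    audio = np.pad(audio, (0, max(0, frame - len(audio))))  # ensure one frame
    window = np.hanning(frame)
    n_frames = (len(audio) - frame) // hop + 1
    spectra = np.stack([
        np.abs(np.fft.rfft(window * audio[i * hop:i * hop + frame]))
        for i in range(n_frames)
    ])
    feature = np.log1p(spectra).mean(axis=0)    # shape (frame // 2 + 1,)
    return feature / (np.linalg.norm(feature) + 1e-12)
```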
- The object area estimation unit 52 estimates the area where an object 101 exists in each of the plurality of image data 100 included in the video data (FIG. 5A) acquired by the camera sensor 22, and outputs object mask information indicating that area.
- Individual objects 101 included in the image data 100 are distinguished, and the object mask information for each is stored in the storage unit 4.
- The object feature extraction unit 53 identifies the region of each object 101 from the plurality of image data 100 included in the input video data and from the object mask information, as shown in FIG. 5C.
- The object feature extraction unit 53 extracts a highly abstract feature amount from each identified region of an object 101 and converts it into object feature data.
- The extracted object feature data is stored in the storage unit 4. The process of identifying the region of an object 101 and converting it into object feature data can be realized with existing technology.
- the sound source object estimation unit 54 calculates the degree of agreement between the voice feature data obtained by the voice feature extraction unit 51 and the respective object feature data obtained by the object feature extraction unit 53.
- the sound source object estimation unit 54 estimates the source of the voice data detected by the voice feature extraction unit 51, that is, the object 101 that is the sound source, based on the calculation of the degree of coincidence.
- The sound source object estimation unit 54 estimates the direction from which the sound originates by applying a direction-of-arrival algorithm, such as the MUSIC (Multiple Signal Classification) method, to the voice data, and thereby estimates the position of the object serving as the sound source.
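For reference, a textbook sketch of the MUSIC method for a uniform linear microphone array; the array geometry, narrowband assumption, and parameter names are illustrative, since the patent only names MUSIC as one of various direction-estimation algorithms.

```python
import numpy as np

def music_doa(snapshots: np.ndarray, n_sources: int, mic_spacing: float,
              freq: float, n_grid: int = 181):
    """snapshots: complex (n_mics, n_snapshots) narrowband samples at `freq` Hz.
    Returns candidate angles (degrees) and the MUSIC pseudospectrum,
    which peaks in the directions of the sound sources."""
    n_mics = snapshots.shape[0]
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # spatial covariance
    _, eigvecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    noise_sub = eigvecs[:, :n_mics - n_sources]   # noise subspace
    c = 343.0                                     # speed of sound (m/s)
    angles = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    spectrum = np.empty(n_grid)
    for i, theta in enumerate(angles):
        delays = np.arange(n_mics) * mic_spacing * np.sin(theta) / c
        a = np.exp(-2j * np.pi * freq * delays)   # steering vector
        denom = np.abs(a.conj() @ noise_sub @ noise_sub.conj().T @ a)
        spectrum[i] = 1.0 / (denom + 1e-12)
    return np.degrees(angles), spectrum
```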
- The sound source object estimation unit 54 associates the object feature data estimated to agree strongly with the voice feature data, and outputs the pair as event feature data.
- Methods for calculating the degree of agreement between the voice feature data and the object feature data include, but are not limited to, computing the inner product of the object feature data and the voice feature data.
- the calculation of the degree of agreement between the voice feature data and the object feature data can also be performed by, for example, a neural network obtained by machine learning.
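A minimal sketch of the inner-product variant, assuming both features are first projected into a shared space; the projection matrices stand in for whatever learned model (for example, the neural network mentioned above) produces comparable embeddings.

```python
import numpy as np

def degree_of_agreement(voice_feat: np.ndarray, object_feat: np.ndarray,
                        W_voice: np.ndarray, W_object: np.ndarray) -> float:
    """Project both features into a common space and take the inner product.
    W_voice and W_object would in practice be learned jointly."""
    v = W_voice @ voice_feat
    o = W_object @ object_feat
    v = v / (np.linalg.norm(v) + 1e-12)
    o = o / (np.linalg.norm(o) + 1e-12)
    return float(v @ o)   # in [-1, 1]; compared against the match threshold
```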
- For example, the event feature data assigned the event feature ID "E001" associates the object feature data "EB003" and "EB005" with the voice feature data "EA0015".
- The microphone sensor 21, camera sensor 22, voice feature extraction unit 51, object area estimation unit 52, object feature extraction unit 53, sound source object estimation unit 54, voice feature DB 41, object mask DB 42, and object feature DB 43 described above constitute a feature extraction unit 70.
- The feature extraction unit 70 broadly extracts event feature data from input data such as audio data and video data. In addition, the feature extraction unit 70 calculates the degree of coincidence between the object feature data and the voice feature data, thereby determining whether the video data includes the object serving as the sound source.
- the spatial position information acquisition unit 55 creates a map of a predetermined area (inside the house H) based on the depth information detected by the depth sensor 23, and stores it in the storage unit 4 as map information as a base of event meta information.
- For example, the spatial position information acquisition unit 55 can generate the map information by SLAM (Simultaneous Localization and Mapping).
- The spatial position information acquisition unit 55 may update the map information at a predetermined cycle on the assumption that furniture in the house H is rearranged, or may regenerate the map information each time the information processing device 1 moves. The information processing device 1 may also store, as its map information, a map generated by another device.
- the spatial position information acquisition unit 55 can calculate specific position information by comparing the depth information obtained by the depth sensor 23 with the map information stored in the storage unit 4.
- Methods by which the spatial position information acquisition unit 55 acquires predetermined position information include acquiring coordinate information on the earth using a positioning system such as GPS (Global Positioning System), and self-position estimation, such as VisualSLAM, which acquires a relative position from a predetermined starting point using video data.
- the time information acquisition unit 56 is a time information receiving mechanism that receives time information from a time measuring mechanism such as a clock or a server that outputs time information via a predetermined network.
- the spatial position information acquisition unit 55 outputs the position information associated with the observed event as a part of the event meta information.
- the event meta information is stored in the event meta information database of the storage unit 4.
- Event meta information includes at least event location information.
- the position information of the event refers to the coordinate representation by two or more numerical values with the origin at an arbitrary position.
- the position information can be represented by spatial information such as a relative position from a predetermined starting point in a map of the environment, that is, an XYZ position in the world coordinate system, or coordinate information of the world geodetic system obtained from GPS satellites.
- the time information acquired by the time information acquisition unit 56 may be associated with the position information calculated by the spatial position information acquisition unit 55 as the time information at the time when the event occurs, and may be used as a part of the event meta information.
- the depth sensor 23, the spatial position information acquisition unit 55, the time information acquisition unit 56, and the event meta information DB 44 constitute the event meta information acquisition unit 80.
- the event meta information acquisition unit 80 generally outputs information necessary for searching for event information and the action of the mobile robot as event meta information based on the input from the depth sensor 23, and stores it in the storage unit 4.
- The learning recall unit 57, serving as part of a generation unit, generates event information by associating the event feature data obtained by the feature extraction unit 70 with the event meta information obtained by the event meta information acquisition unit 80, and stores it in the event DB 46 of the storage unit 4.
- the event feature data is stored in the event feature DB 45, and the event information is stored in the event DB 46, but the present invention is not necessarily limited to this. That is, instead of using a database, a system capable of outputting related information from a specific input such as a Boltzmann machine or a self-organizing map may be used.
- the learning recall unit 57 performs any of registration processing, update processing, and recall processing for the event information based on the event feature data output from the feature extraction unit 70 and the matching threshold value and the recall threshold value stored in the threshold value DB 47. Determine if you want to do it.
- the event memory unit 90 is composed of the above learning recall unit 57, the event feature DB 45, the event DB 46, and the threshold value DB 47.
- the event memory unit 90 generally selects either registration, update, or recall processing for the event information, while generating the event information and storing it in the storage unit 4.
- the action plan control unit 58 has a function of planning the action to be performed by the information processing device 1 based on the information acquired by the sensor unit 2 and various data stored in the storage unit 4. First, the action plan control unit 58 according to the present embodiment searches for the event meta information stored in the event meta information DB 44 corresponding to the voice data from the voice data acquired by the microphone sensor 21. Subsequently, the action plan control unit 58 determines to execute the action of moving to the position designated by the position information based on the position information included in the retrieved event meta information.
- the action plan control unit 58 has a function of controlling the operation of the drive unit 6.
- the drive unit 6 has a function of driving the physical configuration of the information processing device 1.
- the drive unit 6 has a function for moving the position of the information processing device 1.
- the drive unit 6 is, for example, an actuator driven by a motor 61.
- the action plan control unit 58 controls the motor 61 of the drive unit 6 based on the above-mentioned action plan to drive the actuators provided in each joint portion provided in the drive unit 6.
- the drive unit 6 may have any configuration as long as the information processing device 1 can realize a desired operation.
- the drive unit 6 may have any configuration as long as the position of the information processing device 1 can be moved.
- For example, the drive unit 6 drives caterpillar tracks or tires.
- the drive unit 6 may further include sensors such as a GPS receiver and an acceleration sensor, which are necessary for controlling the mobile robot.
- FIG. 9 is a flowchart showing a processing procedure executed by the information processing apparatus 1 according to the embodiment.
- In step ST1, the feature extraction unit 70 of the information processing device 1 acquires the event features.
- the microphone sensor 21 acquires audio data
- the camera sensor 22 acquires video data.
- the camera sensor 22 may acquire a plurality of image data instead of acquiring the video data.
- the voice feature extraction unit 51 of the control unit 5 extracts voice feature data from the acquired voice data and stores it in the voice feature DB 41.
- the object area estimation unit 52 and the object feature extraction unit 53 extract the object feature data from the video data using the object mask data and store it in the object feature DB 43.
- the sound source object estimation unit 54 estimates an object to be a sound source of the acquired voice data from the voice feature data and the object feature data.
- Event feature data is generated by combining voice feature data and object feature data.
- the event feature data may be composed of only the audio feature data or only the object feature data. Further, in parallel with the generation of the event feature data, the event meta information acquisition unit 80 generates the event meta information at the place where the audio data and the video data are acquired, and stores the event meta information in the event meta information DB 44.
- In step ST2, the event memory unit 90 of the information processing device 1 determines whether the generated event feature data exceeds the match threshold. Specifically, the sound source object estimation unit 54 first calculates the degree of coincidence between the voice feature data and the object feature data included in the event feature data, and outputs it to the learning recall unit 57. When the learning recall unit 57 determines that the input degree of coincidence exceeds the matching threshold (step ST2: Yes), the process proceeds to step ST3.
- When the degree of coincidence between the voice feature data and the object feature data is high, it means that the camera sensor 22 captured the object emitting the sound at substantially the same time as the microphone sensor 21 acquired the voice data. In this case, as described above, the processing of the information processing device 1 enters the learning phase.
- In step ST3, the control unit 5 of the information processing device 1 recalls an event based on the event feature data.
- the learning recall unit 57 of the control unit 5 searches for the event information stored in the event DB 46 based on the acquired event feature data.
- the event DB 46 stores, for example, an event feature ID and event meta information associated with the event ID as shown in FIG. 7.
- In step ST4, the learning recall unit 57 determines whether there is event information whose event feature data has a similarity to the acquired event feature data exceeding a predetermined recall threshold.
- The learning recall unit 57 may also use, as the recall threshold, a threshold based on other information included in the event meta information, such as the occurrence frequency and the occurrence date and time.
- When such event information exists (step ST4: Yes), the learning recall unit 57 updates the retrieved event feature data. Specifically, the learning recall unit 57 updates the event feature data included in the retrieved event information with the acquired event feature data. For example, in the event feature data with the event feature ID "E001", the voice feature data is updated from the voice feature data "EA0015" shown in FIG. 6 to the newly acquired voice feature data "EA0024", and the object feature data may be updated as needed. The updated event feature data of the event feature ID "E001" is stored under the event ID "001" shown in FIG. 7, updating the event information. With this, the learning phase executed by the information processing device 1 is completed.
- When the learning recall unit 57 determines in step ST4 that there is no event information including event feature data exceeding the predetermined recall threshold (step ST4: No), the process proceeds to step ST6.
- In step ST6, the control unit 5 registers the event.
- the learning recall unit 57 generates event feature data from the voice feature data and the object feature data output from the feature extraction unit 70.
- the learning recall unit 57 acquires the event meta information output from the event meta information acquisition unit 80.
- the learning recall unit 57 associates the event feature data with the event meta information, attaches an event ID, and stores the event in the event DB 46. As a result, the learning phase executed by the information processing device 1 is completed.
- When the learning recall unit 57 determines in step ST2 that the calculated degree of coincidence is equal to or less than the matching threshold (step ST2: No), the process proceeds to step ST7.
- When the degree of coincidence between the voice feature data and the object feature data is equal to or less than the matching threshold, it means that the object emitting the sound was not captured by the camera sensor 22 at the time the microphone sensor 21 acquired the voice data. In this case, as described above, the processing of the information processing device 1 enters the recall phase.
- In step ST7, the control unit 5 of the information processing device 1 recalls the event based on the voice feature data.
- the learning recall unit 57 of the control unit 5 searches for the event information stored in the event DB 46 based on the acquired voice feature data.
- the learning recall unit 57 may search for event information based on the acquired object feature data.
- the event DB 46 stores, for example, an event feature ID and event meta information associated with the event ID as shown in FIG. 7.
- In step ST8, the learning recall unit 57 determines whether there is event information in which the similarity between the voice feature data it contains and the acquired voice feature data exceeds a predetermined recall threshold.
- When the learning recall unit 57 determines that such event information exists (step ST8: Yes), the process proceeds to step ST9.
- Here, the case where the acquired voice feature data is "EA0015" is taken as an example.
- In step ST9, the control unit 5 outputs the event meta information of the corresponding event.
- The learning recall unit 57 first finds the event feature data "E001" (see FIG. 6) that includes the voice feature data "EA0015", and then retrieves the event information with the event ID "001" shown in FIG. 7.
- the learning recall unit 57 reads out the event meta information "EM001” included in the event information of the searched event ID "001".
- the learning recall unit 57 outputs the read event meta information “EM001” as the recall event meta information 48 to the action plan control unit 58.
- the recall phase executed by the information processing device 1 is completed.
- the action plan control unit 58 into which the recall event meta information 48 is input executes an action plan based on the position information included in the recall event meta information 48, and controls the drive unit 6. As a result, the information processing device 1 moves to the location indicated by the position information included in the recall event meta information 48.
- When the learning recall unit 57 determines in step ST8 that there is no event information including voice feature data whose similarity to the acquired voice feature data exceeds the recall threshold (step ST8: No), the recall phase executed by the information processing device 1 ends.
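Putting steps ST1 through ST9 together, the whole decision flow can be read as the loop below. This is a hedged paraphrase of the flowchart that reuses the sketches above (EventInfo, EventFeature, recall_event); the threshold values and the update rule of simply overwriting the stored features are assumptions.

```python
from typing import Callable, Optional, Sequence
import numpy as np

def process_sensing(voice_feat: np.ndarray,
                    object_feats: Sequence[np.ndarray],
                    meta_now: EventMeta,
                    event_db: list[EventInfo],
                    agreement: Callable[[np.ndarray, np.ndarray], float],
                    similarity: Callable[[np.ndarray, np.ndarray], float],
                    match_threshold: float = 0.6,
                    recall_threshold: float = 0.8) -> Optional[EventMeta]:
    """One pass of the ST1-ST9 flow: returns meta information to act on
    (recall phase) or None (learning phase)."""
    # ST2: is an object that agrees with the sound visible right now?
    best_agreement = max((agreement(voice_feat, o) for o in object_feats),
                         default=0.0)
    if best_agreement > match_threshold:          # learning phase (ST3-ST6)
        best, best_sim = None, recall_threshold   # ST3/ST4: find similar event
        for ev in event_db:
            if ev.feature.voice_feature is None:
                continue
            s = similarity(voice_feat, ev.feature.voice_feature)
            if s > best_sim:
                best, best_sim = ev, s
        if best is not None:                      # update the stored features
            best.feature.voice_feature = voice_feat
            best.feature.object_features = list(object_feats)
        else:                                     # ST6: register a new event
            n = len(event_db) + 1
            event_db.append(EventInfo(
                event_id=f"{n:03d}",
                feature=EventFeature(f"E{n:03d}", list(object_feats), voice_feat),
                meta=meta_now))
        return None
    # ST7-ST9: recall phase, searching by the voice feature alone
    return recall_event(voice_feat, event_db, recall_threshold)
```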
- As advance preparation, the spatial position information acquisition unit 55 of the information processing device 1 creates a map of the predetermined area (inside the house H) based on the depth information detected by the depth sensor 23, for example by VisualSLAM. In this case, the position of, for example, the door D that emits sound is also included in the map information. The created map is stored in the event meta information DB 44 of the storage unit 4 as the map information on which event meta information is based. As a result, the information processing device 1 can always estimate its own position from the starting point.
- the object feature extraction unit 53 of the information processing device 1 detects a person or an object such as a resident of the house H by the camera sensor 22.
- the detected objects 102, 103, and 104 are converted into object feature data and stored in the object feature DB of the storage unit 4.
- the detection and identification of an object by the object feature extraction unit 53 can be realized by using known machine learning and pattern recognition techniques such as a boosting method, a neural network, and a hidden Markov model (HMM) method.
- the information processing device 1 acquires the voice generated at the door D as voice data.
- the control unit 5 determines that the degree of coincidence between the video data and the voice data is high. In this case, the processing of the information processing device 1 shifts to the learning phase described above.
- On the other hand, when the resident opens the front door D, the information processing device 1 may not be able to capture the sound-generating situation as video data. In this case, the control unit 5 of the information processing device 1 determines that the degree of coincidence between the video data and the audio data is low, and the device shifts to the recall phase. When shifting to the recall phase, the information processing device 1 searches for event information based on the input voice data and reads out the recall event meta information 48. The information processing device 1 executes an action plan based on the read recall event meta information 48 and moves to the position indicated by the position information included in it. As a result, the information processing device 1 can greet, in response to the generated sound, the resident returning home.
- In the above example, event information is searched for based on the audio data acquired by the information processing device 1, and the device moves to a position based on the associated event meta information; however, the search may instead be based on video data.
- For example, the information processing device 1 that has captured a flash of lightning as video data may search for event information including the corresponding object feature data and move to a position based on the associated event meta information.
- The information processing device 1, configured as a mobile robot, can newly generate event information not yet stored in the event DB 46 only when it shifts to the learning phase by acquiring mutually corresponding audio data and video data at substantially the same time. The generation of event information therefore depends on chance, and various methods can be adopted to make it easier to acquire related audio data and video data simultaneously.
- For example, an application installed on a mobile terminal device owned by the resident may be linked with the GPS information provided by that device.
- the mobile terminal device application is set so that the resident can be notified of information about the mobile robot and the resident's position information can be transmitted to the mobile robot.
- Alternatively, the mobile robot may be moved to a random place and made to stand by; when no resident is in the house H, it may wait at a different place each time. An action plan of moving in the direction of a sound, estimated by applying beamforming to the microphone sensor 21, may also be added. Further, the mobile terminal application may be used to have a resident who has stayed home greet the returning resident together with the mobile robot.
- FIG. 12 is a diagram showing a specific example of the event DB according to the modified example of the present disclosure.
- 13A, 13B, and 13C are diagrams showing specific examples of the information processing method according to the modified example of the present disclosure and the movement of the information processing apparatus, respectively.
- the information processing device 1A according to the modified example is, for example, a mobile robot that operates a household electric appliance (hereinafter, home appliance).
- In the event DB 46 of the storage unit 4, event feature data consisting of the object feature data and voice feature data of each home appliance is stored in association with event meta information including the position information of where that appliance is located.
- event IDs "010", “011”, and "012" are set corresponding to, for example, a water heater, a dishwasher, and a microwave oven, and are stored in the event DB 46, respectively.
- For example, the event ID "010" is stored in the event DB 46 with the event feature ID "E012", the object feature data "EB012", the voice feature data "EA0050", and the event meta information "EM012" associated with one another.
- The information processing device 1A acquires the sound emitted by a home appliance as voice data.
- The information processing device 1A extracts voice feature data from the acquired voice data, searches for stored voice feature data with a high degree of similarity to the extracted voice feature data, and retrieves the event meta information associated with it.
- In this example, the voice feature data "EA0050", which has a high degree of similarity to the extracted voice feature data, is found, and the associated event meta information "EM012" is retrieved.
- the information processing device 1A can recognize the position of the microwave oven, which is a home appliance.
- the information processing device 1A moves from the position shown in FIG. 13B to the position shown in FIG. 13C based on the position information included in the retrieved event meta information "EM012" to operate the home appliance.
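Read against the recall sketch above, the appliance scenario reduces to the following usage; the concrete vectors and positions are invented stand-ins for entries such as "EA0050" and "EM012".

```python
import numpy as np

# Illustrative event DB in the spirit of FIG. 12 (two of the appliances)
rng = np.random.default_rng(0)
db = [
    EventInfo("010",
              EventFeature("E010", [rng.normal(size=256)], rng.normal(size=257)),
              EventMeta(position=(1.0, 0.5, 0.0))),    # water heater
    EventInfo("012",
              EventFeature("E012", [rng.normal(size=256)], rng.normal(size=257)),
              EventMeta(position=(3.2, 1.1, 0.0))),    # microwave oven
]

# Hearing the microwave's sound: a slightly noisy copy of its voice feature
heard = db[1].feature.voice_feature + 0.01 * rng.normal(size=257)
meta = recall_event(heard, db, recall_threshold=0.9)
print(meta.position if meta is not None else "no event recalled")
# -> (3.2, 1.1, 0.0): the robot would move here to operate the appliance
```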
- In the embodiments described above, the information processing devices 1 and 1A are arranged in the predetermined area (house H); however, the present invention is not limited to this.
- the information processing device 1 can be configured as a server device.
- FIG. 14 is a schematic view of the information processing device 300 according to the modified example.
- the information processing apparatus 300 is shown in a simplified manner.
- the information processing device 300 according to the modified example is a server device and includes an event meta information DB 144, an event feature DB 145, and an event DB 146.
- the information processing device 300 receives voice data and video data as a sensing result of environmental information transmitted from, for example, the pet-type robot 400.
- the pet-type robot 400 includes a sensor unit 2, a drive unit 6 that can move to a position designated by input position information, and a drive control unit that drives the drive unit 6.
- The information processing device 300 controls the behavior of the pet-type robot 400 based on the event meta information stored in the event meta information DB 144 and the event feature data stored in the event feature DB 145.
- Based on the received audio data or video data, the information processing device 300 transmits position information indicating where the pet-type robot 400 should move.
- The pet-type robot 400 that has received the position information moves to the position indicated by it.
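As a hedged sketch of this split deployment, the exchange could look like the following; the JSON shape, field names, and handler logic are assumptions, since the patent does not specify a wire format.

```python
import json
import numpy as np

def handle_sensing_request(payload: str, event_db: list) -> str:
    """Server side (information processing device 300): receive a voice
    feature extracted by the robot, answer with a target position."""
    request = json.loads(payload)
    voice_feat = np.asarray(request["voice_feature"])
    meta = recall_event(voice_feat, event_db)   # search the event DB
    if meta is None:
        return json.dumps({"status": "no_event"})
    return json.dumps({"status": "ok", "position": list(meta.position)})

# Robot side (pet-type robot 400): send the sensing result, then move.
# reply = json.loads(handle_sensing_request(
#     json.dumps({"voice_feature": heard.tolist()}), db))
# if reply["status"] == "ok":
#     robot.move_to(tuple(reply["position"]))   # hypothetical robot API
```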
- the information processing device 300 and the mobile terminal device 500 owned by the user may be made communicable so that the movement of the pet-type robot 400 can be controlled by the mobile terminal device 500.
- the predetermined area has been described as the house H, but the present invention is not limited to this, and any area can be set as the predetermined area.
- FIG. 15 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of the information processing device 1.
- the computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input / output interface 1600.
- Each part of the computer 1000 is connected by a bus 1050.
- the CPU 1100 operates based on the program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands the program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.
- the ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, a program that depends on the hardware of the computer 1000, and the like.
- the HDD 1400 is a computer-readable recording medium that non-temporarily records a program executed by the CPU 1100 and data used by such a program.
- the HDD 1400 is a recording medium for recording a program according to the present disclosure, which is an example of program data 1450.
- the communication interface 1500 is an interface for the computer 1000 to connect to an external network 1550 (for example, the Internet).
- the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.
- the input / output interface 1600 is an interface for connecting the input / output device 1650 and the computer 1000.
- the CPU 1100 receives data from an input device such as a keyboard or mouse via the input / output interface 1600. Further, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input / output interface 1600. Further, the input / output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (media).
- the media is, for example, an optical recording medium such as DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), a magneto-optical recording medium such as MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.
- the CPU 1100 of the computer 1000 realizes the functions of the spatial position information acquisition unit 55 and the like by executing the program loaded on the RAM 1200.
- the HDD 1400 stores the program related to the present disclosure and the data in the storage unit 4.
- the CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program, but as another example, these programs may be acquired from another device via the external network 1550.
- According to the above configuration, image data and audio data can each be combined with spatial information and stored in a state in which one can be recalled from the other.
- From partial input, the remaining information, specifically the audio data, the image data, and the event meta information, can be retrieved and used for behavior control of the mobile robot.
- the information processing devices 1 and 1A can move to a place where an event occurs as long as they can acquire audio data.
- Even when information processing devices 1 and 1A such as mobile robots cannot acquire audio data, they can move to the place where an event occurs as long as they can acquire video data.
- Since event information is registered and continuously updated whenever audio data and video data can be acquired at the same time, the information processing devices 1 and 1A can operate robustly against changes in the environment.
- Although the environment in the house H, such as the objects in it, changes from moment to moment, shifting to the learning phase whenever audio data and video data are acquired simultaneously allows the devices to respond to those changes from the next occasion onward.
- the present technology can also have the following configurations.
- a sensor unit that senses environmental information in a predetermined area
- a storage unit that stores event information including event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data.
- a control unit that retrieves the event information from the storage unit, based on a sensing result from the sensor unit, and acquires the spatial information included in the event information;
- an information processing device comprising the above. (2) The control unit determines a degree of similarity between the sensing result sensed by the sensor unit and the event feature data stored in the storage unit, and retrieves from the storage unit, when the similarity exceeds a predetermined recall threshold, event information containing the event feature data exceeding the recall threshold; the information processing device according to (1) above.
- (3) The event feature data includes object feature data obtained from an object that can be sensed by the sensor unit and voice feature data obtained from a voice that can be sensed by the sensor unit; the information processing device according to (1) or (2) above.
- (4) The control unit retrieves from the storage unit, based on voice feature data obtained from a voice sensed by the sensor unit, event information containing voice feature data whose similarity to the obtained voice feature data exceeds a predetermined recall threshold; the information processing device according to (3) above.
- (5) The control unit retrieves from the storage unit, based on object feature data obtained from an object sensed by the sensor unit, event information containing object feature data whose similarity to the obtained object feature data exceeds a predetermined recall threshold; the information processing device according to (3) above.
- (6) The object feature data is a feature amount of the object sensed by the sensor unit, and the voice feature data is a feature amount of a voice emitted from the object sensed by the sensor unit; the information processing device according to any one of (3) to (5) above.
- (7) Configured to be able to control a mobile robot including a drive unit that moves a housing, wherein the control unit makes an action plan based on the acquired spatial information and performs control to make the mobile robot act according to the action plan; the information processing device according to any one of (1) to (6) above.
- (8) The information processing device according to any one of (1) to (7) above, which is a mobile robot.
- (9) An information processing method in which a computer retrieves event information, based on a sensing result from a sensor unit that senses environmental information of a predetermined area, from a storage unit storing event information that includes event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data, and outputs the spatial information included in the event information.
- (10) A program that causes a computer to function as: a sensor unit that senses environmental information of a predetermined area; a storage unit that stores event information including event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data; and a control unit that retrieves the event information from the storage unit based on a sensing result from the sensor unit and outputs the spatial information included in the event information.
- (11) An information processing device comprising: a sensor unit that senses environmental information of a predetermined area; and a generation unit that generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from the sensor unit and meta information including spatial information of the predetermined event obtained based on the sensing result.
- (12) The event feature data includes object feature data obtained from an object that can be sensed by the sensor unit and voice feature data obtained from a voice that can be sensed by the sensor unit; the control unit determines a degree of matching between the object feature data and the voice feature data obtained based on the sensing result, and the generation unit generates the event information when the degree of matching exceeds a predetermined matching threshold; the information processing device according to (10) above.
- (13) An information processing method in which a computer generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from a sensor unit that senses environmental information of a predetermined area and meta information including spatial information of the predetermined event obtained based on the sensing result.
- (14) A program that causes a computer to function as: a sensor unit that senses environmental information of a predetermined area; and a generation unit that generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from the sensor unit and meta information including spatial information of the predetermined event obtained based on the sensing result.
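The recall mechanism of items (2), (4), and (5) above can be illustrated with a short sketch. This is not part of the disclosure: the similarity measure, the threshold value, and all names below are assumptions made purely for illustration.

```python
import numpy as np

RECALL_THRESHOLD = 0.8  # assumed value; the text only requires "a predetermined recall threshold"

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; the publication only requires 'a degree of similarity'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recall_events(sensed_feature: np.ndarray, event_db: list) -> list:
    """Return every stored event whose feature data exceeds the recall threshold.

    Each entry is assumed to look like:
    {"feature": np.ndarray, "meta": {"position": (x, y), "time": "..."}}
    """
    return [event for event in event_db
            if similarity(sensed_feature, event["feature"]) > RECALL_THRESHOLD]
```

In terms of the embodiment described later, a doorbell-like voice feature sensed by the microphone could recall the stored event whose meta information holds the position of door D, which the action plan control unit then uses as a movement target.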
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mechanical Engineering (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Manipulator (AREA)
Abstract
Description
[System configuration according to an embodiment]
First, an overview of an embodiment of the present disclosure will be described. As described above, technologies have been developed in recent years in which a mobile robot, such as a pet-type robot, recognizes predetermined environmental information inside a house, around a device, or the like.
Next, a configuration example of the information processing device 1 according to the embodiment will be described. FIG. 3 is a block diagram showing a configuration example of the information processing device 1 according to the present embodiment. As shown in FIG. 3, the information processing device 1 includes a sensor unit 2, a communication unit 3, a storage unit 4, and a control unit 5.
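As a reading aid only, the composition above could be modeled as follows; the class and field names are assumptions made for this sketch, not the publication's own API:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class EventInfo:
    """Event feature data tied to meta information (spatial info, time)."""
    voice_feature: list
    object_feature: list
    meta: dict  # e.g. {"position": (x, y), "time": "..."}

@dataclass
class InformationProcessingDevice:
    sensor_unit: Any = None          # microphone, camera, and depth sensors
    communication_unit: Any = None   # external transmission and reception
    storage_unit: list = field(default_factory=list)  # holds EventInfo records
    # The control-unit logic (feature extraction, learning/recall,
    # action plan control) would be implemented as methods on this class.
```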
Next, the processing procedure executed by the information processing device 1 according to the present embodiment will be described. FIG. 9 is a flowchart showing the processing procedure executed by the information processing device 1 according to the embodiment.
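FIG. 9 itself is not reproduced here. The following is a minimal sketch of one plausible reading of such a procedure — learn when sensed voice and object features coincide, recall when the sensed voice matches a stored event — with all names and threshold values assumed:

```python
import numpy as np

MATCH_THRESHOLD = 0.5   # assumed; gates whether a new event is generated
RECALL_THRESHOLD = 0.8  # assumed; gates whether a stored event is recalled

def degree(a, b) -> float:
    # Stand-in for both the matching and the similarity judgments;
    # assumes both features live in one embedding space of equal dimension.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def process_step(voice_f, object_f, pose, event_db):
    # Learning path: a sufficiently coincident voice/object pair is stored
    # as one event together with spatial meta information (the pose).
    if degree(voice_f, object_f) > MATCH_THRESHOLD:
        event_db.append({"voice": voice_f, "object": object_f,
                         "meta": {"position": pose}})
    # Recall path: spatial info of stored events resembling the sensed voice
    # is returned, e.g. as movement targets for the action planner.
    return [e["meta"]["position"] for e in event_db
            if degree(voice_f, e["voice"]) > RECALL_THRESHOLD]
```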
Next, a specific example of the information processing device 1 according to the embodiment will be described. In this example, a case in which a husband or father returns home to the house H is used. First, as advance preparation, as shown in FIG. 10, the spatial position information acquisition unit 55 of the information processing device 1 creates a map of the predetermined area (inside the house H) based on depth information detected by the depth sensor 23, for example by Visual SLAM. In this case, the position of a sound-emitting object, for example the door D, is also included in the map information. The created map is stored in the event meta information DB 44 of the storage unit 4 as the map information on which the event meta information is based. This enables the information processing device 1 to estimate its own position from the starting point at all times.
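The Visual SLAM module itself is outside the scope of this document; the fragment below only illustrates how a pose estimated from the starting point could become the spatial part of the event meta information. The interface and all names are assumptions:

```python
class SlamStub:
    """Stand-in for a Visual SLAM module; the real interface is an assumption."""
    def current_pose(self):
        return (1.0, 2.0, 0.0)  # x, y, heading relative to the starting point

def record_landmark(slam, map_info: dict, landmark: str) -> dict:
    # The pose at the moment a sound-emitting landmark (e.g. door D) is
    # observed is stored in the map information that underlies the
    # event meta information.
    x, y, _ = slam.current_pose()
    map_info[landmark] = (x, y)
    return map_info

map_info = record_landmark(SlamStub(), {}, "door_D")  # {'door_D': (1.0, 2.0)}
```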
Next, a modification of the above-described example will be described. FIG. 12 is a diagram showing a specific example of the event DB according to the modification of the present disclosure. FIG. 13A, FIG. 13B, and FIG. 13C are diagrams respectively showing specific examples of the information processing method and of the movement of the information processing device according to the modification of the present disclosure. Here, the information processing device 1A according to the modification is, for example, a mobile robot that operates home electric appliances (hereinafter, home appliances).
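FIG. 12 is not reproduced here. Purely as a hypothetical illustration of what one row of such an event DB might pair together, with every field name and value invented for the sketch:

```python
# Hypothetical event DB row; the actual layout is defined by FIG. 12.
event_row = {
    "event_id": 1,
    "voice_feature": [0.12, 0.80, 0.33],   # assumed embedding of the sensed cue
    "object_feature": [0.45, 0.10, 0.77],  # assumed embedding of the appliance
    "meta": {
        "position": (3.2, 1.5),            # location in the house-H map
        "time": "2020-07-15T09:00:00",     # invented timestamp
    },
}
```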
In the embodiment described above, the information processing devices 1 and 1A are arranged in the predetermined area (house H), but the present disclosure is not limited to this. For example, the information processing device 1 can also be configured as a server device.
Likewise, the predetermined area has been described above as the house H, but the present disclosure is not limited to this, and any area can be set as the predetermined area.
(1)
A sensor unit that senses environmental information of a predetermined area,
a storage unit that stores event information including event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data, and
a control unit that retrieves the event information from the storage unit based on a sensing result from the sensor unit and acquires the spatial information included in the event information,
an information processing device comprising the above.
(2)
The control unit
determines a degree of similarity between the sensing result sensed by the sensor unit and the event feature data stored in the storage unit, and
retrieves from the storage unit, when the similarity exceeds a predetermined recall threshold, event information containing the event feature data exceeding the recall threshold.
The information processing device according to (1) above.
(3)
The event feature data includes object feature data obtained from an object that can be sensed by the sensor unit and voice feature data obtained from a voice that can be sensed by the sensor unit.
The information processing device according to (1) or (2) above.
(4)
The control unit
retrieves from the storage unit, based on voice feature data obtained from a voice sensed by the sensor unit, event information containing voice feature data whose similarity to the obtained voice feature data exceeds a predetermined recall threshold.
The information processing device according to (3) above.
(5)
The control unit
retrieves from the storage unit, based on object feature data obtained from an object sensed by the sensor unit, event information containing object feature data whose similarity to the obtained object feature data exceeds a predetermined recall threshold.
The information processing device according to (3) above.
(6)
The object feature data is a feature amount of the object sensed by the sensor unit, and
the voice feature data is a feature amount of a voice emitted from the object sensed by the sensor unit.
The information processing device according to any one of (3) to (5) above.
(7)
Configured to be able to control a mobile robot including a drive unit that moves a housing, wherein
the control unit
makes an action plan based on the acquired spatial information and performs control to make the mobile robot act according to the action plan.
The information processing device according to any one of (1) to (6) above.
(8)
The information processing device according to any one of (1) to (7) above, which is a mobile robot.
(9)
An information processing method in which a computer,
based on a sensing result from a sensor unit that senses environmental information of a predetermined area, retrieves event information from a storage unit storing event information that includes event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data, and outputs the spatial information included in the event information.
(10)
A program that causes a computer to function as:
a sensor unit that senses environmental information of a predetermined area,
a storage unit that stores event information including event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data, and
a control unit that retrieves the event information from the storage unit based on a sensing result from the sensor unit and outputs the spatial information included in the event information.
(11)
A sensor unit that senses environmental information of a predetermined area, and
a generation unit that generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from the sensor unit and meta information including spatial information of the predetermined event obtained based on the sensing result,
an information processing device comprising the above.
(12)
The event feature data includes object feature data obtained from an object that can be sensed by the sensor unit and voice feature data obtained from a voice that can be sensed by the sensor unit,
the control unit determines a degree of matching between the object feature data and the voice feature data obtained based on the sensing result, and
the generation unit generates the event information when the degree of matching exceeds a predetermined matching threshold.
The information processing device according to (10) above.
(13)
An information processing method in which a computer
generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from a sensor unit that senses environmental information of a predetermined area and meta information including spatial information of the predetermined event obtained based on the sensing result.
(14)
A program that causes a computer to function as:
a sensor unit that senses environmental information of a predetermined area, and
a generation unit that generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from the sensor unit and meta information including spatial information of the predetermined event obtained based on the sensing result.
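Items (11) to (14) describe the learning side in isolation. Below is a minimal sketch of such a generation unit, under the same illustrative assumptions as the earlier fragments; nothing here is the publication's own code, and the threshold default is invented:

```python
class GenerationUnit:
    """Associates event feature data with meta info into event information."""

    def __init__(self, match_threshold: float = 0.5):  # assumed default value
        self.match_threshold = match_threshold

    def generate(self, object_feature, voice_feature, spatial_info, matching_fn):
        # Per item (12): only an object/voice pair whose degree of matching
        # exceeds the matching threshold becomes an event; otherwise nothing
        # is generated.
        if matching_fn(object_feature, voice_feature) <= self.match_threshold:
            return None
        return {"object": object_feature, "voice": voice_feature,
                "meta": {"position": spatial_info}}
```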
2 Sensor unit
3 Communication unit
4 Storage unit
5 Control unit
6 Drive unit
21 Microphone sensor
22 Camera sensor
23 Depth sensor
41 Voice feature DB
42 Object mask DB
43 Object feature DB
44,144 Event meta information DB
45,145 Event feature DB
46,146 Event DB
47 Threshold DB
48 Recalled event meta information
51 Voice feature extraction unit
52 Object region estimation unit
53 Object feature extraction unit
54 Sound source object estimation unit
55 Spatial position information acquisition unit
56 Time information acquisition unit
57 Learning/recall unit
58 Action plan control unit
70 Feature extraction unit
80 Event meta information acquisition unit
90 Event memory unit
Claims (13)
- An information processing device comprising: a sensor unit that senses environmental information of a predetermined area; a storage unit that stores event information including event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data; and a control unit that retrieves the event information from the storage unit based on a sensing result from the sensor unit and acquires the spatial information included in the event information.
- The information processing device according to claim 1, wherein the control unit determines a degree of similarity between the sensing result sensed by the sensor unit and the event feature data stored in the storage unit, and retrieves from the storage unit, when the similarity exceeds a predetermined recall threshold, event information containing the event feature data exceeding the recall threshold.
- The information processing device according to claim 1, wherein the event feature data includes object feature data obtained from an object that can be sensed by the sensor unit and voice feature data obtained from a voice that can be sensed by the sensor unit.
- The information processing device according to claim 3, wherein the control unit retrieves from the storage unit, based on voice feature data obtained from a voice sensed by the sensor unit, event information containing voice feature data whose similarity to the obtained voice feature data exceeds a predetermined recall threshold.
- The information processing device according to claim 3, wherein the control unit retrieves from the storage unit, based on object feature data obtained from an object sensed by the sensor unit, event information containing object feature data whose similarity to the obtained object feature data exceeds a predetermined recall threshold.
- The information processing device according to claim 3, wherein the object feature data is a feature amount of the object sensed by the sensor unit, and the voice feature data is a feature amount of a voice emitted from the object sensed by the sensor unit.
- The information processing device according to claim 1, configured to be able to control a mobile robot including a drive unit that moves a housing, wherein the control unit makes an action plan based on the acquired spatial information and performs control to make the mobile robot act according to the action plan.
- The information processing device according to claim 1, which is a mobile robot.
- An information processing method in which a computer retrieves event information, based on a sensing result from a sensor unit that senses environmental information of a predetermined area, from a storage unit storing event information that includes event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data, and outputs the spatial information included in the event information.
- A program that causes a computer to function as: a sensor unit that senses environmental information of a predetermined area; a storage unit that stores event information including event feature data relating to a predetermined event and meta information including spatial information of the predetermined event associated with the event feature data; and a control unit that retrieves the event information from the storage unit based on a sensing result from the sensor unit and outputs the spatial information included in the event information.
- An information processing device comprising: a sensor unit that senses environmental information of a predetermined area; and a generation unit that generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from the sensor unit and meta information including spatial information of the predetermined event obtained based on the sensing result.
- An information processing method in which a computer generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from a sensor unit that senses environmental information of a predetermined area and meta information including spatial information of the predetermined event obtained based on the sensing result.
- A program that causes a computer to function as: a sensor unit that senses environmental information of a predetermined area; and a generation unit that generates event information by associating, with each other, event feature data relating to a predetermined event obtained based on a sensing result from the sensor unit and meta information including spatial information of the predetermined event obtained based on the sensing result.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/641,784 US20240042619A1 (en) | 2019-09-17 | 2020-07-15 | Information processing apparatus, information processing method, and program |
EP20864454.2A EP4032594A4 (en) | 2019-09-17 | 2020-07-15 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM |
JP2021546525A JPWO2021053949A1 (ja) | 2019-09-17 | 2020-07-15 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019-168590 | 2019-09-17 | ||
JP2019168590 | 2019-09-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021053949A1 true WO2021053949A1 (ja) | 2021-03-25 |
Family
ID=74884166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/027500 WO2021053949A1 (ja) | 2019-09-17 | 2020-07-15 | 情報処理装置、情報処理方法、およびプログラム |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240042619A1 (ja) |
EP (1) | EP4032594A4 (ja) |
JP (1) | JPWO2021053949A1 (ja) |
WO (1) | WO2021053949A1 (ja) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09218955A (ja) | 1996-02-14 | 1997-08-19 | Hitachi Ltd | 位置認識方法及び装置 |
WO2014167700A1 (ja) | 2013-04-12 | 2014-10-16 | 株式会社日立製作所 | 移動ロボット、及び、音源位置推定システム |
JP2018163293A (ja) * | 2017-03-27 | 2018-10-18 | シャープ株式会社 | 情報端末、情報端末の制御方法、および制御プログラム |
JP2019010728A (ja) * | 2016-03-28 | 2019-01-24 | Groove X株式会社 | お出迎え行動する自律行動型ロボット |
US20190164218A1 (en) * | 2016-07-13 | 2019-05-30 | Sony Corporation | Agent robot control system, agent robot system, agent robot control method, and storage medium |
JP2019113696A (ja) * | 2017-12-22 | 2019-07-11 | カシオ計算機株式会社 | 発話タイミング判定装置、ロボット、発話タイミング判定方法及びプログラム |
WO2019151387A1 (ja) * | 2018-01-31 | 2019-08-08 | Groove X株式会社 | 経験に基づいて行動する自律行動型ロボット |
-
2020
- 2020-07-15 EP EP20864454.2A patent/EP4032594A4/en active Pending
- 2020-07-15 WO PCT/JP2020/027500 patent/WO2021053949A1/ja active Application Filing
- 2020-07-15 JP JP2021546525A patent/JPWO2021053949A1/ja active Pending
- 2020-07-15 US US17/641,784 patent/US20240042619A1/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09218955A (ja) | 1996-02-14 | 1997-08-19 | Hitachi Ltd | 位置認識方法及び装置 |
WO2014167700A1 (ja) | 2013-04-12 | 2014-10-16 | 株式会社日立製作所 | 移動ロボット、及び、音源位置推定システム |
JP2019010728A (ja) * | 2016-03-28 | 2019-01-24 | Groove X株式会社 | お出迎え行動する自律行動型ロボット |
US20190164218A1 (en) * | 2016-07-13 | 2019-05-30 | Sony Corporation | Agent robot control system, agent robot system, agent robot control method, and storage medium |
JP2018163293A (ja) * | 2017-03-27 | 2018-10-18 | シャープ株式会社 | 情報端末、情報端末の制御方法、および制御プログラム |
JP2019113696A (ja) * | 2017-12-22 | 2019-07-11 | カシオ計算機株式会社 | 発話タイミング判定装置、ロボット、発話タイミング判定方法及びプログラム |
WO2019151387A1 (ja) * | 2018-01-31 | 2019-08-08 | Groove X株式会社 | 経験に基づいて行動する自律行動型ロボット |
Non-Patent Citations (1)
Title |
---|
See also references of EP4032594A4 |
Also Published As
Publication number | Publication date |
---|---|
EP4032594A4 (en) | 2022-11-16 |
US20240042619A1 (en) | 2024-02-08 |
EP4032594A1 (en) | 2022-07-27 |
JPWO2021053949A1 (ja) | 2021-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102243179B1 (ko) | 이동 로봇 및 그 제어방법 | |
US20190080245A1 (en) | Methods and Systems for Generation of a Knowledge Graph of an Object | |
US10049267B2 (en) | Autonomous human-centric place recognition | |
JP2017045447A (ja) | 地図生成方法、自己位置推定方法、ロボットシステム、およびロボット | |
JP7375748B2 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
CN109551476A (zh) | 结合云服务系统的机器人系统 | |
US20180200884A1 (en) | Robot apparatus, methods and computer products | |
KR102024094B1 (ko) | 인공지능을 이용한 이동 로봇 및 그 제어방법 | |
CN109389641A (zh) | 室内地图综合数据生成方法及室内重定位方法 | |
KR20210029586A (ko) | 이미지 내의 특징적 객체에 기반하여 슬램을 수행하는 방법 및 이를 구현하는 로봇과 클라우드 서버 | |
JP6583450B2 (ja) | 移動ロボットによる外観モデル維持のための事前対応的データ取得 | |
US11055341B2 (en) | Controlling method for artificial intelligence moving robot | |
JP2021536075A (ja) | 拡充識別器を訓練するための装置および方法 | |
AU2017256477A1 (en) | Mobile robot, system for multiple mobile robots, and map learning method of mobile robot | |
US10339381B2 (en) | Control apparatus, control system, and control method | |
JP6991317B2 (ja) | 画像及び電波単語に基づく移動機器の改良された位置認識 | |
WO2021053949A1 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
US20230147768A1 (en) | Adaptive learning system for localizing and mapping user and object using an artificially intelligent machine | |
WO2020017111A1 (ja) | エージェント、存在確率マップ作成方法、エージェントの行動制御方法、及びプログラム | |
KR20210095284A (ko) | 사용자의 위치를 결정하는 시스템 및 방법 | |
JP4569663B2 (ja) | 情報処理装置、情報処理方法、及びプログラム | |
JP2005074562A (ja) | ロボット装置、ロボット装置の制御方法、及び記録媒体 | |
JP2005271137A (ja) | ロボット装置及びその制御方法 | |
US20220262225A1 (en) | Information processing device, method, and program | |
Choi et al. | A practical solution to SLAM and navigation in home environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20864454 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2021546525 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 17641784 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2020864454 Country of ref document: EP Effective date: 20220419 |