WO2024180905A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Info

Publication number
WO2024180905A1
Authority
WO
WIPO (PCT)
Prior art keywords
information processing
name
processing device
unit
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2024/000123
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
一美 青山
徳宏 西川
舜一 関口
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp
Priority to JP2025503611A
Publication of WO2024180905A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63HTOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
    • A63H30/00Remote-control arrangements specially adapted for toys, e.g. for toy vehicles
    • A63H30/02Electrical arrangements
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/08Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Definitions

  • This disclosure relates to an information processing device and an information processing method.
  • Language is a means by which humans can freely communicate their intentions without any special training. For this reason, language is expected to serve as an interface for instructing robots to perform a variety of tasks.
  • Patent Document 1 discloses an entertainment robot that learns the user's face and name through linguistic interaction (i.e., conversation).
  • Language allows objects to be expressed in a variety of ways. Therefore, when a user instructs a robot to operate an object, it is important to specify the target object using a linguistic expression that uniquely represents the object, and for the user and the robot to recognize the same object as the target object. However, it places a heavy burden on the user to have to think of a linguistic expression for an object that can be distinguished from other objects every time they instruct the robot to operate the object.
  • The present disclosure provides an information processing device including: an image recall unit that extracts a registered object from a captured image and outputs an identifier corresponding to the extracted object; a named object registration unit that registers the correspondence between the name of the object and the identifier; and an overall control unit that determines the correspondence between the name included in the instruction text and the object included in the captured image based on the identifier and executes an action on the object instructed by the instruction text.
  • The present disclosure also provides an information processing method including: extracting a registered object from a captured image, outputting an identifier corresponding to the extracted object, registering a correspondence between the name of the object and the identifier, determining a correspondence between the name included in instruction text and the object included in the captured image based on the identifier, and executing an action on the object instructed by the instruction text.
  • FIG. 1 is an explanatory diagram illustrating an overview of an information processing device according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram showing a functional configuration of an information processing device according to an embodiment of the present invention.
  • FIG. 3 is a flowchart showing the flow of a first operation example of the information processing device.
  • FIG. 4 is a block diagram showing a data flow in a first operation example.
  • FIG. 5 is a flowchart showing the flow of a second operation example of the information processing device.
  • FIG. 6 is a block diagram showing the data flow of a second operation example when a target object is identified by name.
  • FIG. 7 is a block diagram showing the data flow of a second operation example when a target object is identified by something other than its name.
  • FIG. 8 is a block diagram showing an example of a hardware configuration of an information processing device according to an embodiment of the present invention.
  • FIG. 1 is an explanatory diagram illustrating an overview of an information processing device according to an embodiment of the present disclosure.
  • the robot 20 may be, for example, a manipulator robot capable of grasping an object 30, a mobile manipulator robot capable of grasping an object 30 and moving, a single-arm or dual-arm robot, a pet-type robot, or a humanoid robot. These robots 20 may be used for object manipulation, such as fetching things or putting things away, in a home where there are various objects 30.
  • the robot 20 may also be an entity in a virtual space that does not have a physical form.
  • the robot 20 may be a machine, robot, agent, or NPC (Non-Player Character) that exists in the virtual space.
  • the user 10 may also instruct these robots 20 in the virtual space to operate the object 30 using natural language such as speech.
  • the user 10 will consider what linguistic expression should be used to express the target object 31 so that the robot 20 can accurately recognize the target object 31, and then verbally instruct the robot 20 to manipulate the object.
  • However, the target object 31 cannot always be specified with relatively simple linguistic expressions such as "ball" or "cup."
  • the technology disclosed herein has been conceived in light of the above circumstances.
  • the information processing device registers the name of the target object 31 that the user 10 has taught to the robot 20 in association with an image of the target object 31, thereby enabling the user 10 to use the name to instruct the robot 20 to perform an action on the target object 31.
  • the information processing device can reduce the diversity and ambiguity when expressing the target object 31 in language, thereby improving the probability that the robot 20 will succeed in a task instructed by the user 10.
  • Fig. 2 is a block diagram showing the functional configuration of the information processing device 100 according to this embodiment.
  • the information processing device 100 includes an input unit 110, an overall control unit 120, an instruction analysis unit 130, an image recall unit 140, a named object registration unit 150, and a robot control unit 160.
  • the information processing device 100 is mounted on, for example, a robot 20.
  • the information processing device 100 can cause the robot 20 to perform an action instructed by a user 10 through spoken voice or text.
  • the input unit 110 accepts various types of information input from the outside. Specifically, the input unit 110 may accept instruction text for the robot 20.
  • the instruction text is an operation instruction for the robot 20 expressed in natural language.
  • the instruction text may be, for example, text obtained by voice recognition of a voice uttered by the user 10, or may be text input by the user 10 using a keyboard or the like.
  • the input unit 110 may also receive an image captured by an imaging device mounted on the robot 20 as an input image.
  • the information processing device 100 can cause the robot 20 to perform an action toward the target object 31 instructed by the instruction text by comparing the target object 31 shown in the input image with the content instructed by the instruction text.
  • the overall control unit 120 executes control and judgment in the information processing device 100. Specifically, the overall control unit 120 cooperates with the instruction analysis unit 130, the image recall unit 140, and the named object registration unit 150 to grasp the content of the action to be performed on the target object 31 instructed by the input instruction text, and outputs a drive command to the robot control unit 160 to perform the grasped action.
  • the robot control unit 160 controls the drive of each part of the robot 20 based on the output drive command, and the robot 20 can execute the action to the target object 31 instructed by the instruction text.
  • the instruction analysis unit 130 analyzes the instruction text expressed in natural language to identify the category of the action instructed by the user 10 and the target object 31 of the instructed action. Specifically, the instruction analysis unit 130 may analyze the instruction text using a known text analysis technique to identify the category of the action instructed by the instruction text and the target object 31 of the action. For example, the instruction analysis unit 130 may identify the category of the action instructed by the instruction text as one of four types: "naming the target object 31," "transporting the target object 31 to the user 10," "placing the target object 31 at a specified location,” or "no action instruction.”
  • the instruction analysis unit 130 may also identify from the input image the target object 31 of the action instructed by the instruction text, and cut out and output an image area including the identified target object 31 from the input image. For example, if the target object 31 is specified in the instruction text by a linguistic expression such as color, shape, or a general noun, the instruction analysis unit 130 may identify an object corresponding to the linguistic expression as the target object 31 from the input image, and output an image area including the target object 31. Also, if there is only one object shown in the input image, or if it is clear that a specific object is being indicated in the input image by the user 10 pointing at it, etc., the instruction analysis unit 130 may identify the object as the target object 31, and output an image area including the target object 31.
  • the instruction analysis unit 130 may further extract a name to be given to the target object 31 from the instruction text. In such a case, the instruction analysis unit 130 outputs an image area including the target object 31 to be named, and the name to be given to the target object 31.
  • the image area including the target object 31 is output to the image recall unit 140 via the overall control unit 120, and the name to be given to the target object 31 is output to the named object registration unit 150 via the overall control unit 120.
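  • As a minimal sketch of the analysis output described above, the four action categories and the extracted target and name could be represented as follows. The names `ActionCategory`, `AnalysisResult`, and `analyze_instruction` are hypothetical, and the keyword rules are only a toy stand-in for the "known text analysis technique" that the disclosure leaves unspecified; cutting out the image area is not covered here.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionCategory(Enum):
    NAMING = "naming"   # "naming the target object"
    BRING = "bring"     # "transporting the target object to the user"
    PLACE = "place"     # "placing the target object at a specified location"
    NONE = "none"       # "no action instruction"


@dataclass
class AnalysisResult:
    category: ActionCategory
    target: Optional[str] = None  # linguistic expression for the target object
    name: Optional[str] = None    # name to give (naming instructions only)


def analyze_instruction(text: str) -> AnalysisResult:
    """Classify an instruction with toy keyword rules (assumption, not the patent's method)."""
    s = text.strip().rstrip(".!")
    m = re.fullmatch(r"(this|that) is (?:an? |the )?(.+)", s, re.IGNORECASE)
    if m:
        return AnalysisResult(ActionCategory.NAMING, target=m.group(1).lower(), name=m.group(2))
    m = re.fullmatch(r"bring me (?:an? |the )?(.+)", s, re.IGNORECASE)
    if m:
        return AnalysisResult(ActionCategory.BRING, target=m.group(1))
    m = re.fullmatch(r"(?:put|place) (?:an? |the )?(.+?) (?:on|in|at) .+", s, re.IGNORECASE)
    if m:
        return AnalysisResult(ActionCategory.PLACE, target=m.group(1))
    return AnalysisResult(ActionCategory.NONE)
```

  • In this sketch, a naming instruction yields both a target expression ("this") and a name, whereas a transport instruction yields only the target expression, which may later turn out to be either a registered name or a general description.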
  • the image recall unit 140 registers images of various objects in association with identifiers, and outputs an identifier corresponding to the object included in the input image.
  • the identifier is, for example, an identification number, code, or symbol.
  • the image recall unit 140 compares the input image with the images of the various registered objects, and if the object in the input image matches any of the registered objects, it outputs an identifier corresponding to the matching object and an image area in the input image that includes the object. On the other hand, if the object in the input image does not match any of the registered objects, the image recall unit 140 newly registers the image area of the object in the input image and outputs a new identifier corresponding to the object.
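  • The match-or-register behavior of the image recall unit 140 can be sketched as below. The disclosure does not say how images are compared, so this example assumes each object image has already been reduced to a feature vector and uses cosine similarity with a threshold; the class `ImageRecall`, its methods, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ImageRecall:
    """Toy registry mapping identifiers to feature vectors of registered objects."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed similarity cut-off for a "match"
        self._features: Dict[int, Tuple[float, ...]] = {}
        self._next_id = 1

    def recall(self, feature: Sequence[float]) -> Optional[int]:
        """Return the identifier of the best match, or None if "not registered"."""
        best_id, best_score = None, self.threshold
        for obj_id, ref in self._features.items():
            score = cosine(feature, ref)
            if score >= best_score:
                best_id, best_score = obj_id, score
        return best_id

    def register(self, feature: Sequence[float]) -> int:
        """Store the feature under a newly issued identifier and return it."""
        obj_id = self._next_id
        self._next_id += 1
        self._features[obj_id] = tuple(feature)
        return obj_id

    def recall_or_register(self, feature: Sequence[float]) -> int:
        """Output the matching identifier, or register the object and output a new one."""
        found = self.recall(feature)
        return found if found is not None else self.register(feature)
```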
  • the named object registration unit 150 registers the correspondence between the identifier output by the image recall unit 140 in association with the object and the name of the object.
  • the named object registration unit 150 may register one name in association with the identifier, or may register two or more names in association with the identifier.
  • the named object registration unit 150 outputs the identifier associated with the inputted name of the object.
  • the named object registration unit 150 may output the name of the object associated with the inputted identifier.
  • the named object registration unit 150 registers the correspondence between the name to be given to the target object 31 and the identifier output by the image recall unit 140 in association with the target object 31.
  • the named object registration unit 150 then outputs to the overall control unit 120 a message indicating that registration of the correspondence has been completed.
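  • The two-way lookup described for the named object registration unit 150 (name to identifier, identifier to names, with more than one name allowed per identifier) could look like the sketch below. The class name `NamedObjectRegistry`, its methods, and the case-insensitive normalization of names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class NamedObjectRegistry:
    """Toy bidirectional mapping between object names and identifiers."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, int] = {}
        self._names_by_id: Dict[int, List[str]] = defaultdict(list)

    @staticmethod
    def _normalize(name: str) -> str:
        # Assumption: names are compared case-insensitively.
        return name.strip().casefold()

    def register(self, name: str, obj_id: int) -> None:
        """Associate a name with an identifier; an identifier may have several names."""
        key = self._normalize(name)
        old_id = self._id_by_name.get(key)
        if old_id is not None and old_id != obj_id:
            # The name is being moved to another object; drop the stale link.
            self._names_by_id[old_id].remove(key)
        self._id_by_name[key] = obj_id
        if key not in self._names_by_id[obj_id]:
            self._names_by_id[obj_id].append(key)

    def id_for(self, name: str) -> Optional[int]:
        """Return the identifier registered for a name, or None if unknown."""
        return self._id_by_name.get(self._normalize(name))

    def names_for(self, obj_id: int) -> List[str]:
        """Return all names registered for an identifier (possibly empty)."""
        return list(self._names_by_id.get(obj_id, []))
```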
  • the robot control unit 160 controls the operation of each part of the robot 20 based on the output from the overall control unit 120. Specifically, the robot control unit 160 controls the operation of each part so that the robot 20 executes the action on the target object 31 specified by the instruction text based on the category of the action specified in the instruction text and the image area including the target object 31. For example, the robot control unit 160 may control the operation of the manipulator or moving mechanism of the robot 20 so that the robot 20 executes the action on the target object 31 specified by the instruction text.
  • Fig. 3 is a flow chart showing the flow of the first operation example of the information processing device 100.
  • Fig. 4 is a block diagram showing the flow of data in the first operation example.
  • the first operation example is an operation example when the operation instructed by the instruction text is to "name the target object 31."
  • instruction text Txt1 and input image Img1 are input to the information processing device 100 (S101).
  • the instruction text Txt1 is an action instruction to the robot 20, expressed as the character string "This is a den-den-daiko."
  • the input image Img1 is an image of the target object 31.
  • the instruction analysis unit 130 analyzes the instruction text Txt1 (S102). As a result, the instruction analysis unit 130 outputs that the category of the action instructed in the instruction text Txt1 is naming, that the target object 31 of the instructed action is "this" (the object that appears uniquely in the image), and that the name given to the target object 31 is "den-den-daiko."
  • the input image Img1 is input to the image recall unit 140.
  • the image recall unit 140 outputs the result of "not registered."
  • the overall control unit 120 determines whether the category of the action instructed in the instruction text Txt1 is naming (S103). If the category of the instructed action is naming (S103/Yes), the overall control unit 120 outputs the registered image area reg1 of the target object 31 cut out from the input image Img1 to the image recall unit 140, and causes the image of the target object 31 in the registered image area reg1 to be registered in the image recall unit 140. As a result, the image recall unit 140 registers the image of the target object 31 in association with a new identifier (registered ID), and outputs the new identifier (registered ID) corresponding to the image of the target object 31 (S105).
  • the overall control unit 120 outputs the identifier (registration ID) corresponding to the target object 31 issued by the image recall unit 140 and the name of the target object 31, "den-den-daiko", to the named object registration unit 150.
  • the named object registration unit 150 registers the correspondence between the identifier (registration ID) corresponding to the target object 31 and the name of the target object 31, "den-den-daiko" (S106).
  • the overall control unit 120 causes the robot 20 to notify the user 10 via the robot control unit 160 that the name of the target object 31 has been registered (S107).
  • the robot 20 may notify the user 10 using an image display, light emission, sound, audio, or the like.
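  • The naming flow of the first operation example (S103, S105, S106, S107) can be condensed into a short sketch. Here a plain list stands in for the image recall unit 140 and a dictionary for the named object registration unit 150; the function `run_naming_flow`, the string category labels, and the notification message are hypothetical.

```python
from typing import Dict, List, Tuple


def run_naming_flow(
    category: str,
    name: str,
    region: bytes,
    images: List[bytes],
    names: Dict[str, int],
) -> Tuple[bool, str]:
    """Register a cut-out image region under a new identifier and bind a name to it."""
    # S103: proceed only when the instructed action is naming.
    if category != "naming":
        return False, "not a naming instruction"
    # S105: register the image area and issue a new identifier (1-based here).
    images.append(region)
    obj_id = len(images)
    # S106: register the correspondence between the name and the identifier.
    names[name.casefold()] = obj_id
    # S107: return a message the robot could present to the user.
    return True, f'Registered "{name}" as object {obj_id}'
```

  • For example, calling `run_naming_flow("naming", "Den-den-daiko", b"reg1", [], {})` registers the region and returns a success flag with a confirmation message, while any other category leaves both stores untouched.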
  • Fig. 5 is a flow chart showing the flow of the second operation example of the information processing device 100.
  • Fig. 6 is a block diagram showing the data flow of the second operation example when the target object 31 is specified by its name.
  • the second operation example is an operation example in which the operation instructed by the instruction text is "transport the target object 31 to the user 10" or "place the target object 31 at a specified location."
  • the name of the target object 31 is "den-den-daiko" and has already been registered in the information processing device 100.
  • instruction text Txt2 and input image Img2 are input to the information processing device 100 (S101).
  • the instruction text Txt2 is an action instruction to the robot 20, represented by the character string "Bring me the den-den-daiko."
  • the input image Img2 is an image of multiple objects including the target object 31.
  • the instruction analysis unit 130 analyzes the instruction text Txt2 (S102). As a result, the instruction analysis unit 130 outputs that the category of the action instructed in the instruction text Txt2 is transportation (Bring), and that the target object 31 (Target) of the instructed action is "den-den-daiko."
  • the input image Img2 is input to the image recall unit 140. Since the image of the target object 31 shown in the input image Img2 is registered in the image recall unit 140, the image recall unit 140 outputs an identifier (registered ID) corresponding to the target object 31, and also cuts out and outputs an image region Obj2 including the target object 31 from the input image Img2.
  • the overall control unit 120 determines whether the category of the action instructed in the instruction text Txt2 is transporting the target object 31 (S201).
  • If the category of the instructed action is transporting the target object 31 (S201/Yes), the overall control unit 120 judges whether the expression of the target object 31 in the instruction text Txt2 ("den-den-daiko") is a name registered in the named object registration unit 150 (S202). If the expression of the target object 31 ("den-den-daiko") is a name registered in the named object registration unit 150 (S202/Yes), the overall control unit 120 receives an identifier (registered ID) associated with the name from the named object registration unit 150.
  • the overall control unit 120 determines whether or not an image region Obj2 including the target object 31 has been cut out from the input image Img2 by the image recalling unit 140 (S203). If an image region Obj2 including the target object 31 has been cut out from the input image Img2 (S203/Yes), the overall control unit 120 receives the image region Obj2 including the target object 31 and an identifier (registered ID) corresponding to the target object 31 from the image recalling unit 140.
  • the overall control unit 120 judges whether the identifier (registration ID) received from the named object registration unit 150 matches the identifier (registration ID) received from the image recall unit 140 (S204). If both identifiers (registration IDs) match (S204/Yes), the overall control unit 120 can identify that the target object 31 expressed as "den-den-daiko" in the instruction text Txt2 is the target object 31 included in the image region Obj2 of the input image Img2 (S205).
  • the overall control unit 120 judges whether the instructed action is executable (S206). If the instructed action is executable, the overall control unit 120 drives each part of the robot 20 via the robot control unit 160 to cause the robot 20 to transport the target object 31 (S207).
  • If the overall control unit 120 judges No in any of steps S201, S203, S204, or S206, the overall control unit 120 stops executing the operation instructed in the instruction text Txt2 and ends the operation.
  • the named object registration unit 150 receives the name of the target object 31 in the instruction text Txt2 and outputs an identifier (registered ID) associated with the name, but the technology disclosed herein is not limited to such an example.
  • the named object registration unit 150 may receive an identifier corresponding to the target object 31 output from the image recall unit 140 and output the name of the target object 31 associated with the input identifier. In such a case, the overall control unit 120 determines whether the name output from the named object registration unit 150 matches the name of the target object 31 in the instruction text Txt2. If both names match, the overall control unit 120 can identify the target object 31 included in the image region Obj2 as the target object 31 represented by its name in the instruction text Txt2, as described above.
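  • The two comparison directions described above (looking up the identifier for the name, or looking up the names for the identifier) can be contrasted in a small sketch. Plain dictionaries stand in for the named object registration unit 150, and the function names `match_by_identifier` and `match_by_name` are hypothetical.

```python
from typing import Dict, List


def match_by_identifier(expression: str, id_by_name: Dict[str, int], recalled_id: int) -> bool:
    """Name -> identifier lookup, then compare with the identifier from image recall (S204)."""
    name_id = id_by_name.get(expression.casefold())
    return name_id is not None and name_id == recalled_id


def match_by_name(expression: str, names_by_id: Dict[int, List[str]], recalled_id: int) -> bool:
    """Identifier -> names lookup, then compare with the name used in the instruction text."""
    return expression.casefold() in names_by_id.get(recalled_id, [])
```

  • Both functions answer the same question, namely whether the object found in the image is the one the user named. The second form is convenient when several objects appear in the image, since each recalled identifier can be checked against the instructed name in turn.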
  • FIG. 7 is a block diagram showing the data flow of a second operation example when the target object 31 is identified by something other than a name.
  • instruction text Txt3 and input image Img3 are input to the information processing device 100 (S101).
  • the instruction text Txt3 is an action instruction to the robot 20, represented by the character string "Bring me a drum with a handle."
  • the input image Img3 is an image of multiple objects including the target object 31.
  • the instruction analysis unit 130 analyzes the instruction text Txt3 (S102). As a result, the instruction analysis unit 130 outputs that the category of the action instructed in the instruction text Txt3 is transportation (Bring), and that the target object 31 (Target) of the instructed action is a "drum with a handle." At this time, the instruction analysis unit 130 can identify the target object 31 from the input image Img3 using the linguistic expression "drum with a handle," and therefore cuts out the image region Obj3 that includes the target object 31 that corresponds to the "drum with a handle."
  • Along with step S102, the input image Img3 is input to the image recall unit 140.
  • the image recall unit 140 outputs the result of "not registered."
  • the overall control unit 120 determines whether the category of the action instructed in the instruction text Txt3 is transporting the target object 31 (S201). If the category of the action instructed is transporting the target object 31 (S201/Yes), the overall control unit 120 determines whether the expression of the target object 31 in the instruction text Txt3 is a name registered in the named object registration unit 150 (S202).
  • the expression of the target object 31 in the instruction text Txt3 is not a name registered in the named object registration unit 150, but a general linguistic expression, "a drum with a handle" (S202/No). Therefore, the overall control unit 120 can identify the area including the target object 31 as the image area Obj3 from the output result of the instruction analysis unit 130 (S205).
  • the category of the action instructed in the instruction text Txt3 and the target object 31 of the instructed action can be identified, and the overall control unit 120 judges whether or not the instructed action is executable (S206). If the instructed action is executable, the overall control unit 120 can drive each part of the robot 20 via the robot control unit 160 to cause the robot 20 to transport the target object 31 (S207).
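  • Putting the branches of the second operation example together, the decision sequence S201 to S207 might be sketched as follows. The function `run_transport_flow`, the `Region` tuple layout, the string category labels, and the `is_executable` callback are hypothetical; actually driving the robot is left to the caller.

```python
from typing import Callable, Dict, Optional, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)


def run_transport_flow(
    category: str,
    expression: str,
    id_by_name: Dict[str, int],
    recalled: Optional[Tuple[int, Region]],
    analysis_region: Optional[Region],
    is_executable: Callable[[Region], bool],
) -> Optional[Region]:
    """Return the image region to act on, or None if the flow stops early."""
    if category not in ("bring", "place"):        # S201/No
        return None
    name_id = id_by_name.get(expression.casefold())
    if name_id is not None:                       # S202/Yes: registered name
        if recalled is None:                      # S203/No: nothing cut out by image recall
            return None
        image_id, region = recalled
        if image_id != name_id:                   # S204/No: identifiers differ
            return None
    else:                                         # S202/No: general linguistic expression
        if analysis_region is None:               # assumption: stop if analysis found no region
            return None
        region = analysis_region
    if not is_executable(region):                 # S206/No
        return None
    return region                                 # S205 identified; caller performs S207
```

  • With a registered name the region comes from the image recall result, and with a general expression such as "drum with a handle" it comes from the instruction analysis result; in both cases the same executability check follows.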
  • the information processing device 100 can uniquely specify the target object 31 without using complex linguistic expressions by registering an image of the target object 31 and the name of the target object 31 in association with each other. This allows the information processing device 100 to reduce the diversity and ambiguity when expressing the target object 31 in language, thereby improving the probability that the robot 20 will succeed in the task instructed by the user 10.
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the information processing device 100 according to this embodiment.
  • the functions of the information processing device 100 can be realized by collaboration between software and hardware described below.
  • the functions of the overall control unit 120, the instruction analysis unit 130, the image recall unit 140, the named object registration unit 150, and the robot control unit 160 can be executed by, for example, the CPU 901.
  • the functions of the input unit 110 can be executed by, for example, the input device 906, the connection port 910, or the communication device 911.
  • the information processing device 100 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903.
  • the information processing device 100 may further include a host bus 904a, a bridge 904, an external bus 904b, an interface 905, an input device 906, an output device 907, a storage device 908, a drive 909, a connection port 910, or a communication device 911.
  • the information processing device 100 may have a processing circuit such as a DSP (Digital Signal Processor) or an ASIC (Application Specific Integrated Circuit) instead of or together with the CPU 901.
  • the CPU 901 functions as an arithmetic processing device or control device, and controls operations within the information processing device 100 in accordance with various programs recorded in the ROM 902, the RAM 903, the storage device 908, or a removable recording medium attached to the drive 909.
  • the ROM 902 stores programs used by the CPU 901, arithmetic parameters, etc.
  • the RAM 903 temporarily stores programs used in the execution of the CPU 901, and parameters used during the execution of the programs.
  • the CPU 901, ROM 902, and RAM 903 are interconnected by a host bus 904a capable of high-speed data transmission.
  • the host bus 904a is connected to an external bus 904b, such as a PCI (Peripheral Component Interconnect/Interface) bus, via a bridge 904, and the external bus 904b is connected to various components via an interface 905.
  • the input device 906 is, for example, a device that accepts input from a user, such as a mouse, keyboard, touch panel, button, switch, or lever.
  • the input device 906 may also be a microphone that detects the user's voice.
  • the input device 906 may also be, for example, a remote control device that uses infrared rays or radio waves, or may be an externally connected device that supports the operation of the information processing device 100.
  • the input device 906 further includes an input control circuit that outputs an input signal generated based on information input by the user to the CPU 901. By operating the input device 906, the user can input various data or instruct the information processing device 100 to perform processing operations.
  • the output device 907 is a device capable of visually or audibly presenting information acquired or generated by the information processing device 100 to the user.
  • the output device 907 may be, for example, a display device such as an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), an OLED (Organic Light Emitting Diode) display, a hologram, or a projector, or may be a sound output device such as a speaker or headphones, or may be a printing device such as a printer device.
  • the output device 907 can output information acquired by processing by the information processing device 100 as video such as text or images, and sound such as voice or audio.
  • the storage device 908 is a data storage device configured as an example of a memory unit of the information processing device 100.
  • the storage device 908 may be configured, for example, with a magnetic memory device such as a hard disk drive (HDD), a semiconductor memory device, an optical memory device, or a magneto-optical memory device.
  • the storage device 908 can store programs executed by the CPU 901, various data, or various data acquired from the outside.
  • the drive 909 is a device for reading or writing removable recording media such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, and is built into the information processing device 100 or is externally attached.
  • the drive 909 can read information recorded on a removable recording medium that is attached and output the information to the RAM 903.
  • the drive 909 can also write information to a removable recording medium that is attached.
  • the connection port 910 is a port for directly connecting an external device to the information processing device 100.
  • the connection port 910 may be, for example, a Universal Serial Bus (USB) port, an IEEE 1394 port, or a Small Computer System Interface (SCSI) port.
  • the connection port 910 may also be an RS-232C port, an optical audio terminal, or an HDMI (registered trademark) (High-Definition Multimedia Interface) port.
  • the communication device 911 is, for example, a communication interface configured with a communication device for connecting to the communication network 920.
  • the communication device 911 may be, for example, a communication card for a wired or wireless LAN (Local Area Network), Wi-Fi (registered trademark), Bluetooth (registered trademark), or WUSB (Wireless USB).
  • the communication device 911 may also be a router for optical communications, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various types of communications.
  • the communication device 911 can transmit and receive signals, for example, using a specific protocol such as TCP/IP between the Internet or other communication devices.
  • the communication network 920 connected to the communication device 911 is a wired or wireless network, and may be, for example, an Internet communication network, a home LAN, an infrared communication network, a radio wave communication network, or a satellite communication network.
  • (1) An information processing device comprising: an image recall unit that extracts a registered object from a captured image and outputs an identifier corresponding to the extracted object; a named object registration unit that registers a correspondence between the name of the object and the identifier; and an overall control unit that grasps a correspondence between the name included in the instruction text and the object included in the captured image based on the identifier, and executes an action on the object instructed by the instruction text.
  • (2) The information processing device described in (1), wherein, when the instruction by the instruction text is an instruction to give a name to an object in the captured image, the named object registration unit registers the name contained in the instruction text in association with an identifier output from the image recall unit to which the captured image is input.
  • the named object registration unit outputs the identifier corresponding to the name included in the instruction text;
  • the overall control unit causes the action toward the object to be executed based on the analysis results of the instruction analysis unit.
  • the information processing device according to any one of (1) to (8), wherein the overall control unit causes a robot equipped with a manipulator to perform an action on the object.
  • the instruction text is generated by performing voice recognition on a voice uttered by a user of the robot.
  • (11) An information processing method comprising: extracting a registered object from a captured image and outputting an identifier corresponding to the extracted object; registering a correspondence between the name of the object and the identifier; and determining, based on the identifier, a correspondence between the name included in an instruction text and the object included in the captured image, and executing the action on the object instructed by the instruction text.
  • REFERENCE SIGNS LIST
    10 User
    20 Robot
    30 Object
    31 Target object
    100 Information processing device
    110 Input unit
    120 Overall control unit
    130 Instruction analysis unit
    140 Image recall unit
    150 Name object registration unit
    160 Robot control unit
    Img1, Img2, Img3 Input image
    reg1 Registered image area
    Obj2, Obj3 Image area
    Txt1, Txt2, Txt3 Instruction text
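
The interplay of the units recited in (1), (2), and (11) can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions, not the implementation described in the application. The class and method names (ImageRecall, NameObjectRegistry, OverallControl, identify, extract, name_object, act) are hypothetical. A captured image is modeled as a list of region feature keys (plain strings) instead of pixel data. The instruction text is assumed to have already been split into an action and a name, which stands in for the instruction analysis unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ImageRecall:
    """Toy image recall unit: maps a region's feature key to an identifier."""
    _ids: Dict[str, int] = field(default_factory=dict)

    def identify(self, region: str) -> int:
        # Assign a fresh identifier the first time a region is seen.
        return self._ids.setdefault(region, len(self._ids))

    def extract(self, image: List[str]) -> Dict[int, int]:
        # Return {identifier: region index} for regions that are registered.
        return {self._ids[r]: i for i, r in enumerate(image) if r in self._ids}


@dataclass
class NameObjectRegistry:
    """Toy name object registration unit: stores name -> identifier."""
    _by_name: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, identifier: int) -> None:
        self._by_name[name] = identifier

    def lookup(self, name: str) -> Optional[int]:
        return self._by_name.get(name)


@dataclass
class OverallControl:
    """Toy overall control unit tying the two units together."""
    recall: ImageRecall = field(default_factory=ImageRecall)
    names: NameObjectRegistry = field(default_factory=NameObjectRegistry)

    def name_object(self, name: str, region: str) -> int:
        # Naming instruction, as in (2): link the name to the recalled identifier.
        identifier = self.recall.identify(region)
        self.names.register(name, identifier)
        return identifier

    def act(self, action: str, name: str, image: List[str]) -> Optional[str]:
        # Action instruction, as in (1): resolve name -> identifier -> region.
        identifier = self.names.lookup(name)
        if identifier is None:
            return None  # the name was never registered
        found = self.recall.extract(image)
        if identifier not in found:
            return None  # the named object is not in this captured image
        return f"{action} object at region {found[identifier]}"


ctl = OverallControl()
ctl.name_object("my cup", "red-mug-features")
print(ctl.act("pick up", "my cup", ["table-features", "red-mug-features"]))
```

In this sketch the name and the image never refer to each other directly. Both are tied to the identifier, so the name lookup and the image lookup can be replaced independently.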

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Manipulator (AREA)
PCT/JP2024/000123 2023-02-28 2024-01-09 Information processing device and information processing method Ceased WO2024180905A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2025503611A JPWO2024180905A1 2023-02-28 2024-01-09

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-029076 2023-02-28
JP2023029076 2023-02-28

Publications (1)

Publication Number Publication Date
WO2024180905A1 (ja) 2024-09-06

Family

ID=92590183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/000123 Ceased WO2024180905A1 (ja) 2023-02-28 2024-01-09 Information processing device and information processing method

Country Status (2)

Country Link
JP (1) JPWO2024180905A1
WO (1) WO2024180905A1

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
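JP3951235B2 (ja) * 2003-02-19 2007-08-01 Sony Corporation Learning device, learning method, and robot device
JP4595436B2 (ja) * 2004-03-25 2010-12-08 NEC Corporation Robot, control method therefor, and control program
JP2010282199A (ja) * 2009-06-02 2010-12-16 Honda Motor Co Ltd Vocabulary acquisition device, multi-dialogue behavior system, and vocabulary acquisition program
JP4718987B2 (ja) * 2005-12-12 2011-07-06 Honda Motor Co Ltd Interface device and mobile robot equipped with the same
JP5892361B2 (ja) * 2011-08-02 2016-03-23 Sony Corporation Control device, control method, program, and robot control system
JP6468643B2 (ja) * 2015-03-09 2019-02-13 Advanced Telecommunications Research Institute International Communication system, confirmation action determination device, confirmation action determination program, and confirmation action determination method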

Also Published As

Publication number Publication date
JPWO2024180905A1 2024-09-06

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 24763390; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2025503611; Country of ref document: JP; Kind code of ref document: A
WWE WIPO information: entry into national phase. Ref document number: 2025503611; Country of ref document: JP
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 24763390; Country of ref document: EP; Kind code of ref document: A1