WO2021153102A1 - Information processing device, information processing system, information processing method, and information processing program

Information processing device, information processing system, information processing method, and information processing program

Info

Publication number
WO2021153102A1
WO2021153102A1 (PCT/JP2020/047859)
Authority
WO
WIPO (PCT)
Prior art keywords
information
information processing
speaker
terminal device
utterance
Prior art date
Application number
PCT/JP2020/047859
Other languages
English (en)
Japanese (ja)
Inventor
真里 斎藤
Original Assignee
ソニーグループ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社 filed Critical ソニーグループ株式会社
Publication of WO2021153102A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/16 Sound input; Sound output
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G10L 15/08 Speech classification or search
    • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • This disclosure relates to information processing devices, information processing systems, information processing methods and information processing programs.
  • An information processing apparatus is provided that includes a termination detection unit that detects the end of the language information of a target speaker acquired by a terminal device, and an operation control unit that performs processing for controlling the operation of the terminal device based on information about the termination detected by the termination detection unit.
  • <<1. Embodiment of the present disclosure>> <1.1. Overview>
  • In recent years, systems that understand a speaker's utterance and interact with the speaker have become widespread.
  • For example, systems that convert an input utterance into text and display it have become common.
  • Such a system is realized, for example, as a speaker-type dialogue agent such as a smart speaker, or as a human-type dialogue agent such as Pepper (registered trademark).
  • However, in such systems the text may take a long time to be displayed, and it was difficult to convey to the speaker that the utterance had been understood.
  • If, during the speaker's utterance, the agent can produce a filler (a connecting word unrelated to the utterance content), a nod, or an aizuchi (a back-channel response), the speaker can be made to feel that the dialogue agent understands the utterance. Techniques for dialogue agents that produce fillers, nods, and aizuchi during the speaker's utterance are therefore being developed.
  • For example, Patent Document 1 discloses a technique for controlling the operation of the dialogue agent when it cannot be estimated whether the agent should wait for a further utterance or act on the utterance.
  • However, if the dialogue-related behavior of the dialogue agent is controlled without regard to the intention of the speaker's utterance, the agent's behavior may interfere with the speaker's utterance.
  • The present disclosure was conceived in view of the above points, and proposes a technique capable of controlling the dialogue-related operation of a dialogue agent according to the intention of the speaker's utterance.
  • Hereinafter, the present embodiment will be described in detail in order.
  • Utterance data will be used for the explanation.
  • The terminal device 20 will be described below as an example of the dialogue agent.
  • FIG. 1 is a diagram showing a configuration example of the information processing system 1.
  • the information processing system 1 includes an information processing device 10 and a terminal device 20.
  • Various devices can be connected to the information processing device 10.
  • a terminal device 20 is connected to the information processing device 10, and information is linked between the devices.
  • the terminal device 20 is wirelessly connected to the information processing device 10.
  • For example, the information processing device 10 performs short-range wireless communication with the terminal device 20 using Bluetooth (registered trademark).
  • The information processing device 10 and the terminal device 20 may also be connected, whether wired or wireless, via various interfaces such as I2C (Inter-Integrated Circuit) and SPI (Serial Peripheral Interface), or via various networks such as a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, and mobile communication networks.
  • The information processing device 10 controls, for example, operations related to the dialogue of the terminal device 20 according to utterance (voice) data of the speaker. Specifically, the information processing device 10 controls the dialogue-related operation of the terminal device 20 based on information about the termination of the speaker's utterance data. The information processing device 10 is also capable of recognizing the speaker's utterance; for example, it performs recognition processing on the utterance data acquired by the terminal device 20.
  • The information processing device 10 is realized by, for example, a PC (Personal Computer), a WS (Workstation), or the like. However, the information processing device 10 is not limited to a PC or WS; for example, it may be an information processing device that implements the functions of the information processing device 10 as an application.
  • Terminal device 20: the terminal device 20 is the information processing device whose operation is to be controlled.
  • the terminal device 20 may be realized as any device.
  • the terminal device 20 may be realized as a speaker type device or a human type device.
  • FIG. 2 is a diagram showing an outline of the functions of the information processing system 1 according to the embodiment.
  • As shown in FIG. 2, the terminal device 20 first detects the utterance TK11 of the speaker U12.
  • Then, the terminal device 20 is controlled to direct its line of sight toward the speaker U12 while the speaker U12 is speaking (S11).
  • When the information processing system 1 detects the end of the utterance TK11, it controls the terminal device 20 to perform an operation of noting down the intention of the utterance TK11 (S12).
  • In S12, the information processing system 1 performs control to note down the linguistic information "next month" and "business trip" as the intention of the utterance TK11.
  • the terminal device 20 detects the utterance TK12 of the speaker U12. Then, S11 is performed.
  • When the information processing system 1 detects the end of the utterance TK12, it controls the terminal device 20 to perform an operation of noting down the intention of the utterance TK12 (S13).
  • In S13, the information processing system 1 performs control to note down the linguistic information "October", "business trip", and "Sapporo" as the intention of the utterance TK12. In this way, the terminal device 20 directs its line of sight toward the speaker U12 while the speaker U12 is speaking, and makes a note at the end of each utterance.
  • As a result, the information processing system 1 can more easily convey to the speaker that the terminal device 20 is performing utterance recognition and semantic analysis appropriately. For example, the information processing system 1 can easily convey that the terminal device 20 understands the speaker's linguistic information sentence by sentence. The terminal device 20 then detects the utterance TK13 of the speaker U12, and S11 is performed. When the information processing system 1 detects the end of the utterance TK13, it controls the terminal device 20 to perform an operation of noting down the intention of the utterance TK13 (S14).
  • In S14, the information processing system 1 performs control to note down the linguistic information "next month", "business trip", "Sapporo", and "hotel reservation" as the intention of the utterance TK13. Further, when the utterance TK13 includes an operation request from the speaker U12, the information processing system 1 adds, for example, information about the operation request to the scheduler TD11 of the terminal device 20, and controls the terminal device 20 to perform the operation corresponding to the added information. In S14, the information processing system 1 controls the operation of the terminal device 20 so as to perform the operation corresponding to "Teach when there is no request in October". A sketch of such a scheduler entry follows.
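  • The following is a minimal, illustrative sketch of such a scheduler entry. The class and field names are our assumptions; the patent does not specify the structure of the scheduler TD11.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class SchedulerEntry:
    """One operation request noted from an utterance (assumed structure)."""
    description: str            # e.g. the request contained in utterance TK13
    due: Optional[date] = None  # when the terminal device should act, if known

@dataclass
class Scheduler:
    """Minimal stand-in for the scheduler TD11 of the terminal device."""
    entries: List[SchedulerEntry] = field(default_factory=list)

    def add_request(self, description: str, due: Optional[date] = None) -> None:
        self.entries.append(SchedulerEntry(description, due))

# Corresponding to S14: the operation request contained in the utterance is
# added to the scheduler, and the device later performs the matching operation.
td11 = Scheduler()
td11.add_request("teach when there is no request in October", due=date(2019, 10, 1))
print(td11.entries)
```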
  • Next, with reference to FIG. 3, an outline of the function of the information processing system 1 will be described, taking as an example a case where the speaker U12 makes an utterance different from that in FIG. 2.
  • As shown in FIG. 3, the terminal device 20 first detects the utterance TK21 of the speaker U12.
  • Then, the terminal device 20 is controlled to direct its line of sight toward the speaker U12 while the speaker U12 is speaking (S21).
  • When the information processing system 1 detects the end of the utterance TK21, it controls the terminal device 20 to perform an operation of noting down the intention of the utterance TK21 (S22).
  • The information processing system 1 performs control to draw a strikethrough deleting the language information "shampoo" noted in S24, and to perform an operation of newly adding a memo of the language information "rinse".
  • In this way, when a correction is made, the information processing system 1 may control the terminal device to perform an operation of adding new language information corresponding to the correction while leaving the language information before the correction.
  • As a result, even if there is a misrecognition in utterance recognition or semantic analysis, the information processing system 1 can easily convey to the speaker the change from the misrecognized linguistic information to the correct linguistic information.
  • Next, with reference to FIG. 4, an outline of the function of the information processing system 1 will be described, taking as an example a case where the speaker U12 makes an utterance different from those in FIGS. 2 and 3.
  • As shown in FIG. 4, the terminal device 20 first detects the utterance TK31 of the speaker U12.
  • Then, the terminal device 20 is controlled to direct its line of sight toward the speaker U12 while the speaker U12 is speaking (S31).
  • When the information processing system 1 detects the end of the utterance TK31, it controls the terminal device 20 to note down the intention of the utterance TK31 (S32).
  • The information processing system 1 controls the operation of adding memos of the linguistic information "preparation for tomorrow's school", "gym clothes", and "lunch box" as the intention of the utterance TK43.
  • Then, the terminal device 20 detects the utterance TK44 of the speaker U12, and S41 is performed. Further, when the utterance TK44 includes a request to correct language information noted down on another medium, the information processing system 1 performs control to correct the noted language information according to the information about the correction request (S46). For example, the information processing system 1 performs control to display the other medium and to add new linguistic information to the displayed medium. In S46, the information processing system 1 controls the memo M42 so as to perform an operation of adding the language information "ironing".
  • FIG. 6 is a block diagram showing a functional configuration example of the information processing system 1 according to the embodiment.
  • the information processing device 10 includes a communication unit 100, a control unit 110, and a storage unit 120.
  • the information processing device 10 has at least a control unit 110.
  • the communication unit 100 has a function of communicating with an external device. For example, the communication unit 100 outputs information received from the external device to the control unit 110 in communication with the external device. Specifically, the communication unit 100 outputs the utterance data received from the terminal device 20 to the control unit 110.
  • the communication unit 100 transmits the information input from the control unit 110 to the external device in communication with the external device. Specifically, the communication unit 100 transmits information regarding acquisition of utterance data input from the control unit 110 to the terminal device 20.
  • The control unit 110 has a function of controlling the operation of the information processing device 10. For example, the control unit 110 detects the end of the utterance data, and performs a process of controlling the operation of the terminal device 20 based on information about the detected termination.
  • As shown in FIG. 6, the control unit 110 includes a speaker identification unit 111, an utterance recognition unit 112, a termination detection unit 113, an operation control unit 114, a semantic analysis unit 115, and a memo content control unit 116.
  • The speaker identification unit 111 has a function of performing speaker identification processing. For example, the speaker identification unit 111 accesses the storage unit 120 and performs the identification processing using the stored speaker information. Specifically, the speaker identification unit 111 identifies the speaker by comparing the imaging information transmitted from the imaging unit 212 via the communication unit 200 with the speaker information stored in the storage unit 120.
  • the utterance recognition unit 112 has a function of performing utterance (speech) recognition processing of the speaker. For example, the utterance recognition unit 112 performs utterance recognition processing on the utterance data transmitted from the utterance acquisition unit 211 via the communication unit 200. Specifically, the utterance recognition unit 112 converts the utterance data into linguistic information.
  • The termination detection unit 113 has a function of performing a process of detecting the end of the utterance data. For example, the termination detection unit 113 performs a process of detecting the end of the utterance data recognized by the utterance recognition unit 112. Specifically, the termination detection unit 113 detects the end of the language information converted by the utterance recognition unit 112.
  • The operation control unit 114 has a function of performing processing for controlling the operation of the terminal device 20. For example, the operation control unit 114 performs processing for controlling, as operations of the terminal device 20, the operation of writing the language information down on a medium, the operation of turning the medium on which the language information is written, and the like. As shown in FIG. 6, the operation control unit 114 includes an operation generation unit 1141 and an operation presentation unit 1142.
  • the semantic analysis unit 115 has a function of analyzing the utterance of the speaker. For example, the semantic analysis unit 115 analyzes the linguistic information of the utterance data recognized by the utterance recognition unit 112. Specifically, the semantic analysis unit 115 classifies the linguistic information of the utterance data into nouns, verbs, modifiers, and the like.
  • The memo content control unit 116 has a function of performing processing for controlling the memo information. For example, the memo content control unit 116 determines the memo information based on the result of the analysis processing of the speaker's utterance. A runnable sketch of how these sub-units might fit together follows.
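  • As a concrete illustration, the following runnable Python sketch shows one way the sub-units 111 to 116 could be wired together. All class and method names are our assumptions; each stub merely stands in for the corresponding unit described above.

```python
class SpeakerIdentification:                # unit 111
    def identify(self, frame: bytes) -> str:
        # compare imaging information with stored speaker information
        return "U12"

class UtteranceRecognition:                 # unit 112
    def transcribe(self, audio: bytes) -> str:
        # convert utterance data into language information
        return "next month business trip to Sapporo"

class TerminationDetection:                 # unit 113
    def is_end(self, text: str) -> bool:
        # detect the end of the converted language information
        return True

class SemanticAnalysis:                     # unit 115
    def classify(self, text: str) -> dict:
        # classify the language information into nouns, verbs, modifiers, etc.
        return {"nouns": ["business trip", "Sapporo"], "verbs": [],
                "modifiers": ["next month"]}

class MemoContentControl:                   # unit 116
    def build_memo(self, intent: dict) -> list:
        # determine the memo information from the analysis result
        return intent["modifiers"] + intent["nouns"]

class OperationControl:                     # unit 114
    def note_down(self, memo: list, speaker: str) -> None:
        # drive the terminal device's memo-related operation
        print(f"noting for {speaker}: {memo}")

def handle_utterance(audio: bytes, frame: bytes) -> None:
    speaker = SpeakerIdentification().identify(frame)
    text = UtteranceRecognition().transcribe(audio)
    if TerminationDetection().is_end(text):         # act only at the termination
        intent = SemanticAnalysis().classify(text)
        memo = MemoContentControl().build_memo(intent)
        OperationControl().note_down(memo, speaker)

handle_utterance(audio=b"...", frame=b"...")
```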
  • FIG. 7 shows an example of the storage unit 120.
  • The storage unit 120 shown in FIG. 7 stores correspondences between pieces of memo information.
  • For example, the storage unit 120 may have items such as "memo ID", "memo information", and "related memo information"; a minimal sketch of such a record follows.
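  • A minimal sketch of one such record, assuming the three items above map onto fields of a row (field names and types are our guesses):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoRecord:
    """One row of the table in FIG. 7 (assumed structure)."""
    memo_id: str
    memo_information: str
    related_memo_information: List[str] = field(default_factory=list)

# Hypothetical contents of storage unit 120:
storage_unit_120 = [
    MemoRecord("M41", "hospital, next Wednesday (November 11)",
               related_memo_information=["M42"]),
]
```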
  • the communication unit 200 has a function of communicating with an external device. For example, the communication unit 200 outputs information received from the external device to the control unit 210 in communication with the external device. Specifically, the communication unit 200 outputs information regarding acquisition of utterance data received from the information processing device 10 to the control unit 210. Further, the communication unit 200 outputs the control information received from the information processing device 10 to the control unit 210.
  • the communication unit 200 outputs the memo information received from the information processing device 10 to the presentation unit 220.
  • the communication unit 200 transmits the information input from the control unit 210 to the external device in the communication with the external device. Specifically, the communication unit 200 transmits the utterance data input from the control unit 210 to the information processing device 10.
  • The control unit 210 has a function of controlling the overall operation of the terminal device 20. For example, the control unit 210 controls the utterance data acquisition process by the utterance acquisition unit 211, and controls the process in which the communication unit 200 transmits the utterance data acquired by the utterance acquisition unit 211 to the information processing device 10.
  • the utterance acquisition unit 211 has a function of acquiring the utterance data of the speaker.
  • the utterance acquisition unit 211 acquires utterance data using the utterance (voice) detector provided in the terminal device 20.
  • The imaging unit 212 has a function of capturing images of the speaker.
  • The operation control unit 213 has a function of controlling the operation of the terminal device 20. For example, the operation control unit 213 controls the operation of the terminal device 20 according to the acquired control information.
  • For example, the operation control unit 213 controls the operation of the terminal device 20 so that the terminal device directs its line of sight toward the speaker while the speaker is speaking, according to the acquired control information.
  • the presentation unit 220 has a function of controlling the overall presentation of memo information. For example, the presentation unit 220 presents the memo information recorded on the corresponding medium according to the acquired memo information.
  • FIG. 8 is a flowchart showing a flow of processing in the information processing device 10 according to the embodiment.
  • First, the information processing device 10 acquires the utterance data of the speaker (S100) and performs utterance recognition processing on the acquired utterance data (S102). Next, the information processing device 10 determines whether the utterance includes a request for correction of memo information (S104). If the utterance does not include a correction request (S104; NO), the information processing device 10 extracts the linguistic information included in the utterance (S106) and adds memo information based on it (S108). A sketch of this flow follows.
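  • The sketch below restates the S100-S108 branch as runnable Python. The helper functions and the toy correction phrase are assumptions used only to make the flow concrete.

```python
memos: list = ["shampoo"]

def acquire(audio: bytes) -> bytes:                    # S100
    return audio

def recognize(utterance: bytes) -> str:                # S102
    return "please change shampoo to rinse"            # stand-in recognition result

def requests_correction(text: str) -> bool:            # S104
    return "change" in text and " to " in text

def correct_memo(text: str) -> None:                   # correction path (cf. S46)
    old, new = text.removeprefix("please change ").split(" to ", 1)
    if old in memos:
        memos[memos.index(old)] = new

def extract_language_information(text: str) -> list:   # S106
    return [w for w in text.split() if w not in {"please", "to"}]

def add_memo(words: list) -> None:                     # S108
    memos.extend(words)

text = recognize(acquire(b"..."))
if requests_correction(text):      # S104: YES
    correct_memo(text)
else:                              # S104: NO
    add_memo(extract_language_information(text))
print(memos)                       # -> ['rinse']
```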
  • FIG. 9 is a flowchart showing a processing flow in the information processing system 1 according to the embodiment.
  • First, the terminal device 20 presents a default line of sight (S200). The terminal device 20 then receives the audio signal (S202), detects the speaker's utterance (S204), identifies the position of the speaker (S206), and changes its line of sight toward the speaker (S208).
  • Next, the terminal device 20 changes its line of sight toward the memo (S216) and performs the operation presentation process (S218).
  • The terminal device 20 then changes its line of sight back toward the speaker (S220) and determines whether a new utterance is detected (S222). If so, the terminal device 20 returns to the process of S204; otherwise, the terminal device 20 ends the information processing. A sketch of this loop follows.
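  • A compact sketch of this S200-S222 loop; the `device` methods are assumed names for the operations listed above, and the stub handles one utterance and then stops.

```python
class StubDevice:
    """Stands in for terminal device 20; every method name is an assumption."""
    def __init__(self):
        self.remaining = 1                               # demo: one utterance, then stop
    def present_default_gaze(self): print("default gaze")             # S200
    def receive_audio(self): return b"audio"                          # S202
    def detect_utterance(self, signal):                               # S204 / S222
        self.remaining -= 1
        return self.remaining >= 0
    def identify_speaker_position(self, signal): return (0.0, 1.0)    # S206
    def look_at(self, position): print("gaze ->", position)           # S208 / S220
    def look_at_memo(self): print("gaze -> memo")                     # S216
    def present_operation(self): print("present memo operation")      # S218

def terminal_device_loop(device, max_turns: int = 10) -> None:
    device.present_default_gaze()                        # S200
    for _ in range(max_turns):
        signal = device.receive_audio()                  # S202
        if not device.detect_utterance(signal):          # S204 / S222: no utterance
            break                                        # end information processing
        position = device.identify_speaker_position(signal)  # S206
        device.look_at(position)                         # S208: gaze at speaker
        device.look_at_memo()                            # S216: gaze at memo
        device.present_operation()                       # S218: operation presentation
        device.look_at(position)                         # S220: gaze back

terminal_device_loop(StubDevice())
```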
  • The information processing device 10 may be controlled so as to note down not only the nouns included in an emphasized utterance but also the emphasized modifiers and the like.
  • Linguistic information that the speaker explicitly instructs the device to note down is likely to reflect the speaker's intention even when it is not a noun.
  • Therefore, the information processing device 10 may perform control to note down the linguistic information that the speaker instructs it to note.
  • Further, the information processing device 10 may process continuous nouns as one noun; a sketch of such merging follows.
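  • A sketch of the merging of continuous nouns mentioned above, assuming the recognizer yields (word, part-of-speech) pairs (the tag set is our assumption):

```python
def merge_continuous_nouns(tagged: list) -> list:
    """Treat runs of consecutive nouns as a single noun."""
    merged, run = [], []
    for word, pos in tagged:
        if pos == "NOUN":
            run.append(word)
        else:
            if run:
                merged.append((" ".join(run), "NOUN"))
                run = []
            merged.append((word, pos))
    if run:
        merged.append((" ".join(run), "NOUN"))
    return merged

# "hotel reservation" becomes one noun instead of two:
print(merge_continuous_nouns([("hotel", "NOUN"), ("reservation", "NOUN"),
                              ("tomorrow", "ADV")]))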
  • (Supplementary function of memo information: 1. Supplement using another information source)
  • For example, the information processing device 10 may generate memo information supplemented with such information according to the linguistic information included in the utterance.
  • Specifically, the memo content control unit 116 may generate, as memo information supplemented with the information, memo information to which supplementary information is added in a predetermined form such as parentheses, for example "next month (October 2019)" or "here (place A11)". In these examples, "next month" and "here" are the speaker's own words, and "(October 2019)" and "(place A11)" are supplementary information.
  • In addition, the information processing device 10 may access an information source that stores the corresponding information and acquire supplementary information based on the remarks surrounding an abstract remark.
  • For example, the memo content control unit 116 may access an information source that stores the speaker's schedule and acquire information on the place corresponding to a predetermined date and time.
  • Likewise, the memo content control unit 116 may access an information source that stores the speaker's workplace information and acquire information on the place corresponding to the workplace. The memo content control unit 116 may then generate memo information in which the information acquired from the other information source is added in parentheses, for example "meeting here next month (meeting room A12)" or "Mr. XX's place of work is also here (work place A13)".
  • Specifically, as supplementary information corresponding to "here" in "meeting here next month", the memo content control unit 116 accesses the information source that stores the speaker's schedule and acquires "meeting room A12", the place corresponding to "next month" and "meeting".
  • Similarly, as supplementary information corresponding to "here" in "Mr. XX's place of work is also here", the memo content control unit 116 accesses the information source that stores the speaker's workplace information and acquires "work place A13", the workplace corresponding to "Mr. XX".
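  • The following sketch illustrates the supplement: the speaker's own words are kept, and concrete information from another source is appended in parentheses. The lookup table and the date arithmetic are assumptions standing in for real information sources.

```python
from datetime import date

PLACES = {"here": "place A11"}   # assumed information source

def supplement(term: str, today: date) -> str:
    """Append supplementary information in parentheses, leaving the
    speaker's own words intact (a sketch under the assumptions above)."""
    if term == "next month":
        y, m = (today.year + 1, 1) if today.month == 12 else (today.year, today.month + 1)
        return f"{term} ({date(y, m, 1):%B %Y})"
    if term in PLACES:
        return f"{term} ({PLACES[term]})"
    return term

print(supplement("next month", date(2019, 9, 15)))  # next month (October 2019)
print(supplement("here", date(2019, 9, 15)))        # here (place A11)
```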
  • For example, language information uttered with a conspicuous pitch may be presented in a color different from the color used for ordinary language information (for example, a color used for emphasis).
  • In this way, the memo content control unit 116 performs processing for converting information about the speaker's utterance mode into memo information; a sketch follows.
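  • For instance, such a conversion could map conspicuous pitch to a display color, along the lines of this sketch (the threshold and the colors are assumed):

```python
def style_for_pitch(word: str, pitch_hz: float, baseline_hz: float = 180.0) -> dict:
    """Map a word uttered with conspicuously high pitch to an emphasis color."""
    emphasized = pitch_hz > 1.3 * baseline_hz   # assumed criterion
    return {"text": word, "color": "red" if emphasized else "black"}

print(style_for_pitch("Sapporo", pitch_hz=260.0))  # {'text': 'Sapporo', 'color': 'red'}
```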
  • The case where the control unit 110 has the utterance recognition unit 112 and the semantic analysis unit 115 has been described, but the configuration is not limited to this; the control unit 110 does not have to have the utterance recognition unit 112 and the semantic analysis unit 115.
  • In that case, the information processing system 1 may perform the above-mentioned utterance recognition and semantic analysis via an external information processing device. Specifically, the control unit 110 may transmit the utterance data received from the utterance acquisition unit 211 to the external information processing device via the communication unit 100 and have it perform the utterance recognition and semantic analysis.
  • (Termination detection) The information processing system 1 may learn the speaker's utterance history, predict the termination timing, and control the operation so that the delay in termination detection is eliminated.
  • For example, the operation generation unit 1141 may generate control information for making the terminal device 20 operate at the predicted termination timing by accessing the storage unit 120 and learning the history of information about terminations.
  • When the information processing system 1 predicts the termination timing and controls the operation in this way, the speaker's utterance does not always end at the predicted timing, so the operation may be controlled to be small. As a result, the information processing system 1 can operate more appropriately without interfering with the speaker's utterance. A sketch of this idea follows.
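  • One simple realization, sketched below under the assumption that past utterance durations are available from the storage unit 120, is to predict the termination time from their running mean and begin a deliberately small motion at that moment. Using the mean is our simplification of "learning the history".

```python
from statistics import mean

def predict_termination(utterance_durations: list) -> float:
    """Predict when the current utterance will end (seconds from its start)."""
    return mean(utterance_durations)

history = [2.4, 3.1, 2.8]        # past durations from storage unit 120 (assumed)
eta = predict_termination(history)
motion_scale = 0.5               # keep the motion small: the prediction may be wrong
print(f"start a motion of scale {motion_scale} at t = {eta:.1f}s")
```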
  • The information processing system 1 may also control the operation of the terminal device 20 when the speaker's behavior information after the termination is detected satisfies a predetermined condition.
  • For example, the operation control unit 114 may perform a process for controlling the operation of the terminal device 20 when the behavior information indicating the speaker's nod or aizuchi after the detected termination is equal to or greater than a predetermined threshold value.
  • For example, the operation control unit 114 may perform the process when the magnitude of the speaker's nod or the volume of the aizuchi is equal to or greater than a predetermined threshold value; a sketch follows.
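  • A sketch of this thresholding; the threshold values and the scale of the nod magnitude are assumptions:

```python
def should_act(nod_magnitude: float, aizuchi_volume_db: float,
               nod_threshold: float = 0.5, volume_threshold_db: float = 40.0) -> bool:
    """Act only when the nod or aizuchi after the termination reaches a threshold."""
    return nod_magnitude >= nod_threshold or aizuchi_volume_db >= volume_threshold_db

if should_act(nod_magnitude=0.7, aizuchi_volume_db=35.0):
    print("control the terminal device's operation")
```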
  • Past memo information can also reflect the intention of the speaker's utterance.
  • Therefore, the information processing device 10 may perform control so that past memo information is displayed in a predetermined area of the screen on which memo information is displayed. In this case, the operation generation unit 1141 generates, for example, control information for displaying the past memo information in that predetermined area.
  • Then, the presentation unit 220 displays the memo information in the predetermined area of the screen based on the control information transmitted from the information processing device 10 via the communication unit 100. As a result, the speaker can confirm the memo information of the current utterance while referring to the memo information of past utterances.
  • With reference to FIG. 10, an outline of the function by which the information processing system 1 detects possible dementia will be described, taking as an example a case where the speaker U12 makes a statement that differs from the facts established by the speaker U12's past utterances.
  • As shown in FIG. 10, the information processing system 1 controls the terminal device 20 so as to perform an operation of noting down the intention of the utterance TK51 of the speaker U12 (S51).
  • The information processing system 1 performs control to note down the linguistic information "hospital" and "next Wednesday (November 11)" as the intention of the utterance TK51.
  • the information processing system 1 stores the memo information M41 in the storage unit 120 (S52).
  • Next, to determine whether the utterance TK54 of the speaker U12 conforms to the facts, based on the utterances TK52 to TK54 exchanged between the speaker U12 and the terminal device 20, the information processing system 1 accesses, for example, the schedule application AP1 that stores the schedule of the speaker U12 (S53), and acquires the schedule information of the speaker U12.
  • When the statement does not conform to the facts, the information processing system 1 accesses, for example, the messaging application AP2, which notifies the family U13 of the speaker U12 of information indicating that the speaker U12 may be showing signs of dementia (S54), and notifies the family U13 of the speaker U12. A sketch of this check follows.
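  • The check in S53-S54 might look like the following heavily simplified sketch; the schedule contents, the mismatch criterion, and the notification call are all assumptions for illustration.

```python
from datetime import date

SCHEDULE_AP1 = {"hospital": date(2020, 11, 11)}     # stored appointment (assumed)

def contradicts_schedule(topic: str, stated: date) -> bool:
    """True when the stated date disagrees with the stored schedule."""
    actual = SCHEDULE_AP1.get(topic)
    return actual is not None and actual != stated

mismatches = 0
if contradicts_schedule("hospital", date(2020, 11, 18)):
    mismatches += 1
if mismatches >= 1:   # a real system would use a more careful criterion
    print("notify family U13 via messaging application AP2")
```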
  • The above embodiment can also be applied to communication in which the interlocutors are remote from one another, as in a teleconference.
  • In that case, the information processing device 10 may control the terminal device 20 so that, for example, main points are presented in telephone calls, teleconferences, and similar exchanges among a plurality of speakers.
  • the memo content control unit 116 generates, for example, a main point based on the utterance data.
  • the presentation unit 220 presents the main points transmitted from the information processing device 10 via the communication unit 100.
  • The information processing device 10 may also control the terminal device 20 so as to perform an operation indicating which of the plurality of speakers has the turn to speak (turn take).
  • For example, the operation generation unit 1141 estimates the speaker who is to speak next based on the information about the termination detected by the termination detection unit 113 and the memo information controlled by the memo content control unit 116, and generates control information for performing an action that indicates the turn take; a sketch follows.
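  • As a sketch, a round-robin choice after each detected termination would look as follows; the patent leaves the estimation method open, so the rotation rule is purely a placeholder:

```python
def next_speaker(termination_info: dict, memo: list, speakers: list) -> str:
    """Pick who should speak next after a termination (placeholder rule)."""
    last = termination_info["speaker"]
    return speakers[(speakers.index(last) + 1) % len(speakers)]

print(next_speaker({"speaker": "U12"}, memo=["next month", "business trip"],
                   speakers=["U12", "U13"]))   # -> U13
```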
  • FIG. 11 is a block diagram showing a hardware configuration example of the information processing device according to the embodiment.
  • the information processing device 900 shown in FIG. 11 can realize, for example, the information processing device 10 and the terminal device 20 shown in FIG.
  • the information processing by the information processing device 10 and the terminal device 20 according to the embodiment is realized by the cooperation between the software and the hardware described below.
  • the information processing device 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903.
  • the information processing device 900 includes a host bus 904a, a bridge 904, an external bus 904b, an interface 905, an input device 906, an output device 907, a storage device 908, a drive 909, a connection port 910, and a communication device 911.
  • the hardware configuration shown here is an example, and some of the components may be omitted. Further, the hardware configuration may further include components other than the components shown here.
  • the CPU 901, ROM 902, and RAM 903 are connected to each other via, for example, a host bus 904a capable of high-speed data transmission.
  • the host bus 904a is connected to the external bus 904b, which has a relatively low data transmission speed, via, for example, the bridge 904.
  • the external bus 904b is connected to various components via the interface 905.
  • The input device 906 is realized by a device to which the speaker inputs information, such as a mouse, a keyboard, a touch panel, a button, a microphone, a switch, or a lever. The input device 906 may also be, for example, a remote control device using infrared rays or other radio waves, or an externally connected device such as a mobile phone or a PDA that supports the operation of the information processing device 900. Further, the input device 906 may include, for example, an input control circuit that generates an input signal based on the information input by the speaker using the above input means and outputs the input signal to the CPU 901. By operating the input device 906, the speaker of the information processing device 900 can input various data to the information processing device 900 and instruct processing operations.
  • the input device 906 may be formed by a device that detects information about the speaker.
  • For example, the input device 906 may include various sensors such as an image sensor (for example, a camera), a depth sensor (for example, a stereo camera), an acceleration sensor, a gyro sensor, a geomagnetic sensor, an optical sensor, a sound sensor, a distance measuring sensor (for example, a ToF (Time of Flight) sensor), and a force sensor.
  • The input device 906 may acquire information on the state of the information processing device 900 itself, such as the posture and moving speed of the information processing device 900, and information on the surrounding environment of the information processing device 900, such as the brightness and noise around the information processing device 900.
  • The input device 906 may include a GNSS module that receives a GNSS signal from a GNSS (Global Navigation Satellite System) satellite (for example, a GPS signal from a GPS (Global Positioning System) satellite) and measures position information including the latitude, longitude, and altitude of the device. As for the position information, the input device 906 may detect the position through transmission and reception with Wi-Fi (registered trademark), a mobile phone, a PHS, a smartphone, or the like, or by short-range communication. The input device 906 can realize, for example, the function of the utterance acquisition unit 211 described with reference to FIG. 6.
  • The output device 907 is formed of a device capable of visually or audibly notifying the speaker of acquired information. Such devices include display devices such as CRT display devices, liquid crystal display devices, plasma display devices, EL display devices, laser projectors, LED projectors, and lamps; audio output devices such as speakers and headphones; and printer devices.
  • the output device 907 outputs, for example, the results obtained by various processes performed by the information processing device 900.
  • the display device visually displays the results obtained by various processes performed by the information processing device 900 in various formats such as texts, images, tables, and graphs.
  • the audio output device converts an audio signal composed of reproduced audio data, acoustic data, etc. into an analog signal and outputs it audibly.
  • the output device 907 can realize, for example, the function of the presentation unit 220 described with reference to FIG.
  • the storage device 908 is a data storage device formed as an example of the storage unit of the information processing device 900.
  • the storage device 908 is realized by, for example, a magnetic storage device such as an HDD, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
  • the storage device 908 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deleting device that deletes the data recorded on the storage medium, and the like.
  • the storage device 908 stores programs executed by the CPU 901, various data, various data acquired from the outside, and the like.
  • the storage device 908 can realize, for example, the function of the storage unit 120 described with reference to FIG.
  • the drive 909 is a reader / writer for a storage medium, and is built in or externally attached to the information processing device 900.
  • the drive 909 reads information recorded on a removable storage medium such as a mounted magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, and outputs the information to the RAM 903.
  • the drive 909 can also write information to the removable storage medium.
  • the communication device 911 is, for example, a communication interface formed by a communication device or the like for connecting to the network 920.
  • the communication device 911 is, for example, a communication card for a wired or wireless LAN (Local Area Network), LTE (Long Term Evolution), Bluetooth (registered trademark), WUSB (Wireless USB), or the like.
  • the communication device 911 may be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), a modem for various communications, or the like.
  • the communication device 911 can transmit and receive signals and the like to and from the Internet and other communication devices in accordance with a predetermined protocol such as TCP / IP.
  • the communication device 911 can realize, for example, the functions of the communication unit 100 and the communication unit 200 described with reference to FIG.
  • the network 920 is a wired or wireless transmission path for information transmitted from a device connected to the network 920.
  • the network 920 may include a public network such as the Internet, a telephone line network, a satellite communication network, various LANs (Local Area Network) including Ethernet (registered trademark), and a WAN (Wide Area Network).
  • the network 920 may include a dedicated line network such as IP-VPN (Internet Protocol-Virtual Private Network).
  • the above is an example of a hardware configuration capable of realizing the functions of the information processing apparatus 900 according to the embodiment.
  • Each of the above components may be realized by using a general-purpose member, or may be realized by hardware specialized for the function of each component. Therefore, it is possible to appropriately change the hardware configuration to be used according to the technical level at each time when the embodiment is implemented.
  • the information processing device 10 performs a process of controlling the operation of the terminal device 20 based on the information regarding the termination of the language information of the target speaker. As a result, the information processing device 10 can control the terminal device 20 to operate at the end of the speaker's utterance.
  • each device described in the present specification may be realized as a single device, or a part or all of the devices may be realized as separate devices.
  • the information processing device 10 and the terminal device 20 shown in FIG. 6 may be realized as independent devices.
  • it may be realized as a server device connected to the information processing device 10 and the terminal device 20 via a network or the like.
  • the server device connected by a network or the like may have the function of the control unit 110 of the information processing device 10.
  • each device described in the present specification may be realized by using any of software, hardware, and a combination of software and hardware.
  • The programs constituting the software are stored in advance in, for example, a recording medium (non-transitory media) provided inside or outside each device. Then, at the time of execution by a computer, each program is read into RAM and executed by a processor such as a CPU.
  • (1) An information processing device comprising: a termination detection unit that detects the end of the language information of a target speaker acquired by a terminal device; and an operation control unit that performs a process of controlling the operation of the terminal device based on information about the termination detected by the termination detection unit.
  • (2) The operation control unit performs a process of controlling the operation of the terminal device when the language information before and after the detected termination satisfies a predetermined condition. The information processing device according to (1) above.
  • (3) The operation control unit performs a process of controlling the operation of the terminal device when a predetermined time elapses from the detection of the termination to the detection of the next language information. The information processing device according to (2) above.
  • (4) The operation control unit performs a process of controlling the operation of the terminal device when the language information after the detected termination is interpreted as language information indicating a change in the topic of the speaker's language information.
  • (5) The operation control unit performs a process of controlling the operation of the terminal device based on the language information when the language information after the detected termination is interpreted as language information instructing an operation of the terminal device.
  • (6) The operation control unit performs a process of controlling the operation of the terminal device when the behavior information of the speaker after the detected termination satisfies a predetermined condition.
  • The information processing device according to any one of (1) to (5) above.
  • (7) The operation control unit performs a process of controlling the operation of the terminal device when the behavior information indicating a nod or aizuchi of the speaker after the detected termination is equal to or greater than a predetermined threshold value.
  • The information processing device according to (6) above.
  • (8) The operation control unit performs, as the operation of the terminal device, a process of controlling an operation related to a memo, which is a means for recording the language information.
  • The information processing device according to any one of (1) to (7) above.
  • (9) The operation control unit performs a process for controlling the operation related to the memo when the language information contains a noun.
  • (10) The operation control unit performs a process for controlling the operation related to the memo when the language information includes a modifier or verb that is interpreted as emphasized by the speaker.
  • The information processing device according to (8) or (9) above.
  • (11) The operation control unit performs a process of controlling the operation related to the memo for the language information corresponding to a correction when the speaker instructs that the language information be corrected.
  • The information processing device according to any one of (8) to (10) above.
  • (12) The operation control unit performs a process for controlling the operation related to the memo so that the transition speed of the correction is equal to or less than a predetermined threshold value.
  • The information processing device according to (11) above.
  • (13) When the speaker instructs that the language information be corrected, the operation control unit performs a process of controlling an operation of adding new language information corresponding to the correction while leaving the language information before the correction.
  • The information processing device according to (11) or (12) above.
  • (14) The operation control unit performs, as the operation related to the memo, a process of controlling an operation of writing the language information on a medium or an operation of turning the medium on which the language information is written.
  • (15) The terminal device has an imaging unit that captures images of the speaker, and the operation control unit performs a process of controlling the operation of the terminal device so that the line of sight of the terminal device is directed toward the speaker while the speaker is speaking.
  • The information processing device according to any one of (1) to (14) above.
  • (16) An information processing system including a terminal device and software used for processing that controls the operation of the terminal device, wherein the software, installed in an information processing device, performs a process for controlling the operation of the terminal device based on information about the termination of the language information of a target speaker acquired by the terminal device.
  • (17) An information processing method in which a computer detects the termination of the language information of a target speaker acquired by a terminal device, and performs a process for controlling the operation of the terminal device based on information about the detected termination.
  • (18) An information processing program for causing a computer to execute: a termination detection procedure for detecting the termination of the language information of a target speaker acquired by a terminal device; and an operation control procedure for performing a process of controlling the operation of the terminal device based on information about the termination detected by the termination detection procedure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

The present invention controls an operation relating to a conversation by a conversation agent in accordance with the intention of a speaker's utterance. An information processing device (10) according to an embodiment of the present invention comprises a termination detection unit (113) that detects the end of language information from a speaker that is a target for acquisition by a terminal device (20), and an operation control unit (114) that executes a process for controlling the operation of the terminal device (20) on the basis of information relating to the termination detected by the termination detection unit (113).
PCT/JP2020/047859 2020-01-27 2020-12-22 Information processing device, information processing system, information processing method, and information processing program WO2021153102A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020011191A 2020-01-27 2020-01-27 Information processing device, information processing system, information processing method, and information processing program
JP2020-011191 2020-01-27

Publications (1)

Publication Number Publication Date
WO2021153102A1 (fr)

Family

ID=77078736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/047859 WO2021153102A1 (fr) 2020-12-22 2021-08-05 Information processing device, information processing system, information processing method, and information processing program

Country Status (2)

Country Link
JP (1) JP2021117372A (fr)
WO (1) WO2021153102A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002032370A (ja) * 2000-07-18 2002-01-31 Fujitsu Ltd 情報処理装置
JP2015069037A (ja) * 2013-09-30 2015-04-13 ヤマハ株式会社 音声合成装置およびプログラム
WO2018163648A1 (fr) * 2017-03-10 2018-09-13 日本電信電話株式会社 Système de dialogue, procédé de dialogue, dispositif de dialogue et programme
JP6517419B1 (ja) * 2018-10-31 2019-05-22 株式会社eVOICE 対話要約生成装置、対話要約生成方法およびプログラム
WO2019098038A1 (fr) * 2017-11-15 2019-05-23 ソニー株式会社 Dispositif de traitement d'informations et procédé de traitement d'informations
JP2019109424A (ja) * 2017-12-20 2019-07-04 株式会社日立製作所 計算機、言語解析方法、及びプログラム


Also Published As

Publication number Publication date
JP2021117372A (ja) 2021-08-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20916450; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20916450; Country of ref document: EP; Kind code of ref document: A1)