WO2022209416A1 - Information processing device, information processing system, and information processing method - Google Patents

Information processing device, information processing system, and information processing method

Info

Publication number
WO2022209416A1
Authority
WO
WIPO (PCT)
Prior art keywords
patient
doctor
satisfaction
information processing
degree
Prior art date
Application number
PCT/JP2022/006852
Other languages
English (en)
Japanese (ja)
Inventor
拓哉 岸本
乃愛 金子
咲湖 安川
拓 田中
正範 勝
厚志 大久保
Original Assignee
ソニーグループ株式会社 (Sony Group Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社 (Sony Group Corporation)
Publication of WO2022209416A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Definitions

  • The present disclosure relates to an information processing device, an information processing system, and an information processing method.
  • Physicians usually estimate a patient's satisfaction or dissatisfaction by interviewing the patient and observing the patient's behavior and facial reactions (physical findings), but it is difficult to grasp these accurately.
  • Physicians' need to quantify patient satisfaction and dissatisfaction has therefore become apparent, but it is difficult to estimate patient satisfaction or dissatisfaction by repurposing existing algorithms. At present, it is thus difficult for a doctor to grasp a patient's satisfaction or dissatisfaction.
  • In view of this, the present disclosure proposes an information processing device, an information processing system, and an information processing method capable of improving the therapeutic effect.
  • An information processing apparatus according to the present disclosure includes an input unit for inputting voices and images of a patient and a doctor, an extraction unit for extracting feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor, an estimation unit for estimating the patient's degree of satisfaction, dissatisfaction, or anxiety based on the feature amounts, and an output unit for outputting the patient's degree of satisfaction, dissatisfaction, or anxiety.
  • An information processing apparatus according to the present disclosure includes an input unit for inputting a doctor's voice, an extraction unit for extracting, from the doctor's voice, the doctor's voice feature amount related to communication between the doctor and a patient, and a clipping learning unit that learns the expected response time based on the voice feature amount and information about the doctor's expected response time.
  • An information processing apparatus according to the present disclosure includes an input unit for inputting voices and images of a patient and a doctor, an extraction unit for extracting feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor, and a learning unit that learns the patient's degree of satisfaction, dissatisfaction, or anxiety based on the feature amounts and a questionnaire regarding the patient's degree of satisfaction, dissatisfaction, or anxiety.
  • An information processing system according to the present disclosure includes an information acquisition device that acquires voices and images of a patient and a doctor, an extraction unit that extracts feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor, an estimation unit that estimates the patient's degree of satisfaction, dissatisfaction, or anxiety based on the feature amounts, and a display unit that displays the patient's degree of satisfaction, dissatisfaction, or anxiety.
  • In an information processing method according to the present disclosure, a computer acquires voices and images of a patient and a doctor, extracts feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor, estimates the patient's degree of satisfaction, dissatisfaction, or anxiety based on the feature amounts, and displays the patient's degree of satisfaction, dissatisfaction, or anxiety.
  • FIG. 1 is a first diagram illustrating an example of a schematic configuration of an information processing system according to an embodiment of the present disclosure.
  • FIG. 2 is a second diagram illustrating an example of a schematic configuration of an information processing system according to an embodiment of the present disclosure.
  • FIG. 3 is a diagram showing an example of a schematic configuration of each part of the information processing system according to an embodiment of the present disclosure.
  • FIG. 4 is a flowchart showing an example of the flow of satisfaction (or dissatisfaction) estimation processing according to an embodiment of the present disclosure.
  • FIG. 5 is a diagram for explaining the flow of overall processing according to an embodiment of the present disclosure.
  • FIG. 6 is a first diagram for explaining the flow of various processes in the overall processing according to an embodiment of the present disclosure.
  • FIG. 7 is a second diagram for explaining the flow of various processes in the overall processing according to an embodiment of the present disclosure.
  • FIG. 8 is a third diagram for explaining the flow of various processes in the overall processing according to an embodiment of the present disclosure.
  • FIG. 9 is a fourth diagram for explaining the flow of various processes in the overall processing according to an embodiment of the present disclosure.
  • FIG. 10 is a first diagram for explaining an example of line-of-sight intersection estimation processing according to an embodiment of the present disclosure.
  • FIG. 11 is a second diagram for explaining an example of line-of-sight intersection estimation processing according to an embodiment of the present disclosure.
  • FIG. 12 is a third diagram for explaining an example of line-of-sight intersection estimation processing according to an embodiment of the present disclosure.
  • FIG. 13 is a diagram for explaining an example of a display image for a doctor according to an embodiment of the present disclosure.
  • FIG. 14 is a diagram for explaining an example of a system application service according to an embodiment of the present disclosure.
  • FIG. 15 is a diagram illustrating an example of a schematic configuration of hardware according to an embodiment of the present disclosure.
  • Each of one or more embodiments (including examples and modifications) described below can be implemented independently.
  • at least some of the embodiments described below may be implemented in combination with at least some of the other embodiments as appropriate.
  • These multiple embodiments may include novel features that differ from each other. Therefore, these multiple embodiments can contribute to solving different purposes or problems, and can produce different effects.
  • Note that elements denoted in the same way may differ from each other in function, place of implementation, and the like.
  • 1. Embodiment; 1-1. Example of schematic configuration of information processing system; 1-2.
  • FIGS. 1 and 2 are diagrams showing an example of a schematic configuration of an information processing system 10 according to this embodiment.
  • the information processing system 10 functions as a treatment continuation support system (medical interview continuation system) that supports treatment continuation.
  • the information processing system 10 includes a server device 20, an information acquisition device 30, a doctor terminal device 40, and a patient terminal device 50.
  • The server device 20, the information acquisition device 30, the doctor terminal device 40, and the patient terminal device 50 are communicably connected via a wired and/or wireless communication network 60.
  • As the communication network 60, for example, the Internet, a WAN (Wide Area Network), a LAN (Local Area Network), a satellite communication network, or the like is used.
  • the server device 20, the doctor terminal device 40, and the patient terminal device 50 each correspond to an information processing device.
  • the server device 20 receives various types of information from the information acquisition device 30, and also transmits various types of information to the doctor terminal device 40, the patient terminal device 50, and the like. In addition, the server device 20 performs various processes on the received various information, and appropriately transmits the processed various information to the doctor terminal device 40, the patient terminal device 50, and the like.
  • a computer device is used as the server device 20 .
  • the information acquisition device 30 acquires sensing data such as voices and images of the patient and the doctor in the interview room where the doctor and the patient meet, and transmits the acquired sensing data to the server device 20 .
  • a microphone, a camera, or the like is used as the information acquisition device 30 .
  • Devices such as microphones and cameras are installed, for example, in an interview room where a doctor and a patient interview.
  • a microphone and a camera may be provided commonly for the doctor and the patient, or may be provided individually for each doctor and the patient.
  • a microphone converts sound into an electrical signal and captures it.
  • a camera gathers light from a subject located in the surroundings to form an optical image on an imaging surface, and acquires an image by converting the optical image formed on the imaging surface into an electrical image signal.
  • the information acquisition device 30 may be provided in a room other than the interview room.
  • the information acquisition device 30 may be provided on both the patient side and the doctor side.
  • The location of the interview is not limited to the interview room; the interview may be held online or in various other locations.
  • the doctor terminal device 40 receives various information from the server device 20, and displays and presents the received various information to the doctor.
  • This doctor terminal device 40 is used by a doctor.
  • As the doctor terminal device 40, for example, a personal computer (a desktop computer, a notebook computer, a tablet terminal, etc.), a smartphone, or the like is used.
  • the patient terminal device 50 receives various information from the server device 20, and displays and presents the received various information to the patient.
  • This patient terminal device 50 is used by a patient. Similar to the doctor terminal device 40, the patient terminal device 50 is, for example, a personal computer, a smart phone, or the like.
  • the server device 20 may be provided inside the hospital, or may be provided outside the hospital. Also, the server device 20 may be realized by, for example, cloud computing.
  • the information acquisition device 30 may also be provided inside the hospital or outside the hospital. The information acquisition device 30 may be directly connected to the communication network 60 or may be connected to the communication network 60 via the doctor terminal device 40 .
  • In the illustrated example, there is one doctor terminal device 40, one patient terminal device 50, and one information acquisition device 30, but the numbers are not limited and each may be one or more.
  • The server device 20 acquires voice information of the patient and the doctor in advance from the information acquisition device 30 (video and audio recording), and separates the patient's voice information and the doctor's voice information by sound source analysis (for example, frequency analysis). The server device 20 then identifies the doctor's expected response time from the doctor's voice information. Feature amounts based on the voice information and facial image information of the patient and the doctor (for example, voice feature amounts, face image feature amounts, line-of-sight crossing composite feature amounts, etc.) and text data such as medical charts and patient questionnaires are input into a learning model (for example, an inference model), and the patient's satisfaction, dissatisfaction, depression, anxiety, and the like are scored.
  • While examining the patient's condition, the doctor adds a time stamp and a marker in real time and inputs a score, so that the images and audio acquired at that time can be used as teacher data for building the learning model. In other words, this device can also be used as a device for generating teacher data. If real-time input is not possible, teacher data is generated by, for example, adding time stamps and markers to the recorded video and audio data afterward.
  • The doctor's expected response time is, for example, the part of the doctor's voice information in which the doctor expects a response from the patient.
  • The feature amounts such as the voice feature amount, the face image feature amount, and the line-of-sight crossing composite feature amount are feature amounts related to the communication between the patient and the doctor in the medical interview.
  • The learning model is, for example, a model trained by deep learning, such as a deep neural network (DNN) or a convolutional neural network (CNN).
  • The teacher data can also be generated by the doctor adding time stamps to the voice data in real time, as sketched below.
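  • As a rough illustration of the teacher-data generation just described, the following Python sketch shows one way the doctor's real-time time stamps, markers, and scores could be stored alongside the recorded audio and video. All class and field names are illustrative assumptions, not part of the present disclosure.

```python
# Minimal sketch of how the real-time tagging described above could be stored as
# teacher (training) data. All names here are illustrative, not from the present disclosure.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Annotation:
    timestamp_s: float               # time stamp added by the doctor during the interview
    marker: str                      # e.g. "expected_response", "satisfied", "dissatisfied"
    score: Optional[float] = None    # optional satisfaction score entered by the doctor

@dataclass
class InterviewRecord:
    audio_path: str                  # recorded audio of the interview
    video_path: str                  # recorded video of the interview
    annotations: List[Annotation] = field(default_factory=list)

    def add_annotation(self, timestamp_s: float, marker: str, score: Optional[float] = None) -> None:
        """Append a time-stamped marker; can be called live or after the fact."""
        self.annotations.append(Annotation(timestamp_s, marker, score))

# Usage: tagging while recording, or retroactively on the stored video/audio data.
record = InterviewRecord("interview_001.wav", "interview_001.mp4")
record.add_annotation(12.5, "expected_response")
record.add_annotation(13.8, "satisfied", score=4.0)
```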
  • the server device 20 transmits score results such as satisfaction, dissatisfaction, depression, and anxiety to the doctor terminal device 40 .
  • the doctor terminal device 40 displays the score result and presents it to the doctor.
  • Based on the score results, the server device 20 provides an application for approaching the patient (for example, by exchanging e-mail or SNS (Social Networking Service) messages), including monitoring of and feedback to an electronic patient-reported outcome (ePRO) system.
  • Electronic patient reporting outcomes systems include, for example, patient diary systems.
  • the approach to the patient by e-mail or the like is carried out according to the degree of satisfaction, the degree of dissatisfaction, the degree of depression, the degree of anxiety, and the like.
  • During the medical interview, the doctor can encourage the patient to come to the hospital next time, and the patient at home can be prompted to visit the hospital, watch educational content, use the application, and so on, so that the therapeutic effect can be improved.
  • The doctor can objectively grasp the patient's satisfaction, dissatisfaction, depression, anxiety, and the like. Because the doctor's subjectivity is not involved, variations from doctor to doctor in how the patient is approached can be suppressed. Since the voice information includes the voices of the doctor, the nurse, and the patient, sound source separation is performed to isolate the patient-only and doctor-only voices.
  • The patient's satisfaction, dissatisfaction, depression, anxiety, and the like change depending on the interaction between the doctor and the patient during the medical interview, and the patient's voice alone is not enough to estimate them accurately. Therefore, by separating the patient's and doctor's voices with one or more microphones and incorporating the timing of the doctor's and patient's utterances and the response time into the analysis parameters, satisfaction, dissatisfaction, depression, anxiety, and the like can be estimated more accurately. Furthermore, by using the patient's face image in addition to the voice, the patient's degree of satisfaction, dissatisfaction, depression, anxiety, and the like can be estimated still more accurately.
  • By further using vital data such as heart rate, heart rate variability, perspiration, and blood pressure acquired from sensors attached to the patient or from image analysis, the patient's satisfaction, dissatisfaction, depression, and anxiety levels can be estimated even more accurately. A sketch of one of the timing parameters follows below.
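  • As a small, hedged example of the timing parameters mentioned above, the following sketch computes the patient's response latency (the gap between the end of a doctor utterance and the start of the next patient utterance). The utterance intervals are assumed to come from sound source separation and voice activity detection, which are not shown here.

```python
# Hedged sketch of one timing parameter: the patient's response latency, i.e. the gap
# between the end of a doctor utterance and the start of the next patient utterance.
def response_latencies(doctor_utts, patient_utts):
    """Both arguments are lists of (start_s, end_s) tuples sorted by start time."""
    latencies = []
    for _, doc_end in doctor_utts:
        next_starts = [s for s, _ in patient_utts if s >= doc_end]
        if next_starts:
            latencies.append(next_starts[0] - doc_end)
    return latencies

doctor_utts = [(0.0, 4.2), (10.5, 13.0)]
patient_utts = [(5.1, 8.0), (14.2, 16.0)]
print(response_latencies(doctor_utts, patient_utts))   # roughly [0.9, 1.2] seconds
```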
  • FIG. 3 is a diagram showing an example of a schematic configuration of each part of the information processing system 10 according to this embodiment.
  • the server device 20 includes an input unit 21, a processing unit 22, and an output unit 23.
  • the processing unit 22 includes an audio feature amount extraction unit 22a, a clipping learning model 22b, an audio/image feature amount extraction unit 22c, and a learning model 22d.
  • the processing unit 22 corresponds to an extraction unit, an estimation unit, a learning unit, and the like.
  • the input unit 21 receives voice information and image information of the patient and doctor acquired by the information acquisition device 30 and inputs them to the server device 20 .
  • The processing unit 22 extracts feature amounts related to communication between the patient and the doctor based on the voice information and image information of the patient and the doctor input by the input unit 21, and estimates the patient's degree of satisfaction, degree of dissatisfaction, degree of depression, and the like based on the extracted feature amounts.
  • the output unit 23 outputs information about the patient's degree of satisfaction, degree of dissatisfaction, and degree of depression estimated by the processing unit 22, and transmits the information to the doctor terminal device 40, the patient terminal device 50, and the like.
  • the processing unit 22 can estimate any one or all of the degree of satisfaction, the degree of dissatisfaction and the degree of depression.
  • the processing unit 22 extracts the doctor's voice information from the patient's and the doctor's voice information using the voice feature extraction unit 22a, extracts the doctor's expected response time from the extracted doctor's voice information, and specifies it using the learning model 22b.
  • The processing unit 22 uses the voice/image feature amount extraction unit 22c to extract the voice feature amounts and image feature amounts of the patient and the doctor at the expected response time from the voice information and image information of the patient and the doctor, and estimates the patient's satisfaction, dissatisfaction, and depression from those feature amounts. The details of the flow of this process will be described later.
  • the cutout learning model 22b is a learning model for obtaining the expected response time.
  • the learning model 22d is a learning model for obtaining the patient's degree of satisfaction, degree of dissatisfaction, degree of depression, and the like.
  • The clipping learning model 22b and the learning model 22d are, for example, models trained by deep learning (DL) or by a convolutional neural network (CNN).
  • each functional unit such as the processing unit 22 described above may be configured by either or both of hardware and software. Their configuration is not particularly limited.
  • Each of the functional units described above may be realized by a computer such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit) executing a program stored in advance in ROM, using RAM as a work area.
  • each functional unit may be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).
  • the clipped learning model 22b and the learning model 22d may be stored, for example, in various types of storage.
  • the input unit 21 may receive the patient's vital information (vital data) acquired by the information acquisition device 30 in addition to the voice information and image information of the patient and the doctor, and input it to the server device 20 .
  • the information acquisition device 30 for example, a vital sensor synchronized with a camera, a microphone, or the like is used.
  • the vital sensor converts one or all of the patient's heartbeat, heartbeat variability, perspiration, blood pressure, etc., into electrical signals, and acquires these values as vital information.
  • a wearable device such as a smart watch can be used. This wearable device detects vital information in synchronization with, for example, a camera and a microphone.
  • the doctor terminal device 40 has a control unit 41, a communication unit 42, a display unit 43, and an operation unit 44.
  • the control unit 41 controls each unit such as the communication unit 42 and the display unit 43 .
  • the communication unit 42 enables communication with external devices via the communication network 60 .
  • the display unit 43 displays various information.
  • the display unit 43 is implemented by, for example, a display device such as a liquid crystal display or an organic EL (Electro-Luminescence) display.
  • the operation unit 44 receives an input operation from an operator such as a doctor.
  • the operation unit 44 is implemented by, for example, an input device such as a touch panel or buttons.
  • the patient terminal device 50 has a control unit 51, a communication unit 52, a display unit 53, and an operation unit 54, similar to the doctor terminal device 40.
  • the control unit 51 controls each unit such as the communication unit 52 and the display unit 53 .
  • the communication unit 52 enables communication with external devices via the communication network 60 .
  • the display unit 53 displays various information.
  • the display unit 53 is realized by, for example, a display device such as a liquid crystal display or an organic EL display.
  • the operation unit 54 receives an input operation from an operator such as a patient.
  • the operation unit 54 is implemented by, for example, an input device such as a touch panel or buttons.
  • FIG. 4 is a flowchart showing an example of the flow of satisfaction (or dissatisfaction) estimation processing according to this embodiment.
  • the voice feature amount extraction unit 22a separates the voice information between the patient and the doctor (step S11), and extracts the voice feature amount of the patient and the doctor (step S12).
  • the voice feature amount extraction unit 22a analyzes the voice information between the patient and the doctor input by the input unit 21, and separates the voice information of the patient and the voice information of the doctor from the voice information between the patient and the doctor. Then, the speech feature amount extraction unit 22a extracts the patient's speech feature amount from the patient's speech information, and extracts the doctor's speech feature amount from the doctor's speech information.
  • The processing unit 22 processes the speech feature amounts (for example, the doctor's speech feature amount) with the clipping learning model 22b, which has been trained with speech feature amounts tagged with (given meaning by) the expected response time (step S13), and specifies the expected response time (step S14).
  • As a method of tagging the expected response time, for example, when the video and audio are recorded in advance, the doctor examines the patient's condition and inputs a tag by adding a time stamp and marker in real time, so that the acquired images and speech can be used as teacher data for building the learning model. In other words, this device can also be used as a device for generating teacher data. If real-time input is not possible, teacher data is generated by appending time stamps and markers to the recorded video and audio data afterward.
  • The voice/image feature amount extraction unit 22c extracts the feature amounts at the expected response time (the voice/image feature amounts of the patient and the doctor) from the voice/image information (voice information and image information) of the patient and the doctor (step S15). For example, the voice/image feature amount extraction unit 22c analyzes the voice/image information of the patient and the doctor at the expected response time, and separates the patient's voice/image information from the doctor's voice/image information for that time. It then extracts the patient's voice/image feature amounts from the patient's voice/image information at the expected response time, and extracts the doctor's voice/image feature amounts from the doctor's voice/image information at the expected response time.
  • The processing unit 22 processes the voice/image feature amounts of the patient and the doctor with the learning model 22d, which has been trained with voice/image feature amounts tagged with (given meaning by) satisfaction/dissatisfaction (step S16), and outputs the patient's degree of satisfaction or dissatisfaction (step S17). A toy sketch of this two-stage flow follows below.
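  • The two-stage flow of steps S11 to S17 can be sketched as follows. This is only an illustrative toy implementation using scikit-learn on random data; the actual model architectures and feature extractors are not specified here.

```python
# Hedged sketch of steps S13-S17: a clipping model first marks which doctor-speech
# frames belong to the expected response time, then patient/doctor features from
# only those frames are pooled and scored by a satisfaction model. Everything here
# (toy random features, scikit-learn models) is illustrative, not the disclosed architecture.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)

# Toy frame-level features: 200 frames x 20 dimensions, patient and doctor aligned 1:1.
doctor_frames = rng.normal(size=(200, 20))
patient_frames = rng.normal(size=(200, 20))

# Stage 1 (analogue of clipping learning model 22b): tag frames as expected-response time.
response_tags = rng.integers(0, 2, size=200)          # teacher tags entered by the doctor
clipping_model = LogisticRegression(max_iter=1000).fit(doctor_frames, response_tags)
is_expected = clipping_model.predict(doctor_frames).astype(bool)
if not is_expected.any():
    is_expected[:] = True                             # fallback so the window is never empty

# Stage 2 (analogue of learning model 22d): pool features inside the clipped window and
# regress a satisfaction score (teacher labels would come from a patient questionnaire).
pooled = np.concatenate([patient_frames[is_expected].mean(axis=0),
                         doctor_frames[is_expected].mean(axis=0)])
satisfaction_model = LinearRegression().fit(
    rng.normal(size=(30, pooled.size)), rng.uniform(0, 5, size=30))   # toy training set
print("estimated satisfaction:", satisfaction_model.predict(pooled[None, :])[0])
```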
  • FIG. 5 is a diagram for explaining the overall processing flow according to this embodiment. This overall processing is basically executed by the processing unit 22 of the server device 20 .
  • the patient's voice is input (step S21) and the doctor's voice is input (step S22).
  • These patient's voice and doctor's voice are mixed and input as voice information. Therefore, the sound sources of the speakers are separated for each speaker (step S23).
  • The patient's upper body image is input (step S24), and the doctor's upper body image is input (step S25).
  • processing is executed by the cutout learner (step S26), the cutout time and number of cutouts are obtained (step S27), the cutout phoneme factor is obtained (step S28), and the cutout face image element factor is obtained (step S29). Also, a reply response time based on the cut-out phoneme factor is obtained (step S30).
  • the camera position is input (step S31), the positions of the camera, the patient, and the doctor are analyzed (step S32), and the line-of-sight intersection complex factor is obtained (step S33).
  • a line-of-sight intersection frequency/rate based on the line-of-sight intersection complex factor is obtained (step S34), and information on head gaze is obtained (step S35).
  • the satisfaction level is estimated by the satisfaction level learning machine (step S36), and the dissatisfaction level is estimated by the dissatisfaction level learning machine (step S37).
  • the depression degree is estimated by the depression degree learning device (step S38), and satisfaction, dissatisfaction and depression (depression degree) are obtained (step S39).
  • The above-mentioned reply response time, clipping time and number of clippings, degree of satisfaction, degree of dissatisfaction, degree of depression, line-of-sight crossing frequency/rate, information on head gaze, and the like are transmitted to the doctor terminal device 40, for example (step S40).
  • the doctor terminal device 40 displays the above various information on the display unit 43 . This allows the doctor to visually recognize the above various information.
  • FIG. 6 to 9 are diagrams for explaining the flow of various processes in the overall process according to this embodiment. These various processes are also basically executed by the processing unit 22 of the server device 20 .
  • The processing unit 22 of the server device 20 has learners (for example, the clipping learning model 22b, the learning model 22d, etc.), and can specify which part of the conversation between the two parties during the medical interview is to be used and use the prime factor indices (for example, prime factors) of that part. Processing time and cost can be reduced by limiting the portion of the voice information and image information used for analysis of the patient's degree of satisfaction and the like.
  • Eye contact information (for example, the eye contact rate, eye contact time, etc.) can also be used.
  • a doctor's voice is input (step S51), prime factors are extracted from the doctor's voice information (step S52), and input to a clipping learner.
  • the doctor's expected response time tag is input to the extraction learner (step S53).
  • An expected response time factor is extracted by the clipping learner (step S54), the expected response time factor is output from the clipping learner (step S55), and a clipping time is obtained (step S56).
  • the clipping learning device implements the clipping learning model 22 b and is included in the processing unit 22 of the server device 20 .
  • step S53 need not be executed.
  • the doctor's expected response time tag is set in advance by the doctor, for example. In this case, the doctor may operate the operation unit 44 of the doctor terminal device 40 to set the expected response time tag.
  • the expected response time tag is an example of information related to the expected response time.
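  • One hedged way to realize the tagging of step S53 is to convert the doctor's expected-response-time tags (time ranges) into per-frame labels for training the clipping learner, as sketched below. The frame length and the tag format are assumptions made for illustration.

```python
# Sketch of step S53: turning the doctor's expected-response-time tags (time ranges)
# into per-frame labels that the clipping learner can be trained on.
import numpy as np

def tags_to_frame_labels(tag_ranges, total_duration_s, frame_len_s=0.02):
    """tag_ranges: list of (start_s, end_s) marked by the doctor as expected response."""
    n_frames = int(np.ceil(total_duration_s / frame_len_s))
    labels = np.zeros(n_frames, dtype=np.int64)
    for start_s, end_s in tag_ranges:
        lo = int(start_s / frame_len_s)
        hi = min(n_frames, int(np.ceil(end_s / frame_len_s)))
        labels[lo:hi] = 1
    return labels

# Usage: two tagged ranges in a 60-second recording.
labels = tags_to_frame_labels([(12.5, 15.0), (40.2, 43.7)], total_duration_s=60.0)
print(labels.sum(), "frames fall inside expected-response time")
```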
  • a doctor's voice is input (step S61), and prime factors are extracted from the doctor's voice information and analyzed (step S62).
  • a patient's voice is input (step S63), and prime factors are extracted from the patient's voice information and analyzed (step S64).
  • Prime factors are extracted, analyzed, and cut out based on the doctor's voice-based prime factor, the patient's voice-based prime factor, and the clipping time (step S65).
  • a doctor-cut speech factor is obtained (step S66), and a patient-cut speech factor is obtained (step S67).
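  • The clipping of steps S65 to S67 can be illustrated as follows: the separated doctor and patient audio is cut out at the clipping time and a simple per-clip factor is computed. The specific features used here (RMS energy and zero-crossing rate) are stand-ins chosen for illustration only.

```python
# Sketch of steps S65-S67: cutting out the (already separated) doctor and patient audio
# at the clipping time and computing a simple per-clip "phoneme factor".
import numpy as np

def clip_audio(signal, sr, start_s, end_s):
    return signal[int(start_s * sr):int(end_s * sr)]

def phoneme_factor(clip):
    rms = np.sqrt(np.mean(clip ** 2))                       # loudness stand-in
    zcr = np.mean(np.abs(np.diff(np.sign(clip)))) / 2.0     # crude pitch/voicing stand-in
    return np.array([rms, zcr])

sr = 16000
t = np.arange(0, 60 * sr) / sr
doctor_voice = 0.1 * np.sin(2 * np.pi * 120 * t)            # stand-ins for separated sources
patient_voice = 0.1 * np.sin(2 * np.pi * 210 * t)

start_s, end_s = 12.5, 15.0                                  # clipping time from the learner
doctor_factor = phoneme_factor(clip_audio(doctor_voice, sr, start_s, end_s))
patient_factor = phoneme_factor(clip_audio(patient_voice, sr, start_s, end_s))
print(doctor_factor, patient_factor)
```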
  • The doctor's clipped face image factor and the patient's clipped face image factor are also obtained by the same procedure as steps S61 to S67 above. That is, a doctor's image and a patient's image (for example, a doctor's upper body image and a patient's upper body image) are input, and by extracting and analyzing prime factors, the prime factors at the clipping time, namely the doctor's clipped face image factor and the patient's clipped face image factor, are obtained.
  • The processing unit 22 of the server device 20 can recognize faces, eyes, and the like in the doctor's and patient's images by image recognition processing, and can thereby acquire the doctor's clipped face image factor and the patient's clipped face image factor.
  • The gaze-crossing complex factor is input to the satisfaction level learner (or dissatisfaction level learner) (step S71), the clipped phoneme factor is input to the satisfaction level learner (or dissatisfaction level learner) (step S72), and the clipped face image factor is input to the satisfaction level learner (or dissatisfaction level learner) (step S73).
  • teacher data regarding the degree of satisfaction or dissatisfaction of the patient questionnaire is input to the satisfaction learning device (or dissatisfaction learning device) (step S74).
  • Processing is performed by the satisfaction level learner (or dissatisfaction level learner) (step S75), and an estimated value of the patient's satisfaction or dissatisfaction is obtained (step S76).
  • the satisfaction level learner (or the dissatisfaction learner) implements part of the learning model 22 d and is included in the processing unit 22 of the server device 20 .
  • If this satisfaction level learner (or dissatisfaction level learner) has already been trained, step S74 need not be executed.
  • As for the teacher data used for learning, for example, the doctor examines the patient's condition and adds a time stamp and a satisfaction score in real time, so that the learning model can be built from the images and sound at the time they are acquired. In other words, this device can also be used as a device for generating teacher data. If real-time input is not possible, teacher data can be generated by, for example, adding time stamps and satisfaction scores to the recorded video and audio data afterward.
  • the line-of-sight crossing complex factor is the line-of-sight crossing rate, crossing time, etc. That is, the line-of-sight intersection complex factor may be both or one of the line-of-sight intersection rate and the line-of-sight intersection time, and corresponds to line-of-sight intersection information.
  • In the above, the patient's clipped speech factor and the patient's clipped face image factor are used as the clipped speech factor and the clipped face image factor, but the factors are not limited to these.
  • doctor-cut speech element factors and doctor-cut face image element factors may be added and used as needed.
  • The patient's vital data (heart rate, heart rate variability, perspiration, blood pressure, etc.) may also be added and used.
  • The gaze-crossing complex factor is input to the depression level learner (step S81), the clipped phoneme factor is input to the depression level learner (step S82), and the clipped face image factor is input to the depression level learner (step S83). Furthermore, teacher data regarding the degree of depression from a patient questionnaire (for example, PHQ-9: Patient Health Questionnaire-9) is input to the depression level learner (step S84). Processing is performed by the depression level learner (step S85), and an estimated value of the patient's degree of depression is obtained (step S86).
  • the depression degree learning device implements a part of the learning model 22d and is included in the processing unit 22 of the server device 20.
  • step S84 need not be executed.
  • Medical charts can also be used as teacher data.
  • In this way, by inputting the gaze-crossing complex factor (for example, the gaze-crossing rate, gaze-crossing time, etc.), the clipped phoneme factor, and the clipped face image factor into the learners (for example, the satisfaction level learner, the dissatisfaction level learner, the depression level learner, etc.), it is possible to obtain estimated values of the patient's satisfaction, dissatisfaction, and depression.
  • The learning models are generated by inputting information on satisfaction, dissatisfaction, depression, and the like from patient questionnaires into the learners as teacher data, as in the sketch below.
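  • As a hedged sketch of the depression-degree learner of steps S81 to S86, the example below trains a regressor on per-interview feature vectors with PHQ-9 totals (nine items scored 0 to 3, for a total of 0 to 27) as teacher labels. The model choice and the toy data are assumptions made for illustration.

```python
# Hedged sketch of the depression-degree learner: the clipped factors are used as inputs,
# and teacher labels come from a PHQ-9 questionnaire (9 items, each 0-3, total 0-27).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def phq9_total(item_scores):
    """item_scores: nine integers in 0..3; returns the PHQ-9 total (0..27)."""
    assert len(item_scores) == 9 and all(0 <= s <= 3 for s in item_scores)
    return sum(item_scores)

# Per-interview feature vector: [gaze-crossing rate, gaze-crossing time,
#                                clipped phoneme factors..., clipped face factors...]
X = rng.normal(size=(40, 12))                       # 40 past interviews (toy data)
y = np.array([phq9_total(rng.integers(0, 4, size=9)) for _ in range(40)])

depression_learner = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
new_interview = rng.normal(size=(1, 12))
print("estimated depression score (PHQ-9 scale):", depression_learner.predict(new_interview)[0])
```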
  • FIGS. 10 to 12 are diagrams for explaining an example of the line-of-sight intersection estimation processing according to the present embodiment.
  • the line-of-sight intersection estimation process is basically executed by the processing unit 22 of the server device 20 .
  • This line-of-sight crossing estimation processing realizes the azimuth estimation of the patient and the doctor, and the line-of-sight crossing estimation of the patient and the doctor based on the azimuth estimation result.
  • two cameras (camera 1 and camera 2) are arranged at the center of a virtual circle (installation circle), and the positions of the patient and the doctor are placed on the virtual circle.
  • the global orientation of the patient's and doctor's individual faces is estimated from the camera positions.
  • the patient's omnidirectional estimated angle, the doctor's omnidirectional estimated angle, and the like can be obtained.
  • the estimated global azimuth angle and the image from each camera are analyzed in real time to determine the face direction.
  • For example, when the patient's angle as seen from the camera is α and the doctor's angle as seen from the camera is β, the doctor's angle as seen from the patient is 180° - α - β, and the line-of-sight angle is θ.
  • When the patient's gaze vector (for example, a vector based on the patient's gaze angle) and the doctor's gaze vector, which are the estimated face orientations, point toward each other, the patient's face and the doctor's face face each other and the lines of sight of the patient and the doctor intersect. Note that the face orientation can be obtained in two dimensions or in three dimensions.
  • The face directions of the patient and the doctor are obtained from the images of the 10 seconds before and after the expected response time, and the line-of-sight crossing time and the line-of-sight crossing time rate of their mutual facing are calculated from the obtained face directions.
  • the line-of-sight crossing time is, for example, the time during which the arc range occupied by the doctor on the installation circle and the line-of-sight angle of the patient match.
  • The line-of-sight crossing time rate is, for example, an occupancy rate indicating how much of the 20 seconds consisting of the 10 seconds before and after the expected response time is occupied by the line-of-sight crossing time, or how much of the expected response time is occupied by it. A sketch of this computation follows below.
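  • The gaze-crossing computation can be sketched as follows. Rather than reproducing the specific angle relation of FIG. 11, this sketch checks mutual facing directly from the positions on the installation circle and the estimated global face-orientation angles, and accumulates the crossing time and crossing time rate over the 20-second window. The tolerance and frame-rate values are assumptions.

```python
# Hedged sketch: gazes are counted as "crossing" when each person is facing the other
# within a tolerance; crossing time and crossing time rate are computed over the window.
import numpy as np

def facing_each_other(pos_a, pos_b, yaw_a_deg, yaw_b_deg, tol_deg=15.0):
    """pos_*: (x, y) on the installation circle; yaw_*: global face azimuth in degrees."""
    def facing(src, dst, yaw_deg):
        target = np.degrees(np.arctan2(dst[1] - src[1], dst[0] - src[0]))
        diff = (yaw_deg - target + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
        return abs(diff) <= tol_deg
    return facing(pos_a, pos_b, yaw_a_deg) and facing(pos_b, pos_a, yaw_b_deg)

def crossing_time_rate(patient_yaws, doctor_yaws, pos_patient, pos_doctor, fps=10):
    """Yaw sequences cover the 20 s window (10 s before/after the expected response)."""
    hits = [facing_each_other(pos_patient, pos_doctor, p, d)
            for p, d in zip(patient_yaws, doctor_yaws)]
    crossing_time_s = sum(hits) / fps
    return crossing_time_s, crossing_time_s / (len(hits) / fps)

# Usage: patient at (1, 0), doctor at (-1, 0) on a unit installation circle.
rng = np.random.default_rng(2)
patient_yaws = rng.normal(180.0, 20.0, size=200)   # mostly facing the doctor
doctor_yaws = rng.normal(0.0, 20.0, size=200)      # mostly facing the patient
print(crossing_time_rate(patient_yaws, doctor_yaws, (1.0, 0.0), (-1.0, 0.0)))
```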
  • As a method of estimating the line-of-sight intersection of the patient and the doctor, arrangements other than those shown in FIGS. 10 and 11 can also be used. For example, only one camera may be placed at the center of the virtual circle (installation circle). Alternatively, only one camera capable of imaging in all directions may be arranged between the patient and the doctor. In this case, even if the medical interview is an online interview, it is possible to detect crossing of the gazes of the patient and the doctor.
  • Other feature amounts for estimating the patient's satisfaction level include a composite factor indicating, from the patient's global azimuth angle at the clipping time, whether the patient's face is facing the doctor or facing downward away from the doctor, a composite factor of the response time from the start of the doctor's expected response time until the patient speaks (the response time until the patient answers the doctor's question), and a composite factor such as the heartbeat (pulse) or heartbeat fluctuation derived from the face image at the clipping time. Gaze-crossing information can also be generated by the physician entering time stamps and tags in real time.
  • As a feature amount for estimating the patient's satisfaction level, for example, eyebrow movement, forehead wrinkles, and the like may also be used.
  • Heartbeat, body temperature, blood pressure, respiration, and the like may be obtained, for example, from a face image, or may be obtained from a vital sensor.
  • the feature amount for estimating the patient's degree of satisfaction for example, the patient's vital data (heart rate, heart rate variability, perspiration, blood pressure, etc.) obtained by a vital sensor synchronized with a camera or microphone may be used.
  • As the prime factors (feature values) extracted from images and sounds, for example, factors extracted by open-source tools such as OpenCV (Open Source Computer Vision Library), Librosa (a Python package for music and audio analysis), and OpenFace (open-source face analysis based on deep neural networks) can be used; a sketch follows below.
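  • A minimal sketch of extracting such prime factors with the open-source tools named above is shown here: Librosa for a compact voice feature and OpenCV for detecting and cropping the face region, which could then be passed to a face-analysis tool such as OpenFace. File paths and parameter values are illustrative only.

```python
# Minimal sketch: Librosa for voice features, OpenCV for a face crop. Paths are placeholders.
import cv2
import librosa
import numpy as np

# Voice: MFCCs averaged over an utterance as a compact speech feature vector.
y, sr = librosa.load("doctor_clip.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, n_frames)
speech_feature = mfcc.mean(axis=1)

# Image: detect the patient's face with OpenCV's bundled Haar cascade and crop it.
frame = cv2.imread("patient_frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) > 0:
    x, y0, w, h = faces[0]
    face_crop = frame[y0:y0 + h, x:x + w]
    face_feature = cv2.resize(face_crop, (64, 64)).astype(np.float32).flatten() / 255.0
```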
  • FIG. 13 is a diagram for explaining an example of a display image for a doctor according to this embodiment.
  • the display image for doctors is, for example, a UI (user interface) image.
  • the display image includes information on satisfaction, dissatisfaction and depression, as well as various types of information.
  • the degree of satisfaction, the degree of dissatisfaction and the degree of depression are indicated by radar graphs.
  • “Real-time extraction timing: time, ON-OFF”, “line-of-sight matching rate: display (%, time)”, “accumulated number of extractions”, “head position (head, lower part)”, “response speed”, and “elapsed time” are displayed. These pieces of information are obtained from the information obtained by the processing shown in FIG. 5.
  • Such a display image is displayed by the display unit 43 of the doctor terminal device 40, for example. This allows the doctor to visually recognize the above various information.
  • the display image shown in FIG. 13 is merely an example, and the display image for presenting various information may be in another form. Further, for example, it is possible to present information other than the above information, and the information obtained by the processing shown in FIG. 5 may be presented as it is.
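  • The radar-chart portion of such a display image can be sketched with Matplotlib as follows; the score values and the 0 to 5 scale are placeholders, not values prescribed by the present disclosure.

```python
# Sketch of the radar-chart part of the doctor's display: satisfaction, dissatisfaction,
# and depression plotted on a polar (radar) axis with Matplotlib.
import numpy as np
import matplotlib.pyplot as plt

labels = ["Satisfaction", "Dissatisfaction", "Depression"]
scores = [4.1, 1.8, 2.3]                      # e.g. on a 0-5 scale (placeholder values)

angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]                          # close the polygon
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=2)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)
plt.show()
```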
  • FIG. 14 is a diagram for explaining an example of a system application service according to this embodiment.
  • the system application service functions as a scheduling system. This schedule system is implemented by the server device 20 .
  • Processing is executed by the learners (for example, those shown in FIGS. 8 and 9) (step S91), and the degrees of satisfaction, dissatisfaction, and depression are obtained (steps S92 to S94).
  • the patient's impression (feeling) of the medical interview is classified into a satisfaction group or a dissatisfaction group (step S95), and the satisfaction group result (patient's impression of the medical interview is satisfied) is input to the scheduling system (step S96), and the dissatisfaction group result (patient's impression of the medical interview is unsatisfactory) is input to the scheduling system (step S97).
  • The degree of satisfaction, the degree of dissatisfaction, and the degree of depression are each compared with a corresponding predetermined value, and depending on the result the patient's impression of the medical interview is classified into the satisfied group or the dissatisfied group.
  • For example, when the degree of satisfaction is greater than or equal to a first predetermined value, the degree of dissatisfaction is less than a second predetermined value, and the degree of depression is less than a third predetermined value, the patient's impression of the medical interview is classified into the satisfied group; otherwise, it is classified into the dissatisfied group.
  • Any one, any two, or all of satisfaction, dissatisfaction, and depression may be used for the classification. Processing speed can be improved by reducing the number of elements used for classification; on the other hand, the accuracy of classification can be improved by increasing the number of elements. A sketch of the classification follows below.
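  • Under the reading given above (all three threshold conditions met places the patient in the satisfied group, otherwise the dissatisfied group), the classification can be sketched as follows; the threshold values are examples only.

```python
# Sketch of the group classification: satisfied when satisfaction is at or above the first
# threshold while dissatisfaction and depression stay below the second and third thresholds.
def classify_impression(satisfaction, dissatisfaction, depression,
                        first=3.5, second=2.5, third=10.0):
    if satisfaction >= first and dissatisfaction < second and depression < third:
        return "satisfied_group"
    return "dissatisfied_group"

print(classify_impression(4.2, 1.1, 4.0))   # satisfied_group
print(classify_impression(2.0, 3.4, 12.0))  # dissatisfied_group
```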
  • the scheduling system executes processing (step S98), and an advance notice is sent to the patient and doctor informing them of the date and place of the medical interview (step S99).
  • the advance notification is sent to the doctor terminal device 40 and the patient terminal device 50 by e-mail, and displayed on the display section 43 of the doctor terminal device 40 and the display section 53 of the patient terminal device 50 (steps S100 and S101).
  • the doctor and the patient have a medical interview according to the schedule based on the advance notice (steps S102, S103).
  • The scheduling system executes processing (step S104), and a high-frequency contact, such as a proposal to change doctors, is sent to the patient (step S105).
  • high-frequency contact is sent by e-mail to the patient terminal device 50 and displayed on the display unit 53 of the patient terminal device 50 (step S101).
  • In step S106, another doctor is recommended to the patient based on the dissatisfied-group result. For example, a guide to another doctor is sent to the patient terminal device 50 by e-mail and displayed on the display unit 53 of the patient terminal device 50. The other doctor is selected from a plurality of doctors registered in a database.
  • When the patient receives a guide to another doctor and the guide lists multiple doctors, the patient selects the desired doctor from among them (step S107). If the patient selects another doctor, the new doctor is sent an advance notice of the date, time, and place of the medical interview. In addition, a notice of cancellation of the medical interview is sent to the original doctor.
  • In this way, the doctor can grasp the patient's degree of satisfaction, dissatisfaction, depression, and the like, and by encouraging the patient to visit the hospital, watch educational content, use apps, and so on, the effectiveness of treatment can be improved.
  • advance notice enables doctors and patients to grasp the date, time and place of medical consultations, thereby improving the convenience of doctors and patients.
  • The patient's degree of anxiety may also be estimated and displayed together with any or all of the patient's satisfaction, dissatisfaction, and depression. That is, any or all of the patient's satisfaction, dissatisfaction, anxiety, and depression may be estimated and displayed.
  • In this case, the doctor can also grasp the patient's anxiety level, and by encouraging the patient to come to the hospital next time during the medical consultation, the treatment effect can be further improved.
  • As described above, the input unit 21 inputs voices and images of the patient and the doctor, the extraction unit (for example, the processing unit 22) extracts feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor, the estimation unit (for example, the processing unit 22) estimates the patient's degree of satisfaction, dissatisfaction, or anxiety based on the extracted feature amounts, and the output unit 23 outputs the patient's degree of satisfaction, dissatisfaction, or anxiety.
  • the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be displayed and visualized by the doctor terminal device 40 or the like. Therefore, the doctor can grasp the patient's degree of satisfaction, dissatisfaction, or anxiety, and can prevent interruption of treatment due to the patient's decreased willingness to continue treatment, thereby improving the treatment effect.
  • The estimation unit may estimate the patient's degree of depression in addition to the patient's degree of satisfaction, dissatisfaction, or anxiety, and the output unit 23 may output the patient's degree of depression in addition to the patient's degree of satisfaction, dissatisfaction, or anxiety. This makes it possible to display and visualize the degree of depression of the patient, in addition to the degree of satisfaction, dissatisfaction, or anxiety, on the doctor terminal device 40 or the like. Therefore, doctors can grasp the patient's degree of depression in addition to the patient's degree of satisfaction, dissatisfaction, or anxiety, can more reliably prevent interruption of treatment due to a decrease in the patient's willingness to continue treatment, and can more reliably improve the treatment effect.
  • the extraction unit may separate the patient's voice and the doctor's voice, and extract the patient's voice feature amount and the doctor's voice feature amount.
  • the extraction unit may obtain the doctor's expected response time based on the doctor's speech feature amount, and obtain the feature amount at the obtained expected response time. As a result, it is only necessary to obtain the feature amount for the doctor's expected response time from the patient's and doctor's voices and images, so that the processing speed can be improved.
  • the extracting unit may obtain the expected response time by using the extraction learning model 22b that is learned with the doctor's speech feature quantity that is given meaning by the doctor's expected response time. As a result, the expected response time can be obtained with high accuracy.
  • a learning unit may be further provided that generates the extraction learning model 22b based on the expected response time and the doctor's speech feature amount. Thereby, the clipped learning model 22b can be generated appropriately.
  • the estimating unit may estimate the patient's degree of satisfaction, dissatisfaction, or anxiety using a learning model 22d that has been learned with feature values assigned meaning by the degree of satisfaction, dissatisfaction, or anxiety of the patient. . Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • a learning unit may be provided that generates the learning model 22d based on the results of a questionnaire regarding patient satisfaction, dissatisfaction, or anxiety, or the results of scoring by a doctor, and feature amounts. Thereby, the learning model 22d can be appropriately generated.
  • the estimation unit may estimate a facial image from the patient's image and extract the feature amount from the facial image. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • the extracting unit may extract, as the feature quantity, gaze crossing information regarding the gaze crossing of the patient and the doctor. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • The estimating unit may obtain one or both of the degree and the frequency of eye contact between the patient and the doctor based on the eye-crossing information, and may estimate the patient's degree of satisfaction, dissatisfaction, or anxiety based on one or both of the obtained degree and frequency of eye contact. As a result, the patient's degree of satisfaction, dissatisfaction, or anxiety can be obtained with higher accuracy.
  • The input unit 21 may input the position of the first camera that acquires the patient's image and the position of the second camera that acquires the doctor's image, and the extraction unit may extract the line-of-sight crossing information based on the individual positions of the patient, the doctor, the first camera, and the second camera, the patient's image, and the doctor's image. Thereby, the line-of-sight crossing information can be obtained with high accuracy.
  • the patient, the doctor, the first camera, and the second camera may each be positioned on the same virtual circle. Thereby, the line-of-sight crossing information can be easily obtained.
  • the patient and the doctor may be positioned on the same virtual circle, and the first camera and the second camera may be positioned at the center of the virtual circle. Thereby, the line-of-sight crossing information can be easily obtained.
  • the extraction unit may extract, as the feature amount, a feature amount indicating that the patient's face is facing the doctor or facing downward from the doctor's face. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • the extraction unit may extract, as a feature amount, the response time until the patient speaks to the doctor's question. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • the extraction unit may also extract the patient's heartbeat or heartbeat fluctuation from the patient's image as the feature amount. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • The input unit 21 may input the patient's vital information in addition to the voices and images of the patient and the doctor, and the extraction unit may extract the feature amounts from the voices and images of the patient and the doctor as well as from the patient's vital information. Thereby, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.
  • each component of each device illustrated is functionally conceptual and does not necessarily need to be physically configured as illustrated.
  • the specific form of distribution and integration of each device is not limited to the one shown in the figure, and all or part of them can be functionally or physically distributed and integrated in arbitrary units according to various loads and usage conditions. Can be integrated and configured.
  • <Example of hardware configuration> Specific hardware configuration examples of information devices such as the server device 20, the doctor terminal device 40, and the patient terminal device 50 according to the above-described embodiment (or modification) will be described.
  • Information devices such as the server device 20, the doctor terminal device 40, and the patient terminal device 50 according to the embodiment (or modification) may be implemented by, for example, a computer 500 configured as shown in FIG.
  • FIG. 15 is a diagram showing a configuration example of hardware that implements the functions of information devices such as the server device 20, the doctor terminal device 40, and the patient terminal device 50 according to the embodiment (or modification).
  • the computer 500 has a CPU 510, a RAM 520, a ROM (Read Only Memory) 530, a HDD (Hard Disk Drive) 540, a communication interface 550 and an input/output interface 560.
  • the parts of computer 500 are connected by bus 570 .
  • the CPU 510 operates based on programs stored in the ROM 530 or HDD 540 and controls each section. For example, the CPU 510 loads programs stored in the ROM 530 or HDD 540 into the RAM 520 and executes processes corresponding to various programs.
  • the ROM 530 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 510 when the computer 500 is started, a program depending on the hardware of the computer 500, and the like.
  • the HDD 540 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 510 and data used by such programs.
  • the HDD 540 is a recording medium that records an information processing program according to the present disclosure, which is an example of the program data 541 .
  • the communication interface 550 is an interface for connecting the computer 500 to an external network 580 (for example, the Internet).
  • the CPU 510 receives data from another device or transmits data generated by the CPU 510 to another device via the communication interface 550.
  • the input/output interface 560 is an interface for connecting the input/output device 590 and the computer 500 .
  • the CPU 510 receives data from an input device such as a keyboard or mouse via the input/output interface 560.
  • the CPU 510 also transmits data to an output device such as a display, speaker, or printer via the input/output interface 560 .
  • the input/output interface 560 may function as a media interface for reading programs and the like recorded on a predetermined recording medium (media).
  • media include optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable Disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, semiconductor memory, and the like.
  • the CPU 510 of the computer 500 executes the information processing program loaded onto the RAM 520, thereby implementing the functions of the server device 20, the doctor terminal device 40, or the patient terminal device 50.
  • the HDD 540 also stores information processing programs and data (e.g., objective data, subjective data, objective score data, subjective score data, score images, etc.) according to the present disclosure.
  • the CPU 510 reads the program data 541 from the HDD 540 and executes it; as another example, these programs may be obtained from another device via the external network 580.
  • the present technology can also take the following configurations.
  • (1) An information processing device comprising: an input unit to which voices and images of a patient and a doctor are input; an extraction unit that extracts, from the voices and images of the patient and the doctor, a feature amount related to communication between the patient and the doctor; an estimation unit that estimates the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety based on the feature amount; and an output unit that outputs the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety.
  • (2)
  • the estimation unit estimates the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient,
  • the output unit outputs the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient,
  • the information processing apparatus according to (1) above.
  • the extraction unit separates the patient's voice and the doctor's voice, and extracts the patient's voice feature amount and the doctor's voice feature amount.
  • the information processing apparatus according to (1) or (2) above.
  • the extraction unit obtains the doctor's expected response time based on the doctor's voice feature amount, and obtains the feature amount at the obtained expected response time.
  • the extracting unit obtains the expected response time using a clipping learning model trained on the doctor's voice feature amounts labeled with the expected response time.
  • the estimating unit estimates the patient's satisfaction, dissatisfaction, or anxiety using a learning model trained on the feature amounts labeled with the patient's satisfaction, dissatisfaction, or anxiety.
  • the information processing apparatus according to any one of (1) to (6) above.
  • a learning unit that generates the learning model based on the feature amount and on a questionnaire about the patient's satisfaction, dissatisfaction, or anxiety, or on scoring results given by the doctor.
  • the information processing apparatus according to (7) above.
  • the estimation unit estimates a facial image from the patient's image and extracts the feature amount from the facial image.
  • the information processing apparatus according to any one of (1) to (8) above.
  • the extracting unit extracts line-of-sight crossing information related to line-of-sight crossing of the patient and the doctor as the feature amount.
  • the information processing apparatus obtains one or both of the degree and frequency of eye contact between the patient and the doctor based on the line-of-sight crossing information, and estimates the patient's satisfaction, dissatisfaction, or anxiety based on one or both of the obtained degree and frequency of eye contact;
  • the information processing apparatus according to (10) above.
  • the input unit inputs a position of a first camera that acquires an image of the patient and a position of a second camera that acquires an image of the doctor,
  • the extraction unit extracts the line-of-sight crossing information based on the individual positions of the patient, the doctor, the first camera, and the second camera, as well as the patient's image and the doctor's image. The information processing apparatus according to (10) or (11) above.
  • the extraction unit extracts, as the feature amount, a feature amount indicating whether the patient's face is directed toward the doctor or turned downward, away from the doctor's face.
  • the information processing apparatus according to any one of (1) to (14) above.
  • the extraction unit extracts, as the feature amount, the response time from the doctor's question until the patient begins to speak. The information processing apparatus according to any one of (1) to (15) above.
  • the extraction unit extracts the patient's heartbeat or heartbeat fluctuation from the patient's image as the feature amount.
  • the information processing apparatus according to any one of (1) to (16) above.
  • the input unit inputs vital information of the patient in addition to the voice and image of the patient and the doctor,
  • the extraction unit extracts the feature amount from the patient's vital information in addition to the voice and image of the patient and the doctor.
  • the information processing apparatus according to any one of (1) to (17) above.
  • (19) An information processing device comprising: an input unit for inputting a doctor's voice; an extraction unit that extracts, from the doctor's voice, the doctor's voice feature amount related to communication between the doctor and the patient; and a clipping learning unit that learns the expected response time based on the voice feature amount and information about the doctor's expected response time.
  • (20) An information processing device comprising: an input unit for inputting voices and images of a patient and a doctor; an extraction unit that extracts, from the voices and images of the patient and the doctor, a feature amount related to communication between the patient and the doctor; and a learning unit that learns the patient's satisfaction level, dissatisfaction level, or anxiety level based on the feature amount and a questionnaire regarding the patient's satisfaction level, dissatisfaction level, or anxiety level.
  • (21) An information processing system comprising: an information acquisition device that acquires voices and images of a patient and a doctor; an extraction unit that extracts, from the voices and images of the patient and the doctor, a feature amount related to communication between the patient and the doctor; an estimating unit that estimates the patient's satisfaction, dissatisfaction, or anxiety based on the feature amount; and a display unit that displays the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety.
  • (22) An information processing method in which a computer: acquires voices and images of a patient and a doctor; extracts, from the voices and images of the patient and the doctor, a feature amount related to communication between the patient and the doctor; estimates the patient's degree of satisfaction, dissatisfaction, or anxiety based on the feature amount; and indicates the patient's degree of satisfaction, dissatisfaction, or anxiety (a minimal illustrative sketch of this flow follows this list).
  • (23) An information processing method using the information processing apparatus according to any one of (1) to (20) above.
  • (24) An information processing system comprising the information processing apparatus according to any one of (1) to (20) above.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Psychiatry (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Epidemiology (AREA)
  • Developmental Disabilities (AREA)
  • Educational Technology (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention relates to a server device (20), which is an example of one form of an information processing device and which comprises: an input unit (21) to which voices and images of a patient and a doctor are input; an extraction unit (for example, a processing unit (22)) that extracts, from the voices and images of the patient and the doctor, a feature amount related to communication between the patient and the doctor; an estimation unit (for example, the processing unit (22)) that estimates a degree of satisfaction, a degree of dissatisfaction, or a degree of anxiety of the patient on the basis of the feature amount; and an output unit (23) that outputs the degree of satisfaction, the degree of dissatisfaction, or the degree of anxiety of the patient.
PCT/JP2022/006852 2021-03-30 2022-02-21 Information processing device, information processing system, and information processing method WO2022209416A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021058562 2021-03-30
JP2021-058562 2021-03-30

Publications (1)

Publication Number Publication Date
WO2022209416A1 true WO2022209416A1 (fr) 2022-10-06

Family

ID=83456017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/006852 WO2022209416A1 (fr) 2021-03-30 2022-02-21 Dispositif de traitement d'informations, système de traitement d'informations et procédé de traitement d'informations

Country Status (1)

Country Link
WO (1) WO2022209416A1 (fr)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004112518A (ja) * 2002-09-19 2004-04-08 Takenaka Komuten Co Ltd Information providing device
JP2018195164A (ja) * 2017-05-19 2018-12-06 Konica Minolta, Inc. Analysis device, analysis program, and analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DOKI, YUICHIRO ET AL.: "3. Initiatives for creating a "AI base hospital" at the Osaka University Hospital for "advanced medical examinations and treatment systems by AI hospitals"", INNERVISION, MAGUBUROSU SHUPPAN, TOKYO, JP, vol. 35, no. 7, 25 June 2020 (2020-06-25), JP , pages 8 - 10, XP009540079, ISSN: 0913-8919 *
HIGUCHI, TAKUYA; SUZUKI, MASAYUKI; NAGANO, TOHRU; TACHIBANA, RYUKI; NISHIMURA, MASAFUMI; TAGUCHI, TAKAYA; NEMOTO, KIYOTAKA; TACHIK: "Depression level estimation from human's voice recorded daily by portable terminals", PROCEEDINGS OF THE 2014 AUTUMN MEETING THE ACOUSTICAL SOCIETY OF JAPAN; SAPPORO, JAPAN; SEPTEMBER 3-5, 2014, 31 March 2014 (2014-03-31) - 5 September 2014 (2014-09-05), pages 307 - 310, XP009540021 *
NAKAMURA, SOTA ET AL.: "Analysis of Pulse Wave Obtained by Wristband Activity Monitor for Estimation of Anxiety State", IEICE TECHNICAL REPORT, DENSHI JOUHOU TSUUSHIN GAKKAI, JP, vol. 118, no. 44 (MBE2018-1), 19 May 2018 (2018-05-19), JP , pages 1 - 6, XP009540068, ISSN: 0913-5685 *

Similar Documents

Publication Publication Date Title
JP7491943B2 (ja) Personalized digital therapy methods and devices
US10524715B2 (en) Systems, environment and methods for emotional recognition and social interaction coaching
US20210248656A1 (en) Method and system for an interface for personalization or recommendation of products
US11301775B2 (en) Data annotation method and apparatus for enhanced machine learning
Won et al. Automatic detection of nonverbal behavior predicts learning in dyadic interactions
US20190290129A1 (en) Apparatus and method for user evaluation
Khalifa et al. Non-invasive identification of swallows via deep learning in high resolution cervical auscultation recordings
US9286442B2 (en) Telecare and/or telehealth communication method and system
JP2021529382A (ja) Systems and methods for mental health assessment
CN102149319B (zh) Alzheimer's cognitive enabler
US20190239791A1 (en) System and method to evaluate and predict mental condition
US20090132275A1 (en) Determining a demographic characteristic of a user based on computational user-health testing
US20120164613A1 (en) Determining a demographic characteristic based on computational user-health testing of a user interaction with advertiser-specified content
US20090119154A1 (en) Determining a demographic characteristic based on computational user-health testing of a user interaction with advertiser-specified content
KR102552220B1 (ko) Content providing method, system, and computer program for adaptively performing mental health diagnosis and treatment
JP2022548473A (ja) Systems and methods for patient monitoring
van den Broek et al. Unobtrusive sensing of emotions (USE)
Ferrari et al. Using voice and biofeedback to predict user engagement during requirements interviews
Geiger et al. Computerized facial emotion expression recognition
EP4182875A1 (fr) Interface method and system for personalization or recommendation of products
JP2018503187A (ja) Scheduling interactions with a subject
Van Stan et al. Recent innovations in voice assessment expected to impact the clinical management of voice disorders
WO2022209416A1 (fr) Information processing device, information processing system, and information processing method
JP2019109859A (ja) Guidance support system, guidance support method, and guidance support program
Virk et al. A multimodal feature fusion framework for sleep-deprived fatigue detection to prevent accidents

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22779643

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22779643

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP