WO2022044095A1 - Information processing device, learning device, and learned model

Information processing device, learning device, and learned model

Info

Publication number
WO2022044095A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
observation
image data
phrase
Prior art date
Application number
PCT/JP2020/031904
Other languages
French (fr)
Japanese (ja)
Inventor
伸之 渡辺
英敏 西村
一仁 堀内
善興 金子
Original Assignee
オリンパス株式会社
Priority date
Filing date
Publication date
Application filed by オリンパス株式会社
Priority to PCT/JP2020/031904
Publication of WO2022044095A1

Classifications

    • G — PHYSICS
    • G16 — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H — HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 — ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 — ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing

Definitions

  • the present invention relates to an information processing device, a learning device, and a trained model.
  • pathological diagnosis is performed in which a pathologist observes a specimen collected from a patient and makes a diagnosis.
  • a pathologist may create a diagnostic report by attaching a diagnosis result (information such as the presence or absence, type, and position of a lesion) to an image of the specimen.
  • Patent Document 1 discloses a learning data generation support device that uses, as correct-answer data for machine learning, image data whose lesion features described in a diagnostic report match the lesion features detected by image processing.
  • when generating learning data with the technique of Patent Document 1, however, a pathologist needs to create a diagnostic report. Specifically, while observing the image of the specimen, the pathologist has to input the target area such as a lesion with a keyboard or a pointing device in order to associate the lesion with the corresponding image and position, which places a heavy burden on the pathologist.
  • the present invention has been made in view of the above, and its purpose is to provide an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area on image data and easily generate learning data.
  • the information processing apparatus includes a phrase extraction unit that extracts a predetermined phrase from voice data recording the voice uttered by the user while observing image data, an area setting unit that extracts, from observation data recording the observation mode for the image data, the observation data corresponding to the time when the predetermined phrase was uttered and sets a target area in the image data based on the extracted observation data, and a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
  • the phrase extraction unit has a conversion unit that converts the voice data into text data, and an extraction unit that extracts important phrases recorded in the recording unit from the text data as the predetermined phrases.
  • the extraction unit includes an important-phrase storage unit that stores the important phrases, each with a predetermined attribute assigned to it, and the extraction unit extracts, from the text data, one or more of the important phrases included in each attribute as the predetermined phrases.
  • the attributes of the important phrase include a first attribute representing the site or origin of the lesion and a second attribute representing the state of the lesion.
  • the text data is text data corresponding to speech uttered as a single phrase
  • the extraction unit extracts one or more of the important words and phrases from the text data.
  • the label corresponding to the target area is represented by the set of important words and phrases belonging to one or more attributes.
  • the area setting unit preferentially extracts observation data having a larger observation magnification, and the generation unit assigns a higher importance to the labeled image data as the observation magnification increases.
  • the area setting unit preferentially extracts observation data in which the moving speed of the observation area is small, and the generation unit assigns a higher importance to the labeled image data as the moving speed of the observation area decreases.
  • the area setting unit preferentially extracts observation data in which the observation area stays in a predetermined area for a long time, and the generation unit assigns a higher importance to the labeled image data the longer the observation area stays in the predetermined area.
  • the observation data includes the line-of-sight data obtained by detecting the line-of-sight of the user.
  • the observation data includes the visual field data in which the observation visual field of the user is recorded.
  • the observation data includes the position information of the pointing device.
  • the information processing device includes a phrase extraction unit that extracts a predetermined phrase from voice data recording a voice uttered by a user while observing frame images, an area setting unit that extracts the frame image corresponding to the time when the predetermined phrase was uttered and sets the extracted frame image as the target area, and a generation unit that generates labeled image data in which the predetermined phrase and the target area are associated with each other.
  • the learning device includes a phrase extraction unit that extracts a predetermined phrase from voice data recording the voice uttered by the user while observing image data, an area setting unit that extracts, from observation data recording the observation mode for the image data, the observation data corresponding to the time when the predetermined phrase was uttered and sets a target area in the image data based on the extracted observation data, a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area, and a model generation unit that generates, by machine learning using the labeled image data, a trained model for detecting a region corresponding to the predetermined phrase from a group of images.
  • the trained model is generated as follows: a predetermined phrase is extracted from voice data recording the voice uttered by the user while observing training image data; from observation data recording the observation mode for the training image data, the observation data corresponding to the time when the predetermined phrase was uttered is extracted; a target area is set in the training image data based on the extracted observation data; and labeled image data in which the predetermined phrase is associated with the target area is generated and used for machine learning. When determination image data is input, the trained model causes the computer to extract a region estimated to be the target area associated with the predetermined phrase and to output the likelihood that the region is the target area.
  • according to the present invention, it is possible to provide an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area for image data and easily generate learning data.
  • FIG. 1 is a block diagram showing a functional configuration of the information processing system according to the first embodiment.
  • FIG. 2 is a flowchart illustrating a process executed by the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram showing voice data.
  • FIG. 4 is a diagram showing observation data.
  • FIG. 5 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment.
  • FIG. 6 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment.
  • FIG. 7 is a block diagram showing a functional configuration of the information processing apparatus according to the second embodiment.
  • FIG. 8 is a flowchart showing an outline of the processing executed by the information processing apparatus.
  • FIG. 9 is a diagram showing words and phrases extracted by the word and phrase extraction unit.
  • FIG. 10 is a diagram showing how the area setting unit sets the target area.
  • FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment.
  • FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment.
  • FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment.
  • FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment.
  • FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
  • FIG. 1 is a block diagram showing a functional configuration of the information processing system according to the first embodiment.
  • the information processing system 1 shown in FIG. 1 includes an information processing device 10 that performs various processing on line-of-sight data, voice data, and image data input from the outside, and a display unit 20 that displays various data output from the information processing device 10.
  • the information processing device 10 and the display unit 20 are bidirectionally connected by wireless or wired connection.
  • the information processing device 10 shown in FIG. 1 is realized by a program installed on, for example, a server or a personal computer, and receives various data via a network or various data acquired by external devices.
  • the information processing apparatus 10 includes a word extraction unit 11, an area setting unit 12, a generation unit 13, a recording unit 14, and a display control unit 15.
  • the phrase extraction unit 11 extracts a predetermined phrase from externally input voice data in which the voice uttered by the user while observing the image data is recorded.
  • the word / phrase extraction unit 11 includes a conversion unit 111 that converts voice data into text data, and an extraction unit 112 that extracts important words / phrases recorded in the recording unit 14 from the text data as predetermined words / phrases.
  • the conversion unit 111 converts the voice data into character information (text data) by performing a well-known text conversion process on the voice data, and outputs this character information to the extraction unit 112.
  • alternatively, the target area may be set in the image data while the data is still voice information, and the conversion into character information may be performed afterward.
  • the extraction unit 112 extracts the important phrases recorded in advance in the recording unit 14 from the text data converted by the conversion unit 111, and records in the recording unit 14 the time at which each important phrase was uttered in association with that phrase. Specifically, the phrase extraction unit 11 associates, with each frame of the voice data, the important phrases uttered in that frame and records them in the recording unit 14. The user's voice data input from the outside is generated by a voice input unit such as a microphone.
  • the phrase extraction unit 11 is configured by using a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a GPU (Graphics Processing Unit), and the like.
  • the phrase extraction unit 11 may handle the voice data as it is without converting the voice data into text data. In this case, the phrase extraction unit 11 does not have the conversion unit 111, and the extraction unit 112 extracts a predetermined phrase directly from the voice data.
  • the area setting unit 12 extracts the observation data corresponding to the time when the important phrase is uttered from the observation data in which the observation mode for the image data is recorded.
  • the area setting unit 12 may extract the observation data included in a predetermined period, or a predetermined number of frames, that includes the time when an important phrase is uttered (from the start to the end of the utterance). Further, the area setting unit 12 sets a target area in the image data based on the extracted observation data, and records the position information of the target area in the recording unit 14.
  • the area setting unit 12 may set the target area in the image data corresponding to one selected piece of observation data, or may set target areas in a plurality of image data corresponding to a plurality of selected pieces of observation data.
  • the area setting unit 12 is configured by using a CPU, FPGA, GPU and the like.
  • the area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time when the important phrase was uttered, selects one or more of them according to, for example, the observation magnification, sets the display area corresponding to the selected observation data as the target area, and records the position information of the target area in the recording unit 14.
  • the area setting unit 12 may set a part of the display area set by predetermined image processing or the like as the target area.
  • the field of view displayed on the display unit 20 matches a field of view within the image data, so the relative positional relationship of the observation field of view with respect to the absolute coordinates of the image does not change.
  • when the usage mode uses an optical microscope and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, the magnification of the objective lens and the temporal change of the stage position are recorded in the recording unit 14 as observation data in synchronization with the audio data.
  • in that case, the area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time when the important phrase was uttered, selects one or more of them according to, for example, the observation magnification, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14. Further, when the usage pattern uses an endoscope system and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, a feature amount representing the correlation between consecutive frames and the movement vector between consecutive frames are recorded in the recording unit 14 as observation data in synchronization with the audio data.
  • in that case, the area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time when the important phrase was uttered, selects one or more of them according to, for example, the movement vector, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14.
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area, and outputs the generated labeled image data to the recording unit 14 and the display control unit 15. Specifically, the generation unit 13 generates labeled image data in which the important words extracted by the word extraction unit 11 and the target area set by the area setting unit 12 are associated with each other.
  • the generation unit 13 is configured by using a CPU, FPGA, GPU and the like.
  • the recording unit 14 synchronously records the voice data input from the word extraction unit 11, the observation data input from the area setting unit 12, and the image data input from the generation unit 13. Further, the recording unit 14 records the important words / phrases input in advance and the labeled image data in which the important words / phrases input from the generation unit 13 are associated with the target area.
  • the recording unit 14 is also used for temporarily recording important words and phrases extracted by the phrase extraction unit 11, observation data extracted by the area setting unit 12, target areas set by the area setting unit 12, and the like. Further, the recording unit 14 records various programs executed by the information processing apparatus 10 and data being processed.
  • the recording unit 14 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
  • the display control unit 15 superimposes various information on the image corresponding to the externally input image data and outputs the result to the external display unit 20 for display.
  • the display control unit 15 is configured by using a CPU, FPGA, GPU and the like.
  • the word / phrase extraction unit 11, the area setting unit 12, the generation unit 13, and the display control unit 15 may be configured so that each function can be exhibited by using any one of the CPU, FPGA, and GPU.
  • the CPU, FPGA, and GPU may be combined and configured so that each function can be exhibited.
  • the display unit 20 displays an image corresponding to the image data input from the display control unit 15.
  • the display unit 20 is configured by using a display monitor such as an organic EL (Electroluminescence) or a liquid crystal display.
  • FIG. 2 is a flowchart illustrating a process executed by the information processing apparatus 10.
  • the information processing apparatus 10 acquires image data, observation data, and audio data input from the outside (step S101). These data are synchronized and are temporally associated with each other.
  • the image data is, for example, the entire image of a whole slide image (WSI)
  • the observation data is information representing a display area and an observation magnification in the entire image
  • the audio data is data that records the audio spoken by the user while observing.
  • the phrase extraction unit 11 extracts a keyword (predetermined phrase) from the voice data (step S102). Specifically, first, the conversion unit 111 converts the voice data into text data. Then, the extraction unit 112 extracts the important words and phrases recorded in the recording unit 14 from the text data as predetermined words and phrases.
  • FIG. 3 is a diagram showing voice data.
  • as shown in FIG. 3, the phrase extraction unit 11 extracts utterance contents X1, X2, and X3 that match important phrases from the voice data converted into text data, and records them in the recording unit 14 in association with their respective start times t11, t21, t31 and end times t12, t22, t32.
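  • As a rough illustration of this keyword-extraction step, the following Python sketch matches time-stamped speech-recognition output against a hypothetical important-phrase dictionary and keeps the utterance intervals; the class names and dictionary entries are assumptions for illustration, not the embodiment's actual interfaces.

```python
# Minimal sketch, not the embodiment's implementation: match time-stamped
# speech-recognition output against a hypothetical important-phrase dictionary
# and keep the utterance intervals (start/end times) of the matches.
from dataclasses import dataclass

IMPORTANT_PHRASES = {"polyp", "ulcer", "stomach"}  # assumed example entries

@dataclass
class Utterance:
    text: str
    start: float  # seconds from the start of observation (e.g. t11)
    end: float    # e.g. t12

def extract_keywords(transcript: list[Utterance]) -> list[Utterance]:
    """Return only the utterances whose text matches a registered important phrase."""
    return [u for u in transcript if u.text.lower() in IMPORTANT_PHRASES]

transcript = [Utterance("polyp", 12.4, 13.1), Utterance("maybe", 13.2, 13.5)]
keywords = extract_keywords(transcript)  # -> [Utterance(text='polyp', start=12.4, end=13.1)]
```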
  • in step S103, the area setting unit 12 extracts the observation data corresponding to the time when the important phrase was uttered, and sets the target area in the image data based on the extracted observation data.
  • the recording unit 14 records the time and the coordinate information of two predetermined points in the image displayed on the display unit 20 at that time in association with each other.
  • the predetermined two points are not particularly limited, but are, for example, two points, the upper left corner and the lower right corner of the rectangular image displayed on the display unit 20, and represent the display area displayed on the display unit 20.
  • the observation magnification can be calculated from these two points.
  • for the utterance content X1 shown in FIG. 3, the area setting unit 12 extracts, as observation data, the observation magnifications calculated from the coordinate information recorded between the start time t11 and the end time t12. It then selects the largest value among the extracted observation magnifications and sets the display area (the image displayed on the display unit 20) corresponding to this observation magnification as the target area. Then, the area setting unit 12 records the set target area in the recording unit 14.
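  • The selection described above could look roughly like the following sketch, which filters observation records to the utterance interval and picks the most magnified display area; the record layout and the magnification estimate are assumptions, not the patent's specified data format.

```python
# Sketch under assumptions: each observation record holds a time stamp and the
# two corner coordinates (upper-left, lower-right) of the displayed region in
# whole-image coordinates; the magnification estimate and names are illustrative.
from dataclasses import dataclass

@dataclass
class ObservationRecord:
    t: float
    top_left: tuple[float, float]      # e.g. coordinate A11
    bottom_right: tuple[float, float]  # e.g. coordinate A12

def magnification(rec: ObservationRecord, screen_width_px: float = 1920.0) -> float:
    """The narrower the displayed part of the whole image, the higher the magnification."""
    region_width = rec.bottom_right[0] - rec.top_left[0]
    return screen_width_px / region_width

def select_target_area(records: list[ObservationRecord],
                       t_start: float, t_end: float) -> ObservationRecord:
    """Within [t_start, t_end] (e.g. t11..t12), pick the most magnified display area."""
    in_interval = [r for r in records if t_start <= r.t <= t_end]
    return max(in_interval, key=magnification)
```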
  • FIG. 4 is a diagram showing observation data.
  • the target area set by the area setting unit 12 is recorded in the recording unit 14 as the time and the coordinate information of two predetermined points representing the display area.
  • the time t13 and the coordinate information representing the display area at this time t13 are recorded in the recording unit 14.
  • the coordinate information is coordinate A11 as coordinate information 1 corresponding to the upper left corner of the image displayed on the display unit 20 at time t13, and coordinate A12 as coordinate information 2 corresponding to the lower right corner of this image.
  • the area of the image data that was observed at the highest magnification while the user was uttering the utterance content X1 is thereby set as the target area.
  • the time t23 and the coordinate information (A21, A22) representing the display area at this time t23 are recorded in the recording unit 14.
  • the time t33 and the coordinate information (A31, A32) representing the display area at this time t33 are recorded in the recording unit 14.
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S104). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 14, and the information processing apparatus 10 ends this processing.
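  • One possible record layout for such labeled image data is sketched below; the field names and the output file are illustrative assumptions, not a format specified by the embodiment.

```python
# Illustrative sketch only: one possible record layout for "labeled image data"
# linking an important phrase to the target-area coordinates.
from dataclasses import dataclass, asdict
import json

@dataclass
class LabeledImageData:
    image_id: str                      # identifier of the observed image (e.g. a WSI)
    label: str                         # the important phrase, e.g. "polyp"
    top_left: tuple[float, float]      # target-area corner in absolute image coordinates
    bottom_right: tuple[float, float]
    importance: float = 1.0            # optional weight (see the variations described below)

record = LabeledImageData("slide_001", "polyp", (1200.0, 800.0), (1800.0, 1300.0))
with open("labeled_data.jsonl", "a") as f:          # hypothetical output file
    f.write(json.dumps(asdict(record)) + "\n")
```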
  • since the area setting unit 12 sets the target area based on the observation data, the burden of setting the target area on the image data is reduced and the learning data can be generated easily.
  • in addition, since the target area can be set by the user's voice, the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during the observation, so the observation efficiency can be improved.
  • since the generation unit 13 records the labeled image data in the recording unit 14, the learning data used in machine learning such as deep learning can be easily acquired. By performing machine learning such as deep learning using the acquired labeled image data, it is possible to generate a trained model that, when determination image data such as an image of a specimen is input, extracts a region estimated to be the target area associated with the important phrase and outputs the likelihood that this region is the target area. The machine learning using the labeled image data generated by the information processing device 10 may be performed by the information processing device 10 itself, or the trained model may be generated by a computer different from the information processing device 10. Likewise, the information processing apparatus 10 may use the generated trained model to extract the target area for a determination image and output the likelihood, or this extraction and likelihood output may be performed by a computer different from the information processing apparatus 10.
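  • As a hedged sketch of how the labeled image data might feed such machine learning, the following PyTorch-style example trains a toy region classifier and outputs a likelihood via softmax; the network, input size, and two-class setup are assumptions, not the embodiment's architecture.

```python
# Hedged sketch only: a toy classifier over region crops taken from the labeled
# image data, returning the likelihood that a crop is the target area.
import torch
import torch.nn as nn

model = nn.Sequential(                     # toy region classifier (assumed architecture)
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128), nn.ReLU(),
    nn.Linear(128, 2),                     # {not target area, target area}
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(region_crops: torch.Tensor, labels: torch.Tensor) -> float:
    """region_crops: (N, 3, 64, 64) crops of target/non-target areas; labels: (N,)."""
    optimizer.zero_grad()
    loss = criterion(model(region_crops), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def likelihood(region_crop: torch.Tensor) -> float:
    """Probability that the crop is the target area associated with the phrase."""
    with torch.no_grad():
        return torch.softmax(model(region_crop.unsqueeze(0)), dim=1)[0, 1].item()
```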
  • in the above description, the area setting unit 12 extracts the observation data included between the start time t11 and the end time t12 for the utterance content X1, but the present invention is not limited to this. For example, the observation data may be extracted so as to additionally include a predetermined time before the start time t11, or a predetermined time after the end time t12.
  • likewise, the area setting unit 12 has been described as selecting the largest value from the extracted observation magnifications and setting the display area corresponding to that observation magnification as the target area, but the present invention is not limited to this. The area setting unit 12 may select the largest value among the extracted observation magnifications and set only a part of the corresponding display area as the target area; for example, it may set, as the target area, a part including a lesion area extracted from the display area by image processing.
  • the area setting unit 12 may preferentially extract the observation data having a large observation magnification, and the generation unit 13 may give higher importance to the labeled image data as the observation magnification is larger.
  • the area setting unit 12 extracts observation data having an observation magnification larger than the threshold value.
  • the generation unit 13 assigns a higher importance to the labeled image data as the observation magnification is larger, and records the data in the recording unit 14.
  • in this way, images can be automatically labeled so that an image observed by the user at an increased observation magnification becomes more important, which reduces the burden on the user when labeling images.
  • the area setting unit 12 may also preferentially extract observation data in which the moving speed of the observation area is low, and the generation unit 13 may give a higher importance to the labeled image data as the moving speed of the observation area decreases.
  • the area setting unit 12 extracts observation data in which the moving speed of the observation area is smaller than the threshold value.
  • the generation unit 13 assigns a higher importance to the labeled image data as the moving speed of the observation region is smaller, and records the data in the recording unit 14.
  • in this way, images can be automatically labeled so that an image the user gazes at without moving the observation area much becomes more important, which reduces the burden on the user when labeling images.
  • the area setting unit 12 may also preferentially extract observation data in which the observation area stays in a predetermined area for a long time, and the generation unit 13 may give a higher importance to the labeled image data the longer the observation area stays in the predetermined area.
  • the area setting unit 12 extracts observation data in which the observation area stays in a predetermined area for a time longer than the threshold value.
  • the generation unit 13 assigns a higher importance to the labeled image data and records it in the recording unit 14 as the observation region stays in the predetermined region for a longer time.
  • in this way, images can be automatically labeled so that an image in which the user's observation area stays in a predetermined region for a long time, that is, an image the user gazes at, becomes more important, which reduces the burden on the user when labeling images.
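  • The three importance criteria above (observation magnification, movement speed of the observation area, and dwell time) could be combined as in the following sketch; the formulas and thresholds are assumptions, since the embodiment does not specify them.

```python
# Rough sketch with assumed heuristics and thresholds (the embodiment does not
# specify formulas): combine observation magnification, movement speed of the
# observation area, and dwell time into a single importance weight.
def importance_weight(magnification: float,
                      move_speed_px_per_s: float,
                      dwell_time_s: float,
                      mag_thresh: float = 10.0,
                      speed_thresh: float = 50.0,
                      dwell_thresh: float = 2.0) -> float:
    """Higher magnification, slower movement, and longer dwell -> higher importance."""
    weight = 1.0
    if magnification > mag_thresh:                     # observed while zoomed in
        weight += magnification / mag_thresh
    if move_speed_px_per_s < speed_thresh:             # observation area barely moving
        weight += min(speed_thresh / max(move_speed_px_per_s, 1.0), 5.0)
    if dwell_time_s > dwell_thresh:                    # gaze stayed in the region
        weight += dwell_time_s / dwell_thresh
    return weight
```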
  • in the first embodiment, the observation data and the voice data are each input from the outside, whereas in the second embodiment, the observation data and the voice data are generated by the information processing apparatus itself.
  • the processing executed by the information processing apparatus according to the second embodiment will be described.
  • the same components as those of the information processing system 1 according to the first embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • FIG. 7 is a block diagram showing a functional configuration of the information processing apparatus according to the second embodiment.
  • the information processing apparatus 1a shown in FIGS. 5 to 7 includes a phrase extraction unit 11, an area setting unit 12, a generation unit 13, a display unit 20, an observation information detection unit 30, a voice input unit 31, a control unit 32, a time measuring unit 33, a recording unit 34, and an operation unit 35.
  • the observation information detection unit 30 is configured by using an LED light source that emits near-infrared light and an optical sensor (for example, a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge Coupled Device) sensor) that captures the pupil point and the corneal reflection point.
  • the observation information detection unit 30 is provided on the side surface of the housing of the information processing apparatus 1a in which the user U1 can visually recognize the display unit 20 (see FIGS. 5 and 6). Under the control of the control unit 32, the observation information detection unit 30 generates line-of-sight data that detects the line-of-sight of the user U1 with respect to the image displayed by the display unit 20, and outputs this line-of-sight data to the control unit 32 as observation data.
  • more specifically, under the control of the control unit 32, the observation information detection unit 30 irradiates the cornea of the user U1 with near-infrared light from the LED light source or the like, and the optical sensor generates line-of-sight data by imaging the pupil point and the reflection point on the cornea of the user U1. Then, under the control of the control unit 32, the observation information detection unit 30 detects the line of sight of the user U1 from the pattern of the pupil point and the reflection point, based on the result of analyzing the data generated by the optical sensor by image processing or the like.
  • alternatively, the observation information detection unit 30 may simply use an optical sensor and generate line-of-sight data by detecting the pupil of the user U1 with well-known pattern matching.
  • the line-of-sight data may be generated by detecting the line-of-sight of the user U1 using other sensors or other well-known techniques.
  • the voice input unit 31 is configured by using a microphone to which voice is input and a voice codec that converts the voice received by the microphone into digital voice data, amplifies it, and outputs it to the control unit 32.
  • the voice input unit 31 generates voice data by receiving the voice input of the user U1 under the control of the control unit 32, and outputs the voice data to the control unit 32.
  • the voice input unit 31 may be provided with a speaker or the like capable of outputting voice, and may be provided with a voice output function.
  • the control unit 32 is configured by using a CPU, FPGA, GPU and the like, and controls the observation information detection unit 30, the voice input unit 31, and the display unit 20.
  • the control unit 32 includes an observation information detection control unit 321, a voice input control unit 322, and a display control unit 323.
  • the observation information detection control unit 321 controls the observation information detection unit 30. Specifically, the observation information detection control unit 321 causes the observation information detection unit 30 to irradiate the user U1 with near-infrared light at predetermined timings, to image the pupil of the user U1, and to generate line-of-sight data. Further, the observation information detection control unit 321 performs various image processing on the line-of-sight data input from the observation information detection unit 30 and outputs the result to the recording unit 34.
  • the voice input control unit 322 controls the voice input unit 31, performs various processes such as gain up and noise reduction processing on the voice data input from the voice input unit 31, and outputs the voice data to the recording unit 34.
  • the display control unit 323 controls the display mode of the display unit 20.
  • the display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded in the recording unit 34.
  • the time measurement unit 33 is configured by using a timer, a clock generator, or the like, and adds time information to the line-of-sight data generated by the observation information detection unit 30, the voice data generated by the voice input unit 31, and the like.
  • the recording unit 34 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like, and records various information related to the information processing apparatus 1a.
  • the recording unit 34 includes an observation data recording unit 341, an audio data recording unit 342, an image data recording unit 343, and a program recording unit 344.
  • the observation data recording unit 341 records the line-of-sight data input from the observation information detection control unit 321 as observation data, and outputs the observation data to the area setting unit 12.
  • the voice data recording unit 342 records the voice data input from the voice input control unit 322 and outputs the voice data to the phrase extraction unit 11.
  • the image data recording unit 343 records a plurality of image data.
  • the plurality of image data are data input from outside the information processing device 1a, or data captured by an external image pickup device and imported via a recording medium.
  • the program recording unit 344 records various programs executed by the information processing apparatus 1a, data used during execution of various programs (for example, dictionary information and text conversion dictionary information), and processing data during execution of various programs.
  • the operation unit 35 is configured by using a mouse, a keyboard, a touch panel, various switches, and the like, receives input of the operation of the user U1, and outputs the operation content that has received the input to the control unit 32.
  • FIG. 8 is a flowchart showing an outline of the processing executed by the information processing apparatus.
  • the display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded by the image data recording unit 343 (step S201).
  • the display control unit 323 causes the display unit 20 to display an image corresponding to the image data selected according to the operation of the operation unit 35.
  • the control unit 32 associates each of the observation data generated by the observation information detection unit 30 and the voice data generated by the voice input unit 31 with the time measured by the time measurement unit 33, and records them in the observation data recording unit 341 and the voice data recording unit 342, respectively (step S202).
  • the conversion unit 111 converts the voice data recorded by the voice data recording unit 342 into character information (text data) (step S203). In addition, this step may be performed after S206 described later.
  • when an instruction signal for ending the observation of the image displayed on the display unit 20 is input from the operation unit 35 (step S204: Yes), the information processing apparatus 1a proceeds to step S205 described later. On the other hand, when no such instruction signal is input from the operation unit 35 (step S204: No), the information processing apparatus 1a returns to step S202.
  • the phrase extraction unit 11 then extracts a keyword (predetermined phrase) from the voice data. Specifically, the extraction unit 112 extracts important phrases, as the predetermined phrases, from the text data into which the conversion unit 111 converted the voice data.
  • the extraction unit 112 includes an important word / phrase storage unit 113 for storing important words / phrases, and the important word / phrase storage unit 113 assigns attributes to important words / phrases and stores them.
  • the extraction unit 112 extracts one or more important words / phrases included in each attribute (word / phrase class) from the text data spoken as one phrase.
  • FIG. 9 is a diagram showing attributes for classifying important words and phrases.
  • as shown in FIG. 9, the phrase extraction unit 11 extracts an important phrase indicating the site or origin of a lesion or the like as phrase class 1 (first attribute), and an important phrase representing the state of the lesion or the like as phrase class 2 (second attribute).
  • the set of important words and phrases extracted by the word and phrase extraction unit 11 is recorded in the recording unit 34 in association with the time when these sets of important words and phrases are uttered.
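  • A minimal sketch of attribute-based extraction, assuming a hypothetical two-class dictionary (phrase class 1 = site/origin, phrase class 2 = state), is shown below; the vocabulary entries and function names are illustrative only.

```python
# Illustrative only: a hypothetical two-attribute dictionary and extraction of
# important phrases, grouped by attribute, from a single spoken phrase.
PHRASE_CLASSES = {
    "class1_site": {"stomach", "colon", "duodenum"},     # assumed example entries
    "class2_state": {"polyp", "ulcer", "inflammation"},
}

def extract_by_attribute(phrase_text: str) -> dict[str, list[str]]:
    """Return the important phrases found in the text, grouped by attribute."""
    words = phrase_text.lower().split()
    return {attr: [w for w in words if w in vocab]
            for attr, vocab in PHRASE_CLASSES.items()}

# e.g. a single phrase "there is a polyp in the stomach"
label_set = extract_by_attribute("there is a polyp in the stomach")
# -> {'class1_site': ['stomach'], 'class2_state': ['polyp']}
```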
  • in step S206, the area setting unit 12 extracts the observation data corresponding to the time when the set of important phrases was uttered, and sets the target area in the image data based on the extracted observation data.
  • FIG. 10 is a diagram showing how the area setting unit sets the target area.
  • as shown in FIG. 10, the area setting unit 12 extracts line-of-sight data as the observation data corresponding to the times (Speech 1 and Speech 2) at which the sets of important phrases were uttered, and, based on the extracted line-of-sight data, sets in the image data the gazing points B1 and B2 at which the user U1 gazes. Since human central vision covers about 1 to 2 degrees, if the central vision is assumed to be 1.5 degrees and the distance between the user U1 and the display unit 20 is 50 cm, the area the user U1 is watching can be regarded as a circle with a radius of about 1.3 cm. The area setting unit 12 therefore sets, as a target area in the image data, a circular area with a radius of 2 to 3 cm including the gazing points B1 and B2 so as to cover this area. Alternatively, the area setting unit 12 may set a rectangular area including the gazing points B1 and B2 as the target area in the image data. Then, the area setting unit 12 records the set target area in the recording unit 34.
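  • The geometric reasoning above can be sketched as follows, assuming the stated central-vision angle and viewing distance; the pixel density and margin factor are illustrative assumptions.

```python
# Sketch of the geometry above (assumed parameter names): convert a central-vision
# angle and viewing distance into a gaze-region radius, then build a circular
# target region around a gaze point.
import math

def gaze_radius_cm(central_vision_deg: float = 1.5, distance_cm: float = 50.0) -> float:
    """Radius on the screen subtended by the central-vision angle (about 1.3 cm here)."""
    return distance_cm * math.tan(math.radians(central_vision_deg))

def circular_target_region(gaze_x: float, gaze_y: float,
                           radius_cm: float, px_per_cm: float) -> dict:
    """Return a circle (in pixels) around the gaze point, with a margin so the
    region fully covers the watched area."""
    r_px = radius_cm * px_per_cm * 2.0
    return {"cx": gaze_x, "cy": gaze_y, "radius_px": r_px}

region = circular_target_region(960, 540, gaze_radius_cm(), px_per_cm=38.0)
```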
  • the generation unit 13 generates labeled image data in which the set of important words and phrases is associated with the target area (step S207). Further, the generation unit 13 outputs the labeled image data in which the set of important words and phrases is associated with the target area to the recording unit 34, and the information processing apparatus 1a ends the present processing.
  • since a region such as a lesion is labeled with important phrases having a plurality of attributes, the images can be classified in more detail. Further, since the area setting unit 12 sets the target area based on the line-of-sight data, the area that the user U1 is gazing at can be extracted more appropriately.
  • in the second embodiment, the important phrases have two attributes, but the present invention is not limited to this, and the important phrases may have three or more attributes.
  • the area setting unit 12 sets the gaze point that the user U1 gazes at in the image data based on the line-of-sight data, but the present invention is not limited to this.
  • the area setting unit 12 may set a gazing point in the image data based on the position information of a pointing device such as a mouse. Then, the area setting unit 12 sets the area including the gazing point as the target area in the image data.
  • the third embodiment of the present disclosure will be described.
  • in the second embodiment, the information processing device 1a is a standalone apparatus, whereas in the third embodiment, the information processing device is incorporated as a part of a microscope system.
  • the processing performed by the microscope system according to the third embodiment will be described.
  • the same components as those of the information processing apparatus 1a according to the second embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment.
  • the microscope system 100 includes an information processing device 1b, a display unit 20, a voice input unit 31, an operation unit 35, and a microscope 200.
  • the microscope 200 has a substantially C-shaped housing portion 201, a stage 202 attached to the housing portion 201 so as to be movable in three dimensions, a plurality of objective lenses 203 having different observation magnifications, an eyepiece unit 206 for observing the observation image of the specimen through the objective lens 203, an operation unit 207 for moving the stage 202 in three dimensions in accordance with the user's operation, a position detection unit 208 for detecting the position of the stage 202 from a reference position, and a magnification detection unit 209, configured by using an encoder or the like, for detecting magnification information indicating the observation magnification at which the microscope 200 observes the specimen.
  • the information processing device 1b includes a control unit 32b and a recording unit 34b in place of the control unit 32 and the recording unit 34 of the information processing device 1a according to the second embodiment described above.
  • the control unit 32b is configured by using a CPU, FPGA, GPU and the like, and controls the display unit 20, the voice input unit 31, and the microscope 200.
  • the control unit 32b further includes a shooting control unit 324 and a magnification calculation unit 325 in addition to the observation information detection control unit 321, the voice input control unit 322, and the display control unit 323 of the control unit 32 of the second embodiment described above.
  • the observation information detection control unit 321 calculates the current position information of the stage 202 based on the position of the stage 202 from the reference position detected by the position detection unit 208, and outputs the calculation result to the recording unit 34b.
  • the shooting control unit 324 controls the operation of the imaging unit 205.
  • the shooting control unit 324 generates image data (a moving image) by causing the image pickup unit 205 to capture images sequentially at a predetermined frame rate.
  • the shooting control unit 324 performs image processing (for example, development processing) on the image data input from the image pickup unit 205 and outputs the image data to the recording unit 34b.
  • the magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the detection result input from the magnification detection unit 209, and outputs this calculation result to the recording unit 34b. For example, the magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the magnification of the objective lens 203 and the magnification of the eyepiece unit 206 input from the magnification detection unit 209.
  • the recording unit 34b is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
  • the recording unit 34b includes an image data recording unit 345 instead of the image data recording unit 343 according to the second embodiment described above.
  • the image data recording unit 345 records the image data input from the shooting control unit 324, and outputs this image data to the generation unit 13.
  • FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment.
  • first, the control unit 32b generates observation data including the position information of the stage 202 calculated by the observation information detection control unit 321 and the observation magnification calculated by the magnification calculation unit 325, and records this observation data and the voice data generated by the voice input unit 31 in the observation data recording unit 341 and the voice data recording unit 342, respectively, in association with the time measured by the time measurement unit 33 (step S301).
  • the microscope system 100 shifts to step S302 described later.
  • Steps S302 to S304 correspond to each of steps S203 to S205 in FIG. 8 described above. After step S304, the microscope system 100 proceeds to step S305.
  • in step S305, the area setting unit 12 extracts the observation data corresponding to the time when the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 extracts the position information of the stage 202 and the observation magnification as the observation data corresponding to the time when the important phrase was uttered, and generates visual field data recording the user's observation field of view at that time. In other words, the visual field data is the image displayed on the display unit 20 at the time when the important phrase was uttered. Then, the area setting unit 12 sets the area represented by the visual field data as the target area in the image data.
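  • A sketch of deriving the observation field of view from the stage position and observation magnification is shown below; the sensor dimensions and coordinate conventions are assumed for illustration and are not taken from the embodiment.

```python
# Sketch under assumptions: derive the observed field of view, in specimen
# (whole-image) coordinates, from the stage position and observation magnification.
from dataclasses import dataclass

@dataclass
class FieldOfView:
    center_x_um: float
    center_y_um: float
    width_um: float
    height_um: float

def field_of_view(stage_x_um: float, stage_y_um: float,
                  observation_magnification: float,
                  sensor_width_um: float = 13000.0,
                  sensor_height_um: float = 9700.0) -> FieldOfView:
    """The higher the magnification, the smaller the region of the specimen imaged."""
    return FieldOfView(
        center_x_um=stage_x_um,
        center_y_um=stage_y_um,
        width_um=sensor_width_um / observation_magnification,
        height_um=sensor_height_um / observation_magnification,
    )

fov = field_of_view(stage_x_um=5200.0, stage_y_um=3100.0, observation_magnification=20.0)
```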
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S306). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34b, and the microscope system 100 ends this process.
  • according to the third embodiment, the user can set a target area such as a lesion for the image data recorded while observing the specimen under the microscope using the microscope system 100, and generate labeled image data.
  • in addition, since the target area can be set by the user's voice, the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during the observation, so the observation efficiency can be improved.
  • the area setting unit 12 extracts the target area using the position information of the stage 202 and the observation data including the observation magnification, but the present invention is not limited to this.
  • for example, the area setting unit 12 may set the target area in the image data based on a feature amount representing the correlation between consecutive frames and on the movement vector between consecutive frames. Specifically, when the area setting unit 12 determines that the correlation between consecutive frames is large (the degree of similarity is high), it may determine that the user is gazing at this frame and set this frame as the target area. Similarly, when it determines that the movement vector between consecutive frames is small (the amount of movement between images is small), it may determine that the user is gazing at this frame and set this frame as the target area.
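  • The frame-correlation and movement-vector criteria could be approximated as in the following numpy-only sketch; the similarity measure, the phase-correlation shift estimate, and the thresholds are assumptions rather than the embodiment's actual processing.

```python
# Minimal numpy-only sketch (thresholds are assumptions): decide whether the user
# is "gazing" at a frame from its similarity to the previous grayscale frame and
# from a global shift estimated by phase correlation.
import numpy as np

def frame_correlation(prev: np.ndarray, curr: np.ndarray) -> float:
    """Normalized correlation between two grayscale frames (1.0 = identical)."""
    a = (prev - prev.mean()) / (prev.std() + 1e-8)
    b = (curr - curr.mean()) / (curr.std() + 1e-8)
    return float((a * b).mean())

def global_shift(prev: np.ndarray, curr: np.ndarray) -> float:
    """Magnitude of the dominant translation between frames via phase correlation."""
    f = np.fft.fft2(prev) * np.conj(np.fft.fft2(curr))
    corr = np.abs(np.fft.ifft2(f / (np.abs(f) + 1e-8)))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    dy = dy - h if dy > h // 2 else dy   # wrap-around to signed shifts
    dx = dx - w if dx > w // 2 else dx
    return float(np.hypot(dx, dy))

def is_gazed_frame(prev, curr, corr_thresh=0.9, shift_thresh=5.0) -> bool:
    """High similarity or small movement -> treat the frame as gazed at."""
    return frame_correlation(prev, curr) > corr_thresh or global_shift(prev, curr) < shift_thresh
```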
  • the microscope 200 may have a line-of-sight detection unit.
  • the line-of-sight detection unit is provided inside or outside the eyepiece unit 206, generates line-of-sight data by detecting the line of sight of the user, and outputs the line-of-sight data to the information processing device 1b.
  • specifically, the line-of-sight detection unit is configured by using an LED light source that is provided inside the eyepiece unit 206 and emits near-infrared light, and an optical sensor (for example, CMOS or CCD) that is provided inside the eyepiece unit 206 and captures the pupil point and the corneal reflection point.
  • under the control of the information processing device 1b, the line-of-sight detection unit irradiates the user's cornea with near-infrared light from the LED light source or the like, and the optical sensor images the pupil point and the reflection point on the user's cornea. Then, based on the result of analyzing the data generated by the optical sensor by image processing or the like, the line-of-sight detection unit generates line-of-sight data by detecting the user's line of sight from the pattern of the pupil point and the reflection point, and outputs this line-of-sight data to the information processing device 1b. As in the second embodiment, the area setting unit 12 may set a gazing point in the image observed by the user using the line-of-sight data, and set a target area representing a lesion or the like so as to include the gazing point.
  • the fourth embodiment of the present disclosure will be described.
  • the information processing device is incorporated into a part of the endoscope system.
  • the processing executed by the endoscope system according to the fourth embodiment will be described.
  • the same components as those of the information processing apparatus 1a according to the second embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment.
  • FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment.
  • the endoscope system 300 shown in FIGS. 13 and 14 includes a display unit 20, an endoscope 400, a wearable device 500, an input unit 600, and an information processing device 1c.
  • the endoscope 400 generates image data when the user U3, such as a doctor or an operator, inserts it into the subject U4 and images the inside of the subject U4, and outputs this image data to the information processing device 1c.
  • the endoscope 400 includes an image pickup unit 401 and an operation unit 402.
  • the imaging unit 401 is provided at the tip of the insertion unit of the endoscope 400. Under the control of the information processing device 1c, the image pickup unit 401 generates image data by imaging the inside of the subject U4 and outputs the image data to the information processing device 1c.
  • the image pickup unit 401 is configured by using an optical system capable of changing the observation magnification, an image sensor such as CMOS or CCD that generates image data by receiving a subject image formed by the optical system, and the like.
  • the operation unit 402 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
  • the wearable device 500 is attached to the user U3, detects the line of sight of the user U3, and accepts the voice input of the user U3.
  • the wearable device 500 includes a line-of-sight detection unit 510 and a voice input unit 520.
  • the line-of-sight detection unit 510 is provided in the wearable device 500, generates line-of-sight data by detecting the gaze degree of the line of sight of the user U3, and outputs the line-of-sight data to the information processing device 1c. Since the line-of-sight detection unit 510 has the same configuration as the line-of-sight detection unit according to the third embodiment described above, a detailed configuration will be omitted.
  • the voice input unit 520 is provided in the wearable device 500, generates voice data by receiving the voice input of the user U3, and outputs the voice data to the information processing device 1c.
  • the voice input unit 520 is configured by using a microphone or the like.
  • the configuration of the input unit 600 will be described.
  • the input unit 600 is configured by using a mouse, a keyboard, a touch panel, and various switches.
  • the input unit 600 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
  • the information processing device 1c includes a control unit 32c and a recording unit 34c in place of the control unit 32b and the recording unit 34b of the information processing device 1b according to the third embodiment described above.
  • the control unit 32c is configured by using a CPU, FPGA, GPU and the like, and controls the endoscope 400, the wearable device 500, and the display unit 20.
  • the control unit 32c includes an operation history detection unit 326 in addition to a line-of-sight data detection control unit 321c, a voice input control unit 322, a display control unit 323, and a shooting control unit 324.
  • the operation history detection unit 326 detects the content of the operation received by the operation unit 402 of the endoscope 400, and outputs the detection result to the recording unit 34c. Specifically, when the magnifying switch is operated from the operation unit 402 of the endoscope 400, the operation history detection unit 326 detects the operation content and outputs the detection result to the recording unit 34c.
  • the operation history detection unit 326 may detect the operation content of the treatment tool inserted into the subject U4 via the endoscope 400 and output the detection result to the recording unit 34c.
  • the recording unit 34c is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
  • the recording unit 34c further includes an operation history recording unit 346 in addition to the configuration of the recording unit 34b according to the third embodiment described above.
  • the operation history recording unit 346 records the operation history for the operation unit 402 of the endoscope 400 input from the operation history detection unit 326.
  • FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
  • first, the control unit 32c records the image data generated by the imaging unit 401, the line-of-sight data generated by the line-of-sight detection unit 510, the voice data generated by the voice input unit 520, and the operation history detected by the operation history detection unit 326 in the image data recording unit 345, the line-of-sight data recording unit 341c, the voice data recording unit 342, and the operation history recording unit 346, respectively, in association with the time measured by the time measurement unit 33 (step S401).
  • the endoscope system 300 shifts to step S402 described later.
  • Steps S402 to S404 correspond to each of steps S302 to S304 in FIG. 12 described above. After step S404, the endoscope system 300 proceeds to step S405.
  • in step S405, the area setting unit 12 extracts the observation data corresponding to the time when the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 calculates, as the observation data corresponding to the time when the important phrase was uttered, a feature amount representing the correlation between consecutive frames in the image data recorded in the image data recording unit 345. Then, when the area setting unit 12 determines that the correlation between consecutive frames is large (the degree of similarity is high), it determines that the user U3 is gazing at this frame and sets this frame as the target area. Alternatively, the area setting unit 12 may calculate the movement vector between consecutive frames in the image data recorded in the image data recording unit 345; in this case, when it determines that the movement vector is small (the amount of movement between images is small), it determines that the user U3 is gazing at this frame and sets this frame as the target area.
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S406). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34c, and the endoscope system 300 ends this process.
  • according to the fourth embodiment described above, the user U3 can set a target area such as a lesion in the image data recorded while endoscopically observing the inside of the subject U4 using the endoscope system 300, and labeled image data can be generated.
  • since the target area can be set by the voice of the user U3, the observation is not interrupted by the user U3 looking at a keyboard, a pointing device, or the like during the observation, and therefore the observation efficiency can be improved.
  • the area setting unit 12 extracts the target area based on the feature amount representing the correlation between consecutive frames or the motion vector between consecutive frames; however, the area setting unit 12 is not limited to this.
  • the area setting unit 12 may set the target area in the image data by using the line-of-sight data generated by the wearable device 500. Specifically, as in the second embodiment, the area setting unit 12 may set a gaze point in the image observed by the user U3 using the line-of-sight data, and may set a target area representing a lesion or the like so as to include that gaze point.
  • although an endoscope system is used here, the same processing can also be applied to, for example, a capsule endoscope, a video microscope that images a subject, a mobile phone having an imaging function, or a tablet terminal having an imaging function.
  • although the endoscope system is provided with a flexible endoscope, the processing can also be applied to an endoscope system provided with a rigid endoscope or to an endoscope system provided with an industrial endoscope.
  • although the endoscope system includes an endoscope inserted into the subject, the processing can also be applied to a sinus endoscope, or to an endoscope system using an electric knife, an examination probe, or the like.
  • the above-mentioned "part” can be read as “means” or "circuit".
  • the control unit can be read as a control means or a control circuit.
  • the programs to be executed by the information processing apparatus are provided as file data in an installable format or an executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disc), a USB medium, or a flash memory.
  • the program to be executed by the information processing apparatus according to the first to fourth embodiments may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program to be executed by the information processing apparatus according to the first to fourth embodiments may be provided or distributed via a network such as the Internet.
  • signals are transmitted from the various devices via transmission cables, but the connections do not have to be wired and may instead be wireless.
  • signals may be transmitted from each device in accordance with a predetermined wireless communication standard (for example, Wi-Fi (registered trademark) or Bluetooth (registered trademark)).
  • wireless communication may be performed according to other wireless communication standards.

Abstract

An information processing device comprises: a phrase extraction unit that extracts a prescribed phrase from speech data in which speech uttered by a user while observing image data is recorded; an area setting unit that extracts, from observation data recording the observation mode for the image data, the observation data corresponding to the time at which the prescribed phrase was uttered, and sets a target area in the image data on the basis of the extracted observation data; and a generation unit that generates labeled image data associating the prescribed phrase with the target area. An information processing device is thereby provided in which the burden of setting a target area in image data is reduced and learning data can be generated easily.

Description

Information processing device, learning device, and trained model
 The present invention relates to an information processing device, a learning device, and a trained model.
 Conventionally, pathological diagnosis is performed in which a pathologist observes a specimen collected from a patient and makes a diagnosis. In pathological diagnosis, the pathologist may create a diagnostic report by adding a diagnosis result (information such as the presence or absence, type, and position of a lesion) to an image of the specimen.
 There is also a known technique for supporting the pathologist's diagnosis by generating, through machine learning using image data to which diagnosis results have been added, a trained model that detects lesions from images of specimens. Patent Document 1 discloses a learning data generation support device that uses, as correct answer data for machine learning, image data in which the lesion features of a diagnostic report match the lesion features detected by image processing.
Japanese Unexamined Patent Publication No. 2019-008349
 However, when learning data is generated using the technique of Patent Document 1, a pathologist needs to create a diagnostic report. Specifically, in order to associate a lesion or the like with a particular image and position, the pathologist has to perform input operations for the target area, such as a lesion, using a keyboard or a pointing device while observing the image of the specimen, which places a burden on the pathologist.
 The present invention has been made in view of the above, and an object thereof is to provide an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area in image data and can easily generate learning data.
 In order to solve the above-described problems and achieve the object, an information processing device according to one aspect of the present invention includes: a phrase extraction unit that extracts a predetermined phrase from voice data in which voice uttered by a user while observing image data is recorded; an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time at which the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data; and a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
 Further, in the information processing device according to one aspect of the present invention, the phrase extraction unit includes a conversion unit that converts the voice data into text data, and an extraction unit that extracts, from the text data, important phrases recorded in a recording unit as the predetermined phrases.
 Further, in the information processing device according to one aspect of the present invention, the extraction unit includes an important phrase storage unit that stores the important phrases, the important phrase storage unit stores the important phrases with predetermined attributes assigned to them, and the extraction unit extracts, from the text data, one or more important phrases included in each attribute as the predetermined phrases.
 Further, in the information processing device according to one aspect of the present invention, the attributes of the important phrases include a first attribute representing the site or origin of a lesion and a second attribute representing the state of the lesion.
 Further, in the information processing device according to one aspect of the present invention, the text data is text data uttered as one clause, the extraction unit extracts one or more important phrases from the text data, and the label corresponding to the target area is represented by a set of important phrases belonging to one or more attributes.
 Further, in the information processing device according to one aspect of the present invention, the area setting unit preferentially extracts the observation data having a large observation magnification, and the generation unit assigns a higher importance to the labeled image data as the observation magnification becomes larger.
 Further, in the information processing device according to one aspect of the present invention, the area setting unit preferentially extracts the observation data in which the moving speed of the observation area is small, and the generation unit assigns a higher importance to the labeled image data as the moving speed of the observation area becomes smaller.
 Further, in the information processing device according to one aspect of the present invention, the area setting unit preferentially extracts the observation data in which the observation area stays within a predetermined area for a long time, and the generation unit assigns a higher importance to the labeled image data as the time during which the observation area stays within the predetermined area becomes longer.
 Further, in the information processing device according to one aspect of the present invention, the observation data includes line-of-sight data obtained by detecting the line of sight of the user.
 Further, in the information processing device according to one aspect of the present invention, the observation data includes visual field data in which the observation visual field of the user is recorded.
 Further, in the information processing device according to one aspect of the present invention, the observation data includes position information of a pointing device.
 Further, an information processing device according to one aspect of the present invention includes: a phrase extraction unit that extracts a predetermined phrase from voice data in which voice uttered by a user while observing frame images is recorded; an area setting unit that extracts the frame image corresponding to the time at which the predetermined phrase was uttered and sets the extracted frame image as the target area; and a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
 Further, a learning device according to one aspect of the present invention includes: a phrase extraction unit that extracts a predetermined phrase from voice data in which voice uttered by a user while observing image data is recorded; an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time at which the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data; a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area; and a model generation unit that performs machine learning using the labeled image data and generates a trained model that detects an area corresponding to the predetermined phrase from a group of images.
 Further, a trained model according to one aspect of the present invention is generated by machine learning using labeled image data that is obtained by extracting a predetermined phrase from voice data in which voice uttered by a user while observing training image data is recorded, extracting, from observation data in which an observation mode for the training image data is recorded, the observation data corresponding to the time at which the predetermined phrase was uttered, setting a target area in the training image data based on the extracted observation data, and associating the predetermined phrase with the target area; when image data for determination is input, the trained model causes a computer to function so as to extract an area estimated to be the target area associated with the predetermined phrase and to output the likelihood that the area is the target area.
 According to the present invention, it is possible to realize an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area in image data and can easily generate learning data.
[Correction under Rule 91 17.09.2020]
FIG. 1 is a block diagram showing a functional configuration of the information processing system according to the first embodiment. FIG. 2 is a flowchart illustrating a process executed by the information processing apparatus according to the first embodiment. FIG. 3 is a diagram showing voice data. FIG. 4 is a diagram showing observation data. FIG. 5 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment. FIG. 6 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment. FIG. 7 is a block diagram showing a functional configuration of the information processing apparatus according to the second embodiment. FIG. 8 is a flowchart showing an outline of the processing executed by the information processing apparatus. FIG. 9 is a diagram showing words and phrases extracted by the word and phrase extraction unit. FIG. 10 is a diagram showing how the area setting unit sets the target area. FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment. FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment. FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment. FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment. FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
 Hereinafter, the mode for implementing the present disclosure will be described in detail together with the drawings. The present disclosure is not limited to the following embodiments. In addition, each figure referred to in the following description merely schematically shows the shape, size, and positional relationship to the extent that the contents of the present disclosure can be understood. That is, the present disclosure is not limited to the shapes, sizes, and positional relationships exemplified in each figure.
(Embodiment 1)
 [Configuration of the information processing system]
 FIG. 1 is a block diagram showing the functional configuration of the information processing system according to the first embodiment. The information processing system 1 shown in FIG. 1 includes an information processing device 10 that performs various kinds of processing on line-of-sight data, voice data, and image data input from the outside, and a display unit 20 that displays various data output from the information processing device 10. The information processing device 10 and the display unit 20 are connected to each other bidirectionally, either wirelessly or by wire.
 [Configuration of the information processing device]
 First, the configuration of the information processing device 10 will be described.
 The information processing device 10 shown in FIG. 1 is realized, for example, by using a program installed on a server, a personal computer, or the like, and various data are input to it via a network, or various data acquired by external devices are input to it. As shown in FIG. 1, the information processing device 10 includes a phrase extraction unit 11, an area setting unit 12, a generation unit 13, a recording unit 14, and a display control unit 15.
 The phrase extraction unit 11 extracts a predetermined phrase from the user's voice data input from the outside, in which the voice uttered by the user while observing the image data is recorded. The phrase extraction unit 11 includes a conversion unit 111 that converts the voice data into text data, and an extraction unit 112 that extracts important phrases recorded in the recording unit 14 from the text data as the predetermined phrases.
 The conversion unit 111 converts the voice data into character information (text data) by performing a well-known text conversion process on the voice data, and outputs this character information to the extraction unit 112. Alternatively, the text conversion of the voice need not be performed at this point; in that case, the target area may be set in the image data while the data is still voice information, and the voice information may be converted into character information afterwards.
 The extraction unit 112 extracts, from the text data converted by the conversion unit 111, important phrases recorded in advance in the recording unit 14, and records in the recording unit 14 the time at which each important phrase was uttered in association with that important phrase. Specifically, the phrase extraction unit 11 records, for each frame of the voice data, the important phrases uttered in that frame in association with the frame in the recording unit 14. The user's voice data input from the outside is generated by a voice input unit such as a microphone.
 The phrase extraction unit 11 is configured by using a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a GPU (Graphics Processing Unit), and the like. The phrase extraction unit 11 may also handle the voice data as it is without converting the voice data into text data. In this case, the phrase extraction unit 11 does not have the conversion unit 111, and the extraction unit 112 extracts the predetermined phrase directly from the voice data.
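 The conversion and extraction steps can be pictured with the Python sketch below. It assumes that the speech recognizer returns word-level timestamps (a common feature of speech-to-text engines); the data shapes and the list of important phrases are illustrative assumptions, not part of the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Important phrases registered in advance in the recording unit (illustrative values).
IMPORTANT_PHRASES = {"adenocarcinoma", "polyp", "ulcer", "stomach", "colon"}


@dataclass
class Word:
    text: str
    start: float  # utterance start time [s]
    end: float    # utterance end time [s]


def extract_important_phrases(transcript: List[Word]) -> List[Tuple[str, float, float]]:
    """Return (phrase, start_time, end_time) for every registered important phrase
    found in the transcribed voice data."""
    hits = []
    for word in transcript:
        if word.text.lower() in IMPORTANT_PHRASES:
            hits.append((word.text, word.start, word.end))
    return hits


# Usage: a transcript produced by a speech-to-text engine with word timings.
transcript = [Word("there", 12.0, 12.2), Word("is", 12.2, 12.3),
              Word("a", 12.3, 12.35), Word("polyp", 12.35, 12.9)]
print(extract_important_phrases(transcript))  # [('polyp', 12.35, 12.9)]
```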
 The area setting unit 12 extracts, from the observation data in which the observation mode for the image data is recorded, the observation data corresponding to the time at which an important phrase was uttered. The area setting unit 12 may extract the observation data included in a predetermined time or a predetermined number of frames including the time during which the important phrase is uttered (from the start to the end of the utterance). Further, the area setting unit 12 sets a target area in the image data based on the extracted observation data, and records the position information of the target area in the recording unit 14. When a plurality of pieces of observation data are extracted, the area setting unit 12 may set the target area in the image data corresponding to one selected piece of observation data, or may set target areas in a plurality of image data corresponding to a plurality of selected pieces of observation data. The area setting unit 12 is configured by using a CPU, an FPGA, a GPU, and the like.
 When the usage pattern is WSI (Whole Slide Imaging), the user observes a part of a microscope slide sample as a visual field, and the magnification and position of the observation visual field change with time. In this case, the magnification at which the entire image data is displayed and which part of it is presented as the visual field, that is, the magnification and the absolute coordinates of the display area with respect to the entire image data, are recorded in the recording unit 14 as observation data in synchronization with the voice data. The area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time at which the important phrase was uttered, selects one or more pieces of observation data according to, for example, the magnification of the extracted observation data, sets the display area corresponding to the selected observation data as the target area, and records the position information of the target area in the recording unit 14. However, the area setting unit 12 may set only a part of the display area, determined by predetermined image processing or the like, as the target area.
 When an optical microscope or an endoscope system is used, the visual field displayed on the display unit 20 matches the visual field of the image data, so the relative positional relationship of the observation visual field with respect to the absolute coordinates of the image does not change. When an optical microscope is used and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, the magnification of the objective lens and the temporal change of the stage position are recorded in the recording unit 14 as observation data in synchronization with the voice data. The area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time at which the important phrase was uttered, selects one or more pieces of observation data according to, for example, the magnification of the extracted observation data, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14. When an endoscope system is used and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, a feature amount representing the correlation between consecutive frames or a motion vector between consecutive frames is recorded in the recording unit 14 as observation data in synchronization with the voice data. The area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time at which the important phrase was uttered, selects one or more pieces of observation data according to, for example, the motion vector of the extracted observation data, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14.
 The generation unit 13 generates labeled image data in which the important phrases are associated with the target area, and outputs the generated labeled image data to the recording unit 14 and the display control unit 15. Specifically, the generation unit 13 generates labeled image data in which the important phrases extracted by the phrase extraction unit 11 and the target area set by the area setting unit 12 are associated with each other. The generation unit 13 is configured by using a CPU, an FPGA, a GPU, and the like.
 The recording unit 14 synchronously records the voice data input from the phrase extraction unit 11, the observation data input from the area setting unit 12, and the image data input from the generation unit 13. Further, the recording unit 14 records the important phrases input in advance and the labeled image data, input from the generation unit 13, in which the important phrases are associated with the target areas. The recording unit 14 is also used for temporarily recording the important phrases extracted by the phrase extraction unit 11, the observation data extracted by the area setting unit 12, the target areas set by the area setting unit 12, and the like. Further, the recording unit 14 records various programs executed by the information processing device 10 and data being processed. The recording unit 14 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
 The display control unit 15 superimposes various kinds of information on the image corresponding to the image data input from the outside and outputs the result to the external display unit 20 for display. The display control unit 15 is configured by using a CPU, an FPGA, a GPU, and the like. Note that the phrase extraction unit 11, the area setting unit 12, the generation unit 13, and the display control unit 15 may each be configured using any one of a CPU, an FPGA, and a GPU so that their functions can be exhibited, or, of course, a CPU, an FPGA, and a GPU may be combined so that their functions can be exhibited.
 [Configuration of the display unit]
 Next, the configuration of the display unit 20 will be described.
 The display unit 20 displays an image corresponding to the image data input from the display control unit 15. The display unit 20 is configured by using a display monitor such as an organic EL (Electro Luminescence) display or a liquid crystal display.
 [Processing of the information processing device]
 Next, the processing of the information processing device 10 will be described. FIG. 2 is a flowchart illustrating the processing executed by the information processing device 10.
 As shown in FIG. 2, first, the information processing device 10 acquires image data, observation data, and voice data input from the outside (step S101). These data are synchronized and temporally associated with each other. The image data is, for example, an entire image in WSI, the observation data is information representing the display area and the observation magnification within the entire image, and the voice data is data in which the voice uttered by the user during observation is recorded.
 Subsequently, the phrase extraction unit 11 extracts keywords (predetermined phrases) from the voice data (step S102). Specifically, first, the conversion unit 111 converts the voice data into text data. Then, the extraction unit 112 extracts the important phrases recorded in the recording unit 14 from the text data as the predetermined phrases.
 FIG. 3 is a diagram showing the voice data. As shown in FIG. 3, the phrase extraction unit 11 extracts, from the voice data converted into text data, utterance contents X1, X2, and X3 that match important phrases, and records them in the recording unit 14 in association with the start times t11, t21, and t31 and the end times t12, t22, and t32 of the respective utterances.
 Returning to FIG. 2, the description from step S103 onward will be continued.
 In step S103, the area setting unit 12 extracts the observation data corresponding to the time at which an important phrase was uttered, and sets the target area in the image data based on the extracted observation data (step S103). In the recording unit 14, the time and the coordinate information of two predetermined points in the image displayed on the display unit 20 at that time are recorded in association with each other as observation data. The two predetermined points are not particularly limited; they are, for example, the upper left corner and the lower right corner of the rectangular image displayed on the display unit 20, and represent the display area shown on the display unit 20. When a part of the entire image is the display area, as in WSI, the observation magnification can be calculated from these two points. For the utterance content X1 shown in FIG. 3, the area setting unit 12 then extracts the observation magnifications calculated from the coordinate information as the observation data included between the start time t11 and the end time t12. Further, it selects the largest value among the extracted observation magnifications and sets the display area (the image displayed on the display unit 20) corresponding to this observation magnification as the target area. The area setting unit 12 then records the set target area in the recording unit 14.
 FIG. 4 is a diagram showing the observation data. As shown in FIG. 4, the target area set by the area setting unit 12 is recorded in the recording unit 14 as a time and the coordinate information of the two predetermined points representing the display area. For the utterance content X1, when the time at which the observation magnification is largest between the start time t11 and the end time t12 is time t13, the time t13 and the coordinate information representing the display area at time t13 are recorded in the recording unit 14. The coordinate information consists of the coordinates A11, as coordinate information 1, corresponding to the upper left corner of the image displayed on the display unit 20 at time t13, and the coordinates A12, as coordinate information 2, corresponding to the lower right corner of this image. As a result, the area that the user observed at the highest magnification while uttering the utterance content X1 is set as the target area.
 Similarly, for the utterance content X2, when the time at which the observation magnification is largest between the start time t21 and the end time t22 is time t23, the time t23 and the coordinate information (A21, A22) representing the display area at time t23 are recorded in the recording unit 14. Likewise, for the utterance content X3, when the time at which the observation magnification is largest between the start time t31 and the end time t32 is time t33, the time t33 and the coordinate information (A31, A32) representing the display area at time t33 are recorded in the recording unit 14.
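 The selection illustrated in FIGS. 3 and 4 can be sketched as follows in Python. The observation records are assumed to hold the two corner coordinates of the displayed area in slide coordinates, so that the magnification can be derived from the viewport width; the record and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ObservationRecord:
    time: float                        # recording time [s]
    top_left: Tuple[float, float]      # coordinate information 1 (slide coordinates)
    bottom_right: Tuple[float, float]  # coordinate information 2 (slide coordinates)


def magnification(rec: ObservationRecord, slide_width: float) -> float:
    # The narrower the displayed region relative to the whole slide,
    # the higher the effective observation magnification.
    view_width = abs(rec.bottom_right[0] - rec.top_left[0])
    return slide_width / max(view_width, 1e-9)


def target_region_for_utterance(records: List[ObservationRecord],
                                t_start: float, t_end: float,
                                slide_width: float) -> Optional[ObservationRecord]:
    """Pick, within [t_start, t_end], the observation record with the highest
    magnification; its displayed area becomes the target area."""
    in_window = [r for r in records if t_start <= r.time <= t_end]
    if not in_window:
        return None
    return max(in_window, key=lambda r: magnification(r, slide_width))
```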
 Subsequently, the generation unit 13 generates labeled image data in which the important phrases are associated with the target areas (step S104). Further, the generation unit 13 outputs the labeled image data in which the important phrases are associated with the target areas to the recording unit 14, and the information processing device 10 ends this processing.
 According to the first embodiment described above, since the area setting unit 12 sets the target area based on the observation data, the burden of setting the target area in the image data can be reduced, and learning data can be generated easily. In the information processing system 1, since the target area can be set by the user's voice, the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during the observation, so the observation efficiency can be improved.
 Further, according to the first embodiment, since the generation unit 13 records the labeled image data in the recording unit 14, learning data used in machine learning such as deep learning can be easily acquired. Then, by performing machine learning such as deep learning using the acquired labeled image data for learning, it is possible to generate a trained model that, when image data for determination obtained by imaging a specimen or the like is input, extracts an area estimated to be the target area associated with an important phrase and outputs the likelihood that this area is the target area. Machine learning may be performed by the information processing device 10 using the labeled image data generated by the information processing device 10, or machine learning may be performed by a computer different from the information processing device 10 to generate the trained model. Further, using the generated trained model, the information processing device 10 may extract the target area from the image for determination and output the likelihood, or a computer different from the information processing device 10 may extract the target area from the image for determination and output the likelihood.
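 As one possible (not prescribed) realization of this training step, the sketch below uses PyTorch to fit a small classifier that outputs a likelihood per important phrase for a cropped target area. The network shape, the phrase-to-index mapping, and the dataset wrapper are assumptions for illustration only.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class LabeledRegionDataset(Dataset):
    """Wraps (cropped target-area image, phrase index) pairs produced by the generation unit."""
    def __init__(self, crops, phrase_indices):
        self.crops = crops                  # list of 3xHxW float tensors (same H, W)
        self.phrase_indices = phrase_indices

    def __len__(self):
        return len(self.crops)

    def __getitem__(self, i):
        return self.crops[i], self.phrase_indices[i]


class RegionClassifier(nn.Module):
    """Tiny CNN that scores how likely a crop belongs to each important phrase."""
    def __init__(self, num_phrases: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, num_phrases)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))  # logits per phrase


def train(dataset: Dataset, num_phrases: int, epochs: int = 5):
    model = RegionClassifier(num_phrases)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for crops, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(crops), labels)
            loss.backward()
            optimizer.step()
    return model  # a softmax over the logits gives the likelihood per phrase
```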
 In the first embodiment described above, the example in which the area setting unit 12 extracts the observation data included between the start time t11 and the end time t12 for the utterance content X1 has been described, but the present invention is not limited to this. For example, since there may be a time delay between the user discovering the lesion or the like corresponding to the utterance content X1 and actually uttering the utterance content X1, the observation data may be extracted including a time a predetermined length before the start time t11. Similarly, since the user may observe the lesion or the like corresponding to the utterance content X1 in detail after uttering the utterance content X1, the observation data may be extracted including a time a predetermined length after the end time t12.
 In the first embodiment described above, the example in which the area setting unit 12 selects the largest value among the extracted observation magnifications and sets the display area corresponding to this observation magnification as the target area has been described, but the present invention is not limited to this. The area setting unit 12 may select the largest value among the extracted observation magnifications and set a part of the display area corresponding to this observation magnification as the target area. For example, the area setting unit 12 may set, as the target area, a part of the display area including a lesion area extracted by image processing.
 Further, the area setting unit 12 may preferentially extract observation data having a large observation magnification, and the generation unit 13 may assign a higher importance to the labeled image data as the observation magnification becomes larger. Specifically, the area setting unit 12 extracts observation data whose observation magnification is larger than a threshold value. Then, the generation unit 13 assigns a higher importance to the labeled image data as the observation magnification becomes larger, and records it in the recording unit 14. As a result, images can be automatically labeled so that images observed by the user at higher magnification are given higher importance, which reduces the burden on the user when labeling images.
 Further, the area setting unit 12 may preferentially extract observation data in which the moving speed of the observation area is small, and the generation unit 13 may assign a higher importance to the labeled image data as the moving speed of the observation area becomes smaller. Specifically, the area setting unit 12 extracts observation data in which the moving speed of the observation area is smaller than a threshold value. Then, the generation unit 13 assigns a higher importance to the labeled image data as the moving speed of the observation area becomes smaller, and records it in the recording unit 14. As a result, images can be automatically labeled so that images the user gazed at without moving the observation area much are given higher importance, which reduces the burden on the user when labeling images.
 Further, the area setting unit 12 may preferentially extract observation data in which the observation area stays within a predetermined area for a long time, and the generation unit 13 may assign a higher importance to the labeled image data as the time during which the observation area stays within the predetermined area becomes longer. Specifically, the area setting unit 12 extracts observation data in which the time during which the observation area stays within the predetermined area is longer than a threshold value. Then, the generation unit 13 assigns a higher importance to the labeled image data as the time during which the observation area stays within the predetermined area becomes longer, and records it in the recording unit 14. As a result, images can be automatically labeled so that images the user gazed at, that is, images for which the observation area stayed within a predetermined area for a long time, are given higher importance, which reduces the burden on the user when labeling images.
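 The three importance rules described above (observation magnification, movement speed, and dwell time) could, for example, be combined into a single weight as in the following sketch; the reference values and the weighting formula are illustrative assumptions.

```python
def importance_score(magnification: float,
                     move_speed: float,
                     dwell_time: float,
                     mag_ref: float = 40.0,
                     speed_ref: float = 50.0,
                     dwell_ref: float = 5.0) -> float:
    """Higher magnification, lower movement speed of the observation area,
    and longer dwell time in a region all raise the importance attached
    to the labeled image data (each factor clamped to [0, 1])."""
    mag_term = min(magnification / mag_ref, 1.0)          # larger magnification -> higher
    speed_term = max(1.0 - move_speed / speed_ref, 0.0)   # smaller speed -> higher
    dwell_term = min(dwell_time / dwell_ref, 1.0)         # longer dwell -> higher
    return (mag_term + speed_term + dwell_term) / 3.0


# Usage: attach the score as metadata of the labeled image data.
print(importance_score(magnification=20.0, move_speed=10.0, dwell_time=4.0))
```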
(Embodiment 2)
 Next, the second embodiment of the present disclosure will be described. In the first embodiment, the observation data and the voice data are each input from the outside, whereas in the second embodiment, the observation data and the voice data are generated. In the following, the configuration of the information processing device according to the second embodiment is described first, and then the processing executed by the information processing device according to the second embodiment is described. The same components as those of the information processing system 1 according to the first embodiment described above are denoted by the same reference numerals, and detailed description thereof will be omitted as appropriate.
 [Configuration of the information processing device]
 FIGS. 5 and 6 are schematic views showing the configuration of the information processing device according to the second embodiment. FIG. 7 is a block diagram showing the functional configuration of the information processing device according to the second embodiment.
 The information processing device 1a shown in FIGS. 5 to 7 includes a phrase extraction unit 11, an area setting unit 12, a generation unit 13, a display unit 20, an observation information detection unit 30, a voice input unit 31, a control unit 32, a time measurement unit 33, a recording unit 34, and an operation unit 35.
 The observation information detection unit 30 is configured by using an LED light source that emits near-infrared light and an optical sensor (for example, a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge Coupled Device) sensor) that images the pupil point and the reflection point on the cornea. The observation information detection unit 30 is provided on a side surface of the housing of the information processing device 1a where the user U1 can visually recognize the display unit 20 (see FIGS. 5 and 6). Under the control of the control unit 32, the observation information detection unit 30 generates line-of-sight data in which the line of sight of the user U1 with respect to the image displayed by the display unit 20 is detected, and outputs this line-of-sight data to the control unit 32 as observation data. Specifically, under the control of the control unit 32, the observation information detection unit 30 irradiates the cornea of the user U1 with near-infrared light from the LED light source or the like, and the optical sensor images the pupil point and the reflection point on the cornea of the user U1 to generate data. Then, under the control of the control unit 32, the observation information detection unit 30 continuously calculates the line of sight of the user from the pattern of the pupil point and the reflection point of the user U1, based on an analysis result obtained by analyzing the data generated by the optical sensor through image processing or the like, thereby generating line-of-sight data for a predetermined time, and outputs this line-of-sight data as observation data to the observation information detection control unit 321 described later. Note that the observation information detection unit 30 may generate the line-of-sight data by detecting the pupil of the user U1 with the optical sensor alone using well-known pattern matching, or may generate the line-of-sight data by detecting the line of sight of the user U1 using other sensors or other well-known techniques.
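 One conventional way to turn detected pupil/reflection offsets into on-screen gaze coordinates is a least-squares calibration against a few known screen points, as in the numpy sketch below. This is a generic technique offered for illustration under assumed data shapes, not the specific method of the embodiment.

```python
import numpy as np


def fit_affine_calibration(eye_features: np.ndarray, screen_points: np.ndarray) -> np.ndarray:
    """Fit screen = [ex, ey, 1] @ A by least squares.
    eye_features: (N, 2) pupil-to-glint offsets collected while the user looks at
    N known calibration targets; screen_points: (N, 2) pixel positions."""
    ones = np.ones((eye_features.shape[0], 1))
    design = np.hstack([eye_features, ones])                          # (N, 3)
    coeffs, *_ = np.linalg.lstsq(design, screen_points, rcond=None)   # (3, 2)
    return coeffs


def gaze_to_screen(eye_feature: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Map one pupil-to-glint offset to display coordinates."""
    return np.append(eye_feature, 1.0) @ coeffs


# Usage with four calibration samples (illustrative numbers).
eye = np.array([[0.10, 0.05], [0.30, 0.06], [0.11, 0.25], [0.29, 0.24]])
scr = np.array([[100, 100], [1800, 100], [100, 950], [1800, 950]], dtype=float)
A = fit_affine_calibration(eye, scr)
print(gaze_to_screen(np.array([0.20, 0.15]), A))
```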
 The voice input unit 31 is configured by using a microphone to which voice is input and a voice codec that converts the voice received by the microphone into digital voice data, amplifies the voice data, and outputs it to the control unit 32. Under the control of the control unit 32, the voice input unit 31 generates voice data by receiving the input of the voice of the user U1, and outputs this voice data to the control unit 32. In addition to voice input, the voice input unit 31 may also be provided with a speaker or the like capable of outputting voice so as to have a voice output function.
 The control unit 32 is configured by using a CPU, an FPGA, a GPU, and the like, and controls the observation information detection unit 30, the voice input unit 31, and the display unit 20. The control unit 32 includes an observation information detection control unit 321, a voice input control unit 322, and a display control unit 323.
 The observation information detection control unit 321 controls the observation information detection unit 30. Specifically, the observation information detection control unit 321 causes the observation information detection unit 30 to irradiate the user U1 with near-infrared light at predetermined timings and to image the pupil of the user U1, thereby generating line-of-sight data. Further, the observation information detection control unit 321 performs various kinds of image processing on the line-of-sight data input from the observation information detection unit 30 and outputs the result to the recording unit 34.
 The voice input control unit 322 controls the voice input unit 31, performs various kinds of processing, for example gain-up and noise reduction processing, on the voice data input from the voice input unit 31, and outputs the result to the recording unit 34.
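 The gain-up and noise-reduction processing mentioned here could look like the following minimal numpy sketch (a fixed gain plus a simple noise gate); real systems would use more elaborate filtering, and the parameter values are assumptions.

```python
import numpy as np


def condition_voice(samples: np.ndarray,
                    gain: float = 2.0,
                    noise_floor: float = 0.02) -> np.ndarray:
    """Apply a fixed gain and mute samples below a noise floor.
    samples: mono float32 waveform in the range [-1.0, 1.0]."""
    boosted = np.clip(samples * gain, -1.0, 1.0)
    gated = np.where(np.abs(boosted) < noise_floor, 0.0, boosted)
    return gated.astype(np.float32)


# Usage on a short synthetic buffer (mostly background noise).
buf = (0.01 * np.random.randn(16000)).astype(np.float32)
print(np.abs(condition_voice(buf)).max())
```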
 The display control unit 323 controls the display mode of the display unit 20. The display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded in the recording unit 34.
 The time measurement unit 33 is configured by using a timer, a clock generator, or the like, and adds time information to the line-of-sight data generated by the observation information detection unit 30, the voice data generated by the voice input unit 31, and the like.
 The recording unit 34 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like, and records various kinds of information related to the information processing device 1a. The recording unit 34 includes an observation data recording unit 341, a voice data recording unit 342, an image data recording unit 343, and a program recording unit 344.
 The observation data recording unit 341 records the line-of-sight data input from the observation information detection control unit 321 as observation data, and outputs the observation data to the area setting unit 12.
 The voice data recording unit 342 records the voice data input from the voice input control unit 322, and outputs the voice data to the phrase extraction unit 11.
 The image data recording unit 343 records a plurality of image data. The plurality of image data are data input from the outside of the information processing device 1a or data captured by an external imaging device and provided via a recording medium.
 The program recording unit 344 records various programs executed by the information processing device 1a, data used during the execution of the various programs (for example, dictionary information and text conversion dictionary information), and data being processed during the execution of the various programs.
 The operation unit 35 is configured by using a mouse, a keyboard, a touch panel, various switches, and the like, receives input of operations by the user U1, and outputs the content of the received operations to the control unit 32.
 [Processing of the information processing device]
 Next, the processing executed by the information processing device 1a will be described. FIG. 8 is a flowchart showing an outline of the processing executed by the information processing device.
 図8に示すように、まず、表示制御部323は、画像データ記録部343が記録する画像データに対応する画像を表示部20に表示させる(ステップS201)。この場合、表示制御部323は、操作部35の操作に応じて選択された画像データに対応する画像を表示部20に表示させる。 As shown in FIG. 8, first, the display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded by the image data recording unit 343 (step S201). In this case, the display control unit 323 causes the display unit 20 to display an image corresponding to the image data selected according to the operation of the operation unit 35.
 続いて、制御部32は、観察情報検出部30が生成した観察データ、及び音声入力部31が生成した音声データの各々と時間計測部33によって計測された時間とを対応付けて観察データ記録部341、及び音声データ記録部342に記録する(ステップS202)。 Subsequently, the control unit 32 associates each of the observation data generated by the observation information detection unit 30 and the voice data generated by the voice input unit 31 with the time measured by the time measurement unit 33, and records them in the observation data recording unit 341 and the voice data recording unit 342 (step S202).
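 As a purely illustrative aid (not part of the disclosed apparatus), the following Python sketch shows one way the time association performed in step S202 could be organized. All class and field names here are assumptions introduced for illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TimestampedRecord:
    """One observation or voice sample tagged with its capture time (hypothetical structure)."""
    timestamp: float   # seconds on a shared clock
    payload: object    # e.g. a gaze coordinate, a stage position, or an audio chunk

@dataclass
class SessionRecorder:
    """Minimal stand-in for the observation-data and voice-data recording units."""
    observation_records: list = field(default_factory=list)
    voice_records: list = field(default_factory=list)

    def record_observation(self, payload, timestamp=None):
        self.observation_records.append(
            TimestampedRecord(timestamp if timestamp is not None else time.monotonic(), payload))

    def record_voice(self, payload, timestamp=None):
        self.voice_records.append(
            TimestampedRecord(timestamp if timestamp is not None else time.monotonic(), payload))

# Both streams share the same clock, so they can later be aligned by time.
recorder = SessionRecorder()
recorder.record_observation({"gaze_x": 0.42, "gaze_y": 0.31})
recorder.record_voice(b"...audio chunk...")
```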
 その後、変換部111は、音声データ記録部342が記録する音声データを文字情報(テキストデータ)に変換する(ステップS203)。なお、このステップは後述のS206の後に行っても良い。 After that, the conversion unit 111 converts the voice data recorded by the voice data recording unit 342 into character information (text data) (step S203). In addition, this step may be performed after S206 described later.
 続いて、操作部35から表示部20が表示する画像の観察を終了する指示信号が入力された場合(ステップS204:Yes)、情報処理装置1aは、後述するステップS205へ移行する。これに対して、操作部35から表示部20が表示する画像の観察を終了する指示信号が入力されていない場合(ステップS204:No)、情報処理装置1aは、ステップS202へ戻る。 Subsequently, when an instruction signal for ending the observation of the image displayed by the display unit 20 is input from the operation unit 35 (step S204: Yes), the information processing apparatus 1a shifts to step S205 described later. On the other hand, when the instruction signal for ending the observation of the image displayed by the display unit 20 is not input from the operation unit 35 (step S204: No), the information processing apparatus 1a returns to step S202.
 ステップS205において、語句抽出部11は、音声データからキーワード(所定の語句)を抽出する。具体的には、抽出部112は、変換部111が音声データを変換したテキストデータから重要語句を所定の語句として抽出する。ここで、抽出部112は、重要語句を記憶する重要語句記憶部113を備え、重要語句記憶部113は、重要語句に属性を付与して記憶している。抽出部112は、1つの文節として発話されたテキストデータから各属性(語句クラス)に含まれる1つ以上の重要語句を抽出する。 In step S205, the phrase extraction unit 11 extracts a keyword (predetermined phrase) from the voice data. Specifically, the extraction unit 112 extracts important words and phrases as predetermined words and phrases from the text data converted by the conversion unit 111 from the voice data. Here, the extraction unit 112 includes an important word / phrase storage unit 113 for storing important words / phrases, and the important word / phrase storage unit 113 assigns attributes to important words / phrases and stores them. The extraction unit 112 extracts one or more important words / phrases included in each attribute (word / phrase class) from the text data spoken as one phrase.
 図9は、重要語句を分類する属性を表す図である。図9に示すように、語句抽出部11は、語句クラス1(第1の属性)として病変等の部位又は由来等を表す重要語句と、語句クラス2(第2の属性)として病変等の状態を表す重要語句とを抽出する。語句抽出部11が抽出した重要語句の組は、これらの重要語句の組が発声された時刻と関連付けられて記録部34へ記録される。 FIG. 9 is a diagram showing the attributes used to classify important phrases. As shown in FIG. 9, the phrase extraction unit 11 extracts, as phrase class 1 (first attribute), important phrases representing the site or origin of a lesion or the like, and, as phrase class 2 (second attribute), important phrases representing the state of the lesion or the like. The set of important phrases extracted by the phrase extraction unit 11 is recorded in the recording unit 34 in association with the time at which the set was uttered.
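 The attribute-based extraction described above can be illustrated with a minimal Python sketch. The phrase lists and function names below are invented for illustration and are not the dictionary contents or interface of the disclosed device.

```python
# Illustrative only: these phrase sets are made-up examples, not the patent's dictionary.
IMPORTANT_PHRASES = {
    "class1_site_or_origin": {"stomach", "colon", "adenoma"},   # first attribute: site/origin of a lesion
    "class2_state": {"inflammation", "ulcer", "benign"},        # second attribute: state of a lesion
}

def extract_important_phrases(clause_text: str) -> dict:
    """Return, per attribute class, the important phrases found in one spoken clause."""
    tokens = clause_text.lower().split()
    return {
        attribute: [t for t in tokens if t in phrases]
        for attribute, phrases in IMPORTANT_PHRASES.items()
    }

# A clause such as "benign adenoma in the colon" yields phrases in both classes.
print(extract_important_phrases("benign adenoma in the colon"))
# {'class1_site_or_origin': ['adenoma', 'colon'], 'class2_state': ['benign']}
```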
 図8に戻り、ステップS206以降の説明を続ける。
 ステップS206において、領域設定部12は、重要語句の組が発声された時間に対応する観察データを抽出し、抽出した観察データに基づいて、画像データに対象領域を設定する。
Returning to FIG. 8, the description after step S206 will be continued.
In step S206, the area setting unit 12 extracts the observation data corresponding to the time when the set of important words and phrases is uttered, and sets the target area in the image data based on the extracted observation data.
 図10は、領域設定部が対象領域を設定する様子を表す図である。図10に示すように、領域設定部12は、重要語句の組が発声された時間Speach1、2にそれぞれ対応する観察データとして視線データを抽出し、抽出した視線データに基づいて、画像データ内に利用者U1が注視した注視点B1、B2を設定する。人間の中心視は1~2度程度であるから、中心視を1.5度とし、利用者U1と表示部20との間の距離を50cmであるとすると、利用者U1が注視している領域は半径1.3cmの円とみなすことができる。この領域を含むように、領域設定部12は、注視点B1、B2を含む半径2~3cmの円形の領域を画像データにおける対象領域として設定する。ただし、領域設定部12は、注視点B1、B2を含む四角形の領域を画像データにおける対象領域として設定してもよい。そして、領域設定部12は、設定した対象領域を記録部34へ記録する。 FIG. 10 is a diagram showing how the area setting unit sets the target area. As shown in FIG. 10, the area setting unit 12 extracts line-of-sight data as the observation data corresponding to the times Speech1 and Speech2 at which the sets of important phrases were uttered, and, based on the extracted line-of-sight data, sets the gazing points B1 and B2 at which the user U1 gazed in the image data. Since human central vision covers about 1 to 2 degrees, if the central vision is taken to be 1.5 degrees and the distance between the user U1 and the display unit 20 to be 50 cm, the region the user U1 is gazing at can be regarded as a circle with a radius of about 1.3 cm. So as to include this region, the area setting unit 12 sets a circular region with a radius of 2 to 3 cm containing the gazing points B1 and B2 as the target area in the image data. Alternatively, the area setting unit 12 may set a rectangular region containing the gazing points B1 and B2 as the target area in the image data. The area setting unit 12 then records the set target area in the recording unit 34.
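 A minimal Python sketch of the geometric reasoning above, assuming the same 1.5-degree central vision and 50 cm viewing distance as in the description; the function names and the dictionary-based region representation are illustrative assumptions, not the disclosed implementation.

```python
import math

def gaze_region_radius_cm(viewing_distance_cm: float = 50.0,
                          central_vision_deg: float = 1.5) -> float:
    """Radius on the display covered by central vision at the given viewing distance."""
    return viewing_distance_cm * math.tan(math.radians(central_vision_deg))

def circular_target_region(gaze_x: float, gaze_y: float, radius_px: float) -> dict:
    """Represent the target area as a circle around a gazing point (image coordinates)."""
    return {"shape": "circle", "center": (gaze_x, gaze_y), "radius": radius_px}

# About 1.3 cm for 1.5-degree central vision at 50 cm, matching the figure in the description.
print(round(gaze_region_radius_cm(), 2))   # 1.31
```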
 続いて、生成部13は、重要語句の組と対象領域とを関連付けたラベル付き画像データを生成する(ステップS207)。さらに、生成部13は、重要語句の組と対象領域とを関連付けたラベル付き画像データを記録部34に出力し、情報処理装置1aは、本処理を終了する。 Subsequently, the generation unit 13 generates labeled image data in which the set of important words and phrases is associated with the target area (step S207). Further, the generation unit 13 outputs the labeled image data in which the set of important words and phrases is associated with the target area to the recording unit 34, and the information processing apparatus 1a ends the present processing.
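 As a hedged sketch of what such a labeled record might look like, the following container and field names are assumptions introduced here; they are not the data format actually used by the generation unit 13.

```python
from dataclasses import dataclass

@dataclass
class LabeledImageData:
    """Hypothetical container pairing a set of important phrases with its target area."""
    image_id: str
    target_region: dict   # e.g. the circular region produced in the previous sketch
    label: dict           # important phrases keyed by attribute class

def generate_labeled_image_data(image_id, target_region, phrase_set):
    return LabeledImageData(image_id=image_id, target_region=target_region, label=phrase_set)

sample = generate_labeled_image_data(
    "slide_0001",
    {"shape": "circle", "center": (812, 455), "radius": 96},
    {"class1_site_or_origin": ["colon"], "class2_state": ["benign"]},
)
```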
 以上説明した実施の形態2によれば、複数の属性を有する重要語句によって病変等の領域を抽出するため、画像をより詳細に分類することができる。また、領域設定部12が視線データに基づいて対象領域を設定するため、利用者U1が注視している領域をより適切に抽出することができる。なお、実施の形態2では、重要語句が2つの属性を有する例を説明したが、これに限られず、重要語句が3つ以上の属性を有していてもよい。 According to the second embodiment described above, since the region such as a lesion is extracted by the important words and phrases having a plurality of attributes, the images can be classified in more detail. Further, since the area setting unit 12 sets the target area based on the line-of-sight data, the area that the user U1 is gazing at can be extracted more appropriately. In the second embodiment, an example in which the important phrase has two attributes has been described, but the present invention is not limited to this, and the important phrase may have three or more attributes.
 なお、上述した実施の形態2では、領域設定部12は、視線データに基づいて、画像データ内に利用者U1が注視した注視点を設定したがこれに限られない。領域設定部12は、マウス等のポインティングデバイスの位置情報に基づいて、画像データ内に注視点を設定してもよい。そして、領域設定部12は、注視点を含む領域を画像データにおける対象領域として設定する。 In the second embodiment described above, the area setting unit 12 sets the gaze point that the user U1 gazes at in the image data based on the line-of-sight data, but the present invention is not limited to this. The area setting unit 12 may set a gazing point in the image data based on the position information of a pointing device such as a mouse. Then, the area setting unit 12 sets the area including the gazing point as the target area in the image data.
(実施の形態3)
 次に、本開示の実施の形態3について説明する。上述した実施の形態2では、情報処理装置1aのみで構成されていたが、実施の形態3では、顕微鏡システムの一部に情報処理装置を組み込むことによって構成する。以下においては、実施の形態3に係る顕微鏡システムの構成を説明後、実施の形態3に係る顕微鏡システムが実行する処理について説明する。なお、上述した実施の形態2に係る情報処理装置1aと同一の構成には同一の符号を付して詳細な説明は適宜省略する。
(Embodiment 3)
 Next, the third embodiment of the present disclosure will be described. While the second embodiment described above consists of the information processing apparatus 1a alone, the third embodiment is configured by incorporating the information processing apparatus into a part of a microscope system. In the following, the configuration of the microscope system according to the third embodiment is described first, and then the processing performed by the microscope system according to the third embodiment is described. The same components as those of the information processing apparatus 1a according to the second embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.
 〔顕微鏡システムの構成〕
 図11は、実施の形態3に係る顕微鏡システムの機能構成を示すブロック図である。図11に示すように、顕微鏡システム100は、情報処理装置1bと、表示部20と、音声入力部31と、操作部35と、顕微鏡200と、を備える。
[Structure of microscope system]
FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment. As shown in FIG. 11, the microscope system 100 includes an information processing device 1b, a display unit 20, a voice input unit 31, an operation unit 35, and a microscope 200.
 〔顕微鏡の構成〕
 まず、顕微鏡200の構成について説明する。
 顕微鏡200は、略C字状をなす筐体部201と、筐体部201に対して3次元方向に移動可能に取り付けられたステージ202と、互いに観察倍率が異なる複数の対物レンズ203を有し、ユーザの操作に応じて所望の対物レンズ203を配置するレボルバ204と、対物レンズ203を経由してステージ202上に載置された標本を撮像するCCD又はCMOS等で構成された撮像部205と、対物レンズ203を経由して標本の観察像を観察する接眼部206と、ユーザの操作に応じてステージ202を3次元方向に移動させる操作部207と、基準位置からのステージ202の位置を検出する位置検出部208と、エンコーダ等を用いて構成され、顕微鏡200が標本を観察する観察倍率を示す倍率情報を検出する倍率検出部209と、を備える。
[Microscope configuration]
First, the configuration of the microscope 200 will be described.
 The microscope 200 includes a substantially C-shaped housing 201; a stage 202 attached to the housing 201 so as to be movable in three-dimensional directions; a revolver 204 that holds a plurality of objective lenses 203 having mutually different observation magnifications and places a desired objective lens 203 in the optical path in accordance with a user operation; an imaging unit 205, composed of a CCD, a CMOS, or the like, that images a specimen placed on the stage 202 via the objective lens 203; an eyepiece unit 206 for observing the observation image of the specimen via the objective lens 203; an operation unit 207 that moves the stage 202 in three-dimensional directions in accordance with a user operation; a position detection unit 208 that detects the position of the stage 202 relative to a reference position; and a magnification detection unit 209, configured using an encoder or the like, that detects magnification information indicating the observation magnification at which the microscope 200 observes the specimen.
 〔情報処理装置の構成〕
 次に、情報処理装置1bの構成について説明する。
 情報処理装置1bは、上述した実施の形態2に係る情報処理装置1aの制御部32、及び記録部34に換えて、制御部32b、記録部34bと、を備える。
[Configuration of information processing equipment]
Next, the configuration of the information processing apparatus 1b will be described.
The information processing device 1b includes a control unit 32b and a recording unit 34b in place of the control unit 32 and the recording unit 34 of the information processing device 1a according to the second embodiment described above.
 制御部32bは、CPU、FPGA、及びGPU等を用いて構成され、表示部20、音声入力部31、及び顕微鏡200を制御する。制御部32bは、上述した実施の形態2の制御部32の観察情報検出制御部321、音声入力制御部322、表示制御部323に加えて、撮影制御部324、及び倍率算出部325をさらに備える。 The control unit 32b is configured by using a CPU, FPGA, GPU and the like, and controls the display unit 20, the voice input unit 31, and the microscope 200. The control unit 32b further includes an imaging control unit 324 and a magnification calculation unit 325 in addition to the observation information detection control unit 321, the voice input control unit 322, and the display control unit 323 of the control unit 32 of the second embodiment described above. ..
 観察情報検出制御部321は、位置検出部208が検出した基準位置からのステージ202の位置に基づいて、現在のステージ202の位置情報を算出し、この算出結果を記録部34bへ出力する。 The observation information detection control unit 321 calculates the current position information of the stage 202 based on the position of the stage 202 from the reference position detected by the position detection unit 208, and outputs the calculation result to the recording unit 34b.
 撮影制御部324は、撮像部205の動作を制御する。撮影制御部324は、撮像部205を所定のフレームレートに従って順次撮像させることによって画像データ(動画)を生成させる。撮影制御部324は、撮像部205から入力された画像データに対して処理の画像処理(例えば現像処理等)を施して記録部34bへ出力する。 The shooting control unit 324 controls the operation of the imaging unit 205. The shooting control unit 324 generates image data (moving image) by sequentially imaging the image pickup unit 205 according to a predetermined frame rate. The shooting control unit 324 performs image processing (for example, development processing) on the image data input from the image pickup unit 205 and outputs the image data to the recording unit 34b.
 倍率算出部325は、倍率検出部209から入力された検出結果に基づいて、現在の顕微鏡200の観察倍率を算出し、この算出結果を記録部34bへ出力する。例えば、倍率算出部325は、倍率検出部209から入力された対物レンズ203の倍率と接眼部206の倍率とに基づいて、現在の顕微鏡200の観察倍率を算出する。 The magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the detection result input from the magnification detection unit 209, and outputs this calculation result to the recording unit 34b. For example, the magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the magnification of the objective lens 203 and the magnification of the eyepiece unit 206 input from the magnification detection unit 209.
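 As a simple illustration of this calculation: the description states only that the objective-lens magnification and the eyepiece magnification are both used, so the sketch below assumes the common case where the overall observation magnification is their product. The function name is an assumption introduced here.

```python
def total_observation_magnification(objective_mag: float, eyepiece_mag: float) -> float:
    """Overall magnification taken as the product of objective and eyepiece magnifications."""
    return objective_mag * eyepiece_mag

# e.g. a 40x objective with a 10x eyepiece gives 400x.
assert total_observation_magnification(40, 10) == 400
```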
 記録部34bは、揮発性メモリ、不揮発性メモリ、及び記録媒体等を用いて構成される。記録部34bは、上述した実施の形態2に係る画像データ記録部343に換えて、画像データ記録部345を備える。画像データ記録部345は、撮影制御部324から入力された画像データを記録し、この画像データを生成部13へ出力する。 The recording unit 34b is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like. The recording unit 34b includes an image data recording unit 345 instead of the image data recording unit 343 according to the second embodiment described above. The image data recording unit 345 records the image data input from the shooting control unit 324, and outputs this image data to the generation unit 13.
 〔顕微鏡システムの処理〕
 次に、顕微鏡システム100が実行する処理について説明する。図12は、実施の形態3に係る顕微鏡システムが実行する処理の概要を示すフローチャートである。
[Processing of microscope system]
Next, the processing performed by the microscope system 100 will be described. FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment.
 図12に示すように、まず、制御部32bは、観察情報検出制御部321が算出したステージ202の位置情報、及び倍率算出部325が算出した観察倍率を含む観察データ、音声入力部31が生成した音声データの各々を時間計測部33によって計測された時間と対応付けて観察データ記録部341、及び音声データ記録部342に記録する(ステップS301)。ステップS301の後、顕微鏡システム100は、後述するステップS302へ移行する。 As shown in FIG. 12, first, the control unit 32b records the observation data, which includes the position information of the stage 202 calculated by the observation information detection control unit 321 and the observation magnification calculated by the magnification calculation unit 325, and the voice data generated by the voice input unit 31 in the observation data recording unit 341 and the voice data recording unit 342, respectively, each in association with the time measured by the time measurement unit 33 (step S301). After step S301, the microscope system 100 proceeds to step S302 described later.
 ステップS302~ステップS304は、上述した図8のステップS203~ステップS205それぞれに対応する。ステップS304の後、顕微鏡システム100は、ステップS305へ移行する。 Steps S302 to S304 correspond to each of steps S203 to S205 in FIG. 8 described above. After step S304, the microscope system 100 proceeds to step S305.
 ステップS305において、領域設定部12は、重要語句が発声された時間に対応する観察データを抽出し、抽出した観察データに基づいて、画像データに対象領域を設定する。具体的には、領域設定部12は、重要語句が発声された時間に対応する観察データとしてステージ202の位置情報と観察倍率とを抽出し、この時刻における利用者の観察視野を記録した視野データを生成する。換言すると、視野データは、重要語句が発声された時間に表示部20に表示されている画像である。そして、領域設定部12は、この視野データが表す領域を画像データにおける対象領域として設定する。 In step S305, the area setting unit 12 extracts the observation data corresponding to the time at which the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 extracts the position information of the stage 202 and the observation magnification as the observation data corresponding to the time at which the important phrase was uttered, and generates visual field data recording the user's observation field of view at that time. In other words, the visual field data corresponds to the image displayed on the display unit 20 at the time the important phrase was uttered. The area setting unit 12 then sets the region represented by this visual field data as the target area in the image data.
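 A minimal sketch of how stage position and observation magnification could be mapped to a field-of-view rectangle on the specimen; the sensor dimensions, coordinate conventions, and function name below are invented placeholders, not values from the disclosure.

```python
def field_of_view_region(stage_x_um: float, stage_y_um: float,
                         magnification: float,
                         sensor_width_um: float = 13000.0,
                         sensor_height_um: float = 9750.0) -> dict:
    """Rectangle on the specimen seen at the current stage position and magnification.

    The sensor dimensions are placeholders; the field of view on the specimen shrinks
    in proportion to the observation magnification.
    """
    fov_w = sensor_width_um / magnification
    fov_h = sensor_height_um / magnification
    return {
        "x_min": stage_x_um - fov_w / 2, "x_max": stage_x_um + fov_w / 2,
        "y_min": stage_y_um - fov_h / 2, "y_max": stage_y_um + fov_h / 2,
    }

# At 20x this gives a 650 x 487.5 micrometre window around the stage position.
print(field_of_view_region(1000.0, 2000.0, magnification=20))
```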
 続いて、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを生成する(ステップS306)。さらに、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを記録部34bに出力し、顕微鏡システム100は、本処理を終了する。 Subsequently, the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S306). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34b, and the microscope system 100 ends this process.
 以上説明した実施の形態3によれば、利用者は、顕微鏡システム100を用いて標本を顕微鏡観察しながら録画した画像データに対して、病変等の対象領域を設定し、ラベル付き画像データを生成することができる。このとき、顕微鏡システム100によれば、利用者の音声により対象領域を設定できるため、利用者が観察中にキーボードやポインティングデバイス等を見ることにより、観察が中断されることがないため、観察効率を向上させることができる。 According to the third embodiment described above, the user can set a target area such as a lesion in the image data recorded while observing a specimen under the microscope using the microscope system 100, and can generate labeled image data. Moreover, according to the microscope system 100, the target area can be set by the user's voice, so the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during observation, and the observation efficiency can therefore be improved.
 なお、上述した実施の形態3では、領域設定部12は、ステージ202の位置情報、及び観察倍率を含む観察データを用いて対象領域を抽出したがこれに限られない。領域設定部12は、画像データを用いて、連続するフレーム間の相関関係を表す特徴量や、連続するフレーム間における移動ベクトルに基づいて、画像データに対象領域を設定してもよい。具体的には、領域設定部12は、連続するフレーム間の相関関係が大きい(類似度が高い)と判定した場合、利用者がこのフレームを注視していると判定し、このフレームを対象領域に設定してもよい。同様に、領域設定部12は、連続するフレーム間における移動ベクトルが小さい(画像間の移動量が小さい)と判定した場合、利用者がこのフレームを注視していると判定し、このフレームを対象領域に設定してもよい。 In the third embodiment described above, the area setting unit 12 extracts the target area using the observation data including the position information of the stage 202 and the observation magnification, but the present invention is not limited to this. The area setting unit 12 may set the target area in the image data based on a feature amount representing the correlation between consecutive frames of the image data, or on a motion vector between consecutive frames. Specifically, when the area setting unit 12 determines that the correlation between consecutive frames is large (the similarity is high), it may determine that the user is gazing at that frame and set the frame as the target area. Similarly, when the area setting unit 12 determines that the motion vector between consecutive frames is small (the amount of movement between the images is small), it may determine that the user is gazing at that frame and set the frame as the target area.
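 One possible realization of the frame-correlation idea, using a normalized correlation coefficient as the inter-frame feature amount; the threshold value and helper names are illustrative assumptions rather than the disclosed criterion.

```python
import numpy as np

def frame_similarity(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """Normalized correlation coefficient between two grayscale frames (1.0 = identical)."""
    a = prev_frame.astype(np.float64).ravel()
    b = curr_frame.astype(np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def is_gazed_frame(prev_frame, curr_frame, similarity_threshold: float = 0.95) -> bool:
    """Treat a frame as 'gazed at' when it barely changes from the previous one."""
    return frame_similarity(prev_frame, curr_frame) >= similarity_threshold

# Two identical frames are judged as a gazed-at (static) view.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 256, size=(64, 64))
f2 = f1.copy()
print(is_gazed_frame(f1, f2))   # True
```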
 また、顕微鏡200は、視線検出部を有していてもよい。視線検出部は、接眼部206の内部、又は外部に設けられ、利用者の視線を検出することによって視線データを生成し、この視線データを情報処理装置1bへ出力する。視線検出部は、接眼部206の内部に設けられ、近赤外線を照射するLED光源と、接眼部206の内部に設けられ、角膜上の瞳孔点と反射点を撮像する光学センサ(例えばCMOS、CCD)と、を用いて構成される。視線検出部は、情報処理装置1bの制御のもと、LED光源等から近赤外線を利用者の角膜に照射し、光学センサが利用者の角膜上の瞳孔点と反射点を撮像することによってデータを生成する。そして、視線検出部は、情報処理装置1bの制御のもと、光学センサによって生成されたデータに対して画像処理等によって解析した解析結果に基づいて、利用者の瞳孔点と反射点のパターンから利用者の視線を検出することによって視線データを生成し、この視線データを情報処理装置1bへ出力する。領域設定部12は、実施の形態2と同様に、視線データを用いて利用者が観察している画像内に注視点を設定し、この注視点を含むように病変等を表す対象領域を設定してもよい。 Further, the microscope 200 may have a line-of-sight detection unit. The line-of-sight detection unit is provided inside or outside the eyepiece unit 206, generates line-of-sight data by detecting the user's line of sight, and outputs the line-of-sight data to the information processing device 1b. The line-of-sight detection unit is configured using an LED light source that is provided inside the eyepiece unit 206 and emits near-infrared light, and an optical sensor (for example, a CMOS or a CCD) that is provided inside the eyepiece unit 206 and images the pupil point and the reflection point on the cornea. Under the control of the information processing device 1b, the line-of-sight detection unit irradiates the user's cornea with near-infrared light from the LED light source or the like, and the optical sensor captures the pupil point and the reflection point on the user's cornea. Then, under the control of the information processing device 1b, the line-of-sight detection unit generates line-of-sight data by detecting the user's line of sight from the pattern of the pupil point and the reflection point, based on the result of analyzing the data generated by the optical sensor by image processing or the like, and outputs the line-of-sight data to the information processing device 1b. As in the second embodiment, the area setting unit 12 may set a gazing point in the image observed by the user using the line-of-sight data, and set a target area representing a lesion or the like so as to include the gazing point.
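 A heavily simplified sketch of corneal-reflection gaze estimation of the kind described above, in which the gaze point is obtained from the pupil-to-reflection (glint) vector through calibration coefficients; all numbers, names, and the affine mapping itself are assumptions introduced for illustration.

```python
def gaze_point_from_eye_features(pupil_center, glint_center, calibration):
    """Map the pupil-to-corneal-reflection vector to display coordinates.

    A common simplification: the gaze position is taken to be an affine function of the
    vector between the pupil centre and the near-infrared reflection (glint). The
    calibration coefficients would normally be obtained by having the user fixate known
    points; the values used below are invented.
    """
    dx = pupil_center[0] - glint_center[0]
    dy = pupil_center[1] - glint_center[1]
    ax, bx, cx, ay, by, cy = calibration
    return (ax * dx + bx * dy + cx, ay * dx + by * dy + cy)

# Example with made-up calibration coefficients and eye-image coordinates (pixels).
calib = (25.0, 0.0, 960.0, 0.0, 25.0, 540.0)
print(gaze_point_from_eye_features(pupil_center=(312, 248), glint_center=(300, 240),
                                   calibration=calib))   # (1260.0, 740.0)
```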
(実施の形態4)
 次に、本開示の実施の形態4について説明する。実施の形態4では、内視鏡システムの一部に情報処理装置を組み込むことによって構成する。以下においては、実施の形態4に係る内視鏡システムの構成を説明後、実施の形態4に係る内視鏡システムが実行する処理について説明する。なお、上述した実施の形態2に係る情報処理装置1aと同一の構成には同一の符号を付して詳細な説明は適宜省略する。
(Embodiment 4)
Next, the fourth embodiment of the present disclosure will be described. In the fourth embodiment, the information processing device is incorporated into a part of the endoscope system. In the following, after explaining the configuration of the endoscope system according to the fourth embodiment, the processing executed by the endoscope system according to the fourth embodiment will be described. The same components as those of the information processing apparatus 1a according to the second embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
 〔内視鏡システムの構成〕
 図13は、実施の形態4に係る内視鏡システムの構成を示す概略図である。図14は、実施の形態4に係る内視鏡システムの機能構成を示すブロック図である。
[Configuration of endoscope system]
FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment. FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment.
 図13、及び図14に示す内視鏡システム300は、表示部20と、内視鏡400と、ウェアラブルデバイス500と、入力部600と、情報処理装置1cと、を備える。 The endoscope system 300 shown in FIGS. 13 and 14 includes a display unit 20, an endoscope 400, a wearable device 500, an input unit 600, and an information processing device 1c.
 〔内視鏡の構成〕
 まず、内視鏡400の構成について説明する。
 内視鏡400は、医者や術者等の利用者U3が被検体U4に挿入することによって、被検体U4の内部を撮像することによって画像データを生成し、この画像データを情報処理装置1cへ出力する。内視鏡400は、撮像部401と、操作部402と、を備える。
[Construction of endoscope]
First, the configuration of the endoscope 400 will be described.
 The endoscope 400 generates image data when a user U3 such as a doctor or an operator inserts it into a subject U4 and images the inside of the subject U4, and outputs this image data to the information processing device 1c. The endoscope 400 includes an imaging unit 401 and an operation unit 402.
 撮像部401は、内視鏡400の挿入部の先端部に設けられる。撮像部401は、情報処理装置1cの制御のもと、被検体U4の内部を撮像することによって画像データを生成し、この画像データを情報処理装置1cへ出力する。撮像部401は、観察倍率を変更することができる光学系と、光学系が結像した被写体像を受光することによって画像データを生成するCMOSやCCD等のイメージセンサ等を用いて構成される。 The imaging unit 401 is provided at the tip of the insertion unit of the endoscope 400. Under the control of the information processing device 1c, the image pickup unit 401 generates image data by imaging the inside of the subject U4 and outputs the image data to the information processing device 1c. The image pickup unit 401 is configured by using an optical system capable of changing the observation magnification, an image sensor such as CMOS or CCD that generates image data by receiving a subject image formed by the optical system, and the like.
 操作部402は、利用者U3の各種の操作の入力を受け付け、受け付けた各種操作に応じた操作信号を情報処理装置1cへ出力する。 The operation unit 402 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
 〔ウェアラブルデバイスの構成〕
 次に、ウェアラブルデバイス500の構成について説明する。
 ウェアラブルデバイス500は、利用者U3に装着され、利用者U3の視線を検出するとともに、利用者U3の音声の入力を受け付ける。ウェアラブルデバイス500は、視線検出部510と、音声入力部520と、を有する。
[Wearable device configuration]
Next, the configuration of the wearable device 500 will be described.
The wearable device 500 is attached to the user U3, detects the line of sight of the user U3, and accepts the voice input of the user U3. The wearable device 500 includes a line-of-sight detection unit 510 and a voice input unit 520.
 視線検出部510は、ウェアラブルデバイス500に設けられ、利用者U3の視線の注視度を検出することによって視線データを生成し、この視線データを情報処理装置1cへ出力する。視線検出部510は、上述した実施の形態3に係る視線検出部と同様の構成を有するため、詳細な構成は省略する。 The line-of-sight detection unit 510 is provided in the wearable device 500, generates line-of-sight data by detecting the gaze degree of the line of sight of the user U3, and outputs the line-of-sight data to the information processing device 1c. Since the line-of-sight detection unit 510 has the same configuration as the line-of-sight detection unit according to the third embodiment described above, a detailed configuration will be omitted.
 音声入力部520は、ウェアラブルデバイス500に設けられ、利用者U3の音声の入力を受け付けることによって音声データを生成し、この音声データを情報処理装置1cへ出力する。音声入力部520は、マイク等を用いて構成される。 The voice input unit 520 is provided in the wearable device 500, generates voice data by receiving the voice input of the user U3, and outputs the voice data to the information processing device 1c. The voice input unit 520 is configured by using a microphone or the like.
 〔入力部の構成〕
 入力部600の構成について説明する。
 入力部600は、マウス、キーボード、タッチパネル、及び各種のスイッチを用いて構成される。入力部600は、利用者U3の各種の操作の入力を受け付け、受け付けた各種操作に応じた操作信号を情報処理装置1cへ出力する。
[Structure of input unit]
The configuration of the input unit 600 will be described.
The input unit 600 is configured by using a mouse, a keyboard, a touch panel, and various switches. The input unit 600 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
 〔情報処理装置の構成〕
 次に、情報処理装置1cの構成について説明する。
 情報処理装置1cは、上述した実施の形態3に係る情報処理装置1bの制御部32b、記録部34bに換えて、制御部32c、及び記録部34cを備える。
[Configuration of information processing equipment]
Next, the configuration of the information processing apparatus 1c will be described.
The information processing device 1c includes a control unit 32c and a recording unit 34c in place of the control unit 32b and the recording unit 34b of the information processing device 1b according to the third embodiment described above.
 制御部32cは、CPU、FPGA、及びGPU等を用いて構成され、内視鏡400、ウェアラブルデバイス500、及び表示部20を制御する。制御部32cは、視線データ検出制御部321c、音声入力制御部322、表示制御部323、撮影制御部324に加えて、操作履歴検出部326を備える。 The control unit 32c is configured by using a CPU, FPGA, GPU and the like, and controls the endoscope 400, the wearable device 500, and the display unit 20. The control unit 32c includes an operation history detection unit 326 in addition to a line-of-sight data detection control unit 321c, a voice input control unit 322, a display control unit 323, and a shooting control unit 324.
 操作履歴検出部326は、内視鏡400の操作部402が入力を受け付けた操作の内容を検出し、この検出結果を記録部34cに出力する。具体的には、操作履歴検出部326は、内視鏡400の操作部402から拡大スイッチが操作された場合、この操作内容を検出し、この検出結果を記録部34cに出力する。なお、操作履歴検出部326は、内視鏡400を経由して被検体U4の内部に挿入される処置具の操作内容を検出し、この検出結果を記録部34cに出力してもよい。 The operation history detection unit 326 detects the content of the operation received by the operation unit 402 of the endoscope 400, and outputs the detection result to the recording unit 34c. Specifically, when the magnifying switch is operated from the operation unit 402 of the endoscope 400, the operation history detection unit 326 detects the operation content and outputs the detection result to the recording unit 34c. The operation history detection unit 326 may detect the operation content of the treatment tool inserted into the subject U4 via the endoscope 400 and output the detection result to the recording unit 34c.
 記録部34cは、揮発性メモリ、不揮発性メモリ、及び記録媒体等を用いて構成される。記録部34cは、上述した実施の形態3に係る記録部34bの構成に加えて、操作履歴記録部346をさらに備える。 The recording unit 34c is configured using a volatile memory, a non-volatile memory, a recording medium, and the like. The recording unit 34c further includes an operation history recording unit 346 in addition to the configuration of the recording unit 34b according to the third embodiment described above.
 操作履歴記録部346は、操作履歴検出部326から入力された内視鏡400の操作部402に対する操作の履歴を記録する。 The operation history recording unit 346 records the operation history for the operation unit 402 of the endoscope 400 input from the operation history detection unit 326.
 〔内視鏡システムの処理〕
 次に、内視鏡システム300が実行する処理について説明する。図15は、実施の形態4に係る内視鏡システムが実行する処理の概要を示すフローチャートである。
[Processing of endoscopic system]
Next, the process executed by the endoscope system 300 will be described. FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
 図15に示すように、まず、制御部32cは、撮像部401が生成した画像データ、視線検出部510が生成した視線データ、音声入力部520が生成した音声データ、及び操作履歴検出部326が検出した操作履歴の各々を時間計測部33によって計測された時間と対応付けて画像データ記録部345、視線データ記録部341c、音声データ記録部342、及び操作履歴記録部346に記録する(ステップS401)。ステップS401の後、内視鏡システム300は、後述するステップS402へ移行する。 As shown in FIG. 15, first, the control unit 32c records the image data generated by the imaging unit 401, the line-of-sight data generated by the line-of-sight detection unit 510, the voice data generated by the voice input unit 520, and the operation history detected by the operation history detection unit 326 in the image data recording unit 345, the line-of-sight data recording unit 341c, the voice data recording unit 342, and the operation history recording unit 346, respectively, each in association with the time measured by the time measurement unit 33 (step S401). After step S401, the endoscope system 300 proceeds to step S402 described later.
 ステップS402~ステップS404は、上述した図12のステップS302~ステップS304それぞれに対応する。ステップS404の後、内視鏡システム300は、ステップS405へ移行する。 Steps S402 to S404 correspond to each of steps S302 to S304 in FIG. 12 described above. After step S404, the endoscope system 300 proceeds to step S405.
 ステップS405において、領域設定部12は、重要語句が発声された時間に対応する観察データを抽出し、抽出した観察データに基づいて、画像データに対象領域を設定する。具体的には、領域設定部12は、重要語句が発声された時間に対応する観察データとして、画像データ記録部345に記録されている画像データにおいて連続するフレーム間の相関関係を表す特徴量を算出する。そして、領域設定部12は、連続するフレーム間の相関関係が大きい(類似度が高い)と判定した場合、利用者U3がこのフレームを注視していると判定し、このフレームを対象領域に設定する。また、領域設定部12は、画像データ記録部345に記録されている画像データにおいて連続するフレーム間における移動ベクトルを算出してもよい。この場合、領域設定部12は、移動ベクトルが小さい(画像間の移動量が小さい)と判定した場合、利用者U3がこのフレームを注視していると判定し、このフレームを対象領域に設定する。 In step S405, the area setting unit 12 extracts the observation data corresponding to the time at which the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 calculates, as the observation data corresponding to the time at which the important phrase was uttered, a feature amount representing the correlation between consecutive frames of the image data recorded in the image data recording unit 345. When the area setting unit 12 determines that the correlation between consecutive frames is large (the similarity is high), it determines that the user U3 is gazing at that frame and sets the frame as the target area. Alternatively, the area setting unit 12 may calculate a motion vector between consecutive frames of the image data recorded in the image data recording unit 345. In this case, when the area setting unit 12 determines that the motion vector is small (the amount of movement between the images is small), it determines that the user U3 is gazing at that frame and sets the frame as the target area.
 続いて、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを生成する(ステップS406)。さらに、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを記録部34cに出力し、内視鏡システム300は、本処理を終了する。 Subsequently, the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S406). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34c, and the endoscope system 300 ends this process.
 以上説明した実施の形態4によれば、利用者U3は、内視鏡システム300を用いて被検体U4の体内を内視鏡観察しながら録画した画像データに対して、病変等の対象領域を設定し、ラベル付き画像データを生成することができる。このとき、内視鏡システム300によれば、利用者U3の音声により対象領域を設定できるため、利用者U3が観察中にキーボードやポインティングデバイス等を見ることにより、観察が中断されることがないため、観察効率を向上させることができる。 According to the fourth embodiment described above, the user U3 can set a target area such as a lesion in the image data recorded while endoscopically observing the inside of the subject U4 using the endoscope system 300, and can generate labeled image data. Moreover, according to the endoscope system 300, the target area can be set by the voice of the user U3, so the observation is not interrupted by the user U3 looking at a keyboard, a pointing device, or the like during observation, and the observation efficiency can therefore be improved.
 なお、上述した実施の形態4では、領域設定部12は、連続するフレーム間の相関関係を表す特徴量や、連続するフレーム間における移動ベクトルに基づいて、対象領域を抽出したがこれに限られない。領域設定部12は、ウェアラブルデバイス500が生成した視線データを用いて、画像データに対象領域を設定してもよい。具体的には、領域設定部12は、実施の形態2と同様に、視線データを用いて利用者U3が観察している画像内に注視点を設定し、この注視点を含むように病変等を表す対象領域を設定してもよい。 In the fourth embodiment described above, the area setting unit 12 extracts the target area based on a feature amount representing the correlation between consecutive frames or on a motion vector between consecutive frames, but the present invention is not limited to this. The area setting unit 12 may set the target area in the image data using the line-of-sight data generated by the wearable device 500. Specifically, as in the second embodiment, the area setting unit 12 may set a gazing point in the image observed by the user U3 using the line-of-sight data, and set a target area representing a lesion or the like so as to include the gazing point.
 また、実施の形態4では、内視鏡システムであったが、例えばカプセル型の内視鏡、被検体を撮像するビデオマイクロスコープ、撮像機能を有する携帯電話、及び撮像機能を有するタブレット型端末であっても適用することができる。 Although the fourth embodiment has been described as an endoscope system, the present invention can also be applied to, for example, a capsule endoscope, a video microscope that images a subject, a mobile phone having an imaging function, and a tablet terminal having an imaging function.
 また、実施の形態4では、軟性の内視鏡を備えた内視鏡システムであったが、硬性の内視鏡を備えた内視鏡システム、工業用の内視鏡を備えた内視鏡システムであっても適用することができる。 Although the fourth embodiment has been described as an endoscope system including a flexible endoscope, the present invention can also be applied to an endoscope system including a rigid endoscope and to an endoscope system including an industrial endoscope.
 また、実施の形態4では、被検体に挿入される内視鏡を備えた内視鏡システムであったが、副鼻腔内視鏡、及び電気メスや検査プローブ等の内視鏡システムであっても適用することができる。 Although the fourth embodiment has been described as an endoscope system including an endoscope inserted into a subject, the present invention can also be applied to a sinus endoscope and to endoscope systems that use an electric scalpel, an examination probe, or the like.
(その他の実施の形態)
 上述した実施の形態1~4に開示されている複数の構成要素を適宜組み合わせることによって、種々の発明を形成することができる。例えば、上述した実施の形態1~4に記載した全構成要素からいくつかの構成要素を削除してもよい。さらに、上述した実施の形態1~4で説明した構成要素を適宜組み合わせてもよい。
(Other embodiments)
Various inventions can be formed by appropriately combining the plurality of components disclosed in the above-described embodiments 1 to 4. For example, some components may be deleted from all the components described in the above-described embodiments 1 to 4. Further, the components described in the above-described embodiments 1 to 4 may be appropriately combined.
 また、実施の形態1~4において、上述してきた「部」は、「手段」や「回路」などに読み替えることができる。例えば、制御部は、制御手段や制御回路に読み替えることができる。 Further, in the first to fourth embodiments, the above-mentioned "part" can be read as "means" or "circuit". For example, the control unit can be read as a control means or a control circuit.
 また、実施の形態1~4に係る情報処理装置に実行させるプログラムは、インストール可能な形式、又は実行可能な形式のファイルデータでCD-ROM、フレキシブルディスク(FD)、CD-R、DVD(Digital Versatile Disk)、USB媒体、フラッシュメモリ等のコンピュータで読み取り可能な記録媒体に記録されて提供される。 The programs to be executed by the information processing apparatuses according to the first to fourth embodiments are provided as file data in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disk), a USB medium, or a flash memory.
 また、実施の形態1~4に係る情報処理装置に実行させるプログラムは、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。さらに、実施の形態1~4に係る情報処理装置に実行させるプログラムをインターネット等のネットワーク経由で提供、又は配布するようにしてもよい。 Further, the program to be executed by the information processing apparatus according to the first to fourth embodiments may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program to be executed by the information processing apparatus according to the first to fourth embodiments may be provided or distributed via a network such as the Internet.
 また、実施の形態1~4では、伝送ケーブルを経由して各種機器から信号を送信していたが、例えば有線である必要はなく、無線であってもよい。この場合、所定の無線通信規格(例えばWi-Fi(登録商標)やBluetooth(登録商標))に従って、各機器から信号を送信するようにすればよい。もちろん、他の無線通信規格に従って無線通信を行ってもよい。 Further, in the first to fourth embodiments, signals are transmitted from various devices via a transmission cable, but for example, it does not have to be wired and may be wireless. In this case, signals may be transmitted from each device in accordance with a predetermined wireless communication standard (for example, Wi-Fi (registered trademark) or Bluetooth (registered trademark)). Of course, wireless communication may be performed according to other wireless communication standards.
 なお、本明細書におけるフローチャートの説明では、「まず」、「その後」、「続いて」等の表現を用いてステップ間の処理の前後関係を明示していたが、本発明を実施するために必要な処理の順序は、それらの表現によって一意的に定められるわけではない。即ち、本明細書で記載したフローチャートにおける処理の順序は、矛盾のない範囲で変更することができる。 In the descriptions of the flowcharts in the present specification, expressions such as "first", "then", and "subsequently" are used to clarify the order of processing between steps; however, the order of processing required to carry out the present invention is not uniquely determined by those expressions. That is, the order of processing in the flowcharts described in the present specification can be changed as long as no contradiction arises.
 以上、本願の実施の形態のいくつかを図面に基づいて詳細に説明したが、これらは例示であり、本発明の開示の欄に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 Although some of the embodiments of the present application have been described above in detail with reference to the drawings, these are merely examples, and the present invention can be carried out in other forms to which various modifications and improvements have been applied based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.
 1 情報処理システム
 10,1a,1b,1c 情報処理装置
 11 語句抽出部
 12 領域設定部
 13 生成部
 14,34,34b,34c 記録部
 15 表示制御部
 20 表示部
 30 観察情報検出部
 31,520 音声入力部
 32,32b,32c 制御部
 33 時間計測部
 35 操作部
 100 顕微鏡システム
 111 変換部
 112 抽出部
 200 顕微鏡
 201 筐体部
 202 ステージ
 203 対物レンズ
 204 レボルバ
 205 撮像部
 206 接眼部
 207 操作部
 208 位置検出部
 209 倍率検出部
 401 撮像部
 300 内視鏡システム
 321 観察情報検出制御部
 321c 視線データ検出制御部
 322 音声入力制御部
 324 撮影制御部
 325 倍率算出部
 326 操作履歴検出部
 341 観察データ記録部
 341c 視線データ記録部
 342 音声データ記録部
 343,345 画像データ記録部
 344 プログラム記録部
 345 画像データ記録部
 346 操作履歴記録部
 400 内視鏡
 402 操作部
 500 ウェアラブルデバイス
 510 視線検出部
 600 入力部

 
1 Information processing system
10, 1a, 1b, 1c Information processing device
11 Phrase extraction unit
12 Area setting unit
13 Generation unit
14, 34, 34b, 34c Recording unit
15 Display control unit
20 Display unit
30 Observation information detection unit
31, 520 Voice input unit
32, 32b, 32c Control unit
33 Time measurement unit
35 Operation unit
100 Microscope system
111 Conversion unit
112 Extraction unit
200 Microscope
201 Housing
202 Stage
203 Objective lens
204 Revolver
205 Imaging unit
206 Eyepiece unit
207 Operation unit
208 Position detection unit
209 Magnification detection unit
401 Imaging unit
300 Endoscope system
321 Observation information detection control unit
321c Line-of-sight data detection control unit
322 Voice input control unit
324 Imaging control unit
325 Magnification calculation unit
326 Operation history detection unit
341 Observation data recording unit
341c Line-of-sight data recording unit
342 Voice data recording unit
343, 345 Image data recording unit
344 Program recording unit
345 Image data recording unit
346 Operation history recording unit
400 Endoscope
402 Operation unit
500 Wearable device
510 Line-of-sight detection unit
600 Input unit

Claims (14)

  1.  画像データを観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出する語句抽出部と、
     前記画像データに対する観察態様を記録した観察データから、前記所定の語句が発声された時間に対応する前記観察データを抽出し、抽出した前記観察データに基づいて、前記画像データに対象領域を設定する領域設定部と、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成する生成部と、
     を備える情報処理装置。
    An information processing device comprising:
    a phrase extraction unit that extracts a predetermined phrase from voice data in which a voice uttered by a user while observing image data is recorded;
    an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time when the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data; and
    a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
  2.  前記語句抽出部は、
     前記音声データをテキストデータに変換する変換部と、
     前記テキストデータから記録部に記録されている重要語句を前記所定の語句として抽出する抽出部と、
     を有する請求項1に記載の情報処理装置。
    The information processing device according to claim 1, wherein the phrase extraction unit includes:
    a conversion unit that converts the voice data into text data; and
    an extraction unit that extracts, from the text data, an important phrase recorded in a recording unit as the predetermined phrase.
  3.  前記抽出部は、前記重要語句を記憶する重要語句記憶部を備え、
     前記重要語句記憶部は、前記重要語句に所定の属性を付与して記憶しており、
     前記抽出部は、前記テキストデータから各属性に含まれる1つ以上の前記重要語句を前記所定の語句として抽出する請求項2に記載の情報処理装置。
    The information processing device according to claim 2, wherein the extraction unit includes an important phrase storage unit that stores the important phrases,
    the important phrase storage unit stores the important phrases with predetermined attributes assigned to them, and
    the extraction unit extracts, from the text data, one or more of the important phrases belonging to each attribute as the predetermined phrase.
  4.  前記重要語句の属性は、
     病変の部位又は由来を表す第1の属性と、
     病変の状態を表す第2の属性と、
     を含む請求項3に記載の情報処理装置。
    The information processing device according to claim 3, wherein the attributes of the important phrases include:
    a first attribute representing the site or origin of a lesion; and
    a second attribute representing the state of the lesion.
  5.  前記テキストデータは、1つの文節として発話されたテキストデータであり、
     前記抽出部は、前記テキストデータから1つ以上の前記重要語句を抽出し、前記対象領域に対応するラベルが1つ以上の属性に帰属する前記重要語句の組で表される請求項4に記載の情報処理装置。
    The information processing device according to claim 4, wherein the text data is text data uttered as one phrase, and
    the extraction unit extracts one or more of the important phrases from the text data, and a label corresponding to the target area is represented by a set of the important phrases belonging to one or more of the attributes.
  6.  前記領域設定部は、観察倍率が大きい前記観察データを優先的に抽出し、
     前記生成部は、前記観察倍率が大きいほど高い重要度を前記ラベル付き画像データに付与する請求項1~5のいずれか1つに記載の情報処理装置。
    The information processing device according to any one of claims 1 to 5, wherein the area setting unit preferentially extracts the observation data having a larger observation magnification, and
    the generation unit assigns a higher importance to the labeled image data as the observation magnification is larger.
  7.  前記領域設定部は、観察領域の移動速度が小さい前記観察データを優先的に抽出し、
     前記生成部は、前記観察領域の移動速度が小さいほど高い重要度を前記ラベル付き画像データに付与する請求項1~5のいずれか1つに記載の情報処理装置。
    The information processing device according to any one of claims 1 to 5, wherein the area setting unit preferentially extracts the observation data in which the moving speed of the observation region is smaller, and
    the generation unit assigns a higher importance to the labeled image data as the moving speed of the observation region is smaller.
  8.  前記領域設定部は、観察領域が所定の領域内に停留する時間が長い前記観察データを優先的に抽出し、
     前記生成部は、前記観察領域が所定の領域内に停留する時間が長いほど高い重要度を前記ラベル付き画像データに付与する請求項1~5のいずれか1つに記載の情報処理装置。
    The information processing device according to any one of claims 1 to 5, wherein the area setting unit preferentially extracts the observation data in which the observation region stays within a predetermined region for a longer time, and
    the generation unit assigns a higher importance to the labeled image data as the time during which the observation region stays within the predetermined region is longer.
  9.  前記観察データは、前記利用者の視線を検出した視線データを含む請求項1~8のいずれか1つに記載の情報処理装置。 The information processing device according to any one of claims 1 to 8, wherein the observation data includes line-of-sight data for detecting the line-of-sight of the user.
  10.  前記観察データは、前記利用者の観察視野を記録した視野データを含む請求項1~9のいずれか1つに記載の情報処理装置。 The information processing device according to any one of claims 1 to 9, wherein the observation data includes visual field data recording the observation visual field of the user.
  11.  前記観察データは、ポインティングデバイスの位置情報を含む請求項1~10のいずれか1つに記載の情報処理装置。 The information processing device according to any one of claims 1 to 10, wherein the observation data includes position information of a pointing device.
  12.  フレーム画像を観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出する語句抽出部と、
     前記所定の語句が発声された時間に対応する前記フレーム画像を抽出し、抽出した前記フレーム画像を対象領域に設定する領域設定部と、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成する生成部と、
     を備える情報処理装置。
    An information processing device comprising:
    a phrase extraction unit that extracts a predetermined phrase from voice data in which a voice uttered by a user while observing frame images is recorded;
    an area setting unit that extracts the frame image corresponding to the time when the predetermined phrase was uttered and sets the extracted frame image as a target area; and
    a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
  13.  画像データを観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出する語句抽出部と、
     前記画像データに対する観察態様を記録した観察データから、前記所定の語句が発声された時間に対応する前記観察データを抽出し、抽出した前記観察データに基づいて、前記画像データに対象領域を設定する領域設定部と、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成する生成部と、
     前記ラベル付き画像データを用いた機械学習を行い、画像群から前記所定の語句に対応する領域を検出する学習済みモデルを生成するモデル生成部と、
     を備える学習装置。
    A learning device comprising:
    a phrase extraction unit that extracts a predetermined phrase from voice data in which a voice uttered by a user while observing image data is recorded;
    an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time when the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data;
    a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area; and
    a model generation unit that performs machine learning using the labeled image data and generates a trained model that detects a region corresponding to the predetermined phrase from a group of images.
  14.  学習用画像データを観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出し、
     前記学習用画像データに対する観察態様を記録した観察データから、前記所定の語句が発声された時間に対応する前記観察データを抽出し、抽出した前記観察データに基づいて、前記学習用画像データに対象領域を設定し、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成し、
     前記ラベル付き画像データを用いた機械学習により生成され、判定用画像データが入力された際に、前記所定の語句に関連付けられた前記対象領域と推定する領域を抽出し、該領域が前記対象領域である尤度を出力するよう、
     コンピュータを機能させるための学習済みモデル。
    A trained model for causing a computer to function so as to, when image data for determination is input, extract a region estimated to be the target area associated with a predetermined phrase and output a likelihood that the region is the target area,
    the trained model being generated by machine learning using labeled image data in which the predetermined phrase is associated with the target area, wherein
    the predetermined phrase is extracted from voice data in which a voice uttered by a user while observing learning image data is recorded,
    the observation data corresponding to the time when the predetermined phrase was uttered is extracted from observation data in which an observation mode for the learning image data is recorded, and the target area is set in the learning image data based on the extracted observation data, and
    the labeled image data is generated by associating the predetermined phrase with the target area.
PCT/JP2020/031904 2020-08-24 2020-08-24 Information processing device, learning device, and learned model WO2022044095A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/031904 WO2022044095A1 (en) 2020-08-24 2020-08-24 Information processing device, learning device, and learned model

Publications (1)

Publication Number Publication Date
WO2022044095A1 true WO2022044095A1 (en) 2022-03-03

Family

ID=80354819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/031904 WO2022044095A1 (en) 2020-08-24 2020-08-24 Information processing device, learning device, and learned model

Country Status (1)

Country Link
WO (1) WO2022044095A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019200651A (en) * 2018-05-17 2019-11-21 オリンパス株式会社 Information processor, method for processing information, and program
JP2019533847A (en) * 2016-08-12 2019-11-21 ヴェリリー ライフ サイエンシズ エルエルシー Advanced pathological diagnosis

Similar Documents

Publication Publication Date Title
JP7064952B2 (en) Information processing equipment, information processing methods and programs
JP7171985B2 (en) Information processing device, information processing method, and program
JP5455550B2 (en) Processor for electronic endoscope
US10754425B2 (en) Information processing apparatus, information processing method, and non-transitory computer readable recording medium
JP2004181229A (en) System and method for supporting remote operation
JP2007293818A (en) Image-recording device, image-recording method, and image-recording program
JPWO2018168261A1 (en) CONTROL DEVICE, CONTROL METHOD, AND PROGRAM
JP2017213097A (en) Image processing device, image processing method, and program
JP2015093147A (en) Medical system
JP2011215856A (en) Information processing system and information processing method
JP7141938B2 (en) Voice recognition input device, voice recognition input program and medical imaging system
WO2022044095A1 (en) Information processing device, learning device, and learned model
JP2018028562A (en) Medical image display device and image interpretation report generation assistance device
EP3603476A1 (en) Medical system control device, medical system control method, and medical system
JP2019202131A (en) Information processing apparatus, information processing method, and program
JP2018047067A (en) Image processing program, image processing method, and image processing device
US11883120B2 (en) Medical observation system, medical signal processing device, and medical signal processing device driving method
WO2022070423A1 (en) Information processing device, method for operating information processing device, and program
JP2006221583A (en) Medical treatment support system
WO2018138834A1 (en) Information recording system, information recording device, and information recording method
EP3731073A1 (en) Information processing device, information processing method, and program
KR20160122869A (en) Apparatus for being possible language converting using robot arm
US10971174B2 (en) Information processing apparatus, information processing method, and non-transitory computer readable recording medium
WO2021144970A1 (en) Information processing device, information processing method, and program
JP6095875B2 (en) MEDICAL DIAGNOSIS DEVICE, MEDICAL DIAGNOSIS DEVICE OPERATING METHOD, AND MEDICAL DIAGNOSIS DEVICE OPERATING PROGRAM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20951350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20951350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP