WO2022044095A1 - Information processing device, learning device, and learned model

Information processing device, learning device, and learned model

Info

Publication number
WO2022044095A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
observation
image data
phrase
Prior art date
Application number
PCT/JP2020/031904
Other languages
French (fr)
Japanese (ja)
Inventor
伸之 渡辺
英敏 西村
一仁 堀内
善興 金子
Original Assignee
オリンパス株式会社
Priority date
Filing date
Publication date
Application filed by オリンパス株式会社
Priority to PCT/JP2020/031904
Publication of WO2022044095A1

Classifications

    • G — PHYSICS
    • G16 — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H — HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 — ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 — ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing

Definitions

  • the present invention relates to an information processing device, a learning device, and a trained model.
  • pathological diagnosis is performed in which a pathologist observes a specimen collected from a patient and makes a diagnosis.
  • a pathologist may create a diagnostic report by attaching a diagnosis result (information such as the presence or absence, type, and position of a lesion) to an image of the specimen.
  • Patent Document 1 discloses a learning data generation support device that uses, as correct-answer data for machine learning, image data whose lesion features described in a diagnostic report match the lesion features detected by image processing.
  • when generating learning data with the technique of Patent Document 1, however, a pathologist needs to create a diagnostic report. Specifically, while observing the image of the specimen, the pathologist has to input the target area such as a lesion with a keyboard or a pointing device in order to associate the lesion with the corresponding image and position, which places a heavy burden on the pathologist.
  • the present invention has been made in view of the above, and its purpose is to provide an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area on image data and easily generate learning data.
  • the information processing apparatus includes a phrase extraction unit that extracts a predetermined phrase from voice data recording the voice uttered by the user while observing image data, an area setting unit that extracts, from observation data recording the observation mode for the image data, the observation data corresponding to the time when the predetermined phrase was uttered and sets a target area in the image data based on the extracted observation data, and a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
  • the phrase extraction unit has a conversion unit that converts the voice data into text data, and an extraction unit that extracts important phrases recorded in the recording unit from the text data as the predetermined phrases.
  • the extraction unit includes an important-phrase storage unit that stores the important phrases, each with a predetermined attribute assigned to it, and the extraction unit extracts, from the text data, one or more of the important phrases included in each attribute as the predetermined phrases.
  • the attributes of the important phrase include a first attribute representing the site or origin of the lesion and a second attribute representing the state of the lesion.
  • the text data is text data corresponding to speech uttered as a single phrase
  • the extraction unit extracts one or more of the important words and phrases from the text data.
  • the label corresponding to the target area is represented by the set of important words and phrases belonging to one or more attributes.
  • the area setting unit preferentially extracts observation data having a larger observation magnification, and the generation unit assigns a higher importance to the labeled image data as the observation magnification increases.
  • the area setting unit preferentially extracts observation data in which the moving speed of the observation area is small, and the generation unit assigns a higher importance to the labeled image data as the moving speed of the observation area decreases.
  • the area setting unit preferentially extracts observation data in which the observation area stays in a predetermined area for a long time, and the generation unit assigns a higher importance to the labeled image data the longer the observation area stays in the predetermined area.
  • the observation data includes the line-of-sight data obtained by detecting the line-of-sight of the user.
  • the observation data includes the visual field data in which the observation visual field of the user is recorded.
  • the observation data includes the position information of the pointing device.
  • the information processing device includes a phrase extraction unit that extracts a predetermined phrase from voice data recording a voice uttered by a user while observing frame images, an area setting unit that extracts the frame image corresponding to the time when the predetermined phrase was uttered and sets the extracted frame image as the target area, and a generation unit that generates labeled image data in which the predetermined phrase and the target area are associated with each other.
  • the learning device includes a phrase extraction unit that extracts a predetermined phrase from voice data recording the voice uttered by the user while observing image data, an area setting unit that extracts, from observation data recording the observation mode for the image data, the observation data corresponding to the time when the predetermined phrase was uttered and sets a target area in the image data based on the extracted observation data, a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area, and a model generation unit that generates, by machine learning using the labeled image data, a trained model for detecting a region corresponding to the predetermined phrase from a group of images.
  • the trained model is generated as follows: a predetermined phrase is extracted from voice data recording the voice uttered by the user while observing training image data; from observation data recording the observation mode for the training image data, the observation data corresponding to the time when the predetermined phrase was uttered is extracted; a target area is set in the training image data based on the extracted observation data; and labeled image data in which the predetermined phrase is associated with the target area is generated and used for machine learning. When determination image data is input, the trained model causes the computer to extract a region estimated to be the target area associated with the predetermined phrase and to output the likelihood that the region is the target area.
  • according to the present invention, it is possible to provide an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area for image data and easily generate learning data.
  • FIG. 1 is a block diagram showing a functional configuration of the information processing system according to the first embodiment.
  • FIG. 2 is a flowchart illustrating a process executed by the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram showing voice data.
  • FIG. 4 is a diagram showing observation data.
  • FIG. 5 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment.
  • FIG. 6 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment.
  • FIG. 7 is a block diagram showing a functional configuration of the information processing apparatus according to the second embodiment.
  • FIG. 8 is a flowchart showing an outline of the processing executed by the information processing apparatus.
  • FIG. 9 is a diagram showing words and phrases extracted by the word and phrase extraction unit.
  • FIG. 10 is a diagram showing how the area setting unit sets the target area.
  • FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment.
  • FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment.
  • FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment.
  • FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment.
  • FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
  • FIG. 1 is a block diagram showing a functional configuration of the information processing system according to the first embodiment.
  • the information processing system 1 shown in FIG. 1 includes an information processing device 10 that performs various processing on line-of-sight data, voice data, and image data input from the outside, and a display unit 20 that displays various data output from the information processing device 10.
  • the information processing device 10 and the display unit 20 are bidirectionally connected by wireless or wired connection.
  • the information processing device 10 shown in FIG. 1 is realized by a program installed on, for example, a server or a personal computer, and receives various data via a network or various data acquired by external devices.
  • the information processing apparatus 10 includes a word extraction unit 11, an area setting unit 12, a generation unit 13, a recording unit 14, and a display control unit 15.
  • the phrase extraction unit 11 extracts a predetermined phrase from externally input voice data in which the voice uttered by the user while observing the image data is recorded.
  • the word / phrase extraction unit 11 includes a conversion unit 111 that converts voice data into text data, and an extraction unit 112 that extracts important words / phrases recorded in the recording unit 14 from the text data as predetermined words / phrases.
  • the conversion unit 111 converts the voice data into character information (text data) by performing a well-known text conversion process on the voice data, and outputs this character information to the extraction unit 112.
  • alternatively, the target area may be set in the image data while the data is still voice information, and the conversion into character information may be performed afterward.
  • the extraction unit 112 extracts the important phrases recorded in advance in the recording unit 14 from the text data converted by the conversion unit 111, and records in the recording unit 14 the time at which each important phrase was uttered in association with that phrase. Specifically, the phrase extraction unit 11 associates, with each frame of the voice data, the important phrases uttered in that frame and records them in the recording unit 14. The user's voice data input from the outside is generated by a voice input unit such as a microphone.
  • the phrase extraction unit 11 is configured by using a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a GPU (Graphics Processing Unit), and the like.
  • the phrase extraction unit 11 may handle the voice data as it is without converting the voice data into text data. In this case, the phrase extraction unit 11 does not have the conversion unit 111, and the extraction unit 112 extracts a predetermined phrase directly from the voice data.
  • the area setting unit 12 extracts the observation data corresponding to the time when the important phrase is uttered from the observation data in which the observation mode for the image data is recorded.
  • the area setting unit 12 may extract the observation data included in a predetermined period, or a predetermined number of frames, that includes the time when an important phrase is uttered (from the start to the end of the utterance). Further, the area setting unit 12 sets a target area in the image data based on the extracted observation data, and records the position information of the target area in the recording unit 14.
  • the area setting unit 12 may set the target area in the image data corresponding to one selected piece of observation data, or may set target areas in a plurality of image data corresponding to a plurality of selected pieces of observation data.
  • the area setting unit 12 is configured by using a CPU, FPGA, GPU and the like.
  • the area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time when the important phrase was uttered, selects one or more of them according to, for example, the observation magnification, sets the display area corresponding to the selected observation data as the target area, and records the position information of the target area in the recording unit 14.
  • the area setting unit 12 may set a part of the display area set by predetermined image processing or the like as the target area.
  • the field of view displayed on the display unit 20 matches a field of view within the image data, so the relative positional relationship of the observation field of view with respect to the absolute coordinates of the image does not change.
  • when the usage mode uses an optical microscope and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, the magnification of the objective lens and the temporal change of the stage position are recorded in the recording unit 14 as observation data in synchronization with the audio data.
  • in that case, the area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time when the important phrase was uttered, selects one or more of them according to, for example, the observation magnification, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14. Further, when the usage pattern uses an endoscope system and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, a feature amount representing the correlation between consecutive frames and the movement vector between consecutive frames are recorded in the recording unit 14 as observation data in synchronization with the audio data.
  • in that case, the area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time when the important phrase was uttered, selects one or more of them according to, for example, the movement vector, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14.
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area, and outputs the generated labeled image data to the recording unit 14 and the display control unit 15. Specifically, the generation unit 13 generates labeled image data in which the important words extracted by the word extraction unit 11 and the target area set by the area setting unit 12 are associated with each other.
  • the generation unit 13 is configured by using a CPU, FPGA, GPU and the like.
  • the recording unit 14 synchronously records the voice data input from the word extraction unit 11, the observation data input from the area setting unit 12, and the image data input from the generation unit 13. Further, the recording unit 14 records the important words / phrases input in advance and the labeled image data in which the important words / phrases input from the generation unit 13 are associated with the target area.
  • the recording unit 14 is also used for temporarily recording important words and phrases extracted by the phrase extraction unit 11, observation data extracted by the area setting unit 12, target areas set by the area setting unit 12, and the like. Further, the recording unit 14 records various programs executed by the information processing apparatus 10 and data being processed.
  • the recording unit 14 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
  • the display control unit 15 superimposes various information on the image corresponding to the externally input image data and outputs the result to the external display unit 20 for display.
  • the display control unit 15 is configured by using a CPU, FPGA, GPU and the like.
  • the word / phrase extraction unit 11, the area setting unit 12, the generation unit 13, and the display control unit 15 may be configured so that each function can be exhibited by using any one of the CPU, FPGA, and GPU.
  • the CPU, FPGA, and GPU may be combined and configured so that each function can be exhibited.
  • the display unit 20 displays an image corresponding to the image data input from the display control unit 15.
  • the display unit 20 is configured by using a display monitor such as an organic EL (Electroluminescence) or a liquid crystal display.
  • FIG. 2 is a flowchart illustrating a process executed by the information processing apparatus 10.
  • the information processing apparatus 10 acquires image data, observation data, and audio data input from the outside (step S101). These data are synchronized and are temporally associated with each other.
  • the image data is, for example, the entire image of a whole slide image (WSI)
  • the observation data is information representing a display area and an observation magnification in the entire image
  • the audio data is data that records the audio spoken by the user while observing.
  • the phrase extraction unit 11 extracts a keyword (predetermined phrase) from the voice data (step S102). Specifically, first, the conversion unit 111 converts the voice data into text data. Then, the extraction unit 112 extracts the important words and phrases recorded in the recording unit 14 from the text data as predetermined words and phrases.
  • FIG. 3 is a diagram showing voice data.
  • as shown in FIG. 3, the phrase extraction unit 11 extracts utterance contents X1, X2, and X3 that match important phrases from the voice data converted into text data, and records them in the recording unit 14 in association with their respective start times t11, t21, t31 and end times t12, t22, t32.
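  • As a rough illustration of this keyword-extraction step, the following Python sketch matches time-stamped speech-recognition output against a hypothetical important-phrase dictionary and keeps the utterance intervals; the class names and dictionary entries are assumptions for illustration, not the embodiment's actual interfaces.

```python
# Minimal sketch, not the embodiment's implementation: match time-stamped
# speech-recognition output against a hypothetical important-phrase dictionary
# and keep the utterance intervals (start/end times) of the matches.
from dataclasses import dataclass

IMPORTANT_PHRASES = {"polyp", "ulcer", "stomach"}  # assumed example entries

@dataclass
class Utterance:
    text: str
    start: float  # seconds from the start of observation (e.g. t11)
    end: float    # e.g. t12

def extract_keywords(transcript: list[Utterance]) -> list[Utterance]:
    """Return only the utterances whose text matches a registered important phrase."""
    return [u for u in transcript if u.text.lower() in IMPORTANT_PHRASES]

transcript = [Utterance("polyp", 12.4, 13.1), Utterance("maybe", 13.2, 13.5)]
keywords = extract_keywords(transcript)  # -> [Utterance(text='polyp', start=12.4, end=13.1)]
```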
  • in step S103, the area setting unit 12 extracts the observation data corresponding to the time when the important phrase was uttered, and sets the target area in the image data based on the extracted observation data.
  • the recording unit 14 records the time and the coordinate information of two predetermined points in the image displayed on the display unit 20 at that time in association with each other.
  • the predetermined two points are not particularly limited, but are, for example, two points, the upper left corner and the lower right corner of the rectangular image displayed on the display unit 20, and represent the display area displayed on the display unit 20.
  • the observation magnification can be calculated from these two points.
  • for the utterance content X1 shown in FIG. 3, the area setting unit 12 extracts, as observation data, the observation magnifications calculated from the coordinate information recorded between the start time t11 and the end time t12. It then selects the largest value among the extracted observation magnifications and sets the display area (the image displayed on the display unit 20) corresponding to this observation magnification as the target area. Then, the area setting unit 12 records the set target area in the recording unit 14.
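  • The selection described above could look roughly like the following sketch, which filters observation records to the utterance interval and picks the most magnified display area; the record layout and the magnification estimate are assumptions, not the patent's specified data format.

```python
# Sketch under assumptions: each observation record holds a time stamp and the
# two corner coordinates (upper-left, lower-right) of the displayed region in
# whole-image coordinates; the magnification estimate and names are illustrative.
from dataclasses import dataclass

@dataclass
class ObservationRecord:
    t: float
    top_left: tuple[float, float]      # e.g. coordinate A11
    bottom_right: tuple[float, float]  # e.g. coordinate A12

def magnification(rec: ObservationRecord, screen_width_px: float = 1920.0) -> float:
    """The narrower the displayed part of the whole image, the higher the magnification."""
    region_width = rec.bottom_right[0] - rec.top_left[0]
    return screen_width_px / region_width

def select_target_area(records: list[ObservationRecord],
                       t_start: float, t_end: float) -> ObservationRecord:
    """Within [t_start, t_end] (e.g. t11..t12), pick the most magnified display area."""
    in_interval = [r for r in records if t_start <= r.t <= t_end]
    return max(in_interval, key=magnification)
```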
  • FIG. 4 is a diagram showing observation data.
  • the target area set by the area setting unit 12 is recorded in the recording unit 14 as the time and the coordinate information of two predetermined points representing the display area.
  • the time t13 and the coordinate information representing the display area at this time t13 are recorded in the recording unit 14.
  • the coordinate information is coordinate A11 as coordinate information 1 corresponding to the upper left corner of the image displayed on the display unit 20 at time t13, and coordinate A12 as coordinate information 2 corresponding to the lower right corner of this image.
  • the area of the image data that was observed at the highest magnification while the user was uttering the utterance content X1 is thereby set as the target area.
  • the time t23 and the coordinate information (A21, A22) representing the display area at this time t23 are recorded in the recording unit 14.
  • the time t33 and the coordinate information (A31, A32) representing the display area at this time t33 are recorded in the recording unit 14.
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S104). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 14, and the information processing apparatus 10 ends this processing.
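  • One possible record layout for such labeled image data is sketched below; the field names and the output file are illustrative assumptions, not a format specified by the embodiment.

```python
# Illustrative sketch only: one possible record layout for "labeled image data"
# linking an important phrase to the target-area coordinates.
from dataclasses import dataclass, asdict
import json

@dataclass
class LabeledImageData:
    image_id: str                      # identifier of the observed image (e.g. a WSI)
    label: str                         # the important phrase, e.g. "polyp"
    top_left: tuple[float, float]      # target-area corner in absolute image coordinates
    bottom_right: tuple[float, float]
    importance: float = 1.0            # optional weight (see the variations described below)

record = LabeledImageData("slide_001", "polyp", (1200.0, 800.0), (1800.0, 1300.0))
with open("labeled_data.jsonl", "a") as f:          # hypothetical output file
    f.write(json.dumps(asdict(record)) + "\n")
```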
  • since the area setting unit 12 sets the target area based on the observation data, the burden of setting the target area on the image data is reduced and the learning data can be generated easily.
  • in addition, since the target area can be set by the user's voice, the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during the observation, so the observation efficiency can be improved.
  • since the generation unit 13 records the labeled image data in the recording unit 14, the learning data used in machine learning such as deep learning can be easily acquired. By performing machine learning such as deep learning using the acquired labeled image data, it is possible to generate a trained model that, when determination image data such as an image of a specimen is input, extracts a region estimated to be the target area associated with the important phrase and outputs the likelihood that this region is the target area. The machine learning using the labeled image data generated by the information processing device 10 may be performed by the information processing device 10 itself, or the trained model may be generated by a computer different from the information processing device 10. Likewise, the information processing apparatus 10 may use the generated trained model to extract the target area for a determination image and output the likelihood, or this extraction and likelihood output may be performed by a computer different from the information processing apparatus 10.
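  • As a hedged sketch of how the labeled image data might feed such machine learning, the following PyTorch-style example trains a toy region classifier and outputs a likelihood via softmax; the network, input size, and two-class setup are assumptions, not the embodiment's architecture.

```python
# Hedged sketch only: a toy classifier over region crops taken from the labeled
# image data, returning the likelihood that a crop is the target area.
import torch
import torch.nn as nn

model = nn.Sequential(                     # toy region classifier (assumed architecture)
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128), nn.ReLU(),
    nn.Linear(128, 2),                     # {not target area, target area}
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(region_crops: torch.Tensor, labels: torch.Tensor) -> float:
    """region_crops: (N, 3, 64, 64) crops of target/non-target areas; labels: (N,)."""
    optimizer.zero_grad()
    loss = criterion(model(region_crops), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def likelihood(region_crop: torch.Tensor) -> float:
    """Probability that the crop is the target area associated with the phrase."""
    with torch.no_grad():
        return torch.softmax(model(region_crop.unsqueeze(0)), dim=1)[0, 1].item()
```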
  • in the above description, the area setting unit 12 extracts the observation data included between the start time t11 and the end time t12 for the utterance content X1, but the present invention is not limited to this. For example, the observation data may be extracted so as to additionally include a predetermined time before the start time t11, or a predetermined time after the end time t12.
  • likewise, the area setting unit 12 has been described as selecting the largest value from the extracted observation magnifications and setting the display area corresponding to that observation magnification as the target area, but the present invention is not limited to this. The area setting unit 12 may select the largest value among the extracted observation magnifications and set only a part of the corresponding display area as the target area; for example, it may set, as the target area, a part including a lesion area extracted from the display area by image processing.
  • the area setting unit 12 may preferentially extract the observation data having a large observation magnification, and the generation unit 13 may give higher importance to the labeled image data as the observation magnification is larger.
  • the area setting unit 12 extracts observation data having an observation magnification larger than the threshold value.
  • the generation unit 13 assigns a higher importance to the labeled image data as the observation magnification is larger, and records the data in the recording unit 14.
  • in this way, images can be automatically labeled so that an image observed by the user at an increased observation magnification becomes more important, which reduces the burden on the user when labeling images.
  • the area setting unit 12 may also preferentially extract observation data in which the moving speed of the observation area is low, and the generation unit 13 may give a higher importance to the labeled image data as the moving speed of the observation area decreases.
  • the area setting unit 12 extracts observation data in which the moving speed of the observation area is smaller than the threshold value.
  • the generation unit 13 assigns a higher importance to the labeled image data as the moving speed of the observation region is smaller, and records the data in the recording unit 14.
  • in this way, images can be automatically labeled so that an image the user gazes at without moving the observation area much becomes more important, which reduces the burden on the user when labeling images.
  • the area setting unit 12 may also preferentially extract observation data in which the observation area stays in a predetermined area for a long time, and the generation unit 13 may give a higher importance to the labeled image data the longer the observation area stays in the predetermined area.
  • the area setting unit 12 extracts observation data in which the observation area stays in a predetermined area for a time longer than the threshold value.
  • the generation unit 13 assigns a higher importance to the labeled image data and records it in the recording unit 14 as the observation region stays in the predetermined region for a longer time.
  • in this way, images can be automatically labeled so that an image in which the user's observation area stays in a predetermined region for a long time, that is, an image the user gazes at, becomes more important, which reduces the burden on the user when labeling images.
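  • The three importance criteria above (observation magnification, movement speed of the observation area, and dwell time) could be combined as in the following sketch; the formulas and thresholds are assumptions, since the embodiment does not specify them.

```python
# Rough sketch with assumed heuristics and thresholds (the embodiment does not
# specify formulas): combine observation magnification, movement speed of the
# observation area, and dwell time into a single importance weight.
def importance_weight(magnification: float,
                      move_speed_px_per_s: float,
                      dwell_time_s: float,
                      mag_thresh: float = 10.0,
                      speed_thresh: float = 50.0,
                      dwell_thresh: float = 2.0) -> float:
    """Higher magnification, slower movement, and longer dwell -> higher importance."""
    weight = 1.0
    if magnification > mag_thresh:                     # observed while zoomed in
        weight += magnification / mag_thresh
    if move_speed_px_per_s < speed_thresh:             # observation area barely moving
        weight += min(speed_thresh / max(move_speed_px_per_s, 1.0), 5.0)
    if dwell_time_s > dwell_thresh:                    # gaze stayed in the region
        weight += dwell_time_s / dwell_thresh
    return weight
```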
  • in the first embodiment, the observation data and the voice data are each input from the outside, whereas in the second embodiment, the observation data and the voice data are generated by the information processing apparatus itself.
  • the processing executed by the information processing apparatus according to the second embodiment will be described.
  • the same components as those of the information processing system 1 according to the first embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • FIG. 7 is a block diagram showing a functional configuration of the information processing apparatus according to the second embodiment.
  • the information processing apparatus 1a shown in FIGS. 5 to 7 includes a phrase extraction unit 11, an area setting unit 12, a generation unit 13, a display unit 20, an observation information detection unit 30, a voice input unit 31, a control unit 32, a time measuring unit 33, a recording unit 34, and an operation unit 35.
  • the observation information detection unit 30 is configured by using an LED light source that emits near-infrared light and an optical sensor (for example, a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge Coupled Device) sensor) that captures the pupil point and the corneal reflection point.
  • the observation information detection unit 30 is provided on the side surface of the housing of the information processing apparatus 1a in which the user U1 can visually recognize the display unit 20 (see FIGS. 5 and 6). Under the control of the control unit 32, the observation information detection unit 30 generates line-of-sight data that detects the line-of-sight of the user U1 with respect to the image displayed by the display unit 20, and outputs this line-of-sight data to the control unit 32 as observation data.
  • more specifically, under the control of the control unit 32, the observation information detection unit 30 irradiates the cornea of the user U1 with near-infrared light from the LED light source or the like, and the optical sensor generates line-of-sight data by imaging the pupil point and the reflection point on the cornea of the user U1. Then, under the control of the control unit 32, the observation information detection unit 30 detects the line of sight of the user U1 from the pattern of the pupil point and the reflection point, based on the result of analyzing the data generated by the optical sensor by image processing or the like.
  • alternatively, the observation information detection unit 30 may simply use an optical sensor and generate line-of-sight data by detecting the pupil of the user U1 with well-known pattern matching.
  • the line-of-sight data may be generated by detecting the line-of-sight of the user U1 using other sensors or other well-known techniques.
  • the voice input unit 31 is configured by using a microphone to which voice is input and a voice codec that converts the voice received by the microphone into digital voice data, amplifies it, and outputs it to the control unit 32.
  • the voice input unit 31 generates voice data by receiving the voice input of the user U1 under the control of the control unit 32, and outputs the voice data to the control unit 32.
  • the voice input unit 31 may be provided with a speaker or the like capable of outputting voice, and may be provided with a voice output function.
  • the control unit 32 is configured by using a CPU, FPGA, GPU and the like, and controls the observation information detection unit 30, the voice input unit 31, and the display unit 20.
  • the control unit 32 includes an observation information detection control unit 321, a voice input control unit 322, and a display control unit 323.
  • the observation information detection control unit 321 controls the observation information detection unit 30. Specifically, the observation information detection control unit 321 causes the observation information detection unit 30 to irradiate the user U1 with near-infrared light at predetermined timings, to image the pupil of the user U1, and to generate line-of-sight data. Further, the observation information detection control unit 321 performs various image processing on the line-of-sight data input from the observation information detection unit 30 and outputs the result to the recording unit 34.
  • the voice input control unit 322 controls the voice input unit 31, performs various processes such as gain up and noise reduction processing on the voice data input from the voice input unit 31, and outputs the voice data to the recording unit 34.
  • the display control unit 323 controls the display mode of the display unit 20.
  • the display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded in the recording unit 34.
  • the time measurement unit 33 is configured by using a timer, a clock generator, or the like, and adds time information to the line-of-sight data generated by the observation information detection unit 30, the voice data generated by the voice input unit 31, and the like.
  • the recording unit 34 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like, and records various information related to the information processing apparatus 1a.
  • the recording unit 34 includes an observation data recording unit 341, an audio data recording unit 342, an image data recording unit 343, and a program recording unit 344.
  • the observation data recording unit 341 records the line-of-sight data input from the observation information detection control unit 321 as observation data, and outputs the observation data to the area setting unit 12.
  • the voice data recording unit 342 records the voice data input from the voice input control unit 322 and outputs the voice data to the phrase extraction unit 11.
  • the image data recording unit 343 records a plurality of image data.
  • the plurality of image data are data input from outside the information processing device 1a, or data captured by an external image pickup device and imported via a recording medium.
  • the program recording unit 344 records various programs executed by the information processing apparatus 1a, data used during execution of various programs (for example, dictionary information and text conversion dictionary information), and processing data during execution of various programs.
  • the operation unit 35 is configured by using a mouse, a keyboard, a touch panel, various switches, and the like, receives input of the operation of the user U1, and outputs the operation content that has received the input to the control unit 32.
  • FIG. 8 is a flowchart showing an outline of the processing executed by the information processing apparatus.
  • the display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded by the image data recording unit 343 (step S201).
  • the display control unit 323 causes the display unit 20 to display an image corresponding to the image data selected according to the operation of the operation unit 35.
  • the control unit 32 associates each of the observation data generated by the observation information detection unit 30 and the voice data generated by the voice input unit 31 with the time measured by the time measurement unit 33, and records them in the observation data recording unit 341 and the voice data recording unit 342, respectively (step S202).
  • the conversion unit 111 converts the voice data recorded by the voice data recording unit 342 into character information (text data) (step S203). In addition, this step may be performed after S206 described later.
  • when an instruction signal for ending the observation of the image displayed on the display unit 20 is input from the operation unit 35 (step S204: Yes), the information processing apparatus 1a proceeds to step S205 described later. On the other hand, when no such instruction signal is input from the operation unit 35 (step S204: No), the information processing apparatus 1a returns to step S202.
  • the phrase extraction unit 11 then extracts a keyword (predetermined phrase) from the voice data. Specifically, the extraction unit 112 extracts important phrases, as the predetermined phrases, from the text data into which the conversion unit 111 converted the voice data.
  • the extraction unit 112 includes an important word / phrase storage unit 113 for storing important words / phrases, and the important word / phrase storage unit 113 assigns attributes to important words / phrases and stores them.
  • the extraction unit 112 extracts one or more important words / phrases included in each attribute (word / phrase class) from the text data spoken as one phrase.
  • FIG. 9 is a diagram showing attributes for classifying important words and phrases.
  • as shown in FIG. 9, the phrase extraction unit 11 extracts an important phrase indicating the site or origin of a lesion or the like as phrase class 1 (first attribute), and an important phrase representing the state of the lesion or the like as phrase class 2 (second attribute).
  • the set of important words and phrases extracted by the word and phrase extraction unit 11 is recorded in the recording unit 34 in association with the time when these sets of important words and phrases are uttered.
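  • A minimal sketch of attribute-based extraction, assuming a hypothetical two-class dictionary (phrase class 1 = site/origin, phrase class 2 = state), is shown below; the vocabulary entries and function names are illustrative only.

```python
# Illustrative only: a hypothetical two-attribute dictionary and extraction of
# important phrases, grouped by attribute, from a single spoken phrase.
PHRASE_CLASSES = {
    "class1_site": {"stomach", "colon", "duodenum"},     # assumed example entries
    "class2_state": {"polyp", "ulcer", "inflammation"},
}

def extract_by_attribute(phrase_text: str) -> dict[str, list[str]]:
    """Return the important phrases found in the text, grouped by attribute."""
    words = phrase_text.lower().split()
    return {attr: [w for w in words if w in vocab]
            for attr, vocab in PHRASE_CLASSES.items()}

# e.g. a single phrase "there is a polyp in the stomach"
label_set = extract_by_attribute("there is a polyp in the stomach")
# -> {'class1_site': ['stomach'], 'class2_state': ['polyp']}
```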
  • in step S206, the area setting unit 12 extracts the observation data corresponding to the time when the set of important phrases was uttered, and sets the target area in the image data based on the extracted observation data.
  • FIG. 10 is a diagram showing how the area setting unit sets the target area.
  • as shown in FIG. 10, the area setting unit 12 extracts line-of-sight data as the observation data corresponding to the times (Speech 1 and Speech 2) at which the sets of important phrases were uttered, and, based on the extracted line-of-sight data, sets in the image data the gazing points B1 and B2 at which the user U1 gazes. Since human central vision covers about 1 to 2 degrees, if the central vision is assumed to be 1.5 degrees and the distance between the user U1 and the display unit 20 is 50 cm, the area the user U1 is watching can be regarded as a circle with a radius of about 1.3 cm. The area setting unit 12 therefore sets, as a target area in the image data, a circular area with a radius of 2 to 3 cm including the gazing points B1 and B2 so as to cover this area. Alternatively, the area setting unit 12 may set a rectangular area including the gazing points B1 and B2 as the target area in the image data. Then, the area setting unit 12 records the set target area in the recording unit 34.
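  • The geometric reasoning above can be sketched as follows, assuming the stated central-vision angle and viewing distance; the pixel density and margin factor are illustrative assumptions.

```python
# Sketch of the geometry above (assumed parameter names): convert a central-vision
# angle and viewing distance into a gaze-region radius, then build a circular
# target region around a gaze point.
import math

def gaze_radius_cm(central_vision_deg: float = 1.5, distance_cm: float = 50.0) -> float:
    """Radius on the screen subtended by the central-vision angle (about 1.3 cm here)."""
    return distance_cm * math.tan(math.radians(central_vision_deg))

def circular_target_region(gaze_x: float, gaze_y: float,
                           radius_cm: float, px_per_cm: float) -> dict:
    """Return a circle (in pixels) around the gaze point, with a margin so the
    region fully covers the watched area."""
    r_px = radius_cm * px_per_cm * 2.0
    return {"cx": gaze_x, "cy": gaze_y, "radius_px": r_px}

region = circular_target_region(960, 540, gaze_radius_cm(), px_per_cm=38.0)
```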
  • the generation unit 13 generates labeled image data in which the set of important words and phrases is associated with the target area (step S207). Further, the generation unit 13 outputs the labeled image data in which the set of important words and phrases is associated with the target area to the recording unit 34, and the information processing apparatus 1a ends the present processing.
  • since a region such as a lesion is labeled with important phrases having a plurality of attributes, the images can be classified in more detail. Further, since the area setting unit 12 sets the target area based on the line-of-sight data, the area that the user U1 is gazing at can be extracted more appropriately.
  • in the second embodiment, the important phrases have two attributes, but the present invention is not limited to this, and the important phrases may have three or more attributes.
  • the area setting unit 12 sets the gaze point that the user U1 gazes at in the image data based on the line-of-sight data, but the present invention is not limited to this.
  • the area setting unit 12 may set a gazing point in the image data based on the position information of a pointing device such as a mouse. Then, the area setting unit 12 sets the area including the gazing point as the target area in the image data.
  • the third embodiment of the present disclosure will be described.
  • in the second embodiment, the information processing device 1a is a standalone apparatus, whereas in the third embodiment, the information processing device is incorporated as a part of a microscope system.
  • the processing performed by the microscope system according to the third embodiment will be described.
  • the same components as those of the information processing apparatus 1a according to the second embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment.
  • the microscope system 100 includes an information processing device 1b, a display unit 20, a voice input unit 31, an operation unit 35, and a microscope 200.
  • the microscope 200 has a substantially C-shaped housing portion 201, a stage 202 attached to the housing portion 201 so as to be movable in three dimensions, a plurality of objective lenses 203 having different observation magnifications, an eyepiece unit 206 for observing the observation image of the specimen through the objective lens 203, an operation unit 207 for moving the stage 202 in three dimensions in accordance with the user's operation, a position detection unit 208 for detecting the position of the stage 202 from a reference position, and a magnification detection unit 209, configured by using an encoder or the like, for detecting magnification information indicating the observation magnification at which the microscope 200 observes the specimen.
  • the information processing device 1b includes a control unit 32b and a recording unit 34b in place of the control unit 32 and the recording unit 34 of the information processing device 1a according to the second embodiment described above.
  • the control unit 32b is configured by using a CPU, FPGA, GPU and the like, and controls the display unit 20, the voice input unit 31, and the microscope 200.
  • the control unit 32b further includes a shooting control unit 324 and a magnification calculation unit 325 in addition to the observation information detection control unit 321, the voice input control unit 322, and the display control unit 323 of the control unit 32 of the second embodiment described above.
  • the observation information detection control unit 321 calculates the current position information of the stage 202 based on the position of the stage 202 from the reference position detected by the position detection unit 208, and outputs the calculation result to the recording unit 34b.
  • the shooting control unit 324 controls the operation of the imaging unit 205.
  • the shooting control unit 324 generates image data (a moving image) by causing the image pickup unit 205 to capture images sequentially at a predetermined frame rate.
  • the shooting control unit 324 performs image processing (for example, development processing) on the image data input from the image pickup unit 205 and outputs the image data to the recording unit 34b.
  • the magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the detection result input from the magnification detection unit 209, and outputs this calculation result to the recording unit 34b. For example, the magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the magnification of the objective lens 203 and the magnification of the eyepiece unit 206 input from the magnification detection unit 209.
  • the recording unit 34b is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
  • the recording unit 34b includes an image data recording unit 345 instead of the image data recording unit 343 according to the second embodiment described above.
  • the image data recording unit 345 records the image data input from the shooting control unit 324, and outputs this image data to the generation unit 13.
  • FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment.
  • first, the control unit 32b generates observation data including the position information of the stage 202 calculated by the observation information detection control unit 321 and the observation magnification calculated by the magnification calculation unit 325, and records this observation data and the voice data generated by the voice input unit 31 in the observation data recording unit 341 and the voice data recording unit 342, respectively, in association with the time measured by the time measurement unit 33 (step S301).
  • the microscope system 100 shifts to step S302 described later.
  • Steps S302 to S304 correspond to each of steps S203 to S205 in FIG. 8 described above. After step S304, the microscope system 100 proceeds to step S305.
  • in step S305, the area setting unit 12 extracts the observation data corresponding to the time when the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 extracts the position information of the stage 202 and the observation magnification as the observation data corresponding to the time when the important phrase was uttered, and generates visual field data recording the user's observation field of view at that time. In other words, the visual field data is the image displayed on the display unit 20 at the time when the important phrase was uttered. Then, the area setting unit 12 sets the area represented by the visual field data as the target area in the image data.
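  • A sketch of deriving the observation field of view from the stage position and observation magnification is shown below; the sensor dimensions and coordinate conventions are assumed for illustration and are not taken from the embodiment.

```python
# Sketch under assumptions: derive the observed field of view, in specimen
# (whole-image) coordinates, from the stage position and observation magnification.
from dataclasses import dataclass

@dataclass
class FieldOfView:
    center_x_um: float
    center_y_um: float
    width_um: float
    height_um: float

def field_of_view(stage_x_um: float, stage_y_um: float,
                  observation_magnification: float,
                  sensor_width_um: float = 13000.0,
                  sensor_height_um: float = 9700.0) -> FieldOfView:
    """The higher the magnification, the smaller the region of the specimen imaged."""
    return FieldOfView(
        center_x_um=stage_x_um,
        center_y_um=stage_y_um,
        width_um=sensor_width_um / observation_magnification,
        height_um=sensor_height_um / observation_magnification,
    )

fov = field_of_view(stage_x_um=5200.0, stage_y_um=3100.0, observation_magnification=20.0)
```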
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S306). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34b, and the microscope system 100 ends this process.
  • according to the third embodiment, the user can set a target area such as a lesion for the image data recorded while observing the specimen under the microscope using the microscope system 100, and generate labeled image data.
  • in addition, since the target area can be set by the user's voice, the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during the observation, so the observation efficiency can be improved.
  • the area setting unit 12 extracts the target area using the position information of the stage 202 and the observation data including the observation magnification, but the present invention is not limited to this.
  • for example, the area setting unit 12 may set the target area in the image data based on a feature amount representing the correlation between consecutive frames and on the movement vector between consecutive frames. Specifically, when the area setting unit 12 determines that the correlation between consecutive frames is large (the degree of similarity is high), it may determine that the user is gazing at this frame and set this frame as the target area. Similarly, when it determines that the movement vector between consecutive frames is small (the amount of movement between images is small), it may determine that the user is gazing at this frame and set this frame as the target area.
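  • The frame-correlation and movement-vector criteria could be approximated as in the following numpy-only sketch; the similarity measure, the phase-correlation shift estimate, and the thresholds are assumptions rather than the embodiment's actual processing.

```python
# Minimal numpy-only sketch (thresholds are assumptions): decide whether the user
# is "gazing" at a frame from its similarity to the previous grayscale frame and
# from a global shift estimated by phase correlation.
import numpy as np

def frame_correlation(prev: np.ndarray, curr: np.ndarray) -> float:
    """Normalized correlation between two grayscale frames (1.0 = identical)."""
    a = (prev - prev.mean()) / (prev.std() + 1e-8)
    b = (curr - curr.mean()) / (curr.std() + 1e-8)
    return float((a * b).mean())

def global_shift(prev: np.ndarray, curr: np.ndarray) -> float:
    """Magnitude of the dominant translation between frames via phase correlation."""
    f = np.fft.fft2(prev) * np.conj(np.fft.fft2(curr))
    corr = np.abs(np.fft.ifft2(f / (np.abs(f) + 1e-8)))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    dy = dy - h if dy > h // 2 else dy   # wrap-around to signed shifts
    dx = dx - w if dx > w // 2 else dx
    return float(np.hypot(dx, dy))

def is_gazed_frame(prev, curr, corr_thresh=0.9, shift_thresh=5.0) -> bool:
    """High similarity or small movement -> treat the frame as gazed at."""
    return frame_correlation(prev, curr) > corr_thresh or global_shift(prev, curr) < shift_thresh
```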
  • the microscope 200 may have a line-of-sight detection unit.
  • the line-of-sight detection unit is provided inside or outside the eyepiece unit 206, generates line-of-sight data by detecting the line of sight of the user, and outputs the line-of-sight data to the information processing device 1b.
  • specifically, the line-of-sight detection unit is configured by using an LED light source that is provided inside the eyepiece unit 206 and emits near-infrared light, and an optical sensor (for example, CMOS or CCD) that is provided inside the eyepiece unit 206 and captures the pupil point and the corneal reflection point.
  • under the control of the information processing device 1b, the line-of-sight detection unit irradiates the user's cornea with near-infrared light from the LED light source or the like, and the optical sensor images the pupil point and the reflection point on the user's cornea. Then, based on the result of analyzing the data generated by the optical sensor by image processing or the like, the line-of-sight detection unit generates line-of-sight data by detecting the user's line of sight from the pattern of the pupil point and the reflection point, and outputs this line-of-sight data to the information processing device 1b. As in the second embodiment, the area setting unit 12 may set a gazing point in the image observed by the user using the line-of-sight data, and set a target area representing a lesion or the like so as to include the gazing point.
  • the fourth embodiment of the present disclosure will be described.
  • the information processing device is incorporated into a part of the endoscope system.
  • the processing executed by the endoscope system according to the fourth embodiment will be described.
  • the same components as those of the information processing apparatus 1a according to the second embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment.
  • FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment.
  • the endoscope system 300 shown in FIGS. 13 and 14 includes a display unit 20, an endoscope 400, a wearable device 500, an input unit 600, and an information processing device 1c.
  • the endoscope 400 generates image data when the user U3, such as a doctor or an operator, inserts it into the subject U4 and images the inside of the subject U4, and outputs this image data to the information processing device 1c.
  • the endoscope 400 includes an image pickup unit 401 and an operation unit 402.
  • the imaging unit 401 is provided at the tip of the insertion unit of the endoscope 400. Under the control of the information processing device 1c, the image pickup unit 401 generates image data by imaging the inside of the subject U4 and outputs the image data to the information processing device 1c.
  • the image pickup unit 401 is configured by using an optical system capable of changing the observation magnification, an image sensor such as CMOS or CCD that generates image data by receiving a subject image formed by the optical system, and the like.
  • the operation unit 402 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
  • the wearable device 500 is attached to the user U3, detects the line of sight of the user U3, and accepts the voice input of the user U3.
  • the wearable device 500 includes a line-of-sight detection unit 510 and a voice input unit 520.
  • the line-of-sight detection unit 510 is provided in the wearable device 500, generates line-of-sight data by detecting the gaze degree of the line of sight of the user U3, and outputs the line-of-sight data to the information processing device 1c. Since the line-of-sight detection unit 510 has the same configuration as the line-of-sight detection unit according to the third embodiment described above, a detailed configuration will be omitted.
  • the voice input unit 520 is provided in the wearable device 500, generates voice data by receiving the voice input of the user U3, and outputs the voice data to the information processing device 1c.
  • the voice input unit 520 is configured by using a microphone or the like.
  • the configuration of the input unit 600 will be described.
  • the input unit 600 is configured by using a mouse, a keyboard, a touch panel, and various switches.
  • the input unit 600 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
  • the information processing device 1c includes a control unit 32c and a recording unit 34c in place of the control unit 32b and the recording unit 34b of the information processing device 1b according to the third embodiment described above.
  • the control unit 32c is configured by using a CPU, FPGA, GPU and the like, and controls the endoscope 400, the wearable device 500, and the display unit 20.
  • the control unit 32c includes an operation history detection unit 326 in addition to a line-of-sight data detection control unit 321c, a voice input control unit 322, a display control unit 323, and a shooting control unit 324.
  • the operation history detection unit 326 detects the content of the operation received by the operation unit 402 of the endoscope 400, and outputs the detection result to the recording unit 34c. Specifically, when the magnifying switch is operated from the operation unit 402 of the endoscope 400, the operation history detection unit 326 detects the operation content and outputs the detection result to the recording unit 34c.
  • the operation history detection unit 326 may detect the operation content of the treatment tool inserted into the subject U4 via the endoscope 400 and output the detection result to the recording unit 34c.
  • the recording unit 34c is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
  • the recording unit 34c further includes an operation history recording unit 346 in addition to the configuration of the recording unit 34b according to the third embodiment described above.
  • the operation history recording unit 346 records the operation history for the operation unit 402 of the endoscope 400 input from the operation history detection unit 326.
  • FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
  • first, the control unit 32c records the image data generated by the imaging unit 401, the line-of-sight data generated by the line-of-sight detection unit 510, the voice data generated by the voice input unit 520, and the operation history detected by the operation history detection unit 326 in the image data recording unit 345, the line-of-sight data recording unit 341c, the voice data recording unit 342, and the operation history recording unit 346, respectively, in association with the time measured by the time measurement unit 33 (step S401).
  • the endoscope system 300 shifts to step S402 described later.
  • Steps S402 to S404 correspond to each of steps S302 to S304 in FIG. 12 described above. After step S404, the endoscope system 300 proceeds to step S405.
  • in step S405, the area setting unit 12 extracts the observation data corresponding to the time when the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 calculates, as the observation data corresponding to the time when the important phrase was uttered, a feature amount representing the correlation between consecutive frames in the image data recorded in the image data recording unit 345. Then, when the area setting unit 12 determines that the correlation between consecutive frames is large (the degree of similarity is high), it determines that the user U3 is gazing at this frame and sets this frame as the target area. Alternatively, the area setting unit 12 may calculate the movement vector between consecutive frames in the image data recorded in the image data recording unit 345; in this case, when it determines that the movement vector is small (the amount of movement between images is small), it determines that the user U3 is gazing at this frame and sets this frame as the target area.
  • the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S406). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34c, and the endoscope system 300 ends this process.
  • according to the fourth embodiment described above, the user U3 can set a target area such as a lesion in the image data recorded while endoscopically observing the inside of the subject U4 using the endoscope system 300, and labeled image data can be generated.
  • since the target area can be set by the voice of the user U3, the observation is not interrupted by the user U3 looking at a keyboard, a pointing device, or the like during the observation, and therefore the observation efficiency can be improved.
  • the area setting unit 12 extracts the target area based on the feature amount representing the correlation between consecutive frames or the motion vector between consecutive frames; however, the area setting unit 12 is not limited to this.
  • the area setting unit 12 may set the target area in the image data by using the line-of-sight data generated by the wearable device 500. Specifically, as in the second embodiment, the area setting unit 12 may set a gaze point in the image observed by the user U3 using the line-of-sight data, and may set a target area representing a lesion or the like so as to include that gaze point.
  • although an endoscope system is used here, the same processing can also be applied to, for example, a capsule endoscope, a video microscope that images a subject, a mobile phone having an imaging function, or a tablet terminal having an imaging function.
  • although the endoscope system is provided with a flexible endoscope, the processing can also be applied to an endoscope system provided with a rigid endoscope or to an endoscope system provided with an industrial endoscope.
  • although the endoscope system includes an endoscope inserted into the subject, the processing can also be applied to a sinus endoscope, or to an endoscope system using an electric knife, an examination probe, or the like.
  • the above-mentioned "part” can be read as “means” or "circuit".
  • the control unit can be read as a control means or a control circuit.
  • the programs to be executed by the information processing apparatus are provided as file data in an installable format or an executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disc), a USB medium, or a flash memory.
  • the program to be executed by the information processing apparatus according to the first to fourth embodiments may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program to be executed by the information processing apparatus according to the first to fourth embodiments may be provided or distributed via a network such as the Internet.
  • signals are transmitted from the various devices via transmission cables, but the connections do not have to be wired and may instead be wireless.
  • signals may be transmitted from each device in accordance with a predetermined wireless communication standard (for example, Wi-Fi (registered trademark) or Bluetooth (registered trademark)).
  • wireless communication may be performed according to other wireless communication standards.

Abstract

An information processing device comprises: a phrase extraction unit that extracts a prescribed phrase from speech data in which speech uttered by a user while observing image data is recorded; an area setting unit that extracts, from observation data recording the observation mode for the image data, the observation data corresponding to the time at which the prescribed phrase was uttered, and sets a target area in the image data on the basis of the extracted observation data; and a generation unit that generates labeled image data associating the prescribed phrase with the target area. An information processing device is thereby provided in which the burden of setting a target area in image data is reduced and learning data can be generated easily.

Description

Information processing device, learning device, and trained model
 The present invention relates to an information processing device, a learning device, and a trained model.
 Conventionally, pathological diagnosis is performed in which a pathologist observes a specimen collected from a patient and makes a diagnosis. In pathological diagnosis, the pathologist may create a diagnostic report by adding a diagnosis result (information such as the presence or absence, type, and position of a lesion) to an image of the specimen.
 There is also a known technique for supporting the pathologist's diagnosis by generating, through machine learning using image data to which diagnosis results have been added, a trained model that detects lesions from images of specimens. Patent Document 1 discloses a learning data generation support device that uses, as correct answer data for machine learning, image data in which the lesion features of a diagnostic report match the lesion features detected by image processing.
Japanese Unexamined Patent Publication No. 2019-008349
 However, when learning data is generated using the technique of Patent Document 1, a pathologist needs to create a diagnostic report. Specifically, in order to associate a lesion or the like with a particular image and position, the pathologist has to perform input operations for the target area, such as a lesion, using a keyboard or a pointing device while observing the image of the specimen, which places a burden on the pathologist.
 The present invention has been made in view of the above, and an object thereof is to provide an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area in image data and can easily generate learning data.
 In order to solve the above-described problems and achieve the object, an information processing device according to one aspect of the present invention includes: a phrase extraction unit that extracts a predetermined phrase from voice data in which voice uttered by a user while observing image data is recorded; an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time at which the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data; and a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
 Further, in the information processing device according to one aspect of the present invention, the phrase extraction unit includes a conversion unit that converts the voice data into text data, and an extraction unit that extracts, from the text data, important phrases recorded in a recording unit as the predetermined phrases.
 Further, in the information processing device according to one aspect of the present invention, the extraction unit includes an important phrase storage unit that stores the important phrases, the important phrase storage unit stores the important phrases with predetermined attributes assigned to them, and the extraction unit extracts, from the text data, one or more important phrases included in each attribute as the predetermined phrases.
 Further, in the information processing device according to one aspect of the present invention, the attributes of the important phrases include a first attribute representing the site or origin of a lesion and a second attribute representing the state of the lesion.
 Further, in the information processing device according to one aspect of the present invention, the text data is text data uttered as one clause, the extraction unit extracts one or more important phrases from the text data, and the label corresponding to the target area is represented by a set of important phrases belonging to one or more attributes.
 Further, in the information processing device according to one aspect of the present invention, the area setting unit preferentially extracts the observation data having a large observation magnification, and the generation unit assigns a higher importance to the labeled image data as the observation magnification becomes larger.
 Further, in the information processing device according to one aspect of the present invention, the area setting unit preferentially extracts the observation data in which the moving speed of the observation area is small, and the generation unit assigns a higher importance to the labeled image data as the moving speed of the observation area becomes smaller.
 Further, in the information processing device according to one aspect of the present invention, the area setting unit preferentially extracts the observation data in which the observation area stays within a predetermined area for a long time, and the generation unit assigns a higher importance to the labeled image data as the time during which the observation area stays within the predetermined area becomes longer.
 Further, in the information processing device according to one aspect of the present invention, the observation data includes line-of-sight data obtained by detecting the line of sight of the user.
 Further, in the information processing device according to one aspect of the present invention, the observation data includes visual field data in which the observation visual field of the user is recorded.
 Further, in the information processing device according to one aspect of the present invention, the observation data includes position information of a pointing device.
 Further, an information processing device according to one aspect of the present invention includes: a phrase extraction unit that extracts a predetermined phrase from voice data in which voice uttered by a user while observing frame images is recorded; an area setting unit that extracts the frame image corresponding to the time at which the predetermined phrase was uttered and sets the extracted frame image as the target area; and a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
 Further, a learning device according to one aspect of the present invention includes: a phrase extraction unit that extracts a predetermined phrase from voice data in which voice uttered by a user while observing image data is recorded; an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time at which the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data; a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area; and a model generation unit that performs machine learning using the labeled image data and generates a trained model that detects an area corresponding to the predetermined phrase from a group of images.
 Further, a trained model according to one aspect of the present invention is generated by machine learning using labeled image data that is obtained by extracting a predetermined phrase from voice data in which voice uttered by a user while observing training image data is recorded, extracting, from observation data in which an observation mode for the training image data is recorded, the observation data corresponding to the time at which the predetermined phrase was uttered, setting a target area in the training image data based on the extracted observation data, and associating the predetermined phrase with the target area; when image data for determination is input, the trained model causes a computer to function so as to extract an area estimated to be the target area associated with the predetermined phrase and to output the likelihood that the area is the target area.
 According to the present invention, it is possible to realize an information processing device, a learning device, and a trained model that can reduce the burden of setting a target area in image data and can easily generate learning data.
[Correction under Rule 91 17.09.2020]
FIG. 1 is a block diagram showing a functional configuration of the information processing system according to the first embodiment. FIG. 2 is a flowchart illustrating a process executed by the information processing apparatus according to the first embodiment. FIG. 3 is a diagram showing voice data. FIG. 4 is a diagram showing observation data. FIG. 5 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment. FIG. 6 is a schematic diagram showing the configuration of the information processing apparatus according to the second embodiment. FIG. 7 is a block diagram showing a functional configuration of the information processing apparatus according to the second embodiment. FIG. 8 is a flowchart showing an outline of the processing executed by the information processing apparatus. FIG. 9 is a diagram showing words and phrases extracted by the word and phrase extraction unit. FIG. 10 is a diagram showing how the area setting unit sets the target area. FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment. FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment. FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment. FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment. FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
 Hereinafter, the mode for implementing the present disclosure will be described in detail together with the drawings. The present disclosure is not limited to the following embodiments. In addition, each figure referred to in the following description merely schematically shows the shape, size, and positional relationship to the extent that the contents of the present disclosure can be understood. That is, the present disclosure is not limited to the shapes, sizes, and positional relationships exemplified in each figure.
(Embodiment 1)
 [Configuration of the information processing system]
 FIG. 1 is a block diagram showing the functional configuration of the information processing system according to the first embodiment. The information processing system 1 shown in FIG. 1 includes an information processing device 10 that performs various kinds of processing on line-of-sight data, voice data, and image data input from the outside, and a display unit 20 that displays various data output from the information processing device 10. The information processing device 10 and the display unit 20 are connected to each other bidirectionally, either wirelessly or by wire.
 [Configuration of the information processing device]
 First, the configuration of the information processing device 10 will be described.
 The information processing device 10 shown in FIG. 1 is realized, for example, by using a program installed on a server, a personal computer, or the like, and various data are input to it via a network, or various data acquired by external devices are input to it. As shown in FIG. 1, the information processing device 10 includes a phrase extraction unit 11, an area setting unit 12, a generation unit 13, a recording unit 14, and a display control unit 15.
 The phrase extraction unit 11 extracts a predetermined phrase from the user's voice data input from the outside, in which the voice uttered by the user while observing the image data is recorded. The phrase extraction unit 11 includes a conversion unit 111 that converts the voice data into text data, and an extraction unit 112 that extracts important phrases recorded in the recording unit 14 from the text data as the predetermined phrases.
 The conversion unit 111 converts the voice data into character information (text data) by performing a well-known text conversion process on the voice data, and outputs this character information to the extraction unit 112. Alternatively, the text conversion of the voice need not be performed at this point; in that case, the target area may be set in the image data while the data is still voice information, and the voice information may be converted into character information afterwards.
 The extraction unit 112 extracts, from the text data converted by the conversion unit 111, important phrases recorded in advance in the recording unit 14, and records in the recording unit 14 the time at which each important phrase was uttered in association with that important phrase. Specifically, the phrase extraction unit 11 records, for each frame of the voice data, the important phrases uttered in that frame in association with the frame in the recording unit 14. The user's voice data input from the outside is generated by a voice input unit such as a microphone.
 The phrase extraction unit 11 is configured by using a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a GPU (Graphics Processing Unit), and the like. The phrase extraction unit 11 may also handle the voice data as it is without converting the voice data into text data. In this case, the phrase extraction unit 11 does not have the conversion unit 111, and the extraction unit 112 extracts the predetermined phrase directly from the voice data.
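 The conversion and extraction steps can be pictured with the Python sketch below. It assumes that the speech recognizer returns word-level timestamps (a common feature of speech-to-text engines); the data shapes and the list of important phrases are illustrative assumptions, not part of the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Important phrases registered in advance in the recording unit (illustrative values).
IMPORTANT_PHRASES = {"adenocarcinoma", "polyp", "ulcer", "stomach", "colon"}


@dataclass
class Word:
    text: str
    start: float  # utterance start time [s]
    end: float    # utterance end time [s]


def extract_important_phrases(transcript: List[Word]) -> List[Tuple[str, float, float]]:
    """Return (phrase, start_time, end_time) for every registered important phrase
    found in the transcribed voice data."""
    hits = []
    for word in transcript:
        if word.text.lower() in IMPORTANT_PHRASES:
            hits.append((word.text, word.start, word.end))
    return hits


# Usage: a transcript produced by a speech-to-text engine with word timings.
transcript = [Word("there", 12.0, 12.2), Word("is", 12.2, 12.3),
              Word("a", 12.3, 12.35), Word("polyp", 12.35, 12.9)]
print(extract_important_phrases(transcript))  # [('polyp', 12.35, 12.9)]
```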
 The area setting unit 12 extracts, from the observation data in which the observation mode for the image data is recorded, the observation data corresponding to the time at which an important phrase was uttered. The area setting unit 12 may extract the observation data included in a predetermined time or a predetermined number of frames including the time during which the important phrase is uttered (from the start to the end of the utterance). Further, the area setting unit 12 sets a target area in the image data based on the extracted observation data, and records the position information of the target area in the recording unit 14. When a plurality of pieces of observation data are extracted, the area setting unit 12 may set the target area in the image data corresponding to one selected piece of observation data, or may set target areas in a plurality of image data corresponding to a plurality of selected pieces of observation data. The area setting unit 12 is configured by using a CPU, an FPGA, a GPU, and the like.
 When the usage pattern is WSI (Whole Slide Imaging), the user observes a part of a microscope slide sample as a visual field, and the magnification and position of the observation visual field change with time. In this case, the magnification at which the entire image data is displayed and which part of it is presented as the visual field, that is, the magnification and the absolute coordinates of the display area with respect to the entire image data, are recorded in the recording unit 14 as observation data in synchronization with the voice data. The area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time at which the important phrase was uttered, selects one or more pieces of observation data according to, for example, the magnification of the extracted observation data, sets the display area corresponding to the selected observation data as the target area, and records the position information of the target area in the recording unit 14. However, the area setting unit 12 may set only a part of the display area, determined by predetermined image processing or the like, as the target area.
 When an optical microscope or an endoscope system is used, the visual field displayed on the display unit 20 matches the visual field of the image data, so the relative positional relationship of the observation visual field with respect to the absolute coordinates of the image does not change. When an optical microscope is used and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, the magnification of the objective lens and the temporal change of the stage position are recorded in the recording unit 14 as observation data in synchronization with the voice data. The area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time at which the important phrase was uttered, selects one or more pieces of observation data according to, for example, the magnification of the extracted observation data, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14. When an endoscope system is used and the image observed by the user is recorded as a moving image, that is, when the image data is a moving image, a feature amount representing the correlation between consecutive frames or a motion vector between consecutive frames is recorded in the recording unit 14 as observation data in synchronization with the voice data. The area setting unit 12 extracts a plurality of pieces of observation data corresponding to the time at which the important phrase was uttered, selects one or more pieces of observation data according to, for example, the motion vector of the extracted observation data, sets the image corresponding to the selected observation data as the target area, and records information representing this target area in the recording unit 14.
 The generation unit 13 generates labeled image data in which the important phrases are associated with the target area, and outputs the generated labeled image data to the recording unit 14 and the display control unit 15. Specifically, the generation unit 13 generates labeled image data in which the important phrases extracted by the phrase extraction unit 11 and the target area set by the area setting unit 12 are associated with each other. The generation unit 13 is configured by using a CPU, an FPGA, a GPU, and the like.
 The recording unit 14 synchronously records the voice data input from the phrase extraction unit 11, the observation data input from the area setting unit 12, and the image data input from the generation unit 13. Further, the recording unit 14 records the important phrases input in advance and the labeled image data, input from the generation unit 13, in which the important phrases are associated with the target areas. The recording unit 14 is also used for temporarily recording the important phrases extracted by the phrase extraction unit 11, the observation data extracted by the area setting unit 12, the target areas set by the area setting unit 12, and the like. Further, the recording unit 14 records various programs executed by the information processing device 10 and data being processed. The recording unit 14 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like.
 The display control unit 15 superimposes various kinds of information on the image corresponding to the image data input from the outside and outputs the result to the external display unit 20 for display. The display control unit 15 is configured by using a CPU, an FPGA, a GPU, and the like. Note that the phrase extraction unit 11, the area setting unit 12, the generation unit 13, and the display control unit 15 may each be configured using any one of a CPU, an FPGA, and a GPU so that their functions can be exhibited, or, of course, a CPU, an FPGA, and a GPU may be combined so that their functions can be exhibited.
 [Configuration of the display unit]
 Next, the configuration of the display unit 20 will be described.
 The display unit 20 displays an image corresponding to the image data input from the display control unit 15. The display unit 20 is configured by using a display monitor such as an organic EL (Electro Luminescence) display or a liquid crystal display.
 [Processing of the information processing device]
 Next, the processing of the information processing device 10 will be described. FIG. 2 is a flowchart illustrating the processing executed by the information processing device 10.
 As shown in FIG. 2, first, the information processing device 10 acquires image data, observation data, and voice data input from the outside (step S101). These data are synchronized and temporally associated with each other. The image data is, for example, an entire image in WSI, the observation data is information representing the display area and the observation magnification within the entire image, and the voice data is data in which the voice uttered by the user during observation is recorded.
 Subsequently, the phrase extraction unit 11 extracts keywords (predetermined phrases) from the voice data (step S102). Specifically, first, the conversion unit 111 converts the voice data into text data. Then, the extraction unit 112 extracts the important phrases recorded in the recording unit 14 from the text data as the predetermined phrases.
 FIG. 3 is a diagram showing the voice data. As shown in FIG. 3, the phrase extraction unit 11 extracts, from the voice data converted into text data, utterance contents X1, X2, and X3 that match important phrases, and records them in the recording unit 14 in association with the start times t11, t21, and t31 and the end times t12, t22, and t32 of the respective utterances.
 Returning to FIG. 2, the description from step S103 onward will be continued.
 In step S103, the area setting unit 12 extracts the observation data corresponding to the time at which an important phrase was uttered, and sets the target area in the image data based on the extracted observation data (step S103). In the recording unit 14, the time and the coordinate information of two predetermined points in the image displayed on the display unit 20 at that time are recorded in association with each other as observation data. The two predetermined points are not particularly limited; they are, for example, the upper left corner and the lower right corner of the rectangular image displayed on the display unit 20, and represent the display area shown on the display unit 20. When a part of the entire image is the display area, as in WSI, the observation magnification can be calculated from these two points. For the utterance content X1 shown in FIG. 3, the area setting unit 12 then extracts the observation magnifications calculated from the coordinate information as the observation data included between the start time t11 and the end time t12. Further, it selects the largest value among the extracted observation magnifications and sets the display area (the image displayed on the display unit 20) corresponding to this observation magnification as the target area. The area setting unit 12 then records the set target area in the recording unit 14.
 FIG. 4 is a diagram showing the observation data. As shown in FIG. 4, the target area set by the area setting unit 12 is recorded in the recording unit 14 as a time and the coordinate information of the two predetermined points representing the display area. For the utterance content X1, when the time at which the observation magnification is largest between the start time t11 and the end time t12 is time t13, the time t13 and the coordinate information representing the display area at time t13 are recorded in the recording unit 14. The coordinate information consists of the coordinates A11, as coordinate information 1, corresponding to the upper left corner of the image displayed on the display unit 20 at time t13, and the coordinates A12, as coordinate information 2, corresponding to the lower right corner of this image. As a result, the area that the user observed at the highest magnification while uttering the utterance content X1 is set as the target area.
 Similarly, for the utterance content X2, when the time at which the observation magnification is largest between the start time t21 and the end time t22 is time t23, the time t23 and the coordinate information (A21, A22) representing the display area at time t23 are recorded in the recording unit 14. Likewise, for the utterance content X3, when the time at which the observation magnification is largest between the start time t31 and the end time t32 is time t33, the time t33 and the coordinate information (A31, A32) representing the display area at time t33 are recorded in the recording unit 14.
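 The selection illustrated in FIGS. 3 and 4 can be sketched as follows in Python. The observation records are assumed to hold the two corner coordinates of the displayed area in slide coordinates, so that the magnification can be derived from the viewport width; the record and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ObservationRecord:
    time: float                        # recording time [s]
    top_left: Tuple[float, float]      # coordinate information 1 (slide coordinates)
    bottom_right: Tuple[float, float]  # coordinate information 2 (slide coordinates)


def magnification(rec: ObservationRecord, slide_width: float) -> float:
    # The narrower the displayed region relative to the whole slide,
    # the higher the effective observation magnification.
    view_width = abs(rec.bottom_right[0] - rec.top_left[0])
    return slide_width / max(view_width, 1e-9)


def target_region_for_utterance(records: List[ObservationRecord],
                                t_start: float, t_end: float,
                                slide_width: float) -> Optional[ObservationRecord]:
    """Pick, within [t_start, t_end], the observation record with the highest
    magnification; its displayed area becomes the target area."""
    in_window = [r for r in records if t_start <= r.time <= t_end]
    if not in_window:
        return None
    return max(in_window, key=lambda r: magnification(r, slide_width))
```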
 Subsequently, the generation unit 13 generates labeled image data in which the important phrases are associated with the target areas (step S104). Further, the generation unit 13 outputs the labeled image data in which the important phrases are associated with the target areas to the recording unit 14, and the information processing device 10 ends this processing.
 According to the first embodiment described above, since the area setting unit 12 sets the target area based on the observation data, the burden of setting the target area in the image data can be reduced, and learning data can be generated easily. In the information processing system 1, since the target area can be set by the user's voice, the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during the observation, so the observation efficiency can be improved.
 Further, according to the first embodiment, since the generation unit 13 records the labeled image data in the recording unit 14, learning data used in machine learning such as deep learning can be easily acquired. Then, by performing machine learning such as deep learning using the acquired labeled image data for learning, it is possible to generate a trained model that, when image data for determination obtained by imaging a specimen or the like is input, extracts an area estimated to be the target area associated with an important phrase and outputs the likelihood that this area is the target area. Machine learning may be performed by the information processing device 10 using the labeled image data generated by the information processing device 10, or machine learning may be performed by a computer different from the information processing device 10 to generate the trained model. Further, using the generated trained model, the information processing device 10 may extract the target area from the image for determination and output the likelihood, or a computer different from the information processing device 10 may extract the target area from the image for determination and output the likelihood.
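 As one possible (not prescribed) realization of this training step, the sketch below uses PyTorch to fit a small classifier that outputs a likelihood per important phrase for a cropped target area. The network shape, the phrase-to-index mapping, and the dataset wrapper are assumptions for illustration only.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class LabeledRegionDataset(Dataset):
    """Wraps (cropped target-area image, phrase index) pairs produced by the generation unit."""
    def __init__(self, crops, phrase_indices):
        self.crops = crops                  # list of 3xHxW float tensors (same H, W)
        self.phrase_indices = phrase_indices

    def __len__(self):
        return len(self.crops)

    def __getitem__(self, i):
        return self.crops[i], self.phrase_indices[i]


class RegionClassifier(nn.Module):
    """Tiny CNN that scores how likely a crop belongs to each important phrase."""
    def __init__(self, num_phrases: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, num_phrases)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))  # logits per phrase


def train(dataset: Dataset, num_phrases: int, epochs: int = 5):
    model = RegionClassifier(num_phrases)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for crops, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(crops), labels)
            loss.backward()
            optimizer.step()
    return model  # a softmax over the logits gives the likelihood per phrase
```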
 In the first embodiment described above, the example in which the area setting unit 12 extracts the observation data included between the start time t11 and the end time t12 for the utterance content X1 has been described, but the present invention is not limited to this. For example, since there may be a time delay between the user discovering the lesion or the like corresponding to the utterance content X1 and actually uttering the utterance content X1, the observation data may be extracted including a time a predetermined length before the start time t11. Similarly, since the user may observe the lesion or the like corresponding to the utterance content X1 in detail after uttering the utterance content X1, the observation data may be extracted including a time a predetermined length after the end time t12.
 In the first embodiment described above, the example in which the area setting unit 12 selects the largest value among the extracted observation magnifications and sets the display area corresponding to this observation magnification as the target area has been described, but the present invention is not limited to this. The area setting unit 12 may select the largest value among the extracted observation magnifications and set a part of the display area corresponding to this observation magnification as the target area. For example, the area setting unit 12 may set, as the target area, a part of the display area including a lesion area extracted by image processing.
 Further, the area setting unit 12 may preferentially extract observation data having a large observation magnification, and the generation unit 13 may assign a higher importance to the labeled image data as the observation magnification becomes larger. Specifically, the area setting unit 12 extracts observation data whose observation magnification is larger than a threshold value. Then, the generation unit 13 assigns a higher importance to the labeled image data as the observation magnification becomes larger, and records it in the recording unit 14. As a result, images can be automatically labeled so that images observed by the user at higher magnification are given higher importance, which reduces the burden on the user when labeling images.
 Further, the area setting unit 12 may preferentially extract observation data in which the moving speed of the observation area is small, and the generation unit 13 may assign a higher importance to the labeled image data as the moving speed of the observation area becomes smaller. Specifically, the area setting unit 12 extracts observation data in which the moving speed of the observation area is smaller than a threshold value. Then, the generation unit 13 assigns a higher importance to the labeled image data as the moving speed of the observation area becomes smaller, and records it in the recording unit 14. As a result, images can be automatically labeled so that images the user gazed at without moving the observation area much are given higher importance, which reduces the burden on the user when labeling images.
 Further, the area setting unit 12 may preferentially extract observation data in which the observation area stays within a predetermined area for a long time, and the generation unit 13 may assign a higher importance to the labeled image data as the time during which the observation area stays within the predetermined area becomes longer. Specifically, the area setting unit 12 extracts observation data in which the time during which the observation area stays within the predetermined area is longer than a threshold value. Then, the generation unit 13 assigns a higher importance to the labeled image data as the time during which the observation area stays within the predetermined area becomes longer, and records it in the recording unit 14. As a result, images can be automatically labeled so that images the user gazed at, that is, images for which the observation area stayed within a predetermined area for a long time, are given higher importance, which reduces the burden on the user when labeling images.
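 The three importance rules described above (observation magnification, movement speed, and dwell time) could, for example, be combined into a single weight as in the following sketch; the reference values and the weighting formula are illustrative assumptions.

```python
def importance_score(magnification: float,
                     move_speed: float,
                     dwell_time: float,
                     mag_ref: float = 40.0,
                     speed_ref: float = 50.0,
                     dwell_ref: float = 5.0) -> float:
    """Higher magnification, lower movement speed of the observation area,
    and longer dwell time in a region all raise the importance attached
    to the labeled image data (each factor clamped to [0, 1])."""
    mag_term = min(magnification / mag_ref, 1.0)          # larger magnification -> higher
    speed_term = max(1.0 - move_speed / speed_ref, 0.0)   # smaller speed -> higher
    dwell_term = min(dwell_time / dwell_ref, 1.0)         # longer dwell -> higher
    return (mag_term + speed_term + dwell_term) / 3.0


# Usage: attach the score as metadata of the labeled image data.
print(importance_score(magnification=20.0, move_speed=10.0, dwell_time=4.0))
```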
(Embodiment 2)
 Next, the second embodiment of the present disclosure will be described. In the first embodiment, the observation data and the voice data are each input from the outside, whereas in the second embodiment, the observation data and the voice data are generated. In the following, the configuration of the information processing device according to the second embodiment is described first, and then the processing executed by the information processing device according to the second embodiment is described. The same components as those of the information processing system 1 according to the first embodiment described above are denoted by the same reference numerals, and detailed description thereof will be omitted as appropriate.
 [Configuration of the information processing device]
 FIGS. 5 and 6 are schematic views showing the configuration of the information processing device according to the second embodiment. FIG. 7 is a block diagram showing the functional configuration of the information processing device according to the second embodiment.
 The information processing device 1a shown in FIGS. 5 to 7 includes a phrase extraction unit 11, an area setting unit 12, a generation unit 13, a display unit 20, an observation information detection unit 30, a voice input unit 31, a control unit 32, a time measurement unit 33, a recording unit 34, and an operation unit 35.
 The observation information detection unit 30 is configured by using an LED light source that emits near-infrared light and an optical sensor (for example, a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge Coupled Device) sensor) that images the pupil point and the reflection point on the cornea. The observation information detection unit 30 is provided on a side surface of the housing of the information processing device 1a where the user U1 can visually recognize the display unit 20 (see FIGS. 5 and 6). Under the control of the control unit 32, the observation information detection unit 30 generates line-of-sight data in which the line of sight of the user U1 with respect to the image displayed by the display unit 20 is detected, and outputs this line-of-sight data to the control unit 32 as observation data. Specifically, under the control of the control unit 32, the observation information detection unit 30 irradiates the cornea of the user U1 with near-infrared light from the LED light source or the like, and the optical sensor images the pupil point and the reflection point on the cornea of the user U1 to generate data. Then, under the control of the control unit 32, the observation information detection unit 30 continuously calculates the line of sight of the user from the pattern of the pupil point and the reflection point of the user U1, based on an analysis result obtained by analyzing the data generated by the optical sensor through image processing or the like, thereby generating line-of-sight data for a predetermined time, and outputs this line-of-sight data as observation data to the observation information detection control unit 321 described later. Note that the observation information detection unit 30 may generate the line-of-sight data by detecting the pupil of the user U1 with the optical sensor alone using well-known pattern matching, or may generate the line-of-sight data by detecting the line of sight of the user U1 using other sensors or other well-known techniques.
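 One conventional way to turn detected pupil/reflection offsets into on-screen gaze coordinates is a least-squares calibration against a few known screen points, as in the numpy sketch below. This is a generic technique offered for illustration under assumed data shapes, not the specific method of the embodiment.

```python
import numpy as np


def fit_affine_calibration(eye_features: np.ndarray, screen_points: np.ndarray) -> np.ndarray:
    """Fit screen = [ex, ey, 1] @ A by least squares.
    eye_features: (N, 2) pupil-to-glint offsets collected while the user looks at
    N known calibration targets; screen_points: (N, 2) pixel positions."""
    ones = np.ones((eye_features.shape[0], 1))
    design = np.hstack([eye_features, ones])                          # (N, 3)
    coeffs, *_ = np.linalg.lstsq(design, screen_points, rcond=None)   # (3, 2)
    return coeffs


def gaze_to_screen(eye_feature: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Map one pupil-to-glint offset to display coordinates."""
    return np.append(eye_feature, 1.0) @ coeffs


# Usage with four calibration samples (illustrative numbers).
eye = np.array([[0.10, 0.05], [0.30, 0.06], [0.11, 0.25], [0.29, 0.24]])
scr = np.array([[100, 100], [1800, 100], [100, 950], [1800, 950]], dtype=float)
A = fit_affine_calibration(eye, scr)
print(gaze_to_screen(np.array([0.20, 0.15]), A))
```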
 The voice input unit 31 is configured by using a microphone to which voice is input and a voice codec that converts the voice received by the microphone into digital voice data, amplifies the voice data, and outputs it to the control unit 32. Under the control of the control unit 32, the voice input unit 31 generates voice data by receiving the input of the voice of the user U1, and outputs this voice data to the control unit 32. In addition to voice input, the voice input unit 31 may also be provided with a speaker or the like capable of outputting voice so as to have a voice output function.
 The control unit 32 is configured by using a CPU, an FPGA, a GPU, and the like, and controls the observation information detection unit 30, the voice input unit 31, and the display unit 20. The control unit 32 includes an observation information detection control unit 321, a voice input control unit 322, and a display control unit 323.
 The observation information detection control unit 321 controls the observation information detection unit 30. Specifically, the observation information detection control unit 321 causes the observation information detection unit 30 to irradiate the user U1 with near-infrared light at predetermined timings and to image the pupil of the user U1, thereby generating line-of-sight data. Further, the observation information detection control unit 321 performs various kinds of image processing on the line-of-sight data input from the observation information detection unit 30 and outputs the result to the recording unit 34.
 The voice input control unit 322 controls the voice input unit 31, performs various kinds of processing, for example gain-up and noise reduction processing, on the voice data input from the voice input unit 31, and outputs the result to the recording unit 34.
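 The gain-up and noise-reduction processing mentioned here could look like the following minimal numpy sketch (a fixed gain plus a simple noise gate); real systems would use more elaborate filtering, and the parameter values are assumptions.

```python
import numpy as np


def condition_voice(samples: np.ndarray,
                    gain: float = 2.0,
                    noise_floor: float = 0.02) -> np.ndarray:
    """Apply a fixed gain and mute samples below a noise floor.
    samples: mono float32 waveform in the range [-1.0, 1.0]."""
    boosted = np.clip(samples * gain, -1.0, 1.0)
    gated = np.where(np.abs(boosted) < noise_floor, 0.0, boosted)
    return gated.astype(np.float32)


# Usage on a short synthetic buffer (mostly background noise).
buf = (0.01 * np.random.randn(16000)).astype(np.float32)
print(np.abs(condition_voice(buf)).max())
```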
 The display control unit 323 controls the display mode of the display unit 20. The display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded in the recording unit 34.
 The time measurement unit 33 is configured by using a timer, a clock generator, or the like, and adds time information to the line-of-sight data generated by the observation information detection unit 30, the voice data generated by the voice input unit 31, and the like.
 The recording unit 34 is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like, and records various kinds of information related to the information processing device 1a. The recording unit 34 includes an observation data recording unit 341, a voice data recording unit 342, an image data recording unit 343, and a program recording unit 344.
 The observation data recording unit 341 records the line-of-sight data input from the observation information detection control unit 321 as observation data, and outputs the observation data to the area setting unit 12.
 The voice data recording unit 342 records the voice data input from the voice input control unit 322, and outputs the voice data to the phrase extraction unit 11.
 The image data recording unit 343 records a plurality of image data. The plurality of image data are data input from the outside of the information processing device 1a or data captured by an external imaging device and provided via a recording medium.
 The program recording unit 344 records various programs executed by the information processing device 1a, data used during the execution of the various programs (for example, dictionary information and text conversion dictionary information), and data being processed during the execution of the various programs.
 The operation unit 35 is configured by using a mouse, a keyboard, a touch panel, various switches, and the like, receives input of operations by the user U1, and outputs the content of the received operations to the control unit 32.
 [Processing of the information processing device]
 Next, the processing executed by the information processing device 1a will be described. FIG. 8 is a flowchart showing an outline of the processing executed by the information processing device.
 図8に示すように、まず、表示制御部323は、画像データ記録部343が記録する画像データに対応する画像を表示部20に表示させる(ステップS201)。この場合、表示制御部323は、操作部35の操作に応じて選択された画像データに対応する画像を表示部20に表示させる。 As shown in FIG. 8, first, the display control unit 323 causes the display unit 20 to display an image corresponding to the image data recorded by the image data recording unit 343 (step S201). In this case, the display control unit 323 causes the display unit 20 to display an image corresponding to the image data selected according to the operation of the operation unit 35.
 続いて、制御部32は、観察情報検出部30が生成した観察データ、及び音声入力部31が生成した音声データの各々と時間計測部33によって計測された時間とを対応付けて観察データ記録部341、及び音声データ記録部342に記録する(ステップS202)。 Subsequently, the control unit 32 associates each of the observation data generated by the observation information detection unit 30 and the voice data generated by the voice input unit 31 with the time measured by the time measurement unit 33, and records them in the observation data recording unit 341 and the voice data recording unit 342 (step S202).
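 As a purely illustrative aid (not part of the disclosed apparatus), the following Python sketch shows one way the time association performed in step S202 could be organized. All class and field names here are assumptions introduced for illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TimestampedRecord:
    """One observation or voice sample tagged with its capture time (hypothetical structure)."""
    timestamp: float   # seconds on a shared clock
    payload: object    # e.g. a gaze coordinate, a stage position, or an audio chunk

@dataclass
class SessionRecorder:
    """Minimal stand-in for the observation-data and voice-data recording units."""
    observation_records: list = field(default_factory=list)
    voice_records: list = field(default_factory=list)

    def record_observation(self, payload, timestamp=None):
        self.observation_records.append(
            TimestampedRecord(timestamp if timestamp is not None else time.monotonic(), payload))

    def record_voice(self, payload, timestamp=None):
        self.voice_records.append(
            TimestampedRecord(timestamp if timestamp is not None else time.monotonic(), payload))

# Both streams share the same clock, so they can later be aligned by time.
recorder = SessionRecorder()
recorder.record_observation({"gaze_x": 0.42, "gaze_y": 0.31})
recorder.record_voice(b"...audio chunk...")
```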
 その後、変換部111は、音声データ記録部342が記録する音声データを文字情報(テキストデータ)に変換する(ステップS203)。なお、このステップは後述のS206の後に行っても良い。 After that, the conversion unit 111 converts the voice data recorded by the voice data recording unit 342 into character information (text data) (step S203). In addition, this step may be performed after S206 described later.
 続いて、操作部35から表示部20が表示する画像の観察を終了する指示信号が入力された場合(ステップS204:Yes)、情報処理装置1aは、後述するステップS205へ移行する。これに対して、操作部35から表示部20が表示する画像の観察を終了する指示信号が入力されていない場合(ステップS204:No)、情報処理装置1aは、ステップS202へ戻る。 Subsequently, when an instruction signal for ending the observation of the image displayed by the display unit 20 is input from the operation unit 35 (step S204: Yes), the information processing apparatus 1a shifts to step S205 described later. On the other hand, when the instruction signal for ending the observation of the image displayed by the display unit 20 is not input from the operation unit 35 (step S204: No), the information processing apparatus 1a returns to step S202.
 ステップS205において、語句抽出部11は、音声データからキーワード(所定の語句)を抽出する。具体的には、抽出部112は、変換部111が音声データを変換したテキストデータから重要語句を所定の語句として抽出する。ここで、抽出部112は、重要語句を記憶する重要語句記憶部113を備え、重要語句記憶部113は、重要語句に属性を付与して記憶している。抽出部112は、1つの文節として発話されたテキストデータから各属性(語句クラス)に含まれる1つ以上の重要語句を抽出する。 In step S205, the phrase extraction unit 11 extracts a keyword (predetermined phrase) from the voice data. Specifically, the extraction unit 112 extracts important words and phrases as predetermined words and phrases from the text data converted by the conversion unit 111 from the voice data. Here, the extraction unit 112 includes an important word / phrase storage unit 113 for storing important words / phrases, and the important word / phrase storage unit 113 assigns attributes to important words / phrases and stores them. The extraction unit 112 extracts one or more important words / phrases included in each attribute (word / phrase class) from the text data spoken as one phrase.
 図9は、重要語句を分類する属性を表す図である。図9に示すように、語句抽出部11は、語句クラス1(第1の属性)として病変等の部位又は由来等を表す重要語句と、語句クラス2(第2の属性)として病変等の状態を表す重要語句とを抽出する。語句抽出部11が抽出した重要語句の組は、これらの重要語句の組が発声された時刻と関連付けられて記録部34へ記録される。 FIG. 9 is a diagram showing the attributes used to classify important phrases. As shown in FIG. 9, the phrase extraction unit 11 extracts, as phrase class 1 (first attribute), important phrases representing the site or origin of a lesion or the like, and, as phrase class 2 (second attribute), important phrases representing the state of the lesion or the like. The set of important phrases extracted by the phrase extraction unit 11 is recorded in the recording unit 34 in association with the time at which the set was uttered.
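 The attribute-based extraction described above can be illustrated with a minimal Python sketch. The phrase lists and function names below are invented for illustration and are not the dictionary contents or interface of the disclosed device.

```python
# Illustrative only: these phrase sets are made-up examples, not the patent's dictionary.
IMPORTANT_PHRASES = {
    "class1_site_or_origin": {"stomach", "colon", "adenoma"},   # first attribute: site/origin of a lesion
    "class2_state": {"inflammation", "ulcer", "benign"},        # second attribute: state of a lesion
}

def extract_important_phrases(clause_text: str) -> dict:
    """Return, per attribute class, the important phrases found in one spoken clause."""
    tokens = clause_text.lower().split()
    return {
        attribute: [t for t in tokens if t in phrases]
        for attribute, phrases in IMPORTANT_PHRASES.items()
    }

# A clause such as "benign adenoma in the colon" yields phrases in both classes.
print(extract_important_phrases("benign adenoma in the colon"))
# {'class1_site_or_origin': ['adenoma', 'colon'], 'class2_state': ['benign']}
```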
 図8に戻り、ステップS206以降の説明を続ける。
 ステップS206において、領域設定部12は、重要語句の組が発声された時間に対応する観察データを抽出し、抽出した観察データに基づいて、画像データに対象領域を設定する。
Returning to FIG. 8, the description after step S206 will be continued.
In step S206, the area setting unit 12 extracts the observation data corresponding to the time when the set of important words and phrases is uttered, and sets the target area in the image data based on the extracted observation data.
 図10は、領域設定部が対象領域を設定する様子を表す図である。図10に示すように、領域設定部12は、重要語句の組が発声された時間Speach1、2にそれぞれ対応する観察データとして視線データを抽出し、抽出した視線データに基づいて、画像データ内に利用者U1が注視した注視点B1、B2を設定する。人間の中心視は1~2度程度であるから、中心視を1.5度とし、利用者U1と表示部20との間の距離を50cmであるとすると、利用者U1が注視している領域は半径1.3cmの円とみなすことができる。この領域を含むように、領域設定部12は、注視点B1、B2を含む半径2~3cmの円形の領域を画像データにおける対象領域として設定する。ただし、領域設定部12は、注視点B1、B2を含む四角形の領域を画像データにおける対象領域として設定してもよい。そして、領域設定部12は、設定した対象領域を記録部34へ記録する。 FIG. 10 is a diagram showing how the area setting unit sets the target area. As shown in FIG. 10, the area setting unit 12 extracts line-of-sight data as the observation data corresponding to the times Speech1 and Speech2 at which the sets of important phrases were uttered, and, based on the extracted line-of-sight data, sets the gazing points B1 and B2 at which the user U1 gazed in the image data. Since human central vision covers about 1 to 2 degrees, if the central vision is taken to be 1.5 degrees and the distance between the user U1 and the display unit 20 to be 50 cm, the region the user U1 is gazing at can be regarded as a circle with a radius of about 1.3 cm. So as to include this region, the area setting unit 12 sets a circular region with a radius of 2 to 3 cm containing the gazing points B1 and B2 as the target area in the image data. Alternatively, the area setting unit 12 may set a rectangular region containing the gazing points B1 and B2 as the target area in the image data. The area setting unit 12 then records the set target area in the recording unit 34.
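 A minimal Python sketch of the geometric reasoning above, assuming the same 1.5-degree central vision and 50 cm viewing distance as in the description; the function names and the dictionary-based region representation are illustrative assumptions, not the disclosed implementation.

```python
import math

def gaze_region_radius_cm(viewing_distance_cm: float = 50.0,
                          central_vision_deg: float = 1.5) -> float:
    """Radius on the display covered by central vision at the given viewing distance."""
    return viewing_distance_cm * math.tan(math.radians(central_vision_deg))

def circular_target_region(gaze_x: float, gaze_y: float, radius_px: float) -> dict:
    """Represent the target area as a circle around a gazing point (image coordinates)."""
    return {"shape": "circle", "center": (gaze_x, gaze_y), "radius": radius_px}

# About 1.3 cm for 1.5-degree central vision at 50 cm, matching the figure in the description.
print(round(gaze_region_radius_cm(), 2))   # 1.31
```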
 続いて、生成部13は、重要語句の組と対象領域とを関連付けたラベル付き画像データを生成する(ステップS207)。さらに、生成部13は、重要語句の組と対象領域とを関連付けたラベル付き画像データを記録部34に出力し、情報処理装置1aは、本処理を終了する。 Subsequently, the generation unit 13 generates labeled image data in which the set of important words and phrases is associated with the target area (step S207). Further, the generation unit 13 outputs the labeled image data in which the set of important words and phrases is associated with the target area to the recording unit 34, and the information processing apparatus 1a ends the present processing.
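 As a hedged sketch of what such a labeled record might look like, the following container and field names are assumptions introduced here; they are not the data format actually used by the generation unit 13.

```python
from dataclasses import dataclass

@dataclass
class LabeledImageData:
    """Hypothetical container pairing a set of important phrases with its target area."""
    image_id: str
    target_region: dict   # e.g. the circular region produced in the previous sketch
    label: dict           # important phrases keyed by attribute class

def generate_labeled_image_data(image_id, target_region, phrase_set):
    return LabeledImageData(image_id=image_id, target_region=target_region, label=phrase_set)

sample = generate_labeled_image_data(
    "slide_0001",
    {"shape": "circle", "center": (812, 455), "radius": 96},
    {"class1_site_or_origin": ["colon"], "class2_state": ["benign"]},
)
```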
 以上説明した実施の形態2によれば、複数の属性を有する重要語句によって病変等の領域を抽出するため、画像をより詳細に分類することができる。また、領域設定部12が視線データに基づいて対象領域を設定するため、利用者U1が注視している領域をより適切に抽出することができる。なお、実施の形態2では、重要語句が2つの属性を有する例を説明したが、これに限られず、重要語句が3つ以上の属性を有していてもよい。 According to the second embodiment described above, since the region such as a lesion is extracted by the important words and phrases having a plurality of attributes, the images can be classified in more detail. Further, since the area setting unit 12 sets the target area based on the line-of-sight data, the area that the user U1 is gazing at can be extracted more appropriately. In the second embodiment, an example in which the important phrase has two attributes has been described, but the present invention is not limited to this, and the important phrase may have three or more attributes.
 なお、上述した実施の形態2では、領域設定部12は、視線データに基づいて、画像データ内に利用者U1が注視した注視点を設定したがこれに限られない。領域設定部12は、マウス等のポインティングデバイスの位置情報に基づいて、画像データ内に注視点を設定してもよい。そして、領域設定部12は、注視点を含む領域を画像データにおける対象領域として設定する。 In the second embodiment described above, the area setting unit 12 sets the gaze point that the user U1 gazes at in the image data based on the line-of-sight data, but the present invention is not limited to this. The area setting unit 12 may set a gazing point in the image data based on the position information of a pointing device such as a mouse. Then, the area setting unit 12 sets the area including the gazing point as the target area in the image data.
(実施の形態3)
 次に、本開示の実施の形態3について説明する。上述した実施の形態2では、情報処理装置1aのみで構成されていたが、実施の形態3では、顕微鏡システムの一部に情報処理装置を組み込むことによって構成する。以下においては、実施の形態3に係る顕微鏡システムの構成を説明後、実施の形態3に係る顕微鏡システムが実行する処理について説明する。なお、上述した実施の形態2に係る情報処理装置1aと同一の構成には同一の符号を付して詳細な説明は適宜省略する。
(Embodiment 3)
 Next, the third embodiment of the present disclosure will be described. While the second embodiment described above consists of the information processing apparatus 1a alone, the third embodiment is configured by incorporating the information processing apparatus into a part of a microscope system. In the following, the configuration of the microscope system according to the third embodiment is described first, and then the processing performed by the microscope system according to the third embodiment is described. The same components as those of the information processing apparatus 1a according to the second embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.
 〔顕微鏡システムの構成〕
 図11は、実施の形態3に係る顕微鏡システムの機能構成を示すブロック図である。図11に示すように、顕微鏡システム100は、情報処理装置1bと、表示部20と、音声入力部31と、操作部35と、顕微鏡200と、を備える。
[Structure of microscope system]
FIG. 11 is a block diagram showing a functional configuration of the microscope system according to the third embodiment. As shown in FIG. 11, the microscope system 100 includes an information processing device 1b, a display unit 20, a voice input unit 31, an operation unit 35, and a microscope 200.
 〔顕微鏡の構成〕
 まず、顕微鏡200の構成について説明する。
 顕微鏡200は、略C字状をなす筐体部201と、筐体部201に対して3次元方向に移動可能に取り付けられたステージ202と、互いに観察倍率が異なる複数の対物レンズ203を有し、ユーザの操作に応じて所望の対物レンズ203を配置するレボルバ204と、対物レンズ203を経由してステージ202上に載置された標本を撮像するCCD又はCMOS等で構成された撮像部205と、対物レンズ203を経由して標本の観察像を観察する接眼部206と、ユーザの操作に応じてステージ202を3次元方向に移動させる操作部207と、基準位置からのステージ202の位置を検出する位置検出部208と、エンコーダ等を用いて構成され、顕微鏡200が標本を観察する観察倍率を示す倍率情報を検出する倍率検出部209と、を備える。
[Microscope configuration]
First, the configuration of the microscope 200 will be described.
 The microscope 200 includes a substantially C-shaped housing 201; a stage 202 attached to the housing 201 so as to be movable in three-dimensional directions; a revolver 204 that holds a plurality of objective lenses 203 having mutually different observation magnifications and places a desired objective lens 203 in the optical path in accordance with a user operation; an imaging unit 205, composed of a CCD, a CMOS, or the like, that images a specimen placed on the stage 202 via the objective lens 203; an eyepiece unit 206 for observing the observation image of the specimen via the objective lens 203; an operation unit 207 that moves the stage 202 in three-dimensional directions in accordance with a user operation; a position detection unit 208 that detects the position of the stage 202 relative to a reference position; and a magnification detection unit 209, configured using an encoder or the like, that detects magnification information indicating the observation magnification at which the microscope 200 observes the specimen.
 〔情報処理装置の構成〕
 次に、情報処理装置1bの構成について説明する。
 情報処理装置1bは、上述した実施の形態2に係る情報処理装置1aの制御部32、及び記録部34に換えて、制御部32b、記録部34bと、を備える。
[Configuration of information processing equipment]
Next, the configuration of the information processing apparatus 1b will be described.
The information processing device 1b includes a control unit 32b and a recording unit 34b in place of the control unit 32 and the recording unit 34 of the information processing device 1a according to the second embodiment described above.
 制御部32bは、CPU、FPGA、及びGPU等を用いて構成され、表示部20、音声入力部31、及び顕微鏡200を制御する。制御部32bは、上述した実施の形態2の制御部32の観察情報検出制御部321、音声入力制御部322、表示制御部323に加えて、撮影制御部324、及び倍率算出部325をさらに備える。 The control unit 32b is configured by using a CPU, FPGA, GPU and the like, and controls the display unit 20, the voice input unit 31, and the microscope 200. The control unit 32b further includes an imaging control unit 324 and a magnification calculation unit 325 in addition to the observation information detection control unit 321, the voice input control unit 322, and the display control unit 323 of the control unit 32 of the second embodiment described above. ..
 観察情報検出制御部321は、位置検出部208が検出した基準位置からのステージ202の位置に基づいて、現在のステージ202の位置情報を算出し、この算出結果を記録部34bへ出力する。 The observation information detection control unit 321 calculates the current position information of the stage 202 based on the position of the stage 202 from the reference position detected by the position detection unit 208, and outputs the calculation result to the recording unit 34b.
 撮影制御部324は、撮像部205の動作を制御する。撮影制御部324は、撮像部205を所定のフレームレートに従って順次撮像させることによって画像データ(動画)を生成させる。撮影制御部324は、撮像部205から入力された画像データに対して処理の画像処理(例えば現像処理等)を施して記録部34bへ出力する。 The shooting control unit 324 controls the operation of the imaging unit 205. The shooting control unit 324 generates image data (moving image) by sequentially imaging the image pickup unit 205 according to a predetermined frame rate. The shooting control unit 324 performs image processing (for example, development processing) on the image data input from the image pickup unit 205 and outputs the image data to the recording unit 34b.
 倍率算出部325は、倍率検出部209から入力された検出結果に基づいて、現在の顕微鏡200の観察倍率を算出し、この算出結果を記録部34bへ出力する。例えば、倍率算出部325は、倍率検出部209から入力された対物レンズ203の倍率と接眼部206の倍率とに基づいて、現在の顕微鏡200の観察倍率を算出する。 The magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the detection result input from the magnification detection unit 209, and outputs this calculation result to the recording unit 34b. For example, the magnification calculation unit 325 calculates the observation magnification of the current microscope 200 based on the magnification of the objective lens 203 and the magnification of the eyepiece unit 206 input from the magnification detection unit 209.
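 As a simple illustration of this calculation: the description states only that the objective-lens magnification and the eyepiece magnification are both used, so the sketch below assumes the common case where the overall observation magnification is their product. The function name is an assumption introduced here.

```python
def total_observation_magnification(objective_mag: float, eyepiece_mag: float) -> float:
    """Overall magnification taken as the product of objective and eyepiece magnifications."""
    return objective_mag * eyepiece_mag

# e.g. a 40x objective with a 10x eyepiece gives 400x.
assert total_observation_magnification(40, 10) == 400
```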
 記録部34bは、揮発性メモリ、不揮発性メモリ、及び記録媒体等を用いて構成される。記録部34bは、上述した実施の形態2に係る画像データ記録部343に換えて、画像データ記録部345を備える。画像データ記録部345は、撮影制御部324から入力された画像データを記録し、この画像データを生成部13へ出力する。 The recording unit 34b is configured by using a volatile memory, a non-volatile memory, a recording medium, and the like. The recording unit 34b includes an image data recording unit 345 instead of the image data recording unit 343 according to the second embodiment described above. The image data recording unit 345 records the image data input from the shooting control unit 324, and outputs this image data to the generation unit 13.
 〔顕微鏡システムの処理〕
 次に、顕微鏡システム100が実行する処理について説明する。図12は、実施の形態3に係る顕微鏡システムが実行する処理の概要を示すフローチャートである。
[Processing of microscope system]
Next, the processing performed by the microscope system 100 will be described. FIG. 12 is a flowchart showing an outline of the processing performed by the microscope system according to the third embodiment.
 図12に示すように、まず、制御部32bは、観察情報検出制御部321が算出したステージ202の位置情報、及び倍率算出部325が算出した観察倍率を含む観察データ、音声入力部31が生成した音声データの各々を時間計測部33によって計測された時間と対応付けて観察データ記録部341、及び音声データ記録部342に記録する(ステップS301)。ステップS301の後、顕微鏡システム100は、後述するステップS302へ移行する。 As shown in FIG. 12, first, the control unit 32b records the observation data, which includes the position information of the stage 202 calculated by the observation information detection control unit 321 and the observation magnification calculated by the magnification calculation unit 325, and the voice data generated by the voice input unit 31 in the observation data recording unit 341 and the voice data recording unit 342, respectively, each in association with the time measured by the time measurement unit 33 (step S301). After step S301, the microscope system 100 proceeds to step S302 described later.
 ステップS302~ステップS304は、上述した図8のステップS203~ステップS205それぞれに対応する。ステップS304の後、顕微鏡システム100は、ステップS305へ移行する。 Steps S302 to S304 correspond to each of steps S203 to S205 in FIG. 8 described above. After step S304, the microscope system 100 proceeds to step S305.
 ステップS305において、領域設定部12は、重要語句が発声された時間に対応する観察データを抽出し、抽出した観察データに基づいて、画像データに対象領域を設定する。具体的には、領域設定部12は、重要語句が発声された時間に対応する観察データとしてステージ202の位置情報と観察倍率とを抽出し、この時刻における利用者の観察視野を記録した視野データを生成する。換言すると、視野データは、重要語句が発声された時間に表示部20に表示されている画像である。そして、領域設定部12は、この視野データが表す領域を画像データにおける対象領域として設定する。 In step S305, the area setting unit 12 extracts the observation data corresponding to the time at which the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 extracts the position information of the stage 202 and the observation magnification as the observation data corresponding to the time at which the important phrase was uttered, and generates visual field data recording the user's observation field of view at that time. In other words, the visual field data corresponds to the image displayed on the display unit 20 at the time the important phrase was uttered. The area setting unit 12 then sets the region represented by this visual field data as the target area in the image data.
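 A minimal sketch of how stage position and observation magnification could be mapped to a field-of-view rectangle on the specimen; the sensor dimensions, coordinate conventions, and function name below are invented placeholders, not values from the disclosure.

```python
def field_of_view_region(stage_x_um: float, stage_y_um: float,
                         magnification: float,
                         sensor_width_um: float = 13000.0,
                         sensor_height_um: float = 9750.0) -> dict:
    """Rectangle on the specimen seen at the current stage position and magnification.

    The sensor dimensions are placeholders; the field of view on the specimen shrinks
    in proportion to the observation magnification.
    """
    fov_w = sensor_width_um / magnification
    fov_h = sensor_height_um / magnification
    return {
        "x_min": stage_x_um - fov_w / 2, "x_max": stage_x_um + fov_w / 2,
        "y_min": stage_y_um - fov_h / 2, "y_max": stage_y_um + fov_h / 2,
    }

# At 20x this gives a 650 x 487.5 micrometre window around the stage position.
print(field_of_view_region(1000.0, 2000.0, magnification=20))
```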
 続いて、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを生成する(ステップS306)。さらに、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを記録部34bに出力し、顕微鏡システム100は、本処理を終了する。 Subsequently, the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S306). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34b, and the microscope system 100 ends this process.
 以上説明した実施の形態3によれば、利用者は、顕微鏡システム100を用いて標本を顕微鏡観察しながら録画した画像データに対して、病変等の対象領域を設定し、ラベル付き画像データを生成することができる。このとき、顕微鏡システム100によれば、利用者の音声により対象領域を設定できるため、利用者が観察中にキーボードやポインティングデバイス等を見ることにより、観察が中断されることがないため、観察効率を向上させることができる。 According to the third embodiment described above, the user can set a target area such as a lesion in the image data recorded while observing a specimen under the microscope using the microscope system 100, and can generate labeled image data. Moreover, according to the microscope system 100, the target area can be set by the user's voice, so the observation is not interrupted by the user looking at a keyboard, a pointing device, or the like during observation, and the observation efficiency can therefore be improved.
 なお、上述した実施の形態3では、領域設定部12は、ステージ202の位置情報、及び観察倍率を含む観察データを用いて対象領域を抽出したがこれに限られない。領域設定部12は、画像データを用いて、連続するフレーム間の相関関係を表す特徴量や、連続するフレーム間における移動ベクトルに基づいて、画像データに対象領域を設定してもよい。具体的には、領域設定部12は、連続するフレーム間の相関関係が大きい(類似度が高い)と判定した場合、利用者がこのフレームを注視していると判定し、このフレームを対象領域に設定してもよい。同様に、領域設定部12は、連続するフレーム間における移動ベクトルが小さい(画像間の移動量が小さい)と判定した場合、利用者がこのフレームを注視していると判定し、このフレームを対象領域に設定してもよい。 In the third embodiment described above, the area setting unit 12 extracts the target area using the observation data including the position information of the stage 202 and the observation magnification, but the present invention is not limited to this. The area setting unit 12 may set the target area in the image data based on a feature amount representing the correlation between consecutive frames of the image data, or on a motion vector between consecutive frames. Specifically, when the area setting unit 12 determines that the correlation between consecutive frames is large (the similarity is high), it may determine that the user is gazing at that frame and set the frame as the target area. Similarly, when the area setting unit 12 determines that the motion vector between consecutive frames is small (the amount of movement between the images is small), it may determine that the user is gazing at that frame and set the frame as the target area.
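 One possible realization of the frame-correlation idea, using a normalized correlation coefficient as the inter-frame feature amount; the threshold value and helper names are illustrative assumptions rather than the disclosed criterion.

```python
import numpy as np

def frame_similarity(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """Normalized correlation coefficient between two grayscale frames (1.0 = identical)."""
    a = prev_frame.astype(np.float64).ravel()
    b = curr_frame.astype(np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def is_gazed_frame(prev_frame, curr_frame, similarity_threshold: float = 0.95) -> bool:
    """Treat a frame as 'gazed at' when it barely changes from the previous one."""
    return frame_similarity(prev_frame, curr_frame) >= similarity_threshold

# Two identical frames are judged as a gazed-at (static) view.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 256, size=(64, 64))
f2 = f1.copy()
print(is_gazed_frame(f1, f2))   # True
```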
 また、顕微鏡200は、視線検出部を有していてもよい。視線検出部は、接眼部206の内部、又は外部に設けられ、利用者の視線を検出することによって視線データを生成し、この視線データを情報処理装置1bへ出力する。視線検出部は、接眼部206の内部に設けられ、近赤外線を照射するLED光源と、接眼部206の内部に設けられ、角膜上の瞳孔点と反射点を撮像する光学センサ(例えばCMOS、CCD)と、を用いて構成される。視線検出部は、情報処理装置1bの制御のもと、LED光源等から近赤外線を利用者の角膜に照射し、光学センサが利用者の角膜上の瞳孔点と反射点を撮像することによってデータを生成する。そして、視線検出部は、情報処理装置1bの制御のもと、光学センサによって生成されたデータに対して画像処理等によって解析した解析結果に基づいて、利用者の瞳孔点と反射点のパターンから利用者の視線を検出することによって視線データを生成し、この視線データを情報処理装置1bへ出力する。領域設定部12は、実施の形態2と同様に、視線データを用いて利用者が観察している画像内に注視点を設定し、この注視点を含むように病変等を表す対象領域を設定してもよい。 Further, the microscope 200 may have a line-of-sight detection unit. The line-of-sight detection unit is provided inside or outside the eyepiece unit 206, generates line-of-sight data by detecting the user's line of sight, and outputs the line-of-sight data to the information processing device 1b. The line-of-sight detection unit is configured using an LED light source that is provided inside the eyepiece unit 206 and emits near-infrared light, and an optical sensor (for example, a CMOS or a CCD) that is provided inside the eyepiece unit 206 and images the pupil point and the reflection point on the cornea. Under the control of the information processing device 1b, the line-of-sight detection unit irradiates the user's cornea with near-infrared light from the LED light source or the like, and the optical sensor captures the pupil point and the reflection point on the user's cornea. Then, under the control of the information processing device 1b, the line-of-sight detection unit generates line-of-sight data by detecting the user's line of sight from the pattern of the pupil point and the reflection point, based on the result of analyzing the data generated by the optical sensor by image processing or the like, and outputs the line-of-sight data to the information processing device 1b. As in the second embodiment, the area setting unit 12 may set a gazing point in the image observed by the user using the line-of-sight data, and set a target area representing a lesion or the like so as to include the gazing point.
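 A heavily simplified sketch of corneal-reflection gaze estimation of the kind described above, in which the gaze point is obtained from the pupil-to-reflection (glint) vector through calibration coefficients; all numbers, names, and the affine mapping itself are assumptions introduced for illustration.

```python
def gaze_point_from_eye_features(pupil_center, glint_center, calibration):
    """Map the pupil-to-corneal-reflection vector to display coordinates.

    A common simplification: the gaze position is taken to be an affine function of the
    vector between the pupil centre and the near-infrared reflection (glint). The
    calibration coefficients would normally be obtained by having the user fixate known
    points; the values used below are invented.
    """
    dx = pupil_center[0] - glint_center[0]
    dy = pupil_center[1] - glint_center[1]
    ax, bx, cx, ay, by, cy = calibration
    return (ax * dx + bx * dy + cx, ay * dx + by * dy + cy)

# Example with made-up calibration coefficients and eye-image coordinates (pixels).
calib = (25.0, 0.0, 960.0, 0.0, 25.0, 540.0)
print(gaze_point_from_eye_features(pupil_center=(312, 248), glint_center=(300, 240),
                                   calibration=calib))   # (1260.0, 740.0)
```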
(実施の形態4)
 次に、本開示の実施の形態4について説明する。実施の形態4では、内視鏡システムの一部に情報処理装置を組み込むことによって構成する。以下においては、実施の形態4に係る内視鏡システムの構成を説明後、実施の形態4に係る内視鏡システムが実行する処理について説明する。なお、上述した実施の形態2に係る情報処理装置1aと同一の構成には同一の符号を付して詳細な説明は適宜省略する。
(Embodiment 4)
Next, the fourth embodiment of the present disclosure will be described. In the fourth embodiment, the information processing device is incorporated into a part of the endoscope system. In the following, after explaining the configuration of the endoscope system according to the fourth embodiment, the processing executed by the endoscope system according to the fourth embodiment will be described. The same components as those of the information processing apparatus 1a according to the second embodiment described above are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
 〔内視鏡システムの構成〕
 図13は、実施の形態4に係る内視鏡システムの構成を示す概略図である。図14は、実施の形態4に係る内視鏡システムの機能構成を示すブロック図である。
[Configuration of endoscope system]
FIG. 13 is a schematic view showing the configuration of the endoscope system according to the fourth embodiment. FIG. 14 is a block diagram showing a functional configuration of the endoscope system according to the fourth embodiment.
 図13、及び図14に示す内視鏡システム300は、表示部20と、内視鏡400と、ウェアラブルデバイス500と、入力部600と、情報処理装置1cと、を備える。 The endoscope system 300 shown in FIGS. 13 and 14 includes a display unit 20, an endoscope 400, a wearable device 500, an input unit 600, and an information processing device 1c.
 〔内視鏡の構成〕
 まず、内視鏡400の構成について説明する。
 内視鏡400は、医者や術者等の利用者U3が被検体U4に挿入することによって、被検体U4の内部を撮像することによって画像データを生成し、この画像データを情報処理装置1cへ出力する。内視鏡400は、撮像部401と、操作部402と、を備える。
[Construction of endoscope]
First, the configuration of the endoscope 400 will be described.
 The endoscope 400 generates image data when a user U3 such as a doctor or an operator inserts it into a subject U4 and images the inside of the subject U4, and outputs this image data to the information processing device 1c. The endoscope 400 includes an imaging unit 401 and an operation unit 402.
 撮像部401は、内視鏡400の挿入部の先端部に設けられる。撮像部401は、情報処理装置1cの制御のもと、被検体U4の内部を撮像することによって画像データを生成し、この画像データを情報処理装置1cへ出力する。撮像部401は、観察倍率を変更することができる光学系と、光学系が結像した被写体像を受光することによって画像データを生成するCMOSやCCD等のイメージセンサ等を用いて構成される。 The imaging unit 401 is provided at the tip of the insertion unit of the endoscope 400. Under the control of the information processing device 1c, the image pickup unit 401 generates image data by imaging the inside of the subject U4 and outputs the image data to the information processing device 1c. The image pickup unit 401 is configured by using an optical system capable of changing the observation magnification, an image sensor such as CMOS or CCD that generates image data by receiving a subject image formed by the optical system, and the like.
 操作部402は、利用者U3の各種の操作の入力を受け付け、受け付けた各種操作に応じた操作信号を情報処理装置1cへ出力する。 The operation unit 402 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
 〔ウェアラブルデバイスの構成〕
 次に、ウェアラブルデバイス500の構成について説明する。
 ウェアラブルデバイス500は、利用者U3に装着され、利用者U3の視線を検出するとともに、利用者U3の音声の入力を受け付ける。ウェアラブルデバイス500は、視線検出部510と、音声入力部520と、を有する。
[Wearable device configuration]
Next, the configuration of the wearable device 500 will be described.
The wearable device 500 is attached to the user U3, detects the line of sight of the user U3, and accepts the voice input of the user U3. The wearable device 500 includes a line-of-sight detection unit 510 and a voice input unit 520.
 視線検出部510は、ウェアラブルデバイス500に設けられ、利用者U3の視線の注視度を検出することによって視線データを生成し、この視線データを情報処理装置1cへ出力する。視線検出部510は、上述した実施の形態3に係る視線検出部と同様の構成を有するため、詳細な構成は省略する。 The line-of-sight detection unit 510 is provided in the wearable device 500, generates line-of-sight data by detecting the gaze degree of the line of sight of the user U3, and outputs the line-of-sight data to the information processing device 1c. Since the line-of-sight detection unit 510 has the same configuration as the line-of-sight detection unit according to the third embodiment described above, a detailed configuration will be omitted.
 音声入力部520は、ウェアラブルデバイス500に設けられ、利用者U3の音声の入力を受け付けることによって音声データを生成し、この音声データを情報処理装置1cへ出力する。音声入力部520は、マイク等を用いて構成される。 The voice input unit 520 is provided in the wearable device 500, generates voice data by receiving the voice input of the user U3, and outputs the voice data to the information processing device 1c. The voice input unit 520 is configured by using a microphone or the like.
 〔入力部の構成〕
 入力部600の構成について説明する。
 入力部600は、マウス、キーボード、タッチパネル、及び各種のスイッチを用いて構成される。入力部600は、利用者U3の各種の操作の入力を受け付け、受け付けた各種操作に応じた操作信号を情報処理装置1cへ出力する。
[Structure of input unit]
The configuration of the input unit 600 will be described.
The input unit 600 is configured by using a mouse, a keyboard, a touch panel, and various switches. The input unit 600 receives inputs of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing device 1c.
 〔情報処理装置の構成〕
 次に、情報処理装置1cの構成について説明する。
 情報処理装置1cは、上述した実施の形態3に係る情報処理装置1bの制御部32b、記録部34bに換えて、制御部32c、及び記録部34cを備える。
[Configuration of information processing equipment]
Next, the configuration of the information processing apparatus 1c will be described.
The information processing device 1c includes a control unit 32c and a recording unit 34c in place of the control unit 32b and the recording unit 34b of the information processing device 1b according to the third embodiment described above.
 制御部32cは、CPU、FPGA、及びGPU等を用いて構成され、内視鏡400、ウェアラブルデバイス500、及び表示部20を制御する。制御部32cは、視線データ検出制御部321c、音声入力制御部322、表示制御部323、撮影制御部324に加えて、操作履歴検出部326を備える。 The control unit 32c is configured by using a CPU, FPGA, GPU and the like, and controls the endoscope 400, the wearable device 500, and the display unit 20. The control unit 32c includes an operation history detection unit 326 in addition to a line-of-sight data detection control unit 321c, a voice input control unit 322, a display control unit 323, and a shooting control unit 324.
 操作履歴検出部326は、内視鏡400の操作部402が入力を受け付けた操作の内容を検出し、この検出結果を記録部34cに出力する。具体的には、操作履歴検出部326は、内視鏡400の操作部402から拡大スイッチが操作された場合、この操作内容を検出し、この検出結果を記録部34cに出力する。なお、操作履歴検出部326は、内視鏡400を経由して被検体U4の内部に挿入される処置具の操作内容を検出し、この検出結果を記録部34cに出力してもよい。 The operation history detection unit 326 detects the content of the operation received by the operation unit 402 of the endoscope 400, and outputs the detection result to the recording unit 34c. Specifically, when the magnifying switch is operated from the operation unit 402 of the endoscope 400, the operation history detection unit 326 detects the operation content and outputs the detection result to the recording unit 34c. The operation history detection unit 326 may detect the operation content of the treatment tool inserted into the subject U4 via the endoscope 400 and output the detection result to the recording unit 34c.
 記録部34cは、揮発性メモリ、不揮発性メモリ、及び記録媒体等を用いて構成される。記録部34cは、上述した実施の形態3に係る記録部34bの構成に加えて、操作履歴記録部346をさらに備える。 The recording unit 34c is configured using a volatile memory, a non-volatile memory, a recording medium, and the like. The recording unit 34c further includes an operation history recording unit 346 in addition to the configuration of the recording unit 34b according to the third embodiment described above.
 操作履歴記録部346は、操作履歴検出部326から入力された内視鏡400の操作部402に対する操作の履歴を記録する。 The operation history recording unit 346 records the operation history for the operation unit 402 of the endoscope 400 input from the operation history detection unit 326.
 〔内視鏡システムの処理〕
 次に、内視鏡システム300が実行する処理について説明する。図15は、実施の形態4に係る内視鏡システムが実行する処理の概要を示すフローチャートである。
[Processing of endoscopic system]
Next, the process executed by the endoscope system 300 will be described. FIG. 15 is a flowchart showing an outline of the processing executed by the endoscope system according to the fourth embodiment.
 図15に示すように、まず、制御部32cは、撮像部401が生成した画像データ、視線検出部510が生成した視線データ、音声入力部520が生成した音声データ、及び操作履歴検出部326が検出した操作履歴の各々を時間計測部33によって計測された時間と対応付けて画像データ記録部345、視線データ記録部341c、音声データ記録部342、及び操作履歴記録部346に記録する(ステップS401)。ステップS401の後、内視鏡システム300は、後述するステップS402へ移行する。 As shown in FIG. 15, first, the control unit 32c records the image data generated by the imaging unit 401, the line-of-sight data generated by the line-of-sight detection unit 510, the voice data generated by the voice input unit 520, and the operation history detected by the operation history detection unit 326 in the image data recording unit 345, the line-of-sight data recording unit 341c, the voice data recording unit 342, and the operation history recording unit 346, respectively, each in association with the time measured by the time measurement unit 33 (step S401). After step S401, the endoscope system 300 proceeds to step S402 described later.
 ステップS402~ステップS404は、上述した図12のステップS302~ステップS304それぞれに対応する。ステップS404の後、内視鏡システム300は、ステップS405へ移行する。 Steps S402 to S404 correspond to each of steps S302 to S304 in FIG. 12 described above. After step S404, the endoscope system 300 proceeds to step S405.
 ステップS405において、領域設定部12は、重要語句が発声された時間に対応する観察データを抽出し、抽出した観察データに基づいて、画像データに対象領域を設定する。具体的には、領域設定部12は、重要語句が発声された時間に対応する観察データとして、画像データ記録部345に記録されている画像データにおいて連続するフレーム間の相関関係を表す特徴量を算出する。そして、領域設定部12は、連続するフレーム間の相関関係が大きい(類似度が高い)と判定した場合、利用者U3がこのフレームを注視していると判定し、このフレームを対象領域に設定する。また、領域設定部12は、画像データ記録部345に記録されている画像データにおいて連続するフレーム間における移動ベクトルを算出してもよい。この場合、領域設定部12は、移動ベクトルが小さい(画像間の移動量が小さい)と判定した場合、利用者U3がこのフレームを注視していると判定し、このフレームを対象領域に設定する。 In step S405, the area setting unit 12 extracts the observation data corresponding to the time at which the important phrase was uttered, and sets the target area in the image data based on the extracted observation data. Specifically, the area setting unit 12 calculates, as the observation data corresponding to the time at which the important phrase was uttered, a feature amount representing the correlation between consecutive frames of the image data recorded in the image data recording unit 345. When the area setting unit 12 determines that the correlation between consecutive frames is large (the similarity is high), it determines that the user U3 is gazing at that frame and sets the frame as the target area. Alternatively, the area setting unit 12 may calculate a motion vector between consecutive frames of the image data recorded in the image data recording unit 345. In this case, when the area setting unit 12 determines that the motion vector is small (the amount of movement between the images is small), it determines that the user U3 is gazing at that frame and sets the frame as the target area.
 続いて、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを生成する(ステップS406)。さらに、生成部13は、重要語句と対象領域とを関連付けたラベル付き画像データを記録部34cに出力し、内視鏡システム300は、本処理を終了する。 Subsequently, the generation unit 13 generates labeled image data in which important words and phrases are associated with the target area (step S406). Further, the generation unit 13 outputs labeled image data in which important words and phrases are associated with the target area to the recording unit 34c, and the endoscope system 300 ends this process.
 以上説明した実施の形態4によれば、利用者U3は、内視鏡システム300を用いて被検体U4の体内を内視鏡観察しながら録画した画像データに対して、病変等の対象領域を設定し、ラベル付き画像データを生成することができる。このとき、内視鏡システム300によれば、利用者U3の音声により対象領域を設定できるため、利用者U3が観察中にキーボードやポインティングデバイス等を見ることにより、観察が中断されることがないため、観察効率を向上させることができる。 According to the fourth embodiment described above, the user U3 can set a target area such as a lesion in the image data recorded while endoscopically observing the inside of the subject U4 using the endoscope system 300, and can generate labeled image data. Moreover, according to the endoscope system 300, the target area can be set by the voice of the user U3, so the observation is not interrupted by the user U3 looking at a keyboard, a pointing device, or the like during observation, and the observation efficiency can therefore be improved.
 なお、上述した実施の形態4では、領域設定部12は、連続するフレーム間の相関関係を表す特徴量や、連続するフレーム間における移動ベクトルに基づいて、対象領域を抽出したがこれに限られない。領域設定部12は、ウェアラブルデバイス500が生成した視線データを用いて、画像データに対象領域を設定してもよい。具体的には、領域設定部12は、実施の形態2と同様に、視線データを用いて利用者U3が観察している画像内に注視点を設定し、この注視点を含むように病変等を表す対象領域を設定してもよい。 In the fourth embodiment described above, the area setting unit 12 extracts the target area based on a feature amount representing the correlation between consecutive frames or on a motion vector between consecutive frames, but the present invention is not limited to this. The area setting unit 12 may set the target area in the image data using the line-of-sight data generated by the wearable device 500. Specifically, as in the second embodiment, the area setting unit 12 may set a gazing point in the image observed by the user U3 using the line-of-sight data, and set a target area representing a lesion or the like so as to include the gazing point.
 また、実施の形態4では、内視鏡システムであったが、例えばカプセル型の内視鏡、被検体を撮像するビデオマイクロスコープ、撮像機能を有する携帯電話、及び撮像機能を有するタブレット型端末であっても適用することができる。 Although the fourth embodiment has been described as an endoscope system, the present invention can also be applied to, for example, a capsule endoscope, a video microscope that images a subject, a mobile phone having an imaging function, and a tablet terminal having an imaging function.
 また、実施の形態4では、軟性の内視鏡を備えた内視鏡システムであったが、硬性の内視鏡を備えた内視鏡システム、工業用の内視鏡を備えた内視鏡システムであっても適用することができる。 Although the fourth embodiment has been described as an endoscope system including a flexible endoscope, the present invention can also be applied to an endoscope system including a rigid endoscope and to an endoscope system including an industrial endoscope.
 また、実施の形態4では、被検体に挿入される内視鏡を備えた内視鏡システムであったが、副鼻腔内視鏡、及び電気メスや検査プローブ等の内視鏡システムであっても適用することができる。 Although the fourth embodiment has been described as an endoscope system including an endoscope inserted into a subject, the present invention can also be applied to a sinus endoscope and to endoscope systems that use an electric scalpel, an examination probe, or the like.
(その他の実施の形態)
 上述した実施の形態1~4に開示されている複数の構成要素を適宜組み合わせることによって、種々の発明を形成することができる。例えば、上述した実施の形態1~4に記載した全構成要素からいくつかの構成要素を削除してもよい。さらに、上述した実施の形態1~4で説明した構成要素を適宜組み合わせてもよい。
(Other embodiments)
Various inventions can be formed by appropriately combining the plurality of components disclosed in the above-described embodiments 1 to 4. For example, some components may be deleted from all the components described in the above-described embodiments 1 to 4. Further, the components described in the above-described embodiments 1 to 4 may be appropriately combined.
 また、実施の形態1~4において、上述してきた「部」は、「手段」や「回路」などに読み替えることができる。例えば、制御部は、制御手段や制御回路に読み替えることができる。 Further, in the first to fourth embodiments, the above-mentioned "part" can be read as "means" or "circuit". For example, the control unit can be read as a control means or a control circuit.
 また、実施の形態1~4に係る情報処理装置に実行させるプログラムは、インストール可能な形式、又は実行可能な形式のファイルデータでCD-ROM、フレキシブルディスク(FD)、CD-R、DVD(Digital Versatile Disk)、USB媒体、フラッシュメモリ等のコンピュータで読み取り可能な記録媒体に記録されて提供される。 The programs to be executed by the information processing apparatuses according to the first to fourth embodiments are provided as file data in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disk), a USB medium, or a flash memory.
 また、実施の形態1~4に係る情報処理装置に実行させるプログラムは、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。さらに、実施の形態1~4に係る情報処理装置に実行させるプログラムをインターネット等のネットワーク経由で提供、又は配布するようにしてもよい。 Further, the program to be executed by the information processing apparatus according to the first to fourth embodiments may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program to be executed by the information processing apparatus according to the first to fourth embodiments may be provided or distributed via a network such as the Internet.
 また、実施の形態1~4では、伝送ケーブルを経由して各種機器から信号を送信していたが、例えば有線である必要はなく、無線であってもよい。この場合、所定の無線通信規格(例えばWi-Fi(登録商標)やBluetooth(登録商標))に従って、各機器から信号を送信するようにすればよい。もちろん、他の無線通信規格に従って無線通信を行ってもよい。 Further, in the first to fourth embodiments, signals are transmitted from various devices via a transmission cable, but for example, it does not have to be wired and may be wireless. In this case, signals may be transmitted from each device in accordance with a predetermined wireless communication standard (for example, Wi-Fi (registered trademark) or Bluetooth (registered trademark)). Of course, wireless communication may be performed according to other wireless communication standards.
 なお、本明細書におけるフローチャートの説明では、「まず」、「その後」、「続いて」等の表現を用いてステップ間の処理の前後関係を明示していたが、本発明を実施するために必要な処理の順序は、それらの表現によって一意的に定められるわけではない。即ち、本明細書で記載したフローチャートにおける処理の順序は、矛盾のない範囲で変更することができる。 In the descriptions of the flowcharts in the present specification, expressions such as "first", "then", and "subsequently" are used to clarify the order of processing between steps; however, the order of processing required to carry out the present invention is not uniquely determined by those expressions. That is, the order of processing in the flowcharts described in the present specification can be changed as long as no contradiction arises.
 以上、本願の実施の形態のいくつかを図面に基づいて詳細に説明したが、これらは例示であり、本発明の開示の欄に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 Although some of the embodiments of the present application have been described above in detail with reference to the drawings, these are merely examples, and the present invention can be carried out in other forms to which various modifications and improvements have been applied based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.
 1 情報処理システム
 10,1a,1b,1c 情報処理装置
 11 語句抽出部
 12 領域設定部
 13 生成部
 14,34,34b,34c 記録部
 15 表示制御部
 20 表示部
 30 観察情報検出部
 31,520 音声入力部
 32,32b,32c 制御部
 33 時間計測部
 35 操作部
 100 顕微鏡システム
 111 変換部
 112 抽出部
 200 顕微鏡
 201 筐体部
 202 ステージ
 203 対物レンズ
 204 レボルバ
 205 撮像部
 206 接眼部
 207 操作部
 208 位置検出部
 209 倍率検出部
 401 撮像部
 300 内視鏡システム
 321 観察情報検出制御部
 321c 視線データ検出制御部
 322 音声入力制御部
 324 撮影制御部
 325 倍率算出部
 326 操作履歴検出部
 341 観察データ記録部
 341c 視線データ記録部
 342 音声データ記録部
 343,345 画像データ記録部
 344 プログラム記録部
 345 画像データ記録部
 346 操作履歴記録部
 400 内視鏡
 402 操作部
 500 ウェアラブルデバイス
 510 視線検出部
 600 入力部

 
1 Information processing system
10, 1a, 1b, 1c Information processing device
11 Phrase extraction unit
12 Area setting unit
13 Generation unit
14, 34, 34b, 34c Recording unit
15 Display control unit
20 Display unit
30 Observation information detection unit
31, 520 Voice input unit
32, 32b, 32c Control unit
33 Time measurement unit
35 Operation unit
100 Microscope system
111 Conversion unit
112 Extraction unit
200 Microscope
201 Housing
202 Stage
203 Objective lens
204 Revolver
205 Imaging unit
206 Eyepiece unit
207 Operation unit
208 Position detection unit
209 Magnification detection unit
401 Imaging unit
300 Endoscope system
321 Observation information detection control unit
321c Line-of-sight data detection control unit
322 Voice input control unit
324 Imaging control unit
325 Magnification calculation unit
326 Operation history detection unit
341 Observation data recording unit
341c Line-of-sight data recording unit
342 Voice data recording unit
343, 345 Image data recording unit
344 Program recording unit
345 Image data recording unit
346 Operation history recording unit
400 Endoscope
402 Operation unit
500 Wearable device
510 Line-of-sight detection unit
600 Input unit

Claims (14)

  1.  画像データを観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出する語句抽出部と、
     前記画像データに対する観察態様を記録した観察データから、前記所定の語句が発声された時間に対応する前記観察データを抽出し、抽出した前記観察データに基づいて、前記画像データに対象領域を設定する領域設定部と、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成する生成部と、
     を備える情報処理装置。
    An information processing device comprising:
    a phrase extraction unit that extracts a predetermined phrase from voice data in which a voice uttered by a user while observing image data is recorded;
    an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time when the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data; and
    a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
  2.  前記語句抽出部は、
     前記音声データをテキストデータに変換する変換部と、
     前記テキストデータから記録部に記録されている重要語句を前記所定の語句として抽出する抽出部と、
     を有する請求項1に記載の情報処理装置。
    The information processing device according to claim 1, wherein the phrase extraction unit includes:
    a conversion unit that converts the voice data into text data; and
    an extraction unit that extracts, from the text data, an important phrase recorded in a recording unit as the predetermined phrase.
  3.  前記抽出部は、前記重要語句を記憶する重要語句記憶部を備え、
     前記重要語句記憶部は、前記重要語句に所定の属性を付与して記憶しており、
     前記抽出部は、前記テキストデータから各属性に含まれる1つ以上の前記重要語句を前記所定の語句として抽出する請求項2に記載の情報処理装置。
    The information processing device according to claim 2, wherein the extraction unit includes an important phrase storage unit that stores the important phrases,
    the important phrase storage unit stores the important phrases with predetermined attributes assigned to them, and
    the extraction unit extracts, from the text data, one or more of the important phrases belonging to each attribute as the predetermined phrase.
  4.  前記重要語句の属性は、
     病変の部位又は由来を表す第1の属性と、
     病変の状態を表す第2の属性と、
     を含む請求項3に記載の情報処理装置。
    The information processing device according to claim 3, wherein the attributes of the important phrases include:
    a first attribute representing the site or origin of a lesion; and
    a second attribute representing the state of the lesion.
  5.  前記テキストデータは、1つの文節として発話されたテキストデータであり、
     前記抽出部は、前記テキストデータから1つ以上の前記重要語句を抽出し、前記対象領域に対応するラベルが1つ以上の属性に帰属する前記重要語句の組で表される請求項4に記載の情報処理装置。
    The information processing device according to claim 4, wherein the text data is text data uttered as one phrase, and
    the extraction unit extracts one or more of the important phrases from the text data, and a label corresponding to the target area is represented by a set of the important phrases belonging to one or more of the attributes.
  6.  前記領域設定部は、観察倍率が大きい前記観察データを優先的に抽出し、
     前記生成部は、前記観察倍率が大きいほど高い重要度を前記ラベル付き画像データに付与する請求項1~5のいずれか1つに記載の情報処理装置。
    The information processing device according to any one of claims 1 to 5, wherein the area setting unit preferentially extracts the observation data having a larger observation magnification, and
    the generation unit assigns a higher importance to the labeled image data as the observation magnification is larger.
  7.  前記領域設定部は、観察領域の移動速度が小さい前記観察データを優先的に抽出し、
     前記生成部は、前記観察領域の移動速度が小さいほど高い重要度を前記ラベル付き画像データに付与する請求項1~5のいずれか1つに記載の情報処理装置。
    The information processing device according to any one of claims 1 to 5, wherein the area setting unit preferentially extracts the observation data in which the moving speed of the observation region is smaller, and
    the generation unit assigns a higher importance to the labeled image data as the moving speed of the observation region is smaller.
  8.  前記領域設定部は、観察領域が所定の領域内に停留する時間が長い前記観察データを優先的に抽出し、
     前記生成部は、前記観察領域が所定の領域内に停留する時間が長いほど高い重要度を前記ラベル付き画像データに付与する請求項1~5のいずれか1つに記載の情報処理装置。
    The information processing device according to any one of claims 1 to 5, wherein the area setting unit preferentially extracts the observation data in which the observation region stays within a predetermined region for a longer time, and
    the generation unit assigns a higher importance to the labeled image data as the time during which the observation region stays within the predetermined region is longer.
  9.  前記観察データは、前記利用者の視線を検出した視線データを含む請求項1~8のいずれか1つに記載の情報処理装置。 The information processing device according to any one of claims 1 to 8, wherein the observation data includes line-of-sight data for detecting the line-of-sight of the user.
  10.  前記観察データは、前記利用者の観察視野を記録した視野データを含む請求項1~9のいずれか1つに記載の情報処理装置。 The information processing device according to any one of claims 1 to 9, wherein the observation data includes visual field data recording the observation visual field of the user.
  11.  前記観察データは、ポインティングデバイスの位置情報を含む請求項1~10のいずれか1つに記載の情報処理装置。 The information processing device according to any one of claims 1 to 10, wherein the observation data includes position information of a pointing device.
  12.  フレーム画像を観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出する語句抽出部と、
     前記所定の語句が発声された時間に対応する前記フレーム画像を抽出し、抽出した前記フレーム画像を対象領域に設定する領域設定部と、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成する生成部と、
     を備える情報処理装置。
    An information processing device comprising:
    a phrase extraction unit that extracts a predetermined phrase from voice data in which a voice uttered by a user while observing frame images is recorded;
    an area setting unit that extracts the frame image corresponding to the time when the predetermined phrase was uttered and sets the extracted frame image as a target area; and
    a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area.
  13.  画像データを観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出する語句抽出部と、
     前記画像データに対する観察態様を記録した観察データから、前記所定の語句が発声された時間に対応する前記観察データを抽出し、抽出した前記観察データに基づいて、前記画像データに対象領域を設定する領域設定部と、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成する生成部と、
     前記ラベル付き画像データを用いた機械学習を行い、画像群から前記所定の語句に対応する領域を検出する学習済みモデルを生成するモデル生成部と、
     を備える学習装置。
    A learning device comprising:
    a phrase extraction unit that extracts a predetermined phrase from voice data in which a voice uttered by a user while observing image data is recorded;
    an area setting unit that extracts, from observation data in which an observation mode for the image data is recorded, the observation data corresponding to the time when the predetermined phrase was uttered, and sets a target area in the image data based on the extracted observation data;
    a generation unit that generates labeled image data in which the predetermined phrase is associated with the target area; and
    a model generation unit that performs machine learning using the labeled image data and generates a trained model that detects a region corresponding to the predetermined phrase from a group of images.
  14.  学習用画像データを観察しながら利用者が発声した音声を記録した音声データから所定の語句を抽出し、
     前記学習用画像データに対する観察態様を記録した観察データから、前記所定の語句が発声された時間に対応する前記観察データを抽出し、抽出した前記観察データに基づいて、前記学習用画像データに対象領域を設定し、
     前記所定の語句と前記対象領域とを関連付けたラベル付き画像データを生成し、
     前記ラベル付き画像データを用いた機械学習により生成され、判定用画像データが入力された際に、前記所定の語句に関連付けられた前記対象領域と推定する領域を抽出し、該領域が前記対象領域である尤度を出力するよう、
     コンピュータを機能させるための学習済みモデル。
    A trained model for causing a computer to function so as to, when image data for determination is input, extract a region estimated to be the target area associated with a predetermined phrase and output a likelihood that the region is the target area,
    the trained model being generated by machine learning using labeled image data in which the predetermined phrase is associated with the target area, wherein
    the predetermined phrase is extracted from voice data in which a voice uttered by a user while observing learning image data is recorded,
    the observation data corresponding to the time when the predetermined phrase was uttered is extracted from observation data in which an observation mode for the learning image data is recorded, and the target area is set in the learning image data based on the extracted observation data, and
    the labeled image data is generated by associating the predetermined phrase with the target area.
PCT/JP2020/031904 2020-08-24 2020-08-24 Information processing device, learning device, and learned model WO2022044095A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/031904 WO2022044095A1 (en) 2020-08-24 2020-08-24 Information processing device, learning device, and learned model

Publications (1)

Publication Number Publication Date
WO2022044095A1 true WO2022044095A1 (en) 2022-03-03

Family

ID=80354819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/031904 WO2022044095A1 (en) 2020-08-24 2020-08-24 Information processing device, learning device, and learned model

Country Status (1)

Country Link
WO (1) WO2022044095A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019200651A (en) * 2018-05-17 2019-11-21 オリンパス株式会社 Information processor, method for processing information, and program
JP2019533847A (en) * 2016-08-12 2019-11-21 ヴェリリー ライフ サイエンシズ エルエルシー Advanced pathological diagnosis

Similar Documents

Publication Publication Date Title
JP7064952B2 (en) Information processing equipment, information processing methods and programs
JP7171985B2 (en) Information processing device, information processing method, and program
JP5455550B2 (en) Processor for electronic endoscope
US10754425B2 (en) Information processing apparatus, information processing method, and non-transitory computer readable recording medium
JP2004181229A (en) System and method for supporting remote operation
JP2007293818A (en) Image-recording device, image-recording method, and image-recording program
JPWO2018168261A1 (en) CONTROL DEVICE, CONTROL METHOD, AND PROGRAM
JP2017213097A (en) Image processing device, image processing method, and program
JP2015093147A (en) Medical system
JP2011215856A (en) Information processing system and information processing method
JP7141938B2 (en) Voice recognition input device, voice recognition input program and medical imaging system
WO2022044095A1 (en) Information processing device, learning device, and learned model
JP2018028562A (en) Medical image display device and image interpretation report generation assistance device
EP3603476A1 (en) Medical system control device, medical system control method, and medical system
JP2019202131A (en) Information processing apparatus, information processing method, and program
JP2018047067A (en) Image processing program, image processing method, and image processing device
US11883120B2 (en) Medical observation system, medical signal processing device, and medical signal processing device driving method
WO2022070423A1 (en) Information processing device, method for operating information processing device, and program
JP2006221583A (en) Medical treatment support system
WO2018138834A1 (en) Information recording system, information recording device, and information recording method
EP3731073A1 (en) Information processing device, information processing method, and program
KR20160122869A (en) Apparatus for being possible language converting using robot arm
US10971174B2 (en) Information processing apparatus, information processing method, and non-transitory computer readable recording medium
WO2021144970A1 (en) Information processing device, information processing method, and program
JP6095875B2 (en) MEDICAL DIAGNOSIS DEVICE, MEDICAL DIAGNOSIS DEVICE OPERATING METHOD, AND MEDICAL DIAGNOSIS DEVICE OPERATING PROGRAM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20951350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20951350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP