CN112885435A - Method, device and system for determining image target area - Google Patents

Method, device and system for determining image target area

Info

Publication number
CN112885435A
CN112885435A
Authority
CN
China
Prior art keywords
attention
determining
target
region
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911195964.5A
Other languages
Chinese (zh)
Other versions
CN112885435B (en)
Inventor
王纯亮 (Wang Chunliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Tuoying Technology Co ltd
Original Assignee
Tianjin Tuoying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Tuoying Technology Co ltd filed Critical Tianjin Tuoying Technology Co ltd
Priority to CN201911195964.5A
Priority to PCT/CN2020/075056
Publication of CN112885435A
Application granted
Publication of CN112885435B
Legal status: Active

Classifications

    • G16H 30/00: ICT specially adapted for the handling or processing of medical images
    • G06F 18/00: Pattern recognition
    • G06N 20/00: Machine learning
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06V 40/18: Eye characteristics, e.g. of the iris
    • G06V 40/19: Sensors therefor
    • G06V 40/193: Preprocessing; Feature extraction
    • G16H 50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

The disclosure relates to a method, a device, and a system for determining a target area of an image, in the technical field of image processing. The method comprises the following steps: determining each attention position of a user on a target picture according to eye movement information of the user collected while the user observes the target picture; extracting each attention region on the target picture using a machine learning model; and determining a target area on the target picture according to each attention position and each attention region.

Description

Method, device and system for determining image target area
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a system for determining an image target area, and to a computer-readable storage medium.
Background
At present, computer technology serves as an important auxiliary processing means in many technical fields. For example, computer-aided diagnosis and computer-aided detection combine imaging, medical image processing, and other available physiological and biochemical measurements with computer analysis and calculation to help locate lesions, thereby improving diagnostic accuracy. Extracting the various kinds of information in an image, as the object of such analysis and calculation, is therefore particularly important.
In the related art, important information in an image may be extracted by artificial-intelligence methods such as deep neural networks.
Disclosure of Invention
The inventors of the present disclosure found the following problem in the above related art: the important information extracted from an image is often not the information actually required, so the accuracy and efficiency of image processing are low.
In view of this, the present disclosure provides a technical solution for determining an image target area that improves the accuracy and efficiency of image processing.
According to some embodiments of the present disclosure, there is provided a method for determining an image target area, comprising: determining each attention position of a user on a target picture according to eye movement information of the user collected while the user observes the target picture; extracting each attention region on the target picture using a machine learning model; and determining a target area on the target picture according to each attention position and each attention region.
In some embodiments, determining the target area on the target picture according to each attention position and each attention region comprises: determining the position attention of the user for each attention position according to the eye movement information; and determining the target area among the attention regions according to the position attentions.
In some embodiments, determining the target area among the attention regions according to the position attentions comprises: determining the region attention of an attention region according to the position attentions of the attention positions contained in the region; and determining the attention region as the target area when its region attention is smaller than a threshold.
In some embodiments, determining the position attention of the user for each attention position according to the eye movement information comprises: determining, according to the eye movement information, the gaze time of the user at each attention position for determining the position attention.
In some embodiments, determining each attention position of the user on the target picture according to the eye movement information comprises: determining each gaze point of the user on the target picture according to the eye movement information; and determining each attention position according to the track formed by the gaze points.
In some embodiments, the eye movement information comprises at least one of the movement of the eyeball relative to the head or the position of the eyeball.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of the attention positions of a user on each training picture and the corresponding position attentions, the training pictures being pictures of the same type as the target picture; and training the machine learning model with each training picture and its attention information as input and each attention region of the training picture as the labeling result.
According to further embodiments of the present disclosure, there is provided an apparatus for determining an image target area, comprising: a position determining unit, configured to determine each attention position of a user on a target picture according to eye movement information of the user collected while the user observes the target picture; an extraction unit, configured to extract each attention region on the target picture using a machine learning model; and a region determining unit, configured to determine a target area on the target picture according to each attention position and each attention region.
In some embodiments, the region determining unit determines the position attention of the user for each attention position according to the eye movement information, and determines the target area among the attention regions according to the position attentions.
In some embodiments, the region determining unit determines the region attention of an attention region according to the position attentions of the attention positions contained in the region, and determines the attention region as the target area when its region attention is smaller than a threshold.
In some embodiments, the region determining unit determines, according to the eye movement information, the gaze time of the user at each attention position to determine the position attention.
In some embodiments, the position determining unit determines each gaze point of the user on the target picture according to the eye movement information, and determines each attention position according to the track formed by the gaze points.
In some embodiments, the eye movement information comprises at least one of the movement of the eyeball relative to the head or the position of the eyeball.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of the attention positions of a user on each training picture and the corresponding position attentions, the training pictures being pictures of the same type as the target picture; and training the machine learning model with each training picture and its attention information as input and each attention region of the training picture as the labeling result.
According to still other embodiments of the present disclosure, there is provided an apparatus for determining an image target area, comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method for determining an image target area of any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of determining an image target area in any of the above-described embodiments.
According to still further embodiments of the present disclosure, there is provided a system for determining an image target area, comprising: the apparatus for determining an image target area of any of the above embodiments; and an eye tracker, configured to acquire the eye movement information of the user while the user observes the target picture.
In the above embodiments, the important information in a picture is determined by combining the attention positions obtained from the user's eye movement with the attention regions extracted by the machine learning model. Combining the user's actual attention needs with the high performance of artificial intelligence in this way improves the accuracy and efficiency of image processing.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure can be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of an image target area determination method of the present disclosure;
FIG. 2 illustrates a flow diagram of some embodiments of step 130 in FIG. 1;
FIG. 3 illustrates a flow diagram of some embodiments of step 1320 in FIG. 2;
FIG. 4 illustrates a flow diagram of further embodiments of the image target area determination method of the present disclosure;
FIG. 5 illustrates a block diagram of some embodiments of an image target area determination apparatus of the present disclosure;
FIG. 6 shows a block diagram of further embodiments of the image target area determination apparatus of the present disclosure;
FIG. 7 illustrates a block diagram of still further embodiments of an image target area determination apparatus of the present disclosure;
fig. 8 illustrates a block diagram of some embodiments of the image target area determination system of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 illustrates a flow diagram of some embodiments of an image target area determination method of the present disclosure.
As shown in fig. 1, the method includes: step 110, determining attention positions from eye movement; step 120, determining attention regions with a machine learning model; and step 130, determining the target area.
In step 110, each attention position of the user on the target picture is determined according to eye movement information of the user collected while the user observes the target picture. For example, the eye movement information includes at least one of the movement of the eyeball relative to the head or the position of the eyeball.
In some embodiments, eye tracking may be performed on the user: eye movement is tracked by measuring the position of the gaze point or the movement of the eye relative to the head. For example, the eyeball position may be tracked and measured by an eye tracker (e.g., via a video camera) to obtain the eye movement information.
In some embodiments, a screen-based eye tracker may present the target picture and track and measure the eye movement; alternatively, a device such as eye-tracking glasses, which captures the target picture from the observer's perspective, may acquire the target picture and track and measure the eye movement.
For example, a screen-based eye tracker can treat a medical image displayed on a screen as the target picture and track the eye movement of a diagnostician for computer-aided detection; eye-tracking glasses can capture, in real time, a target picture containing the vehicles or traffic signs seen by a driver and track the driver's eye movement for computer-aided driving.
In some embodiments, each gaze point of the user on the target picture is determined according to the eye movement information, and each attention position is determined according to the track formed by the gaze points.
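As an illustration of this step, the following sketch groups raw gaze samples into attention positions. It is a minimal sketch only, assuming (x, y, timestamp) gaze samples in picture coordinates and a dispersion-based grouping (the I-DT fixation algorithm); the disclosure does not prescribe a specific algorithm, and the thresholds here are illustrative.

```python
# A minimal sketch, not the disclosure's prescribed algorithm: dispersion-based
# (I-DT) grouping of gaze samples into fixations; each sufficiently long
# fixation becomes an attention position together with its gaze time.
from typing import List, Tuple

GazeSample = Tuple[float, float, float]  # (x, y, timestamp in seconds)

def attention_positions(samples: List[GazeSample],
                        max_dispersion: float = 25.0,  # pixels (assumed)
                        min_duration: float = 0.1      # seconds (assumed)
                        ) -> List[Tuple[float, float, float]]:
    """Return (centroid_x, centroid_y, gaze_time) for each detected fixation."""
    positions: List[Tuple[float, float, float]] = []
    window: List[GazeSample] = []

    def flush(win: List[GazeSample]) -> None:
        # Keep the window as an attention position if it lasted long enough.
        if win and win[-1][2] - win[0][2] >= min_duration:
            cx = sum(s[0] for s in win) / len(win)
            cy = sum(s[1] for s in win) / len(win)
            positions.append((cx, cy, win[-1][2] - win[0][2]))

    for sample in samples:
        window.append(sample)
        xs = [s[0] for s in window]
        ys = [s[1] for s in window]
        # Dispersion of the window: (max_x - min_x) + (max_y - min_y).
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            flush(window[:-1])  # the samples so far formed one fixation
            window = [sample]   # restart the window at the outlying sample
    flush(window)
    return positions
```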
In step 120, each attention region on the target picture is extracted using a machine learning model. For example, the model can be trained to extract face regions in portrait pictures, lesion regions in medical pictures, and so on.
In some embodiments, the machine learning model may be any neural network model capable of extracting image features. For example, a convolutional neural network model may be used to determine the attention regions on the target picture.
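For illustration only, a minimal sketch of such a convolutional extractor follows; PyTorch is an assumed framework and the tiny architecture is an assumption, since the disclosure requires only "a machine learning model". The network predicts a per-pixel mask whose connected components can be taken as attention regions.

```python
# A minimal sketch, assuming PyTorch and single-channel input pictures.
import torch
import torch.nn as nn

class RegionExtractor(nn.Module):
    """Predicts a per-pixel probability of belonging to an attention region;
    thresholding the mask and taking connected components yields regions."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),  # per-pixel logit
        )

    def forward(self, picture: torch.Tensor) -> torch.Tensor:
        # picture: (batch, 1, H, W) -> mask in [0, 1] of the same spatial size
        return torch.sigmoid(self.net(picture))
```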
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of the attention positions of a user on each training picture and the corresponding position attentions, the training pictures being pictures of the same type as the target picture; and training the machine learning model with each training picture and its attention information as input and each attention region of the training picture as the labeling result.
For example, for a computer-aided detection scenario, the visual-tracking heat maps of medical experts can be recorded while the experts observe many medical image pictures; the acquired heat maps (after thresholding, for example) are then used as the training outputs of the machine learning model.
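A minimal sketch of such thresholding follows. It assumes the recorded heat map arrives as a 2-D numpy array; the 0.5 threshold is an assumed tunable value, not taken from the disclosure.

```python
# A minimal sketch: normalize a gaze heat map and binarize it to obtain the
# labeling result (attention-region mask) used to train the model.
import numpy as np

def heatmap_to_labels(heatmap: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Normalize to [0, 1], then binarize; 1 marks an attention region."""
    h = heatmap.astype(np.float32)
    h = (h - h.min()) / max(float(h.max() - h.min()), 1e-8)
    return (h >= threshold).astype(np.uint8)
```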
After training, inputting a medical image picture into the machine learning model yields an estimate of the positions an expert in the field would attend to. The "expert in the field" may be the user himself, in which case the estimate reflects the observations the user would make when fully alert.
In this way, even if the user is not alert enough while viewing the image (e.g., dozing), the machine learning model can still indicate the missed "key points", i.e., the positions an "expert in the field" would attend to.
In step 130, the target area on the target picture is determined according to each attention position and each attention region.
In some embodiments, an attention region is determined to be the target area when its overlap with the attention positions exceeds a threshold. In this case, the target area is both the important information required by the user and an important region screened out by artificial intelligence, and can be used as important information for further processing such as face recognition, target tracking, or medical diagnosis.
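A minimal sketch of this overlap rule follows, assuming attention positions are (x, y, gaze_time) tuples, attention regions are axis-aligned boxes (x0, y0, x1, y1), and a simple hit count stands in for the unspecified overlap threshold.

```python
# A minimal sketch: keep the attention regions overlapped by enough
# attention positions; min_hits is an assumed stand-in for the threshold.
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def contains(box: Box, x: float, y: float) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def target_regions(regions: Iterable[Box],
                   positions: List[Tuple[float, float, float]],
                   min_hits: int = 1) -> List[Box]:
    return [r for r in regions
            if sum(contains(r, x, y) for x, y, _ in positions) >= min_hits]
```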
In some embodiments, step 130 may be performed by the embodiment in fig. 2.
Fig. 2 illustrates a flow diagram of some embodiments of step 130 in fig. 1.
As shown in fig. 2, step 130 includes: step 1310, determining the position attentions; and step 1320, determining the target area.
In step 1310, the position attention of the user for each attention position is determined according to the eye movement information. For example, the gaze time of the user at each attention position may be derived from the eye movement information to determine the position attention. The attention for a position may also be determined from other factors, such as pupil changes and eyeball rotation.
In step 1320, the target area is determined among the attention regions according to the position attentions.
In some embodiments, an attention region may be determined as the target area when the attention of its corresponding positions is greater than a threshold. In this case, too, the target area is both the important information required by the user and an important region screened out by artificial intelligence, and can be used as important information for further processing such as face recognition, target tracking, or medical diagnosis.
In some embodiments, step 1320 may be performed by the embodiment in fig. 3.
FIG. 3 illustrates a flow diagram for some embodiments of step 1320 in FIG. 2.
As shown in fig. 3, step 1320 includes: step 310, determining the region attention; and step 320, determining the target area.
In step 310, the region attention of an attention region is determined according to the position attentions of the attention positions contained in the region. For example, an attention region is considered to contain an attention position when their overlap area is larger than an area threshold.
In step 320, an attention region whose region attention is smaller than the threshold is determined as the target area.
In this case, the target area determined by artificial intelligence may be important information required by the user to which the user has not yet paid sufficient attention. Presenting this area to the user as important information for further processing such as face recognition, target tracking, or medical diagnosis improves the accuracy and efficiency of image processing.
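A minimal sketch of steps 310 and 320 follows, assuming position attention is the gaze time of each attention position and region attention is the sum of the gaze times of the positions inside the region; the attention threshold is an assumed tunable value.

```python
# A minimal sketch: sum the gaze times of positions inside each region;
# under-attended regions become the target areas to surface to the user.
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Position = Tuple[float, float, float]    # (x, y, gaze_time)

def region_attention(region: Box, positions: Iterable[Position]) -> float:
    x0, y0, x1, y1 = region
    return sum(t for x, y, t in positions if x0 <= x <= x1 and y0 <= y <= y1)

def under_attended_regions(regions: Iterable[Box],
                           positions: List[Position],
                           threshold: float = 0.5) -> List[Box]:
    return [r for r in regions if region_attention(r, positions) < threshold]
```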
In some embodiments, the target picture is a monitoring picture, the attention position is a position attended to by the monitoring person, and the attention region is a suspected face region.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region. For example, the embodiment of fig. 4 may perform computer-aided detection on medical image pictures.
Fig. 4 shows a flow chart of further embodiments of the image target area determination method of the present disclosure.
As shown in fig. 4, the method includes: step 410, inputting a medical image picture; step 420, performing eye movement tracking; step 430, generating a heat map; step 440, artificial intelligence detection; step 450, determining suspected lesion regions; and step 460, determining the prompt area.
In step 410, the medical image picture is input into the system, so that the imaging physician can read it on a display device and the computer can process it accordingly. For example, the medical image may be generated by magnetic resonance equipment, CT (Computed Tomography) equipment, DR (Digital Radiography) equipment, ultrasound equipment, an X-ray machine, and the like.
In step 420, while the physician reads the image, an eye tracker records the track of the physician's gaze point over the whole reading session, and the physician's attention at each point, or over certain areas, is determined from this track. For example, the attention may be determined from the gaze time. The gaze point is the basic unit of measurement of the eye tracker: one gaze point equals one raw sample captured by the eye tracker.
In step 430, a heat map is generated from the attention. For example, the longer the physician gazes at a position on the medical image, the darker the color of the corresponding region of the image's heat map. The heat map may then be divided into several physician regions of interest (e.g., by clustering) according to the color depth.
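A minimal sketch of this heat-map generation follows, assuming fixations as (x, y, gaze_time) tuples in image coordinates and one Gaussian splat per fixation; the smoothing parameter sigma is an assumption, not taken from the disclosure.

```python
# A minimal sketch: accumulate a gaze-time-weighted Gaussian per fixation,
# so positions the physician gazed at longer receive larger values.
import numpy as np

def gaze_heatmap(fixations, height: int, width: int,
                 sigma: float = 30.0) -> np.ndarray:
    yy, xx = np.mgrid[0:height, 0:width].astype(np.float32)
    heat = np.zeros((height, width), dtype=np.float32)
    for x, y, t in fixations:
        heat += t * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * sigma ** 2))
    if heat.max() > 0:
        heat /= heat.max()  # normalize to [0, 1] for display and clustering
    return heat
```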
In step 440, the medical image is processed with artificial intelligence (e.g., a neural network) to extract one or more machine regions of interest. For example, a neural network model may be trained to identify disease-related lesion regions in images and then used to process the medical image picture. Steps 420 and 440 may be performed in either order.
In step 450, each machine region of interest is determined to be a suspected lesion region.
In step 460, the output areas of the two systems (the eye-tracking system and the artificial intelligence system) are compared.
In some embodiments, each physician region of interest may be matched to a machine region of interest according to their location information (e.g., according to the overlap area of the regions). The attention of a machine region of interest may be determined from the attention of the physician region of interest matched to it.
In some embodiments, a machine region of interest is prompted to the physician when its attention is below the attention threshold, for example by highlighting the corresponding area of the medical image picture, popping up a floating window, or playing a voice prompt.
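A minimal sketch of this comparison follows, assuming both region sets are axis-aligned boxes, matching by largest overlap area, and a single assumed attention threshold (the per-region thresholds described next would replace the single value).

```python
# A minimal sketch: match each machine region of interest to the physician
# region with the largest overlap; prompt it when unmatched or under-attended.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def overlap_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def regions_to_prompt(machine_regions: List[Box],
                      physician_regions: List[Tuple[Box, float]],
                      attention_threshold: float = 0.5) -> List[Box]:
    """physician_regions pairs each physician region with its attention."""
    prompts = []
    for m in machine_regions:
        best = max(((overlap_area(m, box), att) for box, att in physician_regions),
                   default=(0.0, 0.0))
        if best[0] == 0.0 or best[1] < attention_threshold:
            prompts.append(m)
    return prompts
```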
In some embodiments, an attention threshold may be set for each machine region of interest according to at least one of the anatomy of the region, the characteristics of the lesion, or the physician's reading habits (which may be extracted from the training data).
In some embodiments, physician regions of interest 1-4 are matched to machine regions of interest 1-4, respectively, and the attention of physician region of interest 4, corresponding to machine region of interest 4, is less than the attention threshold. In this case, the physician may be prompted to focus on machine region of interest 4, improving accuracy and efficiency.
For example, suppose a physician performs lung-nodule diagnosis. While observing the corresponding medical image pictures, the physician finds 4 lung-nodule regions (obtained through eye tracking), while artificial intelligence detection finds 5 lung-nodule regions in the same pictures. 4 of the regions found by artificial intelligence coincide with those found by the physician, so the physician only needs to be prompted to read the 1 remaining region. The physician therefore does not have to re-examine all 5 regions found by artificial intelligence, which greatly reduces reading time and improves the efficiency and accuracy of the system.
In the above embodiments, the important information in a picture is determined by combining the attention positions obtained from the user's eye movement with the attention regions extracted by the machine learning model. Combining the user's actual attention needs with the high performance of artificial intelligence in this way improves the accuracy and efficiency of image processing.
Fig. 5 shows a block diagram of some embodiments of the image target area determination apparatus of the present disclosure.
As shown in fig. 5, the image target region determining apparatus 5 includes a position determining unit 51, an extraction unit 52, and a region determining unit 53.
The position determining unit 51 determines each attention position of the user on the target picture according to eye movement information collected while the user observes the target picture. For example, the eye movement information includes at least one of the movement of the eyeball relative to the head or the position of the eyeball.
In some embodiments, the position determining unit 51 determines each gaze point of the user on the target picture according to the eye movement information, and determines each attention position according to the track formed by the gaze points.
The extraction unit 52 extracts each attention region on the target picture using a machine learning model.
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of the attention positions of a user on each training picture and the corresponding position attentions, the training pictures being pictures of the same type as the target picture; and training the machine learning model with each training picture and its attention information as input and each attention region of the training picture as the labeling result.
The region determining unit 53 determines the target area on the target picture according to each attention position and each attention region.
In some embodiments, the region determining unit 53 determines the position attention of the user for each attention position according to the eye movement information, and determines the target area among the attention regions according to the position attentions.
In some embodiments, the region determining unit 53 determines the region attention of an attention region according to the position attentions of the attention positions contained in the region, and determines the attention region as the target area when its region attention is smaller than a threshold.
In some embodiments, the region determining unit 53 determines, according to the eye movement information, the gaze time of the user at each attention position to determine the position attention.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
In the above embodiments, the important information in a picture is determined by combining the attention positions obtained from the user's eye movement with the attention regions extracted by the machine learning model. Combining the user's actual attention needs with the high performance of artificial intelligence in this way improves the accuracy and efficiency of image processing.
Fig. 6 shows a block diagram of further embodiments of the image target area determination apparatus of the present disclosure.
As shown in fig. 6, the image target area determination device 6 of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute the method for determining the image target area in any one of the embodiments of the present disclosure based on instructions stored in the memory 61.
The memory 61 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 7 shows a block diagram of further embodiments of the image target area determination apparatus of the present disclosure.
As shown in fig. 7, the image target area determination device 7 of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute the method for determining the image target area in any of the above embodiments based on instructions stored in the memory 710.
The memory 710 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
The image target area determining apparatus 7 may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
Fig. 8 illustrates a block diagram of some embodiments of the image target area determination system of the present disclosure.
As shown in fig. 8, the image target area determination system 8 includes the image target area determination device 81 of any of the above embodiments and an eye tracker 82.
The eye tracker 82 acquires the eye movement information of the user while the user observes the target picture.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein.
So far, the method, apparatus, and system for determining an image target area and the computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (19)

1. A method of determining an image target area, comprising:
determining each attention position of a user on a target picture according to eye movement information of the user collected while the user observes the target picture;
extracting each attention region on the target picture by using a machine learning model;
and determining a target area on the target picture according to each attention position and each attention region.
2. The determination method according to claim 1, wherein the determining a target area on the target picture according to each attention position and each attention region comprises:
determining the position attention of the user for each attention position according to the eye movement information;
and determining the target area among the attention regions according to the position attentions.
3. The determination method according to claim 2, wherein the determining the target area among the attention regions according to the position attentions comprises:
determining the region attention of an attention region according to the position attentions of the attention positions contained in the attention region;
and determining the attention region as the target area in a case where its region attention is smaller than a threshold.
4. The determination method according to claim 2, wherein the determining the position attention of the user for each attention position according to the eye movement information comprises:
determining, according to the eye movement information, the gaze time of the user at each attention position, for determining the position attention.
5. The determination method according to claim 1, wherein the determining each attention position of the user on the target picture according to the eye movement information comprises:
determining each gaze point of the user on the target picture according to the eye movement information;
and determining each attention position according to the track formed by the gaze points.
6. The determination method according to claim 1,
wherein the eye movement information comprises at least one of a movement of the eyeball relative to the head or a position of the eyeball.
7. The determination method according to any one of claims 1-6,
wherein the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
8. The determination method according to any one of claims 1-6,
wherein the machine learning model is trained by:
acquiring, as attention information, at least one of the attention positions of a user on each training picture and the corresponding position attentions, wherein the training pictures are pictures of the same type as the target picture;
and training the machine learning model by taking each training picture and its attention information as input and each attention region of the training picture as a labeling result.
9. An apparatus for determining an image target area, comprising:
a position determining unit, configured to determine each attention position of a user on a target picture according to eye movement information of the user collected while the user observes the target picture;
an extraction unit, configured to extract each attention region on the target picture by using a machine learning model;
and a region determining unit, configured to determine a target area on the target picture according to each attention position and each attention region.
10. The determination apparatus according to claim 9,
wherein the region determining unit determines the position attention of the user for each attention position according to the eye movement information, and determines the target area among the attention regions according to the position attentions.
11. The determination apparatus according to claim 10,
wherein the region determining unit determines the region attention of an attention region according to the position attentions of the attention positions contained in the attention region, and determines the attention region as the target area in a case where its region attention is smaller than a threshold.
12. The determination apparatus according to claim 10,
wherein the region determining unit determines, according to the eye movement information, the gaze time of the user at each attention position to determine the position attention.
13. The determination apparatus according to claim 10,
wherein the position determining unit determines each gaze point of the user on the target picture according to the eye movement information, and determines each attention position according to the track formed by the gaze points.
14. The determination apparatus according to claim 10,
wherein the eye movement information comprises at least one of a movement of the eyeball relative to the head or a position of the eyeball.
15. The determination apparatus according to any one of claims 9-14,
wherein the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
16. The determination apparatus according to any one of claims 9-14,
wherein the machine learning model is trained by:
acquiring, as attention information, at least one of the attention positions of a user on each training picture and the corresponding position attentions, wherein the training pictures are pictures of the same type as the target picture;
and training the machine learning model by taking each training picture and its attention information as input and each attention region of the training picture as a labeling result.
17. An apparatus for determining an image target area, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method for determining an image target area according to any one of claims 1-8 based on instructions stored in the memory.
18. A system for determining an image target area, comprising:
the apparatus for determining an image target area according to any one of claims 9-17; and
an eye tracker, configured to acquire eye movement information of a user while the user observes a target picture.
19. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method for determining an image target area according to any one of claims 1-8.
CN201911195964.5A 2019-11-29 2019-11-29 Method, device and system for determining image target area Active CN112885435B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911195964.5A CN112885435B (en) 2019-11-29 2019-11-29 Method, device and system for determining image target area
PCT/CN2020/075056 WO2021103316A1 (en) 2019-11-29 2020-02-13 Method, device, and system for determining target region of image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911195964.5A CN112885435B (en) 2019-11-29 2019-11-29 Method, device and system for determining image target area

Publications (2)

Publication Number Publication Date
CN112885435A 2021-06-01
CN112885435B CN112885435B (en) 2023-04-21

Family

ID=76038289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911195964.5A Active CN112885435B (en) 2019-11-29 2019-11-29 Method, device and system for determining image target area

Country Status (2)

Country Link
CN (1) CN112885435B (en)
WO (1) WO2021103316A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113485555A (en) * 2021-07-14 2021-10-08 上海联影智能医疗科技有限公司 Medical image reading method, electronic equipment and storage medium
CN113485555B (en) * 2021-07-14 2024-04-26 上海联影智能医疗科技有限公司 Medical image reading method, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101714153A (en) * 2009-11-16 2010-05-26 杭州电子科技大学 Visual perception based interactive mammography image search method
CN102521595A (en) * 2011-12-07 2012-06-27 中南大学 Method for extracting image region of interest based on eye movement data and bottom-layer features
US20160171299A1 (en) * 2014-12-11 2016-06-16 Samsung Electronics Co., Ltd. Apparatus and method for computer aided diagnosis (cad) based on eye movement
CN106095089A (en) * 2016-06-06 2016-11-09 郑黎光 A kind of method obtaining interesting target information
CN107563123A (en) * 2017-09-27 2018-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for marking medical image
US20180314327A1 (en) * 2015-10-30 2018-11-01 University Of Massachusetts System and methods for evaluating images and other subjects
CN109887583A (en) * 2019-03-11 2019-06-14 数坤(北京)网络科技有限公司 Data acquisition method/system based on doctors' behaviors, and medical image processing system
CN109886780A (en) * 2019-01-31 2019-06-14 苏州经贸职业技术学院 Commodity object detection method and device based on eye tracking
US20190340817A1 (en) * 2018-05-04 2019-11-07 International Business Machines Corporation Learning opportunity based display generation and presentation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426399A (en) * 2015-10-29 2016-03-23 天津大学 Eye movement based interactive image retrieval method for extracting image area of interest
CN105677024B (en) * 2015-12-31 2018-05-29 北京元心科技有限公司 A kind of eye moves detecting and tracking method, apparatus and application thereof
CN107656613B (en) * 2017-09-08 2020-12-18 国网智能科技股份有限公司 Human-computer interaction system based on eye movement tracking and working method thereof


Also Published As

Publication number Publication date
CN112885435B (en) 2023-04-21
WO2021103316A1 (en) 2021-06-03

Similar Documents

Publication Publication Date Title
Qian et al. M³Lung-Sys: A deep learning system for multi-class lung pneumonia screening from CT imaging
CN105701331B (en) Computer-aided diagnosis apparatus and computer-aided diagnosis method
US9445713B2 (en) Apparatuses and methods for mobile imaging and analysis
KR102154733B1 (en) Apparatus and method for estimating whether malignant tumor is in object by using medical image
US20170296032A1 (en) Branching structure determination apparatus, method, and program
US10839520B2 (en) Eye tracking applications in computer aided diagnosis and image processing in radiology
KR101840350B1 (en) Method and apparatus for aiding reading efficiency using eye tracking information in medical image reading processing
CN105103163A (en) Multimodal segmentation in intravascular images
US10083278B2 (en) Method and system for displaying a timing signal for surgical instrument insertion in surgical procedures
US20220301159A1 (en) Artificial intelligence-based colonoscopic image diagnosis assisting system and method
WO2020027228A1 (en) Diagnostic support system and diagnostic support method
JP2016525426A (en) Matching findings between imaging datasets
US11669960B2 (en) Learning system, method, and program
Jaroensri et al. A video-based method for automatically rating ataxia
CA3011141A1 (en) Confidence determination in a medical imaging video clip measurement based upon video clip image quality
CN111091539A (en) Network model training method, medical image processing method, device, medium and equipment
WO2013037702A1 (en) Method and a system for medical imaging
Castner et al. LSTMs can distinguish dental expert saccade behavior with high "plaque-urracy"
Hu et al. Hot spot detection based on feature space representation of visual search
Arnold et al. Indistinct frame detection in colonoscopy videos
CN112885435B (en) Method, device and system for determining image target area
Machado et al. Radiologists' Gaze Characterization During Lung Nodule Search in Thoracic CT
KR102360615B1 (en) Medical image diagnosis assistance apparatus and method using a plurality of medical image diagnosis algorithm for endoscope images
US10299864B1 (en) Co-localization of multiple internal organs based on images obtained during surgery
Che et al. Deep learning-based biological anatomical landmark detection in colonoscopy videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant