CN112885435B - Method, device and system for determining image target area

Method, device and system for determining image target area

Info

Publication number
CN112885435B
CN112885435B (application CN201911195964.5A)
Authority
CN
China
Prior art keywords
attention
determining
region
user
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911195964.5A
Other languages
Chinese (zh)
Other versions
CN112885435A (en)
Inventor
王纯亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Tuoying Technology Co ltd
Original Assignee
Tianjin Tuoying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Tuoying Technology Co ltd filed Critical Tianjin Tuoying Technology Co ltd
Priority to CN201911195964.5A (published as CN112885435B)
Priority to PCT/CN2020/075056 (published as WO2021103316A1)
Publication of CN112885435A
Application granted
Publication of CN112885435B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

The disclosure relates to a method, an apparatus and a system for determining an image target area, in the technical field of image processing. The method comprises: determining each attention position of a user on a target picture according to movement information of the user's eyeballs while observing the target picture; extracting each attention region on the target picture by using a machine learning model; and determining the target area on the target picture according to the attention positions and the attention regions.

Description

Method, device and system for determining image target area
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method for determining an image target area, an apparatus for determining an image target area, a system for determining an image target area, and a computer readable storage medium.
Background
Computer technology is now an important auxiliary processing means in many technical fields. For example, computer-aided diagnosis and computer-aided detection combine imaging, medical image processing and other possible physiological or biochemical means with the analysis and computation of a computer to help find lesions, thereby improving diagnostic accuracy. It is therefore particularly important to extract the relevant information in an image as the object of such analysis and computation.
In the related art, important information in an image may be extracted by artificial intelligence methods such as deep neural networks.
Disclosure of Invention
The inventors of the present disclosure found the following problem in the above related art: the important information extracted from the image is often not the information that is actually needed, so the accuracy and efficiency of image processing are low.
In view of this, the present disclosure proposes a technical solution for determining an image target area that can improve the accuracy and efficiency of image processing.
According to some embodiments of the present disclosure, there is provided a method of determining an image target area, including: determining each attention position of a user on a target picture according to movement information of the user's eyeballs while observing the target picture; extracting each attention region on the target picture by using a machine learning model; and determining a target area on the target picture according to each attention position and each attention region.
In some embodiments, determining the target area on the target picture according to each attention position and each attention region comprises: determining the user's position attention degree for each attention position according to the movement information; and determining the target area among the attention regions according to the position attention degrees.
In some embodiments, determining the target area among the attention regions according to the position attention degrees comprises: determining the region attention degree of an attention region according to the position attention degrees of the attention positions contained in that region; and determining the corresponding attention region as the target area when its region attention degree is smaller than a threshold.
In some embodiments, determining the user's position attention degree for each attention position according to the movement information comprises: determining, according to the movement information, the gaze time of the user at each attention position, the gaze time being used to determine the position attention degree.
In some embodiments, determining each attention position of the user on the target picture according to the movement information includes: determining each gaze point of the user on the target picture according to the movement information; and determining each attention position according to the trajectory formed by the gaze points.
In some embodiments, the movement information of the eye includes at least one of movement of the eye relative to the head or eye position.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of each attention position of a user on each training picture and the corresponding position attention degree, the training pictures being pictures of the same type as the target picture; and training the machine learning model by taking each training picture and the attention information as input and taking each attention region of each training picture as the labeling result.
According to still further embodiments of the present disclosure, there is provided an apparatus for determining an image target area, including: a position determining unit configured to determine each attention position of a user on a target picture according to movement information of the user's eyeballs while observing the target picture; an extraction unit configured to extract each attention region on the target picture by using a machine learning model; and a region determining unit configured to determine a target area on the target picture according to each attention position and each attention region.
In some embodiments, the region determining unit determines a position attention of the user to each attention position based on the motion information, and determines the target region in each attention region based on each position attention.
In some embodiments, the region determining unit determines a region attention of the region of interest according to a position attention of each attention position included in the region of interest, and determines the corresponding region of interest as the target region in a case where the region attention is smaller than a threshold.
In some embodiments, the region determining unit determines, according to the movement information, the gaze time of the user at each attention position, the gaze time being used to determine the position attention degree.
In some embodiments, the position determining unit determines each gaze point of the user on the target picture based on the motion information, and determines each focus position based on a trajectory formed by each gaze point.
In some embodiments, the movement information of the eye includes at least one of movement of the eye relative to the head or eye position.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of each attention position of a user on each training picture and the corresponding position attention degree, the training pictures being pictures of the same type as the target picture; and training the machine learning model by taking each training picture and the attention information as input and taking each attention region of each training picture as the labeling result.
According to still further embodiments of the present disclosure, there is provided an apparatus for determining an image target area, including: a memory; and a processor coupled to the memory, the processor configured to perform the method of determining an image target area in any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of determining an image target area in any of the above embodiments.
According to still further embodiments of the present disclosure, there is provided a system for determining an image target area, including: the apparatus for determining an image target area in any of the above embodiments; and an eye tracker configured to acquire movement information of the user's eyeballs while the user observes the target picture.
In the above embodiments, the important information in the picture is determined by combining the attention positions obtained from the user's eye movements with the attention regions extracted by the machine learning model. In this way, the actual attention requirements of the user are combined with the high performance of artificial intelligence, improving the accuracy and efficiency of image processing.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a flow chart of some embodiments of a method of determining an image target region of the present disclosure;
FIG. 2 illustrates a flow chart of some embodiments of step 130 of FIG. 1;
FIG. 3 illustrates a flow chart of some embodiments of step 1320 in FIG. 2;
FIG. 4 illustrates a flow chart of further embodiments of a method of determining an image target area of the present disclosure;
FIG. 5 illustrates a block diagram of some embodiments of a determination apparatus of an image target area of the present disclosure;
FIG. 6 illustrates a block diagram of further embodiments of an apparatus for determining an image target area of the present disclosure;
FIG. 7 illustrates a block diagram of still further embodiments of a determination apparatus of an image target area of the present disclosure;
fig. 8 illustrates a block diagram of some embodiments of a determination system of image target areas of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 illustrates a flow chart of some embodiments of a method of determining an image target area of the present disclosure.
As shown in fig. 1, the method includes: step 110, determining each attention position from the eyeball movement; step 120, extracting each attention region with the machine learning model; and step 130, determining the target area.
In step 110, each attention position of the user on the target picture is determined according to the movement information of the user's eyeballs while observing the target picture. For example, the movement information of the eyeball includes at least one of the movement of the eyeball relative to the head or the eyeball position.
In some embodiments, eye tracking may be performed on the user: the eye movement is tracked by measuring the position of the eye's gaze point or the movement of the eye relative to the head. For example, the movement information of the eyeball may be obtained by an eye tracker that measures the eyeball position (e.g., through a video capturing device).
In some embodiments, a screen-based eye tracker (Screen-Based Eye Tracker) can be used to present the target picture on a screen while tracking and measuring the eyeball movement information; alternatively, eye tracking glasses (Eye Tracking Glasses), which capture the target picture from the observer's point of view, can be used to acquire the target picture and to track and measure the eyeball movement information.
For example, with a screen-based eye tracker, a medical image picture displayed on a screen can serve as the target picture and the eye movements of a diagnostician can be tracked for computer-aided detection; with eye tracking glasses, a target picture containing the vehicles or traffic signs seen by a driver can be acquired in real time and the driver's eyeball movement tracked for computer-aided driving.
In some embodiments, each gaze point of the user on the target picture is determined according to the movement information, and each attention position is determined according to the trajectory formed by the gaze points.
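For illustration only, a minimal sketch of how gaze samples might be grouped into attention positions with a dispersion-threshold fixation filter is given below; the (x, y, timestamp) sample format, the threshold values and the function name are assumptions rather than part of the disclosed method.

```python
import numpy as np

def detect_fixations(gaze_points, dispersion_px=50.0, min_duration_s=0.1):
    """Group consecutive gaze samples into fixation (attention) positions.

    gaze_points: array-like of shape (N, 3) with columns (x, y, timestamp_s).
    Returns a list of (center_x, center_y, duration_s) tuples.
    """
    gaze_points = np.asarray(gaze_points, dtype=float)
    fixations, start, n = [], 0, len(gaze_points)
    while start < n:
        end = start + 1
        # Grow the window while the samples stay within the dispersion limit.
        while end < n:
            window = gaze_points[start:end + 1, :2]
            if (window.max(axis=0) - window.min(axis=0)).sum() > dispersion_px:
                break
            end += 1
        duration = gaze_points[end - 1, 2] - gaze_points[start, 2]
        if duration >= min_duration_s:
            cx, cy = gaze_points[start:end, :2].mean(axis=0)
            fixations.append((float(cx), float(cy), float(duration)))
            start = end
        else:
            start += 1
    return fixations
```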
In step 120, each region of interest on the target picture is extracted using the machine learning model. For example, machine learning models may be trained to extract face regions in portrait pictures, or to extract lesion regions in medical pictures, etc.
In some embodiments, the machine learning model may be various neural network models capable of extracting image features. For example, convolutional neural network models may be utilized to determine regions of interest on a target picture.
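The disclosure does not prescribe a particular network architecture; purely as a sketch, the snippet below assumes a segmentation-style model whose per-pixel probability map is post-processed into discrete attention regions (bounding boxes). The threshold values and names are illustrative.

```python
import numpy as np
from scipy import ndimage

def extract_attention_regions(probability_map, prob_threshold=0.5, min_area=100):
    """Turn a model's per-pixel probability map into discrete attention regions.

    probability_map: 2D array in [0, 1], e.g. the output of a segmentation CNN.
    Returns bounding boxes as (row_min, col_min, row_max, col_max) tuples.
    """
    labeled, num = ndimage.label(probability_map >= prob_threshold)
    boxes = []
    for region_id in range(1, num + 1):
        rows, cols = np.where(labeled == region_id)
        if rows.size < min_area:
            continue  # drop tiny, likely spurious detections
        boxes.append((int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())))
    return boxes
```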
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of each attention position of a user on each training picture and the corresponding position attention degree, the training pictures being pictures of the same type as the target picture; and training the machine learning model by taking each training picture and the attention information as input and taking each attention region of each training picture as the labeling result.
For example, for a computer-aided detection scenario, a visual-tracking heat map (Heat map) may be recorded for each of several medical experts while they observe multiple medical image pictures; the acquired heat maps (optionally thresholded) are then used as the labeled output for training the machine learning model.
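One possible way to assemble such a training sample is sketched below, assuming the heat map is normalised to [0, 1] and the attention information is a list of fixation tuples; the field names and threshold value are illustrative.

```python
import numpy as np

def heatmap_to_labels(heatmap, attention_threshold=0.3):
    """Threshold a recorded visual-attention heat map into a binary mask that
    marks the labeled regions of interest used as the training target."""
    return (heatmap >= attention_threshold).astype(np.uint8)

def build_training_sample(picture, fixations, heatmap):
    """Pack one training sample: the picture plus attention information
    (attention positions with gaze durations) as input, and the thresholded
    heat map as the labeled attention regions."""
    attention_info = [{"position": (x, y), "gaze_s": d} for x, y, d in fixations]
    return {"input": (picture, attention_info), "label": heatmap_to_labels(heatmap)}
```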
After training, a medical image picture can be input into the machine learning model to infer the positions that an "expert in the field" would pay attention to. The "expert in the field" may also be the user himself, in which case the inference represents how the user would observe the picture when alert and not fatigued.
Thus, even if the user is not sufficiently alert while viewing the image (e.g., dozing off), the machine learning model can still indicate the "emphasis" missed in the viewing, i.e., the positions that an "expert in the field" would pay attention to.
In step 130, a target region on the target picture is determined according to each focus position and each focus region.
In some embodiments, a region of interest is determined to be the target region if its overlap with the attention positions exceeds a threshold. In this case, the target region is both the important information required by the user and an important region screened out by the artificial intelligence method, and it can serve as important information for further processing such as face recognition, target tracking or medical diagnosis.
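A minimal sketch of this overlap test, assuming attention positions are the fixation tuples and regions of interest are the bounding boxes from the sketches above; the point-in-box counting rule and the threshold are illustrative, and an area-based overlap could be used instead.

```python
def select_target_regions(fixations, regions, min_hits=1):
    """Keep a region of interest as a target region when enough attention
    positions (fixations) fall inside its bounding box.

    fixations: list of (x, y, duration_s) tuples (x = column, y = row).
    regions: list of (row_min, col_min, row_max, col_max) boxes.
    """
    targets = []
    for (r0, c0, r1, c1) in regions:
        hits = sum(1 for x, y, _ in fixations if c0 <= x <= c1 and r0 <= y <= r1)
        if hits >= min_hits:
            targets.append((r0, c0, r1, c1))
    return targets
```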
In some embodiments, step 130 may be performed by the embodiment of fig. 2.
Fig. 2 shows a flow chart of some embodiments of step 130 in fig. 1.
As shown in fig. 2, step 130 includes: step 1310, determining a position attention; and step 1320, determining a target area.
In step 1310, the user's position attention degree for each attention position is determined according to the movement information. For example, the gaze time of the user at each attention position may be determined from the movement information and used to determine the position attention degree. The attention degree of an attention position may also be determined from other factors such as pupil changes and eyeball rotation.
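As a sketch of one possible mapping from gaze time to position attention degree (here simply the normalised gaze duration); the normalisation is an assumed choice, and cues such as pupil change could be folded in as additional weights.

```python
def position_attention(fixations):
    """Map each attention position to an attention degree derived from gaze time.

    fixations: list of (x, y, duration_s) tuples; longer gaze gives higher attention.
    Returns a dict {(x, y): attention_degree} with degrees summing to 1.
    """
    total = sum(d for _, _, d in fixations) or 1.0
    return {(x, y): d / total for x, y, d in fixations}
```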
In step 1320, a target region is determined in each region of interest according to each position attention.
In some embodiments, an attention region may be determined to be the target region when the corresponding position attention degree of that region is greater than a threshold. In this case, the target region is both the important information required by the user and an important region screened out by the artificial intelligence method, and it can serve as important information for further processing such as face recognition, target tracking or medical diagnosis.
In some embodiments, step 1320 may be performed by the embodiment in fig. 3.
Fig. 3 illustrates a flow chart of some embodiments of step 1320 in fig. 2.
As shown in fig. 3, step 1320 includes: step 310, determining a regional attention; and step 320, determining a target area.
In step 310, the region attention degree of an attention region is determined from the position attention degrees of the attention positions contained in that region. For example, an attention region is considered to contain an attention position when the overlapping area of the attention position and the attention region is larger than an area threshold.
In step 320, in the case where the region attention is smaller than the threshold, the corresponding attention region is determined as the target region.
In this case, the target region determined by the artificial intelligence method may be important information required by the user to which the user has not yet paid sufficient attention. Providing such target regions to the user as important information for further processing such as face recognition, target tracking or medical diagnosis therefore improves the accuracy and efficiency of image processing.
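A sketch of steps 310 and 320 under the same assumed representations: each region accumulates the attention degrees of the positions it contains, and regions whose accumulated attention falls below the threshold are returned as target regions (found by the model but barely looked at). The containment test and threshold value are illustrative.

```python
def regions_below_attention(regions, attention_by_position, attention_threshold=0.05):
    """Return the regions whose region attention degree is below the threshold.

    regions: list of (row_min, col_min, row_max, col_max) boxes.
    attention_by_position: dict {(x, y): attention_degree} from gaze analysis.
    """
    flagged = []
    for (r0, c0, r1, c1) in regions:
        region_attention = sum(
            degree for (x, y), degree in attention_by_position.items()
            if c0 <= x <= c1 and r0 <= y <= r1  # position contained in the region
        )
        if region_attention < attention_threshold:
            flagged.append((r0, c0, r1, c1))
    return flagged
```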
In some embodiments, the target picture is a monitoring picture, the focus position is a position focused by the monitor, and the focus region is a suspected face region.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region. For example, a medical image picture may be processed for computer-aided detection through the embodiment of fig. 4.
Fig. 4 illustrates a flow chart of further embodiments of a method of determining an image target area of the present disclosure.
As shown in fig. 4, the method includes: step 410, inputting a medical image picture; step 420, performing eye movement tracking; step 430, obtaining a heat map; step 440, artificial intelligence detection; step 450, determining suspected lesion regions; and step 460, determining prompt regions.
In step 410, a medical image picture is entered into the system, so that the imaging physician can read the image via a display device and the computer can process it accordingly. For example, the medical image picture may be an image generated by a magnetic resonance imaging (MRI) device, a CT (Computed Tomography) device, a DR (Digital Radiography) device, an ultrasound device, an X-ray machine, or the like.
In step 420, during the physician's film reading, an eye tracker is used to record the movement track of the physician's gaze point (Gaze Point) throughout the reading, and the physician's attention degree at each point or in certain areas is determined from this track. For example, the attention degree may be determined from the gaze time. The gaze point is the basic unit of measurement of the eye tracker; one gaze point corresponds to one raw sample captured by the eye tracker.
In step 430, a heat map is generated from the attention degrees. For example, the longer a physician gazes at a location on the medical image during the film reading, the darker the color of the corresponding location in the heat map of that medical image. The heat map may then be divided into a plurality of physician regions of interest (e.g., by clustering) based on the shades of color.
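One way such a heat map and its physician regions of interest might be computed is sketched below, again assuming fixation tuples; the Gaussian spreading, normalisation and connected-component clustering are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def build_heatmap(fixations, shape, sigma=30.0):
    """Accumulate gaze durations into a heat map: the longer the physician
    looks at a location, the higher (darker) the heat map value there."""
    heatmap = np.zeros(shape, dtype=np.float64)
    for x, y, duration in fixations:
        r = min(max(int(round(y)), 0), shape[0] - 1)
        c = min(max(int(round(x)), 0), shape[1] - 1)
        heatmap[r, c] += duration
    heatmap = ndimage.gaussian_filter(heatmap, sigma=sigma)
    return heatmap / heatmap.max() if heatmap.max() > 0 else heatmap

def physician_regions(heatmap, threshold=0.4):
    """Split the heat map into physician regions of interest by thresholding
    its intensity and labelling the connected components."""
    labeled, num = ndimage.label(heatmap >= threshold)
    boxes = []
    for region_id in range(1, num + 1):
        rows, cols = np.where(labeled == region_id)
        boxes.append((int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())))
    return boxes
```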
In step 440, the medical image picture is processed using artificial intelligence methods (e.g., neural networks) to extract one or more machine regions of interest. For example, a neural network model may be trained to identify disease-related lesion regions in images and then used to process the medical image picture. Steps 420 and 440 do not need to be performed in a particular order.
In step 450, each machine region of interest is determined to be a suspected lesion region.
In step 460, the regions output by the two systems (the eye tracking system and the artificial intelligence system) are compared.
In some embodiments, each physician region of interest may be matched to a machine region of interest based on location information (e.g., based on the overlapping area of the regions). The attention degree of a machine region of interest may then be determined from the attention degree of the physician region of interest it matches.
In some embodiments, a machine region of interest is prompted to the physician when its attention degree is below an attention threshold, for example by highlighting the corresponding region of the medical image picture, displaying a pop-up floating window, or playing an audible prompt.
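A sketch of this matching and prompting step, assuming both region sets are bounding boxes and each physician region of interest carries an attention degree; the largest-overlap matching rule and the threshold value are illustrative.

```python
def box_overlap(a, b):
    """Overlapping area of two (row_min, col_min, row_max, col_max) boxes."""
    dr = min(a[2], b[2]) - max(a[0], b[0])
    dc = min(a[3], b[3]) - max(a[1], b[1])
    return max(dr, 0) * max(dc, 0)

def regions_to_prompt(machine_boxes, physician_boxes, physician_attention,
                      attention_threshold=0.05):
    """Match each machine region of interest to the physician region of interest
    with the largest overlapping area, inherit that region's attention degree,
    and return the machine regions whose inherited attention is below the
    threshold; these are the regions to prompt to the physician.

    physician_attention: list of attention degrees aligned with physician_boxes.
    """
    prompts = []
    for m in machine_boxes:
        overlaps = [box_overlap(m, p) for p in physician_boxes]
        attention = 0.0
        if overlaps and max(overlaps) > 0:
            attention = physician_attention[overlaps.index(max(overlaps))]
        if attention < attention_threshold:
            prompts.append(m)
    return prompts
```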
In some embodiments, the attention threshold may be set individually for each machine region of interest, based on at least one of the anatomical structure covered by the region, the characteristics of the lesion, and the physician's film-reading habits (which may be extracted from the training data).
In some embodiments, physician regions of interest 1-4 are matched to machine regions of interest 1-4, respectively, and the attention degree of physician region of interest 4, which corresponds to machine region of interest 4, is less than the attention threshold. In this case, the physician may be prompted to pay additional attention to machine region of interest 4, improving accuracy and efficiency.
For example, a physician performs lung nodule diagnosis. While observing the corresponding medical image picture, the physician finds 4 lung nodule regions (obtained through eye movement tracking); artificial intelligence detection finds 5 lung nodule regions in the same picture. Four of the AI-detected lung nodule regions coincide with those found by the physician, so the physician only needs to be prompted to review the one remaining AI-detected lung nodule region that was missed. The physician therefore does not need to look through all 5 AI-detected lung nodule regions, which greatly reduces the film-reading time and improves the processing efficiency and accuracy of the system.
In the above embodiment, the important information in the picture is determined in combination with the attention position acquired from the eyeball action of the user and the attention region extracted by the machine learning model. Thus, the accuracy and efficiency of image processing can be improved by combining the actual attention requirements of users and the high performance of artificial intelligence.
Fig. 5 illustrates a block diagram of some embodiments of an apparatus for determining an image target area of the present disclosure.
As shown in fig. 5, the determination device 5 of the image target area includes a position determining unit 51, an extraction unit 52, and a region determining unit 53.
The position determining unit 51 determines each focus position of the user on the target picture based on the movement information of the eyeball of the user during the observation of the target picture. For example, the movement information of the eyeball includes at least one of movement of the eyeball relative to the head or the eyeball position.
In some embodiments, the position determining unit 51 determines each gaze point of the user on the target picture according to the movement information, and determines each attention position according to the trajectory formed by the gaze points.
The extraction unit 52 extracts each region of interest on the target picture using the machine learning model.
In some embodiments, the machine learning model is trained by: acquiring, as attention information, at least one of each attention position of a user on each training picture and the corresponding position attention degree, the training pictures being pictures of the same type as the target picture; and training the machine learning model by taking each training picture and the attention information as input and taking each attention region of each training picture as the labeling result.
The region determination unit 53 determines a target region on the target picture from each of the attention positions and each of the attention regions.
In some embodiments, the area determination unit 53 determines the degree of positional attention of the user to each attention position based on the motion information, and determines the target area in each attention area based on each positional attention degree.
In some embodiments, the region determination unit 53 determines the region attention of the region of interest from the position attention of each attention position contained in the region of interest; and determining the corresponding region of interest as a target region in the case that the region attention is smaller than the threshold value.
In some embodiments, the region determination unit 53 determines, according to the movement information, the gaze time of the user at each attention position, the gaze time being used to determine the position attention degree.
In some embodiments, the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
In the above embodiment, the important information in the picture is determined in combination with the attention position acquired from the eyeball action of the user and the attention region extracted by the machine learning model. Thus, the accuracy and efficiency of image processing can be improved by combining the actual attention requirements of users and the high performance of artificial intelligence.
Fig. 6 shows a block diagram of further embodiments of an apparatus for determining an image target area of the present disclosure.
As shown in fig. 6, the image target area determination device 6 of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to perform the method of determining the image target area in any one of the embodiments of the present disclosure based on instructions stored in the memory 61.
The memory 61 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, boot loader programs, databases, and other programs.
Fig. 7 shows a block diagram of still further embodiments of the image target area determination apparatus of the present disclosure.
As shown in fig. 7, the image target area determination device 7 of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to perform the method of determining the image target area in any of the foregoing embodiments based on instructions stored in the memory 710.
Memory 710 may include, for example, system memory, fixed nonvolatile storage media, and the like. The system memory stores, for example, an operating system, application programs, boot loader programs, and other programs.
The image target area determination device 7 may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750 and the memory 710 and the processor 720 may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard or a touch screen. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Fig. 8 illustrates a block diagram of some embodiments of a determination system of image target areas of the present disclosure.
As shown in fig. 8, the image target area determination system 8 includes the image target area determination device 81 and the eye tracker 82 in any of the above embodiments.
The eye tracker 82 is used to acquire movement information of an eyeball of a user during observation of a target picture.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein.
Thus far, the method, apparatus and system for determining an image target area and the computer-readable storage medium according to the present disclosure have been described in detail. Some details known in the art are not described in order to avoid obscuring the concepts of the present disclosure. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
The methods and systems of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (13)

1. A method of determining a target area of an image, comprising:
determining each attention position of a user on a target picture according to movement information of the user's eyeballs while observing the target picture;
extracting each attention region on the target picture by using a machine learning model;
determining a target area on the target picture according to each attention position and each attention region;
wherein the determining the target area on the target picture according to each attention position and each attention region comprises:
determining a position attention degree of the user for each attention position according to the movement information;
determining a region attention degree of an attention region according to the position attention degrees of the attention positions contained in that region;
determining the corresponding attention region as the target area when its region attention degree is less than a threshold;
wherein the determining the position attention degree of the user for each attention position according to the movement information comprises:
determining, according to the movement information, a gaze time of the user at each attention position for use in determining the position attention degree.
2. The determination method according to claim 1, wherein the determining each attention position of the user on the target picture according to the movement information comprises:
determining each gaze point of the user on the target picture according to the movement information;
and determining each attention position according to the trajectory formed by the gaze points.
3. The determination method according to claim 1, wherein,
the movement information of the eyeball comprises at least one of movement of the eyeball relative to the head or the eyeball position.
4. A determination method according to any one of claims 1 to 3, wherein,
the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
5. A determination method according to any one of claims 1 to 3, wherein,
the machine learning model is trained by:
acquiring, as attention information, at least one of each attention position of a user on each training picture and the corresponding position attention degree, wherein the training pictures are pictures of the same type as the target picture;
and training the machine learning model by taking each training picture and the attention information as input and taking each attention region of each training picture as the labeling result.
6. A determination apparatus of an image target area, comprising:
a position determining unit configured to determine each attention position of a user on a target picture according to movement information of the user's eyeballs while observing the target picture;
an extraction unit configured to extract each attention region on the target picture by using a machine learning model;
a region determining unit configured to determine a target area on the target picture according to each attention position and each attention region;
wherein the region determining unit determines a position attention degree of the user for each attention position according to the movement information, determines a region attention degree of an attention region according to the position attention degrees of the attention positions contained in that region, and determines the corresponding attention region as the target area when its region attention degree is smaller than a threshold;
and the region determining unit determines, according to the movement information, a gaze time of the user at each attention position for use in determining the position attention degree.
7. The determining apparatus according to claim 6, wherein,
the position determining unit determines each gaze point of the user on the target picture according to the movement information, and determines each attention position according to the trajectory formed by the gaze points.
8. The determining apparatus according to claim 6, wherein,
the movement information of the eyeball comprises at least one of movement of the eyeball relative to the head or the eyeball position.
9. The determining apparatus according to any one of claims 6 to 8, wherein,
the target picture is a medical image picture, the attention position is a position attended to by a diagnostician, and the attention region is a suspected lesion region.
10. The determining apparatus according to any one of claims 6 to 8, wherein,
the machine learning model is trained by:
acquiring, as attention information, at least one of each attention position of a user on each training picture and the corresponding position attention degree, wherein the training pictures are pictures of the same type as the target picture;
and training the machine learning model by taking each training picture and the attention information as input and taking each attention region of each training picture as the labeling result.
11. A determination apparatus of an image target area, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of determining an image target area of any of claims 1-5 based on instructions stored in the memory.
12. A system for determining a target area of an image, comprising:
the determination device of an image target area according to any one of claims 6 to 11; and
an eye tracker configured to acquire the movement information of the user's eyeballs while observing the target picture.
13. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of determining an image target area according to any of claims 1-5.
CN201911195964.5A 2019-11-29 2019-11-29 Method, device and system for determining image target area Active CN112885435B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911195964.5A CN112885435B (en) 2019-11-29 2019-11-29 Method, device and system for determining image target area
PCT/CN2020/075056 WO2021103316A1 (en) 2019-11-29 2020-02-13 Method, device, and system for determining target region of image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911195964.5A CN112885435B (en) 2019-11-29 2019-11-29 Method, device and system for determining image target area

Publications (2)

Publication Number Publication Date
CN112885435A CN112885435A (en) 2021-06-01
CN112885435B true CN112885435B (en) 2023-04-21

Family

ID=76038289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911195964.5A Active CN112885435B (en) 2019-11-29 2019-11-29 Method, device and system for determining image target area

Country Status (2)

Country Link
CN (1) CN112885435B (en)
WO (1) WO2021103316A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101714153A (en) * 2009-11-16 2010-05-26 杭州电子科技大学 Visual perception based interactive mammography image search method
CN102521595A (en) * 2011-12-07 2012-06-27 中南大学 Method for extracting image region of interest based on eye movement data and bottom-layer features
CN106095089A (en) * 2016-06-06 2016-11-09 郑黎光 A kind of method obtaining interesting target information
CN107563123A (en) * 2017-09-27 2018-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for marking medical image
CN109887583A (en) * 2019-03-11 2019-06-14 数坤(北京)网络科技有限公司 Data capture method/system based on doctors' behaviors, magic magiscan
CN109886780A (en) * 2019-01-31 2019-06-14 苏州经贸职业技术学院 Commodity object detection method and device based on eye tracking

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160071242A (en) * 2014-12-11 2016-06-21 삼성전자주식회사 Apparatus and method for computer aided diagnosis based on eye movement
CN105426399A (en) * 2015-10-29 2016-03-23 天津大学 Eye movement based interactive image retrieval method for extracting image area of interest
CA3003584A1 (en) * 2015-10-30 2017-05-04 University Of Massachusetts System and methods for evaluating images and other subjects
CN105677024B (en) * 2015-12-31 2018-05-29 北京元心科技有限公司 A kind of eye moves detecting and tracking method, apparatus and application thereof
CN107656613B (en) * 2017-09-08 2020-12-18 国网智能科技股份有限公司 Human-computer interaction system based on eye movement tracking and working method thereof
US10593118B2 (en) * 2018-05-04 2020-03-17 International Business Machines Corporation Learning opportunity based display generation and presentation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101714153A (en) * 2009-11-16 2010-05-26 杭州电子科技大学 Visual perception based interactive mammography image search method
CN102521595A (en) * 2011-12-07 2012-06-27 中南大学 Method for extracting image region of interest based on eye movement data and bottom-layer features
CN106095089A (en) * 2016-06-06 2016-11-09 郑黎光 A kind of method obtaining interesting target information
CN107563123A (en) * 2017-09-27 2018-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for marking medical image
CN109886780A (en) * 2019-01-31 2019-06-14 苏州经贸职业技术学院 Commodity object detection method and device based on eye tracking
CN109887583A (en) * 2019-03-11 2019-06-14 数坤(北京)网络科技有限公司 Data capture method/system based on doctors' behaviors, magic magiscan

Also Published As

Publication number Publication date
CN112885435A (en) 2021-06-01
WO2021103316A1 (en) 2021-06-03

Similar Documents

Publication Publication Date Title
US11850021B2 (en) Dynamic self-learning medical image method and system
KR102014385B1 (en) Method and apparatus for learning surgical image and recognizing surgical action based on learning
Qian et al. M^3 Lung-Sys: A deep learning system for multi-class lung pneumonia screening from CT imaging
CN111160367B (en) Image classification method, apparatus, computer device, and readable storage medium
JP5222082B2 (en) Information processing apparatus, control method therefor, and data processing system
US10733727B2 (en) Application of deep learning for medical imaging evaluation
US10083278B2 (en) Method and system for displaying a timing signal for surgical instrument insertion in surgical procedures
Chatelain et al. Evaluation of gaze tracking calibration for longitudinal biomedical imaging studies
KR102628324B1 (en) Device and method for analysing results of surgical through user interface based on artificial interlligence
CN111091539A (en) Network model training method, medical image processing method, device, medium and equipment
JP5539478B2 (en) Information processing apparatus and information processing method
Castner et al. LSTMs can distinguish dental expert saccade behavior with high "plaque-urracy"
CN112885435B (en) Method, device and system for determining image target area
Luís et al. Integrating eye-gaze data into cxr dl approaches: A preliminary study
EP2834775B1 (en) Method and system for assisting the visual exploration of an image during a target search
Arnold et al. Indistinct frame detection in colonoscopy videos
WO2020099941A1 (en) Application of deep learning for medical imaging evaluation
White et al. Modeling human eye behavior during mammographic scanning: Preliminary results
CN114612484A (en) Retina OCT image segmentation method based on unsupervised learning
CN114202516A (en) Foreign matter detection method and device, electronic equipment and storage medium
Pemasiri et al. Semantic segmentation of hands in multimodal images: A region new-based CNN approach
CN113485555B (en) Medical image film reading method, electronic equipment and storage medium
Chen et al. Palpation localization of radial artery based on 3-dimensional convolutional neural networks
CN114463323B (en) Focal region identification method and device, electronic equipment and storage medium
US11599191B2 (en) Automatic localization of a structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant