CN113610145A - Model training method, image prediction method, training system, and storage medium - Google Patents


Info

Publication number
CN113610145A
CN113610145A
Authority
CN
China
Prior art keywords
training image
image
training
eye movement
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110887098.7A
Other languages
Chinese (zh)
Inventor
陈磊
王晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd filed Critical Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority to CN202110887098.7A priority Critical patent/CN113610145A/en
Publication of CN113610145A publication Critical patent/CN113610145A/en
Pending legal-status Critical Current


Classifications

    • G Physics › G06 Computing; calculating or counting › G06F Electric digital data processing › G06F18/00 Pattern recognition › G06F18/20 Analysing
        • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
        • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06F18/25 Fusion techniques
    • G Physics › G06 Computing; calculating or counting › G06N Computing arrangements based on specific computational models › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks
        • G06N3/045 Combinations of networks (under G06N3/04 Architecture, e.g. interconnection topology)
        • G06N3/08 Learning methods

Abstract

The application provides a model training method, an image prediction method, a training system, and a storage medium. The model training method includes: acquiring at least one training image; acquiring annotation information and eye movement information corresponding to the training image; obtaining an eye movement attention heat map corresponding to the training image based on the eye movement information; inputting the training image into a preset neural network to obtain prediction information corresponding to the training image; calculating the attention of the preset neural network on the training image with an attention mechanism to obtain a neural network attention heat map corresponding to the training image; and training the preset neural network to obtain an image prediction model, in which the prediction information is constrained against the annotation information through a loss function and the neural network attention heat map is constrained against the eye movement attention heat map through an attention loss function. The model training method can produce an image prediction model with a wide application range and a high degree of intelligence.

Description

Model training method, image prediction method, training system, and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a model training method, an image prediction method, a training system, and a storage medium.
Background
Currently, eye trackers provide three-dimensional tracking of eye movement and detect the approximate on-screen area on which the eye is focused. Eye trackers come in many different types. For telemetric, non-invasive eye tracking, the most common technique is the pupil-corneal reflection technique. The basic idea of this technique is to illuminate the eye with a light source that produces distinct reflections and to use a camera to capture images of the eye containing these reflections. The images acquired by the camera are then used to identify the reflections of the light source on the cornea and in the pupil. The eye movement vector can thus be calculated from the angle between the corneal and pupillary reflections, and the approximate direction of the line of sight can then be calculated by combining the direction of this vector with the geometric features of the other reflections.
Deep-learning-based medical image processing has accumulated a large number of neural network models for image segmentation, classification, detection, registration, and the like. Such techniques require large amounts of data for training in order to learn the characteristic patterns in images. In recent years deep learning has been widely applied to medical images, but problems remain: (1) large volumes of medical image data and the corresponding diagnostic results are often difficult to acquire, which limits the performance of deep learning models; (2) a deep learning model makes its decisions as an unexplained black-box system, which limits its further application.
At present, there is no system or product on the market that assists deep learning model training based on physicians' eye movement information.
Disclosure of Invention
The application aims to provide a model training method, an image prediction method, a training system and a storage medium, which assist in completing deep learning model training based on eye movement information.
The purpose of the application is achieved by the following technical solutions:
in a first aspect, the present application provides a model training method, including: acquiring at least one training image; acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image; acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; inputting the training image into a preset neural network to obtain prediction information corresponding to the training image; calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
This technical solution has the following advantages: eye movement information collected while a person gazes at a training image is acquired; an eye movement attention heat map corresponding to the training image is obtained from that eye movement information; and during training an attention loss function constrains the neural network attention heat map against the eye movement attention heat map. Deep learning model training is thus assisted by eye movement information in a supervised manner, and the trained image prediction model can be used to execute image prediction tasks, giving a wide application range and a high degree of intelligence.
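As an illustrative sketch only (not part of the claimed method), the dual constraint described above can be written as a weighted sum of a task loss and an attention loss. The function names, the use of mean squared error for both terms, and the weight value are all assumptions:

```python
# Hypothetical sketch of the dual-constraint objective: a task loss ties
# predictions to annotations, and an attention loss ties the network
# attention heat map to the eye movement attention heat map.

def mse(a, b):
    """Mean squared error between two equal-length flat lists."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(prediction, annotation, net_attention, eye_attention,
               attn_weight=0.5):
    """Weighted sum of the task loss and the attention loss."""
    task_loss = mse(prediction, annotation)             # prediction vs. annotation
    attention_loss = mse(net_attention, eye_attention)  # network vs. eye heat map
    return task_loss + attn_weight * attention_loss
```

During training, minimizing `total_loss` simultaneously pulls the prediction toward the annotation and the network's attention toward where the annotator actually looked.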
In some optional embodiments, the eye movement information includes position information of a plurality of fixation points, and the obtaining an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image includes: denoising the position information of the plurality of fixation points corresponding to the training image to obtain a denoising result corresponding to the training image; and filtering the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image.
This technical solution has the advantage that an eye movement attention heat map free of noise and clutter is obtained; constraining this cleaned heat map against the neural network attention heat map through the attention loss function improves the accuracy of the image prediction model.
In some optional embodiments, the denoising the position information of the multiple gazing points corresponding to the training image to obtain a denoising result corresponding to the training image includes: acquiring the position information of the fixation point at each moment corresponding to the training image, and acquiring the eye movement speed at each moment corresponding to the training image; calculating the attention level of each moment corresponding to the training image based on the eye movement speed of each moment corresponding to the training image; and acquiring the position information of the fixation point corresponding to the moment when the attention level is in the preset attention range, and taking the position information as a noise reduction result corresponding to the training image.
This technical solution has the advantage that the eye movement speed at each moment is derived from the position of the fixation point at each moment, the attention level is calculated from the eye movement speed, and data whose attention level falls outside the preset attention range are removed, completing the noise reduction of the training image with good effect.
In some optional embodiments, the filtering the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image includes: and performing Gaussian filtering on the noise reduction result corresponding to the training image to obtain an eye movement attention heat map corresponding to the training image.
This technical solution has the advantage that Gaussian filtering effectively removes image noise that follows a normal distribution while preserving the information the image needs, finally yielding a clear eye movement attention heat map.
In some optional embodiments, the method for obtaining annotation information corresponding to the training image includes: receiving, through a user operation device, an operation of annotating the training image to obtain the annotation information corresponding to the training image.
This technical solution has the advantage that a user operation device can receive annotation operations on training images, producing annotation information that expands the training set and improves the performance of the image prediction model.
In some optional embodiments, the method for acquiring eye movement information corresponding to the training image includes: collecting, with an eye movement collection device, the eye movement information while the person gazes at the training image.
This technical solution has the advantage that the eye movement information corresponding to the training image can be obtained through an eye movement collection device, which enables efficient, natural, and accurate eye tracking.
In some alternative embodiments, the image prediction model is used to perform at least one of the following tasks: an image classification task; an image segmentation task; an image detection task; an image registration task; an image mapping task; an image fusion task.
This technical solution has the advantage that the image prediction model obtained by the model training method can serve a variety of application scenarios.
In a second aspect, the present application provides a method for image prediction, the method comprising: acquiring an image to be predicted; inputting the image to be predicted into an image prediction model to obtain prediction information corresponding to the image to be predicted; the image prediction model is obtained by training by using the following method: acquiring at least one training image; acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image; acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; inputting the training image into a preset neural network to obtain prediction information corresponding to the training image; calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; training the preset neural network to obtain an image prediction model; and constraining the prediction information and the annotation information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
In a third aspect, the present application provides a model training apparatus, the apparatus comprising: a training image module for acquiring at least one training image; the information acquisition module is used for acquiring the labeling information and the eye movement information corresponding to the training image, wherein the labeling information corresponding to the training image is obtained by labeling the training image, and the eye movement information corresponding to the training image is obtained by acquiring when a person gazes at the training image; the eye movement heat map module is used for acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; the image input module is used for inputting the training image into a preset neural network to obtain the prediction information corresponding to the training image; the network heat map module is used for calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; the model training module is used for training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
In some optional embodiments, the eye movement information comprises position information of a plurality of fixation points, the eye movement heat map module comprising: the noise reduction unit is used for reducing noise of the position information of the plurality of fixation points corresponding to the training image to obtain a noise reduction result corresponding to the training image; and the filtering unit is used for filtering the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image.
In some optional embodiments, the noise reduction unit includes: an information obtaining subunit, configured to obtain position information of a fixation point at each time corresponding to the training image, and obtain an eye movement speed at each time corresponding to the training image; the attention calculating subunit is used for calculating the attention level of each time corresponding to the training image based on the eye movement speed of each time corresponding to the training image; and the position noise reduction subunit is used for acquiring the position information of the fixation point corresponding to the moment when the attention level is in the preset attention range, and taking the position information as the noise reduction result corresponding to the training image.
In some optional embodiments, the filtering unit is configured to perform gaussian filtering on the noise reduction result corresponding to the training image to obtain an eye movement attention heat map corresponding to the training image.
In some optional embodiments, the information obtaining module is configured to receive, by using a user operation device, the operation of labeling the training image, and obtain labeling information corresponding to the training image.
In some optional embodiments, the information obtaining module is configured to obtain eye movement information corresponding to the training image by using an eye movement collecting device to collect the eye movement information when the person gazes at the training image.
In some alternative embodiments, the image prediction model is used to perform at least one of the following tasks: an image classification task; an image segmentation task; an image detection task; an image registration task; an image mapping task; an image fusion task.
In a fourth aspect, the present application provides an image prediction apparatus, comprising: the image to be predicted module is used for acquiring an image to be predicted; the image prediction module is used for inputting the image to be predicted into an image prediction model to obtain prediction information corresponding to the image to be predicted;
wherein the image prediction model is obtained by training with a model training device, and the model training device comprises: a training image module for acquiring at least one training image; the information acquisition module is used for acquiring the labeling information and the eye movement information corresponding to the training image, wherein the labeling information corresponding to the training image is obtained by labeling the training image, and the eye movement information corresponding to the training image is obtained by acquiring when a person gazes at the training image; the eye movement heat map module is used for acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; the image input module is used for inputting the training image into a preset neural network to obtain the prediction information corresponding to the training image; the network heat map module is used for calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; the model training module is used for training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
In a fifth aspect, the present application provides a training system comprising a user operating device, an eye movement acquisition device, and an electronic device; the electronic equipment comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of any one of the model training methods when executing the computer program; the user operation equipment is used for receiving the operation of labeling the training image; the eye movement acquisition equipment is used for acquiring eye movement information when a person gazes at the training image.
In a sixth aspect, the present application provides a computer readable storage medium storing a computer program or an image prediction model; the computer program, when executed by a processor, implementing the steps of any of the above methods; the image prediction model is obtained by training by using the following method: acquiring at least one training image; acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image; acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; inputting the training image into a preset neural network to obtain prediction information corresponding to the training image; calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
Drawings
The present application is further described below with reference to the drawings and examples.
FIG. 1 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of acquiring an eye movement attention heat map according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart illustrating a process of denoising a training image according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of an image prediction method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an eye movement heat map module according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a noise reduction unit provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image prediction apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a training system provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
In the figure: 10. eye movement collection device; 20. electronic device; 30. user operation device; 210. memory; 220. processor.
Detailed Description
The present application is further described with reference to the accompanying drawings and the detailed description, and it should be noted that, in the present application, the embodiments or technical features described below may be arbitrarily combined to form a new embodiment without conflict.
Referring to fig. 1, an embodiment of the present application provides a model training method, which includes steps S101 to S106.
Step S101: at least one training image is acquired. The training image may be a medical image, which includes, but is not limited to, a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, an endoscopic image, an Ultrasound (US) image, and the like; the training images may also be machine images in the industrial field, such as faulty car images, machining equipment images, etc.
Step S102: acquire the annotation information and the eye movement information corresponding to the training image, wherein the annotation information is obtained by annotating the training image, and the eye movement information is collected while a person gazes at the training image. In the medical field, the person may be a medical professional, and the annotation information may be a lesion position annotated by that professional according to a gold standard, the accepted clinical method for diagnosing a disease; in the industrial field, the person may be a technician, and the annotation information may be fault annotations, for example the fault location and fault type that a maintenance engineer marks on an equipment image.
Step S103: and acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image.
Step S104: and inputting the training image into a preset neural network to obtain the prediction information corresponding to the training image.
Step S105: and calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image.
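Step S105 is stated generically. As one hypothetical realization, in the spirit of class-activation-map methods, a spatial attention heat map can be derived by averaging an intermediate feature map over its channels and normalizing to [0, 1]. The function below is an illustrative assumption, not the patented attention mechanism:

```python
# Hypothetical sketch: derive a spatial attention heat map from an
# intermediate feature map by channel-averaging and min-max normalization.

def attention_heat_map(feature_map):
    """feature_map: list of channels, each an H x W 2-D list of activations.
    Returns an H x W heat map normalized to [0, 1]."""
    channels = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    # Average the activations over the channel dimension.
    mean = [[sum(ch[i][j] for ch in feature_map) / channels for j in range(w)]
            for i in range(h)]
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    scale = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[(v - lo) / scale for v in row] for row in mean]
```

The resulting map is directly comparable, pixel for pixel, with the eye movement attention heat map of step S103, which is what allows the attention loss function of step S106 to constrain one against the other.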
Step S106: training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
In some embodiments, when the training image corresponds to a living human or animal body, the annotation information corresponding to the training image cannot be used to directly obtain a diagnosis result or the health condition of a disease.
In a specific application scenario, the image prediction model obtained by training the preset neural network may be a medical image prediction model used to execute a segmentation task. First a medical training image is obtained; the corresponding annotation information may indicate whether a lesion exists and, if so, the segmentation region of the lesion. For example, the annotation information in the training image indicates whether the patient corresponding to the training image has a lung nodule, together with the segmentation region of the lung nodule. The annotation is performed by a professional physician, say Zhang San; Zhang San's eye movement information while gazing at the training image is collected, and a corresponding eye movement attention heat map is obtained from it. The same training image is input into the preset neural network to obtain the corresponding prediction information, which indicates whether the patient corresponding to the training image has a lung nodule and gives its segmentation region. The attention of the preset neural network on the training image is calculated with an attention mechanism to obtain the corresponding neural network attention heat map. The prediction information is constrained against the annotation information through the loss function, and the neural network attention heat map is constrained against the eye movement attention heat map through the attention loss function, yielding the trained image prediction model.
In this way, eye movement information collected while a person gazes at a training image is acquired, the eye movement attention heat map corresponding to the training image is obtained from that information, and an attention loss function constrains the neural network attention heat map against the eye movement attention heat map during training. Deep learning model training is thus assisted by eye movement information in a supervised manner, and the trained image prediction model can be used to execute image prediction tasks, with a wide application range and a high degree of intelligence.
Referring to fig. 2, in some embodiments, the eye movement information may include position information of a plurality of fixation points, and the step S103 may include steps S201 to S202.
Step S201: and denoising the position information of the plurality of fixation points corresponding to the training image to obtain a denoising result corresponding to the training image.
Step S202: and filtering the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image.
Therefore, the eye movement attention heat map without the influence of noise and clutter is obtained, the eye movement attention heat map without the influence of noise and clutter and the neural network attention heat map corresponding to the training image are restrained through the attention loss function, and the accuracy of the image prediction model is improved.
Referring to fig. 3, in some embodiments, the S201 may include steps S301 to S303.
Step S301: acquire the position information of the fixation point at each moment corresponding to the training image, and obtain the eye movement speed at each moment. The position of the fixation point at each moment can be obtained from the eye tracker, whose sampling rate (the number of samples collected per second) is fixed. The eye movement speed at each moment can therefore be derived from the distance the fixation point has moved on the training image since the previous sample.
Step S302: and calculating the attention level of each time corresponding to the training image based on the eye movement speed of each time corresponding to the training image. When the eye movement velocity is faster at a certain moment than at other moments, its relative attention level is low.
Step S303: and acquiring the position information of the fixation point corresponding to the moment when the attention level is in the preset attention range, and taking the position information as a noise reduction result corresponding to the training image. And performing noise reduction processing on the gaze point position information which does not conform to the attention horizontal range, and reserving the gaze point position information which conforms to the attention horizontal range, wherein the reserved gaze point position information is a noise reduction result corresponding to the training image.
In this way, the eye movement speed at each moment corresponding to the training image is obtained from the position information of the fixation point at each moment, the attention level is calculated from the eye movement speed, and the data whose attention level falls outside the preset attention range are removed. This completes the noise reduction for the training image and achieves a good noise reduction effect.
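The velocity-based noise reduction in steps S301 to S303 can be sketched as follows. This is a minimal illustration assuming the eye tracker yields timestamped gaze positions; the single speed threshold `max_speed` stands in for the preset attention range, and all names are hypothetical:

```python
import numpy as np

def denoise_fixations(points, timestamps, max_speed):
    """Keep only gaze points whose instantaneous eye-movement speed
    stays below max_speed, i.e. whose attention level lies within
    the (hypothetical) preset attention range.

    points: (N, 2) array of gaze positions on the training image
    timestamps: (N,) array of acquisition times from the eye tracker
    max_speed: speed threshold above which a sample is treated as noise
    """
    points = np.asarray(points, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Distance the fixation point moved between consecutive samples
    dists = np.linalg.norm(np.diff(points, axis=0), axis=1)
    dts = np.diff(timestamps)
    speeds = dists / dts
    # The first sample has no predecessor; treat its speed as zero
    speeds = np.concatenate([[0.0], speeds])
    # Lower speed means higher attention; keep samples inside the range
    keep = speeds <= max_speed
    return points[keep]
```

A sample jumping 49 pixels in one tick would be dropped while slow drifts are kept, which matches the intuition that saccade-like bursts carry little diagnostic attention.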
In embodiments of the present application, Gaussian filtering, mean filtering, median filtering, and the like may be performed on the noise reduction result corresponding to the training image.
In some embodiments, step S202 may include: performing Gaussian filtering on the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image.
The median filtering commonly used in the prior art is time-consuming and easily causes partial loss of image structure; mean filtering likewise erodes image structure during filtering, blurring the image, and cannot effectively preserve image detail while achieving the goal of smoothing. Gaussian filtering does not suffer from these problems of mean and median filtering. Therefore, filtering with a Gaussian method effectively removes image noise that obeys a normal distribution while preserving the information the image needs, yielding a clear eye movement attention heat map.
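A minimal sketch of step S202 under these choices: the retained gaze points are accumulated into a two-dimensional histogram and smoothed with a separable Gaussian kernel. All names are illustrative, and a library routine such as `scipy.ndimage.gaussian_filter` could replace the hand-rolled convolution:

```python
import numpy as np

def fixation_heatmap(points, shape, sigma):
    """Accumulate denoised gaze points into a 2-D histogram, then
    smooth it with a separable Gaussian to produce the eye-movement
    attention heat map (normalised to [0, 1])."""
    heat = np.zeros(shape, dtype=float)
    for y, x in points:
        yi, xi = int(round(y)), int(round(x))
        if 0 <= yi < shape[0] and 0 <= xi < shape[1]:
            heat[yi, xi] += 1.0
    # 1-D Gaussian kernel, truncated at 3 sigma
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Gaussian filtering is separable: convolve rows, then columns
    heat = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, heat)
    heat = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, heat)
    if heat.max() > 0:
        heat /= heat.max()
    return heat
```

Because the Gaussian is separable, the two 1-D passes are equivalent to a full 2-D Gaussian convolution at much lower cost.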
In some embodiments, the method for acquiring the labeling information corresponding to the training image in step S102 may include: receiving, through a user operation device, an operation of labeling the training image to obtain the labeling information corresponding to the training image.
In a specific application scenario, the user operation device may be keys, a keyboard, a touch display screen, or a voice recognition device, and the labeling information corresponding to the training image may be entered by an experienced doctor through key input, keyboard input, touch-screen input, voice command input, and the like. The number of keys may be, for example, 3, 4, or 5. The keyboard may be, for example, a physical keyboard, a virtual keyboard, or a projection keyboard.
For example, keyboard input "1" denotes "mild gastritis", "2" denotes "moderate gastritis", "3" denotes "severe gastritis", and "0" denotes "no gastritis". For the first training image, the doctor inputs "3" through the keyboard, and the labeling information of the first training image indicates that the corresponding patient suffers from severe gastritis; for the second training image, the doctor inputs "1", and the labeling information indicates that the corresponding patient suffers from mild gastritis.
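The key-to-label mapping in this example can be expressed as a simple lookup table; the key codes and label strings follow the gastritis example above, and the names are illustrative:

```python
# Hypothetical mapping from keyboard input to labeling information,
# following the gastritis example in the text
LABELS = {
    "0": "no gastritis",
    "1": "mild gastritis",
    "2": "moderate gastritis",
    "3": "severe gastritis",
}

def annotate(key: str) -> str:
    """Translate a single key press into the labeling information
    stored with the current training image."""
    return LABELS[key]
```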
In a specific application scenario, the labeling information and the eye movement information corresponding to the current training image may be acquired from the person at the same time, or at different times. For the current training image, the person performing the labeling operation and the person whose eye movement information is collected may be the same person or different persons.
In this way, the user operation device can receive the operation of labeling the training images, and the resulting labeling information expands the training set and improves the performance of the image prediction model.
In some embodiments, the method for acquiring the eye movement information corresponding to the training image includes: collecting, with an eye movement collecting device, the eye movement information while the person gazes at the training image, to obtain the eye movement information corresponding to the training image. The eye movement collecting device may be a portable eye tracker, a desktop eye tracker, a glasses-type eye tracker, or other equipment that tracks eye movement and records eye movement data. The device can provide three-dimensional tracking of the eyeball and detect the approximate area of the screen on which the human eye focuses. It may adopt the pupil-corneal reflection technique: a light source illuminates the eye to produce a distinct reflection, a camera captures images of the eye showing this reflection, and the reflections of the light source on the cornea and in the pupil are identified from the captured images. The eye movement vector is then calculated from the angle between the corneal and pupil reflections, and the direction of this vector is combined with the geometric features of other reflections to calculate the approximate direction of the line of sight.
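A highly simplified sketch of the pupil-corneal reflection idea described above: the vector from the corneal glint to the pupil centre serves as the raw eye movement vector, while the per-user calibration that maps it to screen coordinates is omitted. All names are illustrative:

```python
import numpy as np

def gaze_vector(pupil_center, glint_center):
    """Unit vector from the corneal glint to the pupil centre.
    In pupil-corneal-reflection tracking, this vector (after a
    per-user calibration, omitted here) indicates the approximate
    direction of the line of sight."""
    v = np.asarray(pupil_center, dtype=float) - np.asarray(glint_center, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```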
In this way, the eye movement information corresponding to the training image can be obtained through the eye movement collecting device, which performs efficient, natural, and accurate eye tracking.
In some embodiments, after the person has gazed at a training image for a certain time while eye movement information is being collected, the display may be switched to a new training image. Alternatively, after eye movement information has been collected for a period of time, or for a certain number of training images, the person may be given a rest. In a specific application scenario, the person views twenty training images while eye movement information is collected, then rests for five minutes, after which collection continues. This reduces fatigue and raises the person's attention level during eye movement collection.
In some embodiments, the image prediction model is used to perform at least one of the following tasks: an image classification task; an image segmentation task; an image detection task; an image registration task; an image mapping task; and an image fusion task.
In a specific application scenario, a panoramic tomographic image of the teeth in a patient's oral cavity is input into the image prediction model, and an image classification task for the teeth is executed. The image prediction model classifies the teeth and can output classification information labeling incisors, lateral incisors, cuspids, premolars, and molars.
In a specific application scenario, a brain medical image of a brain tumor patient acquired through magnetic resonance imaging is input into the image prediction model as the image to be predicted, and an image segmentation task for tumor tissue is executed. The image prediction model segments the image to be predicted, separating tumor tissue from healthy brain tissue and assisting doctors in making an accurate diagnosis and treatment plan.
In a specific application scenario, a medical image of the lungs is input into the image prediction model as the image to be predicted, and an image detection task for lung nodules is executed. The image prediction model detects the image to be predicted, frames the predicted lung nodule locations, marks the confidence corresponding to each prediction, and, together with other medical equipment, assists the doctor in further judging whether a nodule is benign or malignant.
In a specific application scenario, a patient is examined at hospital A for chest discomfort and a chest CT image is obtained; the patient is then transferred to hospital B for treatment because of medical insurance reimbursement issues, and a new chest CT image is taken there. Because medical images obtained at different hospitals do not align with each other, the two chest CT images are input into the image prediction model as images to be predicted and an image registration task is executed. The image prediction model registers the two images to obtain a registered medical image in which the information of the two inputs is complementary, better assisting the doctor in diagnosing the patient.
In a specific application scenario, two medical images of the same patient at the same disease location, one before treatment and one after conservative treatment, are input into the image prediction model as images to be predicted, and an image fusion task is executed. The image prediction model fuses the images to be predicted to obtain a fused medical image, which assists doctors in tracking disease progression.
Therefore, the image prediction model obtained by the model training method can meet various application scenes.
Referring to fig. 4, an embodiment of the present application further provides an image prediction method, which includes steps S401 to S402.
Step S401: and acquiring a to-be-predicted image.
Step S402: inputting the image to be predicted into an image prediction model to obtain prediction information corresponding to the image to be predicted; the image prediction model is obtained by training by using the following method:
acquiring at least one training image; acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image; acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; inputting the training image into a preset neural network to obtain prediction information corresponding to the training image; calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
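The two constraints in the final training step can be sketched as one combined objective. The concrete loss forms are not fixed by this description, so the cross-entropy prediction term, the mean-squared-error attention term, and the weighting factor below are assumptions:

```python
import numpy as np

def total_loss(pred, label, net_heat, eye_heat, attn_weight=0.5):
    """Combined objective for the training step described above.

    pred: softmax probabilities output by the preset neural network
    label: integer class index from the labeling information
    net_heat / eye_heat: neural network and eye movement attention
        heat maps of the same shape
    attn_weight: hypothetical weighting between the two terms
    """
    # Loss function: cross-entropy between prediction and labeling info
    ce = -np.log(pred[label] + 1e-12)
    # Attention loss function: mean squared error between the heat maps
    attn = np.mean((net_heat - eye_heat) ** 2)
    return ce + attn_weight * attn
```

When the network's attention heat map matches the eye movement heat map exactly, the attention term vanishes and only the prediction loss drives the update.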
Referring to fig. 5, an embodiment of the present application further provides a model training apparatus. Its specific implementation is consistent with the implementation and technical effects described in the embodiment of the model training method, and the repeated details are not described again.
The model training apparatus includes: a training image module 101, configured to acquire at least one training image; an information acquisition module 102, configured to acquire the labeling information and the eye movement information corresponding to the training image, where the labeling information corresponding to the training image is obtained by labeling the training image, and the eye movement information corresponding to the training image is collected while a person gazes at the training image; an eye movement heat map module 103, configured to obtain the eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; an image input module 104, configured to input the training image into a preset neural network to obtain prediction information corresponding to the training image; a network heat map module 105, configured to calculate, using an attention mechanism, the attention of the preset neural network corresponding to the training image to obtain a neural network attention heat map corresponding to the training image; and a model training module 106, configured to train the preset neural network to obtain an image prediction model, wherein the prediction information and the labeling information corresponding to the training image are constrained through a loss function, and the neural network attention heat map and the eye movement attention heat map corresponding to the training image are constrained through an attention loss function.
Referring to fig. 6, in some embodiments, the eye movement information may include location information of a plurality of fixation points, and the eye movement heat map module 103 may include: a denoising unit 201, configured to denoise position information of multiple fixation points corresponding to the training image to obtain a denoising result corresponding to the training image; and a filtering unit 202, configured to filter the noise reduction result corresponding to the training image to obtain an eye movement attention heat map corresponding to the training image.
Referring to fig. 7, in some embodiments, the noise reduction unit 201 may include: an information obtaining subunit 301, configured to obtain position information of a fixation point at each time corresponding to the training image, and obtain an eye movement speed at each time corresponding to the training image; an attention calculating subunit 302, configured to calculate an attention level at each time corresponding to the training image based on the eye movement speed at each time corresponding to the training image; and a position noise reduction subunit 303, configured to acquire position information of a gaze point corresponding to a moment when the attention level is within a preset attention range, as a noise reduction result corresponding to the training image.
In some embodiments, the filtering unit 202 may be configured to perform gaussian filtering on the noise reduction result corresponding to the training image to obtain an eye movement attention heat map corresponding to the training image.
In some embodiments, the information obtaining module 102 may be configured to receive, by using a user operating device, the operation of labeling the training image, and obtain labeling information corresponding to the training image.
In some embodiments, the information obtaining module 102 may be configured to collect, by using an eye movement collecting device, eye movement information when the person gazes at the training image, so as to obtain eye movement information corresponding to the training image.
In some embodiments, the image prediction model may be used to perform at least one of the following tasks: an image classification task; an image segmentation task; an image detection task; an image registration task; an image mapping task; and an image fusion task.
Referring to fig. 8, an embodiment of the present application further provides an image prediction apparatus. Its specific implementation is consistent with the implementation and technical effects described in the embodiment of the image prediction method, and the repeated contents are not described again.
The image prediction apparatus includes: a to-be-predicted image module 401, configured to obtain an image to be predicted; and an image prediction module 402, configured to input the image to be predicted into an image prediction model to obtain prediction information corresponding to the image to be predicted; wherein the image prediction model is obtained through training by a model training apparatus that comprises:
a training image module 101, configured to acquire at least one training image; an information acquisition module 102, configured to acquire the labeling information and the eye movement information corresponding to the training image, where the labeling information corresponding to the training image is obtained by labeling the training image, and the eye movement information corresponding to the training image is collected while a person gazes at the training image; an eye movement heat map module 103, configured to obtain the eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; an image input module 104, configured to input the training image into a preset neural network to obtain prediction information corresponding to the training image; a network heat map module 105, configured to calculate, using an attention mechanism, the attention of the preset neural network corresponding to the training image to obtain a neural network attention heat map corresponding to the training image; and a model training module 106, configured to train the preset neural network to obtain an image prediction model, wherein the prediction information and the labeling information corresponding to the training image are constrained through a loss function, and the neural network attention heat map and the eye movement attention heat map corresponding to the training image are constrained through an attention loss function.
Referring to fig. 9 and 10, an embodiment of the present application further provides a training system, which includes a user operation device 30, an eye movement collecting device 10, and an electronic device 20; the electronic device 20 comprises a memory 210 and a processor 220, wherein the memory 210 stores a computer program, and the processor 220 implements the steps of any one of the above model training methods when executing the computer program; the user operation device 30 is used for receiving the operation of labeling the training image; the eye movement acquisition device 10 is used for acquiring eye movement information when a person gazes at the training image.
The memory 210 may include readable media in the form of volatile memory, such as a random access memory (RAM) 211 and/or a cache memory 212, and may further include a read-only memory (ROM) 213.
The memory 210 stores a computer program, and the computer program can be executed by the processor 220, so that the processor 220 executes the steps of the model training method in the embodiment of the present application, and the specific implementation manner of the method is consistent with the implementation manner and the achieved technical effect described in the embodiment of the model training method, and some contents are not described again.
Memory 210 may also include a utility 214 having at least one program module 215, such program modules 215 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Accordingly, the processor 220 may execute the computer programs described above, and may execute the utility 214.
Bus 230 may be one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus architectures.
The electronic device 20 may also communicate with one or more external devices 240, such as a keyboard, pointing device, bluetooth device, etc., and may also communicate with one or more devices capable of interacting with the electronic device 20, and/or with any devices (e.g., routers, modems, etc.) that enable the electronic device 20 to communicate with one or more other computing devices. Such communication may be through input-output interface 250. Also, the electronic device 20 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 20 via the bus 230. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 20, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
Referring to fig. 11, an embodiment of the present application further provides a computer-readable storage medium storing a computer program or an image prediction model; the computer program when executed by the processor 220 performs the steps of any of the above model training methods or image prediction methods; the image prediction model is obtained by training by using the following method: acquiring at least one training image; acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image; acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image; inputting the training image into a preset neural network to obtain prediction information corresponding to the training image; calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image; training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
When executed, the computer program implements the steps of any one of the model training methods or image prediction methods in the embodiments of the present application. Its specific implementation is consistent with the implementations and technical effects described in the embodiments of those methods, and the repeated contents are not described again.
Fig. 11 shows a program product 300 for implementing the model training method provided in this embodiment, which may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product 300 of the present invention is not so limited; in this application, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program product 300 may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
While the present application is described in terms of various aspects, including exemplary embodiments, the principles of the invention should not be limited to the disclosed embodiments, but are also intended to cover various modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of model training, the method comprising:
acquiring at least one training image;
acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image;
acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image;
inputting the training image into a preset neural network to obtain prediction information corresponding to the training image;
calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image;
training the preset neural network to obtain an image prediction model; and constraining the prediction information and the labeling information corresponding to the training image through a loss function, and constraining the neural network attention heat map and the eye movement attention heat map corresponding to the training image through an attention loss function.
2. The model training method according to claim 1, wherein the eye movement information includes position information of a plurality of fixation points, and the obtaining of the eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image includes:
denoising the position information of the plurality of fixation points corresponding to the training image to obtain a denoising result corresponding to the training image;
and filtering the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image.
3. The model training method according to claim 2, wherein the denoising the position information of the plurality of gaze points corresponding to the training image to obtain a denoising result corresponding to the training image comprises:
acquiring the position information of the fixation point at each moment corresponding to the training image, and acquiring the eye movement speed at each moment corresponding to the training image;
calculating the attention level of each moment corresponding to the training image based on the eye movement speed of each moment corresponding to the training image;
and acquiring the position information of the fixation point corresponding to the moment when the attention level is in the preset attention range, and taking the position information as a noise reduction result corresponding to the training image.
4. The model training method according to claim 2, wherein the filtering the noise reduction result corresponding to the training image to obtain the eye movement attention heat map corresponding to the training image comprises:
and performing Gaussian filtering on the noise reduction result corresponding to the training image to obtain an eye movement attention heat map corresponding to the training image.
5. The model training method according to claim 1, wherein the method for acquiring the labeling information corresponding to the training image comprises:
and receiving the operation of marking the training image by using user operation equipment to obtain marking information corresponding to the training image.
6. The model training method according to claim 1, wherein the method of obtaining eye movement information corresponding to the training image comprises:
and acquiring eye movement information when the person gazes at the training image by using eye movement acquisition equipment to obtain the eye movement information corresponding to the training image.
7. The model training method of claim 1, wherein the image prediction model is used to perform at least one of the following tasks: an image classification task; an image segmentation task; an image detection task; an image registration task; an image mapping task; and an image fusion task.
8. A method of image prediction, the method comprising:
acquiring a to-be-predicted image;
inputting the image to be predicted into an image prediction model to obtain prediction information corresponding to the image to be predicted;
the image prediction model is obtained by training by using the following method:
acquiring at least one training image;
acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image;
acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image;
inputting the training image into a preset neural network to obtain prediction information corresponding to the training image;
calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image;
training the preset neural network to obtain the image prediction model, wherein during training the prediction information is constrained against the annotation information corresponding to the training image by a loss function, and the neural network attention heat map is constrained against the eye movement attention heat map corresponding to the training image by an attention loss function.
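The dual-constraint objective described in the steps above (a loss function tying prediction information to annotation information, plus an attention loss tying the network attention heat map to the eye movement heat map) might look like this in outline. The cross-entropy and MSE choices, and the weighting factor `lam`, are assumptions; the claim does not fix the loss forms.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    """Prediction loss: constrains network output against the annotation."""
    return -np.log(softmax(logits)[label] + 1e-12)

def attention_loss(net_heatmap, eye_heatmap):
    """Attention loss: constrains the network attention heat map
    against the eye movement attention heat map (MSE as one choice)."""
    return float(np.mean((net_heatmap - eye_heatmap) ** 2))

def total_loss(logits, label, net_hm, eye_hm, lam=0.5):
    """Combined training objective: prediction loss + weighted attention loss."""
    return cross_entropy(logits, label) + lam * attention_loss(net_hm, eye_hm)
```

When the network's attention map matches the human gaze map exactly, the attention term vanishes and only the prediction loss remains, so the second constraint acts as a regulariser steering the network toward human-like attention.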
9. A training system, comprising a user operation device, an eye movement acquisition device, and an electronic device;
the electronic device comprising a memory storing a computer program and a processor that implements the steps of the method according to any one of claims 1 to 7 when executing the computer program;
the user operation device being configured to receive an operation of annotating the training image;
the eye movement acquisition device being configured to acquire eye movement information while a person gazes at the training image.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or an image prediction model;
the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7 or the steps of the method of claim 8;
the image prediction model is obtained by training with the following method:
acquiring at least one training image;
acquiring annotation information and eye movement information corresponding to the training image, wherein the annotation information corresponding to the training image is obtained by annotating the training image, and the eye movement information corresponding to the training image is acquired when a person gazes at the training image;
acquiring an eye movement attention heat map corresponding to the training image based on the eye movement information corresponding to the training image;
inputting the training image into a preset neural network to obtain prediction information corresponding to the training image;
calculating the attention of the preset neural network corresponding to the training image by using an attention mechanism to obtain a neural network attention heat map corresponding to the training image;
training the preset neural network to obtain the image prediction model, wherein during training the prediction information is constrained against the annotation information corresponding to the training image by a loss function, and the neural network attention heat map is constrained against the eye movement attention heat map corresponding to the training image by an attention loss function.
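The claims leave the attention mechanism open. One common way to obtain the network-side attention heat map referenced in the steps above is a CAM-style weighted sum of the final convolutional feature maps, sketched here under that assumption (`cam_attention_heatmap` is an illustrative name, not part of the claims).

```python
import numpy as np

def cam_attention_heatmap(feature_maps, class_weights):
    """CAM-style heat map: weight each feature channel by the classifier
    weight for the predicted class, sum over channels, rectify, normalise.
    feature_maps: array of shape (C, H, W); class_weights: shape (C,)."""
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # -> (H, W)
    cam = np.maximum(cam, 0.0)                               # ReLU
    if cam.max() > 0:
        cam /= cam.max()                                     # normalise to [0, 1]
    return cam
```

The resulting map lives on the same [0, 1] scale as a normalised eye movement heat map, so the two can be compared directly by the attention loss.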
CN202110887098.7A 2021-08-03 2021-08-03 Model training method, image prediction method, training system, and storage medium Pending CN113610145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110887098.7A CN113610145A (en) 2021-08-03 2021-08-03 Model training method, image prediction method, training system, and storage medium


Publications (1)

Publication Number Publication Date
CN113610145A true CN113610145A (en) 2021-11-05

Family

ID=78339325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110887098.7A Pending CN113610145A (en) 2021-08-03 2021-08-03 Model training method, image prediction method, training system, and storage medium

Country Status (1)

Country Link
CN (1) CN113610145A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115119052A (en) * 2022-04-29 2022-09-27 Hohai University Image data compression method and system based on attention mechanism and spatial redundancy
CN115119052B (en) * 2022-04-29 2023-10-24 Hohai University Image data compression method and system based on attention mechanism and spatial redundancy
CN114863093A (en) * 2022-05-30 2022-08-05 Xiamen University Neural network training method based on eye movement technology and building design method and system
CN116433697A (en) * 2023-06-13 2023-07-14 Nanjing University of Aeronautics and Astronautics Abdominal multi-organ CT image segmentation method based on eye movement instrument
CN116433697B (en) * 2023-06-13 2023-09-12 Nanjing University of Aeronautics and Astronautics Abdominal multi-organ CT image segmentation method based on eye movement instrument


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination