CN117079291A - Image track determining method, device, computer equipment and storage medium - Google Patents

Image track determining method, device, computer equipment and storage medium

Info

Publication number
CN117079291A
CN117079291A (application CN202311000300.5A)
Authority
CN
China
Prior art keywords
image
sample
text
encoder
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311000300.5A
Other languages
Chinese (zh)
Inventor
崔景景
董昢
薛忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianying Intelligent Medical Technology Beijing Co ltd
Original Assignee
Lianying Intelligent Medical Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianying Intelligent Medical Technology Beijing Co ltd filed Critical Lianying Intelligent Medical Technology Beijing Co ltd
Priority to CN202311000300.5A
Publication of CN117079291A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image track determining method, an image track determining device, computer equipment and a storage medium. The method comprises the following steps: acquiring an image to be identified and text information corresponding to the image to be identified; identifying image features of the image to be identified and text features of the text information; and determining a target track of a region of interest from the image to be identified according to the image features and the text features, based on a predetermined image knowledge graph. With this method, the line-of-sight attention track an expert follows when reading a medical image can be determined from any medical image and its image report, so that junior doctors can learn from it; this enlarges the scope of imaging-knowledge learning and improves its flexibility.

Description

Image track determining method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a computer device, and a storage medium for determining an image track.
Background
Medical images play an important auxiliary role in clinical diagnosis and treatment, and are significant for educating and training junior doctors in imaging knowledge.
At present, junior doctors usually learn imaging knowledge through professional publications, expert experience sharing, clinical practice, academic conferences and seminars, online resources, and the like. Most of these channels offer experience summaries or typical cases and lack flexibility: for example, junior doctors can interact with experts in real time at academic conferences and seminars, but the learning time and content are limited; through online resources they can study the knowledge they need, but cannot communicate with experts in a timely way. Furthermore, imaging-knowledge learning needs to combine theoretical study with practical operation, and current learning modes struggle to achieve both.
Therefore, current imaging-knowledge learning suffers from insufficient flexibility.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image trajectory determination method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve learning flexibility.
In a first aspect, the present application provides an image trajectory determination method. The method comprises the following steps:
acquiring an image to be identified and text information corresponding to the image to be identified;
identifying image features of the image to be identified and text features of the text information;
and determining a target track of the region of interest from the image to be identified according to the image features and the text features based on a predetermined image knowledge graph.
In one embodiment, the identifying the image feature of the image to be identified and the text feature of the text information includes:
inputting the image to be identified into a trained image encoder to obtain the image characteristics of the image to be identified;
and inputting the text information into a trained text encoder to obtain the text characteristics of the text information.
In one embodiment, before acquiring the image to be identified and the text information corresponding to the image to be identified, the method further includes:
acquiring an image sample, and a track sample and a text sample corresponding to the image sample;
determining sample fusion characteristics of the image sample according to the track sample, and determining text sample characteristics corresponding to the image sample according to the text sample;
training the encoder to be trained according to the sample fusion characteristics and the text sample characteristics to obtain a trained encoder; the trained encoder includes a trained track encoder, the trained image encoder, and the trained text encoder.
In one embodiment, the determining the sample fusion feature of the image sample from the trajectory sample includes:
extracting track sample characteristics of the track sample;
registering the track sample with the image sample to obtain a region-of-interest sample in the image sample, and extracting sample image features of the region-of-interest sample;
and fusing the track sample characteristics and the sample image characteristics to obtain the sample fusion characteristics of the image sample.
In one embodiment, the training the encoder to be trained according to the sample fusion feature and the text sample feature to obtain a trained encoder includes:
pairing the sample fusion characteristics and the text sample characteristics to obtain a positive training sample and a negative training sample;
and training a track encoder to be trained, an image encoder to be trained and a text encoder to be trained according to the positive training sample and the negative training sample to obtain the trained track encoder, the trained image encoder and the trained text encoder.
In one embodiment, the training the track encoder to be trained, the image encoder to be trained and the text encoder to be trained according to the positive training sample and the negative training sample to obtain the trained track encoder, the trained image encoder and the trained text encoder includes:
performing contrast learning according to the positive training sample and the negative training sample to obtain a contrast learning loss value;
and according to the contrast learning loss value, carrying out parameter adjustment on the track encoder to be trained, the image encoder to be trained and the text encoder to be trained to obtain the trained track encoder, the trained image encoder and the trained text encoder.
In one embodiment, after training the encoder to be trained according to the sample fusion feature and the text sample feature to obtain a trained encoder, the method further includes:
determining a region feature entity and a text feature entity corresponding to the region feature entity according to the trained encoder;
extracting the relation between the regional characteristic entity and the text characteristic entity to obtain an entity relation between the regional characteristic entity and the text characteristic entity;
and determining the image knowledge graph according to the regional characteristic entity, the text characteristic entity and the entity relationship.
In a second aspect, the application further provides an image track determining device. The device comprises:
the information acquisition module is used for acquiring an image to be identified and text information corresponding to the image to be identified;
the feature recognition module is used for recognizing the image features of the image to be recognized and the text features of the text information;
and the track determining module is used for determining the target track of the region of interest from the image to be identified according to the image characteristics and the text characteristics based on a predetermined image knowledge graph.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring an image to be identified and text information corresponding to the image to be identified;
identifying image features of the image to be identified and text features of the text information;
and determining a target track of the region of interest from the image to be identified according to the image features and the text features based on a predetermined image knowledge graph.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an image to be identified and text information corresponding to the image to be identified;
identifying image features of the image to be identified and text features of the text information;
and determining a target track of the region of interest from the image to be identified according to the image features and the text features based on a predetermined image knowledge graph.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring an image to be identified and text information corresponding to the image to be identified;
identifying image features of the image to be identified and text features of the text information;
and determining a target track of the region of interest from the image to be identified according to the image features and the text features based on a predetermined image knowledge graph.
With the above image track determining method, device, computer equipment, storage medium, and computer program product, the image to be identified and its corresponding text information are acquired, the image features of the image and the text features of the text information are identified, and the target track of the region of interest is determined from the image according to these features, based on a predetermined image knowledge graph. The method can thus determine, from any medical image and its image report and based on the knowledge graph, the line-of-sight attention track an expert follows when reading the medical image, so that junior doctors can learn from it; this enlarges the scope of imaging-knowledge learning and improves its flexibility.
Drawings
FIG. 1 is a flow chart of a method of determining an image trajectory in one embodiment;
FIG. 2 is a flow chart of an image track determining method according to another embodiment;
FIG. 3 is a flow diagram of an encoder training process in one embodiment;
FIG. 4 is a flow diagram of a sample fusion feature acquisition process in one embodiment;
FIG. 5 is a flow chart of an encoder training process in another embodiment;
FIG. 6 is a flow diagram of an image knowledge graph construction process in one embodiment;
FIG. 7 is a flow chart of an image track determining method according to another embodiment;
FIG. 8 is a block diagram showing the construction of an image trajectory determining device in one embodiment;
fig. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, an image track determining method is provided. The method is described here as applied to a terminal, but it is understood that it may also be applied to a server, or to a system comprising the terminal and the server and implemented through their interaction. In this embodiment, the method includes the following steps:
Step S110, obtaining an image to be identified and text information corresponding to the image to be identified.
The image to be identified may be an image in which a line-of-sight attention track needs to be identified, for example a medical image in which an expert's line-of-sight attention track is to be identified, including but not limited to CT (computed tomography) images, MR (magnetic resonance) images, and pathology images.
The text information may be text associated with the image to be identified, for example the image report of a medical image, including but not limited to CT image reports, MR image reports, and pathology analysis reports.
In a specific implementation, an image to be identified and its text information can be acquired and input to a terminal, which then obtains the image to be identified and the corresponding text information.
In practical application, a medical image intended for imaging-knowledge training serves as the image to be identified, its image report serves as the corresponding text information, and both are input to the terminal.
Step S120, identifying image features of the image to be identified, and text features of the text information.
In a specific implementation, the terminal can identify the image to be identified with the trained image encoder to obtain its image features, and identify the text information with the trained text encoder to obtain its text features.
In practical application, the image encoder and the text encoder are trained in advance. The terminal preprocesses the medical image and inputs it into the trained image encoder, which outputs the image features of the medical image; the terminal may likewise normalize the image report and input it into the trained text encoder, which outputs the text features of the image report.
Step S130, determining a target track of the region of interest from the image to be identified according to the image characteristics and the text characteristics based on a predetermined image knowledge graph.
The image knowledge graph can be a knowledge graph describing image characteristics, text characteristics, sight line attention track characteristics and relations among the characteristics.
The region of interest may be a region of the line-of-sight attention track and its neighborhood, i.e. a line-of-sight attention region.
Wherein the target trajectory may be an identified line of sight trajectory of interest.
In a specific implementation, an image knowledge graph can be constructed in advance; based on this graph, the terminal determines a region of interest, and a target track within it, from the image to be identified according to the image features and the text features.
In practical application, the expert's line-of-sight attention track in the medical image can be inferred from the image features of the medical image and the text features of the image report, based on the image knowledge graph, and this inferred track is taken as the target track. The terminal can highlight the track and its neighborhood as the region of interest, and can present the inferred track through question answering or recommendation.
The method for constructing the image knowledge graph includes, but is not limited to, a bottom-up method, a top-down method, or a method of mixing the bottom-up method and the top-down method.
The image knowledge graph reasoning method comprises, but is not limited to, logic-based reasoning, graph-based reasoning and deep learning-based reasoning.
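As a non-authoritative illustration of the graph-based reasoning mentioned above, the following sketch stores entity embeddings and linked gaze tracks in a small in-memory graph and returns the track of the entity nearest to the fused query; all entity names, vectors, and track coordinates are invented for the example, and the fusion by averaging is an assumption.

```python
import numpy as np

# Hypothetical image knowledge graph: each entity carries an embedding and a
# linked line-of-sight track entity. Values are illustrative, not from the patent.
GRAPH = {
    "lung_nodule": {
        "embedding": np.array([1.0, 0.0, 0.0]),
        "trajectory": [(12, 40), (18, 44), (25, 47)],
    },
    "pleural_effusion": {
        "embedding": np.array([0.0, 1.0, 0.0]),
        "trajectory": [(60, 10), (62, 15), (66, 22)],
    },
}

def infer_trajectory(image_feat, text_feat, graph=GRAPH):
    """Fuse the query features, match the nearest graph entity by cosine
    similarity, and return that entity's linked line-of-sight track."""
    query = (image_feat + text_feat) / 2.0          # simple feature fusion
    query = query / np.linalg.norm(query)
    best, best_sim = None, -np.inf
    for name, node in graph.items():
        emb = node["embedding"] / np.linalg.norm(node["embedding"])
        sim = float(query @ emb)
        if sim > best_sim:
            best, best_sim = name, sim
    return best, graph[best]["trajectory"]

entity, track = infer_trajectory(np.array([0.9, 0.1, 0.0]),
                                 np.array([0.8, 0.2, 0.0]))
```

A deployed system would replace the dictionary with a graph database and the nearest-neighbor lookup with one of the logic-, graph-, or deep-learning-based reasoning methods listed above.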
Fig. 2 provides a flow chart of an image trajectory determination method. According to fig. 2, the image trajectory determination method may include the steps of:
Step S201, normalizing the image report (or clinical information) of a medical image intended for knowledge training of junior doctors with a standardized image-report template, and obtaining the text features of the image report (or clinical information) with a trained text encoder;
step S202, preprocessing the medical image, for example by traditional region segmentation (such as thresholding or random walk), to remove black edges or the scanning table from the medical image;
step S203, inputting the preprocessed medical image into a trained image encoder, which outputs the image features of the medical image, in order to obtain the line-of-sight attention region that the junior doctor should study;
step S204, obtaining, from the text features of the image report and the image features of the medical image, the knowledge reasoning corresponding to the medical image and the image report (or clinical information) with a pre-constructed knowledge-graph network;
step S205, highlighting the obtained line-of-sight attention region to remind the junior doctor to study it, and presenting the knowledge reasoning obtained in step S204 through intelligent question answering, a recommendation system, or the like, to prompt the junior doctor to train on the corresponding knowledge.
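The thresholding-based preprocessing of step S202 can be sketched as a simple crop that discards fully dark border rows and columns; the threshold value and the toy image are assumptions, and a real pipeline might use region growing or random walk instead.

```python
import numpy as np

def crop_black_edges(image, threshold=10):
    """Keep only the rows and columns that contain at least one pixel
    brighter than the threshold, removing black edges around the scan."""
    mask = image > threshold
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0:                      # fully black image: nothing to keep
        return image[:0, :0]
    return image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

img = np.zeros((6, 6), dtype=np.uint8)
img[2:4, 1:5] = 200                         # bright anatomy inside black edges
cropped = crop_black_edges(img)
```

The cropped image is what would then be passed to the trained image encoder in step S203.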
With the above image track determining method, the image to be identified and its corresponding text information are acquired, the image features and text features are identified, and the target track of the region of interest is determined from the image according to these features, based on a predetermined image knowledge graph. The method can thus determine, from any medical image and its image report and based on the knowledge graph, the line-of-sight attention track an expert follows when reading the medical image, so that junior doctors can learn from it; this enlarges the scope of imaging-knowledge learning and improves its flexibility.
In one embodiment, the step S120 may specifically include: inputting the image to be identified into a trained image encoder to obtain the image characteristics of the image to be identified; inputting the text information into a trained text encoder to obtain text characteristics of the text information.
The image encoder may be, but is not limited to, a Transformer (an attention-based neural network), an encoder based on a convolutional neural network, or an encoder combining a convolutional neural network with a Transformer.
The text encoder may be, but is not limited to, a Transformer-based encoder, an encoder based on a recurrent neural network, or an encoder combining a recurrent neural network with a Transformer.
In a specific implementation, the terminal may input the image to be identified to a trained image encoder, the trained image encoder outputs image features of the image to be identified, text information corresponding to the image to be identified is input to a trained text encoder, and the trained text encoder outputs text features of the text information.
In practical application, the medical image can be preprocessed by traditional region segmentation, removing black edges or the scanning table, and then input into the trained image encoder, which identifies the preprocessed image and outputs its image features; likewise, the image report can be normalized with a standardized image-report template and input into the trained text encoder, which identifies the normalized report and outputs its text features.
In this embodiment, the image to be identified is input to a trained image encoder to obtain its image features, and the text information is input to a trained text encoder to obtain its text features; the features of the region of interest of the image and of the text can thus be determined rapidly, which improves the efficiency of determining the line-of-sight attention track.
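The encoder interface of this embodiment can be illustrated with minimal stand-ins; the fixed random linear projections below replace the actual Transformer/CNN image encoder and Transformer/RNN text encoder, and the input and output dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearEncoder:
    """Toy encoder: flatten the input and project it to an L2-normalized
    feature vector. A stand-in for the trained image/text encoders."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def encode(self, x):
        feat = np.asarray(x).reshape(-1) @ self.W
        return feat / np.linalg.norm(feat)

# Assumed shapes: 16x16 image patch, 64-dim text representation, 32-dim features.
image_encoder = LinearEncoder(in_dim=16 * 16, out_dim=32)
text_encoder = LinearEncoder(in_dim=64, out_dim=32)

image_feat = image_encoder.encode(rng.random((16, 16)))
text_feat = text_encoder.encode(rng.random(64))
```

Both encoders map their modality into the same 32-dimensional space, which is what the contrastive training described later relies on.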
In one embodiment, as shown in fig. 3, before the step S110, the method may specifically further include:
step S101, acquiring an image sample, and a track sample and a text sample corresponding to the image sample;
step S102, determining sample fusion characteristics of an image sample according to a track sample, and determining text sample characteristics corresponding to the image sample according to a text sample;
step S103, training the encoder to be trained according to the sample fusion characteristics and the text sample characteristics to obtain a trained encoder; the trained encoders include a trained track encoder, a trained image encoder, and a trained text encoder.
The image samples may be training samples for the encoder network, for example a set of medical image samples. The track samples may be line-of-sight attention tracks in the image samples, for example the expert's line-of-sight track in each medical image sample. The text samples may be the text information corresponding to the image samples, for example the image report of each medical image sample.
The track encoder may be, but is not limited to, an encoder based on an LSTM (long short-term memory) network.
In a specific implementation, a set of medical images can be used as image samples. For each medical image sample, the line-of-sight attention track of an expert, radiologist, or pathologist during reading is collected with an eye tracker as the track sample, and the image report of each sample serves as the text sample. The track sample is input to the track encoder to be trained to obtain its track sample features; the track sample is registered with the image sample to obtain the line-of-sight attention region in the image sample, which serves as the region-of-interest sample and is input to the image encoder to be trained to obtain its sample image features; the track sample features of the line-of-sight track are then fused with the sample image features of the attention region to obtain the sample fusion features of the image sample. The text sample can be input to the text encoder to be trained to obtain the text sample features corresponding to the image sample. The track encoder, image encoder, and text encoder to be trained are trained according to the sample fusion features and the text sample features, finally yielding the trained track encoder, image encoder, and text encoder.
In this embodiment, an image sample and its corresponding track sample and text sample are acquired; sample fusion features of the image sample are determined from the track sample, and text sample features are determined from the text sample; and the encoders to be trained are trained according to these features to obtain the trained track encoder, image encoder, and text encoder. Training the three encoders in this way improves the efficiency of determining the line-of-sight attention track.
In one embodiment, the step S102 may specifically include: extracting track sample characteristics of a track sample; registering the track sample with the image sample to obtain a region-of-interest sample in the image sample, and extracting sample image features of the region-of-interest sample; and fusing the track sample characteristics and the sample image characteristics to obtain sample fusion characteristics of the image sample.
The track sample feature may be track feature information of a track of visual attention in the image sample. The sample image features may be image feature information of a line of sight region of interest in the image sample.
In a specific implementation, the terminal can input the track sample to the track encoder to be trained to obtain its track sample features; register the track sample with the image sample and extract the track sample and its neighborhood from the image sample as the region-of-interest sample; input the region-of-interest sample to the image encoder to be trained to obtain its sample image features; and fuse the track sample features with the sample image features to obtain the sample fusion features of the image sample.
Fig. 4 provides a flow diagram of a sample fusion feature acquisition process. According to fig. 4, the acquisition process of the sample fusion feature mainly comprises the following steps:
step S401, collecting an expert's line-of-sight attention track with an eye tracker as the track sample;
step S402, obtaining a medical image sample as the image sample and performing traditional region segmentation on it (for example thresholding or random walk) to remove black edges or the scanning table from the medical image;
step S403, inputting the track sample of step S401 into the track encoder to be trained to obtain the feature expressions P1, P2, …, Pn of the track samples, i.e. the track sample features;
step S404, registering the track sample with the preprocessed image sample, detecting the line-of-sight attention track in the registered image, segmenting the detected track and its neighborhood to obtain the region-of-interest sample, and inputting the region-of-interest sample into the image encoder to be trained to obtain the image feature expressions I1, I2, …, In of the region-of-interest samples, i.e. the sample image features;
step S405, fusing the track sample features Pi (1 ≤ i ≤ n) of the image samples with the sample image features Ij (1 ≤ j ≤ n) to obtain the sample fusion features Rk (1 ≤ k ≤ n).
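Step S405's fusion can be sketched as follows, assuming concatenation of each paired trajectory feature and ROI image feature as the fusion operator; the patent does not fix the operator, so this choice (and the toy feature values) are assumptions.

```python
import numpy as np

def fuse(track_feats, roi_feats):
    """Concatenate each track feature Pi with its paired region-of-interest
    image feature Ii to form the sample fusion features Rk."""
    assert len(track_feats) == len(roi_feats), "samples must be paired 1:1"
    return [np.concatenate([p, i]) for p, i in zip(track_feats, roi_feats)]

# Toy features: 4-dim track features P1..P3 and 8-dim ROI features I1..I3.
P = [np.ones(4) * k for k in range(3)]
I = [np.zeros(8) + k for k in range(3)]
R = fuse(P, I)
```

Alternatives such as element-wise addition or a learned fusion layer would fit the same interface.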
In this embodiment, the track sample features of the track sample are extracted; the track sample is registered with the image sample to obtain the region-of-interest sample, whose sample image features are extracted; and the track sample features and sample image features are fused into the sample fusion features of the image sample. Fusing the track features of the line-of-sight attention track with the image features of the line-of-sight attention region, and training the encoder network on the resulting fusion features, improves the efficiency of determining the image track.
In one embodiment, the step S103 may specifically include: pairing the sample fusion characteristics and the text sample characteristics to obtain a positive training sample and a negative training sample; and training the track encoder to be trained, the image encoder to be trained and the text encoder to be trained according to the positive training sample and the negative training sample to obtain a trained track encoder, a trained image encoder and a trained text encoder.
The positive training samples may be sample fusion features and text sample features corresponding to the same image sample. The negative training samples may be sample fusion features and text sample features corresponding to different image samples.
In a specific implementation, the terminal can pair the sample fusion features of the image samples with the text sample features of the text samples, taking the sample fusion features and text sample features corresponding to the same image sample as positive training samples and those corresponding to different image samples as negative training samples. The terminal then trains the track encoder to be trained, the image encoder to be trained, and the text encoder to be trained on the positive and negative training samples, gradually decreasing the distance between the sample fusion feature and the text sample feature in each positive training sample and gradually increasing that distance in each negative training sample, continuously adjusting the parameters of the three encoders until the trained track encoder, the trained image encoder, and the trained text encoder are obtained.
After the sample fusion feature set and the text sample feature set are obtained, the sample fusion features and text sample features in the two sets that correspond to the same image sample can be paired to obtain the positive training samples, and the remaining sample fusion features in the sample fusion feature set can then be paired with the remaining text sample features in the text sample feature set to obtain the negative training samples.
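The pairing described above can be sketched as follows; the index-aligned lists and string stand-ins are assumptions made purely for illustration, where in the method the elements would be the fusion features R_k and the text features T_p:

```python
def build_training_pairs(fusion_feats, text_feats):
    """Pair sample fusion features with text sample features.
    Features at the same index correspond to the same image sample
    (positive pairs); cross-index pairs are negative pairs."""
    positives, negatives = [], []
    for k, r in enumerate(fusion_feats):
        for p, t in enumerate(text_feats):
            (positives if k == p else negatives).append((r, t))
    return positives, negatives

fusion_feats = ["R1", "R2", "R3"]   # stand-ins for sample fusion features
text_feats = ["T1", "T2", "T3"]     # stand-ins for text sample features
pos, neg = build_training_pairs(fusion_feats, text_feats)
```

For n image samples this yields n positive pairs and n·(n-1) negative pairs, matching the matched/unmatched split described in the text.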
In the embodiment, a positive training sample and a negative training sample are obtained by matching sample fusion characteristics with text sample characteristics; and training the track encoder to be trained, the image encoder to be trained and the text encoder to be trained according to the positive training sample and the negative training sample to obtain a trained track encoder, a trained image encoder and a trained text encoder, so that the track encoder, the image encoder and the text encoder can be obtained through training, and the efficiency of determining the sight line focus track is improved.
In one embodiment, the step of training the track encoder to be trained, the image encoder to be trained, and the text encoder to be trained according to the positive training sample and the negative training sample to obtain a trained track encoder, a trained image encoder, and a trained text encoder may specifically include: performing contrast learning according to the positive training sample and the negative training sample to obtain a contrast learning loss value; and according to the contrast learning loss value, carrying out parameter adjustment on the track encoder to be trained, the image encoder to be trained and the text encoder to be trained to obtain a trained track encoder, a trained image encoder and a trained text encoder.
In a specific implementation, the terminal can perform contrast learning with the positive and negative training samples, adjusting the parameters of the track encoder, image encoder, and text encoder to be trained so that the distance between the sample fusion feature and the text sample feature in each positive training sample gradually decreases and the distance in each negative training sample gradually increases. The terminal can also compare the contrast learning loss value of the current training iteration with that of the previous iteration: if the contrast learning loss value has converged, the current encoder parameters are used to determine the trained track encoder, the trained image encoder, and the trained text encoder; otherwise, the encoder parameters continue to be adjusted until the contrast learning loss value converges.
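The contrast learning loss described here is in the spirit of a symmetric InfoNCE (CLIP-style) objective; the sketch below is one plausible reading, with the temperature value and the NumPy implementation chosen for illustration rather than taken from the patent:

```python
import numpy as np

def contrastive_loss(fusion_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch: row i of `fusion_feats` and
    row i of `text_feats` come from the same image sample, so positives
    lie on the diagonal of the similarity matrix."""
    r = fusion_feats / np.linalg.norm(fusion_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = r @ t.T / temperature          # (n, n) cosine similarities
    labels = np.arange(len(r))

    def xent(lg):
        # numerically stable softmax cross-entropy with diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the fusion-to-text and text-to-fusion directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
t_feats = rng.standard_normal((4, 8))
aligned = contrastive_loss(t_feats, t_feats)                     # matched pairs
shuffled = contrastive_loss(t_feats, np.roll(t_feats, 1, axis=0))  # mismatched
```

Minimizing this loss pulls each matched pair together and pushes unmatched pairs apart, which is the behavior whose convergence the training loop checks.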
Fig. 5 provides a flow diagram of an encoder training process. According to fig. 5, the encoder training process may include the steps of:
step S501, acquiring a large number of unlabeled medical images, and acquiring image reports in a PACS (Picture Archiving and Communication System) or clinical texts in an RIS (Radiology Information System), wherein the image reports or clinical texts are used as text samples;
Step S502, converting the image report or the clinical text into standardized template text information;
step S503, inputting the standardized template text information into a text encoder to be trained to obtain the text sample features T_1, T_2, …, T_n;
Step S504, pairing the sample fusion features R_k (1 ≤ k ≤ n) of the plurality of image samples with the text sample features T_p (1 ≤ p ≤ n) to form a plurality of feature pairs (R_k, T_p) (1 ≤ k ≤ n, 1 ≤ p ≤ n), wherein matched feature pairs (corresponding to the same image sample) are positive samples and unmatched feature pairs (corresponding to different image samples) are negative samples;
step S505, performing contrast learning with the positive and negative samples and performing model training with a contrast learning loss function (contrastive loss), so that the distance between R_k and T_p in a positive sample pair becomes small and the distance between R_k and T_p in a negative sample pair becomes large. The track encoder to be trained and the image encoder to be trained in fig. 4 and the text encoder to be trained in fig. 5 are trained in this way, optimizing the encoder network. The trained track encoder, image encoder, and text encoder each have their own weights; these weights are fixed so that a knowledge graph can be constructed from the trained track encoder, the trained image encoder, and the trained text encoder.
In this embodiment, a contrast learning loss value is obtained by performing contrast learning on the positive and negative training samples; the parameters of the track encoder to be trained, the image encoder to be trained, and the text encoder to be trained are adjusted according to the contrast learning loss value to obtain the trained track encoder, the trained image encoder, and the trained text encoder. The three encoders can thus be obtained by training, improving the efficiency of determining the sight line focus track.
In one embodiment, after the step S103, the method may specifically further include: determining a text feature entity corresponding to the region feature entity according to the trained encoder; extracting the relation between the regional characteristic entity and the text characteristic entity to obtain the entity relation between the regional characteristic entity and the text characteristic entity; and determining an image knowledge graph according to the regional characteristic entity, the text characteristic entity and the entity relationship.
The region feature entity may include an entity formed by the region of interest track feature and an entity formed by the region of interest image feature. The text feature entity may be an entity formed by text features. The entity relationship may be a relationship between a region feature entity and a text feature entity.
In a specific implementation, the terminal can register the image to be identified with the sight line focus track in the image to be identified to obtain a region of interest in the image to be identified; input the region of interest into the trained image encoder to obtain image features; input the sight line focus track in the image to be identified into the trained track encoder to obtain track features; and take the image features and the track features as region feature entities. The terminal can likewise input the text corresponding to the image to be identified into the trained text encoder and take the resulting text features as text feature entities. Relation extraction is then performed on the region feature entities and text feature entities of a plurality of images to be identified to obtain the entity relations between region feature entities and text feature entities, forming a relation network between the two kinds of entities across multiple categories. Finally, the relation network is given a knowledge graph representation: the region feature entities and text feature entities are represented as nodes of the knowledge graph and the entity relations as its edges, forming the knowledge graph network.
Fig. 6 provides a schematic flow chart of an image knowledge graph construction process. According to fig. 6, the image knowledge graph construction process may include the steps of:
Step S601, acquiring track characteristics of a line-of-sight focus track in an image to be identified based on a trained track encoder, acquiring image characteristics of the image to be identified based on a trained image encoder, acquiring text characteristics of a text corresponding to the image to be identified based on a trained text encoder, taking the track characteristics and the image characteristics as regional characteristic entities, and taking the text characteristics as text characteristic entities;
step S602, performing relation extraction on the track features and the image features of the plurality of images to be identified and the text features of the corresponding texts to form a relation network between the region feature entities and the text feature entities, for example, forming a relation network between multi-category track features, image features and text features;
in step S603, knowledge feature representation is performed on the obtained relationship network, the entity is represented as a node of the knowledge graph, the relationship is represented as an edge of the knowledge graph, and the knowledge graph network is formed, for example, the track feature, the image feature and the text feature are represented as nodes, and the relationship among the track feature, the image feature and the text feature is represented as an edge, so that the knowledge graph network is formed.
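The graph assembly of steps S601 to S603 amounts to collecting entities as nodes and extracted relations as edges. A minimal sketch follows, in which the entity names and the `described_by` relation are hypothetical examples invented for illustration, not terms from the patent:

```python
def build_knowledge_graph(entities, relations):
    """Represent region feature entities and text feature entities as
    nodes, and their extracted relations as (head, relation, tail)
    edges, of a knowledge graph network."""
    graph = {"nodes": set(entities), "edges": set()}
    for head, rel, tail in relations:
        graph["nodes"].update([head, tail])
        graph["edges"].add((head, rel, tail))
    return graph

# hypothetical entities produced by the trained encoders
region_entities = ["track:lung_nodule", "image:lung_nodule_roi"]
text_entities = ["text:solid_nodule_report"]
relations = [
    ("track:lung_nodule", "described_by", "text:solid_nodule_report"),
    ("image:lung_nodule_roi", "described_by", "text:solid_nodule_report"),
]
kg = build_knowledge_graph(region_entities + text_entities, relations)
```

A production system would more likely use a graph library or graph database, but the node/edge split shown here mirrors the representation described in step S603.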
In this embodiment, the region feature entities and the text feature entities corresponding to them are determined according to the trained encoders; relation extraction is performed on the region feature entities and text feature entities to obtain the entity relations between them; and the image knowledge graph is determined from the region feature entities, the text feature entities, and the entity relations. The knowledge graph is thus constructed, the target track can be determined through it, and the efficiency of track determination is improved.
In order to facilitate a thorough understanding of embodiments of the present application by those skilled in the art, the following description will be provided in connection with a specific example.
In order to help doctors accumulate experience and quickly learn the expert knowledge embodied in medical images (such as CT, MR, or pathology images), the application constructs a medical knowledge graph network by analyzing the eye movement tracks of experts as they read images and write reports, and designs a cross-modal image and text learning method based on eye movement tracking, which comprises the following specific steps:
step 10, constructing a learning network based on the eye movement tracking region of interest characteristics, which specifically comprises the following steps:
step 11, performing eye vision tracking based on an eye tracker, and acquiring a sight line attention track when an expert (including but not limited to an imaging doctor, a pathology doctor and the like) reads a film as a track sample;
step 12, acquiring a medical image as an image sample, performing conventional region segmentation (for example, a thresholding method, a random walk method, etc.) on the image sample, and removing black edges or a scanning table in the image sample;
step 13, inputting the track samples into a track encoder to be trained to obtain the track sample features P_1, P_2, …, P_n;
Step 14, registering the track sample with the preprocessed image sample, and acquiring a detection or segmentation area of the sight line focus track neighborhood from the registered image sample;
Step 15, taking the detection or segmentation region of the sight line focus track neighborhood as a region-of-interest sample, and inputting the region-of-interest sample into an image encoder to be trained to obtain the sample image features I_1, I_2, …, I_n of the region-of-interest samples;
Step 16, fusing the track sample features P_1, P_2, …, P_n with the sample image features I_1, I_2, …, I_n of the region-of-interest samples, and inputting the fused features into a corresponding network to obtain the sample fusion features R_1, R_2, …, R_n, thereby constructing a learning network based on eye-tracking region-of-interest features.
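Steps 14 and 15 register the gaze track with the image and take a neighborhood of the track as the region-of-interest sample. A minimal cropping sketch is given below, assuming 2-D pixel coordinates and a rectangular neighborhood margin; both are assumptions, since the patent does not fix the shape of the neighborhood:

```python
import numpy as np

def crop_gaze_roi(image, gaze_points, radius):
    """Crop a patch around the bounding box of registered gaze points,
    expanded by `radius` pixels and clipped to the image bounds, as the
    region-of-interest sample."""
    ys = [p[0] for p in gaze_points]
    xs = [p[1] for p in gaze_points]
    y0 = max(min(ys) - radius, 0)
    y1 = min(max(ys) + radius + 1, image.shape[0])
    x0 = max(min(xs) - radius, 0)
    x1 = min(max(xs) + radius + 1, image.shape[1])
    return image[y0:y1, x0:x1]

image = np.arange(100).reshape(10, 10)   # toy stand-in for a registered image
gaze = [(4, 4), (5, 6)]                  # registered track coordinates (row, col)
roi = crop_gaze_roi(image, gaze, radius=1)
```

A detection or segmentation model, as the text suggests, could replace the simple bounding-box expansion with a learned delineation of the neighborhood.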
Step 20, constructing a joint learning network based on cross-modal images, texts and eye movement tracking regions of interest, which specifically comprises the following steps:
step 21, obtaining a large number of unlabeled original medical images as image samples, and PACS image reports or clinical texts in RIS as text samples;
step 22, converting the text sample into standardized template text;
step 23, inputting the standardized template text into a text encoder to be trained to obtain the text sample features T_1, T_2, …, T_n;
Step 24, pairing the sample fusion features R_1, R_2, …, R_n of the plurality of image samples with the text sample features T_1, T_2, …, T_n to form a plurality of feature pairs (R_k, T_p) (1 ≤ k ≤ n, 1 ≤ p ≤ n), wherein matched feature pairs are positive samples and unmatched feature pairs are negative samples;
Step 25, performing contrast learning by using the positive sample and the negative sample obtained in the step 24, and performing model training according to a contrast learning loss function;
step 26, training a track encoder to be trained, an image encoder to be trained and a text encoder to be trained to optimize a network;
step 27, fixing weights of the trained track encoder, the trained image encoder and the trained text encoder.
Step 30, constructing a knowledge graph network, which specifically comprises the following steps:
step 31, acquiring the feature entities of the region of interest (including track and image) and of the text based on the trained track encoder, the trained image encoder, and the trained text encoder obtained in step 27;
step 32, performing relation extraction on the feature entities of the region of interest and of the text across multiple samples to form a relation network among the multi-category track features, image features, and text features;
and 33, carrying out knowledge characteristic representation on the relationship network obtained in the step 32, and respectively representing the entity and the relationship as nodes and edges of the knowledge graph to form a knowledge graph network.
Step 40, performing doctor training education based on the constructed knowledge graph network, which specifically comprises the following steps:
Step 41, inputting corresponding clinical information which is expected to carry out knowledge training on a primary doctor into a standardized image report template, and obtaining text characteristics of standardized clinical information by adopting a trained text encoder;
step 42, preprocessing corresponding image information for which knowledge training is expected to be performed on the primary doctor;
step 43, inputting the preprocessed images into a trained track encoder and a trained image encoder to acquire a sight line attention area which needs to be learned by doctors;
step 44, reasoning and deducing new knowledge corresponding to the clinical information and images according to the knowledge graph network constructed in step 33, using the text features obtained in step 41 and the sight line focus region obtained in step 43;
and step 45, displaying the sight line focus region obtained in step 43 in a highlighted manner to remind the primary doctor to study it, and presenting the new knowledge reasoned and deduced in step 44 by means of intelligent question answering, a recommendation system, or the like, prompting the primary doctor to train on the corresponding knowledge so as to acquire it.
According to the above eye-movement-tracking-based cross-modal image and text learning method, a learning network based on eye-tracking region-of-interest features and a joint learning network over cross-modal images, texts, and eye-tracking regions of interest are constructed, and a knowledge graph is then built from the constructed learning networks. The method can be used to train primary doctors in the knowledge of imaging doctors, pathologists, and the like, constructing a medical knowledge graph network, saving education and training resources, and rapidly improving doctors' practical experience.
In one embodiment, as shown in fig. 7, there is provided an image track determining method, which is described by taking a terminal as an example, and includes the following steps:
step S701, acquiring an image sample, and a track sample and a text sample corresponding to the image sample;
step S702, determining sample fusion characteristics of an image sample according to a track sample, and determining text sample characteristics corresponding to the image sample according to a text sample;
step S703, training the encoder to be trained according to the sample fusion characteristics and the text sample characteristics to obtain a trained encoder; the trained encoder includes a trained track encoder, a trained image encoder, and a trained text encoder;
step S704, determining a region feature entity and a text feature entity corresponding to the region feature entity according to the trained encoder;
step S705, extracting the relation between the regional characteristic entity and the text characteristic entity to obtain the entity relation between the regional characteristic entity and the text characteristic entity;
step S706, determining an image knowledge graph according to the regional characteristic entity, the text characteristic entity and the entity relationship;
step S707, obtaining an image to be identified and text information corresponding to the image to be identified;
Step S708, inputting the image to be recognized into a trained image encoder to obtain the image characteristics of the image to be recognized, and inputting the text information into a trained text encoder to obtain the text characteristics of the text information;
step S709, determining a target track of the region of interest from the image to be identified according to the image features and the text features based on the predetermined image knowledge graph.
In a specific implementation, the terminal can acquire an image sample together with its corresponding track sample and text sample; input the track sample into a track encoder to be trained to obtain the track sample features it outputs; register the track sample with the image sample and extract the region-of-interest sample in the image sample; input the region-of-interest sample into an image encoder to be trained to obtain the sample image features it outputs; and fuse the sample image features with the track sample features to obtain the sample fusion features. The text sample is input into a text encoder to be trained to obtain the text sample features it outputs, and the sample fusion features are paired with the text sample features to obtain matched positive training samples and unmatched negative training samples. The track encoder, image encoder, and text encoder to be trained are trained with the positive and negative training samples to obtain the trained track encoder, image encoder, and text encoder; the region feature entities and text feature entities are generated from the trained encoders; the entity relations between the region feature entities and text feature entities are determined; and the image knowledge graph is thereby constructed. The image to be identified is then input into the trained image encoder to obtain its image features, the text information corresponding to the image to be identified is input into the trained text encoder to obtain the text features of the text information, and the target track corresponding to the image features and text features is inferred based on the image knowledge graph as the sight line focus track to be learned; the target track and its neighborhood constitute the region of interest.
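The graph-based inference of step S709 is not spelled out in detail; one plausible sketch is a nearest-neighbor lookup over knowledge-graph nodes by cosine similarity of the encoded query, followed by an edge traversal to a stored expert track. The node names, stored tracks, and the feature-averaging scheme below are all illustrative assumptions:

```python
import numpy as np

def infer_target_track(image_feat, text_feat, node_feats, track_edges):
    """Average the query's image and text features, find the most
    similar graph node by cosine similarity, and follow its edge to
    the stored target track."""
    query = (image_feat + text_feat) / 2.0
    query = query / np.linalg.norm(query)
    best, best_sim = None, -np.inf
    for node, feat in node_feats.items():
        sim = float(query @ (feat / np.linalg.norm(feat)))
        if sim > best_sim:
            best, best_sim = node, sim
    return track_edges[best]

# hypothetical graph: two nodes, each linked to an expert gaze track
node_feats = {
    "nodule_case": np.array([1.0, 0.0, 0.0]),
    "fracture_case": np.array([0.0, 1.0, 0.0]),
}
track_edges = {
    "nodule_case": [(12, 30), (14, 31)],
    "fracture_case": [(40, 8), (42, 9)],
}
track = infer_target_track(np.array([0.9, 0.1, 0.0]),
                           np.array([0.8, 0.2, 0.0]),
                           node_feats, track_edges)
```

More elaborate graph reasoning (multi-hop traversal, relation-aware scoring) could replace the single nearest-neighbor step while keeping the same query interface.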
According to the image track determining method, through obtaining an image sample, a track sample and a text sample corresponding to the image sample, determining sample fusion characteristics of the image sample according to the track sample, determining text sample characteristics corresponding to the image sample according to the text sample, training an encoder to be trained according to the sample fusion characteristics and the text sample characteristics to obtain a trained encoder, determining regional characteristic entities and text characteristic entities corresponding to the regional characteristic entities according to the trained encoder, extracting the regional characteristic entities and the text characteristic entities to obtain entity relations between the regional characteristic entities and the text characteristic entities, determining an image knowledge graph according to the regional characteristic entities, the text characteristic entities and the entity relations, obtaining text information corresponding to an image to be identified and the image to be identified, inputting the image to be identified into the trained image encoder to obtain image characteristics of the image to be identified, inputting the text information into the trained text encoder to obtain text characteristics of the text information, determining an interested region from the image to be identified according to the image characteristics and the text characteristics based on the predetermined image knowledge graph, and further obtaining a target track; the method can determine the sight line attention track of an expert when the expert reads the medical image based on the knowledge graph according to any medical image and the image report of the medical image, so that the primary doctor can learn, the learning range of the imaging knowledge is enlarged, and the flexibility of the imaging knowledge learning is improved.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an image track determining device for realizing the above related image track determining method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the image track determining apparatus provided in the following may be referred to the limitation of the image track determining method hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 8, there is provided an image trajectory determining device including: an information acquisition module 810, a feature recognition module 820, and a trajectory determination module 830, wherein:
the information acquisition module 810 is configured to acquire an image to be identified and text information corresponding to the image to be identified;
a feature recognition module 820 for recognizing image features of the image to be recognized and text features of the text information;
the track determining module 830 is configured to determine, based on a predetermined image knowledge graph, a target track of the region of interest from the image to be identified according to the image feature and the text feature.
In one embodiment, the feature recognition module 820 is further configured to input the image to be recognized to a trained image encoder to obtain the image feature of the image to be recognized; and inputting the text information into a trained text encoder to obtain the text characteristics of the text information.
In one embodiment, the image track determining device further includes:
the sample acquisition module is used for acquiring an image sample, and a track sample and a text sample corresponding to the image sample;
The sample feature module is used for determining sample fusion features of the image samples according to the track samples and determining text sample features corresponding to the image samples according to the text samples;
the model training module is used for training the encoder to be trained according to the sample fusion characteristics and the text sample characteristics to obtain a trained encoder; the trained encoder includes a trained track encoder, the trained image encoder, and the trained text encoder.
In one embodiment, the sample feature module is further configured to extract a track sample feature of the track sample; registering the track sample with the image sample to obtain a region-of-interest sample in the image sample, and extracting sample image features of the region-of-interest sample; and fusing the track sample characteristics and the sample image characteristics to obtain the sample fusion characteristics of the image sample.
In one embodiment, the model training module is further configured to pair the sample fusion feature and the text sample feature to obtain a positive training sample and a negative training sample; and training a track encoder to be trained, an image encoder to be trained and a text encoder to be trained according to the positive training sample and the negative training sample to obtain the trained track encoder, the trained image encoder and the trained text encoder.
In one embodiment, the model training module is further configured to determine a contrast learning loss value for performing contrast learning according to the positive training sample and the negative training sample; and according to the contrast learning loss value, carrying out parameter adjustment on the track encoder to be trained, the image encoder to be trained and the text encoder to be trained to obtain the trained track encoder, the trained image encoder and the trained text encoder.
In one embodiment, the image track determining device further includes:
the entity determining module is used for determining a region feature entity and a text feature entity corresponding to the region feature entity according to the trained encoder;
the relation extraction module is used for extracting the relation between the regional characteristic entity and the text characteristic entity to obtain the entity relation between the regional characteristic entity and the text characteristic entity;
and the map construction module is used for determining the image knowledge map according to the regional characteristic entity, the text characteristic entity and the entity relationship.
The respective modules in the above-described image trajectory determination device may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an image trajectory determination method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory storing a computer program; the processor implements the steps of the above method embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may include the flows of the above method embodiments. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase-change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data-processing logic devices based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The foregoing embodiments illustrate only a few implementations of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of image trajectory determination, the method comprising:
acquiring an image to be identified and text information corresponding to the image to be identified;
identifying image features of the image to be identified and text features of the text information;
and determining a target track of the region of interest from the image to be identified according to the image features and the text features based on a predetermined image knowledge graph.
2. The method of claim 1, wherein the identifying the image features of the image to be identified and the text features of the text information comprises:
inputting the image to be identified into a trained image encoder to obtain the image characteristics of the image to be identified;
and inputting the text information into a trained text encoder to obtain the text characteristics of the text information.
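The two-branch encoding of claims 1–2 can be sketched as follows. The linear projections standing in for the trained image encoder and trained text encoder, and all dimensions, are illustrative assumptions, not the patented models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained encoders: each maps its input
# into a shared D_FEAT-dimensional feature space.
D_IMG, D_TXT, D_FEAT = 64, 32, 16
W_image = rng.normal(size=(D_IMG, D_FEAT))  # "trained image encoder"
W_text = rng.normal(size=(D_TXT, D_FEAT))   # "trained text encoder"

def encode_image(image_vec):
    """Image branch: image to be identified -> image features."""
    return image_vec @ W_image

def encode_text(text_vec):
    """Text branch: text information -> text features."""
    return text_vec @ W_text

image = rng.normal(size=D_IMG)  # flattened image to be identified
text = rng.normal(size=D_TXT)   # embedded text information

image_features = encode_image(image)
text_features = encode_text(text)
assert image_features.shape == text_features.shape == (D_FEAT,)
```

Both branches project into the same feature space so that the downstream matching against the image knowledge graph can compare image and text features directly.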
3. The method according to claim 2, further comprising, before acquiring the image to be identified and the text information corresponding to the image to be identified:
acquiring an image sample, and a track sample and a text sample corresponding to the image sample;
determining sample fusion characteristics of the image sample according to the track sample, and determining text sample characteristics corresponding to the image sample according to the text sample;
training the encoder to be trained according to the sample fusion characteristics and the text sample characteristics to obtain a trained encoder; the trained encoder includes a trained track encoder, the trained image encoder, and the trained text encoder.
4. A method according to claim 3, wherein said determining a sample fusion feature of the image sample from the track sample comprises:
extracting track sample characteristics of the track sample;
registering the track sample with the image sample to obtain a region-of-interest sample in the image sample, and extracting sample image features of the region-of-interest sample;
and fusing the track sample characteristics and the sample image characteristics to obtain the sample fusion characteristics of the image sample.
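The fusion of claim 4 can be sketched as below. The claim fixes neither the registration method nor the fusion operator, so bounding-box cropping and feature concatenation are assumptions chosen for brevity:

```python
import numpy as np

def extract_roi(image, track_points):
    """Register the track to the image sample: here, simply crop the
    bounding box of the track points (a stand-in for true registration)."""
    ys, xs = zip(*track_points)
    return image[min(ys):max(ys) + 1, min(xs):max(xs) + 1]

def fuse(track_features, roi_features):
    """Fuse track features and ROI image features; concatenation is one
    simple choice of fusion operator."""
    return np.concatenate([track_features, roi_features])

image = np.arange(100, dtype=float).reshape(10, 10)  # toy image sample
track = [(2, 3), (4, 5), (3, 7)]                     # toy track sample (row, col)

roi = extract_roi(image, track)                      # region-of-interest sample
track_feat = np.array([len(track)], dtype=float)     # toy track sample feature
roi_feat = np.array([roi.mean(), roi.std()])         # toy sample image features
fused = fuse(track_feat, roi_feat)                   # sample fusion feature
assert fused.shape == (3,)
```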
5. The method of claim 4, wherein training the encoder to be trained based on the sample fusion feature and the text sample feature results in a trained encoder, comprising:
pairing the sample fusion characteristics and the text sample characteristics to obtain a positive training sample and a negative training sample;
and training a track encoder to be trained, an image encoder to be trained and a text encoder to be trained according to the positive training sample and the negative training sample to obtain the trained track encoder, the trained image encoder and the trained text encoder.
6. The method of claim 5, wherein training the trajectory encoder to be trained, the image encoder to be trained, and the text encoder to be trained based on the positive training samples and the negative training samples to obtain the trained trajectory encoder, the trained image encoder, and the trained text encoder comprises:
performing contrast learning according to the positive training sample and the negative training sample to obtain a contrast learning loss value;
and according to the contrast learning loss value, carrying out parameter adjustment on the track encoder to be trained, the image encoder to be trained and the text encoder to be trained to obtain the trained track encoder, the trained image encoder and the trained text encoder.
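The contrast-learning loss of claim 6 is commonly realized as a CLIP-style InfoNCE loss over paired fusion and text features; a minimal NumPy sketch follows, where the temperature value and the symmetric two-direction form are assumptions, not details fixed by the claims:

```python
import numpy as np

def contrastive_loss(fusion_feats, text_feats, temperature=0.07):
    """InfoNCE-style loss: matched (fusion, text) rows are positive
    training samples; all other pairings in the batch are negatives."""
    # L2-normalise both sides so similarity is a cosine score.
    f = fusion_feats / np.linalg.norm(fusion_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = f @ t.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))      # positives lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # Symmetric: fusion -> text and text -> fusion directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
fusion = rng.normal(size=(4, 8))  # batch of sample fusion features
text = rng.normal(size=(4, 8))    # batch of text sample features
loss = contrastive_loss(fusion, text)
assert loss > 0
```

The loss value would then drive parameter updates of the track, image, and text encoders, as the claim describes.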
7. A method according to claim 3, further comprising, after training the encoder to be trained based on the sample fusion feature and the text sample feature to obtain a trained encoder:
determining a region feature entity and a text feature entity corresponding to the region feature entity according to the trained encoder;
extracting the relation between the regional characteristic entity and the text characteristic entity to obtain an entity relationship between the regional characteristic entity and the text characteristic entity;
and determining the image knowledge graph according to the regional characteristic entity, the text characteristic entity and the entity relationship.
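The image knowledge graph of claim 7 can be held as (entity, relation, entity) triples linking regional feature entities to text feature entities. In this sketch the entity names and the single "described_by" relation are illustrative; the claims do not specify a storage format:

```python
# Minimal image knowledge graph as (head, relation, tail) triples.
# Entity names and the "described_by" relation are illustrative only.
triples = [
    ("roi:lesion_region", "described_by", "text:lesion_report"),
    ("roi:organ_region", "described_by", "text:organ_report"),
]

# Adjacency index: regional feature entity -> [(relation, text entity)].
graph = {}
for head, rel, tail in triples:
    graph.setdefault(head, []).append((rel, tail))

def neighbors(entity):
    """Return the (relation, entity) pairs linked to a region entity."""
    return graph.get(entity, [])

assert neighbors("roi:lesion_region") == [("described_by", "text:lesion_report")]
```

At inference time, the recognised image and text features would be matched against these entities to locate the target track of the region of interest.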
8. An image trajectory determining device, characterized in that the device comprises:
the information acquisition module is used for acquiring an image to be identified and text information corresponding to the image to be identified;
the feature recognition module is used for recognizing the image features of the image to be recognized and the text features of the text information;
and the track determining module is used for determining the target track of the region of interest from the image to be identified according to the image characteristics and the text characteristics based on a predetermined image knowledge graph.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202311000300.5A 2023-08-09 2023-08-09 Image track determining method, device, computer equipment and storage medium Pending CN117079291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311000300.5A CN117079291A (en) 2023-08-09 2023-08-09 Image track determining method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311000300.5A CN117079291A (en) 2023-08-09 2023-08-09 Image track determining method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117079291A true CN117079291A (en) 2023-11-17

Family

ID=88707191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311000300.5A Pending CN117079291A (en) 2023-08-09 2023-08-09 Image track determining method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117079291A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351196A (en) * 2023-12-04 2024-01-05 北京联影智能影像技术研究院 Image segmentation method, device, computer equipment and storage medium
CN117351196B (en) * 2023-12-04 2024-02-20 北京联影智能影像技术研究院 Image segmentation method, device, computer equipment and storage medium
CN118299008A (en) * 2024-06-04 2024-07-05 青岛美迪康数字工程有限公司 Medical image report grading method, device and equipment

Similar Documents

Publication Publication Date Title
Maier-Hein et al. Metrics reloaded: recommendations for image analysis validation
WO2021051965A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program
CN117079291A (en) Image track determining method, device, computer equipment and storage medium
CN111325714B (en) Method for processing region of interest, computer device and readable storage medium
CN115830017B (en) Tumor detection system, method, equipment and medium based on image-text multi-mode fusion
Cui et al. Artificial intelligence in spinal imaging: current status and future directions
CN111598899A (en) Image processing method, image processing apparatus, and computer-readable storage medium
Galati et al. From accuracy to reliability and robustness in cardiac magnetic resonance image segmentation: a review
Syarif et al. UNAS-Net: A deep convolutional neural network for predicting Covid-19 severity
Iqbal et al. AD-CAM: Enhancing interpretability of convolutional neural networks with a lightweight framework-from black box to glass box
Ahmed et al. Explainable-ai in automated medical report generation using chest x-ray images
Olescki et al. A two step workflow for pulmonary embolism detection using deep learning and feature extraction
Bafti et al. A crowdsourcing semi-automatic image segmentation platform for cell biology
CN118115507A (en) Image segmentation method based on cross-domain class perception graph convolution alignment
Lou et al. Predicting radiologists' gaze with computational saliency models in mammogram reading
CN114283406A (en) Cell image recognition method, device, equipment, medium and computer program product
Khan et al. Adaptive deep clustering network for retinal blood vessel and foveal avascular zone segmentation
CN116958693A (en) Image analysis method, apparatus, device, storage medium, and program product
Amador-Domínguez et al. A case-based reasoning model powered by deep learning for radiology report recommendation
Arias-Londoño et al. Analysis of the Clever Hans effect in COVID-19 detection using Chest X-Ray images and Bayesian Deep Learning
Feng et al. Trusted multi-scale classification framework for whole slide image
Li et al. Semi-automatic multiparametric MR imaging classification using novel image input sequences and 3D convolutional neural networks
Nikiforaki et al. Image Quality Assessment Tool for Conventional and Dynamic Magnetic Resonance Imaging Acquisitions
Zeng et al. Recognition of rare antinuclear antibody patterns based on a novel attention-based enhancement framework
CN115700826A (en) Receipt processing method, receipt display method, receipt processing device, receipt display device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination