CN113158822B - Method and device for classifying eye detection data based on cross-modal relation reasoning - Google Patents

Method and device for classifying eye detection data based on cross-modal relation reasoning Download PDF

Info

Publication number
CN113158822B
CN113158822B (application CN202110336212.7A)
Authority
CN
China
Prior art keywords
data
optic disc
features
feature
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110336212.7A
Other languages
Chinese (zh)
Other versions
CN113158822A (en)
Inventor
乔宇
张秀兰
宋迪屏
李飞
熊健
何军军
付彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Zhongshan Ophthalmic Center
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Zhongshan Ophthalmic Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS, Zhongshan Ophthalmic Center filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110336212.7A priority Critical patent/CN113158822B/en
Publication of CN113158822A publication Critical patent/CN113158822A/en
Priority to PCT/CN2021/117444 priority patent/WO2022205780A1/en
Application granted granted Critical
Publication of CN113158822B publication Critical patent/CN113158822B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/197Matching; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Ophthalmology & Optometry (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a method and a device for classifying eye detection data based on cross-modal relation reasoning. The method comprises the following steps: obtaining visual field (VF) data and optic disc data; and inputting the VF data and the optic disc data into a trained convolutional neural network model to obtain a classification result corresponding to the VF data and the optic disc data. The processing of the VF data and the optic disc data by the convolutional neural network model comprises: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data and the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result. By this method, a more accurate classification result can be obtained.

Description

Method and device for classifying eye detection data based on cross-modal relation reasoning
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a method and a device for classifying eye detection data based on cross-modal relation reasoning, terminal equipment and a computer readable storage medium.
Background
Glaucoma is the leading cause of irreversible blindness worldwide; it is a heterogeneous disease that damages the optic nerve and leads to vision loss. Because the vision loss caused by glaucoma is irreversible and progressive, early detection and timely treatment are critical for preventing visual field loss and blindness.
Clinically, medical personnel perform a structural assessment of the patient's eye, for example by optical coherence tomography (Optical Coherence Tomography, OCT). OCT is a non-contact, non-invasive imaging modality that provides an objective and quantitative assessment of various retinal structures. Medical staff review and analyse the image produced by OCT detection (hereinafter referred to as OCT image data) to reach a diagnosis. Because reviewing and analysing OCT images takes considerable time, existing methods usually analyse the OCT image data with convolutional neural networks (Convolutional Neural Networks, CNNs) to obtain a classification result indicating whether the eye has glaucoma, and the medical staff then combine this classification result with other data to reach a diagnosis. However, because glaucoma is a complex disease, the accuracy of the classification result output by such CNNs is low and provides little assistance to medical staff, so a new way of determining the classification result is required.
Disclosure of Invention
The embodiment of the application provides a method for classifying eye detection data based on cross-modal relation reasoning, which can solve the problem of the low accuracy of the classification results output by conventional CNNs.
In a first aspect, an embodiment of the present application provides a method for classifying eye detection data based on cross-modal relationship reasoning, including:
obtaining visual field (VF) data and optic disc data;
inputting the VF data and the optic disc data into a trained convolutional neural network model to obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model comprises: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data and the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result.
In a second aspect, an embodiment of the present application provides a device for classifying eye detection data based on cross-modal relationship reasoning, including:
a data acquisition unit, configured to acquire visual field (VF) data and optic disc data;
a classification result output unit, configured to input the VF data and the optic disc data into a trained convolutional neural network model and obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model comprises: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data and the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product for causing a terminal device to carry out the method of the first aspect described above when the computer program product is run on the terminal device.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
in the embodiments of the application, the output classification result is related to the enhancement feature of the optic disc data and the enhancement feature of the VF data, and these enhancement features are obtained by jointly processing the VF data features and the optic disc data features. In other words, the output classification result depends not only on the extracted VF data features and optic disc data features but also on the relationship between them, and the VF data features and optic disc data features can, to a certain extent, reflect whether the tested eye has glaucoma. The accuracy of the classification result output by the embodiments of the application is therefore higher than that of a classification result output by a convolutional neural network model trained on single-modality data, for example one trained only on OCT image data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a method for classifying eye detection data based on cross-modal relationship reasoning according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a convolutional neural network model according to a first embodiment of the present application;
FIG. 3 is a schematic diagram of determining a first global relationship vector according to an embodiment of the present application;
FIG. 4 is a schematic diagram showing a correspondence between a visual field area and a retinal nerve fiber layer according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a calculation flow of local relation vectors of 2 feature region pairs according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a VF enhancement feature calculation process provided in accordance with the first embodiment of the present application;
FIG. 7 is a schematic diagram of PDPs according to a first embodiment of the application;
FIG. 8 is a schematic diagram of OCT image data according to a first embodiment of the present application;
fig. 9 is a schematic structural diagram of a classification device for eye detection data based on cross-modal relationship reasoning according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Furthermore, in the description of the present specification and the appended claims, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise.
Embodiment one:
Existing CNNs are trained on OCT image data alone; because they are trained on single-modality data, the accuracy of the output classification result is low and the assistance provided to medical staff is limited. To improve the accuracy of the classification result, the embodiment of the application provides a new method for classifying eye detection data based on cross-modal relation reasoning. In this classification method, a convolutional neural network model is trained with visual field (Visual Field, VF) data and optic disc data, so that the trained convolutional neural network model extracts data features from the input VF data and optic disc data respectively to obtain corresponding VF data features and optic disc data features, determines corresponding enhancement features based on the extracted data features, performs feature fusion on the enhancement features of the VF data and the enhancement features of the optic disc data, and finally classifies the fused features to obtain a classification result. Because the output classification result is related to the enhancement feature of the optic disc data and the enhancement feature of the VF data, and these enhancement features are obtained by jointly processing the VF data features and the optic disc data features, the output classification result depends not only on the extracted VF data features and optic disc data features but also on the relationship between them; since the VF data features and the optic disc data features can, to a certain extent, reflect whether the tested eye has glaucoma, the accuracy of the classification result output by the embodiment of the application is higher than that of a classification result output by a convolutional neural network model trained on single-modality data.
The following describes a method for classifying eye detection data based on cross-modal relation reasoning provided by an embodiment of the application with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a method for classifying eye detection data based on cross-modal relationship reasoning, which is provided in an embodiment of the present application, and is described in detail as follows:
Step S11, visual field (VF) data and optic disc data are acquired.
Specifically, when VF detection is performed on the user's eye, the corresponding VF data are obtained; when optic disc detection is performed on the user's eye, the corresponding optic disc data are obtained. The optic disc is also called the optic papilla. It should be noted that the VF data and the optic disc data here correspond to the same tested eye: the data obtained after performing VF detection on the tested eye are the VF data, and the data obtained after performing optic disc detection on the same eye are the optic disc data, so that the VF data and the optic disc data respectively reflect the condition of the same tested eye in terms of visual field and optic disc.
In this embodiment, the visual field VF data and the optic disc data may be obtained locally or from the cloud. The VF data and the optic disc data may also be data obtained by processing raw data; for example, if the raw data is a visual field detection report, the report is processed to obtain the VF data.
Step S12, the VF data and the optic disc data are input into a trained convolutional neural network model to obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model includes: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data and the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result.
Referring to the schematic structure of the convolutional neural network model shown in fig. 2, the convolutional neural network model includes a VF data feature extraction module 21, an optic disc data feature extraction module 22, a multi-modal enhancement feature processing module 23, a feature fusion module 24 and a classifier 25. The VF data feature extraction module 21 is used to extract the data features of the VF data; the optic disc data feature extraction module 22 is used to extract the data features of the optic disc data; the multi-modal enhancement feature processing module 23 is used to jointly process the VF data features and the optic disc data features to obtain the enhancement features of the VF data and the enhancement features of the optic disc data; the feature fusion module 24 is used to fuse the enhancement features of the VF data and the enhancement features of the optic disc data to obtain the fusion features; and the classifier 25 is used to classify the fusion features to obtain the classification result. A minimal sketch of this structure is given below.
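The structure described above can be illustrated with a minimal PyTorch-style sketch. This is not the patented implementation: the backbone layouts, channel counts, input formats (the PDPs grid and the OCT ring-scan treated as single-channel images) and the two-class output are assumptions made only to show how the five modules fit together; the multi-modal enhancement step is left as a placeholder and is sketched further below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalGlaucomaNet(nn.Module):
    """Illustrative skeleton of the model in fig. 2 (shapes and channel counts assumed)."""
    def __init__(self, channels=64, num_classes=2):
        super().__init__()
        # 21: VF data feature extraction (PDPs treated as a 1-channel 2-D map)
        self.vf_extractor = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # 22: optic disc data feature extraction (OCT ring-scan image, 1 channel assumed)
        self.od_extractor = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU())
        # 23: multi-modal enhancement feature processing (placeholder; see later sketches)
        self.enhance = lambda v, o: (v, o)
        # 24: feature fusion, here by channel concatenation followed by a 1x1 convolution
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        # 25: classifier
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(channels, num_classes))

    def forward(self, vf, oct_img):
        v = self.vf_extractor(vf)                        # VF data features
        o = self.od_extractor(oct_img)                   # optic disc data features
        z_vo, z_ov = self.enhance(v, o)                  # enhancement features
        z_ov = F.interpolate(z_ov, size=z_vo.shape[-2:])
        fused = self.fuse(torch.cat([z_vo, z_ov], dim=1))  # fusion features
        return self.classifier(fused)                    # classification result
```

For example, vf could be a 1×1×10×10 tensor built from the PDPs grid and oct_img a 1×1×224×224 tensor built from the preprocessed ring-scan image; both sizes are assumptions for illustration.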
In the embodiments of the application, the output classification result is related to the enhancement feature of the optic disc data and the enhancement feature of the VF data, and these enhancement features are obtained by jointly processing the VF data features and the optic disc data features. In other words, the output classification result depends not only on the extracted VF data features and optic disc data features but also on the relationship between them, and the VF data features and optic disc data features can, to a certain extent, reflect whether the tested eye has glaucoma. The accuracy of the classification result output by the embodiments of the application is therefore higher than that of a classification result output by a convolutional neural network model trained on single-modality data, for example one trained only on OCT image data.
In some embodiments, jointly processing the VF data features and the optic disc data features to obtain the enhancement features of the VF data includes:
A1, determining, according to the VF data features and the optic disc data features, the VF data features obtained by converting the optic disc data features;
A2, obtaining VF enhancement features according to the VF data features and the VF data features obtained by converting the optic disc data features, wherein the VF enhancement features are the enhancement features of the VF data.
Jointly processing the data features of the VF data and the data features of the optic disc data to obtain the enhancement features of the optic disc data includes:
B1, determining, according to the VF data features and the optic disc data features, the optic disc data features obtained by converting the VF data features;
B2, obtaining optic disc enhancement features according to the optic disc data features and the optic disc data features obtained by converting the VF data features, wherein the optic disc enhancement features are the enhancement features of the optic disc data.
In A1, A2, B1 and B2 above, the VF data features obtained by converting the optic disc data features means: the data features represented by the optic disc data features are used as compensation for the acquired original VF data features. Similarly, the optic disc data features obtained by converting the VF data features are used as compensation for the acquired original optic disc data features. When glaucomatous lesions occur, the structure of the optic nerve head (Optic Nerve Head, ONH) of the eye changes and the eye also shows corresponding visual function defects, so there is a strong correlation between the structural damage and the visual function damage caused by glaucoma; that is, thinning of the retinal nerve fiber layer (Retinal Nerve Fiber Layer, RNFL) is spatially consistent with VF defects, and hence there is a strong correlation between the optic disc data features and the VF data features. In addition, for early glaucoma, detection with VF techniques suffers from a temporal "lag", whereas RNFL thickness is a sensitive indicator of early glaucomatous change (for example, in pre-perimetric glaucoma an obvious decrease in RNFL thickness can already be observed before visual field damage occurs); for middle- and late-stage glaucoma, on the other hand, VF detection is an effective means of monitoring the progression of the disease. In summary, the optic disc data features and the VF data features are complementary, and a comprehensive evaluation of the relationship between structural damage and functional damage (for example, determining the VF data features obtained by converting the optic disc data features) helps the convolutional neural network model to understand glaucoma and thus output a more accurate classification result, where the classification result is glaucoma or non-glaucoma.
In some embodiments, the VF data features obtained by converting the optic disc data features are determined as follows:
C1, determining, according to the VF data features and the optic disc data features, a first global relation vector for converting the optic disc data features to the VF data features, and determining a first local relation vector for converting the optic disc data features to the VF data features.
Specifically, for each VF data feature, the relationship between that VF data feature and each optic disc data feature is determined, and the first global relation vector is then determined from the determined relationships. For example, assume the VF data features are a1 and a2 and the optic disc data features are b1 and b2; then for a1 the relationships a1-b1 and a1-b2 are determined, for a2 the relationships a2-b1 and a2-b2 are determined, and finally the corresponding first global relation vector is determined from the relationships a1-b1, a1-b2, a2-b1 and a2-b2.
C2, determining, according to the first global relation vector and the first local relation vector, the VF data features obtained by converting the optic disc data features.
In C1 and C2 above, because the VF data features obtained by converting the optic disc data features are determined from both the first global relation vector and the first local relation vector, these converted VF data features carry both a global relationship (for example, the relationship between each VF data feature and every optic disc data feature) and a local relationship, which ensures the accuracy of the VF data features obtained by converting the optic disc data features.
It should be noted that, if the global relation vector used for converting the VF data features to the optic disc data features is called the second global relation vector, the method for determining the second global relation vector is similar to the method for determining the first global relation vector described above and is not repeated here.
In some embodiments, determining the first global relation vector for converting the optic disc data features to the VF data features according to the VF data features and the optic disc data features in C1 includes:
C11, splicing the optic disc data features and the VF data features to obtain splice vectors.
C12, performing a mapping operation on each splice vector to obtain a mapping value of the splice vector.
The mapping value is a scalar.
C13, determining the first global relation vector for converting the optic disc data features to the VF data features according to the mapping values.
Specifically, after the mapping value corresponding to the splice vector of each VF data feature and each optic disc data feature is obtained, a normalization operation is performed on the obtained mapping values to obtain the first global relation vector. Referring to fig. 3, V denotes the feature map corresponding to the VF data features, with C channels and a feature map size of H×W; O denotes the feature map corresponding to the optic disc data features, also with C channels and a feature map size of H×W. V and O are taken as inputs to a global relationship inference module in the multi-modal enhancement feature processing module 23, which is used to determine the first global relation vector and the second global relation vector. In some embodiments, if the feature dimensions of V and O are not identical, V and O are first processed so that they become identical, for example by applying a convolution layer, a Reshape operation (a function that transforms a matrix into a matrix of a specified dimension) and a Repeat operation (a function that replicates matrix elements) to V and O respectively, so that both become C×HW×HW. The processed VF data features and optic disc data features are then spliced to obtain an OCT-VF fused feature (or fused feature map) of dimension 2C×HW×HW; the fused feature map contains HW×HW feature points, each feature point is a group of OCT-VF feature pairs, and the number of channels of each feature pair is 2C. Each optic disc data feature forms a point-to-point feature pair with every VF data feature, and each VF data feature forms a point-to-point feature pair with every optic disc data feature. Using a global pairwise relationship function, each group of OCT-VF feature pairs is mapped to a scalar that represents the relationship of that group of OCT-VF feature pairs. Finally, softmax normalization is performed along the rows and along the columns respectively to obtain two global relation vectors: in fig. 3, a row corresponds to one VF data feature paired with all optic disc data features, and a column corresponds to one optic disc data feature paired with all VF data features; the first global relation vector obtained by normalizing along the rows is used to convert the optic disc data features to the VF data features, and the second global relation vector obtained by normalizing along the columns is used to convert the VF data features to the optic disc data features. That is, α_g in fig. 3 denotes the first global relation vector and β_g denotes the second global relation vector.
It should be noted that in fig. 3 the dimensions of V and O are shown as consistent, but in practice inconsistent dimensions may also be used, which is not limited here.
In some embodiments, to improve the processing efficiency of the convolutional neural network model, the mapping value is divided by M before being normalized, where M is greater than 1. M may also take a value related to a dimension; for example, the following equations may be used to determine the first global relation vector and the second global relation vector respectively:
α_g(i,j) = softmax_j( φ_g(W_v v_i, W_o o_j) / d_o )
β_g(i,j) = softmax_i( φ_g(W_v v_i, W_o o_j) / d_v )
where φ_g(·,·) is the global pairwise relationship function that maps a spliced feature pair to a scalar (for example φ_g(x, y) = W_g[x, y]), d_o and d_v respectively denote the dimensions of the optic disc data features and of the VF data features (taking fig. 3 as an example, d_o and d_v are both H×W), α_g is the first global relation vector for converting the optic disc data features to the VF data features, β_g is the second global relation vector for converting the VF data features to the optic disc data features, W_g, W_v and W_o are all learnable convolution layer weights, v_i denotes the i-th VF data feature, and o_j denotes the j-th optic disc data feature.
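Under these definitions, the global relationship reasoning can be sketched as follows. The 2C-dimensional weight vector standing in for W_g, the matrices standing in for the convolution weights W_v and W_o, and the use of the number of feature positions as d_o and d_v are assumptions consistent with the description above (rows normalized for α_g, columns for β_g); the actual module operates on C×HW×HW feature maps with convolution layers.

```python
import torch
import torch.nn.functional as F

def global_relation_vectors(V, O, W_v, W_o, W_g):
    """V: (C, Nv) VF features, O: (C, No) optic disc features.
    W_v, W_o: (C, C) learnable projections; W_g: (2C,) learnable mapping to a scalar.
    Returns alpha_g (Nv, No) for disc->VF and beta_g (Nv, No) for VF->disc."""
    v = W_v @ V                                   # (C, Nv)
    o = W_o @ O                                   # (C, No)
    # build every OCT-VF feature pair and map it to a scalar relation value (phi_g)
    pairs = torch.cat([v.unsqueeze(2).expand(-1, -1, o.shape[1]),
                       o.unsqueeze(1).expand(-1, v.shape[1], -1)], dim=0)  # (2C, Nv, No)
    rel = torch.einsum('c,cij->ij', W_g, pairs)   # scalar per feature pair, (Nv, No)
    d_o, d_v = float(o.shape[1]), float(v.shape[1])
    alpha_g = F.softmax(rel / d_o, dim=1)         # normalize each row (disc -> VF)
    beta_g = F.softmax(rel / d_v, dim=0)          # normalize each column (VF -> disc)
    return alpha_g, beta_g
```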
Medical research shows that there is a positional correspondence between regions of the visual field and regions of the retinal nerve fiber layer. Fig. 4 shows the partition map proposed by Garway-Heath et al., in which the VF data and the optic disc data are each divided into 6 regions: the leftmost diagram of fig. 4 is the visual field pattern deviation probability map, the middle diagram is the partition map of the optic disc, and the rightmost diagram is the OCT ring scan of the optic disc. Regions with the same number have a structure-function correspondence; for example, the region numbered "1" in the pattern deviation probability map corresponds structurally and functionally to the region "121°-230°" in the partition map of the optic disc. In other words, if glaucomatous visual field damage is observed in region 1 of the VF, thinning of the RNFL will be observed in region 1 of the OCT. Accordingly, in some embodiments, determining the first local relation vector for converting the optic disc data features to the VF data features according to the VF data features and the optic disc data features in C1 includes:
C11', dividing the VF data features into 6 VF regions and dividing the optic disc data features into 6 optic disc regions.
C12', determining 6 feature region pairs from the 6 VF regions and the 6 optic disc regions, where each feature region pair corresponds one-to-one to one VF region and one optic disc region.
C13', for the VF region and the optic disc region of each feature region pair, determining the relation vector between the VF data features in that VF region and the optic disc data features in that optic disc region, thereby obtaining the first local relation vectors for converting the optic disc data features to the VF data features.
In C11' to C13' above, a medical prior (the OCT-VF partition mapping relationship) is introduced into the design of the neural network so that a more accurate relation map between OCT and VF can be learned. In the embodiment of the application, the first local relation vector is determined by a region-guided relationship module of the multi-modal enhancement feature processing module 23, whose processing flow is shown in fig. 5. The processing is similar to that of the global relationship inference module; the difference is that the computation of a local relation vector is restricted to the OCT-VF feature groups for which a partition mapping relationship exists. In fig. 5, diagrams (a) and (b) show the computation flow of the local relation vectors of the 1st and 2nd feature region pairs. Taking the partition relation map of the 1st feature region pair as an example, only the VF data features of the 1st feature region pair and the optic disc data features of the 1st feature region pair are used in the computation: φ_r(1) is the pairwise relationship function of the 1st feature region pair, α_r(1) is the first local relation vector of the 1st feature region pair for converting the optic disc data features to the VF data features, and β_r(1) is the second local relation vector of the 1st feature region pair for converting the VF data features to the optic disc data features.
Each first local relation vector for converting the optic disc data features to the VF data features may be determined according to the following equation:
α_r(k)(i,j) = φ_r(k)( W_v v_i^(k), W_o o_j^(k) ) / C_r, with φ_r(k)(x, y) = W_r[x, y],
where W_r, W_v and W_o are learnable convolution layer weights, C_r is a normalization parameter, k ∈ {1,2,3,4,5,6} is the partition number, v_i^(k) is the i-th VF data feature of the k-th partition, and o_j^(k) is the j-th optic disc data feature of the k-th partition.
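Combining the partition steps C11'-C13' with the equation above, a sketch of the region-guided relation computation might look like the following. The region label tensors, the shared scalar-mapping weight W_r across partitions and the simple 1/C_r scaling are assumptions for illustration; the actual partition boundaries follow the Garway-Heath map of fig. 4.

```python
import torch

def local_relation_vectors(V, O, vf_region, od_region, W_v, W_o, W_r, C_r=1.0):
    """V: (C, Nv) VF features, O: (C, No) optic disc features.
    vf_region / od_region: integer labels in {1..6} assigning every VF / optic disc
    feature position to one Garway-Heath partition. Returns six (Nv, No) maps
    alpha_r(k) that are non-zero only inside partition k."""
    v, o = W_v @ V, W_o @ O
    alphas = []
    for k in range(1, 7):
        vi = (vf_region == k).nonzero(as_tuple=True)[0]   # VF positions of partition k
        oj = (od_region == k).nonzero(as_tuple=True)[0]   # disc positions of partition k
        pairs = torch.cat([v[:, vi].unsqueeze(2).expand(-1, -1, len(oj)),
                           o[:, oj].unsqueeze(1).expand(-1, len(vi), -1)], dim=0)
        rel = torch.einsum('c,cij->ij', W_r, pairs) / C_r  # phi_r(k) scaled by 1/C_r
        alpha_k = torch.zeros(V.shape[1], O.shape[1])
        alpha_k[vi.unsqueeze(1), oj.unsqueeze(0)] = rel    # scatter back to the full map
        alphas.append(alpha_k)
    return alphas
```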
It should be noted that the computation of each second local relation vector for converting the VF data features to the optic disc data features is similar to that of the first local relation vector for converting the optic disc data features to the VF data features and is not repeated here.
FIG. 6 shows a schematic diagram of the computation of the VF enhancement feature. In fig. 6, "Matmul" denotes matrix multiplication and "Element-wise sum" denotes element-wise addition; α_g is the first global relation vector (i.e., the global relation map used for the optic disc data feature conversion), α_r(1) is the first local relation vector of the 1st feature region pair for converting the optic disc data features to the VF data features (i.e., the 1st group of partition relation maps used for the optic disc data feature conversion), α_r(6) is the 6th group of partition relation maps used for the optic disc data feature conversion, and V and O are the VF data features and the optic disc data features respectively. The arrow from O to "Matmul" denotes a Conv (convolution) operation followed by a Reshape operation, the 3 arrows leaving "Matmul" denote Reshape operations, and the arrow leaving "Fusion" denotes a Conv operation. O therefore first undergoes the convolution layer operation and the Reshape operation, and the result is matrix-multiplied with α_g, α_r(1), ..., α_r(6) respectively:
V_σ(o→v) = α_σ · Conv(O) ..........................(7);
where σ ∈ {g, r(k)}, V_σ(o→v) is the VF data feature converted from the optic disc data features under the σ-th relation map, and O_σ(v→o) is the optic disc data feature converted from the VF data features under the σ-th relation map. The converted features of the 6 feature region pairs are added to obtain the partition enhancement feature V_r(o→v). That is, after the first global relation vector and the first local relation vectors have been determined, step A2 above may determine the VF data features obtained by converting the optic disc data features according to equation (7).
The partition enhancement feature V_r(o→v) and the global enhancement feature V_g(o→v) are then fused, V_(o→v) = Fusion(V_g(o→v), V_r(o→v)), and the enhancement features are obtained as:
Z_VO = V + V_(o→v) ..........................(11);
Z_OV = O + O_(v→o) ..........................(12);
where V_(o→v) denotes the VF data features converted from the optic disc data features, Fusion denotes the fusion function, O_(v→o) denotes the optic disc data features converted from the VF data features, Z_VO denotes the VF enhancement feature obtained from the VF data features and the VF data features converted from the optic disc data features, and Z_OV denotes the optic disc enhancement feature obtained from the optic disc data features and the optic disc data features converted from the VF data features.
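Putting the relation maps together, the computation of the VF enhancement feature Z_VO sketched in fig. 6 can be written as below. The element-wise average used for Fusion and the matrix W_conv standing in for the Conv+Reshape branch applied to O are assumptions; the optic disc enhancement feature Z_OV is obtained symmetrically with β_g and the β_r(k) maps.

```python
import torch

def vf_enhancement(V, O, alpha_g, alpha_r_list, W_conv):
    """V: (C, Nv) VF features, O: (C, No) optic disc features,
    alpha_g: (Nv, No) global relation map, alpha_r_list: six (Nv, No) partition maps,
    W_conv: (C, C) stand-in for the Conv+Reshape branch applied to O in fig. 6."""
    O_t = W_conv @ O                                    # Conv(O), still (C, No)
    V_global = O_t @ alpha_g.T                          # (C, Nv): "Matmul" in fig. 6
    V_region = sum(O_t @ a.T for a in alpha_r_list)     # element-wise sum over the 6 pairs
    V_o2v = 0.5 * (V_global + V_region)                 # Fusion (assumed: simple average)
    Z_VO = V + V_o2v                                    # eq. (11): VF enhancement feature
    return Z_VO
```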
The fusion mode includes at least one of the following: addition (average or weighted average), taking the element-wise maximum, splicing (concatenation), and the like.
The addition fusion mode refers to adding the partition enhancement feature and the global enhancement feature element by element to obtain the fused vector.
The element-wise maximum fusion mode compares the partition enhancement feature and the global enhancement feature element by element and takes the larger value as the output value.
Splicing connects the partition enhancement feature and the global enhancement feature together; for example, if the partition enhancement feature and the global enhancement feature to be spliced are both 1×30, a 1×60 fused feature is obtained.
It should be noted that when the partition enhancement feature and the global enhancement feature are fused in the above manner, weights corresponding to the partition enhancement feature and the global enhancement feature may also be set; by setting different weights, the flexibility of the converted data features (VF data features or optic disc data features) is improved. A small sketch of these fusion modes follows.
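The fusion modes listed above can be illustrated with a small sketch; the weight values are placeholders.

```python
import torch

def fuse(partition_feat, global_feat, mode="add", w=(0.5, 0.5)):
    """Possible Fusion choices for combining the partition and global enhancement features."""
    if mode == "add":            # element-wise (weighted) average
        return w[0] * partition_feat + w[1] * global_feat
    if mode == "max":            # element-wise maximum
        return torch.maximum(partition_feat, global_feat)
    if mode == "concat":         # e.g. two 1x30 vectors -> one 1x60 vector
        return torch.cat([partition_feat, global_feat], dim=-1)
    raise ValueError(mode)
```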
In some embodiments, the VF data are pattern deviation probability plot (Pattern Deviation Probability Plots, PDPs) data, i.e., the obtained VF data are PDPs data; the optic disc data are OCT image data obtained by performing a ring scan of the optic disc with optical coherence tomography (Optical Coherence Tomography, OCT).
Wherein, obtaining the PDPs data comprises: PDPs data is extracted from the visual field detection report.
In this embodiment, PDPs data is extracted from a pdf file or a tif image corresponding to a visual field detection report. Specifically, whether the visual field detection report meets the requirements is judged according to the reliability index in the visual field detection report, and if so, the corresponding PDPs data is extracted from the visual field detection report meeting the requirements. Since the PDPs data retains the visual field partition (position) information, it can provide more detailed and comprehensive visual field function information, so that the accuracy of the obtained classification result can be improved by using the PDPs data as VF data.
In some embodiments, extracting PDPs data from the visual field detection report includes:
D1, dividing the designated position in the visual field detection report into N×N blocks and determining the gray value of each block, where N is greater than or equal to 9.
Here, the designated position refers to a region position in which PDPs are displayed in the visual field detection report, and N is determined based on the number of content items included in the region position, where the number of content items includes the number of icon marks and the number of blank cells, and N is typically 10.
And D2, determining icon identifications corresponding to the blocks according to the gray values of the blocks and a preset mapping table to obtain PDPs data, wherein the preset mapping table is used for storing the corresponding relation between the gray values of the blocks and the icon identifications, and one icon identification is used for uniquely marking one icon in the PDPs.
In fig. 7, four kinds of abnormality probability icons are displayed beside the pattern deviation probability map; the darker the icon, the smaller the corresponding probability value P, that is, the lower the possibility that the visual field at that site is normal. Referring to fig. 7, 0 represents a blank cell, and 1 to 5 represent the 5 probability icons (4 abnormal + 1 normal). For example, the icon identifier "5" indicates P < 2% (fewer than 2% of normal people would have such low sensitivity at that point, i.e., the probability that the visual field at that site is abnormal is 98%), the icon identifier "4" indicates P < 1% (the probability of abnormality is 99%), the icon identifier "3" indicates P < 0.5% (the probability of abnormality is 99.5%), and so on.
It should be noted that, in practical situations, other information may be used as the icon identifier, which is not limited herein.
In D1 and D2 above, the gray value of each block divided from the designated position is compared with the gray values stored in the preset mapping table to find the matching gray value and thereby the icon identifier corresponding to it; the icon identifiers corresponding to the gray values of all blocks constitute the PDPs data, which are two-dimensional discrete data and may also be regarded as a gray map. A sketch of this extraction is given below.
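Steps D1-D2 might be implemented along the following lines. The 10×10 grid, the crop coordinates and the example gray values in the mapping table are assumptions; in practice the preset mapping table is built from the icons actually rendered in the visual field detection report.

```python
import numpy as np

# hypothetical mapping table: mean gray value of a block -> icon identifier (0..5)
GRAY_TO_ICON = {255: 0, 230: 1, 180: 2, 120: 3, 60: 4, 0: 5}

def extract_pdps(report_img, box, n=10):
    """report_img: 2-D gray-scale array of the visual field report;
    box: (top, left, height, width) of the PDPs area; returns an (n, n) icon-id grid."""
    top, left, h, w = box
    region = report_img[top:top + h, left:left + w]
    bh, bw = h // n, w // n
    pdps = np.zeros((n, n), dtype=np.int64)
    keys = np.array(list(GRAY_TO_ICON.keys()))
    for r in range(n):
        for c in range(n):
            block = region[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            g = block.mean()
            nearest = keys[np.abs(keys - g).argmin()]   # closest gray value in the table
            pdps[r, c] = GRAY_TO_ICON[int(nearest)]
    return pdps
```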
In some embodiments, after the PDPs data and the OCT image data are acquired, the method further includes:
and E1, performing first pretreatment on the PDPs data, wherein the first pretreatment comprises normalization treatment.
Specifically, the normalization maps each icon identifier into the interval 0-1; for example, assuming there are 6 icon identifiers, namely 0-5, the normalization yields 6 numerical values in the interval 0-1, such as 0, 0.075, 0.693, 0.8825, 0.9107 and 0.9924. Of course, these values are merely examples, and in practice other values in the interval 0-1 may be used, which is not limited here. Because the PDPs undergo the first preprocessing, which includes normalization, the VF data feature extraction module can extract data features from the preprocessed PDPs more simply. In some embodiments, the VF data feature extraction module includes at least two convolution layers, and the parameters of different convolution layers are usually different, including the number of channels, convolution kernel size, stride, padding, dilated (atrous) convolution and so on. The more layers the VF data feature extraction module has, the more parameters it learns and the more semantic and abstract the data features it extracts become. A sketch is given below.
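A sketch of the first preprocessing together with a minimal VF data feature extraction module follows; the lookup values reuse the example numbers quoted above, and the two-layer configuration with the stated parameters is only one possible choice.

```python
import torch
import torch.nn as nn

# example normalization of the six icon identifiers into the 0-1 interval
ICON_TO_VALUE = {0: 0.0, 1: 0.075, 2: 0.693, 3: 0.8825, 4: 0.9107, 5: 0.9924}

def preprocess_pdps(pdps_grid):
    """pdps_grid: (10, 10) integer icon ids -> (1, 1, 10, 10) float tensor."""
    values = torch.tensor([[ICON_TO_VALUE[int(v)] for v in row] for row in pdps_grid],
                          dtype=torch.float32)
    return values.unsqueeze(0).unsqueeze(0)

# VF data feature extraction module with (at least) two convolution layers
vf_extractor = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1, dilation=1), nn.ReLU())
```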
And E2, performing second preprocessing on the OCT image data, wherein the second preprocessing comprises normalization processing and scaling processing.
The normalization included in the second preprocessing refers to normalizing the image pixel values of the OCT image data, specifically: (1) the mean and variance of the OCT image data in the training data set are computed in advance; (2) the computed mean is subtracted from the image pixel values of the OCT image data in the eye detection data to be processed, and the result is divided by the computed variance.
The scaling included in the second preprocessing scales the OCT image data to a specified size. It should be noted that the OCT image data samples used to train the optic disc data feature extraction module have this specified size, and the samples used are also normalized. In some embodiments, to improve the generalization performance of the optic disc data feature extraction module, the OCT image data samples used for training are obtained with different optical coherence tomography instruments. As shown in fig. 8, the OCT image data allow the user to view the thickness of the retinal nerve fiber layer (Retinal Nerve Fibre Layer, RNFL). Because the OCT image data undergo the second preprocessing, which includes normalization and scaling, before their data features are extracted, the optic disc data feature extraction module does not need to deal with a large data range or varying sizes, and can therefore quickly extract the corresponding data features from the preprocessed OCT image data. A sketch of the second preprocessing follows.
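The second preprocessing can be sketched as follows; the 224×224 target size is an assumption, while the mean and variance are the statistics computed in advance over the training data set as described above.

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess_oct(oct_img, train_mean, train_var, size=(224, 224)):
    """oct_img: 2-D array of the ring-scan image; train_mean / train_var: statistics
    computed in advance over the training OCT images; returns a (1, 1, H, W) tensor."""
    x = (oct_img.astype(np.float32) - train_mean) / train_var   # normalization as described
    x = torch.from_numpy(x).unsqueeze(0).unsqueeze(0)
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)  # scaling
    return x
```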
In some embodiments, the optic disc data feature extraction module includes at least two convolution layers, and these convolution layers process the data extracted from the preprocessed OCT image data using both batch normalization and instance normalization to obtain the corresponding data features.
The optic disc data feature extraction module includes at least two convolution layers, and the parameters of different convolution layers are usually different, including the number of channels, convolution kernel size, stride, padding, dilated (atrous) convolution and so on. The more layers the optic disc data feature extraction module has, the more parameters it learns and the more semantic and abstract the data features it extracts become. A minimal sketch of a convolution block combining the two normalizations follows.
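A minimal sketch of a convolution block that combines batch normalization and instance normalization is given below; splitting the channels evenly between the two normalizations and the channel counts are assumed design choices used only to illustrate the idea.

```python
import torch
import torch.nn as nn

class BNINConvBlock(nn.Module):
    """Convolution followed by batch norm on half the channels and instance norm on the rest."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        self.bn = nn.BatchNorm2d(out_ch // 2)
        self.inorm = nn.InstanceNorm2d(out_ch - out_ch // 2, affine=True)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        half = x.shape[1] // 2
        x = torch.cat([self.bn(x[:, :half]), self.inorm(x[:, half:])], dim=1)
        return self.act(x)

# optic disc data feature extraction module with two such layers (channel counts assumed)
od_extractor = nn.Sequential(BNINConvBlock(1, 32, stride=2), BNINConvBlock(32, 64, stride=2))
```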
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
Embodiment two:
corresponding to the above method for classifying eye detection data based on cross-modal relationship reasoning in the embodiment, fig. 9 shows a block diagram of a device for classifying eye detection data based on cross-modal relationship reasoning according to an embodiment of the present application, and for convenience of explanation, only the portion relevant to the embodiment of the present application is shown.
Referring to fig. 9, the device 9 for classifying eye detection data based on cross-modal relation reasoning includes a data acquisition unit 91 and a classification result output unit 92, wherein:
the data acquisition unit 91 is configured to acquire visual field (VF) data and optic disc data;
the classification result output unit 92 is configured to input the VF data and the optic disc data into a trained convolutional neural network model and obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model includes: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data and the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result.
In the embodiments of the application, the output classification result is related to the enhancement feature of the optic disc data and the enhancement feature of the VF data, and these enhancement features are obtained by jointly processing the VF data features and the optic disc data features. In other words, the output classification result depends not only on the extracted VF data features and optic disc data features but also on the relationship between them, and the VF data features and optic disc data features can, to a certain extent, reflect whether the tested eye has glaucoma. The accuracy of the classification result output by the embodiments of the application is therefore higher than that of a classification result output by a convolutional neural network model trained on single-modality data, for example one trained only on OCT image data.
In some embodiments, jointly processing the VF data features and the optic disc data features to obtain the enhancement features of the VF data includes:
determining, according to the VF data features and the optic disc data features, the VF data features obtained by converting the optic disc data features; and obtaining VF enhancement features according to the VF data features and the VF data features obtained by converting the optic disc data features, where the VF enhancement features are the enhancement features of the VF data.
Jointly processing the data features of the VF data and the data features of the optic disc data to obtain the enhancement features of the optic disc data includes:
determining, according to the VF data features and the optic disc data features, the optic disc data features obtained by converting the VF data features; and obtaining optic disc enhancement features according to the optic disc data features and the optic disc data features obtained by converting the VF data features, where the optic disc enhancement features are the enhancement features of the optic disc data.
In some embodiments, the VF data features obtained by converting the optic disc data features are determined as follows:
determining, according to the VF data features and the optic disc data features, a first global relation vector for converting the optic disc data features to the VF data features, and determining a first local relation vector for converting the optic disc data features to the VF data features; and determining, according to the first global relation vector and the first local relation vector, the VF data features obtained by converting the optic disc data features.
In some embodiments, determining the first global relation vector for converting the optic disc data features to the VF data features according to the optic disc data features and the VF data features includes:
splicing the optic disc data features and the VF data features to obtain splice vectors; performing a mapping operation on the splice vectors to obtain mapping values of the splice vectors; and determining the first global relation vector for converting the optic disc data features to the VF data features according to the mapping values.
In some embodiments, determining the first local relation vector for converting the optic disc data features to the VF data features according to the optic disc data features and the VF data features includes:
dividing the VF data features into 6 VF regions and dividing the optic disc data features into 6 optic disc regions; determining 6 feature region pairs from the 6 VF regions and the 6 optic disc regions, where each feature region pair corresponds one-to-one to one VF region and one optic disc region; and, for the VF region and the optic disc region of each feature region pair, determining the relation vector between the VF data features in that VF region and the optic disc data features in that optic disc region, thereby obtaining the first local relation vectors for converting the optic disc data features to the VF data features.
In some embodiments, splicing the optic disc data features with the VF data features to obtain the splice vectors includes:
if the dimensions of the VF data features are inconsistent with the dimensions of the optic disc data features, adjusting the dimensions of the VF data features and/or the dimensions of the optic disc data features to obtain VF data features and optic disc data features with consistent dimensions; and splicing the VF data features and the optic disc data features with consistent dimensions to obtain the splice vectors.
In some embodiments, the VF data are pattern deviation probability plot data, and the optic disc data are OCT image data obtained after performing a ring scan of the optic disc with optical coherence tomography (OCT).
In some embodiments, the classifying device 9 of eye detection data based on cross-modal relation reasoning further comprises:
a first preprocessing unit, configured to perform first preprocessing on the pattern deviation probability plot data, the first preprocessing including normalization;
a second preprocessing unit, configured to perform second preprocessing on the OCT image data, the second preprocessing including normalization and scaling;
the classification result output unit 92 is specifically configured, when inputting the VF data and the optic disc data into the pre-trained convolutional neural network model, to: input the pattern deviation probability plot data after the first preprocessing and the OCT image data after the second preprocessing into the pre-trained convolutional neural network model.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Embodiment III:
fig. 10 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 10, the terminal device 10 of this embodiment includes: at least one processor 100 (only one processor is shown in fig. 10), a memory 101, and a computer program 102 stored in the memory 101 and executable on the at least one processor 100, the processor 100 implementing the steps in any of the various method embodiments described above when executing the computer program 102:
obtaining visual field (VF) data and optic disc data;
inputting the VF data and the optic disc data into a trained convolutional neural network model to obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model includes: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features; jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data; fusing the enhancement features of the VF data with the enhancement features of the optic disc data to obtain fusion features; and classifying the fusion features to obtain the classification result.
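To make this data flow concrete, the sketch below gives a high-level stand-in for such a pipeline; the LazyLinear encoders, the plain linear enhancement blocks, and all dimensions are assumptions that take the place of the CNN feature extractors and cross-modal relation reasoning described above, not a reproduction of them.

```python
import torch
import torch.nn as nn

class CrossModalClassifier(nn.Module):
    # Sketch only: simple encoders and linear enhancement blocks stand in
    # for the CNN feature extraction and relation reasoning.
    def __init__(self, feat_dim=256, num_classes=2):
        super().__init__()
        self.vf_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.disc_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.vf_enhance = nn.Linear(2 * feat_dim, feat_dim)    # uses both modalities
        self.disc_enhance = nn.Linear(2 * feat_dim, feat_dim)  # uses both modalities
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, vf_data, disc_data):
        vf_feat = self.vf_encoder(vf_data)         # VF data features
        disc_feat = self.disc_encoder(disc_data)   # optic disc data features
        joint = torch.cat([vf_feat, disc_feat], dim=-1)
        vf_enh = vf_feat + self.vf_enhance(joint)        # enhancement features of VF data
        disc_enh = disc_feat + self.disc_enhance(joint)  # enhancement features of disc data
        fused = torch.cat([vf_enh, disc_enh], dim=-1)    # feature fusion
        return self.classifier(fused)                    # classification result
```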
The terminal device 10 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 100 and the memory 101. Those skilled in the art will appreciate that fig. 10 is merely an example of the terminal device 10 and does not limit it; the terminal device 10 may include more or fewer components than shown, combine certain components, or include different components, such as input/output devices and network access devices.
The processor 100 may be a central processing unit (Central Processing Unit, CPU); it may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 101 may, in some embodiments, be an internal storage unit of the terminal device 10, such as a hard disk or memory of the terminal device 10. In other embodiments, the memory 101 may be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the terminal device 10. Further, the memory 101 may include both an internal storage unit and an external storage device of the terminal device 10. The memory 101 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated; in practical applications, the functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, may each exist physically alone, or two or more of them may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
The embodiment of the application also provides a network device, which comprises: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, which when executed by the processor performs the steps of any of the various method embodiments described above.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Embodiments of the present application also provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform the steps of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or described in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for classifying eye detection data based on cross-modal relation reasoning, characterized by comprising the following steps:
obtaining visual field (VF) data and optic disc data;
inputting the VF data and the optic disc data into a trained convolutional neural network model to obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model includes: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data with the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result;
wherein jointly processing the VF data features and the optic disc data features to obtain the enhancement features of the VF data comprises:
determining, from the VF data features and the optic disc data features, VF data features converted from the optic disc data features;
obtaining a VF enhancement feature from the VF data features and the VF data features converted from the optic disc data features, wherein the VF enhancement feature is the enhancement feature of the VF data;
and wherein jointly processing the VF data features and the optic disc data features to obtain the enhancement features of the optic disc data comprises:
determining, from the VF data features and the optic disc data features, optic disc data features converted from the VF data features;
and obtaining an optic disc enhancement feature from the optic disc data features and the optic disc data features converted from the VF data features, wherein the optic disc enhancement feature is the enhancement feature of the optic disc data.
2. The method for classifying eye detection data based on cross-modal relation reasoning according to claim 1, wherein the VF data features converted from the optic disc data features are determined as follows:
determining, from the VF data features and the optic disc data features, a first global relation vector for the conversion of the optic disc data features into the VF data features, and a first local relation vector for the conversion of the optic disc data features into the VF data features;
and determining, from the first global relation vector and the first local relation vector, the VF data features converted from the optic disc data features.
3. The method for classifying eye detection data based on cross-modal relation reasoning according to claim 2, wherein determining, from the optic disc data features and the VF data features, the first global relation vector for the conversion of the optic disc data features into the VF data features comprises:
splicing the optic disc data features and the VF data features to obtain a spliced vector;
performing a mapping operation on the spliced vector to obtain a mapping value of the spliced vector;
and determining, from the mapping value, the first global relation vector for the conversion of the optic disc data features into the VF data features.
4. The method for classifying eye detection data based on cross-modal relation reasoning according to claim 2, wherein determining, from the optic disc data features and the VF data features, the first local relation vector for the conversion of the optic disc data features into the VF data features comprises:
dividing the VF data features into 6 VF regions, and dividing the optic disc data features into 6 optic disc regions;
determining 6 feature region pairs from the 6 VF regions and the 6 optic disc regions, wherein each feature region pair consists of one VF region and one optic disc region in one-to-one correspondence;
and determining, for the VF region and the optic disc region of each feature region pair, a relation vector between the VF data features in that VF region and the optic disc data features in that optic disc region, thereby obtaining the first local relation vector for the conversion of the optic disc data features into the VF data features.
5. The method for classifying eye detection data based on cross-modal relation reasoning according to claim 3, wherein splicing the optic disc data features with the VF data features to obtain the spliced vector comprises:
if the dimensions of the VF data features and the optic disc data features are inconsistent, adjusting the dimensions of the VF data features and/or the optic disc data features to obtain VF data features and optic disc data features of consistent dimensions;
and splicing the dimension-matched VF data features and optic disc data features to obtain the spliced vector.
6. The method for classifying eye detection data based on cross-modal relation reasoning according to any one of claims 1 to 5, wherein the VF data is pattern deviation probability map data, and the optic disc data is OCT image data obtained by a circular scan of the optic disc with optical coherence tomography (OCT).
7. The method for classifying eye detection data based on cross-modal relation reasoning according to claim 6, wherein, after the pattern deviation probability map data and the OCT image data are acquired, the method comprises:
performing first preprocessing on the pattern deviation probability map data, wherein the first preprocessing includes normalization;
performing second preprocessing on the OCT image data, wherein the second preprocessing includes normalization and scaling;
and inputting the VF data and the optic disc data into the pre-trained convolutional neural network model comprises: inputting the pattern deviation probability map data after the first preprocessing and the OCT image data after the second preprocessing into the pre-trained convolutional neural network model.
8. A device for classifying eye detection data based on cross-modal relation reasoning, comprising:
a data acquisition unit, configured to acquire visual field (VF) data and optic disc data;
a classification result output unit, configured to input the VF data and the optic disc data into a trained convolutional neural network model and obtain a classification result corresponding to the VF data and the optic disc data, wherein the processing of the VF data and the optic disc data by the convolutional neural network model includes: extracting data features from the VF data and the optic disc data respectively to obtain VF data features and optic disc data features, jointly processing the VF data features and the optic disc data features to obtain enhancement features of the VF data and enhancement features of the optic disc data, fusing the enhancement features of the VF data with the enhancement features of the optic disc data to obtain fusion features, and classifying the fusion features to obtain the classification result;
wherein jointly processing the VF data features and the optic disc data features to obtain the enhancement features of the VF data comprises:
determining, from the VF data features and the optic disc data features, VF data features converted from the optic disc data features;
obtaining a VF enhancement feature from the VF data features and the VF data features converted from the optic disc data features, wherein the VF enhancement feature is the enhancement feature of the VF data;
and wherein jointly processing the VF data features and the optic disc data features to obtain the enhancement features of the optic disc data comprises:
determining, from the VF data features and the optic disc data features, optic disc data features converted from the VF data features;
and obtaining an optic disc enhancement feature from the optic disc data features and the optic disc data features converted from the VF data features, wherein the optic disc enhancement feature is the enhancement feature of the optic disc data.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202110336212.7A 2021-03-29 2021-03-29 Method and device for classifying eye detection data based on cross-modal relation reasoning Active CN113158822B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110336212.7A CN113158822B (en) 2021-03-29 2021-03-29 Method and device for classifying eye detection data based on cross-modal relation reasoning
PCT/CN2021/117444 WO2022205780A1 (en) 2021-03-29 2021-09-09 Method and apparatus for classifying eye examination data on basis of cross-modal relationship inference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110336212.7A CN113158822B (en) 2021-03-29 2021-03-29 Method and device for classifying eye detection data based on cross-modal relation reasoning

Publications (2)

Publication Number Publication Date
CN113158822A CN113158822A (en) 2021-07-23
CN113158822B true CN113158822B (en) 2023-09-29

Family

ID=76885216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110336212.7A Active CN113158822B (en) 2021-03-29 2021-03-29 Method and device for classifying eye detection data based on cross-modal relation reasoning

Country Status (2)

Country Link
CN (1) CN113158822B (en)
WO (1) WO2022205780A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158822B (en) * 2021-03-29 2023-09-29 Shenzhen Institute of Advanced Technology of CAS Method and device for classifying eye detection data based on cross-modal relation reasoning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918782A (en) * 2016-12-29 2018-04-17 中国科学院计算技术研究所 A kind of method and system for the natural language for generating description picture material
CN110619332A (en) * 2019-08-13 2019-12-27 中国科学院深圳先进技术研究院 Data processing method, device and equipment based on visual field inspection report
CN111696100A (en) * 2020-06-17 2020-09-22 上海鹰瞳医疗科技有限公司 Method and device for determining smoking degree based on fundus image

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10405739B2 (en) * 2015-10-23 2019-09-10 International Business Machines Corporation Automatically detecting eye type in retinal fundus images
CN111784687A (en) * 2020-07-22 2020-10-16 上海理工大学 Glaucoma fundus image detection method based on deep learning
CN112288720A (en) * 2020-10-29 2021-01-29 苏州体素信息科技有限公司 Deep learning-based color fundus image glaucoma screening method and system
CN113158822B (en) * 2021-03-29 2023-09-29 Shenzhen Institute of Advanced Technology of CAS Method and device for classifying eye detection data based on cross-modal relation reasoning


Also Published As

Publication number Publication date
WO2022205780A1 (en) 2022-10-06
CN113158822A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
Sungheetha et al. Design an early detection and classification for diabetic retinopathy by deep feature extraction based convolution neural network
Li et al. Deep learning-based automated detection of retinal diseases using optical coherence tomography images
Bilal et al. Diabetic retinopathy detection and classification using mixed models for a disease grading database
Kandhasamy et al. Diagnosis of diabetic retinopathy using multi level set segmentation algorithm with feature extraction using SVM with selective features
Zhou et al. Automatic optic disc detection using low-rank representation based semi-supervised extreme learning machine
Panda et al. Deep convolutional neural network-based patch classification for retinal nerve fiber layer defect detection in early glaucoma
CN113158821B (en) Method and device for processing eye detection data based on multiple modes and terminal equipment
CN111784665B (en) OCT image quality evaluation method, system and device based on Fourier transform
Shamrat et al. Analysing most efficient deep learning model to detect COVID-19 from computer tomography images
CN113158822B (en) Method and device for classifying eye detection data based on cross-modal relation reasoning
Nagadeepa et al. Artificial Intelligence based Cervical Cancer Risk Prediction Using M1 Algorithms
Strzelecki et al. Artificial Intelligence in the detection of skin cancer: state of the art
Lim et al. Technical and clinical challenges of AI in retinal image analysis
CN107590806B (en) Detection method and system based on brain medical imaging
Noori et al. Towards trustworthy myopia detection: integration methodology of deep learning approach, xai visualization, and user interface system
Pradhan et al. Transfer learning based classification of diabetic retinopathy stages
CN115829980A (en) Image recognition method, device, equipment and storage medium for fundus picture
Megalingam et al. Coconut trees classification based on height, inclination, and orientation using MIN-SVM algorithm
Kinger et al. Explainability of deep learning-based system in health care
Prabha et al. Machine Learning Technique for Analysing the Diabetic Retinopathy
CN114359657A (en) Method for constructing brain atlas and detecting nerve loop and related product
Meshram et al. Development And Analysis Of Deep Learning Model Based On Multiclass Classification Of Retinal Image For Early Detection Of Diabetic Retinopathy
Sakthi Karthi Durai et al. An Effective Approach Based on Improved Convolutionary Neural Network Classifier for the Detection of Diabetic Retinopathy
Iqbal et al. Privacy-preserving collaborative AI for distributed deep learning with cross-sectional data
Saeed et al. LeafNet: Using Convolutional Neural Network for Plant Leaf Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant