WO2024001104A1 - Image-text data mutual-retrieval method and apparatus, and device and readable storage medium - Google Patents

Image-text data mutual-retrieval method and apparatus, and device and readable storage medium

Info

Publication number
WO2024001104A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
features
feature
information
Prior art date
Application number
PCT/CN2022/141374
Other languages
French (fr)
Chinese (zh)
Inventor
赵雅倩
王立
范宝余
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024001104A1 publication Critical patent/WO2024001104A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 - Querying
    • G06F16/532 - Query formulation, e.g. graphical querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 - ICT specially adapted for the handling or processing of medical images
    • G16H30/20 - ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS

Definitions

  • This application belongs to the field of computers, and specifically relates to an image-text data mutual-retrieval method, apparatus, device, and readable storage medium.
  • Single-modal retrieval can only query information of the same modality, such as retrieving text with text or images with images.
  • Cross-modal retrieval refers to using a sample of one modality to retrieve semantically similar samples of another modality, such as retrieving text with an image or images with text.
  • The cross-domain heterogeneity in this application is mainly reflected in the fact that image and text data lie in different spaces and are heterogeneous data. Correct retrieval therefore requires the retrieval method to have cross-domain capability, achieving alignment and ranking between modalities.
  • Compared with single-modal data, cross-modal retrieval must model not only the relationships within each modality's data but also the correlations between different modalities, so as to achieve cross-domain retrieval between modalities.
  • Cross-modal retrieval is highly flexible, has wide application scenarios and strong user demand, and is an important research topic in cross-modal machine learning, with very important academic value and significance.
  • This application proposes a medical image-text data mutual-retrieval method, including:
  • iteratively training, based on text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model; and
  • retrieving, through the image-text data mutual-retrieval model, the corresponding text information and/or image information for the input image-text data.
  • Another aspect of this application proposes a medical image-text data mutual-retrieval apparatus, including:
  • a preprocessing module, configured to classify the text information in the image-text data in a multi-level manner according to a predetermined method, and to pass the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • a first model calculation module, configured to generate image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
  • a second model calculation module, configured to iteratively train, based on the text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model; and
  • an image-text mutual-retrieval module, configured to retrieve, through the image-text data mutual-retrieval model, the corresponding text information and/or image information for the input image-text data.
  • Another aspect of the present application proposes a computer device, including a memory and one or more processors. Computer-readable instructions are stored in the memory; when executed by the one or more processors, they cause the one or more processors to implement the steps of the above medical image-text data mutual-retrieval method.
  • Another aspect of the present application proposes one or more non-volatile computer-readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, they cause the one or more processors to execute the steps of the above medical image-text data mutual-retrieval method.
  • Figure 1 is a flow chart of an embodiment of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 2 is a schematic diagram of medical text data provided by this application according to one or more embodiments.
  • Figure 3 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 4 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 5 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 6 is a schematic structural diagram of a model of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 7 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 8 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments.
  • Figure 9 is a schematic structural diagram of a medical image and text data mutual detection device provided by the present application according to one or more embodiments.
  • Figure 10 is a schematic structural diagram of a computer device provided according to one or more embodiments of the present application.
  • Figure 11 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by this application according to one or more embodiments.
  • the task data consists of two parts: medical images and medical text.
  • Medical images include many types, such as MRI, CT, and ultrasound images, all of which are sequence images.
  • Medical text includes medical record reports and the like. This is only an example; it does not mean that the method of this application can be applied only in this field.
  • This application proposes a medical image-text data mutual-retrieval method.
  • The method is described below as applied to a computer device by way of example, and includes:
  • Step S1: classify the text information in the image-text data in a multi-level manner according to a predetermined method, and pass the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • Step S2: generate image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
  • Step S3: iteratively train, based on the text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model;
  • Step S4: retrieve, through the image-text data mutual-retrieval model, the corresponding text information and/or image information for the text information and/or image information in the input image-text data.
  • The image-text data refers to the text data and image data corresponding to a medical image; that is, the image-text data in this application refers to a medical sequence image together with the corresponding disease description, the patient's physical status information, and other information related to the patient's condition. See Figure 2 for details.
  • The text data in the image-text data is divided into multiple categories, as shown in Figure 2, and the classified text is input into the first neural network model category by category.
  • The first neural network model is a Transformer model. That is, in step S1, corresponding feature vectors are computed for the classified text data by multiple Transformer models; the feature vectors output by these Transformer models are then fed as input into a superior Transformer model, and the output of the superior Transformer model is taken as the text feature.
  • In step S2, the medical images in the image-text data are processed through the residual network model ResNet to obtain the corresponding image features. An image feature is a vector of a specified size.
  • There is at least one medical image in the image-text data, and usually there are multiple, since in practice medical imaging such as MRI or CT generally scans the lesion in multiple slices or from multiple angles. Therefore, when there are multiple medical images, the corresponding image feature needs to be generated from all of them.
  • In step S3, the above text features and image features are matched for similarity, the corresponding similarity loss value is computed according to the preset loss function, and that loss value is back-propagated to the Transformer models and the residual network model for repeated iterative training. Once the loss value meets the accuracy requirement, the Transformer models, the residual network model, and the corresponding parameters of the loss function are saved as the mutual-retrieval model.
  • In step S4, when the mutual-retrieval model is used for analysis or prediction, the text description of the corresponding case or condition and/or the corresponding medical image is input into the model, which returns a matching examination report for the input text or image, or filters out the diagnosis content of the corresponding condition via the corresponding medical image. This realizes image-text mutual retrieval for medical images and helps medical workers reduce their workload.
  • In some implementations, classifying the text information in the image-text data in a multi-level manner according to a predetermined method, and passing the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship, includes: classifying the text information according to text structure type, and computing, through the first neural network model, a feature vector for each classified structural text information.
  • Figure 2 shows the description of a patient's condition at a hospital, including the patient's personal information, such as age, marital status, and occupation, as well as allergy history, history of present illness, personal history, past history, family history, current condition, and much more.
  • The text information in Figure 2 is divided into multiple structural texts according to the above classification.
  • For example, the text content of the personal-history category (born and raised in the place of origin, good living environment, etc.) constitutes one class of text information.
  • That content is input into a Transformer model as its input data, and the Transformer model produces the feature vector of the corresponding text information under that category. That is, the content of the personal history is represented by a feature vector produced by a Transformer model, for use in subsequent model judgments.
  • The classified text content input into the Transformer model is not the original text; the corresponding text is first converted into word vectors with an appropriate tool and then input into the Transformer model. The tool can be a model such as BERT used for text vectorization.
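As an illustrative sketch only (the patent does not fix a specific tool), text vectorization with a BERT-style encoder could look as follows; the Hugging Face `transformers` API and the `bert-base-chinese` checkpoint are assumptions, not part of the application:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = AutoModel.from_pretrained("bert-base-chinese")

# One category of classified text, e.g. the personal-history content.
personal_history = "Born and raised in the place of origin; good living environment."
inputs = tokenizer(personal_history, return_tensors="pt", truncation=True)

with torch.no_grad():
    # Each token becomes a word vector; this sequence of vectors, not the raw
    # text, is what is fed into the category-level Transformer model.
    word_vectors = bert(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
```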
  • In some implementations, classifying the text information according to text structure type includes: classifying the text information according to text structure and/or time type.
  • text information can also be divided according to a combination of time and structure.
  • For example, the causes of some diseases are not influenced by past medical history but relate to the patient's recent living habits or other pre-existing symptoms. When all the condition content is mixed together with personal history, past history, or family history, a large number of irrelevant factors will affect the judgment of the mutual-retrieval model. Therefore, when classifying the text information, time factors can be combined in, highlighting the effect of the text content of certain conditions in the model's judgment.
  • In some implementations, classifying the text information in the image-text data in a multi-level manner according to a predetermined method, and passing the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship, further includes:
  • This application also uses a cascade of Transformer models to generate the text features of a category. Specifically, the classified text is divided into sections of content according to punctuation marks or semantics, called clauses in this application; that is, each category is represented by multiple clauses, whose natural-language content is the classified text content.
  • Each clause is used as the input of a Transformer model, which computes the feature vector corresponding to that clause (one clause per Transformer model). The feature vectors of the multiple clauses are then input into a further Transformer model, which fuses them into one feature vector; the result of this computation is the text feature of the classified text content.
  • For example, each sentence of the personal history in Figure 2 (a comma-delimited span can also be regarded as a sentence) is processed by a Transformer model, and the multiple outputs corresponding to the multiple sentences are then input into a total Transformer model, which outputs the text feature of the personal history.
  • Figure 3 shows how the text features of multiple classified texts are cascaded into a total Transformer model, which outputs the text features; for the clause level, simply replace the first-level text information in Figure 3 with the corresponding clauses.
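A minimal PyTorch sketch of this two-level cascade, under the assumption that each clause feature is read from the first output position (the patent also allows taking any position's output or a weighted average, as described below):

```python
import torch
import torch.nn as nn

d = 768  # assumed feature dimension

clause_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True), num_layers=2)
total_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True), num_layers=2)

def encode_category(clauses):
    """clauses: list of (1, seq_len, d) word-vector tensors, one per clause."""
    # Lower level: one Transformer pass per clause; take one output position
    # as that clause's feature vector.
    clause_vecs = [clause_encoder(c)[:, 0] for c in clauses]   # each (1, d)
    stacked = torch.stack(clause_vecs, dim=1)                  # (1, n_clauses, d)
    # Upper level: the total Transformer fuses the clause vectors into the
    # text feature of this classified text.
    return total_encoder(stacked)[:, 0]                        # (1, d)
```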
  • The sentences are sorted by their order of appearance, and each sorted sentence is input as a parameter to the first neural network model to compute the text features of the structural text information:
  • The word vectors in each sentence are added to their corresponding sequence-number values and to the sentence's number within the text structure classification, and the sums are input into the first neural network model to compute the text features of the structural text information.
  • In Figure 4, a denotes the feature vector of the first clause.
  • The Emb row in the lower part of Figure 4 represents one input datum of the Transformer model.
  • Any piece of input data is combined with the number of the text category it belongs to, shown as the text-type row (second to last) in the lower part of the figure.
  • The values are added, then added to the sorting number of the input clause (the position information in Figure 4), and the resulting value is input into the Transformer model.
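A brief sketch of this input construction (all sizes illustrative): each input embedding is the elementwise sum of the word vector, its position-number embedding, and its text-category-number embedding, mirroring the Emb, position-information, and text-type rows of Figure 4:

```python
import torch
import torch.nn as nn

vocab_size, d, max_pos, n_types = 21128, 768, 512, 16  # illustrative sizes
tok_emb = nn.Embedding(vocab_size, d)   # word vector (the Emb row)
pos_emb = nn.Embedding(max_pos, d)      # sorting/position number of the input
type_emb = nn.Embedding(n_types, d)     # number of the text category

token_ids = torch.tensor([[101, 2769, 102]])               # hypothetical token ids
positions = torch.arange(token_ids.size(1)).unsqueeze(0)   # 0, 1, 2, ...
category = torch.full_like(token_ids, 3)                   # e.g. "personal history"

# The values are added, and the sum is what enters the Transformer model.
transformer_input = tok_emb(token_ids) + pos_emb(positions) + type_emb(category)
```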
  • the method further includes:
  • The text features of the structured text information are obtained by weighted averaging of the computation results that the first neural network model outputs for the plurality of sentences.
  • structured text information refers to classified text information.
  • The first way relies on the calculation principle of the Transformer model: for any input datum, the Transformer computes it against the other input data and outputs a result that is a feature vector of that input (different from the original input value), so the output at any input position can serve as the output of the Transformer at this level.
  • Accordingly, for the output result of a certain category of text information, the value computed by the total Transformer model at one of the clauses can be used as the text feature of the classified text.
  • That is, the output value of the Transformer model at one of the clauses is used as the text feature of the entire classified text; or the output values of the Transformer model at multiple clauses are weighted and averaged to obtain the text feature of the corresponding structural text information.
  • In some implementations, classifying the text information in the image-text data in a multi-level manner according to a predetermined method, and passing the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship, further includes:
  • Text features of the multiple structural text information are input into the first neural network model to obtain text features of the text information.
  • Figure 3 shows a schematic diagram of this application's Transformer-model cascade: multiple classified texts are computed by multiple lower-layer Transformer models to obtain the corresponding classified text features, which are then input into the final-level Transformer model to obtain the text features of the overall text.
  • inputting feature vectors of multiple structural text information into the first neural network model to obtain text features of the text information includes:
  • The text features of each structured text information are added to the corresponding sequence value and classification number of that structured text, and the sums are input into the first neural network model to compute the text features of the text information.
  • Structured text refers to classified text classified according to structure. Similar to the cascaded feature computation over clauses within a classified text, when computing the text features of the overall text information, the per-category text features must first be added to their corresponding classification numbers and then to their corresponding sequence numbers; the difference is that in some scenarios the two added values are the same.
  • the method further includes:
  • The text features of the text information are obtained by weighted averaging of the computation results that the first neural network model outputs for the multiple structural text information; or
  • the text features of multiple structural text information are spliced into long vectors, and the spliced long vectors are passed through the fully connected layer to obtain the text features of the text information.
  • A feature of the classified text output by the Transformer model can be selected as the feature of the overall text; that is, the output result of the total Transformer model at one of the categories can be selected as the text feature of the image-text data.
  • Alternatively, the text features of the multiple structural text information can be spliced head to tail, and the spliced long vector passed through a fully connected layer to obtain a feature vector of a new dimension as the text feature of the overall text information.
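A minimal sketch of this splice-and-project variant, assuming eight categories and a 768-dimensional feature (both sizes illustrative):

```python
import torch
import torch.nn as nn

n_categories, d = 8, 768                       # illustrative sizes
fc = nn.Linear(n_categories * d, d)            # the fully connected layer

category_feats = [torch.randn(1, d) for _ in range(n_categories)]  # placeholder features
long_vector = torch.cat(category_feats, dim=-1)   # head-to-tail splice: (1, 8*768)
text_feature = fc(long_vector)                    # overall text feature: (1, 768)
```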
  • using the image information in the graphic data in the form of an image sequence to generate image features through a second neural network model includes:
  • The image-feature weight vector is then applied to the image sequence feature vectors, and the weighted features are combined to obtain the image feature.
  • The image sequence shown in Figure 5 contains only three images for illustration. Specifically, the image sequence is computed through the residual network to obtain the corresponding feature vector of each image.
  • calculating the weight of the image sequence feature vector includes:
  • Figure 7 is a sub-figure of the overall network structure diagram of this application, illustrating the weight calculation structure, which includes two fully connected (FC) layers and one ReLU layer.
  • The images are passed through the backbone network to obtain the embedded features, i.e. the feature vector of each image.
  • The embedded features are passed through a fully connected layer to obtain the final embedded feature e of each image.
  • The final embedded features e are used to compute the weight of each feature through the attention structure.
  • Each weight is a single number, normalized through the sigmoid layer.
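A plausible PyTorch sketch of this attention structure (two FC layers with a ReLU between them, and sigmoid-normalized scalar weights); aggregating the weighted image features by summation is an assumption on our part:

```python
import torch
import torch.nn as nn

d = 768                                       # assumed embedding dimension
attention = nn.Sequential(                    # the two FC layers and ReLU of Figure 7
    nn.Linear(d, d // 4), nn.ReLU(), nn.Linear(d // 4, 1))

e = torch.randn(1, 12, d)              # final embedded features of a 12-image sequence
w = torch.sigmoid(attention(e))        # one normalized scalar weight per image: (1, 12, 1)
image_feature = (w * e).sum(dim=1)     # weighted aggregation into one group feature: (1, d)
```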
  • Iteratively training, based on text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model includes:
  • summing the text loss value and the image loss value to obtain the first loss value, and training the mutual-retrieval model with the first loss value.
  • this application proposes a new generalized pairwise hinge-loss function to evaluate the above model loss.
  • Within a batch, this application traverses the feature encoding of each image group (i.e. the image features above) and each text feature encoding (the text features corresponding to the overall text information) and averages the loss function over them, as shown in the formula below.
  • N denotes the total number of paired samples in the batch.
  • The image group features are traversed (N in total); the sample selected by the traversal is called the anchor sample, denoted a.
  • The distance between the anchor sample and its paired text feature encoding is denoted s_ap, where p stands for positive.
  • The distances to all remaining, unpaired samples are denoted s_np.
  • The margin m is a hyperparameter, fixed during training, and set to 0.4 in this application.
  • This application performs the same traversal operation for the text features: the sample selected in the traversal serves as the anchor, its corresponding positive image group feature sample yields s_ap, and the non-corresponding ones yield s_np.
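The formula itself did not survive extraction; from the definitions above (anchor-to-positive distance $s_{ap}$, hardest-negative distance $s_{np}$, margin $m = 0.4$, averaging over the $N$ pairs in both traversal directions), a plausible reconstruction is:

$$\mathcal{L}_1 \;=\; \frac{1}{N}\sum_{\text{image anchors } a}\max\!\big(0,\; m + s_{ap} - s_{np}\big) \;+\; \frac{1}{N}\sum_{\text{text anchors } a}\max\!\big(0,\; m + s_{ap} - s_{np}\big), \qquad m = 0.4$$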
  • This application uses the above loss function to perform gradient backpropagation during training, updating the parameters of the cascaded Transformer models and the ResNet network.
  • The text loss value refers to the term of the above formula in which the text features are traversed as anchors; there, s_np denotes the minimum Euclidean distance from the anchor text feature to the other text features and/or image features.
  • The image loss value correspondingly refers to the term in which the image group features are traversed as anchors.
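A runnable sketch of this bidirectional hinge loss under the reconstruction above; for brevity it mines the hardest negative only across the opposite modality, which is a simplification of the text in the preceding bullets:

```python
import torch

def hinge_loss(img, txt, m=0.4):
    """img, txt: (N, d) paired features; row i of img pairs with row i of txt."""
    dist = torch.cdist(img, txt)                   # Euclidean distances, (N, N)
    pos = dist.diag()                              # s_ap: distance to the paired sample
    masked = dist + torch.eye(len(img), device=img.device) * 1e9  # exclude positives
    i2t = torch.clamp(m + pos - masked.min(dim=1).values, min=0)  # image anchors
    t2i = torch.clamp(m + pos - masked.min(dim=0).values, min=0)  # text anchors
    return (i2t + t2i).mean()
```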
  • Iteratively training, based on text features and image features and on a predetermined loss function, to generate the image-text data mutual-retrieval model further includes:
  • taking, as the second loss value, the minimum of the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features, and training the mutual-retrieval model with the second loss value.
  • the two features describe information in the same semantic space.
  • F denotes the first conversion method;
  • G denotes the second conversion method;
  • X denotes e_csi, the image group feature;
  • Y denotes e_rec, the medical text feature.
  • $\mathbb{E}\big[\lVert F(G(X)) - X\rVert_2\big]$ means that the image features are transformed into the text feature space through the second conversion method to obtain the text-image features, which are then transformed back into the image feature space through the first conversion method to obtain the image transformation features; the term is the mean of the difference between the image transformation features and the original image features;
  • $\mathbb{E}\big[\lVert G(F(Y)) - Y\rVert_2\big]$ means that the first conversion method transforms the text features into the image feature space to obtain the image-text features, which the second conversion method then transforms back into the text feature space to obtain the text transformation features; the term is the mean of the difference between the text transformation features and the original text features;
  • $\mathcal{L}_c = \mathbb{E}\big[\lVert F(G(X)) - X\rVert_2\big] + \mathbb{E}\big[\lVert G(F(Y)) - Y\rVert_2\big]$ is the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features; its minimum is the minimum of the second loss function.
  • The mutual-retrieval model is iteratively trained using $\mathcal{L}_c$ as the loss function.
  • Iteratively training, based on text features and image features and on a predetermined loss function, to generate the image-text data mutual-retrieval model further includes:
  • computing, through a third conversion method, the loss values corresponding to the text features and to the image features respectively, determining the difference between the loss value corresponding to the text features and the loss value corresponding to the image features, taking the difference as the third loss value, and iteratively training the mutual-retrieval model with the third loss value.
  • The purpose of this application is to make the features of X (image features) and Y (text features) as close as possible, so this application designs a discriminant loss function:
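The formula was lost in extraction; from the term-by-term description in the bullets that follow, a plausible reconstruction is:

$$\mathcal{L}_d \;=\; \mathbb{E}\big[\log D(Y)\big] \;+\; \mathbb{E}\big[\log\big(1 - D(X)\big)\big]$$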
  • X is mapped through the conversion method D to a scalar feature Dx; that is, a given image feature is converted into a scalar Dx through the third conversion method D.
  • Y is likewise mapped to a scalar feature Dy via D. The aim is to make the Dx and Dy features as close as possible, so that it is effectively impossible to tell whether a value is Dx or Dy.
  • log D(Y) refers to the logarithm of the scalar obtained by converting a text feature through the third conversion method D.
  • E[log D(Y)] represents the mean of all such logarithms over a batch of samples;
  • log(1 - D(X)) refers to converting an image feature into a scalar through the third conversion method D and then taking the logarithm of one minus that scalar.
  • E[log(1 - D(X))] represents the mean of these logarithms over all the image data in a batch of samples.
  • L_d represents the loss value of the above discriminant loss function over one batch of samples, i.e. the loss value under one batch of iterative training. If the third conversion method D obtains appropriate parameter values through iterative training, then D(Y) and D(X) should be extremely close and L_d approaches 0 (under highly idealized conditions).
  • At that point, this application considers that the mutual-retrieval model has transformed the text features and image features into the same space, where text features and their corresponding image features have very similar meanings, i.e. are almost the same.
  • Iteratively training, based on text features and image features and on a predetermined loss function, to generate the image-text data mutual-retrieval model further includes:
  • iteratively training the mutual-retrieval model using the sum of the first loss value, the second loss value, and the third loss value as the loss value.
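A minimal sketch of this combined objective, reusing the `hinge_loss` sketch above; `F`, `G`, and `D` are assumed to be small mapping networks (with `D` ending in a sigmoid so its output lies in (0, 1)), not interfaces defined by the application:

```python
import torch

def total_loss(img_feat, txt_feat, F, G, D, m=0.4):
    l1 = hinge_loss(img_feat, txt_feat, m)                 # first loss value
    lc = ((F(G(img_feat)) - img_feat).norm(dim=1).mean()   # second: cycle consistency
          + (G(F(txt_feat)) - txt_feat).norm(dim=1).mean())
    ld = (torch.log(D(txt_feat)).mean()                    # third: discriminant loss
          + torch.log(1.0 - D(img_feat)).mean())
    return l1 + lc + ld                                    # sum, back-propagated together
```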
  • For the training process of this application, refer to Figure 6: a medical image-text retrieval network based on cascaded Transformers is built, comprising a text information feature encoder and a medical image sequence feature encoder (as shown in the figure).
  • the network is trained according to the above loss function to make it converge.
  • The network training process is as follows. The training of the neural network is divided into two stages: the stage in which data propagates from low level to high level, i.e. the forward propagation stage; and, when the results of forward propagation do not meet expectations, the stage in which the error propagates from the high level back to the bottom level, i.e. the backpropagation stage.
  • the training process is:
  • All network layer weights are initialized, generally using random initialization
  • The input image and text data are forward propagated through the neural network's convolutional layers, downsampling layers, fully connected layers, and other layers to obtain the output value;
  • Each layer of the network adjusts all weight coefficients in the network based on the backpropagation error of each layer, that is, updates the weights.
  • In the retrieval stage, the trained network weight coefficients are preloaded, features are extracted from the medical texts or medical image sequences, and the features are stored in the data set to be retrieved. The user supplies any medical text data or medical image sequence data, called the query data. Features of the query data are extracted with the cascaded-Transformer medical image-text retrieval network, and distance matching is performed between the query features and the features of all samples in the data set to be retrieved, i.e. the vector distance is computed; this application uses the Euclidean distance. For example, if the query data is medical text data, the distances to all medical image sequence features in the data set to be retrieved are computed; similarly, if the query data is medical image sequence data, the Euclidean distances to all medical text features in the data set are computed. The sample with the smallest distance is the recommended sample and is output.
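A short sketch of this retrieval step; `torch.cdist` computes the Euclidean distances, and the smallest-distance sample is returned as the recommendation:

```python
import torch

def retrieve(query_feat, gallery_feats, k=1):
    """query_feat: (1, d) feature of the query text or image sequence;
    gallery_feats: (M, d) features of the opposite modality in the data set
    to be retrieved. Returns the indices of the k closest samples."""
    dists = torch.cdist(query_feat, gallery_feats)      # Euclidean distances, (1, M)
    return dists.topk(k, largest=False).indices[0]      # smallest distance first
```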
  • In summary, the text features are computed from the text data in a multi-level, cascaded Transformer manner, and the corresponding image features of the image sequence are computed through the residual network.
  • The image-text data mutual-retrieval model is then trained on the text features and image features using the multiple loss functions proposed by this application. Further, the trained model predicts on input text data or image data, or retrieves the corresponding text data or image data.
  • As shown in Figure 9, another aspect of this application proposes a medical image-text data mutual-retrieval apparatus, including:
  • the preprocessing module 1, configured to classify the text information in the image-text data in a multi-level manner according to a predetermined method, and to pass the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • the first model calculation module 2, configured to generate image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
  • the second model calculation module 3, configured to iteratively train, based on the text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model; and
  • the image-text mutual-retrieval module 4, configured to retrieve, through the image-text data mutual-retrieval model, the corresponding text information and/or image information for the input image-text data.
  • The computer device can be a terminal or a server, and includes:
  • the memory 22, which stores computer-readable instructions 23 that can run on the processor 21;
  • when the computer-readable instructions 23 are executed by the processor 21, the steps of any one of the methods in the above embodiments are implemented.
  • As shown in Figure 11, another aspect of the present application proposes a non-volatile computer-readable storage medium 401.
  • The non-volatile computer-readable storage medium 401 stores computer-readable instructions 402.
  • When the computer-readable instructions 402 are executed by a processor, the steps of any one of the methods in the above embodiments are implemented.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A medical image-text data mutual-retrieval method, comprising: performing multi-level classification on text information in image-text data according to a predetermined manner, and respectively generating classified text information into a text feature in a cascaded manner by means of a first neural network model according to a classification relationship (S1); generating image information in the image-text data into an image feature in an image sequence manner by means of a second neural network model (S2); according to the text feature and the image feature, performing iterative training on the basis of a predetermined loss function, so as to generate an image-text data mutual-retrieval model (S3); and for text information and/or image information in input image-text data, retrieving corresponding text information and/or image information by means of the image-text data mutual-retrieval model (S4).

Description

Image-text data mutual-retrieval method, apparatus, device, and readable storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the China Patent Office on June 30, 2022, with application number 202210760827.7 and titled "Medical image-text data mutual-retrieval method, apparatus, device and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of computers, and specifically relates to an image-text data mutual-retrieval method, apparatus, device, and readable storage medium.
Background
With the continuously improving informatization of the medical industry, the volume of medical image data is expanding day by day. The common situation in the industry is that effective management and retrieval methods have long been lacking for this multi-modal medical image data, so retrieval across multiple modalities of data has become an urgent problem to solve.
Existing medical retrieval tasks are mainly oriented to single-modal retrieval. Single-modal retrieval can only query information of the same modality, such as retrieving text with text or images with images. Cross-modal retrieval, by contrast, uses a sample of one modality to retrieve semantically similar samples of another modality, such as retrieving text with an image or images with text.
The cross-domain heterogeneity addressed by this application is mainly reflected in the fact that image and text data lie in different spaces and are heterogeneous data. Correct retrieval therefore requires a retrieval method with cross-domain capability, achieving alignment and ranking between modalities.
The inventors realized that, compared with single-modal data, cross-modal retrieval must model not only the relationships within each modality's data but also the correlations between different modalities, so as to achieve cross-domain retrieval between them. Cross-modal retrieval is highly flexible, has wide application scenarios and strong user demand, and is also an important research topic in cross-modal machine learning, with very important academic value and significance.
For example, with the vigorous development of medical informatization, hospital information systems have become increasingly complete and have collected a rich variety of medical data. Medical data has gradually become another special cross-modal data type after natural data sets. Radiologists generally make diagnoses directly by eye, relying on experience and on the characteristics of cases they have seen before. Owing to large data volumes and limited experience, misdiagnoses and missed diagnoses inevitably occur, posing great hidden risks to the accuracy of patient treatment. If doctors could quickly query similar data in medical databases to assist diagnosis, misdiagnosis would be reduced and work efficiency improved.
Summary of the invention
This application proposes a medical image-text data mutual-retrieval method, including:
classifying the text information in the image-text data in a multi-level manner according to a predetermined method, and passing the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
generating image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
iteratively training, based on the text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model; and
retrieving, through the image-text data mutual-retrieval model, the text information and/or image information corresponding to the text information and/or image information in input image-text data.
Another aspect of this application further proposes a medical image-text data mutual-retrieval apparatus, including:
a preprocessing module, configured to classify the text information in the image-text data in a multi-level manner according to a predetermined method, and to pass the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
a first model calculation module, configured to generate image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
a second model calculation module, configured to iteratively train, based on the text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model; and
an image-text mutual-retrieval module, configured to retrieve, through the image-text data mutual-retrieval model, the text information and/or image information corresponding to the text information and/or image information in input image-text data.
Yet another aspect of this application proposes a computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the steps of the above medical image-text data mutual-retrieval method.
A further aspect of this application proposes one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the above medical image-text data mutual-retrieval method.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Description of drawings
To explain the embodiments of this application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a flow chart of an embodiment of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 2 is a schematic diagram of medical text data provided by this application according to one or more embodiments;
Figure 3 is a schematic diagram of a partial model structure of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 4 is a schematic diagram of a partial model structure of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 5 is a schematic diagram of a partial model structure of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 6 is a schematic diagram of the model structure of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 7 is a schematic diagram of a partial model structure of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 8 is a schematic diagram of a partial model structure of a medical image-text retrieval method provided by this application according to one or more embodiments;
Figure 9 is a schematic structural diagram of a medical image-text data mutual-retrieval apparatus provided by this application according to one or more embodiments;
Figure 10 is a schematic structural diagram of a computer device provided by this application according to one or more embodiments;
Figure 11 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by this application according to one or more embodiments.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of this application are intended to distinguish two entities or parameters that share a name but are not identical. "First" and "second" are thus merely for convenience of description and should not be understood as limiting the embodiments of this application; subsequent embodiments will not explain this point again.
This application is dedicated to the task of mutual retrieval between medical images and medical text. The task data consists of two parts: medical images and medical text. Medical images include many types of images, such as MRI, CT, and ultrasound images, all of which are sequence images. Medical text includes medical record reports and the like. This is only an example; it does not mean that the method of this application can be applied only in this field.
Traditional solutions mostly use single-modal retrieval methods to analyze a patient's condition. For example, some medical image detection models based on image processing technology can only provide a detection result for the corresponding medical image, i.e. disease present or absent, and it is difficult to analyze the patient comprehensively. A comprehensive analysis requires various patient information, such as medical history, family history, age, and living habits, most of which is recorded as text, so traditional technology lacks intelligent assistance for determining a condition from such varied patient information and cannot provide medical staff with a comprehensive analysis of the condition and its causes.
As shown in Figure 1, to solve the above problems, this application proposes a medical image-text data mutual-retrieval method, described here as applied to a computer device by way of example, including:
Step S1: classify the text information in the image-text data in a multi-level manner according to a predetermined method, and pass the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
Step S2: generate image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
Step S3: iteratively train, based on the text features and image features and on a predetermined loss function, to generate an image-text data mutual-retrieval model;
Step S4: retrieve, through the image-text data mutual-retrieval model, the text information and/or image information corresponding to the text information and/or image information in the input image-text data.
In the embodiment of this application, in step S1 the image-text data refers to the text data and image data corresponding to a medical image; that is, the image-text data in this application refers to a medical sequence image together with the corresponding disease description, the patient's physical status information, and other information related to the patient's condition. See Figure 2 for details.
Further, the text data in the image-text data, i.e. the description of the condition, is divided into multiple categories as shown in Figure 2, and the classified text is input into the first neural network model category by category. In this embodiment, the first neural network model is a Transformer model; that is, in step S1 corresponding feature vectors are computed for the classified text data by multiple Transformer models, the feature vectors output by those Transformer models are then fed as input into a superior Transformer model, and the output of the superior Transformer model is taken as the text feature.
In step S2, the medical images in the image-text data are processed through the residual network model ResNet to obtain the corresponding image features. An image feature is a vector of a specified size.
It should be noted that in some embodiments of this application there is at least one medical image in the image-text data, and usually there are multiple, since in practice medical imaging such as MRI or CT generally scans the lesion in multiple slices or from multiple angles. Therefore, when there are multiple medical images, the corresponding image feature needs to be generated from all of them.
In step S3, the above text features and image features are matched for similarity, the corresponding similarity loss value is computed according to the preset loss function, and that loss value is back-propagated to the Transformer models and the residual network model for repeated iterative training until the loss value meets the accuracy requirement, whereupon the Transformer models, the residual network model, and the corresponding model parameters of the loss function are saved as the mutual-retrieval model.
In step S4, when the mutual-retrieval model is used for analysis or prediction, the text description of the corresponding case or condition and/or the corresponding medical image is input into the model, which returns a matching examination report based on the input text or image, or filters out the diagnosis content of the corresponding condition via the corresponding medical image. This realizes image-text mutual retrieval for medical images and helps medical workers reduce their workload.
In some embodiments of this application, classifying the text information in the image-text data into multiple levels in a predetermined manner, and feeding the classified text information through the first neural network model in a cascade that follows the classification relationship to generate the text features, includes:
classifying the text information by text structure type, and computing, through the first neural network model, the feature vector of each classified piece of structured text information.
In this embodiment, as shown in Figure 2, Figure 2 shows the description of a patient's condition at a hospital, including the patient's personal information, such as age, marital status and occupation, as well as allergy history, history of present illness, personal history, past history, family history, current condition, and much more. In this embodiment, the text information in Figure 2 is divided into multiple pieces of structured text according to the above categories; for example, the text content under the personal-history category, such as "born and raised in the place of origin, good living environment, ...", is treated as one piece of text information. That content is fed into one Transformer model as its input data, and the Transformer model outputs the feature vector of the corresponding text information under that category. That is, the content of the personal history is represented by the feature vector given by one Transformer model, for use in subsequent model judgments.
Further, the classified text content fed into the Transformer model is not the raw text: the text is first converted into word vectors with a corresponding tool and then fed into the Transformer model. The tool can be a model such as BERT for text vectorization.
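For illustration, this vectorization step might look as follows, assuming the Hugging Face transformers library (the bert-base-chinese checkpoint is an illustrative choice, since the text only says a model such as BERT can be used):

```python
import torch
from transformers import BertTokenizer, BertModel

# Hypothetical checkpoint choice; any BERT-style encoder would do.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

def vectorize(clause: str) -> torch.Tensor:
    """Turn one clause into a (seq_len, hidden) matrix of word vectors."""
    inputs = tokenizer(clause, return_tensors="pt")
    with torch.no_grad():
        return bert(**inputs).last_hidden_state.squeeze(0)

word_vectors = vectorize("Born and raised in the place of origin, good living environment.")
print(word_vectors.shape)  # (seq_len, 768)
```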
In some embodiments of this application, classifying the text information by text structure type includes:
classifying the text information by text structure and/or time type.
In some embodiments of this application, the text information can also be classified by a combination of time and structure. For example, the causes of some diseases are not reflected in the past medical history, but are instead related to the patient's recent living habits or other recently emerging symptoms. If all the condition content is lumped together with the personal medical history, past history or family history, a large number of irrelevant factors will affect the judgment of the mutual-retrieval model. Therefore, when classifying the text information, the time factor can be combined into the classification, so that the text content of certain conditions stands out in the model's judgment.
In some embodiments of this application, classifying the text information in the image-text data into multiple levels in a predetermined manner, and feeding the classified text information through the first neural network model in a cascade that follows the classification relationship to generate the text features, further includes:
sorting the text content in the classified structured text information by the order in which the sentences appear, and feeding each sorted sentence as a parameter into the first neural network model to compute the text features of the structured text information.
In this embodiment, for the classified text content, this application likewise generates the text features of that category with a cascade of Transformer models. Specifically, the classified text is split by punctuation or semantics into consecutive segments, called clauses in this application; that is, each category is represented by multiple clauses, and in natural language the content of these clauses is the classified text content.
Further, each clause is fed into one Transformer model, which computes the clause's feature vector; that is, one clause corresponds to one Transformer model. The feature vectors of the multiple clauses are then fed into another Transformer model, which computes over the feature vectors of the multiple clauses, and the result is the text feature of the classified text content. For example, all the content of the personal history in Figure 2 is processed sentence by sentence (a comma-delimited segment can also be treated as a sentence), with each sentence handled by one Transformer model; the outputs of the Transformer models for the multiple sentences are fed into one top-level Transformer model, which then outputs the text feature of the personal history. Refer to the cascade structure shown in Figure 3, which illustrates how the text features of multiple classified texts are cascaded into one top-level Transformer model that outputs the text features; simply substitute the corresponding clauses for the first text information in Figure 3.
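A minimal sketch of this two-level cascade, assuming PyTorch (sharing encoder weights across clauses and taking the first output token as each feature are illustrative simplifications of the one-Transformer-per-clause description; all module names are assumptions):

```python
import torch
import torch.nn as nn

class ClauseCascade(nn.Module):
    """Two-level Transformer cascade: clauses -> category feature."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        lower = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        # Lower level: one encoder applied per clause (shared weights
        # stand in for the "one Transformer per clause" of the text).
        self.clause_encoder = nn.TransformerEncoder(lower, num_layers=2)
        upper = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        # Upper level: aggregates the clause vectors of one category.
        self.category_encoder = nn.TransformerEncoder(upper, num_layers=2)

    def forward(self, clauses: list[torch.Tensor]) -> torch.Tensor:
        # Each clause: (seq_len, dim) word vectors; take the first
        # output token as that clause's feature vector.
        clause_vecs = [self.clause_encoder(c.unsqueeze(0))[0, 0] for c in clauses]
        stacked = torch.stack(clause_vecs).unsqueeze(0)  # (1, n_clauses, dim)
        return self.category_encoder(stacked)[0, 0]      # category feature

category_feature = ClauseCascade()([torch.randn(12, 768), torch.randn(7, 768)])
```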
In some embodiments of this application, sorting by the order in which the sentences appear, and feeding each sorted sentence as a parameter into the first neural network model to compute the text features of the structured text information, includes:
adding, to each word in each sentence, its sequence-number value and the sentence number within the text-structure classification, and feeding the sum into the first neural network model to compute the text features of the structured text information.
In this embodiment, as described above, when computing the text features of the classified text information clause by clause, multiple clauses are fed into multiple Transformer models. When any clause is fed into a Transformer model, each word in the clause (represented as a word vector, i.e., numerically) is added to the position number of its clause within the classified text information. For example, if the value of the first word's word vector is 0.3 and its clause is the first clause of the classified text, the computation is 0.3 + 1; further adding the word's own position number, which is also 1, gives 0.3 + 1 + 1 = 2.3, and 2.3 is taken as the first input of the Transformer model. By analogy, if the second word is 0.4, the value fed into the Transformer model by the above procedure is 0.4 + 1 + 2, where 1 marks the first sentence and 2 means the word ranks second in the sentence. The feature vector corresponding to each clause is computed in this way, and the feature vectors of the multiple clauses are finally fed into the top-level Transformer model to obtain the text feature value of that category.
It should be noted that when the feature vectors of multiple clauses are fed into the Transformer model, the computation follows the clause-level procedure. That is, if the feature vector of the first clause is a = [0.1, 0.2, 0.3], it is no longer a single word vector; when it is fed into the Transformer model, since a is the feature vector of the first clause, a + 1 is computed, and the classification number is also added, which is likewise 1 for the first text category, giving a + 1 + 1 = [2.1, 2.2, 2.3], and so on, completing the cascaded Transformer computation for every text category. Referring specifically to Figure 4, each Emb at the bottom of Figure 4 represents one input to the Transformer model; before being fed into the Transformer model, every piece of data is first added to the number of the text category it belongs to (the text-type value in the second-to-last row at the bottom of the figure), then added to the sort number of the input clause (the position information in Figure 4), and only the final value is fed into the Transformer model.
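These index additions can be reproduced numerically as below (a sketch assuming PyTorch; the function names are illustrative):

```python
import torch

def add_position_info(word_vectors: torch.Tensor, clause_index: int) -> torch.Tensor:
    """Word-level input: word vector + clause number + word position,
    reproducing the 0.3 + 1 + 1 = 2.3 example (1-based indices)."""
    positions = torch.arange(1, word_vectors.shape[0] + 1).unsqueeze(1)
    return word_vectors + clause_index + positions

def add_category_info(clause_feature: torch.Tensor,
                      clause_index: int, category_index: int) -> torch.Tensor:
    """Clause-level input: feature vector + clause order + category number,
    reproducing a + 1 + 1 = [2.1, 2.2, 2.3] for a = [0.1, 0.2, 0.3]."""
    return clause_feature + clause_index + category_index

x = torch.tensor([[0.3], [0.4]])             # two one-dimensional "words"
print(add_position_info(x, clause_index=1))  # [[2.3], [3.4]]
print(add_category_info(torch.tensor([0.1, 0.2, 0.3]), 1, 1))  # [2.1, 2.2, 2.3]
```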
In some embodiments of this application, the method further includes:
selecting any one of the computation results output by the first neural network model for the corresponding multiple sentences as the text feature of the structured text information; or
taking a weighted average of the computation results output by the first neural network model for the multiple sentences to obtain the text feature of the structured text information.
In some embodiments of this application, structured text information means the classified text information. There are two ways to select the output of the Transformer model at any level. The first relies on the Transformer model's computation principle: for any one input, the Transformer model computes it against the other inputs and outputs a result for that input, namely that input's feature vector (different from the original input value). The output result for any one input can therefore serve as the output of the Transformer model at that level; that is, for the output of a certain category of text information, the value of one of its clauses after computation by the top-level Transformer model can be taken as the text feature of that classified text.
In other words, the Transformer output value of one of the clauses is taken as the text feature of the whole classified text; or the Transformer output values of multiple clauses are weighted and averaged to obtain the text feature of the corresponding structured text information.
In some embodiments of this application, classifying the text information in the image-text data into multiple levels in a predetermined manner, and feeding the classified text information through the first neural network model in a cascade that follows the classification relationship to generate the text features, further includes:
feeding the text features of multiple pieces of structured text information into the first neural network model to obtain the text features of the text information.
As shown in Figure 3, which is a schematic diagram of the Transformer cascade of this application: in the lower layer, multiple Transformer models compute over the multiple classified texts to obtain the corresponding multiple classified-text features, and the multiple classified text features are then fed into the last-level Transformer model to obtain the text features of the overall text.
In some embodiments of this application, feeding the feature vectors of multiple pieces of structured text information into the first neural network model to obtain the text features of the text information includes:
adding, to the text feature of each piece of structured text information, the order value and the classification number of its corresponding structured text, and feeding the sum into the first neural network model to compute the text features of the text information.
In this embodiment, structured text means the classified text obtained by structure-based classification. Further, similar to the cascaded feature computation over the clauses of a classified text, when computing the text features of the overall text information, the text feature of each category must first be added to its corresponding classification number and then to its corresponding order value; the difference is that in some scenarios the two added values are the same.
In some embodiments of this application, the method further includes:
selecting any one of the computation results output by the first neural network model for the corresponding multiple pieces of structured text information as the text feature of the text information; or
taking a weighted average of the computation results output by the first neural network model for the multiple pieces of structured text information to obtain the text features of the text information; or
concatenating the text features of the multiple pieces of structured text information into a long vector, and passing the concatenated long vector through a fully connected layer to obtain the text features of the text information.
In this embodiment, similar to determining the text features of a classified text as described above, when determining the text features of the overall text, the feature of one classified text output by the Transformer model can be chosen as the feature of the overall text; that is, the output of the top-level Transformer model corresponding to one of the categories can be selected as the text feature of the image-text data.
Alternatively, the text features of the multiple pieces of structured text information (classified text information) can be weighted and averaged to obtain the text features of the overall text information.
In some embodiments of this application, the text features of the multiple pieces of structured text information can be concatenated end to end, and the concatenated text features passed through a fully connected layer to obtain a feature vector of a new dimension as the text feature of the overall text information.
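A minimal sketch of this concatenate-then-project variant, assuming PyTorch (the dimensions and names are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 6 categories of 768-d features fused into 512-d.
n_categories, dim, out_dim = 6, 768, 512
fuse = nn.Linear(n_categories * dim, out_dim)

def fuse_categories(category_features: list[torch.Tensor]) -> torch.Tensor:
    """Concatenate category features end to end, then project."""
    long_vector = torch.cat(category_features, dim=0)  # (n_categories * dim,)
    return fuse(long_vector)                           # (out_dim,)

text_feature = fuse_categories([torch.randn(dim) for _ in range(n_categories)])
print(text_feature.shape)  # torch.Size([512])
```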
In some embodiments of this application, feeding the image information in the image-text data, as an image sequence, through the second neural network model to generate the image features includes:
feeding the image sequence into the second neural network model and computing the image-sequence feature vectors corresponding to the image sequence;
computing the weights of the image-sequence feature vectors, and multiplying the weights by the image-sequence feature vectors to obtain image-sequence feature-weight vectors; and
adding the image feature-weight vectors to the image-sequence feature vectors to obtain the image features.
In this embodiment, as shown in Figure 5, the image sequence in Figure 5 shows only three images. Specifically, the image sequence is computed by the residual network, and the feature vector corresponding to each image is obtained.
The weight of the feature vector corresponding to each image is then computed; the weights are multiplied by the corresponding feature vectors, the products are added to the feature vectors of the corresponding images, and the multiple image feature-weight vectors are then transformed by a linear transformation to a new dimension as the image features of the image sequence.
In some embodiments of this application, computing the weights of the image-sequence feature vectors includes:
passing the image-sequence feature vectors through a first fully connected layer to obtain first fully-connected-layer vectors;
passing the first fully-connected-layer vectors through a pooling layer to obtain pooling-layer vectors;
passing the pooling-layer vectors through a second fully connected layer to obtain second fully-connected-layer vectors; and
normalizing the second fully-connected-layer vectors to obtain the weights of the corresponding image sequence. In this embodiment, as shown in Figure 7, which is a sub-figure of the overall network-structure diagram of this application, the weight-computation structure contains two fully connected layers (FC) and one ReLU layer. In this application, an image passes through the backbone network to obtain an embedded feature, i.e., the image's feature vector; the embedded feature then passes through a fully connected layer to obtain the final embedded feature e of each image. The final embedded feature e passes through the attention structure to compute the weight of each feature; the weight is a single number and is normalized by a sigmoid layer.
The feature weights of all the image sequences then enter a softmax layer together, to judge which medical image sequence is important. Finally, the feature weight of each image after the softmax layer is multiplied by that image's final embedded feature e. At the same time, we introduce the idea of the residual network: for each medical image sequence, the output of its attention structure is given by the following formula:
$$e^{att} = \mathrm{Attention}(e)\cdot e + e$$
The final image features are passed through a linear fully connected layer fc to obtain the final medical image features:
$$e^{csi} = fc(e^{att})$$
In the above formulas, $e^{att}$ denotes the image-sequence feature vectors corresponding to the multiple images, multiplied by their weights (Attention) and then added to the original image-sequence feature vectors; $e^{csi}$ denotes the finally computed image features: the feature vectors of the image sequence, after weight computation and addition to themselves, are passed through the fully connected layer to obtain feature vectors of a new dimension as the image features of the images. The Attention weights are computed by the attention model shown in Figure 7; fc denotes the fully connected layer.
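As a rough sketch of this attention structure, assuming PyTorch (the hidden sizes, the mean-pooling used to fuse the sequence into one vector, and all module names are illustrative assumptions; the per-image sigmoid normalization is folded into the scoring network):

```python
import torch
import torch.nn as nn

class SequenceAttention(nn.Module):
    """FC -> ReLU -> FC -> sigmoid per image, softmax across the
    sequence, weighted residual, then a final fc (Figure 7 sketch)."""
    def __init__(self, dim: int = 2048, hidden: int = 256, out_dim: int = 512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        self.fc = nn.Linear(dim, out_dim)

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (num_images, dim) embedded features of one sequence.
        w = torch.softmax(self.score(e), dim=0)   # (num_images, 1)
        e_att = w * e + e                         # attention + residual
        return self.fc(e_att.mean(dim=0))         # fused image feature e_csi

e_csi = SequenceAttention()(torch.randn(3, 2048))
print(e_csi.shape)  # torch.Size([512])
```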
In some embodiments of this application, iteratively training on the text features and image features with the predetermined loss function to generate the image-text data mutual-retrieval model includes:
for any text feature, computing the Euclidean distance between the text feature and the corresponding image feature and the minimum Euclidean distance between the text feature and the other text features and/or image features, and taking the difference between the Euclidean distance and the minimum Euclidean distance as the text loss value;
for any image feature, computing the Euclidean distance between the image feature and the corresponding text feature and the minimum Euclidean distance between the image feature and the other text features and/or image features, and taking the difference between the Euclidean distance and the minimum Euclidean distance as the image loss value; and
summing the text loss value and the image loss value to obtain a first loss value, and training the mutual-retrieval model with the first loss value.
In this embodiment, this application proposes a new generalized pairwise hinge-loss function to evaluate the loss of the above model. The formula is as follows:
$$L_{trip} = \frac{1}{N}\sum_{a=1}^{N}\Big(\big[\Delta + \lVert e^{csi}_a - e^{rec}_p\rVert_2 - \min_{np}\lVert e^{csi}_a - s_{np}\rVert_2\big]_+ + \big[\Delta + \lVert e^{rec}_a - e^{csi}_p\rVert_2 - \min_{np}\lVert e^{rec}_a - s_{np}\rVert_2\big]_+\Big)$$
where $[\cdot]_+ = \max(0,\cdot)$ and $\lVert\cdot\rVert_2$ denotes the Euclidean distance.
In the design of the loss function, as shown in Figure 8, for such paired data this application traverses every image-group feature encoding (the image features described above) and every text feature encoding (the text features corresponding to the overall text information) and takes the average of the loss function, as shown in the formula above.
Each iterative computation traverses N times, where N is the number of paired samples in the current batch. First the image-group features $e^{csi}_i$ are traversed (N in total); the one selected by the traversal is denoted $e^{csi}_a$, where a stands for anchor (the anchor sample). The text feature encoding paired with the anchor sample is denoted $e^{rec}_p$, where p stands for positive. Likewise, all the remaining samples in the batch that are not paired with $e^{csi}_a$ are denoted $s_{np}$. $\Delta$ is a hyperparameter that is fixed during training and set to 0.4 in this application.
Likewise, this application performs the same traversal operation for the text features: $e^{rec}_a$ denotes the sample selected in the traversal, the positive image-group feature sample corresponding to it is denoted $e^{csi}_p$, and the non-corresponding samples are denoted $s_{np}$. During training, this application uses the above loss function to back-propagate gradients and update the parameters of the cascaded Transformer models and the ResNet network.
In this embodiment, the text loss value refers to the following term of the above formula:
$$\big[\Delta + \lVert e^{rec}_a - e^{csi}_p\rVert_2 - \min_{np}\lVert e^{rec}_a - s_{np}\rVert_2\big]_+$$
where $\lVert e^{rec}_a - e^{csi}_p\rVert_2$ denotes the Euclidean distance between the text feature and the corresponding image feature, and $\min_{np}\lVert e^{rec}_a - s_{np}\rVert_2$ denotes the minimum Euclidean distance between the text feature and the other text features and/or image features.
The image loss value refers to the following term of the above formula:
$$\big[\Delta + \lVert e^{csi}_a - e^{rec}_p\rVert_2 - \min_{np}\lVert e^{csi}_a - s_{np}\rVert_2\big]_+$$
$L_{trip}$ denotes the first loss value (the loss value of the loss function after one round of iterative computation), $\lVert e^{csi}_a - e^{rec}_p\rVert_2$ denotes the Euclidean distance between the image feature and the corresponding text feature, and $\min_{np}\lVert e^{csi}_a - s_{np}\rVert_2$ denotes the minimum Euclidean distance between the image feature and the other text features and/or image features.
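A minimal sketch of this hinge loss, assuming PyTorch (for brevity only cross-modal negatives are mined here, while the text above also allows same-modality negatives among $s_{np}$; all names are illustrative):

```python
import torch

def generalized_pairwise_hinge_loss(img: torch.Tensor,
                                    txt: torch.Tensor,
                                    delta: float = 0.4) -> torch.Tensor:
    """img, txt: (N, dim) paired batch features; row i of img pairs
    with row i of txt."""
    n = img.shape[0]
    dist = torch.cdist(img, txt)              # (N, N) Euclidean distances
    pos = dist.diag()                         # d(anchor, positive)
    off = dist + torch.eye(n) * 1e9           # mask out the positive pairs
    # image anchors vs. hardest non-paired text, and vice versa
    img_loss = torch.clamp(delta + pos - off.min(dim=1).values, min=0)
    txt_loss = torch.clamp(delta + pos - off.min(dim=0).values, min=0)
    return (img_loss + txt_loss).mean()

loss = generalized_pairwise_hinge_loss(torch.randn(8, 512), torch.randn(8, 512))
```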
In some embodiments of this application, iteratively training on the text features and image features with the predetermined loss function to generate the image-text data mutual-retrieval model further includes:
transforming the text features into the image feature space by a first conversion method to obtain image-text features, and then transforming the image-text features into the text feature space by a second conversion method to obtain text-transformed features;
transforming the image features into the text feature space by the second conversion method to obtain text-image features, and then transforming the text-image features into the image feature space by the first conversion method to obtain image-transformed features; and
taking the minimum of the sum of the distance between the text-transformed features and the text features and the distance between the image-transformed features and the image features as a second loss value, and training the mutual-retrieval model with the second loss value. In this embodiment, in order to align the multi-structured text features and the image features, i.e., to make the two features describe information in the same semantic space, this application designs a semantic-alignment adversarial loss function:
$$G: X \rightarrow Y,\qquad F: Y \rightarrow X$$
where F denotes the first conversion method and G denotes the second conversion method.
As shown above, X denotes $e^{csi}$, our image-group feature, and Y denotes $e^{rec}$, our medical-text feature. We want the two features X and Y to be mapped into a common space.
To constrain the above objective, this application performs the following steps:
1. Map the feature X through method G to G(X), which represents the image features mapped into the text feature space.
2. Map G(X) through method F to F(G(X)).
3. This application requires F(G(X)) to be as close as possible to the original feature X.
Likewise:
4. Map the feature Y through method F to F(Y), which represents the text features mapped into the image feature space.
5. Map F(Y) through method G to G(F(Y)).
6. This application requires G(F(Y)) to be as close as possible to the original feature Y.
Therefore, the constraint formula is as follows:
$$L_c = \min\big\{E\big(\lVert F(G(X)) - X\rVert_2\big) + E\big(\lVert G(F(Y)) - Y\rVert_2\big)\big\}$$
where $E(\lVert F(G(X)) - X\rVert_2)$ denotes the mean of the difference between the image-transformed features and the original image features, obtained after the image features are transformed into the text feature space by the second conversion method to give the text-image features, and the text-image features are then transformed back into the image feature space by the first conversion method to give the image-transformed features;
$E(\lVert G(F(Y)) - Y\rVert_2)$ denotes the mean of the difference between the text-transformed features and the original text features, obtained after the first conversion method transforms the text features into the image feature space to give the image-text features, and the second conversion method then transforms the image-text features back into the text feature space to give the text-transformed features; and
$L_c$ denotes the minimum of the sum of the distance between the text-transformed features and the text features and the distance between the image-transformed features and the image features, i.e., the minimum of the second loss function.
In this embodiment, the mutual-retrieval model is iteratively trained with $L_c$ as the loss function.
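A minimal sketch of this cycle-style alignment term, assuming PyTorch (parameterizing the conversion methods F and G as small MLPs of matching dimensions is an illustrative assumption, as the text does not fix their form):

```python
import torch
import torch.nn as nn

dim = 512
F = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # text -> image space
G = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # image -> text space

def cycle_alignment_loss(x_img: torch.Tensor, y_txt: torch.Tensor) -> torch.Tensor:
    """L_c: mean round-trip distances F(G(X)) ~ X and G(F(Y)) ~ Y,
    minimized during training."""
    img_cycle = torch.norm(F(G(x_img)) - x_img, dim=1).mean()
    txt_cycle = torch.norm(G(F(y_txt)) - y_txt, dim=1).mean()
    return img_cycle + txt_cycle

loss_c = cycle_alignment_loss(torch.randn(8, dim), torch.randn(8, dim))
```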
In some embodiments of this application, iteratively training on the text features and image features with the predetermined loss function to generate the image-text data mutual-retrieval model further includes:
computing, by a third conversion method, the loss values corresponding to the matching text features and image features respectively, determining the gap between the loss value corresponding to the text features and the loss value corresponding to the image features, taking the gap as a third loss value, and iteratively training the mutual-retrieval model with the third loss value.
In this embodiment, the aim of this application is to make the features X (image features) and Y (text features) as close as possible, so this application designs a discrimination loss function:
that is, X is mapped by method D to the feature Dx (a scalar; i.e., an image feature is computed into a scalar Dx by the third conversion method D), and Y is mapped by method D to the feature Dy (a scalar). The aim is to make the features Dx and Dy as close as possible, so close that it cannot even be determined whether a value is Dx or Dy.
The formula is as follows:
$$L_d = E[\log D(Y)] + E[\log(1 - D(X))]$$
where $\log D(Y)$ is the logarithm taken after a text feature is transformed into a scalar by the third conversion method D, and $E[\log D(Y)]$ denotes the mean of all the corresponding logarithms over one batch of samples; $\log(1-D(X))$ is the logarithm taken after an image feature is transformed into a scalar by the third conversion method D, and $E[\log(1-D(X))]$ denotes the mean of the logarithms over all the image data in one batch after transformation by the third conversion method D.
$L_d$ denotes the loss value of the above discrimination loss function over one batch of samples, i.e., the loss of the loss function under one batch of iterative training. If the third conversion method D obtains suitable parameter values through iterative training, D(Y) and D(X) become extremely close and $L_d$ approaches 0 (in the ideal limit). This indicates that the mutual-retrieval model of this application has transformed the text features and image features into the same space, and that a text feature and its corresponding image feature have nearly the same meaning, i.e., are almost identical, so an image feature can stand in for a text feature. The practical significance is that a group of medical images can be matched with the corresponding textual description, or a piece of text information can be matched with corresponding images for display, i.e., image-text mutual retrieval is realized.
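A minimal sketch of this discrimination term, assuming PyTorch (the discriminator architecture D and the epsilon guard against log(0) are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical discriminator D: feature -> scalar in (0, 1).
D = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

def discrimination_loss(x_img: torch.Tensor, y_txt: torch.Tensor) -> torch.Tensor:
    """L_d = E[log D(Y)] + E[log(1 - D(X))], computed over one batch."""
    eps = 1e-8  # numerical safety, not part of the formula above
    return (torch.log(D(y_txt) + eps).mean()
            + torch.log(1 - D(x_img) + eps).mean())

loss_d = discrimination_loss(torch.randn(8, 512), torch.randn(8, 512))
```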
In some embodiments of this application, iteratively training on the text features and image features with the predetermined loss function to generate the image-text data mutual-retrieval model further includes:
iteratively training the mutual-retrieval model with the sum of the first loss value, the second loss value and the third loss value as the loss value.
In this embodiment, the above three loss functions can be superimposed to train the mutual-retrieval model proposed in this application, i.e., as described by the formula:
$$L = L_{trip} + L_c + L_d$$
Embodiment:
For the training process of this application, refer to Figure 6: a cascaded-Transformer medical image-text retrieval network is built, comprising a text-information feature encoder and a medical-image-sequence feature encoder (as shown in the figure above).
Establish the generalized pairwise hinge-loss function:
$$L_{trip} = \frac{1}{N}\sum_{a=1}^{N}\Big(\big[\Delta + \lVert e^{csi}_a - e^{rec}_p\rVert_2 - \min_{np}\lVert e^{csi}_a - s_{np}\rVert_2\big]_+ + \big[\Delta + \lVert e^{rec}_a - e^{csi}_p\rVert_2 - \min_{np}\lVert e^{rec}_a - s_{np}\rVert_2\big]_+\Big)$$
The network is trained with the above loss function until it converges.
The network training process is as follows. The training of a neural network is divided into two stages. The first stage is the stage in which data propagates from the lower levels to the higher levels, i.e., the forward-propagation stage. The other stage is the stage in which, when the result of forward propagation does not match expectations, the error is propagated from the higher levels back to the lower levels for training, i.e., the back-propagation stage. The training process is:
1. All network-layer weights are initialized, generally by random initialization;
2. The input image and text data are forward-propagated through the layers of the neural network, such as the convolutional layers, down-sampling layers and fully connected layers, to obtain the output values;
3. The output values of the network are computed, and, per the loss-function formulas above, the sum of the generalized triplet loss function $L_{trip}$ of the network's output values and the semantic-alignment adversarial loss functions $(L_c + L_d)$ is computed;
4. The error is propagated back into the network, and the back-propagation errors of the network layers, such as the Transformer layers, fully connected layers and convolutional layers, are obtained in turn;
5. Each layer of the network adjusts all the weight coefficients in the network according to its back-propagation error, i.e., the weights are updated;
6. A new batch of image-text data is randomly selected, and the process returns to the second step to obtain output values by forward propagation through the network;
7. The iteration repeats indefinitely: when the error between the network's output values and the target values (labels) is smaller than a certain threshold, or the number of iterations exceeds a certain threshold, training ends;
8. The trained network parameters of all layers are saved.
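Putting steps 1-8 together, a skeleton of the training loop might look as follows. This is only a sketch: it assumes the encoder modules, the data loader, and the three loss functions sketched earlier are in scope, and every name here (text_encoder, image_encoder, loader, the thresholds) is an illustrative assumption:

```python
import torch

params = (list(text_encoder.parameters()) + list(image_encoder.parameters())
          + list(F.parameters()) + list(G.parameters()) + list(D.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)  # step 1 done at module init

for step, (images, texts) in enumerate(loader):          # step 6: new batch
    e_csi = image_encoder(images)                        # step 2: forward pass
    e_rec = text_encoder(texts)
    loss = (generalized_pairwise_hinge_loss(e_csi, e_rec)  # step 3: L_trip
            + cycle_alignment_loss(e_csi, e_rec)           # + L_c
            + discrimination_loss(e_csi, e_rec))           # + L_d
    optimizer.zero_grad()
    loss.backward()                                      # step 4: backprop
    optimizer.step()                                     # step 5: update weights
    if loss.item() < 1e-3 or step > 100_000:             # step 7: stop criteria
        break

torch.save({"text": text_encoder.state_dict(),           # step 8: save layers
            "image": image_encoder.state_dict()}, "ckpt.pt")
```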
The network inference process, i.e., the retrieval and matching process, is briefly described below:
During inference, the weight coefficients obtained by network training are preloaded. Features are extracted from the medical texts or medical image sequences and stored in the dataset to be searched. The user supplies arbitrary medical text data or medical-image-sequence data, which we call the query data. The features of the query's medical text data or medical-image-sequence data are extracted with our cascaded-Transformer medical image-text retrieval network. The features of the query data are then distance-matched against the features of all the samples in the dataset to be searched, i.e., vector distances are computed; this application uses the Euclidean distance. For example, if the query data is medical text data, the features of all the medical image sequences in the dataset to be searched are fetched and the distances computed; the same applies when the query data is medical-image-sequence data. The Euclidean distances to all the medical-image-sequence features in the dataset to be searched are computed, and the sample with the smallest distance is the recommended sample, which is output.
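A minimal sketch of this nearest-neighbor retrieval step, assuming PyTorch (the function name and feature sizes are illustrative):

```python
import torch

def retrieve(query_feature: torch.Tensor, gallery: torch.Tensor) -> int:
    """Return the index of the gallery sample (pre-extracted features of
    the other modality) closest to the query in Euclidean distance."""
    distances = torch.norm(gallery - query_feature, dim=1)
    return int(distances.argmin())

# e.g. a text query matched against 1000 stored image-sequence features:
gallery = torch.randn(1000, 512)
best = retrieve(torch.randn(512), gallery)
print(f"recommended sample: {best}")
```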
With the image-text mutual-retrieval method provided by this application, the corresponding text features of the text data are computed by the cascade of multi-level Transformer models, the image features of the image sequence are computed by the residual network, and the image-text data mutual-retrieval model is trained on the text features and image features with the multiple loss functions proposed by this application. The input text data or image data is then used, through the image-text data mutual-retrieval model, to predict or retrieve the corresponding text data or image data.
As shown in Figure 9, another aspect of this application further provides a medical image-text data mutual-retrieval apparatus, comprising:
a preprocessing module 1, configured to classify the text information in the image-text data into multiple levels in a predetermined manner, and to feed the classified text information through the first neural network model in a cascade that follows the classification relationship to generate text features;
a first model computation module 2, configured to feed the image information in the image-text data, as an image sequence, through the second neural network model to generate image features;
a second model computation module 3, configured to iteratively train on the text features and image features with a predetermined loss function to generate the image-text data mutual-retrieval model; and
an image-text mutual-retrieval module 4, configured to retrieve, through the image-text data mutual-retrieval model, the text information and/or image information corresponding to the text information and/or image information in the input image-text data.
As shown in Figure 10, a further aspect of this application provides a computer device, which may be a terminal or a server, comprising:
at least one processor 21; and
a memory 22, the memory 22 storing computer-readable instructions 23 executable on the processor 21, the readable instructions 23, when executed by the processor 21, implementing the steps of any one of the methods in the above embodiments.
As shown in Figure 11, yet another aspect of this application provides a non-volatile computer-readable storage medium 401, the non-volatile computer-readable storage medium 401 storing computer-readable instructions 402 which, when executed by a processor, implement the steps of any one of the methods in the above embodiments.
The above are exemplary embodiments disclosed by this application, but it should be noted that various changes and modifications can be made without departing from the scope of the embodiments disclosed by this application as defined by the claims. The functions, steps and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although the elements disclosed in the embodiments of this application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware; the computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and the computer-readable instructions, when executed, can include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided by this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double-data-rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (21)

  1. A medical image-text data mutual-retrieval method, characterized by comprising:
    classifying text information in image-text data into multiple levels in a predetermined manner, and feeding the classified text information through a first neural network model in a cascade that follows the classification relationship to generate text features;
    feeding image information in the image-text data, as an image sequence, through a second neural network model to generate image features;
    iteratively training on the text features and the image features with a predetermined loss function to generate an image-text data mutual-retrieval model; and
    retrieving, through the image-text data mutual-retrieval model, text information and/or image information corresponding to the text information and/or image information in input image-text data.
  2. The method according to claim 1, wherein classifying the text information in the image-text data into multiple levels in a predetermined manner, and feeding the classified text information through the first neural network model in a cascade that follows the classification relationship to generate the text features, comprises:
    classifying the text information by text structure type, and computing, through the first neural network model, a feature vector of each classified piece of structured text information.
  3. The method according to claim 2, wherein classifying the text information by text structure type comprises:
    classifying the text information by text structure and/or time type.
  4. The method according to claim 2, wherein classifying the text information in the image-text data into multiple levels in a predetermined manner, and feeding the classified text information through the first neural network model in a cascade that follows the classification relationship to generate the text features, comprises:
    sorting the text content in the classified structured text information by the order in which the sentences appear, and feeding each sorted sentence as a parameter into the first neural network model to compute the text features of the structured text information.
  5. The method according to claim 4, wherein sorting by the order in which the sentences appear, and feeding each sorted sentence as a parameter into the first neural network model to compute the text features of the structured text information, comprises:
    adding, to each word in each sentence, its sequence-number value and the sentence number within the text-structure classification, and feeding the sum into the first neural network model to compute the text features of the structured text information.
  6. The method according to claim 4, wherein the method further comprises:
    selecting any one of the computation results output by the first neural network model for the corresponding multiple sentences as the text feature of the structured text information.
  7. The method according to claim 4, wherein the method further comprises:
    taking a weighted average of the computation results output by the first neural network model for the multiple sentences to obtain the text feature of the structured text information.
  8. The method according to claim 4, wherein classifying the text information in the image-text data into multiple levels in a predetermined manner, and feeding the classified text information through the first neural network model in a cascade that follows the classification relationship to generate the text features, comprises:
    feeding the text features of multiple pieces of the structured text information into the first neural network model to obtain the text features of the text information.
  9. The method according to claim 8, wherein feeding the feature vectors of the multiple pieces of structured text information into the first neural network model to obtain the text features of the text information comprises:
    adding, to the text feature of each piece of structured text information, the order value and the classification number of its corresponding structured text, and feeding the sum into the first neural network model to compute the text features of the text information.
  10. The method according to claim 8, wherein the method further comprises:
    selecting any one of the computation results output by the first neural network model for the corresponding multiple pieces of structured text information as the text feature of the text information.
  11. The method according to claim 8, wherein the method further comprises:
    taking a weighted average of the computation results output by the first neural network model for the multiple pieces of structured text information to obtain the text feature of the text information.
  12. The method according to claim 8, wherein the method further comprises:
    concatenating the text features of the multiple pieces of structured text information into a long vector, and passing the concatenated long vector through a fully connected layer to obtain the text features of the text information.
  13. The method according to claim 1, wherein feeding the image information in the image-text data, as an image sequence, through the second neural network model to generate the image features comprises:
    feeding the image sequence into the second neural network model and computing the image-sequence feature vectors corresponding to the image sequence;
    computing weights of the image-sequence feature vectors, and multiplying the weights by the image-sequence feature vectors to obtain image-sequence feature-weight vectors; and
    adding the image feature-weight vectors to the image-sequence feature vectors to obtain the image features.
  14. The method according to claim 13, wherein computing the weights of the image-sequence feature vectors comprises:
    passing the image-sequence feature vectors through a first fully connected layer to obtain first fully-connected-layer vectors;
    passing the first fully-connected-layer vectors through a pooling layer to obtain pooling-layer vectors;
    passing the pooling-layer vectors through a second fully connected layer to obtain second fully-connected-layer vectors; and
    normalizing the second fully-connected-layer vectors to obtain the weights of the corresponding image sequence.
  15. The method according to claim 1, wherein iteratively training, according to the text features and the image features and based on a predetermined loss function, to generate the image-text data mutual-retrieval model comprises:
    for any text feature, computing the Euclidean distance between the text feature and its corresponding image feature, and the minimum Euclidean distance between the text feature and the other text features and/or image features, and taking the difference between the two as a text loss value;
    for any image feature, computing the Euclidean distance between the image feature and its corresponding text feature, and the minimum Euclidean distance between the image feature and the other text features and/or image features, and taking the difference between the two as an image loss value; and
    summing the text loss value and the image loss value to obtain a first loss value, and training the mutual-retrieval model with the first loss value.
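Claim 15 describes a hardest-negative contrastive objective: for each anchor, the distance to its matched counterpart minus the smallest distance to any non-matching feature. A sketch assuming PyTorch and a batch in which row i of the text and image tensors forms a matched pair; that batch convention is an assumption.

```python
import torch

def first_loss(text_feats, image_feats):
    B = text_feats.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=text_feats.device)
    big = torch.finfo(text_feats.dtype).max

    d_ti = torch.cdist(text_feats, image_feats)   # text-to-image Euclidean distances
    d_tt = torch.cdist(text_feats, text_feats)    # text-to-text distances
    d_ii = torch.cdist(image_feats, image_feats)  # image-to-image distances

    pos = d_ti.diagonal()  # distance to the corresponding feature

    # Minimum distance to any non-matching text and/or image feature.
    neg_t = torch.cat([d_ti.masked_fill(eye, big),
                       d_tt.masked_fill(eye, big)], dim=1).min(dim=1).values
    neg_i = torch.cat([d_ti.t().masked_fill(eye, big),
                       d_ii.masked_fill(eye, big)], dim=1).min(dim=1).values

    text_loss = (pos - neg_t).sum()   # claim 15 uses the raw difference;
    image_loss = (pos - neg_i).sum()  # a hinge (clamp at 0) is a common variant
    return text_loss + image_loss     # first loss value
```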
  16. The method according to claim 1, wherein iteratively training, according to the text features and the image features and based on a predetermined loss function, to generate the image-text data mutual-retrieval model comprises:
    transforming the text feature into the image feature space through a first transformation method to obtain an image-text feature, and then transforming the image-text feature into the text feature space through a second transformation method to obtain a text transformation feature;
    transforming the image feature into the text feature space through the second transformation method to obtain a text-image feature, and then transforming the text-image feature into the image feature space through the first transformation method to obtain an image transformation feature; and
    taking the minimum of the sum of the distance between the text transformation feature and the text feature and the distance between the image transformation feature and the image feature as a second loss value, and training the mutual-retrieval model with the second loss value.
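Claim 16 is a cycle-consistency constraint: mapping a feature into the other modality's space and back should approximately recover it. A sketch assuming PyTorch, with two learned linear maps standing in for the first and second transformation methods; the linear form and the dimensions are assumptions.

```python
import torch.nn as nn

class CycleLoss(nn.Module):
    def __init__(self, text_dim=256, image_dim=512):
        super().__init__()
        self.to_image = nn.Linear(text_dim, image_dim)  # first transformation method
        self.to_text = nn.Linear(image_dim, text_dim)   # second transformation method

    def forward(self, text_feats, image_feats):
        # text -> image space -> back to text space (text transformation feature)
        text_cycled = self.to_text(self.to_image(text_feats))
        # image -> text space -> back to image space (image transformation feature)
        image_cycled = self.to_image(self.to_text(image_feats))
        # Minimizing this sum during training realizes the "minimum" of claim 16.
        return ((text_cycled - text_feats).norm(dim=1).sum()
                + (image_cycled - image_feats).norm(dim=1).sum())
```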
  17. The method according to claim 1, wherein iteratively training, according to the text features and the image features and based on a predetermined loss function, to generate the image-text data mutual-retrieval model comprises:
    computing, through a third transformation method, the loss values corresponding to matched text features and image features respectively, determining the gap between the loss value corresponding to the text feature and the loss value corresponding to the image feature, taking the gap as a third loss value, and iteratively training the mutual-retrieval model with the third loss value.
  18. The method according to any one of claims 15-17, wherein iteratively training, according to the text features and the image features and based on a predetermined loss function, to generate the image-text data mutual-retrieval model comprises:
    iteratively training the mutual-retrieval model with the sum of the first loss value, the second loss value and the third loss value as the loss value.
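Claim 17 leaves the third transformation method open; one plausible reading is a shared learned map between the two feature spaces whose per-modality reconstruction losses should agree, the gap between them being the third loss value. Claim 18 then trains on the sum of all three losses. A sketch under those assumptions, including the assumption that both feature spaces share one dimension.

```python
import torch.nn as nn

def third_loss(text_feats, image_feats, transform: nn.Module):
    # Per-modality loss values from a shared transformation (an assumption;
    # the claims do not pin down the third transformation method).
    text_loss = (transform(text_feats) - image_feats).norm(dim=1).mean()
    image_loss = (transform(image_feats) - text_feats).norm(dim=1).mean()
    return (text_loss - image_loss).abs()  # the gap is the third loss value

def total_loss(l1, l2, l3):
    # Claim 18: iterate training on the sum of the three loss values.
    return l1 + l2 + l3
```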
  19. A medical image-text data mutual-retrieval apparatus, comprising:
    a preprocessing module configured to classify the text information in the image-text data into multiple levels in a predetermined manner, and to generate text features from the classified text information through a first neural network model in a cascaded manner according to the classification relationship;
    a first model computation module configured to generate, through a second neural network model, image features from the image information in the image-text data in the form of an image sequence;
    a second model computation module configured to iteratively train, according to the text features and the image features and based on a predetermined loss function, to generate an image-text data mutual-retrieval model; and
    an image-text mutual-retrieval module configured to retrieve, through the image-text data mutual-retrieval model, the text information and/or image information corresponding to the text information and/or image information in the input image-text data.
  20. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1-18.
  21. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1-18.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210760827.7 2022-06-30
CN202210760827.7A CN115408551A (en) 2022-06-30 2022-06-30 Medical image-text data mutual detection method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024001104A1 2024-01-04

Family

ID=84158085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141374 WO2024001104A1 (en) 2022-06-30 2022-12-23 Image-text data mutual-retrieval method and apparatus, and device and readable storage medium

Country Status (2)

Country Link
CN (1) CN115408551A (en)
WO (1) WO2024001104A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115408551A (en) * 2022-06-30 2022-11-29 苏州浪潮智能科技有限公司 Medical image-text data mutual detection method, device, equipment and readable storage medium
CN117407518B (en) * 2023-12-15 2024-04-02 广州市省信软件有限公司 Information screening display method and system based on big data analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239153A (en) * 2021-05-26 2021-08-10 清华大学深圳国际研究生院 Text and image mutual retrieval method based on example masking
US20210271707A1 (en) * 2020-02-27 2021-09-02 Adobe Inc. Joint Visual-Semantic Embedding and Grounding via Multi-Task Training for Image Searching
CN114357148A (en) * 2021-12-27 2022-04-15 之江实验室 Image text retrieval method based on multi-level network
CN114612749A (en) * 2022-04-20 2022-06-10 北京百度网讯科技有限公司 Neural network model training method and device, electronic device and medium
CN114661933A (en) * 2022-03-08 2022-06-24 重庆邮电大学 Cross-modal retrieval method based on fetal congenital heart disease ultrasonic image-diagnosis report
CN115408551A (en) * 2022-06-30 2022-11-29 苏州浪潮智能科技有限公司 Medical image-text data mutual detection method, device, equipment and readable storage medium


Also Published As

Publication number Publication date
CN115408551A (en) 2022-11-29

Similar Documents

Publication Publication Date Title
WO2022227207A1 (en) Text classification method, apparatus, computer device, and storage medium
US11580415B2 (en) Hierarchical multi-task term embedding learning for synonym prediction
CN107516110B (en) Medical question-answer semantic clustering method based on integrated convolutional coding
WO2020177230A1 (en) Medical data classification method and apparatus based on machine learning, and computer device and storage medium
US20210034813A1 (en) Neural network model with evidence extraction
WO2024001104A1 (en) Image-text data mutual-retrieval method and apparatus, and device and readable storage medium
CN112015868B (en) Question-answering method based on knowledge graph completion
CN110674850A (en) Image description generation method based on attention mechanism
CN112016295B (en) Symptom data processing method, symptom data processing device, computer equipment and storage medium
WO2020198855A1 (en) Method and system for mapping text phrases to a taxonomy
CN112149414B (en) Text similarity determination method, device, equipment and storage medium
WO2023029506A1 (en) Illness state analysis method and apparatus, electronic device, and storage medium
US11354599B1 (en) Methods and systems for generating a data structure using graphical models
WO2022227203A1 (en) Triage method, apparatus and device based on dialogue representation, and storage medium
US11625935B2 (en) Systems and methods for classification of scholastic works
WO2023160264A1 (en) Medical data processing method and apparatus, and storage medium
US20230244869A1 (en) Systems and methods for classification of textual works
US20220375576A1 (en) Apparatus and method for diagnosing a medical condition from a medical image
CN116956228A (en) Text mining method for technical transaction platform
US11783244B2 (en) Methods and systems for holistic medical student and medical residency matching
Wang et al. A BERT-based named entity recognition in Chinese electronic medical record
Gao et al. Accuracy analysis of triage recommendation based on CNN, RNN and RCNN models
CN116719840A (en) Medical information pushing method based on post-medical-record structured processing
US20220165430A1 (en) Leveraging deep contextual representation, medical concept representation and term-occurrence statistics in precision medicine to rank clinical studies relevant to a patient
Wang et al. TransH-RA: A Learning Model of Knowledge Representation by Hyperplane Projection and Relational Attributes

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22949180

Country of ref document: EP

Kind code of ref document: A1