WO2024001104A1 - Method and apparatus for mutual retrieval of image-text data, device, and readable storage medium - Google Patents

Method and apparatus for mutual retrieval of image-text data, device, and readable storage medium

Info

Publication number
WO2024001104A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
features
feature
information
Prior art date
Application number
PCT/CN2022/141374
Other languages
English (en)
Chinese (zh)
Inventor
赵雅倩
王立
范宝余
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024001104A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G06F 16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 ICT specially adapted for the handling or processing of medical images
    • G16H 30/20 ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS

Definitions

  • The application belongs to the field of computers and specifically relates to a method, apparatus, device, and readable storage medium for the mutual retrieval of image-text data.
  • Single-modal retrieval can only query information of the same modality, such as text retrieval text and image retrieval image.
  • Cross-modal retrieval refers to using samples of one modality to retrieve samples of another modality that are semantically similar to it, such as image retrieval of text and text retrieval of images.
  • The cross-domain heterogeneity addressed by this application is mainly reflected in the fact that image data and text data lie in different feature spaces and are heterogeneous. For retrieval to be correct, the retrieval method must support cross-domain retrieval, that is, alignment and ranking between modalities.
  • Cross-modal retrieval therefore needs to model not only the relationships within the data of each modality, but also the correlation between different modalities, so as to achieve cross-domain retrieval between modalities.
  • Cross-modal retrieval offers strong flexibility, wide application scenarios, and strong user demand. It is also an important research topic in cross-modal machine learning, with considerable academic value and significance.
  • This application proposes a mutual retrieval method for medical image-text data, including:
  • classifying the text information in the image-text data in a multi-level manner according to a predetermined method, and passing the classified text information through a first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • generating image features from the image information in the image-text data, in the form of an image sequence, through a second neural network model;
  • iteratively training, based on a predetermined loss function and from the text features and image features, to generate an image-text data mutual retrieval model; and
  • retrieving, through the image-text data mutual retrieval model, the text information and/or image information corresponding to the input image-text data.
  • Another aspect of this application also proposes a medical image-text data mutual retrieval apparatus, including:
  • a preprocessing module, configured to classify the text information in the image-text data in a multi-level manner according to a predetermined method, and to pass the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • a first model calculation module, configured to generate image features from the image information in the image-text data, in the form of an image sequence, through the second neural network model;
  • a second model calculation module, configured to iteratively train, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model; and
  • an image-text mutual retrieval module, configured to retrieve, through the image-text data mutual retrieval model, the text information and/or image information corresponding to the input image-text data.
  • Another aspect of the present application also proposes a computer device, including a memory and one or more processors.
  • Computer-readable instructions are stored in the memory.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the above mutual retrieval method for medical image-text data.
  • Another aspect of the present application also proposes one or more non-volatile computer-readable storage media storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above mutual retrieval method for medical image-text data.
  • Figure 1 is a flow chart of an embodiment of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 2 is a schematic diagram of medical text data provided by this application according to one or more embodiments.
  • Figure 3 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 4 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 5 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 6 is a schematic structural diagram of a model of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 7 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments;
  • Figure 8 is a schematic diagram of a partial model structure of a medical image and text retrieval method provided by this application according to one or more embodiments.
  • Figure 9 is a schematic structural diagram of a medical image-text data mutual retrieval apparatus provided by the present application according to one or more embodiments.
  • Figure 10 is a schematic structural diagram of a computer device provided according to one or more embodiments of the present application.
  • Figure 11 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by this application according to one or more embodiments.
  • The task data consists of two parts: medical images and medical text.
  • Medical images include many types, such as MRI images, CT images, and ultrasound images, all of which are sequence images.
  • Medical texts include medical record reports and the like. This is just an example; it does not mean that the method of this application can only be applied in this field.
  • Accordingly, this application proposes a mutual retrieval method for medical image-text data.
  • Taking as an example the application of this method to computer equipment, the method includes:
  • Step S1: classify the text information in the image-text data in a multi-level manner according to a predetermined method, and pass the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • Step S2: generate image features from the image information in the image-text data, in the form of an image sequence, through the second neural network model;
  • Step S3: iteratively train, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model;
  • Step S4: for the text information and/or image information in the input image-text data, retrieve the corresponding image information and/or text information through the image-text data mutual retrieval model.
  • The image-text data refers to the text data and image data corresponding to a medical image; that is, in this application it refers to the medical sequence images together with the corresponding disease description and information related to the patient's disease, such as the patient's physical status information. Please refer to Figure 2 for details.
  • The text data in the image-text data is divided into multiple categories, as shown in Figure 2, and the classified text is input into the first neural network model with each category as a unit.
  • The first neural network model is a Transformer model. That is, in step S1, corresponding feature vectors are first computed for the classified text data by multiple Transformer models; the feature vectors output by these Transformer models are then used as input to a higher-level Transformer model, and the output of the higher-level Transformer model is used as the text feature, as sketched below.
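As a non-limiting illustration of this cascade, the following PyTorch sketch encodes each clause with a lower-level Transformer and feeds the clause features into a higher-level Transformer. The module name `CascadedTextEncoder`, all layer sizes, and the choice of the first output position as the summary vector are assumptions for illustration; the embodiment does not fix these details.

```python
import torch
import torch.nn as nn

class CascadedTextEncoder(nn.Module):
    """Two-level Transformer cascade: clause level -> category level (a sketch)."""
    def __init__(self, dim=256, nhead=4, num_layers=2):
        super().__init__()
        # Lower-level Transformer: encodes the word vectors of one clause.
        self.clause_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True),
            num_layers)
        # Higher-level Transformer: encodes the sequence of clause features.
        self.category_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True),
            num_layers)

    def forward(self, clauses):
        # clauses: list of (num_words, dim) tensors, one per clause of a category
        feats = [self.clause_encoder(c.unsqueeze(0))[:, 0] for c in clauses]
        stacked = torch.stack(feats, dim=1)           # (1, num_clauses, dim)
        return self.category_encoder(stacked)[:, 0]   # category text feature, (1, dim)
```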
  • In step S2, the medical images in the image-text data are processed by the residual network model ResNet to obtain the corresponding image features.
  • An image feature is a vector of a specified size.
  • There is at least one medical image in the image-text data, and usually several: in practice, medical imaging such as MRI or CT generally produces multiple images that scan the lesion over a wide area or from multiple angles. Therefore, when there are multiple medical images, the corresponding image features need to be generated from all of them.
  • In step S3, the above text features and image features are matched for similarity, the corresponding similarity loss value is calculated according to the preset loss function, and the loss value is back-propagated to the Transformer model and the residual network model. Iterative training is repeated until the loss value meets the accuracy requirement, after which the Transformer model, the residual network model, and the corresponding parameters of the loss function are saved as the mutual retrieval model.
  • In step S4, when the mutual retrieval model is used for analysis or prediction, the text description of the corresponding case or disease and/or the corresponding medical image is input into the mutual retrieval model, and the model returns a matching examination report based on the input text or image, or filters out the diagnosis content of the corresponding disease through the corresponding medical image. This realizes image-text mutual retrieval for medical images and helps reduce the workload of medical workers.
  • In some embodiments, the text information in the image-text data is classified into multiple levels according to a predetermined method, and the classified text information is passed through the first neural network model to generate text features in a cascaded manner according to the classification relationship.
  • Figure 2 shows the description information of a patient's disease in a certain hospital. It includes the patient's personal information, such as age, marital status, and occupation, as well as allergy history, history of present illness, personal history, past history, family history, current illnesses, and more.
  • The text information in Figure 2 is divided into multiple structured texts according to the above classification.
  • For example, the text content of the personal-history category, such as "born and raised in the place of origin, living in a good living environment", is treated as one type of text information.
  • This content is fed into a Transformer model as its input data, and the Transformer model produces the feature vector of the corresponding text information under that category. That is, the content of the personal history is represented by a Transformer model as a feature vector for subsequent model judgment.
  • The classified text content mentioned above is not input into the Transformer model as raw text; instead, the text is first converted into word vectors using a corresponding tool and then input into the Transformer model.
  • The corresponding tool for text vectorization can be a model such as BERT.
  • In some embodiments, classifying text information according to text structure types includes:
  • Text information can also be divided according to a combination of time and structure.
  • The causes of some diseases are not affected by past medical history but are instead related to the patient's recent living habits or other pre-existing symptoms. Therefore, when the content describing the whole illness is mixed together with the personal history, past history, or family history, a large number of irrelevant factors will affect the judgment of the mutual retrieval model. For this reason, time factors can also be used when classifying the text information, which highlights the effect of the text content of certain diseases in the model's judgment.
  • In some embodiments, classifying the text information in the image-text data into multiple levels according to a predetermined method, and passing the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship, also includes:
  • This application uses a cascade of Transformer models to generate the text features of each category. Specifically, the classified text is divided into corresponding sections of content according to punctuation marks or semantics, called clauses in this application; that is, each category is represented by multiple clauses, whose content in natural language is the classified text content.
  • Each clause is used as the input of a Transformer model, which computes the feature vector corresponding to that clause; that is, one clause corresponds to one Transformer model. The feature vectors of the multiple clauses are then input into a further Transformer model, which computes a feature vector over the clauses; this result is the text feature of the classified text content.
  • For example, the content of the personal history in Figure 2 is divided into sentences (a comma-delimited span can also be regarded as one sentence).
  • Each sentence is processed by its Transformer model, the outputs corresponding to the multiple sentences are then input into a total Transformer model, and the total Transformer model outputs the text feature of the personal history.
  • Figure 3 shows the way in which the text features of multiple classified texts are cascaded into a total Transformer model, which outputs the text features; for the clause level, simply replace the first-level text information in Figure 3 with the corresponding clauses.
  • In some embodiments, the sentences are sorted according to their order of appearance, and each sorted sentence is input as a parameter into the first neural network model to calculate the text features of the structured text information:
  • The word vectors in each sentence are added to their corresponding sequence-number values and to the sentence's number within the text structure classification, and the result is input into the first neural network model to calculate the text features of the structured text information.
  • In Figure 4, a denotes the feature vector of the first clause.
  • The Emb entries in the lower part of Figure 4 each represent one input datum of the Transformer model.
  • Any piece of input data needs to be combined with the number of the text category it belongs to, shown as the text type in the second-to-last row in the lower part of the figure: the values are added, then added to the sorting number of the input clause (the position information in Figure 4), and the final value is input into the Transformer model, as sketched below.
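A minimal sketch of this input construction, assuming learned embedding tables for the position and text-type numbers (the vocabulary and size values below are illustrative, not taken from the embodiment):

```python
import torch
import torch.nn as nn

class ClauseInputEmbedding(nn.Module):
    """Builds one Transformer input as in Figure 4: word vector + position
    (sorting number) + text-type (classification) number, summed."""
    def __init__(self, vocab_size=30000, dim=256, max_pos=512, num_types=16):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_pos, dim)
        self.text_type = nn.Embedding(num_types, dim)

    def forward(self, token_ids, type_id):
        # token_ids: (seq_len,) word indices; type_id: classification number
        positions = torch.arange(token_ids.size(0), device=token_ids.device)
        types = torch.full_like(token_ids, type_id)
        return self.word(token_ids) + self.pos(positions) + self.text_type(types)
```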
  • In some embodiments, the method further includes:
  • The text features of the structured text information are obtained by weighted averaging of the calculation results that the first neural network model outputs for the multiple sentences.
  • Structured text information refers to classified text information.
  • The first way relies on the calculation principle of the Transformer model: for any input datum, the Transformer model combines it with the other input data and outputs a result that is the feature vector of that input (different from the original input value). The output corresponding to any one input can therefore be used as the output of the Transformer model at this level, that is, as the output result for that category of text information.
  • In other words, the value that the total Transformer model computes for one of the clauses can be used as the text feature of the classified text.
  • Thus, either the output value of the Transformer model for one of the clauses is used as the text feature of the entire classified text, or the output values for multiple clauses are weighted and averaged to obtain the text feature of the corresponding structured text information, as sketched below.
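The two pooling options just described can be sketched as follows (the softmax normalisation of the weights is an assumption; the embodiment only specifies a weighted average):

```python
import torch

def pool_clause_outputs(outputs, weights=None):
    # outputs: (num_clauses, dim) per-clause results from the total Transformer
    if weights is None:
        return outputs[0]                     # one clause's output as the feature
    w = torch.softmax(weights, dim=0)         # normalised weights (assumption)
    return (w.unsqueeze(1) * outputs).sum(0)  # weighted-average text feature
```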
  • In some embodiments, classifying the text information in the image-text data into multiple levels according to a predetermined method, and passing the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship, also includes:
  • The text features of the multiple structured text information are input into the first neural network model to obtain the text features of the overall text information.
  • Figure 3 shows a schematic diagram of this application's Transformer model cascade: multiple classified texts are processed by multiple Transformer models in the lower layer to obtain the corresponding classified text features, and these classified text features are then input into the last-level Transformer model to obtain the text features of the overall text.
  • In some embodiments, inputting the feature vectors of multiple structured text information into the first neural network model to obtain the text features of the overall text information includes:
  • The text features of each structured text information are added to the corresponding sequence value and classification number of the structured text, and the result is input into the first neural network model to calculate the text features of the overall text information.
  • Structured text refers to classified text classified according to structure. Similar to the cascaded feature calculation over clauses within a classified text, when calculating the text features of the overall text information it is also necessary to first add the per-category text features to their corresponding classification numbers and order values; the difference is that in some scenarios the two added values are the same.
  • In some embodiments, the method further includes:
  • The text features of the overall text information are obtained by weighted averaging of the calculation results that the first neural network model outputs for the multiple structured text information; or
  • The text features of the multiple structured text information are spliced into a long vector, and the spliced long vector is passed through a fully connected layer to obtain the text features of the overall text information.
  • Alternatively, the feature that the total Transformer model outputs for one of the classified texts can be selected as the feature of the overall text; that is, the output result corresponding to one of the categories can be used as the text feature of the image-text data.
  • Or the text features of the multiple structured text information can be spliced head to tail, and the spliced vector passed through a fully connected layer to obtain a feature vector of a new dimension as the text feature of the overall text information, as sketched below.
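A sketch of the splicing variant, assuming eight categories and 256-dimensional features (illustrative values only):

```python
import torch
import torch.nn as nn

num_categories, dim = 8, 256                  # illustrative sizes
fc = nn.Linear(num_categories * dim, dim)     # fully connected layer

category_feats = [torch.randn(dim) for _ in range(num_categories)]
long_vec = torch.cat(category_feats, dim=0)   # head-to-tail splice
text_feature = fc(long_vec)                   # overall text feature, (dim,)
```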
  • In some embodiments, generating image features from the image information in the image-text data, in the form of an image sequence, through the second neural network model includes:
  • The image-feature weight vector is then combined with the image-sequence feature vectors to obtain the image feature.
  • The image sequence shown in Figure 5 contains only three images for illustration. Specifically, the image sequence is processed by the residual network, and the corresponding feature vector of each image is obtained.
  • In some embodiments, calculating the weights of the image-sequence feature vectors includes:
  • Figure 7 is a sub-figure of the overall network structure diagram of this application, illustrating the weight calculation structure, which includes two fully connected layers (FC) and one ReLU layer.
  • The images are passed through the backbone network to obtain the embedded features, that is, the feature vector of each image.
  • The embedded features are passed through a fully connected layer to obtain the final embedded feature e of each image.
  • From the final embedded feature e, the weight of each feature is calculated through the attention structure.
  • Each weight is a single number and is normalized through a sigmoid layer, as sketched below.
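Putting the image branch together, the following sketch uses a torchvision ResNet-50 as the backbone and an FC-ReLU-FC attention block with a sigmoid, as in Figure 7. Reading the weighted combination as a weighted sum of the per-image features is an interpretation, and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ImageSequenceEncoder(nn.Module):
    """Sketch: per-image ResNet features -> FC embedding -> sigmoid attention
    weight per image -> weighted combination into one image-group feature."""
    def __init__(self, dim=256):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()           # keep the 2048-d pooled features
        self.backbone = backbone
        self.embed = nn.Linear(2048, dim)     # final embedded feature e per image
        self.attn = nn.Sequential(            # two FC layers and one ReLU (Fig. 7)
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, images):
        # images: (num_images, 3, H, W) -- one medical image sequence
        e = self.embed(self.backbone(images))  # (num_images, dim)
        w = torch.sigmoid(self.attn(e))        # (num_images, 1), one weight each
        return (w * e).sum(dim=0)              # image-group feature e_csi, (dim,)
```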
  • In some embodiments, iteratively training, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model includes:
  • The text loss value and the image loss value are summed to obtain the first loss value, and the mutual retrieval model is trained through the first loss value.
  • In some embodiments, this application proposes a new generalized pairwise hinge-loss function to evaluate the above model loss.
  • The formula is as follows:

    $L = \frac{1}{N} \sum_{a=1}^{N} \max\left(0,\ \alpha + s_{ap} - s_{np}\right)$

  • This application traverses the feature encoding of each image group (that is, the image features described above) and the text feature encodings (the text features corresponding to the overall text information) to find the average of the loss function, as shown in the above formula.
  • N represents the total number of paired samples in this batch.
  • The image-group features are traversed (N in total); the sample selected by the traversal is called the anchor sample, denoted a.
  • The Euclidean distance between the anchor and the text feature encoding paired with it is denoted $s_{ap}$, where p stands for positive.
  • The distances to all remaining unpaired samples are recorded as $s_{np}$.
  • $\alpha$ is a hyperparameter, fixed during training, and set to 0.4 in this application.
  • This application performs the same traversal operation for the text features: the sample selected in the traversal is the anchor, the distance to its corresponding positive image-group feature sample is recorded as $s_{ap}$, and the distances to those that do not correspond are recorded as $s_{np}$.
  • This application uses the above loss function to perform gradient backpropagation during training to update the cascaded Transformer model and the ResNet network parameters.
  • The text loss value refers to the above formula with the text features as anchors; $s_{np}$ then represents the minimum Euclidean distance from the anchor to the other, unpaired text features and/or image features (that is, the hardest negative).
  • The image loss value refers to the same formula with the image-group features as anchors, as sketched below.
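A sketch of this bidirectional hinge loss with hard-negative mining, consistent with the definitions above. For simplicity only cross-modal negatives are mined here, although the text also allows same-modality negatives; `margin=0.4` follows the stated hyperparameter.

```python
import torch

def pairwise_hinge_loss(img, txt, margin=0.4):
    """For each anchor, the paired sample must be closer (in Euclidean distance)
    than the hardest unpaired sample by at least `margin` (a sketch)."""
    # img, txt: (N, dim) paired image-group and overall text features
    d = torch.cdist(img, txt)                              # (N, N) distances
    pos = d.diag()                                         # s_ap per anchor
    masked = d + torch.eye(len(d), device=d.device) * 1e9  # exclude positives
    loss_img = torch.relu(margin + pos - masked.min(dim=1).values)  # image anchors
    loss_txt = torch.relu(margin + pos - masked.min(dim=0).values)  # text anchors
    return loss_img.mean() + loss_txt.mean()               # first loss value
```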
  • In some embodiments, iteratively training, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model also includes:
  • The minimum of the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features is used as the second loss value, and the mutual retrieval model is trained through the second loss value.
  • The two kinds of features describe information in the same semantic space.
  • F represents the first conversion method, which maps text-space features into the image feature space.
  • G represents the second conversion method, which maps image-space features into the text feature space.
  • X represents $e_{csi}$, the image-group feature; Y represents $e_{rec}$, the medical text feature.
  • $E\left[\|F(G(X)) - X\|_2^2\right]$ means that the image features are first transformed into the text feature space through the second conversion method to obtain text-image features, which are then transformed back into the image feature space through the first conversion method to obtain the image transformation features; the term is the mean squared difference between the image transformation features and the original image features.
  • $E\left[\|G(F(Y)) - Y\|_2^2\right]$ means that the text features are first transformed into the image feature space through the first conversion method to obtain image-text features, which are then transformed back into the text feature space through the second conversion method to obtain the text transformation features; the term is the mean squared difference between the text transformation features and the original text features.
  • $L_c = E\left[\|F(G(X)) - X\|_2^2\right] + E\left[\|G(F(Y)) - Y\|_2^2\right]$ represents the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features, whose minimum is the second loss value.
  • The mutual retrieval model is iteratively trained using $L_c$ as a loss function, as sketched below.
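A sketch of this cycle-consistency term, representing the two conversion methods F and G as small MLPs (an assumption; the embodiment does not specify their form):

```python
import torch
import torch.nn as nn

dim = 256  # illustrative feature size
F_map = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # text -> image space
G_map = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # image -> text space

def cycle_loss(x_img, y_txt):
    """Second loss value L_c: round-trip each feature through both conversions
    and penalise the squared distance to the original."""
    img_cycle = F_map(G_map(x_img))   # image -> text space -> image space
    txt_cycle = G_map(F_map(y_txt))   # text -> image space -> text space
    return ((img_cycle - x_img) ** 2).mean() + ((txt_cycle - y_txt) ** 2).mean()
```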
  • In some embodiments, iteratively training, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model also includes:
  • The loss values corresponding to the text features and the image features are calculated separately through a third conversion method; the difference between the loss value corresponding to the text features and the loss value corresponding to the image features is determined and used as the third loss value, and the mutual retrieval model is iteratively trained through the third loss value.
  • The purpose of this application is to make the features of X (image features) and Y (text features) as close as possible, so this application designs a discriminant loss function:

    $L_d = E[\log D(Y)] + E[\log(1 - D(X))]$

  • X is mapped to a scalar feature D(X) through the third conversion method D; that is, an image feature is converted into a scalar through D.
  • Y is likewise mapped to a scalar feature D(Y) via the D method. The purpose is to make the D(X) and D(Y) features as close as possible, so that it becomes impossible to tell whether a value came from D(X) or D(Y).
  • $\log D(Y)$ refers to the logarithm of the number obtained by converting the text feature into a scalar through the third conversion method D, and $E[\log D(Y)]$ represents the mean of these logarithms over a batch of samples.
  • $\log(1 - D(X))$ refers to converting the image feature into a scalar through the third conversion method D and then taking the logarithm of one minus that value, and $E[\log(1 - D(X))]$ represents the mean of these logarithms over all the image data in a batch.
  • $L_d$ represents the loss value of the above discriminant loss function over one batch, that is, the loss value of the loss function under one batch of iterative training. If the third conversion method D obtains appropriate parameter values through iterative training, then D(Y) and D(X) should be extremely close, and $L_d$ approaches 0 (under ideal conditions). At this point the mutual retrieval model has transformed the text features and image features into the same space, and a text feature and its corresponding image feature have very similar meanings, that is, they are almost the same, as sketched below.
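A sketch of this discriminant term, representing the third conversion method D as a small MLP with a sigmoid output (an assumption). How the sign of L_d is handled between D and the two feature encoders, that is, the usual adversarial min-max arrangement, is not spelled out in the text, so the sketch only computes the batch value.

```python
import torch
import torch.nn as nn

dim = 256  # illustrative feature size
D = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1), nn.Sigmoid())

def discriminant_loss(x_img, y_txt):
    """Third loss value L_d = E[log D(Y)] + E[log(1 - D(X))] over a batch."""
    eps = 1e-8  # numerical guard, not part of the formula
    return (torch.log(D(y_txt) + eps).mean()
            + torch.log(1.0 - D(x_img) + eps).mean())
```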
  • In some embodiments, iteratively training, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model also includes:
  • The mutual retrieval model is iteratively trained using the sum of the first loss value, the second loss value, and the third loss value as the overall loss value, as sketched below.
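Reusing the sketches above, one training iteration with the summed objective might look as follows; the Adam optimizer, the learning rate, and the equal 1:1:1 weighting of the three loss values are assumptions, not taken from the embodiment.

```python
import torch

text_encoder = CascadedTextEncoder()    # from the sketches above
image_encoder = ImageSequenceEncoder()
params = (list(text_encoder.parameters()) + list(image_encoder.parameters())
          + list(F_map.parameters()) + list(G_map.parameters())
          + list(D.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)

def train_step(x_img, y_txt):
    # x_img, y_txt: (N, dim) paired image-group and overall text features
    loss = (pairwise_hinge_loss(x_img, y_txt)   # first loss value
            + cycle_loss(x_img, y_txt)          # second loss value
            + discriminant_loss(x_img, y_txt))  # third loss value
    optimizer.zero_grad()
    loss.backward()   # back-propagate to the cascaded Transformers and the ResNet
    optimizer.step()  # update the weights
    return loss.item()
```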
  • For the training process of this application, refer to Figure 6: a medical image-text retrieval network based on cascaded Transformers is built, including a text information feature encoder and a medical image sequence feature encoder.
  • The network is trained according to the above loss functions until it converges.
  • The network training process is as follows. The training of a neural network is divided into two stages. The first stage is the stage in which data is propagated from the low level to the high level, that is, the forward propagation stage. The other stage is the stage in which, when the results obtained by forward propagation do not match expectations, the error is propagated from the high level back to the bottom level, that is, the back-propagation stage.
  • the training process is:
  • All network layer weights are initialized, generally using random initialization;
  • The input image and text data are forward-propagated through the layers of the neural network (convolutional layers, downsampling layers, fully connected layers, and so on) to obtain the output values;
  • Each layer of the network adjusts its weight coefficients based on the back-propagated error of that layer, that is, the weights are updated.
  • In use, the weight coefficients trained by the network are preloaded, and features are extracted from the medical texts or medical image sequences and stored in the data set to be retrieved. The user provides any medical text data or medical image sequence data, called the query data. The features of the query data are extracted with the cascaded-Transformer medical image-text retrieval network, and are then distance-matched against the features of all samples in the data set to be retrieved, that is, the vector distance is computed; this application uses the Euclidean distance. For example, if the query data is medical text data, the distances to all the medical image sequence features in the data set to be retrieved are calculated; similarly, if the query data is medical image sequence data, the Euclidean distances to all the medical text features in the data set to be retrieved are calculated. The sample with the smallest distance is the recommended sample and is output, as sketched below.
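The retrieval step described above reduces to a nearest-neighbour search in Euclidean distance over pre-extracted features; a minimal sketch follows (the `top_k` option is an addition for illustration):

```python
import torch

def retrieve(query_feat, gallery_feats, top_k=1):
    """Distance-match the query feature against all features in the data set
    to be retrieved; the smallest distance gives the recommended sample."""
    # query_feat: (dim,); gallery_feats: (num_samples, dim) pre-extracted
    d = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return torch.topk(d, k=top_k, largest=False).indices  # best-match indices
```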
  • To sum up, in this application the text data is processed in a multi-level cascade of Transformer models to calculate the corresponding text features, the image features of the image sequence are calculated through the residual network, and the image-text data mutual retrieval model is trained from the text features and image features using the multiple loss functions proposed by this application. Further, through the image-text data mutual retrieval model, the input text data or image data is predicted, or the corresponding text data or image data is retrieved.
  • Referring to FIG. 9, another aspect of this application also proposes a medical image-text data mutual retrieval apparatus, including:
  • a preprocessing module 1, configured to classify the text information in the image-text data in a multi-level manner according to a predetermined method, and to pass the classified text information through the first neural network model to generate text features in a cascaded manner according to the classification relationship;
  • a first model calculation module 2, configured to generate image features from the image information in the image-text data, in the form of an image sequence, through the second neural network model;
  • a second model calculation module 3, configured to iteratively train, based on a predetermined loss function and from the text features and image features, to generate the image-text data mutual retrieval model;
  • an image-text mutual retrieval module 4, configured to retrieve, through the image-text data mutual retrieval model, the text information and/or image information corresponding to the input image-text data.
  • The computer device, which can be a terminal or a server, includes:
  • a memory 22 storing computer-readable instructions 23 that can be run on a processor 21;
  • when the computer-readable instructions 23 are executed by the processor 21, the steps of any one of the methods in the above embodiments are implemented.
  • Referring to FIG. 11, another aspect of the present application also proposes a non-volatile computer-readable storage medium 401.
  • The non-volatile computer-readable storage medium 401 stores computer-readable instructions 402.
  • When the computer-readable instructions 402 are executed by a processor, the steps of any one of the methods in the above embodiments are implemented.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A mutual retrieval method for medical image-text data comprises the steps of: performing multi-level classification on text information in image-text data in a predetermined manner, and generating text features from the classified text information in a cascaded manner by means of a first neural network model according to the classification relationship (S1); generating image features from image information in the image-text data, in the form of an image sequence, by means of a second neural network model (S2); performing iterative training on the basis of a predetermined loss function according to the text features and the image features, so as to generate an image-text data mutual retrieval model (S3); and, for text information and/or image information in input image-text data, retrieving the corresponding text information and/or image information by means of the image-text data mutual retrieval model (S4).
PCT/CN2022/141374 2022-06-30 2022-12-23 Method and apparatus for mutual retrieval of image-text data, device, and readable storage medium WO2024001104A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210760827.7 2022-06-30
CN202210760827.7A CN115408551A (zh) 2022-06-30 2022-06-30 Medical image-text data mutual retrieval method, apparatus, device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024001104A1 (fr) 2024-01-04

Family

ID=84158085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141374 WO2024001104A1 (fr) 2022-06-30 2022-12-23 Method and apparatus for mutual retrieval of image-text data, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN115408551A (fr)
WO (1) WO2024001104A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115408551A (zh) * 2022-06-30 2022-11-29 苏州浪潮智能科技有限公司 Medical image-text data mutual retrieval method, apparatus, device, and readable storage medium
CN117407518B (zh) * 2023-12-15 2024-04-02 广州市省信软件有限公司 Information screening and display method and system based on big data analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239153A (zh) * 2021-05-26 2021-08-10 清华大学深圳国际研究生院 一种基于实例遮掩的文本与图像互检索方法
US20210271707A1 (en) * 2020-02-27 2021-09-02 Adobe Inc. Joint Visual-Semantic Embedding and Grounding via Multi-Task Training for Image Searching
CN114357148A (zh) * 2021-12-27 2022-04-15 之江实验室 一种基于多级别网络的图像文本检索方法
CN114612749A (zh) * 2022-04-20 2022-06-10 北京百度网讯科技有限公司 神经网络模型训练方法及装置、电子设备和介质
CN114661933A (zh) * 2022-03-08 2022-06-24 重庆邮电大学 基于胎儿先心病超声图像—诊断报告的跨模态检索方法
CN115408551A (zh) * 2022-06-30 2022-11-29 苏州浪潮智能科技有限公司 一种医疗图文数据互检方法、装置、设备及可读存储介质

Also Published As

Publication number Publication date
CN115408551A (zh) 2022-11-29

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22949180

Country of ref document: EP

Kind code of ref document: A1