CN116434841A - Embryo evaluation method and device based on multi-modal data - Google Patents

Embryo evaluation method and device based on multi-modal data

Info

Publication number
CN116434841A
CN116434841A (application number CN202310369281.7A)
Authority
CN
China
Prior art keywords
image
model
text
training
embryo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310369281.7A
Other languages
Chinese (zh)
Inventor
倪娜
童国庆
吕毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202310369281.7A priority Critical patent/CN116434841A/en
Publication of CN116434841A publication Critical patent/CN116434841A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/0985 - Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Physics (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an embryo assessment method and device based on multi-modal data. The method of training the machine learning model comprises: acquiring image information of a training embryo sample and text information corresponding to the training embryo sample; acquiring image features of the image information; acquiring text features of the text information; acquiring the degree of matching between the image features and the text features; and training the model based on the image features and the text features from the same training embryo sample that have a high degree of matching, so as to obtain the machine learning model. The machine learning model can use image and text information together to evaluate embryo quality more comprehensively, improving evaluation accuracy and efficiency and reducing errors caused by human subjectivity.

Description

Embryo evaluation method and device based on multi-modal data
Technical Field
The present invention relates to the field of biological information, and more particularly to a method and apparatus for embryo assessment based on multi-modal data, and still more particularly to a method of training a machine learning model, an apparatus for training a machine learning model, a system for assessing embryo quality, a computing device, and a computer-readable storage medium.
Background
In vitro fertilization (IVF) is one of the most common treatments for infertility. The treatment process comprises ovarian stimulation, oocyte retrieval, fertilization, embryo transfer, and implantation. However, this technique currently achieves only about a 35% conception rate. Embryo quality assessment is a critical component of IVF diagnosis and treatment: the physician selects the most viable embryo to implant into the mother's uterus based on the embryo's morphology. However, subjective differences in embryo scoring often lead to variability in assessment results, and other physiological information about the patient, the treatment regimen, and related indicators, such as hormone levels, follicle number and morphology, and endometrial thickness, can also affect embryo quality and thus the final pregnancy outcome.
At present, some companies and researchers in China and abroad have studied embryo quality assessment based on artificial intelligence algorithms, mainly by using computer vision techniques to extract features from static embryo pictures or time-series images and classify the embryos. An embryo, however, is a three-dimensional tissue containing multiple components: for example, when the embryo grows into a blastocyst on the fifth day, it contains structures such as the inner cell mass, trophoblast, and zona pellucida, which go on to develop into fetal or placental tissue and have different effects on the final pregnancy outcome. Image information alone cannot capture the relationships and distinctions among these tissues, so the growth of each tissue structure cannot be estimated accurately and effectively, and the quality of the final embryo cannot be judged quantitatively. Moreover, embryo growth depends not only on the in vitro culture step but is also closely related to maternal physiological indicators, follicle quantity and quality, age, and hormone levels before and after ovarian stimulation. To date, however, no machine learning model has been reported that assesses embryo quality from combined multi-modal data.
Therefore, there is a need in the art to develop a machine learning model that quantitatively evaluates embryo quality based on multi-modal data, so as to improve the efficiency and accuracy of embryo assessment.
Disclosure of Invention
The present application was proposed by the inventors based on the discovery of the following problems and facts:
aiming at the problems of low efficiency, poor accuracy, and large subjective variation in conventional embryo quality assessment, the inventors construct a multi-modal assessment model by extracting and combining multiple feature parameters from image and text information, so that embryo quality can be assessed by making maximal use of both images and text. Meanwhile, the multi-modal model adopts a probability-based calculation different from that of a traditional classification model; this improvement allows the model to make fuller use of the extracted feature parameters and improves retrieval accuracy and efficiency.
The present invention aims to solve, at least to some extent, one of the above technical problems.
To this end, in a first aspect of the invention, the invention proposes a method of training a machine learning model for evaluating embryo quality. According to an embodiment of the invention, the machine learning model comprises an image feature extractor sub-model, a text feature extractor sub-model, and a matching degree prediction sub-model, and the method comprises: acquiring image information of a training embryo sample and text information corresponding to the training embryo sample; inputting the image information into the image feature extractor sub-model to obtain image features of the training embryo sample; inputting the text information into the text feature extractor sub-model to obtain text features of the training embryo sample; inputting the image features and the text features into the matching degree prediction sub-model, which outputs the degree of matching between the image features and the text features; and training the image feature extractor sub-model, the text feature extractor sub-model, and the matching degree prediction sub-model based on image features and text features from the same training embryo sample having a high degree of matching, so as to obtain the machine learning model.
A machine learning model for embryo quality assessment trained by this method can use image and text information together to assess embryo quality more comprehensively, improving assessment accuracy. In addition, the automated assessment reduces errors from human subjectivity and improves assessment efficiency, and because the machine learning model is trained on a large amount of training data, it can provide highly accurate embryo quality assessment results.
In this application, the image feature extractor sub-model uses the Swin-Tiny model as the image encoder: features are extracted from the original image through a series of hierarchical operations (such as convolution and pooling layers) and represented as a series of feature maps. These feature maps may be used for image classification, object detection, and similar tasks.
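The following is a minimal sketch of how a Swin-Tiny backbone could serve as such an image encoder. The `timm` checkpoint name, input size, normalization values, and file name are illustrative assumptions, not the exact configuration disclosed in this application.

```python
# Illustrative sketch only: extracting embryo image features with a Swin-Tiny encoder.
import timm
import torch
from PIL import Image
from torchvision import transforms

# Swin-Tiny backbone with the classification head removed (num_classes=0 returns pooled features)
image_encoder = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True, num_classes=0)
image_encoder.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("blastocyst_day5.png").convert("RGB")  # hypothetical file name
with torch.no_grad():
    global_feature = image_encoder(preprocess(img).unsqueeze(0))  # shape: (1, 768)
print(global_feature.shape)
```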
The text feature extractor sub-model is a machine learning model for extracting useful features from text data. Such models convert text data into numeric vectors for use in various machine learning tasks such as classification, clustering, and information retrieval. In this application, a text Transformer model is selected and trained on the text dataset to learn the most effective feature extraction.
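As an illustration of how a text Transformer can turn clinical text into feature vectors, the sketch below uses the Hugging Face `transformers` library. The checkpoint name and the example sentence are assumed stand-ins; the application does not specify a particular pre-trained checkpoint.

```python
# Illustrative sketch only: encoding embryo-related clinical text with a Transformer encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
text_encoder = AutoModel.from_pretrained("bert-base-chinese")
text_encoder.eval()

text = "Day-5 blastocyst; inner cell mass tightly packed; basal follicle count 12"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = text_encoder(**inputs)
local_features = outputs.last_hidden_state       # per-token (local) features, (1, seq_len, 768)
global_feature = local_features.mean(dim=1)      # simple pooled global text feature, (1, 768)
```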
It should be noted that the matching degree prediction sub-model predicts the similarity between the image data and the text data using a deep learning method. With a supervised learning approach, the degree of similarity between the image set and the text set is learned from the training dataset and then used to predict the similarity of new image-text pairs.
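A minimal sketch of such a matching degree predictor is shown below, where the matching degree is computed as the cosine similarity between projected image and text features. The projection dimension and the use of simple linear projections are assumptions for illustration.

```python
# Minimal sketch: matching degree as cosine similarity between projected image and text features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchingHead(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, proj_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, proj_dim)
        self.txt_proj = nn.Linear(txt_dim, proj_dim)

    def forward(self, img_feat, txt_feat):
        img = F.normalize(self.img_proj(img_feat), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return (img * txt).sum(dim=-1)   # cosine similarity in [-1, 1] as the matching degree

head = MatchingHead()
scores = head(torch.randn(4, 768), torch.randn(4, 768))  # one matching degree per image-text pair
```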
According to an embodiment of the present invention, the method for training a machine learning model may further include at least one of the following technical features:
according to an embodiment of the invention, the machine learning model further comprises an embryo quality assessment sub-model for generating embryo quality assessment results. The characteristics of embryo morphology, cell number, cell symmetry and the like are automatically evaluated through a computer vision technology and an artificial intelligence algorithm, so that the burden of manual evaluation is reduced, and the accuracy and reliability are improved.
According to an embodiment of the invention, the quality assessment result is selected from at least one of an image-text description, an assessment score, an assessment level, and a regional assessment. Specifically, the quality assessment result includes:
1) Image-text description: the embryo quality information is described by combining text and images, simplifying the analysis workflow.
2) Assessment level: embryo quality is graded by classification into levels, e.g., excellent, good, fair.
3) Assessment score: embryo quality is scored, and a high-quality embryo may further be defined as one scoring above a certain threshold.
4) Regional assessment: different positions in, or different periods of, the embryo image are assessed to determine embryo quality at a particular point in time, helping the physician analyze the factors behind the embryo's quality.
It should be noted that different assessment methods may yield different quality assessment results, so an appropriate method should be selected when performing quality assessment, and the results should be analyzed and interpreted in light of the specific situation.
According to an embodiment of the invention, the pre-trained model for image information and text information is a CLIP model. An advantage of this application is that the CLIP model can process text and images simultaneously: the CLIP model is selected for large-scale training on embryo images and text information, and given its generality, different types of images and text can optionally be included in training to enhance the robustness of the model. This allows accurate results to be produced even for atypical input images.
According to an embodiment of the invention, the image feature extractor sub-model is selected from the Swin-Tiny model. The inventors selected the Swin-Tiny model for its light weight, strong generalization performance, and high training efficiency.
According to an embodiment of the invention, the image feature extractor sub-model is obtained by supervised training of a Swin-Tiny model using at least one of the following kinds of image information as feature parameters: basal follicles, follicles after ovarian stimulation, pre-transfer endometrium, oocytes, cleavage stage, and blastocyst stage. The image feature extractor sub-model developed on the Swin-Tiny backbone extracts information such as basal follicles, post-stimulation follicles, pre-transfer endometrium, oocytes, cleavage stage, and blastocyst stage from the input image, uses this information as feature parameters for supervised training, and by learning these parameters recognizes and classifies different types of follicles and endometrium, helping physicians judge embryo quality more reliably.
According to an embodiment of the invention, the text feature extractor sub-model is selected from a text Transformer model. The text Transformer, one of the deep learning models, is widely used in natural language processing. According to an embodiment of the invention, a text Transformer is selected for processing text information: the Transformer adopts a multi-head self-attention mechanism, which effectively models long-range dependencies and handles long texts well. In addition, the Transformer is fully parallelizable and can process data faster during training and inference, reducing training time and resource costs.
According to an embodiment of the invention, the text feature extractor sub-model is obtained by supervised training of a text Transformer model using at least one of the following kinds of text information as feature parameters: physiological information, hormone information, medication information, follicular information, and embryo development information. Specifically, the sub-model is trained on previously annotated text datasets to learn how to extract effective feature parameters associated with physiological, hormonal, medication, follicular, and embryo development information, so that it can better process and analyze text data in subsequent tasks.
According to an embodiment of the invention, the text features of the machine learning model are derived from the patient.
According to an embodiment of the invention, the matching degree prediction sub-model is trained based on the similarity between image features and text features; that is, the degree of matching between two feature vectors is determined by comparing their similarity. Training uses a dataset with known matching degrees, and the trained sub-model predicts the similarity between a new image and text, thereby determining their degree of matching.
According to an embodiment of the invention, the degree of matching corresponds to the degree of similarity between the image feature and the text feature.
Illustratively, image and text information are input into the embryo quality assessment model trained in the present application, and embryo quality is predicted by analyzing the input. When outputting results, the user may choose to output the top-N most similar images (N being a positive integer such as 1, 2, 3, 4, or 5) ranked by similarity, together with the corresponding text information. The model may also output embryo quality scores, embryo quality ratings, or an embryo quality assessment over images at consecutive time points. These different output formats provide more options suited to different clinical needs. In addition, the user may choose to combine the output images and corresponding text information with the embryo quality score, rating, and consecutive-time-point images for a joint evaluation.
According to an embodiment of the invention, the image information further comprises an ultrasound image.
It should be noted that the image information described in this application includes, but is not limited to, ultrasound images.
According to an embodiment of the invention, the ultrasound image comprises a B-mode ultrasound image.
According to an embodiment of the invention, the image information of the machine learning model is selected from at least one of a single image, consecutive images, or different regions within images under a microscope. According to an embodiment of the invention, the image information is selected from a single RGB image, consecutive RGB images, and partial regions of a single RGB image under a microscope, so that training supports a more comprehensive and accurate quality assessment. By analyzing growth and development over the whole embryo process, the generated assessment report achieves higher rating accuracy and can assist reproductive physicians in selective embryo transfer in clinical practice.
In a second aspect of the invention, the invention proposes an apparatus for training a machine learning model. According to an embodiment of the invention, the apparatus is used for assessing embryo quality and comprises: an information acquisition module for acquiring image information of a training embryo sample and text information corresponding to the training embryo sample; a feature acquisition module for acquiring image features and text features of the training embryo sample; a matching degree prediction module for acquiring the degree of matching between the image features and the text features; and a training module that trains the machine learning model based on image features and text features from the same training embryo sample having a high degree of matching; wherein the machine learning model is trained by the method of the first aspect of the invention.
The apparatus provided by embodiments of the invention has the following advantages:
1) Improved accuracy of embryo quality assessment: the apparatus acquires image and text information of the embryo at the same time, performs feature extraction and matching degree prediction on both, and, based on large-scale training data, can significantly improve assessment accuracy.
2) High degree of automation: the apparatus completes the embryo quality assessment process automatically, saving labor cost (and avoiding observers' subjective errors) and time.
3) Support for training machine learning models: the apparatus can also train the machine learning model on datasets built from the acquired image and text features, improving model robustness.
4) Wide applicability: the apparatus is suitable for different types of embryo quality assessment tasks and is general-purpose.
According to an embodiment of the present invention, the apparatus for training a machine learning model may further include at least one of the following technical features:
according to an embodiment of the invention, the image information further comprises an ultrasound image.
According to an embodiment of the invention, the ultrasound image comprises a B-mode ultrasound image.
In a third aspect of the invention, the invention provides a system for assessing embryo quality. According to an embodiment of the invention, the system comprises: an information acquisition unit for acquiring image information of the training embryo sample and text information corresponding to the training embryo sample; a feature acquisition unit for acquiring image features and text features of the training embryo sample; a matching degree prediction unit for acquiring the degree of matching between the image features and the text features; and an embryo evaluation unit for evaluating embryo quality based on the machine learning model, wherein the machine learning model is trained by the method of the first aspect of the invention. An advantage of the system according to embodiments of the present invention is that it evaluates embryo quality automatically, reducing the impact of human subjective differences. In addition, because the system uses a machine learning model for evaluation, it can predict embryo developmental potential and health more accurately, improving the success rate of embryo transfer while helping to reduce unnecessary medical expense and time.
According to an embodiment of the present invention, the above-described system for evaluating embryo quality may further include at least one of the following technical features:
according to an embodiment of the invention, the image information further comprises an ultrasound image.
According to an embodiment of the invention, the ultrasound image comprises a B-mode ultrasound image.
According to an embodiment of the present invention, the quality assessment result is selected from at least one of an image-text description, an assessment score, an assessment level, and a regional assessment.
In a fourth aspect of the invention, the invention provides a computing device. According to an embodiment of the invention, the computing device comprises a processor and a memory; the memory is used for storing a computer program, and the processor is configured to execute the computer program to implement the method according to the first aspect of the present invention. According to an embodiment of the present invention, the processor reads the executable program code stored in the memory and runs the corresponding program to implement training of the machine learning model. The device can execute the computer program quickly, making the process more efficient, and can store various types of computer programs as needed, adding flexibility. Moreover, the cost of such a computing device can be significantly lower than that of other high-end computing devices because it is built from common hardware components and open-source software.
It should be noted that the device described in this application is scalable: in practical applications, if larger storage or more powerful processing is required, the computing device may be extended by adding memory or processors. Multitask parallel processing may also be supported, thereby improving computational efficiency.
In a fifth aspect of the present invention, a computer-readable storage medium is presented. According to an embodiment of the invention, the storage medium comprises computer instructions which, when executed by a computer, cause the computer to carry out the method according to the first aspect of the invention. The computer program contained in the computer-readable storage medium can process the complex information contained in the embryo images and text information more accurately and, based on extensive prior training, give a more accurate result. The computer-readable storage medium can also store a large amount of data, avoiding human error and ensuring data accuracy and reliability. Moreover, it can be accessed anytime and anywhere, so that a user's detection records from different times can be retrieved conveniently.
It should be noted that, in this application, the logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory. The various computer-readable storage media described herein can represent one or more devices and/or other machine-readable storage media for storing information. The term "machine-readable storage medium" can include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of an apparatus for training a machine learning model according to an embodiment of the present invention;
FIG. 2 is a flow chart of a system for evaluating embryo quality in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of an embryo evaluation method according to an embodiment of the present invention;
FIG. 4 is a diagram of a multi-modal learning network model in accordance with an embodiment of the present invention;
FIG. 5 is a confusion matrix diagram of the results of classification, image retrieval, and image-text retrieval according to embodiment 1 of the present invention;
FIG. 6 shows the similarity between each word and the image patches according to embodiment 1 of the present invention; darker colors indicate higher similarity.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
Definition and description
In order that the invention may be more readily understood, certain technical and scientific terms are defined below. Unless clearly defined otherwise herein in this document, all other technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The terms "first," "second," and the like herein are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. Further, in the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.
In this document, the terms "comprise" or "include" are used in an open-ended fashion, i.e., to include what is indicated by the present invention, but not to exclude other aspects.
In this document, the terms "optionally," "optional," or "optionally" generally refer to the subsequently described event or condition may, but need not, occur, and the description includes instances in which the event or condition occurs, as well as instances in which the event or condition does not.
In this application, the term "multi-modal data" refers to data composed of multiple forms of information, such as images, text, and sound. In this context, the multi-modal embryo assessment model is a machine learning model that, based on artificial intelligence algorithms, analyzes, processes, and fuses these different types of data (embryo image information, text information, etc.) to assess embryo quality more accurately and comprehensively.
Method for training machine learning model
According to an embodiment of the present invention, the machine learning model is used for embryo quality assessment and comprises an image feature extractor sub-model, a text feature extractor sub-model, and a matching degree prediction sub-model. The method comprises: acquiring image information of a training embryo sample and text information corresponding to the training embryo sample; inputting the image information into the image feature extractor sub-model to obtain image features of the training embryo sample; inputting the text information into the text feature extractor sub-model to obtain text features of the training embryo sample; inputting the image features and the text features into the matching degree prediction sub-model, which outputs the degree of matching between the image features and the text features; and training the image feature extractor sub-model, the text feature extractor sub-model, and the matching degree prediction sub-model based on image features and text features from the same training embryo sample having a high degree of matching, so as to obtain the machine learning model.
Specifically, for ease of understanding, the technical solutions of the present application are explained and described in detail below:
in this application, the machine learning model training process specifically includes:
1) Image preprocessing: the image is preprocessed to enhance contrast and suppress noise, improving image quality and the accuracy of subsequent analysis.
This applies, for example, to images from multiple embryo culture modes. First, the original RGB embryo image is converted into a grayscale image, and the grayscale image is binarized. A rectangular region around the outer boundary of the blastocyst is then located with an OpenCV algorithm, that region is cropped out of the original RGB blastocyst image, and the background is removed according to the rectangular binary mask. Finally, so that a batch of images can be loaded during network training, the present application pads the ultrasound and embryo images to a fixed pixel size, thereby preserving their aspect ratio.
It should be noted that the term "binarization" refers to a basic preprocessing technique in digital image processing that converts a grayscale image into a binary image. In a binary image each pixel has only two possible values, typically 0 and 1, although other value pairs are possible. Specifically, pixel values in the grayscale image are compared with a threshold: pixels above the threshold are set to 1 and pixels at or below it are set to 0, dividing the gray levels into two classes. In the binarized image adjacent pixel values differ sharply, so outlines and features are clearer, which facilitates image analysis and processing.
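The sketch below illustrates this preprocessing pipeline with OpenCV: grayscale conversion, binarization, locating the blastocyst's bounding rectangle, cropping, and aspect-ratio-preserving padding to a fixed size. The use of Otsu thresholding, the target size, and the file name are assumptions for illustration rather than the exact choices of this application.

```python
# Illustrative preprocessing sketch: grayscale conversion, binarization, blastocyst bounding box,
# cropping, and aspect-ratio-preserving padding to a fixed size.
import cv2

def preprocess_embryo_image(path, target=224):
    bgr = cv2.imread(path)                                    # original RGB (BGR in OpenCV) image
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)              # grayscale conversion
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization

    # Locate the rectangular region around the blastocyst outer boundary (largest contour)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    crop = bgr[y:y + h, x:x + w]

    # Resize the longer side to `target`, then pad to a square to keep the aspect ratio
    scale = target / max(w, h)
    resized = cv2.resize(crop, (int(w * scale), int(h * scale)))
    top = (target - resized.shape[0]) // 2
    left = (target - resized.shape[1]) // 2
    padded = cv2.copyMakeBorder(resized, top, target - resized.shape[0] - top,
                                left, target - resized.shape[1] - left,
                                cv2.BORDER_CONSTANT, value=(0, 0, 0))
    return padded

img = preprocess_embryo_image("blastocyst_day5.png")  # hypothetical file name
```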
2) Text paradigm processing: so that the textual information can be matched with the images, the text is processed into a standardized paradigm.
The invention designs a set of text paradigm generation rules for embryo tasks. For ultrasound and embryo image data, patient information and related embryo information are integrated to generate an aligned paradigm. For example, for an ovarian ultrasound image taken during ovarian stimulation, the number and size of follicles are given; for embryo microscope pictures, acquisition conditions such as focal length, growth time, the position and morphology of the cell mass, and the presence or absence of fragments and vacuoles are given. This text information provides rich self-supervision for the corresponding images.
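A minimal sketch of template-based paradigm generation is given below. All field names, values, and the wording of the templates are hypothetical; the application's actual paradigms are not reproduced here.

```python
# Minimal sketch of template-based text paradigm generation; field names are hypothetical.
def ultrasound_paradigm(record: dict) -> str:
    return (f"Ovarian ultrasound during stimulation: {record['follicle_count']} follicles, "
            f"mean diameter {record['follicle_size_mm']} mm, "
            f"endometrial thickness {record['endometrium_mm']} mm.")

def embryo_paradigm(record: dict) -> str:
    return (f"Blastocyst at {record['hours_post_insemination']} h post-insemination, "
            f"inner cell mass {record['icm_grade']}, trophectoderm {record['te_grade']}, "
            f"fragmentation {'present' if record['fragments'] else 'absent'}.")

print(ultrasound_paradigm({"follicle_count": 12, "follicle_size_mm": 16, "endometrium_mm": 9.5}))
print(embryo_paradigm({"hours_post_insemination": 118, "icm_grade": "tightly packed, many cells",
                       "te_grade": "cohesive epithelium of many cells", "fragments": False}))
```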
3) Extraction of global and local features from images and text.
The invention uses the image feature extractor and the text feature extractor to extract global and local features of the image-text pairs and to learn image representations and text representations in the same multi-modal feature space. The pre-trained model for the image and text feature extractors is a joint vision-language model, such as the CLIP (Contrastive Language-Image Pre-Training) model, which learns an aligned representation between text and images by pre-training on large-scale data; this aligned representation is then fine-tuned for embryo tasks.
Specifically, as shown in fig. 3, the image input consists of pre-transfer ultrasound images of the patient's basal follicular phase and ovarian stimulation phase, obtained by ultrasound, together with a time series of embryo growth and development images obtained with an optical microscope. The text input contains the patient's physiological information, hormone information, medication information, follicular information, embryo development information, and the like.
In the image feature extractor sub-model, a Swin-Tiny Transformer is selected to extract features from the image, which is partitioned into a sequence of image patches to fit the model's multi-head attention mechanism. Each patch obtains local features through the feature extractor, and all patch embeddings are averaged to obtain the global image feature.
In the text feature extractor sub-model, a text Transformer model is used as the text feature extractor to obtain robust global text representations and local text representations (e.g., of the whole text, words, or sentences).
In the training stage of this architecture, the text feature extractor and the image feature extractor extract text description features and image features respectively, the similarity between the text and image features is used as the score of the current category, and a cross-entropy loss is then used to optimize the model, as sketched below.
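The following is a minimal sketch of this CLIP-style contrastive objective, in which the image-text similarity scores serve as logits and a symmetric cross-entropy loss is minimized. The feature dimension, batch size, and temperature value here are illustrative assumptions.

```python
# Sketch of CLIP-style contrastive training: similarity scores as logits, optimized with a
# symmetric cross-entropy loss over matched image-text pairs.
import torch
import torch.nn.functional as F

def contrastive_loss(image_features, text_features, temperature=1.0):
    # Normalize so the dot product is cosine similarity
    img = F.normalize(image_features, dim=-1)
    txt = F.normalize(text_features, dim=-1)
    logits = img @ txt.t() / temperature             # (batch, batch) similarity matrix

    targets = torch.arange(len(img))                 # matched pairs lie on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)      # image-to-text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text-to-image direction
    return (loss_i2t + loss_t2i) / 2

loss = contrastive_loss(torch.randn(32, 256), torch.randn(32, 256))
```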
4) Embryo quality assessment: embryo quality is assessed using classification models, regression models, reinforcement learning, or retrieval subtasks, depending on the type and purpose of the data; for example, classification models are suitable for grading embryos from poor to good, while regression models can predict quality scores for embryos.
For the downstream subtasks, tasks such as image-text retrieval, quantitative evaluation, classification, and recognition/segmentation are customized according to physicians' needs. The subtasks differ as follows (a sketch of the retrieval and prediction heads follows this list):
a. and (5) image-text retrieval: similar to X-ray inspection, i.e., the input image is used by the AI algorithm to generate a detection report. The prediction model inputs text information such as images or/and physiological blood values, calculates the similarity between the picture features and all text description features, selects the category corresponding to the text with the largest feature similarity as a model prediction result, and finally realizes embryo text description.
b. Quantitative evaluation task: the prediction model takes the patient's relevant examination images and text information such as physiological blood values as input; a regression head is added to the model, and embryo quality, e.g., as a percentile, is predicted quantitatively as a regression task. Different embryos of the same patient receive different predicted scores, and based on these scores physicians can carry out transfer or the next cycle of treatment.
c. Classification task: similar to the regression task, the prediction model takes the patient's relevant examination images and text information such as physiological blood values as input, and a classification task grades the embryo, for example into three levels, so that the physician can carry out transfer or the next cycle of treatment according to the embryo grade.
d. Recognition and segmentation task: the prediction model takes the patient's relevant examination images and text information such as physiological blood values as input and marks the image regions corresponding to the embryo morphology description, such as the number of blastomeres at the cleavage stage, the morphology of the cell mass and trophoblast at the blastocyst stage, and the positions and number of fragments and vacuoles, helping physicians judge embryo quality quantitatively and in fine detail.
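As an illustration of how these subtasks can reuse the shared features, the sketch below ranks candidate text descriptions by similarity for retrieval and attaches small regression and classification heads for scoring and grading. The feature dimensions, number of grades, and scaling of the score are assumptions for illustration only.

```python
# Illustrative sketch of downstream subtasks on top of the shared features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def retrieve_top_n(image_feature, candidate_text_features, candidate_texts, n=3):
    """Image-text retrieval: return the n most similar text descriptions for one image feature."""
    sims = F.normalize(image_feature, dim=-1) @ F.normalize(candidate_text_features, dim=-1).t()
    top = sims.squeeze(0).topk(n)
    return [(candidate_texts[i], float(s)) for s, i in zip(top.values, top.indices)]

class QualityHeads(nn.Module):
    """Regression head (percentile score) and classification head (e.g., three grades)."""
    def __init__(self, feat_dim=512, num_grades=3):
        super().__init__()
        self.regressor = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_grades)

    def forward(self, fused_feature):
        score = torch.sigmoid(self.regressor(fused_feature)) * 100   # quality percentile
        grade_logits = self.classifier(fused_feature)                # grade logits (softmax later)
        return score, grade_logits

texts = ["grade A inner cell mass", "grade B", "grade C with vacuoles"]
print(retrieve_top_n(torch.randn(1, 256), torch.randn(3, 256), texts, n=2))
```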
According to the embodiment of the invention, embryo rating is achieved by constructing the multi-modal data embryo evaluation model with the above method, helping physicians carry out IVF treatment more accurately and improving the success rate of treatment.
Device for training machine learning model
According to an embodiment of the invention, the device is used for assessing embryo quality.
As shown in FIG. 1, the apparatus includes: an information acquisition module S100 for acquiring image information of the training embryo sample and text information corresponding to the training embryo sample; a feature acquisition module S200 for acquiring image features and text features of the training embryo sample; a matching degree prediction module S300 for acquiring the degree of matching between the image features and the text features; and a training module S400 that trains the machine learning model based on image features and text features from the same training embryo sample having a high degree of matching.
It should be noted that the information acquisition module S100 is connected to the feature acquisition module S200, the feature acquisition module S200 is connected to the matching degree prediction module S300, and the matching degree prediction module S300 is connected to the training module S400.
System for evaluating embryo quality
According to an embodiment of the invention, the system is used for assessing embryo quality.
As shown in FIG. 2, the information acquisition unit S500 is configured to acquire image information of the training embryo sample and text information corresponding to the training embryo sample; the feature acquisition unit S600 is used for acquiring image features and text features of the training embryo sample; the matching degree prediction unit S700 is configured to obtain the degree of matching between the image features and the text features; and the embryo evaluation unit S800 is used for evaluating embryo quality based on the machine learning model.
The information acquisition unit S500 is connected to the feature acquisition unit S600, the feature acquisition unit S600 is connected to the matching degree prediction unit S700, and the matching degree prediction unit S700 is connected to the embryo evaluation unit S800.
Computing device
According to an embodiment of the invention, the computing device comprises: a processor and a memory; the memory is used for storing a computer program; the processor is configured to execute the computer program to implement the aforementioned method of training a machine learning model.
Computer readable storage medium
According to an embodiment of the invention, the storage medium includes computer instructions that, when executed by a computer, cause the computer to implement the aforementioned method of training a machine learning model.
Embodiments of the present invention will be described in more detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The embodiments will be described in detail below. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in this field or the product specifications.
Example 1: embryo quality assessment model verification
The dataset used for training in the present invention comes from a tertiary hospital in Shanghai. This embodiment of the invention constructs an image-text multi-modal model based on blastocyst images and text information (physiological information, hormone information, medication information, follicular information, embryo development information, etc.) for evaluating embryo quality. The specific steps are as follows:
1) Image data acquisition
The dataset contains 395 static microscopic RGB images of blastocysts (not selected by age or other physiological indices). Blastocyst images were taken 115-120 hours (day 5) or 135-140 hours (day 6) after in vitro insemination with a standard inverted optical microscope system (model: Nikon Eclipse Ti-U); the images may differ in viewing angle, scale, color, contrast, and so on.
2) Text data acquisition
This example trains the embryo quality assessment model with static day-5 or day-6 blastocyst images and the corresponding grade text for the inner cell mass.
3) Model parameter adjustment
For better blastocyst quality evaluation performance of the multi-modal network, the invention uses an Adam optimizer (decay rate: 5e-4, momentum: 0.95) to train the model for 300 iterations. Furthermore, an early stopping strategy is applied to avoid overfitting. The batch size is 32, the image encoder learning rate is 7.5e-6, the text encoder learning rate is 7.5e-7, and the temperature coefficient is 1.0.
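A schematic sketch of this training configuration is given below. The tiny linear "encoders" and random data are placeholders standing in for the Swin-Tiny and text Transformer encoders and the real dataset, the early-stopping patience is assumed, and reading "momentum: 0.95" as Adam's beta1 and "decay rate: 5e-4" as weight decay is an interpretation, not a statement of the application's exact setup.

```python
# Schematic training configuration: Adam with separate encoder learning rates, weight decay 5e-4,
# batch size 32, temperature 1.0, 300 iterations, and early stopping. Encoders/data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

image_encoder = nn.Linear(1024, 256)   # placeholder image encoder
text_encoder = nn.Linear(768, 256)     # placeholder text encoder

optimizer = torch.optim.Adam(
    [
        {"params": image_encoder.parameters(), "lr": 7.5e-6},
        {"params": text_encoder.parameters(), "lr": 7.5e-7},
    ],
    betas=(0.95, 0.999),                # "momentum: 0.95" read as beta1 (an interpretation)
    weight_decay=5e-4,                  # "decay rate: 5e-4" read as weight decay (an interpretation)
)

def contrastive_loss(img, txt, temperature=1.0):
    logits = F.normalize(img, dim=-1) @ F.normalize(txt, dim=-1).t() / temperature
    targets = torch.arange(len(img))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

best_val, patience, bad_steps = float("inf"), 10, 0     # early-stopping bookkeeping (patience assumed)
for step in range(300):                                 # 300 training iterations
    images, texts = torch.randn(32, 1024), torch.randn(32, 768)  # one synthetic batch of size 32
    loss = contrastive_loss(image_encoder(images), text_encoder(texts))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    val_loss = float(loss)                              # stand-in for a real validation loss
    if val_loss < best_val:
        best_val, bad_steps = val_loss, 0
    else:
        bad_steps += 1
        if bad_steps >= patience:                       # stop early to avoid overfitting
            break
```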
The present invention devises new text labels for the inner cell mass of blastocysts, based on the text descriptions in the Gardner scoring system and a time-lapse system, in place of the traditional letter grades, as shown in Table 1. These text labels provide more information to aid in assessing blastocyst quality. By combining the image data and the text labels, the invention constructs an image-text multi-modal model that evaluates blastocyst quality more comprehensively and accurately.
Table 1: text labels replace corresponding letter ratings
4) Model performance verification
To verify the effectiveness of the model, model performance was evaluated with four metrics: accuracy, precision, recall, and F1-score. In all experiments, 5-fold cross-validation was performed to increase the robustness of the results. The text descriptions of blastocysts carry more semantic information than a single-modality network can use. In addition, the invention compares the performance of different image encoders in the classification experiments. The experimental results demonstrate the effectiveness of the proposed multi-modal method.
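The sketch below illustrates this evaluation protocol: stratified 5-fold cross-validation with accuracy, precision, recall, and F1-score. The random data and the nearest-centroid stand-in classifier are placeholders for the real embryo features, grade labels, and trained multi-modal model.

```python
# Sketch of the evaluation protocol: 5-fold cross-validation with accuracy, precision, recall, F1.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def cross_validate(features, labels, train_and_predict, n_splits=5):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(features, labels):
        preds = train_and_predict(features[train_idx], labels[train_idx], features[test_idx])
        acc = accuracy_score(labels[test_idx], preds)
        prec, rec, f1, _ = precision_recall_fscore_support(labels[test_idx], preds,
                                                           average="macro", zero_division=0)
        scores.append((acc, prec, rec, f1))
    return np.mean(scores, axis=0)   # mean accuracy, precision, recall, F1 over the 5 folds

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 16)), rng.integers(0, 3, size=100)   # placeholder features and grades

def nearest_centroid(X_tr, y_tr, X_te):
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)])
    return np.array([np.argmin(((centroids - x) ** 2).sum(axis=1)) for x in X_te])

print(cross_validate(X, y, nearest_centroid))
```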
5) Analysis of results
The results are shown in Table 2: the multi-modal Swin-Tiny model outperforms the best unimodal classification model (+1.5%) and the image retrieval model (+1.2%).
Table 2: model performance comparison results
To further analyze the grading ability of the model, the invention examines the recall for each level, as shown in fig. 3. Compared with the other two models, the model achieves the best performance in predicting grades A and C. In clinical practice, grade A and grade C blastocysts are critical when embryologists select blastocysts. Furthermore, the model makes the fewest cross-level prediction errors, i.e., predicting C for a label of A or vice versa. These findings indicate that the model can use the textual information about the blastocysts for more accurate blastocyst assessment.
The invention also evaluates the effect of each text word by computing the similarity between the word and the image patch embeddings. As shown in fig. 4, the maximum similarity of the words most relevant to the image patches in each grade is ordered from small to large. Words with higher similarity, such as "vacuoles" for grade C, play a more critical role.
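A minimal sketch of this word-to-patch similarity analysis is shown below: cosine similarity is computed between each token embedding and each image patch embedding, and each word keeps its maximum similarity over patches. The tensor shapes are illustrative; the token and patch embeddings would come from the two encoders described above.

```python
# Sketch of the word-to-patch similarity analysis used to interpret the model.
import torch
import torch.nn.functional as F

def word_patch_similarity(token_embeddings, patch_embeddings):
    # token_embeddings: (num_tokens, dim); patch_embeddings: (num_patches, dim)
    tok = F.normalize(token_embeddings, dim=-1)
    patch = F.normalize(patch_embeddings, dim=-1)
    sim = tok @ patch.t()                      # (num_tokens, num_patches) similarity map
    return sim.max(dim=1).values               # each word's maximum similarity over all patches

word_scores = word_patch_similarity(torch.randn(12, 256), torch.randn(49, 256))
print(word_scores)
```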
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives, and variations may be made in the above embodiments by those skilled in the art without departing from the spirit and principles of the invention.

Claims (14)

1. A method of training a machine learning model for evaluating embryo quality, the machine learning model comprising an image feature extractor sub-model, a text feature extractor sub-model, and a matching degree prediction sub-model, the method comprising:
acquiring image information of a training embryo sample and text information corresponding to the training embryo sample;
inputting the image information into an image feature extractor sub-model to obtain image features of the training embryo sample; and
inputting the text information into a text feature extractor sub-model to obtain text features of the training embryo sample;
inputting the image features and the text features into the matching degree prediction sub-model, wherein the matching degree prediction sub-model outputs the degree of matching between the image features and the text features; and
training the image feature extractor sub-model, the text feature extractor sub-model, and the matching degree prediction sub-model based on the image features and the text features from the same training embryo sample having a high degree of matching, so as to obtain the machine learning model.
2. The method of claim 1, wherein the machine learning model further comprises an embryo quality assessment sub-model for generating embryo quality assessment results;
optionally, the quality assessment result is selected from at least one of an image-text description, an assessment score, an assessment level, and a regional assessment.
3. The method of claim 1, wherein the pre-trained model of image information and text information is selected from a CLIP model.
4. The method of claim 1, wherein the image feature extractor sub-model is selected from the Swin-Tiny model;
optionally, the image feature extractor sub-model is obtained by supervised training of a Swin-Tiny model using at least one of the following kinds of image information as feature parameters: basal follicles, follicles after ovarian stimulation, pre-transfer endometrium, oocytes, cleavage stage, and blastocyst stage.
5. The method of claim 1, wherein the text feature extractor sub-model is selected from a text Transformer model;
optionally, the text feature extractor sub-model is obtained by supervised training of a text Transformer model using at least one of physiological information, hormone information, medication information, follicular information, and embryo development information in the text as feature parameters;
optionally, the text features of the machine learning model are derived from the patient.
6. The method of claim 1, wherein the matching degree prediction sub-model is obtained by training based on the similarity between image features and text features;
optionally, the degree of matching corresponds to a degree of similarity between the image feature and the text feature.
7. The method of claim 1, wherein the image information further comprises an ultrasound image;
optionally, the ultrasound image comprises a B-mode ultrasound image.
8. The method of claim 1, wherein the image information of the machine learning model is selected from at least one of a single image, consecutive images, or different regions of images under a microscope.
9. An apparatus for training a machine learning model, the apparatus for evaluating embryo quality, the apparatus comprising:
the information acquisition module is used for acquiring image information of a training embryo sample and text information corresponding to the training embryo sample;
the feature acquisition module is used for acquiring image features of the training embryo sample and text features of the training embryo sample;
the matching degree prediction module is used for acquiring the matching degree of the image features and the text features;
a training module that trains the machine learning model based on the image features and the text features from the same training embryo sample having a high degree of matching;
wherein the machine learning model is trained by the method of any one of claims 1-8.
10. A system for assessing embryo quality, comprising:
the information acquisition unit is used for acquiring image information of the training embryo sample and text information corresponding to the training embryo sample;
the feature acquisition unit is used for acquiring image features of the training embryo sample and text features of the training embryo sample;
the matching degree prediction unit is used for acquiring the matching degree of the image features and the text features;
an embryo evaluation unit for evaluating embryo quality based on the machine learning model;
wherein the machine learning model is trained by the method of any one of claims 1-8.
11. The apparatus of claim 9 or the system of claim 10, wherein the image information further comprises an ultrasound image;
optionally, the ultrasound image comprises a B-mode ultrasound image.
12. The system of claim 10, wherein the quality assessment result is selected from at least one of an image-text description, an assessment score, an assessment level, and a regional assessment.
13. A computing device, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor for executing the computer program to implement the method of any one of claims 1 to 8.
14. A computer readable storage medium comprising computer instructions which, when executed by a computer, cause the computer to implement the method of any one of claims 1 to 8.
CN202310369281.7A 2023-04-07 2023-04-07 Embryo evaluation method and device based on multi-modal data Pending CN116434841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310369281.7A CN116434841A (en) 2023-04-07 2023-04-07 Embryo evaluation method and device based on multi-modal data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310369281.7A CN116434841A (en) 2023-04-07 2023-04-07 Embryo evaluation method and device based on multi-modal data

Publications (1)

Publication Number Publication Date
CN116434841A true CN116434841A (en) 2023-07-14

Family

ID=87084874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310369281.7A Pending CN116434841A (en) 2023-04-07 2023-04-07 Embryo evaluation method and device based on multi-modal data

Country Status (1)

Country Link
CN (1) CN116434841A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823831A (en) * 2023-08-29 2023-09-29 武汉互创联合科技有限公司 Embryo image fragment removing system based on cyclic feature reasoning
CN116823831B (en) * 2023-08-29 2023-11-14 武汉互创联合科技有限公司 Embryo image fragment removing system based on cyclic feature reasoning
CN117173163A (en) * 2023-11-01 2023-12-05 浙江同花顺智能科技有限公司 Portrait quality assessment method, system, device and readable storage medium

Similar Documents

Publication Publication Date Title
AU2018384082B2 (en) Systems and methods for estimating embryo viability
CN111279421B (en) Automatic evaluation of human embryos
CN116434841A (en) Embryo evaluation method and device based on multi-modal data
WO2020157761A1 (en) Automated evaluation of embryo implantation potential
CN111783854B (en) Intelligent embryo pregnancy state prediction method and system
CN111785375B (en) Embryo division process analysis and pregnancy rate intelligent prediction method and system
CN110543912B (en) Method for automatically acquiring cardiac cycle video in fetal key section ultrasonic video
CN107887025A (en) A kind of medical brain system
Leahy et al. Automated measurements of key morphological features of human embryos for IVF
JP2022536388A (en) Adaptive image processing method and system in assisted reproductive technology
Khosravi et al. Robust automated assessment of human blastocyst quality using deep learning
Payá et al. Automatic characterization of human embryos at day 4 post-insemination from time-lapse imaging using supervised contrastive learning and inductive transfer learning techniques
CN111383222A (en) Intervertebral disc MRI image intelligent diagnosis system based on deep learning
Erlich et al. Pseudo contrastive labeling for predicting IVF embryo developmental potential
AU2024200572A1 (en) Automated evaluation of quality assurance metrics for assisted reproduction procedures
Patil et al. Selection of single potential embryo to improve the success rate of implantation in IVF procedure using machine learning techniques
WO2023154851A1 (en) Integrated framework for human embryo ploidy prediction using artificial intelligence
CN115188413A (en) Chromosome karyotype analysis module
Fjeldstad et al. An artificial intelligence tool predicts blastocyst development from static images of fresh mature oocytes
Chen et al. Automating blastocyst formation and quality prediction in time-lapse imaging with adaptive key frame selection
Eswaran et al. Assessment of Human Blastocyst using Deep Learning Algorithm
Liu et al. Automated Morphological Grading of Human Blastocysts From Multi-Focus Images
Lockhart et al. Human embryo cell centroid localization and counting in time-lapse sequences
AU2019101174A4 (en) Systems and methods for estimating embryo viability
Eswaran et al. Deep Learning Algorithms for Timelapse Image Sequence-Based Automated Blastocyst Quality Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination