CN115187558A - Embryo development detection device and training platform thereof - Google Patents


Info

Publication number
CN115187558A
Authority
CN
China
Prior art keywords
output
module
embryo
image
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210856237.4A
Other languages
Chinese (zh)
Inventor
汪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aifu Technology Shanghai Co ltd
Original Assignee
Aifu Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aifu Technology Shanghai Co ltd filed Critical Aifu Technology Shanghai Co ltd
Priority to CN202210856237.4A
Publication of CN115187558A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30044Fetus; Embryo

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Geometry (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an embryo development detection device and a training platform thereof, the training platform comprising an embryo image enhancement module, an embryo image training set construction module, a model total loss judgment module and a parameter adjustment module. Embryo images corresponding to two different embryo development stages are fused by weighted addition and then used to construct an embryo characteristic training set corresponding to a selected embryo characteristic. After the embryo characteristic training set is fed into an initial embryo target detection model, at least one parameter of the initial model is modified according to the calculated total loss to generate an embryo target detection model corresponding to the embryo characteristic, so that a plurality of preset characteristics in an embryo image can be identified accurately and quickly. The invention also relates to an embryo development detection device generated by using the training platform.

Description

Embryo development detection device and training platform thereof
Technical Field
The invention relates to embryo development detection based on time-lapse photography, and in particular to an embryo development detection device and a training platform thereof.
Background
Identifying image data with artificial intelligence (AI) techniques has become one of the effective ways to reduce cost and improve efficiency in the medical field. Automated assessment of embryo growth and development has been achieved by applying artificial intelligence algorithms, in place of an embryologist's analysis, to time-lapse photography (Time-lapse) data from embryo culture.
Such time-lapse photography apparatuses are known in the art. By combining a high-resolution camera with an embryo incubator and continuously, dynamically monitoring and imaging the embryo development process at specific intervals, frequencies, angles, etc., morphological observation and analysis of embryo development can be performed without frequently disturbing the environment inside the embryo incubator.
In this regard, it is desirable to accurately identify and measure specific characteristics and parameters indicative of the developmental status of the embryo, such as identifying pronuclei, nucleoli, cell number, cell shape and area, etc., in time series images of the embryo taken by time-lapse photography equipment.
In the prior art, in order to detect a specific target in an image, a large number of candidate-region frames that may contain the target object are often generated, a classifier is used to determine whether each frame contains the target object and the probability or confidence of the class to which it belongs, and the frames are refined through post-processing to eliminate repeated detections. For example, CN111539308A discloses locating the border of a blastomere region in an embryo image by using a Faster RCNN target detection neural network, which first extracts a feature map of the embryo image with a convolutional neural network, obtains a plurality of proposed regions with a region proposal network, pools each proposed region with an ROI-Align layer, and finally performs border regression and logistic regression with a fully connected layer to obtain the border of the blastomere region.
In such target detection systems based on sliding windows or candidate regions, the classifier only obtains local information of the image, so context information cannot be well utilized when detecting an object. Erroneous object information is therefore easily predicted on the background, which makes the recognition results unreliable for embryo images that require accuracy.
Disclosure of Invention
The invention aims to provide a target detection model that quickly and accurately identifies small targets in an embryo image, and a training platform that can quickly train a plurality of different target detection models on the same data set.
The invention relates to a training platform of an embryo development detection device, which comprises an embryo image enhancement module, an embryo image training set construction module, a model total loss judgment module and a parameter adjustment module, wherein the embryo image enhancement module receives two image groups of an embryo, the two image groups respectively comprise a plurality of embryo images obtained by carrying out time-lapse photography on the embryo, and the two image groups respectively correspond to two different embryo development stages; the embryo image enhancement module performs image fusion on the plurality of embryo images of the two image groups in a weighted addition mode to serve as an input image of the embryo image training set construction module; the embryo image training set construction module labels the input image according to the selected embryo characteristics so as to construct an embryo characteristic training set corresponding to the embryo characteristics; after the embryo characteristic training set is sent into an embryo target detection initial model, the model total loss judgment module calculates the total loss of the embryo target detection initial model according to a loss function; and the parameter adjusting module modifies at least one parameter in the embryo target detection initial model according to the total loss so as to generate an embryo target detection model M1 corresponding to the embryo characteristic.
Preferably, the model total loss determining module calculates the total loss by using a BCEWithLogitsLoss loss function and a CIOU loss function.
Preferably, the parameter adjustment module calculates the gradient of at least one parameter in the embryo target detection initial model M0 by back propagation and optimizes the parameter with the Adam optimization algorithm.
Preferably, the initial embryo target detection model is a path aggregation network, and comprises a slice layer, a first downsampling layer, a second downsampling layer, a third downsampling layer, a spatial pyramid layer, a first upsampling layer, a second upsampling layer, a first output layer, a second output layer, a third output layer and a detection layer which are sequentially connected; wherein the slice layer receives the training set of embryo characteristics; the first up-sampling layer comprises a first residual convolution module and a first up-sampling module which are connected in sequence, and an output image of the first up-sampling module is connected with an output of the third down-sampling layer to be used as an input image of the second up-sampling layer; the second up-sampling layer comprises a second residual convolution module and a second up-sampling module which are connected in sequence, and an output image of the second up-sampling module is used as an input image of the first output layer; the first output layer comprises a first output module and connects the output image of the second up-sampling module with the output of the second down-sampling module as the input image of the first output module; the second output layer comprises a second output module, and the output image of the first output module is connected with the output of the second residual convolution module after convolution operation to be used as the input image of the second output module; the third output layer comprises a third output module, and the output image of the second output module is connected with the output of the first residual convolution module after convolution operation to be used as the input image of the third output module; the detection layer comprises a detection module, and the output images of the first output layer, the second output layer and the third output layer are respectively used as input images of the detection layer.
The invention also relates to an embryo development detection device, which receives an embryo image obtained by carrying out time-lapse photography on an embryo, and outputs characteristic classification information and positioning information of the embryo image by using a pre-trained embryo target detection model, wherein the embryo target detection model comprises a slice layer, a first down-sampling layer, a second down-sampling layer, a third down-sampling layer, a spatial pyramid layer, a first up-sampling layer, a second up-sampling layer, a first output layer, a second output layer, a third output layer and a detection layer which are sequentially connected; wherein: the slice layer receives the embryo characteristic training set; the first up-sampling layer comprises a first residual convolution module and a first up-sampling module which are connected in sequence, and an output image of the first up-sampling module is connected with an output of the third down-sampling layer to be used as an input image of the second up-sampling layer; the second up-sampling layer comprises a second residual convolution module and a second up-sampling module which are connected in sequence, and an output image of the second up-sampling module is used as an input image of the first output layer; the first output layer comprises a first output module and connects the output image of the second up-sampling module with the output of the second down-sampling module as the input image of the first output module; the second output layer comprises a second output module, and the output image of the first output module is connected with the output of the second residual convolution module after convolution operation to be used as the input image of the second output module; the third output layer comprises a third output module, and the output image of the second output module is connected with the output of the first residual convolution module after convolution operation to be used as the input image of the third output module; the detection layer comprises a detection module, and the output images of the first output layer, the second output layer and the third output layer are respectively used as input images of the detection layer.
In this way, efficient extraction of embryo image data during the in-vitro culture stage can be realized, the workload of embryologists in data acquisition and judgment is reduced, and working efficiency is improved.
Compared with the prior art, the target detection network model disclosed by the invention directly predicts, from a whole image, the frame coordinates, the confidence, and the class probability of the object contained in the frame. Because the target detection network model of the invention always sees the information of the whole image during training and testing, context information can be well utilized during object detection, so that erroneous object information is rarely predicted on the background. In addition, since the object detection process is performed within one neural network, detection performance can be optimized in an end-to-end manner. Furthermore, compared with Faster RCNN, the initial target detection network model of the invention reduces background errors and is less likely to predict non-existent objects on the background.
Drawings
The foregoing summary, as well as the following detailed description of the invention, will be better understood when read in conjunction with the appended drawings. It is to be noted that the appended drawings are intended as examples of the claimed invention. In the drawings, like reference characters designate the same or similar elements.
Fig. 1 shows a schematic diagram of the components of a training platform 100 of the embryo development testing device of the present invention.
Fig. 2 shows the overall structure schematic diagram of the embryo target detection initial model M0 and the embryo target detection model M1 of the present invention.
FIG. 3 is a schematic diagram showing the slice structure of the embryo target detection initial model M0 and the embryo target detection model M1 according to the present invention.
FIG. 4 is a schematic diagram of the first down-sampling layer structure of the embryo target detection initial model M0 and the embryo target detection model M1 according to the present invention.
Fig. 5 is a schematic diagram illustrating the structures of the second down-sampling layer and the third down-sampling layer of the embryo target detection initial model M0 and the embryo target detection model M1 according to the present invention.
Fig. 6 is a schematic diagram of the spatial pyramid layer structure of the embryo target detection initial model M0 and the embryo target detection model M1 according to the present invention.
FIG. 7 is a schematic structural diagram of the first residual convolution module and the second residual convolution module of the embryo target detection initial model M0 and the embryo target detection model M1 according to the present invention.
FIG. 8 is a schematic structural diagram of the first output module, the second output module and the third output module of the initial embryo target detection model M0 and the embryo target detection model M1 according to the present invention.
FIG. 9 is a schematic structural diagram of the detection module D of the embryo target detection initial model M0 and the embryo target detection model M1 according to the present invention.
FIG. 10 shows an embryonic target detection raw input image and a detected output image of the present invention.
Detailed Description
The detailed features and advantages of the invention are described in detail in the detailed description which follows, and will be readily apparent to those skilled in the art from the description, claims and drawings.
In the prior art, an original image set PP consisting of a plurality of images of an embryo is obtained by performing time-lapse (Time-lapse) photography of the embryo covering a plurality of embryo stages. Such time-lapse photography apparatuses are known in the art: by combining a high-resolution camera with an embryo incubator and continuously, dynamically monitoring and photographing the embryo development process at specific intervals, frequencies, angles, etc., morphological observation and analysis of the embryo development process can be performed without frequently disturbing the environment inside the embryo incubator.
The first aspect of the present invention relates to a training platform 100 for an embryo development testing apparatus. As shown in fig. 1, the training platform 100 includes an embryo image enhancement module 1, an embryo image training set construction module 2, a model total loss judgment module 3, and a parameter adjustment module 4.
[ embryo image enhancement Module ]
The embryo image training set construction module 2 is used for labeling the input image according to the selected embryo characteristics and constructing an embryo characteristic training set corresponding to the embryo characteristics.
The embryo image enhancement module 1 receives at least two image groups Ga and Gb of embryos, wherein the two image groups Ga and Gb respectively comprise a plurality of embryo images obtained by carrying out time-delay shooting on the embryos, and the two image groups Ga and Gb respectively correspond to two different embryo development stages; and the embryo image enhancement module 1 performs image fusion on a plurality of embryo images of the two image groups Ga and Gb in a weighted addition mode to be used as input images of the embryo image training set construction module 2.
Any two embryo images from different embryo stages can be fused, with a fusion ratio lam = 0.7. When adding, the corresponding pixel values of the two embryo images are directly weighted and added, i.e. inputs = 0.7*images + 0.3*images_random, where images may be an image from image group Ga, corresponding to the first stage of embryo development, and images_random may be an image from image group Gb, corresponding to one of the remaining embryo development stages different from the first stage, for example the fourth stage. This helps compensate for the increased blurriness of embryo images after the fourth stage and increases the accuracy of identifying embryo images after the fourth stage. On this basis, in the subsequent training stage, the label information of any two embryo images from different embryo stages is also fused to construct a new data set for training, which improves the generalization capability of the model.
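As a concrete illustration of the weighted-addition fusion above, the following sketch assumes two grayscale embryo images of equal size loaded as NumPy arrays; the function name fuse_embryo_images and the clipping to the 8-bit range are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def fuse_embryo_images(images: np.ndarray, images_random: np.ndarray, lam: float = 0.7) -> np.ndarray:
    """Pixel-wise weighted addition of two embryo images from different development stages.

    images        -- image from group Ga (e.g. first embryo development stage)
    images_random -- image from group Gb (a different stage, e.g. the fourth)
    lam           -- fusion ratio; 0.7 follows the text (inputs = 0.7*images + 0.3*images_random)
    """
    assert images.shape == images_random.shape, "both images must have the same size"
    fused = lam * images.astype(np.float32) + (1.0 - lam) * images_random.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)
```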
In addition, the embryo image enhancement module 1 can also perform various enhancement processes on the embryo image, including but not limited to image generation, image fusion, image resizing, rotation, flipping, and the like.
In one embodiment of the invention, the embryo image enhancement module can normalize the pixel data of each embryo image to between [-1, 1]; specifically, the mean and variance of the image data can be calculated, and the data are normalized by subtracting the mean from the raw data and dividing by the variance, so that the model converges more easily. On this basis, data enhancement can be applied to the normalized data, including: with a probability of 0.5, flipping the image in either the vertical or horizontal direction.
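A minimal sketch of the normalization and random-flip augmentation described above, assuming a single-channel image stored as a NumPy array; the epsilon guard and the use of numpy.random.Generator are implementation assumptions.

```python
import numpy as np

def normalize_and_flip(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Normalize pixel data towards [-1, 1] and flip with probability 0.5, following the text above."""
    img = image.astype(np.float32)
    img = (img - img.mean()) / (img.var() + 1e-8)   # the text subtracts the mean and divides by the variance
    if rng.random() < 0.5:                          # with probability 0.5, flip the image
        axis = 0 if rng.random() < 0.5 else 1       # 0: vertical flip, 1: horizontal flip
        img = np.flip(img, axis=axis)
    return img

# example usage:
# augmented = normalize_and_flip(raw_embryo_image, np.random.default_rng(0))
```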
In another embodiment of the present invention, the embryo image enhancement module 1 can automatically adapt embryo images whose pixel sizes range from 320 to 960 pixels to the 640 × 640 pixel input matched to the initial embryo target detection model M0 adopted in the present invention, so that model training can be performed on original embryo images of different pixel sizes, the information loss caused by padding or scaling the original images is avoided, and the accuracy of the embryo target detection model M1 generated by the present invention in detecting small targets in the embryo is improved.
[ embryo image dataset construction Module ]
The embryo image data set construction module 2 carries out various annotations on the time-lapse video data to generate a data set.
According to the different characteristics of interest, the embryo image data processed by the embryo image enhancement module are labeled. The preset selectable labeling content can include: the time points of cell division, including the first, second and third cleavage times, or the times from 1 cell to 2 cells, 3 cells, 4 cells and 5 cells; cell area (the area of each cell when the embryo divides into two cells); pronucleus differentiation (female and male pronuclei), the time of pronucleus appearance, the time of pronucleus disappearance, 8 hours before pronucleus disappearance, and the areas of the male and female pronuclei; the number of nucleoli, the distribution of nucleoli three hours before pronucleus disappearance, and so on.
The labeled data are then used to generate corresponding data sets, including: a division time point data set, a cell area data set, a male and female pronucleus change time point data set, a pronucleus area data set, a nucleolus number data set, a nucleolus distribution pattern data set, and the like.
In the invention, image annotation can be completed manually by doctors or trained workers, or by an automatic annotation module.
[ initial model of object detection network ]
The overall structure of the target detection network initial model M0 of the present invention is shown in fig. 2.
After the embryo characteristic training set is sent into the embryo target detection initial model M0, the model total loss judgment module 3 calculates the total loss of the embryo target detection initial model M0 according to the loss function, and the parameter adjustment module 4 modifies at least one parameter in the embryo target detection initial model M0 according to the total loss to generate the embryo target detection model M1 corresponding to the embryo characteristic. Therefore, the initial target detection model M0 has the same structure as the embryo target detection model M1 generated by the training platform according to the first aspect of the present invention, but the parameter settings of the components are different.
The initial target detection network model M0 samples information in the PANet manner, that is, a bottom-up path augmentation is added, so that the embryo image information is fully utilized, the image features are enhanced, and information loss is avoided.
Referring to fig. 2, the initial embryo target detection model M0 is a path aggregation network, and includes a slice layer, a first downsampling layer, a second downsampling layer, a third downsampling layer, a spatial pyramid layer, a first upsampling layer, a second upsampling layer, a first output layer, a second output layer, a third output layer, and a detection layer, which are connected in sequence, that is, the output of each layer is used as the input of the next layer.
Wherein the slice layer receives the training set of embryo characteristics; the first up-sampling layer comprises a first residual convolution module RES1 and a first up-sampling module LOS1 which are connected in sequence, and the output image of the first up-sampling module LOS1 is connected with the output of the third down-sampling layer to be used as the input image of the second up-sampling layer; the second up-sampling layer comprises a second residual convolution module RES2 and a second up-sampling module LOS2 which are sequentially connected, and the output image of the second up-sampling module LOS2 is used as the input image of the first output layer;
the first output layer comprises a first output module LO1 and connects the output image of the second up-sampling module LOs2 with the output of the second down-sampling module LUS2 as an input image of the first output module LO 1;
the second output layer comprises a second output module LO2, and the output image of the first output module LO1 is connected with the output of the second residual convolution module RES2 after the convolution operation conv to be used as the input image of the second output module LO 2;
the third output layer comprises a third output module LO3, and connects the output image of the second output module LO2 with the output of the first residual convolution module RES1 after the convolution operation conv, as the input image of the third output module LO 3;
the detection layer comprises a detection module D, and output images of the first output layer, the second output layer and the third output layer are respectively used as input images of the detection module.
Each layer and its composition are described in detail below.
[ sliced layer ]
The Slice layer comprises a Slice module F comprising a Slice operation Slice, a join operation Concat, and a Slice layer CBS operation, wherein:
slicing operation: firstly, receiving original input images with the size of 640 x 640 and the number of channels of 3, and carrying out three-channel slicing operation on the original input images to generate four slices with the size of 320 x 320;
in the concatenating operation concat, the four slices with the size of 320 × 320 are concatenated, that is, the image with the same size is subjected to channel expansion, so as to obtain a 12-channel 320 × 320 image;
then, a CBS operation set to {32*12*3*3} is performed, where the CBS operation includes three steps of convolution (C), normalization (B), and SiLU (S): 32 represents that 32 convolution kernels are used in the convolution operation, 12 represents 12 input channels, and 3*3 represents the convolution kernel size; normalization (B) and SiLU (S) are known in the art. The result is a 32-channel 320 × 320 image.
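The slice module F and the CBS operation described above can be sketched in PyTorch as follows; the class names and the stride/padding choices are assumptions made for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """CBS operation: convolution (C) + batch normalization (B) + SiLU activation (S)."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SliceModuleF(nn.Module):
    """Slice layer: four 320x320 slices of the 640x640 input are concatenated (3 -> 12 channels), then CBS {32*12*3*3}."""
    def __init__(self):
        super().__init__()
        self.cbs = CBS(in_ch=12, out_ch=32, k=3, s=1)

    def forward(self, x):                       # x: (N, 3, 640, 640)
        slices = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)  # (N, 12, 320, 320)
        return self.cbs(slices)                 # (N, 32, 320, 320)
```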
[ first lower sampling layer ]
The first downsampling layer comprises a first downsampling module LUS1, see fig. 4, the first downsampling module LUS1 comprising a convolution operation conv, a single residual network of a combination of multiple CBS operations, a pixel overlap add, a connection operation concat, and an additional CBS operation, wherein:
in the convolution operation conv, 320 × 320 images of 32 channels from the slice layer output are received and subjected to convolution operations of 64 convolution kernels, 32 channels and 3*3 convolution kernels, and then 64 channels 160 × 160 images are obtained;
the single residual network is combined by using four CBS operations, as shown in fig. 4:
the down-sampling layer first CBS operation C1, set to {32*64*1*1}, receives the 64-channel 160 × 160 image output by the convolution operation conv and outputs a 32-channel 160 × 160 image;
the down-sampling layer second CBS operation C2, set to {32*32*1*1}, receives the 32-channel 160 × 160 image output by the first CBS operation C1 and outputs a 32-channel 160 × 160 image;
the down-sampling layer third CBS operation C3, set to {32*32*3*3}, receives the 32-channel 160 × 160 image output by the second CBS operation C2 and outputs a 32-channel 160 × 160 image;
the down-sampling layer fourth CBS operation C4, like the first CBS operation C1, receives the 64-channel 160 × 160 image output by the convolution operation conv, is likewise set to {32*64*1*1}, and outputs a 32-channel 160 × 160 image;
in the pixel superposition step add, the 32-channel 160 × 160 image output by the third CBS operation C3 is added pixel-wise to the 32-channel 160 × 160 image output by the first CBS operation C1, and a 160 × 160 image that is still 32 channels is output;
in the connection operation concat, the 32-channel 160 × 160 image after pixel superposition is connected with the 32-channel 160 × 160 image output by the fourth CBS operation C4, and a 64-channel 160 × 160 image is output;
finally, a further CBS operation set to {64*64*1*1} is performed on the 64-channel 160 × 160 image output by the connection operation, and a 64-channel 160 × 160 image is output as the output of the first down-sampling layer.
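Read literally, the first down-sampling layer can be sketched in PyTorch as below. The truncated settings {32 x 1} and {32 x 3} in the text are interpreted here as {32*32*1*1} and {32*32*3*3}, and the stride-2 convolution is inferred from the 320 → 160 size change; both points are assumptions.

```python
import torch
import torch.nn as nn

def cbs(in_ch: int, out_ch: int, k: int) -> nn.Module:
    """CBS operation (convolution + batch normalization + SiLU), as defined for the slice layer above."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(out_ch), nn.SiLU())

class LUS1(nn.Module):
    """First down-sampling module as read from the description of Fig. 4; a sketch, not the verified model."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # 32-ch 320x320 -> 64-ch 160x160
        self.c1 = cbs(64, 32, 1)   # {32*64*1*1}
        self.c2 = cbs(32, 32, 1)   # {32*32*1*1}
        self.c3 = cbs(32, 32, 3)   # {32*32*3*3}
        self.c4 = cbs(64, 32, 1)   # {32*64*1*1}
        self.out = cbs(64, 64, 1)  # {64*64*1*1}

    def forward(self, x):
        y = self.conv(x)                          # 64-channel 160x160
        a = self.c1(y)
        b = self.c3(self.c2(a)) + a               # pixel-wise add of the C3 and C1 outputs
        return self.out(torch.cat([b, self.c4(y)], dim=1))   # concat -> 64 channels -> final CBS
```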
(second downsampling layer)
The second downsampling layer includes a second downsampling module LUS2, as shown in fig. 5.
For the second downsampling module LUS2, the settings 20 to 29 are: {64*128*3*3}, {64*128*1*1}, {64*64*1*1}, {64*64*3*3}, {64*64*1*1}, {64*64*3*3}, {64*64*1*1}, {64*64*3*3}, {128*128*1*1}; it receives the 64-channel 160 × 160 image output by the first down-sampling layer and converts it into a 128-channel 80 × 80 image, which, after the residual network and the CBS operations, is output as a 128-channel 80 × 80 image.
(third downsampling layer)
The third downsampling layer includes a third downsampling module LUS3 having the same structure as the second downsampling module LUS2, as shown in fig. 5.
For the third downsampling module LUS3, the settings 30 to 39 are: {128*256*3*3}, {128*256*1*1}, {128*128*1*1}, {128*128*3*3}, {128*128*1*1}, {128*128*3*3}, {128*128*1*1}, {128*128*3*3}, {245*256*1*1}; it receives the 128-channel 80 × 80 image output by the second downsampling module LUS2 and converts it into a 256-channel 40 × 40 image, which, after the residual network and the CBS operations, is output as a 256-channel 40 × 40 image.
[ spatial pyramid layer ]
The spatial pyramid layer comprises spatial pyramid modules LSPP, as shown in fig. 6.
The spatial pyramid module LSPP receives the 256-channel 40 × 40 image from the third downsampling layer, and performs a convolution operation conv with a convolution kernel size 3*3 and a convolution kernel number of 512 to obtain a 512-channel 20 × 20 image;
the CBS operation once set to {256 × 512 × 1} is again used, resulting in an image of 256 channels 20 × 20;
then, maximum pooling operation and depth splicing are respectively carried out to obtain 1024-channel 20 × 20 images; finally, after one CBS operation set to {512 × 1024 × 1} an image of 512 channels 20 × 20 is output.
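A PyTorch sketch of the spatial pyramid module as described above; the max-pooling kernel sizes (5, 9, 13) are not stated in the text and are assumed here, as are the stride and padding of the initial convolution.

```python
import torch
import torch.nn as nn

def cbs(in_ch: int, out_ch: int, k: int) -> nn.Module:
    """CBS operation (convolution + batch normalization + SiLU)."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(out_ch), nn.SiLU())

class LSPP(nn.Module):
    """Spatial pyramid module sketch; the pooling kernel sizes (5/9/13) are assumptions."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(256, 512, 3, stride=2, padding=1)   # 256-ch 40x40 -> 512-ch 20x20
        self.cbs1 = cbs(512, 256, 1)                               # {256*512*1*1}
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13))
        self.cbs2 = cbs(1024, 512, 1)                              # {512*1024*1*1}

    def forward(self, x):
        y = self.cbs1(self.conv(x))                                # 256-channel 20x20
        y = torch.cat([y] + [p(y) for p in self.pools], dim=1)     # depth concat -> 1024-channel 20x20
        return self.cbs2(y)                                        # 512-channel 20x20
```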
[ first Up-sampling layer ]
The first upsampling layer comprises, in order, a first residual convolution module RES1 and a first upsampling module LOS1.
For the first residual convolution module RES1, see fig. 7, the settings 40 to 45 are: {256 × 512 × 1}, {256 × 1 × 3}, {512 × 1}, {256 × 512 × 1}, and {256 × 512 × 1}; thereby receiving the 512 channel 20 by 20 image output by the spatial pyramid layer and outputting the 256 channel 20 by 20 image.
In the first up-sampling module LOS1, the 256-channel 20 × 20 image is subjected to the up-sampling operation of 2*2 by the convolution kernel, and a 256-channel 40 × 40 image is output.
Thereafter, the 256-channel 40 × 40 image output from the third down-sampling layer LUS3 is connected to the 256-channel 40 × 40 image output from the first up-sampling module LOS1 through the connection operation concat, and a 512-channel 40 × 40 image is output.
[ second Up-sampling layer ]
The second upsampling layer comprises a second residual convolution module RES2 and a second upsampling module LOS2.
The second residual convolution module RES2 has the same structure as the first residual convolution module RES1, as shown in fig. 7; for the second residual convolution module RES2, the settings 50 to 55 are: {128*512*1*1}, {128*128*1*1}, {128*128*3*3}, {256*256*1*1}, {128*256*1*1}, {256*512*1*1}; it receives the 512-channel 40 × 40 image and outputs a 128-channel 40 × 40 image; then, in the second up-sampling module LOS2, the 128-channel 40 × 40 image is up-sampled with a 2*2 convolution kernel, and a 128-channel 80 × 80 image is output.
[ first output layer ]
The first output layer includes a primary connection operation concat and a first output module LO1.
First, the 128-channel 80 × 80 image output by the second up-sampling module LOS2 is received; in the connecting operation concat, the 128-channel 80 × 80 image output by the second down-sampling layer is connected with the 128-channel 80 × 80 image from the second up-sampling module LOS2, resulting in a 256-channel 80 × 80 image.
The first output module LO1 has the structure shown in fig. 8; the settings 60 to 64 are {64*256*1*1}, {64*64*1*1}, {64*64*3*3}, {128*128*1*1}, and {64*256*1*1} respectively; in the first output module LO1, the 256-channel 80 × 80 image is subjected to the CBS operations shown in fig. 8 and their combination, and a 128-channel 80 × 80 image is output.
[ second output layer ]
The second output layer comprises in turn a convolution operation conv, a join operation concat and a second output module LO2.
In the convolution operation conv, the 128-channel 80 × 80 image from the first output module LO1 is subjected to 128 convolution operations with a convolution kernel size of 3*3 and a step size of 2, resulting in a 128-channel 40 × 40 image.
In the concatenating operation concat, the 128-channel 40 × 40 image is concatenated with the 128-channel 40 × 40 image of the second residual convolution block RES2 of the second upsampled layer, resulting in a 256-channel 40 × 40 image.
The second output module LO2 has the same structure as the first output module LO1, as shown in fig. 8; for the second output module LO2, the settings 70 to 74 are {128*256*1*1}, {128*128*1*1}, {128*128*3*3}, {256*256*1*1}, and {128*256*1*1} respectively, so that the 256-channel 40 × 40 image is subjected to the multiple CBS operations shown in fig. 8 and their combination in the second output module LO2, and a 256-channel 40 × 40 image is output.
[ third output layer ]
The third output layer comprises in sequence a convolution operation conv, a join operation concat and a third output module LO3.
In the convolution operation conv, 256 channels 40 × 40 images from the second output module LO2 are subjected to 256 convolution operations with a convolution kernel of 3*3 and a step size of 2, to obtain 256 channels 20 × 20 images.
In the connecting operation concat, connecting the 256-channel 20 × 20 image with the 256-channel 20 × 20 image of the first residual convolution module RES1 of the first upsampling layer to obtain a 512-channel 20 × 20 image;
the third output module LO3 has the same structure as the first output module LO1 and the second output module LO2, as shown in fig. 8, for the third output module LO2, the settings 80 to 84 are {256 × 512 × 1}, {256 × 1}, respectively {256 × 3}, {512 × 1}, and {256 × 512 × 1}; thus, in the third output module LO2, the 256-channel 40 × 40 image is subjected to the CBS operations and the combination thereof as shown in fig. 8, and then a 512-channel 20 × 20 image is output.
[ detection layer ]
As shown in fig. 9, the detection layer includes a detection module D, which comprises three anchor frame layers DML1, DML2 and DML3 whose receptive fields are 8*8, 16 × 16 and 32 × 32, respectively, corresponding to the 80 × 80 feature map grids of the first output layer, the 40 × 40 feature map grids of the second output layer, and the 20 × 20 feature map grids of the third output layer.
Each anchor frame layer receives output pictures from the first, second, and third output layers, respectively, and each anchor frame layer DML1, DML2, DML3 has three prediction frames, respectively.
Taking as an example the first anchor frame layer DML1 receiving the output image from the first output layer: the 128-channel 80 × 80 image output by the first output layer is sent to the first anchor frame layer DML1, and the first anchor frame layer DML1 predicts on it through the three prediction frames F11, F12 and F13 that it contains; that is, each of the 80 × 80 feature map grids is convolution-identified by the three prediction frames F11, F12 and F13 of the first anchor frame layer DML1, so that the output image of the first output layer yields 3 × 80 × 80 prediction results after passing through the first anchor frame layer DML1, and each piece of prediction result information includes the target class, the target center point, the width and height coordinates, and the confidence.
That is, the first anchor frame layer DML1 obtains the 80 × 80 feature map grids of the 128-channel 80 × 80 image of the first output layer; the receptive field of each feature map grid is 640/80 = 8 × 8, which is responsible for detecting small targets, and information on 3 × 80 × 80 small-size detection targets is output.
Similarly, the second anchor frame layer DML2 obtains the 40 × 40 feature map grids of the image of the second output layer; each feature map grid has a receptive field of 640/40 = 16 × 16 and is responsible for detecting medium targets; the 40 × 40 feature map grids are subjected to convolution operations in combination with the three prediction frames F21, F22 and F23 in the second anchor frame layer DML2, and information on 3 × 40 × 40 medium-size detection targets is output.
Similarly, the third anchor frame layer DML3 obtains the 20 × 20 feature map grids of the image of the third output layer; each feature map grid has a receptive field of 640/20 = 32 × 32 and is responsible for detecting large targets; the 20 × 20 feature map grids are subjected to convolution operations in combination with the three prediction frames F31, F32 and F33 in the third anchor frame layer DML3, and information on 3 × 20 × 20 large-size detection targets is output.
Finally, according to the confidence obj of each grid output by the detection layer, it is judged whether the grid contains an embryo target. Specifically, the threshold of the confidence obj is set to 0.5, that is, targets whose confidence obj is not more than 0.5 are removed; the prediction frames in the target information are then screened with a non-maximum suppression algorithm to remove rectangular frames that repeatedly detect the same embryo target; finally, according to the classification probability values in the target information, the information of the target with the maximum embryo class probability value is retained, and the class cls, the confidence obj, the center point and the width and height coordinates of the corresponding embryo target are output to generate a rectangular frame D1.
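The thresholding and non-maximum suppression step can be sketched as follows; the prediction tensor layout, the 0.45 IoU threshold, and the use of torchvision's nms are assumptions for illustration, while the 0.5 confidence threshold comes from the text.

```python
import torch
from torchvision.ops import nms

def postprocess(pred: torch.Tensor, conf_thres: float = 0.5, iou_thres: float = 0.45) -> torch.Tensor:
    """Post-processing for one image.

    pred: (num_boxes, 6) tensor with columns [cx, cy, w, h, obj, cls_prob]; the column layout
    and the 0.45 NMS threshold are assumptions, only the 0.5 confidence threshold is from the text.
    """
    pred = pred[pred[:, 4] > conf_thres]               # drop targets whose confidence obj is not more than 0.5
    boxes = torch.empty_like(pred[:, :4])              # convert center/width/height to corner coordinates
    boxes[:, 0] = pred[:, 0] - pred[:, 2] / 2
    boxes[:, 1] = pred[:, 1] - pred[:, 3] / 2
    boxes[:, 2] = pred[:, 0] + pred[:, 2] / 2
    boxes[:, 3] = pred[:, 1] + pred[:, 3] / 2
    keep = nms(boxes, pred[:, 4], iou_thres)           # remove repeated detections of the same embryo target
    return pred[keep]
```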
Fig. 10 (a) shows an original image fed into the embryo development detection device of the present invention, and fig. 10 (c) shows the detection result of the embryo development detection device of the present invention, i.e. the rectangular frame D1 generated based on the first output layer, which characterizes the size and accurate position of the embryo. Specifically, the identifier "t3" in the upper left corner of the rectangular frame D1 is the embryo classification cls corresponding to the detected embryo image, "0.92" is the confidence obj of the classification result, and the rectangular frame D1 is generated from the center point coordinates and the width and height output by the detection model.
[ model Total loss judgment Module ]
The model total loss judgment module 3 calculates the embryo total loss by adopting a BCEWithLogitsLoss loss function and a CIOU loss function.
Specifically, in the invention, the total loss of the model consists of three loss terms, namely the embryo rectangular frame loss box_loss, the confidence loss obj_loss, and the embryo classification probability loss cls_loss, and the total embryo loss is the weighted sum of these three losses.
The rectangular frame characterizes the size and accurate position of the embryo, and the rectangular frame loss box_loss calculates the distance error between the predicted embryo rectangular frame D1 output by the detection layer and the rectangular frame coordinates in the embryo label of the actual picture.
The rectangular frame loss box _ loss adopts a CIOU loss function, and the formula is as follows:
box_loss = 1 - CIOU = 1 - IOU + Distance_2^2 / Distance_C^2 + α*v
v = (4/π^2) * (arctan(w_gt/h_gt) - arctan(w_p/h_p))^2
α = v / ((1 - IOU) + v)
as shown in fig. 10 (b), wherein:
Distance_2: the Euclidean distance between the center point of the embryo prediction frame RG and the center point of the embryo true frame RR;
Distance_C: the diagonal distance of C (the distance between the upper-left and lower-right corners of the frame RY);
C: the minimum enclosing rectangle of the embryo prediction frame RG and the embryo true frame RR, shown as the frame RY in the figure;
v: the aspect ratio influencing factor (where w is the width, h is the height, gt denotes the embryo true frame RR, and p denotes the embryo prediction frame RG).
The IOU is the intersection ratio of the rectangular frame output by the embryo detection model and the rectangular frame of the original label, as shown in fig. 10 (b). Firstly, the intersection area of two rectangular frames is calculated:
S1 = (xp2 - xl1) * (yp2 - yl1)
Then the union area is calculated:
S2 = (xp2 - xp1) * (yp2 - yp1) + (yl2 - yl1) * (xl2 - xl1) - S1
the calculation of the IOU is therefore:
IOU=S1/S2
the CIOU adds the overlapping area, the distance of the central point and the aspect ratio of the prediction rectangular box into calculation on the basis of the IOU.
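A self-contained sketch of the IOU and CIOU computation for a pair of boxes given as corner coordinates (x1, y1, x2, y2), following the standard CIOU definition reconstructed above; the general intersection formula is used here instead of the simplified S1/S2 expressions, and the epsilon guards are implementation assumptions.

```python
import math

def ciou_loss(pred, gt, eps: float = 1e-7) -> float:
    """CIOU loss for two boxes (x1, y1, x2, y2): 1 - IOU + center-distance term + aspect-ratio term."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection and union areas (general form rather than the simplified S1/S2 expressions above).
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter + eps
    iou = inter / union
    # Squared center distance (Distance_2^2) and squared diagonal of the minimum enclosing box C (Distance_C^2).
    d2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    # Aspect-ratio influencing factor v and its weight alpha.
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                              - math.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + d2 / c2 + alpha * v

# example: ciou_loss((10, 10, 50, 60), (12, 8, 48, 58))
```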
The confidence degree represents the credibility of the predicted frame, the value range is 0-1, the larger the value is, the more possible embryos exist in the rectangular frame, and the confidence degree loss obj _ loss calculates the confidence degree of the network;
and (4) classifying the probability loss representing the category of the embryo, and calculating whether the category of the embryo output by the detection layer and the category of the embryo in the actual picture are correct or not by the classification probability loss cls _ loss.
The probability loss cls _ loss and confidence loss obj _ loss of the embryo classification adopt a BCEWithLogitsLoss loss function, and the formula is as follows:
Ln = -[yn * log σ(xn) + (1 - yn) * log(1 - σ(xn))], where σ is the sigmoid function
For example, in the calculation of the embryo classification probability loss cls_loss, yn indicates the embryo class in the label and xn is the predicted embryo class value output by the embryo detection model; in the calculation of the confidence loss obj_loss, yn indicates the CIOU between the prediction frame output by the embryo detection model and the label target frame of the original image (the CIOU is used as the confidence label of the prediction frame), and xn indicates the predicted confidence value output by the embryo detection model.
The total embryo loss is the weighted sum of the above three losses. In the present invention, the embryo confidence loss has the largest weight, and the embryo rectangular frame loss and the embryo classification loss come next, so that a = 0.4, b = 0.3, c = 0.3:
Loss = a*obj_loss + b*box_loss + c*cls_loss
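The two BCEWithLogitsLoss terms and the weighted total loss can be sketched with PyTorch as below; the tensor shapes are arbitrary placeholders and box_loss is stubbed with a constant, since only the weighting a = 0.4, b = 0.3, c = 0.3 comes from the text.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # sigmoid + binary cross-entropy, used for cls_loss and obj_loss

# Hypothetical batch: raw (pre-sigmoid) predictions xn and targets yn.
cls_pred, cls_target = torch.randn(8, 4), torch.randint(0, 2, (8, 4)).float()  # class logits vs. one-hot labels
obj_pred, obj_target = torch.randn(8), torch.rand(8)                           # objectness logits vs. CIOU labels

cls_loss = bce(cls_pred, cls_target)
obj_loss = bce(obj_pred, obj_target)
box_loss = torch.tensor(0.12)   # placeholder for the CIOU rectangular-frame loss

total_loss = 0.4 * obj_loss + 0.3 * box_loss + 0.3 * cls_loss   # Loss = a*obj_loss + b*box_loss + c*cls_loss
```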
[ MEANS FOR ADJUSTING PARAMETER MODULE ]
The parameter adjustment module 4 calculates the gradient of at least one parameter in the initial embryo target detection model M0 through back propagation, and optimizes the parameter through the Adam optimization algorithm.
Specifically, the Adam optimization function dynamically adjusts the learning rate of each parameter during embryo training by using the first moment estimate and the second moment estimate of the gradient. Compared with other optimization functions, after bias correction the learning rate of each iteration lies within a certain range, which makes the embryo model parameters more stable. The Adam optimization algorithm adopted by the invention performs the following updates:
t←t+1
calculating the gradient:
g_t ← ∇_θ f_t(θ_{t-1})
updating the biased first moment estimate:
m_t ← β1·m_{t-1} + (1-β1)·g_t
updating the biased second moment estimate:
v_t ← β2·v_{t-1} + (1-β2)·g_t^2
calculating the bias-corrected first moment estimate:
m̂_t ← m_t / (1 - β1^t)
calculating the bias-corrected second moment estimate:
v̂_t ← v_t / (1 - β2^t)
updating the parameters:
θ_t ← θ_{t-1} - α·m̂_t / (√v̂_t + ε)
Wherein the exponential moving average m_t of the gradient is calculated according to formula (1), with m_0 initialized to 0. The gradient momentum of previous time steps is accumulated, with reference to the Momentum algorithm. β1 is an exponential decay rate coefficient controlling the weight assignment (momentum versus current gradient); it is usually close to 1 and defaults to 0.9. g_t is the stochastic gradient value obtained at time t.
m_t ← β1·m_{t-1} + (1-β1)·g_t ……(1)
The exponential moving average v_t of the squared gradient is calculated according to equation (2), with v_0 initialized to 0. The coefficient β2 is an exponential decay rate that controls the influence of the previous squared gradients, with reference to the RMSProp algorithm; the squared gradients are weighted and averaged, with a default of 0.999.
v_t ← β2·v_{t-1} + (1-β2)·g_t^2 ……(2)
Because m_0 is initialized to 0, m_t is biased towards 0, especially in the early stage of embryo model training. Therefore, m_t is bias-corrected according to formula (3) to reduce the influence of this bias on the initial stage of embryo training.
m̂_t = m_t / (1 - β1^t) ……(3)
Similarly to m_0, because v_0 is initialized to 0, v_t is biased towards 0 in the initial stage of embryo training, and it is corrected according to equation (4):
v̂_t = v_t / (1 - β2^t) ……(4)
The parameters can then be updated according to equation (5): the initial learning rate α is multiplied by the ratio of the gradient mean to the square root of the gradient variance, where the default learning rate is α = 0.001 and ε = 10^-8 is set to prevent the divisor from becoming 0.
θ_t = θ_{t-1} - α·m̂_t / (√v̂_t + ε) ……(5)
It can be seen that the update step size is adaptively adjusted from the two angles of the gradient mean and the squared gradient during embryo model training, rather than being determined directly by the current gradient.
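A compact NumPy sketch of one Adam update following equations (1)-(5) above; the function signature and the explicit bookkeeping of m and v are illustrative choices.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following equations (1)-(5) above; array arguments are NumPy arrays."""
    m = beta1 * m + (1 - beta1) * grad               # (1) exponential moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2          # (2) exponential moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                     # (3) bias-corrected first moment estimate
    v_hat = v / (1 - beta2 ** t)                     # (4) bias-corrected second moment estimate
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # (5) parameter update
    return theta, m, v
```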
Adam has the following advantages over other optimization functions:
(1) the learning rate is automatically initialized during embryo training;
(2) the learning rate is automatically adjusted during embryo training;
(3) the method is suitable for scenarios with large-scale embryo data and parameters;
(4) the method is suitable for embryo detection scenarios combining various non-stationary objective functions.
although the present invention has been described with reference to the present specific embodiments, it will be appreciated by those skilled in the art that the above embodiments are merely illustrative of the present invention, and various equivalent changes and substitutions may be made without departing from the spirit of the invention, and therefore, it is intended that all changes and modifications to the above embodiments within the spirit and scope of the present invention be covered by the appended claims.

Claims (5)

1. A training platform for an embryo development detection device, comprising an embryo image enhancement module (1), an embryo image training set construction module (2), a model total loss judgment module (3), and a parameter adjustment module (4), wherein:
the embryo image enhancement module (1) receives at least two image groups (Ga, Gb) of embryos, said two image groups (Ga, Gb) respectively comprising a plurality of embryo images obtained by time-lapse photography of said embryos, and said two image groups (Ga, Gb) respectively corresponding to two different stages of embryo development; the embryo image enhancement module (1) performs image fusion on the plurality of embryo images in the two image groups (Ga, Gb) in a weighted addition mode to serve as an input image of the embryo image training set construction module (2);
the embryo image training set construction module (2) labels the input image according to the selected embryo characteristics so as to construct an embryo characteristic training set corresponding to the embryo characteristics;
after the embryo characteristic training set is sent into an embryo target detection initial model (M0), the model total loss judgment module (3) calculates the total loss of the embryo target detection initial model (M0) according to a loss function;
the parameter adjusting module (4) modifies at least one parameter in the embryo target detection initial model (M0) according to the total loss to generate an embryo target detection model (M1) corresponding to the embryo characteristic.
2. The training platform of claim 1, wherein:
and the model total loss judgment module (3) calculates the total loss by adopting a BCEWithLogitsLoss loss function and a CIOU loss function.
3. The training platform of claim 1, wherein:
the parameter adjustment module (4) calculates the gradient of at least one parameter in the embryo target detection initial model (M1) through back propagation, and optimizes the parameter through an Adam optimization algorithm.
4. The training platform of claim 1, wherein:
the embryo target detection initial model (M0) is a path aggregation network and comprises a slicing layer, a first downsampling layer, a second downsampling layer, a third downsampling layer, a spatial pyramid layer, a first upsampling layer, a second upsampling layer, a first output layer, a second output layer, a third output layer and a detection layer which are sequentially connected; wherein:
the slice layer receives the embryo characteristic training set; and is provided with
The first up-sampling layer comprises a first residual convolution module (RES 1) and a first up-sampling module (LOS 1) which are connected in sequence, and an output image of the first up-sampling module (LOS 1) is connected with an output of the third down-sampling layer to serve as an input image of the second up-sampling layer;
the second up-sampling layer comprises a second residual convolution module (RES 2) and a second up-sampling module (LOS 2) which are connected in sequence, and an output image of the second up-sampling module (LOS 2) is used as an input image of the first output layer;
-the first output layer comprises a first output module (LO 1) and connects the output image of the second up-sampling module (LOs 2) with the output of the second down-sampling module (LUS 2) as input image of the first output module (LO 1);
the second output layer comprises a second output module (LO 2) and connects the output image of the first output module (LO 1) after a convolution operation (conv) with the output of the second residual convolution module RES2 as the input image of the second output module (LO 2);
the third output layer comprises a third output module (LO 3) and connects the output image of the second output module (LO 2) after the convolution operation (conv) with the output of the first residual convolution module RES1 as the input image of the third output module (LO 3);
the detection layer comprises a detection module (D) and takes the output image of the first output module (LO 1), the output image of the second output module (LO 2) and the output image of the third output module (LO 3) as input images of the detection module (D), respectively.
5. An embryo development detection device receives an embryo image obtained by performing time-lapse photography on an embryo, and outputs target category information, a center point, a width-height coordinate and confidence of the embryo image by using a pre-trained embryo target detection model.
The embryo target detection model comprises a slicing layer, a first down-sampling layer, a second down-sampling layer, a third down-sampling layer, a spatial pyramid layer, a first up-sampling layer, a second up-sampling layer, a first output layer, a second output layer, a third output layer and a detection layer which are connected in sequence; wherein:
the slice layer receives the embryo characteristic training set; and is
The first up-sampling layer comprises a first residual convolution module RES1 and a first up-sampling module (LOS 1) which are connected in sequence, and an output image of the first up-sampling module (LOS 1) is connected with an output of the third down-sampling layer to serve as an input image of the second up-sampling layer;
the second up-sampling layer comprises a second residual convolution module RES2 and a second up-sampling module (LOS 2) which are connected in sequence, and an output image of the second up-sampling module (LOS 2) is used as an input image of the first output layer;
-the first output layer comprises a first output module (LO 1) and connects the output image of the second up-sampling module (LOs 2) with the output of the second down-sampling module (LUS 2) as input image of the first output module (LO 1);
the second output layer comprises a second output module (LO 2) and connects the output image of the first output module (LO 1) after a convolution operation (conv) with the output of the second residual convolution module RES2 as the input image of the second output module (LO 2);
the third output layer comprises a third output module (LO 3) and connects the output image of the second output module (LO 2) after a convolution operation (conv) with the output of the first residual convolution module RES1 as the input image of the third output module (LO 3).
The detection layer comprises a detection module (D) and takes the output image of the first output module (LO 1), the output image of the second output module (LO 2) and the output image of the third output module (LO 3) as input images of the detection module (D), respectively.
CN202210856237.4A 2022-07-13 2022-07-13 Embryo development detection device and training platform thereof Pending CN115187558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210856237.4A CN115187558A (en) 2022-07-13 2022-07-13 Embryo development detection device and training platform thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210856237.4A CN115187558A (en) 2022-07-13 2022-07-13 Embryo development detection device and training platform thereof

Publications (1)

Publication Number Publication Date
CN115187558A true CN115187558A (en) 2022-10-14

Family

ID=83519753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210856237.4A Pending CN115187558A (en) 2022-07-13 2022-07-13 Embryo development detection device and training platform thereof

Country Status (1)

Country Link
CN (1) CN115187558A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778482A (en) * 2023-08-17 2023-09-19 武汉互创联合科技有限公司 Embryo image blastomere target detection method, computer equipment and storage medium
CN116778482B (en) * 2023-08-17 2023-10-31 武汉互创联合科技有限公司 Embryo image blastomere target detection method, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
CN111652321B (en) Marine ship detection method based on improved YOLOV3 algorithm
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN113192040A (en) Fabric flaw detection method based on YOLO v4 improved algorithm
CN109903372A (en) Depth map super-resolution complementing method and high quality three-dimensional rebuilding method and system
CN111563415A (en) Binocular vision-based three-dimensional target detection system and method
CN112434586B (en) Multi-complex scene target detection method based on domain self-adaptive learning
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN113807464B (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLO V5
CN112347895A (en) Ship remote sensing target detection method based on boundary optimization neural network
CN113408423A (en) Aquatic product target real-time detection method suitable for TX2 embedded platform
CN114565860A (en) Multi-dimensional reinforcement learning synthetic aperture radar image target detection method
CN114612406A (en) Photovoltaic panel defect detection method based on visible light and infrared vision
CN111242026A (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN115187558A (en) Embryo development detection device and training platform thereof
CN113313047A (en) Lane line detection method and system based on lane structure prior
CN114548208A (en) Improved plant seed real-time classification detection method based on YOLOv5
CN103544712A (en) Method for automatically segmenting human lateral geniculate nucleus through prior knowledge
CN117351440B (en) Semi-supervised ship detection method and system based on open text detection
CN113537017A (en) Optical remote sensing image airplane detection method and device based on cascade regression correction
CN116363610A (en) Improved YOLOv 5-based aerial vehicle rotating target detection method
CN115690610A (en) Unmanned aerial vehicle navigation method based on image matching
CN114494861A (en) Airplane target detection method based on multi-parameter optimization YOLOV4 network
CN112287895A (en) Model construction method, recognition method and system for river drain outlet detection
CN113947723A (en) High-resolution remote sensing scene target detection method based on size balance FCOS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination