WO2023217222A1 - Cell information statistics method, apparatus, device, and computer-readable storage medium - Google Patents

Cell information statistics method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2023217222A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
cell
image
area
training
Prior art date
Application number
PCT/CN2023/093467
Other languages
English (en)
French (fr)
Inventor
徐晓欧
Original Assignee
徕卡显微系统科技(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 徕卡显微系统科技(苏州)有限公司
Publication of WO2023217222A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image

Definitions

  • the present application relates to the field of cell image analysis, and in particular to cell information statistical methods, devices, equipment and computer-readable storage media.
  • Such a deep learning method requires collecting a large number of training images during data set construction and labeling them one by one. To make the model recognize cells of another cell line, a large number of training images of that cell line must be collected and labeled one by one again.
  • fluorescent cell images or stained cell images are usually used to enhance cell characteristics.
  • fluorescence and staining not only increase the cost but are also toxic to cells, and they require fluorescence excitation or staining of the cells in advance, which takes considerable time. For example, DAPI staining takes about 70 minutes, and the subsequent image processing steps also become cumbersome.
  • the traditional random forest machine learning algorithm is usually used to count stain-free or non-fluorescent cells.
  • this algorithm is not accurate enough and requires users to perform a large number of annotations to obtain ideal results.
  • the purpose of the present invention is to provide an image processing algorithm that can realize cell segmentation and counting with only a small amount of labeling, so as to realize the statistics of cell information.
  • a cell information statistics method for performing information statistics on cell groups including the following steps:
  • the defective feature includes one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape. If so, perform steps S4-S6; otherwise, perform step S6;
  • the cell information includes one or more of cell number, cell size, and cell shape.
  • step S6 is executed.
  • the pre-built model is trained through the following steps:
  • the semi-supervised learning method uses a method of paying attention to and learning only the area where the labeling operation is performed to obtain the pre-built model.
  • step M3 further includes:
  • M303 Determine the area with defective features in the training result image, where the defective features include one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape;
  • M305 Retrain the current model according to the annotation operation. During the retraining process, only the area where the annotation operation is performed is paid attention to and learned, and a new training result image is obtained based on the current model after retraining;
  • step M306 is executed;
  • in step S5, attending to and learning only the area where the annotation operation is performed includes: extracting features according to the annotation operation in order to retrain the model.
  • step M3 it also includes:
  • annotation operation includes background annotation and/or cell annotation in the form of manual annotation.
  • step S2 one of the pre-built models is selected from the pre-built model database, and before or after step S6, it also includes:
  • the target image information of the cell group to be counted in step S1 is one or more of unstained and non-fluorescent cell images, stained cell images, fluorescent cell images, phase contrast images, and bright field images.
  • a cell information statistics device for performing information statistics on cells in a cell group, including the following modules:
  • a pre-built model calling module configured to select a pre-built model and input the target image information into the pre-built model to obtain a result image
  • a labeling and retraining module, which is configured to receive the determination result of the defect feature determination module, perform a labeling operation on at least one area where defective features exist, retrain the pre-built model according to the labeling operation, and input the target image information into the retrained model to obtain a new result image;
  • a statistics module configured to perform statistics on cell information based on the result image, where the cell information includes one or more of cell number, cell size, and cell shape.
  • Annotation sample unit which is configured to obtain multiple images of the cell line and manually annotate them to obtain annotation learning samples
  • a pre-built model unit configured to retrain the basic model using a semi-supervised learning method that attends to and learns only the area where the labeling operation is performed, to obtain the pre-built model.
  • the pre-built model unit includes the following sub-units:
  • a sample image set acquisition subunit which is configured to acquire a sample image set for establishing a pre-built model, where the sample image set includes a plurality of sample images;
  • a traversal subunit configured to traverse the set of sample images, sending one sample image therein to the training subunit at a time
  • a training subunit which is configured to input the received sample image into the basic model to obtain the result image during training
  • a defect identification subunit, which is configured to receive the training result image obtained by the training subunit and determine whether there is a defective feature in an area of the training result image, where the defective feature includes one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape;
  • a labeling and retraining subunit, which is configured to perform a labeling operation on at least one area with defective features and then retrain the basic model according to the labeling operation. During the retraining process, only the area where the labeling operation is performed is attended to and learned; a new training result image is obtained based on the current model after retraining and is sent to the defect identification subunit;
  • a model update subunit which is configured to update the basic model to the current model when the defect identification subunit identifies defect-free features; and trigger the traversal subunit to send the next sample image to the training subunit;
  • the model saving subunit is configured to save the current model in response to the end of the traversal process of the traversal subunit to obtain the prebuilt model.
  • an electronic device including a processor and a memory, wherein the memory is used to store program instructions, the processor is configured to execute the program instructions, and the program instructions, when run, perform the steps of the method described above.
  • a computer-readable storage medium for storing program instructions, the program instructions being configured to invoke and perform the steps of the method as described above.
  • a computer program product comprising a readably stored computer program. The computer program includes program instructions; when the program instructions are run on a computer device, the computer device performs the steps of the method described above.
  • a model training method including the following steps:
  • step M3 further includes:
  • M303 Determine the area with defective features in the training result image, where the defective features include one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape;
  • M305 Retrain the current model according to the annotation operation. During the retraining process, only the area where the annotation operation is performed is paid attention to and learned, and a new training result image is obtained based on the current model after retraining;
  • step M306 is executed;
  • the basic model obtained through deep learning, combined with a small number of labeled cell line pictures, can be used to obtain a pre-built model of the cell line through semi-supervised learning.
  • the pre-built models of different cell lines can share the same basic deep learning model. That is, the same basic model combined with pictures of different cell lines can be used to obtain pre-built models that support different cell lines;
  • in the semi-supervised learning method, only the areas where the annotation operation is performed are attended to and learned. Compared with a fully supervised learning method, this has the advantages of a simpler algorithm and a clearer retraining objective. Users can mark the defect areas of an image that they are not satisfied with and retrain the model on them until they obtain a model they are satisfied with, and as users process more and more cell images, the accuracy of the model becomes higher and higher.
  • Figure 1 is a schematic flow diagram of the basic flow of a cell information statistical method provided by an exemplary embodiment of the present invention
  • Figure 2 is a specific flow diagram of a cell information statistical method provided by an exemplary embodiment of the present invention.
  • Figure 3 is a schematic flowchart of a training method for a pre-built model provided by an exemplary embodiment of the present invention
  • Figure 4 is a schematic flowchart of a method for retraining the basic model using a semi-supervised learning method according to an exemplary embodiment of the present invention
  • Figure 5 is a schematic flowchart of a method for establishing a pre-built model that supports two or more cell lines provided by an exemplary embodiment of the present invention
  • Figure 6 is a schematic flowchart of a method for paying attention to and learning only the area where the labeling operation is performed according to an exemplary embodiment of the present invention
  • Figure 7 is a schematic block diagram of a module of a cell information statistics device provided by an exemplary embodiment of the present invention.
  • Figure 8 is a schematic block diagram of a sub-unit of a pre-built model unit provided by an exemplary embodiment of the present invention.
  • Figure 9(a) is the original phase image of the unstained and non-fluorescent cell image
  • Figure 9(b) is the result image obtained after inputting Figure 9(a) into the pre-built model
  • Figure 9(c) is the image obtained after labeling the defect features in Figure 9(b);
  • Figure 10(a) is the original image of the fluorescent cell image
  • Figure 10(b) is the result image obtained after inputting Figure 10(a) into the pre-built model
  • Figure 10(c) is the image obtained after annotating the defect features in Figure 10(b)
  • Figure 10(d) is the result image output after retraining the model using Figure 10(c);
  • Figure 11 is a schematic diagram of a system 100 that performs the method described in an embodiment of the present invention.
  • the purpose of the present invention is to provide an improved image analysis method that can accurately count the cell information in an image through simple operations, even for images of cells without staining or fluorescence. Unstained and non-fluorescent cell images are the most difficult image type to analyze and process, so a cell information statistics method that is suitable for unstained and non-fluorescent cell images is also applicable to other types such as stained cell images, fluorescent cell images, phase contrast images, and bright field images.
  • the present invention uses image data of a cell line for deep learning training to obtain a basic model; a semi-supervised learning method is then used to train this basic model into a pre-built model. When support for other cell lines is needed, only a small number of labeled images of those cell lines are required as training samples for semi-supervised learning, and the basic model does not need to be re-established.
  • a cell information statistics method is provided, which is used to perform information statistics on cell groups, such as counting the number of cells, classifying and measuring cell sizes, or classifying and counting cell shapes.
  • one or more pre-built models need to be established in advance to form a model database, an appropriate pre-built model is selected according to the cell group to be counted, and the pre-built model is used to process the cell image to obtain the result image.
  • the pre-built model in the database is trained through the following steps:
  • the semi-supervised learning method uses a method of paying attention to and learning only the area where the labeling operation is performed to obtain the pre-built model.
  • step M3 further includes:
  • M301 Obtain a sample image set for establishing a pre-built model, where the sample image set includes multiple sample images.
  • M305 Retrain the current model according to the annotation operation. Specifically, input the sample image and at least the local area image of the annotation operation (which may also be the training result image after the annotation operation) into the current model to retrain it and obtain a new model. During the retraining process, only the area where the annotation operation is performed is attended to and learned, and a new training result image is obtained based on the current model after retraining;
  • step M306 is executed;
  • a single sample image is used in sequence from 0001 to 0100 to retrain the basic model using a semi-supervised learning method. At this point, the training of the sample image numbered 0001 is completed.
  • the final model is the pre-built model.
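The traversal described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `build_prebuilt_model`, the callables `predict`, `find_defects`, `annotate`, and `retrain`, and the `max_repeats` parameter are all hypothetical placeholders for the model inference, defect judgment, manual annotation, and annotation-restricted retraining that the text describes.

```python
from typing import Any, Callable, Iterable, List


def build_prebuilt_model(
    base_model: Any,
    samples: Iterable[Any],
    predict: Callable[[Any, Any], Any],
    find_defects: Callable[[Any], List[Any]],
    annotate: Callable[[Any, List[Any]], Any],
    retrain: Callable[[Any, Any, Any], Any],
    max_repeats: int = 3,
) -> Any:
    """Traverse the sample set (M301) and refine the model one image at a time."""
    current = base_model
    for sample in samples:
        result = predict(current, sample)
        for _ in range(max_repeats):
            defects = find_defects(result)        # M303: areas with defective features
            if not defects:
                break                             # no defects left for this sample
            labels = annotate(result, defects)    # manual background/cell annotation
            current = retrain(current, sample, labels)  # M305: learn annotated areas only
            result = predict(current, sample)
        # The refined model carries over as the starting point for the next sample.
    return current                                # M308: saved as the pre-built model
```

Passing the four operations in as callables keeps the control flow separate from any particular network or annotation tool, which is a design choice of this sketch rather than something the source prescribes.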
  • a pre-built model supporting two or more cell lines can also be established, as shown in Figure 5:
  • step M1 assuming that multiple images of the first cell line are acquired, then the corresponding step M308 obtains a prebuilt model that supports the first cell line;
  • step M3 also includes:
  • M4 Obtain multiple images of other cell lines (such as the second cell line) and manually label them to obtain labeled samples of other cell lines;
  • M5. Input labeled samples of other cell lines into the pre-built model, and use a semi-supervised learning method for retraining to obtain a pre-built model that supports the other cell lines.
  • in step M5, using a sample image set that is the same as or different from that of step M301, input the first sample image into the pre-built model to obtain the result image during training;
  • the defective features include one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape; perform a labeling operation on at least one area with defective features; retrain the basic model according to the annotation operation, where during the retraining process only the area where the annotation operation is performed is attended to and learned, and obtain a new training result image based on the current model after retraining; repeat this step until the preset number of repetitions is reached, or until there are no areas with defective features in the training result image, and then perform the following steps:
  • Update the basic model to the current model; traverse the next sample image in the sample image set, input it into the current model to obtain the result image for the current training, and perform the steps in the previous paragraph. That is, each time a sample image is traversed, the current model is updated and saved once, until the sample images in the sample image set have undergone a preset number of traversal rounds; the latest current model is then saved as a pre-built model that supports both the first cell line and the second cell line.
  • cell images (even unstained and non-fluorescent cell images) can be identified to count the cell information (such as number, size or shape) in the image.
  • the cell information statistics method includes the following steps:
  • the cell group to be analyzed is often not a single cell, but an aggregation of multiple cells.
  • a microscope can be used to collect images of the cell group to obtain the target image, such as the unstained and non-fluorescent cell image shown in Figure 9(a), which is a phase image type.
  • one of the pre-built models is selected from the pre-built model database.
  • the model database includes pre-built model A that supports cell line a, pre-built model B that supports cell line b, and pre-built model C that supports cell lines a and c.
  • If the cell group to be counted belongs to cell line a, pre-built model A can be chosen; if the cell group belongs to cell line c, pre-built model C can be chosen; if the cell group belongs to cell line b', pre-built model B can be chosen, because it corresponds to cell line b, which is the closest to cell line b'.
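The selection rule in this example can be illustrated with a small lookup. This is only a sketch under stated assumptions: the dictionaries `MODEL_DB` and `CLOSEST_LINE` and the function `select_model` are hypothetical, the "closest cell line" relation is assumed to be supplied by the user (the source does not say how closeness is determined), and preferring the model that supports the fewest cell lines is an assumption made so that cell line a resolves to model A as in the text.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical model database: model name -> cell lines it supports.
MODEL_DB: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"a"}),
    "B": frozenset({"b"}),
    "C": frozenset({"a", "c"}),
}

# Hypothetical, user-supplied mapping from an unsupported cell line
# to the supported cell line considered closest to it.
CLOSEST_LINE: Dict[str, str] = {"b'": "b"}


def select_model(
    cell_line: str,
    db: Dict[str, FrozenSet[str]] = MODEL_DB,
    closest: Dict[str, str] = CLOSEST_LINE,
) -> Optional[str]:
    """Return the name of a pre-built model for cell_line, or None if none fits."""
    supported = any(cell_line in lines for lines in db.values())
    target = cell_line if supported else closest.get(cell_line)
    if target is None:
        return None
    candidates = [name for name, lines in db.items() if target in lines]
    if not candidates:
        return None
    # Assumption: prefer the most specific model (fewest supported cell lines).
    return min(candidates, key=lambda name: (len(db[name]), name))
```

With the example database, `select_model("a")` picks model A, `select_model("c")` picks model C, and `select_model("b'")` falls back to model B.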
  • the defective feature includes one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape. If so, perform steps S4-S6; otherwise, perform step S6;
  • the pre-built model is retrained according to the annotation operation, and the target image information is input into the retrained model to obtain a new result image. In this embodiment, the target image in step S1 and at least the local area image of the annotation operation (which may also be the result image after the annotation operation) are input into the pre-built model to retrain it and obtain a new model; however, the present invention does not limit the input to the target image in step S1.
  • a microscope can be used to re-acquire images of the cell group to obtain a new target image.
  • the cell information includes one or more of cell number, cell size, and cell shape.
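As an illustration of how cell number, size, and shape might be read off a result image, the following sketch counts 4-connected foreground regions in a binary mask and reports each region's pixel area and bounding-box aspect ratio. It assumes the result image has already been reduced to a 0/1 mask; the function name `cell_statistics` and the choice of aspect ratio as the shape descriptor are assumptions, not details from the source.

```python
from collections import deque
from typing import Dict, List


def cell_statistics(mask: List[List[int]]) -> List[Dict[str, float]]:
    """Return one record per 4-connected region of 1-pixels in a binary mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    cells: List[Dict[str, float]] = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                area += 1
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            height = r_max - r_min + 1
            width = c_max - c_min + 1
            cells.append({"area": area, "aspect_ratio": width / height})
    return cells
```

The cell number is then `len(cell_statistics(mask))`, and the per-cell records can be binned by area or aspect ratio for size and shape statistics.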
  • the execution steps S4-S6 here include a variety of situations:
  • step S6 is executed directly;
  • after step S5, steps S3 to S5 are repeated up to a preset number of times, such as 5 times, and then step S6 is executed regardless of whether the new result image obtained in step S5 still has defective features.
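The bounded repetition of steps S3 to S5 can be sketched as a simple loop. The function `segment_with_feedback` and its callable parameters are hypothetical stand-ins for model inference, defect judgment, manual annotation, and retraining; the default of 5 rounds follows the example count given in the text.

```python
from typing import Any, Callable, List, Tuple


def segment_with_feedback(
    image: Any,
    model: Any,
    predict: Callable[[Any, Any], Any],
    find_defects: Callable[[Any], List[Any]],
    annotate: Callable[[Any, List[Any]], Any],
    retrain: Callable[[Any, Any, Any], Any],
    max_rounds: int = 5,
) -> Tuple[Any, Any]:
    """Run S2, then repeat S3-S5 at most max_rounds times; return (model, result)."""
    result = predict(model, image)              # S2: initial result image
    for _ in range(max_rounds):
        defects = find_defects(result)          # S3: any defective features?
        if not defects:
            break                               # go straight to S6
        labels = annotate(result, defects)      # S4: label at least one defect area
        model = retrain(model, image, labels)   # S5: retrain on the annotation
        result = predict(model, image)          # S5: new result image
    # S6 (statistics on the result image) is left to the caller.
    return model, result
```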
  • in the process of retraining the pre-built model according to the annotation operation in step S5, only the area where the annotation operation is performed is attended to and learned, which includes: extracting features according to the annotation operation in order to retrain the model.
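One common way to restrict learning to annotated areas is a loss that is averaged over labeled pixels only. The source does not state which loss is used, so the following is an illustrative substitute, not the patent's method: a binary cross-entropy in which pixels marked with the hypothetical `UNLABELED` value contribute nothing. The function name `masked_bce` is likewise an assumption.

```python
import math
from typing import List

UNLABELED = -1  # hypothetical marker for pixels the user did not annotate


def masked_bce(pred: List[List[float]], labels: List[List[int]],
               eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over annotated pixels only.

    pred holds per-pixel cell probabilities in [0, 1]; labels holds
    1 (cell), 0 (background), or UNLABELED.
    """
    total = 0.0
    count = 0
    for pred_row, label_row in zip(pred, labels):
        for p, y in zip(pred_row, label_row):
            if y == UNLABELED:
                continue                      # unannotated pixels are ignored
            p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0
```

Because unannotated pixels are skipped, predictions outside the marked areas neither raise nor lower the loss, so a gradient step based on it would only respond to the user's marks.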
  • the annotation operation is an operation performed on defect features in the image
  • the annotation operation specifically includes background annotation and/or cell annotation in the form of manual annotation.
  • background annotation can be used, that is, a background mark line is drawn between the two cells; if a cell in a certain area of the image is manually judged to have been ignored (that is, misidentified as background), cell annotation can be used, that is, cell marking points are drawn on this cell.
  • Different colors are used for background annotation and cell annotation so that the annotation types are easy to tell apart during manual annotation, which avoids unknowingly using background annotation when labeling cells.
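The two annotation types could be stored as a sparse label map in which every unmarked pixel stays unlabeled. The sketch below is an assumed encoding, not one given in the source: the constants `UNLABELED`, `BACKGROUND`, and `CELL` and the function `rasterize_annotations` are hypothetical, background marks are given as line segments between (row, column) endpoints, and cell marks as single points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, column)

UNLABELED, BACKGROUND, CELL = -1, 0, 1  # hypothetical label codes


def rasterize_annotations(
    height: int,
    width: int,
    background_lines: Sequence[Tuple[Point, Point]],
    cell_points: Sequence[Point],
) -> List[List[int]]:
    """Turn background mark lines and cell marking points into a label map."""
    labels = [[UNLABELED] * width for _ in range(height)]
    for (r0, c0), (r1, c1) in background_lines:
        # Sample the segment densely enough to touch every pixel along it.
        steps = max(abs(r1 - r0), abs(c1 - c0))
        for i in range(steps + 1):
            t = i / steps if steps else 0.0
            r = round(r0 + t * (r1 - r0))
            c = round(c0 + t * (c1 - c0))
            if 0 <= r < height and 0 <= c < width:
                labels[r][c] = BACKGROUND
    for r, c in cell_points:
        if 0 <= r < height and 0 <= c < width:
            labels[r][c] = CELL
    return labels
```

Keeping a separate code for unlabeled pixels is what lets a later training step distinguish "marked as background" from "not marked at all".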
  • if there are defect features of different types in the image, each type of defect feature needs to be marked; if there are multiple defect features of the same type in the image, it is enough to mark only one or some of them.
  • That is, there is no need to force the user to mark too finely; the user only needs to roughly mark those areas that are obviously unsatisfactory.
  • Feedback can be obtained after each mark, and this progressive interaction is continued until the expected effect is achieved.
  • the neural network parameters in the model are updated to obtain a retrained model, and the image is input into the retrained model again.
  • the effectiveness of this deep learning is verified to be 100%; if not all of the previous defect features of the same type elsewhere have disappeared, the effectiveness of this deep learning is obtained according to preset standards. For example, if fewer than one-tenth of the defect features of the same type remain, the effectiveness of deep learning is 90%; if more than one-fifth of the defect features of the same type remain, the effectiveness of deep learning is unqualified. In the unqualified case, the image can be returned to its initial state and one or more defect features can be re-labeled.
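The example thresholds above can be written as a small grading function. The function name `retraining_effectiveness` and its arguments are hypothetical; the source gives no grade for the range between one-tenth and one-fifth, so this sketch returns "unspecified" there instead of guessing.

```python
from fractions import Fraction


def retraining_effectiveness(before: int, remaining: int) -> str:
    """Grade a retraining round from the same-type defect features that remain.

    before is the number of same-type defect features prior to retraining;
    remaining is how many of them are still present afterwards.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    if not 0 <= remaining <= before:
        raise ValueError("remaining must be between 0 and before")
    ratio = Fraction(remaining, before)   # exact arithmetic at the thresholds
    if ratio == 0:
        return "100%"
    if ratio < Fraction(1, 10):
        return "90%"
    if ratio > Fraction(1, 5):
        return "unqualified"
    return "unspecified"                  # range not covered by the source example
```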
  • in step S6, the pre-built model has been retrained according to the labeling operation, and the resulting retrained model can be used as pre-built model B' to support cell line b'; the next time information statistics are to be performed on cell line b', pre-built model B' can be selected directly.
  • a corresponding client is provided, and the user performs cell segmentation operations on the client as follows:
  • the user loads the target image information of the cell group, selects and loads a pre-built model, and runs to obtain a result image. If the user is satisfied, there is no need to perform subsequent steps, and the current result image is used as the cell segmentation result for counting or other information statistics;
  • the client interface displays the new result image. If the result image does not yet meet the user's expectations, annotation and retraining continue, and the above steps are repeated until it does. Finally, the satisfactory result image is used as the cell segmentation result for counting or other information statistics.
  • a cell information statistics device for performing information statistics on cells in a cell group.
  • the cell information statistics device includes the following modules:
  • a pre-built model calling module configured to select a pre-built model and input the target image information into the pre-built model to obtain a result image
  • a labeling and retraining module, which is configured to receive the determination result of the defect feature determination module, perform a labeling operation on at least one area where defective features exist, retrain the pre-built model according to the labeling operation, and input the target image information into the retrained model to obtain a new result image;
  • a statistics module configured to perform statistics on cell information based on the result image, where the cell information includes one or more of cell number, cell size, and cell shape.
  • the pre-built model calling module includes the following units:
  • a basic model unit which is configured to design a deep learning model, and use the annotated learning samples to train the deep learning model to obtain a basic model
  • a pre-built model unit configured to retrain the basic model using a semi-supervised learning method that attends to and learns only the area where the labeling operation is performed, to obtain the pre-built model.
  • a sample image set acquisition subunit which is configured to acquire a sample image set for establishing a pre-built model, where the sample image set includes a plurality of sample images;
  • a traversal subunit configured to traverse the set of sample images, sending one sample image therein to the training subunit at a time
  • a training subunit which is configured to input the received sample image into the basic model to obtain the result image during training
  • a defect identification subunit, which is configured to receive the training result image obtained by the training subunit and determine whether there is a defective feature in an area of the training result image, where the defective feature includes one or more of cell adhesion, at least partial disappearance of cells, and abnormal cell shape;
  • a labeling and retraining subunit, which is configured to perform a labeling operation on at least one area with defective features and then retrain the basic model according to the labeling operation. During the retraining process, only the area where the labeling operation is performed is attended to and learned; a new training result image is obtained based on the current model after retraining and is sent to the defect identification subunit;
  • a model update subunit, which is configured to update the basic model to the current model when the defect identification subunit identifies no defective features, and to trigger the traversal subunit to send the next sample image to the training subunit;
  • the model saving subunit is configured to save the current model in response to the end of the traversal process of the traversal subunit to obtain the prebuilt model.
  • an electronic device including a processor and a memory, wherein the memory is used to store program instructions, the processor is configured to run the program instructions, and the program instructions, when run, perform the steps of the above method embodiment.
  • a computer-readable storage medium for storing program instructions, and the program instructions are configured to call and perform the steps performed by the above method embodiment.
  • a computer program product including a readably stored computer program. The computer program includes program instructions; when the program instructions are run on a computer device, the computer device performs the steps of the above method embodiment.
  • FIG. 11 shows a schematic diagram of a system 100 configured to perform the methods described herein.
  • System 100 includes a microscope 110 and a computer system 120.
  • Microscope 110 is configured to capture images and is connected to computer system 120 .
  • Computer system 120 is configured to perform at least a portion of the methods described herein.
  • Computer system 120 may be configured to execute machine learning algorithms.
  • Computer system 120 and microscope 110 may be separate entities, but may also be integrated into a common housing.
  • Computer system 120 may be part of the central processing system of microscope 110, and/or computer system 120 may be part of a sub-assembly of microscope 110, such as sensors, actuators, cameras, or illumination units of microscope 110.
  • Computer system 120 may be a local computer device (e.g., a personal computer, laptop, tablet, or mobile phone) having one or more processors and one or more storage devices, or it may be a distributed computer system (e.g., a cloud computing system having one or more processors and one or more storage devices distributed at various locations, such as a local client and/or one or more remote server locations and/or a data center).
  • Computer system 120 may include any circuit or combination of circuits.
  • computer system 120 may include one or more processors, which may be of any type.
  • a processor may refer to any type of computing circuit, such as, but not limited to, a microprocessor, a microcontroller, a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a graphics processor, a digital signal processor (DSP), a multi-core processor, a field programmable gate array (FPGA), for example of a microscope or a microscope component (such as a camera), or any other type of processor or processing circuit.
  • Computer system 120 may include one or more storage devices, which may include one or more storage elements suitable for a particular application, such as main memory in the form of random access memory (RAM), one or more hard drives, and/or one or more drives that handle removable media such as compact disks (CDs), flash memory cards, digital video disks (DVDs), etc.
  • Computer system 120 may also include a display device, one or more speakers, and a keyboard and/or controller, which may include a mouse, trackball, touch screen, voice recognition device, or any other device that allows a system user to enter information into or receive information from computer system 120.
  • Some or all of the method steps may be performed by (or using) a hardware device (e.g., a processor, a microprocessor, a programmable computer, or an electronic circuit). In some embodiments, such a device may perform one or more of the most important method steps.
  • a hardware device eg, a processor, a microprocessor, a programmable computer, or an electronic circuit.
  • such a device may perform one or more of the most important method steps.
  • embodiments of the invention may be implemented in hardware or software.
  • This implementation may be performed using a non-transitory storage medium (such as a digital storage medium such as a floppy disk, DVD, Blu-ray, CD, ROM, PROM and EPROM, EEPROM or FLASH) having electronically readable control signals stored thereon.
  • the read control signals cooperate (or are capable of cooperating) with the programmable computer system to perform corresponding methods. Therefore, digital storage media may be computer-readable.
  • Some embodiments of the invention comprise a data carrier having electronically readable control signals capable of cooperating with a programmable computer system to perform one of the methods described in the invention.
  • embodiments of the invention may be implemented as a computer program product having a program code, executable when the computer program product is run on a computer, for performing one of the methods.
  • the program code may, for example, be stored on a machine-readable carrier.
  • inventions include a computer program stored on a machine-readable carrier for performing one of the methods described in the invention.
  • an embodiment of the invention is therefore a computer program having a program code for performing one of the methods described in the invention when the computer program is run on a computer.
  • a further embodiment of the invention is a storage medium (or data carrier or computer readable medium) comprising stored thereon a computer program for performing the invention when executed by a processor
  • a storage medium or data carrier or computer readable medium
  • Data carriers, digital storage media or recording media are usually tangible and/or non-transitory.
  • a device according to the invention which includes a processor and a storage medium.
  • a further embodiment of the invention is a data stream or a sequence of signals representing a computer program for performing one of the methods described in the invention.
  • the data stream or signal sequence may for example be configured to be transmitted via a data communication connection, for example via the Internet.
  • Yet another embodiment includes a processing device, such as a computer or programmable logic device, configured or adapted to perform one of the methods of the invention.
  • a processing device such as a computer or programmable logic device, configured or adapted to perform one of the methods of the invention.
  • a further embodiment includes a computer on which a computer program for performing one of the methods according to the invention is installed.
  • Yet another embodiment of the invention includes a device or system configured to transmit to a receiver (eg electronically or optically) a computer program for performing one of the methods described in the invention.
  • the receiver may be, for example, a computer, a mobile device, a storage device, or the like.
  • the device or system may include, for example, a file server for transmitting the computer program to the receiver.
  • programmable logic devices eg, field programmable gate arrays
  • a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein.
  • the method is preferably performed by any hardware device.
  • Embodiments may be based on the use of machine learning models or machine learning algorithms.
  • Machine learning can refer to algorithms and statistical models that computer systems can use to perform specific tasks without using explicit instructions, instead relying on models and inference.
  • In machine learning, instead of a rule-based data transformation, one can use a data transformation derived from an analysis of historical and/or training data.
  • The content of an image can be analyzed using a machine learning model or a machine learning algorithm.
  • For a machine learning model to analyze image content, it can be trained using training images as input and training content information as output.
  • By training a machine learning model with a large number of training images and/or training sequences (such as words or sentences) and associated training content information (such as labels or annotations), the machine learning model "learns" to recognize the content of images, so that it can recognize image content not included in the training data.
  • The same principle can be used for other types of sensor data: by training a machine learning model using training sensor data and desired outputs, the machine learning model can "learn" the transformation between the sensor data and the output, which can be used to provide an output based on non-training sensor data supplied to the machine learning model.
  • The provided data (e.g., sensor data, metadata, and/or image data) may be preprocessed to obtain a feature vector, which is used as input to the machine learning model.
  • The example specified above uses a training method called "supervised learning."
  • In supervised learning, multiple training samples are used to train a machine learning model, where each sample can include multiple input data values and multiple expected output values, that is, each training sample is associated with an expected output value.
  • By being given training samples together with the expected output values, a machine learning model "learns" what output values to provide for input samples similar to the samples provided during training.
  • Semi-supervised learning can also be used. In semi-supervised learning, some training samples lack corresponding expected output values.
  • Supervised learning can be based on supervised learning algorithms (such as classification algorithms, regression algorithms, or similarity learning algorithms).
  • Classification algorithms can be used when the output is restricted to a limited set of values (categorical variables), i.e., the input is classified as one of a limited set of values.
  • Regression algorithms can be used when the output may have any numerical value (within a range).
  • Similarity learning algorithms are similar to both classification and regression algorithms, but are based on learning from examples using a similarity function that measures the degree of similarity or relatedness between two objects.
  • Unsupervised learning can also be used to train machine learning models.
  • In unsupervised learning, (only) input data may be provided, and an unsupervised learning algorithm can be used to find structure in the input data (e.g., by grouping or clustering the input data to find commonalities in the data). Clustering is the assignment of input data containing multiple input values to subsets (clusters) such that input values in the same cluster are similar according to one or more (predefined) similarity criteria and dissimilar to input values contained in other clusters.
  • Feature learning can be used.
  • Feature learning may be used to at least partially train a machine learning model, and/or the machine learning algorithm may include a feature learning component.
  • Feature learning algorithms (which can be called representation learning algorithms) can retain the information in their input, but also transform it in a way that makes it useful, often as a preprocessing step before performing classification or prediction.
  • Feature learning can be based on principal component analysis or cluster analysis, for example.
  • In some examples, anomaly detection (i.e., outlier detection) may be used, which aims to identify input values that raise suspicion because they differ significantly from the majority of the input or training data.
  • Anomaly detection may be used to at least partially train a machine learning model, and/or the machine learning algorithm may include an anomaly detection component.
  • Machine learning algorithms can use decision trees as predictive models.
  • Machine learning models can be based on decision trees.
  • In a decision tree, observations about an item (such as a set of input values) can be represented by the branches of the decision tree, and the output value corresponding to the item can be represented by the leaves of the decision tree.
  • Decision trees can support both discrete values and continuous values as output values.
  • A decision tree can be described as a classification tree if discrete values are used and as a regression tree if continuous values are used.
  • Association rules are another technique that can be used in machine learning algorithms.
  • A machine learning model can be based on one or more association rules. Association rules are created by identifying relationships between variables in large amounts of data.
  • Machine learning algorithms can identify and/or exploit one or more relational rules that represent knowledge derived from data. Rules can be used, for example, to store, manipulate, or apply this knowledge.
  • Machine learning algorithms are often based on machine learning models.
  • The term "machine learning algorithm" can mean a set of instructions that can be used to create, train, or use a machine learning model.
  • The term "machine learning model" may refer to a data structure and/or set of rules that represents learned knowledge (e.g., based on training performed by a machine learning algorithm).
  • The use of a machine learning algorithm may imply the use of an underlying machine learning model (or multiple underlying machine learning models).
  • The use of a machine learning model may imply that the machine learning model, and/or the data structure/rule set that is the machine learning model, is trained by a machine learning algorithm.
  • The machine learning model can be an artificial neural network (ANN).
  • ANNs are systems inspired by biological neural networks, such as those found in the retina or brain.
  • An ANN consists of multiple interconnected nodes and multiple connections between nodes, so-called edges.
  • Each node can represent an artificial neuron.
  • Each edge can transfer information from one node to another.
  • The output of a node can be defined as a (non-linear) function of its inputs (e.g., of the sum of its inputs).
  • A node's inputs can be used in the function based on the "weight" of the edge or node that provides the input.
  • The weights of nodes and/or edges can be adjusted during the learning process.
  • Training an artificial neural network may include adjusting the weights of its nodes and/or edges, i.e., so as to achieve a desired output for a given input.
  • The machine learning model can be a support vector machine, a random forest model, or a gradient boosting model.
  • Support vector machines (i.e., support vector networks) are supervised learning models with associated learning algorithms that can be used to analyze data (e.g., in classification or regression analysis).
  • A support vector machine can be trained by providing it with multiple training input values, each belonging to one of two categories.
  • A support vector machine can be trained to assign new input values to one of the two categories.
  • The machine learning model may be a Bayesian network, which is a probabilistic directed acyclic graphical model. Bayesian networks can use directed acyclic graphs to represent a set of random variables and their conditional dependencies.
  • Machine learning models can be based on genetic algorithms, which are search algorithms and heuristic techniques that mimic the process of natural selection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

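The invention discloses a cell information statistics method, apparatus, device, and computer-readable storage medium. The method comprises: S1, acquiring target image information of a cell population to be counted; S2, selecting a pre-built model and inputting the target image information into it to obtain a result image; S3, determining whether any region of the result image exhibits a defect feature, the defect features including several of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape; if so, performing steps S4 to S6, otherwise performing step S6; S4, performing an annotation operation on at least one region that exhibits a defect feature; S5, retraining the pre-built model according to the annotation operation and inputting the target image information into the retrained model to obtain a new result image; S6, counting the cells on the basis of the current result image. The invention combines deep learning with a semi-supervised learning method, so that only a small number of annotations on regions with unsatisfactory results are needed to achieve the desired effect.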

Description

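Cell information statistics method, apparatus, device, and computer-readable storage medium
Technical Field
The present application relates to the field of cell image analysis, and in particular to a cell information statistics method, apparatus, device, and computer-readable storage medium.
Background
Cell counting is an active research topic in computer vision. Extracting and compiling statistics on cell information is an important technique and also a hard problem in medical image processing. Cells are currently usually segmented and counted with conventional deep learning methods, which involve building a dataset, augmenting the data, designing a deep learning model, and training on the dataset until the model converges, yielding a base model.
Such deep learning methods require a large number of training images to be collected during dataset construction, each of which must be annotated individually. If the resulting model is to recognize cells of other cell lines, a large number of training images of those cell lines must be collected and annotated one by one all over again.
To compensate for the limited accuracy of current deep learning models, fluorescence or stained cell images are usually used to enhance cell features. However, fluorescence and staining not only add cost but are also toxic to cells, and exciting fluorescence in the cells or staining them beforehand takes a long time (DAPI staining, for example, takes about 70 minutes) and makes the subsequent image processing steps more cumbersome.
At present, the conventional random forest machine learning algorithm is usually used to count unstained or non-fluorescent cells, but its accuracy is insufficient and users must provide a large number of annotations to obtain satisfactory results.
Summary of the Invention
The object of the invention is to provide an image processing algorithm that segments and counts cells with only a small number of annotations, and thereby to compile statistics on cell information.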
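To achieve this object, the invention adopts the following technical solution:
A cell information statistics method for compiling statistics on a cell population, comprising the following steps:
S1. acquiring target image information of the cell population for which statistics are to be compiled;
S2. selecting a pre-built model and inputting the target image information into the pre-built model to obtain a result image;
S3. determining whether any region of the result image exhibits a defect feature, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape; if so, performing steps S4 to S6; otherwise, performing step S6;
S4. performing an annotation operation on at least one region that exhibits a defect feature;
S5. retraining the pre-built model according to the annotation operation, and inputting the target image information into the retrained model to obtain a new result image;
S6. compiling statistics on cell information on the basis of the current result image, the cell information including one or more of cell count, cell size, and cell shape.
Further, after step S5, steps S3 to S5 are repeated until a preset number of repetitions is reached or until no region of the result image exhibits a defect feature, whereupon step S6 is performed.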
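Further, the pre-built model is obtained by training through the following steps:
M1. acquiring multiple images of a cell line and annotating them manually to obtain annotated learning samples;
M2. designing a deep learning model and training it with the annotated learning samples to obtain a base model;
M3. retraining the base model with a semi-supervised learning method, in which attention and learning are restricted to the regions on which annotation operations were performed, to obtain the pre-built model.
Further, step M3 comprises:
M301. acquiring a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
M302. inputting one of the sample images into the base model to obtain an in-training result image;
M303. determining the regions of the in-training result image that exhibit defect features, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
M304. performing an annotation operation on at least one region that exhibits a defect feature;
M305. retraining the current model according to the annotation operation, attending to and learning from only the annotated regions during retraining, and obtaining a new in-training result image from the retrained current model;
repeating steps M303 to M305 until a preset number of repetitions is reached or until no region of the in-training result image exhibits a defect feature, and then performing step M306;
M306. updating the base model to the current model;
M307. traversing the sample image set, inputting the next sample image into the current model to obtain the current in-training result image, and performing steps M303 to M306, until the sample images have been traversed for a preset number of rounds, and then performing M308;
M308. saving the current model to obtain the pre-built model.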
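The control flow of steps M301 to M308 can be sketched as two nested loops. This is only an illustrative sketch: the four callables (`predict`, `find_defects`, `annotate`, `retrain`) and the parameters `max_repeats` and `rounds` are hypothetical placeholders, since the source does not specify the network, the defect check, or how annotations are collected.

```python
def build_prebuilt_model(base_model, sample_images, predict, find_defects,
                         annotate, retrain, max_repeats=5, rounds=1):
    """Sketch of M301-M308; all callables are assumed, not taken from the source."""
    current = base_model
    for _ in range(rounds):                        # M307: preset number of traversal rounds
        for image in sample_images:                # M301/M307: walk the sample image set
            result = predict(current, image)       # M302: in-training result image
            for _ in range(max_repeats):           # repeat M303-M305 up to a preset count
                defects = find_defects(result)     # M303: regions with defect features
                if not defects:
                    break                          # no defects left: go on to M306
                notes = annotate(result, defects)  # M304: annotate at least one region
                current = retrain(current, image, notes)  # M305: retrain on annotations
                result = predict(current, image)   # M305: new in-training result image
            # M306: `current` is carried forward as the model for the next image
    return current                                 # M308: caller saves this model
```

In an interactive setting, `annotate` would typically block on user input, which is why the inner loop is bounded by `max_repeats`.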
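Further, in step S5, while the pre-built model is retrained according to the annotation operation, attention and learning are restricted to the annotated regions, which includes extracting features according to the annotation operation in order to retrain the model.
Further, restricting attention and learning to the annotated regions comprises:
delimiting the extent of the region on which the annotation operation was performed, such that the delimited extent includes the cells involved in the annotation operation;
generating a sub-image of the delimited region, and inputting the sub-image together with the corresponding annotation information into the model to be retrained as a deep learning sample;
taking the corresponding annotation information as the learning target and deep-learning the features of the sub-image, so as to update the neural network parameters of the model and obtain the retrained model.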
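The region delimiting and sub-image generation described above can be illustrated with a bounding-box crop. This is a minimal sketch under stated assumptions: the annotation is taken to be a list of `(row, col)` pixel coordinates, the image a 2D list, and `margin` a hypothetical padding meant to make the crop cover the cells touched by the annotation. The source does not say how the region is actually delimited.

```python
def annotation_bounding_box(points, image_shape, margin=8):
    """Return (top, left, bottom, right), end-exclusive, around the annotated pixels."""
    height, width = image_shape
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    top = max(min(rows) - margin, 0)               # clamp to the image border
    left = max(min(cols) - margin, 0)
    bottom = min(max(rows) + margin + 1, height)
    right = min(max(cols) + margin + 1, width)
    return top, left, bottom, right


def crop(image, box):
    """Cut the sub-image that would be fed to the model as a retraining sample."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]
```

The crop and the annotation coordinates shifted by `(top, left)` would then form one training sample for the retraining step.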
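Further, after step M3 the method also comprises:
M4. acquiring multiple images of another cell line and annotating them manually to obtain annotated samples of the other cell line;
M5. inputting the annotated samples of the other cell line into the pre-built model and retraining it with the semi-supervised learning method to obtain a pre-built model that supports the other cell line.
Further, the annotation operation comprises background annotation and/or cell annotation performed manually.
Further, in step S2 one pre-built model is selected from a pre-built model database, and before or after step S6 the method also comprises:
S7. saving the current retrained model and adding it to the model database.
Optionally, the target image information of the cell population in step S1 is one or more of an unstained, non-fluorescent cell image, a stained cell image, a fluorescence cell image, a phase-contrast image, and a bright-field image.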
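According to another aspect of the invention, a cell information statistics apparatus is provided for compiling statistics on the cells of a cell population, comprising the following modules:
a target acquisition module configured to acquire target image information of the cell population for which statistics are to be compiled;
a pre-built model invocation module configured to select a pre-built model and input the target image information into it to obtain a result image;
a defect feature determination module configured to determine the regions of the result image that exhibit defect features, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
an annotation and retraining module configured to receive the determination result of the defect feature determination module, perform an annotation operation on at least one region that exhibits a defect feature, retrain the pre-built model according to the annotation operation, and input the target image information into the retrained model to obtain a new result image;
a statistics module configured to compile statistics on cell information on the basis of the result image, the cell information including one or more of cell count, cell size, and cell shape.
Further, the pre-built model invocation module comprises the following units:
an annotated sample unit configured to acquire multiple images of a cell line and annotate them manually to obtain annotated learning samples;
a base model unit configured to design a deep learning model and train it with the annotated learning samples to obtain a base model;
a pre-built model unit configured to retrain the base model with a semi-supervised learning method, in which attention and learning are restricted to the regions on which annotation operations were performed, to obtain the pre-built model.
Further, the pre-built model unit comprises the following subunits:
a sample image set acquisition subunit configured to acquire a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
a traversal subunit configured to traverse the sample image set, sending one sample image at a time to the training subunit;
a training subunit configured to input the received sample image into the base model to obtain an in-training result image;
a defect discrimination subunit configured to receive the in-training result image from the training subunit and determine whether any region of it exhibits a defect feature, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
an annotation and retraining subunit configured to perform an annotation operation on at least one region that exhibits a defect feature, retrain the base model according to the annotation operation while attending to and learning from only the annotated regions, obtain a new in-training result image from the retrained current model, and send it to the defect discrimination subunit;
a model update subunit configured to update the base model to the current model when the defect discrimination subunit finds no defect feature, and to trigger the traversal subunit to send the next sample image to the training subunit;
a model saving subunit configured to save the current model in response to the traversal subunit finishing the traversal, to obtain the pre-built model.
According to another aspect of the invention, an electronic device is provided, comprising a processor and a memory, wherein the memory stores program instructions and the processor is configured to run the program instructions, which when run perform the steps of the method described above.
According to another aspect of the invention, a computer-readable storage medium is provided for storing program instructions, the program instructions being configured to be invoked to perform the steps of the method described above.
According to a further aspect of the invention, a computer program product is provided, comprising a readably stored computer program that includes program instructions; when the program instructions run on a computer device, the computer device performs the steps of the method described above.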
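According to yet another aspect of the invention, a model training method is provided, comprising the following steps:
M1. acquiring multiple images of a cell line and annotating them manually to obtain annotated learning samples;
M2. designing a deep learning model and training it with the annotated learning samples to obtain a base model;
M3. retraining the base model with a semi-supervised learning method, in which attention and learning are restricted to the regions on which annotation operations were performed, to obtain the pre-built model;
wherein step M3 further comprises:
M301. acquiring a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
M302. inputting one of the sample images into the base model to obtain an in-training result image;
M303. determining the regions of the in-training result image that exhibit defect features, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
M304. performing an annotation operation on at least one region that exhibits a defect feature;
M305. retraining the current model according to the annotation operation, attending to and learning from only the annotated regions during retraining, and obtaining a new in-training result image from the retrained current model;
repeating steps M303 to M305 until a preset number of repetitions is reached or until no region of the in-training result image exhibits a defect feature, and then performing step M306;
M306. updating the base model to the current model;
M307. traversing the sample image set, inputting the next sample image into the current model to obtain the current in-training result image, and performing steps M303 to M306, until the sample images have been traversed for a preset number of rounds, and then performing M308;
M308. saving the current model to obtain the pre-built model.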
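The technical solution provided by the invention has the following beneficial effects:
when the user is not satisfied with the output of the pre-built model, the user annotates the result image and the model is retrained to reach the desired effect; the accuracy required of the initial model is therefore lower, and the model-building stage does not need a large number of annotated training samples;
a base model obtained by deep learning, combined with a small number of annotated images of a cell line, yields a pre-built model for that cell line through semi-supervised learning; pre-built models for different cell lines can share the same deep learning base model, i.e., the same base model combined with images of different cell lines yields pre-built models that support the respective cell lines;
the method is applicable to most cell images, including stained cell images, fluorescence cell images, phase-contrast images, and bright-field images, and can even accurately distinguish cells in unstained, non-fluorescent cell images;
in the semi-supervised learning method, attention and learning are restricted to the annotated regions, which makes the algorithm simpler and the retraining objective clearer than in fully supervised learning; by annotating the defective image regions they are dissatisfied with and retraining the model, users can obtain a model that satisfies them, and the model becomes more accurate as the user processes more and more cell images.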
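Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application or of the conventional art more clearly, the drawings needed for describing the embodiments or the conventional art are briefly introduced below. The drawings described below are merely some embodiments recorded in the present application; a person of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a basic flowchart of the cell information statistics method provided by an exemplary embodiment of the invention;
Fig. 2 is a detailed flowchart of the cell information statistics method provided by an exemplary embodiment of the invention;
Fig. 3 is a flowchart of the method for training the pre-built model provided by an exemplary embodiment of the invention;
Fig. 4 is a flowchart of the method for retraining the base model with the semi-supervised learning method provided by an exemplary embodiment of the invention;
Fig. 5 is a flowchart of the method for building a pre-built model that supports two or more cell lines provided by an exemplary embodiment of the invention;
Fig. 6 is a flowchart of the method for restricting attention and learning to the annotated regions provided by an exemplary embodiment of the invention;
Fig. 7 is a schematic block diagram of the modules of the cell information statistics apparatus provided by an exemplary embodiment of the invention;
Fig. 8 is a schematic block diagram of the subunits of the pre-built model unit provided by an exemplary embodiment of the invention;
Fig. 9(a) is the original phase image of an unstained, non-fluorescent cell image; Fig. 9(b) is the result image obtained by inputting Fig. 9(a) into the pre-built model; Fig. 9(c) is the image after the defect features in Fig. 9(b) have been annotated;
Fig. 10(a) is the original image of a fluorescence cell image; Fig. 10(b) is the result image obtained by inputting Fig. 10(a) into the pre-built model; Fig. 10(c) is the image after the defect features in Fig. 10(b) have been annotated; Fig. 10(d) is the result image output after the model has been retrained using Fig. 10(c);
Fig. 11 is a schematic diagram of a system 100 for performing the methods described in the embodiments of the invention.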
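Detailed Description
To help those skilled in the art better understand the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Data so designated are interchangeable where appropriate, so that the embodiments described here can be carried out in orders other than those illustrated or described. Furthermore, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion: a process, method, apparatus, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
Conventional image-analysis methods for counting cells require the cells to be stained or fluorescently excited so that they show specific features in the image and adjacent cells can be told apart. The invention aims to provide an improved image analysis method that can accurately compile statistics on the cells in an image through simple operations, even for unstained, non-fluorescent cell images. Because unstained, non-fluorescent cell images are the hardest image type to analyze, a cell information statistics method that works for them is equally applicable to other types such as stained cell images, fluorescence cell images, phase-contrast images, and bright-field images.
The invention uses image data of one cell line for deep learning to train a base model. On the basis of this base model, a pre-built model is then trained with a semi-supervised learning method. To support further cell lines, only a small number of annotated images of those cell lines are needed as training samples for the semi-supervised learning; the base model does not have to be rebuilt.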
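In one embodiment of the invention, a cell information statistics method is provided for compiling statistics on a cell population, for example counting the cells, measuring cell sizes by class, or classifying cell shapes.
In this embodiment, one or more pre-built models are built in advance to form a model database. A suitable pre-built model is selected according to the cell population to be analyzed and is used to process the cell image into a result image. As shown in Fig. 3, the pre-built models in the database are obtained by training through the following steps:
M1. acquiring multiple images of one cell line and annotating them manually to obtain annotated learning samples;
M2. designing a deep learning model and training it with the annotated learning samples to obtain a base model;
M3. retraining the base model with a semi-supervised learning method, in which attention and learning are restricted to the regions on which annotation operations were performed, to obtain the pre-built model.
Specifically, as shown in Fig. 4, step M3 comprises:
M301. acquiring a sample image set for building the pre-built model, the sample image set comprising multiple sample images.
Unlike the annotated learning samples in step M1, the sample images in this set need not be manually annotated in advance. For example, the set may contain 100 sample images numbered 0001 to 0100.
M302. inputting one of the sample images into the base model to obtain an in-training result image;
M303. determining the regions of the in-training result image that exhibit defect features, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
M304. performing an annotation operation on at least one region that exhibits a defect feature;
M305. retraining the current model according to the annotation operation, specifically by inputting the sample image and at least the local region image of the annotation operation (or alternatively the annotated in-training result image) into the current model to retrain it into a new model, attending to and learning from only the annotated regions during retraining, and obtaining a new in-training result image from the retrained current model;
repeating steps M303 to M305 until a preset number of repetitions is reached or until no region of the in-training result image exhibits a defect feature, and then performing step M306;
M306. updating the base model to the current model;
For example, the base model is retrained with the semi-supervised learning method one sample image at a time, in order from 0001 to 0100; at this point, training on sample image 0001 is complete.
M307. traversing the sample image set, inputting the next sample image into the current model to obtain the current in-training result image, and performing steps M303 to M306, until the sample images have been traversed for a preset number of rounds, and then performing M308;
For example, the traversal proceeds in order to sample image 0002, on which steps M302 to M306 are performed, then to sample image 0003, and so on until steps M302 to M306 have been performed on sample image 0100. This completes one round of traversal. The invention does not limit the number of rounds. Traversing only part of the sample image set (for example skipping image 0066, or stopping at image 0098) should be regarded as a simple substitution of this embodiment and likewise falls within the scope of protection claimed by the invention.
M308. saving the current model to obtain the pre-built model.
Once the traversal has been completed according to the preset traversal rule, the resulting model is the pre-built model.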
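In one embodiment of the invention, a pre-built model that supports two or more cell lines can also be built, as shown in Fig. 5:
assuming that multiple images of a first cell line are acquired in step M1, step M308 accordingly yields a pre-built model that supports the first cell line;
after step M3 the method also comprises:
M4. acquiring multiple images of another cell line (for example a second cell line) and annotating them manually to obtain annotated samples of the other cell line;
M5. inputting the annotated samples of the other cell line into the pre-built model and retraining it with the semi-supervised learning method to obtain a pre-built model that supports the other cell line.
Step M5 is carried out as follows: using a sample image set that is the same as or different from that of step M301, the first sample image is input into the pre-built model to obtain an in-training result image;
the regions of the in-training result image that exhibit defect features are determined, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape; an annotation operation is performed on at least one region that exhibits a defect feature; the base model is retrained according to the annotation operation, attending to and learning from only the annotated regions, and a new in-training result image is obtained from the retrained current model; this is repeated until a preset number of repetitions is reached or until no region of the in-training result image exhibits a defect feature, after which the following is performed:
the base model is updated to the current model, the next sample image of the set is input into the current model to obtain the current in-training result image, and the steps of the preceding paragraph are performed; that is, the current model is updated and saved once for every sample image traversed, until the sample images in the set have been traversed for a preset number of rounds, whereupon the latest current model is saved as a pre-built model that supports both the first and the second cell line.
If the model is also to support a third cell line, multiple images of the third cell line can be taken and steps M4 to M5 performed on them; this is not described again here.
Alternatively, steps M1 to M3 can be performed separately for different cell lines to obtain different pre-built models, which can be saved to form a pre-built model database.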
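Once the pre-built model has been built, cell images (even unstained, non-fluorescent ones) can be analyzed to compile statistics on the cells in the image (such as count, size, or shape). As shown in Fig. 1, the cell information statistics method comprises the following steps:
S1. acquiring target image information of the cell population for which statistics are to be compiled.
Such a cell population is usually not a single cell but an aggregate of many cells. A microscope can be used to capture an image of the cell population to obtain the target image, such as the unstained, non-fluorescent cell image shown in Fig. 9(a), which is a phase image.
S2. selecting a pre-built model and inputting the target image information into the pre-built model to obtain a result image, as shown in Fig. 9(b).
Specifically, one pre-built model is selected from the pre-built model database. Suppose, for example, that the database contains pre-built model A supporting cell line a, pre-built model B supporting cell line b, and pre-built model C supporting cell lines a and c. If the cell population belongs to cell line a, pre-built model A can be selected; if it belongs to cell line c, pre-built model C can be selected; if it belongs to cell line b', pre-built model B is selected, since it corresponds to cell line b, the cell line closest to b'.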
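The selection rule in the example above can be sketched as a lookup with a fallback. This is a hedged illustration only: the dictionary layout of the model database and the `nearest_line` mapping (which cell line is "closest" to an unsupported one) are assumptions, since the source does not say how similarity between cell lines is decided.

```python
def select_prebuilt_model(cell_line, model_db, nearest_line=None):
    """Pick a model name for a cell line.

    model_db maps a model name to the set of cell lines it supports (assumed layout);
    nearest_line maps an unsupported cell line to its closest supported one.
    """
    for name, supported in model_db.items():   # first model that lists the cell line
        if cell_line in supported:
            return name
    if nearest_line and cell_line in nearest_line:
        # fall back to the model of the most similar supported cell line
        return select_prebuilt_model(nearest_line[cell_line], model_db)
    raise LookupError(f"no pre-built model for cell line {cell_line!r}")


model_db = {"A": {"a"}, "B": {"b"}, "C": {"a", "c"}}
print(select_prebuilt_model("b'", model_db, nearest_line={"b'": "b"}))  # prints B
```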
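S3. determining whether any region of the result image exhibits a defect feature, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape; if so, performing steps S4 to S6; otherwise, performing step S6;
If the current result image has no defect features, S6 is performed directly, i.e., statistics on cell information are compiled on the basis of the current defect-free result image, the cell information including one or more of cell count, cell size, and cell shape.
If the current result image has defect features, the following are performed: S4, performing an annotation operation on at least one region that exhibits a defect feature, as indicated by the two arrows in Fig. 9(c) (the arrows need not be shown in the actual interface); S5, retraining the pre-built model according to the annotation operation and inputting the target image information into the retrained model to obtain a new result image. In this embodiment, the target image from step S1 and at least the local region image of the annotation operation (or alternatively the annotated result image) are input into the pre-built model to retrain it into a new model. The invention does not, however, require that the input be the target image from step S1; in another embodiment, the microscope can capture a new image of the cell population to obtain a new current target image.
Where the target image is a fluorescence cell image, as shown in Fig. 10(a), inputting it into the pre-built model yields the result image shown in Fig. 10(b), which also displays the cell count. Fig. 10(c) shows this result image being annotated, with annotated points marking cells and annotated lines marking background (non-cell). The annotation operation and the target image are then input into the current model for retraining, and the target image is input into the retrained model to obtain the new result image shown in Fig. 10(d). The cell count obtained with the new model is visibly more accurate.
S6. compiling statistics on cell information on the basis of the current result image, the cell information including one or more of cell count, cell size, and cell shape.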
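As an illustration of step S6, cell count and cell size can be derived from a segmentation result by labelling connected regions. The sketch below assumes the result image is available as a binary mask (a 2D list where 1 means cell and 0 means background) and uses 4-connectivity; neither assumption comes from the source, which does not specify the result image format.

```python
from collections import deque


def cell_statistics(mask):
    """Count 4-connected foreground regions and measure their areas in pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    areas = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])            # breadth-first flood fill of one cell
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return {"count": len(areas), "areas": areas}
```

Adherent cells would merge into a single region under this scheme, which is exactly the kind of defect that the background annotation in step S4 is meant to correct.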
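Performing steps S4 to S6 here covers several cases:
Case 1: step S6 is performed directly, regardless of whether the new result image from step S5 still has defect features;
Case 2: as shown in Fig. 2, after step S5, steps S3 to S5 are repeated until no region of the result image exhibits a defect feature, and then step S6 is performed;
Case 3: after step S5, steps S3 to S5 are repeated a preset number of times, for example 5 times, after which step S6 is performed regardless of whether the new result image from step S5 still has defect features.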
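The three cases differ only in the stopping condition of the S3 to S5 loop, which the following sketch makes explicit. The callables and the `max_repeats` parameter are hypothetical stand-ins: `max_repeats=1` corresponds to case 1, `max_repeats=None` to case 2, and any other preset integer to case 3.

```python
def refine_until_acceptable(image, model, predict, find_defects,
                            annotate, retrain, max_repeats=None):
    """Run S2 to S5 and return the model and result image to be used in S6."""
    result = predict(model, image)               # S2: initial result image
    repeats = 0
    while True:
        defects = find_defects(result)           # S3: any defect features left?
        if not defects:
            break                                # defect-free: go to S6
        if max_repeats is not None and repeats >= max_repeats:
            break                                # preset repetition count reached
        notes = annotate(result, defects)        # S4: annotate at least one region
        model = retrain(model, image, notes)     # S5: retrain on the annotation
        result = predict(model, image)           # S5: new result image
        repeats += 1
    return model, result
```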
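Specifically, in step S5, while the pre-built model is retrained according to the annotation operation, attention and learning are restricted to the annotated regions, which includes extracting features according to the annotation operation in order to retrain the model.
"Restricting attention and learning to the annotated regions" is also a feature of steps M3 and M305. As shown in Fig. 6, it comprises:
delimiting the extent of the region on which the annotation operation was performed, such that the delimited extent includes the cells involved in the annotation operation;
generating a sub-image of the delimited region, and inputting the sub-image together with the corresponding annotation information into the model to be retrained as a deep learning sample;
taking the corresponding annotation information as the learning target and deep-learning the features of the sub-image, so as to update the neural network parameters of the model and obtain the retrained model.
Specifically, an annotation operation is an operation performed in response to a defect feature in the image, and comprises background annotation and/or cell annotation performed manually. For example, if a person judges that two cells in some region of the image have adhered, background annotation can be used, i.e., a background marker line is drawn between the two cells. If a person judges that a cell somewhere in the image has been missed (mistaken for background), cell annotation can be used, i.e., a cell marker point is placed on that cell. In this embodiment, background annotation and cell annotation use different colors so that the annotation types are easy to tell apart during manual annotation and the user does not unknowingly apply a background annotation when a cell annotation was intended.
In one embodiment of the invention, if the image contains defect features of several different types, every type must be annotated. If the image contains several defect features of the same type, it is enough to annotate one of them; the user is not required to annotate in fine detail and only needs to roughly mark the clearly unsatisfactory places. Feedback is given after every annotation, and this incremental interaction continues until the desired effect is reached. After the model has deep-learned the annotated defect features, its neural network parameters are updated to give the retrained model, and the image is input into the retrained model again. If, in the resulting image, not only the previously annotated defect feature but also the other defect features of the same type have disappeared, the effectiveness of this round of deep learning is verified as 100%. If not all of the other defect features of the same type have disappeared, the effectiveness is determined according to a preset standard: for example, if fewer than one tenth of the defect features of the same type remain, the effectiveness is 90%; if more than one fifth remain, the effectiveness is unacceptable. In the unacceptable case, the image can be returned to its initial state and one or more of its defect features annotated again.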
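The effectiveness rule in the example above can be written as a small function. Note the hedges: the thresholds are the ones quoted as an example of a "preset standard", the source does not say what rating applies when between one tenth and one fifth of the defects remain, and whether the originally annotated defect is included in the ratio is not stated. The function name and the "unspecified" label are illustrative only.

```python
def retraining_effectiveness(defects_before, defects_after):
    """Rate one retraining round by the share of same-type defects that remain."""
    if defects_before <= 0:
        raise ValueError("defects_before must be positive")
    remaining = defects_after / defects_before
    if remaining == 0:
        return "100%"            # every same-type defect disappeared
    if remaining < 0.1:
        return "90%"             # fewer than one tenth remain
    if remaining > 0.2:
        return "unacceptable"    # more than one fifth remain: re-annotate from scratch
    return "unspecified"         # the source gives no rating for this band
```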
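Before or after step S6 the method also comprises:
S7. saving the current retrained model and adding it to the model database.
Taking a cell population that belongs to cell line b' as an example: the similar pre-built model B was selected in step S2. If the pre-built model has been retrained according to annotation operations by the time step S6 is performed, the retrained model can serve as pre-built model B' supporting cell line b'; the next time statistics are compiled for cell line b', pre-built model B' can be selected directly.
Alternatively, taking a cell population that belongs to cell line a as an example: pre-built model A was selected in step S2. If the pre-built model is subsequently retrained according to annotation operations, the retrained model can serve as an optimized version of pre-built model A for cell line a; the next time statistics are compiled for cell line a, the optimized model can be selected directly. As the model is updated iteratively, the optimized model gradually improves, so that fewer defect features appear in the result image the next time steps S1 to S6 are performed, which makes the procedure simpler still.
In one embodiment of the invention, a corresponding client is provided, on which the user performs cell segmentation as follows:
The user loads the target image information of the cell population, selects and loads a pre-built model, and runs it to obtain a result image. If the user is satisfied, no further steps are needed: the current result image is taken as the cell segmentation result and used for counting or other statistics.
If the user is not satisfied, the user annotates the unsatisfactory regions, for example by marking background between adherent cells and marking a cell where one has disappeared. After annotating, the user clicks the retrain button in the client, and the client interface then shows a new result image. Annotation and retraining are repeated until the result image meets the user's expectations. Finally, the satisfactory result image is taken as the cell segmentation result and used for counting or other statistics.
Cells vary in morphology, and different cells look different under the microscope. In addition, a researcher's treatment of the cells with drugs or a change of culture medium may affect cell morphology, and different light intensities or exposure times strongly affect the image and hence the result of the algorithm. In the embodiments of the invention, users can train and save their own models to suit their own application scenarios; a given user is usually interested in only a few specific cell lines. Because users can train and save their own models in a short time, the needs of almost all users can be met. The more often a model is used to analyze cell images, the more it is reinforced by the user's annotations of unsatisfactory regions during analysis, so that its output becomes more accurate and eventually reaches a high accuracy. Moreover, users are not required to annotate in fine detail; they only need to roughly mark the clearly unsatisfactory places. Feedback is given after every annotation, and this incremental interaction continues until the desired effect is reached. In a test on 578 HeLa phase-contrast images, the validation result was an average accuracy of 94.42% and an R² (the proportion of the variation in the dependent variable that is predictable from the independent variable) of 0.9931.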
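For reference, one common way to compute an R² value between reference counts and model counts is the coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below uses that definition as an assumption; the source does not state which formula or which reference counts were used, and the numbers in the example are made up for illustration, not taken from the reported test.

```python
def r_squared(reference_counts, predicted_counts):
    """Coefficient of determination between reference and predicted cell counts."""
    if len(reference_counts) != len(predicted_counts) or not reference_counts:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(reference_counts) / len(reference_counts)
    ss_tot = sum((y - mean) ** 2 for y in reference_counts)   # total variation
    ss_res = sum((y - p) ** 2                                  # unexplained variation
                 for y, p in zip(reference_counts, predicted_counts))
    return 1 - ss_res / ss_tot


# Illustrative values only: three images with hypothetical manual and model counts.
print(round(r_squared([10, 20, 30], [12, 18, 30]), 4))  # prints 0.96
```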
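In one embodiment of the invention, a cell information statistics apparatus is provided for compiling statistics on the cells of a cell population. As shown in Fig. 7, the apparatus comprises the following modules:
a target acquisition module configured to acquire target image information of the cell population for which statistics are to be compiled;
a pre-built model invocation module configured to select a pre-built model and input the target image information into it to obtain a result image;
a defect feature determination module configured to determine the regions of the result image that exhibit defect features, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
an annotation and retraining module configured to receive the determination result of the defect feature determination module, perform an annotation operation on at least one region that exhibits a defect feature, retrain the pre-built model according to the annotation operation, and input the target image information into the retrained model to obtain a new result image;
a statistics module configured to compile statistics on cell information on the basis of the result image, the cell information including one or more of cell count, cell size, and cell shape.
As shown in Fig. 7, the pre-built model invocation module comprises the following units:
an annotated sample unit configured to acquire multiple images of a cell line and annotate them manually to obtain annotated learning samples;
a base model unit configured to design a deep learning model and train it with the annotated learning samples to obtain a base model;
a pre-built model unit configured to retrain the base model with a semi-supervised learning method, in which attention and learning are restricted to the regions on which annotation operations were performed, to obtain the pre-built model.
As shown in Fig. 8, the pre-built model unit comprises the following subunits:
a sample image set acquisition subunit configured to acquire a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
a traversal subunit configured to traverse the sample image set, sending one sample image at a time to the training subunit;
a training subunit configured to input the received sample image into the base model to obtain an in-training result image;
a defect discrimination subunit configured to receive the in-training result image from the training subunit and determine whether any region of it exhibits a defect feature, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
an annotation and retraining subunit configured to perform an annotation operation on at least one region that exhibits a defect feature, retrain the base model according to the annotation operation while attending to and learning from only the annotated regions, obtain a new in-training result image from the retrained current model, and send it to the defect discrimination subunit;
a model update subunit configured to update the base model to the current model when the defect discrimination subunit finds no defect feature, and to trigger the traversal subunit to send the next sample image to the training subunit;
a model saving subunit configured to save the current model in response to the traversal subunit finishing the traversal, to obtain the pre-built model.
In one embodiment of the invention, an electronic device is provided, comprising a processor and a memory, wherein the memory stores program instructions and the processor is configured to run the program instructions, which when run perform the steps of the method embodiments above.
In one embodiment of the invention, a computer-readable storage medium is provided for storing program instructions, the program instructions being configured to be invoked to perform the steps of the method embodiments above.
In one embodiment of the invention, a computer program product is provided, comprising a readably stored computer program that includes program instructions; when the program instructions run on a computer device, the computer device performs the steps of the method embodiments above.
It should be noted that the above embodiments of the cell information statistics apparatus, electronic device, computer-readable storage medium, and computer program product share the same inventive concept as the embodiments of the cell information statistics method, and the entire content of the method embodiments is incorporated into them by reference.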
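Some embodiments relate to a microscope to which the cell information statistics method and apparatus are applied. Optionally, the microscope may be part of, or connected to, a system that performs one or more of the method flows described in connection with Figs. 1 to 6. Fig. 11 shows a schematic diagram of a system 100 configured to perform the methods described in the invention. The system 100 comprises a microscope 110 and a computer system 120. The microscope 110 is configured to capture images and is connected to the computer system 120. The computer system 120 is configured to perform at least part of the methods described in the invention. The computer system 120 may be configured to execute a machine learning algorithm. The computer system 120 and the microscope 110 may be separate entities, but may also be integrated in a common housing. The computer system 120 may be part of a central processing system of the microscope 110, and/or the computer system 120 may be part of a subcomponent of the microscope 110, such as a sensor, actuator, camera, or illumination unit of the microscope 110.
The computer system 120 may be a local computer device (e.g., a personal computer, laptop, tablet, or mobile phone) having one or more processors and one or more storage devices, or it may be a distributed computer system (e.g., having one or more processors and one or more storage devices distributed at various locations, such as a local client and/or one or more remote server sites and/or data centers). The computer system 120 may include any circuit or combination of circuits. In one embodiment, the computer system 120 may include one or more processors, which may be of any type. As used in the invention, a processor may refer to any type of computing circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a graphics processor, a digital signal processor (DSP), a multi-core processor, a field-programmable gate array (FPGA), for example of a microscope or a microscope component (such as a camera), or any other type of processor or processing circuit. Other types of circuits that may be included in the computer system 120 are custom circuits, application-specific integrated circuits (ASICs), and the like, for example one or more circuits (such as communication circuits) used in wireless devices such as mobile phones, tablets, laptops, two-way radios, and similar electronic systems. The computer system 120 may include one or more storage devices, which may include one or more storage elements suitable for the particular application, such as main memory in the form of random access memory (RAM), one or more hard drives, and/or one or more drives that handle removable media such as compact disks (CDs), flash memory cards, digital video disks (DVDs), and the like. The computer system 120 may also include a display device, one or more speakers, and a keyboard and/or controller, which may include a mouse, trackball, touch screen, voice recognition device, or any other device that allows a system user to enter information into, or receive information from, the computer system 120.
Some or all of the method steps may be performed by (or using) a hardware device (e.g., a processor, a microprocessor, a programmable computer, or an electronic circuit). In some embodiments, such a device may perform one or more of the most important method steps.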
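Depending on certain implementation requirements, embodiments of the invention may be implemented in hardware or software. The implementation may use a non-transitory storage medium (such as a digital storage medium, e.g., a floppy disk, DVD, Blu-ray, CD, ROM, PROM, EPROM, EEPROM, or FLASH memory) having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system so that the corresponding method is performed. The digital storage medium may therefore be computer-readable.
Some embodiments of the invention comprise a data carrier having electronically readable control signals capable of cooperating with a programmable computer system so that one of the methods described in the invention is performed.
In general, embodiments of the invention may be implemented as a computer program product having program code that, when the computer program product runs on a computer, is operative to perform one of the methods. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments include a computer program, stored on a machine-readable carrier, for performing one of the methods described in the invention.
In other words, an embodiment of the invention is therefore a computer program having program code for performing one of the methods described in the invention when the computer program runs on a computer.
A further embodiment of the invention is therefore a storage medium (or data carrier or computer-readable medium) having stored thereon a computer program for performing one of the methods described in the invention when executed by a processor. Data carriers, digital storage media, and recording media are usually tangible and/or non-transitory. A further embodiment of the invention is a device as described in the invention, comprising a processor and a storage medium.
A further embodiment of the invention is therefore a data stream or a sequence of signals representing a computer program for performing one of the methods described in the invention. The data stream or signal sequence may, for example, be configured to be transmitted via a data communication connection, for example via the Internet.
A further embodiment comprises a processing device, such as a computer or a programmable logic device, configured or adapted to perform one of the methods described in the invention.
A further embodiment comprises a computer on which a computer program for performing one of the methods described in the invention is installed.
A further embodiment of the invention comprises a device or system configured to transmit (e.g., electronically or optically) a computer program for performing one of the methods described in the invention to a receiver. The receiver may be, for example, a computer, a mobile device, a storage device, or the like. The device or system may, for example, comprise a file server for transmitting the computer program to the receiver.
In some embodiments, a programmable logic device (e.g., a field-programmable gate array) may be used to perform some or all of the functions of the methods described in the invention. In some embodiments, a field-programmable gate array may cooperate with a microprocessor to perform one of the methods described in the invention. In general, the methods are preferably performed by any hardware device.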
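Embodiments may be based on the use of a machine learning model or machine learning algorithm. Machine learning may refer to algorithms and statistical models that computer systems can use to perform a specific task without explicit instructions, relying instead on models and inference. For example, in machine learning, instead of a rule-based data transformation, a data transformation derived from an analysis of historical and/or training data may be used. For example, the content of an image may be analyzed using a machine learning model or a machine learning algorithm. For the machine learning model to analyze the content of an image, it may be trained using training images as input and training content information as output. By training the machine learning model with a large number of training images and/or training sequences (e.g., words or sentences) and associated training content information (e.g., labels or annotations), the machine learning model "learns" to recognize the content of images, so that it can recognize image content not included in the training data. The same principle can be used for other kinds of sensor data: by training a machine learning model with training sensor data and a desired output, the model "learns" a transformation between the sensor data and the output, which can be used to provide an output based on non-training sensor data supplied to the model. The provided data (e.g., sensor data, metadata, and/or image data) may be preprocessed to obtain a feature vector, which is used as input to the machine learning model.
Machine learning models may be trained using training input data. The examples above use a training method called "supervised learning". In supervised learning, the machine learning model is trained with multiple training samples, where each sample may comprise multiple input data values and multiple desired output values, i.e., each training sample is associated with a desired output value. By being given training samples together with the desired output values, the machine learning model "learns" which output value to provide for input samples similar to those provided during training. Besides supervised learning, semi-supervised learning may be used, in which some training samples lack a corresponding desired output value. Supervised learning may be based on a supervised learning algorithm (e.g., a classification algorithm, a regression algorithm, or a similarity learning algorithm). Classification algorithms may be used when the output is restricted to a limited set of values (categorical variables), i.e., the input is classified as one of a limited set of values. Regression algorithms may be used when the output may take any numerical value (within a range). Similarity learning algorithms are similar to both classification and regression algorithms, but are based on learning from examples using a similarity function that measures how similar or related two objects are. Besides supervised or semi-supervised learning, unsupervised learning may be used to train the machine learning model. In unsupervised learning, (only) input data may be supplied, and an unsupervised learning algorithm may be used to find structure in the input data (e.g., by grouping or clustering the input data to find commonalities in the data). Clustering is the assignment of input data comprising multiple input values to subsets (clusters) such that input values within the same cluster are similar according to one or more (predefined) similarity criteria, while being dissimilar to input values in other clusters.
Reinforcement learning is a third group of machine learning algorithms. In other words, reinforcement learning may be used to train the machine learning model. In reinforcement learning, one or more software actors (called "software agents") are trained to take actions in an environment. A reward is calculated based on the actions taken. Reinforcement learning is based on training the one or more software agents to choose actions such that the cumulative reward increases, so that the software agents become better at the task they are given (as evidenced by increasing rewards).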
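Furthermore, some techniques may be applied to some machine learning algorithms. For example, feature learning may be used. In other words, the machine learning model may be trained at least partially using feature learning, and/or the machine learning algorithm may comprise a feature learning component. Feature learning algorithms (which may be called representation learning algorithms) may preserve the information in their input but also transform it in a way that makes it useful, often as a preprocessing step before classification or prediction. Feature learning may be based, for example, on principal component analysis or cluster analysis.
In some examples, anomaly detection (i.e., outlier detection) may be used, which aims to identify input values that raise suspicion because they differ significantly from the majority of the input or training data. In other words, the machine learning model may be trained at least partially using anomaly detection, and/or the machine learning algorithm may comprise an anomaly detection component.
In some examples, the machine learning algorithm may use a decision tree as a predictive model. In other words, the machine learning model may be based on a decision tree. In a decision tree, observations about an item (e.g., a set of input values) may be represented by the branches of the tree, and the output value corresponding to the item may be represented by the leaves of the tree. Decision trees may support both discrete and continuous values as output values. If discrete values are used, the decision tree may be described as a classification tree; if continuous values are used, it may be described as a regression tree.
Association rules are another technique that may be used in machine learning algorithms. In other words, the machine learning model may be based on one or more association rules. Association rules are created by identifying relationships between variables in large amounts of data. The machine learning algorithm may identify and/or use one or more relational rules that represent the knowledge derived from the data. The rules may be used, for example, to store, manipulate, or apply that knowledge.
Machine learning algorithms are usually based on a machine learning model. In other words, the term "machine learning algorithm" may denote a set of instructions that can be used to create, train, or use a machine learning model. The term "machine learning model" may denote a data structure and/or set of rules that represents the learned knowledge (e.g., based on the training performed by the machine learning algorithm). In embodiments, the use of a machine learning algorithm may imply the use of an underlying machine learning model (or of multiple underlying machine learning models). The use of a machine learning model may imply that the machine learning model, and/or the data structure/set of rules that is the machine learning model, is trained by a machine learning algorithm.
For example, the machine learning model may be an artificial neural network (ANN). ANNs are systems inspired by biological neural networks, such as those found in the retina or the brain. An ANN comprises multiple interconnected nodes and multiple connections, so-called edges, between the nodes. There are usually three types of nodes: input nodes that receive input values, hidden nodes that are (only) connected to other nodes, and output nodes that provide output values. Each node may represent an artificial neuron. Each edge may transmit information from one node to another. The output of a node may be defined as a (non-linear) function of its inputs (e.g., of the sum of its inputs). The inputs of a node may be used in the function based on a "weight" of the edge or of the node that provides the input. The weights of nodes and/or edges may be adjusted in the learning process. In other words, training an artificial neural network may comprise adjusting the weights of its nodes and/or edges, i.e., so as to achieve a desired output for a given input.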
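A single artificial neuron of the kind described above can be sketched in a few lines. The choice of the logistic sigmoid as the non-linear function and the presence of a bias term are assumptions made for illustration; the text only requires some (non-linear) function of the weighted inputs.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid non-linearity."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))   # squashes the sum into (0, 1)


print(neuron_output([1.0, 2.0], [0.5, -0.25]))  # weighted sum is 0, so this prints 0.5
```

Training then amounts to adjusting `weights` (and `bias`) so that the outputs move toward the desired values for the given inputs.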
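Alternatively, the machine learning model may be a support vector machine, a random forest model, or a gradient boosting model. Support vector machines (i.e., support vector networks) are supervised learning models with associated learning algorithms that can be used to analyze data (e.g., in classification or regression analysis). A support vector machine may be trained by providing it with multiple training input values, each belonging to one of two categories, and may be trained to assign a new input value to one of the two categories. Alternatively, the machine learning model may be a Bayesian network, which is a probabilistic directed acyclic graphical model. A Bayesian network may represent a set of random variables and their conditional dependencies using a directed acyclic graph. Alternatively, the machine learning model may be based on a genetic algorithm, which is a search algorithm and heuristic technique that mimics the process of natural selection.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Unless further restricted, an element qualified by the phrase "comprising a ..." does not exclude the presence of further identical elements in the process, method, article, or device that comprises the element.
As used in the invention, the term "and/or" includes any and all combinations of one or more of the associated listed items and may be abbreviated as "/".
The above are only specific embodiments of the invention. It should be noted that a person of ordinary skill in the art can make various improvements and refinements without departing from the principles of the present application, and these improvements and refinements should also be regarded as falling within the scope of protection of the present application.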

Claims (19)

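  1. A cell information statistics method for compiling statistics on a cell population, characterized by comprising the following steps:
    S1. acquiring target image information of the cell population for which statistics are to be compiled;
    S2. selecting a pre-built model and inputting the target image information into the pre-built model to obtain a result image;
    S3. determining whether any region of the result image exhibits a defect feature, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape; if so, performing steps S4 to S6; otherwise, performing step S6;
    S4. performing an annotation operation on at least one region that exhibits a defect feature;
    S5. retraining the pre-built model according to the annotation operation, and inputting the target image information into the retrained model to obtain a new result image;
    S6. compiling statistics on cell information on the basis of the current result image, the cell information including one or more of cell count, cell size, and cell shape.
  2. The cell information statistics method according to claim 1, characterized in that, after step S5, steps S3 to S5 are repeated until a preset number of repetitions is reached or until no region of the result image exhibits a defect feature, whereupon step S6 is performed.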
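  3. The cell information statistics method according to claim 1 or 2, characterized in that the pre-built model is obtained by training through the following steps:
    M1. acquiring multiple images of a cell line and annotating them manually to obtain annotated learning samples;
    M2. designing a deep learning model and training it with the annotated learning samples to obtain a base model;
    M3. retraining the base model with a semi-supervised learning method, in which attention and learning are restricted to the regions on which annotation operations were performed, to obtain the pre-built model.
  4. The cell information statistics method according to claim 3, characterized in that step M3 further comprises:
    M301. acquiring a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
    M302. inputting one of the sample images into the base model to obtain an in-training result image;
    M303. determining the regions of the in-training result image that exhibit defect features, the defect features including one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
    M304. performing an annotation operation on at least one region that exhibits a defect feature;
    M305. retraining the current model according to the annotation operation, attending to and learning from only the annotated regions during retraining, and obtaining a new in-training result image from the retrained current model;
    repeating steps M303 to M305 until a preset number of repetitions is reached or until no region of the in-training result image exhibits a defect feature, and then performing step M306;
    M306. updating the base model to the current model;
    M307. traversing the sample image set, inputting the next sample image into the current model to obtain the current in-training result image, and performing steps M303 to M306, until the sample images have been traversed for a preset number of rounds, and then performing M308;
    M308. saving the current model to obtain the pre-built model.
  5. The cell information statistics method according to claim 3 or 4, characterized in that, after step M3, the method also comprises:
    M4. acquiring multiple images of another cell line and annotating them manually to obtain annotated samples of the other cell line;
    M5. inputting the annotated samples of the other cell line into the pre-built model and retraining it with the semi-supervised learning method to obtain a pre-built model that supports the other cell line.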
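  6. The cell information statistics method according to any one of claims 1 to 5, characterized in that, in step S5, while the pre-built model is retrained according to the annotation operation, attention and learning are restricted to the annotated regions, which includes extracting features according to the annotation operation in order to retrain the model.
  7. The cell information statistics method according to any one of claims 3 to 6, characterized in that restricting attention and learning to the annotated regions comprises:
    delimiting the extent of the region on which the annotation operation was performed, such that the delimited extent includes the cells involved in the annotation operation;
    generating a sub-image of the delimited region, and inputting the sub-image together with the corresponding annotation information into the model to be retrained as a deep learning sample;
    taking the corresponding annotation information as the learning target and deep-learning the features of the sub-image, so as to update the neural network parameters of the model and obtain the retrained model.
  8. The cell information statistics method according to any one of claims 1 to 7, characterized in that the annotation operation comprises background annotation and/or cell annotation performed manually.
  9. The cell information statistics method according to any one of claims 1 to 8, characterized in that, in step S2, one pre-built model is selected from a pre-built model database, and before or after step S6 the method also comprises:
    S7. saving the current retrained model and adding it to the model database.
  10. The cell information statistics method according to any one of claims 1 to 9, characterized in that the target image information of the cell population in step S1 is one or more of an unstained, non-fluorescent cell image, a stained cell image, a fluorescence cell image, a phase-contrast image, and a bright-field image.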
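  11. A cell information statistics apparatus for collecting statistics on the cells of a cell population, characterized by comprising the following modules:
    a target acquisition module configured to acquire target image information of the cell population for which statistics are to be collected;
    a pre-built model invocation module configured to select a pre-built model and input the target image information into the pre-built model to obtain a result image;
    a defect feature determination module configured to determine the regions of the result image that exhibit a defect feature, the defect feature comprising one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
    an annotation and retraining module configured to receive the determination result of the defect feature determination module, perform an annotation operation on at least one region exhibiting a defect feature, retrain the pre-built model according to the annotation operation, and input the target image information into the retrained model to obtain a new result image;
    a statistics module configured to collect statistics on cell information based on the result image, the cell information comprising one or more of cell count, cell size, and cell shape.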
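As an illustration of what the statistics module might compute, the sketch below assumes the result image is an instance label map (nested lists of integers, 0 for background, one positive label per cell). That representation, the function name `cell_statistics`, and the choice of pixel area and bounding-box extent as size and shape measures are all assumptions; the claims only name cell count, size, and shape.

```python
from collections import defaultdict

def cell_statistics(labels):
    """Count cells and report per-cell area and a simple shape descriptor."""
    area = defaultdict(int)
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab == 0:
                continue                      # skip background pixels
            area[lab] += 1
            top, left, bottom, right = boxes.get(lab, (r, c, r, c))
            boxes[lab] = (min(top, r), min(left, c), max(bottom, r), max(right, c))
    cells = {}
    for lab, a in area.items():
        top, left, bottom, right = boxes[lab]
        h, w = bottom - top + 1, right - left + 1
        # extent = fraction of the bounding box that the cell fills
        cells[lab] = {"area": a, "height": h, "width": w, "extent": a / (h * w)}
    return {"count": len(cells), "cells": cells}
```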
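  12. The cell information statistics apparatus according to claim 11, characterized in that the pre-built model invocation module comprises the following units:
    an annotated sample unit configured to acquire multiple images of a cell line and manually annotate them to obtain annotated learning samples;
    a base model unit configured to design a deep learning model and train the deep learning model with the annotated learning samples to obtain a base model;
    a pre-built model unit configured to retrain the base model using a semi-supervised learning method, the semi-supervised learning method attending to and learning from only the regions on which an annotation operation has been performed, to obtain the pre-built model.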
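  13. The cell information statistics apparatus according to claim 12, characterized in that the pre-built model unit comprises the following subunits:
    a sample image set acquisition subunit configured to acquire a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
    a traversal subunit configured to traverse the sample image set, sending one sample image at a time to a training subunit;
    the training subunit, configured to input the received sample image into the base model to obtain an in-training result image;
    a defect determination subunit configured to receive the in-training result image obtained by the training subunit and determine whether any region of the in-training result image exhibits a defect feature, the defect feature comprising one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
    an annotation and retraining subunit configured to perform an annotation operation on at least one region exhibiting a defect feature, then retrain the base model according to the annotation operation, attending to and learning from only the annotated regions during retraining, obtain a new in-training result image based on the retrained current model, and send it to the defect determination subunit;
    a model update subunit configured to, when the defect determination subunit determines that no defect feature is present, update the base model to the current model and trigger the traversal subunit to send the next sample image to the training subunit;
    a model saving subunit configured to, in response to the traversal subunit completing its traversal, save the current model to obtain the pre-built model.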
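  14. An electronic device comprising a processor and a memory, wherein the memory is used to store program instructions and the processor is configured to run the program instructions, characterized in that the program instructions, when run, perform the steps of the method according to any one of claims 1 to 10.
  15. A computer-readable storage medium for storing program instructions, characterized in that the program instructions are configured to be invoked to perform the steps of the method according to any one of claims 1 to 10.
  16. A computer program product comprising a computer program stored in readable form, the computer program comprising program instructions, characterized in that, when the program instructions are run on a computer device, the computer device performs the steps of the method according to any one of claims 1 to 10.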
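  17. A model training method, characterized by comprising the following steps:
    M1. acquiring multiple images of a cell line and manually annotating them to obtain annotated learning samples;
    M2. designing a deep learning model and training the deep learning model with the annotated learning samples to obtain a base model;
    M3. retraining the base model using a semi-supervised learning method, the semi-supervised learning method attending to and learning from only the regions on which an annotation operation has been performed, to obtain the pre-built model.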
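  18. The model training method according to claim 17, characterized in that step M3 further comprises:
    M301. acquiring a sample image set for building the pre-built model, the sample image set comprising multiple sample images;
    M302. inputting one of the sample images into the base model to obtain an in-training result image;
    M303. determining the regions of the in-training result image that exhibit a defect feature, the defect feature comprising one or more of cell adhesion, at least partial disappearance of a cell, and abnormal cell shape;
    M304. performing an annotation operation on at least one region exhibiting a defect feature;
    M305. retraining the current model according to the annotation operation, attending to and learning from only the annotated regions during retraining, and obtaining a new in-training result image based on the retrained current model;
    repeating steps M303 to M305 until a preset number of repetitions is reached or until no region of the in-training result image exhibits a defect feature, and then performing step M306;
    M306. updating the base model to the current model;
    M307. traversing the sample image set, inputting the next sample image into the current model to obtain the current in-training result image, and performing steps M303 to M306, until the sample images have undergone a preset number of traversal passes, and then performing M308;
    M308. saving the current model to obtain the pre-built model.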
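  19. The model training method according to claim 17 or 18, characterized in that attending to and learning from only the regions on which the annotation operation has been performed comprises:
    delimiting the extent of the region on which the annotation operation is performed, such that the delimited extent includes the cells involved in the annotation operation;
    generating a sub-image of the delimited region, and inputting the sub-image together with the corresponding annotation operation information into the model to be retrained as a deep learning sample;
    taking the corresponding annotation operation information as the learning target, deep-learning the features of the sub-image to update the neural network parameters in the model, thereby obtaining the retrained model.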
PCT/CN2023/093467 2022-05-13 2023-05-11 Cell information statistics method, apparatus, device, and computer-readable storage medium WO2023217222A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210522470.9A CN114972222A (zh) 2022-05-13 2022-05-13 Cell information statistics method, apparatus, device, and computer-readable storage medium
CN202210522470.9 2022-05-13

Publications (1)

Publication Number Publication Date
WO2023217222A1 (zh) 2023-11-16

Family

ID=82983656

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/093467 WO2023217222A1 (zh) 2022-05-13 2023-05-11 Cell information statistics method, apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114972222A (zh)
WO (1) WO2023217222A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117497064A (zh) * 2023-12-04 2024-02-02 电子科技大学 Single-cell three-dimensional genome data analysis method based on semi-supervised learning

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972222A (zh) * 2022-05-13 2022-08-30 徕卡显微系统科技(苏州)有限公司 Cell information statistics method, apparatus, device, and computer-readable storage medium
EP4375883A1 (en) * 2022-11-26 2024-05-29 PreciPoint Group GmbH Method for operating a distributed digital microscopy system and distributed digital microscopy system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170213067A1 (en) * 2016-01-26 2017-07-27 Ge Healthcare Bio-Sciences Corp. Automated cell segmentation quality control
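CN110705403A (zh) * 2019-09-19 2020-01-17 平安科技(深圳)有限公司 Cell classification method, apparatus, medium, and electronic device
CN111210024A (zh) * 2020-01-14 2020-05-29 深圳供电局有限公司 Model training method, apparatus, computer device, and storage medium
CN113570007A (zh) * 2021-09-27 2021-10-29 深圳市信润富联数字科技有限公司 Method, apparatus, device, and storage medium for constructing and optimizing a part defect recognition model
CN113674292A (zh) * 2021-08-17 2021-11-19 厦门理工学院 Semi-supervised myeloma cell instance segmentation method based on partial instance annotation
CN114972222A (zh) * 2022-05-13 2022-08-30 徕卡显微系统科技(苏州)有限公司 Cell information statistics method, apparatus, device, and computer-readable storage medium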

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
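CN109978850B (zh) * 2019-03-21 2020-12-22 华南理工大学 Multi-modal medical image semi-supervised deep learning segmentation system
CN112750106B (zh) * 2020-12-31 2022-11-04 山东大学 Nuclear-stained cell counting method based on deep learning with incomplete labels, computer device, and storage medium
CN113205085B (zh) * 2021-07-05 2021-11-19 武汉华信数据系统有限公司 Image recognition method and apparatus
CN113762286A (zh) * 2021-09-16 2021-12-07 平安国际智慧城市科技股份有限公司 Data model training method, apparatus, device, and medium
CN114155398A (zh) * 2021-11-29 2022-03-08 杭州涿溪脑与智能研究所 Annotation-type-adaptive active learning method and apparatus for image object detection
CN114155412A (zh) * 2022-02-09 2022-03-08 北京阿丘科技有限公司 Deep learning model iteration method, apparatus, device, and storage medium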

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Doctoral Dissertation", 1 December 2018, NANJING UNIVERSITY OF AERONAUTICS AND ASTRONAUTICS, CN, article SHAO, WEI: "Research on Machine-Learning-Based Cell Microscopic Image Analysis", pages: 1 - 121, XP009550425, DOI: 10.27239/d.cnki.gnhhu.2018.000224 *

Also Published As

Publication number Publication date
CN114972222A (zh) 2022-08-30

Similar Documents

Publication Publication Date Title
WO2023217222A1 (zh) Cell information statistics method, apparatus, device, and computer-readable storage medium
Bacanin et al. Artificial neural networks hidden unit and weight connection optimization by quasi-refection-based learning artificial bee colony algorithm
Gui et al. Depression detection on social media with reinforcement learning
US20220414470A1 (en) Multi-Task Attention Based Recurrent Neural Networks for Efficient Representation Learning
CN113239189A (zh) Method and system for text sentiment domain classification
Momeni et al. Deep recurrent attention models for histopathological image analysis
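CN113609337A (zh) Pre-training method, training method, apparatus, device, and medium for graph neural networks
CN115687610A (zh) Text intent classification model training method, recognition method, apparatus, electronic device, and storage medium
CN117608650B (zh) Business flowchart generation method, processing device, and storage medium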
US12026191B2 (en) System and method for processing biology-related data, a system and method for controlling a microscope and a microscope
US11960518B2 (en) System and method for processing biology-related data, a system and method for controlling a microscope and a microscope
US20210319269A1 (en) Apparatus for determining a classifier for identifying objects in an image, an apparatus for identifying objects in an image and corresponding methods
Harris et al. DeepAction: a MATLAB toolbox for automated classification of animal behavior in video
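CN112270334B (zh) Few-shot image classification method and system based on outlier exposure
CN110059743B (zh) Method, device, and storage medium for determining a reliability measure of a prediction
CN114391162A (zh) System and method for processing biology-related data, and microscope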
US20220084306A1 (en) Method and system of guiding a user on a graphical interface with computer vision
Huang et al. A semi-supervised cross-modal memory bank for cross-modal retrieval
CN114529191A (zh) Method and apparatus for risk identification
Jeyachitra et al. Machine learning and deep learning: Classification and regression problems, recurrent neural networks, convolutional neural networks
KR102636461B1 (ko) Auto-labeling automation method, apparatus, and system for training artificial intelligence models
EP4273608A1 (en) Automatic acquisition of microscopy image sets
CN117764536B (zh) Artificial-intelligence-based auxiliary management system for innovation and entrepreneurship projects
Sultana et al. Human Emotion Recognition from Facial Images Using Convolutional Neural Network
CN117390175B (zh) BERT-based method for extracting smart home usage events

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802991

Country of ref document: EP

Kind code of ref document: A1