CN113111879B - Cell detection method and system - Google Patents

Cell detection method and system

Info

Publication number
CN113111879B
CN113111879B (application number CN202110483051.4A)
Authority
CN
China
Prior art keywords
processing
image
cell
characteristic
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110483051.4A
Other languages
Chinese (zh)
Other versions
CN113111879A (en)
Inventor
范伟亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ruiyu Biotech Co Ltd
Original Assignee
Shanghai Ruiyu Biotech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ruiyu Biotech Co Ltd filed Critical Shanghai Ruiyu Biotech Co Ltd
Priority to CN202110483051.4A priority Critical patent/CN113111879B/en
Publication of CN113111879A publication Critical patent/CN113111879A/en
Application granted granted Critical
Publication of CN113111879B publication Critical patent/CN113111879B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a cell detection method and system. The method comprises: acquiring a cell image to be detected; and processing the cell image with a target recognition model to determine a recognition result of the cell image. The target recognition model includes at least a separable residual convolution network, a region generation network, a region-of-interest pooling layer, and a plurality of fully connected layers.

Description

Cell detection method and system
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and a system for detecting cells.
Background
Biomedical image analysis falls into macroscopic image analysis and microscopic image analysis. Macroscopic image analysis relies on medical instruments built on medical imaging technologies, including X-ray, CT, MRI (magnetic resonance imaging), and Doppler ultrasound. Microscopic image analysis connects a microscope to a computer so that images of cells and tissues under the microscope can be processed and analyzed by the computer. Cell detection is an important research topic in biomedical image analysis: it enables cell counting, and accurate cell counts in turn help detect potential diseases and related conditions.
It is therefore desirable to provide a cell detection method and system that improve the accuracy of cell detection.
Disclosure of Invention
One of the embodiments of the present disclosure provides a cell detection method, the method comprising: acquiring a cell image to be detected; and processing the cell image with a target recognition model to determine a recognition result of the cell image. The target recognition model comprises at least a separable residual convolution network, a region generation network, a region-of-interest pooling layer, and a plurality of fully connected layers, and the processing of the target recognition model comprises the following steps: processing the cell image with the separable residual convolution network to determine a first feature image; processing the first feature image with the region generation network to determine candidate regions; processing the candidate regions and the first feature image with the region-of-interest pooling layer to determine a second feature image; processing the second feature image with each of the plurality of fully connected layers to obtain a plurality of third feature images, the plurality of fully connected layers having different numbers of neurons; fusing the plurality of third feature images to obtain a fourth feature image; and performing classification processing and regression processing on the fourth feature image to determine the recognition result.
In some embodiments, the plurality of fully connected layers includes three fully connected layers having 2048, 512, and 128 neurons, respectively.
In some embodiments, processing the first feature image with the region generation network to determine candidate regions includes: performing sliding processing on the first feature image with a sliding window to determine a plurality of first center points; mapping the plurality of first center points onto the cell image to determine a plurality of second center points; generating a plurality of candidate anchor boxes at the position of each of the plurality of second center points based on anchor boxes of preset sizes, the preset-size anchor boxes being obtained based on training data of the target recognition model; and determining the candidate regions based on the plurality of candidate anchor boxes of each of the plurality of second center points.
In some embodiments, the preset-size anchor boxes are obtained by processing size data of the annotation boxes in the training data with a clustering algorithm. In some embodiments, the clustering algorithm includes, but is not limited to, one or more of the following: a K-means clustering algorithm, a mean-shift algorithm, and a density-based clustering algorithm. In some embodiments, the preset-size anchor boxes include anchor boxes with areas of 8×8 and 16×16 and aspect ratios of 1:1, 1:1.5, and 2:1.
In some embodiments, the target recognition model further comprises a random deactivation (dropout) layer, and performing the classification processing and regression processing on the fourth feature image to determine the recognition result includes: processing the fourth feature image with the random deactivation layer to determine a fifth feature image; and performing classification processing and regression processing on the fifth feature image to determine the recognition result.
One of the embodiments of the present disclosure provides a cell detection system, the system comprising: an acquisition module configured to acquire a cell image to be detected; and a processing module configured to process the cell image with a target recognition model and determine a recognition result of the cell image. The target recognition model comprises at least a separable residual convolution network, a region generation network, a region-of-interest pooling layer, and a plurality of fully connected layers, and the processing of the target recognition model comprises the following steps: processing the cell image with the separable residual convolution network to determine a first feature image; processing the first feature image with the region generation network to determine candidate regions; processing the candidate regions and the first feature image with the region-of-interest pooling layer to determine a second feature image; processing the second feature image with each of the plurality of fully connected layers to obtain a plurality of third feature images, the plurality of fully connected layers having different numbers of neurons; fusing the plurality of third feature images to obtain a fourth feature image; and performing classification processing and regression processing on the fourth feature image to determine the recognition result.
One of the embodiments of the present disclosure provides a cell detection device comprising a processor and a memory, the memory being configured to store instructions, and the processor being configured to execute the instructions to implement operations corresponding to the cell detection method according to any one of the preceding claims.
One of the embodiments of the present disclosure provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform operations corresponding to the cell detection method as set forth in any one of the preceding claims.
Drawings
The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:
FIG. 1 is a schematic illustration of an application scenario of a cell detection system according to some embodiments of the present disclosure;
FIG. 2 is an exemplary flow chart of a cell detection method according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of an exemplary architecture of an object recognition model shown in accordance with some embodiments of the present description;
FIG. 4 is an exemplary flow chart for determining candidate regions according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
Fig. 1 is a schematic diagram of an application scenario of a cell detection system according to some embodiments of the present disclosure.
As shown in fig. 1, the cell detection system 100 may include a processing device 110, a network 120, and a user terminal 130.
The processing device 110 may be used to process information and/or data associated with cell detection to perform one or more functions disclosed in this specification. In some embodiments, the processing device 110 may be used to acquire an image of the cells to be detected. In some embodiments, the processing device 110 may process the cell image using the target recognition model to determine a recognition result of the cell image. In some embodiments, the processing device 110 may include one or more processing engines (e.g., single-core processing engines or multi-core processors). By way of example only, the processing device 110 may include one or a combination of a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, and the like.
The network 120 may facilitate the exchange of information and/or data. In some embodiments, one or more components of the cell detection system 100 (e.g., the processing device 110, the user terminal 130) may communicate information to other components of the cell detection system 100 over the network 120. For example, the processing device 110 may acquire the cell image to be detected sent by the user terminal 130 through the network 120. For another example, the user terminal 130 may acquire the identification result of the cell image determined by the processing device 110 through the network 120. In some embodiments, network 120 may be any form of wired or wireless network, or any combination thereof. By way of example only, the network 120 may be one or more combinations of a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, and the like.
The user terminal 130 may be a device with data acquisition, storage and/or transmission capabilities. In some embodiments, the user of the user terminal 130 may be a user/operator (e.g., a doctor) using the cell detection system. In some embodiments, user terminal 130 may include, but is not limited to, a mobile device 130-1, a tablet computer 130-2, a notebook computer 130-3, a desktop computer 130-4, and the like, or any combination thereof. Exemplary mobile devices 130-1 may include, but are not limited to, smartphones, personal digital assistants (PDAs), handheld game consoles, smartwatches, wearable devices, virtual reality devices, augmented reality devices, and the like, or any combination thereof. In some embodiments, the user terminal 130 may send the acquired data to one or more devices in the cell detection system 100. For example, the user terminal 130 may transmit the acquired data to the processing device 110. In some embodiments, the data acquired by the user terminal 130 may be an image of the cells to be detected.
In some embodiments, the cell detection system 100 may include an acquisition module and a processing module.
In some embodiments, the acquisition module may be used to acquire an image of the cells to be detected.
In some embodiments, the processing module may be configured to process the cell image with a target recognition model to determine a recognition result of the cell image. The target recognition model comprises at least a separable residual convolution network, a region generation network, a region-of-interest pooling layer, and a plurality of fully connected layers, and the processing of the target recognition model comprises the following steps: processing the cell image with the separable residual convolution network to determine a first feature image; processing the first feature image with the region generation network to determine candidate regions; processing the candidate regions and the first feature image with the region-of-interest pooling layer to determine a second feature image; processing the second feature image with each of the plurality of fully connected layers to obtain a plurality of third feature images, the plurality of fully connected layers having different numbers of neurons; fusing the plurality of third feature images to obtain a fourth feature image; and performing classification processing and regression processing on the fourth feature image to determine the recognition result.
In some embodiments, the plurality of fully connected layers includes three fully connected layers having 2048, 512, and 128 neurons, respectively.
In some embodiments, the processing module may be further configured to: perform sliding processing on the first feature image with a sliding window to determine a plurality of first center points; map the plurality of first center points onto the cell image to determine a plurality of second center points; generate a plurality of candidate anchor boxes at the position of each of the plurality of second center points based on anchor boxes of preset sizes, the preset-size anchor boxes being obtained based on training data of the target recognition model; and determine the candidate regions based on the plurality of candidate anchor boxes of each of the plurality of second center points.
In some embodiments, the preset-size anchor boxes are obtained by processing size data of the annotation boxes in the training data with a clustering algorithm. In some embodiments, the clustering algorithm includes, but is not limited to, one or more of the following: a K-means clustering algorithm, a mean-shift algorithm, and a density-based clustering algorithm. In some embodiments, the preset-size anchor boxes include anchor boxes with areas of 8×8 and 16×16 and aspect ratios of 1:1, 1:1.5, and 2:1.
In some embodiments, the target recognition model further comprises a random deactivation (dropout) layer, and the processing module may be further configured to: process the fourth feature image with the random deactivation layer to determine a fifth feature image; and perform classification processing and regression processing on the fifth feature image to determine the recognition result.
It should be understood that the system shown in fig. 1 and its modules may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may then be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system of the present specification and its modules may be implemented not only with hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also with software executed by various types of processors, for example, and with a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the cell detection system and its modules is for convenience of description only and is not intended to limit the present disclosure to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the principles of the system, various modules may be combined arbitrarily or a subsystem may be constructed in connection with other modules without departing from such principles. In some embodiments, the acquisition module and the processing module disclosed in fig. 1 may be different modules in one system, or may be one module to implement the functions of two or more modules described above. For example, each module may share one memory module, or each module may have a respective memory module. Such variations are within the scope of the present description.
FIG. 2 is an exemplary flow chart of a method of cell detection according to some embodiments of the present disclosure. As shown in fig. 2, the process 200 may include the steps of:
at step 210, an image of the cell to be detected is acquired. In some embodiments, step 210 may be performed by the acquisition module.
In some embodiments, the cell image to be detected may be an image on which cell detection needs to be performed, that is, an image in which the type and location of cells are to be detected. In some embodiments, the cell image to be detected may be an image in any format, such as RGB, HSV, or YUV format. In some embodiments, the cell image to be detected may be an image obtained after preprocessing. Preprocessing may include, but is not limited to, normalization, size conversion, and the like. For example, the cell image to be detected may be an image that is normalized and then converted to a preset size, where the preset size may be set according to actual requirements.
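By way of non-limiting illustration, the following Python sketch shows one possible preprocessing step, assuming OpenCV is available; the 512×512 target size and the division-by-255 normalization are placeholder choices rather than values fixed by this disclosure.

```python
import cv2
import numpy as np

def preprocess_cell_image(path, target_size=(512, 512)):
    """Normalize a cell image and convert it to a preset size (placeholder values)."""
    image = cv2.imread(path)                      # read the image from disk (BGR)
    image = cv2.resize(image, target_size)        # size conversion to the preset size
    image = image.astype(np.float32) / 255.0      # simple min-max normalization to [0, 1]
    return image
```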
In some embodiments, the acquisition module may acquire the detection image from the user terminal 130, the storage unit, or the database.
And 220, processing the cell image by using a target recognition model, and determining a recognition result of the cell image. In some embodiments, step 220 may be performed by a processing module.
In some embodiments, the target recognition model may be a pre-trained machine learning model. A processing device (e.g., processing device 110) may acquire a trained object recognition model to process the cell image and determine a recognition result of the cell image. In some embodiments, the recognition results may include classification results of the cells, which may reflect the class of the cells, and regression results, which may reflect the location of the cells. In some embodiments, the location of the cell may be represented by a detection frame of the cell, i.e., a border surrounding the cell. Specific details regarding the recognition result may be found in the following related description, and will not be described here.
In some embodiments, the target recognition model may be trained by: acquiring a plurality of sample cell images carrying labels; the label is used for representing the labeling category and labeling position of at least one sample cell in the sample cell image; and updating parameters of the initial target recognition model through multiple iterations based on the multiple sample cell images to reduce loss function values corresponding to the sample cell images, so as to obtain a trained target recognition model.
In some embodiments, the sample cell image may be data input into the initial target recognition model for training the target recognition model. In some embodiments, the tag may be used to characterize some kind of real information of the sample cell image. For example, the tag may characterize the true class and true location of the sample cell in the sample cell image. In some embodiments, the annotation location may be represented by an annotation box. In some embodiments, the annotation boxes may be obtained by manual annotation. For example, the border surrounding the sample cells is manually determined and labeled.
In some embodiments, the target recognition model may be obtained through online or offline training by a processing device (e.g., the processing device 110 or another processing device). During model training, the processing device may continually update the parameters of the initial target recognition model based on the plurality of sample cell images. Specifically, the processing device may continually adjust the parameters of the initial target recognition model to reduce the loss function value corresponding to each sample cell image so that the loss function value satisfies a preset condition, for example, the loss function value converges or becomes smaller than a preset value. When the loss function satisfies the preset condition, model training is completed and a trained target recognition model is obtained.
In some embodiments, the loss function used to train the target recognition model may include a class loss function and a position loss function. In some embodiments, the class loss function may be a cross-entropy loss function or a Focal-loss-improved cross-entropy loss function; preferably, the class loss function is the Focal-loss-improved cross-entropy loss function. In some embodiments, the position loss function may be an L1 (mean absolute error) loss, an L2 (mean squared error) loss, a Smooth L1 loss, an IoU loss, a GIoU loss, a DIoU loss, or a CIoU loss function.
In some embodiments, when the class loss function employs the Focal-loss-improved cross-entropy loss function, the class loss function used for training may be equation (1):

L_focal_loss = -α·(1 - y')^γ·log(y') when y = 1, and L_focal_loss = -(1 - α)·(y')^γ·log(1 - y') when y = 0    (1)

where L_focal_loss represents the class loss function, α and γ represent balance factors, y' represents the class prediction value of the target recognition model for a sample cell in a sample cell image, and y represents the labeling class of the sample cell in the sample cell image. In some embodiments, α and γ may be empirically derived values, for example, α = 0.25 and γ = 2.
Because sample cells occupy a smaller proportion of the sample cell image than the background region, a large number of negative samples are generated during training of the target recognition model, and the ratio of positive to negative samples is severely imbalanced. The balance factors in the Focal-loss-improved cross-entropy loss function down-weight the loss of easily classified samples so that the model focuses more on hard, misclassified samples; at the same time, they balance the ratio of positive and negative samples, avoid severe imbalance between them, and improve the training effect of the target recognition model.
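By way of non-limiting illustration, a minimal PyTorch sketch of a class loss of the form of equation (1) is given below; the binary (cell versus background) setting and the default values α = 0.25 and γ = 2 follow the example above, and the function name is illustrative only.

```python
import torch

def focal_loss(y_pred, y_true, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal-loss-style class loss; y_pred is the predicted cell probability, y_true is 0/1."""
    y_pred = y_pred.clamp(eps, 1.0 - eps)
    loss_pos = -alpha * (1.0 - y_pred) ** gamma * torch.log(y_pred)          # y = 1 branch
    loss_neg = -(1.0 - alpha) * y_pred ** gamma * torch.log(1.0 - y_pred)    # y = 0 branch
    return torch.where(y_true == 1, loss_pos, loss_neg).mean()

# toy usage
scores = torch.tensor([0.9, 0.2, 0.7])
labels = torch.tensor([1, 0, 0])
print(focal_loss(scores, labels))
```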
In some embodiments, the initial target recognition model used for training has the same structure as the trained target recognition model. Correspondingly, during training of the target recognition model, the loss function value corresponding to each sample cell image may be determined through the following process: processing the sample cell image with the separable residual convolution network to determine a first sample feature image; processing the first sample feature image with the region generation network to determine sample candidate regions; processing the sample candidate regions and the first sample feature image with the region-of-interest pooling layer to determine a second sample feature image; processing the second sample feature image with each of the plurality of fully connected layers to obtain a plurality of third sample feature images; fusing the plurality of third sample feature images to obtain a fourth sample feature image; performing classification processing and regression processing on the fourth sample feature image to determine a sample recognition result; and determining the loss function value based at least on the difference between the sample recognition result of at least one sample cell image and the label of that sample cell image. In some embodiments, the fourth sample feature image may also be processed with the random deactivation layer to determine a fifth sample feature image, and the classification processing and regression processing may be performed on the fifth sample feature image to determine the sample recognition result. The processing of a sample cell image during training of the target recognition model is the same as the processing of the cell image to be detected during application of the target recognition model; details are given in the related description below and are not repeated here.
As shown in fig. 3, in some embodiments, the object recognition model 300 may include at least a separable residual convolution network, a region generation network, a region of interest pooling layer, and a plurality of fully connected layers.
In some embodiments, the separable residual convolution network may process the cell image to be detected to determine a first feature image. In some embodiments, the separable residual convolution network may extract features from the cell image, for example, a histogram of oriented gradients (HOG) feature, a local binary pattern (LBP) feature, a Haar feature, and the like.
In some embodiments, the separable residual convolution network may include a plurality of depthwise convolution kernels and a plurality of point-wise convolution kernels. In some embodiments, the plurality of depthwise convolution kernels may respectively convolve the plurality of channels included in the cell image to obtain spatial feature images of the plurality of channels. The point-wise convolution kernels may then convolve the spatial feature images of the plurality of channels to obtain a plurality of point feature images. Taking an RGB cell image as an example, the separable residual convolution network may include three depthwise convolution kernels and three point-wise convolution kernels, where the three depthwise convolution kernels respectively convolve the R, G, and B channels to obtain spatial feature images of the three channels. In some embodiments, the processing device may fuse the plurality of point feature images to determine the first feature image. In some embodiments, the processing device may fuse the plurality of point feature images using a feature fusion layer (e.g., a concat layer) to determine the first feature image. In some embodiments, the depthwise convolution kernel may be a 3×3 convolution kernel and the point-wise convolution kernel may be a 1×1 convolution kernel.
In some embodiments, the processing device may further process the cell image to be detected with a deep convolutional neural network comprising the separable residual convolution network to determine the first feature image. In some embodiments, the deep convolutional neural network may be one or a combination of LeNet, AlexNet, and VggNet; preferably, the deep convolutional neural network is Vgg16 from the VggNet family. In some embodiments, the processing device may replace the convolution networks (e.g., convolution blocks) contained in the deep convolutional neural network with separable residual convolution networks, resulting in a deep convolutional neural network that includes separable residual convolution networks. For example, the convolution blocks contained in Vgg16 may be replaced with separable residual convolution networks.
In the embodiments of this specification, the separable residual convolution network separates the conventional convolution in the spatial dimension, which widens the network and enriches the extracted features, and performs point-wise convolution, which reduces the computational complexity of the convolution operation, reduces the number of parameters, and improves the computation speed. Meanwhile, the separable residual convolution network adopts a direct (shortcut) connection that is an identity mapping, which effectively alleviates the vanishing-gradient problem of the target recognition model.
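By way of non-limiting illustration, the following PyTorch sketch shows one way such a block could be assembled: a per-channel (depthwise) 3×3 convolution, a 1×1 point-wise convolution, and a direct identity connection. The channel count, the ReLU activation, and the use of a grouped convolution in place of explicitly separate per-channel kernels are assumptions of the sketch, not specifics of this disclosure.

```python
import torch
from torch import nn

class SeparableResidualBlock(nn.Module):
    """Depthwise 3x3 convolution per channel, 1x1 point-wise convolution, identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        # groups=channels assigns one 3x3 depth convolution kernel to each input channel
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        # the 1x1 point-wise convolution fuses the per-channel spatial feature images
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.pointwise(self.depthwise(x))
        return self.relu(out + x)  # direct connection (identity mapping)

block = SeparableResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```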
In some embodiments, the target recognition model may also include a sampling convolution layer. In some embodiments, the sampling convolution layer may be connected to the separable residual convolution network, with the output of the sampling convolution layer serving as the input to the separable residual convolution network. In some embodiments, the convolution network formed by the sampling convolution layer and the separable residual convolution network may process the cell image to be detected to determine the first feature image. In some embodiments, the sampling convolution layer may be a dilated (atrous) convolution layer. The dilated convolution layer enlarges the receptive field of the convolution kernel, so that the convolution network can better extract global features of the image, improving the recognition accuracy of the target recognition model.
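As a non-limiting sketch, a dilated convolution of this kind can be written in PyTorch as follows; the channel counts and the dilation rate of 2 are illustrative assumptions.

```python
import torch
from torch import nn

# A 3x3 kernel with dilation 2 covers a 5x5 receptive field with the same number of
# parameters as an ordinary 3x3 convolution; padding=2 keeps the spatial size unchanged.
sampling_conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, dilation=2, padding=2)
print(sampling_conv(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 64, 224, 224])
```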
In some embodiments, the region generation network may be a network for extracting candidate regions. A candidate region may characterize a possible location of a cell in the cell image to be detected. In some embodiments, the region generation network may generate candidate regions using a sliding window. Specific details of how the region generation network generates candidate regions are given in FIG. 4 and its related description and are not repeated here.
In some embodiments, the region-of-interest pooling layer may process the candidate regions and the first feature image to determine the second feature image. In some embodiments, the region-of-interest pooling layer may extract the features of each candidate region from the first feature image and generate a fixed-size feature map for each candidate region, resulting in the second feature image. In some embodiments, the second feature image may include a feature vector of the same dimension for each candidate region. Making the feature vectors of the candidate regions the same dimension through the region-of-interest pooling layer satisfies the input requirements of the subsequent fully connected layers and facilitates their processing.
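By way of non-limiting illustration, torchvision's RoI pooling operator produces such fixed-size per-region feature maps; the tensor shapes, the 7×7 output size, and the 1/16 spatial scale in the sketch below are assumptions, not values fixed by this disclosure.

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 64, 64)            # first feature image (N, C, H, W)
# candidate regions in cell-image coordinates: (batch_index, x1, y1, x2, y2)
rois = torch.tensor([[0.0,  10.0, 10.0,  40.0,  50.0],
                     [0.0, 100.0, 80.0, 140.0, 120.0]])
# spatial_scale maps image coordinates onto the feature map (1/16 assumes a stride-16 backbone)
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([2, 256, 7, 7]) -- one fixed-size feature map per candidate region
```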
In some embodiments, a fully connected layer may be used to linearly map one feature space to another. In some embodiments, the processing device may perform feature extraction on the second feature image with each of the plurality of fully connected layers to obtain a plurality of third feature images.
In some embodiments, the plurality of third feature images may reflect high-level features at different scales. In some embodiments, the plurality of fully connected layers have different numbers of neurons. In some embodiments, the plurality of fully connected layers includes three fully connected layers having 2048, 512, and 128 neurons, respectively.
In the embodiments of this specification, performing feature extraction on the second feature image with three fully connected layers of different sizes yields image feature representations at different scales, so that high-level features at different scales can be extracted and the accuracy of subsequent cell detection improved.
In some embodiments, the processing device may fuse the plurality of third feature images to obtain the fourth feature image. In some embodiments, the processing device may fuse the plurality of third feature images with a feature fusion layer to obtain the fourth feature image. In some embodiments, the feature fusion layer may be a concat layer.
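By way of non-limiting illustration, the three fully connected branches and the concat-style fusion could be sketched in PyTorch as follows; the input dimension and the ReLU activations are assumptions of the sketch.

```python
import torch
from torch import nn

class MultiScaleFC(nn.Module):
    """Three fully connected branches with 2048, 512 and 128 neurons; outputs are concatenated."""

    def __init__(self, in_features):
        super().__init__()
        self.fc_a = nn.Linear(in_features, 2048)
        self.fc_b = nn.Linear(in_features, 512)
        self.fc_c = nn.Linear(in_features, 128)

    def forward(self, x):                        # x: flattened second feature image per candidate region
        feats = [torch.relu(fc(x)) for fc in (self.fc_a, self.fc_b, self.fc_c)]
        return torch.cat(feats, dim=1)           # fourth feature image: 2048 + 512 + 128 = 2688 features

fusion = MultiScaleFC(in_features=256 * 7 * 7)
print(fusion(torch.randn(4, 256 * 7 * 7)).shape)  # torch.Size([4, 2688])
```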
In some embodiments, the target recognition model may also include a random deactivation (dropout) layer. In some embodiments, the processing device may process the fourth feature image with the random deactivation layer to determine the fifth feature image. The random deactivation layer discards the outputs of some neurons in the fully connected layers, which weakens the joint adaptability among neuron nodes and enhances the generalization capability of the model. Correspondingly, the target recognition model is obtained by training an initial target recognition model that includes a random deactivation layer; the random deactivation layer helps avoid overfitting during model training and improves the generalization capability of the model.
In some embodiments, the processing device may perform classification processing and regression processing on the fourth feature image to determine the recognition result. In some embodiments, the processing device may further perform classification processing and regression processing on the fifth feature image, and determine the recognition result.
In some embodiments, the processing device may process the fourth feature image or the fifth feature image with a classification layer to obtain the classification result. In some embodiments, the classification layer may include a fully connected layer and a normalization layer, or a convolution layer and a normalization layer; the normalization layer may include a softmax layer or a logits layer. In some embodiments, the processing device may process the fourth feature image or the fifth feature image with a regressor to obtain the regression result.
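By way of non-limiting illustration, the random deactivation layer and the classification and regression branches could be sketched in PyTorch as follows; the dropout probability, the input dimension, the two-class setting, and the four box-offset outputs are assumptions of the sketch.

```python
import torch
from torch import nn

class DetectionHead(nn.Module):
    """Random deactivation (dropout) layer followed by classification and box-regression branches."""

    def __init__(self, in_features=2688, num_classes=2):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)                 # random deactivation layer
        self.classifier = nn.Linear(in_features, num_classes)
        self.regressor = nn.Linear(in_features, 4)       # box offsets (dx, dy, dw, dh)

    def forward(self, fused):
        fused = self.dropout(fused)                      # fifth feature image
        scores = torch.softmax(self.classifier(fused), dim=1)  # classification processing
        boxes = self.regressor(fused)                    # regression processing
        return scores, boxes

head = DetectionHead()
scores, boxes = head(torch.randn(4, 2688))
print(scores.shape, boxes.shape)  # torch.Size([4, 2]) torch.Size([4, 4])
```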
FIG. 4 is an exemplary flow chart for determining candidate regions according to some embodiments of the present description. As shown in fig. 4, the process 400 may include steps 410-440. In some embodiments, steps 410-440 may be performed by a processing module.
In step 410, a sliding process is performed on the first feature image using a sliding window, and a plurality of first center points are determined.
In some embodiments, performing sliding processing on the first feature image with a sliding window can be understood as scanning the whole first feature image with the sliding window. In some embodiments, the processing device may slide a sliding window of a preset size over the first feature image with a preset step size to determine the plurality of first center points. For example, the preset size of the sliding window may be 3×3 and the preset step size may be 2. In some embodiments, the plurality of first center points may be the center points of the sliding window at each sliding position.
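By way of non-limiting illustration, the sliding-window scan and the mapping back to the cell image (step 420 below) could be sketched as follows; the feature-map size, the boundary handling, and the assumed total backbone stride of 16 used for the mapping are illustrative only.

```python
import numpy as np

def window_centers(feat_h, feat_w, win=3, stride=2):
    """First center points: centers of a win x win window slid over the first feature image."""
    ys = np.arange(win // 2, feat_h - win // 2, stride)
    xs = np.arange(win // 2, feat_w - win // 2, stride)
    return [(int(x), int(y)) for y in ys for x in xs]

first_centers = window_centers(38, 50)
# second center points: projections onto the cell image, here assuming a total backbone stride of 16
second_centers = [(x * 16, y * 16) for x, y in first_centers]
print(len(first_centers), second_centers[0])
```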
Step 420, mapping the plurality of first center points onto the cell image to determine a plurality of second center points.
Step 430, generating a plurality of candidate anchor boxes at the position of each of the plurality of second center points based on anchor boxes of preset sizes, the preset-size anchor boxes being obtained based on training data of the target recognition model.
In some embodiments, the plurality of second center points may be the projection points generated when the first center points are mapped onto the cell image. In some embodiments, the preset-size anchor boxes can be set according to the actual situation. In some embodiments, the preset-size anchor boxes may be derived from the training data of the target recognition model. In some embodiments, the training data of the target recognition model may include a plurality of sample cell images carrying labels. As described above, a label may characterize the labeling category and labeling location of at least one cell in a sample cell image, and the labeling location may be characterized by an annotation box.
In some embodiments, the preset-size anchor boxes may be derived from the annotation boxes in the training data of the target recognition model. In some embodiments, the preset-size anchor boxes may be obtained by processing the size data of the annotation boxes in the training data with a clustering algorithm. In some embodiments, the clustering algorithm may include, but is not limited to, one or more of the following: a K-means clustering algorithm, a mean-shift algorithm, and a density-based clustering algorithm; preferably, the clustering algorithm is the K-means clustering algorithm. Because the annotation boxes mark the positions of cells in the cell images, deriving the preset anchor sizes by clustering the annotation-box size data better matches the application scenario of cell detection (or cell recognition), so the generated anchor sizes better reflect the actual situation and the target recognition model can detect cells in the cell image more accurately.
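By way of non-limiting illustration, clustering the annotation-box size data with scikit-learn's K-means could look like the sketch below; the box sizes are toy values and do not come from the training data of this disclosure.

```python
import numpy as np
from sklearn.cluster import KMeans

# (width, height) of annotation boxes collected from the training data -- toy values only
box_sizes = np.array([[7, 7], [9, 8], [8, 12], [15, 16], [17, 15], [16, 18]], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(box_sizes)
print(kmeans.cluster_centers_)  # cluster centers suggest the preset anchor sizes (roughly 8x8 and 16x16)
```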
In some embodiments, the preset-size anchor boxes obtained with the clustering algorithm may be anchor boxes with areas of 8×8 and 16×16 and aspect ratios of 1:1, 1:1.5, and 2:1. Accordingly, the preset-size anchor boxes may include 6 anchor boxes: three with an area of 8×8 and aspect ratios of 1:1, 1:1.5, and 2:1, and three with an area of 16×16 and aspect ratios of 1:1, 1:1.5, and 2:1. Because cells are small targets, setting the anchor areas to 8×8 and 16×16 frames the cell objects better; compared with the traditional 9 anchor sizes, the anchor sizes in the embodiments of this specification reduce the number of anchor boxes, which reduces the parameter computation of the target recognition model, avoids slow detection caused by redundant parameter computation, and lowers the computation cost.
Step 440, determining the candidate region based on the plurality of candidate anchor boxes for each of the plurality of second center points.
In some embodiments, a candidate anchor box may be an anchor box of a preset size whose center point is the second center point. For example, taking the second center point (x, y) as an example, the candidate anchor boxes may be the three anchor boxes with an area of 8×8 and aspect ratios of 1:1, 1:1.5, and 2:1 and the three anchor boxes with an area of 16×16 and aspect ratios of 1:1, 1:1.5, and 2:1, all centered at (x, y).
In some embodiments, the processing device may determine the plurality of candidate anchor boxes of each of the plurality of second center points as the candidate regions. Following the above example, the processing device may determine the six anchor boxes at each second center point as candidate regions.
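By way of non-limiting illustration, the six candidate anchor boxes at a second center point could be generated as in the sketch below; the interpretation of the aspect ratios as height:width and the example center point are assumptions.

```python
def candidate_anchors(cx, cy, areas=(8 * 8, 16 * 16), ratios=(1.0, 1.5, 2.0)):
    """Six candidate anchor boxes (x1, y1, x2, y2) centred on the second center point (cx, cy).

    Here a ratio r is interpreted as height:width, so r = 1.5 stands for the 1:1.5 aspect ratio.
    """
    boxes = []
    for area in areas:
        for r in ratios:
            w = (area / r) ** 0.5
            h = w * r
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = candidate_anchors(120, 96)  # 2 areas x 3 aspect ratios = 6 candidate anchor boxes
print(len(anchors), anchors[0])
```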
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations of the present disclosure may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested in this specification and therefore remain within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, the order in which the elements and sequences are processed, the use of numerical letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure, by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation of this disclosure and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, does not imply that the claimed subject matter requires more features than are recited in the claims. Indeed, claimed subject matter may lie in fewer than all features of a single embodiment disclosed above.
In some embodiments, numbers describing quantities of components and attributes are used; it should be understood that such numbers used in the description of embodiments are, in some examples, modified by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of 20%. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending on the desired properties sought by individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and employ a general digit-preserving rounding method. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments of this specification are approximations, in specific embodiments such numerical values are set as precisely as practicable.
Each patent, patent application, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this specification is hereby incorporated by reference in its entirety, except for any application history documents that are inconsistent with or conflict with the contents of this specification, and except for any documents (now or later associated with this specification) that limit the broadest scope of the claims of this specification. It should be noted that if the description, definition, and/or use of a term in the materials accompanying this specification is inconsistent with or conflicts with what is described in this specification, the description, definition, and/or use of the term in this specification controls.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (10)

1. A method of cell detection, the method comprising:
acquiring a cell image to be detected;
processing the cell image by using a target recognition model, and determining a recognition result of the cell image; the target recognition model at least comprises a separable residual convolution network, a region generation network, a region of interest pooling layer and a plurality of fully connected layers, and the processing of the target recognition model comprises the following steps:
processing the cell image by using the separable residual convolution network to determine a first feature image, wherein the separable residual convolution network comprises a plurality of depthwise convolution kernels and a plurality of point-wise convolution kernels, and processing the cell image by using the separable residual convolution network to determine the first feature image specifically comprises:
performing, by the plurality of depthwise convolution kernels, convolution processing on a plurality of channels included in the cell image respectively to obtain spatial feature images of the plurality of channels, wherein each depthwise convolution kernel corresponds to one channel;
performing, by the plurality of point-wise convolution kernels, convolution processing on the spatial feature images of the plurality of channels to obtain a plurality of point feature images; and
fusing the plurality of point feature images to determine the first feature image;
processing the first feature image by using the region generation network to determine a candidate region;
processing the candidate region and the first feature image by using the region-of-interest pooling layer to determine a second feature image;
processing the second feature image by using the plurality of fully connected layers respectively to obtain a plurality of third feature images, the plurality of fully connected layers having different numbers of neurons;
performing fusion processing on the plurality of third feature images to obtain a fourth feature image; and
performing classification processing and regression processing on the fourth feature image to determine the recognition result;
wherein a loss function used for training the target recognition model comprises a class loss function and a position loss function, the class loss function being an improved cross-entropy loss function given by the following formula:
L_focal_loss = -α·(1 - y')^γ·log(y') when y = 1, and L_focal_loss = -(1 - α)·(y')^γ·log(1 - y') when y = 0,
wherein L_focal_loss represents the class loss function, α and γ represent balance factors, y' represents a class prediction value of the target recognition model for the sample cells in a sample cell image, and y represents the labeling class of the sample cells in the sample cell image.
2. The method of claim 1, wherein the plurality of fully connected layers comprises three fully connected layers having 2048, 512, and 128 neurons, respectively.
3. The method of claim 1, wherein processing the first feature image with the region-generating network to determine candidate regions comprises:
performing sliding processing on the first feature image by utilizing a sliding window, and determining a plurality of first center points;
mapping the plurality of first center points onto the cell image to determine a plurality of second center points;
generating a plurality of candidate anchor boxes at the position of each of the plurality of second center points based on anchor boxes of a preset size, the anchor boxes of the preset size being obtained based on training data of the target recognition model; and
determining the candidate regions based on the plurality of candidate anchor boxes of each of the plurality of second center points.
4. The method of claim 3, wherein the anchor boxes of the preset size are obtained by processing size data of annotation boxes in the training data based on a clustering algorithm.
5. The method of claim 4, wherein the clustering algorithm includes, but is not limited to, one or more of the following: k-means clustering algorithm, mean shift algorithm and density-based clustering algorithm.
6. The method of claim 4, wherein the anchor boxes of the preset size comprise anchor boxes with areas of 8×8 and 16×16 and aspect ratios of 1:1, 1:1.5, and 2:1.
7. The method of claim 1, wherein the target recognition model further comprises a random deactivation layer, and performing the classification processing and the regression processing on the fourth feature image to determine the recognition result comprises:
processing the fourth feature image by using the random deactivation layer to determine a fifth feature image; and
performing classification processing and regression processing on the fifth feature image to determine the recognition result.
8. A cell detection system, the system comprising:
an acquisition module configured to acquire a cell image to be detected; and
a processing module configured to process the cell image by using a target recognition model and determine a recognition result of the cell image, wherein the target recognition model comprises at least a separable residual convolution network, a region generation network, a region-of-interest pooling layer, and a plurality of fully connected layers, and the processing of the target recognition model comprises the following steps:
processing the cell image by using the separable residual convolution network to determine a first feature image, wherein the separable residual convolution network comprises a plurality of depthwise convolution kernels and a plurality of point-wise convolution kernels, and processing the cell image by using the separable residual convolution network to determine the first feature image specifically comprises:
performing, by the plurality of depthwise convolution kernels, convolution processing on a plurality of channels included in the cell image respectively to obtain spatial feature images of the plurality of channels, wherein each depthwise convolution kernel corresponds to one channel;
performing, by the plurality of point-wise convolution kernels, convolution processing on the spatial feature images of the plurality of channels to obtain a plurality of point feature images; and
fusing the plurality of point feature images to determine the first feature image;
processing the first characteristic image by using the region generation network to determine a candidate region;
processing the candidate region and the first characteristic image by using the region-of-interest pooling layer to determine a second characteristic image;
processing the second characteristic image by using the plurality of fully connected layers, respectively, to obtain a plurality of third characteristic images, wherein the plurality of fully connected layers have different numbers of neurons;
performing fusion processing on the plurality of third characteristic images to obtain a fourth characteristic image;
performing classification processing and regression processing on the fourth characteristic image, and determining the recognition result;
wherein a loss function for training the target recognition model comprises a class loss function and a position loss function, the class loss function is an improved cross entropy loss function, and the improved cross entropy loss function formula is as follows:
wherein L_focalloss represents the class loss function, α and γ represent balance factors, y′ represents the class prediction value of the target recognition model for the sample cells in the sample cell image, and y represents the labeled class of the sample cells in the sample cell image.
9. A cell detection apparatus comprising a processor and a memory for storing instructions, wherein the processor is configured to execute the instructions to perform operations corresponding to the cell detection method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions that, when executed by a processor, perform operations corresponding to the cell detection method of any one of claims 1 to 7.
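Note on the loss formula: in the published document the improved cross entropy (focal) loss referenced in claims 1 and 8 is rendered as an image, so the formula itself does not appear in the text above. A standard α-balanced focal loss consistent with the variables defined there (α and γ as balance factors, y′ the predicted class probability, y the labeled class) would take the form below; the exact expression used by the patent may differ.

$$L_{focalloss} = -\alpha\,(1-y')^{\gamma}\,y\,\log(y') \;-\; (1-\alpha)\,(y')^{\gamma}\,(1-y)\,\log(1-y')$$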
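For readers unfamiliar with the depth/point convolution kernels of claims 1 and 8, the following minimal PyTorch sketch illustrates the separable residual convolution step: each depth (depthwise) kernel convolves a single input channel to produce a spatial feature image, the point (pointwise, 1×1) kernels then convolve across channels to produce point feature images, and the results are fused into the first characteristic image. The module names, the 3×3 kernel size and the residual fusion by addition are illustrative assumptions, not details stated in the claims.

```python
import torch
import torch.nn as nn

class SeparableResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise convolution: groups=channels gives one 3x3 kernel per input channel.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        # Pointwise (1x1) convolution mixes the per-channel spatial feature images.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spatial = self.depthwise(x)       # spatial feature image per channel
        points = self.pointwise(spatial)  # point feature images
        fused = self.bn(points) + x       # fusion via a residual connection (assumed)
        return self.relu(fused)

# Example: a 3-channel cell image of size 256x256.
block = SeparableResidualBlock(channels=3)
first_characteristic_image = block(torch.randn(1, 3, 256, 256))
```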
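Claims 3 to 6 describe deriving preset anchor sizes by clustering the sizes of the labeled boxes in the training data, then generating candidate anchor boxes of several areas and aspect ratios around each mapped center point. The sketch below shows one plausible implementation using scikit-learn k-means; the areas (8 × 8, 16 × 16) and aspect ratios (1:1, 1:1.5, 2:1) are taken from claim 6, while the function names, the width/height parameterization and the two-cluster choice are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_box_sizes(box_wh: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Cluster (width, height) pairs of labeled boxes to obtain preset anchor sizes."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(box_wh).cluster_centers_

def candidate_anchors(center, areas=(8 * 8, 16 * 16), ratios=(1.0, 1 / 1.5, 2.0)):
    """Generate candidate anchors (cx, cy, w, h) around one second center point.

    `ratios` are width:height, so 1.0, 1/1.5 and 2.0 correspond to the
    1:1, 1:1.5 and 2:1 aspect ratios of claim 6.
    """
    cx, cy = center
    boxes = []
    for area in areas:
        for r in ratios:
            w = (area * r) ** 0.5
            h = (area / r) ** 0.5
            boxes.append((cx, cy, w, h))
    return boxes

# Dummy labeled-box sizes stand in for the real training data.
preset_sizes = cluster_box_sizes(np.random.rand(100, 2) * 32)
anchors = candidate_anchors((64.0, 64.0))
```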
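Claims 2, 7 and 8 together describe the detection head: the ROI-pooled second characteristic image passes through three fully connected layers with 2048, 512 and 128 neurons respectively, the three outputs are fused into a fourth characteristic image, a random inactivation (dropout) layer yields a fifth characteristic image, and classification plus bounding-box regression produce the recognition result. The PyTorch sketch below assumes fusion by concatenation, a dropout probability of 0.5 and a flattened ROI feature size; none of these specifics are stated in the claims.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_features: int, num_classes: int = 2):
        super().__init__()
        # Three fully connected layers with different neuron counts (claim 2).
        self.fc1 = nn.Linear(in_features, 2048)
        self.fc2 = nn.Linear(in_features, 512)
        self.fc3 = nn.Linear(in_features, 128)
        self.dropout = nn.Dropout(p=0.5)        # random inactivation layer (claim 7)
        fused = 2048 + 512 + 128                 # fusion by concatenation (assumed)
        self.classifier = nn.Linear(fused, num_classes)
        self.regressor = nn.Linear(fused, 4)     # bounding-box offsets

    def forward(self, roi_features: torch.Tensor):
        third = [fc(roi_features) for fc in (self.fc1, self.fc2, self.fc3)]
        fourth = torch.cat(third, dim=1)         # fourth characteristic image
        fifth = self.dropout(fourth)             # fifth characteristic image
        return self.classifier(fifth), self.regressor(fifth)

head = DetectionHead(in_features=7 * 7 * 256)
class_scores, box_deltas = head(torch.randn(8, 7 * 7 * 256))
```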
CN202110483051.4A 2021-04-30 2021-04-30 Cell detection method and system Active CN113111879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110483051.4A CN113111879B (en) 2021-04-30 2021-04-30 Cell detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110483051.4A CN113111879B (en) 2021-04-30 2021-04-30 Cell detection method and system

Publications (2)

Publication Number Publication Date
CN113111879A CN113111879A (en) 2021-07-13
CN113111879B true CN113111879B (en) 2023-11-10

Family

ID=76720698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110483051.4A Active CN113111879B (en) 2021-04-30 2021-04-30 Cell detection method and system

Country Status (1)

Country Link
CN (1) CN113111879B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378864B (en) * 2021-08-16 2021-11-12 浙江啄云智能科技有限公司 Method, device and equipment for determining anchor frame parameters and readable storage medium


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330463B (en) * 2017-06-29 2020-12-08 南京信息工程大学 Vehicle type identification method based on CNN multi-feature union and multi-kernel sparse representation
CN108537775A (en) * 2018-03-02 2018-09-14 浙江工业大学 A kind of cancer cell tracking based on deep learning detection
CN108510012B (en) * 2018-05-04 2022-04-01 四川大学 Target rapid detection method based on multi-scale feature map
CN109359666B (en) * 2018-09-07 2021-05-28 佳都科技集团股份有限公司 Vehicle type recognition method based on multi-feature fusion neural network and processing terminal
CN110334565A (en) * 2019-03-21 2019-10-15 江苏迪赛特医疗科技有限公司 A kind of uterine neck neoplastic lesions categorizing system of microscope pathological photograph
CN110689011A (en) * 2019-09-29 2020-01-14 河北工业大学 Solar cell panel defect detection method of multi-scale combined convolution neural network
CN111027605A (en) * 2019-11-28 2020-04-17 北京影谱科技股份有限公司 Fine-grained image recognition method and device based on deep learning
CN112069874B (en) * 2020-07-17 2022-07-05 中山大学 Method, system, equipment and storage medium for identifying cells in embryo light microscope image

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929569A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Face recognition method, device, equipment and storage medium
CN111666850A (en) * 2020-05-28 2020-09-15 浙江工业大学 Cell image detection and segmentation method for generating candidate anchor frame based on clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dong Hongyi. 《深度学习之PyTorch物体检测实战》 (Deep Learning Object Detection in Practice with PyTorch). 2020, p. 173. *
Dong Hongyi. 《深度学习之PyTorch物体检测实战》 (Deep Learning Object Detection in Practice with PyTorch). China Machine Press, 2020, pp. 191-194. *

Also Published As

Publication number Publication date
CN113111879A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN107506761B (en) Brain image segmentation method and system based on significance learning convolutional neural network
JP7540127B2 (en) Artificial intelligence-based image processing method, image processing device, computer program, and computer device
Sahasrabudhe et al. Self-supervised nuclei segmentation in histopathological images using attention
Kou et al. Microaneurysms segmentation with a U-Net based on recurrent residual convolutional neural network
Marzahl et al. Deep learning-based quantification of pulmonary hemosiderophages in cytology slides
Sun et al. Deep learning‐based single‐cell optical image studies
Iqbal et al. Developing a brain atlas through deep learning
Nawaz et al. Melanoma localization and classification through faster region-based convolutional neural network and SVM
Huang et al. Lesion-based contrastive learning for diabetic retinopathy grading from fundus images
An et al. Medical image segmentation algorithm based on multilayer boundary perception-self attention deep learning model
CN111611851B (en) Model generation method, iris detection method and device
CN110838108A (en) Medical image-based prediction model construction method, prediction method and device
TW202013311A (en) Image processing method, electronic device, and storage medium
CN112215801A (en) Pathological image classification method and system based on deep learning and machine learning
US20210366594A1 (en) Method and system for refining label information
Behar et al. ResNet50-Based Effective Model for Breast Cancer Classification Using Histopathology Images.
CN113706562B (en) Image segmentation method, device and system and cell segmentation method
CN115409804A (en) Method for identifying and marking focus region of mammary gland magnetic resonance image and predicting curative effect
CN115294086A (en) Medical image segmentation method, segmentation model training method, medium, and electronic device
CN113111879B (en) Cell detection method and system
Cullen et al. Convolutional neural networks for rapid and simultaneous brain extraction and tissue segmentation
Shruti et al. A Review of Convolutional Neural Networks, its Variants and Applications
Salsabili et al. Multiresolution semantic segmentation of biological structures in digital histopathology
Su et al. Robust cell detection and segmentation in histopathological images using sparse reconstruction and stacked denoising autoencoders
KR101899729B1 (en) Method for detecting cacer based on nuclei and learning method for cacer detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant