CN109003267A - Computer-implemented method and system for automatically detecting a target object from a 3D image - Google Patents

Computer-implemented method and system for automatically detecting a target object from a 3D image

Info

Publication number
CN109003267A
CN109003267A CN201810789942.0A
Authority
CN
China
Prior art keywords
target object
3d image
implemented method
computer implemented
learning network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810789942.0A
Other languages
Chinese (zh)
Other versions
CN109003267B
Inventor
宋麒
孙善辉
陈翰博
白军杰
高峰
尹游兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ke Ya Medical Technology Co Ltd
Original Assignee
Shenzhen Ke Ya Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/996,434 (US10867384B2)
Application filed by Shenzhen Ke Ya Medical Technology Co Ltd
Publication of CN109003267A
Application granted
Publication of CN109003267B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/0012 Image analysis; inspection of images; biomedical image inspection
    • G06T 17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/11 Segmentation; edge detection; region-based segmentation
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10081 Image acquisition modality; tomographic images; computed x-ray tomography [CT]
    • G06T 2207/20081 Special algorithmic details; training; learning
    • G06T 2207/20084 Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/20164 Image segmentation details; salient point detection; corner detection
    • G06T 2207/30064 Subject of image; biomedical image processing; lung; lung nodule

Abstract

The present disclosure relates to a computer-implemented method and system for automatically detecting a target object from a 3D image. The method may include receiving a 3D image acquired by an imaging device. The method may further include detecting, by a processor using a 3D learning network, a plurality of bounding boxes containing the target object, where the learning network is trained to generate a plurality of feature maps of different scales based on the 3D image. The method may also include determining, by the processor using the 3D learning network, a set of parameters identifying each detected bounding box, and locating the target object based on the set of parameters by the processor. By means of the 3D learning network, the method can detect the target object from the 3D image quickly, accurately, and automatically.

Description

Computer-implemented method and system for automatically detecting a target object from a 3D image
Cross reference to related applications
This application claims priority to U.S. Provisional Application No. 62/542,890, filed on August 9, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates generally to image processing and analysis. More specifically, the present disclosure relates to a method and system for automatically locating and detecting a target object from a 3D image.
Background
The accuracy of diagnosis and the effectiveness of treatment depend on the quality of medical image analysis, especially the detection of a target object (such as an organ, a tissue, or a target site). Compared with conventional two-dimensional imaging, volumetric (3D) imaging, such as volumetric CT, can capture more valuable medical information and thus supports a more accurate diagnosis. However, target objects are usually detected by experienced medical personnel (such as radiologists) rather than by machines, which makes the detection troublesome, time-consuming, and error-prone.
One example is the detection of lung nodules from lung images. Fig. 1 shows an example of an axial slice image from a volumetric chest CT scan. The high-density mass inside the white bounding box corresponds to a lung nodule. To detect such lung nodules, a radiologist has to screen hundreds or thousands of images from a volumetric CT scan. Due to the lack of 3D spatial information, identifying nodules from 2D images alone is not a simple task. Distinguishing smaller nodules from blood vessels in a 2D image is very difficult, because blood vessels in a 2D axial view are also round or elliptical and look like nodules. Typically, the radiologist needs to examine adjacent images to mentally reconstruct the 3D spatial relationship, and/or to consult sagittal or coronal views (of lower resolution) for reference. The detection of lung nodules therefore depends entirely on the experience of the radiologist.
Although some machine learning methods have been introduced for detection, these methods usually rely on manually defined features, and their detection accuracy is therefore low. In addition, such machine learning is typically limited to learning from 2D images; due to the lack of 3D spatial information and the considerable computing resources required for 3D learning, target objects cannot be detected directly in 3D images.
The present disclosure provides a method and system that can detect a target object from a 3D image quickly, accurately, and automatically by means of a 3D learning network. Such detection may include, but is not limited to, locating the target object, determining the size of the target object, and identifying the type of the target object (such as a blood vessel or a lung nodule).
Summary of the invention
In one aspect, the present disclosure relates to a computer-implemented method for automatically detecting a target object from a 3D image. The method may include receiving the 3D image acquired by an imaging device. The method may further include detecting, by a processor using a 3D learning network, a plurality of bounding boxes containing the target object. The learning network may be trained to generate a plurality of feature maps of different scales based on the 3D image. The method may also include determining, by the processor using the 3D learning network, a set of parameters identifying each detected bounding box, and locating the target object based on the set of parameters by the processor.
In some embodiments, the set of parameters includes coordinates identifying the position of each bounding box within the 3D image.
In some embodiments, the set of parameters includes dimensions identifying the size of each bounding box.
In some embodiments, the 3D learning network is trained to perform regression on the set of parameters.
In some embodiments, the computer-implemented method further comprises associating a plurality of anchor boxes with the 3D image, wherein the set of parameters indicates offsets of each bounding box relative to the respective anchor box.
In some embodiments, each anchor box is associated with a grid cell of a feature map.
In some embodiments, the anchor boxes are scaled according to the scale of the feature map.
In some embodiments, the plurality of feature maps have varying image sizes.
In some embodiments, the plurality of feature maps use sliding windows of variable size.
In some embodiments, the computer-implemented method further includes creating initial bounding boxes, wherein detecting the plurality of bounding boxes containing the target object includes classifying the initial bounding boxes as associated with a plurality of labels.
In some embodiments, the computer-implemented method further includes applying non-maximum suppression to the detected bounding boxes.
In some embodiments, the computer-implemented method further includes segmenting the 3D image to obtain a convex hull and constraining the detection of the plurality of bounding boxes using the convex hull.
In some embodiments, the learning network is further trained to segment the target object within each detected bounding box.
In some embodiments, the imaging device is a computed tomography imaging system.
In some embodiments, the target object is a lung nodule.
In some embodiments, the learning network is a fully convolutional neural network.
In another aspect, the present disclosure also relates to a system for automatically detecting a target object from a 3D image. The system may include an interface configured to receive the 3D image acquired by an imaging device. The system may also include a processor configured to detect, using a 3D learning network, a plurality of bounding boxes containing the target object. The learning network may be trained to generate a plurality of feature maps of different scales based on the 3D image. The processor may be further configured to determine, using the 3D learning network, a set of parameters identifying each detected bounding box, and to locate the target object based on the set of parameters.
In some embodiments, the processor includes a graphics processing unit.
In yet another aspect, the present disclosure also relates to a non-transitory computer-readable medium having instructions stored thereon. The instructions, when executed by a processor, perform a method for automatically detecting a target object from a 3D image. The method may include receiving the 3D image acquired by an imaging device. The method may further include detecting, using a 3D learning network, a plurality of bounding boxes containing the target object. The learning network may be trained to generate a plurality of feature maps of different scales based on the 3D image. The method may also include determining, using the 3D learning network, a set of parameters identifying each detected bounding box, and locating the target object based on the set of parameters.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not restrictive of the claimed invention.
Brief description of the drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. Like numerals having letter suffixes or different letter suffixes may represent different instances of similar components. The drawings illustrate various embodiments generally by way of example and not by way of limitation, and together with the description and claims serve to explain the disclosed embodiments. Where appropriate, the same reference numerals are used throughout the drawings to refer to the same or similar parts. Such embodiments are illustrative and are not intended to be exhaustive or exclusive embodiments of the present method, system, or non-transitory computer-readable medium having instructions thereon for implementing the method.
Fig. 1 shows an exemplary axial image generated by chest volumetric computed tomography;
Fig. 2 shows an exemplary nodule detection system according to an embodiment of the present disclosure;
Fig. 3 shows an exemplary transformation from a fully connected layer to a fully convolutional layer according to an embodiment of the present disclosure;
Fig. 4 depicts a block diagram illustrating an exemplary medical image processing device according to an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of a 3D learning network according to an embodiment of the present disclosure;
Fig. 6 shows a flowchart of an exemplary process for training a convolutional neural network model according to an embodiment of the present disclosure;
Fig. 7 shows a flowchart of an exemplary process for identifying a target object according to an embodiment of the present disclosure;
Fig. 8 shows an exemplary process for automatically detecting a target object from a 3D image according to an embodiment of the present disclosure;
Fig. 9 shows an exemplary nodule detection process using a 3D learning network of n scales according to an embodiment of the present disclosure; and
Fig. 10 shows an exemplary nodule segmentation process using a 3D learning network of one scale according to an embodiment of the present disclosure.
Specific embodiment
As used herein, the term "target object" may refer to any anatomical structure in a subject's body, such as a tissue, a part of an organ, or a target site. For example, the target object may be a lung nodule. In the following embodiments, the use of a lung nodule as an example of the "target object" is illustrative rather than limiting, and those skilled in the art may readily replace the lung nodule in each of the following embodiments with other types of "target object".
Fig. 2 shows an exemplary nodule detection system 200 for automatically detecting a target object from a 3D image according to an embodiment of the present disclosure. In this embodiment, the lung nodule is the target object. A lung nodule may become the target area of a treatment such as radiotherapy. As shown in Fig. 2, the nodule detection system 200 includes a nodule detection model training unit 202 for training a detection model, and a nodule detection unit 204 for detecting the position and class of a nodule object using the trained detection model. The trained detection model may be transferred from the nodule detection model training unit 202 to the nodule detection unit 204, so that the nodule detection unit 204 can obtain the trained detection model and apply it to a 3D medical image obtained, for example, from a 3D medical image database 206. In some embodiments, the detection model may be a 3D learning network.
For example, the position of a nodule object may be identified by the center and the extent of the nodule object. If desired, the class of the nodule object may be identified by a label selected from (n+1) nodule labels, such as, but not limited to, non-nodule, a nodule of a first size, ..., a nodule of an n-th size. As another example, the position may include a plurality of bounding boxes containing the nodule object. Alternatively or additionally, the position of the nodule object may include a set of parameters identifying each detected bounding box. The set of parameters may include coordinates identifying the position (for example, the center) of each bounding box within the 3D medical image. The set of parameters may also include dimensions identifying the size of each bounding box within the 3D medical image. The nodule object can be located based on the detected bounding boxes and/or the set of parameters identifying them.
Training samples may be stored in a training image database 201 and may be obtained by the nodule detection model training unit 202 to train the detection model. Each training sample includes a medical image and the position and class information of the nodule object in the corresponding medical image.
In some embodiments, the output result of the nodule detection unit 204 (including the position and class of the nodule object) may be visualized using a heat map overlaid on the original 3D medical image (such as the original volumetric CT image). In some embodiments, the detection results may be transmitted to the training image database 201 over a network 205 and added, together with the corresponding images, as additional training samples. In this way, the training image database 201 can be continuously updated with new detection results. In some embodiments, the nodule detection model training unit 202 may periodically train the detection model with the updated training samples to improve the detection accuracy of the trained detection model.
The 3D learning network may be implemented by various neural networks. In some embodiments, the 3D learning network may be a feed-forward 3D convolutional neural network. When applied to a lung volumetric CT image, this feed-forward 3D convolutional neural network can generate a plurality of feature maps, each corresponding to a respective class of nodule object, such as non-nodule, a nodule of a first size, ..., a nodule of an n-th size. In some embodiments, each grid cell of a feature map may indicate the presence of a nodule object of the respective class in the corresponding region of the lung volumetric CT image. Based on the plurality of feature maps, a plurality of bounding boxes and scores for the presence of nodule objects within those bounding boxes are generated. For example, a score of 1.0 may indicate that a nodule object of the respective class is present in the bounding box, a score of 0.0 may indicate that no nodule object of the respective class is present in the bounding box, and a score between 0.0 and 1.0 may indicate the probability that a nodule object of the respective class is present in the bounding box. In some embodiments, the feed-forward 3D convolutional neural network may be followed by a non-maximum suppression layer to generate the final detection results. Alternatively, auxiliary fully connected layers or auxiliary fully convolutional layers may be appended to the 3D convolutional neural network as detection layers. In some embodiments, a bounding box may be 3D and identified by a set of parameters. For example, it may be identified by the coordinates (x, y, z) of the center of the bounding box and the box sizes (size_x, size_y, size_z) along the x-, y-, and z-axes, respectively. In some embodiments, the 3D convolutional neural network may be trained to perform regression on this set of parameters, so that the output of the 3D convolutional neural network may include the classification result of the nodule object (the class of the detected nodule object) and the values of the 6 regression parameters for the respective detected nodule object class.
In one embodiment, the feed-forward 3D convolutional neural network may include a base network, and a feature map of scale 1 (a first scale) may be obtained from the base network. For example, the base network may include three convolution blocks and three detection layers fc1, fc2, and fc3, each convolution block consisting of two 3×3×3 convolutional layers, a ReLU layer, and a 2×2×2 max pooling layer. Convolutional layer 1 and convolutional layer 2 in convolution block 1 have 64 feature maps, convolutional layer 1 and convolutional layer 2 in convolution block 2 have 128 feature maps, and convolutional layer 1 and convolutional layer 2 in convolution block 3 have 256 feature maps. In some embodiments, fc1, fc2, and fc3 may be auxiliary fully connected layers for the classification task. In one embodiment, fc1 has 512 neurons followed by a ReLU layer, fc2 has 128 neurons followed by a ReLU layer, and the number of neurons in fc3 depends on the number of classes. For example, if nodule objects are divided into 10 classes (such as non-nodule, a nodule of the 1st size, a nodule of the 2nd size, ..., a nodule of the 9th size), the number of neurons in the fc3 layer is 10.
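For illustration only, the following is a minimal PyTorch sketch of a base network of this kind: three convolution blocks of two 3×3×3 convolutions with ReLU and 2×2×2 max pooling (64, 128, and 256 feature maps), followed by three auxiliary fully connected layers of 512, 128, and 10 neurons. The class and function names, the use of padding, and the 32×32×32 input patch size are assumptions made for the sketch, not details taken from the patent.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3x3 convolutions with ReLU, then 2x2x2 max pooling, as described above.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool3d(kernel_size=2),
    )

class BaseNetwork(nn.Module):
    def __init__(self, num_classes=10, patch_size=32):
        super().__init__()
        self.block1 = conv_block(1, 64)     # 64 feature maps
        self.block2 = conv_block(64, 128)   # 128 feature maps
        self.block3 = conv_block(128, 256)  # 256 feature maps
        feat = patch_size // 8              # three 2x2x2 poolings
        self.fc1 = nn.Linear(256 * feat ** 3, 512)  # fc1: 512 neurons + ReLU
        self.fc2 = nn.Linear(512, 128)              # fc2: 128 neurons + ReLU
        self.fc3 = nn.Linear(128, num_classes)      # fc3: one neuron per class
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                   # x: (N, 1, D, H, W)
        x = self.block3(self.block2(self.block1(x)))
        x = torch.flatten(x, 1)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)                  # class scores for the input patch
```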
In another embodiment, the base network may be modified by converting the fully connected layers fc1, fc2, and fc3 described above into fully convolutional layers Fc1-conv, Fc2-conv, and Fc3-conv, respectively. Computation can then be accelerated on the basis of the modified base network, owing to the speed-up obtained by performing convolution over the image. Fig. 3 shows an exemplary transformation from a fully connected layer to a fully convolutional layer according to an embodiment of the present disclosure. In some embodiments, the kernel size of Fc1-conv may be the same as the size of the feature map output by convolution block 3 (after pooling, if desired), while Fc2-conv and Fc3-conv have a kernel size of 1×1×1. In some embodiments, the number of feature maps of each of the three fully convolutional layers is the same as the number of feature maps of the corresponding fully connected layer. As shown in Fig. 3, the weights w00, w01, w10, and w11 of the convolution kernel are converted from the respective weights w00, w01, w10, and w11 of the corresponding fully connected layer. In some embodiments, the weights of the fully connected layer can be reshaped according to the convolution kernel size and the number of feature maps.
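As an illustration of this conversion, the sketch below reshapes the weight matrix of a trained fully connected layer into the kernel of an equivalent 3D convolution, in the spirit of Fig. 3; the helper name and the example sizes in the comments are hypothetical.

```python
import torch.nn as nn

def fc_to_conv(fc: nn.Linear, in_channels: int, kernel_size: tuple) -> nn.Conv3d:
    # fc.weight has shape (out_features, in_channels * kd * kh * kw); reshaping it
    # into a Conv3d kernel reproduces the fully connected layer when the kernel
    # exactly covers the feature map it was trained on.
    kd, kh, kw = kernel_size
    conv = nn.Conv3d(in_channels, fc.out_features, kernel_size=kernel_size)
    conv.weight.data = fc.weight.data.view(fc.out_features, in_channels, kd, kh, kw)
    conv.bias.data = fc.bias.data
    return conv

# Hypothetical usage: fc1 with 512 outputs over a 256-channel 4x4x4 feature map
# fc1_conv = fc_to_conv(base.fc1, in_channels=256, kernel_size=(4, 4, 4))
# fc2_conv and fc3_conv would use kernel_size=(1, 1, 1).
```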
In one embodiment, the base network described above, or a modified version of it, may be used as the 3D learning network to detect bounding boxes directly. The base network may be applied to the input 3D image to generate a plurality of feature maps, each corresponding to a particular object class. In some embodiments, each grid cell of a feature map corresponds to a respective block in the 3D image. For example, for the i-th feature map corresponding to the i-th object class, the value of a grid cell may indicate the probability that an object of the i-th class is present in the corresponding block of the 3D image. In some embodiments, the object in the corresponding block can be classified based on the values of the corresponding grid cells of the feature maps. In addition, by transforming the coordinates of the grid cell from the feature map space to the 3D image space, an initial bounding box can be generated with the classification result as its label. In some embodiments, for convolution operations without padding, the transformation can be performed using formula (1), where x_f is a coordinate in the predicted feature map, x_ori is the corresponding coordinate in the image space, s_1, s_2, and s_3 are scale factors, the floor operation is applied, and c_{i_j} (i = 1, 2, 3; j = 1, 2) is the convolution kernel size of the j-th convolutional layer in the i-th convolution block, with c_{1_1} = c_{1_2} = c_{2_1} = c_{2_2} = c_{3_1} = c_{3_2} = 3, c_4 = 8, c_5 = 1, and c_6 = 1.
In some embodiments, for convolution operations with padding, the transformation can be performed using formula (2).
In some embodiments, the 3D learning network may have several scales. For example, a convolutional network based on multi-scale feature maps may be used. The number of scales can be determined based on the particular detection task. The multi-scale features of the convolutional network can be realized in various ways. For example, multiple feature maps may have the same scale size but be obtained with sliding windows of different sizes. As another example, convolution filters or pooling filters, or both, may be used to down-sample the feature maps to different scales while sliding windows of the same size are used. As yet another example, a down-sampling layer may be used to down-sample the feature maps to different scales, and so on. The disclosed convolutional network based on multi-scale feature maps can accelerate computation, so that detection based on a 3D learning network remains clinically practical while being able to detect objects over a wide range of sizes.
In some embodiments, the 3D learning network uses a set of fully convolutional filters (also referred to as fully connected layers) on each scale to generate a fixed number of detection results. In some embodiments, the 3D learning network can return a plurality of bounding boxes, each associated with two parts: an object classification and a set of parameters identifying the corresponding bounding box. The object classification may have c classes. In one embodiment, c = 3, where the three object classes are non-lung background, lung tissue that is not a nodule, and nodule. For the nodule class, the bounding box encloses the corresponding nodule object. In some embodiments, multiple anchor boxes may be introduced for each grid cell of a feature map, so that the detected bounding boxes track the target object better. For example, the set of parameters identifying a corresponding bounding box may be regressed offsets of its coordinates (centered on the grid cell being evaluated) and sizes relative to the corresponding anchor box. For the nodule class, the offsets may be relative values of coordinates and sizes with respect to the respective anchor box (dx, dy, dz, dsize_x, dsize_y, and dsize_z). For the feature maps of the individual scales, if k anchor boxes are associated with each grid cell, then each corresponding grid cell of the feature maps of s scales yields k*s bounding boxes in total (and k*s anchor boxes). Then, for each of the k*s bounding boxes, a score for each of the c classes and its 6 offsets relative to the respective anchor box can be computed. This results in a total of (c+6)*k*s filters applied around each position in the feature maps. When each feature map has size m*n*d, this yields (c+6)*k*s*m*n*d outputs. In some embodiments, the anchor boxes are 3D and associated with the feature maps of different scales. In some embodiments, the anchor boxes may be scaled based on the corresponding feature map. Alternatively, the positions and sizes of the anchor boxes may be adjusted according to the image information by a regression algorithm.
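For a single scale, such a detection head can be sketched as follows: for each of k anchor boxes at every grid cell of a feature map, a convolution predicts c class scores and 6 box offsets. The use of a 1×1×1 convolution and all names here are assumptions made for illustration.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Predicts (c + 6) values per anchor box at every grid cell of one feature map."""
    def __init__(self, in_channels, num_classes=3, num_anchors_k=3):
        super().__init__()
        self.c = num_classes
        self.k = num_anchors_k
        # One convolution produces class scores and box offsets for every anchor.
        self.pred = nn.Conv3d(in_channels, (num_classes + 6) * num_anchors_k, kernel_size=1)

    def forward(self, feat):                # feat: (N, C, m, n, d)
        out = self.pred(feat)               # (N, (c+6)*k, m, n, d)
        N, _, m, n, d = out.shape
        out = out.view(N, self.k, self.c + 6, m, n, d)
        cls_scores = out[:, :, : self.c]    # per-anchor class scores
        offsets = out[:, :, self.c :]       # (dx, dy, dz, dsize_x, dsize_y, dsize_z)
        return cls_scores, offsets
```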
Fig. 4 depicts a block diagram illustrating an exemplary medical image processing device 300 suitable for automatically detecting a target object from a 3D image according to an embodiment of the present disclosure. The medical image processing device 300 may include a network interface 328, by means of which the medical image processing device 300 may be connected to a network (not shown), such as, but not limited to, a local area network in a hospital or the Internet. The network can connect the medical image processing device 300 with external devices such as an image acquisition device (not shown), a medical image database 325, and an image data storage device 326. The image acquisition device may be any device that acquires images of an object, such as a DSA imaging device, an MRI imaging device, a CT imaging device, a PET imaging device, an ultrasound device, a fluoroscopy device, a SPECT imaging device, or another medical imaging device for obtaining medical images of a patient. For example, the imaging device may be a lung CT imaging device, etc.
In some embodiments, the medical image processing device 300 may be a dedicated intelligent device or a general-purpose intelligent device. For example, the device 300 may be a computer customized for image data acquisition and image data processing tasks, or a server in the cloud. For example, the device 300 may be integrated into the image acquisition device. Optionally, the device may include, or cooperate with, a 3D reconstruction unit for reconstructing a 3D image based on 2D images acquired by the image acquisition device.
The medical image processing device 300 may include an image processor 321 and a memory 322, and may additionally include at least one of an input/output 327 and an image display 329.
The image processor 321 may be a processing device that includes one or more general-purpose processing devices, such as a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), and the like. More specifically, the image processor 321 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor running other instruction sets, or a processor running a combination of instruction sets. The image processor 321 may also be one or more dedicated processing devices, such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a system on chip (SoC), and the like. As will be appreciated by those skilled in the art, in some embodiments the image processor 321 may be a special-purpose processor rather than a general-purpose processor. The image processor 321 may include one or more known processing devices, such as microprocessors of the Pentium™, Core™, Xeon™, or Itanium™ series manufactured by Intel, microprocessors of the Turion™, Athlon™, Sempron™, Opteron™, FX™, or Phenom™ series manufactured by AMD, or any of the various processors manufactured by Sun Microsystems. The image processor 321 may also include graphics processing units, such as GPUs manufactured by Nvidia, GPUs of the GMA or Iris™ series manufactured by Intel, or GPUs of the Radeon™ series manufactured by AMD. The image processor 321 may also include accelerated processing units, such as the Desktop A-4 (6, 8) series manufactured by AMD or the Xeon Phi™ series manufactured by Intel. The disclosed embodiments are not limited to any type of processor or processor circuit otherwise configured to meet the computing demands of identifying, analyzing, maintaining, generating, and/or providing large amounts of imaging data, or manipulating such imaging data to detect and locate a target object from a 3D image consistent with the disclosed embodiments, or manipulating any other type of data. In addition, the term "processor" or "image processor" may include more than one processor, for example, a multi-core design, or multiple processors each having a multi-core design. The image processor 321 can execute sequences of computer program instructions stored in the memory 322 to perform the various operations, processes, and methods disclosed herein.
The image processor 321 may be communicatively coupled to the memory 322 and configured to execute the computer-executable instructions stored therein. The memory 322 may include read-only memory (ROM), flash memory, random access memory (RAM), dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM, static memory (e.g., flash memory, static random access memory), and the like, on which computer-executable instructions are stored in any format. In some embodiments, the memory 322 may store the computer-executable instructions of one or more image processing programs 223. The computer program instructions can be accessed by the image processor 321, read from the ROM or any other suitable storage location, and loaded into the RAM for execution by the image processor 321. For example, the memory 322 may store one or more software applications. The software applications stored in the memory 322 may include, for example, an operating system (not shown) for general-purpose computer systems and for soft-controlled devices. Further, the memory 322 may store an entire software application or only a part of a software application (such as the image processing program 223) executable by the image processor 321. In addition, the memory 322 may store a plurality of software modules for implementing the individual steps of the method for automatically detecting a target object from a 3D image, or of the process for training the 3D learning network, consistent with the present disclosure. In addition, the memory 322 may store data generated/cached while the computer program is executed, such as medical image data 324, including the medical images transmitted from the image acquisition device, the medical image database 325, the image data storage device 326, etc. Such medical image data 324 may include received 3D medical images on which the automatic detection of the target object is to be performed. In addition, the medical image data 324 may also include the 3D medical images together with their target object detection results.
The image processor 321 can execute the image processing program 223 to implement the method for automatically detecting a target object from a 3D image. In some embodiments, when executing the image processing program 223, the image processor 321 may associate the corresponding 3D image with the detection results, including the object classification and the detected bounding boxes, and store the 3D image together with (for example, labeled with) the detection results in the memory 322. Optionally, the memory 322 may communicate with the medical image database 325 to obtain from it images in which the objects to be detected are present, or to transmit the 3D images together with the detection results to the medical image database 325.
In some embodiments, the 3D learning network may be stored in the memory 322. Optionally, the 3D learning network may be stored in a remote device, a separate database (such as the medical image database 325), or distributed devices, and may be used by the image processing program 223. The 3D images together with the detection results can be stored in the medical image database 325 as training samples.
The input/output 327 may be configured to allow the medical image processing device 300 to receive and/or send data. The input/output 327 may include one or more digital and/or analog communication devices that allow the device 300 to communicate with a user or other machines and devices. For example, the input/output 327 may include a keyboard and a mouse that allow the user to provide input.
The network interface 328 may include a network adapter, a cable connector, a serial connector, a USB connector, a parallel connector, a high-speed data transmission adapter such as optical fiber, USB 3.0, or Thunderbolt, a wireless network adapter such as a WiFi adapter, or a telecommunication (3G, 4G/LTE, etc.) adapter. The device 300 can be connected to the network through the network interface 328. The network may provide the functionality of a local area network (LAN), a wireless network, a cloud computing environment (for example, software as a service, platform as a service, infrastructure as a service), a client-server, a wide area network (WAN), and the like.
In addition to displaying medical images, the image display 329 may also display other information, such as the classification results and the detected bounding boxes. For example, the image display 329 may be an LCD, a CRT, or an LED display.
Various operations or functions are described herein, which may be implemented as, or defined as, software code or instructions. Such content may be directly executable ("object" or "executable" form) source code or difference code ("delta" or "patch" code). The software code or instructions may be stored in a computer-readable storage medium and, when executed, may cause a machine to perform the described functions or operations, and include any mechanism for storing information in a form accessible by a machine (for example, a computing device, an electronic system, etc.), such as recordable or non-recordable media (for example, read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
As described above, the 3D learning network according to embodiments of the present disclosure can work in an end-to-end manner and directly predict the nodule class and the bounding box.
In some embodiments, in order to reduce computation and overhead costs, a stage-wise training scheme may be used. The training scheme can be divided into three stages: (1) unsupervised learning; (2) training of the classification network based on small image patches; and (3) training of the detection network based on large image patches. In some embodiments, stages (1) and (2) may be used to train the classification network (a part of the detection network), such as the base network disclosed herein, so as to provide a good network initialization for the whole detection network. In some embodiments, stage (3) may perform end-to-end training on large image patches.
In some embodiments, if the original 3D image is too large to be loaded into the memory of the image processor 321 (such as a modern GPU), it can be divided into a plurality of large image patches according to the memory size of the image processor 321, so that they fit into it. By dividing the original 3D image into small image patches and large image patches, and using the stage-wise training scheme including unsupervised training, i.e., training the classification network of the 3D detection network on small image patches and then training the 3D detection network on the basis of the trained classification network, the total computation required for training can be substantially reduced so that it can be performed on a modern GPU.
In one embodiment, the initial network weights may be generated using a 3D convolutional auto-encoder, as shown in Fig. 5. In some embodiments, the encoder part consists of cascaded convolution blocks (for example, 3 convolution blocks), and the decoder part consists of a cascade of 3 deconvolution blocks corresponding to the convolution blocks of the encoder part. In a deconvolution block, the deconvolution layer consists of an up-sampling layer followed by a convolutional layer. As shown in Fig. 5, an image patch is input, convolution operations are performed by the encoder part, deconvolution operations are performed by the decoder part, and a predicted image patch is then output. The 3D convolutional auto-encoder can be trained so that the output image patch is identical to the input image patch (the target image patch). In one embodiment, noise such as Gaussian noise may be added to the input image patch but not to the target output image patch, so as to make the learning robust. In some embodiments, both the input image patch and the target output image patch may be transformed, for example rotated, deformed, or scaled up/down, so as to make the learning even more robust. The weights of the trained encoder (that is, the base network) are then used to initialize the supervised training process.
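The following is a minimal sketch, under assumed layer sizes, of such a denoising 3D convolutional auto-encoder: the decoder mirrors the three encoder blocks with up-sampling followed by convolution, Gaussian noise is added to the input patch only, and the training target is the clean patch. It is illustrative rather than the patented architecture.

```python
import torch
import torch.nn as nn

def deconv_block(in_ch, out_ch):
    # Up-sampling followed by a convolution, mirroring one encoder block.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class ConvAutoEncoder3D(nn.Module):
    def __init__(self, encoder_blocks):
        super().__init__()
        # encoder_blocks: e.g. the three conv_block(...) modules of the base network.
        self.encoder = nn.Sequential(*encoder_blocks)
        self.decoder = nn.Sequential(
            deconv_block(256, 128),
            deconv_block(128, 64),
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(64, 1, kernel_size=3, padding=1),  # reconstructed patch
        )

    def forward(self, x, noise_std=0.1):
        noisy = x + noise_std * torch.randn_like(x)  # Gaussian noise on the input only
        return self.decoder(self.encoder(noisy))

# Training target is the clean patch itself, for example:
# loss = nn.functional.mse_loss(model(x), x)
```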
The detection network includes the classification network. In one embodiment, the classification network may be trained before the detection network is trained. Fig. 6 shows a flowchart of a convolutional neural network training process 400 for training the classification network therein. The process 400 starts with step 450, receiving 3D training images and the associated classification results. In some embodiments, the 3D training images may be divided into image patches at step 452, for example using a sliding window. Then, at step 454, a single image patch together with its classification result is input into the classification network as training data. In some embodiments, the weights of the classification network may already have been initialized. At step 456, the classifier parameters of the classification network can be determined based on the training data. At step 458, the determination of the classifier parameters may include validation against a loss function. In some embodiments, steps 456 and 458 may also be integrated into the same step, in which the classifier parameters of the classification network are optimized against the loss function based on each image patch. In some embodiments, the optimization can be performed by any common algorithm, including but not limited to gradient descent, Newton's method, conjugate gradient, quasi-Newton methods, the Levenberg-Marquardt algorithm, etc. At step 460, it is determined whether all image patches have been processed; if so, at step 462 the classification network with the currently optimized classifier parameters is output as the trained model. Otherwise, the process returns to step 454 to process the next image patch, until all image patches have been processed. At step 452, the 3D training images can be divided into image patches using a sliding window. In one embodiment, as explained above with reference to Fig. 3, the last several fully connected layers of the convolutional network serving as the classification network (or the detection network) can be converted into fully convolutional layers. The stride of the convolution operation plays the role of the stride of the sliding window, so these operations are equivalent. Thanks to fast convolution computation on a GPU, a huge speed-up is obtained.
In some embodiments, based on the trained classification network, the detection network can be constructed and trained on large image patches. For example, the training process 400 of the classification network may be adapted to train the detection network, with the following differences. The training data input at step 454 are large image patches together with information about the bounding boxes containing the target objects detected therein, such as a set of parameters identifying the corresponding detected bounding box. Such a set of parameters may include, but is not limited to, a label (associated with the target object in the large image patch) and a set of location parameters of the detected bounding box. As an example, label 0 may indicate that the 3D image patch contains no nodule (that is, the bounding box detected therein contains a non-nodule target object), and labels 1-9 may respectively indicate that the 3D image patch contains nodules of different sizes 1-9 (that is, the bounding box detected therein contains a nodule target object of size n, n being an integer in the range 1-9). There are also differences at steps 456 and 458, where the parameters optimized against the loss function are those of the detection network.
In some embodiments, in order to train the detection network, a label may be assigned to each anchor box. In one embodiment, if the intersection over union (IoU) ratio of an anchor box overlapping any ground truth box is higher than a certain threshold (for example, 0.7), the label corresponding to that ground truth box may be assigned to the anchor box. Note that a single ground truth box (such as a tight bounding box containing a nodule) may assign its label to several anchor boxes. For example, if a non-nodule anchor box has an IoU ratio lower than 0.3 with respect to all ground truth boxes outside the lung, the intra-lung non-nodule label may be assigned to that non-nodule anchor box. Conversely, if a non-nodule anchor box has an IoU ratio lower than 0.3 with respect to all intra-lung ground truth boxes, the extra-lung non-nodule label may be assigned to that non-nodule anchor box.
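As an illustration of this assignment rule, the sketch below computes the 3D intersection over union between boxes given as center coordinates plus sizes and assigns a ground truth label when the IoU exceeds 0.7; the function names are hypothetical, and the non-nodule rules described above are left as a fallback.

```python
import numpy as np

def iou_3d(box_a, box_b):
    """3D IoU of boxes given as (x, y, z, size_x, size_y, size_z), center plus size."""
    a_min = np.asarray(box_a[:3]) - np.asarray(box_a[3:]) / 2.0
    a_max = np.asarray(box_a[:3]) + np.asarray(box_a[3:]) / 2.0
    b_min = np.asarray(box_b[:3]) - np.asarray(box_b[3:]) / 2.0
    b_max = np.asarray(box_b[:3]) + np.asarray(box_b[3:]) / 2.0
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = overlap.prod()
    union = np.prod(box_a[3:]) + np.prod(box_b[3:]) - inter
    return inter / union if union > 0 else 0.0

def assign_anchor_label(anchor, gt_boxes, gt_labels, pos_thresh=0.7):
    """Assign the label of any ground truth box whose IoU with the anchor exceeds pos_thresh."""
    for gt_box, gt_label in zip(gt_boxes, gt_labels):
        if iou_3d(anchor, gt_box) > pos_thresh:
            return gt_label
    return None  # handled by the non-nodule rules described above
```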
In one embodiment, the loss function for training the detection network may be a multi-task loss function covering both the classification task and the bounding box prediction task. For example, the multi-task loss function can be defined by formula (3),
where i is the index of an anchor box in a training mini-batch, p_i is the predicted probability that anchor box i is a nodule, p_i* is the ground truth label, t_i denotes the 6 parameterized coordinates of the predicted bounding box, and t_i* denotes the parameterized coordinates of the ground truth box associated with a nodule anchor box. L_cls is the cross-entropy loss and L_reg is a robust loss function. In some embodiments, N_cls and N_reg may respectively be the numbers of corresponding boxes in the mini-batch, used for normalization. λ is a weighting parameter between the classification task and the regression task.
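The body of formula (3) is not reproduced in this text. Based on the variables defined above, one plausible reconstruction of such a multi-task loss, offered as an assumption rather than the exact patented formula, is:

$$ L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*) $$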
As an example, 6-parameter regression may be used for the bounding box regression, and these 6 parameters can be defined by formula (4),
where x, y, z, w, h, and d denote the center coordinates of the bounding box and its width, height, and depth. The variables x, x_a, and x* are used for the predicted bounding box, the anchor box, and the ground truth box, respectively (and likewise for y, z, w, h, and d).
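The body of formula (4) is likewise not reproduced here. With the variables defined above, a commonly used 6-parameter box parameterization consistent with that description, stated as an assumption, is:

$$ t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_z = \frac{z - z_a}{d_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}, \quad t_d = \log\frac{d}{d_a} $$

with the ground truth targets t* defined analogously from x*, y*, z*, w*, h*, and d*.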
Fig. 7 shows a flowchart of an exemplary process 500 for identifying a target object in a 3D image scan. The target object identification process 500 starts at step 512, in which a trained nodule detection model is received. At step 514, a 3D medical image, which may contain the target object, is received. Then, at step 452, the 3D medical image may be divided into image patches using a sliding window. At step 516, a plurality of bounding boxes are detected from an image patch using the detection network. At step 518, the label and the set of location parameters identifying each bounding box are determined. Alternatively, steps 516 and 518 may be integrated into the same step. Then, at step 520, the bounding boxes can be classified using the labels, and the classified bounding boxes and their location parameters can be used to locate the target object, such as a nodule. At step 460, it is determined whether all image patches have been processed; if so, at step 462 the nodule detection results of the individual image patches are integrated to obtain and output the complete nodule detection result. If not, the process 500 returns to step 516. Although the target object identification process 500 shown in Fig. 7 is based on a sliding window, it is not limited to this. In one embodiment, as described above with reference to Fig. 3, the last several fully connected layers of the convolutional network serving as the classification network (or the detection network) can be converted into fully convolutional layers. The stride of the convolution operation plays a role similar to the stride of the sliding window, so these operations are equivalent. Thanks to fast convolution computation on a GPU, a huge speed-up is obtained.
Fig. 8 illustrates an exemplary process for automatically detecting a target object from a 3D image according to another embodiment of the present disclosure. As shown in Fig. 8, a 3D lung volumetric CT image is input into the detection system, which can detect the nodules therein using the trained model and identify the corresponding bounding box containing each nodule, including the position and size of the corresponding bounding box.
In some embodiments, the 3D image may be divided into smaller chunks along various directions (such as, but not limited to, the z direction), and the detection network and its associated algorithms may then be applied to each chunk to obtain the corresponding detection results of the target object. The detection results of the individual chunks can be gathered, and the chunks with their corresponding detection results can be integrated to produce the complete detection result for the entire 3D image.
Fig. 9 shows an exemplary nodule detection process using a 3D learning network of n scales. As shown in Fig. 9, the input 3D medical image is a W*H*Z volumetric CT scan. In order to fit the detection network and the feature maps into GPU memory, the input CT scan is divided into smaller chunks. Note that two chunks, a W*H*Z1 CT scan and a W*H*Z2 CT scan, are shown in Fig. 9 merely as an example to illustrate the detection process; in practice the number of chunks can be selected as needed to suit the performance of the GPU. For example, the base network corresponds to scale 1, and the learning networks corresponding to the other scales, including scales 2 to n, can be realized by rescaling operations on the base network, including but not limited to convolution and max pooling. Each chunk may have 3 class labels for its bounding boxes. The feature maps of the 3 classes at the various scales can be used to detect the bounding boxes within each chunk. The detection result may include the class label of a bounding box and the 6 regressed offset parameters relative to the anchor box (dx, dy, dz, dsize_x, dsize_y, and dsize_z). The detected bounding boxes of all chunks, i.e., the multi-box predictions shown in Fig. 9, can be assembled and transformed into the original image coordinate system, and 3D non-maximum suppression is then performed to obtain the final detection result. Through 3D non-maximum suppression, redundant bounding boxes can be eliminated, so as to simplify and clarify the detection/localization result of the target object in the 3D medical image. For example, as a result of 3D non-maximum suppression, a single detected bounding box can be determined for a single nodule.
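As an illustration of this last step, the sketch below performs a simple greedy 3D non-maximum suppression over boxes given as center plus size, reusing the iou_3d helper sketched earlier; the 0.5 IoU threshold and the function signature are assumptions.

```python
import numpy as np

def nms_3d(boxes, scores, iou_thresh=0.5):
    """Greedy 3D NMS. boxes: (N, 6) rows (x, y, z, size_x, size_y, size_z); scores: (N,)."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        remaining = order[1:]
        # Drop boxes that overlap the kept box too much (redundant detections).
        order = np.array([j for j in remaining
                          if iou_3d(boxes[best], boxes[j]) <= iou_thresh])
    return keep
```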
Alternatively, segmentation may be performed before running the detection algorithm, so as to constrain the detection algorithm to potential regions of the 3D medical image rather than the entire 3D medical image. Detection accuracy can thereby be improved while the amount of computation required by the detection network is reduced.
Taking the lung nodule as an example of the target object, it is known that lung nodules are always inside the lung. In one embodiment, lung segmentation may be performed in advance to further reduce false alarms. In particular, lung segmentation may first be performed to generate a lung convex hull, which is then used to constrain the nodule detection. Lung segmentation can be performed by various means, including but not limited to convolutional networks, active contour models, watershed segmentation, etc. In some embodiments, this lung segmentation may be performed on a low-resolution scan and the result up-sampled to the original resolution, so that the 3D learning network and the feature maps fit into GPU memory while the segmentation process is accelerated.
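As an illustration of constraining detection with a convex hull, the sketch below builds the convex hull of a binary lung mask with SciPy and keeps only those detected boxes whose centers fall inside it; the mask source, the coordinate ordering, and the helper names are assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def lung_hull(lung_mask):
    """Convex hull of the voxel coordinates of a binary lung mask."""
    points = np.argwhere(lung_mask > 0)  # (M, 3) voxel coordinates
    return Delaunay(points[ConvexHull(points).vertices])

def filter_boxes_by_hull(boxes, hull):
    """Keep boxes (x, y, z, size_x, size_y, size_z) whose centers lie inside the hull."""
    # Assumes the box centers use the same voxel index ordering as the mask.
    centers = np.asarray(boxes)[:, :3]
    inside = hull.find_simplex(centers) >= 0
    return [box for box, ok in zip(boxes, inside) if ok]
```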
In clinical practice, radiologists often need to perform quantitative analysis of lung nodules. For example, besides the detected bounding boxes, they also need the boundary of the nodule, the accurate size of the detected nodule depending on nodule segmentation, and so on. In some embodiments, segmentation may be performed based on the detected bounding boxes. For example, a 3D convolutional network may be used to perform segmentation within the detected lung nodule bounding boxes. The segmentation model/learning network can thereby be trained on smaller nodule image patches and applied to the image regions within the detected bounding boxes.
In one embodiment, as shown in Fig. 10, nodule segmentation and detection can be integrated into one nodule segmentation process, so as to realize an end-to-end detection and segmentation process. Although the input W*H*Z 3D scan is divided into two CT scans in Fig. 10, it is conceivable that the input 3D scan may be divided into any suitable number of CT scans. For each divided CT scan, various rescaling operations, including convolution and max pooling operations, can be performed on the base network to obtain the detection results of the corresponding scale. Once a bounding box is detected, it can be scaled back to the feature map space (for example, re-aligned to the last feature layer), and ROI (region of interest) pooling is then applied to it to generate an ROI region. A segmentation algorithm can be performed on each ROI to directly generate the nodule segmentation. As an example, such a segmentation algorithm can be implemented by a fully convolutional network segmentation layer. Such a segmentation algorithm can also be implemented by a series of deconvolution layers following convolutional layers or up-sampling layers. In some embodiments, the nodule segmentation results of the individual divided CT scans can be integrated to obtain the complete nodule segmentation result for the original input CT scan. In one embodiment, the pooling uses bilinear interpolation and resampling so that the ROIs have the same size, so as to accelerate GPU computation.
It is conceivable that the nodule segmentation process shown in Fig. 10 can be extended from the preceding detection network, where the detection and segmentation stages can share the same detection network, such as the base network.
In some embodiments, the network used in Fig. 10 may be trained as follows. First, the detection network is trained to obtain the weights of the detection network part. Then, given the weights of the detection network part, the segmentation network part is trained using the ground truth bounding boxes. The segmentation network part can be trained with several loss functions, including but not limited to the cross entropy normalized by the numbers of foreground and background voxels. The two networks can be combined to obtain the nodule segmentation result.
The detection network part and the segmentation network part can be trained separately, sequentially, or simultaneously. In one embodiment, during the training stage, both the detection network part and the segmentation network part can be trained simultaneously against a joint loss function, where the ground truth bounding boxes and segmentations are used for supervision. The joint loss function can be defined by formula (5),
where the terms identical to those of formula (3) above are not defined again. The last term is the segmentation loss component. N_seg is the number of segmented regions in a mini-batch, L_seg is the per-voxel loss function over a region, j is the index of a region of interest in the training mini-batch, S_j is the predicted probability of the region of interest, and S_j* is the ground truth segmentation.
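The body of formula (5) is not reproduced in this text either. Combining the reconstruction of formula (3) above with the segmentation variables just defined, one plausible form of the joint loss, stated as an assumption, is:

$$ L = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*) + \frac{1}{N_{seg}} \sum_j L_{seg}(S_j, S_j^*) $$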
The description of front is for illustrative purposes and presents.This is not exhausted, and be not limited to it is disclosed really Cut form or embodiment.In view of the explanation and practice of the disclosed embodiments, the modifications and changes of embodiment will become it is aobvious and It is clear to.
It in the document, include one or more using term " one " or " a " as common in the patent literature In one, independently of "at least one" or any other example or usage of " one or more ".Herein, unless in addition referring to Out, term "or" is for referring to nonexcludability, or " A or B " is made to include " A but do not include B ", " B but do not include A " and " A with B".In the document, term " including (including) " and " wherein (in which) " be used as corresponding term " including And the plain English equivalent of " wherein (wherein) " (comprising) ".Moreover, in the following claims, term " including (including) " and " including (comprising) " is open, that is, including in addition to those exist in the claims Equipment, system, equipment, product, composition, formula or the process of element other than the element listed after the term, are also regarded as and fall Enter in the scope of protection of the claims.In addition, in the following claims, term " first ", " second " and " third " etc. It is merely used as label, it is no intended to which requirement numerically is applied to its object.
Illustrative methods described herein can be at least partly machine or computer implemented.Some examples can wrap The computer-readable medium or machine readable media with instruction encoding are included, described instruction can be operated to configure electronic equipment and execute such as Method described in above example.The realization of this method may include software code, such as microcode, assembler language code, More advanced language codes etc..Various software programming techniques can be used to create in various programs or program module.For example, can be with Program segment or program module are designed using Java, Python, C, C++, assembler language or any of programming language.One Or multiple such software sections or module can be integrated into computer system and/or computer-readable medium.It is this soft Part code may include the computer-readable instruction for executing various methods.Software code can form computer program product Or a part of computer program module.In addition, in one example, software code can such as during execution or other when Between be tangibly stored in one or more volatibility, non-transitory or non-volatile visible computer readable medium.These have The example of the computer-readable medium of shape can include but is not limited to hard disk, moveable magnetic disc, removable CD (for example, CD And digital video disc), cassette, storage card or stick, random access memory (RAM), read-only memory (ROM) etc..
Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations, or alterations based on the present disclosure. The elements in the claims are to be interpreted broadly based on the language employed in the claims and are not limited to examples described in the present specification or during the prosecution of the application, which examples are to be construed as non-exclusive. Further, the steps of the disclosed methods can be modified in any manner, including by reordering steps or inserting or deleting steps. It is intended, therefore, that the description be regarded as examples only, with a true scope being indicated by the following claims and their full range of equivalents.
The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments can be devised by those of ordinary skill in the art upon reviewing the above description. Also, in the above detailed description, various features may be grouped together to streamline the disclosure. This should not be interpreted as an intention that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in fewer than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that such embodiments can be combined with each other in various combinations or permutations. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (20)

1. A computer-implemented method for automatically detecting a target object from a 3D image, comprising:
receiving the 3D image acquired by an imaging device;
detecting, by a processor using a 3D learning network, a plurality of bounding boxes containing the target object, wherein the learning network is trained to generate a plurality of feature maps of varying scales based on the 3D image;
determining, by the processor using the 3D learning network, a set of parameters identifying each detected bounding box; and
locating, by the processor, the target object based on the set of parameters.
2. The computer-implemented method of claim 1, wherein the set of parameters includes position coordinates identifying each bounding box within the 3D image.
3. The computer-implemented method of claim 1, wherein the set of parameters includes dimensions identifying a size of each bounding box.
4. The computer-implemented method of claim 1, wherein the 3D learning network is trained to perform regression on the set of parameters.
5. The computer-implemented method of claim 1, further comprising associating a plurality of anchor boxes with the 3D image, wherein the set of parameters indicates an offset of each bounding box relative to a respective anchor box.
6. The computer-implemented method of claim 5, wherein each anchor box is associated with a grid cell of a feature map.
7. The computer-implemented method of claim 6, wherein the anchor boxes are scaled according to the scale of the feature map.
8. The computer-implemented method of claim 1, wherein the plurality of feature maps have varying image sizes.
9. The computer-implemented method of claim 1, wherein the plurality of feature maps use sliding windows of variable sizes.
10. The computer-implemented method of claim 1, further comprising creating initial bounding boxes, wherein detecting the plurality of bounding boxes containing the target object includes classifying the initial bounding boxes as being associated with a plurality of labels.
11. The computer-implemented method of claim 1, further comprising applying non-maximum suppression to the detected bounding boxes.
12. The computer-implemented method of claim 1, further comprising: segmenting the 3D image to obtain a convex hull, and constraining the detection of the plurality of bounding boxes using the convex hull.
13. The computer-implemented method of claim 1, wherein the learning network is further trained to segment the target object within each detected bounding box.
14. The computer-implemented method of claim 1, wherein the imaging device is a computed tomography imaging system.
15. The computer-implemented method of claim 1, wherein the target object is a lung nodule.
16. The computer-implemented method of claim 1, wherein the learning network is a fully convolutional neural network.
17. A system for automatically detecting a target object from a 3D image, comprising:
an interface configured to receive the 3D image acquired by an imaging device; and
a processor configured to:
detect, using a 3D learning network, a plurality of bounding boxes containing the target object, wherein the learning network is trained to generate a plurality of feature maps of varying scales based on the 3D image;
determine, using the 3D learning network, a set of parameters identifying each detected bounding box; and
locate the target object based on the set of parameters.
18. The system of claim 17, wherein the processor includes a graphics processing unit.
19. The system of claim 17, wherein the imaging device is a computed tomography imaging system.
20. A non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed by a processor, perform a method for automatically detecting a target object from a 3D image, the method comprising:
receiving the 3D image acquired by an imaging device;
detecting, using a 3D learning network, a plurality of bounding boxes containing the target object, wherein the learning network is trained to generate a plurality of feature maps of varying scales based on the 3D image;
determining, using the 3D learning network, a set of parameters identifying each detected bounding box; and
locating the target object based on the set of parameters.
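For readability only, and not as part of the claims, the following Python sketch illustrates how the claimed flow of claims 1, 5, and 11 (multi-scale feature maps, per-anchor offset regression, and non-maximum suppression) might fit together. All function and variable names, such as detect_target_objects and anchors_per_scale, are hypothetical assumptions rather than the patented implementation.

import torch

def detect_target_objects(volume, learning_network, anchors_per_scale,
                          score_threshold=0.5):
    # volume: a 3D image tensor of shape (1, 1, D, H, W).
    # learning_network: a trained 3D network returning, per scale, a pair of
    #   (objectness scores, offsets relative to the anchors of that scale).
    # anchors_per_scale: one (N_i, 6) tensor of anchor boxes
    #   (cx, cy, cz, w, h, d) per feature-map scale.
    outputs = learning_network(volume)  # multi-scale feature-map outputs

    boxes, scores = [], []
    for (obj_scores, offsets), anchors in zip(outputs, anchors_per_scale):
        # Decode the set of parameters: each detected box is expressed as an
        # offset of its center and size relative to the respective anchor box.
        centers = anchors[:, :3] + offsets[:, :3] * anchors[:, 3:]
        sizes = anchors[:, 3:] * torch.exp(offsets[:, 3:])
        keep = obj_scores > score_threshold
        boxes.append(torch.cat([centers[keep], sizes[keep]], dim=1))
        scores.append(obj_scores[keep])

    boxes, scores = torch.cat(boxes), torch.cat(scores)
    # A 3D non-maximum suppression over (boxes, scores) would be applied here
    # to discard overlapping detections before locating the target object.
    return boxes, scores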
CN201810789942.0A 2017-08-09 2018-07-18 Computer-implemented method and system for automatically detecting target object from 3D image Active CN109003267B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762542890P 2017-08-09 2017-08-09
US62/542,890 2017-08-09
US15/996,434 US10867384B2 (en) 2017-08-09 2018-06-02 System and method for automatically detecting a target object from a 3D image
US15/996,434 2018-06-02

Publications (2)

Publication Number Publication Date
CN109003267A true CN109003267A (en) 2018-12-14
CN109003267B CN109003267B (en) 2021-07-30

Family

ID=64599870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810789942.0A Active CN109003267B (en) 2017-08-09 2018-07-18 Computer-implemented method and system for automatically detecting target object from 3D image

Country Status (1)

Country Link
CN (1) CN109003267B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN106815579A (en) * 2017-01-22 2017-06-09 深圳市唯特视科技有限公司 A kind of motion detection method based on multizone double fluid convolutional neural networks model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHURAN SONG AND JIANXIONG XIAO: "Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images", 《2016IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
TSUNG-YI LIN ET AL.: "Feature Pyramid Network for Object Detection", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766887B (en) * 2019-01-16 2022-11-11 中国科学院光电技术研究所 Multi-target detection method based on cascaded hourglass neural network
CN109766887A (en) * 2019-01-16 2019-05-17 中国科学院光电技术研究所 A kind of multi-target detection method based on cascade hourglass neural network
CN111563496A (en) * 2019-02-13 2020-08-21 西门子医疗有限公司 Continuous learning for automatic view planning for image acquisition
CN110059548A (en) * 2019-03-08 2019-07-26 北京旷视科技有限公司 Object detection method and device
CN113711234A (en) * 2019-03-15 2021-11-26 英威达纺织(英国)有限公司 Yarn quality control
CN113711234B (en) * 2019-03-15 2024-01-23 英威达纺织(英国)有限公司 Yarn quality control
CN110930454A (en) * 2019-11-01 2020-03-27 北京航空航天大学 Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
CN110930454B (en) * 2019-11-01 2022-11-22 北京航空航天大学 Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
CN114730486B (en) * 2019-11-22 2023-04-04 国际商业机器公司 Method and system for generating training data for object detection
CN114730486A (en) * 2019-11-22 2022-07-08 国际商业机器公司 Generating training data for object detection
CN112885453A (en) * 2019-11-29 2021-06-01 西门子医疗有限公司 Method and system for identifying pathological changes in subsequent medical images
CN110852314B (en) * 2020-01-16 2020-05-22 江西高创保安服务技术有限公司 Article detection network method based on camera projection model
CN110852314A (en) * 2020-01-16 2020-02-28 江西高创保安服务技术有限公司 Article detection network method based on camera projection model
CN111523452A (en) * 2020-04-22 2020-08-11 北京百度网讯科技有限公司 Method and device for detecting human body position in image
CN111523452B (en) * 2020-04-22 2023-08-25 北京百度网讯科技有限公司 Method and device for detecting human body position in image
CN112348049A (en) * 2020-09-28 2021-02-09 北京师范大学 Image recognition model training method and device based on automatic coding
WO2022151755A1 (en) * 2021-01-15 2022-07-21 上海商汤智能科技有限公司 Target detection method and apparatus, and electronic device, storage medium, computer program product and computer program
CN112785565A (en) * 2021-01-15 2021-05-11 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112785565B (en) * 2021-01-15 2024-01-05 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112801068B (en) * 2021-04-14 2021-07-16 广东众聚人工智能科技有限公司 Video multi-target tracking and segmenting system and method
CN112801068A (en) * 2021-04-14 2021-05-14 广东众聚人工智能科技有限公司 Video multi-target tracking and segmenting system and method
CN113538470A (en) * 2021-06-16 2021-10-22 唯智医疗科技(佛山)有限公司 Image interlayer boundary determining method and device based on neural network
CN113538470B (en) * 2021-06-16 2024-02-23 唯智医疗科技(佛山)有限公司 Image interlayer boundary determining method and device based on neural network

Also Published As

Publication number Publication date
CN109003267B (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN109003267A Computer-implemented method and system for automatically detecting a target object from a 3D image
US10867384B2 (en) System and method for automatically detecting a target object from a 3D image
US11386557B2 (en) Systems and methods for segmentation of intra-patient medical images
CN109906470B (en) Image segmentation using neural network approach
CN109493347B (en) Method and system for segmenting sparsely distributed objects in an image
US9965863B2 (en) System and methods for image segmentation using convolutional neural network
EP3195257B1 (en) Systems and methods for segmenting medical images based on anatomical landmark-based features
EP2598034B1 (en) Adaptive visualization for direct physician use
EP3591616A1 (en) Automated determination of a canonical pose of a 3d dental structure and superimposition of 3d dental structures using deep learning
CN109147940A Device and system for automatically predicting a physiological condition of a patient from medical images
CN109313940A Virtual assessment of a medical device implantation path
CN108986891A Medical image processing method and apparatus, electronic device and storage medium
CN110458833A Medical image processing method, medical device and storage medium based on artificial intelligence
CN109410188A System and method for segmenting medical images
Yerukalareddy et al. Brain tumor classification based on mr images using GAN as a pre-trained model
Svoboda et al. Generation of 3D digital phantoms of colon tissue
US11704796B2 (en) Estimating bone mineral density from plain radiograph by assessing bone texture with deep learning
Heeneman et al. Lung nodule detection by using Deep Learning
Foysal et al. Detection of COVID-19 case from chest CT images using deformable deep convolutional neural network
AU2019204365B1 (en) Method and System for Image Segmentation and Identification
CN112884706B (en) Image evaluation system based on neural network model and related product
JP2023545570A (en) Detecting anatomical abnormalities by segmentation results with and without shape priors
Zhou et al. Learning stochastic object models from medical imaging measurements by use of advanced ambientgans
Alshamrani et al. Automation of Cephalometrics Using Machine Learning Methods
US20220414882A1 (en) Method for automatic segmentation of coronary sinus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant