CN110796659A - Method, device, equipment and storage medium for identifying target detection result

Info

Publication number
CN110796659A
Authority
CN
China
Prior art keywords
detection result
image
target detection
initial target
internal structure
Prior art date
Legal status
Granted
Application number
CN201911289667.7A
Other languages
Chinese (zh)
Other versions
CN110796659B (en)
Inventor
殷保才
徐亮
孙梅
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Publication of CN110796659A
Application granted
Publication of CN110796659B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30061 Lung
    • G06T 2207/30064 Lung nodule
    • G06T 2207/30204 Marker
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection


Abstract

The application provides a method, apparatus, device and storage medium for identifying a target detection result. The method comprises: acquiring an initial target detection result, i.e. a preliminary result obtained by detecting an image target from an image; dividing the initial target detection result into an image unit sequence, and extracting the internal structural features of the initial target detection result from that sequence; and determining, at least according to the internal structural features, whether the initial target detection result is an image target. Because the method identifies the initial target detection result based on its internal structure, it can accurately determine whether the result is an image target.

Description

Method, device, equipment and storage medium for identifying target detection result
The present application claims priority to Chinese patent application No. 201910549722.5, entitled "Method, apparatus, device and storage medium for identifying target detection result", filed with the Chinese Patent Office on June 24, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a target detection result.
Background
Target detection, that is, detecting an image target from an image, is a common task in image processing. With the growing demand for automation of image target detection, automated image target detection schemes are becoming increasingly popular.
Detecting the correct image target from an image is key to any subsequent processing of the detection result. Different target detection schemes perform differently, and even the same scheme performs differently on different images. Identifying whether a target detection result is actually an image target, so as to guarantee detection accuracy, is therefore a practical requirement in target detection applications.
Disclosure of Invention
Based on the above requirements, the present application provides a method, an apparatus, a device and a storage medium for identifying a target detection result, which can be used to identify whether the target detection result is an image target.
A method for identifying a target detection result, comprising:
acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
dividing the initial target detection result into an image unit sequence, and extracting internal structural features of the initial target detection result according to the image unit sequence;
and determining whether the initial target detection result is an image target at least according to the internal structural features.
Optionally, the initial target detection result is a three-dimensional initial target detection result;
the dividing the initial target detection result into a sequence of image units comprises:
and dividing the initial target detection result into a two-dimensional image sequence.
Optionally, the extracting the internal structural feature of the initial target detection result according to the image unit sequence includes:
inputting the two-dimensional image sequence into a pre-trained structural relationship recognition model, and extracting the internal structural features of the initial target detection result;
the structural relationship recognition model is obtained by training at least on extracting the relationship features between the two-dimensional image frames of three-dimensional image target samples.
Optionally, the structural relationship recognition model is obtained based on recurrent neural network training.
Optionally, the method further includes:
extracting the appearance characteristics of the initial target detection result;
the determining whether the initial target detection result is an image target according to at least the internal structure feature includes:
and determining whether the initial target detection result is an image target according to the appearance features and the internal structure features.
Optionally, the extracting the appearance feature of the initial target detection result includes:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance features of the initial target detection result;
wherein the sign recognition model is obtained by training at least on extracting the category features and/or sign features of image target samples.
Optionally, the sign recognition model is obtained based on convolutional neural network training.
Optionally, the determining whether the initial target detection result is an image target according to the appearance features and the internal structure features includes:
inputting the appearance features and the internal structure features into a pre-trained classifier, and determining whether the initial target detection result is an image target;
the classifier performs classification training on the image target samples at least according to the appearance features and the internal structure features of the image target samples to obtain the image target samples.
Optionally, the determining whether the initial target detection result is an image target according to the appearance features and the internal structure features includes:
carrying out feature fusion processing on the appearance features and the internal structure features to obtain comprehensive features of the appearance and the internal structure;
inputting the comprehensive characteristics of the appearance and the internal structure into a pre-trained classifier, and determining whether the initial target detection result is an image target;
and the classifier is obtained by performing classification training on image target samples at least according to the comprehensive features of the appearance and the internal structure of the image target samples.
Optionally, the performing feature fusion processing on the appearance features and the internal structure features to obtain comprehensive features of the appearance and the internal structure includes:
respectively carrying out regularization processing on the appearance features and the internal structure features;
processing the appearance features and the internal structure features after the regularization treatment into features with the same scale;
and concatenating the appearance features and the internal structure features of the same scale to obtain the comprehensive features of the appearance and the internal structure.
Optionally, the obtaining an initial target detection result includes:
preprocessing an image to be processed;
inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result;
the image target detection model is obtained by training at least through detecting image targets from images.
An apparatus for identifying a target detection result, comprising:
the data acquisition unit is used for acquiring an initial target detection result, wherein the initial target detection result is a primary detection result obtained by detecting an image target from an image;
the first data processing unit is used for dividing the initial target detection result into an image unit sequence and extracting the internal structure characteristics of the initial target detection result according to the image unit sequence;
and the judgment processing unit is used for determining whether the initial target detection result is an image target or not at least according to the internal structure characteristics.
Optionally, the initial target detection result is a three-dimensional initial target detection result;
correspondingly, when the first data processing unit divides the initial target detection result into an image unit sequence, the first data processing unit is specifically configured to:
and dividing the initial target detection result into a two-dimensional image sequence.
Optionally, when the first data processing unit extracts the internal structure feature of the initial target detection result according to the image unit sequence, the first data processing unit is specifically configured to:
inputting the two-dimensional image sequence into a pre-trained structural relationship recognition model, and extracting the internal structural features of the initial target detection result;
the structural relationship recognition model is obtained by training at least on extracting the relationship features between the two-dimensional image frames of three-dimensional image target samples.
Optionally, the structural relationship recognition model is obtained based on recurrent neural network training.
Optionally, the apparatus further comprises:
the second data processing unit is used for extracting the appearance characteristics of the initial target detection result;
the judgment processing unit is configured to, when determining whether the initial target detection result is an image target at least according to the internal structure feature, specifically:
and determining whether the initial target detection result is an image target according to the appearance features and the internal structure features.
Optionally, when the second data processing unit extracts the appearance feature of the initial target detection result, the second data processing unit is specifically configured to:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance features of the initial target detection result;
wherein the sign recognition model is obtained by training at least on extracting the category features and/or sign features of image target samples.
Optionally, the sign recognition model is obtained based on convolutional neural network training.
Optionally, when the determination processing unit determines whether the initial target detection result is an image target according to the appearance feature and the internal structure feature, the determination processing unit is specifically configured to:
inputting the appearance features and the internal structure features into a pre-trained classifier, and determining whether the initial target detection result is an image target;
the classifier performs classification training on the image target samples at least according to the appearance features and the internal structure features of the image target samples to obtain the image target samples.
Optionally, the judgment processing unit includes:
the characteristic fusion unit is used for carrying out characteristic fusion processing on the appearance characteristic and the internal structure characteristic to obtain comprehensive characteristics of the appearance and the internal structure;
the feature processing unit is used for inputting the comprehensive features of the appearance and the internal structure into a pre-trained classifier, and determining whether the initial target detection result is an image target;
and the classifier is obtained by performing classification training on image target samples at least according to the comprehensive features of the appearance and the internal structure of the image target samples.
Optionally, the feature fusion unit includes:
the first processing unit is used for respectively carrying out regularization processing on the appearance features and the internal structure features;
the second processing unit is used for processing the appearance features and the internal structure features after the regularization processing into features with the same scale;
and the third processing unit is used for concatenating the appearance features and the internal structure features of the same scale to obtain the comprehensive features of the appearance and the internal structure.
Optionally, when the data obtaining unit 100 obtains the initial target detection result, it is specifically configured to:
preprocessing an image to be processed;
inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result;
the image target detection model is obtained by training at least through detecting image targets from images.
A device for identifying a target detection result, comprising:
a memory and a processor;
wherein the memory is connected to the processor and is used for storing a program;
and the processor is configured to implement the above method for identifying a target detection result by running the program stored in the memory.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method for identifying a target detection result.
According to the method for identifying a target detection result provided by the present application, the initial target detection result is divided into an image unit sequence, the internal structural features of the initial target detection result are extracted according to the image unit sequence, and whether the initial target detection result is an image target is determined according to those internal structural features. Because the method identifies the initial target detection result based on its internal structure, it can accurately identify whether the result is an image target.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flowchart of a method for identifying a target detection result according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a pulmonary CT image provided by an embodiment of the present application;
FIG. 3 is a schematic process flow diagram of a self-attention mechanism according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of another method for identifying a target detection result according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of yet another method for identifying a target detection result according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a U-net network structure provided in an embodiment of the present application;
fig. 7 is a schematic diagram of lung nodule segmentation provided by an embodiment of the present application;
FIG. 8 is a schematic processing diagram of an object detection system according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an apparatus for identifying a target detection result according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another apparatus for identifying a target detection result according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a device for identifying a target detection result according to an embodiment of the present application.
Detailed Description
The technical scheme of the embodiment of the application can be applied to the application scene of identifying whether the image target detection result is the image target. By adopting the technical scheme of the embodiment of the application, the target detection result can be identified, and whether the target detection result is an image target or not can be determined.
As an exemplary implementation, the technical solution of the present application may be executed by a hardware device such as a hardware processor, or packaged into a software program; when the hardware processor executes the processing procedure of the technical solution, or the software program is run, the target detection result can be identified to determine whether it is an image target. The embodiments of the present application only introduce the specific processing procedure of the technical solution by way of example, and do not limit its specific implementation form; any implementation form capable of executing the processing procedure of the technical solution may be adopted by the embodiments of the present application.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a method for identifying a target detection result, which is shown in fig. 1 and comprises the following steps:
s101, obtaining an initial target detection result.
The initial target detection result is a preliminary detection result obtained by detecting an image target from an image. Since it comes from the image, the initial target detection result is essentially a portion of the image content. Its specific form depends on the form of the detected image target: when a two-dimensional image target is detected from a two-dimensional or three-dimensional image, the initial target detection result is a two-dimensional image; when a three-dimensional image target is detected from a three-dimensional image, the initial target detection result is a three-dimensional detection result.
The technical scheme of the embodiment of the application can be applied to identifying the detection result of the initial target in any form, such as two-dimensional or three-dimensional, and is particularly suitable for identifying the detection result of the initial target in three-dimensional to determine whether the detection result is the image target. The embodiment of the present application takes the identification of the three-dimensional initial target detection result as an example, and introduces a specific processing procedure of the technical scheme of the embodiment of the present application. For example, the embodiment of the present application takes the identification of the lung nodule detection result of the lung CT image shown in fig. 2 as an example, and introduces a specific processing procedure of the technical solution of the embodiment of the present application.
When the technical solution of the embodiment of the present application is applied to identifying target detection results in other forms, a specific processing procedure may be executed by referring to the introduction of the embodiment of the present application, or after performing adaptive adjustment and perfection on the processing procedure introduced in the embodiment of the present application, and the embodiment of the present application is not described in detail one by one.
The above-mentioned initial target detection result may be an initial target detection result read from a database, or a received initial target detection result, or an image detection result obtained by directly detecting an image target from an image, or the like.
S102, dividing the initial target detection result into an image unit sequence, and extracting the internal structure characteristics of the initial target detection result according to the image unit sequence;
the image unit sequence is an image unit sequence in which an image is divided into unit regions to obtain image units, and the image units are arranged according to the positions of the image units in the original image. The internal structural feature described above is a feature indicating a structural relationship between each image unit in an image.
It can be understood that dividing the initial target detection result into an image unit sequence amounts to decomposing its constituent structure. Since specific correlations exist between the constituent structures within a given initial target detection result, and in order to retain those original correlations, the embodiments of the present application require that, after the initial target detection result is divided into image units, the image units be arranged into the image unit sequence according to their positional relationships in the original image.
On the basis, the correlation among all the image units in the sequence is calculated, namely the correlation among the internal structures of the initial target detection result is obtained, and the calculated correlation is further characterized, so that the internal structure characteristics of the initial target detection result can be obtained.
As an exemplary implementation, after the initial target detection result is divided into an image unit sequence, the extraction of its internal structural features from that sequence may be implemented with a model.
Specifically, the image unit sequence is input into a pre-trained feature extraction model to obtain the internal structural features of the initial target detection result. The feature extraction model is trained at least on extracting the structural relationship features between the image units of image unit sequences, so that it can accurately extract those features from an input sequence. The feature extraction model can adopt a common artificial intelligence structure, for example a neural network structure such as a CNN or an RNN.
According to the scheme, the obtained initial target detection result is divided into the image unit sequence, and the internal structure characteristic of the initial target detection result is obtained through calculation according to the mutual relation among all parts of the image unit sequence.
It can be understood that, when the initial target detection result is a two-dimensional image, the initial target detection result may be divided into image areas according to rows and columns to obtain two-dimensional image blocks, and then the two-dimensional image blocks are arranged according to their positions in the original image to obtain a two-dimensional image block sequence. And extracting the structural relationship features among the image blocks of the two-dimensional image block sequence, for example, inputting the two-dimensional image block sequence into a pre-trained feature extraction model, so as to obtain the structural relationship features among the two-dimensional image blocks, namely the internal structural features of the initial target detection result.
For example, when the technical solution of the embodiment of the present application is applied to identifying a three-dimensional initial target detection result, such as identifying whether the lung nodule detection result of the lung CT image shown in fig. 2 is a true lung nodule, the three-dimensional initial target detection result is the three-dimensional image region in which the lung nodule detected from the lung CT image is located.
At this time, when the initial target detection result is divided into the image unit sequence, the initial target detection result may be divided into a two-dimensional image sequence, that is, the initial target detection result of the three-dimensional stereo may be divided into a two-dimensional plane image sequence.
For example, if a three-dimensional image block (C × H × W) in which a lung nodule is located is obtained from the lung CT image, when the three-dimensional image block is divided into image units, the three-dimensional image block may be divided into C frames of two-dimensional plane images of H × W, and the divided two-dimensional plane images may be arranged according to their positions in the three-dimensional image block, so as to obtain a two-dimensional plane image sequence.
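The division itself is straightforward. The following is a minimal sketch, assuming the detection result is available as a NumPy array of shape C × H × W; the function name and shapes are illustrative only.

```python
import numpy as np

def volume_to_frame_sequence(block: np.ndarray) -> list:
    """Split a C x H x W image block into C two-dimensional H x W frames,
    kept in their original depth order so positional relationships survive."""
    assert block.ndim == 3, "expected a C x H x W volume"
    return [block[i] for i in range(block.shape[0])]

frames = volume_to_frame_sequence(np.zeros((16, 48, 48), dtype=np.float32))
print(len(frames), frames[0].shape)  # 16 (48, 48)
```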
After the three-dimensional initial target detection result is divided into two-dimensional image sequences, as an optional implementation manner, in the embodiment of the present application, the two-dimensional image sequences are input into a pre-trained structural relationship recognition model, and the internal structural features of the initial target detection result are extracted.
The structural relationship recognition model is obtained by at least extracting relationship features between two-dimensional image frames of a three-dimensional image target sample.
That is, in the embodiment of the present application, a structural relationship recognition model is trained in advance to extract the features of the interrelations between the two-dimensional image frames of three-dimensional image target samples.
The constructed structural relationship recognition model is trained by repeatedly extracting the relationship features between the two-dimensional image frames of three-dimensional image target samples, until it can accurately extract the internal structural relationship features of those samples.
At this time, each two-dimensional plane image frame in the sequence is treated as one time step and input into the structural relationship recognition model in order; the model extracts features from each frame while modeling the relationships between different frames, thereby extracting the internal structural relationship features of the two-dimensional image sequence, i.e., the internal structural features of the three-dimensional initial target detection result.
As an exemplary implementation manner, the structural relationship recognition model is obtained based on Recurrent Neural Network (RNN) training, that is, a training process for extracting relationship features between two-dimensional image frames of a three-dimensional image target sample is performed on RNN, so that the RNN has the capability of extracting internal structural features of a three-dimensional image, and is used as the structural relationship recognition model.
Illustratively, the structural relationship recognition model is obtained based on bidirectional long short-term memory network (BiLSTM) training.
It should be noted that, unlike the task of conventional sequence learning, the lung nodule detection task of the lung CT image has only one output, so that the output module of the RNN has only one node, and the modeling Unit in the RNN may be a Long Short-Term Memory network (LSTM) or a Gated Recurrent Unit (GRU), which is not strictly limited in the embodiments of the present application. In order to integrate the features of different image frames, the embodiment of the present application uses the self-attention mechanism shown in fig. 3 to automatically learn the importance of different image frames, which can greatly improve the distinctiveness of the integrated features.
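To make this design concrete, below is a minimal sketch of such a model: a BiLSTM over the flattened frame sequence, followed by a learned attention pooling with a single output node, which is one plausible reading of the self-attention mechanism of fig. 3. It is not the patent's implementation; the frame size and hidden size are assumptions.

```python
import torch
import torch.nn as nn

class StructureRelationModel(nn.Module):
    """BiLSTM over a sequence of flattened 2D frames, pooled into one
    internal-structure feature vector by learned per-frame attention."""

    def __init__(self, frame_dim: int = 48 * 48, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(frame_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # learns per-frame importance

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, T, H*W), one time step per frame
        out, _ = self.rnn(frames)                       # (batch, T, 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)  # (batch, T, 1)
        return (weights * out).sum(dim=1)               # (batch, 2*hidden)

model = StructureRelationModel()
feat = model(torch.randn(2, 16, 48 * 48))  # 2 candidates, 16 frames each
print(feat.shape)  # torch.Size([2, 256])
```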
S103, determining whether the initial target detection result is an image target or not at least according to the internal structure characteristics.
Typically, different image regions have different internal structural features. In the lung CT image shown in fig. 2, for example, the internal structural features of the lung nodule region differ from those of normal image regions. The initial target detection result can therefore be identified according to its internal structural features, by checking whether they conform to the internal structural features of an image target.
For example, in the embodiment of the present application, a classifier is trained in advance, and whether an initial target detection result is an image target is determined by classifying the internal structure features.
The classifier is obtained by performing classification training on image target samples at least according to the internal structural features of the image target samples.
The classifier can classify the input internal structure features, and when the classifier classifies the internal structure features of the input initial target detection result into the internal structure feature category of the image target, the initial target detection result can be determined to be the image target.
As an alternative implementation manner, in the embodiment of the present application, the structural relationship recognition model and the classifier are jointly designed and trained. The internal structure characteristics of the initial target detection result obtained by the output layer of the structural relationship recognition model are directly used as the input of the classifier, and the structural relationship recognition model and the classifier are combined to form a complete processing model. After the processing model completes training, namely the joint training of the structural relationship recognition model and the classifier, the image unit sequence of the initial target detection result is input into the processing model, and whether the initial target detection result is an image target or not can be recognized.
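A minimal sketch of such a joint design follows, reusing the hypothetical StructureRelationModel from the previous sketch; the feature size and the binary classification head are assumptions.

```python
import torch
import torch.nn as nn

class JointIdentificationModel(nn.Module):
    """Structural relationship model and classifier composed into one
    module, so the two parts can be trained jointly end to end."""

    def __init__(self, relation_model: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.relation = relation_model
        self.classifier = nn.Linear(feat_dim, 2)  # image target vs. not

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.relation(frames))

# Usage with the earlier (hypothetical) relation model sketch:
# joint = JointIdentificationModel(StructureRelationModel())
# logits = joint(torch.randn(2, 16, 48 * 48))
```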
As can be seen from the above description, in the method for identifying a target detection result provided in the embodiment of the present application, an initial target detection result is divided into an image unit sequence, and an internal structural feature of the initial target detection result is extracted according to the image unit sequence, so as to determine whether the initial target detection result is an image target according to the internal structural feature of the initial target detection result. The method identifies the initial target detection result based on the internal structure of the initial target detection result, and can accurately identify whether the initial target detection result is an image target.
As an alternative implementation, referring to fig. 4, another embodiment of the present application further discloses that, after obtaining the initial target detection result, the method for identifying the initial target detection result further includes:
s402, extracting the appearance characteristics of the initial target detection result.
Correspondingly, the appearance features and the internal structural features are then used jointly to determine whether the initial target detection result is an image target.
That is, the above determining whether the initial target detection result is an image target at least according to the internal structure feature specifically includes:
s404, determining whether the initial target detection result is an image target according to the representation characteristics and the internal structure characteristics.
The appearance features of the initial target detection result refer to its external structural features, such as its shape, whether its surface is smooth, and whether it has burrs.
As an exemplary implementation, the embodiment of the present application extracts the category features and/or sign features of the initial target detection result as its appearance features. Taking the extraction of the appearance features of lung nodules in a lung CT image as an example, the category of a lung nodule refers to its classification attributes, such as benign nodule or malignant nodule; the signs of a lung nodule are external shape features such as burrs.
Steps S401 and S403 in the method for identifying an initial target detection result shown in fig. 4 correspond to steps S101 and S102 in the method embodiment shown in fig. 1, respectively, and for specific contents, reference is made to the contents of the method embodiment shown in fig. 1, which is not described herein again.
As an exemplary implementation, when extracting the appearance features of the initial target detection result, the embodiment of the present application inputs the initial target detection result into a pre-trained sign recognition model to obtain its appearance features.
The sign recognition model is obtained by training at least on extracting the category features and/or sign features of image target samples.
Illustratively, the sign recognition model is trained based on a convolutional neural network (CNN): the CNN is trained on image target samples in a single-task or multi-task fashion to extract their category features, their sign features, or both, i.e., to identify both the category and the signs of an image target sample. When the CNN can accurately extract the category features and/or sign features of the image target samples, training is complete and the network can serve as the sign recognition model.
The initial target detection result is input into the trained sign recognition model, and its category features and/or sign features are extracted by the model as the appearance features of the initial target detection result.
When the embodiment of the present application is applied to extracting the appearance features of the lung nodule detection result of a lung CT image, the sign recognition model is obtained based on 3D-CNN training. Lung nodule image samples labeled with category and/or signs are input into the 3D-CNN so that it learns to identify the category and/or signs of lung nodules; when it can identify them accurately, the 3D-CNN is considered trained. The category features and/or sign features produced by the output layer of the 3D-CNN are then the extraction results for an input lung nodule.
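As an illustration of such a multi-task setup, here is a minimal 3D-CNN sketch with one head for nodule category and one for signs; the channel counts, kernel sizes, and head sizes are assumptions for illustration, not the embodiment's exact architecture.

```python
import torch
import torch.nn as nn

class SignRecognitionModel(nn.Module):
    """3D-CNN backbone producing appearance features, with a category head
    (e.g. solid / part-solid / ground-glass / calcified) and a sign head
    (e.g. lobulation, spiculation)."""

    def __init__(self, n_categories: int = 4, n_signs: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.category_head = nn.Linear(32, n_categories)
        self.sign_head = nn.Linear(32, n_signs)

    def forward(self, x: torch.Tensor):
        feat = self.backbone(x)  # appearance features
        return feat, self.category_head(feat), self.sign_head(feat)

model = SignRecognitionModel()
feat, cat, sign = model(torch.randn(2, 1, 16, 48, 48))
print(feat.shape)  # torch.Size([2, 32])
```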
It should be noted that, in practical applications of the technical solution, the structure of the sign recognition model may be chosen flexibly; for example, when extracting appearance features from planar images, the sign recognition model may adopt a CNN structure, or a model structure such as an RNN may be used.
As an optional implementation manner, the determining whether the initial target detection result is an image target according to the appearance feature and the internal structure feature includes:
inputting the appearance features and the internal structure features into a pre-trained classifier, and determining whether the initial target detection result is an image target;
the classifier performs classification training on the image target samples at least according to the appearance features and the internal structure features of the image target samples to obtain the image target samples.
To identify whether an initial target detection result is an image target from its appearance features and internal structural features, a classifier is first trained to classify image target samples according to their appearance features and internal structural features, so that it can accurately identify whether a sample is an image target.
After the appearance features and the internal structural features of the initial target detection result are extracted by the sign recognition model and the structural relationship recognition model respectively, these features are input into the pre-trained classifier, which determines whether the initial target detection result is an image target.
Regarding the above lung nodule identification: in an actual scenario, when a physician manually checks whether a lung nodule detected by a machine from a lung CT image is a true nodule, the physician first observes the nodule's appearance, for example its category and whether it has signs such as burrs, and then scrolls up and down through the CT image layers where the detected nodule is located, judging whether it is a true nodule by observing the structural relationship between the detected nodule and the surrounding tissue.
It can be understood that manual lung nodule identification relies both on recognizing the nodule's appearance features and on scrolling through the images to examine its internal spatial structure relationships.
In the technical solution of the embodiment of the present application, the identification of the initial target detection result is likewise realized by extracting its appearance features and internal structural features. For lung nodule identification, the internal structural features of this embodiment, i.e., the structural relationship features between the two-dimensional plane images of the CT region where the nodule is located, correspond to the structural relationships between the nodule and the surrounding tissue that a physician observes by scrolling through the CT layers; the appearance features correspond to the appearance characteristics a physician observes directly.
It can therefore be understood that identifying whether the initial target detection result is an image target by extracting its appearance features and internal structural features actually simulates the manual procedure of observing the appearance of a detection result and browsing its layers to judge the internal tissue structure relationships. The technical solution thus identifies the initial target detection result in the processing manner closest to manual operation, which gives it a practical basis and guarantees its realizability.
Optionally, as another implementation manner, another embodiment of the present application discloses that, referring to fig. 5, the determining whether the initial target detection result is an image target according to the appearance feature and the internal structure feature includes:
s504, regularizing the appearance features and the internal structure features respectively;
taking the lung nodules detected from the lung CT image shown in fig. 2 as an example, assuming that the output layer of the feature recognition model outputs the appearance features F2 of the lung nodules (assuming that dimensions C × H × W, C, H, W respectively represent the channel, height, and width of 3D-CNN), the output layer of the structural relationship recognition model outputs the internal structural features F1 of the lung nodules (assuming that dimension T HW represents the number of frames, and HW represents that the 2D image features of each frame are stretched into one dimension), since F1 and F2 may have different ranges, the present embodiment first performs L2-norm regularization processing on F1 and F2, and then performs step S505:
s505, processing the appearance features and the internal structure features after the regularization processing into features with the same scale;
by scaling F1 and F2, both were processed to features of the same scale. For example, F2 was drawn to the C HW dimension, making it the same as the F1 dimension.
S506, splicing the appearance features and the internal structure features with the same scale to obtain comprehensive features of the appearance and the internal structure;
F1 of dimension T × HW is concatenated with F2 of dimension C × HW to obtain the comprehensive features F of the appearance and the internal structure, with dimension (T + C) × HW.
It can be understood that the above steps S504 to S506 perform feature fusion processing on the appearance features and the internal structural features to obtain the comprehensive features of the appearance and the internal structure; that is, the process fuses the appearance features with the internal structural features, as sketched below.
Performing this fusion on the extracted appearance features and internal structural features, before the classifier classifies them to judge the initial target detection result, strengthens the association between the extracted features and makes the classifier's subsequent classification easier.
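Under the stated dimension assumptions, the three fusion steps reduce to a few tensor operations; the per-row L2 normalization below is one plausible reading of the regularization step, not the embodiment's exact formula.

```python
import torch
import torch.nn.functional as F

def fuse_features(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """f1: (T, HW) internal structure features; f2: (C, H, W) appearance
    features whose H*W equals HW, as the description above assumes."""
    f2 = f2.reshape(f2.shape[0], -1)    # S505: stretch F2 to (C, HW)
    f1 = F.normalize(f1, p=2, dim=-1)   # S504: L2-norm regularization
    f2 = F.normalize(f2, p=2, dim=-1)
    return torch.cat([f1, f2], dim=0)   # S506: concatenate to (T+C, HW)

fused = fuse_features(torch.randn(16, 48 * 48), torch.randn(32, 48, 48))
print(fused.shape)  # torch.Size([48, 2304])
```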
And S507, inputting the comprehensive characteristics of the representation and the internal structure into a pre-trained classifier, and determining whether the initial target detection result is an image target.
Finally, F is input into the pre-trained classifier to determine whether the initial target detection result is an image target. The classifier is obtained by performing classification training on image target samples at least according to the comprehensive features of the appearance and the internal structure of the image target samples.
Steps S501 to S503 in the method processing flow shown in fig. 5 correspond to steps S401 to S403 in the method embodiment shown in fig. 4, respectively, and for specific content, reference is made to the content of the method embodiment shown in fig. 4, which is not described herein again.
As an optional implementation, in the embodiment of the present application the initial target detection result is acquired by detecting an image target from an image.
Taking the lung nodule detection from the lung CT image shown in fig. 2 as an example, the specific implementation process for obtaining the initial target detection result includes:
first, the image to be processed is preprocessed, i.e. the lung CT image shown in fig. 2 is preprocessed.
A CT image is a sequence image from which the three-dimensional anatomical structure of the human body can be reconstructed. In general, the image features of lung nodule regions differ from those of normal lung regions. To reduce the processing load and to avoid non-nodule image regions interfering with nodule detection, the embodiment of the present application preprocesses the lung CT image: the HU values of the CT image voxels are clipped to the range -1000 to 1000, the mean and variance of the voxel HU values are calculated, and the voxel values are normalized accordingly.
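A sketch of this preprocessing, assuming the CT volume is available as a NumPy array of HU values:

```python
import numpy as np

def preprocess_ct(volume: np.ndarray) -> np.ndarray:
    """Clip HU values to [-1000, 1000], then normalize by mean and variance."""
    clipped = np.clip(volume.astype(np.float32), -1000.0, 1000.0)
    return (clipped - clipped.mean()) / (clipped.std() + 1e-8)
```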
After the preprocessing is finished, inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result.
The image target detection model is obtained by training at least through detecting image targets from images.
After the lung CT image is preprocessed, the preprocessed lung CT image is input into a pre-trained image target detection model, and an initial target detection result is obtained by utilizing the model for extraction.
Illustratively, the embodiment of the application uses U-net as the image target detection model. U-net is a common convolutional neural network, which extracts features through a series of convolution operations and reconstructs the image target segmentation result through a series of deconvolution operations, as shown in FIG. 6. Because the spatial dimensions of a CT image are large, feeding the whole image into the network directly would exceed hardware limits; the embodiment of the present application therefore first divides the CT image into contiguous equal-sized blocks and feeds the blocks into the trained U-net to obtain the candidate nodule segmentation results. Fig. 7 shows a candidate segmentation probability map obtained by segmenting candidate nodules from one layer image of one block. Once the candidate nodule segmentation probability map of each layer of a block is obtained, the candidate nodule segmentation probability map of the whole block follows.
After obtaining the segmentation probability map of the candidate nodules, threshold truncation is performed on the segmentation probability value p(v) of each voxel v in the map:
m(v) = 1 if p(v) ≥ T, and m(v) = 0 otherwise
The value of T is generally 0.5: when the segmentation probability of a voxel is smaller than T, the voxel is considered background; otherwise it is considered foreground, i.e., nodule. This threshold truncation yields a binary map of the lung nodule segmentation. The connected regions in the binary map are then found and the geometric center of gravity of each region is calculated, which determines the nodule positions. Finally, a block at each nodule position is extracted from the original lung CT image to obtain the nodule detection result.
The U-net is a deep learning network, and before the detection of the image object is realized by applying the network, the embodiment of the present application trains the network in advance to have the capability of detecting the image object from the image.
In response to the need to detect nodules from CT images of the lungs, embodiments of the present application train the U-net network with CT image samples of the lungs.
Specifically, in training the U-net network, lung CT image samples are first collected from an open source data set or hospital. In order to prevent missing nodules, the layer thickness of the CT image sample is required to be 2mm or less.
The collected lung CT image samples are then annotated. In 2018, an expert consensus on chest CT lung nodule data annotation and quality control was jointly issued by the Chinese National Institutes for Food and Drug Control and the cardiothoracic group of the radiology branch of the Chinese Medical Association; the embodiment of the present application takes its annotation content and annotation method as a reference. The positions and boundaries of nodules are annotated primarily, and the common signs and categories of the nodules are annotated as well (the categories include solid nodules, part-solid nodules, ground-glass nodules and calcified nodules; the signs include the lobulation sign, the spiculation (burr) sign, and the like). To improve annotation accuracy, each CT case was annotated in two rounds by 3 physicians: in the first round, each physician annotated independently; in the second round, each physician could revise their own annotations in light of the other physicians' results. After both rounds, only nodules annotated by 2 or more physicians were finally defined as valid nodules usable for training the U-net network.
After the lung CT image samples are annotated, the voxel values in the samples are clipped to the range -1000 to 1000 and then normalized using mean and variance. The normalization may be performed per image, using each CT image's own mean and variance, or globally, using the mean and variance computed over all CT image samples.
After the lung CT image sample processing is completed, the sample is used for training the U-net network until the accuracy of detecting the nodule from the lung CT image by the U-net network meets the set requirement, the training of the network is completed, and the network can be used for detecting the nodule from the lung CT image to obtain an initial target detection result.
Further, combining the descriptions of the above embodiments, a target detection system can be obtained by connecting the target detection model, the sign recognition model, the structural relationship recognition model, and the classifier in sequence. The specific structure of each component model can be implemented with reference to the corresponding parts of the above embodiments.
As shown in fig. 8, when an image to be processed, such as the one shown in fig. 2, is input into the target detection model of the target detection system, the target detection model detects image targets from the image to obtain initial target detection results. Each initial target detection result then enters the sign recognition model and the structural relationship recognition model, which extract its appearance features and internal structure features respectively. Finally, the extracted appearance features and internal structure features enter the classifier, which further judges whether the initial target detection result is an image target; initial target detection results that are not image targets are filtered out at this point, which ensures the accuracy of the final detection results.
It can be understood that the above target detection system comprises two stages of processing, target detection and identification of the initial target detection results, and thus further improves target detection accuracy on the basis of image target detection.
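The overall flow of fig. 8 can be summarized in a sketch like the one below, with hypothetical interfaces for the four models; in particular, the candidate format, the slice reordering, and the acceptance threshold are illustrative assumptions.

```python
import torch

@torch.no_grad()
def run_detection_system(image, detector, sign_model, relation_model, classifier,
                         accept_threshold=0.5):
    """Detect candidate targets, extract appearance and internal structure
    features for each, splice them, and keep only the candidates the
    classifier accepts as true image targets."""
    candidates = detector(image)          # initial target detection results
    accepted = []
    for block in candidates:              # block: (1, D, H, W) 3D candidate
        appearance = sign_model(block.unsqueeze(0))          # (1, d_a)
        slices = block.permute(1, 0, 2, 3).unsqueeze(0)      # (1, D, 1, H, W) slice sequence
        structure = relation_model(slices)                   # (1, d_s)
        fused = torch.cat([appearance, structure], dim=-1)   # spliced feature
        if torch.sigmoid(classifier(fused)).item() >= accept_threshold:
            accepted.append(block)                           # keep true targets
    return accepted
```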
Corresponding to the above method for identifying a target detection result, an embodiment of the present application further discloses an apparatus for identifying a target detection result. As shown in fig. 9, the apparatus includes:
a data obtaining unit 100, configured to obtain an initial target detection result, where the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
a first data processing unit 110, configured to divide the initial target detection result into an image unit sequence, and extract an internal structural feature of the initial target detection result according to the image unit sequence;
a judgment processing unit 120, configured to determine whether the initial target detection result is an image target according to at least the internal structure feature.
In the apparatus for identifying a target detection result provided in this embodiment of the present application, after the data obtaining unit 100 obtains the initial target detection result, the first data processing unit 110 divides the initial target detection result into an image unit sequence and extracts the internal structure features of the initial target detection result from the image unit sequence; the judgment processing unit 120 then determines whether the initial target detection result is an image target according to at least the internal structure features. Since the apparatus identifies the initial target detection result based on its internal structure, it can accurately identify whether the initial target detection result is an image target.
Illustratively, the initial target detection result is a three-dimensional initial target detection result;
correspondingly, when the first data processing unit 110 divides the initial target detection result into an image unit sequence, it is specifically configured to:
dividing the initial target detection result into a two-dimensional image sequence.
As an optional implementation manner, when the first data processing unit 110 extracts the internal structural feature of the initial target detection result according to the image unit sequence, it is specifically configured to:
inputting the two-dimensional image sequence into a pre-trained structural relationship recognition model, and extracting the internal structural features of the initial target detection result;
wherein the structural relationship recognition model is trained at least by extracting relationship features between the two-dimensional image frames of three-dimensional image target samples.
As an alternative implementation, the structural relationship recognition model is obtained based on recurrent neural network training.
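A minimal sketch of such a recurrent model is given below: a small CNN encodes each two-dimensional slice, and an LSTM aggregates the slice sequence into an internal structure feature vector. All layer sizes are illustrative assumptions, not the configuration of the present application.

```python
import torch
import torch.nn as nn

class StructureRelationNet(nn.Module):
    """Encode each 2D slice of a candidate with a small CNN, then let an LSTM
    summarize the slice sequence into an internal-structure feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.slice_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(16 * 8 * 8, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)

    def forward(self, slices):
        # slices: (batch, seq_len, 1, H, W), the two-dimensional image sequence
        b, t = slices.shape[:2]
        enc = self.slice_encoder(slices.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(enc)   # final hidden state summarizes the sequence
        return h[-1]                 # (batch, feat_dim) internal structure feature
```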
As an alternative implementation, referring to fig. 10, the apparatus further includes:
a second data processing unit 130, configured to extract an appearance feature of the initial target detection result;
the judgment processing unit 120 is specifically configured to, when determining whether the initial target detection result is an image target according to at least the internal structure feature:
determining whether the initial target detection result is an image target according to the appearance feature and the internal structure feature.
As an optional implementation manner, when the second data processing unit 130 extracts the appearance feature of the initial target detection result, it is specifically configured to:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance features of the initial target detection result;
wherein the sign recognition model is trained at least by extracting the category features and/or sign features of image target samples.
As an alternative implementation, the sign recognition model is obtained based on convolutional neural network training.
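For illustration, a convolutional sign recognition model could be sketched as follows: a 3D convolutional trunk produces an appearance feature vector, with heads for the nodule category and signs reflecting the annotation scheme described earlier (four categories; lobulation and spiculation signs). The trunk configuration and head sizes are assumptions.

```python
import torch
import torch.nn as nn

class SignRecognitionNet(nn.Module):
    """3D CNN sketch: the trunk yields an appearance feature vector, and two
    heads predict the nodule category and its signs; the trunk feature is
    what the downstream fusion stage consumes."""
    def __init__(self, feat_dim=128, n_categories=4, n_signs=2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4 * 4, feat_dim), nn.ReLU(),
        )
        self.category_head = nn.Linear(feat_dim, n_categories)  # solid, part-solid, ground-glass, calcified
        self.sign_head = nn.Linear(feat_dim, n_signs)           # lobulation, spiculation

    def forward(self, block):
        # block: (batch, 1, D, H, W) candidate cropped from the CT volume
        feature = self.trunk(block)       # appearance feature vector
        return feature, self.category_head(feature), self.sign_head(feature)
```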
As an optional implementation manner, when determining whether the initial target detection result is an image target according to the appearance feature and the internal structure feature, the judgment processing unit 120 is specifically configured to:
inputting the appearance features and the internal structure features into a pre-trained classifier, and determining whether the initial target detection result is an image target;
wherein the classifier is obtained by performing classification training on image target samples at least according to the appearance features and the internal structure features of the image target samples.
As an optional implementation manner, the determining processing unit 120 includes:
the feature fusion unit is used for performing feature fusion processing on the appearance features and the internal structure features to obtain a combined appearance and internal structure feature;
the feature processing unit is used for inputting the combined appearance and internal structure feature into a pre-trained classifier and determining whether the initial target detection result is an image target;
wherein the classifier is obtained by performing classification training on image target samples at least according to the combined appearance and internal structure features of the image target samples.
As an optional implementation manner, the feature fusion unit includes:
the first processing unit is used for regularizing the appearance features and the internal structure features respectively;
the second processing unit is used for processing the regularized appearance features and internal structure features into features of the same scale;
and the third processing unit is used for splicing the appearance features and the internal structure features of the same scale to obtain the combined appearance and internal structure feature.
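These three steps, regularization, scale alignment, and splicing, can be sketched as follows; the L2 normalization and the linear projection used to bring both features to the same scale are illustrative assumptions about how the units might be realized.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_features(appearance, structure, proj):
    """L2-normalize each feature vector (the regularization), project the
    structure feature to the appearance feature's dimensionality (the common
    scale), then splice the two into one combined feature."""
    appearance = F.normalize(appearance, dim=-1)
    structure = F.normalize(structure, dim=-1)
    structure = proj(structure)                        # e.g. nn.Linear(d_s, d_a)
    return torch.cat([appearance, structure], dim=-1)  # combined feature

# usage sketch with assumed dimensions: structure 128-d, appearance 256-d
proj = nn.Linear(128, 256)
fused = fuse_features(torch.randn(1, 256), torch.randn(1, 128), proj)
```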
As an optional implementation manner, when the data obtaining unit 100 obtains an initial target detection result, it is specifically configured to:
preprocessing an image to be processed;
inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result;
wherein the image target detection model is trained at least on the task of detecting image targets from images.
Another embodiment of the present application further discloses an apparatus for identifying a target detection result. As shown in fig. 11, the apparatus includes:
a memory 200 and a processor 210;
wherein, the memory 200 is connected to the processor 210 for storing programs;
the processor 210 is configured to implement the method for identifying the target detection result disclosed in any of the above embodiments by running the program stored in the memory 200.
Specifically, the apparatus for identifying a target detection result may further include: a bus, a communication interface 220, an input device 230, and an output device 240.
The processor 210, the memory 200, the communication interface 220, the input device 230, and the output device 240 are connected to each other through a bus. Wherein:
a bus may include a path that transfers information between components of a computer system.
The processor 210 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control execution of the programs of the present invention; it may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The processor 210 may include a main processor and may also include a baseband chip, modem, and the like.
The memory 200 stores programs for executing the technical solution of the present invention, and may also store an operating system and other key services. In particular, the programs may include program code, and the program code includes computer operating instructions. More specifically, the memory 200 may include a read-only memory (ROM), other types of static storage devices capable of storing static information and instructions, a random access memory (RAM), other types of dynamic storage devices capable of storing information and instructions, disk storage, flash memory, and so on.
The input device 230 may include a means for receiving data and information input by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor, among others.
Output device 240 may include equipment that allows output of information to a user, such as a display screen, a printer, speakers, and the like.
The communication interface 220 may include any apparatus that uses a transceiver or the like to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The processor 210 executes the programs stored in the memory 200 and invokes the other devices described above, which together may be used to implement the steps of the method for identifying a target detection result provided by the embodiments of the present application.
Another embodiment of the present application further provides a storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements the steps of the method for identifying a target detection result provided in any of the above embodiments.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present application is not limited by the order of acts or acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The steps in the method of the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
The modules and sub-modules in the device and the terminal in the embodiments of the application can be combined, divided and deleted according to actual needs.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of a module or a sub-module is only one logical division, and there may be other divisions when the terminal is actually implemented, for example, a plurality of sub-modules or modules may be combined or integrated into another module, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules or sub-modules described as separate parts may or may not be physically separate, and parts that are modules or sub-modules may or may not be physical modules or sub-modules, may be located in one place, or may be distributed over a plurality of network modules or sub-modules. Some or all of the modules or sub-modules can be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated into one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated into one module. The integrated modules or sub-modules may be implemented in the form of hardware, or may be implemented in the form of software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (17)

1. A method for identifying a target detection result, comprising:
acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
dividing the initial target detection result into an image unit sequence, and extracting the internal structure characteristics of the initial target detection result according to the image unit sequence;
and determining whether the initial target detection result is an image target at least according to the internal structure characteristics.
2. The method of claim 1, wherein the initial target detection result is a three-dimensional initial target detection result;
the dividing the initial target detection result into a sequence of image units comprises:
dividing the initial target detection result into a two-dimensional image sequence.
3. The method of claim 2, wherein said extracting internal structural features of the initial target detection result from the sequence of image units comprises:
inputting the two-dimensional image sequence into a pre-trained structural relationship recognition model, and extracting the internal structural features of the initial target detection result;
wherein the structural relationship recognition model is trained at least by extracting relationship features between the two-dimensional image frames of three-dimensional image target samples.
4. The method of claim 3, wherein the structural relationship recognition model is derived based on recurrent neural network training.
5. The method of claim 1, further comprising:
extracting the appearance characteristics of the initial target detection result;
the determining whether the initial target detection result is an image target according to at least the internal structure feature includes:
and determining whether the initial target detection result is an image target according to the appearance characteristics and the internal structure characteristics.
6. The method of claim 5, wherein said extracting the appearance feature of the initial target detection result comprises:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
wherein the sign recognition model is trained at least by extracting the category characteristics and/or sign characteristics of image target samples.
7. The method of claim 6, wherein the sign recognition model is obtained based on convolutional neural network training.
8. The method of claim 5, wherein said determining whether the initial target detection result is an image target based on the appearance features and the internal structure features comprises:
inputting the appearance features and the internal structure features into a pre-trained classifier, and determining whether the initial target detection result is an image target;
wherein the classifier is obtained by performing classification training on image target samples at least according to the appearance features and the internal structure features of the image target samples.
9. The method of claim 5, wherein said determining whether the initial target detection result is an image target based on the appearance features and the internal structure features comprises:
performing feature fusion processing on the appearance features and the internal structure features to obtain a combined appearance and internal structure feature;
inputting the combined appearance and internal structure feature into a pre-trained classifier, and determining whether the initial target detection result is an image target;
wherein the classifier is obtained by performing classification training on image target samples at least according to the combined appearance and internal structure features of the image target samples.
10. The method according to claim 9, wherein the performing feature fusion processing on the appearance features and the internal structure features to obtain a combined appearance and internal structure feature comprises:
regularizing the appearance features and the internal structure features respectively;
processing the regularized appearance features and internal structure features into features of the same scale;
and splicing the appearance features and the internal structure features of the same scale to obtain the combined appearance and internal structure feature.
11. An apparatus for identifying a target detection result, comprising:
the data acquisition unit is used for acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
the first data processing unit is used for dividing the initial target detection result into an image unit sequence and extracting the internal structure characteristics of the initial target detection result according to the image unit sequence;
and the judgment processing unit is used for determining whether the initial target detection result is an image target at least according to the internal structure characteristics.
12. The apparatus of claim 11, wherein the initial target detection result is a three-dimensional initial target detection result;
when the first data processing unit divides the initial target detection result into an image unit sequence, the first data processing unit is specifically configured to:
dividing the initial target detection result into a two-dimensional image sequence.
13. The apparatus according to claim 12, wherein the first data processing unit, when extracting the internal structural feature of the initial target detection result according to the image unit sequence, is specifically configured to:
inputting the two-dimensional image sequence into a pre-trained structural relationship recognition model, and extracting the internal structural features of the initial target detection result;
wherein the structural relationship recognition model is trained at least by extracting relationship features between the two-dimensional image frames of three-dimensional image target samples.
14. The apparatus of claim 11, further comprising:
the second data processing unit is used for extracting the appearance characteristics of the initial target detection result;
the judgment processing unit is configured to, when determining whether the initial target detection result is an image target at least according to the internal structure feature, specifically:
and determining whether the initial target detection result is an image target according to the appearance characteristics and the internal structure characteristics.
15. The apparatus according to claim 14, wherein the second data processing unit, when extracting the appearance feature of the initial target detection result, is specifically configured to:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
wherein the sign recognition model is trained at least by extracting the category characteristics and/or sign characteristics of image target samples.
16. An apparatus for identifying a target detection result, comprising:
a memory and a processor;
wherein the memory is connected with the processor and used for storing programs;
the processor is configured to implement the method for identifying a target detection result according to any one of claims 1 to 10 by executing the program stored in the memory.
17. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for identifying a target detection result according to any one of claims 1 to 10.
CN201911289667.7A 2019-06-24 2019-12-13 Target detection result identification method, device, equipment and storage medium Active CN110796659B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910549722.5A CN110264460A (en) 2019-06-24 2019-06-24 A kind of discrimination method of object detection results, device, equipment and storage medium
CN2019105497225 2019-06-24

Publications (2)

Publication Number Publication Date
CN110796659A true CN110796659A (en) 2020-02-14
CN110796659B CN110796659B (en) 2023-12-01

Family

ID=67920836

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910549722.5A Pending CN110264460A (en) 2019-06-24 2019-06-24 A kind of discrimination method of object detection results, device, equipment and storage medium
CN201911289667.7A Active CN110796659B (en) 2019-06-24 2019-12-13 Target detection result identification method, device, equipment and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910549722.5A Pending CN110264460A (en) 2019-06-24 2019-06-24 A kind of discrimination method of object detection results, device, equipment and storage medium

Country Status (1)

Country Link
CN (2) CN110264460A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969632B (en) * 2019-11-28 2020-09-08 北京推想科技有限公司 Deep learning model training method, image processing method and device
CN111080584B (en) * 2019-12-03 2023-10-31 上海联影智能医疗科技有限公司 Quality control method for medical image, computer device and readable storage medium
CN113009888B (en) * 2019-12-20 2021-12-24 中国科学院沈阳计算技术研究所有限公司 Production line equipment state prediction and recognition device
CN111476775B (en) * 2020-04-07 2021-11-16 广州柏视医疗科技有限公司 DR symptom identification device and method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842132A (en) * 2012-07-12 2012-12-26 上海联影医疗科技有限公司 CT pulmonary nodule detection method
WO2017215669A1 (en) * 2016-06-17 2017-12-21 北京市商汤科技开发有限公司 Method and device for object recognition, data processing device, and computing device
WO2018120038A1 (en) * 2016-12-30 2018-07-05 深圳前海达闼云端智能科技有限公司 Method and device for target detection
CN108921195A (en) * 2018-05-31 2018-11-30 沈阳东软医疗系统有限公司 A kind of Lung neoplasm image-recognizing method neural network based and device
CN108986085A (en) * 2018-06-28 2018-12-11 深圳视见医疗科技有限公司 CT image pulmonary nodule detection method, device, equipment and readable storage medium storing program for executing
CN108986067A (en) * 2018-05-25 2018-12-11 上海交通大学 Pulmonary nodule detection method based on cross-module state
CN109255782A (en) * 2018-09-03 2019-01-22 图兮深维医疗科技(苏州)有限公司 A kind of processing method, device, equipment and the storage medium of Lung neoplasm image
CN109886933A (en) * 2019-01-25 2019-06-14 腾讯科技(深圳)有限公司 A kind of medical image recognition method, apparatus and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xing-Wei Ge, "Edge detection and target recognition from complex background," 2010 2nd International Conference on Advanced Computer Control *
Miao Guang; Li Chaofeng, "CT image pulmonary nodule detection method combining two-dimensional and three-dimensional convolutional neural networks," Laser & Optoelectronics Progress, no. 05 *

Also Published As

Publication number Publication date
CN110264460A (en) 2019-09-20
CN110796659B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN110662484B (en) System and method for whole body measurement extraction
Gecer et al. Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks
CN110796659B (en) Target detection result identification method, device, equipment and storage medium
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
JP7297081B2 (en) Image classification method, image classification device, medical electronic device, image classification device, and computer program
CN108416776B (en) Image recognition method, image recognition apparatus, computer product, and readable storage medium
CN111862044B (en) Ultrasonic image processing method, ultrasonic image processing device, computer equipment and storage medium
CN108010021A (en) A kind of magic magiscan and method
CN107784282A (en) The recognition methods of object properties, apparatus and system
Huang et al. Lesion-based contrastive learning for diabetic retinopathy grading from fundus images
CN108062749B (en) Identification method and device for levator ani fissure hole and electronic equipment
WO2021051547A1 (en) Violent behavior detection method and system
CN111754453A (en) Pulmonary tuberculosis detection method and system based on chest radiography image and storage medium
Palma et al. Detection of masses and architectural distortions in digital breast tomosynthesis images using fuzzy and a contrario approaches
CN110232318A (en) Acupuncture point recognition methods, device, electronic equipment and storage medium
CN109978004B (en) Image recognition method and related equipment
CN116897012A (en) Method and device for identifying physique of traditional Chinese medicine, electronic equipment, storage medium and program
WO2024074921A1 (en) Distinguishing a disease state from a non-disease state in an image
Wang et al. Tooth identification based on teeth structure feature
Joudaki et al. Dynamic hand gesture recognition of sign language using geometric features learning
CN110956157A (en) Deep learning remote sensing image target detection method and device based on candidate frame selection
Li et al. Developing an image-based deep learning framework for automatic scoring of the pentagon drawing test
CN113435469A (en) Kidney tumor enhanced CT image automatic identification system based on deep learning and training method thereof
CN115880266B (en) Intestinal polyp detection system and method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant