CN110796659B - Target detection result identification method, device, equipment and storage medium

Info

Publication number: CN110796659B (granted publication of CN110796659A)
Application number: CN201911289667.7A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 殷保才, 徐亮, 孙梅
Applicant/Assignee: iFlytek Co., Ltd.
Legal status: Active


Classifications

    • G06T7/0012: Image analysis; inspection of images, e.g. flaw detection; biomedical image inspection
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06T7/136: Segmentation; edge detection involving thresholding
    • G06T2207/10072, G06T2207/10081: Image acquisition modality: tomographic images; computed x-ray tomography [CT]
    • G06T2207/20081: Special algorithmic details: training; learning
    • G06T2207/20084: Special algorithmic details: artificial neural networks [ANN]
    • G06T2207/20212, G06T2207/20221: Image combination: image fusion; image merging
    • G06T2207/30061, G06T2207/30064: Subject of image: lung; lung nodule
    • G06T2207/30204: Subject of image: marker
    • G06V2201/07: Image or video recognition or understanding: target detection


Abstract

The present application provides a target detection result identification method, apparatus, device, and storage medium, wherein the method comprises the following steps: acquiring an initial target detection result, the initial target detection result being a preliminary detection result obtained by detecting an image target from an image; dividing the initial target detection result into an image unit sequence, and extracting internal structural features of the initial target detection result according to the image unit sequence; and determining, at least according to the internal structural features, whether the initial target detection result is an image target. The method identifies the initial target detection result based on its internal structure, and can accurately identify whether the initial target detection result is an image target.

Description

Target detection result identification method, device, equipment and storage medium
The present application claims priority from the Chinese patent application filed with the Chinese Patent Office on June 24, 2019, with application number 201910549722.5 and entitled "Method, Apparatus, Device, and Storage Medium for Identifying Target Detection Results," the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a target detection result.
Background
Object detection, i.e., detecting an image target from an image, is a common task in image processing. With the growing demand for automation of image target detection, automated image target detection schemes are being applied ever more widely.
Meanwhile, detecting the correct image target from an image is key to the subsequent processing of the target detection result. Different target detection schemes perform differently, and even the same scheme performs differently when applied to different images. Identifying whether a target detection result is truly an image target, so as to ensure the accuracy of target detection, is therefore a practical requirement in target detection applications.
Disclosure of Invention
Based on the above needs, the present application provides a method, apparatus, device and storage medium for identifying a target detection result, which can be used to identify whether the target detection result is an image target.
A method of identifying a target detection result, comprising:
acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
dividing the initial target detection result into an image unit sequence, and extracting internal structural features of the initial target detection result according to the image unit sequence;
and determining, at least according to the internal structural features, whether the initial target detection result is an image target.
Optionally, the initial target detection result is a three-dimensional initial target detection result;
the dividing the initial target detection result into a sequence of image units includes:
dividing the initial target detection result into a two-dimensional image sequence.
Optionally, the extracting the internal structural feature of the initial target detection result according to the image unit sequence includes:
inputting the two-dimensional image sequence into a pre-trained structural relation recognition model, and extracting internal structural features of the initial target detection result;
the structural relation recognition model is obtained at least through relation feature training among two-dimensional image frames of the three-dimensional image target sample.
Optionally, the structural relation recognition model is obtained based on recurrent neural network training.
Optionally, the method further comprises:
extracting the appearance characteristics of the initial target detection result;
The determining whether the initial target detection result is an image target at least according to the internal structural feature comprises:
and determining whether the initial target detection result is an image target according to the appearance characteristic and the internal structure characteristic.
Optionally, the extracting the appearance feature of the initial target detection result includes:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
wherein the sign recognition model is obtained at least through extracting category characteristics and/or sign characteristics of the image target sample.
Optionally, the sign recognition model is obtained based on convolutional neural network training.
Optionally, the determining whether the initial target detection result is an image target according to the appearance feature and the internal structural feature includes:
inputting the appearance characteristics and the internal structure characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance characteristics and the internal structural characteristics of the image target sample.
Optionally, the determining whether the initial target detection result is an image target according to the appearance feature and the internal structural feature includes:
performing feature fusion processing on the appearance features and the internal structure features to obtain appearance and internal structure comprehensive features;
inputting the appearance and the internal structure comprehensive characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance and the internal structure comprehensive characteristics of the image target sample.
Optionally, the performing feature fusion processing on the appearance feature and the internal structure feature to obtain an integral feature of the appearance and the internal structure includes:
regularizing the appearance characteristic and the internal structural characteristic respectively;
processing the regularized appearance features and the internal structural features into features with the same scale;
and splicing the appearance features and the internal structure features with the same scale to obtain the appearance and the internal structure comprehensive features.
Optionally, the obtaining an initial target detection result includes:
preprocessing an image to be processed;
inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result;
the image target detection model is obtained at least through image target detection training.
An identification apparatus for a target detection result, comprising:
the data acquisition unit is used for acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target in an image;
the first data processing unit is used for dividing the initial target detection result into an image unit sequence and extracting internal structural features of the initial target detection result according to the image unit sequence;
and the judging and processing unit is used for determining whether the initial target detection result is an image target or not at least according to the internal structural characteristics.
Optionally, the initial target detection result is a three-dimensional initial target detection result;
correspondingly, when the first data processing unit divides the initial target detection result into a sequence of image units, the first data processing unit is specifically configured to:
dividing the initial target detection result into a two-dimensional image sequence.
Optionally, when the first data processing unit extracts the internal structural feature of the initial target detection result according to the image unit sequence, the first data processing unit is specifically configured to:
inputting the two-dimensional image sequence into a pre-trained structural relation recognition model, and extracting internal structural features of the initial target detection result;
the structural relation recognition model is obtained at least through relation feature training among two-dimensional image frames of the three-dimensional image target sample.
Optionally, the structural relation recognition model is obtained based on recurrent neural network training.
Optionally, the apparatus further includes:
the second data processing unit is used for extracting the appearance characteristics of the initial target detection result;
the judging and processing unit is specifically configured to, when determining whether the initial target detection result is an image target according to at least the internal structural feature:
and determining whether the initial target detection result is an image target according to the appearance characteristic and the internal structure characteristic.
Optionally, when the second data processing unit extracts the appearance feature of the initial target detection result, the second data processing unit is specifically configured to:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
Wherein the sign recognition model is obtained at least through extracting category characteristics and/or sign characteristics of the image target sample.
Optionally, the sign recognition model is obtained based on convolutional neural network training.
Optionally, when the judging and processing unit determines whether the initial target detection result is an image target according to the appearance features and the internal structural features, it is specifically configured to:
inputting the appearance characteristics and the internal structure characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance characteristics and the internal structural characteristics of the image target sample.
Optionally, the judging and processing unit includes:
the feature fusion unit is used for carrying out feature fusion processing on the appearance features and the internal structure features to obtain the appearance and the internal structure comprehensive features;
the feature processing unit is used for inputting the appearance and the internal structure comprehensive features into a pre-trained classifier and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance and the internal structure comprehensive characteristics of the image target sample.
Optionally, the feature fusion unit includes:
the first processing unit is used for regularizing the appearance characteristic and the internal structural characteristic respectively;
the second processing unit is used for processing the regularized appearance characteristics and the internal structure characteristics into characteristics with the same scale;
and the third processing unit is used for performing splicing processing on the appearance features and the internal structure features with the same scale to obtain the appearance and the internal structure comprehensive features.
Optionally, when the data acquisition unit acquires the initial target detection result, it is specifically configured to:
preprocessing an image to be processed;
inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result;
the image target detection model is obtained at least through image target detection training.
An identification device for a target detection result, comprising:
a memory and a processor;
the memory is connected with the processor and used for storing programs;
the processor is configured to implement the above-described method for identifying the target detection result by running the program stored in the memory.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described target detection result identification method.
According to the target detection result identification method described above, the initial target detection result is divided into an image unit sequence, the internal structural features of the initial target detection result are extracted according to that sequence, and whether the initial target detection result is an image target is then determined according to those internal structural features. Because the method identifies the initial target detection result based on its internal structure, it can accurately identify whether the initial target detection result is an image target.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without inventive effort.
FIG. 1 is a flow chart of a method for identifying a target detection result according to an embodiment of the present application;
FIG. 2 is a schematic representation of a CT image of a lung provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a process flow of the self-attention mechanism provided by an embodiment of the present application;
FIG. 4 is a flowchart of another method for identifying a target detection result according to an embodiment of the present application;
FIG. 5 is a flowchart of another method for identifying a target detection result according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a U-net network structure provided by an embodiment of the present application;
FIG. 7 is a schematic illustration of lung nodule segmentation provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a process of the object detection system according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for identifying a target detection result according to an embodiment of the present application;
FIG. 10 is another schematic structural diagram of an apparatus for identifying a target detection result according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a device for identifying a target detection result according to an embodiment of the present application.
Detailed Description
The technical scheme of the embodiment of the application can be applied to the application scene for identifying whether the image target detection result is the image target. By adopting the technical scheme of the embodiment of the application, the target detection result can be identified, and whether the target detection result is an image target or not can be determined.
As an exemplary implementation, the technical solution of the embodiment of the present application may be executed on a hardware device such as a hardware processor, or packaged into a software program to be run. When the hardware processor executes the processing procedure of the technical solution of the embodiment of the present application, or when the software program is run, identification of a target detection result, that is, determination of whether it is an image target, can be achieved. The embodiment of the present application only exemplarily introduces the specific processing procedure of the technical solution; it does not limit the specific implementation form, and any technical implementation form capable of executing this processing procedure may be adopted by the embodiment of the present application.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a method for identifying a target detection result, which is shown in fig. 1, and comprises the following steps:
S101, acquiring an initial target detection result.
The initial target detection result is a preliminary detection result obtained by detecting an image target from an image. Since the initial target detection result is derived from the image, the initial target detection result is essentially part of the image content in the image. The specific form of the initial target detection result depends on the form of the image target detected from the image, and when the initial target detection result is a target detection result obtained by detecting a two-dimensional image target from a two-dimensional image or a three-dimensional image, the initial target detection result is a two-dimensional image; when the initial target detection result is a target detection result obtained by detecting a three-dimensional image target from a three-dimensional image, the initial target detection result is a three-dimensional target detection result.
The technical scheme of the embodiment of the application can be suitable for identifying the initial target detection result in any form, such as two-dimensional or three-dimensional, and is especially suitable for identifying the initial target detection result in three-dimensional to determine whether the initial target detection result is an image target. The embodiment of the application takes the identification of the three-dimensional initial target detection result as an example, and introduces a specific processing procedure of the technical scheme of the embodiment of the application. By way of example, the embodiment of the present application will take the example of identifying the lung nodule detection result of the lung CT image shown in fig. 2, and describe the specific processing procedure of the technical solution of the embodiment of the present application.
When the technical scheme of the embodiment of the application is applied to identifying other forms of target detection results, the specific processing process can be executed by referring to the embodiment of the application, or the processing process described by the embodiment of the application is executed after being adaptively adjusted and perfected, the embodiment of the application is not described in detail, but it can be understood that various modifications or adaptive adjustment applications related to the technical scheme of the embodiment of the application are within the protection scope of the embodiment of the application.
Acquiring the initial target detection result may mean reading it from a database, receiving it from another device, or obtaining it by directly detecting an image target from an image.
S102, dividing the initial target detection result into an image unit sequence, and extracting internal structural features of the initial target detection result according to the image unit sequence;
the above image unit sequence refers to an image unit sequence obtained by dividing an image into unit areas, and arranging the image units according to the positions of the image units in an original image. The internal structural feature refers to a feature that represents the mutual structural relationship between each image unit in an image.
It can be understood that dividing the initial target detection result into an image unit sequence amounts to dividing up its composition structure. Because the component structures of a given initial target detection result have specific interrelationship characteristics, in order to preserve these original interrelationships among the component structures, the embodiment of the present application stipulates that, after the initial target detection result is divided into image units, the image units must be arranged into an image unit sequence according to their positional relationships in the original image.
On this basis, calculating the interrelationships among the image units in the sequence yields the interrelationships among the internal structures of the initial target detection result; characterizing these calculated interrelationships then yields the internal structural features of the initial target detection result.
As an exemplary implementation, after dividing the initial target detection result into the image unit sequence, the internal structural feature of the initial target detection result may be extracted from the image unit sequence by means of a model.
Specifically, the image unit sequence is input into a pre-trained feature extraction model to obtain the internal structural features of the initial target detection result. The feature extraction model is trained at least to extract the structural relationship features among the image units of an image unit sequence, so that it can accurately extract the structural relationship features among the image units of the input sequence. The specific structure of the feature extraction model may adopt a common artificial intelligence algorithm structure, for example a neural network structure such as a CNN or an RNN.
According to the scheme, the embodiment of the application divides the obtained initial target detection result into the image unit sequences, and calculates the internal structural characteristics of the initial target detection result according to the interrelationships among the parts of the image unit sequences.
It can be understood that when the initial target detection result is a two-dimensional image, the initial target detection result may be divided into two-dimensional image blocks according to rows and columns, and then the two-dimensional image blocks are arranged according to positions of the two-dimensional image blocks in the original image, so as to obtain a two-dimensional image block sequence. And extracting structural relation features among the image blocks from the two-dimensional image block sequence, for example, inputting the two-dimensional image block sequence into a pre-trained feature extraction model, and obtaining the structural relation features among the two-dimensional image blocks, namely obtaining the internal structural features of the initial target detection result.
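As a schematic illustration of this row-and-column division, the following sketch (Python with numpy; the block size, function name, and the assumption that the image side lengths are divisible by the block size are illustrative, not part of the present application) produces a two-dimensional image block sequence ordered by position in the original image:
```python
import numpy as np

def image_to_block_sequence(image: np.ndarray, block: int = 16) -> np.ndarray:
    """Divide a 2D image (H x W) into block x block patches and arrange
    them in row-major order, i.e., by their positions in the original
    image. H and W are assumed divisible by the block size."""
    h, w = image.shape
    rows, cols = h // block, w // block
    patches = [
        image[r * block:(r + 1) * block, c * block:(c + 1) * block]
        for r in range(rows)
        for c in range(cols)
    ]
    return np.stack(patches)  # shape: (rows * cols, block, block)
```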
For example, when the technical solution of the embodiment of the present application is applied to the identification of the three-dimensional initial target detection result, for example, when the technical solution is applied to the identification of whether the lung nodule detection result of the lung CT image shown in fig. 2 is a lung nodule, the initial target detection result is the three-dimensional initial target detection result, that is, the three-dimensional image area where the lung nodule detected from the lung CT image is located.
At this time, when the initial target detection result is divided into the image unit sequences, the initial target detection result may be divided into two-dimensional image sequences, that is, the initial target detection result of the three-dimensional object may be divided into two-dimensional plane image sequences.
For example, if detecting a lung nodule in the above-mentioned lung CT image yields a three-dimensional image block (C×H×W) where the lung nodule is located, then, when dividing this three-dimensional image block into image units, it may be divided into C frames of H×W two-dimensional plane images, and the two-dimensional plane images thus obtained may be arranged according to their positions in the three-dimensional image block, yielding a two-dimensional plane image sequence.
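A minimal sketch of this frame-wise division, assuming the three-dimensional image block is a numpy array of shape (C, H, W); the function name is an illustrative assumption:
```python
import numpy as np

def volume_to_frame_sequence(volume: np.ndarray) -> list[np.ndarray]:
    """Split a 3D image block of shape (C, H, W) into C two-dimensional
    H x W plane images, ordered by their position along the depth axis."""
    return [volume[i] for i in range(volume.shape[0])]
```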
After dividing the three-dimensional initial target detection result into a two-dimensional image sequence, as an optional implementation manner, the embodiment of the application inputs the two-dimensional image sequence into a pre-trained structural relationship recognition model, and extracts the internal structural features of the initial target detection result.
The structural relation recognition model is obtained at least through relation feature training among two-dimensional image frames of the three-dimensional image target sample.
That is, the embodiment of the present application trains the structural relationship recognition model in advance for extracting the features of the interrelationship between the two-dimensional image frames of the three-dimensional image target sample.
The constructed structural relation recognition model continuously extracts the relationship features among the two-dimensional image frames of three-dimensional image target samples, and the model is trained until it can accurately extract the internal structural relationship features of the three-dimensional image target samples.
At this time, each two-dimensional plane image frame contained in the two-dimensional image sequence is treated as one time step and input into the structural relation recognition model in turn; the model extracts the features of each frame while modeling the relationships among different frames, thereby extracting the internal structural relationship features of the two-dimensional image sequence, i.e., the internal structural features of the three-dimensional initial target detection result.
As an exemplary implementation manner, the structural relationship recognition model is obtained based on training of a recurrent neural network (Recurrent Neural Network, RNN), that is, a training process for extracting the relationship features between two-dimensional image frames of a three-dimensional image target sample is performed on the RNN, so that the RNN has the capability of extracting the internal structural features of the three-dimensional image, and is used as the structural relationship recognition model.
Exemplarily, the embodiment of the present application obtains the structural relation recognition model by training a bidirectional long short-term memory network (BiLSTM).
It should be noted that, unlike a conventional sequence learning task, the task of identifying a lung nodule in a lung CT image has only one output, so the output module of the RNN has only one node. The modeling unit in the RNN may be a long short-term memory (LSTM) unit or a gated recurrent unit (GRU); the embodiment of the present application places no strict limitation on this. In order to integrate the features of different image frames, the embodiment of the present application adopts the self-attention mechanism shown in fig. 3 to automatically learn the importance of different image frames, which can greatly improve the discriminability of the integrated features.
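For illustration only, the following PyTorch sketch shows a BiLSTM with self-attention pooling and a single output node, in the spirit of the structural relation recognition model described above. It assumes each two-dimensional frame has already been encoded as a feature vector; the layer sizes and names are illustrative assumptions, not the patented implementation:
```python
import torch
import torch.nn as nn

class StructureRelationNet(nn.Module):
    """BiLSTM over the 2D frame sequence, with self-attention pooling and
    a single output node. Layer sizes are illustrative assumptions."""

    def __init__(self, frame_dim: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(frame_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # one importance score per frame
        self.out = nn.Linear(2 * hidden, 1)   # the single output node

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, T, frame_dim), one feature vector per 2D frame
        h, _ = self.lstm(frames)                # (batch, T, 2 * hidden)
        w = torch.softmax(self.attn(h), dim=1)  # learned frame importance
        pooled = (w * h).sum(dim=1)             # attention-integrated feature
        return torch.sigmoid(self.out(pooled))  # score for the whole sequence
```
In the combined design of the model and classifier described below, the intermediate per-frame features, rather than the final score, would be exposed as the internal structural feature.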
S103, determining whether the initial target detection result is an image target or not at least according to the internal structural characteristics.
Typically, different image regions have different internal structural features. Such as the lung CT image shown in fig. 2, in which the internal structural features of the lung nodule image region are different from those of the normal image region. Therefore, the initial target detection result can be identified according to the internal structural feature of the initial target detection result, and whether the initial target detection result is an image target can be identified by identifying whether the internal structural feature of the initial target detection result accords with the internal structural feature of the image target.
Exemplarily, a classifier is trained in advance, and whether the initial target detection result is an image target is determined by classifying its internal structural features.
The classifier is obtained by classifying and training the image target sample at least according to the internal structural characteristics of the image target sample.
The classifier can classify the input internal structural features, and when the classifier classifies the internal structural features of the input initial target detection result as the internal structural feature type of the image target, the initial target detection result can be determined to be the image target.
As an optional implementation manner, the embodiment of the application combines the structural relation recognition model with the classifier for design and training. The internal structural features of the initial target detection result obtained by the output layer of the structural relation recognition model are directly used as the input of the classifier, and the structural relation recognition model and the classifier are combined to form a complete processing model. After the processing model is trained, namely the combined training of the structural relation recognition model and the classifier is completed, the image unit sequence of the initial target detection result is input into the processing model, and whether the initial target detection result is an image target can be recognized.
As can be seen from the above description, the method for identifying a target detection result according to the embodiment of the present application divides an initial target detection result into an image unit sequence, extracts internal structural features of the initial target detection result according to the image unit sequence, and further determines whether the initial target detection result is an image target according to the internal structural features of the initial target detection result. The method identifies the initial target detection result based on the internal structure of the initial target detection result, and can accurately identify whether the initial target detection result is an image target.
As an alternative implementation manner, referring to fig. 4, another embodiment of the present application further discloses a method for identifying an initial target detection result, where after obtaining the initial target detection result, the method further includes:
S402, extracting the appearance characteristics of the initial target detection result.
Accordingly, when determining whether the initial target detection result is an image target, the above-mentioned appearance feature and internal structure feature are used in combination to determine whether the initial target detection result is an image target.
That is, the above-mentioned determining whether the initial target detection result is an image target at least according to the internal structural feature specifically includes:
S404, determining whether the initial target detection result is an image target according to the appearance characteristic and the internal structure characteristic.
The above-mentioned appearance characteristics of the initial target detection result refer to its external structural characteristics, such as its shape, whether its surface is smooth, and whether burrs (spiculation) are present.
As an exemplary implementation, the embodiment of the present application extracts the category features and/or sign features of the initial target detection result as its appearance features. Taking the extraction of the appearance features of a lung nodule in a lung CT image as an example: the category feature of the lung nodule is its classification attribute, e.g., a benign or malignant nodule; the sign feature of the lung nodule is whether it shows external shape characteristics such as burrs (spiculation).
Steps S401 and S403 in the method for identifying the initial target detection result shown in fig. 4 correspond to steps S101 and S102 in the method embodiment shown in fig. 1, respectively, and the specific content thereof is shown in the method embodiment shown in fig. 1 and will not be described herein.
As an exemplary implementation manner, when extracting the appearance characteristics of the initial target detection result, the embodiment of the application inputs the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result.
The sign recognition model is obtained at least through extracting category characteristics and/or sign characteristics of the image target sample.
The above-mentioned sign recognition model is obtained by training a convolutional neural network (CNN); that is, the CNN is trained with image target samples in a single-task or multi-task learning manner, so as to extract the category features of the image target samples, or their sign features, or both at the same time, i.e., to recognize the category and signs of the image target samples simultaneously. When the CNN can accurately extract the category features and/or sign features of the image target samples, training of the network ends, and the network can be used as the sign recognition model.
Inputting the initial target detection result into a trained sign recognition model, and extracting the category characteristics and/or sign characteristics of the initial target detection result from the model to serve as the appearance characteristics of the initial target detection result.
When the embodiment of the present application is applied to extracting the appearance features of the lung nodule detection result of a lung CT image, the sign recognition model is obtained by training a 3D-CNN. Lung nodule image samples annotated with category and/or signs are input into the 3D-CNN so that it learns to recognize the category and/or signs of lung nodules; when the 3D-CNN can accurately recognize the category and/or signs of a lung nodule, its training is considered complete. At this point, the category features and/or sign features output by the output layer of the 3D-CNN are the result of extracting the category features and/or sign features of the input lung nodule.
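For illustration only, the following PyTorch sketch shows a 3D-CNN with multi-task heads in the spirit of the sign recognition model described above; the backbone depth, channel counts, and the category/sign label sets are illustrative assumptions, not the patented network:
```python
import torch
import torch.nn as nn

class SignRecognition3DCNN(nn.Module):
    """3D-CNN with multi-task heads: one for nodule category (e.g., solid /
    part-solid / ground-glass / calcified) and one for signs (e.g.,
    lobulation, spiculation). The backbone feature map plays the role of
    the appearance feature F2. All sizes are illustrative assumptions."""

    def __init__(self, n_categories: int = 4, n_signs: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.category_head = nn.Linear(32, n_categories)
        self.sign_head = nn.Linear(32, n_signs)

    def forward(self, x: torch.Tensor):
        # x: (batch, 1, D, H, W) nodule block cropped from the CT volume
        feat = self.backbone(x)          # feature map used as appearance F2
        v = self.pool(feat).flatten(1)   # (batch, 32)
        return feat, self.category_head(v), self.sign_head(v)
```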
It should be noted that, when the technical scheme of the embodiment of the present application is actually applied, the structure of the above-mentioned sign recognition model may be flexibly selected, for example, when the method is applied to extracting the appearance feature from the planar image, the sign recognition model may adopt a CNN structure; or may be implemented by adopting a model structure such as RNN.
As an optional implementation manner, determining whether the initial target detection result is an image target according to the appearance feature and the internal structural feature includes:
inputting the appearance characteristics and the internal structure characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance characteristics and the internal structural characteristics of the image target sample.
According to the embodiment of the application, when the initial target detection result is identified as the image target through the appearance characteristic and the internal structural characteristic of the initial target detection result, the classifier is trained to classify the image target sample according to the appearance characteristic and the internal structural characteristic of the image target sample, so that whether the image target sample is the image target can be accurately identified.
After the appearance features and the internal structural features of the initial target detection result are respectively extracted through the sign recognition model and the structural relation recognition model, the features are input into a pre-trained classifier, so that whether the initial target detection result is an image target is determined.
In the above-mentioned lung nodule identification, in an actual scenario, when a doctor manually verifies whether a lung nodule detected by a machine from a lung CT image is a true nodule, the doctor first observes the appearance of the nodule, e.g., its category and whether spiculation is present, then scrolls up and down through the CT image layers where the detected nodule is located, and judges whether it is a true nodule by observing the structural relationship between the detected nodule and the surrounding tissue.
It can be appreciated that, in manually identifying a lung nodule, the judgment is made by recognizing the appearance characteristics of the lung nodule and by paging through the image to view the spatial structural relationship characteristics within the lung nodule region.
In the technical scheme of the embodiment of the application, the identification of the initial target detection result is realized by extracting the appearance characteristic and the internal structural characteristic of the initial target detection result. The internal structural features provided by the embodiment of the application, namely structural relationship features among the two-dimensional plane images of the CT image area where the lung nodule is located, correspond to structural relationships between the lung nodule and peripheral tissues observed by manually turning over the CT image layer where the lung nodule is located; the above-mentioned appearance characteristics are consistent with those obtained by manually observing the pulmonary nodules.
Therefore, it can be understood that the technical solution of the embodiment of the present application, which identifies whether the initial target detection result is an image target by extracting its appearance features and internal structural features, actually simulates the manual process of observing the appearance of the detection result and paging through its layers to judge the internal tissue structure relationships, thereby determining whether it is an image target. The technical solution of the embodiment of the present application thus identifies whether the initial target detection result is an image target in the processing manner closest to manual operation, which gives it a practical basis and guaranteed feasibility.
Alternatively, as another implementation manner, another embodiment of the present application discloses, referring to fig. 5, that determining whether the initial target detection result is an image target according to the appearance feature and the internal structural feature, including:
S504, regularizing the appearance feature and the internal structural feature respectively;
Taking the lung nodule detected from the lung CT image shown in fig. 2 as an example, suppose the output layer of the sign recognition model outputs the appearance feature F2 of the lung nodule (of dimension C×H×W, where C, H, and W denote the channel, height, and width of the 3D-CNN feature map, respectively), and the output layer of the structural relation recognition model outputs the internal structural feature F1 of the lung nodule (of dimension T×HW, where T denotes the number of frames and HW denotes each frame's 2D image feature stretched into one dimension). Since F1 and F2 may have different value ranges, the embodiment of the present application first performs L2-norm regularization on F1 and F2, and then performs step S505:
S505, processing the regularized appearance features and the internal structure features into features with the same scale;
By scaling F1 and F2, both are processed into features of the same scale. For example, F2 is stretched to dimension C×HW so that its form matches that of F1.
S506, splicing the appearance features and the internal structure features of the same scale to obtain the appearance and internal structure comprehensive feature;
that is, F1 of dimension T×HW and F2 of dimension C×HW are spliced to obtain the appearance and internal structure comprehensive feature F of dimension (T+C)×HW.
It will be appreciated that steps S504 to S506 above obtain the appearance and internal structure comprehensive feature by performing feature fusion processing on the appearance feature and the internal structural feature. This process achieves the fusion of the appearance features and the internal structural features.
Performing this fusion on the extracted appearance features and internal structural features, before the classifier classifies them to adjudicate the initial target detection result, can strengthen the association between the extracted features and at the same time facilitate the classification processing of the classifier.
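A minimal PyTorch sketch of this fusion procedure (steps S504 to S506), assuming a single sample with F1 of shape (T, HW) and F2 of shape (C, H, W) where H*W equals HW; applying the L2 normalization along the last dimension, and stretching F2 before normalizing so both features share a common axis, are plausible readings rather than the patented procedure:
```python
import torch
import torch.nn.functional as F

def fuse_features(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Fuse the internal structural feature F1 (T x HW) with the appearance
    feature F2 (C x H x W): stretch F2 to C x HW so both share the same
    scale, L2-normalize each, and splice them into the (T + C) x HW
    comprehensive feature F."""
    f2 = f2.reshape(f2.shape[0], -1)   # stretch F2 to (C, H*W), step S505
    f1 = F.normalize(f1, p=2, dim=-1)  # L2-norm regularization, step S504
    f2 = F.normalize(f2, p=2, dim=-1)
    return torch.cat([f1, f2], dim=0)  # splice into ((T + C), H*W), step S506
```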
S507, inputting the appearance and the internal structure comprehensive characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target.
Finally, F is input into the pre-trained classifier to determine whether the initial target detection result is an image target. The classifier is obtained by performing classification training on image target samples at least according to their appearance and internal structure comprehensive features.
Steps S501 to S503 in the processing flow of the method shown in fig. 5 correspond to steps S401 to S403 in the method embodiment shown in fig. 4, respectively, and the specific content thereof is shown in the method embodiment shown in fig. 4, and will not be described herein.
As an alternative implementation manner, the embodiment of the present application obtains the initial target detection result by detecting the image target from the image.
Taking the example of detecting a lung nodule from a lung CT image shown in fig. 2, the specific implementation process for obtaining the initial target detection result includes:
the image to be processed is first preprocessed, i.e. the lung CT image shown in fig. 2 is preprocessed.
A CT image is a sequence image from which the three-dimensional anatomical structure of the human body can be reconstructed. In general, the features of the lung nodule region differ from those of normal lung regions. To reduce the processing load and avoid non-nodule image regions interfering with the detection of nodule regions, the embodiment of the present application preprocesses the lung CT image: the HU values of the CT voxels are truncated to the range -1000 to 1000, then the mean and variance of the voxel HU values are calculated, and the voxel values of the CT image are normalized accordingly.
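A minimal sketch of this preprocessing, assuming the CT volume is a numpy array of HU values; the epsilon guard is an implementation detail added here, not stated in the original:
```python
import numpy as np

def preprocess_ct(volume_hu: np.ndarray) -> np.ndarray:
    """Clip CT voxel HU values to [-1000, 1000], then normalize the volume
    with its own mean and standard deviation."""
    v = np.clip(volume_hu.astype(np.float32), -1000.0, 1000.0)
    return (v - v.mean()) / (v.std() + 1e-8)  # epsilon avoids a zero divide
```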
After the pretreatment is finished, the pretreated image to be treated is input into a pre-trained image target detection model, and an initial target detection result is obtained.
The image target detection model is obtained at least through training of detecting an image target from an image.
For the lung CT image, after preprocessing is completed, the lung CT image is input into a pre-trained image target detection model, and an initial target detection result is obtained by extracting the model.
Exemplarily, the embodiment of the present application uses U-net as the image target detection model. U-net is a common convolutional neural network; as shown in FIG. 6, the network extracts features through a series of convolution operations and then reconstructs the image target segmentation result through a series of deconvolution operations. Because the spatial dimensions of a CT volume are large, feeding the whole CT into the network at once would exceed hardware limitations, so the embodiment of the present application first divides the CT image into contiguous, equally sized volume blocks and then feeds these into the trained U-net, obtaining the candidate-nodule segmentation results. As shown in fig. 7, a candidate-nodule segmentation probability map is obtained by segmenting candidate nodules from one layer image of a volume block; obtaining such a probability map for every layer of the volume block yields the candidate-nodule segmentation probability map of the whole block.
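As an illustration of the block division before U-net inference, a minimal numpy sketch follows; the block size is an illustrative assumption, and for simplicity this sketch drops any remainder voxels at the volume borders rather than padding them:
```python
import numpy as np

def split_into_blocks(volume: np.ndarray, size=(64, 64, 64)):
    """Divide a CT volume into contiguous, equally sized blocks for
    block-wise U-net inference."""
    d, h, w = size
    dim_z, dim_y, dim_x = volume.shape
    return [
        volume[z:z + d, y:y + h, x:x + w]
        for z in range(0, dim_z - d + 1, d)
        for y in range(0, dim_y - h + 1, h)
        for x in range(0, dim_x - w + 1, w)
    ]
```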
After the candidate-nodule segmentation probability map is obtained, threshold truncation is performed on the segmentation probability value of each voxel in the map according to the following formula:
B(v) = 1, if P(v) ≥ T; B(v) = 0, if P(v) < T
where P(v) is the segmentation probability of voxel v and B(v) is its binarized value. The value of T is generally 0.5: any voxel whose segmentation probability is smaller than T is regarded as background, and otherwise as foreground, i.e., nodule. Through this threshold truncation operation, a binary image of the lung nodule segmentation can be obtained; the connected regions in the binary image are found, the geometric center of gravity of each connected region is calculated, and the nodule position is thereby determined. A volume block at the nodule position is then extracted from the original lung CT image to obtain the nodule detection result.
It should be noted that, the above-mentioned U-net is a deep learning network, and before the network is applied to realize the detection of the image target, the embodiment of the present application trains the network in advance, so that the network has the capability of detecting the image target from the image.
Corresponding to the need to detect nodules from lung CT images, embodiments of the present application train the U-net network with lung CT image samples.
Specifically, in training a U-net network, lung CT image samples are first collected from an open source dataset or hospital. In order to prevent missing nodules, the CT image samples are required to have a layer thickness of less than 2 mm.
Then, the collected lung CT image samples are annotated. In 2018, the National Institutes for Food and Drug Control and the Radiology Branch of the Chinese Medical Association jointly released the "Expert Consensus on Chest CT Lung Nodule Data Annotation and Quality Control". Taking its annotation content and annotation method as a reference, not only are the nodule position and boundary annotated, but also the common signs and categories of the nodules (categories include solid nodules, part-solid nodules, ground-glass nodules, and calcified nodules; signs include the lobulation sign, the spiculation sign, and the like). To improve annotation accuracy, the same CT is annotated by 3 doctors simultaneously in two rounds: in the first round, each doctor completes the annotation independently; in the second round, each doctor may revise the annotation according to the other doctors' results. After the two rounds are completed, only nodules annotated simultaneously by 2 or more doctors are finally defined as valid nodules and used for training the U-net network.
After annotation of the lung CT image samples is completed, the voxel values in the CT image samples are truncated to the range -1000 to 1000, and the mean and variance are then calculated for normalization. The statistics may be computed and applied per CT image, or computed once over all CT image samples.
After the lung CT image sample processing is completed, the U-net network is trained by using the sample until the accuracy of detecting the nodules from the lung CT image by the U-net network reaches the set requirement, so that the training of the network is completed, and the initial target detection result can be obtained by detecting the nodules from the lung CT image by using the network.
Further, integrating the descriptions of the above embodiments of the present application, the above target detection model, sign recognition model, structural relation recognition model, and classifier are combined and connected in sequence to obtain a target detection system. The specific structure of each part of the system can be implemented with reference to the corresponding parts of the above embodiments.
As shown in fig. 8, when an image to be processed, such as the one shown in fig. 2, is input into the target detection model of the target detection system, the target detection model detects the image target from the image to obtain an initial target detection result; the initial target detection result then enters the sign recognition model and the structural relation recognition model respectively, which extract its appearance features and internal structural features; finally, the extracted appearance features and internal structural features enter the classifier, which judges whether the initial target detection result is an image target, filtering out any initial target detection result that is not an image target and thereby ensuring the accuracy of the target detection results.
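Tying the pieces together, the following sketch shows how the four stages of fig. 8 could be wired in Python; the four arguments are assumed to be trained callables (an assumption, not the patent's API), and fuse_features refers to the fusion sketch given earlier:
```python
def identify_detections(ct_volume, detect, appearance, structure, classify):
    """Structural outline of the system in fig. 8: detect() yields
    candidate target blocks from the preprocessed image, appearance() and
    structure() return the features F2 and F1, and classify() returns
    True for a real image target."""
    confirmed = []
    for block in detect(ct_volume):   # initial target detection results
        f2 = appearance(block)        # sign recognition model
        f1 = structure(block)         # structural relation recognition model
        if classify(fuse_features(f1, f2)):  # filter out non-targets
            confirmed.append(block)
    return confirmed
```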
It can be understood that the target detection system comprises two parts, target detection and initial target detection result identification, so that detection accuracy is further improved on the basis of realizing image target detection.
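The data flow of fig. 8 can be summarized in the following sketch; the callables are placeholders standing in for the trained models and helper functions of the foregoing embodiments, not a fixed interface.

def identify_detections(image, detector, sign_model, relation_model,
                        classifier, to_slice_sequence, fuse):
    # detector: target detection model -> initial target detection results
    # sign_model: appearance (sign) features of a candidate
    # relation_model: internal structural features from its 2-D slice sequence
    # classifier: final judgment on whether the candidate is an image target
    confirmed = []
    for candidate in detector(image):
        appearance = sign_model(candidate)
        structure = relation_model(to_slice_sequence(candidate))
        if classifier(fuse(appearance, structure)):
            confirmed.append(candidate)          # non-targets are filtered out
    return confirmed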
Corresponding to the above method for identifying the target detection result, the embodiment of the application also discloses a device for identifying the target detection result, as shown in fig. 9, which comprises:
a data acquisition unit 100, configured to acquire an initial target detection result, where the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
A first data processing unit 110, configured to divide the initial target detection result into a sequence of image units, and extract internal structural features of the initial target detection result according to the sequence of image units;
the judging processing unit 120 is configured to determine whether the initial target detection result is an image target at least according to the internal structural feature.
In the device for identifying a target detection result according to the embodiment of the present application, after the data acquisition unit 100 acquires the initial target detection result, the first data processing unit 110 divides the initial target detection result into an image unit sequence and extracts the internal structural features of the initial target detection result from that sequence; the judging processing unit 120 then determines, according to these internal structural features, whether the initial target detection result is an image target. Because the device identifies the initial target detection result based on its internal structure, it can accurately identify whether the initial target detection result is an image target.
As an optional implementation manner, the initial target detection result is a three-dimensional initial target detection result;
correspondingly, when dividing the initial target detection result into a sequence of image units, the first data processing unit 110 is specifically configured to:
Dividing the initial target detection result into a two-dimensional image sequence.
As an optional implementation manner, when extracting the internal structural features of the initial target detection result according to the image unit sequence, the first data processing unit 110 is specifically configured to:
inputting the two-dimensional image sequence into a pre-trained structural relation recognition model, and extracting internal structural features of the initial target detection result;
the structural relation recognition model is obtained at least through relation feature training among two-dimensional image frames of the three-dimensional image target sample.
As an alternative implementation manner, the structural relation recognition model is obtained based on recurrent neural network training.
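As one possible instantiation of such a recurrent structural relation recognition model, the sketch below encodes each two-dimensional slice with a small CNN and runs the resulting vector sequence through a GRU, using the final hidden state as the internal structural feature; all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class StructuralRelationModel(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(            # per-slice 2-D encoder
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)

    def forward(self, slices):                   # slices: (batch, seq, 1, H, W)
        b, t = slices.shape[:2]
        x = self.encoder(slices.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(x)                       # h: (1, batch, feat_dim)
        return h.squeeze(0)                      # internal structural feature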
As an alternative implementation, referring to fig. 10, the apparatus further includes:
a second data processing unit 130, configured to extract an appearance feature of the initial target detection result;
the judging processing unit 120 is specifically configured to, when determining whether the initial target detection result is an image target according to at least the internal structural features:
and determining whether the initial target detection result is an image target according to the appearance characteristic and the internal structure characteristic.
As an alternative implementation manner, when extracting the appearance features of the initial target detection result, the second data processing unit 130 is specifically configured to:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
wherein the sign recognition model is obtained at least through extracting category characteristics and/or sign characteristics of the image target sample.
As an alternative implementation, the sign recognition model is obtained based on convolutional neural network training.
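One way such a convolutional sign recognition model could be structured is sketched below: a small 3-D CNN backbone with a nodule-category head and a sign head, the shared penultimate features serving as the appearance features. Channel counts and class counts are illustrative assumptions.

import torch
import torch.nn as nn

class SignRecognitionModel(nn.Module):
    def __init__(self, feat_dim=128, n_categories=4, n_signs=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4 * 4, feat_dim), nn.ReLU(),
        )
        self.category_head = nn.Linear(feat_dim, n_categories)  # e.g. solid, part-solid
        self.sign_head = nn.Linear(feat_dim, n_signs)           # e.g. lobulation, spiculation

    def forward(self, volume):                  # volume: (batch, 1, D, H, W)
        feat = self.backbone(volume)            # appearance features
        return feat, self.category_head(feat), self.sign_head(feat)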
As an optional implementation manner, when determining whether the initial target detection result is an image target according to the appearance features and the internal structural features, the judging processing unit 120 is specifically configured to:
inputting the appearance characteristics and the internal structure characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance characteristics and the internal structural characteristics of the image target sample.
As an alternative implementation manner, the judging processing unit 120 includes:
The feature fusion unit is used for carrying out feature fusion processing on the appearance features and the internal structure features to obtain the appearance and the internal structure comprehensive features;
the feature processing unit is used for inputting the appearance and the internal structure comprehensive features into a pre-trained classifier and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance and the internal structure comprehensive characteristics of the image target sample.
As an alternative implementation manner, the feature fusion unit includes:
the first processing unit is used for regularizing the appearance characteristic and the internal structural characteristic respectively;
the second processing unit is used for processing the regularized appearance characteristics and the internal structure characteristics into characteristics with the same scale;
and the third processing unit is used for performing splicing processing on the appearance features and the internal structure features with the same scale to obtain the appearance and the internal structure comprehensive features.
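A sketch of the fusion performed by the three processing units above, reading "regularization" as L2 normalization (an assumption; the embodiment does not fix the regularization method) and using linear projections to bring both features to the same scale before splicing:

import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_features(appearance, structure, proj_a, proj_s):
    a = F.normalize(appearance, dim=-1)   # regularize each feature
    s = F.normalize(structure, dim=-1)
    a, s = proj_a(a), proj_s(s)           # process into features of the same scale
    return torch.cat([a, s], dim=-1)      # splice into the comprehensive feature

# Usage with illustrative dimensions: both features projected to 64-d,
# then a linear classifier judges image target vs. non-target.
proj_a, proj_s = nn.Linear(128, 64), nn.Linear(128, 64)
classifier = nn.Linear(128, 2)
fused = fuse_features(torch.randn(8, 128), torch.randn(8, 128), proj_a, proj_s)
logits = classifier(fused)                # (8, 2) scores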
As an optional implementation manner, when the data acquisition unit 100 acquires the initial target detection result, the data acquisition unit is specifically configured to:
Preprocessing an image to be processed;
inputting the preprocessed image to be processed into a pre-trained image target detection model to obtain an initial target detection result;
the image target detection model is obtained at least through image target detection training.
Another embodiment of the present application also discloses an apparatus for identifying a target detection result, referring to fig. 11, the apparatus includes:
a memory 200 and a processor 210;
wherein the memory 200 is connected to the processor 210, and is used for storing a program;
the processor 210 is configured to implement the method for identifying the target detection result disclosed in any of the above embodiments by running the program stored in the memory 200.
Specifically, the apparatus for identifying a target detection result may further include: a bus, a communication interface 220, an input device 230, and an output device 240.
The processor 210, the memory 200, the communication interface 220, the input device 230, and the output device 240 are interconnected by a bus. Wherein:
A bus may comprise a path that communicates information between components of a computer system.
Processor 210 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or a microprocessor, or one or more integrated circuits configured to control the execution of programs according to aspects of the present application. It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Processor 210 may include a main processor, and may also include a baseband chip, modem, and the like.
The memory 200 stores programs for implementing the technical solution of the present application, and may also store an operating system and other key services. In particular, the programs may include program code, and the program code includes computer operation instructions. More specifically, the memory 200 may include read-only memory (ROM), other types of static storage devices that can store static information and instructions, random access memory (RAM), other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and the like.
The input device 230 may include means for receiving data and information entered by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor, among others.
Output device 240 may include means, such as a display screen, printer, speakers, etc., that allow information to be output to a user.
The communication interface 220 may include any transceiver-type device for communicating with other devices or communication networks, such as an Ethernet network, a radio access network (RAN), or a wireless local area network (WLAN).
The processor 210 executes the programs stored in the memory 200 and invokes the other devices described above, so as to implement the steps of the method for identifying a target detection result provided by the embodiments of the present application.
Another embodiment of the present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for identifying a target detection result provided in any of the above embodiments.
For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
It should be noted that, in the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to each other. Since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant points, reference is made to the description of the method embodiments.
The steps in the method of the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
The modules and the submodules in the device and the terminal of the embodiments of the application can be combined, divided and deleted according to actual needs.
In the embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of modules or sub-modules is merely a logical function division, and there may be other manners of division in actual implementation, for example, multiple sub-modules or modules may be combined or integrated into another module, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules or sub-modules illustrated as separate components may or may not be physically separate, and components that are modules or sub-modules may or may not be physical modules or sub-modules, i.e., may be located in one place, or may be distributed over multiple network modules or sub-modules. Some or all of the modules or sub-modules may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated in one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated in one module. The integrated modules or sub-modules may be implemented in hardware or in software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be disposed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (17)

1. A method of identifying a target detection result, comprising:
acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target from an image;
dividing the initial target detection result into an image unit sequence, and extracting internal structural features of the initial target detection result according to the image unit sequence; the image unit sequence is an image unit sequence obtained by dividing the unit area of an image to obtain image units and arranging the image units according to the positions of the image units in an original image; the internal structural feature refers to a feature representing the mutual structural relationship between each image unit in an image;
and determining whether the initial target detection result is an image target or not at least according to the internal structural characteristics.
2. The method of claim 1, wherein the initial target detection result is a three-dimensional initial target detection result;
the dividing the initial target detection result into a sequence of image units includes:
dividing the initial target detection result into a two-dimensional image sequence.
3. The method according to claim 2, wherein said extracting internal structural features of the initial target detection result from the sequence of image units comprises:
Inputting the two-dimensional image sequence into a pre-trained structural relation recognition model, and extracting internal structural features of the initial target detection result;
the structural relation recognition model is obtained at least through relation feature training among two-dimensional image frames of the three-dimensional image target sample.
4. A method according to claim 3, wherein the structural relationship recognition model is derived based on recurrent neural network training.
5. The method according to claim 1, wherein the method further comprises:
extracting the appearance characteristics of the initial target detection result;
the determining whether the initial target detection result is an image target at least according to the internal structural feature comprises:
and determining whether the initial target detection result is an image target according to the appearance characteristic and the internal structure characteristic.
6. The method of claim 5, wherein said extracting the apparent features of the initial target detection result comprises:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
Wherein the sign recognition model is obtained at least through extracting category characteristics and/or sign characteristics of the image target sample.
7. The method of claim 6, wherein the sign recognition model is obtained based on convolutional neural network training.
8. The method of claim 5, wherein said determining whether said initial object detection result is an image object based on said appearance feature and said internal structural feature comprises:
inputting the appearance characteristics and the internal structure characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance characteristics and the internal structural characteristics of the image target sample.
9. The method of claim 5, wherein said determining whether said initial object detection result is an image object based on said appearance feature and said internal structural feature comprises:
performing feature fusion processing on the appearance features and the internal structure features to obtain appearance and internal structure comprehensive features;
Inputting the appearance and the internal structure comprehensive characteristics into a pre-trained classifier, and determining whether the initial target detection result is an image target or not;
the classifier is obtained by classifying and training the image target sample at least according to the appearance and the internal structure comprehensive characteristics of the image target sample.
10. The method of claim 9, wherein the feature fusion process for the appearance feature and the internal structural feature to obtain an appearance and an internal structural integrated feature comprises:
regularizing the appearance characteristic and the internal structural characteristic respectively;
processing the regularized appearance features and the internal structural features into features with the same scale;
and splicing the appearance features and the internal structure features with the same scale to obtain the appearance and the internal structure comprehensive features.
11. An apparatus for identifying a target detection result, comprising:
the data acquisition unit is used for acquiring an initial target detection result, wherein the initial target detection result is a preliminary detection result obtained by detecting an image target in an image;
the first data processing unit is used for dividing the initial target detection result into an image unit sequence and extracting internal structural features of the initial target detection result according to the image unit sequence; the image unit sequence is an image unit sequence obtained by dividing the unit area of an image to obtain image units and arranging the image units according to the positions of the image units in an original image; the internal structural feature refers to a feature representing the mutual structural relationship between each image unit in an image;
And the judging and processing unit is used for determining whether the initial target detection result is an image target or not at least according to the internal structural characteristics.
12. The apparatus of claim 11, wherein the initial target detection result is a three-dimensional initial target detection result;
the first data processing unit is specifically configured to, when dividing the initial target detection result into a sequence of image units:
dividing the initial target detection result into a two-dimensional image sequence.
13. The apparatus according to claim 12, wherein the first data processing unit is configured to, when extracting the internal structural feature of the initial target detection result from the sequence of image units:
inputting the two-dimensional image sequence into a pre-trained structural relation recognition model, and extracting internal structural features of the initial target detection result;
the structural relation recognition model is obtained at least through relation feature training among two-dimensional image frames of the three-dimensional image target sample.
14. The apparatus of claim 11, wherein the apparatus further comprises:
the second data processing unit is used for extracting the appearance characteristics of the initial target detection result;
The judging and processing unit is specifically configured to, when determining whether the initial target detection result is an image target according to at least the internal structural feature:
and determining whether the initial target detection result is an image target according to the appearance characteristic and the internal structure characteristic.
15. The apparatus according to claim 14, wherein the second data processing unit is configured to, when extracting the appearance feature of the initial target detection result:
inputting the initial target detection result into a pre-trained sign recognition model to obtain the appearance characteristics of the initial target detection result;
wherein the sign recognition model is obtained at least through extracting category characteristics and/or sign characteristics of the image target sample.
16. An apparatus for identifying a target detection result, comprising:
a memory and a processor;
the memory is connected with the processor and used for storing programs;
the processor is configured to implement the method for identifying a target detection result according to any one of claims 1 to 10 by running a program stored in the memory.
17. A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for identifying a target detection result according to any one of claims 1 to 10.
CN201911289667.7A 2019-06-24 2019-12-13 Target detection result identification method, device, equipment and storage medium Active CN110796659B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910549722.5A CN110264460A (en) 2019-06-24 2019-06-24 A kind of discrimination method of object detection results, device, equipment and storage medium
CN2019105497225 2019-06-24

Publications (2)

Publication Number Publication Date
CN110796659A CN110796659A (en) 2020-02-14
CN110796659B true CN110796659B (en) 2023-12-01

Family

ID=67920836

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910549722.5A Pending CN110264460A (en) 2019-06-24 2019-06-24 A kind of discrimination method of object detection results, device, equipment and storage medium
CN201911289667.7A Active CN110796659B (en) 2019-06-24 2019-12-13 Target detection result identification method, device, equipment and storage medium


Country Status (1)

Country Link
CN (2) CN110264460A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969632B (en) * 2019-11-28 2020-09-08 北京推想科技有限公司 Deep learning model training method, image processing method and device
CN111080584B (en) * 2019-12-03 2023-10-31 上海联影智能医疗科技有限公司 Quality control method for medical image, computer device and readable storage medium
CN113009888B (en) * 2019-12-20 2021-12-24 中国科学院沈阳计算技术研究所有限公司 Production line equipment state prediction and recognition device
CN111476775B (en) * 2020-04-07 2021-11-16 广州柏视医疗科技有限公司 DR symptom identification device and method


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842132A (en) * 2012-07-12 2012-12-26 上海联影医疗科技有限公司 CT pulmonary nodule detection method
WO2017215669A1 (en) * 2016-06-17 2017-12-21 北京市商汤科技开发有限公司 Method and device for object recognition, data processing device, and computing device
WO2018120038A1 (en) * 2016-12-30 2018-07-05 深圳前海达闼云端智能科技有限公司 Method and device for target detection
CN108986067A (en) * 2018-05-25 2018-12-11 上海交通大学 Pulmonary nodule detection method based on cross-module state
CN108921195A (en) * 2018-05-31 2018-11-30 沈阳东软医疗系统有限公司 A kind of Lung neoplasm image-recognizing method neural network based and device
CN108986085A (en) * 2018-06-28 2018-12-11 深圳视见医疗科技有限公司 CT image pulmonary nodule detection method, device, equipment and readable storage medium storing program for executing
CN109255782A (en) * 2018-09-03 2019-01-22 图兮深维医疗科技(苏州)有限公司 A kind of processing method, device, equipment and the storage medium of Lung neoplasm image
CN109886933A (en) * 2019-01-25 2019-06-14 腾讯科技(深圳)有限公司 A kind of medical image recognition method, apparatus and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Edge detection and target recognition from complex background; Xing-wei Ge; 2010 2nd International Conference on Advanced Computer Control; full text *
CT image pulmonary nodule detection method combining two-dimensional and three-dimensional convolutional neural networks; Miao Guang; Li Chaofeng; Laser & Optoelectronics Progress (05); full text *

Also Published As

Publication number Publication date
CN110796659A (en) 2020-02-14
CN110264460A (en) 2019-09-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant