Disclosure of Invention
In view of the foregoing, it is desirable to provide an ultrasound image processing method, apparatus, computer device, and storage medium that can assist a doctor in accurately screening and diagnosing a tissue organ (such as the thyroid and the main surrounding cervical tissues) from ultrasound image data of that organ.
An ultrasound image processing method, the method comprising:
acquiring an ultrasound image dataset;
preprocessing the ultrasound image dataset to obtain a preprocessed image dataset;
inputting the preprocessed image dataset into a preset image processing network to obtain corresponding feature maps;
inputting the feature maps into a preset target detection network and a preset segmentation network respectively, identifying the category and position of the target tissue in each image through the preset target detection network, and determining the size of the target tissue in each image through the preset segmentation network;
and inputting the category and position of the target tissue in each image into a trained target tracking network for target tracking, to obtain category data and the number of target tissues.
In one embodiment, inputting the category and position of the target tissue in each image into the trained target tracking network for target tracking, and obtaining category data and the number of target tissues, includes:
acquiring the category and position of the target tissue in the current frame image;
predicting the position of the target tissue in the next frame image through Kalman filtering based on the category and position of the target tissue in the current frame image;
judging whether the target tissues in the previous and subsequent frame images are the same target tissue;
and assigning the same identification data to the same target tissue, and counting category data and the number of target tissues.
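The counting in the last step can be sketched as follows, assuming the tracker has already assigned an integer ID and a category label to every detection; the function name and the sample detections are illustrative, not part of the disclosure:

```python
from collections import Counter

def count_target_tissues(tracked_detections):
    """Count category data and the number of distinct target tissues.

    tracked_detections: list of (track_id, category) pairs produced by the
    tracking network; the same track_id may appear in many frames.
    """
    # Each distinct track ID corresponds to one physical target tissue.
    id_to_category = dict(tracked_detections)
    per_category = Counter(id_to_category.values())
    total = len(id_to_category)
    return per_category, total

# Hypothetical tracker output over several frames:
detections = [(1, "thyroid"), (1, "thyroid"), (2, "trachea"),
              (3, "nodule"), (3, "nodule")]
per_category, total = count_target_tissues(detections)
```

Because the dictionary keeps one entry per track ID, a tissue seen in many frames is counted only once.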
In one embodiment, judging whether the target tissues in the previous and subsequent frame images are the same target tissue includes:
calculating the Mahalanobis distance, the cosine distance, and the intersection-over-union between the target tissues in the previous and subsequent frame images, and judging on that basis whether they are the same target tissue.
In one embodiment, inputting the preprocessed image dataset into a preset image processing network to obtain corresponding feature maps includes:
inputting the preprocessed image dataset into a preset feature extraction network, and extracting corresponding initial feature maps;
inputting the initial feature maps into a preset region proposal network to obtain suggestion windows corresponding to the initial feature maps;
mapping the suggestion windows onto the initial feature map of the last layer, and processing through an ROIAlign layer to obtain feature maps of fixed size.
In one embodiment, inputting the preprocessed image dataset into a preset feature extraction network and extracting corresponding initial feature maps includes:
inputting the preprocessed image dataset into the backbone network ResNet-50, and extracting corresponding feature data;
and performing feature fusion on the extracted feature data using a feature pyramid network to obtain corresponding initial feature maps.
In one embodiment, determining the size of the target tissue in each image through the preset segmentation network includes:
obtaining the number of pixels per unit length and the segmentation mask corresponding to the feature map output by the preset segmentation network;
calculating the number of pixels contained in the target tissue in each image according to the segmentation mask corresponding to the feature map and the number of pixels per unit length;
and determining the size of the target tissue in each image according to the number of pixels contained in the target tissue in each image and preset pixel proportion data.
In one embodiment, preprocessing the ultrasound image dataset to obtain a preprocessed image dataset includes:
sequentially performing scaling, normalization, and random enhancement on the ultrasound image dataset to obtain the preprocessed image dataset.
An ultrasound image processing apparatus, the apparatus comprising:
a dataset acquisition module, configured to acquire an ultrasound image dataset;
a data preprocessing module, configured to preprocess the ultrasound image dataset to obtain a preprocessed image dataset;
a feature map acquisition module, configured to input the preprocessed image dataset into a preset image processing network to obtain corresponding feature maps;
a feature map processing module, configured to input the feature maps into a preset target detection network and a preset segmentation network respectively, identify the category and position of the target tissue in each image through the preset target detection network, and determine the size of the target tissue in each image through the preset segmentation network;
and a target tracking module, configured to input the category and position of the target tissue in each image into a trained target tracking network for target tracking, to obtain category data and the number of target tissues.
A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, performs the steps of:
acquiring an ultrasound image dataset;
preprocessing the ultrasound image dataset to obtain a preprocessed image dataset;
inputting the preprocessed image dataset into a preset image processing network to obtain corresponding feature maps;
inputting the feature maps into a preset target detection network and a preset segmentation network respectively, identifying the category and position of the target tissue in each image through the preset target detection network, and determining the size of the target tissue in each image through the preset segmentation network;
and inputting the category and position of the target tissue in each image into a trained target tracking network for target tracking, to obtain category data and the number of target tissues.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an ultrasound image dataset;
preprocessing the ultrasound image dataset to obtain a preprocessed image dataset;
inputting the preprocessed image dataset into a preset image processing network to obtain corresponding feature maps;
inputting the feature maps into a preset target detection network and a preset segmentation network respectively, identifying the category and position of the target tissue in each image through the preset target detection network, and determining the size of the target tissue in each image through the preset segmentation network;
and inputting the category and position of the target tissue in each image into a trained target tracking network for target tracking, to obtain category data and the number of target tissues.
According to the ultrasound image processing method, apparatus, computer device, and storage medium described above, the ultrasound image dataset is processed to obtain feature maps of fixed size; the feature maps are then input into the preset target detection network and the preset segmentation network respectively, so that the category and position of the target tissue in each image can be identified and the size of the target tissue measured; and the trained target tracking network performs target tracking on the category and position of the target tissue in each image, so that category data and the number of target tissues are obtained. The whole scheme realizes real-time, automatic identification, segmentation, tracking, and size measurement of tissue organs (such as the thyroid and the main surrounding cervical tissues), ensures data accuracy while improving processing speed, and can effectively assist doctors in accurately screening and diagnosing such tissue organs.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The ultrasound image processing method provided by the present application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. Specifically, a user may upload a neck ultrasound image dataset to be processed to the server 104 through the terminal 102 and operate the terminal 102 to generate an image processing message, which is sent to the server 104. In response to the message, the server 104 acquires the ultrasound image dataset; preprocesses it to obtain a preprocessed image dataset; inputs the preprocessed image dataset into a preset image processing network to obtain corresponding feature maps; inputs the feature maps into a preset target detection network and a preset segmentation network respectively, identifying the category and position of the target tissue in each image through the target detection network and determining the size of the target tissue in each image through the segmentation network; and inputs the category and position of the target tissue in each image into a trained target tracking network for target tracking, obtaining category data and the number of target tissues. The terminal 102 may be, but is not limited to, a medical device, personal computer, notebook computer, smartphone, tablet computer, or portable wearable device; the server 104 may be implemented as a stand-alone server or as a cluster of servers.
In one embodiment, as shown in fig. 2, an ultrasound image processing method is provided, and the method is applied to the server in fig. 1, and the method includes the following steps:
step 202, an ultrasound image dataset is acquired.
In practice, the ultrasound image dataset may be a continuous thyroid ultrasound video dataset acquired from an ultrasound device, containing consecutive thyroid ultrasound image frames. Taking the thyroid ultrasound video dataset as an example, it may be one that has been selected and precisely labeled by a sonographer based on clinical experience.
Step 204, preprocessing the ultrasound image dataset to obtain a preprocessed image dataset.
To ensure that the images are standardized and tractable, the acquired dataset requires further preprocessing. Specifically, the preprocessing may include image scaling and normalization.
As shown in fig. 3, in another embodiment, step 204 includes: step 224, sequentially performing scaling, normalization, and random enhancement on the ultrasound image dataset to obtain the preprocessed image dataset. Specifically, each frame in the acquired dataset is scaled to 800×600 pixels; the scaled image is normalized using a linear function; and a random enhancement operation is applied to each normalized image, yielding a randomly enhanced image dataset.
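The preprocessing step can be sketched as follows, assuming one frame at a time; the nearest-neighbour scaling and the horizontal-flip enhancement are illustrative stand-ins for whatever resize and augmentation operations an implementation actually uses:

```python
import numpy as np

def preprocess_frame(frame, target_hw=(600, 800), rng=None):
    """Scale, normalize, and randomly enhance one ultrasound frame.

    frame: H x W x 3 uint8 array. The 800x600-pixel target size and the
    linear (min-max) normalization follow the description above; the
    horizontal flip is one illustrative random-enhancement choice.
    """
    rng = rng or np.random.default_rng()
    h, w = target_hw
    # Nearest-neighbour scaling to the target resolution (a stand-in for a
    # library resize such as cv2.resize).
    ys = np.arange(h) * frame.shape[0] // h
    xs = np.arange(w) * frame.shape[1] // w
    scaled = frame[ys][:, xs]
    # Linear-function (min-max) normalization to [0, 1].
    lo, hi = scaled.min(), scaled.max()
    norm = (scaled.astype(np.float64) - lo) / max(hi - lo, 1)
    # Random enhancement: flip left-right with probability 0.5.
    if rng.random() < 0.5:
        norm = norm[:, ::-1]
    return norm

frame = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3),
                                          dtype=np.uint8)
out = preprocess_frame(frame)
```

The output is always 600×800×3 with values in [0, 1], regardless of the input resolution.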
Step 206, inputting the preprocessed image data set to a preset image processing network to obtain a corresponding feature map.
In this embodiment, the preset image processing network may include a feature extraction network and a region proposal network (Region Proposal Network, RPN). Specifically, the preprocessed image dataset may be input into the trained feature extraction network to obtain corresponding initial feature maps, which are then fed into the region proposal network for further processing to obtain feature maps of fixed size.
As shown in fig. 3, in one embodiment, step 206 includes:
Step 226: inputting the preprocessed image dataset into a preset feature extraction network and extracting corresponding initial feature maps; inputting the initial feature maps into a preset region proposal network to obtain suggestion windows corresponding to each initial feature map; mapping the suggestion windows onto the initial feature map of the last layer; and processing through an ROIAlign layer to obtain feature maps of fixed size.
In this embodiment, the preset feature extraction network may include the backbone network ResNet-50 and a feature pyramid network (Feature Pyramid Network, FPN). The preprocessed image dataset may be input into the trained backbone network ResNet-50 and feature pyramid network to obtain the corresponding initial feature maps; the initial feature maps are fed into the region proposal network to obtain the corresponding region proposal windows (Region Proposals), with each map generating N suggestion windows. The resulting suggestion windows are then mapped onto the initial feature map of the last layer of the network, and each ROI is passed through the ROIAlign layer to generate a feature map of fixed size. Specifically, the backbone network ResNet-50 may be structured as follows: the first layer is the input layer, whose input is a 600×800×3 pixel matrix; the second layer is the feature extraction layer, which adopts the published ResNet-50 network and takes the output matrices of the conv3_x, conv4_x, and conv5_x stages as the extracted features C3, C4, and C5, with sizes 75×100×512, 38×50×1024, and 19×25×2048 respectively. The feature pyramid network fuses the features C3, C4, and C5 from the backbone network and outputs fused features at five scales, P3, P4, P5, P6, and P7, which constitute the initial feature maps.
The network structure of the feature pyramid network is as follows: the first layer is a convolutional layer applied to feature C5, with kernel size 1×1×256 and stride 1, filled in SAME mode; its output matrix is 19×25×256;
the second layer is a convolutional layer with kernel size 3×3×256 and stride 1, filled in SAME mode; the output matrix P5 is 19×25×256;
the third layer is a convolutional layer applied to feature C4, with kernel size 1×1×256 and stride 1, filled in SAME mode; the output matrix, of size 38×50×256, is denoted P4_;
the fourth layer is an upsampling layer, which upsamples the output matrix P5 into the output matrix P5_upsample of size 38×50×256;
the fifth layer is an addition layer, which adds P5_upsample to P4_; the output matrix is 38×50×256;
the sixth layer is a convolutional layer with kernel size 3×3×256 and stride 1, filled in SAME mode; the output matrix P4 is 38×50×256;
the seventh layer is a convolutional layer applied to feature C3, with kernel size 1×1×256 and stride 1, filled in SAME mode; the output matrix, of size 75×100×256, is denoted P3_;
the eighth layer is an upsampling layer, which upsamples P4 to 75×100, giving the output matrix P4_upsample of size 75×100×256;
the ninth layer is an addition layer, which adds P4_upsample to P3_; the output matrix is 75×100×256;
the tenth layer is a convolutional layer with kernel size 3×3×256 and stride 1, filled in SAME mode; the output matrix P3 is 75×100×256;
the eleventh layer is a convolutional layer applied to C5, with kernel size 3×3×256 and stride 2, filled in SAME mode; the output matrix P6 is 10×13×256;
the twelfth layer is a convolutional layer with kernel size 3×3×256 and stride 2, filled in SAME mode; the output matrix P7 is 5×7×256.
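The top-down fusion described above can be sketched with NumPy, reducing each convolution to a random per-pixel channel projection so that the spatial bookkeeping (lateral 1×1 projection, 2× nearest-neighbour upsampling, element-wise addition) is explicit; the 3×3 smoothing convolutions are omitted since they leave shapes unchanged, and all weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """1x1 convolution as a per-pixel channel projection (random weights)."""
    w = rng.standard_normal((x.shape[-1], out_ch)) * 0.01
    return x @ w

def upsample2x(x, target_hw):
    """Nearest-neighbour 2x upsampling, cropped to target_hw."""
    up = x.repeat(2, axis=0).repeat(2, axis=1)
    return up[: target_hw[0], : target_hw[1]]

# Backbone features for a 600 x 800 x 3 input (strides 8, 16, 32):
C3 = rng.standard_normal((75, 100, 512))
C4 = rng.standard_normal((38, 50, 1024))
C5 = rng.standard_normal((19, 25, 2048))

P5 = conv1x1(C5, 256)                            # 19 x 25 x 256
P4 = conv1x1(C4, 256) + upsample2x(P5, (38, 50))  # 38 x 50 x 256
P3 = conv1x1(C3, 256) + upsample2x(P4, (75, 100)) # 75 x 100 x 256
```

The shapes reproduce the P3-P5 sizes listed above; P6 and P7 would be obtained by further stride-2 convolutions on C5 and P6.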
Then, the output matrices P3, P4, P5, P6, and P7 of the feature pyramid network are used as inputs to the region proposal network (RPN). The features of each output matrix are fed independently into the processing layers of the RPN, where a 3×3 sliding window traverses the whole input feature map, and an anchor of preset size is generated at the center of each window position. After passing through a fully connected layer, the output is sent into two branches. Branch one is a preliminary target-box regression branch: a convolutional layer with kernel 1×1×36 and stride 1, filled in VALID mode, which yields the rough position of the target box. Branch two is a foreground/background classification branch: the output of the previous layer is put into a convolutional layer with kernel 1×1×18 and stride 1, filled in VALID mode; the output matrix of this convolutional layer is reshaped and fed into a softmax activation layer to judge whether the content framed by the target box is background. The results of the two branches are input to a proposal layer, where the candidate boxes are first sorted by foreground score and redundant candidates are deleted by non-maximum suppression (NMS); a region-of-interest alignment layer (ROIAlign) then produces a series of equally sized region proposal boxes and their features, i.e., feature maps of fixed size 7×7. In this embodiment, because the output of the region proposal network is not of fixed size, processing through the ROIAlign layer quickly yields feature maps of fixed size.
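The non-maximum suppression step used by the proposal layer can be sketched as follows; the overlap threshold and the sample boxes are illustrative:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Sort candidate boxes by foreground score, then greedily drop any
    box that overlaps an already-kept box by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping candidates and one separate candidate:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The lower-scoring duplicate of the first box is suppressed, while the non-overlapping box survives.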
Step 208, inputting the feature map to a preset target detection network and a preset segmentation network respectively, identifying the category and the position of the target tissue in each image through the preset target detection network, and determining the size of the target tissue in each image through the preset segmentation network.
In a specific implementation, the preset target detection network may include a classification subnet and a localization subnet. The classification subnet identifies the category of the target tissue in each image, i.e., the category of the target box, while the localization subnet locates the precise coordinates of the target tissue in each image, giving its position. Taking a thyroid ultrasound video dataset as an example, the target tissues are the thyroid and the main surrounding cervical structures. The inputs of the classification and localization subnets are the series of equally sized region proposal boxes and their features; the first to fourth layers of the two subnets are identical, with kernel size 3×3×256, stride 1, and SAME-mode filling. For the localization subnet, the fifth layer is a fully connected layer whose output is the precise coordinates of the target box; for the classification subnet, the fifth layer is a fully connected layer whose output is the category of the target box. The preset segmentation network, also called the segmentation subnet, takes the output matrix of the region proposal network as input and outputs the segmentation mask corresponding to the feature map: a binary black-and-white picture in which regions containing target tissue (thyroid and cervical structures) are white and regions without target tissue are black. From the segmentation mask, the number of pixels contained in the target tissue can be calculated and its size measured. Taking the thyroid as an example, measuring its size makes it possible to judge whether the thyroid is enlarged and hence whether the patient suffers from goiter.
In one embodiment, determining the size of the target tissue in each image through the preset segmentation network includes:
step 228, obtaining the number of pixels per unit length and the segmentation mask corresponding to the feature map output by the preset segmentation network;
step 248, calculating the number of pixels contained in the target tissue in each image according to the segmentation mask corresponding to the feature map and the number of pixels per unit length;
step 268, determining the size of the target tissue in each image according to the number of pixels contained in the target tissue in each image and preset pixel proportion data.
In a specific implementation, the input of the preset segmentation network is the output matrix of the region proposal network. The next four layers are consistent with the region-of-interest alignment layers of the classification and localization subnets; their output is fed into four mask full-convolution layers with identical parameters, each with kernel size 3×3×256 and stride 1, filled in SAME mode. The resulting output is fed into a convolutional layer with kernel size 2×2×256 and stride 2, filled in VALID mode. The output matrix of that layer is fed into a convolutional layer with kernel size 1×1×num_cls and stride 1, filled in VALID mode, yielding the final segmentation mask, where num_cls is the number of categories in the dataset. For measuring target tissues such as the thyroid and cervical structures, the number of pixels per unit length can be obtained by recognizing the scale in the image; the number of pixels belonging to the target tissue is then calculated from the segmentation mask, and finally the sizes of the segmented thyroid and cervical structures are obtained from the ratio. Specifically, the contour point coordinates of the thyroid and cervical structures can be obtained from the segmentation subnet; when fitting these structures, binarization is applied to obtain a binary image, and the perimeter of the contour curve and the area it encloses are calculated from the fitted contour point coordinates.
The perimeter is the number of points in the fitted contour point set, and the area is the number of pixels contained within the contour. The number of pixels per centimetre is obtained from the scale on the right side of the ultrasound image, and the perimeter and area of the thyroid and cervical structures are then converted from pixel units to centimetre units, giving their sizes.
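The pixel-to-centimetre conversion described above can be sketched as follows, assuming the scale bar has already been read off to give a pixels-per-centimetre value; the boundary-pixel perimeter estimate is a simple stand-in for the fitted contour point set:

```python
import numpy as np

def measure_mask(mask, pixels_per_cm):
    """Convert a binary segmentation mask into physical measurements.

    mask: 2-D array, 1 inside the target tissue, 0 outside.
    Returns (area in cm^2, perimeter in cm).
    """
    area_px = int(mask.sum())
    # Boundary pixels: foreground pixels with at least one background
    # 4-neighbour (a simple stand-in for the fitted contour point set).
    padded = np.pad(mask, 1)
    core = padded[1:-1, 1:-1] == 1
    neighbors_bg = ((padded[:-2, 1:-1] == 0) | (padded[2:, 1:-1] == 0) |
                    (padded[1:-1, :-2] == 0) | (padded[1:-1, 2:] == 0))
    perimeter_px = int((core & neighbors_bg).sum())
    return area_px / pixels_per_cm ** 2, perimeter_px / pixels_per_cm

# A 20 x 40 pixel rectangle with 20 pixels per centimetre:
mask = np.zeros((100, 100), dtype=int)
mask[10:30, 10:50] = 1
area_cm2, perimeter_cm = measure_mask(mask, pixels_per_cm=20)
```

For the rectangle above, 800 foreground pixels at 400 pixels per square centimetre give an area of 2 cm².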
Step 210, inputting the category and the position of the target tissue in each image to a trained target tracking network for target tracking, and obtaining the category data and the number of the target tissue.
After the category and position of the target tissue in each image have been identified, each target tissue needs to be tracked to ensure the accuracy of the identification. Specifically, the categories and positions output by the localization and classification subnets can be input into the trained target tracking network, which captures the targets and performs tracking matching across consecutive frames, detects any anomalous detections, and produces statistics such as the categories of target tissue contained in the image dataset, the number of target tissues of each category, and the total number of target tissues.
According to the ultrasound image processing method above, the ultrasound image dataset is processed to obtain feature maps of fixed size; the feature maps are input into the preset target detection network and the preset segmentation network respectively, so that the category and position of the target tissue in each image can be identified and its size measured; and the trained target tracking network performs target tracking on the category and position of the target tissue in each image, yielding category data and the number of target tissues. The whole scheme realizes real-time, automatic identification, segmentation, tracking, and size measurement of tissue organs (such as the thyroid and the main surrounding cervical tissues), ensures data accuracy while improving processing speed, and can effectively assist doctors in accurately screening and diagnosing the thyroid and the main surrounding cervical tissues.
As shown in fig. 4, in one embodiment, inputting the category and position of the target tissue in each image into the trained target tracking network for target tracking, and obtaining category data and the number of target tissues, includes:
step 212, acquiring the category and position of the target tissue in the current frame image;
step 214, predicting the position of the target tissue in the next frame image through Kalman filtering based on the category and position of the target tissue in the current frame image;
step 216, judging whether the target tissues in the previous and subsequent frame images are the same target tissue;
step 218, assigning the same identification data to the same target tissue, and counting category data and the number of target tissues.
In a specific implementation, the category and position of the target tissue in the current frame image are obtained from the categories and positions identified in the preceding steps; the position of the target tissue in the next frame image is then predicted through Kalman filtering; target detection is performed on the next frame image; and it is judged whether the target tissues in the two frames are the same target tissue. If so, the same identification data, i.e., the same ID (Identity), is assigned to that target tissue. Based on the image dataset after IDs have been assigned, the categories of target tissue contained in the dataset, the number of target tissues of each category, and the total number of target tissues are counted. In another embodiment, step 216 includes: calculating the Mahalanobis distance, the cosine distance, and the intersection-over-union between the target tissues in the previous and subsequent frame images, and judging on that basis whether they are the same target tissue.
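A minimal constant-velocity Kalman prediction for a box centre, as one way to realize the prediction step above; the state layout, transition matrix, and noise values are illustrative:

```python
import numpy as np

# State: [x, y, vx, vy]; constant-velocity motion over one frame (dt = 1).
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = np.eye(4) * 0.01  # process noise covariance (illustrative)

def kalman_predict(x, P):
    """Predict the next-frame state and covariance from the current ones."""
    return F @ x, F @ P @ F.T + Q

# Target-box centre at (100, 50), moving 2 px right and 1 px down per frame:
x = np.array([100.0, 50.0, 2.0, 1.0])
P = np.eye(4)
x_pred, P_pred = kalman_predict(x, P)
```

The predicted centre (102, 51) is what would then be compared against the next frame's actual detections.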
Specifically, the Mahalanobis distance, the cosine distance, and the intersection-over-union (IoU) between the predicted target tissue and the actual target detection box are calculated; a VGG16 network is also introduced to extract features within the target box and compute an appearance-feature matching degree; the Hungarian algorithm is then used to track and match the target boxes. In detail, the Mahalanobis distance measures the distance between the position predicted by the Kalman filter and the position at which the target tissue appears in the next frame; if the distance is too large, the two do not match, i.e., the target tissue of the previous frame does not appear in the next frame image. The cosine distance measures the similarity between the VGG16 features of the target boxes in the two frames; high similarity indicates that the target tissues in the two frames are the same target tissue. The intersection-over-union compares the two target boxes in the previous and subsequent frames; if their overlap is high, the boxes are considered to frame the same target tissue. Through the above processing, detections identified as the same target tissue are associated and assigned the same ID. In this embodiment, the cosine distance and intersection-over-union calculations compensate for the shortcomings of the Mahalanobis distance and improve the accuracy of target capture and tracking matching.
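The combined matching can be sketched as follows. The cost weights and the brute-force optimal assignment (a small-scale stand-in for the Hungarian algorithm), the hand-made appearance vectors (standing in for VGG16 descriptors), and the omission of the Mahalanobis gating term are all simplifications for illustration:

```python
import itertools
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

def cosine_distance(u, v):
    dot = sum(x*y for x, y in zip(u, v))
    nu = math.sqrt(sum(x*x for x in u))
    nv = math.sqrt(sum(x*x for x in v))
    return 1 - dot / (nu * nv)

def match(tracks, detections, feats_t, feats_d, w_app=0.5, w_iou=0.5):
    """Assign detections to predicted tracks by minimizing a combined
    appearance + overlap cost; assumes equal numbers of tracks/detections."""
    n = len(tracks)
    cost = [[w_app * cosine_distance(feats_t[i], feats_d[j])
             + w_iou * (1 - iou(tracks[i], detections[j]))
             for j in range(n)] for i in range(n)]
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(enumerate(best))

# Two predicted track boxes and two next-frame detections (swapped order):
tracks = [(0, 0, 10, 10), (50, 50, 60, 60)]
dets = [(51, 50, 61, 60), (1, 0, 11, 10)]
feats_t = [[1.0, 0.0], [0.0, 1.0]]
feats_d = [[0.0, 1.0], [1.0, 0.0]]
pairs = match(tracks, dets, feats_t, feats_d)
```

Each track is paired with the detection that both overlaps it and looks like it, which is what allows the same ID to be carried across frames.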
Specifically, the present application discloses a method for automatically identifying, segmenting, capturing, and measuring tissue structures in ultrasound images, comprising: acquiring a dataset; preprocessing the acquired dataset to obtain a preprocessed dataset; and inputting the preprocessed dataset into a trained deep convolutional neural network to obtain the category, position, and size of the tissue structures contained in each ultrasound image. The deep convolutional neural network comprises, connected in sequence, the backbone network ResNet-50, the feature pyramid network FPN, the region proposal network RPN, the classification subnet, the localization subnet, the segmentation subnet, and the tracking subnet. The application can assist sonographers in completing disease screening and diagnosis, accelerate disease screening, reduce the workload of sonographers, and improve the consistency of ultrasound diagnosis.
Specifically, the deep convolutional neural network used in the application is obtained by training the following steps:
(1) Acquiring a data set, sending the data set to a sonographer, and acquiring the data set marked by the sonographer;
Specifically, taking as an example 200 thyroid ultrasound videos obtained from ultrasound equipment of mainstream manufacturers on the market (including Baisheng, Siemens, Philips, Merits, etc.), the videos are parsed into continuous video frames to obtain a set of thyroid ultrasound image frames, which is then randomly divided into three parts: 80% as the training set (Train set), 10% as the validation set (Validation set), and 10% as the test set (Test set). The training-set thyroid ultrasound image frames labeled by the sonographer can be as shown in fig. 5 (a), fig. 5 (b), fig. 5 (c), fig. 5 (d), and fig. 5 (e).
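The random 80/10/10 split described above can be sketched as follows; the frame identifiers and seed are illustrative:

```python
import random

def split_dataset(frames, seed=0):
    """Shuffle the labeled frames and split them 80/10/10 into
    training, validation, and test sets."""
    frames = list(frames)
    random.Random(seed).shuffle(frames)
    n = len(frames)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (frames[:n_train],
            frames[n_train:n_train + n_val],
            frames[n_train + n_val:])

# 1000 hypothetical frame indices:
train, val, test = split_dataset(range(1000))
```

Shuffling before splitting keeps consecutive video frames from all landing in the same subset.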
(2) Preprocessing the marked data set to obtain a preprocessed data set;
Specifically, the preprocessing process includes random cropping, random flipping, random transformation of image saturation and brightness, etc.
(3) Analyzing the data set labeled in step (1) with a K-means clustering algorithm to obtain the three aspect-ratio values that best represent the length-to-width ratios of the key targets in the data set, which serve as the aspect ratios of the anchors (anchors) in the deep convolutional neural network;
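A minimal version of this anchor-ratio step is a 1-D k-means over the labeled boxes' width-to-height ratios, with the three resulting centroids used as anchor aspect ratios. The implementation and the sample ratios below are our illustration under that assumption, not the disclosure's exact algorithm.

```python
import random

def kmeans_1d(values, k=3, iters=50, seed=0):
    """Plain 1-D k-means; returns k centroids in ascending order."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)            # initialize from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:                         # assign to nearest centroid
            i = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[i].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]  # recompute means
    return sorted(centroids)

# Hypothetical width/height ratios of labeled boxes in the training set
ratios = [0.5, 0.55, 0.48, 1.0, 1.1, 0.95, 2.0, 1.9, 2.1]
print(kmeans_1d(ratios))
```

With clearly separated ratio groups like these, the three centroids land near 0.5, 1.0, and 2.0, i.e. anchors shaped to match the data rather than fixed default ratios.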
(4) Inputting a batch of data (for example, 16 images) from the training-set portion of the preprocessed data set obtained in step (2) into the deep convolutional neural network to obtain an inference output, and inputting the inference output together with the data set labeled by the sonographer in step (1) into the loss function L_all of the deep convolutional neural network to obtain a loss value.
(5) Optimizing the loss function L_all of the deep convolutional neural network according to the Adam algorithm, using the loss value obtained in step (4), so as to gradually update the parameters of the deep convolutional neural network;
Specifically, in the optimization process, the learning rate lr = 0.001, the momentum ζ = 0.9, and the weight decay ψ = 0.004.
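For intuition, a single Adam update with the hyperparameters above (lr = 0.001, first-moment momentum 0.9, weight decay 0.004) can be written out for one scalar parameter. This is a textbook Adam sketch, not the disclosure's implementation; the second-moment coefficient beta2 = 0.999 and eps are standard defaults we assume.

```python
def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.004):
    grad = grad + weight_decay * param           # L2-style weight decay
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

# One scalar parameter, a constant gradient of 0.5, five update steps
p, m, v = 1.0, 0.0, 0.0
for t in range(1, 6):
    p, m, v = adam_step(p, 0.5, m, v, t)
print(round(p, 4))
```

With a constant gradient the bias-corrected ratio m_hat / sqrt(v_hat) is close to 1, so each step moves the parameter by roughly the learning rate, illustrating Adam's step-size normalization.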
(6) Repeating steps (4) and (5) for the remaining batches of the training-set portion of the preprocessed data set obtained in step (2) until the set number of iterations is reached, thereby obtaining a trained deep convolutional neural network;
Specifically, the training process in step (6) may comprise 120 epochs, with 100 iterations per epoch.
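The training schedule in steps (4)-(6) can be summarized as a nested loop: 120 epochs, 100 iterations per epoch, one batch of 16 images per iteration. The stub below only counts images to show the schedule's shape; the real loop would draw a batch, compute L_all, and apply the Adam update at each iteration.

```python
def train(num_epochs=120, iters_per_epoch=100, batch_size=16):
    """Skeleton of the training schedule; returns total images processed."""
    total_images = 0
    for epoch in range(num_epochs):
        for it in range(iters_per_epoch):
            # Real loop: batch -> forward -> L_all -> Adam parameter update
            total_images += batch_size
    return total_images

print(train())  # 192000
```

That is 120 × 100 = 12,000 parameter updates in total, or 192,000 image presentations (with repetition) over the training run.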
(7) Verifying the trained deep convolutional neural network using the test-set portion of the preprocessed data set obtained in step (2).
By inputting video frames of a thyroid ultrasound examination into the trained convolutional neural network, the categories and positions of the thyroid and cervical structures can be output automatically, together with their sizes as measured from the segmentation; schematic diagrams of the recognition, segmentation, and capture results obtained after thyroid video frames are input into the deep convolutional neural network are shown in figs. 6-7.
It should be understood that, although the steps in the flowcharts of figs. 2-4 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to this order, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; nor is the execution order of these sub-steps or stages necessarily sequential, as they may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided an ultrasonic image processing apparatus including: a data set acquisition module 510, a data preprocessing module 520, a feature map acquisition module 530, a feature map processing module 540, and a target tracking module 550, wherein:
a dataset acquisition module 510 for acquiring an ultrasound image dataset.
The data preprocessing module 520 is configured to preprocess the ultrasound image dataset to obtain a preprocessed image dataset.
The feature map obtaining module 530 is configured to input the preprocessed image dataset to a preset image processing network, so as to obtain a corresponding feature map.
The feature map processing module 540 is configured to input the feature map to a preset target detection network and a preset segmentation network, identify a type and a position of a target tissue in each image through the preset target detection network, and determine a size of the target tissue in each image through the preset segmentation network.
The target tracking module 550 is configured to input the category and the position of the target tissue in each image to the trained target tracking network for target tracking, so as to obtain category data and the number of the target tissue.
In one embodiment, the target tracking module 550 is further configured to acquire the category and position of the target tissue in the current frame image, predict the position of the target tissue in the next frame image through Kalman filtering based on the category and position of the target tissue in the current frame image, determine whether the target tissues in the previous and subsequent frame images are the same target tissue, assign the same identification data to the same target tissue, and count the category data and number of the target tissues.
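The Kalman prediction step above can be illustrated in one dimension for a single box coordinate under a constant-velocity assumption. This is a deliberately simplified sketch (the real filter tracks the full box state as a vector); all variable names and noise values are ours.

```python
def kalman_predict(x, vx, p, q=1.0):
    """Predict the coordinate in the next frame (constant velocity)."""
    x_pred = x + vx   # position advances by the estimated velocity
    p_pred = p + q    # uncertainty grows by the process noise q
    return x_pred, p_pred

def kalman_update(x_pred, p_pred, z, r=1.0):
    """Fuse the prediction with the detection z found in the new frame."""
    k = p_pred / (p_pred + r)        # Kalman gain: trust in the measurement
    x = x_pred + k * (z - x_pred)    # corrected position estimate
    p = (1 - k) * p_pred             # reduced uncertainty after the update
    return x, p

x, vx, p = 100.0, 5.0, 1.0                       # state from the current frame
x_pred, p_pred = kalman_predict(x, vx, p)        # where the box should be next
x_new, p_new = kalman_update(x_pred, p_pred, z=104.0)  # detection at 104
print(round(x_pred, 1), round(x_new, 1))  # 105.0 104.3
```

The predicted position is what gets compared against the detections in the next frame to decide whether a detection is the same tissue, after which the matched detection corrects the track's state as in the update step.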
In one embodiment, the target tracking module 550 is further configured to calculate the Mahalanobis distance, the cosine distance, and the intersection-over-union between the target tissues in the previous and subsequent frame images, and thereby determine whether the target tissues in the previous and subsequent frame images are the same target tissue.
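Of the three cues named above, the intersection-over-union is the simplest to write down; a minimal implementation for axis-aligned boxes follows. The boxes and threshold are illustrative; in a full tracker this score would be combined with the Mahalanobis and cosine distances before deciding a match.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)  # clamp to non-overlap
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

prev = (10, 10, 50, 50)   # box of a tissue in the previous frame
curr = (12, 12, 52, 52)   # candidate detection in the current frame
print(round(iou(prev, curr), 3))  # 0.822
```

A high overlap such as this one supports treating the two detections as the same tissue; disjoint boxes score 0.0 and identical boxes score 1.0.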
In one embodiment, the feature map obtaining module 530 is further configured to input the preprocessed image dataset into a preset feature extraction network to extract a corresponding initial feature map, input the initial feature map into a preset area suggestion network to obtain suggestion windows corresponding to each initial feature map, map the suggestion windows onto the last-layer initial feature map of the preset area suggestion network, and obtain a fixed-size feature map through processing by an ROIAlign layer.
In one embodiment, the feature map obtaining module 530 is further configured to input the preprocessed image dataset to the backbone network ResNet-50, extract corresponding feature data, and perform feature fusion on the extracted feature data by using a feature pyramid layer to obtain a corresponding initial feature map.
In one embodiment, the feature map processing module 540 is further configured to obtain the number of pixels per unit length and the segmentation mask corresponding to the feature map output by the preset segmentation network, calculate the number of pixels contained in the target tissue in each image according to the segmentation mask corresponding to the feature map and the number of pixels per unit length, and determine the size of the target tissue in each image according to the number of pixels contained in the target tissue in each image and preset pixel proportion data.
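The size calculation above reduces to counting foreground pixels in the segmentation mask and converting with the pixel density. The sketch below assumes area measurement with a known density in pixels per millimetre along each axis; the mask and density values are hypothetical.

```python
def tissue_size_mm2(mask, pixels_per_mm):
    """Physical area of a binary mask: pixel count / (pixels per mm)^2."""
    pixel_count = sum(sum(row) for row in mask)  # foreground pixels
    return pixel_count / (pixels_per_mm ** 2)

# A hypothetical 4x5 binary mask with 12 foreground pixels, at 2 px/mm
mask = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(tissue_size_mm2(mask, pixels_per_mm=2.0))  # 3.0 (mm^2)
```

Linear dimensions (e.g. nodule diameter) follow the same idea with a single division: pixels along the measured axis divided by pixels per millimetre.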
In one embodiment, the data preprocessing module 520 is further configured to sequentially perform scaling, normalization and random enhancement on the ultrasound image dataset to obtain a preprocessed image dataset.
For specific limitations of the ultrasound image processing apparatus, reference may be made to the above limitations of the ultrasound image processing method, which are not repeated here. The respective modules in the above ultrasound image processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store ultrasound image datasets as well as data of the category, location and size of the target tissue. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of ultrasound image processing.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided that includes a memory having a computer program stored therein and a processor that when executing the computer program performs the steps of the ultrasound image processing method described above.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the ultrasound image processing method described above.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can take various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.