WO2020108366A1 - Image segmentation method, apparatus, computer device and storage medium - Google Patents

Image segmentation method, apparatus, computer device and storage medium

Info

Publication number
WO2020108366A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
image frame
position information
segmentation
key point
Prior art date
Application number
PCT/CN2019/119770
Other languages
English (en)
French (fr)
Inventor
陈思宏
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Priority to EP19890613.3A (published as EP3852009A4)
Publication of WO2020108366A1
Priority to US17/173,259 (published as US11734826B2)

Classifications

    • Parent classes: G (Physics); G06 (Computing; calculating or counting); G06T (Image data processing or generation, in general); G06F (Electric digital data processing); G06V (Image or video recognition or understanding)
    • G06T 7/11 Region-based segmentation
    • G06F 18/2148 Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G06T 3/02 Affine transformations
    • G06T 7/174 Segmentation; edge detection involving the use of two or more images
    • G06T 7/194 Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 20/40 Scenes; scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/10132 Ultrasound image
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20216 Image averaging (under G06T 2207/20212 Image combination)
    • G06T 2207/30048 Heart; cardiac
    • G06V 2201/031 Recognition of patterns in medical or anatomical images of internal organs

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image segmentation method, device, computer equipment, and storage medium.
  • In a related method, the original image and an optical flow image from the video are input into a convolutional neural network for encoding; the feature maps obtained from the respective encodings are then concatenated and decoded jointly to segment the target object from the original image.
  • This requires a large amount of data processing, which consumes substantial computing and processing resources.
  • an image segmentation method, device, computer equipment, and storage medium are provided, which can solve the problem that the related methods consume a large amount of computing and processing resources.
  • An image segmentation method includes:
  • the target object is segmented from the current image frame.
  • An image segmentation device includes:
  • the selection module is used to sequentially select the current image frame from the video in time-sequence order;
  • the affine transformation module is used to determine the reference image frame from the image frames whose time sequence in the video is before the current image frame; obtain the first position information of the key points of the target object in the reference image frame; and, with reference to the affine transformation relationship between the first position information and the target object key point template, perform affine transformation on the current image frame to obtain the target object map of the current image frame;
  • a target object information acquisition module configured to perform key point detection on the target object graph to obtain second position information of key points of the target object; segment the target object from the target object graph to obtain segmentation information of the target object;
  • the segmentation module is configured to segment the target object from the current image frame according to the segmentation information and the second position information.
  • a computer device characterized in that it includes a memory and a processor, and the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • the target object is segmented from the current image frame.
  • a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, causes the processor to perform the following steps:
  • the target object is segmented from the current image frame.
  • The above image segmentation method, apparatus, computer device, and storage medium select an image frame whose time sequence is before the current image frame in the video as the reference image frame, generate the target object key point template in advance, and use the target object key points in the reference image frame as affine transformation reference information. The current image frame is affine transformed according to the affine transformation relationship between the first position information of the target object key points in the reference image frame and the target object key point template, yielding the target object map of the current image frame. That is, based on the temporal prior knowledge carried by the first position information of the target object key points in the preceding reference image frame, combined with affine transformation, the target object map can be determined relatively quickly without extensive computation, reducing the consumption of computing and processing resources.
  • In addition, the obtained target object map is the target object's region of interest, which removes much unrelated image content. Key point detection is then performed only on the target object map to obtain the second position information of the target object key points; the target object is segmented from the target object map to obtain the segmentation information of the target object; and the segmentation information and the second position information are mapped to the current image frame.
  • In the current image frame to which the segmentation information and the second position information are mapped, the target object can be clearly distinguished, achieving segmentation of the target object in the current image frame. Moreover, performing segmentation and key point detection on the target object map both eliminates interference from other unrelated image content and reduces the amount of computation.
  • An image segmentation method includes:
  • the left ventricle is segmented from the current image frame.
  • Before the current image frame is sequentially selected from the video in time-sequence order, the method further includes:
  • taking the next image frame as the previous image frame and the position information of the left ventricular key points in that next image frame as the previous position information, returning to the step of detecting, with reference to the previous position information, the position information of the left ventricular key points in the image frame following the previous image frame, and iterating until the position information of the left ventricular key points in the last image frame of the video is obtained;
  • detecting the position information of the left ventricular key point in the next image frame of the previous image frame includes:
  • according to the affine transformation relationship between the previous position information and the left ventricular key point template, performing affine transformation on the image frame following the previous image frame to obtain the left ventricle map in that next image frame;
  • the determining the reference image frame from the image frames whose timing in the video is before the current image frame includes:
  • the segmenting the left ventricle from the current image frame according to the segmentation information and the second position information includes:
  • the segmentation information of the left ventricle determined according to the first position information of the key points of the left ventricle in each reference image frame is averaged to obtain the final segmentation information of the left ventricle;
  • the method further includes:
  • the key point detection on the left ventricular map to obtain the second position information of the key points of the left ventricle includes:
  • the segmentation of the left ventricle from the left ventricle diagram to obtain segmentation information of the left ventricle includes:
  • performing semantic segmentation processing on the feature map through the segmentation model in the multi-task network, and outputting the segmentation information of the corresponding left ventricle.
  • the method further includes:
  • the slice category accounting for the largest number of slices is taken as the slice category corresponding to the video.
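  • A minimal sketch of this majority vote over per-frame slice categories (the function name and input format below are illustrative assumptions, not prescribed by the method):

```python
from collections import Counter

def video_slice_category(frame_categories):
    """The slice category predicted for the largest number of slices is
    taken as the slice category of the whole video."""
    # frame_categories: per-frame predictions, e.g. ["A4C", "A4C", "A2C", "A4C"]
    return Counter(frame_categories).most_common(1)[0][0]

# Example: most frames were classified as apical four-chamber view
print(video_slice_category(["A4C", "A4C", "A2C", "A4C"]))  # -> "A4C"
```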
  • the key point detection processing on the feature map through the key point detection model in the multi-task network includes:
  • performing semantic segmentation processing on the feature map through the segmentation model in the multi-task network and outputting the segmentation information of the corresponding left ventricle includes:
  • for each pixel, selecting the category corresponding to the larger of the first classification probability and the second classification probability of that pixel as the category to which the pixel belongs;
  • segmentation information of the left ventricle corresponding to the left ventricle map is determined.
  • the training step of the segmentation model includes:
  • Each of the sample image frames and the corresponding first left ventricular segmentation label are input into the initial segmentation model, and iterative machine learning training is performed to obtain a basic segmentation model.
  • the training step of the segmentation model further includes:
  • the method further includes:
  • inputting the sample image frames after pixel removal, together with the corresponding second left ventricular segmentation labels, into the basic segmentation model for iterative model optimization training.
  • the step of generating the left ventricular keypoint template includes:
  • each enclosing range is expanded to obtain the cropping range;
  • a left ventricular key point template is generated.
  • An image segmentation device includes:
  • the selection module is used to sequentially select the current image frame from the cardiac ultrasound video in time-sequence order;
  • the affine transformation module is used to determine the reference image frame from the image frames whose time sequence in the video is before the current image frame; obtain the first position information of the left ventricular key points in the reference image frame; and, with reference to the affine transformation relationship between the first position information and the left ventricular key point template, perform affine transformation on the current image frame to obtain the left ventricle map of the current image frame;
  • a left ventricle information detection module configured to perform key point detection on the left ventricle map to obtain second position information of key points of the left ventricle; segment the left ventricle from the left ventricle map to obtain segmentation information of the left ventricle;
  • the segmentation module is configured to segment the left ventricle from the current image frame according to the segmentation information and the second position information.
  • a computer device characterized in that it includes a memory and a processor, and a computer program is stored in the memory, and when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • the left ventricle is segmented from the current image frame.
  • a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, causes the processor to perform the following steps:
  • the left ventricle is segmented from the current image frame.
  • The above image segmentation method, apparatus, computer device, and storage medium can determine the left ventricle map relatively quickly based on the temporal prior knowledge carried by the first position information of the left ventricular key points in the preceding reference image frame, combined with affine transformation, without extensive computation, reducing the consumption of computing resources.
  • In addition, the obtained left ventricle map is the region of interest of the left ventricle, which excludes much irrelevant image content. Key point detection is then performed only on the left ventricle map to obtain the second position information of the left ventricular key points; the left ventricle is segmented from the left ventricle map to obtain the segmentation information of the left ventricle; and the segmentation information and the second position information are mapped to the current image frame.
  • In the current image frame to which the segmentation information and the second position information are mapped, the left ventricle can be clearly distinguished, achieving segmentation of the left ventricle in the current image frame. Moreover, performing segmentation and key point detection on the left ventricle map both eliminates interference from other unrelated image content and reduces the amount of computation.
  • FIG. 1 is an application scenario diagram of an image segmentation method in an embodiment
  • FIG. 2 is a schematic flowchart of an image segmentation method in an embodiment
  • FIG. 3 is a schematic diagram of a slice view in an embodiment
  • FIG. 4 is a schematic diagram of the principle of the image segmentation method in an embodiment
  • FIG. 5 is a schematic diagram of the principle of multiple time-sequence detection in an embodiment
  • FIG. 6 is a schematic diagram of a multi-task network structure in an embodiment
  • FIG. 8 is a schematic diagram of the principle of generating a target object key point template in an embodiment
  • FIG. 9 is a block diagram of an image segmentation device in an embodiment
  • FIG. 10 is a block diagram of an image segmentation device in another embodiment
  • FIG. 11 is a schematic diagram of the internal structure of a computer device in an embodiment.
  • FIG. 1 is an application scene diagram of an image segmentation method in an embodiment.
  • the application scenario includes a terminal 110 and a server 120 connected through a network.
  • the terminal 110 may be a smart TV, a desktop computer, or a mobile terminal, and the mobile terminal may include at least one of a mobile phone, a tablet computer, a notebook computer, a personal digital assistant, a wearable device, and the like.
  • the server 120 may be implemented by an independent server or a server cluster composed of multiple physical servers.
  • the terminal 110 may transmit the video to the server 120, so that the server 120 segments the target object in each frame of the video.
  • the server 120 may sequentially select the current image frame in the video according to the time sequence; determine the reference image frame from the image frames whose time sequence in the video is before the current image frame; obtain the number of the target object key point in the reference image frame A position information; referring to the affine transformation relationship between the first position information and the target object key point template, the current image frame is affine transformed to obtain the target object image of the current image frame; key point detection is performed on the target object image To obtain the second position information of the key point of the target object; segment the target object from the target object map to obtain the segmentation information of the target object; according to the segmentation information and the second position information, segment the target from the current image frame Object.
  • the server 120 may feed back the segmentation result to the terminal 110 for display.
  • the terminal 110 may not send the video to the server for detection and analysis processing, and the terminal 110 itself may also have a function to execute the image segmentation method in each embodiment of the present application.
  • the terminal itself has a computer processing function, so that each step of the image segmentation method in each embodiment of the present application can be executed for the video.
  • the terminal 110 may also include a medical detection terminal.
  • the medical testing terminal is an instrument terminal for medical testing.
  • the medical detection terminal may include a detection probe and a display device. Among them, the detection probe can function as the lens of the camera. With the rotation of the detection probe, each structure of the detection object can be clearly displayed on the display device.
  • the medical detection terminal may be a cardiac ultrasound detector.
  • Heart ultrasound detector is an instrument that detects the heart by using ultrasound. It can be understood that the medical detection terminal may also be an instrument that performs ultrasonic detection on other parts of the human body.
  • the terminal 110 is used as a cardiac ultrasound detector as an example to illustrate the usage scenario.
  • the doctor can place the detection probe in the terminal 110 on the patient's heart to perform the detection.
  • the detection probe can use ultrasound to collect a frame-by-frame echocardiogram of the heart, and the video that constitutes the heart ultrasound detection is displayed on the display device .
  • the terminal 110 may also transmit the video of cardiac ultrasound detection to the server 120, so that the server 120 segments the left ventricle in each frame of the video.
  • It should be noted that the terminal 110 is not limited to being a medical detection terminal.
  • Moreover, the video is not limited to the cardiac ultrasound detection video and may be any other type of video; likewise, the object to be detected is not limited to the left ventricle and may be any single target object in the video.
  • For example, a single target object may be the left ventricle of the heart.
  • FIG. 2 is a schematic flowchart of an image segmentation method in an embodiment.
  • the image segmentation method is mainly applied to a computer device for illustration.
  • the computer device may be the server 120 in FIG. 1 or the terminal 110.
  • the method specifically includes the following steps:
  • S202 Sequentially select the current image frame from the video in time-sequence order.
  • the video can be any type of video.
  • the video may include frame-by-frame image frames arranged in time series.
  • the video may include ordinary videos in daily life. For example, videos recorded on mobile phones, or videos in various video programs on video websites.
  • the video may also include a specific type of video captured using specific detection techniques.
  • ultrasound video refers to the video collected when using ultrasonic technology for ultrasonic testing.
  • the ultrasound video may be a video obtained by performing ultrasound detection on human organs.
  • the ultrasound video may include a video of ultrasound examination of the abdomen.
  • the video of ultrasound examination of the abdomen is a video collected when performing ultrasound examination on the abdomen.
  • the ultrasound video may not be limited to the detection of human organs, for example, it may also be obtained by performing ultrasound detection on non-human organs.
  • the ultrasound video may include video of cardiac ultrasound detection.
  • the heart ultrasound video is the video collected during the heart ultrasound inspection.
  • the image frames in the video may be normal image frames.
  • a normal image frame refers to an image frame in which the target object is presented in its normal state.
  • the image frames in the video may also be abnormal image frames.
  • An abnormal image frame refers to an image frame in which the target object is presented in an abnormal state.
  • the image frame in the video may be a slice view.
  • a cross-sectional view refers to an effect image that simulates an object being "cut”. In the cut-away view, the target object is in a "cut" state.
  • the image frame may also be an ultrasound slice.
  • the ultrasound slice is a two-dimensional ultrasound image: the ultrasound beam generated by the detection probe scans in a fan shape after entering the chest wall, and the echo signals reflected from the human body form a slice image in the form of light spots. It can be understood that, in terms of the resulting image, the scanning surface formed by the fan-shaped scan has the effect of "cutting" the organ, so the echo signals reflected from the human body constitute the slice image in the form of light spots.
  • the slice view is not limited to being obtained by ultrasound.
  • for a target object that is already presented in sliced form in the physical world, a slice image can be obtained directly using ordinary image collection methods.
  • the ultrasound cross-sectional view may include an ultrasound cross-sectional view of the heart
  • the video of the heart ultrasound detection includes a frame-by-frame ultrasound cross-sectional view of the heart arranged in time series. That is, when the video is a heart ultrasound detection video, the current image frame may be a heart ultrasound slice.
  • the ultrasound sectional view may also include an ultrasound sectional view of the liver or an ultrasound sectional view of the lung.
  • FIG. 3 is a schematic diagram for explaining a cross-sectional view by taking an example of a cardiac ultrasonic cross-sectional view.
  • the ultrasound sound beam is scanned in a fan shape, and a scanning surface 302 is obtained.
  • This produces the effect of approximately "cutting" the heart, and the cardiac ultrasound slice view in FIG. 3 can be obtained.
  • the scanning plane does not actually cut the heart, but can form a slice image similar to the slice obtained by the scanning plane according to the distance of the echo signal reflected back.
  • the computer device may sequentially select the current image frame in the video according to the time sequence. That is, the computer device may sequentially select a current image frame according to the sequence from front to back, and perform the following steps S204 to S214 for the selected current image frame.
  • After steps S204 to S214 are completed for the selected current image frame, the next image frame can be selected from the video as the new current image frame according to the time sequence, and the method continues for the new current image frame.
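  • As a rough illustrative sketch of this per-frame loop (nothing below is prescribed by the embodiment; segment_current_frame and choose_reference_frames are hypothetical helpers standing in for steps S204 to S214):

```python
def segment_video(frames, keypoint_template):
    """Select each image frame in time-sequence order (S202) and run the
    per-frame pipeline (S204 to S214), keeping earlier results as temporal
    prior knowledge for later frames."""
    history = []  # (key points, segmentation) of already-processed frames
    for t, frame in enumerate(frames):
        refs = choose_reference_frames(history, t)                      # S204, sketched below
        result = segment_current_frame(frame, refs, keypoint_template)  # S206 to S214
        history.append(result)
    return history
```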
  • S204 Determine a reference image frame from the image frames whose timing in the video is before the current image frame.
  • the image frames in the video are all sorted according to the time sequence. It can be understood that the image frames whose time sequence is before the current image frame refer to all or some of the image frames preceding the current image frame in the video. For example, if the current image frame is the fifth image frame, the image frames whose time sequence is before it may be all or some of the first four image frames.
  • the reference image frame is an image frame whose timing for providing reference information is before the current image frame when segmenting the target object in the current image frame. It can be understood that the target object segmentation processing can be performed on the current image frame using the time-series prior reference information provided by the reference image frame.
  • the number of reference image frames may be at least one, and at least one refers to one or more than one.
  • When there is one reference image frame, it may be any one of the image frames whose time sequence is before the current image frame, or the image frame immediately before the current image frame. When there are multiple reference image frames, they may be any number of the image frames whose time sequence is before the current image frame, or multiple image frames selected in order from nearest to farthest from the current image frame.
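  • Continuing the hypothetical sketch above, selecting a preset number of reference frames in order from nearest to farthest could look like this:

```python
def choose_reference_frames(history, t, preset_number=2):
    """Return up to `preset_number` already-processed frames whose time
    sequence is before frame t, ordered from nearest to farthest."""
    start = max(0, t - preset_number)
    return list(reversed(history[start:t]))
```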
  • S206 Acquire first position information of key points of the target object in the reference image frame.
  • the target object is an object that needs to be segmented from the current image frame.
  • the target object can be any object that you want to segment.
  • the target object may be a bicycle in the video screen, or a competitor in the video screen.
  • the target object may be the left ventricle. It can be understood that the target object may also be the liver or lungs, etc., which are not listed here.
  • the target object key point is a point used to represent the characteristics of the target object.
  • the key point of the target object is the key point of the left ventricle.
  • the key points of the left ventricle refer to the points used to represent the characteristics of the left ventricle in the picture.
  • the key points of the left ventricle include the apex of the left ventricle and the two endpoints of the mitral valve.
  • Regarding the apex of the left ventricle: the apex of the heart is the conical tip at the lower left of the heart.
  • Regarding the mitral valve: the left atrioventricular valve is attached to the left fibrous atrioventricular annulus and is formed by folds of the endocardium.
  • the mitral valve has two valves, and the two end points of the mitral valve are the end points of the two valves. It should be noted that the key points of the left ventricle are not limited to the above three points, and points at other positions of the left ventricle may also be set as key points.
  • the key points of the target object may include points representing facial features. It can be understood that under normal circumstances, the features of the human facial features will basically not change, so it can show the characteristics of the face, so it can be used as a key point.
  • It should be noted that the first position information of the target object key points in the reference image frame is known. According to the image segmentation method in the embodiments of the present application, current image frames are selected in time sequence, and after the target object in one current image frame is segmented, the next image frame is selected as the new current image frame. Under this iteration, by the time the target object in the current image frame is to be segmented, the image frames whose time sequence is before it already have the first position information of the target object key points and the segmentation information of the target object. Therefore, the reference image frame determined among those preceding image frames already carries both, and the computer device can directly obtain the known first position information of the target object key points in the reference image frame.
  • S208 Perform affine transformation on the current image frame according to the affine transformation relationship between the first position information and the target object key point template to obtain the target object image in the current image frame.
  • the key point template is used to represent the position information of the preset key points.
  • the target object key point template is used to represent the preset position information of the target object key points, taking as reference an image whose main area is the target object area. That is, the preset position information of each target object key point in the template is the position information of target object key points annotated on an image whose main area is the target object area. It can be understood that the main area being the target object area means that the target object occupies the main part of the image.
  • the affine transformation relationship is used to represent the affine transformation operation to be performed from the first position information of the target object key point in the reference image frame to the preset position information of the target object key point in the target object key point template.
  • According to the affine transformation relationship, affine transformation operations such as rotation, translation, and cropping can be applied to the current image frame to obtain the target object map of the current image frame.
  • the affine transformation relationship can be represented by an affine transformation matrix.
  • the affine transformation relationship between the first position information and the target object key point template represents the affine transformation process required to transform the first position information of the target object key points in the reference image frame to the preset position information of the target object key points in the target object key point template.
  • Since the target object key points in the current image frame and those in the preceding reference image frame are the same key points of the same target object, the relative positions of the target object key points in the two image frames are consistent.
  • Therefore, performing affine transformation on the current image frame according to the affine transformation relationship is equivalent to adjusting the position information of the target object key points of the current image frame to the preset position information of the target object key points in the target object key point template.
  • Since the target object key point template represents preset key point positions on an image whose main area is the target object area, performing affine transformation on the current image frame according to the above affine transformation relationship yields the region of interest (ROI) of the target object in the current image frame, which is the target object map.
  • the region of interest is an area selected from the image that represents the focus of image analysis. It can be understood that the target object area in the target object map is this focus area.
  • the computer device may extract the preset position information of the target object key points from the target object key point template, and calculate a transformation matrix based on the first position information and the preset position information of the target object key points in the target object key point template.
  • the computer device may perform affine transformation processing on the current image frame according to the transformation matrix to obtain the target object image of the current image frame.
  • the computer device may multiply the current image frame by the transformation matrix to obtain the target object image of the current image frame.
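  • With three key points (e.g. the left ventricular apex and the two mitral-valve endpoints), the affine matrix is fully determined by the point correspondences. A minimal OpenCV sketch, assuming the key points are stored as (x, y) pairs (the function and argument names are illustrative):

```python
import numpy as np
import cv2

def warp_to_template(current_frame, ref_keypoints, template_keypoints, roi_size):
    """Compute the affine transform taking the reference frame's key point
    positions (first position information) onto the template's preset
    positions, then apply it to the current image frame to get the ROI."""
    src = np.asarray(ref_keypoints, dtype=np.float32)       # shape (3, 2)
    dst = np.asarray(template_keypoints, dtype=np.float32)  # shape (3, 2)
    M = cv2.getAffineTransform(src, dst)                    # 2x3 affine matrix
    roi = cv2.warpAffine(current_frame, M, roi_size)        # rotation/translation/crop
    return roi, M
```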
  • the target object image is the left ventricle image.
  • the left ventricle map which is the region of interest of the left ventricle, is an image that takes the area of the left ventricle as the main area.
  • S210 Perform key point detection on the target object map to obtain second position information of key points of the target object.
  • the computer device can directly perform image analysis on the target object map itself, thereby identifying key points of the target object, and obtaining second position information of the key points of the target object.
  • the computer device may input the target object graph into the multi-task network, perform key point detection processing on the target object graph through the key point detection model in the multi-task network, and output the target object corresponding to the target object graph The second position information of the key point.
  • a multi-task network is a network capable of executing multiple processing tasks in parallel.
  • Multi-task network includes key point detection model.
  • Through the key point detection model, the computer device can detect the position-information difference between the target object key points in the target object map and the target object key points in the target object key point template; the preset position information of the target object key points in the template and this difference are added to obtain the second position information of the target object key points in the target object map.
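  • In other words, the model regresses per-key-point offsets relative to the template. A sketch of the final addition step, assuming the network outputs one (dx, dy) offset per key point:

```python
import numpy as np

def keypoints_from_offsets(template_keypoints, predicted_offsets):
    """Second position information = template preset positions plus the
    position-information difference predicted by the detection model."""
    return (np.asarray(template_keypoints, dtype=np.float32)
            + np.asarray(predicted_offsets, dtype=np.float32))
```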
  • S212 Segment the target object from the target object graph to obtain segmentation information of the target object.
  • the segmentation information of the target object is used to segment the target object area from the target object graph. That is, it is used to distinguish the target object area from other areas in the target object map.
  • the segmentation information of the target object includes pixels of the target object.
  • the computer device may perform target object segmentation processing on the target object graph to obtain the segmentation contour of the target object.
  • the multi-task network may also include a segmentation model.
  • the computer device may input the target object graph into a pre-trained segmentation model in the multi-task network, and perform semantic segmentation processing on the target object graph through the segmentation model to output segmentation information of the corresponding target object.
  • the computer device can predict, through the segmentation model, the category to which each pixel in the target object map belongs; the pixels belonging to the foreground category constitute the target object segmentation information corresponding to the target object map.
  • the category to which each pixel belongs includes a foreground category and a background category.
  • the pixels belonging to the foreground category are the pixels of the target object, that is, the segmentation information of the target object corresponding to the target object map.
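  • A sketch of this per-pixel two-class decision, assuming the segmentation model outputs a foreground and a background probability map of the same size as the target object map:

```python
import numpy as np

def segmentation_mask(fg_prob, bg_prob):
    """A pixel is assigned the category with the larger classification
    probability; foreground pixels constitute the segmentation information."""
    return (np.asarray(fg_prob) > np.asarray(bg_prob)).astype(np.uint8)  # 1 = target object
```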
  • the computer device may map the segmentation information of the target object in the target object map and the second position information of the key points of the target object to the current image frame.
  • Specifically, the computer device may apply, to the segmentation information of the target object in the target object map and the second position information of the target object key points, the inverse of the affine transformation performed on the current image frame in step S208, thereby mapping the segmentation information and the second position information to the current image frame.
  • In the current image frame after the segmentation information of the target object and the second position information of the target object key points are mapped, the target object can be clearly distinguished and displayed, achieving the purpose of segmenting the target object from the current image frame.
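  • A sketch of this inverse mapping with OpenCV, reusing the 2x3 matrix M from the forward affine step (the helper and argument names are illustrative):

```python
import numpy as np
import cv2

def map_back_to_frame(mask, keypoints_roi, M, frame_size):
    """Undo the S208 affine transform so the segmentation mask and the
    second position information land in current-image-frame coordinates."""
    M_inv = cv2.invertAffineTransform(M)                     # inverse 2x3 affine
    mask_in_frame = cv2.warpAffine(mask, M_inv, frame_size,
                                   flags=cv2.INTER_NEAREST)  # keep labels crisp
    pts = np.hstack([np.asarray(keypoints_roi, dtype=np.float64),
                     np.ones((len(keypoints_roi), 1))])      # homogeneous coords
    return mask_in_frame, pts @ M_inv.T                      # (N, 2) frame coords
```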
  • It can be understood that steps S202 to S214 may be executed for each current image frame selected from the video.
  • For the first image frame, the computer device may perform target object key point detection on the entire first image frame to obtain the position information of the target object key points in the first image frame, segment the target object from the entire image frame to obtain the segmentation information of the target object, and map the position information of the target object key points and the segmentation information of the target object to the first image frame.
  • The computer device may also input the entire first image frame into the multi-task network, detect the target object key points in the first image frame through the key point detection model in the multi-task network, and perform semantic segmentation on the first image frame through the segmentation model in the multi-task network to output the segmentation information of the corresponding target object.
  • The computer device can calculate in advance the position information of the target object key points and the segmentation information of the target object for the first image frame. In this way, when the current image frame is the first image frame, the already-calculated position information of the target object key points and segmentation information of the target object of the first image frame can be directly obtained.
  • FIG. 4 illustrates an example in which the video is a cardiac ultrasound detection video, the image frames in the video are cardiac ultrasound slices, and the target object is the left ventricle.
  • FIG. 4 is described by taking the previous image frame of the current image frame as a reference image frame as an example, but the reference image frame is not limited to only the previous image frame of the current image frame.
  • the three points P1 to P3 are the left ventricular key points (Last Frame) of the previous image frame, and 402 is the current image frame.
  • the three points on the left ventricular keypoint template represent the left ventricular keypoint in the left ventricular keypoint template.
  • The computer device can, based on the affine transformation relationship between the first position information of the left ventricular key points of the previous image frame (the three points P1 to P3) and the preset position information of the left ventricular key points in the left ventricular key point template, perform affine transformation on the current image frame to obtain the left ventricle map (the ROI image in FIG. 4).
  • the computer device can input the ROI image into the key point detection model and segmentation model in the Multi-task Network, and output the segmentation information of the left ventricle in the ROI image.
  • the white area in 404 is the segmentation information of the left ventricle in the ROI image.
  • the second position information of the key points of the left ventricle in the ROI image is also output, and the positions of the three points in 406 represent the second position information of the key points of the left ventricle in the ROI image.
  • the image 408 can be obtained.
  • the points 408a, 408b, and 408c in 408 are the key points of the left ventricle in the ROI image.
  • 408a is the apex of the left ventricle
  • 408b and 408c are the two endpoints of the mitral valve.
  • the area 408d in 408 is the divided left ventricular area indicated by the division information.
  • the computer device can map the segmentation information of the left ventricle and the second position information of the key points of the left ventricle in the ROI image of 408 to the current image frame to obtain the final result of the current image frame.
  • the final result of the current image frame is the current image frame after mapping to detect relevant information of the left ventricle. It can be seen from the final result of the current image frame in FIG. 4 that the left ventricle has been distinguished from the current image frame.
  • the affine transformation network (TAN) in FIG. 4 is used to represent the network framework involved in the process from the affine transformation process to the detection of left ventricular related information in the left ventricle diagram.
  • The above image segmentation method, based on the temporal prior knowledge carried by the first position information of the target object key points in the preceding reference image frame, combined with affine transformation, can quickly determine the target object map without extensive computation, reducing the consumption of computing resources.
  • In addition, the obtained target object map is the target object's region of interest, which removes much unrelated image content. Key point detection is then performed only on the target object map to obtain the second position information of the target object key points; the target object is segmented from the target object map to obtain its segmentation information; and the segmentation information and the second position information are mapped to the current image frame.
  • In the current image frame to which the segmentation information and the second position information are mapped, the target object can be clearly distinguished, realizing detection and recognition of the target object in the current image frame. Performing segmentation and key point detection on the target object map both eliminates interference from other unrelated image content and reduces the amount of computation.
  • When the current image frame is a slice image, the slice image has a corresponding slice category.
  • Slice categories can be divided according to the image composition of the slice view.
  • the computer device can also perform a slice type recognition process on the target object graph to obtain the slice type to which the target object graph belongs.
  • When the current image frame is a cardiac ultrasound slice, the slice categories include the apical two-chamber view (A2C) and the apical four-chamber view (A4C).
  • Here, A2C denotes the apical two-chamber view and A4C denotes the apical four-chamber view.
  • The slice categories may also include other categories, for example, the apical five-chamber view.
  • the multi-task network may also include a slice classification model.
  • The computer device can perform slice classification processing on the target object map through the slice classification model in the multi-task network to obtain the slice category to which the current image frame belongs. It can be understood that when the current image frame is a cardiac ultrasound slice image, identifying the slice category to which it belongs can provide the doctor with very important diagnostic reference information. When the current image frame is another type of slice view, the recognized slice category can also provide a certain amount of reference information.
  • Before step S202, the method further includes: detecting the initial position information of the target object key points from the first image frame of the video; taking the first image frame as the previous image frame and the initial position information as the previous position information, and detecting, with reference to the previous position information, the position information of the target object key points in the image frame following the previous image frame; taking that next image frame as the previous image frame and the position information of the target object key points in it as the previous position information, and returning to the detecting step for iterative processing until the position information of the target object key points in the last image frame of the video is obtained; and treating the last image frame as the previous image frame of the first image frame, and determining the final position information of the target object key points in the first image frame with reference to the position information of the target object key points in the last image frame.
  • the computer device can detect the initial position information of the key point of the target object from the first image frame. That is, rough keypoint detection is performed on the first image frame first to obtain initial position information of the target object keypoint in the first image frame.
  • Then, the computer device can detect, with reference to the initial position information of the first image frame, the position information of the target object key points in the second image frame of the video (the image frame after the first); then detect, with reference to the position information of the target object key points in the second image frame, the position information of the target object key points in the third image frame; and so on, iterating until the position information of the target object key points in the last image frame of the video is obtained.
  • the computer device may use the last image frame as the previous image frame of the first image frame, and refer to the position information of the target object key point in the last image frame to determine the final position information of the target object key point in the first image frame.
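  • A compact sketch of this forward-then-wrap-around refinement; rough_detect and track are hypothetical stand-ins for the multi-task key point detector and the affine-warp-then-detect step described above:

```python
def refine_first_frame_keypoints(frames, template, rough_detect, track):
    """Propagate key points forward from a rough detection on frame 0,
    then treat the last frame as the 'previous' frame of the first to
    obtain refined (final) key points for the first image frame."""
    kps = rough_detect(frames[0])           # initial position information
    for nxt in frames[1:]:                  # forward pass over the video
        kps = track(kps, nxt, template)     # affine-transform + detect
    return track(kps, frames[0], template)  # wrap around to refine frame 0
```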
  • detecting the position information of the target object key point in the next image frame of the previous image frame includes: according to the affine transformation between the previous position information and the target object key point template Relationship, perform affine transformation on the next image frame of the previous image frame to obtain the target object image in the next image frame; perform key point detection on the target object image in the next image frame to obtain the Position information of key points of the target object.
  • Performing key point detection on the target object map in the next image frame obtains the position information of the target object key points in that target object map; mapping this position information to the next image frame then yields the position information of the target object key points in the next image frame.
  • For example, the second image frame is affine transformed to obtain the target object map in the second image frame; key point detection is performed on that target object map to obtain the position information of the target object key points in it; and this position information is mapped to the second image frame to obtain the position information of the target object key points in the second image frame.
  • Similarly, the third image frame is affine transformed to obtain the target object map in the third image frame, and the position information of the target object key points in that target object map is mapped to the third image frame to obtain the position information of the target object key points in the third image frame.
  • Then, the computer device can regard the last image frame as the previous image frame of the first image frame and, according to the affine transformation relationship between the position information of the target object key points in the last image frame and the target object key point template, perform affine transformation on the first image frame to obtain the target object map in the first image frame; key point detection is performed on that target object map to obtain the optimized final position information of the target object key points in the first image frame.
  • the computer device may directly obtain the final position information of the key point of the target object in the optimized first image frame.
  • The computer device may also perform target object segmentation processing on the target object map in the first image frame obtained above, to obtain the target object segmentation information of the target object map in the first image frame.
  • the computer device may segment the target object from the first image frame according to the segmentation information of the target object and the position information of the key point of the target object in the first image frame.
  • the computer device may map the segmentation information of the target object of the target object image and the position information of the key points of the target object in the first image frame to the first image frame to segment the target object from the first image frame.
  • the computer device can also perform a slice classification process on the target object image in the first image frame to obtain the slice type to which the first image frame belongs.
  • It can be understood that the final position information of the target object key points in the first image frame obtained in the above manner is optimized and more accurate than the initial position information of the first image frame. Therefore, when the reference image frame includes the first image frame, performing affine transformation on the current image frame with reference to the affine transformation relationship between the final position information of the target object key points of the first image frame and the target object key point template makes the target object map of the current image frame more accurate.
  • Accordingly, the segmentation information of the target object and the second position information of the target object key points subsequently obtained from the target object map are more accurate; mapping this more accurate information to the current image frame makes the target object information detected in the current image frame more accurate.
  • Step S204 includes: determining, in order from nearest to farthest from the current image frame, a preset number of image frames before the current image frame in the video as the reference image frames.
  • Step S214 includes: when there are multiple reference image frames, averaging the segmentation information of the target object determined according to the first position information of the target object key points in each reference image frame to obtain the final segmentation information of the target object; averaging the second position information determined according to the first position information of the target object key points in each reference image frame to obtain the final second position information of the target object key points; and mapping the final segmentation information of the target object and the final second position information of the target object key points to the current image frame.
  • The preset number may be one or more. It should be noted that "multiple" in the embodiments of this application refers to at least two.
  • The computer device may execute steps S206 to S210 for all reference image frames. That is, the computer device can separately, with reference to the affine transformation relationship between the first position information of the target object key points in each reference image frame and the target object key point template, perform affine transformation on the current image frame to obtain target object maps of the current image frame. It can be understood that with several reference image frames, the current image frame undergoes several affine transformations, producing a corresponding number of target object maps.
• key point detection is performed on each target object map to obtain second position information of the target object key points, and the target object is segmented from each target object map to obtain segmentation information of the target object. In this way, there are multiple sets of second position information and multiple sets of segmentation information of the target object.
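• A minimal sketch of this affine transformation, assuming OpenCV and illustrative coordinate values; with exactly three key points, the correspondence to the template determines the affine matrix directly:

```python
import cv2
import numpy as np

# Three target object key points detected in the reference frame (assumed values).
ref_keypoints = np.float32([[120, 80], [200, 95], [160, 190]])
# Preset positions of the three key points in the key point template (assumed values).
template_keypoints = np.float32([[60, 40], [160, 40], [110, 170]])

# Affine transformation relationship between the first position information
# and the target object key point template.
affine_matrix = cv2.getAffineTransform(ref_keypoints, template_keypoints)

current_frame = cv2.imread("current_frame.png")  # placeholder path
# Warping the current image frame yields the target object map (ROI image).
target_object_map = cv2.warpAffine(current_frame, affine_matrix, (224, 224))
```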
• the computer device may average the segmentation information of the target object determined according to the first position information of the target object key points in each reference image frame to obtain the final segmentation information of the target object, and may average the second position information determined according to the first position information of the target object key points in each reference image frame to obtain the final second position information of the target object key points.
  • the computer device may map the final segmentation information of the target object and the final second position information of key points of the target object to the current image frame.
• for example, when the preset number is two, the reference image frames may be the previous image frame of the current image frame and the image frame before that.
• the computer device may perform steps S206 to S210 separately according to the first position information of the target object key points in the previous image frame of the current image frame and according to the first position information of the target object key points in the image frame before that, finally obtaining two sets of segmentation information of the target object and two sets of second position information of the target object key points.
• the computer device may average the two sets of segmentation information and the two sets of second position information of the target object key points to obtain the final segmentation information of the target object and the final second position information of the target object key points.
  • the computer device may map the final segmentation information of the target object and the final second position information of key points of the target object to the current image frame.
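• A minimal sketch of this fusion, assuming the per-reference-frame results are numpy arrays of identical shape:

```python
import numpy as np

def fuse_reference_results(seg_maps, keypoint_sets):
    """Average the per-reference-frame segmentation information and key point
    positions to obtain the final values mapped to the current image frame."""
    final_segmentation = np.mean(np.stack(seg_maps), axis=0)
    final_keypoints = np.mean(np.stack(keypoint_sets), axis=0)  # e.g. (3, 2) coordinates
    return final_segmentation, final_keypoints

# e.g. with a preset number of two reference frames:
# final_seg, final_kpts = fuse_reference_results([seg_prev, seg_prev2], [kpts_prev, kpts_prev2])
```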
  • FIG. 5 is a schematic diagram of the principle of multi-sequence processing in an embodiment. That is, the principle of segmenting the target object in the current image frame using the first position information of the key points of the target object of the multiple time-series image frames.
  • FIG. 5 exemplifies that the video is a video of cardiac ultrasound detection, the image frame in the video is a cardiac ultrasound slice, and the target object is a left ventricle.
• a complete video is input. Since the first image frame F1 has no position information of left ventricular key points from a previous frame, a multi-task network can be used to perform rough left ventricular key point detection on the first image frame F1.
• the position information of the left ventricular key points is used as the timing affine transformation information of the next frame, so that the subsequent image frame F2 is affine transformed with reference to the affine transformation relationship between this position information and the left ventricular key point template, obtaining the corresponding left ventricular map; key point detection is then performed on the left ventricular map, and the position information of the detected left ventricular key points is mapped to image frame F2. The mapped position information of the left ventricular key points in F2 is in turn used as the timing affine transformation information of the next image frame F3, which undergoes the same affine transformation and subsequent processing to obtain its left ventricular key point position information, and so on, until the key point position information of the last image frame in the video is obtained.
• the position information of the left ventricular key points of the last image frame is fed back as the affine transformation reference information of the first image frame of the video; the first image frame is affine transformed to obtain the corresponding left ventricular map, and based on this left ventricular map of the first image frame, the optimized and more reliable final position information of the left ventricular key points in the first image frame is calculated. Based on this optimized final position information, the current image frame is then selected in sequence according to the timing.
• when the current image frame is the first image frame, the final position information of the left ventricular key points of the first image frame can be obtained directly, the left ventricular segmentation information of the first image frame is determined according to the left ventricular map of the first image frame, and the final position information of the left ventricular key points and the left ventricular segmentation information are mapped to the first image frame.
• when the current image frame is the second image frame, since only the first image frame precedes it, the final position information of the left ventricular key points of the first image frame can be referenced to determine the left ventricular map of the second image frame, on which key point detection and left ventricular segmentation are performed; the obtained segmentation information and second position information are then mapped to the second image frame.
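• A sketch of this frame-by-frame propagation and first-frame refinement; detect_rough, affine_to_roi and detect_in_roi are assumed stand-ins for the multi-task network operations described above:

```python
def propagate_keypoints(frames, detect_rough, affine_to_roi, detect_in_roi, template):
    """Rough detection on F1, frame-by-frame propagation to the last frame,
    then the last frame's key points are fed back to refine F1."""
    prev_kpts = detect_rough(frames[0])                   # rough key points of F1
    for frame in frames[1:]:
        roi = affine_to_roi(frame, prev_kpts, template)   # affine transform with previous frame's key points
        prev_kpts = detect_in_roi(roi)                    # key points of this frame
    # Close the loop: the last frame's key points refine the first frame.
    roi_first = affine_to_roi(frames[0], prev_kpts, template)
    return detect_in_roi(roi_first)                       # optimized final key points of F1
```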
• when the current image frame is the third image frame or a subsequent image frame, for example when the third image frame F3 is the current image frame, the first image frame F1 and the second image frame F2 can be used as reference image frames. Combined with the multi-task network, the second position information determined from the first position information of the left ventricular key points in the first image frame and in the second image frame is averaged, and the left ventricular segmentation information determined from the first position information of the left ventricular key points in the first image frame and in the second image frame is likewise averaged.
  • the obtained final segmentation information of the left ventricle and the final second position information of the key points of the left ventricle are integrated and mapped to the third image frame F3.
• the images after mapping F1 to F3 are f1 to f3; the three points in f1 to f3 are the left ventricular key points, the highlighted area with the three left ventricular key points as endpoints is the left ventricular region indicated by the left ventricular segmentation information, and A2C and A4C are the slice categories to which the frames belong.
• selecting multiple reference image frames forward as the affine transformation reference information of the current image frame ensures diverse sources of reference information, reduces the risk that a single reference image frame misleads subsequent results, and improves accuracy.
• the method further includes: inputting the target object map into the multi-task network, and encoding it to obtain the feature map of the target object map.
  • Step S210 includes: performing key point detection processing on the feature map through a key point detection model in the multi-task network, and outputting second position information of key points of the target object corresponding to the target object map.
  • Step S212 includes: performing segmentation processing on the feature map through the segmentation model in the multi-task network, and outputting segmentation information of the corresponding target object.
  • a multi-task network is a network capable of executing multiple processing tasks in parallel.
  • the multi-task network can include key point detection models and segmentation models.
  • the key point detection model is a machine learning model for detecting key points of the target object.
  • the segmentation model is a machine learning model for segmenting target objects.
• a feature map is the map obtained by convolving an image with a filter. It can be understood that, compared with the original image, the feature map results from feature extraction and can highlight image features better.
  • the multi-task network may also include a lightweight coding model.
  • the computer device can input the target object graph to the coding model in the multi-task network, and encode to obtain the feature graph of the target object graph.
  • the lightweight coding model may include MobileNetV2.
• the computer device may use an L1-norm loss function to regress, through the key point detection model, the second position information of the target object key points corresponding to the target object map.
  • FIG. 6 is a schematic diagram of a multi-task network structure in an embodiment.
• the 224*224*3 input image is the target object map (ROI image), which is encoded by the lightweight network MobileNetV2 to output a 7*7*1280 feature map.
• the feature map is then input into 3 different task channels, namely the slice classification channel, the target object segmentation channel, and the target object key point detection channel, and three different detection processes are performed in parallel.
• the slice classification model in the slice classification channel processes the feature map and finally obtains the binary classification result of the slice classification.
• the key point detection model in the target object key point detection channel performs regression processing and outputs the X and Y coordinate information of the three target object key points, so the position information of the target object key points consists of 6 position parameters. The segmentation model in the target object segmentation channel decodes twice to obtain the category to which each pixel in the decoded image belongs.
  • the category to which the pixel belongs includes the foreground category and the background category. It can be understood that the pixels belonging to the foreground category are the foreground, and the pixels belonging to the background category are the background.
• according to the pixels belonging to the foreground category, the target object segmentation information corresponding to the target object map is formed.
• the target object map is encoded through the multi-task network to obtain the feature map.
  • the feature map can more accurately express the feature information of the target object map.
  • the key point detection model and the segmentation model in the multi-task network process the feature map concurrently, which can improve the detection efficiency of the target object image information, and the real-time performance is relatively high.
• a multi-task network is equivalent to using a small network to achieve the accuracy of a large network, and is lightweight.
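• A minimal structural sketch of such a multi-task network in PyTorch; the 224x224x3 input, the MobileNetV2 encoder with its 7x7x1280 feature map, the three task channels and the 6 key point position parameters follow the description above, while the head layer sizes are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MultiTaskNet(nn.Module):
    def __init__(self, num_slice_classes=2, num_keypoints=3):
        super().__init__()
        self.encoder = mobilenet_v2(weights=None).features        # 224x224x3 -> 7x7x1280 feature map
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.slice_head = nn.Linear(1280, num_slice_classes)      # slice classification channel
        self.keypoint_head = nn.Linear(1280, num_keypoints * 2)   # 3 key points -> 6 position parameters
        self.seg_head = nn.Sequential(                            # segmentation channel: decode twice
            nn.ConvTranspose2d(1280, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 2, 4, stride=2, padding=1),   # foreground/background logits
        )

    def forward(self, x):
        feat = self.encoder(x)                  # (N, 1280, 7, 7)
        vec = self.pool(feat).flatten(1)        # (N, 1280)
        return self.slice_head(vec), self.keypoint_head(vec), self.seg_head(feat)
```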
• the method further includes: performing slice classification processing on the feature map through the slice classification model in the multi-task network to obtain the slice category to which the current image frame belongs.
• the multi-task network also includes a slice classification model.
  • the slice classification model is a model for detecting the slice type to which the image belongs.
• the output of the slice classification model is the slice category to which the target object map belongs. Since the target object map is extracted from the current image frame, the slice category to which the target object map belongs is the slice category to which the current image frame belongs.
  • the computer device may use the cross-entropy loss algorithm through the slice classification model to obtain the slice type to which the current image frame belongs.
• the method further includes: after determining the slice category to which each image frame in the video belongs, determining the number of image frames corresponding to each slice category, and taking the slice category with the largest number as the slice category corresponding to the video.
• for example, if f1 and f2 are classified as A2C and f3 is classified as A4C, A2C has the largest number of votes, so the slice category of the video can be determined to be A2C rather than A4C.
• determining the slice category of the video in this way is equivalent to simultaneously completing the segmentation of the target object in the video and the recognition of the standard slice, which can provide more information for subsequent processing faster.
• taking the slice category with the largest number as the slice category corresponding to the video ensures the accuracy of the determined slice category, and thus provides more accurate reference information for subsequent processing.
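• A minimal voting sketch (function name assumed):

```python
from collections import Counter

def video_slice_category(frame_categories):
    """Majority vote over the per-frame slice categories."""
    return Counter(frame_categories).most_common(1)[0][0]

print(video_slice_category(["A2C", "A2C", "A4C"]))  # -> "A2C", matching the f1-f3 example
```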
• performing key point detection processing on the feature map through the key point detection model in the multi-task network includes: inputting the feature map into the pre-trained key point detection model and outputting the position information difference between the target object key points in the target object map and the target object key points in the target object key point template; and adding the preset position information of the target object key points in the template to this position information difference, to obtain the second position information of the target object key points in the target object map.
• the computer device may extract the target object map from sample image frames in advance, and perform machine learning training based on the sample position differences between the marked target object key points in the target object maps of the sample image frames and the target object key points in the target object key point template, to obtain the key point detection model. Therefore, after the feature map is input into the key point detection model, the position information difference between the target object key points in the target object map and those in the template can be output.
• that is, the key point detection model outputs the position information difference between the target object key points in the target object map and the target object key points in the template; the preset position information of the target object key points in the template is then added to this difference, to obtain the second position information of the target object key points of the target object map.
• the position information difference involves smaller values than complete position information, thereby saving computing resources.
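• A minimal sketch of recovering the second position information from the regressed difference; all values are assumed:

```python
import numpy as np

# Preset positions of the key points in the template and the regressed
# position difference output by the detection model (values assumed).
template_positions = np.array([[60.0, 40.0], [160.0, 40.0], [110.0, 170.0]])
predicted_offsets = np.array([[3.5, -2.0], [-1.0, 4.2], [0.8, 1.1]])

# Adding the template's preset positions to the difference recovers the
# second position information of the key points in the target object map.
second_position_info = template_positions + predicted_offsets

# During training, an L1-norm loss over the offsets can be used, e.g.:
# loss = np.abs(predicted_offsets - true_offsets).mean()
```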
• performing semantic segmentation on the feature map through the segmentation model in the multi-task network and outputting the segmentation information of the corresponding target object includes: inputting the feature map into a pre-trained segmentation model for decoding, and outputting, for each pixel in the resulting decoded image, a first classification probability of belonging to the foreground category and a second classification probability of belonging to the background category; for each pixel in the decoded image, selecting the category corresponding to the larger of the first and second classification probabilities as the category to which the pixel belongs; and determining the segmentation information of the target object corresponding to the target object map according to the pixels belonging to the foreground category in the decoded image.
  • the segmentation model can predict the classification probability that each pixel in the decoded image belongs to the foreground category and the background category, respectively.
  • the segmentation information of the target object corresponding to the target object image can be obtained directly according to the pixels in the foreground category in the decoded image.
  • the decoded image may not match the size of the target object image.
• determining the category of each pixel through classification and then achieving segmentation refines the granularity of the segmentation and improves segmentation accuracy.
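• A minimal per-pixel decision sketch, assuming HxW numpy probability maps:

```python
import numpy as np

def decode_segmentation(prob_foreground, prob_background):
    """Pick, per pixel, the class with the larger classification probability;
    pixels assigned to the foreground form the target object segmentation
    information."""
    return (prob_foreground > prob_background).astype(np.uint8)  # 1 = foreground, 0 = background
```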
• the training step of the segmentation model includes: acquiring each sample image frame in the sample video; acquiring the first target object segmentation label corresponding to each sample image frame; and inputting each sample image frame and the corresponding first target object segmentation label into the initial segmentation model and performing iterative machine learning training to obtain the basic segmentation model.
• the sample video is a video used as training data for training the machine learning model.
  • the sample image frame is an image frame used to train the machine learning model in the training video.
  • the training data may include a sample video, and a first target object segmentation label corresponding to each sample image frame in the sample video.
  • the first target object segmentation label is used to mark the outline of the target object in the corresponding sample image frame.
  • the first target object segmentation label may be a manually added label.
• the first target object segmentation label can be marked in the mask image of the sample image frame. Inputting the first target object segmentation label into the initial segmentation model is equivalent to inputting the mask image of the sample image frame into the initial segmentation model, and marking the first target object segmentation label in the mask image is equivalent to marking the outline of the target object in the sample image frame.
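• A minimal sketch of such a mask image, assuming an illustrative contour and OpenCV:

```python
import cv2
import numpy as np

# An assumed (N, 2) outline of the target object in a sample image frame.
contour = np.array([[50, 60], [120, 55], [140, 130], [70, 150]], dtype=np.int32)

# The mask image: pixels inside the labeled contour are the target object.
mask = np.zeros((224, 224), dtype=np.uint8)
cv2.fillPoly(mask, [contour], color=1)
```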
  • the computer device may input each sample image frame and the corresponding first target object segmentation label into a preset initial segmentation model, and perform iterative machine learning training to obtain a basic segmentation model.
  • the computer device may input the target object graph or the feature map of the target object graph into the basic segmentation model, perform segmentation processing of the target object, and obtain segmentation information of the target object graph.
  • the computer device may further optimize and adjust the segmentation model to improve the accuracy of the segmentation model, and based on the optimized segmentation model, perform segmentation processing of the target object on the target object graph or the feature map of the target object graph.
  • the training step of the segmentation model may further include an optimization adjustment step for the basic segmentation model.
• the optimization adjustment step may include the following steps: sequentially selecting the current sample image frame from the sample image frames; for each current sample image frame, selecting, on the label contour formed by the first target object segmentation label of the previous sample image frame, a preset number of boundary feature points representing the boundary of the target object; tracking the position information of the boundary feature points in the current sample image frame through an optical flow tracking operation; connecting and smoothing the position information of the boundary feature points in the current sample image frame to obtain the second target object segmentation label of the current sample image frame; and performing iterative optimization training on the basic segmentation model according to each sample image frame and the corresponding second target object segmentation label, to obtain the optimized segmentation model.
• the computer device may sequentially select the current sample image frame from the sample image frames and, for each current sample image frame, select a preset number of boundary feature points representing the boundary of the target object on the label contour formed by the first target object segmentation label of the previous sample image frame.
  • the boundary feature points are the feature points that can represent the boundary of the target object.
• the computer device may uniformly select a preset number of points as the boundary feature points on the label contour formed by the first target object segmentation label of the previous sample image frame. For example, 20 points may be uniformly selected on that label contour as boundary feature points.
• uniformly selecting the preset number of points on the label contour as the boundary feature points of the target object can avoid interference from blurred edges and artifacts, thereby improving the accuracy of the calculation.
  • the computer device can track the position information of the selected boundary feature point in the current sample image frame through the optical flow tracking operation and using the optical flow algorithm. It can be understood that the position information of the tracked boundary feature point in the current sample image frame is equivalent to forming a new boundary feature point.
• the computer device can connect and smooth the position information of the boundary feature points in the current sample image frame. That is, the new boundary feature points formed by tracking are connected, and the label contour is formed through curve fitting, obtaining the second target object segmentation label of the current sample image frame (a new set of target object segmentation labels).
  • the second target object segmentation label is not a manually added label, but a label generated by tracking with an optical flow algorithm and used to mark the contour of the target object in the sample image frame.
  • the computer device may input each sample image frame and the second target object segmentation label generated by optical flow tracking into the basic segmentation model, perform iterative model optimization training, and obtain an optimized segmentation model.
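• A minimal sketch of this label generation step, assuming OpenCV grayscale frames; the connect-and-smooth step is simplified to a filled polygon:

```python
import cv2
import numpy as np

def track_label_contour(prev_gray, cur_gray, contour_points):
    """Track boundary feature points from the previous frame's label contour
    into the current frame with Lucas-Kanade optical flow, then connect them
    into a new contour mask (the curve-fitting smoothing is simplified)."""
    pts = contour_points.reshape(-1, 1, 2).astype(np.float32)  # e.g. 20 uniformly sampled points
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    tracked = new_pts[status.ravel() == 1].reshape(-1, 2)
    mask = np.zeros_like(cur_gray)
    cv2.fillPoly(mask, [tracked.astype(np.int32)], color=1)    # second segmentation label as a mask
    return mask
```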
• an optical flow algorithm can be used to track individual points, and can also be used to track the entire target object.
• however, optical flow algorithms have certain requirements on image quality, while video images may contain rich artifacts and blurred boundaries, which strongly mislead the tracking results of the optical flow algorithm.
• when the video is an ultrasound video, congestion, artifacts, and blurred boundaries in the target object area inevitably produce large tracking errors, so tracking the entire target object performs very poorly.
• this embodiment therefore tracks the boundary feature points of the target object, that is, key points located on the outline of the target object, since outline points carry more image contrast information than points inside the target object.
• optical flow tracking in this embodiment only occurs between two adjacent frames, so it is not necessary to add labeling information for the contour key points of the target object in every sample image frame, and the process of manually labeling the boundary feature points is avoided. Since two-frame optical flow tracking belongs to the label generation process of the online training model, the accumulated error of the optical flow algorithm need not be considered, and the implementation is simple and easy to operate.
• this is equivalent to training and learning the optical flow tracking while training the model, so that the network model itself acquires optical flow tracking capability; during testing, when segmenting the target object in the current frame, the network takes into account, through optical flow tracking, the smooth segmentation label information of the target object in the previous frame, resulting in a smoother result.
  • the segmentation label can be automatically generated, so it is very suitable for semi-supervised learning, especially for videos lacking manual annotation.
• This embodiment uses an indirect method to extend the optical flow algorithm to generate segmentation labels, so that the segmentation model can be automatically adjusted and optimized to achieve end-to-end training, which is time-efficient and easy to implement.
• the method further includes: mining the difficult sample pixels in the current sample image frame through the basic segmentation model; and removing, from the current sample image frame, pixels other than the difficult sample pixels and the target object pixels.
• the iterative optimization training of the basic segmentation model according to each sample image frame and the corresponding second target object segmentation label may include: inputting each sample image frame, after the above pixels are removed, together with the corresponding second target object segmentation label into the basic segmentation model for iterative model optimization training.
• the difficult samples are specifically mined and input into the segmentation model together with the target object pixels, so that training can target the difficult sample pixels at the edges; this improves the attention and recognition ability of the segmentation model at edges, making the edges produced by the optimized segmentation model smoother.
  • the basic segmentation model refers to the segmentation model obtained by iterative machine learning training through a set of first target object segmentation labels.
• difficult sample pixels refer to background pixels that are easily classified wrongly. In general, difficult sample pixels are located in boundary areas such as the edges of the image and the segmentation edge of the target object.
  • the pixels of the target object are foreground pixels.
  • the background pixel is a pixel other than the target pixel.
• mining the difficult sample pixels in the current sample image frame through the basic segmentation model includes: inputting each sample image frame with the corresponding second target object segmentation label into the basic segmentation model to obtain the segmentation loss of each pixel in the sample image frame; and selecting, in descending order of segmentation loss, background pixels whose number matches the number of target object pixels in the sample image frame, to obtain the difficult sample pixels. The difficult sample pixels are therefore background pixels that are misclassified.
  • the segmentation loss is used to represent the difference between the predicted value and the true value. The greater the difference between the two, the greater the segmentation loss, and the closer the two, the smaller the segmentation loss.
• the basic segmentation model already has certain segmentation capabilities, so inputting each sample image frame with the corresponding second target object segmentation label into the basic segmentation model can perform target object segmentation processing on each sample image frame to obtain the target object segmentation information of each sample image frame. Since the position information of the optical-flow-tracked boundary feature points in the current sample image frame is connected and smoothed into the second target object segmentation label, this label can represent the true values in the sample image frame: pixels inside the contour formed by the second target object segmentation label are target object pixels, and pixels outside the contour are background pixels.
• the obtained target object segmentation information of each sample image frame represents the predicted values in each sample image frame: pixels located in the segmented target object area are target object pixels, and pixels outside it are background pixels. Therefore, the computer device can determine the true value of each pixel through the second target object segmentation label, determine the predicted value of each pixel through the target object segmentation information output by the basic segmentation model, and obtain the segmentation loss of each pixel in the sample image frame by comparing predicted values with true values.
• the computer device may select, from the sample image frame, background pixels whose number matches the number of target object pixels, in descending order of the segmentation loss of the background pixels, to obtain the difficult sample pixels.
• "matching the number of target object pixels" does not require the number of background pixels to be exactly equal to the number of target object pixels; it suffices that the difference between the two is within a preset balance range, that is, the difference should not be too large, so as to avoid a large number of unnecessary calculations. For example, if the number of target object pixels is 100, the 100 background pixels with the largest segmentation loss can be selected as 100 difficult sample pixels. Assuming the balance range is plus or minus 20, then 80 to 120 background pixels can be selected as difficult sample pixels in descending order of segmentation loss.
  • the computer device can remove pixels other than the corresponding difficult sample pixels and target object pixels from each sample image frame.
  • the computer device may input each sample image frame after excluding pixels and the corresponding second target object segmentation label into the basic segmentation model, perform iterative model optimization training, and obtain an optimized segmentation model.
• the computer device can mine the difficult sample pixels through the online hard example mining (OHEM) algorithm.
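• A minimal sketch of this selection, assuming a per-pixel segmentation loss map and a foreground mask as numpy arrays:

```python
import numpy as np

def ohem_keep_mask(per_pixel_loss, fg_mask):
    """Keep all target object (foreground) pixels plus the background pixels
    with the largest segmentation loss, matching the foreground count; all
    other pixels would be removed before the optimization step."""
    num_fg = int(fg_mask.sum())
    bg_loss = np.where(fg_mask == 0, per_pixel_loss, -np.inf)  # only background pixels compete
    hardest = np.argsort(bg_loss, axis=None)[::-1][:num_fg]    # difficult sample pixels
    keep = fg_mask.astype(bool)
    keep.flat[hardest] = True
    return keep
```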
  • FIG. 7 is a method for adaptively training a segmentation model for segmenting smooth edges in an embodiment.
• in FIG. 7, the video is a video of cardiac ultrasound detection, the image frames in the video are cardiac ultrasound slices, and the target object is the left ventricle. Therefore, the first target object segmentation label is the first left ventricular segmentation label, and the second target object segmentation label is the second left ventricular segmentation label.
• the first left ventricular segmentation label is manually marked. Therefore, the manually labeled first left ventricular segmentation label and the corresponding sample image frame can be used as sample data for machine learning training to train the basic segmentation model, that is, step 1 is executed.
• the computer device can uniformly select a preset number of points on the label contour represented by the first left ventricular segmentation label of the (t-1)-th sample image frame as boundary feature points, track the position information of these boundary feature points in the t-th sample image frame through the Lucas-Kanade (LK) optical flow algorithm, and then connect and smooth the position information of the boundary feature points in the t-th sample image frame; the resulting smoothly connected label is the second left ventricular segmentation label.
  • the dark histogram in 702 represents background pixels, and the light histogram represents left ventricular pixels (ie, foreground pixels).
  • the left group of histograms in 702 represents the second left ventricular segmentation label.
• the numbers of background pixels and left ventricular pixels in the t-th sample image frame can be seen there: the background pixels significantly outnumber the left ventricular pixels, so balancing is needed, and pixels other than the difficult sample pixels and the left ventricular pixels are removed from the t-th sample image frame to reduce the unnecessary calculations caused by too many background pixels. The group of histograms on the right shows the numbers of difficult sample pixels and left ventricular pixels in the t-th sample image frame after the removal; the numbers of background pixels and left ventricular pixels are clearly more balanced, so that the difference between them is not too large.
  • the mask image of the t-th sample image frame after excluding pixels is 704.
• the second left ventricular segmentation label of the t-th sample image frame is still included in 704. The sample image frame after pixel removal and the corresponding second left ventricular segmentation label can then be input into the basic segmentation model, that is, step 2 is executed to perform iterative model optimization training. It should be noted that FIG. 7 is only used for illustration, not for limitation.
• in the above manner, the computer automatically performs optical flow tracking to generate a new set of target object segmentation labels, namely the second target object segmentation labels; combined with the mining of difficult samples, the segmentation model can be adaptively optimized, and the optimization is realized automatically during model training, saving a lot of complicated test work.
• the method of adaptively training the segmentation model for segmenting smooth edges only selects points on the label contour formed by the target object segmentation label for optical flow tracking, that is, local optical flow tracking, so it does not require a large amount of calculation and saves computing resources.
• the target object key point template generation step includes: expanding, by a preset range, the label contour formed by the first target object segmentation label of each sample image frame in the sample video; expanding each expanded range according to the position regularity of the target object in the echocardiograms of the preset slice categories, to obtain a cropping range; cropping, from each sample image frame, a cropped picture corresponding to the cropping range; averaging the position information of the target object key points in each cropped picture to obtain the preset position information of the target object key points; and generating the target object key point template according to the preset position information of the target object key points.
  • each sample image frame in each sample video has a corresponding first target object segmentation label.
  • the target object segmentation label is a label used to represent the external contour of the target object.
• the computer device can expand the label contour formed by the corresponding first target object segmentation label by a preset range; after the expansion, the area within the label contour can basically cover the target object area in echocardiograms of all the different slice categories. Therefore, the area within the expanded label contour can be roughly regarded as the position of the target object.
• the computer device may, according to the position regularity of the target object in the preset slice views of different slice categories, further expand the expanded range in each sample image frame to obtain the cropping range.
• the expansion performed here is to determine the cropping range; the label contour formed by the first target object segmentation label is no longer expanded. It can be understood that, based on the position regularity of the target object in the preset slice views of different slice categories, a range larger than the range formed by the label contour is selected as the cropping range.
  • the cropping range covers the range formed by the label outline.
• for example, in the preset slice views of different slice categories, the left ventricle is regularly located in the upper left corner of the cardiac ultrasound slice.
• the computer device can, on the basis of the expanded range, extend the range toward the left and the bottom of the sample image frame by 50% of the width of the left ventricle, to obtain the cropping range.
  • the expanded cropping range can include not only the left ventricular region, but also more information for determining the type of cut plane.
• the computer device can crop out, from each sample image frame, a cropped picture corresponding to the cropping range. In this way, multiple cropped pictures are obtained; there are as many cropped pictures as there are sample image frames.
  • the computer device may adjust the size of the image cropped according to the cropping range to a size consistent with the input size of the multitasking network, and use the adjusted image as the cropped image.
  • the computer device can average the position information of the target object key points in all the cropped pictures to obtain preset position information of the target object key points; and generate the target object key point templates according to the preset position information of the target object key points.
  • the computer device may determine the target object key point represented by the first target object segmentation label in the cropped picture, and determine the position information of the target object key point in the cropped picture .
  • the computer device can average the position information of the key points of the target object in all the cropped pictures to obtain the preset position information of the key points of the target object.
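• A minimal averaging sketch, assuming each cropped picture contributes a (3, 2) array of key point coordinates:

```python
import numpy as np

def build_keypoint_template(keypoints_per_crop):
    """Average the key point positions over all cropped pictures to obtain the
    preset position information of the target object key point template."""
    return np.mean(np.stack(keypoints_per_crop), axis=0)
```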
  • sample videos belonging to different slice types can be used as training data, so the sample image frame also corresponds to multiple slice types.
• in this way, the target object key point template determined based on sample image frames of different slice categories can be used to detect image frames of various slice categories.
  • FIG. 8 is a schematic diagram of the principle of generating a target object key point template in an embodiment. It should be noted that FIG. 8 exemplifies the image frame in the video as a heart ultrasound slice and the target object as the left ventricle. Then, the left ventricular keypoint template is the target object keypoint template to be generated. Referring to FIG. 8
• the cardiac ultrasound slices of different slice categories A2C and A4C are used as the basic data for generating the left ventricular key point template, through the following series of processing: the label contour of the left ventricle in each cardiac ultrasound slice serving as a sample image frame is expanded by a certain range; the expanded range is further extended according to the position regularity of the left ventricle in the different categories of slice views to obtain a cropping range; pictures are then cropped according to the cropping range and adjusted to match the input size of the multi-task network; and the position information of the left ventricular key points of all the cropped pictures is averaged to obtain the preset position information of the left ventricular key points, from which the left ventricular key point template 802 is finally generated.
• averaging the position information of the target object key points determined in the above manner can improve the accuracy and applicability of the target object key point template, and thus provides an accurate reference for the subsequent affine transformation.
  • an image segmentation apparatus 900 is provided, characterized in that the apparatus includes: a selection module 902, an affine transformation module 904, a target object information acquisition module 906, and a segmentation module 908, where:
  • the selection module 902 is used to sequentially select the current image frame in the video according to the time sequence.
  • the affine transformation module 904 is used to determine the reference image frame from the image frame whose timing in the video is before the current image frame; obtain the first position information of the target object key point in the reference image frame; refer to the first position information and The affine transformation relationship between the target object key point templates, and the affine transformation of the current image frame to obtain the target object image of the current image frame.
  • the target object information acquisition module 906 is used to perform key point detection on the target object map to obtain second position information of key points of the target object; segment the target object from the target object map to obtain segmentation information of the target object.
  • the segmentation module 908 is used to segment the target object from the current image frame according to the segmentation information and the second position information.
  • the device 900 further includes:
• the first frame key point information optimization module 901 is used to: detect the initial position information of the target object key points from the first image frame of the video; take the first image frame as the previous image frame and the initial position information as the previous position information, and detect, with reference to the previous position information, the position information of the target object key points in the next image frame after the previous image frame; take that next image frame as the previous image frame and the position information of the target object key points in it as the previous position information, and return to the step of detecting, with reference to the previous position information, the position information of the target object key points in the next image frame after the previous image frame, to perform iterative processing until the position information of the target object key points in the last image frame of the video is obtained; and take the last image frame as the previous image frame of the first image frame, and determine the final position information of the target object key points in the first image frame with reference to the position information of the target object key points in the last image frame.
  • the first frame key point information optimization module 901 is also used to perform an affine transformation on the next image frame of the previous image frame according to the affine transformation relationship between the previous position information and the target object key point template To obtain the target object image in the next image frame; perform key point detection on the target object image in the next image frame to obtain the position information of the target object key point in the next image frame.
• the affine transformation module 904 is also used to determine a preset number of image frames before the current image frame in the video as reference image frames, in order from nearest to farthest from the current image frame; the segmentation module 908 is also used to: when there are multiple reference image frames, average the segmentation information of the target object determined according to the first position information of the target object key points in each reference image frame to obtain the final segmentation information of the target object; average the second position information determined according to the first position information of the target object key points in each reference image frame to obtain the final second position information of the target object key points; and map the final segmentation information of the target object and the final second position information of the target object key points to the current image frame.
• the target object information acquisition module 906 is also used to: input the target object map into the multi-task network and encode it to obtain the feature map of the target object map; perform key point detection processing on the feature map through the key point detection model in the multi-task network and output the second position information of the target object key points corresponding to the target object map; and perform semantic segmentation on the feature map through the segmentation model in the multi-task network and output the segmentation information of the corresponding target object.
• the target object information acquisition module 906 is also used to: perform slice classification processing on the feature map through the slice classification model in the multi-task network to obtain the slice category to which the current image frame belongs; after the slice category to which each image frame in the video belongs is determined, determine the number of image frames corresponding to each slice category; and take the slice category with the largest number as the slice category corresponding to the video.
  • the target object information acquisition module 906 is further used to input the feature map into the pre-trained key point detection model, and output the target object key points in the target object map and the target object key points in the target object key point template The difference in position information between them; adding the preset position information of the target object key point in the target object key point template to the difference in position information to obtain the second position information of the target object key point in the target object map.
• the target object information acquisition module 906 is further used to: input the feature map into a pre-trained segmentation model for decoding, and output, for each pixel in the resulting decoded image, a first classification probability of belonging to the foreground category and a second classification probability of belonging to the background category; for each pixel in the decoded image, select the category corresponding to the larger of the first and second classification probabilities as the category to which the pixel belongs; and determine the segmentation information of the target object corresponding to the target object map according to the pixels belonging to the foreground category in the decoded image.
  • the target object information obtaining module 906 is further used to obtain each sample image frame in the sample video; obtain a first target object segmentation label corresponding to each sample image frame; and each sample image frame and the corresponding first The target object segmentation label is input into the initial segmentation model, and iterative machine learning training is performed to obtain the basic segmentation model.
• the target object information acquisition module 906 is also used to: sequentially select the current sample image frame from the sample image frames; for each current sample image frame, select, on the label contour formed by the first target object segmentation label of the previous sample image frame, a preset number of boundary feature points representing the boundary of the target object; track the position information of the boundary feature points in the current sample image frame through an optical flow tracking operation; connect and smooth the position information of the boundary feature points in the current sample image frame to obtain the second target object segmentation label of the current sample image frame; and iteratively optimize and train the basic segmentation model based on each sample image frame and the corresponding second target object segmentation label, to obtain the optimized segmentation model.
• the target object information acquisition module 906 is also used to: mine the difficult sample pixels in the current sample image frame through the basic segmentation model, the difficult sample pixels being background pixels that are misclassified; remove, from the current sample image frame, pixels other than the difficult sample pixels and the target object pixels; and input each sample image frame after pixel removal and the corresponding second target object segmentation label into the basic segmentation model for iterative model optimization training.
• the affine transformation module 904 is further used to: expand, by a preset range, the label contour formed by the first target object segmentation label of each sample image frame in the sample video; expand each expanded range according to the position regularity of the target object in the image frames, to obtain the cropping range; crop out, from each sample image frame, the cropped picture matching the cropping range; average the position information of the target object key points in each cropped picture to obtain the preset position information of the target object key points; and generate the target object key point template according to the preset position information of the target object key points.
  • the computer device may be the server 120 shown in FIG. 1. It can be understood that the computer device may also be the terminal 110.
  • the computer device includes a processor, memory, and network interface connected by a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device can store an operating system and a computer program. When the computer program is executed, it may cause the processor to execute an image segmentation method.
  • the processor of the computer device is used to provide calculation and control capabilities, and support the operation of the entire computer device.
• a computer program may be stored in the internal memory; when the computer program is executed by the processor, it may cause the processor to execute an image segmentation method.
  • the network interface of the computer device is used for network communication.
• FIG. 11 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different component arrangement.
  • the image segmentation apparatus provided by the present application may be implemented in the form of a computer program.
• the computer program may run on the computer device shown in FIG. 11, and the non-volatile storage medium of the computer device may store the various program modules composing the image segmentation apparatus, for example, the selection module 902, the affine transformation module 904, the target object information acquisition module 906, and the segmentation module 908 shown in FIG. 9.
  • the computer program composed of each program module is used to cause the computer device to perform the steps in the image segmentation methods of the various embodiments of the present application described in this specification.
• for example, the computer device may, through the selection module 902 in the image segmentation apparatus 900 shown in FIG. 9, sequentially select the current image frame in the video according to the time sequence.
• the computer device can, through the affine transformation module 904, determine the reference image frame from the image frames whose timing in the video is before the current image frame; obtain the first position information of the target object key points in the reference image frame; and perform an affine transformation on the current image frame with reference to the affine transformation relationship between the first position information and the target object key point template, to obtain the target object map of the current image frame.
  • the computer device may perform key point detection on the target object map through the target object information acquisition module 906 to obtain second position information of key points of the target object; segment the target object from the target object map to obtain segmentation information of the target object.
  • the computer device can segment the target object from the current image frame according to the segmentation information and the second position information through the segmentation module 908.
• a computer device is provided, which includes a memory and a processor.
• the memory stores a computer program. When the computer program is executed by the processor, the processor is caused to perform the steps of the image segmentation method in the above embodiments.
• when the computer program is executed by the processor, the processor is caused to perform the following operations:
• detect the initial position information of the target object key points from the first image frame of the video; take the first image frame as the previous image frame and the initial position information as the previous position information, and detect, with reference to the previous position information, the position information of the target object key points in the next image frame after the previous image frame; take that next image frame as the previous image frame and the position information of the target object key points in it as the previous position information, and return to the step of detecting, with reference to the previous position information, the position information of the target object key points in the next image frame after the previous image frame, to perform iterative processing until the position information of the target object key points in the last image frame of the video is obtained; and take the last image frame as the previous image frame of the first image frame, and determine the final position information of the target object key points in the first image frame with reference to the position information of the target object key points in the last image frame.
• when the computer program is executed by the processor, the processor is caused to perform the following operations:
• according to the affine transformation relationship between the previous position information and the target object key point template, perform an affine transformation on the next image frame after the previous image frame to obtain the target object map in that image frame; and perform key point detection on the target object map in that image frame to obtain the position information of the target object key points in it.
• when the computer program is executed by the processor, the processor is caused to perform the following operations:
• determine a preset number of image frames before the current image frame in the video as reference image frames; when there are multiple reference image frames, average the segmentation information of the target object determined according to the first position information of the target object key points in each reference image frame to obtain the final segmentation information of the target object; average the second position information determined according to the first position information of the target object key points in each reference image frame to obtain the final second position information of the target object key points; and map the final segmentation information of the target object and the final second position information of the target object key points to the current image frame.
• when the computer program is executed by the processor, the processor is caused to perform the following operations:
• input the target object map into the multi-task network and encode it to obtain the feature map of the target object map; perform key point detection processing on the feature map through the key point detection model in the multi-task network and output the second position information of the target object key points corresponding to the target object map; and perform semantic segmentation processing on the feature map through the segmentation model in the multi-task network and output the segmentation information of the corresponding target object.
• when the computer program is executed by the processor, the processor is caused to perform the following operations:
  • the feature map is subjected to facet classification processing to obtain the facet category to which the current image frame belongs; when the facet category to which each image frame in the video belongs is determined, it is determined that each facet category corresponds to The number of image frames; the cut category with the largest number is regarded as the cut category corresponding to the video.
  • When the computer program is executed by the processor, the processor is further caused to perform the following operations: inputting the feature map into a pre-trained key point detection model, and outputting the position information difference between the target object key points in the target object map and the target object key points in the target object key point template; and adding the preset position information of the target object key points in the target object key point template to the position information difference, to obtain the second position information of the target object key points of the target object map.
  • When the computer program is executed by the processor, the processor is further caused to perform the following operations: inputting the feature map into a pre-trained segmentation model for decoding, and outputting, for each pixel in the resulting decoded image, a first classification probability of belonging to the foreground category and a second classification probability of belonging to the background category; for each pixel in the decoded image, selecting the category corresponding to the larger of the first and second classification probabilities as the category of the pixel; and determining, according to the pixels belonging to the foreground category in the decoded image, the segmentation information of the target object corresponding to the target object map.
  • When the computer program is executed by the processor, the processor is further caused to perform the following operations: obtaining each sample image frame in a sample video; obtaining the first target object segmentation label corresponding to each sample image frame; and inputting each sample image frame and the corresponding first target object segmentation label into an initial segmentation model for iterative machine learning training, to obtain a basic segmentation model.
  • When the computer program is executed by the processor, the processor is further caused to perform the following operations: selecting current sample image frames from the sample image frames in sequence, and for each current sample image frame, selecting a preset number of boundary feature points representing the target object boundary from the label contour formed by the first target object segmentation label of the previous sample image frame; tracking the position information of the boundary feature points in the current sample image frame through an optical flow tracking operation; connecting and smoothing the position information of the boundary feature points in the current sample image frame, to obtain a second target object segmentation label of the current sample image frame; and performing iterative optimization training on the basic segmentation model according to each sample image frame and the corresponding second target object segmentation label, to obtain an optimized segmentation model.
  • When the computer program is executed by the processor, the processor is further caused to perform the following operations: mining hard sample pixels in the current sample image frame through the basic segmentation model, the hard sample pixels being misclassified background pixels; and excluding, from the current sample image frame, pixels other than the hard sample pixels and the target object pixels. The iterative optimization training of the basic segmentation model according to each sample image frame and the corresponding second target object segmentation label includes: inputting each sample image frame after pixel exclusion and the corresponding second target object segmentation label into the basic segmentation model for iterative model optimization training.
  • When the computer program is executed by the processor, the processor is further caused to perform the following operations: expanding outward, by a preset range, the label contour formed by the first target object segmentation label of each sample image frame in the sample video; enlarging each expanded range according to the position pattern of the target object in the image frames, to obtain a cropping range; cropping, from each sample image frame, a cropped picture that matches the cropping range; averaging the position information of the target object key points in each cropped picture, to obtain the preset position information of the target object key points; and generating the target object key point template according to the preset position information of the target object key points.
  • A computer-readable storage medium is provided, which stores a computer program. When the computer program is executed by a processor, the processor is caused to perform the steps of the image segmentation method above, together with the same operations listed for the computer device.
  • The steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless clearly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same moment, but may be executed at different moments, and their execution is not necessarily sequential; they may be executed in turn or alternately with at least some of the other steps, or with sub-steps or stages of the other steps.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to an image segmentation method and apparatus, a computer device, and a storage medium. The method includes: selecting current image frames from a video sequentially in time order; determining a reference image frame from image frames whose time sequence in the video precedes the current image frame; obtaining first position information of target object key points in the reference image frame; performing affine transformation on the current image frame according to the affine transformation relationship between the first position information and a target object key point template, to obtain a target object map of the current image frame; performing key point detection on the target object map to obtain second position information of the target object key points; segmenting the target object from the target object map to obtain segmentation information of the target object; and segmenting the target object from the current image frame according to the segmentation information and the second position information. The solution of this application reduces the amount of computation.

Description

图像分割方法、装置、计算机设备及存储介质
本申请要求于2018年11月27日提交、申请号为2018114256948、发明名称为“图像分割方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种图像分割方法、装置、计算机设备及存储介质。
背景技术
随着科学技术的飞速发展,各种视频技术的应用也越来越广泛。视频中每帧图像中都可能存在多个对象,在一些应用场景中,通常需要将其中某个目标对象从视频图像中分割出来。比如,在医学领域,通常需要从针对人体区域的超声视频图像中分割出某一部分的图像。
相关方法在针对视频图像的某一目标对象进行分割时,是将视频中的原图和光流图一并输入至卷积神经网络中进行编码,然后将各自编码后得到的特征图串联,再统一解码,以从原图中分割出目标对象。这样一来,需要进行大量的数据处理,耗费大量计算处理资源。
发明内容
基于此,提供一种图像分割方法、装置、计算机设备及存储介质,可以解决相关方法需耗费大量计算处理资源的问题。
一种图像分割方法,所述方法包括:
在视频中按照时序依次选取当前图像帧;
从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
获取所述参考图像帧中目标对象关键点的第一位置信息;
参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的目标对象图;
对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;
从所述目标对象图中分割目标对象,得到目标对象的分割信息;
根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象。
一种图像分割装置,所述装置包括:
选取模块,用于在视频中按照时序依次选取当前图像帧;
仿射变换模块,用于从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取所述参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的目标对象图;
目标对象信息获取模块,用于对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从所述目标对象图中分割目标对象,得到目标对象的分割信息;
分割模块,用于根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象。
一种计算机设备,其特征在于,包括存储器和处理器,所述存储器中存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:
在视频中按照时序依次选取当前图像帧;
从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
获取所述参考图像帧中目标对象关键点的第一位置信息;
参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的目标对象图;
对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;
从所述目标对象图中分割目标对象,得到目标对象的分割信息;
根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象。
一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行如下步骤:
在视频中按照时序依次选取当前图像帧;
从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
获取所述参考图像帧中目标对象关键点的第一位置信息;
参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的目标对象图;
对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;
从所述目标对象图中分割目标对象,得到目标对象的分割信息;
根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象。
上述图像分割方法、装置、计算机设备和存储介质,选取视频中时序位于当前图像帧之前的图像帧,作为参考图像帧,并预先生成了目标对象关键点模板,将参考图像帧中目标对象关键点的第一位置信息作为仿射变换参考信息,按照参考图像帧中目标对象关键点的第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换处理,得到所述当前图像帧中的目标对象图。即,根据之前的参考图像帧的目标对象关键点的第一位置信息这一时序先验知识,结合仿射变换处理,能够比较快速地确定出目标对象图,而不需要大量计算,减少了计算处理资源。得到的目标对象图中即为目标对象的感兴趣区域,相当于剔除了很多不相关的其他图像内容,进而仅针对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从所述目标对象图中分割目标对象,得到目标对象的分割信息,将所述分割信息和所述第二位置信息映射至所述当前图像帧。映射有分割信息和第二位置信息的当前图像帧中就能够明显地区分出目标对象,实现了对当前图像帧中目标对象的分割,而且基于目标对象图进行的分割和关键点检测处理,既排除了其他不相关的图像的干扰,又能减少计算量。
一种图像分割方法,所述方法包括:
在心脏超声检测的视频中按照时序依次选取当前图像帧;
从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
获取所述参考图像帧中左心室关键点的第一位置信息;
参照第一位置信息和左心室关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的左心室图;
对所述左心室图进行关键点检测,得到左心室关键点的第二位置信息;
从所述左心室图中分割左心室,得到左心室的分割信息;
根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出左心室。
在其中一个实施例中,所述在视频中按照时序依次选取当前图像帧之前,所述方法还包括:
从所述视频的首位图像帧中,检测出左心室关键点的初始位置信息;
将所述首位图像帧作为前一图像帧以及将所述初始位置信息作为前一位置信息,参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的左心室关键点的位置信息;
将所述后一图像帧作为前一图像帧以及将所述后一图像帧中的左心室关键点的位置信息作为前一位置信息,返回所述参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的左心室关键点的位置信息的步骤,以进行迭代处理,直至得到所述视频的末位图像帧中的左心室关键点的位置信息;
将末位图像帧当作所述首位图像帧的前一图像帧,参照末位图像帧中的左心室关键点的位置信息,确定首位图像帧中左心室关键点最终的位置信息。
在其中一个实施例中,所述参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的左心室关键点的位置信息包括:
按照所述前一位置信息和左心室关键点模板之间的仿射变换关系,对所述前一图像帧的后一图像帧进行仿射变换,得到所述后一图像帧中的左心室图;
对所述后一图像帧中的左心室图进行关键点检测,得到所述后一图像帧中的左心室关键点的位置信息。
在其中一个实施例中,所述从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧包括:
按照距所述当前图像帧由近到远的顺序,将所述视频中在所述当前图像帧之前的预设数量的图像帧,确定为参考图像帧;
所述根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出左心室,包括:
当参考图像帧为多个时,则对依照每个参考图像帧中左心室关键点的第一位置信息所确定出的左心室的分割信息求平均,得到左心室最终的分割信息;
求取分别依照每个参考图像帧中左心室关键点的第一位置信息所确定出的第二位置信息的平均值,得到左心室关键点最终的第二位置信息;
将所述左心室最终的分割信息和所述左心室关键点最终的第二位置信息映射至所述当前图像帧。
在其中一个实施例中,所述方法还包括:
将所述左心室图输入至多任务网络中,编码得到所述左心室图的特征图;
所述对所述左心室图进行关键点检测,得到左心室关键点的第二位置信息包括:
通过所述多任务网络中的关键点检测模型,对所述特征图进行关键点检测处理,输出与 所述左心室图对应的左心室关键点的第二位置信息;
所述从所述左心室图中分割左心室,得到左心室的分割信息包括:
通过所述多任务网络中的分割模型,对所述特征图进行语义分割处理,输出相应左心室的分割信息。
在其中一个实施例中,所述将所述左心室图输入至多任务网络中,编码得到所述左心室图的特征图之后,所述方法还包括:
通过所述多任务网络中的切面分类模型,对所述特征图进行切面分类处理,得到所述当前图像帧所属的切面类别;
当确定出所述视频中每一图像帧所属的切面类别后,则确定每个切面类别所对应的图像帧的数量;
将数量最多的切面类别作为所述视频所对应的切面类别。
在其中一个实施例中,所述通过所述多任务网络中的关键点检测模型,对所述特征图进行关键点检测处理包括:
将特征图输入预先训练的关键点检测模型中,输出所述左心室图中的左心室关键点与左心室关键点模板中的左心室关键点之间的位置信息差值;
将所述左心室关键点模板中的左心室关键点的预设位置信息与所述位置信息差值相加,得到所述左心室图的左心室关键点的第二位置信息。
在其中一个实施例中,所述通过多任务网络中的分割模型,对所述特征图进行语义分割处理,输出相应左心室的分割信息包括:
将所述特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;
针对所述解码图像中每个像素点,选取所述像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为所述像素点所属的类别;
根据所述解码图像中属于前景类别的各像素点,确定与所述左心室图相应的左心室的分割信息。
在其中一个实施例中,所述分割模型的训练步骤包括:
获取样本视频中的各样本图像帧;
获取分别与各所述样本图像帧相应的第一左心室分割标签;
将各所述样本图像帧和相应第一左心室分割标签输入初始分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
在其中一个实施例中,所述分割模型的训练步骤还包括:
从所述样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,
从所述当前样本图像帧的前一样本图像帧的第一左心室分割标签所形成的标签轮廓上,选取预设数量的表示左心室边界的边界特征点;
通过光流跟踪操作,跟踪所述边界特征点在所述当前样本图像帧中的位置信息;
将所述边界特征点在所述当前样本图像帧中的位置信息连接并进行平滑,得到所述当前样本图像帧的第二左心室分割标签;
根据每个样本图像帧和相应的第二左心室分割标签对所述基本的分割模型进行迭代优化训练,得到优化后的分割模型。
在其中一个实施例中,所述方法还包括:
通过所述基本的分割模型挖掘所述当前样本图像帧中的难样本像素点,所述难样本像素点为分类错误的背景像素点;
从所述当前样本图像帧中剔除除所述难样本像素点和左心室像素点之外的像素点;
所述根据每个样本图像帧和相应的第二左心室分割标签对所述基本的分割模型进行迭代优化训练包括:
将每个剔除像素点之后的样本图像帧和相应的第二左心室分割标签输入所述基本的分割模型中,进行迭代地模型优化训练。
在其中一个实施例中,所述左心室关键点模板生成步骤包括:
将样本视频中各样本图像帧的第一左心室分割标签所形成的标签轮廓外扩预设范围;
按照左心室在图像帧中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;
从每个所述样本图像帧中,裁剪出与所述裁剪范围相符的剪裁图片;
将与每个剪裁图片中的左心室关键点的位置信息求平均,得到左心室关键点的预设位置信息;
根据所述左心室关键点的预设位置信息,生成左心室关键点模板。
一种图像分割装置,所述装置包括:
选取模块,用于在心脏超声检测的视频中按照时序依次选取当前图像帧;
仿射变换模块,用于从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取所述参考图像帧中左心室关键点的第一位置信息;参照第一位置信息和左心室关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的左心室图;
左心室信息检测模块,用于对所述左心室图进行关键点检测,得到左心室关键点的第二位置信息;从所述左心室图中分割左心室,得到左心室的分割信息;
分割模块,用于根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出左心室。
一种计算机设备,其特征在于,包括存储器和处理器,所述存储器中存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:
在心脏超声检测的视频中按照时序依次选取当前图像帧;
从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
获取所述参考图像帧中左心室关键点的第一位置信息;
参照第一位置信息和左心室关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的左心室图;
对所述左心室图进行关键点检测,得到左心室关键点的第二位置信息;
从所述左心室图中分割左心室,得到左心室的分割信息;
根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出左心室。
一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行如下步骤:
在心脏超声检测的视频中按照时序依次选取当前图像帧;
从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
获取所述参考图像帧中左心室关键点的第一位置信息;
参照第一位置信息和左心室关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的左心室图;
对所述左心室图进行关键点检测,得到左心室关键点的第二位置信息;
从所述左心室图中分割左心室,得到左心室的分割信息;
根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出左心室。
上述图像分割方法、装置、计算机设备和存储介质,根据之前的参考图像帧的左心室关键点的第一位置信息这一时序先验知识,结合仿射变换处理,能够比较快速地确定出左心室图,而不需要大量计算,减少了计算处理资源。得到的左心室图中即为左心室的感兴趣区域,相当于剔除了很多不相关的其他图像内容,进而仅针对所述左心室图进行关键点检测,得到左心室关键点的第二位置信息;从所述左心室图中分割左心室,得到左心室的分割信息,将所述分割信息和所述第二位置信息映射至所述当前图像帧。映射有分割信息和第二位置信息的当前图像帧中就能够明显地区分出左心室,实现了对当前图像帧中的左心室的分割,而且基于左心室图进行的分割和关键点检测处理,既排除了其他不相关的图像的干扰,又能减少计算量。
附图说明
图1为一个实施例中图像分割方法的应用场景图;
图2为一个实施例中图像分割方法的流程示意图;
图3为一个实施例中切面图的示意图;
图4为一个实施例中图像分割方法的原理示意图;
图5为一个实施例中多时序检测的原理示意图;
图6为一个实施例中多任务网络结构示意图;
图7为一个实施例中自适应训练分割光滑边缘的分割模型的方法;
图8为一个实施例中生成目标对象关键点模板的原理示意图;
图9为一个实施例中图像分割装置的框图;
图10为另一个实施例中图像分割装置的框图;
图11为一个实施例中计算机设备的内部结构示意图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
图1为一个实施例中图像分割方法的应用场景图。参照图1,该应用场景中包括通过网络连接的终端110和服务器120。终端110可以是智能电视机、台式计算机或移动终端,移动终端可以包括手机、平板电脑、笔记本电脑、个人数字助理和穿戴式设备等中的至少一种。服务器120可以用独立的服务器或者是多个物理服务器组成的服务器集群来实现。
终端110可以将视频传输至服务器120,使服务器120分割出该视频的每一帧图像中的目标对象。
例如,服务器120可以在视频中按照时序依次选取当前图像帧;从在该视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取该参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的目标对象图;对该目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从该目标对象图中分割目标对象,得到目标对象的分割信息;根据该分割信息和该第二位置信息,从该当前图像帧中分割出该目标对象。另外,服务器120可以将分割结果反馈至终端110进行显示。
可以理解,终端110可以不将视频发送至服务器进行检测分析处理,终端110自身也可以具备执行本申请各实施例中图像分割方法的功能。比如,终端自身具备计算机处理功能,从而可以针对视频执行本申请各实施例中图像分割方法的各个步骤。
在一个实施例中,该终端110还可以包括医疗检测终端。医疗检测终端是用于医疗检测的仪器终端。医疗检测终端可以包括检测探头和显示设备。其中,检测探头能够起到摄像机的镜头的作用。随着检测探头的转动,能够将检测对象的各个结构清晰地显示在显示设备上。在一个实施例中,医疗检测终端可以是心脏超声检测仪。心脏超声检测仪,是通过使用超声波的方法对心脏进行检测的仪器。可以理解,医疗检测终端还可以是针对人体其他部位进行超声检测的仪器。
为了便于理解,现以终端110为心脏超声检测仪为例,对使用场景进行举例说明。比如,医生可以将终端110中的检测探头放在患者的心脏部位,进行检测,检测探头可以使用超声波,采集一帧一帧的心脏超声切面图,构成心脏超声检测的视频在显示设备上进行显示。终端110还可以将心脏超声检测的视频传输至服务器120,使服务器120分割出该视频的每一帧图像中的左心室。
需要说明的是,上述举例仅用于示意,并不限定于终端110必须是医疗检测终端。视频可以不限定于心脏超声检测的视频,也可以是其他任意类型的视频,要检测的也可以不是左心室图像,还可以是视频中的任意单一目标对象,例如,单一目标对象可以为心脏、心脏的左心室等。
图2为一个实施例中图像分割方法的流程示意图。本实施例主要以该图像分割方法应用于计算机设备进行举例说明,计算机设备可以为图1中的服务器120,也可以是终端110。参照图2,该方法具体包括如下步骤:
S202,在视频中按照时序依次选取当前图像帧。
需要说明的是,视频可以是任意类型的视频。视频可以包括按时序排列的一帧一帧的图像帧。
在一个实施例中,视频可以包括日常生活中的普通视频。比如,手机录的视频、或者视频网站的各种视频节目里面的视频。
在一个实施例中,视频也可以包括使用特定检测技术采集的特定类型的视频。比如,超声视频。超声视频,是指使用超声波技术进行超声检测时所采集到的视频。
在一个实施例中,超声视频,可以是对人体器官进行超声检测得到的视频。超声视频可以包括腹部超声检测的视频,腹部超声检测的视频是对腹部进行超声检测时所采集到的视频。比如,对肝脏、胆囊、胃部等部位进行超声检测得到的视频。超声视频可以不限定于对人体器官的检测得到,比如,也可以由对非人体器官进行超声检测得到。
在一个实施例中,超声视频可以包括心脏超声检测的视频。心脏超声检测的视频,是对心脏进行超声检测时所采集到的视频。
在一个实施例中,视频中的图像帧可以为正常的图像帧。正常的图像帧,是指目标对象是以其正常状态呈现的图像帧。在一个实施例中,视频中的图像帧也可以为非正常的图像帧。非正常的图像帧,是指目标对象是以非正常状态呈现的图像帧。例如,视频中的图像帧可以为切面图。切面图,是指模拟一个物体被“切开”的效果图,在切面图中,目标对象呈现的是被“切开”的状态。
在一个实施例中,图像帧也可以为超声切面图。超声切面图,是二维超声图,是检测探头产生的超声声束进入胸壁后呈扇形扫描,将从人体反射回来的回波信号以光点形式组成切面图像。可以理解,从比较形象的角度来表述,该反射回来的回波信号,使该扇形扫描形成的这个扫描面有一种将器官“切开”的效果,所以从人体反射回来的回波信号以光点形式组成了切面图像。
可以理解,切面图并不限定于通过超声方式得到,比如使用日常普通的采集方法直接采集在物理世界中就呈现为切面的目标对象的图像,也能够得到切面图。
在一个实施例中,超声切面图可以包括心脏超声切面图,心脏超声检测的视频中包括按时序排列的一帧一帧的心脏超声切面图。即,当视频为心脏超声检测的视频时,当前图像帧,可以是心脏超声切面图。
可以理解,超声切面图还可以包括肝脏超声切面图或肺部超声切面图等。
为了便于理解切面图,现结合图3进行举例说明。图3是以心脏超声切面图为例对切面图进行说明的示意图。超声声束呈扇形扫描,得到一个扫描面302,有一种近似将心脏"切开"的效果,就能得到图3中的心脏超声切面图。可以理解,扫描面并未真正地将心脏切开,而是根据反射回来的回波信号的远近,能够组成近似由该扫描面切割得到的切面图像而已。
在一个实施例中,计算机设备可以在视频中按照时序依次选取当前图像帧。即,计算机设备可以按照时序由前到后的顺序,依次选取一个当前图像帧,针对所选取的当前图像帧,执行下述步骤S204~S214。在按照本申请各实施例中图像分割方法从一个当前图像帧中分割出目标对象之后,则可以按照时序从视频中选取下一个图像帧作为当前图像帧,继续针对新的当前图像帧执行本申请各实施例中图像分割方法的各处理步骤。
S204,从在视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧。
可以理解,视频中图像帧都是按照时序进行排序的。可以理解,时序位于当前图像帧之前的图像帧,是指视频中时序在当前图像帧之前的全部或部分图像帧。比如,当前图像帧为第5个图像帧,那么,在视频中时序位于第5个图像帧之前的图像帧,可以是时序在前4位的图像帧中的全部或部分。
其中,参考图像帧,是在分割当前图像帧中的目标对象时,用于提供参照信息的时序在当前图像帧之前的图像帧。可以理解,可以使用参考图像帧所提供的时序先验参照信息,对当前图像帧进行目标对象分割处理。
可以理解,参考图像帧的数量可以为至少一个,至少一个是指一个或者一个以上。
在一个实施例中,当参考图像帧为一个时,可以是时序位于当前图像帧之前的图像帧中的任意一个图像帧,也可以是当前图像帧的前一图像帧。当参考图像帧为多个时,则可以是时序位于当前图像帧之前的图像帧中的任意多个图像帧,也可以是按照距当前图像帧由近到远的顺序依次选取的多个图像帧。
S206,获取参考图像帧中目标对象关键点的第一位置信息。
其中,目标对象,是需要从当前图像帧中分割出来的对象。
目标对象可以是任意想要分割出来的对象。比如,在一个自行车比赛视频中,目标对象可以是视频画面中的某辆自行车,也可以是视频画面中的某一参赛运动员。
在一个实施例中,目标对象可以是左心室。可以理解,目标对象还可以是肝脏或肺部等,在此不作一一列举。
目标对象关键点,是用于表示目标对象的特征的点。
在一个实施例中,当目标对象为左心室时,目标对象关键点,即为左心室关键点。左心室关键点,则指在图片中用于表示左心室的特征的点。
在一个实施例中,左心室关键点包括左心室顶尖点和二尖瓣的两个端点。左心室顶尖点,即心尖,是心脏左下呈圆锥状的尖端部。二尖瓣(mitral valve)即左房室瓣,附于左纤维房室环上,系由心内膜的皱褶形成。二尖瓣有两个瓣膜,二尖瓣的两个端点即为两个瓣膜的端点。需要说明的是,左心室关键点,不限于上述3个点,还可以设置左心室其他位置的点为关键点。
在一个实施例中,当目标对象为人脸时,则目标对象关键点可以包括表示五官特征的点。可以理解,在正常情况下,人的五官特征基本上不会发生变化,所以能够表示出人脸的特征,因此可以用作关键点。
可以理解,参考图像帧中目标对象关键点的第一位置信息,是已知的。因为,按照本申请实施例中的图像分割方法,是按时序选取当前图像帧,在对一个当前图像帧中的目标对象分割完毕后,则会选取下一个图像帧作为新的当前图像帧,以此迭代,那么,在对当前图像帧中的目标对象进行分割时,时序在该当前图像帧之前的图像帧则已经分割出其中的目标对象关键点的第一位置信息和目标对象的分割信息。因此,在该当前图像帧之前的图像帧中确定的参考图像帧中,就已经包括了该参考图像帧中的目标对象关键点的第一位置信息和目标对象的分割信息。所以,计算机设备可以获取参考图像帧中已知的目标对象关键点的第一位置信息。
S208,按照第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧中的目标对象图。
其中,关键点模板,用于表示预先设置的关键点的位置信息。目标对象关键点模板,用于表示以主要区域为目标对象区域的图像为参照,预先设置的目标对象关键点的预设位置信息。也就是说,目标对象关键点模板中各个目标对象关键点的预设位置信息,是在主要区域为目标对象区域的图像上标示出来的目标对象关键点的位置信息。可以理解,主要区域为目标对象区域,是指目标对象的区域面积在图像中占主要部分。
仿射变换关系,用于表示由参考图像帧中目标对象关键点的第一位置信息变换到目标对象关键点模板中目标对象关键点的预设位置信息,所要经过的仿射变换操作。
在一个实施例中,按照该仿射变换关系,可以将当前图像帧进行旋转,平移,裁剪等仿射变换操作,得到当前图像帧的目标对象图。
在一个实施例中,仿射变换关系,可以通过仿射变换矩阵进行表示。
可以理解,第一位置信息和目标对象关键点模板之间的仿射变换关系,用于表示将参考图像帧中目标对象关键点的第一位置信息变换到与目标对象关键点模板中的目标对象关键点的预设位置信息一致时,所需要经过的仿射变换处理。而,由于当前图像帧中的目标对象关键点与在其之前的参考图像帧中的目标对象关键点,都是表示同一个目标对象的关键点,所 以两个图像帧中目标对象关键点之间的相对位置是一致的。因此,根据该仿射变换关系对当前图像帧进行仿射变换,也就相当于将当前图像帧的目标对象关键点的位置信息调整为与目标对象关键点模板中目标对象关键点的预设位置信息。由于目标对象关键点模板用于表示以主要区域为目标对象区域的图像为参照,预先设置的目标对象关键点的预设位置信息,所以按照上述仿射变换关系对当前图像帧进行仿射变换处理后,能够得到当前图像帧中的目标对象的感兴趣区域(ROI,region of interest),即为目标对象图。感兴趣区域,就是从图像中选择一个表示图像分析所关注的焦点的区域。可以理解,目标对象图中目标对象区域为焦点区域。
在一个实施例中,计算机设备可以从目标对象关键点模板中提取目标对象关键点的预设位置信息,并根据第一位置信息和目标对象关键点模板中目标对象关键点的预设位置信息计算出一个变换矩阵。计算机设备可以将当前图像帧按照该变换矩阵进行仿射变换处理,得到当前图像帧的目标对象图。在一种可能实现方式中,计算机设备可以将当前图像帧乘以该变换矩阵,得到当前图像帧的目标对象图。
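As an illustration of the paragraph above, the following is a minimal Python sketch of computing the affine transformation matrix from the three reference key points to the template key points and warping the current frame with it. It assumes OpenCV and NumPy; the function name crop_target_object_map and the 224x224 output size are illustrative assumptions, not part of the patent text.

```python
import cv2
import numpy as np

def crop_target_object_map(frame, ref_keypoints, template_keypoints, out_size=(224, 224)):
    """Warp `frame` so the reference key points land on the template key points.

    ref_keypoints / template_keypoints: float32 arrays of shape (3, 2) holding the
    (x, y) coordinates of the three target object key points (e.g., the apex and
    the two mitral valve endpoints in the left ventricle case).
    """
    # With exactly three point pairs the 2x3 affine matrix is fully determined.
    M = cv2.getAffineTransform(ref_keypoints.astype(np.float32),
                               template_keypoints.astype(np.float32))
    # Rotate/translate/crop the current frame into template space; the result is
    # the region-of-interest "target object map" described above.
    roi = cv2.warpAffine(frame, M, out_size)
    return roi, M
```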
可以理解,当目标对象为左心室时,目标对象图为左心室图。左心室图,即为左心室的感兴趣区域,是以左心室的区域为主要区域的图像。
S210,对目标对象图进行关键点检测,得到目标对象关键点的第二位置信息。
可以理解,计算机设备可以直接对目标对象图自身进行图像分析,从中识别出目标对象关键点,得到目标对象关键点的第二位置信息。
在一个实施例中,计算机设备可以将目标对象图输入多任务网络中,通过多任务网络中的关键点检测模型,对目标对象图进行关键点检测处理,输出与该目标对象图对应的目标对象关键点的第二位置信息。其中,多任务网络,是能够并行执行多个处理任务的网络。多任务网络中包括关键点检测模型。
在一个实施例中,计算机设备可以通过关键点检测模型,检测目标对象图中的目标对象关键点与目标对象关键点模板中目标对象关键点之间的位置信息差值;将目标对象关键点模板中的目标对象关键点的预设位置信息与该位置信息差值相加,得到目标对象图的目标对象关键点的第二位置信息。
S212,从目标对象图中分割目标对象,得到目标对象的分割信息。
目标对象的分割信息,用于将目标对象区域从目标对象图中分割出来。即,用于将目标对象区域从目标对象图中与其他区域区分开来。在一个实施例中,目标对象的分割信息,包括目标对象的像素点。
在一个实施例中,计算机设备可以对目标对象图进行目标对象分割处理,得到目标对象的分割轮廓。
在一个实施例中,多任务网络中还可以包括分割模型。计算机设备可以将目标对象图输入多任务网络中预先训练的分割模型中,通过该分割模型,对该目标对象图进行语义分割处理,输出相应目标对象的分割信息。
在一个实施例中,计算机设备可以通过分割模型预测出目标对象图中每个像素点所属的类别,根据属于前景类别的各像素点,构成目标对象图相应的目标对象分割信息。可以理解,每个像素点所属的类别包括前景类别和背景类别。属于前景类别的像素点,则为目标对象的像素点,即能够构成目标对象图相应的目标对象的分割信息。
S214,根据分割信息和第二位置信息,从当前图像帧中分割出目标对象。
在一个实施例中,计算机设备可以将目标对象图中的目标对象的分割信息和目标对象关键点的第二位置信息,映射至当前图像帧。
在一个实施例中,计算机设备可以按照步骤S208中对当前图像帧进行的仿射变换操作的逆变换操作,将目标对象图中的目标对象的分割信息和目标对象关键点的第二位置信息,进行仿射变换处理,以将目标对象图中的目标对象的分割信息和目标对象关键点的第二位置信息,映射至当前图像帧。
可以理解,映射目标对象的分割信息和目标对象关键点的第二位置信息之后的当前图像帧中,能够将目标对象明显地区分显示出来,即实现了将目标对象从当前图像帧中分割出来的目的。
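A hedged sketch of the inverse mapping step described above, reusing the matrix M returned by the cropping sketch earlier. cv2.invertAffineTransform is a real OpenCV call; map_back_to_frame is a hypothetical name.

```python
import cv2
import numpy as np

def map_back_to_frame(mask_roi, keypoints_roi, M, frame_shape):
    """Map the ROI-space segmentation mask and key points back to the current
    frame using the inverse of the affine matrix M applied during cropping."""
    M_inv = cv2.invertAffineTransform(M)  # 2x3 inverse affine matrix
    h, w = frame_shape[:2]
    # Nearest-neighbor keeps the mask binary when warping it back.
    mask_frame = cv2.warpAffine(mask_roi, M_inv, (w, h), flags=cv2.INTER_NEAREST)
    # Apply the same inverse transform to the (3, 2) key point coordinates.
    pts = np.hstack([keypoints_roi, np.ones((len(keypoints_roi), 1))])  # homogeneous
    keypoints_frame = pts @ M_inv.T
    return mask_frame, keypoints_frame
```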
需要说明的是,由于首位图像帧之前没有图像帧,所以当当前图像帧不为首位图像帧时,则可以执行步骤S202~S214。
在一个实施例中,当当前图像帧为首位图像帧时,计算机设备可以对首位图像帧这一整帧进行目标对象关键点检测,得到首位图像帧中目标对象关键点的位置信息,以及从首位图像帧这一整帧中分割目标对象,得到目标对象的分割信息,将首位图像帧中的目标对象关键点的位置信息和目标对象的分割信息映射至首位图像帧。
在一个实施例中,计算机设备也可以将首位图像帧这一整帧输入多任务网络中,通过多任务网络中的关键点检测模型,对首位图像帧进行目标对象关键点检测,通过多任务网络中的分割模型,对首位图像帧进行语义分割处理,输出相应目标对象的分割信息。
在一个实施例中,在执行步骤S202之前,计算机设备就可以计算出首位图像帧的目标对象关键点的位置信息,以及首位图像帧的目标对象的分割信息。这样一来,当当前图像帧为首位图像帧时,则可以直接获取已经计算出的首位图像帧的目标对象关键点的位置信息,以及首位图像帧的目标对象的分割信息。
为了便于理解,现结合图4对图像分割方法的原理进行解释说明。图4是以视频为心脏超声检测的视频、视频中的图像帧心脏超声切面图、以及以目标对象为左心室进行举例说明的。需要说明的是,图4是以当前图像帧的前一图像帧作为参考图像帧为例进行说明的,但并不限定参考图像帧仅为当前图像帧的前一图像帧。参照图4,P1~P3这3个点即为前一图像帧的左心室关键点(Last Frame Landmark),402为当前图像帧。左心室关键点模板上的3个点表示左心室关键点模板中的左心室关键点。计算机设备可以根据前一图像帧的左心室关键点(即P1~P3这3个点)的第一位置信息,以及左心室关键点模板中左心室关键点的预设位置信息之间的仿射变换关系,对当前图像帧进行仿射变换,得到左心室图(即图4中的ROI图像)。计算机设备可以将ROI图像分别输入多任务网络(Multi-task Network)中的关键点检测模型、分割模型中,输出该ROI图像中左心室的分割信息,404中的白色区域即为该ROI图像中左心室的分割信息。还会输出该ROI图像中左心室关键点的第二位置信息,406中的3个点所处的位置,即表示该ROI图像中左心室关键点的第二位置信息。将所输出的信息皆对应至ROI图像中,即可得到图像408。408中的点408a、408b以及408c即为ROI图像中左心室关键点。其中,408a为左心室顶尖点,408b和408c分别为二尖瓣的2个端点。408中的区域408d即为分割信息所表示的分割出的左心室区域。计算机设备可以将408的ROI图像中左心室的分割信息和左心室关键点的第二位置信息,映射至当前图像帧,即得到当前图像帧的最终结果。当前图像帧的最终结果,即是映射后的检测出左心室的相关信息的当前图像帧。从图4中当前图像帧的最终结果中可以看出,已经将左心室从当前图像帧中区分出 来了。可以理解,图4中的仿射变换网络(TAN,temporal affine network),用于表示从仿射变换处理到检测出左心室图中的左心室相关信息这一过程所涉及的网络框架。
上述图像分割方法,根据之前的参考图像帧的目标对象关键点的第一位置信息这一时序先验知识,结合仿射变换处理,能够比较快速地确定出目标对象图,而不需要大量计算,减少了计算处理资源。得到的目标对象图中即为目标对象的感兴趣区域,相当于剔除了很多不相关的其他图像内容,进而仅针对该目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从该目标对象图中分割目标对象,得到目标对象的分割信息,将该分割信息和该第二位置信息映射至该当前图像帧。映射有分割信息和第二位置信息的当前图像帧中就能够明显地区分出目标对象,实现了对当前图像帧中的目标对象的检测识别,而且基于目标对象图进行的分割和关键点检测处理,既排除了其他不相关的图像的干扰,又能减少计算量。
在一个实施例中,当当前图像帧为切面图时,切面图具有相应的切面类别。切面类别可以根据切面图中的图像组成种类来划分。计算机设备还可以对目标对象图进行切面类别识别处理,得到目标对象图所属的切面类别。
在一个实施例中,当当前图像帧为心脏超声切面图时,切面类别包括心尖二腔心切面(A2C,apical two chamber view)和心尖四腔心切面(A4C,apical four chamber view)中的至少一种。在其他实施例中,切面类别还可以包括其他类别,比如,心尖五腔心切面。
在一个实施例中,多任务网络中还可以包括切面分类模型。计算机设备可以通过多任务网络中的切面分类模型,对目标对象图进行切面分类处理,得到所述当前图像帧所属的切面类别。可以理解,当当前图像帧是心脏超声切面图时,识别出当前图像帧所属的切面类别,能够提供给医生非常重要的诊断参考信息。当当前图像帧是其他类型的切面图时,识别出来的切面类别,也能够提供一定的参考信息量。
在一个实施例中,在步骤S202之前,该方法还包括:从视频的首位图像帧中,检测出目标对象关键点的初始位置信息;将首位图像帧作为前一图像帧以及将初始位置信息作为前一位置信息,参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息;将后一图像帧作为前一图像帧以及将后一图像帧中的目标对象关键点的位置信息作为前一位置信息,返回参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息的步骤,以进行迭代处理,直至得到视频的末位图像帧中的目标对象关键点的位置信息;将末位图像帧当作首位图像帧的前一图像帧,参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中目标对象关键点最终的位置信息。
可以理解,首位图像帧的前面是没有图像帧的,所以,计算机设备可以从首位图像帧中,检测出目标对象关键点的初始位置信息。即,对首位图像帧先进行粗略的关键点检测,得到首位图像帧中目标对象关键点的初始位置信息。
计算机设备可以将参照首位图像帧的初始位置信息,检测视频中第二个图像帧(即首位图像帧的后一图像帧)中的目标对象关键点的位置信息,然后,再参照第二个图像帧中的目标对象关键点的位置信息,检测视频中第三个图像帧中的目标对象关键点的位置信息,以此类推,进行迭代处理,直至得到视频的末位图像帧中的目标对象关键点的位置信息。计算机设备可以将末位图像帧当作首位图像帧的前一图像帧,参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中目标对象关键点最终的位置信息。
在一个实施例中,参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息包括:按照前一位置信息和目标对象关键点模板之间的仿射变换关系,对前一图像帧的后一图像帧进行仿射变换,得到后一图像帧中的目标对象图;对后一图像帧中的目标对象图进行关键点检测,得到后一图像帧中的目标对象关键点的位置信息。
可以理解,对后一图像帧中的目标对象图进行关键点检测,得到的是后一图像帧的目标对象图中目标对象关键点的位置信息,进而将后一图像帧的目标对象图中目标对象关键点的位置信息映射至后一图像帧,即可以将得到后一图像帧中的目标对象关键点的位置信息。
在一个实施例中,从首位图像帧开始描述,按照首位图像帧的初始位置信息和目标对象关键点模板之间的仿射变换关系,对第二个图像帧进行仿射变换,得到第二个图像帧中的目标对象图,对第二个图像帧中的目标对象图进行关键点检测,得到第二个图像帧中的目标对象图中的目标对象关键点的位置信息,然后可以将第二个图像帧中的目标对象图中的目标对象关键点的位置信息映射至第二个图像帧中,得到第二个图像帧中的目标对象关键点的位置信息。接着,再按照第二个图像帧中的目标对象关键点的位置信息和目标对象关键点模板之间的仿射变换关系,对第三个图像帧进行仿射变换,得到第三个图像帧中的目标对象图,对第三个图像帧中的目标对象图进行关键点检测,得到第三个图像帧中的目标对象图中的目标对象关键点的位置信息,然后可以将第三个图像帧中的目标对象图中的目标对象关键点的位置信息映射至第三个图像帧中,得到第三个图像帧中目标对象关键点的位置信息。以此类推,直至按照倒数第二个图像帧中的目标对象关键点的位置信息和目标对象关键点模板之间的仿射变换关系,对末位图像帧进行仿射变换,得到末位图像帧中的目标对象图,对末位图像帧中的目标对象图进行关键点检测,得到末位图像帧中的目标对象关键点的位置信息。
计算机设备可以将末位图像帧当作首位图像帧的前一图像帧,按照末位图像帧中的目标对象关键点的位置信息和目标对象关键点模板之间的仿射变换关系,对首位图像帧进行仿射变换,得到首位图像帧中的目标对象图,对首位图像帧中的目标对象图进行关键点检测,得到优化后的首位图像帧中目标对象关键点最终的位置信息。
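The forward-then-wrap-around initialization described in the preceding paragraphs could be organized as the control-flow sketch below. detect_coarse and detect_with_prior are hypothetical callables standing in for the rough whole-frame detection and the affine-crop-then-detect pipeline; only the loop structure is taken from the text above.

```python
def refine_first_frame_keypoints(frames, detect_coarse, detect_with_prior):
    """Sketch of the first-frame initialization.

    detect_coarse(frame) -> rough key points for the first frame;
    detect_with_prior(frame, prev_kpts) -> key points obtained by affine-cropping
    `frame` with `prev_kpts` as the temporal prior and running detection.
    """
    prev_kpts = detect_coarse(frames[0])      # rough initial estimate
    for frame in frames[1:]:                  # propagate forward in time order
        prev_kpts = detect_with_prior(frame, prev_kpts)
    # Treat the last frame as the "previous" frame of the first frame and
    # re-detect, yielding the refined final key points of frame 0.
    return detect_with_prior(frames[0], prev_kpts)
```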
可以理解,当当前图像帧为首位图像帧时,计算机设备可以直接获取该优化后的首位图像帧中目标对象关键点最终的位置信息。计算机设备还可以对上述得到的首位图像帧中的目标对象图,进行目标对象分割处理,得到首位图像帧中的目标对象图的目标对象的分割信息。计算机设备可以根据该首位图像帧中的目标对象图的目标对象的分割信息和目标对象关键点的位置信息,从首位图像帧中分割出目标对象。比如,计算机设备可以将该首位图像帧中的目标对象图的目标对象的分割信息和目标对象关键点的位置信息,映射至首位图像帧中,以从首位图像帧中分割出目标对象。在一个实施例中,计算机设备还可以对首位图像帧中的目标对象图进行切面分类处理,得到首位图像帧所属的切面类别。
上述实施例中,按照上述方式得到的首位图像帧中目标对象关键点最终的位置信息,相较于首位图像帧的初始位置信息而言,进行了优化,更为准确。因此,当参考图像帧包括首位图像帧时,参照该首位图像帧的目标对象关键点最终的位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,能够使得到的当前图像帧的目标对象图更加的准确。进而,使后续得到的该目标对象图的目标对象的分割信息和目标对象关键点的第二位置信息更加准确,从而,将更加准确的分割信息和目标对象关键点的位置信息映射至当前图像帧,能够使得当前图像帧中所检测出的目标对象图像的相关信息更加的准确。
在一个实施例中,步骤S204包括:按照距当前图像帧由近到远的顺序,将视频中在当前图像帧之前的预设数量的图像帧,确定为参考图像帧。步骤S214包括:当参考图像帧为多个时,则对依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的目标对象的分割信息求平均,得到目标对象最终的分割信息;求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息;将目标对象最终的分割信息和目标对象关键点最终的第二位置信息映射至当前图像帧。
其中,预设数量可以为一个或多个。需要说明的是,本申请各实施例中所述的多个,指至少两个。
当参考图像帧为多个时,计算机设备则可以针对参考图像帧都执行步骤S206~S210。即,计算机设备可以分别参照每个参考图像帧中目标对象关键点的第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的目标对象图。可以理解,有几个参考图像帧,就会对当前图像帧进行几次仿射变换,就会得到相应个数的当前图像帧的目标对象图。并对每个目标对象图都进行关键点检测得到目标对象关键点的第二位置信息,并从中分割出目标对象,得到目标对象的分割信息。这样一来,就会有多个目标对象关键点的第二位置信息和目标对象的分割信息。
另外,计算机设备可以对依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的目标对象的分割信息求平均,得到目标对象最终的分割信息,并求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息。计算机设备可以将目标对象最终的分割信息和目标对象关键点最终的第二位置信息映射至当前图像帧。
在一个实施例中,预设数量为两个,参考图像帧可以为当前图像帧的前一图像帧和前第二个图像帧。那么,计算机设备则可以分别根据当前图像帧的前一图像帧中目标对象关键点的第一位置信息,以及前第二个图像帧中目标对象关键点的第一位置信息,分别执行步骤S206~S210,最后得到2种目标对象的分割信息和2种目标对象关键点的第二位置信息。计算机设备可以将2种目标对象的分割信息求平均,并对2种目标对象关键点的第二位置信息求平均,得到目标对象最终的分割信息和目标对象关键点最终的第二位置信息。计算机设备可以将目标对象最终的分割信息和目标对象关键点最终的第二位置信息映射至当前图像帧。
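A minimal sketch of the multi-reference fusion just described: the per-reference segmentation information and second position information are averaged. The 0.5 threshold used to binarize the averaged mask is an assumption, not taken from the patent.

```python
import numpy as np

def fuse_multi_reference(results):
    """Average the per-reference-frame predictions for the current frame.

    `results`: list of (mask, keypoints) pairs, one per reference image frame,
    where mask is an HxW float array and keypoints is a (3, 2) array.
    """
    mask = np.mean([m for m, _ in results], axis=0)   # final segmentation info
    kpts = np.mean([k for _, k in results], axis=0)   # final second position info
    return (mask > 0.5).astype(np.uint8), kpts
```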
图5为一个实施例中多时序处理的原理示意图。即,使用多个时序在前的图像帧目标对象关键点的第一位置信息分割当前图像帧中目标对象的原理示意。需要说明的是,图5是以视频为心脏超声检测的视频、视频中的图像帧为心脏超声切面图、以及以目标对象为左心室进行举例说明的。参照图5,输入一个完整的视频,由于首位图像帧F1没有往前帧的左心室关键点的位置信息,所以,可以使用多任务网络对首位图像帧F1进行粗略的左心室关键点检测,检测到的左心室关键点的位置信息作为下一帧的时序仿射变换信息,使后一图像帧F2参照该左心室关键点的位置信息和左心室关键点模板之间的仿射变换关系,进行仿射变换处理,得到相应左心室图,进而对其左心室图进行关键点检测,并将检测到的左心室关键点的位置信息映射至该后一图像帧F2,然后再将映射后的F2中的左心室关键点的位置信息作为再后一图像帧F3的时序仿射变换信息,使图像帧F3进行仿射变换及后续处理,得到相应的左心室关键点的位置信息,以此类推,直到获得视频中末位图像帧的关键点信息。如图5中带有初始化的箭头所示,再将末位图像帧的左心室关键点的位置信息返回作为视频首位图像帧的仿射变换参考信息,对首位图像帧进行仿射变换处理,得到相应左心室图,进而基于首 位图像帧的左心室图,计算出优化的、较为可靠的的首位图像帧中左心室关键点最终的位置信息。基于优化后的首位图像帧中左心室关键点最终的位置信息,按照时序依次选取当前图像帧,当当前图像帧为首位图像帧时,则可以直接获取该首位图像帧的左心室关键点最终的位置信息,并根据上述首位图像帧的左心室图确定出首位图像帧的左心室的分割信息,将该首位图像帧的左心室关键点最终的位置信息和左心室的分割信息映射至首位图像帧。当当前图像帧为第二个图像帧时,由于其前面只有首位图像帧,所以,可以参照首位图像帧的左心室关键点最终的位置信息,确定出第二个图像帧的左心室图,对其进行关键点检测以及分割左心室,并将得到的分割信息和所述第二位置信息映射至该第二个图像帧。当当前图像帧为第三个图像帧及以其以后的图像帧时,则可以按照距当前图像帧由近到远的顺序,选取2个参考图像帧。如图5所示,可以第三个图像帧F3为当前图像帧时,则可以将首位图像帧F1和第二个图像帧F2作为参考图像帧,结合多任务网络,将分别依照首位图像帧和第二个图像帧中左心室关键点的第一位置信息确定出的第二位置信息求平均,并将分别依照首位图像帧和第二个图像帧中左心室关键点的第一位置信息所确定出的左心室的分割信息求平均。然后,将得到的左心室最终的分割信息和左心室关键点最终的第二位置信息集成映射至第三个图像帧F3。此外,还可以确定出每个当前图像帧所属的切面类别。如图5所示,F1~F3映射后图像分别为f1~f3,f1~f3中的3个点即为左心室关键点,突出显示的以3个左心室关键点为端点的区域,即为左心室的分割信息所表示的左心室区域,A2C和A4C即为所属的切面类别。
上述实施例中,通过与当前图像帧的亲近关系,往前选取多个参考图像帧,作为该当前图像帧的仿射变换参考信息,能够保证当前图像帧的仿射变换参考信息来源多样化,能够减少单一参考图像帧信息缺失对后续结果的误导,提高了准确性。
在一个实施例中,该方法还包括:将目标对象图输入至多任务网络中,编码得到该目标对象图的特征图。步骤S210包括:通过多任务网络中的关键点检测模型,对该特征图进行关键点检测处理,输出与该目标对象图对应的目标对象关键点的第二位置信息。步骤S212包括:通过多任务网络中的分割模型,对该特征图进行语义分割处理,输出相应目标对象的分割信息。
其中,多任务网络,是能够并行执行多个处理任务的网络。多任务网络中可以包括关键点检测模型和分割模型。关键点检测模型,是用于检测目标对象关键点的机器学习模型。分割模型,是用于分割出目标对象的机器学习模型。
特征图(Feature Map),是指图像和滤波器进行卷积后得到的特征图。可以理解,特征图相较于原图进行了特征提取,能够更加突出图像特征。
在一个实施例中,多任务网络中还可以包括轻量级的编码模型。计算机设备可以将目标对象图输入至多任务网络中的编码模型,编码得到目标对象图的特征图。在一个实施例中,轻量级的编码模型可以包括MobileNetV2。
在一个实施例中,计算机设备可以通过关键点检测模型使用L1-norm损失函数回归出目标对象图对应的目标对象关键点的第二位置信息。
图6为一个实施例中多任务网络结构示意图。参照图6,输入224*224*3的图像即为目标对象图(ROI图像),经过轻量级网络MobileNetV2进行编码,输出7*7*1280的特征图。随后将特征图分别输入3个不同的任务通道中,即分别输入切面分类通道、目标对象分割通道以及目标对象关键点检测通道中,并行地进行三种不同的检测处理。如图6中所示,切面 分类通道中的切面分类模型对特征图处理,最终得到切面类别的二分类结果。目标对象关键点检测通道中关键点检测模型进行回归处理,输出3个目标对象关键点的X坐标信息和Y坐标信息,所以目标对象关键点的位置信息是6个位置参数。通过目标对象分割通道中的分割模型进行2次解码,得到解码图像中每个像素点所属的类别,像素点所属的类别包括前景类别和背景类别。可以理解,属于前景类别的像素点即为前景,属于背景类别的像素点即为背景。由于输出的解码图像尺寸为112*112,是输入的目标对象图的尺寸的1/2,所以可以继续对解码图像进行插值,使其尺寸与输入的目标对象图的尺寸一致为224*224,然后再根据插值之后的解码图像中属于前景类别的各像素点,构成目标对象图相应的目标对象分割信息。
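The three-headed structure of Figure 6 could be sketched in PyTorch as follows. The MobileNetV2 encoder, the 7x7x1280 feature map, the two-step decoding to a 112x112 two-channel output, and the six key point regression outputs follow the description above; the exact decoder layer widths are assumptions, and a recent torchvision is assumed for the weights argument.

```python
import torch
import torch.nn as nn
import torchvision

class MultiTaskNet(nn.Module):
    """Sketch of the network in Figure 6: a shared MobileNetV2 encoder feeding a
    view classification head, a key point regression head, and a segmentation
    decoder. Layer sizes are illustrative, not the patented specification."""
    def __init__(self, num_views=2, num_keypoints=3):
        super().__init__()
        self.encoder = torchvision.models.mobilenet_v2(weights=None).features  # 7x7x1280 for 224x224 input
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.view_head = nn.Linear(1280, num_views)         # view (切面) classification
        self.kpt_head = nn.Linear(1280, num_keypoints * 2)  # x, y for 3 key points (6 outputs)
        self.seg_head = nn.Sequential(                      # two decoding/upsampling steps
            nn.Conv2d(1280, 256, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 2, 1),                            # foreground/background logits, 112x112
        )

    def forward(self, x):                                   # x: (N, 3, 224, 224)
        f = self.encoder(x)                                 # (N, 1280, 7, 7)
        g = self.pool(f).flatten(1)                         # (N, 1280)
        return self.view_head(g), self.kpt_head(g), self.seg_head(f)
```

Running the three heads off one shared encoder is what lets the cropped ROI be classified, regressed, and segmented in a single forward pass, which matches the efficiency argument made in the surrounding text.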
上述实施例中,通过多任务网络对目标对象图进行编码,得到特征图。特征图能够更准确地表达目标对象图的特征信息。进而,通过多任务网络中的关键点检测模型、以及分割模型并发地对特征图进行处理,能够提高目标对象图像信息的检测效率,实时性比较高。此外,多任务网络相当于属于使用小网络达到大网络的准确性,而且属于轻量级。
在一个实施例中,该方法还包括:通过多任务网络中的切面分类模型,对特征图进行切面分类处理,得到当前图像帧所属的切面类别。
可以理解,多任务网络中还包括切面分类模型。其中,切面分类模型,是用于检测图像所属的切面类别的模型。
可以理解,切面分类模型输出的是目标对象图所属的切面类别。由于目标对象图是从当前图像帧中提取出来的,所以,目标对象图所属的切面类别即为当前图像帧所属的切面类别。在一个实施例中,计算机设备可以通过切面分类模型使用交叉熵损失算法,得到当前图像帧所属的切面类别。
在一个实施例中,该方法还包括:当确定出视频中每一图像帧所属的切面类别后,则确定每个切面类别所对应的图像帧的数量;将数量最多的切面类别作为该视频所对应的切面类别。
可以理解,这里的相当于对所确定出的各个切面类别进行投票处理;将投票数量最多的切面类别作为视频所对应的切面类别。这里的投票处理并非指人为表决投票,而是一种计算机处理过程。
现结合图5进行举例说明。图5中,f1和f2显示为A2C,而f3显示为A4C,所以,检测出同一视频不同图像帧所属切面类别不同,而通常情况下,同一视频属于一个切面类别,所以,可以对所确定出的各个切面类别进行投票处理,图5中,A2C这一类别的投票数量最多,因此,可以判定该视频的切面类别为A2C,而不是A4C。
上述实施例中,还能够识别出视频的切面类别,相当于能够同时完成视频中目标对象的分割与标准切面的识别,能够为后续的处理更快地提供更多的信息量。此外,通过投票的方法,将数量最多的切面类别作为视频所对应的切面类别。能够保证所确定的切面类别的准确性,进而能够为后续处理提供更加准确的参考信息。
在一个实施例中,通过多任务网络中的关键点检测模型,对特征图进行关键点检测处理包括:将特征图输入预先训练的关键点检测模型中,输出目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;将目标对象关键点模板中的目标对象关键点的预设位置信息与位置信息差值相加,得到目标对象图的目标对象关键点的 第二位置信息。
在一个实施例中,计算机设备可以预先提取样本图像帧中的目标对象图,根据样本图像帧中的目标对象图,结合所标记的该目标对象图中目标对象关键点与目标对象关键点模板中的目标对象关键点之间的样本位置差值进行机器学习训练,得到关键点检测模型。因此,将特征图输入该关键点检测模型之后,可以输出目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值。
上述实施例中,通过关键点检测模型,输出目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;将目标对象关键点模板中的目标对象关键点的预设位置信息与位置信息差值相加,得到目标对象图的目标对象关键点的第二位置信息。位置信息差值比完整的位置信息数据量更小,从而节省了计算资源。
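In code form, the offset-plus-template decoding above reduces to a single addition; decode_keypoints is a hypothetical helper name.

```python
import numpy as np

def decode_keypoints(offsets, template_keypoints):
    """The key point head regresses offsets relative to the template; adding the
    template's preset positions recovers absolute ROI coordinates.

    offsets: flat array of 6 values (x, y for 3 key points);
    template_keypoints: (3, 2) preset positions from the key point template."""
    return template_keypoints + np.asarray(offsets).reshape(-1, 2)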
在一个实施例中,通过多任务网络中的分割模型,对特征图进行语义分割处理,输出相应目标对象的分割信息包括:将特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;针对解码图像中每个像素点,选取像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为像素点所属的类别;根据解码图像中属于前景类别的各像素点,确定与目标对象图相应的目标对象的分割信息。
可以理解,分割模型可以预测出解码图像中每个像素点分别属于前景类别和背景类别的分类概率。
当解码图像与目标对象图的尺寸一致时,则可以将直接根据解码图像中属于前景类别的各像素点,得到与目标对象图相应的目标对象的分割信息。
需要说明的是,解码图像可能存在与目标对象图的尺寸不一致的情况。当解码图像的尺寸小于目标对象图的尺寸时,则可以对解码图像进行插值,使其尺寸与输入的目标对象图的尺寸一致,然后再根据插值之后的解码图像中属于前景类别的各像素点,构成目标对象图相应的目标对象分割信息。
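A sketch of the per-pixel decoding and interpolation just described, assuming PyTorch; interpolating the class probabilities before taking the per-pixel maximum is one reasonable reading of the order of operations.

```python
import torch.nn.functional as F

def decode_segmentation(logits, out_size=(224, 224)):
    """logits: (N, 2, 112, 112) foreground/background scores from the decoder.
    Picks the larger of the two per-pixel class probabilities, after
    interpolating the decoded image back to the ROI size as described above."""
    probs = logits.softmax(dim=1)  # first (foreground) / second (background) class probabilities
    probs = F.interpolate(probs, size=out_size, mode='bilinear', align_corners=False)
    return probs.argmax(dim=1)     # per-pixel category; 1 marks target object pixels
```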
上述实施例中,通过分割模型来确定每个像素点的类别,进而实现分割,能够细化分割粒度,提高了分割准确性。
在一个实施例中,分割模型的训练步骤包括:获取样本视频中的各样本图像帧;获取分别与各样本图像帧相应的第一目标对象分割标签;将各样本图像帧和相应第一目标对象分割标签输入初始分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
其中,样本视频,是作为训练数据来训练机器学习模型的视频。样本图像帧,是训练视频中用于训练机器学习模型的图像帧。样本视频可以为多个。
可以理解,训练数据可以包括样本视频,以及与样本视频中每个样本图像帧分别对应的第一目标对象分割标签。第一目标对象分割标签,用于标记出相应样本图像帧中的目标对象轮廓。
可以理解,第一目标对象分割标签可以为人工添加的标注。第一目标对象分割标签可以在样本图像帧的掩码图中进行标记。将第一目标对象分割标签输入初始分割模型,相当于将样本图像帧的掩码图输入初始分割模型。在样本图像帧的掩码图中标记第一目标对象分割标签,相当于标记出了样本图像帧中的目标对象轮廓。
计算机设备可以将各样本图像帧和相应第一目标对象分割标签输入预设的初始化的分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
可以理解,计算机设备可以将目标对象图或者目标对象图的特征图输入基本的分割模型中,进行目标对象的分割处理,得到该目标对象图的分割信息。
计算机设备也可以对该分割模型进一步地进行优化调整,提高该分割模型的准确性,并基于优化后的分割模型,对目标对象图或者目标对象图的特征图进行目标对象的分割处理。
在一个实施例中,分割模型的训练步骤还可以包括对基本的分割模型的优化调整步骤,该优化调整步骤可以包括以下步骤:从样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,从当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目标对象边界的边界特征点;通过光流跟踪操作,跟踪边界特征点在当前样本图像帧中的位置信息;将边界特征点在当前样本图像帧中的位置信息连接并进行平滑,得到当前样本图像帧的第二目标对象分割标签;根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练,得到优化后的分割模型。
在一个实施例中,在分割模型的训练过程中,计算机设备可以从样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,从当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目标对象边界的边界特征点。
其中,边界特征点,是能够表示目标对象边界的特征点。在一个实施例中,计算机设备可以在从当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,均匀选取预设数量的点,作为边界特征点。比如,在前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上均匀选取20个点,作为边界特征点。
可以理解,由于超声图像边缘模糊且存在大量伪影,在标签轮廓上均匀选取预设数量的点作为目标对象的边界特征点,能够避开模糊边缘及伪影这些信息的干扰,从而提高计算准确性。
进一步地,计算机设备可以通过光流跟踪操作,使用光流算法跟踪所选取的边界特征点在当前样本图像帧中的位置信息。可以理解,跟踪的边界特征点在当前样本图像帧中的位置信息相当于形成了新的边界特征点。计算机设备可以将边界特征点在当前样本图像帧中的位置信息连接并进行平滑。即,相当于将跟踪形成的新的边界特征点连接,并通过曲线拟合形成标签轮廓,即得到当前样本图像帧的第二目标对象分割标签(即,得到一套新的目标对象分割标签)。
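The label-generation step above could be sketched with OpenCV's pyramidal Lucas-Kanade tracker as follows. Frames are assumed to be 8-bit grayscale; the curve-fitting smoothing of the reconnected contour is simplified to a filled polygon here, and propagate_label_by_flow is a hypothetical name.

```python
import cv2
import numpy as np

def propagate_label_by_flow(prev_frame, cur_frame, prev_contour, n_points=20):
    """Generate a second segmentation label for `cur_frame` by tracking boundary
    feature points sampled evenly from the previous frame's label contour.

    prev_contour: (M, 2) float32 points along the previous label outline."""
    idx = np.linspace(0, len(prev_contour) - 1, n_points).astype(int)
    pts = prev_contour[idx].astype(np.float32).reshape(-1, 1, 2)  # evenly sampled boundary points
    tracked, status, _ = cv2.calcOpticalFlowPyrLK(prev_frame, cur_frame, pts, None)
    tracked = tracked[status.flatten() == 1].reshape(-1, 2)       # keep successfully tracked points
    # Connect the tracked points into a closed contour; filling it yields the new
    # label mask (the smoothing/curve-fitting step is simplified away here).
    mask = np.zeros(cur_frame.shape[:2], np.uint8)
    cv2.fillPoly(mask, [tracked.astype(np.int32)], 1)
    return mask
```

Tracking only a handful of contour points, rather than the whole object region, is exactly the trade-off argued for below: contour points carry more contrast information and keep the computation small.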
需要说明的是,第二目标对象分割标签,并非人工添加的标注,而是通过光流算法跟踪生成的、且用于标记出样本图像帧中的目标对象轮廓的标签。
计算机设备可以将每个样本图像帧和通过光流追踪生成的第二目标对象分割标签输入基本的分割模型中,进行迭代地模型优化训练,得到优化后的分割模型。
可以理解,光流算法可以用于对光流的跟踪,也可以对整个目标对象进行跟踪。不过,光流算法对图像质量有一定要求,而视频图像中可能存在丰富的伪影和模糊的边界,对光流算法的跟踪结果有非常大的误导。比如,视频为超声视频时,如果对整个目标对象进行跟踪,目标对象区域内充血、伪影以及模糊边界势必会产生较大的跟踪误差,同时,整片的目标对象跟踪,时效很差。本实施例中,从标签轮廓上选取了目标对象的边界特征点(即位于目标对象轮廓的关键点)进行跟踪,因为轮廓的点相比于目标对象内部的点有更多的图像对比度信息,特征明显,跟踪误差较小,此外跟踪点少,时效好,计算量也小。
此外,本实施例中的光流跟踪只发生在两帧之间,所以,不需要在每张样本图像帧中添加目标对象轮廓关键点的标注信息,就能够实现对前一样本图像帧中选取的边界特征点的跟踪,因此,避免了手工标注边界特征点的处理。
另外,由于本实施例中避免了添加目标对象轮廓关键点的标注信息的情况,光流在两帧之间的跟踪,属于在线训练模型中生成标签的一个处理,不需要考虑光流算法的可导实现,实现方式简单易操作。而且,相当于在训练模型的同时训练学习了光流跟踪算法,使得网络模型自身具备了光流跟踪能力,从而在测试过程中,网络能够在分割出当前帧中的目标对象的同时,通过光流追踪考虑到上一帧目标对象的平滑分割标签信息,导致结果更加平滑。
可以理解,本实施例中能够自动生成分割标签,所以非常适用于半监督学习,特别适用于缺乏人工标注的视频。本实施例用一种间接的方法,通过将光流算法扩展到生成分割标签上,从而能够自动地对分割模型进行调整优化,实现了端到端训练,时效性强,容易实现。
在一个实施例中,该方法还包括:通过基本的分割模型挖掘当前样本图像帧中的难样本像素点;从当前样本图像帧中剔除除难样本像素点和目标对象像素点之外的像素点。本实施例中,根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练可以包括:将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入基本的分割模型中,进行迭代地模型优化训练。
可以理解,专门将难样本挖掘出来,和目标对象像素点一起输入分割模型中,能够针对性地对边缘的难样本像素点进行训练,进而能够提升分割模型在边缘的注意力和识别能力,从而能够使得优化后的分割模型所分割出的边缘更加的光滑。
其中,基本的分割模型,是指前面通过一套第一目标对象分割标签进行迭代地机器学习训练得到的分割模型。难样本像素点,是指容易分类错误的背景像素点。一般情况下,难样本像素点,通常位于图像的边缘以及目标对象分割边缘等这些边界区域。
可以理解,目标对象像素点,即为前景像素点。背景像素点,是除目标对象像素点以外的像素点。
在一个实施例中,通过基本的分割模型挖掘当前样本图像帧中的难样本像素点包括:将具有相应第二目标对象分割标签的各个样本图像帧输入基本的分割模型中,得到样本图像帧中各像素点的分割损失;按照分割损失由大到小的顺序,从样本图像帧中选取与样本图像帧中目标对象像素点个数相匹配的背景像素点,得到难样本像素点,因此,难样本像素点为分类错误的背景像素点。
其中,分割损失,即用于表示预测值与真实值之间的差异。两者差异越大,分割损失越大,两者越接近,分割损失越小。
可以理解,基本的分割模型已经具备一定的分割能力,所以,将具有相应第二目标对象分割标签的各个样本图像帧输入基本的分割模型中,能够对各个样本图像帧进行目标对象分割处理,得到各个样本图像帧的目标对象的分割信息。由于光流追踪的边界特征点在当前样本图像帧中的位置信息连接并进行平滑后,相当于形成第二目标对象分割标签。所以,第二目标对象分割标签相当于能够表示出样本图像帧中的真实值,即样本图像帧中位于第二目标对象分割标签所形成的轮廓内的像素点为目标对象像素点,位于该轮廓外的像素点是背景像素点。得到的各个样本图像帧的目标对象分割信息能够表示出各个样本图像帧中的预测值,即位于分割出的目标对象区域内的像素点为目标对象像素点,位于该目标对象区域外的是背 景像素点。因此,计算机设备可以通过第二目标对象分割标签确定样本图像帧中各像素点的真实值,通过由基本的分割模型分割出的样本图像帧的目标对象分割信息,确定样本图像帧中各像素点的预测值,通过预测值与真实值的比对,得到样本图像帧中各像素点的分割损失。
在一个实施例中,计算机设备可以按照背景像素点的分割损失由大到小的顺序,从样本图像帧中选取与样本图像帧中目标对象像素点个数相匹配的背景像素点,得到难样本像素点。可以理解,与目标对象像素点个数相匹配,并不限定于背景像素点的个数必须与目标对象像素点个数完全一致,只要满足背景像素点的个数与目标对象像素点个数差异在预设的均衡范围之内即可,即不要使二者个数差异过大,以免进行大量的没必要的计算。比如,目标对象像素点个数为100,则可以从背景像素点中选取分割损失在前100名的背景像素点,得到100个难样本像素点。假设,均衡范围为正负20个范围区间,则可以按照背景像素点的分割损失由大到小的顺序,选取80~120个背景像素点,作为难样本像素点。
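A minimal sketch of the hard-sample selection just described: background pixels are ranked by their segmentation loss, and only roughly as many as there are foreground pixels are kept. The exact tie-breaking and the balancing band are assumptions; select_ohem_pixels is a hypothetical name.

```python
import numpy as np

def select_ohem_pixels(per_pixel_loss, fg_mask):
    """Keep all target object (foreground) pixels plus the hardest background
    pixels, i.e. the background pixels with the largest segmentation loss,
    matching their count to the number of foreground pixels.

    per_pixel_loss: HxW float array; fg_mask: HxW array with 1 at foreground."""
    n_fg = int(fg_mask.sum())
    bg_loss = per_pixel_loss[fg_mask == 0]
    k = max(1, min(n_fg, bg_loss.size))
    # Loss threshold below which background pixels are excluded from training.
    thresh = np.sort(bg_loss)[::-1][k - 1]
    keep = fg_mask.astype(bool) | ((fg_mask == 0) & (per_pixel_loss >= thresh))
    return keep  # boolean mask of pixels retained for optimization training
```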
计算机设备可以从每个样本图像帧中剔除除相应难样本像素点和目标对象像素点之外的像素点。
计算机设备可以将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入基本的分割模型中,进行迭代地模型优化训练,得到优化后的分割模型。
可以理解,计算机设备可以通过在线难样本挖掘算法OHEM(online hard example mining),来挖掘出难样本像素点。
图7为一个实施例中自适应训练分割光滑边缘的分割模型的方法。需要说明的是,图7是以视频为心脏超声检测的视频、视频中的图像帧为心脏超声切面图、以及以目标对象为左心室进行举例说明的。所以,第一目标对象分割标签,是第一左心室分割标签,第二目标对象分割标签,是第二左心室分割标签。参照图7,第一左心室分割标签是人工标注的。所以,可以先根据针对每个样本图像帧进行人工标注的这一套第一左心室分割标签和相应样本图像帧作为样本数据进行机器学习训练,训练出基本的分割模型,即执行①。在分割模型的训练过程中,计算机设备可以从第t-1个样本图像帧的第一左心室分割标签所表示的标签轮廓上均匀选取预设数量的点,作为边界特征点,通过Lucas-kanade(LK)光流算法,跟踪这些边界特征点在第t个样本图像帧中的位置信息,然后将边界特征点在第t个样本图像帧中的位置信息连接并进行平滑,得到连接平滑后的标签,即为第二左心室分割标签。702中的深色柱形图表示背景像素点,浅色柱形图表示左心室像素点(即前景像素点),702的左边一组柱形图表示的是,以第二左心室分割标签进行分割时,第t个样本图像帧中背景像素点和左心室像素点的数量,可见,背景像素点明显多于左心室像素点,所以需要进行平衡处理,从第t个样本图像帧中剔除除难样本像素点和左心室像素点之外的像素点,减少过多的背景像素点所带来的不必要的计算量。右边这组柱形图表示的是剔除像素点之后的第t个样本图像帧中难样本像素点和左心室像素点的数量,右边柱形图明显看出背景像素点和左心室像素点的数量比较均衡,不至于差异过大。剔除像素点之后的第t个样本图像帧的掩码图即为704。可见704中仍然包括第t个样本图像帧的第二左心室分割标签。接着,可以将每个剔除像素点之后的样本图像帧和相应第二左心室分割标签输入基本的分割模型中,即执行②,进行迭代地模型优化训练。需要说明的是,图7仅用于示例,并不用于限定。
上述实施例中,在模型训练过程中,通过计算机自动化地进行光流追踪产生新的一套目标对象分割标签,即第二目标对象分割标签,结合难样本的挖掘,能够自适应地对分割模型进行优化,而且,能够在模型训练过程中即可自动化实现该优化效果,从而节省了大量的繁复测试工作。此外,自适应训练分割光滑边缘的分割模型的方法,仅是选取目标对象分割标签所形成的标签轮廓上的点进行光流追踪,即局部进行光流追踪,所以并不需要很大的计算量,节省了计算资源。
在一个实施例中,目标对象关键点模板生成步骤包括:将样本视频中各样本图像帧的第一目标对象分割标签所形成的标签轮廓外扩预设范围;按照目标对象在预设切面类别的心脏超声切面图中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;从每个样本图像帧中,裁剪出与裁剪范围相符的剪裁图片;将与每个剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;根据目标对象关键点的预设位置信息,生成目标对象关键点模板。
可以理解,样本视频为多个,每个样本视频中的每个样本图像帧都有相应的第一目标对象分割标签。目标对象分割标签是用于表示目标对象外部轮廓的标注。
由于不同切面类别的心脏超声切面图中的目标对象区域的大小会存在一些差异,而目标对象关键点模板是对所有切面类别的心脏超声切面图统一适用的,所以,为了能够通用于所有的切面类别,计算机设备可以在每个样本图像帧中,将相应第一目标对象分割标签所形成的标签轮廓外扩预设范围,外扩预设范围后标签轮廓内的区域基本上能够覆盖所有不同切面类别的心脏超声切面图中的目标对象区域。因此,可以将外扩预设范围后标签轮廓内的区域粗略当作目标对象的位置。
在一个实施例中,计算机设备可以找出目标对象在预设的不同切面类别的切面图中的位置规律,对每个样本图像帧中外扩后的范围进行增扩,得到裁剪范围。需要说明的是,这里对外扩后的范围进行增扩,是为了确定裁剪范围,而并不会再对第一目标对象分割标签所形成的标签轮廓再进行外扩。可以理解,在对外扩后的范围进行增扩时,第一目标对象分割标签所形成的标签轮廓不再进行外扩,而是基于目标对象在预设的不同切面类别的切面图中的位置规律,选取一个比该标签轮廓所形成的范围大一些的范围作为裁剪范围。该裁剪范围覆盖标签轮廓所形成的范围。
在一个实施例中,当目标对象为左心室时,左心室在预设的不同切面类别的切面图中的位置规律,为左心室位于心脏超声切面图中的左上角。计算机设备可以在外扩后的范围的基础上,往样本图像帧的左边和下面分别增扩50%的左心室宽高,得到裁剪范围。可以理解,增扩后的裁剪范围除了能够函括左心室区域外,还能够包括更多的用于判断切面类别的信息。
计算机设备可以从每个样本图像帧中,裁剪出与裁剪范围相符的剪裁图片。这样就可以得到多个裁剪图片,可以理解,有多少个样本图像帧就有多少个裁剪图片。在一个实施例中,计算机设备可以将按照裁剪范围裁剪出的图片进行尺寸调整,调整为与多任务网络的输入尺寸相符的尺寸,将调整尺寸后的图片作为裁剪图片。
计算机设备可以将所有剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;根据目标对象关键点的预设位置信息,生成目标对象关键点模板。
在一个实施例中,针对每个裁剪图片,计算机设备可以确定该裁剪图片中由第一目标对象分割标签所表示的目标对象关键点,并确定该目标对象关键点在该裁剪图片中的位置信息。计算机设备可以将所有裁剪图片中目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息。
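The averaging that produces the template's preset positions reduces to a single NumPy call; build_keypoint_template is a hypothetical helper name.

```python
import numpy as np

def build_keypoint_template(cropped_keypoints):
    """`cropped_keypoints`: list of (3, 2) arrays, the target object key point
    coordinates in every cropped sample picture (already resized to the network
    input size). Averaging them gives the template's preset positions."""
    return np.mean(np.stack(cropped_keypoints), axis=0)
```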
可以理解,可以使用属于不同的切面类型的样本视频作为训练数据,所以样本图像帧也对应多种切面类别。基于不同切面类别的样本图像帧确定出的目标对象关键点模板,能够用于对多种不同切面类别的图像帧进行检测。
图8为一个实施例中生成目标对象关键点模板的原理示意图。需要说明的是,图8是以视频中的图像帧为心脏超声切面图、以及以目标对象为左心室进行举例说明的。那么,左心室关键点模板即为要生成的目标对象关键点模板。参照图8,以不同切面类别A2C和A4C的心脏超声切面图作为生成左心室关键点模板的基础数据,通过下述一系列处理:将各个作为样本图像帧的心脏超声切面图中的左心室的标签轮廓外扩一定范围、根据左心室在不同类别的切面图中的位置规律对外扩范围后进行增扩得到裁剪范围、再按照裁剪范围采集图像并调整为与多任务网络的输入尺寸相符的裁剪图片、以及对所有裁剪图片的左心室关键点的位置信息求平均,得到左心室关键点的预设位置信息。根据左心室关键点的预设位置信息,最终生成左心室关键点模板802。
上述实施例中,将通过上述方法确定的目标对象关键点的位置信息求均值,能够提高目标对象关键点模板的准确性和适用性。进而,为后续的仿射变换提供准确参考依据。
如图9所示,提供了一种图像分割装置900,其特征在于,装置包括:选取模块902、仿射变换模块904、目标对象信息获取模块906以及分割模块908,其中:
选取模块902,用于在视频中按照时序依次选取当前图像帧。
仿射变换模块904,用于从在视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的目标对象图。
目标对象信息获取模块906,用于对目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从目标对象图中分割目标对象,得到目标对象的分割信息。
分割模块908,用于根据分割信息和第二位置信息,从当前图像帧中分割出目标对象。
如图10所示,在一个实施例中,该装置900还包括:
首帧关键点信息优化模块901,用于从视频的首位图像帧中,检测出目标对象关键点的初始位置信息;将首位图像帧作为前一图像帧以及将初始位置信息作为前一位置信息,参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息;将后一图像帧作为前一图像帧以及将后一图像帧中的目标对象关键点的位置信息作为前一位置信息,返回参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息的步骤,以进行迭代处理,直至得到视频的末位图像帧中的目标对象关键点的位置信息;将末位图像帧当作首位图像帧的前一图像帧,参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中目标对象关键点最终的位置信息。
在一个实施例中,首帧关键点信息优化模块901还用于按照前一位置信息和目标对象关键点模板之间的仿射变换关系,对前一图像帧的后一图像帧进行仿射变换,得到后一图像帧中的目标对象图;对后一图像帧中的目标对象图进行关键点检测,得到后一图像帧中的目标对象关键点的位置信息。
在一个实施例中,仿射变换模块904还用于按照距当前图像帧由近到远的顺序,将视频中在当前图像帧之前的预设数量的图像帧,确定为参考图像帧;分割模块908,还用于当参考图像帧为多个时,则对依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的 目标对象的分割信息求平均,得到目标对象最终的分割信息;求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息;将目标对象最终的分割信息和目标对象关键点最终的第二位置信息映射至当前图像帧。
在一个实施例中,目标对象信息获取模块906还用于将目标对象图输入至多任务网络中,编码得到目标对象图的特征图;通过多任务网络中的关键点检测模型,对特征图进行关键点检测处理,输出与目标对象图对应的目标对象关键点的第二位置信息;通过多任务网络中的分割模型,对特征图进行语义分割处理,输出相应目标对象的分割信息。
在一个实施例中,目标对象信息获取模块906还用于通过多任务网络中的切面分类模型,对特征图进行切面分类处理,得到当前图像帧所属的切面类别;当确定出视频中每一图像帧所属的切面类别后,则确定每个切面类别所对应的图像帧的数量;将数量最多的切面类别作为视频所对应的切面类别。
在一个实施例中,目标对象信息获取模块906还用于将特征图输入预先训练的关键点检测模型中,输出目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;将目标对象关键点模板中的目标对象关键点的预设位置信息与位置信息差值相加,得到目标对象图的目标对象关键点的第二位置信息。
在一个实施例中,目标对象信息获取模块906还用于将特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;针对解码图像中每个像素点,选取像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为像素点所属的类别;根据解码图像中属于前景类别的各像素点,确定与目标对象图相应的目标对象的分割信息。
在一个实施例中,目标对象信息获取模块906还用于获取样本视频中的各样本图像帧;获取分别与各样本图像帧相应的第一目标对象分割标签;将各样本图像帧和相应第一目标对象分割标签输入初始分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
在一个实施例中,目标对象信息获取模块906还用于从样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧;从当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目标对象边界的边界特征点;通过光流跟踪操作,跟踪边界特征点在当前样本图像帧中的位置信息;将边界特征点在当前样本图像帧中的位置信息连接并进行平滑,得到当前样本图像帧的第二目标对象分割标签;根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练,得到优化后的分割模型。
在一个实施例中,目标对象信息获取模块906还用于通过基本的分割模型挖掘当前样本图像帧中的难样本像素点,所述难样本像素点为分类错误的背景像素点;从当前样本图像帧中剔除除难样本像素点和目标对象像素点之外的像素点;将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入基本的分割模型中,进行迭代地模型优化训练。
在一个实施例中,仿射变换模块904还用于将样本视频中各样本图像帧的第一目标对象分割标签所形成的标签轮廓外扩预设范围;按照目标对象在图像帧中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;从每个样本图像帧中,裁剪出与裁剪范围相符的剪裁图片;将与每个剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;根据目标对象关键点的预设位置信息,生成目标对象关键点模板。
图11为一个实施例中计算机设备的内部结构示意图。参照图11,该计算机设备可以是图1中所示的服务器120。可以理解,计算机设备也可以是终端110。该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质可存储操作系统和计算机程序。该计算机程序被执行时,可使得处理器执行一种图像分割方法。该计算机设备的处理器用于提供计算和控制能力,支撑整个计算机设备的运行。该内存储器中可储存有计算机程序,该计算机程序被处理器执行时,可使得处理器执行一种图像分割方法。计算机设备的网络接口用于进行网络通信。
本领域技术人员可以理解,图11中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,本申请提供的图像分割装置可以实现为一种计算机程序的形式,计算机程序可在如图11所示的计算机设备上运行,计算机设备的非易失性存储介质可存储组成该异常检测装置的各个程序模块,比如,图9所示的选取模块902、仿射变换模块904、目标对象信息获取模块906以及分割模块908。各个程序模块所组成的计算机程序用于使该计算机设备执行本说明书中描述的本申请各个实施例的图像分割方法中的步骤,例如,计算机设备可以通过如图9所示的图像分割装置900中的选取模块902在视频中按照时序依次选取当前图像帧。计算机设备可以通过仿射变换模块904从在视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的目标对象图。计算机设备可以通过目标对象信息获取模块906对目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从目标对象图中分割目标对象,得到目标对象的分割信息。计算机设备可以通过分割模块908根据分割信息和第二位置信息,从当前图像帧中分割出目标对象。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器存储有计算机程序,计算机程序被处理器执行时,使得处理器执行如下步骤:
在视频中按照时序依次选取当前图像帧;从在视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的目标对象图;对目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从目标对象图中分割目标对象,得到目标对象的分割信息;根据分割信息和第二位置信息,从当前图像帧中分割出目标对象。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
从视频的首位图像帧中,检测出目标对象关键点的初始位置信息;将首位图像帧作为前一图像帧以及将初始位置信息作为前一位置信息,参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息;将后一图像帧作为前一图像帧以及将后一图像帧中的目标对象关键点的位置信息作为前一位置信息,返回参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息的步骤,以进行迭代处理,直至得到视频的末位图像帧中的目标对象关键点的位置信息;将末位图像帧当作首位图像帧的前一图像帧, 参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中的目标对象关键点最终的位置信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
按照前一位置信息和目标对象关键点模板之间的仿射变换关系,对前一图像帧的后一图像帧进行仿射变换,得到后一图像帧中的目标对象图;对后一图像帧中的目标对象图进行关键点检测,得到后一图像帧中的目标对象关键点的位置信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
按照距当前图像帧由近到远的顺序,将视频中在当前图像帧之前的预设数量的图像帧,确定为参考图像帧;当参考图像帧为多个时,则对依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的目标对象的分割信息求平均,得到目标对象最终的分割信息;求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息;将目标对象最终的分割信息和目标对象关键点最终的第二位置信息映射至当前图像帧。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将目标对象图输入至多任务网络中,编码得到目标对象图的特征图;通过多任务网络中的关键点检测模型,对特征图进行关键点检测处理,输出与目标对象图对应的目标对象关键点的第二位置信息;通过多任务网络中的分割模型,对特征图进行语义分割处理,输出相应目标对象的分割信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
通过多任务网络中的切面分类模型,对特征图进行切面分类处理,得到当前图像帧所属的切面类别;当确定出视频中每一图像帧所属的切面类别后,则确定每个切面类别所对应的图像帧的数量;将数量最多的切面类别作为视频所对应的切面类别。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将特征图输入预先训练的关键点检测模型中,输出目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;将目标对象关键点模板中的目标对象关键点的预设位置信息与位置信息差值相加,得到目标对象图的目标对象关键点的第二位置信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;针对解码图像中每个像素点,选取像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为像素点所属的类别;根据解码图像中属于前景类别的各像素点,确定与目标对象图相应的目标对象的分割信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
获取样本视频中的各样本图像帧;获取分别与各样本图像帧相应的第一目标对象分割标签;将各样本图像帧和相应第一目标对象分割标签输入初始分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
从样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,从当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目 标对象边界的边界特征点;通过光流跟踪操作,跟踪边界特征点在当前样本图像帧中的位置信息;将边界特征点在当前样本图像帧中的位置信息连接并进行平滑,得到当前样本图像帧的第二目标对象分割标签;根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练,得到优化后的分割模型。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
通过基本的分割模型挖掘当前样本图像帧中的难样本像素点,难样本像素点为分类错误的背景像素点;从当前样本图像帧中剔除除难样本像素点和目标对象像素点之外的像素点;根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练包括:将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入基本的分割模型中,进行迭代地模型优化训练。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将样本视频中各样本图像帧的第一目标对象分割标签所形成的标签轮廓外扩预设范围;按照目标对象在图像帧中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;从每个样本图像帧中,裁剪出与裁剪范围相符的剪裁图片;将与每个剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;根据目标对象关键点的预设位置信息,生成目标对象关键点模板。
在一个实施例中,提供了一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行如下步骤:
在视频中按照时序依次选取当前图像帧;从在视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对当前图像帧进行仿射变换,得到当前图像帧的目标对象图;对目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从目标对象图中分割目标对象,得到目标对象的分割信息;根据分割信息和第二位置信息,从当前图像帧中分割出目标对象。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
从视频的首位图像帧中,检测出目标对象关键点的初始位置信息;将首位图像帧作为前一图像帧以及将初始位置信息作为前一位置信息,参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息;将后一图像帧作为前一图像帧以及将后一图像帧中的目标对象关键点的位置信息作为前一位置信息,返回参照前一位置信息,检测前一图像帧的后一图像帧中的目标对象关键点的位置信息的步骤,以进行迭代处理,直至得到视频的末位图像帧中的目标对象关键点的位置信息;将末位图像帧当作首位图像帧的前一图像帧,参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中的目标对象关键点最终的位置信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
按照前一位置信息和目标对象关键点模板之间的仿射变换关系,对前一图像帧的后一图像帧进行仿射变换,得到后一图像帧中的目标对象图;对后一图像帧中的目标对象图进行关键点检测,得到后一图像帧中的目标对象关键点的位置信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
按照距当前图像帧由近到远的顺序,将视频中在当前图像帧之前的预设数量的图像帧,确定为参考图像帧;当参考图像帧为多个时,则对依照每个参考图像帧中目标对象关键点的 第一位置信息所确定出的目标对象的分割信息求平均,得到目标对象最终的分割信息;求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息;将目标对象最终的分割信息和目标对象关键点最终的第二位置信息映射至当前图像帧。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将目标对象图输入至多任务网络中,编码得到目标对象图的特征图;通过多任务网络中的关键点检测模型,对特征图进行关键点检测处理,输出与目标对象图对应的目标对象关键点的第二位置信息;通过多任务网络中的分割模型,对特征图进行语义分割处理,输出相应目标对象的分割信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
通过多任务网络中的切面分类模型,对特征图进行切面分类处理,得到当前图像帧所属的切面类别;当确定出视频中每一图像帧所属的切面类别后,则确定每个切面类别所对应的图像帧的数量;将数量最多的切面类别作为视频所对应的切面类别。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将特征图输入预先训练的关键点检测模型中,输出目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;将目标对象关键点模板中的目标对象关键点的预设位置信息与位置信息差值相加,得到目标对象图的目标对象关键点的第二位置信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;针对解码图像中每个像素点,选取像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为像素点所属的类别;根据解码图像中属于前景类别的各像素点,确定与目标对象图相应的目标对象的分割信息。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
获取样本视频中的各样本图像帧;获取分别与各样本图像帧相应的第一目标对象分割标签;将各样本图像帧和相应第一目标对象分割标签输入初始分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
从样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,从当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目标对象边界的边界特征点;通过光流跟踪操作,跟踪边界特征点在当前样本图像帧中的位置信息;将边界特征点在当前样本图像帧中的位置信息连接并进行平滑,得到当前样本图像帧的第二目标对象分割标签;根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练,得到优化后的分割模型。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
通过基本的分割模型挖掘当前样本图像帧中的难样本像素点,难样本像素点为分类错误的背景像素点;从当前样本图像帧中剔除除难样本像素点和目标对象像素点之外的像素点;根据每个样本图像帧和相应的第二目标对象分割标签对基本的分割模型进行迭代优化训练包括:将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入基本的分割模 型中,进行迭代地模型优化训练。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行如下操作:
将样本视频中各样本图像帧的第一目标对象分割标签所形成的标签轮廓外扩预设范围;按照目标对象在图像帧中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;从每个样本图像帧中,裁剪出与裁剪范围相符的剪裁图片;将与每个剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;根据目标对象关键点的预设位置信息,生成目标对象关键点模板。
需要说明的是,本申请各实施例中的“第一”、“第二”和“第三”仅用作区分,而并不用于大小、先后、从属等方面的限定。
应该理解的是,虽然本申请各实施例中的各个步骤并不是必然按照步骤标号指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,各实施例中至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上所述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (26)

  1. 一种图像分割方法,应用于计算机设备,所述方法包括:
    在视频中按照时序依次选取当前图像帧;
    从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;
    获取所述参考图像帧中目标对象关键点的第一位置信息;
    参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的目标对象图;
    对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;
    从所述目标对象图中分割目标对象,得到目标对象的分割信息;
    根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象。
  2. 根据权利要求1所述的方法,其特征在于,所述在视频中按照时序依次选取当前图像帧之前,所述方法还包括:
    从所述视频的首位图像帧中,检测出目标对象关键点的初始位置信息;
    将所述首位图像帧作为前一图像帧以及将所述初始位置信息作为前一位置信息,参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的目标对象关键点的位置信息;
    将所述后一图像帧作为前一图像帧以及将所述后一图像帧中的目标对象关键点的位置信息作为前一位置信息,返回所述参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的目标对象关键点的位置信息的步骤,以进行迭代处理,直至得到所述视频的末位图像帧中的目标对象关键点的位置信息;
    将末位图像帧当作所述首位图像帧的前一图像帧,参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中的目标对象关键点最终的位置信息。
  3. 根据权利要求2所述的方法,其特征在于,所述参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的目标对象关键点的位置信息包括:
    按照所述前一位置信息和目标对象关键点模板之间的仿射变换关系,对所述前一图像帧的后一图像帧进行仿射变换,得到所述后一图像帧中的目标对象图;
    对所述后一图像帧中的目标对象图进行关键点检测,得到所述后一图像帧中的目标对象关键点的位置信息。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧包括:
    按照距所述当前图像帧由近到远的顺序,将所述视频中在所述当前图像帧之前的预设数量的图像帧,确定为参考图像帧;
    所述根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象,包括:
    当参考图像帧为多个时,则对依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的目标对象的分割信息求平均,得到目标对象最终的分割信息;
    求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息;
    将所述目标对象最终的分割信息和所述目标对象关键点最终的第二位置信息映射至所 述当前图像帧。
  5. 根据权利要求1至3中任一项所述的方法,其特征在于,所述方法还包括:
    将所述目标对象图输入至多任务网络中,编码得到所述目标对象图的特征图;
    所述对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息包括:
    通过所述多任务网络中的关键点检测模型,对所述特征图进行关键点检测处理,输出与所述目标对象图对应的目标对象关键点的第二位置信息;
    所述从所述目标对象图中分割目标对象,得到目标对象的分割信息包括:
    通过所述多任务网络中的分割模型,对所述特征图进行语义分割处理,输出相应目标对象的分割信息。
  6. 根据权利要求5所述的方法,其特征在于,所述将所述目标对象图输入至多任务网络中,编码得到所述目标对象图的特征图之后,所述方法还包括:
    通过所述多任务网络中的切面分类模型,对所述特征图进行切面分类处理,得到所述当前图像帧所属的切面类别;
    当确定出所述视频中每一图像帧所属的切面类别后,确定每个切面类别所对应的图像帧的数量;
    将数量最多的切面类别作为所述视频所对应的切面类别。
  7. 根据权利要求5所述的方法,其特征在于,所述通过所述多任务网络中的关键点检测模型,对所述特征图进行关键点检测处理包括:
    将所述特征图输入预先训练的关键点检测模型中,输出所述目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;
    将所述目标对象关键点模板中的目标对象关键点的预设位置信息与所述位置信息差值相加,得到所述目标对象图的目标对象关键点的第二位置信息。
  8. 根据权利要求5所述的方法,其特征在于,所述通过多任务网络中的分割模型,对所述特征图进行语义分割处理,输出相应目标对象的分割信息包括:
    将所述特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;
    针对所述解码图像中每个像素点,选取所述像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为所述像素点所属的类别;
    根据所述解码图像中属于前景类别的各像素点,确定与所述目标对象图相应的目标对象的分割信息。
  9. 根据权利要求5所述的方法,其特征在于,所述分割模型的训练步骤包括:
    获取样本视频中的各样本图像帧;
    获取分别与各所述样本图像帧相应的第一目标对象分割标签;
    将各所述样本图像帧和相应第一目标对象分割标签输入初始分割模型中,进行迭代地机器学习训练,得到基本的分割模型。
  10. 根据权利要求9所述的方法,其特征在于,所述分割模型的训练步骤还包括:
    从所述样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,
    从所述当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目标对象边界的边界特征点;
    通过光流跟踪操作,跟踪所述边界特征点在所述当前样本图像帧中的位置信息;
    将所述边界特征点在所述当前样本图像帧中的位置信息连接并进行平滑,得到所述当前样本图像帧的第二目标对象分割标签;
    根据每个样本图像帧和相应的第二目标对象分割标签对所述基本的分割模型进行迭代优化训练,得到优化后的分割模型。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    通过所述基本的分割模型挖掘所述当前样本图像帧中的难样本像素点,所述难样本像素点为分类错误的背景像素点;
    从所述当前样本图像帧中剔除除所述难样本像素点和目标对象像素点之外的像素点;
    所述根据每个样本图像帧和相应的第二目标对象分割标签对所述基本的分割模型进行迭代优化训练包括:
    将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入所述基本的分割模型中,进行迭代地模型优化训练。
  12. 根据权利要求1至3中任一项所述的方法,其特征在于,所述目标对象关键点模板生成步骤包括:
    将样本视频中各样本图像帧的第一目标对象分割标签所形成的标签轮廓外扩预设范围;
    按照目标对象在图像帧中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;
    从每个所述样本图像帧中,裁剪出与所述裁剪范围相符的剪裁图片;
    将与每个剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;
    根据所述目标对象关键点的预设位置信息,生成目标对象关键点模板。
  13. 一种图像分割装置,其特征在于,所述装置包括:
    选取模块,用于在视频中按照时序依次选取当前图像帧;
    仿射变换模块,用于从在所述视频中的时序位于当前图像帧之前的图像帧中,确定参考图像帧;获取所述参考图像帧中目标对象关键点的第一位置信息;参照第一位置信息和目标对象关键点模板之间的仿射变换关系,对所述当前图像帧进行仿射变换,得到所述当前图像帧的目标对象图;
    目标对象信息获取模块,用于对所述目标对象图进行关键点检测,得到目标对象关键点的第二位置信息;从所述目标对象图中分割目标对象,得到目标对象的分割信息;
    分割模块,用于根据所述分割信息和所述第二位置信息,从所述当前图像帧中分割出所述目标对象。
  14. 根据权利要求13所述的装置,其特征在于,所述装置还包括:
    首帧关键点信息优化模块,用于从所述视频的首位图像帧中,检测出目标对象关键点的初始位置信息;将所述首位图像帧作为前一图像帧以及将所述初始位置信息作为前一位置信息,参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的目标对象关键点的位置信息;将所述后一图像帧作为前一图像帧以及将所述后一图像帧中的目标对象关键点的位置信息作为前一位置信息,返回所述参照所述前一位置信息,检测所述前一图像帧的后一图像帧中的目标对象关键点的位置信息的步骤,以进行迭代处理,直至得到所述视频的末位图像 帧中的目标对象关键点的位置信息;将末位图像帧当作所述首位图像帧的前一图像帧,参照末位图像帧中的目标对象关键点的位置信息,确定首位图像帧中的目标对象关键点最终的位置信息。
  15. 根据权利要求14所述的装置,其特征在于,所述首帧关键点信息优化模块,还用于按照所述前一位置信息和目标对象关键点模板之间的仿射变换关系,对所述前一图像帧的后一图像帧进行仿射变换,得到所述后一图像帧中的目标对象图;对所述后一图像帧中的目标对象图进行关键点检测,得到所述后一图像帧中的目标对象关键点的位置信息。
  16. 根据权利要求13至15中任一项所述的装置,其特征在于,所述仿射变换模块,还用于按照距所述当前图像帧由近到远的顺序,将所述视频中在所述当前图像帧之前的预设数量的图像帧,确定为参考图像帧;
    所述分割模块,还用于当参考图像帧为多个时,则对依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的目标对象的分割信息求平均,得到目标对象最终的分割信息;求取分别依照每个参考图像帧中目标对象关键点的第一位置信息所确定出的第二位置信息的平均值,得到目标对象关键点最终的第二位置信息;将所述目标对象最终的分割信息和所述目标对象关键点最终的第二位置信息映射至所述当前图像帧。
  17. 根据权利要求13至15中任一项所述的装置,其特征在于,所述目标对象信息获取模块,还用于将所述目标对象图输入至多任务网络中,编码得到所述目标对象图的特征图;通过所述多任务网络中的关键点检测模型,对所述特征图进行关键点检测处理,输出与所述目标对象图对应的目标对象关键点的第二位置信息;所述从所述目标对象图中分割目标对象,得到目标对象的分割信息包括:通过所述多任务网络中的分割模型,对所述特征图进行语义分割处理,输出相应目标对象的分割信息。
  18. 根据权利要求17所述的装置,其特征在于,所述目标对象信息获取模块,还用于通过所述多任务网络中的切面分类模型,对所述特征图进行切面分类处理,得到所述当前图像帧所属的切面类别;当确定出所述视频中每一图像帧所属的切面类别后,确定每个切面类别所对应的图像帧的数量;将数量最多的切面类别作为所述视频所对应的切面类别。
  19. 根据权利要求17所述的装置,其特征在于,所述目标对象信息获取模块,还用于将所述特征图输入预先训练的关键点检测模型中,输出所述目标对象图中的目标对象关键点与目标对象关键点模板中的目标对象关键点之间的位置信息差值;将所述目标对象关键点模板中的目标对象关键点的预设位置信息与所述位置信息差值相加,得到所述目标对象图的目标对象关键点的第二位置信息。
  20. 根据权利要求17所述的装置,其特征在于,所述目标对象信息获取模块,还用于将所述特征图输入预先训练的分割模型进行解码,输出得到的解码图像中每个像素点属于前景类别的第一分类概率和属于背景类别的第二分类概率;针对所述解码图像中每个像素点,选取所述像素点所对应的第一分类概率和第二分类概率中较大的分类概率所对应的类别,作为所述像素点所属的类别;根据所述解码图像中属于前景类别的各像素点,确定与所述目标对象图相应的目标对象的分割信息。
  21. 根据权利要求17所述的装置,其特征在于,所述目标对象信息获取模块,还用于获取样本视频中的各样本图像帧;获取分别与各所述样本图像帧相应的第一目标对象分割标签;将各所述样本图像帧和相应第一目标对象分割标签输入初始分割模型中,进行迭代地机 器学习训练,得到基本的分割模型。
  22. 根据权利要求21所述的装置,其特征在于,所述目标对象信息获取模块,还用于从所述样本图像帧中依次选取当前样本图像帧,针对每个当前样本图像帧,从所述当前样本图像帧的前一样本图像帧的第一目标对象分割标签所形成的标签轮廓上,选取预设数量的表示目标对象边界的边界特征点;通过光流跟踪操作,跟踪所述边界特征点在所述当前样本图像帧中的位置信息;将所述边界特征点在所述当前样本图像帧中的位置信息连接并进行平滑,得到所述当前样本图像帧的第二目标对象分割标签;根据每个样本图像帧和相应的第二目标对象分割标签对所述基本的分割模型进行迭代优化训练,得到优化后的分割模型。
  23. 根据权利要求22所述的装置,其特征在于,所述目标对象信息获取模块,还用于通过所述基本的分割模型挖掘所述当前样本图像帧中的难样本像素点,所述难样本像素点为分类错误的背景像素点;从所述当前样本图像帧中剔除除所述难样本像素点和目标对象像素点之外的像素点;将每个剔除像素点之后的样本图像帧和相应的第二目标对象分割标签输入所述基本的分割模型中,进行迭代地模型优化训练。
  24. 根据权利要求13至15中任一项所述的装置,其特征在于,所述仿射变换模块,还用于将样本视频中各样本图像帧的第一目标对象分割标签所形成的标签轮廓外扩预设范围;按照目标对象在图像帧中的位置规律,对每个外扩后的范围进行增扩,得到裁剪范围;从每个所述样本图像帧中,裁剪出与所述裁剪范围相符的剪裁图片;将与每个剪裁图片中的目标对象关键点的位置信息求平均,得到目标对象关键点的预设位置信息;根据所述目标对象关键点的预设位置信息,生成目标对象关键点模板。
  25. 一种计算机设备,其特征在于,包括存储器和处理器,所述存储器中存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行权利要求1至12中任一项所述方法的步骤。
  26. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行权利要求1至12中任一项所述方法的步骤。
PCT/CN2019/119770 2018-11-27 2019-11-20 图像分割方法、装置、计算机设备及存储介质 WO2020108366A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19890613.3A EP3852009A4 (en) 2018-11-27 2019-11-20 IMAGE SEGMENTATION PROCESS AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIA
US17/173,259 US11734826B2 (en) 2018-11-27 2021-02-11 Image segmentation method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811425694.8 2018-11-27
CN201811425694.8A CN109492608B (zh) 2018-11-27 2018-11-27 图像分割方法、装置、计算机设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/173,259 Continuation US11734826B2 (en) 2018-11-27 2021-02-11 Image segmentation method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020108366A1 true WO2020108366A1 (zh) 2020-06-04

Family

ID=65697778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119770 WO2020108366A1 (zh) 2018-11-27 2019-11-20 图像分割方法、装置、计算机设备及存储介质

Country Status (4)

Country Link
US (1) US11734826B2 (zh)
EP (1) EP3852009A4 (zh)
CN (1) CN109492608B (zh)
WO (1) WO2020108366A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343779A (zh) * 2021-05-14 2021-09-03 南方电网调峰调频发电有限公司 环境异常检测方法、装置、计算机设备和存储介质
CN113361519A (zh) * 2021-05-21 2021-09-07 北京百度网讯科技有限公司 目标处理方法、目标处理模型的训练方法及其装置
CN114119990A (zh) * 2021-09-29 2022-03-01 北京百度网讯科技有限公司 用于图像特征点匹配的方法、装置及计算机程序产品
CN114581448A (zh) * 2022-05-07 2022-06-03 北京阿丘科技有限公司 图像检测方法、装置、终端设备以及存储介质
CN114842029A (zh) * 2022-05-09 2022-08-02 江苏科技大学 一种融合通道和空间注意力的卷积神经网络息肉分割方法
CN115994922A (zh) * 2023-03-23 2023-04-21 泉州装备制造研究所 运动分割方法、装置、电子设备及存储介质
CN116563615A (zh) * 2023-04-21 2023-08-08 南京讯思雅信息科技有限公司 基于改进多尺度注意力机制的不良图片分类方法

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11301686B2 (en) * 2018-05-25 2022-04-12 Intel Corporation Visual anomaly detection without reference in graphics computing environments
CN109492608B (zh) 2018-11-27 2019-11-05 腾讯科技(深圳)有限公司 Image segmentation method and apparatus, computer device, and storage medium
CN111402424A * 2019-01-02 2020-07-10 珠海格力电器股份有限公司 Augmented reality display method and apparatus for a chip structure, and readable storage medium
CN110176027B * 2019-05-27 2023-03-14 腾讯科技(深圳)有限公司 Video target tracking method, apparatus, device, and storage medium
CN110348369B * 2019-07-08 2021-07-06 北京字节跳动网络技术有限公司 Video scene classification method and apparatus, mobile terminal, and storage medium
CN110532891B * 2019-08-05 2022-04-05 北京地平线机器人技术研发有限公司 Target object state recognition method, apparatus, medium, and device
CN110490881A 2019-08-19 2019-11-22 腾讯科技(深圳)有限公司 Medical image segmentation method and apparatus, computer device, and readable storage medium
CN110782404B * 2019-10-11 2022-06-10 北京达佳互联信息技术有限公司 Image processing method, apparatus, and storage medium
CN110866938B * 2019-11-21 2021-04-27 北京理工大学 Fully automatic video moving object segmentation method
CN111047526B * 2019-11-22 2023-09-26 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111126344B * 2019-12-31 2023-08-01 杭州趣维科技有限公司 Method and system for generating facial forehead key points
CN111210446B * 2020-01-08 2022-07-29 中国科学技术大学 Video object segmentation method, apparatus, and device
KR102144672B1 * 2020-01-17 2020-08-14 성균관대학교산학협력단 Artificial intelligence ultrasound medical diagnosis apparatus using semantic segmentation and remote medical diagnosis method using the same
CN111523402B * 2020-04-01 2023-12-12 车智互联(北京)科技有限公司 Video processing method, mobile terminal, and readable storage medium
CN111523403B * 2020-04-03 2023-10-20 咪咕文化科技有限公司 Method and apparatus for acquiring a target region in a picture, and computer-readable storage medium
CN111754528A * 2020-06-24 2020-10-09 Oppo广东移动通信有限公司 Portrait segmentation method and apparatus, electronic device, and computer-readable storage medium
CN111797753B * 2020-06-29 2024-02-27 北京灵汐科技有限公司 Image-driven model training and image generation method, apparatus, device, and medium
CN112001939B * 2020-08-10 2021-03-16 浙江大学 Image foreground segmentation algorithm based on edge knowledge transfer
CN112069992A * 2020-09-04 2020-12-11 西安西图之光智能科技有限公司 Face detection method, system, and storage medium based on multi-supervised dense alignment
US11366983B2 (en) * 2020-09-09 2022-06-21 International Business Machines Corporation Study-level multi-view processing system
US11386532B2 (en) 2020-09-22 2022-07-12 Facebook Technologies, Llc. Blue noise mask for video sampling
US11430085B2 (en) * 2020-09-22 2022-08-30 Facebook Technologies, Llc Efficient motion-compensated spatiotemporal sampling
CN112363918B * 2020-11-02 2024-03-08 北京云聚智慧科技有限公司 User interface AI automated testing method, apparatus, device, and storage medium
CN112733860B * 2021-01-27 2021-09-10 上海微亿智造科技有限公司 Method and system for hard sample mining in binary segmentation networks
CN113393468A * 2021-06-28 2021-09-14 北京百度网讯科技有限公司 Image processing method, model training method, apparatus, and electronic device
CN113344999A * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and apparatus, electronic device, and storage medium
CN113570607B * 2021-06-30 2024-02-06 北京百度网讯科技有限公司 Target segmentation method, apparatus, and electronic device
TWI797923B * 2021-12-28 2023-04-01 國家中山科學研究院 Online multi-object segmentation and tracking system in mask coefficient space
CN114782399B * 2022-05-13 2024-02-02 上海博动医疗科技股份有限公司 Automatic valve annulus detection method and apparatus, electronic device, and storage medium
CN117197020A * 2022-05-23 2023-12-08 上海微创卜算子医疗科技有限公司 Mitral valve opening distance detection method, electronic device, and storage medium
WO2024026366A1 (en) * 2022-07-27 2024-02-01 Stryker Corporation Systems and methods for real-time processing of medical imaging data utilizing an external processing device
CN116563371A * 2023-03-28 2023-08-08 北京纳通医用机器人科技有限公司 Key point determination method, apparatus, device, and storage medium
CN117935171A * 2024-03-19 2024-04-26 中国联合网络通信有限公司湖南省分公司 Target tracking method and system based on pose key points

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130301911A1 (en) * 2012-05-08 2013-11-14 Samsung Electronics Co., Ltd Apparatus and method for detecting body parts
CN104573742A * 2014-12-30 2015-04-29 中国科学院深圳先进技术研究院 Medical image classification method and system
CN104881861A * 2015-03-11 2015-09-02 西南交通大学 Fault state detection method for high-speed railway catenary suspension devices based on primitive classification
CN107103605A * 2016-02-22 2017-08-29 上海联影医疗科技有限公司 Breast tissue segmentation method
CN108171244A * 2016-12-07 2018-06-15 北京深鉴科技有限公司 Object recognition method and system
CN108427951A * 2018-02-08 2018-08-21 腾讯科技(深圳)有限公司 Image processing method and apparatus, storage medium, and computer device
CN108520223A * 2018-04-02 2018-09-11 广州华多网络科技有限公司 Video image segmentation method and apparatus, storage medium, and terminal device
CN109492608A * 2018-11-27 2019-03-19 腾讯科技(深圳)有限公司 Image segmentation method and apparatus, computer device, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9135514B2 (en) * 2010-05-21 2015-09-15 Qualcomm Incorporated Real time tracking/detection of multiple targets
CN103226834B * 2013-03-26 2015-10-21 长安大学 Fast search method for feature points of moving objects in images
CN104217350B * 2014-06-17 2017-03-22 北京京东尚科信息技术有限公司 Method and apparatus for implementing virtual try-on
US10521902B2 (en) * 2015-10-14 2019-12-31 The Regents Of The University Of California Automated segmentation of organ chambers using deep learning methods from medical imaging
CN106709500B * 2015-11-13 2021-12-03 国网辽宁省电力有限公司检修分公司 Image feature matching method
CN108510475B * 2018-03-09 2022-03-29 南京合迈美家智能科技有限公司 Method and system for measuring muscle-tendon junctions in continuous muscle ultrasound images
CN108510493A 2018-04-09 2018-09-07 深圳大学 Boundary localization method for target objects in medical images, storage medium, and terminal

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343779A (zh) * 2021-05-14 2021-09-03 南方电网调峰调频发电有限公司 Environmental anomaly detection method and apparatus, computer device, and storage medium
CN113343779B (zh) * 2021-05-14 2024-03-12 南方电网调峰调频发电有限公司 Environmental anomaly detection method and apparatus, computer device, and storage medium
CN113361519A (zh) * 2021-05-21 2021-09-07 北京百度网讯科技有限公司 Target processing method, training method for a target processing model, and apparatus therefor
CN113361519B (zh) * 2021-05-21 2023-07-28 北京百度网讯科技有限公司 Target processing method, training method for a target processing model, and apparatus therefor
CN114119990A (zh) * 2021-09-29 2022-03-01 北京百度网讯科技有限公司 Method, apparatus, and computer program product for image feature point matching
CN114119990B (zh) * 2021-09-29 2023-10-27 北京百度网讯科技有限公司 Method, apparatus, and computer program product for image feature point matching
CN114581448A (zh) * 2022-05-07 2022-06-03 北京阿丘科技有限公司 Image detection method, apparatus, terminal device, and storage medium
CN114842029A (zh) * 2022-05-09 2022-08-02 江苏科技大学 Convolutional neural network polyp segmentation method fusing channel and spatial attention
CN115994922A (zh) * 2023-03-23 2023-04-21 泉州装备制造研究所 Motion segmentation method and apparatus, electronic device, and storage medium
CN116563615A (zh) * 2023-04-21 2023-08-08 南京讯思雅信息科技有限公司 Inappropriate image classification method based on improved multi-scale attention mechanism
CN116563615B (zh) * 2023-04-21 2023-11-07 南京讯思雅信息科技有限公司 Inappropriate image classification method based on improved multi-scale attention mechanism

Also Published As

Publication number Publication date
CN109492608B (zh) 2019-11-05
EP3852009A4 (en) 2021-11-24
EP3852009A1 (en) 2021-07-21
US20210166396A1 (en) 2021-06-03
CN109492608A (zh) 2019-03-19
US11734826B2 (en) 2023-08-22

Similar Documents

Publication Publication Date Title
WO2020108366A1 (zh) Image segmentation method and apparatus, computer device, and storage medium
Schlemper et al. Attention-gated networks for improving ultrasound scan plane detection
US11880972B2 (en) Tissue nodule detection and tissue nodule detection model training method, apparatus, device, and system
WO2020238902A1 (zh) Image segmentation method, model training method, apparatus, device, and storage medium
Yu et al. Structure-consistent weakly supervised salient object detection with local saliency coherence
JP6843086B2 (ja) Image processing system, method for performing multi-label semantic edge detection in an image, and non-transitory computer-readable storage medium
WO2020224424A1 (zh) Image processing method and apparatus, computer-readable storage medium, and computer device
US8269722B2 (en) Gesture recognition system and method thereof
US20180314943A1 (en) Systems, methods, and/or media, for selecting candidates for annotation for use in training a classifier
CN109377555B (zh) Target feature extraction and recognition method for three-dimensional reconstruction of the foreground field of view of an autonomous underwater vehicle
WO2023082882A1 (zh) Pedestrian fall action recognition method and device based on pose estimation
CN112820399A (zh) Method and apparatus for automatically diagnosing benign and malignant thyroid nodules
CN113469092B (zh) Character recognition model generation method and apparatus, computer device, and storage medium
WO2020168647A1 (zh) Image recognition method and related device
Bercea et al. Mask, stitch, and re-sample: Enhancing robustness and generalizability in anomaly detection through automatic diffusion models
US8897521B2 (en) Ultrasound image registration apparatus and method thereof
Lin et al. Deep superpixel cut for unsupervised image segmentation
CN117557859A (zh) Multi-angle fusion analysis system and method for ultrasound image targets based on target tracking
CN113570594A (zh) Method, apparatus, and storage medium for monitoring target tissue in ultrasound images
Maraci et al. Fisher vector encoding for detecting objects of interest in ultrasound videos
CN115862119A (zh) Face age estimation method and apparatus based on attention mechanism
TW202346826A (zh) Image processing method
CN115359005A (zh) Image prediction model generation method and apparatus, computer device, and storage medium
CN114913120A (zh) Multi-task breast cancer ultrasound detection method based on transfer learning
Ma et al. Capsule-Based Regression Tracking via Background Inpainting

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19890613

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019890613

Country of ref document: EP

Effective date: 20210413

NENP Non-entry into the national phase

Ref country code: DE